It could be done at the format plugin level today. We were just about to
propose SQL syntax along the lines of "from t with options", or similar.
Until that new syntax is supported, you could look at the httpd plugin
example to see how the first part could work.
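
To illustrate the point raised in the quoted reply below (that each file must
be associated with a proto definition), here is a minimal sketch of what can
be recovered from a serialized message without its schema. It is not Drill
code; the function names and the hand-encoded sample bytes are made up for
illustration, and it handles only the four common wire types.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def scan_fields(buf):
    """Walk a serialized message with no schema available.

    Returns a list of (field_number, wire_type, raw_value) tuples.
    Field names and logical types are not present in the bytes.
    """
    pos = 0
    fields = []
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        fields.append((field_number, wire_type, value))
    return fields


# Hypothetical sample: field 1 = varint 150, field 2 = the bytes "drill".
sample = bytes([0x08, 0x96, 0x01, 0x12, 0x05]) + b"drill"
print(scan_fields(sample))
```

Without the .proto file, the reader cannot tell whether field 1 is an int32,
an int64, a bool, or an enum, nor whether field 2 is a string, raw bytes, or
a nested message. That is the information a format plugin would have to be
given for each file.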
On Oct 2, 2015 9:19 PM, "Ted Dunning" <[email protected]> wrote:

> Protobuf is a bit different from parquet and other formats because you
> would need some way to specify which proto format to associate with each
> file.
>
> In 30 seconds, I don't see an easy way to do that in the current SQL
> syntax.
>
> Does somebody else have a good idea for this?
>
>
> On Fri, Oct 2, 2015 at 7:17 AM, Jim Scott <[email protected]> wrote:
>
> > John,
> >
> > You may want to ask this question on the dev list as well.
> >
> > I think, logically, this could be accomplished similarly to the httpd log
> > parsing plugin that has recently been worked on. That plugin works by
> > specifying the Apache log format pattern. While a proto definition is
> > much more complicated, it is a fundamentally similar approach.
> >
> > I've not really seen much in the way of discussion around protobuf data
> > files being added to Drill, so I'm not sure about the general interest
> > level.
> >
> > Regarding a way to batch convert: I found this project in a quick search
> > for converting protobuf to json, as that would be where I would go with
> > it... https://github.com/dpp-name/protobuf-json
> >
> > Looks like that would do the trick for you in short order.
> >
> > Jim
> >
> > On Wed, Sep 30, 2015 at 10:07 AM, John Omernik <[email protected]> wrote:
> >
> > > I am looking at trying to make use of a large collection of Protobuf
> > > files. We have the schema definition, and at this time I understand
> > > that Drill does not have a reader for Protobuf files.
> > >
> > >
> > > *Disclosure: I am not a strong developer, thus me asking the questions.
> > >
> > > 1. What is the difficulty in creating a plugin for Drill that could
> > > read these files "natively", as is done with Parquet? From the little
> > > information I've been able to grok, it would require specifying a file
> > > with schema information, but beyond getting the schema, what other
> > > challenges are inherent to Protobufs?
> > >
> > > 2. Is there a turnkey way to convert Protobufs to Avro or Parquet in a
> > > performant way? Without the ability to write a storage plugin, this
> > > could work for me. I'd like to "limit" ETL, but at the same time, I'd
> > > like to make use of these files "at scale".
> > >
> > > 3. Any other thoughts, projects, examples, that may help me in my quest
> > > here?
> > >
> > > I wish I had a better grasp of the data challenges between formats and
> > > how Drill works, but alas, I will just post out my ignorance with the
> > > goals of solving my problem, and hopefully getting smarter in the
> > > process.
> > >
> > > John
> > >
> >
> >
> >
> > --
> > *Jim Scott*
> > Director, Enterprise Strategy & Architecture
> > +1 (347) 746-9281
> > @kingmesal <https://twitter.com/kingmesal>
> >
>
