Protobuf is a bit different from Parquet and other formats because you need some way to specify which proto definition goes with each file.
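To make that concrete, here is a rough sketch of my own (plain Python, standard library only, nothing that Drill provides). It walks the raw protobuf wire format and shows that the bytes carry only field numbers and wire types. The names and logical types come from the .proto definition, which is why a reader has to be told which definition applies to a file. The Person message, the PERSON_SCHEMA mapping, and the helper names are hypothetical and exist only for illustration.

```python
# Illustrative only: a tiny walker for the protobuf wire format.
# It handles the four common wire types and ignores deprecated groups.

def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear marks the last byte
            return result, pos
        shift += 7


def decode_raw(buf):
    """Return [(field_number, wire_type, raw_value), ...] using no schema."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes, or nested message
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        fields.append((field_number, wire_type, value))
    return fields


# Hypothetical stand-in for what a .proto file would supply:
#   message Person { string name = 1; int32 id = 2; }
PERSON_SCHEMA = {1: "name", 2: "id"}


def apply_schema(fields, schema):
    """Attach field names; the wire bytes alone only give numbers."""
    return {schema.get(num, "unknown_%d" % num): val for num, _, val in fields}


# Hand-encoded Person with name="John", id=150.
sample = bytes([0x0A, 0x04]) + b"John" + bytes([0x10, 0x96, 0x01])

raw = decode_raw(sample)
record = apply_schema(raw, PERSON_SCHEMA)
print(raw)
print(record)
```

Note that even after decoding, field 1 is just a run of bytes. Only the schema says whether it is a string, a bytes field, or a nested message, and a file holding many messages also needs some framing convention that the format itself does not define.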
In 30 seconds, I don't see an easy way to do that in the current SQL syntax. Does somebody else have a good idea for this?

On Fri, Oct 2, 2015 at 7:17 AM, Jim Scott wrote:

> John,
>
> You may want to ask this question on the dev list as well.
>
> I think, logically, this could be accomplished similar to the httpd log
> parsing plugin that has recently been worked on. That plugin works by
> specifying the Apache log format pattern. While a proto definition is much
> more complicated, it is a fundamentally similar approach.
>
> I've not really seen much in the way of discussion around protobuf data
> files being added to Drill, so, not sure about the general interest level.
>
> Regarding a way to batch convert: I found this project in a quick search
> for converting protobuf to JSON, as that would be where I would go with
> it... https://github.com/dpp-name/protobuf-json
>
> Looks like that would do the trick for you in short order.
>
> Jim
>
> On Wed, Sep 30, 2015 at 10:07 AM, John Omernik wrote:
>
> > I am looking at trying to make use of a large collection of Protobuf
> > files. We have the schema definition, and at this time I understand that
> > Drill does not have a reader for Protobuf files.
> >
> > *Disclosure: I am not a strong developer, thus me asking the questions.
> >
> > 1. What is the difficulty in creating a plugin for Drill that could read
> > these files "natively" like is done with Parquet? From the little
> > information I've been able to grok, it would require specifying a file
> > with Schema information, but beyond getting the schema, what other
> > challenges are inherent to Protobufs?
> >
> > 2. Is there a turn key way to convert Protobufs to Avro or Parquet in a
> > performant way? Without the ability to write a storage plugin, this could
> > work for me. I'd like to "limit" ETL, but at the same time, I'd like to
> > "at scale" make use of these files.
> >
> > 3. Any other thoughts, projects, examples, that may help me in my quest
> > here?
> >
> > I wish I had a better grasp of the data challenges between formats and
> > how Drill works, but alas, I will just post out my ignorance with the
> > goals of solving my problem, and hopefully getting smarter in the process.
> >
> > John
>
> --
> *Jim Scott*
> Director, Enterprise Strategy & Architecture
> @kingmesal <https://twitter.com/kingmesal>
