John,

You may want to ask this question on the dev list as well.

I think, logically, this could be accomplished in much the same way as the httpd log parsing plugin that has recently been worked on. That plugin works by specifying the Apache log format pattern. While a proto definition is much more complicated, it is a fundamentally similar approach. I've not really seen much discussion around adding protobuf data files to Drill, so I'm not sure about the general interest level.

Regarding a way to batch convert: a quick search for converting protobuf to JSON, which is where I would go with it, turned up this project:

https://github.com/dpp-name/protobuf-json

Looks like that would do the trick for you in short order.

Jim

On Wed, Sep 30, 2015 at 10:07 AM, John Omernik <[email protected]> wrote:

> I am looking at trying to make use of a large collection of Protobuf
> files. We have the schema definition, and at this time I understand that
> Drill does not have a reader for Protobuf files.
>
> *Disclosure: I am not a strong developer, thus me asking the questions.
>
> 1. What is the difficulty in creating a plugin for Drill that could read
> these files "natively" like is done with Parquet? From the little
> information I've been able to grok, it would require specifying a file with
> Schema information, but beyond getting the schema, what other challenges
> are inherent to Protobufs?
>
> 2. Is there a turn key way to convert Protobufs to Avro or Parquet in a
> performant way? Without the ability to write a storage plugin, this could
> work for me, I'd like to "limit" ETL, but at the same time, I'd like to "at
> scale" make use of these files.
>
> 3. Any other thoughts, projects, examples, that may help me in my quest
> here?
>
> I wish I had a better grasp of the data challenges between formats and how
> Drill works, but alas, I will just post out my ignorance with the goals of
> solving my problem, and hopefully getting smarter in the process.
>
> John

--
*Jim Scott*
Director, Enterprise Strategy & Architecture
+1 (347) 746-9281
@kingmesal <https://twitter.com/kingmesal>
MapR Technologies <http://www.mapr.com>
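
For what it's worth, here is a minimal Python sketch of the batch-convert route suggested above (protobuf to JSON, which Drill can already query). It is only an illustration under some assumptions: each input file is assumed to hold exactly one serialized message, the official `protobuf` Python package is used for decoding rather than the project linked above, and the function names `make_protobuf_decoder` and `convert_to_json_lines` are made up for this example.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def make_protobuf_decoder(message_cls) -> Callable[[bytes], dict]:
    """Build a bytes -> dict decoder for one protoc-generated message class.

    The import is done lazily so the rest of the sketch can be exercised
    without the protobuf package installed.
    """
    from google.protobuf.json_format import MessageToDict

    def decode(raw: bytes) -> dict:
        message = message_cls()
        message.ParseFromString(raw)
        return MessageToDict(message)

    return decode


def convert_to_json_lines(
    paths: Iterable[str],
    decode: Callable[[bytes], dict],
    out_path: str,
) -> int:
    """Decode each input file and write one JSON object per line.

    Assumption: every input file contains a single serialized record.
    Returns the number of records written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            record = decode(Path(path).read_bytes())
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

With a class generated by `protoc` (say a hypothetical `events_pb2.Event`), usage would look like `convert_to_json_lines(paths, make_protobuf_decoder(events_pb2.Event), "events.json")`. If your files pack many messages together, the reading loop would need to follow whatever framing was used when they were written.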
