You could define a file format keyed on a file extension, where the format's configuration points to a proto definition file.
Logistically, it's the same as the custom delimiter capability: .psv is for pipes, .csv is for commas... .mydata.proto could use a proto definition.

On Fri, Oct 2, 2015 at 11:18 PM, Ted Dunning wrote:

> Protobuf is a bit different from parquet and other formats because you
> would need some way to specify which proto format to associate with each
> file.
>
> In 30 seconds, I don't see an easy way to do that in the current SQL
> syntax.
>
> Does somebody else have a good idea for this?
>
> On Fri, Oct 2, 2015 at 7:17 AM, Jim Scott wrote:
>
> > John,
> >
> > You may want to ask this question on the dev list as well.
> >
> > I think, logically, this could be accomplished similarly to the httpd log
> > parsing plugin that has recently been worked on. That plugin works by
> > specifying the apache log format pattern. While a proto definition is much
> > more complicated, it is a fundamentally similar approach.
> >
> > I've not really seen much in the way of discussion around protobuf data
> > files being added to Drill, so, not sure about the general interest level.
> >
> > Regarding a way to batch convert: I found this project in a quick search
> > for converting protobuf to json, as that would be where I would go with
> > it... https://github.com/dpp-name/protobuf-json
> >
> > Looks like that would do the trick for you in short order.
> >
> > Jim
> >
> > On Wed, Sep 30, 2015 at 10:07 AM, John Omernik wrote:
> >
> > > I am looking at trying to make use of a large collection of Protobuf
> > > files. We have the schema definition, and at this time I understand that
> > > Drill does not have a reader for Protobuf files.
> > >
> > > *Disclosure: I am not a strong developer, thus me asking the questions.
> > >
> > > 1. What is the difficulty in creating a plugin for Drill that could read
> > > these files "natively", as is done with Parquet? From the little
> > > information I've been able to grok, it would require specifying a file with
> > > Schema information, but beyond getting the schema, what other challenges
> > > are inherent to Protobufs?
> > >
> > > 2. Is there a turnkey way to convert Protobufs to Avro or Parquet in a
> > > performant way? Without the ability to write a storage plugin, this could
> > > work for me. I'd like to "limit" ETL, but at the same time, I'd like to
> > > make use of these files "at scale".
> > >
> > > 3. Any other thoughts, projects, examples, that may help me in my quest
> > > here?
> > >
> > > I wish I had a better grasp of the data challenges between formats and how
> > > Drill works, but alas, I will just post out my ignorance with the goals of
> > > solving my problem, and hopefully getting smarter in the process.
> > >
> > > John
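
To make the extension idea at the top of this reply a bit more concrete, here is a rough Python sketch of the lookup I have in mind. It is not Drill code and none of these names are Drill APIs: the FORMATS table, the "protobuf" type, the proto_file and message fields, and resolve_format are all made up for illustration. The only point is that a file extension can select a format entry, and that entry can carry the path to a proto definition the same way a text format entry carries its delimiter.

```python
# Hypothetical sketch only: these names are illustrative, not Drill APIs.
FORMATS = {
    "psv": {"type": "text", "delimiter": "|"},
    "csv": {"type": "text", "delimiter": ","},
    # Assumed entry: a protobuf format that names its schema file and
    # the top-level message type to decode (both values are made up).
    "mydata.proto": {
        "type": "protobuf",
        "proto_file": "schemas/mydata_definition.proto",
        "message": "MyData",
    },
}


def resolve_format(filename, formats=FORMATS):
    """Return the format config for a file name, or None if nothing matches.

    When several registered extensions match, the longest one wins.
    """
    name = filename.lower()
    matches = [ext for ext in formats if name.endswith("." + ext)]
    if not matches:
        return None
    return formats[max(matches, key=len)]


if __name__ == "__main__":
    for path in ("events.psv", "2015-10-02.mydata.proto", "notes.json"):
        print(path, "->", resolve_format(path))
```

A reader for the protobuf entry would still have to load the named proto definition and decode each record against it; the sketch covers only the association between a file and its schema, which is the part Ted flagged as missing.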
