You could define a file format keyed on a file extension, with that
format's configuration pointing to a proto definition file.

Logistically, this is the same as the custom delimiter capability: .psv is
for pipes, .csv is for commas, and .mydata.proto could use a proto definition.
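
A rough sketch of the idea, in Python rather than Drill's actual plugin
configuration. The registry, the schema path, and the message name below are
all made up for illustration; this only shows the extension-to-definition
lookup, not a real reader.

```python
from pathlib import PurePosixPath

# Hypothetical registry: file extension -> how to read the file.
# None of these keys mirror a real Drill config; they are assumptions.
FORMATS = {
    "psv": {"type": "text", "delimiter": "|"},
    "csv": {"type": "text", "delimiter": ","},
    # Hypothetical compound extension bound to a .proto file and message name.
    "mydata.proto": {
        "type": "protobuf",
        "schema": "/schemas/mydata.proto",
        "message": "MyData",
    },
}


def resolve_format(path):
    """Return the config for the longest registered extension matching the file name."""
    name = PurePosixPath(path).name.lower()
    matches = [ext for ext in FORMATS if name.endswith("." + ext)]
    if not matches:
        return None
    # Prefer the longest match so compound extensions win over shorter ones.
    return FORMATS[max(matches, key=len)]
```

With something like this, a file named 2015-10-02.mydata.proto would resolve
to the protobuf entry, and the reader would know which definition to load
without any change to the SQL syntax.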


On Fri, Oct 2, 2015 at 11:18 PM, Ted Dunning <[email protected]> wrote:

> Protobuf is a bit different from Parquet and other formats because you
> would need some way to specify which proto definition goes with each
> file.
>
> In 30 seconds, I don't see an easy way to do that in the current SQL
> syntax.
>
> Does somebody else have a good idea for this?
>
>
> On Fri, Oct 2, 2015 at 7:17 AM, Jim Scott <[email protected]> wrote:
>
> > John,
> >
> > You may want to ask this question on the dev list as well.
> >
> > I think, logically, this could be accomplished similarly to the httpd log
> > parsing plugin that has recently been worked on. That plugin works by
> > specifying the Apache log format pattern. While a proto definition is
> > much more complicated, it is a fundamentally similar approach.
> >
> > I've not really seen much in the way of discussion around protobuf data
> > files being added to Drill, so, not sure about the general interest
> > level.
> >
> > Regarding a way to batch convert: I found this project in a quick search
> > for converting protobuf to json, as that would be where I would go with
> > it... https://github.com/dpp-name/protobuf-json
> >
> > Looks like that would do the trick for you in short order.
> >
> > Jim
> >
> > On Wed, Sep 30, 2015 at 10:07 AM, John Omernik <[email protected]> wrote:
> >
> > > I am looking at trying to make use of a large collection of Protobuf
> > > files. We have the schema definition, and at this time I understand
> > > that Drill does not have a reader for Protobuf files.
> > >
> > >
> > > *Disclosure: I am not a strong developer, thus me asking the questions.
> > >
> > > 1. What is the difficulty in creating a plugin for Drill that could
> > > read these files "natively", as is done with Parquet? From the little
> > > information I've been able to grok, it would require specifying a file
> > > with schema information, but beyond getting the schema, what other
> > > challenges are inherent to Protobufs?
> > >
> > > 2. Is there a turnkey way to convert Protobufs to Avro or Parquet in a
> > > performant way? Without the ability to write a storage plugin, this
> > > could work for me. I'd like to "limit" ETL, but at the same time, I'd
> > > like to make use of these files "at scale".
> > >
> > > 3. Any other thoughts, projects, examples, that may help me in my quest
> > > here?
> > >
> > > I wish I had a better grasp of the data challenges between formats and
> > > how Drill works, but alas, I will just post out my ignorance with the
> > > goals of solving my problem, and hopefully getting smarter in the
> > > process.
> > >
> > > John
> > >
>
