Protobuf is a bit different from Parquet and other formats because the
files are not self-describing: you would need some way to tell Drill which
proto definition goes with each file.
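
A minimal sketch of why the definition is needed, assuming only the
documented protobuf wire format (the helper names read_varint and
scan_fields are made up for illustration and are not part of Drill or the
protobuf library). Without a .proto file, a reader can recover field
numbers and wire types, but not field names or real types:

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def scan_fields(buf):
    """Walk a serialized message with no schema.

    Returns a list of (field_number, wire_type, raw_value). Names and
    logical types (string vs. bytes vs. nested message) are unknowable here.
    """
    pos = 0
    fields = []
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited (string, bytes, sub-message)
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        fields.append((field_number, wire_type, value))
    return fields


# Field 1 = varint 150, field 2 = the seven bytes "testing".
sample = bytes([0x08, 0x96, 0x01, 0x12, 0x07]) + b"testing"
print(scan_fields(sample))
```

Whether field 2 is a string, raw bytes, or a nested message cannot be
told from the bytes alone, which is the gap the proto definition fills.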

After 30 seconds of thought, I don't see an easy way to do that in the
current SQL syntax.

Does somebody else have a good idea for this?
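
One possible direction, sketched outside SQL and purely as an assumption:
keep the file-to-definition association in configuration, much as the
httpd log plugin mentioned in the quoted message below is configured with
a log format pattern. Every name here (SCHEMA_RULES, schema_for, the
paths, descriptor files, and message types) is hypothetical; nothing like
this exists in Drill as far as this thread establishes.

```python
from fnmatch import fnmatch

# Hypothetical rules: glob pattern -> (descriptor file, message type).
# The first matching pattern wins.
SCHEMA_RULES = [
    ("/data/events/*.pb", ("events.desc", "example.Event")),
    ("/data/users/*.pb", ("users.desc", "example.User")),
]


def schema_for(path, rules=SCHEMA_RULES):
    """Return the (descriptor, message type) pair for path, or None."""
    for pattern, schema in rules:
        if fnmatch(path, pattern):
            return schema
    return None


print(schema_for("/data/events/2015-10-02.pb"))
print(schema_for("/data/other/file.pb"))
```

A lookup like this would have to run before any file is read, which is
why it fits configuration better than the query text itself.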


On Fri, Oct 2, 2015 at 7:17 AM, Jim Scott <[email protected]> wrote:

> John,
>
> You may want to ask this question on the dev list as well.
>
> I think, logically, this could be accomplished similarly to the httpd log
> parsing plugin that has recently been worked on. That plugin works by
> specifying the apache log format pattern. While a proto definition is much
> more complicated, it is a fundamentally similar approach.
>
> I've not really seen much in the way of discussion around protobuf data
> files being added to Drill, so, not sure about the general interest level.
>
> Regarding a way to batch convert: I found this project in a quick search
> for converting protobuf to json, as that would be where I would go with
> it... https://github.com/dpp-name/protobuf-json
>
> Looks like that would do the trick for you in short order.
>
> Jim
>
> On Wed, Sep 30, 2015 at 10:07 AM, John Omernik <[email protected]> wrote:
>
> > I am looking at trying to make use of a large collection of Protobuf
> > files. We have the schema definition, and at this time I understand that
> > Drill does not have a reader for Protobuf files.
> >
> >
> > *Disclosure: I am not a strong developer, thus me asking the questions.
> >
> > 1. What is the difficulty in creating a plugin for Drill that could read
> > these files "natively", as is done with Parquet? From the little
> > information I've been able to grok, it would require specifying a file
> > with schema information, but beyond getting the schema, what other
> > challenges are inherent to Protobufs?
> >
> > 2. Is there a turnkey way to convert Protobufs to Avro or Parquet in a
> > performant way? Without the ability to write a storage plugin, this
> > could work for me. I'd like to "limit" ETL, but at the same time, I'd
> > like to make use of these files "at scale".
> >
> > 3. Any other thoughts, projects, examples, that may help me in my quest
> > here?
> >
> > I wish I had a better grasp of the data challenges between formats and
> > how Drill works, but alas, I will just post out my ignorance with the
> > goal of solving my problem, and hopefully getting smarter in the process.
> >
> > John
> >
>
>
>
> --
> *Jim Scott*
> Director, Enterprise Strategy & Architecture
> +1 (347) 746-9281
> @kingmesal <https://twitter.com/kingmesal>
>
