John,

You may want to ask this question on the dev list as well.

I think, logically, this could be accomplished in a way similar to the httpd
log parsing plugin that has recently been worked on. That plugin works by
specifying the Apache log format pattern. While a proto definition is much
more complicated, it is fundamentally the same approach.
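To make the analogy concrete: where the httpd plugin turns a log format pattern into a parser, a protobuf reader would turn the message definition into a decoder for the binary wire format. Below is a minimal Python sketch of that idea. It is not Drill's plugin API. It assumes a hand-written schema dict (field number -> column name and type) standing in for what a real reader would derive from the .proto file, and it only handles varint and length-delimited fields. Note that the same varint bytes decode differently for an "int" field and a "sint64" (zigzag) field, which is why the reader cannot work without the schema.

```python
from typing import Any, Dict, Tuple

# Hypothetical schema format for this sketch: field number -> (column name, type).
# A real reader would build this from the .proto definition.
Schema = Dict[int, Tuple[str, str]]


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:  # high bit clear: last byte of this varint
            return result, pos
        shift += 7


def decode_message(buf: bytes, schema: Schema) -> Dict[str, Any]:
    """Decode one serialized message into a flat row of named columns."""
    row: Dict[str, Any] = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_no, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint: int32/int64/uint/bool/enum/sint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: string/bytes/nested message
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:  # fixed32/fixed64 (wire types 5 and 1) are left out of this sketch
            raise ValueError(f"wire type {wire_type} not handled here")
        if field_no not in schema:
            continue  # unknown field: skip it
        name, ftype = schema[field_no]
        if ftype == "string":
            value = value.decode("utf-8")
        elif ftype == "bool":
            value = bool(value)
        elif ftype == "sint64":
            value = (value >> 1) ^ -(value & 1)  # undo zigzag encoding
        # "int": keep the raw value (negative int32/int64 are not handled here)
        row[name] = value
    return row


if __name__ == "__main__":
    schema = {1: ("id", "int"), 2: ("name", "string"), 3: ("delta", "sint64")}
    # id=150, name="drill", delta=-2 (zigzag-encoded on the wire as 3)
    msg = b"\x08\x96\x01\x12\x05drill\x18\x03"
    print(decode_message(msg, schema))  # {'id': 150, 'name': 'drill', 'delta': -2}
```

Nested messages and repeated fields would be the next things a real reader has to map onto Drill's nested column types.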

I haven't really seen much discussion around adding support for protobuf
data files to Drill, so I'm not sure about the general level of interest.

As for a way to batch convert: a quick search turned up this project for
converting protobuf to JSON, which is the direction I would go with it:
https://github.com/dpp-name/protobuf-json

Looks like that would do the trick for you in short order.
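One practical detail if you go the conversion route: protobuf itself defines no file container, so a "file of protobufs" is only readable if you know how the writer framed the messages. A common convention (the one used by Java's writeDelimitedTo/parseDelimitedFrom) is a varint length prefix before each message. The Python sketch below assumes that framing, splits the stream, and writes one JSON object per line, which Drill can query directly as JSON. The decode callable is a placeholder: in practice it might be a generated message class parsed with ParseFromString and converted with google.protobuf.json_format.MessageToDict, or a schema-driven decoder. The demo just hex-dumps each payload.

```python
import io
import json
from typing import Any, BinaryIO, Callable, Dict, Iterator, Optional, TextIO


def _read_varint(stream: BinaryIO) -> Optional[int]:
    """Read one base-128 varint; return None at a clean end of stream."""
    result, shift = 0, 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None  # no more messages
            raise ValueError("truncated length prefix")
        b = byte[0]
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result
        shift += 7


def iter_delimited(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the payload of each varint-length-prefixed message."""
    while True:
        size = _read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise ValueError("truncated message")
        yield payload


def to_json_lines(stream: BinaryIO,
                  decode: Callable[[bytes], Dict[str, Any]],
                  out: TextIO) -> int:
    """Write one JSON object per message; return the number of messages."""
    count = 0
    for payload in iter_delimited(stream):
        out.write(json.dumps(decode(payload)) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Two framed messages, each a 2-byte payload (field 1 = 1, then field 1 = 2).
    data = io.BytesIO(b"\x02\x08\x01\x02\x08\x02")
    out = io.StringIO()
    # Placeholder decoder: swap in a real schema-aware decoder here.
    n = to_json_lines(data, lambda b: {"raw": b.hex()}, out)
    print(n)
    print(out.getvalue(), end="")
```

Since a length-delimited stream has no sync markers (unlike Avro), a single file cannot be split at an arbitrary offset, so at scale you would parallelize per file rather than within one.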

Jim

On Wed, Sep 30, 2015 at 10:07 AM, John Omernik wrote:

> I am looking at trying to make use of a large collection of Protobuf
> files. We have the schema definition, and as I understand it, Drill does
> not currently have a reader for Protobuf files.
>
>
> *Disclosure: I am not a strong developer, hence the questions.*
>
> 1. What is involved in creating a plugin for Drill that could read
> these files "natively", as is done with Parquet? From the little
> information I've been able to grok, it would require specifying a file with
> schema information, but beyond getting the schema, what other challenges
> are inherent to Protobufs?
>
> 2. Is there a turnkey way to convert Protobufs to Avro or Parquet in a
> performant way? Since I can't write a storage plugin myself, that could
> work for me. I'd like to "limit" ETL, but at the same time, I'd like to
> make use of these files "at scale".
>
> 3. Any other thoughts, projects, examples, that may help me in my quest
> here?
>
> I wish I had a better grasp of the data challenges between formats and of
> how Drill works, but alas, I will just put my ignorance out there with the
> goal of solving my problem, and hopefully getting smarter in the process.
>
> John
>



-- 
*Jim Scott*
Director, Enterprise Strategy & Architecture
+1 (347) 746-9281
@kingmesal <https://twitter.com/kingmesal>

MapR Technologies <http://www.mapr.com>