I am trying to make use of a large collection of Protobuf files. We have
the schema definition, but as I understand it, Drill does not currently
have a reader for Protobuf files.


*Disclosure: I am not a strong developer, hence these questions.

1. How difficult would it be to create a plugin for Drill that could read
these files "natively", as is done with Parquet? From the little
information I've been able to grok, it would require specifying a file with
schema information, but beyond getting the schema, what other challenges
are inherent to Protobufs?

2. Is there a turnkey way to convert Protobufs to Avro or Parquet in a
performant way? Without the ability to write a storage plugin, this could
work for me. I'd like to limit ETL, but at the same time I'd like to make
use of these files at scale.

3. Are there any other thoughts, projects, or examples that may help me in
my quest here?

I wish I had a better grasp of the data challenges between formats and how
Drill works, but alas, I will just post my ignorance with the goal of
solving my problem and hopefully getting smarter in the process.

John
