I am looking to make use of a large collection of Protobuf files. We have the schema definition, and as I understand it, Drill does not currently have a reader for Protobuf files.
Disclosure: I am not a strong developer, hence the questions.

1. How difficult would it be to create a plugin for Drill that could read these files "natively", as is done with Parquet? From the little information I've been able to grok, it would require specifying a file with schema information, but beyond getting the schema, what other challenges are inherent to Protobufs?

2. Is there a turnkey way to convert Protobufs to Avro or Parquet in a performant way? Since I lack the ability to write a storage plugin, this could work for me. I'd like to limit ETL, but at the same time I'd like to make use of these files at scale.

3. Any other thoughts, projects, or examples that may help me in my quest here?

I wish I had a better grasp of the data challenges between formats and how Drill works, but alas, I will just post my ignorance with the goal of solving my problem, and hopefully getting smarter in the process.

John
