Hello everyone, I recently presented a talk at the ASF DC Roadshow (shameless plug [1]) and also heard a really good talk by a PMC member of the Apache Daffodil (incubating) project. At its core, Daffodil is a collection of parsers that convert various data formats into a standard structure, which can then be ingested into other tools. Drill can already ingest some of these formats natively, such as PCAP and CSV, but many others it cannot, such as NACHA (bulk financial transactions), vCard, Shapefile, and many more. Here is a brief presentation about Daffodil [2].
The DFDLSchemas GitHub has a handful of DFDL schemas that are pretty good open source examples [3]. On a related note, I stumbled on the Kaitai Struct library [4], which performs a similar function to Daffodil. Would the community be interested in incorporating these libraries into Drill? My thought is that it would greatly increase the types of data that Drill can natively query, and hence seriously increase Drill’s usefulness. If there is interest (and honestly, even if there isn’t), I can start working on this for the next release of Drill.

[1]: https://www.slideshare.net/cgivre/drilling-cyber-security-data-with-apache-drill
[2]: https://www.slideshare.net/mbeckerle/tresys-dfdl-data-format-description-language-daffodil-open-source-public-overview-100432615
[3]: https://github.com/DFDLSchemas
[4]: http://formats.kaitai.io
