Hello everyone, 
I recently presented a talk at the ASF DC Roadshow (shameless plug [1]), and 
while there I heard a really good talk by a PMC member of the Apache Daffodil 
(incubating) project.  At its core, Daffodil is a collection of parsers that 
convert various data formats to a standard structure, which can then be 
ingested into other tools.  Drill can already ingest some of these formats 
natively, such as PCAP and CSV, but it cannot ingest many others, such as 
NACHA (bulk financial transactions), vCard, and Shapefile.  Here is a brief 
presentation about Daffodil [2].

The DFDLSchemas GitHub organization has a handful of DFDL schemas that are 
pretty good open source examples [3].

On a related note, I stumbled on the Kaitai Struct library [4], which performs 
a similar function to Daffodil.  Would the community be interested in 
incorporating these libraries into Drill?  My thought is that it would greatly 
increase the types of data that Drill can natively query, and hence seriously 
increase Drill’s usefulness.  If there is interest (and honestly even if there 
isn’t), I can start working on this for the next release of Drill.


[1]: https://www.slideshare.net/cgivre/drilling-cyber-security-data-with-apache-drill
[2]: https://www.slideshare.net/mbeckerle/tresys-dfdl-data-format-description-language-daffodil-open-source-public-overview-100432615
[3]: https://github.com/DFDLSchemas
[4]: http://formats.kaitai.io
