This is cool. I haven't taken a look yet but I will. Thanks!
--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Thu, Nov 12, 2015 at 2:35 AM, Magnus Pierre wrote:
> Hello Drill Users,
>
> A few weeks ago I had the pleasure of writing a small SAX parser for XML
> that I was using in Storm to convert XML to JSON.
> Later I decided this would be great to put into Drill, mostly as a
> workaround for the fact that Drill does not have native support for XML yet.
> With some great help from Jacques, who basically pointed out how to
> integrate it, I integrated it in just a few minutes last week.
>
> If you are interested in testing it, and maybe even contributing fixes for
> some of the issues, please let me know. The code is here:
> https://github.com/magpierre/drill/tree/DRILL-3878
> Please observe that I am not an engineer, only an SE, so please refrain
> from laughing when reading the code, OK? :)
>
> The biggest issue I have uncovered so far is that I am building more
> complex JSON than necessary, which Drill does not like.
> Please also observe that I had to make a minor change in JSONRecordReader
> for this to work (I had to change some variables from private to
> protected), so this code will not work on a standard Apache Drill
> build.
>
> To use it, put the jar created when building the project into the jars
> folder of the newly built Drill environment, and add this to the dfs
> storage plugin configuration:
>
> ,
> "xml": {
>   "type": "xml",
>   "keepPrefix": false
> }
>
> The keepPrefix setting lets you filter out the namespace prefixes on
> tags if you want.
>
> Then add a workspace such as:
>
> ,
> "xmls": {
>   "location": "/directory/XML",
>   "writable": false,
>   "defaultInputFormat": "xml"
> }
>
> Regards,
> Magnus
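
For anyone who wants a feel for the approach before digging into the branch, here is a minimal Python sketch of SAX-based XML-to-JSON conversion. It is not the code in the DRILL-3878 branch. The mapping rules (attributes as "@name", mixed text as "#text", repeated sibling tags as lists, text-only elements collapsed to plain strings) and the reading of keepPrefix as "keep or strip the prefix: part of names" are assumptions for illustration, and `XmlToJsonHandler` / `xml_to_json` are hypothetical names.

```python
import json
import xml.sax
from xml.sax.handler import ContentHandler


def _local(name: str, keep_prefix: bool) -> str:
    # Assumed keepPrefix semantics: drop "prefix:" from a name unless keep_prefix is set.
    return name if keep_prefix or ":" not in name else name.split(":", 1)[1]


class XmlToJsonHandler(ContentHandler):
    """Hypothetical handler: builds nested dicts from SAX events."""

    def __init__(self, keep_prefix: bool = False):
        super().__init__()
        self.keep_prefix = keep_prefix
        self.stack = [{}]  # dicts under construction; index 0 holds the document root
        self.text = [[]]   # one character buffer per open element

    def startElement(self, name, attrs):
        node = {}
        for key, value in attrs.items():
            # When stripping prefixes, also drop namespace declarations.
            if not self.keep_prefix and (key == "xmlns" or key.startswith("xmlns:")):
                continue
            node["@" + _local(key, self.keep_prefix)] = value
        self.stack.append(node)
        self.text.append([])

    def characters(self, content):
        # Text goes to the innermost open element only.
        self.text[-1].append(content)

    def endElement(self, name):
        node = self.stack.pop()
        text = "".join(self.text.pop()).strip()
        if text and not node:
            value = text  # text-only element becomes a plain string (keeps the JSON flat)
        else:
            if text:
                node["#text"] = text
            value = node
        key = _local(name, self.keep_prefix)
        parent = self.stack[-1]
        if key in parent:
            # A repeated sibling tag turns the field into a list.
            if not isinstance(parent[key], list):
                parent[key] = [parent[key]]
            parent[key].append(value)
        else:
            parent[key] = value


def xml_to_json(data: bytes, keep_prefix: bool = False) -> str:
    handler = XmlToJsonHandler(keep_prefix)
    xml.sax.parseString(data, handler)
    return json.dumps(handler.stack[0])
```

Keeping text-only elements as plain strings, rather than wrapping every value in an object, is one way to avoid the "more complex JSON than necessary" problem mentioned above, though whether it suits Drill's JSON reader would need testing.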
