Very cool. Thanks a lot. I recently received a request to process some XML files and make the results available in a data mart. I think I will give it a try and see whether it can process them adequately.
Greetings,
Uwe

-----Original Message-----
From: Jacques Nadeau
Sent: Monday, 16 November 2015 18:41
To: user
Subject: Re: XML in Apache Drill

This is cool. I haven't taken a look yet but I will. Thanks!

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Thu, Nov 12, 2015 at 2:35 AM, Magnus Pierre wrote:

> Hello Drill Users,
>
> A few weeks ago I had the pleasure of writing a small SAX parser for
> XML that I was using in Storm to convert XML to JSON.
> Later I decided this would be great to put into Drill, mostly as a
> workaround for the fact that Drill does not have native support for XML yet.
> With some great help from Jacques, who basically pointed out how to
> integrate it, I integrated it in just a few minutes last week.
>
> If you are interested in testing it and maybe even contributing to
> fixing some of the issues, please let me know. The code is here:
> https://github.com/magpierre/drill/tree/DRILL-3878
>
> Please note that I am not an engineer, only an SE, so please refrain
> from laughing when reading the code, OK? :)
>
> The biggest issue I have uncovered so far is that I am building more
> complex JSON than necessary, and Drill does not like the JSON it gets.
> Please also note that I had to make a minor change in
> JSONRecordReader for this to work (I had to change some
> variables from private to protected), so this code will not work
> on a standard Apache Drill build.
>
> To use it, put the jar created when building the project into the
> jars folder of the newly built Drill environment, and add this to the
> formats section of the dfs storage plugin configuration:
>
>   ,
>   "xml": {
>     "type": "xml",
>     "keepPrefix": false
>   }
>
> The keepPrefix setting lets you filter away all the namespace prefixes
> on tags if you want.
>
> Then add a workspace such as:
>
>   ,
>   "xmls": {
>     "location": "/directory/XML",
>     "writable": false,
>     "defaultInputFormat": "xml"
>   }
>
> Regards,
> Magnus
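For readers who have not seen the SAX approach Magnus describes, here is a minimal sketch of the idea in Java. It is not the code from the DRILL-3878 branch and does not plug into Drill; the class XmlToJsonSketch, the convert(xml, keepPrefix) method and the "@attribute" / "#text" key conventions are hypothetical choices made for illustration. The keepPrefix flag only mimics the behaviour described for the format option: when it is false, "ns:" prefixes are stripped from tag and attribute names and xmlns declarations are dropped.

```java
import java.io.StringReader;
import java.util.*;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch (not the DRILL-3878 code): SAX events -> nested maps/lists -> JSON text.
public class XmlToJsonSketch {
    // Strips a namespace prefix such as "ns:" unless keepPrefix is true.
    static String name(String qName, boolean keepPrefix) {
        int colon = qName.indexOf(':');
        return (keepPrefix || colon < 0) ? qName : qName.substring(colon + 1);
    }
    static class Handler extends DefaultHandler {
        final boolean keepPrefix;
        // One map and one text buffer per open element; the bottom pair is a synthetic root.
        final Deque<Map<String, Object>> nodes = new ArrayDeque<>();
        final Deque<StringBuilder> texts = new ArrayDeque<>();
        Handler(boolean keepPrefix) {
            this.keepPrefix = keepPrefix;
            nodes.push(new LinkedHashMap<>());
            texts.push(new StringBuilder());
        }
        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            Map<String, Object> node = new LinkedHashMap<>();
            for (int i = 0; i < atts.getLength(); i++) {
                String att = atts.getQName(i);
                // With keepPrefix off, namespace declarations (xmlns, xmlns:ns) are dropped.
                if (!keepPrefix && att.startsWith("xmlns")) continue;
                node.put("@" + name(att, keepPrefix), atts.getValue(i));
            }
            nodes.push(node);
            texts.push(new StringBuilder());
        }
        @Override
        public void characters(char[] ch, int start, int length) {
            texts.peek().append(ch, start, length);
        }
        @Override
        public void endElement(String uri, String local, String qName) {
            Map<String, Object> node = nodes.pop();
            String text = texts.pop().toString().trim();
            // A leaf element without attributes collapses to a plain string value.
            Object value = node.isEmpty() ? text : node;
            if (!node.isEmpty() && !text.isEmpty()) node.put("#text", text);
            addChild(nodes.peek(), name(qName, keepPrefix), value);
        }
    }

    // A repeated sibling name becomes a JSON array instead of overwriting the earlier value.
    @SuppressWarnings("unchecked")
    static void addChild(Map<String, Object> parent, String key, Object value) {
        Object existing = parent.get(key);
        if (existing == null) parent.put(key, value);
        else if (existing instanceof List) ((List<Object>) existing).add(value);
        else parent.put(key, new ArrayList<>(Arrays.asList(existing, value)));
    }

    static String toJson(Object v) {
        if (v instanceof Map) {
            StringJoiner sj = new StringJoiner(",", "{", "}");
            ((Map<?, ?>) v).forEach((k, val) -> sj.add(quote(k.toString()) + ":" + toJson(val)));
            return sj.toString();
        }
        if (v instanceof List) {
            StringJoiner sj = new StringJoiner(",", "[", "]");
            ((List<?>) v).forEach(item -> sj.add(toJson(item)));
            return sj.toString();
        }
        return quote(v.toString());
    }

    // XML 1.0 text can only contain tab, LF and CR as control characters, so these escapes suffice.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t") + "\"";
    }

    public static String convert(String xml, boolean keepPrefix) {
        try {
            Handler handler = new Handler(keepPrefix);
            // The default factory is not namespace-aware, so qName keeps any "ns:" prefix.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return toJson(handler.nodes.peek());
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ns:order xmlns:ns=\"urn:x\" id=\"7\"><ns:item>a</ns:item><ns:item>b</ns:item></ns:order>";
        System.out.println(convert(xml, false)); // {"order":{"@id":"7","item":["a","b"]}}
    }
}
```

Collapsing text-only elements into plain strings and gathering repeated siblings into arrays is one way to keep the generated JSON flatter. Note that with this scheme the same key can come out as a string in one record and as an array in another, which is the kind of shape a schema-inferring JSON reader may struggle with.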
