Let me know how it goes. I also want to point out that when I have seen errors from queries that select ALL columns with the * construct, the query usually works once you go deeper into the structure, so don't give up if the first query doesn't work for you. You can use your XML to figure out the tags. Please also note that setting keepPrefix to false strips the namespace prefix from each tag, so that for instance 'nsref:tagname' becomes 'tagname'.
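
To illustrate the keepPrefix behaviour described above, here is a minimal sketch in Python. It is not the code from the Drill branch: the function name strip_prefix and the rule "drop everything up to the first colon" are assumptions made for illustration only.

```python
def strip_prefix(tag: str, keep_prefix: bool = False) -> str:
    """Return the tag name, dropping a leading namespace prefix
    unless keep_prefix is True.

    Hypothetical helper: the prefix is assumed to be everything up to
    and including the first colon in the tag.
    """
    if keep_prefix:
        return tag
    # partition() returns (before, separator, after); the separator is
    # empty when the tag contains no colon at all.
    _, sep, local = tag.partition(":")
    return local if sep else tag


if __name__ == "__main__":
    print(strip_prefix("nsref:tagname"))
    print(strip_prefix("nsref:tagname", keep_prefix=True))
    print(strip_prefix("tagname"))
```

With keepPrefix false, the column names you query would then be the short tag names rather than the prefixed ones.
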
Regards,
Magnus

> On 17 Nov 2015 at 11:26, Geercken, Uwe <[email protected]> wrote:
>
> Very cool. Thanks a lot.
>
> Recently I have received a request to process some xml files and make the
> results available in a datamart. I think I will give it a try and see if I can
> process it adequately.
>
> Greetings,
>
> Uwe
>
> -----Original Message-----
> From: Jacques Nadeau [mailto:[email protected]]
> Sent: Monday, 16 November 2015 18:41
> To: user
> Subject: Re: XML in Apache Drill
>
> This is cool. I haven't taken a look yet but I will.
>
> Thanks!
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Thu, Nov 12, 2015 at 2:35 AM, Magnus Pierre <[email protected]> wrote:
>
>> Hello Drill Users,
>> A few weeks ago I had the pleasure of writing a small SAX parser for
>> XML that I was using in Storm to convert XML to JSON.
>> Later I decided this would be great to put into Drill, mostly as a
>> workaround for the fact that Drill does not have native support for XML yet.
>> With some great help from Jacques, basically pointing out how to
>> integrate it, I integrated it in just a few minutes last week.
>>
>> If you are interested in testing it and maybe even contributing by
>> fixing some of the issues, please let me know. The code is here:
>> https://github.com/magpierre/drill/tree/DRILL-3878
>> Please note that I am not an engineer, only an SE, so please refrain
>> from laughing when reading the code, ok? :)
>>
>> The biggest issue I have uncovered so far is that I am building a more
>> complex JSON than necessary, making Drill not like the JSON provided.
>> Please also note that I had to make a minor change in
>> JSONRecordReader in order for this to work (had to change some
>> variables from private to protected), and therefore this code will
>> not work on a standard Apache Drill build.
>>
>> In order to use it, you put the jar created when building the project
>> into the jars folder in the newly built Drill environment, and add this to
>> dfs:
>>
>> ,
>> "xml": {
>>   "type": "xml",
>>   "keepPrefix": false
>> }
>>
>> The keepPrefix setting lets you strip all the namespace prefixes
>> from tags if you want.
>>
>> Then add a workspace such as:
>>
>> ,
>> "xmls": {
>>   "location": "/directory/XML",
>>   "writable": false,
>>   "defaultInputFormat": "xml"
>> }
>>
>>
>> Regards,
>> Magnus
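
The two fragments quoted above start with a comma because they are meant to be appended inside existing sections of the dfs storage plugin configuration. As a rough sketch of where they would end up, assuming the usual dfs layout with a "formats" section and a "workspaces" section, the following Python builds the merged configuration. The helper name add_xml_support and the minimal starting config are hypothetical, and the "xml" format type only exists in a build made from the DRILL-3878 branch.

```python
import copy
import json

# Format entry from the message; only valid on the DRILL-3878 build.
XML_FORMAT = {"type": "xml", "keepPrefix": False}

# Workspace entry from the message; "/directory/XML" is a placeholder path.
XMLS_WORKSPACE = {
    "location": "/directory/XML",
    "writable": False,
    "defaultInputFormat": "xml",
}


def add_xml_support(dfs_config: dict) -> dict:
    """Return a copy of a dfs plugin config with the xml format and the
    xmls workspace added. Hypothetical helper, for illustration only."""
    updated = copy.deepcopy(dfs_config)
    updated.setdefault("formats", {})["xml"] = dict(XML_FORMAT)
    updated.setdefault("workspaces", {})["xmls"] = dict(XMLS_WORKSPACE)
    return updated


if __name__ == "__main__":
    # Assumed minimal dfs config; a real one has more workspaces and formats.
    dfs = {"type": "file", "connection": "file:///",
           "workspaces": {}, "formats": {}}
    print(json.dumps(add_xml_support(dfs), indent=2))
```

The printed JSON shows "xml" under "formats" and "xmls" under "workspaces", which is where the quoted fragments would be pasted by hand in the storage plugin editor.
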
