Very cool. Thanks a lot.

Recently I received a request to process some XML files and make the 
results available in a data mart. I think I will give it a try and see if I can 
process them adequately.

Greetings,

Uwe

-----Original Message-----
From: Jacques Nadeau [mailto:[email protected]] 
Sent: Monday, 16 November 2015 18:41
To: user
Subject: Re: XML in Apache Drill

This is cool. I haven't taken a look yet but I will.

Thanks!

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Thu, Nov 12, 2015 at 2:35 AM, Magnus Pierre <[email protected]> wrote:

> Hello Drill Users,
> A few weeks ago I had the pleasure of writing a small SAX parser for 
> XML that I was using in Storm to convert XML to JSON.
> Later I decided this would be great to put into Drill, mostly as a 
> workaround for the fact that Drill does not have native support for XML yet.
> With some great help from Jacques, who basically pointed out how to 
> integrate it, I integrated it in just a few minutes last week.
>
> If you are interested in testing it and maybe even in helping to fix 
> some of the issues, please let me know. The code is here:
> https://github.com/magpierre/drill/tree/DRILL-3878
> Please observe that I am not an engineer, only an SE so please refrain 
> from laughing when reading the code ok? :)
>
> The biggest issue I have uncovered so far is that I am building a more 
> complex JSON than necessary, and Drill does not like the JSON provided.
> Please also note that I had to make a minor change in 
> JSONRecordReader for this to work (I had to change some variables 
> from private to protected), and therefore this code will not work on 
> a standard Apache Drill build.
>
> In order to use it you put the jar created when building the project 
> into the jars folder in the newly built drill environment, and add this to 
> dfs:
>
> ,
>     "xml": {
>       "type": "xml",
>       "keepPrefix": false
>     }
>
> The keepPrefix option lets you filter away the namespace prefixes 
> on tags if you want: set it to false to strip them.
>
> Then add a workspace such as:
>
> ,
>     "xmls": {
>       "location": "/directory/XML",
>       "writable": false,
>       "defaultInputFormat": "xml"
>     }
>
>
> Regards,
> Magnus
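
For readers who want a feel for the SAX-to-JSON approach and the keepPrefix option described above, here is a minimal Python sketch. It is not the code from the linked branch; the class XmlToJsonHandler, the function xml_to_json, the "@" and "#text" key conventions, and the choice to drop xmlns declarations when prefixes are stripped are all assumptions made for illustration.

```python
import json
import xml.sax


class XmlToJsonHandler(xml.sax.ContentHandler):
    """Builds a nested dict from SAX events (hypothetical helper, not the Drill plugin)."""

    def __init__(self, keep_prefix=False):
        super().__init__()
        self.keep_prefix = keep_prefix
        self.root = {}
        self.stack = [self.root]  # dicts for the currently open elements
        self.text = []            # text chunks seen since the last tag

    def _name(self, qname):
        # Assumed keepPrefix semantics: with keep_prefix=False, "ns:tag" becomes "tag".
        return qname if self.keep_prefix else qname.split(":")[-1]

    def startElement(self, name, attrs):
        node = {}
        for key in attrs.getNames():
            is_ns_decl = key == "xmlns" or key.startswith("xmlns:")
            if is_ns_decl and not self.keep_prefix:
                continue  # drop namespace declarations along with the prefixes
            node["@" + self._name(key)] = attrs.getValue(key)
        self.stack.append(node)
        self.text = []

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        node = self.stack.pop()
        text = "".join(self.text).strip()
        self.text = []
        if text:
            if node:
                node["#text"] = text  # element has attributes or children too
            else:
                node = text           # plain leaf element becomes a string
        parent = self.stack[-1]
        key = self._name(name)
        if key in parent:
            # Repeated sibling tags are collected into a list.
            if not isinstance(parent[key], list):
                parent[key] = [parent[key]]
            parent[key].append(node)
        else:
            parent[key] = node


def xml_to_json(xml_text, keep_prefix=False):
    handler = XmlToJsonHandler(keep_prefix)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return json.dumps(handler.root)


sample = '<ns:order xmlns:ns="urn:example"><ns:item>a</ns:item><ns:item>b</ns:item></ns:order>'
print(xml_to_json(sample, keep_prefix=False))
print(xml_to_json(sample, keep_prefix=True))
```

Keeping the output as flat as possible (plain strings for leaf elements, lists only for repeated tags) is one way to avoid the "more complex JSON than necessary" problem mentioned in the message, though whether Drill accepts a given shape still has to be tested.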

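
The two JSON fragments in the quoted message start with a comma, which suggests they are appended to existing sections of the dfs storage plugin configuration. The sketch below assumes the usual layout of a Drill file-system plugin, with the format entry under "formats" and the workspace entry under "workspaces"; that placement is an assumption, and the function add_xml_support and the sample base configuration are hypothetical.

```python
import json


def add_xml_support(dfs_config, location, keep_prefix=False):
    """Return a copy of a dfs plugin config with the xml format and xmls workspace added."""
    updated = json.loads(json.dumps(dfs_config))  # cheap deep copy of JSON-like data
    # Format entry from the message, assumed to belong under "formats".
    updated.setdefault("formats", {})["xml"] = {
        "type": "xml",
        "keepPrefix": keep_prefix,
    }
    # Workspace entry from the message, assumed to belong under "workspaces".
    updated.setdefault("workspaces", {})["xmls"] = {
        "location": location,
        "writable": False,
        "defaultInputFormat": "xml",
    }
    return updated


# Hypothetical starting configuration; a real dfs plugin has more entries.
base = {
    "type": "file",
    "connection": "file:///",
    "workspaces": {"root": {"location": "/", "writable": False, "defaultInputFormat": None}},
    "formats": {"json": {"type": "json"}},
}

print(json.dumps(add_xml_support(base, "/directory/XML"), indent=2))
```

The printed JSON can be compared against the plugin configuration shown in the Drill web UI before pasting anything in.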