This is cool. I haven't taken a look yet but I will.

Thanks!

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Thu, Nov 12, 2015 at 2:35 AM, Magnus Pierre <[email protected]> wrote:

> Hello Drill Users,
> A few weeks ago I had the pleasure of writing a small SAX parser for XML
> that I was using in Storm to convert XML to JSON.
> Later I decided this would be great to put into Drill, mostly as a
> workaround for the fact that Drill does not have native support for XML yet.
> With some great help from Jacques, who basically pointed out how to
> integrate it, I integrated it in just a few minutes last week.
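
For readers who want a feel for the SAX-to-JSON approach described above, here is a minimal sketch using only the JDK's SAX API. It is not the code from the DRILL-3878 branch; the class name XmlToJsonSketch and the methods convert, toJson and quote are hypothetical. It assumes leaf elements map to strings, repeated siblings map to arrays, and attributes and mixed content are ignored.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not the reader from the DRILL-3878 branch.
public class XmlToJsonSketch extends DefaultHandler {
    // One map per open element; the bottom entry is a synthetic holder for the root.
    private final Deque<Map<String, Object>> stack = new ArrayDeque<>();
    // One text buffer per open element.
    private final Deque<StringBuilder> text = new ArrayDeque<>();

    public XmlToJsonSketch() {
        stack.push(new LinkedHashMap<>());
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Attributes are ignored in this sketch.
        stack.push(new LinkedHashMap<>());
        text.push(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!text.isEmpty()) {
            text.peek().append(ch, start, length);
        }
    }

    @Override
    @SuppressWarnings("unchecked")
    public void endElement(String uri, String localName, String qName) {
        Map<String, Object> children = stack.pop();
        String content = text.pop().toString().trim();
        // Leaf elements become strings; elements with children become objects
        // (any text mixed in between the children is dropped).
        Object value = children.isEmpty() ? content : children;
        Map<String, Object> parent = stack.peek();
        Object existing = parent.get(qName);
        if (existing == null) {
            parent.put(qName, value);
        } else if (existing instanceof List) {
            ((List<Object>) existing).add(value);
        } else {
            // Second occurrence of the same tag: promote the entry to an array.
            List<Object> list = new ArrayList<>();
            list.add(existing);
            list.add(value);
            parent.put(qName, list);
        }
    }

    static String toJson(Object v) {
        if (v instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                if (sb.length() > 1) sb.append(',');
                sb.append(quote(e.getKey().toString())).append(':').append(toJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        if (v instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            for (Object o : (List<?>) v) {
                if (sb.length() > 1) sb.append(',');
                sb.append(toJson(o));
            }
            return sb.append(']').toString();
        }
        return quote(v.toString());
    }

    // Escapes only backslashes and double quotes; a real reader should use a JSON library.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static String convert(String xml) throws Exception {
        XmlToJsonSketch handler = new XmlToJsonSketch();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return toJson(handler.stack.peek());
    }
}
```

Because SAX delivers events as a stream, the handler only keeps the currently open elements on its stacks while parsing, although this sketch still builds the whole result in memory before serializing it.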
>
> If you are interested in testing it and maybe even contributing fixes for
> some of the issues, please let me know. The code is here:
> https://github.com/magpierre/drill/tree/DRILL-3878
> Please note that I am not an engineer, only an SE, so please refrain
> from laughing when reading the code, OK? :)
>
> The biggest issue I have uncovered so far is that I am building a more
> complex JSON than necessary, so Drill does not like the JSON provided.
> Please also note that I had to make a minor change in JSONRecordReader
> for this to work (changing some variables from private to protected), and
> therefore this code will not work on a standard Apache Drill build.
>
> To use it, put the jar created when building the project into the jars
> folder of the newly built Drill environment, and add this to the formats
> section of the dfs storage plugin:
>
> ,
>     "xml": {
>       "type": "xml",
>       "keepPrefix": false
>     }
>
> The keepPrefix option lets you filter out the namespace prefixes on tags
> if you want.
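
As a rough illustration of what such an option could do, the sketch below strips the namespace prefix from a qualified tag name when keepPrefix is false. Only the option name comes from the config above; the class TagNames and the method tagName are hypothetical, and the actual reader in the branch may behave differently.

```java
// Hypothetical helper; only the keepPrefix option name comes from the post.
public class TagNames {
    // Returns the tag name as is when keepPrefix is true; otherwise drops
    // everything up to and including the first colon ("soap:Body" -> "Body").
    static String tagName(String qName, boolean keepPrefix) {
        if (keepPrefix) {
            return qName;
        }
        int colon = qName.indexOf(':');
        return colon < 0 ? qName : qName.substring(colon + 1);
    }
}
```

Under this assumption, a tag such as soap:Body would surface as the field Body with keepPrefix set to false, and as soap:Body with it set to true.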
>
> Then add a workspace such as:
>
> ,
>     "xmls": {
>       "location": "/directory/XML",
>       "writable": false,
>       "defaultInputFormat": "xml"
>     }
>
>
> Regards,
> Magnus
