Hello Drill Users,
A few weeks ago I had the pleasure of writing a small SAX parser for XML that I 
was using in Storm to convert XML to JSON.
Later I decided this would be great to put into Drill, mostly as a workaround 
for the fact that Drill does not have native support for XML yet. With some 
great help from Jacques, who pointed out how to integrate it, I 
got it integrated in just a few minutes last week.

If you are interested in testing it, and maybe even in contributing fixes for 
some of the issues, please let me know. The code is here:
https://github.com/magpierre/drill/tree/DRILL-3878 
Please note that I am not an engineer, only an SE, so please refrain from 
laughing when reading the code, OK? :)

The biggest issue I have uncovered so far is that I am building a more complex 
JSON structure than necessary, and Drill does not like the JSON it is given. 
Please also note that I had to make a minor change in JSONRecordReader for 
this to work (changing some variables from private to protected), so this 
code will not work on a standard Apache Drill build.

In order to use it, put the jar created when building the project into the 
jars folder of the newly built Drill environment, and add this to the dfs 
storage plugin configuration:

,
    "xml": {
      "type": "xml",
      "keepPrefix": false
    }

The keepPrefix option controls whether namespace prefixes are kept on tag 
names; set it to false to filter them away if you want.
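
To illustrate what filtering the prefix means, here is a minimal sketch in 
Python. It is not the code from the branch above (that is Java inside Drill); 
the names XmlToDictHandler, xml_to_json and keep_prefix are made up for this 
example, and it assumes attributes are ignored and that an element has either 
child elements or text, not both.

```python
import json
import xml.sax


class XmlToDictHandler(xml.sax.ContentHandler):
    """Builds a nested dict from SAX events (hypothetical helper, not the Drill plugin)."""

    def __init__(self, keep_prefix=False):
        super().__init__()
        self.keep_prefix = keep_prefix
        self.root = {}
        # Stack of (child dict, text fragments); the bottom entry holds the root element.
        self.stack = [(self.root, [])]

    def _name(self, qname):
        # "ns:tag" becomes "tag" unless prefixes are kept.
        if self.keep_prefix:
            return qname
        return qname.split(":", 1)[-1]

    def startElement(self, name, attrs):
        # Attributes (including xmlns declarations) are ignored in this sketch.
        self.stack.append(({}, []))

    def characters(self, content):
        self.stack[-1][1].append(content)

    def endElement(self, name):
        children, text_parts = self.stack.pop()
        text = "".join(text_parts).strip()
        # Elements with children become objects, leaf elements become strings.
        value = children if children else text
        parent = self.stack[-1][0]
        key = self._name(name)
        if key in parent:
            # Repeated tags under the same parent are collected into a list.
            if not isinstance(parent[key], list):
                parent[key] = [parent[key]]
            parent[key].append(value)
        else:
            parent[key] = value


def xml_to_json(xml_text, keep_prefix=False):
    handler = XmlToDictHandler(keep_prefix)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return json.dumps(handler.root)


if __name__ == "__main__":
    sample = (
        '<ns:order xmlns:ns="http://example.com/ns">'
        "<ns:id>7</ns:id><ns:item>a</ns:item><ns:item>b</ns:item>"
        "</ns:order>"
    )
    print(xml_to_json(sample, keep_prefix=False))
    print(xml_to_json(sample, keep_prefix=True))
```

With keep_prefix=False the keys come out as order, id and item; with 
keep_prefix=True they stay as ns:order, ns:id and ns:item.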

Then add a workspace such as:

,
    "xmls": {
      "location": "/directory/XML",
      "writable": false,
      "defaultInputFormat": "xml"
    }


Regards,
Magnus
