The XPath stuff works reasonably well for simple XML files.
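
For reference, the preparation step that the XPath approach depends on (one record per line, Step 1 in the quoted message below) can be sketched as a small script. This is only a sketch under some assumptions: GNU sed (for "\n" in the replacement text), no nested <employee> elements, and a hypothetical helper name flatten_records that does not come from the thread. Unlike the quoted command it does not delete '&' characters, since that would corrupt entities such as &amp;.

```shell
#!/bin/sh
# Sketch: flatten a multi-line XML file into one <employee> record per line,
# the layout a single-string-column staging table expects.
# Assumptions: GNU sed, and <employee> elements are not nested.

flatten_records() {
    # $1: input XML file; writes one record per line to stdout
    tr '\r\n' '  ' < "$1" \
        | sed 's|</employee>|</employee>\n|g' \
        | sed 's/^ *//; s/ *$//' \
        | grep -v '^$'
}

# Demo on the sample data from the thread, in a throwaway directory
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/employees.xml" <<'EOF'
<employee>
<id>1</id>
<name>Satish Kumar</name>
<designation>Technical Lead</designation>
</employee>
<employee>
<id>2</id>
<name>Ramya</name>
<designation>Testing</designation>
</employee>
EOF

flatten_records "$workdir/employees.xml" > "$workdir/employees_records.xml"
cat "$workdir/employees_records.xml"
```

Each employee ends up on its own line, so every row of the staging table holds one complete XML fragment that the xpath_* functions can parse.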

However, for complex XML files that change frequently and need to be
ingested in real time, you might look at a third-party solution, e.g. here:

On Mon, Jun 11, 2018 at 3:05 PM, kristijan berta <> wrote:

> Thanks Jorn. Is the only alternative to use the xpath UDFs? That works, as
> shown in the example below, but it is tedious.
> *$cat employees.xml*
> <employee>
> <id>1</id>
> <name>Satish Kumar</name>
> <designation>Technical Lead</designation>
> </employee>
> <employee>
> <id>2</id>
> <name>Ramya</name>
> <designation>Testing</designation>
> </employee>
> *Step:1 Bring each record onto one line by executing the command below*
> $cat employees.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' ' | sed 's|</employee>|</employee>\n|g' | grep -v '^\s*$' > employees_records.xml
> *$cat employees_records.xml*
> <employee> <id>1</id> <name>Satish Kumar</name> <designation>Technical Lead</designation> </employee>
> <employee> <id>2</id> <name>Ramya</name> <designation>Testing</designation> </employee>
> *Step:2 Load the file to HDFS*
> *$hadoop fs -mkdir /user/hive/sample-xml-inputs*
> *$hadoop fs -put employees_records.xml /user/hive/sample-xml-inputs*
> *$hadoop fs -cat /user/hive/sample-xml-inputs/employees_records.xml*
> <employee> <id>1</id> <name>Satish Kumar</name> <designation>Technical Lead</designation> </employee>
> <employee> <id>2</id> <name>Ramya</name> <designation>Testing</designation> </employee>
> *Step:3 Create a Hive table and point to xml file*
> *hive>create external table xml_table_org( xmldata string) LOCATION
> '/user/hive/sample-xml-inputs/';*
> *hive> select * from xml_table_org;*
> *OK*
> <employee> <id>1</id> <name>Satish Kumar</name> <designation>Technical Lead</designation> </employee>
> <employee> <id>2</id> <name>Ramya</name> <designation>Testing</designation> </employee>
> *Step:4 From the staging table we can query the elements and load them into
> another table.*
> *hive> CREATE TABLE xml_table AS SELECT
> xpath_int(xmldata,'employee/id') AS id,
> xpath_string(xmldata,'employee/name') AS name,
> xpath_string(xmldata,'employee/designation') AS designation
> FROM xml_table_org;*
> Dr Mich Talebzadeh
> LinkedIn * 
> <>*
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
> On 9 June 2018 at 07:42, Jörn Franke <> wrote:
>> Yes.
>> The SerDe interface must then have been removed in 2.x.
>> On 8. Jun 2018, at 23:52, Mich Talebzadeh <>
>> wrote:
>> Ok I am looking at this jar file
>>  jar tf hive-serde-3.0.0.jar|grep -i abstractserde
>> org/apache/hadoop/hive/serde2/AbstractSerDe.class
>> Is this the correct one?
>> Thanks
>> Dr Mich Talebzadeh
>> On 8 June 2018 at 22:34, Mich Talebzadeh <>
>> wrote:
>>> Thanks Jorn. So what is the resolution? Do I need another jar file?
>>> Dr Mich Talebzadeh
>>> On 8 June 2018 at 21:56, Jörn Franke <> wrote:
>>>> Oh, I see now: SerDe is a deprecated interface. If I am not wrong, it
>>>> has been replaced by the abstract class AbstractSerDe.
>>>> On 8. Jun 2018, at 22:22, Mich Talebzadeh <>
>>>> wrote:
>>>> Thanks Jorn.
>>>> Spark 2.3.3 (labelled as stable)
>>>> First I put the jar file hivexmlserde- under $HIVE_HOME/lib
>>>> and also loaded it explicitly with ADD JAR in the Hive session
>>>> hive> ADD JAR hdfs://rhes75:9000/jars/hivexmlserde-;
>>>> Added 
>>>> [/tmp/hive/7feb5165-780b-4ab6-aca8-f516d0388823_resources/hivexmlserde-]
>>>> to class path
>>>> Added resources: [hdfs://rhes75:9000/jars/hivexmlserde-]
>>>> Then I ran the simple example given here
>>>> <>
>>>> hive> CREATE  TABLE xml_41 (imap map<string,string>)
>>>>     > ROW FORMAT SERDE ''
>>>>     > WITH SERDEPROPERTIES (
>>>>     > "column.xpath.imap"="/file-format/data-set/element",
>>>>     > ""="@name->#content"
>>>>     > )
>>>>     >
>>>>     > OUTPUTFORMAT ''
>>>>     > TBLPROPERTIES (
>>>>     > "xmlinput.start"="<file-format>",
>>>>     > "xmlinput.end"="</file-format>"
>>>>     > );
>>>> FAILED: Execution Error, return code 1 from
>>>> org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/hadoop/hive/serde2/SerDe
>>>> And this is the full error:
>>>> 2018-06-08T21:17:20,775  INFO [7feb5165-780b-4ab6-aca8-f516d0388823 main] ql.Driver: Starting task [Stage-0:DDL] in serial mode
>>>> 2018-06-08T21:17:20,776 ERROR [7feb5165-780b-4ab6-aca8-f516d0388823 main] exec.DDLTask: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/SerDe
>>>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>>>         at java.lang.ClassLoader.defineClass(
>>>>         at
>>>>         at
>>>>         at$100(
>>>>         at$
>>>>         at$
>>>>         at Method)
>>>>         at
>>>>         at java.lang.ClassLoader.loadClass(
>>>>         at sun.misc.Launcher$AppClassLoader.loadClass(
>>>>         at java.lang.ClassLoader.loadClass(
>>>>         at java.lang.ClassLoader.loadClass(
>>>>         at java.lang.Class.forName0(Native Method)
>>>>         at java.lang.Class.forName(
>>>>         at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(
>>>>         at org.apache.hadoop.conf.Configuration.getClassByName(
>>>>         at org.apache.hadoop.hive.ql.exec.DDLTask.validateSerDe(
>>>>         at org.apache.hadoop.hive.ql.plan.CreateTableDesc.toTable(
>>>>         at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(
>>>>         at org.apache.hadoop.hive.ql.exec.DDLTask.execute(
>>>>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(
>>>>         at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(
>>>>         at org.apache.hadoop.hive.ql.Driver.launchTask(
>>>>         at org.apache.hadoop.hive.ql.Driver.execute(
>>>>         at org.apache.hadoop.hive.ql.Driver.runInternal(
>>>>         at
>>>>         at
>>>>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(
>>>>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(
>>>>         at org.apache.hadoop.hive.cli.CliDriver.processLine(
>>>>         at org.apache.hadoop.hive.cli.CliDriver.executeDriver(
>>>>         at
>>>>         at org.apache.hadoop.hive.cli.CliDriver.main(
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(
>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>>>>         at java.lang.reflect.Method.invoke(
>>>>         at
>>>>         at org.apache.hadoop.util.RunJar.main(
>>>> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.serde2.SerDe
>>>>         at
>>>>         at java.lang.ClassLoader.loadClass(
>>>>         at sun.misc.Launcher$AppClassLoader.loadClass(
>>>>         at java.lang.ClassLoader.loadClass(
>>>>         ... 40 more
>>>> The jar file has the classes!
>>>> jar tf hivexmlserde-
>>>> META-INF/
>>>> com/
>>>> com/ibm/
>>>> com/ibm/spss/
>>>> com/ibm/spss/hive/
>>>> com/ibm/spss/hive/serde2/
>>>> com/ibm/spss/hive/serde2/xml/
>>>> com/ibm/spss/hive/serde2/xml/objectinspector/
>>>> com/ibm/spss/hive/serde2/xml/processor/
>>>> com/ibm/spss/hive/serde2/xml/processor/java/
>>>> com/ibm/spss/hive/serde2/xml/HiveXmlRecordReader.class
>>>> com/ibm/spss/hive/serde2/xml/objectinspector/XmlListObjectInspector.class
>>>> com/ibm/spss/hive/serde2/xml/objectinspector/XmlMapObjectInspector.class
>>>> com/ibm/spss/hive/serde2/xml/objectinspector/XmlObjectInspectorFactory$1.class
>>>> com/ibm/spss/hive/serde2/xml/objectinspector/XmlObjectInspectorFactory.class
>>>> com/ibm/spss/hive/serde2/xml/objectinspector/XmlStructObjectInspector$1.class
>>>> com/ibm/spss/hive/serde2/xml/objectinspector/XmlStructObjectInspector.class
>>>> com/ibm/spss/hive/serde2/xml/processor/AbstractXmlProcessor$1.class
>>>> com/ibm/spss/hive/serde2/xml/processor/AbstractXmlProcessor$2.class
>>>> com/ibm/spss/hive/serde2/xml/processor/AbstractXmlProcessor.class
>>>> com/ibm/spss/hive/serde2/xml/processor/java/JavaXmlProcessor$1.class
>>>> com/ibm/spss/hive/serde2/xml/processor/java/JavaXmlProcessor$2.class
>>>> com/ibm/spss/hive/serde2/xml/processor/java/JavaXmlProcessor.class
>>>> com/ibm/spss/hive/serde2/xml/processor/java/JavaXmlQuery.class
>>>> com/ibm/spss/hive/serde2/xml/processor/java/NodeArray.class
>>>> com/ibm/spss/hive/serde2/xml/processor/SerDeArray.class
>>>> com/ibm/spss/hive/serde2/xml/processor/XmlMapEntry.class
>>>> com/ibm/spss/hive/serde2/xml/processor/XmlMapFacet$Type.class
>>>> com/ibm/spss/hive/serde2/xml/processor/XmlMapFacet.class
>>>> com/ibm/spss/hive/serde2/xml/processor/XmlNode$1.class
>>>> com/ibm/spss/hive/serde2/xml/processor/XmlNode$2.class
>>>> com/ibm/spss/hive/serde2/xml/processor/XmlNode.class
>>>> com/ibm/spss/hive/serde2/xml/processor/XmlNodeArray.class
>>>> com/ibm/spss/hive/serde2/xml/processor/XmlProcessor.class
>>>> com/ibm/spss/hive/serde2/xml/processor/XmlProcessorContext.class
>>>> com/ibm/spss/hive/serde2/xml/processor/XmlQuery.class
>>>> com/ibm/spss/hive/serde2/xml/processor/XmlTransformer.class
>>>> com/ibm/spss/hive/serde2/xml/processor/XmlUtils$1.class
>>>> com/ibm/spss/hive/serde2/xml/processor/XmlUtils.class
>>>> com/ibm/spss/hive/serde2/xml/SplittableXmlInputFormat.class
>>>> com/ibm/spss/hive/serde2/xml/XmlInputFormat$XmlRecordReader.class
>>>> com/ibm/spss/hive/serde2/xml/XmlInputFormat.class
>>>> com/ibm/spss/hive/serde2/xml/XmlSerDe$1.class
>>>> com/ibm/spss/hive/serde2/xml/XmlSerDe.class
>>>> META-INF/maven/
>>>> META-INF/maven/
>>>> META-INF/maven/
>>>> META-INF/maven/
>>>> META-INF/maven/
>>>> Dr Mich Talebzadeh
>>>> On 8 June 2018 at 17:58, Jörn Franke <> wrote:
>>>>> Can you get the log files and start Hive with more detailed logging?
>>>>> It could be that not all libraries are loaded (I don't remember
>>>>> exactly, but I think this one needs more; I can look in my docs next week)
>>>>> or that it does not support maps (not sure).
>>>>> You can first try a simpler extraction with a String field
>>>>> to see if it works.
>>>>> Hive has always relied on external libraries for XML support. I used the
>>>>> one below with Hive 1.x, but it should also work with 2.x (not sure about 3,
>>>>> but it should if it works in 2.x).
>>>>> On 8. Jun 2018, at 17:53, Mich Talebzadeh <>
>>>>> wrote:
>>>>> I tried Hive 2.0.1, 2.3.2 and now Hive 3.
>>>>> I explicitly added the hivexmlserde jar file with ADD JAR, as shown below
>>>>> 0: jdbc:hive2://rhes75:10099/default> ADD JAR
>>>>> hdfs://rhes75:9000/jars/hivexmlserde-;
>>>>> No rows affected (0.002 seconds)
>>>>> But still cannot create an xml table
>>>>> 0: jdbc:hive2://rhes75:10099/default> CREATE  TABLE xml_41 (imap
>>>>> map<string,string>) ROW FORMAT SERDE ''
>>>>> WITH SERDEPROPERTIES ("column.xpath.imap"="/file-format/data-set/element",""="@name->#content")
>>>>> TBLPROPERTIES ("xmlinput.start"="<file-format>","xmlinput.end"="</file-format>");
>>>>> Error: Error while processing statement: FAILED: Execution Error,
>>>>> return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
>>>>> org/apache/hadoop/hive/serde2/SerDe (state=08S01,code=1)
>>>>> Does anyone know the cause of this or which version of Hive supports
>>>>> creating an XML table?
>>>>> Thanks
>>>>> Dr Mich Talebzadeh
