many thanks. but I cannot see any specific product name there?

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 11 June 2018 at 14:10, kristijan berta <kik...@gmail.com> wrote:

> The XPath stuff works reasonably well for simple XML files.
>
> However for complex XML files that change frequently and need to be
> ingested in realtime you might look at a 3rd party solution, e.g. here:
> https://dataworkssummit.com/san-jose-2018/session/add-a-spark-to-your-etl/
>
> On Mon, Jun 11, 2018 at 3:05 PM, kristijan berta <kik...@gmail.com> wrote:
>
>> thanks Jorn. The only alternative is to use xpath UDF? Works as shown
>> below but tedious
>>
>> Like the example below
>>
>> *$cat employees.xml*
>> <employee>
>> <id>1</id>
>> <name>Satish Kumar</name>
>> <designation>Technical Lead</designation>
>> </employee>
>> <employee>
>> <id>2</id>
>> <name>Ramya</name>
>> <designation>Testing</designation>
>> </employee>
>>
>> *Step:1 Bring each record to one line, by executing below command*
>>
>> $cat employees.xml | tr -d '&' | tr '\n' ' ' | tr '\r' ' ' | sed
>> 's|</employee>|</employee>\n|g' | grep -v '^\s*$' > employees_records.xml
>>
>> *$cat employees_records.xml*
>> <employee> <id>1</id> <name>Satish Kumar</name> <designation>Technical
>> Lead</designation> </employee>
>> <employee> <id>2</id> <name>Ramya</name> <designation>Testing</designation>
>> </employee>
>>
>> *tep:2 Load the file to HDFS*
>>
>> *$hadoop fs -mkdir /user/hive/sample-xml-inputs*
>>
>> *$hadoop fs -put employees_records.xml /user/hive/sample-xml-inputs*
>>
>> *$hadoop fs -cat /user/hive/sample-xml-inputs/employees_records.xml*
>> <employee> <id>1</id> <name>Satish Kumar</name><designation>Technical
>> Lead</designation> </employee>
>> <employee> <id>2</id> <name>Ramya</name> <designation>Testing</designation>
>> </employee>
>>
>> *Step:3 Create a Hive table and point to xml file*
>>
>> *hive>create external table xml_table_org( xmldata string) LOCATION
>> '/user/hive/sample-xml-inputs/';*
>>
>> *hive> select * from xml_table_org;*
>> *OK*
>> <employee> <id>1</id> <name>Satish Kumar</name> <designation>Technical
>> Lead</designation> </employee>
>> <employee> <id>2</id> <name>Ramya</name> <designation>Testing</designation>
>> </employee>
>>
>> *Step 4: From the stage table we can query the elements and load it to
>> other table.*
>>
>> *hive> CREATE TABLE xml_table AS SELECT
>> xpath_int(xmldata,'employee/id'),xpath_string(xmldata,'employee/name'),xpath_string(xmldata,'employee/designation')
>> FROM xml_table_org;*
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 9 June 2018 at 07:42, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>> Yes.
>>>
>>> Serde must have been removed then in 2.x.
>>>
>>>
>>>
>>> On 8. Jun 2018, at 23:52, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>> Ok I am looking at this jar file
>>>
>>>  jar tf hive-serde-3.0.0.jar|grep -i abstractserde
>>> org/apache/hadoop/hive/serde2/AbstractSerDe.class
>>>
>>> Is this the correct one?
>>>
>>> Thanks
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>> On 8 June 2018 at 22:34, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Jorn so what is the resolution? do I need another jar file?
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn * 
>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>>> any loss, damage or destruction of data or any other property which may
>>>> arise from relying on this email's technical content is explicitly
>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>> arising from such loss, damage or destruction.
>>>>
>>>>
>>>>
>>>> On 8 June 2018 at 21:56, Jörn Franke <jornfra...@gmail.com> wrote:
>>>>
>>>>> Oha i see now Serde is a deprecated Interface , if i am not wrong it
>>>>> has been replaced by the abstract class abstractserde
>>>>>
>>>>> On 8. Jun 2018, at 22:22, Mich Talebzadeh <mich.talebza...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Thanks Jorn.
>>>>>
>>>>> Spark 2.3.3 (labelled as stable)
>>>>>
>>>>> First I put the jar file hivexmlserde-1.0.5.3.jar under $HIVE_HOME/lib
>>>>> and explicitly loaded with ADD JAR as well in hive session
>>>>>
>>>>> hive> ADD JAR hdfs://rhes75:9000/jars/hivexmlserde-1.0.5.3.jar;
>>>>> Added 
>>>>> [/tmp/hive/7feb5165-780b-4ab6-aca8-f516d0388823_resources/hivexmlserde-1.0.5.3.jar]
>>>>> to class path
>>>>> Added resources: [hdfs://rhes75:9000/jars/hivexmlserde-1.0.5.3.jar]
>>>>>
>>>>> Then I ran a simple code given here
>>>>> <https://github.com/dvasilen/Hive-XML-SerDe/issues/41>
>>>>>
>>>>> hive> CREATE  TABLE xml_41 (imap map<string,string>)     > ROW FORMAT
>>>>> SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'     > WITH
>>>>> SERDEPROPERTIES (     > 
>>>>> "column.xpath.imap"="/file-format/data-set/element",
>>>>>     > "xml.map.specification.element"="@name->#content"     > )     >
>>>>> STORED AS     > INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
>>>>>     > OUTPUTFORMAT 
>>>>> 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
>>>>>     > TBLPROPERTIES (     > "xmlinput.start"="<file-format>",     >
>>>>> "xmlinput.end"="</file-format>"     > ); FAILED: Execution Error,
>>>>> return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
>>>>> org/apache/hadoop/hive/serde2/SerDe And this is full error
>>>>> 2018-06-08T21:17:20,775  INFO [7feb5165-780b-4ab6-aca8-f516d0388823
>>>>> main] ql.Driver: Starting task [Stage-0:DDL] in serial mode
>>>>> 2018-06-08T21:17:20,776 ERROR [7feb5165-780b-4ab6-aca8-f516d0388823
>>>>> main] exec.DDLTask: java.lang.NoClassDefFoundError:
>>>>> org/apache/hadoop/hive/serde2/SerDe         at
>>>>> java.lang.ClassLoader.defineClass1(Native Method)         at
>>>>> java.lang.ClassLoader.defineClass(ClassLoader.java:763)         at
>>>>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>>>>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>>>>>         at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
>>>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
>>>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
>>>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>>         at sun.misc.Launcher$AppClassLoad
>>>>> er.loadClass(Launcher.java:331)         at
>>>>> java.lang.ClassLoader.loadClass(ClassLoader.java:411)         at
>>>>> java.lang.ClassLoader.loadClass(ClassLoader.java:357)         at
>>>>> java.lang.Class.forName0(Native Method)         at
>>>>> java.lang.Class.forName(Class.java:348)         at
>>>>> org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2134)
>>>>>         at org.apache.hadoop.conf.Configu
>>>>> ration.getClassByName(Configuration.java:2099)         at
>>>>> org.apache.hadoop.hive.ql.exec.DDLTask.validateSerDe(DDLTask.java:4213)
>>>>>         at org.apache.hadoop.hive.ql.plan
>>>>> .CreateTableDesc.toTable(CreateTableDesc.java:723)         at
>>>>> org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4321)
>>>>>         at org.apache.hadoop.hive.ql.exec
>>>>> .DDLTask.execute(DDLTask.java:354)         at
>>>>> org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
>>>>>         at org.apache.hadoop.hive.ql.exec
>>>>> .TaskRunner.runSequential(TaskRunner.java:100)         at
>>>>> org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
>>>>> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
>>>>> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
>>>>>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>>>>>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
>>>>>         at org.apache.hadoop.hive.cli.Cli
>>>>> Driver.processLocalCmd(CliDriver.java:233)         at
>>>>> org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
>>>>>         at org.apache.hadoop.hive.cli.Cli
>>>>> Driver.processLine(CliDriver.java:403)         at
>>>>> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
>>>>>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
>>>>>         at org.apache.hadoop.hive.cli.Cli
>>>>> Driver.main(CliDriver.java:686)         at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>> at 
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>>         at sun.reflect.DelegatingMethodAc
>>>>> cessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)         at
>>>>> java.lang.reflect.Method.invoke(Method.java:498)         at
>>>>> org.apache.hadoop.util.RunJar.run(RunJar.java:221)         at
>>>>> org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by:
>>>>> java.lang.ClassNotFoundException: org.apache.hadoop.hive.serde2.SerDe
>>>>> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)         at
>>>>> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>>>>> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)         ...
>>>>> 40 more The jar file has the classes!
>>>>>
>>>>> jar tf hivexmlserde-1.0.5.3.jar
>>>>> META-INF/
>>>>> META-INF/MANIFEST.MF
>>>>> com/
>>>>> com/ibm/
>>>>> com/ibm/spss/
>>>>> com/ibm/spss/hive/
>>>>> com/ibm/spss/hive/serde2/
>>>>> com/ibm/spss/hive/serde2/xml/
>>>>> com/ibm/spss/hive/serde2/xml/objectinspector/
>>>>> com/ibm/spss/hive/serde2/xml/processor/
>>>>> com/ibm/spss/hive/serde2/xml/processor/java/
>>>>> com/ibm/spss/hive/serde2/xml/HiveXmlRecordReader.class
>>>>> com/ibm/spss/hive/serde2/xml/objectinspector/XmlListObjectIn
>>>>> spector.class
>>>>> com/ibm/spss/hive/serde2/xml/objectinspector/XmlMapObjectIns
>>>>> pector.class
>>>>> com/ibm/spss/hive/serde2/xml/objectinspector/XmlObjectInspec
>>>>> torFactory$1.class
>>>>> com/ibm/spss/hive/serde2/xml/objectinspector/XmlObjectInspec
>>>>> torFactory.class
>>>>> com/ibm/spss/hive/serde2/xml/objectinspector/XmlStructObject
>>>>> Inspector$1.class
>>>>> com/ibm/spss/hive/serde2/xml/objectinspector/XmlStructObject
>>>>> Inspector.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/AbstractXmlProcessor$1.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/AbstractXmlProcessor$2.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/AbstractXmlProcessor.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/java/JavaXmlProcessor$1.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/java/JavaXmlProcessor$2.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/java/JavaXmlProcessor.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/java/JavaXmlQuery.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/java/NodeArray.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/SerDeArray.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/XmlMapEntry.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/XmlMapFacet$Type.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/XmlMapFacet.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/XmlNode$1.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/XmlNode$2.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/XmlNode.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/XmlNodeArray.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/XmlProcessor.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/XmlProcessorContext.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/XmlQuery.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/XmlTransformer.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/XmlUtils$1.class
>>>>> com/ibm/spss/hive/serde2/xml/processor/XmlUtils.class
>>>>> com/ibm/spss/hive/serde2/xml/SplittableXmlInputFormat.class
>>>>> com/ibm/spss/hive/serde2/xml/XmlInputFormat$XmlRecordReader.class
>>>>> com/ibm/spss/hive/serde2/xml/XmlInputFormat.class
>>>>> com/ibm/spss/hive/serde2/xml/XmlSerDe$1.class
>>>>> com/ibm/spss/hive/serde2/xml/XmlSerDe.class
>>>>> META-INF/maven/
>>>>> META-INF/maven/com.ibm.spss.hive.serde2.xml/
>>>>> META-INF/maven/com.ibm.spss.hive.serde2.xml/hivexmlserde/
>>>>> META-INF/maven/com.ibm.spss.hive.serde2.xml/hivexmlserde/pom.xml
>>>>> META-INF/maven/com.ibm.spss.hive.serde2.xml/hivexmlserde/pom
>>>>> .properties
>>>>>
>>>>>
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> LinkedIn * 
>>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>
>>>>>
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>>
>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>>>> any loss, damage or destruction of data or any other property which may
>>>>> arise from relying on this email's technical content is explicitly
>>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>>> arising from such loss, damage or destruction.
>>>>>
>>>>>
>>>>>
>>>>> On 8 June 2018 at 17:58, Jörn Franke <jornfra...@gmail.com> wrote:
>>>>>
>>>>>> Can you get the log files and start Hive with more detailled logs?
>>>>>> In could be that not all libraries are loaded (i don’t remember
>>>>>> anymore but I think this one needs more , I can look next week in my 
>>>>>> docs)
>>>>>> or that it does not support maps (not sure).
>>>>>> You can try first with a more simpler extraction with a String field
>>>>>> to see if it works .
>>>>>>
>>>>>> Hive has always had external libraries for xml support and I used the
>>>>>> one below with Hive 1.x, but it should also work with 2.x (3 not sure, 
>>>>>> but
>>>>>> it should if it works in 2.x)
>>>>>>
>>>>>>
>>>>>> On 8. Jun 2018, at 17:53, Mich Talebzadeh <mich.talebza...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> I tried Hive 2.0.1, 2.3.2 and now Hive 3/
>>>>>>
>>>>>> I explicitly added hivexmlserde  jar file as ADD JAR shown below
>>>>>>
>>>>>> 0: jdbc:hive2://rhes75:10099/default> ADD JAR
>>>>>> hdfs://rhes75:9000/jars/hivexmlserde-1.0.5.3.jar;
>>>>>> No rows affected (0.002 seconds)
>>>>>>
>>>>>> But still cannot create an xml table
>>>>>>
>>>>>> 0: jdbc:hive2://rhes75:10099/default> CREATE  TABLE xml_41 (imap
>>>>>> map<string,string>) ROW FORMAT SERDE 
>>>>>> 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
>>>>>> WITH SERDEPROPERTIES ("column.xpath.imap"="/file-fo
>>>>>> rmat/data-set/element","xml.map.specification.element"="@name->#content")
>>>>>> STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
>>>>>> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
>>>>>> TBLPROPERTIES ("xmlinput.start"="<file-forma
>>>>>> t>","xmlinput.end"="</file-format>");
>>>>>>
>>>>>> Error: Error while processing statement: FAILED: Execution Error,
>>>>>> return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
>>>>>> org/apache/hadoop/hive/serde2/SerDe (state=08S01,code=1)
>>>>>>
>>>>>> Does anyone know the cause of this or which version of Hive supports
>>>>>> creating an XML table?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>> LinkedIn * 
>>>>>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>>>>> for any loss, damage or destruction of data or any other property which 
>>>>>> may
>>>>>> arise from relying on this email's technical content is explicitly
>>>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>>>> arising from such loss, damage or destruction.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>>
>

Reply via email to