I have a requirement to parse an XML file and generate columns based on
parameters that the user passes to the Pig script.
For example, consider the following XML:
<school>
<students>
<student>
<name>test</name>
<rno>1</rno>
<rank>3</rank>
</student>
<student>
<name>xyz</name>
<rno>3</rno>
<rank>2</rank>
</student>
</students>
</school>
The parser should read the XML and emit only the fields whose names the user
specifies.
For example, if the user specifies the field names as 'name|rno', the parser
should parse the XML and return a tuple containing name and rno.
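
To make the expected behaviour concrete, here is a minimal sketch of the
field-extraction step in plain Java, with no Pig classes involved. The class
name StudentFieldParser and the method parse are hypothetical names chosen for
illustration, and the sketch assumes each requested field is a direct child
element of a single <student> fragment.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Pig: extracts the requested child
// elements from one <student> fragment.
class StudentFieldParser {

    // fieldSpec is a pipe-separated list of element names, e.g. "name|rno".
    static List<String> parse(String studentXml, String fieldSpec) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(studentXml)));

        List<String> values = new ArrayList<>();
        // '|' is a regex metacharacter, so it must be quoted before splitting.
        for (String field : fieldSpec.split(Pattern.quote("|"))) {
            NodeList nodes = doc.getElementsByTagName(field.trim());
            // A missing element yields null so the result keeps one slot per field.
            values.add(nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null);
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<student><name>test</name><rno>1</rno><rank>3</rank></student>";
        System.out.println(parse(xml, "name|rno")); // prints [test, 1]
    }
}
```

Inside a UDF, the returned values would be copied into the output tuple in the
same order as the requested field names, so that the tuple and the declared
schema line up.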
I am using XML Loader to split the XML into individual student elements, and I
have written a Java UDF to parse each student element.
In the Java UDF class I defined a parameterized constructor that takes the
columns/attributes to be parsed.
I have also overridden the outputSchema(Schema input) method, in which I read
the column names and add a new field schema for each one.
However, this does not work as expected. Is there any way of getting this
done?