Yeah, thank you... now I'm also stuck. If possible, can you share the solution?
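Here is a small runnable sketch of William's point, written in Python rather than Pig Latin so it can run on its own. Pig's REGEX_EXTRACT is implemented on Java's java.util.regex, where `.` also does not match a newline by default, so Python's `re` behaves the same way for these patterns. `regex_extract` is a hypothetical stand-in that assumes REGEX_EXTRACT returns the requested group of the first match, or null when nothing matches. The record is a trimmed copy of the <employee> example quoted below.

```python
import re

# A trimmed copy of the <employee> record from the thread. XMLLoader hands
# each matched element to Pig as one chararray with its newlines intact.
record = (
    "<employee>\n"
    "<employee_id>1234</employee_id>\n"
    "<name>(first_name_1234,middle_initial_1234,last_name_1234)</name>\n"
    "</employee>"
)


def regex_extract(text, pattern, index):
    """Hypothetical stand-in for Pig's REGEX_EXTRACT: return capture group
    `index` of the first match anywhere in `text`, or None (Pig: null)."""
    m = re.search(pattern, text)
    return m.group(index) if m else None


# 1. '.' does not match '\n', so the multi-line element never matches.
whole = regex_extract(record, r"<employee>(.*)</employee>", 1)

# 2. <employee_id> sits on one line, so no preprocessing is needed.
emp_id = regex_extract(record, r"<employee_id>(.*)</employee_id>", 1)

# 3. Strip newlines first, the equivalent of REPLACE(x,'[\\n]','') in Pig.
flat = re.sub(r"[\n]", "", record)
whole_flat = regex_extract(flat, r"<employee>(.*)</employee>", 1)

# 4. Alternative (assumption, untested inside Pig): the inline (?s) DOTALL
#    flag lets '.' match newlines without modifying the text.
whole_dotall = regex_extract(record, r"(?s)<employee>(.*)</employee>", 1)

print(whole)       # None
print(emp_id)      # 1234
print(whole_flat)
```

Case 3 is the advantage REPLACE gives you: once the record is a single line, (.*) can span the whole <employee> element. Case 2 shows why you can skip that step when you only need a field that already fits on one line.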
On Mon, Sep 16, 2013 at 7:21 PM, <[email protected]> wrote:

> Your example had newlines in the <employee> element, and the regular
> expression .* does not match across newlines. One way to remove newlines is
> REPLACE(x,'[\\n]',''). If the text ranges you are interested in do not
> contain newlines, for example if you are interested in <employee_id> but do
> not care about its relation to other elements inside the same <employee>
> element, then you do not need to do this.
>
> William F Dowling
> Senior Technologist
> Thomson Reuters
>
>
> -----Original Message-----
> From: ajay kumar [mailto:[email protected]]
> Sent: Monday, September 16, 2013 1:11 AM
> To: [email protected]
> Subject: Re: Converting xml to csv
>
> Sorry if I am wrong, but why do we need to use REPLACE? I mean, what is the
> advantage?
>
>
> On Fri, Sep 13, 2013 at 7:02 PM, <[email protected]> wrote:
>
> > Ajay's suggestion will work for elements like <employee_id> in your
> > example, which occur entirely on one line. If you want to get the whole
> > <employee> element, which spans more than one line, you will not be able
> > to get it by matching (.*), since . does not match a newline character.
> >
> > You can remove newline characters using
> > B = foreach A generate REPLACE(x,'[\\n]','');
> >
> >
> > -----Original Message-----
> > From: ajay kumar [mailto:[email protected]]
> > Sent: Friday, September 13, 2013 2:21 AM
> > To: [email protected]
> > Subject: Re: Converting xml to csv
> >
> > Try this:
> >
> > register /usr/lib/pig/piggybank.jar;
> > A = load '/home/sudeep/Desktop/test1' using
> >     org.apache.pig.piggybank.storage.XMLLoader('employee_id') as (x:chararray);
> > B = foreach A generate
> >     REGEX_EXTRACT(x,'<employee_id>(.*)</employee_id>',1);
> >
> >
> > On Fri, Sep 13, 2013 at 3:54 AM, jamal sasha <[email protected]> wrote:
> >
> > > Hi,
> > > I am trying to parse the following XML:
> > >
> > > <employee>
> > > <employee_id>1234</employee_id>
> > > <email>[email protected]</email>
> > > <name>(first_name_1234,middle_initial_1234,last_name_1234)</name>
> > > <projects>{(project_1234_1),(project_1234_2),(project_1234_3)}</projects>
> > > <skills>[programming:SQL,rdbms:Oracle]</skills>
> > > </employee>
> > >
> > > And my script is:
> > >
> > > a = LOAD 'sample.xml' USING
> > >     org.apache.pig.piggybank.storage.XMLLoader('employee') as (x:chararray);
> > > B = foreach a generate REGEX_EXTRACT(x,'<employee>(.*)</employee>',1);
> > > dump B;
> > >
> > > B comes out as an empty tuple. What am I missing?
> > >
> > >
> > > On Wed, Sep 11, 2013 at 11:35 PM, ajay kumar <[email protected]> wrote:
> > >
> > > > Use org.apache.pig.piggybank.storage.XMLLoader and then extract the
> > > > fields using REGEX_EXTRACT_ALL.
> > > >
> > > >
> > > > On Thu, Sep 12, 2013 at 11:18 AM, jamal sasha <[email protected]> wrote:
> > > >
> > > > > Umm, yes, but how do I generalize it?
> > > > > What I am looking for is something like a JSON parser in, say, Java:
> > > > > if I give it a valid JSON string, I can parse it and then access it
> > > > > as a hashmap. With XMLLoader, do I still have to specify regex rules?
> > > > >
> > > > > Actually, is it possible to just flatten the XML
> > > > > so that, for example,
> > > > >
> > > > > <aux>
> > > > > <foobar>1</foobar>
> > > > > <fushbar>foo</fushbar>
> > > > > </aux>
> > > > >
> > > > > becomes the single line
> > > > >
> > > > > <aux><foobar>1</foobar><fushbar>foo</fushbar></aux>
> > > > >
> > > > > ?
> > > > >
> > > > >
> > > > > On Wed, Sep 11, 2013 at 10:32 PM, Jagat Singh <[email protected]> wrote:
> > > > >
> > > > > > Use the piggybank XMLLoader.
> > > > > >
> > > > > > On 12/09/2013 10:14 AM, "jamal sasha" <[email protected]> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > > I have several different XML data sources. For example:
> > > > > > >
> > > > > > > src1.txt
> > > > > > >
> > > > > > > <foo>
> > > > > > > <bar>1</bar>
> > > > > > > </foo>
> > > > > > > <foo>
> > > > > > > <bar>2</bar>
> > > > > > > </foo>
> > > > > > > ... and so on
> > > > > > >
> > > > > > > and another data source:
> > > > > > >
> > > > > > > src2.txt
> > > > > > >
> > > > > > > <aux>
> > > > > > > <foobar>1</foobar>
> > > > > > > <fushbar>foo</fushbar>
> > > > > > > </aux>
> > > > > > > ... and so on
> > > > > > >
> > > > > > > So, basically, different (valid) XML formats.
> > > > > > >
> > > > > > > Rather than writing a different Pig script for each, is there a way
> > > > > > > to write one script that converts all of this XML data into CSV?
> > > > > > > Thanks
> > > >
> > > >
> > > > --
> > > > *Thanks & Regards,*
> > > > *S. Ajay Kumar
> > > > +91-9966159106*
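On the original question (one script for differently shaped XML sources), here is a rough sketch of the generic flattening idea, in Python and outside Pig. It is not a piggybank feature: `records_to_csv` is a hypothetical helper that assumes each record's children are leaf elements and uses their tag names as CSV columns. Because src1.txt and src2.txt are runs of sibling records with no enclosing root element, which a standard XML parser rejects, the sketch wraps the text in a synthetic <root> before parsing.

```python
import csv
import io
import xml.etree.ElementTree as ET


def records_to_csv(xml_text: str, record_tag: str) -> str:
    """Flatten every <record_tag> element into one CSV row.

    Hypothetical helper (not part of Pig or piggybank). Assumes:
    - the input is a sequence of sibling records with no enclosing root,
      like src1.txt / src2.txt in the thread, so we wrap it in one;
    - each child of a record is a leaf whose tag becomes a column name.
    """
    root = ET.fromstring(f"<root>{xml_text}</root>")
    rows = []
    columns = []  # union of child tags, in first-seen order
    for record in root.iter(record_tag):
        row = {}
        for child in record:
            row[child.tag] = (child.text or "").strip()
            if child.tag not in columns:
                columns.append(child.tag)
        rows.append(row)

    out = io.StringIO()
    # Missing columns in a row are written as "" (DictWriter's default restval).
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


src1 = "<foo>\n<bar>1</bar>\n</foo>\n<foo>\n<bar>2</bar>\n</foo>"
src2 = "<aux>\n<foobar>1</foobar>\n<fushbar>foo</fushbar>\n</aux>"

print(records_to_csv(src1, "foo"))
print(records_to_csv(src2, "aux"))
```

Only the record tag changes between sources (foo vs. aux), which is the one-script behaviour asked about. The csv module quotes values that contain commas, which matters for fields like <name>(first_name_1234,middle_initial_1234,last_name_1234)</name> in the employee example.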
