Hi,

Thanks so much. It worked for me, but can you please explain the ([^<]*) and \\n\\s* parts, symbol by symbol, in the expression below?
(RegexExtractAll(revision,'<id>([^<]*)</id>\\n\\s*<revision>\\n\\s*<id>([^>]*)</id>\\n\\s*<username>([^>]*)</username>\\n\\s*</revision>') )

Thanks
Krishnan

On Thu, May 17, 2012 at 7:08 AM, Francisco Javier Gonzalez Garcia <[email protected]> wrote:

> this is an example of one revision per page (other cases are more
> complex, but it's possible):
>
> REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar;
> DEFINE XMLLoader org.apache.pig.piggybank.storage.XMLLoader();
> DEFINE RegexExtractAll org.apache.pig.piggybank.evaluation.string.RegexExtractAll();
>
> revisionXML = LOAD 'Revision.xml' USING XMLLoader('page') AS (revision:chararray);
>
> rev = FOREACH revisionXML GENERATE FLATTEN
>   (RegexExtractAll(revision,'<id>([^<]*)</id>\\n\\s*<revision>\\n\\s*<id>([^>]*)</id>\\n\\s*<username>([^>]*)</username>\\n\\s*</revision>'))
>   AS
>   (
>     page: chararray,
>     id_revision: chararray,
>     username: chararray
>   );
>
> dump rev;
>
> 2012/5/17, Herbert Mühlburger <[email protected]>:
> > Hi list,
> >
> > I would like to parse the following XML-File using Pig:
> >
> > <page>
> >   <id>1</id>
> >   <revision>
> >     <id>1</id>
> >     <username>muehlburger</username>
> >   </revision>
> >   <revision>
> >     <id>2</id>
> >     <username>muehlburger</username>
> >   </revision>
> >   <revision>
> >     <id>3</id>
> >     <username>user1</username>
> >   </revision>
> >   ...
> >   <revision>
> >     <id>34334398</id>
> >     <username>muehlburger</username>
> >   </revision>
> > </page>
> > <page>
> >   <id>2</id>
> >   <revision>
> >     <id>343434</id>
> >     <username>muehlburger</username>
> >   </revision>
> >   <revision>
> >     <id>25343232</id>
> >     <username>muehlburger</username>
> >   </revision>
> >   <revision>
> >     <id>43434333</id>
> >     <username>user2</username>
> >   </revision>
> >   ...
> >   <revision>
> >     <id>5409589854</id>
> >     <username>user5</username>
> >   </revision>
> > </page>
> > ...
> >
> > I would like to produce the following kind of csv output:
> >
> > page_id  revision_id  username
> > 1        1            muehlburger
> > 1        2            muehlburger
> > 1        3            user1
> > 1        34334398     muehlburger
> > 2        343434       muehlburger
> > 2        25343232     muehlburger
> > 2        43434333     user2
> > 2        5409589854   user5
> >
> > How can I accomplish this using PIG?
> >
> > Thank you very much for your help!
> >
> > Kind regards,
> > Herbert
> > --
> > =================================================================
> > Herbert Muehlburger Software Development and Business Management
> > Graz University of Technology
> > www.muehlburger.at www.twitter.com/hmuehlburger
> > =================================================================
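On the question at the top of this message: the pieces of the pattern can be tried out outside Pig. The sketch below uses Python's re module as a stand-in, since the constructs asked about read the same way there. In ([^<]*), [^<] matches any single character except '<', the * repeats that zero or more times, and the parentheses capture whatever was matched (here, the text between <id> and </id>). \n matches a line break and \s* matches any run of whitespace, which absorbs the indentation before the next tag. It assumes the doubled backslashes in the Pig string are only string-literal escaping, so the regex engine actually sees \n\s*. SAMPLE and extract_fields are hypothetical names made up for this illustration, and re.search is used for convenience, which may not be how RegexExtractAll applies the pattern internally.

```python
import re

# Same pattern as in the Pig script, written as a raw string so that
# \n and \s reach the regex engine with a single backslash
# (assumption: Pig turns '\\n\\s*' into \n\s* before compiling the regex).
PATTERN = re.compile(
    r"<id>([^<]*)</id>\n\s*<revision>\n\s*<id>([^>]*)</id>\n\s*"
    r"<username>([^>]*)</username>\n\s*</revision>"
)

# Hypothetical sample: one page holding a single revision, indented
# the way an XML dump typically is.
SAMPLE = (
    "<page>\n"
    "  <id>1</id>\n"
    "  <revision>\n"
    "    <id>34334398</id>\n"
    "    <username>muehlburger</username>\n"
    "  </revision>\n"
    "</page>"
)


def extract_fields(page_xml):
    """Return (page_id, revision_id, username) for the first match, or None."""
    match = PATTERN.search(page_xml)
    return match.groups() if match else None


if __name__ == "__main__":
    # The three capture groups, in the order they appear in the pattern.
    print(extract_fields(SAMPLE))
```

The second and third groups in the original expression use [^>]* rather than [^<]*. On this input both give the same capture, because the literal </id> or </username> that follows forces the group to stop before the closing tag.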
