Hi,
these symbols are standard Java regular-expression syntax, documented in the
java.util.regex.Pattern class:
http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
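
In short: [^<] is a negated character class ("any character except <"), * means
"zero or more times", and the surrounding parentheses make it a capturing group,
so ([^<]*) captures the text up to the next tag. \n matches a newline and \s*
matches zero or more whitespace characters, so \\n\\s* skips the line break and
the indentation in front of the next tag. The backslashes are doubled because the
pattern sits inside a string literal. Below is a minimal sketch in plain Java that
applies the same pattern to a small sample; the class name RevisionRegexDemo and
the helper extract() are made up for illustration, and it uses find(), which may
not be how RegexExtractAll applies the pattern internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RevisionRegexDemo {
    // Same pattern as in the Pig script, written as a Java string literal.
    //   ([^<]*)  capturing group: zero or more characters that are not '<'
    //   \\n      the regex \n, i.e. a newline character
    //   \\s*     the regex \s*, i.e. zero or more whitespace characters
    // The later groups use [^>]* as in the original; they still capture the
    // tag text because the engine backtracks until </id> or </username> fits.
    static final Pattern REVISION = Pattern.compile(
        "<id>([^<]*)</id>\\n\\s*<revision>\\n\\s*<id>([^>]*)</id>\\n\\s*"
        + "<username>([^>]*)</username>\\n\\s*</revision>");

    // Hypothetical helper: returns {page id, revision id, username} for the
    // first match found in the input, or null when nothing matches.
    static String[] extract(String xml) {
        Matcher m = REVISION.matcher(xml);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String xml = "<page>\n"
            + "   <id>1</id>\n"
            + "<revision>\n"
            + "     <id>2</id>\n"
            + "     <username>user1</username>\n"
            + "</revision>\n"
            + "</page>";
        // Prints the three captured groups separated by spaces.
        System.out.println(String.join(" ", extract(xml)));
    }
}
```

Each pair of parentheses becomes one field of the resulting tuple, which is why
the AS clause in the Pig script lists three names.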

2012/5/18 krishnan N <[email protected]>

> Hi,
> Thanks so much, it worked for me. Could you please explain the ([^<]*)
> and \\n\\s* parts, symbol by symbol, in the expression below?
>
>
> (RegexExtractAll(revision,'<id>([^<]*)</id>\\n\\s*<revision>\\n\\s*<id>([^>]*)</id>\\n\\s*<username>([^>]*)</username>\\n\\s*</revision>')
> )
>
> Thanks
> Krishnan
>
> On Thu, May 17, 2012 at 7:08 AM, Francisco Javier Gonzalez Garcia <
> [email protected]> wrote:
>
> > This is an example that extracts one revision per page (the other cases
> > are more complex, but they are possible):
> >
> > REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar;
> > DEFINE XMLLoader org.apache.pig.piggybank.storage.XMLLoader();
> > DEFINE RegexExtractAll
> > org.apache.pig.piggybank.evaluation.string.RegexExtractAll();
> >
> > revisionXML = LOAD 'Revision.xml' USING XMLLoader('page') AS
> > (revision:chararray);
> >
> > rev = FOREACH revisionXML GENERATE FLATTEN
> > (RegexExtractAll(revision,'<id>([^<]*)</id>\\n\\s*<revision>\\n\\s*<id>([^>]*)</id>\\n\\s*<username>([^>]*)</username>\\n\\s*</revision>')
> > )
> > AS
> > (
> > page: chararray,
> > id_revision: chararray,
> > username: chararray
> > );
> >
> >
> > dump rev;
> >
> >
> >
> > 2012/5/17, Herbert Mühlburger <[email protected]>:
> > > Hi list,
> > >
> > > I would like to parse the following XML-File using Pig:
> > >
> > > <page>
> > >    <id>1</id>
> > > <revision>
> > >      <id>1</id>
> > >      <username>muehlburger</username>
> > > </revision>
> > > <revision>
> > >      <id>2</id>
> > >      <username>muehlburger</username>
> > > </revision>
> > > <revision>
> > >      <id>3</id>
> > >      <username>user1</username>
> > > </revision>
> > > ...
> > > <revision>
> > >      <id>34334398</id>
> > >      <username>muehlburger</username>
> > > </revision>
> > > </page>
> > > <page>
> > >    <id>2</id>
> > > <revision>
> > >      <id>343434</id>
> > >      <username>muehlburger</username>
> > > </revision>
> > > <revision>
> > >      <id>25343232</id>
> > >      <username>muehlburger</username>
> > > </revision>
> > > <revision>
> > >      <id>43434333</id>
> > >      <username>user2</username>
> > > </revision>
> > > ...
> > > <revision>
> > >      <id>5409589854</id>
> > >      <username>user5</username>
> > > </revision>
> > > </page>
> > > ...
> > >
> > > I would like to produce the following kind of csv output:
> > >
> > > page_id revision_id username
> > > 1 1 muehlburger
> > > 1 2 muehlburger
> > > 1 3 user1
> > > 1 34334398 muehlburger
> > > 2 343434 muehlburger
> > > 2 25343232 muehlburger
> > > 2 43434333 user2
> > > 2 5409589854 user5
> > >
> > > How can I accomplish this using Pig?
> > >
> > > Thank you very much for your help!
> > >
> > > Kind regards,
> > > Herbert
> > > --
> > > =================================================================
> > > Herbert Muehlburger  Software Development and Business Management
> > >                                      Graz University of Technology
> > > www.muehlburger.at                   www.twitter.com/hmuehlburger
> > > =================================================================
> > >
> >
>
