Hi Francisco,

Thank you very much for your answer. However, what I need is an example that handles all revisions of each page, not just one. Do you have an idea how to accomplish this?

Kind regards,
Herbert

On 17.05.12 16:08, Francisco Javier Gonzalez Garcia wrote:
This is an example that extracts one revision per page (handling
several revisions is more complex, but it is possible):

REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar;
DEFINE XMLLoader org.apache.pig.piggybank.storage.XMLLoader();
DEFINE RegexExtractAll org.apache.pig.piggybank.evaluation.string.RegexExtractAll();

-- Load each <page>...</page> element as a single chararray record.
revisionXML = LOAD 'Revision.xml' USING XMLLoader('page') AS (revision:chararray);

-- Capture the page id, then the id and username of the revision that directly follows it.
rev = FOREACH revisionXML GENERATE FLATTEN(
    RegexExtractAll(revision, '<id>([^<]*)</id>\\n\\s*<revision>\\n\\s*<id>([^<]*)</id>\\n\\s*<username>([^<]*)</username>\\n\\s*</revision>')
) AS (
    page: chararray,
    id_revision: chararray,
    username: chararray
);

DUMP rev;
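
For the all-revisions case asked about above, the same regex idea can be applied in two passes: first capture the page id once per page, then find every <revision> block inside that page and emit one row per match. The following is only a minimal sketch of that logic in standalone Python, not a Pig script and not something posted in this thread. The names extract_rows and SAMPLE are made up for illustration, and it assumes each revision contains exactly an <id> followed by a <username>, as in the sample file. Porting it to a Pig UDF should be possible but is untested here.

```python
import re

# One match per <page>...</page> element; DOTALL lets "." span newlines.
PAGE_RE = re.compile(r"<page>(.*?)</page>", re.DOTALL)
# The page id is assumed to be the first <id> inside the page body.
PAGE_ID_RE = re.compile(r"\s*<id>([^<]*)</id>")
# One match per revision: captures the revision id and the username.
REVISION_RE = re.compile(
    r"<revision>\s*<id>([^<]*)</id>\s*<username>([^<]*)</username>\s*</revision>"
)


def extract_rows(xml_text):
    """Return one (page_id, revision_id, username) tuple per revision."""
    rows = []
    for page in PAGE_RE.finditer(xml_text):
        body = page.group(1)
        page_id = PAGE_ID_RE.match(body)
        if page_id is None:
            continue  # skip pages that do not start with an <id>
        for revision_id, username in REVISION_RE.findall(body):
            rows.append((page_id.group(1), revision_id, username))
    return rows


# Hypothetical input, shortened from the sample file in this thread.
SAMPLE = """<page>
    <id>1</id>
<revision>
      <id>1</id>
      <username>muehlburger</username>
</revision>
<revision>
      <id>3</id>
      <username>user1</username>
</revision>
</page>
<page>
    <id>2</id>
<revision>
      <id>343434</id>
      <username>muehlburger</username>
</revision>
</page>"""

for row in extract_rows(SAMPLE):
    print(" ".join(row))
```

Regex-based extraction like this only works while the XML stays as regular as the sample; for nested or irregular markup a real XML parser would be the safer choice.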



2012/5/17, Herbert Mühlburger:
Hi list,

I would like to parse the following XML file using Pig:

<page>
  <id>1</id>
  <revision>
    <id>1</id>
    <username>muehlburger</username>
  </revision>
  <revision>
    <id>2</id>
    <username>muehlburger</username>
  </revision>
  <revision>
    <id>3</id>
    <username>user1</username>
  </revision>
  ...
  <revision>
    <id>34334398</id>
    <username>muehlburger</username>
  </revision>
</page>
<page>
  <id>2</id>
  <revision>
    <id>343434</id>
    <username>muehlburger</username>
  </revision>
  <revision>
    <id>25343232</id>
    <username>muehlburger</username>
  </revision>
  <revision>
    <id>43434333</id>
    <username>user2</username>
  </revision>
  ...
  <revision>
    <id>5409589854</id>
    <username>user5</username>
  </revision>
</page>
...

I would like to produce the following kind of CSV output:

page_id revision_id username
1 1 muehlburger
1 2 muehlburger
1 3 user1
1 34334398 muehlburger
2 343434 muehlburger
2 25343232 muehlburger
2 43434333 user2
2 5409589854 user5

How can I accomplish this using Pig?

Thank you very much for your help!

Kind regards,
Herbert
--
=================================================================
Herbert Muehlburger  Software Development and Business Management
                                      Graz University of Technology
www.muehlburger.at                   www.twitter.com/hmuehlburger
=================================================================
