Hi Francisco,
Thank you very much for your answer. However, what I need is an example
that covers all revisions of a page. Do you have an idea how to accomplish this?
Kind regards,
Herbert
On 17.05.12 16:08, Francisco Javier Gonzalez Garcia wrote:
This is an example for one revision per page (the general case is more
complex, but it is possible):
REGISTER /usr/lib/pig/contrib/piggybank/java/piggybank.jar;
DEFINE XMLLoader org.apache.pig.piggybank.storage.XMLLoader();
DEFINE RegexExtractAll org.apache.pig.piggybank.evaluation.string.RegexExtractAll();
-- Load each <page>...</page> element as one chararray record.
revisionXML = LOAD 'Revision.xml' USING XMLLoader('page') AS (revision:chararray);
-- Capture the page id, then the id and username of one revision.
rev = FOREACH revisionXML GENERATE FLATTEN(
RegexExtractAll(revision, '<id>([^<]*)</id>\\n\\s*<revision>\\n\\s*<id>([^<]*)</id>\\n\\s*<username>([^<]*)</username>\\n\\s*</revision>')
) AS (
page: chararray,
id_revision: chararray,
username: chararray
);
DUMP rev;
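The pattern above captures only one revision per page. One possible way to cover every revision is a two-step parse: read the page id once, then collect all <revision> blocks of that page. The following is a minimal sketch of that logic in Python rather than Pig, assuming the tag layout of the sample XML in this thread. The names PAGE_ID, REVISION, and page_to_rows are hypothetical, and wiring the function into Pig (for example as a UDF) is not shown here.

```python
import re

# Hypothetical helper names; the tag layout is assumed to match the sample XML.
PAGE_ID = re.compile(r"<page>\s*<id>([^<]*)</id>")
REVISION = re.compile(
    r"<revision>\s*<id>([^<]*)</id>\s*<username>([^<]*)</username>\s*</revision>"
)


def page_to_rows(page_xml):
    """Return one (page_id, revision_id, username) tuple per <revision>."""
    page_match = PAGE_ID.search(page_xml)
    if page_match is None:
        # No page id found: nothing to emit for this record.
        return []
    page_id = page_match.group(1)
    # findall returns one (revision_id, username) pair per revision block.
    return [(page_id, rev_id, user) for rev_id, user in REVISION.findall(page_xml)]


sample = """<page>
<id>2</id>
<revision>
<id>343434</id>
<username>muehlburger</username>
</revision>
<revision>
<id>43434333</id>
<username>user2</username>
</revision>
</page>"""

for row in page_to_rows(sample):
    # Prints one line per revision: page id, revision id, username.
    print(" ".join(row))
```

Each input record is one whole <page> element, which is what XMLLoader('page') produces, so the page id stays attached to every revision row.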
2012/5/17, Herbert Mühlburger <[email protected]>:
Hi list,
I would like to parse the following XML file using Pig:
<page>
<id>1</id>
<revision>
<id>1</id>
<username>muehlburger</username>
</revision>
<revision>
<id>2</id>
<username>muehlburger</username>
</revision>
<revision>
<id>3</id>
<username>user1</username>
</revision>
...
<revision>
<id>34334398</id>
<username>muehlburger</username>
</revision>
</page>
<page>
<id>2</id>
<revision>
<id>343434</id>
<username>muehlburger</username>
</revision>
<revision>
<id>25343232</id>
<username>muehlburger</username>
</revision>
<revision>
<id>43434333</id>
<username>user2</username>
</revision>
...
<revision>
<id>5409589854</id>
<username>user5</username>
</revision>
</page>
...
I would like to produce the following kind of CSV output:
page_id revision_id username
1 1 muehlburger
1 2 muehlburger
1 3 user1
1 34334398 muehlburger
2 343434 muehlburger
2 25343232 muehlburger
2 43434333 user2
2 5409589854 user5
How can I accomplish this using Pig?
Thank you very much for your help!
Kind regards,
Herbert
--
=================================================================
Herbert Muehlburger Software Development and Business Management
Graz University of Technology
www.muehlburger.at www.twitter.com/hmuehlburger
=================================================================