The following page has been changed by breed:

New page:
In this exercise we will work through the example shown in the presentation. We 
have two datasets: users and pages. {{{users}}} contains the userid (loaded 
below as {{{name}}}) and age of every user of some service. {{{pages}}} contains 
a userid (loaded below as {{{user}}}) and a url visited by that user. We are 
going to work through this exercise using the interactive shell: 
{{{java -jar pig.jar -}}}

We start off by loading the users and pages datasets. 

Users = load '/data/users' as (name, age);
Pages = load 'data/pages' as (user, url);

What is the format of this data? (Use {{{describe Users;}}} or 
{{{dump Users;}}} to answer the question.)
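
As a hedged illustration of what {{{describe}}} reports: in Pig releases that 
have a type system, fields loaded without declared types default to bytearray. 
Assuming such a release, and assuming the file is tab-delimited (the default 
for {{{PigStorage}}}), the load could also declare the types explicitly. The 
alias {{{TypedUsers}}} is hypothetical, and the exact {{{describe}}} output 
varies by release.

{{{
-- Sketch only: assumes tab-delimited input and a Pig release with typed schemas.
TypedUsers = load '/data/users' using PigStorage('\t') as (name:chararray, age:int);
-- Should list the two fields together with their declared types.
describe TypedUsers;
}}}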

Now we filter, keeping only the users aged 18 to 25:

Fltrd = filter Users by 
        age >= 18 and age <= 25;
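
To make the bounds concrete, here is a small worked case. The sample rows are 
invented for illustration and are not part of the exercise data. Both 
comparisons are inclusive, so users aged exactly 18 or 25 are kept.

{{{
-- Hypothetical rows in Users:  (alice,22)  (bob,31)  (carol,18)  (dave,17)
-- Rows expected in Fltrd:      (alice,22)  (carol,18)
-- The order in which dump prints the rows is not guaranteed.
dump Fltrd;
}}}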


Now let's do the join.

Jnd = join Fltrd by name, Pages by user;

What does this data look like? You can use {{{describe Jnd;}}} to verify your answer.
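
As a sketch of what to look for: in Pig releases that qualify joined fields, 
the result carries every field from both inputs, each prefixed with the alias 
it came from. A field whose short name is unique can still be referenced 
without the prefix. The alias {{{Urls}}} below is hypothetical.

{{{
-- Expect four fields: Fltrd::name, Fltrd::age, Pages::user, Pages::url
describe Jnd;
-- The qualified name always works; plain url also works here because it is unambiguous.
Urls = foreach Jnd generate Pages::url;
}}}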

Grpd = group Jnd by url;

How does group differ from join? Again, use {{{describe Grpd;}}} to check.
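
One way to see the difference, using invented sample rows: join produces one 
flat tuple per matching pair, while group produces one tuple per key, holding 
a bag of all the tuples that share that key. The order of tuples inside a bag 
is not guaranteed.

{{{
-- Hypothetical contents of Jnd (flat, one tuple per match):
--   (alice,22,alice,a.com)
--   (alice,22,alice,b.com)
--   (carol,18,carol,a.com)
-- Corresponding contents of Grpd (one tuple per url, with a nested bag named Jnd):
--   (a.com,{(alice,22,alice,a.com),(carol,18,carol,a.com)})
--   (b.com,{(alice,22,alice,b.com)})
-- The first field is always called group; the bag takes the name of the grouped alias.
describe Grpd;
}}}

With these rows, counting the tuples in each bag would give 2 clicks for a.com 
and 1 for b.com.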

Smmd = foreach Grpd generate group,
       COUNT(Jnd) as clicks;
Srtd = order Smmd by clicks desc;
Top100 = limit Srtd 100;
store Top100 into 'top100sites';

Finish it up. Does {{{top100sites}}} contain what you expect?
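
One way to check, sketched under the assumption that {{{store}}} wrote 
tab-delimited text with the default storage function: load the result back and 
inspect it. The alias {{{Check}}} is hypothetical.

{{{
-- Read the stored result back in; the field names are chosen here for readability.
Check = load 'top100sites' as (url, clicks);
-- Expect at most 100 (url, clicks) rows.
dump Check;
}}}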
