Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The following page has been changed by breed:
http://wiki.apache.org/pig/PigExercise1

New page:

In this exercise we will work through the example shown in the presentation. We have two datasets: users and pages. {{{users}}} contains the userid and age of every user of some service. {{{pages}}} contains the userid and the url visited by that user.

We are going to work through this exercise using the interactive shell:

 java -jar pig.jar -

We start off by loading the users and pages datasets.

{{{
Users = load '/data/users' as (name, age);
Pages = load 'data/pages' as (user, url);
}}}

What is the format of this data? (Use {{{describe Users;}}} or {{{dump Users;}}} to answer the question.)

Now we filter, keeping only users aged 18 to 25 inclusive:

{{{
Fltrd = filter Users by age >= 18 and age <= 25;
}}}

Now let's do the join.

{{{
Jnd = join Fltrd by name, Pages by user;
}}}

What does this data look like? You can use describe to verify your answer.

{{{
Grpd = group Jnd by url;
}}}

How does group differ from join? Again, use describe.

{{{
Smmd = foreach Grpd generate group, COUNT(Jnd) as clicks;
Srtd = order Smmd by clicks desc;
Top100 = limit Srtd 100;
store Top100 into 'top100sites';
}}}

Finish it up. Does top100sites contain what you expect?
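
If you want to check your answers without a Pig installation, the pipeline can be roughly emulated in plain Python. This is only a sketch of the data shapes, not how Pig executes the script: the sample rows and the {{{top_sites}}} helper are made up for illustration, since the contents of the real users and pages files are not shown in this exercise. It is meant to show that a join yields one flat row per matching pair, while a group yields one row per key holding a bag of complete rows.

```python
from collections import defaultdict

# Hypothetical sample rows; the real input files are not shown in the exercise.
users = [("alice", 22), ("bob", 31), ("carol", 19)]
pages = [("alice", "a.com"), ("alice", "b.com"), ("bob", "a.com"), ("carol", "a.com")]


def top_sites(users, pages, limit=100):
    """Emulate the filter / join / group / count / order / limit steps."""
    # Fltrd: keep users aged 18 to 25 inclusive.
    fltrd = [(name, age) for name, age in users if 18 <= age <= 25]

    # Jnd: one flat row per matching (user, page) pair, carrying fields of both inputs.
    jnd = [
        (name, age, user, url)
        for name, age in fltrd
        for user, url in pages
        if name == user
    ]

    # Grpd: one entry per url, each holding a "bag" (list) of whole joined rows.
    grpd = defaultdict(list)
    for row in jnd:
        grpd[row[3]].append(row)

    # Smmd: count the rows in each bag. Srtd: order by clicks, descending.
    smmd = [(url, len(bag)) for url, bag in grpd.items()]
    srtd = sorted(smmd, key=lambda r: r[1], reverse=True)

    # Top100: keep at most `limit` rows.
    return jnd, dict(grpd), srtd[:limit]


jnd, grpd, top = top_sites(users, pages)
print(top)
```

On this sample, bob is filtered out by age, so his visit to a.com does not count toward that site's clicks.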