Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The following page has been changed by breed:
http://wiki.apache.org/pig/PigExercise2

------------------------------------------------------------------------------
  Now we are going to write a shell script to permute the names into a list of userids with ages. We will invoke it using: (Note, this time those quotes need to be back quotes.)
  {{{
- users = stream a through `randid.sh` as (user, age);
+ users = stream sn through `randid.sh` as (user, age);
  }}}
  randid.sh will get the contents of 'singlenames' on standard input. Things written to stdout will be taken as output tuples. By default tuples are separated by \n and fields by \t. If you'd rather skip the pain of writing the randid.sh script, here is an example:
@@ -52, +52 @@
  Okay, now we have our users, let's generate the pages dataset. We want to generate a bunch of page requests for each user, so we will make a UDF that takes in tuples from users and generates fake traffic:
  {{{
- pages = foreach a generate flatten(pig.example.GenerateClicks(*)) as (user, url);
+ pages = foreach sn generate flatten(pig.example.GenerateClicks(*)) as (user, url);
  }}}
  GenerateClicks needs to extend EvalFunc<DataBag>. Here is an example implementation:
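
For the randid.sh streaming step described above (not the GenerateClicks UDF), a minimal sketch of the stdin/stdout contract might look like the following. It is not the script from the wiki page. The function name `randid`, the userid rule (lowercase the name and drop spaces), and the age range 18 to 70 are all assumptions made for illustration. The only parts taken from the text are that input arrives one tuple per line on standard input and that output tuples are written to stdout with fields separated by \t.

```shell
#!/bin/sh
# Hypothetical stand-in for randid.sh: reads one name per line on stdin
# and writes "userid<TAB>age" lines to stdout.
randid() {
    awk '
        BEGIN { srand() }                      # seed the generator from the clock
        NF {                                   # skip blank input lines
            id = tolower($0)                   # assumed userid rule: lowercase ...
            gsub(/[ \t]+/, "", id)             # ... with spaces and tabs removed
            age = 18 + int(rand() * 53)        # assumed age range: 18..70
            printf "%s\t%d\n", id, age         # fields separated by a tab
        }
    '
}

# Demo on sample input; the ages differ from run to run.
printf 'Alice\nBob Smith\n' | randid
```

To use this as the streamed script, the demo line would be replaced by a bare call to `randid`, so that the function reads whatever Pig sends on standard input.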
