Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by CorinneC:
http://wiki.apache.org/pig/PigTutorial

------------------------------------------------------------------------------
   * '''Local Mode''': To run the scripts in local mode, no Hadoop or HDFS 
installation is required. All files are installed and run from your local host 
and file system.
   * '''Hadoop Mode''': To run the scripts in hadoop (mapreduce) mode, you need 
access to a Hadoop cluster and HDFS installation.
  
- The Pig tutorial file (attachment:pigtutorial.tar.gz or the 
tutorial/pigtutorial.tar.gz file in the pig distribution) includes the Pig JAR 
file (pig.jar) and the tutorial files (tutorial.jar, Pig scripts, log files). 
These files work with Hadoop 0.17 and provide everything you need to run the 
Pig scripts. To get started, follow these basic steps: 
+ The Pig tutorial file (attachment:pigtutorial.tar.gz or the 
tutorial/pigtutorial.tar.gz file in the pig distribution) includes the Pig JAR 
file (pig.jar) and the tutorial files (tutorial.jar, Pig scripts, log files). 
These files work with Hadoop 0.18 and provide everything you need to run the 
Pig scripts. To get started, follow these basic steps: 
  
   1. Install Java.
   1. Download the Pig tutorial file and install Pig.
@@ -112, +112 @@

  REGISTER ./tutorial.jar; 
  }}}
  
-  * Use the [http://wiki.apache.org/pig/PigBuiltins PigStorage] function to 
load the excite log file (excite.log or excite-small.log) into the “raw” 
bag as an array of records with the fields '''user''', '''time''', and 
'''query'''. 
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_PigStorage_
 PigStorage] function to load the excite log file (excite.log or 
excite-small.log) into the “raw” bag as an array of records with the fields 
'''user''', '''time''', and '''query'''. 
  {{{
  raw = LOAD 'excite.log' USING PigStorage('\t') AS (user, time, query);
  }}}
@@ -138, +138 @@

  ngramed1 = FOREACH houred GENERATE user, hour, 
flatten(org.apache.pig.tutorial.NGramGenerator(query)) as ngram;
  }}}
  
-  * Use the 
[http://wiki.apache.org/pig/PigLatin#DISTINCT:_Eliminating_duplicates_in_data 
DISTINCT] command to get the unique n-grams for all records. 
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_DISTINCT
 DISTINCT] command to get the unique n-grams for all records. 
  {{{ 
  ngramed2 = DISTINCT ngramed1;
  }}}
  
-  * Use the 
[http://wiki.apache.org/pig/PigLatin#COGROUP:_Getting_the_relevant_data_together
 GROUP] command to group records by n-gram and hour.
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_GROUP
 GROUP] command to group records by n-gram and hour.
  {{{ 
  hour_frequency1 = GROUP ngramed2 BY (ngram, hour);
  }}}
  
-  * Use the [http://wiki.apache.org/pig/PigBuiltins COUNT] function to get the 
count (occurrences) of each n-gram. 
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_COUNT
 COUNT] function to get the count (occurrences) of each n-gram. 
  {{{ 
  hour_frequency2 = FOREACH hour_frequency1 GENERATE flatten($0), COUNT($1) as 
count;
  }}}
  
-  * Use the 
[http://wiki.apache.org/pig/PigLatin#COGROUP:_Getting_the_relevant_data_together
 GROUP] command to group records by n-gram only. Each group now corresponds to 
a distinct n-gram and has the count for each hour.
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_GROUP
 GROUP] command to group records by n-gram only. Each group now corresponds to 
a distinct n-gram and has the count for each hour.
  {{{ 
  uniq_frequency1 = GROUP hour_frequency2 BY group::ngram;
  }}}
@@ -163, +163 @@

  uniq_frequency2 = FOREACH uniq_frequency1 GENERATE flatten($0), 
flatten(org.apache.pig.tutorial.ScoreGenerator($1));
  }}}
  
-  * Use the 
[http://wiki.apache.org/pig/PigLatin#FOREACH_..._GENERATE:_Applying_transformations_to_the_data
 FOREACH-GENERATE] command to assign names to the fields. 
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_FOREACH_…_GENERATE
 FOREACH-GENERATE] command to assign names to the fields. 
  {{{ 
  uniq_frequency3 = FOREACH uniq_frequency2 GENERATE $1 as hour, $0 as ngram, 
$2 as score, $3 as count, $4 as mean;
  }}}
  
-  * Use the 
[http://wiki.apache.org/pig/PigLatin#FILTER:_Getting_rid_of_data_you_are_not_interested_in_
 FILTER] command to remove all records with a score less than or equal to 2.0.
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_FILTER_
 FILTER] command to remove all records with a score less than or equal to 2.0.
  {{{ 
  filtered_uniq_frequency = FILTER uniq_frequency3 BY score > 2.0;
  }}}
  
-  * Use the 
[http://wiki.apache.org/pig/PigLatin#ORDER:_Sorting_data_according_to_some_field
 ORDER] command to sort the remaining records by hour and score. 
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_ORDER_
 ORDER] command to sort the remaining records by hour and score. 
  {{{ 
  ordered_uniq_frequency = ORDER filtered_uniq_frequency BY (hour, score);
  }}}
  
-  * Use the [http://wiki.apache.org/pig/PigBuiltins PigStorage] function to 
store the results. The output file contains a list of n-grams with the 
following fields: '''hour''', '''ngram''', '''score''', '''count''', '''mean'''.
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_PigStorage_
 PigStorage] function to store the results. The output file contains a list of 
n-grams with the following fields: '''hour''', '''ngram''', '''score''', 
'''count''', '''mean'''.
  {{{ 
  STORE ordered_uniq_frequency INTO '/tmp/tutorial-results' USING PigStorage(); 
  }}}
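
As a rough illustration of what the DISTINCT, GROUP, and COUNT steps above compute, here is a minimal plain-Python sketch. It is not Pig code and does not use any Pig API. The sample records are hypothetical, and the variable names are borrowed from the script only for readability. The ScoreGenerator step is left out because its logic is not shown on this page.

```python
from collections import Counter, defaultdict

# Hypothetical (user, hour, ngram) records standing in for the ngramed1 relation.
ngramed1 = [
    ("u1", "00", "pig latin"),
    ("u1", "00", "pig latin"),  # exact duplicate, dropped by the DISTINCT step
    ("u2", "00", "pig latin"),
    ("u2", "12", "pig latin"),
    ("u3", "12", "hadoop"),
]

# DISTINCT: keep one copy of each record.
ngramed2 = set(ngramed1)

# GROUP BY (ngram, hour) followed by COUNT: occurrences per (ngram, hour) pair.
hour_frequency2 = Counter((ngram, hour) for _user, hour, ngram in ngramed2)

# GROUP BY ngram only: each n-gram maps to its per-hour counts.
uniq_frequency1 = defaultdict(dict)
for (ngram, hour), count in hour_frequency2.items():
    uniq_frequency1[ngram][hour] = count

print(dict(uniq_frequency1))
```

In this sketch a Python set plays the role of DISTINCT and a Counter plays the role of GROUP plus COUNT. Pig runs the same logical steps over bags of tuples, possibly distributed across a cluster.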
@@ -194, +194 @@

  REGISTER ./tutorial.jar;
  }}}
   
-  * Use the [http://wiki.apache.org/pig/PigBuiltins PigStorage] function to 
load the excite log file (excite.log or excite-small.log) into the “raw” 
bag as an array of records with the fields '''user''', '''time''', and 
'''query'''.
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_PigStorage_
 PigStorage] function to load the excite log file (excite.log or 
excite-small.log) into the “raw” bag as an array of records with the fields 
'''user''', '''time''', and '''query'''.
  {{{
  raw = LOAD 'excite.log' USING PigStorage('\t') AS (user, time, query);
  }}}
@@ -223, +223 @@

  }}}
  
   
-  * Use the 
[http://wiki.apache.org/pig/PigLatin#DISTINCT:_Eliminating_duplicates_in_data 
DISTINCT] command to get the unique n-grams for all records. 
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_DISTINCT
 DISTINCT] command to get the unique n-grams for all records. 
  {{{
  ngramed2 = DISTINCT ngramed1;
  }}}
  
  
-  * Use the 
[http://wiki.apache.org/pig/PigLatin#COGROUP:_Getting_the_relevant_data_together
 GROUP] command to group the records by n-gram and hour. 
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_GROUP
 GROUP] command to group the records by n-gram and hour. 
  {{{
  hour_frequency1 = GROUP ngramed2 BY (ngram, hour);
  }}}
  
  
-  * Use the [http://wiki.apache.org/pig/PigBuiltins COUNT] function to get the 
count (occurrences) of each n-gram. 
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_COUNT
 COUNT] function to get the count (occurrences) of each n-gram. 
  {{{
  hour_frequency2 = FOREACH hour_frequency1 GENERATE flatten($0), COUNT($1) as 
count;
  }}}
  
  
-  * Use the 
[http://wiki.apache.org/pig/PigLatin#FOREACH_..._GENERATE:_Applying_transformations_to_the_data
 FOREACH-GENERATE] command to assign names to the fields.
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_FOREACH_…_GENERATE
 FOREACH-GENERATE] command to assign names to the fields.
  {{{
  hour_frequency3 = FOREACH hour_frequency2 GENERATE $0 as ngram, $1 as hour, 
$2 as count;
  }}}
  
  
-  * Use the 
[http://wiki.apache.org/pig/PigLatin#FILTER:_Getting_rid_of_data_you_are_not_interested_in_
 FILTER] command to get the n-grams for hour ‘00’ 
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_FILTER_
 FILTER] command to get the n-grams for hour ‘00’ 
  {{{
  hour00 = FILTER hour_frequency2 BY hour eq '00';
  }}}
  
  
-  * Use the 
[http://wiki.apache.org/pig/PigLatin#FILTER:_Getting_rid_of_data_you_are_not_interested_in_
 FILTER] command to get the n-grams for hour ‘12’
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_FILTER_
 FILTER] command to get the n-grams for hour ‘12’
  {{{
  hour12 = FILTER hour_frequency3 BY hour eq '12';
  }}}
  
   
-  * Use the [http://wiki.apache.org/pig/PigLatin#Joining JOIN] command to get 
the n-grams that appear in both hours.
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_JOIN
 JOIN] command to get the n-grams that appear in both hours.
  {{{
  same = JOIN hour00 BY $0, hour12 BY $0;
  }}}
  
-  * Use the 
[http://wiki.apache.org/pig/PigLatin#FOREACH_..._GENERATE:_Applying_transformations_to_the_data
 FOREACH-GENERATE] command to record their frequency.
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_FOREACH_…_GENERATE
 FOREACH-GENERATE] command to record their frequency.
  {{{
  same1 = FOREACH same GENERATE hour_frequency2::hour00::group::ngram as ngram, 
$2 as count00, $5 as count12;
  }}}
  
-  * Use the [http://wiki.apache.org/pig/PigBuiltins PigStorage] function to 
store the results. The output file contains a list of n-grams with the 
following fields: '''ngram''', '''count00''', '''count12'''.
+  * Use the 
[http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_PigStorage_
 PigStorage] function to store the results. The output file contains a list of 
n-grams with the following fields: '''ngram''', '''count00''', '''count12'''.
  {{{
  STORE same1 INTO '/tmp/tutorial-join-results' USING PigStorage();
  }}}
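
The FILTER and JOIN steps above can be pictured with a small plain-Python sketch. This is only an illustration under stated assumptions, not Pig code: the sample rows are hypothetical, and it assumes each (ngram, hour) pair appears at most once, which should hold after the GROUP and COUNT steps.

```python
# Hypothetical (ngram, hour, count) rows standing in for hour_frequency3.
hour_frequency3 = [
    ("pig latin", "00", 2),
    ("pig latin", "12", 5),
    ("hadoop", "12", 1),
    ("mapreduce", "00", 3),
]

# FILTER: one mapping per hour, keyed by n-gram.
hour00 = {ngram: count for ngram, hour, count in hour_frequency3 if hour == "00"}
hour12 = {ngram: count for ngram, hour, count in hour_frequency3 if hour == "12"}

# JOIN on the n-gram: keep only n-grams that appear in both hours,
# then emit (ngram, count00, count12) as in the FOREACH-GENERATE step.
same1 = sorted(
    (ngram, hour00[ngram], hour12[ngram])
    for ngram in hour00.keys() & hour12.keys()
)

print(same1)
```

The join here is an inner join, so n-grams seen in only one of the two hours ("hadoop" and "mapreduce" in the sample data) do not appear in the result.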
