Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by CorinneC:
http://wiki.apache.org/pig/PigTutorial

------------------------------------------------------------------------------
   * To run the scripts in local mode, no Hadoop or DFS installation is 
required. All files are installed and run from your local host and file system.
   * To run the scripts on a Hadoop cluster, you need access to a Hadoop 
cluster and DFS installation.
  
- The Pig JAR file (pig.jar) and the Pig tutorial file (*.gz) include 
everything you need to get started. Follow these basic steps:
+ The Pig JAR file and the Pig tutorial file (both attached to this page) 
include everything you need to run the Pig scripts in local mode or on a Hadoop 
cluster. To get started, follow these basic steps:
  
   1. Install Java.
-  1. Install Pig.
-  1. Install and run the Pig scripts (in local mode or on a Hadoop cluster).
+  1. Install Pig (using the Pig JAR file).
+  1. Install and run the Pig scripts (using the Pig tutorial file).
  
  == Java Installation ==
  Make sure your run-time environment includes the following:
-  1. Java 1.5 (preferably from Sun)
+  1. Java 1.5.x (preferably from Sun)
   1. The JAVA_HOME environment variable is set to the root of your Java 
installation. 
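
  The two requirements above can be checked before going further. The following is a minimal sketch, not part of the tutorial files: the function name check_java_home is hypothetical, and it assumes the usual layout in which the java launcher sits at $JAVA_HOME/bin/java.

```shell
# check_java_home: hypothetical helper, not shipped with Pig.
# Succeeds only if JAVA_HOME is set and contains an executable bin/java.
check_java_home() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "No executable found at $JAVA_HOME/bin/java" >&2
        return 1
    fi
    echo "Using Java at $JAVA_HOME/bin/java"
}
```

  After defining the function, run check_java_home, then "$JAVA_HOME/bin/java" -version to confirm that the reported version is 1.5.x.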
  
  
  == Pig Installation ==
+ We provide two Pig JAR files: '''pig-16.jar''' works with the Hadoop 0.16 
release; '''pig-17.jar''' works with the Hadoop 0.17 release. (If you want to 
run the tutorial scripts with the most recent Hadoop updates, you can also 
create your own Pig JAR file. See [http://wiki.apache.org/pig/GettingStarted 
Getting Started]).
+ 
  To install Pig, do the following:
  
-  1. Download the Pig JAR file (pig.jar) and move it to the appropriate 
directory. For example:  /home/me/pig. 
+  1. Download the Pig JAR file (pig-16.jar or pig-17.jar) from this page.
+  1. Rename the file to pig.jar. 
+  1. Move the file to the appropriate directory on your system. For example:  
/home/me/pig. 
   1. Define an environment variable with the location of the Pig JAR file. For 
example: export PIGDIR=/home/me/pig (bash, sh) or setenv PIGDIR /home/me/pig 
(tcsh, csh).
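
  The rename, move, and environment-variable steps can be combined in a small shell function. This is only a sketch under stated assumptions: the name install_pig_jar is hypothetical, the JAR is assumed to be already downloaded, and /home/me/pig is the example directory used above.

```shell
# install_pig_jar SRC_JAR DEST_DIR   (hypothetical helper, not part of Pig)
# Moves the downloaded JAR into DEST_DIR and renames it to pig.jar.
install_pig_jar() {
    src="$1"
    dest="$2"
    if [ ! -f "$src" ]; then
        echo "JAR not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    mv "$src" "$dest/pig.jar"
}

# Example use (bash, sh); the paths are illustrative:
#   install_pig_jar ./pig-17.jar /home/me/pig
#   export PIGDIR=/home/me/pig
```

  Pick pig-16.jar or pig-17.jar according to the Hadoop release you plan to use; the later commands only depend on the file being available as $PIGDIR/pig.jar.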
  
  
- == Pig Script Installation and Run - Local Mode ==
+ == Pig Script Installation: Local Mode ==
  To install and run the Pig scripts in local mode, do the following:
  
-  1. Download and unzip the Pig tutorial file (*.gz) to your local directory.
+  1. Download the Pig tutorial file (*.gz) from this page.
+  1. Unzip the file in the appropriate directory on your system.
   1. Review the contents of the [#Pig_Tutorial_File Pig Tutorial File].
   1. Review the [#Tutorial_Pig_Script Tutorial Pig Script] and 
the [#Tutorial_Join_Pig_Script Tutorial-Join Pig Script].
   1. Execute the following command (using either tutorial-local.pig or 
tutorial-join-local.pig).
@@ -37, +42 @@

  $ java -cp $PIGDIR/pig.jar org.apache.pig.Main -x local tutorial-local.pig
  }}}
  
-  1.#5 Review the results:
+  1.#6 Review the results:
  {{{
  $ ls -l /tmp/ngrams.txt
  }}}
  
  
- == Pig Script Installation and Run - Hadoop Cluster ==
+ == Pig Script Installation: Hadoop Cluster ==
  To install and run the Pig scripts on a Hadoop cluster, do the following:
  
   1. Download and unzip the Pig tutorial file (*.gz) to your local directory. 
@@ -129, +134 @@

   * Uses the 
[http://wiki.apache.org/pig/PigLatin#FOREACH_..._GENERATE:_Applying_transformations_to_the_data
 FOREACH-GENERATE] command to assign names to the fields.
   * Uses the 
[http://wiki.apache.org/pig/PigLatin#FILTER:_Getting_rid_of_data_you_are_not_interested_in_
 FILTER] command to get the n-grams for hour ‘00’ 
   * Uses the 
[http://wiki.apache.org/pig/PigLatin#FILTER:_Getting_rid_of_data_you_are_not_interested_in_
 FILTER] command to get the n-grams for hour ‘12’ 
-  * Uses the [http://wiki.apache.org/pig/PigLatin#Joining JOIN] command to 
join the n-grams in hour “00” and  hour “12” by field $0
+  * Uses the [http://wiki.apache.org/pig/PigLatin#Joining JOIN] command to get 
the n-grams that appear in both hours.
-  * Uses the [http://wiki.apache.org/pig/PigBuiltins COUNT] function to get 
the count (occurrences) of the n-grams in both “00” and “12” 
+  * Uses the 
[http://wiki.apache.org/pig/PigLatin#FOREACH_..._GENERATE:_Applying_transformations_to_the_data
 FOREACH-GENERATE] command to record their frequency.
+ 
   * Uses the [http://wiki.apache.org/pig/PigBuiltins PigStorage] function to 
store the results. The output file contains a list of n-grams with the 
following fields: '''hour''', '''count00''', '''count12'''
  
