Author: olga Date: Thu Jun 11 22:58:40 2009 New Revision: 783954 URL: http://svn.apache.org/viewvc?rev=783954&view=rev Log: PIG-817: 0.3.0 docs update
Modified: hadoop/pig/trunk/CHANGES.txt hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin.xml Modified: hadoop/pig/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/CHANGES.txt?rev=783954&r1=783953&r2=783954&view=diff ============================================================================== --- hadoop/pig/trunk/CHANGES.txt (original) +++ hadoop/pig/trunk/CHANGES.txt Thu Jun 11 22:58:40 2009 @@ -28,16 +28,18 @@ BUG FIXES -PIG-835: Multiquery optimization does not handle the case where the map keys -in the split plans have different key types (tuple and non tuple key type) -(pradeepkth) - Release 0.3.0 - Unreleased INCOMPATIBLE CHANGES IMPROVEMENTS +PIG-835: Multiquery optimization does not handle the case where the map keys +in the split plans have different key types (tuple and non tuple key type) +(pradeepkth) + +PIG-817: documentation update (chandec via olgan) + PIG-830: Add RegExLoader and apache log utils to piggybank (dvryaboy via gates). 
PIG-831: Turned off reporting of records and bytes written for mutli-store Modified: hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml?rev=783954&r1=783953&r2=783954&view=diff ============================================================================== --- hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml (original) +++ hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml Thu Jun 11 22:58:40 2009 @@ -21,107 +21,121 @@ <title>Pig Getting Started Guide</title> </header> <body> - + +<section> +<title>Overview</title> <section id="req"> <title>Requirements</title> - <p><strong>Unix</strong> and <strong>Windows</strong> users need the following:</p> <ol> - <li> <strong>Hadoop 18</strong>: <a href="http://hadoop.apache.org/core/">http://hadoop.apache.org/core/</a></li> - <li> <strong>Java 1.6</strong>, preferably from Sun: <a href="http://java.sun.com/javase/downloads/index.jsp">http://java.sun.com/javase/downloads/index.jsp</a>. 
Set JAVA_HOME to the root of your Java installation.</li> - <li> <strong>Ant</strong> for builds: <a href="http://ant.apache.org/">http://ant.apache.org/</a>.</li> - <li> <strong>JUnit</strong> for unit tests: <a href="http://junit.sourceforge.net/">http://junit.sourceforge.net/</a>.</li> + <li> <strong>Hadoop 18</strong> - <a href="http://hadoop.apache.org/core/">http://hadoop.apache.org/core/</a></li> + <li> <strong>Java 1.6</strong> - <a href="http://java.sun.com/javase/downloads/index.jsp">http://java.sun.com/javase/downloads/index.jsp</a> Set JAVA_HOME to the root of your Java installation.</li> + <li> <strong>Ant 1.7</strong> - (optional, for builds) <a href="http://ant.apache.org/">http://ant.apache.org/</a></li> + <li> <strong>JUnit 4.5</strong> - (optional, for unit tests) <a href="http://junit.sourceforge.net/">http://junit.sourceforge.net/</a></li> </ol> - <p><strong>Windows</strong> users need to install Cygwin and the Perl package: <a href="http://www.cygwin.com/"> http://www.cygwin.com/</a>.</p> - </section> + <p><strong>Windows</strong> users need to install Cygwin and the Perl package: <a href="http://www.cygwin.com/"> http://www.cygwin.com/</a></p> + </section> + <section> + <title>Run Modes</title> + <p>Pig has two run modes or exectypes: </p> + <ul> + <li><p> Local Mode - To run Pig in local mode, you need access to a single machine. </p></li> + <li><p> Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. 
+ Pig will automatically allocate and deallocate a 15-node cluster.</p></li> + </ul> + <p>You can run the Grunt shell, Pig scripts, or embedded programs using either mode.</p> + </section> +</section> + +<section> +<title>Beginning Pig</title> <section> <title>Download Pig</title> <p>To get a Pig distribution, download a recent stable release from one of the Apache Download Mirrors (see <a href="http://hadoop.apache.org/pig/releases.html"> Pig Releases</a>).</p> - <p>Unpack the downloaded Pig distribution. You can find the Pig script in the bin directory (/pig-n.n.n/bin/pig).</p> + <p>Unpack the downloaded Pig distribution. The Pig script is located in the bin directory (/pig-n.n.n/bin/pig).</p> <p>Add /pig-n.n.n/bin to your path. Use export (bash,sh,ksh) or setenv (tcsh,csh). For example: </p> <source> $ export PATH=/<my-path-to-pig>/pig-n.n.n/bin:$PATH </source> - <p>Try the following command, to get a listing of all Pig commands </p> + <p>Try the following command, to get a list of Pig commands: </p> <source> $ pig -help </source> - <p>Try the following command, to start the Grunt Shell:</p> + <p>Try the following command, to start the Grunt shell:</p> <source> $ pig </source> - - - </section> - - <section> - <title>Build Pig</title> - <p>(optional) To build pig, do the following:</p> - <ol> - <li> Check out the Pig code from SVN: <em>svn co http://svn.apache.org/repos/asf/hadoop/pig/trunk</em>. </li> - <li> Build the code from the top directory: <em>ant</em>. If the build is successful, you should see the <em>pig.jar</em> created in that directory. </li> - <li> Validate your <em>pig.jar</em> by running a unit test: <em>ant test</em></li> - </ol> - </section> - -<section> - <title>Run Pig</title> - <p>Pig has two run modes or exectypes: </p> - <ul> - <li><p> Local Mode: To run Pig in local mode, you need access to a single machine. </p></li> - <li><p> Mapreduce Mode: To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. 
- Pig will automatically allocate and deallocate a 15-node cluster.</p></li> - </ul> - +</section> <section> <title>Grunt Shell</title> -<p>Use Pig's interactive shell, Grunt, to enter pig commands manually. -(You can also run or execute script files from the Grunt shell. See the RUN and EXEC commands in the <a href="piglatin.html">Pig Latin Manual</a>). </p> -<p>Local mode: -</p> +<p>Use Pig's interactive shell, Grunt, to enter pig commands manually. See the <a href="getstarted.html#Sample+Code">Sample Code</a> for instructions about the passwd file used in the example.</p> +<p>You can also run or execute script files from the Grunt shell. See the RUN and EXEC commands in the <a href="piglatin.html">Pig Latin Manual</a>. </p> +<p><strong>Local Mode</strong></p> <source> $ pig -x local </source> -<p>Mapreduce mode: -</p> +<p><strong>Mapreduce Mode</strong> </p> <source> $ pig or $ pig -x mapreduce </source> -<p>The Grunt shell is invoked and you can enter commands at the prompt. +<p>For either mode, the Grunt shell is invoked and you can enter commands at the prompt. The results are displayed to your terminal screen (if DUMP is used) or to a file (if STORE is used). </p> <source> grunt> A = load 'passwd' using PigStorage(':'); grunt> B = foreach A generate $0 as id; grunt> dump B; +grunt> store B; </source> </section> <section> <title>Script Files</title> -<p>Use script files to run Pig commands as batch jobs. See the sample code for the script file (id.pig) used in the examples.</p> -<p>Local mode:</p> +<p>Use script files to run Pig commands as batch jobs. 
See the <a href="getstarted.html#Sample+Code">Sample Code</a> for instructions about the passwd file and the script file (id.pig) used in the example.</p> +<p><strong>Local Mode</strong></p> <source> $ pig -x local id.pig </source> -<p>Mapreduce mode: </p> +<p><strong>Mapreduce Mode</strong> </p> <source> $ pig id.pig or $ pig -x mapreduce id.pig </source> -<p>The Pig Latin statements are executed and the results are displayed to your terminal screen (if DUMP is used) or to a file (if STORE is used).</p> +<p>For either mode, the Pig Latin statements are executed and the results are displayed to your terminal screen (if DUMP is used) or to a file (if STORE is used).</p> </section> +</section> + <section> - <title>Embedded Programs</title> -<p>Embed Pig commands in a host language and run the program. -See the sample code for the java files (idlocal.java, idmapreduce.java) used in the examples.</p> - <section> -<title> Local Mode</title> + <title>Advanced Pig</title> + + <section> + <title>Build Pig</title> + <p>To build pig, do the following:</p> + <ol> + <li> Check out the Pig code from SVN: <em>svn co http://svn.apache.org/repos/asf/hadoop/pig/trunk</em>. </li> + <li> Build the code from the top directory: <em>ant</em>. If the build is successful, you should see the <em>pig.jar</em> created in that directory. </li> + <li> Validate your <em>pig.jar</em> by running a unit test: <em>ant test</em></li> + </ol> + </section> + +<section> + <title>Environment Variables and Properties</title> + <p>Refer to the <a href="getstarted.html#Download+Pig">Download Pig</a> section.</p> + <p>The Pig environment variables are described in the Pig script file, located in the /pig-n.n.n/bin directory.</p> + <p>The Pig properties file, pig.properties, is located in the /pig-n.n.n/conf directory. 
You can specify an alternate location using the PIG_CONF_DIR environment variable.</p> +</section> + +<section> +<title>Embedded Programs</title> +<p>Used the embedded option to embed Pig commands in a host language and run the program. +See the <a href="getstarted.html#Sample+Code">Sample Code</a> for instructions about the passwd file and java files (idlocal.java, idmapreduce.java) used in the examples.</p> + +<p><strong>Local Mode</strong></p> <p>From your current working directory, compile the program: </p> <source> $ javac -cp pig.jar idlocal.java @@ -134,9 +148,8 @@ Cygwin: $ java -cp ".;pig.jar" idlocal </source> <p>To view the results, check the output file, id.out. </p> -</section> -<section> -<title>Mapreduce Mode</title> + +<p><strong>Mapreduce Mode</strong></p> <p>Point $HADOOPDIR to the directory that contains the hadoop-site.xml file. Example: </p> <source> @@ -155,19 +168,17 @@ Cygwin: $ java -cp ".;pig.jar;$HADOOPDIR" idhadoop </source> <p>To view the results, check the idout directory on your Hadoop system. </p> - -</section> </section> </section> + <section> <title>Sample Code</title> <p>The sample code is based on Pig Latin statements that extract all user IDs from the /etc/passwd file. </p> <p>Copy the /etc/passwd file to your local working directory.</p> -<section> -<title>id.pig</title> +<p><strong>id.pig</strong></p> <p>For the Grunt Shell and script files. </p> <source> A = load 'passwd' using PigStorage(':'); @@ -175,10 +186,8 @@ dump B; store B into 'id.out'; </source> -</section> -<section> -<title>idlocal.java</title> +<p><strong>idlocal.java</strong></p> <p>For embedded programs. </p> <source> import java.io.IOException; @@ -199,10 +208,8 @@ } } </source> -</section> -<section> -<title>idmapreduce.java</title> +<p><strong>idmapreduce.java</strong></p> <p>For embedded programs. </p> <source> import java.io.IOException; @@ -223,10 +230,7 @@ } } </source> -</section> - </section> - </body> </document>