Author: olga
Date: Wed Sep 9 22:28:08 2009
New Revision: 813143

URL: http://svn.apache.org/viewvc?rev=813143&view=rev
Log:
PIG-938: documentation changes for Pig 0.4.0 release (chandec via olgan)
Added:
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_reference.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_users.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/setup.xml
Modified:
    hadoop/pig/trunk/CHANGES.txt
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/cookbook.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/index.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/site.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/tabs.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/udf.xml

Modified: hadoop/pig/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/CHANGES.txt?rev=813143&r1=813142&r2=813143&view=diff
==============================================================================
--- hadoop/pig/trunk/CHANGES.txt (original)
+++ hadoop/pig/trunk/CHANGES.txt Wed Sep 9 22:28:08 2009
@@ -28,6 +28,8 @@
 IMPROVEMENTS
+PIG-938: documentation changes for Pig 0.4.0 release (chandec via olgan)
+
 PIG-578: join ... outer, ... outer semantics are a no-ops, should produce corresponding null values (pradeepkth)

Modified: hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/cookbook.xml
URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/cookbook.xml?rev=813143&r1=813142&r2=813143&view=diff
==============================================================================
--- hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/cookbook.xml (original)
+++ hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/cookbook.xml Wed Sep 9 22:28:08 2009
@@ -33,148 +33,20 @@
 <section>
 <title>Performance Enhancers</title>
-<p>The following are a list of tips that people have discovered for making their pig queries run faster.
Please feel free to add any tips you have. </p> <section> -<title>Use Latest Code</title> - -<p>The latest code has been merged into trunk on 1/12/09. It is significantly faster than the currently released code in Pig 0.1.1. We are planning to release Pig 0.2.0 that incorporates new changes shortly. Here is the performance comparison: </p> - -<table> -<tr> -<td> -<p><strong>Query Type</strong> </p> -</td> -<td> -<p> <strong>Pig 1.4 (s)</strong> </p> -</td> -<td> -<p> <strong>Pig 2.0 (s)</strong> </p> -</td> -<td> -<p> <strong>Improvement (times)</strong> </p> -</td> -</tr> -<tr> -<td> -<p> GENERATE with Arithmetic operations </p> -</td> -<td> -<p> 837 </p> -</td> -<td> -<p> 345 </p> -</td> -<td> -<p> <strong>2.4x</strong> </p> -</td> -</tr> -<tr> -<td> -<p> DISTINCT with 1 key </p> -</td> -<td> -<p> 186 </p> -</td> -<td> -<p> 129 </p> -</td> -<td> -<p> 1.4x </p> -</td> -</tr> -<tr> -<td> -<p> DISTINCT with 2 key s </p> -</td> -<td> -<p> 436 </p> -</td> -<td> -<p> 184 </p> -</td> -<td> -<p> <strong>2.4x</strong> </p> -</td> -</tr> -<tr> -<td> -<p> GROUP </p> -</td> -<td> -<p> 534 </p> -</td> -<td> -<p> 404 </p> -</td> -<td> -<p> 1.3x </p> -</td> -</tr> -<tr> -<td> -<p> GROUP ALL </p> -</td> -<td> -<p> 3594 </p> -</td> -<td> -<p> 394 </p> -</td> -<td> -<p> <strong>9x</strong> </p> -</td> -</tr> -<tr> -<td> -<p> JOIN </p> -</td> -<td> -<p> 15376 </p> -</td> -<td> -<p> 12783 </p> -</td> -<td> -<p> 1.2 </p> -</td> -</tr> -<tr> -<td> -<p> ORDER BY 1 key </p> -</td> -<td> -<p> 640 </p> -</td> -<td> -<p> 316 </p> -</td> -<td> -<p> <strong>2x</strong> </p> -</td> -</tr> -<tr> -<td> -<p> ORDER BY 2 keys </p> -</td> -<td> -<p> 767 </p> -</td> -<td> -<p> 472 </p> -</td> -<td> -<p> 1.6 x </p> -</td> -</tr> -</table> - +<title>Use Optimization</title> +<p>Pig supports various <a href="piglatin_users.html#Optimization+Rules">optimization rules</a> which are turned on by default. 
+Become familiar with these rules.</p> </section> + <section> <title>Use Types</title> -<p>If types are not specified in the load statement, Pig assumes the type of =double= for numeric computations. A lot of the time, your data would be much smaller, maybe, integer or long. Specifying the real type will help with speed of arithmetic computation. It has an additional advantage of early error detection. </p> +<p>If types are not specified in the load statement, Pig assumes the type of =double= for numeric computations. +A lot of the time, your data would be much smaller, maybe, integer or long. Specifying the real type will help with +speed of arithmetic computation. It has an additional advantage of early error detection. </p> <source> --Query 1 @@ -280,7 +152,7 @@ <p>Queries that can take advantage of the combiner generally ran much faster (sometimes several times faster) than the versions that don't. The latest code significantly improves combiner usage; however, you need to make sure you do your part. If you have a UDF that works on grouped data and is, by nature, algebraic (meaning their computation can be decomposed into multiple steps) -make sure you implement it as such. For details on how to write algebraic UDFs, see <a href="http://wiki.apache.org/pig/UDFManual">UDF Manual</a>. </p> +make sure you implement it as such. For details on how to write algebraic UDFs, see the <a href="udf.html">Pig UDF Manual</a>. 
</p> <source> A = load 'data' as (x, y, z) Modified: hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml?rev=813143&r1=813142&r2=813143&view=diff ============================================================================== --- hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml (original) +++ hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml Wed Sep 9 22:28:08 2009 @@ -1,236 +0,0 @@ -<?xml version="1.0" encoding="UTF-8"?> -<!-- - Licensed to the Apache Software Foundation (ASF) under one or more - contributor license agreements. See the NOTICE file distributed with - this work for additional information regarding copyright ownership. - The ASF licenses this file to You under the Apache License, Version 2.0 - (the "License"); you may not use this file except in compliance with - the License. You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. 
---> -<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd"> -<document> - <header> - <title>Pig Getting Started Guide</title> - </header> - <body> - -<section> -<title>Overview</title> - <section id="req"> - <title>Requirements</title> - <p><strong>Unix</strong> and <strong>Windows</strong> users need the following:</p> - <ol> - <li> <strong>Hadoop 18</strong> - <a href="http://hadoop.apache.org/core/">http://hadoop.apache.org/core/</a></li> - <li> <strong>Java 1.6</strong> - <a href="http://java.sun.com/javase/downloads/index.jsp">http://java.sun.com/javase/downloads/index.jsp</a> Set JAVA_HOME to the root of your Java installation.</li> - <li> <strong>Ant 1.7</strong> - (optional, for builds) <a href="http://ant.apache.org/">http://ant.apache.org/</a></li> - <li> <strong>JUnit 4.5</strong> - (optional, for unit tests) <a href="http://junit.sourceforge.net/">http://junit.sourceforge.net/</a></li> - </ol> - <p><strong>Windows</strong> users need to install Cygwin and the Perl package: <a href="http://www.cygwin.com/"> http://www.cygwin.com/</a></p> - </section> - <section> - <title>Run Modes</title> - <p>Pig has two run modes or exectypes: </p> - <ul> - <li><p> Local Mode - To run Pig in local mode, you need access to a single machine. </p></li> - <li><p> Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. - Pig will automatically allocate and deallocate a 15-node cluster.</p></li> - </ul> - <p>You can run the Grunt shell, Pig scripts, or embedded programs using either mode.</p> - </section> -</section> - - -<section> -<title>Beginning Pig</title> - <section> - <title>Download Pig</title> - <p>To get a Pig distribution, download a recent stable release from one of the Apache Download Mirrors (see <a href="http://hadoop.apache.org/pig/releases.html"> Pig Releases</a>).</p> - <p>Unpack the downloaded Pig distribution. 
The Pig script is located in the bin directory (/pig-n.n.n/bin/pig).</p> - <p>Add /pig-n.n.n/bin to your path. Use export (bash,sh,ksh) or setenv (tcsh,csh). For example: </p> -<source> -$ export PATH=/<my-path-to-pig>/pig-n.n.n/bin:$PATH -</source> - <p>Try the following command, to get a list of Pig commands: </p> -<source> -$ pig -help -</source> - <p>Try the following command, to start the Grunt shell:</p> -<source> -$ pig -</source> -</section> - -<section> -<title>Grunt Shell</title> -<p>Use Pig's interactive shell, Grunt, to enter pig commands manually. See the <a href="getstarted.html#Sample+Code">Sample Code</a> for instructions about the passwd file used in the example.</p> -<p>You can also run or execute script files from the Grunt shell. See the RUN and EXEC commands in the <a href="piglatin.html">Pig Latin Manual</a>. </p> -<p><strong>Local Mode</strong></p> -<source> -$ pig -x local -</source> -<p><strong>Mapreduce Mode</strong> </p> -<source> -$ pig -or -$ pig -x mapreduce -</source> -<p>For either mode, the Grunt shell is invoked and you can enter commands at the prompt. The results are displayed to your terminal screen (if DUMP is used) or to a file (if STORE is used). -</p> -<source> -grunt> A = load 'passwd' using PigStorage(':'); -grunt> B = foreach A generate $0 as id; -grunt> dump B; -grunt> store B; -</source> -</section> - -<section> -<title>Script Files</title> -<p>Use script files to run Pig commands as batch jobs. 
See the <a href="getstarted.html#Sample+Code">Sample Code</a> for instructions about the passwd file and the script file (id.pig) used in the example.</p> -<p><strong>Local Mode</strong></p> -<source> -$ pig -x local id.pig -</source> -<p><strong>Mapreduce Mode</strong> </p> -<source> -$ pig id.pig -or -$ pig -x mapreduce id.pig -</source> -<p>For either mode, the Pig Latin statements are executed and the results are displayed to your terminal screen (if DUMP is used) or to a file (if STORE is used).</p> -</section> -</section> - - -<section> - <title>Advanced Pig</title> - - <section> - <title>Build Pig</title> - <p>To build pig, do the following:</p> - <ol> - <li> Check out the Pig code from SVN: <em>svn co http://svn.apache.org/repos/asf/hadoop/pig/trunk</em>. </li> - <li> Build the code from the top directory: <em>ant</em>. If the build is successful, you should see the <em>pig.jar</em> created in that directory. </li> - <li> Validate your <em>pig.jar</em> by running a unit test: <em>ant test</em></li> - </ol> - </section> - -<section> - <title>Environment Variables and Properties</title> - <p>Refer to the <a href="getstarted.html#Download+Pig">Download Pig</a> section.</p> - <p>The Pig environment variables are described in the Pig script file, located in the /pig-n.n.n/bin directory.</p> - <p>The Pig properties file, pig.properties, is located in the /pig-n.n.n/conf directory. You can specify an alternate location using the PIG_CONF_DIR environment variable.</p> -</section> - -<section> -<title>Embedded Programs</title> -<p>Used the embedded option to embed Pig commands in a host language and run the program. 
-See the <a href="getstarted.html#Sample+Code">Sample Code</a> for instructions about the passwd file and java files (idlocal.java, idmapreduce.java) used in the examples.</p> - -<p><strong>Local Mode</strong></p> -<p>From your current working directory, compile the program: </p> -<source> -$ javac -cp pig.jar idlocal.java -</source> -<p>Note: idlocal.class is written to your current working directory. Include "." in the class path when you run the program. </p> -<p>From your current working directory, run the program: -</p> -<source> -Unix: $ java -cp pig.jar:. idlocal -Cygwin: $ java -cp ".;pig.jar" idlocal -</source> -<p>To view the results, check the output file, id.out. </p> - -<p><strong>Mapreduce Mode</strong></p> -<p>Point $HADOOPDIR to the directory that contains the hadoop-site.xml file. Example: -</p> -<source> -$ export HADOOPDIR=/yourHADOOPsite/conf -</source> -<p>From your current working directory, compile the program: -</p> -<source> -$ javac -cp pig.jar idmapreduce.java -</source> -<p>Note: idmapreduce.class is written to your current working directory. Include "." in the class path when you run the program. </p> -<p>From your current working directory, run the program: -</p> -<source> -Unix: $ java -cp pig.jar:.:$HADOOPDIR idmapreduce -Cygwin: $ java -cp ".;pig.jar;$HADOOPDIR" idmapreduce -</source> -<p>To view the results, check the idout directory on your Hadoop system. </p> -</section> -</section> - - -<section> -<title>Sample Code</title> - -<p>The sample code is based on Pig Latin statements that extract all user IDs from the /etc/passwd file. </p> -<p>Copy the /etc/passwd file to your local working directory.</p> - -<p><strong>id.pig</strong></p> -<p>For the Grunt Shell and script files. </p> -<source> -A = load 'passwd' using PigStorage(':'); -B = foreach A generate $0 as id; -dump B; -store B into 'id.out'; -</source> - -<p><strong>idlocal.java</strong></p> -<p>For embedded programs.
</p> -<source> -import java.io.IOException; -import org.apache.pig.PigServer; -public class idlocal{ -public static void main(String[] args) { -try { - PigServer pigServer = new PigServer("local"); - runIdQuery(pigServer, "passwd"); - } - catch(Exception e) { - } - } -public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException { - pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(':');"); - pigServer.registerQuery("B = foreach A generate $0 as id;"); - pigServer.store("B", "id.out"); - } -} -</source> - -<p><strong>idmapreduce.java</strong></p> -<p>For embedded programs. </p> -<source> -import java.io.IOException; -import org.apache.pig.PigServer; -public class idmapreduce{ - public static void main(String[] args) { - try { - PigServer pigServer = new PigServer("mapreduce"); - runIdQuery(pigServer, "passwd"); - } - catch(Exception e) { - } -} -public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException { - pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(':');") - pigServer.registerQuery("B = foreach A generate $0 as id;"); - pigServer.store("B", "idout"); - } -} -</source> - -</section> -</body> -</document> Modified: hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/index.xml URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/index.xml?rev=813143&r1=813142&r2=813143&view=diff ============================================================================== --- hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/index.xml (original) +++ hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/index.xml Wed Sep 9 22:28:08 2009 @@ -27,9 +27,12 @@ The Pig Documentation provides the information you need to get started using Pig. </p> <p> - Begin with the <a href="getstarted.html"> Pig Getting Started Guide</a> which shows you how to download and run Pig. 
- Then try out the <a href="tutorial.html">Pig Tutorial</a> to get an idea of how easy it us to use Pig. - When you are ready to start writing your own scripts, read through the <a href="piglatin.html">Pig Latin Manual </a>to become familiar with Pig's features. + Begin with the <a href="setup.html"> Pig Setup</a> which shows you how to download and run Pig. + Then try out the <a href="tutorial.html">Pig Tutorial</a> to get an idea of how easy it is to use Pig. + </p> + <p> + When you are ready to start writing your own scripts, read through the <a href="piglatin_users.html">Pig Latin Users Guide</a> + and the <a href="piglatin_reference.html">Pig Latin Reference Manual</a> to become familiar with Pig's features. Also review the <a href="cookbook.html">Pig Cookbook</a> to learn how to tweak your code for optimal performance. </p> <p>