svn commit: r813143 [1/4] - in /hadoop/pig/trunk: ./ src/docs/src/documentation/content/xdocs/

olga Wed, 09 Sep 2009 15:28:51 -0700

Author: olga
Date: Wed Sep  9 22:28:08 2009
New Revision: 813143

URL: http://svn.apache.org/viewvc?rev=813143&view=rev
Log:
PIG-938: documentation changes for Pig 0.4.0 release (chandec via olgan)


Added:
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_reference.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_users.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/setup.xml
Modified:
    hadoop/pig/trunk/CHANGES.txt
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/cookbook.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/index.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/site.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/tabs.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/udf.xml

Modified: hadoop/pig/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/CHANGES.txt?rev=813143&r1=813142&r2=813143&view=diff
==============================================================================
--- hadoop/pig/trunk/CHANGES.txt (original)
+++ hadoop/pig/trunk/CHANGES.txt Wed Sep  9 22:28:08 2009
@@ -28,6 +28,8 @@
 
 IMPROVEMENTS
 
+PIG-938: documentation changes for Pig 0.4.0 release (chandec via olgan)
+
 PIG-578: join ... outer, ... outer semantics are a no-ops, should produce
 corresponding null values (pradeepkth)
 

Modified: hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/cookbook.xml
URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/cookbook.xml?rev=813143&r1=813142&r2=813143&view=diff
==============================================================================
--- hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/cookbook.xml (original)
+++ hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/cookbook.xml Wed Sep  9 22:28:08 2009
@@ -33,148 +33,20 @@
 <section>
 <title>Performance Enhancers</title>
 
-<p>The following is a list of tips that people have discovered for making their Pig queries run faster. Please feel free to add any tips you have. </p>
 
 <section>
-<title>Use Latest Code</title>
-
-<p>The latest code was merged into trunk on 1/12/09. It is significantly faster than the currently released code in Pig 0.1.1. We plan to release Pig 0.2.0, which incorporates the new changes, shortly. Here is the performance comparison: </p>
-
-<table>
-<tr>
-<td>
-<p><strong>Query Type</strong> </p>
-</td>
-<td>
-<p> <strong>Pig 1.4 (s)</strong> </p>
-</td>
-<td>
-<p> <strong>Pig 2.0 (s)</strong> </p>
-</td>
-<td>
-<p> <strong>Improvement (times)</strong> </p>
-</td>
-</tr>
-<tr>
-<td>
-<p> GENERATE with Arithmetic operations </p>
-</td>
-<td>
-<p> 837 </p>
-</td>
-<td>
-<p> 345 </p>
-</td>
-<td>
-<p> <strong>2.4x</strong> </p>
-</td>
-</tr>
-<tr>
-<td>
-<p> DISTINCT with 1 key </p>
-</td>
-<td>
-<p> 186 </p>
-</td>
-<td>
-<p> 129 </p>
-</td>
-<td>
-<p> 1.4x </p>
-</td>
-</tr>
-<tr>
-<td>
-<p> DISTINCT with 2 keys </p>
-</td>
-<td>
-<p> 436 </p>
-</td>
-<td>
-<p> 184 </p>
-</td>
-<td>
-<p> <strong>2.4x</strong> </p>
-</td>
-</tr>
-<tr>
-<td>
-<p> GROUP </p>
-</td>
-<td>
-<p> 534 </p>
-</td>
-<td>
-<p> 404 </p>
-</td>
-<td>
-<p> 1.3x </p>
-</td>
-</tr>
-<tr>
-<td>
-<p> GROUP ALL </p>
-</td>
-<td>
-<p> 3594 </p>
-</td>
-<td>
-<p> 394 </p>
-</td>
-<td>
-<p> <strong>9x</strong> </p>
-</td>
-</tr>
-<tr>
-<td>
-<p> JOIN </p>
-</td>
-<td>
-<p> 15376 </p>
-</td>
-<td>
-<p> 12783 </p>
-</td>
-<td>
-<p> 1.2x </p>
-</td>
-</tr>
-<tr>
-<td>
-<p> ORDER BY 1 key </p>
-</td>
-<td>
-<p> 640 </p>
-</td>
-<td>
-<p> 316 </p>
-</td>
-<td>
-<p> <strong>2x</strong> </p>
-</td>
-</tr>
-<tr>
-<td>
-<p> ORDER BY 2 keys </p>
-</td>
-<td>
-<p> 767 </p>
-</td>
-<td>
-<p> 472 </p>
-</td>
-<td>
-<p> 1.6x </p>
-</td>
-</tr>
-</table>
-
+<title>Use Optimization</title>
+<p>Pig supports various <a href="piglatin_users.html#Optimization+Rules">optimization rules</a> which are turned on by default. 
+Become familiar with these rules.</p>
 </section>
 
+
 <section>
 <title>Use Types</title>
 
-<p>If types are not specified in the load statement, Pig assumes the type <code>double</code> for numeric computations. Often your data has a smaller type, such as integer or long. Specifying the actual type speeds up arithmetic computation and has the additional advantage of early error detection. </p>
+<p>If types are not specified in the load statement, Pig assumes the type <code>double</code> for numeric computations. 
+Often your data has a smaller type, such as integer or long. Specifying the actual type speeds up 
+arithmetic computation and has the additional advantage of early error detection. </p>
 
 <source>
 --Query 1
@@ -280,7 +152,7 @@
 <p>Queries that can take advantage of the combiner generally run much faster (sometimes several times faster) than the versions that don't. 
 The latest code significantly improves combiner usage; however, you need to make sure you do your part. 
 If you have a UDF that works on grouped data and is, by nature, algebraic (meaning its computation can be decomposed into multiple steps), 
-make sure you implement it as such. For details on how to write algebraic UDFs, see <a href="http://wiki.apache.org/pig/UDFManual">UDF Manual</a>. </p>
+make sure you implement it as such. For details on how to write algebraic UDFs, see the <a href="udf.html">Pig UDF Manual</a>. </p>
 
 <source>
 A = load 'data' as (x, y, z);
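The combiner tip above rests on the idea of an algebraic computation: a UDF whose work splits into an initial step over raw values, an intermediate step that merges partial results (the part a combiner can run), and a final step that produces the answer. A minimal plain-Java sketch of that decomposition for an average follows. It is not Pig's UDF API (Pig expresses this through the Algebraic interface described in the Pig UDF Manual), and the class and method names (AlgebraicAverageSketch, initial, intermediate, finalValue) are hypothetical.

```java
import java.util.List;

// Hypothetical, Pig-independent sketch of an algebraic computation (average).
// Every step works on (sum, count) partials, so partials can be merged in any
// grouping: this is what allows a combiner to run between map and reduce.
public class AlgebraicAverageSketch {

    // A partial result: running sum and number of values seen so far.
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Initial step: one chunk of raw values (e.g. one map task's share of a group).
    static Partial initial(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return new Partial(sum, values.size());
    }

    // Intermediate step: merge partials (the work a combiner can take over).
    static Partial intermediate(List<Partial> partials) {
        double sum = 0.0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    // Final step: merge whatever partials reach the reducer and compute the average.
    static double finalValue(List<Partial> partials) {
        Partial merged = intermediate(partials);
        return merged.count == 0 ? Double.NaN : merged.sum / merged.count;
    }

    public static void main(String[] args) {
        // Two "map tasks", each holding part of the same group's values.
        Partial first = initial(List.of(1.0, 2.0, 3.0));
        Partial second = initial(List.of(4.0, 5.0));
        // A combiner may pre-merge some partials; the result is unaffected.
        Partial combined = intermediate(List.of(first));
        System.out.println(finalValue(List.of(combined, second))); // prints 3.0
    }
}
```

Because every step consumes and produces the same (sum, count) shape, the answer does not depend on whether or how often a combiner ran. A computation such as a median over raw values has no such small partial state, which is why it cannot be decomposed this way.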

Modified: hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml
URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml?rev=813143&r1=813142&r2=813143&view=diff
==============================================================================
--- hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml (original)
+++ hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml Wed Sep  9 22:28:08 2009
@@ -1,236 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!--
-  Licensed to the Apache Software Foundation (ASF) under one or more
-  contributor license agreements.  See the NOTICE file distributed with
-  this work for additional information regarding copyright ownership.
-  The ASF licenses this file to You under the Apache License, Version 2.0
-  (the "License"); you may not use this file except in compliance with
-  the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License.
--->
-<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
-<document>
-  <header>
-    <title>Pig Getting Started Guide</title>
-  </header>
-  <body>
- 
-<section>
-<title>Overview</title>
-    <section id="req">
-      <title>Requirements</title>
-      <p><strong>Unix</strong> and <strong>Windows</strong> users need the following:</p>
-               <ol>
-                 <li> <strong>Hadoop 18</strong> - <a href="http://hadoop.apache.org/core/">http://hadoop.apache.org/core/</a></li>
-                 <li> <strong>Java 1.6</strong> - <a href="http://java.sun.com/javase/downloads/index.jsp">http://java.sun.com/javase/downloads/index.jsp</a> Set JAVA_HOME to the root of your Java installation.</li>
-                 <li> <strong>Ant 1.7</strong> - (optional, for builds) <a href="http://ant.apache.org/">http://ant.apache.org/</a></li>
-                 <li> <strong>JUnit 4.5</strong> - (optional, for unit tests) <a href="http://junit.sourceforge.net/">http://junit.sourceforge.net/</a></li>
-               </ol>
-       <p><strong>Windows</strong> users need to install Cygwin and the Perl package: <a href="http://www.cygwin.com/"> http://www.cygwin.com/</a></p>
-    </section>
-       <section>
-               <title>Run Modes</title>
-               <p>Pig has two run modes or exectypes:  </p>
-    <ul>
-      <li><p> Local Mode - To run Pig in local mode, you need access to a single machine.  </p></li>
-      <li><p> Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. 
-      Pig will automatically allocate and deallocate a 15-node cluster.</p></li>
-    </ul>
-    <p>You can run the Grunt shell, Pig scripts, or embedded programs using either mode.</p>
-    </section>         
-</section>      
-        
-        
-<section>
-<title>Beginning Pig</title>
-    <section>
-       <title>Download Pig</title>
-       <p>To get a Pig distribution, download a recent stable release from one of the Apache Download Mirrors (see <a href="http://hadoop.apache.org/pig/releases.html"> Pig Releases</a>).</p>
-       <p>Unpack the downloaded Pig distribution. The Pig script is located in the bin directory (/pig-n.n.n/bin/pig).</p>
-       <p>Add /pig-n.n.n/bin to your path. Use export (bash,sh,ksh) or setenv (tcsh,csh). For example: </p>
-<source>
-$ export PATH=/&lt;my-path-to-pig&gt;/pig-n.n.n/bin:$PATH
-</source>
-       <p>Try the following command to get a list of Pig commands: </p>       
-<source>
-$ pig -help
-</source>
-       <p>Try the following command to start the Grunt shell:</p>
-<source>
-$ pig 
-</source>
-</section>  
-
-<section>
-<title>Grunt Shell</title>
-<p>Use Pig's interactive shell, Grunt, to enter Pig commands manually. See the <a href="getstarted.html#Sample+Code">Sample Code</a> for instructions about the passwd file used in the example.</p>
-<p>You can also run or execute script files from the Grunt shell. See the RUN and EXEC commands in the <a href="piglatin.html">Pig Latin Manual</a>. </p>
-<p><strong>Local Mode</strong></p>
-<source>
-$ pig -x local
-</source>
-<p><strong>Mapreduce Mode</strong> </p>
-<source>
-$ pig
-or
-$ pig -x mapreduce
-</source>
-<p>For either mode, the Grunt shell is invoked and you can enter commands at the prompt. The results are displayed to your terminal screen (if DUMP is used) or to a file (if STORE is used).
-</p>
-<source>
-grunt&gt; A = load 'passwd' using PigStorage(':'); 
-grunt&gt; B = foreach A generate $0 as id; 
-grunt&gt; dump B; 
-grunt&gt; store B into 'id.out'; 
-</source>
-</section>
-
-<section>
-<title>Script Files</title>
-<p>Use script files to run Pig commands as batch jobs. See the <a href="getstarted.html#Sample+Code">Sample Code</a> for instructions about the passwd file and the script file (id.pig) used in the example.</p>
-<p><strong>Local Mode</strong></p>
-<source>
-$ pig -x local id.pig
-</source>
-<p><strong>Mapreduce Mode</strong> </p>
-<source>
-$ pig id.pig
-or
-$ pig -x mapreduce id.pig
-</source>
-<p>For either mode, the Pig Latin statements are executed and the results are displayed to your terminal screen (if DUMP is used) or to a file (if STORE is used).</p>
-</section>
-</section>
-
-
-<section>
- <title>Advanced Pig</title>
-
-    <section>
-      <title>Build Pig</title>
-      <p>To build Pig, do the following:</p>
-     <ol>
-         <li> Check out the Pig code from SVN: <em>svn co http://svn.apache.org/repos/asf/hadoop/pig/trunk</em>. </li>
-         <li> Build the code from the top directory: <em>ant</em>. If the build is successful, you should see the <em>pig.jar</em> created in that directory. </li>    
-         <li> Validate your <em>pig.jar</em> by running a unit test: <em>ant test</em></li>
-     </ol>
-    </section>
-
-<section>
-       <title>Environment Variables and Properties</title>
-       <p>Refer to the <a href="getstarted.html#Download+Pig">Download Pig</a> section.</p>
-       <p>The Pig environment variables are described in the Pig script file, located in the /pig-n.n.n/bin directory.</p>
-       <p>The Pig properties file, pig.properties, is located in the /pig-n.n.n/conf directory. You can specify an alternate location using the PIG_CONF_DIR environment variable.</p>
-</section>
-
-<section>
-<title>Embedded Programs</title>
-<p>Use the embedded option to embed Pig commands in a host language and run the program. 
-See the <a href="getstarted.html#Sample+Code">Sample Code</a> for instructions about the passwd file and Java files (idlocal.java, idmapreduce.java) used in the examples.</p>
-
-<p><strong>Local Mode</strong></p>
-<p>From your current working directory, compile the program: </p>
-<source>
-$ javac -cp pig.jar idlocal.java
-</source>
-<p>Note: idlocal.class is written to your current working directory. Include "." in the class path when you run the program. </p>
-<p>From your current working directory, run the program: 
-</p>
-<source>
-Unix:   $ java -cp pig.jar:. idlocal
-Cygwin: $ java -cp ".;pig.jar" idlocal
-</source>
-<p>To view the results, check the output file, id.out. </p>
-
-<p><strong>Mapreduce Mode</strong></p>
-<p>Point $HADOOPDIR to the directory that contains the hadoop-site.xml file. Example: 
-</p>
-<source>
-$ export HADOOPDIR=/yourHADOOPsite/conf 
-</source>
-<p>From your current working directory, compile the program: 
-</p>
-<source>
-$ javac -cp pig.jar idmapreduce.java
-</source>
-<p>Note: idmapreduce.class is written to your current working directory. Include "." in the class path when you run the program. </p>
-<p>From your current working directory, run the program: 
-</p>
-<source>
-Unix:   $ java -cp pig.jar:.:$HADOOPDIR idmapreduce
-Cygwin: $ java -cp ".;pig.jar;$HADOOPDIR" idmapreduce
-</source>
-<p>To view the results, check the idout directory on your Hadoop system. </p>
-</section>
-</section>
-
-
-<section>
-<title>Sample Code</title>
-
-<p>The sample code is based on Pig Latin statements that extract all user IDs from the /etc/passwd file. </p>
-<p>Copy the /etc/passwd file to your local working directory.</p>
-       
-<p><strong>id.pig</strong></p>
-<p>For the Grunt Shell and script files. </p>
-<source>
-A = load 'passwd' using PigStorage(':'); 
-B = foreach A generate $0 as id;
-dump B; 
-store B into 'id.out';
-</source>
-
-<p><strong>idlocal.java</strong></p>
-<p>For embedded programs. </p>
-<source>
-import java.io.IOException;
-import org.apache.pig.PigServer;
-public class idlocal{ 
-public static void main(String[] args) {
-try {
-    PigServer pigServer = new PigServer("local");
-    runIdQuery(pigServer, "passwd");
-    }
-    catch(Exception e) {
-        // Report failures instead of silently ignoring them.
-        e.printStackTrace();
-    }
- }
-public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
-    pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(':');");
-    pigServer.registerQuery("B = foreach A generate $0 as id;");
-    pigServer.store("B", "id.out");
- }
-}
-</source>
-
-<p><strong>idmapreduce.java</strong></p>
-<p>For embedded programs. </p>
-<source>
-import java.io.IOException;
-import org.apache.pig.PigServer;
-public class idmapreduce{
-   public static void main(String[] args) {
-   try {
-     PigServer pigServer = new PigServer("mapreduce");
-     runIdQuery(pigServer, "passwd");
-   }
-   catch(Exception e) {
-     // Report failures instead of silently ignoring them.
-     e.printStackTrace();
-   }
-}
-public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
-   pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(':');");
-   pigServer.registerQuery("B = foreach A generate $0 as id;");
-   pigServer.store("B", "idout");
-   }
-}
-</source>
-
-</section>
-</body>
-</document>
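The id.pig sample in the removed getstarted.xml loads the passwd file with PigStorage(':') and keeps field $0 as id. As a rough, Pig-independent illustration of what that pipeline computes, here is a plain-Java sketch. The class PasswdIds and method userIds are hypothetical, and skipping blank lines is a simplifying assumption of the sketch, not documented PigStorage behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Pig-independent sketch of what id.pig computes:
//   A = load 'passwd' using PigStorage(':');   -- split each line on ':'
//   B = foreach A generate $0 as id;           -- keep the first field
public class PasswdIds {

    // Returns the first ':'-separated field of every non-blank line.
    // Skipping blank lines is a simplifying assumption of this sketch.
    static List<String> userIds(List<String> lines) {
        List<String> ids = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            // A limit of -1 keeps empty fields, so fields[0] always exists,
            // even for a line such as ":x:1".
            String[] fields = line.split(":", -1);
            ids.add(fields[0]);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> passwd = List.of(
            "root:x:0:0:root:/root:/bin/bash",
            "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin");
        System.out.println(userIds(passwd)); // prints [root, daemon]
    }
}
```

Pig itself would return each ID as a one-field tuple rather than a plain string; the sketch only mirrors the split-and-project logic.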

Modified: hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/index.xml
URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/index.xml?rev=813143&r1=813142&r2=813143&view=diff
==============================================================================
--- hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/index.xml (original)
+++ hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/index.xml Wed Sep  9 22:28:08 2009
@@ -27,9 +27,12 @@
         The Pig Documentation provides the information you need to get started using Pig.
       </p>
       <p>
-        Begin with the <a href="getstarted.html"> Pig Getting Started Guide</a> which shows you how to download and run Pig. 
-        Then try out the <a href="tutorial.html">Pig Tutorial</a> to get an idea of how easy it us to use Pig. 
-        When you are ready to start writing your own scripts, read through the <a href="piglatin.html">Pig Latin Manual </a>to become familiar with Pig's features. 
+        Begin with the <a href="setup.html"> Pig Setup</a> which shows you how to download and run Pig. 
+        Then try out the <a href="tutorial.html">Pig Tutorial</a> to get an idea of how easy it is to use Pig. 
+      </p>
+      <p>  
+        When you are ready to start writing your own scripts, read through the <a href="piglatin_users.html">Pig Latin Users Guide</a>
+        and the <a href="piglatin_reference.html">Pig Latin Reference Manual</a> to become familiar with Pig's features. 
         Also review the <a href="cookbook.html">Pig Cookbook</a> to learn how to tweak your code for optimal performance.
       </p>
       <p>
