Author: olga Date: Wed Jan 20 19:03:57 2010 New Revision: 901333 URL: http://svn.apache.org/viewvc?rev=901333&view=rev Log: PIG-1192: Pig 0.6 Docs fixes (chandec via olgan)
Modified: hadoop/pig/branches/branch-0.6/CHANGES.txt hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/cookbook.xml hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/index.xml hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/setup.xml hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/site.xml hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/zebra_pig.xml hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/zebra_users.xml Modified: hadoop/pig/branches/branch-0.6/CHANGES.txt URL: http://svn.apache.org/viewvc/hadoop/pig/branches/branch-0.6/CHANGES.txt?rev=901333&r1=901332&r2=901333&view=diff ============================================================================== --- hadoop/pig/branches/branch-0.6/CHANGES.txt (original) +++ hadoop/pig/branches/branch-0.6/CHANGES.txt Wed Jan 20 19:03:57 2010 @@ -26,6 +26,8 @@ IMPROVEMENTS +PIG-1192: Pig 0.6 Docs fixes (chandec via olgan) + PIG-1177: Pig 0.6 Docs - Zebra docs (chandec via olgan) PIG-1175: Pig 0.6 Docs - Store v. Dump (chandec via olgan) Modified: hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/cookbook.xml URL: http://svn.apache.org/viewvc/hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/cookbook.xml?rev=901333&r1=901332&r2=901333&view=diff ============================================================================== --- hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/cookbook.xml (original) +++ hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/cookbook.xml Wed Jan 20 19:03:57 2010 @@ -36,7 +36,7 @@ <section> <title>Use Optimization</title> -<p>Pig supports various <a href="piglatin_users.html#Optimization+Rules">optimization rules</a> which are turned on by default. +<p>Pig supports various <a href="piglatin_ref1.html#Optimization+Rules">optimization rules</a> which are turned on by default. 
Become familiar with these rules.</p> </section> @@ -220,29 +220,34 @@ <section> <title>Specialized Join Optimizations</title> <p>Optimization can also be achieved using fragment replicate joins, skewed joins, and merge joins. -For more information see <a href="piglatin_users.html#Specialized+Joins">Specialized Joins</a>.</p> +For more information see <a href="piglatin_ref1.html#Specialized+Joins">Specialized Joins</a>.</p> </section> </section> <section> -<title>Use the PARALLEL Keyword</title> +<title>Use the PARALLEL Clause</title> -<p>PARALLEL controls the number of reducers invoked by Hadoop. The default value is 1. However, the number of reducers you need for a particular construct in Pig that forms a MapReduce boundary depends entirely on (1) your data and the number of intermediate keys you are generating in your mappers and (2) the partitioner and distribution of map (combiner) output keys. In the best cases we have seen that a reducer processing about 500 MB of data behaves efficiently.</p> +<p>Use the PARALLEL clause to increase the parallelism of a job:</p> +<ul> +<li>PARALLEL sets the number of reduce tasks for the MapReduce jobs generated by Pig. The default value is 1 (one reduce task).</li> +<li>PARALLEL only affects the number of reduce tasks. Map parallelism is determined by the input file, one map for each HDFS block. </li> +<li>If you don't specify PARALLEL, you still get the same map parallelism but only one reduce task.</li> +</ul> +<p></p> +<p>As noted, the default value for PARALLEL is 1 (one reduce task). However, the number of reducers you need for a particular construct in Pig that forms a MapReduce boundary depends entirely on (1) your data and the number of intermediate keys you are generating in your mappers and (2) the partitioner and distribution of map (combiner) output keys. 
In the best cases we have seen that a reducer processing about 500 MB of data behaves efficiently.</p> -<p>The keyword makes sense with any operator that starts a reduce phase. This includes -<a href="piglatin_reference.html#COGROUP">COGROUP</a>, -<a href="piglatin_reference.html#CROSS">CROSS</a>, -<a href="piglatin_reference.html#DISTINCT">DISTINCT</a>, -<a href="piglatin_reference.html#GROUP">GROUP</a>, -<a href="piglatin_reference.html#JOIN">JOIN</a>, -<a href="piglatin_reference.html#ORDER">ORDER</a>, and -<a href="piglatin_reference.html#JOIN%2C+OUTER">OUTER JOIN</a>. - -</p> - -<p>You can set the value of PARALLEL in your scripts in conjunction with the operator (see the example below). You can also set the value of PARALLEL for all scripts using the <a href="piglatin_reference.html#set">set</a> command.</p> +<p>You can include the PARALLEL clause with any operator that starts a reduce phase (see the example below). This includes +<a href="piglatin_ref2.html#COGROUP">COGROUP</a>, +<a href="piglatin_ref2.html#CROSS">CROSS</a>, +<a href="piglatin_ref2.html#DISTINCT">DISTINCT</a>, +<a href="piglatin_ref2.html#GROUP">GROUP</a>, +<a href="piglatin_ref2.html#JOIN+%28inner%29">JOIN (inner)</a>, +<a href="piglatin_ref2.html#JOIN+%28outer%29">JOIN (outer)</a>, and +<a href="piglatin_ref2.html#ORDER">ORDER</a>. + +You can also set the value of PARALLEL for all scripts using the <a href="piglatin_ref2.html#set">set</a> command.</p> <p>Example</p> @@ -251,7 +256,6 @@ B = group A by t PARALLEL 18; ..... 
</source> - </section> <section> Modified: hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/index.xml URL: http://svn.apache.org/viewvc/hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/index.xml?rev=901333&r1=901332&r2=901333&view=diff ============================================================================== --- hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/index.xml (original) +++ hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/index.xml Wed Jan 20 19:03:57 2010 @@ -31,8 +31,8 @@ Then try out the <a href="tutorial.html">Pig Tutorial</a> to get an idea of how easy it is to use Pig. </p> <p> - When you are ready to start writing your own scripts, read through the <a href="piglatin_users.html">Pig Latin Users Guide</a> - and the <a href="piglatin_reference.html">Pig Latin Reference Manual</a> to become familiar with Pig's features. + When you are ready to start writing your own scripts, read through the Pig Latin Reference <a href="piglatin_ref1.html">Manual 1</a> + and <a href="piglatin_ref2.html">Manual 2</a> to become familiar with Pig's features. Also review the <a href="cookbook.html">Pig Cookbook</a> to learn how to tweak your code for optimal performance. </p> <p> Modified: hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/setup.xml URL: http://svn.apache.org/viewvc/hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/setup.xml?rev=901333&r1=901332&r2=901333&view=diff ============================================================================== --- hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/setup.xml (original) +++ hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/setup.xml Wed Jan 20 19:03:57 2010 @@ -71,7 +71,8 @@ <section> <title>Grunt Shell</title> <p>Use Pig's interactive shell, Grunt, to enter pig commands manually. 
See the <a href="setup.html#Sample+Code">Sample Code</a> for instructions about the passwd file used in the example.</p> -<p>You can also run or execute script files from the Grunt shell. See the RUN and EXEC commands in the <a href="piglatin_reference.html">Pig Latin Reference Manual</a>. </p> +<p>You can also run or execute script files from the Grunt shell. +See the <a href="piglatin_ref2.html#run">run</a> and <a href="piglatin_ref2.html#exec">exec</a> commands. </p> <p><strong>Local Mode</strong></p> <source> $ pig -x local Modified: hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/site.xml URL: http://svn.apache.org/viewvc/hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/site.xml?rev=901333&r1=901332&r2=901333&view=diff ============================================================================== --- hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/site.xml (original) +++ hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/site.xml Wed Jan 20 19:03:57 2010 @@ -45,8 +45,8 @@ <tutorial label="Tutorial" href="tutorial.html" /> </docs> <docs label="Guides"> - <plusers label="Pig Latin Users " href="piglatin_users.html" /> - <plref label="Pig Latin Reference" href="piglatin_reference.html" /> + <plref1 label="Pig Latin 1" href="piglatin_ref1.html" /> + <plref2 label="Pig Latin 2" href="piglatin_ref2.html" /> <cookbook label="Cookbook" href="cookbook.html" /> <udf label="UDFs" href="udf.html" /> </docs> Modified: hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/zebra_pig.xml URL: http://svn.apache.org/viewvc/hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/zebra_pig.xml?rev=901333&r1=901332&r2=901333&view=diff ============================================================================== --- hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/zebra_pig.xml (original) +++ 
hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/zebra_pig.xml Wed Jan 20 19:03:57 2010 @@ -29,7 +29,7 @@ <section> <title>Overview</title> <p>With Pig you can load and store data in Zebra format. You can also take advantage of sorted Zebra tables for map-side groups and merge joins. When working with Pig keep in mind that, unlike MapReduce, you do not need to declare Zebra schemas. Zebra automatically converts Zebra schemas to Pig schemas (and vice versa) for you.</p> - + </section> <!-- END OVERVIEW--> @@ -54,19 +54,19 @@ <ol> <li>You need to register a Zebra jar file the same way you would do it for any other UDF.</li> <li>You need to place the jar on your classpath.</li> - <li>When using Zebra with Pig, Zebra data is self-described and always contains a schema. This means that the AS clause is unnecessary as long as - you know what the column names and types are. To determine the column names and types, you can run the DESCRIBE statement right after the load: + </ol> + + <p>Zebra data is self-described meaning that the name and type information is stored with the data; you don't need to provide an AS clause or perform type casting unless you actually need to change the data. To check column names and types, you can run the DESCRIBE statement right after the load:</p> <source> A = LOAD 'studenttab' USING org.apache.hadoop.zebra.pig.TableLoader(); DESCRIBE A; -a: {name: chararray,age: int,gpa: float} +A: {name: chararray,age: int,gpa: float} </source> - </li> - </ol> -<p>You can provide alternative names to the columns with the AS clause. You can also provide types as long as the - original type can be converted to the new type. <em>In general</em>, Zebra supports Pig type compatibilities - (see <a href="piglatin_reference.html#Arithmetic+Operators+and+More">Arithmetic Operators and More</a>).</p> +<p>You can provide alternative names to the columns with the AS clause. 
You can also provide alternative types as long as the + original type can be converted to the new type. (One exception to this rule is maps, since you can't specify a schema for a map. Zebra always creates map values as bytearrays, which would require casting to the real type in the script. Note that this is no different from how Pig treats maps for any other storage.) For more information see <a href="piglatin_ref2.html#Schemas">Schemas</a> and +<a href="piglatin_ref2.html#Arithmetic+Operators+and+More">Arithmetic Operators and More</a>. + </p> <p>You can provide multiple, comma-separated files to the loader:</p> <source> @@ -186,7 +186,8 @@ <section> <title>HDFS File Globs</title> <p>Pig supports HDFS file globs - (for more information about globs, see <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html">FileSystem</a> and GlobStatus).</p> + (for more information + see <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path)">GlobStatus</a>).</p> <p>In this example, all Zebra tables in the directory of /path/to/PIG/tables will be loaded as a union (table union). 
</p> <source> A = LOAD '/path/to/PIG/tables/*' USING org.apache.hadoop.zebra.pig.TableLoader(''); Modified: hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/zebra_users.xml URL: http://svn.apache.org/viewvc/hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/zebra_users.xml?rev=901333&r1=901332&r2=901333&view=diff ============================================================================== --- hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/zebra_users.xml (original) +++ hadoop/pig/branches/branch-0.6/src/docs/src/documentation/content/xdocs/zebra_users.xml Wed Jan 20 19:03:57 2010 @@ -155,7 +155,7 @@ <section> <title>MapReduce Jobs</title> <p> -TableInputFormat has static method, requireSortedTable, that allows the caller to specify the behavior of a single sorted table or an order-preserving sorted table union as described above. The method ensures all tables in a union are sorted. For more information, see <a href="zebra_reference.html#TableInputFormat">TableInputFormat</a>. +TableInputFormat has a static method, requireSortedTable, that allows the caller to specify the behavior of a single sorted table or an order-preserving sorted table union as described above. The method ensures all tables in a union are sorted. For more information, see <a href="zebra_mapreduce.html#TableInputFormat">TableInputFormat</a>. </p> <p>One simple example: A order-preserving sorted union B. A and B are sorted tables. </p>
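
For reference, the doc changes above can be pulled together into one minimal Pig Latin sketch. This is a sketch only, not text from the commit: the glob path is the placeholder used in the zebra_pig.xml example, the field name `name` is taken from the `studenttab` DESCRIBE output, and the reduce-task count follows the cookbook's ~500 MB-per-reducer guideline.

```pig
-- Load all Zebra tables under the directory as a table union; no AS clause
-- is needed because Zebra stores column names and types with the data.
A = LOAD '/path/to/PIG/tables/*' USING org.apache.hadoop.zebra.pig.TableLoader('');

-- Verify the schema Zebra reports after the load.
DESCRIBE A;

-- GROUP starts a reduce phase, so the PARALLEL clause applies here:
-- request 18 reduce tasks for this MapReduce boundary.
B = GROUP A BY name PARALLEL 18;
```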