Author: olga
Date: Tue Jan  5 17:18:51 2010
New Revision: 896134

URL: http://svn.apache.org/viewvc?rev=896134&view=rev
Log:
PIG-1175: Pig 0.6 Docs - Store v. Dump (chandec via olgan)

Modified:
    hadoop/pig/trunk/CHANGES.txt
    
hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_reference.xml
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_users.xml

Modified: hadoop/pig/trunk/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/hadoop/pig/trunk/CHANGES.txt?rev=896134&r1=896133&r2=896134&view=diff
==============================================================================
--- hadoop/pig/trunk/CHANGES.txt (original)
+++ hadoop/pig/trunk/CHANGES.txt Tue Jan  5 17:18:51 2010
@@ -24,6 +24,8 @@
 
 IMPROVEMENTS
 
+PIG-1175: Pig 0.6 Docs - Store v. Dump (chandec via olgan)
+
 PIG-1102: Collect number of spills per job (sriranjan via olgan)
 
 PIG-1149: Allow instantiation of SampleLoaders with parametrized LoadFuncs

Modified: 
hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_reference.xml
URL: 
http://svn.apache.org/viewvc/hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_reference.xml?rev=896134&r1=896133&r2=896134&view=diff
==============================================================================
--- 
hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_reference.xml
 (original)
+++ 
hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_reference.xml
 Tue Jan  5 17:18:51 2010
@@ -4919,58 +4919,7 @@
 
    </section></section>
    
-   <section>
-   <title>DUMP</title>
-   <para>Displays the contents of a relation.</para>
-   
-   <section>
-   <title>Syntax</title>
-   <informaltable frame="all">
-      <tgroup cols="1"><tbody><row>
-            <entry>
-               <para>DUMP alias;        </para>
-            </entry>
-         </row></tbody></tgroup>
-   </informaltable></section>
-   
-   <section>
-   <title>Terms</title>
-   <informaltable frame="all">
-      <tgroup cols="2"><tbody><row>
-            <entry>
-               <para>alias</para>
-            </entry>
-            <entry>
-               <para>The name of a relation.</para>
-            </entry>
-         </row></tbody></tgroup>
-   </informaltable></section>
-   
-   <section>
-   <title>Usage</title>
-   <para>Use the DUMP operator to run (execute) a Pig Latin statement and to 
display the contents of an alias. You can use DUMP as a debugging device to 
make sure the results you are expecting are being generated.</para></section>
-   
-   <section>
-   <title>Example</title>
-   <para>In this example a dump is performed after each statement.</para>
-<programlisting>
-A = LOAD 'student' AS (name:chararray, age:int, gpa:float);
-
-DUMP A;
-(John,18,4.0F)
-(Mary,19,3.7F)
-(Bill,20,3.9F)
-(Joe,22,3.8F)
-(Jill,20,4.0F)
-
-B = FILTER A BY name matches 'J.+';
-
-DUMP B;
-(John,18,4.0F)
-(Joe,22,3.8F)
-(Jill,20,4.0F)
-</programlisting>
-</section></section>
+  
    
    <section>
    <title>FILTER </title>
@@ -6521,7 +6470,7 @@
    
    <section>
    <title>STORE </title>
-   <para>Stores data to the file system.</para>
+   <para>Stores or saves results to the file system.</para>
    
    <section>
    <title>Syntax</title>
@@ -6591,7 +6540,10 @@
    
    <section>
    <title>Usage</title>
-   <para>Use the STORE operator to run (execute) Pig Latin statements and to 
store data on the file system. </para></section>
+   <para>Use the STORE operator to run (execute) Pig Latin statements and save 
(persist) results to the file system. Use STORE for production scripts and 
batch mode processing.</para>
+   
+   <para>Note: To debug scripts during development, you can use <ulink 
url="piglatin_reference.html#DUMP">DUMP</ulink> to check intermediate 
results.</para>
+</section>
    
    <section>
    <title>Examples</title>
@@ -6962,6 +6914,68 @@
    
    </section></section>
    
+   
+ <section>
+   <title>DUMP</title>
+   <para>Dumps or displays results to the screen.</para>
+   
+   <section>
+   <title>Syntax</title>
+   <informaltable frame="all">
+      <tgroup cols="1"><tbody><row>
+            <entry>
+               <para>DUMP alias;        </para>
+            </entry>
+         </row></tbody></tgroup>
+   </informaltable></section>
+   
+   <section>
+   <title>Terms</title>
+   <informaltable frame="all">
+      <tgroup cols="2"><tbody><row>
+            <entry>
+               <para>alias</para>
+            </entry>
+            <entry>
+               <para>The name of a relation.</para>
+            </entry>
+         </row></tbody></tgroup>
+   </informaltable></section>
+   
+   <section>
+   <title>Usage</title>
+   <para>Use the DUMP operator to run (execute) Pig Latin statements and 
display the results to your screen. DUMP is meant for interactive mode; 
statements are executed immediately and the results are not saved (persisted). 
You can use DUMP as a debugging device to make sure that the results you are 
expecting are actually generated. </para>
+   
+   <para>
+   Note that production scripts <emphasis>should not</emphasis> use DUMP as it 
will disable multi-query optimizations and is likely to slow down execution 
+   (see <ulink url="piglatin_users.html#Store+vs.+Dump">Store vs. 
Dump</ulink>).
+   </para>
+   </section>
+   
+   <section>
+   <title>Example</title>
+   <para>In this example a dump is performed after each statement.</para>
+<programlisting>
+A = LOAD 'student' AS (name:chararray, age:int, gpa:float);
+
+DUMP A;
+(John,18,4.0F)
+(Mary,19,3.7F)
+(Bill,20,3.9F)
+(Joe,22,3.8F)
+(Jill,20,4.0F)
+
+B = FILTER A BY name matches 'J.+';
+
+DUMP B;
+(John,18,4.0F)
+(Joe,22,3.8F)
+(Jill,20,4.0F)
+</programlisting>
+</section></section>   
+   
+   
+   
    <section>
    <title>EXPLAIN</title>
    <para>Displays execution plans.</para>

Modified: 
hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_users.xml
URL: 
http://svn.apache.org/viewvc/hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_users.xml?rev=896134&r1=896133&r2=896134&view=diff
==============================================================================
--- 
hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_users.xml 
(original)
+++ 
hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/piglatin_users.xml 
Tue Jan  5 17:18:51 2010
@@ -54,7 +54,7 @@
   
    <section>
    <title>Running Pig Latin </title>
-   <p>You can execute Pig Latin statements interactively or in batch mode 
using Pig scripts (see the EXEC and RUN operators).</p>
+   <p>You can execute Pig Latin statements interactively or in batch mode 
using Pig scripts (see the <a href="piglatin_reference.html#exec">exec</a> and 
<a href="piglatin_reference.html#run">run</a> commands).</p>
    
    <p>Grunt Shell, Interactive or Batch Mode</p>
    <source>
@@ -228,15 +228,12 @@
 <!-- MULTI-QUERY EXECUTION-->
 <section>
 <title>Multi-Query Execution</title>
-<p>With multi-query execution Pig processes an entire script or a batch of 
statements at once 
-(as opposed to processing statements when a DUMP or STORE is encountered). </p>
-
-
+<p>With multi-query execution Pig processes an entire script or a batch of 
statements at once.</p>
 
 <section>
        <title>Turning Multi-Query Execution On or Off</title>  
        <p>Multi-query execution is turned on by default. 
-       To turn it off and revert to Pi'gs "execute-on-dump/store" behavior, 
use the "-M" or "-no_multiquery" options. </p>
+       To turn it off and revert to Pig's "execute-on-dump/store" behavior, 
use the "-M" or "-no_multiquery" options. </p>
        <p>To run script "myscript.pig" without the optimization, execute Pig 
as follows: </p>
 <source>
 $ pig -M myscript.pig
@@ -253,7 +250,8 @@
 <li>
 <p>For batch mode execution, the entire script is first parsed to determine if 
intermediate tasks 
 can be combined to reduce the overall amount of work that needs to be done; 
execution starts only after the parsing is completed 
-(see the EXPLAIN operator and the EXEC and RUN commands). </p>
+(see the <a href="piglatin_reference.html#EXPLAIN">EXPLAIN</a> operator and 
the <a href="piglatin_reference.html#exec">exec</a> and <a 
href="piglatin_reference.html#run">run</a> commands). </p>
+
 </li>
 <li>
 <p>Two run scenarios are optimized, as explained below: explicit and implicit 
splits, and storing intermediate results.</p>
@@ -316,7 +314,32 @@
 </section>
 </section>
 
+<section>
+       <title>Store vs. Dump</title>
+       <p>With multi-query execution, you want to use <a 
href="piglatin_reference.html#STORE">STORE</a> to save (persist) your results. 
+       You do not want to use <a href="piglatin_reference.html#DUMP">DUMP</a> 
as it will disable multi-query execution and is likely to slow down execution. 
(If you have included DUMP statements in your scripts for debugging purposes, 
you should remove them.) </p>
+       
+       <p>DUMP Example: In this script, because the DUMP command is 
interactive, multi-query execution will be disabled and two separate jobs 
will be created to execute this script. The first job will execute A > B > DUMP 
while the second job will execute A > B > C > STORE.</p>
+       
+<source>
+A = LOAD 'input' AS (x, y, z);
+B = FILTER A BY x > 5;
+DUMP B;
+C = FOREACH B GENERATE y, z;
+STORE C INTO 'output';
+</source>
+       
+       <p>STORE Example: In this script, multi-query optimization will kick in, 
allowing the entire script to be executed as a single job. Two outputs are 
produced: output1 and output2.</p>
+       
+<source>
+A = LOAD 'input' AS (x, y, z);
+B = FILTER A BY x > 5;
+STORE B INTO 'output1';
+C = FOREACH B GENERATE y, z;
+STORE C INTO 'output2';
+</source>
 
+</section>
 <section>
        <title>Error Handling</title>
        <p>With multi-query execution Pig processes an entire script or a batch 
of statements at once. 
@@ -352,10 +375,10 @@
        <title>Backward Compatibility</title>
        
        <p>Most existing Pig scripts will produce the same result with or 
without the multi-query execution. 
-       There are cases though were this is not true. Path names and schemes 
are discussed here.</p>
+       There are cases though where this is not true. Path names and schemes 
are discussed here.</p>
        
       <p>Any script is parsed in its entirety before it is sent to 
execution. Since the current directory can change 
-       throughout the script any path used in load or store is translated to a 
fully qualified and absolute path.</p>
+       throughout the script any path used in a LOAD or STORE statement is 
translated to a fully qualified and absolute path.</p>
                
        <p>In map-reduce mode, the following script will load from 
"hdfs://&lt;host&gt;:&lt;port&gt;/data1" and store into 
"hdfs://&lt;host&gt;:&lt;port&gt;/tmp/out1". </p>
 <source>
@@ -375,7 +398,7 @@
                <li><p>Specify a custom scheme for the LoadFunc/Slicer </p></li>
        </ol>   
        
-       <p>Arguments used in a load statement that have a scheme other than 
"hdfs" or "file" will not be expanded and passed to the LoadFunc/Slicer 
unchanged.</p>
+       <p>Arguments used in a LOAD statement that have a scheme other than 
"hdfs" or "file" will not be expanded and passed to the LoadFunc/Slicer 
unchanged.</p>
        <p>In the SQL case, the SQLLoader function is invoked with 
"sql://mytable". </p>
 
 <source>
@@ -416,7 +439,7 @@
 
 <section>
        <title>Example</title>
-<p>In this script, the store/load operators have different file paths; 
however, the load operator depends on the store operator.</p>
+<p>In this script, the STORE/LOAD operators have different file paths; 
however, the LOAD operator depends on the STORE operator.</p>
 <source>
 A = LOAD '/user/xxx/firstinput' USING PigStorage();
 B = group ....


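As a usage sketch of the two execution modes this patch documents (the script name is illustrative, not from the patch): running the Store vs. Dump example script with and without the multi-query optimization that the docs describe.

```shell
# Default: multi-query execution is on; a script with two STORE
# statements can be combined into a single job.
pig myscript.pig

# Revert to the pre-multi-query "execute-on-dump/store" behavior,
# e.g. to compare plans or isolate a failing statement.
pig -M myscript.pig
# (equivalently: pig -no_multiquery myscript.pig)
```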