getstarted.xml

olga Fri, 05 Jun 2009 11:24:17 -0700

Author: olga
Date: Fri Jun  5 18:20:55 2009
New Revision: 782088

URL: http://svn.apache.org/viewvc?rev=782088&view=rev
Log:
missing doc file


Added:
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml

Added: hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml
URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml?rev=782088&view=auto
==============================================================================
--- hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml (added)
+++ hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml Fri Jun  5 18:20:55 2009
@@ -0,0 +1,232 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+<document>
+  <header>
+    <title>Pig Getting Started Guide</title>
+  </header>
+  <body>
+  
+    <section id="req">
+      <title>Requirements</title>
+      
+      <p><strong>Unix</strong> and <strong>Windows</strong> users need the following:</p>
+               <ol>
+                 <li> <strong>Hadoop 0.18</strong>: <a href="http://hadoop.apache.org/core/">http://hadoop.apache.org/core/</a></li>
+                 <li> <strong>Java 1.6</strong>, preferably from Sun: <a href="http://java.sun.com/javase/downloads/index.jsp">http://java.sun.com/javase/downloads/index.jsp</a>. Set JAVA_HOME to the root of your Java installation.</li>
+                 <li> <strong>Ant</strong> for builds: <a href="http://ant.apache.org/">http://ant.apache.org/</a>.</li>
+                 <li> <strong>JUnit</strong> for unit tests: <a href="http://junit.sourceforge.net/">http://junit.sourceforge.net/</a>.</li>
+               </ol>
+       <p><strong>Windows</strong> users also need to install Cygwin and the Perl package: <a href="http://www.cygwin.com/">http://www.cygwin.com/</a>.</p>
+   </section>
+        
+    <section>
+       <title>Download Pig</title>
+       <p>To get a Pig distribution, download a recent stable release from one of the Apache Download Mirrors (see <a href="http://hadoop.apache.org/pig/releases.html">Pig Releases</a>).</p>
+       <p>Unpack the downloaded Pig distribution. The Pig script is located in the bin directory (/pig-n.n.n/bin/pig).</p>
+       <p>Add /pig-n.n.n/bin to your path. Use export (bash, sh, ksh) or setenv (tcsh, csh). For example:</p>
+<source>
+$ export PATH=/&lt;my-path-to-pig&gt;/pig-n.n.n/bin:$PATH
+</source>
+       <p>Run the following command to get a listing of all Pig commands:</p>
+<source>
+$ pig -help
+</source>
+       <p>Run the following command to start the Grunt shell:</p>
+<source>
+$ pig 
+</source>
+
+
+       </section>  
+       
+    <section>
+      <title>Build Pig</title>
+      <p>(Optional) To build Pig, do the following:</p>
+     <ol>
+         <li> Check out the Pig code from SVN: <em>svn co http://svn.apache.org/repos/asf/hadoop/pig/trunk</em>. </li>
+         <li> Build the code from the top directory: <em>ant</em>. If the build is successful, <em>pig.jar</em> is created in that directory. </li>    
+         <li> Validate your <em>pig.jar</em> by running the unit tests: <em>ant test</em></li>
+     </ol>
+    </section>
+
+<section>
+       <title>Run Pig</title>
+    <p>Pig has two run modes, or exectypes:</p>
+    <ul>
+      <li><p> Local Mode: To run Pig in local mode, you need access to a single machine.</p></li>
+      <li><p> Mapreduce Mode: To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation. 
+      Pig will automatically allocate and deallocate a 15-node cluster.</p></li>
+    </ul>
+
+
+<section>
+<title>Grunt Shell</title>
+<p>Use Pig's interactive shell, Grunt, to enter Pig commands manually. 
+(You can also run or execute script files from the Grunt shell. See the RUN and EXEC commands in the <a href="piglatin.html">Pig Latin Manual</a>.)</p>
+<p>Local mode: 
+</p>
+<source>
+$ pig -x local
+</source>
+<p>Mapreduce mode: 
+</p>
+<source>
+$ pig
+or
+$ pig -x mapreduce
+</source>
+<p>The Grunt shell starts, and you can enter commands at the prompt:</p>
+<source>
+grunt&gt; A = load 'passwd' using PigStorage(':'); 
+grunt&gt; B = foreach A generate $0 as id; 
+grunt&gt; dump B; 
+</source>
+</section>
+
+<section>
+<title>Script Files</title>
+<p>Use script files to run Pig commands as batch jobs. See the sample code for the script file (id.pig) used in the examples.</p>
+<p>Local mode:</p>
+<source>
+$ pig -x local id.pig
+</source>
+<p>Mapreduce mode: </p>
+<source>
+$ pig id.pig
+or
+$ pig -x mapreduce id.pig
+</source>
+<p>The Pig Latin statements are executed, and the results are displayed on your terminal (if DUMP is used) or written to a file (if STORE is used).</p>
+</section>
+
+<section>
+       <title>Embedded Programs</title>
+<p>Embed Pig commands in a host language and run the program. 
+See the sample code for the Java files (idlocal.java, idhadoop.java) used in the examples.</p>
+       <section>
+<title> Local Mode</title>
+<p>From your current working directory, compile the program: </p>
+<source>
+$ javac -cp pig.jar idlocal.java
+</source>
+<p>Note: idlocal.class is written to your current working directory. Include "." in the class path when you run the program.</p>
+<p>From your current working directory, run the program: 
+</p>
+<source>
+Unix:   $ java -cp pig.jar:. idlocal
+Cygwin: $ java -cp ".;pig.jar" idlocal
+</source>
+<p>To view the results, check the output file, id.out. </p>
+</section>
+<section>
+<title>Mapreduce Mode</title>
+<p>Point $HADOOPDIR to the directory that contains the hadoop-site.xml file. For example:</p>
+<source>
+$ export HADOOPDIR=/yourHADOOPsite/conf 
+</source>
+<p>From your current working directory, compile the program: 
+</p>
+<source>
+$ javac -cp pig.jar idhadoop.java
+</source>
+<p>Note: idhadoop.class is written to your current working directory. Include "." in the class path when you run the program.</p>
+<p>From your current working directory, run the program: 
+</p>
+<source>
+Unix:   $ java -cp pig.jar:.:$HADOOPDIR idhadoop
+Cygwin: $ java -cp ".;pig.jar;$HADOOPDIR" idhadoop
+</source>
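+<p>The Unix and Cygwin commands differ in their class path separator: the JVM joins class path entries with ':' on Unix but with ';' on Windows, and the Cygwin commands assume a Windows JVM. The ';' is quoted there because the shell would otherwise treat it as a command separator. The stand-alone Java sketch below is not part of Pig (the class name RunClasspath is hypothetical); it builds the same class path with java.io.File.pathSeparator, so it prints the form that matches the platform it runs on.</p>

```java
import java.io.File;

// Hypothetical helper, not part of Pig: assembles the run-time class path
// for the embedded examples using the platform's separator.
public class RunClasspath {

    // Joins entries with File.pathSeparator (":" on Unix, ";" on Windows).
    public static String join(String... entries) {
        StringBuilder sb = new StringBuilder();
        boolean first = true;
        for (String entry : entries) {
            if (!first) {
                sb.append(File.pathSeparator);
            }
            sb.append(entry);
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Falls back to the placeholder path from the example above if HADOOPDIR is unset.
        String hadoopDir = System.getenv("HADOOPDIR");
        if (hadoopDir == null) {
            hadoopDir = "/yourHADOOPsite/conf";
        }
        System.out.println(join("pig.jar", ".", hadoopDir));
    }
}
```

+<p>On Unix, with HADOOPDIR unset, the sketch prints pig.jar:.:/yourHADOOPsite/conf, which is the class path used in the Unix command above.</p>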
+<p>To view the results, check the idout directory on your Hadoop system. </p>
+
+</section>
+</section>
+</section>
+
+<section>
+<title>Sample Code</title>
+
+<p>The sample code is based on Pig Latin statements that extract all user IDs from the /etc/passwd file.</p>
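+<p>Copy the /etc/passwd file to your local working directory. For mapreduce mode, also copy it to your HDFS home directory, for example with <em>hadoop fs -put passwd passwd</em>.</p>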
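+<p>To make clear what the sample statements compute, the plain-Java sketch below applies the same logic to passwd-style lines without using Pig (the class name PasswdIds and the two sample lines are illustrative only): PigStorage(':') splits each line into fields at ':', and "generate $0" keeps only the first field, the user name.</p>

```java
// Illustration only: mimics "load ... using PigStorage(':')" followed by
// "foreach ... generate $0", without using any Pig classes.
public class PasswdIds {

    // Returns field $0 of a ':'-delimited line (the whole line if it has no ':').
    public static String extractId(String line) {
        int colon = line.indexOf(':');
        return colon == -1 ? line : line.substring(0, colon);
    }

    public static void main(String[] args) {
        // Two example lines in /etc/passwd format.
        String[] lines = {
            "root:x:0:0:root:/root:/bin/bash",
            "daemon:x:1:1:daemon:/usr/sbin:/bin/sh"
        };
        for (String line : lines) {
            System.out.println(extractId(line));
        }
    }
}
```

+<p>Running the sketch prints root and daemon, one per line. For the real file, dump B shows the same values, except that Pig prints each one as a tuple, for example (root).</p>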
+       
+<section>
+<title>id.pig</title>
+<p>For the Grunt Shell and script files. </p>
+<source>
+A = load 'passwd' using PigStorage(':'); 
+B = foreach A generate $0 as id;
+dump B; 
+store B into 'id.out';
+</source>
+</section>
+
+<section>
+<title>idlocal.java</title>
+<p>For embedded programs. </p>
+<source>
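+import java.io.IOException;
+import org.apache.pig.PigServer;
+
+public class idlocal {
+    public static void main(String[] args) {
+        try {
+            // Run Pig in local mode: input and output use the local file system.
+            PigServer pigServer = new PigServer("local");
+            runIdQuery(pigServer, "passwd");
+        } catch (Exception e) {
+            // Report failures instead of silently ignoring them.
+            e.printStackTrace();
+        }
+    }
+
+    public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
+        // Load the file, splitting each line into fields on ':'.
+        pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(':');");
+        // Keep only the first field (the user ID).
+        pigServer.registerQuery("B = foreach A generate $0 as id;");
+        // Write relation B to id.out.
+        pigServer.store("B", "id.out");
+    }
+}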
+</source>
+</section>
+
+<section>
+<title>idhadoop.java</title>
+<p>For embedded programs. </p>
+<source>
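+import java.io.IOException;
+import org.apache.pig.PigServer;
+
+public class idhadoop {
+    public static void main(String[] args) {
+        try {
+            // Run Pig in mapreduce mode; the cluster settings are read from the
+            // Hadoop configuration directory on the class path ($HADOOPDIR).
+            PigServer pigServer = new PigServer("mapreduce");
+            runIdQuery(pigServer, "passwd");
+        } catch (Exception e) {
+            // Report failures instead of silently ignoring them.
+            e.printStackTrace();
+        }
+    }
+
+    public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
+        // Load the input from HDFS, splitting each line into fields on ':'.
+        pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(':');");
+        // Keep only the first field (the user ID).
+        pigServer.registerQuery("B = foreach A generate $0 as id;");
+        // Write relation B to the idout directory in HDFS.
+        pigServer.store("B", "idout");
+    }
+}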
+</source>
+</section>
+
+
+</section>
+
+</body>
+</document>
