Author: olga
Date: Fri Jun 5 18:20:55 2009
New Revision: 782088

URL: http://svn.apache.org/viewvc?rev=782088&view=rev
Log:
missing doc file

Added:
    hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml

Added: hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml
URL: http://svn.apache.org/viewvc/hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml?rev=782088&view=auto
==============================================================================
--- hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml (added)
+++ hadoop/pig/trunk/src/docs/src/documentation/content/xdocs/getstarted.xml Fri Jun 5 18:20:55 2009
@@ -0,0 +1,232 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements. See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License. You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+<document>
+  <header>
+    <title>Pig Getting Started Guide</title>
+  </header>
+  <body>
+
+  <section id="req">
+  <title>Requirements</title>
+
+  <p><strong>Unix</strong> and <strong>Windows</strong> users need the following:</p>
+  <ol>
+    <li> <strong>Hadoop 0.18</strong>: <a href="http://hadoop.apache.org/core/">http://hadoop.apache.org/core/</a></li>
+    <li> <strong>Java 1.6</strong>, preferably from Sun: <a href="http://java.sun.com/javase/downloads/index.jsp">http://java.sun.com/javase/downloads/index.jsp</a>. Set JAVA_HOME to the root of your Java installation.</li>
+    <li> <strong>Ant</strong> for builds: <a href="http://ant.apache.org/">http://ant.apache.org/</a>.</li>
+    <li> <strong>JUnit</strong> for unit tests: <a href="http://junit.sourceforge.net/">http://junit.sourceforge.net/</a>.</li>
+  </ol>
+  <p><strong>Windows</strong> users also need to install Cygwin and the Perl package: <a href="http://www.cygwin.com/">http://www.cygwin.com/</a>.</p>
+  </section>
+
+  <section>
+  <title>Download Pig</title>
+  <p>To get a Pig distribution, download a recent stable release from one of the Apache Download Mirrors (see <a href="http://hadoop.apache.org/pig/releases.html">Pig Releases</a>).</p>
+  <p>Unpack the downloaded Pig distribution. You can find the Pig script in the bin directory (/pig-n.n.n/bin/pig).</p>
+  <p>Add /pig-n.n.n/bin to your path. Use export (bash, sh, ksh) or setenv (tcsh, csh).
+For example: </p>
+<source>
+$ export PATH=/&lt;my-path-to-pig&gt;/pig-n.n.n/bin:$PATH
+</source>
+  <p>Try the following command to get a listing of all Pig commands:</p>
+<source>
+$ pig -help
+</source>
+  <p>Try the following command to start the Grunt shell:</p>
+<source>
+$ pig
+</source>
+
+
+  </section>
+
+  <section>
+  <title>Build Pig</title>
+  <p>(Optional) To build Pig, do the following:</p>
+  <ol>
+    <li> Check out the Pig code from SVN: <em>svn co http://svn.apache.org/repos/asf/hadoop/pig/trunk</em>. </li>
+    <li> Build the code from the top directory: <em>ant</em>. If the build is successful, you should see the <em>pig.jar</em> created in that directory. </li>
+    <li> Validate your <em>pig.jar</em> by running the unit tests: <em>ant test</em>.</li>
+  </ol>
+  </section>
+
+<section>
+  <title>Run Pig</title>
+  <p>Pig has two run modes or exectypes: </p>
+  <ul>
+    <li><p> Local Mode: To run Pig in local mode, you need access to a single machine. </p></li>
+    <li><p> Mapreduce Mode: To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation.
+    Pig will automatically allocate and deallocate a 15-node cluster.</p></li>
+  </ul>
+
+
+<section>
+<title>Grunt Shell</title>
+<p>Use Pig's interactive shell, Grunt, to enter Pig commands manually.
+(You can also run or execute script files from the Grunt shell. See the RUN and EXEC commands in the <a href="piglatin.html">Pig Latin Manual</a>.) </p>
+<p>Local mode:
+</p>
+<source>
+$ pig -x local
+</source>
+<p>Mapreduce mode:
+</p>
+<source>
+$ pig
+or
+$ pig -x mapreduce
+</source>
+<p>The Grunt shell is invoked and you can enter commands at the prompt.
+</p>
+<source>
+grunt> A = load 'passwd' using PigStorage(':');
+grunt> B = foreach A generate $0 as id;
+grunt> dump B;
+</source>
+</section>
+
+<section>
+<title>Script Files</title>
+<p>Use script files to run Pig commands as batch jobs.
+See the sample code for the script file (id.pig) used in the examples.</p>
+<p>Local mode:</p>
+<source>
+$ pig -x local id.pig
+</source>
+<p>Mapreduce mode: </p>
+<source>
+$ pig id.pig
+or
+$ pig -x mapreduce id.pig
+</source>
+<p>The Pig Latin statements are executed and the results are displayed on your terminal screen (if DUMP is used) or written to a file (if STORE is used).</p>
+</section>
+
+<section>
+  <title>Embedded Programs</title>
+<p>Embed Pig commands in a host language and run the program.
+See the sample code for the Java files (idlocal.java, idhadoop.java) used in the examples.</p>
+  <section>
+<title>Local Mode</title>
+<p>From your current working directory, compile the program: </p>
+<source>
+$ javac -cp pig.jar idlocal.java
+</source>
+<p>Note: idlocal.class is written to your current working directory. Include "." in the class path when you run the program. </p>
+<p>From your current working directory, run the program:
+</p>
+<source>
+Unix: $ java -cp pig.jar:. idlocal
+Cygwin: $ java -cp ".;pig.jar" idlocal
+</source>
+<p>To view the results, check the output file, id.out. </p>
+</section>
+<section>
+<title>Mapreduce Mode</title>
+<p>Point $HADOOPDIR to the directory that contains the hadoop-site.xml file. Example:
+</p>
+<source>
+$ export HADOOPDIR=/yourHADOOPsite/conf
+</source>
+<p>From your current working directory, compile the program:
+</p>
+<source>
+$ javac -cp pig.jar idhadoop.java
+</source>
+<p>Note: idhadoop.class is written to your current working directory. Include "." in the class path when you run the program. </p>
+<p>From your current working directory, run the program:
+</p>
+<source>
+Unix: $ java -cp pig.jar:.:$HADOOPDIR idhadoop
+Cygwin: $ java -cp ".;pig.jar;$HADOOPDIR" idhadoop
+</source>
+<p>To view the results, check the idout directory on your Hadoop system.
+</p>
+
+</section>
+</section>
+</section>
+
+<section>
+<title>Sample Code</title>
+
+<p>The sample code is based on Pig Latin statements that extract all user IDs from the /etc/passwd file. </p>
+<p>Copy the /etc/passwd file to your local working directory.</p>
+
+<section>
+<title>id.pig</title>
+<p>For the Grunt shell and script files. </p>
+<source>
+A = load 'passwd' using PigStorage(':');
+B = foreach A generate $0 as id;
+dump B;
+store B into 'id.out';
+</source>
+</section>
+
+<section>
+<title>idlocal.java</title>
+<p>For embedded programs. </p>
+<source>
+import java.io.IOException;
+import org.apache.pig.PigServer;
+
+public class idlocal {
+    public static void main(String[] args) {
+        try {
+            // Run Pig in local mode.
+            PigServer pigServer = new PigServer("local");
+            runIdQuery(pigServer, "passwd");
+        } catch (Exception e) {
+            // Report the failure instead of silently ignoring it.
+            e.printStackTrace();
+        }
+    }
+
+    public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
+        // Load the colon-delimited file and keep the first field (the user ID).
+        pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(':');");
+        pigServer.registerQuery("B = foreach A generate $0 as id;");
+        // Write relation B to the output file.
+        pigServer.store("B", "id.out");
+    }
+}
+</source>
+</section>
+
+<section>
+<title>idhadoop.java</title>
+<p>For embedded programs. </p>
+<source>
+import java.io.IOException;
+import org.apache.pig.PigServer;
+
+public class idhadoop {
+    public static void main(String[] args) {
+        try {
+            // Run Pig in mapreduce mode against the configured Hadoop cluster.
+            PigServer pigServer = new PigServer("mapreduce");
+            runIdQuery(pigServer, "passwd");
+        } catch (Exception e) {
+            // Report the failure instead of silently ignoring it.
+            e.printStackTrace();
+        }
+    }
+
+    public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
+        // Load the colon-delimited file and keep the first field (the user ID).
+        pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(':');");
+        pigServer.registerQuery("B = foreach A generate $0 as id;");
+        // Write relation B to the idout directory.
+        pigServer.store("B", "idout");
+    }
+}
+</source>
+</section>
+
+
+</section>
+
+</body>
+</document>