The following page has been changed by OlgaN:
http://wiki.apache.org/pig/EmbeddedPig
New page:
== Embedding Pig In Java Programs ==
Sometimes you want more control than Pig scripts can give you. If so, you can
embed Pig Latin in Java (just like SQL can be embedded in programs using JDBC).
The following steps need to be carried out:
* Make sure `pig.jar` is on your classpath.
* Create an instance of `PigServer`.
* Issue commands through that PigServer by calling
`PigServer.registerQuery()`.
* To retrieve results, either call `PigServer.openIterator()` or
`PigServer.store()`.
* If you have user defined functions, register them by calling
`PigServer.registerJar()`.
=== Example ===
Let's assume that you need to count the number of occurrences of each word in a
document. Let's also assume that you have an EvalFunction, `Tokenize`, that
parses a line of text and returns all the words in that line. The function is
located in `/mylocation/tokenize.jar`.
The Pig Latin script for this computation looks as follows:
{{{
register /mylocation/tokenize.jar
A = load 'mytext' using TextLoader();
B = foreach A generate flatten(tokenize($0));
C = group B by $1;
D = foreach C generate flatten(group), COUNT(B.$0);
store D into 'myoutput';
}}}
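To make the expected result concrete, here is a plain-Java sketch of the same computation with no Pig involved. It is only a reference for checking small inputs by hand. The class name `WordCountReference` is hypothetical, and splitting on whitespace is an assumption standing in for whatever `Tokenize` actually does.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java reference for the word count; hypothetical helper, no Pig involved. */
final class WordCountReference {

    /**
     * Counts how often each word occurs across the given lines.
     * Splitting on whitespace is an assumption standing in for Tokenize.
     */
    static Map<String, Long> countWords(Iterable<String> lines) {
        // TreeMap keeps the words sorted, which makes results easy to compare.
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                // An empty or blank line yields one empty token; skip it.
                if (!word.isEmpty()) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }
}
```

Each entry of the returned map corresponds to one (word, count) tuple in relation `D` of the script.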
The same can be accomplished with the following Java program:
{{{
import java.io.IOException;
import org.apache.pig.PigServer;

public class WordCount {
    public static void main(String[] args) {
        PigServer pigServer = new PigServer();
        try {
            // Register the jar that contains the tokenize function.
            pigServer.registerJar("/mylocation/tokenize.jar");
            runMyQuery(pigServer, "myinput.txt");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void runMyQuery(PigServer pigServer, String inputFile) throws IOException {
        // Each statement is only registered here; nothing runs until store().
        pigServer.registerQuery("A = load '" + inputFile + "' using TextLoader();");
        pigServer.registerQuery("B = foreach A generate flatten(tokenize($0));");
        pigServer.registerQuery("C = group B by $1;");
        pigServer.registerQuery("D = foreach C generate flatten(group), COUNT(B.$0);");
        pigServer.store("D", "myoutput");
    }
}
}}}
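Because every Pig Latin statement is passed as a Java string, the nested quoting around the file name is easy to get wrong. One way to check it is to build the statements separately and print them before registering them. The following is a minimal sketch of that idea. The class `WordCountStatements` and its `build` method are hypothetical helpers, not part of Pig.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds the word-count statements as plain strings; hypothetical helper, not a Pig API. */
final class WordCountStatements {

    /** Returns the four statements in the order they would be registered. */
    static List<String> build(String inputFile) {
        List<String> statements = new ArrayList<>();
        // Pig Latin wraps the file name in single quotes inside the Java string.
        statements.add("A = load '" + inputFile + "' using TextLoader();");
        statements.add("B = foreach A generate flatten(tokenize($0));");
        statements.add("C = group B by $1;");
        statements.add("D = foreach C generate flatten(group), COUNT(B.$0);");
        return statements;
    }
}
```

Each string returned by `WordCountStatements.build("myinput.txt")` could then be passed to `pigServer.registerQuery()` in a loop.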
Notes:
* The jar which contains your functions must be registered.
* The four calls to `pigServer.registerQuery()` simply cause the query to be
parsed and enqueued. The query is not actually executed until
`pigServer.store()` is called.
* The input data referred to in the load statement must be on DFS in the
specified location.
* The final result is placed into the `myoutput` file in your current working
directory on DFS. (By default this is your home directory on DFS.)
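The deferred execution described in the notes can be pictured with a toy model. This is not how `PigServer` is implemented. The class `DeferredQueries` below is a hypothetical illustration of the register-now, run-on-store pattern and assumes nothing about Pig's internals.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of deferred execution; NOT PigServer's real implementation. */
final class DeferredQueries {
    private final List<String> pending = new ArrayList<>();
    private final List<String> executed = new ArrayList<>();

    /** Only records the statement; nothing runs yet. */
    void registerQuery(String statement) {
        pending.add(statement);
    }

    /** "Runs" everything queued so far, then the store itself, and returns the full log. */
    List<String> store(String alias, String output) {
        executed.addAll(pending);
        pending.clear();
        executed.add("store " + alias + " into '" + output + "';");
        return new ArrayList<>(executed);
    }

    /** Number of statements registered but not yet run. */
    int pendingCount() {
        return pending.size();
    }

    /** Number of statements that have been run. */
    int executedCount() {
        return executed.size();
    }
}
```

In this model, `executedCount()` stays at zero after any number of `registerQuery()` calls and only changes once `store()` is called, which mirrors the behavior the notes describe.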
To run your program, you need to first compile it by using the following
command:
{{{
javac -cp <path>/pig.jar WordCount.java
}}}
If the compilation is successful, you can then run your program:
{{{
java -cp <path>/pig.jar:. WordCount
}}}