Rob,
You need to tell Hadoop which jars to ship to the worker nodes.
Putting datagen.jar, etc. on HADOOP_CLASSPATH makes them discoverable
by the local client JVM, but it does not tell Hadoop to ship them to
the tasks running on the cluster, which is why the Zipf class is
missing there. List the jars, comma-separated, in the -libjars parameter.
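
Roughly, your script would change as sketched below. This is untested and
rests on two assumptions: that DataGenerator parses Hadoop's generic
options (it already accepts -conf, so it probably does), and that
-libjars has to sit next to -conf, before the tool's own arguments. The
libjars and cmd variable names are just for illustration; the paths are
copied from your script. It only prints the command so you can check it
before running it.

```shell
# Jar and config paths copied from the quoted generateData script.
pigjar=$HOME/installation/pig/pig-0.5.0/pig-0.5.0-core.jar
zipfjar=$HOME/installation/pig/pig-0.5.0/sdsuLibJKD14.jar
datagenjar=$HOME/rs46/installation/DataGenerator/dist/MyPig.jar
conf_file=/usr/lib/hadoop/conf/hadoop-site.xml

# Colon-separated: only affects the JVM started locally by bin/hadoop.
export HADOOP_CLASSPATH=$pigjar:$zipfjar:$datagenjar

# Comma-separated: the jars Hadoop should ship to the worker nodes.
libjars=$pigjar,$zipfjar,$datagenjar

# Assumption: generic options (-conf, -libjars) go before the tool's own
# arguments (-m, -rows, -f, column spec).
cmd="/usr/lib/hadoop/bin/hadoop jar $datagenjar \
org.apache.pig.test.utils.datagen.DataGenerator \
-conf $conf_file -libjars $libjars \
-m 1 -rows 10000000 -f words.dat s:8:50:z:0"

# Print the command for inspection; run it by hand once it looks right.
echo "$cmd"
```

If the ClassNotFoundException persists, check that the -libjars value
really is comma-separated with no spaces between the entries.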

-D

On Thu, Jan 14, 2010 at 6:49 AM, Rob Stewart
<[email protected]> wrote:
> Hi there.
>
> I am well underway with comparing Pig, Hive, JAQL etc...
>
> The DataGenerator is proving a valuable tool for me. Thanks for that.
>
> I have one query. I am able to use it in local mode, no problem, and some
> experiments are complete.
>
> However, I cannot seem to use it in MapReduce mode on the cluster. This is
> my file "generateData" contents:
> ------------------
> export pigjar=$HOME/installation/pig/pig-0.5.0/pig-0.5.0-core.jar
> export zipfjar=$HOME/installation/pig/pig-0.5.0/sdsuLibJKD14.jar
> export datagenjar=$HOME/rs46/installation/DataGenerator/dist/MyPig.jar
> export conf_file=/usr/lib/hadoop/conf/hadoop-site.xml
> export HADOOP_CLASSPATH=$pigjar:$zipfjar:$datagenjar
> /usr/lib/hadoop/bin/hadoop jar $datagenjar
> org.apache.pig.test.utils.datagen.DataGenerator -conf $conf_file -m 1 -rows
> 10000000 -f words.dat s:8:50:z:0
> ------------------
>
> The error I receive when trying to run it with "-m 1" option (in cluster
> mode):
> Caused by: java.lang.ClassNotFoundException: sdsu.algorithms.data.Zipf
>
> So in local mode, it successfully picks up the jar file sdsuLibJKD14.jar ,
> but when running it in cluster mode, this classpath is not found?
>
>
> thanks.
>
> Rob Stewart
>
