Hi there. I am well underway with comparing Pig, Hive, JAQL, etc.
The DataGenerator is proving a valuable tool for me. Thanks for that.

I have one query. I am able to use it in local mode with no problem, and some experiments are complete. However, I cannot seem to use it in MapReduce mode on the cluster.

These are the contents of my "generateData" file:

------------------
export pigjar=$HOME/installation/pig/pig-0.5.0/pig-0.5.0-core.jar
export zipfjar=$HOME/installation/pig/pig-0.5.0/sdsuLibJKD14.jar
export datagenjar=$HOME/rs46/installation/DataGenerator/dist/MyPig.jar
export conf_file=/usr/lib/hadoop/conf/hadoop-site.xml
export HADOOP_CLASSPATH=$pigjar:$zipfjar:$datagenjar
/usr/lib/hadoop/bin/hadoop jar $datagenjar org.apache.pig.test.utils.datagen.DataGenerator -conf $conf_file -m 1 -rows 10000000 -f words.dat s:8:50:z:0
------------------

The error I receive when trying to run it with the "-m 1" option (in cluster mode):

Caused by: java.lang.ClassNotFoundException: sdsu.algorithms.data.Zipf

So in local mode it successfully picks up the jar file sdsuLibJKD14.jar, but when running in cluster mode the jar is not found on the classpath?

Thanks,
Rob Stewart
