Thanks for persevering, Rob! :)

-D

On Thu, Jan 14, 2010 at 4:16 PM, Rob Stewart <[email protected]> wrote:
> Cheers Alan,
>
> Done.
>
> Rob.
>
> 2010/1/14 Alan Gates <[email protected]>
>> Rob,
>>
>> Feel free to update the wiki with your findings. You don't have to be a
>> committer to change the wiki.
>>
>> Alan.
>>
>> On Jan 14, 2010, at 12:15 PM, Rob Stewart wrote:
>>> Hello Dmitriy!
>>>
>>> I have it solved; it was just a bit of trial and error based on the Hive
>>> bug report/fix I found.
>>>
>>> The report is indeed correct. The following works:
>>>
>>> hadoop jar $datagenjar org.apache.pig.test.utils.datagen.DataGenerator
>>> -libjars $zipfjar -conf $conf_file -rows 10000000 -m 3 -f
>>> /scratch/tmpHDFS_files/wordsx1_skewed.dat s:8:50:z:0
>>>
>>> This puts the Pig wiki out of date for Hadoop 0.20, but it is still
>>> relevant for Hadoop 0.18 and earlier.
>>>
>>> May I propose that you update the wiki as follows:
>>> ------------------------
>>> DataGenerator Usage:
>>> For 0.18.0
>>>
>>> hadoop jar -libjars $zipfjar $datagenjar
>>> org.apache.pig.test.utils.datagen.DataGenerator -conf $conf_file
>>> [options] colspec...
>>>
>>> For 0.20.0
>>>
>>> hadoop jar $datagenjar org.apache.pig.test.utils.datagen.DataGenerator
>>> -libjars $zipfjar -conf $conf_file [options] colspec...
>>> --------------
>>>
>>> Sound OK?
>>>
>>> Rob Stewart
>>>
>>> 2010/1/14 Rob Stewart <[email protected]>
>>>> Yeah, unfortunately your suggestion does not work, and neither does the
>>>> order given on the Pig wiki.
>>>> Instead, see the Hadoop wiki for -libjars usage:
>>>>
>>>> hadoop jar hadoop-examples.jar wordcount -files cachefile.txt -libjars
>>>> mylib.jar input output
>>>>
>>>> So I tried this:
>>>> hadoop jar $datagenjar org.apache.pig.test.utils.datagen.DataGenerator
>>>> -conf $conf_file -rows 10000000 -f
>>>> /scratch/tmpHDFS_files/wordsx1_skewed.dat
>>>> -libjars $zipfjar s:8:50:z:0
>>>>
>>>> However, the DataGenerator does not accept it as one of its options:
>>>> ---------
>>>> Couldn't parse the command line arguments, Found unknown option
>>>> (-libjars) at position 5
>>>> ---------
>>>>
>>>> I'd be happy/surprised to hear from anyone who can use the format given
>>>> on the Pig wiki for the DataGenerator in cluster mode (using the -m
>>>> parameter).
>>>>
>>>> Any more suggestions, Dmitriy? And thanks for your help, it's mucho
>>>> appreciated!
>>>>
>>>> Rob
>>>>
>>>> 2010/1/14 Dmitriy Ryaboy <[email protected]>
>>>>> Sorry if I am not reading carefully enough -- but the bug report you
>>>>> cite seems to indicate you want
>>>>>
>>>>> hadoop jar org.apache.pig.test.utils.datagen.DataGenerator -libjars
>>>>> $zipfjar $datagenjar -conf $conf_file -rows
>>>>> 10000000 -f /scratch/tmpHDFS_files/wordsx1_skewed.dat s:8:50:z:0
>>>>>
>>>>> (possibly separating zipfjar and datagenjar with commas if that patch
>>>>> was applied to your version of 20)
>>>>>
>>>>> which I don't see in the list of things you tried?
>>>>>
>>>>> -D
>>>>>
>>>>> On Thu, Jan 14, 2010 at 10:13 AM, Rob Stewart
>>>>> <[email protected]> wrote:
>>>>>> Hi Dmitriy,
>>>>>>
>>>>>> No, I do think that there was a change in 0.20.0.
>>>>>>
>>>>>> See the error I get:
>>>>>> Exception in thread "main" java.io.IOException: Error opening job jar:
>>>>>> -libjars
>>>>>>
>>>>>> This is what I am trying to run:
>>>>>> hadoop jar -libjars $zipfjar $datagenjar
>>>>>> org.apache.pig.test.utils.datagen.DataGenerator -conf $conf_file -rows
>>>>>> 10000000 -f /scratch/tmpHDFS_files/wordsx1_skewed.dat s:8:50:z:0
>>>>>>
>>>>>> $zipfjar holds only one jar file in this classpath. It seems that
>>>>>> there was a change in Hadoop 0.20.0 that no longer allows the -libjars
>>>>>> option immediately after "hadoop jar".
>>>>>>
>>>>>> This is the extract from the Hive bug report I was talking about:
>>>>>> -------------
>>>>>>
>>>>>> In hadoop-20 - the -libjars has to come after the jar file/class
>>>>>>
>>>>>> Please try applying this patch to bin/ext/cli.sh
>>>>>>
>>>>>> --- cli.sh (revision 789726)
>>>>>> +++ cli.sh (working copy)
>>>>>> @@ -10,7 +10,7 @@
>>>>>> exit 3;
>>>>>> fi
>>>>>>
>>>>>> - exec $HADOOP jar $AUX_JARS_CMD_LINE ${HIVE_LIB}/hive_cli.jar $CLASS $HIVE_OPTS "$@"
>>>>>> + exec $HADOOP jar ${HIVE_LIB}/hive_cli.jar $CLASS $AUX_JARS_CMD_LINE $HIVE_OPTS "$@"
>>>>>> }
>>>>>>
>>>>>> ----------------
>>>>>>
>>>>>> I have also tried:
>>>>>> hadoop jar -libjars [full_location_to_sdsuLibJKD14.jar] $datagenjar
>>>>>> org.apache.pig.test.utils.datagen.DataGenerator -conf $conf_file -rows
>>>>>> 10000000 -f /scratch/tmpHDFS_files/wordsx1_skewed.dat s:8:50:z:0
>>>>>>
>>>>>> This gives the same error.
>>>>>>
>>>>>> Rob
>>>>>>
>>>>>> 2010/1/14 Dmitriy Ryaboy <[email protected]>
>>>>>>> I think the link you sent got malformatted, but try separating the
>>>>>>> jars with a comma
>>>>>>> http://issues.apache.org/jira/browse/HADOOP-4864
>>>>>>>
>>>>>>> On Thu, Jan 14, 2010 at 7:40 AM, Rob Stewart
>>>>>>> <[email protected]> wrote:
>>>>>>>> Hi Dmitriy,
>>>>>>>>
>>>>>>>> OK, well it seems that since 0.20.0 the order as specified on the Pig
>>>>>>>> wiki is no longer relevant:
>>>>>>>> hadoop jar -libjars $zipfjar $datagenjar
>>>>>>>> org.apache.pig.test.utils.datagen.DataGenerator -conf $conf_file
>>>>>>>> [options] colspec...
>>>>>>>>
>>>>>>>> See this patch over at Hive for 0.20.0:
>>>>>>>>
>>>>>>>> http://mail-archives.apache.org/mod_mbox/hadoop-hive-user/200907.mbox/<dfd95197f3ae8c45b0a96c2f4ba3a2556c8358c...@sc-mbxc1.thefacebook.com>
>>>>>>>>
>>>>>>>> I have tried a few combinations, but I can't seem to fit
>>>>>>>> "-libjars $zipfjar" in anywhere now.
>>>>>>>>
>>>>>>>> Any ideas?
>>>>>>>>
>>>>>>>> Thanks for your help.
>>>>>>>>
>>>>>>>> Rob
>>>>>>>>
>>>>>>>> 2010/1/14 Dmitriy Ryaboy <[email protected]>
>>>>>>>>> Rob,
>>>>>>>>> You need to tell Hadoop which jars you need it to ship to the worker
>>>>>>>>> nodes. You include datagen.jar, etc., on the classpath, which makes
>>>>>>>>> them discoverable locally, but you aren't telling Hadoop to ship
>>>>>>>>> them. You want to list them, comma-separated, in the -libjars
>>>>>>>>> parameter.
>>>>>>>>>
>>>>>>>>> -D
>>>>>>>>>
>>>>>>>>> On Thu, Jan 14, 2010 at 6:49 AM, Rob Stewart
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> Hi there.
>>>>>>>>>>
>>>>>>>>>> I am well underway with comparing Pig, Hive, JAQL etc...
>>>>>>>>>>
>>>>>>>>>> The DataGenerator is proving a valuable tool for me.
>>>>>>>>>> Thanks for that.
>>>>>>>>>>
>>>>>>>>>> I have one query. I am able to use it in local mode, no problem, and
>>>>>>>>>> some experiments are complete.
>>>>>>>>>>
>>>>>>>>>> However, I cannot seem to use it in MapReduce mode on the cluster.
>>>>>>>>>> These are the contents of my file "generateData":
>>>>>>>>>> ------------------
>>>>>>>>>> export pigjar=$HOME/installation/pig/pig-0.5.0/pig-0.5.0-core.jar
>>>>>>>>>> export zipfjar=$HOME/installation/pig/pig-0.5.0/sdsuLibJKD14.jar
>>>>>>>>>> export datagenjar=$HOME/rs46/installation/DataGenerator/dist/MyPig.jar
>>>>>>>>>> export conf_file=/usr/lib/hadoop/conf/hadoop-site.xml
>>>>>>>>>> export HADOOP_CLASSPATH=$pigjar:$zipfjar:$datagenjar
>>>>>>>>>> /usr/lib/hadoop/bin/hadoop jar $datagenjar
>>>>>>>>>> org.apache.pig.test.utils.datagen.DataGenerator -conf $conf_file -m 1
>>>>>>>>>> -rows 10000000 -f words.dat s:8:50:z:0
>>>>>>>>>> ------------------
>>>>>>>>>>
>>>>>>>>>> The error I receive when trying to run it with the "-m 1" option (in
>>>>>>>>>> cluster mode):
>>>>>>>>>> Caused by: java.lang.ClassNotFoundException: sdsu.algorithms.data.Zipf
>>>>>>>>>>
>>>>>>>>>> So in local mode, it successfully picks up the jar file
>>>>>>>>>> sdsuLibJKD14.jar, but when running it in cluster mode, this
>>>>>>>>>> classpath is not found?
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> Rob Stewart
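
For anyone finding this thread later, the two argument orders discussed above can be sketched as a small shell helper. This is only an illustration: datagen_cmd is a hypothetical function name, the default jar and config paths are placeholders, and the version patterns cover just what the thread reports (0.20 works with -libjars after the class; 0.18 and earlier use the Pig wiki order). It prints the command rather than running it, so nothing here depends on a Hadoop install.

```shell
#!/bin/sh
# Sketch only: datagen_cmd is a hypothetical helper, not part of Pig or Hadoop.
# It prints (does not execute) the DataGenerator command line, using the
# argument order reported in this thread for each Hadoop version.

# Placeholder paths (assumptions); override them for a real installation.
: "${datagenjar:=MyPig.jar}"
: "${zipfjar:=sdsuLibJKD14.jar}"
: "${conf_file:=hadoop-site.xml}"

datagen_cmd() {
    hadoop_version="$1"
    shift
    class=org.apache.pig.test.utils.datagen.DataGenerator
    case "$hadoop_version" in
        0.20*)
            # Hadoop 0.20: -libjars goes after the job jar and the main class.
            printf '%s\n' "hadoop jar $datagenjar $class -libjars $zipfjar -conf $conf_file $*"
            ;;
        0.1[0-8]*)
            # Hadoop 0.18 and earlier (Pig wiki order): -libjars comes first.
            printf '%s\n' "hadoop jar -libjars $zipfjar $datagenjar $class -conf $conf_file $*"
            ;;
        *)
            # Other versions were not discussed in the thread.
            echo "datagen_cmd: no known argument order for Hadoop $hadoop_version" >&2
            return 1
            ;;
    esac
}

# Example: show the command for Hadoop 0.20 in cluster mode (-m 3).
datagen_cmd 0.20.0 -rows 10000000 -m 3 -f words.dat s:8:50:z:0
```

To ship more than one jar, the -libjars value would be a comma-separated list, as Dmitriy suggested; whether that works depends on the Hadoop build (see HADOOP-4864).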
