Thanks for persevering, Rob! :)

-D

On Thu, Jan 14, 2010 at 4:16 PM, Rob Stewart
<[email protected]> wrote:
> Cheers Alan,
>
> Done.
>
> Rob.
>
>
> 2010/1/14 Alan Gates <[email protected]>
>
>> Rob,
>>
>> Feel free to update the wiki with your findings.  You don't have to be a
>> committer to change the wiki.
>>
>> Alan.
>>
>>
>> On Jan 14, 2010, at 12:15 PM, Rob Stewart wrote:
>>
>>> Hello Dmitriy!
>>>
>>> I have it solved; it was just a bit of trial and error based on the Hive
>>> bug report/fix I found.
>>>
>>> The report is indeed correct; the following works:
>>>
>>>   hadoop jar $datagenjar org.apache.pig.test.utils.datagen.DataGenerator \
>>>     -libjars $zipfjar -conf $conf_file -rows 10000000 -m 3 \
>>>     -f /scratch/tmpHDFS_files/wordsx1_skewed.dat s:8:50:z:0
>>>
>>> This means the Pig wiki is out of date for Hadoop 0.20, although its
>>> ordering is still correct for Hadoop 0.18 and earlier.
>>>
>>> May I propose that you update the wiki as follows:
>>> ------------------------
>>> DataGenerator usage:
>>>
>>> For 0.18.0:
>>>   hadoop jar -libjars $zipfjar $datagenjar \
>>>     org.apache.pig.test.utils.datagen.DataGenerator \
>>>     -conf $conf_file [options] colspec...
>>>
>>> For 0.20.0:
>>>   hadoop jar $datagenjar org.apache.pig.test.utils.datagen.DataGenerator \
>>>     -libjars $zipfjar -conf $conf_file [options] colspec...
>>> --------------
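The version-dependent ordering proposed for the wiki can be captured in a small Bash helper. This is only a sketch: `datagen_cmd` is a hypothetical function, not part of Pig or Hadoop, and it merely prints the command line so the argument order can be checked before running it. The 0.20 ordering is the one reported to work in this thread; the 0.18 ordering is copied from the Pig wiki text quoted above.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Pig or Hadoop): prints the DataGenerator
# command line with -libjars placed where each Hadoop version accepted it
# in this thread. It only prints; it does not run anything.
#   datagen_cmd <hadoop-version> <datagen-jar> <libjars> <conf-file> [datagen args...]
datagen_cmd() {
  local version=$1 datagenjar=$2 libjars=$3 conf=$4
  shift 4
  local class=org.apache.pig.test.utils.datagen.DataGenerator
  case "$version" in
    0.18*)
      # Pig wiki ordering: -libjars directly after "hadoop jar".
      printf '%s\n' "hadoop jar -libjars $libjars $datagenjar $class -conf $conf $*"
      ;;
    0.20*)
      # Ordering that worked on 0.20: job jar and main class first, then
      # -libjars and -conf, then DataGenerator's own options.
      printf '%s\n' "hadoop jar $datagenjar $class -libjars $libjars -conf $conf $*"
      ;;
    *)
      printf 'unsupported Hadoop version: %s\n' "$version" >&2
      return 1
      ;;
  esac
}
```

For example, `datagen_cmd 0.20.0 "$datagenjar" "$zipfjar" "$conf_file" -rows 10000000 -m 3 -f /scratch/tmpHDFS_files/wordsx1_skewed.dat s:8:50:z:0` prints the command Rob reported as working.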
>>>
>>> Sound OK?
>>>
>>>
>>> Rob Stewart
>>>
>>>
>>> 2010/1/14 Rob Stewart <[email protected]>
>>>
>>>> Yeah, unfortunately your suggestion does not work, and neither does the
>>>> order given on the Pig wiki. Instead, see the Hadoop wiki for -libjars
>>>> usage:
>>>>
>>>>   hadoop jar hadoop-examples.jar wordcount -files cachefile.txt \
>>>>     -libjars mylib.jar input output
>>>>
>>>> So I tried this:
>>>>   hadoop jar $datagenjar org.apache.pig.test.utils.datagen.DataGenerator \
>>>>     -conf $conf_file -rows 10000000 -f /scratch/tmpHDFS_files/wordsx1_skewed.dat \
>>>>     -libjars $zipfjar s:8:50:z:0
>>>>
>>>> However, DataGenerator then tries to parse it as one of its own options:
>>>> ---------
>>>> Couldn't parse the command line arguments, Found unknown option (-libjars) at position 5
>>>> ---------
>>>>
>>>> I'd be happy/surprised to hear from anyone who can use the format given
>>>> on the Pig wiki for the DataGenerator in cluster mode (using the -m parameter).
>>>>
>>>> Any more suggestions, Dmitriy? Thanks for your help; it's mucho
>>>> appreciated!
>>>>
>>>> Rob
>>>>
>>>>
>>>>
>>>> 2010/1/14 Dmitriy Ryaboy <[email protected]>
>>>>
>>>>> Sorry if I am not reading carefully enough -- but the bug report you
>>>>> cite seems to indicate you want
>>>>>
>>>>>   hadoop jar org.apache.pig.test.utils.datagen.DataGenerator -libjars \
>>>>>     $zipfjar $datagenjar -conf $conf_file -rows 10000000 \
>>>>>     -f /scratch/tmpHDFS_files/wordsx1_skewed.dat s:8:50:z:0
>>>>>
>>>>> (possibly separating zipfjar and datagenjar with commas if that patch
>>>>> was applied to your version of 20)
>>>>>
>>>>> which I don't see in the list of things you tried?
>>>>>
>>>>> -D
>>>>>
>>>>> On Thu, Jan 14, 2010 at 10:13 AM, Rob Stewart
>>>>> <[email protected]> wrote:
>>>>>
>>>>>> Hi Dmitriy,
>>>>>>
>>>>>> No, I do think there was a change in 0.20.0.
>>>>>>
>>>>>> See the error I get:
>>>>>> Exception in thread "main" java.io.IOException: Error opening job jar: -libjars
>>>>>>
>>>>>> This is what I am trying to run:
>>>>>>   hadoop jar -libjars $zipfjar $datagenjar \
>>>>>>     org.apache.pig.test.utils.datagen.DataGenerator -conf $conf_file \
>>>>>>     -rows 10000000 -f /scratch/tmpHDFS_files/wordsx1_skewed.dat s:8:50:z:0
>>>>>>
>>>>>> The $zipfjar variable points to only one jar file. It seems that there
>>>>>> was a change in Hadoop 0.20.0 that no longer allows the -libjars option
>>>>>> immediately after "hadoop jar".
>>>>>>
>>>>>> This is the extract from the Hive bug report I was talking about:
>>>>>> -------------
>>>>>>
>>>>>>
>>>>>> In hadoop-20 - the -libjars has to come after the jar file/class
>>>>>>
>>>>>> Please try applying this patch to bin/ext/cli.sh
>>>>>>
>>>>>> --- cli.sh  (revision 789726)
>>>>>> +++ cli.sh  (working copy)
>>>>>> @@ -10,7 +10,7 @@
>>>>>>   exit 3;
>>>>>>  fi
>>>>>>
>>>>>> -  exec $HADOOP jar $AUX_JARS_CMD_LINE ${HIVE_LIB}/hive_cli.jar $CLASS $HIVE_OPTS "$@"
>>>>>> +  exec $HADOOP jar ${HIVE_LIB}/hive_cli.jar $CLASS $AUX_JARS_CMD_LINE $HIVE_OPTS "$@"
>>>>>> }
>>>>>>
>>>>>> ----------------
>>>>>>
>>>>>> I have also tried:
>>>>>>   hadoop jar -libjars [full_location_to_sdsuLibJKD14.jar] $datagenjar \
>>>>>>     org.apache.pig.test.utils.datagen.DataGenerator -conf $conf_file \
>>>>>>     -rows 10000000 -f /scratch/tmpHDFS_files/wordsx1_skewed.dat s:8:50:z:0
>>>>>>
>>>>>> This gives the same error.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Rob
>>>>>>
>>>>>> 2010/1/14 Dmitriy Ryaboy <[email protected]>
>>>>>>
>>>>>>> I think the link you sent got mangled, but try separating the
>>>>>>> jars with a comma:
>>>>>>> http://issues.apache.org/jira/browse/HADOOP-4864
>>>>>>>
>>>>>>> On Thu, Jan 14, 2010 at 7:40 AM, Rob Stewart
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Dmitriy,
>>>>>>>>
>>>>>>>> OK, well, it seems that since 0.20.0 the order specified on the Pig
>>>>>>>> wiki is no longer valid:
>>>>>>>>   hadoop jar -libjars $zipfjar $datagenjar \
>>>>>>>>     org.apache.pig.test.utils.datagen.DataGenerator \
>>>>>>>>     -conf $conf_file [options] colspec...
>>>>>>>>
>>>>>>>> See this patch over at Hive for 0.20.0:
>>>>>>>>
>>>>>>>> http://mail-archives.apache.org/mod_mbox/hadoop-hive-user/200907.mbox/<dfd95197f3ae8c45b0a96c2f4ba3a2556c8358c...@sc-mbxc1.thefacebook.com>
>>>>>>>>
>>>>>>>> I have tried a few combinations, but I can't seem to fit
>>>>>>>> "-libjars $zipfjar" in anywhere now.
>>>>>>>>
>>>>>>>> Any ideas?
>>>>>>>>
>>>>>>>> Thanks for your help.
>>>>>>>>
>>>>>>>> Rob
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2010/1/14 Dmitriy Ryaboy <[email protected]>
>>>>>>>>
>>>>>>>>> Rob,
>>>>>>>>> You need to tell Hadoop which jars you need it to ship to the worker
>>>>>>>>> nodes. You include datagen.jar, etc., on the classpath, which makes
>>>>>>>>> them discoverable locally, but you aren't telling Hadoop to ship
>>>>>>>>> them. You want to list them, comma-separated, in the -libjars parameter.
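Dmitriy's point is that -libjars takes a comma-separated list, whereas HADOOP_CLASSPATH in Rob's script is colon-separated. A minimal Bash sketch for producing the comma-separated form; both function names are hypothetical and chosen only for illustration.

```bash
#!/usr/bin/env bash
# Sketch: build the comma-separated jar list that -libjars expects.
# join_libjars and classpath_to_libjars are illustrative names only.

# Join any number of jar paths with commas.
join_libjars() {
  local IFS=,          # "$*" joins the arguments using the first IFS character
  printf '%s\n' "$*"
}

# Convert a colon-separated classpath (as used for HADOOP_CLASSPATH)
# into a comma-separated -libjars value.
classpath_to_libjars() {
  printf '%s\n' "${1//:/,}"   # replace every ':' with ','
}
```

With the variables from the generateData script, `-libjars "$(classpath_to_libjars "$pigjar:$zipfjar")"` would list both jars as a single comma-separated argument.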
>>>>>>>>>
>>>>>>>>> -D
>>>>>>>>>
>>>>>>>>> On Thu, Jan 14, 2010 at 6:49 AM, Rob Stewart
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi there.
>>>>>>>>>>
>>>>>>>>>> I am well underway with comparing Pig, Hive, JAQL, etc.
>>>>>>>>>>
>>>>>>>>>> The DataGenerator is proving a valuable tool for me. Thanks for
>>>>>>>>>> that.
>>>>>>>>>>
>>>>>>>>>> I have one query. I am able to use it in local mode with no problem,
>>>>>>>>>> and some experiments are complete.
>>>>>>>>>>
>>>>>>>>>> However, I cannot seem to use it in MapReduce mode on the cluster.
>>>>>>>>>> Here are the contents of my file "generateData":
>>>>>>>>>> ------------------
>>>>>>>>>> export pigjar=$HOME/installation/pig/pig-0.5.0/pig-0.5.0-core.jar
>>>>>>>>>> export zipfjar=$HOME/installation/pig/pig-0.5.0/sdsuLibJKD14.jar
>>>>>>>>>> export datagenjar=$HOME/rs46/installation/DataGenerator/dist/MyPig.jar
>>>>>>>>>> export conf_file=/usr/lib/hadoop/conf/hadoop-site.xml
>>>>>>>>>> export HADOOP_CLASSPATH=$pigjar:$zipfjar:$datagenjar
>>>>>>>>>> /usr/lib/hadoop/bin/hadoop jar $datagenjar \
>>>>>>>>>>   org.apache.pig.test.utils.datagen.DataGenerator -conf $conf_file \
>>>>>>>>>>   -m 1 -rows 10000000 -f words.dat s:8:50:z:0
>>>>>>>>>> ------------------
>>>>>>>>>>
>>>>>>>>>> The error I receive when trying to run it with the "-m 1" option (in
>>>>>>>>>> cluster mode):
>>>>>>>>>> Caused by: java.lang.ClassNotFoundException: sdsu.algorithms.data.Zipf
>>>>>>>>>>
>>>>>>>>>> So in local mode it successfully picks up the jar file sdsuLibJKD14.jar,
>>>>>>>>>> but why is it not found when running in cluster mode?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> Rob Stewart
>>>>>>>>>>