Thanks Alan. I fixed the cluster property in pig.properties and it worked. I
was simply following the instructions from the get-started wiki and seem to
have missed editing the pig.properties file.
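For anyone who lands on this thread later, a sketch of the pig.properties entries Alan is describing (the jobtracker hostname and port below are placeholders — substitute your own cluster's values):

```properties
# run jobs on the Hadoop cluster instead of the local job runner
exectype=mapreduce
# hostname:port of your cluster's JobTracker (placeholder values)
cluster=jobtracker.example.com:9001
```

With those set, and hadoop-site.xml on the classpath (java -cp /home/hadoop:pig.jar org.apache.pig.Main ...), the jobs go to the jobtracker rather than the LocalJobRunner shown in the quoted logs below.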
Thanks!
Prashanth
On Fri, May 23, 2008 at 3:21 PM, Alan Gates <[EMAIL PROTECTED]> wrote:
> A couple of questions. Is the hadoop-site.xml in your class path when you
> run pig? In your pig.properties file, what do you have exectype set to? It
> should be set to mapreduce. What do you have cluster set to? It should be
> the hostname:port for the job tracker of your cluster.
>
> Alan.
>
>
> Prashanth Pappu wrote:
>
>> All:
>>
>> I've seen a thread with a similar issue and it was left unresolved.
>> So, here it goes again -
>>
>> (a) I'm trying to get PIG to connect to a HADOOP cluster and execute a
>> script.
>> (a.1) The hadoop-site.xml file is in /home/hadoop and the script is at
>> /home/hadoop/tmp.pig
>>
>> (b) PIG finds the data file in DFS but does not run any mapreduce jobs on
>> the cluster (of 6 nodes). Instead it runs all the mapreduce jobs using a
>> local job runner.
>>
>> (c) What am I missing? How do I get PIG to schedule its mapred jobs on the
>> cluster?
>>
>> Thanks,
>> Prashanth
>>
>>
>>
>>> verbose debug
>> java -cp /home/hadoop:pig.jar org.apache.pig.Main -v /home/hadoop/tmp.pig
>> 2008-05-23 16:43:44,636 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: local
>> 2008-05-23 16:43:44,656 [main] DEBUG org.apache.hadoop.conf.Configuration - java.io.IOException: config()
>>         at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:156)
>>         at org.apache.pig.backend.hadoop.datastorage.ConfigurationUtil.toConfiguration(ConfigurationUtil.java:14)
>>         at org.apache.pig.backend.hadoop.datastorage.HDataStorage.init(HDataStorage.java:45)
>>         at org.apache.pig.backend.hadoop.datastorage.HDataStorage.<init>(HDataStorage.java:36)
>>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:139)
>>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:106)
>>         at org.apache.pig.impl.PigContext.connect(PigContext.java:177)
>>         at org.apache.pig.PigServer.<init>(PigServer.java:149)
>>         at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:43)
>>         at org.apache.pig.Main.main(Main.java:295)
>> ...
>>
>>
>>
>>> simple debug
>> java -cp /home/hadoop:pig.jar org.apache.pig.Main /home/hadoop/tmp.pig
>> 2008-05-23 16:51:13,076 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: local
>> 2008-05-23 16:51:13,386 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
>> 2008-05-23 16:51:14,038 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - ----- MapReduce Job -----
>> 2008-05-23 16:51:14,038 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Input: [/user/hadoop/prashanth/log1:PigStorage(','), /user/hadoop/prashanth/log1:PigStorage(','), /user/hadoop/prashanth/log1:PigStorage(',')]
>> 2008-05-23 16:51:14,039 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map: [[*]->GENERATE {[FLOOR(GENERATE {[PROJECT $1]})],[PROJECT $3],[PROJECT $38],[PROJECT $37],[PROJECT $32]}->GENERATE {[org.apache.pig.impl.builtin.MULTIPLY(GENERATE {[FLOOR(GENERATE {[org.apache.pig.impl.builtin.DIVIDE(GENERATE {[PROJECT $0],['2']})]})],['2']})],[PROJECT $0],[PROJECT $1],[PROJECT $3],[PROJECT $2],[PROJECT $4]}, [*]->GENERATE {[FLOOR(GENERATE {[PROJECT $1]})],[PROJECT $3],[PROJECT $38],[PROJECT $37],[PROJECT $32]}->GENERATE {[org.apache.pig.impl.builtin.MULTIPLY(GENERATE {[org.apache.pig.impl.builtin.ADD(GENERATE {[FLOOR(GENERATE {[org.apache.pig.impl.builtin.DIVIDE(GENERATE {[PROJECT $0],['2']})]})],['1']})],['2']})],[PROJECT $0],[PROJECT $1],[PROJECT $3],[PROJECT $2],[PROJECT $4]}, [*]->GENERATE {[FLOOR(GENERATE {[PROJECT $1]})],[PROJECT $3],[PROJECT $38],[PROJECT $37],[PROJECT $32]}->GENERATE {[org.apache.pig.impl.builtin.MULTIPLY(GENERATE {[org.apache.pig.impl.builtin.ADD(GENERATE {[FLOOR(GENERATE {[org.apache.pig.impl.builtin.DIVIDE(GENERATE {[PROJECT $0],['2']})]})],['2']})],['2']})],[PROJECT $0],[PROJECT $1],[PROJECT $3],[PROJECT $2],[PROJECT $4]}]
>> 2008-05-23 16:51:14,039 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Group: null
>> 2008-05-23 16:51:14,039 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Combine: null
>> 2008-05-23 16:51:14,039 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce: null
>> 2008-05-23 16:51:14,039 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Output: /tmp/temp-711499347/tmp128716201:org.apache.pig.builtin.BinStorage
>> 2008-05-23 16:51:14,040 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Split: null
>> 2008-05-23 16:51:14,040 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Map parallelism: -1
>> 2008-05-23 16:51:14,040 [main] INFO org.apache.pig.backend.hadoop.executionengine.POMapreduce - Reduce parallelism: -1
>> 2008-05-23 16:51:15,640 [Thread-14] INFO org.apache.hadoop.mapred.MapTask - numReduceTasks: 1
>> 2008-05-23 16:51:16,531 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapreduceExec.MapReduceLauncher - Pig progress = 0%
>> 2008-05-23 16:51:17,344 [Thread-14] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_1
>>
>>
>>
>