Hi Jeff, I have put the Hadoop conf on the classpath by setting the $HADOOP_CONF_DIR variable.
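For reference, this is roughly how I am wiring it up (a sketch; the install path is from my machine, and the PIG_CLASSPATH step reflects my understanding of how Pig's launcher script picks up extra classpath entries):

```shell
# Assumption: Hadoop 0.21.0 unpacked at the path below -- adjust to your layout.
export HADOOP_HOME=/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0
export HADOOP_CONF_DIR=$HADOOP_HOME/conf

# bin/pig appends PIG_CLASSPATH to the JVM classpath, so putting the conf dir
# here should let Pig read core-site.xml / hdfs-site.xml / mapred-site.xml
# instead of falling back to the local-filesystem defaults.
export PIG_CLASSPATH=$HADOOP_CONF_DIR
echo "$PIG_CLASSPATH"
```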
But I have both Pig and Hadoop running on the same machine, so localhost should not make a difference. I have used the default config settings for core-site.xml, hdfs-site.xml and mapred-site.xml, as per the Hadoop tutorial. Please let me know if my understanding is correct. I am attaching the conf files as well:

hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>The name of the default file system. A URI whose scheme
    and authority determine the FileSystem implementation. The uri's scheme
    determines the config property (fs.SCHEME.impl) naming the FileSystem
    implementation class. The uri's authority is used to determine the host,
    port, etc. for a filesystem.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication. The actual number of replications
    can be specified when the file is created. The default is used if
    replication is not specified at create time.</description>
  </property>
</configuration>

core-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/rahulmalviya/Documents/Hadoop/hadoop-0.21.0/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>

mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
    <description>The host and port that the MapReduce job tracker runs at.
    If "local", then jobs are run in-process as a single map and reduce
    task.</description>
  </property>
  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <value>8</value>
    <description>The maximum number of tasks that will be run simultaneously
    by a task tracker.</description>
  </property>
</configuration>

Please let me know if there is an issue in my configurations. Any input is valuable to me.

Thanks,
Rahul

On Aug 26, 2010, at 6:10 PM, Jeff Zhang wrote:

> Did you put the hadoop conf on the classpath? It seems you are still using
> the local file system but connecting to Hadoop's JobTracker.
> Make sure you set the correct configuration in core-site.xml,
> hdfs-site.xml and mapred-site.xml, and put them on the classpath.
>
>
>
> On Thu, Aug 26, 2010 at 5:32 PM, rahul <rmalv...@apple.com> wrote:
>> Hi,
>>
>> I am trying to integrate Pig with Hadoop for processing of jobs.
>>
>> I am able to run Pig in local mode and Hadoop with the streaming API perfectly.
>>
>> But when I try to run Pig with Hadoop I get the following error:
>>
>> Pig Stack Trace
>> ---------------
>> ERROR 2116: Unexpected error.
>> Could not validate the output specification for:
>> file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>
>> org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An unexpected
>> exception caused the validation to stop
>>         at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:56)
>>         at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:49)
>>         at org.apache.pig.impl.logicalLayer.validators.InputOutputFileValidator.validate(InputOutputFileValidator.java:37)
>>         at org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
>>         at org.apache.pig.PigServer.validate(PigServer.java:930)
>>         at org.apache.pig.PigServer.compileLp(PigServer.java:910)
>>         at org.apache.pig.PigServer.compileLp(PigServer.java:871)
>>         at org.apache.pig.PigServer.compileLp(PigServer.java:852)
>>         at org.apache.pig.PigServer.execute(PigServer.java:816)
>>         at org.apache.pig.PigServer.access$100(PigServer.java:105)
>>         at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
>>         at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
>>         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:109)
>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>>         at org.apache.pig.Main.main(Main.java:391)
>> Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 2116:
>> Unexpected error. Could not validate the output specification for:
>> file:///Users/rahulmalviya/Documents/Pig/dev/main_merged_hdp_out
>>         at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:93)
>>         at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:140)
>>         at org.apache.pig.impl.logicalLayer.LOStore.visit(LOStore.java:37)
>>         at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
>>         at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>         at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
>>         at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
>>         at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>>         at org.apache.pig.impl.plan.PlanValidator.validate(PlanValidator.java:50)
>>         ... 16 more
>> Caused by: java.io.IOException: Call to localhost/127.0.0.1:9001 failed on
>> local exception: java.io.EOFException
>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:775)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:743)
>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>         at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
>>         at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
>>         at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
>>         at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
>>         at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
>>         at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
>>         at org.apache.pig.impl.logicalLayer.validators.InputOutputFileVisitor.visit(InputOutputFileVisitor.java:89)
>>         ... 24 more
>> Caused by: java.io.EOFException
>>         at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>         at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:501)
>>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:446)
>> ================================================================================
>>
>> Did anyone get the same error? I think it is related to the connection
>> between Pig and Hadoop.
>>
>> Can someone tell me how to connect Pig and Hadoop?
>>
>> Thanks.
>>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
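For anyone hitting the same trace: a quick sketch of double-checking which JobTracker address the client side will actually dial, using only standard shell tools (the here-doc below just stands in for the mapred-site.xml attached earlier in this thread; the /tmp path is illustrative):

```shell
# Stand-in copy of the mapred-site.xml from this thread (assumption: same values).
cat > /tmp/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
EOF

# Pull the value element that follows the mapred.job.tracker name element;
# this is the host:port Pig's JobClient connects to in mapreduce mode.
JT=$(grep -A1 '<name>mapred.job.tracker</name>' /tmp/mapred-site.xml \
     | sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
echo "$JT"   # prints localhost:9001
```

If the address is right and the EOFException on the RPC call persists, one common cause worth ruling out is a version mismatch between the Hadoop jars Pig was built against and the Hadoop version actually running the JobTracker, since incompatible RPC wire formats surface exactly as an EOFException during getProtocolVersion.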