Hi all,

I'm facing a weird problem and wondering if anyone has run into it before. I've been playing with PigServer to programmatically run some simple Pig scripts, and it does not seem to connect to HDFS when I pass in ExecType.MAPREDUCE. I am running in pseudo-distributed mode, with the tasktracker and namenode both on their default ports. When I run scripts with "pig script.pig" or from the grunt console, Pig connects to HDFS and works fine.

Do I need to specify some additional properties in the PigServer constructor, or construct a custom PigContext? I had assumed that by passing ExecType.MAPREDUCE and using the defaults, everything would just work.

I would really appreciate any insight, or anecdotes from others using PigServer and how they have it set up. Thanks a bunch!

-Zach

Here is the code I'm using:

    PigServer pigServer = new PigServer("mapreduce");
    pigServer.setBatchOn();
    pigServer.registerScript("/Users/zach/Desktop/test.pig");
    List<ExecJob> jobs = pigServer.executeBatch();

And
here is the log output:

    0    [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
    622  [main] INFO  org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for pages
    622  [main] INFO  org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No map keys pruned for pages
    659  [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
    751  [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:///output:PigStorage) - 1-70 Operator Key: 1-70)
    789  [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
    790  [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
    815  [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    822  [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    822  [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2534 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2582 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2582 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2590 [Thread-4] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    2746 [Thread-4] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    2765 [Thread-4] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
    3083 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    3084 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
    3084 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
    3085 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - There is no log file to write to.
    3085 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Backend error message during job submission
    org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: file:///input
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
        at java.lang.Thread.run(Thread.java:637)
    Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/input
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
        at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:258)
        ... 7 more
    3092 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:///output"
    3092 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
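P.S. In case it helps the discussion: if explicit configuration turns out to be the answer, here is a sketch of what I'd guess it looks like, using the PigServer(ExecType, Properties) constructor. The fs.default.name and mapred.job.tracker values are just the standard pseudo-distributed settings from the Hadoop quickstart; I haven't verified that this is the right fix, so please treat it as a guess rather than a working answer:

```java
import java.util.List;
import java.util.Properties;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.apache.pig.backend.executionengine.ExecJob;

public class PigServerHdfsTest {
    public static void main(String[] args) throws Exception {
        // Guess: point Pig at the cluster explicitly instead of relying on
        // whatever hadoop-site/core-site config is (or isn't) on the classpath.
        // localhost:9000 / localhost:9001 are the stock pseudo-distributed
        // values from the Hadoop docs; substitute your actual ports.
        Properties props = new Properties();
        props.setProperty("fs.default.name", "hdfs://localhost:9000");
        props.setProperty("mapred.job.tracker", "localhost:9001");

        PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
        pigServer.setBatchOn();
        pigServer.registerScript("/Users/zach/Desktop/test.pig");
        List<ExecJob> jobs = pigServer.executeBatch();
    }
}
```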