A = LOAD 'file://home/geoffeg/test.json' will try to load using a relative path. Pig will understand file:/home/geoffeg/test.json or file:///home/geoffeg/test.json as an absolute path on the local filesystem. The same goes for a file on hdfs://.
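For example, something along these lines should pick the file up (a sketch only: the loader class is copied from your script, and the hdfs:// host, port and /user path are just placeholders for your own cluster):

    -- local filesystem, note the three slashes
    A = LOAD 'file:///home/geoffeg/test.json'
        USING org.geoffeg.hadoop.pig.loader.PigJsonLoader() AS (json: map[]);

    -- or out of HDFS (namenode host/port and path are placeholders)
    A = LOAD 'hdfs://namenode:8020/user/geoffeg/test.json'
        USING org.geoffeg.hadoop.pig.loader.PigJsonLoader() AS (json: map[]);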
HTH,
Joe

On Sun, Jan 9, 2011 at 11:47 PM, Geoffrey Gallaway <[email protected]> wrote:

> Hello, I'm looking for some clues to help me fix an annoying error I'm
> getting using Pig.
>
> I need to parse a large JSON file so I grabbed kimsterv's
> (https://gist.github.com/601331) JSON loader, compiled it and successfully
> tested it on my laptop via -x local. However, when I try to run it on the
> edgenode of our dev hadoop instance I am unable to get it to work, even if
> I run it in -x local. I get
> "org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable
> to create input splits for test.json". I looked through the mailing list
> for this message, only to find a mention of it being related to LZO
> compression issues. I'm not using any file compression and this error
> still occurs when running in -x local on the edgenode of the dev cluster.
> Are there some environment variables I'm missing? Maybe some permissions
> issues I'm unaware of? Suggestions and theories welcome!
>
> Hadoop version: Hadoop 0.20.2+737
> Pig version: 0.7.0+16 (compiled against the pig 0.7.0 jar)
>
> Command line:
> java -cp '/usr/lib/pig/*:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:libs/*:.' org.apache.pig.Main -v -x local json.pig
>
> Pig script:
> REGISTER /home/geoffeg/pig-functions/jsontester.jar;
> -- file:// should specify the local FS, remove file:// to specify HDFS
> A = LOAD 'file://home/geoffeg/test.json' using org.geoffeg.hadoop.pig.loader.PigJsonLoader() as ( json: map[] );
> B = foreach A generate json#'_keyword';
> DUMP B;
>
> Full error/log:
> 2011-01-09 22:33:29,692 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
> 2011-01-09 22:33:30,345 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for A
> 2011-01-09 22:33:30,345 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - Map key required for A: $0->[_keyword]
> 2011-01-09 22:33:30,455 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp1814319995/tmp1141533149:org.apache.pig.builtin.BinStorage) - 1-36 Operator Key: 1-36)
> 2011-01-09 22:33:30,482 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
> 2011-01-09 22:33:30,482 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
> 2011-01-09 22:33:30,517 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
> 2011-01-09 22:33:30,522 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2011-01-09 22:33:32,520 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
> 2011-01-09 22:33:32,552 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> 2011-01-09 22:33:32,552 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
> 2011-01-09 22:33:32,562 [Thread-2] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 2011-01-09 22:33:32,692 [Thread-2] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area file:/tmp/hadoop-geoffeg/mapred/staging/geoffeg395595954/.staging/job_local_0001
> 2011-01-09 22:33:33,054 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> 2011-01-09 22:33:33,054 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 2011-01-09 22:33:33,054 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
> 2011-01-09 22:33:33,064 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1814319995/tmp1141533149"
> 2011-01-09 22:33:33,064 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written : Unable to determine number of records written
> 2011-01-09 22:33:33,065 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written : Unable to determine number of bytes written
> 2011-01-09 22:33:33,065 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
> 2011-01-09 22:33:33,065 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
> 2011-01-09 22:33:33,065 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
> 2011-01-09 22:33:33,133 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: file://home/geoffeg/test.json
> 2011-01-09 22:33:33,134 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B
>         at org.apache.pig.PigServer.openIterator(PigServer.java:607)
>         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:545)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:163)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:139)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>         at org.apache.pig.Main.main(Main.java:414)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: file://home/geoffeg/test.json
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:169)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:270)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1007)
>         at org.apache.pig.PigServer.store(PigServer.java:697)
>         at org.apache.pig.PigServer.openIterator(PigServer.java:590)
>         ... 6 more
>
> --
> Sent from my email client.
