Thanks to Joe and Daniel, I was able to fix this issue. It turned out to be a combination of ambiguity about the file paths (which Joe's message helped me confirm) and a bug in my Java loader that was failing silently rather than throwing an exception.
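For anyone who finds this in the archives, the "failing silently" part is the real trap: a custom LoadFunc whose getNext() catches exceptions and quietly ignores them will report success while producing no output and no clue. The sketch below is a minimal, hypothetical illustration of that pattern handled more loudly. The class name (NoisyJsonLoader), the string-only map handling, and everything else in it are invented for illustration; it is not kimsterv's loader or my actual code, and it only assumes Pig 0.7's LoadFunc API, Hadoop's new-API TextInputFormat, and the json-simple parser. (A second small sketch about the file:// path forms is in the P.S. after the quoted thread below.)

// Hypothetical example only: names are invented, not the real loader.
// Assumes Pig 0.7's LoadFunc API, Hadoop's new-API TextInputFormat, and json-simple.
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.pig.LoadFunc;
import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

public class NoisyJsonLoader extends LoadFunc {
    private static final Log LOG = LogFactory.getLog(NoisyJsonLoader.class);
    private final TupleFactory tupleFactory = TupleFactory.getInstance();
    private final JSONParser parser = new JSONParser();
    private RecordReader reader;

    @Override
    public void setLocation(String location, Job job) throws IOException {
        // Pig hands us whatever path string appeared in the LOAD statement.
        FileInputFormat.setInputPaths(job, location);
    }

    @Override
    public InputFormat getInputFormat() throws IOException {
        return new TextInputFormat(); // one JSON object per line
    }

    @Override
    public void prepareToRead(RecordReader reader, PigSplit split) throws IOException {
        this.reader = reader;
    }

    @Override
    public Tuple getNext() throws IOException {
        try {
            while (reader.nextKeyValue()) {
                String line = reader.getCurrentValue().toString();
                try {
                    JSONObject json = (JSONObject) parser.parse(line);
                    Map<String, String> map = new HashMap<String, String>();
                    for (Object key : json.keySet()) {
                        Object value = json.get(key);
                        map.put(key.toString(), value == null ? null : value.toString());
                    }
                    return tupleFactory.newTuple(map);
                } catch (ParseException e) {
                    // The dangerous version is an empty catch block here: the job
                    // "succeeds" with no output and no hint of what went wrong.
                    // At minimum, log the offending record so the failure is visible.
                    LOG.warn("Skipping unparseable JSON line: " + line, e);
                }
            }
            return null; // end of this input split
        } catch (InterruptedException e) {
            throw new IOException("Interrupted while reading input", e);
        }
    }
}

If you would rather have the whole job die on the first bad record, rethrow the ParseException wrapped in an IOException instead of logging and skipping.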
Thanks,
Geoff

On Wed, Jan 12, 2011 at 7:43 AM, Joe Crobak <[email protected]> wrote:

> A = LOAD 'file://home/geoffeg/test.json' will try to load using a relative path. Pig will understand file:/home/geoffeg/test.json or file:///home/geoffeg/test.json to load the absolute path. Same goes for a file in hdfs://
>
> HTH,
> Joe
>
> On Sun, Jan 9, 2011 at 11:47 PM, Geoffrey Gallaway <[email protected]> wrote:
>
> > Hello, I'm looking for some clues to help me fix an annoying error I'm getting using Pig.
> >
> > I need to parse a large JSON file so I grabbed kimsterv's (https://gist.github.com/601331) JSON loader, compiled it and successfully tested it on my laptop via -x local. However, when I try to run it on the edgenode of our dev hadoop instance I am unable to get it to work, even if I run it in -x local. I get "org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for test.json". I looked through the mailing list for this message, only to find a mention of it being related to LZO compression issues. I'm not using any file compression and this error still occurs when running in -x local on the edgenode of the dev cluster. Are there some environment variables I'm missing? Maybe some permissions issues I'm unaware of? Suggestions and theories welcome!
> >
> > Hadoop version: Hadoop 0.20.2+737
> > Pig version: 0.7.0+16 (compiled against the pig 0.7.0 jar)
> >
> > Command line:
> > java -cp '/usr/lib/pig/*:/usr/lib/hadoop/*:/usr/lib/hadoop/lib/*:libs/*:.' org.apache.pig.Main -v -x local json.pig
> >
> > Pig script:
> > REGISTER /home/geoffeg/pig-functions/jsontester.jar;
> > -- file:// should specify the local FS, remove file:// to specify HDFS
> > A = LOAD 'file://home/geoffeg/test.json' using org.geoffeg.hadoop.pig.loader.PigJsonLoader() as ( json: map[] );
> > B = foreach A generate json#'_keyword';
> > DUMP B;
> >
> > Full error/log:
> > 2011-01-09 22:33:29,692 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
> > 2011-01-09 22:33:30,345 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - No column pruned for A
> > 2011-01-09 22:33:30,345 [main] INFO org.apache.pig.impl.logicalLayer.optimizer.PruneColumns - Map key required for A: $0->[_keyword]
> > 2011-01-09 22:33:30,455 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: Store(file:/tmp/temp1814319995/tmp1141533149:org.apache.pig.builtin.BinStorage) - 1-36 Operator Key: 1-36)
> > 2011-01-09 22:33:30,482 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
> > 2011-01-09 22:33:30,482 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
> > 2011-01-09 22:33:30,517 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
> > 2011-01-09 22:33:30,522 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> > 2011-01-09 22:33:32,520 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
> > 2011-01-09 22:33:32,552 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
> > 2011-01-09 22:33:32,552 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
> > 2011-01-09 22:33:32,562 [Thread-2] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> > 2011-01-09 22:33:32,692 [Thread-2] INFO org.apache.hadoop.mapred.JobClient - Cleaning up the staging area file:/tmp/hadoop-geoffeg/mapred/staging/geoffeg395595954/.staging/job_local_0001
> > 2011-01-09 22:33:33,054 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
> > 2011-01-09 22:33:33,054 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> > 2011-01-09 22:33:33,054 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
> > 2011-01-09 22:33:33,064 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "file:/tmp/temp1814319995/tmp1141533149"
> > 2011-01-09 22:33:33,064 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Records written : Unable to determine number of records written
> > 2011-01-09 22:33:33,065 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Bytes written : Unable to determine number of bytes written
> > 2011-01-09 22:33:33,065 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Spillable Memory Manager spill count : 0
> > 2011-01-09 22:33:33,065 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Proactive spill count : 0
> > 2011-01-09 22:33:33,065 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
> > 2011-01-09 22:33:33,133 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: file://home/geoffeg/test.json
> > 2011-01-09 22:33:33,134 [main] ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B
> >         at org.apache.pig.PigServer.openIterator(PigServer.java:607)
> >         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:545)
> >         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
> >         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:163)
> >         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:139)
> >         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> >         at org.apache.pig.Main.main(Main.java:414)
> > Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: file://home/geoffeg/test.json
> >         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:169)
> >         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:270)
> >         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
> >         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1007)
> >         at org.apache.pig.PigServer.store(PigServer.java:697)
> >         at org.apache.pig.PigServer.openIterator(PigServer.java:590)
> >         ... 6 more
> >
> > --
> > Sent from my email client.
>
--
Sent from my email client.
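P.S. On Joe's point about the path forms: the quickest way to see why 'file://home/geoffeg/test.json' misbehaves is to let a plain URI parser at it. In scheme://authority/path syntax, "home" lands in the authority (host) slot rather than in the path. The snippet below is just an illustrative check with java.net.URI (Pig and Hadoop do their own path handling on top of this, so treat it as an illustration of the ambiguity, not of Pig internals):

import java.net.URI;

public class FileUriCheck {
    public static void main(String[] args) throws Exception {
        String[] candidates = {
            "file://home/geoffeg/test.json",  // "home" is parsed as the host
            "file:/home/geoffeg/test.json",   // no authority, absolute path
            "file:///home/geoffeg/test.json"  // empty authority, absolute path
        };
        for (String s : candidates) {
            URI u = new URI(s);
            System.out.println(s + "  ->  host=" + u.getHost() + ", path=" + u.getPath());
        }
    }
}

The first form comes back with host=home and path=/geoffeg/test.json, while the other two both resolve to /home/geoffeg/test.json, which matches the behavior Joe describes.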
