Aren't you missing the bucket name? See the sketch below the quoted thread.

On Mon, Jun 14, 2010 at 7:00 AM, Dave Viner <[email protected]> wrote:

> Here's the stack trace related to that error:
>
> Pig Stack Trace
> ---------------
> ERROR 2997: Unable to recreate exception from backend error:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
> create input splits for: s3n://my-key:my-skey@/log/file/path/2010.04.13.20:05:04.log.bz2
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
> open iterator for alias LOGS
>         at org.apache.pig.PigServer.openIterator(PigServer.java:521)
>         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
>         at org.apache.pig.Main.main(Main.java:357)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997:
> Unable to recreate exception from backend error:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
> create input splits for: s3n://my-key:my-skey@/log/file/path/2010.04.13.20:05:04.log.bz2
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:169)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:268)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
>         at org.apache.pig.PigServer.store(PigServer.java:569)
>         at org.apache.pig.PigServer.openIterator(PigServer.java:504)
>         ... 6 more
>
> After much more experimentation, I discovered that if I copy the file
> locally before executing Pig, the script works properly. That is, I ran:
>
>     % /usr/local/hadoop/bin/hadoop dfs -copyToLocal "s3n:///log/file/path/2010-04-13-20-05-04.log.bz2" test.bz2
>
> Then, in Pig, I read the file in using:
>
>     logstest2 = load 'test.bz2' USING PigStorage('\t');
>
> and it worked fine.
>
> One additional problem I discovered, at least for HDFS, is that dfs
> -copyToLocal does not work for a file with a ':' in the name. When I
> replaced the ':' with '-', it worked fine. However, even using the '-'
> filename, Pig would not open the remote file.
>
> Dave Viner
>
> On Sun, Jun 13, 2010 at 11:09 PM, Ashutosh Chauhan <[email protected]> wrote:
>
>> Dave,
>>
>> A log file should be sitting in the directory from which you are running
>> Pig. It will contain the stack trace for the failure. Can you paste the
>> contents of the log file here?
>>
>> Ashutosh
>>
>> On Sun, Jun 13, 2010 at 19:36, Dave Viner <[email protected]> wrote:
>>
>> > I'm having trouble using S3 as a data source for files in the LOAD
>> > statement. From research, it definitely appears that I want s3n://, not
>> > s3://, because the file was placed there by another (non-Hadoop/Pig)
>> > process.
>> >
>> > So, here's the basic step:
>> >
>> >     LOGS = LOAD 's3n://my-key:my-skey@/log/file/path/2010.04.13.20:05:04.log.bz2'
>> >            USING PigStorage('\t');
>> >     dump LOGS;
>> >
>> > I get this grunt error:
>> >
>> >     org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
>> >     create input splits for: s3n://my-key:my-skey@/log/file/path/2010.04.13.20:05:04.log.bz2
>> >
>> > Is there some other way I can/should specify a file from S3 as the
>> > source of a LOAD statement?
>> >
>> > Thanks,
>> > Dave Viner
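With s3n:// the bucket name has to appear between the '@' and the first path
component; in the URI above there is nothing between them, which is most
likely why the input splits can't be created. A minimal, untested sketch of
the shape I'd expect, with 'your-log-bucket' standing in for whatever bucket
the file actually lives in:

    -- 'your-log-bucket' is a placeholder; keep your real access/secret keys
    LOGS = LOAD 's3n://my-key:my-skey@your-log-bucket/log/file/path/2010.04.13.20:05:04.log.bz2'
           USING PigStorage('\t');
    dump LOGS;

If you'd rather not embed the credentials in the URI, you should also be
able to set fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey in your
Hadoop configuration and just load from 's3n://your-log-bucket/log/file/path/...'.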
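One more thing: the ':' in the file name itself may still trip you up even
with the bucket in place. Hadoop's Path class does not accept ':' inside a
path component (that is the same restriction you hit with dfs -copyToLocal),
so the cleanest fix is probably to rename the object so the key uses '-'
instead of ':'. If you have s3cmd handy, something along these lines should
do it (untested sketch; the bucket name is again a placeholder):

    # s3cmd talks to S3 directly, so the ':' in the existing key is not a problem here
    s3cmd mv "s3://your-log-bucket/log/file/path/2010.04.13.20:05:04.log.bz2" \
             "s3://your-log-bucket/log/file/path/2010.04.13.20-05-04.log.bz2"

Or, if you control the process that writes these logs, have it use '-' in
the timestamp to begin with.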
-- Dan Di Spaltro
