Aren't you missing the bucket name? See the sketch below the quoted thread.

On Mon, Jun 14, 2010 at 7:00 AM, Dave Viner <[email protected]> wrote:

> Here's the stack trace related to that error:
>
> Pig Stack Trace
> ---------------
> ERROR 2997: Unable to recreate exception from backend error:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
> create input splits for: s3n://my-key:my-skey@/log/file/path/2010.04.13.20:05:04.log.bz2
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to
> open iterator for alias LOGS
>         at org.apache.pig.PigServer.openIterator(PigServer.java:521)
>         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
>         at org.apache.pig.Main.main(Main.java:357)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997:
> Unable to recreate exception from backend error:
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
> create input splits for: s3n://my-key:my-skey@/log/file/path/2010.04.13.20:05:04.log.bz2
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:169)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:268)
>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
>         at org.apache.pig.PigServer.store(PigServer.java:569)
>         at org.apache.pig.PigServer.openIterator(PigServer.java:504)
>         ... 6 more
>
> After much more experimentation, I discovered that if I copy the file
> locally before executing Pig, the script works properly. That is, I ran:
>
>     % /usr/local/hadoop/bin/hadoop dfs -copyToLocal "s3n:///log/file/path/2010-04-13-20-05-04.log.bz2" test.bz2
>
> Then, in Pig, I read the file in using:
>
>     logstest2 = load 'test.bz2' USING PigStorage('\t');
>
> and it worked fine.
>
> One additional problem I discovered, at least for HDFS, is that dfs
> -copyToLocal does not work for a file with a ':' in the name. When I
> replaced the ':' with '-', it worked fine. However, even using the '-'
> filename, Pig would not open the remote file.
>
> Dave Viner
>
> On Sun, Jun 13, 2010 at 11:09 PM, Ashutosh Chauhan <[email protected]> wrote:
>
>> Dave,
>>
>> A log file should be sitting in the directory from which you are running
>> Pig. It will contain the stack trace for the failure. Can you paste the
>> contents of the log file here?
>>
>> Ashutosh
>>
>> On Sun, Jun 13, 2010 at 19:36, Dave Viner <[email protected]> wrote:
>>
>> > I'm having trouble using S3 as a data source for files in the LOAD
>> > statement. From research, it definitely appears that I want s3n://, not
>> > s3://, because the file was placed there by another (non-Hadoop/Pig)
>> > process.
>> >
>> > So, here's the basic step:
>> >
>> >     LOGS = LOAD 's3n://my-key:my-skey@/log/file/path/2010.04.13.20:05:04.log.bz2'
>> >            USING PigStorage('\t');
>> >     dump LOGS;
>> >
>> > I get this grunt error:
>> >
>> >     org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
>> >     create input splits for: s3n://my-key:my-skey@/log/file/path/2010.04.13.20:05:04.log.bz2
>> >
>> > Is there some other way I can/should specify a file from S3 as the
>> > source of a LOAD statement?
>> >
>> > Thanks,
>> > Dave Viner
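With s3n:// the bucket name has to appear between the '@' and the first path
component; in the URI above there is nothing between them, which is most
likely why the input splits can't be created. A minimal, untested sketch of
the shape I'd expect, with 'your-log-bucket' standing in for whatever bucket
the file actually lives in:

    -- 'your-log-bucket' is a placeholder; keep your real access/secret keys
    LOGS = LOAD 's3n://my-key:my-skey@your-log-bucket/log/file/path/2010.04.13.20:05:04.log.bz2'
           USING PigStorage('\t');
    dump LOGS;

If you'd rather not embed the credentials in the URI, you should also be
able to set fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey in your
Hadoop configuration and just load from 's3n://your-log-bucket/log/file/path/...'.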
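One more thing: the ':' in the file name itself may still trip you up even
with the bucket in place. Hadoop's Path class does not accept ':' inside a
path component (that is the same restriction you hit with dfs -copyToLocal),
so the cleanest fix is probably to rename the object so the key uses '-'
instead of ':'. If you have s3cmd handy, something along these lines should
do it (untested sketch; the bucket name is again a placeholder):

    # s3cmd talks to S3 directly, so the ':' in the existing key is not a problem here
    s3cmd mv "s3://your-log-bucket/log/file/path/2010.04.13.20:05:04.log.bz2" \
             "s3://your-log-bucket/log/file/path/2010.04.13.20-05-04.log.bz2"

Or, if you control the process that writes these logs, have it use '-' in
the timestamp to begin with.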
-- Dan Di Spaltro
