using s3 as a data source

Dave Viner Sun, 13 Jun 2010 19:37:20 -0700

I'm having trouble using S3 as a data source for files in the LOAD
statement.  From my research, it appears that I want s3n://, not s3://,
because the file was placed there by another (non-Hadoop/Pig) process.
Here's the basic step:


LOGS = LOAD 's3n://my-key:my-skey@/log/file/path/2010.04.13.20:05:04.log.bz2'
USING PigStorage('\t');
dump LOGS;

I get this error from the Grunt shell:

org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to
create input splits for: s3n://my-key:my-skey@/log/file/path/2010.04.13.20:05:04.log.bz2


Is there some other way I can/should specify a file from S3 as the source of
a LOAD statement?

Thanks
Dave Viner
