I'm having trouble using S3 as a data source for files in a LOAD
statement. From my research, it seems clear that I want s3n:// rather than
s3://, because the file was put there by another (non-Hadoop/Pig) process.
So here's the basic step:
LOGS = LOAD 's3n://my-key:my-skey@/log/file/path/2010.04.13.20:05:04.log.bz2'
    USING PigStorage('\t');
dump LOGS;
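One way to check the location string is to split it into its URI parts. The Python sketch below assumes the usual `s3n://ACCESS_KEY:SECRET_KEY@bucket/key` layout, where the bucket is the host part of the URI authority. `s3n_parts` and the `my-bucket` name are hypothetical, for illustration only. It shows that in the string as written nothing sits between `@` and the first `/`, so the bucket slot is empty (which may simply be how the path was anonymized here).

```python
from urllib.parse import urlsplit

def s3n_parts(location):
    """Split an s3n:// location into credentials, bucket and object key.

    Assumption: the s3n://ACCESS_KEY:SECRET_KEY@bucket/key layout, in which
    the bucket is the host part of the URI authority (whatever follows "@").
    """
    parts = urlsplit(location)
    return {
        "scheme": parts.scheme,
        "access_key": parts.username,
        "secret_key": parts.password,
        # None when nothing sits between "@" and "/"; note that urlsplit
        # lowercases the hostname it returns.
        "bucket": parts.hostname,
        "key": parts.path.lstrip("/"),
    }

# The location string from the LOAD statement (credentials are placeholders).
print(s3n_parts("s3n://my-key:my-skey@/log/file/path/2010.04.13.20:05:04.log.bz2"))
# Same path with a hypothetical bucket name, for comparison.
print(s3n_parts("s3n://my-key:my-skey@my-bucket/log/file/path/2010.04.13.20:05:04.log.bz2"))
```

If the real location does include a bucket, the second call shows the shape it would parse into.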
I get this grunt error:
org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input splits for: s3n://my-key:my-skey@/log/file/path/2010.04.13.20:05:04.log.bz2
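Another thing that may be worth ruling out (a guess, not confirmed in this post) is the colons in the file name. As far as I understand it, Hadoop's `org.apache.hadoop.fs.Path(String)` constructor treats the text before the first `:` as a URI scheme whenever that colon comes before any `/`. The Python sketch below only imitates that rule; `hadoop_style_scheme` is a made-up helper, not a Hadoop API. If Hadoop ever rebuilds a path from the bare file name (for example while listing or globbing a directory), `2010.04.13.20` would be taken as a scheme, and since a URI scheme must start with a letter, the resulting URI would likely be rejected.

```python
def hadoop_style_scheme(path_string):
    """Return the scheme a Hadoop-style Path(String) parse would infer.

    Approximation (an assumption, not Hadoop code): text before the first
    ':' counts as a scheme when that ':' appears before any '/'.
    Returns None when no scheme is inferred.
    """
    colon = path_string.find(":")
    slash = path_string.find("/")
    if colon != -1 and (slash == -1 or colon < slash):
        return path_string[:colon]
    return None

# Full URI: the first ':' precedes the first '/', so "s3n" is the scheme.
print(hadoop_style_scheme("s3n://my-key:my-skey@/log/file/path/2010.04.13.20:05:04.log.bz2"))
# Bare file name: "2010.04.13.20" would be mistaken for a scheme.
print(hadoop_style_scheme("2010.04.13.20:05:04.log.bz2"))
# A relative path whose first '/' comes before any ':' infers no scheme.
print(hadoop_style_scheme("log/2010.04.13.20:05:04.log.bz2"))
```

Renaming a copy of the file without colons would be a quick way to test this theory.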
Is there some other way I can/should specify a file from S3 as the source of
a LOAD statement?
Thanks
Dave Viner