Hello,
When using the s3:// scheme on an Amazon-managed S3 bucket, the module attempts 
to retrieve the prefix with a leading / character, which Amazon S3 does not 
recognize. If I remove the leading "/" under a debugger, the module proceeds, 
but it then errors out downstream where a leading "/" is expected.
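To illustrate the mismatch, here is a minimal sketch using only java.net.URI. 
The bucket name, the path, and the toKeyPrefix helper are made up for 
illustration and are not taken from the module. It only shows that the path 
component of a URI always starts with "/", while an S3 key prefix normally 
does not.

```java
import java.net.URI;

final class S3PrefixSketch {

    // Hypothetical helper (not part of the module): derive an S3 key prefix
    // from the URI path by dropping a single leading slash, if present.
    static String toKeyPrefix(URI uri) {
        String path = uri.getPath();
        if (path == null) {
            return "";
        }
        return path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        // Assumed example location; "mybucket" is a placeholder.
        URI uri = URI.create("s3://mybucket/input/2014/");

        System.out.println("bucket = " + uri.getHost());
        System.out.println("path   = " + uri.getPath());
        System.out.println("prefix = " + toKeyPrefix(uri));
    }
}
```

If the module uses the path for listing and the slash-prefixed form for later 
steps, that might explain why removing the slash in one place breaks another.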
Additionally, if the secret key generated by AWS contains a / character, 
authentication breaks. If the / character is replaced with the URI escape 
sequence %2F, authentication still breaks. The only way to pass auth in my case 
was to keep regenerating the keys until the secret key was free of / or : 
characters. I'm pretty sure this is not going to cut it in production. 
Does anyone know how to properly escape the offending characters in the s3(n) 
URI, using the format proposed by the module developers?
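For reference, a small sketch of how plain java.net.URI treats such a URI. The 
access key, secret key, and bucket below are fake placeholders, and I am 
assuming (not claiming) that the module parses credentials in a similar way.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class SecretKeyUriSketch {

    // Wrap the checked exception so the sketch stays short.
    static URI parse(String s) {
        try {
            return new URI(s);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static void dump(String label, URI uri) {
        System.out.println(label);
        System.out.println("  userInfo = " + uri.getUserInfo());
        System.out.println("  host     = " + uri.getHost());
        System.out.println("  path     = " + uri.getPath());
    }

    public static void main(String[] args) {
        // Fake credentials: access key "AKIAEXAMPLE", secret key "abc/def".
        // Raw slash: the authority ends at the first "/", so the rest of the
        // secret key and the bucket name end up in the path.
        dump("raw slash", parse("s3n://AKIAEXAMPLE:abc/def@mybucket/input/"));

        // Escaped slash: the URI parses, and getUserInfo() returns the
        // decoded form while getRawUserInfo() keeps the %2F.
        URI escaped = parse("s3n://AKIAEXAMPLE:abc%2Fdef@mybucket/input/");
        dump("escaped slash", escaped);
        System.out.println("  rawUserInfo = " + escaped.getRawUserInfo());
    }
}
```

With the raw slash, the host and user info come back as null, which would match 
the failed authentication. With %2F the URI itself parses, so the failure in 
that case may depend on whether the consumer reads the decoded or the raw user 
info. Hadoop's S3 filesystems also accept the keys through the 
fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey configuration properties, 
which keeps the secret out of the URI, but the sketch does not show whether 
this module honors them.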
Has anyone had any success using S3InputModule? I haven't deployed the app to 
the EMR cluster yet; all tests are local, accessing S3 buckets from outside of 
the Amazon cloud. It seems the authors have set up their unit tests with the 
s3n:// scheme. Is there a way to replicate the original unit tests?
Side note: the current implementation does not persist the list of processed 
files anywhere outside of the running process. Nor does the logic allow for 
moving processed files into another bucket or marking them as complete. Does 
anyone know what the original design thought was, in terms of protection 
against duplicate processing?
Any insights would be greatly appreciated!
-- David.
