Hello,

When I use the s3:// scheme with an Amazon-managed S3 bucket, the module tries
to retrieve the prefix with a leading "/" character, which AWS does not
recognize. If I remove the leading "/" under a debugger, the module proceeds,
but then fails further downstream where it expects the leading "/".
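
To illustrate the leading "/" mismatch, here is a minimal sketch using only
java.net.URI. The bucket name, the path, and the helper toKeyPrefix are
hypothetical and are not part of S3InputModule; the sketch only shows where a
leading "/" can come from and how it could be stripped before listing.

```java
import java.net.URI;

class S3PrefixSketch {
    // Hypothetical helper (not from the module): turns the path component of
    // an s3:// URI into a key prefix without any leading "/" characters.
    static String toKeyPrefix(URI uri) {
        String path = uri.getPath();
        if (path == null) {
            return "";
        }
        int start = 0;
        while (start < path.length() && path.charAt(start) == '/') {
            start++;
        }
        return path.substring(start);
    }

    public static void main(String[] args) {
        // Placeholder bucket and path, assumed for illustration only.
        URI uri = URI.create("s3://my-bucket/input/2016/");
        System.out.println(uri.getHost());     // my-bucket
        System.out.println(uri.getPath());     // /input/2016/
        System.out.println(toKeyPrefix(uri));  // input/2016/
    }
}
```

The path component of a hierarchical URI always starts with "/", so any code
that passes getPath() straight through as a key prefix will carry that
character along.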

Additionally, if the secret key generated by AWS contains a "/" character,
authentication breaks. Replacing the "/" with the URI escape sequence %2F does
not help: authentication still breaks. The only way I could get past
authentication was to keep regenerating the keys until the secret key contained
no "/" or ":" characters. I'm pretty sure this is not going to cut it in
production. Does anyone know how to properly escape the offending characters in
the s3(n) URI, in the format proposed by the module developers?
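
For reference, a minimal sketch of how plain java.net.URI splits a URI with
embedded credentials. The access key, secret, and bucket below are made-up
placeholders, and the class and method names are hypothetical. The sketch says
nothing about how S3InputModule or Hadoop parse the string internally; it only
shows what a generic URI parser does with an unescaped "/" in the secret.

```java
import java.net.URI;

class S3UriCredentialsSketch {
    // Returns {host, decoded user-info} exactly as java.net.URI reports them.
    static String[] hostAndUserInfo(String text) {
        URI uri = URI.create(text);
        return new String[] { uri.getHost(), uri.getUserInfo() };
    }

    public static void main(String[] args) {
        // Placeholder credentials, assumed for illustration only.
        String raw = "s3n://AKIAEXAMPLE:abc/def@my-bucket/input";
        String escaped = "s3n://AKIAEXAMPLE:abc%2Fdef@my-bucket/input";

        // Unescaped "/": the authority ends at the first "/", so the bucket
        // is no longer seen as the host and no user-info is recognized.
        String[] a = hostAndUserInfo(raw);
        System.out.println(a[0] + " | " + a[1]);  // null | null

        // Escaped "/": the URI parses cleanly and the user-info decodes back
        // to the original secret.
        String[] b = hostAndUserInfo(escaped);
        System.out.println(b[0] + " | " + b[1]);  // my-bucket | AKIAEXAMPLE:abc/def
    }
}
```

If %2F still fails, one possible explanation (an assumption, not verified
against the module) is that the secret is read without being decoded. Hadoop's
s3n filesystem can also take credentials from the fs.s3n.awsAccessKeyId and
fs.s3n.awsSecretAccessKey configuration properties instead of the URI; whether
S3InputModule honors those is not something this sketch establishes.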

Has anyone had any success using S3InputModule? I haven't deployed the app to
an EMR cluster yet; all my tests run locally and access the S3 buckets from
outside the Amazon cloud. It seems the authors set up their unit tests with the
s3n:// scheme. Is there a way to replicate the original unit tests?

As a side note, the current implementation does not persist the list of
processed files anywhere outside the running process, nor does the logic allow
for moving processed files into another bucket or marking them as complete.
Does anyone know what the original design thought was, in terms of protection
against duplicate processing?
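
To make the gap concrete, here is a rough sketch of the kind of persistence
that appears to be missing. It is not the module's design: ProcessedKeyLog and
its methods are hypothetical, and a local append-only file stands in for
whatever durable store a real deployment would use.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

class ProcessedKeyLog {
    private final Path logFile;
    private final Set<String> seen = new HashSet<>();

    // Reloads previously recorded keys so that a restart does not forget them.
    ProcessedKeyLog(Path logFile) throws IOException {
        this.logFile = logFile;
        if (Files.exists(logFile)) {
            seen.addAll(Files.readAllLines(logFile, StandardCharsets.UTF_8));
        }
    }

    // Returns true the first time a key is seen and appends it to the log;
    // returns false for a key that was already recorded.
    // Assumes keys do not contain line breaks.
    synchronized boolean markProcessed(String key) throws IOException {
        if (!seen.add(key)) {
            return false;
        }
        byte[] line = (key + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        Files.write(logFile, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("processed-keys", ".log");
        ProcessedKeyLog log = new ProcessedKeyLog(file);
        System.out.println(log.markProcessed("input/a.csv"));  // true
        System.out.println(log.markProcessed("input/a.csv"));  // false

        // A fresh instance simulates a restarted process reading the same log.
        ProcessedKeyLog restarted = new ProcessedKeyLog(file);
        System.out.println(restarted.markProcessed("input/a.csv"));  // false
    }
}
```

A single local file only covers one process on one machine; on a cluster the
same idea would need shared storage or the platform's own checkpointing.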
Any insights would be greatly appreciated!
-- David.