On 16 Nov 2016, at 22:34, Edden Burrow <eddenbur...@gmail.com> wrote:
Anyone dealing with a lot of files with Spark? We're trying s3a with 2.0.1
because we're seeing intermittent errors in S3 where jobs fail and
saveAsTextFile fails. Using pyspark.
How many files? Thousands? Millions?
Is there any issue with working in an S3 folder that has too many files?
How about having versioning enabled? Are thes
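For what it's worth, when we hit intermittent S3 errors with s3a we started by raising the client's retry and connection limits. A sketch of the spark-defaults.conf tuning we'd try first (these are real hadoop-aws property names for the Hadoop 2.7.x line that Spark 2.0.1 ships against, but the values here are starting points, not a known fix):

```
# Hypothetical s3a tuning via spark-defaults.conf; values are guesses to experiment with.
# More retries against transient S3 errors:
spark.hadoop.fs.s3a.attempts.maximum      20
# Larger HTTP connection pool for jobs touching many objects:
spark.hadoop.fs.s3a.connection.maximum    100
# Longer socket timeout (milliseconds):
spark.hadoop.fs.s3a.connection.timeout    200000
```

Separately, if the job is producing huge numbers of small output files, coalescing to fewer partitions before `saveAsTextFile` (e.g. `rdd.coalesce(n)`) cuts the number of S3 PUTs and listings, which is often where the flakiness shows up.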