Re: Performance Problems Migrating to S3A Committers

2021-08-05 Thread James Yu
See this ticket: https://issues.apache.org/jira/browse/HADOOP-17201. It may help your team.

From: Johnny Burns
Sent: Tuesday, June 22, 2021 3:41 PM
To: user@spark.apache.org
Cc: data-orchestration-team
Subject: Performance Problems Migrating to S3A Committers

Re: Performance Problems Migrating to S3A Committers

2021-06-23 Thread Artemis User
Thanks, Johnny, for sharing your experience. Have you tried using the S3A committers? It looks like they were introduced in newer Hadoop releases to solve the problems with the other committers: https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/committers.html

- ND

On 6/22/21 6:41 PM,
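For anyone trying the S3A committers from Spark, here is a minimal sketch of the settings involved, assuming Spark 3.x with the `spark-hadoop-cloud` module on the classpath. The helper name `s3a_committer_conf` is hypothetical; the configuration keys themselves follow the Spark cloud-integration guide and the Hadoop S3A committer documentation linked above. The function only assembles key/value strings, so it can be checked without a cluster.

```python
from typing import Dict

# The three S3A committers described in the Hadoop documentation.
S3A_COMMITTERS = ("directory", "partitioned", "magic")


def s3a_committer_conf(committer: str = "directory") -> Dict[str, str]:
    """Hypothetical helper: Spark settings that route writes through an S3A committer."""
    if committer not in S3A_COMMITTERS:
        raise ValueError(
            f"unknown S3A committer {committer!r}; expected one of {S3A_COMMITTERS}"
        )
    conf = {
        # Which S3A committer the S3A filesystem should instantiate.
        "spark.hadoop.fs.s3a.committer.name": committer,
        # Bind the s3a:// scheme to the S3A committer factory (set explicitly
        # here in case the cluster's Hadoop defaults do not already do so).
        "spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a":
            "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
        # Make Spark SQL / DataFrame writes use the factory-based committer.
        "spark.sql.sources.commitProtocolClass":
            "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
        "spark.sql.parquet.output.committer.class":
            "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    }
    if committer == "magic":
        # The magic committer must be explicitly enabled on the filesystem.
        conf["spark.hadoop.fs.s3a.committer.magic.enabled"] = "true"
    return conf
```

Each pair could then be applied with `SparkSession.builder.config(key, value)` or copied into `spark-defaults.conf`. The staging committers (`directory`, `partitioned`) need a cluster filesystem such as HDFS for their intermediate data, which may matter for a team already writing to HDFS first.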

Performance Problems Migrating to S3A Committers

2021-06-22 Thread Johnny Burns
Hello. I’m Johnny, and I work at Stripe. We’re heavy Spark users, and we’ve been exploring using S3 committers. Currently, we first write the data to HDFS and then upload it to S3. However, now that S3 offers strong consistency guarantees, we are evaluating whether we can write data directly to S3. We’re