[ https://issues.apache.org/jira/browse/HADOOP-16570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932742#comment-16932742 ]
Steve Loughran commented on HADOOP-16570:
-----------------------------------------

Even when you set the number of threads to 0, so disabling task commit, the job commit (for a terasort, BTW) fails. The stack trace implies it happens while listing the files to commit:

{code}
main
  at java.util.Arrays.copyOfRange([CII)[C (Arrays.java:3664)
  at java.lang.String.<init>([CII)V (String.java:207)
  at java.lang.String.substring(II)Ljava/lang/String; (String.java:1969)
  at java.net.URI$Parser.substring(II)Ljava/lang/String; (URI.java:2869)
  at java.net.URI$Parser.parseHierarchical(II)I (URI.java:3106)
  at java.net.URI$Parser.parse(Z)V (URI.java:3053)
  at java.net.URI.<init>(Ljava/lang/String;)V (URI.java:588)
  at org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit.destinationPath()Lorg/apache/hadoop/fs/Path; (SinglePendingCommit.java:253)
  at org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit.validate()V (SinglePendingCommit.java:195)
  at org.apache.hadoop.fs.s3a.commit.files.PendingSet.validate()V (PendingSet.java:146)
  at org.apache.hadoop.fs.s3a.commit.files.PendingSet.load(Lorg/apache/hadoop/fs/FileSystem;Lorg/apache/hadoop/fs/Path;)Lorg/apache/hadoop/fs/s3a/commit/files/PendingSet; (PendingSet.java:109)
  at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.lambda$loadPendingsetFiles$1(Ljava/util/List;Lorg/apache/hadoop/fs/FileSystem;Lorg/apache/hadoop/fs/FileStatus;)V (AbstractS3ACommitter.java:492)
  at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter$$Lambda$92.run(Ljava/lang/Object;)V (Unknown Source)
  at org.apache.hadoop.fs.s3a.commit.Tasks$Builder.runSingleThreaded(Lorg/apache/hadoop/fs/s3a/commit/Tasks$Task;)Z (Tasks.java:165)
  at org.apache.hadoop.fs.s3a.commit.Tasks$Builder.run(Lorg/apache/hadoop/fs/s3a/commit/Tasks$Task;)Z (Tasks.java:150)
  at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.loadPendingsetFiles(Lorg/apache/hadoop/mapreduce/JobContext;ZLorg/apache/hadoop/fs/FileSystem;Ljava/lang/Iterable;)Ljava/util/List; (AbstractS3ACommitter.java:490)
  at org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.listPendingUploads(Lorg/apache/hadoop/mapreduce/JobContext;Z)Ljava/util/List; (StagingCommitter.java:502)
  at org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.listPendingUploadsToCommit(Lorg/apache/hadoop/mapreduce/JobContext;)Ljava/util/List; (StagingCommitter.java:472)
  at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.commitJob(Lorg/apache/hadoop/mapreduce/JobContext;)V (AbstractS3ACommitter.java:598)
  at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(Lorg/apache/hadoop/mapreduce/JobContext;Lscala/collection/Seq;)V (HadoopMapReduceCommitProtocol.scala:166)
  at org.apache.spark.internal.io.cloud.PathOutputCommitProtocol.commitJob(Lorg/apache/hadoop/mapreduce/JobContext;Lscala/collection/Seq;)V (PathOutputCommitProtocol.scala:194)
{code}

Hypothesis: there are too many files to commit by enumerating them all first and only then committing.

* We need to move to a sequence of load-and-commit or load-and-abort, where both the load and the commit/abort are done in the same worker thread.
* We don't create a list of results for the success file except for smaller jobs. Maybe we could list the first 100 files and not worry about the rest, but do add a counter of how many files there really were, if we don't have one already.

> S3A committers leak threads on job/task commit
> ----------------------------------------------
>
>                 Key: HADOOP-16570
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16570
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0, 3.1.2
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>
> The fixed size ThreadPool created in AbstractS3ACommitter doesn't get cleaned
> up at EOL; as a result you leak the no.
> of threads set in "fs.s3a.committer.threads"
> Not visible in MR/distcp jobs, but ultimately causes OOM on Spark

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
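
The load-and-commit idea from the comment above could look roughly like the following. This is a minimal sketch using only the JDK, not the S3A committer code: the class StreamingCommitSketch, the method loadAndCommit, and the loader/committer callbacks are hypothetical names invented for illustration, and pendingset files and uploads are modelled as plain strings. Each worker loads one pendingset and commits its entries immediately, so no job-wide list of pending commits is built; only a bounded sample of file names plus a total counter is kept for the success file, and the pool is shut down in a finally block so its threads are not leaked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical sketch: load and commit each pendingset in the same worker. */
public class StreamingCommitSketch {

    /** What the success file would record: a bounded sample and the true total. */
    public static final class Summary {
        public final List<String> sampleFiles;
        public final long totalFiles;

        Summary(List<String> sampleFiles, long totalFiles) {
            this.sampleFiles = sampleFiles;
            this.totalFiles = totalFiles;
        }
    }

    /**
     * @param pendingSets names of the pendingset files to process
     * @param loader      loads one pendingset into its list of files (assumed callback)
     * @param committer   commits (or aborts) a single file (assumed callback)
     * @param threads     worker count; values below 1 are treated as 1
     * @param sampleLimit maximum number of file names to keep in the summary
     */
    public static Summary loadAndCommit(
            List<String> pendingSets,
            Function<String, List<String>> loader,
            Consumer<String> committer,
            int threads,
            int sampleLimit) {
        List<String> sample = new ArrayList<>();
        AtomicLong total = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            // One future per pendingset; the loaded contents are never pooled.
            List<Future<?>> futures = new ArrayList<>();
            for (String pendingSet : pendingSets) {
                futures.add(pool.submit(() -> {
                    // Load, then commit, inside the same worker thread.
                    for (String file : loader.apply(pendingSet)) {
                        committer.accept(file);
                        total.incrementAndGet();
                        synchronized (sample) {
                            if (sample.size() < sampleLimit) {
                                sample.add(file);
                            }
                        }
                    }
                }));
            }
            for (Future<?> future : futures) {
                try {
                    future.get(); // surface the first worker failure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted during commit", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("commit failed", e.getCause());
                }
            }
        } finally {
            // Always release the pool's threads, on success or failure.
            pool.shutdownNow();
        }
        return new Summary(new ArrayList<>(sample), total.get());
    }
}
```

A load-and-abort pass would have the same shape, with an aborting callback passed in place of the committing one. Which file names end up in the sample depends on thread scheduling; only the total count is deterministic.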