[ https://issues.apache.org/jira/browse/HADOOP-16570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932742#comment-16932742 ]

Steve Loughran commented on HADOOP-16570:
-----------------------------------------

Even when you set the number of committer threads to 0, disabling parallel task
commit, the job commit (for a terasort, BTW) fails.

The stack trace implies it fails while listing the files to commit:

{code}
main
  at java.util.Arrays.copyOfRange([CII)[C (Arrays.java:3664)
  at java.lang.String.<init>([CII)V (String.java:207)
  at java.lang.String.substring(II)Ljava/lang/String; (String.java:1969)
  at java.net.URI$Parser.substring(II)Ljava/lang/String; (URI.java:2869)
  at java.net.URI$Parser.parseHierarchical(II)I (URI.java:3106)
  at java.net.URI$Parser.parse(Z)V (URI.java:3053)
  at java.net.URI.<init>(Ljava/lang/String;)V (URI.java:588)
  at org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit.destinationPath()Lorg/apache/hadoop/fs/Path; (SinglePendingCommit.java:253)
  at org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit.validate()V (SinglePendingCommit.java:195)
  at org.apache.hadoop.fs.s3a.commit.files.PendingSet.validate()V (PendingSet.java:146)
  at org.apache.hadoop.fs.s3a.commit.files.PendingSet.load(Lorg/apache/hadoop/fs/FileSystem;Lorg/apache/hadoop/fs/Path;)Lorg/apache/hadoop/fs/s3a/commit/files/PendingSet; (PendingSet.java:109)
  at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.lambda$loadPendingsetFiles$1(Ljava/util/List;Lorg/apache/hadoop/fs/FileSystem;Lorg/apache/hadoop/fs/FileStatus;)V (AbstractS3ACommitter.java:492)
  at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter$$Lambda$92.run(Ljava/lang/Object;)V (Unknown Source)
  at org.apache.hadoop.fs.s3a.commit.Tasks$Builder.runSingleThreaded(Lorg/apache/hadoop/fs/s3a/commit/Tasks$Task;)Z (Tasks.java:165)
  at org.apache.hadoop.fs.s3a.commit.Tasks$Builder.run(Lorg/apache/hadoop/fs/s3a/commit/Tasks$Task;)Z (Tasks.java:150)
  at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.loadPendingsetFiles(Lorg/apache/hadoop/mapreduce/JobContext;ZLorg/apache/hadoop/fs/FileSystem;Ljava/lang/Iterable;)Ljava/util/List; (AbstractS3ACommitter.java:490)
  at org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.listPendingUploads(Lorg/apache/hadoop/mapreduce/JobContext;Z)Ljava/util/List; (StagingCommitter.java:502)
  at org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.listPendingUploadsToCommit(Lorg/apache/hadoop/mapreduce/JobContext;)Ljava/util/List; (StagingCommitter.java:472)
  at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.commitJob(Lorg/apache/hadoop/mapreduce/JobContext;)V (AbstractS3ACommitter.java:598)
  at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(Lorg/apache/hadoop/mapreduce/JobContext;Lscala/collection/Seq;)V (HadoopMapReduceCommitProtocol.scala:166)
  at org.apache.spark.internal.io.cloud.PathOutputCommitProtocol.commitJob(Lorg/apache/hadoop/mapreduce/JobContext;Lscala/collection/Seq;)V (PathOutputCommitProtocol.scala:194)
  at 
{code}

Hypothesis: there are too many files to commit for the current strategy of
enumerating them all into memory and only then committing.

* We need to move to a sequence of load-and-commit or load-and-abort, where
both the load and the commit/abort are done in the same worker thread.
* We shouldn't build a full list of results for the success file except on
smaller jobs. Maybe we could list the first 100 files and not worry about the
rest, but do add a counter of how many files there really were, if we don't
have one already.
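The first bullet could be sketched roughly as below. This is a hypothetical
illustration, not the committer's actual code: {{PendingSet}} here is a
stand-in for {{org.apache.hadoop.fs.s3a.commit.files.PendingSet}}, and
{{load()}} / the commit step stand in for the real load and commit calls. The
point is that each worker task loads one .pendingset file and commits it
immediately, so at most one pending set per worker thread is resident in
memory, instead of one big list for the whole job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class StreamingCommit {

    /** Minimal stand-in for the real PendingSet class. */
    static final class PendingSet {
        final String source;
        PendingSet(String source) { this.source = source; }
    }

    /** Hypothetical loader; the real code reads the file from the filesystem. */
    static PendingSet load(String path) {
        return new PendingSet(path);
    }

    /**
     * Load and commit each pending set inside the same worker task,
     * rather than loading everything first and committing afterwards.
     * Returns the number of pending sets committed.
     */
    static long loadAndCommit(List<String> pendingFiles, int threads)
            throws Exception {
        AtomicLong committed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String path : pendingFiles) {
                futures.add(pool.submit(() -> {
                    PendingSet ps = load(path);   // load one file ...
                    committed.incrementAndGet();  // ... and commit it right away
                }));
            }
            for (Future<?> f : futures) {
                f.get();                          // surface any worker failure
            }
        } finally {
            pool.shutdown();                      // don't leak the pool's threads
        }
        return committed.get();
    }
}
```

The same shape works for abort: swap the commit step for an abort call in the
worker lambda.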

> S3A committers leak threads on job/task commit
> ----------------------------------------------
>
>                 Key: HADOOP-16570
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16570
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0, 3.1.2
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>
> The fixed size ThreadPool created in AbstractS3ACommitter doesn't get cleaned 
> up at EOL; as a result you leak the no. of threads set in 
> "fs.s3a.committer.threads"
> Not visible in MR/distcp jobs, but ultimately causes OOM on Spark
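
The leak described above has a standard remedy: any committer that owns a
fixed-size pool must shut it down when commit/cleanup finishes. A minimal
sketch, assuming a pool sized from "fs.s3a.committer.threads" (the class and
method names here are illustrative, not the committer's actual API):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// A pool holder that releases its threads deterministically. Without the
// close() call, every committer instance leaves behind as many live threads
// as were configured, which accumulates into an OOM in long-lived processes.
public class CommitterPool implements AutoCloseable {
    private final ExecutorService pool;

    CommitterPool(int committerThreads) {
        pool = Executors.newFixedThreadPool(Math.max(1, committerThreads));
    }

    ExecutorService pool() {
        return pool;
    }

    @Override
    public void close() throws InterruptedException {
        pool.shutdown();                                  // stop accepting work
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            pool.shutdownNow();                           // cancel stragglers
        }
    }
}
```

Used in try-with-resources, the pool is torn down even if job commit throws,
which is why leaks like this are invisible in short-lived MR/distcp processes
but fatal in a long-lived Spark driver.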



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
