[
https://issues.apache.org/jira/browse/YARN-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385744#comment-15385744
]
Weiwei Yang commented on YARN-5309:
-----------------------------------
[~vvasudev] Thanks a lot for all your help!
> Fix SSLFactory truststore reloader thread leak in TimelineClientImpl
> --------------------------------------------------------------------
>
> Key: YARN-5309
> URL: https://issues.apache.org/jira/browse/YARN-5309
> Project: Hadoop YARN
> Issue Type: Bug
> Components: timelineserver, yarn
> Affects Versions: 2.7.1
> Reporter: Thomas Friedrich
> Assignee: Weiwei Yang
> Priority: Blocker
> Fix For: 2.7.3
>
> Attachments: YARN-5309.001.patch, YARN-5309.002.patch,
> YARN-5309.003.patch, YARN-5309.004.patch, YARN-5309.005.patch,
> YARN-5309.branch-2.7.3.001.patch, YARN-5309.branch-2.8.001.patch
>
>
> We found an issue similar to HADOOP-11368 in TimelineClientImpl. The class
> creates an instance of SSLFactory in newSslConnConfigurator, which
> subsequently creates the ReloadingX509TrustManager instance, which in turn
> starts a trust store reloader thread.
> However, the SSLFactory is never destroyed, so the trust store reloader
> threads are never stopped.
> This problem was observed by a customer who had SSL enabled in Hadoop and
> submitted many queries against HiveServer2. After a few days, the HS2
> instance crashed, and in the Java dump we could see many (over 13000)
> threads like this:
> "Truststore reloader thread" #126 daemon prio=5 os_prio=0 tid=0x00007f680d2e3000 nid=0x98fd waiting on condition [0x00007f67e482c000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run(ReloadingX509TrustManager.java:225)
>         at java.lang.Thread.run(Thread.java:745)
> HiveServer2 uses the JobClient to submit a job:
> Thread [HiveServer2-Background-Pool: Thread-188] (Suspended (breakpoint at line 89 in ReloadingX509TrustManager))
> owns: Object (id=464)
> owns: Object (id=465)
> owns: Object (id=466)
> owns: ServiceLoader<S> (id=210)
> ReloadingX509TrustManager.<init>(String, String, String, long) line: 89
> FileBasedKeyStoresFactory.init(SSLFactory$Mode) line: 209
> SSLFactory.init() line: 131
> TimelineClientImpl.newSslConnConfigurator(int, Configuration) line: 532
> TimelineClientImpl.newConnConfigurator(Configuration) line: 507
> TimelineClientImpl.serviceInit(Configuration) line: 269
> TimelineClientImpl(AbstractService).init(Configuration) line: 163
> YarnClientImpl.serviceInit(Configuration) line: 169
> YarnClientImpl(AbstractService).init(Configuration) line: 163
> ResourceMgrDelegate.serviceInit(Configuration) line: 102
> ResourceMgrDelegate(AbstractService).init(Configuration) line: 163
> ResourceMgrDelegate.<init>(YarnConfiguration) line: 96
> YARNRunner.<init>(Configuration) line: 112
> YarnClientProtocolProvider.create(Configuration) line: 34
> Cluster.initialize(InetSocketAddress, Configuration) line: 95
> Cluster.<init>(InetSocketAddress, Configuration) line: 82
> Cluster.<init>(Configuration) line: 75
> JobClient.init(JobConf) line: 475
> JobClient.<init>(JobConf) line: 454
> MapRedTask(ExecDriver).execute(DriverContext) line: 401
> MapRedTask.execute(DriverContext) line: 137
> MapRedTask(Task<T>).executeTask() line: 160
> TaskRunner.runSequential() line: 88
> Driver.launchTask(Task<Serializable>, String, boolean, String, int, DriverContext) line: 1653
> Driver.execute() line: 1412
> For every job, a new instance of JobClient/YarnClientImpl/TimelineClientImpl
> is created. Because the HS2 process stays up for days, the trust store
> reloader threads from previous jobs accumulate in the HS2 process and
> eventually exhaust the available resources.
> A fix similar to HADOOP-11368 seems to be needed in TimelineClientImpl,
> but the class doesn't have a destroy method to begin with.
> One option to avoid this problem is to disable the YARN timeline service
> (yarn.timeline-service.enabled=false).
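
The leak pattern described in the issue can be illustrated with a small, self-contained Java sketch. This is not the YARN-5309 patch and does not use any Hadoop classes: ReloaderSketch, ClientSketch, serviceInit, serviceStop and runJobs are hypothetical stand-ins for an SSLFactory-like object that starts a reloader thread and for a client that owns it. The sketch only assumes that the fix amounts to destroying the factory when the client stops.

```java
import java.util.ArrayList;
import java.util.List;

public class ReloaderLeakSketch {

    static final String THREAD_NAME = "Truststore reloader thread";

    /** Hypothetical stand-in for a factory whose init() starts a reloader thread. */
    static class ReloaderSketch implements Runnable {
        private final long intervalMs;
        private volatile boolean running;
        private Thread reloader;

        ReloaderSketch(long intervalMs) {
            this.intervalMs = intervalMs;
        }

        void init() {
            running = true;
            reloader = new Thread(this, THREAD_NAME);
            reloader.setDaemon(true);
            reloader.start();
        }

        /** Stops the reloader thread and waits for it to exit. */
        void destroy() {
            running = false;
            reloader.interrupt();
            try {
                reloader.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        @Override
        public void run() {
            while (running) {
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    // Woken up by destroy(); the loop condition is re-checked.
                }
                // A real reloader would check the truststore file here.
            }
        }
    }

    /** Hypothetical client that creates one factory per instance. */
    static class ClientSketch {
        private ReloaderSketch factory;

        void serviceInit() {
            factory = new ReloaderSketch(10_000L);
            factory.init();
        }

        /** The step the issue reports as missing: release the factory on stop. */
        void serviceStop() {
            if (factory != null) {
                factory.destroy();
                factory = null;
            }
        }
    }

    /** Counts live threads that carry the reloader thread name. */
    static long countReloaderThreads() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && THREAD_NAME.equals(t.getName()))
                .count();
    }

    /** Simulates one client per job; returns how many reloader threads were left behind. */
    static long runJobs(int jobs, boolean stopClients) {
        long before = countReloaderThreads();
        List<ClientSketch> neverStopped = new ArrayList<>();
        for (int i = 0; i < jobs; i++) {
            ClientSketch client = new ClientSketch();
            client.serviceInit();
            if (stopClients) {
                client.serviceStop();
            } else {
                neverStopped.add(client);
            }
        }
        return countReloaderThreads() - before;
    }

    public static void main(String[] args) {
        System.out.println("threads left without stop: " + runJobs(5, false));
        System.out.println("threads left with stop:    " + runJobs(5, true));
    }
}
```

In this sketch each client that is never stopped leaves one reloader thread behind, which mirrors the one-thread-per-job growth seen in the HS2 dump. Calling serviceStop on each client keeps the count flat.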