Thomas Friedrich created YARN-5309:
--------------------------------------

             Summary: SSLFactory truststore reloader thread leak in TimelineClientImpl
                 Key: YARN-5309
                 URL: https://issues.apache.org/jira/browse/YARN-5309
             Project: Hadoop YARN
          Issue Type: Bug
          Components: timelineserver, yarn
    Affects Versions: 2.7.1
            Reporter: Thomas Friedrich


We found an issue similar to HADOOP-11368 in TimelineClientImpl. The class 
creates an instance of SSLFactory in newSslConnConfigurator, which in turn 
creates a ReloadingX509TrustManager instance that starts a trust store 
reloader thread. However, the SSLFactory is never destroyed, so the trust 
store reloader threads are never stopped.
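
In rough terms the leaking pattern looks like this (a minimal sketch pieced 
together from the stack traces below, not the verbatim 2.7.1 source):

    // Sketch only, assumed from the stack traces below (not the actual method body).
    SSLFactory sslFactory = new SSLFactory(SSLFactory.Mode.CLIENT, conf);
    sslFactory.init();                       // FileBasedKeyStoresFactory builds a
                                             // ReloadingX509TrustManager, which starts
                                             // the daemon "Truststore reloader thread"
    SSLSocketFactory sf = sslFactory.createSSLSocketFactory();
    HostnameVerifier hv = sslFactory.getHostnameVerifier();
    // sf/hv are wrapped in a ConnectionConfigurator and handed to the HTTP client;
    // the SSLFactory reference is then dropped, so sslFactory.destroy() is never
    // called and the reloader thread keeps running.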

This problem was observed by a customer who had SSL enabled in Hadoop and 
submitted many queries against HiveServer2. After a few days, the HS2 
instance crashed, and the Java thread dump showed many (over 13,000) threads 
like this:
"Truststore reloader thread" #126 daemon prio=5 os_prio=0 
tid=0x00007f680d2e3000 nid=0x98fd waiting on 
condition [0x00007f67e482c000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hadoop.security.ssl.ReloadingX509TrustManager.run
(ReloadingX509TrustManager.java:225)
        at java.lang.Thread.run(Thread.java:745)

HiveServer2 uses the JobClient to submit a job:
Thread [HiveServer2-Background-Pool: Thread-188] (Suspended (breakpoint at line 89 in ReloadingX509TrustManager))
        owns: Object  (id=464)  
        owns: Object  (id=465)  
        owns: Object  (id=466)  
        owns: ServiceLoader<S>  (id=210)        
        ReloadingX509TrustManager.<init>(String, String, String, long) line: 89 
        FileBasedKeyStoresFactory.init(SSLFactory$Mode) line: 209       
        SSLFactory.init() line: 131     
        TimelineClientImpl.newSslConnConfigurator(int, Configuration) line: 532 
        TimelineClientImpl.newConnConfigurator(Configuration) line: 507 
        TimelineClientImpl.serviceInit(Configuration) line: 269 
        TimelineClientImpl(AbstractService).init(Configuration) line: 163       
        YarnClientImpl.serviceInit(Configuration) line: 169     
        YarnClientImpl(AbstractService).init(Configuration) line: 163   
        ResourceMgrDelegate.serviceInit(Configuration) line: 102        
        ResourceMgrDelegate(AbstractService).init(Configuration) line: 163      
        ResourceMgrDelegate.<init>(YarnConfiguration) line: 96  
        YARNRunner.<init>(Configuration) line: 112      
        YarnClientProtocolProvider.create(Configuration) line: 34       
        Cluster.initialize(InetSocketAddress, Configuration) line: 95   
        Cluster.<init>(InetSocketAddress, Configuration) line: 82       
        Cluster.<init>(Configuration) line: 75  
        JobClient.init(JobConf) line: 475       
        JobClient.<init>(JobConf) line: 454     
        MapRedTask(ExecDriver).execute(DriverContext) line: 401 
        MapRedTask.execute(DriverContext) line: 137     
        MapRedTask(Task<T>).executeTask() line: 160     
        TaskRunner.runSequential() line: 88     
        Driver.launchTask(Task<Serializable>, String, boolean, String, int, DriverContext) line: 1653
        Driver.execute() line: 1412     

For every job, a new JobClient/YarnClientImpl/TimelineClientImpl instance is 
created. But because the HS2 process stays up for days, the trust store 
reloader threads from previous jobs keep running in the HS2 process and 
eventually exhaust the available resources.

It seems a fix similar to HADOOP-11368 is needed in TimelineClientImpl, but 
the class doesn't have a destroy method to begin with.
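
One possible direction, mirroring the HADOOP-11368 approach, would be to keep 
a reference to the SSLFactory and destroy it when the client stops. This is 
only an untested sketch; the field name and placement are assumptions, not an 
actual patch:

    // Untested sketch inside TimelineClientImpl; sslFactory would have to be
    // assigned wherever newSslConnConfigurator creates the factory.
    private SSLFactory sslFactory;

    @Override
    protected void serviceStop() throws Exception {
      if (sslFactory != null) {
        sslFactory.destroy();              // stops the "Truststore reloader thread"
      }
      super.serviceStop();
    }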

One option to avoid this problem is to disable the YARN timeline service 
(yarn.timeline-service.enabled=false).
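
For example, the programmatic equivalent on the client side (a sketch; setting 
the property in the client's yarn-site.xml has the same effect):

    // Workaround sketch: with the timeline service disabled, YarnClientImpl should
    // not create a TimelineClientImpl, so no SSLFactory/reloader thread is started.
    // (Configuration is org.apache.hadoop.conf.Configuration,
    //  YarnConfiguration is org.apache.hadoop.yarn.conf.YarnConfiguration.)
    Configuration conf = new YarnConfiguration();
    conf.setBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, false);  // yarn.timeline-service.enabled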



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
