[ 
https://issues.apache.org/jira/browse/YARN-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferenc Erdelyi updated YARN-11590:
----------------------------------
    Description: 
YARN-11468 enabled ZooKeeper SSL/TLS support for YARN.
Curator uses ClientCnxnSocketNetty for secured connections, and its thread must 
be closed after calling confStore.format(). Otherwise the netty thread waits 
indefinitely, and an RM started with the "-format-conf-store" arg becomes 
unresponsive after deleting the confstore.

The unclosed thread, which keeps RM running:
{code:java}
2023-10-10 12:13:01,000 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
The Thread[main-SendThread(ferdelyi-1.ferdelyi.root.hwx.site:2182),5,main]TIMED_WAITING is stands at [
  sun.misc.Unsafe.park(Native Method),
  java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215),
  java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078),
  java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:522),
  java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:684),
  org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:275),
  org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1289)]
{code}
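
The pattern described above (format the store, then always close it so that no client thread is left behind) can be sketched in plain Java. This is only an illustration under stated assumptions: HypotheticalConfStore and its worker thread are made-up stand-ins, not the Hadoop, Curator, or ZooKeeper classes, and the worker is deliberately non-daemon so that skipping close() would keep the JVM alive.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

public class FormatAndCloseSketch {

    /** Hypothetical stand-in for a ZooKeeper-backed conf store; not a Hadoop class. */
    static class HypotheticalConfStore implements AutoCloseable {
        private final LinkedBlockingDeque<String> outgoing = new LinkedBlockingDeque<>();
        private volatile boolean running = true;
        private final Thread sender;

        HypotheticalConfStore() {
            sender = new Thread(() -> {
                while (running) {
                    try {
                        // Timed poll, similar in shape to the TIMED_WAITING frames in the log.
                        outgoing.poll(100, TimeUnit.MILLISECONDS);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }, "hypothetical-send-thread");
            // Assumption of this sketch: a non-daemon worker, so the JVM cannot exit while it runs.
            sender.setDaemon(false);
            sender.start();
        }

        /** Placeholder for deleting the stored configuration. */
        void format() {
            outgoing.clear();
        }

        boolean isSenderAlive() {
            return sender.isAlive();
        }

        /** Stops the worker thread and waits briefly for it to finish. */
        @Override
        public void close() {
            running = false;
            sender.interrupt();
            try {
                sender.join(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Formats the store and closes it in a finally block; returns whether the worker survived. */
    static boolean formatThenClose() {
        HypotheticalConfStore store = new HypotheticalConfStore();
        try {
            store.format();
        } finally {
            store.close();
        }
        return store.isSenderAlive();
    }

    public static void main(String[] args) {
        System.out.println("sender alive after close: " + formatThenClose());
    }
}
```

Putting close() in a finally block means the worker is stopped even if format() throws, which is the property the description asks for. Whether the actual fix uses this exact shape is not stated in this issue.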


  was:
YARN-11468 enabled Zookeeper SSL/TLS support for YARN.
Curator uses ClientCnxnSocketNetty for secured connection and the thread needs 
to be closed with confStore.close() after calling confStore.format() to avoid 
the netty thread to wait indefinitely, which renders the RM unresponsive after 
deleting the confstore when started with the "-format-conf-store" arg.

The unclosed thread, which keeps RM running:
{code:java}
2023-10-10 12:13:01,000 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
The Thread[main-SendThread(ferdelyi-1.ferdelyi.root.hwx.site:2182),5,main]TIMED_WAITING is stands at [
  sun.misc.Unsafe.park(Native Method),
  java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215),
  java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078),
  java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:522),
  java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:684),
  org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:275),
  org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1289)]
{code}



> RM process stuck after calling confStore.format() when ZK SSL/TLS is enabled, 
>  as netty thread waits indefinitely
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-11590
>                 URL: https://issues.apache.org/jira/browse/YARN-11590
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Ferenc Erdelyi
>            Assignee: Ferenc Erdelyi
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)