[
https://issues.apache.org/jira/browse/YARN-11590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ferenc Erdelyi updated YARN-11590:
----------------------------------
Description:
YARN-11468 enabled ZooKeeper SSL/TLS support for YARN.
Curator uses ClientCnxnSocketNetty for the secured connection. That thread
needs to be closed after calling confStore.format() to keep the netty thread
from waiting indefinitely, which leaves the RM unresponsive after it deletes
the confstore when started with the "-format-conf-store" arg.
The unclosed thread, which keeps the RM running:
{code:java}
2023-10-10 12:13:01,000 INFO
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: The
Thread[main-SendThread(ferdelyi-1.ferdelyi.root.hwx.site:2182),5,main]TIMED_WAITING
is stands at [sun.misc.Unsafe.park(Native Method),
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215),
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078),
java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:522),
java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:684),
org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:275),
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1289)]
{code}
was:
YARN-11468 enabled Zookeeper SSL/TLS support for YARN.
Curator uses ClientCnxnSocketNetty for secured connection and the thread needs
to be closed with confStore.close() after calling confStore.format() to avoid
the netty thread to wait indefinitely, which renders the RM unresponsive after
deleting the confstore when started with the "-format-conf-store" arg.
The unclosed thread, which keeps RM running:
{code:java}
2023-10-10 12:13:01,000 INFO
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: The
Thread[main-SendThread(ferdelyi-1.ferdelyi.root.hwx.site:2182),5,main]TIMED_WAITING
is stands at [sun.misc.Unsafe.park(Native Method),
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215),
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078),
java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:522),
java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:684),
org.apache.zookeeper.ClientCnxnSocketNetty.doTransport(ClientCnxnSocketNetty.java:275),
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1289)]
{code}
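The close-after-format pattern described above can be sketched in isolation.
This is a minimal illustration, not the actual patch: FakeStore,
formatAndClose() and the "fake-SendThread" name are hypothetical stand-ins,
and the background thread only loosely mimics the timed queue poll visible
in the stack trace. It makes no claim about how the real SendThread is
created or why it holds up the RM.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

public class FormatThenClose {

    /** Hypothetical stand-in for a ZK-backed conf store; not the Hadoop class. */
    public static class FakeStore implements AutoCloseable {
        private final LinkedBlockingDeque<String> outgoing = new LinkedBlockingDeque<>();
        private volatile boolean closed = false;
        private final Thread sendThread;

        public FakeStore() {
            // Background thread that polls a queue with a timeout, so it sits
            // in TIMED_WAITING until someone stops it (assumed toy model).
            sendThread = new Thread(() -> {
                while (!closed) {
                    try {
                        outgoing.poll(100, TimeUnit.MILLISECONDS);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }, "fake-SendThread");
            sendThread.start();
        }

        /** Placeholder for deleting the stored configuration. */
        public void format() {
            outgoing.clear();
        }

        /** Stops the background thread and waits for it to finish. */
        @Override
        public void close() throws InterruptedException {
            closed = true;
            sendThread.interrupt();
            sendThread.join();
        }

        public boolean isSendThreadAlive() {
            return sendThread.isAlive();
        }
    }

    /** Formats the store and always closes it, even if format() throws. */
    public static void formatAndClose(FakeStore store) throws Exception {
        try {
            store.format();
        } finally {
            store.close();
        }
    }

    public static void main(String[] args) throws Exception {
        FakeStore store = new FakeStore();
        formatAndClose(store);
        // prints "send thread alive: false"
        System.out.println("send thread alive: " + store.isSendThreadAlive());
    }
}
```

Putting close() in a finally block means the thread is released even when
format() fails, so the process is not left waiting on it in either case.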
> RM process stuck after calling confStore.format() when ZK SSL/TLS is enabled,
> as netty thread waits indefinitely
> -----------------------------------------------------------------------------------------------------------------
>
> Key: YARN-11590
> URL: https://issues.apache.org/jira/browse/YARN-11590
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Ferenc Erdelyi
> Assignee: Ferenc Erdelyi
> Priority: Major