[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hung updated YARN-8193:
    Attachment: YARN-8193-branch-2.10-001.patch

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Blocker
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: YARN-8193-branch-2-001.patch,
> YARN-8193-branch-2.10-001.patch, YARN-8193-branch-2.9.0-001.patch,
> YARN-8193.001.patch, YARN-8193.002.patch
>
>
> When running massive queries successively, at some point RM just hangs and
> stops allocating resources. At the point RM get hangs, YARN throw
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node
> health issue, RM didn't report any node being unhealthy). There is no fixed
> trigger for this (query or operation).
> This problem goes away on restarting ResourceManager. No NM restart is
> required.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tan, Wangda updated YARN-8193:
    Target Version/s: 2.9.2
            Priority: Blocker  (was: Critical)

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Blocker
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: YARN-8193-branch-2-001.patch,
> YARN-8193-branch-2.9.0-001.patch, YARN-8193.001.patch, YARN-8193.002.patch
>
>
> When running massive queries successively, at some point RM just hangs and
> stops allocating resources. At the point RM get hangs, YARN throw
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node
> health issue, RM didn't report any node being unhealthy). There is no fixed
> trigger for this (query or operation).
> This problem goes away on restarting ResourceManager. No NM restart is
> required.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil Govindan updated YARN-8193:
    Fix Version/s:     (was: 3.2.1)
                   3.2.0

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: YARN-8193-branch-2-001.patch,
> YARN-8193-branch-2.9.0-001.patch, YARN-8193.001.patch, YARN-8193.002.patch
>
>
> When running massive queries successively, at some point RM just hangs and
> stops allocating resources. At the point RM get hangs, YARN throw
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node
> health issue, RM didn't report any node being unhealthy). There is no fixed
> trigger for this (query or operation).
> This problem goes away on restarting ResourceManager. No NM restart is
> required.
[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sunil Govindan updated YARN-8193:
    Fix Version/s:     (was: 3.2.0)
                   3.2.1

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>             Fix For: 3.1.1, 3.2.1
>
>         Attachments: YARN-8193-branch-2-001.patch,
> YARN-8193-branch-2.9.0-001.patch, YARN-8193.001.patch, YARN-8193.002.patch
>
>
> When running massive queries successively, at some point RM just hangs and
> stops allocating resources. At the point RM get hangs, YARN throw
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node
> health issue, RM didn't report any node being unhealthy). There is no fixed
> trigger for this (query or operation).
> This problem goes away on restarting ResourceManager. No NM restart is
> required.
[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated YARN-8193:
    Attachment: YARN-8193-branch-2-001.patch

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: YARN-8193-branch-2-001.patch,
> YARN-8193-branch-2.9.0-001.patch, YARN-8193.001.patch, YARN-8193.002.patch
>
>
> When running massive queries successively, at some point RM just hangs and
> stops allocating resources. At the point RM get hangs, YARN throw
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node
> health issue, RM didn't report any node being unhealthy). There is no fixed
> trigger for this (query or operation).
> This problem goes away on restarting ResourceManager. No NM restart is
> required.
[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8193:
    Fix Version/s:     (was: 2.9.0)

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: YARN-8193-branch-2.9.0-001.patch, YARN-8193.001.patch,
> YARN-8193.002.patch
>
>
> When running massive queries successively, at some point RM just hangs and
> stops allocating resources. At the point RM get hangs, YARN throw
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node
> health issue, RM didn't report any node being unhealthy). There is no fixed
> trigger for this (query or operation).
> This problem goes away on restarting ResourceManager. No NM restart is
> required.
[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8193:
    Fix Version/s: 3.2.0

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: YARN-8193-branch-2.9.0-001.patch, YARN-8193.001.patch,
> YARN-8193.002.patch
>
>
> When running massive queries successively, at some point RM just hangs and
> stops allocating resources. At the point RM get hangs, YARN throw
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node
> health issue, RM didn't report any node being unhealthy). There is no fixed
> trigger for this (query or operation).
> This problem goes away on restarting ResourceManager. No NM restart is
> required.
[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-8193:
    Fix Version/s:     (was: 3.2.0)

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>             Fix For: 2.9.0, 3.1.1
>
>         Attachments: YARN-8193-branch-2.9.0-001.patch, YARN-8193.001.patch,
> YARN-8193.002.patch
>
>
> When running massive queries successively, at some point RM just hangs and
> stops allocating resources. At the point RM get hangs, YARN throw
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node
> health issue, RM didn't report any node being unhealthy). There is no fixed
> trigger for this (query or operation).
> This problem goes away on restarting ResourceManager. No NM restart is
> required.
[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

tianjuan updated YARN-8193:
    Fix Version/s: 2.9.0

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>             Fix For: 2.9.0, 3.2.0, 3.1.1
>
>         Attachments: YARN-8193-branch-2.9.0-001.patch, YARN-8193.001.patch,
> YARN-8193.002.patch
>
>
> When running massive queries successively, at some point RM just hangs and
> stops allocating resources. At the point RM get hangs, YARN throw
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node
> health issue, RM didn't report any node being unhealthy). There is no fixed
> trigger for this (query or operation).
> This problem goes away on restarting ResourceManager. No NM restart is
> required.
[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

tianjuan updated YARN-8193:
    Attachment: YARN-8193-branch-2.9.0-001.patch

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: YARN-8193-branch-2.9.0-001.patch, YARN-8193.001.patch,
> YARN-8193.002.patch
>
>
> When running massive queries successively, at some point RM just hangs and
> stops allocating resources. At the point RM get hangs, YARN throw
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node
> health issue, RM didn't report any node being unhealthy). There is no fixed
> trigger for this (query or operation).
> This problem goes away on restarting ResourceManager. No NM restart is
> required.
[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zian Chen updated YARN-8193:
    Attachment: YARN-8193.002.patch

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>         Attachments: YARN-8193.001.patch, YARN-8193.002.patch
>
>
> When running massive queries successively, at some point RM just hangs and
> stops allocating resources. At the point RM get hangs, YARN throw
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node
> health issue, RM didn't report any node being unhealthy). There is no fixed
> trigger for this (query or operation).
> This problem goes away on restarting ResourceManager. No NM restart is
> required.
[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zian Chen updated YARN-8193:
    Attachment: YARN-8193.001.patch

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>         Attachments: YARN-8193.001.patch
>
>
> When running massive queries successively, at some point RM just hangs and
> stops allocating resources. At the point RM get hangs, YARN throw
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node
> health issue, RM didn't report any node being unhealthy). There is no fixed
> trigger for this (query or operation).
> This problem goes away on restarting ResourceManager. No NM restart is
> required.
[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zian Chen updated YARN-8193:
    Description:
When running massive queries successively, at some point RM just hangs and stops allocating resources.

There's sufficient space given to yarn.nodemanager.local-dirs (not a node health issue, RM didn't report any node being unhealthy). There is no fixed trigger for this (query or operation). This problem goes away on restarting ResourceManager.

No NM restart is required.

At the point RM get hangs, YARN throw NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.

  was:
When running massive queries successively, at some point RM just hangs and stops allocating resources.

There's sufficient space given to yarn.nodemanager.local-dirs (not a node health issue, RM didn't report any node being unhealthy). There is no fixed trigger for this (query or operation). This problem goes away on restarting ResourceManager.

No NM restart is required.

At the point RM get hangs, YARN throw NullPointerException at RegularContainerAllocator.getLocalityWaitFactor

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>
> When running massive queries successively, at some point RM just hangs and
> stops allocating resources.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node
> health issue, RM didn't report any node being unhealthy). There is no fixed
> trigger for this (query or operation). This problem goes away on restarting
> ResourceManager.
> No NM restart is required.
> At the point RM get hangs, YARN throw NullPointerException at
> RegularContainerAllocator.getLocalityWaitFactor.
[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zian Chen updated YARN-8193:
    Description:
When running massive queries successively, at some point RM just hangs and stops allocating resources. At the point RM get hangs, YARN throw NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.

There's sufficient space given to yarn.nodemanager.local-dirs (not a node health issue, RM didn't report any node being unhealthy). There is no fixed trigger for this (query or operation).

This problem goes away on restarting ResourceManager. No NM restart is required.

  was:
When running massive queries successively, at some point RM just hangs and stops allocating resources.

There's sufficient space given to yarn.nodemanager.local-dirs (not a node health issue, RM didn't report any node being unhealthy). There is no fixed trigger for this (query or operation). This problem goes away on restarting ResourceManager.

No NM restart is required.

At the point RM get hangs, YARN throw NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>
> When running massive queries successively, at some point RM just hangs and
> stops allocating resources. At the point RM get hangs, YARN throw
> NullPointerException at RegularContainerAllocator.getLocalityWaitFactor.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node
> health issue, RM didn't report any node being unhealthy). There is no fixed
> trigger for this (query or operation).
> This problem goes away on restarting ResourceManager. No NM restart is
> required.
[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zian Chen updated YARN-8193:
    Description:
When running massive queries successively, at some point RM just hangs and stops allocating resources.

There's sufficient space given to yarn.nodemanager.local-dirs (not a node health issue, RM didn't report any node being unhealthy). There is no fixed trigger for this (query or operation). This problem goes away on restarting ResourceManager.

No NM restart is required.

At the point RM get hangs, YARN throw NullPointerException at RegularContainerAllocator.getLocalityWaitFactor

  was:
We were running TPCDS queries successively and at some point RM just hangs and stops allocating resources.

There's sufficient space given to yarn.nodemanager.local-dirs (not a node health issue, RM didn't report any node being unhealthy). There is no fixed trigger for this (query or operation). This problem goes away on restarting ResourceManager.

No NM restart is required.

I have attached RM logs.

The application that just finished before the current one is application_155930059_0379

The current application (one that hangs) is assigned application number application_155930059_0380.

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>
> When running massive queries successively, at some point RM just hangs and
> stops allocating resources.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node
> health issue, RM didn't report any node being unhealthy). There is no fixed
> trigger for this (query or operation). This problem goes away on restarting
> ResourceManager.
> No NM restart is required.
> At the point RM get hangs, YARN throw NullPointerException at
> RegularContainerAllocator.getLocalityWaitFactor
[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zian Chen updated YARN-8193:
    Attachment:     (was: hadoop-yarn-resourcemanager-c01s04.hadoop.local (1).log)

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>
> We were running TPCDS queries successively and at some point RM just hangs
> and stops allocating resources.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node
> health issue, RM didn't report any node being unhealthy). There is no fixed
> trigger for this (query or operation). This problem goes away on restarting
> ResourceManager.
> No NM restart is required.
> I have attached RM logs.
> The application that just finished before the current one is
> application_155930059_0379
> The current application (one that hangs) is assigned application number
> application_155930059_0380.
[jira] [Updated] (YARN-8193) YARN RM hangs abruptly (stops allocating resources) when running successive applications.
[ https://issues.apache.org/jira/browse/YARN-8193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zian Chen updated YARN-8193:
    Attachment: hadoop-yarn-resourcemanager-c01s04.hadoop.local (1).log

> YARN RM hangs abruptly (stops allocating resources) when running successive
> applications.
>
>                 Key: YARN-8193
>                 URL: https://issues.apache.org/jira/browse/YARN-8193
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Zian Chen
>            Assignee: Zian Chen
>            Priority: Critical
>         Attachments: hadoop-yarn-resourcemanager-c01s04.hadoop.local (1).log
>
>
> We were running TPCDS queries successively and at some point RM just hangs
> and stops allocating resources.
> There's sufficient space given to yarn.nodemanager.local-dirs (not a node
> health issue, RM didn't report any node being unhealthy). There is no fixed
> trigger for this (query or operation). This problem goes away on restarting
> ResourceManager.
> No NM restart is required.
> I have attached RM logs.
> The application that just finished before the current one is
> application_155930059_0379
> The current application (one that hangs) is assigned application number
> application_155930059_0380.