[jira] [Comment Edited] (YARN-8036) Memory Available shows a negative value after running updateNodeResource
[ https://issues.apache.org/jira/browse/YARN-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138925#comment-17138925 ]

yinghua_zh edited comment on YARN-8036 at 6/18/20, 12:53 AM:

2020-06-16 15:10:16,235 [INFO] [main] |app.DAGAppMaster|: In Session mode. Waiting for DAG over RPC
2020-06-16 15:10:16,261 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: App total resource memory: -2048 cpu: 0 taskAllocations: 0
2020-06-16 15:10:16,262 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: Free: pendingRequests: 0 delayedContainers: 0 heartbeats: 1 lastPreemptionHeartbeat: 0
2020-06-16 15:10:16,264 [INFO] [Dispatcher thread {Central}] |node.PerSourceNodeTracker|: Num cluster nodes = 11

was (Author: yinghua_zh):

2020-06-16 15:10:16,235 [INFO] [main] |app.DAGAppMaster|: In Session mode. Waiting for DAG over RPC
2020-06-16 15:10:16,261 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: App total resource memory: -2048 cpu: 0 taskAllocations: 0
2020-06-16 15:10:16,262 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: Free: pendingRequests: 0 delayedContainers: 0 heartbeats: 1 lastPreemptionHeartbeat: 0
2020-06-16 15:10:16,264 [INFO] [Dispatcher thread {Central}] |node.PerSourceNodeTracker|: Num cluster nodes = 11

> Memory Available shows a negative value after running updateNodeResource
>
> Key: YARN-8036
> URL: https://issues.apache.org/jira/browse/YARN-8036
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Charan Hebri
> Assignee: Zian Chen
> Priority: Major
>
> Running updateNodeResource for a node that already has applications running on it doesn't update Memory Available with the right values. It may end up showing negative values based on the requirements of the application.
> Attached a screenshot for reference.
[jira] [Commented] (YARN-8036) Memory Available shows a negative value after running updateNodeResource
[ https://issues.apache.org/jira/browse/YARN-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138925#comment-17138925 ]

yinghua_zh commented on YARN-8036:

2020-06-16 15:10:16,235 [INFO] [main] |app.DAGAppMaster|: In Session mode. Waiting for DAG over RPC
2020-06-16 15:10:16,261 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: App total resource memory: -2048 cpu: 0 taskAllocations: 0
2020-06-16 15:10:16,262 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: Free: pendingRequests: 0 delayedContainers: 0 heartbeats: 1 lastPreemptionHeartbeat: 0
2020-06-16 15:10:16,264 [INFO] [Dispatcher thread {Central}] |node.PerSourceNodeTracker|: Num cluster nodes = 11

> Memory Available shows a negative value after running updateNodeResource
>
> Key: YARN-8036
> URL: https://issues.apache.org/jira/browse/YARN-8036
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Charan Hebri
> Assignee: Zian Chen
> Priority: Major
>
> Running updateNodeResource for a node that already has applications running on it doesn't update Memory Available with the right values. It may end up showing negative values based on the requirements of the application.
> Attached a screenshot for reference.
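For context on how the value can go negative: the description above says updateNodeResource does not reconcile Memory Available with containers that are already running on the node. The following is a minimal sketch of that arithmetic, not the ResourceManager's actual code; the node sizes, the allocated amount, and the sample `yarn rmadmin -updateNodeResource` invocation in the comments are illustrative assumptions only.

{code:java}
// Minimal sketch, not the actual ResourceManager code: how "Memory Available"
// can go negative when a node's total resource is reduced below what is
// already allocated to running containers. All numbers are hypothetical.
public class AvailableMemorySketch {
    public static void main(String[] args) {
        long totalMb = 8192;     // node capacity before the update (assumed)
        long allocatedMb = 6144; // memory already held by running containers (assumed)

        // An admin shrinks the node, e.g. with something like:
        //   yarn rmadmin -updateNodeResource <node-id:port> 4096 4
        totalMb = 4096;

        // If availability is recomputed as total - allocated without clamping
        // or rejecting the update, the UI can report a negative value.
        long availableMb = totalMb - allocatedMb;
        System.out.println("Memory Available: " + availableMb + " MB"); // prints -2048 MB
    }
}
{code}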
[jira] [Commented] (YARN-10317) RM returns a negative value when TEZ AM requests resources
[ https://issues.apache.org/jira/browse/YARN-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136718#comment-17136718 ]

yinghua_zh commented on YARN-10317:

The problem is the same as this one: HIVE-12957.

> RM returns a negative value when TEZ AM requests resources
> --
>
> Key: YARN-10317
> URL: https://issues.apache.org/jira/browse/YARN-10317
> Project: Hadoop YARN
> Issue Type: Bug
> Components: fairscheduler
> Affects Versions: 2.7.2
> Reporter: yinghua_zh
> Priority: Major
>
> RM returns a negative value when TEZ AM requests resources. The records are as follows:
> 2020-06-16 15:10:15,726 [INFO] [IPC Server listener on 23482] |ipc.Server|: IPC Server listener on 23482: starting
> 2020-06-16 15:10:15,726 [INFO] [ServiceThread:DAGClientRPCServer] |client.DAGClientServer|: Instantiated DAGClientRPCServer at sdp-10-88-0-19/10.88.0.19:23482
> 2020-06-16 15:10:15,726 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context
> 2020-06-16 15:10:15,730 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context static
> 2020-06-16 15:10:15,734 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: adding path spec: /*
> 2020-06-16 15:10:15,954 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |webapp.WebApps|: Registered webapp guice modules
> 2020-06-16 15:10:15,955 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: Jetty bound to port 28343
> 2020-06-16 15:10:15,956 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: jetty-6.1.26
> 2020-06-16 15:10:15,979 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: Extract jar:file:/data/data6/yarn/local/filecache/17/tez.tar.gz/lib/hadoop-yarn-common-2.7.2.jar!/webapps/ to /data/data1/yarn/local/usercache/zyh/appcache/application_1592291210011_0010/container_e13_1592291210011_0010_01_01/tmp/Jetty_0_0_0_0_28343_webappsmdg1c9/webapp
> 2020-06-16 15:10:16,123 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:28343
> 2020-06-16 15:10:16,123 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |webapp.WebApps|: Web app started at 28343
> 2020-06-16 15:10:16,123 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |web.WebUIService|: Instantiated WebUIService at http://sdp-10-88-0-19:28343/ui/
> 2020-06-16 15:10:16,125 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |rm.TaskSchedulerManager|: Creating TaskScheduler: YarnTaskSchedulerService
> 2020-06-16 15:10:16,148 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |Configuration.deprecation|: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
> 2020-06-16 15:10:16,149 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |rm.TaskSchedulerManager|: Creating TaskScheduler: Local TaskScheduler with clusterIdentifier=0
> 2020-06-16 15:10:16,159 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |rm.YarnTaskSchedulerService|: YarnTaskScheduler initialized with configuration: maxRMHeartbeatInterval: 250, containerReuseEnabled: true, reuseRackLocal: true, reuseNonLocal: false, localitySchedulingDelay: 250, preemptionPercentage: 10, preemptionMaxWaitTime: 6, numHeartbeatsBetweenPreemptions: 3, idleContainerMinTimeout: 1, idleContainerMaxTimeout: 2, sessionMinHeldContainers: 0
> 2020-06-16 15:10:16,235 [INFO] [main] |history.HistoryEventHandler|: [HISTORY][DAG:N/A][Event:AM_STARTED]: appAttemptId=appattempt_1592291210011_0010_01, startTime=1592291416235
> 2020-06-16 15:10:16,235 [INFO] [main] |app.DAGAppMaster|: In Session mode. Waiting for DAG over RPC
> 2020-06-16 15:10:16,261 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: App total resource memory: -2048 cpu: 0 taskAllocations: 0
> 2020-06-16 15:10:16,262 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: Free: pendingRequests: 0 delayedContainers: 0 heartbeats: 1 lastPreemptionHeartbeat: 0
> 2020-06-16 15:10:16,264 [INFO] [Dispatcher thread {Central}] |node.PerSourceNodeTracker|: Num cluster nodes = 11
[jira] [Updated] (YARN-10317) RM returns a negative value when TEZ AM requests resources
[ https://issues.apache.org/jira/browse/YARN-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yinghua_zh updated YARN-10317:

Description:

RM returns a negative value when TEZ AM requests resources. The records are as follows:

2020-06-16 15:10:15,726 [INFO] [IPC Server listener on 23482] |ipc.Server|: IPC Server listener on 23482: starting
2020-06-16 15:10:15,726 [INFO] [ServiceThread:DAGClientRPCServer] |client.DAGClientServer|: Instantiated DAGClientRPCServer at sdp-10-88-0-19/10.88.0.19:23482
2020-06-16 15:10:15,726 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context
2020-06-16 15:10:15,730 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context static
2020-06-16 15:10:15,734 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: adding path spec: /*
2020-06-16 15:10:15,954 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |webapp.WebApps|: Registered webapp guice modules
2020-06-16 15:10:15,955 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: Jetty bound to port 28343
2020-06-16 15:10:15,956 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: jetty-6.1.26
2020-06-16 15:10:15,979 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: Extract jar:file:/data/data6/yarn/local/filecache/17/tez.tar.gz/lib/hadoop-yarn-common-2.7.2.jar!/webapps/ to /data/data1/yarn/local/usercache/zyh/appcache/application_1592291210011_0010/container_e13_1592291210011_0010_01_01/tmp/Jetty_0_0_0_0_28343_webappsmdg1c9/webapp
2020-06-16 15:10:16,123 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:28343
2020-06-16 15:10:16,123 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |webapp.WebApps|: Web app started at 28343
2020-06-16 15:10:16,123 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |web.WebUIService|: Instantiated WebUIService at http://sdp-10-88-0-19:28343/ui/
2020-06-16 15:10:16,125 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |rm.TaskSchedulerManager|: Creating TaskScheduler: YarnTaskSchedulerService
2020-06-16 15:10:16,148 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |Configuration.deprecation|: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2020-06-16 15:10:16,149 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |rm.TaskSchedulerManager|: Creating TaskScheduler: Local TaskScheduler with clusterIdentifier=0
2020-06-16 15:10:16,159 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |rm.YarnTaskSchedulerService|: YarnTaskScheduler initialized with configuration: maxRMHeartbeatInterval: 250, containerReuseEnabled: true, reuseRackLocal: true, reuseNonLocal: false, localitySchedulingDelay: 250, preemptionPercentage: 10, preemptionMaxWaitTime: 6, numHeartbeatsBetweenPreemptions: 3, idleContainerMinTimeout: 1, idleContainerMaxTimeout: 2, sessionMinHeldContainers: 0
2020-06-16 15:10:16,235 [INFO] [main] |history.HistoryEventHandler|: [HISTORY][DAG:N/A][Event:AM_STARTED]: appAttemptId=appattempt_1592291210011_0010_01, startTime=1592291416235
2020-06-16 15:10:16,235 [INFO] [main] |app.DAGAppMaster|: In Session mode. Waiting for DAG over RPC
2020-06-16 15:10:16,261 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: App total resource memory: -2048 cpu: 0 taskAllocations: 0
2020-06-16 15:10:16,262 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: Free: pendingRequests: 0 delayedContainers: 0 heartbeats: 1 lastPreemptionHeartbeat: 0
2020-06-16 15:10:16,264 [INFO] [Dispatcher thread {Central}] |node.PerSourceNodeTracker|: Num cluster nodes = 11

This leads to errors in Tez segmentation.
[jira] [Created] (YARN-10317) RM returns a negative value when TEZ AM requests resources
yinghua_zh created YARN-10317:

Summary: RM returns a negative value when TEZ AM requests resources
Key: YARN-10317
URL: https://issues.apache.org/jira/browse/YARN-10317
Project: Hadoop YARN
Issue Type: Bug
Components: fairscheduler
Affects Versions: 2.7.2
Reporter: yinghua_zh

RM returns a negative value when TEZ AM requests resources. The records are as follows:

2020-06-16 15:10:15,726 [INFO] [IPC Server listener on 23482] |ipc.Server|: IPC Server listener on 23482: starting
2020-06-16 15:10:15,726 [INFO] [ServiceThread:DAGClientRPCServer] |client.DAGClientServer|: Instantiated DAGClientRPCServer at sdp-10-88-0-19/10.88.0.19:23482
2020-06-16 15:10:15,726 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context
2020-06-16 15:10:15,730 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: Added filter AM_PROXY_FILTER (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context static
2020-06-16 15:10:15,734 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: adding path spec: /*
2020-06-16 15:10:15,954 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |webapp.WebApps|: Registered webapp guice modules
2020-06-16 15:10:15,955 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: Jetty bound to port 28343
2020-06-16 15:10:15,956 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: jetty-6.1.26
2020-06-16 15:10:15,979 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: Extract jar:file:/data/data6/yarn/local/filecache/17/tez.tar.gz/lib/hadoop-yarn-common-2.7.2-SDP.jar!/webapps/ to /data/data1/yarn/local/usercache/zyh/appcache/application_1592291210011_0010/container_e13_1592291210011_0010_01_01/tmp/Jetty_0_0_0_0_28343_webappsmdg1c9/webapp
2020-06-16 15:10:16,123 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:28343
2020-06-16 15:10:16,123 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |webapp.WebApps|: Web app started at 28343
2020-06-16 15:10:16,123 [INFO] [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |web.WebUIService|: Instantiated WebUIService at http://sdp-10-88-0-19:28343/ui/
2020-06-16 15:10:16,125 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |rm.TaskSchedulerManager|: Creating TaskScheduler: YarnTaskSchedulerService
2020-06-16 15:10:16,148 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |Configuration.deprecation|: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2020-06-16 15:10:16,149 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |rm.TaskSchedulerManager|: Creating TaskScheduler: Local TaskScheduler with clusterIdentifier=0
2020-06-16 15:10:16,159 [INFO] [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] |rm.YarnTaskSchedulerService|: YarnTaskScheduler initialized with configuration: maxRMHeartbeatInterval: 250, containerReuseEnabled: true, reuseRackLocal: true, reuseNonLocal: false, localitySchedulingDelay: 250, preemptionPercentage: 10, preemptionMaxWaitTime: 6, numHeartbeatsBetweenPreemptions: 3, idleContainerMinTimeout: 1, idleContainerMaxTimeout: 2, sessionMinHeldContainers: 0
2020-06-16 15:10:16,235 [INFO] [main] |history.HistoryEventHandler|: [HISTORY][DAG:N/A][Event:AM_STARTED]: appAttemptId=appattempt_1592291210011_0010_01, startTime=1592291416235
2020-06-16 15:10:16,235 [INFO] [main] |app.DAGAppMaster|: In Session mode. Waiting for DAG over RPC
2020-06-16 15:10:16,261 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: App total resource memory: -2048 cpu: 0 taskAllocations: 0
2020-06-16 15:10:16,262 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated: Free: pendingRequests: 0 delayedContainers: 0 heartbeats: 1 lastPreemptionHeartbeat: 0
2020-06-16 15:10:16,264 [INFO] [Dispatcher thread {Central}] |node.PerSourceNodeTracker|: Num cluster nodes = 11

This leads to errors in Tez segmentation.
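To make the reported symptom concrete: the log above shows the Tez AM being told that its total available resource is memory: -2048, and the reporter notes that this then breaks Tez's segmentation. The sketch below is a hypothetical illustration (not Tez's actual YarnTaskSchedulerService code) of why any estimate derived from that total goes wrong; the per-task container size used here is an assumed value.

{code:java}
// Hypothetical sketch, not Tez code: a negative app/cluster total resource,
// as logged by YarnTaskSchedulerService above, poisons any arithmetic that
// derives task counts or split groups from it.
public class NegativeHeadroomSketch {
    public static void main(String[] args) {
        long appTotalMemoryMb = -2048; // value reported to the AM in the log above
        long taskMemoryMb = 1024;      // assumed per-task container size

        // An estimate like "how many tasks fit" becomes negative, so anything
        // sized from it (e.g. grouping input splits) misbehaves downstream.
        long estimatedSlots = appTotalMemoryMb / taskMemoryMb; // -2
        System.out.println("estimated task slots = " + estimatedSlots);
    }
}
{code}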
[jira] [Comment Edited] (YARN-5040) CPU Isolation with CGroups triggers kernel panics on Centos 7.1/7.2 when yarn.nodemanager.resource.percentage-physical-cpu-limit < 100
[ https://issues.apache.org/jira/browse/YARN-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933167#comment-16933167 ]

yinghua_zh edited comment on YARN-5040 at 9/19/19 8:54 AM:

I also encountered the same problem. When CGroups is enabled and a big task is running, the operating system kernel crashes after a period of time, but not every time. My version information is as follows:
Hadoop: 2.7.2
OS: CentOS Linux release 7.3.1611
OS kernel: 3.10.0-514.el7.x86_64
How did you solve the problem? [~vvasudev] [~sidharta-s] [~Tao Jie] [~ecwpp] [~cheersyang]

was (Author: yinghua_zh):

I also encountered the same problem. When CGroups is enabled and a big task is running, the operating system kernel crashes after a period of time. My version information is as follows:
Hadoop: 2.7.2
OS: CentOS Linux release 7.3.1611
OS kernel: 3.10.0-514.el7.x86_64
How did you solve the problem? [~vvasudev] [~sidharta-s] [~Tao Jie] [~ecwpp]

> CPU Isolation with CGroups triggers kernel panics on Centos 7.1/7.2 when yarn.nodemanager.resource.percentage-physical-cpu-limit < 100
> --
>
> Key: YARN-5040
> URL: https://issues.apache.org/jira/browse/YARN-5040
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.9.0
> Reporter: Sidharta Seethana
> Assignee: Varun Vasudev
> Priority: Major
>
> /cc [~vvasudev]
> We have been running some benchmarks internally with resource isolation enabled. We have consistently run into kernel panics when running a large job (a large pi job, terasort). These kernel panics went away when we set yarn.nodemanager.resource.percentage-physical-cpu-limit=100. Anything less than 100 triggers different behavior in YARN's CPU resource handler which seems to cause these issues. Looking at the kernel crash dumps, the backtraces were different - sometimes pointing to java processes, sometimes not.
> Kernel versions used: 3.10.0-229.14.1.el7.x86_64 and 3.10.0-327.13.1.el7.x86_64.
[jira] [Commented] (YARN-5040) CPU Isolation with CGroups triggers kernel panics on Centos 7.1/7.2 when yarn.nodemanager.resource.percentage-physical-cpu-limit < 100
[ https://issues.apache.org/jira/browse/YARN-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933167#comment-16933167 ]

yinghua_zh commented on YARN-5040:

I also encountered the same problem. When CGroups is enabled and a big task is running, the operating system kernel crashes after a period of time. My version information is as follows:
Hadoop: 2.7.2
OS: CentOS Linux release 7.3.1611
OS kernel: 3.10.0-514.el7.x86_64
How did you solve the problem? [~vvasudev] [~sidharta-s] [~Tao Jie] [~ecwpp]

> CPU Isolation with CGroups triggers kernel panics on Centos 7.1/7.2 when yarn.nodemanager.resource.percentage-physical-cpu-limit < 100
> --
>
> Key: YARN-5040
> URL: https://issues.apache.org/jira/browse/YARN-5040
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.9.0
> Reporter: Sidharta Seethana
> Assignee: Varun Vasudev
> Priority: Major
>
> /cc [~vvasudev]
> We have been running some benchmarks internally with resource isolation enabled. We have consistently run into kernel panics when running a large job (a large pi job, terasort). These kernel panics went away when we set yarn.nodemanager.resource.percentage-physical-cpu-limit=100. Anything less than 100 triggers different behavior in YARN's CPU resource handler which seems to cause these issues. Looking at the kernel crash dumps, the backtraces were different - sometimes pointing to java processes, sometimes not.
> Kernel versions used: 3.10.0-229.14.1.el7.x86_64 and 3.10.0-327.13.1.el7.x86_64.
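As background for the behavior difference the description mentions (panics only when the limit is below 100): when the percentage is under 100, the NodeManager has to cap YARN's overall CPU share, which on Linux is done through the kernel's CFS bandwidth-control settings; at exactly 100 no cap is applied, so that kernel path is never exercised. The sketch below is only a rough illustration of that arithmetic under assumed values (core count, CFS period length), not the NodeManager's actual cgroups handler code.

{code:java}
// Rough sketch, not the NodeManager's actual cgroups code: with
// yarn.nodemanager.resource.percentage-physical-cpu-limit below 100, YARN
// must cap its overall CPU usage, which means writing CFS bandwidth values
// (cpu.cfs_period_us / cpu.cfs_quota_us) into the YARN parent cgroup. At 100
// no quota is written, matching the report that the panics stop at 100.
// All numbers here are assumptions for illustration.
public class CpuLimitSketch {
    public static void main(String[] args) {
        int physicalCores = 8;  // assumed node core count
        int limitPercent = 80;  // yarn.nodemanager.resource.percentage-physical-cpu-limit

        float coresForYarn = physicalCores * limitPercent / 100.0f; // 6.4 cores

        long periodUs = 100_000; // assumed CFS period of 100 ms
        long quotaUs = limitPercent < 100
                ? (long) (periodUs * coresForYarn)  // cap: 640,000 us of CPU per period
                : -1;                               // -1 means "no limit"
        System.out.println("cpu.cfs_period_us=" + periodUs + ", cpu.cfs_quota_us=" + quotaUs);
    }
}
{code}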