[jira] [Comment Edited] (YARN-8036) Memory Available shows a negative value after running updateNodeResource

2020-06-17 Thread yinghua_zh (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138925#comment-17138925
 ] 

yinghua_zh edited comment on YARN-8036 at 6/18/20, 12:53 AM:
-

2020-06-16 15:10:16,235 [INFO] [main] |app.DAGAppMaster|: In Session mode. Waiting for DAG over RPC
2020-06-16 15:10:16,261 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: App total resource memory: -2048 cpu: 0 taskAllocations: 0
2020-06-16 15:10:16,262 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated:  Free:  pendingRequests: 0 delayedContainers: 0 heartbeats: 1 lastPreemptionHeartbeat: 0
2020-06-16 15:10:16,264 [INFO] [Dispatcher thread {Central}] |node.PerSourceNodeTracker|: Num cluster nodes = 11


was (Author: yinghua_zh):
2020-06-16 15:10:16,235 [INFO] [main] |app.DAGAppMaster|: In Session mode. Waiting for DAG over RPC
2020-06-16 15:10:16,261 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: App total resource memory: -2048 cpu: 0 taskAllocations: 0
2020-06-16 15:10:16,262 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated:  Free:  pendingRequests: 0 delayedContainers: 0 heartbeats: 1 lastPreemptionHeartbeat: 0
2020-06-16 15:10:16,264 [INFO] [Dispatcher thread {Central}] |node.PerSourceNodeTracker|: Num cluster nodes = 11

> Memory Available shows a negative value after running updateNodeResource
> 
>
> Key: YARN-8036
> URL: https://issues.apache.org/jira/browse/YARN-8036
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Charan Hebri
>Assignee: Zian Chen
>Priority: Major
>
> Running updateNodeResource for a node that already has applications running 
> on it doesn't update Memory Available with the right values. It may end up 
> showing negative values based on the requirements of the application. 
> Attached a screenshot for reference.
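For context, a minimal sketch of the arithmetic that can produce the negative value (this is not the YARN scheduler source; the resource figures and the class/method names are illustrative only): once a node's total resource is lowered below what its running containers already hold, available = new total - allocated goes negative unless something clamps it.

{code:java}
// Hedged sketch only: illustrates how "Memory Available" can go negative after
// updateNodeResource. Not the actual YARN scheduler code; the 6144/8192 figures
// and the class/method names here are made up for illustration.
import org.apache.hadoop.yarn.api.records.Resource;

public class NegativeAvailableSketch {

  // Available = new node total - already allocated; nothing here forces it >= 0.
  static Resource availableAfterUpdate(Resource newTotal, Resource allocated) {
    return Resource.newInstance(
        newTotal.getMemory() - allocated.getMemory(),
        newTotal.getVirtualCores() - allocated.getVirtualCores());
  }

  public static void main(String[] args) {
    Resource allocated = Resource.newInstance(8192, 8); // containers already running on the node
    Resource newTotal  = Resource.newInstance(6144, 8); // node total shrunk via updateNodeResource
    // Prints <memory:-2048, vCores:0>: the kind of negative "available" value reported here.
    System.out.println(availableAfterUpdate(newTotal, allocated));
  }
}
{code}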



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-8036) Memory Available shows a negative value after running updateNodeResource

2020-06-17 Thread yinghua_zh (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-8036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17138925#comment-17138925
 ] 

yinghua_zh commented on YARN-8036:
--

2020-06-16 15:10:16,235 [INFO] [main] |app.DAGAppMaster|: In Session mode. Waiting for DAG over RPC
2020-06-16 15:10:16,261 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: App total resource memory: -2048 cpu: 0 taskAllocations: 0
2020-06-16 15:10:16,262 [INFO] [AMRM Callback Handler Thread] |rm.YarnTaskSchedulerService|: Allocated:  Free:  pendingRequests: 0 delayedContainers: 0 heartbeats: 1 lastPreemptionHeartbeat: 0
2020-06-16 15:10:16,264 [INFO] [Dispatcher thread {Central}] |node.PerSourceNodeTracker|: Num cluster nodes = 11

> Memory Available shows a negative value after running updateNodeResource
> 
>
> Key: YARN-8036
> URL: https://issues.apache.org/jira/browse/YARN-8036
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Charan Hebri
>Assignee: Zian Chen
>Priority: Major
>
> Running updateNodeResource for a node that already has applications running 
> on it doesn't update Memory Available with the right values. It may end up 
> showing negative values based on the requirements of the application. 
> Attached a screenshot for reference.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-10317) RM returns a negative value when TEZ AM requests resources

2020-06-16 Thread yinghua_zh (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136718#comment-17136718
 ] 

yinghua_zh commented on YARN-10317:
---

 
The problem is the same as this: HIVE-12957

> RM returns a negative value when TEZ AM requests resources
> --
>
> Key: YARN-10317
> URL: https://issues.apache.org/jira/browse/YARN-10317
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Affects Versions: 2.7.2
>Reporter: yinghua_zh
>Priority: Major
>
> RM returns a negative value when TEZ AM requests resources. The records are as 
> follows:
> 2020-06-16 15:10:15,726 [INFO] [IPC Server listener on 23482] |ipc.Server|: 
> IPC Server listener on 23482: starting
>  2020-06-16 15:10:15,726 [INFO] [ServiceThread:DAGClientRPCServer] 
> |client.DAGClientServer|: Instantiated DAGClientRPCServer at 
> sdp-10-88-0-19/10.88.0.19:23482
>  2020-06-16 15:10:15,726 [INFO] 
> [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: 
> Added filter AM_PROXY_FILTER 
> (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context 
>  2020-06-16 15:10:15,730 [INFO] 
> [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: 
> Added filter AM_PROXY_FILTER 
> (class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context 
> static
>  2020-06-16 15:10:15,734 [INFO] 
> [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: 
> adding path spec: /*
>  2020-06-16 15:10:15,954 [INFO] 
> [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |webapp.WebApps|: 
> Registered webapp guice modules
>  2020-06-16 15:10:15,955 [INFO] 
> [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: 
> Jetty bound to port 28343
>  2020-06-16 15:10:15,956 [INFO] 
> [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: 
> jetty-6.1.26
>  2020-06-16 15:10:15,979 [INFO] 
> [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: 
> Extract 
> jar:[file:/data/data6/yarn/local/filecache/17/tez.tar.gz/lib/hadoop-yarn-common-2.7.2.jar!/webapps/|file://data/data6/yarn/local/filecache/17/tez.tar.gz/lib/hadoop-yarn-common-2.7.2-SDP.jar!/webapps/]
>  to 
> /data/data1/yarn/local/usercache/zyh/appcache/application_1592291210011_0010/container_e13_1592291210011_0010_01_01/tmp/Jetty_0_0_0_0_28343_webappsmdg1c9/webapp
>  2020-06-16 15:10:16,123 [INFO] 
> [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: 
> Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:28343
>  2020-06-16 15:10:16,123 [INFO] 
> [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |webapp.WebApps|: Web 
> app started at 28343
>  2020-06-16 15:10:16,123 [INFO] 
> [ServiceThread:org.apache.tez.dag.app.web.WebUIService] |web.WebUIService|: 
> Instantiated WebUIService at 
> [http://10-88-0-19:28343/ui/|http://sdp-10-88-0-19:28343/ui/]
>  2020-06-16 15:10:16,125 [INFO] 
> [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] 
> |rm.TaskSchedulerManager|: Creating TaskScheduler: YarnTaskSchedulerService
>  2020-06-16 15:10:16,148 [INFO] 
> [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] 
> |Configuration.deprecation|: io.bytes.per.checksum is deprecated. Instead, 
> use dfs.bytes-per-checksum
>  2020-06-16 15:10:16,149 [INFO] 
> [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] 
> |rm.TaskSchedulerManager|: Creating TaskScheduler: Local TaskScheduler with 
> clusterIdentifier=0
>  2020-06-16 15:10:16,159 [INFO] 
> [ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] 
> |rm.YarnTaskSchedulerService|: YarnTaskScheduler initialized with 
> configuration: maxRMHeartbeatInterval: 250, containerReuseEnabled: true, 
> reuseRackLocal: true, reuseNonLocal: false, localitySchedulingDelay: 250, 
> preemptionPercentage: 10, preemptionMaxWaitTime: 6, 
> numHeartbeatsBetweenPreemptions: 3, idleContainerMinTimeout: 1, 
> idleContainerMaxTimeout: 2, sessionMinHeldContainers: 0
>  2020-06-16 15:10:16,235 [INFO] [main] |history.HistoryEventHandler|: 
> [HISTORY][DAG:N/A][Event:AM_STARTED]: 
> appAttemptId=appattempt_1592291210011_0010_01, startTime=1592291416235
>  2020-06-16 15:10:16,235 [INFO] [main] |app.DAGAppMaster|: In Session mode. 
> Waiting for DAG over RPC
>  2020-06-16 15:10:16,261 [INFO] [AMRM Callback Handler Thread] 
> |rm.YarnTaskSchedulerService|: App total resource memory: -2048 cpu: 0 
> taskAllocations: 0
>  2020-06-16 15:10:16,262 [INFO] [AMRM Callback Handler Thread] 
> |rm.YarnTaskSchedulerService|: Allocated:  vCores:0> Free:  pendingRequests: 0 
> delayedContainers: 0 heartbeats: 1 lastPreemptionHeartbeat: 0
>  2020-06-16 15:10:16,264 [INFO] [Dispatcher th

[jira] [Updated] (YARN-10317) RM returns a negative value when TEZ AM requests resources

2020-06-16 Thread yinghua_zh (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yinghua_zh updated YARN-10317:
--
Description: 
RM returns a negative value when TEZ AM requests resources. The records are as 
follows:

2020-06-16 15:10:15,726 [INFO] [IPC Server listener on 23482] |ipc.Server|: IPC 
Server listener on 23482: starting
 2020-06-16 15:10:15,726 [INFO] [ServiceThread:DAGClientRPCServer] 
|client.DAGClientServer|: Instantiated DAGClientRPCServer at 
sdp-10-88-0-19/10.88.0.19:23482
 2020-06-16 15:10:15,726 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: 
Added filter AM_PROXY_FILTER 
(class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context 
 2020-06-16 15:10:15,730 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: 
Added filter AM_PROXY_FILTER 
(class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context 
static
 2020-06-16 15:10:15,734 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: 
adding path spec: /*
 2020-06-16 15:10:15,954 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |webapp.WebApps|: 
Registered webapp guice modules
 2020-06-16 15:10:15,955 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: 
Jetty bound to port 28343
 2020-06-16 15:10:15,956 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: 
jetty-6.1.26
 2020-06-16 15:10:15,979 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: Extract 
jar:[file:/data/data6/yarn/local/filecache/17/tez.tar.gz/lib/hadoop-yarn-common-2.7.2.jar!/webapps/|file://data/data6/yarn/local/filecache/17/tez.tar.gz/lib/hadoop-yarn-common-2.7.2-SDP.jar!/webapps/]
 to 
/data/data1/yarn/local/usercache/zyh/appcache/application_1592291210011_0010/container_e13_1592291210011_0010_01_01/tmp/Jetty_0_0_0_0_28343_webappsmdg1c9/webapp
 2020-06-16 15:10:16,123 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: Started 
HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:28343
 2020-06-16 15:10:16,123 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |webapp.WebApps|: Web 
app started at 28343
 2020-06-16 15:10:16,123 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |web.WebUIService|: 
Instantiated WebUIService at 
[http://10-88-0-19:28343/ui/|http://sdp-10-88-0-19:28343/ui/]
 2020-06-16 15:10:16,125 [INFO] 
[ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] 
|rm.TaskSchedulerManager|: Creating TaskScheduler: YarnTaskSchedulerService
 2020-06-16 15:10:16,148 [INFO] 
[ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] 
|Configuration.deprecation|: io.bytes.per.checksum is deprecated. Instead, use 
dfs.bytes-per-checksum
 2020-06-16 15:10:16,149 [INFO] 
[ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] 
|rm.TaskSchedulerManager|: Creating TaskScheduler: Local TaskScheduler with 
clusterIdentifier=0
 2020-06-16 15:10:16,159 [INFO] 
[ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] 
|rm.YarnTaskSchedulerService|: YarnTaskScheduler initialized with 
configuration: maxRMHeartbeatInterval: 250, containerReuseEnabled: true, 
reuseRackLocal: true, reuseNonLocal: false, localitySchedulingDelay: 250, 
preemptionPercentage: 10, preemptionMaxWaitTime: 6, 
numHeartbeatsBetweenPreemptions: 3, idleContainerMinTimeout: 1, 
idleContainerMaxTimeout: 2, sessionMinHeldContainers: 0
 2020-06-16 15:10:16,235 [INFO] [main] |history.HistoryEventHandler|: 
[HISTORY][DAG:N/A][Event:AM_STARTED]: 
appAttemptId=appattempt_1592291210011_0010_01, startTime=1592291416235
 2020-06-16 15:10:16,235 [INFO] [main] |app.DAGAppMaster|: In Session mode. 
Waiting for DAG over RPC
 2020-06-16 15:10:16,261 [INFO] [AMRM Callback Handler Thread] 
|rm.YarnTaskSchedulerService|: App total resource memory: -2048 cpu: 0 
taskAllocations: 0
 2020-06-16 15:10:16,262 [INFO] [AMRM Callback Handler Thread] 
|rm.YarnTaskSchedulerService|: Allocated:  Free:  pendingRequests: 0 
delayedContainers: 0 heartbeats: 1 lastPreemptionHeartbeat: 0
 2020-06-16 15:10:16,264 [INFO] [Dispatcher thread {Central}] 
|node.PerSourceNodeTracker|: Num cluster nodes = 11

This leads to errors in Tez segmentation.
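As a hedged illustration of where that negative number surfaces (not the Tez or YARN source; clampNonNegative is a hypothetical helper), the "App total resource memory: -2048" above is the headroom the AM reads back from the RM, and nothing guarantees it is non-negative, so a consumer has to clamp it itself before planning work:

{code:java}
// Hedged sketch: shows the API an AM uses to read the headroom that appears as
// -2048 in the log above, plus a defensive clamp a caller could apply.
// clampNonNegative is a hypothetical helper, not part of Tez or YARN.
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

public class HeadroomSketch {

  static Resource clampNonNegative(Resource headroom) {
    return Resource.newInstance(
        Math.max(0, headroom.getMemory()),
        Math.max(0, headroom.getVirtualCores()));
  }

  // amRmClient is assumed to be an already registered, started client.
  static Resource usableHeadroom(AMRMClientAsync<?> amRmClient) {
    Resource reported = amRmClient.getAvailableResources(); // e.g. <memory:-2048, vCores:0>
    return clampNonNegative(reported);
  }
}
{code}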

 

 

 

 

  was:
RM returns a negative value when TEZ AM requests resources. The records are as 
follows:

2020-06-16 15:10:15,726 [INFO] [IPC Server listener on 23482] |ipc.Server|: IPC 
Server listener on 23482: starting
2020-06-16 15:10:15,726 [INFO] [ServiceThread:DAGClientRPCServer] 
|client.DAGClientServer|: Instantiated DAGClientRPCServer at 
sdp-10-88-0-19/10.88.0.19:23482
2020-06-16 15:10:15,726 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: 
Added filter AM_PROXY_FILTER 
(class=org.apache.hadoop.yarn.s

[jira] [Created] (YARN-10317) RM returns a negative value when TEZ AM requests resources

2020-06-16 Thread yinghua_zh (Jira)
yinghua_zh created YARN-10317:
-

 Summary: RM returns a negative value when TEZ AM requests resources
 Key: YARN-10317
 URL: https://issues.apache.org/jira/browse/YARN-10317
 Project: Hadoop YARN
  Issue Type: Bug
  Components: fairscheduler
Affects Versions: 2.7.2
Reporter: yinghua_zh


RM returns a negative value when TEZ AM requests resources. The records are as 
follows:

2020-06-16 15:10:15,726 [INFO] [IPC Server listener on 23482] |ipc.Server|: IPC 
Server listener on 23482: starting
2020-06-16 15:10:15,726 [INFO] [ServiceThread:DAGClientRPCServer] 
|client.DAGClientServer|: Instantiated DAGClientRPCServer at 
sdp-10-88-0-19/10.88.0.19:23482
2020-06-16 15:10:15,726 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: 
Added filter AM_PROXY_FILTER 
(class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context 
2020-06-16 15:10:15,730 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: 
Added filter AM_PROXY_FILTER 
(class=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter) to context 
static
2020-06-16 15:10:15,734 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: 
adding path spec: /*
2020-06-16 15:10:15,954 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |webapp.WebApps|: 
Registered webapp guice modules
2020-06-16 15:10:15,955 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |http.HttpServer2|: 
Jetty bound to port 28343
2020-06-16 15:10:15,956 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: 
jetty-6.1.26
2020-06-16 15:10:15,979 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: Extract 
jar:file:/data/data6/yarn/local/filecache/17/tez.tar.gz/lib/hadoop-yarn-common-2.7.2-SDP.jar!/webapps/
 to 
/data/data1/yarn/local/usercache/zyh/appcache/application_1592291210011_0010/container_e13_1592291210011_0010_01_01/tmp/Jetty_0_0_0_0_28343_webappsmdg1c9/webapp
2020-06-16 15:10:16,123 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |mortbay.log|: Started 
HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:28343
2020-06-16 15:10:16,123 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |webapp.WebApps|: Web 
app started at 28343
2020-06-16 15:10:16,123 [INFO] 
[ServiceThread:org.apache.tez.dag.app.web.WebUIService] |web.WebUIService|: 
Instantiated WebUIService at http://sdp-10-88-0-19:28343/ui/
2020-06-16 15:10:16,125 [INFO] 
[ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] 
|rm.TaskSchedulerManager|: Creating TaskScheduler: YarnTaskSchedulerService
2020-06-16 15:10:16,148 [INFO] 
[ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] 
|Configuration.deprecation|: io.bytes.per.checksum is deprecated. Instead, use 
dfs.bytes-per-checksum
2020-06-16 15:10:16,149 [INFO] 
[ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] 
|rm.TaskSchedulerManager|: Creating TaskScheduler: Local TaskScheduler with 
clusterIdentifier=0
2020-06-16 15:10:16,159 [INFO] 
[ServiceThread:org.apache.tez.dag.app.rm.TaskSchedulerManager] 
|rm.YarnTaskSchedulerService|: YarnTaskScheduler initialized with 
configuration: maxRMHeartbeatInterval: 250, containerReuseEnabled: true, 
reuseRackLocal: true, reuseNonLocal: false, localitySchedulingDelay: 250, 
preemptionPercentage: 10, preemptionMaxWaitTime: 6, 
numHeartbeatsBetweenPreemptions: 3, idleContainerMinTimeout: 1, 
idleContainerMaxTimeout: 2, sessionMinHeldContainers: 0
2020-06-16 15:10:16,235 [INFO] [main] |history.HistoryEventHandler|: 
[HISTORY][DAG:N/A][Event:AM_STARTED]: 
appAttemptId=appattempt_1592291210011_0010_01, startTime=1592291416235
2020-06-16 15:10:16,235 [INFO] [main] |app.DAGAppMaster|: In Session mode. 
Waiting for DAG over RPC
2020-06-16 15:10:16,261 [INFO] [AMRM Callback Handler Thread] 
|rm.YarnTaskSchedulerService|: App total resource memory: -2048 cpu: 0 
taskAllocations: 0
2020-06-16 15:10:16,262 [INFO] [AMRM Callback Handler Thread] 
|rm.YarnTaskSchedulerService|: Allocated:  Free:  pendingRequests: 0 
delayedContainers: 0 heartbeats: 1 lastPreemptionHeartbeat: 0
2020-06-16 15:10:16,264 [INFO] [Dispatcher thread {Central}] 
|node.PerSourceNodeTracker|: Num cluster nodes = 11

This leads to errors in Tez segmentation.

 

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-5040) CPU Isolation with CGroups triggers kernel panics on Centos 7.1/7.2 when yarn.nodemanager.resource.percentage-physical-cpu-limit < 100

2019-09-19 Thread yinghua_zh (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933167#comment-16933167
 ] 

yinghua_zh edited comment on YARN-5040 at 9/19/19 8:54 AM:
---

I also encountered the same problem. When CGroups is enabled and a large task is 
running, the operating system kernel crashes after a period of time, but not every 
time. My version information is as follows:

Hadoop:2.7.2

OS: CentOS Linux release 7.3.1611

OS kernel: 3.10.0-514.el7.x86_64

How did you solve the problem? [~vvasudev] [~sidharta-s] [~Tao Jie] [~ecwpp] 
[~cheersyang]


was (Author: yinghua_zh):
I also encountered the same problem. When CGroups is enabled and a large task is 
running, the operating system kernel crashes after a period of time. My version 
information is as follows:

Hadoop:2.7.2

OS: CentOS Linux release 7.3.1611

OS kernel: 3.10.0-514.el7.x86_64

How did you solve the problem? [~vvasudev] [~sidharta-s] [~Tao Jie] [~ecwpp]

> CPU Isolation with CGroups triggers kernel panics on Centos 7.1/7.2 when 
> yarn.nodemanager.resource.percentage-physical-cpu-limit < 100
> --
>
> Key: YARN-5040
> URL: https://issues.apache.org/jira/browse/YARN-5040
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Sidharta Seethana
>Assignee: Varun Vasudev
>Priority: Major
>
> /cc [~vvasudev]
> We have been running some benchmarks internally with resource isolation 
> enabled. We have consistently run into kernel panics when running a large job 
> (a large pi job, terasort). These kernel panics went away when we set 
> yarn.nodemanager.resource.percentage-physical-cpu-limit=100. Anything less 
> than 100 triggers different behavior in YARN's CPU resource handler which 
> seems to cause these issues. Looking at the kernel crash dumps, the 
> backtraces were different - sometimes pointing to java processes, sometimes 
> not. 
> Kernel versions used : 3.10.0-229.14.1.el7.x86_64 and 
> 3.10.0-327.13.1.el7.x86_64 . 
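For reference, a hedged sketch of the workaround described above (setting the limit back to 100 only sidesteps the affected code path, it is not a fix; the property is normally set in yarn-site.xml rather than programmatically):

{code:java}
// Hedged sketch of the reported workaround: pin the NM CPU limit to 100 so the
// CGroups CPU resource handler takes the path that did not trigger the panics.
// Setting it in code is purely for illustration; production clusters set this
// in yarn-site.xml on each NodeManager.
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class CpuLimitWorkaroundSketch {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    conf.setInt("yarn.nodemanager.resource.percentage-physical-cpu-limit", 100);
    System.out.println("percentage-physical-cpu-limit = "
        + conf.getInt("yarn.nodemanager.resource.percentage-physical-cpu-limit", 100));
  }
}
{code}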



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-5040) CPU Isolation with CGroups triggers kernel panics on Centos 7.1/7.2 when yarn.nodemanager.resource.percentage-physical-cpu-limit < 100

2019-09-19 Thread yinghua_zh (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933167#comment-16933167
 ] 

yinghua_zh commented on YARN-5040:
--

I also encountered the same problem. When CGroups is enabled and a large task is 
running, the operating system kernel crashes after a period of time. My version 
information is as follows:

Hadoop:2.7.2

OS: CentOS Linux release 7.3.1611

OS kernel: 3.10.0-514.el7.x86_64

How did you solve the problem? [~vvasudev] [~sidharta-s] [~Tao Jie] [~ecwpp]

> CPU Isolation with CGroups triggers kernel panics on Centos 7.1/7.2 when 
> yarn.nodemanager.resource.percentage-physical-cpu-limit < 100
> --
>
> Key: YARN-5040
> URL: https://issues.apache.org/jira/browse/YARN-5040
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.9.0
>Reporter: Sidharta Seethana
>Assignee: Varun Vasudev
>Priority: Major
>
> /cc [~vvasudev]
> We have been running some benchmarks internally with resource isolation 
> enabled. We have consistently run into kernel panics when running a large job 
> (a large pi job, terasort). These kernel panics went away when we set 
> yarn.nodemanager.resource.percentage-physical-cpu-limit=100. Anything less 
> than 100 triggers different behavior in YARN's CPU resource handler which 
> seems to cause these issues. Looking at the kernel crash dumps, the 
> backtraces were different - sometimes pointing to java processes, sometimes 
> not. 
> Kernel versions used : 3.10.0-229.14.1.el7.x86_64 and 
> 3.10.0-327.13.1.el7.x86_64 . 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org