[jira] [Updated] (YARN-2299) inconsistency at identifying node

2014-11-27 Bruno Alexandre Rosa (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Alexandre Rosa updated YARN-2299:
---
Affects Version/s: 2.5.0
   2.5.2

 inconsistency at identifying node
 -

 Key: YARN-2299
 URL: https://issues.apache.org/jira/browse/YARN-2299
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.5.0, 2.5.2
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Critical

 If the port in yarn.nodemanager.address is not specified on the NM, the NM 
 chooses a random port. If the NM dies ungracefully (OOM kill, kill -9, or OS 
 restart) and is restarted within yarn.nm.liveness-monitor.expiry-interval-ms, 
 both host:port1 and host:port2 are present in Active Nodes on the WebUI for 
 a while; after host:port1 expires, we get host:port1 in Lost Nodes and 
 host:port2 in Active Nodes. If the NM dies ungracefully again, we get only 
 host:port1 in Lost Nodes; host:port2 is neither in Active Nodes nor in Lost 
 Nodes.
 Another case: two NMs run on the same host (MiniYARNCluster or other test 
 purposes). If both of them are lost, we get only one entry in Lost Nodes on 
 the WebUI.
 In both cases, the sum of Active Nodes and Lost Nodes is not the number of 
 nodes we expect.
 The root cause is an inconsistency in how we decide whether two nodes are 
 identical.
 When we manage active nodes (RMContextImpl.nodes), we use NodeId, which 
 includes the port, so two nodes with the same host but different ports are 
 treated as different nodes.
 But when we manage inactive nodes (RMContextImpl.inactiveNodes), we use only 
 the host, so two nodes with the same host but different ports are treated as 
 identical.
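 The keying mismatch can be shown with a small sketch; this is a simplified 
 illustration, not the actual RMContextImpl code, and the class name and map 
 value types below are made up:
{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import org.apache.hadoop.yarn.api.records.NodeId;

public class NodeTrackingSketch {
  // Active nodes are keyed by the full NodeId (host + port).
  private final ConcurrentMap<NodeId, String> activeNodes =
      new ConcurrentHashMap<NodeId, String>();
  // Inactive (lost) nodes are keyed by the hostname only.
  private final ConcurrentMap<String, String> inactiveNodes =
      new ConcurrentHashMap<String, String>();

  public void demo() {
    NodeId first = NodeId.newInstance("worker01", 45123);   // random port before the crash
    NodeId second = NodeId.newInstance("worker01", 50321);  // random port after the restart

    // Both instances are Active for a while: two distinct NodeId keys.
    activeNodes.put(first, "RUNNING");
    activeNodes.put(second, "RUNNING");

    // When each instance expires, it moves to the host-keyed map, so the
    // second expiration silently overwrites the entry left by the first.
    activeNodes.remove(first);
    inactiveNodes.put(first.getHost(), "LOST");
    activeNodes.remove(second);
    inactiveNodes.put(second.getHost(), "LOST");

    // Prints "0 active, 1 lost" even though the WebUI showed 2 active nodes.
    System.out.println(activeNodes.size() + " active, " + inactiveNodes.size() + " lost");
  }
}
{code}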
 To fix the inconsistency, we should distinguish the two cases below and be 
 consistent for both of them:
  - intentionally running multiple NMs per host
  - NM instances running one after another on the same host
 Two possible solutions:
 1) Introduce a boolean config like one-node-per-host (default true), and use 
 the host to differentiate nodes on the RM when it is true.
 2) Make it mandatory to have a valid port in the yarn.nodemanager.address 
 config. In this situation, NM instances running one after another on the 
 same host will have the same NodeId, while intentionally running multiple 
 NMs per host will have different NodeIds.
 Personally I prefer option 1 because it's easier for users.
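 A rough sketch of what option 1 could look like on the RM side; the property 
 name yarn.resourcemanager.one-node-per-host and the helper below are 
 hypothetical, not existing YARN configuration:
{code:java}
import org.apache.hadoop.yarn.api.records.NodeId;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeKeySketch {
  // Hypothetical property for option 1; not an existing YARN setting.
  static final String ONE_NODE_PER_HOST = "yarn.resourcemanager.one-node-per-host";

  // Key the RM would use to track a node: the bare host when one-node-per-host
  // is true (the proposed default), otherwise the full host:port NodeId.
  static String trackingKey(YarnConfiguration conf, NodeId nodeId) {
    if (conf.getBoolean(ONE_NODE_PER_HOST, true)) {
      return nodeId.getHost();
    }
    return nodeId.toString(); // host:port
  }
}
{code}
 With such a key, a restarted NM on the same host maps to the same entry in 
 both the active and inactive maps, while clusters that intentionally run 
 multiple NMs per host can set the flag to false and keep host:port keys.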



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2299) inconsistency at identifying node

2014-07-15 Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2299:
--

Description: 
If the port in yarn.nodemanager.address is not specified on the NM, the NM 
chooses a random port. If the NM dies ungracefully (OOM kill, kill -9, or OS 
restart) and is restarted within yarn.nm.liveness-monitor.expiry-interval-ms, 
both host:port1 and host:port2 are present in Active Nodes on the WebUI for a 
while; after host:port1 expires, we get host:port1 in Lost Nodes and 
host:port2 in Active Nodes. If the NM dies ungracefully again, we get only 
host:port1 in Lost Nodes; host:port2 is neither in Active Nodes nor in Lost 
Nodes.

Another case: two NMs run on the same host (MiniYARNCluster or other test 
purposes). If both of them are lost, we get only one entry in Lost Nodes on 
the WebUI.

In both cases, the sum of Active Nodes and Lost Nodes is not the number of 
nodes we expect.

The root cause is an inconsistency in how we decide whether two nodes are 
identical.
When we manage active nodes (RMContextImpl.nodes), we use NodeId, which 
includes the port, so two nodes with the same host but different ports are 
treated as different nodes.
But when we manage inactive nodes (RMContextImpl.inactiveNodes), we use only 
the host, so two nodes with the same host but different ports are treated as 
identical.
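
For reference, NodeId compares both host and port, which is exactly why the 
two keying schemes disagree (a small demo with made-up host and ports):
{code:java}
import org.apache.hadoop.yarn.api.records.NodeId;

public class NodeIdEqualityDemo {
  public static void main(String[] args) {
    NodeId beforeRestart = NodeId.newInstance("worker01", 45123);
    NodeId afterRestart = NodeId.newInstance("worker01", 50321);

    // NodeId equality includes the port, so these are two different keys
    // in the active-nodes map (RMContextImpl.nodes) ...
    System.out.println(beforeRestart.equals(afterRestart));                      // false
    // ... while a host-only key, as used for inactive nodes, collapses them.
    System.out.println(beforeRestart.getHost().equals(afterRestart.getHost()));  // true
  }
}
{code}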

To fix the inconsistency, we should distinguish the two cases below and 
support both of them:
 - intentionally running multiple NMs per host
 - NM instances running one after another on the same host

Two possible solutions:
1) Introduce a boolean config like one-node-per-host (default true), and use 
the host to differentiate nodes on the RM when it is true.

2) Make it mandatory to have a valid port in the yarn.nodemanager.address 
config. In this situation, NM instances running one after another on the same 
host will have the same NodeId, while intentionally running multiple NMs per 
host will have different NodeIds.

Personally I prefer option 1 because it's easier for users.
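
For comparison, a hedged sketch of what option 2's mandatory-port check might 
look like on the NM side; this is not actual YARN code, and the validation and 
message below are made up:
{code:java}
import java.net.InetSocketAddress;
import org.apache.hadoop.net.NetUtils;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeManagerAddressCheck {
  public static void main(String[] args) {
    YarnConfiguration conf = new YarnConfiguration();
    // Port 0 (the usual default) means "bind a random ephemeral port",
    // which is what makes NodeId unstable across ungraceful restarts.
    String address = conf.get("yarn.nodemanager.address", "0.0.0.0:0");
    InetSocketAddress sockAddr = NetUtils.createSocketAddr(address);
    if (sockAddr.getPort() == 0) {
      // Option 2 would reject this and force the operator to pick a fixed port.
      throw new IllegalArgumentException(
          "yarn.nodemanager.address must specify a fixed, non-zero port");
    }
  }
}
{code}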


  was:
If the port in yarn.nodemanager.address is not specified on the NM, the NM 
chooses a random port. If the NM dies ungracefully (OOM kill, kill -9, or OS 
restart) and is restarted within yarn.nm.liveness-monitor.expiry-interval-ms, 
both host:port1 and host:port2 are present in Active Nodes on the WebUI for a 
while; after host:port1 expires, we get host:port1 in Lost Nodes and 
host:port2 in Active Nodes. If the NM dies ungracefully again, we get only 
host:port1 in Lost Nodes; host:port2 is neither in Active Nodes nor in Lost 
Nodes.

Another case: two NMs run on the same host (MiniYARNCluster or other test 
purposes). If both of them are lost, we get only one entry in Lost Nodes on 
the WebUI.

In both cases, the sum of Active Nodes and Lost Nodes is not the number of 
nodes we expect.

The root cause is an inconsistency in how we decide whether two nodes are 
identical.
When we manage active nodes (RMContextImpl.nodes), we use NodeId, which 
includes the port, so two nodes with the same host but different ports are 
treated as different nodes.
But when we manage inactive nodes (RMContextImpl.inactiveNodes), we use only 
the host, so two nodes with the same host but different ports are treated as 
identical.

We should distinguish two cases: 
 - intentionally running multiple NMs per host
 - NM instances running one after another on the same host

Two possible solutions:
1) Introduce a boolean config like one-node-per-host (default true), and use 
the host to differentiate nodes on the RM when it is true.

2) Make it mandatory to have a valid port in the yarn.nodemanager.address 
config. In this situation, NM instances running one after another on the same 
host will have the same NodeId, while intentionally running multiple NMs per 
host will have different NodeIds.

Personally I prefer option 1 because it's easier for users.



 inconsistency at identifying node
 -

 Key: YARN-2299
 URL: https://issues.apache.org/jira/browse/YARN-2299
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Critical

 If the port in yarn.nodemanager.address is not specified on the NM, the NM 
 chooses a random port. If the NM dies ungracefully (OOM kill, kill -9, or OS 
 restart) and is restarted within yarn.nm.liveness-monitor.expiry-interval-ms, 
 both host:port1 and host:port2 are present in Active Nodes on the WebUI for 
 a while; after host:port1 expires, we get host:port1 in Lost Nodes and 
 host:port2 in Active Nodes. If the NM dies ungracefully again, we get only 
 host:port1 in Lost Nodes; host:port2 is neither in Active Nodes nor in Lost 
 Nodes.
 Another case: two NMs run on the same host (MiniYARNCluster or other test 
 purposes). If both of them are lost, we get only one entry in Lost Nodes on 
 the WebUI.
 In both cases, the sum of Active Nodes and Lost Nodes is not the number of 
 nodes we expect.
 The root cause is an inconsistency in how we decide whether two nodes are 
 identical.

[jira] [Updated] (YARN-2299) inconsistency at identifying node

2014-07-15 Hong Zhiguo (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Zhiguo updated YARN-2299:
--

Description: 
If the port in yarn.nodemanager.address is not specified on the NM, the NM 
chooses a random port. If the NM dies ungracefully (OOM kill, kill -9, or OS 
restart) and is restarted within yarn.nm.liveness-monitor.expiry-interval-ms, 
both host:port1 and host:port2 are present in Active Nodes on the WebUI for a 
while; after host:port1 expires, we get host:port1 in Lost Nodes and 
host:port2 in Active Nodes. If the NM dies ungracefully again, we get only 
host:port1 in Lost Nodes; host:port2 is neither in Active Nodes nor in Lost 
Nodes.

Another case: two NMs run on the same host (MiniYARNCluster or other test 
purposes). If both of them are lost, we get only one entry in Lost Nodes on 
the WebUI.

In both cases, the sum of Active Nodes and Lost Nodes is not the number of 
nodes we expect.

The root cause is an inconsistency in how we decide whether two nodes are 
identical.
When we manage active nodes (RMContextImpl.nodes), we use NodeId, which 
includes the port, so two nodes with the same host but different ports are 
treated as different nodes.
But when we manage inactive nodes (RMContextImpl.inactiveNodes), we use only 
the host, so two nodes with the same host but different ports are treated as 
identical.

To fix the inconsistency, we should distinguish the two cases below and be 
consistent for both of them:
 - intentionally running multiple NMs per host
 - NM instances running one after another on the same host

Two possible solutions:
1) Introduce a boolean config like one-node-per-host (default true), and use 
the host to differentiate nodes on the RM when it is true.

2) Make it mandatory to have a valid port in the yarn.nodemanager.address 
config. In this situation, NM instances running one after another on the same 
host will have the same NodeId, while intentionally running multiple NMs per 
host will have different NodeIds.

Personally I prefer option 1 because it's easier for users.


  was:
If the port in yarn.nodemanager.address is not specified on the NM, the NM 
chooses a random port. If the NM dies ungracefully (OOM kill, kill -9, or OS 
restart) and is restarted within yarn.nm.liveness-monitor.expiry-interval-ms, 
both host:port1 and host:port2 are present in Active Nodes on the WebUI for a 
while; after host:port1 expires, we get host:port1 in Lost Nodes and 
host:port2 in Active Nodes. If the NM dies ungracefully again, we get only 
host:port1 in Lost Nodes; host:port2 is neither in Active Nodes nor in Lost 
Nodes.

Another case: two NMs run on the same host (MiniYARNCluster or other test 
purposes). If both of them are lost, we get only one entry in Lost Nodes on 
the WebUI.

In both cases, the sum of Active Nodes and Lost Nodes is not the number of 
nodes we expect.

The root cause is an inconsistency in how we decide whether two nodes are 
identical.
When we manage active nodes (RMContextImpl.nodes), we use NodeId, which 
includes the port, so two nodes with the same host but different ports are 
treated as different nodes.
But when we manage inactive nodes (RMContextImpl.inactiveNodes), we use only 
the host, so two nodes with the same host but different ports are treated as 
identical.

To fix the inconsistency, we should distinguish the two cases below and 
support both of them:
 - intentionally running multiple NMs per host
 - NM instances running one after another on the same host

Two possible solutions:
1) Introduce a boolean config like one-node-per-host (default true), and use 
the host to differentiate nodes on the RM when it is true.

2) Make it mandatory to have a valid port in the yarn.nodemanager.address 
config. In this situation, NM instances running one after another on the same 
host will have the same NodeId, while intentionally running multiple NMs per 
host will have different NodeIds.

Personally I prefer option 1 because it's easier for users.



 inconsistency at identifying node
 -

 Key: YARN-2299
 URL: https://issues.apache.org/jira/browse/YARN-2299
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Hong Zhiguo
Assignee: Hong Zhiguo
Priority: Critical

 If the port in yarn.nodemanager.address is not specified on the NM, the NM 
 chooses a random port. If the NM dies ungracefully (OOM kill, kill -9, or OS 
 restart) and is restarted within yarn.nm.liveness-monitor.expiry-interval-ms, 
 both host:port1 and host:port2 are present in Active Nodes on the WebUI for 
 a while; after host:port1 expires, we get host:port1 in Lost Nodes and 
 host:port2 in Active Nodes. If the NM dies ungracefully again, we get only 
 host:port1 in Lost Nodes; host:port2 is neither in Active Nodes nor in Lost 
 Nodes.
 Another case: two NMs run on the same host (MiniYARNCluster or other test 
 purposes). If both of them are lost, we get only one entry in Lost Nodes on 
 the WebUI.
 In both cases, the sum of Active Nodes and Lost Nodes is not the number of 
 nodes we expect.