Every NodeManager registers and heartbeats to the active ResourceManager instance, which acts as the source of truth for cluster node status. If the active ResourceManager terminates, then another becomes active, and every NodeManager will start a new connection to register and heartbeat with that new active ResourceManager.
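A standby ResourceManager answers node-status requests with a 307 redirect to the active instance. A script that wants to know which ResourceManager actually answered can turn off automatic redirect handling and treat a 307 as "this one is a standby". Below is a minimal Python sketch of that idea, using only the standard library. It is not part of any Hadoop client library. The function name `get_node_state`, the `rm_urls` list of ResourceManager base URLs, and the `opener` hook are hypothetical. The sketch assumes the node REST resource returns JSON shaped like `{"node": {"state": ...}}`.

```python
import json
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface 3xx responses as HTTPError instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None makes urllib raise HTTPError for the 3xx


def get_node_state(rm_urls, node_id, opener=None, timeout=10):
    """Return (rm_base_url, state) from the first RM that answers directly.

    Hypothetical helper: a standby RM replies 307 with a Location header
    pointing at the active RM, so it is skipped rather than followed. That
    way the caller learns which RM actually served the node status.
    """
    opener = opener or urllib.request.build_opener(_NoRedirect)
    for base in rm_urls:
        url = f"{base}/ws/v1/cluster/nodes/{node_id}"
        req = urllib.request.Request(url, headers={"Accept": "application/json"})
        try:
            with opener.open(req, timeout=timeout) as resp:
                body = json.load(resp)
                return base, body["node"]["state"]
        except urllib.error.HTTPError as e:
            if e.code == 307:
                continue  # standby RM: move on to the next candidate
            raise
    raise RuntimeError("no ResourceManager answered without redirecting")
```

Following the redirect, as a browser or `curl -L` does, is equally valid, because the data comes from the active ResourceManager either way. Skipping the redirect only makes it visible which instance answered.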
As such, a standby ResourceManager cannot satisfy requests for node status and will instead redirect to the current active one:

curl -i 'http://cnauroth-ha-m-2:8088/ws/v1/cluster/nodes/cnauroth-ha-w-0.us-central1-c.c.hadoop-cloud-dev.google.com.internal:8026'
HTTP/1.1 307 Temporary Redirect
Date: Tue, 27 Dec 2022 19:28:38 GMT
Cache-Control: no-cache
Expires: Tue, 27 Dec 2022 19:28:38 GMT
Date: Tue, 27 Dec 2022 19:28:38 GMT
Pragma: no-cache
Content-Type: text/plain;charset=utf-8
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Location: http://cnauroth-ha-m-1.us-central1-c.c.hadoop-cloud-dev.google.com.internal.:8088/ws/v1/cluster/nodes/cnauroth-ha-w-0.us-central1-c.c.hadoop-cloud-dev.google.com.internal:8026
Content-Length: 136

If it looked like you were able to query a standby, then perhaps you were using a browser or some other client that automatically follows redirects (e.g. curl -L)? The data really would have come from the active, though, so you can trust that it's not stale. The only thing you might have to consider is that after a failover, it might take a while before every NodeManager registers with the new ResourceManager.

Separately, if you're concerned about divergence of node include/exclude files, you can configure them to be stored on a shared file system (e.g. your preferred cloud object store) so that all ResourceManager instances use the same files.

Chris Nauroth

On Sat, Dec 24, 2022 at 6:27 PM Dong Ye <yedong...@gmail.com> wrote:

> Hi, All:
>
> I have some questions about the state of the node manager. If I use
> the rest API
>
> - http://rm-http-address:port/ws/v1/cluster/nodes/{nodeid}
>
> to get node manager state from a standby RM,
> 1) is it possible that it could be stale?
> 2) If it is possible, how long will the node manager state be updated?
> 3) Is it possible that the NM state returned from standby RM be very
> different from that returned from the active RM? Say one is returning
> RUNNING while the other returns DECOMMISSIONED because the local
> exclude.xml is very different/diverges?
>
> Thanks.
> Have a good holiday.
>