I have two xCAT boot servers (we'll call them boot1 and boot2), both
running xCAT 2.16.5 and both originally upgraded from 2.10. boot2 was
upgraded directly, while on boot1 xCAT was upgraded, then removed
completely, and then 2.16.5 was installed fresh.
I am having an issue with nodeset on boot1. For some reason, running
nodeset on a single node takes nearly a minute on boot1, while running
it on the same node on boot2 takes less than a second. I can't seem to
find the cause. I tried running nodeset under strace and see repeated
timeouts on a resource that is temporarily unavailable (EAGAIN), but I
can't figure out what that resource is.
[root@boot1 ~]# time nodeset -V proc5201_p osimage=rhel7-ia2
proc5201_p: netboot rhels7.4-x86_64-compute
real 0m50.636s
user 0m0.064s
sys 0m0.038s
[root@boot2 ~]# time nodeset -V proc5201_p osimage=rhel7-ia2
proc5201_p: netboot rhels7.4-x86_64-compute
real 0m0.799s
user 0m0.060s
sys 0m0.033s
strace output on boot1:
select(8, [3], NULL, NULL, {0, 500000}) = 0 (Timeout)
read(3, 0x20de033, 5) = -1 EAGAIN (Resource temporarily unavailable)
select(8, [3], NULL, NULL, {0, 500000}) = 0 (Timeout)
read(3, 0x20de033, 5) = -1 EAGAIN (Resource temporarily unavailable)
select(8, [3], NULL, NULL, {0, 500000}) = 0 (Timeout)
read(3, 0x20de033, 5) = -1 EAGAIN (Resource temporarily unavailable)
select(8, [3], NULL, NULL, {0, 500000}) = 0 (Timeout)
read(3, 0x20de033, 5) = -1 EAGAIN (Resource temporarily unavailable)
select(8, [3], NULL, NULL, {0, 500000}) = 0 (Timeout)
read(3, 0x20de033, 5) = -1 EAGAIN (Resource temporarily unavailable)
select(8, [3], NULL, NULL, {0, 500000}) = 0 (Timeout)
read(3, 0x20de033, 5) = -1 EAGAIN (Resource temporarily unavailable)
select(8, [3], NULL, NULL, {0, 500000}) = 0 (Timeout)
read(3, 0x20de033, 5) = -1 EAGAIN (Resource temporarily unavailable)
select(8, [3], NULL, NULL, {0, 500000}) = 0 (Timeout)
read(3, 0x20de033, 5) = -1 EAGAIN (Resource temporarily unavailable)
select(8, [3], NULL, NULL, {0, 500000}) = 0 (Timeout)
read(3, 0x20de033, 5) = -1 EAGAIN (Resource temporarily unavailable)
select(8, [3], NULL, NULL, {0, 500000}) = 0 (Timeout)
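The trace only shows that the process is polling file descriptor 3 every
500 ms and finding nothing to read. One way to learn what fd 3 actually
is, while nodeset is stalled, is to resolve it through /proc. The sketch
below is illustrative only: the script name fd_peer.py is hypothetical,
it assumes Linux /proc, and it only decodes IPv4 TCP sockets (anything
else is reported as the raw /proc link). The repeated 5-byte reads are
consistent with a client waiting for a TLS record header, so fd 3 may be
the client's SSL connection to xcatd; that is an assumption, not
something the trace proves.

```python
import os
import socket
import struct
import sys


def _decode_ipv4(field):
    """Decode an 'AABBCCDD:PPPP' address field from /proc/<pid>/net/tcp."""
    ip_hex, port_hex = field.split(":")
    # The kernel prints the network-order address as a native-endian u32,
    # so packing it back in native order restores the original byte order.
    ip = socket.inet_ntoa(struct.pack("=I", int(ip_hex, 16)))
    # The port is already printed in host byte order.
    return ip, int(port_hex, 16)


def describe_fd(pid, fd):
    """Return a readable description of what fd <fd> of process <pid> is."""
    target = os.readlink("/proc/%d/fd/%d" % (pid, fd))
    if not target.startswith("socket:["):
        return target  # regular file, pipe, device, ...
    inode = target[len("socket:["):-1]
    # Only IPv4 TCP sockets are decoded; other socket types fall through.
    with open("/proc/%d/net/tcp" % pid) as table:
        next(table)  # skip the header line
        for line in table:
            fields = line.split()
            # Column 9 of /proc/net/tcp holds the socket inode number.
            if len(fields) > 9 and fields[9] == inode:
                lip, lport = _decode_ipv4(fields[1])
                rip, rport = _decode_ipv4(fields[2])
                return "TCP {}:{} -> {}:{}".format(lip, lport, rip, rport)
    return target  # e.g. a UNIX-domain or IPv6 socket


if __name__ == "__main__" and len(sys.argv) == 3:
    print(describe_fd(int(sys.argv[1]), int(sys.argv[2])))
```

Usage, with <pid> being the stalled nodeset process taken from ps:
`python fd_peer.py <pid> 3`. The code avoids f-strings so it should also
run under the Python 2.7 shipped with RHEL 7. A quicker first look is
`ls -l /proc/<pid>/fd/3`, which shows `socket:[inode]` for a socket, or
`lsof -p <pid>`, which resolves the endpoints directly.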
Has anyone seen this before and can help explain why this is
happening?
_______________________
- Keith Hannum
- keith.han...@lmco.com