I've also seen this kind of weirdness with wonky Broadcom (QLogic) 10Gb NICs.

On Apr 6, 2016 11:41 AM, Damir Krstic <[email protected]> wrote:
I had a similar issue with one of our new NeXtScale racks. It turned out my uplink Ethernet cable was bad: the first bad Ethernet cable in a new installation of 500+ nodes.

Damir

On Wed, Apr 6, 2016 at 10:39 AM Lopilato, John <[email protected]> wrote:
I do see the request in /var/log/httpd/access_log:

172.16.17.39 - - [06/Apr/2016:10:36:43 -0400] "GET /tftpboot/xcat/xnba/nodes/proc5212.uefi HTTP/1.1" 200 107 "-" "iPXE/1.0.3-131028 (d603e)"
172.16.17.39 - - [06/Apr/2016:10:36:43 -0400] "GET /tftpboot/xcat/elilo-x64.efi HTTP/1.1" 200 242929 "-" "iPXE/1.0.3-131028 (d603e)"
172.16.17.39 - - [06/Apr/2016:10:36:43 -0400] "GET /tftpboot/xcat/xnba/nodes/proc5212.elilo HTTP/1.1" 200 402 "-" "iPXE/1.0.3-131028 (d603e)"
172.16.17.39 - - [06/Apr/2016:10:36:43 -0400] "GET /tftpboot/xcat/osimage/rhel7/kernel HTTP/1.1" 200 4902000 "-" "iPXE/1.0.3-131028 (d603e)"
172.16.17.39 - - [06/Apr/2016:10:36:43 -0400] "GET /tftpboot/xcat/osimage/rhel7/initrd-stateless.gz HTTP/1.1" 200 27339862 "-" "iPXE/1.0.3-131028 (d603e)"

There are no entries for hung nodes in error_log.
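
To run the same check across many nodes, here is a minimal sketch that scans access_log for clients with a logged kernel fetch but no logged initrd-stateless.gz fetch. It is not part of xCAT. The function names are hypothetical, and it assumes the log format shown above and the path suffixes /kernel and /initrd-stateless.gz. Keep in mind that Apache normally writes an access_log entry only once it has finished handling the request, so a hung transfer may not show up until it completes or times out.

```python
import re
from collections import defaultdict

# Leading fields of an Apache common/combined log line:
# client IP, identity, user, [timestamp], "METHOD path HTTP/x.y", status, size.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>.+?) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def fetched_paths(log_lines):
    """Map each client IP to the paths it was served with status 200."""
    by_ip = defaultdict(list)
    for line in log_lines:
        m = LINE_RE.match(line)
        if m and m.group("status") == "200":
            by_ip[m.group("ip")].append(m.group("path"))
    return by_ip

def missing_initrd(log_lines,
                   kernel_suffix="/kernel",
                   initrd_suffix="/initrd-stateless.gz"):
    """Return sorted client IPs with a kernel fetch but no initrd fetch logged.

    The suffixes are assumptions taken from the log excerpt, not xCAT constants.
    """
    result = []
    for ip, paths in fetched_paths(log_lines).items():
        got_kernel = any(p.endswith(kernel_suffix) for p in paths)
        got_initrd = any(p.endswith(initrd_suffix) for p in paths)
        if got_kernel and not got_initrd:
            result.append(ip)
    return sorted(result)
```

To use it, pass an open file handle, for example missing_initrd(open("/var/log/httpd/access_log")), and compare the returned addresses against the nodes that hung.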

I do not see any persistent connection open in netstat -ntp. It is hard to check quickly, though: I have to induce the problem and it occurs randomly, so in practice I only get to look about 1-3 minutes after the node has hung.
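
One way around the timing problem is to leave a small poller running on the server while the nodes boot, so the connection state is recorded at the moment of the hang. The following is only a sketch under stated assumptions: the function names are hypothetical, Apache is assumed to listen on port 80, and the parser assumes the usual Linux net-tools column layout for netstat -ntp (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State).

```python
import subprocess
import time

def connections_from(netstat_output, client_ip, server_port=80):
    """Return (recv_q, send_q, state) for TCP connections client_ip -> server_port."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a tcp/tcp6 row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if local.rsplit(":", 1)[-1] != str(server_port):
            continue
        addr = foreign.rsplit(":", 1)[0]
        # tcp6 sockets report IPv4 peers as ::ffff:a.b.c.d
        if addr.startswith("::ffff:"):
            addr = addr[len("::ffff:"):]
        if addr != client_ip:
            continue
        found.append((int(fields[1]), int(fields[2]), state))
    return found

def watch(client_ip, interval=1.0, server_port=80):
    """Poll netstat until interrupted and print timestamped matches."""
    while True:
        out = subprocess.run(["netstat", "-ntp"],
                             capture_output=True, text=True).stdout
        for recv_q, send_q, state in connections_from(out, client_ip, server_port):
            print(time.strftime("%H:%M:%S"), client_ip, state,
                  "Recv-Q", recv_q, "Send-Q", send_q)
        time.sleep(interval)
```

Calling watch("172.16.17.39") and redirecting the output to a file gives a timeline to read after the fact. A connection that stays ESTABLISHED with a non-zero Send-Q would mean the server still has data the node has not acknowledged.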

Thanks,

John Lopilato


-----Original Message-----
From: Christopher Samuel [mailto:[email protected]]
Sent: Tuesday, April 05, 2016 8:58 PM
To: [email protected]
Subject: EXTERNAL: Re: [xcat-user] Stateless node randomly fails to get initrd-stateless.gz

On 06/04/16 00:39, Lopilato, John wrote:

> You can see where it hangs in the attached image.  I'd appreciate any
> help on tracking down what's going on here.

When it hangs do you see the request appear in the Apache server logs?

If so, is the corresponding connection from the compute node to
Apache still visible in netstat -ntp?

All the best,
Chris
--
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci


------------------------------------------------------------------------------
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user
