The NIC on my xCAT servers is a Broadcom Corporation NetXtreme II BCM57810 10 
Gigabit Ethernet (rev 10). The NIC on the nodes is an HP Ethernet 1Gb 4-port 
366M Adapter.

Did you have any fixes for it besides replacing the NIC? I'll check for new 
firmware. 

Thanks,

John Lopilato 

From: Russell Auld [mailto:[email protected]] 
Sent: Wednesday, April 06, 2016 11:56 AM
To: xCAT Users Mailing list <[email protected]>
Subject: Re: [xcat-user] EXTERNAL: Re: Stateless node randomly fails to get 
initrd-stateless.gz

I've also seen this kind of weirdness with wonky Broadcom (QLogic) 10Gb NICs.

On Apr 6, 2016 11:41 AM, Damir Krstic <[email protected]> wrote:
I had a similar issue with one of our new Nextscale racks. It turned out my 
uplink Ethernet cable was bad: the first bad Ethernet cable in a new 
installation of more than 500 nodes.

Damir

On Wed, Apr 6, 2016 at 10:39 AM Lopilato, John <[email protected]> wrote:
I do see the request in /var/log/httpd/access_log:

172.16.17.39 - - [06/Apr/2016:10:36:43 -0400] "GET /tftpboot/xcat/xnba/nodes/proc5212.uefi HTTP/1.1" 200 107 "-" "iPXE/1.0.3-131028 (d603e)"
172.16.17.39 - - [06/Apr/2016:10:36:43 -0400] "GET /tftpboot/xcat/elilo-x64.efi HTTP/1.1" 200 242929 "-" "iPXE/1.0.3-131028 (d603e)"
172.16.17.39 - - [06/Apr/2016:10:36:43 -0400] "GET /tftpboot/xcat/xnba/nodes/proc5212.elilo HTTP/1.1" 200 402 "-" "iPXE/1.0.3-131028 (d603e)"
172.16.17.39 - - [06/Apr/2016:10:36:43 -0400] "GET /tftpboot/xcat/osimage/rhel7/kernel HTTP/1.1" 200 4902000 "-" "iPXE/1.0.3-131028 (d603e)"
172.16.17.39 - - [06/Apr/2016:10:36:43 -0400] "GET /tftpboot/xcat/osimage/rhel7/initrd-stateless.gz HTTP/1.1" 200 27339862 "-" "iPXE/1.0.3-131028 (d603e)"

There are no entries for hung nodes in error_log.
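
For anyone wanting to run the same check across many nodes at once, here is a 
rough Python sketch that groups access_log entries by client IP and lists 
clients that never logged a 200 for initrd-stateless.gz. It assumes the log 
lines follow the layout of the entries quoted above; the function names 
(requests_by_client, clients_missing) are made up for illustration and are not 
part of xCAT or Apache.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the entries quoted in this thread:
# client ident user [time] "METHOD path PROTO" status bytes ...
LOG_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def requests_by_client(lines):
    """Map each client IP to a list of (path, status, size) tuples."""
    seen = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue  # skip lines that do not match the assumed layout
        size = 0 if m.group("size") == "-" else int(m.group("size"))
        seen[m.group("client")].append(
            (m.group("path"), int(m.group("status")), size)
        )
    return dict(seen)


def clients_missing(lines, suffix="initrd-stateless.gz"):
    """Return clients that logged requests but no 200 for a path ending in suffix."""
    missing = []
    for client, reqs in requests_by_client(lines).items():
        if not any(path.endswith(suffix) and status == 200
                   for path, status, _ in reqs):
            missing.append(client)
    return sorted(missing)
```

To use it, open /var/log/httpd/access_log and pass the file object to 
clients_missing(). Any node it lists fetched something over HTTP but has no 
completed initrd request logged, which narrows down where to look next.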

I do not see any persistent connection open in netstat -ntp, though it's hard 
to check quickly: I have to induce the problem and it occurs randomly, so I 
effectively only get to look about 1-3 minutes after the node has hung.
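
One way around the timing problem might be to leave a poller running on the 
server while inducing the hang. The Python sketch below runs netstat -ntp 
repeatedly and prints any TCP connection from a given node to local port 80. 
It assumes the usual Linux net-tools column layout (Proto, Recv-Q, Send-Q, 
Local Address, Foreign Address, State) and that Apache listens on port 80; 
http_connections and watch are hypothetical helper names.

```python
import subprocess
import time


def http_connections(netstat_output, node_ip, port=80):
    """Return (local, foreign, state) for TCP rows from node_ip to a local port."""
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows: Proto Recv-Q Send-Q Local Foreign State [PID/Program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        local_port = local.rsplit(":", 1)[-1]
        foreign_host = foreign.rsplit(":", 1)[0]
        # tcp6 sockets may report IPv4 peers as ::ffff:a.b.c.d
        if local_port == str(port) and foreign_host in (node_ip, "::ffff:" + node_ip):
            rows.append((local, foreign, state))
    return rows


def watch(node_ip, interval=1.0, port=80):
    """Poll netstat until interrupted, printing a timestamp per matching row."""
    while True:
        out = subprocess.run(
            ["netstat", "-ntp"], capture_output=True, text=True
        ).stdout
        for local, foreign, state in http_connections(out, node_ip, port):
            print(time.strftime("%H:%M:%S"), local, foreign, state)
        time.sleep(interval)
```

Calling watch("172.16.17.39") before rebooting the node would leave a 
timestamped record of connection states to compare against the moment the 
boot hangs. Stop it with Ctrl-C.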

Thanks,

John Lopilato


-----Original Message-----
From: Christopher Samuel [mailto:[email protected]]
Sent: Tuesday, April 05, 2016 8:58 PM
To: [email protected]
Subject: EXTERNAL: Re: [xcat-user] Stateless node randomly fails to get 
initrd-stateless.gz

On 06/04/16 00:39, Lopilato, John wrote:

> You can see where it hangs in the attached image.  I'd appreciate any
> help on tracking down what's going on here.

When it hangs do you see the request appear in the Apache server logs?

If so is there still the corresponding connection visible from the
compute node to Apache in netstat -ntp ?

All the best,
Chris
--
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci


------------------------------------------------------------------------------
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user
