Daniel, thanks for your response. I'm aware of the LACP behavior and did not
expect 4x performance, but I also did not expect two of the four links to
send no traffic at all (maybe this is just normal behavior). Receive seems to
work better.
The distribution is two links each sending 117M per second and two links
sending 0. We have enough hosts (>150) to get a better distribution (I hope
:) ).
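For what it's worth, here is a rough sketch of why more hosts alone may not spread the load. It assumes the bond is a Linux bonding interface in 802.3ad mode with the default xmit_hash_policy=layer2, and models that policy with the simplified formula from older bonding documentation: (last byte of source MAC XOR last byte of destination MAC) modulo the number of links. The server MAC and most host MACs below are made up; only ...41:37 and ...42:2d come from the console log (the two NICs that did get DHCP leases, both ending in an odd byte). Under this model, if every destination MAC ends in a byte of the same parity, the low bit of the XOR never changes, so with 4 links only 2 can ever be chosen, however many hosts boot.

```python
from collections import Counter


def mac_last_byte(mac: str) -> int:
    """Return the last octet of a colon-separated MAC address as an int."""
    return int(mac.split(":")[-1], 16)


def layer2_link(src_mac: str, dst_mac: str, n_links: int) -> int:
    """Simplified layer2 transmit hash: XOR of the last MAC octets, mod link count.

    Assumption: this mirrors the formula documented for older Linux bonding
    drivers; newer kernels mix more header bits into the hash.
    """
    return (mac_last_byte(src_mac) ^ mac_last_byte(dst_mac)) % n_links


def link_usage(src_mac: str, dst_macs: list[str], n_links: int) -> Counter:
    """Count how many destination hosts map to each member link."""
    return Counter(layer2_link(src_mac, dst, n_links) for dst in dst_macs)


# Hypothetical bond MAC on the xCAT server (not taken from the thread).
SERVER_MAC = "00:1a:64:00:00:10"

# Two boot-NIC MACs from the console log, padded to 150 hosts with
# hypothetical, unique MACs that also end in an odd byte.
odd_hosts = ["34:40:b5:bb:41:37", "34:40:b5:bb:42:2d"] + [
    f"34:40:b5:bb:{0x50 + n // 128:02x}:{2 * (n % 128) + 1:02x}" for n in range(148)
]
# For contrast: hypothetical hosts whose last byte runs through odd and even values.
mixed_hosts = [f"34:40:b5:bb:60:{n:02x}" for n in range(150)]

print("odd-only MACs:", sorted(link_usage(SERVER_MAC, odd_hosts, 4).items()))
print("mixed MACs:   ", sorted(link_usage(SERVER_MAC, mixed_hosts, 4).items()))
```

This is a model, not a measurement; checking the real MAC addresses against the bond's actual hash policy would confirm or rule it out.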
Here is what we see on the console of the servers that fail to boot:
CLIENT MAC ADDR: 34 40 B5 BB 41 37 GUID: 261970A2 18F8 E111 A95E
3440B5BB4137
CLIENT IP: 10.2.5.42 MASK: 255.255.0.0 DHCP IP: 10.2.200.8
GATEWAY IP: 10.2.254.100
PXE->EB: !PXE at 97D2:0070, entry point at 97D2:0106
UNDI code segment 97D2:5210, data segment 9197:63B0 (582-628kB)
UNDI device is PCI 06:00.0, type DIX+802.3
542kB free base memory after PXE unload
xNBA initialising devices...ok
xCAT Network Boot Agent
iPXE 1.0.3-7 -- Open Source Network Boot Firmware -- http://ipxe.org
Features: HTTP HTTPS iSCSI DNS TFTP bzImage ELF PXE PXEXT
net0: 34:40:b5:bb:41:37 using undionly on UNDI-PCI06:00.0 (open)
[Link:up, TX:0 TXE:0 RX:0 RXE:0]
DHCP (net0 34:40:b5:bb:41:37)... ok
net0: 10.2.5.42/255.255.0.0 gw 10.2.254.100
Next server: 10.2.200.5
Filename: http://10.2.200.5/tftpboot/xcat/xnba/nodes/s05r1b42
http://10.2.200.5/tftpboot/xcat/xnba/nodes/s05r1b42............... Connection timed out (http://ipxe.org/4c0a6035)
Could not chain image: Connection timed out (http://ipxe.org/4c0a6035)
No more network devices
PXE-M0F: Exiting Intel Boot Agent.
Intel(R) Boot Agent GE v1.3.72
Copyright (C) 1997-2010, Intel Corporation
CLIENT MAC ADDR: 34 40 B5 BB 41 38 GUID: 261970A2 18F8 E111 A95E
3440B5BB4137
PXE-E51: No DHCP or proxyDHCP offers were received.
PXE-M0F: Exiting Intel Boot Agent.
Intel(R) Boot Agent GE v1.3.72
Copyright (C) 1997-2010, Intel Corporation
CLIENT MAC ADDR: 34 40 B5 BB 42 2D GUID: DA913255 7CF7 E111 ABA8
3440B5BB422D
CLIENT IP: 10.2.5.60 MASK: 255.255.0.0 DHCP IP: 10.2.200.5
GATEWAY IP: 10.2.254.100
PXE-E11: ARP timeout
PXE-M0F: Exiting Intel Boot Agent.
Intel(R) Boot Agent GE v1.3.72
Copyright (C) 1997-2010, Intel Corporation
CLIENT MAC ADDR: 34 40 B5 BB 42 2E GUID: DA913255 7CF7 E111 ABA8
3440B5BB422D
PXE-E51: No DHCP or proxyDHCP offers were received.
PXE-M0F: Exiting Intel Boot Agent.
Is this simply a network bandwidth issue? Are there any parameters we can
change to improve it given the network? We followed the xCAT documentation
for large clusters, except for the HTTP caching.
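As a rough way to judge whether bandwidth alone could explain it, the sketch below computes a lower bound on the time to push one image to every node. The 117M per second per-link figure is from above and is assumed to mean MB/s; the 150 MB per-node payload is a placeholder, not a measured Statelite image size. Note that the failures in the console log occur during DHCP, ARP, or the initial HTTP connection, so this only bounds the bulk-download phase that those exchanges would be competing with.

```python
def min_transfer_seconds(nodes: int, payload_mb: float, active_links: int,
                         link_mb_per_s: float) -> float:
    """Lower bound on the time to push payload_mb to every node, assuming the
    active links stay saturated and are shared perfectly (no retransmits)."""
    return nodes * payload_mb / (active_links * link_mb_per_s)


LINK_MB_PER_S = 117.0  # per-link rate reported above, assumed to be MB/s
PAYLOAD_MB = 150.0     # hypothetical kernel + initrd size per node

for nodes in (84, 168):
    two = min_transfer_seconds(nodes, PAYLOAD_MB, 2, LINK_MB_PER_S)
    four = min_transfer_seconds(nodes, PAYLOAD_MB, 4, LINK_MB_PER_S)
    print(f"{nodes} nodes: >= {two:.0f} s on 2 links, >= {four:.0f} s on 4 links")
```

With these placeholder numbers, 168 nodes need at least about 108 s on two links versus about 54 s on four.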
Any ideas are more than welcome :)
Regards,
Gilad Berman
HPC Architect
IBM System & Technology Group. Israel
E-mail: [email protected]
Tel: 972-3-9188262
Mobile: 972-52-2554262
The information contained in this email is being provided by IBM as a
matter of courtesy and provided "AS-IS" without any direct and implied
warranty; IBM assumes no liability. It is your responsibility to ensure
that any resulting customer proposal has been correctly designed to meet
your clients' requirements and to have an active review process which
ensures an appropriate level of solution assurance is performed for all
proposals. IBM does not take responsibility for the solution or solution
assurance.
From: "Daniel M. Weeks" <[email protected]>
To: xCAT Users Mailing list <[email protected]>,
Cc: Gilad Berman/Israel/IBM@IBMIL
Date: 21/12/2012 18:46
Subject: Re: [xcat-user] Improving boot performance
On 12/21/2012 10:45 AM, Gilad Berman wrote:
> Hello All,
>
> I would like your advice on how we can increase boot performance,
> i.e. when we boot 84 nodes at once (Statelite), some of them (~2)
> usually do not boot (it looks like their DHCP requests are not being
> answered, but I'm not completely sure) and we have to reboot them.
> When booting 168 from the same node, the number goes up to ~5-7 nodes.
> While this might be acceptable, we would still like to improve it.
>
> I know the first thing that comes to mind is the network. We have 4-port
> LACP bonding; however, only two ports are actually active on send for
> some reason: we see two links fully loaded and two links that do not send
> anything. On receive, the bond behaves as expected.
>
> I know the LACP issue has nothing to do with xCAT, but I'm sure people on
> the list have optimized boot performance before.
LACP uses hashing to determine which link to send traffic over. I would
guess that the default algorithm is not appropriate for your traffic
pattern and it's not balancing well across the links. It's typical to
get only a 70/30 split even on a "well-tuned" aggregated link, and a true
50/50 split is very rare.
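A minimal sketch of the alternative, under the same assumption of Linux bonding in 802.3ad mode: with xmit_hash_policy=layer3+4 the link is chosen from IP addresses and TCP/UDP ports, so different HTTP downloads from the same server can land on different links. The formula is the simplified one from older bonding documentation, and the client addresses and ephemeral ports are hypothetical; only the server address 10.2.200.5 and HTTP port 80 come from the console log.

```python
import ipaddress
import random
from collections import Counter


def layer34_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                 n_links: int) -> int:
    """Simplified layer3+4 transmit hash, modelled on older Linux bonding docs
    (an assumption; newer kernels hash differently):
    ((src_port XOR dst_port) XOR ((src_ip XOR dst_ip) AND 0xffff)) mod n_links.
    """
    ip_xor = int(ipaddress.IPv4Address(src_ip)) ^ int(ipaddress.IPv4Address(dst_ip))
    return ((src_port ^ dst_port) ^ (ip_xor & 0xFFFF)) % n_links


SERVER_IP, HTTP_PORT = "10.2.200.5", 80  # next-server and HTTP port from the log
rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical: 150 clients at 10.2.5.1-150, each with one HTTP connection
# from a randomly chosen ephemeral port. The server is the sender here.
flows = [(f"10.2.5.{n}", rng.randint(32768, 60999)) for n in range(1, 151)]
usage = Counter(
    layer34_link(SERVER_IP, client_ip, HTTP_PORT, client_port, 4)
    for client_ip, client_port in flows
)
print(sorted(usage.items()))
```

Each TCP connection still hashes to a single link, so one download never exceeds one link's speed; the gain comes only from spreading many connections. The switch's own hash setting governs the receive direction independently.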
>
> Any advice is appreciated, and happy Christmas :)
> _______________________________________________
> xCAT-user mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/xcat-user
>
--
Daniel M. Weeks
Systems Administrator
Computational Center for Nanotechnology Innovations
Rensselaer Polytechnic Institute
Troy, NY 12180
518-276-4458