Daniel, thanks for your response. I'm aware of the LACP behavior and did not 
expect 4x performance, but I also did not expect two of the four links to 
send no traffic at all (maybe that is just normal behavior). Receive seems to 
work better. 
The distribution is two links each sending 117M per second and two links 
sending 0. We have enough hosts (>150) to get a better distribution (I hope 
:) ).
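
As a rough sanity check on the 117M figure, here is a minimal sketch of the payload rate one saturated link can carry. It assumes things the thread does not state: that "117M" means 117 MB/s, that the bond members are 1 Gb/s links, that the MTU is 1500, and that the traffic is IPv4 TCP with the timestamps option. The function name is made up for illustration.

```python
# Rough estimate of TCP payload throughput on one saturated Ethernet link.
# Assumptions (not stated in the thread): 1 Gb/s links, 1500-byte MTU,
# IPv4 + TCP with the 12-byte timestamps option.

LINE_RATE_BPS = 1_000_000_000    # assumed 1 GbE bond member
MTU = 1500                       # assumed standard MTU
ETH_OVERHEAD = 8 + 14 + 4 + 12   # preamble+SFD, header, FCS, inter-frame gap
IP_TCP_HEADERS = 20 + 20 + 12    # IPv4 header, TCP header, TCP timestamps


def tcp_payload_mb_per_s(line_rate_bps=LINE_RATE_BPS, mtu=MTU):
    """Return the TCP payload rate in MB/s for a fully loaded link."""
    wire_bytes = mtu + ETH_OVERHEAD        # bytes on the wire per frame
    payload_bytes = mtu - IP_TCP_HEADERS   # useful bytes per frame
    return line_rate_bps / 8 * payload_bytes / wire_bytes / 1e6


per_link = tcp_payload_mb_per_s()
print(f"per link:       {per_link:.1f} MB/s")
print(f"two busy links: {2 * per_link:.1f} MB/s")
print(f"four links:     {4 * per_link:.1f} MB/s")
```

Under those assumptions the per-link figure comes out a little under 118 MB/s, which would mean the two active links are already at line rate and the bond is sending at roughly half of what four links could carry.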

Here is what we see on the console of the servers that fail to boot:

CLIENT MAC ADDR: 34 40 B5 BB 41 37  GUID: 261970A2 18F8 E111 A95E 3440B5BB4137
CLIENT IP: 10.2.5.42  MASK: 255.255.0.0  DHCP IP: 10.2.200.8
GATEWAY IP: 10.2.254.100
PXE->EB: !PXE at 97D2:0070, entry point at 97D2:0106
          UNDI code segment 97D2:5210, data segment 9197:63B0 (582-628kB)
          UNDI device is PCI 06:00.0, type DIX+802.3
          542kB free base memory after PXE unload
xNBA initialising devices...ok

xCAT Network Boot Agent
iPXE 1.0.3-7 -- Open Source Network Boot Firmware -- http://ipxe.org
Features: HTTP HTTPS iSCSI DNS TFTP bzImage ELF PXE PXEXT
net0: 34:40:b5:bb:41:37 using undionly on UNDI-PCI06:00.0 (open)
   [Link:up, TX:0 TXE:0 RX:0 RXE:0]
DHCP (net0 34:40:b5:bb:41:37)... ok
net0: 10.2.5.42/255.255.0.0 gw 10.2.254.100
Next server: 10.2.200.5
Filename: http://10.2.200.5/tftpboot/xcat/xnba/nodes/s05r1b42
medpout1(http://ipxe.org/4c0a6035)ba/nodes/s05r1b42............... 
Connection ti

Could not chain image: Connection timed out (http://ipxe.org/4c0a6035)
No more network devices
PXE-M0F: Exiting Intel Boot Agent.

Intel(R) Boot Agent GE v1.3.72
Copyright (C) 1997-2010, Intel Corporation

CLIENT MAC ADDR: 34 40 B5 BB 41 38  GUID: 261970A2 18F8 E111 A95E 3440B5BB4137
PXE-E51: No DHCP or proxyDHCP offers were received.

PXE-M0F: Exiting Intel Boot Agent.



Intel(R) Boot Agent GE v1.3.72
Copyright (C) 1997-2010, Intel Corporation

CLIENT MAC ADDR: 34 40 B5 BB 42 2D  GUID: DA913255 7CF7 E111 ABA8 3440B5BB422D
CLIENT IP: 10.2.5.60  MASK: 255.255.0.0  DHCP IP: 10.2.200.5
GATEWAY IP: 10.2.254.100
PXE-E11: ARP timeout
PXE-M0F: Exiting Intel Boot Agent.

Intel(R) Boot Agent GE v1.3.72
Copyright (C) 1997-2010, Intel Corporation

CLIENT MAC ADDR: 34 40 B5 BB 42 2E  GUID: DA913255 7CF7 E111 ABA8 3440B5BB422D
PXE-E51: No DHCP or proxyDHCP offers were received.

PXE-M0F: Exiting Intel Boot Agent.


Is this simply a network bandwidth issue? Are there any parameters we can 
change to improve it, given the network? We followed the xCAT documentation 
for large clusters, except for the HTTP caching. 

Any ideas are more than welcome :) 

Regards,

Gilad Berman
HPC Architect
IBM System & Technology Group. Israel

E-mail: [email protected]
Tel:    972-3-9188262
Mobile: 972-52-2554262

The information contained in this email is being provided by IBM as a 
matter of courtesy and provided "AS-IS" without any direct and implied 
warranty; IBM assumes no liability. It is your responsibility to ensure 
that any resulting customer proposal has been correctly designed to meet 
your clients' requirements and to have an active review process which 
ensures an appropriate level of solution assurance is performed for all 
proposals. IBM does not take responsibility for the solution or solution 
assurance.



From:   "Daniel M. Weeks" <[email protected]>
To:     xCAT Users Mailing list <[email protected]>, 
Cc:     Gilad Berman/Israel/IBM@IBMIL
Date:   21/12/2012 18:46
Subject:        Re: [xcat-user] Improving boot  performance



On 12/21/2012 10:45 AM, Gilad Berman wrote:
> Hello All,
> 
> I would like your advice on how we can increase boot performance.
> When we boot 84 nodes at once (Statelite), some of them (~2)
> usually do not boot (it looks like they are not getting a DHCP request, but
> I'm not completely sure) and we have to reboot them.
> When booting 168 from the same node, the number goes up to ~5-7 nodes.
> While this might be acceptable, we would still like to improve it.
> 
> I know the first thing that comes to mind is the network. We have 4-port
> LACP bonding; however, only two ports are actually active on send for
> some reason: we see two links fully loaded and two links that do not send
> anything. On receive the bond behaves as expected.
> 
> I know the LACP issue has nothing to do with xCAT, but I'm sure people on
> the list have optimized boot performance before.

Links in an LACP bond are selected by hashing packet header fields, so a
given flow always goes over the same link. I would guess that the default
hash algorithm is not appropriate for your traffic pattern, so it is not
balancing well across the links. It's typical to only get a 70/30 split on
a "well-tuned" aggregated link, and a true 50/50 split is very rare.
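
To illustrate how a hash can leave links idle, here is a minimal sketch of a simplified layer-2 style policy: XOR the source and destination MAC, then take the result modulo the number of links. This is an assumption for illustration only, not necessarily the policy configured on this bond or switch, and all MAC addresses and function names below are made up.

```python
from collections import Counter


def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer."""
    return int(mac.replace(":", ""), 16)


def pick_link(src_mac: str, dst_mac: str, n_links: int) -> int:
    """Simplified layer-2 style hash: XOR the MACs, modulo the link count."""
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % n_links


def link_usage(src_mac, dst_macs, n_links):
    """Count how many destinations land on each link."""
    counts = Counter(pick_link(src_mac, d, n_links) for d in dst_macs)
    return [counts.get(i, 0) for i in range(n_links)]


SERVER = "02:00:00:00:00:01"  # hypothetical sending host

# Case 1: 160 clients with consecutive MACs, reached directly at layer 2.
consecutive = [f"02:00:00:00:01:{i:02x}" for i in range(160)]
print(link_usage(SERVER, consecutive, 4))   # [40, 40, 40, 40]

# Case 2: 80 clients whose MACs differ in steps of 2 (low bit never changes).
stepped = [f"02:00:00:00:01:{2 * i:02x}" for i in range(80)]
print(link_usage(SERVER, stepped, 4))       # [0, 40, 0, 40]

# Case 3: all frames addressed to one next-hop MAC (for example a router).
routed = ["02:00:00:00:ff:fe"] * 160
print(link_usage(SERVER, routed, 4))        # [0, 0, 0, 160]
```

With this toy hash, having many hosts is not enough: if the hashed field has a bit that never changes, or if every frame carries the same destination MAC, whole links stay unused. Whether that is what happens in a given setup depends on the hash policy actually in use on the sending side, which is worth checking before tuning anything else.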

> 
> Any advice is appreciated, and happy Christmas :)
> 
> Regards,
> 
> Gilad Berman
> HPC Architect
> IBM System & Technology Group. Israel
> 
> E-mail: [email protected]
> Tel:    972-3-9188262
> Mobile: 972-52-2554262


-- 
Daniel M. Weeks
Systems Administrator
Computational Center for Nanotechnology Innovations
Rensselaer Polytechnic Institute
Troy, NY 12180
518-276-4458


_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user
