Hi Stuart,

On 18/09/14 13:08, Stuart Barkley wrote:

> Chris, I remember your cluster was similar to ours, except bigger and
> newer.  Are you running M3 or M4 systems?

These are M4s (Sandy Bridge):

barcoo001: System Description: iDataPlex dx360 M4

> We are now running kernel-2.6.32-431.23.3.el6.x86_64 on all of our
> compute nodes (CentOS and EPEL mirrored as of 20140830).

We're on 2.6.32-358.18.1.el6.x86_64 (RHEL 6.4).

> <sidetrack>
> Our CentOS 6.5 rollout has been a little less successful than I had
> hoped.  The first compute image had a kernel performance regression
> and we needed to revert to the last 6.4 kernel (keeping the rest of
> 6.5).

Uh-oh, fixed in later 6.5 kernels?

>  Our current 'stable' version has a regression in the EPEL
> version of R (or maybe just a change to "libRblas" that annoys users).

We always build our own central installs of things like R, Perl and
Python, so we only need to install any requested modules once (per
$VERSION of $LANGUAGE) and they're instantly visible across the
cluster.

> The real annoyance I find is the GPFS kernel module rebuild needed if
> our 'stable' freeze includes a kernel change.

Amen brother.

> </sidetrack>
> 
> One of the things I needed to do was a more complete inventory of the
> situation across our various systems.  I've now looked in more detail
> at our support and lab systems including another cluster we are
> working on.  The results are interesting, but not xCAT related.
> 
> All of our IBM x3650 M2, x3650 M3, dx360 M2 have zone_reclaim_mode set
> to 1.  Some of these are running CentOS 6.4 from over a year ago.
> This includes several systems installed from CD/USB/Kickstart.

It would be very interesting to boot the same LiveCD on each of those
and see whether zone_reclaim_mode matches what you see here; if it
does, that would suggest it's down to kernel auto-tuning.
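One way to test the auto-tuning theory without a LiveCD is to look at the NUMA node distances (from the BIOS/ACPI SLIT) that the kernel exposes in sysfs. As I understand it, kernels of this vintage switch zone_reclaim_mode on at boot if any inter-node distance is larger than RECLAIM_DISTANCE. The sketch below assumes a threshold of 20 for a 2.6.32-based kernel; later upstream kernels raised it to 30 and from 3.16 stopped enabling it automatically, and RHEL kernels may carry backports, so check include/linux/topology.h in the source for the kernel you actually run. `check_reclaim_distance` is a hypothetical helper, not an existing tool.

```shell
# Sketch only: predict whether the boot-time heuristic would have switched
# zone_reclaim_mode on, using the NUMA distances the kernel exposes in sysfs.
# ASSUMPTION: RECLAIM_DISTANCE of 20 on a 2.6.32-based kernel; verify it
# against include/linux/topology.h for the kernel you actually run.
# check_reclaim_distance is a hypothetical helper, not an existing tool.
check_reclaim_distance() {
    nodedir=${1:-/sys/devices/system/node}   # sysfs NUMA node directory
    threshold=${2:-20}                        # assumed RECLAIM_DISTANCE
    max=0
    # Each nodeN/distance file holds one row of the node distance matrix.
    for f in "$nodedir"/node[0-9]*/distance; do
        [ -r "$f" ] || continue               # glob did not match: no NUMA info
        for d in $(cat "$f"); do
            if [ "$d" -gt "$max" ]; then
                max=$d
            fi
        done
    done
    # The heuristic only fires if some distance is strictly larger.
    if [ "$max" -gt "$threshold" ]; then
        verdict=yes
    else
        verdict=no
    fi
    echo "max_distance=$max threshold=$threshold autotune_would_enable=$verdict"
}
```

Run it on one node of each hardware type alongside `cat /proc/sys/vm/zone_reclaim_mode`; if the M2/M3 nodes report a larger maximum distance than the M4s, that would point at the BIOS-supplied distances rather than local configuration.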

> One odd ball homebrew lab system also has zone_reclaim_mode set to 1.

Do any of them show any evidence of configuration to set it?

If you've got git installed (and don't have etckeeper going) then you
can do:

cd /etc; git grep --no-index zone_reclaim_mode

to quickly scan for it (if you've got etckeeper then just drop the
--no-index flag as it'll be a git repo).
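On a box without git, plain grep does much the same job. A minimal sketch, assuming GNU grep; `find_zone_reclaim_config` is a hypothetical helper name. Searching for the bare string catches both the sysctl key vm.zone_reclaim_mode and any script that writes to /proc/sys/vm/zone_reclaim_mode directly.

```shell
# Sketch, assuming GNU grep: list every line under a config tree that
# mentions zone_reclaim_mode (sysctl key or direct /proc write).
# find_zone_reclaim_config is a hypothetical helper name.
find_zone_reclaim_config() {
    confdir=${1:-/etc}
    # -r: recurse, -n: show line numbers, -I: skip binary files.
    # Errors from unreadable files are discarded so only matches print.
    grep -rnI 'zone_reclaim_mode' "$confdir" 2>/dev/null
}
```

Run it as root so nothing under /etc is skipped for permissions, and compare any hits with the live value from `sysctl vm.zone_reclaim_mode`.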

> All of the other systems I can see have zone_reclaim_mode set to 0.
> This includes VMs (running under KVM), other homebrew systems, and a
> handful of systems from other vendors.

Very weird..

> Also of note, all our newer IBM systems have zone_reclaim_mode set to
> 0.  This includes x3650 M4, dx360 M4, x3690 X5 and x3850 X5 systems (a
> small 'new' cluster, a GSS rack, some high memory compute nodes and
> some other miscellaneous gear).
> 
> It may also be time for us to review our BIOS/IMM versions and even
> our changes from the defaults.  Lots to do.

To be honest, I'm suspicious of kernel auto-tuning here.

cheers!
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci


_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user
