-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi all,
We've just reinstalled a cluster, moving it from RHEL5 to RHEL6 and
from xCAT 2.6.9 to 2.8.2.
Reusing the same switch definitions that worked for discovery in xCAT
2.6.9 (and adding the nodes from scratch) we find that we are getting
very unreliable node discovery, often with many nodes discovered as
merri001 or merri015.
Here you can see the first 14 nodes discovered fine, then the next three
are all discovered as merri001:
Aug 6 16:14:23 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.2
Aug 6 16:14:28 merri-m xCAT node discovery: merri001 has been discovered
Aug 6 16:17:28 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.4
Aug 6 16:17:34 merri-m xCAT node discovery: merri002 has been discovered
Aug 6 16:22:19 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.6
Aug 6 16:22:25 merri-m xCAT node discovery: merri003 has been discovered
Aug 6 16:22:50 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.13
Aug 6 16:22:53 merri-m xCAT node discovery: merri004 has been discovered
Aug 6 16:22:59 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.15
Aug 6 16:23:04 merri-m xCAT node discovery: merri005 has been discovered
Aug 6 16:23:09 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.16
Aug 6 16:23:11 merri-m xCAT node discovery: merri007 has been discovered
Aug 6 16:23:11 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.17
Aug 6 16:23:13 merri-m xCAT node discovery: merri006 has been discovered
Aug 6 16:23:14 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.18
Aug 6 16:23:16 merri-m xCAT node discovery: merri008 has been discovered
Aug 6 16:23:16 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.19
Aug 6 16:23:18 merri-m xCAT node discovery: merri010 has been discovered
Aug 6 16:23:22 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.20
Aug 6 16:23:24 merri-m xCAT node discovery: merri009 has been discovered
Aug 6 16:27:40 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.25
Aug 6 16:27:46 merri-m xCAT node discovery: merri011 has been discovered
Aug 6 16:27:46 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.26
Aug 6 16:27:48 merri-m xCAT node discovery: merri012 has been discovered
Aug 6 16:27:55 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.28
Aug 6 16:27:57 merri-m xCAT node discovery: merri013 has been discovered
Aug 6 16:28:01 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.29
Aug 6 16:28:03 merri-m xCAT node discovery: merri014 has been discovered
Aug 6 16:28:10 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.34
Aug 6 16:28:13 merri-m xCAT node discovery: merri001 has been discovered
Aug 6 16:28:16 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.36
Aug 6 16:28:22 merri-m xCAT node discovery: merri001 has been discovered
Aug 6 16:28:26 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.38
Aug 6 16:28:28 merri-m xCAT node discovery: merri001 has been discovered
It's not always the same nodes that are getting misidentified either; I
could handle it if it was consistent. What is consistent, though, is
that it seems to be merri001 and merri015 which accumulate the
duplicates.
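For anyone wanting to tally the duplicates from their own syslog, here is a minimal sketch. It assumes the two message formats shown in the excerpt above and that each "has been discovered" line belongs to the most recent "Processing discovery request" line, which may not hold if xcatd handles requests concurrently. The names tally and SAMPLE are just for illustration.

```python
import re
from collections import defaultdict

# Patterns for the two syslog messages seen in the excerpt above.
REQUEST = re.compile(r"Processing discovery request from (\S+)")
DISCOVERED = re.compile(r"node discovery: (\S+) has been discovered")

def tally(lines):
    """Group requesting IPs by the node name they were discovered as.

    Assumption: a 'discovered' line refers to the most recent request line.
    """
    ips_by_node = defaultdict(list)
    pending_ip = None
    for line in lines:
        request = REQUEST.search(line)
        if request:
            pending_ip = request.group(1)
            continue
        discovered = DISCOVERED.search(line)
        if discovered and pending_ip is not None:
            ips_by_node[discovered.group(1)].append(pending_ip)
            pending_ip = None
    return dict(ips_by_node)

# A few lines copied from the log above.
SAMPLE = """\
Aug 6 16:14:23 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.2
Aug 6 16:14:28 merri-m xCAT node discovery: merri001 has been discovered
Aug 6 16:28:01 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.29
Aug 6 16:28:03 merri-m xCAT node discovery: merri014 has been discovered
Aug 6 16:28:10 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.34
Aug 6 16:28:13 merri-m xCAT node discovery: merri001 has been discovered
Aug 6 16:28:16 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.36
Aug 6 16:28:22 merri-m xCAT node discovery: merri001 has been discovered
Aug 6 16:28:26 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.38
Aug 6 16:28:28 merri-m xCAT node discovery: merri001 has been discovered
"""

if __name__ == "__main__":
    for node, ips in sorted(tally(SAMPLE.splitlines()).items()):
        if len(ips) > 1:
            print(f"{node} claimed by {len(ips)} addresses: {', '.join(ips)}")
```

Feeding it the full log rather than the sample would list every node name that more than one address ended up with.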
Our switch config is:
#node,switch,port,vlan,interface,comments,disable
"merri","sw03","27",,,,
"merri-v","sw04","27",,,,
"sw02b","sw02","|\D+0*(\d+)|(($1-26))|",,,,
"sw02a","sw02","|\D+0*(\d+)|($1)|",,,,
"sw03a","sw03","|\D+0*(\d+)|(($1)-14)|",,,,
"sw04a","sw04","|\D+0*(\d+)|(($1)-54)|",,,,
"nsd","sw05","|\D+0*(\d+)|(($1)+24)|",,,,
"merri081","sw05","35",,,,
"merri082","sw05","36",,,,
"merri083","sw06","22",,,,
"terri","sw06","16",,,,
"turpin","sw05","17",,,,
We added nodes as:
nodeadd merri001-merri014 groups=all,compute,sw02a,ipmis
nodeadd merri015-merri040 groups=all,compute,sw03a,ipmis
nodeadd merri041-merri054 groups=all,compute,sw02b,ipmis
nodeadd merri055-merri080 groups=all,compute,sw04a,ipmis
The groups sw02a, sw03a, sw02b and sw04a define the switch ports thus:
sw02a:
objtype=group
grouptype=static
members=merri001,merri002,merri003,merri004,merri005,merri006,merri007,merri008,merri009,merri010,merri011,merri012,merri013,merri014
switch=sw02
switchport=|\D+0*(\d+)|($1)|
sw02b:
objtype=group
grouptype=static
members=merri041,merri042,merri043,merri044,merri045,merri046,merri047,merri048,merri049,merri050,merri051,merri052,merri053,merri054
switch=sw02
switchport=|\D+0*(\d+)|(($1-26))|
sw03a:
objtype=group
grouptype=static
members=merri015,merri016,merri017,merri018,merri019,merri020,merri021,merri022,merri023,merri024,merri025,merri026,merri027,merri028,merri029,merri030,merri031,merri032,merri033,merri034,merri035,merri036,merri037,merri038,merri039,merri040
switch=sw03
switchport=|\D+0*(\d+)|(($1)-14)|
sw04a:
objtype=group
grouptype=static
members=merri055,merri056,merri057,merri058,merri059,merri060,merri061,merri062,merri063,merri064,merri065,merri066,merri067,merri068,merri069,merri070,merri071,merri072,merri073,merri074,merri075,merri076,merri077,merri078,merri079,merri080
switch=sw04
switchport=|\D+0*(\d+)|(($1)-54)|
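As a sanity check on the switchport expressions, here is a rough sketch of the port each node should resolve to. It is not xCAT's own code: it assumes the |pattern|replacement| form is applied to the node name, with $1 substituted and the parenthesised arithmetic evaluated, and it only handles the + and - used here. The helper names expected_port and port_map are made up for illustration.

```python
import re
from collections import defaultdict

# group -> (switch, first node, last node, switchport expression),
# copied from the group definitions above.
GROUPS = {
    "sw02a": ("sw02", 1, 14, r"|\D+0*(\d+)|($1)|"),
    "sw02b": ("sw02", 41, 54, r"|\D+0*(\d+)|(($1-26))|"),
    "sw03a": ("sw03", 15, 40, r"|\D+0*(\d+)|(($1)-14)|"),
    "sw04a": ("sw04", 55, 80, r"|\D+0*(\d+)|(($1)-54)|"),
}

def expected_port(node, expr):
    """Emulate the assumed |pattern|replacement| behaviour for one node name."""
    _, pattern, replacement, _ = expr.split("|")
    match = re.match(pattern, node)
    if match is None:
        raise ValueError(f"{node} does not match {pattern}")
    arithmetic = replacement.replace("$1", match.group(1))
    # Only + and - appear in these expressions, so drop the parentheses
    # and sum the signed integer terms instead of calling eval().
    terms = re.findall(r"[+-]?\d+", arithmetic.replace("(", "").replace(")", ""))
    return sum(int(term) for term in terms)

def port_map():
    """Map (switch, port) to the list of node names that resolve to it."""
    ports = defaultdict(list)
    for switch, first, last, expr in GROUPS.values():
        for number in range(first, last + 1):
            node = f"merri{number:03d}"
            ports[(switch, expected_port(node, expr))].append(node)
    return ports

if __name__ == "__main__":
    for (switch, port), nodes in sorted(port_map().items()):
        flag = "  <-- collision" if len(nodes) > 1 else ""
        print(f"{switch} port {port:2d}: {', '.join(nodes)}{flag}")
```

Under those assumptions every node gets its own (switch, port) pair, with merri001 on sw02 port 1 and merri015 on sw03 port 1.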
Has anyone seen anything like this?
The fact that we've not changed the hardware at all, merely RHEL
and xCAT, makes me think it's xCAT related, and the fact that
it's not consistent makes me wonder if it's some odd state-related
problem (or a race condition).
All the best!
Chris
- --
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: [email protected] Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iEYEARECAAYFAlIAqBgACgkQO2KABBYQAh/pugCeNJw0GImzok7yftoUQezNc8Tl
3/oAn2xA9ywUgi5QYYtH4EqhYJ2IeIx7
=CSDz
-----END PGP SIGNATURE-----
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user