Chris,

Two things you need to check:

1. Make sure the switch and port values calculated from the regular expressions contain no duplicates across nodes. Run lsdef on the nodes to check.
2. For each switch+port that has multiple nodes discovered on it, check whether multiple nodes (multiple MACs on one port) really do come from that port. Log in to the switch and check the MAC table for that specific port.

Thanks
Best Regards
----------------------------------------------------------------------
Wang Xiaopeng (王晓朋)
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: [email protected]
Address: 28, ZhongGuanCun Software Park, No.8 Dong Bei Wang West Road, Haidian District, Beijing, P.R. China 100193

From: Christopher Samuel <[email protected]>
To: xCAT Users Mailing list <[email protected]>,
Date: 2013/08/06 15:50
Subject: [xcat-user] xCAT node discovery problems

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

We've just reinstalled a cluster that was running RHEL5 with RHEL6 and gone from xCAT 2.6.9 to 2.8.2. Reusing the same switch definitions that worked for discovery in xCAT 2.6.9 (and adding the nodes from scratch) we find that we are getting very unreliable node discovery, often with many nodes discovered as merri001 or merri015.

Here you can see the first 14 nodes discovered fine, then the next three are all discovered as merri001:

Aug 6 16:14:23 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.2
Aug 6 16:14:28 merri-m xCAT node discovery: merri001 has been discovered
Aug 6 16:17:28 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.4
Aug 6 16:17:34 merri-m xCAT node discovery: merri002 has been discovered
Aug 6 16:22:19 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.6
Aug 6 16:22:25 merri-m xCAT node discovery: merri003 has been discovered
Aug 6 16:22:50 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.13
Aug 6 16:22:53 merri-m xCAT node discovery: merri004 has been discovered
Aug 6 16:22:59 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.15
Aug 6 16:23:04 merri-m xCAT node discovery: merri005 has been discovered
Aug 6 16:23:09 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.16
Aug 6 16:23:11 merri-m xCAT node discovery: merri007 has been discovered
Aug 6 16:23:11 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.17
Aug 6 16:23:13 merri-m xCAT node discovery: merri006 has been discovered
Aug 6 16:23:14 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.18
Aug 6 16:23:16 merri-m xCAT node discovery: merri008 has been discovered
Aug 6 16:23:16 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.19
Aug 6 16:23:18 merri-m xCAT node discovery: merri010 has been discovered
Aug 6 16:23:22 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.20
Aug 6 16:23:24 merri-m xCAT node discovery: merri009 has been discovered
Aug 6 16:27:40 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.25
Aug 6 16:27:46 merri-m xCAT node discovery: merri011 has been discovered
Aug 6 16:27:46 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.26
Aug 6 16:27:48 merri-m xCAT node discovery: merri012 has been discovered
Aug 6 16:27:55 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.28
Aug 6 16:27:57 merri-m xCAT node discovery: merri013 has been discovered
Aug 6 16:28:01 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.29
Aug 6 16:28:03 merri-m xCAT node discovery: merri014 has been discovered
Aug 6 16:28:10 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.34
Aug 6 16:28:13 merri-m xCAT node discovery: merri001 has been discovered
Aug 6 16:28:16 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.36
Aug 6 16:28:22 merri-m xCAT node discovery: merri001 has been discovered
Aug 6 16:28:26 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.38
Aug 6 16:28:28 merri-m xCAT node discovery: merri001 has been discovered

It's not always the same nodes that are getting misidentified either; I could handle it if it was consistent. What is consistent, though, is that it seems to be merri001 and merri015 which accumulate the duplicates.

Our switch config is:

#node,switch,port,vlan,interface,comments,disable
"merri","sw03","27",,,,
"merri-v","sw04","27",,,,
"sw02b","sw02","|\D+0*(\d+)|(($1-26))|",,,,
"sw02a","sw02","|\D+0*(\d+)|($1)|",,,,
"sw03a","sw03","|\D+0*(\d+)|(($1)-14)|",,,,
"sw04a","sw04","|\D+0*(\d+)|(($1)-54)|",,,,
"nsd","sw05","|\D+0*(\d+)|(($1)+24)|",,,,
"merri081","sw05","35",,,,
"merri082","sw05","36",,,,
"merri083","sw06","22",,,,
"terri","sw06","16",,,,
"turpin","sw05","17",,,,

We added nodes as:

nodeadd merri001-merri014 groups=all,compute,sw02a,ipmis
nodeadd merri015-merri040 groups=all,compute,sw03a,ipmis
nodeadd merri041-merri054 groups=all,compute,sw02b,ipmis
nodeadd merri055-merri080 groups=all,compute,sw04a,ipmis

The groups sw02a, sw03a, sw02b and sw04a define the switch ports thus:

sw02a:
    objtype=group
    grouptype=static
    members=merri001,merri002,merri003,merri004,merri005,merri006,merri007,merri008,merri009,merri010,merri011,merri012,merri013,merri014
    switch=sw02
    switchport=|\D+0*(\d+)|($1)|

sw02b:
    objtype=group
    grouptype=static
    members=merri041,merri042,merri043,merri044,merri045,merri046,merri047,merri048,merri049,merri050,merri051,merri052,merri053,merri054
    switch=sw02
    switchport=|\D+0*(\d+)|(($1-26))|

sw03a:
    objtype=group
    grouptype=static
    members=merri015,merri016,merri017,merri018,merri019,merri020,merri021,merri022,merri023,merri024,merri025,merri026,merri027,merri028,merri029,merri030,merri031,merri032,merri033,merri034,merri035,merri036,merri037,merri038,merri039,merri040
    switch=sw03
    switchport=|\D+0*(\d+)|(($1)-14)|

sw04a:
    objtype=group
    grouptype=static
    members=merri055,merri056,merri057,merri058,merri059,merri060,merri061,merri062,merri063,merri064,merri065,merri066,merri067,merri068,merri069,merri070,merri071,merri072,merri073,merri074,merri075,merri076,merri077,merri078,merri079,merri080
    switch=sw04
    switchport=|\D+0*(\d+)|(($1)-54)|

Has anyone seen anything like this? The fact that we've not changed hardware at all, merely RHEL and xCAT, makes me think it's xCAT related, and the fact that it's not consistent makes me wonder if it's some odd state-related problem (or a race condition).

All the best!
Chris
- --
Christopher Samuel
Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: [email protected]
Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/
http://twitter.com/vlsci

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlIAqBgACgkQO2KABBYQAh/pugCeNJw0GImzok7yftoUQezNc8Tl
3/oAn2xA9ywUgi5QYYtH4EqhYJ2IeIx7
=CSDz
-----END PGP SIGNATURE-----

------------------------------------------------------------------------------
Get your SQL database under version control now!
Version control is standard for application code, but databases haven't caught up. So what steps can you take to put your SQL databases under version control? Why should you start doing it? Read more to find out.
http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user
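
The first check in the reply (no two nodes resolving to the same switch and port) can also be done offline as a cross-check on what lsdef reports. The following is a minimal sketch, not xCAT's own code. It assumes that a `|pattern|replacement|` value is expanded by matching the node name against the pattern, substituting `$1`, and then evaluating the resulting integer arithmetic. The helper names `expand_switchport`, `build_port_map` and `find_duplicates` are hypothetical; the group data is copied from the thread.

```python
import re
from collections import defaultdict

# Copied from the thread: group -> (switch, switchport expression, node numbers).
GROUPS = {
    "sw02a": ("sw02", r"|\D+0*(\d+)|($1)|", range(1, 15)),
    "sw02b": ("sw02", r"|\D+0*(\d+)|(($1-26))|", range(41, 55)),
    "sw03a": ("sw03", r"|\D+0*(\d+)|(($1)-14)|", range(15, 41)),
    "sw04a": ("sw04", r"|\D+0*(\d+)|(($1)-54)|", range(55, 81)),
}


def expand_switchport(node, spec):
    """Approximate the |pattern|replacement| expansion for one node name.

    Assumption: this mimics xCAT's behaviour only for simple integer
    arithmetic; it is not taken from the xCAT source.
    """
    if not spec.startswith("|"):
        return spec  # plain literal such as "27"
    _, pattern, replacement, _ = spec.split("|")
    match = re.search(pattern, node)
    if match is None:
        raise ValueError(f"{node!r} does not match {pattern!r}")
    # Substitute $1, $2, ... with the captured text.
    expr = re.sub(r"\$(\d+)", lambda g: match.group(int(g.group(1))), replacement)
    # Accept only digits, + - *, parentheses and spaces before evaluating.
    if not re.fullmatch(r"[0-9+\-*() ]+", expr):
        raise ValueError(f"unsupported expression {expr!r}")
    return str(eval(expr, {"__builtins__": {}}, {}))


def build_port_map(groups):
    """Map (switch, port) -> list of node names that resolve to it."""
    port_map = defaultdict(list)
    for switch, spec, numbers in groups.values():
        for n in numbers:
            node = f"merri{n:03d}"
            port_map[(switch, expand_switchport(node, spec))].append(node)
    return port_map


def find_duplicates(port_map):
    """Keep only the (switch, port) pairs claimed by more than one node."""
    return {key: nodes for key, nodes in port_map.items() if len(nodes) > 1}


port_map = build_port_map(GROUPS)
for (switch, port), nodes in sorted(find_duplicates(port_map).items()):
    print(f"{switch} port {port}: {', '.join(nodes)}")
```

Under these assumptions the four group expressions posted in the thread resolve to 80 distinct switch/port pairs, so the loop prints nothing. That holds only if the approximation matches what xCAT actually computes; the lsdef output on the management node remains the authoritative check.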
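
To see which names accumulate duplicates in a longer log, the discovery messages quoted in the original post can be paired up mechanically. This is a rough sketch that assumes each "has been discovered" line belongs to the most recent "Processing discovery request" line, which holds for the excerpt in the thread but may not hold if xcatd handles requests concurrently. The function name `duplicate_discoveries` is hypothetical, and the sample lines are taken from the posted log.

```python
import re
from collections import defaultdict

REQUEST = re.compile(r"Processing discovery request from (\S+)")
DISCOVERED = re.compile(r"xCAT node discovery: (\S+) has been discovered")


def duplicate_discoveries(lines):
    """Return {node name: [requesting IPs]} for names discovered more than once.

    Assumption: a 'discovered' line is paired with the latest unpaired
    'request' line before it.
    """
    claimed = defaultdict(list)
    last_ip = None
    for line in lines:
        request = REQUEST.search(line)
        if request:
            last_ip = request.group(1)
            continue
        found = DISCOVERED.search(line)
        if found and last_ip is not None:
            claimed[found.group(1)].append(last_ip)
            last_ip = None
    return {name: ips for name, ips in claimed.items() if len(ips) > 1}


sample = """\
Aug 6 16:14:23 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.2
Aug 6 16:14:28 merri-m xCAT node discovery: merri001 has been discovered
Aug 6 16:28:01 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.29
Aug 6 16:28:03 merri-m xCAT node discovery: merri014 has been discovered
Aug 6 16:28:10 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.34
Aug 6 16:28:13 merri-m xCAT node discovery: merri001 has been discovered
Aug 6 16:28:16 merri-m xCAT: xcatd: Processing discovery request from 10.6.3.36
Aug 6 16:28:22 merri-m xCAT node discovery: merri001 has been discovered
""".splitlines()

duplicates = duplicate_discoveries(sample)
for name, ips in duplicates.items():
    print(f"{name}: {', '.join(ips)}")
```

The IP addresses reported for a duplicated name identify the hosts whose MAC addresses are worth looking up in the switch's MAC table, as suggested in the second check of the reply.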
