Chris,

   Two things you need to check:

   1. Make sure that the switch and port values calculated from the
   regular expressions do not produce duplicates across nodes. Run lsdef
   on the nodes to check.
   2. For any switch+port pair that has multiple nodes discovered on it,
   check whether multiple nodes (multiple MACs on one port) really are
   coming from that port. Log in to the switch and check the MAC table
   for the specific port.
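
A rough sketch of the first check, in case it helps. It assumes the node
definitions have been dumped in the same stanza layout as the group
listings quoted below (a "name:" header followed by "attr=value" lines)
and that switchport has already been resolved to a plain number. The
node names and values in SAMPLE are made up for illustration, and the
function names are hypothetical helpers, not part of xCAT.

```python
from collections import defaultdict


def parse_stanzas(text):
    """Parse 'name:' headers followed by 'attr=value' lines into a dict."""
    objects, current = {}, None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if not line[0].isspace() and stripped.endswith(":"):
            # Unindented line ending in ':' starts a new object.
            current = objects.setdefault(stripped[:-1], {})
        elif current is not None and "=" in stripped:
            key, value = stripped.split("=", 1)
            current[key] = value
    return objects


def duplicate_ports(objects):
    """Return {(switch, port): [names]} for ports claimed by several objects."""
    claimed = defaultdict(list)
    for name, attrs in objects.items():
        if "switch" in attrs and "switchport" in attrs:
            claimed[(attrs["switch"], attrs["switchport"])].append(name)
    return {key: names for key, names in claimed.items() if len(names) > 1}


# Hypothetical sample data, not taken from the real cluster.
SAMPLE = """\
nodeA:
    objtype=node
    switch=sw02
    switchport=1
nodeB:
    objtype=node
    switch=sw02
    switchport=1
nodeC:
    objtype=node
    switch=sw03
    switchport=1
"""

if __name__ == "__main__":
    for (switch, port), names in duplicate_ports(parse_stanzas(SAMPLE)).items():
        print(f"{switch} port {port}: {', '.join(names)}")
```

Any pair it reports is a switch port that more than one definition
resolves to, which is the situation to rule out before looking at the
switch itself.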

Thanks
Best Regards
----------------------------------------------------------------------
 Wang Xiaopeng (王晓朋)
 IBM China System Technology Laboratory
 Tel: 86-10-82453455
 Email: [email protected]
 Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
Haidian District Beijing P.R.China 100193



From:   Christopher Samuel <[email protected]>
To:     xCAT Users Mailing list <[email protected]>,
Date:   2013/08/06 15:50
Subject:        [xcat-user] xCAT node discovery problems



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

We've just reinstalled a cluster that was running RHEL5 with RHEL6 and
gone from xCAT 2.6.9 to 2.8.2.

Reusing the same switch definitions that worked for discovery in xCAT
2.6.9 (and adding the nodes from scratch) we find that we are getting
very unreliable node discovery, often with many nodes discovered as
merri001 or merri015.

Here you can see the first 14 nodes discovered fine, then the next three
are all discovered as merri001:

Aug  6 16:14:23 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.2
Aug  6 16:14:28 merri-m xCAT node discovery: merri001 has been discovered
Aug  6 16:17:28 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.4
Aug  6 16:17:34 merri-m xCAT node discovery: merri002 has been discovered
Aug  6 16:22:19 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.6
Aug  6 16:22:25 merri-m xCAT node discovery: merri003 has been discovered
Aug  6 16:22:50 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.13
Aug  6 16:22:53 merri-m xCAT node discovery: merri004 has been discovered
Aug  6 16:22:59 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.15
Aug  6 16:23:04 merri-m xCAT node discovery: merri005 has been discovered
Aug  6 16:23:09 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.16
Aug  6 16:23:11 merri-m xCAT node discovery: merri007 has been discovered
Aug  6 16:23:11 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.17
Aug  6 16:23:13 merri-m xCAT node discovery: merri006 has been discovered
Aug  6 16:23:14 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.18
Aug  6 16:23:16 merri-m xCAT node discovery: merri008 has been discovered
Aug  6 16:23:16 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.19
Aug  6 16:23:18 merri-m xCAT node discovery: merri010 has been discovered
Aug  6 16:23:22 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.20
Aug  6 16:23:24 merri-m xCAT node discovery: merri009 has been discovered
Aug  6 16:27:40 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.25
Aug  6 16:27:46 merri-m xCAT node discovery: merri011 has been discovered
Aug  6 16:27:46 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.26
Aug  6 16:27:48 merri-m xCAT node discovery: merri012 has been discovered
Aug  6 16:27:55 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.28
Aug  6 16:27:57 merri-m xCAT node discovery: merri013 has been discovered
Aug  6 16:28:01 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.29
Aug  6 16:28:03 merri-m xCAT node discovery: merri014 has been discovered
Aug  6 16:28:10 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.34
Aug  6 16:28:13 merri-m xCAT node discovery: merri001 has been discovered
Aug  6 16:28:16 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.36
Aug  6 16:28:22 merri-m xCAT node discovery: merri001 has been discovered
Aug  6 16:28:26 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.38
Aug  6 16:28:28 merri-m xCAT node discovery: merri001 has been discovered
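
For longer logs, a small script can pick out the names that were handed
to more than one requester. This is only a sketch: it assumes each
"Processing discovery request" line is followed by the matching "has
been discovered" line before the next request is logged (which holds in
the excerpt above, but might not on a busier system), and the function
name is a hypothetical helper. The sample is a subset of the lines
above.

```python
import re
from collections import defaultdict

# Group 1 captures the requester IP, group 2 the node name assigned.
# \s+ tolerates the line wrap between "from" and the address.
EVENT = re.compile(
    r"Processing discovery request from\s+(\d+(?:\.\d+){3})"
    r"|node discovery: (\S+) has been discovered"
)


def repeated_names(log_text):
    """Return {node: [ips]} for names assigned to more than one requester."""
    claims = defaultdict(list)
    last_ip = None
    for match in EVENT.finditer(log_text):
        ip, node = match.groups()
        if ip:
            last_ip = ip
        elif last_ip is not None:
            # Assumption: the discovery line belongs to the latest request.
            claims[node].append(last_ip)
            last_ip = None
    return {node: ips for node, ips in claims.items() if len(set(ips)) > 1}


SAMPLE_LOG = """\
Aug  6 16:14:23 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.2
Aug  6 16:14:28 merri-m xCAT node discovery: merri001 has been discovered
Aug  6 16:17:28 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.4
Aug  6 16:17:34 merri-m xCAT node discovery: merri002 has been discovered
Aug  6 16:28:10 merri-m xCAT: xcatd: Processing discovery request from
10.6.3.34
Aug  6 16:28:13 merri-m xCAT node discovery: merri001 has been discovered
"""

if __name__ == "__main__":
    for node, ips in repeated_names(SAMPLE_LOG).items():
        print(f"{node} assigned to: {', '.join(ips)}")
```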

It's not always the same nodes that are getting misidentified either; I
could handle it if it was consistent.  What is consistent, though, is
that it seems to be merri001 and merri015 which accumulate the
duplicates.

Our switch config is:

#node,switch,port,vlan,interface,comments,disable
"merri","sw03","27",,,,
"merri-v","sw04","27",,,,
"sw02b","sw02","|\D+0*(\d+)|(($1-26))|",,,,
"sw02a","sw02","|\D+0*(\d+)|($1)|",,,,
"sw03a","sw03","|\D+0*(\d+)|(($1)-14)|",,,,
"sw04a","sw04","|\D+0*(\d+)|(($1)-54)|",,,,
"nsd","sw05","|\D+0*(\d+)|(($1)+24)|",,,,
"merri081","sw05","35",,,,
"merri082","sw05","36",,,,
"merri083","sw06","22",,,,
"terri","sw06","16",,,,
"turpin","sw05","17",,,,

We added nodes as:

nodeadd merri001-merri014 groups=all,compute,sw02a,ipmis
nodeadd merri015-merri040 groups=all,compute,sw03a,ipmis
nodeadd merri041-merri054 groups=all,compute,sw02b,ipmis
nodeadd merri055-merri080 groups=all,compute,sw04a,ipmis

The groups sw02a, sw03a, sw02b and sw04a define the switch ports thus:

sw02a:
    objtype=group
    grouptype=static
    members=merri001,merri002,merri003,merri004,merri005,merri006,merri007,merri008,merri009,merri010,merri011,merri012,merri013,merri014
    switch=sw02
    switchport=|\D+0*(\d+)|($1)|

sw02b:
    objtype=group
    grouptype=static
    members=merri041,merri042,merri043,merri044,merri045,merri046,merri047,merri048,merri049,merri050,merri051,merri052,merri053,merri054
    switch=sw02
    switchport=|\D+0*(\d+)|(($1-26))|

sw03a:
    objtype=group
    grouptype=static
    members=merri015,merri016,merri017,merri018,merri019,merri020,merri021,merri022,merri023,merri024,merri025,merri026,merri027,merri028,merri029,merri030,merri031,merri032,merri033,merri034,merri035,merri036,merri037,merri038,merri039,merri040
    switch=sw03
    switchport=|\D+0*(\d+)|(($1)-14)|

sw04a:
    objtype=group
    grouptype=static
    members=merri055,merri056,merri057,merri058,merri059,merri060,merri061,merri062,merri063,merri064,merri065,merri066,merri067,merri068,merri069,merri070,merri071,merri072,merri073,merri074,merri075,merri076,merri077,merri078,merri079,merri080
    switch=sw04
    switchport=|\D+0*(\d+)|(($1)-54)|
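
To see which port each node should resolve to, the expressions can be
worked through by hand or with a short script. The following is only an
approximation of how a |pattern|replacement| value is resolved: it
substitutes $1 with the captured digits and evaluates the remaining
integer arithmetic. It is not xCAT's own code, it does not handle a "|"
inside the pattern, and the function name is a hypothetical helper.

```python
import re


def switchport(node, expression):
    """Approximate the port a |pattern|replacement| value gives for a node."""
    _, pattern, replacement, _ = expression.split("|")
    match = re.search(pattern, node)
    if match is None:
        raise ValueError(f"{node!r} does not match {pattern!r}")
    # Substitute $1, $2, ... with the captured groups.
    expr = re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), replacement)
    # Only evaluate plain integer arithmetic, nothing else.
    if not re.fullmatch(r"[\d\s()+\-*]+", expr):
        raise ValueError(f"unsupported expression: {expr!r}")
    return eval(expr, {"__builtins__": {}}, {})


# Group name -> (switch, switchport expression, node numbers), as listed above.
GROUPS = {
    "sw02a": ("sw02", r"|\D+0*(\d+)|($1)|", range(1, 15)),
    "sw02b": ("sw02", r"|\D+0*(\d+)|(($1-26))|", range(41, 55)),
    "sw03a": ("sw03", r"|\D+0*(\d+)|(($1)-14)|", range(15, 41)),
    "sw04a": ("sw04", r"|\D+0*(\d+)|(($1)-54)|", range(55, 81)),
}

if __name__ == "__main__":
    for group, (switch, expr, numbers) in GROUPS.items():
        ports = [switchport(f"merri{n:03d}", expr) for n in numbers]
        print(f"{group}: {switch} ports {ports[0]}-{ports[-1]}")
```

Under those assumptions the four groups come out as sw02 ports 1-14
(sw02a) and 15-28 (sw02b), sw03 ports 1-26 and sw04 ports 1-26, with no
switch+port pair used twice.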



Has anyone seen anything like this?

The fact that we've not changed hardware at all, merely RHEL
and xCAT, makes me think it's xCAT related, and the fact that
it's not consistent makes me wonder if it's some odd
state-related problem (or a race condition).

All the best!
Chris
- --
 Christopher Samuel        Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computation Initiative
 Email: [email protected] Phone: +61 (0)3 903 55545
 http://www.vlsci.org.au/      http://twitter.com/vlsci

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlIAqBgACgkQO2KABBYQAh/pugCeNJw0GImzok7yftoUQezNc8Tl
3/oAn2xA9ywUgi5QYYtH4EqhYJ2IeIx7
=CSDz
-----END PGP SIGNATURE-----

------------------------------------------------------------------------------

Get your SQL database under version control now!
Version control is standard for application code, but databases havent
caught up. So what steps can you take to put your SQL databases under
version control? Why should you start doing it? Read more to find out.
http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user
