I wanted to post a follow-up: I have a workaround, and here are the details in
case someone else runs into this issue. Credit for this goes to Jarrod Johnson.
Version 7.x and later of the BNT switch firmware does not correctly
distinguish Ethernet ports from Port Channels over SNMP. During xCAT automatic
discovery, xCAT found the MAC address of c6u29 (which is attached to switch6)
on switch5, Port Channel 1. Because of the SNMP bug in the BNT firmware, the
switch reported the MAC as attached to Ethernet port 1 instead of Port
Channel 1, which caused xCAT to see all the nodes on switch6 as c5u1. We
didn't experience this during the original discovery process because we were
running 6.5.x firmware on our BNT switches, which doesn't appear to be
susceptible.
The resolution was to renumber the port channel to a number that doesn't map
to an Ethernet port in use. We opted for Port Channel 48, which isn't a used
port on any of our switches, and there is no "cXu48" defined anywhere in our
xCAT config. Once we did that, discovery worked without issue.
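
As a rough illustration of why renumbering helps, here is a minimal Python
sketch. It is a simplified model and an assumption, not xCAT's actual code:
it supposes that xCAT only acts on MAC sightings whose (switch, port) pair
has a node defined, and that the switch5 sighting is considered first. The
names NODE_AT and resolve are hypothetical.

```python
# Simplified model of switch-based discovery (an assumption, not xCAT source).
# Keys are (switch, port) pairs that have a node defined for them.
NODE_AT = {
    ("switch5", 1): "c5u1",
    ("switch6", 27): "c6u29",
}


def resolve(sightings):
    """Return the node for the first sighting that matches a defined port."""
    for switch, port in sightings:
        node = NODE_AT.get((switch, port))
        if node is not None:
            return node
    return None


# 7.x firmware: the MAC seen on Port Channel 1 of switch5 is reported as
# Ethernet port 1, which is the port defined for c5u1.
before = [("switch5", 1), ("switch6", 27)]

# After renumbering the port channel to 48, no node maps to that port, so
# only the real access port on switch6 matches.
after = [("switch5", 48), ("switch6", 27)]

print(resolve(before))  # c5u1
print(resolve(after))   # c6u29
```

In this model the sighting on switch5 stops matching any node once the port
channel number no longer coincides with a defined port.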
-Brad
===================================================
Brad Viviano
High Performance Computing & Scientific Visualization
Lockheed Martin, Supporting the EPA
Research Triangle Park, NC
919-541-2696
HSCSS Task Order Lead - Ravi Nair
919-541-5467 - [email protected]
High Performance Computing Subtask Lead - Durward Jones
919-541-5043 - [email protected]
Environmental Modeling and Visualization Lead - Heidi Paulsen
919-541-1834 - [email protected]
________________________________
From: Viviano, Brad <[email protected]>
Sent: Monday, April 28, 2014 10:26 AM
To: xCAT Users Mailing list
Subject: [xcat-user] Problem with automatic node discovery on iDataplex nodes.
I am having a problem with automatic node discovery on xCAT 2.8.3 on our
iDataplex cluster. We replaced the motherboard and processor #1 on a dx360M4
node (node name c6u29) due to a possible bad memory controller or DIMM slot
issue. I used rmnodecfg to delete the node, which I've done in the past with
no problem. When I booted the node, it came up as c5u1 in genesis and
overwrote the entry in the "mac" table for the actual c5u1.
I'm using IBM/BNT 8052 network switches. Everything was configured and signed
off by IBM lab services, and as far as I can tell, the configuration looks
correct in xCAT. I've watched the discovery process via KVM and can see the
node get a temporary address and then reassign itself as c5u1 instead of
c6u29. lsdef for c6u29 shows it should be discovered on switch6, switchport
27, which is where it's plugged in (and where it was discovered when it first
came online). The entries in the "switch" table look correct:
# tabdump switch
#node,switch,port,vlan,interface,comments,disable
"oddcolumn","|c(\d+)u(\d+)|switch($1)|","|c(\d+)u(\d+)|($2)|",,,,
"evencolumn","|c(\d+)u(\d+)|switch($1)|","|c(\d+)u(\d+)|($2-2)|",,,,
"c6u29","evencolumn,compute,idataplex,rack6,c6,all,m7912","noping","04-28-2014 08:27:23","xend=down,sshd=up,rdp=down,https=down,pbs=up,msrpc=down","04-28-2014 10:00:23",,,"synced","04-17-2014 13:46:56",,
# lsdef -t node -o c6u29
Object name: c6u29
appstatus=xend=down,sshd=up,rdp=down,https=down,pbs=up,msrpc=down
appstatustime=04-28-2014 10:03:19
arch=x86_64
bmc=c6u29-bmc
bmcport=0
chain=runcmd=bmcsetup,shell
currstate=netboot rhel6_c-x86_64-compute
groups=evencolumn,compute,idataplex,rack6,c6,all,m7912
height=This creates U descriptions in the form A-1 to A-42 and C-1 to C-42
initrd=xcat/osimage/rhel6_c-x86_64-netboot-compute/initrd-stateless.gz
ip=172.20.107.29
kcmdline=imgurl=http://!myipfn!:80//install/netboot/rhel6_c/x86_64/compute/rootimg.gz XCAT=!myipfn!:3001 NODE=c6u29 FC=yes console=tty0 console=ttyS0,115200
kernel=xcat/osimage/rhel6_c-x86_64-netboot-compute/kernel
mgt=ipmi
mtm=7912AC1
netboot=xnba
ondiscover=nodediscover
os=rhel6_c
otherinterfaces=Compute nodes eth0 interface
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles,setupntp,addsiteyum,ipoib,ramdrive,torque
profile=compute
provmethod=rhel6_c-x86_64-netboot-compute
rack=1
serial=KQ3PL8T
serialport=0
serialspeed=115200
status=noping
statustime=04-28-2014 08:27:23
supportedarchs=x86,x86_64
switch=switch6
switchport=27
unit=A6
updatestatus=synced
updatestatustime=04-17-2014 13:46:56
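
For readers unfamiliar with the regular-expression rows in the "switch"
table, the following Python sketch mimics how they appear to turn a node name
into a switch and port. It is an illustration under the assumption that
$1 and $2 are the captured rack and unit numbers and that ($2-2) is evaluated
arithmetically; the function name and the PORT_OFFSET dictionary are
hypothetical, and the assumption that c5u1 belongs to "oddcolumn" is not
stated above.

```python
import re

# Same pattern as in the switch table rows: c<rack>u<unit>.
PATTERN = re.compile(r"c(\d+)u(\d+)")

# Port offset per group, read off the two table rows:
#   oddcolumn  -> ($2)
#   evencolumn -> ($2-2)
PORT_OFFSET = {"oddcolumn": 0, "evencolumn": 2}


def switch_and_port(node, group):
    """Return the (switch, port) a node name expands to for a given group."""
    match = PATTERN.fullmatch(node)
    if match is None:
        raise ValueError(f"{node!r} does not look like cXuY")
    rack, unit = int(match.group(1)), int(match.group(2))
    return f"switch{rack}", unit - PORT_OFFSET[group]


# c6u29 is in "evencolumn" according to the lsdef output.
print(switch_and_port("c6u29", "evencolumn"))  # ('switch6', 27)

# Assumption: c5u1 is in "oddcolumn".
print(switch_and_port("c5u1", "oddcolumn"))    # ('switch5', 1)
```

The first result agrees with the switch=switch6 and switchport=27 attributes
shown by lsdef.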
/var/log/messages on the xcat master shows:
Apr 28 08:53:12 sol-mgmt dhcpd: DHCPDISCOVER from 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:12 sol-mgmt dhcpd: DHCPOFFER on 172.20.255.114 to 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:12 sol-mgmt dhcpd: DHCPREQUEST for 172.20.255.114 (172.20.0.1) from 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:12 sol-mgmt dhcpd: DHCPACK on 172.20.255.114 to 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:23 sol-mgmt xCAT[4071]: xcatd: Processing discovery request from 172.20.255.114
Apr 28 08:53:23 sol-mgmt xCAT[4071]: xcatd: Processing discovery request from 172.20.255.114
Apr 28 08:53:38 sol-mgmt xCAT node discovery[4071]: c5u1 has been discovered
Apr 28 08:53:41 sol-mgmt dhcpd: DHCPDISCOVER from 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:41 sol-mgmt dhcpd: DHCPOFFER on 172.20.106.1 to 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:41 sol-mgmt dhcpd: DHCPREQUEST for 172.20.106.1 (172.20.0.1) from 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:41 sol-mgmt dhcpd: DHCPACK on 172.20.106.1 to 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:47 sol-mgmt xCAT[32472]: xCAT: Allowing getcredentials x509cert from c5u1
Apr 28 08:53:48 sol-mgmt xCAT[32512]: xCAT: Allowing nextdestiny for c5u1 from c5u1
Apr 28 08:53:54 sol-mgmt xCAT[32548]: xCAT: Allowing getbmcconfig for c5u1 from c5u1
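
One way to pull the relevant facts out of a log like this is a small script.
The Python sketch below is only an illustration: the regular expressions are
written against the three sample lines embedded in it (copied from the log
above), and the function name trace is hypothetical. It collects the
addresses acknowledged for one MAC and the node names reported as discovered.

```python
import re

# Three lines copied from the log above.
LOG = """\
Apr 28 08:53:12 sol-mgmt dhcpd: DHCPACK on 172.20.255.114 to 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:38 sol-mgmt xCAT node discovery[4071]: c5u1 has been discovered
Apr 28 08:53:41 sol-mgmt dhcpd: DHCPACK on 172.20.106.1 to 40:f2:e9:04:84:c8 via eth0.500
"""

ACK_RE = re.compile(r"DHCPACK on (\S+) to (\S+) via")
FOUND_RE = re.compile(r"node discovery\[\d+\]: (\S+) has been discovered")


def trace(log_text, mac):
    """Return (acknowledged addresses for mac, discovered node names)."""
    leases, names = [], []
    for line in log_text.splitlines():
        ack = ACK_RE.search(line)
        if ack and ack.group(2).lower() == mac.lower():
            leases.append(ack.group(1))
        found = FOUND_RE.search(line)
        if found:
            names.append(found.group(1))
    return leases, names


leases, names = trace(LOG, "40:f2:e9:04:84:c8")
print(leases)  # ['172.20.255.114', '172.20.106.1']
print(names)   # ['c5u1']

# lsdef lists ip=172.20.107.29 for c6u29, so the final lease is not the
# address this node is supposed to get.
print(leases[-1] == "172.20.107.29")  # False
```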
The BNT8052 shows that mac address on switch6, port 27 fine:
switch6#show mac-address-table | include 40:f2:e9:04:84:c8
40:f2:e9:04:84:c8 500 27 FWD N
Something seems off in how xCAT does node discovery with switches, but we used
this procedure when we brought the nodes up originally. I've verified I can
snmpwalk all my BNT8052s. Are there other troubleshooting procedures you can
recommend for determining why xCAT thinks c6u29 is c5u1?
Thanks,
-Brad Viviano