I wanted to post a follow-up: I have a workaround, and here are the details in
case someone else experiences this issue.  Credit for this goes to Jarrod
Johnson.


Versions 7.x and later of the BNT switch firmware do not correctly report
Ethernet vs. Port Channel details over SNMP.  During xCAT auto-discovery, xCAT
found the MAC address of c6u29 (which is on switch6) on switch5, Port Channel 1.
Because of the SNMP bug in the BNT firmware, switch5 reported that MAC as
attached to Ethernet port 1 instead of Port Channel 1, which caused xCAT to see
all the nodes on switch6 as c5u1.  We didn't experience this during the
original discovery process because we were running 6.5.x firmware on our BNT
switches, which doesn't appear to be susceptible.


The resolution was to renumber the port channel to a number that doesn't map to
a used Ethernet port.  We opted for Port Channel 48, which isn't a used port on
any of our switches, and there is no "cXu48" defined anywhere in our xCAT
config.  Once we did that, discovery worked without issue.
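To make the failure mode concrete, here is a simplified Python model.  It is
not xCAT's actual discovery code: the PORT_TO_NODE map and the candidates()
helper are hypothetical.  It assumes the MAC of c6u29 is visible both on its
real access port (switch6, port 27) and on switch5's port channel, and that
each (switch, port) sighting is resolved to a node through the switch table.

```python
"""Simplified model of switch-based node discovery (hypothetical, not xCAT code)."""

# Hypothetical (switch, port) -> node map; in xCAT this comes from the switch table.
PORT_TO_NODE = {
    ("switch5", "1"): "c5u1",
    ("switch6", "27"): "c6u29",
}


def candidates(sightings):
    """Return every node whose (switch, port) matches a place the MAC was seen."""
    return [PORT_TO_NODE[s] for s in sightings if s in PORT_TO_NODE]


# c6u29's MAC is learned on its real access port and on switch5's port channel.
# Firmware 7.x reports PortChannel 1 as Ethernet port 1, which collides with c5u1.
buggy = [("switch5", "1"), ("switch6", "27")]
# After renumbering the port channel to 48 (no cXu48 node exists), it matches nothing.
fixed = [("switch5", "48"), ("switch6", "27")]

print(candidates(buggy))  # ['c5u1', 'c6u29']
print(candidates(fixed))  # ['c6u29']
```

With the port channel numbered 1, the bogus sighting lands on c5u1's entry;
numbered 48, it matches no node, so only the genuine switch6 port 27 sighting
remains.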


    -Brad


===================================================
Brad Viviano
High Performance Computing & Scientific Visualization
Lockheed Martin, Supporting the EPA
Research Triangle Park, NC
919-541-2696

HSCSS Task Order Lead - Ravi Nair
919-541-5467 - [email protected]
High Performance Computing Subtask Lead - Durward Jones
919-541-5043 - [email protected]
Environmental Modeling and Visualization Lead - Heidi Paulsen
919-541-1834 - [email protected]
________________________________
From: Viviano, Brad <[email protected]>
Sent: Monday, April 28, 2014 10:26 AM
To: xCAT Users Mailing list
Subject: [xcat-user] Problem with automatic node discovery on iDataplex nodes.


    I am having a problem with automatic node discovery on xCAT 2.8.3 on our
iDataplex cluster.  We replaced the motherboard and processor #1 on a dx360M4
node because of a possible bad memory controller or DIMM slot.  I used
rmnodecfg to delete the node (node name c6u29), which I've done in the past
with no problem.  When I booted the node, it came up as c5u1 in genesis and
overwrote the entry in the "mac" table for the actual c5u1.

    I'm using IBM/BNT 8052 network switches.  Everything was configured and
signed off by IBM Lab Services, and as far as I can tell the configuration
looks correct in xCAT.  I've watched the discovery process via KVM: the node
gets a temporary address and then reassigns itself as c5u1 instead of c6u29.

    lsdef for c6u29 shows it should be discovered on switch6, switchport 27,
which is where it's plugged in (and where it was discovered when it first came
online).  The entries in the "switch" table look correct:


# tabdump switch
#node,switch,port,vlan,interface,comments,disable
"oddcolumn","|c(\d+)u(\d+)|switch($1)|","|c(\d+)u(\d+)|($2)|",,,,
"evencolumn","|c(\d+)u(\d+)|switch($1)|","|c(\d+)u(\d+)|($2-2)|",,,,


"c6u29","evencolumn,compute,idataplex,rack6,c6,all,m7912","noping","04-28-2014 08:27:23","xend=down,sshd=up,rdp=down,https=down,pbs=up,msrpc=down","04-28-2014 10:00:23",,,"synced","04-17-2014 13:46:56",,
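As a sketch of how those two switch-table rows resolve, assuming xCAT's
regex-row convention of matching the pattern against the node name and
evaluating the parenthesized part of the replacement as arithmetic, the
hypothetical helper switch_port() below (not an xCAT function) reproduces the
lookup.  It also assumes c5u1 belongs to the oddcolumn group, which the
tabdump above does not show.

```python
import re

# Mirrors the pattern in the "switch" table rows above: |c(\d+)u(\d+)|...|
NODE_RE = re.compile(r"c(\d+)u(\d+)")


def switch_port(node: str, group: str) -> tuple[str, int]:
    """Hypothetical helper: expand the oddcolumn/evencolumn rows for one node."""
    m = NODE_RE.fullmatch(node)
    if m is None:
        raise ValueError(f"{node!r} does not look like cXuY")
    column, unit = int(m.group(1)), int(m.group(2))
    switch = f"switch{column}"  # switch($1)
    if group == "evencolumn":
        return switch, unit - 2  # ($2-2)
    if group == "oddcolumn":
        return switch, unit  # ($2)
    raise ValueError(f"no switch row for group {group!r}")


print(switch_port("c6u29", "evencolumn"))  # ('switch6', 27)
print(switch_port("c5u1", "oddcolumn"))    # ('switch5', 1)
```

The first result matches the lsdef output (switch6, switchport 27); the second
shows that any MAC reported on switch5 port 1 is attributed to c5u1.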


# lsdef -t node -o c6u29
Object name: c6u29
    appstatus=xend=down,sshd=up,rdp=down,https=down,pbs=up,msrpc=down
    appstatustime=04-28-2014 10:03:19
    arch=x86_64
    bmc=c6u29-bmc
    bmcport=0
    chain=runcmd=bmcsetup,shell
    currstate=netboot rhel6_c-x86_64-compute
    groups=evencolumn,compute,idataplex,rack6,c6,all,m7912
    height=This creates U descriptions in the form A-1 to A-42 and C-1 to C-42
    initrd=xcat/osimage/rhel6_c-x86_64-netboot-compute/initrd-stateless.gz
    ip=172.20.107.29
    kcmdline=imgurl=http://!myipfn!:80//install/netboot/rhel6_c/x86_64/compute/rootimg.gz XCAT=!myipfn!:3001 NODE=c6u29 FC=yes  console=tty0 console=ttyS0,115200
    kernel=xcat/osimage/rhel6_c-x86_64-netboot-compute/kernel
    mgt=ipmi
    mtm=7912AC1
    netboot=xnba
    ondiscover=nodediscover
    os=rhel6_c
    otherinterfaces=Compute nodes eth0 interface
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles,setupntp,addsiteyum,ipoib,ramdrive,torque
    profile=compute
    provmethod=rhel6_c-x86_64-netboot-compute
    rack=1
    serial=KQ3PL8T
    serialport=0
    serialspeed=115200
    status=noping
    statustime=04-28-2014 08:27:23
    supportedarchs=x86,x86_64
    switch=switch6
    switchport=27
    unit=A6
    updatestatus=synced
    updatestatustime=04-17-2014 13:46:56



/var/log/messages on the xCAT master shows:


Apr 28 08:53:12 sol-mgmt dhcpd: DHCPDISCOVER from 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:12 sol-mgmt dhcpd: DHCPOFFER on 172.20.255.114 to 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:12 sol-mgmt dhcpd: DHCPREQUEST for 172.20.255.114 (172.20.0.1) from 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:12 sol-mgmt dhcpd: DHCPACK on 172.20.255.114 to 40:f2:e9:04:84:c8 via eth0.500

Apr 28 08:53:23 sol-mgmt xCAT[4071]: xcatd: Processing discovery request from 172.20.255.114
Apr 28 08:53:23 sol-mgmt xCAT[4071]: xcatd: Processing discovery request from 172.20.255.114
Apr 28 08:53:38 sol-mgmt xCAT node discovery[4071]: c5u1 has been discovered
Apr 28 08:53:41 sol-mgmt dhcpd: DHCPDISCOVER from 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:41 sol-mgmt dhcpd: DHCPOFFER on 172.20.106.1 to 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:41 sol-mgmt dhcpd: DHCPREQUEST for 172.20.106.1 (172.20.0.1) from 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:41 sol-mgmt dhcpd: DHCPACK on 172.20.106.1 to 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:47 sol-mgmt xCAT[32472]: xCAT: Allowing getcredentials x509cert from c5u1
Apr 28 08:53:48 sol-mgmt xCAT[32512]: xCAT: Allowing nextdestiny for c5u1 from c5u1
Apr 28 08:53:54 sol-mgmt xCAT[32548]: xCAT: Allowing getbmcconfig for c5u1 from c5u1


The BNT 8052 correctly shows that MAC address on switch6, port 27:


switch6#show mac-address-table | include 40:f2:e9:04:84:c8
  40:f2:e9:04:84:c8     500    27              FWD                  N


Something seems off in how xCAT does node discovery with switches, but we used
this same procedure when we brought the nodes up originally.  I've verified I
can snmpwalk all my BNT 8052s.  So, are there any other troubleshooting
procedures you can recommend for determining why xCAT thinks c6u29 is c5u1?
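One way to compare what a switch reports over SNMP with what its CLI shows is
to query the standard BRIDGE-MIB forwarding table (dot1dTpFdbPort, OID
1.3.6.1.2.1.17.4.3.1.2), which is indexed by the MAC address written as six
decimal octets.  The sketch below only builds that OID.  It assumes, without
confirmation from this thread, that xCAT's switch scan reads this table; the
helper name fdb_port_oid() is hypothetical.

```python
# BRIDGE-MIB dot1dTpFdbPort: the value is the bridge port number for a learned MAC.
FDB_PORT_OID = "1.3.6.1.2.1.17.4.3.1.2"


def fdb_port_oid(mac: str) -> str:
    """Append the MAC, as six decimal octets, to the dot1dTpFdbPort OID."""
    octets = mac.split(":")
    if len(octets) != 6:
        raise ValueError(f"expected six colon-separated octets, got {mac!r}")
    return FDB_PORT_OID + "." + ".".join(str(int(o, 16)) for o in octets)


print(fdb_port_oid("40:f2:e9:04:84:c8"))
# 1.3.6.1.2.1.17.4.3.1.2.64.242.233.4.132.200
```

Running something like `snmpget -v2c -c <community> switch5 <that OID>` against
each switch (the community string is a placeholder) returns a bridge port
number.  Mapping that through dot1dBasePortIfIndex (1.3.6.1.2.1.17.1.4.1.2) and
then ifName (1.3.6.1.2.1.31.1.1.1.1) shows whether the switch names the port as
an Ethernet port or as a port channel.  On VLAN-aware switches the Q-BRIDGE-MIB
forwarding table may be needed instead.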


    Thanks,

        -Brad Viviano


_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user
