I am having a problem with automatic node discovery on xCAT 2.8.3 on our 
iDataPlex cluster.  We replaced the motherboard and processor #1 on a dx360M4 
node (node name c6u29) because of a possible bad memory controller or DIMM 
slot.  I used rmnodecfg to delete the node, which I've done in the past with 
no problem.  When I booted the node, however, it came up as c5u1 in genesis 
and overwrote the entry in the "mac" table for the actual c5u1.

    I'm using IBM/BNT 8052 network switches.  Everything was configured and 
signed off by IBM Lab Services and, as far as I can tell, the configuration 
looks correct in xCAT.  I've watched the discovery process via KVM: the node 
gets a temporary address and then reassigns itself as c5u1 instead of c6u29.

    lsdef for c6u29 shows it should be discovered on switch6, switchport 27, 
which is where it's plugged in (and where it was discovered when it first came 
online).  The entries in the "switch" table look correct:


# tabdump switch
#node,switch,port,vlan,interface,comments,disable
"oddcolumn","|c(\d+)u(\d+)|switch($1)|","|c(\d+)u(\d+)|($2)|",,,,
"evencolumn","|c(\d+)u(\d+)|switch($1)|","|c(\d+)u(\d+)|($2-2)|",,,,


"c6u29","evencolumn,compute,idataplex,rack6,c6,all,m7912","noping","04-28-2014 08:27:23","xend=down,sshd=up,rdp=down,https=down,pbs=up,msrpc=down","04-28-2014 10:00:23",,,"synced","04-17-2014 13:46:56",,
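One way to sanity-check the "switch" table expressions is to evaluate them by hand for a given node name. The sketch below is a simplified Python emulation of the |pattern|replacement| form shown above, not xCAT's own Perl code; the function names are hypothetical, and it only handles the "$N" references and simple integer arithmetic that appear in these two rows.

```python
import operator
import re

# Rows copied from the "tabdump switch" output above: group -> (switch, port).
SWITCH_TABLE = {
    "oddcolumn": (r"|c(\d+)u(\d+)|switch($1)|", r"|c(\d+)u(\d+)|($2)|"),
    "evencolumn": (r"|c(\d+)u(\d+)|switch($1)|", r"|c(\d+)u(\d+)|($2-2)|"),
}

_OPS = {"+": operator.add, "-": operator.sub,
        "*": operator.mul, "/": operator.floordiv}


def _arith(m):
    """Evaluate "(a)" or "(a op b)" for non-negative integers."""
    left = int(m.group(1))
    if m.group(2) is None:
        return str(left)
    return str(_OPS[m.group(2)](left, int(m.group(3))))


def eval_table_expr(node, spec):
    """Simplified evaluation of a '|pattern|replacement|' value for a node."""
    _, pattern, replacement, _ = spec.split("|")
    match = re.search(pattern, node)
    if match is None:
        raise ValueError(f"{node!r} does not match {pattern!r}")
    # Substitute $1, $2, ... with the captured groups.
    text = re.sub(r"\$(\d+)", lambda g: match.group(int(g.group(1))), replacement)
    # Then evaluate any parenthesised integer arithmetic.
    return re.sub(r"\((\d+)(?:\s*([-+*/])\s*(\d+))?\)", _arith, text)


def expected_switch_port(node, group):
    """Return the (switch, port) the table rows imply for this node and group."""
    switch_spec, port_spec = SWITCH_TABLE[group]
    return eval_table_expr(node, switch_spec), eval_table_expr(node, port_spec)


if __name__ == "__main__":
    print(expected_switch_port("c6u29", "evencolumn"))
    print(expected_switch_port("c5u1", "oddcolumn"))
```

Under these assumptions, c6u29 in "evencolumn" resolves to switch6, port 27, which agrees with the lsdef output, and c5u1 in "oddcolumn" resolves to switch5, port 1.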


# lsdef -t node -o c6u29
Object name: c6u29
    appstatus=xend=down,sshd=up,rdp=down,https=down,pbs=up,msrpc=down
    appstatustime=04-28-2014 10:03:19
    arch=x86_64
    bmc=c6u29-bmc
    bmcport=0
    chain=runcmd=bmcsetup,shell
    currstate=netboot rhel6_c-x86_64-compute
    groups=evencolumn,compute,idataplex,rack6,c6,all,m7912
    height=This creates U descriptions in the form A-1 to A-42 and C-1 to C-42
    initrd=xcat/osimage/rhel6_c-x86_64-netboot-compute/initrd-stateless.gz
    ip=172.20.107.29
    
    kcmdline=imgurl=http://!myipfn!:80//install/netboot/rhel6_c/x86_64/compute/rootimg.gz XCAT=!myipfn!:3001 NODE=c6u29 FC=yes  console=tty0 console=ttyS0,115200
    kernel=xcat/osimage/rhel6_c-x86_64-netboot-compute/kernel
    mgt=ipmi
    mtm=7912AC1
    netboot=xnba
    ondiscover=nodediscover
    os=rhel6_c
    otherinterfaces=Compute nodes eth0 interface
    postbootscripts=otherpkgs
    
    postscripts=syslog,remoteshell,syncfiles,setupntp,addsiteyum,ipoib,ramdrive,torque
    profile=compute
    provmethod=rhel6_c-x86_64-netboot-compute
    rack=1
    serial=KQ3PL8T
    serialport=0
    serialspeed=115200
    status=noping
    statustime=04-28-2014 08:27:23
    supportedarchs=x86,x86_64
    switch=switch6
    switchport=27
    unit=A6
    updatestatus=synced
    updatestatustime=04-17-2014 13:46:56



/var/log/messages on the xCAT master shows:


Apr 28 08:53:12 sol-mgmt dhcpd: DHCPDISCOVER from 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:12 sol-mgmt dhcpd: DHCPOFFER on 172.20.255.114 to 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:12 sol-mgmt dhcpd: DHCPREQUEST for 172.20.255.114 (172.20.0.1) from 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:12 sol-mgmt dhcpd: DHCPACK on 172.20.255.114 to 40:f2:e9:04:84:c8 via eth0.500

Apr 28 08:53:23 sol-mgmt xCAT[4071]: xcatd: Processing discovery request from 172.20.255.114
Apr 28 08:53:23 sol-mgmt xCAT[4071]: xcatd: Processing discovery request from 172.20.255.114
Apr 28 08:53:38 sol-mgmt xCAT node discovery[4071]: c5u1 has been discovered
Apr 28 08:53:41 sol-mgmt dhcpd: DHCPDISCOVER from 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:41 sol-mgmt dhcpd: DHCPOFFER on 172.20.106.1 to 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:41 sol-mgmt dhcpd: DHCPREQUEST for 172.20.106.1 (172.20.0.1) from 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:41 sol-mgmt dhcpd: DHCPACK on 172.20.106.1 to 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:47 sol-mgmt xCAT[32472]: xCAT: Allowing getcredentials x509cert from c5u1
Apr 28 08:53:48 sol-mgmt xCAT[32512]: xCAT: Allowing nextdestiny for c5u1 from c5u1
Apr 28 08:53:54 sol-mgmt xCAT[32548]: xCAT: Allowing getbmcconfig for c5u1 from c5u1
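To line up the DHCP leases with the discovery result, the log above can be reduced to a short timeline. This is a minimal sketch whose regular expressions are written against the message formats quoted in this post only (other syslog or dhcpd configurations may differ); the function name is hypothetical, and the sample text is three of the entries above, each on a single line.

```python
import re

# Patterns based only on the message formats quoted in this post.
ACK_RE = re.compile(r"dhcpd: DHCPACK on (\S+) to ([0-9a-fA-F:]{17}) via \S+")
DISCOVERED_RE = re.compile(
    r"xCAT node discovery\[\d+\]: (\S+) has been discovered")

SAMPLE_LOG = """\
Apr 28 08:53:12 sol-mgmt dhcpd: DHCPACK on 172.20.255.114 to 40:f2:e9:04:84:c8 via eth0.500
Apr 28 08:53:38 sol-mgmt xCAT node discovery[4071]: c5u1 has been discovered
Apr 28 08:53:41 sol-mgmt dhcpd: DHCPACK on 172.20.106.1 to 40:f2:e9:04:84:c8 via eth0.500
"""


def discovery_timeline(log_text):
    """Return DHCP ACK and discovery events in the order they were logged."""
    events = []
    for line in log_text.splitlines():
        ack = ACK_RE.search(line)
        if ack:
            # ("ack", mac, ip)
            events.append(("ack", ack.group(2).lower(), ack.group(1)))
            continue
        found = DISCOVERED_RE.search(line)
        if found:
            # ("discovered", node_name)
            events.append(("discovered", found.group(1)))
    return events


if __name__ == "__main__":
    for event in discovery_timeline(SAMPLE_LOG):
        print(event)
```

For the sample, the timeline shows the same MAC first leased 172.20.255.114, then reported as c5u1, then leased 172.20.106.1.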


The BNT 8052 shows that MAC address on switch6, port 27, as expected:


switch6#show mac-address-table | include 40:f2:e9:04:84:c8
  40:f2:e9:04:84:c8     500    27              FWD                  N
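The same check can be scripted against the switch output. The sketch below assumes the first three whitespace-separated columns of the row above are MAC address, VLAN and port (the post does not show the column headers), and the helper names are hypothetical.

```python
# Row copied from the "show mac-address-table" output above.
ROW = "  40:f2:e9:04:84:c8     500    27              FWD                  N"


def parse_mac_table_row(row):
    """Split one row into MAC, VLAN and port (assumed column order)."""
    fields = row.split()
    if len(fields) < 3:
        raise ValueError(f"unexpected row format: {row!r}")
    return {"mac": fields[0].lower(), "vlan": int(fields[1]), "port": fields[2]}


def port_matches(row, mac, expected_port):
    """True if the row is for this MAC and shows it on the expected port."""
    entry = parse_mac_table_row(row)
    return entry["mac"] == mac.lower() and entry["port"] == str(expected_port)


if __name__ == "__main__":
    # switchport=27 is the value lsdef reports for c6u29.
    print(port_matches(ROW, "40:f2:e9:04:84:c8", 27))
```

With switchport=27 from the lsdef output, the row matches, so the switch itself reports the node where xCAT's definition of c6u29 expects it.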


Something seems off in how xCAT does switch-based node discovery, but we used 
this same procedure when we brought the nodes up originally.  I've verified 
that I can snmpwalk all of my BNT 8052s.  Are there any other troubleshooting 
steps you can recommend for determining why xCAT thinks c6u29 is c5u1?


    Thanks,

        -Brad Viviano


===================================================
Brad Viviano
High Performance Computing & Scientific Visualization
Lockheed Martin, Supporting the EPA
Research Triangle Park, NC
919-541-2696

HSCSS Task Order Lead - Ravi Nair
919-541-5467 - [email protected]
High Performance Computing Subtask Lead - Durward Jones
919-541-5043 - [email protected]
Environmental Modeling and Visualization Lead - Heidi Paulsen
919-541-1834 - [email protected]