Lissa,

When I tried steps 1 and 2 it worked. I haven't had time to look at it more deeply, so I don't know whether something else changed since the last time I tried or whether XCATBYPASS is what fixed it. I'll let you know if I find anything, but for now don't worry about it.

Mike

On 8/14/13 4:00 AM, Lissa Valletta wrote:
I have a couple of debug ideas.

1) export XCATBYPASS=1 (that makes you run without the daemon, for debugging)
2) Run your nodeset command. Do you see errors?
3) unset XCATBYPASS
   service xcatd stop
   /opt/xcat/sbin/xcatd -f (that runs the daemon in the foreground)
   In another window, run nodeset. Do you see any errors being displayed by the daemon?
4) Get out of this by running service xcatd start in the other window. That should shut down the xcatd -f and bring you back up normally. If not, run ps -ef | grep xcatd, kill -9 any xcatd process, and then run service xcatd restart.

If none of this works, can we get a copy of your database? I would run tabprune -a auditlog (if you are using auditlog) and then dumpxCATdb, tar and compress it, and send it to lis...@us.ibm.com. The only other option is for me to get on the system. You can send a note to that same userid.

Are you just using sqlite (the default database)?

Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102

From: Michael Robbert <mrobb...@mines.edu>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Date: 08/09/2013 11:33 AM
Subject: Re: [xcat-user] New nodes not deploying
------------------------------------------------------------------------

Here is a diff comparing a working and non-working node.

[root@mgmt ~]# diff /tmp/compute004.def /tmp/compute084.def
1c1
< Object name: compute004
---
> Object name: compute084
3c3
<     bmc=compute004-bmc
---
>     bmc=compute084-bmc
7c7,8
<     currstate=netboot centos6.3-x86_64-compute
---
>     currchain=runcmd=standby
>     currstate=runcmd=standby
9c10
<     initrd=xcat/osimage/mycomputeimage/initrd-stateless.gz
---
>     initrd=xcat/genesis.fs.x86_64.lzma
11,14c12,15
<     ip=172.17.8.4
<     kcmdline=imgurl=http://172.17.0.1:80//install/netboot/centos6.3/x86_64/compute/rootimg.gz XCAT=!myipfn!:3001 NODE=compute004 ifname=eth0:00:30:48:f2:87:c4 netdev=eth0 console=tty0 console=ttyS0,115200n8r
<     kernel=xcat/osimage/mycomputeimage/kernel
<     mac=00:30:48:f2:87:c4
---
>     ip=172.17.8.84
>     kcmdline=quiet console=tty0 console=ttyS0,115200 xcatd=172.17.0.1:3001 destiny=runcmd=standby
>     kernel=xcat/genesis.kernel.x86_64
>     mac=00:25:90:19:92:52
21c22
<     otherinterfaces=compute004-bmc:172.17.32.4
---
>     otherinterfaces=compute084-bmc:172.17.32.84
32,33c33,34
<     status=failed
<     statustime=08-05-2013 16:49:52
---
>     status=configuring
>     statustime=08-08-2013 13:35:03
35,36d35
<     updatestatus=synced
<     updatestatustime=07-29-2013 15:36:16

The only differences I see are IPs and node-unique numbering, plus the things that I think nodeset should be changing for me. I run this:

[root@mgmt ~]# nodeset compute084 osimage=mycomputeimage
compute084: netboot centos6.3-x86_64-compute

and nothing changes. I have tried running xcatdebug to see if I could see what is happening under the covers, but when I do, the daemon stops responding to commands and needs to be restarted.

Also, regarding the switch name: it is being populated by the nodediscovery process. I'm not using switch discovery though, just sequential discovery.
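
To compare the two saved definitions by attribute name rather than by value, something like the following might help. It is only a minimal sketch: it assumes the files are plain lsdef output saved as in the diff above (one attr=value per line), and attr_names and only_in_first are hypothetical helper names, not xCAT commands.

```shell
#!/bin/sh
# Sketch: find attribute names that appear in one saved lsdef dump but not in another.
# Assumption: each dump holds one "attr=value" pair per line, as lsdef prints them.

# Print the sorted, unique attribute names (the text before the first '=') in a dump.
# Lines without '=' (such as "Object name: ...") are skipped.
attr_names() {
    sed -n 's/^[[:space:]]*\([^=[:space:]][^=]*\)=.*/\1/p' "$1" | sort -u
}

# Print the attribute names present in file $1 but missing from file $2.
only_in_first() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    attr_names "$1" > "$tmp_a"
    attr_names "$2" > "$tmp_b"
    comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

Running only_in_first /tmp/compute084.def /tmp/compute004.def would list the attributes that are set on the non-working node but not on the working one; swapping the arguments lists the ones that are missing from the non-working node.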

Mike

________________________________________
From: Lissa Valletta [lis...@us.ibm.com]
Sent: Friday, August 09, 2013 7:55
To: xCAT Users Mailing list
Subject: Re: [xcat-user] New nodes not deploying

If the last suggestion does not work, then I would take the two lsdef outputs and remove any attribute in the bad node that is not in the good node, until you have exactly the same attributes defined in the bad node as in the good node.

Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102

From: Michael Robbert <mrobb...@mines.edu>
To: <xcat-user@lists.sourceforge.net>
Date: 08/08/2013 03:30 PM
Subject: Re: [xcat-user] New nodes not deploying
________________________________

Didn't work. I recreated dhcp with those commands. I had run makedhcp before, but probably not with -n. The nodes are and were showing up in the leases file, so that didn't appear to be the problem. I also made some changes to the nodehm table so that the serial attributes are showing up on the new nodes. nodeset still runs without error, but doesn't change the tftp config file for the node. I'm just not sure how to debug this. It is just failing silently.

Thanks for any tips,
Mike

On 8/8/13 5:16 AM, Lissa Valletta wrote:
> Thanks for all the good information!
>
> Did you run makedhcp -a to pick up the new nodes? Also, you will need
> to run makeconservercf to pick up the new nodes.
> You might need to run makedhcp -n followed by makedhcp -a.
>
> Also, these attributes are missing in the new nodes:
> serialflow=hard
> serialport=0
> serialspeed=115200
>
>
> Lissa K. Valletta
> 8-3/B10
> Poughkeepsie, NY 12601
> (tie 293) 433-3102
>
> From: Michael Robbert <mrobb...@mines.edu>
> To: <xcat-user@lists.sourceforge.net>
> Date: 08/07/2013 06:30 PM
> Subject: [xcat-user] New nodes not deploying
>
> ------------------------------------------------------------------------
>
> I've got a small test cluster with working stateless nodes. Recently I
> tried to add 2 more nodes and I can't get them to deploy the same
> stateless image. For some reason the tftp configuration files are
> getting touched when I run nodeset, but not changed to point to the
> correct boot images. They are staying with genesis boot images.
> I have tried various incarnations of nodeadd and nodediscover, always
> followed by a nodeset $nodename osimage=mycomputeimage
> The nodeset command will change the /tftpboot/xcat/xnba/nodes/$nodename
> file for the working nodes and it updates the timestamp for the
> non-working nodes, but the file still points to the genesis boot images.
> Am I missing a step?
>
> Here is my setup.
> xCAT server:
>
> [root@mgmt nodes]# cat /etc/redhat-release
> CentOS release 6.4 (Final)
>
> [root@mgmt nodes]# rpm -qa|grep -i xcat
> ipmitool-xcat-1.8.11-3.x86_64
> xCAT-buildkit-2.8.2-snap201307222332.noarch
> xCAT-UI-2.8.2-snap201307222329.noarch
> yaboot-xcat-1.3.17-rc1.noarch
> xCAT-client-2.8.2-snap201307222328.noarch
> xCAT-2.8.2-snap201307222333.x86_64
> xCAT-genesis-base-x86_64-2.8-snap201305300347.noarch
> perl-xCAT-2.8.2-snap201307222328.noarch
> conserver-xcat-8.1.16-10.x86_64
> xCAT-UI-deps-2.8-2.noarch
> syslinux-xcat-3.86-2.noarch
> xCAT-genesis-scripts-x86_64-2.8.2-snap201307222333.noarch
> elilo-xcat-3.14-4.noarch
> openslp-xcat-1.2.1-1.x86_64
> xCAT-server-2.8.2-snap201307222328.noarch
>
> [root@mgmt nodes]# lsdef -t osimage mycomputeimage
> Object name: mycomputeimage
>     exlist=/install/custom/netboot/centos/compute.exlist
>     imagetype=linux
>     osarch=x86_64
>     osname=Linux
>     osvers=centos6.3
>     otherpkgdir=/install/post/otherpkgs/centos6.3/x86_64
>     otherpkglist=/install/custom/netboot/centos/compute.otherpkgs.pkglist
>     permission=755
>     pkgdir=/install/centos6.3/x86_64
>     pkglist=/install/custom/netboot/centos/compute.pkglist
>     postbootscripts=configiba
>     postinstall=/install/custom/netboot/centos/compute.postinstall
>     postscripts=configiba,syncfiles
>     profile=compute
>     provmethod=netboot
>     rootimgdir=/install/netboot/centos6.3/x86_64/compute
>     synclists=/install/custom/netboot/centos/compute.synclist
>
> This is a previously configured and currently working node:
> [root@mgmt nodes]# lsdef compute004
> Object name: compute004
>     arch=x86_64
>     bmc=compute004-bmc
>     bmcport=0
>     chain=runcmd=standby
>     cons=ipmi
>     currstate=netboot centos6.3-x86_64-compute
>     groups=compute,ipmi,all
>     initrd=xcat/osimage/mycomputeimage/initrd-stateless.gz
>     installnic=eth0
>     ip=172.17.8.4
>     kcmdline=imgurl=http://172.17.0.1:80//install/netboot/centos6.3/x86_64/compute/rootimg.gz XCAT=!myipfn!:3001 NODE=compute004 ifname=eth0:00:30:48:f2:87:c4 netdev=eth0 console=tty0 console=ttyS0,115200n8r
>     kernel=xcat/osimage/mycomputeimage/kernel
>     mac=00:30:48:f2:87:c4
>     mgt=ipmi
>     netboot=xnba
>     nfsserver=172.17.0.1
>     nodetype=osi
>     ondiscover=nodediscover
>     os=centos6.3
>     otherinterfaces=compute004-bmc:172.17.32.4
>     postbootscripts=otherpkgs,setroute
>     postscripts=syslog,remoteshell,syncfiles,setupntp,addexternalyum,configiba,orangefs,setroute
>     power=ipmi
>     primarynic=eth0
>     profile=compute
>     provmethod=mycomputeimage
>     routenames=ct_gc,ct_gc_eth
>     serialflow=hard
>     serialport=0
>     serialspeed=115200
>     status=failed
>     statustime=08-05-2013 16:49:52
>     tftpserver=172.17.0.1
>     updatestatus=synced
>     updatestatustime=07-29-2013 15:36:16
>
> This is one of the new nodes that isn't working:
> [root@mgmt nodes]# lsdef compute084
> Object name: compute084
>     arch=x86_64
>     chain=runcmd=standby
>     currchain=runcmd=standby
>     currstate=runcmd=standby
>     groups=compute
>     initrd=xcat/genesis.fs.x86_64.lzma
>     installnic=eth0
>     ip=172.17.8.84
>     kcmdline=quiet xcatd=172.17.0.1:3001 destiny=runcmd=standby
>     kernel=xcat/genesis.kernel.x86_64
>     mac=00:25:90:19:92:52
>     mgt=ipmi
>     netboot=xnba
>     nfsserver=172.17.0.1
>     nodetype=osi
>     ondiscover=nodediscover
>     os=centos6.3
>     otherinterfaces=compute084-bmc:172.17.32.84
>     postbootscripts=otherpkgs,setroute
>     postscripts=syslog,remoteshell,syncfiles,setupntp,addexternalyum,configiba,orangefs,setroute
>     primarynic=eth0
>     profile=compute
>     provmethod=mycomputeimage
>     routenames=ct_gc,ct_gc_eth
>     serial=P1715140
>     status=booting
>     statustime=08-07-2013 14:43:40
>     supportedarchs=x86,x86_64
>     switch=Binary file (standard input) matches
>     switchport=Binary file (standard input) matches
>     tftpserver=172.17.0.1
>
> Let me know what else you want to see.
>
> Thanks,
> Mike Robbert
> Colorado School of Mines
>
> ------------------------------------------------------------------------------
> Get 100% visibility into Java/.NET code with AppDynamics Lite!
> It's a free troubleshooting tool designed for production.
> Get down to code-level detail for bottlenecks, with <2% overhead.
> Download for free and get started troubleshooting in minutes.
> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk
> _______________________________________________
> xCAT-user mailing list
> xCAT-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/xcat-user