First, I think we need to make sure /etc/xcat/hostkeys is set up correctly on the Service Nodes. That should have been done during the install of the Service Nodes; I'm not sure what happened there. You can run xdsh <service> -K and that will set up all the credentials and keys on the Service Nodes correctly, assuming service is your servicenode group. You will be prompted for root's password.
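As a rough aid for the package check requested later in this reply, the sketch below classifies the output of rpm -qa | grep xCAT taken from one node. It assumes the meta-packages show up as xCATsn-<version> on a service node and xCAT-<version> on the Management Node, while packages such as xCAT-client or perl-xCAT may appear on both. The function name classify_xcat_rpms and its output labels are made up for illustration and are not part of xCAT.

```shell
#!/bin/sh
# Hypothetical helper (not an xCAT command): reads package names on stdin,
# one per line, e.g. the output of `rpm -qa | grep xCAT` from a single node.
# Assumption: the meta-packages are named xCATsn-<version> and xCAT-<version>.
classify_xcat_rpms() {
    sn=0
    mn=0
    # The extra test keeps a final line that lacks a trailing newline.
    while IFS= read -r pkg || [ -n "$pkg" ]; do
        case "$pkg" in
            xCATsn-[0-9]*) sn=1 ;;   # service node meta-package
            xCAT-[0-9]*)   mn=1 ;;   # Management Node meta-package
        esac
    done
    if [ "$sn" -eq 1 ] && [ "$mn" -eq 0 ]; then
        echo "service-node"
    elif [ "$sn" -eq 0 ] && [ "$mn" -eq 1 ]; then
        echo "management-node"
    elif [ "$sn" -eq 1 ] && [ "$mn" -eq 1 ]; then
        echo "mixed"
    else
        echo "unknown"
    fi
}
```

One possible use, once passwordless ssh works, is ssh sn01 'rpm -qa | grep xCAT' | classify_xcat_rpms. Under the naming assumption above, a service node that reports anything other than service-node would be worth a closer look.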
A couple of things to check after running xdsh -K: can you ssh to the service node with no password prompt, and can you access the database from the service node?

    xdsh sn01,sn02 date
    xdsh sn01,sn02 tabdump site

One other thing: please run

    rpm -qa | grep xCAT

on each service node and send back the output. I'm just checking that the xCATsn-* rpm is on the service nodes, not the xCAT-* rpm, which is for the Management Node. This has happened before.

It looks like you are set up to have a DHCP server running on both service nodes. Isn't this a bad idea on a flat network? I thought we could only have one DHCP server. This is not my strong area.

Having the install directory mounted is typical.

Looking at your definition of node0001, you have the servicenode name the same as xcatmaster. The servicenode is supposed to be the IP address as known by the management node. The xcatmaster is supposed to be the service node address as known by node0001. Is it correct for them to be the same? Also, will the xcatmaster name be resolved correctly during install?

    servicenode=sn02
    xcatmaster=sn02

Lissa K. Valletta
2-3/T12
Poughkeepsie, NY 12601
(tie 293) 433-3102

From: Christian Caruthers/Richmond/IBM@IBMUS
To: xcat-user@lists.sourceforge.net
Date: 07/21/2011 10:50 AM
Subject: [xcat-user] service nodes node working correctly

I have a number of compute nodes, 2 service nodes and an xCAT MN (xCAT 2.5.2) on a flat network. I would like to have all available services running off the service nodes, but I am running into some problems.

When a compute node PXE boots for stateful install (rhel 6.1) it gets a DHCP response from the xCAT MN rather than the SNs. Looking on the SNs, I see an empty dhcpd.leases file. Running makedhcp doesn't resolve this.

After the DHCP response, the node pulls its boot image and installation from the correct SN, but it fails to update its status and reinstalls after rebooting. If I run nodeset node boot from the MN, some of the postscripts don't appear to run correctly.
For example, remoteshell doesn't run, and when I run it using updatenode from the MN, I get an error:

<error>Unable to read private DSA key from /etc/xcat/hostkeys</error>
<error>Unable to read private RSA key from /etc/xcat/hostkeys</error>

Looking on the SNs, I don't see any /etc/xcat/hostkeys directory. What's supposed to set this up?

Sharing the /install directory. Currently, my SNs are configured to NFS-mount the /install directory from the MN on boot. Is this correct or should they be syncing that directory? I may have missed it, but the wiki page was unclear on this to me.

Finally, looking on the node that was installed by the SN, I see syslog is configured to log to the SN, but I don't see that happening.

nodels sn01 servicenode
qservice01: servicenode.dhcpserver: 1
qservice01: servicenode.tftpserver: 1
qservice01: servicenode.node: sn01
qservice01: servicenode.nameserver: 1
qservice01: servicenode.nimserver: 1
qservice01: servicenode.ftpserver: 1
qservice01: servicenode.conserver: 1
qservice01: servicenode.monserver: 1
qservice01: servicenode.nfsserver: 1
qservice01: servicenode.comments:
qservice01: servicenode.ldapserver:
qservice01: servicenode.ntpserver:
qservice01: servicenode.ipforward:
qservice01: servicenode.disable:

tabdump site
#key,value,comments,disable
"xcatdport","3001",,
"xcatiport","3002",,
"tftpdir","/tftpboot",,
"master","mn01",,
"domain","cluster.net",,
"installdir","/install",,
"timezone","America/Chicago",,
"forwarders","XXX",,
"dhcpinterfaces","bond0",,
"ntpservers","mn01",,
"consoleondemand","yes",,
"sharedtftp","0",,
"nameservers","mn01",,
"installloc","/install",,

nodels node0001 noderes
qgpu0001: noderes.primarynic: eth0
qgpu0001: noderes.xcatmaster: sn02
qgpu0001: noderes.installnic: eth0
qgpu0001: noderes.netboot: pxe
qgpu0001: noderes.servicenode: sn02
qgpu0001: noderes.node: node0001
qgpu0001: noderes.nfsserver: sn02
qgpu0001: noderes.tftpserver:
qgpu0001: noderes.comments:
qgpu0001: noderes.nfsdir:
qgpu0001: noderes.disable:
qgpu0001: noderes.discoverynics:
qgpu0001: noderes.nimserver:
qgpu0001: noderes.cmdinterface:
qgpu0001: noderes.next_osimage:
qgpu0001: noderes.current_osimage:
qgpu0001: noderes.monserver:

lsdef sn02
Object name: sn02
    arch=x86_64
    bmc=sn02-bmc
    bmcport=0
    currchain=boot
    currstate=boot
    groups=service,ipmi,bnt102-service,x3650m2,all
    initrd=xcat/rhels5.4/x86_64/initrd.img
    installnic=eth0
    interface=eth0
    ip=XXXXXX
    kcmdline=nofb utf8 ks=http://mn01/install/autoinst/qservice02 ksdevice=eth0 console=tty0 console=ttyS0,115200 noipv6
    kernel=xcat/rhels5.4/x86_64/vmlinuz
    mac=E4:1F:13:44:F5:9C
    mgt=ipmi
    mtm=7945AC1
    netboot=pxe
    nfsserver=mn01
    os=rhels5.4
    postbootscripts=otherpkgs,setupntp,setupntp
    postscripts=syslog,remoteshell,syncfiles,nwu.service,servicenode,xcatserver,xcatclient
    primarynic=eth0
    profile=service
    provmethod=install
    serial= 06GA470
    serialport=0
    serialspeed=115200
    servicenode=mn01
    setupconserver=1
    setupdhcp=1
    setupftp=1
    setupnameserver=1
    setupnfs=1
    setupnim=1
    setuptftp=1
    status=booting
    statustime=07-20-2011 16:25:39
    switch=bnt102
    switchport=8
    tftpserver=mn01
    xcatmaster=mn01

lsdef node0001
Object name: node0001
    arch=x86_64
    bmc=node0001-bmc
    bmcport=0
    chain=runcmd=bmcsetup,standby
    currchain=boot
    currstate=boot
    groups=gpu,ipmi,dx360m3,gpubnt01,gpurack01,all,allgpu
    initrd=xcat/rhels6.1/x86_64/initrd.img
    installnic=eth0
    interface=eth0
    ip=XXXXXX
    kcmdline=nofb utf8 ks=http://sn02/install/autoinst/qgpu0001 ksdevice=eth0 console=tty0 console=ttyS0,115200n8r noipv6
    kernel=xcat/rhels6.1/x86_64/vmlinuz
    mac=e4:1f:13:f0:80:9c
    mgt=ipmi
    mtm=6391AC1
    netboot=pxe
    nfsserver=sn02
    ondiscover=nodediscover
    os=rhels6.1
    postbootscripts=otherpkgs,setupntp,nwu.ipoib
    postscripts=syslog,remoteshell,syncfiles,nwu.ofed
    primarynic=eth0
    profile=gpu
    provmethod=install
    serial=06CGM96
    serialflow=hard
    serialport=0
    serialspeed=115200
    servicenode=sn02
    status=booted
    statustime=07-20-2011 22:23:36
    supportedarchs=x86,x86_64
    switch=gpubnt01
    switchinterface=eth0
    switchport=1
    switchvlan=1
    xcatmaster=sn02

Christian D. Caruthers
Linux HPC Consultant
STG Lab Services
757-656-9675

------------------------------------------------------------------------------
5 Ways to Improve & Secure Unified Communications
Unified Communications promises greater efficiencies for business. UC can improve internal communications as well as offer faster, more efficient ways to interact with customers and streamline customer service. Learn more!
http://www.accelacomm.com/jaw/sfnl/114/51426253/
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user