Thanks for the help! This is the last thing I have to get working to complete this engagement!
Ran xdsh service -K but no hostkeys were copied. The /etc/xcat/hostkeys directory exists on the MN, and the noderes.xcatmaster and noderes.servicenode are the IP of the MN. I verified the root password on both nodes by logging in through the console. The xdsh command came back with:

/usr/bin/ssh setup is complete
return code = 0

Running xdsh service date and xdsh service tabdump site come back with the correct results.

sn01: rpm -qa | grep xCAT
xCAT-nbroot-oss-x86-2.0-snap200804021050
xCAT-nbroot-oss-ppc64-2.0-snap200801291320
perl-xCAT-2.5.2-snap201103041120
xCAT-nbkernel-x86-2.6.18_164-8
xCAT-nbroot-oss-x86_64-2.0-snap200801291344
xCAT-nbroot-core-x86_64-2.5.1-snap201011121008
xCAT-nbroot-core-x86-2.5.1-snap201011121008
xCAT-nbroot-core-ppc64-2.5.1-snap201011121008
xCAT-client-2.5.2-snap201102251840
xCAT-nbkernel-x86_64-2.6.18_164-8
xCAT-nbkernel-ppc64-2.6.18_92-4
xCATsn-2.5.1-snap201011101325
xCAT-server-2.5.2-snap201103041121
xCATsn-2.5.1-snap201011101325

sn02: rpm -qa | grep xCAT
xCAT-nbroot-oss-x86-2.0-snap200804021050
xCAT-nbroot-oss-ppc64-2.0-snap200801291320
perl-xCAT-2.5.2-snap201103041120
xCAT-nbkernel-x86-2.6.18_164-8
xCAT-nbroot-oss-x86_64-2.0-snap200801291344
xCAT-nbroot-core-x86_64-2.5.1-snap201011121008
xCAT-nbroot-core-x86-2.5.1-snap201011121008
xCAT-nbroot-core-ppc64-2.5.1-snap201011121008
xCAT-client-2.5.2-snap201102251840
xCAT-nbkernel-x86_64-2.6.18_164-8
xCAT-nbkernel-ppc64-2.6.18_92-4
xCATsn-2.5.1-snap201011101325
xCAT-server-2.5.2-snap201103041121
xCATsn-2.5.1-snap201011101325

As for the DHCP, I gathered from an earlier email from Linda Mellor on the recent thread on SNs:

"And the network possibilities with all of this can start to be mind-boggling, but we try to address the most common ones as best we can:
- the entire cluster on one flat network.
This means there will be multiple DHCP servers (and tftp servers), and xCAT needs to configure any DHCP server to respond correctly to a broadcast request on the network, so all dhcpd.leases files will need to be identical, with the "next-server" value set to the designated tftpserver for a given node."

When I do nodeset node001 install, the tftpboot/pxelinux.cfg/node001 file shows sn02 as the host it's pulling its files from. Also, in /install/autoinst on the MN and both SNs, the kickstart file shows url=http://sn02/... and all that appears to work. When I manually nodeset node001 boot and it boots up to the OS, I can see that DNS is working properly.

Could it be that DNS is supposed to be configured on the SNs (I have servicenode.nameserver=1) and it's not configured, even though site.nameservers shows the IP for the xCAT MN (I edited that out of my earlier email)? Should site.nameservers have the SN IPs in there as well for DNS to work? What about site.dhcpinterfaces? Will adding the SNs in there cause makedhcp to build working DHCP configs on the SNs?

Christian D. Caruthers
Linux HPC Consultant
STG Lab Services
757-656-9675

From: Lissa Valletta/Poughkeepsie/IBM@IBMUS
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc: xcat-user@lists.sourceforge.net
Date: 07/21/2011 01:26 PM
Subject: Re: [xcat-user] service nodes node working correctly

So first I think we need to make sure your /etc/xcat/hostkeys are set up correctly on the Service Nodes. That should have been done during the install of the Service Nodes. Not sure what happened there. You can run xdsh <service> -K and that will correctly set up all the credentials and keys on the Service Nodes, assuming you have service as your servicenode group. You will be prompted for root's password.

A couple of things to check after this: can you ssh to the service node with no password prompt, and can you access the database from the service node?
xdsh sn01,sn02 date
xdsh sn01,sn02 tabdump site

One other thing: run rpm -qa | grep xCAT on each service node and return the output. Just checking that the xCATsn-* rpm is on the service nodes, not the xCAT-* rpm, which is for the Management Node. This has happened before.

It looks like you are set up to have a dhcp server running on both service nodes. Isn't this bad on a flat network? I thought we could only have one dhcp server. This is not my strong area.

Having the install directory mounted is typical.

Looking at your definition of node001: you do have the servicenode name the same as xcatmaster. The servicenode is supposed to be the IP address as known by the management node. The xcatmaster is supposed to be the servicenode address as known by node0001. Is it correct for them to be the same? Also, will the xcatmaster name be resolved correctly during install?

servicenode=sn02
xcatmaster=sn02

Lissa K. Valletta
2-3/T12
Poughkeepsie, NY 12601
(tie 293) 433-3102

From: Christian Caruthers/Richmond/IBM@IBMUS
To: xcat-user@lists.sourceforge.net
Date: 07/21/2011 10:50 AM
Subject: [xcat-user] service nodes node working correctly

I have a number of compute nodes, 2 service nodes and an xCAT MN (xCAT 2.5.2) on a flat network. I would like to have all available services running off the service nodes, but I am running into some problems.

When a compute node PXE boots for stateful install (rhel 6.1) it gets a DHCP response from the xCAT MN rather than the SNs. Looking on the SNs, I see an empty dhcpd.leases file. Running makedhcp doesn't resolve this. After the DHCP response, the node pulls its boot image and installation from the correct SN, but it fails in updating its status and reinstalls after rebooting.

If I run nodeset node boot from the MN, some of the postscripts don't appear to run correctly.
For example, remoteshell doesn't run, and when I run it using updatenode from the MN, I get an error:

<error>Unable to read private DSA key from /etc/xcat/hostkeys</error>
<error>Unable to read private RSA key from /etc/xcat/hostkeys</error>

Looking on the SNs, I don't see any /etc/xcat/hostkeys directory. What's supposed to set this up?

Sharing the /install directory. Currently, my SNs are configured to NFS-mount the /install directory from the MN on boot. Is this correct or should they be syncing that directory? I may have missed it, but the wiki page was unclear on this to me.

Finally, looking on the node that was installed by the SN, I see syslog is configured to log to the SN, but I don't see that happening.

nodels sn01 servicenode
qservice01: servicenode.dhcpserver: 1
qservice01: servicenode.tftpserver: 1
qservice01: servicenode.node: sn01
qservice01: servicenode.nameserver: 1
qservice01: servicenode.nimserver: 1
qservice01: servicenode.ftpserver: 1
qservice01: servicenode.conserver: 1
qservice01: servicenode.monserver: 1
qservice01: servicenode.nfsserver: 1
qservice01: servicenode.comments:
qservice01: servicenode.ldapserver:
qservice01: servicenode.ntpserver:
qservice01: servicenode.ipforward:
qservice01: servicenode.disable:

tabdump site
#key,value,comments,disable
"xcatdport","3001",,
"xcatiport","3002",,
"tftpdir","/tftpboot",,
"master","mn01",,
"domain","cluster.net",,
"installdir","/install",,
"timezone","America/Chicago",,
"forwarders","XXX",,
"dhcpinterfaces","bond0",,
"ntpservers","mn01",,
"consoleondemand","yes",,
"sharedtftp","0",,
"nameservers","mn01",,
"installloc","/install",,

nodels node0001 noderes
qgpu0001: noderes.primarynic: eth0
qgpu0001: noderes.xcatmaster: sn02
qgpu0001: noderes.installnic: eth0
qgpu0001: noderes.netboot: pxe
qgpu0001: noderes.servicenode: sn02
qgpu0001: noderes.node: node0001
qgpu0001: noderes.nfsserver: sn02
qgpu0001: noderes.tftpserver:
qgpu0001: noderes.comments:
qgpu0001: noderes.nfsdir:
qgpu0001: noderes.disable:
qgpu0001: noderes.discoverynics:
qgpu0001: noderes.nimserver:
qgpu0001: noderes.cmdinterface:
qgpu0001: noderes.next_osimage:
qgpu0001: noderes.current_osimage:
qgpu0001: noderes.monserver:

lsdef sn02
Object name: sn02
    arch=x86_64
    bmc=sn02-bmc
    bmcport=0
    currchain=boot
    currstate=boot
    groups=service,ipmi,bnt102-service,x3650m2,all
    initrd=xcat/rhels5.4/x86_64/initrd.img
    installnic=eth0
    interface=eth0
    ip=XXXXXX
    kcmdline=nofb utf8 ks=http://mn01/install/autoinst/qservice02 ksdevice=eth0 console=tty0 console=ttyS0,115200 noipv6
    kernel=xcat/rhels5.4/x86_64/vmlinuz
    mac=E4:1F:13:44:F5:9C
    mgt=ipmi
    mtm=7945AC1
    netboot=pxe
    nfsserver=mn01
    os=rhels5.4
    postbootscripts=otherpkgs,setupntp,setupntp
    postscripts=syslog,remoteshell,syncfiles,nwu.service,servicenode,xcatserver,xcatclient
    primarynic=eth0
    profile=service
    provmethod=install
    serial= 06GA470
    serialport=0
    serialspeed=115200
    servicenode=mn01
    setupconserver=1
    setupdhcp=1
    setupftp=1
    setupnameserver=1
    setupnfs=1
    setupnim=1
    setuptftp=1
    status=booting
    statustime=07-20-2011 16:25:39
    switch=bnt102
    switchport=8
    tftpserver=mn01
    xcatmaster=mn01

lsdef node0001
Object name: node0001
    arch=x86_64
    bmc=node0001-bmc
    bmcport=0
    chain=runcmd=bmcsetup,standby
    currchain=boot
    currstate=boot
    groups=gpu,ipmi,dx360m3,gpubnt01,gpurack01,all,allgpu
    initrd=xcat/rhels6.1/x86_64/initrd.img
    installnic=eth0
    interface=eth0
    ip=XXXXXX
    kcmdline=nofb utf8 ks=http://sn02/install/autoinst/qgpu0001 ksdevice=eth0 console=tty0 console=ttyS0,115200n8r noipv6
    kernel=xcat/rhels6.1/x86_64/vmlinuz
    mac=e4:1f:13:f0:80:9c
    mgt=ipmi
    mtm=6391AC1
    netboot=pxe
    nfsserver=sn02
    ondiscover=nodediscover
    os=rhels6.1
    postbootscripts=otherpkgs,setupntp,nwu.ipoib
    postscripts=syslog,remoteshell,syncfiles,nwu.ofed
    primarynic=eth0
    profile=gpu
    provmethod=install
    serial=06CGM96
    serialflow=hard
    serialport=0
    serialspeed=115200
    servicenode=sn02
    status=booted
    statustime=07-20-2011 22:23:36
    supportedarchs=x86,x86_64
    switch=gpubnt01
    switchinterface=eth0
    switchport=1
    switchvlan=1
    xcatmaster=sn02
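
The servicenode/xcatmaster question raised in this thread comes down to which host each service-related attribute of a node definition points at. As a rough aid, here is a minimal Python sketch that reads lsdef-style attr=value text and summarizes those attributes. It assumes the one-attribute-per-line layout that lsdef prints; the names parse_lsdef, service_hosts, mixed_servers and SERVICE_ATTRS are hypothetical helpers made up for illustration and are not part of xCAT.

```python
# Hypothetical helpers, not part of xCAT: summarize which host each
# service-related attribute of a node definition points at.
SERVICE_ATTRS = ("servicenode", "xcatmaster", "tftpserver", "nfsserver")


def parse_lsdef(text):
    """Parse 'attr=value' lines (the layout lsdef prints) into a dict."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            # Skips blank lines and the "Object name: ..." header.
            continue
        # Split on the first '=' only, so values such as kcmdline that
        # contain further '=' characters stay intact.
        key, _, value = line.partition("=")
        attrs[key.strip()] = value.strip()
    return attrs


def service_hosts(attrs):
    """Return {attribute: host} for the service attributes that are set."""
    return {k: attrs[k] for k in SERVICE_ATTRS if attrs.get(k)}


def mixed_servers(attrs):
    """True if the service attributes that are set name more than one host."""
    return len(set(service_hosts(attrs).values())) > 1


if __name__ == "__main__":
    sample = """Object name: node0001
    nfsserver=sn02
    servicenode=sn02
    xcatmaster=sn02
    """
    attrs = parse_lsdef(sample)
    print(service_hosts(attrs))
    print(mixed_servers(attrs))
```

Feeding it the saved output of lsdef for a compute node and for its service node gives a quick side-by-side of where each one expects its services to come from. It only compares names as written, so it cannot tell whether two different names or IPs resolve to the same machine.
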

Christian D. Caruthers
Linux HPC Consultant
STG Lab Services
757-656-9675

------------------------------------------------------------------------------
5 Ways to Improve & Secure Unified Communications
Unified Communications promises greater efficiencies for business. UC can improve internal communications as well as offer faster, more efficient ways to interact with customers and streamline customer service. Learn more!
http://www.accelacomm.com/jaw/sfnl/114/51426253/
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user