Re: [xcat-user] Is there an ifcfg-eth postscript that works on systemd OSes?
I see. Thank you.

-Josh

On Thu, Aug 4, 2016 at 7:15 PM, Christopher Samuel <sam...@unimelb.edu.au> wrote:

> On 05/08/16 02:32, Josh Nielsen wrote:
>
> > It looks like that uses the nics table which is currently empty in my
> > config. The examples on that page show IPs being entered in
> > manually/individually though. How does that relate to the regular
> > expressions I've created for each of my hosts in the 'hosts' table that
> > automatically assign an IP address based on the hostname?
>
> You can use regular expressions in the nics table as well:
>
> #node,nicips,nichostnamesuffixes,nichostnameprefixes,nictypes,niccustomscripts,nicnetworks,nicaliases,comments,disable
> "compute","|\D+(\d+)|ib0!10.2.0.($1*1)|","ib0!-ib",,"ib0!Infiniband",,"ib0!bruce_infiniband",,,
>
> Gives (for example):
>
> [root@bruce-m ~]# lsdef bruce001 | fgrep 10. | fgrep -v kcmdline
>     ip=10.13.0.1
>     nicips.ib0=10.2.0.1
>     otherinterfaces=-bmc:10.12.0.1
>
> [root@bruce-m ~]# lsdef bruce003 | fgrep 10. | fgrep -v kcmdline
>     ip=10.13.0.3
>     nicips.ib0=10.2.0.3
>     otherinterfaces=-bmc:10.12.0.3
>
> All the best,
> Chris
> --
> Christopher Samuel    Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/ http://twitter.com/vlsci

--
___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
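To make the nics-table expression above easier to follow, here is a rough bash sketch of the substitution it appears to perform: capture the digits in the node name, evaluate them as a number (which drops leading zeros), and append the result to 10.2.0. The function name nicips_for_node is hypothetical and not part of xCAT; this only mimics the mapping shown in the lsdef output and is not how xCAT evaluates the table internally.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of xCAT: mimics the result of the nics-table
# expression |\D+(\d+)|ib0!10.2.0.($1*1)| for a given node name.
nicips_for_node() {
    local node=$1
    # [^0-9]+([0-9]+) plays the role of \D+(\d+): non-digits, then captured digits.
    if [[ $node =~ [^0-9]+([0-9]+) ]]; then
        # 10# forces base 10 so "001" becomes 1 instead of being read as octal.
        printf 'ib0!10.2.0.%d\n' "$((10#${BASH_REMATCH[1]}))"
    else
        # No digits after a non-digit prefix: nothing to substitute.
        return 1
    fi
}

nicips_for_node bruce001
nicips_for_node bruce003
```

For bruce001 and bruce003 this prints ib0!10.2.0.1 and ib0!10.2.0.3, which matches the nicips.ib0 values in the lsdef output quoted above.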
Re: [xcat-user] Is there an ifcfg-eth postscript that works on systemd OSes?
Thanks Wang,

It looks like that uses the nics table, which is currently empty in my config. The examples on that page show IPs being entered manually/individually, though. How does that relate to the regular expressions I've created for each of my hosts in the 'hosts' table that automatically assign an IP address based on the hostname?

Josh

On Wed, Aug 3, 2016 at 7:06 PM, Xiao Peng Wang <w...@cn.ibm.com> wrote:

> xCAT recommends using the 'confignics' postscript to achieve that. Refer to
> the doc:
> http://xcat-docs.readthedocs.io/en/stable/guides/admin-guides/manage_clusters/ppc64le/diskful/customize_image/cfg_network_adapter.html
>
> Thanks
> Best Regards
> --
> Wang Xiaopeng (王晓朋)
> IBM China System Technology Laboratory
> Tel: 86-10-82453455
> Email: w...@cn.ibm.com
> Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
> Haidian District Beijing P.R.China 100193
Re: [xcat-user] Is there an ifcfg-eth postscript that works on systemd OSes?
Thanks! Yes, I knew ifconfig was deprecated, which is why I knew this was a hack and was asking. Is that postscript something you wrote yourself?

Thanks,
Josh

On Wed, Aug 3, 2016 at 3:49 PM, Rich Sudlow <r...@nd.edu> wrote:

> I believe all the latest xcat routines use ip addr
>
> as you might know the use of ifconfig is deprecated...
>
> here's a snippet from a postscript which uses ksh
>
> if [[ $OSVER = *rhels7* ]]; then
>     # This just hardcodes the entries which are already set
>     # Change to grep only on "inet " rather than "inet addr" so that rhels7 works - RKS - 8/21/2014
>     for nic in `ifconfig -a|grep -B1 "inet "|awk '{print $1}'|grep -v inet|grep -v -- --|grep -v lo|sed s/:$//`; do
>         echo "Setting up hardeths on rhels7" >> /root/post.log
>         echo NIC $nic
>         echo NIC $nic >> /root/post.log
>         IPADDR=`ifconfig $nic |grep "inet "|awk '{print $2}' |awk -F: '{print $1}'`
>         echo "IPADDR: $IPADDR" >> /root/post.log
>         NETMASK=`ifconfig $nic |grep "inet "|awk '{print $4}' |awk -F: '{print $1}'`
>         echo "NETMASK: $NETMASK" >> /root/post.log
>         sed -i s/BOOTPROTO=dhcp/BOOTPROTO=none/ /etc/sysconfig/network-scripts/ifcfg-$nic
>         sed -i s/ONBOOT=no/ONBOOT=yes/ /etc/sysconfig/network-scripts/ifcfg-$nic
>         echo IPADDR=$IPADDR >> /etc/sysconfig/network-scripts/ifcfg-$nic
>         echo NETMASK=$NETMASK >> /etc/sysconfig/network-scripts/ifcfg-$nic
>         # Remove firewalld since we're using iptables - RKS 9/30/14
>         yum remove -y firewalld | logger -t xcat
>         echo "Done with hardeths on rhels7" >> /root/post.log
>     done
> fi
>
> Hope this helps you out.
>
> --
> Rich Sudlow
> University of Notre Dame
> Center for Research Computing - Union Station
> 506 W. South St
> South Bend, In 46601
>
> (574) 631-7258 (office)
> (574) 807-1046 (cell)
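Since Rich notes that newer xCAT routines use ip addr rather than ifconfig, the following is a rough sketch of how the same NIC and address information could be pulled from the one-line output of ip. The function name parse_ip_oneline is hypothetical, and the sketch assumes the usual "index: name inet addr/prefix ..." layout of ip -o -4 addr show; it is not taken from any xCAT postscript.

```shell
#!/usr/bin/env bash
# Sketch only: reads `ip -o -4 addr show` style lines on stdin and prints
# "<nic> <ipv4-address> <prefix-length>" for every interface except loopback.
parse_ip_oneline() {
    awk '$3 == "inet" && $2 != "lo" {
        split($4, parts, "/")      # $4 looks like 10.13.0.1/24
        print $2, parts[1], parts[2]
    }'
}

# Only call ip if it is actually installed on this machine.
if command -v ip >/dev/null 2>&1; then
    ip -o -4 addr show | parse_ip_oneline
fi
```

Note that this yields a prefix length rather than a dotted netmask, so a postscript built on it would need to write PREFIX= instead of NETMASK= to the ifcfg file, or convert the value first.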
[xcat-user] Is there an ifcfg-eth postscript that works on systemd OSes?
Hello,

I am now testing the deployment of CentOS 7 in my environment and I've noticed that the ifcfg-eth postscript is not geared to work with it. For starters, CentOS 7 doesn't install ifconfig by default, though I've solved that with my kickstart. More to the point, the postscript explicitly looks for "Ethernet" in the ifconfig line to grab the interface name, which doesn't work on systemd OSes like CentOS 7. And of course there's the change from "eth" interface names to "ens" and a variety of other names.

I changed the line that looked like this in the postscript:

    interfaces=$(ifconfig -a | grep "Ethernet" | awk '{print $1}')

To this:

    interfaces=$(ifconfig -a | egrep "Ethernet|ens" | awk '{print $1}')

While that does parse out the ens interface names, they now come with a colon tacked on to the end of them in the ifconfig output, like this:

    ens160:
    ens192:

I can parse that out with a regex substitution to remove the colon, but before I hack the default script up too much, has an alternative ifcfg-eth postscript been released for systems like this? I'm using this script to change the /etc/sysconfig/network-scripts/ifcfg-* files from using DHCP to the static addresses defined through xCAT, which works fine on my CentOS 6 OSes.

This is my xCAT version:

    lsxcatd -v
    Version 2.11 (git commit 9ea36ca6163392bf9ab684830217f017193815be, built Mon Nov 30 05:43:11 EST 2015)

Thanks,
Josh Nielsen
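For the colon problem described above, a minimal sketch follows. It shows two things: stripping the trailing colon with shell parameter expansion, and, as an alternative that avoids parsing ifconfig at all, listing interface names from /sys/class/net. Both function names are hypothetical and this is not the shipped ifcfg-eth postscript.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are made up for illustration.

# Remove a single trailing colon, e.g. "ens160:" -> "ens160".
strip_colon() {
    printf '%s\n' "${1%:}"
}

# List network interface names from sysfs, skipping loopback.
list_interfaces() {
    local path name
    for path in /sys/class/net/*; do
        # Interfaces appear as directories; skip plain files and a literal
        # unmatched glob if the directory is missing or empty.
        [ -d "$path" ] || continue
        name=${path##*/}
        [ "$name" = "lo" ] && continue
        printf '%s\n' "$name"
    done
}

strip_colon "ens160:"
list_interfaces
```

Reading names from sysfs sidesteps both the missing ifconfig package and the changed output format, at the cost of also listing non-Ethernet devices, so a real postscript would likely still need a filter.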
Re: [xcat-user] How can I migrate to a new xCAT MN in a hierarchical environment?
Well, I should have looked in the logs first. There were more detailed messages in /var/log/messages on the MN:

Jun 16 14:10:14 xcat-master xcat[30550]: Error dispatching request to xcat-serv1:3001, trying other service nodes: Connection failure: SSL connect attempt failed because of handshake problems error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca at /opt/xcat/lib/perl/xCAT/Client.pm line 265.
Jun 16 14:10:15 xcat-master xcat[30550]: Error dispatching request to xcat-serv2:3001, trying other service nodes: Connection failure: SSL connect attempt failed because of handshake problems error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca at /opt/xcat/lib/perl/xCAT/Client.pm line 265.

Which SSL cert or key is involved in this connection? Although I copied over the rsa keys in /root/.ssh from the old MN to the new one I did not do the same for either /etc/xcat/cert/ or /etc/ssh/. Might a missing key or cert from either of those directories be responsible for that error?

Thanks,
Josh
Re: [xcat-user] How can I migrate to a new xCAT MN in a hierarchical environment?
Xiao,

Okay, so I followed those four steps with some modifications. I did 1 & 4 as instructed with no issues. The service nodes are getting their database access from the new MN now, and I updated the SN object definitions to point xcatmaster, tftpserver, and other relevant parameters to the new MN.

I avoided step #2 because I just copied the old /root/.ssh/id_rsa and corresponding .pub file to the new MN and passwordless logon works fine. I also tested this from the two service nodes to make sure they could fetch the host keys: "USEOPENSSLFORXCAT=yes XCATSERVER=:3001 /xcatpost/getcredentials.awk ssh_rsa_hostkey". Is that sufficient for the key step?

And lastly for #3 I only selectively updated certain packages on the SNs like syslog and NTP, because I didn't want to run all of the packages and in particular the servicenode postscript.

So, I was able to use updatenode with no issues from the new MN to update the SNs. However, when I try to update any cluster client nodes it has problems dispatching to the service nodes in the hierarchy:

# updatenode node0010 -P addsiteyum
Error: Failed to dispatch command to any of the following service nodes: xcat-serv1,xcat-serv2

What is most likely causing that issue?

Thanks,
Josh

On Fri, Jun 3, 2016 at 7:01 AM, Xiao Peng Wang <w...@cn.ibm.com> wrote:

> I think we should approach it the opposite way: how to make the MN use
> the new SN.
>
> Following steps are necessary to switch a SN:
> 1. rerun 'mysqlsetup -f' to assign the access permission for SN to access
>    DB on MN
> 2. run 'updatenode -k ' to set up the ssh key
> 3. run 'updatenode -P' to update the SN
> 4. change the 'servicenode' attribute for compute node accordingly.
>
> Thanks
> Best Regards
> --
> Wang Xiaopeng (王晓朋)
> IBM China System Technology Laboratory
> Tel: 86-10-82453455
> Email: w...@cn.ibm.com
> Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
> Haidian District Beijing P.R.China 100193
>
> - Original message -
> From: Josh Nielsen <jniel...@hudsonalpha.org>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Cc:
> Subject: Re: [xcat-user] How can I migrate to a new xCAT MN in a
> hierarchical environment?
> Date: Thu, Jun 2, 2016 3:49 AM
>
> Can anyone verify if simply updating cfgloc should be all I need to do for
> the SNs to start using the new MN? By pointing it to the new MN's MySQL
> instance, which has a site table with the new MN specified as the
> xcatmaster, it should even update the xcatmaster value
> shown in an 'lsdef' of the service nodes automatically, right?
>
> Thanks,
> Josh
>
> On Tue, May 17, 2016 at 3:42 PM, Josh Nielsen <jniel...@hudsonalpha.org>
> wrote:
>
> A correction below for something I wrote previously.
>
> "...and the SNs then shouldn't need newly generated keys (right?)..."
Re: [xcat-user] How can I migrate to a new xCAT MN in a hierarchical environment?
I looked at the 'servicenode' postscript and it does _way_ too much for what I want to accomplish. I don't think the script was written with changes or upgrades in mind. It looks like it freshly copies everything to the SNs' $installdir/postscripts and /etc/xcat on the service node and generates (new?) keys. The SNs don't need those updates/changes in my case.

From looking at the following comment in the 'servicenode' postscript and the code, I'm wondering if all I need to do is manually modify /etc/xcat/cfgloc to update the IP for the new MN database location, and if everything else will be fine. The keys should already be in place because I am copying the same keys from the old MN onto the new MN server, and the SNs then shouldn't need newly generated keys (right?). Please let me know if you see any problems with this.

The comment in the code:

For Linux:
   It calls xcatserver and xcatclient script to get the ssh keys, ssl
   credentials and cfgloc file and transfer from the MN to the SN
   to be able to access the
   database, setup ssh keys on the nodes and have daemon to daemon
   communication between the SN and MN and have the SN access the DB.

P.S. Also, would just giving the new MN the same IP and hostname (even as an alias to a different primary hostname) more or less prevent any changes from needing to be made on the SNs at all (no postscripts run nor manual modifications of files)?

Thanks,
Josh
Re: [xcat-user] How can I migrate to a new xCAT MN in a hierarchical environment?
Hi Christian,

Thanks for the response. So do I actually have to reinstall the SNs and/or rerun the service node postscript? If rerunning the SN postscript just makes some minor adjustments but doesn't clear the dhcpd.leases and the .conf files for named and dhcp, as I have them configured, then that would be fine. But if it blows all that away and starts over, that would qualify as disruptive for my environment, since the cluster depends on slave DNS services and dhcp on the SN. I would ideally like minimal changes on the SNs except to point them to the new MN.

As far as the postscripts, my question was what common (if not default in most installs) postscripts that come with xCAT have code in them that would result in the hardcoding of the MN's IP in some configuration file. I actually thought of one possible example along those lines: whatever configures the client compute nodes to send all their syslog messages to the /var/log/messages log on the headnode instead of locally will need to be rerun/updated. What will need to be run to make the clients log to the new MN server?

Regarding the server identity (even though it will have a new IP address and hostname), can we just copy the keys in /etc/ssh/ to the new MN so that the SSH fingerprint doesn't change?

Lastly, as regards running updatenode -k, I definitely (in this case) do not want to replace the root id_rsa private and public keys on the cluster, the MN, or the SNs, since other critical services like GPFS require the current keys to remain in place. Why is rerunning the key deploy necessary, and is there not a way to make it work with the current keys?

I just need to be very careful with my current setup so that I don't knock out critical services while changing the MN, which is why I was wondering how disruptive doing this might be. I appreciate the help!

Thanks,
Josh

On Tue, May 3, 2016 at 10:05 AM, Christian Caruthers <ccaruth...@lenovo.com> wrote:

> I would begin by looking at the servicenode postscript. It sets up the
> daemon and database communications between SN & MN. Beyond that, the
> default postscripts are listed in the "xcatdefaults" entry of the
> postscripts table. You will probably want to run updatenode -k once you
> have xCAT configured on the new MN. After that, you probably want to rerun
> the remoteshell and syslog postscripts on the cluster members (updatenode
> -P) at the very least.
>
> Second, you can dump the xCAT DB using the dumpxCATdb command. After that,
> grep out the management node (hostname and/or IP) to see where changes need
> to be made for the DB on the new MN.
>
> If the SNs are handling DHCP, it only needs to be enabled on the MN if you
> plan on reinstalling a SN.
>
> Anything that resolves DNS through the MN will need an updated resolv.conf.
>
> Depending on how you're maintaining your /install directory on the SNs,
> that mechanism will need to be updated.
>
> If your MN is routing for any nodes, that will need to be addressed. You
> might want to check the network configuration on the IMMs. On discovery, if
> you have a gateway defined on your management network (I believe it
> defaults to ), they might be pointing to the old MN. Shouldn't
> be an issue, but it's something to think about. If you're not routing on
> that network, I would use pasu to set the IMM gateway to 0.0.0.0 and be
> done with it.
>
> The only other concern I can think of would be the installation repos
> configured on the cluster nodes and SNs. If any point to the MN, they will
> need to be changed.
>
> Aside from all of that, it really depends on the particulars of your cluster.
>
> Regards,
> Christian Caruthers
> Lenovo xESS IT Consultant
>
> Mobile: 757-289-9872
>
> From: Josh Nielsen [mailto:jniel...@hudsonalpha.org]
> Sent: Monday, May 02, 2016 8:32 PM
> To: xCAT Users Mailing list
> Subject: [xcat-user] How can I migrate to a new xCAT MN in a
> hierarchical environment?
>
> Hello all,
>
> My team is trying to move the xCAT MN role off of an old server and get it
> over onto new virtual infrastructure, but I am a little unsure about
> whether it is possible to do while leaving everything else in its place as
> we currently have it in our environment. We have an MN with two SNs for our
> xCAT environment, and I would need to make the SNs recognize that the new
> MN (with a new IP and hostname) is now their xcatmaster, and they would
> need to take hierarchical command updates from the new MN, look to the new
> MN for the xCAT database (which is a MySQL database in our environment),
> etc.
>
> So a few questions along those lines.
>
> 1. Which/how many xCAT database fields would I need to upd
Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password
Casandra,

Good news. Setting sshbetweennodes explicitly to ALLGROUPS automatically provided both the id_rsa and id_rsa.pub keys in /root/.ssh/ when 'remoteshell' ran this time (in xCAT 2.8.3)! So that's all it was. The strange thing is that we never had to set that before. Perhaps whatever version we had before 2.8.3 didn't need it, but 2.8.3 does now. But we were on 2.8.3 already with the last deploy, so... I have no idea how this worked a few months ago without setting sshbetweennodes explicitly (or why the default settings didn't have the same result), but after banging my head on this for about two weeks I'm glad to finally make some headway. Thanks for the suggestion!

Best regards,
Josh

On Tue, Mar 8, 2016 at 3:33 PM, Casandra H Qiu <cxh...@us.ibm.com> wrote:

> Hmm, I don't have a system with xCAT 2.8.3, but I think the sshbetweennodes
> attribute has been available for a while. If it is not defined in the site
> table, the default should set up passwordless ssh between nodes.
> "nslookup nodename" still works after you update the hostname, right?
> Are you able to find any error messages in the logs? Maybe in
> /var/log/messages.
>
> Thanks,
> Casandra
> ...
> Casandra Hong Qiu
> Phone: (845) 433-9291, t/l 293-9291
> Office: B/002, Floor 3, Z13
> cxh...@us.ibm.com
>
> From: Josh Nielsen <jniel...@hudsonalpha.org>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Date: 03/08/2016 03:58 PM
> Subject: Re: [xcat-user] Updatenode -k won't create id_rsa key without
> prompting for password
> --
>
> Thanks for the response Casandra. I should firstly note that I have xCAT
> 2.8.3. I know I need to upgrade, but not only has this worked in the past
> but I also successfully deployed 50+ nodes back in November with the exact
> same xCAT version I have now and using the same osimage for Centos 6.5,
> same kickstart, same defined postscripts, etc. So something else has
> changed, perhaps in our environment?
>
> That being said, I did not see sshbetweennodes specified at all in the
> site table. The following are the only two references to ssh in the table:
>
> #tabdump site | grep -i ssh
> "maxssh","8",,
> "rsh","/usr/bin/ssh",,
>
> Is 'sshbetweennodes' only a feature of versions newer than 2.8.X, or has
> it been around a while?
>
> Lastly, you said that remoteshell copies over id_rsa.pub (regardless - in
> either scenario), and I have seen that before as well, but actually I am
> not seeing any id_rsa* keys (public or private) copied to the node at all.
> And even an updatenode -k is only producing the id_rsa (if I manually type
> the password) but not the .pub, which is also odd. But authorized_keys is
> populated with the rsa public key signature. Something else must be going
> on.
>
> P.S. The only other issue I'm still dealing with, which may be irrelevant
> for this issue, is a hostname problem where, if I run the ifcfg-eth
> postscript, it updates the hostname in /etc/sysconfig/network from the
> correct node name to "localhost". My forward and reverse lookup entries in
> DNS are present, and the hostname is set correctly by the kickstart before
> ifcfg-eth is run, and remains there if it is not run, which I presume it
> gets from either the node definition in dhcpd.leases which is created with
> 'makedhcp' and/or the DNS entries for the host's IP. On the off chance that
> key copying could be tied to name resolution inconsistencies I thought I
> might mention that as well.
>
> Thanks,
> Josh
>
> On Tue, Mar 8, 2016 at 1:20 PM, Casandra H Qiu <cxh...@us.ibm.com> wrote:
>
> > Can you check the site table to see if sshbetweennodes is set up? The
> > default for sshbetweennodes is ALLGROUPS, which enables passwordless ssh
> > between nodes. This attribute will be ignored if the zone table is set
> > up, so please check the zone table also.
> >
> > If it is enabled, the remoteshell postscript will copy id_rsa and
> > id_rsa.pub over to the compute node; otherwise, it only copies id_rsa.pub.
> >
> > From the source code, updatenode -k always requires a password.
> >
> > Thanks,
> > Casandra
> > .......
> > Casandra Hong Qiu
> > Phone: (845) 433-9291, t/l 293-9291
Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password
Hi Christopher,

I checked our 'nameservers' field in the site table and it is pointing to our external DNS server, which I think was a required configuration for makedns -e to work. Updates for new entries to the external DNS server, which is the master of the slave DNS servers on the SNs, propagate down to the SNs immediately. But I would be hesitant to add the SNs to the list of IPs in that field lest it break something in the external DNS configuration.

The oddest thing, however, is that neither of those (the SNs' slave DNS servers or the external DNS server) is making it into resolv.conf. What ends up there instead is the MN's IP (the MN used to host the master DNS server, once upon a time, before we changed to an external DNS configuration). The only field in the site table pointing to the MN is the 'master' field, as it should be.

And here is a sample of one particular node in question. None of it points to the MN's IP, and 'xcatmaster' varies between SNs depending on the group the node is in:

    nfsserver=[SN1_IP]
    os=centos6.5
    postscripts=syslog,remoteshell,ospkgs,addsiteyum,syncfiles,fstab,setupntp,otherpkgs
    primarynic=eth0
    profile=compute
    provmethod=centos6.5-x86_64-install-compute
    serialflow=hard
    serialport=0
    serialspeed=115200
    servicenode=[SN1_DNS_NAME],[SN2_DNS_NAME]
    status=booting
    tftpserver=[SN1_DNS_NAME]
    xcatmaster=[SN1_IP]

-Josh

On Tue, Mar 8, 2016 at 4:54 PM, Christopher Samuel <sam...@unimelb.edu.au> wrote: > On 09/03/16 09:37, Josh Nielsen wrote: > > > Actually, I take it back, some of the nodes have the MN's IP in their > > resolv.conf (which has no DNS server) after a kickstart install, which > > is possibly why the localhost assignment happens since a lookup would > > fail. I think only if I run a postscript that I specifically created to > > edit the /etc/resolv.conf and point it to the SNs which run DNS servers > > does it allow the node to resolve its name. 
I wish the kickstart would > > populate the resolv.conf file with its service node IPs by default (and > > I thought it used to??). > > xCAT should populate your /etc/resolv.conf with the "nameservers" > attribute from your site table. > > Ours certainly do - and we've got a variety of xCAT versions deployed > for our various clusters (2.8.1, 2.8.5 (x3) & 2.10) on the "If it ain't > broke, don't fix it" principle. > > All the best, > Chris > -- > Christopher SamuelSenior Systems Administrator > VLSCI - Victorian Life Sciences Computation Initiative > Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 > http://www.vlsci.org.au/ http://twitter.com/vlsci
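A small check can make the resolv.conf discussion above concrete. The following is only a sketch: it lists the nameserver entries in a resolv.conf-style file and prints any that are not in an expected set, such as the SN addresses. The function names and the sample addresses in the usage note are assumptions for illustration, not part of xCAT.

```shell
#!/bin/sh
# Sketch only: compare the nameserver entries in a resolv.conf-style file
# with a list of expected addresses. Function names are illustrative.

# Print each nameserver address found in FILE, one per line.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Print every configured nameserver that is NOT in the expected list.
# Usage: unexpected_nameservers FILE EXPECTED_IP...
unexpected_nameservers() {
    file=$1
    shift
    list_nameservers "$file" | while read -r ns; do
        found=no
        for ip in "$@"; do
            if [ "$ns" = "$ip" ]; then found=yes; fi
        done
        if [ "$found" = no ]; then echo "$ns"; fi
    done
}
```

On a node, something like `unexpected_nameservers /etc/resolv.conf 192.0.2.11 192.0.2.12` (placeholder addresses standing in for the two SNs) would print the MN's IP if that is what the install wrote into the file, and nothing if only the expected servers are present.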
Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password
I should also mention that for kicks last week we tried deploying one VM that had been defined in the hpvmgroupA group 4 months ago with the rest of them, but had not been deployed and had been delayed until now, and it too is experiencing the same problems with the keys as well as our (unrelated?) hostname issues, so I'm going for something that changed in our environment, be it networking related, DNS, perhaps permissions, or any host of other variables that can change on systems involved with the xCAT deploy process. I just wish there were a meaningful error or way to determine the root issue more easily. On Wed, Mar 9, 2016 at 10:01 AM, Josh Nielsen <jniel...@hudsonalpha.org> wrote: > Yes. Sorry that I did not mention it before, but they are stateful nodes. > I have never used stateless nodes and have nothing configured for > stateless. I'm just puzzled what could have changed in four months since I > last I deployed 50+ VMs with the same osimage template and postscripts they > used at that time. I'm deploying Centos 6.5 to a series of VMs on ESXi > (standalone - not vSphere managed) on an HP CX7000 Blade chassis. I'm PXE > booting as the method of delivering the kickstart, and I'm only having > problems post-kickstart (or so it seems...). And the deploy is completely > abstracted to the VM guest operating system level, so nothing specific to > HP. 
> > I'm obscuring the IPs in the following excerpt from my xCAT 'hosts' table, > but the only real change in xCAT was that I defined a third ('C') group of > VMs that run on the HP cluster following the pattern I had done for the > previous 50+ VMs which were divided between the A and B groups: > > "hpvmgroupA","|\D+(\d+).*$|X.Y.101.($1-0)|",,,"HP Compute nodes compute > interface", > "hpvmgroupB","|\D+(\d+).*$|X.Y.102.($1-0)|",,,"HP Compute nodes compute > interface", > "hpvmgroupC","|\D+(\d+).*$|X.Y.103.($1-0)|",,,"HP Compute nodes compute > interface", > > Otherwise, if you do an lsdef and look at a node that was deployed > successfully which was part of hpvmgroupA and one that I am currently > trying to deploy in hpvmgroupC they are identical except for the details > that should differ like IP address and other things; and the osimage for > the Centos 6.5 image (which points to the *.tmpl kickstart files and the > otherpkgs *.tmpl files) is the same. So perhaps something else in my > environment changed? > > My coworker and I have tried delving into the code and placing echoes as > debugging statements, and have looked at the -V verbose output of various > commands, but can't seem to find a meaningful error as to why it is not > fetching either the id_rsa or id_rsa.pub files. Perhaps this was taken care > of in a certain postscript that I have taken for granted before and > forgotten to run this time? I have no idea. > > -Josh > > On Tue, Mar 8, 2016 at 10:34 PM, Daniel Letai <d...@letai.org.il> wrote: > >> Can you confirm you are deploying stateful nodes and not stateless ? >> >> >> On 03/09/2016 12:53 AM, Josh Nielsen wrote: >> >> My coworker just pointed out that the /xcatpost/mypostscript on the nodes >> that are deployed actually have this line: >> >> ENABLESSHBETWEENNODES='NO' >> export ENABLESSHBETWEENNODES >> >> That's interesting, given that sshbetweennodes (without 'enable' as the >> beginning of the parameter name?) 
was not defined at all in the site table >> and the default is supposedly enabled(?). However, I just set >> sshbetweennodes in site to "sshbetweennodes","ALLGROUPS",, and am now >> redeploying to see if it makes a difference. >> >> The man page for site says: >> >> sshbetweennodes: Comma separated list of groups to enable passwordless >> root >> ssh during install, or xdsh -K. >> Default is ALLGROUPS. >> Set to NOGROUPS,if you do not wish to >> enabled any groups. >> Service Nodes are not affected by >> this attribute >> they are always setup with >> passwordless root access to nodes and >> other SN. >> >> -Josh >> >> On Tue, Mar 8, 2016 at 4:26 PM, Josh Nielsen <jniel...@hudsonalpha.org> >> wrote: >> >>> Here is what I see in /var/log/messages when remoteshell is run: >>> >>> Mar 7 14:28:41 xcat-serv1 node0087c xcat: remoteshell: setup >>> /etc/ssh/sshd_config and ssh_config >>> Mar 7 14:28:41 xcat-serv1 node0087c xcat: Install: s
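The hosts-table entries quoted in this message map a node name to an address with a regular expression. As a rough illustration of what an entry such as `|\D+(\d+).*$|X.Y.103.($1-0)|` does, the sketch below reproduces the same steps in plain shell: take the first run of digits that follows a non-digit, treat it as a number, and append it to a prefix. xCAT evaluates the real expression itself; the function name and the 10.0.103 prefix used in the note (standing in for the obscured X.Y.103) are assumptions.

```shell
#!/bin/sh
# Sketch only: mimic a hosts-table entry such as
#   |\D+(\d+).*$|X.Y.103.($1-0)|
# expand_host_ip is an illustrative helper, not an xCAT command.

# Usage: expand_host_ip NODE PREFIX   (PREFIX stands in for "X.Y.103")
expand_host_ip() {
    node=$1
    prefix=$2
    # Equivalent of \D+(\d+): the first run of digits that follows
    # at least one non-digit character.
    digits=$(printf '%s\n' "$node" |
        sed -n 's/^[0-9]*[^0-9]\{1,\}\([0-9]\{1,\}\).*$/\1/p')
    if [ -z "$digits" ]; then
        echo "no match for $node" >&2
        return 1
    fi
    # ($1-0) forces numeric context, so "0087" becomes 87. Strip leading
    # zeros first so the shell does not try to read the value as octal.
    num=$(printf '%s\n' "$digits" | sed 's/^0*//')
    num=${num:-0}
    echo "$prefix.$((num - 0))"
}
```

With these definitions, `expand_host_ip node0087c 10.0.103` yields 10.0.103.87, which is the kind of result the hpvmgroupC entry is meant to produce for node0087c.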
Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password
Yes. Sorry that I did not mention it before, but they are stateful nodes. I have never used stateless nodes and have nothing configured for stateless. I'm just puzzled what could have changed in the four months since I last deployed 50+ VMs with the same osimage template and postscripts they used at that time.

I'm deploying CentOS 6.5 to a series of VMs on ESXi (standalone - not vSphere managed) on an HP CX7000 Blade chassis. I'm PXE booting as the method of delivering the kickstart, and I'm only having problems post-kickstart (or so it seems...). And the deploy is completely abstracted to the VM guest operating system level, so nothing specific to HP.

I'm obscuring the IPs in the following excerpt from my xCAT 'hosts' table, but the only real change in xCAT was that I defined a third ('C') group of VMs that run on the HP cluster following the pattern I had done for the previous 50+ VMs which were divided between the A and B groups:

"hpvmgroupA","|\D+(\d+).*$|X.Y.101.($1-0)|",,,"HP Compute nodes compute interface",
"hpvmgroupB","|\D+(\d+).*$|X.Y.102.($1-0)|",,,"HP Compute nodes compute interface",
"hpvmgroupC","|\D+(\d+).*$|X.Y.103.($1-0)|",,,"HP Compute nodes compute interface",

Otherwise, if you do an lsdef and compare a node that was deployed successfully as part of hpvmgroupA with one that I am currently trying to deploy in hpvmgroupC, they are identical except for the details that should differ, like the IP address; and the osimage for the CentOS 6.5 image (which points to the *.tmpl kickstart files and the otherpkgs *.tmpl files) is the same. So perhaps something else in my environment changed?

My coworker and I have tried delving into the code and placing echoes as debugging statements, and have looked at the -V verbose output of various commands, but can't seem to find a meaningful error as to why it is not fetching either the id_rsa or id_rsa.pub files.
Perhaps this was taken care of in a certain postscript that I have taken for granted before and forgotten to run this time? I have no idea. -Josh On Tue, Mar 8, 2016 at 10:34 PM, Daniel Letai <d...@letai.org.il> wrote: > Can you confirm you are deploying stateful nodes and not stateless ? > > > On 03/09/2016 12:53 AM, Josh Nielsen wrote: > > My coworker just pointed out that the /xcatpost/mypostscript on the nodes > that are deployed actually have this line: > > ENABLESSHBETWEENNODES='NO' > export ENABLESSHBETWEENNODES > > That's interesting, given that sshbetweennodes (without 'enable' as the > beginning of the parameter name?) was not defined at all in the site table > and the default is supposedly enabled(?). However, I just set > sshbetweennodes in site to "sshbetweennodes","ALLGROUPS",, and am now > redeploying to see if it makes a difference. > > The man page for site says: > > sshbetweennodes: Comma separated list of groups to enable passwordless > root > ssh during install, or xdsh -K. > Default is ALLGROUPS. > Set to NOGROUPS,if you do not wish to > enabled any groups. > Service Nodes are not affected by this > attribute > they are always setup with > passwordless root access to nodes and > other SN. 
> > -Josh > > On Tue, Mar 8, 2016 at 4:26 PM, Josh Nielsen <jniel...@hudsonalpha.org> > wrote: > >> Here is what I see in /var/log/messages when remoteshell is run: >> >> Mar 7 14:28:41 xcat-serv1 node0087c xcat: remoteshell: setup >> /etc/ssh/sshd_config and ssh_config >> Mar 7 14:28:41 xcat-serv1 node0087c xcat: Install: setup root .ssh >> Mar 7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16025]: xCAT: Allowing >> getcredentials ssh_dsa_hostkey from node0087c >> Mar 7 14:28:42 xcat-serv1 node0087c xCAT: remoteshell: getting >> ssh_host_dsa_key >> Mar 7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16027]: xCAT: Allowing >> getcredentials ssh_rsa_hostkey from node0087c >> Mar 7 14:28:42 xcat-serv1 node0087c xCAT: ssh_rsa_hostkey >> Mar 7 14:28:42 xcat-serv1 node0087c xCAT: start up sshd >> >> I see new timestamps on authorized_keys and copy.sh when it is run, so it >> is actually doing something. >> >> Is there a substantial difference between remoteshell and updatenode -k? >> Why does updatenode -k successfully copy the id_rsa key to the node if I >> type in the password? >> >> As for the hostname in /etc/sysconfig/network, actually if I run just the >> kickstart and remove ifcfg-eth from the list of postscripts (in the node >> definition's postscripts= field) to execute automatica
Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password
My coworker just pointed out that the /xcatpost/mypostscript on the nodes that are deployed actually has these lines:

ENABLESSHBETWEENNODES='NO'
export ENABLESSHBETWEENNODES

That's interesting, given that sshbetweennodes (without 'enable' as the beginning of the parameter name?) was not defined at all in the site table and the default is supposedly enabled(?). However, I just set sshbetweennodes in site to "sshbetweennodes","ALLGROUPS",, and am now redeploying to see if it makes a difference.

The man page for site says:

sshbetweennodes: Comma separated list of groups to enable passwordless root
                 ssh during install, or xdsh -K. Default is ALLGROUPS.
                 Set to NOGROUPS,if you do not wish to enabled any groups.
                 Service Nodes are not affected by this attribute
                 they are always setup with
                 passwordless root access to nodes and other SN.

-Josh

On Tue, Mar 8, 2016 at 4:26 PM, Josh Nielsen <jniel...@hudsonalpha.org> wrote: > Here is what I see in /var/log/messages when remoteshell is run: > > Mar 7 14:28:41 xcat-serv1 node0087c xcat: remoteshell: setup > /etc/ssh/sshd_config and ssh_config > Mar 7 14:28:41 xcat-serv1 node0087c xcat: Install: setup root .ssh > Mar 7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16025]: xCAT: Allowing > getcredentials ssh_dsa_hostkey from node0087c > Mar 7 14:28:42 xcat-serv1 node0087c xCAT: remoteshell: getting > ssh_host_dsa_key > Mar 7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16027]: xCAT: Allowing > getcredentials ssh_rsa_hostkey from node0087c > Mar 7 14:28:42 xcat-serv1 node0087c xCAT: ssh_rsa_hostkey > Mar 7 14:28:42 xcat-serv1 node0087c xCAT: start up sshd > > I see new timestamps on authorized_keys and copy.sh when it is run, so it > is actually doing something. > > Is there a substantial difference between remoteshell and updatenode -k? > Why does updatenode -k successfully copy the id_rsa key to the node if I > type in the password? 
> > As for the hostname in /etc/sysconfig/network, actually if I run just the > kickstart and remove ifcfg-eth from the list of postscripts (in the node > definition's postscripts= field) to execute automatically it ends up > looking like this: > # cat /etc/sysconfig/network > NETWORKING=yes > HOSTNAME=3(NXDOMAIN) > > But when I manually run updatenode node0087c -P ifcfg-eth it works > correctly. (Also it changes the /etc/sysconfig/network-scripts/ifcfg-eth0 > file's BOOTPROTO parameter from dhcp to static and sets the IPADDR, as it > should). > # cat /etc/sysconfig/network > NETWORKING=yes > HOSTNAME=node0087c.morgan.haib.org > > Only if I keep the ifcfg-eth postscript in the node definition (listed in > postscripts= if you 'lsdef') to be automatically executed does > that /etc/sysconfig/network file say 'localhost', which seems to indicate > an order of execution problem to me (even though I made sure ifcfg-eth was > listed last). And, to answer your question, an nslookup node0087c before > and after both return the correct IP from either of the SNs' slave DNS > servers. > > They are possibly two unrelated issues, but I'm close to just upgrading > xCAT and seeing if I have any better luck if I can't figure out some > obvious problem soon. For kicks I'll explicitly set sshbetweennodes in the > site table and rerun the remoteshell postscript. > > Regards, > Josh > > On Tue, Mar 8, 2016 at 3:33 PM, Casandra H Qiu <cxh...@us.ibm.com> wrote: > >> mmm, I don't have system with xCAT 2.8.3. but I think sshbetweenodes >> attribute is available for while. if it is not defined in the site table, >> the default should be set up passwordless between nodes. >> "nslookup nodename" still works after you update the hostname, right? >> can u able to find any error message from logs? maybe in the >> /var/log/message. >> >> Thanks, >> Casandra >> ... 
>> Casandra Hong Qiu >> Phone: (845) 433-9291, t/l 293-9291 >> Office: B/002, Floor 3, Z13 >> cxh...@us.ibm.com >> >> >> >> [image: Inactive hide details for Josh Nielsen ---03/08/2016 03:58:33 >> PM---Thanks for the response Casandra. I should firstly note that]Josh >> Nielsen ---03/08/2016 03:58:33 PM---Thanks for the response Casandra. I >> should firstly note that I have xCAT 2.8.3. I know I need to upg >> >> From: Josh Nielsen <jniel...@hudsonalpha.org> >> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net> >> Date: 03/08/2016 03:58 PM >> Subject: Re: [xcat-user] Updatenode -k won't create id_rs
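The ENABLESSHBETWEENNODES line found in /xcatpost/mypostscript is easy to check on a deployed node. The sketch below simply reads the value recorded in that file; the path and variable name are the ones quoted in this message, while the function name is an assumption. It reports what the file says and does not interpret how xCAT derives the value from the site or zone tables.

```shell
#!/bin/sh
# Sketch only: print the ENABLESSHBETWEENNODES value recorded in a node's
# generated postscript file. ssh_between_nodes_setting is an illustrative
# helper name; the default path is the one quoted in the thread.

# Usage: ssh_between_nodes_setting [FILE]
ssh_between_nodes_setting() {
    file=${1:-/xcatpost/mypostscript}
    # Take the last assignment line and strip any quotes around the value.
    grep '^ENABLESSHBETWEENNODES=' "$file" | tail -n 1 | cut -d= -f2- | tr -d "'\""
}
```

Run on a node showing the problem described here, it would print NO. An empty result means the file has no such assignment at all. On the management node, the matching check is the `tabdump site | grep -i ssh` command already used in the thread.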
Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password
Actually, I take it back, some of the nodes have the MN's IP in their resolv.conf (which has no DNS server) after a kickstart install, which is possibly why the localhost assignment happens since a lookup would fail. I think only if I run a postscript that I specifically created to edit the /etc/resolv.conf and point it to the SNs which run DNS servers does it allow the node to resolve its name. I wish the kickstart would populate the resolv.conf file with its service node IPs by default (and I thought it used to??). On Tue, Mar 8, 2016 at 4:26 PM, Josh Nielsen <jniel...@hudsonalpha.org> wrote: > Here is what I see in /var/log/messages when remoteshell is run: > > Mar 7 14:28:41 xcat-serv1 node0087c xcat: remoteshell: setup > /etc/ssh/sshd_config and ssh_config > Mar 7 14:28:41 xcat-serv1 node0087c xcat: Install: setup root .ssh > Mar 7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16025]: xCAT: Allowing > getcredentials ssh_dsa_hostkey from node0087c > Mar 7 14:28:42 xcat-serv1 node0087c xCAT: remoteshell: getting > ssh_host_dsa_key > Mar 7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16027]: xCAT: Allowing > getcredentials ssh_rsa_hostkey from node0087c > Mar 7 14:28:42 xcat-serv1 node0087c xCAT: ssh_rsa_hostkey > Mar 7 14:28:42 xcat-serv1 node0087c xCAT: start up sshd > > I see new timestamps on authorized_keys and copy.sh when it is run, so it > is actually doing something. > > Is there a substantial difference between remoteshell and updatenode -k? > Why does updatenode -k successfully copy the id_rsa key to the node if I > type in the password? > > As for the hostname in /etc/sysconfig/network, actually if I run just the > kickstart and remove ifcfg-eth from the list of postscripts (in the node > definition's postscripts= field) to execute automatically it ends up > looking like this: > # cat /etc/sysconfig/network > NETWORKING=yes > HOSTNAME=3(NXDOMAIN) > > But when I manually run updatenode node0087c -P ifcfg-eth it works > correctly. 
(Also it changes the /etc/sysconfig/network-scripts/ifcfg-eth0 > file's BOOTPROTO parameter from dhcp to static and sets the IPADDR, as it > should). > # cat /etc/sysconfig/network > NETWORKING=yes > HOSTNAME=node0087c.morgan.haib.org > > Only if I keep the ifcfg-eth postscript in the node definition (listed in > postscripts= if you 'lsdef') to be automatically executed does > that /etc/sysconfig/network file say 'localhost', which seems to indicate > an order of execution problem to me (even though I made sure ifcfg-eth was > listed last). And, to answer your question, an nslookup node0087c before > and after both return the correct IP from either of the SNs' slave DNS > servers. > > They are possibly two unrelated issues, but I'm close to just upgrading > xCAT and seeing if I have any better luck if I can't figure out some > obvious problem soon. For kicks I'll explicitly set sshbetweennodes in the > site table and rerun the remoteshell postscript. > > Regards, > Josh > > On Tue, Mar 8, 2016 at 3:33 PM, Casandra H Qiu <cxh...@us.ibm.com> wrote: > >> mmm, I don't have system with xCAT 2.8.3. but I think sshbetweenodes >> attribute is available for while. if it is not defined in the site table, >> the default should be set up passwordless between nodes. >> "nslookup nodename" still works after you update the hostname, right? >> can u able to find any error message from logs? maybe in the >> /var/log/message. >> >> Thanks, >> Casandra >> ... >> Casandra Hong Qiu >> Phone: (845) 433-9291, t/l 293-9291 >> Office: B/002, Floor 3, Z13 >> cxh...@us.ibm.com >> >> >> >> [image: Inactive hide details for Josh Nielsen ---03/08/2016 03:58:33 >> PM---Thanks for the response Casandra. I should firstly note that]Josh >> Nielsen ---03/08/2016 03:58:33 PM---Thanks for the response Casandra. I >> should firstly note that I have xCAT 2.8.3. 
I know I need to upg >> >> From: Josh Nielsen <jniel...@hudsonalpha.org> >> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net> >> Date: 03/08/2016 03:58 PM >> Subject: Re: [xcat-user] Updatenode -k won't create id_rsa key without >> prompting for password >> -- >> >> >> >> >> Thanks for the response Casandra. I should firstly note that I have xCAT >> 2.8.3. I know I need to upgrade, but not only has this worked in the past >> but I also successfully deployed 50+ nodes back in November with the exact >> same xCAT version I have now and using the same osimage for Centos 6.5, >> same kickstart, same defined postscripts, etc. So something else h
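The two HOSTNAME values quoted in this message (the bogus 3(NXDOMAIN) and the correct node0087c.morgan.haib.org) suggest a simple sanity check that a postscript could run after name resolution. This is only a sketch under assumptions: the function names are made up for illustration, and the notion of a valid name (non-empty, not localhost, only letters, digits, dots and hyphens) is a simplification.

```shell
#!/bin/sh
# Sketch only: read HOSTNAME from a sysconfig-style network file and decide
# whether it looks like a real node name. Helper names are illustrative.

# Usage: sysconfig_hostname [FILE]
sysconfig_hostname() {
    file=${1:-/etc/sysconfig/network}
    grep '^HOSTNAME=' "$file" | tail -n 1 | cut -d= -f2-
}

# Succeeds only if the value is non-empty, is not localhost, and contains
# nothing but letters, digits, dots and hyphens.
hostname_looks_valid() {
    case $1 in
        ''|localhost|localhost.*) return 1 ;;
        *[!A-Za-z0-9.-]*) return 1 ;;
        *) return 0 ;;
    esac
}
```

For example, `hostname_looks_valid "$(sysconfig_hostname)" || echo "bad hostname"` would flag both the 3(NXDOMAIN) case and the localhost case described in the thread, while accepting node0087c.morgan.haib.org.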
Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password
Here is what I see in /var/log/messages when remoteshell is run:

Mar 7 14:28:41 xcat-serv1 node0087c xcat: remoteshell: setup /etc/ssh/sshd_config and ssh_config
Mar 7 14:28:41 xcat-serv1 node0087c xcat: Install: setup root .ssh
Mar 7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16025]: xCAT: Allowing getcredentials ssh_dsa_hostkey from node0087c
Mar 7 14:28:42 xcat-serv1 node0087c xCAT: remoteshell: getting ssh_host_dsa_key
Mar 7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16027]: xCAT: Allowing getcredentials ssh_rsa_hostkey from node0087c
Mar 7 14:28:42 xcat-serv1 node0087c xCAT: ssh_rsa_hostkey
Mar 7 14:28:42 xcat-serv1 node0087c xCAT: start up sshd

I see new timestamps on authorized_keys and copy.sh when it is run, so it is actually doing something.

Is there a substantial difference between remoteshell and updatenode -k? Why does updatenode -k successfully copy the id_rsa key to the node if I type in the password?

As for the hostname in /etc/sysconfig/network, actually if I run just the kickstart and remove ifcfg-eth from the list of postscripts (in the node definition's postscripts= field) to execute automatically it ends up looking like this:

# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=3(NXDOMAIN)

But when I manually run updatenode node0087c -P ifcfg-eth it works correctly. (Also it changes the /etc/sysconfig/network-scripts/ifcfg-eth0 file's BOOTPROTO parameter from dhcp to static and sets the IPADDR, as it should).

# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node0087c.morgan.haib.org

Only if I keep the ifcfg-eth postscript in the node definition (listed in postscripts= if you 'lsdef') to be automatically executed does that /etc/sysconfig/network file say 'localhost', which seems to indicate an order of execution problem to me (even though I made sure ifcfg-eth was listed last). And, to answer your question, an nslookup node0087c before and after both return the correct IP from either of the SNs' slave DNS servers.
They are possibly two unrelated issues, but I'm close to just upgrading xCAT and seeing if I have any better luck if I can't figure out some obvious problem soon. For kicks I'll explicitly set sshbetweennodes in the site table and rerun the remoteshell postscript. Regards, Josh On Tue, Mar 8, 2016 at 3:33 PM, Casandra H Qiu <cxh...@us.ibm.com> wrote: > mmm, I don't have system with xCAT 2.8.3. but I think sshbetweenodes > attribute is available for while. if it is not defined in the site table, > the default should be set up passwordless between nodes. > "nslookup nodename" still works after you update the hostname, right? > can u able to find any error message from logs? maybe in the > /var/log/message. > > Thanks, > Casandra > ... > Casandra Hong Qiu > Phone: (845) 433-9291, t/l 293-9291 > Office: B/002, Floor 3, Z13 > cxh...@us.ibm.com > > > > [image: Inactive hide details for Josh Nielsen ---03/08/2016 03:58:33 > PM---Thanks for the response Casandra. I should firstly note that]Josh > Nielsen ---03/08/2016 03:58:33 PM---Thanks for the response Casandra. I > should firstly note that I have xCAT 2.8.3. I know I need to upg > > From: Josh Nielsen <jniel...@hudsonalpha.org> > To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net> > Date: 03/08/2016 03:58 PM > Subject: Re: [xcat-user] Updatenode -k won't create id_rsa key without > prompting for password > -- > > > > > Thanks for the response Casandra. I should firstly note that I have xCAT > 2.8.3. I know I need to upgrade, but not only has this worked in the past > but I also successfully deployed 50+ nodes back in November with the exact > same xCAT version I have now and using the same osimage for Centos 6.5, > same kickstart, same defined postscripts, etc. So something else has > changed, perhaps in our environment? > > That being said, I did not see sshbetweennodes specified at all in the > site table. 
The following are the only two references to ssh in the table: > > #tabdump site | grep -i ssh > "maxssh","8",, > "rsh","/usr/bin/ssh",, > > Is 'sshbetweennodes' only a feature of versions newer than 2.8.X, or has > it been around a while? > > Lastly, you said that remoteshell copies over id_rsa.pub (regardless - in > either scenario), and I have seen that before as well, but actually I am > not seeing any id_rsa* keys (public or private) copied to the node at all. > And even an updatenode -k is only producing the id_rsa (if I manually type > the password) but not the .pub, which is also odd. But authorized_keys is > populated with the rsa public key signature. Something else must be going > on. > > P.S. The only other issue I'm still dealing with, which may irrelev
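Since the symptom throughout this thread is a missing id_rsa (and id_rsa.pub) in /root/.ssh, a quick presence check on the node can confirm what a given remoteshell run actually delivered. The sketch below assumes the three file names mentioned in the thread; the function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: list which of the root ssh files discussed in the thread are
# missing from a directory. missing_ssh_files is an illustrative helper.

# Usage: missing_ssh_files [DIR]
missing_ssh_files() {
    dir=${1:-/root/.ssh}
    for name in authorized_keys id_rsa id_rsa.pub; do
        [ -f "$dir/$name" ] || echo "$name"
    done
}
```

On a node in the state described here (authorized_keys present, no private or public key), `missing_ssh_files` would print id_rsa and id_rsa.pub on separate lines. An empty result means all three files are in place.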
Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password
Thanks for the response Casandra. I should firstly note that I have xCAT 2.8.3. I know I need to upgrade, but not only has this worked in the past but I also successfully deployed 50+ nodes back in November with the exact same xCAT version I have now and using the same osimage for CentOS 6.5, same kickstart, same defined postscripts, etc. So something else has changed, perhaps in our environment?

That being said, I did not see sshbetweennodes specified at all in the site table. The following are the only two references to ssh in the table:

#tabdump site | grep -i ssh
"maxssh","8",,
"rsh","/usr/bin/ssh",,

Is 'sshbetweennodes' only a feature of versions newer than 2.8.X, or has it been around a while?

Lastly, you said that remoteshell copies over id_rsa.pub (regardless - in either scenario), and I have seen that before as well, but actually I am not seeing any id_rsa* keys (public or private) copied to the node at all. And even an updatenode -k is only producing the id_rsa (if I manually type the password) but not the .pub, which is also odd. But authorized_keys is populated with the rsa public key signature. Something else must be going on.

P.S. The only other issue I'm still dealing with, which may be irrelevant for this issue, is a hostname problem: if I run the ifcfg-eth postscript it updates the hostname in /etc/sysconfig/network from the correct node name to "localhost". My forward and reverse lookup entries in DNS are present, and the hostname is set correctly by the kickstart before ifcfg-eth is run, and remains there if it is not run. I presume the kickstart gets it from either the node definition in dhcpd.leases, which is created with 'makedhcp', and/or the DNS entries for the host's IP. On the off chance that key copying could be tied to name resolution inconsistencies I thought I might mention that as well.

Thanks,
Josh

On Tue, Mar 8, 2016 at 1:20 PM, Casandra H Qiu <cxh...@us.ibm.com> wrote: > can u check the site table if sshbetweennodes is set up? 
The default for > sshbetweennodes is ALLGROUPS, and will enable passwordless between nodes. > this attribute will be ignored if zone table is set up, so please check > zone table also. > > if it enables, the remoteshell postscript will copy id_rsa and id_rsa.pub > over to compute node, otherwise, it only copies id_rsa.pub. > > from source code, updatenode -k is always required password. > > > Thanks, > Casandra > ... > Casandra Hong Qiu > Phone: (845) 433-9291, t/l 293-9291 > Office: B/002, Floor 3, Z13 > cxh...@us.ibm.com > > > > [image: Inactive hide details for Josh Nielsen ---03/08/2016 12:51:58 > PM---Yes, I just verified. It is present, but that alone is not s]Josh > Nielsen ---03/08/2016 12:51:58 PM---Yes, I just verified. It is present, > but that alone is not sufficient for that node to be able to SS > > From: Josh Nielsen <jniel...@hudsonalpha.org> > To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net> > Date: 03/08/2016 12:51 PM > Subject: Re: [xcat-user] Updatenode -k won't create id_rsa key without > prompting for password > -- > > > > Yes, I just verified. It is present, but that alone is not sufficient for > that node to be able to SSH to other nodes itself. It allows other nodes > which have the correct private key to SSH to it, but not the other way > around. > > For example, on one compute node I'm having trouble with /root/.ssh has > these three files: > > -rw---. 1 root root 408 Mar 7 14:28 authorized_keys > -rw---. 1 root root 411 Mar 7 14:28 copy.sh > -rw--- 1 root root 402 Mar 3 16:20 known_hosts > > And authorized_keys has the correct ssh-rsa public key entry, but I cannot > go from this node to any other node in my cluster via passwordless ssh. But > as soon as I run updatenode -k, and type in the password that it prompts > for to complete the command, the id_rsa key is added as the fourth file to > the /root/.ssh directory, and then after that I can ssh to other nodes from > it without supplying a password. That is the issue. 
> > In the past simply running the remoteshell postscript (or so I assumed) > was sufficient for adding the id_rsa file, and it was all automated from a > fresh deploy by specifying remoteshell as one of the default postscripts to > run. But now it doesn't look like remoteshell is placing the id_rsa file on > the node (unless some other script or command is responsible for that), but > remoteshell looks like it creates everything else in /root/.ssh/ (and > /etc/ssh/). > > Is remoteshell the correct postscript for that, or was the id_rsa key most > likely being pushed to the nodes some other way (like by some code that &
Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password
Yes, I just verified. It is present, but that alone is not sufficient for that node to be able to SSH to other nodes itself. It allows other nodes which have the correct private key to SSH to it, but not the other way around.

For example, on one compute node I'm having trouble with, /root/.ssh has these three files:

-rw-------. 1 root root 408 Mar 7 14:28 authorized_keys
-rw-------. 1 root root 411 Mar 7 14:28 copy.sh
-rw------- 1 root root 402 Mar 3 16:20 known_hosts

And authorized_keys has the correct ssh-rsa public key entry, but I cannot go from this node to any other node in my cluster via passwordless ssh. But as soon as I run updatenode -k, and type in the password that it prompts for to complete the command, the id_rsa key is added as the fourth file to the /root/.ssh directory, and then after that I can ssh to other nodes from it without supplying a password. That is the issue.

In the past simply running the remoteshell postscript (or so I assumed) was sufficient for adding the id_rsa file, and it was all automated from a fresh deploy by specifying remoteshell as one of the default postscripts to run. But now it doesn't look like remoteshell is placing the id_rsa file on the node (unless some other script or command is responsible for that), though remoteshell looks like it creates everything else in /root/.ssh/ (and /etc/ssh/).

Is remoteshell the correct postscript for that, or was the id_rsa key most likely being pushed to the nodes some other way (like by some code that called updatenode -k upon initial deployment)? Either way, all I can say for sure is that id_rsa used to appear in /root/.ssh on the compute node automatically and now it does not.

Regards,
Josh

On Tue, Mar 8, 2016 at 4:19 AM, Xiao Peng Wang <w...@cn.ibm.com> wrote: > To enable the login without password, the rsa public key should be copied > to /root/.ssh/authorized_keys in the compute node. Could you check whether > the key has been added in to > /root/.ssh/authorized_keys? 
> > > > Thanks > Best Regards > -- > Wang Xiaopeng (王晓朋) > IBM China System Technology Laboratory > Tel: 86-10-82453455 > Email: w...@cn.ibm.com > Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, > Haidian District Beijing P.R.China 100193 > > > > - Original message - > From: Josh Nielsen <jniel...@hudsonalpha.org> > To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net> > Cc: > Subject: Re: [xcat-user] Updatenode -k won't create id_rsa key without > prompting for password > Date: Tue, Mar 8, 2016 5:26 AM > > Also if remoteshell is invoked directly as a postscript ('updatenode > node0086c -V -P remoteshell') it produces the same result, but does not > prompt for a password (like invoking xdsh -K directly doesn't), and copies > everything over except id_rsa. So actually the prompting for a password is > specific to updatenode -k, not xdsh -K or the remoteshell postscript (which > run that). So I'm not sure if that is relevant to the underlying problem or > not, but if I do invoke updatenode -k and supply it the password it copies > the id_rsa to the node. > > On Mon, Mar 7, 2016 at 2:12 PM, Josh Nielsen <jniel...@hudsonalpha.org> > wrote: > > Hello, > > When we freshly deploy a node from the kickstart and run our postscripts > we noticed that for some reason the /root/.ssh/id_rsa file which allows > passwordless login from that node to other nodes is missing, though this > was not the case just a few months ago. When I try to generate the key > manually it prompts for a password, after which it will copy/create that > file successfully (see below), but there are a few odd things connected to > this. 
> > The error is: > updatenode node0087c -k > Enter the password for the userid: root on the node where the ssh keys > will be updated: > > The first oddity is that even after supplying the password once for a > particular node it will prompt for the password every time if I run it > again, as well as the related problem that this never used to happen before > and the key used to be created without issue or prompting for a password. > The 'passwd' xCAT table has the password for root (if that is where it > looks for this command). > > Secondly I have done several manual debugging steps (and poking around > source code to see what is happening) and I have run the actual xdsh > command that is called, shown from the -V verbose output (which it prints > two of, the first apparently to prep the SNs and run the 'remoteshell' > postscript on them, and the second to actually do the same to the node > specified). > > xdsh sn1,sn2 --nodestatus -s -v -e /install/postscripts/xcatdsklspost 5 -m > [MN_IP] 'remoteshell,servicenode'
Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password
Also if remoteshell is invoked directly as a postscript ('updatenode node0086c -V -P remoteshell') it produces the same result, but does not prompt for a password (like invoking xdsh -K directly doesn't), and copies everything over except id_rsa. So actually the prompting for a password is specific to updatenode -k, not xdsh -K or the remoteshell postscript (which run that). So I'm not sure if that is relevant to the underlying problem or not, but if I do invoke updatenode -k and supply it the password it copies the id_rsa to the node. On Mon, Mar 7, 2016 at 2:12 PM, Josh Nielsen <jniel...@hudsonalpha.org> wrote: > Hello, > > When we freshly deploy a node from the kickstart and run our postscripts > we noticed that for some reason the /root/.ssh/id_rsa file which allows > passwordless login from that node to other nodes is missing, though this > was not the case just a few months ago. When I try to generate the key > manually it prompts for a password, after which it will copy/create that > file successfully (see below), but there are a few odd things connected to > this. > > The error is: > updatenode node0087c -k > Enter the password for the userid: root on the node where the ssh keys > will be updated: > > The first oddity is that even after supplying the password once for a > particular node it will prompt for the password every time if I run it > again, as well as the related problem that this never used to happen before > and the key used to be created without issue or prompting for a password. > The 'passwd' xCAT table has the password for root (if that is where it > looks for this command). 
> > Secondly I have done several manual debugging steps (and poking around > source code to see what is happening) and I have run the actual xdsh > command that is called, shown from the -V verbose output (which it prints > two of, the first apparently to prep the SNs and run the 'remoteshell' > postscript on them, and the second to actually do the same to the node > specified). > > xdsh sn1,sn2 --nodestatus -s -v -e /install/postscripts/xcatdsklspost 5 -m > [MN_IP] 'remoteshell,servicenode' --tftp /tftpboot --installdir /install > --nfsv4 no -c -V > > xdsh node0086c --nodestatus -s -v -e /install/postscripts/xcatdsklspost 5 > -m [SN1_IP] 'remoteshell' --tftp /tftpboot --installdir /install --nfsv4 no > -c -V > > This did not reveal anything useful, except that when invoked directly > like this no password is prompted for and it runs, but still leaves out the > id_rsa file. I also followed the suggestion by Wang Xiaopeng in this thread > (http://tinyurl.com/jz2jzmb) to test the > getcredentials call with: > > 1. Enable mini server > /xcatpost/allowcred.awk & > > 2. Try to get rsa hostkey > USEOPENSSLFORXCAT=yes XCATSERVER=:3001 /xcatpost/getcredentials.awk > ssh_rsa_hostkey > This returned ssh_rsa_hostkey successfully. When remoteshell is run > (whether with updatenode -k or xdsh -K) it actually does copy over the key > files into /etc/ssh/ and it copies known_hosts, copy.sh, and > authorized_keys into /root/.ssh on the compute node but omits id_rsa. What > could be going wrong here? > > Regards, > Josh Nielsen
[xcat-user] Updatenode -k won't create id_rsa key without prompting for password
Hello,

When we freshly deploy a node from the kickstart and run our postscripts, we noticed that for some reason the /root/.ssh/id_rsa file, which allows passwordless login from that node to other nodes, is missing, though this was not the case just a few months ago. When I try to generate the key manually it prompts for a password, after which it will copy/create that file successfully (see below), but there are a few odd things connected to this.

The error is:

updatenode node0087c -k
Enter the password for the userid: root on the node where the ssh keys will be updated:

The first oddity is that even after supplying the password once for a particular node, it prompts for the password every time I run it again. Related to that, this never used to happen before: the key used to be created without issue and without prompting for a password. The 'passwd' xCAT table has the password for root (if that is where it looks for this command).

Secondly, I have done several manual debugging steps (and poked around the source code to see what is happening) and I have run the actual xdsh command that is called, as shown in the -V verbose output (which prints two of them, the first apparently to prep the SNs and run the 'remoteshell' postscript on them, and the second to do the same to the node specified).

xdsh sn1,sn2 --nodestatus -s -v -e /install/postscripts/xcatdsklspost 5 -m [MN_IP] 'remoteshell,servicenode' --tftp /tftpboot --installdir /install --nfsv4 no -c -V

xdsh node0086c --nodestatus -s -v -e /install/postscripts/xcatdsklspost 5 -m [SN1_IP] 'remoteshell' --tftp /tftpboot --installdir /install --nfsv4 no -c -V

This did not reveal anything useful, except that when invoked directly like this no password is prompted for and it runs, but still leaves out the id_rsa file. I also followed the suggestion by Wang Xiaopeng in this thread (http://tinyurl.com/jz2jzmb) to test the getcredentials call with:

1. Enable mini server
/xcatpost/allowcred.awk &

2. Try to get rsa hostkey
USEOPENSSLFORXCAT=yes XCATSERVER=:3001 /xcatpost/getcredentials.awk ssh_rsa_hostkey

This returned ssh_rsa_hostkey successfully. When remoteshell is run (whether with updatenode -k or xdsh -K) it actually does copy over the key files into /etc/ssh/ and it copies known_hosts, copy.sh, and authorized_keys into /root/.ssh on the compute node but omits id_rsa. What could be going wrong here?

Regards,
Josh Nielsen
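The symptom in this message (everything lands in /root/.ssh except id_rsa) can be checked quickly on a freshly deployed node. The following is only a minimal sketch: missing_ssh_files is a hypothetical helper, not an xCAT command, and the four file names are simply the ones mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of xCAT): print the name of each file
# discussed in this thread that is absent from a root SSH directory.
# The directory defaults to /root/.ssh and can be overridden for testing.
missing_ssh_files() {
    dir="${1:-/root/.ssh}"
    for f in authorized_keys known_hosts copy.sh id_rsa; do
        # -e is true for any existing file, whatever its type or mode.
        [ -e "$dir/$f" ] || echo "$f"
    done
}
```

Running missing_ssh_files on the node after remoteshell has run, and again after updatenode -k, would show whether id_rsa is the only file that differs between the two.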
Re: [xcat-user] Some trouble with postscripts
Thank you. The otherpkgs postscript wasn't executing because it was somehow removed from the default postscripts list, and I thought for some reason addsiteyum was supposed to add the xCAT-otherpkgs0.repo file, so I didn't look any further than that. I forgot, though, that otherpkgs and ifcfg-eth (hence the network config was not static) were not baked into the default postscripts list. That explains the differences I was seeing.

-Josh

On Tue, Mar 1, 2016 at 7:26 PM, Xiao Peng Wang <w...@cn.ibm.com> wrote:
> Your xCAT build is too old; the current stable version is xCAT 2.11.
>
> The file named xCAT-otherpkgs0.repo should be created by running the
> otherpkgs script. If otherpkgs failed, you need to check whether the
> otherpkgdir/otherpkglist attributes for the osimage have been set
> correctly. Also you can check whether the environment variables like
> OTHERPKGDIR have been exported from CN:/xcatpost/mypostscript.
>
> The default configuration for the install nic is 'dhcp'; you can run the
> postscript 'confignics -s' to set it to be static. Also, confignics is
> the recommended postscript for configuring nics.
>
> Thanks
> Best Regards
> --
> Wang Xiaopeng (王晓朋)
> IBM China System Technology Laboratory
> Tel: 86-10-82453455
> Email: w...@cn.ibm.com
> Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
> Haidian District Beijing P.R.China 100193
>
> - Original message -
> From: Josh Nielsen <jniel...@hudsonalpha.org>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Cc:
> Subject: [xcat-user] Some trouble with postscripts
> Date: Wed, Mar 2, 2016 8:16 AM
>
> I am using xCAT 2.8.3 and I successfully installed 50+ nodes just 4-5
> months ago using a certain custom defined osimage definition for Centos
> 6.5. Now I'm trying to deploy the same image to a new group of nodes that I
> defined in xCAT, but am encountering some difficulty with certain
> postscripts.
The addsiteyum postscript is not adding the correct otherpkgs > repo file into /etc/yum.repos.d/ (it creates xCAT-centos6.5-path0.repo but > not xCAT-otherpkgs0.repo - whereas the previous 50+ nodes have both). And > since that is not in place the otherpkgs postscript also fails because it > can't reach that repo to install custom rpms. > > I also noticed that, contrary to the results with the 50+ hosts I deployed > last time, the node is being assigned the correct IP but its > /etc/sysconfig/network-scripts/ifcfg-eth0 files says BOOTPROTO=dhcp instead > of 'static'. That in turn causes another post script I created > ('ifcfg-eth'), which does a sed on that file to produce a similar one on a > second ethernet interface, to fail. > > The lsdef for these new nodes matches the other successful hosts I did > before, and the default postscripts are as follows: > postscripts=syslog,remoteshell,ospkgs,addsiteyum,otherpkgs,syncfiles,ifcfg-eth,fstab,servicenode > > > Many of the other postscripts like the syslog, ospkgs, and fstab seem to > be executing fine though. My log file on the node, after showing the wget > results of the postscripts down to the node, shows this: > > - > FINISHED --2016-03-01 16:03:04-- > Downloaded: 374 files, 55M in 0.3s (202 MB/s) > mv: cannot stat `/xcatpost/mypostscript.node0071c': No such file or > directory > mv: cannot stat `10.20.0.100/postscripts': No such file or directory > Tue Mar 1 16:03:04 CST 2016 Running postscript: syslog > Shutting down system logger: [FAILED] > Starting system logger: [ OK ] > Tue Mar 1 16:03:05 CST 2016 Running postscript: remoteshell > > Stopping sshd: [FAILED] > Generating SSH1 RSA host key: [ OK ] > Starting sshd: [ OK ] > Tue Mar 1 16:03:07 CST 2016 Running postscript: ospkgs > Tue Mar 1 16:03:10 CST 2016 Running postscript: addsiteyum > Tue Mar 1 16:03:10 CST 2016 Running postscript: syncfiles > Did not sync any files. 
> Tue Mar 1 16:03:10 CST 2016 Running postscript: fstab > Info: Configuring fstab > mount.nfs: rpc.statd is not running but is required for remote locking. > mount.nfs: Either use '-o nolock' to keep locks local, or start statd. > mount.nfs: an incorrect mount option was specified > Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying > Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying > Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying > Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying > Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying > Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying > Error: try grep x3650-head01:/opt /etc/mtab unsuccessful > Tue Mar 1
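The dhcp-versus-static point in this exchange can be confirmed on a node before and after running a postscript such as confignics -s. A minimal sketch, assuming the usual KEY=value layout of the ifcfg files; get_bootproto is a hypothetical helper name, not something shipped with xCAT.

```shell
#!/bin/sh
# Hypothetical helper: print the BOOTPROTO value from an ifcfg-style file,
# with any surrounding single or double quotes removed.
# Prints nothing if the file has no BOOTPROTO line.
get_bootproto() {
    # \047 is the octal escape for a single quote inside the awk regex.
    awk -F= '$1 == "BOOTPROTO" { gsub(/["\047]/, "", $2); print $2 }' "$1"
}
```

For example, get_bootproto /etc/sysconfig/network-scripts/ifcfg-eth0 would print dhcp on the nodes described here and static on the earlier ones.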
[xcat-user] Some trouble with postscripts
I am using xCAT 2.8.3 and I successfully installed 50+ nodes just 4-5 months ago using a certain custom defined osimage definition for Centos 6.5. Now I'm trying to deploy the same image to a new group of nodes that I defined in xCAT, but am encountering some difficulty with certain postscripts. The addsiteyum postscript is not adding the correct otherpkgs repo file into /etc/yum.repos.d/ (it creates xCAT-centos6.5-path0.repo but not xCAT-otherpkgs0.repo, whereas the previous 50+ nodes have both). And since that is not in place, the otherpkgs postscript also fails because it can't reach that repo to install custom rpms.

I also noticed that, contrary to the results with the 50+ hosts I deployed last time, the node is being assigned the correct IP but its /etc/sysconfig/network-scripts/ifcfg-eth0 file says BOOTPROTO=dhcp instead of 'static'. That in turn causes another postscript I created ('ifcfg-eth'), which does a sed on that file to produce a similar one for a second ethernet interface, to fail.

The lsdef for these new nodes matches the other successful hosts I did before, and the default postscripts are as follows:

postscripts=syslog,remoteshell,ospkgs,addsiteyum,otherpkgs,syncfiles,ifcfg-eth,fstab,servicenode

Many of the other postscripts like syslog, ospkgs, and fstab seem to be executing fine though. My log file on the node, after showing the wget results of downloading the postscripts to the node, shows this:

-----
FINISHED --2016-03-01 16:03:04--
Downloaded: 374 files, 55M in 0.3s (202 MB/s)
mv: cannot stat `/xcatpost/mypostscript.node0071c': No such file or directory
mv: cannot stat `10.20.0.100/postscripts': No such file or directory
Tue Mar 1 16:03:04 CST 2016 Running postscript: syslog
Shutting down system logger: [FAILED]
Starting system logger: [ OK ]
Tue Mar 1 16:03:05 CST 2016 Running postscript: remoteshell
Stopping sshd: [FAILED]
Generating SSH1 RSA host key: [ OK ]
Starting sshd: [ OK ]
Tue Mar 1 16:03:07 CST 2016 Running postscript: ospkgs
Tue Mar 1 16:03:10 CST 2016 Running postscript: addsiteyum
Tue Mar 1 16:03:10 CST 2016 Running postscript: syncfiles
Did not sync any files.
Tue Mar 1 16:03:10 CST 2016 Running postscript: fstab
Info: Configuring fstab
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified
Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
Error: try grep x3650-head01:/opt /etc/mtab unsuccessful
Tue Mar 1 16:04:14 CST 2016 Running postscript: setupntp
ntpd is stopped
1 Mar 16:04:14 ntpdate[13492]: adjust time server 10.20.0.100 offset 0.201009 sec
Starting ntpd: [ OK ]
-----

From past experience, when things like this go wrong it tends to be due to something small but significant that is misconfigured. Does any of this look symptomatic of any common misconfiguration you are aware of?

P.S. I have verified that DNS has forward and reverse lookups for these nodes on the MN and SNs.

Thanks,
Josh Nielsen
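One way to narrow down a problem like the one in this message is to compare the configured postscripts list against the "Running postscript:" lines in the node's log. In the excerpt above, no such line appears for otherpkgs or ifcfg-eth. The sketch below assumes the log format shown in the excerpt; postscripts_not_run is a hypothetical helper, not an xCAT tool.

```shell
#!/bin/sh
# Hypothetical helper: given a comma-separated postscripts list and a log
# file, print each postscript that has no "Running postscript: NAME" line.
postscripts_not_run() {
    list="$1"
    log="$2"
    # Split the list on commas; read also trims stray blanks around names.
    echo "$list" | tr ',' '\n' | while read -r ps; do
        [ -n "$ps" ] || continue
        # Anchor at end of line so "ospkgs" does not match "otherpkgs" etc.
        grep -q "Running postscript: $ps\$" "$log" || echo "$ps"
    done
}
```

The list argument could be taken from the postscripts= attribute in lsdef output, and the log argument from /var/log/xcat/xcat.log on the node.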
Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7
Also there are two logs that you can utilize for investigating postscripts more. As long as you can get onto the deployed OS (and the deployment is not a total failure) when postscripts do not complete you can look in that node's local /var/log/xcat/xcat.log. An optional log, which you will have to set for yourself in the kickstart (for testing you can just edit the /install/autoinst/[nodename] kickstart file directly), is to specify a log file for your %post section like this: %post --log=/root/ks-post.log. If all else fails you can add your own debugging messages to xcatdsklspost to track how far you are getting. I hope that helps in some way. Regards, Josh On Tue, Nov 24, 2015 at 10:18 AM, Josh Nielsen <jniel...@hudsonalpha.org> wrote: > Hi Nathan, > > Well, I may not be of any help at all (and I'm not familiar with Centos 7 > deployment) but since you mentioned DNS it reminded me that once I did a > makedns -e [external_server] for some Centos 6.5 nodes and for some reason > only the forward lookup entries were added to the DNS server but not the > reverse entries. My postscripts were failing each time for an unknown reason > when I attempted to deploy the nodes, and it turns out that it couldn't > complete without the reverse lookup because some of the xCAT code in > xcatdsklspost queries it. > > After running the command manually I saw this error: > >> /opt/xcat/xcatdsklspost 6 > > awk: //xcatpost/updateflag.awk:22: fatal: remote host and port information > (3002, installstatus booted) invalid > > > When I searched for a solution to this error I found this which pointed me > to a reverse lookup error: > http://sourceforge.net/p/xcat/mailman/message/27872412/ > > So maybe that's just one more thing to strike off your DNS and postscript > checklist, if you haven't already. > > Regards, > Josh > > On Tue, Nov 24, 2015 at 9:16 AM, Heald, Nathan T. <nhe...@iu.edu> wrote: >> >> No, this is still in the same state. 
>> >> As best as I can tell DNS resolution is working, but it’s possible I’m >> mistaken. No post scripts run so not much should be different once it boots >> a second time (and I run a "nodeset boot” before it starts a 2nd install). >> It can ping the xcat server’s hostname fine once it comes back up and I can >> examine the node. I haven’t had much more time to work on this, but I do >> plan to keep chipping away at it as time allows. >> >> This is the xcat error I see for reference: >> Oct 21 16:35:15 oss01 systemd: Starting LSB: xCATpost... >> Oct 21 16:35:15 oss01 xcatpostinit1: /opt/xcat/xcatinstallpost: line 9: >> /xcatpost/xcatlib.sh: No such file or directory >> Oct 21 16:35:15 oss01 systemd: xcatpostinit1.service: control process >> exited, code=exited status=1 >> Oct 21 16:35:15 oss01 systemd: Failed to start LSB: xCATpost. >> Oct 21 16:35:15 oss01 systemd: Unit xcatpostinit1.service entered failed >> state. >> >> Thanks, >> -Nathan >> >> >> From: Josh Nielsen <jniel...@hudsonalpha.org> >> Date: Monday, November 23, 2015 at 3:43 PM >> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>, Nathan >> Heald <nhe...@iu.edu> >> Cc: "russa...@comcast.net" <russa...@comcast.net> >> Subject: Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7 >> >> And did you ever figure out your problem Nathan? >> >> -Josh >> >> On Mon, Nov 23, 2015 at 2:34 PM, Josh Nielsen <jniel...@hudsonalpha.org> >> wrote: >>> >>> I was going to post a new thread about Centos 7 but thought I might piggy >>> back on this one since it is a similar topic. I have xCAT 2.8.3 currently >>> and it sounds like from the thread here that upgrading to 2.10 is part of >>> the solution for deploying Centos 7.1. 
Currently the issue I'm having is >>> that during a PXE boot it fetches the correct centos 7.1 image and begins >>> trying to deploy but after it shows "Mounted Configuration File System" and >>> "Started Show Plymouth Boot Screen" I get a message like this: >>> "dracut-initqueue: Warning: Could not boot" and hangs there. >>> >>> This isn't a very explicit error, as to what caused it, and I didn't >>> learn much from removing "quiet" from the PXE kernel boot parameters. I also >>> can't get the ctrl+alt+F keys to work, at least in a VM, for showing >>> different terminals like you could in Centos 6 (which was very helpful for >>> debugging). >>> >>> Is this most likely because the kickstart file itself is not formatted >>> correctly (I'm reusin
Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7
Hi Nathan, Well, I may not be of any help at all (and I'm not familiar with Centos 7 deployment) but since you mentioned DNS it reminded me that once I did a makedns -e [external_server] for some Centos 6.5 nodes and for some reason only the forward lookup entries were added to the DNS server but not the reverse entries. My postscripts were failing each time for an unknown reason when I attempted to deploy the nodes, and it turns out that it couldn't complete without the reverse lookup because some of the xCAT code in xcatdsklspost queries it. After running the command manually I saw this error: > */opt/xcat/xcatdsklspost 6* *awk: //xcatpost/updateflag.awk:22: fatal: remote host and port information (3002, installstatus booted) invalid* When I searched for a solution to this error I found this which pointed me to a reverse lookup error: http://sourceforge.net/p/xcat/mailman/message/27872412/ So maybe that's just one more thing to strike off your DNS and postscript checklist, if you haven't already. Regards, Josh On Tue, Nov 24, 2015 at 9:16 AM, Heald, Nathan T. <nhe...@iu.edu> wrote: > No, this is still in the same state. > > As best as I can tell DNS resolution is working, but it’s possible I’m > mistaken. No post scripts run so not much should be different once it boots > a second time (and I run a "nodeset boot” before it starts a 2nd install). > It can ping the xcat server’s hostname fine once it comes back up and I can > examine the node. I haven’t had much more time to work on this, but I do > plan to keep chipping away at it as time allows. > > This is the xcat error I see for reference: > Oct 21 16:35:15 oss01 systemd: Starting LSB: xCATpost... > Oct 21 16:35:15 oss01 xcatpostinit1: /opt/xcat/xcatinstallpost: line 9: > /xcatpost/xcatlib.sh: No such file or directory > Oct 21 16:35:15 oss01 systemd: xcatpostinit1.service: control process > exited, code=exited status=1 > Oct 21 16:35:15 oss01 systemd: Failed to start LSB: xCATpost. 
> Oct 21 16:35:15 oss01 systemd: Unit xcatpostinit1.service entered failed > state. > > Thanks, > -Nathan > > > From: Josh Nielsen <jniel...@hudsonalpha.org> > Date: Monday, November 23, 2015 at 3:43 PM > To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>, Nathan > Heald <nhe...@iu.edu> > Cc: "russa...@comcast.net" <russa...@comcast.net> > Subject: Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7 > > And did you ever figure out your problem Nathan? > > -Josh > > On Mon, Nov 23, 2015 at 2:34 PM, Josh Nielsen <jniel...@hudsonalpha.org> > wrote: > >> I was going to post a new thread about Centos 7 but thought I might piggy >> back on this one since it is a similar topic. I have xCAT 2.8.3 currently >> and it sounds like from the thread here that upgrading to 2.10 is part of >> the solution for deploying Centos 7.1. Currently the issue I'm having is >> that during a PXE boot it fetches the correct centos 7.1 image and begins >> trying to deploy but after it shows "Mounted Configuration File System" and >> "Started Show Plymouth Boot Screen" I get a message like this: >> "dracut-initqueue: Warning: Could not boot" and hangs there. >> >> This isn't a very explicit error, as to what caused it, and I didn't >> learn much from removing "quiet" from the PXE kernel boot parameters. I >> also can't get the ctrl+alt+F keys to work, at least in a VM, for showing >> different terminals like you could in Centos 6 (which was very helpful for >> debugging). >> >> Is this most likely because the kickstart file itself is not formatted >> correctly (I'm reusing my Centos 6.4 kickstart - until I figure out how >> Centos 7.1 differs), or perhaps that the kickstart is not being properly >> fetched from the gpxe configuration in /tftpboot/xcat/xnba/nodes? 
Currently >> the gpxe boot configuration for the node I'm trying to deploy Centos7.1 to >> looks like this: >> >> #!gpxe >> #install centos7.1-x86_64-compute >> imgfetch -n kernel >> http://${next-server}/tftpboot/xcat/osimage/centos7.1-x86_64-install-compute/vmlinuz >> imgload kernel >> imgargs kernel repo=http://10.20.0.101:80/install/centos7.1/x86_64 ks= >> http://10.20.0.101:80/install/autoinst/node0067c ksdevice=bootif cmdline >> console=tty0 console=ttyS0,115200n8r BOOTIF=01-${netX/machyp} >> imgfetch >> http://${next-server}/tftpboot/xcat/osimage/centos7.1-x86_64-install-compute/initrd.img >> imgexec kernel >> >> I manually changed ksdevice to "bootif" just to make sure it was using >> the right interface (it was set to eth0 but Centos 7.1 got rid of "eth" for >> &q
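The forward-only DNS situation described at the top of this message (forward entries present, reverse entries missing) can be tested for each node name. A minimal sketch, assuming getent is available on the host doing the check; check_fwd_rev and the RESOLVER variable are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Resolver command. "getent hosts" uses the same name-service sources
# (files, DNS) as other programs on the host. Overridable for testing.
RESOLVER="${RESOLVER:-getent hosts}"

# Hypothetical helper: print "ok" if NAME resolves to an address and that
# address resolves back to a record listing NAME; otherwise say which
# direction failed and return non-zero.
check_fwd_rev() {
    name="$1"
    # First field of the first answer line is the address.
    ip=$($RESOLVER "$name" | awk 'NR == 1 { print $1 }')
    if [ -z "$ip" ]; then
        echo "forward lookup failed for $name"
        return 1
    fi
    # Fields 2..NF of a reverse answer are the names for that address.
    if $RESOLVER "$ip" | awk -v n="$name" '
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }'; then
        echo "ok"
    else
        echo "reverse lookup failed for $ip"
        return 1
    fi
}
```

The name passed in has to match what the reverse record returns, so a short hostname and a fully qualified one may need to be checked separately.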
Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7
Hi Jesus, That's very useful information. Thank you for that! I'm going to update my osimage definition and try a few more deploys and see how far I get with it. I think I'll still end up having to update xCAT in any case, but I'll see if I can get it working with the current install first, while making manual edits as needed for testing. Thanks, Josh On Mon, Nov 23, 2015 at 3:56 PM, Jesus R. Camou <jca...@nearnix.com> wrote: > Hey Josh, > > centos7.1 provisioning works smoothly for me on 2.9.1 with the following > definition and templates: > > [jcamou@xcat ~]$ lsxcatd -v > Version 2.9.1 (git commit 7f6043fffd62d482931b17b60f9488eb5754fdc1, built > Thu Mar 19 03:25:35 EDT 2015) > > [jcamou@xcat ~]$ lsdef -t osimage centos7.1-x86_64-install-compute > Object name: centos7.1-x86_64-install-compute > addkcmdline=net.ifnames=0 biodevname=0 > imagetype=linux > osarch=x86_64 > osdistroname=centos7.1-x86_64 > osname=Linux > osvers=centos7.1 > otherpkgdir=/install/post/otherpkgs/centos7.1/x86_64 > pkgdir=/install/centos7.1/x86_64 > pkglist=/install/custom/install/centos7/compute.centos7.pkglist > profile=compute > provmethod=install > template=/install/custom/install/centos7/compute.centos7.tmpl > > Notice the addkcmdline attribute has "net.ifnames=0 biodevname=0" to > disable consistent network device naming which forces the kernel to use and > stick to the ethX interface naming convention: > > [jcamou@xcat ~]$ lsdef -t osimage -o centos7.1-x86_64-install-compute -i > addkcmdline > Object name: centos7.1-x86_64-install-compute > addkcmdline=net.ifnames=0 biodevname=0 > > You may set this by: > > [jcamou@xcat ~]$ tabch imagename=centos7.1-x86_64-install-compute > linuximage.addkcmdline="net.ifnames=0 biodevname=0" > > > And the following templates (ignoring commented and empty lines for > shortening purposes): > > [jcamou@xcat ~]$ grep -Ev '^#|^$' > /install/custom/install/centos7/compute.centos7.tmpl > lang en_US > %include /tmp/repos > keyboard "us" > zerombr > clearpart 
--all --initlabel > %include /tmp/partitioning > bootloader > install > text > firewall --disabled > timezone --utc "#TABLE:site:key=timezone:value#" > skipx > rootpw --iscrypted #CRYPT:passwd:key=system,username=root:password# > auth --useshadow --enablemd5 > selinux --disabled > reboot > %packages > %end > %pre > %end > %post > %end > > > ... and pkglist: > > [jcamou@xcat ~]$ cat > /install/custom/install/centos7/compute.centos7.pkglist > #Please make sure there is a space between @ and group name > wget > ntp > nfs-utils > net-snmp > rsync > yp-tools > openssh-server > util-linux > net-tools > mailx > vim-enhanced > > > > > - j > > > On Mon, Nov 23, 2015 at 12:43 PM, Josh Nielsen <jniel...@hudsonalpha.org> > wrote: > >> And did you ever figure out your problem Nathan? >> >> -Josh >> >> On Mon, Nov 23, 2015 at 2:34 PM, Josh Nielsen <jniel...@hudsonalpha.org> >> wrote: >> >>> I was going to post a new thread about Centos 7 but thought I might >>> piggy back on this one since it is a similar topic. I have xCAT 2.8.3 >>> currently and it sounds like from the thread here that upgrading to 2.10 is >>> part of the solution for deploying Centos 7.1. Currently the issue I'm >>> having is that during a PXE boot it fetches the correct centos 7.1 image >>> and begins trying to deploy but after it shows "Mounted Configuration File >>> System" and "Started Show Plymouth Boot Screen" I get a message like this: >>> "dracut-initqueue: Warning: Could not boot" and hangs there. >>> >>> This isn't a very explicit error, as to what caused it, and I didn't >>> learn much from removing "quiet" from the PXE kernel boot parameters. I >>> also can't get the ctrl+alt+F keys to work, at least in a VM, for showing >>> different terminals like you could in Centos 6 (which was very helpful for >>> debugging). 
>>> >>> Is this most likely because the kickstart file itself is not formatted >>> correctly (I'm reusing my Centos 6.4 kickstart - until I figure out how >>> Centos 7.1 differs), or perhaps that the kickstart is not being properly >>> fetched from the gpxe configuration in /tftpboot/xcat/xnba/nodes? Currently >>> the gpxe boot configuration for the node I'm trying to deploy Centos7.1 to >>> looks like this: >>> >>> #!gpxe >>> #install centos7.1-x86_64-compute >>>
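Whether the addkcmdline values discussed in this message actually reached the installed kernel can be checked from the node's /proc/cmdline. A minimal sketch; has_karg is a hypothetical helper. Note that the parameter documented for RHEL/CentOS 7 is spelled biosdevname=0, while the thread shows biodevname=0, so it may be worth checking which spelling is on the command line.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the exact argument $1 appears on the
# kernel command line. The file defaults to /proc/cmdline and can be
# overridden with $2 for testing.
has_karg() {
    # Put each whitespace-separated argument on its own line, then look
    # for an exact, fixed-string, whole-line match.
    tr -s ' \t' '\n\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

On a deployed node, has_karg net.ifnames=0 && echo present would confirm that the osimage setting took effect.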
Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7
I was going to post a new thread about Centos 7 but thought I might piggy back on this one since it is a similar topic. I have xCAT 2.8.3 currently and it sounds like from the thread here that upgrading to 2.10 is part of the solution for deploying Centos 7.1. Currently the issue I'm having is that during a PXE boot it fetches the correct centos 7.1 image and begins trying to deploy but after it shows "Mounted Configuration File System" and "Started Show Plymouth Boot Screen" I get a message like this: "dracut-initqueue: Warning: Could not boot" and hangs there. This isn't a very explicit error, as to what caused it, and I didn't learn much from removing "quiet" from the PXE kernel boot parameters. I also can't get the ctrl+alt+F keys to work, at least in a VM, for showing different terminals like you could in Centos 6 (which was very helpful for debugging). Is this most likely because the kickstart file itself is not formatted correctly (I'm reusing my Centos 6.4 kickstart - until I figure out how Centos 7.1 differs), or perhaps that the kickstart is not being properly fetched from the gpxe configuration in /tftpboot/xcat/xnba/nodes? Currently the gpxe boot configuration for the node I'm trying to deploy Centos7.1 to looks like this: #!gpxe #install centos7.1-x86_64-compute imgfetch -n kernel http:// ${next-server}/tftpboot/xcat/osimage/centos7.1-x86_64-install-compute/vmlinuz imgload kernel imgargs kernel repo=http://10.20.0.101:80/install/centos7.1/x86_64 ks= http://10.20.0.101:80/install/autoinst/node0067c ksdevice=bootif cmdline console=tty0 console=ttyS0,115200n8r BOOTIF=01-${netX/machyp} imgfetch http:// ${next-server}/tftpboot/xcat/osimage/centos7.1-x86_64-install-compute/initrd.img imgexec kernel I manually changed ksdevice to "bootif" just to make sure it was using the right interface (it was set to eth0 but Centos 7.1 got rid of "eth" for "em" didn't it?). 
Nonetheless, I think my kickstart is being successfully fetched, because I changed the disk formatting commands in the kickstart and when I removed "quiet" from the boot I saw errors related to disk formatting/partitioning. So I'm partial to thinking that the old Centos 6.4 kickstart configuration is not 100% compatible with Centos 7.1, but I want to double check to see what xCAT 2.10 brings to the table that might be necessary for Centos 7.1 deployment. Comments or thoughts?

Regards,
Josh

On Fri, Oct 23, 2015 at 1:44 PM, Russell Auld wrote:
> Looks like xcatdsklspost does get called even for stateful installs. Look at the script header.
> Usually in cases like this, the issue is that the node being imaged can't resolve the name of the master node. Make sure your dns is working properly.
>
> On Oct 23, 2015 12:52 PM, "Heald, Nathan T." wrote:
> >
> > To follow up:
> >
> > I have resolved the pxe problem by upgrading further to xcat 2.10. Now it sets kickstart parameters that centos7 responds to.
> >
> > However I have a new problem, the rinstall is now looping. I've gotten as far as seeing that "/xcatpost" is never created on my stateful install. The xcat debugging page suggests networking problems as the first thing to check. So far I've not found anything on that front. I can't find what specifically creates /xcatpost during the install. I see that /opt/xcat was created which contains the scripts xcatdsklspost and xcatinstallpost. Only the script xcatdsklspost has the line "mkdir /$xcatpost", I would not expect this script to be executed since this is not a diskless install. "xcatinstallpost" starts trying to download stuff to /xcatpost but there's nothing in this script that creates this directory.
> >
> > Suggestions?
> >
> > Thanks,
> > -Nathan
> >
> >
> > From: Nathan Heald
> > Reply-To: "xcat-user@lists.sourceforge.net" <xcat-user@lists.sourceforge.net>
> > Date: Tuesday, October 13, 2015 at 5:30 PM
> > To: "xcat-user@lists.sourceforge.net"
> > Subject: [xcat-user] xCAT 2.9.1, problem kickstarting centos7
> >
> > Greetings,
> >
> > I'm trying to get a working stateful install of centos7 on generic hardware. This management node started out with xCAT 2.8.4, I have upgraded it to 2.9.1 to try to get this working. Of course the parameters of the RH/centos 7 kickstart have changed necessitating the upgrade. I found this url indicating that as of 2.9 xcat should know how to handle RH/centos7:
> > http://sourceforge.net/p/xcat/bugs/4318/
> >
> > However, I'm not able to trick xcat into setting up my pxe file correctly when I run the nodeset for the host I want to install. I get this in my /tftpboot/pxelinux.cfg/host01 file:
> > …
> > APPEND initrd=xcat/osimage/centos7-x86_64-install-compute/initrd.img quiet repo=http://192.168.0.1:80/install/centos7/x86_64 ks=http://192.168.0.1:80/install/autoinst/host01 ksdevice=2C:44:FD:00:00:00
> > …
> >
> > But I'm expecting something more like this which
Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7
And did you ever figure out your problem, Nathan?

-Josh

On Mon, Nov 23, 2015 at 2:34 PM, Josh Nielsen <jniel...@hudsonalpha.org> wrote:
> I was going to post a new thread about Centos 7 but thought I might piggy back on this one since it is a similar topic. [...]
Re: [xcat-user] Hello Newbie who need some help. - Benjamin.
Hi Benjamin,

It has been several years since I tried installing on VBox, and not all the instructions make sense to me any longer, but here is what little I had documented from my install:

Note: The node 'vh01' below stands for the computer that you are running VBox on (desktop, laptop, etc.)

Add your nodes so that they can work with the VirtualBox plugin:

    nodeadd node1 groups=testgrp,vboxgrp vm.host=vh01

Followed by:

    chtab node=vboxgrp nodehm.mgt=vbox nodehm.power=vbox

That tells the nodehm table what settings to use for the nodes in the vboxgrp group so that they can properly run as VM nodes in VirtualBox.

Then add the host computer vh01 as a node (your port number may be different):

    nodeadd vh01 groups=websrv websrv.port=18083 websrv.username=test websrv.password=test
    chtab node=vh01 hosts.ip=<IP of vh01> hosts.comments="The host running VBox"

Manually set the MAC addresses in the mac table.

Finally you must launch the VirtualBox webservice on the physical host machine. You may first want to execute (I don't remember why I made this note, or what problem it solves):

    VBoxManage setproperty websrvauthlibrary null

Then, when you are prepared to launch the webservice, run the following in a separate shell (it must remain open and running) or in the background:

    vboxwebsrv

Final note, to resolve an issue with a VM's name as shown in VirtualBox and the hostname of the xCAT node being named the same thing (which you would think wouldn't be a problem):

"Sometimes the VirtualBox machine name can not equal the xCAT node name, especially in test cases that might occur. There is a feature for this situation: add somewhere in the comment field of the vm table a schema like vmname:NAME! and replace NAME by the vm name on the VirtualBox host system. You may insert other comments before and after that schema. For example the xCAT node vm03 should point to the virtual machine machine03 on the VirtualBox host vh01."

(This "trick" was absolutely necessary for me - else the rpower command would not work)

Example config in vm table:

    node1,vh01,,,vmname:xCat_Node1!,

I hope maybe that helps. That's all I've got. :-)

Best regards,
Josh Nielsen

On Tue, Aug 4, 2015 at 6:21 PM, Benjamin LIPERE benjamin.lipere...@gmail.com wrote:
> Hello.
>
> I am a newbie with XCAT. I am a little lost on the installation. Can someone help me to start on XCAT with a VirtualBox cluster, please, step by step, and by e-mail of course?
>
> I may have a twin available, but I am not sure it will work directly.
>
> Thanks in advance.
>
> Best Regards.
> Benjamin.
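The notes above can be collected into a small dry-run script that only prints the commands for one VM, so they can be reviewed before anything touches the xCAT tables. This is a sketch under assumptions: vbox_cmds is a hypothetical helper, 192.168.56.1 is a placeholder address, and writing the vmname:NAME! schema through vm.comments assumes that the "comment field of the vm table" mentioned in the quoted documentation is the comments column.

```shell
#!/bin/sh
# vbox_cmds NODE VBOX_HOST HOST_IP VM_NAME
# Print (do not run) the xCAT commands from the notes for one VirtualBox VM.
# Hypothetical helper; group names, port and credentials are the ones used
# in the notes and will differ on a real system.
vbox_cmds() {
    node=$1
    host=$2
    host_ip=$3
    vm_name=$4
    echo "nodeadd $node groups=testgrp,vboxgrp vm.host=$host"
    echo "chtab node=vboxgrp nodehm.mgt=vbox nodehm.power=vbox"
    echo "nodeadd $host groups=websrv websrv.port=18083 websrv.username=test websrv.password=test"
    echo "chtab node=$host hosts.ip=$host_ip"
    # Assumption: the vmname schema is stored in the vm table's comments column.
    printf 'chtab node=%s vm.comments=vmname:%s!\n' "$node" "$vm_name"
}

# Placeholder values for illustration only.
vbox_cmds node1 vh01 192.168.56.1 xCat_Node1
```

Printing rather than executing keeps the sketch safe to run anywhere; the output can be pasted into a shell on the management node once it looks right.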
[xcat-user] Autodiscovery for HP Cluster - SNMP iLo Question
Hello,

We are planning to use xCAT to deploy ESX on an HP BladeSystem c7000 cluster (BL460C Gen 9 blade servers). For the time being I was going to skip iLo setup since I am confused about how to get that working with xCAT. However, what I want to know is if I can still do autodiscovery of our nodes as long as the nodes are defined in the switch table and the switches have SNMP enabled. The documentation available online sends mixed signals about this.

Sumavi's documentation seems in some places to suggest that SNMP being enabled is the only critical component for autodiscovery:

> So you may not have the ability to have SNMP read access to the network switches to do autodiscovery. Not a problem. You'll just have to fill in the MAC addresses yourself.
(From: http://sumavi.com/sections/skipping-autodiscovery)

Elsewhere it also seems to suggest SNMP is all that is needed:

> If xCAT doesn't know who the node is, then it will automatically run the autodiscover code. This code looks through all the switches in the switch table trying to map the port to the mac to the node. It does this by running snmpwalk commands through the switches.

But then it disappointingly (and possibly outdated-ly) says:

> Autodiscovery has not been tested with HP blades. Therefore, you are better off just running getmacs when these nodes are defined to map the mac address.
(From: http://sumavi.com/sections/understanding-autodiscovery)

But then elsewhere it says this, which seems to imply that the management module is needed for autodiscovery, and not merely SNMP:

> In order to automate discovery for IPMI and iLo nodes there needs to be a mapping of nodes to switch ports. This relationship is defined in the switch table. The methods, passwords, etc to communicate with the switches is in the switches table. Notice that with blades this step usually isn't necessary. xCAT can discover IBM blades through the management module.
(From: http://sumavi.com/sections/switch-and-switches)

Is this perhaps only talking about the nodeX-bmc definition in regards to autodiscovery and not the base nodeX definition itself?

My main question is: based on the above, if SNMP and the management module both have a part to play in autodiscovery, is there any chance that HP blades can work with discovery, as long as I explicitly map the nodes to port numbers in the switch table? Will I have to configure iLo to make autodiscovery work, or can I leave iLo alone and just enable SNMP and configure the switch table?

Thanks,
Josh Nielsen
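For the explicit node-to-port mapping discussed above, the switch table entries could be generated rather than typed one by one. The sketch below only prints chtab commands; switch_map is a hypothetical helper, the switch name and starting port are placeholders, and it assumes nodes are cabled to consecutive ports. It says nothing about whether discovery then works on HP blades, which is the open question.

```shell
#!/bin/sh
# switch_map SWITCH FIRST_PORT NODE...
# Print one chtab command per node, assigning consecutive switch ports.
# Hypothetical helper; nothing is executed against xCAT.
switch_map() {
    switch=$1
    port=$2
    shift 2
    for node in "$@"; do
        echo "chtab node=$node switch.switch=$switch switch.port=$port"
        port=$((port + 1))
    done
}

# Placeholder switch name and ports for illustration only.
switch_map switch1 1 node0001 node0002 node0003
```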
Re: [xcat-user] ESXi VM Discovery Deployment
Hello Wang,

Thank you for the interaction.

> From my point view that you should consider the bare-metal and virtual machine management to be two levels.

Agreed, though I want xCAT to have node definitions for both levels, because I want OS deployment to be handled by xCAT for ESXi and for our Centos VMs.

> First, use switch-based discovery to discovery bare-metal node and install ESXI as a general operating system;

Agreed. I pretty much have that figured out at this point.

> Second, manually or using script to define the virtual machine against certain host. After the vm node definition has done, the hardware control and OS deployment for vm node will be simple since it's very similar with the bare-metal node.

This is where I am more fuzzy on what to do. I plan to PXE deploy the OS images to the VMs once they are created, so I need xCAT to know the VMs' details like their MAC address, node name, etc. exactly as classic discovery would provide, so that when an unprovisioned VM boots, contacts the DHCP server and continues down the boot chain, xCAT recognizes the node, can assign it an OS hostname, and proceeds with the OS installation as with traditional physical node deployment. However I cannot use SNMP switch discovery at that second level of abstraction for the VMs.

So two questions really:

1) What are my best options for creating the VMs on the deployed standalone ESXi hosts to start with? Use the esxcli command line (is that what you meant by 'script')? Or just connect with the vmware client into the ESXi host via the GUI and step through the VM wizard manually?

2) Once I have created the VM, what are my best options for discovery deployment? SNMP location-based discovery node definitions (using regular expressions) seem out of the picture with VMs. So that leaves manually populating the MACs myself, or sequential discovery so that they boot up in the right order and PXE boot the correct image for the node definition.

> You mentioned the discovery of vm. I am curious about this requirement. Is that because the vm was not created by xcat (like mkvm command), so you need to discovery the vm from certain host?
>
> xCAT does not have command to discovery/scan host to get vm list. A simple way is to use 'xdsh' to run virsh command against the host.
>
> BTW, don't your organization think the performance might be a problem to move from bare-metal to virtual machine?

Sorry, I failed to mention that we are getting a new compute cluster and switch fabric to support the production compute which will be virtualized, and the legacy compute will be made a development cluster (which also has lower resource demands).

Regards,
Josh Nielsen

On Mon, Jul 6, 2015 at 9:28 PM, Xiao Peng Wang w...@cn.ibm.com wrote:
> From my point view that you should consider the bare-metal and virtual machine management to be two levels.
> First, use switch-based discovery to discovery bare-metal node and install ESXI as a general operating system;
> Second, manually or using script to define the virtual machine against certain host. After the vm node definition has done, the hardware control and OS deployment for vm node will be simple since it's very similar with the bare-metal node.
>
> You mentioned the discovery of vm. I am curious about this requirement. Is that because the vm was not created by xcat (like mkvm command), so you need to discovery the vm from certain host?
> xCAT does not have command to discovery/scan host to get vm list. A simple way is to use 'xdsh' to run virsh command against the host.
>
> BTW, don't your organization think the performance might be a problem to move from bare-metal to virtual machine?
>
> Thanks
> Best Regards
> --
> Wang Xiaopeng (王晓朋)
>
> From: Josh Nielsen jniel...@hudsonalpha.org
> To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
> Date: 2015/07/07 08:28
> Subject: Re: [xcat-user] ESXi VM Discovery Deployment
>
> > Also, what will the 'switch' xCAT table look like with multiple VMs on the same physical host, since the man page for it says contains what switch port numbers each node is connected to?
> >
> > On Mon, Jul 6, 2015 at 1:31 PM, Josh Nielsen jniel...@hudsonalpha.org wrote:
> > > Hello all, Our organization is in the process of shifting our HPC model from an all physical/bare metal compute cluster to a virtualized compute cluster, making each physical
Re: [xcat-user] ESXi VM Discovery Deployment
Jarrod,

Thanks for the reply.

> FYI, xCAT's vm commands don't require vCenter, though certain functions like rmigrate and friends are restricted by vmware. Unless the free license is installed, after which ESXi locks us out of the API (eval mode does not have this restriction).

Below is a forward of the last interaction we had (December 2014) on the issue of using the xCAT commands with a standalone ESXi machine, and the errors I was seeing. If the license type prevents the functioning of the VM xCAT commands then effectively I cannot use them, since we are using the free ESXi license. So I am back to the two questions I posed to Wang then. But thank you for clarifying.

Regards,
Josh

-- Forwarded message --
From: Jarrod Johnson jarrod.b.john...@gmail.com
Date: Sat, Dec 6, 2014 at 2:58 PM
Subject: Re: [xcat-user] Will xCAT work with ESXi 5.5?
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net

It will work, but the free license doesn't let us automate it. It requires eval or commercial license sadly. This is the case for older ones as well.

On Thu, Dec 4, 2014 at 4:32 PM, Josh Nielsen jniel...@hudsonalpha.org wrote:
> Hello all,
>
> I ran into this issue when trying to create a vm I had defined in xCAT in an environment where we have two ESXi 5.5 standalone servers:
>
>     # mkvm --verbose vm1 -s 100G
>     Error: esx plugin bug, pid 16602, process description: 'xCATd SSL: mkvm to vm1 for root@localhost: esx instance' with error ' SOAP Fault: --- Fault string: Current license or ESXi version prohibits execution of the requested operation. Fault detail: RestrictedVersionFault' while trying to fulfill request for the following nodes: vm1
>
> I installed version 5.5 update 2 of the VMWare Perl SDK (as is required) and I can contact the server with this cli command:
>
>     vmware-cmd -H esxihost0001.morgan.haib.org -l
>
> The documentation only uses 4.1 as an example: http://sourceforge.net/p/xcat/wiki/XCAT_Virtualization_with_VMWare/. What is the latest version of ESXi that xCAT will work with? If it does support 5.5 do I need to configure some kind of key or license file locally? I already ran my variation on:
>
>     chtab key=vmware passwd.username=root passwd.password=cluster
>
> for the xCAT passwd table.
>
> Regards,
> Josh Nielsen
[xcat-user] ESXi VM Discovery Deployment
Hello all,

Our organization is in the process of shifting our HPC model from an all physical/bare metal compute cluster to a virtualized compute cluster, making each physical compute node a standalone ESXi host (without vCenter licensing or central management). Because we are not using vCenter the vm-specific xCAT commands are not of much use to us, but I'm not so much concerned about that as with how to redesign/organize the discovery deployment process for VMs on the ESXi hosts.

With our current physical compute cluster we had used the ultra handy SNMP switch port discovery method to identify and label nodes with regular expressions, creating compute hosts with simple names like node0001, node0002, etc. Now the ESXi hosts take on those names and use the SNMP switch port discovery method for their naming, IP addresses, etc. But once that is done I need to determine how best to deploy VMs on top of those ESXi hosts and how discovery will work with them.

Our intended naming scheme will be to name each VM, per host, after the name of the ESXi host with letters appended to them. So say ESXi host node0001 will have three VMs deployed: we would name them node0001a, node0001b, and node0001c. From what I can tell I cannot use the SNMP method of identifying those VMs. Since I may have to create the VMs by hand anyway (or deploy from a template), perhaps I can use the most tedious method of manually populating the MAC addresses, but I am wondering if anyone has any better ideas for ways to accomplish that.

I would welcome any suggestions or pointers for things that I haven't thought of yet.

Thanks!
Josh Nielsen
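If the MAC addresses do end up being populated by hand, the naming scheme described above (ESXi host name plus a letter) is regular enough to script. The following is a minimal sketch, not a tested procedure: vm_cmds is a hypothetical helper, the group name "vm" and the MAC addresses are placeholders, and it only prints the nodeadd and chtab commands that would define each VM and record its MAC in the mac table. It handles at most 26 VMs per host because it runs out of letters.

```shell
#!/bin/sh
# vm_cmds HOST MAC...
# Name VMs HOSTa, HOSTb, ... in argument order and print the commands that
# would define each one and record its MAC address.
# Hypothetical helper; the group name "vm" is a placeholder.
vm_cmds() {
    host=$1
    shift
    letters=abcdefghijklmnopqrstuvwxyz
    i=1
    for mac in "$@"; do
        # Pick the i-th letter as the VM suffix.
        suffix=$(printf '%s' "$letters" | cut -c "$i")
        echo "nodeadd ${host}${suffix} groups=vm vm.host=${host}"
        echo "chtab node=${host}${suffix} mac.mac=${mac}"
        i=$((i + 1))
    done
}

# Placeholder MAC addresses for illustration only.
vm_cmds node0001 00:50:56:00:00:01 00:50:56:00:00:02 00:50:56:00:00:03
```

Assigning the MACs statically when the VMs are created, and feeding the same list to a script like this, keeps the xCAT definitions and the ESXi side in step without relying on switch-port discovery.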
Re: [xcat-user] ESXi VM Discovery Deployment
Also, what will the 'switch' xCAT table look like with multiple VMs on the same physical host, since the man page for it says it contains what switch port numbers each node is connected to?

On Mon, Jul 6, 2015 at 1:31 PM, Josh Nielsen jniel...@hudsonalpha.org wrote:
> Hello all, Our organization is in the process of shifting our HPC model from an all physical/bare metal compute cluster to a virtualized compute cluster, making each physical compute node a standalone ESXi host (without vCenter licensing or central management). [...]
[xcat-user] Service Node trying to update Master DNS
Hello all,

I have one xCAT MN and two SNs along with an external DNS server that they all interact with. The SNs act as slave DNS servers that only sync down changes made to the external (master) DNS server, which is updated by commands like 'makedns -e' from the MN only.

An issue I am running into, which I haven't tested since I moved to the MN/SN hierarchy model, is that during a PXE deploy the new node gets a DHCP IP from one of the SNs and then that SN tries to add the node's IP to the external DNS server and can't. The SNs are not authorized to update the DNS entries, since each one is a read-only slave that just syncs what it sees on the master external DNS server.

I can manually alter their named.conf to allow updates to be pushed up from the slave named servers on the SNs, but I wanted to check and see what others are doing for this issue first and whether you allow your SNs' DNS slaves to update the master DNS. Perhaps I configured something incorrectly to begin with.

One thing I do need to figure out though is why the node's entry in /var/lib/dhcpd/dhcpd.leases says "dynamic" instead of "fixed-address" and doesn't list the IP that shows for the node's config with an lsdef. Perhaps that is why the makedns -e doesn't seem to help, because it picks a different IP then tries to add it. Either way the SNs don't have permission to update DNS.

Thanks,
Josh Nielsen
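When comparing what dhcpd has recorded against what lsdef reports, it can help to print just the host block for one node from the leases file. The sketch below is a hypothetical helper (lease_host is not an xCAT or ISC tool) and assumes the usual layout where a host declaration starts with "host NAME {" and ends with a closing brace at the start of a line. As far as I know, a "dynamic;" statement inside a host block marks a declaration that was added at runtime through OMAPI rather than one written in dhcpd.conf, so on its own it may not mean that the address is dynamically assigned; treat that reading as something to verify against the dhcpd.leases man page.

```shell
#!/bin/sh
# lease_host NODE [FILE]
# Print the "host NODE { ... }" block(s) from a dhcpd.leases file.
# Hypothetical helper; FILE defaults to the path mentioned in the message.
lease_host() {
    node=$1
    file=${2:-/var/lib/dhcpd/dhcpd.leases}
    awk -v n="$node" '
        $1 == "host" && $2 == n { show = 1 }   # start of the wanted block
        show { print }                         # print every line inside it
        show && /^}/ { show = 0 }              # closing brace ends the block
    ' "$file"
}

# Example use (commented out; run on the service node that served the lease):
# lease_host node0001
# lsdef node0001 | grep -E 'ip=|mac='
```

If the printed block has a hardware ethernet line but no fixed-address line, that would be consistent with the node being offered an address from the dynamic range instead of its defined IP.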
Re: [xcat-user] Will xCAT work with ESXi 5.5?
Thanks Michael, that is useful information. We may end up using VMWare's own provisioning tools instead in this case.

-Josh

On Mon, Dec 8, 2014 at 5:20 PM, Fenn, Michael michael.f...@deshawresearch.com wrote:
> AFAIK, without (licensed) vCenter, you don't have access to the API endpoints that xCAT is looking for. So any xCAT command that needs to communicate with the API will not work. Pure xCAT DB and file manipulation commands will work. For example mkvm, rsetboot, and rpower will not work, but nodeset will.
>
> You can definitely deploy a VM on the (free) ESXi with xCAT, it's just a little more of a manual process. Once the VM boots, it does PXE just like a physical node would. So you would need to use nodeset to prep the VM for install, then manually create and power on the VM in the VI Client. The VM should then PXE boot and install like a normal cluster node.
>
> Michael
>
> From: Josh Nielsen jniel...@hudsonalpha.org
> Reply-To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
> Date: Monday, December 8, 2014 at 10:29 AM
> To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
> Subject: Re: [xcat-user] Will xCAT work with ESXi 5.5?
>
> Hello Jarrod, I think we are indeed using the free license. What exactly do you mean by "doesn't let us automate it"? [...]
Re: [xcat-user] Will xCAT work with ESXi 5.5?
Hello Jarrod,

I think we are indeed using the free license. What exactly do you mean by "doesn't let us automate it"? Do you mean, basically, that commands like mkvm won't work (mkvm is just a wrapper around the VMWare SDK functions, right?) and that, with the free license, we would have to create the VMs manually? If so, once they are created manually, can xCAT still deploy an image to them via ESX (following the rsetboot, nodeset, rpower instructions in the wiki) if we are using the free license? Or are you saying xCAT simply will not be able to deploy anything to a VM at all with the free ESX license?

Thanks,
Josh Nielsen

On Sat, Dec 6, 2014 at 2:58 PM, Jarrod Johnson jarrod.b.john...@gmail.com wrote:

It will work, but the free license doesn't let us automate it. It requires eval or commercial license sadly. This is the case for older ones as well.

On Thu, Dec 4, 2014 at 4:32 PM, Josh Nielsen jniel...@hudsonalpha.org wrote:

Hello all,

I ran into this issue when trying to create a vm I had defined in xCAT in an environment where we have two ESXi 5.5 standalone servers:

    # mkvm --verbose vm1 -s 100G
    Error: esx plugin bug, pid 16602, process description: 'xCATd SSL: mkvm to vm1 for root@localhost: esx instance' with error ' SOAP Fault: --- Fault string: Current license or ESXi version prohibits execution of the requested operation. Fault detail: RestrictedVersionFault' while trying to fulfill request for the following nodes: vm1

I installed version 5.5 update 2 of the VMWare Perl SDK (as is required) and I can contact the server with this cli command:

    vmware-cmd -H esxihost0001.morgan.haib.org -l

The documentation only uses 4.1 as an example: http://sourceforge.net/p/xcat/wiki/XCAT_Virtualization_with_VMWare/.

What is the latest version of ESXi that xCAT will work with? If it does support 5.5 do I need to configure some kind of key or license file locally? I already ran my variation on:

    chtab key=vmware passwd.username=root passwd.password=cluster

for the xCAT passwd table.

Regards,
Josh Nielsen
[xcat-user] Will xCAT work with ESXi 5.5?
Hello all,

I ran into this issue when trying to create a vm I had defined in xCAT in an environment where we have two ESXi 5.5 standalone servers:

    # mkvm --verbose vm1 -s 100G
    Error: esx plugin bug, pid 16602, process description: 'xCATd SSL: mkvm to vm1 for root@localhost: esx instance' with error ' SOAP Fault: --- Fault string: Current license or ESXi version prohibits execution of the requested operation. Fault detail: RestrictedVersionFault' while trying to fulfill request for the following nodes: vm1

I installed version 5.5 update 2 of the VMWare Perl SDK (as is required) and I can contact the server with this cli command:

    vmware-cmd -H esxihost0001.morgan.haib.org -l

The documentation only uses 4.1 as an example: http://sourceforge.net/p/xcat/wiki/XCAT_Virtualization_with_VMWare/.

What is the latest version of ESXi that xCAT will work with? If it does support 5.5 do I need to configure some kind of key or license file locally? I already ran my variation on:

    chtab key=vmware passwd.username=root passwd.password=cluster

for the xCAT passwd table.

Regards,
Josh Nielsen
Re: [xcat-user] Good xCAT Book
Hi John,

I had a similar experience to yours: the documentation only took me so far, and the rest was left to my real-world implementations (mostly troubleshooting when things didn't go as expected). xCAT has dependencies on many other things that are assumed to be working already (like SNMP on switches if you are using network discovery, to take one example among many), and so has many moving pieces.

I too found the Sumavi documentation useful to an extent. I have over 20 pages of notes in a Google doc of my own observations on specific xCAT things I have run into. To add to your list of documentation, though, I discovered this little gem of a presentation given in 2011 by Jordi Caubet, working at IBM Spain: http://www.bsc.es/media/4373.pdf. It may not cover everything you need, but I found that it had details about xCAT that you cannot find elsewhere. For example, on slide 56 he shows, in a very useful picture, the exact order in which deployment template scripts are parsed, which is information not even to be found in the sourceforge wiki (maybe someone reading this can fix that?).

Anyway, maybe that will give you something you are looking for.

Regards,
Josh Nielsen

On Thu, Oct 16, 2014 at 2:46 PM, Russell Jones russell-l...@jonesmail.me wrote:

This is a good start. It's a little bit old and does some things in a different way than I would, but the overall information in it is great: http://sumavi.com/books/xcat-administrators-guide

On 10/16/2014 2:35 PM, John Hosie wrote:

I'm working on a government contract to support an HPC environment that uses xCAT. I've spent over 7 1/2 months working with a document that really doesn't say much more than how to get hardware support that was produced by the vendor who did the initial installation. I've also used what I could find in Google. But, to tell you the truth, none of it really seems to be all that good. It tells me about this or that implementation, and can be adapted to meet my needs, but it takes time and effort that seems to be more than should be necessary. Now, it may be better than it once was. I'll give you that. But I'd like to know if there is a good published document out there I can look for, electronic or pulp, that I can use to give me better guidance? What is there out there? And are there any sites you've run across that might help me to fill in the gaps in my current knowledge?

Don't get me wrong. I really do appreciate the pieces of documentation I've found. I'd just like to find something that would help me to be able to tie it all together better.

Sincerely,
*John W. Hosie III*
*jwh3*
Re: [xcat-user] Good xCAT Book
Actually, it occurs to me to ask: Can Jordi's presentation be uploaded to the sourceforge wiki so that it can benefit more people? -Josh On Mon, Oct 20, 2014 at 11:52 AM, Josh Nielsen jniel...@hudsonalpha.org wrote: Hi John, I had a similar experience to you in the matter of the documentation only taking me so far and the rest was left to my real world implementations (mostly troubleshooting when things didn't go as expected). xCAT has dependencies on many other things that are assumed to be working already (like SNMP on switches if you are using network discovery, to take one example among many), and so has many moving pieces. I too found the Sumavi documentation useful to an extent. I have over 20 pages of notes in a google doc of my own observations on specific xCAT things I have run into. To add to your list of documentation though, I discovered this little gem of a presentation by Jordi Caubet working at IBM Spain given in 2011: http://www.bsc.es/media/4373.pdf. It may not cover everything you need but I actually found that it had details about xCAT that you cannot find elsewhere. For example on slide 56 he shows the exact order that deployment template scripts are parsed in in a very useful picture, which is information not even to be found in the sourceforge wiki (maybe someone reading this can fix that?). Anyway, maybe that will give you something you are looking for. Regards, Josh Nielsen On Thu, Oct 16, 2014 at 2:46 PM, Russell Jones russell-l...@jonesmail.me wrote: This is a good start. It's a little bit old and does some things in a different way than I would, but the overall information in it is great: http://sumavi.com/books/xcat-administrators-guide On 10/16/2014 2:35 PM, John Hosie wrote: I'm working on a government contract to support an HPC environment that uses xCAT. 
I've spent over 7 1/2 months working with a document that really doesn't say much more than how to get hardware support that was produced by the vendor who did the initial installation. I've also used what I could find in Google. But, to tell you the truth, none of it really seems to be all that good. It tells me about this or that implementation, and can be adapted to meet my needs, but it takes time and effort that seems to be more than should be necessary. Now, it may be better than it once was. I'll give you that. But I'd like to know if there is a good published document out there I can look for, electronic or pulp, that I can use to give me better guidance? What is there out there? And are there any sites you've run across that might help me to fill in the gaps in my current knowledge? Don't get me wrong. I really do appreciate the pieces of documentation I've found. I'd just like to find something that would help me to be able to tie it all together better. Sincerely, *John W. Hosie III* *jwh3*
Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e
Lissa / Xiao,

I may have discovered a bug(?) in ddns.pm, unless it is a print formatting issue. I traced the autovivification of the $ctx hash in ddns.pm throughout the code, and then wrote a few lines of test code to descend into all the hash values of $ctx and print them out (prefixed with the string "CTX Key/Value"). For $ctx->{resolver} the value looks like it may not have executed the Net::DNS::Resolver Perl call correctly, and instead treated the Net::DNS::Resolver function as a literal string (what caught my eye is that the string has a space between the two colons after Net, e.g. "Net: :"). Here is what my code printed for the key/value of $ctx->{resolver}:

    CTX Key/Value ||resolver|| = Net: :DNS::Resolver=HASH(0x2ca6b08)

Maybe the actual Perl module call Net: :DNS::Resolver is meant to be literally placed in the value of $ctx->{resolver} though, so I may be mistaken. By contrast, however, some of the other key/value pairs under $ctx looked like this:

    CTX Key/Value ||domain|| = morgan.haib.org
    CTX Key/Value ||forwarders|| = ARRAY(0x2b9b2d8)
    ||forwarders|| (ARRAY VALUE) 172.24.12.1
    ||forwarders|| (ARRAY VALUE) 172.24.12.2

The relevant code on line 723 of ddns.pm is:

    if ($external) {
        # based on /etc/resolv.conf
        $ctx->{resolver} = Net::DNS::Resolver->new();
    } else {
        # default to localhost
        $ctx->{resolver} = Net::DNS::Resolver->new(nameservers => ['127.0.0.1']);
    }
    my $ret = add_or_delete_records($ctx);
    unless ($ret) {
        xCAT::SvrUtils::sendmsg("DNS setup is completed", $callback);
    }

Is the value of $ctx->{resolver} normally supposed to be Net::DNS::Resolver=HASH(0x2ca6b08) (once I remove the whitespace between the two colons)? Or is it not supposed to include the Perl module call?

Regards,
Josh

On Thu, Oct 2, 2014 at 1:47 PM, Josh Nielsen jniel...@hudsonalpha.org wrote:

Okay, so I moved to an RPM based install of BIND instead of from source and the problem did not go away.
My setup is that I am using a development machine which I exported the current xCAT settings that are on my iDataPlex headnode to that dev VM which I installed xCAT on (definitely a newer version on the VM than the headnode) and when I first ran restorexCATdb I got errors when running 'makedns' which said: *Ignoring host node0014, it does not belong to any nets defined in networks table or the net it belongs to is configured to use an external nameserver*. I thought: That's odd, because the network definition looks fine to me and I don't see why xCAT would change it between versions. On the headnode the networks database looks like this for the relevant network (compute) a secondary network (I snipped out the other entries): #netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,staticrange,staticrangeincrement,nodehostname,ddnsdomain,vlanid,domain,comments,disable compute,10.20.0.0,255.255.0.0,eth0,,,10.20.0.1,10.20.0.1,,,10.20.200.254-10.20.254.254 10gig,10.60.0.0,255.255.0.0,,10.20.0.1,, When I restored the tables on the dev VM it would fail to parse compute but not for 10gig (the only obvious difference being that 10gig had less fields filled in). I modified the ddns.pm code to echo messages when parsing the networks to determine what was going on, and once I discovered that it liked 10gig but not compute I deleted all the extra fields in compute to make it match 10gig and suddenly it would parse. Maybe it has to do with the dynamicrange field in the networks definition? Below are my code modifications (just adding sendmsg commands) and the corresponding output from a test execution: # exclude the nodes not belong to any nets defined in networks table # because only the nets defined in networks table will be add # zones later. 
    my $found = 0;
    foreach (@networks) {
        xCAT::SvrUtils::sendmsg("ADDR is $addr", $callback);
        xCAT::SvrUtils::sendmsg("MASK is $_->{mask}", $callback);
        xCAT::SvrUtils::sendmsg("NETWORK is $_->{net}", $callback);
        if (xCAT::NetworkUtils->ishostinsubnet($addr, $_->{mask}, $_->{net})) {
            $found = 1;
            xCAT::SvrUtils::sendmsg("Found!", $callback);
        } else {
            xCAT::SvrUtils::sendmsg("Not Found!", $callback);
        }
    }
    if ($found) {
        push @nodes, $node;
        $ctx->{nodeips}->{$node}->{$addr} = 1;
    } else {
        unless ($node =~ /localhost/) {
            xCAT::SvrUtils::sendmsg(":Ignoring host $node, it does not belong to any nets defined in networks table or the net it belongs to is configured to use an external nameserver.", $callback
Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e
Okay, so I moved to an RPM based install of BIND instead of from source and the problem did not go away.

My setup is this: I exported the current xCAT settings from my iDataPlex headnode to a development VM on which I installed xCAT (definitely a newer version on the VM than on the headnode). When I first ran restorexCATdb I got errors when running 'makedns' which said: *Ignoring host node0014, it does not belong to any nets defined in networks table or the net it belongs to is configured to use an external nameserver*. I thought: "That's odd, because the network definition looks fine to me and I don't see why xCAT would change it between versions."

On the headnode the networks database looks like this for the relevant network (compute) and a secondary network (I snipped out the other entries):

    #netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,staticrange,staticrangeincrement,nodehostname,ddnsdomain,vlanid,domain,comments,disable
    compute,10.20.0.0,255.255.0.0,eth0,,,10.20.0.1,10.20.0.1,,,10.20.200.254-10.20.254.254
    10gig,10.60.0.0,255.255.0.0,,10.20.0.1,,

When I restored the tables on the dev VM it would fail to parse compute but not 10gig (the only obvious difference being that 10gig had fewer fields filled in). I modified the ddns.pm code to echo messages when parsing the networks to determine what was going on, and once I discovered that it liked 10gig but not compute I deleted all the extra fields in compute to make it match 10gig and suddenly it would parse. Maybe it has to do with the dynamicrange field in the networks definition?

Below are my code modifications (just adding sendmsg commands) and the corresponding output from a test execution:

    # exclude the nodes not belong to any nets defined in networks table
    # because only the nets defined in networks table will be add
    # zones later.
    my $found = 0;
    foreach (@networks) {
        xCAT::SvrUtils::sendmsg("ADDR is $addr", $callback);
        xCAT::SvrUtils::sendmsg("MASK is $_->{mask}", $callback);
        xCAT::SvrUtils::sendmsg("NETWORK is $_->{net}", $callback);
        if (xCAT::NetworkUtils->ishostinsubnet($addr, $_->{mask}, $_->{net})) {
            $found = 1;
            xCAT::SvrUtils::sendmsg("Found!", $callback);
        } else {
            xCAT::SvrUtils::sendmsg("Not Found!", $callback);
        }
    }
    if ($found) {
        push @nodes, $node;
        $ctx->{nodeips}->{$node}->{$addr} = 1;
    } else {
        unless ($node =~ /localhost/) {
            xCAT::SvrUtils::sendmsg(":Ignoring host $node, it does not belong to any nets defined in networks table or the net it belongs to is configured to use an external nameserver.", $callback);
        }
    }

---
My test run:

    # makedns -e node0014
    Handling node0014 in /etc/hosts.
    ADDR is 10.20.101.14
    MASK is
    NETWORK is
    Not Found!
    ADDR is 10.20.101.14
    MASK is
    NETWORK is
    Not Found!
    ADDR is 10.20.101.14
    MASK is 255.255.0.0
    NETWORK is 10.40.0.0
    Not Found!
    ADDR is 10.20.101.14
    MASK is 255.255.0.0
    NETWORK is 10.20.0.0
    Found!
    -
    ADDR is 10.20.101.14
    MASK is 255.255.0.0
    NETWORK is 10.60.0.0
    Not Found!
    -

As you can see, some other networks are being parsed as blank (the first two encountered in the foreach loop), but after I modified the compute / 10.20.0.0 network it finally printed my "Found!" message (previously it was also parsing as blank). I suspect that there are other lurking problems in the tables between the two xCAT versions due to the backup/restore of the databases which might be causing this problem. I'm not sure how to address this without a full reinstall of xCAT from scratch if a backup/restore between versions is not possible/compatible. I'm exploring alternatives though.
Regards, Josh On Wed, Oct 1, 2014 at 10:45 AM, Josh Nielsen jniel...@hudsonalpha.org wrote: So I'm thinking that either: A) My DNS server (BIND) is somehow misconfigured (which may include some bug when BIND is compiled from the latest source) - even though the keys obviously work in some instances and I am indeed getting partial remote updates successfully. - OR - B) There is a bug in makedns or the underlying config/tools it uses on the OS. Possibly there was some unclean transfer or mismatch of settings when I used dumpxCATdb/restorexCATdb, even though I updated the site table and have my resolv.conf pointing only to the external DNS server. Even so, as I pointed out, some of the entries are being correctly sent with the key, but I am occasionally seeing request is not signed for some requests OR I am not seeing the request for a forward lookup entry at all (which
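The Found!/Not Found! trace in the Oct 2 message comes down to a subnet membership test, and to what happens when a networks row comes back with an empty mask or net. The following is a minimal Python sketch of that logic, not xCAT's Perl code: the function name host_in_subnet and the choice to treat blank fields as a non-match are assumptions made for illustration. The addresses are the ones from the test run.

```python
import ipaddress


def host_in_subnet(addr, mask, net):
    """Return True if addr falls inside net/mask.

    Hypothetical helper for illustration only; blank or malformed
    mask/net values (as in the first rows of the trace) never match.
    """
    if not mask or not net:
        return False
    try:
        network = ipaddress.ip_network(f"{net}/{mask}", strict=False)
        return ipaddress.ip_address(addr) in network
    except ValueError:
        return False


# (mask, net) pairs in the order they appeared in the debug trace.
rows = [
    ("", ""),
    ("", ""),
    ("255.255.0.0", "10.40.0.0"),
    ("255.255.0.0", "10.20.0.0"),
    ("255.255.0.0", "10.60.0.0"),
]

results = [host_in_subnet("10.20.101.14", mask, net) for mask, net in rows]
for (mask, net), found in zip(rows, results):
    label = "Found!" if found else "Not Found!"
    print(f"MASK={mask or '<blank>'} NETWORK={net or '<blank>'} -> {label}")
```

Under these assumptions only the 10.20.0.0 row matches, which is consistent with the trace: a row whose mask and net are read back as blank cannot match any host, whatever the host address is.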
Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e
So I'm thinking that either: A) My DNS server (BIND) is somehow misconfigured (which may include some bug when BIND is compiled from the latest source) - even though the keys obviously work in some instances and I am indeed getting partial remote updates successfully. - OR - B) There is a bug in makedns or the underlying config/tools it uses on the OS. Possibly there was some unclean transfer or mismatch of settings when I used dumpxCATdb/restorexCATdb, even though I updated the site table and have my resolv.conf pointing only to the external DNS server. Even so, as I pointed out, some of the entries are being correctly sent with the key, but I am occasionally seeing request is not signed for some requests OR I am not seeing the request for a forward lookup entry at all (which is what happened the first few times I tried). I just can't tell whether it is a sending problem (makedns/nsupdate) or a receiving problem (external BIND server). -Josh On Tue, Sep 30, 2014 at 10:29 AM, Josh Nielsen jniel...@hudsonalpha.org wrote: Xiao, For some additional details I am running BIND 9.10.0-P2 which I compiled from source. I used --enable-largefile which specfies 64-bit file support but I noticed this in the kernel boot messages in /var/log/messages: Sep 29 11:25:25 dns01 kernel: warning: `named' uses 32-bit capabilities (legacy support in use) When I start named in the foreground here are the first few lines, where you can see my compile options and a few things about startup: 30-Sep-2014 09:59:20.672 built with '--prefix=/opt/bind9' '--sysconfdir=/etc' '--with-gtest' '--with-log4cplus=/opt/log4cplus' '--with-pythonpath=/usr/bin/python' '--localstatedir=/var' '--mandir=/usr/share/man' '--enable-threads' '--enable-largefile' '--with-libtool' '--disable-static' '--with-openssl' 30-Sep-2014 09:59:20.672 30-Sep-2014 09:59:20.672 BIND 9 is maintained by Internet Systems Consortium, 30-Sep-2014 09:59:20.672 Inc. 
(ISC), a non-profit 501(c)(3) public-benefit 30-Sep-2014 09:59:20.672 corporation. Support and training for BIND 9 are 30-Sep-2014 09:59:20.672 available at https://www.isc.org/support 30-Sep-2014 09:59:20.672 30-Sep-2014 09:59:20.672 adjusted limit on open files from 4096 to 1048576 30-Sep-2014 09:59:20.672 found 1 CPU, using 1 worker thread 30-Sep-2014 09:59:20.672 using 1 UDP listener per interface 30-Sep-2014 09:59:20.672 using up to 4096 sockets 30-Sep-2014 09:59:20.672 Registering DLZ_dlopen driver 30-Sep-2014 09:59:20.672 Registering SDLZ driver 'dlopen' 30-Sep-2014 09:59:20.672 Registering DLZ driver 'dlopen' 30-Sep-2014 09:59:20.676 decrement_reference: delete from rbt: 0x7ff551ecd148 . 30-Sep-2014 09:59:20.678 socket 0x7ff551ede010: created 30-Sep-2014 09:59:20.678 sockmgr 0x7ff551ec6010: watcher got message -3 for socket 20 30-Sep-2014 09:59:20.678 sockmgr 0x7ff551ec6010: watcher got message -2 for socket -1 30-Sep-2014 09:59:20.678 socket 0x7ff551ede010: socket_recv: event 0x7ff551ee0010 - task 0x7ff551ed9250 30-Sep-2014 09:59:20.678 loading configuration from '/etc/named.conf' 30-Sep-2014 09:59:20.679 reading built-in trusted keys from file '/etc/bind.keys' I've been planning to move to an RPM based installation rolled out with CentOS but was testing with a source install first. Does any of this look like a likely culprit for the problems I'm seeing? Thanks, Josh On Mon, Sep 29, 2014 at 9:40 PM, Xiao Peng Wang w...@cn.ibm.com wrote: Then, in this case, did you check the 'allow of key xcat_key' has been set correctly in name.conf for zone morgan.haib.org? And did you see any useful message for this error in the syslog of external server? 
Thanks Best Regards -- Wang Xiaopeng (王晓朋) IBM China System Technology Laboratory Tel: 86-10-82453455 Email: w...@cn.ibm.com Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193 [image: Inactive hide details for Josh Nielsen ---2014/09/30 06:49:50---Okay, I have the external DNS server working: partly. For some]Josh Nielsen ---2014/09/30 06:49:50---Okay, I have the external DNS server working: partly. For some very odd reason the external DNS serv From: Josh Nielsen jniel...@hudsonalpha.org To: xCAT Users Mailing list xcat-user@lists.sourceforge.net Date: 2014/09/30 06:49 Subject: Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e -- Okay, I have the external DNS server working: partly. For some very odd reason the external DNS server is only receiving the request to enter the reverse lookup for a new node (ex: makedns -e node0014), but does not even attempt to add the forward lookup zone. I see the key authorization passed successfully for the reverse entry, so there are no authentication issues
Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e
Hi Xiao,

Here is the relevant zone:

    zone "morgan.haib.org" in {
        type master;
        file "morgan/db.morgan.haib.org";
        allow-transfer { 10.20.0.100/32; 10.20.0.101/32; };
        allow-update { key xcat_key; };
        notify yes;
    };

Its allow-update looks the same as the one for zone 20.10.IN-ADDR.ARPA. I saw no messages in /var/log/messages concerning named. But I did just try it again this morning and the opposite happened: 'morgan.haib.org' updated but 20.10.IN-ADDR.ARPA. did not. So the issue is very inconsistent, other than that one of the two entries gets omitted. See below (which I have shortened some), and notice where it says "request is not signed" for the 20.10 zone, but it signed (this time!) the morgan.haib.org zone:

    30-Sep-2014 10:01:39.446 socket 0x7ff551edebc8 172.26.42.60#58640: packet received correctly
    30-Sep-2014 10:01:39.446 client 172.26.42.60#58640: UDP request
    30-Sep-2014 10:01:39.446 client 172.26.42.60#58640: using view '_default'
    30-Sep-2014 10:01:39.446 client 172.26.42.60#58640: request is not signed
    30-Sep-2014 10:01:39.446 client 172.26.42.60#58640: recursion available
    30-Sep-2014 10:01:39.446 client 172.26.42.60#58640: query
    30-Sep-2014 10:01:39.446 client 172.26.42.60#58640 (20.10.IN-ADDR.ARPA): ns_client_attach: ref = 1
    30-Sep-2014 10:01:39.446 client 172.26.42.60#58640 (20.10.IN-ADDR.ARPA): query '20.10.IN-ADDR.ARPA/NS/IN' approved
    30-Sep-2014 10:01:39.446 client 172.26.42.60#58640 (20.10.IN-ADDR.ARPA): send
    30-Sep-2014 10:01:39.446 client 172.26.42.60#58640 (20.10.IN-ADDR.ARPA): sendto
    30-Sep-2014 10:01:39.446 client 172.26.42.60#58640 (20.10.IN-ADDR.ARPA): senddone
    30-Sep-2014 10:01:39.446 client 172.26.42.60#58640 (20.10.IN-ADDR.ARPA): next
    30-Sep-2014 10:01:39.446 client 172.26.42.60#58640 (20.10.IN-ADDR.ARPA): ns_client_detach: ref = 0
    30-Sep-2014 10:01:39.446 client 172.26.42.60#58640 (20.10.IN-ADDR.ARPA): endrequest
    30-Sep-2014 10:01:39.446 client @0x7ff5480d7710: udprecv
    ...
    ...
    30-Sep-2014 10:01:39.454 socket 0x7ff551edebc8 172.26.42.60#42345: packet received correctly
    30-Sep-2014 10:01:39.454 client 172.26.42.60#42345: UDP request
    30-Sep-2014 10:01:39.454 client 172.26.42.60#42345: using view '_default'
    30-Sep-2014 10:01:39.455 client 172.26.42.60#42345: request has valid signature: xcat_key
    30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key: recursion available
    30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key: update
    30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key: ns_client_attach: ref = 1
    30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key: updating zone 'morgan.haib.org/IN': prerequisites are OK
    30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key: signer xcat_key approved
    30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key: update 'morgan.haib.org/IN' approved
    30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key: updating zone 'morgan.haib.org/IN': update section prescan OK
    30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key: updating zone 'morgan.haib.org/IN': adding an RR at 'node0014.morgan.haib.org' A 10.20.101.14
    30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key: updating zone 'morgan.haib.org/IN': checking for NSEC3PARAM changes
    30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key: updating zone 'morgan.haib.org/IN': writing journal morgan/db.morgan.haib.org.jnl
    30-Sep-2014 10:01:39.455 writing to journal
    30-Sep-2014 10:01:39.455 del morgan.haib.org. 86400 IN SOA dns01.morgan.haib.org. root.dns01.morgan.haib.org. 2012080935 10800 3600 604800 86400
    30-Sep-2014 10:01:39.455 add morgan.haib.org. 86400 IN SOA dns01.morgan.haib.org. root.dns01.morgan.haib.org. 2012080936 10800 3600 604800 86400
    30-Sep-2014 10:01:39.455 add node0014.morgan.haib.org. 86400 IN A 10.20.101.14
    30-Sep-2014 10:01:39.476 client 172.26.42.60#42345/key xcat_key: updating zone 'morgan.haib.org/IN': committing update transaction
    30-Sep-2014 10:01:39.477 zone_needdump: zone morgan.haib.org/IN: enter
    30-Sep-2014 10:01:39.477 zone_settimer: zone morgan.haib.org/IN: enter
    30-Sep-2014 10:01:39.477 zone_settimer: zone morgan.haib.org/IN: enter

I'll send another email with other possibly relevant details to keep this one's length down.

-Josh

On Mon, Sep 29, 2014 at 9:40 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

Then, in this case, did you check the 'allow of key xcat_key' has been set correctly in name.conf for zone morgan.haib.org? And did you see any useful message for this error in the syslog of external server?

Thanks
Best Regards
--
Wang Xiaopeng (王晓朋)
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: w...@cn.ibm.com
Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193

Josh Nielsen ---2014/09/30 06:49:50---Okay, I have the external DNS server working: partly. For some very odd reason the external
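In the named debug excerpt in this message, the unsigned request from port 58640 is logged as a query, while the signed request from port 42345 is logged as an update. One way to check that pairing across a longer log is to group entries by client address and port. The sketch below is a rough Python illustration that assumes the log format matches the lines quoted here; the summarize function and its field names are hypothetical, not part of BIND or xCAT.

```python
import re
from collections import defaultdict

# Matches e.g. "client 172.26.42.60#58640: request is not signed"
# and "client 172.26.42.60#42345/key xcat_key: update".
CLIENT_RE = re.compile(
    r"client (?P<client>\d+\.\d+\.\d+\.\d+#\d+)(?:/key \S+)?: (?P<msg>.+)$"
)


def summarize(lines):
    """Group signature state and request kind by client address#port."""
    summary = defaultdict(lambda: {"signed": None, "kind": None})
    for line in lines:
        match = CLIENT_RE.search(line)
        if not match:
            continue
        entry = summary[match["client"]]
        msg = match["msg"].strip()
        if msg == "request is not signed":
            entry["signed"] = False
        elif msg.startswith("request has valid signature"):
            entry["signed"] = True
        elif msg in ("query", "update"):
            entry["kind"] = msg
    return dict(summary)


# A few lines copied from the excerpt quoted in this message.
sample = [
    "30-Sep-2014 10:01:39.446 client 172.26.42.60#58640: request is not signed",
    "30-Sep-2014 10:01:39.446 client 172.26.42.60#58640: query",
    "30-Sep-2014 10:01:39.455 client 172.26.42.60#42345: request has valid signature: xcat_key",
    "30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key: update",
    "30-Sep-2014 10:01:39.455 writing to journal",
]

report = summarize(sample)
for client, info in report.items():
    print(client, info)
```

If every unsigned entry in a full log turns out to be a plain query rather than an update, the "request is not signed" lines would not by themselves indicate a key problem; that reading is a suggestion for what to look for, not a conclusion the thread reaches.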
Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e
Xiao, For some additional details I am running BIND 9.10.0-P2 which I compiled from source. I used --enable-largefile which specfies 64-bit file support but I noticed this in the kernel boot messages in /var/log/messages: Sep 29 11:25:25 dns01 kernel: warning: `named' uses 32-bit capabilities (legacy support in use) When I start named in the foreground here are the first few lines, where you can see my compile options and a few things about startup: 30-Sep-2014 09:59:20.672 built with '--prefix=/opt/bind9' '--sysconfdir=/etc' '--with-gtest' '--with-log4cplus=/opt/log4cplus' '--with-pythonpath=/usr/bin/python' '--localstatedir=/var' '--mandir=/usr/share/man' '--enable-threads' '--enable-largefile' '--with-libtool' '--disable-static' '--with-openssl' 30-Sep-2014 09:59:20.672 30-Sep-2014 09:59:20.672 BIND 9 is maintained by Internet Systems Consortium, 30-Sep-2014 09:59:20.672 Inc. (ISC), a non-profit 501(c)(3) public-benefit 30-Sep-2014 09:59:20.672 corporation. Support and training for BIND 9 are 30-Sep-2014 09:59:20.672 available at https://www.isc.org/support 30-Sep-2014 09:59:20.672 30-Sep-2014 09:59:20.672 adjusted limit on open files from 4096 to 1048576 30-Sep-2014 09:59:20.672 found 1 CPU, using 1 worker thread 30-Sep-2014 09:59:20.672 using 1 UDP listener per interface 30-Sep-2014 09:59:20.672 using up to 4096 sockets 30-Sep-2014 09:59:20.672 Registering DLZ_dlopen driver 30-Sep-2014 09:59:20.672 Registering SDLZ driver 'dlopen' 30-Sep-2014 09:59:20.672 Registering DLZ driver 'dlopen' 30-Sep-2014 09:59:20.676 decrement_reference: delete from rbt: 0x7ff551ecd148 . 
30-Sep-2014 09:59:20.678 socket 0x7ff551ede010: created 30-Sep-2014 09:59:20.678 sockmgr 0x7ff551ec6010: watcher got message -3 for socket 20 30-Sep-2014 09:59:20.678 sockmgr 0x7ff551ec6010: watcher got message -2 for socket -1 30-Sep-2014 09:59:20.678 socket 0x7ff551ede010: socket_recv: event 0x7ff551ee0010 - task 0x7ff551ed9250 30-Sep-2014 09:59:20.678 loading configuration from '/etc/named.conf' 30-Sep-2014 09:59:20.679 reading built-in trusted keys from file '/etc/bind.keys' I've been planning to move to an RPM based installation rolled out with CentOS but was testing with a source install first. Does any of this look like a likely culprit for the problems I'm seeing? Thanks, Josh On Mon, Sep 29, 2014 at 9:40 PM, Xiao Peng Wang w...@cn.ibm.com wrote: Then, in this case, did you check the 'allow of key xcat_key' has been set correctly in name.conf for zone morgan.haib.org? And did you see any useful message for this error in the syslog of external server? Thanks Best Regards -- Wang Xiaopeng (王晓朋) IBM China System Technology Laboratory Tel: 86-10-82453455 Email: w...@cn.ibm.com Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193 [image: Inactive hide details for Josh Nielsen ---2014/09/30 06:49:50---Okay, I have the external DNS server working: partly. For some]Josh Nielsen ---2014/09/30 06:49:50---Okay, I have the external DNS server working: partly. For some very odd reason the external DNS serv From: Josh Nielsen jniel...@hudsonalpha.org To: xCAT Users Mailing list xcat-user@lists.sourceforge.net Date: 2014/09/30 06:49 Subject: Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e -- Okay, I have the external DNS server working: partly. For some very odd reason the external DNS server is only receiving the request to enter the reverse lookup for a new node (ex: makedns -e node0014), but does not even attempt to add the forward lookup zone. 
I see the key authorization passed successfully for the reverse entry, so there are no authentication issues, and when I try a manual 'nsupdate' it adds the forward lookup definition just fine:

# nsupdate -k /etc/rndc.key
server [external_dns_ip]
prereq nxdomain node0014.morgan.haib.org.
update add node0014.morgan.haib.org. 300 A 10.20.101.14
send

It looks like 'makedns -e node0014' is somehow sending ONLY the reverse lookup definition (though this is only a guess). This is what the client sees:

[root@JNDev ~]# makedns -e node0014
Handling node0014 in /etc/hosts.
Getting reverse zones, this may take several minutes for a large cluster.
Completed getting reverse zones.
Updating DNS records, this may take several minutes for a large cluster.
Error: No reply received when sending DNS update to zone morgan.haib.org.
Completed updating DNS records.

It updates my reverse zone '20.10.IN-ADDR.ARPA' in the file 'db.10.20' but for the DNS zone 'morgan.haib.org' I see the message: Error: No reply received when sending DNS update to zone
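The manual nsupdate test described in this message can be scripted so it is easy to repeat for other nodes. The following is a minimal sketch, not anything shipped with xCAT or BIND: `build_nsupdate_batch` is a hypothetical helper name, and 192.0.2.53 is a placeholder for the external DNS server's address. It only prints the batch; the commented line shows how it might be fed to nsupdate with the key file mentioned in the thread.

```shell
#!/bin/sh
# Sketch only: rebuild the nsupdate batch that was typed by hand in the thread.
# build_nsupdate_batch is a hypothetical helper, not part of xCAT or BIND.
build_nsupdate_batch() {
    server="$1"   # address of the external DNS server
    fqdn="$2"     # fully qualified node name, without the trailing dot
    ttl="$3"      # record TTL in seconds
    ip="$4"       # IPv4 address for the A record
    printf 'server %s\n' "$server"
    printf 'prereq nxdomain %s.\n' "$fqdn"
    printf 'update add %s. %s A %s\n' "$fqdn" "$ttl" "$ip"
    printf 'send\n'
}

# Print the batch for review; 192.0.2.53 is a placeholder address.
build_nsupdate_batch 192.0.2.53 node0014.morgan.haib.org 300 10.20.101.14

# To apply it, the batch could be piped into nsupdate on standard input:
#   build_nsupdate_batch 192.0.2.53 node0014.morgan.haib.org 300 10.20.101.14 \
#       | nsupdate -k /etc/rndc.key
```

Printing the batch first makes it possible to compare what a hand-built update contains against what the server logs when 'makedns -e' runs.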
Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e
Thank you for the clarifications. I read the wiki entry on Cluster Name Resolution but wanted to be sure of the particulars.

Regards, Josh

On Thu, Sep 25, 2014 at 10:36 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

#1, you are right, you need prepare configuration files in external dns server.
#2, the xcat_key is gotten from passwd table like this 'omapi,xcat_key,TjFYM2kwTUNOcWVVZG5QNWFhb2xPVkg1eTZLMXpuSGs='

Thanks Best Regards -- Wang Xiaopeng (王晓朋) IBM China System Technology Laboratory Tel: 86-10-82453455 Email: w...@cn.ibm.com Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193

From: Josh Nielsen jniel...@hudsonalpha.org
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 2014/09/25 21:58
Subject: Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e

Hi Xiao, Thanks for the response. So if I have this straight, I need to manually create and maintain the zone files on the external DNS server, and already have initially empty (but present) zone files in place. Is that right? And I imagine that I will need to allow-update from the IP of the xCAT server too. Secondly, I still don't quite understand about the xcat_key. On the server hosting xCAT the xcat_key is typically defined in /etc/rndc.key and in /etc/named.conf. I should not even need /etc/named.conf since I am not hosting DNS on the same server as xCAT correct? Or does makedns have some sort of dependency on a local /etc/named.conf even when pushing externally? If /etc/named.conf is not needed, however, then where does makedns (which I think ends up running 'nsupdate') read the key from?
Note that I have already configured the xcat_key on the external DNS server in its own /etc/named.conf, but I am not asking about that host but rather the xCAT server (which for purposes of DNS I am calling the client).

Thanks, Josh

On Wed, Sep 24, 2014 at 8:59 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

You can NOT depend on xCAT to setup dns server on remote server. You need get the xcat_key from local dns configuration. You need prepare the remote dns ready so that xCAT could push the new dns entry to the remote dns server, this is what xCAT 'makedns -e' does.

Thanks Best Regards -- Wang Xiaopeng (王晓朋) IBM China System Technology Laboratory Tel: 86-10-82453455 Email: w...@cn.ibm.com Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193

From: Josh Nielsen jniel...@hudsonalpha.org
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 2014/09/25 05:46
Subject: Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e

I'm starting to see an older conversation that I had back in January of this year in a different light now: http://permalink.gmane.org/gmane.comp.clustering.xcat.user/182 There it was suggested that I be aware that you may need to run a separate makedns to populate the local zone files as well as makedns -e to do the external name servers. Does this mean that an xCAT installation is required on the external DNS server just to populate the zones initially?
Wouldn't having two xCAT installations, whose records you have to keep in sync, get tedious after a while? Maybe I'm misunderstanding how the external DNS solution is supposed to work in connection with xCAT. Any clarifications are welcome. -Josh On Wed, Sep 24, 2014 at 4:03 PM, Josh Nielsen *jniel...@hudsonalpha.org* jniel...@hudsonalpha.org wrote: Hello all, I am in the process of trying to move to an external DNS implementation for name resolution with our compute cluster. The only requirement I see in the man page for makedns is to have one (and only one?) IP of the external DNS server that you want to update in /etc/resolv.conf and also a valid xcat_key. Firstly
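The reply quoted earlier in this message says the xcat_key comes from the xCAT passwd table, in a row of the form 'omapi,xcat_key,<secret>'. As a rough sketch of pulling that secret out of such a row: `extract_xcat_key` is a hypothetical helper, the secret shown is a placeholder, and the assumption that `tabdump passwd` emits comma-separated (possibly double-quoted) rows should be checked against your own xCAT version.

```shell
#!/bin/sh
# Sketch only: read passwd-table rows on stdin and print the secret stored
# for key "omapi", username "xcat_key" (the row format quoted in the thread).
# extract_xcat_key is a hypothetical helper, not an xCAT command.
extract_xcat_key() {
    # Drop any double quotes, then match on the first two comma-separated fields.
    tr -d '"' | awk -F, '$1 == "omapi" && $2 == "xcat_key" { print $3; exit }'
}

# Example using the row layout from the thread with a placeholder secret.
printf '%s\n' 'omapi,xcat_key,c2VjcmV0LXBsYWNlaG9sZGVy' | extract_xcat_key

# On a management node the rows might instead come from the table itself:
#   tabdump passwd | extract_xcat_key
```

The extracted value is the shared secret that the external server's key definition for xcat_key has to match for signed updates to be accepted.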
Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e
Hi Xiao, Thanks for the response. So if I have this straight, I need to manually create and maintain the zone files on the external DNS server, and already have initially empty (but present) zone files in place. Is that right? And I imagine that I will need to allow-update from the IP of the xCAT server too. Secondly, I still don't quite understand about the xcat_key. On the server hosting xCAT the xcat_key is typically defined in /etc/rndc.key and in /etc/named.conf. I should not even need /etc/named.conf since I am not hosting DNS on the same server as xCAT correct? Or does makedns have some sort of dependency on a local /etc/named.conf even when pushing externally? If /etc/named.conf is not needed, however, then where does makedns (which I think ends up running 'nsupdate') read the key from? Note that I have already configured the xcat_key on the external DNS server in its own /etc/named.conf, but I am not asking about that host but rather the xCAT server (which for purposes of DNS I am calling the client). Thanks, Josh On Wed, Sep 24, 2014 at 8:59 PM, Xiao Peng Wang w...@cn.ibm.com wrote: You can NOT depend on xCAT to setup dns server on remote server. You need get the xcat_key from local dns configuration. You need prepare the remote dns ready so that xCAT could push the new dns entry to the remote dns server, this is what xCAT 'makedns -e' does. 
Thanks Best Regards -- Wang Xiaopeng (王晓朋) IBM China System Technology Laboratory Tel: 86-10-82453455 Email: w...@cn.ibm.com Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193 [image: Inactive hide details for Josh Nielsen ---2014/09/25 05:46:04---I'm starting to see an older conversation that I had back in Ja]Josh Nielsen ---2014/09/25 05:46:04---I'm starting to see an older conversation that I had back in January of this year in a different lig From: Josh Nielsen jniel...@hudsonalpha.org To: xCAT Users Mailing list xcat-user@lists.sourceforge.net Date: 2014/09/25 05:46 Subject: Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e -- I'm starting to see an older conversation that I had back in January of this year in a different light now: http://permalink.gmane.org/gmane.comp.clustering.xcat.user/182 There it was suggested that I be aware that you may need to run a separate makedns to populate the local zone files as well as makedns -e to do the external name servers. Does this mean that an xCAT installation is required on the external DNS server just to populate the zones initially? Wouldn't having two xCAT installations, whose records you have to keep in sync, get tedious after a while? Maybe I'm misunderstanding how the external DNS solution is supposed to work in connection with xCAT. Any clarifications are welcome. -Josh On Wed, Sep 24, 2014 at 4:03 PM, Josh Nielsen jniel...@hudsonalpha.org wrote: Hello all, I am in the process of trying to move to an external DNS implementation for name resolution with our compute cluster. The only requirement I see in the man page for makedns is to have one (and only one?) IP of the external DNS server that you want to update in /etc/resolv.conf and also a valid xcat_key. Firstly, how do you specify the xcat_key to be used on the client server that is pushing out the dns changes with makedns -e? 
For a local setup the key definition in /etc/named.conf is sufficient, but since makedns -e isn't dependent on the local config (or is it?), how is the xcat_key specified? The local /etc/named.conf file doesn't even need to exist on the client does it? Secondly, the server I am pushing to only has a 127.0.0 zone since it is a fresh BIND install and I'm wanting 'makedns -e' to create the new zones (and zone files under /var/named/) for me on the remote server. With a local DNS setup, makedns would parse your settings and handle all the file updates and creation for you. But when I try makedns -e I see the following but no files or zone updates (is this an xcat_key problem?): (I ran the remote DNS BIND daemon in the foreground with -d 60 verbosity) 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: UDP request 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: using view '_default' 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: request is not signed 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: recursion available 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: query 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): ns_client_attach: ref = 1 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): query '9.101.20.10.IN-ADDR.ARPA/NS/IN' approved 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): send 24-Sep-2014 15:30
[xcat-user] Questions on prerequisites for external DNS and makedns -e
Hello all, I am in the process of trying to move to an external DNS implementation for name resolution with our compute cluster. The only requirement I see in the man page for makedns is to have one (and only one?) IP of the external DNS server that you want to update in /etc/resolv.conf and also a valid xcat_key. Firstly, how do you specify the xcat_key to be used on the client server that is pushing out the dns changes with makedns -e? For a local setup the key definition in /etc/named.conf is sufficient, but since makedns -e isn't dependent on the local config (or is it?), how is the xcat_key specified? The local /etc/named.conf file doesn't even need to exist on the client does it? Secondly, the server I am pushing to only has a 127.0.0 zone since it is a fresh BIND install and I'm wanting 'makedns -e' to create the new zones (and zone files under /var/named/) for me on the remote server. With a local DNS setup, makedns would parse your settings and handle all the file updates and creation for you. 
But when I try makedns -e I see the following but no files or zone updates (is this an xcat_key problem?): (I ran the remote DNS BIND daemon in the foreground with -d 60 verbosity) 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: UDP request 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: using view '_default' 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: request is not signed 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: recursion available 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: query 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): ns_client_attach: ref = 1 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): query '9.101.20.10.IN-ADDR.ARPA/NS/IN' approved 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): send 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): sendto 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): senddone 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): next 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): ns_client_detach: ref = 0 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): endrequest 24-Sep-2014 15:30:14.687 client @0x7f000c0d7710: udprecv 24-Sep-2014 15:30:14.687 sockmgr 0x7f0017e06010: watcher got message -3 for socket 514 24-Sep-2014 15:30:14.687 sockmgr 0x7f0017e06010: watcher got message -2 for socket -1 24-Sep-2014 15:30:14.687 socket 0x7f0017e1ebc8: socket_recv: event 0x7f0017c8c160 - task 0x7f0017e369d0 24-Sep-2014 15:30:14.688 socket 0x7f0017e1ebc8: dispatch_recv: event 0x7f0017c8c160 - task 0x7f0017e369d0 24-Sep-2014 15:30:14.688 socket 0x7f0017e1ebc8: internal_recv: task 0x7f0017e369d0 got event 0x7f0017e1ec88 24-Sep-2014 15:30:14.688 socket 0x7f0017e1ebc8 172.26.42.60#46141: packet received correctly 9.101.20.10.IN-ADDR.ARPA/NS/IN must be my test node definition in 
the client's /etc/hosts file 10.20.101.9 node0009 node0009.mydomain.org, but I have no zone definition for 10.20 in /etc/named.conf on the external DNS server yet. On the Cluster Name Resolution wiki page ( http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Cluster_Name_Resolution) under 'Option #2: Use a DNS That is Outside of the Cluster' it says: If you already have a DNS on your site network and you want to use that for your cluster node names too, you can point all of the nodes to it. You must ensure that your nodes have IP connectivity to the DNS, and you must manually configure your DNS with the node hostnames and IP addresses. Does 'makedns -e' not populate the zone files for you, just like it would if DNS were running locally on the MN itself by just parsing /etc/hosts on the client and adding/pushing it to the (remote) DNS zone files for you? If not does this mean I need to hand configure the remote DNS server's /etc/named.conf to stub out definitions for (as of yet) empty zones, or will makedns -e do that for me? Thanks, Josh -- ___ xCAT-user mailing list xCAT-user@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/xcat-user
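The name 9.101.20.10.IN-ADDR.ARPA in the log excerpt is simply the node address 10.20.101.9 with its octets reversed and the reverse-lookup suffix appended. A small sketch of that mapping, where `ip_to_ptr_name` is a hypothetical helper name and only dotted-quad IPv4 input is assumed:

```shell
#!/bin/sh
# Sketch only: turn a dotted-quad IPv4 address into its reverse-lookup name.
# ip_to_ptr_name is a hypothetical helper; it does not validate its input.
ip_to_ptr_name() {
    echo "$1" | awk -F. '{ printf "%s.%s.%s.%s.in-addr.arpa\n", $4, $3, $2, $1 }'
}

# The test node from /etc/hosts in this message.
ip_to_ptr_name 10.20.101.9   # prints 9.101.20.10.in-addr.arpa
```

DNS names are case-insensitive, so the lowercase output refers to the same name the BIND log prints in uppercase.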
Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e
I'm starting to see an older conversation that I had back in January of this year in a different light now: http://permalink.gmane.org/gmane.comp.clustering.xcat.user/182 There it was suggested that I be aware that you may need to run a separate makedns to populate the local zone files as well as makedns -e to do the external name servers. Does this mean that an xCAT installation is required on the external DNS server just to populate the zones initially? Wouldn't having two xCAT installations, whose records you have to keep in sync, get tedious after a while? Maybe I'm misunderstanding how the external DNS solution is supposed to work in connection with xCAT. Any clarifications are welcome. -Josh On Wed, Sep 24, 2014 at 4:03 PM, Josh Nielsen jniel...@hudsonalpha.org wrote: Hello all, I am in the process of trying to move to an external DNS implementation for name resolution with our compute cluster. The only requirement I see in the man page for makedns is to have one (and only one?) IP of the external DNS server that you want to update in /etc/resolv.conf and also a valid xcat_key. Firstly, how do you specify the xcat_key to be used on the client server that is pushing out the dns changes with makedns -e? For a local setup the key definition in /etc/named.conf is sufficient, but since makedns -e isn't dependent on the local config (or is it?), how is the xcat_key specified? The local /etc/named.conf file doesn't even need to exist on the client does it? Secondly, the server I am pushing to only has a 127.0.0 zone since it is a fresh BIND install and I'm wanting 'makedns -e' to create the new zones (and zone files under /var/named/) for me on the remote server. With a local DNS setup, makedns would parse your settings and handle all the file updates and creation for you. 
But when I try makedns -e I see the following but no files or zone updates (is this an xcat_key problem?): (I ran the remote DNS BIND daemon in the foreground with -d 60 verbosity) 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: UDP request 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: using view '_default' 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: request is not signed 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: recursion available 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: query 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): ns_client_attach: ref = 1 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): query '9.101.20.10.IN-ADDR.ARPA/NS/IN' approved 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): send 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): sendto 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): senddone 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): next 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): ns_client_detach: ref = 0 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844 (9.101.20.10.IN-ADDR.ARPA): endrequest 24-Sep-2014 15:30:14.687 client @0x7f000c0d7710: udprecv 24-Sep-2014 15:30:14.687 sockmgr 0x7f0017e06010: watcher got message -3 for socket 514 24-Sep-2014 15:30:14.687 sockmgr 0x7f0017e06010: watcher got message -2 for socket -1 24-Sep-2014 15:30:14.687 socket 0x7f0017e1ebc8: socket_recv: event 0x7f0017c8c160 - task 0x7f0017e369d0 24-Sep-2014 15:30:14.688 socket 0x7f0017e1ebc8: dispatch_recv: event 0x7f0017c8c160 - task 0x7f0017e369d0 24-Sep-2014 15:30:14.688 socket 0x7f0017e1ebc8: internal_recv: task 0x7f0017e369d0 got event 0x7f0017e1ec88 24-Sep-2014 15:30:14.688 socket 0x7f0017e1ebc8 172.26.42.60#46141: packet received correctly 9.101.20.10.IN-ADDR.ARPA/NS/IN must be my test node definition in 
the client's /etc/hosts file 10.20.101.9 node0009 node0009.mydomain.org, but I have no zone definition for 10.20 in /etc/named.conf on the external DNS server yet. On the Cluster Name Resolution wiki page (http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Cluster_Name_Resolution) under 'Option #2: Use a DNS That is Outside of the Cluster' it says: If you already have a DNS on your site network and you want to use that for your cluster node names too, you can point all of the nodes to it. You must ensure that your nodes have IP connectivity to the DNS, and you must manually configure your DNS with the node hostnames and IP addresses. Does 'makedns -e' not populate the zone files for you, just like it would if DNS were running locally on the MN itself by just parsing /etc/hosts on the client and adding/pushing it to the (remote) DNS zone files for you? If not does this mean I need to hand configure the remote DNS server's /etc/named.conf to stub out definitions for (as of yet) empty zones, or will makedns -e do that for me? Thanks, Josh
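Later replies in this thread confirm that the zones have to be prepared by hand on the external server before 'makedns -e' can push records into them. As a hedged illustration of what "stubbing out" a zone might look like, the sketch below prints named.conf zone stanzas that accept updates signed with xcat_key. `print_zone_stub` is a hypothetical helper, the zone names and the file name 'db.10.20' are taken from later messages in the thread, and the forward zone file name is an assumption.

```shell
#!/bin/sh
# Sketch only: print a named.conf zone stanza that accepts signed dynamic
# updates. print_zone_stub is a hypothetical helper, not an xCAT command.
print_zone_stub() {
    zone="$1"      # zone name, e.g. morgan.haib.org
    zonefile="$2"  # zone file name (assumed; match your own layout)
    cat <<EOF
zone "$zone" IN {
    type master;
    file "$zonefile";
    allow-update { key xcat_key; };
};
EOF
}

# Forward and reverse zones named in the thread; the forward file name is assumed.
print_zone_stub morgan.haib.org db.morgan.haib.org
print_zone_stub 20.10.in-addr.arpa db.10.20
```

The stanzas only help if named.conf also defines a key named xcat_key with the matching secret, and if each referenced zone file exists with at least SOA and NS records so that named can load the zone.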
Re: [xcat-user] Makedns wiping out DNS records
As Christopher notes, it is an additional attack surface. Also I would like to not have to worry about an additional named instance running in my network which I am not using. If it is not currently possible to prevent the local named daemon on the MN from starting when makedns is run, can that behavior be changed in a future xCAT version if you want an external master only? I'm still curious as to whether the /var/named/ directory would have any zone files locally on the MN in the case where named is running locally but you are pushing updates to an external DNS server, or whether it only pushes zone entries out to the DNS server specified with 'makedns -e' (as it should). -Josh On Mon, Mar 31, 2014 at 8:17 PM, Xiao Peng Wang w...@cn.ibm.com wrote: Enable the local DNS on xCAT MN does not hurt anything except the cpu/memory to run makedns in your case, right? Thanks Best Regards -- Wang Xiaopeng (王晓朋) IBM China System Technology Laboratory Tel: 86-10-82453455 Email: w...@cn.ibm.com Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193 [image: Inactive hide details for Josh Nielsen ---2014/04/01 00:00:13---By default, makedns sets up the named service and updates the]Josh Nielsen ---2014/04/01 00:00:13---By default, makedns sets up the named service and updates the DNS records on the local system (man From: Josh Nielsen jniel...@hudsonalpha.org To: xCAT Users Mailing list xcat-user@lists.sourceforge.net, sam...@unimelb.edu.au, Date: 2014/04/01 00:00 Subject: Re: [xcat-user] Makedns wiping out DNS records -- By default, makedns sets up the named service and updates the DNS records on the local system (management node). If the -e flag is specified, it will also update the DNS records on any external DNS server that is listed in the /etc/resolv.conf on the management node. Interesting, so even if I use an external DNS server I can't get rid of/turn off the DNS service on the MN? 
I suppose I could do a 'chkconfig named off' but makedns starts the named daemon anyway when you run it (I think). I want to make the external server the sole Master DNS server, and every other server its slave (I have only two other slave DNS servers, which are my Service Nodes). I'm in the same boat as Christopher who noted One kink I think we've noticed there is that even with makedns -e xCAT seems to start up bind on the local system which isn't what we want at all. I also would prefer the local BIND/named daemon to not start at all, and rather make the external DNS the sole master. However, as you indicated makedns by default (which I'm assuming is a behavior that can't be changed [yet]?) sets up the local named service when it is run. Can xCAT be modified to accommodate the external only master scenario? Regards, Josh Nielsen On Thu, Mar 27, 2014 at 12:38 PM, Lissa Valletta *lis...@us.ibm.com*lis...@us.ibm.com wrote: It is saying if your site has an external name server and you want to use it , you need to add that external name server into the /etc/resolv.conf file.You also have to add the correct nameservers in the site table before you run makedns. This is the manual part. chdef -t site nameservers=50.1.2.254 for example. The makedns -e will take the external nameservers that you have listed in /etc/resolv.conf and update their DNS records, assuming you have the authority to change that server. From man makedns By default, makedns sets up the named service and updates the DNS records on the local system (management node). If the -e flag is specified, it will also update the DNS records on any external DNS server that is listed in the /etc/resolv.conf on the management node. (Assuming the external DNS server can recognize the xCAT key as authentication.) Lissa K. 
Valletta 8-3/B10 Poughkeepsie, NY 12601 (tie 293) 433-3102 [image: Inactive hide details for Josh Nielsen ---03/26/2014 03:15:37 PM---Sorry to revive this, but I have a question about external D]Josh Nielsen ---03/26/2014 03:15:37 PM---Sorry to revive this, but I have a question about external DNS since I am thinking of trying it now. From: Josh Nielsen *jniel...@hudsonalpha.org*jniel...@hudsonalpha.org To: xCAT Users Mailing list *xcat-user@lists.sourceforge.net*xcat-user@lists.sourceforge.net, Date: 03/26/2014 03:15 PM Subject: Re: [xcat-user] Makedns wiping out DNS records -- Sorry to revive this, but I have a question about external DNS since I am thinking of trying it now. On the Cluster Name Resolution wiki page ( *http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Cluster_Name_Resolution*http://sourceforge.net/apps
Re: [xcat-user] Makedns wiping out DNS records
Okay, as long as makedns -e updates the external named server just like it would a local named instance. The reason I had asked that is because the documentation was rather ambiguously worded on that point. So now I'm just trying to figure out the best way to make xCAT only update the external DNS instance. Since you said that you have the problem of the local daemon starting up regardless when you run makedns -e, have you ever noticed whether it is actually updating local records also? Is your /var/named/ directory on the xCAT server populated with zone files? Also did you remove the MN's local host IP (whether loopback or assigned) from its /etc/resolv.conf and make it point only to the external DNS server's IP? Since I'm trying to transition without an outage I think I will have to have a brief period where I have both the MN and the external server running identical named configs. Then I can slowly begin to move my compute cluster nodes to resolve to the external server in 10 node increments or so, so that I can troubleshoot any problems and undo it to resolve back to the MN again if need be. I just want to be as careful as possible. Thanks, Josh On Mon, Mar 31, 2014 at 10:17 PM, Christopher Samuel sam...@unimelb.edu.au wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 27/03/14 06:11, Josh Nielsen wrote: What does it mean that you must manually configure the hostnames and IP addresses? Does 'makedns -e' not do that for you, just like it would if DNS were running locally on the MN itself by just parsing /etc/hosts and adding/pushing it to the (remote) DNS zone files for you? We've occasionally pushed records into the DNS server by hand when they're things external to xCAT, but we've gone from that to defining them in xCAT on our infrastructure management node and pushing from there. So we've got 4 xCAT management nodes updating the same external DNS servers with makedns -e.
cheers, Chris - -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.14 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlM6L7UACgkQO2KABBYQAh+SRgCbBrVkrOB53IUqj1+OSWpmlMe9 9HMAn0Zf3hFtebzDfoyfUs/4ctdE8C/g =2YjS -END PGP SIGNATURE-
Re: [xcat-user] Makedns wiping out DNS records
Sorry to revive this, but I have a question about external DNS since I am thinking of trying it now. On the Cluster Name Resolution wiki page (http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Cluster_Name_Resolution) under 'Option #2: Use a DNS That is Outside of the Cluster' it says: "If you already have a DNS on your site network and you want to use that for your cluster node names too, you can point all of the nodes to it. You must ensure that your nodes have IP connectivity to the DNS, and you must manually configure your DNS with the node hostnames and IP addresses." What does it mean that you must manually configure the hostnames and IP addresses? Does 'makedns -e' not do that for you, just like it would if DNS were running locally on the MN itself by just parsing /etc/hosts and adding/pushing it to the (remote) DNS zone files for you? In my case I may even be migrating my existing DNS zone files and /etc/named.conf config onto a new VM, bringing up DNS there and just adding any new hosts. But I would be surprised if makedns -e doesn't add the hostnames and IPs for you. Can anyone who has used the external DNS option with xCAT speak to that? Thanks, Josh On Thu, Jan 16, 2014 at 5:13 PM, Christopher Samuel sam...@unimelb.edu.au wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 17/01/14 04:00, Josh Nielsen wrote: Is it just something with our installation or does makedns do this for other people as well? FWIW we do not run DNS on the management nodes at all, we have 4 separate xCAT built clusters (3 HPC clusters and 1 for our GPFS/TSM infrastructure and our BG/Q service and front ends) and so to let them all populate DNS with a single, consistent view we have two external DNS servers that they all send updates to with makedns -e. One kink I think we've noticed there is that even with makedns -e xCAT seems to start up bind on the local system which isn't what we want at all. Good luck!
Chris - -- Christopher SamuelSenior Systems Administrator VLSCI - Victorian Life Sciences Computation Initiative Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.org.au/ http://twitter.com/vlsci -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.14 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlLYZ4sACgkQO2KABBYQAh8pwACfSEi4RePX5F2hG5E1AtrAdy77 PtMAni1KEwsOM/az4z0U7o4aZZOEuO/+ =fPWd -END PGP SIGNATURE-
Re: [xcat-user] Makedns wiping out DNS records
Also, if I externalize DNS but leave DHCP on the MN, nothing has to change in the DHCP config except the changes propagated from xCAT via makedhcp, does it? As long as I set site.nameservers=X.X.X.X to the IP of my external DNS master and update /etc/dhcpd.conf with makedhcp, only the domain-name-servers line should change, right?

I know that I also need to change the MN's resolv.conf to point to the external DNS server to make this work. Also, as this archived list email (https://www.mail-archive.com/xcat-user@lists.sourceforge.net/msg00451.html) suggests, I should make sure the TSIG key for DNS is defined in /etc/named.conf on both the MN and the external DNS server (assuming I allow DNS updates only with the xcat_key signature). Does that mean the MN still needs its named daemon running to push out updates, though?

Overall, in addition to the things mentioned above, is there anything else I have missed that is necessary to make external DNS work with xCAT?

Thanks,
Josh
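For readers wondering what "manually configuring" an external DNS server could look like when dynamic updates are allowed, here is a minimal sketch that turns /etc/hosts-style lines into nsupdate(1) commands. It is not a description of what makedns -e does internally; the helper name hosts_to_nsupdate, the zone cluster.example, the server IP, and the key file path are all assumptions made for illustration.

```shell
# Sketch only: hosts_to_nsupdate is a hypothetical helper, not part of xCAT.
# Reads /etc/hosts-style lines on stdin and prints nsupdate(1) commands that
# replace the A record of every host whose name ends in the given zone.
#
# Usage (server IP, zone, and key file are placeholders):
#   hosts_to_nsupdate 10.20.0.53 cluster.example < /etc/hosts \
#       | nsupdate -k /path/to/xcat_key.private
hosts_to_nsupdate() (
    server=$1
    zone=$2
    echo "server $server"
    echo "zone $zone"
    awk -v zone="$zone" '
        /^[ \t]*(#|$)/ { next }                  # skip comments and blank lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break             # trailing comment on the line
                n = length($i); z = length(zone)
                # use the first name that ends in ".<zone>"
                if (n > z + 1 && substr($i, n - z) == ("." zone)) {
                    printf "update delete %s. A\n", $i
                    printf "update add %s. 3600 A %s\n", $i, $1
                    break
                }
            }
        }'
    echo "send"
)
```

Printing the commands first, and only piping them to nsupdate once they look right, keeps the step easy to inspect. The external server would still need to permit updates signed with the key in question.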
[xcat-user] How can I prevent overwritting of resolv.conf and named.conf?
Since restructuring my cluster's DNS hierarchy by creating two Service Nodes to stand between the compute nodes and the Management Node, I have run into two separate problems with files being overwritten after I modify them.

Firstly, I configured the SNs to act as actual slave DNS servers instead of just forwarding to the MN (it looks like that feature will be officially supported in the next xCAT release but is not supported in the current one), so I had to edit /etc/named.conf to facilitate that. Before I edited that file on both SNs it simply had an options { } block ending with "forward only" and a forwarders { } block with the IP of the MN. I removed the "forward only" statement, added zone definitions, and made each zone a slave to the MN. It worked perfectly. The only problem is that every couple of days (and it happened again this morning) all my changes get erased somehow and named.conf is regenerated to the default file with only an options { } block. How can I prevent that from happening?

Secondly, on compute nodes and storage nodes that were DHCP-enabled rather than statically assigned in their /etc/sysconfig/network-scripts/ifcfg-eth* files, when I manually edited /etc/resolv.conf (though a postscript would do the same) it too would get overwritten fairly soon after I made the change, back to pointing only to the MN for DNS. I had changed resolv.conf to point not just to the MN (as it did originally) but to three nameserver entries, tried in the following order: SN1, SN2, MN. I fixed this by statically assigning IPs in the ifcfg-eth* files, but I am wondering if there is a better way.

DHCP has the ability to push out DNS server names for resolv.conf, so I looked to see if it was the culprit and changed the option domain-name-servers line to include SN1, SN2, and the MN (does the nameservers value in the xCAT 'site' table set this line?). But I'm not sure whether that is the DHCP line responsible for changing the values in /etc/resolv.conf, or how often the DHCP changes are pushed out (this is happening on machines that are not being rebooted or reinitialized in any way - just running as normal - and they suddenly change their resolv.conf). Are any of the suggestions on this page good options: http://www.cyberciti.biz/faq/dhclient-etcresolvconf-hooks/? I don't have a dhclient.conf file on my RHEL/CentOS servers, though.

Anyway, any suggestions would be much appreciated!

Thanks,
Josh
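The slave-zone setup described in this message can be sketched as a small generator for the named.conf stanzas. This is an assumption-laden illustration: slave_zone is a hypothetical helper, and the zone names, master IP, and file paths are placeholders rather than values from the thread.

```shell
# Sketch only: slave_zone is a hypothetical helper, not an xCAT command.
# Prints a BIND zone stanza that makes this server a slave for ZONE,
# transferring from MASTER_IP (here, the Management Node).
#
# Usage (all values are placeholders):
#   slave_zone cluster.example 10.20.0.1      >> /etc/named.slave-zones
#   slave_zone 20.10.in-addr.arpa 10.20.0.1   >> /etc/named.slave-zones
slave_zone() (
    zone=$1
    master=$2
    cat <<EOF
zone "$zone" in {
    type slave;
    masters { $master; };
    file "slaves/db.$zone";
};
EOF
)
```

Keeping the stanzas in a script makes it quick to re-apply them if named.conf is regenerated. It does not explain or stop the regeneration itself, which is the open question in this thread.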
Re: [xcat-user] How can I prevent overwritting of resolv.conf and named.conf?
Thank you Dennis and Jonathan. Setting PEERDNS=no was part of my fix to the ifcfg-eth* files, and I guess that's the best way to prevent resolv.conf from being overwritten then. I also saw the suggestion for chattr +i but would only like to use it as a last resort, since I might lose track of which nodes I have and haven't done that on, especially if it is a node newly deployed from xCAT.

So on to named.conf then. What would be resetting it? Does the slave configuration have something to do with it? I don't even know if it is a remotely initiated change or whether it originates locally for some reason. Is there any config I can post here that would help determine that?

Thanks,
Josh

On Fri, Mar 7, 2014 at 11:50 AM, Jonathan Mills <jonmi...@renci.org> wrote:
> What Dennis says should work. However, I think the accepted Red Hat way of doing it is to put PEERDNS=no in your /etc/sysconfig/network-scripts/ifcfg-ethX script.
>
> On 03/07/2014 12:48 PM, Dennis Zheleznyak wrote:
> > You can lock the file by entering the following command: chattr +i /etc/resolv.conf. This will lock the file even for root.
> >
> > Dennis.
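The PEERDNS=no suggestion could be applied from a postscript along these lines. This is a minimal sketch; set_peerdns_no is a hypothetical helper name, and the interface file in the usage comment is just an example.

```shell
# Sketch only: set_peerdns_no is a hypothetical helper, not an xCAT postscript.
# Ensures FILE ends up with exactly one PEERDNS line, set to "no", so that
# DHCP-supplied name servers do not overwrite /etc/resolv.conf.
#
# Usage (example path):
#   set_peerdns_no /etc/sysconfig/network-scripts/ifcfg-eth0
set_peerdns_no() (
    f=$1
    tmp=$(mktemp) || exit 1
    # Drop any existing PEERDNS setting, then append the one we want.
    grep -v '^PEERDNS=' "$f" > "$tmp" || true
    echo 'PEERDNS=no' >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$f"
    rm -f "$tmp"
)
```

The function is idempotent, so running it again on every node update should leave the file unchanged. The setting normally takes effect the next time the interface is brought up or the lease is renewed.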
Re: [xcat-user] How to create and deploy an xCAT Service Node
Okay, I guess I need to revive this again now that I have the SNs deployed and am trying to snmove some nodes onto them. The Hierarchical Cluster wiki page is oriented toward those setting up a brand new cluster rather than migrating an established cluster to include SNs, so it does not include clear instructions on what commands to run after you have created groups of CNs for the SNs to manage. I am assuming that to get nodes to look away from the MN and onto an SN for the first time you must execute snmove with -d and -D pointing to the SN. My config follows.

I am testing on just two of the nodes in my cluster for now. So first I did this:

mkdef -t group -o serv1_compute members=node0001,node0002

Then, following the documentation for creating service pools, I did this:

chdef -t group serv1_compute servicenode=xcat-serv1,xcat-serv2

# lsdef -t group serv1_compute
Object name: serv1_compute
    grouptype=static
    members=node0001,node0002
    servicenode=xcat-serv1,xcat-serv2

And noderes looks like this now:

#node,servicenode,netboot,tftpserver,tftpdir,nfsserver,monserver,nfsdir,installnic,primarynic,discoverynics,cmdinterface,xcatmaster,current_osimage,next_osimage,nimserver,routenames,nameservers,comments,disable
user,,xnba,MN_IP,,MN_IP,,,eth0,eth0,,,MN_IP,,,
service,,xnba,MN_IP,,MN_IP,,,mac,mac,,,MN_IP,,,
storage,,xnba,MN_IP,,MN_IP,,,eth1,eth1,,,MN_IP,,,
compute,,xnba,MN_IP,,MN_IP,,,eth0,eth0,,,MN_IP,,,
login,,xnba,MN_IP,,MN_IP,,,eth0,eth0,,,MN_IP,,,
node0059,,xnba,
hinode01,,xnba,
serv1_compute,xcat-serv1,xcat-serv2,,
node0001,xcat-serv1,xcat-serv2,,xcat-serv1,,xcat-serv1,,,xcat-serv1,,,
node0002,xcat-serv1,xcat-serv2,,xcat-serv1,,xcat-serv1,,,xcat-serv1,,,

I may have a conflict, though, in that the established compute group, which node0001 and node0002 are in, points to MN_IP (the MN's IP address) while serv1_compute points to xcat-serv1. I was hoping that since noderes FURTHER defined the servicenode and xcatmaster for them, it would override the settings for compute. Will that work, or do I have to remove node0001 and node0002 from compute altogether? Their nodelist entries look like this:

node0001,compute,compute-profile,ipmi,dx360m2,rack01,all,serv1_compute,booting,11-24-2013 13:55:00,synced,02-05-2014 08:59:57,,
node0002,compute,compute-profile,ipmi,dx360m2,rack01,all,serv1_compute,booting,11-24-2013 13:55:00

Then, after all the configuration, I tried an snmove on just node0001:

# snmove serv1_compute -d xcat-serv1 -D xcat-serv1
Moving nodes to their backup service nodes.
Setting new values in the xCAT database.
node0001: install centos6.4-x86_64-compute
node0002: install centos6.4-x86_64-compute
node0001: install centos6.4-x86_64-compute
node0002: install centos6.4-x86_64-compute
Running postscripts on the nodes.
If you specify the -s flag you must not specify either the -S or -k or -P flags

In /var/log/messages I saw:

Allowing nodeset to node0001,node0002 install for x3650-head01.haib.org http://x3650-head01.haig.org/ from x3650-head01

Firstly, why was a nodeset done when I typed snmove? The nodes are already installed; I don't want to reinstall them.

Secondly, according to the wiki documentation: "If the CNs are up at the time the snmove command is run then snmove will run postscripts on the CNs to reconfigure them for the new SN." However, I checked files on node0001 such as /etc/ntp.conf and their timestamps had not changed (so I deduce the postscripts did not run). So I ran the postscripts manually with updatenode node0001 syslog,setupntp. I checked /etc/ntp.conf again, and this time the timestamp was updated but the file's contents were identical to before: it pointed to the MN_IP and not to xcat-serv1, as it should based on the xcatmaster setting in the noderes table.

What am I doing wrong here?

Thanks,
Josh

On Fri, Jan 10, 2014 at 1:48 PM, Josh Nielsen <jniel...@hudsonalpha.org> wrote:
> Thank you Lissa, that is helpful.
>
> -Josh
>
> On Fri, Jan 10, 2014 at 1:25 PM, Lissa Valletta <lis...@us.ibm.com> wrote:
> > DNS and DHCP will still work from the Service Node, if set up correctly. In other words, you have configured the service node as the DNS server and/or DHCP server for the nodes, and there is no requirement on the Management Node for DNS or DHCP.
> >
> > You will not be able to run any xCAT commands on the service node if the Management Node is down. xCAT requires access to the database configured on the MN for the xCAT cluster (MySQL, PostgreSQL) to run most xCAT commands, even to recognize that the node is in the xCAT cluster.
> >
> > Lissa K. Valletta
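When group and node settings may conflict, as in the message above, it can help to compare what the database resolves for a node with what actually landed on it. The sketch below assumes only that lsdef prints attr=value lines as shown in this thread; lsdef_attr and ntp_servers are hypothetical helper names.

```shell
# Sketch only: both helpers are hypothetical, not xCAT commands.

# lsdef_attr ATTR: read lsdef-style output on stdin, print the value of ATTR.
lsdef_attr() {
    sed -n "s/^[[:space:]]*$1=//p"
}

# ntp_servers: read an ntp.conf on stdin, print the configured server addresses.
ntp_servers() {
    awk '$1 == "server" { print $2 }'
}

# Example comparison, run on the MN (node name taken from the thread):
#   want=$(lsdef node0001 | lsdef_attr xcatmaster)
#   have=$(xdsh node0001 cat /etc/ntp.conf | sed 's/^[^:]*: //' | ntp_servers)
#   echo "xcatmaster=$want ntp.conf server(s)=$have"
```

If the value reported for xcatmaster is still the MN, the node is presumably inheriting it from another definition, and rerunning postscripts would not be expected to change the file.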
Re: [xcat-user] xCAT start_SSL Bug?
Actually there is a disparity between the MN and SN OS type/version. First of all, the MN is Red Hat and my service nodes will be CentOS. That shouldn't be a large problem. However, the real difference is that the MN is running Red Hat Enterprise Linux Server release 5.9 (Tikanga) right now. I need the SN up to serve DNS and DHCP while I take the MN down to update it to RHEL 6.X. Would the version difference cause a problem? The OS repo for the SN is of course only pulling from the CentOS 6.4 RPMs.

Currently the MN has these package versions:

# rpm -qa | grep -i ssl | sort
gskssl64-8.0-14.14
openssl-0.9.8e-20.el5
openssl-0.9.8e-20.el5
openssl-0.9.8e-20.el5_7.1
openssl-0.9.8e-20.el5_7.1
openssl-0.9.8e-22.el5
openssl-0.9.8e-22.el5
openssl-0.9.8e-22.el5_8.1
openssl-0.9.8e-22.el5_8.1
openssl-0.9.8e-22.el5_8.3
openssl-0.9.8e-22.el5_8.3
openssl-0.9.8e-22.el5_8.4
openssl-0.9.8e-22.el5_8.4
openssl-devel-0.9.8e-22.el5_8.4
openssl-devel-0.9.8e-22.el5_8.4
perl-Crypt-SSLeay-0.51-11.el5
perl-IO-Socket-SSL-1.01-1.fc6
perl-Net-SSLeay-1.30-4.fc6
pyOpenSSL-0.6-2.el5

And the SN:

# rpm -qa | grep -i ssl | sort
openssl098e-0.9.8e-17.el6.centos.2.x86_64
openssl-1.0.0-27.el6.x86_64
perl-Crypt-SSLeay-0.57-16.el6.x86_64
perl-IO-Socket-SSL-1.77-1.x86_64
perl-Net-SSLeay-1.35-9.el6.x86_64

-Josh

On Mon, Jan 27, 2014 at 10:34 AM, Lissa Valletta <lis...@us.ibm.com> wrote:
> Running in XCATBYPASS mode only stops you from using the daemon, which stops the SSL credential use; that is why you do not get errors in XCATBYPASS.
>
> Is your MN CentOS 6.4? I would run rpm -qa | grep -i ssl and see what perl SSL rpms are installed. Do the same on the service node. My service node has these for Red Hat:
>
> openssl098e-0.9.8e-17.el6_2.2.ppc64
> perl-Net-SSLeay-1.35-9.el6.ppc64
> openssl-1.0.0-20.el6_2.5.ppc64
> openssl098e-0.9.8e-17.el6_2.2.ppc
> perl-IO-Socket-SSL-1.31-2.el6.noarch
> perl-Crypt-SSLeay-0.57-16.el6.ppc64
>
> Lissa K. Valletta
>
> On 01/27/2014 10:36 AM, Josh Nielsen <jniel...@hudsonalpha.org> wrote:
> > Yes, I read that on the bug report, but mine is already set to that, and that is only useful if the repo has the files needed. If it is perl-NET-SSLeay then even pointing it to xcat-dep/rh6/x86_64 won't help, since it isn't present. Here are my yum repos:
> >
> > /etc/yum.repos.d/xCAT-otherpkgs0.repo:
> > [xcat-otherpkgs0]
> > name=xcat-otherpkgs0
> > baseurl=http://10.20.0.1/install/post/otherpkgs/centos6.4/x86_64/
> > enabled=1
> > gpgcheck=0
> >
> > /etc/yum.repos.d/xCAT-otherpkgs1.repo:
> > [xcat-otherpkgs1]
> > name=xcat-otherpkgs1
> > baseurl=http://10.20.0.1/install/post/otherpkgs/centos6.4/x86_64/xcat/xcat-core
> > enabled=1
> > gpgcheck=0
> >
> > /etc/yum.repos.d/xCAT-otherpkgs2.repo:
> > [xcat-otherpkgs2]
> > name=xcat-otherpkgs2
> > baseurl=http://10.20.0.1/install/post/otherpkgs/centos6.4/x86_64/xcat/xcat-dep/rh6/x86_64
> > enabled=1
> > gpgcheck=0
> >
> > I decided to remove xCAT-otherpkgs0.repo, just in case it was interfering, then uninstall all xCAT packages and reinstall them, but it still has the same problem after all that. Here were the yum packages grabbed for the install:
> >
> > Dependencies Resolved
> >
> > Package         Arch    Version    Repository       Size
> > Installing:
> > conserver-xcat  x86_64  8.1.16-10  xcat-otherpkgs2  218 k
> > cpio            x86_64  2.11-20    xcat-otherpkgs2  252 k
> > elilo-xcat      noarch  3.14-4     xcat-otherpkgs2   75 k
> > ipmitool-xcat   x86_64  1.8.11-3   xcat-otherpkgs2  244
Re: [xcat-user] xCAT start_SSL Bug?
Also, as far as I can tell, all the correct SSL certs are present, and I can ssh directly into the SN without providing login credentials. I don't see any SSL-related errors in the system logs either. What is the basic issue that causes the error? An incorrect perl library?
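To narrow the comparison to the Perl SSL modules that this thread suspects, the rpm listings from the MN and SN can be filtered and compared side by side. A small sketch, with perl_ssl_pkgs as a hypothetical helper name and the /tmp file names as placeholders:

```shell
# Sketch only: perl_ssl_pkgs is a hypothetical helper, not part of xCAT.
# Reads `rpm -qa` output on stdin and prints only the Perl SSL packages,
# de-duplicated and sorted so two hosts can be compared directly.
#
# Usage (file names are placeholders):
#   rpm -qa | perl_ssl_pkgs > /tmp/mn-perl-ssl.txt     # on the MN
#   rpm -qa | perl_ssl_pkgs > /tmp/sn-perl-ssl.txt     # on the SN
#   diff /tmp/mn-perl-ssl.txt /tmp/sn-perl-ssl.txt
#
# To see which version Perl itself loads on a host:
#   perl -MIO::Socket::SSL -e 'print "$IO::Socket::SSL::VERSION\n"'
perl_ssl_pkgs() {
    grep -i '^perl-.*ssl' | sort -u
}
```

This only shows where the two hosts differ. Whether a given version mismatch is the cause of the start_SSL error is not established in the thread.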
Re: [xcat-user] Determine which otherpkgs.pkglist file is being used
Okay, but did you see the problem I mentioned with provmethod=centos6.4-x86_64-install-service? That would mean (I think) that you have to create one (sub)directory per service image under /install/custom in order for it to find the .tmpl and .pkglist files, since the scheme for locating those files is /install/custom/<provmethod>/<platform>. Do I have to create a /install/custom/centos6.4-x86_64-install-service/centos directory now and no longer use /install/custom/install/whatever? Also, the initrd kernel parameters take on the service image's name in their path when you nodeset with 'osimage'.

I am quite confused, because currently I'm convinced that if I do a nodeset on xcat-serv1 with the osimage set to the current service image, it will not work because the directories it will point to don't exist. For example, where would I find the directory pointed to here: initrd=xcat/osimage/centos6.4-x86_64-install-service/initrd.img?

You also had suggested using /install/custom/install/centos (as I have been), but with the osimage method it seems that directory is no longer usable, since it would instead require /install/custom/centos6.4-x86_64-install-service/centos. Can anyone clear up my confusion here?

On Wed, Jan 22, 2014 at 6:48 AM, Lissa Valletta <lis...@us.ibm.com> wrote:
> It is supported, but we are encouraging everyone to move to osimage. The provmethod values install, netboot, and statelite will be sunset over time, and even now our testing is done with provmethod=<osimage name>. This is why we are warning.
>
> Lissa K. Valletta
>
> On 01/21/2014 05:55 PM, Josh Nielsen <jniel...@hudsonalpha.org> wrote:
> > On a hunch I tried using the install keyword with nodeset in addition to osimage, and it actually set the provmethod to install like I wanted, but I got a warning:
> >
> > # nodeset xcat-serv1 install osimage=centos6.4-x86_64-install-service
> > Warning: The options install, netboot, and statelite have been deprecated. They should continue to work in this release, but have not been tested as carefully, and some new functions are not available with these options. For full function and support, use nodeset noderange osimage=osimage_name instead.
> > xcat-serv1: install centos6.4-x86_64-service
> >
> > # lsdef xcat-serv1
> > Object name: xcat-serv1
> >     arch=x86_64
> >     currchain=boot
> >     currstate=install centos6.4-x86_64-service
> >     groups=service,ipmi,all
> >     initrd=xcat/centos6.4/x86_64/service/initrd.img
> >     installnic=mac
> >     ip=10.20.254.227
> >     kcmdline=quiet repo=http://10.20.0.1:80/install/centos6.4/x86_64 ks=http://10.20.0.1:80/install/autoinst/xcat-serv1 ksdevice=00:50:56:bc:0c:db
> >     kernel=xcat/centos6.4/x86_64/service/vmlinuz
> >     mac=00:50:56:bc:0c:db
> >     netboot=xnba
> >     nfsserver=10.20.0.1
> >     nodetype=osi
> >     os=centos6.4
> >     postscripts=syslog,remoteshell,ospkgs,addsiteyum,otherpkgs,syncfiles,ifcfg-eth,fstab,servicenode
> >     primarynic=mac
> >     profile=service
> >     provmethod=install
> >     setupconserver=1
> >     setupdhcp=1
> >     setupnameserver=1
> >     setupnfs=1
> >     setupntp=1
> >     setuptftp=1
> >     status=booting
> >     statustime=01-17-2014 16:05:46
> >     tftpserver=10.20.0.1
> >     updatestatus=failed
> >     updatestatustime=01-17-2014 17:47:47
> >     xcatmaster=10.20.0.1
> >
> > If that warning is correct that install, netboot, and statelite are deprecated, then that means we cannot use the /install/custom/<install|netboot|statelite>/<platform> directories any more. Does that mean we will have to have one directory for each osimage name under /install/custom/ now? That would be significantly less convenient - though doable. Since you also suggested /install/custom/install/centos as a good directory to use, were you aware of this? That actually lines up with the (now deprecated?) documentation though:
> >
> > "On Linux, if provmethod for the node is install, netboot or statelite, the os, profile, and arch of that node are used to search for the files in /install/custom/<provmethod>/<platform> first, and then in /opt/xcat/share/xcat/<provmethod>/<platform>."
> > http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Using_Provmethod%3Dinstall,netboot_or_statelite
> >
> > In any case, what is the recommended thing for me to do given my configuration?
> >
> > -Josh
> >
> > On Tue, Jan 21, 2014 at 4:24 PM, Josh Nielsen <jniel...@hudsonalpha.org> wrote:
> > > Okay, great. I have the new service image created now. I am confused, though, about why it gives the node the incorrect provmethod when I nodeset it. Here is the output from my terminal:
> > >
> > > # lsdef -t osimage -l centos6.4-x86_64-install-service
> > > Object name: centos6.4-x86_64-install-service
> > >     imagetype
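The lookup order quoted from the wiki (custom directory first, then the shipped one) can be expressed as a short sketch for checking which copy of a file would win. Both function names are hypothetical, and the pkglist file name in the usage comment is a placeholder, since the exact naming depends on the node's os, profile, and arch.

```shell
# Sketch only: hypothetical helpers illustrating the search order quoted above
# for provmethod=install|netboot|statelite. For osimage-based provisioning the
# paths would instead come from the osimage definition (lsdef -t osimage -l).

# template_search_dirs PROVMETHOD PLATFORM: print the directories in search order.
template_search_dirs() {
    echo "/install/custom/$1/$2"
    echo "/opt/xcat/share/xcat/$1/$2"
}

# find_first NAME DIR...: print the first DIR/NAME that exists, else return 1.
find_first() (
    name=$1
    shift
    for d in "$@"; do
        if [ -e "$d/$name" ]; then
            echo "$d/$name"
            exit 0
        fi
    done
    exit 1
)

# Example (the file name is a placeholder):
#   find_first service.otherpkgs.pkglist $(template_search_dirs install centos)
```

Printing the winning path makes it easier to tell whether a custom file is shadowing the default one or is not being found at all.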
Re: [xcat-user] Frustrating time with sequential node discovery
Whoops, I meant to write Jarrod not Jarros. I went a little Koine Greek on your name there. Sorry about that. :-) On Wed, Jan 22, 2014 at 10:22 AM, Josh Nielsen jniel...@hudsonalpha.orgwrote: Jarros, I know you haven't been following the whole thread but Jonathan's problem (which this discussion originated from) is that somehow his installation is still using the /tftpboot/pxelinux.cfg/ files despite being up to date with genesis. I saw a response on the list from you to someone else about a similar problem in November 2013: http://sourceforge.net/mailarchive/message.php?msg_id=31683484 http://sourceforge.net/mailarchive/message.php?msg_id=31686689 Although I am not experiencing the same problem I too would be interested in the solution. It has always been challenging for me to understand the xCAT PXE deployment process, which is why I have 5-10 pages of self-written documentation and debugging info in a Google Doc about it. He posted his dhcpd.conf in his latest email. Any thoughts? -Josh On Wed, Jan 22, 2014 at 10:11 AM, Jarrod B Johnson jbjoh...@us.ibm.comwrote: 1) If running 2.8, go ahead and delete nbroot-core. genesis is far more maintainable and easier to muck with on the end point (e.g. having full fledged glibc) 2) The greatest common denominator remains switch based. It's the only frequently instrumented topology cue that is pretty universal. For bladed solutions there is potential for the chassis manager to be a good topology cue. I am curious what goes awry with switch based discovery. Without a topology cue, then the choices are sequential discovery (which I frankly haven't used myself) or using/scripting nodediscoverls/nodediscoverdef. I personally think the latter is actually better and can be trivially made into a 'sequential' discovery using straightforward scripting.. 3) Tom Alandt is the best person to discuss what can/can't be done by IBM mfg. On the other hand, we *could* take some measures to make you impervious to the conflict. 
The 'lsslp --flexdiscover' for its respective bits is impervious to IP conflict issues and will automatically fix it. It's not a huge stretch to make that pan out for rackmount systems (though currently it's hard to pull off without *some* topology cue). [image: Inactive hide details for David D Johnson ---01/22/2014 08:11:53 AM---I've been lurking on this discussion, and just checked to]David D Johnson ---01/22/2014 08:11:53 AM---I've been lurking on this discussion, and just checked to see what we've got -- nbroot or genesis -- From: David D Johnson david_john...@brown.edu To: xCAT Users Mailing list xcat-user@lists.sourceforge.net Date: 01/22/2014 08:11 AM Subject: Re: [xcat-user] Frustrating time with sequential node discovery -- I've been lurking on this discussion, and just checked to see what we've got -- nbroot or genesis -- and we have both of them. I had given up on node discovery years ago, we originally used the switch port numbers and forwarding tables to assign node names. Now I use ASU to collect the macs, and populate the mac table with a bit of grep and awk. But last week we powered on a rack of 20 non-ibm nodes, and I was wishing we had something easier since ASU didn't work for them. So I have two questions -- 1) Can I safely delete the xCAT-nbroot-core* RPMS ? 2) What is the current best practice method? What about for non-ibm hardware? No, three questions 3) How do you get IBM manufacturing to use a specific different 172.29.X for each rack they build for you? We've had three racks arrive in different months all with 172.29.101 addresses for the IMM, and I have to spend 5-10 minutes reprogramming each one. I can't put them on the same network until the conflicts are gone. What's hard for me is that we get new nodes only a couple times a year, and I forget everything in between. Thanks, -- ddj On Jan 22, 2014, at 7:12 AM, Lissa Valletta *lis...@us.ibm.com*lis...@us.ibm.com wrote: xCAT-nbroot-core* was replaced by xCAT-genesis-* in xCAT 2.8. 
Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102

From: Xiao Peng Wang w...@cn.ibm.com
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Cc: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 01/22/2014 02:58 AM
Subject: Re: [xcat-user] Frustrating time with sequential node discovery

Why do you say that you need nbk.x86_64? Is this file listed in the /tftpboot/xcat/xnba/nets/? With the latest xCAT build, it needs /tftpboot/xcat/genesis.kernel.x86_64 instead of nbk.*

Thanks
[xcat-user] Sequential discovery does not support virtual machines message
Sorry list denizens, another issue I ran into. I am trying to rediscover a VM that I already once discovered with sequential discovery, but now it is acting up. From my /var/log/messages log when I PXE-boot the VM:

Jan 22 16:28:36 x3650-head01 dhcpd: DHCPDISCOVER from 00:50:56:bc:0c:db via eth0
Jan 22 16:28:36 x3650-head01 dhcpd: DHCPOFFER on 10.20.242.251 to 00:50:56:bc:0c:db via eth0
Jan 22 16:28:36 x3650-head01 dhcpd: DHCPREQUEST for 10.20.242.251 (10.29.0.1) from 00:50:56:bc:0c:db via eth0
Jan 22 16:28:36 x3650-head01 dhcpd: DHCPACK on 10.20.242.251 to 00:50:56:bc:0c:db via eth0
Jan 22 16:28:36 x3650-head01 xCAT[7847]: xCAT: Allowing getcredentials x509cert
Jan 22 16:28:37 x3650-head01 xCAT[28098]: xcatd: Processing discovery request from 10.20.242.251
Jan 22 16:28:37 x3650-head01 xCAT[28098]: Sequential Discovery: Processing
Jan 22 16:28:37 x3650-head01 xCAT[28098]: Sequential discovery does not support virtual machines, exiting...
Jan 22 16:28:37 x3650-head01 xCAT[28098]: Sequential Discovery: Processing
Jan 22 16:28:37 x3650-head01 xCAT[28098]: Sequential discovery does not support virtual machines, exiting...
Jan 22 16:28:42 x3650-head01 xCAT[28098]: xcatd: Processing discovery request from 10.20.242.251
Jan 22 16:28:42 x3650-head01 xCAT[28098]: Sequential Discovery: Processing
Jan 22 16:28:42 x3650-head01 xCAT[28098]: Sequential discovery does not support virtual machines, exiting...

I needed to clean things up a bit from all my previous deploys to that node anyway, so I decided to start clean by deleting the mac entry from the xCAT table and trying again, but it didn't make a difference: I still see the same message. This worked a few days ago with no such messages and it captured the VM's MAC with no problem.

What do I need to change so that I can start from scratch to discover that VM with Sequential Discovery again?

Thanks,
Josh

xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
Re: [xcat-user] Frustrating time with sequential node discovery
Evidently, though, something in his xCAT setup is creating the files in /tftpboot/pxelinux.cfg/ with reference to xnba just like my installation. Where does xCAT grab the configuration for that? Maybe it was because I didn't do a completely clean install and did an in-place upgrade, but my cluster actually works perfectly with both xnba and genesis installed because it uses xnba first to bootstrap and then requests the Genesis image. xCAT must support that scenario, else I haven't the slightest idea by what miracle my installation is running with such a configuration. :-)

-Josh

On Tue, Jan 21, 2014 at 2:58 PM, Russell Jones russell-l...@jonesmail.me wrote:

xNBA is a customized gpxe image that xCAT uses. NBFS is the older maintenance image that was used if you set your node to boot to shell, or booted a runimage script. NBFS is deprecated, and Genesis replaced NBFS as the maintenance image for these tasks. In a standard 2.8 install, there should no longer be any nbk/nbfs RPMs installed - Genesis replaced them.

perl-xCAT-2.8.3-snap201311122316.noarch
xCAT-2.8.3-snap201311122318.x86_64
xCAT-client-2.8.3-snap201311122316.noarch
xCAT-genesis-base-x86_64-2.8-snap201308090229.noarch
elilo-xcat-3.14-4.noarch
xCAT-server-2.8.3-snap201311122316.noarch
xCAT-genesis-scripts-x86_64-2.8.3-snap201311122318.noarch
ipmitool-xcat-1.8.11-3.x86_64
conserver-xcat-8.1.16-10.x86_64
xCAT-buildkit-2.8.3-snap201311122318.noarch
syslinux-xcat-3.86-2.noarch

On 1/21/2014 2:38 PM, Josh Nielsen wrote:

Hi Jonathan,

Yes, I definitely think that would cause a problem. This is jogging my memory because I think that when the new Genesis boot loader was rolled out in the first version of xCAT that supported it, I faced a similar problem. I had assumed that only Genesis was needed but xNBA is still used as an intermediate image even if it is no longer the final image.
I will check my yum repos as soon as I can - but by some unfortunate coincidence I just discovered that YUM is not working since our RHEL license expired three days ago (unbeknownst to me until 10 minutes ago). Do you have xCAT-genesis-x86_64 and elilo-xCAT? You may even have to pull xNBA images from an older install(?) and then run mknb to build the images. I remember downloading the tarred files with the RPM manually and creating a local repo for xCAT. Whenever I get YUM back I'll give you more specifics if I can. -Josh On Tue, Jan 21, 2014 at 1:54 PM, Jonathan Mills jonmi...@renci.org wrote: Josh, I don't doubt that you're on to something. But if this is the case, it means my systems are missing some files, namely: /tftpboot/xcat/nbk.x86_64 /tftpboot/xcat/nbfs.x86_64.gz Can you tell me what RPM installed those files on your system? They don't exist on mine, and even a 'yum provides' doesn't find them. On 01/21/2014 11:51 AM, Josh Nielsen wrote: Hi Jonathan, It is my understanding, from extensive debugging and notes that I have taken about the xCAT netbooting process in the past, that xCAT uses a two-stage image deployment method. It will first come up with a more generic boot image (normally xnba or sometimes yaboot) which - when it contacts the xCAT headnode (or the node handling DHCP requests) - the headnode will then recognize the current image on the client that is sending requests to DHCP for further boot instructions, and will tell the client to then load another image based on the subnet and image type it is currently using. 
For example my headnode's /etc/dhcpd.conf file has an entry that looks like this:

shared-network eth0 {
  subnet 10.20.0.0 netmask 255.255.0.0 {
    max-lease-time 43200;
    min-lease-time 43200;
    default-lease-time 43200;
    next-server 10.20.0.1;
    option log-servers 10.20.0.1;
    option ntp-servers 10.20.0.1;
    option domain-name "x";
    option domain-name-servers 10.20.0.1;
    if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
      always-broadcast on;
      filename = "http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16";
    } else if option user-class-identifier = "xNBA" and option client-architecture = 00:09 { #x86, xCAT Network Boot Agent
      filename = "http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16.uefi";
    } else if option client-architecture = 00:00 { #x86
      filename "xcat/xnba.kpxe";
    } else if option vendor-class-identifier = "Etherboot-5.4" { #x86
      filename "xcat/xnba.kpxe";
    } else if option client-architecture = 00:07 { #x86_64 uefi
      filename "xcat/xnba.efi";
    } else if option client-architecture = 00:09 { #x86_64 uefi alternative id
      filename "xcat/xnba.efi";
    } else if option client-architecture = 00:02 { #ia64
      filename "elilo.efi";
    } else if substring(filename,0,1) = null { #otherwise, provide yaboot if the client isn't specific
      filename
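To make the two-stage hand-off easier to follow, here is a rough sketch of the decision chain from the dhcpd.conf excerpt above, written as a shell function. The function name `pick_bootfile` and its argument order are made up for illustration; the addresses and file names are copied from the excerpt, and the final fallback is left unspecified because the quoted config is cut off at that point.

```shell
# Hypothetical helper mirroring the if/else chain in the dhcpd.conf excerpt.
# $1 = user-class identifier (may be empty)
# $2 = client-architecture code, e.g. 00:00
# $3 = vendor-class identifier (optional)
pick_bootfile() {
    local user_class="$1" arch="$2" vendor_class="${3:-}"
    local nets="http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16"

    if [ "$user_class" = "xNBA" ] && [ "$arch" = "00:00" ]; then
        echo "$nets"              # second stage: xNBA is already running (BIOS)
    elif [ "$user_class" = "xNBA" ] && [ "$arch" = "00:09" ]; then
        echo "$nets.uefi"         # second stage: xNBA is already running (UEFI)
    elif [ "$arch" = "00:00" ]; then
        echo "xcat/xnba.kpxe"     # first stage: plain PXE ROM on x86
    elif [ "$vendor_class" = "Etherboot-5.4" ]; then
        echo "xcat/xnba.kpxe"
    elif [ "$arch" = "00:07" ] || [ "$arch" = "00:09" ]; then
        echo "xcat/xnba.efi"      # first stage: x86_64 UEFI
    elif [ "$arch" = "00:02" ]; then
        echo "elilo.efi"          # ia64
    else
        echo "unspecified"        # the quoted config is truncated before its fallback
    fi
}
```

A first PXE request from a BIOS machine would correspond to `pick_bootfile "" 00:00`, and the follow-up request made by xNBA itself to `pick_bootfile xNBA 00:00`, which is the point where the per-subnet file under xnba/nets/ takes over.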
[xcat-user] Determine which otherpkgs.pkglist file is being used
A node I am deploying seems to be grabbing the wrong otherpkgs.pkglist file. I have placed pkglist files in /install/custom/install/centos and created one for the service group/profile. In that directory are these files:

compute.otherpkgs.pkglist
compute.pkglist
compute.tmpl
hinode01.tmpl
service.otherpkgs.pkglist
service.tmpl

I want to use the service.otherpkgs.pkglist file. I deployed a node xcat-serv1 which belongs to the service group but it did not install the RPMs I specified in the otherpkgs.pkglist file for 'service'. When I did an lsdef I saw that the profile was set to compute and realized that would be a problem, so I changed it to 'service' as well and tried an updatenode xcat-serv1 otherpkgs, but it didn't make a difference. Here is the current node definition:

[root@x3650-head01 centos]# lsdef xcat-serv1
Object name: xcat-serv1
    arch=x86_64
    currchain=boot
    currstate=boot
    groups=service,ipmi,all
    initrd=xcat/osimage/centos6.4-x86_64-install-compute/initrd.img
    installnic=mac
    kernel=xcat/osimage/centos6.4-x86_64-install-compute/vmlinuz
    netboot=xnba
    nodetype=osi
    os=centos6.4
    postscripts=syslog,remoteshell,ospkgs,addsiteyum,otherpkgs,syncfiles,ifcfg-eth,fstab,servicenode
    primarynic=mac
    profile=service
    provmethod=centos6.4-x86_64-install-compute
    [snipped]

The groups and profile look correct now, but I still can't get the right packages to install. Is there a particular line in the otherpkgs postscript I could use to print a variable with the file name of the file chosen/matched against? I looked but only found a variable with the actual parsed packages from whatever file it chose.

It looks like it is using the compute.otherpkgs.pkglist. I can't quite figure out why it wouldn't match though. This page seems to indicate that my directory and nomenclature for the files is correct: http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Using_Provmethod%3Dinstall,netboot_or_statelite. And I know the kickstarts from that folder work since they are used during provisioning. It is usually something simple. Does anyone see what I am missing?

Thanks,
Josh

xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
Re: [xcat-user] Determine which otherpkgs.pkglist file is being used
Nevermind, it was the funky provmethod value. Would provisioning it with nodeset xcat-serv1 osimage=centos6.4-x86_64-install-compute change the provmethod to the same thing as the osimage? The provmethod needs to simply be install. Not sure how it got the same name as the osimage.

-Josh

xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
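For anyone hitting the same symptom, a quick check of the provmethod attribute can be sketched as below. This is only an illustration: the function name `check_provmethod` is made up, and it assumes the `attribute=value` layout of the lsdef output quoted in this thread. It does not settle whether an osimage name is a legitimate provmethod value in a given xCAT release; it just flags anything other than the three plain methods.

```shell
# Hypothetical helper: reads `lsdef <node>` output on stdin and classifies
# the provmethod attribute.
check_provmethod() {
    local value
    # Take the text after "provmethod=" on the first matching line.
    value=$(sed -n 's/^[[:space:]]*provmethod=//p' | head -n 1)
    case "$value" in
        install|netboot|statelite)
            echo "ok: provmethod=$value" ;;
        "")
            echo "unset: no provmethod attribute found" ;;
        *)
            echo "check: provmethod=$value is not install, netboot or statelite" ;;
    esac
}

# Possible usage on the management node (not run here):
#   lsdef xcat-serv1 | check_provmethod
#   chdef xcat-serv1 provmethod=install   # the change described in the reply above
```

Piping the node definition through the function before running updatenode makes it obvious when the profile-based file lookup is unlikely to be the one in effect.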
Re: [xcat-user] Update schema after upgrade prematurely terminated
I actually checked that yesterday when you mentioned it and I had installed all the latest RPMs the first time around, and I had already restarted the daemon several times. So I'm not sure why it was misreporting the version for a while.

-Josh

On Thu, Jan 16, 2014 at 7:04 AM, Lissa Valletta lis...@us.ibm.com wrote:

I expect xcatd restarted cleanly for some reason. Did you run yum update '*xCAT*' finally instead of yum update xcat. That would have done it also.

Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102
[xcat-user] Makedns wiping out DNS records
Regardless of the version of xCAT we have used in the past, my coworkers and I have noticed that running makedns with any following options can be dangerous for us, because something happens where it will wipe out all the records in DNS. If we run makedns with no options, generally any new nodes that have been added to /etc/hosts will be parsed and added appropriately. Maybe I am misunderstanding what happens when the command is run.

Here is a recent example. My entry for a new node xcat-serv1 for some reason didn't have the reverse lookup from the IP in the DNS file for that subnet (db.10.20), and so I decided to try to remove just the DNS entry for that node and then re-add it. Here is what I ran:

# makedns -d xcat-serv1
Handling xcat-serv1 in /etc/hosts.
Getting reverse zones, this may take several minutes for a large cluster.
Completed getting reverse zones.
Updating zones.
Completed updating zones.
Updating DNS records, this may take several minutes for a large cluster.
Completed updating DNS records.

Then I ran it with the -n parameter to re-add that node (with XCATBYPASS=1):

# export XCATBYPASS=1
[root@x3650-head01 centos]# makedns -n xcat-serv1
Unrecognized escape \h passed through at /opt/xcat/lib/perl/xCAT/zvmUtils.pm line 2408.
Subroutine handled_commands redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 28.
Subroutine delnode redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 33.
Subroutine addnode redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 58.
Subroutine build_line redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 117.
Subroutine addotherinterfaces redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 182.
Subroutine delotherinterfaces redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 209.
Subroutine add_hosts_content redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 234.
Subroutine process_request redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 305.
Subroutine writeout redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 511.
Subroutine donics redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 545.
Subroutine getIPdomain redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 801.
Handling xcat-serv1 in /etc/hosts.
Getting reverse zones, this may take several minutes for a large cluster.
Completed getting reverse zones.
Stopping named: . [ OK ]
Updating zones.
Completed updating zones.
Restarting named
Restarting named complete
Updating DNS records, this may take several minutes for a large cluster.
Completed updating DNS records.
DNS setup is completed

When I did an 'ls' in /var/named many records were missing and the ones still left were virtually empty except one entry for the headnode:

BEFORE:
# ls /var/named
data db.10.29 db.10.40 db.127.0.0 db.172.26.41 db.172.29 db.192.168.100 db.haig db.morgan.haib.org.jnl
db.10.29.jnl db.10.60 db.172.20 db.172.26.42 db.172.30 db.cache db.morgan slaves
db.10.20 db.10.30 db.10.60.jnl db.172.26.40 db.172.26.43 db.172.40 db.haib db.morgan.haib.org

AFTER:
# ls /var/named
data db.10.20.jnl db.10.29.jnl db.10.30.jnl db.10.40.jnl db.10.60.jnl db.172.26.41 db.172.26.43 db.morgan.haib.org.jnl
db.10.20 db.10.29 db.10.30 db.10.40 db.10.60 db.172.26.40 db.172.26.42 db.morgan.haib.org slaves

As you can see, several files are missing. Luckily I made a full backup yesterday and was able to restore the old files.
From /var/log/messages you can see the initial deletion message for xcat-serv1:

Jan 16 10:32:55 x3650-head01 named[8249]: client 10.20.0.1#56080: updating zone '40.10.IN-ADDR.ARPA/IN': adding an RR at '11.102.40.10.IN-ADDR.ARPA' PTR
Jan 16 10:32:55 x3650-head01 named[8249]: client 10.20.0.1#56080: updating zone '40.10.IN-ADDR.ARPA/IN': adding an RR at '7.101.40.10.IN-ADDR.ARPA' PTR
Jan 16 10:32:55 x3650-head01 named[8249]: client 10.20.0.1#56080: updating zone '40.10.IN-ADDR.ARPA/IN': adding an RR at '10.102.40.10.IN-ADDR.ARPA' PTR
Jan 16 10:32:55 x3650-head01 named[8249]: client 10.20.0.1#56080: updating zone '40.10.IN-ADDR.ARPA/IN': adding an RR at '27.102.40.10.IN-ADDR.ARPA' PTR
Jan 16 10:32:55 x3650-head01 named[8249]: client 10.20.0.1#56080: updating zone '40.10.IN-ADDR.ARPA/IN': adding an RR at '33.101.40.10.IN-ADDR.ARPA' PTR
Jan 16 10:32:55 x3650-head01 named[8249]: client 10.20.0.1#56080: updating zone '40.10.IN-ADDR.ARPA/IN': adding an RR at '1.102.40.10.IN-ADDR.ARPA' PTR
Jan 16 10:32:55 x3650-head01 named[8249]: client 10.20.0.1#56080: updating zone

Then for the makedns -a xcat-serv you actually see it update EVERY ZONE:

Jan 16 10:32:42 x3650-head01 named[8249]: client 10.20.0.1#43788: updating zone '30.10.IN-ADDR.ARPA/IN': adding an RR at '1.50.30.10.IN-ADDR.ARPA' PTR
Jan 16 10:32:42 x3650-head01 named[8249]: client 10.20.0.1#43788: updating zone '30.10.IN-ADDR.ARPA/IN':
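Since the only thing that saved the zone files here was a backup taken the day before, a small before/after comparison can make this kind of loss visible immediately. The sketch below is an assumption-laden illustration, not part of xCAT: the function name `missing_zone_files`, the /tmp file names and the backup path are all invented for the example.

```shell
# Hypothetical helper: prints names that appear in the "before" listing but
# not in the "after" listing. Both arguments are files with one name per
# line, such as the output of `ls -1 /var/named`.
missing_zone_files() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -Fxv -f "$2" "$1"
}

# Possible usage around a makedns run (paths are assumptions, not run here):
#   ls -1 /var/named > /tmp/named.before
#   tar czf /root/named-backup.tar.gz -C /var named
#   makedns -n xcat-serv1
#   ls -1 /var/named > /tmp/named.after
#   missing_zone_files /tmp/named.before /tmp/named.after
```

Run against the BEFORE and AFTER listings quoted above, this would list the files that vanished, which is easier to act on than comparing two multi-column ls outputs by eye.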
Re: [xcat-user] Makedns wiping out DNS records
Okay, I will look through that document. This is the first and only time I've used XCATBYPASS with that command (to get more clarity into what was going on in the background). Makedns has been presenting us with the problem(s) I mentioned for well over a year now, with previous versions of xCAT as well. I'm guessing that if this doesn't happen for other people that it must have been some setting when xCAT was originally installed which was misconfigured, or maybe something funny in one of the xCAT db tables like 'site'. I just wanted to see if anyone has had problems with using the -n or -d options deleting more than they are supposed to.

-Josh

On Thu, Jan 16, 2014 at 11:23 AM, Lissa Valletta lis...@us.ibm.com wrote:

We have an entire doc on name resolution. Also, I would not make it a common practice to use XCATBYPASS mode. Running with the xcatd is much more secure and some things do not work well in XCATBYPASS mode.

https://sourceforge.net/apps/mediawiki/xcat/index.php?title=Cluster_Name_Resolution

Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102
Re: [xcat-user] Update schema after upgrade prematurely terminated
Unfortunately, no. Would the MySQL database have to be off as well? I thought the daemon might automatically restart after the upgrade, but you are right that I should have turned it off first. At this point things still work, but I am most confused about lsxcatd still returning 2.7.3 to me. Maybe it is getting that as a string from a database table that hasn't been updated yet instead of directly from the xcatd binary? I have some of the new commands like nodediscoverls and nodediscoverstart which weren't there before so I'm pretty sure everything was upgraded. I did a yum update xcat and for good measure yum install xCAT-nbkernel-x86_64 xCAT-genesis-x86_64 xCAT-client xCAT-nbroot-core-x86_64 xCAT-server xCAT-nbroot-oss-x86_64. As long as nothing is operating incorrectly though I guess I can just presume everything is at the latest version.

-Josh

On Tue, Jan 14, 2014 at 12:12 PM, Russell Jones russell-l...@jonesmail.me wrote:

Did you remember to stop the xcatd service before doing the upgrade?

On 1/14/2014 12:02 PM, Josh Nielsen wrote:

I seem to be getting myself in all kinds of trouble with xCAT lately. I was upgrading from xCAT 2.7.3 to 2.8.3, I grabbed the latest core and dep tarballs, and made local yum repos, and installed all the newer packages from them. Then to check if the daemon had been upgraded I did an 'lsxcatd -v' and it still said 2.7.3. So I decided to run the xcat daemon manually from the command line with the verbose flag but wasn't expecting to see the schema updates (honestly all I saw were SQL commands scrolling in the screen - I didn't know they were for the schema at the time) and I panicked and hit Ctrl+C. This halted the daemon updates but the next time I tried to run it I got an error although the daemon ran in the background. This is from my terminal:

[root@x3650-head01 yum.repos.d]# /opt/xcat/sbin/xcatd -v
updateschema: Running ALTER TABLE mpa ADD `displayname` TEXT
updateschema: Running ALTER TABLE mpa ADD `slots` TEXT
updateschema: Running ALTER TABLE mpa ADD `urlpath` TEXT
updateschema: Running ALTER TABLE servicenode ADD `dhcpinterfaces` TEXT
updateschema: Running ALTER TABLE switches ADD `protocol` TEXT
updateschema: Running ALTER TABLE hypervisor ADD `interface` TEXT
updateschema: Running ALTER TABLE vmmaster ADD `specializeparameters` TEXT
updateschema: Running ALTER TABLE linuximage ADD `boottarget` TEXT
updateschema: Running ALTER TABLE linuximage ADD `addkcmdline` TEXT
updateschema: Running ALTER TABLE linuximage ADD `partitionfile` TEXT
updateschema: Running ALTER TABLE linuximage ADD `driverupdatesrc` TEXT
updateschema: Running ALTER TABLE nodelist ADD `updatestatus` TEXT
updateschema: Running ALTER TABLE nodelist ADD `updatestatustime` TEXT
updateschema: Running ALTER TABLE networks ADD `staticrange` TEXT
updateschema: Running ALTER TABLE networks ADD `staticrangeincrement` TEXT
updateschema: Running ALTER TABLE vm ADD `mgr` TEXT
updateschema: Running ALTER TABLE vm ADD `storagecache` TEXT
updateschema: Running ALTER TABLE vm ADD `storageformat` TEXT
updateschema: Running ALTER TABLE vm ADD `cluster` TEXT
updateschema: Running ALTER TABLE vm ADD `physlots` TEXT
updateschema: Running ALTER TABLE litetree ADD `mntopts` TEXT
updateschema: Running ALTER TABLE domain ADD `authdomain` TEXT
updateschema: Running ALTER TABLE domain ADD `adminuser` TEXT
updateschema: Running ALTER TABLE domain ADD `adminpassword` TEXT
updateschema: Running ALTER TABLE domain ADD `type` TEXT
updateschema: Running ALTER TABLE nodegroup ADD `membergroups` TEXT
updateschema: Running ALTER TABLE noderes ADD `nameservers` TEXT
updateschema: Running ALTER TABLE osimage ADD `groups` TEXT
updateschema: Running ALTER TABLE osimage ADD `description` TEXT
updateschema: Running ALTER TABLE osimage ADD `osdistroname` TEXT
updateschema: Running ALTER TABLE osimage ADD `osupdatename` VARCHAR(1024)
updateschema: Running ALTER TABLE osimage ADD `cfmdir` TEXT
updateschema: Running ALTER TABLE osimage ADD `serverrole` TEXT
updateschema: Running ALTER TABLE osimage ADD `isdeletable` TEXT
updateschema: Running ALTER TABLE osimage ADD `kitcomponents` TEXT
updateschema: Running ALTER TABLE passwd ADD `authdomain` TEXT
updateschema: Running ALTER TABLE nodepos ADD `height` TEXT
updateschema: Running ALTER TABLE nodehm ADD `cmdmapping` TEXT
[root@x3650-head01 yum.repos.d]# /opt/xcat/sbin/xcatd -v
Unrecognized escape \h passed through at /opt/xcat/lib/perl/xCAT/zvmUtils.pm line 2408.

Despite this I can still use 'service xcatd restart' and start the daemon, and I can run psh commands and tabdump etcetera, but now I'm unsure whether the xCAT MySQL database is in an inconsistent state and has not received all the schema updates it was supposed to. I was hoping that by running xcatd -v again I would see it try again but I haven't. And even after all that I still see 2.7.3 when I type 'lsxcatd'. Maybe I need to restart the node. Any clue
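One way to approach the "did every schema update land?" worry is to compare the columns named in the updateschema output against the header line that tabdump prints for each table. The sketch below is an illustration under assumptions: the function names are invented, it expects one updateschema message per input line, and it relies on the tabdump header being a `#`-prefixed, comma-separated column list (the form seen in tabdump output elsewhere on this list).

```shell
# Hypothetical helper: turns lines such as
#   updateschema: Running ALTER TABLE mpa ADD `displayname` TEXT
# (read from stdin) into "table column" pairs, one per line.
schema_additions() {
    sed -n 's/.*ALTER TABLE \([A-Za-z_]*\) ADD `\([A-Za-z_]*\)`.*/\1 \2/p'
}

# Hypothetical helper: succeeds if the column is listed in a tabdump-style
# header such as "#node,nicips,comments,disable".
header_has_column() {
    local header="$1" column="$2"
    case ",${header#\#}," in
        *",$column,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage (not run here; xcatd.log is a saved copy of the output above):
#   schema_additions < xcatd.log | while read -r table column; do
#       header_has_column "$(tabdump "$table" | head -n 1)" "$column" \
#           || echo "missing: $table.$column"
#   done
```

If the loop prints nothing, every column mentioned in the interrupted run is present, which would suggest the Ctrl+C did not leave the tables half-migrated.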
Re: [xcat-user] Update schema after upgrade prematurely terminated
Suddenly the lsxcatd command is returning 2.8.3 to me now. I'm not sure why it would take time to update, but it looks fine now:

    [root@x3650-head01 init.d]# lsxcatd -a
    Version 2.8.3 (built Tue Nov 12 23:16:15 EST 2013)
    This is a Management Node
    cfgloc=mysql:dbname=xcatdb;host=10.X.X.X|xx
    dbengine=mysql
    dbname=xcatdb
    dbhost=10.X.X.X
    dbadmin=xx

And here is my rpm query:

    [root@x3650-head01 init.d]# rpm -qa | grep xCAT
    xCAT-nbroot-oss-ppc64-2.0-snap200801291320
    xCAT-nbkernel-x86-2.6.18_128-8
    xCAT-genesis-scripts-x86_64-2.8.3-snap201311122318
    perl-xCAT-2.8.3-snap201311122316
    xCAT-server-2.8.3-snap201311122316
    xCAT-nbroot-core-x86-2.3.1-snap200911062009
    xCAT-2.8.3-snap201311122318
    xCAT-nbroot-oss-x86_64-2.0-snap200801291344
    xCAT-nbroot-core-x86_64-2.3.1-snap200911062009
    xCAT-nbkernel-x86_64-2.6.18_128-8
    xCAT-genesis-base-x86_64-2.8-snap201308090229
    xCAT-nbroot-oss-x86-2.0-snap200804021050
    xCAT-nbroot-core-ppc64-2.3.1-snap200911062009
    xCAT-nbkernel-ppc64-2.6.18_92-4
    xCAT-buildkit-2.8.3-snap201311122318
    xCAT-client-2.8.3-snap201311122316

Except for the ppc64 packages (which I don't really need anyway since we use x86_64) everything is up to the current version. The xCAT nbroot-core and nbkernel RPMs aren't up to 2.8.X but it seems that they are still at the most current version available (I have my yum repos pointing to the latest tarballs of the core and dep packages).

Thanks for all the help and suggestions. I'm just going to try using it now. This was all prep so that I can deploy some service nodes (as mentioned in my other emails to the mailing list).

Thanks,
Josh

On Tue, Jan 14, 2014 at 1:55 PM, Lissa Valletta lis...@us.ibm.com wrote:

> There are two possibilities. One is that the daemon did not restart correctly, which has been common when upgrading from 2.7.6 to higher levels:
>
>     service xcatd stop
>     ps -ef | grep xcatd
>     kill -9 <any hung xcatd processes>
>     service xcatd start
>
> The other thing is to make sure your rpms are all at the 2.8.3 level:
>
>     rpm -qa | grep xCAT
>
> Lissa K. Valletta
> 8-3/B10
> Poughkeepsie, NY 12601
> (tie 293) 433-3102
>
> From: Josh Nielsen jniel...@hudsonalpha.org
> To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
> Date: 01/14/2014 01:05 PM
> Subject: [xcat-user] Update schema after upgrade prematurely terminated
>
> > I seem to be getting myself in all kinds of trouble with xCAT lately. I was upgrading from xCAT 2.7.3 to 2.8.3, I grabbed the latest core and dep tarballs, made local yum repos, and installed all the newer packages from them. Then to check if the daemon had been upgraded I did an 'lsxcatd -v' and it still said 2.7.3. So I decided to run the xcat daemon manually from the command line with the verbose flag but wasn't expecting to see the schema updates (honestly all I saw were SQL commands scrolling in the screen - I didn't know they were for the schema at the time) and I panicked and hit Ctrl+C. This halted the daemon updates, but the next time I tried to run it I got an error, although the daemon ran in the background.
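Lissa's restart steps quoted above could be scripted roughly as follows. This is only a sketch: the helper names `xcatd_pids` and `restart_xcatd` are made up for illustration, and it assumes a SysV-style `service xcatd` script and `ps -ef` output with the PID in the second column. The pattern match is as loose as the `grep xcatd` in the original advice, so review what it would kill before relying on it.

```shell
# Print the PIDs of processes whose `ps -ef` line (read from stdin)
# mentions xcatd. Assumes the PID is the second column. Lines for grep
# and awk themselves are skipped.
xcatd_pids() {
    awk '/xcatd/ && !/grep/ && !/awk/ { print $2 }'
}

# Hypothetical wrapper for the stop / kill leftovers / start sequence.
# It is only defined here; nothing runs until restart_xcatd is called.
restart_xcatd() {
    service xcatd stop
    leftover=$(ps -ef | xcatd_pids)
    # Word splitting of $leftover is intentional: one PID per word.
    [ -n "$leftover" ] && kill -9 $leftover
    service xcatd start
}
```

To see which processes would be affected without touching anything, `ps -ef | xcatd_pids` can be run on its own first.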
Re: [xcat-user] Update schema after upgrade prematurely terminated
Whew. Great! At first I thought I had caused that by terminating the schema update early. So, speaking of which, would the schema update have completed when I launched it the next time? I'm still not clear on that. Would the xcat daemon just refuse to launch if it hadn't completed the schema update?

Thanks,
Josh

On Tue, Jan 14, 2014 at 2:06 PM, Lissa Valletta lis...@us.ibm.com wrote:

> "Unrecognized escape \h passed through at /opt/xcat/lib/perl/xCAT/zvmUtils.pm line 2408" is a known problem. You can ignore it. The file is only used on z-series. If you want to get rid of it:
>
>     mv /opt/xcat/lib/perl/xCAT/zvmUtils.pm /opt/xcat/lib/perl/xCAT/zvmUtils.pm.save
>     mv /opt/xcat/lib/perl/xCAT_plugin/zvm.pm /opt/xcat/lib/perl/xCAT_plugin/zvm.pm.save
>
> This is the defect: https://sourceforge.net/p/xcat/bugs/3946/
>
> Lissa K. Valletta
> 8-3/B10
> Poughkeepsie, NY 12601
> (tie 293) 433-3102
>
> From: Jarrod B Johnson/Raleigh/IBM@IBMUS
> To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
> Date: 01/14/2014 01:33 PM
> Subject: Re: [xcat-user] Update schema after upgrade prematurely terminated
>
> > FYI, that should be safe. The schema updates *within* 2.x have generally been backward compatible. The schema update is done (it checks the tables and columns instead of something like a version string), so you are good on that front. The error message you are getting is unrelated; we would have to ask the mainframe guys what that expression is trying to do...
> >
> >     XCATBYPASS=1 nodels --version
> >     nodels --version
> >     ps axf|grep -i xcatd
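Jarrod's remark that the schema update checks tables and columns rather than a version string is what makes an interrupted run harmless: the next start simply adds whatever is still missing. A minimal sketch of that idea follows. It is not xCAT's actual code; the function name `pending_alters`, its arguments, and the example column lists are hypothetical.

```shell
# Print an ALTER TABLE statement for every wanted column that the table
# does not have yet. Arguments: table name, space-separated existing
# columns, space-separated wanted columns.
pending_alters() {
    table=$1
    existing=" $2 "
    for col in $3; do
        case "$existing" in
            *" $col "*) ;;   # column already present: nothing to do
            *) printf 'ALTER TABLE %s ADD `%s` TEXT\n' "$table" "$col" ;;
        esac
    done
}

# Example: a run that was interrupted after `displayname` had been added.
pending_alters mpa "displayname" "displayname slots urlpath"
```

Calling it again once every column is present prints nothing, which is the property that makes stopping and restarting safe.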
Re: [xcat-user] Node discovery for VMs in VMWare
> The next thing is to make sure that nodeset has been run correctly on your management node. Did you see a message like this?
>
>     node name : netboot sles11.2-x86_64-compute

Actually, no, nodeset didn't return anything. I tried 'nodeset xcat-serv1 install'. The hyphen in the node name isn't problematic, is it? After the upgrade to 2.8.3 the warning about the SN daemons not being up is gone now (which is good), but nothing is returned to the console when I do a nodeset (it completes silently). I don't think it is working, and I didn't see an entry created for it in /tftpboot/xcat/xnba/nodes/. Here are the entries in my tables:

*tabdump nodelist:*

    ...
    xcat-serv2,service,ipmi,all,,
    xcat-serv1,service,ipmi,all,,

*tabdump servicenode:*

    #node,nameserver,dhcpserver,tftpserver,nfsserver,conserver,monserver,ldapserver,ntpserver,ftpserver,nimserver,ipforward,dhcpinterfaces,comments,disable
    service,2,1,1,1,1,,,1,,

*lsdef service:*

    Object name: xcat-serv1
        arch=x86_64
        groups=service,ipmi,all
        installnic=mac
        ip=172.20.3.1
        mac=00:50:56:bc:0c:db
        netboot=xnba
        nfsserver=10.X.X.X
        nodetype=osi
        os=centos6.4
        postscripts=syslog,remoteshell,ospkgs,addsiteyum,otherpkgs,syncfiles,ifcfg-eth,fstab,servicenode
        primarynic=mac
        profile=service
        provmethod=centos6.4-x86_64-install-compute
        setupconserver=1
        setupdhcp=1
        setupnameserver=2
        setupnfs=1
        setupntp=1
        setuptftp=1
        tftpserver=10.X.X.X
        xcatmaster=10.X.X.X
    Object name: xcat-serv2
        arch=x86_64
        groups=service,ipmi,all
        installnic=mac
        ip=172.20.3.2
        netboot=xnba
        nfsserver=10.X.X.X
        nodetype=osi
        os=centos6.4
        postscripts=syslog,remoteshell,ospkgs,addsiteyum,otherpkgs,syncfiles,ifcfg-eth,fstab,servicenode
        primarynic=mac
        profile=service
        provmethod=centos6.4-x86_64-install-compute
        setupconserver=1
        setupdhcp=1
        setupnameserver=2
        setupnfs=1
        setupntp=1
        setuptftp=1
        tftpserver=10.X.X.X
        xcatmaster=10.X.X.X

And the OS images:

*lsdef -t osimage:*

    centos6.4-x86_64-install-all (osimage)
    centos6.4-x86_64-install-compute (osimage)
    centos6.4-x86_64-install-iscsi (osimage)
    centos6.4-x86_64-install-kvm (osimage)
    centos6.4-x86_64-install-login (osimage)
    centos6.4-x86_64-install-storage (osimage)
    centos6.4-x86_64-install-xen (osimage)
    centos6.4-x86_64-netboot-compute (osimage)
    centos6.4-x86_64-netboot-kvm (osimage)
    centos6.4-x86_64-netboot-xen (osimage)
    centos6.4-x86_64-statelite-compute (osimage)
    centos6.4-x86_64-statelite-kvm (osimage)
    centos6.4-x86_64-statelite-xen (osimage)
    rhels5.3-x86_64-install-compute (osimage)
    rhels5.3-x86_64-install-iscsi (osimage)
    rhels5.3-x86_64-install-kvm (osimage)
    rhels5.3-x86_64-install-service (osimage)
    rhels5.3-x86_64-install-storage (osimage)
    rhels5.3-x86_64-install-xen (osimage)

Any idea what might be going wrong?

Thanks,
Josh

On Mon, Jan 13, 2014 at 7:12 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

> Regarding the dispatching of xCAT command requests to service nodes: in certain cases (e.g. creating the dhcpd configuration) the request is sent to all the service nodes in the xCAT database (those configured in the servicenode table) so that every service node can handle DHCP requests for load balancing. In your current situation you don't need a service node to handle your target nodes (they are service nodes themselves); the xCAT MN will cover them. So you can ignore this error message.
>
> The next thing is to make sure that nodeset has been run correctly on your management node. Did you see a message like this?
>
>     node name : netboot sles11.2-x86_64-compute
>
> Thanks
> Best Regards
> --
> Wang Xiaopeng (王晓朋)
> IBM China System Technology Laboratory
> Tel: 86-10-82453455
> Email: w...@cn.ibm.com
> Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193
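Since nodeset completes silently here, one way to check whether it did anything is to look for a per-node entry under the directory Josh mentions. A small sketch, assuming the entry is a file named after the node (that naming, like the helper name `nodeset_entry_exists`, is an assumption and not confirmed in this thread):

```shell
# Succeed if an entry for the node exists in the given directory.
# The default directory is the one mentioned in the thread; the entry
# being named exactly like the node is an assumption.
nodeset_entry_exists() {
    node=$1
    dir=${2:-/tftpboot/xcat/xnba/nodes}
    [ -e "$dir/$node" ]
}

if nodeset_entry_exists xcat-serv1; then
    echo "xcat-serv1: entry found"
else
    echo "xcat-serv1: no entry, nodeset probably did not take effect"
fi
```

The optional second argument makes it possible to point the check at a different tftp root.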
Re: [xcat-user] Node discovery for VMs in VMWare
Thank you, that information is very useful. I just realized that I have an older version of xCAT (2.7.3) and need to upgrade to get those commands for node discovery. I will do that soon.

Also when I did a 'nodeset' on one of the service nodes I saw an error, as if xcat were already trying to delegate commands to service nodes that aren't even provisioned yet. I'm pretty sure it would do this regardless of what node I tried to nodeset. What I can't tell is whether the command was successful or not, despite the errors saying that it can't reach the service node daemons. Here is what I got:

    [root@x3650-head01 ~]# nodeset xcat-serv1
    Error: Unable to dispatch hierarchical sub-command to xcat-serv1:3001. This service node may be down or its xcatd daemon may not be responding.
    Error: Unable to dispatch hierarchical sub-command to xcat-serv2:3001. This service node may be down or its xcatd daemon may not be responding.

That is very odd since this presents a chicken-and-egg problem if the nodeset is not working (the nodes can't be required to be present before they are provisioned). I followed the xCAT guide to fully define the service nodes in the xCAT tables first, but it appears something in the tables is making xCAT think that the service nodes are/should be up already. Can I ignore this or is nodeset not going to work until I fix this somehow? Maybe this will be fixed by upgrading xCAT though.

-Josh

On Sun, Jan 12, 2014 at 8:29 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

> You saw the discovery message because the node configuration was not finished yet. After you run 'nodeset' and start the node deployment, the discovery message won't show again.
>
> Regarding the discovery, you can use 'sequential discovery' to discover virtual nodes. Refer to:
> https://sourceforge.net/apps/mediawiki/xcat/index.php?title=XCAT_iDataPlex_Cluster_Quick_Start#Option_1:_Sequential_Discovery
>
> Another easy manual method: you can list the undiscovered nodes with the command 'nodediscoverls -t all' (any node that is sending discovery requests is recorded in the discoverydata table and can be displayed by the nodediscoverls command), and then run 'nodediscoverdef' to turn the undiscovered node into a node definition in the xCAT db. See examples:
>
>     # nodediscoverls -t all
>     UUID NODE METHOD MTM SERIAL
>     643CC79B-B7BD-224F-8105-9AEF8329E195 undef undef 9AA99E16
>
>     # nodediscoverls -t all -l
>     Object uuid: 643CC79B-B7BD-224F-8105-9AEF8329E195
>         method=undef
>         discoverytime=01-13-2014 01:14:28
>         arch=x86_64
>         cpucount=2
>         cputype=Intel Core 2 Duo P9xxx (Penryn Class Core 2)
>         memory=3770188
>         serial=9AA99E16-BAAA-11E1-B9E3-3440B5E0E828_34:40:B5:E0:E8:28
>         nicdriver=eth0!e1000
>         nicipv4=eth0!10.1.100.114/16
>         nichwaddr=eth0!00:1A:4A:01:00:5D
>         nicpci=eth0!:00:03.0
>
> Thanks
> Best Regards
> --
> Wang Xiaopeng (王晓朋)
> IBM China System Technology Laboratory
> Tel: 86-10-82453455
> Email: w...@cn.ibm.com
> Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193
>
> From: Josh Nielsen jniel...@hudsonalpha.org
> To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
> Date: 2014/01/11 08:09
> Subject: Re: [xcat-user] Node discovery for VMs in VMWare
>
> > I tried the manual insertion of the MAC address and I was able to get a DHCP lease for the VM (IP 10.20.240.254) but for some reason it just kept saying it was processing the discovery request, like this, over and over again:
> >
> >     Jan 10 17:45:11 x3650-head01 xCAT: xcatd: Processing discovery request from 10.20.240.254
> >     Jan 10 23:45:14 10.20.240.254 (none) dhclient[310]: XMT: Solicit on eth0, interval 125010ms.
> >     Jan 10 17:45:16 x3650-head01 xCAT: xcatd: Processing discovery request from 10.20.240.254
> >     Jan 10 23:45:17 10.20.240.254 (none) dhclient[321]: XMT: Solicit on eth1, interval 112560ms.
> >     Jan 10 17:45:21 x3650-head01 xCAT: xcatd: Processing discovery request from 10.20.240.254
> >     Jan 10 17:45:36 x3650-head01 last message repeated 3 times
> >
> > I'll have to debug this more after the weekend.
> >
> > -Josh
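The manual method Wang describes above boils down to picking a UUID out of the nodediscoverls listing and handing it to nodediscoverdef. A rough sketch, assuming the listing keeps the column order of his example and that nodediscoverdef accepts the UUID and node name through -u and -n (please check the man page for your xCAT version); the helper names are made up for illustration:

```shell
# Read `nodediscoverls -t all` style output on stdin and print the UUID
# of every entry whose NODE column is still "undef". Skips the header.
undefined_uuids() {
    awk 'NR > 1 && $2 == "undef" { print $1 }'
}

# Hypothetical helper: define the first undiscovered entry as the given
# node. Only defined here; it needs a live xCAT management node to run.
define_first_undiscovered() {
    node=$1
    uuid=$(nodediscoverls -t all | undefined_uuids | head -n 1)
    [ -n "$uuid" ] && nodediscoverdef -u "$uuid" -n "$node"
}
```

With several undiscovered VMs pending it is safer to run `nodediscoverls -t all -l` and match on the MAC address by eye than to take the first entry blindly.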
Re: [xcat-user] How to create and deploy an xCAT Service Node
Hi Wang Xiaopeng (I apologize if I got your name wrong before),

You said:

> SN cannot work without MN since there's only one database which is running on MN. If MN is down you cannot run lots of commands on SN. In fact xCAT does NOT recommend to run any command on SN.

Will DNS and DHCP still work, however, from an SN when the MN is down? I would need the SN to still resolve the compute nodes' DNS requests on the cluster subnet and issue DHCP leases/renewals as necessary. It looks like SNs are intended to take responsibility on behalf of the MN for serving the compute nodes' necessary services, such that you designate each node's xcatmaster as the SN like:

    chdef -t group compute1 servicenode=sn1 xcatmaster=sn1-nic

If that is the case then shouldn't DNS and DHCP still be served from the SN to a node even when the MN is down?

Thanks,
Josh

On Wed, Jan 8, 2014 at 6:32 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

> The doc mentioned by Josh will lead you in the correct direction to set up a Service Node. Some information for you:
>
> Roughly, your understanding is correct: you can install and configure an SN automatically once the SN node definition (correct group like service, correct attributes like postscripts, correct osimage like rhels6.4-x86_64-install-service) and the xCAT rpm repository are prepared and configured correctly.
>
> Generally, /tftpboot and /install on the SN are mounted from the xCAT MN, so they will be identical to the directories on the MN at any time.
>
> SN cannot work without MN since there's only one database which is running on MN. If MN is down you cannot run lots of commands on SN. In fact xCAT does NOT recommend to run any command on SN.
>
> Thanks
> Best Regards
> --
> Wang Xiaopeng (王晓朋)
> IBM China System Technology Laboratory
> Tel: 86-10-82453455
> Email: w...@cn.ibm.com
> Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193
>
> From: Josh Nielsen jniel...@hudsonalpha.org
> To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
> Date: 2014/01/09 03:48
> Subject: Re: [xcat-user] How to create and deploy an xCAT Service Node
>
> > Hi Russell,
> >
> > For some reason that wiki page never came up in my Google searches even when I placed quotes around "service node". Thanks! I'll take a look through it.
> >
> > Regards,
> > Josh
> >
> > On Wed, Jan 8, 2014 at 12:25 PM, Russell Jones russell-l...@jonesmail.me wrote:
> >
> > > Hi Josh,
> > >
> > > This should be helpful for you (it was for me when first learning how to set this up):
> > > http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Setting_Up_a_Linux_Hierarchical_Cluster
> > >
> > > On 1/8/2014 11:52 AM, Josh Nielsen wrote:
> > >
> > > > Hello all,
> > > >
> > > > I am trying to move my xCAT installation toward a hierarchical structure by including Service Nodes but I am finding a dearth of information on how to create a service node on the xCAT wiki and even the Sumavi xCAT guide. At first I began following the instructions for setting up a management node (which involves a new xCAT install, an independent xCAT database, etc.) on a VM intended to be the SN but then realized that that must be wrong.
> > > >
> > > > I have noticed the xCAT 'servicenode' database table present on the Management Node (listed with tabdump). I am wondering if I can simply designate a new node (in my case a VM) to be the prospective SN, designate which services (like dhcp, dns, etc.) to enable/disable in the 'servicenode' table, and then do a fresh 'rinstall' to that VM and have it be automatically fully configured as an SN from postinstall scripts (which I presume are already present and will be automatically applied to that node by some xCAT logic of parsing the 'servicenode' table and associating any relevant postscripts to be run with the node). If so, will it create the /install and /tftpboot directories (including ISOs made with copycds) on each SN identical to the contents of the MN? Is it that simple or is there more involved than that?
> > > >
> > > > In addition I want to be able to take down the Management Node in the future for maintenance and be able to do just about everything that the MN can do from the SNs (including dhcp leases, rpower commands, updatenode, psh, etc.) while the MN is down. Is that possible?
> > > >
> > > > Any clarifications would be appreciated.
> > > >
> > > > Thank you,
> > > > Josh Nielsen
Re: [xcat-user] How to create and deploy an xCAT Service Node
Thank you Lissa, that is helpful.

-Josh

On Fri, Jan 10, 2014 at 1:25 PM, Lissa Valletta lis...@us.ibm.com wrote:

> DNS and DHCP will still work from the Service Node, if set up correctly. In other words, you have configured the service node as the DNS server and/or DHCP server for the nodes and there is no dependency on the Management Node for DNS or DHCP.
>
> You will not be able to run any xcat commands on the service node if the Management Node is down. xCAT requires access to the database configured on the MN for the xcat cluster (mysql, postgresql) to run most xcat commands, even to recognize that the node is in the xcat cluster.
>
> Lissa K. Valletta
> 8-3/B10
> Poughkeepsie, NY 12601
> (tie 293) 433-3102
[xcat-user] Node discovery for VMs in VMWare
Hello all,

I would like to provision some new nodes in my xCAT cluster inside VMWare but I just realized that the way xCAT has identified and discovered my nodes in the past has been by dedicated physical port numbers on the switches as defined in the xCAT switch database table. That is not a problem for physical nodes which have a single cable going into a port on a switch dedicated only to that node. For example one of my nodes has this entry in the switch table:

    #node,switch,port,vlan,interface,comments,disable
    hinode,bnt01,44,1,eth0,bnt01:p44

This tells xCAT that the node found connected to port 44 on switch bnt01 can be identified as node hinode.

With a VMWare cluster, however, a VM may reside on many ports (in my case I have 3 ESX servers, and 3 corresponding cables to the switch, which the VM may be on) at any given time, depending on which VMWare host it is running on. And on top of that the network connection is trunked so that every VM running on each ESX host goes through the same network port.

Has anyone provisioned nodes in VMWare before with xCAT? And if there is not a way to do this automatically by the switch/port associations, is there another manual way that I can discover the node? Would manually inserting the node's MAC address into the xCAT 'mac' table work, or is more than that required for (auto)discovery?

Thanks,
Josh Nielsen
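For the manual route asked about above, here is a sketch of recording a VM's MAC address in its node definition and regenerating the DHCP entry for it. chdef with a mac= attribute and makedhcp are standard xCAT commands, but whether this alone is enough for discovery in a given setup is not established in this thread; the helper names `is_mac` and `set_node_mac` are illustrative.

```shell
# Succeed if the argument looks like a colon-separated 48-bit MAC address.
is_mac() {
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# Hypothetical helper: store the MAC on the node and refresh its DHCP entry.
# Only defined here; calling it requires a working xCAT management node.
set_node_mac() {
    node=$1
    mac=$2
    is_mac "$mac" || { echo "not a MAC address: $mac" >&2; return 1; }
    chdef "$node" mac="$mac" && makedhcp "$node"
}
```

The MAC itself can be read from the VM's network adapter settings in VMWare; validating it first avoids writing a mistyped value into the node definition.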
Re: [xcat-user] How to create and deploy an xCAT Service Node
That may be what I need actually, because I have to wipe the OS on our current (physical) headnode to upgrade from RHEL5 to RHEL6 and I need something (a VM HA management node) up and functional while I am doing that to service the cluster. -Josh On Thu, Jan 9, 2014 at 2:32 PM, Lissa Valletta lis...@us.ibm.com wrote: Highly Available management nodes makes sure that if one Management Server goes down there is one configured that can take over.Only one Management Server at a time is active. Lissa K. Valletta 8-3/B10 Poughkeepsie, NY 12601 (tie 293) 433-3102 [image: Inactive hide details for Josh Nielsen ---01/09/2014 03:13:09 PM---Thank you Xiao Peng Wang, that information is useful. I am t]Josh Nielsen ---01/09/2014 03:13:09 PM---Thank you Xiao Peng Wang, that information is useful. I am thinking that, in that case, Service Node From: Josh Nielsen jniel...@hudsonalpha.org To: xCAT Users Mailing list xcat-user@lists.sourceforge.net, Date: 01/09/2014 03:13 PM Subject: Re: [xcat-user] How to create and deploy an xCAT Service Node -- Thank you Xiao Peng Wang, that information is useful. I am thinking that, in that case, Service Nodes are not what I need after all. Our cluster is not at a scale yet that requires additional repositories to pull software updates from (which SNs look like they are designed for). Rather I need additional management points. I looked a little further and it looks like Highly Available Management Nodes are what I need: http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Highly_Available_Management_Node . I do have a question about the Service Nodes though since I've already begun configuring them: In the Service Node setup documentation on the wiki, under the heading Add OS and Hardware Attributes to Service Nodes they tell you to do a chdef with parameters like: chdef -t group service arch=x86_64 os=centos6.4 nodetype=osi ... My question is about the nodetype. 
I will be deploying to VMs acting as the SNs, so should the nodetype not rather be nodetype=vm? What does nodetype=osi (as in osimage) even mean when applied to a physical or virtual machine? When I do an lsdef on the nodes (that will be on VMs) I see:

# lsdef service
    arch=x86_64
    groups=service,ipmi,all
    installnic=mac
    ip=172.20.3.1
    netboot=xnba
    nfsserver=10.X.X.X
    nodetype=osi
    ...
Object name: xcat-serv2
    arch=x86_64
    groups=service,ipmi,all
    installnic=mac
    ip=172.20.3.2
    netboot=xnba
    nfsserver=10.X.X.X
    nodetype=osi
    ..

But the VMs are vms and not osimages. I'm just wondering about the semantics: what does osi/osimage indicate should be done to a node, and how will it affect deployment?

Thanks,
Josh

On Wed, Jan 8, 2014 at 6:32 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

The doc mentioned by Josh will lead you in the correct direction to set up a Service Node. Some information for you:

Roughly, your understanding is correct: you can install and configure an SN automatically once the SN node definition (correct group such as service, correct attributes such as postscripts, correct osimage such as rhels6.4-x86_64-install-service) and the xCAT rpm repository are prepared and configured correctly.

Generally, /tftpboot and /install on the SN are mounted from the xCAT MN, so they will be identical to the directories on the MN at all times.

An SN cannot work without the MN, since there is only one database and it runs on the MN. If the MN is down you cannot run many commands on the SN. In fact, xCAT does NOT recommend running any commands on the SN.
Thanks
Best Regards
--
Wang Xiaopeng (王晓朋)
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: w...@cn.ibm.com
Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193

From: Josh Nielsen jniel...@hudsonalpha.org
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 2014/01/09 03:48
Subject: Re: [xcat-user] How to create and deploy an xCAT Service Node

Hi Russell,

For some reason that wiki page never came up in my Google searches, even when I placed quotes around "service node". Thanks! I'll take a look through it.

Regards,
Josh

On Wed, Jan 8, 2014 at 12:25 PM, Russell Jones russell-l...@jonesmail.me wrote:

Hi Josh,

This should be helpful for you (it was for me when first learning how to set this up): http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Setting_Up_a_Linux_Hierarchical_Cluster

On 1/8/2014 11:52 AM, Josh Nielsen wrote:

Hello all, I am trying to move my xCAT installation toward a hierarchical structure by including Service Nodes but I am
[xcat-user] How to create and deploy an xCAT Service Node
Hello all,

I am trying to move my xCAT installation toward a hierarchical structure by including Service Nodes, but I am finding a dearth of information on how to create a service node on the xCAT wiki and even in the Sumavi xCAT guide. At first I began following the instructions for setting up a management node (which involves a new xCAT install, an independent xCAT database, etc.) on a VM intended to be the SN, but then realized that that must be wrong.

I have noticed the xCAT 'servicenode' database table present on the Management Node (listed with tabdump). I am wondering if I can simply designate a new node (in my case a VM) to be the prospective SN, designate which services (like dhcp, dns, etc.) to enable/disable in the 'servicenode' table, and then do a fresh 'rinstall' to that VM and have it be automatically and fully configured as an SN by postinstall scripts (which I presume are already present and will be applied to that node automatically by some xCAT logic that parses the 'servicenode' table and associates any relevant postscripts with the node). If so, will it create the /install and /tftpboot directories (including ISOs made with copycds) on each SN, identical to the contents on the MN? Is it that simple, or is there more involved than that?

In addition, I want to be able to take down the Management Node in the future for maintenance and be able to do just about everything that the MN can do from the SNs (including dhcp leases, rpower commands, updatenode, psh, etc.) while the MN is down. Is that possible?

Any clarifications would be appreciated.

Thank you,
Josh Nielsen
Re: [xcat-user] How to create and deploy an xCAT Service Node
Hi Russell,

For some reason that wiki page never came up in my Google searches, even when I placed quotes around "service node". Thanks! I'll take a look through it.

Regards,
Josh

On Wed, Jan 8, 2014 at 12:25 PM, Russell Jones russell-l...@jonesmail.me wrote:

Hi Josh,

This should be helpful for you (it was for me when first learning how to set this up): http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Setting_Up_a_Linux_Hierarchical_Cluster

On 1/8/2014 11:52 AM, Josh Nielsen wrote:

Hello all,

I am trying to move my xCAT installation toward a hierarchical structure by including Service Nodes, but I am finding a dearth of information on how to create a service node on the xCAT wiki and even in the Sumavi xCAT guide. At first I began following the instructions for setting up a management node (which involves a new xCAT install, an independent xCAT database, etc.) on a VM intended to be the SN, but then realized that that must be wrong.

I have noticed the xCAT 'servicenode' database table present on the Management Node (listed with tabdump). I am wondering if I can simply designate a new node (in my case a VM) to be the prospective SN, designate which services (like dhcp, dns, etc.) to enable/disable in the 'servicenode' table, and then do a fresh 'rinstall' to that VM and have it be automatically and fully configured as an SN by postinstall scripts (which I presume are already present and will be applied to that node automatically by some xCAT logic that parses the 'servicenode' table and associates any relevant postscripts with the node). If so, will it create the /install and /tftpboot directories (including ISOs made with copycds) on each SN, identical to the contents on the MN? Is it that simple, or is there more involved than that?
In addition, I want to be able to take down the Management Node in the future for maintenance and be able to do just about everything that the MN can do from the SNs (including dhcp leases, rpower commands, updatenode, psh, etc.) while the MN is down. Is that possible?

Any clarifications would be appreciated.

Thank you,
Josh Nielsen
Re: [xcat-user] dhcp timeout in xnba
Can you tell xnba to drop you directly into a shell instead of discovery mode (by modifying the chain table or setting 'nodeset [node] runcmd=shell'), then set a temporary static IP and try some ping tests from there? I forget how far you have to get before you can drop to a shell, but I believe I have had to do that from a shell before.

On a previous occasion we had to make sure the switch port was configured as an STP edge port and that it communicated over the default VLAN. Additionally, stale DHCP IP assignment entries (with the MAC address) were sometimes present in /var/lib/dhcpd.leases, and we had to manually delete them, restart dhcpd, and try again (this was in a RHEL/CentOS 5 environment). I don't know why that would cause a timeout though.

Were you able to verify that the DHCP offer was coming from your xCAT server in each case and not (potentially) from another DHCP server, rogue or otherwise? I had problems one time when two xCAT servers/headnodes from two different clusters were cross-talking and issuing IPs to newly provisioned nodes in each other's cluster.

Just a few ideas. You never know.

Josh Nielsen

On Mon, Oct 28, 2013 at 12:35 PM, Roland Santos roland.san...@gmail.com wrote:

I'm getting a dhcp timeout issue with xnba/genesis. On discovery, a node will boot from the network and get the xnba image. It then times out getting network information via dhcp from within xnba. I'm running xCAT version 2.8.2, if it helps.

On the xCAT management server side, I see the initial sequence of DHCP messages for the node: DHCPDISCOVER, DHCPOFFER, DHCPREQUEST, then DHCPACK. After the node downloads the xnba image and loads it, it tries to do another dhcp request, and that request times out. On the xCAT side, I see just a DHCPDISCOVER and a DHCPOFFER, but nothing further.

Any help would be appreciated.
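The stale-lease check mentioned above can be sketched in Python as follows. This is only an illustration under stated assumptions: the sample lease text, IPs, and MAC addresses are invented, the parser handles just the simple `lease <ip> { ... }` blocks that ISC dhcpd writes, and it only reports matches. Deleting entries and restarting dhcpd remain manual steps.

```python
import re

# Invented sample in ISC dhcpd lease-file syntax; on a real server the text
# would be read from the leases file instead (its path varies by distribution).
SAMPLE_LEASES = """\
lease 10.20.253.236 {
  starts 1 2013/10/28 17:30:00;
  ends 1 2013/10/28 23:30:00;
  hardware ethernet 00:1a:64:aa:bb:cc;
}
lease 10.20.253.237 {
  starts 1 2013/10/28 17:31:00;
  ends 1 2013/10/28 23:31:00;
  hardware ethernet 00:1a:64:dd:ee:ff;
}
lease 10.20.253.240 {
  starts 1 2013/10/21 09:00:00;
  ends 1 2013/10/21 15:00:00;
  hardware ethernet 00:1a:64:aa:bb:cc;
}
"""

# Simplification: assumes no '}' appears inside a lease body (for example in
# a quoted uid string), which is usually but not always true.
LEASE_RE = re.compile(r"lease\s+(\S+)\s*\{(.*?)\}", re.DOTALL)
MAC_RE = re.compile(r"hardware\s+ethernet\s+([0-9A-Fa-f:]+)\s*;")


def leases_for_mac(leases_text, mac):
    """Return the IPs of all lease blocks recorded for the given MAC."""
    wanted = mac.lower()
    hits = []
    for ip, body in LEASE_RE.findall(leases_text):
        match = MAC_RE.search(body)
        if match and match.group(1).lower() == wanted:
            hits.append(ip)
    return hits


if __name__ == "__main__":
    for ip in leases_for_mac(SAMPLE_LEASES, "00:1A:64:AA:BB:CC"):
        print("lease on record:", ip)
```

More than one address on record for the same MAC is the kind of leftover entry the message describes cleaning up by hand.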
Re: [xcat-user] Getdestiny failing - can't open '/tmp/dhcpserver'
Okay, it turns out that the xnba-undi package was outdated, and for some reason it was not updated from xcat-dep when we installed xCAT 2.7.3. I guess it was not a required dependency and hence was not pulled in automatically with the new xCAT. I can boot Genesis now after updating that package.

Also, it looks like the source rpm for that xnba-undi package just adds two files:

/tftpboot/xcat/xnba.efi
/tftpboot/xcat/xnba.kpxe

The timestamp on them is now Feb 6 2012; previously my xnba.kpxe timestamp was Aug 24 2009, and I did not have xnba.efi at all. Why are these files needed with genesis? For my own edification, does genesis run *on top* of xnba, or does it boot an entirely new image once xnba fetches the genesis kernel images?

Thanks,
Josh

On Tue, Aug 7, 2012 at 5:24 PM, Josh Nielsen jniel...@hudsonalpha.com wrote:

Hi Jarrod,

Okay, I upgraded to xCAT 2.7.3, installed the xCAT-genesis-x86_64 and elilo-xCAT RPMs, and reran mknb x86_64 to recreate the /tftpboot/xcat/xnba/nets files; it also reported "Creating genesis.fs.x86_64.gz in /tftpboot/xcat". But I still get a missing NIC driver error when I PXE boot (this time explicit, confirming this is the problem).
Upon PXE boot the clients get yaboot first, then load xnba.kpxe, which according to my /etc/dhcpd.conf file is set up to query the appropriate nets file:

if option user-class-identifier = xNBA and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
    always-broadcast on;
    filename = http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16 ;

Inside that file is:

[root@x3650-head01 etc]# cat /tftpboot/xcat/xnba/nets/10.20.0.0_16
#!gpxe
imgfetch -n kernel http://${next-server}/tftpboot/xcat/genesis.kernel.x86_64 xcatd=10.20.0.1:3001 BOOTIF=01-${netX/machyp}
imgfetch -n nbfs http://${next-server}/tftpboot/xcat/genesis.fs.x86_64.gz
imgload kernel
imgexec kernel

In my apache log I see:

10.20.253.236 - - [07/Aug/2012:13:36:23 -0500] GET /tftpboot/xcat/xnba/nets/10.20.0.0_16 HTTP/1.0 200 235 - gPXE/0.9.7
10.20.253.236 - - [07/Aug/2012:13:36:23 -0500] GET /tftpboot/xcat/genesis.kernel.x86_64 HTTP/1.0 200 3942032 - gPXE/0.9.7
10.20.253.236 - - [07/Aug/2012:13:36:23 -0500] GET /tftpboot/xcat/genesis.fs.x86_64.gz HTTP/1.0 200 20210204 - gPXE/0.9.7

Then genesis boots and (after I removed 'quiet' from the kernel arguments) does some initial boot checks, and then it goes into a loop of dumping the help/syntax screen for grep to the screen (which indicates to me that whatever it is grepping for is failing, possibly /tmp/dhcpserver like before). Eventually it gives up and prints this to the screen:

ERROR Unable to find boot device (maybe the nbroot is missing the driver for your nic?)

At that point it just sits there and does not try anything else. What have I done wrong here? Is the BOOTIF argument to the genesis kernel maybe wrong? Also, why is it looking for nbroot? There was a previous bootloader in /opt/xcat/share/xcat/netboot/x86_64/nbroot/, but it should be looking in /opt/xcat/share/xcat/netboot/genesis/x86_64/ since it is using genesis, correct?

Any ideas?
-Josh

On Wed, Jul 25, 2012 at 1:37 PM, Jarrod B Johnson jbjoh...@us.ibm.com wrote:

Hmm, with xcat 2.7.3 you should be pulling in the 'xCAT-genesis' packages that replace the environment with something newer that has the appropriate nic drivers...

To: xcat-user@lists.sourceforge.net
From: Josh Nielsen jniel...@hudsonalpha.com
Date: 07/25/2012 12:59PM
Subject: [xcat-user] Getdestiny failing - can't open '/tmp/dhcpserver'

Hello,

I have some new IBM System X DX360M4 nodes (all our previous ones were DX360M3s) that I am trying to autodiscover with xCAT, and I am running into the same problem as in this mail thread: http://www.mail-archive.com/xcat-user@lists.sourceforge.net/msg01267.html. Essentially the node boots up, does a dhcpdiscover, and grabs a generic bootloader (in my case yaboot, but it also works with pxelinux.0), which then reinitiates the dhcpdiscover, queries again for the appropriate bootloader from the file in the xcat/xnba/nets/ folder, and is served and boots xnba.kpxe. At this point I believe the xnba image is supposed to load and execute the autodiscovery process, which includes the getdestiny script, but all I am seeing are the messages:

cat: can't open '/tmp/dhcpserver': No such file or directory
grep: /tmp/destiny: No such file or directory
grep: /tmp/destiny: No such file or directory

One reply by Jarrod Johnson to that email thread above mentioned a possible network driver issue and suggested using Genesis from xCAT v2.7. According to 'xcatconfig -v' I am running Version 2.3.1 of xCAT and I am running on Centos 5 (2.6.18-128.el5). Is there a way to get this working with my current version of xCAT and OS (inject drivers somehow
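The check done by eye on the Apache log earlier in this message (did the client fetch the nets file, the genesis kernel, and the genesis filesystem, each with status 200?) can be sketched in Python. The log lines are copied as they appear in the message, where the archive has dropped the usual quotes; the regex tolerates both forms. The function name and report layout are my own, not part of xCAT or Apache.

```python
import re

# Log lines as quoted in the message (quotes around the request were lost in
# the archive; the pattern below accepts them with or without quotes).
LOG_LINES = """\
10.20.253.236 - - [07/Aug/2012:13:36:23 -0500] GET /tftpboot/xcat/xnba/nets/10.20.0.0_16 HTTP/1.0 200 235 - gPXE/0.9.7
10.20.253.236 - - [07/Aug/2012:13:36:23 -0500] GET /tftpboot/xcat/genesis.kernel.x86_64 HTTP/1.0 200 3942032 - gPXE/0.9.7
10.20.253.236 - - [07/Aug/2012:13:36:23 -0500] GET /tftpboot/xcat/genesis.fs.x86_64.gz HTTP/1.0 200 20210204 - gPXE/0.9.7
""".splitlines()

LINE_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[[^\]]+\] '
    r'"?(?P<method>[A-Z]+) (?P<path>\S+) \S+"? '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

EXPECTED_PATHS = [
    "/tftpboot/xcat/xnba/nets/10.20.0.0_16",
    "/tftpboot/xcat/genesis.kernel.x86_64",
    "/tftpboot/xcat/genesis.fs.x86_64.gz",
]


def fetch_report(lines, client, expected_paths):
    """Map each expected path to the last HTTP status seen for this client,
    or None if the client never requested it."""
    seen = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("client") == client:
            seen[match.group("path")] = int(match.group("status"))
    return {path: seen.get(path) for path in expected_paths}


if __name__ == "__main__":
    for path, status in fetch_report(LOG_LINES, "10.20.253.236", EXPECTED_PATHS).items():
        print(status, path)
```

All three requests returning 200 shows that the HTTP side of the boot chain works, so the failure lies later, inside the booted genesis image, which is consistent with the NIC driver error reported in the message.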
[xcat-user] Getdestiny failing - can't open '/tmp/dhcpserver'
Hello,

I have some new IBM System X DX360M4 nodes (all our previous ones were DX360M3s) that I am trying to autodiscover with xCAT, and I am running into the same problem as in this mail thread: http://www.mail-archive.com/xcat-user@lists.sourceforge.net/msg01267.html. Essentially the node boots up, does a dhcpdiscover, and grabs a generic bootloader (in my case yaboot, but it also works with pxelinux.0), which then reinitiates the dhcpdiscover, queries again for the appropriate bootloader from the file in the xcat/xnba/nets/ folder, and is served and boots xnba.kpxe. At this point I believe the xnba image is supposed to load and execute the autodiscovery process, which includes the getdestiny script, but all I am seeing are the messages:

cat: can't open '/tmp/dhcpserver': No such file or directory
grep: /tmp/destiny: No such file or directory
grep: /tmp/destiny: No such file or directory

One reply by Jarrod Johnson to that email thread above mentioned a possible network driver issue and suggested using Genesis from xCAT v2.7. According to 'xcatconfig -v' I am running Version 2.3.1 of xCAT and I am running on Centos 5 (2.6.18-128.el5). Is there a way to get this working with my current version of xCAT and OS (inject drivers somehow?), or do I need to update to v2.7 of xCAT (does that require Centos 6)? In any case, is a network driver the most likely explanation for what I am seeing? I have tried everything that I can think of from the switch side to make sure SNMP is enabled and the port definitions are correct in the switch table for autodiscovery.

Any ideas?

Thanks,
Josh Nielsen
Re: [xcat-user] Getdestiny failing - can't open '/tmp/dhcpserver'
Okay, we will be doing an upgrade of xCAT soon. I just wanted to double check.

P.S. Can you take a crack at answering that gPXE/UNDI question?

On Wed, Jul 25, 2012 at 1:37 PM, Jarrod B Johnson jbjoh...@us.ibm.com wrote:

Hmm, with xcat 2.7.3 you should be pulling in the 'xCAT-genesis' packages that replace the environment with something newer that has the appropriate nic drivers...

To: xcat-user@lists.sourceforge.net
From: Josh Nielsen jniel...@hudsonalpha.com
Date: 07/25/2012 12:59PM
Subject: [xcat-user] Getdestiny failing - can't open '/tmp/dhcpserver'

Hello,

I have some new IBM System X DX360M4 nodes (all our previous ones were DX360M3s) that I am trying to autodiscover with xCAT, and I am running into the same problem as in this mail thread: http://www.mail-archive.com/xcat-user@lists.sourceforge.net/msg01267.html. Essentially the node boots up, does a dhcpdiscover, and grabs a generic bootloader (in my case yaboot, but it also works with pxelinux.0), which then reinitiates the dhcpdiscover, queries again for the appropriate bootloader from the file in the xcat/xnba/nets/ folder, and is served and boots xnba.kpxe. At this point I believe the xnba image is supposed to load and execute the autodiscovery process, which includes the getdestiny script, but all I am seeing are the messages:

cat: can't open '/tmp/dhcpserver': No such file or directory
grep: /tmp/destiny: No such file or directory
grep: /tmp/destiny: No such file or directory

One reply by Jarrod Johnson to that email thread above mentioned a possible network driver issue and suggested using Genesis from xCAT v2.7. According to 'xcatconfig -v' I am running Version 2.3.1 of xCAT and I am running on Centos 5 (2.6.18-128.el5). Is there a way to get this working with my current version of xCAT and OS (inject drivers somehow?), or do I need to update to v2.7 of xCAT (does that require Centos 6)?
In any case, is a network driver the most likely explanation for what I am seeing? I have tried everything that I can think of from the switch side to make sure SNMP is enabled and the port definitions are correct in the switch table for autodiscovery.

Any ideas?

Thanks,
Josh Nielsen
Re: [xcat-user] Getdestiny failing - can't open '/tmp/dhcpserver'
Okay, thanks for the explanation.

Cheers,
Josh

On Wed, Jul 25, 2012 at 2:28 PM, Jarrod B Johnson jbjoh...@us.ibm.com wrote:

So the issue is that xnba itself can work fine. The problem is that the linux image that subsequently loads lacked the driver update.

In an ideal world, we would make an EFI executable that does everything in UEFI and is moderately future proof. In a practical world, we work with the much richer linux toolset for lack of time to develop EFI resources.

To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
From: Josh Nielsen jniel...@hudsonalpha.com
Date: 07/25/2012 02:58PM
Subject: Re: [xcat-user] Getdestiny failing - can't open '/tmp/dhcpserver'

Thanks for the information Lissa.

I do have another, more general question as well. It is regarding the xnba boot image itself, which appears to be based on gPXE. The idea of gPXE is that it is UNDI-capable and does not have to use TFTP to serve the images, correct? And judging by the '.kpxe' extension of the xnba.kpxe image/bootloader, it looks like that extension is reserved specifically for images that keep UNDI loaded but unload PXE. According to this page (http://etherboot.org/wiki/gpxe_imagetypes) the extensions for images break down like this:

- .pxe is an image designed to be chainloaded, unloading both the underlying PXE and UNDI code sections.
- .kpxe is a PXE image that keeps UNDI loaded and unloads PXE
- .kkpxe is a PXE image that keeps PXE+UNDI loaded and return to PXE (instead of int 18h).

So does xnba.kpxe try to interact with the NIC via UNDI once it loads? If so, since UNDI is an abstracted API, and unless the API has been updated on the newer NICs, shouldn't it work with just about any network card regardless?
From this PXE chainloading page (http://etherboot.org/wiki/pxechaining) it says:

"When chainloading gPXE from PXE, gPXE can use this API (instead of loading an hardware driver). This way, you're getting support for network controllers that are not natively supported by gPXE. Some network controllers have improved performance when using the UNDI driver over the vendor specific gPXE driver."

I'm just curious about some of the theory behind this because I'm only used to good ol' legacy PXE. All this gPXE and UNDI stuff is new to me.

On Wed, Jul 25, 2012 at 12:57 PM, Lissa Valletta lis...@us.ibm.com wrote:

Version 2.3.x has not been supported for a long time. You need to be on the latest level of 2.6 for support, but you might as well go to the latest release, which is 2.7.3. You can upgrade xCAT and stay at your current Centos level.

Lissa K. Valletta
2-3/T12
Poughkeepsie, NY 12601
(tie 293) 433-3102

From: Josh Nielsen jniel...@hudsonalpha.com
To: xcat-user@lists.sourceforge.net
Date: 07/25/2012 01:33 PM
Subject: [xcat-user] Getdestiny failing - can't open '/tmp/dhcpserver'

Hello,

I have some new IBM System X DX360M4 nodes (all our previous ones were DX360M3s) that I am trying to autodiscover with xCAT, and I am running into the same problem as in this mail thread: http://www.mail-archive.com/xcat-user@lists.sourceforge.net/msg01267.html. Essentially the node boots up, does a dhcpdiscover, and grabs a generic bootloader (in my case yaboot, but it also works with pxelinux.0), which then reinitiates the dhcpdiscover, queries again for the appropriate bootloader from the file in the xcat/xnba/nets/ folder, and is served and boots xnba.kpxe.
At this point I believe the xnba image is supposed to load and execute the autodiscovery process, which includes the getdestiny script, but all I am seeing are the messages:

cat: can't open '/tmp/dhcpserver': No such file or directory
grep: /tmp/destiny: No such file or directory
grep: /tmp/destiny: No such file or directory

One reply by Jarrod Johnson to that email thread above mentioned a possible network driver issue and suggested using Genesis from xCAT v2.7. According to 'xcatconfig -v' I am running Version 2.3.1 of xCAT and I am running on Centos 5 (2.6.18-128.el5). Is there a way to get this working with my current version of xCAT and OS (inject drivers somehow?), or do I need to update to v2.7 of xCAT (does that require Centos 6)? In any case, is a network driver the most likely explanation for what I am seeing? I have tried everything that I can think of from the switch side to make sure SNMP is enabled and the port definitions are correct in the switch table for autodiscovery.

Any ideas?

Thanks,
Josh Nielsen
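As a compact restatement of the image-type breakdown quoted in this thread from the etherboot wiki, the following Python sketch maps a gPXE image extension to what stays loaded. The table only encodes the three cases listed in the message; the function name and output wording are mine and do not correspond to any gPXE or xCAT tool.

```python
from pathlib import PurePosixPath

# extension -> (PXE stack kept, UNDI driver kept), per the list quoted above.
IMAGE_TYPES = {
    ".pxe": (False, False),
    ".kpxe": (False, True),
    ".kkpxe": (True, True),
}


def describe(filename):
    """Describe which firmware layers a gPXE image keeps loaded,
    judging only by its file extension."""
    suffix = PurePosixPath(filename).suffix
    if suffix not in IMAGE_TYPES:
        raise ValueError(f"not a known gPXE PXE image type: {filename}")
    keeps_pxe, keeps_undi = IMAGE_TYPES[suffix]
    pxe = "kept" if keeps_pxe else "unloaded"
    undi = "kept" if keeps_undi else "unloaded"
    return f"{filename}: PXE {pxe}, UNDI {undi}"


if __name__ == "__main__":
    print(describe("/tftpboot/xcat/xnba.kpxe"))
```

For xnba.kpxe this reports that UNDI is kept, which matches Jarrod's point in the thread: the network boot agent itself could talk to the new NIC, and it was the Linux image loaded afterwards that lacked the driver.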