from:"Josh Nielsen"

Re: [xcat-user] Is there an ifcfg-eth postscript that works on systemd OSes?

2016-08-05 Thread Josh Nielsen

I see. Thank you.

-Josh

On Thu, Aug 4, 2016 at 7:15 PM, Christopher Samuel <sam...@unimelb.edu.au>
wrote:

> On 05/08/16 02:32, Josh Nielsen wrote:
>
> > It looks like that uses the nics table which is currently empty in my
> > config. The examples on that page show IPs being entered in
> > manually/individually though. How does that relate to the regular
> > expressions I've created for each of my hosts in the 'hosts' table that
> > automatically assign an IP address based on the hostname?
>
> You can use regular expressions in the nics table as well:
>
> #node,nicips,nichostnamesuffixes,nichostnameprefixes,nictypes,niccustomscripts,nicnetworks,nicaliases,comments,disable
> "compute","|\D+(\d+)|ib0!10.2.0.($1*1)|","ib0!-ib",,"ib0!Infiniband",,"ib0!bruce_infiniband",,,
>
> Gives (for example):
>
> [root@bruce-m ~]# lsdef bruce001 | fgrep 10. | fgrep -v kcmdline
> ip=10.13.0.1
> nicips.ib0=10.2.0.1
> otherinterfaces=-bmc:10.12.0.1
>
> [root@bruce-m ~]# lsdef bruce003 | fgrep 10. | fgrep -v kcmdline
> ip=10.13.0.3
> nicips.ib0=10.2.0.3
> otherinterfaces=-bmc:10.12.0.3
>
> All the best,
> Chris
> --
>  Christopher Samuel    Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/  http://twitter.com/vlsci
>
> 
> --
> ___
> xCAT-user mailing list
> xCAT-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/xcat-user
>
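As a rough illustration of what the nics-table expression quoted above does, the sketch below emulates |\D+(\d+)|ib0!10.2.0.($1*1)| in bash for a single node name. The function name nicips_for is hypothetical, and this is not how xCAT evaluates the table internally; it only mirrors the idea of capturing the digits in the hostname and doing arithmetic on them.

```shell
#!/bin/bash
# nicips_for (hypothetical helper): emulate the nics-table expression
#   |\D+(\d+)|ib0!10.2.0.($1*1)|
# for one node name. Assumes the name starts with non-digit characters.
nicips_for() {
    local node=$1
    # \D+(\d+): skip the leading non-digits, capture the first run of digits
    if [[ $node =~ ^[^0-9]+([0-9]+) ]]; then
        # ($1*1): the arithmetic drops leading zeros; 10# forces base 10 so
        # a capture such as "008" is not treated as an invalid octal number
        echo "ib0!10.2.0.$((10#${BASH_REMATCH[1]} * 1))"
    else
        return 1
    fi
}

nicips_for bruce001   # prints ib0!10.2.0.1
nicips_for bruce003   # prints ib0!10.2.0.3
```

The two calls line up with the nicips.ib0 values in the lsdef output above.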

Re: [xcat-user] Is there an ifcfg-eth postscript that works on systemd OSes?

2016-08-04 Thread Josh Nielsen

Thanks Wang,

It looks like that uses the nics table which is currently empty in my
config. The examples on that page show IPs being entered in
manually/individually though. How does that relate to the regular
expressions I've created for each of my hosts in the 'hosts' table that
automatically assign an IP address based on the hostname?

Josh


On Wed, Aug 3, 2016 at 7:06 PM, Xiao Peng Wang <w...@cn.ibm.com> wrote:

> xCAT recommends to use 'confignics' postscripts to achieve that. Refer to
> the doc:
> http://xcat-docs.readthedocs.io/en/stable/guides/admin-guides/manage_clusters/ppc64le/diskful/customize_image/cfg_network_adapter.html
>
>
> Thanks
> Best Regards
> --
> Wang Xiaopeng (王晓朋)
> IBM China System Technology Laboratory
> Tel: 86-10-82453455
> Email: w...@cn.ibm.com
> Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
> Haidian District Beijing P.R.China 100193
>
>
>
> - Original message -
> From: Josh Nielsen <jniel...@hudsonalpha.org>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Cc:
> Subject: [xcat-user] Is there an ifcfg-eth postscript that works on
> systemd OSes?
> Date: Thu, Aug 4, 2016 4:24 AM
>
> Hello,
>
> I am now testing the deployment of Centos 7 in my environment and I've
> noticed that the ifcfg-eth postscript is not geared to work with it. For
> starters Centos 7 doesn't install ifconfig by default, though I've solved
> that with my kickstart, but more to the point the postscript explicitly
> looks for "Ethernet" in the ifconfig line to grab the interface name which
> doesn't work on systemd OSes like Centos 7. And of course there's the
> change from "eth" interface names to "ens" and a variety of other names.
>
> I changed the line that looked like this in the postscript:
>
> interfaces=$(ifconfig -a | grep "Ethernet" | awk '{print $1}')
>
> To this:
>
> interfaces=$(ifconfig -a | egrep "Ethernet|ens" | awk '{print $1}')
>
> And while that does parse out the ens interface names now they come with a
> colon tacked on to the end of them in the ifconfig output like this:
> ens160:
> ens192:
>
> I can parse that out with a regex substitution to remove the colon, but
> before I hack the default script up too much has there been an alternative
> ifcfg-eth postscript released for systems like this? I'm using this script
> to change the /etc/sysconfig/network-scripts/ifcfg-* files from using DHCP
> to the static addresses defined through xCAT, which works fine on my Centos
> 6 OSes.
>
> This is my xCAT version:
> lsxcatd -v
> Version 2.11 (git commit 9ea36ca6163392bf9ab684830217f017193815be, built
> Mon Nov 30 05:43:11 EST 2015)
>
> Thanks,
> Josh Nielsen
>

Re: [xcat-user] Is there an ifcfg-eth postscript that works on systemd OSes?

2016-08-03 Thread Josh Nielsen

Thanks! Yes, I knew ifconfig was deprecated, hence why I knew this was a
hack and was asking. Is that postscript something you wrote yourself?

Thanks,
Josh

On Wed, Aug 3, 2016 at 3:49 PM, Rich Sudlow <r...@nd.edu> wrote:

> On 08/03/2016 03:54 PM, Josh Nielsen wrote:
>
>> Hello,
>>
>> I am now testing the deployment of Centos 7 in my environment and I've
>> noticed
>> that the ifcfg-eth postscript is not geared to work with it. For starters
>> Centos
>> 7 doesn't install ifconfig by default, though I've solved that with my
>> kickstart, but more to the point the postscript explicitly looks for
>> "Ethernet"
>> in the ifconfig line to grab the interface name which doesn't work on
>> systemd
>> OSes like Centos 7. And of course there's the change from "eth" interface
>> names
>> to "ens" and a variety of other names.
>>
>> I changed the line that looked like this in the postscript:
>>
>> interfaces=$(ifconfig -a | grep "Ethernet" | awk '{print $1}')
>>
>> To this:
>>
>> interfaces=$(ifconfig -a | egrep "Ethernet|ens" | awk '{print $1}')
>>
>> And while that does parse out the ens interface names now they come with
>> a colon
>> tacked on to the end of them in the ifconfig output like this:
>> ens160:
>> ens192:
>>
>> I can parse that out with a regex substitution to remove the colon, but
>> before I
>> hack the default script up too much has there been an alternative
>> ifcfg-eth
>> postscript released for systems like this? I'm using this script to
>> change the
>> /etc/sysconfig/network-scripts/ifcfg-* files from using DHCP to the static
>> addresses defined through xCAT, which works fine on my Centos 6 OSes.
>>
>> This is my xCAT version:
>> lsxcatd -v
>> Version 2.11 (git commit 9ea36ca6163392bf9ab684830217f017193815be, built
>> Mon Nov
>> 30 05:43:11 EST 2015)
>>
>> Thanks,
>> Josh Nielsen
>>
>
> I believe all the latest xCAT routines use "ip addr".
>
> As you might know, the use of ifconfig is deprecated.
>
> Here's a snippet from a postscript which uses ksh:
>
>
> if [[ $OSVER = *rhels7* ]]; then
>   # This just hardcodes the entries which are already set
>   # Change to grep only on "inet " rather than "inet addr" so that rhels7 works - RKS - 8/21/2014
>   for nic in `ifconfig -a|grep -B1 "inet "|awk '{print $1}'|grep -v inet|grep -v -- --|grep -v lo|sed s/:$//`; do
>     echo "Setting up hardeths on rhels7" >> /root/post.log
>     echo NIC $nic
>     echo NIC $nic >> /root/post.log
>     IPADDR=`ifconfig $nic |grep "inet "|awk '{print $2}' |awk -F: '{print $1}'`
>     echo "IPADDR: $IPADDR" >> /root/post.log
>     NETMASK=`ifconfig $nic |grep "inet "|awk '{print $4}' |awk -F: '{print $1}'`
>     echo "NETMASK: $NETMASK" >> /root/post.log
>     sed -i s/BOOTPROTO=dhcp/BOOTPROTO=none/ /etc/sysconfig/network-scripts/ifcfg-$nic
>     sed -i s/ONBOOT=no/ONBOOT=yes/ /etc/sysconfig/network-scripts/ifcfg-$nic
>     echo IPADDR=$IPADDR >> /etc/sysconfig/network-scripts/ifcfg-$nic
>     echo NETMASK=$NETMASK >> /etc/sysconfig/network-scripts/ifcfg-$nic
>     # Remove firewalld since we're using iptables - RKS 9/30/14
>     yum remove -y firewalld | logger -t xcat
>     echo "Done with hardeths on rhels7" >> /root/post.log
>   done
> fi
>
> Hope this helps you out.
>
> --
> Rich Sudlow
> University of Notre Dame
> Center for Research Computing - Union Station
> 506 W. South St
> South Bend, In 46601
>
> (574) 631-7258 (office)
> (574) 807-1046 (cell)
>
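Since the reply above notes that current xCAT routines rely on "ip addr" rather than ifconfig, here is a minimal sketch of collecting the same three values (interface, address, netmask) from "ip -o -4 addr show". The helper names parse_ip_addr and prefix_to_netmask are hypothetical and are not part of xCAT or of the postscript quoted above.

```shell
#!/bin/bash
# parse_ip_addr (hypothetical helper): read "ip -o -4 addr show" output on
# stdin and print "nic address prefix" for each non-loopback interface.
parse_ip_addr() {
    awk '$3 == "inet" && $2 != "lo" {
        split($4, a, "/")        # $4 looks like 10.13.0.1/24
        print $2, a[1], a[2]
    }'
}

# prefix_to_netmask (hypothetical helper): convert a CIDR prefix length
# (0-32) into a dotted-quad netmask, e.g. 24 -> 255.255.255.0.
prefix_to_netmask() {
    local prefix=$1 mask
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    echo "$(( (mask >> 24) & 255 )).$(( (mask >> 16) & 255 )).$(( (mask >> 8) & 255 )).$(( mask & 255 ))"
}

# Intended use on a node (not executed here):
#   ip -o -4 addr show | parse_ip_addr | while read -r nic addr prefix; do
#       echo "$nic $addr $(prefix_to_netmask "$prefix")"
#   done
```

The one-line-per-address layout of "ip -o" avoids the "grep -B1" step the ifconfig version needs to pair each address with its interface name.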

[xcat-user] Is there an ifcfg-eth postscript that works on systemd OSes?

2016-08-03 Thread Josh Nielsen

Hello,

I am now testing the deployment of Centos 7 in my environment and I've
noticed that the ifcfg-eth postscript is not geared to work with it. For
starters Centos 7 doesn't install ifconfig by default, though I've solved
that with my kickstart, but more to the point the postscript explicitly
looks for "Ethernet" in the ifconfig line to grab the interface name which
doesn't work on systemd OSes like Centos 7. And of course there's the
change from "eth" interface names to "ens" and a variety of other names.

I changed the line that looked like this in the postscript:

interfaces=$(ifconfig -a | grep "Ethernet" | awk '{print $1}')

To this:

interfaces=$(ifconfig -a | egrep "Ethernet|ens" | awk '{print $1}')

And while that does parse out the ens interface names now they come with a
colon tacked on to the end of them in the ifconfig output like this:
ens160:
ens192:

I can parse that out with a regex substitution to remove the colon, but
before I hack the default script up too much has there been an alternative
ifcfg-eth postscript released for systems like this? I'm using this script
to change the /etc/sysconfig/network-scripts/ifcfg-* files from using DHCP
to the static addresses defined through xCAT, which works fine on my Centos
6 OSes.

This is my xCAT version:
lsxcatd -v
Version 2.11 (git commit 9ea36ca6163392bf9ab684830217f017193815be, built
Mon Nov 30 05:43:11 EST 2015)

Thanks,
Josh Nielsen
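The colon problem described above can be handled without hard-coding "Ethernet" or "ens" at all. The following is only a sketch: list_ifaces is a hypothetical helper, not something shipped with xCAT, and it assumes that each interface block in the "ifconfig -a" output starts in the first column, which is the case for both the CentOS 6 and CentOS 7 output formats.

```shell
#!/bin/bash
# list_ifaces (hypothetical helper): read "ifconfig -a" output on stdin and
# print one interface name per line, without any trailing colon.
list_ifaces() {
    awk '/^[^[:space:]]/ {
        name = $1
        sub(/:$/, "", name)            # CentOS 7 prints "ens160:", CentOS 6 prints "eth0"
        if (name != "lo") print name   # skip the loopback device
    }'
}

# Intended use inside the postscript (not executed here):
#   interfaces=$(ifconfig -a | list_ifaces)
```

Because it keys on the start of each block rather than on the word "Ethernet", the same line should cope with ens, eno, enp and eth names alike. On systems without ifconfig, "ip -o link show" gives one interface per line and may be a simpler source to parse.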

Re: [xcat-user] How can I migrate to a new xCAT MN in a hierarchical environment?

2016-06-16 Thread Josh Nielsen

Well, I should have looked in the logs first. There were more detailed
messages in /var/log/messages on the MN:

Jun 16 14:10:14 xcat-master xcat[30550]: Error dispatching request to
xcat-serv1:3001, trying other service nodes: Connection failure: SSL
connect attempt failed because of handshake problems error:14094418:SSL
routines:SSL3_READ_BYTES:tlsv1 alert unknown ca at
/opt/xcat/lib/perl/xCAT/Client.pm line 265.
Jun 16 14:10:15 xcat-master xcat[30550]: Error dispatching request to
xcat-serv2:3001, trying other service nodes: Connection failure: SSL
connect attempt failed because of handshake problems error:14094418:SSL
routines:SSL3_READ_BYTES:tlsv1 alert unknown ca at
/opt/xcat/lib/perl/xCAT/Client.pm line 265.

Which SSL cert or key is involved in this connection? Although I copied
over the rsa keys in /root/.ssh from the old MN to the new one I did not do
the same for either /etc/xcat/cert/ or /etc/ssh/. Might a missing key or
cert from either of those directories be responsible for that error?

Thanks,
Josh
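An "unknown ca" alert generally means that one side presented a certificate signed by a CA the other side does not trust, so one way to test the hypothesis above is to compare the CA certificate on the new MN with the copy on each SN. The sketch below assumes openssl is installed. The helper names are hypothetical, and the file names in the usage comments (ca.pem, server-cert.pem under /etc/xcat/cert/) are assumptions that should be checked against what is actually in that directory.

```shell
#!/bin/bash
# cert_fingerprint (hypothetical helper): print the SHA-256 fingerprint of a
# PEM certificate file.
cert_fingerprint() {
    openssl x509 -noout -fingerprint -sha256 -in "$1" 2>/dev/null | cut -d= -f2
}

# same_cert (hypothetical helper): succeed if two PEM files hold the same
# certificate, judged by fingerprint.
same_cert() {
    local a b
    a=$(cert_fingerprint "$1")
    b=$(cert_fingerprint "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# signed_by (hypothetical helper): succeed if certificate $2 verifies against
# the CA file $1. The output is checked for ": OK" instead of relying on the
# exit status alone.
signed_by() {
    openssl verify -CAfile "$1" "$2" 2>/dev/null | grep -q ': OK$'
}

# Intended use (paths are assumptions, not executed here):
#   scp xcat-serv1:/etc/xcat/cert/ca.pem /tmp/sn1-ca.pem
#   same_cert /etc/xcat/cert/ca.pem /tmp/sn1-ca.pem || echo "CA differs on xcat-serv1"
#   signed_by /etc/xcat/cert/ca.pem /etc/xcat/cert/server-cert.pem || echo "server cert not signed by this CA"
```

If the fingerprints differ, the SNs are still trusting the old MN's CA, which would be consistent with the handshake failure in the log.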

On Thu, Jun 16, 2016 at 2:23 PM, Josh Nielsen <jniel...@hudsonalpha.org>
wrote:

> Xiao,
>
> Okay, so I followed those four steps with some modifications. I did 1 & 4
> as instructed with no issues. The service nodes are getting their database
> access from the new MN now, and I updated the SN object definitions to
> point xcatmaster, tftpserver, and other relevant parameters to the new MN.
>
> I avoided step #2 because I just copied the old /root/.ssh/id_rsa and
> corresponding .pub file to the new MN and passwordless logon works fine. I
> also tested this from the two service nodes to make sure they could fetch
> the host keys: "USEOPENSSLFORXCAT=yes XCATSERVER=:3001
> /xcatpost/getcredentials.awk ssh_rsa_hostkey". Is that sufficient for the
> key step?
>
> And lastly for #3 I only selectively updated certain packages on the SNs
> like syslog and NTP, because I didn't want to run all of the packages and
> in particular the servicenode postscript.
>
> So, I was able to use updatenode with no issues from the new MN to update
> the SNs, however when I try to update any cluster client nodes it is having
> problems dispatching to the service nodes in the hierarchy:
>
>
> # updatenode node0010 -P addsiteyum
> Error: Failed to dispatch command to any of the following service nodes:
> xcat-serv1,xcat-serv2
>
> What is most likely causing that issue?
>
> Thanks,
> Josh
>
> On Fri, Jun 3, 2016 at 7:01 AM, Xiao Peng Wang <w...@cn.ibm.com> wrote:
>
>> I think we should look at it the opposite way: how to make the MN use
>> the new SN.
>>
>> The following steps are necessary to switch an SN:
>> 1. Rerun 'mysqlsetup -f' to grant the SN permission to access the DB on
>> the MN.
>> 2. Run 'updatenode -k' to set up the ssh keys.
>> 3. Run 'updatenode -P' to update the SN.
>> 4. Change the 'servicenode' attribute of the compute nodes accordingly.
>>
>>
>> Thanks
>> Best Regards
>> --
>> Wang Xiaopeng (王晓朋)
>> IBM China System Technology Laboratory
>> Tel: 86-10-82453455
>> Email: w...@cn.ibm.com
>> Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
>> Haidian District Beijing P.R.China 100193
>>
>>
>>
>> - Original message -
>> From: Josh Nielsen <jniel...@hudsonalpha.org>
>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>> Cc:
>> Subject: Re: [xcat-user] How can I migrate to a new xCAT MN in a
>> hierarchical environment?
>> Date: Thu, Jun 2, 2016 3:49 AM
>>
>> Can anyone verify if simply updating cfgloc should be all I need to do for
>> the SNs to start using the new MN? By pointing it to the new MN's MySQL
>> instance, which has a site table with the new MN specified as the
>> xcatmaster, it should even update the xcatmaster value
>> shown in an 'lsdef' of the service nodes automatically, right?
>>
>> Thanks,
>> Josh
>>
>> On Tue, May 17, 2016 at 3:42 PM, Josh Nielsen <jniel...@hudsonalpha.org>
>> wrote:
>>
>> A correction below for something I wrote previously.
>>
>> "...and the SNs then shouldn't need newly generated keys (right?)..."
>>
>> On Tue, May 17, 2016 at 3:36 PM, Josh Nielsen <jniel...@hudsonalpha.org>
>> wrote:
>>
>> I looked at the 'servicenode' postscript and it does _way_ too much for
>> what I want to accomplish. I don't think the script was written with
>> changes or upgrades in mind. It looks like it freshly copies everything to
>> the SNs' $installdir/postscripts and /etc/xcat on the service node and

Re: [xcat-user] How can I migrate to a new xCAT MN in a hierarchical environment?

2016-06-16 Thread Josh Nielsen

Xiao,

Okay, so I followed those four steps with some modifications. I did 1 & 4
as instructed with no issues. The service nodes are getting their database
access from the new MN now, and I updated the SN object definitions to
point xcatmaster, tftpserver, and other relevant parameters to the new MN.

I avoided step #2 because I just copied the old /root/.ssh/id_rsa and
corresponding .pub file to the new MN and passwordless logon works fine. I
also tested this from the two service nodes to make sure they could fetch
the host keys: "USEOPENSSLFORXCAT=yes XCATSERVER=:3001
/xcatpost/getcredentials.awk ssh_rsa_hostkey". Is that sufficient for the
key step?

And lastly for #3 I only selectively updated certain packages on the SNs
like syslog and NTP, because I didn't want to run all of the packages and
in particular the servicenode postscript.

So, I was able to use updatenode with no issues from the new MN to update
the SNs, however when I try to update any cluster client nodes it is having
problems dispatching to the service nodes in the hierarchy:


# updatenode node0010 -P addsiteyum
Error: Failed to dispatch command to any of the following service nodes:
xcat-serv1,xcat-serv2

What is most likely causing that issue?

Thanks,
Josh
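For reference, the four steps discussed in this message can be laid out as a dry-run script that only prints the commands unless DRY_RUN=0 is set. This is a sketch under several assumptions: the group names "service" and "compute" and the host-file path are hypothetical, the "-u" flag and host-file argument to mysqlsetup follow one reading of its man page and should be checked against the installed version, and "syslog" stands in for whichever postscripts are actually wanted in step 3.

```shell
#!/bin/bash
# Dry-run sketch of the four "switch an SN" steps. Nothing is executed unless
# DRY_RUN=0 is set. All names below are placeholders.
SN_RANGE=${SN_RANGE:-service}               # hypothetical service-node group
CN_RANGE=${CN_RANGE:-compute}               # hypothetical compute-node group
SN_LIST=${SN_LIST:-xcat-serv1,xcat-serv2}   # service nodes named in the error above
HOSTFILE=${HOSTFILE:-/tmp/servicenodes}     # hypothetical file listing the SN hosts

# run: print the command in dry-run mode, otherwise execute it
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

switch_sn_plan() {
    run mysqlsetup -u -f "$HOSTFILE"               # 1. grant the SNs access to the DB on the MN
    run updatenode "$SN_RANGE" -k                  # 2. set up ssh keys (skipped in this thread)
    run updatenode "$SN_RANGE" -P syslog           # 3. rerun selected postscripts on the SNs
    run chdef "$CN_RANGE" servicenode="$SN_LIST"   # 4. point the compute nodes at the SNs
}

switch_sn_plan
```

Printing the plan first makes it easy to drop a step, such as the key redeployment in step 2, before anything touches the cluster.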

On Fri, Jun 3, 2016 at 7:01 AM, Xiao Peng Wang <w...@cn.ibm.com> wrote:

> I think we should look at it the opposite way: how to make the MN use the
> new SN.
>
> The following steps are necessary to switch an SN:
> 1. Rerun 'mysqlsetup -f' to grant the SN permission to access the DB on
> the MN.
> 2. Run 'updatenode -k' to set up the ssh keys.
> 3. Run 'updatenode -P' to update the SN.
> 4. Change the 'servicenode' attribute of the compute nodes accordingly.
>
>
> Thanks
> Best Regards
> --
> Wang Xiaopeng (王晓朋)
> IBM China System Technology Laboratory
> Tel: 86-10-82453455
> Email: w...@cn.ibm.com
> Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
> Haidian District Beijing P.R.China 100193
>
>
>
> - Original message -
> From: Josh Nielsen <jniel...@hudsonalpha.org>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Cc:
> Subject: Re: [xcat-user] How can I migrate to a new xCAT MN in a
> hierarchical environment?
> Date: Thu, Jun 2, 2016 3:49 AM
>
> Can anyone verify if simply updating cfgloc should be all I need to do for
> the SNs to start using the new MN? By pointing it to the new MN's MySQL
> instance, which has a site table with the new MN specified as the
> xcatmaster, it should even update the xcatmaster value
> shown in an 'lsdef' of the service nodes automatically, right?
>
> Thanks,
> Josh
>
> On Tue, May 17, 2016 at 3:42 PM, Josh Nielsen <jniel...@hudsonalpha.org>
> wrote:
>
> A correction below for something I wrote previously.
>
> "...and the SNs then shouldn't need newly generated keys (right?)..."
>
> On Tue, May 17, 2016 at 3:36 PM, Josh Nielsen <jniel...@hudsonalpha.org>
> wrote:
>
> I looked at the 'servicenode' postscript and it does _way_ too much for
> what I want to accomplish. I don't think the script was written with
> changes or upgrades in mind. It looks like it freshly copies everything to
> the SNs' $installdir/postscripts and /etc/xcat on the service node and
> generates (new?) keys. The SNs don't need those updates/changes in my case.
> From looking at the following comment in the 'servicenode' postscript and
> the code I'm wondering if all I need to do is manually
> modify /etc/xcat/cfgloc to update the IP for the new MN database location
> and if everything else will be fine. The keys should already be in place
> because I am copying the same keys from the old MN onto the new MN server,
> and the SNs then shouldn't need to keys (right?). Please let me know if you
> see any problems with this.
>
> The comment in the code:
>
>  For Linux:
>    It calls xcatserver and xcatclient script to get the ssh keys, ssl
>    credentials and cfgloc file and transfer from the MN to the SN
>    to be able to access the
>    database,  setup ssh keys on the nodes and have daemon to daemon
>    communication between the SN and MN and have the SN access the DB.
>
>
> P.S. Also would just giving the new MN the same IP and hostname (even as
> an alias to a different primary hostname) more or less prevent any changes
> from needing to be made on the SNs at all (no postscripts run nor manual
> modifications of files)?
>
> Thanks,
> Josh
>
> On Thu, May 5, 2016 at 11:42 AM, Josh Nielsen <jniel...@hudsonalpha.org>
> wrote:
>
> Hi Christian,
>
> Thanks for the response. So do I actually have to reinstall the SNs and/or
> rerun the service n

Re: [xcat-user] How can I migrate to a new xCAT MN in a hierarchical environment?

2016-05-17 Thread Josh Nielsen

I looked at the 'servicenode' postscript and it does _way_ too much for
what I want to accomplish. I don't think the script was written with
changes or upgrades in mind. It looks like it freshly copies everything to
the SNs' $installdir/postscripts and /etc/xcat on the service node and
generates (new?) keys. The SNs don't need those updates/changes in my case.
From looking at the following comment in the 'servicenode' postscript and
the code I'm wondering if all I need to do is manually
modify /etc/xcat/cfgloc to update the IP for the new MN database location
and if everything else will be fine. The keys should already be in place
because I am copying the same keys from the old MN onto the new MN server,
and the SNs then shouldn't need to keys (right?). Please let me know if you
see any problems with this.

The comment in the code:

 For Linux:
   It calls xcatserver and xcatclient script to get the ssh keys, ssl
   credentials and cfgloc file and transfer from the MN to the SN
   to be able to access the
   database,  setup ssh keys on the nodes and have daemon to daemon
   communication between the SN and MN and have the SN access the DB.


P.S. Also would just giving the new MN the same IP and hostname (even as an
alias to a different primary hostname) more or less prevent any changes
from needing to be made on the SNs at all (no postscripts run nor manual
modifications of files)?

Thanks,
Josh

On Thu, May 5, 2016 at 11:42 AM, Josh Nielsen <jniel...@hudsonalpha.org>
wrote:

> Hi Christian,
>
> Thanks for the response. So do I actually have to reinstall the SNs and/or
> rerun the service node postscript? If rerunning the SN post script just
> makes some minor adjustments but doesn't clear the dhcpd.leases and the
> .conf files for named and dhcp, as I have them configured, then that would
> be fine, but if it blows all that away and starts over that would qualify
> as disruptive for my environment since the cluster depends on slave DNS
> services and dhcp on the SN. I would ideally like minimal changes on the
> SNs except to point them to the new MN.
>
> As far as the postscripts, my question was what common (if not default in
> most installs) postscripts that come with xCAT have code in them that would
> result in the hardcoding of the MN's IP in some configuration file. I
> actually thought of one possible example along those lines, and that is
> whatever configures the client compute nodes to send all their syslog
> messages to the /var/log/messages log on the headnode instead of locally
> will need to be rerun/updated. What will need to be run to change that to
> make the clients log to the new MN server?
>
> Regarding the server identity (even though it will have a new IP address
> and hostname) can we just copy the keys in /etc/ssh/ to the new MN so that
> the SSH fingerprint doesn't change?
>
> Lastly, as regards running updatenode -k I definitely (in this case) do
> not want to replace the root id_rsa private and public keys on the cluster,
> the MN, or the SNs since other critical services like GPFS require the
> current keys to remain in place. Why is rerunning the key deploy necessary
> and is there not a way to make it work with the current keys?
>
> I just need to be very careful with my current setup so that I don't knock
> out critical services while changing the MN, which is why I was wondering
> how disruptive doing this might be. I appreciate the help!
>
> Thanks,
> Josh
>
> On Tue, May 3, 2016 at 10:05 AM, Christian Caruthers <
> ccaruth...@lenovo.com> wrote:
>
>> I would begin by looking at the servicenode postscript. It sets up the
>> daemon and database communications between SN & MN. Beyond that, the
>> default postscripts are listed in the "xcatdefaults" entry of the
>> postscripts table. You will probably want to run updatenode -k once you
>> have xCAT configured on the new MN. After that, you probably want to rerun
>> the remoteshell and syslog postscripts on the cluster members (updatenode
>> -P) at the very least.
>>
>>
>>
>> Second, you can dump the xCAT DB using dumpxCATdb command. After that,
>> grep out the management node (hostname and/or IP) to see where changes need
>> to be made for the DB on the new MN.
>>
>>
>>
>> If the SNs are handling DHCP, it only needs to be enabled on the MN if
>> you plan on reinstalling an SN.
>>
>>
>>
>> Anything that resolves DNS through the MN will need an updated
>> resolv.conf.
>>
>>
>>
>> Depending on how you're maintaining your /install directory on the SNs,
>> that mechanism will need to be updated.
>>
>>
>>
>> If your MN is routing for any nodes, that will need to be addressed. You
>>

Re: [xcat-user] How can I migrate to a new xCAT MN in a hierarchical environment?

2016-05-05 Thread Josh Nielsen

Hi Christian,

Thanks for the response. So do I actually have to reinstall the SNs and/or
rerun the service node postscript? If rerunning the SN post script just
makes some minor adjustments but doesn't clear the dhcpd.leases and the
.conf files for named and dhcp, as I have them configured, then that would
be fine, but if it blows all that away and starts over that would qualify
as disruptive for my environment since the cluster depends on slave DNS
services and dhcp on the SN. I would ideally like minimal changes on the
SNs except to point them to the new MN.

As far as the postscripts, my question was what common (if not default in
most installs) postscripts that come with xCAT have code in them that would
result in the hardcoding of the MN's IP in some configuration file. I
actually thought of one possible example along those lines, and that is
whatever configures the client compute nodes to send all their syslog
messages to the /var/log/messages log on the headnode instead of locally
will need to be rerun/updated. What will need to be run to change that to
make the clients log to the new MN server?

Regarding the server identity (even though it will have a new IP address
and hostname) can we just copy the keys in /etc/ssh/ to the new MN so that
the SSH fingerprint doesn't change?

Lastly, as regards running updatenode -k I definitely (in this case) do not
want to replace the root id_rsa private and public keys on the cluster, the
MN, or the SNs since other critical services like GPFS require the current
keys to remain in place. Why is rerunning the key deploy necessary and is
there not a way to make it work with the current keys?

I just need to be very careful with my current setup so that I don't knock
out critical services while changing the MN, which is why I was wondering
how disruptive doing this might be. I appreciate the help!

Thanks,
Josh

On Tue, May 3, 2016 at 10:05 AM, Christian Caruthers <ccaruth...@lenovo.com>
wrote:

> I would begin by looking at the servicenode postscript. It sets up the
> daemon and database communications between SN & MN. Beyond that, the
> default postscripts are listed in the "xcatdefaults" entry of the
> postscripts table. You will probably want to run updatenode -k once you
> have xCAT configured on the new MN. After that, you probably want to rerun
> the remoteshell and syslog postscripts on the cluster members (updatenode
> -P) at the very least.
>
>
>
> Second, you can dump the xCAT DB using dumpxCATdb command. After that,
> grep out the management node (hostname and/or IP) to see where changes need
> to be made for the DB on the new MN.
>
>
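The dump-and-grep suggestion in the quoted paragraph above could be scripted roughly as follows. This is a sketch: find_mn_refs is a hypothetical helper, and it assumes the dump directory contains one .csv file per table, as produced by "dumpxCATdb -p <dir>". The hostname and IP in the usage comment are placeholders.

```shell
#!/bin/bash
# find_mn_refs (hypothetical helper): list every line in a directory of dumped
# xCAT tables (*.csv) that mentions any of the given hostnames or IPs.
# usage: find_mn_refs DUMP_DIR NAME_OR_IP [NAME_OR_IP...]
find_mn_refs() {
    [ $# -ge 2 ] || return 2
    local dir=$1; shift
    local args=() pat
    for pat in "$@"; do
        args+=(-e "$pat")
    done
    # -F: fixed strings, so dots in IPs are literal
    # -n: show line numbers; -H: always show the file (table) name
    grep -n -H -F "${args[@]}" "$dir"/*.csv
}

# Intended use on the old MN (not executed here; names are placeholders):
#   dumpxCATdb -p /tmp/xcatdump
#   find_mn_refs /tmp/xcatdump old-mn-hostname 10.1.0.1
```

Fixed-string matching is a substring match, so 10.1.0.1 also matches 10.1.0.10; the hits are a list to review by hand, not a list to edit blindly.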
>
> If the SNs are handling DHCP, it only needs to be enabled on the MN if you
> plan on reinstalling an SN.
>
>
>
> Anything that resolves DNS through the MN will need an updated resolv.conf.
>
>
>
> Depending on how you're maintaining your /install directory on the SNs,
> that mechanism will need to be updated.
>
>
>
> If your MN is routing for any nodes, that will need to be addressed. You
> might want to check the network configuration on the IMMs. On discovery, if
> you have a gateway defined on your management network (I believe it
> defaults to ), they might be pointing to the old MN. Shouldn't
> be an issue, but it's something to think about. If you're not routing on
> that network, I would use pasu to set the IMM gateway to 0.0.0.0 and be
> done with it.
>
>
>
> The only other concern I can think of would be the installation repos
> configured on the cluster nodes and SNs. If any point to the MN, they will
> need to be changed.
>
>
>
> Aside from all of that, it really depends on the particulars of your cluster.
>
>
>
> Regards,
> *Christian Caruthers*
> Lenovo xESS IT Consultant
>
> Mobile: 757-289-9872
>
>
>
>
>
> *From:* Josh Nielsen [mailto:jniel...@hudsonalpha.org]
> *Sent:* Monday, May 02, 2016 8:32 PM
> *To:* xCAT Users Mailing list
> *Subject:* [xcat-user] How can I migrate to a new xCAT MN in a
> hierarchical environment?
>
>
>
> Hello all,
>
> My team is trying to move the xCAT MN role off of an old server and get it
> over onto new virtual infrastructure, but I am a little unsure about
> whether it is possible to do while leaving everything else in its place as
> we currently have it in our environment. We have an MN with two SNs for our
> xCAT environment, and I would need to make the SNs recognize that the new
> MN (with a new IP and hostname) is now their xcatmaster, and they would
> need to take hierarchical command updates from the new MN, look to the new
> MN for the xCAT database (which is a MySQL database in our environment),
> etc.
>
> So a few questions along those lines.
>
> 1. Which/how many xCAT database fields would I need to upd

Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password

2016-03-09 Thread Josh Nielsen

Casandra,

Good news. Setting sshbetweennodes explicitly to ALLGROUPS automatically
provided both the id_rsa and id_rsa.pub keys in /root/.ssh/ when
'remoteshell' ran this time (in xCAT 2.8.3)! So that's all it was.

Except the strange thing is that we never had to set that before. Perhaps
whatever version we had before 2.8.3 didn't need it, but 2.8.3 does now.
But we were on 2.8.3 already with the last deploy, so... I have no idea how
this worked a few months ago without setting sshbetweennodes explicitly (or
why the default settings didn't have the same result), but I'm glad after
banging my head on this for about two weeks to make some headway finally.
Thanks for the suggestion!

Best regards,
Josh
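To check whether a site attribute such as sshbetweennodes is set explicitly before relying on the default, the "tabdump site" output can be filtered as sketched below. The helper name site_value is hypothetical, and the parsing assumes simple values with no embedded commas, which holds for the entries shown in this thread.

```shell
#!/bin/bash
# site_value (hypothetical helper): read "tabdump site" output on stdin and
# print the value stored for the given key, or nothing if the key is absent.
site_value() {
    awk -F, -v key="\"$1\"" '$1 == key {
        v = $2
        gsub(/"/, "", v)   # strip the surrounding double quotes
        print v
    }'
}

# Intended use on the MN (not executed here):
#   tabdump site | site_value sshbetweennodes
#   chdef -t site clustersite sshbetweennodes=ALLGROUPS
```

An empty result means the key is not in the table at all, which was the situation before the fix described above.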

On Tue, Mar 8, 2016 at 3:33 PM, Casandra H Qiu <cxh...@us.ibm.com> wrote:

> Hmm, I don't have a system with xCAT 2.8.3, but I think the sshbetweennodes
> attribute has been available for a while. If it is not defined in the site
> table, the default should set up passwordless ssh between nodes.
> "nslookup nodename" still works after you update the hostname, right?
> Are you able to find any error messages in the logs, perhaps in
> /var/log/messages?
>
> Thanks,
> Casandra
> ...
> Casandra Hong Qiu
> Phone: (845) 433-9291, t/l 293-9291
> Office: B/002, Floor 3, Z13
> cxh...@us.ibm.com
>
>
>
>
> From: Josh Nielsen <jniel...@hudsonalpha.org>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Date: 03/08/2016 03:58 PM
> Subject: Re: [xcat-user] Updatenode -k won't create id_rsa key without
> prompting for password
> --
>
>
>
>
> Thanks for the response Casandra. I should firstly note that I have xCAT
> 2.8.3. I know I need to upgrade, but not only has this worked in the past
> but I also successfully deployed 50+ nodes back in November with the exact
> same xCAT version I have now and using the same osimage for Centos 6.5,
> same kickstart, same defined postscripts, etc. So something else has
> changed, perhaps in our environment?
>
> That being said, I did not see sshbetweennodes specified at all in the
> site table. The following are the only two references to ssh in the table:
>
> #tabdump site | grep -i ssh
> "maxssh","8",,
> "rsh","/usr/bin/ssh",,
>
> Is 'sshbetweennodes' only a feature of versions newer than 2.8.X, or has
> it been around a while?
>
> Lastly, you said that remoteshell copies over id_rsa.pub (regardless - in
> either scenario), and I have seen that before as well, but actually I am
> not seeing any id_rsa* keys (public or private) copied to the node at all.
> And even an updatenode -k is only producing the id_rsa (if I manually type
> the password) but not the .pub, which is also odd. But authorized_keys is
> populated with the rsa public key signature. Something else must be going
> on.
>
> P.S. The only other issue I'm still dealing with, which may be irrelevant to
> this issue, is a hostname problem where, if I run the ifcfg-eth
> postscript it updates the hostname in /etc/sysconfig/network from the
> correct node name to "localhost". My forward and reverse lookup entries in
> DNS are present, and the hostname is set correctly by the kickstart before
> ifcfg-eth is run, and remains there if it is not run, which I presume it
> gets from either the node definition in dhcpd.leases which is created with
> 'makedhcp' and/or the DNS entries for the host's IP. On the off chance that
> key copying could be tied to name resolution inconsistencies I thought I
> might mention that as well.
>
> Thanks,
> Josh
>
> On Tue, Mar 8, 2016 at 1:20 PM, Casandra H Qiu <cxh...@us.ibm.com> wrote:
>
>Can you check whether sshbetweennodes is set in the site table? The default
>for sshbetweennodes is ALLGROUPS, which enables passwordless SSH between
>nodes. This attribute is ignored if the zone table is set up, so please
>check the zone table as well.
>
>If it is enabled, the remoteshell postscript copies id_rsa and
>id_rsa.pub over to the compute node; otherwise, it copies only id_rsa.pub.
>
>From the source code, updatenode -k always requires a password.
>
>
>Thanks,
>Casandra
>.......
>Casandra Hong Qiu
>Phone: (845) 433-9291, t/l 293-9291
>

Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password

2016-03-09 Thread Josh Nielsen

Hi Christopher,

I checked our 'nameservers' field in the site table and it is pointing to
our external DNS server, which I think was a required configuration for
makedns -e to work. Updates for new entries to the external DNS server,
which is the master of the slave DNS servers on the SNs, propagate down to
the SNs immediately. But I would be hesitant to add the SNs to the list of
IPs in that field lest it break something in the external DNS
configuration.

The oddest thing however is that "none of the above" (the SNs slave DNS or
external DNS server) are making it into resolv.conf, but rather the MN's IP
(which used to - once upon a time - host the master DNS server before we
changed to an external DNS configuration). The only field in the site table
pointing to the MN is the 'master' field, as it should be.

And here is a sample of one particular node in question. None of it points
to the MN's IP, and 'xcatmaster' is set relative to the SNs for the group
the node is in:

nfsserver=[SN1_IP]
os=centos6.5
postscripts=syslog,remoteshell,ospkgs,addsiteyum,syncfiles,fstab,setupntp,otherpkgs
primarynic=eth0
profile=compute
provmethod=centos6.5-x86_64-install-compute
serialflow=hard
serialport=0
serialspeed=115200
servicenode=[SN1_DNS_NAME],[SN2_DNS_NAME]
status=booting
tftpserver=[SN1_DNS_NAME]
xcatmaster=[SN1_IP]

-Josh

On Tue, Mar 8, 2016 at 4:54 PM, Christopher Samuel <sam...@unimelb.edu.au>
wrote:

> On 09/03/16 09:37, Josh Nielsen wrote:
>
> > Actually, I take it back, some of the nodes have the MN's IP in their
> > resolv.conf (which has no DNS server) after a kickstart install, which
> > is possibly why the localhost assignment happens since a lookup would
> > fail. I think only if I run a postscript that I specifically created to
> > edit the /etc/resolv.conf and point it to the SNs which run DNS servers
> > does it allow the node to resolve its name. I wish the kickstart would
> > populate the resolv.conf file with its service node IPs by default (and
> > I thought it used to??).
>
> xCAT should populate your /etc/resolv.conf with the "nameservers"
> attribute from your site table.
>
> Ours certainly do - and we've got a variety of xCAT versions deployed
> for our various clusters (2.8.1, 2.8.5 (x3) & 2.10) on the "If it ain't
> broke, don't fix it" principle.
>
> All the best,
> Chris
> --
>  Christopher Samuel    Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computation Initiative
>  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
>  http://www.vlsci.org.au/  http://twitter.com/vlsci
>
>
>

Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password

2016-03-09 Thread Josh Nielsen

I should also mention that for kicks last week we tried deploying one VM
that had been defined in the hpvmgroupA group 4 months ago with the rest of
them, but had not been deployed and had been delayed until now, and it too
is experiencing the same problems with the keys as well as our (unrelated?)
hostname issues, so I'm going for something that changed in our
environment, be it networking related, DNS, perhaps permissions, or any
host of other variables that can change on systems involved with the xCAT
deploy process. I just wish there were a meaningful error or way to
determine the root issue more easily.

On Wed, Mar 9, 2016 at 10:01 AM, Josh Nielsen <jniel...@hudsonalpha.org>
wrote:

> Yes. Sorry that I did not mention it before, but they are stateful nodes.
> I have never used stateless nodes and have nothing configured for
> stateless. I'm just puzzled about what could have changed in the four months
> since I last deployed 50+ VMs with the same osimage template and postscripts they
> used at that time. I'm deploying Centos 6.5 to a series of VMs on ESXi
> (standalone - not vSphere managed) on an HP CX7000 Blade chassis. I'm PXE
> booting as the method of delivering the kickstart, and I'm only having
> problems post-kickstart (or so it seems...). And the deploy is completely
> abstracted to the VM guest operating system level, so nothing specific to
> HP.
>
> I'm obscuring the IPs in the following excerpt from my xCAT 'hosts' table,
> but the only real change in xCAT was that I defined a third ('C') group of
> VMs that run on the HP cluster following the pattern I had done for the
> previous 50+ VMs which were divided between the A and B groups:
>
> "hpvmgroupA","|\D+(\d+).*$|X.Y.101.($1-0)|",,,"HP Compute nodes compute
> interface",
> "hpvmgroupB","|\D+(\d+).*$|X.Y.102.($1-0)|",,,"HP Compute nodes compute
> interface",
> "hpvmgroupC","|\D+(\d+).*$|X.Y.103.($1-0)|",,,"HP Compute nodes compute
> interface",
>
> Otherwise, if you do an lsdef and look at a node that was deployed
> successfully which was part of hpvmgroupA and one that I am currently
> trying to deploy in hpvmgroupC they are identical except for the details
> that should differ like IP address and other things; and the osimage for
> the Centos 6.5 image (which points to the *.tmpl kickstart files and the
> otherpkgs *.tmpl files) is the same. So perhaps something else in my
> environment changed?
>
> My coworker and I have tried delving into the code and placing echoes as
> debugging statements, and have looked at the -V verbose output of various
> commands, but can't seem to find a meaningful error as to why it is not
> fetching either the id_rsa or id_rsa.pub files. Perhaps this was taken care
> of in a certain postscript that I have taken for granted before and
> forgotten to run this time? I have no idea.
>
> -Josh
>
> On Tue, Mar 8, 2016 at 10:34 PM, Daniel Letai <d...@letai.org.il> wrote:
>
>> Can you confirm you are deploying stateful nodes and not stateless ?
>>
>>
>> On 03/09/2016 12:53 AM, Josh Nielsen wrote:
>>
>> My coworker just pointed out that the /xcatpost/mypostscript on the nodes
>> that are deployed actually have this line:
>>
>> ENABLESSHBETWEENNODES='NO'
>> export ENABLESSHBETWEENNODES
>>
>> That's interesting, given that sshbetweennodes (without 'enable' as the
>> beginning of the parameter name?) was not defined at all in the site table
>> and the default is supposedly enabled(?). However, I just set
>> sshbetweennodes in site to "sshbetweennodes","ALLGROUPS",, and am now
>> redeploying to see if it makes a difference.
>>
>> The man page for site says:
>>
>>  sshbetweennodes:  Comma separated list of groups to enable passwordless root
>>                    ssh during install, or xdsh -K. Default is ALLGROUPS.
>>                    Set to NOGROUPS,if you do not wish to enabled any groups.
>>                    Service Nodes are not affected by this attribute
>>                    they are always setup with
>>                    passwordless root access to nodes and other SN.
>>
>> -Josh
>>
>> On Tue, Mar 8, 2016 at 4:26 PM, Josh Nielsen <jniel...@hudsonalpha.org>
>> wrote:
>>
>>> Here is what I see in /var/log/messages when remoteshell is run:
>>>
>>> Mar  7 14:28:41 xcat-serv1 node0087c xcat: remoteshell:  setup
>>> /etc/ssh/sshd_config and ssh_config
>>> Mar  7 14:28:41 xcat-serv1 node0087c xcat: Install: s

Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password

2016-03-09 Thread Josh Nielsen

Yes. Sorry that I did not mention it before, but they are stateful nodes. I
have never used stateless nodes and have nothing configured for
stateless. I'm just puzzled about what could have changed in the four months
since I last deployed 50+ VMs with the same osimage template and postscripts they
used at that time. I'm deploying Centos 6.5 to a series of VMs on ESXi
(standalone - not vSphere managed) on an HP CX7000 Blade chassis. I'm PXE
booting as the method of delivering the kickstart, and I'm only having
problems post-kickstart (or so it seems...). And the deploy is completely
abstracted to the VM guest operating system level, so nothing specific to
HP.

I'm obscuring the IPs in the following excerpt from my xCAT 'hosts' table,
but the only real change in xCAT was that I defined a third ('C') group of
VMs that run on the HP cluster following the pattern I had done for the
previous 50+ VMs which were divided between the A and B groups:

"hpvmgroupA","|\D+(\d+).*$|X.Y.101.($1-0)|",,,"HP Compute nodes compute
interface",
"hpvmgroupB","|\D+(\d+).*$|X.Y.102.($1-0)|",,,"HP Compute nodes compute
interface",
"hpvmgroupC","|\D+(\d+).*$|X.Y.103.($1-0)|",,,"HP Compute nodes compute
interface",

Otherwise, if you do an lsdef and look at a node that was deployed
successfully which was part of hpvmgroupA and one that I am currently
trying to deploy in hpvmgroupC they are identical except for the details
that should differ like IP address and other things; and the osimage for
the Centos 6.5 image (which points to the *.tmpl kickstart files and the
otherpkgs *.tmpl files) is the same. So perhaps something else in my
environment changed?
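
To make the 'hosts' table pattern above concrete, here is a rough shell sketch of the mapping it appears to describe. The helper name node_ip and the 10.1.103 prefix are made up for illustration (the real prefix is obscured as X.Y), and it assumes that ($1-0) simply turns the captured digits into a plain number without leading zeros. It is also anchored at the start of the name, which matches the xCAT pattern only for node names that begin with a non-digit.

```shell
# node_ip NODE PREFIX
# Hypothetical helper mirroring "|\D+(\d+).*$|PREFIX.($1-0)|": take the first
# run of digits that follows the leading non-digits and use it, without
# leading zeros, as the last octet of the address.
node_ip() (
    raw=$(printf '%s\n' "$1" | sed -n 's/^[^0-9][^0-9]*\([0-9][0-9]*\).*$/\1/p')
    [ -n "$raw" ] || return 1          # name does not fit the pattern
    num=$(printf '%s\n' "$raw" | sed 's/^0*//')
    printf '%s.%s\n' "$2" "${num:-0}"  # an all-zero capture becomes 0
)
```

Usage: node_ip node0087c 10.1.103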

My coworker and I have tried delving into the code and placing echoes as
debugging statements, and have looked at the -V verbose output of various
commands, but can't seem to find a meaningful error as to why it is not
fetching either the id_rsa or id_rsa.pub files. Perhaps this was taken care
of in a certain postscript that I have taken for granted before and
forgotten to run this time? I have no idea.

-Josh

On Tue, Mar 8, 2016 at 10:34 PM, Daniel Letai <d...@letai.org.il> wrote:

> Can you confirm you are deploying stateful nodes and not stateless ?
>
>
> On 03/09/2016 12:53 AM, Josh Nielsen wrote:
>
> My coworker just pointed out that the /xcatpost/mypostscript on the nodes
> that are deployed actually have this line:
>
> ENABLESSHBETWEENNODES='NO'
> export ENABLESSHBETWEENNODES
>
> That's interesting, given that sshbetweennodes (without 'enable' as the
> beginning of the parameter name?) was not defined at all in the site table
> and the default is supposedly enabled(?). However, I just set
> sshbetweennodes in site to "sshbetweennodes","ALLGROUPS",, and am now
> redeploying to see if it makes a difference.
>
> The man page for site says:
>
>  sshbetweennodes:  Comma separated list of groups to enable passwordless root
>                    ssh during install, or xdsh -K. Default is ALLGROUPS.
>                    Set to NOGROUPS,if you do not wish to enabled any groups.
>                    Service Nodes are not affected by this attribute
>                    they are always setup with
>                    passwordless root access to nodes and other SN.
>
> -Josh
>
> On Tue, Mar 8, 2016 at 4:26 PM, Josh Nielsen <jniel...@hudsonalpha.org>
> wrote:
>
>> Here is what I see in /var/log/messages when remoteshell is run:
>>
>> Mar  7 14:28:41 xcat-serv1 node0087c xcat: remoteshell:  setup
>> /etc/ssh/sshd_config and ssh_config
>> Mar  7 14:28:41 xcat-serv1 node0087c xcat: Install: setup root .ssh
>> Mar  7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16025]: xCAT: Allowing
>> getcredentials ssh_dsa_hostkey from node0087c
>> Mar  7 14:28:42 xcat-serv1 node0087c xCAT: remoteshell: getting
>> ssh_host_dsa_key
>> Mar  7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16027]: xCAT: Allowing
>> getcredentials ssh_rsa_hostkey from node0087c
>> Mar  7 14:28:42 xcat-serv1 node0087c xCAT: ssh_rsa_hostkey
>> Mar  7 14:28:42 xcat-serv1 node0087c xCAT: start up sshd
>>
>> I see new timestamps on authorized_keys and copy.sh when it is run, so it
>> is actually doing something.
>>
>> Is there a substantial difference between remoteshell and updatenode -k?
>> Why does updatenode -k successfully copy the id_rsa key to the node if I
>> type in the password?
>>
>> As for the hostname in /etc/sysconfig/network, actually if I run just the
>> kickstart and remove ifcfg-eth from the list of postscripts (in the node
>> definition's postscripts= field) to execute automatica

Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password

2016-03-08 Thread Josh Nielsen

My coworker just pointed out that the /xcatpost/mypostscript on the nodes
that are deployed actually have this line:

ENABLESSHBETWEENNODES='NO'
export ENABLESSHBETWEENNODES

That's interesting, given that sshbetweennodes (without 'enable' as the
beginning of the parameter name?) was not defined at all in the site table
and the default is supposedly enabled(?). However, I just set
sshbetweennodes in site to "sshbetweennodes","ALLGROUPS",, and am now
redeploying to see if it makes a difference.

The man page for site says:

 sshbetweennodes:  Comma separated list of groups to enable passwordless root
                   ssh during install, or xdsh -K. Default is ALLGROUPS.
                   Set to NOGROUPS,if you do not wish to enabled any groups.
                   Service Nodes are not affected by this attribute
                   they are always setup with
                   passwordless root access to nodes and other SN.
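
One way to see which value actually reached a node is to read it back from the generated script. This is only a sketch: enable_ssh_flag is a made-up helper name, and it assumes the variable is written as a plain NAME='VALUE' assignment like the line quoted above.

```shell
# enable_ssh_flag FILE
# Hypothetical helper: print the last ENABLESSHBETWEENNODES value assigned
# in FILE (for example /xcatpost/mypostscript), with surrounding quotes removed.
enable_ssh_flag() (
    grep '^ENABLESSHBETWEENNODES=' "$1" | tail -n 1 | cut -d= -f2 | tr -d "'\""
)
```

Usage: enable_ssh_flag /xcatpost/mypostscript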

-Josh

On Tue, Mar 8, 2016 at 4:26 PM, Josh Nielsen <jniel...@hudsonalpha.org>
wrote:

> Here is what I see in /var/log/messages when remoteshell is run:
>
> Mar  7 14:28:41 xcat-serv1 node0087c xcat: remoteshell:  setup
> /etc/ssh/sshd_config and ssh_config
> Mar  7 14:28:41 xcat-serv1 node0087c xcat: Install: setup root .ssh
> Mar  7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16025]: xCAT: Allowing
> getcredentials ssh_dsa_hostkey from node0087c
> Mar  7 14:28:42 xcat-serv1 node0087c xCAT: remoteshell: getting
> ssh_host_dsa_key
> Mar  7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16027]: xCAT: Allowing
> getcredentials ssh_rsa_hostkey from node0087c
> Mar  7 14:28:42 xcat-serv1 node0087c xCAT: ssh_rsa_hostkey
> Mar  7 14:28:42 xcat-serv1 node0087c xCAT: start up sshd
>
> I see new timestamps on authorized_keys and copy.sh when it is run, so it
> is actually doing something.
>
> Is there a substantial difference between remoteshell and updatenode -k?
> Why does updatenode -k successfully copy the id_rsa key to the node if I
> type in the password?
>
> As for the hostname in /etc/sysconfig/network, actually if I run just the
> kickstart and remove ifcfg-eth from the list of postscripts (in the node
> definition's postscripts= field) to execute automatically it ends up
> looking like this:
> # cat /etc/sysconfig/network
> NETWORKING=yes
> HOSTNAME=3(NXDOMAIN)
>
> But when I manually run updatenode node0087c -P ifcfg-eth it works
> correctly. (Also it changes the  /etc/sysconfig/network-scripts/ifcfg-eth0
> file's BOOTPROTO parameter from dhcp to static and sets the IPADDR, as it
> should).
> # cat /etc/sysconfig/network
> NETWORKING=yes
> HOSTNAME=node0087c.morgan.haib.org
>
> Only if I keep the ifcfg-eth postscript in the node definition (listed in
> postscripts= if you 'lsdef') to be automatically executed does
> that /etc/sysconfig/network file say 'localhost', which seems to indicate
> an order of execution problem to me (even though I made sure ifcfg-eth was
> listed last). And, to answer your question, an nslookup node0087c before
> and after both return the correct IP from either of the SNs' slave DNS
> servers.
>
> They are possibly two unrelated issues, but I'm close to just upgrading
> xCAT and seeing if I have any better luck if I can't figure out some
> obvious problem soon. For kicks I'll explicitly set sshbetweennodes in the
> site table and rerun the remoteshell postscript.
>
> Regards,
> Josh
>
> On Tue, Mar 8, 2016 at 3:33 PM, Casandra H Qiu <cxh...@us.ibm.com> wrote:
>
>> Hmm, I don't have a system with xCAT 2.8.3, but I think the sshbetweennodes
>> attribute has been available for a while. If it is not defined in the site
>> table, the default should be to set up passwordless SSH between nodes.
>> "nslookup nodename" still works after you update the hostname, right?
>> Are you able to find any error messages in the logs? Maybe in
>> /var/log/messages.
>>
>> Thanks,
>> Casandra
>> ...
>> Casandra Hong Qiu
>> Phone: (845) 433-9291, t/l 293-9291
>> Office: B/002, Floor 3, Z13
>> cxh...@us.ibm.com
>>
>>
>>
>>
>> From: Josh Nielsen <jniel...@hudsonalpha.org>
>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>> Date: 03/08/2016 03:58 PM
>> Subject: Re: [xcat-user] Updatenode -k won't create id_rs

Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password

2016-03-08 Thread Josh Nielsen

Actually, I take it back, some of the nodes have the MN's IP in their
resolv.conf (which has no DNS server) after a kickstart install, which is
possibly why the localhost assignment happens since a lookup would fail. I
think only if I run a postscript that I specifically created to edit the
/etc/resolv.conf and point it to the SNs which run DNS servers does it
allow the node to resolve its name. I wish the kickstart would populate the
resolv.conf file with its service node IPs by default (and I thought it
used to??).
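
A small sketch for checking what a node's resolv.conf really points at. Both helper names are hypothetical, and the addresses in the usage line are documentation placeholders rather than anything from this cluster.

```shell
# resolv_nameservers [FILE]
# Print the address from every "nameserver" line in FILE
# (defaults to /etc/resolv.conf).
resolv_nameservers() (
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
)

# uses_nameserver IP [FILE]
# Succeed only if IP is exactly one of the nameservers listed in FILE.
uses_nameserver() (
    resolv_nameservers "$2" | grep -qxF "$1"
)
```

Usage: uses_nameserver 192.0.2.10 && echo "node still points at 192.0.2.10"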

On Tue, Mar 8, 2016 at 4:26 PM, Josh Nielsen <jniel...@hudsonalpha.org>
wrote:

> Here is what I see in /var/log/messages when remoteshell is run:
>
> Mar  7 14:28:41 xcat-serv1 node0087c xcat: remoteshell:  setup
> /etc/ssh/sshd_config and ssh_config
> Mar  7 14:28:41 xcat-serv1 node0087c xcat: Install: setup root .ssh
> Mar  7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16025]: xCAT: Allowing
> getcredentials ssh_dsa_hostkey from node0087c
> Mar  7 14:28:42 xcat-serv1 node0087c xCAT: remoteshell: getting
> ssh_host_dsa_key
> Mar  7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16027]: xCAT: Allowing
> getcredentials ssh_rsa_hostkey from node0087c
> Mar  7 14:28:42 xcat-serv1 node0087c xCAT: ssh_rsa_hostkey
> Mar  7 14:28:42 xcat-serv1 node0087c xCAT: start up sshd
>
> I see new timestamps on authorized_keys and copy.sh when it is run, so it
> is actually doing something.
>
> Is there a substantial difference between remoteshell and updatenode -k?
> Why does updatenode -k successfully copy the id_rsa key to the node if I
> type in the password?
>
> As for the hostname in /etc/sysconfig/network, actually if I run just the
> kickstart and remove ifcfg-eth from the list of postscripts (in the node
> definition's postscripts= field) to execute automatically it ends up
> looking like this:
> # cat /etc/sysconfig/network
> NETWORKING=yes
> HOSTNAME=3(NXDOMAIN)
>
> But when I manually run updatenode node0087c -P ifcfg-eth it works
> correctly. (Also it changes the  /etc/sysconfig/network-scripts/ifcfg-eth0
> file's BOOTPROTO parameter from dhcp to static and sets the IPADDR, as it
> should).
> # cat /etc/sysconfig/network
> NETWORKING=yes
> HOSTNAME=node0087c.morgan.haib.org
>
> Only if I keep the ifcfg-eth postscript in the node definition (listed in
> postscripts= if you 'lsdef') to be automatically executed does
> that /etc/sysconfig/network file say 'localhost', which seems to indicate
> an order of execution problem to me (even though I made sure ifcfg-eth was
> listed last). And, to answer your question, an nslookup node0087c before
> and after both return the correct IP from either of the SNs' slave DNS
> servers.
>
> They are possibly two unrelated issues, but I'm close to just upgrading
> xCAT and seeing if I have any better luck if I can't figure out some
> obvious problem soon. For kicks I'll explicitly set sshbetweennodes in the
> site table and rerun the remoteshell postscript.
>
> Regards,
> Josh
>
> On Tue, Mar 8, 2016 at 3:33 PM, Casandra H Qiu <cxh...@us.ibm.com> wrote:
>
>> Hmm, I don't have a system with xCAT 2.8.3, but I think the sshbetweennodes
>> attribute has been available for a while. If it is not defined in the site
>> table, the default should be to set up passwordless SSH between nodes.
>> "nslookup nodename" still works after you update the hostname, right?
>> Are you able to find any error messages in the logs? Maybe in
>> /var/log/messages.
>>
>> Thanks,
>> Casandra
>> ...
>> Casandra Hong Qiu
>> Phone: (845) 433-9291, t/l 293-9291
>> Office: B/002, Floor 3, Z13
>> cxh...@us.ibm.com
>>
>>
>>
>>
>> From: Josh Nielsen <jniel...@hudsonalpha.org>
>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>> Date: 03/08/2016 03:58 PM
>> Subject: Re: [xcat-user] Updatenode -k won't create id_rsa key without
>> prompting for password
>> --
>>
>>
>>
>>
>> Thanks for the response Casandra. I should firstly note that I have xCAT
>> 2.8.3. I know I need to upgrade, but not only has this worked in the past
>> but I also successfully deployed 50+ nodes back in November with the exact
>> same xCAT version I have now and using the same osimage for Centos 6.5,
>> same kickstart, same defined postscripts, etc. So something else h

Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password

2016-03-08 Thread Josh Nielsen

Here is what I see in /var/log/messages when remoteshell is run:

Mar  7 14:28:41 xcat-serv1 node0087c xcat: remoteshell:  setup
/etc/ssh/sshd_config and ssh_config
Mar  7 14:28:41 xcat-serv1 node0087c xcat: Install: setup root .ssh
Mar  7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16025]: xCAT: Allowing
getcredentials ssh_dsa_hostkey from node0087c
Mar  7 14:28:42 xcat-serv1 node0087c xCAT: remoteshell: getting
ssh_host_dsa_key
Mar  7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16027]: xCAT: Allowing
getcredentials ssh_rsa_hostkey from node0087c
Mar  7 14:28:42 xcat-serv1 node0087c xCAT: ssh_rsa_hostkey
Mar  7 14:28:42 xcat-serv1 node0087c xCAT: start up sshd

I see new timestamps on authorized_keys and copy.sh when it is run, so it
is actually doing something.
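
To pull out of the log exactly which credentials were handed to a node, something like the following could be used. It is a sketch only: granted_credentials is a hypothetical helper, it assumes the "Allowing getcredentials NAME from NODE" wording shown above, and it expects each syslog entry on a single line (the entries above are wrapped by the mail client). In the excerpt above only the two host keys show up; whether the root id_rsa would be logged through the same mechanism is an assumption the log does not confirm.

```shell
# granted_credentials NODE [LOGFILE]
# Hypothetical helper: list the credential names that the log shows being
# allowed for NODE, one per line, in log order.
granted_credentials() (
    sed -n "s/.*Allowing getcredentials \([^ ]*\) from $1\$/\1/p" \
        "${2:-/var/log/messages}"
)
```

Usage: granted_credentials node0087c /var/log/messages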

Is there a substantial difference between remoteshell and updatenode -k?
Why does updatenode -k successfully copy the id_rsa key to the node if I
type in the password?

As for the hostname in /etc/sysconfig/network, actually if I run just the
kickstart and remove ifcfg-eth from the list of postscripts (in the node
definition's postscripts= field) to execute automatically it ends up
looking like this:
# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=3(NXDOMAIN)

But when I manually run updatenode node0087c -P ifcfg-eth it works
correctly. (Also it changes the  /etc/sysconfig/network-scripts/ifcfg-eth0
file's BOOTPROTO parameter from dhcp to static and sets the IPADDR, as it
should).
# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node0087c.morgan.haib.org

Only if I keep the ifcfg-eth postscript in the node definition (listed in
postscripts= if you 'lsdef') to be automatically executed does
that /etc/sysconfig/network file say 'localhost', which seems to indicate
an order of execution problem to me (even though I made sure ifcfg-eth was
listed last). And, to answer your question, an nslookup node0087c before
and after both return the correct IP from either of the SNs' slave DNS
servers.
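
For comparing runs, the HOSTNAME value can be read back and sanity-checked with a couple of small functions. Both names are hypothetical, and the "bad" patterns are just the two symptoms described above (localhost and the NXDOMAIN string), not an exhaustive list.

```shell
# sysconfig_hostname [FILE]
# Print the last HOSTNAME= value in FILE (defaults to /etc/sysconfig/network).
sysconfig_hostname() (
    sed -n 's/^HOSTNAME=//p' "${1:-/etc/sysconfig/network}" | tail -n 1
)

# hostname_looks_resolved NAME
# Fail for an empty name, for localhost, or for a value containing NXDOMAIN.
hostname_looks_resolved() (
    case "$1" in
        '' | localhost | localhost.* | *NXDOMAIN*) return 1 ;;
        *) return 0 ;;
    esac
)
```

Usage: hostname_looks_resolved "$(sysconfig_hostname)" || echo "hostname was not set from DNS"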

They are possibly two unrelated issues, but I'm close to just upgrading
xCAT and seeing if I have any better luck if I can't figure out some
obvious problem soon. For kicks I'll explicitly set sshbetweennodes in the
site table and rerun the remoteshell postscript.

Regards,
Josh

On Tue, Mar 8, 2016 at 3:33 PM, Casandra H Qiu <cxh...@us.ibm.com> wrote:

> Hmm, I don't have a system with xCAT 2.8.3, but I think the sshbetweennodes
> attribute has been available for a while. If it is not defined in the site
> table, the default should be to set up passwordless SSH between nodes.
> "nslookup nodename" still works after you update the hostname, right?
> Are you able to find any error messages in the logs? Maybe in
> /var/log/messages.
>
> Thanks,
> Casandra
> ...
> Casandra Hong Qiu
> Phone: (845) 433-9291, t/l 293-9291
> Office: B/002, Floor 3, Z13
> cxh...@us.ibm.com
>
>
>
>
> From: Josh Nielsen <jniel...@hudsonalpha.org>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Date: 03/08/2016 03:58 PM
> Subject: Re: [xcat-user] Updatenode -k won't create id_rsa key without
> prompting for password
> --
>
>
>
>
> Thanks for the response Casandra. I should firstly note that I have xCAT
> 2.8.3. I know I need to upgrade, but not only has this worked in the past
> but I also successfully deployed 50+ nodes back in November with the exact
> same xCAT version I have now and using the same osimage for Centos 6.5,
> same kickstart, same defined postscripts, etc. So something else has
> changed, perhaps in our environment?
>
> That being said, I did not see sshbetweennodes specified at all in the
> site table. The following are the only two references to ssh in the table:
>
> #tabdump site | grep -i ssh
> "maxssh","8",,
> "rsh","/usr/bin/ssh",,
>
> Is 'sshbetweennodes' only a feature of versions newer than 2.8.X, or has
> it been around a while?
>
> Lastly, you said that remoteshell copies over id_rsa.pub (regardless - in
> either scenario), and I have seen that before as well, but actually I am
> not seeing any id_rsa* keys (public or private) copied to the node at all.
> And even an updatenode -k is only producing the id_rsa (if I manually type
> the password) but not the .pub, which is also odd. But authorized_keys is
> populated with the rsa public key signature. Something else must be going
> on.
>
> P.S. The only other issue I'm still dealing with, which may irrelev

Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password

2016-03-08 Thread Josh Nielsen

Thanks for the response Casandra. I should firstly note that I have xCAT
2.8.3. I know I need to upgrade, but not only has this worked in the past
but I also successfully deployed 50+ nodes back in November with the exact
same xCAT version I have now and using the same osimage for Centos 6.5,
same kickstart, same defined postscripts, etc. So something else has
changed, perhaps in our environment?

That being said, I did not see sshbetweennodes specified at all in the site
table. The following are the only two references to ssh in the table:

#tabdump site | grep -i ssh
"maxssh","8",,
"rsh","/usr/bin/ssh",,

Is 'sshbetweennodes' only a feature of versions newer than 2.8.X, or has it
been around a while?
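
For reference, a small sketch of that site-table check. The helper name site_value is hypothetical (it is not an xCAT command); it only parses the CSV that tabdump prints, and the ALLGROUPS fallback is simply the default that the site man page documents, which may not match what a given xCAT version actually does.

```shell
# site_value KEY FALLBACK
# Hypothetical helper, not part of xCAT: reads `tabdump site` style CSV on
# stdin and prints the value stored for KEY, or FALLBACK if KEY is absent.
site_value() (
    key="$1"
    fallback="$2"
    value=$(awk -F'","' -v k="$key" '
        { sub(/^"/, "", $1) }                  # drop the opening quote of the key
        $1 == k { v = $2; sub(/".*$/, "", v)   # cut the closing quote and the rest
                  print v; exit }')
    printf '%s\n' "${value:-$fallback}"
)
```

Usage: tabdump site | site_value sshbetweennodes ALLGROUPS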

Lastly, you said that remoteshell copies over id_rsa.pub (regardless - in
either scenario), and I have seen that before as well, but actually I am
not seeing any id_rsa* keys (public or private) copied to the node at all.
And even an updatenode -k is only producing the id_rsa (if I manually type
the password) but not the .pub, which is also odd. But authorized_keys is
populated with the rsa public key signature. Something else must be going
on.

P.S. The only other issue I'm still dealing with, which may be irrelevant to
this issue, is a hostname problem where, if I run the ifcfg-eth
postscript it updates the hostname in /etc/sysconfig/network from the
correct node name to "localhost". My forward and reverse lookup entries in
DNS are present, and the hostname is set correctly by the kickstart before
ifcfg-eth is run, and remains there if it is not run, which I presume it
gets from either the node definition in dhcpd.leases which is created with
'makedhcp' and/or the DNS entries for the host's IP. On the off chance that
key copying could be tied to name resolution inconsistencies I thought I
might mention that as well.

Thanks,
Josh

On Tue, Mar 8, 2016 at 1:20 PM, Casandra H Qiu <cxh...@us.ibm.com> wrote:

> Can you check whether sshbetweennodes is set in the site table? The default
> for sshbetweennodes is ALLGROUPS, which enables passwordless SSH between
> nodes. This attribute is ignored if the zone table is set up, so please
> check the zone table as well.
>
> If it is enabled, the remoteshell postscript copies id_rsa and id_rsa.pub
> over to the compute node; otherwise, it copies only id_rsa.pub.
>
> From the source code, updatenode -k always requires a password.
>
>
> Thanks,
> Casandra
> ...
> Casandra Hong Qiu
> Phone: (845) 433-9291, t/l 293-9291
> Office: B/002, Floor 3, Z13
> cxh...@us.ibm.com
>
>
>
>
> From: Josh Nielsen <jniel...@hudsonalpha.org>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Date: 03/08/2016 12:51 PM
> Subject: Re: [xcat-user] Updatenode -k won't create id_rsa key without
> prompting for password
> --
>
>
>
> Yes, I just verified. It is present, but that alone is not sufficient for
> that node to be able to SSH to other nodes itself. It allows other nodes
> which have the correct private key to SSH to it, but not the other way
> around.
>
> For example, on one compute node I'm having trouble with /root/.ssh has
> these three files:
>
> -rw-------. 1 root root  408 Mar  7 14:28 authorized_keys
> -rw-------. 1 root root  411 Mar  7 14:28 copy.sh
> -rw-------  1 root root  402 Mar  3 16:20 known_hosts
>
> And authorized_keys has the correct ssh-rsa public key entry, but I cannot
> go from this node to any other node in my cluster via passwordless ssh. But
> as soon as I run updatenode -k, and type in the password that it prompts
> for to complete the command, the id_rsa key is added as the fourth file to
> the /root/.ssh directory, and then after that I can ssh to other nodes from
> it without supplying a password. That is the issue.
>
> In the past simply running the remoteshell postscript (or so I assumed)
> was sufficient for adding the id_rsa file, and it was all automated from a
> fresh deploy by specifying remoteshell as one of the default postscripts to
> run. But now it doesn't look like remoteshell is placing the id_rsa file on
> the node (unless some other script or command is responsible for that), but
> remoteshell looks like it creates everything else in /root/.ssh/ (and
> /etc/ssh/).
>
> Is remoteshell the correct postscript for that, or was the id_rsa key most
> likely being pushed to the nodes some other way (like by some code that

Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password

2016-03-08 Thread Josh Nielsen

Yes, I just verified. It is present, but that alone is not sufficient for
that node to be able to SSH to other nodes itself. It allows other nodes
which have the correct private key to SSH to it, but not the other way
around.

For example, on one compute node that I'm having trouble with, /root/.ssh has
these three files:

-rw-------. 1 root root  408 Mar  7 14:28 authorized_keys
-rw-------. 1 root root  411 Mar  7 14:28 copy.sh
-rw-------  1 root root  402 Mar  3 16:20 known_hosts

And authorized_keys has the correct ssh-rsa public key entry, but I cannot
go from this node to any other node in my cluster via passwordless ssh. But
as soon as I run updatenode -k, and type in the password that it prompts
for to complete the command, the id_rsa key is added as the fourth file to
the /root/.ssh directory, and then after that I can ssh to other nodes from
it without supplying a password. That is the issue.

In the past simply running the remoteshell postscript (or so I assumed) was
sufficient for adding the id_rsa file, and it was all automated from a
fresh deploy by specifying remoteshell as one of the default postscripts to
run. But now it doesn't look like remoteshell is placing the id_rsa file on
the node (unless some other script or command is responsible for that), but
remoteshell looks like it creates everything else in /root/.ssh/ (and
/etc/ssh/).

Is remoteshell the correct postscript for that, or was the id_rsa key most
likely being pushed to the nodes some other way (like by some code that
called updatenode -k upon initial deployment)? Either way, all I can say
for sure is that id_rsa used to appear in /root/.ssh on the compute node
automatically and now it does not.
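
A minimal sketch for checking this symptom on a node: it lists which of the
files discussed above are absent from an SSH directory. The function name
missing_ssh_files and the list of expected files are assumptions made for
illustration; neither comes from xCAT itself.

```shell
# Print the name of each expected file that is absent from the given
# directory. The expected list (authorized_keys, known_hosts, id_rsa) is an
# assumption based on the files named in this thread.
missing_ssh_files() {
    dir=$1
    for f in authorized_keys known_hosts id_rsa; do
        [ -e "$dir/$f" ] || echo "$f"
    done
}

# Example (run as root on the compute node):
# missing_ssh_files /root/.ssh
```

On a node showing the problem described here, this would be expected to
report id_rsa until updatenode -k has been run with the password.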

Regards,
Josh

On Tue, Mar 8, 2016 at 4:19 AM, Xiao Peng Wang <w...@cn.ibm.com> wrote:

> To enable login without a password, the RSA public key should be copied
> to /root/.ssh/authorized_keys on the compute node. Could you check whether
> the key has been added to /root/.ssh/authorized_keys?
>
>
>
> Thanks
> Best Regards
> --
> Wang Xiaopeng (王晓朋)
> IBM China System Technology Laboratory
> Tel: 86-10-82453455
> Email: w...@cn.ibm.com
> Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
> Haidian District Beijing P.R.China 100193
>
>
>
> - Original message -
> From: Josh Nielsen <jniel...@hudsonalpha.org>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Cc:
> Subject: Re: [xcat-user] Updatenode -k won't create id_rsa key without
> prompting for password
> Date: Tue, Mar 8, 2016 5:26 AM
>
> Also if remoteshell is invoked directly as a postscript ('updatenode
> node0086c -V -P remoteshell') it produces the same result, but does not
> prompt for a password (like invoking xdsh -K directly doesn't), and copies
> everything over except id_rsa. So actually the prompting for a password is
> specific to updatenode -k, not xdsh -K or the remoteshell postscript (which
> run that). So I'm not sure if that is relevant to the underlying problem or
> not, but if I do invoke updatenode -k and supply it the password it copies
> the id_rsa to the node.
>
> On Mon, Mar 7, 2016 at 2:12 PM, Josh Nielsen <jniel...@hudsonalpha.org>
> wrote:
>
> Hello,
>
> When we freshly deploy a node from the kickstart and run our postscripts
> we noticed that for some reason the /root/.ssh/id_rsa file which allows
> passwordless login from that node to other nodes is missing, though this
> was not the case just a few months ago. When I try to generate the key
> manually it prompts for a password, after which it will copy/create that
> file successfully (see below), but there are a few odd things connected to
> this.
>
> The error is:
> updatenode node0087c -k
> Enter the password for the userid: root on the node where the ssh keys
> will be updated:
>
> The first oddity is that even after supplying the password once for a
> particular node it will prompt for the password every time if I run it
> again, as well as the related problem that this never used to happen before
> and the key used to be created without issue or prompting for a password.
> The 'passwd' xCAT table has the password for root (if that is where it
> looks for this command).
>
> Secondly I have done several manual debugging steps (and poking around
> source code to see what is happening) and I have run the actual xdsh
> command that is called, shown from the -V verbose output (which it prints
> two of, the first apparently to prep the SNs and run the 'remoteshell'
> postscript on them, and the second to actually do the same to the node
> specified).
>
> xdsh sn1,sn2 --nodestatus -s -v -e /install/postscripts/xcatdsklspost 5 -m
> [MN_IP] 'remoteshell,servicenode'

Re: [xcat-user] Updatenode -k won't create id_rsa key without prompting for password

2016-03-07 Thread Josh Nielsen

Also, if remoteshell is invoked directly as a postscript ('updatenode
node0086c -V -P remoteshell') it produces the same result, but does not
prompt for a password (just as invoking xdsh -K directly does not), and copies
everything over except id_rsa. So the password prompt is actually
specific to updatenode -k, not xdsh -K or the remoteshell postscript (which
run that). I'm not sure whether that is relevant to the underlying problem,
but if I do invoke updatenode -k and supply the password, it copies
id_rsa to the node.

On Mon, Mar 7, 2016 at 2:12 PM, Josh Nielsen <jniel...@hudsonalpha.org>
wrote:

> Hello,
>
> When we freshly deploy a node from the kickstart and run our postscripts
> we noticed that for some reason the /root/.ssh/id_rsa file which allows
> passwordless login from that node to other nodes is missing, though this
> was not the case just a few months ago. When I try to generate the key
> manually it prompts for a password, after which it will copy/create that
> file successfully (see below), but there are a few odd things connected to
> this.
>
> The error is:
> updatenode node0087c -k
> Enter the password for the userid: root on the node where the ssh keys
> will be updated:
>
> The first oddity is that even after supplying the password once for a
> particular node it will prompt for the password every time if I run it
> again, as well as the related problem that this never used to happen before
> and the key used to be created without issue or prompting for a password.
> The 'passwd' xCAT table has the password for root (if that is where it
> looks for this command).
>
> Secondly I have done several manual debugging steps (and poking around
> source code to see what is happening) and I have run the actual xdsh
> command that is called, shown from the -V verbose output (which it prints
> two of, the first apparently to prep the SNs and run the 'remoteshell'
> postscript on them, and the second to actually do the same to the node
> specified).
>
> xdsh sn1,sn2 --nodestatus -s -v -e /install/postscripts/xcatdsklspost 5 -m
> [MN_IP] 'remoteshell,servicenode' --tftp /tftpboot --installdir /install
> --nfsv4 no -c -V
>
> xdsh node0086c --nodestatus -s -v -e /install/postscripts/xcatdsklspost 5
> -m [SN1_IP] 'remoteshell' --tftp /tftpboot --installdir /install --nfsv4 no
> -c -V
>
> This did not reveal anything useful, except that when invoked directly
> like this no password is prompted for and it runs, but still leaves out the
> id_rsa file. I also followed the suggestion by Wang Xiaopeng in this thread
> (http://tinyurl.com/jz2jzmb) to test the getcredentials call with:
>
> 1. Enable mini server
> /xcatpost/allowcred.awk &
>
> 2. Try to get rsa hostkey
> USEOPENSSLFORXCAT=yes XCATSERVER=:3001 /xcatpost/getcredentials.awk
> ssh_rsa_hostkey
>
> This returned ssh_rsa_hostkey successfully. When remoteshell is run
> (whether with updatenode -k or xdsh -K) it actually does copy over the key
> files into /etc/ssh/ and it copies known_hosts, copy.sh, and
> authorized_keys into /root/.ssh on the compute node but omits id_rsa. What
> could be going wrong here?
>
> Regards,
> Josh Nielsen
>
>
>

[xcat-user] Updatenode -k won't create id_rsa key without prompting for password

2016-03-07 Thread Josh Nielsen

Hello,

When we freshly deploy a node from the kickstart and run our postscripts we
noticed that for some reason the /root/.ssh/id_rsa file which allows
passwordless login from that node to other nodes is missing, though this
was not the case just a few months ago. When I try to generate the key
manually it prompts for a password, after which it will copy/create that
file successfully (see below), but there are a few odd things connected to
this.

The error is:
updatenode node0087c -k
Enter the password for the userid: root on the node where the ssh keys will
be updated:

The first oddity is that even after supplying the password once for a
particular node it will prompt for the password every time if I run it
again, as well as the related problem that this never used to happen before
and the key used to be created without issue or prompting for a password.
The 'passwd' xCAT table has the password for root (if that is where it
looks for this command).

Secondly, I have done several manual debugging steps (and poked around the
source code to see what is happening) and I have run the actual xdsh
commands that are called, as shown in the -V verbose output (it prints
two: the first apparently to prep the SNs and run the 'remoteshell'
postscript on them, and the second to do the same on the node
specified).

xdsh sn1,sn2 --nodestatus -s -v -e /install/postscripts/xcatdsklspost 5 -m
[MN_IP] 'remoteshell,servicenode' --tftp /tftpboot --installdir /install
--nfsv4 no -c -V

xdsh node0086c --nodestatus -s -v -e /install/postscripts/xcatdsklspost 5
-m [SN1_IP] 'remoteshell' --tftp /tftpboot --installdir /install --nfsv4 no
-c -V

This did not reveal anything useful, except that when invoked directly like
this no password is prompted for and it runs, but still leaves out the
id_rsa file. I also followed the suggestion by Wang Xiaopeng in this thread
(http://tinyurl.com/jz2jzmb) to test the getcredentials call with:

1. Enable mini server
/xcatpost/allowcred.awk &

2. Try to get rsa hostkey
USEOPENSSLFORXCAT=yes XCATSERVER=:3001 /xcatpost/getcredentials.awk
ssh_rsa_hostkey

This returned ssh_rsa_hostkey successfully. When remoteshell is run
(whether with updatenode -k or xdsh -K) it actually does copy over the key
files into /etc/ssh/ and it copies known_hosts, copy.sh, and
authorized_keys into /root/.ssh on the compute node but omits id_rsa. What
could be going wrong here?

Regards,
Josh Nielsen

Re: [xcat-user] Some trouble with postscripts

2016-03-02 Thread Josh Nielsen

Thank you. The otherpkgs postscript wasn't executing because it was somehow
removed from the default postscripts list, and for some reason I thought
addsiteyum was supposed to add the xCAT-otherpkgs0.repo file, so I didn't
look any further than that. I had forgotten that otherpkgs and ifcfg-eth
(hence the network config not being static) were not baked into the default
postscripts list. That explains the differences I was seeing.

-Josh

On Tue, Mar 1, 2016 at 7:26 PM, Xiao Peng Wang <w...@cn.ibm.com> wrote:

> Your xCAT build is too old. The current stable version is xCAT 2.11.
>
> The file named xCAT-otherpkgs0.repo should be created by running the
> otherpkgs script. If otherpkgs failed, you need to check whether the
> otherpkgdir/otherpkglist attributes for the osimage have been set
> correctly. You can also check whether environment variables like
> OTHERPKGDIR have been exported from CN:/xcatpost/mypostscript.
>
> The default configuration for the install NIC is 'dhcp'; you can run the
> postscript 'confignics -s' to set it to static. confignics is also
> the recommended postscript for configuring NICs.
>
> Thanks
> Best Regards
> --
> Wang Xiaopeng (王晓朋)
> IBM China System Technology Laboratory
> Tel: 86-10-82453455
> Email: w...@cn.ibm.com
> Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
> Haidian District Beijing P.R.China 100193
>
>
>
> - Original message -
> From: Josh Nielsen <jniel...@hudsonalpha.org>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Cc:
> Subject: [xcat-user] Some trouble with postscripts
> Date: Wed, Mar 2, 2016 8:16 AM
>
> I am using xCAT 2.8.3 and I successfully installed 50+ nodes just 4-5
> months ago using a certain custom defined osimage definition for Centos
> 6.5. Now I'm trying to deploy the same image to a new group of nodes that I
> defined in xCAT, but am encountering some difficulty with certain
> postscripts. The addsiteyum postscript is not adding the correct otherpkgs
> repo file into /etc/yum.repos.d/ (it creates xCAT-centos6.5-path0.repo but
> not xCAT-otherpkgs0.repo - whereas the previous 50+ nodes have both). And
> since that is not in place the otherpkgs postscript also fails because it
> can't reach that repo to install custom rpms.
>
> I also noticed that, contrary to the results with the 50+ hosts I deployed
> last time, the node is being assigned the correct IP but its
> /etc/sysconfig/network-scripts/ifcfg-eth0 file says BOOTPROTO=dhcp instead
> of 'static'. That in turn causes another post script I created
> ('ifcfg-eth'), which does a sed on that file to produce a similar one on a
> second ethernet interface, to fail.
>
> The lsdef for these new nodes matches the other successful hosts I did
> before, and the default postscripts are as follows:
> postscripts=syslog,remoteshell,ospkgs,addsiteyum,otherpkgs,syncfiles,ifcfg-eth,fstab,servicenode
>
>
> Many of the other postscripts like the syslog, ospkgs, and fstab seem to
> be executing fine though. My log file on the node, after showing the wget
> results of the postscripts down to the node, shows this:
>
> -
> FINISHED --2016-03-01 16:03:04--
> Downloaded: 374 files, 55M in 0.3s (202 MB/s)
> mv: cannot stat `/xcatpost/mypostscript.node0071c': No such file or
> directory
> mv: cannot stat `10.20.0.100/postscripts': No such file or directory
> Tue Mar  1 16:03:04 CST 2016 Running postscript: syslog
> Shutting down system logger: [FAILED]
> Starting system logger: [  OK  ]
> Tue Mar  1 16:03:05 CST 2016 Running postscript: remoteshell
>
> Stopping sshd: [FAILED]
> Generating SSH1 RSA host key: [  OK  ]
> Starting sshd: [  OK  ]
> Tue Mar  1 16:03:07 CST 2016 Running postscript: ospkgs
> Tue Mar  1 16:03:10 CST 2016 Running postscript: addsiteyum
> Tue Mar  1 16:03:10 CST 2016 Running postscript: syncfiles
>   Did not sync any files.
> Tue Mar  1 16:03:10 CST 2016 Running postscript: fstab
> Info: Configuring fstab
> mount.nfs: rpc.statd is not running but is required for remote locking.
> mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
> mount.nfs: an incorrect mount option was specified
> Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
> Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
> Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
> Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
> Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
> Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
> Error: try grep x3650-head01:/opt /etc/mtab unsuccessful
> Tue Mar  1

[xcat-user] Some trouble with postscripts

2016-03-01 Thread Josh Nielsen

I am using xCAT 2.8.3 and I successfully installed 50+ nodes just 4-5
months ago using a certain custom defined osimage definition for Centos
6.5. Now I'm trying to deploy the same image to a new group of nodes that I
defined in xCAT, but am encountering some difficulty with certain
postscripts. The addsiteyum postscript is not adding the correct otherpkgs
repo file into /etc/yum.repos.d/ (it creates xCAT-centos6.5-path0.repo but
not xCAT-otherpkgs0.repo - whereas the previous 50+ nodes have both). And
since that is not in place the otherpkgs postscript also fails because it
can't reach that repo to install custom rpms.

I also noticed that, contrary to the results with the 50+ hosts I deployed
last time, the node is being assigned the correct IP but its
/etc/sysconfig/network-scripts/ifcfg-eth0 file says BOOTPROTO=dhcp instead
of 'static'. That in turn causes another post script I created
('ifcfg-eth'), which does a sed on that file to produce a similar one on a
second ethernet interface, to fail.

The lsdef for these new nodes matches the other successful hosts I did
before, and the default postscripts are as follows:
postscripts=syslog,remoteshell,ospkgs,addsiteyum,otherpkgs,syncfiles,ifcfg-eth,fstab,servicenode

Many of the other postscripts like the syslog, ospkgs, and fstab seem to be
executing fine though. My log file on the node, after showing the wget
results of the postscripts down to the node, shows this:

-
FINISHED --2016-03-01 16:03:04--
Downloaded: 374 files, 55M in 0.3s (202 MB/s)
mv: cannot stat `/xcatpost/mypostscript.node0071c': No such file or
directory
mv: cannot stat `10.20.0.100/postscripts': No such file or directory
Tue Mar  1 16:03:04 CST 2016 Running postscript: syslog
Shutting down system logger: [FAILED]
Starting system logger: [  OK  ]
Tue Mar  1 16:03:05 CST 2016 Running postscript: remoteshell

Stopping sshd: [FAILED]
Generating SSH1 RSA host key: [  OK  ]
Starting sshd: [  OK  ]
Tue Mar  1 16:03:07 CST 2016 Running postscript: ospkgs
Tue Mar  1 16:03:10 CST 2016 Running postscript: addsiteyum
Tue Mar  1 16:03:10 CST 2016 Running postscript: syncfiles
  Did not sync any files.
Tue Mar  1 16:03:10 CST 2016 Running postscript: fstab
Info: Configuring fstab
mount.nfs: rpc.statd is not running but is required for remote locking.
mount.nfs: Either use '-o nolock' to keep locks local, or start statd.
mount.nfs: an incorrect mount option was specified
Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
Warning: try grep x3650-head01:/opt /etc/mtab failed, rc 1, retrying
Error: try grep x3650-head01:/opt /etc/mtab unsuccessful
Tue Mar  1 16:04:14 CST 2016 Running postscript: setupntp
ntpd is stopped
 1 Mar 16:04:14 ntpdate[13492]: adjust time server 10.20.0.100 offset
0.201009 sec
Starting ntpd: [  OK  ]
-
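
One way to cross-check a log like the one above against the postscripts list
is sketched below. It assumes that every script that ran leaves a
"Running postscript: NAME" line, as in this log; the function name
unrun_postscripts and its two arguments are hypothetical.

```shell
# Print every name in a comma-separated postscripts list that has no
# matching "Running postscript: NAME" line in the given log file.
# Assumption: the log uses the line format shown in the excerpt above.
unrun_postscripts() {
    list=$1
    log=$2
    for name in $(printf '%s' "$list" | tr ',' ' '); do
        # Strip everything up to the marker, then compare the first word.
        if ! awk -v want="$name" '
                sub(/.*Running postscript: /, "") && $1 == want { found = 1 }
                END { exit !found }' "$log"; then
            echo "$name"
        fi
    done
}

# Example (the log path is an assumption; use wherever the node's log is kept):
# unrun_postscripts "syslog,remoteshell,ospkgs,addsiteyum,otherpkgs" /var/log/xcat/xcat.log
```

Any name this prints was listed but never logged as running, which narrows
the search to why that script was skipped rather than why it failed.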

From past experience when things like this go wrong it tends to be due to
something small but significant that is misconfigured. Does any of this
look symptomatic of any common misconfiguration you are aware of?

P.S. I have verified that DNS has forward and reverse lookups for these
nodes on the MN and SNs.

Thanks,
Josh Nielsen

Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7

2015-11-24 Thread Josh Nielsen

Also, there are two logs that you can use to investigate postscripts
further. As long as you can get onto the deployed OS (and the
deployment is not a total failure), when postscripts do not complete
you can look in that node's local /var/log/xcat/xcat.log. The second
log is optional and you will have to set it up yourself in the kickstart
(for testing you can just edit the /install/autoinst/[nodename] kickstart
file directly): specify a log file for your %post section like
this: %post --log=/root/ks-post.log. If all else fails you can add
your own debugging messages to xcatdsklspost to track how far you are
getting.
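
A debugging message of the kind mentioned above could be as simple as the
following sketch. The helper name debug_msg, the DEBUG_LOG variable, and the
default log path are assumptions for illustration and are not part of xCAT.

```shell
# Append a timestamped debug line to a log file.
# DEBUG_LOG is a hypothetical variable; if it is unset, a file under /tmp
# is used so the helper also works early in the boot process.
debug_msg() {
    printf '%s DEBUG: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" \
        >> "${DEBUG_LOG:-/tmp/postscript-debug.log}"
}

# Example: place calls like this between the steps you want to trace.
# debug_msg "finished downloading postscripts"
```

Because each call only appends one line, the last message in the file shows
the last step that was reached before the script stopped.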

I hope that helps in some way.

Regards,
Josh

On Tue, Nov 24, 2015 at 10:18 AM, Josh Nielsen <jniel...@hudsonalpha.org> wrote:
> Hi Nathan,
>
> Well, I may not be of any help at all (and I'm not familiar with Centos 7
> deployment) but since you mentioned DNS it reminded me that once I did a
> makedns -e [external_server] for some Centos 6.5 nodes and for some reason
> only the forward lookup entries were added to the DNS server but not the
> reverse entries. My postscripts were failing each time for an unknown reason
> when I attempted to deploy the nodes, and it turns out that it couldn't
> complete without the reverse lookup because some of the xCAT code in
> xcatdsklspost queries it.
>
> After running the command manually I saw this error:
>
>> /opt/xcat/xcatdsklspost 6
>
> awk: //xcatpost/updateflag.awk:22: fatal: remote host and port information
> (3002, installstatus booted) invalid
>
>
> When I searched for a solution to this error I found this which pointed me
> to a reverse lookup error:
> http://sourceforge.net/p/xcat/mailman/message/27872412/
>
> So maybe that's just one more thing to strike off your DNS and postscript
> checklist, if you haven't already.
>
> Regards,
> Josh
>
> On Tue, Nov 24, 2015 at 9:16 AM, Heald, Nathan T. <nhe...@iu.edu> wrote:
>>
>> No, this is still in the same state.
>>
>> As best as I can tell DNS resolution is working, but it’s possible I’m
>> mistaken. No post scripts run so not much should be different once it boots
>> a second time (and I run a "nodeset boot" before it starts a 2nd install).
>> It can ping the xcat server’s hostname fine once it comes back up and I can
>> examine the node. I haven’t had much more time to work on this, but I do
>> plan to keep chipping away at it as time allows.
>>
>> This is the xcat error I see for reference:
>> Oct 21 16:35:15 oss01 systemd: Starting LSB: xCATpost...
>> Oct 21 16:35:15 oss01 xcatpostinit1: /opt/xcat/xcatinstallpost: line 9:
>> /xcatpost/xcatlib.sh: No such file or directory
>> Oct 21 16:35:15 oss01 systemd: xcatpostinit1.service: control process
>> exited, code=exited status=1
>> Oct 21 16:35:15 oss01 systemd: Failed to start LSB: xCATpost.
>> Oct 21 16:35:15 oss01 systemd: Unit xcatpostinit1.service entered failed
>> state.
>>
>> Thanks,
>> -Nathan
>>
>>
>> From: Josh Nielsen <jniel...@hudsonalpha.org>
>> Date: Monday, November 23, 2015 at 3:43 PM
>> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>, Nathan
>> Heald <nhe...@iu.edu>
>> Cc: "russa...@comcast.net" <russa...@comcast.net>
>> Subject: Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7
>>
>> And did you ever figure out your problem Nathan?
>>
>> -Josh
>>
>> On Mon, Nov 23, 2015 at 2:34 PM, Josh Nielsen <jniel...@hudsonalpha.org>
>> wrote:
>>>
>>> I was going to post a new thread about Centos 7 but thought I might piggy
>>> back on this one since it is a similar topic. I have xCAT 2.8.3 currently
>>> and it sounds like from the thread here that upgrading to 2.10 is part of
>>> the solution for deploying Centos 7.1. Currently the issue I'm having is
>>> that during a PXE boot it fetches the correct centos 7.1 image and begins
>>> trying to deploy but after it shows "Mounted Configuration File System" and
>>> "Started Show Plymouth Boot Screen" I get a message like this:
>>> "dracut-initqueue: Warning: Could not boot" and hangs there.
>>>
>>> This isn't a very explicit error, as to what caused it, and I didn't
>>> learn much from removing "quiet" from the PXE kernel boot parameters. I also
>>> can't get the ctrl+alt+F keys to work, at least in a VM, for showing
>>> different terminals like you could in Centos 6 (which was very helpful for
>>> debugging).
>>>
>>> Is this most likely because the kickstart file itself is not formatted
>>> correctly (I'm reusin

Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7

2015-11-24 Thread Josh Nielsen

Hi Nathan,

Well, I may not be of any help at all (and I'm not familiar with Centos 7
deployment) but since you mentioned DNS it reminded me that once I did a
makedns -e [external_server] for some Centos 6.5 nodes and for some reason
only the forward lookup entries were added to the DNS server but not the
reverse entries. My postscripts were failing each time for an unknown
reason when I attempted to deploy the nodes, and it turns out that it
couldn't complete without the reverse lookup because some of the xCAT code
in xcatdsklspost queries it.

After running the command manually I saw this error:

> /opt/xcat/xcatdsklspost 6

awk: //xcatpost/updateflag.awk:22: fatal: remote host and port information
(3002, installstatus booted) invalid

When I searched for a solution to this error I found this which pointed me
to a reverse lookup error:
http://sourceforge.net/p/xcat/mailman/message/27872412/

So maybe that's just one more thing to strike off your DNS and postscript
checklist, if you haven't already.
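
A quick way to confirm that both directions resolve is sketched below. It
uses getent hosts, which follows the node's nsswitch configuration, so an
/etc/hosts entry can mask a missing DNS record; check_lookup is a
hypothetical helper name.

```shell
# Resolve a host name to an address, then resolve that address back to a
# name, and report which step (if any) fails.
check_lookup() {
    name=$1
    # Forward lookup: the first field of the first result is the address.
    addr=$(getent hosts "$name" | awk '{ print $1; exit }')
    if [ -z "$addr" ]; then
        echo "forward lookup failed for $name"
        return 1
    fi
    # Reverse lookup: the second field of the first result is the name.
    back=$(getent hosts "$addr" | awk '{ print $2; exit }')
    if [ -z "$back" ]; then
        echo "reverse lookup failed for $addr"
        return 1
    fi
    echo "$name -> $addr -> $back"
}

# Example: run on the MN, the SNs, and the node itself.
# check_lookup node0067c
```

To test DNS alone, the same two steps can be done with a DNS-only query
tool in place of getent.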

Regards,
Josh

On Tue, Nov 24, 2015 at 9:16 AM, Heald, Nathan T. <nhe...@iu.edu> wrote:

> No, this is still in the same state.
>
> As best as I can tell DNS resolution is working, but it’s possible I’m
> mistaken. No post scripts run so not much should be different once it boots
> a second time (and I run a "nodeset boot" before it starts a 2nd install).
> It can ping the xcat server’s hostname fine once it comes back up and I can
> examine the node. I haven’t had much more time to work on this, but I do
> plan to keep chipping away at it as time allows.
>
> This is the xcat error I see for reference:
> Oct 21 16:35:15 oss01 systemd: Starting LSB: xCATpost...
> Oct 21 16:35:15 oss01 xcatpostinit1: /opt/xcat/xcatinstallpost: line 9:
> /xcatpost/xcatlib.sh: No such file or directory
> Oct 21 16:35:15 oss01 systemd: xcatpostinit1.service: control process
> exited, code=exited status=1
> Oct 21 16:35:15 oss01 systemd: Failed to start LSB: xCATpost.
> Oct 21 16:35:15 oss01 systemd: Unit xcatpostinit1.service entered failed
> state.
>
> Thanks,
> -Nathan
>
>
> From: Josh Nielsen <jniel...@hudsonalpha.org>
> Date: Monday, November 23, 2015 at 3:43 PM
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>, Nathan
> Heald <nhe...@iu.edu>
> Cc: "russa...@comcast.net" <russa...@comcast.net>
> Subject: Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7
>
> And did you ever figure out your problem Nathan?
>
> -Josh
>
> On Mon, Nov 23, 2015 at 2:34 PM, Josh Nielsen <jniel...@hudsonalpha.org>
> wrote:
>
>> I was going to post a new thread about Centos 7 but thought I might piggy
>> back on this one since it is a similar topic. I have xCAT 2.8.3 currently
>> and it sounds like from the thread here that upgrading to 2.10 is part of
>> the solution for deploying Centos 7.1. Currently the issue I'm having is
>> that during a PXE boot it fetches the correct centos 7.1 image and begins
>> trying to deploy but after it shows "Mounted Configuration File System" and
>> "Started Show Plymouth Boot Screen" I get a message like this:
>> "dracut-initqueue: Warning: Could not boot" and hangs there.
>>
>> This isn't a very explicit error, as to what caused it, and I didn't
>> learn much from removing "quiet" from the PXE kernel boot parameters. I
>> also can't get the ctrl+alt+F keys to work, at least in a VM, for showing
>> different terminals like you could in Centos 6 (which was very helpful for
>> debugging).
>>
>> Is this most likely because the kickstart file itself is not formatted
>> correctly (I'm reusing my Centos 6.4 kickstart - until I figure out how
>> Centos 7.1 differs), or perhaps that the kickstart is not being properly
>> fetched from the gpxe configuration in /tftpboot/xcat/xnba/nodes? Currently
>> the gpxe boot configuration for the node I'm trying to deploy Centos7.1 to
>> looks like this:
>>
>> #!gpxe
>> #install centos7.1-x86_64-compute
>> imgfetch -n kernel
>> http://${next-server}/tftpboot/xcat/osimage/centos7.1-x86_64-install-compute/vmlinuz
>> imgload kernel
>> imgargs kernel repo=http://10.20.0.101:80/install/centos7.1/x86_64 ks=
>> http://10.20.0.101:80/install/autoinst/node0067c ksdevice=bootif cmdline
>> console=tty0 console=ttyS0,115200n8r BOOTIF=01-${netX/machyp}
>> imgfetch
>> http://${next-server}/tftpboot/xcat/osimage/centos7.1-x86_64-install-compute/initrd.img
>> imgexec kernel
>>
>> I manually changed ksdevice to "bootif" just to make sure it was using
>> the right interface (it was set to eth0 but Centos 7.1 got rid of "eth" for

Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7

2015-11-23 Thread Josh Nielsen

Hi Jesus,

That's very useful information. Thank you for that! I'm going to update my
osimage definition and try a few more deploys and see how far I get with
it. I think I'll still end up having to update xCAT in any case, but I'll
see if I can get it working with the current install first, while making
manual edits as needed for testing.

Thanks,
Josh

On Mon, Nov 23, 2015 at 3:56 PM, Jesus R. Camou <jca...@nearnix.com> wrote:

> Hey Josh,
>
> centos7.1 provisioning works smoothly for me on 2.9.1 with the following
> definition and templates:
>
> [jcamou@xcat ~]$ lsxcatd -v
> Version 2.9.1 (git commit 7f6043fffd62d482931b17b60f9488eb5754fdc1, built
> Thu Mar 19 03:25:35 EDT 2015)
>
> [jcamou@xcat ~]$ lsdef -t osimage centos7.1-x86_64-install-compute
> Object name: centos7.1-x86_64-install-compute
> addkcmdline=net.ifnames=0 biodevname=0
> imagetype=linux
> osarch=x86_64
> osdistroname=centos7.1-x86_64
> osname=Linux
> osvers=centos7.1
> otherpkgdir=/install/post/otherpkgs/centos7.1/x86_64
> pkgdir=/install/centos7.1/x86_64
> pkglist=/install/custom/install/centos7/compute.centos7.pkglist
> profile=compute
> provmethod=install
> template=/install/custom/install/centos7/compute.centos7.tmpl
>
> Notice the addkcmdline attribute has "net.ifnames=0 biodevname=0" to
> disable consistent network device naming which forces the kernel to use and
> stick to the ethX interface naming convention:
>
> [jcamou@xcat ~]$ lsdef -t osimage -o centos7.1-x86_64-install-compute -i
> addkcmdline
> Object name: centos7.1-x86_64-install-compute
> addkcmdline=net.ifnames=0 biodevname=0
>
> You may set this by:
>
> [jcamou@xcat ~]$ tabch imagename=centos7.1-x86_64-install-compute
> linuximage.addkcmdline="net.ifnames=0 biodevname=0"
>
>
> And the following templates (ignoring commented and empty lines for
> shortening purposes):
>
> [jcamou@xcat ~]$ grep -Ev '^#|^$'
> /install/custom/install/centos7/compute.centos7.tmpl
> lang en_US
> %include /tmp/repos
> keyboard "us"
> zerombr
> clearpart --all --initlabel
> %include /tmp/partitioning
> bootloader
> install
> text
> firewall --disabled
> timezone --utc "#TABLE:site:key=timezone:value#"
> skipx
> rootpw --iscrypted #CRYPT:passwd:key=system,username=root:password#
> auth --useshadow --enablemd5
> selinux --disabled
> reboot
> %packages
> %end
> %pre
> %end
> %post
> %end
>
>
> ... and pkglist:
>
> [jcamou@xcat ~]$ cat /install/custom/install/centos7/compute.centos7.pkglist
> #Please make sure there is a space between @ and group name
> wget
> ntp
> nfs-utils
> net-snmp
> rsync
> yp-tools
> openssh-server
> util-linux
> net-tools
> mailx
> vim-enhanced
>
>
>
>
> - j
>
>
> On Mon, Nov 23, 2015 at 12:43 PM, Josh Nielsen <jniel...@hudsonalpha.org>
> wrote:
>
>> And did you ever figure out your problem Nathan?
>>
>> -Josh
>>
>> On Mon, Nov 23, 2015 at 2:34 PM, Josh Nielsen <jniel...@hudsonalpha.org>
>> wrote:
>>
>>> I was going to post a new thread about Centos 7 but thought I might
>>> piggy back on this one since it is a similar topic. I have xCAT 2.8.3
>>> currently and it sounds like from the thread here that upgrading to 2.10 is
>>> part of the solution for deploying Centos 7.1. Currently the issue I'm
>>> having is that during a PXE boot it fetches the correct centos 7.1 image
>>> and begins trying to deploy but after it shows "Mounted Configuration File
>>> System" and "Started Show Plymouth Boot Screen" I get a message like this:
>>> "dracut-initqueue: Warning: Could not boot" and hangs there.
>>>
>>> This isn't a very explicit error, as to what caused it, and I didn't
>>> learn much from removing "quiet" from the PXE kernel boot parameters. I
>>> also can't get the ctrl+alt+F keys to work, at least in a VM, for showing
>>> different terminals like you could in Centos 6 (which was very helpful for
>>> debugging).
>>>
>>> Is this most likely because the kickstart file itself is not formatted
>>> correctly (I'm reusing my Centos 6.4 kickstart - until I figure out how
>>> Centos 7.1 differs), or perhaps that the kickstart is not being properly
>>> fetched from the gpxe configuration in /tftpboot/xcat/xnba/nodes? Currently
>>> the gpxe boot configuration for the node I'm trying to deploy Centos7.1 to
>>> looks like this:
>>>
>>> #!gpxe
>>> #install centos7.1-x86_64-compute
>>>

Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7

2015-11-23 Thread Josh Nielsen

I was going to post a new thread about Centos 7 but thought I might piggy
back on this one since it is a similar topic. I have xCAT 2.8.3 currently
and it sounds like from the thread here that upgrading to 2.10 is part of
the solution for deploying Centos 7.1. Currently the issue I'm having is
that during a PXE boot it fetches the correct centos 7.1 image and begins
trying to deploy but after it shows "Mounted Configuration File System" and
"Started Show Plymouth Boot Screen" I get a message like this:
"dracut-initqueue: Warning: Could not boot" and hangs there.

This isn't a very explicit error, as to what caused it, and I didn't learn
much from removing "quiet" from the PXE kernel boot parameters. I also
can't get the ctrl+alt+F keys to work, at least in a VM, for showing
different terminals like you could in Centos 6 (which was very helpful for
debugging).

Is this most likely because the kickstart file itself is not formatted
correctly (I'm reusing my Centos 6.4 kickstart - until I figure out how
Centos 7.1 differs), or perhaps that the kickstart is not being properly
fetched from the gpxe configuration in /tftpboot/xcat/xnba/nodes? Currently
the gpxe boot configuration for the node I'm trying to deploy Centos7.1 to
looks like this:

#!gpxe
#install centos7.1-x86_64-compute
imgfetch -n kernel http://${next-server}/tftpboot/xcat/osimage/centos7.1-x86_64-install-compute/vmlinuz
imgload kernel
imgargs kernel repo=http://10.20.0.101:80/install/centos7.1/x86_64 ks=http://10.20.0.101:80/install/autoinst/node0067c ksdevice=bootif cmdline console=tty0 console=ttyS0,115200n8r BOOTIF=01-${netX/machyp}
imgfetch http://${next-server}/tftpboot/xcat/osimage/centos7.1-x86_64-install-compute/initrd.img
imgexec kernel

I manually changed ksdevice to "bootif" just to make sure it was using the
right interface (it was set to eth0 but Centos 7.1 got rid of "eth" for
"em" didn't it?). Nonetheless, I think my kickstart is being successfully
fetched, because I changed the disk formatting commands in the kickstart
and when I removed "quiet" from the boot I saw errors related to disk
formatting/partitioning. So I'm partial to thinking that the old Centos 6.4
kickstart configuration is not 100% compatible with Centos 7.1, but I want
to double check to see what xCAT 2.10 brings to the table that might be
necessary for Centos 7.1 deployment. Comments or thoughts?

Regards,
Josh

On Fri, Oct 23, 2015 at 1:44 PM, Russell Auld  wrote:

> Looks like xcatdsklspost does get called even for stateful installs. Look
> at the script header.
> Usually in cases like this, the issue is that the node being imaged can't
> resolve the name of the master node. Make sure your dns is working properly.
>
> On Oct 23, 2015 12:52 PM, "Heald, Nathan T."  wrote:
> >
> > To follow up:
> >
> > I have resolved the pxe problem by upgrading further to xcat 2.10. Now
> > it sets kickstart parameters that centos7 responds to.
> >
> > However, I have a new problem: the rinstall is now looping. I’ve gotten
> > as far as seeing that “/xcatpost” is never created on my stateful install.
> > The xcat debugging page suggests networking problems as the first thing to
> > check. So far I’ve not found anything on that front. I can’t find what
> > specifically creates /xcatpost during the install. I see that /opt/xcat was
> > created, which contains the scripts xcatdsklspost and xcatinstallpost. Only
> > the script xcatdsklspost has the line “mkdir /$xcatpost”, and I would not
> > expect this script to be executed since this is not a diskless install.
> > “xcatinstallpost” starts trying to download stuff to /xcatpost but there’s
> > nothing in this script that creates this directory.
> >
> > Suggestions?
> >
> > Thanks,
> > -Nathan
> >
> >
> > From: Nathan Heald 
> > Reply-To: "xcat-user@lists.sourceforge.net" <
> xcat-user@lists.sourceforge.net>
> > Date: Tuesday, October 13, 2015 at 5:30 PM
> > To: "xcat-user@lists.sourceforge.net" 
> > Subject: [xcat-user] xCAT 2.9.1, problem kickstarting centos7
> >
> > Greetings,
> >
> > I’m trying to get a working stateful install of centos7 on generic
> > hardware. This management node started out with xCAT 2.8.4; I have upgraded
> > it to 2.9.1 to try to get this working. Of course the parameters of the
> > RH/centos 7 kickstart have changed, necessitating the upgrade. I found this
> > url indicating that as of 2.9 xcat should know how to handle
> > RH/centos7:
> > http://sourceforge.net/p/xcat/bugs/4318/
> >
> > However, I’m not able to trick xcat into setting up my pxe file
> > correctly when I run the nodeset for the host I want to install. I get this
> > in my /tftpboot/pxelinux.cfg/host01 file:
> > …
> > APPEND initrd=xcat/osimage/centos7-x86_64-install-compute/initrd.img quiet repo=http://192.168.0.1:80/install/centos7/x86_64 ks=http://192.168.0.1:80/install/autoinst/host01 ksdevice=2C:44:FD:00:00:00
> > …
> >
> > But I’m expecting something more like this which

Re: [xcat-user] xCAT 2.9.1, problem kickstarting centos7

2015-11-23 Thread Josh Nielsen

And did you ever figure out your problem Nathan?

-Josh

On Mon, Nov 23, 2015 at 2:34 PM, Josh Nielsen <jniel...@hudsonalpha.org>
wrote:

> I was going to post a new thread about Centos 7 but thought I might piggy
> back on this one since it is a similar topic. I have xCAT 2.8.3 currently
> and it sounds like from the thread here that upgrading to 2.10 is part of
> the solution for deploying Centos 7.1. Currently the issue I'm having is
> that during a PXE boot it fetches the correct centos 7.1 image and begins
> trying to deploy but after it shows "Mounted Configuration File System" and
> "Started Show Plymouth Boot Screen" I get a message like this:
> "dracut-initqueue: Warning: Could not boot" and hangs there.
>
> This isn't a very explicit error, as to what caused it, and I didn't learn
> much from removing "quiet" from the PXE kernel boot parameters. I also
> can't get the ctrl+alt+F keys to work, at least in a VM, for showing
> different terminals like you could in Centos 6 (which was very helpful for
> debugging).
>
> Is this most likely because the kickstart file itself is not formatted
> correctly (I'm reusing my Centos 6.4 kickstart - until I figure out how
> Centos 7.1 differs), or perhaps that the kickstart is not being properly
> fetched from the gpxe configuration in /tftpboot/xcat/xnba/nodes? Currently
> the gpxe boot configuration for the node I'm trying to deploy Centos7.1 to
> looks like this:
>
> #!gpxe
> #install centos7.1-x86_64-compute
> imgfetch -n kernel http://${next-server}/tftpboot/xcat/osimage/centos7.1-x86_64-install-compute/vmlinuz
> imgload kernel
> imgargs kernel repo=http://10.20.0.101:80/install/centos7.1/x86_64 ks=http://10.20.0.101:80/install/autoinst/node0067c ksdevice=bootif cmdline console=tty0 console=ttyS0,115200n8r BOOTIF=01-${netX/machyp}
> imgfetch http://${next-server}/tftpboot/xcat/osimage/centos7.1-x86_64-install-compute/initrd.img
> imgexec kernel
>
> I manually changed ksdevice to "bootif" just to make sure it was using the
> right interface (it was set to eth0 but Centos 7.1 got rid of "eth" for
> "em" didn't it?). Nonetheless, I think my kickstart is being successfully
> fetched, because I changed the disk formatting commands in the kickstart
> and when I removed "quiet" from the boot I saw errors related to disk
> formatting/partitioning. So I'm partial to thinking that the old Centos 6.4
> kickstart configuration is not 100% compatible with Centos 7.1, but I want
> to double check to see what xCAT 2.10 brings to the table that might be
> necessary for Centos 7.1 deployment. Comments or thoughts?
>
> Regards,
> Josh
>
> On Fri, Oct 23, 2015 at 1:44 PM, Russell Auld <russa...@comcast.net>
> wrote:
>
>> Looks like xcatdsklspost does get called even for stateful installs. Look
>> at the script header.
>> Usually in cases like this, the issue is that the node being imaged can't
>> resolve the name of the master node. Make sure your dns is working properly.
>>
>> On Oct 23, 2015 12:52 PM, "Heald, Nathan T." <nhe...@iu.edu> wrote:
>> >
>> > To follow up:
>> >
>> > I have resolved the pxe problem by upgrading further to xcat 2.10. Now
>> it sets kickstart parameters that centos7 responds to.
>> >
>> > However I have a new problem, the rinstall is now looping. I’ve gotten
>> as far as seeing that "/xcatpost” is never created on my stateful install.
>> The xcat debugging page suggests networking problems as the first thing to
>> check. So far I’ve not found anything on that front. I can’t find what
>> specifically creates /xcatpost during the install. I see that /opt/xcat was
>> created which contains the scripts xcatdsklspost and xcatinstallpost. Only
>> the script xcatdsklspost has the line “mkdir /$xcatpost”, I would not
>> expect this script to be executed since this is not a diskless install.
>> “xcatinstallpost” starts trying to download stuff to /xcatpost but there’s
>> nothing in this script that creates this directory.
>> >
>> > Suggestions?
>> >
>> > Thanks,
>> > -Nathan
>> >
>> >
>> > From: Nathan Heald <nhe...@iu.edu>
>> > Reply-To: "xcat-user@lists.sourceforge.net" <
>> xcat-user@lists.sourceforge.net>
>> > Date: Tuesday, October 13, 2015 at 5:30 PM
>> > To: "xcat-user@lists.sourceforge.net" <xcat-user@lists.sourceforge.net>
>> > Subject: [xcat-user] xCAT 2.9.1, problem kickstarting centos7
>> >
>> > Greetings,
>> >
>> > I’m trying to get a worki

Re: [xcat-user] Hello Newbie who need some help. - Benjamin.

2015-08-04 Thread Josh Nielsen

Hi Benjamin,

It has been several years since I tried installing on VBox, and not all the
instructions make sense to me any longer, but here is what little I had
documented from my install:

Note: The node 'vh01' below stands for the computer that you are running
VBox on (desktop, laptop, etc.)

Add your nodes so that they can work with the Virtual Box plugin.

nodeadd node1 groups=testgrp,vboxgrp vm.host=vh01

Followed by:

chtab node=vboxgrp nodehm.mgt=vbox nodehm.power=vbox

That tells the nodehm table what settings to run for the nodes in the
vboxgrp group so that they can properly run as VM nodes in VirtualBox.

Then add the host computer vh01 as a node (your port number may be
different):

nodeadd vh01 groups=websrv websrv.port=18083 websrv.username=test websrv.password=test

chtab node=vh01 hosts.ip=<IP of vh01> hosts.comments="The host running VBox"

Manually set the MAC addresses in the mac table.

Finally you must launch the VirtualBox webservice on the physical host
machine.


You may first want to execute (I don't remember why I made this note, or
what problem it solves):
VBoxManage setproperty websrvauthlibrary null

Then when you are prepared to launch the webservice, run it in a separate
shell (it must remain open and running) or in the background: vboxwebsrv

Final note to resolve an issue with a VM's name as shown in VirtualBox and
the hostname of the xCAT node being named the same thing (which you would
think wouldn't be a problem):

“Sometimes the VirtualBox machine name can not equal the xCAT node name,
especially in test cases that might occur. There is a feature for this
situation: add somewhere in the comment field of the vm table a schema like
vmname:NAME! and replace NAME by the vm name on the VirtualBox host
system. You may insert other comments before and after that schema. For
example the xCAT node vm03 should point to the virtual machine
machine03 on the VirtualBox host vh01.”


(This “trick” was absolutely necessary for me - else the rpower command
would not work)

Example config in vm table: node1,vh01,,,vmname:xCat_Node1!,
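
To illustrate the vmname:NAME! convention quoted above, here is a minimal
Python sketch of how such a comment field could be parsed. It is only an
illustration: the function name vm_name_for_node and the regular expression
are assumptions made for this example, not the code the xCAT VirtualBox
plugin actually uses.

```python
import re

# Hypothetical pattern for the "vmname:NAME!" schema: everything between
# "vmname:" and the next "!" is taken as the VirtualBox machine name.
VMNAME_PATTERN = re.compile(r"vmname:([^!]+)!")


def vm_name_for_node(node, comment):
    """Return the VirtualBox VM name for an xCAT node.

    Falls back to the node name when the comment carries no vmname schema.
    """
    match = VMNAME_PATTERN.search(comment or "")
    return match.group(1) if match else node


if __name__ == "__main__":
    # Comment values taken from the examples in this message.
    print(vm_name_for_node("node1", "vmname:xCat_Node1!"))
    print(vm_name_for_node("vm03", "test box vmname:machine03! rack 2"))
    print(vm_name_for_node("vm04", ""))
```

Other comment text before or after the schema is ignored, which matches the
description that you may insert other comments around it.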

I hope maybe that helps. That's all I've got. :-)

Best regards,
Josh Nielsen

On Tue, Aug 4, 2015 at 6:21 PM, Benjamin LIPERE 
benjamin.lipere...@gmail.com wrote:

 Hello.

 I am a newbie with XCAT.
 I am a little lost on the installation.
 Can some one help me to start on XCAT with a VirtualBox Cluster, please,
 step by step, and by e-mail of course ??
 I may have a twin available, but I am not sure it will work directly.

 Thanks by advance.
 Best Regards.
 Benjamin.

 --
 LIPERE Benjamin
 Le logis de paille
 87270, Chaptelat
 FRANCE
 06 26 14 35 20
 benjamin.lipere...@gmail.com


 --

 ___
 xCAT-user mailing list
 xCAT-user@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/xcat-user


--
___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user

[xcat-user] Autodiscovery for HP Cluster - SNMP iLo Question

2015-08-04 Thread Josh Nielsen

Hello,

We are planning to use xCAT to deploy ESX on an HP BladeSystem c7000
cluster (BL460C Gen 9 blade servers). For the time being I was going to
skip iLo setup since I am confused about how to get that working with xCAT.
However, what I want to know is if I can still do autodiscovery of our
nodes as long as the nodes are defined in the switch table and the switches
have SNMP enabled. The documentation available online sends mixed signals
about this.

Sumavi's documentation seems in some places to suggest that SNMP being
enabled is the only critical component for autodiscovery: *So you may not
have the ability to have SNMP read access to the network switches to do
autodiscovery. Not a problem. You'll just have to fill in the MAC addresses
yourself.*
(From: http://sumavi.com/sections/skipping-autodiscovery)

Elsewhere it also seems to suggest SNMP is all that is needed:

*If xCAT doesn't know who the node is, then it will automatically run the
autodiscover code. This code looks through all the switches in the switch
table trying to map the port to the mac to the node. It does this by
running snmpwalk commands through the switches.*

But then it disappointingly (and possibly outdated-ly) says: *Autodiscovery
has not been tested with HP blades. Therefore, you are better off just
running getmacs when these nodes are defined to map the mac address*.
(From: http://sumavi.com/sections/understanding-autodiscovery)

But then elsewhere it says this, which seems to imply that the management
module is needed for autodiscovery, and not merely SNMP: *In order to
automate discovery for IPMI and iLo nodes there needs to be a mapping of
nodes to switch ports. This relationship is defined in the switch table.
The methods, passwords, etc to communicate with the switches is in the
switches table. Notice that with blades this step usually isn't necessary.
xCAT can discover IBM blades through the management module.* (From:
http://sumavi.com/sections/switch-and-switches) Is this perhaps only
talking about the nodeX-bmc definition in regards to autodiscovery and not
the base nodeX definition itself?

My main question is, based on the above, if SNMP and the management module
both have a part to play in autodiscovery is there any chance that HP
Blades can work with discovery, as long as I explicitly map the nodes to
port numbers in the switch table? Will I have to configure iLo to make
autodiscovery work or can I leave iLo alone and just enable SNMP and
configure the switch table?

Thanks,
Josh Nielsen

Re: [xcat-user] ESXi VM Discovery Deployment

2015-07-10 Thread Josh Nielsen

Hello Wang,

Thank you for the interaction.

*From my point view that you should consider the bare-metal and virtual
machine management to be two levels.*

Agreed, though I want xCAT to have node definitions for both levels,
because I want OS deployment to be handled by xCAT for ESXi and for our
Centos VMs.

*First, use switch-based discovery to discovery bare-metal node and
install ESXI as a general operating system;*

Agreed. I pretty much have that figured out at this point.

*Second, manually or using script to define the virtual machine against
certain host. After the vm node definition has done, the hardware control
and OS deployment for vm node will be simple since it's very similar with
the bare-metal node.*

This is where I am more fuzzy on what to do. I plan to PXE deploy the OS
images to the VMs once they are created, so I need xCAT to know the VMs'
details (MAC address, node name, etc.) exactly as classic discovery would
provide them. That way, when an unprovisioned VM boots, contacts the DHCP
server, and continues down the boot chain, xCAT recognizes the node, can
assign it an OS hostname, and proceeds with the OS installation as with
traditional physical node deployment. However, I cannot use SNMP switch
discovery at that second level of abstraction for the VMs.

So two questions really:

1) What are my best options for creating the VMs on the deployed standalone
ESXi hosts to start with? Use the esxcli command line (is that what you
meant by 'script')? Or just connect with the vmware client into the ESXi
host via the GUI and step through the VM wizard manually?

2) Once I have created the VM, what are my best options for discovery &
deployment? SNMP location-based discovery & node definitions (using regular
expressions) seem out of the picture with VMs. So that leaves manually
populating the MACs myself, or sequential discovery so that they boot up in
the right order and PXE boot the correct image for the node definition.

*You mentioned the discovery of vm. I am curious about this requirement.
Is that because the vm was not created by xcat (like mkvm command), so you
need to discovery the vm from certain host?*
*xCAT does not have command to discovery/scan host to get vm list. A simple
way is to use 'xdsh' to run virsh command against the host.*

*BTW, don't your organization think the performance might be a problem to
move from bare-metal to virtual machine?*

Sorry, I failed to mention that we are getting a new compute cluster and
switch fabric to support the production compute, which will be virtualized,
and the legacy compute will be made a development cluster (which also has
lower resource demands).

Regards,
Josh Nielsen

On Mon, Jul 6, 2015 at 9:28 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

 From my point view that you should consider the bare-metal and virtual
 machine management to be two levels.

 First, use switch-based discovery to discovery bare-metal node and install
 ESXI as a general operating system;
 Second, manually or using script to define the virtual machine against
 certain host. After the vm node definition has done, the hardware control
 and OS deployment for vm node will be simple since it's very similar with
 the bare-metal node.

 You mentioned the discovery of vm. I am curious about this requirement. Is
 that because the vm was not created by xcat (like mkvm command), so you
 need to discovery the vm from certain host?
 xCAT does not have command to discovery/scan host to get vm list. A simple
 way is to use 'xdsh' to run virsh command against the host.

 BTW, don't your organization think the performance might be a problem to
 move from bare-metal to virtual machine?

 Thanks
 Best Regards
 --
 Wang Xiaopeng (王晓朋)
 IBM China System Technology Laboratory
 Tel: 86-10-82453455
 Email: w...@cn.ibm.com
 Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
 Haidian District Beijing P.R.China 100193

 From: Josh Nielsen jniel...@hudsonalpha.org
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
 Date: 2015/07/07 08:28
 Subject: Re: [xcat-user] ESXi VM Discovery  Deployment
 --



 Also, what will the 'switch' xCAT table look like with multiple VMs on the
 same physical host, since the man page for it says contains what switch
 port numbers each node is connected to?

 On Mon, Jul 6, 2015 at 1:31 PM, Josh Nielsen *jniel...@hudsonalpha.org*
 jniel...@hudsonalpha.org wrote:

Hello all,

Our organization is in the process of shifting our HPC model from an
all physical/bare metal compute cluster to a virtualized compute cluster,
making each physical

Re: [xcat-user] ESXi VM Discovery Deployment

2015-07-10 Thread Josh Nielsen

Jarrod,

Thanks for the reply.

*FYI, xCAT's vm commands don't require vCenter, though certain functions
like rmigrate and friends are restricted by vmware.  Unless the free
license is installed, after which ESXi locks us out of the API (eval mode
does not have this restriction).*

Below is a forward of the last interaction we had (December 2014) on the issue
of using the xCAT commands with a standalone ESXi machine, and the errors I
was seeing. If the license type prevents the functioning of the VM xCAT
commands then effectively I cannot use them, since we are using the free
ESXi license.

So I am back to the two questions I posed to Wang then. But thank you for
clarifying.

Regards,
Josh

-- Forwarded message --
From: Jarrod Johnson jarrod.b.john...@gmail.com
Date: Sat, Dec 6, 2014 at 2:58 PM
Subject: Re: [xcat-user] Will xCAT work with ESXi 5.5?
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net


It will work, but the free license doesn't let us automate it.  It requires
eval or commercial license sadly.  This is the case for older ones as well.

On Thu, Dec 4, 2014 at 4:32 PM, Josh Nielsen jniel...@hudsonalpha.org
wrote:

 Hello all,

 I ran into this issue when trying to create a vm I had defined in xCAT in
 an environment where we have two ESXi 5.5 standalone servers:

 # mkvm --verbose vm1 -s 100G
 Error: esx plugin bug, pid 16602, process description: 'xCATd SSL: mkvm to
 vm1 for root@localhost: esx instance' with error '
 SOAP Fault:
 ---
 Fault string: Current license or ESXi version prohibits execution of the
 requested operation.
 Fault detail: RestrictedVersionFault' while trying to fulfill request for
 the following nodes: vm1

 I installed version 5.5 update 2 of the VMWare Perl SDK (as is required)
 and I can contact the server with this cli command:

  vmware-cmd -H esxihost0001.morgan.haib.org -l

 The documentation only uses 4.1 as an example:
 http://sourceforge.net/p/xcat/wiki/XCAT_Virtualization_with_VMWare/. What
 is the latest version of ESXi that xCAT will work with? If it does support
 5.5 do I need to configure some kind of key or license file locally? I
 already ran my variation on: chtab key=vmware passwd.username=root
 passwd.password=cluster for the xCAT passwd table.

 Regards,
 Josh Nielsen



[xcat-user] ESXi VM Discovery Deployment

2015-07-06 Thread Josh Nielsen

Hello all,

Our organization is in the process of shifting our HPC model from an all
physical/bare metal compute cluster to a virtualized compute cluster,
making each physical compute node a standalone ESXi host (without vCenter
licensing or central management). Because we are not using vCenter the
vm-specific xCAT commands are not of much use to us, but I'm not so much
concerned about that as with how to redesign/organize the discovery &
deployment process for VMs on the ESXi hosts.

With our current physical compute cluster we had used the ultra handy SNMP
switch port discovery method to identify and label nodes with regular
expressions, creating compute hosts with simple names like node0001,
node0002, etc. Now the ESXi hosts take on those names and use the SNMP
switch port discovery method for their naming, IP addresses, etc. But once
that is done I need to determine how best to deploy VMs on top of those
ESXi hosts and how discovery will work with them.

Our intended naming scheme will be to name each VM, per host, after the
name of the ESXi host with letters appended to them. So say ESXi host
node0001 will have three VMs deployed: we would name them node0001a,
node0001b, and node0001c. From what I can tell I cannot use the SNMP method
of identifying those VMs. Since I may have to create the VMs by hand anyway
(or deploy from a template), perhaps I can use the most tedious method of
manually populating the MAC addresses, but I am wondering if anyone has any
better ideas for ways to accomplish that. I would welcome any suggestions
or pointers for things that I haven't thought of yet.
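
As a rough sketch of the naming scheme described above (an illustration
only, not something tested against xCAT), the Python below derives the VM
names for one ESXi host and prints one nodeadd line per VM. The function
names and the group name vmgrp are hypothetical, and the use of vm.host is
an assumption borrowed from VirtualBox-style node definitions; the MAC
addresses would still have to be populated separately.

```python
import string


def vm_names(host, count):
    """Return VM names for one ESXi host: node0001 -> node0001a, node0001b, ..."""
    if not 0 <= count <= len(string.ascii_lowercase):
        raise ValueError("count must be between 0 and 26")
    return [host + letter for letter in string.ascii_lowercase[:count]]


def nodeadd_commands(host, count, group="vmgrp"):
    """Build one 'nodeadd' command line per VM (returned as text, not executed)."""
    return [
        "nodeadd {vm} groups={group} vm.host={host}".format(
            vm=vm, group=group, host=host
        )
        for vm in vm_names(host, count)
    ]


if __name__ == "__main__":
    # Three VMs on ESXi host node0001, as in the example above.
    for line in nodeadd_commands("node0001", 3):
        print(line)
```

Printing the commands rather than running them keeps this a dry run, so
the generated definitions can be reviewed before anything touches the xCAT
tables.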

Thanks!
Josh Nielsen

Re: [xcat-user] ESXi VM Discovery Deployment

2015-07-06 Thread Josh Nielsen

Also, what will the 'switch' xCAT table look like with multiple VMs on the
same physical host, since the man page for it says it contains what switch
port numbers each node is connected to?

On Mon, Jul 6, 2015 at 1:31 PM, Josh Nielsen jniel...@hudsonalpha.org
wrote:

 Hello all,

 Our organization is in the process of shifting our HPC model from an all
 physical/bare metal compute cluster to a virtualized compute cluster,
 making each physical compute node a standalone ESXi host (without vCenter
 licensing or central management). Because we are not using vCenter the
 vm-specific xCAT commands are not of much use to us, but I'm not so much
 concerned about that as with how to redesign/organize the discovery &
 deployment process for VMs on the ESXi hosts.

 With our current physical compute cluster we had used the ultra handy SNMP
 switch port discovery method to identify and label nodes with regular
 expressions, creating compute hosts with simple names like node0001,
 node0002, etc. Now the ESXi hosts take on those names and use the SNMP
 switch port discovery method for their naming, IP addresses, etc. But once
 that is done I need to determine how best to deploy VMs on top of those
 ESXi hosts and how discovery will work with them.

 Our intended naming scheme will be to name each VM, per host, after the
 name of the ESXi host with letters appended to them. So say ESXi host
 node0001 will have three VMs deployed: we would name them node0001a,
 node0001b, and node0001c. From what I can tell I cannot use the SNMP method
 of identifying those VMs. Since I may have to create the VMs by hand anyway
 (or deploy from a template), perhaps I can use the most tedious method of
 manually populating the MAC addresses, but I am wondering if anyone has any
 better ideas for ways to accomplish that. I would welcome any suggestions
 or pointers for things that I haven't thought of yet.

 Thanks!
 Josh Nielsen


[xcat-user] Service Node trying to update Master DNS

2015-03-03 Thread Josh Nielsen

Hello all,

I have one xCAT MN and two SNs along with an external DNS server that they
all interact with. The SNs act as slave DNS servers that only sync down
changes made to the external (master) DNS server that is updated by
commands like 'makedns -e' from the MN only.

An issue I am running into, now that I am trying a PXE deploy for the first
time since I moved to the MN/SN hierarchy model, is that the new node gets a
DHCP IP from one of the SNs and then that SN tries to add the node's IP to
the external DNS server and can't. The SNs are not authorized to update the
DNS entries since each is a read-only slave that just syncs what it sees on
the master external DNS server. I can manually alter their named.conf to
allow updates to be pushed up from the slave named servers on the SNs, but I
wanted to check and see what others are doing for this issue first and
whether you allow your SNs' DNS slaves to update the master DNS. Perhaps I
configured something incorrectly to begin with.

One thing I do need to figure out though is why the node's entry in
/var/lib/dhcpd/dhcpd.leases says "dynamic" instead of "fixed-address" and
doesn't list the IP that shows for the node's config with an lsdef. Perhaps
that is why the makedns -e doesn't seem to help, because it picks a
different IP and then tries to add it. Either way the SNs don't have
permission to update DNS.

Thanks,
Josh Nielsen

Re: [xcat-user] Will xCAT work with ESXi 5.5?

2014-12-09 Thread Josh Nielsen

Thanks Michael, that is useful information. We may end up using VMWare's
own provisioning tools instead in this case.

-Josh

On Mon, Dec 8, 2014 at 5:20 PM, Fenn, Michael 
michael.f...@deshawresearch.com wrote:

  AFAIK, without (licensed) vCenter, you don't have access to the API
 endpoints that xCAT is looking for.  So any xCAT command that needs to
 communicate with the API will not work.  Pure xCAT DB and file manipulation
 commands will work.  For example mkvm, rsetboot, and rpower will not work,
 but nodeset will.

  You can definitely deploy a VM on the (free) ESXi with xCAT, it's just a
 little more of a manual process.  Once the VM boots, it does PXE just like
 a physical node would.  So you would need to use nodeset to prep the VM for
 install, then manually create and power on the VM in the VI Client.
 The VM should then PXE boot and install like a normal cluster node.

  Michael

   From: Josh Nielsen jniel...@hudsonalpha.org
 Reply-To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
 Date: Monday, December 8, 2014 at 10:29 AM
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
 Subject: Re: [xcat-user] Will xCAT work with ESXi 5.5?

   Hello Jarrod,

 I think we are indeed using the free license. What exactly do you mean by
 "doesn't let us automate it"? Do you mean, basically, that the commands like mkvm
 won't work (which is just a wrapper around the VMWare SDK functions right?)
 and that (with the free license) we would have to create the VMs manually?
 But if so, even once they are created (manually) can xCAT deploy an image
 to it via ESX (following the rsetboot, nodeset, rpower instructions in the
 wiki) if we are using the free license? Or are you saying xCAT simply will
 not be able to deploy anything to a VM at all with the free ESX license?

 Thanks,
 Josh Nielsen

 On Sat, Dec 6, 2014 at 2:58 PM, Jarrod Johnson jarrod.b.john...@gmail.com
  wrote:

 It will work, but the free license doesn't let us automate it.  It
 requires eval or commercial license sadly.  This is the case for older ones
 as well.

 On Thu, Dec 4, 2014 at 4:32 PM, Josh Nielsen jniel...@hudsonalpha.org
 wrote:

 Hello all,

 I ran into this issue when trying to create a vm I had defined in xCAT
 in an environment where we have two ESXi 5.5 standalone servers:

 # mkvm --verbose vm1 -s 100G
 Error: esx plugin bug, pid 16602, process description: 'xCATd SSL: mkvm
 to vm1 for root@localhost: esx instance' with error '
 SOAP Fault:
 ---
 Fault string: Current license or ESXi version prohibits execution of the
 requested operation.
 Fault detail: RestrictedVersionFault' while trying to fulfill request
 for the following nodes: vm1

 I installed version 5.5 update 2 of the VMWare Perl SDK (as is required)
 and I can contact the server with this cli command:

   vmware-cmd -H esxihost0001.morgan.haib.org -l

  The documentation only uses 4.1 as an example:
 http://sourceforge.net/p/xcat/wiki/XCAT_Virtualization_with_VMWare/.
 What is the latest version of ESXi that xCAT will work with? If it does
 support 5.5 do I need to configure some kind of key or license file
 locally? I already ran my variation on: chtab key=vmware
 passwd.username=root passwd.password=cluster for the xCAT passwd table.

 Regards,
 Josh Nielsen



Re: [xcat-user] Will xCAT work with ESXi 5.5?

2014-12-08 Thread Josh Nielsen

Hello Jarrod,

I think we are indeed using the free license. What exactly do you mean by
"doesn't let us automate it"? Do you mean, basically, that the commands like mkvm
won't work (which is just a wrapper around the VMWare SDK functions right?)
and that (with the free license) we would have to create the VMs manually?
But if so, even once they are created (manually) can xCAT deploy an image
to it via ESX (following the rsetboot, nodeset, rpower instructions in the
wiki) if we are using the free license? Or are you saying xCAT simply will
not be able to deploy anything to a VM at all with the free ESX license?

Thanks,
Josh Nielsen

On Sat, Dec 6, 2014 at 2:58 PM, Jarrod Johnson jarrod.b.john...@gmail.com
wrote:

 It will work, but the free license doesn't let us automate it.  It
 requires eval or commercial license sadly.  This is the case for older ones
 as well.

 On Thu, Dec 4, 2014 at 4:32 PM, Josh Nielsen jniel...@hudsonalpha.org
 wrote:

 Hello all,

 I ran into this issue when trying to create a vm I had defined in xCAT in
 an environment where we have two ESXi 5.5 standalone servers:

 # mkvm --verbose vm1 -s 100G
 Error: esx plugin bug, pid 16602, process description: 'xCATd SSL: mkvm
 to vm1 for root@localhost: esx instance' with error '
 SOAP Fault:
 ---
 Fault string: Current license or ESXi version prohibits execution of the
 requested operation.
 Fault detail: RestrictedVersionFault' while trying to fulfill request for
 the following nodes: vm1

 I installed version 5.5 update 2 of the VMWare Perl SDK (as is required)
 and I can contact the server with this cli command:

  vmware-cmd -H esxihost0001.morgan.haib.org -l

 The documentation only uses 4.1 as an example:
 http://sourceforge.net/p/xcat/wiki/XCAT_Virtualization_with_VMWare/.
 What is the latest version of ESXi that xCAT will work with? If it does
 support 5.5 do I need to configure some kind of key or license file
 locally? I already ran my variation on: chtab key=vmware
 passwd.username=root passwd.password=cluster for the xCAT passwd table.

 Regards,
 Josh Nielsen



[xcat-user] Will xCAT work with ESXi 5.5?

2014-12-04 Thread Josh Nielsen

Hello all,

I ran into this issue when trying to create a vm I had defined in xCAT in
an environment where we have two ESXi 5.5 standalone servers:

# mkvm --verbose vm1 -s 100G
Error: esx plugin bug, pid 16602, process description: 'xCATd SSL: mkvm to
vm1 for root@localhost: esx instance' with error '
SOAP Fault:
---
Fault string: Current license or ESXi version prohibits execution of the
requested operation.
Fault detail: RestrictedVersionFault' while trying to fulfill request for
the following nodes: vm1

I installed version 5.5 update 2 of the VMWare Perl SDK (as is required)
and I can contact the server with this cli command:

 vmware-cmd -H esxihost0001.morgan.haib.org -l

The documentation only uses 4.1 as an example:
http://sourceforge.net/p/xcat/wiki/XCAT_Virtualization_with_VMWare/. What
is the latest version of ESXi that xCAT will work with? If it does support
5.5 do I need to configure some kind of key or license file locally? I
already ran my variation on: chtab key=vmware passwd.username=root
passwd.password=cluster for the xCAT passwd table.

Regards,
Josh Nielsen

Re: [xcat-user] Good xCAT Book

2014-10-20 Thread Josh Nielsen

Hi John,

I had a similar experience to yours: the documentation only took me so far,
and the rest was left to my real-world implementations (mostly
troubleshooting when things didn't go as expected). xCAT has dependencies on
many other things that are assumed to be working already (like SNMP on
switches if you are using network discovery, to take one example among
many), and so has many moving pieces. I too found the Sumavi documentation
useful to an extent. I have over 20 pages of notes in a Google doc of my own
observations on specific xCAT things I have run into.

To add to your list of documentation though, I discovered this little gem
of a presentation given in 2011 by Jordi Caubet of IBM Spain:
http://www.bsc.es/media/4373.pdf. It may not cover everything you need but
I actually found that it had details about xCAT that you cannot find
elsewhere. For example, on slide 56 he shows, in a very useful picture, the
exact order in which deployment template scripts are parsed, which is
information not even to be found in the sourceforge wiki (maybe someone
reading this can fix that?).

Anyway, maybe that will give you something you are looking for.

Regards,
Josh Nielsen

On Thu, Oct 16, 2014 at 2:46 PM, Russell Jones russell-l...@jonesmail.me
wrote:

This is a good start. It's a little bit old and does some things in a
different way than I would, but the overall information in it is great:
http://sumavi.com/books/xcat-administrators-guide

On 10/16/2014 2:35 PM, John Hosie wrote:

I'm working on a government contract to support an HPC environment that
uses xCAT. I've spent over 7 1/2 months working with a document that really
doesn't say much more than how to get hardware support that was produced by
the vendor who did the initial installation. I've also used what I could
find in Google. But, to tell you the truth, none of it really seems to be
all that good. It tells me about this or that implementation, and can be
adapted to meet my needs, but it takes time and effort that seems to be
more than should be necessary.

Now, it may be better than it once was. I'll give you that. But I'd like
to know if there is a good published document out there I can look for,
electronic or pulp, that I can use to give me better guidance? What is
there out there? And are there any sites you've run across that might help
me to fill in the gaps in my current knowledge?

Don't get me wrong. I really do appreciate the pieces of documentation
I've found. I'd just like to find something that would help me to be able
to tie it all together better.

Sincerely,

*John W. Hosie III*

*jwh3*


Re: [xcat-user] Good xCAT Book

2014-10-20 Thread Josh Nielsen

Actually, it occurs to me to ask: Can Jordi's presentation be uploaded to
the sourceforge wiki so that it can benefit more people?

-Josh

On Mon, Oct 20, 2014 at 11:52 AM, Josh Nielsen jniel...@hudsonalpha.org
wrote:

Hi John,

To add to your list of documentation though, I discovered this little gem
of a presentation given in 2011 by Jordi Caubet of IBM Spain:
http://www.bsc.es/media/4373.pdf. It may not cover everything you need
but I actually found that it had details about xCAT that you cannot find
elsewhere. For example, on slide 56 he shows, in a very useful picture, the
exact order in which deployment template scripts are parsed, which is
information not even to be found in the sourceforge wiki (maybe someone
reading this can fix that?).

Anyway, maybe that will give you something you are looking for.

Regards,
Josh Nielsen

On Thu, Oct 16, 2014 at 2:46 PM, Russell Jones russell-l...@jonesmail.me
wrote:

This is a good start. It's a little bit old and does some things in a
different way than I would, but the overall information in it is great:
http://sumavi.com/books/xcat-administrators-guide

On 10/16/2014 2:35 PM, John Hosie wrote:

Don't get me wrong. I really do appreciate the pieces of documentation
I've found. I'd just like to find something that would help me to be able
to tie it all together better.

Sincerely,

*John W. Hosie III*

*jwh3*


Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e

2014-10-03 Thread Josh Nielsen

Lissa / Xiao,

I may have discovered a bug(?) in ddns.pm, unless it is a print
formatting issue. I traced the autovivification of the $ctx hash in ddns.pm
throughout the code, and then wrote a few lines of test code to descend
into all the hash values of $ctx and print them out (prefixed with the
string "CTX Key/Value").

For $ctx->{resolver} the value looks like it may not have executed the
Net::DNS::Resolver Perl call correctly, and instead treated the
Net::DNS::Resolver function as a literal string. What caught my eye is
that the string has a space between the two colons after Net (i.e.
"Net: :"). Here is what my code printed for the key/value of
$ctx->{resolver}:

CTX Key/Value ||resolver|| = Net: :DNS::Resolver=HASH(0x2ca6b08)

Maybe the actual Perl module call "Net: :DNS::Resolver" is meant to be
literally placed in the value of $ctx->{resolver} though, so I may be
mistaken. By contrast, however, some of the other key/value pairs under
$ctx looked like this:

CTX Key/Value ||domain|| = morgan.haib.org

CTX Key/Value ||forwarders|| = ARRAY(0x2b9b2d8)
  ||forwarders|| (ARRAY VALUE) 172.24.12.1
  ||forwarders|| (ARRAY VALUE) 172.24.12.2

The relevant code on line 723 of ddns.pm is:

    if ($external)
    {
        # based on /etc/resolv.conf
        $ctx->{resolver} = Net::DNS::Resolver->new();
    }
    else
    {
        # default to localhost
        $ctx->{resolver} =
            Net::DNS::Resolver->new(nameservers => ['127.0.0.1']);
    }

    my $ret = add_or_delete_records($ctx);
    unless ($ret) {
        xCAT::SvrUtils::sendmsg("DNS setup is completed", $callback);
    }

Is the value of $ctx->{resolver} normally supposed to be
"Net::DNS::Resolver=HASH(0x2ca6b08)" (once I remove the whitespace between
the two colons)? Or is it not supposed to include the Perl module call?
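
For reference, Perl prints an object (a blessed reference) in the form
Class=TYPE(0xADDRESS), so a value such as Net::DNS::Resolver=HASH(0x...) is
what printing a Net::DNS::Resolver object would normally produce, rather than
a literal string. The Python sketch below only illustrates how to read such a
value. The helper name parse_perl_ref is hypothetical, and it assumes the
space in "Net: :" is a display artifact rather than part of the real value.

```python
import re

# Perl's default string form of a blessed reference: Class=TYPE(0xADDRESS)
PERL_REF = re.compile(
    r"^(?P<cls>[A-Za-z_][\w:]*)=(?P<kind>[A-Z]+)\((?P<addr>0x[0-9a-f]+)\)$"
)


def parse_perl_ref(text):
    """Split a stringified Perl object reference into class, type and address.

    Returns None when the text is not in Class=TYPE(0x...) form.
    """
    # Undo the "Net: :DNS" spacing seen in the mail (assumed display artifact).
    normalized = re.sub(r":\s+:", "::", text.strip())
    match = PERL_REF.match(normalized)
    if match is None:
        return None
    return match.group("cls"), match.group("kind"), match.group("addr")


# The resolver value from the debug output parses as an object reference,
# while a plain string value such as the domain does not.
print(parse_perl_ref("Net: :DNS::Resolver=HASH(0x2ca6b08)"))
print(parse_perl_ref("morgan.haib.org"))
```

Read this way, the resolver entry is a hash-based object of class
Net::DNS::Resolver, while the domain entry is an ordinary string.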

Regards,
Josh

On Thu, Oct 2, 2014 at 1:47 PM, Josh Nielsen jniel...@hudsonalpha.org
wrote:

 Okay, so I moved to an RPM-based install of BIND instead of from source,
 and the problem did not go away. My setup is a development VM: I exported
 the current xCAT settings from my iDataPlex headnode and restored them on
 the dev VM, on which I had installed xCAT (definitely a newer version than
 on the headnode). When I first ran restorexCATdb I got errors when running
 'makedns' which said: *Ignoring host node0014, it does not belong to any
 nets defined in networks table or the net it belongs to is configured to
 use an external nameserver*.

 I thought: "That's odd", because the network definition looks fine to me and
 I don't see why xCAT would change it between versions. On the headnode the
 networks database looks like this for the relevant network (compute) and a
 secondary network (I snipped out the other entries):


 #netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,staticrange,staticrangeincrement,nodehostname,ddnsdomain,vlanid,domain,comments,disable

 compute,10.20.0.0,255.255.0.0,eth0,,,10.20.0.1,10.20.0.1,,,10.20.200.254-10.20.254.254
 10gig,10.60.0.0,255.255.0.0,,10.20.0.1,,

 When I restored the tables on the dev VM it would fail to parse "compute"
 but not "10gig" (the only obvious difference being that "10gig" had
 fewer fields filled in). I modified the ddns.pm code to echo messages when
 parsing the networks to determine what was going on, and once I discovered
 that it liked "10gig" but not "compute" I deleted all the extra fields in
 "compute" to make it match "10gig", and suddenly it would parse. Maybe it
 has to do with the dynamicrange field in the networks definition?

 Below are my code modifications (just adding sendmsg commands) and the
 corresponding output from a test execution:


     # exclude the nodes not belong to any nets defined in networks table
     #   because only the nets defined in networks table will be add
     #   zones later.
     my $found = 0;
     foreach (@networks)
     {
         xCAT::SvrUtils::sendmsg("ADDR is $addr", $callback);
         xCAT::SvrUtils::sendmsg("MASK is $_->{mask}", $callback);
         xCAT::SvrUtils::sendmsg("NETWORK is $_->{net}", $callback);
         if (xCAT::NetworkUtils->ishostinsubnet($addr, $_->{mask}, $_->{net}))
         {
             $found = 1;
             xCAT::SvrUtils::sendmsg("Found!", $callback);
         }
         else {
             xCAT::SvrUtils::sendmsg("Not Found!", $callback);
         }
     }

     if ($found)
     {
         push @nodes, $node;
         $ctx->{nodeips}->{$node}->{$addr} = 1;
     }
     else
     {
         unless ($node =~ /localhost/)
         {
             xCAT::SvrUtils::sendmsg(":Ignoring host $node, it does"
                 . " not belong to any nets defined in networks table or the net it belongs to"
                 . " is configured to use an external nameserver.", $callback

Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e

2014-10-02 Thread Josh Nielsen

Okay, so I moved to an RPM-based install of BIND instead of from source, and
the problem did not go away. My setup is a development VM: I exported the
current xCAT settings from my iDataPlex headnode and restored them on the dev
VM, on which I had installed xCAT (definitely a newer version than on the
headnode). When I first ran restorexCATdb I got errors when running
'makedns' which said: *Ignoring host node0014, it does not belong to any
nets defined in networks table or the net it belongs to is configured to use
an external nameserver*.

I thought: "That's odd", because the network definition looks fine to me and
I don't see why xCAT would change it between versions. On the headnode the
networks database looks like this for the relevant network (compute) and a
secondary network (I snipped out the other entries):

#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,staticrange,staticrangeincrement,nodehostname,ddnsdomain,vlanid,domain,comments,disable
compute,10.20.0.0,255.255.0.0,eth0,,,10.20.0.1,10.20.0.1,,,10.20.200.254-10.20.254.254
10gig,10.60.0.0,255.255.0.0,,10.20.0.1,,

When I restored the tables on the dev VM it would fail to parse "compute"
but not "10gig" (the only obvious difference being that "10gig" had
fewer fields filled in). I modified the ddns.pm code to echo messages when
parsing the networks to determine what was going on, and once I discovered
that it liked "10gig" but not "compute" I deleted all the extra fields in
"compute" to make it match "10gig", and suddenly it would parse. Maybe it
has to do with the dynamicrange field in the networks definition?
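
To make the difference between the two rows easier to see, here is a minimal
Python sketch that lines each value up with its column name from the header.
It uses only the header and the two rows shown above. The helper names
parse_tabdump and filled_columns are made up for illustration and are not
part of xCAT.

```python
import csv
import io

# Header and rows copied from the networks table output quoted above.
TABDUMP = """\
#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,staticrange,staticrangeincrement,nodehostname,ddnsdomain,vlanid,domain,comments,disable
compute,10.20.0.0,255.255.0.0,eth0,,,10.20.0.1,10.20.0.1,,,10.20.200.254-10.20.254.254
10gig,10.60.0.0,255.255.0.0,,10.20.0.1,,
"""


def parse_tabdump(text):
    """Return one dict per row, mapping column name to value ('' when absent)."""
    rows = list(csv.reader(io.StringIO(text)))
    header = [name.lstrip("#") for name in rows[0]]
    parsed = []
    for row in rows[1:]:
        if not row:
            continue  # skip blank lines
        # Short rows simply omit trailing columns; pad them with empty strings.
        padded = row + [""] * (len(header) - len(row))
        parsed.append(dict(zip(header, padded)))
    return parsed


def filled_columns(entry):
    """List the column names that carry a non-empty value."""
    return [name for name, value in entry.items() if value]


networks = parse_tabdump(TABDUMP)
for entry in networks:
    print(entry["netname"], filled_columns(entry))
```

With the rows as pasted, "compute" fills mgtifname, tftpserver, nameservers
and dynamicrange in addition to the three fields both rows share, while
"10gig" only adds gateway.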

Below are my code modifications (just adding sendmsg commands) and the
corresponding output from a test execution:


    # exclude the nodes not belong to any nets defined in networks table
    #   because only the nets defined in networks table will be add
    #   zones later.
    my $found = 0;
    foreach (@networks)
    {
        xCAT::SvrUtils::sendmsg("ADDR is $addr", $callback);
        xCAT::SvrUtils::sendmsg("MASK is $_->{mask}", $callback);
        xCAT::SvrUtils::sendmsg("NETWORK is $_->{net}", $callback);
        if (xCAT::NetworkUtils->ishostinsubnet($addr, $_->{mask}, $_->{net}))
        {
            $found = 1;
            xCAT::SvrUtils::sendmsg("Found!", $callback);
        }
        else {
            xCAT::SvrUtils::sendmsg("Not Found!", $callback);
        }
    }

    if ($found)
    {
        push @nodes, $node;
        $ctx->{nodeips}->{$node}->{$addr} = 1;
    }
    else
    {
        unless ($node =~ /localhost/)
        {
            xCAT::SvrUtils::sendmsg(":Ignoring host $node, it does"
                . " not belong to any nets defined in networks table or the net it belongs to"
                . " is configured to use an external nameserver.", $callback);
        }
    }

---

My test run:

# makedns -e node0014
Handling node0014 in /etc/hosts.
ADDR is 10.20.101.14
MASK is
NETWORK is
Not Found!

ADDR is 10.20.101.14
MASK is
NETWORK is
Not Found!

ADDR is 10.20.101.14
MASK is 255.255.0.0
NETWORK is 10.40.0.0
Not Found!

ADDR is 10.20.101.14
MASK is 255.255.0.0
NETWORK is 10.20.0.0
Found!
-
ADDR is 10.20.101.14
MASK is 255.255.0.0
NETWORK is 10.60.0.0
Not Found!
-


As you can see, some other networks are being parsed as blank (the first
two encountered in the foreach loop), but after I modified the compute/
10.20.0.0 network it finally printed my "Found!" message (previously
it was also parsing as blank). I suspect that there are other lurking
problems in the tables between the two xCAT versions due to the
backup/restore of the databases, which might be causing this problem. I'm
not sure how to address this without a full reinstall of xCAT from scratch
if a backup/restore between versions is not possible/compatible. I'm
exploring alternatives though.
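
The membership test that the debug output walks through can be reproduced
outside xCAT. The sketch below uses Python's standard ipaddress module as a
stand-in; it is not how xCAT::NetworkUtils->ishostinsubnet is implemented, and
the function name host_in_subnet is made up for illustration. The network
list mirrors the five entries printed in the test run, including the two that
came back blank.

```python
import ipaddress


def host_in_subnet(addr, mask, net):
    """Return True if addr lies in the network described by net and mask.

    Blank or malformed net/mask values are treated as "no match".
    """
    try:
        network = ipaddress.ip_network(f"{net}/{mask}", strict=False)
        return ipaddress.ip_address(addr) in network
    except ValueError:
        return False


# Same order as the test run: two blank entries, then three real networks.
networks = [
    {"net": "", "mask": ""},
    {"net": "", "mask": ""},
    {"net": "10.40.0.0", "mask": "255.255.0.0"},
    {"net": "10.20.0.0", "mask": "255.255.0.0"},
    {"net": "10.60.0.0", "mask": "255.255.0.0"},
]

for entry in networks:
    found = host_in_subnet("10.20.101.14", entry["mask"], entry["net"])
    print(entry["net"] or "(blank)", "Found!" if found else "Not Found!")
```

Under these assumptions only the 10.20.0.0 entry matches, and a row whose net
and mask come back empty can never match any node address, which is
consistent with the "Ignoring host" message seen before the table was edited.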

Regards,
Josh


On Wed, Oct 1, 2014 at 10:45 AM, Josh Nielsen jniel...@hudsonalpha.org
wrote:

 So I'm thinking that either:

 A) My DNS server (BIND) is somehow misconfigured (which may include some
 bug when BIND is compiled from the latest source) - even though the keys
 obviously work in some instances and I am indeed getting partial remote
 updates successfully.
 - OR -
 B) There is a bug in makedns or the underlying config/tools it uses on the
 OS. Possibly there was some unclean transfer or mismatch of settings when I
 used dumpxCATdb/restorexCATdb, even though I updated the site table and
 have my resolv.conf pointing only to the external DNS server. Even so, as I
 pointed out, some of the entries are being correctly sent with the key, but
 I am occasionally seeing "request is not signed" for some requests OR I am
 not seeing the request for a forward lookup entry at all (which

Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e

2014-10-01 Thread Josh Nielsen

So I'm thinking that either:

A) My DNS server (BIND) is somehow misconfigured (which may include some
bug when BIND is compiled from the latest source) - even though the keys
obviously work in some instances and I am indeed getting partial remote
updates successfully.
- OR -
B) There is a bug in makedns or the underlying config/tools it uses on the
OS. Possibly there was some unclean transfer or mismatch of settings when I
used dumpxCATdb/restorexCATdb, even though I updated the site table and
have my resolv.conf pointing only to the external DNS server. Even so, as I
pointed out, some of the entries are being correctly sent with the key, but
I am occasionally seeing "request is not signed" for some requests OR I am
not seeing the request for a forward lookup entry at all (which is what
happened the first few times I tried).

I just can't tell whether it is a sending problem (makedns/nsupdate) or a
receiving problem (external BIND server).

-Josh

On Tue, Sep 30, 2014 at 10:29 AM, Josh Nielsen jniel...@hudsonalpha.org
wrote:

 Xiao,

 For some additional details, I am running BIND 9.10.0-P2, which I compiled
 from source. I used --enable-largefile, which specifies 64-bit file
 support, but I noticed this in the kernel boot messages in
 /var/log/messages:

 Sep 29 11:25:25 dns01 kernel: warning: `named' uses 32-bit capabilities
 (legacy support in use)

 When I start named in the foreground here are the first few lines, where
 you can see my compile options and a few things about startup:

 30-Sep-2014 09:59:20.672 built with '--prefix=/opt/bind9'
 '--sysconfdir=/etc' '--with-gtest' '--with-log4cplus=/opt/log4cplus'
 '--with-pythonpath=/usr/bin/python' '--localstatedir=/var'
 '--mandir=/usr/share/man' '--enable-threads' '--enable-largefile'
 '--with-libtool' '--disable-static' '--with-openssl'
 30-Sep-2014 09:59:20.672
 
 30-Sep-2014 09:59:20.672 BIND 9 is maintained by Internet Systems
 Consortium,
 30-Sep-2014 09:59:20.672 Inc. (ISC), a non-profit 501(c)(3) public-benefit
 30-Sep-2014 09:59:20.672 corporation.  Support and training for BIND 9 are
 30-Sep-2014 09:59:20.672 available at https://www.isc.org/support
 30-Sep-2014 09:59:20.672
 
 30-Sep-2014 09:59:20.672 adjusted limit on open files from 4096 to 1048576
 30-Sep-2014 09:59:20.672 found 1 CPU, using 1 worker thread
 30-Sep-2014 09:59:20.672 using 1 UDP listener per interface
 30-Sep-2014 09:59:20.672 using up to 4096 sockets
 30-Sep-2014 09:59:20.672 Registering DLZ_dlopen driver
 30-Sep-2014 09:59:20.672 Registering SDLZ driver 'dlopen'
 30-Sep-2014 09:59:20.672 Registering DLZ driver 'dlopen'
 30-Sep-2014 09:59:20.676 decrement_reference: delete from rbt:
 0x7ff551ecd148 .
 30-Sep-2014 09:59:20.678 socket 0x7ff551ede010: created
 30-Sep-2014 09:59:20.678 sockmgr 0x7ff551ec6010: watcher got message -3
 for socket 20
 30-Sep-2014 09:59:20.678 sockmgr 0x7ff551ec6010: watcher got message -2
 for socket -1
 30-Sep-2014 09:59:20.678 socket 0x7ff551ede010: socket_recv: event
 0x7ff551ee0010 - task 0x7ff551ed9250
 30-Sep-2014 09:59:20.678 loading configuration from '/etc/named.conf'
 30-Sep-2014 09:59:20.679 reading built-in trusted keys from file
 '/etc/bind.keys'

 I've been planning to move to an RPM based installation rolled out with
 CentOS but was testing with a source install first. Does any of this look
 like a likely culprit for the problems I'm seeing?

 Thanks,
 Josh

 On Mon, Sep 29, 2014 at 9:40 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

 Then, in this case, did you check that the 'allow-update' for key xcat_key
 has been set correctly in named.conf for zone morgan.haib.org? And did you
 see any useful message for this error in the syslog of the external server?

 Thanks
 Best Regards
 --
 Wang Xiaopeng (王晓朋)
 IBM China System Technology Laboratory
 Tel: 86-10-82453455
 Email: w...@cn.ibm.com
 Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
 Haidian District Beijing P.R.China 100193


 From: Josh Nielsen jniel...@hudsonalpha.org
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
 Date: 2014/09/30 06:49
 Subject: Re: [xcat-user] Questions on prerequisites for external DNS and
 makedns -e
 --



 Okay, I have the external DNS server working: partly. For some very odd
 reason the external DNS server is only receiving the request to enter the
 reverse lookup for a new node (ex: makedns -e node0014), but does not even
 attempt to add the forward lookup zone. I see the key authorization passed
 successfully for the reverse entry, so there are no authentication issues

Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e

2014-09-30 Thread Josh Nielsen

Hi Xiao,

Here is the relevant zone:

zone "morgan.haib.org" in {
    type master;
    file "morgan/db.morgan.haib.org";
    allow-transfer {
        10.20.0.100/32;
        10.20.0.101/32;
    };
    allow-update {
        key xcat_key;
    };
    notify yes;
};

Its allow-update looks the same as the one for zone
"20.10.IN-ADDR.ARPA". I saw no messages in /var/log/messages
concerning named.

But I did just try it again this morning and the opposite happened:
'morgan.haib.org' updated but "20.10.IN-ADDR.ARPA" did not. So the
issue is very inconsistent, other than that one of the two entries
gets omitted. See below (which I have shortened some), and notice
where it says "request is not signed" for the 20.10 zone, but it
signed (this time!) the morgan.haib.org zone:

30-Sep-2014 10:01:39.446 socket 0x7ff551edebc8 172.26.42.60#58640:
packet received correctly
30-Sep-2014 10:01:39.446 client 172.26.42.60#58640: UDP request
30-Sep-2014 10:01:39.446 client 172.26.42.60#58640: using view '_default'
30-Sep-2014 10:01:39.446 client 172.26.42.60#58640: request is not signed
30-Sep-2014 10:01:39.446 client 172.26.42.60#58640: recursion available
30-Sep-2014 10:01:39.446 client 172.26.42.60#58640: query
30-Sep-2014 10:01:39.446 client 172.26.42.60#58640
(20.10.IN-ADDR.ARPA): ns_client_attach: ref = 1
30-Sep-2014 10:01:39.446 client 172.26.42.60#58640
(20.10.IN-ADDR.ARPA): query '20.10.IN-ADDR.ARPA/NS/IN' approved
30-Sep-2014 10:01:39.446 client 172.26.42.60#58640 (20.10.IN-ADDR.ARPA): send
30-Sep-2014 10:01:39.446 client 172.26.42.60#58640 (20.10.IN-ADDR.ARPA): sendto
30-Sep-2014 10:01:39.446 client 172.26.42.60#58640
(20.10.IN-ADDR.ARPA): senddone
30-Sep-2014 10:01:39.446 client 172.26.42.60#58640 (20.10.IN-ADDR.ARPA): next
30-Sep-2014 10:01:39.446 client 172.26.42.60#58640
(20.10.IN-ADDR.ARPA): ns_client_detach: ref = 0
30-Sep-2014 10:01:39.446 client 172.26.42.60#58640
(20.10.IN-ADDR.ARPA): endrequest
30-Sep-2014 10:01:39.446 client @0x7ff5480d7710: udprecv
...
...
30-Sep-2014 10:01:39.454 socket 0x7ff551edebc8 172.26.42.60#42345:
packet received correctly
30-Sep-2014 10:01:39.454 client 172.26.42.60#42345: UDP request
30-Sep-2014 10:01:39.454 client 172.26.42.60#42345: using view '_default'
30-Sep-2014 10:01:39.455 client 172.26.42.60#42345: request has valid
signature: xcat_key
30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key:
recursion available
30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key: update
30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key:
ns_client_attach: ref = 1
30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key:
updating zone 'morgan.haib.org/IN': prerequisites are OK
30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key:
signer xcat_key approved
30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key:
update 'morgan.haib.org/IN' approved
30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key:
updating zone 'morgan.haib.org/IN': update section prescan OK
30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key:
updating zone 'morgan.haib.org/IN': adding an RR at
'node0014.morgan.haib.org' A 10.20.101.14
30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key:
updating zone 'morgan.haib.org/IN': checking for NSEC3PARAM changes
30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key:
updating zone 'morgan.haib.org/IN': writing journal
morgan/db.morgan.haib.org.jnl
30-Sep-2014 10:01:39.455 writing to journal
30-Sep-2014 10:01:39.455 del morgan.haib.org. 86400 IN SOA
dns01.morgan.haib.org. root.dns01.morgan.haib.org. 2012080935 10800
3600 604800 86400
30-Sep-2014 10:01:39.455 add morgan.haib.org. 86400 IN SOA
dns01.morgan.haib.org. root.dns01.morgan.haib.org. 2012080936 10800
3600 604800 86400
30-Sep-2014 10:01:39.455 add node0014.morgan.haib.org. 86400 IN A 10.20.101.14
30-Sep-2014 10:01:39.476 client 172.26.42.60#42345/key xcat_key:
updating zone 'morgan.haib.org/IN': committing update transaction
30-Sep-2014 10:01:39.477 zone_needdump: zone morgan.haib.org/IN: enter
30-Sep-2014 10:01:39.477 zone_settimer: zone morgan.haib.org/IN: enter
30-Sep-2014 10:01:39.477 zone_settimer: zone morgan.haib.org/IN: enter
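
When comparing runs like this, it can help to reduce the debug output to
one entry per client request. The following is a rough Python sketch, not
anything that xCAT or BIND ships. It assumes each log entry sits on a
single line (unwrapped, unlike the mail-wrapped excerpt above) and that the
'client <ip>#<port>' prefix identifies one request. The function name
summarize_requests and the SAMPLE text are made up for illustration.

```python
import re
from collections import OrderedDict

# Matches the "client <ip>#<port>" prefix that named prints at debug level.
CLIENT_RE = re.compile(r"client (\d+\.\d+\.\d+\.\d+)#(\d+)")


def summarize_requests(log_text):
    """Group log lines by client ip#port; record signing status and request kind."""
    summary = OrderedDict()
    for line in log_text.splitlines():
        match = CLIENT_RE.search(line)
        if not match:
            continue
        key = "%s#%s" % match.groups()
        entry = summary.setdefault(key, {"signed": None, "kind": None})
        if "request is not signed" in line:
            entry["signed"] = False
        elif "request has valid signature" in line:
            entry["signed"] = True
        # The request kind is logged as a bare ": query" or ": update" line.
        stripped = line.rstrip()
        if stripped.endswith(": query"):
            entry["kind"] = "query"
        elif stripped.endswith(": update"):
            entry["kind"] = "update"
    return summary


# Hypothetical, unwrapped sample modelled on the excerpt above.
SAMPLE = """\
30-Sep-2014 10:01:39.446 client 172.26.42.60#58640: UDP request
30-Sep-2014 10:01:39.446 client 172.26.42.60#58640: request is not signed
30-Sep-2014 10:01:39.446 client 172.26.42.60#58640: query
30-Sep-2014 10:01:39.455 client 172.26.42.60#42345: request has valid signature: xcat_key
30-Sep-2014 10:01:39.455 client 172.26.42.60#42345/key xcat_key: update
"""

if __name__ == "__main__":
    for client, info in summarize_requests(SAMPLE).items():
        print(client, info)
```

Read this way, the unsigned request from port 58640 in the excerpt is
logged as a plain query for the 20.10.IN-ADDR.ARPA NS record, while the
signed request from port 42345 is an update. That suggests (though it does
not prove) that the unsigned entry is a lookup rather than a rejected
update.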

I'll send another email with other possibly relevant details to keep
this one's length down.

-Josh

On Mon, Sep 29, 2014 at 9:40 PM, Xiao Peng Wang w...@cn.ibm.com wrote:
 Then, in this case, did you check that the 'allow' entry for key xcat_key has
 been set correctly in named.conf for zone morgan.haib.org? And did you see any
 useful message for this error in the syslog of the external server?

 Thanks
 Best Regards
 --
 Wang Xiaopeng (王晓朋)
 IBM China System Technology Laboratory
 Tel: 86-10-82453455
 Email: w...@cn.ibm.com
 Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian
 District Beijing P.R.China 100193


Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e

2014-09-30 Thread Josh Nielsen

Xiao,

For some additional details: I am running BIND 9.10.0-P2, which I compiled
from source. I used --enable-largefile, which specifies 64-bit file
support, but I noticed this in the kernel boot messages in
/var/log/messages:

Sep 29 11:25:25 dns01 kernel: warning: `named' uses 32-bit capabilities
(legacy support in use)

When I start named in the foreground here are the first few lines, where
you can see my compile options and a few things about startup:

30-Sep-2014 09:59:20.672 built with '--prefix=/opt/bind9'
'--sysconfdir=/etc' '--with-gtest' '--with-log4cplus=/opt/log4cplus'
'--with-pythonpath=/usr/bin/python' '--localstatedir=/var'
'--mandir=/usr/share/man' '--enable-threads' '--enable-largefile'
'--with-libtool' '--disable-static' '--with-openssl'
30-Sep-2014 09:59:20.672

30-Sep-2014 09:59:20.672 BIND 9 is maintained by Internet Systems
Consortium,
30-Sep-2014 09:59:20.672 Inc. (ISC), a non-profit 501(c)(3) public-benefit
30-Sep-2014 09:59:20.672 corporation.  Support and training for BIND 9 are
30-Sep-2014 09:59:20.672 available at https://www.isc.org/support
30-Sep-2014 09:59:20.672

30-Sep-2014 09:59:20.672 adjusted limit on open files from 4096 to 1048576
30-Sep-2014 09:59:20.672 found 1 CPU, using 1 worker thread
30-Sep-2014 09:59:20.672 using 1 UDP listener per interface
30-Sep-2014 09:59:20.672 using up to 4096 sockets
30-Sep-2014 09:59:20.672 Registering DLZ_dlopen driver
30-Sep-2014 09:59:20.672 Registering SDLZ driver 'dlopen'
30-Sep-2014 09:59:20.672 Registering DLZ driver 'dlopen'
30-Sep-2014 09:59:20.676 decrement_reference: delete from rbt:
0x7ff551ecd148 .
30-Sep-2014 09:59:20.678 socket 0x7ff551ede010: created
30-Sep-2014 09:59:20.678 sockmgr 0x7ff551ec6010: watcher got message -3 for
socket 20
30-Sep-2014 09:59:20.678 sockmgr 0x7ff551ec6010: watcher got message -2 for
socket -1
30-Sep-2014 09:59:20.678 socket 0x7ff551ede010: socket_recv: event
0x7ff551ee0010 - task 0x7ff551ed9250
30-Sep-2014 09:59:20.678 loading configuration from '/etc/named.conf'
30-Sep-2014 09:59:20.679 reading built-in trusted keys from file
'/etc/bind.keys'

I've been planning to move to an RPM based installation rolled out with
CentOS but was testing with a source install first. Does any of this look
like a likely culprit for the problems I'm seeing?

Thanks,
Josh

On Mon, Sep 29, 2014 at 9:40 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

 Then, in this case, did you check that the 'allow' entry for key xcat_key has
 been set correctly in named.conf for zone morgan.haib.org? And did you see any
 useful message for this error in the syslog of the external server?

 Thanks
 Best Regards
 --
 Wang Xiaopeng (王晓朋)
 IBM China System Technology Laboratory
 Tel: 86-10-82453455
 Email: w...@cn.ibm.com
 Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
 Haidian District Beijing P.R.China 100193


 From: Josh Nielsen jniel...@hudsonalpha.org
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
 Date: 2014/09/30 06:49
 Subject: Re: [xcat-user] Questions on prerequisites for external DNS and
 makedns -e
 --



 Okay, I have the external DNS server working: partly. For some very odd
 reason the external DNS server is only receiving the request to enter the
 reverse lookup for a new node (ex: makedns -e node0014), but does not even
 attempt to add the forward lookup zone. I see the key authorization passed
 successfully for the reverse entry, so there are no authentication issues,
 and when I try a manual 'nsupdate' it adds the forward lookup definition
 just fine:

 # nsupdate -k /etc/rndc.key
  server [external_dns_ip]
  prereq nxdomain node0014.morgan.haib.org.
  update add node0014.morgan.haib.org. 300 A 10.20.101.14
  send
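
For reference, the same kind of nsupdate input can be generated for both
the forward and the reverse record. This is only a sketch under stated
assumptions: the helper name nsupdate_script and the server address
192.0.2.53 are placeholders, and the two records are sent in separate
batches because they belong to different zones. The generated text is
meant to be fed to 'nsupdate -k /etc/rndc.key' on standard input.

```python
import ipaddress


def nsupdate_script(server, fqdn, ip, ttl=300):
    """Return nsupdate input that adds an A record and the matching PTR record."""
    name = fqdn if fqdn.endswith(".") else fqdn + "."
    # reverse_pointer gives e.g. '14.101.20.10.in-addr.arpa' for 10.20.101.14
    ptr_name = ipaddress.ip_address(ip).reverse_pointer + "."
    lines = [
        "server %s" % server,
        "update add %s %d A %s" % (name, ttl, ip),
        "send",  # forward zone update goes out on its own
        "update add %s %d PTR %s" % (ptr_name, ttl, name),
        "send",  # reverse zone update goes out separately
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 192.0.2.53 is a documentation address standing in for the external DNS IP.
    print(nsupdate_script("192.0.2.53", "node0014.morgan.haib.org", "10.20.101.14"))
```

Sending the two updates by hand like this is a way to test whether the
external server accepts signed updates for both zones independently of
makedns.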

 It looks like 'makedns -e node0014' is somehow sending ONLY the reverse
 lookup definition (though this is only a guess). This is what the client
 sees:

 [root@JNDev ~]# makedns -e node0014
 Handling node0014 in /etc/hosts.
 Getting reverse zones, this may take several minutes for a large cluster.
 Completed getting reverse zones.
 Updating DNS records, this may take several minutes for a large cluster.
 Error: No reply received when sending DNS update to zone morgan.haib.org.
 Completed updating DNS records.

 It updates my reverse zone '20.10.IN-ADDR.ARPA' in the file 'db.10.20' but
 for the DNS zone 'morgan.haib.org' I see the
 message: Error: No reply received when sending DNS update to zone

Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e

2014-09-26 Thread Josh Nielsen

Thank you for the clarifications. I read the wiki entry on Cluster Name
Resolution but wanted to be sure of the particulars.

Regards,
Josh

On Thu, Sep 25, 2014 at 10:36 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

 #1, you are right, you need to prepare the configuration files on the
 external dns server.

 #2, the xcat_key is taken from the passwd table, like this:
 'omapi,xcat_key,TjFYM2kwTUNOcWVVZG5QNWFhb2xPVkg1eTZLMXpuSGs='
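
To make the relationship concrete, a passwd-table row of that shape can be
turned into a key block for the external server's named.conf. This is a
hedged sketch, not xCAT code: the function name key_stanza is invented, the
row is assumed to be 'key,username,password' as shown above, and the
hmac-md5 algorithm is an assumption that should be replaced with whatever
algorithm the key was actually generated for.

```python
import base64


def key_stanza(passwd_row, algorithm="hmac-md5"):
    """Render a named.conf key block from a row like 'omapi,xcat_key,<secret>'."""
    _table_key, name, secret = passwd_row.strip().strip("'").split(",", 2)
    # Raises binascii.Error if the secret is not well-formed base64.
    base64.b64decode(secret, validate=True)
    return (
        'key "%s" {\n'
        "    algorithm %s;\n"  # assumption: must match how the key was created
        '    secret "%s";\n'
        "};\n" % (name, algorithm, secret)
    )


if __name__ == "__main__":
    row = "omapi,xcat_key,TjFYM2kwTUNOcWVVZG5QNWFhb2xPVkg1eTZLMXpuSGs="
    print(key_stanza(row))
```

The secret in the key block on the external server has to match the one
the management node uses, otherwise signed updates will not validate.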



 Thanks
 Best Regards
 --
 Wang Xiaopeng (王晓朋)
 IBM China System Technology Laboratory
 Tel: 86-10-82453455
 Email: w...@cn.ibm.com
 Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
 Haidian District Beijing P.R.China 100193


 From: Josh Nielsen jniel...@hudsonalpha.org
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
 Date: 2014/09/25 21:58
 Subject: Re: [xcat-user] Questions on prerequisites for external DNS and
 makedns -e
 --



 Hi Xiao,

 Thanks for the response. So if I have this straight, I need to manually
 create and maintain the zone files on the external DNS server, and already
 have initially empty (but present) zone files in place. Is that right? And
 I imagine that I will need to allow-update from the IP of the xCAT server
 too.

 Secondly, I still don't quite understand about the xcat_key. On the server
 hosting xCAT the xcat_key is typically defined in /etc/rndc.key and in
 /etc/named.conf. I should not even need /etc/named.conf since I am not
 hosting DNS on the same server as xCAT correct? Or does makedns have some
 sort of dependency on a local /etc/named.conf even when pushing externally?
 If /etc/named.conf is not needed, however, then where does makedns (which
 I think ends up running 'nsupdate') read the key from?

 Note that I have already configured the xcat_key on the external DNS
 server in its own /etc/named.conf, but I am not asking about that host but
 rather the xCAT server (which for purposes of DNS I am calling the
 client).

 Thanks,
 Josh

 On Wed, Sep 24, 2014 at 8:59 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

You can NOT depend on xCAT to set up the dns server on a remote server. You
need to get the xcat_key from the local dns configuration.

You need to prepare the remote dns so that xCAT can push the new
dns entry to the remote dns server; this is what xCAT 'makedns -e' does.

Thanks
Best Regards
--
Wang Xiaopeng (王晓朋)
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: w...@cn.ibm.com
Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
Haidian District Beijing P.R.China 100193


From: Josh Nielsen jniel...@hudsonalpha.org
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 2014/09/25 05:46
Subject: Re: [xcat-user] Questions on prerequisites for external DNS
and makedns -e
--



I'm starting to see an older conversation that I had back in January
of this year in a different light now:
http://permalink.gmane.org/gmane.comp.clustering.xcat.user/182

There it was suggested that I be aware that you may need to run a
separate makedns to populate the local zone files as well as makedns
-e to do the external name servers. Does this mean that an xCAT
installation is required on the external DNS server just to populate
the zones initially? Wouldn't having two xCAT installations, whose
records you have to keep in sync, get tedious after a while? Maybe I'm
misunderstanding how the external DNS solution is supposed to work in
connection with xCAT. Any clarifications are welcome.

-Josh

On Wed, Sep 24, 2014 at 4:03 PM, Josh Nielsen jniel...@hudsonalpha.org wrote:
 Hello all,

 I am in the process of trying to move to an external DNS
implementation for
 name resolution with our compute cluster. The only requirement I see
in the
 man page for makedns is to have one (and only one?) IP of the
external DNS
 server that you want to update in /etc/resolv.conf and also a valid
 xcat_key.

 Firstly

Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e

2014-09-25 Thread Josh Nielsen

Hi Xiao,

Thanks for the response. So if I have this straight, I need to manually
create and maintain the zone files on the external DNS server, and already
have initially empty (but present) zone files in place. Is that right? And
I imagine that I will need to allow-update from the IP of the xCAT server
too.

Secondly, I still don't quite understand about the xcat_key. On the server
hosting xCAT the xcat_key is typically defined in /etc/rndc.key and in
/etc/named.conf. I should not even need /etc/named.conf since I am not
hosting DNS on the same server as xCAT correct? Or does makedns have some
sort of dependency on a local /etc/named.conf even when pushing externally?
If /etc/named.conf is not needed, however, then where does makedns (which
I think ends up running 'nsupdate') read the key from?

Note that I have already configured the xcat_key on the external DNS server
in its own /etc/named.conf, but I am not asking about that host but rather
the xCAT server (which for purposes of DNS I am calling the client).

Thanks,
Josh

On Wed, Sep 24, 2014 at 8:59 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

 You can NOT depend on xCAT to set up the dns server on a remote server. You
 need to get the xcat_key from the local dns configuration.

 You need to prepare the remote dns so that xCAT can push the new dns
 entry to the remote dns server; this is what xCAT 'makedns -e' does.

 Thanks
 Best Regards
 --
 Wang Xiaopeng (王晓朋)
 IBM China System Technology Laboratory
 Tel: 86-10-82453455
 Email: w...@cn.ibm.com
 Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
 Haidian District Beijing P.R.China 100193


 From: Josh Nielsen jniel...@hudsonalpha.org
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
 Date: 2014/09/25 05:46
 Subject: Re: [xcat-user] Questions on prerequisites for external DNS and
 makedns -e
 --



 I'm starting to see an older conversation that I had back in January
 of this year in a different light now:
 http://permalink.gmane.org/gmane.comp.clustering.xcat.user/182

 There it was suggested that I be aware that you may need to run a
 separate makedns to populate the local zone files as well as makedns
 -e to do the external name servers. Does this mean that an xCAT
 installation is required on the external DNS server just to populate
 the zones initially? Wouldn't having two xCAT installations, whose
 records you have to keep in sync, get tedious after a while? Maybe I'm
 misunderstanding how the external DNS solution is supposed to work in
 connection with xCAT. Any clarifications are welcome.

 -Josh

 On Wed, Sep 24, 2014 at 4:03 PM, Josh Nielsen jniel...@hudsonalpha.org
 wrote:
  Hello all,
 
  I am in the process of trying to move to an external DNS implementation
 for
  name resolution with our compute cluster. The only requirement I see in
 the
  man page for makedns is to have one (and only one?) IP of the external
 DNS
  server that you want to update in /etc/resolv.conf and also a valid
  xcat_key.
 
  Firstly, how do you specify the xcat_key to be used on the client server
  that is pushing out the dns changes with makedns -e? For a local setup
 the
  key definition in /etc/named.conf is sufficient, but since makedns -e
 isn't
  dependent on the local config (or is it?), how is the xcat_key specified?
  The local /etc/named.conf file doesn't even need to exist on the client
 does
  it?
 
  Secondly, the server I am pushing to only has a 127.0.0 zone since it is
 a
  fresh BIND install and I'm wanting 'makedns -e' to create the new zones
 (and
  zone files under /var/named/) for me on the remote server. With a local
 DNS
  setup, makedns would parse your settings and handle all the file updates
 and
  creation for you. But when I try makedns -e I see the following  but no
  files or zone updates (is this an xcat_key problem?):
 
  (I ran the remote DNS BIND daemon in the foreground with -d 60 verbosity)
 
  24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: UDP request
  24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: using view '_default'
  24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: request is not signed
  24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: recursion available
  24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: query
  24-Sep-2014 15:30:14.686 client 172.26.42.60#56844
  (9.101.20.10.IN-ADDR.ARPA): ns_client_attach: ref = 1
  24-Sep-2014 15:30:14.687 client 172.26.42.60#56844
  (9.101.20.10.IN-ADDR.ARPA): query '9.101.20.10.IN-ADDR.ARPA/NS/IN'
 approved
  24-Sep-2014 15:30:14.687 client 172.26.42.60#56844
  (9.101.20.10.IN-ADDR.ARPA): send
  24-Sep-2014 15:30

[xcat-user] Questions on prerequisites for external DNS and makedns -e

2014-09-24 Thread Josh Nielsen

Hello all,

I am in the process of trying to move to an external DNS implementation for
name resolution with our compute cluster. The only requirement I see in the
man page for makedns is to have one (and only one?) IP of the external DNS
server that you want to update in /etc/resolv.conf and also a valid
xcat_key.

Firstly, how do you specify the xcat_key to be used on the client server
that is pushing out the dns changes with makedns -e? For a local setup the
key definition in /etc/named.conf is sufficient, but since makedns -e isn't
dependent on the local config (or is it?), how is the xcat_key specified?
The local /etc/named.conf file doesn't even need to exist on the client
does it?

Secondly, the server I am pushing to only has a 127.0.0 zone since it is a
fresh BIND install and I'm wanting 'makedns -e' to create the new zones
(and zone files under /var/named/) for me on the remote server. With a
local DNS setup, makedns would parse your settings and handle all the file
updates and creation for you. But when I try makedns -e I see the following
 but no files or zone updates (is this an xcat_key problem?):

(I ran the remote DNS BIND daemon in the foreground with -d 60 verbosity)

24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: UDP request
24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: using view '_default'
24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: request is not signed
24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: recursion available
24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: query
24-Sep-2014 15:30:14.686 client 172.26.42.60#56844
(9.101.20.10.IN-ADDR.ARPA): ns_client_attach: ref = 1
24-Sep-2014 15:30:14.687 client 172.26.42.60#56844
(9.101.20.10.IN-ADDR.ARPA): query '9.101.20.10.IN-ADDR.ARPA/NS/IN' approved
24-Sep-2014 15:30:14.687 client 172.26.42.60#56844
(9.101.20.10.IN-ADDR.ARPA): send
24-Sep-2014 15:30:14.687 client 172.26.42.60#56844
(9.101.20.10.IN-ADDR.ARPA): sendto
24-Sep-2014 15:30:14.687 client 172.26.42.60#56844
(9.101.20.10.IN-ADDR.ARPA): senddone
24-Sep-2014 15:30:14.687 client 172.26.42.60#56844
(9.101.20.10.IN-ADDR.ARPA): next
24-Sep-2014 15:30:14.687 client 172.26.42.60#56844
(9.101.20.10.IN-ADDR.ARPA): ns_client_detach: ref = 0
24-Sep-2014 15:30:14.687 client 172.26.42.60#56844
(9.101.20.10.IN-ADDR.ARPA): endrequest
24-Sep-2014 15:30:14.687 client @0x7f000c0d7710: udprecv
24-Sep-2014 15:30:14.687 sockmgr 0x7f0017e06010: watcher got message -3 for
socket 514
24-Sep-2014 15:30:14.687 sockmgr 0x7f0017e06010: watcher got message -2 for
socket -1
24-Sep-2014 15:30:14.687 socket 0x7f0017e1ebc8: socket_recv: event
0x7f0017c8c160 - task 0x7f0017e369d0
24-Sep-2014 15:30:14.688 socket 0x7f0017e1ebc8: dispatch_recv:  event
0x7f0017c8c160 - task 0x7f0017e369d0
24-Sep-2014 15:30:14.688 socket 0x7f0017e1ebc8: internal_recv: task
0x7f0017e369d0 got event 0x7f0017e1ec88
24-Sep-2014 15:30:14.688 socket 0x7f0017e1ebc8 172.26.42.60#46141: packet
received correctly

9.101.20.10.IN-ADDR.ARPA/NS/IN must correspond to my test node definition in
the client's /etc/hosts file ("10.20.101.9 node0009 node0009.mydomain.org"),
but I have no zone definition for 10.20 in /etc/named.conf on the external
DNS server yet.

On the Cluster Name Resolution wiki page (
http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Cluster_Name_Resolution)
under 'Option #2: Use a DNS That is Outside of the Cluster' it says: "If
you already have a DNS on your site network and you want to use that for
your cluster node names too, you can point all of the nodes to it. You must
ensure that your nodes have IP connectivity to the DNS, and you must
manually configure your DNS with the node hostnames and IP addresses."

Does 'makedns -e' not populate the zone files for you, just like it would
if DNS were running locally on the MN itself by just parsing /etc/hosts on
the client and adding/pushing it to the (remote) DNS zone files for you?

If not, does this mean I need to hand-configure the remote DNS server's
/etc/named.conf to stub out definitions for (as yet) empty zones, or
will makedns -e do that for me?
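
If the zones do have to be stubbed out by hand (which Xiao's reply in this
thread confirms), the reverse zone names can be derived from /etc/hosts
rather than typed in. The following is a minimal sketch under stated
assumptions: the helper names are invented, the /16 prefix length and the
'db.10.20' file name simply follow the 10.20 network discussed here, and
the emitted block is only a starting point for the external server's
named.conf.

```python
def reverse_zone(ip, prefix_len=16):
    """Return the reverse zone for an IPv4 address, e.g. 10.20.101.9 -> 20.10.IN-ADDR.ARPA."""
    octets = ip.split(".")
    kept = octets[: prefix_len // 8]  # whole octets covered by the prefix
    return ".".join(reversed(kept)) + ".IN-ADDR.ARPA"


def zones_for_hosts(hosts_text, prefix_len=16):
    """Collect the distinct reverse zones needed for the IPv4 entries in a hosts file."""
    zones = []
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank line, comment, or address without a name
        ip = fields[0]
        if ":" in ip or ip.startswith("127."):
            continue  # skip IPv6 and loopback entries
        zone = reverse_zone(ip, prefix_len)
        if zone not in zones:
            zones.append(zone)
    return zones


def zone_stub(zone, zone_file, key_name="xcat_key"):
    """Render a master zone block that accepts updates signed with key_name."""
    return (
        'zone "%s" {\n'
        "    type master;\n"
        '    file "%s";\n'
        "    allow-update { key %s; };\n"
        "};\n" % (zone, zone_file, key_name)
    )


if __name__ == "__main__":
    hosts = "127.0.0.1 localhost\n10.20.101.9 node0009 node0009.mydomain.org\n"
    for zone in zones_for_hosts(hosts):
        print(zone_stub(zone, "db.10.20"))
```

The zone file named in each block still has to exist on the external
server with at least an SOA and NS record before dynamic updates can be
applied to it.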

Thanks,
Josh

Re: [xcat-user] Questions on prerequisites for external DNS and makedns -e

2014-09-24 Thread Josh Nielsen

I'm starting to see an older conversation that I had back in January
of this year in a different light now:
http://permalink.gmane.org/gmane.comp.clustering.xcat.user/182

There it was suggested that I be aware that you may need to run a
separate makedns to populate the local zone files as well as makedns
-e to do the external name servers. Does this mean that an xCAT
installation is required on the external DNS server just to populate
the zones initially? Wouldn't having two xCAT installations, whose
records you have to keep in sync, get tedious after a while? Maybe I'm
misunderstanding how the external DNS solution is supposed to work in
connection with xCAT. Any clarifications are welcome.

-Josh

On Wed, Sep 24, 2014 at 4:03 PM, Josh Nielsen jniel...@hudsonalpha.org wrote:
 Hello all,

 I am in the process of trying to move to an external DNS implementation for
 name resolution with our compute cluster. The only requirement I see in the
 man page for makedns is to have one (and only one?) IP of the external DNS
 server that you want to update in /etc/resolv.conf and also a valid
 xcat_key.

 Firstly, how do you specify the xcat_key to be used on the client server
 that is pushing out the dns changes with makedns -e? For a local setup the
 key definition in /etc/named.conf is sufficient, but since makedns -e isn't
 dependent on the local config (or is it?), how is the xcat_key specified?
 The local /etc/named.conf file doesn't even need to exist on the client does
 it?

 Secondly, the server I am pushing to only has a 127.0.0 zone since it is a
 fresh BIND install and I'm wanting 'makedns -e' to create the new zones (and
 zone files under /var/named/) for me on the remote server. With a local DNS
 setup, makedns would parse your settings and handle all the file updates and
 creation for you. But when I try makedns -e I see the following  but no
 files or zone updates (is this an xcat_key problem?):

 (I ran the remote DNS BIND daemon in the foreground with -d 60 verbosity)

 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: UDP request
 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: using view '_default'
 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: request is not signed
 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: recursion available
 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844: query
 24-Sep-2014 15:30:14.686 client 172.26.42.60#56844
 (9.101.20.10.IN-ADDR.ARPA): ns_client_attach: ref = 1
 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844
 (9.101.20.10.IN-ADDR.ARPA): query '9.101.20.10.IN-ADDR.ARPA/NS/IN' approved
 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844
 (9.101.20.10.IN-ADDR.ARPA): send
 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844
 (9.101.20.10.IN-ADDR.ARPA): sendto
 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844
 (9.101.20.10.IN-ADDR.ARPA): senddone
 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844
 (9.101.20.10.IN-ADDR.ARPA): next
 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844
 (9.101.20.10.IN-ADDR.ARPA): ns_client_detach: ref = 0
 24-Sep-2014 15:30:14.687 client 172.26.42.60#56844
 (9.101.20.10.IN-ADDR.ARPA): endrequest
 24-Sep-2014 15:30:14.687 client @0x7f000c0d7710: udprecv
 24-Sep-2014 15:30:14.687 sockmgr 0x7f0017e06010: watcher got message -3 for
 socket 514
 24-Sep-2014 15:30:14.687 sockmgr 0x7f0017e06010: watcher got message -2 for
 socket -1
 24-Sep-2014 15:30:14.687 socket 0x7f0017e1ebc8: socket_recv: event
 0x7f0017c8c160 - task 0x7f0017e369d0
 24-Sep-2014 15:30:14.688 socket 0x7f0017e1ebc8: dispatch_recv:  event
 0x7f0017c8c160 - task 0x7f0017e369d0
 24-Sep-2014 15:30:14.688 socket 0x7f0017e1ebc8: internal_recv: task
 0x7f0017e369d0 got event 0x7f0017e1ec88
 24-Sep-2014 15:30:14.688 socket 0x7f0017e1ebc8 172.26.42.60#46141: packet
 received correctly

 9.101.20.10.IN-ADDR.ARPA/NS/IN must be my test node definition in the
 client's /etc/hosts file 10.20.101.9 node0009 node0009.mydomain.org, but I
 have no zone definition for 10.20 in /etc/named.conf on the external DNS
 server yet.

 On the Cluster Name Resolution wiki page
 (http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Cluster_Name_Resolution)
 under 'Option #2: Use a DNS That is Outside of the Cluster' it says: If you
 already have a DNS on your site network and you want to use that for your
 cluster node names too, you can point all of the nodes to it. You must
 ensure that your nodes have IP connectivity to the DNS, and you must
 manually configure your DNS with the node hostnames and IP addresses.

 Does 'makedns -e' not populate the zone files for you, just like it would if
 DNS were running locally on the MN itself by just parsing /etc/hosts on the
 client and adding/pushing it to the (remote) DNS zone files for you?

 If not does this mean I need to hand configure the remote DNS server's
 /etc/named.conf to stub out definitions for (as of yet) empty zones, or will
 makedns -e do that for me?

 Thanks,
 Josh

Re: [xcat-user] Makedns wiping out DNS records

2014-04-02 Thread Josh Nielsen

As Christopher notes, it is an additional attack surface. Also I would like
to not have to worry about an additional named instance running in my
network which I am not using. If it is not currently possible to prevent
the local named daemon on the MN from starting when makedns is run, can
that behavior be changed in a future xCAT version if you want an external
master only?

I'm still curious as to whether the /var/named/ directory would have any
zone files locally on the MN in the case where named is running locally but
you are pushing updates to an external DNS server, or whether it only
pushes zone entries out to the DNS server specified with 'makedns -e' (as
it should).

-Josh


On Mon, Mar 31, 2014 at 8:17 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

 Enabling the local DNS on the xCAT MN does not hurt anything except the
 cpu/memory to run makedns in your case, right?


 Thanks
 Best Regards
 --
 Wang Xiaopeng (王晓朋)
 IBM China System Technology Laboratory
 Tel: 86-10-82453455
 Email: w...@cn.ibm.com
 Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
 Haidian District Beijing P.R.China 100193


 From: Josh Nielsen jniel...@hudsonalpha.org
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net,
 sam...@unimelb.edu.au,
 Date: 2014/04/01 00:00
 Subject: Re: [xcat-user] Makedns wiping out DNS records
 --



 By default, makedns sets up the named service and updates the DNS
 records on the local system (management node). If the -e flag is specified,
 it will also update the DNS records on any external DNS server that is
 listed in the /etc/resolv.conf on the management node.

 Interesting, so even if I use an external DNS server I can't get rid
 of/turn off the DNS service on the MN? I suppose I could do a 'chkconfig
 named off' but makedns starts the named daemon anyway when you run it (I
 think). I want to make the external server the sole Master DNS server, and
 every other server its slave (I have only two other slave DNS servers,
 which are my Service Nodes).

 I'm in the same boat as Christopher, who noted: "One kink I think we've
 noticed there is that even with makedns -e xCAT seems to start up bind on
 the local system which isn't what we want at all." I also would prefer the
 local BIND/named daemon to not start at all, and rather make the external
 DNS the sole master. However, as you indicated makedns by default (which
 I'm assuming is a behavior that can't be changed [yet]?) sets up the local
 named service when it is run. Can xCAT be modified to accommodate the
 external only master scenario?

 Regards,
 Josh Nielsen


 On Thu, Mar 27, 2014 at 12:38 PM, Lissa Valletta lis...@us.ibm.com wrote:

It is saying that if your site has an external name server and you want
to use it, you need to add that external name server to the
/etc/resolv.conf file. You also have to add the correct nameservers to
the site table before you run makedns. This is the manual part.
For example: chdef -t site nameservers=50.1.2.254

makedns -e will take the external nameservers that you have
listed in /etc/resolv.conf and update their DNS records, assuming you have
the authority to change that server.
From man makedns:

 By default, makedns sets up the named service and updates the DNS
 records on the local system (management node). If the -e flag is
 specified, it will also update the DNS records on any external DNS
 server that is listed in the /etc/resolv.conf on the management node.
 (Assuming the external DNS server can recognize the xCAT key as
 authentication.)


Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102




From: Josh Nielsen jniel...@hudsonalpha.org
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net,

Date: 03/26/2014 03:15 PM
Subject: Re: [xcat-user] Makedns wiping out DNS records
--



Sorry to revive this, but I have a question about external DNS since I
am thinking of trying it now.

On the Cluster Name Resolution wiki page (

 http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Cluster_Name_Resolution

Re: [xcat-user] Makedns wiping out DNS records

2014-04-01 Thread Josh Nielsen

Okay, as long as makedns -e updates the external named server just like it
would a local named instance. The reason I had asked that is because the
documentation was rather ambiguously worded on that point. So now I'm just
trying to figure out the best way to make xCAT only update the external DNS
instance. Since you said that you have the problem of the local daemon
starting up regardless when you run makedns -e, have you ever noticed
whether it is actually updating local records also? Is your /var/named/
directory on the xCAT server populated with zone files? Also did you remove
the MN's local host IP (whether loopback or assigned) from its
/etc/resolv.conf and make it point only to the external DNS server's IP?

Since I'm trying to transition without an outage I think I will have to
have a brief period where I have both the MN and the external server
running identical named configs. Then I can slowly begin to move my compute
cluster nodes to resolve to the external server in 10 node increments or
so, so that I can troubleshoot any problems and undo it to resolve back to
the MN again if need be. I just want to be as careful as possible.
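
For the staged cutover described above, a minimal shell sketch of the batching step might look like this. It only groups node names into comma-separated batches of ten or fewer; the function name batch_nodes and the sample node names are made up for illustration, and whatever command actually repoints each batch at the new DNS server is deliberately left out.

```shell
#!/bin/sh
# Read node names (one per line) on stdin and print them as
# comma-separated batches of at most $1 names, one batch per line.
# batch_nodes is a hypothetical helper, not an xCAT command.
batch_nodes() {
    size="${1:-10}"
    # xargs with no command runs echo, so each output line holds up to
    # $size names separated by spaces; tr turns the spaces into commas.
    xargs -n "$size" | tr ' ' ','
}

# Demo with made-up node names and a batch size of 2.
printf 'node%04d\n' 1 2 3 4 5 | batch_nodes 2
```

Each output line could then be handled as one step of the migration, which keeps a rollback limited to the batch that was just changed.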

Thanks,
Josh


On Mon, Mar 31, 2014 at 10:17 PM, Christopher Samuel
sam...@unimelb.edu.au wrote:

 On 27/03/14 06:11, Josh Nielsen wrote:

  What does it mean that you must manually configure the hostnames
  and IP addresses? Does 'makedns -e' not do that for you, just like
  it would if DNS were running locally on the MN itself by just
  parsing /etc/hosts and adding/pushing it to the (remote) DNS zone
  files for you?

 We've occasionally pushed records into the DNS server by hand when
 they're things external to xCAT, but we've gone from that to
 defining them in xCAT on our infrastructure management node and
 pushing from there.

 So we've got 4 xCAT management nodes updating the same external DNS
 servers with makedns -e.

 cheers,
 Chris
 --
  Christopher Samuel    Senior Systems Administrator
  VLSCI - Victorian Life Sciences Computation Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.org.au/  http://twitter.com/vlsci

--
___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user

Re: [xcat-user] Makedns wiping out DNS records

2014-03-26 Thread Josh Nielsen

Sorry to revive this, but I have a question about external DNS since I am
thinking of trying it now.

On the Cluster Name Resolution wiki page (
http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Cluster_Name_Resolution)
under 'Option #2: Use a DNS That is Outside of the Cluster' it says: If
you already have a DNS on your site network and you want to use that for
your cluster node names too, you can point all of the nodes to it. You must
ensure that your nodes have IP connectivity to the DNS, and you must
manually configure your DNS with the node hostnames and IP addresses.

What does it mean that you must manually configure the hostnames and IP
addresses? Does 'makedns -e' not do that for you, just like it would if DNS
were running locally on the MN itself by just parsing /etc/hosts and
adding/pushing it to the (remote) DNS zone files for you?

In my case I may even be migrating my existing DNS zone files and
/etc/named.conf config onto a new VM and bring up DNS there and just add
any new hosts. But I would be surprised if makedns -e doesn't add the
hostnames and IPs for you. Can anyone who has used the external DNS option
with xCAT speak to that?

Thanks,
Josh


On Thu, Jan 16, 2014 at 5:13 PM, Christopher Samuel
sam...@unimelb.edu.au wrote:

 On 17/01/14 04:00, Josh Nielsen wrote:

  Is it just something with our installation or does makedns do this
  for other people as well?

 FWIW we do not run DNS on the management nodes at all, we have 4
 separate xCAT built clusters (3 HPC clusters and 1 for our GPFS/TSM
 infrastructure and our BG/Q service and front ends) and so to let them
 all populate DNS with a single, consistent view we have two external
 DNS servers that they all send updates to with makedns -e.

 One kink I think we've noticed there is that even with makedns -e xCAT
 seems to start up bind on the local system which isn't what we want at
 all.

 Good luck!
 Chris
 --
  Christopher Samuel    Senior Systems Administrator
  VLSCI - Victorian Life Sciences Computation Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.org.au/  http://twitter.com/vlsci

--
___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user

Re: [xcat-user] Makedns wiping out DNS records

2014-03-26 Thread Josh Nielsen

Also, if I externalize DNS but leave DHCP on the MN, nothing has to change
in the DHCP config, except changes propagated from xCAT via makedhcp, does
it? As long as I specify site.nameservers=X.X.X.X as the IP of my external
DNS master and update /etc/dhcpd.conf with makedhcp, only the
domain-name-servers line should change, right? I know that I need to
change the MN's resolv.conf to point to the external DNS server too to make
this work. Also, as this archived list email
(https://www.mail-archive.com/xcat-user@lists.sourceforge.net/msg00451.html)
suggests, I should make sure the TSIG key for DNS
is defined in /etc/named.conf on both the MN and the external DNS server
(assuming I allow DNS updates only with the xcat_key signature). Does that
mean the MN still needs its named daemon running to push out updates
though?
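
One way to check the TSIG prerequisite on either host is to look for the key stanza in named.conf. This is only a sketch: the helper name has_named_key is hypothetical, the demo file and its secret are placeholders rather than a real key, and it assumes the key is declared with a plain key stanza that starts at the beginning of a line.

```shell
#!/bin/sh
# Return success if FILE contains a BIND "key" stanza named KEYNAME.
# has_named_key is a hypothetical helper, not part of xCAT or BIND.
has_named_key() {
    file="$1"
    keyname="$2"
    # Accept both  key "xcat_key" {  and  key xcat_key {
    grep -Eq "^[[:space:]]*key[[:space:]]+\"?${keyname}\"?[[:space:]]*\{" "$file"
}

# Demo against a throwaway file; the algorithm and secret are placeholders.
demo_conf="$(mktemp)"
cat > "$demo_conf" <<'EOF'
key "xcat_key" {
        algorithm hmac-md5;
        secret "PLACEHOLDER==";
};
EOF
if has_named_key "$demo_conf" xcat_key; then
    echo "xcat_key stanza found"
else
    echo "xcat_key stanza missing"
fi
rm -f "$demo_conf"
```

Running the same check against /etc/named.conf on the MN and on the external server would at least confirm that both sides declare a key with that name; it does not compare the secrets.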

Overall, in addition to the things mentioned above, is there anything else
that I have missed that will be necessary to do to make external DNS work
with xCAT?

Thanks,
Josh


On Wed, Mar 26, 2014 at 2:11 PM, Josh Nielsen jniel...@hudsonalpha.org wrote:

 Sorry to revive this, but I have a question about external DNS since I am
 thinking of trying it now.

 On the Cluster Name Resolution wiki page (
 http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Cluster_Name_Resolution)
 under 'Option #2: Use a DNS That is Outside of the Cluster' it says: If
 you already have a DNS on your site network and you want to use that for
 your cluster node names too, you can point all of the nodes to it. You must
 ensure that your nodes have IP connectivity to the DNS, and you must
 manually configure your DNS with the node hostnames and IP addresses.

 What does it mean that you must manually configure the hostnames and IP
 addresses? Does 'makedns -e' not do that for you, just like it would if DNS
 were running locally on the MN itself by just parsing /etc/hosts and
 adding/pushing it to the (remote) DNS zone files for you?

 In my case I may even be migrating my existing DNS zone files and
 /etc/named.conf config onto a new VM and bring up DNS there and just add
 any new hosts. But I would be surprised if makedns -e doesn't add the
 hostnames and IPs for you. Can anyone who has used the external DNS option
 with xCAT speak to that?

 Thanks,
 Josh


 On Thu, Jan 16, 2014 at 5:13 PM, Christopher Samuel sam...@unimelb.edu.au
  wrote:

 On 17/01/14 04:00, Josh Nielsen wrote:

  Is it just something with our installation or does makedns do this
  for other people as well?

 FWIW we do not run DNS on the management nodes at all, we have 4
 separate xCAT built clusters (3 HPC clusters and 1 for our GPFS/TSM
 infrastructure and our BG/Q service and front ends) and so to let them
 all populate DNS with a single, consistent view we have two external
 DNS servers that they all send updates to with makedns -e.

 One kink I think we've noticed there is that even with makedns -e xCAT
 seems to start up bind on the local system which isn't what we want at
 all.

 Good luck!
 Chris
 --
  Christopher Samuel    Senior Systems Administrator
  VLSCI - Victorian Life Sciences Computation Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.org.au/  http://twitter.com/vlsci


--
___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user

[xcat-user] How can I prevent overwritting of resolv.conf and named.conf?

2014-03-07 Thread Josh Nielsen

I have noticed that with my recent restructuring of my cluster's DNS
hierarchy by creating two Service Nodes to stand in between the compute
nodes and the Management Node that I am having two separate problems with
files being overwritten once I modify them.

Firstly, I configured the SNs to act as actual slave DNS servers instead of
just forwarding to the MN (that feature it looks like will be officially
supported in the next xcat release but is not supported in the current
one), so I had to edit /etc/named.conf to facilitate that. Before I edited
that file on both SNs it simply had an options { } block ending with
forward only and a forwarders { } block with the IP of the MN, but I
removed the forward only statement, added zone definitions, and made each
zone a slave to the MN. It worked perfectly. The only problem is that every
couple days (and it happened again this morning) all my changes get erased
somehow and named.conf is regenerated to the default file with only an
options { } block. How can I prevent that from happening?

Secondly, for compute nodes and storage nodes which were dhcp enabled
instead of statically assigned in their
/etc/sysconfig/network-scripts/ifcfg-eth* files, when I manually edited the
/etc/resolv.conf (though a postscript would do the same) it too would get
overwritten fairly soon after I made the change, back to only pointing to
the MN for DNS. I changed the resolv.conf to point not just to the MN (as
they did originally) but created three nameserver entries to look for DNS
name servers in the following order: SN1, SN2, MN.

I fixed this by statically assigning IPs in the ifcfg-eth*  files, but I
am wondering if there is a better way. DHCP has the ability to push out DNS
server names for resolv.conf and so I looked to see if it was the culprit
and I changed the option domain-name-servers line to include SN1, SN2,
and the MN (does the nameservers value in the xCAT 'site' table set this
line?), but I'm not sure if that is the line for DHCP responsible for
changing the values in /etc/resolv.conf, or how often the DHCP changes were
pushed out (this is happening for machines which are not being rebooted or
reinitialized in any way - just running as normal - and they suddenly
change their resolv.conf).

Are any of the suggestions on this page good options:
http://www.cyberciti.biz/faq/dhclient-etcresolvconf-hooks/? I don't have a
dhclient.conf file on my RHEL/CentOS servers though. Anyway, any
suggestions would be much appreciated!

Thanks,
Josh
--
___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user

Re: [xcat-user] How can I prevent overwritting of resolv.conf and named.conf?

2014-03-07 Thread Josh Nielsen

Thank you Dennis and Jonathan. Setting PEERDNS=no was part of my fix to
the ifcfg-eth* files, and I guess that's the best way to prevent
resolv.conf from being overwritten then. I also saw the suggestion for
chattr +i but would only like to use it as a last resort, since I might
lose track of which nodes I have and haven't done that on, especially if
it is a newly deployed node from xCAT.
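
A minimal sketch of applying that setting from a script (for instance from a postscript) could look like the following. The function name set_peerdns_no is made up, and the demo runs against a temporary file rather than a real ifcfg-eth* file.

```shell
#!/bin/sh
# Ensure FILE ends up with exactly one PEERDNS line, set to "no".
# set_peerdns_no is a hypothetical helper; running it twice is harmless.
set_peerdns_no() {
    file="$1"
    tmp="$(mktemp)"
    # Drop any existing PEERDNS lines, then append the desired setting.
    grep -v '^[[:space:]]*PEERDNS=' "$file" > "$tmp" || true
    echo 'PEERDNS=no' >> "$tmp"
    # Write back through cat so the original file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demo on a throwaway copy of a DHCP-style interface file.
cfg="$(mktemp)"
printf 'DEVICE=eth0\nBOOTPROTO=dhcp\nPEERDNS=yes\n' > "$cfg"
set_peerdns_no "$cfg"
cat "$cfg"
rm -f "$cfg"
```

Doing it this way keeps the change in the interface configuration, so there is no need to remember which nodes had chattr +i applied.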

So on to named.conf then. What would be resetting it? Does the slave
configuration have something to do with it? I don't even know if it is a
remotely initiated change or whether it originates locally for some reason.
Is there any config I can post here that would help determine that?

Thanks,
Josh


On Fri, Mar 7, 2014 at 11:50 AM, Jonathan Mills jonmi...@renci.org wrote:

 What Dennis says should work.  However, I think the accepted Redhat
 way of doing it is to put

 PEERDNS=no

 in your /etc/sysconfig/network-scripts/ifcfg-ethX script.

 On 03/07/2014 12:48 PM, Dennis Zheleznyak wrote:
  You can lock the file by entering the following command chattr +i
  /etc/resolv.conf. This will lock the file even for root.
 
  Dennis.
 
  On Friday, March 7, 2014, Josh Nielsen jniel...@hudsonalpha.org wrote:
 
  I have noticed that with my recent restructuring of my cluster's DNS
  hierarchy by creating two Service Nodes to stand in between the
  compute nodes and the Management Node that I am having two separate
  problems with files being overwritten once I modify them.
 
  Firstly, I configured the SNs to act as actual slave DNS servers
  instead of just forwarding to the MN (that feature it looks like
  will be officially supported in the next xcat release but is not
  supported in the current one), so I had to edit /etc/named.conf to
  facilitate that. Before I edited that file on both SNs it simply had
  an options { } block ending with forward only and a forwarders {
  } block with the IP of the MN, but I removed the forward only
  statement, added zone definitions, and made each zone a slave to the
  MN. It worked perfectly. The only problem is that every couple days
  (and it happened again this morning) all my changes get erased
  somehow and named.conf is regenerated to the default file with only
  an options { } block. How can I prevent that from happening?
 
  Secondly, for compute nodes and storage nodes which were dhcp
  enabled instead of statically assigned in their
  /etc/sysconfig/network-scripts/ifcfg-eth* files, when I manually
  edited the /etc/resolv.conf (though a postscript would do the same)
  it too would get overwritten fairly soon after I made the change,
  back to only pointing to the MN for DNS. I changed the resolv.conf
  to point not just to the MN (as they did originally) but created
  three nameserver entries to look for DNS name servers in the
  following order: SN1, SN2, MN.
 
  I fixed this by statically assigning IPs in the ifcfg-eth*  files,
  but I am wondering if there is a better way. DHCP has the ability to
  push out DNS server names for resolv.conf and so I looked to see if
  it was the culprit and I changed the option domain-name-servers
  line to include SN1, SN2, and the MN (does the nameservers value
  in the xCAT 'site' table set this line?), but I'm not sure if that
  is the line for DHCP responsible for changing the values in
  /etc/resolv.conf, or how often the DHCP changes were pushed out
  (this is happening for machines which are not being rebooted or
  reinitialized in any way - just running as normal - and they
  suddenly change their resolv.conf).
 
  Are any of the suggestions on this page good options:
  http://www.cyberciti.biz/faq/dhclient-etcresolvconf-hooks/? I don't
  have a dhclient.conf file on my RHEL/CentOS servers though. Anyway,
  any suggestions would be much appreciated!
 
  Thanks,
  Josh
 

 --
 Jonathan Mills
 Systems Administrator
 Renaissance Computing Institute
 UNC-Chapel Hill



Re: [xcat-user] How to create and deploy an xCAT Service Node

2014-02-05 Thread Josh Nielsen

Okay, I guess I need to revive this again now that I have the SNs deployed
and now I am trying to snmove some nodes onto them. The Hierarchical
Cluster wiki page is oriented toward those setting up a brand new cluster
and not migrating an established cluster to include SNs, so it does not
include clear instructions of what commands to run after you have created
groups of CNs for SNs to manage. I am assuming that to get nodes to
initially look away from the MN and put them on an SN for the first time
you must execute snmove with -d and -D to point to the SN.

My config follows:

I am testing on just two of the nodes in my cluster for now. So first I did
this:

mkdef -t group -o serv1_compute members=node0001,node0002

Then following the documentation for creating service pools I did this:

chdef -t group serv1_compute servicenode=xcat-serv1,xcat-serv2

# lsdef -t group serv1_compute
Object name: serv1_compute
grouptype=static
members=node0001,node0002
servicenode=xcat-serv1,xcat-serv2

And noderes looks like this now:

#node,servicenode,netboot,tftpserver,tftpdir,nfsserver,monserver,nfsdir,installnic,primarynic,discoverynics,cmdinterface,xcatmaster,current_osimage,next_osimage,nimserver,routenames,nameservers,comments,disable
user,,xnba,MN_IP,,MN_IP,,,eth0,eth0,,,MN_IP,,,
service,,xnba,MN_IP,,MN_IP,,,mac,mac,,,MN_IP,,,
storage,,xnba,MN_IP,,MN_IP,,,eth1,eth1,,,MN_IP,,,
compute,,xnba,MN_IP,,MN_IP,,,eth0,eth0,,,MN_IP,,,
login,,xnba,MN_IP,,MN_IP,,,eth0,eth0,,,MN_IP,,,
node0059,,xnba,
hinode01,,xnba,
serv1_compute,xcat-serv1,xcat-serv2,,
node0001,xcat-serv1,xcat-serv2,,xcat-serv1,,xcat-serv1,,,xcat-serv1,,,
node0002,xcat-serv1,xcat-serv2,,xcat-serv1,,xcat-serv1,,,xcat-serv1,,,

I may have a conflict problem though in that the established compute
group which node0001 and node0002 are in is pointing to MN_IP (the MN's ip
address) while serv1_compute points to xcat-serv1. I was hoping that since
noderes FURTHER defined the servicenode and xcatmaster for them that it
would override the settings for compute. Will that work or do I have to
remove node0001 and node0002 from compute altogether?

Their nodelist entries look like this:

node0001,compute,compute-profile,ipmi,dx360m2,rack01,all,serv1_compute,booting,11-24-2013 13:55:00,synced,02-05-2014 08:59:57,,
node0002,compute,compute-profile,ipmi,dx360m2,rack01,all,serv1_compute,booting,11-24-2013 13:55:00



Then after all the configuration, I tried an snmove on just node0001:


# snmove serv1_compute -d xcat-serv1 -D xcat-serv1
Moving nodes to their backup service nodes.

Setting new values in the xCAT database.


node0001: install centos6.4-x86_64-compute
node0002: install centos6.4-x86_64-compute
node0001: install centos6.4-x86_64-compute
node0002: install centos6.4-x86_64-compute
Running postscripts on the nodes.
If you specify the -s flag you must not specify either the -S or -k or -P
 flags

In /var/log/messages I saw: Allowing nodeset to node0001,node0002 install
for x3650-head01.haib.org from x3650-head01

Firstly, why was a nodeset done when I typed snmove? The nodes are already
installed, I don't want to reinstall them.

Secondly, according to the wiki documentation: If the CNs are up at the
time the snmove command is run then snmove will run postscripts on the
CNs to reconfigure them for the new SN.

However I checked files on node0001 like /etc/ntp.conf and their timestamp
had not changed (therefore I deduce the postscript did not run). So I ran
the postscripts manually with updatenode node0001 syslog,setupntp. I
checked  /etc/ntp.conf again and this time the timestamp was updated but
the file's contents were identical to before: it pointed to the MN_IP and
not xcat-serv1 as it should based on the xcatmaster setting in the noderes
table.
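
Rather than relying on timestamps, a quick way to confirm which time source a node ended up with is to print the server lines from its ntp.conf. The sketch below assumes a conventional ntp.conf layout; the helper name ntp_servers and the demo file contents are made up for illustration.

```shell
#!/bin/sh
# Print the host or address from every "server" line of an
# ntp.conf-style file. ntp_servers is a hypothetical helper.
ntp_servers() {
    awk '$1 == "server" { print $2 }' "$1"
}

# Demo on a throwaway file that mimics a node pointing at xcat-serv1.
demo_ntp="$(mktemp)"
cat > "$demo_ntp" <<'EOF'
driftfile /var/lib/ntp/drift
server xcat-serv1
restrict 127.0.0.1
EOF
ntp_servers "$demo_ntp"
rm -f "$demo_ntp"
```

Run against /etc/ntp.conf on node0001 before and after the postscripts, this would show directly whether the server entry moved from the MN to the SN.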

What am I doing wrong here?

Thanks,
Josh




On Fri, Jan 10, 2014 at 1:48 PM, Josh Nielsen jniel...@hudsonalpha.org wrote:

 Thank you Lissa, that is helpful.

 -Josh


 On Fri, Jan 10, 2014 at 1:25 PM, Lissa Valletta lis...@us.ibm.com wrote:

  DNS and DHCP will still work from the Service Node, if set up
 correctly. In other words, you have configured the service node as the
 DNS server and/or DHCP server for the nodes and there is no requirement
 on the Management Node for DNS or DHCP. You will not be able to run
 any xCAT commands on the service node if the Management Node is down.
 xCAT requires access to the database configured on the MN for the xCAT
 cluster (MySQL, PostgreSQL) to run most xCAT commands, even to recognize
 that the node is in the xCAT cluster.

 Lissa K. Valletta
 8-3/B10
 Poughkeepsie, NY 12601
 (tie 293) 433-3102




Re: [xcat-user] xCAT start_SSL Bug?

2014-01-27 Thread Josh Nielsen

Actually there is a disparity between the MN and SN OS type/version. First
of all the MN is Red Hat, and my service nodes will be CentOS. That
shouldn't be a large problem. However the real difference is that the MN is
running Red Hat Enterprise Linux Server release 5.9 (Tikanga) right now. I
need the SN up to serve DNS and DHCP while I take the MN down to update it
to RHEL 6.X. Would the versioning difference cause a problem? The OS repo
for the SN is of course only pulling from the CentOS 6.4 RPMs.

Currently the MN has these package versions:

# rpm -qa | grep -i ssl | sort
gskssl64-8.0-14.14
openssl-0.9.8e-20.el5
openssl-0.9.8e-20.el5
openssl-0.9.8e-20.el5_7.1
openssl-0.9.8e-20.el5_7.1
openssl-0.9.8e-22.el5
openssl-0.9.8e-22.el5
openssl-0.9.8e-22.el5_8.1
openssl-0.9.8e-22.el5_8.1
openssl-0.9.8e-22.el5_8.3
openssl-0.9.8e-22.el5_8.3
openssl-0.9.8e-22.el5_8.4
openssl-0.9.8e-22.el5_8.4
openssl-devel-0.9.8e-22.el5_8.4
openssl-devel-0.9.8e-22.el5_8.4
perl-Crypt-SSLeay-0.51-11.el5
perl-IO-Socket-SSL-1.01-1.fc6
perl-Net-SSLeay-1.30-4.fc6
pyOpenSSL-0.6-2.el5

And the SN:

# rpm -qa | grep -i ssl | sort
openssl098e-0.9.8e-17.el6.centos.2.x86_64
openssl-1.0.0-27.el6.x86_64
perl-Crypt-SSLeay-0.57-16.el6.x86_64
perl-IO-Socket-SSL-1.77-1.x86_64
perl-Net-SSLeay-1.35-9.el6.x86_64
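
To line the two listings up by package, a small sketch like the one below may help. It assumes each line has the usual name-version-release form printed by rpm -qa (with an optional trailing architecture); the helper name split_nvr is made up, and the demo input is just the Perl SSL modules copied from the two listings above.

```shell
#!/bin/sh
# Turn "name-version-release[.arch]" lines into "name version-release[.arch]".
# split_nvr is a hypothetical helper: the last two hyphen-separated fields
# are treated as version and release, everything before them as the name.
split_nvr() {
    sed -E 's/^(.+)-([^-]+-[^-]+)$/\1 \2/'
}

mn_list="$(mktemp)"
sn_list="$(mktemp)"
printf '%s\n' perl-Crypt-SSLeay-0.51-11.el5 perl-IO-Socket-SSL-1.01-1.fc6 \
    perl-Net-SSLeay-1.30-4.fc6 | split_nvr | sort > "$mn_list"
printf '%s\n' perl-Crypt-SSLeay-0.57-16.el6.x86_64 \
    perl-IO-Socket-SSL-1.77-1.x86_64 \
    perl-Net-SSLeay-1.35-9.el6.x86_64 | split_nvr | sort > "$sn_list"

# join prints one line per shared package: name, MN version, SN version.
join "$mn_list" "$sn_list"
rm -f "$mn_list" "$sn_list"
```

This only makes the version gap easier to see at a glance; it says nothing about whether the gap is what triggers the SSL error.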

-Josh


On Mon, Jan 27, 2014 at 10:34 AM, Lissa Valletta lis...@us.ibm.com wrote:

 Running in XCATBYPASS mode only stops you from using the daemon, which
 stops the SSL credential use, which is why you do not get errors in
 XCATBYPASS.

 Is your MN CentOS 6.4? I would run rpm -qa | grep -i ssl and see what
 perl ssl rpms are installed. Do the same on the service node.
 My service node has these for Red Hat:
 openssl098e-0.9.8e-17.el6_2.2.ppc64
 perl-Net-SSLeay-1.35-9.el6.ppc64
 openssl-1.0.0-20.el6_2.5.ppc64
 openssl098e-0.9.8e-17.el6_2.2.ppc
 perl-IO-Socket-SSL-1.31-2.el6.noarch
 perl-Crypt-SSLeay-0.57-16.el6.ppc64

 Lissa K. Valletta
 8-3/B10
 Poughkeepsie, NY 12601
 (tie 293) 433-3102




 From: Josh Nielsen jniel...@hudsonalpha.org
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net,
 Date: 01/27/2014 10:36 AM
 Subject: Re: [xcat-user] xCAT start_SSL Bug?
 --



 Yes, I read that on the bug report but mine is already set to that, and
 that is only useful if the repo has the files needed. If it is
 perl-NET-SSLeay then even pointing it to xcat-dep/rh6/x86_64 won't help
 since it isn't present.

 Here are my yum repos:

 /etc/yum.repos.d/xCAT-otherpkgs0.repo:
 [xcat-otherpkgs0]
 name=xcat-otherpkgs0
 baseurl=http://10.20.0.1/install/post/otherpkgs/centos6.4/x86_64/.
 enabled=1
 gpgcheck=0

 /etc/yum.repos.d/xCAT-otherpkgs1.repo:
 [xcat-otherpkgs1]
 name=xcat-otherpkgs1
 baseurl=http://10.20.0.1/install/post/otherpkgs/centos6.4/x86_64/xcat/xcat-core
 enabled=1
 gpgcheck=0

 /etc/yum.repos.d/xCAT-otherpkgs2.repo:
 [xcat-otherpkgs2]
 name=xcat-otherpkgs2
 baseurl=http://10.20.0.1/install/post/otherpkgs/centos6.4/x86_64/xcat/xcat-dep/rh6/x86_64
 enabled=1
 gpgcheck=0

 I decided to remove the xCAT-otherpkgs0.repo, just in case it was
 interfering, then uninstall all xCAT packages, and reinstall them but it
 still has the same problem after all that. Here were the yum packages
 grabbed for the install:

 Dependencies Resolved

  Package          Arch     Version     Repository        Size
 Installing:
  conserver-xcat   x86_64   8.1.16-10   xcat-otherpkgs2   218 k
  cpio             x86_64   2.11-20     xcat-otherpkgs2   252 k
  elilo-xcat       noarch   3.14-4      xcat-otherpkgs2    75 k
  ipmitool-xcat    x86_64   1.8.11-3    xcat-otherpkgs2   244

Re: [xcat-user] xCAT start_SSL Bug?

2014-01-27 Thread Josh Nielsen

Also, as far as I can tell, all the correct SSL certs are present and I can
ssh directly into the SN without providing login credentials. I don't see
any SSL related errors in the system logs either. What is the basic issue
that causes the error? An incorrect perl library?


On Mon, Jan 27, 2014 at 11:07 AM, Josh Nielsen jniel...@hudsonalpha.org wrote:

 Actually there is a disparity between the MN and SN OS type/version. First
 of all the MN is Red Hat, and my service nodes will be CentOS. That
 shouldn't be a large problem. However the real difference is that the MN is
 running Red Hat Enterprise Linux Server release 5.9 (Tikanga) right now. I
 need the SN up to serve DNS and DHCP while I take the MN down to update it
 to RHEL 6.X. Would the versioning difference cause a problem? The OS repo
 for the SN is of course only pulling from the CentOS 6.4 RPMs.

 Currently the MN has these package versions:

 # rpm -qa | grep -i ssl | sort
 gskssl64-8.0-14.14
 openssl-0.9.8e-20.el5
 openssl-0.9.8e-20.el5
 openssl-0.9.8e-20.el5_7.1
 openssl-0.9.8e-20.el5_7.1
 openssl-0.9.8e-22.el5
 openssl-0.9.8e-22.el5
 openssl-0.9.8e-22.el5_8.1
 openssl-0.9.8e-22.el5_8.1
 openssl-0.9.8e-22.el5_8.3
 openssl-0.9.8e-22.el5_8.3
 openssl-0.9.8e-22.el5_8.4
 openssl-0.9.8e-22.el5_8.4
 openssl-devel-0.9.8e-22.el5_8.4
 openssl-devel-0.9.8e-22.el5_8.4
 perl-Crypt-SSLeay-0.51-11.el5
 perl-IO-Socket-SSL-1.01-1.fc6
 perl-Net-SSLeay-1.30-4.fc6
 pyOpenSSL-0.6-2.el5

 And the SN:

 # rpm -qa | grep -i ssl | sort
 openssl098e-0.9.8e-17.el6.centos.2.x86_64
 openssl-1.0.0-27.el6.x86_64
 perl-Crypt-SSLeay-0.57-16.el6.x86_64
 perl-IO-Socket-SSL-1.77-1.x86_64
 perl-Net-SSLeay-1.35-9.el6.x86_64

 -Josh


 On Mon, Jan 27, 2014 at 10:34 AM, Lissa Valletta lis...@us.ibm.com wrote:

 Running in XCATBYPASS mode only stops you from using the daemon, which
 stops the SSL credential use, which is why you do not get errors in
 XCATBYPASS.

 Is your MN CentOS 6.4? I would run rpm -qa | grep -i ssl and see what
 perl ssl rpms are installed. Do the same on the service node.
 My service node has these for Red Hat:
 openssl098e-0.9.8e-17.el6_2.2.ppc64
 perl-Net-SSLeay-1.35-9.el6.ppc64
 openssl-1.0.0-20.el6_2.5.ppc64
 openssl098e-0.9.8e-17.el6_2.2.ppc
 perl-IO-Socket-SSL-1.31-2.el6.noarch
 perl-Crypt-SSLeay-0.57-16.el6.ppc64

 Lissa K. Valletta
 8-3/B10
 Poughkeepsie, NY 12601
 (tie 293) 433-3102




 From: Josh Nielsen jniel...@hudsonalpha.org
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net,
 Date: 01/27/2014 10:36 AM
 Subject: Re: [xcat-user] xCAT start_SSL Bug?
 --



 Yes, I read that on the bug report but mine is already set to that, and
 that is only useful if the repo has the files needed. If it is
 perl-NET-SSLeay then even pointing it to xcat-dep/rh6/x86_64 won't help
 since it isn't present.

 Here are my yum repos:

 /etc/yum.repos.d/xCAT-otherpkgs0.repo:
 [xcat-otherpkgs0]
 name=xcat-otherpkgs0
 baseurl=http://10.20.0.1/install/post/otherpkgs/centos6.4/x86_64/.
 enabled=1
 gpgcheck=0

 /etc/yum.repos.d/xCAT-otherpkgs1.repo:
 [xcat-otherpkgs1]
 name=xcat-otherpkgs1
 baseurl=http://10.20.0.1/install/post/otherpkgs/centos6.4/x86_64/xcat/xcat-core
 enabled=1
 gpgcheck=0

 /etc/yum.repos.d/xCAT-otherpkgs2.repo:
 [xcat-otherpkgs2]
 name=xcat-otherpkgs2
 baseurl=http://10.20.0.1/install/post/otherpkgs/centos6.4/x86_64/xcat/xcat-dep/rh6/x86_64
 enabled=1
 gpgcheck=0

 I decided to remove the xCAT-otherpkgs0.repo, just in case it was
 interfering, then uninstall all xCAT packages, and reinstall them but it
 still has the same problem after all that. Here were the yum packages
 grabbed for the install:

 Dependencies Resolved

  Package          Arch     Version     Repository        Size
 Installing:
  conserver-xcat   x86_64   8.1.16-10   xcat-otherpkgs2   218 k
  cpio             x86_64   2.11-20

Re: [xcat-user] Determine which otherpkgs.pkglist file is being used

2014-01-22 Thread Josh Nielsen

Okay, but did you see the problem I mentioned with
provmethod=centos6.4-x86_64-install-service? That would mean (I think)
that you have to create one (sub)directory name per service image
under /install/custom in order for it to find the .tmpl and .pkglist
files, since the scheme for locating those files is:
/install/custom/provmethod/platform. Do I have to create a
/install/custom/centos6.4-x86_64-install-service/centos directory now
and no longer use /install/custom/install/whatever? Also, the initrd
and kernel parameters take on the service image's name in their path when
you nodeset with 'osimage'.

I am quite confused, because currently I'm convinced that if I do a
nodeset to xcat-serv1 with the osimage set to the current service
image that it will not work because the directories it will point to
don't exist. For example, where would I find the directory pointed to
here: initrd=xcat/osimage/centos6.4-x86_64-install-service/initrd.img?

You also had suggested using /install/custom/install/centos (as I have
been), but with the osimage method it seems that directory is no
longer usable since it would rather require
/install/custom/centos6.4-x86_64-install-service/centos.

Can anyone clear up my confusion here?
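
To make the lookup scheme concrete, here is a minimal Python sketch of the search order described in the wiki text quoted further down (custom directory first, then the shipped defaults). The function name and the idea of keying the file name on the profile alone are assumptions for illustration; the real lookup also considers os and arch, and this is not xCAT's own code.

```python
from pathlib import PurePosixPath

# Search roots in the order given by the wiki passage quoted in this thread:
# site-specific files first, then the defaults shipped with xCAT.
SEARCH_ROOTS = ("/install/custom", "/opt/xcat/share/xcat")


def candidate_files(provmethod, platform, profile, suffix):
    """Return candidate file paths in search order (hypothetical helper).

    Simplification: only the profile is used in the file name here.
    """
    return [
        str(PurePosixPath(root) / provmethod / platform / f"{profile}.{suffix}")
        for root in SEARCH_ROOTS
    ]


if __name__ == "__main__":
    # Legacy provmethod=install, as used with /install/custom/install/centos.
    for path in candidate_files("install", "centos", "service", "otherpkgs.pkglist"):
        print(path)
    # What a naive substitution of the osimage name would look like.
    for path in candidate_files(
        "centos6.4-x86_64-install-service", "centos", "service", "otherpkgs.pkglist"
    ):
        print(path)
```

The second loop only shows the directory I am worried about having to create; whether xCAT really searches there when provmethod is an osimage name is exactly the open question.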

On Wed, Jan 22, 2014 at 6:48 AM, Lissa Valletta lis...@us.ibm.com wrote:
 It is supported,  but we are encouraging everyone to go the osimage.   The
 provmethod install, netboot, statelite will be sunset over time and even now
 our testing is with provmethod=osimage name.  This is why we are warning.

 Lissa K. Valletta
 8-3/B10
 Poughkeepsie, NY 12601
 (tie 293) 433-3102




 From: Josh Nielsen jniel...@hudsonalpha.org
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net,
 Date: 01/21/2014 05:55 PM
 Subject: Re: [xcat-user] Determine which otherpkgs.pkglist file is being
 used

 



 On a hunch I tried using the install keyword with nodeset in addition to
 osimage, and it actually set the provmethod to install like I wanted, but I
 got a warning:

 # nodeset xcat-serv1 install osimage=centos6.4-x86_64-install-service
 Warning: The options install, netboot, and statelite have been
 deprecated. They should continue to work in this release, but have not been
 tested as carefully, and some new functions are not available with these
 options.  For full function and support, use nodeset noderange
 osimage=osimage_name instead.
 xcat-serv1: install centos6.4-x86_64-service

 # lsdef xcat-serv1
 Object name: xcat-serv1
 arch=x86_64
 currchain=boot
 currstate=install centos6.4-x86_64-service
 groups=service,ipmi,all
 initrd=xcat/centos6.4/x86_64/service/initrd.img
 installnic=mac
 ip=10.20.254.227
 kcmdline=quiet repo=http://10.20.0.1:80/install/centos6.4/x86_64
 ks=http://10.20.0.1:80/install/autoinst/xcat-serv1
 ksdevice=00:50:56:bc:0c:db
 kernel=xcat/centos6.4/x86_64/service/vmlinuz
 mac=00:50:56:bc:0c:db
 netboot=xnba
 nfsserver=10.20.0.1
 nodetype=osi
 os=centos6.4

 postscripts=syslog,remoteshell,ospkgs,addsiteyum,otherpkgs,syncfiles,ifcfg-eth,fstab,servicenode
 primarynic=mac
 profile=service
 provmethod=install
 setupconserver=1
 setupdhcp=1
 setupnameserver=1
 setupnfs=1
 setupntp=1
 setuptftp=1
 status=booting
 statustime=01-17-2014 16:05:46
 tftpserver=10.20.0.1
 updatestatus=failed
 updatestatustime=01-17-2014 17:47:47
 xcatmaster=10.20.0.1

 If that warning is correct that install, netboot, and statelite are
 deprecated then that means that we cannot use the
 /install/custom/install|netboot|statelite/platform directories any more.
 Does that mean we will have to have one directory for each osimage name
 under /install/custom/ now? That would be significantly less convenient -
 though doable. Since you also suggested /install/custom/install/centos as a
 good directory to use, were you aware of this?

 That actually lines up with the (now deprecated?) documentation though:
 On Linux, if provmethod for the node is install, netboot or statelite , the
 os, profile, and arch of that node are used to search for the files in
 /install/custom/provmethod/platform first, and then in
 /opt/xcat/share/xcat/provmethod/platform.
 http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Using_Provmethod%3Dinstall,netboot_or_statelite


 In any case, what is the recommended thing for me to do given my
 configuration?

 -Josh


 On Tue, Jan 21, 2014 at 4:24 PM, Josh Nielsen jniel...@hudsonalpha.org
 wrote:

 Okay, great. I have the new service image created now. I am confused, though,
 about why it gives the node the incorrect provmethod when I nodeset it.
 Here is the output from my terminal:

 # lsdef -t osimage -l centos6.4-x86_64-install-service
 Object name: centos6.4-x86_64-install-service
 imagetype

Re: [xcat-user] Frustrating time with sequential node discovery

2014-01-22 Thread Josh Nielsen

Whoops, I meant to write Jarrod not Jarros. I went a little Koine Greek
on your name there. Sorry about that. :-)


On Wed, Jan 22, 2014 at 10:22 AM, Josh Nielsen jniel...@hudsonalpha.org wrote:

 Jarros, I know you haven't been following the whole thread but Jonathan's
 problem (which this discussion originated from) is that somehow his
 installation is still using the /tftpboot/pxelinux.cfg/ files despite
 being up to date with genesis. I saw a response on the list from you to
 someone else about a similar problem in November 2013:

 http://sourceforge.net/mailarchive/message.php?msg_id=31683484
 http://sourceforge.net/mailarchive/message.php?msg_id=31686689

 Although I am not experiencing the same problem I too would be interested
 in the solution. It has always been challenging for me to understand the
 xCAT PXE deployment process, which is why I have 5-10 pages of self-written
 documentation and debugging info in a Google Doc about it. He posted his
 dhcpd.conf in his latest email. Any thoughts?

 -Josh


 On Wed, Jan 22, 2014 at 10:11 AM, Jarrod B Johnson jbjoh...@us.ibm.com wrote:

 1) If running 2.8, go ahead and delete nbroot-core.  genesis is far more
 maintainable and easier to muck with on the end point (e.g. having full
 fledged glibc)
 2) The greatest common denominator remains switch based.  It's the only
 frequently instrumented topology cue that is pretty universal.  For bladed
 solutions there is potential for the chassis manager to be a good topology
 cue.  I am curious what goes awry with switch based discovery.  Without a
 topology cue, then the choices are sequential discovery (which I frankly
 haven't used myself) or using/scripting nodediscoverls/nodediscoverdef.  I
 personally think the latter is actually better and can be trivially made
 into a 'sequential' discovery using straightforward scripting..

 3) Tom Alandt is the best person to discuss what can/can't be done by IBM
 mfg.  On the other hand, we *could* take some measures to make you
 impervious to the conflict.  The 'lsslp --flexdiscover' for its respective
 bits is impervious to IP conflict issues and will automatically fix it.
  It's not a huge stretch to make that pan out for rackmount systems (though
 currently it's hard to pull off without *some* topology cue).

 From: David D Johnson david_john...@brown.edu
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
 Date: 01/22/2014 08:11 AM
 Subject: Re: [xcat-user] Frustrating time with sequential node discovery
 --



 I've been lurking on this discussion, and just checked to see what we've
 got -- nbroot or genesis -- and we have both of them.

 I had given up on node discovery years ago, we originally used the switch
 port numbers and forwarding tables to assign node names.  Now I use ASU to
 collect the macs, and populate the mac table with a bit of grep and awk.
  But last week we powered on a rack of 20 non-ibm nodes, and I was wishing
 we had something easier since ASU didn't work for them.

 So I have two questions --
 1) Can I safely delete the xCAT-nbroot-core* RPMS ?
 2) What is the current best practice method?  What about for non-ibm
 hardware?

 No, three questions
 3) How do you get IBM manufacturing to use a specific different 172.29.X
 for each rack they build for you?
 We've had three racks arrive in different months all with 172.29.101
 addresses for the IMM, and I have to spend 5-10 minutes reprogramming each
 one.  I can't put them on the same network until the conflicts are gone.

 What's hard for me is that we get new nodes only a couple times a year,
 and I forget everything in between.

 Thanks,
  -- ddj

 On Jan 22, 2014, at 7:12 AM, Lissa Valletta lis...@us.ibm.com wrote:


xCAT-nbroot-core*   was replaced by  xCAT-genesis-* in xCAT 2.8.

Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102




From: Xiao Peng Wang w...@cn.ibm.com
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Cc: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 01/22/2014 02:58 AM
Subject: Re: [xcat-user] Frustrating time with sequential node
discovery

--



Why do you say that you need nbk.x86_64? Is this file listed in the
/tftpboot/xcat/xnba/nets/?

With the latest xCAT build, it needs
/tftpboot/xcat/genesis.kernel.x86_64 instead of nbk.*

Thanks

[xcat-user] Sequential discovery does not support virtual machines message

2014-01-22 Thread Josh Nielsen

Sorry list denizens, another issue I ran into. I am trying to rediscover a
VM that I already once discovered with sequential discovery, but now it is
acting up. From my /var/log/messages log when I PXE-boot the VM:

Jan 22 16:28:36 x3650-head01 dhcpd: DHCPDISCOVER from 00:50:56:bc:0c:db via
eth0
Jan 22 16:28:36 x3650-head01 dhcpd: DHCPOFFER on 10.20.242.251 to
00:50:56:bc:0c:db via eth0
Jan 22 16:28:36 x3650-head01 dhcpd: DHCPREQUEST for 10.20.242.251
(10.29.0.1) from 00:50:56:bc:0c:db via eth0
Jan 22 16:28:36 x3650-head01 dhcpd: DHCPACK on 10.20.242.251 to
00:50:56:bc:0c:db via eth0
Jan 22 16:28:36 x3650-head01 xCAT[7847]: xCAT: Allowing getcredentials
x509cert
Jan 22 16:28:37 x3650-head01 xCAT[28098]: xcatd: Processing discovery
request from 10.20.242.251
Jan 22 16:28:37 x3650-head01 xCAT[28098]: Sequential Discovery: Processing
Jan 22 16:28:37 x3650-head01 xCAT[28098]: Sequential discovery does not
support virtual machines, exiting...
Jan 22 16:28:37 x3650-head01 xCAT[28098]: Sequential Discovery: Processing
Jan 22 16:28:37 x3650-head01 xCAT[28098]: Sequential discovery does not
support virtual machines, exiting...
Jan 22 16:28:42 x3650-head01 xCAT[28098]: xcatd: Processing discovery
request from 10.20.242.251
Jan 22 16:28:42 x3650-head01 xCAT[28098]: Sequential Discovery: Processing
Jan 22 16:28:42 x3650-head01 xCAT[28098]: Sequential discovery does not
support virtual machines, exiting...

I needed to clean things up a bit from all my previous deploys to that node
anyway, so I decided to start clean by deleting the mac entry from the xCAT
table and trying again, but it didn't make a difference; I still see the same
message. This worked a few days ago with no such messages, and it captured
the VM's MAC with no problem. What do I need to change so that I can start
from scratch to discover that VM with Sequential Discovery again?

Thanks,
Josh
--
___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user

Re: [xcat-user] Frustrating time with sequential node discovery

2014-01-21 Thread Josh Nielsen

Evidently, though, something in his xCAT setup is creating the files in
/tftpboot/pxelinux.cfg/ with reference to xnba just like my
installation. Where does xCAT grab the configuration for that? Maybe
it was because I didn't do a completely clean install and did an
in-place upgrade, but my cluster actually works perfectly with both
xnba and genesis installed because it uses xnba first to bootstrap and
then requests the Genesis image. xCAT must support that scenario, or else
I haven't the slightest idea by what miracle my installation is
running with such a configuration. :-)

-Josh

On Tue, Jan 21, 2014 at 2:58 PM, Russell Jones
russell-l...@jonesmail.me wrote:
 xNBA is a customized gpxe image that xCAT uses.

 NBFS is the older maintenance image that was used if you set your
 node to boot to shell, or booted a runimage script. NBFS is deprecated,
 and Genesis replaced NBFS as the maintenance image for these tasks.

 In a standard 2.8 install, there should no longer be any nbk/nbfs RPMs
 installed - Genesis replaced them.

 perl-xCAT-2.8.3-snap201311122316.noarch
 xCAT-2.8.3-snap201311122318.x86_64
 xCAT-client-2.8.3-snap201311122316.noarch
 xCAT-genesis-base-x86_64-2.8-snap201308090229.noarch
 elilo-xcat-3.14-4.noarch
 xCAT-server-2.8.3-snap201311122316.noarch
 xCAT-genesis-scripts-x86_64-2.8.3-snap201311122318.noarch
 ipmitool-xcat-1.8.11-3.x86_64
 conserver-xcat-8.1.16-10.x86_64
 xCAT-buildkit-2.8.3-snap201311122318.noarch
 syslinux-xcat-3.86-2.noarch



 On 1/21/2014 2:38 PM, Josh Nielsen wrote:
 Hi Jonathan,

 Yes, I definitely think that would cause a problem. This is jogging my
 memory because I think that when the new Genesis boot loader was
 rolled out in the first version of xCAT that supported it that I faced
 a similar problem. I had assumed that only Genesis was needed but xNBA
 is still used as an intermediate image even if it is no longer the
 final image. I will check my yum repos as soon as I can - but by some
 unfortunate coincidence I just discovered that YUM is not working
 since our RHEL license expired three days ago (unbeknownst to me until
 10 minutes ago). Do you have xCAT-genesis-x86_64 and elilo-xCAT? You
 may even have to pull xNBA images from an older install(?) and then
 run mknb to build the images.

 I remember downloading the tarred files with the RPM manually and
 creating a local repo for xCAT. Whenever I get YUM back I'll give you
 more specifics if I can.

 -Josh

 On Tue, Jan 21, 2014 at 1:54 PM, Jonathan Mills jonmi...@renci.org wrote:
 Josh,

 I don't doubt that you're on to something.  But if this is the case, it
 means my systems are missing some files, namely:

 /tftpboot/xcat/nbk.x86_64
 /tftpboot/xcat/nbfs.x86_64.gz

 Can you tell me what RPM installed those files on your system?  They
 don't exist on mine, and even a 'yum provides' doesn't find them.


 On 01/21/2014 11:51 AM, Josh Nielsen wrote:
 Hi Jonathan,

 It is my understanding, from extensive debugging and notes that I have
 taken about the xCAT netbooting process in the past, that xCAT uses a
 two-stage image deployment method. It will first come up with a more
 generic boot image (normally xnba or sometimes yaboot) which - when it
 contacts the xCAT headnode (or the node handling DHCP requests) - the
 headnode will then recognize the current image on the client that is
 sending requests to DHCP for further boot instructions, and will tell
 the client to then load another image based on the subnet and image type
 it is currently using. For example my headnode's /etc/dhcpd.conf file
 has an entry that looks like this:

 shared-network eth0 {
 subnet 10.20.0.0 netmask 255.255.0.0 {
   max-lease-time 43200;
   min-lease-time 43200;
   default-lease-time 43200;
   next-server  10.20.0.1;
   option log-servers 10.20.0.1;
   option ntp-servers 10.20.0.1;
   option domain-name "x";
   option domain-name-servers  10.20.0.1;
   if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
      always-broadcast on;
      filename = "http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16";
   } else if option user-class-identifier = "xNBA" and option client-architecture = 00:09 { #x86, xCAT Network Boot Agent
      filename = "http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16.uefi";
   } else if option client-architecture = 00:00  { #x86
      filename "xcat/xnba.kpxe";
   } else if option vendor-class-identifier = "Etherboot-5.4"  { #x86
      filename "xcat/xnba.kpxe";
   } else if option client-architecture = 00:07 { #x86_64 uefi
      filename "xcat/xnba.efi";
   } else if option client-architecture = 00:09 { #x86_64 uefi alternative id
      filename "xcat/xnba.efi";
   } else if option client-architecture = 00:02 { #ia64
      filename "elilo.efi";
   } else if substring(filename,0,1) = null { #otherwise, provide yaboot if the client isn't specific
      filename

[xcat-user] Determine which otherpkgs.pkglist file is being used

2014-01-17 Thread Josh Nielsen

A node I am deploying seems to be grabbing the wrong otherpkgs.pkglist
file. I have placed pkglist files in /install/custom/install/centos and
created one for the service group/profile. In that directory are these files:

compute.otherpkgs.pkglist
compute.pkglist
compute.tmpl
hinode01.tmpl
service.otherpkgs.pkglist
service.tmpl

I want to use the service.otherpkgs.pkglist file. I deployed a node
xcat-serv1 which belongs to the service group but it did not install the
RPMs I specified in the otherpkgs.pkglist file for 'service'. When I did an
lsdef I saw that the profile was set to compute and realized that would
be a problem so I changed it to 'service' as well and tried an updatenode
xcat-serv1 otherpkgs but it didn't make a difference. Here is the current
node definition:

[root@x3650-head01 centos]# lsdef xcat-serv1
Object name: xcat-serv1
arch=x86_64
currchain=boot
currstate=boot
groups=service,ipmi,all
initrd=xcat/osimage/centos6.4-x86_64-install-compute/initrd.img
installnic=mac
kernel=xcat/osimage/centos6.4-x86_64-install-compute/vmlinuz
netboot=xnba
nodetype=osi
os=centos6.4

postscripts=syslog,remoteshell,ospkgs,addsiteyum,otherpkgs,syncfiles,ifcfg-eth,fstab,servicenode
primarynic=mac
profile=service
provmethod=centos6.4-x86_64-install-compute
snipped

The groups and profile look correct now, but I still can't get the
right packages to install. Is there a particular line in the otherpkgs
postscript I could use to print a variable with the file name of the file
chosen/matched against? I looked but only found a variable with the actual
parsed packages from whatever file it chose. It looks like it is using the
compute.otherpkgs.pkglist. I can't quite figure out why it wouldn't match
though.

This page seems to indicate that my directory and nomenclature for the
files is correct:
http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Using_Provmethod%3Dinstall,netboot_or_statelite.
And I know the kickstarts from that folder work since they are used during
provisioning.

It is usually something simple. Does anyone see what I am missing?

Thanks,
Josh

Re: [xcat-user] Determine which otherpkgs.pkglist file is being used

2014-01-17 Thread Josh Nielsen

Never mind, it was the funky provmethod value. Would provisioning it with
nodeset xcat-serv1 osimage=centos6.4-x86_64-install-compute change the
provmethod to the same thing as the osimage? The provmethod needs to
simply be install. I'm not sure how it got the same name as the osimage.

-Josh

On Fri, Jan 17, 2014 at 5:43 PM, Josh Nielsen jniel...@hudsonalpha.org wrote:

compute.otherpkgs.pkglist
compute.pkglist
compute.tmpl
hinode01.tmpl
service.otherpkgs.pkglist
service.tmpl

postscripts=syslog,remoteshell,ospkgs,addsiteyum,otherpkgs,syncfiles,ifcfg-eth,fstab,servicenode
primarynic=mac
profile=service
provmethod=centos6.4-x86_64-install-compute
snipped

It is usually something simple. Does anyone see what I am missing?

Thanks,
Josh


Re: [xcat-user] Update schema after upgrade prematurely terminated

2014-01-16 Thread Josh Nielsen

I actually checked that yesterday when you mentioned it and I had installed
all the latest RPMs the first time around, and I had already restarted the
daemon several times. So I'm not sure why it was misreporting the version
for a while.

-Josh


On Thu, Jan 16, 2014 at 7:04 AM, Lissa Valletta lis...@us.ibm.com wrote:

 I expect xcatd restarted cleanly for some reason. Did you finally run yum
 update '*xCAT*' instead of yum update xcat? That would have done
 it also.

 Lissa K. Valletta
 8-3/B10
 Poughkeepsie, NY 12601
 (tie 293) 433-3102




 From: Josh Nielsen jniel...@hudsonalpha.org
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net,
 Date: 01/14/2014 03:37 PM
 Subject: Re: [xcat-user] Update schema after upgrade prematurely
 terminated
 --



 Suddenly the lsxcatd command is returning 2.8.3 to me now. I'm not sure
 why it would take time to update, but it looks fine now:

 [root@x3650-head01 init.d]# lsxcatd -a
 Version 2.8.3 (built Tue Nov 12 23:16:15 EST 2013)
 This is a Management Node
 cfgloc=mysql:dbname=xcatdb;host=10.X.X.X|xx
 dbengine=mysql
 dbname=xcatdb
 dbhost=10.X.X.X
 dbadmin=xx

 And here is my rpm query:

 [root@x3650-head01 init.d]# rpm -qa | grep xCAT
 xCAT-nbroot-oss-ppc64-2.0-snap200801291320
 xCAT-nbkernel-x86-2.6.18_128-8
 xCAT-genesis-scripts-x86_64-2.8.3-snap201311122318
 perl-xCAT-2.8.3-snap201311122316
 xCAT-server-2.8.3-snap201311122316
 xCAT-nbroot-core-x86-2.3.1-snap200911062009
 xCAT-2.8.3-snap201311122318
 xCAT-nbroot-oss-x86_64-2.0-snap200801291344
 xCAT-nbroot-core-x86_64-2.3.1-snap200911062009
 xCAT-nbkernel-x86_64-2.6.18_128-8
 xCAT-genesis-base-x86_64-2.8-snap201308090229
 xCAT-nbroot-oss-x86-2.0-snap200804021050
 xCAT-nbroot-core-ppc64-2.3.1-snap200911062009
 xCAT-nbkernel-ppc64-2.6.18_92-4
 xCAT-buildkit-2.8.3-snap201311122318
 xCAT-client-2.8.3-snap201311122316

 Except for the ppc64 packages (which I don't really need anyway since we
 use x86_64) everything is up to the current version. The xCAT nbroot-core
 and nbkernel RPMs aren't up to 2.8.X but it seems that they are still at
 the most current version available (I have my yum repos pointing to the
 latest tarballs of the core and dep packages).

 Thanks for all the help and suggestions. I'm just going to try using it now.
 This was all prep so that I can deploy some service nodes (as mentioned in
 my other emails to the mailing list).

 Thanks,
 Josh







 On Tue, Jan 14, 2014 at 1:55 PM, Lissa Valletta lis...@us.ibm.com wrote:

Two possibilities. One is that the daemon did not restart correctly, which
has been common when upgrading from 2.7.6 to higher levels:
service xcatd stop
ps -ef | grep xcatd
kill -9 any hung xcatd processes
service xcatd start

The other thing is make sure your rpms are all at the 2.8.3 level.

rpm -qa | grep xCAT
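
A quick way to eyeball that list is to filter it by version string. The following Python sketch is only an illustration: the helper name is made up, and the sample package names are copied from the rpm query that appears elsewhere in this thread. It flags any package whose name does not carry the expected version.

```python
import re

# A few lines taken from the "rpm -qa | grep xCAT" output posted in this thread.
SAMPLE_RPM_OUTPUT = """\
xCAT-server-2.8.3-snap201311122316
xCAT-nbroot-core-x86_64-2.3.1-snap200911062009
xCAT-genesis-base-x86_64-2.8-snap201308090229
xCAT-client-2.8.3-snap201311122316
"""


def packages_not_at(version, rpm_lines):
    """Return package names that do not contain '-<version>' as a full field.

    Hypothetical helper; it only inspects the name string printed by rpm -qa.
    """
    pattern = re.compile(r"-" + re.escape(version) + r"(-|$)")
    return [
        line.strip()
        for line in rpm_lines
        if line.strip() and not pattern.search(line.strip())
    ]


if __name__ == "__main__":
    for name in packages_not_at("2.8.3", SAMPLE_RPM_OUTPUT.splitlines()):
        print("not at 2.8.3:", name)
```

Note that a package flagged here is not necessarily a problem: some packages (the genesis base and the old nbroot/nbkernel ones) are versioned independently of the xCAT core.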


Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102




From: Josh Nielsen jniel...@hudsonalpha.org
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 01/14/2014 01:05 PM
Subject: [xcat-user] Update schema after upgrade prematurely terminated
--



I seem to be getting myself in all kinds of trouble with xCAT lately.

I was upgrading from xCAT 2.7.3 to 2.8.3, I grabbed the latest core
and dep tarballs, and made local yum repos, and installed all the newer
packages from them. Then to check if the daemon had been upgraded I did an
'lsxcatd -v' and it still said 2.7.3. So I decided to run the xcat daemon
manually from the command line with the verbose flag but wasn't expecting
to see the schema updates (honestly all I saw were SQL commands scrolling
in the screen - I didn't know they were for the schema at the time) and I
panicked and hit Ctrl+C. This halted the daemon and the updates, but the next time
I tried to run it I got an error, although the daemon ran in the background.
This is from my terminal:

[root@x3650-head01 yum.repos.d]# /opt/xcat/sbin/xcatd -v
updateschema: Running ALTER TABLE mpa ADD `displayname` TEXT
updateschema: Running ALTER TABLE mpa ADD `slots` TEXT
updateschema: Running ALTER TABLE mpa ADD `urlpath` TEXT
updateschema: Running ALTER TABLE servicenode ADD `dhcpinterfaces` TEXT

[xcat-user] Makedns wiping out DNS records

2014-01-16 Thread Josh Nielsen

Regardless of the version of xCAT we have used in the past, my coworkers
and I have noticed that running makedns with any following options can be
dangerous for us, because something happens where it will wipe out all the
records in DNS. If we run makedns with no options generally any new nodes
that have been added to /etc/hosts will be parsed and added appropriately.
Maybe I am misunderstanding what happens when the command is run.

Here is a recent example. My entry for a new node xcat-serv1 for some
reason didn't have the reverse lookup from the IP in the DNS file for that
subnet (db.10.20), and so I decided to try to remove just the DNS entry for
that node and then re-add it. Here is what I ran:


# makedns -d xcat-serv1
Handling xcat-serv1 in /etc/hosts.
Getting reverse zones, this may take several minutes for a large cluster.
Completed getting reverse zones.
Updating zones.
Completed updating zones.
Updating DNS records, this may take several minutes for a large cluster.
Completed updating DNS records.


Then I ran it with the -n parameter to re-add that node (with
XCATBYPASS=1):


# export XCATBYPASS=1
[root@x3650-head01 centos]# makedns -n xcat-serv1
Unrecognized escape \h passed through at /opt/xcat/lib/perl/xCAT/zvmUtils.pm line 2408.
Subroutine handled_commands redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 28.
Subroutine delnode redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 33.
Subroutine addnode redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 58.
Subroutine build_line redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 117.
Subroutine addotherinterfaces redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 182.
Subroutine delotherinterfaces redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 209.
Subroutine add_hosts_content redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 234.
Subroutine process_request redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 305.
Subroutine writeout redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 511.
Subroutine donics redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 545.
Subroutine getIPdomain redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 801.
Handling xcat-serv1 in /etc/hosts.
Getting reverse zones, this may take several minutes for a large cluster.
Completed getting reverse zones.
Stopping named: .  [  OK  ]
Updating zones.
Completed updating zones.
Restarting named
Restarting named complete
Updating DNS records, this may take several minutes for a large cluster.
Completed updating DNS records.
DNS setup is completed




When I did an 'ls' in /var/named many records were missing and the ones
still left were virtually empty except one entry for the headnode:

BEFORE:

# ls /var/named
data  db.10.29  db.10.40  db.127.0.0db.172.26.41  db.172.29
 db.192.168.100  db.haig   db.morgan.haib.org.jnl
db.10.29.jnl  db.10.60  db.172.20 db.172.26.42  db.172.30  db.cache
   db.morgan   slaves
db.10.20  db.10.30  db.10.60.jnl  db.172.26.40  db.172.26.43  db.172.40
 db.haib db.morgan.haib.org

AFTER:
# ls /var/named
data  db.10.20.jnl  db.10.29.jnl  db.10.30.jnl  db.10.40.jnl
 db.10.60.jnl  db.172.26.41  db.172.26.43db.morgan.haib.org.jnl
db.10.20  db.10.29  db.10.30  db.10.40  db.10.60
 db.172.26.40  db.172.26.42  db.morgan.haib.org  slaves


As you can see, several files are missing. Luckily I made a full backup
yesterday and was able to restore the old files.
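
Since the two listings are hard to compare by eye, here is a small Python sketch that diffs them as sets. The file names are transcribed from the BEFORE and AFTER listings above; where two names ran together in the paste (for example db.127.0.0 and db.172.26.41) I am assuming they are separate files. The function name is just for illustration.

```python
# File names transcribed from the BEFORE listing of /var/named above.
# Assumption: names that ran together in the paste are separate files.
BEFORE = """
data db.10.29 db.10.40 db.127.0.0 db.172.26.41 db.172.29
db.192.168.100 db.haig db.morgan.haib.org.jnl
db.10.29.jnl db.10.60 db.172.20 db.172.26.42 db.172.30 db.cache
db.morgan slaves
db.10.20 db.10.30 db.10.60.jnl db.172.26.40 db.172.26.43 db.172.40
db.haib db.morgan.haib.org
""".split()

# File names transcribed from the AFTER listing.
AFTER = """
data db.10.20.jnl db.10.29.jnl db.10.30.jnl db.10.40.jnl
db.10.60.jnl db.172.26.41 db.172.26.43 db.morgan.haib.org.jnl
db.10.20 db.10.29 db.10.30 db.10.40 db.10.60
db.172.26.40 db.172.26.42 db.morgan.haib.org slaves
""".split()


def compare_listings(before, after):
    """Return (missing, added): names only in 'before' and names only in 'after'."""
    missing = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    return missing, added


if __name__ == "__main__":
    missing, added = compare_listings(BEFORE, AFTER)
    print("missing after makedns -n:", " ".join(missing))
    print("new after makedns -n:", " ".join(added))
```

This only tells you which zone files vanished; it says nothing about the files that survived but lost most of their records, which is the other half of the problem.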

From /var/log/messages you can see the initial deletion message for
xcat-serv1:


Jan 16 10:32:55 x3650-head01 named[8249]: client 10.20.0.1#56080: updating
zone '40.10.IN-ADDR.ARPA/IN': adding an RR at '11.102.40.10.IN-ADDR.ARPA'
PTR
Jan 16 10:32:55 x3650-head01 named[8249]: client 10.20.0.1#56080: updating
zone '40.10.IN-ADDR.ARPA/IN': adding an RR at '7.101.40.10.IN-ADDR.ARPA' PTR
Jan 16 10:32:55 x3650-head01 named[8249]: client 10.20.0.1#56080: updating
zone '40.10.IN-ADDR.ARPA/IN': adding an RR at '10.102.40.10.IN-ADDR.ARPA'
PTR
Jan 16 10:32:55 x3650-head01 named[8249]: client 10.20.0.1#56080: updating
zone '40.10.IN-ADDR.ARPA/IN': adding an RR at '27.102.40.10.IN-ADDR.ARPA'
PTR
Jan 16 10:32:55 x3650-head01 named[8249]: client 10.20.0.1#56080: updating
zone '40.10.IN-ADDR.ARPA/IN': adding an RR at '33.101.40.10.IN-ADDR.ARPA'
PTR
Jan 16 10:32:55 x3650-head01 named[8249]: client 10.20.0.1#56080: updating
zone '40.10.IN-ADDR.ARPA/IN': adding an RR at '1.102.40.10.IN-ADDR.ARPA' PTR
Jan 16 10:32:55 x3650-head01 named[8249]: client 10.20.0.1#56080: updating
zone

Then for the makedns -a xcat-serv you actually see it update EVERY ZONE:

Jan 16 10:32:42 x3650-head01 named[8249]: client 10.20.0.1#43788: updating
zone '30.10.IN-ADDR.ARPA/IN': adding an RR at '1.50.30.10.IN-ADDR.ARPA' PTR
Jan 16 10:32:42 x3650-head01 named[8249]: client 10.20.0.1#43788: updating
zone '30.10.IN-ADDR.ARPA/IN':

Re: [xcat-user] Makedns wiping out DNS records

2014-01-16 Thread Josh Nielsen

Okay, I will look through that document. This is the first and only time
I've used XCATBYPASS with that command (to get more clarity into what was
going on in the background). Makedns has been presenting us with the
problem(s) I mentioned for well over a year now, with previous versions of
xCAT as well. I'm guessing that if this doesn't happen for other people
that it must have been some setting when xCAT was originally installed
which was misconfigured, or maybe something funny in one of the xCAT db
tables like 'site'. I just wanted to see if anyone has had problems with
using the -n or -d options deleting more than they are supposed to.

-Josh


On Thu, Jan 16, 2014 at 11:23 AM, Lissa Valletta lis...@us.ibm.com wrote:

 We have an entire doc on name resolution  also I would not make it a
 common  practice to use XCATBYPASS mode.  Running with the xcatd is much
 more secure and some things do not work well in XCATBYPASS mode.

 https://sourceforge.net/apps/mediawiki/xcat/index.php?title=Cluster_Name_Resolution

 Lissa K. Valletta
 8-3/B10
 Poughkeepsie, NY 12601
 (tie 293) 433-3102




 From: Josh Nielsen jniel...@hudsonalpha.org
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net,
 Date: 01/16/2014 12:03 PM
 Subject: [xcat-user] Makedns wiping out DNS records
 --



 Regardless of the version of xCAT we have used in the past, my coworkers
 and I have noticed that running makedns with any following options can be
 dangerous for us, because something happens where it will wipe out all the
 records in DNS. If we run makedns with no options generally any new nodes
 that have been added to /etc/hosts will be parsed and added appropriately.
 Maybe I am misunderstanding what happens when the command is run.

 Here is a recent example. My entry for a new node xcat-serv1 for some
 reason didn't have the reverse lookup from the IP in the DNS file for that
 subnet (db.10.20), and so I decided to try to remove just the DNS entry for
 that node and then re-add it. Here is what I ran:

 
 # makedns -d xcat-serv1
 Handling xcat-serv1 in /etc/hosts.
 Getting reverse zones, this may take several minutes for a large cluster.
 Completed getting reverse zones.
 Updating zones.
 Completed updating zones.
 Updating DNS records, this may take several minutes for a large cluster.
 Completed updating DNS records.
 

 Then I ran it with the -n parameter to re-add that node (with
 XCATBYPASS=1):

 
 # export XCATBYPASS=1
 [root@x3650-head01 centos]# makedns -n xcat-serv1
 Unrecognized escape \h passed through at
 /opt/xcat/lib/perl/xCAT/zvmUtils.pm line 2408.
 Subroutine handled_commands redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 28.
 Subroutine delnode redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 33.
 Subroutine addnode redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 58.
 Subroutine build_line redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 117.
 Subroutine addotherinterfaces redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 182.
 Subroutine delotherinterfaces redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 209.
 Subroutine add_hosts_content redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 234.
 Subroutine process_request redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 305.
 Subroutine writeout redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 511.
 Subroutine donics redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 545.
 Subroutine getIPdomain redefined at /opt/xcat/lib/perl/xCAT_plugin/hosts.pm line 801.
 Handling xcat-serv1 in /etc/hosts.
 Getting reverse zones, this may take several minutes for a large cluster.
 Completed getting reverse zones.
 Stopping named: .  [  OK  ]
 Updating zones.
 Completed updating zones.
 Restarting named
 Restarting named complete
 Updating DNS records, this may take several minutes for a large cluster.
 Completed updating DNS records.
 DNS setup is completed
 
 
 

 When I did an 'ls' in /var/named many records were missing and the ones
 still left were virtually empty except for one entry for the headnode:

 BEFORE:

 # ls /var/named
 data  db.10.29  db.10.40  db.127.0.0db.172.26.41
  db.172.29  db.192.168.100  db.haig   db.morgan.haib.org.jnl
 db.10.29.jnl  db.10.60  db.172.20 db.172.26.42  db.172.30
  db.cachedb.morgan   slaves
 db.10.20  db.10.30  db

Re: [xcat-user] Update schema after upgrade prematurely terminated

2014-01-14 Thread Josh Nielsen

Unfortunately, no. Would the MySQL database have to be off as well? I
thought the daemon might automatically restart after the upgrade, but you
are right that I should have turned it off first.

At this point things still work, but I am most confused about lsxcatd still
returning 2.7.3 to me. Maybe it is getting that as a string from a database
table that hasn't been updated yet instead of directly from the xcatd
binary? I have some of the new commands like nodediscoverls and
nodediscoverstart, which weren't there before, so I'm pretty sure everything
was upgraded.

I did a yum update xcat and for good measure yum install
xCAT-nbkernel-x86_64 xCAT-genesis-x86_64 xCAT-client
xCAT-nbroot-core-x86_64 xCAT-server xCAT-nbroot-oss-x86_64. As long as
nothing is operating incorrectly though I guess I can just presume
everything is at the latest version.

-Josh


On Tue, Jan 14, 2014 at 12:12 PM, Russell Jones
russell-l...@jonesmail.me wrote:

  Did you remember to stop the xcatd service before doing the upgrade?

 On 1/14/2014 12:02 PM, Josh Nielsen wrote:

 I seem to be getting myself in all kinds of trouble with xCAT lately.

  I was upgrading from xCAT 2.7.3 to 2.8.3. I grabbed the latest core and
 dep tarballs, made local yum repos, and installed all the newer
 packages from them. Then to check if the daemon had been upgraded I did an
 'lsxcatd -v' and it still said 2.7.3. So I decided to run the xcat daemon
 manually from the command line with the verbose flag but wasn't expecting
 to see the schema updates (honestly all I saw were SQL commands scrolling
 on the screen - I didn't know they were for the schema at the time) and I
 panicked and hit Ctrl+C. This halted the daemon and updates, but the next
 time I tried to run it I got an error, although the daemon ran in the
 background. This is from my terminal:

 [root@x3650-head01 yum.repos.d]# /opt/xcat/sbin/xcatd -v
 updateschema: Running ALTER TABLE mpa ADD `displayname` TEXT
 updateschema: Running ALTER TABLE mpa ADD `slots` TEXT
 updateschema: Running ALTER TABLE mpa ADD `urlpath` TEXT
 updateschema: Running ALTER TABLE servicenode ADD `dhcpinterfaces` TEXT
 updateschema: Running ALTER TABLE switches ADD `protocol` TEXT
 updateschema: Running ALTER TABLE hypervisor ADD `interface` TEXT
 updateschema: Running ALTER TABLE vmmaster ADD `specializeparameters` TEXT
 updateschema: Running ALTER TABLE linuximage ADD `boottarget` TEXT
 updateschema: Running ALTER TABLE linuximage ADD `addkcmdline` TEXT
 updateschema: Running ALTER TABLE linuximage ADD `partitionfile` TEXT
 updateschema: Running ALTER TABLE linuximage ADD `driverupdatesrc` TEXT
 updateschema: Running ALTER TABLE nodelist ADD `updatestatus` TEXT
 updateschema: Running ALTER TABLE nodelist ADD `updatestatustime` TEXT
 updateschema: Running ALTER TABLE networks ADD `staticrange` TEXT
 updateschema: Running ALTER TABLE networks ADD `staticrangeincrement` TEXT
 updateschema: Running ALTER TABLE vm ADD `mgr` TEXT
 updateschema: Running ALTER TABLE vm ADD `storagecache` TEXT
 updateschema: Running ALTER TABLE vm ADD `storageformat` TEXT
 updateschema: Running ALTER TABLE vm ADD `cluster` TEXT
 updateschema: Running ALTER TABLE vm ADD `physlots` TEXT
 updateschema: Running ALTER TABLE litetree ADD `mntopts` TEXT
 updateschema: Running ALTER TABLE domain ADD `authdomain` TEXT
 updateschema: Running ALTER TABLE domain ADD `adminuser` TEXT
 updateschema: Running ALTER TABLE domain ADD `adminpassword` TEXT
 updateschema: Running ALTER TABLE domain ADD `type` TEXT
 updateschema: Running ALTER TABLE nodegroup ADD `membergroups` TEXT
 updateschema: Running ALTER TABLE noderes ADD `nameservers` TEXT
 updateschema: Running ALTER TABLE osimage ADD `groups` TEXT
 updateschema: Running ALTER TABLE osimage ADD `description` TEXT
  updateschema: Running ALTER TABLE osimage ADD `osdistroname` TEXT
 updateschema: Running ALTER TABLE osimage ADD `osupdatename` VARCHAR(1024)
 updateschema: Running ALTER TABLE osimage ADD `cfmdir` TEXT
 updateschema: Running ALTER TABLE osimage ADD `serverrole` TEXT
 updateschema: Running ALTER TABLE osimage ADD `isdeletable` TEXT
 updateschema: Running ALTER TABLE osimage ADD `kitcomponents` TEXT
 updateschema: Running ALTER TABLE passwd ADD `authdomain` TEXT
 updateschema: Running ALTER TABLE nodepos ADD `height` TEXT
 updateschema: Running ALTER TABLE nodehm ADD `cmdmapping` TEXT

  [root@x3650-head01 yum.repos.d]# /opt/xcat/sbin/xcatd -v
 Unrecognized escape \h passed through at
 /opt/xcat/lib/perl/xCAT/zvmUtils.pm line 2408.

  Despite this I can still use 'service xcatd restart' and start the
 daemon, and I can run psh commands and tabdump etcetera, but now I'm unsure
 whether the xCAT MySQL database is in an inconsistent state and has not
 received all the schema updates it was supposed to. I was hoping that by
 running xcatd -v again I would see it try again but I haven't.

 And even after all that I still see 2.7.3 when I type 'lsxcatd'. Maybe I
 need to restart the node.

 Any clue

Re: [xcat-user] Update schema after upgrade prematurely terminated

2014-01-14 Thread Josh Nielsen

Suddenly the lsxcatd command is returning 2.8.3 to me now. I'm not sure why
it would take time to update, but it looks fine now:

[root@x3650-head01 init.d]# lsxcatd -a
Version 2.8.3 (built Tue Nov 12 23:16:15 EST 2013)
This is a Management Node
cfgloc=mysql:dbname=xcatdb;host=10.X.X.X|xx
dbengine=mysql
dbname=xcatdb
dbhost=10.X.X.X
dbadmin=xx

And here is my rpm query:

[root@x3650-head01 init.d]# rpm -qa | grep xCAT
xCAT-nbroot-oss-ppc64-2.0-snap200801291320
xCAT-nbkernel-x86-2.6.18_128-8
xCAT-genesis-scripts-x86_64-2.8.3-snap201311122318
perl-xCAT-2.8.3-snap201311122316
xCAT-server-2.8.3-snap201311122316
xCAT-nbroot-core-x86-2.3.1-snap200911062009
xCAT-2.8.3-snap201311122318
xCAT-nbroot-oss-x86_64-2.0-snap200801291344
xCAT-nbroot-core-x86_64-2.3.1-snap200911062009
xCAT-nbkernel-x86_64-2.6.18_128-8
xCAT-genesis-base-x86_64-2.8-snap201308090229
xCAT-nbroot-oss-x86-2.0-snap200804021050
xCAT-nbroot-core-ppc64-2.3.1-snap200911062009
xCAT-nbkernel-ppc64-2.6.18_92-4
xCAT-buildkit-2.8.3-snap201311122318
xCAT-client-2.8.3-snap201311122316

Except for the ppc64 packages (which I don't really need anyway since we
use x86_64) everything is up to the current version. The xCAT nbroot-core
and nbkernel RPMs aren't up to 2.8.X but it seems that they are still at
the most current version available (I have my yum repos pointing to the
latest tarballs of the core and dep packages).
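
The manual comparison above can be approximated with a small script that reads `rpm -qa | grep xCAT` output and reports packages whose version differs from the target. This is a hedged sketch: the regular expression is a rough name-version-release split written for the package names in this thread, not rpm's own parsing rules, and the function name is invented.

```python
import re

# Rough "name-version-release" split: the version is the first dash-separated
# field that starts with a digit and is followed by exactly one more field.
RPM_RE = re.compile(r"^(?P<name>.+?)-(?P<version>\d[^-]*)-(?P<release>[^-]+)$")


def packages_not_at(rpm_lines, target):
    """Return (name, version) pairs for packages whose version is not `target`."""
    stale = []
    for line in rpm_lines:
        match = RPM_RE.match(line.strip())
        if match and match.group("version") != target:
            stale.append((match.group("name"), match.group("version")))
    return stale


installed = [
    "xCAT-server-2.8.3-snap201311122316",
    "xCAT-nbkernel-x86_64-2.6.18_128-8",
    "xCAT-genesis-base-x86_64-2.8-snap201308090229",
    "perl-xCAT-2.8.3-snap201311122316",
]

for name, version in packages_not_at(installed, "2.8.3"):
    print(name, version)
```

As noted above, a package showing up in this list is not necessarily a problem; some of the nbroot and nbkernel packages simply have no newer build.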

Thanks for all the help and suggestions. I'm just going to try using it now.
This was all prep so that I can deploy some service nodes (as mentioned in
my other emails to the mailing list).

Thanks,
Josh







On Tue, Jan 14, 2014 at 1:55 PM, Lissa Valletta lis...@us.ibm.com wrote:

 Two possibilities: the daemon did not restart correctly, which has been
 common when upgrading from 2.7.6 to higher levels.
 service xcatd stop
 ps -ef | grep xcatd
 kill -9 any hung xcatd processes
 service xcatd start
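
 The "find any hung xcatd processes" step can be sketched as a small parser over `ps -ef` output. This is an illustration only: the function name is invented, and the sample process titles and PIDs below are made up rather than taken from a real system.

```python
def xcatd_pids(ps_output):
    """Return PIDs from `ps -ef` output whose command mentions xcatd (ignoring grep)."""
    pids = []
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces).
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or something unexpected
        command = fields[7]
        if "xcatd" in command and "grep" not in command:
            pids.append(int(fields[1]))
    return pids


# Hypothetical ps -ef output captured after `service xcatd stop`.
sample = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root      4121     1  0 13:40 ?        00:00:02 xcatd: SSL listener
root      4122  4121  0 13:40 ?        00:00:01 xcatd: DB Access
root      4307  4290  0 13:52 pts/0    00:00:00 grep xcatd
"""

print(xcatd_pids(sample))
```

 Any PIDs still listed after the stop are the candidates for kill -9 before starting the service again.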

 The other thing is to make sure your rpms are all at the 2.8.3 level.

 rpm -qa | grep xCAT


 Lissa K. Valletta
 8-3/B10
 Poughkeepsie, NY 12601
 (tie 293) 433-3102




 From: Josh Nielsen jniel...@hudsonalpha.org
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net,
 Date: 01/14/2014 01:05 PM
 Subject: [xcat-user] Update schema after upgrade prematurely terminated
 --




Re: [xcat-user] Update schema after upgrade prematurely terminated

2014-01-14 Thread Josh Nielsen

Whew. Great! I thought I caused that by terminating the schema update early
at first. So, speaking of which, would the schema update have completed
when I launched it the next time? I'm still not clear on that. Would the
xcat daemon just refuse to launch if it hadn't completed the schema update?

Thanks,
Josh


On Tue, Jan 14, 2014 at 2:06 PM, Lissa Valletta lis...@us.ibm.com wrote:

 Unrecognized escape \h passed through at
 /opt/xcat/lib/perl/xCAT/zvmUtils.pm line 2408 is a known problem. You
 can ignore it. The file is only used on z-series.
 If you want to get rid of it:
  mv /opt/xcat/lib/perl/xCAT/zvmUtils.pm /opt/xcat/lib/perl/xCAT/zvmUtils.pm.save
 and
  mv /opt/xcat/lib/perl/xCAT_plugin/zvm.pm /opt/xcat/lib/perl/xCAT_plugin/zvm.pm.save

 This is the defect:
 https://sourceforge.net/p/xcat/bugs/3946/

 Lissa K. Valletta
 8-3/B10
 Poughkeepsie, NY 12601
 (tie 293) 433-3102




 From: Jarrod B Johnson/Raleigh/IBM@IBMUS
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net,
 Date: 01/14/2014 01:33 PM
 Subject: Re: [xcat-user] Update schema after upgrade prematurely
 terminated
 --



 FYI, that should be safe.  The schema updates *within* 2.x have generally
 been backward compatible.  The schema update is done (it checks the tables
 and columns instead of something like a version string), so you are good on
 that front.
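
 The idea of checking tables and columns rather than a version string can be sketched as follows. This is not xCAT's actual code: it uses an in-memory SQLite database as a stand-in for MySQL, the helper name is invented, and only the mpa column names come from the updateschema output in this thread. The point is that a run interrupted partway can simply be repeated, because columns that already exist are skipped.

```python
import sqlite3


def add_missing_columns(conn, table, wanted):
    """Add each column in `wanted` (name -> SQL type) that `table` lacks.

    Returns the list of column names that were actually added.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    added = []
    for name, sql_type in wanted.items():
        if name not in existing:
            conn.execute(f'ALTER TABLE {table} ADD COLUMN "{name}" {sql_type}')
            added.append(name)
    return added


conn = sqlite3.connect(":memory:")
# Pretend an earlier, interrupted run already added `displayname`.
conn.execute("CREATE TABLE mpa (mpa TEXT, displayname TEXT)")

wanted = {"displayname": "TEXT", "slots": "TEXT", "urlpath": "TEXT"}
first_run = add_missing_columns(conn, "mpa", wanted)
second_run = add_missing_columns(conn, "mpa", wanted)
print(first_run, second_run)
```

 Under this model the second run has nothing left to do, which matches the observation that rerunning xcatd -v printed no further updateschema lines.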

 The error message you are getting is unrelated; I would have to ask the
 mainframe guys about what that expression is trying to do...

 XCATBYPASS=1 nodels --version
 nodels --version
 ps axf|grep -i xcatd


 From: Josh Nielsen jniel...@hudsonalpha.org
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
 Date: 01/14/2014 01:05 PM
 Subject: [xcat-user] Update schema after upgrade prematurely terminated
 --




Re: [xcat-user] Node discovery for VMs in VMWare

2014-01-14 Thread Josh Nielsen

 Next, you need to make sure that nodeset has run correctly on your
management node. Did you see a message like this?
 node name : netboot sles11.2-x86_64-compute

Actually, no, nodeset didn't return anything. I tried nodeset xcat-serv1
install. The hyphen in the node name isn't problematic is it? After the
upgrade to 2.8.3 the warning about the SN daemons not being up is gone now
(which is good), but nothing is returned to the console when I do a nodeset
(it completes silently). I don't think it is working, and I didn't see an
entry created for it in /tftpboot/xcat/xnba/nodes/.
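
A quick scripted version of that check might look like the sketch below. It assumes, without confirmation from the xCAT docs, that a successful nodeset leaves a regular file named after the node directly under /tftpboot/xcat/xnba/nodes/; the function name is invented for illustration.

```python
from pathlib import Path


def nodeset_entry_exists(node, base="/tftpboot/xcat/xnba/nodes"):
    """Return True if a regular file named after `node` exists under `base`.

    Assumption: the xnba entry is a file whose name is exactly the node name.
    """
    return (Path(base) / node).is_file()


if __name__ == "__main__":
    for node in ("xcat-serv1", "xcat-serv2"):
        state = "present" if nodeset_entry_exists(node) else "missing"
        print(node, state)
```

If both nodes report "missing" right after a nodeset call, that would back up the suspicion that the command is silently doing nothing.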

Here are the entries in my tables:

*tabdump nodelist:*
...
xcat-serv2,service,ipmi,all,,
xcat-serv1,service,ipmi,all,,

*tabdump servicenode:*
#node,nameserver,dhcpserver,tftpserver,nfsserver,conserver,monserver,ldapserver,ntpserver,ftpserver,nimserver,ipforward,dhcpinterfaces,comments,disable
service,2,1,1,1,1,,,1,,

*lsdef service:*

Object name: xcat-serv1
arch=x86_64
groups=service,ipmi,all
installnic=mac
ip=172.20.3.1
mac=00:50:56:bc:0c:db
netboot=xnba
nfsserver=10.X.X.X
nodetype=osi
os=centos6.4
postscripts=syslog,remoteshell,ospkgs,addsiteyum,otherpkgs,syncfiles,ifcfg-eth,fstab,servicenode
primarynic=mac
profile=service
provmethod=centos6.4-x86_64-install-compute
setupconserver=1
setupdhcp=1
setupnameserver=2
setupnfs=1
setupntp=1
setuptftp=1
tftpserver=10.X.X.X
xcatmaster=10.X.X.X
Object name: xcat-serv2
arch=x86_64
groups=service,ipmi,all
installnic=mac
ip=172.20.3.2
netboot=xnba
nfsserver=10.X.X.X
nodetype=osi
os=centos6.4
postscripts=syslog,remoteshell,ospkgs,addsiteyum,otherpkgs,syncfiles,ifcfg-eth,fstab,servicenode
primarynic=mac
profile=service
provmethod=centos6.4-x86_64-install-compute
setupconserver=1
setupdhcp=1
setupnameserver=2
setupnfs=1
setupntp=1
setuptftp=1
tftpserver=10.X.X.X
xcatmaster=10.X.X.X

And the OS images:

*lsdef -t osimage:*

centos6.4-x86_64-install-all  (osimage)
centos6.4-x86_64-install-compute  (osimage)
centos6.4-x86_64-install-iscsi  (osimage)
centos6.4-x86_64-install-kvm  (osimage)
centos6.4-x86_64-install-login  (osimage)
centos6.4-x86_64-install-storage  (osimage)
centos6.4-x86_64-install-xen  (osimage)
centos6.4-x86_64-netboot-compute  (osimage)
centos6.4-x86_64-netboot-kvm  (osimage)
centos6.4-x86_64-netboot-xen  (osimage)
centos6.4-x86_64-statelite-compute  (osimage)
centos6.4-x86_64-statelite-kvm  (osimage)
centos6.4-x86_64-statelite-xen  (osimage)
rhels5.3-x86_64-install-compute  (osimage)
rhels5.3-x86_64-install-iscsi  (osimage)
rhels5.3-x86_64-install-kvm  (osimage)
rhels5.3-x86_64-install-service  (osimage)
rhels5.3-x86_64-install-storage  (osimage)
rhels5.3-x86_64-install-xen  (osimage)

Any idea what might be going wrong?

Thanks,
Josh

On Mon, Jan 13, 2014 at 7:12 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

 Regarding the dispatching of xCAT command requests to service nodes: in
 certain cases (e.g. creating the dhcpd configuration) the request is sent to
 all the service nodes in the xCAT database (those configured in the
 servicenode table) so that every service node can handle DHCP requests for
 load balancing.

 In your current situation, you don't need a service node to handle your
 target nodes (they are service nodes themselves); the xCAT MN will cover
 them. So you can ignore this error message.

 Next, you need to make sure that nodeset has run correctly on your
 management node. Did you see a message like this?

node name : netboot sles11.2-x86_64-compute



 Thanks
 Best Regards
 --
 Wang Xiaopeng (王晓朋)
 IBM China System Technology Laboratory
 Tel: 86-10-82453455
 Email: w...@cn.ibm.com
 Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
 Haidian District Beijing P.R.China 100193


 From: Josh Nielsen jniel...@hudsonalpha.org
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net,
 Date: 2014/01/14 07:39
 Subject: Re: [xcat-user] Node discovery for VMs in VMWare
 --



 Thank you, that information is very useful. I just realized that I have an
 older version of xCAT (2.7.3) and need to upgrade to get those commands for
 node discovery. I will do that soon.

 Also when I did a 'nodeset' on one of the service nodes I saw an error, as
 if xcat were already trying to delegate commands to service nodes that
 aren't even provisioned yet. I'm pretty sure it would do this regardless of
 what node I tried to nodeset. What I can't tell is whether the command was
 successful

Re: [xcat-user] Node discovery for VMs in VMWare

2014-01-13 Thread Josh Nielsen

Thank you, that information is very useful. I just realized that I have an
older version of xCAT (2.7.3) and need to upgrade to get those commands for
node discovery. I will do that soon.

Also when I did a 'nodeset' on one of the service nodes I saw an error, as
if xcat were already trying to delegate commands to service nodes that
aren't even provisioned yet. I'm pretty sure it would do this regardless of
what node I tried to nodeset. What I can't tell is whether the command was
successful or not, despite the errors saying that it can't reach the
service node daemons. Here is what I got:

[root@x3650-head01 ~]# nodeset xcat-serv1
Error: Unable to dispatch hierarchical sub-command to xcat-serv1:3001.
 This service node may be down or its xcatd daemon may not be responding.
Error: Unable to dispatch hierarchical sub-command to xcat-serv2:3001.
 This service node may be down or its xcatd daemon may not be responding.

That is very odd since this presents a chicken-and-egg problem if the
nodeset is not working (the nodes can't be required to be present before
they are provisioned). I followed the xCAT guide to fully define the
service nodes in the xCAT tables first, but it appears something in the
tables is making xCAT think that the service nodes are/should be up
already. Can I ignore this or is nodeset not going to work until I fix this
somehow? Maybe this will be fixed by upgrading xCAT though.
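
To tell apart "the service node's xcatd is not listening" from other failures, a plain TCP probe of the port named in the error (3001) is enough. This is only a sketch with an invented helper name; it checks that something accepts connections on the port, not that the listener is actually xcatd.

```python
import socket


def port_open(host, port=3001, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling port_open("xcat-serv1") on the management node would be expected to return False for a node that has not been provisioned yet, which is consistent with the dispatch errors above.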

-Josh




On Sun, Jan 12, 2014 at 8:29 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

 You saw the discovery message because the node configuration was
 not finished yet. After you run 'nodeset' and start the node deployment,
 the discovery message won't show again.

 Regarding discovery, you can use 'sequential discovery' to discover
 virtual nodes. Refer to:
 https://sourceforge.net/apps/mediawiki/xcat/index.php?title=XCAT_iDataPlex_Cluster_Quick_Start#Option_1:_Sequential_Discovery
 Another easy manual method is:

You can check for undiscovered nodes with the command 'nodediscoverls -t
all' (any node that is sending discovery requests is recorded in the
discoverydata table and can be displayed by the nodediscoverls command). Then
run 'nodediscoverdef' to turn the undiscovered node into a node definition
in the xCAT db.

 See examples:
 # nodediscoverls -t all
   UUID                                  NODE   METHOD  MTM  SERIAL
   643CC79B-B7BD-224F-8105-9AEF8329E195  undef  undef        9AA99E16

 # nodediscoverls -t all -l
 Object uuid: 643CC79B-B7BD-224F-8105-9AEF8329E195
 method=undef
 discoverytime=01-13-2014 01:14:28
 arch=x86_64
 cpucount=2
 cputype=Intel Core 2 Duo P9xxx (Penryn Class Core 2)
 memory=3770188
 serial=9AA99E16-BAAA-11E1-B9E3-3440B5E0E828_34:40:B5:E0:E8:28
 nicdriver=eth0!e1000
 nicipv4=eth0!10.1.100.114/16
 nichwaddr=eth0!00:1A:4A:01:00:5D
 nicpci=eth0!:00:03.0
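
 The long listing above is easy to turn into a lookup table, for example to find the MAC address of an undiscovered VM before defining it. The sketch below is illustrative: the function name is invented, and it assumes the "Object uuid:" header and key=value layout shown in the example.

```python
def parse_discovery_stanzas(text):
    """Parse `nodediscoverls -t all -l` style output into {uuid: {key: value}}."""
    records = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Object uuid:"):
            current = {}
            records[line.split(":", 1)[1].strip()] = current
        elif current is not None and "=" in line:
            # Split on the first "=" only; values such as nicipv4 may contain "!" or "/".
            key, value = line.split("=", 1)
            current[key] = value
    return records


sample = """\
Object uuid: 643CC79B-B7BD-224F-8105-9AEF8329E195
 method=undef
 arch=x86_64
 nicipv4=eth0!10.1.100.114/16
 nichwaddr=eth0!00:1A:4A:01:00:5D
"""

for uuid, attrs in parse_discovery_stanzas(sample).items():
    print(uuid, attrs["nichwaddr"])
```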

 Thanks
 Best Regards
 --
 Wang Xiaopeng (王晓朋)
 IBM China System Technology Laboratory
 Tel: 86-10-82453455
 Email: w...@cn.ibm.com
 Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
 Haidian District Beijing P.R.China 100193


 From: Josh Nielsen jniel...@hudsonalpha.org
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net,
 Date: 2014/01/11 08:09
 Subject: Re: [xcat-user] Node discovery for VMs in VMWare
 --



 I tried the manual insertion of the MAC address and I was able to get a
 DHCP lease for the VM (IP 10.20.240.254) but for some reason it just kept
 saying it was processing the discovery request like this over and over
 again:

 Jan 10 17:45:11 x3650-head01 xCAT: xcatd: Processing discovery request
 from 10.20.240.254
 Jan 10 23:45:14 10.20.240.254 (none) dhclient[310]: XMT: Solicit on eth0,
 interval 125010ms.
 Jan 10 17:45:16 x3650-head01 xCAT: xcatd: Processing discovery request
 from 10.20.240.254
 Jan 10 23:45:17 10.20.240.254 (none) dhclient[321]: XMT: Solicit on eth1,
 interval 112560ms.
 Jan 10 17:45:21 x3650-head01 xCAT: xcatd: Processing discovery request
 from 10.20.240.254
 Jan 10 17:45:36 x3650-head01 last message repeated 3 times

 I'll have to debug this more after the weekend.

 -Josh



 On Fri, Jan 10, 2014 at 3:22 PM, Josh Nielsen jniel...@hudsonalpha.org
 wrote:

Hello all,

I would like to provision some new nodes in my xCAT cluster inside
VMWare but I just realized that the way xCAT has identified and discovered
my nodes in the past has been by dedicated physical port numbers on the
switches as defined in the xCAT switch database table

Re: [xcat-user] How to create and deploy an xCAT Service Node

2014-01-10 Thread Josh Nielsen

Hi Wang Xiaopeng (I apologize if I got your name wrong before),

You said: SN cannot work without MN since there's only one database which
is running on MN. If MN is down you cannot run lots of commands on SN. In
fact xCAT does NOT recommend to run any command on SN.

Will DNS and DHCP still work however from an SN when the MN is down? I
would need the SN to still resolve the compute nodes' DNS requests on the
cluster subnet and issue DHCP leases/renewals as necessary. It looks like
SNs are intended to take responsibility on behalf of the MN for serving the
compute nodes' necessary services, such that you designate each node's
xcatmaster as the SN like: chdef -t group compute1 servicenode=sn1
xcatmaster=sn1-nic. If that is the case then shouldn't DNS and DHCP still be
served from the SN to a node even when the MN is down?

Thanks,
Josh


On Wed, Jan 8, 2014 at 6:32 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

 The doc mentioned by Josh will lead you in the correct direction to setup
 Service Node.

 Some information for you:

Roughly, your understanding is correct: you can install and configure
an SN automatically if the SN node definition is prepared correctly
(correct group like service, correct attributes like postscripts, correct
osimage like rhels6.4-x86_64-install-service) along with the xCAT rpm repository.

Generally, /tftpboot and /install on the SN are mounted from the xCAT MN,
so they will be identical to the directories on the MN at all times.

The SN cannot work without the MN since there is only one database, which is
running on the MN. If the MN is down you cannot run lots of commands on the SN.
In fact xCAT does NOT recommend running any commands on the SN.



 Thanks
 Best Regards
 --
 Wang Xiaopeng (王晓朋)
 IBM China System Technology Laboratory
 Tel: 86-10-82453455
 Email: w...@cn.ibm.com
 Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
 Haidian District Beijing P.R.China 100193


 From: Josh Nielsen jniel...@hudsonalpha.org
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net,
 Date: 2014/01/09 03:48
 Subject: Re: [xcat-user] How to create and deploy an xCAT Service Node
 --



 Hi Russell,

 For some reason that wiki page never came up in my Google searches even
 when I placed quotes around "service node". Thanks! I'll take a look
 through it.

 Regards,
 Josh


 On Wed, Jan 8, 2014 at 12:25 PM, Russell Jones russell-l...@jonesmail.me wrote:

Hi Josh,

This should be helpful for you (it was for me when first learning how
to set this up):

 http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Setting_Up_a_Linux_Hierarchical_Cluster



On 1/8/2014 11:52 AM, Josh Nielsen wrote:
   Hello all,

   I am trying to move my xCAT installation toward a hierarchical
   structure by including Service Nodes, but I am finding a dearth of
   information on how to create a service node on the xCAT wiki and even in
   the Sumavi xCAT guide. At first I began following the instructions for
   setting up a management node (which involves a new xCAT install, an
   independent xCAT database, etc.) on a VM intended to be the SN, but then
   realized that that must be wrong.

   I have noticed the xCAT 'servicenode' database table present on the
   Management Node (listed with tabdump). I am wondering if I can simply
   designate a new node (in my case a VM) to be the prospective SN, designate
   which services (like dhcp, dns, etc.) to enable/disable in the
   'servicenode' table, and then do a fresh 'rinstall' to that VM and have it
   be automatically and fully configured as an SN from postinstall scripts
   (which I presume are already present and will be automatically applied to
   that node by some xCAT logic of parsing the 'servicenode' table and
   associating any relevant postscripts to be run with the node).

   If so, will it create the /install and /tftpboot directories
   (including ISOs made with copycds) on each SN identical to the contents of
   the MN? Is it that simple or is there more involved than that?

   In addition I want to be able to take down the Management Node in
   the future for maintenance and be able to do just about everything that the
   MN can do from the SNs (including dhcp leases, rpower commands, updatenode,
   psh, etc.) while the MN is down. Is that possible? Any clarifications would
   be appreciated.

   Thank you,
   Josh Nielsen

Re: [xcat-user] How to create and deploy an xCAT Service Node

2014-01-10 Thread Josh Nielsen

Thank you Lissa, that is helpful.

-Josh


On Fri, Jan 10, 2014 at 1:25 PM, Lissa Valletta lis...@us.ibm.com wrote:

 DNS and DHCP will still work from the Service Node, if set up correctly.
 In other words, if you have configured the service node as the DNS server
 and/or DHCP server for the nodes, there is no requirement on the
 Management Node for DNS or DHCP. You will not be able to run any
 xCAT commands on the service node if the Management Node is down. xCAT
 requires access to the database configured on the MN for the xCAT cluster
 (MySQL, PostgreSQL) to run most xCAT commands, even to recognize that the
 node is in the xCAT cluster.

 Lissa K. Valletta
 8-3/B10
 Poughkeepsie, NY 12601
 (tie 293) 433-3102




 From: Josh Nielsen jniel...@hudsonalpha.org
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net,
 Date: 01/10/2014 12:59 PM
 Subject: Re: [xcat-user] How to create and deploy an xCAT Service Node
 --



 Hi Wang Xiaopeng (I apologize if I got your name wrong before),

 You said: SN cannot work without MN since there's only one database which
 is running on MN. If MN is down you cannot run lots of commands on SN. In
 fact xCAT does NOT recommend to run any command on SN.

 Will DNS and DHCP still work however from an SN when the MN is down? I
 would need the SN to still resolve the compute nodes' DNS requests on the
 cluster subnet and issue DHCP leases/renewals as necessary. It looks like
 SNs are intended to take responsibility on behalf of the MN for serving the
 compute nodes' necessary services, such that you designate each node's
 xcatmaster as the SN like: chdef -t group compute1 servicenode=sn1
 xcatmaster=sn1-nic. If that is the case then shouldn't DNS and DHCP still be
 resolved from the SN by a node even when the MN is down?

 Thanks,
 Josh


 On Wed, Jan 8, 2014 at 6:32 PM, Xiao Peng Wang w...@cn.ibm.com wrote:

The doc mentioned by Josh will lead you in the correct direction to
set up a Service Node.

Some information for you:

   Roughly, your understanding is correct: you can install and configure
   an SN automatically once the SN node definition (correct group like
   service, correct attributes like postscripts, correct osimage like
   rhels6.4-x86_64-install-service) and the xCAT rpm repository are
   prepared and configured correctly.

   Generally, /tftpboot and /install on the SN are mounted from the xCAT
   MN, so they will be identical to the directories on the MN at any time.

   An SN cannot work without the MN since there is only one database,
   which runs on the MN. If the MN is down you cannot run lots of commands
   on the SN. In fact xCAT does NOT recommend running any command on the SN.


Thanks
Best Regards
--
Wang Xiaopeng (王晓朋)
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: *w...@cn.ibm.com* w...@cn.ibm.com
Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
Haidian District Beijing P.R.China 100193


From: Josh Nielsen jniel...@hudsonalpha.org

To: xCAT Users Mailing list xcat-user@lists.sourceforge.net,

Date: 2014/01/09 03:48
Subject: Re: [xcat-user] How to create and deploy an xCAT Service Node
--



Hi Russell,

For some reason that wiki page never came up in my Google searches
even when I placed quotes around "service node". Thanks! I'll take a look
through it.

Regards,
Josh


On Wed, Jan 8, 2014 at 12:25 PM, Russell Jones
russell-l...@jonesmail.me wrote:
   Hi Josh,

   This should be helpful for you (it was for me when first learning
   how to set this up):
   
   http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Setting_Up_a_Linux_Hierarchical_Cluster



   On 1/8/2014 11:52 AM, Josh Nielsen wrote:
  Hello all,

  I am trying to move my xCAT installation toward a hierarchical
  structure by including Service Nodes but I am finding a dearth of
  information on how to create a service node on the xCAT wiki and
  even the Sumavi xCAT guide. At first I

[xcat-user] Node discovery for VMs in VMWare

2014-01-10 Thread Josh Nielsen

Hello all,

I would like to provision some new nodes in my xCAT cluster inside VMWare
but I just realized that the way xCAT has identified and discovered my
nodes in the past has been by dedicated physical port numbers on the
switches as defined in the xCAT switch database table. That is not a
problem for physical nodes which have a single cable going into a port on a
switch dedicated only to that node.

For example one of my nodes has this entry in the switch table:

#node,switch,port,vlan,interface,comments,disable
hinode,bnt01,44,1,eth0,bnt01:p44

This tells xCAT that the node found connected to port 44 on switch bnt01
can be identified as node hinode. With a VMWare cluster however a VM may
reside on many ports (in my case I have 3 ESX servers - and 3 corresponding
cables to the switch - which the VM may be on) at any given time, depending
on which VMWare host it is running on. And on top of that the network
connection is trunked so that every VM running on each ESX host goes
through the same network port.

Has anyone provisioned nodes in VMWare before with xCAT? And if there is
not a way to do this automatically by the switch/port associations, is
there another manual way that I can discover the node? Would manually
inserting the node's MAC address into the xCAT 'mac' table work, or is more
than that required for (auto)discovery?

Thanks,
Josh Nielsen
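
To make the port-based lookup described above concrete, here is a minimal Python sketch. It is not xCAT code: it only assumes tabdump-style CSV text for the switch table, and the names SWITCH_TABLE, load_switch_table and find_nodes, the rows vmnode1 and vmnode2, and port 48 are all hypothetical. It shows why a trunked ESX uplink is ambiguous for discovery: several VMs end up behind the same switch/port pair.

```python
import csv

# Hypothetical tabdump-style dump of the 'switch' table. Only the 'hinode'
# row comes from the message above; the vmnode rows are made up.
SWITCH_TABLE = """#node,switch,port,vlan,interface,comments,disable
hinode,bnt01,44,1,eth0,bnt01:p44
vmnode1,bnt01,48,1,eth0,
vmnode2,bnt01,48,1,eth0,
"""


def load_switch_table(text):
    """Parse the CSV dump into a list of dicts keyed by column name."""
    lines = text.strip().splitlines()
    header = lines[0].lstrip("#").split(",")
    rows = []
    for row in csv.reader(lines[1:]):
        # Pad rows that omit trailing columns (e.g. 'disable').
        row = row + [""] * (len(header) - len(row))
        rows.append(dict(zip(header, row)))
    return rows


def find_nodes(rows, switch, port):
    """Return every node name registered on the given switch and port."""
    return [r["node"] for r in rows
            if r["switch"] == switch and r["port"] == str(port)]


rows = load_switch_table(SWITCH_TABLE)
print(find_nodes(rows, "bnt01", 44))  # one physical node on a dedicated port
print(find_nodes(rows, "bnt01", 48))  # two VMs behind one trunked uplink
```

A dedicated port yields exactly one node, while the trunked port yields more than one, so the switch/port pair alone cannot tell the VMs apart.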

Re: [xcat-user] How to create and deploy an xCAT Service Node

2014-01-09 Thread Josh Nielsen

That may be what I need actually, because I have to wipe the OS on our
current (physical) headnode to upgrade from RHEL5 to RHEL6 and I need
something (a VM HA management node) up and functional while I am doing that
to service the cluster.

-Josh

On Thu, Jan 9, 2014 at 2:32 PM, Lissa Valletta lis...@us.ibm.com wrote:

Highly Available management nodes make sure that if one Management
Server goes down there is one configured that can take over. Only one
Management Server at a time is active.

Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102


From: Josh Nielsen jniel...@hudsonalpha.org
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net,
Date: 01/09/2014 03:13 PM
Subject: Re: [xcat-user] How to create and deploy an xCAT Service Node
--

Thank you Xiao Peng Wang, that information is useful. I am thinking
that, in that case, Service Nodes are not what I need after all. Our
cluster is not at a scale yet that requires additional repositories to
pull software updates from (which SNs look like they are designed
for). Rather I need additional management points. I looked a little
further and it looks like Highly Available Management Nodes are what I
need:
http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Highly_Available_Management_Node
.

I do have a question about the Service Nodes though since I've already
begun configuring them:

In the Service Node setup documentation on the wiki, under the heading
"Add OS and Hardware Attributes to Service Nodes", they tell you to do
a chdef with parameters like:

chdef -t group service arch=x86_64 os=centos6.4 nodetype=osi ...

My question is about the nodetype. I will be deploying to VMs acting
as the SNs, so should the nodetype not rather be nodetype=vm? What
does nodetype=osi as an osimage even mean when applied to a
physical or virtual machine?

When I do an lsdef on the nodes (that will be on VMs) I see:
# lsdef service
arch=x86_64
groups=service,ipmi,all
installnic=mac
ip=172.20.3.1
netboot=xnba
nfsserver=10.X.X.X
nodetype=osi
...
Object name: xcat-serv2
arch=x86_64
groups=service,ipmi,all
installnic=mac
ip=172.20.3.2
netboot=xnba
nfsserver=10.X.X.X
nodetype=osi
..

But the VMs are vms and not osimages. I'm just wondering about the
semantics of what osi/osimage indicates should be done to a node or
how it will affect deployment.

Thanks,
Josh

On Wed, Jan 8, 2014 at 6:32 PM, Xiao Peng Wang w...@cn.ibm.com wrote:
The doc mentioned by Josh will lead you in the correct direction to set up
a Service Node.

Some information for you:

Roughly, your understanding is correct: you can install and configure an
SN automatically once the SN node definition (correct group like service,
correct attributes like postscripts, correct osimage like
rhels6.4-x86_64-install-service) and the xCAT rpm repository are prepared
and configured correctly.

Generally, /tftpboot and /install on the SN are mounted from the xCAT MN,
so they will be identical to the directories on the MN at any time.

An SN cannot work without the MN since there is only one database, which
runs on the MN. If the MN is down you cannot run lots of commands on the
SN. In fact xCAT does NOT recommend running any command on the SN.

Thanks
Best Regards
--
Wang Xiaopeng (王晓朋)
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: w...@cn.ibm.com
Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
Haidian District Beijing P.R.China 100193


From: Josh Nielsen jniel...@hudsonalpha.org
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net,
Date: 2014/01/09 03:48
Subject: Re: [xcat-user] How to create and deploy an xCAT Service Node

Hi Russell,

For some reason that wiki page never came up in my Google searches even
when I placed quotes around "service node". Thanks! I'll take a look
through it.

Regards,
Josh

On Wed, Jan 8, 2014 at 12:25 PM, Russell Jones
russell-l...@jonesmail.me wrote:

Hi Josh,

This should be helpful for you (it was for me when first learning how to
set this up):

http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Setting_Up_a_Linux_Hierarchical_Cluster

On 1/8/2014 11:52 AM, Josh Nielsen wrote:

Hello all,

I am trying to move my xCAT installation toward a hierarchical structure
by including Service Nodes but I am

[xcat-user] How to create and deploy an xCAT Service Node

2014-01-08 Thread Josh Nielsen

Hello all,

I am trying to move my xCAT installation toward a hierarchical structure by
including Service Nodes but I am finding a dearth of information on how to
create a service node on the xCAT wiki and even the Sumavi xCAT guide. At
first I began following the instructions for setting up a management node
(which involves a new xCAT install, an independent xCAT database, etc.) on
a VM intended to be the SN but then realized that that must be wrong.

I have noticed the xCAT 'servicenode' database table present on the
Management Node (listed with tabdump). I am wondering if I can simply
designate a new node (in my case a VM) to be the prospective SN, designate
which services (like dhcp, dns, etc.) to enable/disable in the
'servicenode' table, and then do a fresh 'rinstall' to that VM and have it
be automatically fully configured as an SN from postinstall scripts
(which I presume are already present and will be automatically applied to
that node by some xCAT logic of parsing the 'servicenode' table and
associating any relevant postscripts to be run with the node).

If so will it create the /install and /tftpboot directories (including ISOs
made with copycds) on each SN identical to the contents of the MN? Is it
that simple or is there more involved than that?

In addition I want to be able to take down the Management Node in the
future for maintenance and be able to do just about everything that the MN
can do from the SNs (including dhcp leases, rpower commands, updatenode,
psh, etc) while the MN is down. Is that possible? Any clarifications would
be appreciated.

Thank you,
Josh Nielsen

Re: [xcat-user] How to create and deploy an xCAT Service Node

2014-01-08 Thread Josh Nielsen

Hi Russell,

For some reason that wiki page never came up in my Google searches even
when I placed quotes around "service node". Thanks! I'll take a look
through it.

Regards,
Josh

On Wed, Jan 8, 2014 at 12:25 PM, Russell Jones russell-l...@jonesmail.me wrote:

Hi Josh,

This should be helpful for you (it was for me when first learning how to
set this up):
http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Setting_Up_a_Linux_Hierarchical_Cluster

On 1/8/2014 11:52 AM, Josh Nielsen wrote:

Hello all,

I am trying to move my xCAT installation toward a hierarchical structure
by including Service Nodes but I am finding a dearth of information on how
to create a service node on the xCAT wiki and even the Sumavi xCAT guide.
At first I began following the instructions for setting up a management
node (which involves a new xCAT install, an independent xCAT database,
etc.) on a VM intended to be the SN but then realized that that must be
wrong.

I have noticed the xCAT 'servicenode' database table present on the
Management Node (listed with tabdump). I am wondering if I can simply
designate a new node (in my case a VM) to be the prospective SN, designate
which services (like dhcp, dns, etc.) to enable/disable in the
'servicenode' table, and then do a fresh 'rinstall' to that VM and have it
be automatically fully configured as an SN from postinstall scripts
(which I presume are already present and will be automatically applied to
that node by some xCAT logic of parsing the 'servicenode' table and
associating any relevant postscripts to be run with the node).

If so will it create the /install and /tftpboot directories (including
ISOs made with copycds) on each SN identical to the contents of the MN? Is
it that simple or is there more involved than that?

In addition I want to be able to take down the Management Node in the
future for maintenance and be able to do just about everything that the MN
can do from the SNs (including dhcp leases, rpower commands, updatenode,
psh, etc) while the MN is down. Is that possible? Any clarifications would
be appreciated.

Thank you,
Josh Nielsen


Re: [xcat-user] dhcp timeout in xnba

2013-10-29 Thread Josh Nielsen

Can you tell xnba to put you directly into a shell instead of discovery
mode (by modifying the chains table or setting 'nodeset [node]
runcmd=shell'), and then set a temp static IP and try some ping tests from
there?

I forget how far you have to get to drop to a shell. It seems I've had to
do that before in a shell though. For me on a previous occasion we had to
ensure the switch port was configured as an STP edge port and to make sure
that it communicated over the default VLAN.

Additionally sometimes stale DHCP IP assignment entries (with the MAC
address) were present in /var/lib/dhcpd.leases and we had to manually
delete them and restart dhcpd and try again (this was in a RHEL/Centos5
environment). I don't know why that would cause a timeout though. Were you
able to verify that the DHCP offer was coming from your xCat server in each
case and not (potentially) another DHCP server (rogue or otherwise)? I had
problems one time when I had two xCat servers/headnodes from two different
clusters cross talking and issuing IPs to newly provisioned nodes in the
other's cluster. Just a few ideas. You never know.

Josh Nielsen
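
As an illustration of the stale-lease check mentioned above, here is a minimal Python sketch that lists which lease entries mention a given MAC address. It assumes the usual ISC dhcpd leases layout ("lease <ip> { ... hardware ethernet <mac>; ... }") with no nested braces; the sample text, the MAC addresses, and the names SAMPLE_LEASES and leases_for_mac are made up for the example. It only reads text; any actual cleanup of the leases file would still be done by hand with dhcpd stopped.

```python
import re

# Made-up sample in the ISC dhcpd leases layout; addresses are illustrative.
SAMPLE_LEASES = """lease 10.20.253.236 {
  starts 2 2013/10/29 14:00:00;
  ends 2 2013/10/29 20:00:00;
  hardware ethernet 00:1a:64:aa:bb:cc;
}
lease 10.20.253.237 {
  starts 2 2013/10/29 14:05:00;
  hardware ethernet 00:1a:64:dd:ee:ff;
}
lease 10.20.253.240 {
  starts 1 2013/10/28 09:00:00;
  hardware ethernet 00:1A:64:AA:BB:CC;
}
"""

# One lease block: the IP after 'lease' and everything up to the closing brace.
LEASE_RE = re.compile(r"lease\s+(\S+)\s*\{(.*?)\}", re.DOTALL)
MAC_RE = re.compile(r"hardware\s+ethernet\s+([0-9A-Fa-f:]+);")


def leases_for_mac(text, mac):
    """Return the IPs of all lease blocks recorded for this MAC address."""
    wanted = mac.lower()
    hits = []
    for ip, body in LEASE_RE.findall(text):
        match = MAC_RE.search(body)
        if match and match.group(1).lower() == wanted:
            hits.append(ip)
    return hits


print(leases_for_mac(SAMPLE_LEASES, "00:1a:64:aa:bb:cc"))
```

More than one IP for the same MAC is a hint that an old entry is still lying around.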


On Mon, Oct 28, 2013 at 12:35 PM, Roland Santos roland.san...@gmail.com wrote:

 I'm getting a dhcp timeout issue with xnba/genesis. On discovery, a node
 will boot from the network and get the xnba image. It then times out
 getting network information via dhcp from within xnba. I'm running xCAT
 version 2.8.2, if it helps.

 On the xcat management server side, I see the initial  sequence of DHCP
 requests coming from the node - DHCPDISCOVER, DHCPOFFER, DHCPREQUEST, then
 DHCPACK. After the node downloads the xnba image and loads it, it then
 tries to do another dhcp request, where the request times out. On the xcat
 side, I see just a DHCPDISCOVER and a DHCPOFFER, but no further.

 Any help would be appreciated.



Re: [xcat-user] Getdestiny failing - can't open '/tmp/dhcpserver'

2012-08-07 Thread Josh Nielsen

Okay, it turns out that the xnba-undi package was outdated and for some
reason it was not updated from xcat-dep when we installed xCAT 2.7.3. I
guess it was not a required dependency hence was not grabbed
automatically for update with the new xCAT. I can boot Genesis now after
updating that package.

Also, it looks like the source rpm for that xnba-undi package just adds two
files:

/tftpboot/xcat/xnba.efi
/tftpboot/xcat/xnba.kpxe

The timestamp on them is now Feb 6 2012 and previously my xnba.kpxe
timestamp was Aug 24  2009 and I did not have xnba.efi before at all. Why
are these files needed with genesis? For my own edification does genesis
run *on top* of xnba or does it boot an entirely new image once xnba
fetches the genesis kernel images?

Thanks,
Josh

On Tue, Aug 7, 2012 at 5:24 PM, Josh Nielsen jniel...@hudsonalpha.com wrote:

 Hi Jarrod,

 Okay, I upgraded to xCAT 2.7.3 and installed the xCAT-genesis-x86_64 and
 elilo-xCAT RPMs and I reran mknb x86_64 to recreate the
 /tftpboot/xcat/xnba/nets files and it also reported Creating
 genesis.fs.x86_64.gz in /tftpboot/xcat. But I still get a missing NIC
 driver error when I PXE boot (this time explicit - confirming this is the
 problem). Upon PXE boot the clients get yaboot first, then load xnba.kpxe,
 which according to my /etc/dhcpd.conf file is set up to query the
 appropriate nets file:

 if option user-class-identifier = "xNBA" and option client-architecture = 00:00 { #x86, xCAT Network Boot Agent
    always-broadcast on;
    filename = "http://10.20.0.1/tftpboot/xcat/xnba/nets/10.20.0.0_16";

 Inside that file is:

 [root@x3650-head01 etc]# cat /tftpboot/xcat/xnba/nets/10.20.0.0_16
 #!gpxe
 imgfetch -n kernel http://${next-server}/tftpboot/xcat/genesis.kernel.x86_64 xcatd=10.20.0.1:3001  BOOTIF=01-${netX/machyp}
 imgfetch -n nbfs http://${next-server}/tftpboot/xcat/genesis.fs.x86_64.gz
 imgload kernel
 imgexec kernel

 In my apache log I see:

 10.20.253.236 - - [07/Aug/2012:13:36:23 -0500] GET /tftpboot/xcat/xnba/nets/10.20.0.0_16 HTTP/1.0 200 235 - gPXE/0.9.7
 10.20.253.236 - - [07/Aug/2012:13:36:23 -0500] GET /tftpboot/xcat/genesis.kernel.x86_64 HTTP/1.0 200 3942032 - gPXE/0.9.7
 10.20.253.236 - - [07/Aug/2012:13:36:23 -0500] GET /tftpboot/xcat/genesis.fs.x86_64.gz HTTP/1.0 200 20210204 - gPXE/0.9.7

 Then genesis boots and (after I removed 'quiet' from the kernel arguments)
 does some initial boot checks and then it goes into a loop of dumping to
 the screen the help/syntax screen for grep (which indicates to me that what
 it is greping for is failing - possibly /tmp/dhcpserver like before).
 Eventually it gives up and prints this to the screen:

 ERROR Unable to find boot device (maybe the nbroot is missing the
 driver for your nic?)

 At that point it just sits there, and does not try anything else.

 What have I done wrong here? Is maybe the BOOTIF argument to genesis
 kernel wrong? Also why is it looking for nbroot? There was a previous
 bootloader in /opt/xcat/share/xcat/netboot/x86_64/nbroot/ but it should
 be looking in /opt/xcat/share/xcat/netboot/genesis/x86_64/ since it is
 using genesis, correct?

 Any ideas?

 -Josh


 On Wed, Jul 25, 2012 at 1:37 PM, Jarrod B Johnson jbjoh...@us.ibm.com wrote:

 Hmm, with xcat 2.7.3 you should be pulling in the 'xCAT-genesis' packages
 that replace the environment with something newer that has the appropriate
 nic drivers...
 -Josh Nielsen jniel...@hudsonalpha.com wrote: -

 To: xcat-user@lists.sourceforge.net
 From: Josh Nielsen jniel...@hudsonalpha.com
 Date: 07/25/2012 12:59PM
 Subject: [xcat-user] Getdestiny failing - can't open '/tmp/dhcpserver'

 Hello,

 I have some new IBM System X DX360M4 nodes (all our previous ones were
 DX360M3s) that I am trying to autodiscover with xCAT and I am running into
 the same problem as in this mail thread:
 http://www.mail-archive.com/xcat-user@lists.sourceforge.net/msg01267.html.
 Essentially the node boots up, does a dhcpdiscover, and grabs a generic
 bootloader (in my case yaboot but it also works with pxelinux.0) which then
 reinitiates the dhcpdiscover and queries again for the appropriate
 bootloader from the file in the xcat/xnba/nets/ folder and is served and
 boots xnba.kpxe. At this point I believe the xnba image is supposed to load
 and execute the autodiscovery process which includes the getdestiny script,
 but all I am seeing are the messages:


 cat: can't open '/tmp/dhcpserver': No such file or directory
 grep: /tmp/destiny: No such file or directory
 grep: /tmp/destiny: No such file or directory


 One reply by Jarrod Johnson to that email thread above mentioned a
 possible network driver issue and suggested using Genesis from xCAT
 v2.7. According to 'xcatconfig -v' I am running Version 2.3.1 of xCAT and I
 am running on Centos 5 (2.6.18-128.el5). Is there a way to get this working
 with my current version of xCAT and OS (inject drivers somehow

[xcat-user] Getdestiny failing - can't open '/tmp/dhcpserver'

2012-07-25 Thread Josh Nielsen

Hello,

I have some new IBM System X DX360M4 nodes (all our previous ones were
DX360M3s) that I am trying to autodiscover with xCAT and I am running into
the same problem as in this mail thread:
http://www.mail-archive.com/xcat-user@lists.sourceforge.net/msg01267.html.
Essentially the node boots up, does a dhcpdiscover, and grabs a generic
bootloader (in my case yaboot but it also works with pxelinux.0) which then
reinitiates the dhcpdiscover and queries again for the appropriate
bootloader from the file in the xcat/xnba/nets/ folder and is served and
boots xnba.kpxe. At this point I believe the xnba image is supposed to load
and execute the autodiscovery process which includes the getdestiny script,
but all I am seeing are the messages:

cat: can't open '/tmp/dhcpserver': No such file or directory
grep: /tmp/destiny: No such file or directory
grep: /tmp/destiny: No such file or directory

One reply by Jarrod Johnson to that email thread above mentioned a possible
network driver issue and suggested using Genesis from xCAT v2.7. According
to 'xcatconfig -v' I am running Version 2.3.1 of xCAT and I am running
on Centos 5 (2.6.18-128.el5). Is there a way to get this working with my
current version of xCAT and OS (inject drivers somehow?), or do I need to
update to v2.7 of xCAT (does that require Centos 6)? In any case, is a
network driver the most likely explanation for what I am seeing? I have
tried everything that I can think of from the switch side to make sure SNMP
is enabled and the port definitions are correct in the switch table for
autodiscovery. Any ideas?

Thanks,
Josh Nielsen

Re: [xcat-user] Getdestiny failing - can't open '/tmp/dhcpserver'

2012-07-25 Thread Josh Nielsen

Okay, we will be doing an upgrade of xCAT soon. I just wanted to double
check.

P.S. Can you take a crack at answering that gPXE/UNDI question?

On Wed, Jul 25, 2012 at 1:37 PM, Jarrod B Johnson jbjoh...@us.ibm.com wrote:

 Hmm, with xcat 2.7.3 you should be pulling in the 'xCAT-genesis' packages
 that replace the environment with something newer that has the appropriate
 nic drivers...
 -Josh Nielsen jniel...@hudsonalpha.com wrote: -

 To: xcat-user@lists.sourceforge.net
 From: Josh Nielsen jniel...@hudsonalpha.com
 Date: 07/25/2012 12:59PM
 Subject: [xcat-user] Getdestiny failing - can't open '/tmp/dhcpserver'

 Hello,

 I have some new IBM System X DX360M4 nodes (all our previous ones were
 DX360M3s) that I am trying to autodiscover with xCAT and I am running into
 the same problem as in this mail thread:
 http://www.mail-archive.com/xcat-user@lists.sourceforge.net/msg01267.html.
 Essentially the node boots up, does a dhcpdiscover, and grabs a generic
 bootloader (in my case yaboot but it also works with pxelinux.0) which then
 reinitiates the dhcpdiscover and queries again for the appropriate
 bootloader from the file in the xcat/xnba/nets/ folder and is served and
 boots xnba.kpxe. At this point I believe the xnba image is supposed to load
 and execute the autodiscovery process which includes the getdestiny script,
 but all I am seeing are the messages:


 cat: can't open '/tmp/dhcpserver': No such file or directory
 grep: /tmp/destiny: No such file or directory
 grep: /tmp/destiny: No such file or directory


 One reply by Jarrod Johnson to that email thread above mentioned a
 possible network driver issue and suggested using Genesis from xCAT
 v2.7. According to 'xcatconfig -v' I am running Version 2.3.1 of xCAT and I
 am running on Centos 5 (2.6.18-128.el5). Is there a way to get this working
 with my current version of xCAT and OS (inject drivers somehow?), or do I
 need to update to v2.7 of xCAT (does that require Centos 6)? In any case,
 is a network driver the most likely explanation for what I am seeing? I
 have tried everything that I can think of from the switch side to make sure
 SNMP is enabled and the port definitions are correct in the switch table
 for autodiscovery. Any ideas?


 Thanks,
 Josh Nielsen



Re: [xcat-user] Getdestiny failing - can't open '/tmp/dhcpserver'

2012-07-25 Thread Josh Nielsen

Okay, thanks for the explanation.

Cheers,
Josh

On Wed, Jul 25, 2012 at 2:28 PM, Jarrod B Johnson <jbjoh...@us.ibm.com> wrote:

 So the issue is that the xnba can work fine.  The problem is the linux
 image that subsequently loads lacked the driver update.  In an ideal world,
 we make an efi executable that does everything in UEFI that is moderately
 future proof.  In a practical world we work with the much richer linux
 toolset for lack of time to develop EFI resources.
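
 A rough way to confirm the "image lacks the driver" explanation from a shell
 inside the booted image is to check whether the kernel exposes any network
 interface at all. The sketch below is only an illustration in Python: the
 function names are hypothetical, the discovery image may not ship Python, and
 treating /tmp/dhcpserver as "a DHCP result was recorded" is an assumption
 based on the error messages in this thread.

```python
import os


def visible_nics(sys_class_net="/sys/class/net"):
    """Return the non-loopback interface names the running kernel exposes."""
    try:
        names = os.listdir(sys_class_net)
    except FileNotFoundError:
        return []
    return sorted(name for name in names if name != "lo")


def diagnose(sys_class_net="/sys/class/net", lease_marker="/tmp/dhcpserver"):
    """Give a coarse guess at why discovery stalled (hypothetical helper)."""
    nics = visible_nics(sys_class_net)
    if not nics:
        # No interface at all usually means no driver bound to the NIC.
        return "no NIC visible: the boot image probably lacks a driver"
    if not os.path.exists(lease_marker):
        # Assumption: the marker file is written once DHCP has succeeded.
        return "NIC present (%s) but no DHCP result recorded" % ", ".join(nics)
    return "NIC present and DHCP result recorded"


if __name__ == "__main__":
    print(diagnose())
```

 If the first message is printed, the fix is on the image side (a newer
 discovery image or an added driver), not on the switch or DHCP side.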

 -----Josh Nielsen <jniel...@hudsonalpha.com> wrote: -----

 To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
 From: Josh Nielsen <jniel...@hudsonalpha.com>
 Date: 07/25/2012 02:58PM
 Subject: Re: [xcat-user] Getdestiny failing - can't open '/tmp/dhcpserver'

 Thanks for the information Lissa.

 I do have another, more general question as well, though. It is regarding
 the xnba boot image itself, which appears to be based on gPXE. The idea
 of gPXE is that it is UNDI-capable and does not have to use TFTP to serve
 the images, correct? And judging by the '.kpxe' extension of the xnba.kpxe
 image/bootloader, that extension is reserved specifically for images that keep
 UNDI loaded but unload PXE. According to this page (
 http://etherboot.org/wiki/gpxe_imagetypes) the extensions for images
 break down like this:

 - .pxe is an image designed to be chainloaded, unloading both the
 underlying PXE and UNDI code sections.
 - .kpxe is a PXE image that keeps UNDI loaded and unloads PXE
 - .kkpxe is a PXE image that keeps PXE+UNDI loaded and return to PXE
 (instead of int 18h).
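
 The three cases above can be summarized as a small lookup table. This is
 only an illustrative sketch in Python: the names GPXE_IMAGE_TYPES and
 describe_image are made up for the example and are not part of gPXE or xCAT.

```python
import os

# Hypothetical summary of the image types listed above: which firmware
# code sections stay loaded after the image takes over.
GPXE_IMAGE_TYPES = {
    ".pxe": {"keeps_pxe": False, "keeps_undi": False},
    ".kpxe": {"keeps_pxe": False, "keeps_undi": True},
    ".kkpxe": {"keeps_pxe": True, "keeps_undi": True},
}


def describe_image(filename):
    """Return the keep/unload flags for a gPXE image name, or None if unknown."""
    _, ext = os.path.splitext(filename)
    return GPXE_IMAGE_TYPES.get(ext.lower())


if __name__ == "__main__":
    print(describe_image("xnba.kpxe"))
```

 For xnba.kpxe this reports that UNDI stays loaded while the PXE section is
 unloaded, which is the case the question below is about.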

 So does xnba.kpxe try to interact with the NIC card via UNDI once it
 loads? If so since UNDI is an abstracted API, and unless the API has been
 updated on the newer NIC cards, shouldn't it work with just about any
 network card regardless? From this PXE chainloading page (
 http://etherboot.org/wiki/pxechaining) it says:

 When chainloading gPXE from PXE, gPXE can use this API (instead of
 loading an hardware driver). This way, you're getting support for network
 controllers that are not natively supported by gPXE. Some network
 controllers have improved performance when using the UNDI driver over the
 vendor specific gPXE driver.

 I'm just curious about some of the theory behind this because I'm only
 used to good ol' legacy PXE. All this gPXE and UNDI stuff is new to me.


 On Wed, Jul 25, 2012 at 12:57 PM, Lissa Valletta <lis...@us.ibm.com> wrote:


 Version 2.3.x has not been supported for a long time. You need to be on
 the latest level of 2.6 for support, but you might as well go to the
 latest release, which is 2.7.3. You can upgrade xCAT and stay at your
 current Centos level.

 Lissa K. Valletta
 2-3/T12
 Poughkeepsie, NY 12601
 (tie 293) 433-3102




 From: Josh Nielsen jniel...@hudsonalpha.com
 To: xcat-user@lists.sourceforge.net
 Date: 07/25/2012 01:33 PM
 Subject: [xcat-user] Getdestiny failing - can't open '/tmp/dhcpserver'

 --



 Hello,
 I have some new IBM System X DX360M4 nodes (all our previous ones were
 DX360M3s) that I am trying to autodiscover with xCAT and I am running into
 the same problem as in this mail thread:
 http://www.mail-archive.com/xcat-user@lists.sourceforge.net/msg01267.html
 Essentially the node boots up, does a dhcpdiscover, and grabs a generic
 bootloader (in my case yaboot, but it also works with pxelinux.0), which then
 reinitiates the dhcpdiscover, queries again for the appropriate
 bootloader from the file in the xcat/xnba/nets/ folder, and is served and
 boots xnba.kpxe. At this point I believe the xnba image is supposed to load
 and execute the autodiscovery process which includes the getdestiny script,
 but all I am seeing are the messages:

 cat: can't open '/tmp/dhcpserver': No such file or directory
 grep: /tmp/destiny: No such file or directory
 grep: /tmp/destiny: No such file or directory

 One reply by Jarrod Johnson to that email thread above mentioned a
 possible network driver issue and suggested using Genesis from xCAT
 v2.7. According to 'xcatconfig -v' I am running Version 2.3.1 of xCAT and I
 am running on Centos 5 (2.6.18-128.el5). Is there a way to get this working
 with my current version of xCAT and OS (inject drivers somehow?), or do I
 need to update to v2.7 of xCAT (does that require Centos 6)? In any case,
 is a network driver the most likely explanation for what I am seeing? I
 have tried everything that I can think of from the switch side to make sure
 SNMP is enabled and the port definitions are correct in the switch table
 for autodiscovery. Any ideas?

 Thanks,
 Josh Nielsen