I should also mention that, for kicks, last week we tried deploying one VM
that had been defined in the hpvmgroupA group four months ago with the rest
of them but was never deployed until now. It too is experiencing the same
problems with the keys, as well as our (unrelated?) hostname issues. So I'm
leaning toward something having changed in our environment, be it
networking, DNS, perhaps permissions, or any number of other variables that
can change on systems involved in the xCAT deploy process. I just wish
there were a meaningful error or a way to determine the root issue more
easily.

On Wed, Mar 9, 2016 at 10:01 AM, Josh Nielsen <[email protected]>
wrote:
> Yes. Sorry that I did not mention it before, but they are stateful nodes.
> I have never used stateless nodes and have nothing configured for
> stateless. I'm just puzzled about what could have changed in the four months
> since I last deployed 50+ VMs with the same osimage template and postscripts
> that I am using now. I'm deploying CentOS 6.5 to a series of VMs on ESXi
> (standalone - not vSphere managed) on an HP CX7000 Blade chassis. I'm PXE
> booting as the method of delivering the kickstart, and I'm only having
> problems post-kickstart (or so it seems...). And the deploy is completely
> abstracted to the VM guest operating system level, so nothing specific to
> HP.
>
> I'm obscuring the IPs in the following excerpt from my xCAT 'hosts' table,
> but the only real change in xCAT was that I defined a third ('C') group of
> VMs that run on the HP cluster following the pattern I had done for the
> previous 50+ VMs which were divided between the A and B groups:
>
> "hpvmgroupA","|\D+(\d+).*$|X.Y.101.($1-0)|",,,"HP Compute nodes compute
> interface",
> "hpvmgroupB","|\D+(\d+).*$|X.Y.102.($1-0)|",,,"HP Compute nodes compute
> interface",
> "hpvmgroupC","|\D+(\d+).*$|X.Y.103.($1-0)|",,,"HP Compute nodes compute
> interface",
>
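For readers unfamiliar with the regular-expression syntax in these rows, here is a minimal Python sketch of how such a value appears to expand for a single node name. It is an emulation under assumptions, not xCAT's own code: the function name expand_xcat_regex is made up for illustration, only simple "(a op b)" integer arithmetic is handled, and the pattern itself is assumed to contain no "|" characters.

```python
import re


def expand_xcat_regex(node: str, rule: str) -> str:
    """Emulate an xCAT '|pattern|replacement|' table value for one node name."""
    # The rule has the shape |pattern|replacement| (assumes no '|' in pattern).
    _, pattern, replacement, _ = rule.split("|")
    match = re.search(pattern, node)
    if match is None:
        raise ValueError(f"{node!r} does not match {pattern!r}")
    # Substitute $1, $2, ... with the captured groups.
    text = re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), replacement)
    # Evaluate simple "(a op b)" integer expressions; int() drops leading zeros.
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }

    def arith(m: re.Match) -> str:
        return str(ops[m.group(2)](int(m.group(1)), int(m.group(3))))

    return re.sub(r"\((\d+)\s*([-+*])\s*(\d+)\)", arith, text)


print(expand_xcat_regex("node0087c", r"|\D+(\d+).*$|X.Y.103.($1-0)|"))
# X.Y.103.87
```

With this rule a node named node0087c would land on X.Y.103.87, since the ($1-0) arithmetic turns the captured 0087 into the number 87.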
> Otherwise, if you do an lsdef and look at a node that was deployed
> successfully which was part of hpvmgroupA and one that I am currently
> trying to deploy in hpvmgroupC they are identical except for the details
> that should differ like IP address and other things; and the osimage for
> the Centos 6.5 image (which points to the *.tmpl kickstart files and the
> otherpkgs *.tmpl files) is the same. So perhaps something else in my
> environment changed?
>
> My coworker and I have tried delving into the code and placing echoes as
> debugging statements, and have looked at the -V verbose output of various
> commands, but can't seem to find a meaningful error as to why it is not
> fetching either the id_rsa or id_rsa.pub files. Perhaps this was taken care
> of in a certain postscript that I have taken for granted before and
> forgotten to run this time? I have no idea.
>
> -Josh
>
> On Tue, Mar 8, 2016 at 10:34 PM, Daniel Letai <[email protected]> wrote:
>
>> Can you confirm you are deploying stateful nodes and not stateless ?
>>
>>
>> On 03/09/2016 12:53 AM, Josh Nielsen wrote:
>>
>> My coworker just pointed out that the /xcatpost/mypostscript on the nodes
>> that are deployed actually have this line:
>>
>> ENABLESSHBETWEENNODES='NO'
>> export ENABLESSHBETWEENNODES
>>
>> That's interesting, given that sshbetweennodes (without 'enable' as the
>> beginning of the parameter name?) was not defined at all in the site table
>> and the default is supposedly enabled(?). However, I just set
>> sshbetweennodes in site to "sshbetweennodes","ALLGROUPS",, and am now
>> redeploying to see if it makes a difference.
>>
>> The man page for site says:
>>
>> sshbetweennodes: Comma separated list of groups to enable passwordless
>> root
>> ssh during install, or xdsh -K.
>> Default is ALLGROUPS.
>> Set to NOGROUPS,if you do not wish to
>> enabled any groups.
>> Service Nodes are not affected by
>> this attribute
>> they are always setup with
>> passwordless root access to nodes and
>> other SN.
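To make the documented rule concrete, here is a small Python model of how I read that man page text. It is only a sketch of the documentation, not xCAT's implementation: the function name private_key_expected is hypothetical, and it ignores the zone table, which is said elsewhere in this thread to override the attribute.

```python
from typing import Iterable, Optional


def private_key_expected(site_value: Optional[str], node_groups: Iterable[str]) -> bool:
    """Model the documented sshbetweennodes rule for one node.

    site_value is the site table entry, or None when the key is not defined.
    """
    value = (site_value or "ALLGROUPS").strip()  # documented default
    if value == "ALLGROUPS":
        return True
    if value == "NOGROUPS":
        return False
    # Otherwise the value is a comma-separated list of enabled groups.
    enabled = {g.strip() for g in value.split(",") if g.strip()}
    return any(group in enabled for group in node_groups)


for setting in (None, "ALLGROUPS", "NOGROUPS", "hpvmgroupA,hpvmgroupB"):
    print(setting, private_key_expected(setting, ["hpvmgroupC"]))
# None True
# ALLGROUPS True
# NOGROUPS False
# hpvmgroupA,hpvmgroupB False
```

By this reading, an undefined sshbetweennodes should behave like ALLGROUPS, so seeing ENABLESSHBETWEENNODES='NO' with nothing set in the site table suggests that something other than the site default is deciding the value.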
>>
>> -Josh
>>
>> On Tue, Mar 8, 2016 at 4:26 PM, Josh Nielsen <[email protected]>
>> wrote:
>>
>>> Here is what I see in /var/log/messages when remoteshell is run:
>>>
>>> Mar 7 14:28:41 xcat-serv1 node0087c xcat: remoteshell: setup
>>> /etc/ssh/sshd_config and ssh_config
>>> Mar 7 14:28:41 xcat-serv1 node0087c xcat: Install: setup root .ssh
>>> Mar 7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16025]: xCAT: Allowing
>>> getcredentials ssh_dsa_hostkey from node0087c
>>> Mar 7 14:28:42 xcat-serv1 node0087c xCAT: remoteshell: getting
>>> ssh_host_dsa_key
>>> Mar 7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16027]: xCAT: Allowing
>>> getcredentials ssh_rsa_hostkey from node0087c
>>> Mar 7 14:28:42 xcat-serv1 node0087c xCAT: ssh_rsa_hostkey
>>> Mar 7 14:28:42 xcat-serv1 node0087c xCAT: start up sshd
>>>
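When comparing a working node against a broken one, it may help to list exactly which credentials xcatd logged as allowed for each node. The Python sketch below does that for the "Allowing getcredentials" lines quoted above; the function name granted_credentials is made up for illustration, and the whitespace-collapsing step exists only because the excerpt was wrapped by the mail client.

```python
import re
from collections import defaultdict

GRANT = re.compile(r"Allowing\s+getcredentials\s+(\S+)\s+from\s+(\S+)")


def granted_credentials(log_text: str) -> dict:
    """Map each node to the credential names xcatd logged as allowed."""
    # Mail clients wrap long syslog lines, so collapse whitespace before matching.
    flat = " ".join(log_text.split())
    result = defaultdict(list)
    for cred, node in GRANT.findall(flat):
        result[node].append(cred)
    return dict(result)


sample = """
Mar 7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16025]: xCAT: Allowing
getcredentials ssh_dsa_hostkey from node0087c
Mar 7 14:28:42 xcat-serv1 xcat-serv1 xCAT[16027]: xCAT: Allowing
getcredentials ssh_rsa_hostkey from node0087c
"""
print(granted_credentials(sample))
# {'node0087c': ['ssh_dsa_hostkey', 'ssh_rsa_hostkey']}
```

In this excerpt only the two host keys show up. If a node deployed back in November logs additional credential requests during remoteshell, that difference would be worth chasing.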
>>> I see new timestamps on authorized_keys and copy.sh when it is run, so
>>> it is actually doing something.
>>>
>>> Is there a substantial difference between remoteshell and updatenode -k?
>>> Why does updatenode -k successfully copy the id_rsa key to the node if I
>>> type in the password?
>>>
>>> As for the hostname in /etc/sysconfig/network, actually if I run just
>>> the kickstart and remove ifcfg-eth from the list of postscripts (in the
>>> node definition's postscripts= field) to execute automatically it ends up
>>> looking like this:
>>> # cat /etc/sysconfig/network
>>> NETWORKING=yes
>>> HOSTNAME=3(NXDOMAIN)
>>>
>>> But when I manually run updatenode node0087c -P ifcfg-eth it works
>>> correctly. (Also it changes the /etc/sysconfig/network-scripts/ifcfg-eth0
>>> file's BOOTPROTO parameter from dhcp to static and sets the IPADDR, as it
>>> should).
>>> # cat /etc/sysconfig/network
>>> NETWORKING=yes
>>> HOSTNAME=node0087c.morgan.haib.org
>>>
>>> Only if I keep the ifcfg-eth postscript in the node definition (listed
>>> in postscripts= if you 'lsdef') to be automatically executed does
>>> that /etc/sysconfig/network file say 'localhost', which seems to indicate
>>> an order of execution problem to me (even though I made sure ifcfg-eth was
>>> listed last). And, to answer your question, an nslookup node0087c before
>>> and after both return the correct IP from either of the SNs' slave DNS
>>> servers.
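One possible reading of the HOSTNAME=3(NXDOMAIN) value above: it matches the last field of what the `host` command prints when a reverse (PTR) lookup fails. The Python sketch below assumes, without my having confirmed it in the kickstart template, that the hostname is taken from the fifth whitespace-separated field of `host <ip>` output. The address 10.0.103.87 is a placeholder for the obscured real one, and fifth_field is a made-up helper name.

```python
def fifth_field(host_output: str) -> str:
    """Return the fifth whitespace-separated field, as awk '{print $5}' would."""
    fields = host_output.split()
    return fields[4] if len(fields) >= 5 else ""


# Typical `host <ip>` output when the PTR record resolves (placeholder address).
ok = "87.103.0.10.in-addr.arpa domain name pointer node0087c.morgan.haib.org."
# Typical `host <ip>` output when the reverse lookup fails.
failed = "Host 87.103.0.10.in-addr.arpa. not found: 3(NXDOMAIN)"

print(fifth_field(ok))      # node0087c.morgan.haib.org.
print(fifth_field(failed))  # 3(NXDOMAIN)
```

If that assumption holds, the bad value would point at a reverse lookup failing from the node at kickstart time. An nslookup of the node name only exercises the forward lookup, so it would not reveal that.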
>>>
>>> They are possibly two unrelated issues, but I'm close to just upgrading
>>> xCAT and seeing if I have any better luck if I can't figure out some
>>> obvious problem soon. For kicks I'll explicitly set sshbetweennodes in the
>>> site table and rerun the remoteshell postscript.
>>>
>>> Regards,
>>> Josh
>>>
>>> On Tue, Mar 8, 2016 at 3:33 PM, Casandra H Qiu <[email protected]> wrote:
>>>
>>>> Hmm, I don't have a system with xCAT 2.8.3, but I think the sshbetweennodes
>>>> attribute has been available for a while. If it is not defined in the site
>>>> table, the default should be to set up passwordless ssh between nodes.
>>>> "nslookup nodename" still works after you update the hostname, right?
>>>> Are you able to find any error messages in the logs? Maybe in
>>>> /var/log/messages.
>>>>
>>>> Thanks,
>>>> Casandra
>>>> ...................................................................
>>>> Casandra Hong Qiu
>>>> Phone: (845) 433-9291, t/l 293-9291
>>>> Office: B/002, Floor 3, Z13
>>>> [email protected]
>>>>
>>>>
>>>>
>>>>
>>>> From: Josh Nielsen <[email protected]>
>>>> To: xCAT Users Mailing list <[email protected]>
>>>> Date: 03/08/2016 03:58 PM
>>>> Subject: Re: [xcat-user] Updatenode -k won't create id_rsa key without
>>>> prompting for password
>>>> ------------------------------
>>>>
>>>>
>>>>
>>>>
>>>> Thanks for the response Casandra. I should firstly note that I have
>>>> xCAT 2.8.3. I know I need to upgrade, but not only has this worked in the
>>>> past but I also successfully deployed 50+ nodes back in November with the
>>>> exact same xCAT version I have now and using the same osimage for Centos
>>>> 6.5, same kickstart, same defined postscripts, etc. So something else has
>>>> changed, perhaps in our environment?
>>>>
>>>> That being said, I did not see sshbetweennodes specified at all in the
>>>> site table. The following are the only two references to ssh in the table:
>>>>
>>>> #tabdump site | grep -i ssh
>>>> "maxssh","8",,
>>>> "rsh","/usr/bin/ssh",,
>>>>
>>>> Is 'sshbetweennodes' only a feature of versions newer than 2.8.X, or
>>>> has it been around a while?
>>>>
>>>> Lastly, you said that remoteshell copies over id_rsa.pub (regardless -
>>>> in either scenario), and I have seen that before as well, but actually I am
>>>> not seeing any id_rsa* keys (public or private) copied to the node at all.
>>>> And even an updatenode -k is only producing the id_rsa (if I manually type
>>>> the password) but not the .pub, which is also odd. But authorized_keys is
>>>> populated with the rsa public key signature. Something else must be going
>>>> on.
>>>>
>>>> P.S. The only other issue I'm still dealing with, which may be irrelevant
>>>> to this one, is a hostname problem: if I run the ifcfg-eth postscript, it
>>>> changes the hostname in /etc/sysconfig/network from the correct node name
>>>> to "localhost". My forward and reverse lookup entries in DNS are present.
>>>> The hostname is set correctly by the kickstart before ifcfg-eth is run,
>>>> and stays correct if ifcfg-eth is not run; I presume the kickstart gets it
>>>> from the node definition in dhcpd.leases (created with 'makedhcp') and/or
>>>> from the DNS entries for the host's IP. On the off chance that key copying
>>>> is tied to name resolution inconsistencies, I thought I would mention
>>>> that as well.
>>>>
>>>> Thanks,
>>>> Josh
>>>>
>>>> On Tue, Mar 8, 2016 at 1:20 PM, Casandra H Qiu <[email protected]> wrote:
>>>>
>>>> Can you check whether sshbetweennodes is set in the site table? The
>>>> default for sshbetweennodes is ALLGROUPS, which enables passwordless ssh
>>>> between nodes. This attribute is ignored if the zone table is set up, so
>>>> please check the zone table as well.
>>>>
>>>> If it is enabled, the remoteshell postscript copies both id_rsa and
>>>> id_rsa.pub to the compute node; otherwise, it copies only id_rsa.pub.
>>>>
>>>> From the source code, updatenode -k always requires a password.
>>>>
>>>>
>>>> Thanks,
>>>> Casandra
>>>> ...................................................................
>>>> Casandra Hong Qiu
>>>> Phone: (845) 433-9291, t/l 293-9291
>>>> Office: B/002, Floor 3, Z13
>>>> [email protected]
>>>>
>>>>
>>>>
>>>>
>>>> From: Josh Nielsen <[email protected]>
>>>> To: xCAT Users Mailing list <[email protected]>
>>>> Date: 03/08/2016 12:51 PM
>>>> Subject: Re: [xcat-user] Updatenode -k won't create id_rsa key
>>>> without prompting for password
>>>> ------------------------------
>>>>
>>>>
>>>>
>>>> Yes, I just verified. It is present, but that alone is not sufficient
>>>> for that node to be able to SSH to other nodes itself. It allows other
>>>> nodes which have the correct private key to SSH to it, but not the other
>>>> way around.
>>>>
>>>> For example, on one compute node I'm having trouble with /root/.ssh
>>>> has these three files:
>>>>
>>>> -rw-------. 1 root root 408 Mar 7 14:28 authorized_keys
>>>> -rw-------. 1 root root 411 Mar 7 14:28 copy.sh
>>>> -rw------- 1 root root 402 Mar 3 16:20 known_hosts
>>>>
>>>> And authorized_keys has the correct ssh-rsa public key entry, but I
>>>> cannot go from this node to any other node in my cluster via passwordless
>>>> ssh. But as soon as I run updatenode -k, and type in the password that it
>>>> prompts for to complete the command, the id_rsa key is added as the
>>>> fourth file to the /root/.ssh directory, and then after that I can ssh to
>>>> other nodes from it without supplying a password. That is the issue.
>>>>
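A quick way to compare a working node against a broken one is to list which of the expected key files are absent from root's .ssh directory. This is a minimal Python sketch: the expected set (authorized_keys, id_rsa, id_rsa.pub) is my assumption based on the files discussed in this thread, and missing_ssh_files is a made-up helper name. The demo recreates the three-file listing above in a temporary directory rather than touching a real /root/.ssh.

```python
import tempfile
from pathlib import Path

# Assumed set of files for passwordless ssh in both directions.
EXPECTED = ("authorized_keys", "id_rsa", "id_rsa.pub")


def missing_ssh_files(ssh_dir, expected=EXPECTED):
    """Return the expected file names that are not present in ssh_dir."""
    directory = Path(ssh_dir)
    return [name for name in expected if not (directory / name).is_file()]


# Recreate the listing from the problem node in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("authorized_keys", "copy.sh", "known_hosts"):
        Path(tmp, name).touch()
    print(missing_ssh_files(tmp))
# ['id_rsa', 'id_rsa.pub']
```

On a real node the call would be missing_ssh_files("/root/.ssh"), run as root.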
>>>> In the past simply running the remoteshell postscript (or so I assumed)
>>>> was sufficient for adding the id_rsa file, and it was all automated from
>>>> a fresh deploy by specifying remoteshell as one of the default
>>>> postscripts to run. But now it doesn't look like remoteshell is placing
>>>> the id_rsa file on the node (unless some other script or command is
>>>> responsible for that), but remoteshell looks like it creates everything
>>>> else in /root/.ssh/ (and /etc/ssh/).
>>>>
>>>> Is remoteshell the correct postscript for that, or was the id_rsa key
>>>> most likely being pushed to the nodes some other way (like by some code
>>>> that called updatenode -k upon initial deployment)? Either way, all I can
>>>> say for sure is that id_rsa used to appear in /root/.ssh on the compute
>>>> node automatically and now it does not.
>>>>
>>>> Regards,
>>>> Josh
>>>>
>>>> On Tue, Mar 8, 2016 at 4:19 AM, Xiao Peng Wang <[email protected]> wrote:
>>>> To enable login without a password, the RSA public key should be copied
>>>> to /root/.ssh/authorized_keys on the compute node. Could you check
>>>> whether the key has been added to /root/.ssh/authorized_keys?
>>>>
>>>>
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>>
>>>> ----------------------------------------------------------------------
>>>> Wang Xiaopeng (王晓朋)
>>>> IBM China System Technology Laboratory
>>>> Tel: 86-10-82453455
>>>> Email: [email protected]
>>>> Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang
>>>> West Road, Haidian District Beijing P.R.China 100193
>>>>
>>>>
>>>> ----- Original message -----
>>>> From: Josh Nielsen <[email protected]>
>>>> To: xCAT Users Mailing list <[email protected]>
>>>> Cc:
>>>> Subject: Re: [xcat-user] Updatenode -k won't create id_rsa
>>>> key without prompting for password
>>>> Date: Tue, Mar 8, 2016 5:26 AM
>>>>
>>>> Also if remoteshell is invoked directly as a postscript ('updatenode
>>>> node0086c -V -P remoteshell') it produces the same result, but does not
>>>> prompt for a password (like invoking xdsh -K directly doesn't), and
>>>> copies everything over except id_rsa. So actually the prompting for a
>>>> password is specific to updatenode -k, not xdsh -K or the remoteshell
>>>> postscript (which run that). So I'm not sure if that is relevant to the
>>>> underlying problem or not, but if I do invoke updatenode -k and supply it
>>>> the password it copies the id_rsa to the node.
>>>>
>>>> On Mon, Mar 7, 2016 at 2:12 PM, Josh Nielsen <[email protected]> wrote:
>>>> Hello,
>>>>
>>>> When we freshly deploy a node from the kickstart and run our postscripts
>>>> we noticed that for some reason the /root/.ssh/id_rsa file which allows
>>>> passwordless login from that node to other nodes is missing, though this
>>>> was not the case just a few months ago. When I try to generate the key
>>>> manually it prompts for a password, after which it will copy/create that
>>>> file successfully (see below), but there are a few odd things connected
>>>> to this.
>>>>
>>>> The error is:
>>>> updatenode node0087c -k
>>>> Enter the password for the userid: root on the node where the
>>>> ssh keys will be updated:
>>>>
>>>> The first oddity is that even after supplying the password once for a
>>>> particular node it will prompt for the password every time if I run it
>>>> again, as well as the related problem that this never used to happen
>>>> before and the key used to be created without issue or prompting for a
>>>> password. The 'passwd' xCAT table has the password for root (if that is
>>>> where it looks for this command).
>>>>
>>>> Secondly I have done several manual debugging steps (and poked around
>>>> the source code to see what is happening) and I have run the actual xdsh
>>>> commands that are called, as shown in the -V verbose output. It prints
>>>> two of them: the first apparently to prep the SNs and run the
>>>> 'remoteshell' postscript on them, and the second to actually do the same
>>>> to the node specified.
>>>>
>>>> xdsh sn1,sn2 --nodestatus -s -v -e
>>>> /install/postscripts/xcatdsklspost 5 -m [MN_IP]
>>>> 'remoteshell,servicenode'
>>>> --tftp /tftpboot --installdir /install --nfsv4 no -c -V
>>>>
>>>> xdsh node0086c --nodestatus -s -v -e
>>>> /install/postscripts/xcatdsklspost 5 -m [SN1_IP] 'remoteshell'
>>>> --tftp
>>>> /tftpboot --installdir /install --nfsv4 no -c -V
>>>>
>>>> This did not reveal anything useful, except that when invoked directly
>>>> like this no password is prompted for and it runs, but still leaves out
>>>> the id_rsa file. I also followed the suggestion by Wang Xiaopeng in this
>>>> thread (http://tinyurl.com/jz2jzmb) to test the getcredentials call
>>>> with:
>>>>
>>>> 1. Enable mini server
>>>> /xcatpost/allowcred.awk &
>>>>
>>>> 2.Try to get rsa hostkey
>>>> USEOPENSSLFORXCAT=yes XCATSERVER=<MNIP>:3001
>>>> /xcatpost/getcredentials.awk ssh_rsa_hostkey
>>>>
>>>> This returned ssh_rsa_hostkey successfully. When remoteshell is run
>>>> (whether with updatenode -k or xdsh -K) it actually does copy over the
>>>> key files into /etc/ssh/ and it copies known_hosts, copy.sh, and
>>>> authorized_keys into /root/.ssh on the compute node but omits id_rsa.
>>>> What could be going wrong here?
>>>>
>>>> Regards,
>>>> Josh Nielsen
>>>>
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Transform Data into Opportunity.
>>>> Accelerate data analysis in your applications with
>>>> Intel Data Analytics Acceleration Library.
>>>> Click to learn more.
>>>> http://makebettercode.com/inteldaal-eval
>>>> _______________________________________________
>>>> xCAT-user mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/xcat-user
>>>>
>>>
>>
>>
>