Re: [xcat-user] packimage users and groups

2017-02-07 Thread Yuan Y Bai
Hi Geert,

Were you able to resolve this problem?

packimage removes the files and directories listed in the osimage's exclude
list (exlist). If that list covers files or directories you need, they are
dropped from the packed image, which may be the cause of your problem.
Could you check whether the exlist in your osimage definition contains
anything you need?

The exlist is set in the osimage definition. For example, if the osimage name
is rhels7.2-x86_64-netboot-compute, you can find its exlist like this:

]# lsdef -t osimage rhels7.2-x86_64-netboot-compute -i exlist
Object name: rhels7.2-x86_64-netboot-compute
exlist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.exlist
]# cat /opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.exlist
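
A minimal sketch of that check, assuming the exlist is a plain text file with
one path pattern per line and '#' starting a comment (check_exlist is a
hypothetical helper, not an xCAT command). It prints every active line of the
exclude list that mentions one of the given path fragments, so you can see
whether something like etc/passwd or etc/group is being excluded:

```shell
#!/bin/sh
# check_exlist FILE WORD...
# Print every non-comment line of the exclude list FILE that contains one of
# the given WORDs (for example etc/passwd or etc/group).
# Returns 0 if at least one line matched, 1 otherwise.
check_exlist() {
    exlist_file=$1
    shift
    found=1
    for word in "$@"; do
        # First grep drops comment lines; second grep does a fixed-string match.
        if grep -v '^[[:space:]]*#' "$exlist_file" | grep -F -- "$word"; then
            found=0
        fi
    done
    return "$found"
}
```

For example:
check_exlist /opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.exlist etc/passwd etc/group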



Best Regards
--
Yuan Bai (白媛)

CSTL HPC System Management Development
Tel:86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus. Ring Building 28,
 ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian
District,
 Beijing P.R.China 100193

IBM Ring Building
Building 28, Zhongguancun Software Park, No. 8 Dongbeiwang West Road, Haidian District, Beijing
Postal code: 100193



From:   Geert Geurts 
To: 
Date:   01/26/2017 07:09 PM
Subject:[xcat-user] packimage users and groups



Hello all,

I've built a CentOS 7.2 osimage using genimage, but I noticed that a few
services didn't come up correctly.

chronyd.service, sm-client.service and systemd-tmpfiles-setup.service
all failed because of missing users/groups.

I've added the needed users to the image, but these users get removed by
packimage.

Why does packimage care about the users defined in an image?

What am I doing wrong here to get these users defined?
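
To confirm which accounts are actually missing from the image, a small check
along these lines may help. It is only a sketch: missing_users is a
hypothetical helper, and the account names in the usage line (chrony, smmsp)
are assumptions about what those services expect on CentOS 7, not something
stated in this thread.

```shell
#!/bin/sh
# missing_users PASSWD_FILE USER...
# Print each USER that has no entry in PASSWD_FILE (passwd(5) format).
missing_users() {
    passwd_file=$1
    shift
    for user in "$@"; do
        # An account line starts with "name:".
        grep -q "^${user}:" "$passwd_file" || printf '%s\n' "$user"
    done
}
```

Usage, where <rootimg> stands for the root directory of the generated image:
missing_users <rootimg>/etc/passwd chrony smmsp
Running it before and after packimage would show whether the entries are
really being dropped at that step.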


Best regards,

Geert



--

Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user





Re: [xcat-user] upgrading xCAT onto new servers

2017-02-07 Thread David D Johnson
Historical details: the cluster was originally set up with GPFS
adminMode=allToAll, back in 2009.
All nodes still have the ability to ssh as root to any other node, but we
changed the GPFS cluster configuration to adminMode=central a few years ago.
Some day we may want to tighten it down, but not now.

Originally /var/mmfs/gen was copied into the diskless boot image, but this 
became painful as we continued to add a few dozen new nodes every few months,
and would have to respin the diskless images to update the SDR.

Nodes in the old cluster continue to work fine with xCAT 2.8.x, CentOS 6.7
and GPFS 4.2.2-1.
New cluster nodes are exhibiting this problem with xCAT 2.13.1, rhels7.2 and
GPFS 4.2.2-1.

So, did xCAT formerly run postscripts in a pseudo-tty environment, and does it
no longer do so?
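
One way to see what environment a postscript actually gets is to log whether
its stdin is a terminal. This is a minimal sketch; describe_terminal is a
hypothetical helper, and whether mmsdrrestore's "active terminal" test looks
at stdin in this way is an assumption.

```shell
#!/bin/sh
# describe_terminal
# Report whether stdin is attached to a terminal.
describe_terminal() {
    if [ -t 0 ]; then
        # tty prints the device name of the terminal on stdin
        echo "stdin is a terminal: $(tty)"
    else
        echo "stdin is not a terminal"
    fi
}
```

Calling describe_terminal at the top of the postscript and once from an ssh
login would show whether the two environments differ in this respect.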

The goal is for each compute node to be able to boot up diskless and rejoin
GPFS at will, without any manual intervention.

> On Feb 7, 2017, at 2:02 PM, David D. Johnson  wrote:
> 
> Drilling down deeper, there seem to be two different situations.
> 
> On nodes without X11, the remoteshell script takes less than a second:
> -rw------- 1 root root 1231 Feb  7  2017 authorized_keys
> -rw------- 1 root root 1675 Feb  7  2017 id_rsa
> -rw------- 1 root root  410 Feb  7  2017 id_rsa.pub
> 7cb5ab60ff42ede791c823afd016997d  /root/.ssh/authorized_keys
> 13f430f0001adff42dc250f818eabbd1  /root/.ssh/id_rsa
> 3f5101404ac152d4aaea6c62f7eb6e30  /root/.ssh/id_rsa.pub
> 
> However, later in the script, when trying to set up to start GPFS, I get this
> message:
> 
> Install: recovering gpfs sdr
> Tue Feb  7 18:20:28 UTC 2017: mmsdrrestore: Processing node gpu002
> mmsdrrestore: Run the command from an active terminal or enable global passwordless access.
> mmsdrrestore: Unable to retrieve GPFS cluster files from node ut002.oscar.ccv.brown.edu
> mmsdrrestore: File /var/mmfs/ssl/stage/genkeyData1 not found.
>    Use mmauth genkey to recover the file, or to generate and commit a new key.
> mmsdrrestore: Unexpected error from updateMmfsEnvironment.  Return code: 1
> mmsdrrestore: Command failed. Examine previous error messages to determine cause.
> 
> 
> If I copy/paste the command from the postscript file and run it from an ssh
> login, I get:
> [root@gpu002 xcat]# /usr/lpp/mmfs/bin/mmsdrrestore -p ut003 -R /usr/bin/scp
> Tue Feb  7 18:26:10 UTC 2017: mmsdrrestore: Processing node gpu002
> Warning: Permanently added 'ut002.oscar.ccv.brown.edu' (RSA) to the list of known hosts.
> mmsdrrestore: Node gpu002 successfully restored.
> 
> There is no difference in the /root/.ssh files before or after. Why does it
> work by hand, but not from inside the script?
> 
> I found that on nodes with X11, the remoteshell script was taking 12 minutes
> to run to “completion”, and the result is a zero-length id_rsa.pub file.
> 
> -rw------- 1 root root 821 Feb  7 12:59 authorized_keys
> -rw------- 1 root root   0 Feb  7 13:09 id_rsa.pub
> -rw-r--r-- 1 root root 183 Feb  7 13:02 known_hosts
> 4cd344ed6d3721a283f442977862b981  /root/.ssh/authorized_keys
> d41d8cd98f00b204e9800998ecf8427e  /root/.ssh/id_rsa.pub
> a178f5a553c74d99590b2047d9517363  /root/.ssh/known_hosts
> 
> I thought it was NetworkManager, but it turns out it was firewalld.
> (chroot . systemctl disable firewalld )
> 
> — ddj
> 

Re: [xcat-user] upgrading xCAT onto new servers

2017-02-07 Thread David D. Johnson
Drilling down deeper, there seem to be two different situations.

On nodes without X11, the remoteshell script takes less than a second:
-rw------- 1 root root 1231 Feb  7  2017 authorized_keys
-rw------- 1 root root 1675 Feb  7  2017 id_rsa
-rw------- 1 root root  410 Feb  7  2017 id_rsa.pub
7cb5ab60ff42ede791c823afd016997d  /root/.ssh/authorized_keys
13f430f0001adff42dc250f818eabbd1  /root/.ssh/id_rsa
3f5101404ac152d4aaea6c62f7eb6e30  /root/.ssh/id_rsa.pub

However, later in the script, when trying to set up to start GPFS, I get this
message:

Install: recovering gpfs sdr
Tue Feb  7 18:20:28 UTC 2017: mmsdrrestore: Processing node gpu002
mmsdrrestore: Run the command from an active terminal or enable global passwordless access.
mmsdrrestore: Unable to retrieve GPFS cluster files from node ut002.oscar.ccv.brown.edu
mmsdrrestore: File /var/mmfs/ssl/stage/genkeyData1 not found.
   Use mmauth genkey to recover the file, or to generate and commit a new key.
mmsdrrestore: Unexpected error from updateMmfsEnvironment.  Return code: 1
mmsdrrestore: Command failed. Examine previous error messages to determine cause.


If I copy/paste the command from the postscript file and run it from an ssh
login, I get:
[root@gpu002 xcat]# /usr/lpp/mmfs/bin/mmsdrrestore -p ut003 -R /usr/bin/scp
Tue Feb  7 18:26:10 UTC 2017: mmsdrrestore: Processing node gpu002
Warning: Permanently added 'ut002.oscar.ccv.brown.edu' (RSA) to the list of known hosts.
mmsdrrestore: Node gpu002 successfully restored.

There is no difference in the /root/.ssh files before or after. Why does it
work by hand, but not from inside the script?
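
One difference between the two runs is that an interactive ssh login has a
terminal and can prompt, while a postscript cannot. A rough way to test
whether root ssh to the GPFS nodes works with no prompt at all is ssh's
BatchMode option, which disables password and passphrase prompts. This is a
sketch only: report_ssh_access is a hypothetical helper, and whether this
matches the check mmsdrrestore performs is an assumption.

```shell
#!/bin/sh
# report_ssh_access HOST...
# Try a no-prompt ssh to each HOST and print "ok: HOST" or "FAILED: HOST".
report_ssh_access() {
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking for a password;
        # ConnectTimeout keeps an unreachable host from stalling the loop.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
        fi
    done
}
```

Called from inside the postscript just before mmsdrrestore, for example as
report_ssh_access ut002 ut003, it would show whether passwordless access is
already in place at that point in the boot.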

I found that on nodes with X11, the remoteshell script was taking 12 minutes
to run to “completion”, and the result is a zero-length id_rsa.pub file.

-rw------- 1 root root 821 Feb  7 12:59 authorized_keys
-rw------- 1 root root   0 Feb  7 13:09 id_rsa.pub
-rw-r--r-- 1 root root 183 Feb  7 13:02 known_hosts
4cd344ed6d3721a283f442977862b981  /root/.ssh/authorized_keys
d41d8cd98f00b204e9800998ecf8427e  /root/.ssh/id_rsa.pub
a178f5a553c74d99590b2047d9517363  /root/.ssh/known_hosts

I thought it was NetworkManager, but it turns out it was firewalld.
(chroot . systemctl disable firewalld )

— ddj

> On Feb 7, 2017, at 6:35 AM, David D Johnson  wrote:
> 
> That was already the case (IP of mgt1 and IP of mgt[2] are the forwarders).
> I don't believe it will forward requests within the zones for which it is
> authoritative.
> I ended up using tabdump to recreate the hosts and nodelist tables. Mostly
> good.
> 
> Now the problem of the day is fixing the SSH credentials so that all the
> diskless nodes booting off the new frontend can get root access to all the
> nodes still booted off the old frontend. I need this especially for GPFS.
> I've been trying to follow what's going on in the remoteshell postscript,
> and I'm wondering if my "sitespecific" postscript is running before
> "remoteshell" has completed.
> Is there a way to determine/force the order in which the postscripts are
> executed? Sitespecific is after remoteshell both alphabetically and in the
> lsdef output.
> The basic problem is that mmsdrrestore fails during sitespecific, but works
> fine when I try it again later by hand.
> 
>  -- ddj
> Dave Johnson
> Brown University

Re: [xcat-user] upgrading xCAT onto new servers

2017-02-07 Thread David D Johnson
That was already the case (IP of mgt1 and IP of mgt[2] are the forwarders).
I don't believe it will forward requests within the zones for which it is
authoritative.
I ended up using tabdump to recreate the hosts and nodelist tables. Mostly good.

Now the problem of the day is fixing the SSH credentials so that all the
diskless nodes booting off the new frontend can get root access to all the
nodes still booted off the old frontend. I need this especially for GPFS.
I've been trying to follow what's going on in the remoteshell postscript,
and I'm wondering if my "sitespecific" postscript is running before
"remoteshell" has completed.
Is there a way to determine/force the order in which the postscripts are
executed? Sitespecific is after remoteshell both alphabetically and in the
lsdef output.
The basic problem is that mmsdrrestore fails during sitespecific, but works
fine when I try it again later by hand.
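
As far as I understand it, the postscripts attribute is a comma-separated
list, so the position of each name in the lsdef output can be compared
directly. The sketch below does that comparison on a list passed in as a
string. runs_before is a hypothetical helper, the sample list in the usage
line is made up, and the assumption that listed order equals execution order
is not confirmed anywhere in this thread.

```shell
#!/bin/sh
# runs_before LIST FIRST SECOND
# Return 0 if FIRST appears before SECOND in the comma-separated LIST.
runs_before() {
    rest=$1,
    pos=0
    first_pos=0
    second_pos=0
    while [ -n "$rest" ]; do
        name=${rest%%,*}      # text up to the next comma
        rest=${rest#*,}       # drop that entry and its comma
        pos=$((pos + 1))
        if [ "$name" = "$2" ] && [ "$first_pos" -eq 0 ]; then first_pos=$pos; fi
        if [ "$name" = "$3" ] && [ "$second_pos" -eq 0 ]; then second_pos=$pos; fi
    done
    # Both names must be present, and FIRST must come earlier.
    [ "$first_pos" -gt 0 ] && [ "$second_pos" -gt 0 ] && [ "$first_pos" -lt "$second_pos" ]
}
```

For example:
runs_before "syslog,remoteshell,sitespecific" remoteshell sitespecific && echo "remoteshell is listed first"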

 -- ddj
Dave Johnson
Brown University

> On Feb 7, 2017, at 4:32 AM, Er Tao Zhao  wrote:
> 
> Hi, David
>  
> Could you please try 'chdef -t site forwarders=' and then 'makedns'
> to use mgt1 as your remote DNS server?
> Please feel free to let me know if there are any more issues.
>  
> Thx!
> Best Regards,
> ---
> Zhao Er Tao
> 
> IBM China System and Technology Laboratory, Beijing
> Tel:(86-10)82450485
> Email: erta...@cn.ibm.com
> Address: 1/F, 28 Building,ZhongGuanCun Software Park,
> No.8 DongBeiWang West Road, Haidian District,
> Beijing, 100193, P.R.China
>  
>  
> - Original message -
> From: "David D. Johnson" 
> To: "xcat-user@lists.sourceforge.net" 
> Cc:
> Subject: [xcat-user] upgrading xCAT onto new servers
> Date: Sat, Feb 4, 2017 3:04 AM
>  
> We’re upgrading the cluster mgt node hardware and software at the same time,
> going from xCAT 2.8.3 to 2.13.1, and from centos6.7 to rhels7.2. I have the
> new frontend installed and somewhat functional.
> Right now I need to clone the DNS / named from “mgt1”, which is still
> authoritative for the production cluster.
> I could just tabdump hosts and nodelist and do makedns on “mgt5”, or I’m
> thinking there might be a way to make the new mgt5 a slave to the existing
> named running on mgt1. Any pros/cons? What would you do?
> 
> Thanks,
> 
>  — ddj


Re: [xcat-user] upgrading xCAT onto new servers

2017-02-07 Thread Er Tao Zhao
Hi, David
 
Could you please try 'chdef -t site forwarders=' and then 'makedns' to use mgt1 as your remote DNS server?
Please feel free to let me know if there are any more issues.
 
Thx!
Best Regards,
---
Zhao Er Tao

IBM China System and Technology Laboratory, Beijing
Tel:(86-10)82450485
Email: erta...@cn.ibm.com
Address: 1/F, 28 Building,ZhongGuanCun Software Park,
No.8 DongBeiWang West Road, Haidian District,
Beijing, 100193, P.R.China


- Original message -
From: "David D. Johnson"
To: "xcat-user@lists.sourceforge.net"
Cc:
Subject: [xcat-user] upgrading xCAT onto new servers
Date: Sat, Feb 4, 2017 3:04 AM

We’re upgrading the cluster mgt node hardware and software at the same time,
going from xCAT 2.8.3 to 2.13.1, and from centos6.7 to rhels7.2. I have the
new frontend installed and somewhat functional.
Right now I need to clone the DNS / named from “mgt1”, which is still
authoritative for the production cluster.
I could just tabdump hosts and nodelist and do makedns on “mgt5”, or I’m
thinking there might be a way to make the new mgt5 a slave to the existing
named running on mgt1. Any pros/cons? What would you do?

Thanks,

 -- ddj
 

