Re: [xcat-user] resolv.conf multiple search domains

2018-07-03 Thread Yuan Y Bai
Hi Jeff,
 
I think /etc/resolv.conf on your node is being overwritten by dhclient.
Try the following to get the correct result.
 
If /etc/resolv.conf is set to "ro" in the litefile table, you can customize rootimg/etc/resolv.conf. After the node boots, it gets the customized /etc/resolv.conf, which is read-only ("ro").
 
If /etc/resolv.conf is set to "tmpfs" or "rw" in the litefile table, /etc/resolv.conf is generated by dhclient. In this case the nameservers and domain in the "networks" table take priority over those in the "site" table, so you can configure the correct nameservers and domain on the specific network entry in the networks table. If the nameservers and domain are empty in the networks table, set the correct values in the site table instead.
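The precedence described above can be modelled as a small lookup. This is a minimal Python sketch, not xCAT code: the two dictionaries stand in for one networks-table row and the site table, and applying the fallback per attribute (nameservers and domain independently) is an assumption.

```python
from typing import Optional


def effective_dns(network_row: dict, site: dict) -> tuple[Optional[str], Optional[str]]:
    """Return (nameservers, domain) for nodes on one network.

    A non-empty value in the networks-table row wins; an empty or missing
    value falls back to the site table (assumed to apply per attribute).
    """
    nameservers = network_row.get("nameservers") or site.get("nameservers")
    domain = network_row.get("domain") or site.get("domain")
    return nameservers, domain


# Hypothetical values: the network row sets a domain but leaves nameservers empty.
site_table = {"nameservers": "10.0.0.1", "domain": "cluster.example"}
network_row = {"nameservers": "", "domain": "compute.example"}
print(effective_dns(network_row, site_table))  # ('10.0.0.1', 'compute.example')
```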
 
 
Best Regards
--
Yuan Bai (白媛)
CSTL HPC System Management Development
Tel: 86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus, Ring Building 28, ZhongGuanCun Software Park, No. 8 Dong Bei Wang West Road, Haidian District, Beijing, P.R. China 100193
IBM Huanyu Building, Building 28, Zhongguancun Software Park, No. 8 Dongbeiwang West Road, Haidian District, Beijing; postal code: 100193
 
 
- Original message -
From: Jeff Berry
To: xCAT Users Mailing list
Cc:
Subject: [xcat-user] resolv.conf multiple search domains
Date: Fri, Jun 29, 2018 9:54 PM

Good afternoon,

Configuring the cluster is proceeding apace, and I find myself unclear on how best to deal with some DNS issues.

xCAT 2.14.1; master and statelite nodes all running CentOS 7.5.

When I boot, a resolv.conf file is being generated and installed, and although it has the right nameservers, it does not have the search domains we want. After liteimg, the .defaults/etc/resolv.conf file contains just the dummy line. And on boot, I end up with a resolv.conf that looks like:

search
nameserver
nameserver
nameserver
nameserver

That is, the same domain is duplicated on the search line. The nameservers themselves are correct, though.

I tried editing the .defaults/etc/resolv.conf file, but it had no effect.

Obviously I am unclear on how that resolv.conf file is being generated, and any pointers that anyone can provide will be gratefully followed up.

Jeff Berry, MRC CBU
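Because the symptom is the same domain repeated on the `search` line, a small parser makes the duplicate easy to spot when comparing files on a node. This is a hypothetical helper, not part of xCAT; it reads only the `search`, `domain` and `nameserver` keywords and follows the resolv.conf(5) rule that the last `search`/`domain` line wins. The sample domain and addresses are placeholders.

```python
def parse_resolv_conf(text: str) -> dict[str, list[str]]:
    """Extract search domains and nameservers from resolv.conf text."""
    search: list[str] = []
    nameservers: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # comment lines start with '#' or ';'
            continue
        keyword, *args = line.split()
        if keyword in ("search", "domain"):
            # The last search/domain line wins, so replace rather than append.
            search = args
        elif keyword == "nameserver" and args:
            nameservers.append(args[0])
    return {"search": search, "nameservers": nameservers}


def duplicate_search_domains(text: str) -> list[str]:
    """Return search domains listed more than once, in first-seen order."""
    seen: set[str] = set()
    dups: list[str] = []
    for domain in parse_resolv_conf(text)["search"]:
        if domain in seen and domain not in dups:
            dups.append(domain)
        seen.add(domain)
    return dups


# Placeholder file content reproducing the symptom described above.
sample = "search example.org example.org\nnameserver 10.0.0.1\nnameserver 10.0.0.2\n"
print(duplicate_search_domains(sample))  # ['example.org']
```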
 


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user


Re: [xcat-user] New XCAT installtion PXE boot issue.

2018-07-03 Thread Sam Davis
Will do, but it will be Thursday before I am back in the office. Thank you!

Sam

On Tue, Jul 3, 2018 at 11:35 PM Song BJ Yang  wrote:

> Hi Sam,
>
> Was the screenshot captured during the rootimg boot, or during the initrd
> boot before the rootimg tarball is downloaded? Please check the status of
> the node with `lsdef  -i status`.
>
>
> --
> YANG Song (杨嵩)
> IBM China System Technology Laboratory
> Tel: 86-10-82452903
> Email: yang...@cn.ibm.com
> Address: Building 28, ZhongGuanCun Software Park,
> No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC
>
> Building 28, Zhongguancun Software Park, No. 8 Dongbeiwang West Road, Haidian District, Beijing
> Postal code: 100193
>
>
>
> - Original message -
> From: "Sam Davis" 
> To: 
> Cc:
> Subject: [xcat-user] New XCAT installtion PXE boot issue.
> Date: Wed, Jul 4, 2018 2:56 AM
>
>
> Hello,
>
>   I am trying to set up a new HPC cluster using XCAT (2.14).  I have
> installed the management node and created the boot image (RHEL 7.5).  The
> node has been discovered and PXE boots, downloading the image file.  But the
> boot process stalls and never finishes.  I have even copied over a working
> RHEL 7.3 image from our other cluster to see if that is the issue.  I’ve
> tried disabling and enabling hyperthreading in the client machine.  I’ve
> also updated the firmware on the client machine.  Does anyone have any
> ideas of what I might try next?
>
>
> Node Hardware
>
> IBM x3850 X5
>
> 256 GB RAM
>
> Machine Type 7143 AC1
>
> 4 x Intel Xeon E7 4820
>
>
>
>
>


Re: [xcat-user] New XCAT installtion PXE boot issue.

2018-07-03 Thread Song BJ Yang
Hi Sam,
 
Was the screenshot captured during the rootimg boot, or during the initrd boot before the rootimg tarball is downloaded? Please check the status of the node with `lsdef  -i status`.
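To check the status from a script, a minimal Python sketch follows. It assumes `lsdef <node> -i status` prints an `Object name: <node>` line followed by indented `attribute=value` lines; the node name `cn01` and the status value in the example are hypothetical, and `node_status` needs the xCAT client commands on the PATH.

```python
import subprocess


def parse_lsdef(output: str) -> dict[str, dict[str, str]]:
    """Turn lsdef output into {node: {attribute: value}}."""
    nodes: dict[str, dict[str, str]] = {}
    current = None
    for line in output.splitlines():
        if line.startswith("Object name:"):
            current = line.split(":", 1)[1].strip()
            nodes[current] = {}
        elif current is not None and "=" in line:
            key, value = line.strip().split("=", 1)
            nodes[current][key] = value
    return nodes


def node_status(node: str) -> str:
    """Run `lsdef <node> -i status` and return the status value ('' if unset)."""
    result = subprocess.run(["lsdef", node, "-i", "status"],
                            capture_output=True, text=True, check=True)
    return parse_lsdef(result.stdout).get(node, {}).get("status", "")


# Parsing a captured sample (node name and status are hypothetical):
print(parse_lsdef("Object name: cn01\n    status=booting\n"))  # {'cn01': {'status': 'booting'}}
```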
 
--
YANG Song (杨嵩)
 
 
- Original message -
From: "Sam Davis"
To:
Cc:
Subject: [xcat-user] New XCAT installtion PXE boot issue.
Date: Wed, Jul 4, 2018 2:56 AM

Hello,
  I am trying to set up a new HPC cluster using XCAT (2.14).  I have installed the management node and created the boot image (RHEL 7.5).  The node has been discovered and PXE boots, downloading the image file.  But the boot process stalls and never finishes.  I have even copied over a working RHEL 7.3 image from our other cluster to see if that is the issue.  I’ve tried disabling and enabling hyperthreading in the client machine.  I’ve also updated the firmware on the client machine.  Does anyone have any ideas of what I might try next?
Node Hardware
IBM x3850 X5
256 GB RAM
Machine Type 7143 AC1
4 x Intel Xeon E7 4820
 
 


Re: [xcat-user] New XCAT installtion PXE boot issue.

2018-07-03 Thread david_johnson
Yes, I recall someone having trouble that went away when they turned off
hardware flow control on the serial console. If you have 115200nr or 38400nr,
change it to get rid of the “r”.
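A minimal sketch of that change, assuming the setting ends up on the kernel command line in the Linux serial-console form `console=ttyS0,115200n8r`, where a trailing `r` requests RTS/CTS flow control. The exact string xCAT writes for a given node may differ, so treat the parsing rules here as an assumption.

```python
import re

# Assumed Linux serial console options: <baud><parity><bits><flow>, e.g. "115200n8r".
# Parity is n/o/e, bits is a single digit, and a trailing "r" requests RTS/CTS flow control.
_SERIAL_OPTS = re.compile(r"^(?P<baud>\d+)(?P<parity>[noe]?)(?P<bits>\d?)(?P<flow>r?)$")


def drop_hw_flow_control(console_arg: str) -> str:
    """Return a console argument with any trailing RTS/CTS 'r' flag removed."""
    device, sep, options = console_arg.partition(",")
    if not sep:
        return console_arg  # no serial options, nothing to change
    m = _SERIAL_OPTS.match(options)
    if m is None:
        raise ValueError(f"unrecognised serial options: {options!r}")
    return f"{device},{m['baud']}{m['parity']}{m['bits']}"


def fix_cmdline(cmdline: str) -> str:
    """Apply drop_hw_flow_control to every console= token on a kernel command line."""
    return " ".join(
        drop_hw_flow_control(tok) if tok.startswith("console=") else tok
        for tok in cmdline.split()
    )


print(fix_cmdline("quiet console=tty0 console=ttyS0,115200n8r"))
# quiet console=tty0 console=ttyS0,115200n8
```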

  -- ddj
Dave Johnson

> On Jul 3, 2018, at 5:17 PM, Michael Robbert  wrote:
> 
> I don't know if this is your issue, but I had similar symptoms recently while 
> testing RHEL/CentOS 7 images on our cluster that has worked for years just 
> fine with CentOS 6.x. I found that my problem had something to do with 
> console redirection. For some reason systemd was hanging while trying to send 
> console messages to the SOL. If I disabled that redirection by deleting the 
> nodehm.cons attribute for the node that I was testing then the node booted 
> fine. If you can do that and find that it works then you'll know what is 
> wrong and you can probably find out from somebody at Lenovo how that is 
> supposed to be configured. The hardware where I'm having this problem is from 
> SuperMicro so I'm on my own for figuring out how it is supposed to work. 
> Mike
>> On 7/3/18 2:26 PM, Sam Davis wrote:
>> I have left it trying to boot overnight with no success.  Earlier I found
>> I was getting an NTP error during the boot cycle.  I configured ntpd on the
>> management node, and the client node now reports syncing early in the boot
>> process.
>>  
>> From: david_john...@brown.edu  
>> Sent: Tuesday, July 03, 2018 4:01 PM
>> To: xCAT Users Mailing list 
>> Subject: Re: [xcat-user] New XCAT installtion PXE boot issue.
>>  
>> So a quick question is how long is never?  I had a similar situation today 
>> and the setupntp script was taking a really long time but finally gave up. 
>> The problem for me was that chronyd was not configured on the management 
>> node to respond on any network interfaces. Chronyd has replaced ntpd on 
>> redhat 7.  
>> 
>>   -- ddj
>> Dave Johnson
>> 
>> On Jul 3, 2018, at 2:55 PM, Sam Davis  wrote:
>> 
>> Hello,
>>   I am trying to set up a new HPC cluster using XCAT (2.14).  I have
>> installed the management node and created the boot image (RHEL 7.5).  The
>> node has been discovered and PXE boots, downloading the image file.  But the
>> boot process stalls and never finishes.  I have even copied over a working 
>> RHEL 7.3 image from our other cluster to see if that is the issue.  I’ve 
>> tried disabling and enabling hyperthreading in the client machine.  I’ve 
>> also updated the firmware on the client machine.  Does anyone have any ideas 
>> of what I might try next?
>> 
>> 
>> 
>> Node Hardware
>> IBM x3850 X5
>> 256 GB RAM
>> Machine Type 7143 AC1
>> 4 x Intel Xeon E7 4820
>>  
>> 


Re: [xcat-user] New XCAT installtion PXE boot issue.

2018-07-03 Thread Michael Robbert
I don't know if this is your issue, but I had similar symptoms recently 
while testing RHEL/CentOS 7 images on our cluster that has worked for 
years just fine with CentOS 6.x. I found that my problem had something 
to do with console redirection. For some reason systemd was hanging 
while trying to send console messages to the SOL. If I disabled that 
redirection by deleting the nodehm.cons attribute for the node that I 
was testing then the node booted fine. If you can do that and find that 
it works then you'll know what is wrong and you can probably find out 
from somebody at Lenovo how that is supposed to be configured. The 
hardware where I'm having this problem is from SuperMicro so I'm on my 
own for figuring out how it is supposed to work.


Mike

On 7/3/18 2:26 PM, Sam Davis wrote:


I have left it trying to boot overnight with no success.  Earlier I found
that I was getting an NTP error during the boot cycle.  I configured
ntpd on the management node, and the client node now reports syncing early
in the boot process.


*From:* david_john...@brown.edu 
*Sent:* Tuesday, July 03, 2018 4:01 PM
*To:* xCAT Users Mailing list 
*Subject:* Re: [xcat-user] New XCAT installtion PXE boot issue.

So a quick question is how long is never?  I had a similar situation 
today and the setupntp script was taking a really long time but 
finally gave up. The problem for me was that chronyd was not 
configured on the management node to respond on any network 
interfaces. Chronyd has replaced ntpd on redhat 7.


  -- ddj

Dave Johnson


On Jul 3, 2018, at 2:55 PM, Sam Davis wrote:


Hello,

  I am trying to set up a new HPC cluster using XCAT (2.14).  I
have installed the management node and created the boot image
(RHEL 7.5). The node has been discovered and PXE boots, downloading
the image file.  But the boot process stalls and never finishes. 
I have even copied over a working RHEL 7.3 image from our other
cluster to see if that is the issue. I’ve tried disabling and
enabling hyperthreading in the client machine.  I’ve also updated
the firmware on the client machine.  Does anyone have any ideas of
what I might try next?



Node Hardware

IBM x3850 X5

256 GB RAM

Machine Type 7143 AC1

4 x Intel Xeon E7 4820






Re: [xcat-user] New XCAT installtion PXE boot issue.

2018-07-03 Thread David D Johnson
The next thing that it would be doing at this point is downloading the rootimg 
via the http server on the management node.
Maybe check the httpd logs? 
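One way to do that check is to filter the access log for requests from the stalled node. The sketch below assumes Apache's common/combined log format and the default RHEL/CentOS 7 log path /var/log/httpd/access_log; the node IP and the image file name in the example line are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Prefix of Apache's common/combined log format: host ident user [time] "request" status size
_LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def requests_from(lines: Iterable[str], client_ip: str,
                  prefix: str = "/install/") -> Iterator[tuple[str, str, int, str]]:
    """Yield (time, path, status, size) for requests from client_ip under prefix."""
    for line in lines:
        m = _LOG_LINE.match(line)
        if m and m["host"] == client_ip and m["path"].startswith(prefix):
            yield m["time"], m["path"], int(m["status"]), m["size"]


# Hypothetical log line; on the management node you would iterate over
# open("/var/log/httpd/access_log") instead.
line = ('172.20.0.52 - - [03/Jul/2018:16:28:01 -0400] '
        '"GET /install/netboot/rhels7.5/x86_64/compute/rootimg.cpio.gz HTTP/1.1" 200 1048576')
for entry in requests_from([line], "172.20.0.52"):
    print(*entry)
```

A missing request, a 404, or a size far smaller than the image file would each point somewhere different.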

-- ddj

> On Jul 3, 2018, at 4:28 PM, Sam Davis  wrote:
> 
> I see that this list does strip attachments.  The last line on the console 
> when it effectively hangs is:
> 
> [  925.551288] IPv6: ADDRCONF(NETDEV_CHANGE): eeno1: link becomes ready
>  
> From: david_john...@brown.edu  
> Sent: Tuesday, July 03, 2018 4:01 PM
> To: xCAT Users Mailing list 
> Subject: Re: [xcat-user] New XCAT installtion PXE boot issue.
>  
> So a quick question is how long is never?  I had a similar situation today 
> and the setupntp script was taking a really long time but finally gave up. 
> The problem for me was that chronyd was not configured on the management node 
> to respond on any network interfaces. Chronyd has replaced ntpd on redhat 7.  
> 
>   -- ddj
> Dave Johnson
> 
> On Jul 3, 2018, at 2:55 PM, Sam Davis wrote:
> 
>> Hello,
>>   I am trying to set up a new HPC cluster using XCAT (2.14).  I have
>> installed the management node and created the boot image (RHEL 7.5).  The
>> node has been discovered and PXE boots, downloading the image file.  But the
>> boot process stalls and never finishes.  I have even copied over a working 
>> RHEL 7.3 image from our other cluster to see if that is the issue.  I’ve 
>> tried disabling and enabling hyperthreading in the client machine.  I’ve 
>> also updated the firmware on the client machine.  Does anyone have any ideas 
>> of what I might try next?
>> 
>> 
>> 
>> Node Hardware
>> IBM x3850 X5
>> 256 GB RAM
>> Machine Type 7143 AC1
>> 4 x Intel Xeon E7 4820
>>  
>> 


Re: [xcat-user] New XCAT installtion PXE boot issue.

2018-07-03 Thread Sam Davis
I have left it trying to boot overnight with no success.  Earlier I found that I
was getting an NTP error during the boot cycle.  I configured ntpd on the
management node, and the client node now reports syncing early in the boot
process.

 

From: david_john...@brown.edu  
Sent: Tuesday, July 03, 2018 4:01 PM
To: xCAT Users Mailing list 
Subject: Re: [xcat-user] New XCAT installtion PXE boot issue.

 

So a quick question is how long is never?  I had a similar situation today and 
the setupntp script was taking a really long time but finally gave up. The 
problem for me was that chronyd was not configured on the management node to 
respond on any network interfaces. Chronyd has replaced ntpd on redhat 7.  

  -- ddj

Dave Johnson
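For the chronyd case above: chronyd does not answer NTP requests from other hosts unless chrony.conf contains an `allow` directive. A minimal Python sketch that lists the `allow` lines, assuming the stock /etc/chrony.conf path; it deliberately ignores `deny` lines and does not evaluate subnets, and the sample subnet is hypothetical.

```python
from pathlib import Path

# chrony.conf treats lines beginning with any of these characters as comments.
_COMMENT_CHARS = "#!;%"


def allow_directives(conf_text: str) -> list[str]:
    """Return the argument of every 'allow' directive ('' for a bare 'allow')."""
    found = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in _COMMENT_CHARS:
            continue
        parts = line.split(None, 1)
        if parts[0] == "allow":
            found.append(parts[1].strip() if len(parts) > 1 else "")
    return found


def serves_ntp_clients(path: str = "/etc/chrony.conf") -> bool:
    """True if the config at path has at least one 'allow' directive."""
    return bool(allow_directives(Path(path).read_text()))


# Hypothetical config: the commented-out line is ignored, the real one is reported.
sample = "server 0.pool.ntp.org iburst\n#allow 192.168.0.0/16\nallow 172.20.0.0/16\n"
print(allow_directives(sample))  # ['172.20.0.0/16']
```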


On Jul 3, 2018, at 2:55 PM, Sam Davis <aractha...@gmail.com> wrote:

Hello,

  I am trying to set up a new HPC cluster using XCAT (2.14).  I have installed
the management node and created the boot image (RHEL 7.5).  The node has been
discovered and PXE boots, downloading the image file.  But the boot process
stalls and never finishes.  I have even copied over a working RHEL 7.3 image 
from our other cluster to see if that is the issue.  I’ve tried disabling and 
enabling hyperthreading in the client machine.  I’ve also updated the firmware 
on the client machine.  Does anyone have any ideas of what I might try next?





Node Hardware

IBM x3850 X5

256 GB RAM

Machine Type 7143 AC1

4 x Intel Xeon E7 4820

 





Re: [xcat-user] go-xcat accidentally run with /install mounted from production xcat master

2018-07-03 Thread david_johnson
In this case it was because the old cluster had a copy of the CentOS 7.3 ISO,
which the directions called for, and I used an NFS mount to copy it. I just
forgot to unmount it afterwards. I could have mounted it with -o ro or used
scp, but NFS is the most convenient.
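A pre-flight check along these lines could catch a forgotten NFS mount before go-xcat runs. This is a hypothetical guard, not part of go-xcat: it parses /proc/mounts (device, mount point, type and options on each line) and reports whether /install is a read-write NFS mount; the server name in the test data is made up.

```python
from pathlib import Path
from typing import Optional


def mount_info(mounts_text: str, mountpoint: str) -> Optional[tuple[str, list[str]]]:
    """Return (fstype, options) for mountpoint from /proc/mounts text, or None.

    When several filesystems are stacked on one path, the last entry is the
    visible one, so later matches overwrite earlier ones.
    """
    result = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            result = (fields[2], fields[3].split(","))
    return result


def is_rw_nfs(mounts_text: str, mountpoint: str = "/install") -> bool:
    """True if mountpoint is currently an NFS mount with the 'rw' option."""
    info = mount_info(mounts_text, mountpoint)
    return info is not None and info[0].startswith("nfs") and "rw" in info[1]


def install_dir_is_safe() -> bool:
    """Hypothetical check to run on this host before invoking go-xcat."""
    return not is_rw_nfs(Path("/proc/mounts").read_text())
```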
I had originally hoped to manage two clusters from one master, but the problems
with separate default routes, separate resolv.conf files, separate root passwords
and ssh keys, separate domain names, and separate DHCP all wore me down to the
point that I needed to make this host a new master.

I still had a problem yesterday rebooting this diskless node when it tried
to start GPFS. It ended up getting expelled, and came back with only one of
its three GPFS filesystems mounted. I hope this is not repeatable. Other nodes
had GPFS hang while this was going on.

  -- ddj
Dave Johnson

> On Jul 3, 2018, at 5:30 AM, Song BJ Yang  wrote:
> 
> Hi David,
>  
> Glad to hear that you have resolved this.
>  
> I have a question for you: why do you need to mount the whole /install
> directory among the management nodes? Was it just a mistake, or do you want
> to share something, like installation sources and customized osimage-related
> files? If the answer is the latter, we are working on a solution to share
> the cluster inventory data and osimages between different clusters.
>  
> Thanks
> --
> YANG Song (杨嵩)
>  
>  
> - Original message -
> From: David Johnson 
> To: xCAT Users Mailing list 
> Cc:
> Subject: Re: [xcat-user] go-xcat accidentally run with /install mounted from 
> production xcat master
> Date: Tue, Jul 3, 2018 3:48 AM
>  
> Well, I went through and restored all updated files in /install/postscripts
> back to the previous version, and now my node boots again. Something in one
> of the scripts probably depends on something in the newer binaries that
> were not updated. Thanks for the debugging tips.
>  
>  — ddj
>  
>> 
>> On Jul 2, 2018, at 3:24 PM, David Johnson  wrote:
>>  
>> .
>> .
>> .
>> Running command on mgt5.oscar.ccv.brown.edu: chmod -R a+r 
>> /install/postscripts 2>&1
>>  
>>   mgt5.oscar.ccv.brown.edu: Internal call command: xdsh node552 --nodestatus 
>> -s -v -e /install/postscripts/xcatdsklspost 1 -m 172.20.0.6 'setupntp' 
>> --tftp /tftpboot --installdir /install --nfsv4 no -c -V
>> Running command on mgt5.oscar.ccv.brown.edu: /bin/hostname 2>&1
>> Running command on mgt5.oscar.ccv.brown.edu: ip -4 --oneline addr show |awk 
>> -F ' ' '{print $4}'|awk -F '/' '{print $1}' 2>&1
>> Running command on mgt5.oscar.ccv.brown.edu: hostname 2>&1
>> Running command on mgt5.oscar.ccv.brown.edu: /opt/xcat/bin/pping node552 2>&1
>> node552: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
>>  
>> Error: node552 remote shell had error code: 255
>> [root@mgt5 xcat]# 
>>  
>>  
>>> 
>>> On Jul 2, 2018, at 3:20 PM, David Johnson  wrote:
>>>  
>>> After the chdef command and reboot the message in /var/log/xcat/xcat.log is 
>>> exactly the same
>>> [root@node552 xcat]# cat xcat.log 
>>> Mon Jul  2 15:18:10 EDT 2018 [info]: xcat.xcatdsklspost: trying to download 
>>> postscripts...
>>> Mon Jul  2 15:18:26 EDT 2018 [err]: xcat.xcatdsklspost: failed to download 
>>> the postscripts from the xCAT server for node node552.oscar.ccv.brown.edu
>>>  
>>> [root@mgt5 xcat]# tabdump site | grep debug
>>> "xcatdebugmode","1",,
>>> [root@mgt5 xcat]# 
>>>  
>>>  
 
 On Jul 2, 2018, at 3:13 PM, Casandra H Qiu  wrote:
  
 updatenode issues the command from the management node and processes the command
 (postscripts) on the compute node after the node is booted.
 The postscripts in /xcatpost/ on the compute node will be downloaded again from
 MN:/install/postscripts (not sure if you want to do this).
 
 ]# updatenode mid21tor24cn01 setupntp -V
 [boston01]: Running command on boston01: ip -4 --oneline addr show |awk -F 
 ' ' '{print $4}'|awk -F '/' '{print $1}' 2>&1
 
 [boston01]: Running command on boston01: chmod -R a+r /install/postscripts 
 2>&1
 
 [boston01]: boston01: Internal call command: xdsh mid21tor24cn01 
 --nodestatus -s -v -e /install/postscripts/xcatdsklspost 1 -m 172.16.37.1 
 'setupntp' --tftp /tftpboot --installdir /install --nfsv4 no -c -V
 [boston01]: Running command on boston01: ip -4 --oneline addr show |awk -F 
 ' ' '{print $4}'|awk -F '/' '{print $1}' 2>&1
 [boston01]: Running command on boston01: hostname 2>&1
 [boston01]: Running command on boston01: /opt/xcat/bin/pping 
 mid21tor24cn01 2>&1
 [boston01]: mid21tor24cn01: Running /tmp/filez8tn6J.dsh 1 -m 172.16.37.1 
 setupntp --tftp 

Re: [xcat-user] go-xcat accidentally run with /install mounted from production xcat master

2018-07-03 Thread Song BJ Yang
Hi David,
 
Glad to hear that you have resolved this.
 
I have a question for you: why do you need to mount the whole /install directory among the management nodes? Was it just a mistake, or do you want to share something, like installation sources and customized osimage-related files? If the answer is the latter, we are working on a solution to share the cluster inventory data and osimages between different clusters.
 
Thanks
--
YANG Song (杨嵩)
 
 
- Original message -
From: David Johnson
To: xCAT Users Mailing list
Cc:
Subject: Re: [xcat-user] go-xcat accidentally run with /install mounted from production xcat master
Date: Tue, Jul 3, 2018 3:48 AM

Well, I went through and restored all updated files in /install/postscripts back to the previous version, and now my node boots again. Something in one of the scripts probably depends on something in the newer binaries that were not updated. Thanks for the debugging tips.
 
 — ddj
 
On Jul 2, 2018, at 3:24 PM, David Johnson  wrote: 


.
.
.
Running command on mgt5.oscar.ccv.brown.edu: chmod -R a+r /install/postscripts 2>&1
 
  mgt5.oscar.ccv.brown.edu: Internal call command: xdsh node552 --nodestatus -s -v -e /install/postscripts/xcatdsklspost 1 -m 172.20.0.6 'setupntp' --tftp /tftpboot --installdir /install --nfsv4 no -c -V
Running command on mgt5.oscar.ccv.brown.edu: /bin/hostname 2>&1
Running command on mgt5.oscar.ccv.brown.edu: ip -4 --oneline addr show |awk -F ' ' '{print $4}'|awk -F '/' '{print $1}' 2>&1
Running command on mgt5.oscar.ccv.brown.edu: hostname 2>&1
Running command on mgt5.oscar.ccv.brown.edu: /opt/xcat/bin/pping node552 2>&1
node552: Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
 
Error: node552 remote shell had error code: 255
[root@mgt5 xcat]# 
 
 
On Jul 2, 2018, at 3:20 PM, David Johnson  wrote: 


After the chdef command and reboot the message in /var/log/xcat/xcat.log is exactly the same
[root@node552 xcat]# cat xcat.log 
Mon Jul  2 15:18:10 EDT 2018 [info]: xcat.xcatdsklspost: trying to download postscripts...
Mon Jul  2 15:18:26 EDT 2018 [err]: xcat.xcatdsklspost: failed to download the postscripts from the xCAT server for node node552.oscar.ccv.brown.edu
 
[root@mgt5 xcat]# tabdump site | grep debug
"xcatdebugmode","1",,
[root@mgt5 xcat]# 
 
 
On Jul 2, 2018, at 3:13 PM, Casandra H Qiu  wrote: 

updatenode issues the command from the management node and processes the command (postscripts) on the compute node after the node is booted. The postscripts in /xcatpost/ on the compute node will be downloaded again from MN:/install/postscripts (not sure if you want to do this).

]# updatenode mid21tor24cn01 setupntp -V
[boston01]: Running command on boston01: ip -4 --oneline addr show |awk -F ' ' '{print $4}'|awk -F '/' '{print $1}' 2>&1
[boston01]: Running command on boston01: chmod -R a+r /install/postscripts 2>&1
[boston01]: boston01: Internal call command: xdsh mid21tor24cn01 --nodestatus -s -v -e /install/postscripts/xcatdsklspost 1 -m 172.16.37.1 'setupntp' --tftp /tftpboot --installdir /install --nfsv4 no -c -V
[boston01]: Running command on boston01: ip -4 --oneline addr show |awk -F ' ' '{print $4}'|awk -F '/' '{print $1}' 2>&1
[boston01]: Running command on boston01: hostname 2>&1
[boston01]: Running command on boston01: /opt/xcat/bin/pping mid21tor24cn01 2>&1
[boston01]: mid21tor24cn01: Running /tmp/filez8tn6J.dsh 1 -m 172.16.37.1 setupntp --tftp /tftpboot --installdir /install --nfsv4 no -c -V
[boston01]: mid21tor24cn01: trying to download postscripts...
[boston01]: mid21tor24cn01: trying to download postscripts from http://172.16.37.1/install/postscripts/
[boston01]: mid21tor24cn01: postscripts are downloaded from 172.16.37.1 successfully.
[boston01]: mid21tor24cn01: postscripts downloaded successfully
[boston01]: mid21tor24cn01: trying to get mypostscript from 172.16.37.1...
[boston01]: mid21tor24cn01: trying to download http://172.16.37.1/tftpboot/mypostscripts/mypostscript.mid21tor24cn01...
[boston01]: mid21tor24cn01: mypostscript.mid21tor24cn01 is downloaded successfully.
[boston01]: mid21tor24cn01: Running //xcatpost/mypostscript
[boston01]: mid21tor24cn01: Mon Jul 2 14:39:39 EDT 2018 Running postscript: setupntp
[boston01]: mid21tor24cn01: Failed to set time zone: Invalid time zone 'America/New_York'
[boston01]: mid21tor24cn01: inactive
[boston01]: mid21tor24cn01: syncing the clock
...
[boston01]: mid21tor24cn01: WARNING: NTP Sync Failed before timeout. ntp server will try to sync...
[boston01]: mid21tor24cn01: Created symlink from /etc/systemd/system/multi-user.target.wants/ntpd.service to /usr/lib/systemd/system/ntpd.service.
[boston01]: mid21tor24cn01:
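When `xcatdsklspost` reports that it failed to download the postscripts, as in the node552 log earlier in this thread, the two URLs from the log above can be probed directly from the compute node. A minimal sketch using only the standard library; the URL pattern is copied from that log, and the function names are hypothetical.

```python
import urllib.request


def postscript_urls(master: str, node: str,
                    installdir: str = "/install", tftpdir: str = "/tftpboot") -> list[str]:
    """URLs xcatdsklspost fetched in the log above, rebuilt from its pattern."""
    return [
        f"http://{master}{installdir}/postscripts/",
        f"http://{master}{tftpdir}/mypostscripts/mypostscript.{node}",
    ]


def reachable(url: str, timeout: float = 5.0) -> bool:
    """True if an HTTP GET of url succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError subclasses
        return False


# Print the URLs to probe, using the values from the log above:
for url in postscript_urls("172.16.37.1", "mid21tor24cn01"):
    print(url)
```

On the node itself, calling `reachable(url)` for each URL separates an HTTP or network problem from a problem inside the scripts.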

Re: [xcat-user] resolv.conf multiple search domains

2018-07-03 Thread Yuan Y Bai
You can configure the litetree table; here is an example:
 
[root@bybc0602 ~]# tabdump litetree
#priority,image,directory,mntopts,comments,disable
"1","rhels7.3-custom-statelite","bybc0602:/statelite/install/",,,
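If you want to script against this output, tabdump prints a `#`-prefixed header row followed by quoted CSV rows, as in the example above. A minimal Python parser, with `dump_table` as a hypothetical wrapper that assumes `tabdump` is on the PATH of the management node.

```python
import csv
import subprocess


def parse_tabdump(text: str) -> list[dict[str, str]]:
    """Parse tabdump output: a '#'-prefixed header row, then quoted CSV rows."""
    lines = text.strip().splitlines()
    if not lines:
        return []
    header = next(csv.reader([lines[0].lstrip("#")]))
    return [dict(zip(header, row)) for row in csv.reader(lines[1:])]


def dump_table(table: str) -> list[dict[str, str]]:
    """Run `tabdump <table>` on the management node and parse the result."""
    out = subprocess.run(["tabdump", table], capture_output=True, text=True, check=True).stdout
    return parse_tabdump(out)


sample = ('#priority,image,directory,mntopts,comments,disable\n'
          '"1","rhels7.3-custom-statelite","bybc0602:/statelite/install/",,,\n')
for row in parse_tabdump(sample):
    print(row["priority"], row["image"], row["directory"])
```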
 
 
Best Regards
--
Yuan Bai (白媛)
 
 
- Original message -
From: Jeff Berry
To: xCAT Users Mailing list
Cc:
Subject: Re: [xcat-user] resolv.conf multiple search domains
Date: Mon, Jul 2, 2018 5:11 PM

Hi Yuan,
 
That’s what I thought should be happening, but it isn’t working properly. When I log into the node and compare /etc/resolv.conf and /.default/etc/resolv.conf, they are not the same.
 
I’m wondering whether there’s a configuration setting that I’ve got wrong.
 
Best,
 
Jeff
 
 
From: Yuan Y Bai [mailto:by...@cn.ibm.com]
Sent: 02 July 2018 03:40
To: xcat-user@lists.sourceforge.net
Cc: xcat-user@lists.sourceforge.net
Subject: Re: [xcat-user] resolv.conf multiple search domains
 
Hi Jeff,
 
You can also customize /etc/resolv.conf.
 
When a node boots up in statelite mode, by default it copies all of its tmpfs files from the .default directory of the root image, for example /install/netboot/rhels7.3/x86_64/compute/rootimg/.default, so it is not necessary to set up a litetree table. If you want some of the files pulled from different locations that differ per node, you can use this table. The litetree table controls where the initial content of the files in the litefile table comes from, and the long-term content of the ro files.
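The lookup order described here can be sketched as follows. This is an illustrative model, not xCAT's implementation: it assumes litetree directories are searched from the smallest priority number upward, uses a caller-supplied `exists` check instead of real NFS access, and ignores any variable substitution the real table may support. The second directory in the example is hypothetical.

```python
from typing import Callable, Iterable


def initial_source(path: str,
                   litetree: Iterable[tuple[int, str]],
                   rootimg: str,
                   exists: Callable[[str], bool]) -> str:
    """Return where the initial copy of `path` would be taken from.

    litetree holds (priority, directory) pairs and is searched from the smallest
    priority number upward (an assumption); if no directory has the file, fall
    back to the root image's .default directory.
    """
    for _priority, directory in sorted(litetree):
        candidate = directory.rstrip("/") + path
        if exists(candidate):
            return candidate
    return rootimg.rstrip("/") + "/.default" + path


# Hypothetical layout: only the priority-2 directory carries a custom resolv.conf.
tree = [(2, "bybc0602:/statelite/custom/"), (1, "bybc0602:/statelite/install/")]
rootimg = "/install/netboot/rhels7.3/x86_64/compute/rootimg"
print(initial_source("/etc/resolv.conf", tree, rootimg,
                     exists=lambda p: p.startswith("bybc0602:/statelite/custom/")))
```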
 
Best Regards
--
Yuan Bai (白媛)
 
 
- Original message -
From: Jeff Berry
To: xCAT Users Mailing list
Cc:
Subject: [xcat-user] resolv.conf multiple search domains
Date: Fri, Jun 29, 2018 9:54 PM

Good afternoon,

Configuring the cluster is proceeding apace, and I find myself unclear on how best to deal with some DNS issues.

xCAT 2.14.1; master and statelite nodes all running CentOS 7.5.

When I boot, a resolv.conf file is being generated and installed, and although it has the right nameservers, it does not have the search domains we want. After liteimg, the .defaults/etc/resolv.conf file contains just the dummy line. And on boot, I end up with a resolv.conf that looks like:

search
nameserver
nameserver
nameserver
nameserver

That is, the same domain is duplicated on the search line. The nameservers themselves are correct, though.

I tried editing the .defaults/etc/resolv.conf file, but it had no effect.

Obviously I am unclear on how that resolv.conf file is being generated, and any pointers that anyone can provide will be gratefully followed up.

Jeff Berry, MRC CBU
 
 