Re: [xcat-user] SciLinux 7.4 statelite problems

2018-06-21 Thread Jeff Berry
Yes -  console=tty0 console=ttyS1,115200 MNTOPTS=

I rebuilt the xcat server with CentOS 7.5 (and built images with that), and now 
the drac console is working properly.  In any case, to a first approximation, I 
can deploy the cluster.

Thanks again,

Jeff Berry
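
The console settings confirmed above come down to two kernel arguments. Below is a minimal Python sketch of how such a string can be assembled. It is not part of xCAT; the helper name console_args and its parameters are hypothetical. When several console= arguments are given, the kernel uses the last one for /dev/console, so the serial device is listed last.

```python
def console_args(serial_port: int, baud: int, with_vga: bool = True) -> str:
    """Build Linux kernel console= arguments for a serial console.

    serial_port is the ttyS index (1 for the second serial port, as on
    the Dell nodes discussed in this thread); baud is the line speed.
    """
    args = []
    if with_vga:
        # Keep the local VGA console as a secondary output.
        args.append("console=tty0")
    # Listed last so that it becomes the primary console.
    args.append(f"console=ttyS{serial_port},{baud}")
    return " ".join(args)


print(console_args(1, 115200))
```

With the values from this thread (second serial port, 115200 baud) this yields the same two console arguments quoted at the top of the message.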

From: david_john...@brown.edu [mailto:david_john...@brown.edu]
Sent: 20 June 2018 15:58
To: xCAT Users Mailing list 
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems

Do you have serial console on ttyS1 at 115200 baud with no flow control?
The dell servers I have used use second serial port for console, rather than 
ttyS0 that most IBM Lenovo servers use.
  -- ddj
Dave Johnson

On Jun 20, 2018, at 7:02 AM, Song BJ Yang <yang...@cn.ibm.com> wrote:
hi,

is it possible to provide the screen log of the running `genimage` and  system 
boot up on console after `rinstall/rpower`?
--
YANG Song (杨嵩)
IBM China System Technology Laboratory
Tel: 86-10-82452903
Email: yang...@cn.ibm.com
Address: Building 28, ZhongGuanCun Software Park,
No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC

北京市海淀区东北旺西路8号中关村软件园28号楼
邮编: 100193


- Original message -
From: Jeff Berry <jeff.be...@mrc-cbu.cam.ac.uk>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc:
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
Date: Wed, Jun 20, 2018 6:25 PM



Hi,



the message is on boot and I never get to a console prompt.  I’m starting to 
wonder if there may be some issue with dracut, possibly on the specific 
hardware I’m using.   I’m using Dell C6420s, and try as I might I can’t get the 
install to drop to a dracut shell.



Regards,



Jeff



From: Song BJ Yang [mailto:yang...@cn.ibm.com]
Sent: 20 June 2018 08:19
To: xcat-user@lists.sourceforge.net
Cc: xcat-user@lists.sourceforge.net
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems



hi Jeff Berry,



when did you see the error message? during `genimage`? or during node boot up?



"

code killed, status 6/ABRT

on restart ‘/run/log/journal//system.journal corrupted or uncleanly 
shut down.

"



can you see the login prompt on the console?







--
YANG Song (杨嵩)
IBM China System Technology Laboratory
Tel: 86-10-82452903
Email: yang...@cn.ibm.com
Address: Building 28, ZhongGuanCun Software Park,
No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC

北京市海淀区东北旺西路8号中关村软件园28号楼
邮编: 100193





- Original message -
From: Jeff Berry <jeff.be...@mrc-cbu.cam.ac.uk>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc:
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
Date: Tue, Jun 19, 2018 4:42 PM


Hi Yuan,



I made the change to litefile as you suggested.

I was already running break.cleanup, but I  tried pre-pivot as well.

In both cases, the node boots, is pingable, but the console doesn’t get to 
login and ssh gives a connection refused.



Best,



Jeff



From: Yuan Y Bai [mailto:by...@cn.ibm.com]
Sent: 19 June 2018 08:37
To: xcat-user@lists.sourceforge.net
Cc: xcat-user@lists.sourceforge.net
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems



Hi Jeff,



Could you try rd.break=cleanup as follows? Alternatively, you can set the break 
point with addkcmdline=rd.break=pre-pivot.

 chdef node-i01 addkcmdline=rd.break=cleanup
 rinstall node-i01 osimage
 rcons node-i01

Have you tried adding "/etc/systemd/" to the litefile? At present we only add 
"/etc/systemd/system/multi-user.target.wants/".







Best Regards
--
Yuan Bai (白媛)

CSTL HPC System Management Development
Tel:86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus. Ring Building 28,
ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District,
Beijing P.R.China 100193

IBM环宇大厦
北京市海淀区东北旺西路8号,中关村软件园28号楼
邮编:100193





- Original message -
From: Jeff Berry <jeff.be...@mrc-cbu.cam.ac.uk>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc:
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
Date: Mon, Jun 18, 2018 5:36 PM


Hi everyone,



thanks for the pointers.   I decided to go back to the very beginning and did a 
clean reinstall of xcat:

Version 2.14.1 (git commit 70d6e7f93cc9714a127c22df2e7ca53d4996a34c, built Fri 
Jun  1 03:00:53 EDT 2018)



then I walked through the documentation - 
https://xcat-docs.readthedocs.io/en/stable - and it works slightly better now.  
I’m no longer getting udev errors, but I’m still getting

Re: [xcat-user] SciLinux 7.4 statelite problems

2018-06-20 Thread david_johnson
Do you have serial console on ttyS1 at 115200 baud with no flow control?
The dell servers I have used use second serial port for console, rather than 
ttyS0 that most IBM Lenovo servers use. 

  -- ddj
Dave Johnson

> On Jun 20, 2018, at 7:02 AM, Song BJ Yang  wrote:
> 
> hi, 
>  
> is it possible to provide the screen log of the running `genimage` and  
> system boot up on console after `rinstall/rpower`?
> --
> YANG Song (杨嵩)
> IBM China System Technology Laboratory
> Tel: 86-10-82452903
> Email: yang...@cn.ibm.com
> Address: Building 28, ZhongGuanCun Software Park,
> No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC
> 
> 北京市海淀区东北旺西路8号中关村软件园28号楼
> 邮编: 100193
>  
>  
> - Original message -
> From: Jeff Berry 
> To: xCAT Users Mailing list 
> Cc:
> Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
> Date: Wed, Jun 20, 2018 6:25 PM
>  
> Hi,
> 
>  
> 
> the message is on boot and I never get to a console prompt.  I’m starting to 
> wonder if there may be some issue with dracut, possibly on the specific 
> hardware I’m using.   I’m using Dell C6420s, and try as I might I can’t get 
> the install to drop to a dracut shell.
> 
>  
> 
> Regards,
> 
>  
> 
> Jeff
> 
>  
> 
> From: Song BJ Yang [mailto:yang...@cn.ibm.com]
> Sent: 20 June 2018 08:19
> To: xcat-user@lists.sourceforge.net
> Cc: xcat-user@lists.sourceforge.net
> Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
> 
>  
> 
> hi Jeff Berry,
> 
>  
> 
> when did you see the error message? during `genimage`? or during node boot 
> up? 
> 
>  
> 
> "
> 
> code killed, status 6/ABRT
> 
> on restart ‘/run/log/journal//system.journal corrupted or uncleanly 
> shut down.
> 
> "
> 
>  
> 
> can you see the login prompt on the console?
> 
>  
> 
>  
> 
>  
> 
> --
> YANG Song (杨嵩)
> IBM China System Technology Laboratory
> Tel: 86-10-82452903
> Email: yang...@cn.ibm.com
> Address: Building 28, ZhongGuanCun Software Park,
> No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC
> 
> 北京市海淀区东北旺西路8号中关村软件园28号楼
> 邮编: 100193
> 
>  
> 
>  
> 
> - Original message -
> From: Jeff Berry 
> To: xCAT Users Mailing list 
> Cc:
> Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
> Date: Tue, Jun 19, 2018 4:42 PM
>  
> 
> Hi Yuan,
> 
>  
> 
> I made the change to litefile as you suggested.
> 
> I was already running break.cleanup, but I  tried pre-pivot as well.
> 
> In both cases, the node boots, is pingable, but the console doesn’t get to 
> login and ssh gives a connection refused.
> 
>  
> 
> Best,
> 
>  
> 
> Jeff
> 
>  
> 
> From: Yuan Y Bai [mailto:by...@cn.ibm.com]
> Sent: 19 June 2018 08:37
> To: xcat-user@lists.sourceforge.net
> Cc: xcat-user@lists.sourceforge.net
> Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
> 
>  
> 
> Hi Jeff,
> 
>  
> 
> Could you try rd.break=cleanup as following, or you can try to set break 
> point addkcmdline=rd.break=pre-pivot.
> 
>  
> 
>  chdef node-i01 addkcmdline=rd.break=cleanup
> 
>  rinstall node-i01 osimage
> 
>  rcons node-i01
> 
>  
> 
> Have you try to add "/etc/systemd/" in litefile? Now we just add  
> "/etc/systemd/system/multi-user.target.wants/".
> 
>  
> 
>  
> 
>  
> 
> Best Regards
> --
> Yuan Bai (白媛)
> 
> CSTL HPC System Management Development
> Tel:86-10-82451401
> E-mail: by...@cn.ibm.com
> Address: IBM ZGC Campus. Ring Building 28,
> ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District,
> Beijing P.R.China 100193
> 
> IBM环宇大厦
> 北京市海淀区东北旺西路8号,中关村软件园28号楼
> 邮编:100193
> 
>  
> 
>  
> 
> - Original message -
> From: Jeff Berry 
> To: xCAT Users Mailing list 
> Cc:
> Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
> Date: Mon, Jun 18, 2018 5:36 PM
>  
> 
> Hi everyone,
> 
>  
> 
> thanks for the pointers.   I decided to go back to the very beginning and did 
> a clean reinstall of xcat:
> 
> Version 2.14.1 (git commit 70d6e7f93cc9714a127c22df2e7ca53d4996a34c, built 
> Fri Jun  1 03:00:53 EDT 2018)
> 
>  
> 
> then I walked through the documentation - 
> https://xcat-docs.readthedocs.io/en/stable - and it works slighly better now. 
>  I’m no longer getting udev errors, but

Re: [xcat-user] SciLinux 7.4 statelite problems - mostly solved

2018-06-20 Thread Jeff Berry
Hi all,

many thanks to Gilad Berman who suggested that I disable serialflow=hard.  With 
that disabled, I can now ssh into the node.

Oddly, neither the drac console nor rcons will provide a login prompt.   Still, 
this has both pointed me to the place I should be looking and also given me a 
workaround to get the rest of the deployment sorted.

Thanks to everyone (and especially Gilad) for the help and advice,

Jeff Berry, MRC Cognition and Brain Sciences Unit
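
As a rough illustration of the workaround described above, the Python sketch below only builds (and does not run) a chdef command line that sets the node's serial console attributes and leaves serialflow empty. It assumes that "serialflow=hard" refers to the xCAT node attribute of that name and that assigning an empty value clears it. The helper name serial_chdef_command is hypothetical, and the node name node-i01 is borrowed from earlier in the thread.

```python
import shlex


def serial_chdef_command(node: str, port: int, speed: int, flow: str = "") -> str:
    """Return a chdef command line for the serial console attributes.

    The command is returned as a string for review; nothing is executed.
    An empty `flow` leaves serialflow blank, i.e. no hardware flow control
    (assumption: an empty value clears the attribute).
    """
    argv = [
        "chdef", "-t", "node", "-o", node,
        f"serialport={port}",
        f"serialspeed={speed}",
        f"serialflow={flow}",
    ]
    return shlex.join(argv)


print(serial_chdef_command("node-i01", 1, 115200))
```

The printed command can be checked by eye before it is run on the management node; the node would then need to be redeployed for new boot parameters to take effect.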
--
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user


Re: [xcat-user] SciLinux 7.4 statelite problems

2018-06-20 Thread Song BJ Yang
hi, 
 
is it possible to provide the screen log of the running `genimage` and  system boot up on console after `rinstall/rpower`?
--
YANG Song (杨嵩)
IBM China System Technology Laboratory
Tel: 86-10-82452903
Email: yang...@cn.ibm.com
Address: Building 28, ZhongGuanCun Software Park,
No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC
北京市海淀区东北旺西路8号中关村软件园28号楼
邮编: 100193

- Original message -
From: Jeff Berry
To: xCAT Users Mailing list
Cc:
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
Date: Wed, Jun 20, 2018 6:25 PM
Hi,
 
the message is on boot and I never get to a console prompt.  I’m starting to wonder if there may be some issue with dracut, possibly on the specific hardware I’m using.   I’m using Dell C6420s, and try as I might I can’t get the install to drop to a dracut shell.
 
Regards,
 
Jeff
 
From: Song BJ Yang [mailto:yang...@cn.ibm.com]
Sent: 20 June 2018 08:19
To: xcat-user@lists.sourceforge.net
Cc: xcat-user@lists.sourceforge.net
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
 
hi Jeff Berry,
 
when did you see the error message? during `genimage`? or during node boot up? 
 
"
code killed, status 6/ABRT
on restart ‘/run/log/journal//system.journal corrupted or uncleanly shut down.
"
 
can you see the login prompt on the console?
 
 
 
--
YANG Song (杨嵩)
IBM China System Technology Laboratory
Tel: 86-10-82452903
Email: yang...@cn.ibm.com
Address: Building 28, ZhongGuanCun Software Park,
No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC
北京市海淀区东北旺西路8号中关村软件园28号楼
邮编: 100193

- Original message -
From: Jeff Berry <jeff.be...@mrc-cbu.cam.ac.uk>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc:
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
Date: Tue, Jun 19, 2018 4:42 PM
Hi Yuan,
 
I made the change to litefile as you suggested.
I was already running break.cleanup, but I  tried pre-pivot as well.
In both cases, the node boots, is pingable, but the console doesn’t get to login and ssh gives a connection refused.
 
Best,
 
Jeff
 
From: Yuan Y Bai [mailto:by...@cn.ibm.com]
Sent: 19 June 2018 08:37
To: xcat-user@lists.sourceforge.net
Cc: xcat-user@lists.sourceforge.net
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems

Hi Jeff,

Could you try rd.break=cleanup as follows? Alternatively, you can set the break point with addkcmdline=rd.break=pre-pivot.

 chdef node-i01 addkcmdline=rd.break=cleanup
 rinstall node-i01 osimage
 rcons node-i01

Have you tried adding "/etc/systemd/" to the litefile? At present we only add "/etc/systemd/system/multi-user.target.wants/".

Best Regards
--
Yuan Bai (白媛)
CSTL HPC System Management Development
Tel:86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus. Ring Building 28,
ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District,
Beijing P.R.China 100193
IBM环宇大厦
北京市海淀区东北旺西路8号,中关村软件园28号楼
邮编:100193

- Original message -
From: Jeff Berry <jeff.be...@mrc-cbu.cam.ac.uk>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc:
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
Date: Mon, Jun 18, 2018 5:36 PM
Hi everyone,
 
thanks for the pointers.   I decided to go back to the very beginning and did a clean reinstall of xcat:
Version 2.14.1 (git commit 70d6e7f93cc9714a127c22df2e7ca53d4996a34c, built Fri Jun  1 03:00:53 EDT 2018)
 
then I walked through the documentation - https://xcat-docs.readthedocs.io/en/stable - and it works slightly better now.  I’m no longer getting udev errors, but I’m still getting journald errors:
code killed, status 6/ABRT
on restart ‘/run/log/journal//system.journal corrupted or uncleanly shut down.
 
which looks like it might be a space/memory issue?
 
In any case, even just after boot, I have the same problem where I can’t ssh to the node or rcons, or even get a console prompt on the drac card (it’s a dell C6420).  It’s pingable at the correct ip address.
 
As per the email below, I checked the image for pkglist, exlist, and postinstall:
 
Object name: SL7.4-statelite-v1
    exlist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.exlist
    imagetype=linux
    osarch=x86_64
    osdistroname=SL7.4-x86_64
    osname=Linux
    osvers=SL7.4
    otherpkgdir=/install/post/otherpkgs/SL7.4/x86_64
    permission=755
    pkgdir=/install/SL7.4/x86_64
    pkglist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.pkglist
    postinstall=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.postinstall
    profile=compute
    provmethod=statelite
    rootimgdir=/install/netboot/SL7.4/x86_64/compute
 
I had a brief moment where I thought it might be an selinux problem, but in the rootimg selinux is disabled in /etc/selinux/config ...
the litefile is standard, but I’m thinking that I 

Re: [xcat-user] SciLinux 7.4 statelite problems

2018-06-20 Thread Jeff Berry
Hi,

the message is on boot and I never get to a console prompt.  I’m starting to 
wonder if there may be some issue with dracut, possibly on the specific 
hardware I’m using.   I’m using Dell C6420s, and try as I might I can’t get the 
install to drop to a dracut shell.

Regards,

Jeff

From: Song BJ Yang [mailto:yang...@cn.ibm.com]
Sent: 20 June 2018 08:19
To: xcat-user@lists.sourceforge.net
Cc: xcat-user@lists.sourceforge.net
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems

hi Jeff Berry,

when did you see the error message? during `genimage`? or during node boot up?

"

code killed, status 6/ABRT

on restart ‘/run/log/journal//system.journal corrupted or uncleanly 
shut down.
"

can you see the login prompt on the console?



--
YANG Song (杨嵩)
IBM China System Technology Laboratory
Tel: 86-10-82452903
Email: yang...@cn.ibm.com
Address: Building 28, ZhongGuanCun Software Park,
No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC

北京市海淀区东北旺西路8号中关村软件园28号楼
邮编: 100193


- Original message -
From: Jeff Berry <jeff.be...@mrc-cbu.cam.ac.uk>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc:
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
Date: Tue, Jun 19, 2018 4:42 PM


Hi Yuan,



I made the change to litefile as you suggested.

I was already running break.cleanup, but I  tried pre-pivot as well.

In both cases, the node boots, is pingable, but the console doesn’t get to 
login and ssh gives a connection refused.



Best,



Jeff



From: Yuan Y Bai [mailto:by...@cn.ibm.com]
Sent: 19 June 2018 08:37
To: xcat-user@lists.sourceforge.net
Cc: xcat-user@lists.sourceforge.net
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems



Hi Jeff,



Could you try rd.break=cleanup as follows? Alternatively, you can set the break 
point with addkcmdline=rd.break=pre-pivot.

 chdef node-i01 addkcmdline=rd.break=cleanup
 rinstall node-i01 osimage
 rcons node-i01

Have you tried adding "/etc/systemd/" to the litefile? At present we only add 
"/etc/systemd/system/multi-user.target.wants/".







Best Regards
--
Yuan Bai (白媛)

CSTL HPC System Management Development
Tel:86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus. Ring Building 28,
ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District,
Beijing P.R.China 100193

IBM环宇大厦
北京市海淀区东北旺西路8号,中关村软件园28号楼
邮编:100193





- Original message -
From: Jeff Berry <jeff.be...@mrc-cbu.cam.ac.uk>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc:
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
Date: Mon, Jun 18, 2018 5:36 PM


Hi everyone,



thanks for the pointers.   I decided to go back to the very beginning and did a 
clean reinstall of xcat:

Version 2.14.1 (git commit 70d6e7f93cc9714a127c22df2e7ca53d4996a34c, built Fri 
Jun  1 03:00:53 EDT 2018)



then I walked through the documentation - 
https://xcat-docs.readthedocs.io/en/stable - and it works slightly better now.  
I’m no longer getting udev errors, but I’m still getting journald errors:

code killed, status 6/ABRT

on restart ‘/run/log/journal//system.journal corrupted or uncleanly 
shut down.



which looks like it might be a space/memory issue?



In any case, even just after boot, I have the same problem where I can’t ssh to 
the node or rcons, or even get a console prompt on the drac card (it’s a dell 
C6420).  It’s pingable at the correct ip address.



As per the email below, I checked the image for pkglist, exlist, and postinstall:



Object name: SL7.4-statelite-v1

exlist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.exlist

imagetype=linux

osarch=x86_64

osdistroname=SL7.4-x86_64

osname=Linux

osvers=SL7.4

otherpkgdir=/install/post/otherpkgs/SL7.4/x86_64

permission=755

pkgdir=/install/SL7.4/x86_64

pkglist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.pkglist


postinstall=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.postinstall

profile=compute

provmethod=statelite

rootimgdir=/install/netboot/SL7.4/x86_64/compute



I had a brief moment where I thought it might be an selinux problem, but in the 
rootimg selinux is disabled in /etc/selinux/config ...

the litefile is standard, but I’m thinking that I might change /var and /run to 
persistent to see if I can get some extra insight into what’s happening on the node.

#image,file,options,comments,disable

"ALL","/etc/adjtime","tmpfs",,

"ALL","/etc/securetty","tmpfs",,

"ALL","/etc/lvm/","tmpfs",,

"ALL","/etc/

Re: [xcat-user] SciLinux 7.4 statelite problems

2018-06-20 Thread Song BJ Yang
hi Jeff Berry,
 
when did you see the error message? during `genimage`? or during node boot up? 
 
"
code killed, status 6/ABRT
on restart ‘/run/log/journal//system.journal corrupted or uncleanly shut down.
"
 
can you see the login prompt on the console?
 
 
 
--
YANG Song (杨嵩)
IBM China System Technology Laboratory
Tel: 86-10-82452903
Email: yang...@cn.ibm.com
Address: Building 28, ZhongGuanCun Software Park,
No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC
北京市海淀区东北旺西路8号中关村软件园28号楼
邮编: 100193

- Original message -
From: Jeff Berry
To: xCAT Users Mailing list
Cc:
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
Date: Tue, Jun 19, 2018 4:42 PM
Hi Yuan,
 
I made the change to litefile as you suggested.
I was already running break.cleanup, but I  tried pre-pivot as well.
In both cases, the node boots, is pingable, but the console doesn’t get to login and ssh gives a connection refused.
 
Best,
 
Jeff
 
From: Yuan Y Bai [mailto:by...@cn.ibm.com]
Sent: 19 June 2018 08:37
To: xcat-user@lists.sourceforge.net
Cc: xcat-user@lists.sourceforge.net
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems

Hi Jeff,

Could you try rd.break=cleanup as follows? Alternatively, you can set the break point with addkcmdline=rd.break=pre-pivot.

 chdef node-i01 addkcmdline=rd.break=cleanup
 rinstall node-i01 osimage
 rcons node-i01

Have you tried adding "/etc/systemd/" to the litefile? At present we only add "/etc/systemd/system/multi-user.target.wants/".

Best Regards
--
Yuan Bai (白媛)
CSTL HPC System Management Development
Tel:86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus. Ring Building 28,
ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District,
Beijing P.R.China 100193
IBM环宇大厦
北京市海淀区东北旺西路8号,中关村软件园28号楼
邮编:100193

- Original message -
From: Jeff Berry <jeff.be...@mrc-cbu.cam.ac.uk>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc:
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
Date: Mon, Jun 18, 2018 5:36 PM
Hi everyone,
 
thanks for the pointers.   I decided to go back to the very beginning and did a clean reinstall of xcat:
Version 2.14.1 (git commit 70d6e7f93cc9714a127c22df2e7ca53d4996a34c, built Fri Jun  1 03:00:53 EDT 2018)
 
then I walked through the documentation - https://xcat-docs.readthedocs.io/en/stable - and it works slightly better now.  I’m no longer getting udev errors, but I’m still getting journald errors:
code killed, status 6/ABRT
on restart ‘/run/log/journal//system.journal corrupted or uncleanly shut down.
 
which looks like it might be a space/memory issue?
 
In any case, even just after boot, I have the same problem where I can’t ssh to the node or rcons, or even get a console prompt on the drac card (it’s a dell C6420).  It’s pingable at the correct ip address.
 
As per the email below, I checked the image for pkglist, exlist, and postinstall:
 
Object name: SL7.4-statelite-v1
    exlist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.exlist
    imagetype=linux
    osarch=x86_64
    osdistroname=SL7.4-x86_64
    osname=Linux
    osvers=SL7.4
    otherpkgdir=/install/post/otherpkgs/SL7.4/x86_64
    permission=755
    pkgdir=/install/SL7.4/x86_64
    pkglist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.pkglist
    postinstall=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.postinstall
    profile=compute
    provmethod=statelite
    rootimgdir=/install/netboot/SL7.4/x86_64/compute
 
I had a brief moment where I thought it might be an selinux problem, but in the rootimg selinux is disabled in /etc/selinux/config ...
the litefile is standard, but I’m thinking that I might change /var and /run to persistent to see if I can get some extra insight into what’s happening on the node.
#image,file,options,comments,disable
"ALL","/etc/adjtime","tmpfs",,
"ALL","/etc/securetty","tmpfs",,
"ALL","/etc/lvm/","tmpfs",,
"ALL","/etc/ntp.conf","tmpfs",,
"ALL","/etc/rsyslog.conf","tmpfs",,
"ALL","/etc/rsyslog.conf.XCATORIG","tmpfs",,
"ALL","/etc/udev/","tmpfs",,
"ALL","/etc/ntp.conf.predhclient","tmpfs",,
"ALL","/etc/resolv.conf","tmpfs",,
"ALL","/etc/yp.conf","tmpfs",,
"ALL","/etc/resolv.conf.predhclient","tmpfs",,
"ALL","/etc/sysconfig/","tmpfs",,
"ALL","/etc/ssh/","tmpfs",,
"ALL","/etc/inittab","tmpfs",,
"ALL","/tmp/","tmpfs",,
"ALL","/var/","tmpfs",,
"ALL","/opt/xc

Re: [xcat-user] SciLinux 7.4 statelite problems

2018-06-19 Thread Jeff Berry
Hi Yuan,

I made the change to litefile as you suggested.
I was already running break.cleanup, but I  tried pre-pivot as well.
In both cases, the node boots, is pingable, but the console doesn’t get to 
login and ssh gives a connection refused.

Best,

Jeff

From: Yuan Y Bai [mailto:by...@cn.ibm.com]
Sent: 19 June 2018 08:37
To: xcat-user@lists.sourceforge.net
Cc: xcat-user@lists.sourceforge.net
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems

Hi Jeff,

Could you try rd.break=cleanup as follows? Alternatively, you can set the break 
point with addkcmdline=rd.break=pre-pivot.

 chdef node-i01 addkcmdline=rd.break=cleanup
 rinstall node-i01 osimage
 rcons node-i01

Have you tried adding "/etc/systemd/" to the litefile? At present we only add 
"/etc/systemd/system/multi-user.target.wants/".



Best Regards
--
Yuan Bai (白媛)

CSTL HPC System Management Development
Tel:86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus. Ring Building 28,
ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District,
Beijing P.R.China 100193

IBM环宇大厦
北京市海淀区东北旺西路8号,中关村软件园28号楼
邮编:100193


- Original message -
From: Jeff Berry <jeff.be...@mrc-cbu.cam.ac.uk>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc:
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
Date: Mon, Jun 18, 2018 5:36 PM


Hi everyone,



thanks for the pointers.   I decided to go back to the very beginning and did a 
clean reinstall of xcat:

Version 2.14.1 (git commit 70d6e7f93cc9714a127c22df2e7ca53d4996a34c, built Fri 
Jun  1 03:00:53 EDT 2018)



then I walked through the documentation - 
https://xcat-docs.readthedocs.io/en/stable - and it works slightly better now.  
I’m no longer getting udev errors, but I’m still getting journald errors:

code killed, status 6/ABRT

on restart ‘/run/log/journal//system.journal corrupted or uncleanly 
shut down.



which looks like it might be a space/memory issue?



In any case, even just after boot, I have the same problem where I can’t ssh to 
the node or rcons, or even get a console prompt on the drac card (it’s a dell 
C6420).  It’s pingable at the correct ip address.



As per the email below, I checked the image for pkglist, exlist, and postinstall:



Object name: SL7.4-statelite-v1

exlist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.exlist

imagetype=linux

osarch=x86_64

osdistroname=SL7.4-x86_64

osname=Linux

osvers=SL7.4

otherpkgdir=/install/post/otherpkgs/SL7.4/x86_64

permission=755

pkgdir=/install/SL7.4/x86_64

pkglist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.pkglist


postinstall=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.postinstall

profile=compute

provmethod=statelite

rootimgdir=/install/netboot/SL7.4/x86_64/compute



I had a brief moment where I thought it might be an selinux problem, but in the 
rootimg selinux is disabled in /etc/selinux/config ...

the litefile is standard, but I’m thinking that I might change /var and /run to 
persistent to see if I can get some extra insight into what’s happening on the node.

#image,file,options,comments,disable

"ALL","/etc/adjtime","tmpfs",,

"ALL","/etc/securetty","tmpfs",,

"ALL","/etc/lvm/","tmpfs",,

"ALL","/etc/ntp.conf","tmpfs",,

"ALL","/etc/rsyslog.conf","tmpfs",,

"ALL","/etc/rsyslog.conf.XCATORIG","tmpfs",,

"ALL","/etc/udev/","tmpfs",,

"ALL","/etc/ntp.conf.predhclient","tmpfs",,

"ALL","/etc/resolv.conf","tmpfs",,

"ALL","/etc/yp.conf","tmpfs",,

"ALL","/etc/resolv.conf.predhclient","tmpfs",,

"ALL","/etc/sysconfig/","tmpfs",,

"ALL","/etc/ssh/","tmpfs",,

"ALL","/etc/inittab","tmpfs",,

"ALL","/tmp/","tmpfs",,

"ALL","/var/","tmpfs",,

"ALL","/opt/xcat/","tmpfs",,

"ALL","/xcatpost/","tmpfs",,

"ALL","/etc/systemd/system/multi-user.target.wants/","tmpfs",,

"ALL","/root/.ssh/","tmpfs",,

"ALL","/etc/rc3.d/","tmpfs",,

"ALL","/etc/rc2.d/","tmpfs",,

"ALL","/etc/rc4.d/","tmpfs",,

"ALL","/etc/rc5.d/","tmpfs",,



I’m booting with rd.debug and rd.break=cleanup, but I don’t get a shell – I 
think because the root image *is* mounting.



As I said, thanks for the thoughts, and I ju

Re: [xcat-user] SciLinux 7.4 statelite problems

2018-06-19 Thread Yuan Y Bai
Hi Jeff,
 
Could you try rd.break=cleanup as follows? Alternatively, you can set the break point with addkcmdline=rd.break=pre-pivot.

 chdef node-i01 addkcmdline=rd.break=cleanup
 rinstall node-i01 osimage
 rcons node-i01

Have you tried adding "/etc/systemd/" to the litefile? At present we only add "/etc/systemd/system/multi-user.target.wants/".

Best Regards
--
Yuan Bai (白媛)
CSTL HPC System Management Development
Tel:86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus. Ring Building 28,
ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District,
Beijing P.R.China 100193
IBM环宇大厦
北京市海淀区东北旺西路8号,中关村软件园28号楼
邮编:100193

- Original message -
From: Jeff Berry
To: xCAT Users Mailing list
Cc:
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
Date: Mon, Jun 18, 2018 5:36 PM
Hi everyone,
 
thanks for the pointers.   I decided to go back to the very beginning and did a clean reinstall of xcat:
Version 2.14.1 (git commit 70d6e7f93cc9714a127c22df2e7ca53d4996a34c, built Fri Jun  1 03:00:53 EDT 2018)
 
then I walked through the documentation - https://xcat-docs.readthedocs.io/en/stable - and it works slightly better now.  I’m no longer getting udev errors, but I’m still getting journald errors:
code killed, status 6/ABRT
on restart ‘/run/log/journal//system.journal corrupted or uncleanly shut down.
 
which looks like it might be a space/memory issue?
 
In any case, even just after boot, I have the same problem where I can’t ssh to the node or rcons, or even get a console prompt on the drac card (it’s a dell C6420).  It’s pingable at the correct ip address.
 
As per the email below, I checked the image for pkglist, exlist, and postinstall:
 
Object name: SL7.4-statelite-v1
    exlist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.exlist
    imagetype=linux
    osarch=x86_64
    osdistroname=SL7.4-x86_64
    osname=Linux
    osvers=SL7.4
    otherpkgdir=/install/post/otherpkgs/SL7.4/x86_64
    permission=755
    pkgdir=/install/SL7.4/x86_64
    pkglist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.pkglist
    postinstall=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.postinstall
    profile=compute
    provmethod=statelite
    rootimgdir=/install/netboot/SL7.4/x86_64/compute
 
I had a brief moment where I thought it might be an selinux problem, but in the rootimg selinux is disabled in /etc/selinux/config ...
the litefile is standard, but I’m thinking that I might change /var and /run to persistent to see if I can get some extra insight into what’s happening on the node.
#image,file,options,comments,disable
"ALL","/etc/adjtime","tmpfs",,
"ALL","/etc/securetty","tmpfs",,
"ALL","/etc/lvm/","tmpfs",,
"ALL","/etc/ntp.conf","tmpfs",,
"ALL","/etc/rsyslog.conf","tmpfs",,
"ALL","/etc/rsyslog.conf.XCATORIG","tmpfs",,
"ALL","/etc/udev/","tmpfs",,
"ALL","/etc/ntp.conf.predhclient","tmpfs",,
"ALL","/etc/resolv.conf","tmpfs",,
"ALL","/etc/yp.conf","tmpfs",,
"ALL","/etc/resolv.conf.predhclient","tmpfs",,
"ALL","/etc/sysconfig/","tmpfs",,
"ALL","/etc/ssh/","tmpfs",,
"ALL","/etc/inittab","tmpfs",,
"ALL","/tmp/","tmpfs",,
"ALL","/var/","tmpfs",,
"ALL","/opt/xcat/","tmpfs",,
"ALL","/xcatpost/","tmpfs",,
"ALL","/etc/systemd/system/multi-user.target.wants/","tmpfs",,
"ALL","/root/.ssh/","tmpfs",,
"ALL","/etc/rc3.d/","tmpfs",,
"ALL","/etc/rc2.d/","tmpfs",,
"ALL","/etc/rc4.d/","tmpfs",,
"ALL","/etc/rc5.d/","tmpfs",,
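
To see quickly which paths a litefile like the one above covers, for example whether /etc/systemd/ or /run/ is listed at all, a small parser can help. This is a minimal Python sketch that works on an excerpt of the table shown here. The helper names parse_litefile and covering_entry are hypothetical, and the prefix match is a simplification rather than a description of how xCAT itself resolves entries.

```python
import csv

# Excerpt of the litefile quoted in this message.
LITEFILE = '''#image,file,options,comments,disable
"ALL","/etc/adjtime","tmpfs",,
"ALL","/etc/systemd/system/multi-user.target.wants/","tmpfs",,
"ALL","/var/","tmpfs",,
'''


def parse_litefile(text):
    """Return (image, path, options) tuples, skipping the '#' header and blank lines."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    return [(row[0], row[1], row[2]) for row in csv.reader(lines)]


def covering_entry(path, entries):
    """Return the entry whose path equals `path` or is a parent directory of it."""
    for entry in entries:
        lite_path = entry[1]
        if path == lite_path or (lite_path.endswith("/") and path.startswith(lite_path)):
            return entry
    return None


entries = parse_litefile(LITEFILE)
for candidate in ("/var/log/journal/", "/etc/systemd/", "/run/"):
    print(candidate, "->", covering_entry(candidate, entries))
```

On this excerpt, /var/log/journal/ is covered by the /var/ entry, while /etc/systemd/ and /run/ have no covering entry, which matches the suggestion elsewhere in the thread to add /etc/systemd/ explicitly.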
 
I’m booting with rd.debug and rd.break=cleanup, but I don’t get a shell – I think because the root image *is* mounting.
 
As I said, thanks for the thoughts, and I just wanted to make sure that people know that I appreciate the input,
 
Best,
 
Jeff Berry
 
 
 
 
From: Yuan Y Bai [mailto:by...@cn.ibm.com]
Sent: 12 June 2018 10:01
To: xcat-user@lists.sourceforge.net
Cc: xcat-user@lists.sourceforge.net
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems

Hi Jeff,

Could you check the exlist, pkglist and postinstall attributes of your osimage definition?
We do not formally ship compute.SL7.pkglist; we use the same files as for rhels7. So could you try using the rhels7-related files for your osimage?

Here is an example osimage; you can find the files for the right architecture under /opt/xcat/share/xcat/netboot/rh/:
]# ls

Re: [xcat-user] SciLinux 7.4 statelite problems

2018-06-18 Thread Jeff Berry
Hi everyone,

thanks for the pointers.   I decided to go back to the very beginning and did a 
clean reinstall of xcat:
Version 2.14.1 (git commit 70d6e7f93cc9714a127c22df2e7ca53d4996a34c, built Fri 
Jun  1 03:00:53 EDT 2018)

then I walked through the documentation - 
https://xcat-docs.readthedocs.io/en/stable - and it works slightly better now.
I’m no longer getting udev errors, but I’m still getting journald errors:
code killed, status 6/ABRT
on restart ‘/run/log/journal//system.journal corrupted or uncleanly 
shut down.

which looks like it might be a space/memory issue?

In any case, even just after boot, I have the same problem where I can’t ssh to 
the node or rcons, or even get a console prompt on the drac card (it’s a dell 
C6420).  It’s pingable at the correct ip address.

As per the email below, I checked the image for pkglist, exlist, and postinstall:

Object name: SL7.4-statelite-v1
    exlist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.exlist
    imagetype=linux
    osarch=x86_64
    osdistroname=SL7.4-x86_64
    osname=Linux
    osvers=SL7.4
    otherpkgdir=/install/post/otherpkgs/SL7.4/x86_64
    permission=755
    pkgdir=/install/SL7.4/x86_64
    pkglist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.pkglist
    postinstall=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.postinstall
    profile=compute
    provmethod=statelite
    rootimgdir=/install/netboot/SL7.4/x86_64/compute
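
A quick way to double-check a definition like the one above is to confirm that the files it points at really exist on the management node. The following is only a sketch: check_osimage_files is a hypothetical helper (not an xCAT command), it reads key=value lines in the format shown above on stdin, and it assumes each of exlist, pkglist and postinstall holds a single path rather than a comma-separated list.

```shell
# Read "key=value" lines (as in the listing above) on stdin and report
# any exlist/pkglist/postinstall path that does not exist locally.
# check_osimage_files is a hypothetical helper, not part of xCAT.
check_osimage_files() {
    missing=0
    while IFS= read -r line; do
        # Strip leading whitespace, then split on the first "=".
        line=${line#"${line%%[![:space:]]*}"}
        key=${line%%=*}
        value=${line#*=}
        case $key in
            exlist|pkglist|postinstall)
                # Assumption: the attribute is one path, not a list.
                if [ ! -f "$value" ]; then
                    echo "missing $key: $value"
                    missing=1
                fi
                ;;
        esac
    done
    # Non-zero exit status if at least one file was not found.
    return "$missing"
}
```

It could be fed from the output of lsdef -t osimage for the image in question; an empty result with exit status 0 means all three files were found.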

I had a brief moment where I thought it might be an selinux problem, but in the 
rootimg selinux is disabled in /etc/selinux/config ...
The litefile is standard, but I’m thinking that I might change /var and /run to 
persistent to see if I can get some extra insight into what’s happening on the node.
#image,file,options,comments,disable
"ALL","/etc/adjtime","tmpfs",,
"ALL","/etc/securetty","tmpfs",,
"ALL","/etc/lvm/","tmpfs",,
"ALL","/etc/ntp.conf","tmpfs",,
"ALL","/etc/rsyslog.conf","tmpfs",,
"ALL","/etc/rsyslog.conf.XCATORIG","tmpfs",,
"ALL","/etc/udev/","tmpfs",,
"ALL","/etc/ntp.conf.predhclient","tmpfs",,
"ALL","/etc/resolv.conf","tmpfs",,
"ALL","/etc/yp.conf","tmpfs",,
"ALL","/etc/resolv.conf.predhclient","tmpfs",,
"ALL","/etc/sysconfig/","tmpfs",,
"ALL","/etc/ssh/","tmpfs",,
"ALL","/etc/inittab","tmpfs",,
"ALL","/tmp/","tmpfs",,
"ALL","/var/","tmpfs",,
"ALL","/opt/xcat/","tmpfs",,
"ALL","/xcatpost/","tmpfs",,
"ALL","/etc/systemd/system/multi-user.target.wants/","tmpfs",,
"ALL","/root/.ssh/","tmpfs",,
"ALL","/etc/rc3.d/","tmpfs",,
"ALL","/etc/rc2.d/","tmpfs",,
"ALL","/etc/rc4.d/","tmpfs",,
"ALL","/etc/rc5.d/","tmpfs",,

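The change considered above (switching selected litefile rows from tmpfs to persistent) can be sketched as a filter over a CSV dump of the table in the format shown. This is an illustration under assumptions, not a tested procedure: make_persistent is a hypothetical helper, it only rewrites rows that already exist (there is no /run row in the table above, so one would have to be added by hand), and persistent entries are also expected to need a matching statelite table entry for the storage location, which is not shown here.

```shell
# Rewrite the options column of selected litefile rows to "persistent".
# Reads the CSV dump on stdin, writes the modified CSV on stdout.
# make_persistent is a hypothetical helper, not an xCAT command.
make_persistent() {
    # "$*" is the space-separated list of paths to switch, e.g. /var/
    # (paths containing spaces or commas are not handled by this sketch).
    awk -F',' -v OFS=',' -v paths="$*" '
        BEGIN {
            n = split(paths, p, " ")
            for (i = 1; i <= n; i++) want["\"" p[i] "\""] = 1
        }
        /^#/ { print; next }                  # keep the header line as-is
        ($2 in want) { $3 = "\"persistent\"" }  # column 3 is "options"
        { print }
    '
}
```

For example, make_persistent /var/ < litefile.csv > litefile.new.csv would change only the "/var/" row and leave every other line untouched.
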
I’m booting with rd.debug and rd.break=cleanup, but I don’t get a shell – I 
think because the root image *is* mounting.
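
One thing that may be worth ruling out is whether the rd.* options actually reach the kernel. A small sketch, assuming the command line is available as a string (for instance copied from the generated boot configuration, or from /proc/cmdline on a node that does come up); list_rd_opts is a hypothetical helper name:

```shell
# Print every dracut "rd." option found in a kernel command line,
# one per line. list_rd_opts is a hypothetical helper name.
list_rd_opts() {
    # $1 is the full kernel command line as a single string.
    printf '%s\n' "$1" | tr -s ' ' '\n' | grep '^rd\.'
}
```

If rd.debug and rd.break=cleanup do not both show up in the output, the options were lost somewhere between the node definition and the boot loader configuration.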

As I said, thanks for the thoughts, and I just wanted to make sure that people 
know that I appreciate the input,

Best,

Jeff Berry




From: Yuan Y Bai [mailto:by...@cn.ibm.com]
Sent: 12 June 2018 10:01
To: xcat-user@lists.sourceforge.net
Cc: xcat-user@lists.sourceforge.net
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems

Hi Jeff,

Could you check your osimage definition about exlist, pkglist and postinstall?
We do not formally ship compute.SL7.pkglist; we use the same files as for rhels7. 
So could you try using the rhels7-related files for your osimage?

Here I give you an example for osimage, you can find the right arch files under 
/opt/xcat/share/xcat/netboot/rh/:
]# lsdef -t osimage rhels7.4-x86_64-statelite-compute -i exlist,pkglist,postinstall
Object name: rhels7.4-x86_64-statelite-compute
    exlist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.exlist
    pkglist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.pkglist
    postinstall=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.postinstall


"Failing to install mlx_en", I got the same message when there is no mlx in my 
system.


Best Regards
--
Yuan Bai (白媛)

CSTL HPC System Management Development
Tel:86-10-82451401
E-mail: by...@cn.ibm.com<mailto:by...@cn.ibm.com>
Address: IBM ZGC Campus. Ring Building 28,
ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District,
Beijing P.R.China 100193

IBM环宇大厦
北京市海淀区东北旺西路8号,中关村软件园28号楼
邮编:100193


- Original message -
F

Re: [xcat-user] SciLinux 7.4 statelite problems

2018-06-12 Thread Yuan Y Bai
Hi Jeff,
 
Could you check your osimage definition about exlist, pkglist and postinstall?
We do not formally ship compute.SL7.pkglist; we use the same files as for rhels7. So could you try using the rhels7-related files for your osimage?
 
Here I give you an example for osimage, you can find the right arch files under /opt/xcat/share/xcat/netboot/rh/:
]# lsdef -t osimage rhels7.4-x86_64-statelite-compute -i exlist,pkglist,postinstall
Object name: rhels7.4-x86_64-statelite-compute
    exlist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.exlist
    pkglist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.pkglist
    postinstall=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.postinstall
 
 
"Failing to install mlx_en", I got the same message when there is no mlx in my system.
 
 
Best Regards
--
Yuan Bai (白媛)
CSTL HPC System Management Development
Tel:86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus. Ring Building 28,
ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District,
Beijing P.R.China 100193

IBM环宇大厦
北京市海淀区东北旺西路8号,中关村软件园28号楼
邮编:100193
 
 
- Original message -
From: Jeff Berry
To: xCAT Users Mailing list
Cc:
Subject: [xcat-user] SciLinux 7.4 statelite problems
Date: Tue, Jun 12, 2018 4:25 PM

Good morning all,
 
I’m still wrestling with a SciLinux 7.4 statelite deployment with xcat 2.13.11.    The dracut hooks don’t seem to be working properly, which is both making it difficult to debug and also probably symptomatic of a larger problem.   Running genimage, a few things have caught my eye.
 
The package list is looking for busybox-anaconda, which doesn’t seem to exist for SciLinux 7.  A bit of poking seems to suggest that it is deprecated, but it’s not clear to me what a suitable replacement might be.  Is there a preferred solution/workaround?
 
The dracut install also is throwing a couple of errors.  Failing to install mlx_en is, I think, benign.  I am also getting this error: “dracut-install: ERROR: installing '/etc/udev/udev.conf'”  which seems like it might be more significant, especially in light of my dracut problems.  However, I don’t know what might be causing this problem, nor how to fix it.
 
Any insight will be latched upon to with unseemly haste,
 
Jeff Berry
MRC-CBSU, Cambridge
--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
 




[xcat-user] SciLinux 7.4 statelite problems

2018-06-12 Thread Jeff Berry
Good morning all,

I'm still wrestling with a SciLinux 7.4 statelite deployment with xcat 2.13.11. 
   The dracut hooks don't seem to be working properly, which is both making it 
difficult to debug and also probably symptomatic of a larger problem.   Running 
genimage, a few things have caught my eye.

The package list is looking for busybox-anaconda, which doesn't seem to exist 
for SciLinux 7.  A bit of poking seems to suggest that it is deprecated, but 
it's not clear to me what a suitable replacement might be.  Is there a 
preferred solution/workaround?

The dracut install also is throwing a couple of errors.  Failing to install 
mlx_en is, I think, benign.  I am also getting this error: "dracut-install: 
ERROR: installing '/etc/udev/udev.conf'"  which seems like it might be more 
significant, especially in light of my dracut problems.  However, I don't know 
what might be causing this problem, nor how to fix it.
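
For the missing-package question, one way to see exactly which entries of a package list the distribution cannot satisfy is to compare the list against the package names the repository offers. This is a sketch under assumptions: missing_pkgs is a hypothetical helper, the list of available names is taken as a plain text file (how it is produced is not shown), and the filter assumes the usual pkglist conventions where lines starting with #, @ or - are comments/includes, groups and removals rather than plain package names.

```shell
# Print pkglist entries that are not in a list of available packages.
# Usage: missing_pkgs PKGLIST AVAILABLE
#   PKGLIST   - package list, one entry per line
#   AVAILABLE - available package names, one per line (assumed input)
# missing_pkgs is a hypothetical helper, not part of xCAT.
missing_pkgs() {
    # Drop blank lines, "#" comment/include lines, "@" groups and
    # "-" removals, then keep only names with no exact match in $2.
    grep -v -e '^[[:space:]]*$' -e '^#' -e '^@' -e '^-' "$1" |
        grep -v -x -F -f "$2"
}
```

Any name it prints (busybox-anaconda would be expected here, going by the description above) is a candidate for removal from a private copy of the pkglist.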

Any insight will be latched upon to with unseemly haste,

Jeff Berry
MRC-CBSU, Cambridge