Re: [xcat-user] SciLinux 7.4 statelite problems
Yes - console=tty0 console=ttyS1,115200 MNTOPTS=

I rebuilt the xcat server with CentOS 7.5 (and built images with that), and now the drac console is working properly.

In any case, to a first approximation, I can deploy the cluster.

Thanks again,

Jeff Berry

From: david_john...@brown.edu [mailto:david_john...@brown.edu]
Sent: 20 June 2018 15:58
To: xCAT Users Mailing list
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems

Do you have serial console on ttyS1 at 115200 baud with no flow control? The dell servers I have used use second serial port for console, rather than ttyS0 that most IBM Lenovo servers use.

-- ddj
Dave Johnson
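The console settings quoted at the top of this message follow the standard Linux kernel `console=` syntax. As a minimal sketch, assuming a hypothetical helper named `console_args` (it is not an xCAT command), the arguments for a given serial port and baud rate could be composed like this:

```shell
#!/bin/sh
# Hypothetical helper (not part of xCAT): build kernel console arguments
# for the local VGA console plus one serial console.
# $1 = serial port index (1 -> ttyS1), $2 = baud rate
console_args() {
    port="$1"
    speed="$2"
    printf 'console=tty0 console=ttyS%s,%s\n' "$port" "$speed"
}

# The Dell nodes in this thread use the second serial port at 115200 baud.
console_args 1 115200
```

The kernel treats the last `console=` entry as the primary console, so the serial port is listed after tty0 when boot messages and login are expected on the serial line.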
Re: [xcat-user] SciLinux 7.4 statelite problems
Do you have serial console on ttyS1 at 115200 baud with no flow control? The dell servers I have used use second serial port for console, rather than ttyS0 that most IBM Lenovo servers use.

-- ddj
Dave Johnson

> On Jun 20, 2018, at 7:02 AM, Song BJ Yang wrote:
>
> hi,
>
> is it possible to provide the screen log of the running `genimage` and
> system boot up on console after `rinstall/rpower`?
> --
> YANG Song (杨嵩)
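One way to answer Dave's question on a running or half-booted node is to look at the kernel command line. The following is only a sketch: `serial_console` is a hypothetical helper, and the sample string is the set of arguments reported elsewhere in this thread.

```shell
#!/bin/sh
# Sketch: extract the serial console settings from a kernel command line.
# $1 = kernel command line string (for example the contents of /proc/cmdline)
serial_console() {
    # Split on whitespace, keep console=ttyS... entries, report the last one.
    printf '%s\n' $1 | sed -n 's/^console=\(ttyS[0-9][0-9]*.*\)$/\1/p' | tail -n 1
}

# Example using the arguments reported in this thread.
serial_console "console=tty0 console=ttyS1,115200 MNTOPTS="
```

On a node where a shell is available, the same function could be called as `serial_console "$(cat /proc/cmdline)"`.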
Re: [xcat-user] SciLinux 7.4 statelite problems - mostly solved
Hi all,

many thanks to Gilad Berman who suggested that I disable serialflow=hard. With that disabled, I can now ssh into the node. Oddly, neither the drac console nor rcons will provide a login prompt. Still, this has both pointed me to the place I should be looking and also given me a workaround to get the rest of the deployment sorted.

Thanks to everyone (and especially Gilad) for the help and advice,

Jeff Berry, MRC Cognition and Brain Sciences Unit

--
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
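The workaround could look roughly like the sketch below. It only prints the commands rather than running them. The node name node-i01 is borrowed from the examples elsewhere in this thread, and clearing `serialflow` with an empty value is an assumption; check the result with `lsdef` before relying on it.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) xCAT commands that would inspect
# and clear hardware flow control for one node.
# NODE is an assumption taken from the examples in this thread.
NODE="${NODE:-node-i01}"

serialflow_commands() {
    node="$1"
    # Show the current serial console attributes of the node.
    echo "lsdef $node -i serialport,serialspeed,serialflow"
    # Clear the flow-control setting (previously 'hard'); assumed syntax.
    echo "chdef $node serialflow="
    # Regenerate the boot configuration so the new console arguments are used.
    echo "nodeset $node osimage"
}

serialflow_commands "$NODE"
```

Printing the commands first makes it easy to review them, or to pipe them to `sh` once they look right for the site.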
Re: [xcat-user] SciLinux 7.4 statelite problems
hi,

is it possible to provide the screen log of the running `genimage` and system boot up on console after `rinstall/rpower`?

--
YANG Song (杨嵩)
IBM China System Technology Laboratory
Tel: 86-10-82452903
Email: yang...@cn.ibm.com
Address: Building 28, ZhongGuanCun Software Park, No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC
北京市海淀区东北旺西路8号中关村软件园28号楼
邮编: 100193

- Original message -
From: Jeff Berry
To: xCAT Users Mailing list
Cc:
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
Date: Wed, Jun 20, 2018 6:25 PM

Hi,

the message is on boot and I never get to a console prompt. I’m starting to wonder if there may be some issue with dracut, possibly on the specific hardware I’m using. I’m using Dell C6420s, and try as I might I can’t get the install to drop to a dracut shell.

Regards,

Jeff
Re: [xcat-user] SciLinux 7.4 statelite problems
Hi,

the message is on boot and I never get to a console prompt. I’m starting to wonder if there may be some issue with dracut, possibly on the specific hardware I’m using. I’m using Dell C6420s, and try as I might I can’t get the install to drop to a dracut shell.

Regards,

Jeff

From: Song BJ Yang [mailto:yang...@cn.ibm.com]
Sent: 20 June 2018 08:19
To: xcat-user@lists.sourceforge.net
Cc: xcat-user@lists.sourceforge.net
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems

hi Jeff Berry,

when did you see the error message? during `genimage`? or during node boot up?

"
code killed, status 6/ABRT
on restart ‘/run/log/journal//system.journal corrupted or uncleanly shut down.
"

can you see the login prompt on the console?

--
YANG Song (杨嵩)
Re: [xcat-user] SciLinux 7.4 statelite problems
hi Jeff Berry,

when did you see the error message? during `genimage`? or during node boot up?

"
code killed, status 6/ABRT
on restart ‘/run/log/journal//system.journal corrupted or uncleanly shut down.
"

can you see the login prompt on the console?

--
YANG Song (杨嵩)
IBM China System Technology Laboratory
Tel: 86-10-82452903
Email: yang...@cn.ibm.com
Address: Building 28, ZhongGuanCun Software Park, No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC
北京市海淀区东北旺西路8号中关村软件园28号楼
邮编: 100193

- Original message -
From: Jeff Berry
To: xCAT Users Mailing list
Cc:
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
Date: Tue, Jun 19, 2018 4:42 PM

Hi Yuan,

I made the change to litefile as you suggested.

I was already running break.cleanup, but I tried pre-pivot as well.

In both cases, the node boots, is pingable, but the console doesn’t get to login and ssh gives a connection refused.

Best,

Jeff
Re: [xcat-user] SciLinux 7.4 statelite problems
Hi Yuan,

I made the change to litefile as you suggested.

I was already running with rd.break=cleanup, but I tried pre-pivot as well.

In both cases, the node boots and is pingable, but the console doesn’t get to login and ssh gives a connection refused.

Best,

Jeff

From: Yuan Y Bai [mailto:by...@cn.ibm.com]
Sent: 19 June 2018 08:37
To: xcat-user@lists.sourceforge.net
Cc: xcat-user@lists.sourceforge.net
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems

Hi Jeff,

Could you try rd.break=cleanup as follows, or you can try setting the break point with addkcmdline=rd.break=pre-pivot.

    chdef node-i01 addkcmdline=rd.break=cleanup
    rinstall node-i01 osimage
    rcons node-i01

Have you tried adding "/etc/systemd/" to litefile? Now we just add "/etc/systemd/system/multi-user.target.wants/".

Best Regards
--
Yuan Bai (白媛)
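The two breakpoints tried here, cleanup and pre-pivot, are both values of dracut's `rd.break` option. As a small sketch, the hypothetical helper below prints the matching `chdef` line and rejects names that dracut does not document; node-i01 is the node name used in Yuan's example.

```shell
#!/bin/sh
# Sketch: print the chdef command that sets a dracut breakpoint for a node.
# The accepted names are the ones documented for dracut's rd.break option.
# $1 = node name, $2 = breakpoint
rd_break_command() {
    case "$2" in
        cmdline|pre-udev|pre-trigger|initqueue|pre-mount|mount|pre-pivot|cleanup)
            echo "chdef $1 addkcmdline=rd.break=$2"
            ;;
        *)
            echo "unknown dracut breakpoint: $2" >&2
            return 1
            ;;
    esac
}

rd_break_command node-i01 pre-pivot
```

The breakpoint shell is opened on the kernel's primary console, so if the console arguments point at a serial port nobody is watching, the shell may be there without ever being visible.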
Re: [xcat-user] SciLinux 7.4 statelite problems
Hi Jeff,

Could you try rd.break=cleanup as follows, or you can try setting the break point with addkcmdline=rd.break=pre-pivot.

    chdef node-i01 addkcmdline=rd.break=cleanup
    rinstall node-i01 osimage
    rcons node-i01

Have you tried adding "/etc/systemd/" to litefile? Now we just add "/etc/systemd/system/multi-user.target.wants/".

Best Regards
--
Yuan Bai (白媛)
CSTL HPC System Management Development
Tel: 86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus. Ring Building 28, ZhongGuanCun Software Park, No.8 Dong Bei Wang West Road, Haidian District, Beijing P.R.China 100193
IBM环宇大厦
北京市海淀区东北旺西路8号,中关村软件园28号楼
邮编:100193

- Original message -
From: Jeff Berry
To: xCAT Users Mailing list
Cc:
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems
Date: Mon, Jun 18, 2018 5:36 PM

Hi everyone,

thanks for the pointers. I decided to go back to the very beginning and did a clean reinstall of xcat:

Version 2.14.1 (git commit 70d6e7f93cc9714a127c22df2e7ca53d4996a34c, built Fri Jun 1 03:00:53 EDT 2018)

then I walked through the documentation - https://xcat-docs.readthedocs.io/en/stable - and it works slightly better now. I’m no longer getting udev errors, but I’m still getting journald errors:

code killed, status 6/ABRT
on restart ‘/run/log/journal//system.journal corrupted or uncleanly shut down.

which looks like it might be a space/memory issue?

In any case, even just after boot, I have the same problem where I can’t ssh to the node or rcons, or even get a console prompt on the drac card (it’s a Dell C6420). It’s pingable at the correct IP address.

As per the email below, I checked the image for pkglist, exlist, and postinstall:

Object name: SL7.4-statelite-v1
    exlist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.exlist
    imagetype=linux
    osarch=x86_64
    osdistroname=SL7.4-x86_64
    osname=Linux
    osvers=SL7.4
    otherpkgdir=/install/post/otherpkgs/SL7.4/x86_64
    permission=755
    pkgdir=/install/SL7.4/x86_64
    pkglist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.pkglist
    postinstall=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.postinstall
    profile=compute
    provmethod=statelite
    rootimgdir=/install/netboot/SL7.4/x86_64/compute

I had a brief moment where I thought it might be an selinux problem, but in the rootimg selinux is disabled in /etc/selinux/config ...

the litefile is standard, but I’m thinking that I might change /var and /run to persistent to see if I can get some extra insight into what’s happening on the node.

#image,file,options,comments,disable
"ALL","/etc/adjtime","tmpfs",,
"ALL","/etc/securetty","tmpfs",,
"ALL","/etc/lvm/","tmpfs",,
"ALL","/etc/ntp.conf","tmpfs",,
"ALL","/etc/rsyslog.conf","tmpfs",,
"ALL","/etc/rsyslog.conf.XCATORIG","tmpfs",,
"ALL","/etc/udev/","tmpfs",,
"ALL","/etc/ntp.conf.predhclient","tmpfs",,
"ALL","/etc/resolv.conf","tmpfs",,
"ALL","/etc/yp.conf","tmpfs",,
"ALL","/etc/resolv.conf.predhclient","tmpfs",,
"ALL","/etc/sysconfig/","tmpfs",,
"ALL","/etc/ssh/","tmpfs",,
"ALL","/etc/inittab","tmpfs",,
"ALL","/tmp/","tmpfs",,
"ALL","/var/","tmpfs",,
"ALL","/opt/xcat/","tmpfs",,
"ALL","/xcatpost/","tmpfs",,
"ALL","/etc/systemd/system/multi-user.target.wants/","tmpfs",,
"ALL","/root/.ssh/","tmpfs",,
"ALL","/etc/rc3.d/","tmpfs",,
"ALL","/etc/rc2.d/","tmpfs",,
"ALL","/etc/rc4.d/","tmpfs",,
"ALL","/etc/rc5.d/","tmpfs",,

I’m booting with rd.debug and rd.break=cleanup, but I don’t get a shell – I think because the root image *is* mounting.

As I said, thanks for the thoughts, and I just wanted to make sure that people know that I appreciate the input,

Best,

Jeff Berry
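Yuan's suggestion to add "/etc/systemd/" to litefile amounts to one more row in the table. As a sketch only, a hypothetical helper `litefile_row` can emit a row in the same CSV layout as the litefile dump quoted in this thread; it does not write to the xCAT database.

```shell
#!/bin/sh
# Sketch: emit a litefile row in the CSV layout used in this thread
# (#image,file,options,comments,disable). Nothing is written to xCAT.
# $1 = image name or ALL, $2 = file or directory, $3 = option (e.g. tmpfs)
litefile_row() {
    printf '"%s","%s","%s",,\n' "$1" "$2" "$3"
}

# Row for the directory Yuan suggests adding.
litefile_row ALL /etc/systemd/ tmpfs
```

A row like this could be appended to the output of `tabdump litefile` and loaded back with `tabrestore`. As in the existing rows, directory entries keep their trailing slash.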
Re: [xcat-user] SciLinux 7.4 statelite problems
Hi everyone, thanks for the pointers. I decided to go back to the very beginning and did a clean reinstall of xcat: Version 2.14.1 (git commit 70d6e7f93cc9714a127c22df2e7ca53d4996a34c, built Fri Jun 1 03:00:53 EDT 2018) then I walked through the documentation - https://xcat-docs.readthedocs.io/en/stable - and it works slighly better now. I’m no longer getting udev errors, but I’m still getting journald errors: code killed, status 6/ABRT on restart ‘/run/log/journal//system.journal corrupted or uncleanly shut down. which looks like it might be a space/memory issue? In any case, even just after boot, I have the same problem where I can’t ssh to the node or rcons, or even get a console prompt on the drac card (it’s a dell C6420). It’s pingable at the correct ip address. As per the email below, I checked the image for pkglist, exlist, and postinall: Object name: SL7.4-statelite-v1 exlist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.exlist imagetype=linux osarch=x86_64 osdistroname=SL7.4-x86_64 osname=Linux osvers=SL7.4 otherpkgdir=/install/post/otherpkgs/SL7.4/x86_64 permission=755 pkgdir=/install/SL7.4/x86_64 pkglist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.pkglist postinstall=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.postinstall profile=compute provmethod=statelite rootimgdir=/install/netboot/SL7.4/x86_64/compute I had a brief moment where I thought it might be an selinux problem, but in the rootimg selinux is disabled in /etc/selinux/config ... the litefile is standard, but I’m thinking that I might change /var and /run to persistent to see if I can some extra insight into what’s happening on the node. 
  #image,file,options,comments,disable
  "ALL","/etc/adjtime","tmpfs",,
  "ALL","/etc/securetty","tmpfs",,
  "ALL","/etc/lvm/","tmpfs",,
  "ALL","/etc/ntp.conf","tmpfs",,
  "ALL","/etc/rsyslog.conf","tmpfs",,
  "ALL","/etc/rsyslog.conf.XCATORIG","tmpfs",,
  "ALL","/etc/udev/","tmpfs",,
  "ALL","/etc/ntp.conf.predhclient","tmpfs",,
  "ALL","/etc/resolv.conf","tmpfs",,
  "ALL","/etc/yp.conf","tmpfs",,
  "ALL","/etc/resolv.conf.predhclient","tmpfs",,
  "ALL","/etc/sysconfig/","tmpfs",,
  "ALL","/etc/ssh/","tmpfs",,
  "ALL","/etc/inittab","tmpfs",,
  "ALL","/tmp/","tmpfs",,
  "ALL","/var/","tmpfs",,
  "ALL","/opt/xcat/","tmpfs",,
  "ALL","/xcatpost/","tmpfs",,
  "ALL","/etc/systemd/system/multi-user.target.wants/","tmpfs",,
  "ALL","/root/.ssh/","tmpfs",,
  "ALL","/etc/rc3.d/","tmpfs",,
  "ALL","/etc/rc2.d/","tmpfs",,
  "ALL","/etc/rc4.d/","tmpfs",,
  "ALL","/etc/rc5.d/","tmpfs",,

I’m booting with rd.debug and rd.break=cleanup, but I don’t get a shell – I think because the root image *is* mounting.

As I said, thanks for the thoughts, and I just wanted to make sure that people know that I appreciate the input,

Best,
Jeff Berry

From: Yuan Y Bai [mailto:by...@cn.ibm.com]
Sent: 12 June 2018 10:01
To: xcat-user@lists.sourceforge.net
Cc: xcat-user@lists.sourceforge.net
Subject: Re: [xcat-user] SciLinux 7.4 statelite problems

Hi Jeff,

Could you check the exlist, pkglist and postinstall attributes in your osimage definition? We do not formally ship compute.SL7.pkglist; we use the same files as for rhels7. So could you try to use the rhels7-related files for your osimage?
Here I give you an example for osimage, you can find the right arch files under /opt/xcat/share/xcat/netboot/rh/:

  ]# lsdef -t osimage rhels7.4-x86_64-statelite-compute -i exlist,pkglist,postinstall
  Object name: rhels7.4-x86_64-statelite-compute
      exlist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.exlist
      pkglist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.pkglist
      postinstall=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.postinstall

"Failing to install mlx_en", I got the same message when there is no mlx in my system.

Best Regards
--
Yuan Bai (白媛)
CSTL HPC System Management Development
Tel: 86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus. Ring Building 28, ZhongGuanCun Software Park, No.8 Dong Bei Wang West Road, Haidian District, Beijing P.R.China 100193
IBM环宇大厦 北京市海淀区东北旺西路8号,中关村软件园28号楼 邮编:100193

- Original message - F
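The litefile change Jeff describes above (moving /var from tmpfs to persistent to keep logs across boots) could be prepared with a small script like the one below. This is only a sketch under assumptions: it works on a text copy of the table in the CSV form shown in the message, the function name set_litefile_option is made up for illustration, and it does not talk to xCAT at all. The listing has no /run entry, so the sketch only touches /var/. Note also that a persistent entry needs a persistent storage location configured for the node, which the thread does not show.

```python
import csv

# Excerpt of the litefile table from the message above, same CSV layout.
LITEFILE_CSV = '''#image,file,options,comments,disable
"ALL","/etc/adjtime","tmpfs",,
"ALL","/tmp/","tmpfs",,
"ALL","/var/","tmpfs",,
"ALL","/opt/xcat/","tmpfs",,
'''


def set_litefile_option(table_text, paths, new_option):
    """Return the table text with the options column replaced for `paths`.

    Hypothetical helper: header/comment lines (starting with '#') and blank
    lines are passed through unchanged.
    """
    out_lines = []
    for line in table_text.splitlines():
        if not line.strip() or line.startswith("#"):
            out_lines.append(line)
            continue
        row = next(csv.reader([line]))  # image, file, options, comments, disable
        if row[1] in paths:
            row[2] = new_option
        # Re-quote non-empty fields and leave empty ones bare, as in the listing.
        out_lines.append(",".join(f'"{v}"' if v else "" for v in row))
    return "\n".join(out_lines) + "\n"


updated = set_litefile_option(LITEFILE_CSV, {"/var/"}, "persistent")
print(updated)
```

Only the /var/ row changes; every other row is written back exactly as it was read, so the result can be diffed against the original table before anything is loaded into xCAT.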
Re: [xcat-user] SciLinux 7.4 statelite problems
Hi Jeff,

Could you check the exlist, pkglist and postinstall attributes in your osimage definition? We do not formally ship compute.SL7.pkglist; we use the same files as for rhels7. So could you try to use the rhels7-related files for your osimage?

Here I give you an example for osimage, you can find the right arch files under /opt/xcat/share/xcat/netboot/rh/:

  ]# lsdef -t osimage rhels7.4-x86_64-statelite-compute -i exlist,pkglist,postinstall
  Object name: rhels7.4-x86_64-statelite-compute
      exlist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.exlist
      pkglist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.pkglist
      postinstall=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.postinstall

"Failing to install mlx_en", I got the same message when there is no mlx in my system.

Best Regards
--
Yuan Bai (白媛)
CSTL HPC System Management Development
Tel: 86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus. Ring Building 28, ZhongGuanCun Software Park, No.8 Dong Bei Wang West Road, Haidian District, Beijing P.R.China 100193
IBM环宇大厦 北京市海淀区东北旺西路8号,中关村软件园28号楼 邮编:100193

- Original message -
From: Jeff Berry
To: xCAT Users Mailing list
Cc:
Subject: [xcat-user] SciLinux 7.4 statelite problems
Date: Tue, Jun 12, 2018 4:25 PM

Good morning all,

I’m still wrestling with a SciLinux 7.4 statelite deployment with xcat 2.13.11. The dracut hooks don’t seem to be working properly, which is both making it difficult to debug and also probably symptomatic of a larger problem.

Running genimage, a few things have caught my eye. The package list is looking for busybox-anaconda, which doesn’t seem to exist for SciLinux 7. A bit of poking seems to suggest that it is deprecated, but it’s not clear to me what a suitable replacement might be. Is there a preferred solution/workaround?

The dracut install also is throwing a couple of errors. Failing to install mlx_en is, I think, benign.
I am also getting this error:

  “dracut-install: ERROR: installing '/etc/udev/udev.conf'”

which seems like it might be more significant, especially in light of my dracut problems. However, I don’t know what might be causing this problem, nor how to fix it.

Any insight will be latched onto with unseemly haste,

Jeff Berry
MRC-CBSU, Cambridge

--
Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
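The check Yuan asks for above (do exlist, pkglist and postinstall point at the rhels7 files?) could be scripted roughly as follows. This is a sketch, not an xCAT tool: it assumes the lsdef output has been saved as text in the form shown in the message, and the helper names parse_lsdef and non_rhels7_files are invented for illustration.

```python
# Sample text copied from the lsdef output quoted in the message above.
LSDEF_OUTPUT = """Object name: rhels7.4-x86_64-statelite-compute
    exlist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.exlist
    pkglist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.pkglist
    postinstall=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.x86_64.postinstall
"""


def parse_lsdef(text):
    """Turn 'Object name:' and 'key=value' lines into a dict (hypothetical helper)."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Object name:"):
            attrs["name"] = line.split(":", 1)[1].strip()
        elif "=" in line:
            key, value = line.split("=", 1)
            attrs[key] = value
    return attrs


def non_rhels7_files(attrs, keys=("exlist", "pkglist", "postinstall")):
    """Return the attributes that are unset or do not name a *.rhels7.* file."""
    return [k for k in keys if ".rhels7." not in attrs.get(k, "")]


attrs = parse_lsdef(LSDEF_OUTPUT)
print(non_rhels7_files(attrs))  # → []
```

An empty list means all three attributes reference rhels7 files, which is also what Jeff's SL7.4-statelite-v1 definition shows later in the thread.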
[xcat-user] SciLinux 7.4 statelite problems
Good morning all,

I'm still wrestling with a SciLinux 7.4 statelite deployment with xcat 2.13.11. The dracut hooks don't seem to be working properly, which is both making it difficult to debug and also probably symptomatic of a larger problem.

Running genimage, a few things have caught my eye. The package list is looking for busybox-anaconda, which doesn't seem to exist for SciLinux 7. A bit of poking seems to suggest that it is deprecated, but it's not clear to me what a suitable replacement might be. Is there a preferred solution/workaround?

The dracut install also is throwing a couple of errors. Failing to install mlx_en is, I think, benign. I am also getting this error:

  "dracut-install: ERROR: installing '/etc/udev/udev.conf'"

which seems like it might be more significant, especially in light of my dracut problems. However, I don't know what might be causing this problem, nor how to fix it.

Any insight will be latched onto with unseemly haste,

Jeff Berry
MRC-CBSU, Cambridge
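For the busybox-anaconda question above, one way to experiment is to build a private copy of the package list without that entry and point the osimage's pkglist attribute at the copy. The sketch below only does the text filtering. The thread never confirms a replacement for busybox-anaconda or that the image works without it, so treat this purely as a way to test the idea; the sample list contents and the helper name drop_packages are hypothetical.

```python
def drop_packages(pkglist_text, unwanted):
    """Return pkglist text without lines that exactly name an unwanted package.

    Hypothetical helper: all other lines are kept untouched, in order.
    """
    kept = [
        line
        for line in pkglist_text.splitlines()
        if line.strip() not in unwanted
    ]
    return "\n".join(kept) + "\n"


# Hypothetical sample; the real compute.rhels7.x86_64.pkglist will differ.
SAMPLE_PKGLIST = """bash
busybox-anaconda
dracut-network
openssh-server
"""

filtered = drop_packages(SAMPLE_PKGLIST, {"busybox-anaconda"})
print(filtered)
```

Writing the result to a new file rather than editing the shipped list under /opt/xcat/share/xcat/netboot/rh/ keeps the original available for comparison.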