[xcat-user] xcat deployment of sr630 with optane
Hello,

Trying to deploy RH7.5 on an SR630 with Optane fails with various udev timeout errors. Has anyone had any luck deploying Optane servers using xCAT?

Our version of xCAT is:

    [root@qmgt3 ~]# rpm -qa | grep -i xcat
    xCAT-client-2.15-snap201911041517.noarch
    xCAT-2.15-snap201911041517.x86_64
    xCAT-buildkit-2.15-snap201911041517.noarch
    xCAT-genesis-scripts-x86_64-2.15-snap201911041517.noarch
    grub2-xcat-2.02-0.76.el7.1.snap201905160255.noarch
    xCAT-genesis-base-x86_64-2.14.5-snap201811190037.noarch
    xCAT-genesis-base-ppc64-2.14.5-snap201811160710.noarch
    elilo-xcat-3.14-4.noarch
    syslinux-xcat-3.86-2.noarch
    xCAT-probe-2.15-snap201911041517.noarch
    ipmitool-xcat-1.8.18-0.x86_64
    perl-xCAT-2.15-snap201911041517.noarch
    xCAT-genesis-scripts-ppc64-2.15-snap201911041517.noarch
    xCAT-server-2.15-snap201911041517.noarch

Thanks,
Damir
___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
Re: [xcat-user] running bmc setup and the USERID password
Here is what I have done so far: I logged into both qgpu0101 and qgpu0102 and ran /bin/bmcsetup. They have the correct IP.

After the expect script did not work, I logged into the genesis environment on qgpu0101 only and ran:

    ipmitool user set password 2

It worked, but I still could not log in as USERID@qgpu0101. On subsequent attempts, I can no longer change the password. When I test it:

    [xCAT Genesis running on qgpu0101 /]# ipmitool user test 2 16
    Password for user 2:
    Success

ipmitool lan print 1 shows:

    Auth Type Enable        : Callback :
                            : User     : MD5 PASSWORD
                            : Operator : MD5 PASSWORD
                            : Admin    : MD5
                            : OEM      :

User USERID is ADMIN, so I ran:

    ipmitool lan set 1 auth ADMIN MD5,PASSWORD

But alas, I still can't log in as USERID@qgpu0101. And eventually, Admin changes back to just MD5.

I also changed USERID to OPERATOR level using:

    ipmitool user priv 2 0x3 1

On Thu, Feb 11, 2021 at 4:28 PM Damir Krstic wrote:
> Lenovo has implemented a mandatory change of the USERID password from the
> default PASSW0RD in recent firmware iterations. We had success with an
> expect script, run after the bmcsetup script, that changed the password
> to our own.
>
> However, in the recent batch of nodes, the default password (PASSW0RD) is
> changed / different. Does anyone know what the new password is? Also, this
> new policy of having to change the BMC password right away is, to say
> the least, not convenient. We now spend more time messing with the BMC
> scripts than deploying the nodes.
>
> Is there a procedure or some step that I am missing?
>
> Thanks,
> Damir
[xcat-user] running bmc setup and the USERID password
Lenovo has implemented a mandatory change of the USERID password from the default PASSW0RD in recent firmware iterations. We had success with an expect script, run after the bmcsetup script, that changed the password to our own.

However, in the recent batch of nodes, the default password (PASSW0RD) is changed / different. Does anyone know what the new password is? Also, this new policy of having to change the BMC password right away is, to say the least, not convenient. We now spend more time messing with the BMC scripts than deploying the nodes.

Is there a procedure or some step that I am missing?

Thanks,
Damir
[xcat-user] bmcsetup and complex password
Hi all,

The new Lenovo hardware requires that the BMC password be changed to a complex password on the initial login. This seems to be tripping up the bmcsetup script, which does not complete. We are running xCAT 2.15.

Thank you.
Damir
[xcat-user] programmatically getting power telemetry from nextscale and SD530
Is there a programmatic way of getting power usage telemetry out of NextScale FPCs and SD530 chassis? I know that we can access that information through the web interface by signing in to the FPC and viewing the power information. However, I would like to do this across the entire cluster in order to record the cluster's power usage.

My guess is that ipmitool may be able to do it, but I am unsure what to query or how to get that information using ipmitool.

Thank you.
Damir
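For anyone who wants to try the ipmitool route, here is a minimal sketch, not a confirmed method for this hardware. It assumes the management controller answers the standard DCMI power reading command (`ipmitool dcmi power reading`); whether the NextScale FPC or the SD530 chassis controller does so is an assumption that needs checking on real hardware. The host list file, the IPMI_USER variable and the function names are hypothetical; `-E` makes ipmitool take the password from the IPMI_PASSWORD environment variable.

```shell
# Extract the instantaneous reading (in watts) from the text printed by
# `ipmitool dcmi power reading`, supplied on stdin.
parse_dcmi_watts() {
    awk -F: '/Instantaneous power reading/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Poll every controller listed in a file (one hostname per line) and print
# "<epoch seconds> <host> <watts>" for each. The file name is hypothetical.
# Assumption: the controller implements DCMI; "NA" is printed when no
# reading could be parsed.
poll_power() {
    while read -r host; do
        watts=$(ipmitool -I lanplus -H "$host" -U "$IPMI_USER" -E \
                    dcmi power reading < /dev/null 2>/dev/null | parse_dcmi_watts)
        printf '%s %s %s\n' "$(date +%s)" "$host" "${watts:-NA}"
    done < "$1"
}
```

Running `poll_power bmc-hosts.txt >> power.log` from cron would give a simple time series that can be summed per timestamp for a cluster-wide figure.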
Re: [xcat-user] sd530 deployment guide
So do we need the confluent service installed to deploy SD530?

On Thu, May 3, 2018 at 11:15 AM peter CZ1 Peng <peng...@lenovo.com> wrote:
> Hi Damir,
>
> Can you try this?
>
> The SD530 supports zero-power-on configuration (the node does not need to
> be powered on; only AC power is required, not DC), and the confluent
> service will detect the SMM and XCC by their IPv6 addresses. This is also
> called out-of-band management.
>
> https://github.com/824380210/xcat_book/blob/master/20180109_stark_SD530_WI.md
>
> Let me know if anything is not clear, and I will try to help. Thanks.
>
> Peter CZ Peng
> Department: Complex Solution Rack TE
> Lenovo China
>
> From: Damir Krstic <damir.krs...@gmail.com>
> Sent: Thursday, May 3, 2018 11:59 PM
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Subject: [xcat-user] sd530 deployment guide
>
> Hi all,
>
> We have just purchased a rack of SD530 nodes for our cluster. [...]
[xcat-user] sd530 deployment guide
Hi all,

We have just purchased a rack of SD530 nodes for our cluster. We have been deploying NextScale nodes for some time now and are very familiar with the process. Relatedly, the xCAT group had a wonderful document outlining all stages of NextScale deployment at this page: https://sourceforge.net/p/xcat/wiki/XCAT_NeXtScale_Clusters/

I am wondering if such a page exists for SD530 deployment using xCAT. I searched for it and can't find it. We are trying to deploy a single chassis with 4 SD530 servers and are having a bit of trouble with the SMM, bmcsetup, and things like that. Without a complete guide, it's not easy to figure out how all the pieces fit together. For example, is the SMM equivalent to the FPC on NextScale, and if so, what is the configfpc equivalent command for SD530? Also, each SD530 node has two Ethernet ports. Is one of them a shared IMM/Ethernet port that we run the normal bmcsetup against, or is the SMM now a single IMM port that is shared across the 4 servers in the chassis?

Things like this are not clear, so any help is appreciated.

Thank you.
Damir
Re: [xcat-user] connection refused on bmc port
It turns out the telnet session was disabled - here is how we fixed it:

On Thu, Jul 20, 2017 at 1:57 AM Nicolas Roosen <nicolas.roo...@hpe.com> wrote:
> Hi,
>
> On 07/20/2017 04:31 AM, Damir Krstic wrote:
> > Hi all,
> >
> > We just got a couple of new x3650 servers in, and discovering them went
> > without a problem. Running bmcsetup worked OK too. For some reason, after
> > the BMC setup was done, the interface was still in dedicated mode, and
> > per Jarrod's instructions some time ago, I was able to change it to
> > shared. What Jarrod asked me to do, back in 2015, was to ssh to the node
> > while bmcsetup was running and execute the following command:
> >
> > ipmitool raw 0xc 1 1 0xc0 0
>
> You can try to "reset" the BMC:
>
> ipmitool mc reset warm (or "cold" if a warm reset is not enough).
>
> Or maybe the raw command changed, since these are new servers (and new
> BMC firmware, I guess)?
>
> On a Supermicro I had to run this from the OS to set the BMC to shared:
>
> ipmitool raw 0x30 0x70 0x0c 1 1
>
> And to check the actual value:
>
> ipmitool raw 0x30 0x70 0x0c 0
>
> Nicolas
>
> > This sets the interface to the shared mode. I did that and it looks OK.
> > However, telnet -bmc I get connection refused and I don't recall
> > ever getting this message before.
> >
> > Any help is appreciated.
> >
> > Thanks,
> > Damir
[xcat-user] connection refused on bmc port
Hi all,

We just got a couple of new x3650 servers in, and discovering them went without a problem. Running bmcsetup worked OK too. For some reason, after the BMC setup was done, the interface was still in dedicated mode, and per Jarrod's instructions some time ago, I was able to change it to shared. What Jarrod asked me to do, back in 2015, was to ssh to the node while bmcsetup was running and execute the following command:

    ipmitool raw 0xc 1 1 0xc0 0

This sets the interface to the shared mode. I did that and it looks OK. However, telnet -bmc I get connection refused and I don't recall ever getting this message before.

Any help is appreciated.

Thanks,
Damir
[xcat-user] slow private network after RH7 upgrade
We have recently upgraded all our compute nodes to RHELS 7.3. We have left the management node on Red Hat 6. Since the nodes were upgraded and booted, we are experiencing a slowdown on our private (172.20) network. For example, the psh compute date command finishes successfully and quickly, but when we try to copy the password file across all nodes, it almost always hangs, each time on different nodes. I thought I had this issue isolated to a specific rack, but now it's obvious that it happens on all nodes. As a workaround, I am copying files across the cluster using the -ib0 interface.

I was wondering if RH7 is doing something funky with routing or networking. I have looked at the tcpdump output and can't really see anything strange, except that it takes a long time for the packet to come back to the management node. I am thinking it's either a routing or a DNS (reverse lookup) issue, but I can't be sure.

I am hoping somebody on this listserv had a similar issue and was able to resolve it.

Thanks in advance.
Damir
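One way to test the reverse-lookup suspicion is to time the lookup itself on the management node and on a compute node. This is a minimal sketch, assuming GNU date (for the %N nanosecond format) and a resolver reachable through getent; the function name and the example address are hypothetical.

```shell
# Run a command, discard its output, and print how long it took in
# milliseconds. Assumes GNU date, which supports the %N format.
time_cmd_ms() {
    t_start=$(date +%s%N)
    "$@" > /dev/null 2>&1
    t_end=$(date +%s%N)
    echo $(( (t_end - t_start) / 1000000 ))
}

# Example (hypothetical address on the private network):
#   time_cmd_ms getent hosts 172.20.0.1
# `getent hosts <ip>` performs a reverse lookup through the normal
# name-service path, so a result of several seconds points at name resolution.
```

If the lookup returns in a few milliseconds on both sides, name resolution is probably not the cause and the routing theory deserves the attention instead.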
Re: [xcat-user] problems booting redhat 7.3 on NextScale 360M5
So I removed it from the node's definition in the nodehm table:

    [root@mgt pxelinux.cfg]# nodels qnode5118 nodehm
    qnode5118: nodehm.mgt: ipmi
    qnode5118: nodehm.serialport: 0
    qnode5118: nodehm.node: qnode5118
    qnode5118: nodehm.serialspeed: 115200
    qnode5118: nodehm.serialflow:
    qnode5118: nodehm.cmdmapping:
    qnode5118: nodehm.termport:
    qnode5118: nodehm.comments:
    qnode5118: nodehm.consoleondemand:
    qnode5118: nodehm.cons:
    qnode5118: nodehm.conserver:
    qnode5118: nodehm.getmac:
    qnode5118: nodehm.termserver:
    qnode5118: nodehm.power:
    qnode5118: nodehm.disable:

Set it to boot:

    [root@mgt pxelinux.cfg]# nodeset qnode5118 boot
    qnode5118: boot

Rebooted it and it still hangs:

    [root@mgt pxelinux.cfg]# rpower qnode5118 boot
    qnode5118: reset

> On May 25, 2017, at 1:52 PM, Gilad Berman <gber...@lenovo.com> wrote:
>
> Do you have consoles on demand set to yes in the site table (or specific
> to the node)?
> If yes, remove "hard" from your console settings, nodeset again and try.
>
> If this is a similar case, it is because when you set the flow control to
> hardware, the OS waits for the serial console to be connected (which is
> flow control).
>
> Gilad Berman
> HPC Architect
> Lenovo EMEA
>
> From: Damir Krstic [mailto:damir.krs...@gmail.com]
> Sent: Thursday, May 25, 2017 9:44 PM
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Subject: [xcat-user] problems booting redhat 7.3 on NextScale 360M5
>
> We are installing RH7.3 on NextScale nodes and after the install, node
> reboots. The problem is node seems to get "stuck" until I rcons into it
> and then it continues booting. [...]
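To check a whole noderange for the hardware flow control setting discussed in this message, the `nodels <noderange> nodehm` output shown above can be filtered. This is a minimal sketch; the function name is hypothetical and the parsing assumes the "node: nodehm.attribute: value" layout seen in this thread.

```shell
# Read `nodels <noderange> nodehm` output on stdin and print
# "<node> <serialflow value>" per node, or "(unset)" when the field is empty.
serialflow_report() {
    awk '
        $2 == "nodehm.serialflow:" {
            node = $1
            sub(/:$/, "", node)                  # drop the trailing colon
            value = ($3 == "") ? "(unset)" : $3
            print node, value
        }
    '
}

# Example: nodels compute nodehm | serialflow_report | grep ' hard$'
```

Any node still reported as "hard" would be a candidate for the hang described here; after changing the table, the advice in the thread is to run nodeset again before rebooting.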
[xcat-user] problems booting redhat 7.3 on NextScale 360M5
We are installing RH7.3 on NextScale nodes, and after the install the node reboots. The problem is that the node seems to get "stuck": so far I can't pinpoint the exact spot, but it just sits there until I open a remote console (rcons) to it, and then it continues booting. Here is where the last one got stuck until I opened the remote console:

    [root@mgt ~]# rcons qnode5118
    [Enter `^Ec?' for help]
    Info: SOL payload already de-activated
    [SOL Session operational. Use ~? for help]
    ??6?+ ?+#6cK??6?[?+s?
    [ 131.345026] systemd[1]: Created slice Root Slice.
    [ 131.350307] systemd[1]: Starting Root Slice.
    [ OK ] Listening on Journal Socket.
    [ 131.360103] systemd[1]: Listening on Journal Socket.
    [ 131.365676] systemd[1]: Starting Journal Socket.
    [ OK ] Listening on udev Control Socket.
    [ 131.377102] systemd[1]: Listening on udev Control Socket.
    [ 131.383152] systemd[1]: Starting udev Control Socket.
    [ OK ] Listening on udev Kernel Socket.
    [ 131.395101] systemd[1]: Listening on udev Kernel Socket.
    [ 131.401055] systemd[1]: Starting udev Kernel Socket.
    [ OK ] Reached target Sockets.
    [ 131.411102] systemd[1]: Reached target Sockets.
    [ 131.416180] systemd[1]: Starting Sockets.
    [ OK ] Created slice System Slice.
    [ 131.426105] systemd[1]: Created slice System Slice.
    [ 131.431578] systemd[1]: Starting System Slice.
    [ 131.437357] systemd[1]: Starting Apply Kernel Variables...
    Starting Apply Kernel Variables...
    [ 131.448652] systemd[1]: Starting Journal Service...

Any help is appreciated.

Thanks,
Damir
Re: [xcat-user] NextScale nodes not booting from the disk
These are my settings:

    IMM.PXE_NextBootEnabled=Disabled
    PXE.NicPortMacAddress.1=E4:1D:2D:73:AF:41
    PXE.NicPortMacAddress.2=E4:1D:2D:73:AF:42
    PXE.NicPortMacAddress.3=40:F2:E9:C5:48:14
    PXE.NicPortMacAddress.4=40:F2:E9:C5:48:15
    PXE.NicPortPxeMode.1=UEFI and Legacy Support
    PXE.NicPortPxeMode.2=UEFI and Legacy Support
    PXE.NicPortPxeMode.3=UEFI and Legacy Support
    PXE.NicPortPxeMode.4=UEFI and Legacy Support
    PXE.NicPortPxeProtocol.1=IPV4
    PXE.NicPortPxeProtocol.2=IPV4
    PXE.NicPortPxeProtocol.3=IPV4
    PXE.NicPortPxeProtocol.4=IPV4
    LegacySupport.Non-PlanarPXE=Enabled
    BootOrder.BootOrder=Legacy Only=PXE Network=Hard Disk 0=Hard Disk 1
    BootOrder.WolBootOrder=PXE Network=CD/DVD Rom=Hard Disk 0
    BroadcomGigabitEthernetBCM5717-40F2E9C54814.LegacyBootProtocol=PXE
    BroadcomGigabitEthernetBCM5717-40F2E9C54815.LegacyBootProtocol=PXE

I'll change them so they look like yours and try again.

Damir

On Fri, May 12, 2017 at 1:50 PM Gilad Berman <gber...@lenovo.com> wrote:
> I will give it another try (with the ASU settings now). Maybe this is not
> the issue at all, but it's worth a shot.
>
> How do the following settings look on your machine? Here are mine from
> an M5 system:
>
> opafm1: PXE.NicPortPxeMode.1=Enabled
> opafm1: PXE.NicPortPxeMode.2=Enabled
> opafm1: PXE.NicPortPxeMode.3=Enabled
> opafm1: PXE.NicPortPxeMode.4=Enabled
> opafm1: PXE.NicPortLegacyPxeMode.1=Disabled
> opafm1: PXE.NicPortLegacyPxeMode.2=Enabled
> opafm1: PXE.NicPortLegacyPxeMode.3=Enabled
> opafm1: PXE.NicPortLegacyPxeMode.4=Enabled
>
> I disabled the Legacy PXE on the install NIC.
>
> Gilad Berman
> HPC Architect
> Lenovo EMEA
>
> From: Damir Krstic [mailto:damir.krs...@gmail.com]
> Sent: Friday, May 12, 2017 9:33 PM
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Subject: Re: [xcat-user] NextScale nodes not booting from the disk
>
> Well the thing is, the node installs properly, and it reboots. On the
> reboot, it's supposed to run the postbootscripts and it never boots to the
> OS. So we tried changing the boot order to HD and that does not seem to
> work.
>
> Damir
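When comparing settings between a working and a failing node, as is being done by eye in this message, a small filter can list only the keys that differ. This is a minimal sketch, assuming each file holds `asu show` output with one key=value pair per line, optionally prefixed with "node: " as in Gilad's listing; the function name and file names are hypothetical.

```shell
# Compare two settings dumps and print every key whose value differs,
# plus keys that exist only in the second file.
# $1: dump from the failing node, $2: dump from the working node.
asu_diff() {
    awk -F= '
        { sub(/^[^ =]+: /, "") }      # drop an optional "node: " prefix
        NR == FNR {
            # Values may contain "=" themselves (e.g. BootOrder), so take
            # everything after the first "=" rather than $2.
            a[$1] = substr($0, length($1) + 2)
            next
        }
        {
            v = substr($0, length($1) + 2)
            if (!($1 in a))      printf "%s: (absent) -> %s\n", $1, v
            else if (a[$1] != v) printf "%s: %s -> %s\n", $1, a[$1], v
        }
    ' "$1" "$2"
}

# Example: asu_diff failing-node.txt working-node.txt
```

Each output line reads "key: value on the failing node -> value on the working node", which gives a short list of candidates to change one at a time.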
Re: [xcat-user] NextScale nodes not booting from the disk
Well the thing is, the node installs properly, and it reboots. On the reboot, it's supposed to run the postbootscripts and it never boots to the OS. So we tried changing the boot order to HD and that does not seem to work.

Damir

On Fri, May 12, 2017 at 1:05 PM Nathan Harper <nathan.har...@cfms.org.uk> wrote:
> I have a similar issue with some non-IBM/Lenovo equipment. Old dx360s
> work; other new equipment boots and installs, but when instructed by PXE
> to boot locally, nothing more happens. I have had to work around it by
> setting the boot order to HDD post install.
>
> Regards,
> Nathan
>
> On 12 May 2017, at 17:06, Gilad Berman <gber...@lenovo.com> wrote:
> > There is another setting that sets the boot mode to Legacy: it is the
> > network PXE boot (found under network in the BIOS). Make sure it is set
> > to UEFI.
> >
> > If you can't find it, let me know and I will provide the exact ASU
> > setting (I am not logged in to my lab currently).
> >
> > It sounds very much like an issue I had, so hopefully it should solve
> > the issue.
> >
> > Sorry if I missed anything in the thread and my suggestion is stupid.
> >
> > Gilad Berman
> > HPC Architect
> > Lenovo EMEA
> >
> > From: Damir Krstic [mailto:damir.krs...@gmail.com]
> > Sent: Friday, May 12, 2017 5:49 PM
> > To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> > Subject: Re: [xcat-user] NextScale nodes not booting from the disk
> >
> > I've removed the Legacy Only and the node hangs on the boot - just a
> > cursor in the corner of the screen and nothing coming up. It never seems
> > to time out.
> >
> > Super frustrating, especially since dx360 nodes are booting just fine.
> >
> > Damir
Re: [xcat-user] NextScale nodes not booting from the disk
Thanks for the reply - I don't think our issue is with PXE. The node boots fine from PXE and it gets installed. It's just after the installation, when it's supposed to boot from the HD, that it does not.

Damir

On Fri, May 12, 2017 at 9:03 AM David D. Johnson <david_john...@brown.edu> wrote:
> Sorry, misspoke on one point: we did take out Legacy Mode from the
> BootModes.
>
> On May 12, 2017, at 9:51 AM, David D. Johnson <david_john...@brown.edu> wrote:
> > When we got our first 5465 Lenovo nodes, we changed the noderes netboot
> > attribute from pxe to xnba and also changed BootModes.SystemBootMode to
> > "UEFI Mode". Even though Legacy Only is first in the boot order, they
> > are still able to PXE boot just fine. That is the only difference I see;
> > maybe worth a quick try to see if it makes any difference in behavior.
> > We don't have disks in any of our NextScale nodes, so I can't try it out
> > in our environment.
> >
> > -- ddj
Re: [xcat-user] NextScale nodes not booting from the disk
Sure, here it is:

IMM.ForceBootToUefi=Disabled
IMM.PXE_NextBootEnabled=Disabled
IMM.SystemNextBootMode=Legacy
IMM.DHCPBootPCClientPortControl=Open
LegacySupport.ForceLegacyVideoonBoot=Enabled
LegacySupport.InfiniteBootRetry=Disabled
LegacySupport.BBSBoot=Enabled
BackupBankManagement.NumberOfSuccessfulConsecutiveBoots=1
DevicesandIOPorts.Com1ActiveAfterBoot=Enable
DevicesandIOPorts.Com2ActiveAfterBoot=Disable
BootModes.SystemBootMode=Legacy Mode
BootModes.OptimizedBoot=Enabled
BootModes.QuietBoot=Enabled
BootOrder.BootOrder=Legacy Only=PXE Network=Hard Disk 0=Hard Disk 1
BootOrder.WolBootOrder=PXE Network=CD/DVD Rom=Hard Disk 0
SecureBootConfiguration.SecureBootis=Disabled

On Fri, May 12, 2017 at 6:45 AM <david_john...@brown.edu> wrote:
> Could you share the IMM and Boot sections from asu show on one of your troublesome nodes?
>
> -- ddj
> Dave Johnson
>
> > On May 12, 2017, at 6:45 AM, Damir Krstic <damir.krs...@gmail.com> wrote:
> >
> > We are switching our image from stateless to stateful in a month (going from RH6 to RH7). Old IBM dx360 nodes are booting fine with this new image we have created. Our NextScale nodes are not... The OS gets installed and the node reboots but then it never boots from the hard drive. It times out at the PXE prompt and just hangs.
> >
> > I am wondering if some BIOS setting is doing this - maybe something with legacy mode, or boot order or something else. I've tried manually hitting F12 and selecting the hard drive but that did not seem to work.
> >
> > Does anyone have NextScales booting from the hard drive, and if so, would you mind sharing (dumping) your settings via asu show?
> >
> > Thanks,
> > Damir
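For anyone comparing a dump like the one above against a node that does boot from disk, a rough sketch of a diff helper follows. It assumes both dumps are plain `Name=Value` lines as in this message; the function names and the "working node" value in the demo are made up for illustration and are not taken from a real node. Values can themselves contain `=` (see BootOrder.BootOrder), so the split is on the first `=` only.

```python
def parse_asu_settings(text):
    """Parse 'Section.Name=Value' lines into a dict, splitting on the first '='."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not a setting
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_settings(good, bad):
    """Return {name: (good_value, bad_value)} for every setting that differs."""
    diffs = {}
    for key in sorted(set(good) | set(bad)):
        if good.get(key) != bad.get(key):
            diffs[key] = (good.get(key), bad.get(key))
    return diffs


if __name__ == "__main__":
    # Hypothetical dump from a node that boots from disk (illustration only).
    working = "BootModes.SystemBootMode=Legacy Mode\nBootOrder.BootOrder=Legacy Only=Hard Disk 0\n"
    # Two lines from the troublesome node's dump in this thread.
    failing = ("BootModes.SystemBootMode=Legacy Mode\n"
               "BootOrder.BootOrder=Legacy Only=PXE Network=Hard Disk 0=Hard Disk 1\n")
    for name, (g, b) in diff_settings(parse_asu_settings(working),
                                      parse_asu_settings(failing)).items():
        print(f"{name}: working={g!r} failing={b!r}")
```

Only the settings that differ are printed, which keeps the comparison readable when the full dump runs to hundreds of lines.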
[xcat-user] NextScale nodes not booting from the disk
We are switching our image from stateless to stateful in a month (going from RH6 to RH7). Old IBM dx360 nodes are booting fine with this new image we have created. Our NextScale nodes are not... The OS gets installed and the node reboots but then it never boots from the hard drive. It times out at the PXE prompt and just hangs.

I am wondering if some BIOS setting is doing this - maybe something with legacy mode, or boot order or something else. I've tried manually hitting F12 and selecting the hard drive but that did not seem to work.

Does anyone have NextScales booting from the hard drive, and if so, would you mind sharing (dumping) your settings via asu show?

Thanks,
Damir
Re: [xcat-user] dracut error with stateful image
Just checking to see if anyone knows about these dracut error messages. We are getting close to booting 700+ nodes with this new image and I am afraid we will have a lot of failures because of the dracut error.

Thanks,
Damir

On Mon, May 1, 2017 at 2:23 PM Damir Krstic <damir.krs...@gmail.com> wrote:
> We are in the process of building a RH 7 stateful image on our cluster. After setting a node to install with this image, sometimes we will see the following error:
>
> dracut-initqueue[736]: Warning: No carrier detected on interface eno1
>
> Usually, resetting the node fixes this issue (node installs properly). We are planning to take downtime in June and install all 700+ nodes with this image. I would like to clear up this delay / error if at all possible before the downtime.
>
> Has anyone seen this issue before?
>
> Thanks,
> Damir
[xcat-user] dracut error with stateful image
We are in the process of building a RH 7 stateful image on our cluster. After setting a node to install with this image, sometimes we will see the following error:

dracut-initqueue[736]: Warning: No carrier detected on interface eno1

Usually, resetting the node fixes this issue (node installs properly). We are planning to take downtime in June and install all 700+ nodes with this image. I would like to clear up this delay / error if at all possible before the downtime.

Has anyone seen this issue before?

Thanks,
Damir
[xcat-user] dracut errors when deploying a stateful image
We are in process of building a stateful RH7 image. When booting nodes sometimes they will get stuck at the following screen:
Re: [xcat-user] redhat 7.3 and xCAT-2.9.1
Hi Christian - Thanks - I will look at renaming the interface. As for the name, DNS is working properly on the client. We are not stopping/disabling NetworkManager during the install. Should we? I should also mention that RedHat 7.1 booted and the name was set properly.

Thanks,
Damir

On Thu, Mar 2, 2017 at 11:16 AM Christian Caruthers <ccaruth...@lenovo.com> wrote:
> Damir,
>
> The net device naming you're seeing is consistent net device naming. There's a write up for disabling it on the RHEL documentation site.
>
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Networking_Guide/sec-Disabling_Consistent_Network_Device_Naming.html
>
> I believe you can place the "net.ifnames=0" option in bootparams.addkcmdline
>
> If I'm not mistaken, the hostname should be set by the DHCP server and then hard coded using the hardeths postscript, if it's configured to run. Is DNS working properly on the client? Are you stopping/disabling NetworkManager during install?
>
> Regards,
> *Christian Caruthers*
> Lenovo xESS IT Consultant
> Mobile: 757-289-9872
>
> *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
> *Sent:* Thursday, March 2, 2017 11:33 AM
> *To:* xCAT Users Mailing list
> *Subject:* [xcat-user] redhat 7.3 and xCAT-2.9.1
>
> Hi All,
>
> Our management server is running RedHat 6.6 and xcat version 2.9.1.
>
> We are hoping to upgrade all of our compute clients to RedHat 7 by July of this year.
>
> We are going from stateless to stateful (installed on local hard drive) images.
>
> To that end, I've done copycds of RH7.3 ISO and have followed xCAT document on creating a new install image.
>
> We got one of the compute nodes installed and booted with 7.3 but there are a couple of problems:
>
> 1. Interface is named eno1 <-- we would like to change this permanently on boot to eth0
> 2. hostname is not set <-- node boots with localhost for hostname
>
> Any help is appreciated.
>
> Damir
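Once a node has been reinstalled with the option Christian mentions, it may be worth confirming that `net.ifnames=0` actually reached the kernel before chasing the interface name elsewhere. A minimal sketch, assuming it is run on the Linux node itself; the function name is invented for this example, and the check looks only for the exact option quoted above.

```python
def ifnames_disabled(cmdline):
    """True if the kernel command line carries the exact option net.ifnames=0."""
    return "net.ifnames=0" in cmdline.split()


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    try:
        with open("/proc/cmdline") as f:
            booted_with = f.read()
    except OSError:
        booted_with = ""  # not on Linux, or /proc is unavailable
    print("net.ifnames=0 present:", ifnames_disabled(booted_with))
```

If this reports False after a reinstall, the option probably never made it from the xCAT tables into the boot configuration, which is a different problem from the kernel ignoring it.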
[xcat-user] redhat 7.3 and xCAT-2.9.1
Hi All,

Our management server is running RedHat 6.6 and xcat version 2.9.1.

We are hoping to upgrade all of our compute clients to RedHat 7 by July of this year.

We are going from stateless to stateful (installed on local hard drive) images.

To that end, I've done copycds of RH7.3 ISO and have followed xCAT document on creating a new install image.

We got one of the compute nodes installed and booted with 7.3 but there are a couple of problems:

1. Interface is named eno1 <-- we would like to change this permanently on boot to eth0
2. hostname is not set <-- node boots with localhost for hostname

Any help is appreciated.

Damir
Re: [xcat-user] rhel7.1 working kickstart file
Thanks, I've copied a file from /opt/xcat/share - let's see if it works.

Damir

On Mon, Jan 23, 2017 at 10:52 AM Russell Auld <russa...@comcast.net> wrote:
> You definitely need to use a RH7 compatible script.
> Look in /opt/xcat/share for sample files
>
> On Jan 23, 2017 11:25 AM, Damir Krstic <damir.krs...@gmail.com> wrote:
>
> I am trying to install RH7.1 (stateful) and the install keeps failing at various points of my kickstart file. I think I may be using RH6 kickstart file template in /install/custom/install/rh
>
> Does anyone have a working generic RH7.1 kickstart template that I can use?
>
> Thank you.
> Damir
[xcat-user] rhel7.1 working kickstart file
I am trying to install RH7.1 (stateful) and the install keeps failing at various points of my kickstart file. I think I may be using RH6 kickstart file template in /install/custom/install/rh

Does anyone have a working generic RH7.1 kickstart template that I can use?

Thank you.
Damir
Re: [xcat-user] statefull vs. stateless images
Hi Jarrod,

Thanks for the prompt answer. I agree with you re. stateless. With the next hardware purchase we will be going stateful. To that end, we are running the following version of xCAT:

[root@mgt rh]# rpm -qa |grep -i xcat
conserver-xcat-8.1.16-10.x86_64
xCAT-2.9.1-snap201503190326.x86_64
xCAT-genesis-base-x86_64-2.9-snap201504212134.noarch
elilo-xcat-3.14-4.noarch
xCAT-server-2.9.1-snap201503190325.noarch
grub2-xcat-1.0-2.noarch
perl-xCAT-2.9.1-snap201503190325.noarch
xCAT-buildkit-2.9.1-snap201503190326.noarch
ipmitool-xcat-1.8.11-3.x86_64
xCAT-client-2.9.1-snap201503190325.noarch
xCAT-genesis-scripts-x86_64-2.9.1-snap201503190326.noarch
syslinux-xcat-3.86-2.noarch

I think that in order to deploy a stateful version of RH7.3 we will need to update our xCAT. What is the most painless way of upgrading from our version to the latest stable version that supports RH 7? Are there any gotchas or recommended practices when it comes to upgrading xCAT? Last time I had to do this, instead of upgrading, I deployed a new xCAT server, which was not too painful, but I don't have notes on what I had to do to get it going. I would much rather just upgrade xCAT on this server because the machine itself is not that old (2 years or so now).

Is there anything I should back up before attempting the upgrade as well?

Thanks,
Damir

On Fri, Jan 13, 2017 at 9:10 AM Jarrod Johnson <jjohns...@lenovo.com> wrote:
> I think stateless makes a little less sense over time.
>
> 1) Local boot storage is cheaper and more durable than it used to be, and this is only going to get more extreme
>
> 2) Dynamism is probably better and more easily served by something like Singularity, which makes things easier for users to do their thing without the administrators having to accommodate.
>
> 3) Mitigating drift can be done in other ways. Stateless has traditionally had the side effect of mitigating accumulating ‘drift’ as people do things ad-hoc to OS images, by punishing those practices. Strictly speaking the same discipline can be self-imposed without downside, it just takes some willpower.
>
> *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
> *Sent:* Friday, January 13, 2017 9:20 AM
> *To:* xCAT Users Mailing list
> *Subject:* [xcat-user] statefull vs. stateless images
>
> We have been running our cluster using stateless images for over 6 years now. For the most part, things are running great. There are two reasons for our decision to run stateless:
>
> 1. our compute nodes originally did not have local hard drives
> 2. we envisioned a dynamic environment in which we would boot nodes frequently with different images to satisfy different research needs
>
> Today both of those points are invalid / do not apply. All of our compute nodes come with hard drives, and we have never really booted the cluster with any images other than our "production" image. In addition, downtimes are really hard to come by in our environment, and we treat our cluster as a production system.
>
> So, my question is, does it make sense to continue with stateless images, or would we be better served with stateful (installed on local disk) images?
>
> I question our current method because:
>
> 1. stateless images are not trivial to build and update using genimage, putting in mellanox drivers, gpfs etc. We don't do it often enough, so every time we have to do it, we are re-inventing the wheel.
> 2. stateless images take up a portion of compute node memory
>
> Are there any downsides to running a 700+ node cluster using stateful images? Like I said, we don't boot the cluster at all for many months at a time (we get a single downtime during the year), and most of the packages outside of the normal RH installation are installed using postscripts.
>
> Let me know your thoughts.
>
> Thanks,
> Damir
Re: [xcat-user] configuring fpc
Thanks for the info - I was able to reset it using the battery removal procedure. Next time I will look for the paperclip hole - that will save me some time. I got the FPC discovered and programmed.

Thanks all.
Damir

On Mon, Jun 13, 2016 at 10:15 AM Christian Caruthers <ccaruth...@lenovo.com> wrote:
> You can reset the FPC to factory default. It's an ornery process. If I recall correctly, you remove the FPC for 10 minutes. Remove the battery before reinserting the FPC. Let the FPC run w/o a battery for 10 minutes. Remove the FPC, replace the battery and reinsert the FPC. It's important to respect the 10-minute guideline. I've seen where a customer was short by a minute or so, and the FPC did not reset to factory default.
>
> Once that is done, configfpc should be able to find the default IP.
>
> Regards,
> *Christian Caruthers*
> Lenovo xESS IT Consultant
> Mobile: 757-289-9872
>
> *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
> *Sent:* Monday, June 13, 2016 10:57 AM
> *To:* xCAT Users Mailing list
> *Subject:* [xcat-user] configuring fpc
>
> We got an empty N1200 from Lenovo some time back in anticipation of new nodes arriving this summer. In previous times, Lenovo would program the FPCs with some internal address as a part of their cluster solution (172.30.101.141 for example).
>
> This empty chassis does not have its IP recorded in the paperwork we received from Lenovo on delivery of the rack.
>
> I am trying to configure this FPC using the configfpc command, and if I try using the defaults (configfpc -i bond0) I get "no default IP found".
>
> I see the FPC on the switch port and I see its MAC:
>
> qfivebnt08#show mac-address-table interface port 42
> MAC address VLAN PortTrnk State Permanent Openflow
> - --- - -
> 6c:ae:8b:5e:56:14 142 FWD N
>
> I also have the FPC configured for the right switch port in xCAT:
>
> [root@mgt ~]# lsdef qfpc24
> Object name: qfpc24
>     bmc=qfpc24
>     bmcpassword=PASSW0RD
>     bmcusername=USERID
>     cons=ipmi
>     groups=rack-t22fpc,qfpc,all
>     ip=172.30.11.24
>     mgt=ipmi
>     nodetype=qfpc
>     postbootscripts=otherpkgs,setupntp
>     postscripts=syslog,remoteshell,syncfiles
>     switch=qfivebnt08
>     switchport=42
>
> I suspect this FPC is configured with some other IP (other than default) but I don't know what that IP is since it's not documented. Any way of programming the FPC if I don't have the IP?
>
> Thanks,
> Damir
Re: [xcat-user] configuring fpc
OK, I'll try that.

Thanks,
Damir

On Mon, Jun 13, 2016 at 10:15 AM Christian Caruthers <ccaruth...@lenovo.com> wrote:
> You can reset the FPC to factory default. It's an ornery process. If I recall correctly, you remove the FPC for 10 minutes. Remove the battery before reinserting the FPC. Let the FPC run w/o a battery for 10 minutes. Remove the FPC, replace the battery and reinsert the FPC. It's important to respect the 10-minute guideline. I've seen where a customer was short by a minute or so, and the FPC did not reset to factory default.
>
> Once that is done, configfpc should be able to find the default IP.
>
> Regards,
> *Christian Caruthers*
> Lenovo xESS IT Consultant
> Mobile: 757-289-9872
>
> *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
> *Sent:* Monday, June 13, 2016 10:57 AM
> *To:* xCAT Users Mailing list
> *Subject:* [xcat-user] configuring fpc
>
> We got an empty N1200 from Lenovo some time back in anticipation of new nodes arriving this summer. In previous times, Lenovo would program the FPCs with some internal address as a part of their cluster solution (172.30.101.141 for example).
>
> This empty chassis does not have its IP recorded in the paperwork we received from Lenovo on delivery of the rack.
>
> I am trying to configure this FPC using the configfpc command, and if I try using the defaults (configfpc -i bond0) I get "no default IP found".
>
> I see the FPC on the switch port and I see its MAC:
>
> qfivebnt08#show mac-address-table interface port 42
> MAC address VLAN PortTrnk State Permanent Openflow
> - --- - -
> 6c:ae:8b:5e:56:14 142 FWD N
>
> I also have the FPC configured for the right switch port in xCAT:
>
> [root@mgt ~]# lsdef qfpc24
> Object name: qfpc24
>     bmc=qfpc24
>     bmcpassword=PASSW0RD
>     bmcusername=USERID
>     cons=ipmi
>     groups=rack-t22fpc,qfpc,all
>     ip=172.30.11.24
>     mgt=ipmi
>     nodetype=qfpc
>     postbootscripts=otherpkgs,setupntp
>     postscripts=syslog,remoteshell,syncfiles
>     switch=qfivebnt08
>     switchport=42
>
> I suspect this FPC is configured with some other IP (other than default) but I don't know what that IP is since it's not documented. Any way of programming the FPC if I don't have the IP?
>
> Thanks,
> Damir
[xcat-user] configuring fpc
We got an empty N1200 from Lenovo some time back in anticipation of new nodes arriving this summer. In previous times, Lenovo would program the FPCs with some internal address as a part of their cluster solution (172.30.101.141 for example).

This empty chassis does not have its IP recorded in the paperwork we received from Lenovo on delivery of the rack.

I am trying to configure this FPC using the configfpc command, and if I try using the defaults (configfpc -i bond0) I get "no default IP found".

I see the FPC on the switch port and I see its MAC:

qfivebnt08#show mac-address-table interface port 42
MAC address VLAN PortTrnk State Permanent Openflow
- --- - -
6c:ae:8b:5e:56:14 142 FWD N

I also have the FPC configured for the right switch port in xCAT:

[root@mgt ~]# lsdef qfpc24
Object name: qfpc24
    bmc=qfpc24
    bmcpassword=PASSW0RD
    bmcusername=USERID
    cons=ipmi
    groups=rack-t22fpc,qfpc,all
    ip=172.30.11.24
    mgt=ipmi
    nodetype=qfpc
    postbootscripts=otherpkgs,setupntp
    postscripts=syslog,remoteshell,syncfiles
    switch=qfivebnt08
    switchport=42

I suspect this FPC is configured with some other IP (other than default) but I don't know what that IP is since it's not documented. Any way of programming the FPC if I don't have the IP?

Thanks,
Damir
[xcat-user] couldn't find the kernel file matched 2.6.32-504.16.2.el6.x86_64 in /install/netboot/rhels6.6/x86_64/compute6.6gpfs4.2/rootimg/boot at ./genimage line 72..
I am getting the following error when generating a new image:

couldn't find the kernel file matched 2.6.32-504.16.2.el6.x86_64 in /install/netboot/rhels6.6/x86_64/compute6.6gpfs4.2/rootimg/boot at ./genimage line 72..

Here is the lsdef for the image:

[root@mgt boot]# lsdef -t osimage compute6.6gpfs4.2
Object name: compute6.6gpfs4.2
    exlist=/install/custom/netboot/rh/compute6.6gpfs4.2.exlist
    imagetype=linux
    kerneldir=/install/kernels/
    kernelver=2.6.32-504.16.2.el6.x86_64
    osarch=x86_64
    osdistroname=rhels6.6-x86_64
    osname=Linux
    osvers=rhels6.6
    otherpkgdir=/install/post/otherpkgs/rhels6.6/x86_64
    otherpkglist=/install/custom/netboot/rh/compute6.6gpfs4.2.otherpkgs.pkglist
    permission=755
    pkgdir=/install/rhels6.6/x86_64
    pkglist=/install/custom/netboot/rh/compute6.6gpfs4.2.pkglist
    postinstall=/install/custom/netboot/rh/compute6.6gpfs4.2.postinstall
    profile=compute6.6gpfs4.2
    provmethod=netboot
    rootimgdir=/install/netboot/rhels6.6/x86_64/compute6.6gpfs4.2
    synclists=/install/custom/netboot/rh/synclist6.6gpfs4.2

Here is the listing of the directory /install/kernels:

[root@mgt boot]# ls -l /install/kernels/
total 4
drwxr-xr-x 3 root root 4096 Mar 21 09:03 2.6.32-504.16.2.el6.x86_64
[root@mgt boot]# ls -l /install/kernels/2.6.32-504.16.2.el6.x86_64/
total 88160
-rwx------ 1 root root 371 Jan 13 10:02 install.sh
-rw-r--r-- 1 root root 30526712 Apr 21 2015 kernel-2.6.32-504.16.2.el6.x86_64.rpm
-rw-r--r-- 1 root root 31244080 Apr 21 2015 kernel-debug-2.6.32-504.16.2.el6.x86_64.rpm
-rw-r--r-- 1 root root 9831692 Apr 21 2015 kernel-devel-2.6.32-504.16.2.el6.x86_64.rpm
-rw-r--r-- 1 root root 15140648 Apr 21 2015 kernel-firmware-2.6.32-504.16.2.el6.noarch.rpm
-rw-r--r-- 1 root root 3517060 Apr 21 2015 kernel-headers-2.6.32-504.16.2.el6.x86_64.rpm
drwxr-xr-x 2 root root 4096 Mar 21 09:03 repodata

I ran the createrepo command after copying the kernel files into this directory. I am still getting the error when generating the image.

Any help is much appreciated.

Thanks,
Damir
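The error text says genimage looked in rootimg/boot for a kernel file matching kernelver, so one thing that might narrow this down is listing what is actually in that directory. The sketch below assumes the match is on the version string appearing in the file name (for example a vmlinuz-<kernelver> file); that is a guess at what genimage checks, not something confirmed from its source, and the function name is invented for this example.

```python
import os


def find_kernel_files(boot_dir, kernelver):
    """Return sorted names in boot_dir that contain the kernel version string."""
    if not os.path.isdir(boot_dir):
        return []  # image root not built yet, or wrong path
    return sorted(name for name in os.listdir(boot_dir) if kernelver in name)


if __name__ == "__main__":
    # Paths and version taken from the error message and lsdef output above.
    boot = "/install/netboot/rhels6.6/x86_64/compute6.6gpfs4.2/rootimg/boot"
    matches = find_kernel_files(boot, "2.6.32-504.16.2.el6.x86_64")
    print(matches if matches else "no kernel file for that version in " + boot)
```

An empty result would suggest the kernel RPM from kerneldir was never installed into the image root, as opposed to a naming mismatch between kernelver and the installed files.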
Re: [xcat-user] issue when installing node
It turns out it was a bad Ethernet cable. When watching the console I saw the eth interface timing out, so I replaced the cable and it worked.

Thanks,
Damir

On Wed, Jan 13, 2016 at 12:32 AM Xiao Peng Wang <w...@cn.ibm.com> wrote:
> From the symptom, it looks like your eth0 booted using DHCP and failed to get an IP from dhcpd. You may find DHCP failure messages in the log when this issue happens.
>
> Thanks
> Best Regards
> --
> Wang Xiaopeng (王晓朋)
> IBM China System Technology Laboratory
> Tel: 86-10-82453455
> Email: w...@cn.ibm.com
> Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193
>
> - Original message -
> From: Damir Krstic <damir.krs...@gmail.com>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Cc:
> Subject: Re: [xcat-user] issue when installing node
> Date: Wed, Jan 13, 2016 9:09 AM
>
> Yes, all deployed from a single management node using commands like: nodeadd gpu[1-6], same thing for makehosts and makedns. That's why it's so weird and confusing that only one node is having this issue.
>
> Before I left work today I redeployed this node using our compute stateless image and that worked ok. The node got the right name and eth started. Not sure what else to check. I am suspecting a dhcp issue at this time but cannot be sure.
>
> On Tue, Jan 12, 2016 at 16:17 Casandra H Qiu <cxh...@us.ibm.com> wrote:
>
> Are all 6 nodes defined on the same MN? Check whether the failed node is defined in /etc/hosts. Make sure you ran makedns and makedhcp, and compare the node definitions using the lsdef command.
>
> Thanks,
> Casandra
> ...
> Casandra Hong Qiu
> Phone: (845) 433-9291, t/l 293-9291
> Office: B/002, Floor 3, Z13
> cxh...@us.ibm.com
>
> From: Damir Krstic <damir.krs...@gmail.com>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Date: 01/12/2016 04:37 PM
> Subject: [xcat-user] issue when installing node
> --
>
> Hi,
>
> When installing a new batch of GPU nodes via xCAT I've run into an issue I have not seen before. Out of 6 NextScale nx360m4 nodes with k80 GPUs all installed OK (nodeset osimage=gpu6.6) except one.
>
> Symptoms of the issue: node installs OK (watching it via rcons/wcons) but on the reboot eth0 does not come up and hostname is set to localhost.localdomain
>
> Has anyone seen this issue before?
>
> Thanks,
>
> Damir
[xcat-user] issue when installing node
Hi,

When installing a new batch of GPU nodes via xCAT I've run into an issue I have not seen before. Out of 6 NextScale nx360m4 nodes with k80 GPUs all installed OK (nodeset osimage=gpu6.6) except one.

Symptoms of the issue: node installs OK (watching it via rcons/wcons) but on the reboot eth0 does not come up and hostname is set to localhost.localdomain

Has anyone seen this issue before?

Thanks,
Damir
Re: [xcat-user] nextscale nx360m4 GPU not booting from hard drive after installation
They are using PXE. I have traced the issue to the kernel upgrade. The initial install was Red Hat 6.6 with kernel 504; after updating the kernel to 504.16 the node would not boot. I have to hook up a console to the node to see what it is doing, because the remote console does not show much information. I've reinstalled it with the 504 kernel and it is booted now.

On Tue, Jan 12, 2016 at 9:13 AM Rich Sudlow <r...@nd.edu> wrote:
> On 01/11/2016 07:11 PM, Damir Krstic wrote:
> > Hi,
> >
> > Installing 6 new GPU NextScale nodes...installation went fine but on the reboot
> > nodes get stuck after going through POST. Nothing on the console as far as I can
> > tell. Tried changing boot order to hard disk first and also changing from UEFI
> > to Legacy and that did not fix it.
> >
> > Also tried rsetboot hd and that did not seem to fix it either. Any
> > suggestion?
>
> When these nodes build are these using pxe or xnba - I'd recommend using
> xnba.
>
> Rich
>
> > Thanks,
> > Damir
>
> --
> Rich Sudlow
> University of Notre Dame
> Center for Research Computing - Union Station
> 506 W. South St
> South Bend, In 46601
>
> (574) 631-7258 (office)
> (574) 807-1046 (cell)
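Since the node booted on the 504 kernel and failed after the 504.16 update, one thing worth checking before a reinstall is which kernel the boot loader will pick. A minimal sketch, assuming RHEL 6 with GRUB legacy; the function name is hypothetical, and the grub.conf path is passed in because it differs between BIOS and UEFI installs:

```shell
# Print the title of the GRUB-legacy entry selected by "default=N".
# $1: path to grub.conf (assumption: /boot/grub/grub.conf on BIOS installs,
#     /boot/efi/EFI/redhat/grub.conf on UEFI installs of RHEL 6).
default_kernel_title() {
    awk '
        BEGIN { n = 0; want = 0 }
        # Remember the index of the default entry.
        /^default=/ { sub(/^default=/, ""); want = $0 + 0 }
        # Count "title" lines and print the one matching the default index.
        /^title /   {
            if (n == want) { sub(/^title /, ""); print; exit }
            n++
        }
    ' "$1"
}
```

If the default entry turns out to be the updated kernel, pointing `default=` at the index of the older entry may let the node boot the known-good kernel without reinstalling.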
Re: [xcat-user] issue when installing node
Yes, all were deployed from a single management node using commands like:

nodeadd gpu[1-6]

and the same for makehosts and makedns. That's why it's so weird and confusing that only one node is having this issue. Before I left work today I redeployed this node using our compute stateless image and that worked OK: the node got the right name and eth0 started. Not sure what else to check. I am suspecting a DHCP issue at this time but cannot be sure.

On Tue, Jan 12, 2016 at 16:17 Casandra H Qiu <cxh...@us.ibm.com> wrote:
> Are all 6 nodes defined on the same MN? Check whether the failed node is
> defined in /etc/hosts. Make sure you ran makedns and makedhcp, and
> compare the node definitions using the lsdef command.
>
> Thanks,
> Casandra
>
> Casandra Hong Qiu
> Phone: (845) 433-9291, t/l 293-9291
> Office: B/002, Floor 3, Z13
> cxh...@us.ibm.com
>
> From: Damir Krstic <damir.krs...@gmail.com>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Date: 01/12/2016 04:37 PM
> Subject: [xcat-user] issue when installing node
>
> Hi,
>
> When installing a new batch of GPU nodes via xCAT I've run into an issue I
> have not seen before. Out of 6 NextScale nx360m4 nodes with K80 GPUs, all
> installed OK (nodeset osimage=gpu6.6) except one.
>
> Symptoms of the issue: the node installs OK (watching it via rcons/wcons) but
> on the reboot eth0 does not come up and the hostname is set to
> localhost.localdomain.
>
> Has anyone seen this issue before?
>
> Thanks,
>
> Damir
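Casandra's suggestion to compare node definitions can be done mechanically. The sketch below assumes the output of lsdef for a working node and for the failing node has already been saved to two files (for example `lsdef gpu1 > good.txt`); the function name `diff_lsdef` and the file names are hypothetical, not part of xCAT.

```shell
# Show attribute lines that differ between two saved `lsdef` dumps.
# $1: dump for a node that works, $2: dump for the node that fails.
diff_lsdef() {
    a=$(mktemp)
    b=$(mktemp)
    # Keep only "key=value" lines, drop indentation, sort for a stable diff.
    grep '=' "$1" | sed 's/^[[:space:]]*//' | sort > "$a"
    grep '=' "$2" | sed 's/^[[:space:]]*//' | sort > "$b"
    # "<" lines come from the working node, ">" lines from the failing one.
    diff "$a" "$b" | grep '^[<>]' || true
    rm -f "$a" "$b"
}
```

Per-node attributes such as the MAC or IP address will always show up; the interesting lines are any other attributes that differ unexpectedly.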
[xcat-user] nextscale nx360m4 GPU not booting from hard drive after installation
Hi, Installing 6 new GPU NextScale nodes...installation went fine but on the reboot nodes get stuck after going through POST. Nothing on the console as far as I can tell. Tried changing boot order to hard disk first and also changing from UEFI to Legacy and that did not fix it. Also tried rsetboot hd and that did not seem to fix it either. Any suggestion? Thanks, Damir
Re: [xcat-user] issue programing bmc
Yes, trying to use the shared onboard gigabit port:

[xCAT Genesis running on qhimem0004 /bin]# ipmitool raw 0xc 2 1 0xc0 0 0
 11 01

On Tue, Oct 6, 2015 at 12:36 PM Jarrod Johnson <jjohns...@lenovo.com> wrote:
> Can you do an ipmitool raw 0xc 2 1 0xc0 0 0
>
> I assume you are trying to use shared on the on board gigabit port? If
> another configuration, let me know.
>
> *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
> *Sent:* Tuesday, October 06, 2015 10:56 AM
> *To:* xCAT Users Mailing list
> *Subject:* [xcat-user] issue programing bmc
>
> We have new x3550M5 that we just discovered. After being discovered I ran
> runcmd=bmcsetup command. I can ssh to the node and ipmitool lan print 1
> tells me the following:
>
> [xCAT Genesis running on qhimem0004 /]# ipmitool lan print 1
> Set in Progress         : Set Complete
> Auth Type Support       : NONE MD5 PASSWORD
> Auth Type Enable        : Callback :
>                         : User     : MD5 PASSWORD
>                         : Operator : MD5 PASSWORD
>                         : Admin    : MD5
>                         : OEM      :
> IP Address Source       : Static Address
> IP Address              : 172.29.10.14
> Subnet Mask             : 255.255.0.0
> MAC Address             : 40:f2:e9:bb:86:dd
> SNMP Community String   : public
> IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
> BMC ARP Control         : ARP Responses Enabled, Gratuitous ARP Disabled
> Gratituous ARP Intrvl   : 2.0 seconds
> Default Gateway IP      : 0.0.0.0
> Default Gateway MAC     : 00:00:00:00:00:00
> Backup Gateway IP       : 0.0.0.0
> Backup Gateway MAC      : 00:00:00:00:00:00
> 802.1q VLAN ID          : Disabled
> 802.1q VLAN Priority    : 0
> RMCP+ Cipher Suites     : 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
> Cipher Suite Priv Max   : Xaa
>                         :     X=Cipher Suite Unused
>                         :     c=CALLBACK
>                         :     u=USER
>                         :     o=OPERATOR
>                         :     a=ADMIN
>                         :     O=OEM
>
> The IP is correct, and for set it indicates complete. However, I can not
> telnet to this IP (qhimem0004-bmc).
>
> Here is the hosts table entry for this node:
>
> "qhimem0004-bmc","172.29.10.14",,,"qhimem0014 node bmc interface",
>
> Here is the /etc/hosts file entry:
>
> 172.29.10.14 qhimem0004-bmc.quest.it.northwestern.edu qhimem0004-bmc
>
> here is the nodels entry for the ipmi port:
>
> [root@mgt log]# nodels qhimem0004 ipmi.bmcport
> qhimem0004: 0
>
> When trying to telnet to this address I get no route to host. I have
> logged in to the switch itself and see mac address of the interface show up
> on the port on the switch, but not the mac of the BMC port.
>
> Any help is greatly appreciated.
>
> Thanks,
>
> Damir
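Checking that the BMC really holds the address recorded in the hosts table is easier with the individual fields pulled out of the lan print output. A minimal sketch, assuming the usual "Label : value" layout that ipmitool prints; `lan_field` is a hypothetical helper, not an ipmitool or xCAT command:

```shell
# Extract one field from `ipmitool lan print` output supplied on stdin.
# $1: the label exactly as ipmitool prints it, e.g. "IP Address".
lan_field() {
    awk -F' *: ' -v key="$1" '
        # Match the whole label so "IP Address" does not hit "IP Address Source".
        $1 == key {
            sub(/[[:space:]]+$/, "", $2)   # drop trailing whitespace
            print $2
            exit
        }
    '
}
```

On the node this could be used as `ipmitool lan print 1 | lan_field 'IP Address'`, and the result compared with the address in /etc/hosts.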
Re: [xcat-user] issue programing bmc
Thank you so much that fixed it.

Damir

On Tue, Oct 6, 2015 at 12:36 PM Jarrod Johnson <jjohns...@lenovo.com> wrote:
> Can you do an ipmitool raw 0xc 2 1 0xc0 0 0
>
> I assume you are trying to use shared on the on board gigabit port? If
> another configuration , let me know.
Re: [xcat-user] issue programing bmc
I'll try a power cycle. Thanks for the suggestion.

I wish I could fully understand the bmcsetup process. For example, if I manually run bmcsetup out of the /bin directory, it runs, completes, and lights up the blue light on the front of the server. However, the node's destiny never changes from bmcsetup, the programmed IP is not pingable, and telnet does not work.

I just wish there were other troubleshooting steps I could take to see where I stand with this node. I tried tcpdump from the management node and I don't see any traffic with the MAC of the shared eth interface come across the management BMC interface.

Thanks,
Damir

On Tue, Oct 6, 2015 at 12:23 PM David D Johnson <david_john...@brown.edu> wrote:
> If I remember correctly, changing the IP address does not take effect
> until the IMM/BMC is reset, or power-cycled.
Re: [xcat-user] issue programing bmc
I am not sure that's the case with these nodes. I have provisioned a few x3550M5s over the last few days and none of them had this issue. The issue with this node is that I mis-provisioned it (using the wrong IP, etc.), so this morning I cleared everything out and tried again. I see via ipmitool that the set is complete, but the destiny of the node never changes from bmcsetup.

Is there a way to force it to reprogram again?

Damir

On Tue, Oct 6, 2015 at 10:59 AM David D Johnson <david_john...@brown.edu> wrote:
> My suspicion is that your IMM2 is set to use the dedicated IMM ethernet
> port, but you intended to use the shared IMM/eth0 port instead.
>
> This needs to be configured using the UEFI -- hit configuration,
> under the tab where other IMM network settings are found. If there is a
> way to do this using ipmitool, I have not found it.
>
> If you plug a separate cable between switch and dedicated IMM port, and
> the MAC you're looking for shows up on that switch port when you ping it,
> you can then use ASU to change IMM.SharedNicMode from Dedicated to Shared
> and then reboot the imm (ipmitool mc reset cold).
> [note -- I checked this on an x3550-M4, sometimes the variables are
> spelled differently from release to release].
>
> Remove the extra cable, and you should be back in business.
>
> -- ddj
> Dave Johnson
> Brown University CCV
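Because a new BMC address may not take effect until the IMM/BMC is reset, a reset followed by a wait loop is one way to tell whether the address ever comes up. A minimal sketch: the `ipmitool mc reset cold` step comes from the thread and appears only as a comment, while `wait_for_bmc`, its arguments, and the `WAIT_SECS` variable are hypothetical, and the default probe assumes a Linux `ping`.

```shell
# Poll a BMC address until it answers or the retry budget runs out.
# $1: BMC host name or IP, $2: number of attempts (default 30),
# $3: probe command (default: one ping with a 2 second timeout).
# Run `ipmitool mc reset cold` on the node first, as suggested in the thread.
wait_for_bmc() {
    host=$1
    tries=${2:-30}
    probe=${3:-ping -c 1 -W 2}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # $probe is left unquoted on purpose so its options are split.
        if $probe "$host" >/dev/null 2>&1; then
            echo "$host reachable"
            return 0
        fi
        i=$((i + 1))
        sleep "${WAIT_SECS:-5}"
    done
    echo "$host still unreachable after $tries tries"
    return 1
}
```

If the address never answers after a reset, that points back toward the dedicated versus shared port setting rather than the IP configuration itself.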
Re: [xcat-user] frustration with booting of x3650m5
Hi Jarrod,

I'll try to upgrade to this firmware level in a bit.

Re. the meeting today: it is 8:36 here right now, and I am available any time today. Any of the options, with the addition of Hangouts, should work for me. If screen sharing does not work, I can create an account for you and we can do a terminal screen share.

Thank you so much.

Damir

On Fri, Aug 14, 2015 at 7:49 AM Jarrod Johnson jjohns...@lenovo.com wrote:

FYI, what we are using right now is:
http://download4.boulder.ibm.com/sar/CMA/XSA/lnvgy_fw_mpt3sas_n2200-1.07_linux_32-64.bin
(doc at http://download4.boulder.ibm.com/sar/CMA/XSA/lnvgy_fw_mpt3sas_n2200-1.07_linux_32-64.txt and changelog at http://download4.boulder.ibm.com/sar/CMA/XSA/lnvgy_fw_mpt3sas_n2200-1.07_linux_32-64.chg )

Compared to your version, it contains a fix I have been suspecting to be related to your difficulties:

Fixes:
- Fixed issue where the system boot hangs when Legacy BIOS is disabled (using HII) on certain UEFI systems. (SCGCQ00637088)

# sas3flash.x86_64 -list -c 0
LSI Corporation SAS3 Flash Utility
Version 07.00.00.00 (2014.08.14)
Copyright (C) 2008-2014 LSI Corporation. All rights reserved

Adapter Selected is a LSI SAS: SAS3008(C0)

Controller Number : 0
Controller : SAS3008(C0)
PCI Address : 00:08:00:00
SAS Address : 500605b-0-0812-5070
NVDATA Version (Default) : 07.01.00.07
NVDATA Version (Persistent) : 07.01.00.08
Firmware Product ID : 0x2221 (IT)
Firmware Version : 07.00.01.00
NVDATA Vendor : LSI
NVDATA Product ID : N2226
HBA BIOS Version : 08.15.00.00
UEFI BSD Version : 08.00.00.00
Re: [xcat-user] frustration with booting of x3650m5
Hi Jarrod,

Thanks so much for your reply. Here is the output of the command you requested:

[xCAT Genesis running on qstorage24 /]# efibootmgr -v
Fatal: Couldn't open either sysfs or procfs directories for accessing EFI variables.
Try 'modprobe efivars' as root.

I did try modprobe efivars but it tells me that module efivars is not available.

Thanks,
Damir

On Thu, Aug 13, 2015 at 10:39 AM Jarrod Johnson jjohns...@lenovo.com wrote:

I'll try to do this through email, but may break down to a direct conversation (if you like).

If you can 'nodeset shell' and boot the system to network, I'm interested in efibootmgr -v output. Usually in a UEFI style boot, you'll get an entry like:

Boot0009* Red Hat Enterprise Linux 6 HD(1,800,19000,2737e48b-741f-461b-8ab1-7c7ea9ef8706)File(\EFI\redhat\grub.efi)

If legacy booting and/or wanting the generic style options to work straightforward way without having to contend with external LUNs confusing things too much, the easiest thing to do would be to disable the boot support of the uefi/option rom for the adapter. For example (adjust for your slot):

n1: DevicesandIOPorts.UEFI_Slot1=Disable
n1: DevicesandIOPorts.Legacy_Slot1=Disable

*From:* Damir Krstic [mailto:damir.krs...@gmail.com]
*Sent:* Thursday, August 13, 2015 11:15 AM
*To:* xCAT Users Mailing list
*Subject:* [xcat-user] frustration with booting of x3650m5

I just finished installing couple of new NSD servers using RH6.6. After the installation they would not boot from the local hard drive(s) (LSI M5210 RAID1). I went into bios and added boot option generic and added hard disk 0 through 4 and the servers booted fine after that.

However, after plugging in SAS cables to DCS3700 controller and zoning the LUNs on the 3700 servers are not booting (constant boot loop). I've added and removed generic boot option (hdd) and I've changed boot mode from UEFI to legacy and still can't get them to boot.

It's been very frustrating morning to say the least. Anyone else experience anything like this on 3650M5?

Thanks,
Damir
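The efibootmgr error in this message usually means the running kernel exposes no EFI variable interface. One common cause is that the environment was booted in legacy BIOS mode; whether that applies to this Genesis boot is an assumption, since the thread does not confirm it. A small check, with a hypothetical function name and an optional root prefix argument that exists only so the logic can be tried against a fake directory tree:

```shell
# Report whether the running system was booted via UEFI or legacy BIOS.
# The kernel creates /sys/firmware/efi only on UEFI boots.
# $1: optional root prefix (hypothetical, for testing against a fake tree).
boot_mode() {
    if [ -d "${1:-}/sys/firmware/efi" ]; then
        echo "UEFI"
    else
        echo "legacy"
    fi
}
```

If this prints "legacy" inside Genesis, efibootmgr cannot work there regardless of which modules are loaded, and the node would need to be network booted in UEFI mode first.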
Re: [xcat-user] frustration with booting of x3650m5
Sure call would be great

On Thu, Aug 13, 2015 at 16:37 Jarrod Johnson jjohns...@lenovo.com wrote:

Odd... I don't anticipate driver issue (6.6 built in should suffice). I'll double check when I get back to office in 15 hours or so. If you have time we can arrange a call to just look at it live...
Re: [xcat-user] frustration with booting of x3650m5
Firmware version: MPT3BIOS*8.07.01.00 (2013.11.15)

I think I was not clear in my last email. With the following disabled, the system did boot but did not see any of the LUNs:

DevicesandIOPorts.UEFI_Slot4 disable
DevicesandIOPorts.UEFI_Slot1 disable
DevicesandIOPorts.Legacy_Slot4 disable
DevicesandIOPorts.Legacy_Slot1 disable

So you are probably right about drivers, but I did a little bit of searching on the internet for the drivers I would need for these cards, and the hits suggest that I need the following drivers (both are loaded according to modprobe):

modprobe --list |grep mptsas
kernel/drivers/message/fusion/mptsas.ko
[root@qstorage23 ~]# !252
modprobe --list |grep mptctl
kernel/drivers/message/fusion/mptctl.ko

Also, this is what's showing in lspci on the booted system with these two HBAs:

06:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)
10:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)

On Thu, Aug 13, 2015 at 3:54 PM Jarrod Johnson jjohns...@lenovo.com wrote:

I meant to query the current version…

*From:* Damir Krstic [mailto:damir.krs...@gmail.com]
*Sent:* Thursday, August 13, 2015 4:48 PM
*To:* xCAT Users Mailing list
*Subject:* Re: [xcat-user] frustration with booting of x3650m5

It was looping again, but with what seemed more of a delay before a reboot. The model of the SAS controller is LSI3008. If you need anything else, please let me know. I really appreciate all your help.

Damir

On Thu, Aug 13, 2015 at 3:44 PM Jarrod Johnson jjohns...@lenovo.com wrote:

If the symptom changed from looping to hanging, then I'll need the adapter model of the slot1/slot4 SAS cards to give precise guidance. Probably a good idea to let me know that anyway.

*From:* Jarrod Johnson [mailto:jjohns...@lenovo.com]
*Sent:* Thursday, August 13, 2015 4:37 PM
*To:* xCAT Users Mailing list
*Subject:* Re: [xcat-user] frustration with booting of x3650m5

Does it loop failing to boot or does it hang trying to boot?

*From:* Damir Krstic [mailto:damir.krs...@gmail.com]
*Sent:* Thursday, August 13, 2015 4:26 PM
*To:* xCAT Users Mailing list
*Subject:* Re: [xcat-user] frustration with booting of x3650m5

We initially disabled

DevicesandIOPorts.EnableDisableOnboardDevices_Slot1=Disabled
DevicesandIOPorts.EnableDisableOnboardDevices_Slot4=Disabled

With this setting the system booted but did not see any LUNs. After your email, we re-enabled the aforementioned settings and disabled the following:

DevicesandIOPorts.UEFI_Slot1=Disable
DevicesandIOPorts.UEFI_Slot4=Disable
DevicesandIOPorts.Legacy_Slot1=Disable
DevicesandIOPorts.Legacy_Slot4=Disable

And the system is now again not booting.

Thanks,
Damir

On Thu, Aug 13, 2015 at 3:06 PM Jarrod Johnson jjohns...@lenovo.com wrote:

Did you disable the slots or just the 'Legacy' and 'UEFI' items? The 'Legacy' and 'UEFI' items control how it can boot, but:

x1: DevicesandIOPorts.EnableDisableOnboardDevices_Slot2=Enable

If that is not 'Enable' for the slot, then the OS won't see it either. So that should be 'Enable', and the other two things should be 'Disable' for easiest scenario. Is this currently the situation?

*From:* Damir Krstic [mailto:damir.krs...@gmail.com]
*Sent:* Thursday, August 13, 2015 3:59 PM
*To:* xCAT Users Mailing list
*Subject:* Re: [xcat-user] frustration with booting of x3650m5

That worked in terms of getting the NSD server to boot, but since both SAS controllers are disabled, I can't see the LUNs. So how do I get this server to boot AND see all the LUNs, i.e. have the SAS HBAs enabled?

thanks,
Damir

On Thu, Aug 13, 2015 at 2:29 PM Jarrod Johnson jjohns...@lenovo.com wrote:

So first is to get the asu utility:
https://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=LNVO-ASU

There is a 'pasu' frontend in the latest xCAT that will work with that package. If installing that rpm, try:

pasu nodename show all

As an example. Failing that:

asu64 --host immhostorip --username USERID --password Passw0rdhere show all

So I have a x3650 M5 here, I don't know which slots are installed where in yours. (edited)

# pasu x1 show all|grep Slot
x1: DevicesandIOPorts.UEFI_Slot1=Enable
x1: DevicesandIOPorts.UEFI_Slot2=Enable
x1: DevicesandIOPorts.UEFI_Slot3=Enable
x1: DevicesandIOPorts.UEFI_Slot4=Enable
x1: DevicesandIOPorts.UEFI_Slot5=Enable
x1: DevicesandIOPorts.UEFI_Slot9=Enable
x1: DevicesandIOPorts.Legacy_Slot1=Enable
x1: DevicesandIOPorts.Legacy_Slot2=Enable
x1: DevicesandIOPorts.Legacy_Slot3=Enable
x1: DevicesandIOPorts.Legacy_Slot4=Enable
x1: DevicesandIOPorts.Legacy_Slot5=Enable
x1: DevicesandIOPorts.Legacy_Slot9=Enable

So let's say that my SAS hba was in slot5:

# pasu x1 set DevicesandIOPorts.Legacy_Slot5 Disable
# pasu x1 set DevicesandIOPorts.UEFI_Slot5 Disable

Rinse and repeat per
Re: [xcat-user] frustration with booting of x3650m5
we initially disabled

DevicesandIOPorts.EnableDisableOnboardDevices_Slot1=Disabled
DevicesandIOPorts.EnableDisableOnboardDevices_Slot4=Disabled

With this setting the system booted but did not see any LUNs. After your email, we re-enabled the aforementioned settings and disabled the following:

DevicesandIOPorts.UEFI_Slot1=Disable
DevicesandIOPorts.UEFI_Slot4=Disable
DevicesandIOPorts.Legacy_Slot1=Disable
DevicesandIOPorts.Legacy_Slot4=Disable

And the system now is again not booting.

Thanks,
Damir

On Thu, Aug 13, 2015 at 3:06 PM Jarrod Johnson jjohns...@lenovo.com wrote:

Did you disable the slots or just the 'Legacy' and 'UEFI' items? The 'Legacy' and 'UEFI' items control how it can boot, but:

x1: DevicesandIOPorts.EnableDisableOnboardDevices_Slot2=Enable

If that is not 'Enable' for the slot, then the OS won't see it either. So that should be 'Enable', and the other two things should be 'Disable' for the easiest scenario. Is this currently the situation?

*From:* Damir Krstic [mailto:damir.krs...@gmail.com] *Sent:* Thursday, August 13, 2015 3:59 PM *To:* xCAT Users Mailing list *Subject:* Re: [xcat-user] frustration with booting of x3650m5

that worked in terms of getting the nsd server to boot but since both sas controllers are disabled, i can't see the LUNs. So how do I get this server to boot AND see all the LUNs i.e. have SAS HBAs enabled?

thanks,
Damir

On Thu, Aug 13, 2015 at 2:29 PM Jarrod Johnson jjohns...@lenovo.com wrote:

So first is to get the asu utility:

https://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=LNVO-ASU

There is a 'pasu' frontend in the latest xCAT that will work with that package. If installing that rpm, try:

pasu nodename show all

As an example. Failing that:

asu64 --host immhostorip --username USERID --password Passw0rdhere show all

So I have a x3650 M5 here, I don't know which slots are installed where in yours.

(edited)
# pasu x1 show all|grep Slot
x1: DevicesandIOPorts.UEFI_Slot1=Enable
x1: DevicesandIOPorts.UEFI_Slot2=Enable
x1: DevicesandIOPorts.UEFI_Slot3=Enable
x1: DevicesandIOPorts.UEFI_Slot4=Enable
x1: DevicesandIOPorts.UEFI_Slot5=Enable
x1: DevicesandIOPorts.UEFI_Slot9=Enable
x1: DevicesandIOPorts.Legacy_Slot1=Enable
x1: DevicesandIOPorts.Legacy_Slot2=Enable
x1: DevicesandIOPorts.Legacy_Slot3=Enable
x1: DevicesandIOPorts.Legacy_Slot4=Enable
x1: DevicesandIOPorts.Legacy_Slot5=Enable
x1: DevicesandIOPorts.Legacy_Slot9=Enable

So let's say that my SAS hba was in slot5:

# pasu x1 set DevicesandIOPorts.Legacy_Slot5 Disable
# pasu x1 set DevicesandIOPorts.UEFI_Slot5 Disable

Rinse and repeat per relevant HBA. UEFI style boot is meant to simplify this scenario, but this is a way to make things back to as simple as they were before external block devices start mucking about.

*From:* Damir Krstic [mailto:damir.krs...@gmail.com] *Sent:* Thursday, August 13, 2015 2:59 PM *To:* xCAT Users Mailing list *Subject:* Re: [xcat-user] frustration with booting of x3650m5

not sure how to do that - i have listed all of the asu options on the server, but i am not sure how to disable the non-raid sas adapters. do you have an example of 3650m5 with sas hba cards boot manager options and settings?

thanks,
damir

On Thu, Aug 13, 2015 at 12:43 PM Jarrod Johnson jjohns...@lenovo.com wrote:

Ok, so it must have bios booted then…. How about using asu to disable the boot firmware for the non-RAID SAS adapters? That may simplify things back down to reason.

*From:* Damir Krstic [mailto:damir.krs...@gmail.com] *Sent:* Thursday, August 13, 2015 12:26 PM *To:* xCAT Users Mailing list *Subject:* Re: [xcat-user] frustration with booting of x3650m5

Hi Jarrod,

Thanks so much for your reply. Here is the output of the command you requested:

[xCAT Genesis running on qstorage24 /]# efibootmgr -v
Fatal: Couldn't open either sysfs or procfs directories for accessing EFI variables.
Try 'modprobe efivars' as root.

I did try modprobe efivars but it tells me that module efivars is not available.

Thanks,
Damir

On Thu, Aug 13, 2015 at 10:39 AM Jarrod Johnson jjohns...@lenovo.com wrote:

I'll try to do this through email, but may break down to a direct conversation (if you like).

If you can 'nodeset shell' and boot the system to network, I'm interested in efibootmgr -v output. Usually in a UEFI style boot, you'll get an entry like:

Boot0009* Red Hat Enterprise Linux 6 HD(1,800,19000,2737e48b-741f-461b-8ab1-7c7ea9ef8706)File(\EFI\redhat\grub.efi)

If legacy booting and/or wanting the generic style options to work in a straightforward way without having to contend with external LUNs confusing things too much, the easiest thing to do would be to disable the boot support of the uefi/option rom for the adapter. For example (adjust for your slot):

n1: DevicesandIOPorts.UEFI_Slot1=Disable
n1: DevicesandIOPorts.Legacy_Slot1=Disable
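
As a rough illustration of the advice in this thread (keep the slot itself enabled so the OS sees the HBA, and disable only its legacy and UEFI boot ROMs), the sketch below prints the pasu commands for a list of slots. The setting names are the ones quoted above; the `set ... EnableDisableOnboardDevices_SlotN Enable` form is an assumption made by analogy with the `set ... Legacy_Slot5 Disable` example, and the function name is made up for this example. It only builds strings and does not talk to any IMM.

```python
def boot_rom_commands(node, slots, tool="pasu"):
    """Build commands that keep each slot visible to the OS but not bootable.

    Assumption: the tool accepts `<tool> <node> set <setting> <value>`,
    as in the `pasu x1 set DevicesandIOPorts.Legacy_Slot5 Disable` example.
    """
    commands = []
    for slot in slots:
        # The slot stays enabled, otherwise the OS will not see the HBA or its LUNs.
        commands.append(
            f"{tool} {node} set "
            f"DevicesandIOPorts.EnableDisableOnboardDevices_Slot{slot} Enable"
        )
        # Only the boot ROMs are switched off, so the firmware stops
        # considering the external LUNs as boot candidates.
        for rom in ("Legacy", "UEFI"):
            commands.append(
                f"{tool} {node} set DevicesandIOPorts.{rom}_Slot{slot} Disable"
            )
    return commands


if __name__ == "__main__":
    # Slots 1 and 4 are the SAS HBA slots mentioned in this thread.
    for line in boot_rom_commands("x1", [1, 4]):
        print(line)
```

Reviewing the printed commands before running them by hand keeps the change explicit per slot, which matters here because the two kinds of setting (slot enable versus boot ROM enable) were easy to mix up.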
Re: [xcat-user] frustration with booting of x3650m5
it was looping again but with what seemed more of a delay before a reboot. model of the sas controller is LSI3008. if you need anything else, please let me know. I really appreciate all your help.

Damir

On Thu, Aug 13, 2015 at 3:44 PM Jarrod Johnson jjohns...@lenovo.com wrote:

If the symptom changed from looping to hanging, then I'll need the adapter model of the slot1/slot4 SAS cards to give precise guidance. Probably a good idea to let me know that anyway.

*From:* Jarrod Johnson [mailto:jjohns...@lenovo.com] *Sent:* Thursday, August 13, 2015 4:37 PM *To:* xCAT Users Mailing list *Subject:* Re: [xcat-user] frustration with booting of x3650m5

Does it loop failing to boot or does it hang trying to boot?

*From:* Damir Krstic [mailto:damir.krs...@gmail.com] *Sent:* Thursday, August 13, 2015 4:26 PM *To:* xCAT Users Mailing list *Subject:* Re: [xcat-user] frustration with booting of x3650m5

we initially disabled

DevicesandIOPorts.EnableDisableOnboardDevices_Slot1=Disabled
DevicesandIOPorts.EnableDisableOnboardDevices_Slot4=Disabled

With this setting system booted but did not see any LUNs. After your email, we re-enabled aforementioned settings and disabled following:

DevicesandIOPorts.UEFI_Slot1=Disable
DevicesandIOPorts.UEFI_Slot4=Disable
DevicesandIOPorts.Legacy_Slot1=Disable
DevicesandIOPorts.Legacy_Slot4=Disable

And system now is again not booting.

Thanks,
Damir

On Thu, Aug 13, 2015 at 3:06 PM Jarrod Johnson jjohns...@lenovo.com wrote:

Did you disable the slots or just the 'Legacy' and 'UEFI' items? The 'Legacy' and 'UEFI' items control how it can boot, but:

x1: DevicesandIOPorts.EnableDisableOnboardDevices_Slot2=Enable

If that is not 'Enable' for the slot, then the OS won't see it either. So that should be 'Enable', and the other two things should be 'Disable' for easiest scenario. Is this currently the situation?
[xcat-user] x3650 M5 Kickstart fails with no disks found
We are installing a couple of brand new x3650 M5 servers using the RHEL6.2 kickstart file. The file has not been modified from the default in any way. Installation fails with no disks found even though we did configure RAID1 in the BIOS. Has anyone seen this issue?

Thanks,
Damir
Re: [xcat-user] x3650 M5 Kickstart fails with no disks found
Thanks Rich. Installing RH6.6 worked for us.

Damir

On Mon, Aug 10, 2015 at 12:27 PM Rich Sudlow r...@nd.edu wrote:

On 08/10/2015 11:23 AM, Damir Krstic wrote:

> We are installing couple of brand new x3650 M5 servers using RHEL6.2 kickstart file. File has not been modified from default in any way. Installation fails with no disks found even though we did configure RAID1 in bios. Has anyone seen this issue?

I don't remember specifically this issue but we have had similar issues with hardware especially processors not being supported in older versions of RHEL (Like 6.2). I'd suggest trying a newer version of RHELS like 6.6 or 6.7.

Rich

> Thanks, Damir

--
Rich Sudlow
University of Notre Dame
Center for Research Computing - Union Station
506 W. South St
South Bend, In 46601
(574) 631-7258 (office)
(574) 807-1046 (cell)
[xcat-user] systemimager-server missing on sourceforge
We are hoping to use image clone to deploy our stateful nodes (gpu). Trying to install systemimager-server is giving us an error:

Downloading Packages:
https://sourceforge.net/projects/xcat/files/yum/xcat-dep/rh6/x86_64/perl-AppConfig-1.52-4.noarch.rpm: [Errno 12] Timeout on http://master.dl.sourceforge.net/project/xcat/yum/xcat-dep/rh6/x86_64/perl-AppConfig-1.52-4.noarch.rpm: (28, 'connect() timed out!')
Trying other mirror.
https://sourceforge.net/projects/xcat/files/yum/xcat-dep/rh6/x86_64/systemconfigurator-2.2.11-1.noarch.rpm: [Errno 14] PYCURL ERROR 7 - couldn't connect to host
Trying other mirror.

I am guessing this is related to the sourceforge outage from last week. Is there another way of installing required packages?

Thanks,
Damir
Re: [xcat-user] best way to populate nodepos table
Thank you all so much.

On Fri, Jul 10, 2015 at 09:40 Jarrod Johnson jjohns...@lenovo.com wrote:

Oh, btw, does nodels nodes vpd.serial give you what you expect?

If you want to do it, there's not a particularly well built-in way, but to make a script that would do it:

# rinv n2-n4 serial|grep System | sed -e 's/^/nodech /' -e 's/: System Serial Number: / nodepos.comments=/'
nodech n3 nodepos.comments=06CAWPX
nodech n4 nodepos.comments=06CAWPY
nodech n2 nodepos.comments=06CAWPW

Redirect that to a shell script and the shell script should do its thing.

*From:* Damir Krstic [mailto:damir.krs...@gmail.com] *Sent:* Friday, July 10, 2015 10:00 AM *To:* xCAT Users Mailing list *Subject:* [xcat-user] best way to populate nodepos table

As of today we are not using nodepos table for anything. I started experimenting and while adding nodepos.rack, nodepos.chassis, and nodepos.height values are easy to add using nodech command, things that are different between nodes are not that easy to enter without a lot of manual data entry. My table currently looks like this:

#node,rack,u,chassis,slot,room,height,comments,disable
qnode5001,w22,1,qfpc01,left,,1u,,
qnode5002,w22,1,qfpc01,right,,1u,,
qnode5003,w22,2,qfpc01,left,,1u,,
qnode5004,w22,2,qfpc01,right,,1u,,

w22 is the floor position of the rack, slot is u 1 in the rack and left or right indicates where in the chassis node resides. Is there any way to automate slot and u position using nodech command? Also, under comments in this table I would like to read-in rinv serial number command but I am unsure how to do that. Any help would be greatly appreciated.

Thanks,
Damir
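
For anyone who would rather review the generated commands before running them, here is a minimal Python sketch of the same transformation as the sed one-liner above. It assumes rinv prints lines of the form `node: System Serial Number: SERIAL`, which is what the sed expression implies; the function name is made up for this example, and it only produces the nodech command text.

```python
MARKER = ": System Serial Number: "


def serial_commands(rinv_output):
    """Turn `rinv <noderange> serial` output into nodech commands.

    Assumption: each relevant line looks like
    `n3: System Serial Number: 06CAWPX`.
    """
    commands = []
    for line in rinv_output.splitlines():
        if MARKER not in line:
            continue  # ignore anything that is not a system serial line
        node, serial = line.split(MARKER, 1)
        commands.append(
            f"nodech {node.strip()} nodepos.comments={serial.strip()}"
        )
    return commands


if __name__ == "__main__":
    sample = (
        "n3: System Serial Number: 06CAWPX\n"
        "n4: System Serial Number: 06CAWPY\n"
        "n2: System Serial Number: 06CAWPW\n"
    )
    print("\n".join(serial_commands(sample)))
```

The output can be redirected to a file and run as a shell script, the same as with the sed version.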
[xcat-user] best way to populate nodepos table
As of today we are not using nodepos table for anything. I started experimenting and while adding nodepos.rack, nodepos.chassis, and nodepos.height values are easy to add using nodech command, things that are different between nodes are not that easy to enter without a lot of manual data entry. My table currently looks like this:

#node,rack,u,chassis,slot,room,height,comments,disable
qnode5001,w22,1,qfpc01,left,,1u,,
qnode5002,w22,1,qfpc01,right,,1u,,
qnode5003,w22,2,qfpc01,left,,1u,,
qnode5004,w22,2,qfpc01,right,,1u,,

w22 is the floor position of the rack, slot is u 1 in the rack and left or right indicates where in the chassis node resides. Is there any way to automate slot and u position using nodech command? Also, under comments in this table I would like to read-in rinv serial number command but I am unsure how to do that. Any help would be greatly appreciated.

Thanks,
Damir
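
One possible way to avoid typing the u and slot values by hand is to derive them from the node name. The sketch below is only an illustration based on the four table rows above: it assumes node numbers are consecutive, that two nodes share each U (left first, then right), and that the first node (5001 here) sits in u 1. The function name and defaults are made up for this example; the column names come from the table header.

```python
import re


def position_command(node, first_index=5001, first_u=1,
                     rack="w22", chassis="qfpc01"):
    """Build a nodech command that fills nodepos for one node.

    Assumptions (taken from the sample table, not from xCAT itself):
    two nodes per U, numbered consecutively, left before right.
    """
    match = re.search(r"(\d+)$", node)
    if match is None:
        raise ValueError(f"no trailing number in node name: {node}")
    offset = int(match.group(1)) - first_index
    if offset < 0:
        raise ValueError(f"{node} is numbered below {first_index}")
    u = first_u + offset // 2                      # two nodes share one U
    slot = "left" if offset % 2 == 0 else "right"  # even offset is the left node
    return (
        f"nodech {node} nodepos.rack={rack} nodepos.u={u} "
        f"nodepos.chassis={chassis} nodepos.slot={slot} nodepos.height=1u"
    )


if __name__ == "__main__":
    for n in range(5001, 5005):
        print(position_command(f"qnode{n}"))
```

If the rack or chassis changes partway through the numbering, the defaults would need to be passed per range; the sketch does not try to guess that.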
[xcat-user] how to exclude some ofed packages being installed with genimage
We are following this guide to install OFED in our compute image and it's been working great ( http://sourceforge.net/p/xcat/wiki/Managing_the_Mellanox_Infiniband_Network/ ). We just heard from our customer and they would like to remove MPI versions that come installed with OFED. Is there a way to specify what to exclude/uninstall from OFED during the genimage command?

Thanks,
Damir
[xcat-user] NextScale deployment kernel crash
We are trying to boot NextScale nodes with our RedHat 6.4 stateless image. They are crashing during the initrd boot process with following error:

dracut Warning: No root device 1 found
dracut Warning: Boot has failed. To debug this issue add rdshell to the kernel command line.
dracut Warning: Signal caught!
dracut Warning: Boot has failed. To debug this issue add rdshell to the kernel command line.
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: init Tainted: G --- H 2.6.32-358.el6.x86_64 #1
Call Trace:
[8150cfc8] ? panic+0xa7/0x16f
[81073ae2] ? do_exit+0x862/0x870
[81182885] ? fput+0x25/0x30
[81073b48] ? do_group_exit+0x58/0xd0
[81073bd7] ? sys_exit_group+0x17/0x20
[8100b072] ? system_call_fastpath+0x16/0x1b
[ cut here ]
WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule+0x5c/0x60() (Tainted: G --- H )
Hardware name: IBM NeXtScale nx360 M5: -[5465AC1]-
Modules linked in: sd_mod crc_t10dif ahci mlx4_core [last unloaded: scsi_wait_scan]
Pid: 1, comm: init Tainted: G --- H 2.6.32-358.el6.x86_64 #1
Call Trace:
IRQ
[8106e2e7] ? warn_slowpath_common+0x87/0xc0
[8106e33a] ? warn_slowpath_null+0x1a/0x20
[8102dd9c] ? native_smp_send_reschedule+0x5c/0x60
[8105ae28] ? scheduler_tick+0x208/0x260
[810a7fd0] ? tick_sched_timer+0x0/0xc0
[810811de] ? update_process_times+0x6e/0x90
[810a8036] ? tick_sched_timer+0x66/0xc0
[8109b38e] ? __run_hrtimer+0x8e/0x1a0
[810a182f] ? ktime_get_update_offsets+0x4f/0xd0
[8107700f] ? __do_softirq+0x11f/0x1e0
[8109b6f6] ? hrtimer_interrupt+0xe6/0x260
[81516d7b] ? smp_apic_timer_interrupt+0x6b/0x9b
[8100bb93] ? apic_timer_interrupt+0x13/0x20
EOI
[8150d06d] ? panic+0x14c/0x16f
[8150cffa] ? panic+0xd9/0x16f
[81073ae2] ? do_exit+0x862/0x870
[81182885] ? fput+0x25/0x30
[81073b48] ? do_group_exit+0x58/0xd0
[81073bd7] ? sys_exit_group+0x17/0x20
[8100b072] ? system_call_fastpath+0x16/0x1b

Any help would be appreciated.

Thanks,
Damir
Re: [xcat-user] NextScale deployment kernel crash
We just got it working by building a RedHat 6.5 image. During boot we see it using the tg3 driver.

Thanks,
Damir

On Thu, Jun 25, 2015 at 12:38 PM Jarrod Johnson jjohns...@lenovo.com wrote:

What nic driver was built in the initrd? m4 was igb, m5 uses tg3.

> extra unusable Ethernet ports on the motherboard that mess up the interface naming. Is there a workaround for this???

I'm interested in what this means and if I can help on that.

*From:* David Johnson [mailto:david_john...@brown.edu] *Sent:* Thursday, June 25, 2015 11:30 AM *To:* xCAT Users Mailing list *Subject:* Re: [xcat-user] NextScale deployment kernel crash

Yes, we are seeing exactly the same problem. 300 nodes from nehalem to nextscale m4 all work fine with the same centos 6.5 image, but not so for the Lenovo nextscale M5 nodes. They seem to have extra unusable Ethernet ports on the motherboard that mess up the interface naming. Is there a workaround for this???

-- ddj
Dave Johnson

On Jun 25, 2015, at 10:49 AM, Damir Krstic damir.krs...@gmail.com wrote:

We are trying to boot NextScale nodes with our RedHat 6.4 stateless image. They are crashing during the initrd boot process with following error:

dracut Warning: No root device 1 found
dracut Warning: Boot has failed. To debug this issue add rdshell to the kernel command line.
dracut Warning: Signal caught!
dracut Warning: Boot has failed. To debug this issue add rdshell to the kernel command line.
Kernel panic - not syncing: Attempted to kill init!

Any help would be appreciated.

Thanks,
Damir
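
The panic output in the original post includes a "Modules linked in:" line, and it lists no Ethernet driver (neither igb nor tg3), which is consistent with the question about which NIC driver was built into the initrd. The sketch below is a small, hypothetical helper for checking a captured console log for the driver you expect; the function names are made up for this example, and it does not prove that a missing driver is the cause of any particular failure.

```python
def loaded_modules(console_log):
    """Return the module names from the first 'Modules linked in:' line."""
    marker = "Modules linked in:"
    for line in console_log.splitlines():
        if marker in line:
            tail = line.split(marker, 1)[1]
            # Drop the trailing "[last unloaded: ...]" note, if present.
            tail = tail.split("[", 1)[0]
            return tail.split()
    return []


def missing_nic_drivers(console_log, wanted=("tg3",)):
    """List the wanted drivers that do not appear among the loaded modules."""
    present = set(loaded_modules(console_log))
    return [driver for driver in wanted if driver not in present]


if __name__ == "__main__":
    log = (
        "Hardware name: IBM NeXtScale nx360 M5: -[5465AC1]-\n"
        "Modules linked in: sd_mod crc_t10dif ahci mlx4_core "
        "[last unloaded: scsi_wait_scan]\n"
    )
    print(missing_nic_drivers(log))
```

Per the reply above, M4 nodes used igb and M5 nodes use tg3, so `wanted` would differ by hardware generation.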
[xcat-user] how to remove service node
What is the proper procedure for removing an xCAT service node from xCAT? We have 3 service nodes in production right now, and I am planning on retiring one of them in the next couple of weeks. None of the compute nodes in the cluster are set to boot from this service node any longer. Any help is appreciated.

Damir
Re: [xcat-user] deploying new xcat management node
Hi Christian,

Good to hear from you - I hope you are doing well. Existing mgt node is running RH5.3 and xCAT 2.7. New management node will run RH6.5 and xCAT 2.9.

Thanks,
Damir

On Wed, Apr 29, 2015 at 9:22 AM Christian Caruthers ccaruth...@lenovo.com wrote:

Damir,

What version are you currently running? Will the new MN run the latest xCAT version? The main question would be if the newer version of xCAT can import tables from an older version.

One possible way around this is to install your current xCAT version on a VM, import your tables, upgrade xCAT on the VM, and export the tables from there for import into your new management node. This avoids touching your existing (working) MN and should provide you tables with all the right fields that the new version will recognize when they're imported.

That said, I can't remember the last time I had a problem upgrading xCAT with working tables in place. Still, I haven't upgraded from something like 2.6, or earlier, to 2.9!

If your new MN uses different network interfaces (i.e. if the old MN had the compute network on eth0 and the new one has it on eth1) make sure you update the networks table as well as possibly the site table (dhcpinterfaces) and possibly, though not likely, the nics and hosts tables.

Regards,
*Christian Caruthers*
Senior Consultant - System x Linux HPC
Mobile: 757-289-9872

*From:* Damir Krstic [mailto:damir.krs...@gmail.com] *Sent:* Wednesday, April 29, 2015 9:18 AM *To:* xCAT Users Mailing list *Subject:* [xcat-user] deploying new xcat management node

We are planning on deploying a new management node on our iDataPlex cluster soon. I've asked if there is a document that outlines migrating to new management node and was pointed to this document:

http://sourceforge.net/p/xcat/wiki/Setup_HA_Mgmt_Node_With_Shared_Data/

However, I don't think this applies since we are going from RH5.3 on the existing management node to RH6.5 and the xcat versions will also be different. Has anyone migrated from one management node to a new management node with different OS and xCAT versions?

Damir
Re: [xcat-user] problem with bmc programming
Christian, I'll try your suggestions. Thanks.

Daniel, the switch does not show anything connected to that port.

Damir

On Fri, Apr 17, 2015 at 04:31 Daniel Letai d...@letai.org.il wrote:

What does the switch show as connected to that port?

On Thu, Apr 16, 2015 at 10:12 PM, Christian Caruthers ccaruth...@lenovo.com wrote:

Damir,

I can think of 3 troubleshooting routes:

1. Load factory defaults, boot the system to the genesis kernel (nodeset NODE shell) and run bmcsetup
2. Pull power from the box and plug it back in to reboot the IMM.
3. Create Bootable Media Creator thumb drive and force it to flash the IMM.

If none of that works, you might need to open a service call to replace the system board. Pull a DSA because they'll probably ask for it.

Regards,
Christian Caruthers
Senior Consultant - System x Linux HPC
Mobile: 757-289-9872

From: Damir Krstic [mailto:damir.krs...@gmail.com] Sent: Wednesday, April 15, 2015 2:22 PM To: xCAT Users Mailing list Subject: [xcat-user] problem with bmc programming

one of our new nodes was just provisioned and I am having an issue programming bmc. we are using dedicated imm port on this 3650m4 server. imm port is plugged in to a switch with single vlan. imm interface is configured with following settings:

IP Address Source : Static Address
IP Address : 172.29.9.1
Subnet Mask : 255.255.0.0
MAC Address : 40:f2:e9:cd:bf:df
SNMP Community String : public

Here is the picture of the actual imm settings in the uefi

i can't ping/telnet this interface at all. tcpdump basically shows me that the management node is asking who has the mac address of this node. i have logged in to the switch itself and this mac is not showing in the mac table on the switch. other interfaces (non imm) that are configured on this server and plugged in to the same switch function properly and are accessible with ssh/telnet etc.

any help is appreciated.

damir
[xcat-user] migrating to a new management node
We are hoping to retire our original management node in next couple of months. Is there a documented way to migrate from existing production xCAT management node to a brand new one?

Thanks,
Damir
Re: [xcat-user] trying to install new 3650m4 eth0 link is not ready
would this work:

    nodech quser10 noderes.installnic=

?

On Thu, Jul 17, 2014 at 10:28 AM, Jarrod Johnson jarrod.b.john...@gmail.com wrote:

What happens if you blank installnic? If not set it will autodetect and the result may surprise you. I recommend never setting installnic or primarynic on x86 anymore, since the autodetect works as desired 99.9% of the time.

On Jul 17, 2014 10:05 AM, Damir Krstic damir.krs...@gmail.com wrote:

we have 4 new login nodes that i am trying to deploy in the next couple of days. they were autodiscovered (have mac in the mac table) and i am trying to install them now:

    nodeset quser10 install

the installation stops at the following:

    NetworkManager: eth0 link is not ready
    eth0 deactivating device

(screenshot included)

lsdef of the node itself:

    Object name: quser10
        arch=x86_64
        bmc=quser10-bmc
        bmcpassword=PASSW0RD
        bmcport=0
        bmcusername=USERID
        currchain=boot
        currstate=install rhels6.2-x86_64-user6
        groups=user6,user6-profile,ipmi,bnt103-user6,x3650m2,all
        initrd=xcat/rhels6.2/x86_64/initrd.img
        installnic=eth0
        ip=172.20.4.10
        kcmdline=nofb utf8 ks=http://172.20.0.1/install/autoinst/quser10 ksdevice=eth0 console=tty0 console=ttyS0,115200 noipv6
        kernel=xcat/rhels6.2/x86_64/vmlinuz
        mac=40:f2:e9:ce:e2:8a
        mgt=ipmi
        mtm=7914AC1
        netboot=pxe
        nfsserver=172.20.0.1
        os=rhels6.2
        postbootscripts=otherpkgs,setupntp
        postscripts=syslog,remoteshell,syncfiles,syslog-adminnodes,ssh,ifcfg-eth,fstab,passwd,statefull_tasks6,ipoib
        primarynic=eth0
        profile=user6
        provmethod=install
        serial=06ATFXT
        serialport=0
        serialspeed=115200
        status=configuring
        statustime=07-16-2014 14:29:10
        supportedarchs=x86,x86_64
        switch=bnt103
        switchinterface=eth0
        switchport=1
        switchvlan=1
        tftpserver=172.20.0.1
        xcatmaster=172.20.0.1

xcat version:

    [root@mgt rh]# xcatconfig --version
    Version 2.7.3 (svn r13117, built Mon Jun 18 05:12:28 EDT 2012)

We will be deploying a new management node with updated xCAT as soon as the login nodes are provisioned.

Thanks in advance for your help.
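Jarrod's advice concerns two attributes, installnic and primarynic, and the lsdef output above shows both set to eth0. A minimal sketch of how one might check a saved lsdef dump for those attributes follows. The function names and the trimmed SAMPLE text are made up for illustration; it only assumes the one-attribute-per-line "attr=value" layout that lsdef prints.

```python
def parse_lsdef(text):
    """Parse 'attr=value' lines from lsdef-style output into a dict.

    Only the first '=' splits key from value, so values such as kcmdline
    (which contain '=' themselves) are kept whole.
    """
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("Object name:"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            attrs[key] = value
    return attrs


def pinned_nics(attrs):
    """Return the NIC attributes that are set to a non-empty value."""
    return {k: attrs[k] for k in ("installnic", "primarynic") if attrs.get(k)}


# Hypothetical, trimmed sample based on the quser10 definition quoted above.
SAMPLE = """Object name: quser10
    arch=x86_64
    installnic=eth0
    kcmdline=nofb utf8 ks=http://172.20.0.1/install/autoinst/quser10 ksdevice=eth0
    primarynic=eth0
"""

if __name__ == "__main__":
    print(pinned_nics(parse_lsdef(SAMPLE)))
```

An empty result would mean neither attribute is pinned, which is the state Jarrod recommends for x86 nodes.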
Re: [xcat-user] trying to install new 3650m4 eth0 link is not ready
I did the above nodech command and then did nodeset quser10 install and it still timed out with same message.

damir
Re: [xcat-user] trying to install new 3650m4 eth0 link is not ready
i think i figured it out - in /tftpboot/pxelinux.cfg/quser10 i changed ksdevice from eth0 to bootif. when the node finished installing, eth2 was configured with the node ip address.

what i think happened is this: since these nodes have a daughter eth card, linux saw the card that was plugged in as eth2 instead of eth0. telling the pxe configuration file not to use eth0 seems to have worked. i just tested it on another login node and it works.

thanks,
damir
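The fix described above is a one-token change on the APPEND line of the node's pxelinux file. The sketch below shows that edit as a string transformation, plus the BOOTIF value that pxelinux adds to the kernel command line when the file carries IPAPPEND 2 (as the ttlogin01 file quoted elsewhere in this archive does). The function names are hypothetical, and this is only an illustration of the manual edit, not how xCAT generates the file. A hand edit like this is presumably overwritten the next time nodeset regenerates the file.

```python
import re


def set_ksdevice(append_line, device="bootif"):
    """Return append_line with any ksdevice=<value> replaced by ksdevice=<device>."""
    return re.sub(r"\bksdevice=\S+", "ksdevice=" + device, append_line)


def bootif_value(mac):
    """Value pxelinux passes as BOOTIF= under IPAPPEND 2.

    The leading '01' is the ARP hardware type for Ethernet; the MAC follows
    in lower case with dashes instead of colons.
    """
    return "01-" + mac.lower().replace(":", "-")


if __name__ == "__main__":
    line = ("APPEND initrd=xcat/rhels6.2/x86_64/initrd.img "
            "ks=http://172.20.0.1/install/autoinst/quser10 ksdevice=eth0 "
            "console=tty0 console=ttyS0,115200")
    print(set_ksdevice(line))
    print("BOOTIF=" + bootif_value("40:f2:e9:ce:e2:8a"))
```

With ksdevice=bootif the installer picks the interface whose MAC matches BOOTIF, so it no longer matters whether the kernel names that port eth0 or eth2.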
Re: [xcat-user] trying to install new 3650m4 eth0 link is not ready
yes - i got around it by editing the node file under /tftpboot/pxelinux.cfg (see previous email). instead of ksdevice=eth0 i did ksdevice=bootif, and this worked. you are absolutely right that since this node has an additional card with 2 10GbE ports in it, RH was confusing eth0 for something else. the node is booted/installed now and eth2 is configured with the node's ip address.

i tried jarrod's suggestion of blanking out installnic= but did not try blanking out primarynic. i'll try that on another login node later today. i have 4 to deploy and 2 are already done by editing the ksdevice statement; i'll try the other two by blanking out both installnic and primarynic.

On Thu, Jul 17, 2014 at 11:34 AM, Christian Caruthers christian.caruth...@us.ibm.com wrote:

Under /tftpboot/pxelinux.cfg there should be a file named for the node you're trying to install. This file contains the kickstart boot command that's passed to the system in response to its PXE request. Can you send the contents of that file?

Also, does this node have 10Gb ports, or any additional PCI Ethernet cards in it? If so, Red Hat more than likely sees port 1 on this card as eth0 while the system BIOS (or uEFI or whatever) sees the planar port 1 as eth0. Clearing out installnic and primarynic helps get around this. Where your install is failing, the network device Network Manager is trying to initialize is dictated by the ksdevice option in the file I mentioned above.

Regards,
Christian Caruthers
Senior Consultant - System x Linux HPC
Re: [xcat-user] problem installing new node
version is Version 2.7.3 (svn r13117, built Mon Jun 18 05:12:28 EDT 2012)

trying it now without installnic and primarynic specified.

On Tue, Nov 26, 2013 at 4:04 PM, Ling Gao ling...@us.ibm.com wrote:

What version of xCAT are you using? (xdsh -V) Can you change netboot=xnba and change installnic and primarynic to empty? Then run nodeset again and redeploy the node.

Ling
[xcat-user] problem installing new node
We have a couple of new x3550m4 nodes that are not installing. After the BMC has been programmed and the nodes have been set to install, the PXE boot process for some reason never goes beyond serving pxelinux.0 (please see the log below):

    Nov 26 14:43:12 mgt dhcpd: DHCPACK on 172.20.7.1 to 40:f2:e9:0d:e2:64 via bond0
    Nov 26 14:43:12 mgt atftpd[10629]: Serving pxelinux.0 to 172.20.7.1:1929
    Nov 26 14:43:12 mgt atftpd[10629]: tsize option - 13148
    Nov 26 14:43:12 mgt atftpd[10629]: blksize option - 1468
    Nov 26 14:43:12 mgt atftpd[10629]: Server thread exiting
    Nov 26 14:43:12 mgt atftpd[10629]: Serving pxelinux.0 to 172.20.7.1:1930
    Nov 26 14:43:12 mgt atftpd[10629]: blksize option - 1468
    Nov 26 14:43:12 mgt atftpd[10629]: Server thread exiting
    Nov 26 14:43:13 mgt atftpd[10629]: Serving pxelinux.0 to 172.20.7.1:1931
    Nov 26 14:43:13 mgt atftpd[10629]: blksize option - 1468
    Nov 26 14:43:13 mgt atftpd[10629]: Server thread exiting

Here is the tcpdump from the management node when this happens:

    14:33:20.626124 IP (tos 0x0, ttl 64, id 50528, offset 0, flags [none], proto: UDP (17), length: 68) new-node.informatik-lm mgt node: [udp sum ok] 40 RRQ pxelinux.0 octet tsize 0 blksize 1468

In the /tftpboot/pxelinux.cfg directory we have an entry that corresponds to the hex of the ip for the new node:

    [root@mgt pxelinux.cfg]# ls -lrt AC140701
    lrwxrwxrwx 1 root root 9 Nov 26 09:28 AC140702 -> ttlogin01

Here is the content of the file:

    [root@mgt pxelinux.cfg]# cat ttlogin01
    #install rhels6.2-x86_64-ttlogin6
    DEFAULT xCAT
    LABEL xCAT
     KERNEL xcat/rhels6.2/x86_64/vmlinuz
     APPEND initrd=xcat/rhels6.2/x86_64/initrd.img repo=http://172.20.0.1/install/rhels6.2/x86_64/ ks=http://172.20.0.1/install/autoinst/ttlogin01 ksdevice=eth0 cmdline console=tty0 console=ttyS0,115200
     IPAPPEND 2

For some reason, the tftpboot process never proceeds to the pxelinux.cfg directory after pxelinux.0 is served. Stateless nodes on this cluster boot fine, so I think our tftpboot environment is OK. It's just these two nodes that have to be installed that are problematic.

Any help is appreciated.

Thanks,
Damir.
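For reference, once pxelinux.0 is loaded it requests a series of file names under pxelinux.cfg/, and the "hex of the ip" entry mentioned above is one of them. The sketch below lists those names for a given IP and MAC. It follows the documented PXELINUX search order but leaves out the client-UUID lookup that comes first when the firmware supplies a UUID; the function name is made up for illustration.

```python
import ipaddress


def pxelinux_config_candidates(ip, mac):
    """List the pxelinux.cfg/ file names PXELINUX tries, in order.

    Order: '01-' + MAC (lower case, dashes), then the IPv4 address as eight
    upper-case hex digits, dropping one digit from the right each time,
    then 'default'. The UUID-based name is not modelled here.
    """
    names = ["01-" + mac.lower().replace(":", "-")]
    hex_ip = "%08X" % int(ipaddress.IPv4Address(ip))
    for length in range(8, 0, -1):
        names.append(hex_ip[:length])
    names.append("default")
    return names


if __name__ == "__main__":
    for name in pxelinux_config_candidates("172.20.7.1", "40:f2:e9:0d:e2:64"):
        print(name)
```

For 172.20.7.1 the full hex name is AC140701, while the listing in the message shows a link named AC140702, which would correspond to 172.20.7.2. That may only be a typo in the mail, but it seems worth confirming that the link name matches the address in the DHCPACK line. Note also that in the log above the client never gets as far as requesting any of these names.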
Re: [xcat-user] problem installing new node
Verified the static/dynamic overlapping and that does not seem to be a problem. Also double checked the ip/mac and they are not duplicated/overlapping.

Here is the lsdef of one of the problematic nodes:

    Object name: ttlogin01
        arch=x86_64
        bmc=ttlogin01-bmc
        bmcpassword=PASSW0RD
        bmcusername=USERID
        currchain=boot
        currstate=install rhels6.2-x86_64-ttlogin6
        groups=ttlogin6,ttlogin6-profile,ipmiB,x3650m2,ttlogin,all
        initrd=xcat/rhels6.2/x86_64/initrd.img
        installnic=eth0
        ip=172.20.7.1
        kcmdline=nofb utf8 ks=http://172.20.0.1/install/autoinst/ttlogin01 ksdevice=eth0 console=tty0 console=ttyS0,115200 noipv6
        kernel=xcat/rhels6.2/x86_64/vmlinuz
        mac=40:f2:e9:0d:e2:64
        mgt=ipmi
        mtm=7914AC1
        netboot=pxe
        nfsserver=172.20.0.1
        os=rhels6.2
        postbootscripts=otherpkgs,setupntp
        postscripts=syslog,remoteshell,syncfiles
        primarynic=eth0
        profile=ttlogin6
        provmethod=install
        serial=KQ0GV1M
        serialport=0
        serialspeed=115200
        status=installing
        statustime=11-26-2013 12:18:42
        supportedarchs=x86,x86_64
        switch=bnt101
        switchinterface=eth0
        switchport=29
        switchvlan=1
        tftpserver=172.20.0.1
        xcatmaster=172.20.0.1

On Tue, Nov 26, 2013 at 3:10 PM, Russell Jones russell-l...@jonesmail.me wrote:

Verify you do not have dynamic and static networks overlapping for that network definition. Also verify you have configured the correct MAC address for that node in xcat and do not have overlapping MACs/IPs.

What does an lsdef for one of the problem nodes look like?
Re: [xcat-user] re-discovering node after motherboard replacement
OK - I'll try booting from the hard drive and see if that works, but the BMC never got programmed. I can't reach this node with any of the rcons/rpower commands, and if I try to telnet to its bmc port it fails. I'll keep poking to see if there are any other errors related to programming of the BMC.

Thanks,
Damir

On Mon, Oct 28, 2013 at 9:46 AM, Jarrod B Johnson jbjoh...@us.ibm.com wrote:

Should be ready to be nodeset to do something else. 'standby' in this case is 'completed everything supposed to happen, awaiting instructions'.

If you put in hard drives with os still working: nodeset node boot
If hard drive needs reinstall: nodeset node osimage
If stateless: nodeset node netboot

From: Damir Krstic damir.krs...@gmail.com
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 10/28/2013 10:42 AM
Subject: [xcat-user] re-discovering node after motherboard replacement

One of our GPU nodes had a bad motherboard and we had it replaced a few days ago. After the motherboard was replaced we ran the rmnodecfg script and the node was re-discovered:

    mgt xCAT node discovery: qgpu0020 has been discovered

I can see the new MAC address in the mac table. However, we are running into issues reprogramming the BMC. It never finishes. The console screen displays:

    Received request to retry in a bit, will call xCAT back in amount seconds.

lsdef on this node shows that the node is in standby mode (not sure what that means):

    chain=runcmd=bmcsetup,standby
    currchain=standby
    currstate=standby

Here is the content of the pxelinux file for this node:

    #standby
    DEFAULT xCAT
    LABEL xCAT
     KERNEL xcat/genesis.kernel.x86_64
     APPEND initrd=xcat/genesis.fs.x86_64.gz quiet console=tty0 console=ttyS0,115200 xcatd=172.20.0.1:3001 destiny=standby nouveau.modeset=0
     IPAPPEND 2

I hope you can help.

Damir
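The chain attribute quoted above is a comma-separated list of steps, each either a bare action or action=argument. As a rough aid for reading values like runcmd=bmcsetup,standby, here is a small sketch that splits such a string into steps. The function name is hypothetical and the code only models the string format shown in the message, not how xCAT itself walks the chain.

```python
def parse_chain(chain):
    """Split a chain string into (action, argument) tuples.

    A step without '=' gets None as its argument, e.g. 'standby'.
    """
    steps = []
    for item in chain.split(","):
        item = item.strip()
        if not item:
            continue
        action, sep, arg = item.partition("=")
        steps.append((action, arg if sep else None))
    return steps


if __name__ == "__main__":
    print(parse_chain("runcmd=bmcsetup,standby"))
```

Read this way, the node's configured chain is "run bmcsetup, then wait in standby", and currchain=standby lists only the standby step. That is consistent with Jarrod's description of standby as awaiting instructions, though by itself it does not show whether bmcsetup succeeded.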
[xcat-user] nodeset error
Hello,

We are creating a new stateless 6.2 image on our cluster, and the genimage and packimage commands completed successfully. However, running the nodeset command generates the following error:

    [root@qservice03 pxelinux.cfg]# nodeset qnode0002 netboot
    Error: Did you run genimage before running packimage? kernel cannot be found
    Error: Some nodes failed to set up netboot resources, aborting
    Error: Did you run genimage before running packimage? kernel cannot be found
    Error: Some nodes failed to set up netboot resources, aborting
    Error: Did you run genimage before running packimage? kernel cannot be found
    Error: Some nodes failed to set up netboot resources, aborting
    [root@qservice03 pxelinux.cfg]#

Any help would be appreciated.

Thank you,
Damir
[xcat-user] redhat 6.2 kickstart file
Does anyone have a good RH 6.2 kickstart file they are successfully using and don't mind sharing? I tried using our 6.1 template file and it keeps failing with syntax errors.

Any help is appreciated.

Thanks,
Damir
Re: [xcat-user] redhat 6.2 stateful image
Thanks for the reply. The files seem to have copied fine. I see both vmlinuz and initrd.img in /install/rhels6.2/x86_64/images/pxeboot and in /tftpboot/xcat/rhels6.2/x86_64. Both initrd.img and vmlinuz differ in size from the rhels6.1 versions, but I read somewhere that Red Hat switched to a new compression mechanism in 6.2, so that may be the reason. I'll try re-downloading the ISO and running the copycds command again, but I don't have much hope for that. Any other ideas?

Thanks,
Damir

On Mon, Jun 4, 2012 at 9:34 PM, Guang Cheng Li ligua...@cn.ibm.com wrote:

> Hi,
>
> The error indicates that the files images/pxeboot/vmlinuz and images/pxeboot/initrd.img could not be found in the directory /install/rhels6.2/x86_64/. Could you check whether copycds successfully copied the OS packages to /install/rhels6.2/x86_64?
>
> Thanks,
> Li, Guang Cheng (李光成)
> IBM China System Technology Laboratory
> Email: ligua...@cn.ibm.com
> Address: Building 28, ZhongGuanCun Software Park, No.8, Dong Bei Wang West Road, Haidian District, Beijing 100193, PRC
>
> On 2012/06/05 04:23, Damir Krstic damir.krs...@gmail.com wrote:
> To: xCAT-user@lists.sourceforge.net
> Subject: [xcat-user] redhat 6.2 stateful image
>
>> Hi,
>>
>> I hope you can help us with an error we are encountering while building a new RHEL 6.2 service node. We ran copycds on a 6.2 image and edited the appropriate tables for this node to roll out, but when issuing the nodeset command, this is the error we get:
>>
>> [root@mgt ~]# nodeset qservice03 install
>> Error: Install image not found in /install/rhels6.2/x86_64
>> Error: Some nodes failed to set up install resources, aborting
>> qservice03: install rhels6.2-x86_64-service6
>> qservice03: install rhels6.2-x86_64-service6
>>
>> Here is the directory where I checked for stuff:
>>
>> [root@mgt x86_64]# pwd
>> /tftpboot/xcat/rhels6.2/x86_64
>>
>> and here is the listing of it:
>>
>> [root@mgt x86_64]# ls -lart
>> total 32812
>> drwxr-xr-x 3 root root     4096 May 31 11:06 ..
>> drwxr-xr-x 2 root root     4096 May 31 11:06 .
>> -rw-r--r-- 1 root root  3938800 Jun  4 15:15 vmlinuz
>> -rw-r--r-- 1 root root 29608959 Jun  4 15:15 initrd.img
>>
>> No idea what could be causing the error above. I had issues downloading the ISO from Red Hat's website, so my next step is to re-download the ISO and re-run the copycds command.
>>
>> Any help would be appreciated.
>>
>> Thanks,
>> Damir
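Editor's note: since the reply says the boot files exist both in the install tree and under /tftpboot, one further check is whether the two copies are actually the same files. The sketch below is only an illustration under that assumption (that the /tftpboot files are meant to be copies of the ones in images/pxeboot). The function name `compare_boot_files` and the status strings are made up here, and the directory paths are passed in by the caller.

```python
import filecmp
from pathlib import Path


def compare_boot_files(install_dir, tftp_dir, names=("vmlinuz", "initrd.img")):
    """Compare boot files in the install tree with the tftp copies.

    Returns a dict mapping each file name to one of:
    'missing in install tree', 'missing in tftp directory',
    'differs', or 'identical' (byte-for-byte comparison).
    """
    results = {}
    for name in names:
        src = Path(install_dir) / name
        dst = Path(tftp_dir) / name
        if not src.is_file():
            results[name] = "missing in install tree"
        elif not dst.is_file():
            results[name] = "missing in tftp directory"
        elif filecmp.cmp(str(src), str(dst), shallow=False):
            results[name] = "identical"
        else:
            results[name] = "differs"
    return results
```

For the paths in this thread, the call would be `compare_boot_files("/install/rhels6.2/x86_64/images/pxeboot", "/tftpboot/xcat/rhels6.2/x86_64")`. A "differs" result would point to a stale or partial copy rather than a missing file.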
[xcat-user] redhat 6.2 stateful image
Hi,

I hope you can help us with an error we are encountering while building a new RHEL 6.2 service node. We ran copycds on a 6.2 image and edited the appropriate tables for this node to roll out, but when issuing the nodeset command, this is the error we get:

[root@mgt ~]# nodeset qservice03 install
Error: Install image not found in /install/rhels6.2/x86_64
Error: Some nodes failed to set up install resources, aborting
qservice03: install rhels6.2-x86_64-service6
qservice03: install rhels6.2-x86_64-service6

Here is the directory where I checked for stuff:

[root@mgt x86_64]# pwd
/tftpboot/xcat/rhels6.2/x86_64

and here is the listing of it:

[root@mgt x86_64]# ls -lart
total 32812
drwxr-xr-x 3 root root     4096 May 31 11:06 ..
drwxr-xr-x 2 root root     4096 May 31 11:06 .
-rw-r--r-- 1 root root  3938800 Jun  4 15:15 vmlinuz
-rw-r--r-- 1 root root 29608959 Jun  4 15:15 initrd.img

No idea what could be causing the error above. I had issues downloading the ISO from Red Hat's website, so my next step is to re-download the ISO and re-run the copycds command.

Any help would be appreciated.

Thanks,
Damir
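Editor's note: because the ISO download was problematic, it may be worth verifying the file before running copycds again. The sketch below assumes a SHA-256 checksum for the ISO is available from the download source; the function names `sha256_of` and `iso_matches` are made up for illustration, and the ISO path and expected checksum are supplied by the caller.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks so that
    a multi-gigabyte ISO does not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iso_matches(path, expected_hex):
    """True if the file's digest equals the expected checksum
    (surrounding whitespace and letter case are ignored)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

A mismatch would indicate a corrupted download, in which case rerunning copycds on the same file is unlikely to help.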