[xcat-user] xcat deployment of sr630 with optane

2021-06-22 Thread Damir Krstic
Hello,

Trying to deploy RH7.5 on an SR630 with Optane fails with various UDEV
timeout errors. Has anyone had any luck deploying Optane servers using
xCAT?

Our version of xCAT is:

[root@qmgt3 ~]# rpm -qa |grep -i xcat
xCAT-client-2.15-snap201911041517.noarch
xCAT-2.15-snap201911041517.x86_64
xCAT-buildkit-2.15-snap201911041517.noarch
xCAT-genesis-scripts-x86_64-2.15-snap201911041517.noarch
grub2-xcat-2.02-0.76.el7.1.snap201905160255.noarch
xCAT-genesis-base-x86_64-2.14.5-snap201811190037.noarch
xCAT-genesis-base-ppc64-2.14.5-snap201811160710.noarch
elilo-xcat-3.14-4.noarch
syslinux-xcat-3.86-2.noarch
xCAT-probe-2.15-snap201911041517.noarch
ipmitool-xcat-1.8.18-0.x86_64
perl-xCAT-2.15-snap201911041517.noarch
xCAT-genesis-scripts-ppc64-2.15-snap201911041517.noarch
xCAT-server-2.15-snap201911041517.noarch


Thanks,

Damir
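
In the package listing above, the xCAT-genesis-base packages are at 2.14.5 while everything else is at 2.15. That may well be normal and is not known to be related to the UDEV timeouts, but it is easy to miss in raw rpm output. A minimal sketch for tabulating name and version, assuming the usual NAME-VERSION-RELEASE.ARCH package naming (the function name xcat_pkg_versions is made up for illustration):

```shell
#!/bin/sh
# xcat_pkg_versions: read rpm package names (NAME-VERSION-RELEASE.ARCH) on
# stdin and print "name<TAB>version" for each one.
xcat_pkg_versions() {
    awk -F- 'NF >= 3 {
        # Everything before the last two "-" separated fields is the name.
        name = $1
        for (i = 2; i <= NF - 2; i++) name = name "-" $i
        printf "%s\t%s\n", name, $(NF - 1)
    }'
}

# Usage on the management node:
#   rpm -qa | grep -i xcat | xcat_pkg_versions | sort
```

Sorting the output puts packages with the same name prefix next to each other, which makes a version that differs from its neighbours stand out.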
___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user


Re: [xcat-user] running bmc setup and the USERID password

2021-02-11 Thread Damir Krstic
Here is what I have done so far:
I logged into both qgpu0101 and qgpu0102 and ran /bin/bmcsetup.
They have the correct IP. After the expect script did not work, I
logged into genesis on qgpu0101 only and ran:
ipmitool user set password 2 
It worked, but I still could not log in as USERID@qgpu0101.
On subsequent attempts, I can no longer change the password. When I test it:
[xCAT Genesis running on qgpu0101 /]# ipmitool user test 2 16
Password for user 2: 
Success
ipmitool lan print 1 shows:
Auth Type Enable        : Callback :
                        : User     : MD5 PASSWORD
                        : Operator : MD5 PASSWORD
                        : Admin    : MD5
                        : OEM      :
User USERID is ADMIN, so I ran:
ipmitool lan set 1 auth ADMIN MD5,PASSWORD
But alas, I still can't log in as USERID@qgpu0101, and eventually Admin
changes back to just MD5.
I also changed USERID to the OPERATOR level using:
ipmitool user priv 2 0x3 1
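
For anyone repeating these checks on several nodes, here is a minimal read-only sketch that groups the inspection commands. It assumes user slot 2 and LAN channel 1, as in the commands above; the helper names run and check_bmc_user are made up for illustration. It only prints the commands unless DRY_RUN=0 is set, and it does not attempt to fix anything.

```shell
#!/bin/sh
# Inspect a BMC user from the genesis shell without changing any setting.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# check_bmc_user USER_ID CHANNEL
check_bmc_user() {
    user_id=$1   # BMC user slot (2 for USERID in this thread)
    channel=$2   # LAN channel (1 in this thread)
    run ipmitool user list "$channel"                    # names and privilege
    run ipmitool channel getaccess "$channel" "$user_id" # per-user access
    run ipmitool lan print "$channel"                    # Auth Type Enable
}

# Usage:
#   check_bmc_user 2 1              # print the commands
#   DRY_RUN=0 check_bmc_user 2 1    # actually run them
```

Comparing this output between a node where login works and one where it does not may narrow down whether the difference is in the user's channel access or in the enabled auth types.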

On Thu, Feb 11, 2021 at 4:28 PM Damir Krstic  wrote:

> Lenovo has implemented a forced change of the USERID password from the
> default PASSW0RD in recent firmware iterations. We had success with an
> expect script that ran after the bmcsetup script and changed the
> password to our own.
>
> However, in the recent batch of nodes, the default password (PASSW0RD) is
> changed / different. Does anyone know what the new password is? Also, this
> new policy of having to change the bmc password right away is, to say
> the least, not convenient. We spend more time now messing with the bmc
> scripts than deploying the nodes.
>
> Is there a procedure or some step that I am missing?
>
> Thanks,
> Damir
>


[xcat-user] running bmc setup and the USERID password

2021-02-11 Thread Damir Krstic
Lenovo has implemented a forced change of the USERID password from the
default PASSW0RD in recent firmware iterations. We had success with an
expect script that ran after the bmcsetup script and changed the
password to our own.

However, in the recent batch of nodes, the default password (PASSW0RD) is
changed / different. Does anyone know what the new password is? Also, this
new policy of having to change the bmc password right away is, to say
the least, not convenient. We spend more time now messing with the bmc
scripts than deploying the nodes.

Is there a procedure or some step that I am missing?

Thanks,
Damir


[xcat-user] bmcsetup and complex password

2020-06-29 Thread Damir Krstic
Hi all,

The new Lenovo hardware requires that the bmc password be changed to a
complex password on the initial login. This seems to be tripping up the
bmcsetup script, and it's not completing. We are running xcat 2.15.

Thank you.
Damir


[xcat-user] programmatically getting power telemetry from nextscale and SD530

2019-12-20 Thread Damir Krstic
Is there a programmatic way of getting the power usage telemetry out of
NextScale FPCs and SD530 chassis? I know that we can access that info using
the web interface by signing in to the FPC and viewing the power
information. However, I would like to do this across the entire cluster in
order to record the cluster's power usage.

If I had to guess, ipmitool may be able to do it, but I am unsure what to
run or how to get that information using ipmitool.

Thank you.
Damir
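
One possible starting point, offered as a sketch rather than a confirmed answer: ipmitool has a "dcmi power reading" subcommand, and where the target supports DCMI its output includes an "Instantaneous power reading" line. Whether the NextScale FPC or the SD530 chassis answer DCMI requests is an assumption this thread does not confirm. The function name parse_power_watts and the host name in the usage line are hypothetical.

```shell
#!/bin/sh
# parse_power_watts: read `ipmitool dcmi power reading` output on stdin and
# print the instantaneous reading in watts. Prints nothing if the line is
# missing (for example, when the target does not support DCMI).
parse_power_watts() {
    awk -F: '/Instantaneous power reading/ {
        gsub(/[^0-9]/, "", $2)   # keep only the digits of the value field
        print $2
    }'
}

# Usage (node01-bmc is a placeholder; -E reads the password from the
# IPMI_PASSWORD environment variable):
#   ipmitool -I lanplus -H node01-bmc -U USERID -E dcmi power reading | parse_power_watts
```

Looping this over the BMC host names and appending a timestamp would give a simple per-node log that can be summed for the cluster.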


Re: [xcat-user] sd530 deployment guide

2018-05-03 Thread Damir Krstic
So do we need the confluent service installed to deploy the SD530?

On Thu, May 3, 2018 at 11:15 AM peter CZ1 Peng <peng...@lenovo.com> wrote:

> Hi Damir,
>
> Can you try this?
>
> The SD530 supports zero-power-on configuration (the node does not need to
> be powered on; only AC is required, no DC), and the confluent service will
> detect the SMM and XCC by their IPv6 addresses. This is also called
> out-of-band management.
>
> https://github.com/824380210/xcat_book/blob/master/20180109_stark_SD530_WI.md
>
> Let me know if anything is not clear, and I will try to help. Thanks.
>
> Peter CZ Peng
> Department: Complex Solution Rack TE
> Address: ISH3 Shenzhen
> Lenovo China
>
>
> *From:* Damir Krstic <damir.krs...@gmail.com>
> *Sent:* Thursday, May 3, 2018 11:59 PM
> *To:* xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> *Subject:* [xcat-user] sd530 deployment guide
>
>
>
> Hi all,
>
>
>
> We have just purchased a rack of SD530 nodes for our cluster. We have been
> deploying NextScale nodes for some time now and are very familiar with the
> process. Relatedly, the xCAT group had a wonderful document outlining all
> stages of NextScale deployment at this page:
> https://sourceforge.net/p/xcat/wiki/XCAT_NeXtScale_Clusters/
>
> I am wondering if such a page exists for SD530 deployment using xCAT. I
> searched for it and can't find it. We are trying to deploy a single
> chassis with 4 SD530 servers and are having a bit of trouble with the SMM,
> bmcsetup, and things like that. Without a complete guide, it's not easy
> to figure out how all the pieces fit together. For example, is the SMM
> equivalent to the FPC on NextScale, and if so, what is the equivalent of
> the configfpc command for the SD530? Also, each SD530 node has two eth
> ports in it. Is one of them a shared IMM/Eth port that we run the normal
> bmcsetup against, or is the SMM now a single IMM port that is shared
> across the 4 servers in the chassis?
>
>
>
> Things like this are not clear so any help is appreciated.
>
>
>
> Thank you.
>
> Damir
>
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot


[xcat-user] sd530 deployment guide

2018-05-03 Thread Damir Krstic
Hi all,

We have just purchased a rack of SD530 nodes for our cluster. We have been
deploying NextScale nodes for some time now and are very familiar with the
process. Relatedly, the xCAT group had a wonderful document outlining all
stages of NextScale deployment at this page:
https://sourceforge.net/p/xcat/wiki/XCAT_NeXtScale_Clusters/

I am wondering if such a page exists for SD530 deployment using xCAT. I
searched for it and can't find it. We are trying to deploy a single
chassis with 4 SD530 servers and are having a bit of trouble with the SMM,
bmcsetup, and things like that. Without a complete guide, it's not easy
to figure out how all the pieces fit together. For example, is the SMM
equivalent to the FPC on NextScale, and if so, what is the equivalent of
the configfpc command for the SD530? Also, each SD530 node has two eth
ports in it. Is one of them a shared IMM/Eth port that we run the normal
bmcsetup against, or is the SMM now a single IMM port that is shared
across the 4 servers in the chassis?

Things like this are not clear so any help is appreciated.

Thank you.
Damir


Re: [xcat-user] connection refused on bmc port

2017-07-20 Thread Damir Krstic
It turns out the telnet session was disabled. Here is how we fixed it:



On Thu, Jul 20, 2017 at 1:57 AM Nicolas Roosen <nicolas.roo...@hpe.com>
wrote:

> Hi,
>
> On 07/20/2017 04:31 AM, Damir Krstic wrote:
> > Hi all,
> >
> > We just got a couple of new x3650 servers in, and discovering them went
> > without a problem. Running bmcsetup worked OK too. For some reason, after
> > bmcsetup was done the interface was still in dedicated mode, and
> > per Jarrod's instructions from some time ago, I was able to change it to
> > shared. What Jarrod asked me to do, back in 2015, was to ssh to the node
> > while bmcsetup was running and execute the following command:
> > ipmitool raw 0xc 1 1 0xc0 0
> >
>
> you can try to "reset" the BMC:
>
> ipmitool mc reset warm (or "cold" if a warm reset is not enough).
>
>
> Or maybe the raw command changed since there are new servers (and new
> BMCs firmware I guess)?
>
> On a Supermicro I had to run this from the OS to set the BMC to shared:
>
> ipmitool raw 0x30 0x70 0x0c 1 1
>
> And to check the actual value:
>
> ipmitool raw 0x30 0x70 0x0c 0
>
>
> Nicolas
>
> > This sets the interface to shared mode. I did that and it looks OK.
> > However, with telnet -bmc I get connection refused, and I don't recall
> > ever getting this message before.
> >
> > Any help is appreciated.
> >
> > Thanks,
> > Damir
> >
>
>


[xcat-user] connection refused on bmc port

2017-07-19 Thread Damir Krstic
Hi all,

We just got a couple of new x3650 servers in, and discovering them went
without a problem. Running bmcsetup worked OK too. For some reason, after
bmcsetup was done the interface was still in dedicated mode, and per
Jarrod's instructions from some time ago, I was able to change it to shared.
What Jarrod asked me to do, back in 2015, was to ssh to the node while
bmcsetup was running and execute the following command:
ipmitool raw 0xc 1 1 0xc0 0

This sets the interface to shared mode. I did that and it looks OK.
However, with telnet -bmc I get connection refused, and I don't recall ever
getting this message before.

Any help is appreciated.

Thanks,
Damir


[xcat-user] slow private network after RH7 upgrade

2017-07-05 Thread Damir Krstic
We have recently upgraded all our compute nodes to RHELS 7.3 and left
the management node on Red Hat 6. Since the nodes were upgraded and booted,
we have been experiencing a slowdown on our private (172.20) network. For
example, the psh compute date command finishes successfully and quickly, but
when we try to copy the password file across all nodes, it hangs almost
every time, and on different nodes each time. I thought I had this issue
isolated to a specific rack, but now it's obvious that it happens on all
nodes. As a workaround, I am copying files across the cluster using the
-ib0 interface.

I was wondering if RH7 is doing something funky with routing or
networking. I have looked at the tcpdump output and can't really see
anything strange, except that it takes a long time for the packet to come
back to the management node.

I am thinking it's either a routing or a DNS (reverse lookup) issue, but I
can't be sure. I am hoping somebody on this list has had a similar issue
and was able to resolve it.

Thanks in advance.

Damir
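
One cheap way to test the reverse-lookup suspicion above is to time a lookup of the management node's private address from a compute node. This is only a sketch: the helper name elapsed_seconds and the 172.20.0.1 address are placeholders, and a slow result would point at name resolution without ruling out routing.

```shell
#!/bin/sh
# elapsed_seconds CMD [ARGS...]: run a command, discard its output, and print
# how many whole seconds it took.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Usage on a compute node (172.20.0.1 stands in for the management node's
# address on the private network):
#   elapsed_seconds getent hosts 172.20.0.1
```

A result of several seconds for a single lookup would be consistent with a resolver timeout; a result of 0 suggests looking elsewhere.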


Re: [xcat-user] problems booting redhat 7.3 on NextScale 360M5

2017-05-25 Thread Damir Krstic
So I removed it from the node’s definition in the nodehm table:

[root@mgt pxelinux.cfg]# nodels qnode5118 nodehm
qnode5118: nodehm.mgt: ipmi
qnode5118: nodehm.serialport: 0
qnode5118: nodehm.node: qnode5118
qnode5118: nodehm.serialspeed: 115200
qnode5118: nodehm.serialflow: 
qnode5118: nodehm.cmdmapping: 
qnode5118: nodehm.termport: 
qnode5118: nodehm.comments: 
qnode5118: nodehm.consoleondemand: 
qnode5118: nodehm.cons: 
qnode5118: nodehm.conserver: 
qnode5118: nodehm.getmac: 
qnode5118: nodehm.termserver: 
qnode5118: nodehm.power: 
qnode5118: nodehm.disable: 

Set it to boot:
[root@mgt pxelinux.cfg]# nodeset qnode5118 boot
qnode5118: boot

Rebooted it and it still hangs:

[root@mgt pxelinux.cfg]# rpower qnode5118 boot
qnode5118: reset
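
Since nodehm.serialflow is already empty above and the node still hangs, it may be worth confirming what the installed OS was actually booted with. The Linux serial console option string has the form bbbbpnf, where a trailing "r" requests RTS flow control, so a console=ttyS0,115200n8r argument would mean hardware flow control is still in effect. A minimal sketch (the function name is made up; that the installed kernel command line still carries the old setting is only a guess):

```shell
#!/bin/sh
# has_hw_flow_console: read a kernel command line on stdin and succeed if any
# console=ttyS* argument ends in "r" (RTS flow control), for example
# console=ttyS0,115200n8r.
has_hw_flow_console() {
    tr ' ' '\n' | grep -Eq '^console=ttyS[0-9]+,[0-9]+[noe][0-9]r$'
}

# Usage on the installed node:
#   has_hw_flow_console < /proc/cmdline && echo "hardware flow control requested"
```

If the check succeeds on a node installed before the table change, the boot loader configuration written at install time would be the next place to look.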


> On May 25, 2017, at 1:52 PM, Gilad Berman <gber...@lenovo.com> wrote:
> 
> Do you have consoles on demand set to yes in the site table (or specific to 
> the node)?
> If yes, remove “hard” from your console settings, nodeset again and try.
>  
> If this is a similar case, it is because when you set the flow control to 
> hardware, the OS waits for the serial console to be connected (which is flow 
> control..)
>  
>  
> 
> Gilad Berman
> HPC Architect
> Lenovo EMEA
>  
>  
> From: Damir Krstic [mailto:damir.krs...@gmail.com] 
> Sent: Thursday, May 25, 2017 9:44 PM
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Subject: [xcat-user] problems booting redhat 7.3 on NextScale 360M5
>  
> We are installing RH7.3 on NextScale nodes and after the install, node 
> reboots. The problem is node seems to get "stuck" until I rcons into it and 
> then it continues booting. So far, I can't pinpoint exact spot where it gets 
> stuck, but it just sits there until I remote console (rcons) into it and then 
> it continues booting. Here is where the last one got stuck until I did remote 
> console:
>  
> [root@mgt ~]# rcons qnode5118
> [Enter `^Ec?' for help]
> Info: SOL payload already de-activated
> [SOL Session operational.  Use ~? for help]
> ??6?+
>  ?+#6cK??6?[?+s?
> [  131.345026] systemd[1]: Created slice Root Slice.
> [  131.350307] systemd[1]: Starting Root Slice.
> [  OK  ] Listening on Journal Socket.
> [  131.360103] systemd[1]: Listening on Journal Socket.
> [  131.365676] systemd[1]: Starting Journal Socket.
> [  OK  ] Listening on udev Control Socket.
> [  131.377102] systemd[1]: Listening on udev Control Socket.
> [  131.383152] systemd[1]: Starting udev Control Socket.
> [  OK  ] Listening on udev Kernel Socket.
> [  131.395101] systemd[1]: Listening on udev Kernel Socket.
> [  131.401055] systemd[1]: Starting udev Kernel Socket.
> [  OK  ] Reached target Sockets.
> [  131.411102] systemd[1]: Reached target Sockets.
> [  131.416180] systemd[1]: Starting Sockets.
> [  OK  ] Created slice System Slice.
> [  131.426105] systemd[1]: Created slice System Slice.
> [  131.431578] systemd[1]: Starting System Slice.
> [  131.437357] systemd[1]: Starting Apply Kernel Variables...
>  Starting Apply Kernel Variables...
> [  131.448652] systemd[1]: Starting Journal Service...
>  
> Any help is appreciated.
> Thanks,
> Damir
>  
>  


[xcat-user] problems booting redhat 7.3 on NextScale 360M5

2017-05-25 Thread Damir Krstic
We are installing RH7.3 on NextScale nodes and after the install, node
reboots. The problem is node seems to get "stuck" until I rcons into it and
then it continues booting. So far, I can't pinpoint exact spot where it
gets stuck, but it just sits there until I remote console (rcons) into it
and then it continues booting. Here is where the last one got stuck until I
did remote console:

[root@mgt ~]# rcons qnode5118
[Enter `^Ec?' for help]
Info: SOL payload already de-activated
[SOL Session operational.  Use ~? for help]
??6?+
 ?+#6cK??6?[?+s?
[  131.345026] systemd[1]: Created slice Root Slice.
[  131.350307] systemd[1]: Starting Root Slice.
[  OK  ] Listening on Journal Socket.
[  131.360103] systemd[1]: Listening on Journal Socket.
[  131.365676] systemd[1]: Starting Journal Socket.
[  OK  ] Listening on udev Control Socket.
[  131.377102] systemd[1]: Listening on udev Control Socket.
[  131.383152] systemd[1]: Starting udev Control Socket.
[  OK  ] Listening on udev Kernel Socket.
[  131.395101] systemd[1]: Listening on udev Kernel Socket.
[  131.401055] systemd[1]: Starting udev Kernel Socket.
[  OK  ] Reached target Sockets.
[  131.411102] systemd[1]: Reached target Sockets.
[  131.416180] systemd[1]: Starting Sockets.
[  OK  ] Created slice System Slice.
[  131.426105] systemd[1]: Created slice System Slice.
[  131.431578] systemd[1]: Starting System Slice.
[  131.437357] systemd[1]: Starting Apply Kernel Variables...
 Starting Apply Kernel Variables...
[  131.448652] systemd[1]: Starting Journal Service...

Any help is appreciated.
Thanks,
Damir


Re: [xcat-user] NextScale nodes not booting from the disk

2017-05-12 Thread Damir Krstic
These are my settings:

IMM.PXE_NextBootEnabled=Disabled
PXE.NicPortMacAddress.1=E4:1D:2D:73:AF:41
PXE.NicPortMacAddress.2=E4:1D:2D:73:AF:42
PXE.NicPortMacAddress.3=40:F2:E9:C5:48:14
PXE.NicPortMacAddress.4=40:F2:E9:C5:48:15
PXE.NicPortPxeMode.1=UEFI and Legacy Support
PXE.NicPortPxeMode.2=UEFI and Legacy Support
PXE.NicPortPxeMode.3=UEFI and Legacy Support
PXE.NicPortPxeMode.4=UEFI and Legacy Support
PXE.NicPortPxeProtocol.1=IPV4
PXE.NicPortPxeProtocol.2=IPV4
PXE.NicPortPxeProtocol.3=IPV4
PXE.NicPortPxeProtocol.4=IPV4
LegacySupport.Non-PlanarPXE=Enabled
BootOrder.BootOrder=Legacy Only=PXE Network=Hard Disk 0=Hard Disk 1
BootOrder.WolBootOrder=PXE Network=CD/DVD Rom=Hard Disk 0
BroadcomGigabitEthernetBCM5717-40F2E9C54814.LegacyBootProtocol=PXE
BroadcomGigabitEthernetBCM5717-40F2E9C54815.LegacyBootProtocol=PXE


I'll change them so they look like yours and try again.

Damir
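
When lining up one node's settings against another's, as with the two lists in this message, a small comparison helper saves eyeballing. This is a sketch under assumptions: both dumps are saved as plain Key=Value lines (any "host: " prefix such as the "opafm1: " in the quoted output already stripped), and the function and file names are made up for illustration.

```shell
#!/bin/sh
# compare_settings FILE_A FILE_B: for every key present in both files, print
# "key: value_in_A -> value_in_B" when the values differ. Only the first "="
# on a line separates key from value, because values such as the boot order
# contain "=" themselves.
compare_settings() {
    awk -F= '
        NR == FNR { a[$1] = substr($0, length($1) + 2); next }
        ($1 in a) {
            b = substr($0, length($1) + 2)
            if (a[$1] != b) printf "%s: %s -> %s\n", $1, a[$1], b
        }
    ' "$1" "$2"
}

# Usage (file names are placeholders):
#   compare_settings my_node.txt working_node.txt
```

Keys that exist in only one of the files are ignored here, which matters for this thread because the two systems expose different setting names (for example PXE.NicPortLegacyPxeMode appears only in the quoted output).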


On Fri, May 12, 2017 at 1:50 PM Gilad Berman <gber...@lenovo.com> wrote:

> Will give it another try (with the ASU settings now)… maybe not the issue
> at all, but worth a shot –
>
>
>
> How does the following settings looks in your machine? Here are mine from
> a M5 system –
>
>
>
> opafm1: PXE.NicPortPxeMode.1=Enabled
>
> opafm1: PXE.NicPortPxeMode.2=Enabled
>
> opafm1: PXE.NicPortPxeMode.3=Enabled
>
> opafm1: PXE.NicPortPxeMode.4=Enabled
>
> opafm1: PXE.NicPortLegacyPxeMode.1=Disabled
>
> opafm1: PXE.NicPortLegacyPxeMode.2=Enabled
>
> opafm1: PXE.NicPortLegacyPxeMode.3=Enabled
>
> opafm1: PXE.NicPortLegacyPxeMode.4=Enabled
>
>
>
> I disabled the Legacy PXE on the install NIC
>
>
>
> Gilad Berman
> HPC Architect
> Lenovo EMEA
>
>
>
>
>
> *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
>
> *Sent:* Friday, May 12, 2017 9:33 PM
>
>
> *To:* xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> *Subject:* Re: [xcat-user] NextScale nodes not booting from the disk
>
>
>
> Well, the thing is, the node installs properly and it reboots. On the
> reboot, it's supposed to run the postbootscripts, but it never boots to the
> OS. So we tried changing the boot order to HD, and that does not seem to
> work.
>
>
>
> Damir
>
>
>
> On Fri, May 12, 2017 at 1:05 PM Nathan Harper <nathan.har...@cfms.org.uk>
> wrote:
>
> I have a similar issue with some non-IBM/Lenovo equipment. Old dx360s
> work, other new equipment boots, installs but when instructed by PXE to
> boot locally, you get no more. I have had to work around by setting boot
> order to HDD post install.
>
> Regards,
>
> Nathan
>
>
> On 12 May 2017, at 17:06, Gilad Berman <gber...@lenovo.com> wrote:
>
> There is another setting that set the boot mode to Legacy – it is the
> network PXE boot (found under network in the BIOS), make sure it is set to
> UEFI.
>
>
>
> If you can’t find it let me know and I will provide the exact ASU setting
> (I am not logged in to my lab currently).
>
>
>
> It sounds very much like an issue I had so hopefully it should solve the
> issue
>
>
>
> Sorry if I missed anything in the thread and my suggestion is stupid 
>
>
>
> 
>
> Gilad Berman
> HPC Architect
> Lenovo EMEA
>
> 
>
>
>
>
>
> *From:* Damir Krstic [mailto:damir.krs...@gmail.com
> <damir.krs...@gmail.com>]
> *Sent:* Friday, May 12, 2017 5:49 PM
> *To:* xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> *Subject:* Re: [xcat-user] NextScale nodes not booting from the disk
>
>
>
> I've removed the Legacy Only and the node hangs on the boot - just a
> cursor in the corner of the screen and nothing coming up. It never seems to
> time out
>
>
>
> Super frustrating especially since dx360 nodes are b

Re: [xcat-user] NextScale nodes not booting from the disk

2017-05-12 Thread Damir Krstic
Well, the thing is, the node installs properly and it reboots. On the
reboot, it's supposed to run the postbootscripts, but it never boots to the
OS. So we tried changing the boot order to HD, and that does not seem to
work.

Damir

On Fri, May 12, 2017 at 1:05 PM Nathan Harper <nathan.har...@cfms.org.uk>
wrote:

> I have a similar issue with some non-IBM/Lenovo equipment. Old dx360s
> work, other new equipment boots, installs but when instructed by PXE to
> boot locally, you get no more. I have had to work around by setting boot
> order to HDD post install.
>
> Regards,
> Nathan
>
> On 12 May 2017, at 17:06, Gilad Berman <gber...@lenovo.com> wrote:
>
> There is another setting that set the boot mode to Legacy – it is the
> network PXE boot (found under network in the BIOS), make sure it is set to
> UEFI.
>
>
>
> If you can’t find it let me know and I will provide the exact ASU setting
> (I am not logged in to my lab currently).
>
>
>
> It sounds very much like an issue I had so hopefully it should solve the
> issue
>
>
>
> Sorry if I missed anything in the thread and my suggestion is stupid 
>
>
>
> 
>
> Gilad Berman
> HPC Architect
> Lenovo EMEA
>
> 
>
>
>
>
>
> *From:* Damir Krstic [mailto:damir.krs...@gmail.com
> <damir.krs...@gmail.com>]
> *Sent:* Friday, May 12, 2017 5:49 PM
> *To:* xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> *Subject:* Re: [xcat-user] NextScale nodes not booting from the disk
>
>
>
> I've removed the Legacy Only and the node hangs on the boot - just a
> cursor in the corner of the screen and nothing coming up. It never seems to
> time out
>
>
>
> Super frustrating especially since dx360 nodes are booting just fine.
>
> Damir
>
> On Fri, May 12, 2017 at 6:45 AM <david_john...@brown.edu> wrote:
>
> Could you share the IMM and Boot sections from asu show on one of your
> troublesome nodes?
>
>   -- ddj
> Dave Johnson
>
> > On May 12, 2017, at 6:45 AM, Damir Krstic <damir.krs...@gmail.com>
> wrote:
> >
> > We are switching our image from stateless to stateful in a month (going
> from RH6 to RH7). Old IBM dx360 nodes are booting fine with this new image
> we have created. Our NextScale nodes are not...The OS gets installed and
> the node reboots but then it never boots from the hard drive. It times out
> at the PXE prompt and just hangs.
> >
> > I am wondering if some BIOS setting is doing this - maybe something with
> legacy mode, or boot order or something else. I've tried manually hitting
> F12 and selecting the hard drive but that did not seem to work.
> >
> > Does anyone have NextScales booting from the hard drive, and if so,
> would you mind sharing (dumping) your settings via asu show?
> >
> > Thanks,
> > Damir
> >


Re: [xcat-user] NextScale nodes not booting from the disk

2017-05-12 Thread Damir Krstic
Thanks for the reply. I don't think our issue is with PXE: the node boots
fine from PXE and gets installed. It's just that after the installation,
when it's supposed to boot from the HD, it does not.

Damir

On Fri, May 12, 2017 at 9:03 AM David D. Johnson <david_john...@brown.edu>
wrote:

> Sorry, misspoke on one point — we did take out Legacy Mode from the
> BootModes.
>
> On May 12, 2017, at 9:51 AM, David D. Johnson <david_john...@brown.edu>
> wrote:
>
> When we got our first 5465 Lenovo nodes, we changed the noderes netboot
> attribute from pxe to xnba and also changed BootModes.SystemBootMode to
> “UEFI Mode”. Even though Legacy Only is first in the boot order, they are
> still able to pxe boot just fine. That is the only difference I see; it may
> be worth a quick try to see if it makes any difference in behavior. We
> don’t have disks in any of our NextScale nodes, so I can’t try it out in
> our environment.
>
>  — ddj
>
>
> On May 12, 2017, at 9:01 AM, Damir Krstic <damir.krs...@gmail.com> wrote:
>
> Sure here it is:
>
> IMM.ForceBootToUefi=Disabled
> IMM.PXE_NextBootEnabled=Disabled
> IMM.SystemNextBootMode=Legacy
> IMM.DHCPBootPCClientPortControl=Open
> LegacySupport.ForceLegacyVideoonBoot=Enabled
> LegacySupport.InfiniteBootRetry=Disabled
> LegacySupport.BBSBoot=Enabled
> BackupBankManagement.NumberOfSuccessfulConsecutiveBoots=1
> DevicesandIOPorts.Com1ActiveAfterBoot=Enable
> DevicesandIOPorts.Com2ActiveAfterBoot=Disable
> BootModes.SystemBootMode=Legacy Mode
> BootModes.OptimizedBoot=Enabled
> BootModes.QuietBoot=Enabled
> BootOrder.BootOrder=Legacy Only=PXE Network=Hard Disk 0=Hard Disk 1
> BootOrder.WolBootOrder=PXE Network=CD/DVD Rom=Hard Disk 0
> SecureBootConfiguration.SecureBootis=Disabled
>
>
> On Fri, May 12, 2017 at 6:45 AM <david_john...@brown.edu> wrote:
>
>> Could you share the IMM and Boot sections from asu show on one of your
>> troublesome nodes?
>>
>>   -- ddj
>> Dave Johnson
>>
>> > On May 12, 2017, at 6:45 AM, Damir Krstic <damir.krs...@gmail.com>
>> wrote:
>> >
>> > We are switching our image from stateless to stateful in a month (going
>> from RH6 to RH7). Old IBM dx360 nodes are booting fine with this new image
>> we have created. Our NextScale nodes are not...The OS gets installed and
>> the node reboots but then it never boots from the hard drive. It times out
>> at the PXE prompt and just hangs.
>> >
>> > I am wondering if some BIOS setting is doing this - maybe something
>> with legacy mode, or boot order or something else. I've tried manually
>> hitting F12 and selecting the hard drive but that did not seem to work.
>> >
>> > Does anyone have NextScales booting from the hard drive, and if so,
>> would you mind sharing (dumping) your settings via asu show?
>> >
>> > Thanks,
>> > Damir
>> >


Re: [xcat-user] NextScale nodes not booting from the disk

2017-05-12 Thread Damir Krstic
Sure here it is:

IMM.ForceBootToUefi=Disabled
IMM.PXE_NextBootEnabled=Disabled
IMM.SystemNextBootMode=Legacy
IMM.DHCPBootPCClientPortControl=Open
LegacySupport.ForceLegacyVideoonBoot=Enabled
LegacySupport.InfiniteBootRetry=Disabled
LegacySupport.BBSBoot=Enabled
BackupBankManagement.NumberOfSuccessfulConsecutiveBoots=1
DevicesandIOPorts.Com1ActiveAfterBoot=Enable
DevicesandIOPorts.Com2ActiveAfterBoot=Disable
BootModes.SystemBootMode=Legacy Mode
BootModes.OptimizedBoot=Enabled
BootModes.QuietBoot=Enabled
BootOrder.BootOrder=Legacy Only=PXE Network=Hard Disk 0=Hard Disk 1
BootOrder.WolBootOrder=PXE Network=CD/DVD Rom=Hard Disk 0
SecureBootConfiguration.SecureBootis=Disabled
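
For comparing these settings against a node that does boot from disk, it may help to load the dump into a dictionary. The following is only a rough sketch, assuming the asu show output is plain key=value lines as pasted above and that the BootOrder value separates its entries with "=". The function names are made up for illustration and are not part of ASU or xCAT.

```python
def parse_asu_show(text):
    """Parse 'key=value' lines into a dict, splitting only on the first '='."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key] = value
    return settings


def boot_order(settings, key="BootOrder.BootOrder"):
    """Return the boot entries as an ordered list (assumed '='-separated)."""
    raw = settings.get(key, "")
    return [entry.strip() for entry in raw.split("=") if entry.strip()]


def pxe_precedes_disk(order):
    """True when a PXE entry is listed before the first hard-disk entry."""
    pxe = next((i for i, e in enumerate(order) if "PXE" in e), None)
    disk = next((i for i, e in enumerate(order) if "Hard Disk" in e), None)
    return pxe is not None and (disk is None or pxe < disk)


# Two lines taken from the dump above, used as sample input.
SAMPLE = """\
BootModes.SystemBootMode=Legacy Mode
BootOrder.BootOrder=Legacy Only=PXE Network=Hard Disk 0=Hard Disk 1
"""

if __name__ == "__main__":
    order = boot_order(parse_asu_show(SAMPLE))
    print(order)
    print("PXE before disk:", pxe_precedes_disk(order))
```

Running the same parse on a dump from a working node and diffing the two dictionaries would show which settings actually differ.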


On Fri, May 12, 2017 at 6:45 AM <david_john...@brown.edu> wrote:

> Could you share the IMM and Boot sections from asu show on one of your
> troublesome nodes?
>
>   -- ddj
> Dave Johnson
>
> > On May 12, 2017, at 6:45 AM, Damir Krstic <damir.krs...@gmail.com>
> wrote:
> >
> > We are switching our image from stateless to stateful in a month (going
> from RH6 to RH7). Old IBM dx360 nodes are booting fine with this new image
> we have created. Our NextScale nodes are not...The OS gets installed and
> the node reboots but then it never boots from the hard drive. It times out
> at the PXE prompt and just hangs.
> >
> > I am wondering if some BIOS setting is doing this - maybe something with
> legacy mode, or boot order or something else. I've tried manually hitting
> F12 and selecting the hard drive but that did not seem to work.
> >
> > Does anyone have NextScales booting from the hard drive, and if so,
> would you mind sharing (dumping) your settings via asu show?
> >
> > Thanks,
> > Damir
> >


[xcat-user] NextScale nodes not booting from the disk

2017-05-12 Thread Damir Krstic
We are switching our image from stateless to stateful in a month (going
from RH6 to RH7). Old IBM dx360 nodes are booting fine with this new image
we have created. Our NextScale nodes are not...The OS gets installed and
the node reboots but then it never boots from the hard drive. It times out
at the PXE prompt and just hangs.

I am wondering if some BIOS setting is doing this - maybe something with
legacy mode, or boot order or something else. I've tried manually hitting
F12 and selecting the hard drive but that did not seem to work.

Does anyone have NextScales booting from the hard drive, and if so, would
you mind sharing (dumping) your settings via asu show?

Thanks,
Damir


Re: [xcat-user] dracut error with stateful image

2017-05-08 Thread Damir Krstic
Just checking to see if anyone knows about these dracut error messages. We are
getting close to booting 700+ nodes with this new image and I am afraid we will
have a lot of failures because of the dracut error.

Thanks,
Damir

On Mon, May 1, 2017 at 2:23 PM Damir Krstic <damir.krs...@gmail.com> wrote:

> We are in process of building a RH 7 stateful image on our cluster. After
> setting a node to install with this image, sometimes we will see the
> following error:
>
> dracut-initqueue[736]: Warning: No carrier detected on interface eno1
>
> Usually, resetting the node fixes this issue (node installs properly). We
> are planning to take downtime in June and install all 700+ nodes with this
> image. I would like to clear up this delay / error if at all possible
> before the downtime.
>
> Has anyone seen this issue before?
>
> Thanks,
> Damir
>


[xcat-user] dracut error with stateful image

2017-05-01 Thread Damir Krstic
We are in the process of building a RH 7 stateful image on our cluster. After
setting a node to install with this image, we sometimes see the
following error:

dracut-initqueue[736]: Warning: No carrier detected on interface eno1

Usually, resetting the node fixes this issue (node installs properly). We
are planning to take downtime in June and install all 700+ nodes with this
image. I would like to clear up this delay / error if at all possible
before the downtime.

Has anyone seen this issue before?

Thanks,
Damir


[xcat-user] dracut errors when deploying a stateful image

2017-05-01 Thread Damir Krstic
We are in the process of building a stateful RH7 image. When booting, nodes
sometimes get stuck at the following screen:


Re: [xcat-user] redhat 7.3 and xCAT-2.9.1

2017-03-02 Thread Damir Krstic
Hi Christian -

Thanks - I will look at renaming the interface. As for the name, DNS is
working properly on the client. We are not stopping/disabling
NetworkManager during the install. Should we? I should also mention that
RedHat 7.1 booted and the name was set properly.
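
For the interface renaming, the suggestion quoted below is to put net.ifnames=0 into bootparams.addkcmdline. A minimal sketch of how that value could be assembled without duplicating a parameter that is already set follows. It assumes the node attribute is set with chdef; the node name compute01 is hypothetical, the command is only built and never run, and the exact chdef form should be checked against the man page for your xCAT version.

```python
def merge_kcmdline(existing, *params):
    """Append kernel parameters to an addkcmdline string, skipping duplicates."""
    tokens = existing.split()
    for param in params:
        if param not in tokens:
            tokens.append(param)
    return " ".join(tokens)


def chdef_command(node, kcmdline):
    """Build (but do not run) a chdef call that sets addkcmdline on a node."""
    return ["chdef", "-t", "node", "-o", node, f"addkcmdline={kcmdline}"]


if __name__ == "__main__":
    # 'compute01' and the existing console setting are placeholders.
    current = "console=ttyS0,115200"
    wanted = merge_kcmdline(current, "net.ifnames=0")
    print(" ".join(chdef_command("compute01", wanted)))
```

Keeping the merge separate from the command makes it easy to re-run without piling up repeated parameters.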

Thanks,
Damir

On Thu, Mar 2, 2017 at 11:16 AM Christian Caruthers <ccaruth...@lenovo.com>
wrote:

> Damir,
>
>
>
> The net device naming you're seeing is consistent net device naming.
> There's a write up for disabling it on the RHEL documentation site.
>
>
>
>
> https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Networking_Guide/sec-Disabling_Consistent_Network_Device_Naming.html
>
>
>
> I believe you can place the "net.ifnames=0" option in
> bootparams.addkcmdline
>
>
>
> If I'm not mistaken, the hostname should be set by the DHCP server and
> then hard coded using the hardeths postscript, if it's configured to run.
> Is DNS working properly on the client? Are you stopping/disabling
> NetworkManager during install?
>
>
>
> Regards,
> *Christian Caruthers*
> Lenovo xESS IT Consultant
>
> Mobile: 757-289-9872 <(757)%20289-9872>
>
>
>
>
>
> *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
> *Sent:* Thursday, March 2, 2017 11:33 AM
> *To:* xCAT Users Mailing list
> *Subject:* [xcat-user] redhat 7.3 and xCAT-2.9.1
>
>
>
> Hi All,
>
>
>
> Our management server is running RedHat 6.6 and xcat version 2.9.1.
>
>
>
> We are hoping to upgrade all of our compute clients to RedHat 7 by July of
> this year.
>
>
>
> We are going from stateless to stateful (installed on local hard drive)
> images.
>
>
>
> To that end, I've done copycds of RH7.3 ISO and have followed xCAT
> document on creating a new install image.
>
>
>
> We got one of the compute nodes installed and booted with 7.3 but there
> are couple of problems:
>
>
>
> 1. Interface is named eno1 <-- we would like to change this permanently on
> boot to eth0
>
> 2. hostname is not set <-- node boots with localhost for hostname
>
>
>
> Any help is appreciated.
>
> Damir
>


[xcat-user] redhat 7.3 and xCAT-2.9.1

2017-03-02 Thread Damir Krstic
Hi All,

Our management server is running RedHat 6.6 and xcat version 2.9.1.

We are hoping to upgrade all of our compute clients to RedHat 7 by July of
this year.

We are going from stateless to stateful (installed on local hard drive)
images.

To that end, I've done copycds of RH7.3 ISO and have followed xCAT document
on creating a new install image.

We got one of the compute nodes installed and booted with 7.3 but there are
a couple of problems:

1. Interface is named eno1 <-- we would like to change this permanently on
boot to eth0
2. hostname is not set <-- node boots with localhost for hostname

Any help is appreciated.
Damir


Re: [xcat-user] rhel7.1 working kickstart file

2017-01-23 Thread Damir Krstic
Thanks I've copied a file from /opt/xcat/share - let's see if it works.
Damir

On Mon, Jan 23, 2017 at 10:52 AM Russell Auld <russa...@comcast.net> wrote:

> You definitely need to use a RH7 compatible script.
> Look in /opt/xcat/share for sample files
>
> On Jan 23, 2017 11:25 AM, Damir Krstic <damir.krs...@gmail.com> wrote:
>
> I am trying to install RH7.1 (stateful) and the install keeps failing at
> various points of my kickstart file. I think I may be using RH6 kickstart
> file template in /install/custom/install/rh
>
> Does anyone have a working generic RH7.1 kickstart template that I can use?
>
> Thank you.
> Damir
>
>
>


[xcat-user] rhel7.1 working kickstart file

2017-01-23 Thread Damir Krstic
I am trying to install RH7.1 (stateful) and the install keeps failing at
various points of my kickstart file. I think I may be using RH6 kickstart
file template in /install/custom/install/rh

Does anyone have a working generic RH7.1 kickstart template that I can use?

Thank you.
Damir


Re: [xcat-user] statefull vs. stateless images

2017-01-13 Thread Damir Krstic
Hi Jarrod,

Thanks for the prompt answer. I agree with you re. stateless. With the next hardware
purchase we will be going stateful.

To that end, we are running the following version of xCAT:

[root@mgt rh]# rpm -qa |grep -i xcat
conserver-xcat-8.1.16-10.x86_64
xCAT-2.9.1-snap201503190326.x86_64
xCAT-genesis-base-x86_64-2.9-snap201504212134.noarch
elilo-xcat-3.14-4.noarch
xCAT-server-2.9.1-snap201503190325.noarch
grub2-xcat-1.0-2.noarch
perl-xCAT-2.9.1-snap201503190325.noarch
xCAT-buildkit-2.9.1-snap201503190326.noarch
ipmitool-xcat-1.8.11-3.x86_64
xCAT-client-2.9.1-snap201503190325.noarch
xCAT-genesis-scripts-x86_64-2.9.1-snap201503190326.noarch
syslinux-xcat-3.86-2.noarch
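
Before an upgrade it can be handy to see which xCAT packages sit at which version. In the listing above, xCAT-genesis-base is at 2.9 while the other xCAT packages are at 2.9.1. The sketch below groups the core packages by version. It assumes each line has the usual name-version-release.arch shape, and the helper names are invented for illustration.

```python
from collections import defaultdict


def parse_nvra(package):
    """Split 'name-version-release.arch' into its four parts."""
    base, arch = package.rsplit(".", 1)
    name, version, release = base.rsplit("-", 2)
    return name, version, release, arch


def xcat_versions(packages):
    """Group core xCAT package names (xCAT*, perl-xCAT) by version string."""
    by_version = defaultdict(list)
    for package in packages:
        package = package.strip()
        if not package:
            continue
        name, version, _release, _arch = parse_nvra(package)
        if name.startswith(("xCAT", "perl-xCAT")):
            by_version[version].append(name)
    return dict(by_version)


if __name__ == "__main__":
    # A few lines copied from the rpm -qa output above.
    listing = [
        "xCAT-2.9.1-snap201503190326.x86_64",
        "xCAT-genesis-base-x86_64-2.9-snap201504212134.noarch",
        "xCAT-server-2.9.1-snap201503190325.noarch",
        "elilo-xcat-3.14-4.noarch",
    ]
    for version, names in sorted(xcat_versions(listing).items()):
        print(version, names)
```

Whether a version difference like this matters for the upgrade is not something this thread settles; the grouping only makes it visible.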

I think in order to deploy a stateful version of RH7.3 we will need to
update our xCAT. What is the most painless way of upgrading from our
version to the latest stable version that supports RH 7? Are there any gotchas
or recommended practices when it comes to upgrading xCAT? Last time I had
to do this, instead of upgrading, I deployed a new xCAT server, which was
not too painful, but I don't have notes on what I had to do to get it
going.

I would much rather just upgrade the xCAT on this server because the
machine itself is not that old (2 years or so now).

Anything I should back up before attempting upgrade as well?

Thanks,
Damir


On Fri, Jan 13, 2017 at 9:10 AM Jarrod Johnson <jjohns...@lenovo.com> wrote:

> I think stateless makes a little less sense over time.
>
>
>
> 1)  Local boot storage is cheaper and more durable than it used to
> be, and this is only going to get more extreme
>
> 2)  Dynamism is probably better and more easily served by something
> like Singularity, which makes things easier for users to do their thing
> without the administrators having to accommodate.
>
> 3)  Mitigating drift can be done in other ways.  Stateless has
> traditionally had the side effect of mitigating accumulating ‘drift’ as
> people do things ad-hoc to OS images, by punishing those practices.
> Strictly speaking the same discipline can be self-imposed without downside,
> it just takes some willpower.
>
>
>
> *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
> *Sent:* Friday, January 13, 2017 9:20 AM
> *To:* xCAT Users Mailing list
> *Subject:* [xcat-user] statefull vs. stateless images
>
>
>
> We have been running our cluster using stateless images for over 6 years
> now. For the most part, things are running great. There are two reasons for
> our decision to run stateless:
>
> 1. our compute nodes originally did not have local hard drives
>
> 2. we envisioned a dynamic environment in which we would boot nodes
> frequently with different images to satisfy different research needs
>
>
>
> Today both of those points are invalid / do not apply. All of our compute
> nodes come with hard drives, and we have never really booted cluster with
> any images other than our "production" image. In addition, downtimes are
> really hard to come by in our environment, and we treat our cluster as
> production system.
>
>
>
> So, my question is, does it make sense to continue with stateless images,
> or would we be better served with statefull (installed on local disk)
> images.
>
>
>
> I question our today's method because:
>
> 1. stateless images are not trivial to build and update using genimage,
> putting mellanox drivers, gpfs etc. We don't do it often enough so every
> time we have to do it, we are re-inventing a wheel.
>
> 2. stateless images take up portion of compute node memory
>
>
>
> Are there any downsides to running a 700+ node cluster using stateful
> images? Like I said, we don't boot the cluster at all for many months at
> a time (we get a single downtime during the year), and most of the
> packages outside of normal RH installation are installed using postscripts.
>
>
>
> Let me know your thoughts.
>
>
>
> Thanks,
>
> Damir
>


Re: [xcat-user] configuring fpc

2016-06-13 Thread Damir Krstic
Thanks for the info - I was able to reset it using the battery-removal procedure.
Next time I will look for the paperclip hole - that will save me some time.

I got the FPC discovered and programmed.

Thanks all.
Damir

On Mon, Jun 13, 2016 at 10:15 AM Christian Caruthers <ccaruth...@lenovo.com>
wrote:

> You can reset the FPC to factory default. It's an ornery process. If I
> recall correctly, you remove the FPC for 10 minutes. Remove the battery
> before reinserting the FPC. Let the FPC run w/o a battery for 10 minutes.
> Remove the FPC, replace the battery and reinsert the FPC. It's important
> to respect the 10-minute guideline. I've seen where a customer was short by
> a minute or so, and the FPC did not reset to factory default.
>
>
>
> Once that is done, configfpc should be able to find the default IP.
>
>
>
> Regards,
> *Christian Caruthers*
> Lenovo xESS IT Consultant
>
> Mobile: 757-289-9872
>
>
>
>
>
> *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
> *Sent:* Monday, June 13, 2016 10:57 AM
> *To:* xCAT Users Mailing list
> *Subject:* [xcat-user] configuring fpc
>
>
>
> We got an empty N1200 from Lenovo some time back in anticipation of new
> nodes arriving this summer. In previous times, Lenovo would program the
> FPCs with some internal address as a part of their cluster solution
> (172.30.101.141 for example).
>
>
>
> This empty chassis does not have its IP recorded in the paperwork we
> received from Lenovo on delivery of the rack.
>
>
>
> I am trying to configure this FPC using configfpc command and if I try
> using the defaults: configfpc -i bond0 I get no default IP found.
>
>
>
> I see the FPC on the switch port and I see its mac:
>
> qfivebnt08#show mac-address-table interface port 42
>
>  MAC address   VLAN PortTrnk  State  Permanent  Openflow
>
>   -    ---    -  -  
>
>   6c:ae:8b:5e:56:14   142 FWD  N
>
>
>
> I also have the fpc configured for the right switch port in xCAT:
>
>
>
> [root@mgt ~]# lsdef qfpc24
>
> Object name: qfpc24
>
> bmc=qfpc24
>
> bmcpassword=PASSW0RD
>
> bmcusername=USERID
>
> cons=ipmi
>
> groups=rack-t22fpc,qfpc,all
>
> ip=172.30.11.24
>
> mgt=ipmi
>
> nodetype=qfpc
>
> postbootscripts=otherpkgs,setupntp
>
> postscripts=syslog,remoteshell,syncfiles
>
> switch=qfivebnt08
>
> switchport=42
>
> I suspect this FPC is configured with some other IP (other than default)
> but I don't know what that IP is since it's not documented. Any way of
> programing the FPC if I don't have the IP?
>
>
>
> Thanks,
>
> Damir
>


Re: [xcat-user] configuring fpc

2016-06-13 Thread Damir Krstic
OK I'll try that. Thanks,
Damir

On Mon, Jun 13, 2016 at 10:15 AM Christian Caruthers <ccaruth...@lenovo.com>
wrote:

> You can reset the FPC to factory default. It's an ornery process. If I
> recall correctly, you remove the FPC for 10 minutes. Remove the battery
> before reinserting the FPC. Let the FPC run w/o a battery for 10 minutes.
> Remove the FPC, replace the battery and reinsert the FPC. It's important
> to respect the 10-minute guideline. I've seen where a customer was short by
> a minute or so, and the FPC did not reset to factory default.
>
>
>
> Once that is done, configfpc should be able to find the default IP.
>
>
>
> Regards,
> *Christian Caruthers*
> Lenovo xESS IT Consultant
>
> Mobile: 757-289-9872
>
>
>
>
>
> *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
> *Sent:* Monday, June 13, 2016 10:57 AM
> *To:* xCAT Users Mailing list
> *Subject:* [xcat-user] configuring fpc
>
>
>
> We got an empty N1200 from Lenovo some time back in anticipation of new
> nodes arriving this summer. In previous times, Lenovo would program the
> FPCs with some internal address as a part of their cluster solution
> (172.30.101.141 for example).
>
>
>
> This empty chassis does not have its IP recorded in the paperwork we
> received from Lenovo on delivery of the rack.
>
>
>
> I am trying to configure this FPC using configfpc command and if I try
> using the defaults: configfpc -i bond0 I get no default IP found.
>
>
>
> I see the FPC on the switch port and I see its mac:
>
> qfivebnt08#show mac-address-table interface port 42
>
>  MAC address   VLAN PortTrnk  State  Permanent  Openflow
>
>   -    ---    -  -  
>
>   6c:ae:8b:5e:56:14   142 FWD  N
>
>
>
> I also have the fpc configured for the right switch port in xCAT:
>
>
>
> [root@mgt ~]# lsdef qfpc24
>
> Object name: qfpc24
>
> bmc=qfpc24
>
> bmcpassword=PASSW0RD
>
> bmcusername=USERID
>
> cons=ipmi
>
> groups=rack-t22fpc,qfpc,all
>
> ip=172.30.11.24
>
> mgt=ipmi
>
> nodetype=qfpc
>
> postbootscripts=otherpkgs,setupntp
>
> postscripts=syslog,remoteshell,syncfiles
>
> switch=qfivebnt08
>
> switchport=42
>
> I suspect this FPC is configured with some other IP (other than default)
> but I don't know what that IP is since it's not documented. Any way of
> programing the FPC if I don't have the IP?
>
>
>
> Thanks,
>
> Damir
>


[xcat-user] configuring fpc

2016-06-13 Thread Damir Krstic
We got an empty N1200 from Lenovo some time back in anticipation of new
nodes arriving this summer. In previous times, Lenovo would program the
FPCs with some internal address as a part of their cluster solution
(172.30.101.141 for example).

This empty chassis does not have its IP recorded in the paperwork we
received from Lenovo on delivery of the rack.

I am trying to configure this FPC using configfpc command and if I try
using the defaults: configfpc -i bond0 I get no default IP found.

I see the FPC on the switch port and I see its mac:

qfivebnt08#show mac-address-table interface port 42

 MAC address   VLAN PortTrnk  State  Permanent  Openflow

  -    ---    -  -  

  6c:ae:8b:5e:56:14   142 FWD  N


I also have the fpc configured for the right switch port in xCAT:


[root@mgt ~]# lsdef qfpc24
Object name: qfpc24
bmc=qfpc24
bmcpassword=PASSW0RD
bmcusername=USERID
cons=ipmi
groups=rack-t22fpc,qfpc,all
ip=172.30.11.24
mgt=ipmi
nodetype=qfpc
postbootscripts=otherpkgs,setupntp
postscripts=syslog,remoteshell,syncfiles
switch=qfivebnt08
switchport=42

I suspect this FPC is configured with some other IP (other than default)
but I don't know what that IP is since it's not documented. Is there any way of
programming the FPC if I don't have the IP?


Thanks,

Damir


[xcat-user] couldn't find the kernel file matched 2.6.32-504.16.2.el6.x86_64 in /install/netboot/rhels6.6/x86_64/compute6.6gpfs4.2/rootimg/boot at ./genimage line 72..

2016-03-21 Thread Damir Krstic
Getting following error when generating new image:

couldn't find the kernel file matched 2.6.32-504.16.2.el6.x86_64 in
/install/netboot/rhels6.6/x86_64/compute6.6gpfs4.2/rootimg/boot at
./genimage line 72..

Here is the lsdef for the image:

[root@mgt boot]# lsdef -t osimage compute6.6gpfs4.2
Object name: compute6.6gpfs4.2
exlist=/install/custom/netboot/rh/compute6.6gpfs4.2.exlist
imagetype=linux
kerneldir=/install/kernels/
kernelver=2.6.32-504.16.2.el6.x86_64
osarch=x86_64
osdistroname=rhels6.6-x86_64
osname=Linux
osvers=rhels6.6
otherpkgdir=/install/post/otherpkgs/rhels6.6/x86_64
otherpkglist=/install/custom/netboot/rh/compute6.6gpfs4.2.otherpkgs.pkglist
permission=755
pkgdir=/install/rhels6.6/x86_64
pkglist=/install/custom/netboot/rh/compute6.6gpfs4.2.pkglist
postinstall=/install/custom/netboot/rh/compute6.6gpfs4.2.postinstall
profile=compute6.6gpfs4.2
provmethod=netboot
rootimgdir=/install/netboot/rhels6.6/x86_64/compute6.6gpfs4.2
synclists=/install/custom/netboot/rh/synclist6.6gpfs4.2

Here is the listing of the directory /install/kernels:

[root@mgt boot]# ls -l /install/kernels/
total 4
drwxr-xr-x 3 root root 4096 Mar 21 09:03 2.6.32-504.16.2.el6.x86_64
[root@mgt boot]# ls -l /install/kernels/2.6.32-504.16.2.el6.x86_64/
total 88160
-rwx-- 1 root root  371 Jan 13 10:02 install.sh
-rw-r--r-- 1 root root 30526712 Apr 21  2015 kernel-2.6.32-504.16.2.el6.x86_64.rpm
-rw-r--r-- 1 root root 31244080 Apr 21  2015 kernel-debug-2.6.32-504.16.2.el6.x86_64.rpm
-rw-r--r-- 1 root root  9831692 Apr 21  2015 kernel-devel-2.6.32-504.16.2.el6.x86_64.rpm
-rw-r--r-- 1 root root 15140648 Apr 21  2015 kernel-firmware-2.6.32-504.16.2.el6.noarch.rpm
-rw-r--r-- 1 root root  3517060 Apr 21  2015 kernel-headers-2.6.32-504.16.2.el6.x86_64.rpm
drwxr-xr-x 2 root root 4096 Mar 21 09:03 repodata


I ran the createrepo command after copying the kernel files in this
directory.

Still getting the error in generating image. Any help is much appreciated.
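
The error says no kernel file for that version was found under rootimg/boot, so one thing worth checking is which kernels actually landed there. A small sketch follows. It assumes the kernel image follows the usual vmlinuz-<version> naming; what genimage itself matches on is an assumption here, and the helper names are made up for illustration.

```python
from pathlib import Path


def kernel_versions(boot_dir):
    """Return sorted versions for which a vmlinuz-<version> file exists."""
    prefix = "vmlinuz-"
    return sorted(
        path.name[len(prefix):]
        for path in Path(boot_dir).glob(prefix + "*")
        if path.is_file()
    )


def has_kernel(boot_dir, kernelver):
    """True when boot_dir holds a vmlinuz file for exactly this version."""
    return kernelver in kernel_versions(boot_dir)


if __name__ == "__main__":
    # Path and version copied from the lsdef output and error message above.
    boot = "/install/netboot/rhels6.6/x86_64/compute6.6gpfs4.2/rootimg/boot"
    wanted = "2.6.32-504.16.2.el6.x86_64"
    print("kernels present:", kernel_versions(boot))
    print("wanted kernel present:", has_kernel(boot, wanted))
```

If the list shows only the stock kernel, the kernel rpm from kerneldir was probably never installed into the image, which would narrow the problem down to the package-install step rather than the image packing.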

Thanks,

Damir


Re: [xcat-user] issue when installing node

2016-01-13 Thread Damir Krstic
It turns out it was a bad eth cable. Watching the console, I saw eth timing
out, so I replaced the cable and it worked.

thanks,
damir

On Wed, Jan 13, 2016 at 12:32 AM Xiao Peng Wang <w...@cn.ibm.com> wrote:

> From the symptom, it looks like your eth0 booted using DHCP and failed to get an IP
> from dhcpd. You may find a DHCP failure in the logs when this issue happens.
>
> Thanks
> Best Regards
> --
> Wang Xiaopeng (王晓朋)
> IBM China System Technology Laboratory
> Tel: 86-10-82453455
> Email: w...@cn.ibm.com
> Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
> Haidian District Beijing P.R.China 100193
>
>
>
> - Original message -
> From: Damir Krstic <damir.krs...@gmail.com>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
>
> Cc:
> Subject: Re: [xcat-user] issue when installing node
> Date: Wed, Jan 13, 2016 9:09 AM
>
> Yes, all deployed from a single management node using commands like: nodeadd
> gpu[1-6], same thing for makehosts and makedns. That's why it's so weird and
> confusing that only one node is having this issue.
>
> Before I left work today I redeployed this node using our compute
> stateless image and that worked ok. The node got the right name and eth
> started. Not sure what else to check. I am suspecting a DHCP issue at this
> time but cannot be sure.
> On Tue, Jan 12, 2016 at 16:17 Casandra H Qiu <cxh...@us.ibm.com> wrote:
>
> are all the 6 nodes defined on the same MN? check if the failed node
> defined in the /etc/hosts. make sure you ran makedns and makedhcp and
> compare node definition use lsdef command.
>
>
> Thanks,
> Casandra
> ...
> Casandra Hong Qiu
> Phone: (845) 433-9291, t/l 293-9291
> Office: B/002, Floor 3, Z13
> cxh...@us.ibm.com
>
>
>
>
> From: Damir Krstic <damir.krs...@gmail.com>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Date: 01/12/2016 04:37 PM
> Subject: [xcat-user] issue when installing node
> --
>
>
>
>
>
>
> Hi,
>
> When installing new batch of GPU nodes via xCAT I've ran into an issue I
> have not seen before. Out of 6 NextScale nx360m4 nodes with k80 GPUs all
> installed OK (nodeset  osimage=gpu6.6) except one.
>
> Symptoms of the issue: node installs OK (watching it via rcons/wcons) but
> on the reboot eth0 does not come up and hostname is set to
> localhost.localdomain
>
> Has anyone seen this issue before?
>
> Thanks,
>
> Damir

[xcat-user] issue when installing node

2016-01-12 Thread Damir Krstic
Hi,

While installing a new batch of GPU nodes via xCAT I ran into an issue I
have not seen before. Out of 6 NextScale nx360m4 nodes with K80 GPUs, all
installed OK (nodeset  osimage=gpu6.6) except one.

Symptoms of the issue: the node installs OK (watching it via rcons/wcons), but
on the reboot eth0 does not come up and the hostname is set to
localhost.localdomain.

Has anyone seen this issue before?

Thanks,
Damir


Re: [xcat-user] nextscale nx360m4 GPU not booting from hard drive after installation

2016-01-12 Thread Damir Krstic
They are using PXE. I have traced the issue down to the kernel upgrade: the
initial install was Red Hat 6.6 with kernel 504, and after updating the kernel
to 504.16 the node would not boot. I have to hook up a console to the node to
see what it is doing, because the remote console does not show much
information. I've reinstalled it with the 504 kernel and it is booted now.

On Tue, Jan 12, 2016 at 9:13 AM Rich Sudlow <r...@nd.edu> wrote:

> On 01/11/2016 07:11 PM, Damir Krstic wrote:
> > Hi,
> >
> > Installing 6 new GPU NextScale nodes...installation went fine but on the
> > reboot nodes get stuck after going through POST. Nothing on the console as
> > far as I can tell. Tried changing boot order to hard disk first and also
> > changing from UEFI to Legacy and that did not fix it.
> >
> > Also tried rsetboot  hd and that did not seem to fix it either. Any
> > suggestion?
>
> When these nodes build, are they using pxe or xnba? I'd recommend using
> xnba.
>
> Rich
>
>
>
> >
> > Thanks,
> > Damir
> >
> >
> >
> >
>
>
> --
> Rich Sudlow
> University of Notre Dame
> Center for Research Computing - Union Station
> 506 W. South St
> South Bend, In 46601
>
> (574) 631-7258 (office)
> (574) 807-1046 (cell)
>
>


Re: [xcat-user] issue when installing node

2016-01-12 Thread Damir Krstic
Yes, all deployed from a single management node using commands like nodeadd
gpu[1-6], and the same for makehosts and makedns. That's why it's so weird and
confusing that only one node is having this issue.

Before I left work today I redeployed this node using our compute stateless
image and that worked OK. The node got the right name and eth0 started. Not sure
what else to check. I am suspecting dhcp issue at this time but cannot be
sure.
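
One way to act on the lsdef suggestion quoted below is to dump the definition of a node that installed cleanly and of the failing one, then list only the attributes that differ. This is a minimal sketch and not something posted in the thread: diff_defs is a hypothetical helper, and it assumes lsdef prints an "Object name:" header followed by one attr=value line per attribute.

```shell
# Hypothetical helper: show attribute lines present in only one of two lsdef dumps.
# $1 and $2 are files holding saved lsdef output.
diff_defs() {
    a=$(mktemp) && b=$(mktemp) || return 1
    # Drop the per-node header so it does not show up as a difference.
    grep -v '^Object name:' "$1" | sort > "$a"
    grep -v '^Object name:' "$2" | sort > "$b"
    # comm -3 hides lines common to both files; lines unique to the
    # second file are printed with a leading tab.
    comm -3 "$a" "$b"
    rm -f "$a" "$b"
}
```

On the management node this could be fed with `lsdef gpu1 > /tmp/good.def` and `lsdef gpu6 > /tmp/bad.def` (which node is the failing one is an assumption here), followed by `diff_defs /tmp/good.def /tmp/bad.def`. Attributes such as ip and mac differ legitimately; anything else that differs is worth a closer look.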
On Tue, Jan 12, 2016 at 16:17 Casandra H Qiu <cxh...@us.ibm.com> wrote:

> Are all 6 nodes defined on the same MN? Check whether the failed node is
> defined in /etc/hosts. Make sure you ran makedns and makedhcp, and
> compare the node definitions using the lsdef command.
>
>
> Thanks,
> Casandra
> ...
> Casandra Hong Qiu
> Phone: (845) 433-9291, t/l 293-9291
> Office: B/002, Floor 3, Z13
> cxh...@us.ibm.com
>
>
>
>
> From: Damir Krstic <damir.krs...@gmail.com>
> To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
> Date: 01/12/2016 04:37 PM
> Subject: [xcat-user] issue when installing node
> --
>
>
>
>
> Hi,
>
> When installing new batch of GPU nodes via xCAT I've ran into an issue I
> have not seen before. Out of 6 NextScale nx360m4 nodes with k80 GPUs all
> installed OK (nodeset  osimage=gpu6.6) except one.
>
> Symptoms of the issue: node installs OK (watching it via rcons/wcons) but
> on the reboot eth0 does not come up and hostname is set to
> localhost.localdomain
>
> Has anyone seen this issue before?
>
> Thanks,
>
> Damir


[xcat-user] nextscale nx360m4 GPU not booting from hard drive after installation

2016-01-11 Thread Damir Krstic
Hi,

Installing 6 new GPU NextScale nodes...installation went fine but on the
reboot nodes get stuck after going through POST. Nothing on the console as
far as I can tell. Tried changing boot order to hard disk first and also
changing from UEFI to Legacy and that did not fix it.

Also tried rsetboot  hd and that did not seem to fix it either.
Any suggestion?

Thanks,
Damir


Re: [xcat-user] issue programing bmc

2015-10-06 Thread Damir Krstic
Yes, I am trying to use the shared onboard gigabit port:

[xCAT Genesis running on qhimem0004 /bin]# ipmitool raw 0xc 2 1 0xc0 0 0
 11 01

On Tue, Oct 6, 2015 at 12:36 PM Jarrod Johnson <jjohns...@lenovo.com> wrote:

> Can you do an ipmitool raw 0xc 2 1 0xc0 0 0
>
>
>
> I assume you are trying to use shared on the on board gigabit port?  If
> another configuration , let me know.
>
>
>
> *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
> *Sent:* Tuesday, October 06, 2015 10:56 AM
> *To:* xCAT Users Mailing list
> *Subject:* [xcat-user] issue programing bmc
>
>
>
> We have new x3550M5 that we just discovered. After being discovered I ran
> runcmd=bmcsetup command. I can ssh to the node and ipmitool lan print 1
> tells me the following:
>
> [xCAT Genesis running on qhimem0004 /]# ipmitool lan print 1
> Set in Progress         : Set Complete
> Auth Type Support       : NONE MD5 PASSWORD
> Auth Type Enable        : Callback :
>                         : User     : MD5 PASSWORD
>                         : Operator : MD5 PASSWORD
>                         : Admin    : MD5
>                         : OEM      :
> IP Address Source       : Static Address
> IP Address              : 172.29.10.14
> Subnet Mask             : 255.255.0.0
> MAC Address             : 40:f2:e9:bb:86:dd
> SNMP Community String   : public
> IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
> BMC ARP Control         : ARP Responses Enabled, Gratuitous ARP Disabled
> Gratituous ARP Intrvl   : 2.0 seconds
> Default Gateway IP      : 0.0.0.0
> Default Gateway MAC     : 00:00:00:00:00:00
> Backup Gateway IP       : 0.0.0.0
> Backup Gateway MAC      : 00:00:00:00:00:00
> 802.1q VLAN ID          : Disabled
> 802.1q VLAN Priority    : 0
> RMCP+ Cipher Suites     : 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
> Cipher Suite Priv Max   : Xaa
>                         :     X=Cipher Suite Unused
>                         :     c=CALLBACK
>                         :     u=USER
>                         :     o=OPERATOR
>                         :     a=ADMIN
>                         :     O=OEM
>
>
>
> The IP is correct, and for set it indicates complete. However, I can not
> telnet to this IP (qhimem0004-bmc).
>
> Here is the hosts table entry for this node:
>
> "qhimem0004-bmc","172.29.10.14",,,"qhimem0014 node bmc interface",
>
> Here is the /etc/hosts file entry:
>
> 172.29.10.14 qhimem0004-bmc.quest.it.northwestern.edu qhimem0004-bmc
>
> here is the nodels entry for the ipmi port:
>
> [root@mgt log]# nodels qhimem0004 ipmi.bmcport
>
> qhimem0004: 0
>
> When trying to telnet to this address I get no route to host. I have
> logged in to the switch itself and see mac address of the interface show up
> on the port on the switch, but not the mac of the BMC port.
>
>
>
> Any help is greatly appreciated.
>
> Thanks,
>
> Damir
>


Re: [xcat-user] issue programing bmc

2015-10-06 Thread Damir Krstic
Thank you so much, that fixed it.
Damir

On Tue, Oct 6, 2015 at 12:36 PM Jarrod Johnson <jjohns...@lenovo.com> wrote:

> Can you do an ipmitool raw 0xc 2 1 0xc0 0 0
>
>
>
> I assume you are trying to use shared on the on board gigabit port?  If
> another configuration , let me know.
>
>
>
> *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
> *Sent:* Tuesday, October 06, 2015 10:56 AM
> *To:* xCAT Users Mailing list
> *Subject:* [xcat-user] issue programing bmc
>
>
>
> We have new x3550M5 that we just discovered. After being discovered I ran
> runcmd=bmcsetup command. I can ssh to the node and ipmitool lan print 1
> tells me the following:
>
> [xCAT Genesis running on qhimem0004 /]# ipmitool lan print 1
> Set in Progress         : Set Complete
> Auth Type Support       : NONE MD5 PASSWORD
> Auth Type Enable        : Callback :
>                         : User     : MD5 PASSWORD
>                         : Operator : MD5 PASSWORD
>                         : Admin    : MD5
>                         : OEM      :
> IP Address Source       : Static Address
> IP Address              : 172.29.10.14
> Subnet Mask             : 255.255.0.0
> MAC Address             : 40:f2:e9:bb:86:dd
> SNMP Community String   : public
> IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
> BMC ARP Control         : ARP Responses Enabled, Gratuitous ARP Disabled
> Gratituous ARP Intrvl   : 2.0 seconds
> Default Gateway IP      : 0.0.0.0
> Default Gateway MAC     : 00:00:00:00:00:00
> Backup Gateway IP       : 0.0.0.0
> Backup Gateway MAC      : 00:00:00:00:00:00
> 802.1q VLAN ID          : Disabled
> 802.1q VLAN Priority    : 0
> RMCP+ Cipher Suites     : 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
> Cipher Suite Priv Max   : Xaa
>                         :     X=Cipher Suite Unused
>                         :     c=CALLBACK
>                         :     u=USER
>                         :     o=OPERATOR
>                         :     a=ADMIN
>                         :     O=OEM
>
>
>
> The IP is correct, and for set it indicates complete. However, I can not
> telnet to this IP (qhimem0004-bmc).
>
> Here is the hosts table entry for this node:
>
> "qhimem0004-bmc","172.29.10.14",,,"qhimem0014 node bmc interface",
>
> Here is the /etc/hosts file entry:
>
> 172.29.10.14 qhimem0004-bmc.quest.it.northwestern.edu qhimem0004-bmc
>
> here is the nodels entry for the ipmi port:
>
> [root@mgt log]# nodels qhimem0004 ipmi.bmcport
>
> qhimem0004: 0
>
> When trying to telnet to this address I get no route to host. I have
> logged in to the switch itself and see mac address of the interface show up
> on the port on the switch, but not the mac of the BMC port.
>
>
>
> Any help is greatly appreciated.
>
> Thanks,
>
> Damir
>


Re: [xcat-user] issue programing bmc

2015-10-06 Thread Damir Krstic
I'll try a power cycle. Thanks for the suggestion. I wish I could fully
understand the bmcsetup process. For example, if I manually run bmcsetup
out of the /bin directory, it runs to completion and lights up the blue light
on the front of the server. However, the node's destiny never changes from
bmcsetup, the programmed IP is not pingable, and telnet does not work.

I'll try the power cycle as you suggested. I just wish there were other
troubleshooting steps I could take to see where I stand with this node. I
tried tcpdump from the management node and I don't see any traffic with the
MAC of the shared eth interface come across the management BMC interface.

Thanks,
Damir
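
As one more troubleshooting step of the kind asked for here, the values the BMC reports can be pulled out of the ipmitool output and compared with what the management node and the switch see. This is a minimal sketch, assuming the "label : value" layout of the lan print output quoted below; lan_field is a hypothetical helper and not part of ipmitool or xCAT.

```shell
# Hypothetical helper: print the value of one field from `ipmitool lan print` output.
# stdin: the lan print output; $1: exact field label, e.g. "IP Address".
lan_field() {
    # Split on "<spaces>: " so that colons inside a MAC address are left alone,
    # and require an exact label match so "IP Address" does not also match
    # "IP Address Source".
    awk -F' *: ' -v key="$1" '$1 == key { print $2; exit }'
}
```

From the Genesis shell this could be used as `ipmitool lan print 1 | lan_field "MAC Address"`. The result can then be compared with the MAC the switch has learned on that port and with the address recorded in /etc/hosts.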

On Tue, Oct 6, 2015 at 12:23 PM David D Johnson <david_john...@brown.edu>
wrote:

> If I remember correctly, changing the IP address does not take effect
> until the IMM/BMC is reset, or power-cycled.
>
> On Oct 6, 2015, at 1:03 PM, Damir Krstic <damir.krs...@gmail.com> wrote:
>
> I am not sure that's the case with these nodes. I have provisioned a few
> x3550M5s over a few days and none of them had this issue. The issue with this
> node is that I mis-provisioned it (using the wrong IP etc.), so this morning I
> cleared everything out and tried again. I see via ipmitool that the set is
> complete, but the destiny of the node never changes from bmcsetup.
>
> Is there a way to force it to reprogram?
>
> Damir
>
> On Tue, Oct 6, 2015 at 10:59 AM David D Johnson <david_john...@brown.edu>
> wrote:
>
>> My suspicion is that your IMM2 is set to use the dedicated IMM ethernet
>> port, but you intended to use the shared IMM/eth0 port instead.
>>
>> This needs to be configured using the UEFI -- hit  configuration, under
>> the tab where other IMM network settings are found.  If there is a way to
>> do this using ipmitool, I have not found it.
>>
>> If you plug a separate cable between switch and dedicated IMM port, and
>> the MAC you're looking for shows up on that switch port when you ping it,
>> you can then use ASU to change IMM.SharedNicMode from Dedicated to Shared
>> and then reboot the imm (ipmitool mc reset cold).
>> [note -- I checked this on an x3550-M4, sometimes the variables are
>> spelled differently from release to release].
>>
>> Remove the extra cable, and you should be back in business.
>>
>>  -- ddj
>> Dave Johnson
>> Brown University CCV
>>
>> On Oct 6, 2015, at 10:56 AM, Damir Krstic <damir.krs...@gmail.com> wrote:
>>
>> We have new x3550M5 that we just discovered. After being discovered I ran
>> runcmd=bmcsetup command. I can ssh to the node and ipmitool lan print 1
>> tells me the following:
>>
>> [xCAT Genesis running on qhimem0004 /]# ipmitool lan print 1
>> Set in Progress         : Set Complete
>> Auth Type Support       : NONE MD5 PASSWORD
>> Auth Type Enable        : Callback :
>>                         : User     : MD5 PASSWORD
>>                         : Operator : MD5 PASSWORD
>>                         : Admin    : MD5
>>                         : OEM      :
>> IP Address Source       : Static Address
>> IP Address              : 172.29.10.14
>> Subnet Mask             : 255.255.0.0
>> MAC Address             : 40:f2:e9:bb:86:dd
>> SNMP Community String   : public
>> IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
>> BMC ARP Control         : ARP Responses Enabled, Gratuitous ARP Disabled
>> Gratituous ARP Intrvl   : 2.0 seconds
>> Default Gateway IP      : 0.0.0.0
>> Default Gateway MAC     : 00:00:00:00:00:00
>> Backup Gateway IP       : 0.0.0.0
>> Backup Gateway MAC      : 00:00:00:00:00:00
>> 802.1q VLAN ID          : Disabled
>> 802.1q VLAN Priority    : 0
>> RMCP+ Cipher Suites     : 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
>> Cipher Suite Priv Max   : Xaa
>>                         :     X=Cipher Suite Unused
>>                         :     c=CALLBACK
>>                         :     u=USER
>>                         :     o=OPERATOR
>>                         :     a=ADMIN
>>                         :     O=OEM
>>
>>
>> The IP is correct, and for set it indicates complete. However, I can not
>> telnet to this IP (qhimem0004-bmc).

Re: [xcat-user] issue programing bmc

2015-10-06 Thread Damir Krstic
I am not sure that's the case with these nodes. I have provisioned a few
x3550M5s over a few days and none of them had this issue. The issue with this
node is that I mis-provisioned it (using the wrong IP etc.), so this morning I
cleared everything out and tried again. I see via ipmitool that the set is
complete, but the destiny of the node never changes from bmcsetup.

Is there a way to force it to reprogram?

Damir

On Tue, Oct 6, 2015 at 10:59 AM David D Johnson <david_john...@brown.edu>
wrote:

> My suspicion is that your IMM2 is set to use the dedicated IMM ethernet
> port, but you intended to use the shared IMM/eth0 port instead.
>
> This needs to be configured using the UEFI -- hit  configuration, under
> the tab where other IMM network settings are found.  If there is a way to
> do this using ipmitool, I have not found it.
>
> If you plug a separate cable between switch and dedicated IMM port, and
> the MAC you're looking for shows up on that switch port when you ping it,
> you can then use ASU to change IMM.SharedNicMode from Dedicated to Shared
> and then reboot the imm (ipmitool mc reset cold).
> [note -- I checked this on an x3550-M4, sometimes the variables are
> spelled differently from release to release].
>
> Remove the extra cable, and you should be back in business.
>
>  -- ddj
> Dave Johnson
> Brown University CCV
>
> On Oct 6, 2015, at 10:56 AM, Damir Krstic <damir.krs...@gmail.com> wrote:
>
> We have new x3550M5 that we just discovered. After being discovered I ran
> runcmd=bmcsetup command. I can ssh to the node and ipmitool lan print 1
> tells me the following:
>
> [xCAT Genesis running on qhimem0004 /]# ipmitool lan print 1
> Set in Progress         : Set Complete
> Auth Type Support       : NONE MD5 PASSWORD
> Auth Type Enable        : Callback :
>                         : User     : MD5 PASSWORD
>                         : Operator : MD5 PASSWORD
>                         : Admin    : MD5
>                         : OEM      :
> IP Address Source       : Static Address
> IP Address              : 172.29.10.14
> Subnet Mask             : 255.255.0.0
> MAC Address             : 40:f2:e9:bb:86:dd
> SNMP Community String   : public
> IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
> BMC ARP Control         : ARP Responses Enabled, Gratuitous ARP Disabled
> Gratituous ARP Intrvl   : 2.0 seconds
> Default Gateway IP      : 0.0.0.0
> Default Gateway MAC     : 00:00:00:00:00:00
> Backup Gateway IP       : 0.0.0.0
> Backup Gateway MAC      : 00:00:00:00:00:00
> 802.1q VLAN ID          : Disabled
> 802.1q VLAN Priority    : 0
> RMCP+ Cipher Suites     : 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
> Cipher Suite Priv Max   : Xaa
>                         :     X=Cipher Suite Unused
>                         :     c=CALLBACK
>                         :     u=USER
>                         :     o=OPERATOR
>                         :     a=ADMIN
>                         :     O=OEM
>
>
> The IP is correct, and for set it indicates complete. However, I can not
> telnet to this IP (qhimem0004-bmc).
>
> Here is the hosts table entry for this node:
>
> "qhimem0004-bmc","172.29.10.14",,,"qhimem0014 node bmc interface",
>
> Here is the /etc/hosts file entry:
>
> 172.29.10.14 qhimem0004-bmc.quest.it.northwestern.edu qhimem0004-bmc
>
> here is the nodels entry for the ipmi port:
>
> [root@mgt log]# nodels qhimem0004 ipmi.bmcport
>
> qhimem0004: 0
>
> When trying to telnet to this address I get no route to host. I have
> logged in to the switch itself and see mac address of the interface show up
> on the port on the switch, but not the mac of the BMC port.
>
>
> Any help is greatly appreciated.
>
> Thanks,
>
> Damir
>
>


Re: [xcat-user] frustration with booting of x3650m5

2015-08-14 Thread Damir Krstic
Hi Jarrod,

I'll try to upgrade to this firmware level in a bit. Re. meeting today,
it is 8:36 here right now. I am available any time today. Any of the options,
with the addition of Hangouts, should work for me. If screen sharing does
not work, I can create an account for you and we can do a terminal screen
share.

Thank you so much.
Damir

On Fri, Aug 14, 2015 at 7:49 AM Jarrod Johnson jjohns...@lenovo.com wrote:

 FYI, what we are using right now is:


 http://download4.boulder.ibm.com/sar/CMA/XSA/lnvgy_fw_mpt3sas_n2200-1.07_linux_32-64.bin

 (doc at
 http://download4.boulder.ibm.com/sar/CMA/XSA/lnvgy_fw_mpt3sas_n2200-1.07_linux_32-64.txt
 and changelog at
 http://download4.boulder.ibm.com/sar/CMA/XSA/lnvgy_fw_mpt3sas_n2200-1.07_linux_32-64.chg
 )



 Compared to your version, it contains a fix I have been suspecting to be
 related to your difficulties:



 Fixes:

 - Fixed issue where the system boot hangs when Legacy BIOS is disabled

   (using HII) on certain UEFI systems. (SCGCQ00637088)



 # sas3flash.x86_64 -list -c 0
 LSI Corporation SAS3 Flash Utility
 Version 07.00.00.00 (2014.08.14)
 Copyright (C) 2008-2014 LSI Corporation. All rights reserved

 Adapter Selected is a LSI SAS: SAS3008(C0)

 Controller Number : 0
 Controller : SAS3008(C0)
 PCI Address : 00:08:00:00
 SAS Address : 500605b-0-0812-5070
 NVDATA Version (Default) : 07.01.00.07
 NVDATA Version (Persistent) : 07.01.00.08
 Firmware Product ID : 0x2221 (IT)
 Firmware Version : 07.00.01.00
 NVDATA Vendor : LSI
 NVDATA Product ID : N2226 HBA
 BIOS Version : 08.15.00.00
 UEFI BSD Version : 08.00.00.00



 *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
 *Sent:* Thursday, August 13, 2015 5:43 PM


 *To:* xCAT Users Mailing list
 *Subject:* Re: [xcat-user] frustration with booting of x3650m5



 Sure call would be great

 On Thu, Aug 13, 2015 at 16:37 Jarrod Johnson jjohns...@lenovo.com wrote:

 Odd...  I don't anticipate driver issue (6.6 built in should suffice).
 I'll double check when I get back to office in 15 hours or so.  If you have
 time we can arrange a call to just look at it live...

 On Aug 13, 2015 5:07 PM, Damir Krstic damir.krs...@gmail.com wrote:

 firmware version: MPT3BIOS*8.07.01.00 (2013.11.15)

 I think in my last email I was not clear. With following disabled, system
 did boot but did not see any of the LUNs:



 DevicesandIOPorts.UEFI_Slot4 disable

 DevicesandIOPorts.UEFI_Slot1 disable

 DevicesandIOPorts.Legacy_Slot4 disable

 DevicesandIOPorts.Legacy_Slot1 disable

 So you are probably right about drivers, but I did a little bit of
 searching on the internet for the drivers I would need for these cards, and
 the hits suggest that I need the following drivers (both are loaded according
 to modprobe):

 modprobe --list |grep mptsas

 kernel/drivers/message/fusion/mptsas.ko

 [root@qstorage23 ~]# !252

 modprobe --list |grep mptctl

 kernel/drivers/message/fusion/mptctl.ko

 Also this is what's showing in lspci of the booted system with these two
 HBAs:

 06:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008
 PCI-Express Fusion-MPT SAS-3 (rev 02)

 10:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008
 PCI-Express Fusion-MPT SAS-3 (rev 02)
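
A possible cross-check at this point, offered as an assumption rather than a diagnosis: the two modules grepped above, mptsas and mptctl, belong to the older Fusion-MPT family, while SAS-3 generation controllers such as the SAS3008 are normally driven by mpt3sas (the firmware package linked earlier in this thread also carries mpt3sas in its file name). Note too that `modprobe --list` shows which modules are available, not which are bound to a device; `lspci -k` reports the driver actually in use. The helper below, driver_for, is hypothetical and only parses that output.

```shell
# Hypothetical helper: print the kernel driver bound to one PCI device.
# stdin: output of `lspci -k`; $1: PCI address as shown by lspci, e.g. 06:00.0
driver_for() {
    awk -v dev="$1" '
        # Header line of the wanted device: start collecting.
        $1 == dev { found = 1; next }
        # Detail lines are indented; an unindented line is the next device.
        found && substr($0, 1, 1) != "\t" && substr($0, 1, 1) != " " { exit }
        found && /Kernel driver in use:/ { print $NF; exit }
    '
}
```

On the booted system this would be run as `lspci -k | driver_for 06:00.0` (and again for 10:00.0). An empty result would mean no driver is bound to that HBA, which would be consistent with the LUNs not showing up.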



 On Thu, Aug 13, 2015 at 3:54 PM Jarrod Johnson jjohns...@lenovo.com
 wrote:

 I meant to query the current version…



 *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
 *Sent:* Thursday, August 13, 2015 4:48 PM


 *To:* xCAT Users Mailing list
 *Subject:* Re: [xcat-user] frustration with booting of x3650m5



 it was looping again but with what seemed more of a delay before a reboot.
 model of the sas controller is LSI3008. if you need anything else, please
 let me know. I really appreciate all your help.



 Damir



 On Thu, Aug 13, 2015 at 3:44 PM Jarrod Johnson jjohns...@lenovo.com
 wrote:

 If the symptom changed from looping to hanging, then I'll need the adapter
 model of the slot1/slot4 SAS cards to give precise guidance.  Probably a
 good idea to let me know that anyway.



 *From:* Jarrod Johnson [mailto:jjohns...@lenovo.com]
 *Sent:* Thursday, August 13, 2015 4:37 PM


 *To:* xCAT Users Mailing list
 *Subject:* Re: [xcat-user] frustration with booting of x3650m5



 Does it loop failing to boot or does it hang trying to boot?



 *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
 *Sent:* Thursday, August 13, 2015 4:26 PM
 *To:* xCAT Users Mailing list
 *Subject:* Re: [xcat-user] frustration with booting of x3650m5



 we initially disabled

 DevicesandIOPorts.EnableDisableOnboardDevices_Slot1=Disabled

 and

 DevicesandIOPorts.EnableDisableOnboardDevices_Slot4=Disabled

 With this setting system booted but did not see any LUNs.

 After your email, we re-enabled aforementioned settings and disabled
 following:

 DevicesandIOPorts.UEFI_Slot1=Disable

 DevicesandIOPorts.UEFI_Slot4=Disable

Re: [xcat-user] frustration with booting of x3650m5

2015-08-13 Thread Damir Krstic
Hi Jarrod,

Thanks so much for your reply. Here is the output of the command you
requested:

[xCAT Genesis running on qstorage24 /]# efibootmgr -v

Fatal: Couldn't open either sysfs or procfs directories for accessing EFI
variables.

Try 'modprobe efivars' as root.


I did try modprobe efivars but it tells me that module efivars is not
available.
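
The efibootmgr failure may simply mean that the Genesis image was network-booted in legacy mode, in which case the kernel exposes no EFI variables at all. One quick way to tell is to look for /sys/firmware/efi, which Linux only creates when it was started by UEFI firmware. A minimal sketch follows; boot_mode is a hypothetical helper, and its optional argument exists only so the check can be exercised against a scratch directory.

```shell
# Hypothetical helper: report whether the running kernel was booted via UEFI.
# $1: sysfs root to inspect (defaults to /sys).
boot_mode() {
    if [ -d "${1:-/sys}/firmware/efi" ]; then
        echo UEFI
    else
        echo legacy
    fi
}
```

If `boot_mode` prints legacy inside Genesis, efibootmgr -v cannot work until the node is network-booted in UEFI mode.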

Thanks,

Damir

On Thu, Aug 13, 2015 at 10:39 AM Jarrod Johnson jjohns...@lenovo.com
wrote:

 I'll try to do this through email, but may break down to a direct
 conversation (if you like).



 If you can 'nodeset shell' and boot the system to network, I'm interested
 in efibootmgr -v output.  Usually in a UEFI style boot, you'll get an
 entry like:

 Boot0009* Red Hat Enterprise Linux 6
 HD(1,800,19000,2737e48b-741f-461b-8ab1-7c7ea9ef8706)File(\EFI\redhat\grub.efi)



 If legacy booting and/or wanting the generic style options to work in a
 straightforward way without having to contend with external LUNs confusing
 things too much, the easiest thing to do would be to disable the boot
 support of the uefi/option rom for the adapter.  For example (adjust for
 your slot).



 n1: DevicesandIOPorts.UEFI_Slot1=Disable

 n1: DevicesandIOPorts.Legacy_Slot1=Disable

 *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
 *Sent:* Thursday, August 13, 2015 11:15 AM
 *To:* xCAT Users Mailing list
 *Subject:* [xcat-user] frustration with booting of x3650m5



 I just finished installing a couple of new NSD servers using RH6.6. After
 the installation they would not boot from the local hard drive(s) (LSI
 M5210 RAID1). I went into the BIOS, added the generic boot option, and added
 hard disk 0 through 4; the servers booted fine after that.

 However, after plugging in SAS cables to the DCS3700 controller and zoning
 the LUNs on the 3700, the servers are not booting (constant boot loop). I've
 added and removed the generic boot option (hdd) and I've changed the boot mode
 from UEFI to legacy and still can't get them to boot. It's been a very
 frustrating morning to say the least.



 Anyone else experience anything like this on 3650M5?



 Thanks,

 Damir



Re: [xcat-user] frustration with booting of x3650m5

2015-08-13 Thread Damir Krstic
Sure call would be great
On Thu, Aug 13, 2015 at 16:37 Jarrod Johnson jjohns...@lenovo.com wrote:

 Odd...  I don't anticipate driver issue (6.6 built in should suffice).
 I'll double check when I get back to office in 15 hours or so.  If you have
 time we can arrange a call to just look at it live...
 On Aug 13, 2015 5:07 PM, Damir Krstic damir.krs...@gmail.com wrote:

 firmware version: MPT3BIOS*8.07.01.00 (2013.11.15)
 I think in my last email I was not clear. With following disabled, system
 did boot but did not see any of the LUNs:

 DevicesandIOPorts.UEFI_Slot4 disable

 DevicesandIOPorts.UEFI_Slot1 disable

 DevicesandIOPorts.Legacy_Slot4 disable

 DevicesandIOPorts.Legacy_Slot1 disable

 So you are probably right about drivers, but I did a little bit of
 searching on the internet for the drivers I would need for these cards, and
 the hits suggest that I need the following drivers (both are loaded according
 to modprobe):

 modprobe --list |grep mptsas

 kernel/drivers/message/fusion/mptsas.ko

 [root@qstorage23 ~]# !252

 modprobe --list |grep mptctl

 kernel/drivers/message/fusion/mptctl.ko

 Also this is what's showing in lspci of the booted system with these two
 HBAs:

 06:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008
 PCI-Express Fusion-MPT SAS-3 (rev 02)

 10:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008
 PCI-Express Fusion-MPT SAS-3 (rev 02)

 On Thu, Aug 13, 2015 at 3:54 PM Jarrod Johnson jjohns...@lenovo.com
 wrote:

 I meant to query the current version…



 *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
 *Sent:* Thursday, August 13, 2015 4:48 PM


 *To:* xCAT Users Mailing list
 *Subject:* Re: [xcat-user] frustration with booting of x3650m5



 it was looping again but with what seemed more of a delay before a reboot.
 model of the sas controller is LSI3008. if you need anything else, please
 let me know. I really appreciate all your help.



 Damir



 On Thu, Aug 13, 2015 at 3:44 PM Jarrod Johnson jjohns...@lenovo.com
 wrote:

 If the symptom changed from looping to hanging, then I'll need the adapter
 model of the slot1/slot4 SAS cards to give precise guidance.  Probably a
 good idea to let me know that anyway.



 *From:* Jarrod Johnson [mailto:jjohns...@lenovo.com]
 *Sent:* Thursday, August 13, 2015 4:37 PM


 *To:* xCAT Users Mailing list
 *Subject:* Re: [xcat-user] frustration with booting of x3650m5



 Does it loop failing to boot or does it hang trying to boot?



 *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
 *Sent:* Thursday, August 13, 2015 4:26 PM
 *To:* xCAT Users Mailing list
 *Subject:* Re: [xcat-user] frustration with booting of x3650m5



 we initially disabled

 DevicesandIOPorts.EnableDisableOnboardDevices_Slot1=Disabled

 and

 DevicesandIOPorts.EnableDisableOnboardDevices_Slot4=Disabled

 With this setting system booted but did not see any LUNs.

 After your email, we re-enabled aforementioned settings and disabled
 following:

 DevicesandIOPorts.UEFI_Slot1=Disable

 DevicesandIOPorts.UEFI_Slot4=Disable

 DevicesandIOPorts.Legacy_Slot1=Disable

 DevicesandIOPorts.Legacy_Slot4=Disable

 And system now is again not booting.

 Thanks,

 Damir



 On Thu, Aug 13, 2015 at 3:06 PM Jarrod Johnson jjohns...@lenovo.com
 wrote:

 Did you disable the slots or just the 'Legacy' and 'UEFI' items?  The
 'Legacy' and 'UEFI' items control how it can boot, but:

 x1: DevicesandIOPorts.EnableDisableOnboardDevices_Slot2=Enable

 If that is not 'Enable' for the slot, then the OS won't see it either.



 So that should be 'Enable', and the other two things should be 'Disable'
 for easiest scenario.  Is this currently the situation?



 *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
 *Sent:* Thursday, August 13, 2015 3:59 PM


 *To:* xCAT Users Mailing list
 *Subject:* Re: [xcat-user] frustration with booting of x3650m5



 that worked in terms of getting the nsd server to boot but since both sas
 controllers are disabled, i can't see the LUNs. So how do I get this server
 to boot AND see all the LUNs i.e. have SAS HBAs enabled?



 thanks,

 Damir



 On Thu, Aug 13, 2015 at 2:29 PM Jarrod Johnson jjohns...@lenovo.com
 wrote:

 So first is to get the asu utility:

 https://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=LNVO-ASU



 There is 'pasu' frontend in latest latest xCAT that will work with that
 package.  If installing that rpm, try:

 pasu nodename show all



 As an example



 Failing that:

 asu64 --host immhostorip --username USERID --password Passw0rdhere show
 oll



 So I have a x3650 M5 here, I don't know which slots are installed where in
 yours.

 (edited)

 # pasu x1 show all|grep Slot
 x1: DevicesandIOPorts.UEFI_Slot1=Enable
 x1: DevicesandIOPorts.UEFI_Slot2=Enable
 x1: DevicesandIOPorts.UEFI_Slot3=Enable
 x1: DevicesandIOPorts.UEFI_Slot4=Enable
 x1: DevicesandIOPorts.UEFI_Slot5=Enable
 x1: DevicesandIOPorts.UEFI_Slot9=Enable
 x1: DevicesandIOPorts.Legacy_Slot1=Enable

Re: [xcat-user] frustration with booting of x3650m5

2015-08-13 Thread Damir Krstic
firmware version: MPT3BIOS*8.07.01.00 (2013.11.15)
I think in my last email I was not clear. With following disabled, system
did boot but did not see any of the LUNs:

DevicesandIOPorts.UEFI_Slot4 disable

DevicesandIOPorts.UEFI_Slot1 disable

DevicesandIOPorts.Legacy_Slot4 disable

DevicesandIOPorts.Legacy_Slot1 disable

So you are probably right about drivers but...I did little bit of searching
on the internet for the drivers I would need for these cards, and hits
suggest that I need following drivers (both are loaded according to
modprobe):

modprobe --list |grep mptsas

kernel/drivers/message/fusion/mptsas.ko

[root@qstorage23 ~]# !252

modprobe --list |grep mptctl

kernel/drivers/message/fusion/mptctl.ko

Also this is what's showing in lspci of the booted system with these two
HBAs:

06:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008
PCI-Express Fusion-MPT SAS-3 (rev 02)

10:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS3008
PCI-Express Fusion-MPT SAS-3 (rev 02)
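
A note on the check above: `modprobe --list` only lists module files available for the running kernel; it does not show whether a module is loaded or bound to a device. As far as I know, the SAS3008 is normally driven by mpt3sas rather than mptsas, so it is worth confirming which driver the HBAs actually use. A minimal sketch, assuming `lspci -k` from pciutils is available on the node; the helper names `driver_in_use` and `module_loaded` are made up for illustration:

```shell
#!/bin/sh
# Print the kernel driver bound to a device, given `lspci -k` output on stdin.
# Prints nothing if no driver is bound.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Succeed if the named module is loaded (not merely present on disk).
# Reads /proc/modules-style text on stdin; the module name is the first field.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Example usage on the booted node (PCI addresses from the lspci output above):
#   lspci -k -s 06:00.0 | driver_in_use
#   module_loaded mpt3sas < /proc/modules && echo "mpt3sas is loaded"
```

If `driver_in_use` prints nothing for 06:00.0 and 10:00.0, no driver has claimed the HBAs, which would explain the missing LUNs independently of the boot settings.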

On Thu, Aug 13, 2015 at 3:54 PM Jarrod Johnson jjohns...@lenovo.com wrote:

 I meant to query the current version…




Re: [xcat-user] frustration with booting of x3650m5

2015-08-13 Thread Damir Krstic
we initially disabled

DevicesandIOPorts.EnableDisableOnboardDevices_Slot1=Disabled

and

DevicesandIOPorts.EnableDisableOnboardDevices_Slot4=Disabled

With this setting system booted but did not see any LUNs.

After your email, we re-enabled aforementioned settings and disabled
following:

DevicesandIOPorts.UEFI_Slot1=Disable

DevicesandIOPorts.UEFI_Slot4=Disable

DevicesandIOPorts.Legacy_Slot1=Disable

DevicesandIOPorts.Legacy_Slot4=Disable

And system now is again not booting.

Thanks,

Damir

On Thu, Aug 13, 2015 at 3:06 PM Jarrod Johnson jjohns...@lenovo.com wrote:

 Did you disable the slots or just the 'Legacy' and 'UEFI' items?  The
 'Legacy' and 'UEFI' items control how it can boot, but:

 x1: DevicesandIOPorts.EnableDisableOnboardDevices_Slot2=Enable

 If that is not 'Enable' for the slot, then the OS won't see it either.



 So that should be 'Enable', and the other two things should be 'Disable'
 for easiest scenario.  Is this currently the situation?



 *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
 *Sent:* Thursday, August 13, 2015 3:59 PM


 *To:* xCAT Users Mailing list
 *Subject:* Re: [xcat-user] frustration with booting of x3650m5



 that worked in terms of getting the nsd server to boot but since both sas
 controllers are disabled, i can't see the LUNs. So how do I get this server
 to boot AND see all the LUNs i.e. have SAS HBAs enabled?



 thanks,

 Damir



 On Thu, Aug 13, 2015 at 2:29 PM Jarrod Johnson jjohns...@lenovo.com
 wrote:

 So first is to get the asu utility:

 https://www-947.ibm.com/support/entry/portal/docdisplay?lndocid=LNVO-ASU



 There is a 'pasu' frontend in the latest xCAT that will work with that
 package.  If installing that rpm, try:

 pasu nodename show all



 As an example



 Failing that:

 asu64 --host immhostorip --username USERID --password Passw0rdhere show all



 So I have a x3650 M5 here, I don't know which slots are installed where in
 yours.

 (edited)

 # pasu x1 show all|grep Slot
 x1: DevicesandIOPorts.UEFI_Slot1=Enable
 x1: DevicesandIOPorts.UEFI_Slot2=Enable
 x1: DevicesandIOPorts.UEFI_Slot3=Enable
 x1: DevicesandIOPorts.UEFI_Slot4=Enable
 x1: DevicesandIOPorts.UEFI_Slot5=Enable
 x1: DevicesandIOPorts.UEFI_Slot9=Enable
 x1: DevicesandIOPorts.Legacy_Slot1=Enable
 x1: DevicesandIOPorts.Legacy_Slot2=Enable
 x1: DevicesandIOPorts.Legacy_Slot3=Enable
 x1: DevicesandIOPorts.Legacy_Slot4=Enable
 x1: DevicesandIOPorts.Legacy_Slot5=Enable
 x1: DevicesandIOPorts.Legacy_Slot9=Enable



 So let's say that my SAS hba was in slot5:

 # pasu x1 set DevicesandIOPorts.Legacy_Slot5 Disable

 # pasu x1 set DevicesandIOPorts.UEFI_Slot5 Disable



 Rinse and repeat per relevant HBA.
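
The "rinse and repeat" step can be sketched as a small loop. This only prints the `pasu` commands so they can be reviewed first; the function name `hba_boot_disable_cmds` is made up for illustration, and slots 1 and 4 are the HBA slots mentioned elsewhere in this thread, not a general default:

```shell
#!/bin/sh
# Print the pasu commands that disable the Legacy and UEFI boot option ROMs
# for each given slot. Nothing is executed here.
hba_boot_disable_cmds() {
    node=$1; shift
    for slot in "$@"; do
        for mode in Legacy UEFI; do
            echo "pasu $node set DevicesandIOPorts.${mode}_Slot${slot} Disable"
        done
    done
}

# Node x1 with SAS HBAs in slots 1 and 4.
hba_boot_disable_cmds x1 1 4
```

Once the printed commands look right, the output can be piped to `sh` on the management node. The slots themselves stay enabled, so the OS should still see the adapters.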



 UEFI style boot is meant to simplify this scenario, but this is a way to
 make things back to as simple as they were before external block devices
 start mucking about.

 *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
 *Sent:* Thursday, August 13, 2015 2:59 PM


 *To:* xCAT Users Mailing list
 *Subject:* Re: [xcat-user] frustration with booting of x3650m5



 not sure how to do that - i have listed all of the asu options on the
 server and disabling non-raid sas adapters i am not sure how to do? do you
 have an example of 3650m5 with sas hba cards boot manager options and
 settings?



 thanks,

 damir



 On Thu, Aug 13, 2015 at 12:43 PM Jarrod Johnson jjohns...@lenovo.com
 wrote:

 Ok, so it must have bios booted then….



 How about using asu to disable the boot firmware for the non-RAID SAS
 adapters?  That may simplify things back down to reason.
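
The efibootmgr failure quoted below is what one would expect after a legacy (BIOS) boot, since the kernel only exposes EFI variables when it was started by UEFI firmware. A minimal sketch for checking this directly, assuming a Linux sysfs layout; the function name `boot_mode` and its optional argument are made up for illustration:

```shell
#!/bin/sh
# Report how the running system was booted. The kernel only creates
# /sys/firmware/efi when it was started by UEFI firmware.
boot_mode() {
    sysfs_root=${1:-/sys}   # overridable so the check can be tried on a fake tree
    if [ -d "$sysfs_root/firmware/efi" ]; then
        echo UEFI
    else
        echo BIOS
    fi
}

boot_mode
```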



 *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
 *Sent:* Thursday, August 13, 2015 12:26 PM


 *To:* xCAT Users Mailing list

 *Subject:* Re: [xcat-user] frustration with booting of x3650m5



 Hi Jarrod,



 Thanks so much for your reply. Here is the output of the command you
 requested:



 [xCAT Genesis running on qstorage24 /]# efibootmgr -v

 Fatal: Couldn't open either sysfs or procfs directories for accessing EFI
 variables.

 Try 'modprobe efivars' as root.



 I did try modprobe efivars but it tells me that module efivars is not
 available.

 Thanks,

 Damir



 On Thu, Aug 13, 2015 at 10:39 AM Jarrod Johnson jjohns...@lenovo.com
 wrote:

 I'll try to do this through email, but may break down to a direct
 conversation (if you like).



 If you can 'nodeset shell' and boot the system to network, I'm interested
 in efibootmgr -v output.  Usually in a UEFI style boot, you'll get an
 entry like:

 Boot0009* Red Hat Enterprise Linux 6
 HD(1,800,19000,2737e48b-741f-461b-8ab1-7c7ea9ef8706)File(\EFI\redhat\grub.efi)



 If legacy booting and/or wanting the generic style options to work
 straightforward way without having to contend with external LUNs confusing
 things too much, the easiest thing to do would be to disable the boot
 support of the uefi/option rom for the adapter.  For example (adjust for
 your slot).



 n1: DevicesandIOPorts.UEFI_Slot1=Disable

 n1: DevicesandIOPorts.Legacy_Slot1=Disable

Re: [xcat-user] frustration with booting of x3650m5

2015-08-13 Thread Damir Krstic
it was looping again but with what seemed more of a delay before a reboot.
model of the sas controller is LSI3008. if you need anything else, please
let me know. I really appreciate all your help.

Damir

On Thu, Aug 13, 2015 at 3:44 PM Jarrod Johnson jjohns...@lenovo.com wrote:

 If the symptom changed from looping to hanging, then I'll need the adapter
 model of the slot1/slot4 SAS cards to give precise guidance.  Probably a
 good idea to let me know that anyway.



 *From:* Jarrod Johnson [mailto:jjohns...@lenovo.com]
 *Sent:* Thursday, August 13, 2015 4:37 PM


 *To:* xCAT Users Mailing list
 *Subject:* Re: [xcat-user] frustration with booting of x3650m5



 Does it loop failing to boot or does it hang trying to boot?




[xcat-user] x3650 M5 Kickstart fails with no disks found

2015-08-10 Thread Damir Krstic
We are installing couple of brand new x3650 M5 servers using RHEL6.2
kickstart file. File has not been modified from default in any way.
Installation fails with no disks found even though we did configure RAID1
in bios.

Has anyone seen this issue?

Thanks,
Damir
--
___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user


Re: [xcat-user] x3650 M5 Kickstart fails with no disks found

2015-08-10 Thread Damir Krstic
Thanks Rich. Installing RH6.6 worked for us.

Damir

On Mon, Aug 10, 2015 at 12:27 PM Rich Sudlow r...@nd.edu wrote:

 On 08/10/2015 11:23 AM, Damir Krstic wrote:
  We are installing couple of brand new x3650 M5 servers using RHEL6.2 kickstart
  file. File has not been modified from default in any way. Installation fails
  with no disks found even though we did configure RAID1 in bios.

  Has anyone seen this issue?

 I don't remember specifically this issue but we have had similar issues with
 hardware especially processors not being supported in older versions of RHEL
 (Like 6.2).  I'd suggest trying a newer version of RHELS like 6.6 or 6.7.

 Rich



 
  Thanks,
  Damir
 
 
 
 


 --
 Rich Sudlow
 University of Notre Dame
 Center for Research Computing - Union Station
 506 W. South St
 South Bend, In 46601

 (574) 631-7258 (office)
 (574) 807-1046 (cell)




[xcat-user] systemimager-server missing on sourceforge

2015-07-29 Thread Damir Krstic
We are hoping to use image clone to deploy our stateful nodes (gpu). Trying
to install systemimager-server is giving us an error:

Downloading Packages:

https://sourceforge.net/projects/xcat/files/yum/xcat-dep/rh6/x86_64/perl-AppConfig-1.52-4.noarch.rpm:
[Errno 12] Timeout on
http://master.dl.sourceforge.net/project/xcat/yum/xcat-dep/rh6/x86_64/perl-AppConfig-1.52-4.noarch.rpm:
(28, 'connect() timed out!')

Trying other mirror.

https://sourceforge.net/projects/xcat/files/yum/xcat-dep/rh6/x86_64/systemconfigurator-2.2.11-1.noarch.rpm:
[Errno 14] PYCURL ERROR 7 - couldn't connect to host

Trying other mirror.


I am guessing this is related to the sourceforge outage from last week. Is
there another way of installing required packages?


Thanks,

Damir


Re: [xcat-user] best way to populate nodepos table

2015-07-10 Thread Damir Krstic
Thank you all so much.
On Fri, Jul 10, 2015 at 09:40 Jarrod Johnson jjohns...@lenovo.com wrote:

  Oh, btw, does nodels nodes vpd.serial give you what you expect?



 If you want to do it, there's nothing particularly well suited built in, but
 you can make a script that would do it:



 # rinv n2-n4 serial|grep System | sed -e 's/^/nodech /' -e 's/: System Serial Number: / nodepos.comments=/'

 nodech n3 nodepos.comments=06CAWPX

 nodech n4 nodepos.comments=06CAWPY

 nodech n2 nodepos.comments=06CAWPW



 Redirect that to a shell script and the shell script should do its thing.
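
The same pipeline can be wrapped in a function and tried on sample text before pointing it at real nodes. This is only a sketch: the function name `serial_to_nodech` is made up, and the sample input is reconstructed from the sed pattern and the output quoted above, not captured from a real `rinv` run:

```shell
#!/bin/sh
# Turn `rinv <noderange> serial` output into nodech commands, one per node.
serial_to_nodech() {
    grep 'System Serial Number' |
        sed -e 's/^/nodech /' -e 's/: System Serial Number: / nodepos.comments=/'
}

# Sample input; on a management node this would instead be:
#   rinv n2-n4 serial | serial_to_nodech > set_serials.sh
serial_to_nodech <<'EOF'
n3: System Serial Number: 06CAWPX
n4: System Serial Number: 06CAWPY
n2: System Serial Number: 06CAWPW
EOF
```

Matching on the full "System Serial Number" label rather than just "System" avoids picking up other inventory lines that happen to contain that word.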



[xcat-user] best way to populate nodepos table

2015-07-10 Thread Damir Krstic
As of today we are not using nodepos table for anything. I started
experimenting and while adding nodepos.rack, nodepos.chassis, and
nodepos.height values are easy to add using nodech command, things that are
different between nodes are not that easy to enter without a lot of manual
data entry.

My table currently looks like this:

#node,rack,u,chassis,slot,room,height,comments,disable

qnode5001,w22,1,qfpc01,left,,1u,,

qnode5002,w22,1,qfpc01,right,,1u,,

qnode5003,w22,2,qfpc01,left,,1u,,

qnode5004,w22,2,qfpc01,right,,1u,,


w22 is the floor position of the rack, slot is u 1 in the rack and left or
right indicates where in the chassis node resides.

Is there any way to automate slot and u position using nodech command?
Also, under comments in this table I would like to read-in rinv serial
number command but I am unsure how to do that.

Any help would be greatly appreciated.

Thanks,

Damir
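
For the u and slot columns, one option is to derive them from the node number and generate the nodech commands. A minimal sketch, assuming the pattern in the four rows above continues (two nodes per U, odd-numbered node on the left, even on the right); the function name `nodepos_cmds` is made up for illustration:

```shell
#!/bin/sh
# Print nodech commands that fill nodepos.u and nodepos.slot from the node number.
# Assumes two nodes per U: first node of each pair on the left, second on the right.
# Node numbers must not have leading zeros (shell arithmetic would read them as octal).
nodepos_cmds() {
    prefix=$1
    first=$2
    last=$3
    n=$first
    while [ "$n" -le "$last" ]; do
        idx=$((n - first + 1))      # 1-based position within the rack
        u=$(( (idx + 1) / 2 ))      # positions 1,2 -> U1; 3,4 -> U2; ...
        if [ $((idx % 2)) -eq 1 ]; then slot=left; else slot=right; fi
        echo "nodech ${prefix}${n} nodepos.u=${u} nodepos.slot=${slot}"
        n=$((n + 1))
    done
}

nodepos_cmds qnode 5001 5004
```

The commands are only printed, so the mapping can be checked against the physical layout before anything is written to the table.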


[xcat-user] how to exclude some ofed packages being installed with genimage

2015-06-29 Thread Damir Krstic
We are following this guide to install OFED in our compute image and it's
been working great (
http://sourceforge.net/p/xcat/wiki/Managing_the_Mellanox_Infiniband_Network/
).

We just heard from our customer and they would like to remove MPI versions
that come installed with OFED. Is there a way to specify what to
exclude/uninstall from OFED during the genimage command?

Thanks,
Damir


[xcat-user] NextScale deployment kernel crash

2015-06-25 Thread Damir Krstic
We are trying to boot NextScale nodes with our RedHat 6.4 stateless image.
They are crashing during the initrd boot process with following error:

dracut Warning: No root device 1 found


dracut Warning: Boot has failed. To debug this issue add rdshell to the
kernel command line.


dracut Warning: Signal caught!



dracut Warning: Boot has failed. To debug this issue add rdshell to the
kernel command line.

Kernel panic - not syncing: Attempted to kill init!

Pid: 1, comm: init Tainted: G   --- H
2.6.32-358.el6.x86_64 #1

Call Trace:

 [8150cfc8] ? panic+0xa7/0x16f

 [81073ae2] ? do_exit+0x862/0x870

 [81182885] ? fput+0x25/0x30

 [81073b48] ? do_group_exit+0x58/0xd0

 [81073bd7] ? sys_exit_group+0x17/0x20

 [8100b072] ? system_call_fastpath+0x16/0x1b

[ cut here ]

WARNING: at arch/x86/kernel/smp.c:117
native_smp_send_reschedule+0x5c/0x60() (Tainted: G
--- H )

Hardware name: IBM NeXtScale nx360 M5: -[5465AC1]-

Modules linked in: sd_mod crc_t10dif ahci mlx4_core [last unloaded:
scsi_wait_scan]

Pid: 1, comm: init Tainted: G   --- H
2.6.32-358.el6.x86_64 #1

Call Trace:

 IRQ  [8106e2e7] ? warn_slowpath_common+0x87/0xc0

 [8106e33a] ? warn_slowpath_null+0x1a/0x20

 [8102dd9c] ? native_smp_send_reschedule+0x5c/0x60

 [8105ae28] ? scheduler_tick+0x208/0x260

 [810a7fd0] ? tick_sched_timer+0x0/0xc0

 [810811de] ? update_process_times+0x6e/0x90

 [810a8036] ? tick_sched_timer+0x66/0xc0

 [8109b38e] ? __run_hrtimer+0x8e/0x1a0

 [810a182f] ? ktime_get_update_offsets+0x4f/0xd0

 [8107700f] ? __do_softirq+0x11f/0x1e0

 [8109b6f6] ? hrtimer_interrupt+0xe6/0x260

 [81516d7b] ? smp_apic_timer_interrupt+0x6b/0x9b

 [8100bb93] ? apic_timer_interrupt+0x13/0x20

 EOI  [8150d06d] ? panic+0x14c/0x16f

 [8150cffa] ? panic+0xd9/0x16f

 [81073ae2] ? do_exit+0x862/0x870

 [81182885] ? fput+0x25/0x30

 [81073b48] ? do_group_exit+0x58/0xd0

 [81073bd7] ? sys_exit_group+0x17/0x20

 [8100b072] ? system_call_fastpath+0x16/0x1b


Any help would be appreciated.


Thanks,

Damir


Re: [xcat-user] NextScale deployment kernel crash

2015-06-25 Thread Damir Krstic
We just got it working by building RedHat 6.5 image. During boot we see it
using tg3 driver.

Thanks,
Damir

On Thu, Jun 25, 2015 at 12:38 PM Jarrod Johnson jjohns...@lenovo.com
wrote:

  What nic driver was built in the initrd?  m4 was igb, m5 uses tg3.



  extra unusable Ethernet ports on the motherboard that mess up the
 interface naming. Is there a workaround for this???



 I'm interested in what this means and if I can help on that.



 *From:* David Johnson [mailto:david_john...@brown.edu]
 *Sent:* Thursday, June 25, 2015 11:30 AM
 *To:* xCAT Users Mailing list
 *Subject:* Re: [xcat-user] NextScale deployment kernel crash



 Yes, we are seeing exactly the same problem. 300 nodes from nehalem to
 nextscale m4 all work fine with the same centos 6.5 image, but not so for
 the Lenovo nextscale M5 nodes. They seem to have extra unusable
 Ethernet ports on the motherboard that mess up the interface naming. Is
 there a workaround for this???

   -- ddj

 Dave Johnson



[xcat-user] how to remove service node

2015-05-08 Thread Damir Krstic
What is the proper procedure for removing an xCAT service node from xCAT? We
have 3 service nodes in production right now, and I am planning on retiring
one of them in the next couple of weeks.

None of the compute nodes in the cluster are set to boot from this service
node any longer.

Any help is appreciated.
Damir


Re: [xcat-user] deploying new xcat management node

2015-04-30 Thread Damir Krstic
Hi Christian,

Good to hear from you - I hope you are doing well.

Existing mgt node is running RH5.3 and xCAT 2.7.  New management node will
run RH6.5 and xCAT 2.9.

Thanks,
Damir

On Wed, Apr 29, 2015 at 9:22 AM Christian Caruthers ccaruth...@lenovo.com
wrote:

  Damir,



 What version are you currently running? Will the new MN run the latest
 xCAT version?



 The main question would be if the newer version of xCAT can import tables
 from an older version. One possible way around this is to install your
 current xCAT version on a VM, import your tables, upgrade xCAT on the VM,
 and export the tables from there for import into your new management node.
 This avoids touching your existing (working) MN and should provide you
 tables with all the right fields that the new version will recognize when
 they're imported.
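The VM route described above could look roughly like the following dry-run sketch. It only prints the commands rather than running them. dumpxCATdb and restorexCATdb are the stock xCAT table export/import commands; the directory names are hypothetical, and the xCAT upgrade on the VM is left as a comment because the exact steps depend on the versions involved.

```shell
# Dry-run sketch: print the command sequence for the VM-based table
# migration. Nothing is executed against xCAT.
# $1 and $2 are dump directories; the defaults are hypothetical paths.
migration_plan() {
    cat <<EOF
# 1. existing MN: export the current tables (read-only)
dumpxCATdb -p ${1:-/tmp/xcat-tables-old}
# 2. scratch VM running the same xCAT version as the existing MN: import
restorexCATdb -p ${1:-/tmp/xcat-tables-old}
# 3. scratch VM: upgrade xCAT to the target version, then export again
dumpxCATdb -p ${2:-/tmp/xcat-tables-new}
# 4. new MN: import the upgraded tables
restorexCATdb -p ${2:-/tmp/xcat-tables-new}
EOF
}
```

Calling migration_plan prints the four steps in order. The dump directory from step 1 has to be copied to the VM, and the one from step 3 to the new management node, by whatever means is convenient.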



 That said, I can't remember the last time I had a problem upgrading xCAT
 with working tables in place. Still, I haven't upgraded from something like
 2.6, or earlier, to 2.9!



 If your new MN uses different network interfaces (i.e. if the old MN had
 the compute network on eth0 and the new one has it on eth1) make sure you
 update the networks table as well as possibly the site table
 (dhcpinterfaces) and possibly, though not likely, the nics and hosts tables.



 Regards,
 *Christian Caruthers*
 Senior Consultant - System x Linux HPC
 Mobile: 757-289-9872



 *From:* Damir Krstic [mailto:damir.krs...@gmail.com]
 *Sent:* Wednesday, April 29, 2015 9:18 AM
 *To:* xCAT Users Mailing list
 *Subject:* [xcat-user] deploying new xcat management node



 We are planning on deploying a new management node on our iDataPlex
 cluster soon.  I've asked if there is a document that outlines migrating to
 new management node and was pointed to this document:



 http://sourceforge.net/p/xcat/wiki/Setup_HA_Mgmt_Node_With_Shared_Data/



 However, I don't think this applies since we are going from RH5.3 on the
 existing management node to RH6.5 and the xcat versions will also be
 different.



 Has anyone migrated from one management node to a new management node with
 different OS and xCAT versions?



 Damir



Re: [xcat-user] problem with bmc programming

2015-04-17 Thread Damir Krstic
Christian, I'll try your suggestions. Thanks.

Daniel, the switch does not show anything connected to that port.
Damir
On Fri, Apr 17, 2015 at 04:31 Daniel Letai d...@letai.org.il wrote:

 What does the switch show as connected to that port?

 On Thu, Apr 16, 2015 at 10:12 PM, Christian Caruthers
 ccaruth...@lenovo.com wrote:
  Damir,
 
 
 
  I can think of 3 troubleshooting routes:
 
 
 
  1. Load factory defaults, boot the system to the genesis kernel (nodeset
  NODE shell) and run bmcsetup
 
  2. Pull power from the box and plug it back in to reboot the IMM.
 
  3. Create Bootable Media Creator thumb drive and force it to flash the IMM.
 
  If none of that works, you might need to open a service call to replace the
  system board. Pull a DSA because they'll probably ask for it.
 
 
 
  Regards,
  Christian Caruthers
  Senior Consultant - System x Linux HPC
  Mobile: 757-289-9872
 
 
 
  From: Damir Krstic [mailto:damir.krs...@gmail.com]
  Sent: Wednesday, April 15, 2015 2:22 PM
  To: xCAT Users Mailing list
  Subject: [xcat-user] problem with bmc programming
 
 
 
  one of our new nodes was just provisioned and I am having an issue
  programming bmc.  we are using dedicated imm port on this 3650m4 server.
  imm port is plugged in to a switch with single vlan.
 
 
 
  imm interface is configured with following settings:
 
  IP Address Source   : Static Address
 
  IP Address  : 172.29.9.1
 
  Subnet Mask : 255.255.0.0
 
  MAC Address : 40:f2:e9:cd:bf:df
 
  SNMP Community String   : public
 
  Here is the picture of the actual imm settings in the uefi
 
 
 
  i can't ping/telnet this interface at all.  tcpdump basically shows me that
  the management node is asking who has the mac address of this node.  i have
  logged in to the switch itself and this mac is not showing in the mac table
  on the switch.
 
  other interfaces (non imm) that are configured on this server and plugged in
  to the same switch function properly and are accessible with ssh/telnet etc.
 
  any help is appreciated.
 
  damir
 
 
 


[xcat-user] migrating to a new management node

2014-12-08 Thread Damir Krstic
We are hoping to retire our original management node in next couple of
months.  Is there a documented way to migrate from existing production xCAT
management node to a brand new one?

Thanks,
Damir


Re: [xcat-user] trying to install new 3650m4 eth0 link is not ready

2014-07-17 Thread Damir Krstic
Would this work?

nodech quser10 noderes.installnic=


On Thu, Jul 17, 2014 at 10:28 AM, Jarrod Johnson jarrod.b.john...@gmail.com
 wrote:

 What happens if you blank installnic?  If not set it will autodetect and
 the result may surprise you. I recommend never setting installnic or
 primarynic on x86 anymore, since the autodetect works as desired 99.9% of
 the time.
 On Jul 17, 2014 10:05 AM, Damir Krstic damir.krs...@gmail.com wrote:

 we have 4 new login nodes that i am trying to deploy in the next couple of
 days.  they were autodiscovered (have mac in the mac table) and i am
 trying to install them now:

 nodeset quser10 install

 the installation stops at the following:

 NetworkManager: eth0 link is not ready
 eth0 deactivating device
 (screenshot included)

 lsdef of the node itself:

 Object name: quser10
     arch=x86_64
     bmc=quser10-bmc
     bmcpassword=PASSW0RD
     bmcport=0
     bmcusername=USERID
     currchain=boot
     currstate=install rhels6.2-x86_64-user6
     groups=user6,user6-profile,ipmi,bnt103-user6,x3650m2,all
     initrd=xcat/rhels6.2/x86_64/initrd.img
     installnic=eth0
     ip=172.20.4.10
     kcmdline=nofb utf8 ks=http://172.20.0.1/install/autoinst/quser10 ksdevice=eth0 console=tty0 console=ttyS0,115200 noipv6
     kernel=xcat/rhels6.2/x86_64/vmlinuz
     mac=40:f2:e9:ce:e2:8a
     mgt=ipmi
     mtm=7914AC1
     netboot=pxe
     nfsserver=172.20.0.1
     os=rhels6.2
     postbootscripts=otherpkgs,setupntp
     postscripts=syslog,remoteshell,syncfiles,syslog-adminnodes,ssh,ifcfg-eth,fstab,passwd,statefull_tasks6,ipoib
     primarynic=eth0
     profile=user6
     provmethod=install
     serial=06ATFXT
     serialport=0
     serialspeed=115200
     status=configuring
     statustime=07-16-2014 14:29:10
     supportedarchs=x86,x86_64
     switch=bnt103
     switchinterface=eth0
     switchport=1
     switchvlan=1
     tftpserver=172.20.0.1
     xcatmaster=172.20.0.1


 xcat version:
 [root@mgt rh]# xcatconfig --version
 Version 2.7.3 (svn r13117, built Mon Jun 18 05:12:28 EDT 2012)

 We will be deploying a new management node with updated xCAT as soon as
 the login nodes are provisioned.
 Thanks in advance for your help.




Re: [xcat-user] trying to install new 3650m4 eth0 link is not ready

2014-07-17 Thread Damir Krstic
I did the above nodech command and then did nodeset quser10 install and it
still timed out with same message.

damir




Re: [xcat-user] trying to install new 3650m4 eth0 link is not ready

2014-07-17 Thread Damir Krstic
i think i figured it out - in the

/tftpboot/pxelinux.cfg/quser10 i changed ksdevice from eth0 to
bootif...when the node finished installing eth2 was configured with the
node ip address...

what i think happened is - since these nodes have daughter eth card, linux
saw the card that was plugged in as eth2 instead of eth0...telling pxe
configuration file not to use eth0 seemed to have worked.  i just tested it
on another login node and it works.
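A minimal sketch of the edit described above, assuming the node file lives under /tftpboot/pxelinux.cfg/ and carries a ksdevice= option on its APPEND line. The function name is made up for illustration. Note that running nodeset again will most likely regenerate the file and undo a manual edit.

```shell
# Rewrite whatever ksdevice= value is present to ksdevice=bootif.
# $1 is the path to the node's pxelinux config file.
set_ksdevice_bootif() {
    # -i.bak edits in place and keeps a backup copy next to the file
    sed -i.bak 's/ksdevice=[^ ]*/ksdevice=bootif/' "$1"
}

# Example (path taken from the message above):
#   set_ksdevice_bootif /tftpboot/pxelinux.cfg/quser10
```

The backup file (quser10.bak in the example) makes it easy to diff against the generated version or to roll back.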


thanks,

damir




Re: [xcat-user] trying to install new 3650m4 eth0 link is not ready

2014-07-17 Thread Damir Krstic
yes - i got around it by editing node file under /tftpboot/pxelinux.cfg
(see previous email)...instead of ksdevice=eth0 i did
ksdevice=bootif...this worked.  you are absolutely right that since this
node has additional card with 2 10GbE ports in it, RH was confusing eth0
for something else.  node is booted/installed now and eth2 is configured
with node's ip address.
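One possible reason ksdevice=bootif helps: when the pxelinux config contains IPAPPEND 2, pxelinux appends BOOTIF=<hardware type>-<MAC> for the interface it booted from to the kernel command line, and the installer can then pick the NIC by MAC address instead of by name. The sketch below builds that string from a MAC address. The function name is hypothetical, and whether the quser10 config carries IPAPPEND 2 is not shown in this thread.

```shell
# Build the BOOTIF value pxelinux would append for a given MAC address:
# "01" (Ethernet hardware type) followed by the MAC, lower-cased, with
# dashes instead of colons.
mac_to_bootif() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F:' 'a-f-')"
}

# Example with the MAC from the lsdef output in this thread:
#   mac_to_bootif 40:f2:e9:ce:e2:8a
```

Because the MAC identifies the physical port, the result does not depend on whether the installer names that port eth0 or eth2.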

i tried jarrod's suggestion of blanking out installnic= but did not try
blanking out primarynic...i'll try that on another login node later today.
 i have 4 to deploy and 2 are already done by editing ksdevice
statement...i'll try other two by blanking out both installnic and
primarynic
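For reference, blanking both attributes before re-running nodeset might look like the sketch below. It only prints the commands. The installnic form is the one already used earlier in this thread; extending it to noderes.primarynic is an assumption that the same syntax applies to that column.

```shell
# Print (not run) the commands to clear both NIC overrides for a node
# and regenerate its install configuration. $1 is the node name.
blank_nic_cmds() {
    echo "nodech $1 noderes.installnic= noderes.primarynic="
    echo "nodeset $1 install"
}

# Example:
#   blank_nic_cmds quser10
```

Checking the regenerated file under /tftpboot/pxelinux.cfg afterwards shows which ksdevice value the autodetection ended up with.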


On Thu, Jul 17, 2014 at 11:34 AM, Christian Caruthers 
christian.caruth...@us.ibm.com wrote:

 Under /tftpboot/pxelinux.cfg there should be a file named for the node
 you're trying to install. This file contains the kickstart boot command
 that's passed to the system in response to its PXE request. Can you send
 the contents of that file? Also, does this node have 10Gb ports, or any
 additional PCI Ethernet cards in it? If so, Red Hat more than likely sees
 port 1 on this card as eth0 while the system BIOS (or uEFI or whatever)
 sees the planar port 1 as eth0. Clearing out installnic and primarynic helps
 get around this. Where your install is failing, the network device that
 NetworkManager is trying to initialize is dictated by the ksdevice option in
 the file I mentioned above.

 Regards,
 * Christian Caruthers*
 Senior Consultant - System x Linux HPC
 Mobile: 757-289-9872
 *Find me on LinkedIn*
 http://www.linkedin.com/profile/view?id=14378571trk=tab_pro




Re: [xcat-user] problem installing new node

2013-11-27 Thread Damir Krstic
version is Version 2.7.3 (svn r13117, built Mon Jun 18 05:12:28 EDT 2012)

trying it now without installnic and primarynic specified.


On Tue, Nov 26, 2013 at 4:04 PM, Ling Gao ling...@us.ibm.com wrote:

 What version of xCAT are you using?  (xdsh -V)
 Can you change  netboot=xnba and change installnic and primarynic to
 empty?  Then run nodeset again and redeploy the node.

 Ling




[xcat-user] problem installing new node

2013-11-26 Thread Damir Krstic
We have a couple of new x3550m4 nodes that are not installing.  After the BMC
has been programmed and the nodes have been set to install, the PXE boot
process never goes beyond serving pxelinux.0 (please see the log file below):

Nov 26 14:43:12 mgt dhcpd: DHCPACK on 172.20.7.1 to 40:f2:e9:0d:e2:64 via bond0
Nov 26 14:43:12 mgt atftpd[10629]: Serving pxelinux.0 to 172.20.7.1:1929
Nov 26 14:43:12 mgt atftpd[10629]: tsize option - 13148
Nov 26 14:43:12 mgt atftpd[10629]: blksize option - 1468
Nov 26 14:43:12 mgt atftpd[10629]: Server thread exiting
Nov 26 14:43:12 mgt atftpd[10629]: Serving pxelinux.0 to 172.20.7.1:1930
Nov 26 14:43:12 mgt atftpd[10629]: blksize option - 1468
Nov 26 14:43:12 mgt atftpd[10629]: Server thread exiting
Nov 26 14:43:13 mgt atftpd[10629]: Serving pxelinux.0 to 172.20.7.1:1931
Nov 26 14:43:13 mgt atftpd[10629]: blksize option - 1468
Nov 26 14:43:13 mgt atftpd[10629]: Server thread exiting

Here is the tcpdump from the management node when this happens:
14:33:20.626124 IP (tos 0x0, ttl  64, id 50528, offset 0, flags [none],
proto: UDP (17), length: 68) new-node.informatik-lm  mgt node: [udp
sum ok]  40 RRQ pxelinux.0 octet tsize 0 blksize 1468

in the /tftpboot/pxelinux.cfg directory we have a directory that
corresponds to the hex of the ip for the new node:

[root@mgt pxelinux.cfg]# ls -lrt AC140701
lrwxrwxrwx 1 root root 9 Nov 26 09:28 AC140702 -> ttlogin01

here is the content of the file:
[root@mgt pxelinux.cfg]# cat ttlogin01
#install rhels6.2-x86_64-ttlogin6
DEFAULT xCAT
LABEL xCAT
 KERNEL xcat/rhels6.2/x86_64/vmlinuz
 APPEND initrd=xcat/rhels6.2/x86_64/initrd.img repo=http://172.20.0.1/install/rhels6.2/x86_64/ ks=http://172.20.0.1/install/autoinst/ttlogin01 ksdevice=eth0 cmdline console=tty0 console=ttyS0,115200
 IPAPPEND 2

For some reason, tftpboot process never proceeds to the pxelinux.cfg
directory after pxelinux.0 is served.
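For context, pxelinux looks for its config file under pxelinux.cfg/ by MAC-based name first, then by the client IP written as eight uppercase hex digits, then by progressively shorter prefixes of that hex string, and finally "default". A small sketch of the IP-to-hex conversion (the function name is made up):

```shell
# Convert a dotted-quad IPv4 address to the uppercase hex name that
# pxelinux looks up under pxelinux.cfg/.
ip_to_hex() {
    # Split on dots in a subshell so the caller's IFS and arguments
    # are left untouched.
    (
        IFS=.
        set -- $1
        printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
    )
}

# Example:
#   ip_to_hex 172.20.7.1
```

For 172.20.7.1 this gives AC140701, while the link shown in the listing above is AC140702, which is what 172.20.7.2 would map to. That may just be a copy/paste slip, but it could be worth checking that a link matching the node's actual IP exists.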

Stateless nodes on this cluster boot fine so I think our tftpboot
environment is OK.  It's just these two nodes that have to be installed
that are problematic.

Any help is appreciated.

Thanks,
Damir.


Re: [xcat-user] problem installing new node

2013-11-26 Thread Damir Krstic
Verified the static/dynamic overlapping and that does not seem to be a
problem.  Also double checked the ip/mac and they are not
duplicated/overlapping.

Here is the lsdef of one of the problematic nodes:
Object name: ttlogin01
arch=x86_64
bmc=ttlogin01-bmc
bmcpassword=PASSW0RD
bmcusername=USERID
currchain=boot
currstate=install rhels6.2-x86_64-ttlogin6
groups=ttlogin6,ttlogin6-profile,ipmiB,x3650m2,ttlogin,all
initrd=xcat/rhels6.2/x86_64/initrd.img
installnic=eth0
ip=172.20.7.1
kcmdline=nofb utf8 ks=http://172.20.0.1/install/autoinst/ttlogin01 ksdevice=eth0 console=tty0 console=ttyS0,115200 noipv6
kernel=xcat/rhels6.2/x86_64/vmlinuz
mac=40:f2:e9:0d:e2:64
mgt=ipmi
mtm=7914AC1
netboot=pxe
nfsserver=172.20.0.1
os=rhels6.2
postbootscripts=otherpkgs,setupntp
postscripts=syslog,remoteshell,syncfiles
primarynic=eth0
profile=ttlogin6
provmethod=install
serial=KQ0GV1M
serialport=0
serialspeed=115200
status=installing
statustime=11-26-2013 12:18:42
supportedarchs=x86,x86_64
switch=bnt101
switchinterface=eth0
switchport=29
switchvlan=1
tftpserver=172.20.0.1
xcatmaster=172.20.0.1
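
The duplicate ip/mac check mentioned above can also be scripted against saved lsdef output. The following is a rough Python sketch, assuming the stanza layout shown here ("Object name:" followed by attr=value lines); `parse_lsdef`, `duplicates` and the second node in the sample (ttlogin02 and its values) are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def parse_lsdef(text):
    """Parse lsdef-style stanzas into {node: {attr: value}}.

    Assumes one attr=value pair per line; wrapped continuation lines
    that happen to contain "=" would be misread as attributes.
    """
    nodes, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Object name:"):
            current = nodes.setdefault(line.split(":", 1)[1].strip(), {})
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return nodes


def duplicates(nodes, attr):
    """Return {value: [node, ...]} for attr values shared by several nodes."""
    seen = defaultdict(list)
    for name, attrs in nodes.items():
        if attr in attrs:
            seen[attrs[attr].lower()].append(name)
    return {v: names for v, names in seen.items() if len(names) > 1}


# Sample input: ttlogin01 is from this thread, ttlogin02 is invented.
sample = """Object name: ttlogin01
    ip=172.20.7.1
    mac=40:f2:e9:0d:e2:64
Object name: ttlogin02
    ip=172.20.7.2
    mac=40:f2:e9:0d:e2:64
"""

nodes = parse_lsdef(sample)
print("duplicate MACs:", duplicates(nodes, "mac"))
print("duplicate IPs:", duplicates(nodes, "ip"))
```

An empty result for both attributes would match what was verified by hand here.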


On Tue, Nov 26, 2013 at 3:10 PM, Russell Jones russell-l...@jonesmail.me wrote:

  Verify you do not have dynamic and static networks overlapping for that
 network definition. Also verify you have configured the correct MAC address
 for that node in xcat and do not have overlapping MACs/IPs.

 What does an lsdef for one of the problem nodes look like?



 On 11/26/2013 2:54 PM, Damir Krstic wrote:

 We have a couple of new x3550m4 nodes that are not installing.  After the
 BMC has been programmed and the nodes have been set to install, the PXE boot
 process never goes beyond serving pxelinux.0 (please see the log file
 below):

 Nov 26 14:43:12 mgt dhcpd: DHCPACK on 172.20.7.1 to 40:f2:e9:0d:e2:64 via bond0
 Nov 26 14:43:12 mgt atftpd[10629]: Serving pxelinux.0 to 172.20.7.1:1929
 Nov 26 14:43:12 mgt atftpd[10629]: tsize option - 13148
 Nov 26 14:43:12 mgt atftpd[10629]: blksize option - 1468
 Nov 26 14:43:12 mgt atftpd[10629]: Server thread exiting
 Nov 26 14:43:12 mgt atftpd[10629]: Serving pxelinux.0 to 172.20.7.1:1930
 Nov 26 14:43:12 mgt atftpd[10629]: blksize option - 1468
 Nov 26 14:43:12 mgt atftpd[10629]: Server thread exiting
 Nov 26 14:43:13 mgt atftpd[10629]: Serving pxelinux.0 to 172.20.7.1:1931
 Nov 26 14:43:13 mgt atftpd[10629]: blksize option - 1468
 Nov 26 14:43:13 mgt atftpd[10629]: Server thread exiting

  Here is the tcpdump from the management node when this happens:
  14:33:20.626124 IP (tos 0x0, ttl  64, id 50528, offset 0, flags [none], proto: UDP (17), length: 68) new-node.informatik-lm > mgt node: [udp sum ok]  40 RRQ pxelinux.0 octet tsize 0 blksize 1468

  in the /tftpboot/pxelinux.cfg directory we have a directory that
 corresponds to the hex of the ip for the new node:

  [root@mgt pxelinux.cfg]# ls -lrt AC140701
  lrwxrwxrwx 1 root root 9 Nov 26 09:28 AC140702 -> ttlogin01

  here is the content of the file:
  [root@mgt pxelinux.cfg]# cat ttlogin01
 #install rhels6.2-x86_64-ttlogin6
 DEFAULT xCAT
 LABEL xCAT
  KERNEL xcat/rhels6.2/x86_64/vmlinuz
  APPEND initrd=xcat/rhels6.2/x86_64/initrd.img repo=http://172.20.0.1/install/rhels6.2/x86_64/ ks=http://172.20.0.1/install/autoinst/ttlogin01 ksdevice=eth0 cmdline console=tty0 console=ttyS0,115200
   IPAPPEND 2

  For some reason, tftpboot process never proceeds to the pxelinux.cfg
 directory after pxelinux.0 is served.

  Stateless nodes on this cluster boot fine so I think our tftpboot
 environment is OK.  It's just these two nodes that have to be installed
 that are problematic.

  Any help is appreciated.

  Thanks,
 Damir.




Re: [xcat-user] re-discovering node after motherboard replacement

2013-10-28 Thread Damir Krstic
OK - I'll try booting from the hard drive and see if that works, but the BMC
never got programmed.  I can't reach this node with any of the rcons/rpower
commands, and if I try to telnet to its BMC port it fails.

I'll keep poking to see if there are any other errors related to programming
the BMC.
Thanks,
Damir


On Mon, Oct 28, 2013 at 9:46 AM, Jarrod B Johnson jbjoh...@us.ibm.com wrote:

 Should be ready to be nodeset to do something else.

 'standby' in this case is 'completed everything supposed to happen,
 awaiting instructions'

 If you put in hard drives with os still working:
 nodeset node boot

 if hard drive needs reinstall:
 nodeset node osimage

 If stateless:
 nodeset node netboot


 From: Damir Krstic damir.krs...@gmail.com
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
 Date: 10/28/2013 10:42 AM
 Subject: [xcat-user] re-discovering node after motherboard replacement
 --



 One of our GPU nodes had a bad motherboard and we had it replaced a few days
 ago.  After the motherboard was replaced we ran the rmnodecfg script and the
 node was re-discovered:

 mgt xCAT node discovery: qgpu0020 has been discovered

 I can see the new MAC address in the mac table.  However, we are running
 into issues reprogramming the BMC.  It never finishes.  The console screen
 displays:

 Received request to retry in a bit, will call xCAT back in amount
 seconds.

 lsdef on this node shows that the node is in standby mode (not sure what
 that means):

 chain=runcmd=bmcsetup,standby

 currchain=standby

 currstate=standby

 Here is the content of the pxelinux file for this node:

 #standby
 DEFAULT xCAT
 LABEL xCAT
  KERNEL xcat/genesis.kernel.x86_64
  APPEND initrd=xcat/genesis.fs.x86_64.gz quiet console=tty0 console=ttyS0,115200 xcatd=172.20.0.1:3001 destiny=standby nouveau.modeset=0
   IPAPPEND 2

 I hope you can help.

 Damir


[xcat-user] nodeset error

2012-08-23 Thread Damir Krstic
Hello,

We are creating a new stateless 6.2 image on our cluster, and the genimage and
packimage commands completed successfully.  However, running the nodeset
command generates the following error:

[root@qservice03 pxelinux.cfg]# nodeset qnode0002 netboot
Error: Did you run genimage before running packimage? kernel cannot be found
Error: Some nodes failed to set up netboot resources, aborting
Error: Did you run genimage before running packimage? kernel cannot be found
Error: Some nodes failed to set up netboot resources, aborting
Error: Did you run genimage before running packimage? kernel cannot be found
Error: Some nodes failed to set up netboot resources, aborting
[root@qservice03 pxelinux.cfg]#


Any help would be appreciated.

Thank you,
Damir


[xcat-user] redhat 6.2 kickstart file

2012-06-12 Thread Damir Krstic
Does anyone have a good RH 6.2 kickstart file they are successfully using
and don't mind sharing?  I tried using our 6.1 template file and it keeps
failing with syntax errors.

Any help is appreciated.

Thanks,
Damir


Re: [xcat-user] redhat 6.2 stateful image

2012-06-05 Thread Damir Krstic
thanks for the reply...files seem to have copied fine.  i see both vmlinuz
and initrd.img both in /install/rhels6.2/x86_64/images/pxeboot and in
/tftpboot/xcat/rhels6.2/x86_64.

initrd.img and vmlinuz are both different in size from rhels6.1 but i read
somewhere that redhat has switched to a new compression mechanism in 6.2 so
that may be the reason.  i'll try re-downloading the iso and running the
copycds command again, but i don't have much hope for that.
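
A quick way to re-check the files named in the reply quoted below is a small script like the following. This is only a sketch: the relative paths and the install tree come from this thread, the helper name is made up, and nodeset may well test for more than these two files, so a clean result only rules out the simplest cause.

```python
from pathlib import Path

# Boot files named in this thread, relative to the install tree.
REQUIRED = ("images/pxeboot/vmlinuz", "images/pxeboot/initrd.img")


def missing_install_files(tree):
    """Return the required boot files that are absent or empty under tree."""
    root = Path(tree)
    missing = []
    for rel in REQUIRED:
        path = root / rel
        # Treat zero-byte files as missing; a broken copy can leave them behind.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


# Install tree taken from the nodeset error message in this thread.
print(missing_install_files("/install/rhels6.2/x86_64"))
```

An empty list means both files are present and non-empty, which matches what was seen by hand here, so the cause is probably elsewhere.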

any other ideas?

thanks.
damir

On Mon, Jun 4, 2012 at 9:34 PM, Guang Cheng Li ligua...@cn.ibm.com wrote:

 HI,

 The error indicates that the files images/pxeboot/vmlinuz and
 images/pxeboot/initrd.img could not be found in the directory
 /install/rhels6.2/x86_64/. Could you check whether copycds has
 successfully copied the OS packages to /install/rhels6.2/x86_64?


 Thanks,
 -
 Li,Guang Cheng (李光成)
 IBM China System Technology Laboratory
  Email: ligua...@cn.ibm.com
 Address: Building 28, ZhongGuanCun Software Park,
  No.8, Dong Bei Wang West Road, Haidian District Beijing 100193,
 PRC


 From: Damir Krstic damir.krs...@gmail.com
 Date: 2012/06/05 04:23
 Please respond to: xCAT Users Mailing list xcat-user@lists.sourceforge.net
 To: xCAT-user@lists.sourceforge.net
 Subject: [xcat-user] redhat 6.2 stateful image

 Hi,

 I hope you can help us with an error we are encountering building a new
 RHEL 6.2 service node.  We did copycds of a 6.2 image and we
 configured/edited the appropriate tables for this node to roll out, but when
 issuing the nodeset command this is the error we get:

 [root@mgt ~]# nodeset qservice03 install
 Error: Install image not found in /install/rhels6.2/x86_64
 Error: Some nodes failed to set up install resources, aborting
 qservice03: install rhels6.2-x86_64-service6
 qservice03: install rhels6.2-x86_64-service6

 Here is the directory where I checked for stuff:

 [root@mgt x86_64]# pwd
 /tftpboot/xcat/rhels6.2/x86_64

 and here is the listing of it:

 [root@mgt x86_64]# ls -lart
 total 32812
 drwxr-xr-x 3 root root 4096 May 31 11:06 ..
 drwxr-xr-x 2 root root 4096 May 31 11:06 .
 -rw-r--r-- 1 root root  3938800 Jun  4 15:15 vmlinuz
 -rw-r--r-- 1 root root 29608959 Jun  4 15:15 initrd.img

 No idea what could be causing the error above.  I had issues downloading
 the ISO from RedHat's website, so my next step was to re-download the ISO
 and re-run copycds command.

 Any help would be appreciated.

 Thanks,
 Damir


[xcat-user] redhat 6.2 stateful image

2012-06-04 Thread Damir Krstic
Hi,

I hope you can help us with an error we are encountering building a new
RHEL 6.2 service node.  We did copycds of a 6.2 image and we
configured/edited the appropriate tables for this node to roll out, but when
issuing the nodeset command this is the error we get:

[root@mgt ~]# nodeset qservice03 install
Error: Install image not found in /install/rhels6.2/x86_64
Error: Some nodes failed to set up install resources, aborting
qservice03: install rhels6.2-x86_64-service6
qservice03: install rhels6.2-x86_64-service6

Here is the directory where I checked for stuff:

[root@mgt x86_64]# pwd
/tftpboot/xcat/rhels6.2/x86_64

and here is the listing of it:

[root@mgt x86_64]# ls -lart
total 32812
drwxr-xr-x 3 root root 4096 May 31 11:06 ..
drwxr-xr-x 2 root root 4096 May 31 11:06 .
-rw-r--r-- 1 root root  3938800 Jun  4 15:15 vmlinuz
-rw-r--r-- 1 root root 29608959 Jun  4 15:15 initrd.img

No idea what could be causing the error above.  I had issues downloading
the ISO from RedHat's website, so my next step was to re-download the ISO
and re-run copycds command.

Any help would be appreciated.

Thanks,
Damir