Hi John,
I tried to recreate the issue you are hitting on some x3550 M4 machines,
but it seems to work okay here.
I'm running rhels6.7 with the same xCAT 2.11 level you have, and manually
booting into the genesis shell works:
nodeset <node> shell
rsetboot <node> net
rpower <node> off/on
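The three commands above can be wrapped in a small script. This is only a
sketch: it assumes "off/on" means a power off followed by a power on, and the
names run, boot_genesis_shell and DRY_RUN are made up for illustration, not
part of xCAT. With DRY_RUN left at its default of 1 the commands are printed
instead of executed.

```shell
#!/bin/sh
# Sketch: boot one node into the xCAT genesis shell.
# run, boot_genesis_shell and DRY_RUN are illustrative names, not xCAT settings.

# Print the command when DRY_RUN is 1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Point the node at the genesis shell, force a network boot, power cycle it.
# Assumption: "off/on" in the mail means "off" followed by "on".
boot_genesis_shell() {
    node=$1
    run nodeset "$node" shell &&
    run rsetboot "$node" net &&
    run rpower "$node" off &&
    run rpower "$node" on
}
```

To run it for real on the management node, something like
"DRY_RUN=0 boot_genesis_shell <node>" followed by "rcons <node>" to watch
the boot would be the idea.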
On the rcons, I do see something slightly different from your output:
my machine is using ELILO. Do you have that package installed?
xCAT Network Boot Agent
iPXE 1.0.3-131028 (d603e) -- Open Source Network Boot Firmware --
http://ipxe.org
Features: HTTP HTTPS iSCSI DNS TFTP EFI
net0: 34:40:b5:b9:b9:47 using <NULL> on EFI SNP (open)
[Link:up, TX:0 TXE:0 RX:0 RXE:0]
DHCP (net0 34:40:b5:b9:b9:47)... ok
net0: 10.4.29.1/255.0.0.0 gw 10.0.0.103
Next server: 10.4.36.1
Filename: http://10.4.36.1/tftpboot/xcat/xnba/nodes/c910f04x29.uefi
http://10.4.36.1/tftpboot/xcat/xnba/nodes/c910f04x29.uefi... ok
http://10.4.36.1/tftpboot/xcat/elilo-x64.efi... ok
ELILO v3.14 for EFI/x86_64
Loading kernel /tftpboot/xcat/genesis.kernel.x86_64... done
Loading file /tftpboot/xcat/genesis.fs.x86_64.lzma...done <------- this
ended up "done" instead of 74%
[root@c910f04x36 ~]# rpm -qa | grep elilo
elilo-xcat-3.14-4.noarch
That should have been pulled in from xcat-deps.
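Since the visible difference in the console output is "done" versus "74%" on
the genesis.fs load, one thing that could be ruled out is a truncated
download. The sketch below compares the on-disk size of a boot file with the
Content-Length the HTTP server reports for it. The function names
content_length and check_served_size are made up for illustration, and it is
an assumption that the URL maps directly to the file under /tftpboot on the
management node. A matching size would not explain the hang; it would only
show the file is served complete.

```shell
#!/bin/sh
# Sketch: check that a boot file is served over HTTP at its full on-disk size.
# content_length and check_served_size are illustrative names only.

# Print the Content-Length value from HTTP response headers read on stdin.
content_length() {
    tr -d '\r' | awk 'tolower($1) == "content-length:" { print $2 }' | tail -n 1
}

# Compare the size of a local file with the size the server reports for a URL.
check_served_size() {
    file=$1
    url=$2
    local_size=$(wc -c < "$file" | tr -d ' ')
    # curl -sI sends a HEAD request and prints only the response headers.
    served_size=$(curl -sI "$url" | content_length)
    if [ "$local_size" = "$served_size" ]; then
        echo "OK $file ($local_size bytes)"
    else
        echo "MISMATCH $file: local=$local_size served=${served_size:-unknown}"
        return 1
    fi
}
```

On the management node this might be called as
"check_served_size /tftpboot/xcat/genesis.fs.x86_64.lzma http://<mn-ip>/tftpboot/xcat/genesis.fs.x86_64.lzma".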
Regards,
VICTOR K. HU
Phone: 1-845-433-9571
E-mail: [email protected]
2455 South Rd
Poughkeepsie, NY 12601-5400
United States
From: "Westlund, John A" <[email protected]>
To: xCAT Users Mailing list <[email protected]>
Date: 02/02/2016 11:51 PM
Subject: Re: [xcat-user] Failure booting genesis kernel
I get into the same state regardless of whether I’m bringing the node up
with auto-discovery or I’ve manually defined it.
Here are the processes of a node that’s been up a few minutes:
[xCAT Genesis running on (none) /]# ps -elf
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S root 1 0 1 80 0 - 2869 wait 18:59 ? 00:00:08 /bin/sh /init
1 S root 2 0 0 80 0 - 0 kthrea 18:59 ? 00:00:00 [kthreadd]
…
1 S root 446 2 0 80 0 - 0 worker 18:59 ? 00:00:00 [kthrotld/31]
1 S root 456 2 0 80 0 - 0 worker 18:59 ? 00:00:00 [kpsmoused]
1 S root 457 2 0 80 0 - 0 worker 18:59 ? 00:00:00 [usbhid_resumer]
1 S root 458 2 0 80 0 - 0 worker 18:59 ? 00:00:00 [deferwq]
5 S root 481 1 0 76 -4 - 2720 poll_s 18:59 ? 00:00:01 udevd --daemon
5 S root 563 481 0 78 -2 - 2695 poll_s 18:59 ? 00:00:00 udevd --daemon
1 S root 567 1 0 80 0 - 2869 wait 18:59 ? 00:00:00 /bin/sh /init
4 S root 569 567 0 80 0 - 5499 pause 18:59 ? 00:00:00 screen -ln
5 S root 570 569 0 80 0 - 5499 poll_s 18:59 ? 00:00:00 SCREEN -ln
4 S root 571 570 0 80 0 - 2835 n_tty_ 18:59 pts/0 00:00:00 /bin/sh
1 S root 576 2 0 80 0 - 0 worker 18:59 ? 00:00:00 [mlx4]
1 S root 640 2 0 80 0 - 0 scsi_e 18:59 ? 00:00:00 [scsi_eh_0]
1 S root 641 2 0 80 0 - 0 scsi_e 18:59 ? 00:00:00 [scsi_eh_1]
1 S root 642 2 0 80 0 - 0 scsi_e 18:59 ? 00:00:00 [scsi_eh_2]
1 S root 643 2 0 80 0 - 0 scsi_e 18:59 ? 00:00:00 [scsi_eh_3]
1 S root 644 2 0 80 0 - 0 scsi_e 18:59 ? 00:00:00 [scsi_eh_4]
1 S root 645 2 0 80 0 - 0 scsi_e 18:59 ? 00:00:00 [scsi_eh_5]
1 S root 707 2 0 99 19 - 0 ipmi_t 18:59 ? 00:00:00 [kipmi0]
4 S root 855 1 0 80 0 - 5499 pause 18:59 ? 00:00:00 screen -L -ln doxcat
5 S root 856 855 0 80 0 - 5500 poll_s 18:59 ? 00:00:00 SCREEN -L -ln doxcat
4 S root 857 856 0 80 0 - 2309 wait 18:59 pts/1 00:00:00 /bin/sh /bin/doxcat
1 S root 860 2 0 80 0 - 0 worker 18:59 ? 00:00:00 [kondemand/0]
…
1 S root 891 2 0 80 0 - 0 worker 18:59 ? 00:00:00 [kondemand/31]
5 S rpc 923 1 0 80 0 - 4744 poll_s 18:59 ? 00:00:00 rpcbind
5 S root 925 1 0 80 0 - 5837 poll_s 18:59 ? 00:00:00 rpc.statd
5 S root 930 1 0 80 0 - 16672 poll_s 18:59 ? 00:00:00 /usr/sbin/sshd
5 S root 953 1 0 80 0 - 3396 poll_s 18:59 ? 00:00:00 lldpad -d
4 S root 982 857 0 80 0 - 2280 poll_s 18:59 pts/1 00:00:00 dhclient -6 -pf /var/run/dhclient6.eth0.pid eth0 -lf /var/lib/dhclient/dhclient6.
1 S root 994 857 0 80 0 - 2309 wait 18:59 pts/1 00:00:00 /bin/sh /bin/doxcat
1 S root 995 857 0 80 0 - 2309 wait 18:59 pts/1 00:00:00 /bin/sh /bin/doxcat
1 S root 1759 1 0 80 0 - 2280 poll_s 19:00 ? 00:00:00 dhclient -cf /etc/dhclient.conf -pf /var/run/dhclient.eth0.pid eth0
5 S root 1773 1 0 80 0 - 6627 poll_s 19:00 ? 00:00:00 ntpd -g -x
5 S root 1787 481 0 78 -2 - 2719 poll_s 19:00 ? 00:00:00 udevd --daemon
1 S root 1807 2 0 80 0 - 0 kaudit 19:00 ? 00:00:00 [kauditd]
5 S root 1834 1 0 80 0 - 31077 poll_s 19:00 ? 00:00:00 /sbin/rsyslogd -c4
4 S root 2896 930 0 80 0 - 17830 - 19:06 ? 00:00:00 sshd: root@pts/2
4 S root 2924 2896 0 80 0 - 2835 wait 19:06 pts/2 00:00:00 -bash
0 S root 2959 994 0 80 0 - 1018 hrtime 19:07 pts/1 00:00:00 sleep 5
0 S root 2960 995 0 80 0 - 1018 hrtime 19:07 pts/1 00:00:00 sleep 5
0 S root 2961 857 0 80 0 - 1018 hrtime 19:07 pts/1 00:00:00 sleep 1
4 R root 2962 2924 2 80 0 - 3344 - 19:07 pts/2 00:00:00 ps -elf
From: Xiao Peng Wang [mailto:[email protected]]
Sent: Tuesday, February 2, 2016 2:17 AM
To: [email protected]
Cc: [email protected]
Subject: Re: [xcat-user] Failure booting genesis kernel
It's possible that genesis is waiting for tasks to run rather than being
dead. Could you show the output of 'ps -elf' in genesis so we can see what
processes are running?
How did you get your node into genesis? Is it a new node that entered
genesis for discovery, or did you run 'nodeset' to force the node into
genesis to run a certain task?
Thanks
Best Regards
----------------------------------------------------------------------
Wang Xiaopeng (王晓朋)
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: [email protected]
Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
Haidian District Beijing P.R.China 100193
----- Original message -----
From: "Westlund, John A" <[email protected]>
To: xCAT Users Mailing list <[email protected]>, Er Tao
Zhao/China/IBM@IBMCN
Cc:
Subject: Re: [xcat-user] Failure booting genesis kernel
Date: Tue, Feb 2, 2016 3:04 PM
I can ping and get into the node:
[xCAT Genesis running on (none) /]# ls
bin debian emergency init initqueue-finished
initqueue-timeout lib64 netroot pre-pivot pre-udev root
screenlog.0 sysroot usr
cmdline dev etc initqueue initqueue-settled lib
mount pre-mount pre-trigger proc sbin sys tmp var
This is what is running:
# lsxcatd -a
Version 2.11 (git commit 9ea36ca6163392bf9ab684830217f017193815be, built
Mon Nov 30 05:43:11 EST 2015)
This is a Management Node
dbengine=SQLite
John
From: Xiao Peng Wang [mailto:[email protected]]
Sent: Monday, February 1, 2016 10:23 PM
To: [email protected]; Er Tao Zhao
Cc: [email protected]
Subject: Re: [xcat-user] Failure booting genesis kernel
You mentioned that genesis hit a dead end. Can you ping the compute node
or log in to it? Please also run 'lsxcatd -a' to show the xCAT version.
Thanks
Best Regards
----------------------------------------------------------------------
Wang Xiaopeng (王晓朋)
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: [email protected]
Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
Haidian District Beijing P.R.China 100193
----- Original message -----
From: "Westlund, John A" <[email protected]>
To: "[email protected]" <[email protected]>
Cc:
Subject: [xcat-user] Failure booting genesis kernel
Date: Tue, Feb 2, 2016 12:07 PM
I’m trying to bring up a new system, but have run into a dead end. I get
no error message other than a blinking question mark in a diamond (some
un-assigned UTF character):
CLIENT MAC ADDR: 84 8F 69 FD 4F 28 GUID: 44454C4C 4300 1042 8054
B6C04F355631
CLIENT IP: 192.168.91.9 MASK: 255.255.240.0 DHCP IP: 10.10.1.167
GATEWAY IP: 192.168.92.54
PXE->EB:�P: 192.168.92.54
PXE->EB: !PXE at 98D2:0070, entry point at 98D2:0106
UNDI code segment 98D2:5210, data segment 9297:63B0 (586-632kB)
UNDI device is PCI 02:00.0, type DIX+802.3
546kB free base memory after PXE unload
xNBA initialising devices...ok
xCAT Network Boot Agent
iPXE 1.0.3-131028 (d603e) -- Open Source Network Boot Firmware --
http://ipxe.org
Features: HTTP HTTPS iSCSI DNS TFTP bzImage ELF PXE PXEXT
net0: 84:8f:69:fd:4f:28 using undionly on UNDI-PCI02:00.0 (open)
[Link:up, TX:0 TXE:0 RX:0 RXE:0]
DHCP (net0 84:8f:69:fd:4f:28)... ok
net0: 192.168.91.9/255.255.240.0 gw 192.168.92.54
Next server: 192.168.92.53
Filename: http://192.168.92.53/tftpboot/xcat/xnba/nets/192.168.80.0_20
http://192.168.92.53/tftpboot/xcat/xnba/nets/192.168.80.0_20... ok
http://192.168.92.53/tftpboot/xcat/genesis.kernel.x86_64... ok
http://192.168.92.53/tftpboot/xcat/genesis.fs.x86_64.lzma... 74%
I’m assuming the genesis.fs finishes loading even though it read “74%,”
and a CSI code bounces the cursor up the screen before failing.
Where should I be looking to debug this?
Thanks,
John
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user