First point:
lsdef -t osimage bellatest_node
Object name: bellatest_node
imagetype=linux
osarch=x86_64
osdistroname=rhels6.5-x86_64
osname=Linux
osvers=rhels6.5
otherpkgdir=/install/post/otherpkgs/rhels6.5/x86_64
otherpkglist=/install/configuration/worker.otherpkgs.pkglist
pkgdir=/install/rhels6.5/x86_64
pkglist=/install/configuration/worker.rhels6.x86_64.pkglist
profile=cluster
provmethod=install
synclists=/install/configuration/common.synclist,/install/configuration/worker.synclist
template=/install/configuration/worker.rhels6.x86_64.tmpl
Second point:
As far as I know, Dell has changed its NIC naming (em* instead of eth*).
First cluster (same Dell hardware, but xCAT version 2.7.3):
ifconfig -a | grep HW
Ifconfig uses the ioctl access method to get the full address information,
which limits hardware addresses to 8 bytes.
Because Infiniband address has 20 bytes, only the first 8 bytes are displayed
correctly.
Ifconfig is obsolete! For replacement check ip.
em1 Link encap:Ethernet HWaddr 84:8F:69:FE:25:8E
em2 Link encap:Ethernet HWaddr 84:8F:69:FE:25:8F
ib0 Link encap:InfiniBand HWaddr 80:00:00:03:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
Note:
em2 exists on my cluster as well, but is not used.
Second cluster (same xCAT version, but different hardware (Intel) and wiring):
ifconfig -a | grep HW
bond0 Link encap:Ethernet HWaddr 00:1E:67:3C:DD:58
eth0 Link encap:Ethernet HWaddr 00:1E:67:3C:DD:58
eth1 Link encap:Ethernet HWaddr 00:1E:67:3C:DD:59
eth2 Link encap:Ethernet HWaddr 00:1E:67:3C:DD:58
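To compare the naming scheme across clusters, the interface names can be pulled out of the same ifconfig output. This is only a minimal sketch: the function name classify_nics and the labels "em-style" and "eth-style" are made up for illustration, and the sample lines are copied from the output above.

```shell
#!/bin/sh
# Read "ifconfig -a" style lines on stdin and print each interface name
# followed by a rough label for its naming scheme.
# classify_nics and the labels are hypothetical names, not part of xCAT.
classify_nics() {
    awk '/HWaddr/ {
        scheme = "other"
        if ($1 ~ /^eth[0-9]+$/) scheme = "eth-style"
        else if ($1 ~ /^em[0-9]+$/) scheme = "em-style"
        print $1, scheme
    }'
}

# Sample lines taken from the two clusters shown above.
classify_nics <<'EOF'
em1 Link encap:Ethernet HWaddr 84:8F:69:FE:25:8E
em2 Link encap:Ethernet HWaddr 84:8F:69:FE:25:8F
bond0 Link encap:Ethernet HWaddr 00:1E:67:3C:DD:58
eth0 Link encap:Ethernet HWaddr 00:1E:67:3C:DD:58
EOF
```

On a live node the input would come from "ifconfig -a | classify_nics" instead of the sample lines.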
From: Lissa Valletta <[email protected]>
Reply-To: xCAT Users Mailing list <[email protected]>
Date: Thursday, 15 May 2014 16:16
To: xCAT Users Mailing list <[email protected]>
Cc: xCAT Users Mailing list <[email protected]>
Subject: Re: [xcat-user] xCAT - Node reboot problem
So those changes only affect postscript handling, and you are not getting
that far.
Run lsdef -t osimage bellatest_node and let us see how the image is defined.
The other thing I wonder about is that your ifconfig output shows em* where we
usually have eth* defined. I am not sure we have ever tested that. You
said you had other machines working. Does ifconfig -a on those machines also
show em* instead of eth*?
Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102
From: De Giorgi Jean-Claude <[email protected]>
To: xCAT Users Mailing list <[email protected]>
Date: 05/15/2014 09:56 AM
Subject: Re: [xcat-user] xCAT - Node reboot problem
________________________________
OK,
I changed useflowcontrol value to "no".
I'm not sure I understand: do I have to change the value of
precreatemypostscripts to 0 (zero) as well?
Yes, rcons works, and so does the BMC console.
After changing useflowcontrol (but not precreatemypostscripts), I
ran the commands you gave me and got a kernel panic at boot (see
attached picture).
Thanks,
Jean-Claude
From: Lissa Valletta <[email protected]>
Reply-To: xCAT Users Mailing list <[email protected]>
Date: Thursday, 15 May 2014 15:20
To: xCAT Users Mailing list <[email protected]>
Subject: Re: [xcat-user] xCAT - Node reboot problem
One other thing: change the following. It may be what is causing your problems,
and why you needed precreatemypostscripts set.
You have this in the site table; we put it there by default. Change it to "no".
"useflowcontrol","yes"
chtab key=useflowcontrol site.value=no
When you change the node definitions, rerun nodeset:
nodeset boot2 osimage=bellatest_node
rsetboot boot2 net
rpower boot2 boot
Does rcons work for you? It might help to watch the console.
Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102
From: De Giorgi Jean-Claude <[email protected]>
To: xCAT Users Mailing list <[email protected]>
Date: 05/15/2014 08:46 AM
Subject: Re: [xcat-user] xCAT - Node reboot problem
________________________________
Dear Lissa,
Thank you for your reply.
Here are the requested details:
lsdef b002:
Object name: b002
arch=x86_64
bmc=b002bmc
currchain=boot
currstate=boot
groups=all,worker,ipmi
initrd=xcat/osimage/bellatest_node/initrd.img
installnic=em1
ip=172.30.0.2
kcmdline=quiet repo=http://172.30.0.1:80/install/rhels6.5/x86_64 ks=http://172.30.0.1:80/install/autoinst/b002 ksdevice=em1
kernel=xcat/osimage/bellatest_node/vmlinuz
mac=00:23:ae:ee:b7:2d
mgt=ipmi
netboot=pxe
nfsserver=172.30.0.1
os=rhels6.5
postbootscripts=otherpkgs,route_node_set.sh,bella_install_QLogic.sh,nfs_node_mount.sh,ganglia_node_config.sh,munge,slurm
postscripts=syslog,remoteshell,syncfiles,bella_config_bios_C6220.sh,bella_config_services.sh
power=ipmi
primarynic=em1
profile=cluster
provmethod=bellatest_node
status=booting
statustime=05-15-2014 14:22:02
tftpserver=172.30.0.1
updatestatus=synced
updatestatustime=05-14-2014 16:27:49
xcatmaster=172.30.0.1
tabdump site
#key,value,comments,disable
"blademaxp","64",,
"domain","cluster",,
"fsptimeout","0",,
"installdir","/install",,
"ipmimaxp","64",,
"ipmiretries","3",,
"ipmitimeout","2",,
"consoleondemand","no",,
"master","172.30.0.1",,
"forwarders","128.178.15.8,128.178.15.7",,
"nameservers","172.30.0.1",,
"maxssh","8",,
"ppcmaxp","64",,
"ppcretry","3",,
"ppctimeout","0",,
"powerinterval","0",,
"syspowerinterval","0",,
"sharedtftp","1",,
"SNsyncfiledir","/var/xcat/syncfiles",,
"nodesyncfiledir","/var/xcat/node/syncfiles",,
"tftpdir","/tftpboot",,
"xcatdport","3001",,
"xcatiport","3002",,
"xcatconfdir","/etc/xcat",,
"timezone","Europe/Zurich",,
"useNmapfromMN","no",,
"enableASMI","no",,
"db2installloc","/mntdb2",,
"databaseloc","/var/lib",,
"sshbetweennodes","ALLGROUPS",,
"dnshandler","ddns",,
"vsftp","n",,
"cleanupxcatpost","no",,
"dhcplease","43200",,
"useflowcontrol","yes",,
"dhcpinterfaces","em1",,
"ntpservers","172.30.0.1",,
"precreatemypostscripts","1",,
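To check a single setting such as useflowcontrol without reading the whole dump, the CSV output can be filtered. A minimal sketch, assuming the "key","value",, layout shown above; the helper name site_value is hypothetical, and the two sample rows are copied from this dump.

```shell
#!/bin/sh
# Print the value of one key from "tabdump site" style CSV read on stdin.
# site_value is a hypothetical helper name, not an xCAT command.
# Assumes the key contains no regular-expression metacharacters.
site_value() {
    key=$1
    # Match "key","value" at the start of a line and keep only the value.
    sed -n 's/^"'"$key"'","\([^"]*\)".*$/\1/p'
}

# Sample rows taken from the site table above.
site_value useflowcontrol <<'EOF'
"dhcplease","43200",,
"useflowcontrol","yes",,
EOF
```

On the management node the input would come from "tabdump site | site_value useflowcontrol".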
Thank you.
Jean-Claude
From: Lissa Valletta <[email protected]>
Reply-To: xCAT Users Mailing list <[email protected]>
Date: Thursday, 15 May 2014 14:33
To: xCAT Users Mailing list <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [xcat-user] xCAT - Node reboot problem
I will warn you that we do not test or support Red Hat 6.5 until our next
release, xCAT 2.8.4. You say you have working nodes with the same OS and xCAT
level, though.
Could you give us lsdef boot2 and a tabdump site?
Is this a new installation? You might check the setup against this doc; it is
good for all x-series installations.
https://sourceforge.net/apps/mediawiki/xcat/index.php?title=XCAT_iDataPlex_Cluster_Quick_Start
Lissa K. Valletta
8-3/B10
Poughkeepsie, NY 12601
(tie 293) 433-3102
From: De Giorgi Jean-Claude <[email protected]>
To: "[email protected]" <[email protected]>
Date: 05/15/2014 08:21 AM
Subject: [xcat-user] xCAT - Node reboot problem
________________________________
Hello all,
After a node has been reinstalled, it won't boot its OS.
The installation works well, but when the node reboots, I can see on the remote
console (BMC) that it gets stuck at this message:
Trying to load: pxelinux.cfg/AC1E0002 OK
Booting from local disk…
PXE-M0F: Exiting Intel Boot Agent.
I can see in the nodelist table that the status of this node is "failed":
tabdump nodelist
#node,groups,status,statustime,appstatus,appstatustime,primarysn,hidden,updatestatus,updatestatustime,comments,disable
"b002","all,worker,ipmi","failed","05-15-2014 09:31:22",,,,,"synced","05-14-2014 16:27:49",,
"b002bmc","bmc",,,,,,,,,,
The /var/log/httpd/error_log on the MN shows these errors:
[Wed May 14 16:33:57 2014] [error] [client 172.30.0.2] File does not exist: /tftpboot/mypostscripts/mypostscript.b002
[Wed May 14 16:33:57 2014] [error] [client 172.30.0.2] File does not exist: /install/rhels6.5/x86_64/images/updates.img
The first error has been cleared by adding the following option:
chdef -t site -o clustersite precreatemypostscripts=1
This gave me a kickstart-like file in /tftpboot/mypostscripts.
(I don't know whether that was important, but at least the error message is
gone.)
Regarding the second error, I checked our other clusters with the same xCAT
version and OS: this file doesn't exist there either, and neither does the
error message.
The funny thing is that if I force the next boot from the hard disk
(rsetboot b002 hd), the node boots the OS normally.
So my only clues are this error message and the state of the node recorded in
the nodelist table.
I cannot find any log files.
To me, /var/log/messages on the MN looks normal:
May 15 09:34:06 b01 dhcpd: DHCPDISCOVER from 00:23:ae:ee:b7:2d via em1
May 15 09:34:06 b01 dhcpd: DHCPOFFER on 172.30.0.2 to 00:23:ae:ee:b7:2d via em1
May 15 09:34:08 b01 dhcpd: DHCPREQUEST for 172.30.0.2 (172.30.0.1) from 00:23:ae:ee:b7:2d via em1
May 15 09:34:08 b01 dhcpd: DHCPACK on 172.30.0.2 to 00:23:ae:ee:b7:2d via em1
May 15 09:34:08 b01 in.tftpd[113221]: RRQ from 172.30.0.2 filename pxelinux.0
May 15 09:34:08 b01 in.tftpd[113221]: tftp: client does not accept options
May 15 09:34:08 b01 in.tftpd[113222]: RRQ from 172.30.0.2 filename pxelinux.0
May 15 09:34:08 b01 in.tftpd[113223]: RRQ from 172.30.0.2 filename pxelinux.cfg/44454c4c-3500-104c-805a-c4c04f585831
May 15 09:34:08 b01 in.tftpd[113224]: RRQ from 172.30.0.2 filename pxelinux.cfg/01-00-23-ae-ee-b7-2d
May 15 09:34:08 b01 in.tftpd[113225]: RRQ from 172.30.0.2 filename pxelinux.cfg/AC1E0002
cat AC1E0002
#boot
DEFAULT xCAT
LABEL xCAT
LOCALBOOT 0
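For anyone following the tftp requests above: the node asks for pxelinux.cfg/ files named after its UUID, then "01-" plus its MAC address, then its IP address in uppercase hex, which is why the file for 172.30.0.2 is called AC1E0002. A small sketch that reproduces the last two names; the function names ip_to_pxe_hex and mac_to_pxe_name are made up for illustration.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to the 8-digit uppercase hex name
# requested under pxelinux.cfg/ (hypothetical helper name).
ip_to_pxe_hex() {
    # Split the address on dots into four positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

# Convert a MAC address to the "01-" prefixed, dash-separated, lowercase
# name requested under pxelinux.cfg/ (hypothetical helper name).
mac_to_pxe_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-Z:' 'a-z-')"
}

# Values for node b002, taken from this thread.
ip_to_pxe_hex 172.30.0.2
mac_to_pxe_name 00:23:ae:ee:b7:2d
```

The two names printed for b002 match the last two RRQ lines in the log, so the node is fetching the file that nodeset wrote for it.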
Here are some details about my configurations:
The node (b002) and the management node are both Dell C6220s.
CPU: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
OS: RHEL 6.5
xCAT: Version 2.8.3 (built Tue Nov 12 23:16:15 EST 2013)
This is a Management Node
dbengine=SQLite
xCAT tables:
Noderes:
tabdump noderes
"worker",,"pxe","172.30.0.1",,"172.30.0.1",,,"em1","em1",,,"172.30.0.1",
Nodetype:
"b002","rhels6.5","x86_64","cluster","bellatest_node",
Any ideas?
Thanks.
Jean-Claude
------------------------------------------------------------------------------
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user
(See attached file: node_b002-kernel_panic_at_boot.jpg)