All the info I could think of is in the attached file. Let me know if
I can send any more info.

Wang - the cluster is again accessible from the internet, same as before.

So - why do I need to run updatenode manually to execute the
postscripts, and why doesn't "updatenode otherpkgs" work?

On 03/01/2012 15:33, Lissa Valletta wrote:

Give us the output of lsdef n01; it may help to have the entire setup of n01.

Lissa K. Valletta
2-3/T12
Poughkeepsie, NY 12601
(tie 293) 433-3102

From: Xiao Peng Wang <[email protected]>
To: [email protected]
Cc: [email protected], Daniel Letai <[email protected]>
Date: 01/03/2012 07:27 AM
Subject: Re: [xcat-user] connection to the cluster with makedhcp (pxe) and updatenode issues

So you meant that otherpkgs did not work during the installation, but updating otherpkgs worked after the reboot of the node, right?

Could you confirm that otherpkgs was added to the postbootscripts attribute (postbootscripts=otherpkgs)?

Thanks
Best Regards
----------------------------------------------------------------------
Wang Xiaopeng (王晓朋)
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: [email protected]
Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193

From: Sten Wolf <[email protected]>
To: Xiao Peng Wang/China/IBM@IBMCN
Cc: Daniel Letai <[email protected]>
Date: 2012-01-02 17:06
Subject: Re: connection to the cluster with makedhcp (pxe) and updatenode issues
Sent by: Daniel Letai <[email protected]>

As I mentioned - doesn't work. After nodeset osimage and rpower n01 reset, at the end the otherpkgs are not installed. I must run updatenode manually after the final reboot to install otherpkgs.

_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user

# nodels -v
Version 2.6.10 (svn r11245, built Wed Dec 14 10:57:34 EST 2011)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# lsdef n01 -l --osimage
Object name: n01
arch=x86_64
bmc=ipmi01
bmcpassword=ADMIN
bmcusername=ADMIN
currchain=boot
currstate=boot
groups=ipmi,compute,all
hostnames=n01.cluster
initrd=xcat/centos6.1/x86_64/initrd.img
installnic=mac
interface=eth0
ip=10.1.10.1
kcmdline=nofb utf8 ks=http://10.1.10.254/install/autoinst/n01 ksdevice=00:25:90:4A:88:DC console=tty0 console=ttyS0,115200n8r noipv6
kernel=xcat/centos6.1/x86_64/vmlinuz
mac=00:25:90:4A:88:DC
mgt=ipmi
netboot=pxe
nfsserver=10.1.10.254
nodetype=osi
os=centos6.1
otherinterfaces=ipmi01:10.1.11.1,ib01:10.1.12.1
postbootscripts=otherpkgs,ofed,slurm,reboot
postscripts=syslog,remoteshell,syncfiles,initial_fstab
primarynic=mac
profile=compute
provmethod=install
serialflow=hard
serialport=0
serialspeed=115200
status=booted
statustime=01-01-2012 06:24:00
tftpserver=10.1.10.254
xcatmaster=10.1.10.254
template=/opt/xcat/share/xcat/install/centos/compute.centos6.tmpl
otherpkglist=/opt/xcat/share/xcat/install/centos/compute.centos6.1.x86_64.otherpkgs.pkglist
pkglist=/opt/xcat/share/xcat/install/centos/compute.centos6.pkglist
otherpkgdir=/install/post/otherpkgs/centos6.1/x86_64
imagetype=linux
synclists=/install/custom/install/centos/centos6.1-x86_64-install-compute.synclist
pkgdir=/install/centos6.1/x86_64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# nodeset n01 osimage=centos6.1-x86_64-install-compute
n01: install centos6.1-x86_64-compute
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# lsdef n01 -l --osimage
Object name: n01
arch=x86_64
bmc=ipmi01
bmcpassword=ADMIN
bmcusername=ADMIN
currchain=boot
currstate=install centos6.1-x86_64-compute
groups=ipmi,compute,all
hostnames=n01.cluster
initrd=xcat/centos6.1/x86_64/initrd.img
installnic=mac
interface=eth0
ip=10.1.10.1
kcmdline=nofb utf8 ks=http://10.1.10.254/install/autoinst/n01 ksdevice=00:25:90:4A:88:DC console=tty0 console=ttyS0,115200n8r noipv6
kernel=xcat/centos6.1/x86_64/vmlinuz
mac=00:25:90:4A:88:DC
mgt=ipmi
netboot=pxe
nfsserver=10.1.10.254
nodetype=osi
os=centos6.1
otherinterfaces=ipmi01:10.1.11.1,ib01:10.1.12.1
postbootscripts=otherpkgs,ofed,slurm,reboot
postscripts=syslog,remoteshell,syncfiles,initial_fstab
primarynic=mac
profile=compute
provmethod=centos6.1-x86_64-install-compute
serialflow=hard
serialport=0
serialspeed=115200
status=booted
statustime=01-01-2012 06:24:00
tftpserver=10.1.10.254
xcatmaster=10.1.10.254
template=/opt/xcat/share/xcat/install/centos/compute.centos6.tmpl
otherpkglist=/opt/xcat/share/xcat/install/centos/compute.centos6.1.x86_64.otherpkgs.pkglist
pkglist=/opt/xcat/share/xcat/install/centos/compute.centos6.pkglist
otherpkgdir=/install/post/otherpkgs/centos6.1/x86_64
imagetype=linux
synclists=/install/custom/install/centos/centos6.1-x86_64-install-compute.synclist
pkgdir=/install/centos6.1/x86_64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# cat /opt/xcat/share/xcat/install/centos/compute.centos6.1.x86_64.otherpkgs.pkglist
blcr.x86_64
blcr-devel.x86_64
blcr-libs.x86_64
blcr-testsuite.x86_64
hwloc.x86_64
hwloc-devel.x86_64
munge.x86_64
munge-libs.x86_64
munge-devel.x86_64
slurm.x86_64
slurm-blcr.x86_64
slurm-munge.x86_64
io-watchdog.x86_64
io-watchdog-libs.x86_64
io-watchdog-slurm.x86_64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# cat /install/postscripts/slurm
#!/bin/bash
#ln -s /opt/conf/slurm.conf /etc/slurm/
mkdir -p /var/slurm/checkpoint /var/log/slurm /tmp/slurmd
chown -R slurm:slurm /var/slurm /var/log/slurm
echo "ulimit -l unlimited" >> /etc/sysconfig/slurm
sed -i 's_LIBDIR=/usr/lib_LIBDIR=/usr/lib64_g' /etc/init.d/slurm
chkconfig munge on
chkconfig slurm on
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
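A side note on the postscript above: the echo line appends to /etc/sysconfig/slurm every time the script runs, and the later "updatenode n01" output in this file shows the slurm postscript being run again on an already installed node. A minimal sketch of an append-once variant follows. The helper name append_once is my own and is not part of xCAT or SLURM.

```shell
#!/bin/bash
# append_once LINE FILE: add LINE to FILE only if no identical line exists.
# Hypothetical helper for illustration, not part of xCAT or SLURM.
append_once() {
    local line="$1"
    local file="$2"
    # -x matches whole lines only, -F treats LINE as a fixed string.
    # If FILE does not exist yet, grep fails and the line is appended,
    # which creates the file.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# In the postscript this would stand in for the plain "echo ... >>" line:
# append_once "ulimit -l unlimited" /etc/sysconfig/slurm
```

With this, rerunning the postscript through updatenode should leave a single "ulimit -l unlimited" line in the file instead of one per run.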
# cat /install/custom/install/centos/centos6.1-x86_64-install-compute.synclist
/etc/sudoers -> /etc/sudoers
/etc/group -> /etc/group
/etc/passwd -> /etc/passwd
/etc/shadow -> /etc/shadow
/etc/hosts -> /etc/hosts
/etc/security/limits.conf -> /etc/security/limits.conf
/etc/profile.d/intel.sh /etc/profile.d/intel.csh -> /etc/profile.d/
/etc/ld.so.conf.d/intel-x86_64.conf -> /etc/ld.so.conf.d/intel-x86_64.conf
/etc/munge/munge.key -> /etc/munge/munge.key
/etc/slurm/slurm.conf -> /etc/slurm/slurm.conf
/etc/logrotate.d/slurm -> /etc/logrotate.d/slurm
/etc/sysconfig/slurm -> /etc/sysconfig/slurm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Once the node deployment finished (3 reboots), I ran a few tests:
1st - IPoIB
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ping ib01 -c 1
PING ib01 (10.1.12.1) 56(84) bytes of data.
From mn01-ib (10.1.12.254) icmp_seq=1 Destination Host Unreachable
--- ib01 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 3001ms
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2nd - otherpkgs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# xdsh n01 rpm -q blcr
n01: package blcr is not installed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
so otherpkgs didn't install during node deployment
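The check above only queries blcr. To query every entry of the otherpkgs list in one go, something like the sketch below could be used. The helper name list_pkgs is my own, and it assumes the pkglist holds one package per line, as in the file shown earlier, with blank lines and '#' lines to be skipped.

```shell
#!/bin/bash
# list_pkgs FILE: print the package entries of a pkglist, one per line,
# skipping blank lines and lines that start with '#'.
# Hypothetical helper for illustration, not part of xCAT.
list_pkgs() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# Possible use on the management node (assumes rpm -q accepts the
# name.arch form used in the list):
# xdsh n01 rpm -q $(list_pkgs /opt/xcat/share/xcat/install/centos/compute.centos6.1.x86_64.otherpkgs.pkglist)
```

This keeps the check tied to the same list that the osimage definition points at, so a package missing from the node shows up by name.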
3rd - synclist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# xdsh n01 cat /etc/ld.so.conf.d/intel-x86_64.conf
n01: /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
synclist is good
now for the interesting parts:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# updatenode n01 otherpkgs
n01: Running postscript: otherpkgs
n01: mv: cannot stat `10.1.10.254//post/otherpkgs/centos6.1/x86_64/*': No such file or directory
n01: NFSSERVER=10.1.10.254
n01: OTHERPKGDIR=10.1.10.254//post/otherpkgs/centos6.1/x86_64
n01: yum -y upgrade
n01: Loaded plugins: fastestmirror
n01: Loading mirror speeds from cached hostfile
n01: Error: Cannot find a valid baseurl for repo: base
n01: Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=6&arch=x86_64&repo=os error was
n01: 14: PYCURL ERROR 7 - "Failed to connect to 199.187.126.90: Network is unreachable"
n01: rpm -Uvh --replacepkgs blcr.x86_64* blcr-devel.x86_64* blcr-libs.x86_64*
blcr-testsuite.x86_64* hwloc.x86_64* hwloc-devel.x86_64* munge.x86_64*
munge-libs.x86_64* munge-devel.x86_64* slurm.x86_64* slurm-blcr.x86_64*
slurm-munge.x86_64* io-watchdog.x86_64* io-watchdog-libs.x86_64*
io-watchdog-slurm.x86_64*
n01: error: File not found by glob: blcr.x86_64*
n01: error: File not found by glob: blcr-devel.x86_64*
n01: error: File not found by glob: blcr-libs.x86_64*
n01: error: File not found by glob: blcr-testsuite.x86_64*
n01: error: File not found by glob: hwloc.x86_64*
n01: error: File not found by glob: hwloc-devel.x86_64*
n01: error: File not found by glob: munge.x86_64*
n01: error: File not found by glob: munge-libs.x86_64*
n01: error: File not found by glob: munge-devel.x86_64*
n01: error: File not found by glob: slurm.x86_64*
n01: error: File not found by glob: slurm-blcr.x86_64*
n01: error: File not found by glob: slurm-munge.x86_64*
n01: error: File not found by glob: io-watchdog.x86_64*
n01: error: File not found by glob: io-watchdog-libs.x86_64*
n01: error: File not found by glob: io-watchdog-slurm.x86_64*
n01: Running of postscripts has completed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
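One thing that stands out in the output above: the printed OTHERPKGDIR is 10.1.10.254//post/otherpkgs/centos6.1/x86_64, while lsdef reports otherpkgdir=/install/post/otherpkgs/centos6.1/x86_64. The printed value has an empty path component and does not end with the lsdef path, which may be why mv and the rpm globs find nothing. I have not confirmed how the otherpkgs postscript builds this value. A small sketch of the comparison follows, where check_otherpkgdir is a made-up helper name and not an xCAT command.

```shell
#!/bin/bash
# check_otherpkgdir PRINTED EXPECTED
# Compare the OTHERPKGDIR value printed by a postscript run (PRINTED)
# with the otherpkgdir attribute shown by lsdef (EXPECTED).
# Returns 0 if PRINTED looks consistent, 1 otherwise.
# Hypothetical helper for illustration, not part of xCAT.
check_otherpkgdir() {
    local printed="$1"
    local expected="$2"
    local rc=0
    # An empty path component shows up as "//" in the printed value.
    case "$printed" in
        *//*) echo "empty path component in: $printed"; rc=1 ;;
    esac
    # The printed value should at least end with the lsdef attribute.
    case "$printed" in
        *"$expected") ;;
        *) echo "does not end with $expected: $printed"; rc=1 ;;
    esac
    return "$rc"
}

# Example with the values from this file:
# check_otherpkgdir "10.1.10.254//post/otherpkgs/centos6.1/x86_64" \
#                   "/install/post/otherpkgs/centos6.1/x86_64"
```

Run with the two values from this file, both checks report a problem, which matches the "No such file or directory" error from mv.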
but...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# updatenode n01
Performing software maintenance operations. This could take a while.
<snipped lots of output>
.
.
.
<snipped lots of output>
n01: Running postscript: slurm
n01: Running postscript: reboot
n01: Running of postscripts has completed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
and once n01 is back online:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ping ib01 -c 1
PING ib01 (10.1.12.1) 56(84) bytes of data.
64 bytes from ib01 (10.1.12.1): icmp_seq=1 ttl=64 time=0.145 ms
--- ib01 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.145/0.145/0.145/0.000 ms
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# xdsh n01 rpm -q blcr
n01: blcr-0.8.4-1.x86_64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# xdsh n01 cat /etc/ld.so.conf.d/intel-x86_64.conf
n01: /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
everything is now ok.
So - why do I need to run updatenode manually to execute the postscripts, and
why doesn't "updatenode otherpkgs" work?
