All info I could think of is in the attached file. Let me know if I can send any more info. Wang - the cluster is again accessible from the internet, same as before.

So - why do I need to run updatenode manually to execute the postscripts, and why doesn't "updatenode otherpkgs" work?

Thanks all

On 03/01/2012 15:33, Lissa Valletta wrote:
Give us the output of lsdef n01; it may help to have the entire setup of n01.

Lissa K. Valletta
2-3/T12
Poughkeepsie, NY 12601
(tie 293) 433-3102





From:	Xiao Peng Wang <[email protected]>
To:	[email protected]
Cc:	[email protected], Daniel Letai <[email protected]>
Date:	01/03/2012 07:27 AM
Subject:	Re: [xcat-user] connection to the cluster with makedhcp (pxe) and updatenode issues



So you mean that otherpkgs did not work during the installation, but running updatenode to update otherpkgs worked after the node rebooted, right?

Could you confirm that otherpkgs was added to the postbootscripts attribute (postbootscripts=otherpkgs)?

Thanks
Best Regards
----------------------------------------------------------------------
Wang Xiaopeng (王晓朋)
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: [email protected]
Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
Haidian District Beijing P.R.China 100193


From: Sten Wolf <[email protected]>
To: Xiao Peng Wang/China/IBM@IBMCN
Cc: Daniel Letai <[email protected]>
Date: 2012-01-02 17:06
Subject: Re: connection to the cluster with makedhcp (pxe) and updatenode issues
Sent by: Daniel Letai <[email protected]>



As I mentioned, it doesn't work.
After nodeset osimage and rpower n01 reset, the otherpkgs are not installed at the end.
I must run updatenode manually after the final reboot to install otherpkgs.


------------------------------------------------------------------------------

Write once. Port to many.
Get the SDK and tools to simplify cross-platform app development. Create
new or port existing apps to sell to consumers worldwide. Explore the
Intel AppUpSM program developer opportunity. appdeveloper.intel.com/join
http://p.sf.net/sfu/intel-appdev
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user


# nodels -v
Version 2.6.10 (svn r11245, built Wed Dec 14 10:57:34 EST 2011)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# lsdef n01 -l --osimage
Object name: n01
    arch=x86_64
    bmc=ipmi01
    bmcpassword=ADMIN
    bmcusername=ADMIN
    currchain=boot
    currstate=boot
    groups=ipmi,compute,all
    hostnames=n01.cluster
    initrd=xcat/centos6.1/x86_64/initrd.img
    installnic=mac
    interface=eth0
    ip=10.1.10.1
    kcmdline=nofb utf8 ks=http://10.1.10.254/install/autoinst/n01 ksdevice=00:25:90:4A:88:DC console=tty0 console=ttyS0,115200n8r noipv6
    kernel=xcat/centos6.1/x86_64/vmlinuz
    mac=00:25:90:4A:88:DC
    mgt=ipmi
    netboot=pxe
    nfsserver=10.1.10.254
    nodetype=osi
    os=centos6.1
    otherinterfaces=ipmi01:10.1.11.1,ib01:10.1.12.1
    postbootscripts=otherpkgs,ofed,slurm,reboot
    postscripts=syslog,remoteshell,syncfiles,initial_fstab
    primarynic=mac
    profile=compute
    provmethod=install
    serialflow=hard
    serialport=0
    serialspeed=115200
    status=booted
    statustime=01-01-2012 06:24:00
    tftpserver=10.1.10.254
    xcatmaster=10.1.10.254
    template=/opt/xcat/share/xcat/install/centos/compute.centos6.tmpl
    otherpkglist=/opt/xcat/share/xcat/install/centos/compute.centos6.1.x86_64.otherpkgs.pkglist
    pkglist=/opt/xcat/share/xcat/install/centos/compute.centos6.pkglist
    otherpkgdir=/install/post/otherpkgs/centos6.1/x86_64
    imagetype=linux
    synclists=/install/custom/install/centos/centos6.1-x86_64-install-compute.synclist
    pkgdir=/install/centos6.1/x86_64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~     
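Wang asked for confirmation that otherpkgs is listed in postbootscripts; the listing above shows postbootscripts=otherpkgs,ofed,slurm,reboot. To make that check repeatable, a small filter along these lines could read lsdef output and test for a script name. It is only a sketch: the function name has_postbootscript is hypothetical, and it assumes the indented attr=value lines that lsdef prints here.

```shell
#!/bin/bash
# Hypothetical helper: reads lsdef-style "attr=value" lines on stdin and
# succeeds (exit 0) if SCRIPT is one of the comma-separated postbootscripts.
has_postbootscript() {
    local script="$1" line value
    while IFS= read -r line; do
        # Strip leading whitespace, since lsdef indents attribute lines.
        line="${line#"${line%%[![:space:]]*}"}"
        case "$line" in
            postbootscripts=*)
                value="${line#postbootscripts=}"
                # Wrap in commas so "otherpkgs" does not match "myotherpkgs".
                case ",$value," in
                    *",$script,"*) return 0 ;;
                esac
                ;;
        esac
    done
    return 1
}
```

For example: lsdef n01 | has_postbootscript otherpkgs && echo present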
# nodeset n01 osimage=centos6.1-x86_64-install-compute
n01: install centos6.1-x86_64-compute
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# lsdef n01 -l --osimage
Object name: n01
    arch=x86_64
    bmc=ipmi01
    bmcpassword=ADMIN
    bmcusername=ADMIN
    currchain=boot
    currstate=install centos6.1-x86_64-compute
    groups=ipmi,compute,all
    hostnames=n01.cluster
    initrd=xcat/centos6.1/x86_64/initrd.img
    installnic=mac
    interface=eth0
    ip=10.1.10.1
    kcmdline=nofb utf8 ks=http://10.1.10.254/install/autoinst/n01 ksdevice=00:25:90:4A:88:DC console=tty0 console=ttyS0,115200n8r noipv6
    kernel=xcat/centos6.1/x86_64/vmlinuz
    mac=00:25:90:4A:88:DC
    mgt=ipmi
    netboot=pxe
    nfsserver=10.1.10.254
    nodetype=osi
    os=centos6.1
    otherinterfaces=ipmi01:10.1.11.1,ib01:10.1.12.1
    postbootscripts=otherpkgs,ofed,slurm,reboot
    postscripts=syslog,remoteshell,syncfiles,initial_fstab
    primarynic=mac
    profile=compute
    provmethod=centos6.1-x86_64-install-compute
    serialflow=hard
    serialport=0
    serialspeed=115200
    status=booted
    statustime=01-01-2012 06:24:00
    tftpserver=10.1.10.254
    xcatmaster=10.1.10.254
    template=/opt/xcat/share/xcat/install/centos/compute.centos6.tmpl
    otherpkglist=/opt/xcat/share/xcat/install/centos/compute.centos6.1.x86_64.otherpkgs.pkglist
    pkglist=/opt/xcat/share/xcat/install/centos/compute.centos6.pkglist
    otherpkgdir=/install/post/otherpkgs/centos6.1/x86_64
    imagetype=linux
    synclists=/install/custom/install/centos/centos6.1-x86_64-install-compute.synclist
    pkgdir=/install/centos6.1/x86_64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~     
# cat /opt/xcat/share/xcat/install/centos/compute.centos6.1.x86_64.otherpkgs.pkglist
blcr.x86_64
blcr-devel.x86_64
blcr-libs.x86_64
blcr-testsuite.x86_64
hwloc.x86_64
hwloc-devel.x86_64
munge.x86_64
munge-libs.x86_64
munge-devel.x86_64
slurm.x86_64
slurm-blcr.x86_64
slurm-munge.x86_64
io-watchdog.x86_64
io-watchdog-libs.x86_64
io-watchdog-slurm.x86_64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
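In the "updatenode n01 otherpkgs" output further down, rpm reports "File not found by glob: blcr.x86_64*" for every entry. Assuming the RPMs in otherpkgdir use the usual name-version-release.arch.rpm file names (for example blcr-0.8.4-1.x86_64.rpm, matching the version rpm -q reports later), a pattern built from the name.arch entry as written would not match them even in the right directory. The sketch below instead checks each pkglist entry against name-<version>.arch.rpm on the management node and prints the entries with no matching file. The function name is hypothetical; it assumes versions start with a digit, only recognises the arch suffixes listed in the code, and skips blank lines and lines starting with #.

```shell
#!/bin/bash
# Hypothetical helper: print each pkglist entry that has no matching RPM
# under DIR (searched recursively). Entries of the form name.arch are
# matched against name-<version>.arch.rpm; plain names against
# name-<version>.rpm. Versions are assumed to start with a digit.
# Returns 1 if any entry is missing, 0 otherwise.
missing_otherpkgs() {
    local pkglist="$1" dir="$2" entry pattern missing=0
    while IFS= read -r entry || [ -n "$entry" ]; do
        # Trim leading and trailing whitespace.
        entry="${entry#"${entry%%[![:space:]]*}"}"
        entry="${entry%"${entry##*[![:space:]]}"}"
        # Skip blank lines and comments.
        case "$entry" in ''|'#'*) continue ;; esac
        case "$entry" in
            *.x86_64|*.noarch|*.i686|*.i386)
                pattern="${entry%.*}-[0-9]*.${entry##*.}.rpm" ;;
            *)
                pattern="${entry}-[0-9]*.rpm" ;;
        esac
        if [ -z "$(find "$dir" -name "$pattern" -print -quit)" ]; then
            echo "$entry"
            missing=1
        fi
    done < "$pkglist"
    return "$missing"
}
```

For this cluster it would be run as: missing_otherpkgs /opt/xcat/share/xcat/install/centos/compute.centos6.1.x86_64.otherpkgs.pkglist /install/post/otherpkgs/centos6.1/x86_64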
# cat /install/postscripts/slurm 
#!/bin/bash

#ln -s /opt/conf/slurm.conf /etc/slurm/
mkdir -p /var/slurm/checkpoint /var/log/slurm /tmp/slurmd
chown -R slurm:slurm /var/slurm /var/log/slurm
echo "ulimit -l unlimited" >> /etc/sysconfig/slurm
sed -i 's_LIBDIR=/usr/lib_LIBDIR=/usr/lib64_g' /etc/init.d/slurm
chkconfig munge on
chkconfig slurm on
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
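Because "updatenode n01" re-runs every postscript, including slurm (as its output further down shows), the script above should be safe to run more than once. As written, the echo ... >> line appends another ulimit line on each run, and the sed turns an already-patched LIBDIR=/usr/lib64 into LIBDIR=/usr/lib6464 on a second run. A minimal idempotent sketch of those two steps follows; the function names and file-path parameters are hypothetical (added so the steps can be tried on scratch copies), and the \b word boundary assumes GNU sed, as shipped with CentOS.

```shell
#!/bin/bash
# Hypothetical idempotent variants of two steps from the slurm postscript.

# Append LINE to FILE only if FILE does not already contain it verbatim.
append_once() {
    local line="$1" file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Rewrite LIBDIR=/usr/lib to LIBDIR=/usr/lib64 in FILE. The \b word
# boundary (a GNU sed extension) stops an already-patched /usr/lib64
# from matching again, so repeated runs leave the file unchanged.
patch_libdir() {
    sed -i 's_LIBDIR=/usr/lib\b_LIBDIR=/usr/lib64_g' "$1"
}

# How the postscript could call them (paths as in the original script):
#   append_once "ulimit -l unlimited" /etc/sysconfig/slurm
#   patch_libdir /etc/init.d/slurm
```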
# cat /install/custom/install/centos/centos6.1-x86_64-install-compute.synclist
/etc/sudoers -> /etc/sudoers
/etc/group -> /etc/group
/etc/passwd -> /etc/passwd
/etc/shadow -> /etc/shadow
/etc/hosts -> /etc/hosts
/etc/security/limits.conf -> /etc/security/limits.conf
/etc/profile.d/intel.sh /etc/profile.d/intel.csh -> /etc/profile.d/
/etc/ld.so.conf.d/intel-x86_64.conf -> /etc/ld.so.conf.d/intel-x86_64.conf
/etc/munge/munge.key -> /etc/munge/munge.key
/etc/slurm/slurm.conf -> /etc/slurm/slurm.conf
/etc/logrotate.d/slurm -> /etc/logrotate.d/slurm
/etc/sysconfig/slurm -> /etc/sysconfig/slurm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
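The synclist test below checks a single synced file by hand. A sketch like the following expands the synclist into the destination paths it should produce on the node, so every entry can be checked the same way. It assumes only the simple "src [src ...] -> dest" form used in this file, and treats a destination ending in / as a directory that receives each source under its base name; the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the node-side path for each source file in a
# synclist written as "src [src ...] -> dest" lines. A dest ending in "/"
# is treated as a directory that receives each source under its base name.
synclist_targets() {
    local line dest src
    local -a srcs
    while IFS= read -r line || [ -n "$line" ]; do
        # Ignore lines without the "->" separator.
        case "$line" in *' -> '*) ;; *) continue ;; esac
        read -r -a srcs <<< "${line%% -> *}"
        dest="${line#* -> }"
        dest="${dest%"${dest##*[![:space:]]}"}"   # drop trailing blanks
        for src in "${srcs[@]}"; do
            case "$dest" in
                */) printf '%s\n' "${dest}${src##*/}" ;;
                *)  printf '%s\n' "$dest" ;;
            esac
        done
    done < "$1"
}
```

Each printed path could then be checked on the node, for example with xdsh n01 ls -l /etc/munge/munge.key.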
Once the node deployment finished (3 reboots), I ran a few tests:
1st - IPoIB
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ping ib01 -c 1
PING ib01 (10.1.12.1) 56(84) bytes of data.
From mn01-ib (10.1.12.254) icmp_seq=1 Destination Host Unreachable

--- ib01 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 3001ms
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2nd - otherpkgs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# xdsh n01 rpm -q blcr
n01: package blcr is not installed
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
So otherpkgs were not installed during node deployment.
3rd - synclist
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# xdsh n01 cat /etc/ld.so.conf.d/intel-x86_64.conf
n01: /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
synclist is good

now for the interesting parts:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# updatenode n01 otherpkgs
n01: Running postscript: otherpkgs
n01: mv: cannot stat `10.1.10.254//post/otherpkgs/centos6.1/x86_64/*': No such file or directory
n01: NFSSERVER=10.1.10.254
n01: OTHERPKGDIR=10.1.10.254//post/otherpkgs/centos6.1/x86_64
n01: yum -y upgrade
n01: Loaded plugins: fastestmirror
n01: Loading mirror speeds from cached hostfile
n01: Error: Cannot find a valid baseurl for repo: base
n01: Could not retrieve mirrorlist http://mirrorlist.centos.org/?release=6&arch=x86_64&repo=os error was
n01: 14: PYCURL ERROR 7 - "Failed to connect to 199.187.126.90: Network is unreachable"
n01:  rpm -Uvh --replacepkgs  blcr.x86_64* blcr-devel.x86_64* blcr-libs.x86_64* blcr-testsuite.x86_64* hwloc.x86_64* hwloc-devel.x86_64* munge.x86_64* munge-libs.x86_64* munge-devel.x86_64* slurm.x86_64* slurm-blcr.x86_64* slurm-munge.x86_64* io-watchdog.x86_64* io-watchdog-libs.x86_64* io-watchdog-slurm.x86_64*
n01: error: File not found by glob: blcr.x86_64*
n01: error: File not found by glob: blcr-devel.x86_64*
n01: error: File not found by glob: blcr-libs.x86_64*
n01: error: File not found by glob: blcr-testsuite.x86_64*
n01: error: File not found by glob: hwloc.x86_64*
n01: error: File not found by glob: hwloc-devel.x86_64*
n01: error: File not found by glob: munge.x86_64*
n01: error: File not found by glob: munge-libs.x86_64*
n01: error: File not found by glob: munge-devel.x86_64*
n01: error: File not found by glob: slurm.x86_64*
n01: error: File not found by glob: slurm-blcr.x86_64*
n01: error: File not found by glob: slurm-munge.x86_64*
n01: error: File not found by glob: io-watchdog.x86_64*
n01: error: File not found by glob: io-watchdog-libs.x86_64*
n01: error: File not found by glob: io-watchdog-slurm.x86_64*
n01: Running of postscripts has completed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
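In this run the postscript printed OTHERPKGDIR=10.1.10.254//post/otherpkgs/centos6.1/x86_64: the value has no http:// scheme and has lost the /install prefix of the configured otherpkgdir=/install/post/otherpkgs/centos6.1/x86_64, which is consistent with the "mv: cannot stat" error on that path. When debugging, a sanity check along these lines could be run against the value before it is used. The function name is hypothetical, and the accepted forms (an http:// or https:// URL, or an absolute local path) are assumptions, not documented xCAT rules.

```shell
#!/bin/bash
# Hypothetical sanity check for an OTHERPKGDIR value. Accepts an
# http:// or https:// URL or an absolute local path; anything else (for
# example a bare "host//path") is rejected with a message on stderr.
check_otherpkgdir() {
    local dir="$1"
    case "$dir" in
        http://?*|https://?*|/?*) return 0 ;;
        *)
            echo "OTHERPKGDIR '$dir' is neither a URL nor an absolute path" >&2
            return 1
            ;;
    esac
}
```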
but...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# updatenode n01
Performing software maintenance operations. This could take a while.
<snipped lots of output>
.
.
.
<snipped lots of output>
n01: Running postscript: slurm
n01: Running postscript: reboot
n01: Running of postscripts has completed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
and once n01 is back online:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# ping ib01 -c 1
PING ib01 (10.1.12.1) 56(84) bytes of data.
64 bytes from ib01 (10.1.12.1): icmp_seq=1 ttl=64 time=0.145 ms

--- ib01 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.145/0.145/0.145/0.000 ms
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# xdsh n01 rpm -q blcr
n01: blcr-0.8.4-1.x86_64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# xdsh n01 cat /etc/ld.so.conf.d/intel-x86_64.conf
n01: /opt/intel/composer_xe_2011_sp1.6.233/compiler/lib/intel64
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
everything is now ok.

So - why do I need to run updatenode manually to execute the postscripts, and why doesn't "updatenode otherpkgs" work?
