[Oscar-users] Slaves cant boot from disc and hang booting of clients using 2.4.20-6bigmem

Howell Silverman Thu, 13 Nov 2003 17:09:09 -0800

Hello,

I hope someone can help.

The environment is 8 systems in total, all dual Xeon: one master and 7 nodes, each with 4GB of memory. The master has a 120GB system drive, and each node has a 40GB drive.

This is a standard install of RH 9 with no changes.

Problem Description

When I choose vmlinux-2.4.20-6bigmem for
the oscarimage:

At step 2, "Configure Selected OSCAR Packages", there are
3 things to configure:
"Environment Switcher" - pick mpich
"ntpconfig"            - pick default
"kernel_picker"        - pick /boot/vmlinux-2.4.20-6bigmem

In the "kernel_picker", if I choose
/boot/vmlinux-2.4.20-6bigmem, there is a further option:
whether or not to use loadable kernel modules.

1) choose "use loadable kernel modules"

    I can successfully build everything and network boot
the slave node. After the network boot, I changed the boot
order back to the hard disk. The slave node then cannot boot from
the hard disk. I will check to see if any messages are displayed.

2) choose "not to use loadable kernel modules"
    Everything works until step 5. When I try to "Add
Clients" to OSCAR, install_cluster hangs at:

/opt/kernel_picker/bin/kernel_picker --bootkernel
    /boot/vmlinux-2.4.20-6bigmem --bootramdisk N
--networkboot N --kernelversion --modulespath

Below are 3 problems we encountered on the master node.

1) Something strange happened on the master node this
afternoon.
    Initially, the "df" command showed 76% used on hda3,
and we wanted to find out which directory was so big.
Surprisingly, "df -ms /" gives 6GB as used space, and
the recycle bin is empty.

    After we went into single-user mode and ran "df",
the percent used decreased steadily from 76% to 6%, and
6% is consistent with 6GB.

    We wonder what is going on.
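
For anyone wanting to reproduce the comparison above, here is a minimal sketch that contrasts the space the filesystem reports as used (roughly what "df" shows) with the total of the files actually reachable in the directory tree (roughly what a per-directory summary shows). The function names are hypothetical, and only regular files on the same filesystem are counted, so the numbers will not match the command-line tools exactly. A large gap between the two figures is only a hint that space is being held by something not visible in the tree; it does not identify the cause.

```python
import os
import stat


def filesystem_used_bytes(path):
    """Space the filesystem containing `path` reports as used."""
    vfs = os.statvfs(path)
    return (vfs.f_blocks - vfs.f_bfree) * vfs.f_frsize


def _same_device(path, device):
    """True if `path` can be inspected and lives on `device`."""
    try:
        return os.lstat(path).st_dev == device
    except OSError:
        return False


def tree_usage_bytes(root):
    """Total regular files under `root`, staying on one filesystem.

    Returns (apparent_bytes, allocated_bytes). Hard links are counted once.
    """
    root_device = os.lstat(root).st_dev
    seen = set()
    apparent = 0
    allocated = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories that belong to other mounted filesystems.
        dirnames[:] = [
            d for d in dirnames
            if _same_device(os.path.join(dirpath, d), root_device)
        ]
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable
            key = (info.st_dev, info.st_ino)
            if not stat.S_ISREG(info.st_mode) or key in seen:
                continue
            seen.add(key)
            apparent += info.st_size
            allocated += info.st_blocks * 512  # st_blocks is in 512-byte units
    return apparent, allocated
```

Run as root, filesystem_used_bytes("/") and tree_usage_bytes("/") give the two figures to compare.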

2) The log files in /var/log, such as messages and secure, often
grow to several GB. There is a lot of white space in the
log files. Is there a way to get rid of it?
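
To pin down what the "white space" actually is, a small sketch like the following could count the padding bytes in one log file. It is an assumption, not something confirmed here, that the padding consists of NUL bytes, spaces, or tabs; the function name is hypothetical.

```python
def padding_stats(path, chunk_size=1 << 20):
    """Count NUL bytes and space/tab bytes in a file, reading in chunks."""
    total = 0
    nul = 0
    blank = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            nul += chunk.count(b"\x00")
            blank += chunk.count(b" ") + chunk.count(b"\t")
    return {"total": total, "nul": nul, "blank": blank}
```

Calling padding_stats("/var/log/messages") would show how much of the file is padding rather than log text.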

3) Although I've commented out everything in /etc/crontab,
    the master node is sometimes writing to the hard disk,
and the available blocks on the hard disk decrease rather
fast, by about 664000K every 10 minutes. Is there
any way we can find which process is writing to the hard disk?
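
One way to narrow this down could be sketched as follows: record file sizes under a directory, wait, record them again, and list the files that grew the most. This is only a sketch under stated assumptions. The function names are hypothetical, it detects growth in file size rather than in-place rewrites, and it identifies files, not processes; a tool such as lsof or fuser could then show which process has a growing file open.

```python
import os


def snapshot_sizes(root):
    """Map every file path under `root` to its current size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes[path] = os.lstat(path).st_size
            except OSError:
                continue  # file vanished between listing and stat
    return sizes


def fastest_growing(before, after, limit=10):
    """Return up to `limit` (growth_in_bytes, path) pairs, largest first.

    Files that are new in `after` count their full size as growth.
    """
    deltas = [(after[path] - before.get(path, 0), path) for path in after]
    grown = [item for item in deltas if item[0] > 0]
    grown.sort(reverse=True)
    return grown[:limit]
```

Taking one snapshot of /var, a second one ten minutes later, and passing both to fastest_growing would list the files responsible for most of the growth in that interval.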

Is this normal?
