Hi all,

This is an especially frustrating help letter to have to write. :) I will explain why:

1) In December '03 I successfully used OSCAR 3.0 on Red Hat 9 to install a cluster of 32 Dell PowerEdge 1750 servers.

2) Recently, another researcher purchased a 64-node cluster of Dell PowerEdge 1750 servers (the servers arrived in September '04), and I am setting this cluster up using RH9 and OSCAR 3.0. I am attempting to use the exact same configuration as I used for MY cluster, which is happily running OSCAR right now.

I followed the standard OSCAR install. Because I've done this once before on what should presumably be identical hardware, I remembered to:

a) replace /usr/share/systemimager/boot/i386/standard/* with the contents of the tarfile from Frank Crawford, who had given me the boel_binaries.tar.gz, kernel, config, and initrd.img files to use. This EXACT set of files enabled me to do the imaging on my cluster of PowerEdge 1750s in December of 2003.

b) create a /var/lib/systemimager/override/IMAGENAME/etc/modules.conf that is an EXACT copy of the one on my previous cluster, so that the machines would load the right drivers.
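
For concreteness, steps a) and b) amount to roughly the following. This is only a sketch in Python under stated assumptions: the function names are hypothetical, the tarfile is assumed to hold the files at its top level, and extraction overwrites same-named files rather than clearing the directory first.

```python
import shutil
import tarfile
from pathlib import Path


def install_boot_files(tarball, boot_dir):
    """Unpack the replacement kernel/initrd/boel_binaries files into boot_dir.

    Assumption: the archive members sit at the top level of the tarball.
    Returns the list of member names that were in the archive.
    """
    boot_dir = Path(boot_dir)
    boot_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball) as tar:
        names = tar.getnames()
        tar.extractall(boot_dir)  # overwrites files with the same name
    return names


def install_override(modules_conf, override_root, image):
    """Copy modules.conf to <override_root>/<image>/etc/modules.conf."""
    dest = Path(override_root) / image / "etc" / "modules.conf"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(modules_conf, dest)
    return dest
```

On the head node, boot_dir would be /usr/share/systemimager/boot/i386/standard and override_root would be /var/lib/systemimager/override, with IMAGENAME as the image argument; both calls would need root privileges there.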

However, when I start up network boot on the new server and netboot one of the new clients, it receives the correct DHCP address and then begins to load the imaging kernel. At that point I get the following errors (I had to write them down, so these are just excerpts, albeit in chronological order):

tg3: (02:00.0) phy probe failed, err -16
tg3: problem fetching invariants of chip, aborting
tg3: (02:00.1) phy probe failed, err -16
tg3: problem fetching invariants of chip, aborting

<stuff>

SCSI subsystem driver Revision: 1.00
kmod: failed to execv /sbin/modprobe -s -k scsi_hostadapter, errno = 2

< stuff>

FusionMPT base driver 2.03.00
mptbase: Initiating ioc0 bringup
mptbase: ioc0: WARNING: unexpected doorbell active
mptbase: ioc0: ERROR: doorbell ACK timeout (2)

<more stuff>

VFS: Mounted root (cramfs filesystem)
Mounted devfs on /dev
Freeing unused kernel memory: 524k freed
Unable to handle kernel NULL pointer dereference at virtual address <>
EIP: 0060:<c0264257>

< BUNCH OF NUMBERS>

Kernel panic: attempted to stop init!

And then it dies.

This is pretty annoying. I had assumed that it would JUST WORK given that the hardware, software, and operating system (except for the head node, which is a 2650) are (nominally?) identical in both cases.

I did a little searching on the net for this "tg3: problem fetching invariants of chip, aborting" error, and turned up this link,

http://www.mail-archive.com/[EMAIL PROTECTED]/msg00705.html

which has another source of these boel_binaries, etc. that should ALSO work. They do not work for this new cluster either: they produce a similar tg3 error and then fail.

What is going on here? I've read that some people have been happy with tg3 and some with bcm5700... I was perfectly happy with tg3 until it turned out not to work on these *particular* Dell 1750s. :-(

And what about the kmod: failed to execv /sbin/modprobe -s -k scsi_hostadapter, errno = 2 error? Does that suggest that it hasn't correctly loaded the SCSI driver EITHER?
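
As a data point on that last question: errno 2 is ENOENT ("No such file or directory"), so the kmod message by itself may only mean that /sbin/modprobe could not be found in the boot image, and not necessarily that the SCSI driver failed to load. A quick way to decode such codes, sketched in Python (the helper name is hypothetical):

```python
import errno
import os


def describe_errno(code):
    """Return the symbolic name and system message for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# Decode the value reported by the kmod line in the boot log.
print(describe_errno(2))
```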

Does anyone have any suggestions? What exactly is involved (as much detail as possible would be appreciated) in trying to make my very own set of boel_binaries/kernel/initrd.img?

Have I missed something really obvious? Can anybody suggest something? Everyone was so helpful getting it to work correctly the first time that I thought I'd take another crack at the list. :-)

Thanks a bunch,

Jason

--------------
Jason Hlady, B. Sc., M. Sc. (Chem), Adv. Cert. (Comp. Sci.)
Programmer/Analyst (Bioinformatics Specialist)
U of Saskatchewan, Bioinformatics Research Laboratory (BIRL)
[EMAIL PROTECTED] (306) 966-2075


