Re: FreeBSD-6 amr and ahd trouble

2005-12-05 Thread Scott Long

Joerg Pulz wrote:



On Fri, 2 Dec 2005, Michael Rogato wrote:

I know I'm a couple weeks late, but I've been having the same problem 
with my 300-8x. It seems that after a seemingly random period of time 
on my dual opteron box, the system just hangs. It did kernel panic 
once when I was taking down the geom array. Originally I thought it 
might have something to do with GEOM, but since it's also happened 
outside of a GEOM array, I'm kind of at a loss.


Have you managed to find anything out about what exactly is causing 
the problem? I don't get any kind of error messages, so I haven't had 
much luck in tracking it down.



With help from Scott Long and John Baldwin, I have my system up and 
running again without problems.

You should build a custom kernel with
options MUTEX_NOINLINE
in the kernel configuration. With this option my system works.

regards
Joerg



I think that the root problem is actually memory corruption from the
amr-cam module.  I haven't been able to nail it down further, though.
However, this module is entirely optional and isn't used for anything
in the base system (it's only useful if you hook up a cdrom or tape
drive to your RAID card), so I've disabled it in CVS HEAD and RELENG_6
until I can fix it for good.

Scott
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: FreeBSD-6 amr and ahd trouble

2005-12-05 Thread Joerg Pulz



On Mon, 5 Dec 2005, Scott Long wrote:


Joerg Pulz wrote:



On Fri, 2 Dec 2005, Michael Rogato wrote:

I know I'm a couple weeks late, but I've been having the same problem with 
my 300-8x. It seems that after a seemingly random period of time on my 
dual opteron box, the system just hangs. It did kernel panic once when I 
was taking down the geom array. Originally I thought it might have 
something to do with GEOM, but since it's also happened outside of a GEOM 
array, I'm kind of at a loss.


Have you managed to find anything out about what exactly is causing the 
problem? I don't get any kind of error messages, so I haven't had much 
luck in tracking it down.



With help from Scott Long and John Baldwin, I have my system up and running 
again without problems.

You should build a custom kernel with
options MUTEX_NOINLINE
in the kernel configuration. With this option my system works.

regards
Joerg



I think that the root problem is actually memory corruption from the
amr-cam module.  I haven't been able to nail it down further, though.
However, this module is entirely optional and isn't used for anything
in the base system (it's only useful if you hook up a cdrom or tape
drive to your RAID card), so I've disabled it in CVS HEAD and RELENG_6
until I can fix it for good.


Hi Scott,

I've just backported the amr.c changes from HEAD and removed the
options MUTEX_NOINLINE
line from my kernel configuration. After rebuilding and installing the new 
kernel, the system came up without any problems, so your assumption about 
the amr-cam interface seems to be right.


thanks
Joerg

-- 
The beginning is the most important part of the work.

-Plato


Re: FreeBSD-6 amr and ahd trouble

2005-12-03 Thread Joerg Pulz



On Fri, 2 Dec 2005, Michael Rogato wrote:

I know I'm a couple weeks late, but I've been having the same problem with my 
300-8x. It seems that after a seemingly random period of time on my dual 
opteron box, the system just hangs. It did kernel panic once when I was 
taking down the geom array. Originally I thought it might have something to 
do with GEOM, but since it's also happened outside of a GEOM array, I'm kind 
of at a loss.


Have you managed to find anything out about what exactly is causing the 
problem? I don't get any kind of error messages, so I haven't had much luck 
in tracking it down.


With help from Scott Long and John Baldwin, I have my system up and 
running again without problems.

You should build a custom kernel with
options MUTEX_NOINLINE
in the kernel configuration. With this option my system works.
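
For readers following along, here is a minimal sketch of what such a
kernel configuration might look like. It assumes the file is a copy of
GENERIC saved under the hypothetical name MYKERNEL, and that the machine
runs the i386 port (use the amd64 directory otherwise). Only the
MUTEX_NOINLINE line comes from this thread; everything else is
illustrative.

```
# /usr/src/sys/i386/conf/MYKERNEL
# Hypothetical file name. Start from a copy of GENERIC and keep all of
# its existing lines; only the two entries below differ.
ident		MYKERNEL

# Workaround from this thread: make mutex operations call functions
# instead of using the inlined fast paths.
options 	MUTEX_NOINLINE
```

The kernel would then be built and installed the usual way, with
"make buildkernel KERNCONF=MYKERNEL" followed by
"make installkernel KERNCONF=MYKERNEL" from /usr/src, and a reboot.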

regards
Joerg

-- 
The beginning is the most important part of the work.

-Plato


Re: FreeBSD-6 amr and ahd trouble

2005-12-02 Thread Michael Rogato
I know I'm a couple weeks late, but I've been having the same problem 
with my 300-8x. It seems that after a seemingly random period of time on 
my dual opteron box, the system just hangs. It did kernel panic once 
when I was taking down the geom array. Originally I thought it might 
have something to do with GEOM, but since it's also happened outside of 
a GEOM array, I'm kind of at a loss.


Have you managed to find anything out about what exactly is causing the 
problem? I don't get any kind of error messages, so I haven't had much 
luck in tracking it down.





Re: FreeBSD-6 amr and ahd trouble

2005-11-16 Thread Scott Long

Joerg Pulz wrote:




Hi guys,

I'm running a Fujitsu-Siemens Primergy RX300 dual-Xeon server with
hyperthreading enabled, an onboard LSI MegaRAID controller, and an
Adaptec 39320A Ultra320 dual-channel SCSI adapter. The LSI MegaRAID
controller is configured as RAID1 with two disks and one hot spare.
FreeBSD is installed on this array.

Until now, the system has run fine, first with FreeBSD-5.3 and now
with FreeBSD-5.4.

I tried to upgrade this machine to FreeBSD-6.0-RELEASE without success.
The kernel boots and detects all devices correctly, but when it comes
to reading from the amr(4), the last thing I see is "GEOM: new disk
amrd0". After that the system hangs, and it is nearly impossible to
scroll the kernel messages up or down (with Scroll Lock pressed). After
a while, a lot of SCSI error messages about SCB timeouts appear from
the ahd(4).

I then booted the old RELENG_5_4 kernel and cvsup'ed the sources to
RELENG_6, but got the same results. Booting from a FreeBSD-6.0-RELEASE
bootonly CD-ROM gave the same results again.

I searched Google and found a tunable sysctl/loader setting called
hw.pci.do_powerstate. I tried it, with the same result. Later I saw
that in RELENG_6 this tunable is renamed and set to 0 anyway.

The next step was removing the Adaptec card to make sure it was not
interfering with the amr(4), but the only thing that changed was that
the SCSI error messages went away, so this was not the problem.

I decided to give CURRENT from today a try, and it worked without any
problems. I tested CURRENT some steps back until I hit 73, dated
Sun Sep 18 05:12:39 2005 UTC, which is exactly the time the RELENG_6
branch was marked for 6.0-BETA5, and CURRENT worked at every point I
checked out from CVS. Unfortunately, 6.0-BETA5 does NOT work.

I checked out the sources for 6.0-BETA4 and it works again. So
somewhere between 6.0-BETA4 and 6.0-BETA5 the whole thing broke, at
least for me and my hardware.

I've seen some differences in sys/cam/cam_xpt.c. Maybe these cause the
trouble I have, but I'm not deep enough into the FreeBSD kernel code to
be sure.


It would be nice if someone could take a look at this and get it fixed
in RELENG_6.

Any patches to test are welcome.

regards
Joerg



This is almost certainly an interrupt routing bug.  Can you try booting 
with ACPI disabled?  Can you try building a 6.0 kernel without SMP and
the 'apic' devices?  From 5.4, can you send your system information?
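
As a rough illustration of the two tests suggested above (a sketch under
stated assumptions, not something posted in this thread): ACPI can be
disabled for a boot with a device hint, and a uniprocessor test kernel
can be made by commenting out the SMP and apic lines in a copy of the
kernel configuration. The file name MYKERNEL-UP is hypothetical, and the
exact lines present depend on the configuration being copied.

```
# /boot/loader.conf
# Disable ACPI at boot (the same hint can be set at the loader prompt).
hint.acpi.0.disabled="1"
```

```
# /usr/src/sys/i386/conf/MYKERNEL-UP
# Hypothetical file name; a copy of the existing configuration with
# the SMP option and the I/O APIC device commented out.
ident		MYKERNEL-UP

#options 	SMP		# symmetric multiprocessor support, disabled for this test
#device		apic		# I/O APIC, disabled for this test
```

If the hang goes away with either change, that would point toward
interrupt routing rather than the amr(4) or ahd(4) drivers themselves.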

Scott