Re: Sudden weird SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))

2009-06-02 Thread Joe Karthauser

on 23/05/2009 05:26 Alexander Motin said the following:

Hi.

Joe Karthauser wrote:

I spoke too soon. It must have just randomly booted, because it is now
hanging again. No amount of jiggling cables has made any difference.


Can you provide verbose boot messages of your system from the beginning
up to the problem? Especially everything related to ATA.



Attached.



Do you have AHCI mode enabled in the BIOS, or are you using legacy ATA emulation?



It's set up as AHCI in the BIOS.

What is strange is that it has now started working again. I can't make
any sense of it. The machine boots up fine. It was definitely hanging
at the ATA probes, though, just after the ZFS messages were output.


Joe
Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.2-STABLE #7: Fri May 22 23:10:15 BST 2009
r...@athenaeum.tao.org.uk:/usr/obj/usr/src/sys/ATHENAEUM
Preloaded elf kernel /boot/kernel/kernel at 0x80b47000.
Preloaded elf module /boot/kernel/zfs.ko at 0x80b4719c.
Preloaded elf module /boot/kernel/opensolaris.ko at 0x80b47244.
Preloaded elf module /boot/kernel/geom_eli.ko at 0x80b472f4.
Preloaded elf module /boot/kernel/crypto.ko at 0x80b473a4.
Preloaded elf module /boot/kernel/zlib.ko at 0x80b47450.
Preloaded elf module /boot/kernel/geom_label.ko at 0x80b474fc.
Preloaded elf module /boot/kernel/geom_mirror.ko at 0x80b475ac.
Preloaded /boot/zfs/zpool.cache /boot/zfs/zpool.cache at 0x80b4765c.
Preloaded elf module /boot/kernel/acpi.ko at 0x80b476b4.
module_register: module g_label already exists!
Module g_label failed to register: 17
Calibrating clock(s) ... i8254 clock: 1192003 Hz
CLK_USE_I8254_CALIBRATION not specified - using default frequency
Timecounter i8254 frequency 1193182 Hz quality 0
Calibrating TSC clock ... TSC clock: 2402413236 Hz
CPU: Intel(R) Core(TM)2 Quad CPU Q6600  @ 2.40GHz (2402.41-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x6fb  Stepping = 11
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100000<NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 4

Instruction TLB: 4 KB Pages, 4-way set associative, 128 entries
1st-level instruction cache: 32 KB, 8-way set associative, 64 byte line size
1st-level data cache: 32 KB, 8-way set associative, 64 byte line size
L2 cache: 4096 kbytes, 16-way associative, 64 bytes/line
real memory  = 3756916736 (3582 MB)
Physical memory chunk(s):
0x1000 - 0x0009dfff, 643072 bytes (157 pages)
0x00100000 - 0x003fffff, 3145728 bytes (768 pages)
0x00c25000 - 0xdbf7ffff, 3677728768 bytes (897883 pages)
avail memory = 3673681920 (3503 MB)
Table 'FACP' at 0xdfee30c0
Table 'HPET' at 0xdfee7e00
Table 'MCFG' at 0xdfee7e80
Table 'APIC' at 0xdfee7d00
MADT: Found table at 0xdfee7d00
MP Configuration Table version 1.4 found at 0x800f0d00
APIC: Using the MADT enumerator.
MADT: Found CPU APIC ID 0 ACPI ID 0: enabled
SMP: Added CPU 0 (AP)
MADT: Found CPU APIC ID 3 ACPI ID 1: enabled
SMP: Added CPU 3 (AP)
MADT: Found CPU APIC ID 2 ACPI ID 2: enabled
SMP: Added CPU 2 (AP)
MADT: Found CPU APIC ID 1 ACPI ID 3: enabled
SMP: Added CPU 1 (AP)
ACPI APIC Table: <GBT    GBTUACPI>
INTR: Adding local APIC 1 as a target
INTR: Adding local APIC 2 as a target
INTR: Adding local APIC 3 as a target
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
bios32: Found BIOS32 Service Directory header at 0x800fad30
bios32: Entry = 0xfb3f0 (800fb3f0)  Rev = 0  Len = 1
pcibios: PCI BIOS entry at 0xf0000+0xb420
pnpbios: Found PnP BIOS data at 0x800fbf90
pnpbios: Entry = f0000:bfc0  Rev = 1.0
Other BIOS signatures found:
APIC: CPU 0 has ACPI ID 0
APIC: CPU 1 has ACPI ID 3
APIC: CPU 2 has ACPI ID 2
APIC: CPU 3 has ACPI ID 1
ULE: setup cpu group 0
ULE: setup cpu 0
ULE: adding cpu 0 to group 0: cpus 1 mask 0x1
ULE: setup cpu group 1
ULE: setup cpu 1
ULE: adding cpu 1 to group 1: cpus 1 mask 0x2
ULE: setup cpu group 2
ULE: setup cpu 2
ULE: adding cpu 2 to group 2: cpus 1 mask 0x4
ULE: setup cpu group 3
ULE: setup cpu 3
ULE: adding cpu 3 to group 3: cpus 1 mask 0x8
This module (opensolaris) contains code covered by the
Common Development and Distribution License (CDDL)
see http://opensolaris.org/os/licensing/opensolaris_license/
ACPI: RSDP @ 0x0xf6c30/0x0014 (v  0 GBT   )
ACPI: RSDT @ 0x0xdfee3040/0x0034 (v  1 GBT    GBTUACPI 0x42302E31 GBTU 0x01010101)
ACPI: FACP @ 0x0xdfee30c0/0x0074 (v  1 GBT    GBTUACPI 0x42302E31 GBTU 0x01010101)
ACPI: DSDT @ 0x0xdfee3180/0x4B32 (v  1 GBT    GBTUACPI 0x00001000 MSFT 0x0100000C)
ACPI: FACS @ 0x0xdfee0000/0x0040
ACPI: HPET @ 

Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up)

2009-05-22 Thread Joe Karthauser

Hi Kip,

I seriously don't understand what has happened. If I boot kernel.old I 
still get the same problem. Very confusing. :(.


Joe

on 21/05/2009 19:28 Kip Macy said the following:

I have no idea what is happening. I think our best bet is having
someone with insight into ATA provide us with help in adding
diagnostics.

Sorry for the trouble. Perhaps you can just roll back to 7.2 for now.

Cheers,
Kip


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up)

2009-05-22 Thread Kip Macy
Motin is your best bet in tracking down ATA problems.

Cheers,
Kip


-- 
When bad men combine, the good must associate; else they will fall one
by one, an unpitied sacrifice in a contemptible struggle.

Edmund Burke

Sudden weird SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))

2009-05-22 Thread Joe Karthauser

Hi Alexander,

I'd love it if you were able to provide some insight into this problem.

I'm going to try switching SATA cables around next to see whether the problem 
goes away when I disconnect some combination of bays.


Thanks,
Joe


Re: Sudden weird SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))

2009-05-22 Thread Joe Karthauser
This appears to have gone away now. I unplugged the bay that was causing 
the trouble, and the system booted just fine on the remaining 4 drives. 
Then I plugged the bay back in (live) and did an atacontrol 
detach/attach on that bus (I wonder why I always have to do that). The 
drive was seen, and ZFS resilvered itself. I'm doing a ZFS scrub now to 
make sure that everything is good, and I'll do a reboot and see if it's 
all OK after that.


Strange; it looks like a cable might have got a little loose or 
something. I wonder why that would have hung the kernel probe, though.


Joe


RE: Sudden weird SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))

2009-05-22 Thread Larry Rosenman

I saw really strange stuff with one bad SATA cable on my 6-drive ZFS array.
It would work most of the time, but the scrub would either cough up CRC
errors or hang.

I wound up replacing the disk *AND* the cable, and it's been fine since. 

This is on a SuperMicro chassis with Intel chips.

YMMV
-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: l...@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893

-----Original Message-----
From: owner-freebsd-sta...@freebsd.org
[mailto:owner-freebsd-sta...@freebsd.org] On Behalf Of Joe Karthauser
Sent: Friday, May 22, 2009 3:45 PM
To: Alexander Motin
Cc: freebsd-stable@freebsd.org; Kip Macy
Subject: Re: Sudden wierd SATA problem on RELENG_7 (Re: ZFS hanging at
kernel boot now, but didn't before... (Re: ZFS MFC heads up))

This appears to have gone away now. I unplugged the bay that was causing 
the trouble, and the system booted just fine on the remaining 4 drives. 
Then I plugged the bay back in (live) and did an atacontrol 
detach/attach on that bus (I wonder why I always have to do that). The 
drive was seen, and ZFS resilvered itself. I'm doing a ZFS scrub now to 
make sure that everything is good, and I'll do a reboot and see if it's 
all ok after that.

Strange, so it looks like a cable might have got a little loose or 
something. I wonder why that would have hung the kernel probe though.

Joe


Re: Sudden weird SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))

2009-05-22 Thread Joe Karthauser
I spoke too soon. It must have just randomly booted, because it is now 
hanging again. No amount of jiggling cables has made any difference.


:(.

Joe


Re: Sudden weird SATA problem on RELENG_7 (Re: ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up))

2009-05-22 Thread Alexander Motin

Hi.

Joe Karthauser wrote:
I spoke too soon. It must have just randomly booted, because it is now 
hanging again. No amount of jiggling cables has made any difference.


Can you provide verbose boot messages of your system from the beginning 
up to the problem? Especially everything related to ATA.
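One way to capture such a log on FreeBSD 7.x, offered as a sketch rather than the procedure Alexander had in mind: setting `boot_verbose` in `/boot/loader.conf` (or typing `boot -v` at the loader prompt for a single boot) turns on verbose probe output. If the system reaches multi-user mode, the messages from that boot can then be read from `/var/run/dmesg.boot`; if it hangs before that point, they have to be captured from the console instead, for example over a serial console.

```conf
# /boot/loader.conf: temporary diagnostic setting (example only)
# Print verbose probe messages, including the ATA/AHCI channel probes.
boot_verbose="YES"
```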


Do you have AHCI mode enabled in the BIOS, or are you using legacy ATA emulation?

--
Alexander Motin


ZFS hanging at kernel boot now, but didn't before... (Re: ZFS MFC heads up)

2009-05-21 Thread Joe Karthauser
Hmm, I've had a bit of a miserable afternoon trying to fight my RELENG_7 
server, which now doesn't boot. :(.


So, it's a RAID-Z2 (raidz2) pool with a UFS/gmirror root partition split 
over 5 disks (gmirror on a 500 MB partition on each of the five disks, and 
raidz2 over the rest of each drive).


What I did was to update the userland, and then reboot. I didn't upgrade 
the kernel (but I've subsequently done that and have the same problem).


What happens is that the kernel hangs during boot just after displaying a 
LABEL message or a ZFS pool/spool message. I _can_ get it to boot if I 
boot single-user with ACPI switched off. When I do that I can manually 
start ZFS and mount all the partitions. However, one of the disks is 
missing (more on that below).
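For reference, a minimal sketch of the single-user, ACPI-off boot described above, assuming the stock FreeBSD loader. For one boot, typing `set hint.acpi.0.disabled=1` and then `boot -s` at the loader prompt should have this effect. The persistent equivalent in `/boot/loader.conf` is shown below; these are diagnostic settings only and would normally be removed once the problem is understood.

```conf
# /boot/loader.conf: temporary diagnostic settings (example only)
hint.acpi.0.disabled="1"   # boot with ACPI disabled
boot_single="YES"          # stop in single-user mode
```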


The machine is running a Gigabyte motherboard (a consumer gaming P35 board, 
similar to 
http://www.gigabyte.com.tw/Products/Motherboard/Products_Overview.aspx?ProductID=2533, 
although it might be a DS4 variant). I've got 5 of the 6 SATA ports 
wired to a 5-unit SATA hot-swap bay (5 drives mounted vertically in three 
5.25-inch bays, that kind of thing).


Now, because of the gmirror I can boot the system from any disk, or any 
combination of plugged-in disks. The kernel probe should succeed up to 
the attempt to mount the root filesystem, irrespective of any ZFS pool, 
etc. And, indeed, this has been working fine for about two years.


But now it hangs in the same place no matter which disk I boot from (I've 
tried every bay).


But with ACPI disabled it does appear to boot OK... what's going on 
here? Is it possible that the machine has developed a hardware fault?


OK, finally: if I boot with ACPI disabled, then one of the disks is 
missing. If I unplug it I get a disconnect message from the ATA device, 
and a reconnect and reinit attempt when I plug it back in, but no device 
appears on the bus. Usually I can do an
'atacontrol detach sata4; sleep 1; atacontrol attach sata4'
and the device reappears. That works on the other buses, but not on the 
last one. It's not the disk, because if I swap it into another bay, it 
comes up and appears on the bus. On the other hand, it doesn't appear to 
be that controller or slot in the drive bay either, because if I unplug 
all the other disks the system will boot from that disk and get as far 
as the hang. Hmm.


Is this a consequence of disabling the ACPI?

Does anyone have a clue what might be going on?

Joe