Re: [zfs-discuss] Panic running a scrub

2010-01-20 Thread Cindy Swearingen

Hi Frank,

I couldn't reproduce this problem on SXCE build 130 by failing a disk in 
a mirrored pool and then immediately running a scrub on the pool. It works 
as expected.


Any other symptoms (like a power failure?) before the disk went offline? 
Is it possible that both disks went offline?


We would like to review the crash dump if you still have it; just let me 
know when it's uploaded.


Thanks,

Cindy


On 01/19/10 12:30, Frank Middleton wrote:

This is probably unreproducible, but I just got a panic whilst
scrubbing a simple mirrored pool on SXCE snv124. Evidently
one of the disks went offline for some reason and shortly
thereafter the panic happened. I have the dump and the
/var/adm/messages containing the trace.

Is there any point in submitting a bug report?

The panic starts with:

Jan 19 13:27:13 host6 ^Mpanic[cpu1]/thread=2a1009f5c80:
Jan 19 13:27:13 host6 unix: [ID 403854 kern.notice] assertion failed: 0 
== zap_update(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, 
DMU_POOL_SCRUB_BOOKMARK, sizeof (uint64_t), 4, dp->dp_scrub_bookmark, 
tx), file: ../../common/fs/zfs/dsl_scrub.c, line: 853


FWIW when the system came back up, it resilvered with no
problem and now I'm rerunning the scrub.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Re: [zfs-discuss] Panic running a scrub

2010-01-20 Thread Frank Middleton

On 01/20/10 04:27 PM, Cindy Swearingen wrote:

Hi Frank,

I couldn't reproduce this problem on SXCE build 130 by failing a disk in
a mirrored pool and then immediately running a scrub on the pool. It works
as expected.


The disk has to fail whilst the scrub is running. It has happened twice now,
once with the bottom half of the mirror, and again with the top half.
 

Any other symptoms (like a power failure?) before the disk went offline?
Is it possible that both disks went offline?


Neither. The system is on a pretty beefy UPS, and one half of the mirror
was definitely online (zpool status just before panic showed one disk
offline and the pool as degraded).


We would like to review the crash dump if you still have it; just let me
know when it's uploaded.


Do you need the unix.0, vmcore.0 or both? I'll add either or both as
attachments to newly created Bug 14012, Panic running a scrub,
when you let me know which one(s) you want.

Thanks -- Frank




Re: [zfs-discuss] Panic running a scrub

2010-01-20 Thread Cindy Swearingen

Hi Frank,

We need both files.

Thanks,

Cindy

On 01/20/10 15:43, Frank Middleton wrote:

On 01/20/10 04:27 PM, Cindy Swearingen wrote:

Hi Frank,

I couldn't reproduce this problem on SXCE build 130 by failing a disk in
a mirrored pool and then immediately running a scrub on the pool. It works
as expected.


The disk has to fail whilst the scrub is running. It has happened twice now,
once with the bottom half of the mirror, and again with the top half.
 

Any other symptoms (like a power failure?) before the disk went offline?
Is it possible that both disks went offline?


Neither. The system is on a pretty beefy UPS, and one half of the mirror
was definitely online (zpool status just before panic showed one disk
offline and the pool as degraded).


We would like to review the crash dump if you still have it; just let me
know when it's uploaded.


Do you need the unix.0, vmcore.0 or both? I'll add either or both as
attachments to newly created Bug 14012, Panic running a scrub,
when you let me know which one(s) you want.

Thanks -- Frank





Re: [zfs-discuss] Panic running a scrub

2010-01-20 Thread Frank Middleton

On 01/20/10 05:55 PM, Cindy Swearingen wrote:

Hi Frank,

We need both files.


The vmcore is 1.4GB. An http upload is never going to complete.
Is there an ftp-able place to send it, or can you download it if I
post it somewhere?

Cheers -- Frank


Re: [zfs-discuss] Panic running a scrub

2010-01-20 Thread Frank Middleton

On 01/20/10 04:27 PM, Cindy Swearingen wrote:

Hi Frank,

I couldn't reproduce this problem on SXCE build 130 by failing a disk in
a mirrored pool and then immediately running a scrub on the pool. It works
as expected.


As noted, the disk mustn't go offline until well after the scrub has started.

There's another wrinkle. There are some COMSTAR iscsi targets on this
pool. If there are no initiators accessing any of them, the scrub completes
with no errors after 6 hours. If one specific target is active, the panic
ensues reproducibly at about 5h30m or so.

The precise configuration has 2 disks on one LSI controller as a
mirrored pool (whole disks - no slices). Around 750GB of 1.3TB was
in use when the most recent iscsi target was created. The pool
is read-mostly, so it probably isn't fragmented. The zvol has
copies=1; compression off (no dedup with snv124). The initiator
is VirtualBox running on Fedora C10 on AMD64, and the target disk
has 32-bit Fedora C12 installed as whole disk, which I believe is EFI.

To reproduce this might require setting up a COMSTAR iscsi
target on a mirrored pool, formatting it with an EFI label, and
then running a scrub. Another, similar, target has OpenSolaris
installed on it, and it doesn't seem to cause a panic on a scrub
if it is running; AFAIK it doesn't use EFI, but I have not run
a scrub with it active since converting to COMSTAR either.

This wouldn't explain why one or the other disk randomly goes
offline and it may be a red herring. But the scrub now runs to
completion just as it always has. Since I can't get FC12 to boot
from the EFI disk in VirtualBox, I may reinstall FC12 without
EFI and see if that makes a difference, but it is an extremely
slow process since it takes almost 6 hours for the panic to occur
each time and there's no practical way to relocate the zvol
to the start of the pool.
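
For anyone trying to reproduce this, the recipe above might be sketched
roughly as follows. This is a dry run that only builds and prints the
commands in order; it executes nothing. The pool name `tank`, the zvol name,
the 20g size and the GUID placeholder are assumptions for illustration, not
values from the affected system. The EFI-labelling step happens on the
initiator side, so it appears only as a comment.

```python
# Dry-run sketch: prints the reproduction commands, does not execute them.
# POOL, VOLUME, SIZE and LU_GUID are hypothetical placeholders.
POOL = "tank"
VOLUME = f"{POOL}/iscsivol"
SIZE = "20g"
LU_GUID = "<GUID printed by sbdadm create-lu>"


def repro_commands(pool=POOL, volume=VOLUME, size=SIZE, guid=LU_GUID):
    """Return the command lines, in order, as lists of arguments."""
    return [
        # Create a zvol on the mirrored pool to back the iSCSI LUN.
        ["zfs", "create", "-V", size, volume],
        # Register the zvol as a COMSTAR logical unit.
        ["sbdadm", "create-lu", f"/dev/zvol/rdsk/{volume}"],
        # Expose the logical unit to initiators.
        ["stmfadm", "add-view", guid],
        # Create an iSCSI target.
        ["itadm", "create-target"],
        # Manual step here: attach an initiator, write an EFI label to the
        # LUN, and keep the initiator active while the scrub runs.
        ["zpool", "scrub", pool],
        ["zpool", "status", pool],
    ]


if __name__ == "__main__":
    for argv in repro_commands():
        print(" ".join(argv))
```

Given the timings reported here, the scrub would need to run for several
hours with the initiator active before any panic could be expected.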

HTH -- Frank






[zfs-discuss] Panic running a scrub

2010-01-19 Thread Frank Middleton

This is probably unreproducible, but I just got a panic whilst
scrubbing a simple mirrored pool on SXCE snv124. Evidently
one of the disks went offline for some reason and shortly
thereafter the panic happened. I have the dump and the
/var/adm/messages containing the trace.

Is there any point in submitting a bug report?

The panic starts with:

Jan 19 13:27:13 host6 ^Mpanic[cpu1]/thread=2a1009f5c80:
Jan 19 13:27:13 host6 unix: [ID 403854 kern.notice] assertion failed: 0 == 
zap_update(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_SCRUB_BOOKMARK, 
sizeof (uint64_t), 4, dp->dp_scrub_bookmark, tx), file: 
../../common/fs/zfs/dsl_scrub.c, line: 853

FWIW when the system came back up, it resilvered with no
problem and now I'm rerunning the scrub.



Re: [zfs-discuss] Panic running a scrub

2010-01-19 Thread Bob Friesenhahn

On Tue, 19 Jan 2010, Frank Middleton wrote:


This is probably unreproducible, but I just got a panic whilst
scrubbing a simple mirrored pool on SXCE snv124. Evidently
one of the disks went offline for some reason and shortly
thereafter the panic happened. I have the dump and the
/var/adm/messages containing the trace.

Is there any point in submitting a bug report?


I seem to recall that you are not using ECC memory.  If so, maybe the 
panic is a good thing.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Panic running a scrub

2010-01-19 Thread David Magda

On Jan 19, 2010, at 14:30, Frank Middleton wrote:


This is probably unreproducible, but I just got a panic whilst
scrubbing a simple mirrored pool on SXCE snv124. Evidently
one of the disks went offline for some reason and shortly
thereafter the panic happened. I have the dump and the
/var/adm/messages containing the trace.

Is there any point in submitting a bug report?


Was a crash dump generated? If so, I would guess there's a chance that 
it can be tracked down.
