Re: [zfs-discuss] possible ZFS-related panic?

2010-09-03 Thread Cindy Swearingen

Hi Marion,

I'm not the right person to analyze your panic stack, but a quick
search suggests the "page_sub: bad arg(s)" panic string may be
associated with a faulty CPU or a page-locking problem.

I would recommend running CPU/memory diagnostics on this system.
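For example, on Solaris 10 you could start by checking the fault manager
and platform diagnostics (just a sketch; exact output depends on the
platform):

```shell
# List any faults the fault manager has already diagnosed (CPU, memory, etc.):
fmadm faulty

# Dump FMA error telemetry and look for CPU or memory error reports:
fmdump -eV | grep -i 'cpu\|mem'

# Platform hardware summary; on a Thumper/X4500 this includes CPU and DIMM info:
prtdiag -v
```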


Thanks,

Cindy

On 09/02/10 20:31, Marion Hakanson wrote:

Folks,

Has anyone seen a panic traceback like the following?  This is Solaris-10u7
on a Thumper, acting as an NFS server.  The machine was up for nearly a
year, I added a dataset to an existing pool, set compression=on for the
first time on this system, loaded some data in there (via rsync),
then mounted it to the NFS client.
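For reference, the sequence was roughly the following (the pool and
dataset names here are made up; the actual names differ):

```shell
# Create the new dataset in the existing pool and enable compression
# (the first use of compression on this system):
zfs create tank/newdata
zfs set compression=on tank/newdata

# Load the initial data, then export the dataset to the NFS client:
rsync -a /export/olddata/ /tank/newdata/
zfs set sharenfs=on tank/newdata
```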

The first data was written by the client itself in a 10pm cron-job, and
the system crashed at 10:02pm as below:

panic[cpu2]/thread=fe8000f5cc60: page_sub: bad arg(s): pp 872b5610, *ppp 0


fe8000f5c470 unix:mutex_exit_critical_size+20219 ()
fe8000f5c4b0 unix:page_list_sub_pages+161 ()
fe8000f5c510 unix:page_claim_contig_pages+190 ()
fe8000f5c600 unix:page_geti_contig_pages+44b ()
fe8000f5c660 unix:page_get_contig_pages+c2 ()
fe8000f5c6f0 unix:page_get_freelist+1a4 ()
fe8000f5c760 unix:page_create_get_something+95 ()
fe8000f5c7f0 unix:page_create_va+2a1 ()
fe8000f5c850 unix:segkmem_page_create+72 ()
fe8000f5c8b0 unix:segkmem_xalloc+60 ()
fe8000f5c8e0 unix:segkmem_alloc_vn+8a ()
fe8000f5c8f0 unix:segkmem_alloc+10 ()
fe8000f5c9c0 genunix:vmem_xalloc+315 ()
fe8000f5ca20 genunix:vmem_alloc+155 ()
fe8000f5ca90 genunix:kmem_slab_create+77 ()
fe8000f5cac0 genunix:kmem_slab_alloc+107 ()
fe8000f5caf0 genunix:kmem_cache_alloc+e9 ()
fe8000f5cb00 zfs:zio_buf_alloc+1d ()
fe8000f5cb50 zfs:zio_compress_data+ba ()
fe8000f5cba0 zfs:zio_write_compress+78 ()
fe8000f5cbc0 zfs:zio_execute+60 ()
fe8000f5cc40 genunix:taskq_thread+bc ()
fe8000f5cc50 unix:thread_start+8 ()

syncing file systems... done
. . .

Unencumbered by more than a gut feeling, I disabled compression on
the dataset, and we've gotten through two nightly runs of the same
NFS client job without crashing.  Of course, we would technically
have to wait nearly a year before we've exactly replicated the
original situation (:-).
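The workaround amounts to the following (again with a made-up dataset
name).  Note that turning compression off only affects new writes;
blocks already written remain compressed on disk:

```shell
# Disable compression on the suspect dataset:
zfs set compression=off tank/newdata

# Confirm the property took effect:
zfs get compression tank/newdata
```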

Unfortunately the dump-slice was slightly too small; we were just short
of enough space to capture the whole 10GB crash dump.  I did get savecore
to write something out, and I uploaded it to the Oracle support site, but it
gives scat too much indigestion to be useful to the engineer I'm working
with.  They have not found any matching bugs so far, so I thought I'd ask a
slightly wider audience here.

Thanks and regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



[zfs-discuss] possible ZFS-related panic?

2010-09-02 Thread Marion Hakanson