I don't think this is a hardware issue, although I can't rule it out. I'll 
try to explain why.

1. I've replaced all the memory modules, which are the most likely cause of 
such a problem.

2. There are many different applications running on that server (Apache, 
PostgreSQL, etc.). However, if you look at the four crash dump stack traces, 
you see the same picture every time:

------ crash dump st1 ------
mutex_enter+0xb()
zio_buf_alloc+0x1a()
zio_read+0xba()
spa_scrub_io_start+0xf1()
spa_scrub_cb+0x13d()

------ crash dump st2 ------
mutex_enter+0xb()
zio_buf_alloc+0x1a()
zio_read+0xba()
arc_read+0x3cc()
dbuf_prefetch+0x11d()
dmu_prefetch+0x107()
zfs_readdir+0x408()
fop_readdir+0x34()

------ crash dump st3 ------
mutex_enter+0xb()
zio_buf_alloc+0x1a()
zio_read+0xba()
arc_read+0x3cc()
dbuf_prefetch+0x11d()
dmu_prefetch+0x107()
zfs_readdir+0x408()
fop_readdir+0x34()

------ crash dump st4 ------
mutex_enter+0xb()
zio_buf_alloc+0x1a()
zio_read+0xba()
arc_read+0x3cc()
dbuf_prefetch+0x11d()
dmu_prefetch+0x107()
zfs_readdir+0x408()
fop_readdir+0x34()


All four crash dumps show the problem at zio_read/zio_buf_alloc. Three of them 
occurred during metadata prefetch (dmu_prefetch) and one during scrubbing. I 
don't think that's a coincidence. IMHO, the checksum errors are the result of 
this inconsistency.
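
To make the comparison less dependent on eyeballing, here is a minimal sketch in 
Python that finds the frames shared by all four traces, counting from the 
innermost frame. The frames are copied from the dumps above; the names TRACES 
and common_innermost_frames are just my own for this illustration, not part of 
any Solaris tool.

```python
from itertools import takewhile

# Stack traces copied from the four crash dumps, innermost frame first.
PREFETCH_PATH = [
    "mutex_enter+0xb()",
    "zio_buf_alloc+0x1a()",
    "zio_read+0xba()",
    "arc_read+0x3cc()",
    "dbuf_prefetch+0x11d()",
    "dmu_prefetch+0x107()",
    "zfs_readdir+0x408()",
    "fop_readdir+0x34()",
]

TRACES = {
    "st1": [
        "mutex_enter+0xb()",
        "zio_buf_alloc+0x1a()",
        "zio_read+0xba()",
        "spa_scrub_io_start+0xf1()",
        "spa_scrub_cb+0x13d()",
    ],
    "st2": list(PREFETCH_PATH),
    "st3": list(PREFETCH_PATH),
    "st4": list(PREFETCH_PATH),
}


def common_innermost_frames(traces):
    """Return the leading frames that are identical in every trace."""
    # zip(*traces) walks the traces frame by frame and stops at the shortest one.
    columns = zip(*traces)
    # Keep going while every trace has the same frame at this depth.
    shared = takewhile(lambda column: len(set(column)) == 1, columns)
    return [column[0] for column in shared]


common = common_innermost_frames(list(TRACES.values()))

# The first frame after the shared part shows which code path called zio_read.
callers = {name: frames[len(common)] for name, frames in TRACES.items()}

print("shared frames:", common)
for name, caller in callers.items():
    print(name, "reached zio_read from", caller)
```

The shared part is the three frames mutex_enter, zio_buf_alloc and zio_read, 
with identical offsets in all four dumps, while the callers differ (scrub in 
st1, prefetch in st2 through st4).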

I tend to think that the problem is in ZFS and that it exists even in the 
latest Solaris version (maybe in OpenSolaris as well).


> 
> Lots of CKSUM errors like you see is often indicative
> of bad hardware. Run 
> memtest for 24-48 hours.
> 
> -marc
 
 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
