On 26/09/2014 00:46, Andrew M. Hettinger wrote:
I'm presently running tests on a pool using 3x Samsung 850 SSDs on an LSI-9211-8i (IT) controller. I thought I'd try separating the intent log to see if lowering the write amplification on the pool drives would help, so I added another matching SSD for that, but under load I still seem to get extensive checksum errors. Does anyone have any ideas as to what would be causing this?

  pool: test-array
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 0h0m with 0 errors on Wed Sep 24 18:20:56 2014
config:

        NAME                       STATE     READ WRITE CKSUM
        test-array                 DEGRADED     0     0     0
          mirror-0                 DEGRADED     0     0     0
            c0t50025388700060D4d0  DEGRADED     0     0   155  too many errors
            c0t50025388700060AEd0  DEGRADED     0     0   149  too many errors
            c0t50025388700060C2d0  DEGRADED     0     0   174  too many errors
        logs
          c0t50025388A067DBE9d0    ONLINE       0     0     0

errors: No known data errors

  ---- errors ---
  s/w h/w trn tot device
    0   2   6   8 c0t50025388700060D4d0
    0   0   0   0 c0t50025388700060AEd0
    0   0   0   0 c0t50025388700060C2d0
    0   0   0   0 c0t50025388A067DBE9d0
Transport errors could point to bad cabling. The 850s are very new, so I wouldn't rule out firmware problems either.

But it could also be that you are seeing another instance of a mysterious possible bug when scrubbing mirrors. I have an SSD rpool myself (Supertalent SATA II) and have nearly always gotten these errors when scrubbing it, since day 1 and regardless of firmware, though in the range of 10, not 150. I never get them with ordinary disks, so maybe the speed is a factor in triggering the problem. I tried to hunt it down, but that's really difficult if you're not a kernel developer. I suspect that an overlapping interrupt with the USB system may play a role here, but that's just speculation. I saw a similar post about rpool mirrors a couple of years ago, and it also led to no conclusion.

Maybe the number of errors you see gives a better opportunity to hunt down the source of the problem with some clever dtrace scripts, but for that I would recommend switching over to the illumos-developer list, where those experts are lurking.
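
As a starting point for narrowing this down, the per-device error counters quoted above can be checked mechanically to see which devices report transport errors at all. The following is only a rough Python sketch: the sample text is copied from the table in this thread, the column order (s/w, h/w, trn, tot, device) is assumed from its header line, and the function names are made up for illustration.

```python
# Sketch: flag devices whose transport-error counter is non-zero.
# SAMPLE is copied from the table quoted in this thread; the column order
# (s/w, h/w, trn, tot, device) is an assumption based on its header line.
SAMPLE = """\
s/w h/w trn tot device
0 2 6 8 c0t50025388700060D4d0
0 0 0 0 c0t50025388700060AEd0
0 0 0 0 c0t50025388700060C2d0
0 0 0 0 c0t50025388A067DBE9d0
"""


def parse_error_table(text):
    """Return one dict per device row; header and other lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row is four integer counters followed by a device name.
        if len(fields) != 5 or not all(f.isdigit() for f in fields[:4]):
            continue
        sw, hw, trn, tot = (int(f) for f in fields[:4])
        rows.append(
            {"device": fields[4], "sw": sw, "hw": hw, "trn": trn, "tot": tot}
        )
    return rows


def devices_with_transport_errors(rows):
    """Hypothetical helper: names of devices with a non-zero trn counter."""
    return [r["device"] for r in rows if r["trn"] > 0]


if __name__ == "__main__":
    for dev in devices_with_transport_errors(parse_error_table(SAMPLE)):
        print(f"{dev}: transport errors reported, check cabling first")
```

With the quoted numbers, only one of the four devices has a non-zero transport counter, while all three mirror members show checksum errors, so cabling on that one port may not be the whole story.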
_______________________________________________ oi-dev mailing list [email protected] http://openindiana.org/mailman/listinfo/oi-dev
