[zfs-discuss] test
I have not seen any email from this list in a couple of days.
[zfs-discuss] zpool scrub on b123
Hi,

One of our zfs volumes seems to be having some errors, so I ran a zpool scrub and it's currently showing the following:

-bash-3.2$ pfexec /usr/sbin/zpool status -x
  pool: vdipool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 3h10m, 13.53% done, 20h16m to go
config:

        NAME         STATE     READ WRITE CKSUM
        vdipool      ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c9t14d0  ONLINE       0     0    12  6K repaired
            c9t15d0  ONLINE       0     0    13  167K repaired
            c9t16d0  ONLINE       0     0    11  5.50K repaired
            c9t17d0  ONLINE       0     0    20  10K repaired
            c9t18d0  ONLINE       0     0    15  7.50K repaired
        spares
          c9t19d0    AVAIL

errors: No known data errors

I have another server connected to the same JBOD using drives c8t1d0 to c8t13d0, and it doesn't seem to have any errors. I'm wondering how this pool could have gotten so screwed up?

Karl
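A rough way to check whether the errors are also visible at the drive level, rather than only as ZFS checksum failures, is to look at the verbose pool status and the per-device error counters. This is only a sketch, assuming the pool name vdipool from the output above:

    # Verbose status: device tree plus any files affected by data errors
    $ pfexec /usr/sbin/zpool status -v vdipool

    # Per-device soft/hard/transport error counters as seen by the disk driver
    $ /usr/bin/iostat -En

If transport errors are climbing on all of the c9 drives at once, that would tend to point away from the disks themselves and toward a shared path such as the controller or cabling.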
Re: [zfs-discuss] zpool scrub on b123
Hi Karl,

I just saw this same condition on another list. I think the poster resolved it by replacing the HBA. Drives go bad, but they generally don't all go bad at once, so I would suspect some common denominator like the HBA/controller, cables, and so on.

See what FMA thinks by running fmdump, like this:

# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Apr 11 16:02:38.2262 ed0bdffe-3cf9-6f46-f20c-99e2b9a6f1cb ZFS-8000-D3
Apr 11 16:22:23.8401 d4157e2f-c46d-c1e9-c05b-f2d3e57f3893 ZFS-8000-D3
Apr 14 15:55:26.1918 71bd0b08-60c2-e114-e1bc-daa03d7b163f ZFS-8000-D3

This output will tell you when the problem started. If fmdump points at multiple drives, as it probably will, I would run diagnostics on the HBA or get it replaced.

Always have good backups.

Thanks,
Cindy
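As a rough sketch of digging one level deeper with FMA: the lines above come from the fault log, while the underlying error reports usually carry the device paths that generated them. Something along these lines should show them:

    # List the individual error reports (ereports) with timestamps
    $ pfexec /usr/sbin/fmdump -e

    # Full detail for each ereport, including the device that generated it
    $ pfexec /usr/sbin/fmdump -eV | less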
Re: [zfs-discuss] zpool scrub on b123
D'oh, one more thing. We had a problem in b120-123 that caused random checksum errors on RAIDZ configs. This info is still in the ZFS troubleshooting guide.

See if a zpool clear resolves these errors. If it does, then I would upgrade to a more recent build and see if the problem goes away completely. If not, then see the recommendation about the HBA in my previous message.

Thanks,
Cindy
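The clear-and-verify sequence would look roughly like this, again assuming the pool name vdipool and waiting until the current scrub has finished. If the counters come back after a clean scrub, the b120-123 bug is probably not the whole story:

    # Reset the error counters on the pool
    $ pfexec /usr/sbin/zpool clear vdipool

    # Scrub again and confirm the pool reports healthy
    $ pfexec /usr/sbin/zpool scrub vdipool
    $ pfexec /usr/sbin/zpool status -x vdipool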
Re: [zfs-discuss] zpool scrub on b123
I'm going to wait until the scrub is complete before diving in some more.

I'm wondering if replacing the LSI SAS 3801E with an LSI SAS 9200-8e might help too.

Karl
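While waiting, a throwaway loop like the one below (nothing ZFS-specific, just a sketch) prints the scrub progress line every ten minutes:

    # Poll the scrub status every 600 seconds
    $ while true; do /usr/sbin/zpool status vdipool | grep scrub; sleep 600; done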
Re: [zfs-discuss] zpool scrub on b123
Would moving the pool to a Solaris 10U9 server fix the random RAIDZ errors?
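Mechanically, moving the pool would just be an export on the b123 host and an import on the Solaris 10 box once the JBOD is attached there, provided the pool version is one that release understands. A rough sketch:

    # On the current (b123) host
    $ pfexec /usr/sbin/zpool export vdipool

    # On the Solaris 10 host, after attaching the JBOD
    $ pfexec /usr/sbin/zpool import vdipool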
Re: [zfs-discuss] zpool scrub on b123
Yes, the Solaris 10 9/10 release has the fix for the RAIDZ checksum errors, if you have ruled out any hardware problems.

cs
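One thing worth checking before the move is that the on-disk pool version isn't newer than what the Solaris 10 9/10 release supports, for example:

    # Pool version currently on disk
    $ /usr/sbin/zpool get version vdipool

    # Highest pool version (and the version list) the running system supports
    $ /usr/sbin/zpool upgrade -v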