Re: [zfs-discuss] resilver that never finishes
Hi,

> The drives and the chassis are fine, what I am questioning is how can it be resilvering more data to a device than the capacity of the device?

If data on the pool has changed during the resilver, the resilver counter will not update accordingly, and it will show 100% for as long as the resilver needs to catch up.

Yours
Markus Kovero
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] resilver that never finishes
On 19 September, 2010 - Markus Kovero sent me these 0,5K bytes:

> Hi,
>
>> The drives and the chassis are fine, what I am questioning is how can it be resilvering more data to a device than the capacity of the device?
>
> If data on pool has changed during resilver, resilver counter will not update accordingly, and it will show resilvering 100% for needed time to catch up.

I believe this was fixed recently, by displaying how many blocks it has checked vs how many to check...

/Tomas
--
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
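The behaviour described in this message can be illustrated with a small Python sketch. It is a toy model only: it assumes the percentage is computed against an estimate taken when the resilver started and is clamped at 100, which is a guess at the mechanism and not something the thread confirms. The function name and the numbers are hypothetical.

```python
def displayed_percent(examined, estimate_at_start):
    """Percent done, clamped at 100, against an estimate fixed at the start.

    Assumption for illustration: the estimate is not revised while the
    pool keeps changing, so the work done can exceed it.
    """
    if estimate_at_start <= 0:
        return 100.0
    return min(100.0, 100.0 * examined / estimate_at_start)


# Hypothetical figures, in arbitrary units of data examined.
halfway = displayed_percent(examined=250, estimate_at_start=500)
overrun = displayed_percent(examined=700, estimate_at_start=500)
print(halfway, overrun)
```

In this model the display stays pinned at 100 once the examined amount passes the original estimate, even though work is still being done. Reporting the two raw counters instead, as suggested above, would avoid hiding the overrun.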
Re: [zfs-discuss] resilver that never finishes
On 18/09/10 15:25, George Wilson wrote:
> Tom Bird wrote:
>> In my case, other than an hourly snapshot, the data is not significantly changing.
>>
>> It'd be nice to see a response other than "you're doing it wrong". Rebuilding 5x the data on a drive relative to its capacity is clearly erratic behaviour, and I am curious as to what is actually happening.
>
> It sounds like you're hitting '6891824 7410 NAS head continually resilvering following HDD replacement'. If you stop taking and destroying snapshots you should see the resilver finish.

George, I think you've won the prize.

I suspended the snapshots last night, and this morning one pool had completed, with one left to go.

Thanks,
Tom
Re: [zfs-discuss] resilver that never finishes
Hi all

One of our systems just developed something remotely similar:

s06:~# zpool status
  pool: atlashome
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 67h18m, 100.00% done, 0h0m to go
config:

        NAME              STATE     READ WRITE CKSUM
        atlashome         DEGRADED     0     0     0
          raidz2-0        DEGRADED     0     0     0
            c0t0d0        ONLINE       0     0     0
            c1t0d0        ONLINE       0     0     0
            c5t0d0        ONLINE       0     0     0
            replacing-3   DEGRADED     0     0     0
              c7t0d0s0/o  FAULTED      0     0     0  corrupted data
              c7t0d0      ONLINE       0     0     0  678G resilvered
[...]

It has been 100% done for more than a day now. The system is running fully patched Solaris 10 (patchref from September 10th or 13th, I believe).

Does anyone have an idea how it is possible to resilver 678G of data on a 500G drive?

s06:~# iostat -En c7t0d0
c7t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: HITACHI HDS7250S Revision: AV0A Serial No:
Size: 500.11GB <500107861504 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 197 Predictive Failure Analysis: 0

I still have to upgrade the zpool version, but wanted to wait for the resilver to complete.

Any ideas?

Cheers

Carsten
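As a quick sanity check on the two figures in this message, here is a minimal Python sketch that compares the amount reported as resilvered with the capacity of the drive. The helper name is hypothetical, and the reading of the `G` suffix as 2^30 bytes is an assumption; the byte count is the one printed by `iostat -En`.

```python
import re

# Assumption: size suffixes in zpool status are powers of 1024.
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def resilvered_bytes(status_line):
    """Return the resilvered byte count from a device line, or None if absent."""
    m = re.search(r"(\d+(?:\.\d+)?)([KMGT])\s+resilvered", status_line)
    if m is None:
        return None
    return float(m.group(1)) * UNITS[m.group(2)]


line = "c7t0d0 ONLINE 0 0 0 678G resilvered"
capacity = 500107861504  # bytes, as reported by iostat -En for c7t0d0
ratio = resilvered_bytes(line) / capacity
print(f"resilvered / capacity = {ratio:.2f}")
```

Under that assumption the ratio comes out well above 1, so the counter is tracking data written during the resilver and not data resident on the new drive.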
Re: [zfs-discuss] resilver that never finishes
On 09/18/10 06:47 PM, Carsten Aulbert wrote:
> Hi all
>
> one of our system just developed something remotely similar:
>
> s06:~# zpool status
>   pool: atlashome
>  state: DEGRADED
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scrub: resilver in progress for 67h18m, 100.00% done, 0h0m to go
> config:
>
>         NAME              STATE     READ WRITE CKSUM
>         atlashome         DEGRADED     0     0     0
>           raidz2-0        DEGRADED     0     0     0
>             c0t0d0        ONLINE       0     0     0
>             c1t0d0        ONLINE       0     0     0
>             c5t0d0        ONLINE       0     0     0
>             replacing-3   DEGRADED     0     0     0
>               c7t0d0s0/o  FAULTED      0     0     0  corrupted data
>               c7t0d0      ONLINE       0     0     0  678G resilvered
> [...]
>
> It's 100% done for more than a day now, system is running fully patched Solaris 10 (patchref from September 10th or 13th I believe)
>
> Has someone an idea how it is possible to resilver 678G of data on a 500G drive?

I see this all the time on a troublesome Thumper. I believe this happens because the data in the pool is continuously changing.

--
Ian.
Re: [zfs-discuss] resilver that never finishes
Hi

On Saturday 18 September 2010 10:02:42 Ian Collins wrote:
> I see this all the time on a troublesome Thumper. I believe this happens because the data in the pool is continuously changing.

Ah ok, that may be, there is one particular active user on this box right now. Interesting I've never seen this in the past. Is there really an end to this and do I just have to wait?

Cheers

Carsten
Re: [zfs-discuss] resilver that never finishes
On 09/18/10 08:58 PM, Carsten Aulbert wrote:
> Hi
>
> On Saturday 18 September 2010 10:02:42 Ian Collins wrote:
>> I see this all the time on a troublesome Thumper. I believe this happens because the data in the pool is continuously changing.
>
> Ah ok, that may be, there is one particular active user on this box right now. Interesting I've never seen this in the past. Is there really an end to this and do I just have to wait?

Oh yes, the last one I had was 100% done for about 40 hours!

--
Ian.
Re: [zfs-discuss] resilver that never finishes
On 18/09/10 09:02, Ian Collins wrote:
> On 09/18/10 06:47 PM, Carsten Aulbert wrote:
>> Has someone an idea how it is possible to resilver 678G of data on a 500G drive?
>
> I see this all the time on a troublesome Thumper. I believe this happens because the data in the pool is continuously changing.

In my case, other than an hourly snapshot, the data is not significantly changing.

It'd be nice to see a response other than "you're doing it wrong". Rebuilding 5x the data on a drive relative to its capacity is clearly erratic behaviour, and I am curious as to what is actually happening.

All said and done though, we will have to live with snv_134's bugs from now on, or perhaps I could try Sol 10.

Tom
Re: [zfs-discuss] resilver that never finishes
On Sat, Sep 18, 2010 at 7:01 PM, Tom Bird <t...@marmot.org.uk> wrote:
> All said and done though, we will have to live with snv_134's bugs from now on, or perhaps I could try Sol 10.

or OpenIllumos. Or Nexenta. Or FreeBSD. Or <insert osol distro name>.

--
O ascii ribbon campaign - stop html mail - www.asciiribbon.org
Re: [zfs-discuss] resilver that never finishes
On 18/09/10 13:06, Edho P Arief wrote:
> On Sat, Sep 18, 2010 at 7:01 PM, Tom Bird <t...@marmot.org.uk> wrote:
>> All said and done though, we will have to live with snv_134's bugs from now on, or perhaps I could try Sol 10.
>
> or OpenIllumos. Or Nexenta. Or FreeBSD. Or <insert osol distro name>.

... none of which will receive ZFS code updates unless Oracle deigns to bestow them upon the community. Either that, or ZFS dev is taken over by said community, in which case we end up with diverging code bases that would be a Sisyphean task to try and merge.

Tom
Re: [zfs-discuss] resilver that never finishes
But all of which have newer code, today, than onnv-134.

On 18 September 2010 22:20, Tom Bird <t...@marmot.org.uk> wrote:
> On 18/09/10 13:06, Edho P Arief wrote:
>> On Sat, Sep 18, 2010 at 7:01 PM, Tom Bird <t...@marmot.org.uk> wrote:
>>> All said and done though, we will have to live with snv_134's bugs from now on, or perhaps I could try Sol 10.
>>
>> or OpenIllumos. Or Nexenta. Or FreeBSD. Or <insert osol distro name>.
>
> ... none of which will receive ZFS code updates unless Oracle deigns to bestow them upon the community, this or ZFS dev is taken over by said community, in which case we end up with diverging code bases that would be a sisyphean task to try and merge.
>
> Tom
Re: [zfs-discuss] resilver that never finishes
Tom Bird wrote:
> In my case, other than an hourly snapshot, the data is not significantly changing.
>
> It'd be nice to see a response other than you're doing it wrong, rebuilding 5x the data on a drive relative to its capacity is clearly erratic behaviour, I am curious as to what is actually happening.
>
> All said and done though, we will have to live with snv_134's bugs from now on, or perhaps I could try Sol 10.
>
> Tom

It sounds like you're hitting '6891824 7410 NAS head continually resilvering following HDD replacement'. If you stop taking and destroying snapshots you should see the resilver finish.

Thanks,
George
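One way to picture why stopping snapshots could let the resilver finish is the toy Python model below. It assumes that every snapshot event restarts the resilver from zero; the thread does not describe the internals of bug 6891824, so that restart behaviour, the function name, and the 80 GB per hour rate are all illustrative assumptions. The 2000 GB disk corresponds to the 2T drive mentioned elsewhere in the thread.

```python
def simulate_resilver(disk_gb, rate_gb_per_hour, snapshot_interval_h, max_hours):
    """Toy model of a resilver that restarts at every snapshot event.

    Returns (finished, total_gb_written). A snapshot_interval_h of 0
    means no snapshots are taken. The restart-on-snapshot behaviour is
    an assumption for illustration, not confirmed by the thread.
    """
    total_written = 0
    progress = 0  # GB resilvered since the last restart
    for hour in range(1, max_hours + 1):
        step = min(rate_gb_per_hour, disk_gb - progress)
        progress += step
        total_written += step
        if progress >= disk_gb:
            return True, total_written
        if snapshot_interval_h and hour % snapshot_interval_h == 0:
            progress = 0  # restart: earlier work has to be repeated
    return False, total_written


# Hourly snapshots versus none, over a three-day window.
with_snapshots = simulate_resilver(2000, 80, 1, 72)
without_snapshots = simulate_resilver(2000, 80, 0, 72)
print(with_snapshots, without_snapshots)
```

In this model the run with hourly snapshots never completes and still writes several times the capacity of the disk, while the run without snapshots completes after writing the disk once. That matches the symptoms reported here in shape only; it is not a claim about what the code actually does.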
Re: [zfs-discuss] resilver that never finishes
On 09/19/10 12:01 AM, Tom Bird wrote:
> On 18/09/10 09:02, Ian Collins wrote:
>> On 09/18/10 06:47 PM, Carsten Aulbert wrote:
>>> Has someone an idea how it is possible to resilver 678G of data on a 500G drive?
>>
>> I see this all the time on a troublesome Thumper. I believe this happens because the data in the pool is continuously changing.
>
> In my case, other than an hourly snapshot, the data is not significantly changing.
>
> It'd be nice to see a response other than you're doing it wrong, rebuilding 5x the data on a drive relative to its capacity is clearly erratic behaviour, I am curious as to what is actually happening.

The ridiculous pool design isn't helping!

--
Ian.
Re: [zfs-discuss] resilver that never finishes
On Fri, 17 Sep 2010, Tom Bird wrote:
> Morning,
>
> c7t5000CCA221F4EC54d0 is a 2T disk, how can it resilver 5.63T of it?
>
> This is actually an old capture of the status output, it got to nearly 10T before deciding that there was an error and not completing, reseat disk and it's doing it all again.

You have twice as many big slow drives in a raidz2 than any sane person would recommend. It looks like you either have drives which are too weak to sustain resilvering a failed disk, or a chassis which is not strong enough.

Your only option seems to be to also replace c7t5000CCA221DE2225d0 and hope for the best. Expect the replacement to take a very long time.

It is wise to restart the pool from scratch with multiple vdevs comprised of fewer devices.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] resilver that never finishes
On 09/18/10 04:28 AM, Tom Bird wrote:
> Bob Friesenhahn wrote:
>> On Fri, 17 Sep 2010, Tom Bird wrote:
>>> Morning,
>>>
>>> c7t5000CCA221F4EC54d0 is a 2T disk, how can it resilver 5.63T of it?
>>>
>>> This is actually an old capture of the status output, it got to nearly 10T before deciding that there was an error and not completing, reseat disk and it's doing it all again.
>>
>> You have twice as many big slow drives in a raidz2 than any sane person would recommend. It looks like you either have drives which are too weak to sustain resilvering a failed disk, or a chassis which is not strong enough.
>
> The drives and the chassis are fine, what I am questioning is how can it be resilvering more data to a device than the capacity of the device?

Is the pool in use? If so, data will be changing while the resilver is running.

With such a ridiculously wide vdev and large drives, the resilver will take a very, very long time to complete. If the pool is sufficiently busy, it may never complete.

>> Your only option seems to be to also replace c7t5000CCA221DE2225d0 and hope for the best. Expect the replacement to take a very long time.
>>
>> It is wise to restart the pool from scratch with multiple vdevs comprised of fewer devices.
>
> This stuff should just work, if it only rewrote the 2T that was meant to be on the drive the rebuild would take a day or so.

Bob's comments about the pool design are correct; you have a disaster waiting to happen.

--
Ian.
Re: [zfs-discuss] resilver that never finishes
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Tom Bird

We recently had a long discussion on this list about resilver times versus raid types. In the end, the conclusion was: resilver code is very inefficient for raidzN. Someday it may be better optimized, but until that day comes, you really need to break your giant raidzN into smaller vdevs.

3 vdevs of 7-disk raidz are preferable to a single 21-disk raidz3.

If you want this resilver to complete, you should do anything you can to:
(a) stop taking snapshots
(b) avoid scrubbing
(c) stop all IO possible

And be patient.

Most people in your situation find it faster to zfs send to some other storage, and then destroy and recreate the pool. I know it stinks. But that's what you're facing.
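The layout comparison above can be worked through with a little arithmetic. The Python sketch below is a simplification: the function and field names are hypothetical, and it assumes that rebuilding one disk only needs reads from the other members of the same vdev, ignoring any pool-wide metadata traversal.

```python
def layout(vdevs, disks_per_vdev, parity):
    """Summarise a pool built from identical raidz vdevs."""
    return {
        "total_disks": vdevs * disks_per_vdev,
        "data_disks": vdevs * (disks_per_vdev - parity),
        # Simplifying assumption: only the other members of the affected
        # vdev are read when one of its disks is rebuilt.
        "disks_read_per_resilver": disks_per_vdev - 1,
    }


wide = layout(vdevs=1, disks_per_vdev=21, parity=3)    # one 21-disk raidz3
narrow = layout(vdevs=3, disks_per_vdev=7, parity=1)   # three 7-disk raidz
print(wide)
print(narrow)
```

Both layouts use 21 disks and leave 18 disks' worth of space for data, but under this simplification a rebuild in the narrow layout involves 6 other disks where the wide layout involves 20. The trade-off is that single-parity raidz tolerates only one failure per vdev, whereas raidz3 tolerates any three.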