Re: [zfs-discuss] resilver that never finishes

2010-09-19 Thread Markus Kovero
Hi, 

> The drives and the chassis are fine, what I am questioning is how can it
> be resilvering more data to a device than the capacity of the device?

If data on the pool changes during a resilver, the resilver counter is not
updated accordingly, and it will show 100% resilvered for as long as it needs
to catch up.
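
A toy model may help show how a progress display can sit at 100% while work continues. The Python sketch below is not ZFS code: it assumes the total is estimated once when the resilver starts, that new writes add to the work remaining, and that the displayed percentage is capped at 100. All names and numbers are hypothetical.

```python
def displayed_percent(done, estimate):
    """Percentage shown to the user, capped at 100 (assumed behaviour)."""
    return min(100.0, 100.0 * done / estimate)


def simulate(initial, new_per_step, rate_per_step, max_steps=1000):
    """Return a list of (done, shown_percent, remaining) tuples, one per step.

    initial       -- work known when the resilver starts (arbitrary units)
    new_per_step  -- work added by pool changes during each step
    rate_per_step -- work the resilver can complete in each step
    """
    estimate = initial      # assumption: the estimate is never refreshed
    remaining = initial
    done = 0
    history = []
    for _ in range(max_steps):
        remaining += new_per_step            # the pool changed meanwhile
        work = min(remaining, rate_per_step)
        remaining -= work
        done += work
        history.append((done, displayed_percent(done, estimate), remaining))
        if remaining == 0:
            break
    return history


# Hypothetical numbers: 500 units to start, 30 new units per step,
# 100 units resilvered per step.
history = simulate(initial=500, new_per_step=30, rate_per_step=100)
for step, (done, shown, remaining) in enumerate(history, start=1):
    print(f"step {step}: done={done} shown={shown:.0f}% remaining={remaining}")
```

With these made-up numbers the display reaches 100% at step 5 while 150 units remain, and the run ends after 740 units have been written against an initial estimate of 500.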

Yours
Markus Kovero

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver that never finishes

2010-09-19 Thread Tomas Ögren
On 19 September, 2010 - Markus Kovero sent me these 0,5K bytes:

> Hi,
>
> > The drives and the chassis are fine, what I am questioning is how can it
> > be resilvering more data to a device than the capacity of the device?
>
> If data on pool has changed during resilver, resilver counter will not
> update accordingly, and it will show resilvering 100% for needed time
> to catch up.

I believe this was fixed recently, by displaying how many blocks it has
checked vs how many to check...

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re: [zfs-discuss] resilver that never finishes

2010-09-19 Thread Tom Bird

On 18/09/10 15:25, George Wilson wrote:
> Tom Bird wrote:
> > In my case, other than an hourly snapshot, the data is not
> > significantly changing.
> >
> > It'd be nice to see a response other than you're doing it wrong,
> > rebuilding 5x the data on a drive relative to its capacity is clearly
> > erratic behaviour, I am curious as to what is actually happening.
>
> It sounds like you're hitting '6891824 7410 NAS head continually
> resilvering following HDD replacement'. If you stop taking and
> destroying snapshots you should see the resilver finish.


George, I think you've won the prize.  I suspended the snapshots last 
night and this morning one pool had completed, one left to go.


Thanks,

Tom


Re: [zfs-discuss] resilver that never finishes

2010-09-18 Thread Carsten Aulbert
Hi all

one of our systems just developed something remotely similar:


s06:~# zpool status
  pool: atlashome
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 67h18m, 100.00% done, 0h0m to go
config:

        NAME              STATE     READ WRITE CKSUM
        atlashome         DEGRADED     0     0     0
          raidz2-0        DEGRADED     0     0     0
            c0t0d0        ONLINE       0     0     0
            c1t0d0        ONLINE       0     0     0
            c5t0d0        ONLINE       0     0     0
            replacing-3   DEGRADED     0     0     0
              c7t0d0s0/o  FAULTED      0     0     0  corrupted data
              c7t0d0      ONLINE       0     0     0  678G resilvered

[...]

It has been 100% done for more than a day now. The system is running fully
patched Solaris 10 (patchref from September 10th or 13th, I believe).

Does anyone have an idea how it is possible to resilver 678G of data on a 500G
drive?

s06:~# iostat -En c7t0d0
c7t0d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: ATA  Product: HITACHI HDS7250S Revision: AV0A Serial No:  
Size: 500.11GB 500107861504 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 197 Predictive Failure Analysis: 0 
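
The mismatch between the two outputs above can be put as a single ratio. The Python sketch below uses only the figures shown (678G resilvered, 500107861504 bytes of capacity) and assumes that the "G" suffix in the zpool output means 2^30 bytes; the function name is hypothetical.

```python
# Figures copied from the zpool status and iostat output above.
RESILVERED_G = 678              # "678G resilvered"
DEVICE_BYTES = 500107861504     # "Size: 500.11GB 500107861504 bytes"

GIB = 2 ** 30  # assumption: zpool's "G" suffix means 2^30 bytes


def resilver_ratio(resilvered_g, device_bytes):
    """Return resilvered bytes divided by the raw capacity of the device."""
    return (resilvered_g * GIB) / device_bytes


ratio = resilver_ratio(RESILVERED_G, DEVICE_BYTES)
print(f"resilvered / capacity = {ratio:.2f}")
```

Under that assumption the resilver has already written about 1.46 times the raw capacity of the replacement disk, so the counter cannot be counting unique data that ends up on the device.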

I still have to upgrade the zpool version, but wanted to wait for the resilver 
to complete.

Any ideas?

Cheers

Carsten


Re: [zfs-discuss] resilver that never finishes

2010-09-18 Thread Ian Collins

On 09/18/10 06:47 PM, Carsten Aulbert wrote:

> Hi all
>
> one of our system just developed something remotely similar:
>
> s06:~# zpool status
>   pool: atlashome
>  state: DEGRADED
> status: One or more devices is currently being resilvered.  The pool will
>         continue to function, possibly in a degraded state.
> action: Wait for the resilver to complete.
>  scrub: resilver in progress for 67h18m, 100.00% done, 0h0m to go
> config:
>
>         NAME              STATE     READ WRITE CKSUM
>         atlashome         DEGRADED     0     0     0
>           raidz2-0        DEGRADED     0     0     0
>             c0t0d0        ONLINE       0     0     0
>             c1t0d0        ONLINE       0     0     0
>             c5t0d0        ONLINE       0     0     0
>             replacing-3   DEGRADED     0     0     0
>               c7t0d0s0/o  FAULTED      0     0     0  corrupted data
>               c7t0d0      ONLINE       0     0     0  678G resilvered
>
> [...]
>
> It's 100% done for more than a day now, system is running fully patched
> Solaris 10 (patchref from September 10th or 13th I believe)
>
> Has someone an idea how it is possible to resilver 678G of data on a 500G
> drive?
   


I see this all the time on a troublesome Thumper.  I believe this 
happens because the data in the pool is continuously changing.


--
Ian.



Re: [zfs-discuss] resilver that never finishes

2010-09-18 Thread Carsten Aulbert
Hi

On Saturday 18 September 2010 10:02:42 Ian Collins wrote:
 
> I see this all the time on a troublesome Thumper.  I believe this
> happens because the data in the pool is continuously changing.

Ah OK, that may be it; there is one particularly active user on this box right now.

Interesting, I've never seen this in the past.

Is there really an end to this, and do I just have to wait?

Cheers

Carsten


Re: [zfs-discuss] resilver that never finishes

2010-09-18 Thread Ian Collins

On 09/18/10 08:58 PM, Carsten Aulbert wrote:

> Hi
>
> On Saturday 18 September 2010 10:02:42 Ian Collins wrote:
> > I see this all the time on a troublesome Thumper.  I believe this
> > happens because the data in the pool is continuously changing.
>
> Ah ok, that may be, there is one particular active user on this box right now.
>
> Interesting I've never seen this in the past.
>
> Is there really an end to this and do I just have to wait?

   

Oh yes, the last one I had was 100% done for about 40 hours!

--
Ian.



Re: [zfs-discuss] resilver that never finishes

2010-09-18 Thread Tom Bird

On 18/09/10 09:02, Ian Collins wrote:
> On 09/18/10 06:47 PM, Carsten Aulbert wrote:
> > Has someone an idea how it is possible to resilver 678G of data on a 500G
> > drive?
>
> I see this all the time on a troublesome Thumper. I believe this happens
> because the data in the pool is continuously changing.


In my case, other than an hourly snapshot, the data is not significantly 
changing.


It'd be nice to see a response other than "you're doing it wrong". 
Rebuilding 5x the data on a drive relative to its capacity is clearly 
erratic behaviour, and I am curious as to what is actually happening.


All said and done though, we will have to live with snv_134's bugs from 
now on, or perhaps I could try Sol 10.


Tom


Re: [zfs-discuss] resilver that never finishes

2010-09-18 Thread Edho P Arief
On Sat, Sep 18, 2010 at 7:01 PM, Tom Bird <t...@marmot.org.uk> wrote:
> All said and done though, we will have to live with snv_134's bugs from now
> on, or perhaps I could try Sol 10.

or OpenIllumos. Or Nexenta. Or FreeBSD. Or <insert osol distro name>.

-- 
O ascii ribbon campaign - stop html mail - www.asciiribbon.org


Re: [zfs-discuss] resilver that never finishes

2010-09-18 Thread Tom Bird

On 18/09/10 13:06, Edho P Arief wrote:
> On Sat, Sep 18, 2010 at 7:01 PM, Tom Bird <t...@marmot.org.uk> wrote:
> > All said and done though, we will have to live with snv_134's bugs from now
> > on, or perhaps I could try Sol 10.
>
> or OpenIllumos. Or Nexenta. Or FreeBSD. Or <insert osol distro name>.


... none of which will receive ZFS code updates unless Oracle deigns to 
bestow them upon the community. Either that, or ZFS dev is taken over by 
said community, in which case we end up with diverging code bases that 
would be a Sisyphean task to try to merge.


Tom


Re: [zfs-discuss] resilver that never finishes

2010-09-18 Thread taemun
But all of which have newer code, today, than onnv-134.

On 18 September 2010 22:20, Tom Bird <t...@marmot.org.uk> wrote:

> On 18/09/10 13:06, Edho P Arief wrote:
> > On Sat, Sep 18, 2010 at 7:01 PM, Tom Bird <t...@marmot.org.uk> wrote:
> > > All said and done though, we will have to live with snv_134's bugs from
> > > now on, or perhaps I could try Sol 10.
> >
> > or OpenIllumos. Or Nexenta. Or FreeBSD. Or <insert osol distro name>.
>
> ... none of which will receive ZFS code updates unless Oracle deigns to
> bestow them upon the community, this or ZFS dev is taken over by said
> community, in which case we end up with diverging code bases that would be a
> sisyphean task to try and merge.
>
> Tom



Re: [zfs-discuss] resilver that never finishes

2010-09-18 Thread George Wilson

Tom Bird wrote:
> On 18/09/10 09:02, Ian Collins wrote:
>
> In my case, other than an hourly snapshot, the data is not significantly
> changing.
>
> It'd be nice to see a response other than you're doing it wrong,
> rebuilding 5x the data on a drive relative to its capacity is clearly
> erratic behaviour, I am curious as to what is actually happening.
>
> All said and done though, we will have to live with snv_134's bugs from
> now on, or perhaps I could try Sol 10.
>
> Tom



It sounds like you're hitting '6891824 7410 NAS head continually 
resilvering following HDD replacement'. If you stop taking and 
destroying snapshots you should see the resilver finish.


Thanks,
George


Re: [zfs-discuss] resilver that never finishes

2010-09-18 Thread Ian Collins

On 09/19/10 12:01 AM, Tom Bird wrote:
> On 18/09/10 09:02, Ian Collins wrote:
> > On 09/18/10 06:47 PM, Carsten Aulbert wrote:
> > > Has someone an idea how it is possible to resilver 678G of data on a
> > > 500G drive?
> >
> > I see this all the time on a troublesome Thumper. I believe this happens
> > because the data in the pool is continuously changing.
>
> In my case, other than an hourly snapshot, the data is not
> significantly changing.
>
> It'd be nice to see a response other than you're doing it wrong,
> rebuilding 5x the data on a drive relative to its capacity is clearly
> erratic behaviour, I am curious as to what is actually happening.



The ridiculous pool design isn't helping!

--
Ian.



Re: [zfs-discuss] resilver that never finishes

2010-09-17 Thread Bob Friesenhahn

On Fri, 17 Sep 2010, Tom Bird wrote:


> Morning,
>
> c7t5000CCA221F4EC54d0 is a 2T disk, how can it resilver 5.63T of it?
>
> This is actually an old capture of the status output, it got to nearly 10T
> before deciding that there was an error and not completing, reseat disk and
> it's doing it all again.


You have twice as many big slow drives in a raidz2 than any sane 
person would recommend.  It looks like you either have drives which 
are too weak to sustain resilvering a failed disk, or a chassis which 
is not strong enough.


Your only option seems to be to also replace c7t5000CCA221DE2225d0 and 
hope for the best.  Expect the replacement to take a very long time.


It is wise to restart the pool from scratch with multiple vdevs 
comprised of fewer devices.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] resilver that never finishes

2010-09-17 Thread Ian Collins

On 09/18/10 04:28 AM, Tom Bird wrote:
> Bob Friesenhahn wrote:
> > On Fri, 17 Sep 2010, Tom Bird wrote:
> > > Morning,
> > >
> > > c7t5000CCA221F4EC54d0 is a 2T disk, how can it resilver 5.63T of it?
> > >
> > > This is actually an old capture of the status output, it got to
> > > nearly 10T before deciding that there was an error and not
> > > completing, reseat disk and it's doing it all again.
> >
> > You have twice as many big slow drives in a raidz2 that any sane
> > person would recommend.  It looks like you either have drives which
> > are too weak to sustain resilvering a failed disk, or a chassis which
> > is not strong enough.
>
> The drives and the chassis are fine, what I am questioning is how can
> it be resilvering more data to a device than the capacity of the
> device?


Is the pool in use?  If so, data will be changing while the resilver is 
running.  With such a ridiculously wide vdev and large drives, the 
resilver will take a very, very long time to complete.  If the pool is 
sufficiently busy, it may never complete.


> > Your only option seems to be to also replace c7t5000CCA221DE2225d0
> > and hope for the best.  Expect the replacement to take a very long time.
> >
> > It is wise to restart the pool from scratch with multiple vdevs
> > comprised of fewer devices.
>
> This stuff should just work, if it only rewrote the 2T that was meant
> to be on the drive the rebuild would take a day or so.


Bob's comments about the pool design are correct, you have a disaster 
waiting to happen.


--
Ian.



Re: [zfs-discuss] resilver that never finishes

2010-09-17 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org
> [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Tom Bird

We recently had a long discussion on this list about resilver times versus
raid types.  In the end, the conclusion was that the resilver code is very
inefficient for raidzN.  Someday it may be better optimized, but until that
day comes, you really need to break your giant raidzN into smaller vdevs.

3 vdevs of 7-disk raidz are preferable to one 21-disk raidz3.
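
A quick count shows why the two layouts are comparable in capacity but not in resilver width. The Python sketch below treats "raidz" as single parity and counts whole disks only; it ignores padding and metadata overhead, and it assumes that rebuilding one disk reads from the surviving disks of the same vdev only. The helper names are hypothetical.

```python
def data_disks(vdevs, disks_per_vdev, parity):
    """Disks' worth of usable space, ignoring allocation overhead."""
    return vdevs * (disks_per_vdev - parity)


def disks_read_per_resilver(disks_per_vdev):
    """Surviving disks in the affected vdev when one disk is replaced."""
    return disks_per_vdev - 1


# (vdev count, disks per vdev, parity disks per vdev)
layouts = {
    "3 x 7-disk raidz": (3, 7, 1),
    "1 x 21-disk raidz3": (1, 21, 3),
}

for name, (vdevs, width, parity) in layouts.items():
    print(
        f"{name}: {data_disks(vdevs, width, parity)} data disks, "
        f"{disks_read_per_resilver(width)} disks read per resilver"
    )
```

Both layouts come out at 18 data disks, but a replacement in the narrow layout involves 6 surviving disks rather than 20.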

If you want this resilver to complete, you should do anything you can to (a)
stop taking snapshots, (b) avoid scrubs, and (c) stop all the I/O you can.
And be patient.
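
Since the percentage itself has proved unreliable in this thread, one way to "be patient" is to poll for the "resilver in progress" line and stop when it disappears. The Python sketch below is a minimal example under assumptions: it parses the status line format quoted earlier in this thread (other releases format this line differently), it shells out to `zpool status <pool>`, and the function names and polling interval are hypothetical.

```python
import re
import subprocess
import time

# Matches the status line format quoted in this thread, for example:
#   scrub: resilver in progress for 67h18m, 100.00% done, 0h0m to go
RESILVER_RE = re.compile(
    r"resilver in progress for (?P<elapsed>\S+), "
    r"(?P<percent>[\d.]+)% done, (?P<eta>\S+) to go"
)


def parse_resilver(status_text):
    """Return progress fields as a dict, or None if no resilver is running."""
    match = RESILVER_RE.search(status_text)
    if match is None:
        return None
    return {
        "elapsed": match.group("elapsed"),
        "percent": float(match.group("percent")),
        "eta": match.group("eta"),
    }


def wait_for_resilver(pool, interval_s=600):
    """Poll `zpool status` until the 'resilver in progress' line is gone."""
    while True:
        result = subprocess.run(
            ["zpool", "status", pool],
            capture_output=True, text=True, check=True,
        )
        state = parse_resilver(result.stdout)
        if state is None:
            return
        # The percentage is reported for information only; the loop does
        # not trust it as a completion signal.
        print(f"{pool}: {state['percent']:.2f}% after {state['elapsed']}")
        time.sleep(interval_s)
```

Calling `wait_for_resilver("atlashome")` on the affected host would block until the status line no longer reports a resilver in progress; check the pool state by hand afterwards, since the sketch does not distinguish a completed resilver from a failed one.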

Most people in your situation find it faster to zfs send to some other
storage, and then destroy & recreate the pool.  I know it stinks.  But
that's what you're facing.
