Re: [zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)

2010-11-16 Thread Richard Elling
Measure the I/O performance with iostat.  You should see something that
looks sorta like (iostat -zxCn 10):
                            extended device statistics
    r/s    w/s    kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w   %b  device
 5948.9  349.3 40322.3  5238.1   0.1  16.7     0.0     2.7   0  330  c9
    3.7    0.0   230.7     0.0   0.0   0.1     0.0    13.5   0    2  c9t1d0
  845.0    0.0  5497.4     0.0   0.0   0.9     0.0     1.1   1   32  c9t2d0
    3.8    0.0   230.7     0.0   0.0   0.0     0.0    10.6   0    1  c9t3d0
  845.2    0.0  5495.4     0.0   0.0   0.9     0.0     1.1   1   32  c9t4d0
    3.8    0.0   237.1     0.0   0.0   0.0     0.0    10.4   0    1  c9t5d0
  841.4    0.0  5519.7     0.0   0.0   0.9     0.0     1.1   1   32  c9t6d0
    3.8    0.0   237.3     0.0   0.0   0.0     0.0     9.2   0    1  c9t7d0
  843.5    0.0  5485.2     0.0   0.0   0.9     0.0     1.1   1   31  c9t8d0
    3.7    0.0   230.8     0.0   0.0   0.1     0.0    15.2   0    2  c9t9d0
  850.2    0.0  5488.6     0.0   0.0   0.9     0.0     1.1   1   31  c9t10d0
    3.1    0.0   211.2     0.0   0.0   0.0     0.0    13.2   0    1  c9t11d0
  847.9    0.0  5523.4     0.0   0.0   0.9     0.0     1.1   1   31  c9t12d0
    3.1    0.0   204.9     0.0   0.0   0.0     0.0     9.6   0    1  c9t13d0
  847.2    0.0  5506.0     0.0   0.0   0.9     0.0     1.1   1   31  c9t14d0
    3.4    0.0   224.1     0.0   0.0   0.0     0.0    12.3   0    1  c9t15d0
    0.0  349.3     0.0  5238.1   0.0   9.9     0.0    28.4   1  100  c9t16d0

Here you can clearly see a raidz2 resilver in progress. c9t16d0
is the disk being resilvered (write workload) and half of the 
others are being read to generate the resilvering data.  Note
the relative performance and the ~30% busy for the surviving
disks.  If you see iostat output that looks significantly different
than this, then you might be seeing one of two common causes:

1. Your version of ZFS has the new resilver throttle *and* the
  pool is otherwise servicing I/O.

2. Disks are throwing errors or responding very slowly.  Use
  fmdump -eV to observe error reports.
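
A rough way to check for those two cases from the command line; this is only
a sketch, and the pool name "tank" below is a placeholder, not taken from the
original report:

   # Is the pool busy servicing other I/O while it resilvers?  (case 1)
   zpool iostat -v tank 10

   # Any disks throwing errors?  Summarize recent FMA error reports (case 2):
   fmdump -eV | grep class | sort | uniq -c | sort -rn

   # One disk with asvc_t far above its peers is the usual suspect:
   iostat -zxCn 10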

 -- richard

On Nov 1, 2010, at 12:33 PM, Mark Sandrock wrote:

 Hello,
 
   I'm working with someone who replaced a failed 1TB drive (50% utilized),
 on an X4540 running OS build 134, and I think something must be wrong.
 
 Last Tuesday afternoon, zpool status reported:
 
 scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go
 
 and a week being 168 hours, that put completion at sometime tomorrow night.
 
 However, he just reported zpool status shows:
 
 scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go
 
 so it's looking more like 2011 now. That can't be right.
 
 I'm hoping for a suggestion or two on this issue.
 
 I'd search the archives, but they don't seem searchable. Or am I wrong about 
 that?
 
 Thanks.
 Mark (subscription pending)
 
 


-- 
ZFS and performance consulting
http://www.RichardElling.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)

2010-11-15 Thread Mark Sandrock

On Nov 2, 2010, at 12:10 AM, Ian Collins wrote:

 On 11/ 2/10 08:33 AM, Mark Sandrock wrote:
 
 
 I'm working with someone who replaced a failed 1TB drive (50% utilized),
 on an X4540 running OS build 134, and I think something must be wrong.
 
 Last Tuesday afternoon, zpool status reported:
 
 scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go
 
 and a week being 168 hours, that put completion at sometime tomorrow night.
 
 However, he just reported zpool status shows:
 
 scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go
 
 so it's looking more like 2011 now. That can't be right.

 
 
 How is the pool configured?

A mix of 10-disk and 12-disk RAIDZ-2 vdevs. That, plus too much other I/O,
must be the problem. I'm thinking 5 x (7-2) would be better, assuming he
doesn't want to go RAID-10 (a sketch of that layout follows).
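
For reference, a sketch of what a 5 x (7-2) layout could look like at pool
creation time; the disk names are illustrative only, and this of course means
evacuating and rebuilding the pool:

   # Hypothetical layout: five 7-disk raidz2 vdevs (5 data + 2 parity each).
   zpool create tank \
     raidz2 c9t1d0  c9t2d0  c9t3d0  c9t4d0  c9t5d0  c9t6d0  c9t7d0  \
     raidz2 c9t8d0  c9t9d0  c9t10d0 c9t11d0 c9t12d0 c9t13d0 c9t14d0 \
     raidz2 c9t15d0 c9t16d0 c9t17d0 c9t18d0 c9t19d0 c9t20d0 c9t21d0 \
     raidz2 c9t22d0 c9t23d0 c9t24d0 c9t25d0 c9t26d0 c9t27d0 c9t28d0 \
     raidz2 c9t29d0 c9t30d0 c9t31d0 c9t32d0 c9t33d0 c9t34d0 c9t35d0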

Thanks much for all the helpful replies.

Mark
 
 I look after a very busy x5400 with 500G drives configured as 8-drive raidz2, 
 and these take about 100 hours to resilver.  The workload on this box is 
 probably worst case for resilvering: it receives a steady stream of snapshots.
 
 -- 
 Ian.
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)

2010-11-01 Thread Mark Sandrock
Hello,

I'm working with someone who replaced a failed 1TB drive (50% utilized),
on an X4540 running OS build 134, and I think something must be wrong.

Last Tuesday afternoon, zpool status reported:

scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go

and a week being 168 hours, that put completion at sometime tomorrow night.

However, he just reported zpool status shows:

scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go

so it's looking more like 2011 now. That can't be right.

I'm hoping for a suggestion or two on this issue.

I'd search the archives, but they don't seem searchable. Or am I wrong about 
that?

Thanks.
Mark (subscription pending)


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)

2010-11-01 Thread Ross Walker
On Nov 1, 2010, at 3:33 PM, Mark Sandrock mark.sandr...@oracle.com wrote:

 Hello,
 
   I'm working with someone who replaced a failed 1TB drive (50% utilized),
 on an X4540 running OS build 134, and I think something must be wrong.
 
 Last Tuesday afternoon, zpool status reported:
 
 scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go
 
 and a week being 168 hours, that put completion at sometime tomorrow night.
 
 However, he just reported zpool status shows:
 
 scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go
 
 so it's looking more like 2011 now. That can't be right.
 
 I'm hoping for a suggestion or two on this issue.
 
 I'd search the archives, but they don't seem searchable. Or am I wrong about 
 that?

Some zpool versions have an issue where snapshot creation/deletion during a 
resilver causes it to start over.

Try suspending all snapshot activity during the resilver.
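
If the snapshots come from the standard OpenSolaris auto-snapshot (time-slider)
services, here is a sketch of how they might be paused for the duration; the
service names below assume the stock instances, so adjust if snapshots are
driven by cron or by a remote zfs send instead:

   # See which auto-snapshot instances are enabled:
   svcs -a | grep auto-snapshot

   # Disable them temporarily (-t = does not persist across reboot):
   for i in frequent hourly daily weekly monthly; do
       svcadm disable -t svc:/system/filesystem/zfs/auto-snapshot:$i
   done

   # Re-enable once the resilver has finished:
   for i in frequent hourly daily weekly monthly; do
       svcadm enable svc:/system/filesystem/zfs/auto-snapshot:$i
   done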

-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)

2010-11-01 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Mark Sandrock
 
   I'm working with someone who replaced a failed 1TB drive (50%
 utilized),
 on an X4540 running OS build 134, and I think something must be wrong.
 
 Last Tuesday afternoon, zpool status reported:
 
 scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go
 
 and a week being 168 hours, that put completion at sometime tomorrow
 night.
 
 However, he just reported zpool status shows:
 
 scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go
 
 so it's looking more like 2011 now. That can't be right.
 
 I'm hoping for a suggestion or two on this issue.

For a typical live system, which has been in production for a long time with
files being created, snapshotted, partially overwritten, snapshots
destroyed, etc etc...  the blocks written to disk tend to be largely written
in random order.  And at least for now, the order of resilvering blocks is
by creation time, not disk location.  So resilver time is typically limited
by IOPS for random IO, and the number of records that are in the affected
vdev.  

To reduce the number of records in an affected vdev, it is effective to
build the pool using mirrors instead of raidz... Or use smaller vdevs of
raidz1 instead of large raidz3.  Unfortunately, you're not going to be able
to change that with an existing system.  Roughly speaking, a 23-disk raidz3
with capacity of 20 disks would take 40x longer to resilver than one of the
mirrors in a 40-disk stripe of mirrors with capacity of 20 disks.  In rough
numbers, that might be 20 days instead of 12 hours.
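
As a back-of-the-envelope illustration of that scaling (the block count and
IOPS figures below are invented, not measured on this system, and they ignore
the extra parity reads that make raidz worse still):

   # ~50 million blocks to rebuild, ~125 random read IOPS sustained per vdev.
   # raidz: every one of those blocks lives in the single affected vdev.
   echo "raidz:  $(( 50000000 / 125 / 3600 )) hours"        # ~111 hours
   # mirrors: a 20-way stripe of mirrors holds ~1/20th of the blocks per vdev.
   echo "mirror: $(( 50000000 / 20 / 125 / 3600 )) hours"   # ~5 hours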

To reduce the IOPS time...  For background: under normal circumstances you
should disable the HBA write-back cache when a dedicated log device is present
(on the X4275 that is done via the HBA's configuration utility; I don't know
about the X4540). During a resilver, however, you might enable write-back for
the drive being resilvered. I don't know for sure that it will help, but I
think it should make some difference, because the logic that led to disabling
write-back does not apply to resilver writes.

To reduce the number of records to resilver...
* If possible, disable the creation of new snapshots while resilver is
running.  
* If possible, delete files and destroy old snapshots that are not needed
anymore
* If possible, limit new writes to the system.
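
A sketch of that snapshot housekeeping; the dataset and snapshot names are
placeholders, so review the list carefully before destroying anything:

   # Show existing snapshots and the space each one holds:
   zfs list -t snapshot -o name,used -s used

   # Destroy one that is no longer needed (example name only):
   zfs destroy tank/export/home@daily-2010-09-01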

By the way, I'm sorry to say: also don't trust the progress indicator. You're
likely to reach 100% complete and stay there for a long time, or even see 2 TB
resilvered onto a 1 TB disk. It looks bad at first glance, but it's actually
correct, because the filesystem is in use and keeps taking new writes during
the resilver...

To reduce the IOPS time...
* If possible, limit the live IO to the system.  Resilver has lower
priority and therefore gets delayed a lot for production systems.
* Definitely DON'T scrub the pool while it's resilvering.

You may also be able to offload some of the I/O by adding cache devices, a
dedicated log, or RAM. It's sound in principle, but YMMV immensely, depending
on your workload.
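
If spare SSDs or slots are available, adding them is straightforward; the
device names here are placeholders:

   # Add a read cache (L2ARC) device:
   zpool add tank cache c9t40d0

   # Add a dedicated (ideally mirrored) log device:
   zpool add tank log mirror c9t41d0 c9t42d0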

None of the above is likely to be dramatically effective.  There's not much
you can do if you started with a huge raidz3, for example.  The most important
thing you can do to affect resilver time is to choose mirrors instead of raidz
at the time of pool creation.

So, as a last-ditch effort: if you zfs send the pool to some other storage and
then recreate the local pool, the new pool starts empty, so its resilver is
already complete, because zfs only resilvers used blocks. Then zfs send the
data back to restore the pool. Besides the fact that the resilver has been
forcibly completed, the received data will be laid out on disk much more
sequentially, which will greatly help if another resilver is needed in the
near future, and you get an opportunity to revisit the pool architecture,
possibly in favor of mirrors instead of raidz.
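
A rough outline of that evacuate-and-rebuild approach, assuming a second pool
("backup") with enough space; the syntax is from memory and receiving over an
existing pool with -F is destructive, so test on scratch datasets first:

   # 1. Snapshot everything and replicate it to the other pool:
   zfs snapshot -r tank@evac
   zfs send -R tank@evac | zfs receive -Fu backup/tank-copy

   # 2. Destroy and recreate the local pool with the new layout
   #    (mirror pairs shown purely as an example):
   zpool destroy tank
   zpool create tank mirror c9t1d0 c9t2d0 mirror c9t3d0 c9t4d0

   # 3. Replicate the data back into the freshly created pool:
   zfs send -R backup/tank-copy@evac | zfs receive -Fu tank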

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)

2010-11-01 Thread Ian Collins

On 11/ 2/10 08:33 AM, Mark Sandrock wrote:

Hello,

I'm working with someone who replaced a failed 1TB drive (50% utilized),
on an X4540 running OS build 134, and I think something must be wrong.

Last Tuesday afternoon, zpool status reported:

scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go

and a week being 168 hours, that put completion at sometime tomorrow 
night.


However, he just reported zpool status shows:

scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go

so it's looking more like 2011 now. That can't be right.

I'm hoping for a suggestion or two on this issue.

I'd search the archives, but they don't seem searchable. Or am I wrong 
about that?


How is the pool configured?

I look after a very busy x5400 with 500G drives configured as 8-drive 
raidz2, and these take about 100 hours to resilver.  The workload on this 
box is probably worst case for resilvering: it receives a steady stream 
of snapshots.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Excruciatingly slow resilvering on X4540 (build 134)

2010-11-01 Thread Ian Collins

On 11/ 2/10 11:55 AM, Ross Walker wrote:
On Nov 1, 2010, at 3:33 PM, Mark Sandrock mark.sandr...@oracle.com wrote:



Hello,

I'm working with someone who replaced a failed 1TB drive (50% utilized),
on an X4540 running OS build 134, and I think something must be wrong.

Last Tuesday afternoon, zpool status reported:

scrub: resilver in progress for 306h0m, 63.87% done, 173h7m to go

and a week being 168 hours, that put completion at sometime tomorrow 
night.


However, he just reported zpool status shows:

scrub: resilver in progress for 447h26m, 65.07% done, 240h10m to go

so it's looking more like 2011 now. That can't be right.

I'm hoping for a suggestion or two on this issue.

I'd search the archives, but they don't seem searchable. Or am I 
wrong about that?


Some zpool versions have an issue where snapshot creation/deletion 
during a resilver causes it to start over.



That was fixed long before build 134.


Try suspending all snapshot activity during the resilver.


This always helps!

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss