[zfs-discuss] Scrub performance

2013-02-04 Thread Karl Wagner
 

Hi all 

I have had a ZFS file server for a while now. I recently upgraded it, giving it 16GB RAM and an SSD for L2ARC. This allowed me to evaluate dedupe on certain datasets, which worked pretty well. 

The main reason for the upgrade was that something wasn't working quite right, and I was getting errors on the disks (all of them) leading to occasional data loss. The first thing I did, therefore, was schedule in regular scrubbing. 

It was not long before I cut this down from daily to weekly, as no errors were being found but performance during the scrub was, obviously, not so hot. The scrub was scheduled for 4am Sunday morning, when it would have least impact on use, and normally ran for approx 4-6 hrs. 
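
For reference, the schedule itself is just a root crontab entry along these lines ('tank' stands in for the real pool name):

   # run a scrub on the data pool at 04:00 every Sunday
   0 4 * * 0 /usr/sbin/zpool scrub tank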

Recently, however, it has started taking over 20 hours to complete. Not much has happened to it in that time: a few extra files added, maybe a couple of deletions, but not a huge amount. I am finding it difficult to understand why performance would have dropped so dramatically. 

FYI the server is my dev box running Solaris 11 Express, 2 mirrored pairs of 1.5TB SATA disks for data (at v28), a separate root pool and a 64GB SSD for L2ARC. The data pool has 1.2TB allocated. 

Can anyone shed some light on this? 

Thanks 

Karl


Re: [zfs-discuss] Scrub performance

2013-02-04 Thread Koopmann, Jan-Peter
Hi Karl,


Recently, however, it has started taking over 20 hours to complete. Not much has 
happened to it in that time: a few extra files added, maybe a couple of 
deletions, but not a huge amount. I am finding it difficult to understand why 
performance would have dropped so dramatically.

FYI the server is my dev box running Solaris 11 Express, 2 mirrored pairs of 
1.5TB SATA disks for data (at v28), a separate root pool and a 64GB SSD for 
L2ARC. The data pool has 1.2TB allocated.

Can anyone shed some light on this?

all I can tell you is that I've had terrible scrub rates when I used dedup. The 
DDT was a bit too big to fit in my memory (I assume, according to some very 
basic debugging). Only two of my datasets were deduped. On scrubs and resilvers 
I noticed that sometimes I had terrible rates with < 10MB/sec. Then later it 
rose up to > 70MB/sec. After upgrading some discs (same speeds observed) I got 
rid of the deduped datasets (zfs send/receive them) and guess what: all of a 
sudden the scrub goes to 350MB/sec steady and only takes a fraction of the time.
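
If you want to sanity-check that yourself, a rough sketch (pool name 'tank' assumed): zdb can dump the DDT statistics, and the reported entry count times a ballpark of ~320 bytes per in-core entry gives a feel for how much RAM the whole table would want:

   # print dedup table (DDT) statistics and histogram for the pool
   zdb -DD tank
   # entries * ~320 bytes is a rough estimate of the RAM needed to
   # keep the whole DDT cached in ARC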

While I certainly cannot deliver all the necessary explanations, I can only tell 
you that, from my personal observation, simply getting rid of dedup sped up my 
scrub times by a factor of 7 or so (same server, same discs, same data).


Kind regards,
   JP



Re: [zfs-discuss] Scrub performance

2013-02-04 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Koopmann, Jan-Peter
 
 all I can tell you is that I've had terrible scrub rates when I used dedup. 

I can tell you I've had terrible everything rates when I used dedup.


 The
 DDT was a bit too big to fit in my memory (I assume according to some very
 basic debugging). 

This is more or less irrelevant, because the system doesn't load it into memory 
anyway.  It will cache a copy in ARC just like everything else in the pool.  It 
gets evicted just as quickly as everything else.


 Only two of my datasets were deduped. On scrubs and
 resilvers I noticed that sometimes I had terrible rates with < 10MB/sec. Then
 later it rose up to > 70MB/sec. After upgrading some discs (same speeds
 observed) I got rid of the deduped datasets (zfs send/receive them) and
 guess what: all of a sudden the scrub goes to 350MB/sec steady and only takes
 a fraction of the time.

Are you talking about scrub rates for the complete scrub?  Because if you sit 
there and watch it, from minute to minute, it's normal for it to bounce really 
low for a long time, and then really high for a long time, etc.  The only 
measurement that has any real meaning is time to completion.
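
For what it's worth, that number is easy to track: both the progress while a scrub is running and the total elapsed time once it completes are reported on the scan/scrub line of the status output (pool name 'tank' is a placeholder):

   zpool status tank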



Re: [zfs-discuss] Scrub performance

2013-02-04 Thread Koopmann, Jan-Peter
Hi Edward,

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Koopmann, Jan-Peter
all I can tell you is that I've had terrible scrub rates when I used dedup.

I can tell you I've had terrible everything rates when I used dedup.

I am not alone then. Thanks! :-)

Only two of my datasets were deduped. On scrubs and
resilvers I noticed that sometimes I had terrible rates with < 10MB/sec. Then
later it rose up to > 70MB/sec. After upgrading some discs (same speeds
observed) I got rid of the deduped datasets (zfs send/receive them) and
guess what: all of a sudden the scrub goes to 350MB/sec steady and only takes
a fraction of the time.

Are you talking about scrub rates for the complete scrub?  Because if you sit 
there and watch it, from minute to minute, it's normal for it to bounce really 
low for a long time, and then really high for a long time, etc.  The only 
measurement that has any real meaning is time to completion.



Well, both actually. The lowest rate I observed increased significantly without 
dedup, and the time to completion decreased a lot as well. I remember doing a 
scrub after a resilver that took approx. 20-24 hours. The last scrub without 
dedup took 3:26. :-) A LOT FASTER…


Kind regards,
   JP


Re: [zfs-discuss] Scrub performance

2013-02-04 Thread Jim Klimov
On 2013-02-04 15:52, Edward Ned Harvey 
(opensolarisisdeadlongliveopensolaris) wrote:

I noticed that sometimes I had terrible rates with < 10MB/sec. Then
later it rose up to > 70MB/sec.

Are you talking about scrub rates for the complete scrub?  Because if you sit 
there and watch it, from minute to minute, it's normal for it to bounce really 
low for a long time, and then really high for a long time, etc.  The only 
measurement that has any real meaning is time to completion.


To paraphrase: random IOs on HDDs are slow - these are multiple
reads of small blocks dispersed across the disk, be it small files,
copies of metadata, or seeks into the DDT. Fast reads are large,
sequentially stored files, i.e. when a scrub hits an ISO image or
a movie on your disk, or a series of smaller files from the same
directory that happened to be created and saved in the same TXG
or so, and whose userdata was queued to disk as a large sequential
blob in a coalesced write operation.
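
You can watch that behaviour while a scrub runs; something like the following (pool name assumed) shows the read throughput swinging between the slow random phases and the fast sequential ones:

   # per-vdev throughput and IOPS, refreshed every 5 seconds
   zpool iostat -v tank 5
   # per-disk service times and queue depths from the OS side
   iostat -xn 5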

HTH,
//Jim



Re: [zfs-discuss] Scrub performance

2013-02-04 Thread Karl Wagner
 

OK then, I guess my next question would be what's the best way to undedupe the data I have? 

Would it work for me to zfs send/receive on the same pool (with dedup off), deleting the old datasets once they have been 'copied'? I think I remember reading somewhere that the DDT never shrinks, so this would not work, but it would be the simplest way. 

Otherwise, I would be left with creating another pool or destroying and restoring from a backup, neither of which is ideal. 
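
What I had in mind is roughly the following (dataset names are placeholders, and this assumes a simple dataset with no children): copy each deduped dataset into a new, non-deduped one on the same pool and then drop the original.

   # snapshot the deduped dataset and copy it into a new dataset
   zfs snapshot tank/data@undedup
   zfs send tank/data@undedup | zfs receive tank/data_new
   # the copy inherits dedup=off from the pool root; verify, then
   # destroy the old dataset (and its snapshots) and rename the copy
   zfs get dedup tank/data_new
   zfs destroy -r tank/data
   zfs rename tank/data_new tank/data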

On 2013-02-04 15:37, Jim Klimov wrote: 

 On 2013-02-04 15:52, Edward Ned Harvey 
 (opensolarisisdeadlongliveopensolaris) wrote:

 I noticed that sometimes I had terrible rates with < 10MB/sec. Then later it 
 rose up to > 70MB/sec.

 Are you talking about scrub rates for the complete scrub? Because if you sit 
 there and watch it, from minute to minute, it's normal for it to bounce really 
 low for a long time, and then really high for a long time, etc. The only 
 measurement that has any real meaning is time to completion.

 To paraphrase: random IOs on HDDs are slow - these are multiple reads of 
 small blocks dispersed across the disk, be it small files, copies of metadata, 
 or seeks into the DDT. Fast reads are large, sequentially stored files, i.e. 
 when a scrub hits an ISO image or a movie on your disk, or a series of smaller 
 files from the same directory that happened to be created and saved in the 
 same TXG or so, and whose userdata was queued to disk as a large sequential 
 blob in a coalesced write operation.

 HTH,
 //Jim


Re: [zfs-discuss] Scrub performance

2013-02-04 Thread Koopmann, Jan-Peter
Hi,


OK then, I guess my next question would be what's the best way to undedupe 
the data I have?

Would it work for me to zfs send/receive on the same pool (with dedup off), 
deleting the old datasets once they have been 'copied'?

Yes, worked for me.


I think I remember reading somewhere that the DDT never shrinks, so this would 
not work, but it would be the simplest way.

Once you delete all the datasets and snapshots that were written with dedup on, 
the DDT will be empty. This is what I did here and it worked like a charm. But 
again: I only had it actively enabled on two datasets, so my situation might be 
different from yours.

Kind regards,
   JP


Re: [zfs-discuss] zfs + NFS + FreeBSD with performance prob

2013-02-04 Thread Paul Kraus
On Jan 31, 2013, at 5:16 PM, Albert Shih wrote:

 Well, I have a server running FreeBSD 9.0 with (not counting /, which is on
 different disks) a zfs pool with 36 disks.
  
 The performance is very, very good on the server.
  
 I have one NFS client running FreeBSD 8.3 and the performance over NFS is
 very good: 
  
 For example: read from the client and write over NFS to ZFS:
 
 [root@ .tmp]# time tar xf /tmp/linux-3.7.5.tar 
 
 real1m7.244s
 user0m0.921s
 sys 0m8.990s
 
 This client is on a 1Gbit/s network cable and the same network switch as the
 server.
  
 I have a second NFS client running FreeBSD 9.1-STABLE, and on this second
 client the performance is catastrophic. After 1 hour the tar isn't finished.
 OK, this second client is connected at 100Mbit/s and not on the same switch.
 But still, going from ~2 min to ~90 min ... :-(
  
 For this second client I have tried changing, on the ZFS-NFS server:
  
   zfs set sync=disabled 
  
 and that changed nothing.

I have been using FreeBSD 9 with ZFS and NFS to a couple of Mac OS X (10.6.8 Snow 
Leopard) boxes and I get between 40 and 50 MB/sec throughput on a Gigabit 
ethernet link. Since you have already ruled out the known sync issue with ZFS 
and no SSD-based write cache, perhaps you are running into an NFS 3 vs. 
NFS 4 issue. I am not sure whether Mac OS X is using NFS 3 or NFS 4.
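
One quick test would be to pin the client mount to NFSv3 and see whether the behaviour changes; on the FreeBSD 9.1 client that would look roughly like this (server name and paths are placeholders), and on Mac OS X the equivalent mount option should be vers=3:

   # force an NFSv3, TCP mount on the FreeBSD client
   mount -t nfs -o nfsv3,tcp server:/export/data /mnt/data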

--
Paul Kraus
Deputy Technical Director, LoneStarCon 3
Sound Coordinator, Schenectady Light Opera Company



Re: [zfs-discuss] Scrub performance

2013-02-04 Thread Jim Klimov

On 2013-02-04 17:10, Karl Wagner wrote:

OK then, I guess my next question would be what's the best way to
undedupe the data I have?

Would it work for me to zfs send/receive on the same pool (with dedup
off), deleting the old datasets once they have been 'copied'? I think I
remember reading somewhere that the DDT never shrinks, so this would not
work, but it would be the simplest way.

Otherwise, I would be left with creating another pool or destroying and
restoring from a backup, neither of which is ideal.


If you have enough space, then copying with dedup=off should work
(zfs send, rsync, whatever works for you best).

I think the DDT should shrink, deleting entries as soon as their reference
count goes to 0. However, this by itself can take quite a while and
cause lots of random IO - in my case this might have been the reason for
system hangs and/or panics due to memory starvation. Still, after
a series of reboots (and a couple of weeks of disk-thrashing) I was
able to get rid of some more of the offending datasets in my tests a
couple of years ago.

As for smarter undedup - I asked about this recently, proposing a method
to do it in a stone-age way, but overall there is no ready-made solution
so far.

//Jim



Re: [zfs-discuss] Scrub performance

2013-02-04 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Edward Ned Harvey

 I can tell you I've had terrible everything rates when I used dedup.

So, the above comment isn't fair, really.  The truth is here:
http://mail.opensolaris.org/pipermail/zfs-discuss/2011-July/049209.html
