[zfs-discuss] Scrub performance
Hi all,

I have had a ZFS file server for a while now. I recently upgraded it, giving it 16GB RAM and an SSD for L2ARC. This allowed me to evaluate dedupe on certain datasets, which worked pretty well.

The main reason for the upgrade was that something wasn't working quite right, and I was getting errors on the disks (all of them) leading to occasional data loss. The first thing I did, therefore, was schedule regular scrubbing. It was not long before I cut this down from daily to weekly, as no errors were being found but performance during the scrub was, obviously, not so hot. The scrub was scheduled for 4am Sunday morning, when it would have the least impact on use, and normally ran for approx. 4-6 hours.

Recently, however, it has started taking over 20 hours to complete. Not much has happened to it in that time: a few extra files added, maybe a couple of deletions, but not a huge amount. I am finding it difficult to understand why performance would have dropped so dramatically.

FYI the server is my dev box running Solaris 11 Express, with 2 mirrored pairs of 1.5TB SATA disks for data (at v28), a separate root pool and a 64GB SSD for L2ARC. The data pool has 1.2TB allocated.

Can anyone shed some light on this?

Thanks
Karl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Scrub performance
Hi Karl,

> Recently, however, it has started taking over 20 hours to complete. Not much has happened to it in that time: a few extra files added, maybe a couple of deletions, but not a huge amount. I am finding it difficult to understand why performance would have dropped so dramatically.
> FYI the server is my dev box running Solaris 11 Express, with 2 mirrored pairs of 1.5TB SATA disks for data (at v28), a separate root pool and a 64GB SSD for L2ARC. The data pool has 1.2TB allocated.
> Can anyone shed some light on this?

All I can tell you is that I've had terrible scrub rates when I used dedup. The DDT was a bit too big to fit in my memory (I assume, based on some very basic debugging). Only two of my datasets were deduped. On scrubs and resilvers I noticed that sometimes I had terrible rates of 10MB/sec. Then later it rose to 70MB/sec. After upgrading some discs (same speeds observed) I got rid of the deduped datasets (zfs send/receive them) and guess what: all of a sudden scrub goes to 350MB/sec steady and only takes a fraction of the time.

While I certainly cannot deliver all the necessary explanations, I can tell you that, from my personal observation, simply getting rid of dedup sped up my scrub times by a factor of 7 or so (same server, same discs, same data).

Kind regards,
JP
Re: [zfs-discuss] Scrub performance
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Koopmann, Jan-Peter

> All I can tell you is that I've had terrible scrub rates when I used dedup.

I can tell you I've had terrible everything rates when I used dedup.

> The DDT was a bit too big to fit in my memory (I assume, based on some very basic debugging).

This is more or less irrelevant, because the system doesn't load it into memory anyway. It will cache a copy in ARC just like everything else in the pool. It gets evicted just as quickly as everything else.

> Only two of my datasets were deduped. On scrubs and resilvers I noticed that sometimes I had terrible rates of 10MB/sec. Then later it rose to 70MB/sec. After upgrading some discs (same speeds observed) I got rid of the deduped datasets (zfs send/receive them) and guess what: all of a sudden scrub goes to 350MB/sec steady and only takes a fraction of the time.

Are you talking about scrub rates for the complete scrub? Because if you sit there and watch it from minute to minute, it's normal for it to bounce really low for a long time, then really high for a long time, and so on. The only measurement that has any real meaning is time to completion.
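Edward's point that time to completion is the only meaningful measure can be turned into a mean rate by dividing the allocated space by the wall-clock time. The following is a minimal Python sketch, assuming decimal units and that a scrub walks roughly the 1.2TB Karl reports as allocated; `average_rate_mb_s` is a hypothetical helper, and the result is a rate relative to allocated space rather than raw disk throughput (a scrub of a mirror reads every copy).

```python
def average_rate_mb_s(allocated_bytes: float, hours: float) -> float:
    """Mean scrub rate in MB/s (1 MB = 10**6 bytes) over the whole run."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return allocated_bytes / (hours * 3600) / 1e6


ALLOCATED_BYTES = 1.2e12  # 1.2TB allocated in Karl's data pool (decimal units assumed)

for label, hours in [("typical scrub, ~5 h", 5), ("recent scrub, 20 h", 20)]:
    print(f"{label}: {average_rate_mb_s(ALLOCATED_BYTES, hours):.1f} MB/s")
```

With those figures the mean rate drops from about 67MB/sec to about 17MB/sec, a factor of 4, regardless of how much the instantaneous rate bounced around during either run.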
Re: [zfs-discuss] Scrub performance
Hi Edward,

> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Koopmann, Jan-Peter
>> All I can tell you is that I've had terrible scrub rates when I used dedup.
> I can tell you I've had terrible everything rates when I used dedup.

I am not alone then. Thanks! :-)

>> Only two of my datasets were deduped. On scrubs and resilvers I noticed that sometimes I had terrible rates of 10MB/sec. Then later it rose to 70MB/sec. After upgrading some discs (same speeds observed) I got rid of the deduped datasets (zfs send/receive them) and guess what: all of a sudden scrub goes to 350MB/sec steady and only takes a fraction of the time.
> Are you talking about scrub rates for the complete scrub? Because if you sit there and watch it from minute to minute, it's normal for it to bounce really low for a long time, then really high for a long time, and so on. The only measurement that has any real meaning is time to completion.

Well, both actually. The lowest rate I observed increased significantly without dedup, and the time to completion decreased a lot as well. I remember a scrub after a resilver that took approx. 20-24 hours. Last scrub without dedup: 3:26. :-)

A LOT FASTER…

Kind regards,
JP
Re: [zfs-discuss] Scrub performance
On 2013-02-04 15:52, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote:
>> I noticed that sometimes I had terrible rates of 10MB/sec. Then later it rose to 70MB/sec.
> Are you talking about scrub rates for the complete scrub? Because if you sit there and watch it from minute to minute, it's normal for it to bounce really low for a long time, then really high for a long time, and so on. The only measurement that has any real meaning is time to completion.

To paraphrase: random IOs on HDDs are slow. These are multiple reads of small blocks dispersed across the disk, be they small files, copies of metadata or seeks into the DDT. Fast reads are large sequentially stored files, i.e. when a scrub hits an ISO image or a movie on your disk, or a series of smaller files from the same directory that happened to be created and saved in the same TXG or so, and whose userdata was queued to disk as a large sequential blob in a coalesced write operation.

HTH,
//Jim
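Jim's explanation can be illustrated with a toy cost model in which each small scattered read pays one seek while large sequential extents stream at full speed. This is a minimal Python sketch; the 100 IOPS and 100MB/sec defaults are hypothetical round numbers for a SATA disk, not measurements from this thread, and a real scrub interleaves both patterns.

```python
def read_seconds(n_random: int, seq_bytes: float,
                 iops: float = 100.0, stream_mb_s: float = 100.0) -> float:
    """Toy cost: every scattered small read pays one seek; sequential
    bytes stream at the disk's sustained rate."""
    return n_random / iops + seq_bytes / (stream_mb_s * 1e6)


def effective_rate_mb_s(n_random: int, block_bytes: int, seq_bytes: float,
                        **disk: float) -> float:
    """Bytes actually read divided by the toy read time, in MB/s."""
    total_bytes = n_random * block_bytes + seq_bytes
    return total_bytes / read_seconds(n_random, seq_bytes, **disk) / 1e6


GIB = 2**30
# 1GiB as 8KiB blocks scattered over the disk (metadata, small files, DDT).
scattered = effective_rate_mb_s(n_random=GIB // 8192, block_bytes=8192, seq_bytes=0)
# 1GiB laid out as one large sequential file (an ISO image, a movie).
sequential = effective_rate_mb_s(n_random=0, block_bytes=8192, seq_bytes=GIB)

print(f"scattered:  {scattered:.2f} MB/s")
print(f"sequential: {sequential:.2f} MB/s")
```

Under these assumptions the same amount of data is read at about 0.82MB/sec when scattered and at 100MB/sec when sequential, which is the kind of swing between low and high rates that makes minute-to-minute readings misleading.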
Re: [zfs-discuss] Scrub performance
OK then, I guess my next question would be: what's the best way to undedupe the data I have?

Would it work for me to zfs send/receive on the same pool (with dedup off), deleting the old datasets once they have been 'copied'? I think I remember reading somewhere that the DDT never shrinks, so this would not work, but it would be the simplest way.

Otherwise, I would be left with creating another pool or destroying and restoring from a backup, neither of which is ideal.

On 2013-02-04 15:37, Jim Klimov wrote:
> On 2013-02-04 15:52, Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote:
>>> I noticed that sometimes I had terrible rates of 10MB/sec. Then later it rose to 70MB/sec.
>> Are you talking about scrub rates for the complete scrub? Because if you sit there and watch it from minute to minute, it's normal for it to bounce really low for a long time, then really high for a long time, and so on. The only measurement that has any real meaning is time to completion.
> To paraphrase: random IOs on HDDs are slow. These are multiple reads of small blocks dispersed across the disk, be they small files, copies of metadata or seeks into the DDT. Fast reads are large sequentially stored files, i.e. when a scrub hits an ISO image or a movie on your disk, or a series of smaller files from the same directory that happened to be created and saved in the same TXG or so, and whose userdata was queued to disk as a large sequential blob in a coalesced write operation.
> HTH,
> //Jim
Re: [zfs-discuss] Scrub performance
Hi,

> OK then, I guess my next question would be: what's the best way to undedupe the data I have? Would it work for me to zfs send/receive on the same pool (with dedup off), deleting the old datasets once they have been 'copied'?

Yes. Worked for me.

> I think I remember reading somewhere that the DDT never shrinks, so this would not work, but it would be the simplest way.

Once you delete all snapshots with dedup on, the DDT will be empty. This is what I did here and it worked like a charm. But again: I only had it actively enabled on two datasets, so my situation might be different from yours.

Kind regards,
JP
Re: [zfs-discuss] Scrub performance
On 2013-02-04 17:10, Karl Wagner wrote:
> OK then, I guess my next question would be: what's the best way to undedupe the data I have? Would it work for me to zfs send/receive on the same pool (with dedup off), deleting the old datasets once they have been 'copied'? I think I remember reading somewhere that the DDT never shrinks, so this would not work, but it would be the simplest way. Otherwise, I would be left with creating another pool or destroying and restoring from a backup, neither of which is ideal.

If you have enough space, then copying with dedup=off should work (zfs send, rsync, whatever works best for you).

I think the DDT should shrink, deleting entries as soon as their reference count goes to 0. However, this by itself can take quite a while and cause lots of random IO; in my case this might have been the reason for system hangs and/or panics due to memory starvation. Still, after a series of reboots (and a couple of weeks of disk-thrashing) I was able to get rid of some more offending datasets in my tests, a couple of years ago now...

As for a smarter undedup: I've asked recently, proposing a method to do it in a stone-age way, but overall there is no ready solution so far.

//Jim
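The reference-counting behaviour Jim describes can be sketched with a toy table mapping block checksums to reference counts. This is a conceptual Python model only, assuming nothing about how ZFS actually lays out its DDT on disk; `ToyDDT` is a hypothetical class, and SHA-256 is used simply because it is a convenient content hash. In this model `free` is called only when a block is truly freed, so a block still held by a snapshot keeps its entry, which fits JP's note that the snapshots have to go too.

```python
import hashlib


class ToyDDT:
    """Conceptual reference-counted dedup table: checksum -> reference count.
    Not how ZFS stores its DDT on disk; it only models the counting."""

    def __init__(self) -> None:
        self.entries: dict[str, int] = {}

    def write(self, block: bytes) -> str:
        """Record a write; identical blocks share one entry."""
        key = hashlib.sha256(block).hexdigest()
        self.entries[key] = self.entries.get(key, 0) + 1
        return key

    def free(self, key: str) -> None:
        """Drop one reference; the entry disappears when the count hits 0."""
        self.entries[key] -= 1
        if self.entries[key] == 0:
            del self.entries[key]


ddt = ToyDDT()
a = ddt.write(b"identical data")  # block in file A
b = ddt.write(b"identical data")  # same content in file B: refcount 2
print(ddt.entries[a])             # 2
ddt.free(a)                       # file A deleted: entry stays, refcount 1
print(len(ddt.entries))           # 1
ddt.free(b)                       # last reference gone (file B and any snapshot holding it)
print(len(ddt.entries))           # 0
```

Each `free` that reaches zero is a separate update to the table; in a real pool those updates land at scattered locations on disk, which is consistent with Jim's warning that shrinking the DDT can itself cause lots of random IO.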
Re: [zfs-discuss] Scrub performance
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey

> I can tell you I've had terrible everything rates when I used dedup.

So, the above comment isn't fair, really. The truth is here:
http://mail.opensolaris.org/pipermail/zfs-discuss/2011-July/049209.html
[zfs-discuss] scrub performance
I currently have an X4500 running S10U4 with the latest ZFS uber patch (127729-07), on which zpool scrub is making very slow progress even though the necessary resources are apparently available. It has now been running for 3 days to reach 75% completion; however, in the last 12 hours it advanced by only 3%.

At times this server is busy running NFSD, and it is understandable that the scrub takes a lower priority then. However, I have observed interestingly long time intervals when neither prstat nor iostat shows any obvious bottleneck, e.g., disks at 10% busy. Is there a throttle on scrub resource allocation that does not readily open up again after being limited due to other system activity?

For comparison, an identical system (same OS/zpool config, and roughly the same number of filesystems and files) finished a scrub in 2 days.

This is not a critical problem, but at least initially it was clear from iostat that scrub was pegging all the available disk IOPS/BW, and I am curious why it has backed off from that after a few days of running.

P.S. I realize it is not a user command and that the last event can be found in zpool status, but I would find it convenient if the scrub completion event were also logged in the zpool history along with the initiation event.

Thanks.
--
Stuart Anderson [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
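Stuart's two figures give very different extrapolations for the remaining 25%. A minimal Python sketch of the arithmetic, assuming progress continues linearly at each rate; `eta_hours` is a hypothetical helper, not part of any ZFS tool, and it says nothing about why the rate dropped.

```python
def eta_hours(done_pct: float, pct_per_hour: float) -> float:
    """Hours to reach 100% if progress continues linearly at pct_per_hour."""
    if pct_per_hour <= 0:
        return float("inf")  # no forward progress: never finishes at this rate
    return (100.0 - done_pct) / pct_per_hour


overall_rate = 75.0 / (3 * 24)  # 75% in 3 days, about 1.04 %/h
recent_rate = 3.0 / 12          # 3% in the last 12 hours, 0.25 %/h

print(f"at the overall rate: {eta_hours(75.0, overall_rate):.0f} h")
print(f"at the recent rate:  {eta_hours(75.0, recent_rate):.0f} h")
```

At the overall rate the last quarter would take about 24 hours; at the rate of the last 12 hours it would take about 100, so the slowdown roughly quadruples the remaining time.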
Re: [zfs-discuss] scrub performance
On Thu, Mar 06, 2008 at 11:51:00AM -0800, Stuart Anderson wrote:
> I currently have an X4500 running S10U4 with the latest ZFS uber patch (127729-07) for which zpool scrub is making very slow progress even though the necessary resources are apparently available. Currently it has

It is also interesting to note that this system is now making negative progress. I can understand the remaining time estimate going up with time, but what does it mean for the % complete number to go down after 6 hours of work?

Thanks.

# zpool status | egrep -e "progress|errors" ; date
scrub: scrub in progress, 75.49% done, 28h51m to go
errors: No known data errors
Thu Mar 6 08:50:59 PST 2008
# zpool status | egrep -e "progress|errors" ; date
scrub: scrub in progress, 75.24% done, 31h20m to go
errors: No known data errors
Thu Mar 6 15:15:39 PST 2008

--
Stuart Anderson [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] scrub performance
[EMAIL PROTECTED] said:
> It is also interesting to note that this system is now making negative progress. I can understand the remaining time estimate going up with time, but what does it mean for the % complete number to go down after 6 hours of work?

Sorry, I don't have any helpful experience in this area. It occurs to me that perhaps you are detecting a gravity wave of some sort -- Thumpers are pretty heavy, and thus may be more affected than the average server. Or the guys at SLAC have, unbeknownst to you, somehow accelerated your Thumper to near the speed of light. (:-)

Regards,
Marion
Re: [zfs-discuss] scrub performance
On Thu, Mar 06, 2008 at 05:55:53PM -0800, Marion Hakanson wrote:
> [EMAIL PROTECTED] said:
>> It is also interesting to note that this system is now making negative progress. I can understand the remaining time estimate going up with time, but what does it mean for the % complete number to go down after 6 hours of work?
> Sorry, I don't have any helpful experience in this area. It occurs to me that perhaps you are detecting a gravity wave of some sort -- Thumpers are pretty heavy, and thus may be more affected than the average server. Or the guys at SLAC have, unbeknownst to you, somehow accelerated your Thumper to near the speed of light. (:-)

If true, that would certainly help, since we actually are using these Thumpers to help detect gravitational waves! See http://www.ligo.caltech.edu.

Thanks.
--
Stuart Anderson [EMAIL PROTECTED]
http://www.ligo.caltech.edu/~anderson
Re: [zfs-discuss] scrub performance
Stuart Anderson wrote:
> On Thu, Mar 06, 2008 at 11:51:00AM -0800, Stuart Anderson wrote:
>> I currently have an X4500 running S10U4 with the latest ZFS uber patch (127729-07) for which zpool scrub is making very slow progress even though the necessary resources are apparently available. Currently it has
> It is also interesting to note that this system is now making negative progress. I can understand the remaining time estimate going up with time, but what does it mean for the % complete number to go down after 6 hours of work?
> Thanks.
> # zpool status | egrep -e "progress|errors" ; date
> scrub: scrub in progress, 75.49% done, 28h51m to go
> errors: No known data errors
> Thu Mar 6 08:50:59 PST 2008
> # zpool status | egrep -e "progress|errors" ; date
> scrub: scrub in progress, 75.24% done, 31h20m to go
> errors: No known data errors
> Thu Mar 6 15:15:39 PST 2008

There are a few things which may cause the scrub to restart. See:

6655927 zpool status causes a resilver or scrub to restart
6343667 scrub/resilver has to start over when a snapshot is taken

Sorry, the latter doesn't have a useful description, but the synopsis says it all: taking snapshots causes scrubs to restart. Either of these may explain the negative progress.

--
David Pacheco, Sun Microsystems
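When sampling `zpool status` periodically, as Stuart did, a small script can flag the "negative progress" symptom automatically. This is a minimal Python sketch, assuming the S10U4-era progress line format quoted above; other releases print a different format, and `parse_progress` and `went_backwards` are hypothetical names. A decrease only shows that something changed; the bug IDs above are one possible cause, not the only one.

```python
import re
from typing import Optional

# Matches the progress line format quoted in this thread, e.g.
#   "scrub: scrub in progress, 75.49% done, 28h51m to go"
PROGRESS_RE = re.compile(r"(\d+(?:\.\d+)?)% done, (\d+)h(\d+)m to go")


def parse_progress(line: str) -> Optional[tuple[float, int]]:
    """Return (percent done, minutes left), or None if the line doesn't match."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    minutes_left = int(m.group(2)) * 60 + int(m.group(3))
    return float(m.group(1)), minutes_left


def went_backwards(lines: list[str]) -> bool:
    """True if % done decreases between any two successive parsed samples."""
    pcts = [p[0] for p in map(parse_progress, lines) if p is not None]
    return any(later < earlier for earlier, later in zip(pcts, pcts[1:]))


samples = [
    "scrub: scrub in progress, 75.49% done, 28h51m to go",
    "scrub: scrub in progress, 75.24% done, 31h20m to go",
]
print(parse_progress(samples[0]))  # (75.49, 1731)
print(went_backwards(samples))     # True
```

If the synopsis of 6655927 applies to a given release, polling `zpool status` could itself restart the scrub, so a monitoring loop like this would be best run sparingly on such systems.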