[zfs-discuss] slow zfs send
Hi, I'm seeing a slow zfs send on a v29 pool, about 25MB/sec.

bash-3.2# zpool status vdipool
  pool: vdipool
 state: ONLINE
  scan: scrub repaired 86.5K in 7h15m with 0 errors on Mon Feb  6 01:36:23 2012
config:

        NAME                       STATE     READ WRITE CKSUM
        vdipool                    ONLINE       0     0     0
          raidz1-0                 ONLINE       0     0     0
            c0t5000C500103F2057d0  ONLINE       0     0     0  (SEAGATE-ST31000640SS-0003-931.51GB)  Promise Jbod
            c0t5000C5000440AA0Bd0  ONLINE       0     0     0  (SEAGATE-ST31000640SS-0003-931.51GB)  Promise Jbod
            c0t5000C500103E9FFBd0  ONLINE       0     0     0  (SEAGATE-ST31000640SS-0003-931.51GB)  Promise Jbod
            c0t5000C500103E370Fd0  ONLINE       0     0     0  (SEAGATE-ST31000640SS-0003-931.51GB)  Promise Jbod
            c0t5000C500103E120Fd0  ONLINE       0     0     0  (SEAGATE-ST31000640SS-0003-931.51GB)  Promise Jbod
        logs
          mirror-1                 ONLINE       0     0     0
            c0t500151795955D430d0  ONLINE       0     0     0  (ATA-INTEL SSDSA2VP02-02M5-18.64GB)  onboard drive on x4140
            c0t500151795955BDB6d0  ONLINE       0     0     0  (ATA-INTEL SSDSA2VP02-02M5-18.64GB)  onboard drive on x4140
        cache
          c0t5001517BB271845Dd0    ONLINE       0     0     0  (ATA-INTEL SSDSA2CW16-0362-149.05GB)  onboard drive on x4140
        spares
          c0t5000C500103E368Fd0    AVAIL   (SEAGATE-ST31000640SS-0003-931.51GB)  Promise Jbod

The drives are in an external Promise 12-drive JBOD. The JBOD is also connected to another server that uses the other 6 SEAGATE ST31000640SS drives. This is on Solaris 10 8/11 (Generic_147441-01). I'm using an LSI 9200 for the external Promise JBOD and an internal 9200 for the ZIL and L2ARC, which also hosts rpool. FW versions on both cards are MPTFW-12.00.00.00-IT and MPT2BIOS-7.23.01.00.

I'm wondering why the zfs send could be so slow. Could the other server be slowing down the SAS bus?

Karl

CONFIDENTIALITY NOTICE: This communication (including all attachments) is confidential and is intended for the use of the named addressee(s) only and may contain information that is private, confidential, privileged, and exempt from disclosure under law. All rights to privilege are expressly claimed and reserved and are not waived. Any use, dissemination, distribution, copying or disclosure of this message and any attachments, in whole or in part, by anyone other than the intended recipient(s) is strictly prohibited. If you have received this communication in error, please notify the sender immediately, delete this communication from all data storage devices and destroy all hard copies.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] slow zfs send
2012-05-07 20:45, Karl Rossing wrote:
> I'm wondering why the zfs send could be so slow. Could the other server
> be slowing down the sas bus?

I hope other posters will have more relevant suggestions, but you can see whether the buses are contended by dd'ing from the drives directly. At least that would give you a measure of the available sequential throughput. During the send you can also monitor "zpool iostat 1" and the usual "iostat -xnz 1" in order to see how busy the disks are and how many IO requests are issued. The snapshots are likely sent in order of block age (TXG number), which for a busy pool may mean heavy fragmentation and lots of small random IOs...

HTH,
//Jim
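A minimal sketch of the per-disk dd check Jim describes. On the Solaris box you would point "if=" at a raw device path such as /dev/rdsk/c0t5000C500103F2057d0s0 (one dd per disk, then several in parallel to spot bus contention); here a temporary file stands in for the device so the pattern itself is runnable anywhere.

```shell
# Sequential-throughput check: time a large sequential read with dd.
# On Solaris, substitute a raw device (/dev/rdsk/<disk>s0) for "$f".
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1024k count=64 2>/dev/null   # stand-in data
time dd if="$f" of=/dev/null bs=1024k 2>/dev/null        # timed read
rm -f "$f"
```

Comparing the single-disk rate against the rate with all five raidz members read in parallel would show whether the shared SAS path is the bottleneck.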
Re: [zfs-discuss] slow zfs send
Hi Karl,

I'd like to verify that no dead or dying disk is killing pool performance, and your zpool status looks good. Jim has replied with some ideas to check your individual device performance.

Otherwise, you might be impacted by this CR:

7060894 zfs recv is excruciatingly slow

This CR covers both zfs send/recv ops and should be resolved in an upcoming Solaris 10 release. It's already available in an S11 SRU.

Thanks,
Cindy

On 5/7/12 10:45 AM, Karl Rossing wrote:
> Hi, I'm showing slow zfs send on pool v29. About 25MB/sec
> [...]
Re: [zfs-discuss] slow zfs send
Hi Karl,

Someone sitting across the table from me (who saw my posting) informs me that CR 7060894 would not impact Solaris 10 releases, so kindly withdraw my comment about CR 7060894.

Thanks,
Cindy

On 5/7/12 11:35 AM, Cindy Swearingen wrote:
> Hi Karl, I'd like to verify that no dead or dying disk is killing pool
> performance, and your zpool status looks good.
> [...]
Re: [zfs-discuss] Hung zfs destroy
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Ian Collins
>
> On a Solaris 11 (SR3) system I have a zfs destroy process that appears
> to be doing nothing and can't be killed. It has used 5 seconds of CPU
> in a day and a half, but truss -p won't attach. No data appears to have
> been removed. The dataset (but not the pool) is busy.
>
> I thought this was an old problem that was fixed long ago in Solaris 10
> (I had several temporary patches over the years), but it appears to be
> alive and well.

How big is your dataset? On what type of disks/pool? zfs destroy does indeed take time (unlike zpool destroy). A couple of days might be normal expected behavior, depending on your configuration.

You didn't specify whether you have dedup... Dedup will greatly hurt your zfs destroy speed, too.

That being said, sometimes things go wrong, and I don't have any suggestion for you to determine whether yours is behaving as expected. Or not.
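One hedged way to tell whether a destroy like this is making progress at all: poll the pool's "freeing" property, which reports bytes still queued for release by pending destroys. Assumptions in this sketch: a ZFS with the "freeing" property (Solaris 11 / illumos; older Solaris 10 releases lack it), and "tank" as a placeholder pool name.

```shell
# Poll the pool's "freeing" property a few times; a value that shrinks
# between polls means the destroy is reclaiming space even if the
# process itself looks idle. "tank" is a placeholder pool name.
for i in 1 2 3; do
  zpool get freeing tank
  sleep 60
done
```

A flat value across several polls would be a stronger sign that the destroy is genuinely wedged rather than just slow.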
Re: [zfs-discuss] IOzone benchmarking
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Bob Friesenhahn
>
> Has someone done real-world measurements which indicate that raidz*
> actually provides better sequential read or write than simple mirroring
> with the same number of disks? While it seems that there should be an
> advantage, I don't recall seeing posted evidence of such. If there was
> a measurable advantage, it would be under conditions which are unlikely
> in the real world.

Apparently I pulled it down at some point, so I don't have a URL for you anymore, but I did, and I posted. Long story short, both raidzN and mirror configurations behave approximately the way you would hope they do. That is... approximately, as compared to a single disk. And I *mean* approximately, because I'm just pulling it back from memory the way I chose to remember it, which is to say, a simplified model that I felt comfortable with:

                    seq rd  seq wr  rand rd  rand wr
    2-disk mirror     2x      1x      2x       1x
    3-disk mirror     3x      1x      3x       1x
    2x 2-disk mirr    4x      2x      4x       2x
    3x 2-disk mirr    6x      3x      6x       3x
    3-disk raidz      2x      2x      1x       1x
    4-disk raidz      3x      3x      1x       1x
    5-disk raidz      4x      4x      1x       1x
    6-disk raidz      5x      5x      1x       1x

I went on to test larger and more complex arrangements... Started getting things like 1.9x and 1.8x where I would have expected 2x, and so forth... Sorry for being vague now, but the data isn't in front of me anymore. Might not ever be again.
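The pattern in the table reduces to a simple rule of thumb (my paraphrase of the table, not Ned's measured data): an N-disk raidzP scales sequential IO like its N-P data disks and random IO like a single disk, while K striped 2-way mirrors scale reads by 2K and writes by K. A runnable sketch of that model:

```shell
# Back-of-envelope throughput multipliers from the rule of thumb above.
# n = total disks, p = parity disks (raidzP); mirrors handled separately.
awk 'BEGIN {
  n = 6; p = 2;   # e.g. a hypothetical 6-disk raidz2 (4 data disks)
  printf "raidz%d of %d disks: seq ~%dx, random ~1x\n", p, n, n - p
  k = 3;          # e.g. 3 striped 2-way mirrors (6 disks total)
  printf "%d x 2-disk mirrors: rd ~%dx, wr ~%dx\n", k, 2 * k, k
}'
# raidz2 of 6 disks: seq ~4x, random ~1x
# 3 x 2-disk mirrors: rd ~6x, wr ~3x
```

The mirror line reproduces the "3x 2-disk mirr" row of the table (6x reads, 3x writes) for the same six disks, which is the trade-off Bob was asking about.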
Re: [zfs-discuss] IOzone benchmarking
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Paul Kraus
>
> Even with uncompressable data I measure better performance with
> compression turned on rather than off.

*cough*
Re: [zfs-discuss] Hung zfs destroy
On 05/ 8/12 08:36 AM, Edward Ned Harvey wrote:
> How big is your dataset?

Small, 15GB.

> On what type of disks/pool?

Single iSCSI volume.

> zfs destroy does indeed take time (unlike zpool destroy). A couple of
> days might be normal expected behavior, depending on your
> configuration. You didn't specify if you have dedup... Dedup will
> greatly hurt your zfs destroy speed, too.

I've yet to find a system with enough RAM to make dedup worthwhile!

After 5 days, a grand total of 1.2GB had been removed, and then the process responded to kill -9 and exited... I just re-ran the command and it completed in 2 seconds. Well odd.

--
Ian.
Re: [zfs-discuss] IOzone benchmarking
On Mon, 7 May 2012, Edward Ned Harvey wrote:
> Apparently I pulled it down at some point, so I don't have a URL for
> you anymore, but I did, and I posted. Long story short, both raidzN and
> mirror configurations behave approximately the way you would hope they
> do. That is... Approximately, as compared to a single disk: And I
> *mean* approximately,

Yes, I remember your results. In a few weeks I should be setting up a new system with OpenIndiana and 8 SAS disks. This will give me an opportunity to test again. The last time I got to play was back in February 2008, and I did not bother to test raidz (http://www.simplesystems.org/users/bfriesen/zfs-discuss/2540-zfs-performance.pdf). The most common benchmarking is sequential read/write, and rarely read-file/write-file, where 'file' is a megabyte or two and the file is different for each iteration.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] slow zfs send
On 12-05-07 12:18 PM, Jim Klimov wrote:
> During the send you can also monitor zpool iostat 1 and the usual
> iostat -xnz 1 in order to see how busy the disks are and how many IO
> requests are issued. The snapshots are likely sent in the order of
> block age (TXG number), which for a busy pool may mean heavy
> fragmentation and lots of random small IOs...

I have been able to verify that I can get a zfs send at 135MB/sec for a striped pool with 2 internal drives on the same server. Each dataset had about 3-4 snapshots, and there were about 36 datasets. I deleted the snapshots and the speed may have increased slightly.

Given iostat -xnz 1 below, the number of IOs looks very high, so I guess the drives are badly fragmented. Is fixing this going to require a zfs pool rebuild?

Karl

                        extended device statistics
      r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
      0.0    4.0     0.0    16.0  0.0  0.0    0.0    0.0   0   0 c0t500151795955D430d0
      0.0    4.0     0.0    16.0  0.0  0.0    0.0    0.0   0   0 c0t500151795955BDB6d0
      0.0    1.0     0.0     8.0  0.0  0.0    0.0    0.1   0   0 c0t5001517BB271845Dd0
    759.0    0.0  4800.0     0.0  0.0  2.9    0.0    3.8   0  75 c0t5000C500103F2057d0
    887.0    0.0  4738.0     0.0  0.0  1.6    0.0    1.8   0  42 c0t5000C500103E9FFBd0
    915.0    0.0  4628.5     0.0  0.0  1.5    0.0    1.6   0  30 c0t5000C5000440AA0Bd0
    922.0    0.0  4676.5     0.0  0.0  1.0    0.0    1.1   0  26 c0t5000C500103E120Fd0
    970.0    0.0  4276.0     0.0  0.0  1.0    0.0    1.0   0  20 c0t5000C500103E370Fd0
                        extended device statistics
      r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
      0.0    4.0     0.0    32.0  0.0  0.0    0.0    0.1   0   0 c0t5001517BB271845Dd0
   1363.0    0.0  9007.8     0.0  0.0  2.0    0.0    1.5   1  54 c0t5000C500103F2057d0
   1405.0    0.0 10169.2     0.0  0.0  1.8    0.0    1.3   1  37 c0t5000C500103E9FFBd0
   1448.0    0.0  9884.2     0.0  0.0  1.7    0.0    1.2   1  40 c0t5000C5000440AA0Bd0
   1264.0    0.0  9537.3     0.0  0.0  2.1    0.0    1.7   0  51 c0t5000C500103E120Fd0
   1260.0    0.0  9749.8     0.0  0.0  1.9    0.0    1.5   0  44 c0t5000C500103E370Fd0
                        extended device statistics
      r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
      0.0    6.0     0.0    24.0  0.0  0.0    0.0    0.0   0   0 c0t500151795955D430d0
      0.0    6.0     0.0    24.0  0.0  0.0    0.0    0.0   0   0 c0t500151795955BDB6d0
   1023.0    0.0  5131.6     0.0  0.0  1.6    0.0    1.6   0  45 c0t5000C500103F2057d0
   1003.0    0.0  5040.1     0.0  0.0  1.5    0.0    1.5   0  36 c0t5000C500103E9FFBd0
    959.0    0.0  5069.1     0.0  0.0  1.7    0.0    1.8   0  46 c0t5000C5000440AA0Bd0
    941.0    0.0  5117.6     0.0  0.0  1.7    0.0    1.8   0  45 c0t5000C500103E120Fd0
   1043.0    0.0  5034.1     0.0  0.0  1.0    0.0    1.0   0  24 c0t5000C500103E370Fd0
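One way to quantify the fragmentation hint in those numbers: divide kr/s by r/s to get the average read size per disk. A sketch, using two sample lines copied from the first iostat burst above:

```shell
# Average read size from iostat -xnz fields: kr/s (field 3) divided by
# r/s (field 1); field 11 is the device name. Reads of only a few KB
# against disks that can stream tens of MB/s sequentially indicate the
# small random IO pattern Jim described.
awk '{ printf "%s: %.1f KB/read\n", $11, $3 / $1 }' <<'EOF'
759.0 0.0 4800.0 0.0 0.0 2.9 0.0 3.8 0 75 c0t5000C500103F2057d0
887.0 0.0 4738.0 0.0 0.0 1.6 0.0 1.8 0 42 c0t5000C500103E9FFBd0
EOF
# c0t5000C500103F2057d0: 6.3 KB/read
# c0t5000C500103E9FFBd0: 5.3 KB/read
```

Roughly 5-6 KB per read, against a pool that presumably writes mostly 128 KB records, is consistent with the send walking heavily fragmented, TXG-ordered blocks.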
Re: [zfs-discuss] slow zfs send
On Mon, 7 May 2012, Karl Rossing wrote:
> I have been able to verify that I can get a zfs send at 135MB/sec for a
> striped pool with 2 internal drives on the same server.

I see that there are a huge number of reads and hardly any writes. Are you SURE that deduplication was not enabled for this pool? This is the sort of behavior that one might expect if deduplication was enabled without enough RAM or L2 read cache.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] slow zfs send
On 12-05-07 8:45 PM, Bob Friesenhahn wrote:
> Are you SURE that deduplication was not enabled for this pool? This is
> the sort of behavior that one might expect if deduplication was enabled
> without enough RAM or L2 read cache.

After hours the pool is pretty quiet. zpool history does not show dedup ever being enabled, and zfs get dedup shows dedup off.

Karl