[zfs-discuss] Poor relative performance of SAS over SATA drives
Greetings,

I have a fresh installation of OI151a:
- SM X8DTH, 12GB RAM, LSI 9211-8i (latest IT-mode firmware)
- pool_A : SG ES.2 Constellation (SAS)
- pool_B : WD RE4 (SATA)
- no settings in /etc/system

*zpool status output*
---
admin@openindiana:~# zpool status
  pool: pool_A
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        pool_A                     ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c7t5000C50035062EC1d0  ONLINE       0     0     0
            c8t5000C50034C03759d0  ONLINE       0     0     0

  pool: pool_B
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        pool_B                     ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c1t50014EE057FCD628d0  ONLINE       0     0     0
            c2t50014EE6ABB89957d0  ONLINE       0     0     0

*Load generation via 2 concurrent dd streams:*
---
dd if=/dev/zero of=/pool_A/bigfile bs=1024k count=100
dd if=/dev/zero of=/pool_B/bigfile bs=1024k count=100

*Initial observation (zpool iostat -v)*
---
                             capacity     operations    bandwidth
pool                       alloc   free   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
pool_A                     1.68G  2.72T      0    652      0  73.4M
  mirror                   1.68G  2.72T      0    652      0  73.4M
    c7t5000C50035062EC1d0      -      -      0    619      0  73.4M
    c8t5000C50034C03759d0      -      -      0    619      0  73.4M
-------------------------  -----  -----  -----  -----  -----  -----
pool_B                     1.54G  1.81T      0  1.05K      0   123M
  mirror                   1.54G  1.81T      0  1.05K      0   123M
    c1t50014EE057FCD628d0      -      -      0  1.02K      0   123M
    c2t50014EE6ABB89957d0      -      -      0  1.01K      0   123M

*10-15 mins later*
---
                             capacity     operations    bandwidth
pool                       alloc   free   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
pool_A                     15.5G  2.70T      0     50      0  6.29M
  mirror                   15.5G  2.70T      0     50      0  6.29M
    c7t5000C50035062EC1d0      -      -      0     62      0  7.76M
    c8t5000C50034C03759d0      -      -      0     50      0  6.29M
-------------------------  -----  -----  -----  -----  -----  -----
pool_B                     28.0G  1.79T      0  1.07K      0   123M
  mirror                   28.0G  1.79T      0  1.07K      0   123M
    c1t50014EE057FCD628d0      -      -      0  1.02K      0   123M
    c2t50014EE6ABB89957d0      -      -      0  1.02K      0   123M

Questions:
1. Why do the SG SAS drives degrade to ~10 MB/s after 10-15 minutes while the WD RE4 drives remain consistent at ~100 MB/s?
2. Why do the SG SAS drives show only 70+ MB/s when the published figure is ~100 MB/s? Refer to http://www.seagate.com/www/en-us/products/enterprise-hard-drives/constellation-es/constellation-es-2/#tTabContentSpecifications
3. All 4 drives are connected to a single HBA, so I assume the mpt_sas driver is used. Are SAS and SATA drives handled differently?

This is a test server, so any ideas to try and help me understand are greatly appreciated.

Many thanks,
WL
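For reference, a minimal reproduction sketch of the load generation and monitoring above. The count value is an assumption: count=100 as written is only ~100 MB and would finish in a couple of seconds, so a much larger count is needed to sustain the write for 10-15 minutes.

    # hypothetical: ~15 GB per stream at bs=1024k, one stream per pool, run concurrently
    dd if=/dev/zero of=/pool_A/bigfile bs=1024k count=15000 &
    dd if=/dev/zero of=/pool_B/bigfile bs=1024k count=15000 &

    # in a second terminal, sample per-vdev throughput every 5 seconds
    zpool iostat -v pool_A pool_B 5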
Re: [zfs-discuss] Poor relative performance of SAS over SATA drives
Hi,

Thanks for the replies. In the beginning I only had the SAS drives installed when I observed the behaviour; the SATA drives were added later for comparison and troubleshooting. The slow behaviour is observed only after 10-15 minutes of running dd, when the file size is about 15 GB; the throughput then drops suddenly from 70 to 50 to 20 to 10 MB/s in a matter of seconds and never recovers. This can't be right no matter how I look at it.

Regards,
WL

On 10/27/2011 9:59 PM, Brian Wilson wrote:

On 10/27/11 07:03 AM, Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of weiliam.hong

3. All 4 drives are connected to a single HBA, so I assume the mpt_sas driver is used. Are SAS and SATA drives handled differently?

If they're all on the same HBA, they may all be on the same bus. It may be *because* you're mixing SATA and SAS disks on the same bus. I'd suggest separating the tests - don't run them concurrently - and see if there's any difference.

Also, the HBA might have different defaults for SAS vs SATA; look in the HBA to see if write-back / write-through are the same. I don't know if the HBA gives you some way to enable/disable the on-disk cache, but take a look and see.

Also, maybe the SAS disks are only doing SATA. If the HBA is only able to do SATA, then SAS disks will work, but might not work as optimally as they would if they were connected to a real SAS HBA.

And one final thing - if you're planning to run ZFS (as I suspect you are, posting on this list running OI) ... it actually works *better* without any HBA.

*Footnote*: ZFS works the worst if you have the ZIL enabled, no log device, and no HBA. It's a significant improvement if you add a battery-backed or nonvolatile HBA with writeback. It's a significant improvement again if you get rid of the HBA and add a log device. It's a significant improvement yet again if you get rid of the HBA and the log device and run with the ZIL disabled (if your workload is compatible with a disabled ZIL).

First, ditto everything Edward says above. I'd add that your dd test creates a lot of straight sequential I/O, not anything that's likely to be random I/O. I can't speak to why your SAS might not be performing any better than Edward did, but your SATAs are probably screaming on straight sequential I/O, whereas on something more random I would bet they won't perform as well as they do in this test. The tool I've seen used for that sort of testing is iozone - I'm sure there are others as well, and I can't attest to what's better or worse.

cheers,
Brian
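A sketch of the suggestions above: run the pools one at a time, check the on-disk write cache, and add a random-I/O comparison with iozone. The format(1M) expert-mode steps and the iozone parameters are illustrative assumptions, not values taken from this thread.

    # 1. sequential test, one pool at a time (not concurrently)
    dd if=/dev/zero of=/pool_A/bigfile bs=1024k count=15000
    dd if=/dev/zero of=/pool_B/bigfile bs=1024k count=15000

    # 2. inspect the on-disk write cache from format expert mode
    #    (select the disk, then: cache -> write_cache -> display / enable / disable)
    format -e c7t5000C50035062EC1d0

    # 3. random-I/O comparison, per the iozone suggestion
    #    -i 0 = write/rewrite, -i 2 = random read/write, 8k records, 4 GB file
    iozone -i 0 -i 2 -r 8k -s 4g -f /pool_A/iozone.tmp
    iozone -i 0 -i 2 -r 8k -s 4g -f /pool_B/iozone.tmp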
Re: [zfs-discuss] Poor relative performance of SAS over SATA drives
Thanks for the reply. Some background: the server is freshly installed, and the pools were newly created right before running the tests. Some comments below.

On 10/31/2011 10:33 PM, Paul Kraus wrote:

A couple points in line below ...

On Wed, Oct 26, 2011 at 10:56 PM, weiliam.hong <weiliam.h...@gmail.com> wrote:

I have a fresh installation of OI151a:
- SM X8DTH, 12GB RAM, LSI 9211-8i (latest IT-mode firmware)
- pool_A : SG ES.2 Constellation (SAS)
- pool_B : WD RE4 (SATA)
- no settings in /etc/system

Load generation via 2 concurrent dd streams:
dd if=/dev/zero of=/pool_A/bigfile bs=1024k count=100
dd if=/dev/zero of=/pool_B/bigfile bs=1024k count=100

dd generates straight line data, all sequential.

yes.

                             capacity     operations    bandwidth
pool                       alloc   free   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
pool_A                     15.5G  2.70T      0     50      0  6.29M
  mirror                   15.5G  2.70T      0     50      0  6.29M
    c7t5000C50035062EC1d0      -      -      0     62      0  7.76M
    c8t5000C50034C03759d0      -      -      0     50      0  6.29M
-------------------------  -----  -----  -----  -----  -----  -----
pool_B                     28.0G  1.79T      0  1.07K      0   123M
  mirror                   28.0G  1.79T      0  1.07K      0   123M
    c1t50014EE057FCD628d0      -      -      0  1.02K      0   123M
    c2t50014EE6ABB89957d0      -      -      0  1.02K      0   123M

What does `iostat -xnM c7t5000C50035062EC1d0 c8t5000C50034C03759d0 c1t50014EE057FCD628d0 c2t50014EE6ABB89957d0 1` show? That will give you much more insight into the OS-to-drive interface.

iostat numbers are similar. I will try to get the figures; it is a bit hard now as the hardware has been taken off my hands.

What does `fsstat /pool_A /pool_B 1` show? That will give you much more insight into the application-to-filesystem interface. In this case application == dd. In my opinion, `zpool iostat -v` is somewhat limited in what you can learn from it. The only thing I use it for these days is to see the distribution of data and I/O between vdevs.

Questions:
1. Why do the SG SAS drives degrade to 10 MB/s while the WD RE4 remain consistent at 100 MB/s after 10-15 min?

Something changes to slow them down? Sorry for the obvious retort :-) See what iostat has to say. If the %b column is climbing, then you are slowly saturating the drives themselves, for example.

There is no other workload or user on this system. The system is freshly installed and booted and the pools newly created.

2. Why does the SG SAS drive show only 70+ MB/s when the published figure is 100 MB/s?

Published where?

http://www.seagate.com/www/en-au/products/enterprise-hard-drives/constellation-es/constellation-es-2/#tTabContentSpecifications

What does a dd to the device itself (no ZFS, no FS at all) show? For example, `dd if=/dev/zero of=/dev/dsk/c7t5000C50035062EC1d0s0 bs=1024k count=100` (after you destroy the zpool and use format to create an s0 of the entire disk). This will test the device driver / HBA / drive with no FS or volume manager involved. Use iostat to watch the OS-to-drive interface.

Perhaps the test below is useful to understand the observation.

*dd test on slice 0*
dd if=/dev/zero of=/dev/rdsk/c1t5000C50035062EC1d0s0 bs=1024k

                    extended device statistics
    r/s    w/s   kr/s      kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0  155.4    0.0  159129.7   0.0   1.0     0.0     6.3   0  97  c1
    0.0  155.4    0.0  159129.7   0.0   1.0     0.0     6.3   0  97  c1t5000C50035062EC1d0

==> this is the best case (~155 MB/s)

*dd test on slice 6*
dd if=/dev/zero of=/dev/rdsk/c1t5000C50035062EC1d0s6 bs=1024k

                    extended device statistics
    r/s    w/s   kr/s     kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0   21.4    0.0  21913.6   0.0   1.0     0.0    46.6   0 100  c1
    0.0   21.4    0.0  21913.6   0.0   1.0     0.0    46.6   0 100  c1t5000C50035062EC1d0

==> only 20+ MB/s !!!
*Partition table info*

Part      Tag    Flag     First Sector         Size         Last Sector
  0        usr    wm                256     100.00GB           209715455
  1 unassigned    wm                  0            0                   0
  2 unassigned    wm                  0            0                   0
  3 unassigned    wm                  0            0                   0
  4 unassigned    wm                  0            0                   0
  5 unassigned    wm                  0            0                   0
  6        usr    wm         5650801295     100.00GB          5860516749
  8   reserved    wm         5860516751       8.00MB          5860533134

Referring to pg 18 of http://www.seagate.com/staticfiles/support/docs/manual/enterprise/Constellation%203_5%20in/100628615f.pdf, the transfer rate is supposed to range from 68 to 155 MB/s. Why are the inner cylinders showing only 20+ MB/s? Am I testing and understanding this wrongly?

3. All 4 drives are connected to a single HBA, so I assume the mpt_sas driver is used. Are SAS and SATA drives handled differently?
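For completeness, a rough sketch of how the outer-vs-inner slice comparison could be reproduced. The interactive format(1M) steps are abbreviated, the count value is an assumption, and the slice layout follows the partition table above.

    # destroy the pool first, then label the disk and create slice 0
    # (first 100 GB, outer tracks) and slice 6 (last 100 GB, inner tracks):
    #   format -e c1t5000C50035062EC1d0   -> partition -> ... -> label

    # in one terminal: write to the outer slice, then the inner slice
    dd if=/dev/zero of=/dev/rdsk/c1t5000C50035062EC1d0s0 bs=1024k count=10000
    dd if=/dev/zero of=/dev/rdsk/c1t5000C50035062EC1d0s6 bs=1024k count=10000

    # in another terminal: watch asvc_t and %b at 5-second intervals
    iostat -xnM c1t5000C50035062EC1d0 5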
Re: [zfs-discuss] Poor relative performance of SAS over SATA drives
Thanks for the reply.

On 11/1/2011 11:03 AM, Richard Elling wrote:

On Oct 26, 2011, at 7:56 PM, weiliam.hong wrote:

Questions:
1. Why do the SG SAS drives degrade to 10 MB/s while the WD RE4 remain consistent at 100 MB/s after 10-15 min?
2. Why do the SG SAS drives show only 70+ MB/s when the published figure is 100 MB/s?

Are the SAS drives multipathed? If so, do you have round-robin (the default in most Solaris distros) or logical-block?

Physically, the SAS drives are not multipathed, as I connected them directly to the HBA. I also disabled multipathing via mpt_sas.conf.

Regards,

3. All 4 drives are connected to a single HBA, so I assume the mpt_sas driver is used. Are SAS and SATA drives handled differently?

Yes. SAS disks can be multipathed, SATA disks cannot.
 -- richard
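For anyone verifying the same thing, a few commands that show whether MPxIO (scsi_vhci) is managing the drives and which load-balance policy is in effect. The mpt_sas.conf line is the usual global switch; treat these as a sketch rather than a copy of the poster's configuration.

    # list device-name mappings and logical units managed by scsi_vhci
    stmsboot -L
    mpathadm list lu

    # if a LU is multipathed, show its current load-balance policy
    # (round-robin vs logical-block); the device path here is illustrative
    mpathadm show lu /dev/rdsk/c7t5000C50035062EC1d0s2

    # /kernel/drv/mpt_sas.conf -- disabling MPxIO for the mpt_sas driver
    # mpxio-disable="yes";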