Re: [zfs-discuss] ZFS performance on LSI 9240-8i?
Now, if anyone is still reading, I have another question. The new Solaris 11 device naming convention hides the physical tree from me. I got just a list of long disk names, all starting with c0 (see below), but I need to know which disk is connected to which controller so that I can put the two sides of each mirror on two different controllers and tolerate a single controller failure. I need a way of figuring out the connection path for each disk. I hope I managed to explain what I want?

See diskinfo(1M), for example:

$ diskinfo -T bay -o Rc -h
HDD00  -
HDD01  -
HDD02  c0t5000CCA00AC87F54d0
HDD03  c0t5000CCA00AA95838d0
HDD04  c0t5000CCA01510ECC0d0
HDD05  c0t5000CCA01515EE78d0
HDD06  c0t5000CCA01512DA3Cd0
HDD07  c0t5000CCA00AB3E1C8d0
HDD08  c0t5000CCA0151C1D18d0
HDD09  c0t5000CCA0151F7E08d0
HDD10  c0t5000CCA0151C7CA8d0
HDD11  c0t5000CCA00AA9D570d0
HDD12  c0t5000CCA0151CB180d0
HDD13  c0t5000CCA015208C98d0
HDD14  c0t5000CCA00AA97F04d0
HDD15  c0t5000CCA0151A287Cd0
HDD16  c0t5000CCA00AAA1544d0
HDD17  c0t5000CCA01521070Cd0
HDD18  c0t5000CCA00AA97EF4d0
HDD19  c0t5000CCA015214F84d0
HDD20  c0t5000CCA015214844d0
HDD21  c0t5000CCA00AAAD154d0
HDD22  c0t5000CCA00AA95558d0
HDD23  c0t5000CCA00AAA0D1Cd0

In your case you will probably have to put a configuration in place for your disk slots (on Oracle's HW it works out of the box) - go to support.oracle.com and look for the document: How To: Selecting a Physical Slot for a SAS Device with a WWN for an Oracle Solaris 11 Installation [ID 1411444.1]

ps. there is also the zpool status -l option, which is cool:

$ zpool status -l cwafseng3-0
  pool: pool-0
 state: ONLINE
  scan: scrub canceled on Thu Apr 12 13:52:13 2012
config:

        NAME                                                          STATE   READ WRITE CKSUM
        pool-0                                                        ONLINE     0     0     0
          raidz1-0                                                    ONLINE     0     0     0
            /dev/chassis/SUN-FIRE-X4270-M2-SERVER.unknown/HDD02/disk  ONLINE     0     0     0
            /dev/chassis/SUN-FIRE-X4270-M2-SERVER.unknown/HDD23/disk  ONLINE     0     0     0
            /dev/chassis/SUN-FIRE-X4270-M2-SERVER.unknown/HDD22/disk  ONLINE     0     0     0
            /dev/chassis/SUN-FIRE-X4270-M2-SERVER.unknown/HDD21/disk  ONLINE     0     0     0
            /dev/chassis/SUN-FIRE-X4270-M2-SERVER.unknown/HDD20/disk  ONLINE     0     0     0
            /dev/chassis/SUN-FIRE-X4270-M2-SERVER.unknown/HDD19/disk  ONLINE     0     0     0
            /dev/chassis/SUN-FIRE-X4270-M2-SERVER.unknown/HDD17/disk  ONLINE     0     0     0
            /dev/chassis/SUN-FIRE-X4270-M2-SERVER.unknown/HDD15/disk  ONLINE     0     0     0

errors: No known data errors

Best regards,
Robert Milkowski
http://milek.blogspot.com
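pps. once the WWN-to-slot/controller mapping is known, splitting each mirror across two HBAs is just a matter of pairing the names when the pool is created. A minimal sketch (the WWNs below are placeholders for your own mapping):

# pair one disk from HBA 0 with one disk from HBA 1 in every mirror vdev
zpool create tank \
  mirror c0t5000CCA0AAAAAAA0d0 c0t5000CCA0BBBBBBB0d0 \
  mirror c0t5000CCA0AAAAAAA1d0 c0t5000CCA0BBBBBBB1d0
# ...repeat for the remaining pairs, always one disk per controller
zpool status tank    # verify the layout before loading data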
Re: [zfs-discuss] ZFS performance on LSI 9240-8i?
Thanks for the tips, everybody! Progress report: OpenIndiana failed to recognise the LSI 9240-8i's. I installed the 4.7 drivers from the LSI website (for Solaris 11 and up) but it started throwing "component failed" messages. So I gave up on the 9240's and re-flashed them into 9211-8i's (IT mode). Solaris 11 (11.11) recognised the 9211 adapters instantly and so far shows perfect performance with the default drivers, both on dd tests reading and writing raw disks and on dd writing into a zpool built of 10 two-way mirrors. The speed is around 1GB/s. There are still some hiccups in this sequential write process (for 4-5 seconds the speed suddenly drops on all disks when monitored by iostat, but then picks up to the usual 140MB/s per disk). This is so much better than Solaris 11 with the 9240's going persistently at around 3-4MB/s per disk on a simple dd sequential write. I am pleased with this performance.

Now, if anyone is still reading, I have another question. The new Solaris 11 device naming convention hides the physical tree from me. I got just a list of long disk names, all starting with c0 (see below), but I need to know which disk is connected to which controller so that I can put the two sides of each mirror on two different controllers and tolerate a single controller failure. I need a way of figuring out the connection path for each disk. I hope I managed to explain what I want?

root@carbon:~# echo | format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
  0. c0t5000CCA225CEFC73d0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cefc73
  1. c0t5000CCA225CEFD0Bd0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cefd0b
  2. c0t5000CCA225CEFD12d0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cefd12
  3. c0t5000CCA225CEFEDEd0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cefede
  4. c0t5000CCA225CEFEE7d0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cefee7
  5. c0t5000CCA225CF016Cd0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cf016c
  6. c0t5000CCA225CF016Dd0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cf016d
  7. c0t5000CCA225CF016Ed0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cf016e
  8. c0t5000CCA225CF023Cd0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cf023c
  9. c0t5000CCA225CF042Cd0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cf042c
 10. c0t5000CCA225CF050Fd0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cf050f
 11. c0t5000CCA225CF0115d0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cf0115
 12. c0t5000CCA225CF0119d0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cf0119
 13. c0t5000CCA225CF0144d0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cf0144
 14. c0t5000CCA225CF0156d0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cf0156
 15. c0t5000CCA225CF0167d0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cf0167
 16. c0t5000CCA225CF0419d0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cf0419
 17. c0t5000CCA225CF0420d0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cf0420
 18. c0t5000CCA225CF0517d0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cf0517
 19. c0t5000CCA225CF0522d0 ATA-Hitachi HUA72303-A5C0-2.73TB  /scsi_vhci/disk@g5000cca225cf0522
 20. c0t5001517BB27B5896d0 ATA-INTEL SSDSC2CW24-400i-223.57GB  /scsi_vhci/disk@g5001517bb27b5896
 21. c0t5001517BB27DCE0Bd0 ATA-INTEL SSDSC2CW24-400i-223.57GB  /scsi_vhci/disk@g5001517bb27dce0b
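One thing worth noting here (a sketch, not verified on this particular box): these c0t...d0 names come from MPxIO (/scsi_vhci), and mpathadm can usually show which initiator port - and therefore which HBA - each LUN is reachable through:

# list every multipathed logical unit
mpathadm list lu
# show the path(s), including the initiator/HBA port, for one of the disks above
mpathadm show lu /dev/rdsk/c0t5000CCA225CEFC73d0s2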
Re: [zfs-discuss] ZFS performance on LSI 9240-8i?
I followed this guide, but instead of 2108it.bin I downloaded the latest firmware file for the 9211-8i from the LSI website. I now have three 9211's! :)
http://lime-technology.com/forum/index.php?topic=12767.msg124393#msg124393

On 4 May 2012 18:33, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote:
On Fri, 4 May 2012, Rocky Shek wrote: If I were you, I will not use 9240-8I. I will use 9211-8I as pure HBA with IT FW for ZFS.
Is there IT FW for the 9240-8i? They seem to use the same SAS chipset. My next system will have 9211-8i with IT FW. Playing it safe. Good enough for Nexenta is good enough for me.
Bob
Re: [zfs-discuss] ZFS performance on LSI 9240-8i?
Downloaded, unzipped and flying! It shows the GUID, which is part of the /dev/rdsk/c0t* name! Thanks!!! And thanks again! This msg goes to the group.

root@carbon:~/bin/LSI-SAS2IRCU/SAS2IRCU_P13/sas2ircu_solaris_x86_rel# ./sas2ircu 0 DISPLAY | grep GUID
GUID: 5000cca225cefd12
GUID: 5000cca225cf0119
GUID: 5000cca225cefd0b
GUID: 5000cca225cf0420
GUID: 5000cca225cf0517
GUID: 5000cca225cf0115
GUID: 5000cca225cf016d

On 9 May 2012 15:32, Daniel J. Priem daniel.pr...@disy.net wrote:
http://www.lsi.com/channel/products/storagecomponents/Pages/LSISAS9211-8i.aspx
select * SUPPORT DOWNLOADS
download SAS2IRCU_P13
best regards
daniel

Roman Matiyenko rmatiye...@gmail.com writes:
Thanks! The LSI 9211-8i's are not recognised by lsiutil. I run this S11 under VMware ESXi with the three PCI devices in pass-through mode. The main (virtual) disk controller is LSI as well, with a VMDK boot disk attached, and it is recognised. Nevertheless, many thanks for trying to help!
Roman
PS I got your other message with links, will see now...

root@carbon:~/bin# ./lsiutil
LSI Logic MPT Configuration Utility, Version 1.62, January 14, 2009
1 MPT Port found

Port Name      Chip Vendor/Type/Rev   MPT Rev  Firmware Rev  IOC
1. mpt0        LSI Logic 53C1030 B0     102      01032920     0

Select a device: [1-1 or 0 to quit] 1

 1. Identify firmware, BIOS, and/or FCode
 2. Download firmware (update the FLASH)
 4. Download/erase BIOS and/or FCode (update the FLASH)
 8. Scan for devices
10. Change IOC settings (interrupt coalescing)
11. Change SCSI Initiator settings
12. Change SCSI Target settings
20. Diagnostics
21. RAID actions
22. Reset bus
23. Reset target
42. Display operating system names for devices
59. Dump PCI config space
60. Show non-default settings
61. Restore default settings
69. Show board manufacturing information
99. Reset port
 e  Enable expert mode in menus
 p  Enable paged mode
 w  Enable logging

Main menu, select an option: [1-99 or e/p/w or 0 to quit] 42
mpt0 is /dev/cfg/c4
B___T___L  Type       Operating System Device Name
0   0   0  Disk       /dev/rdsk/c4t0d0s2

Main menu, select an option: [1-99 or e/p/w or 0 to quit] 8
53C1030's host SCSI ID is 7
B___T___L  Type       Vendor   Product          Rev   Negotiated Speed  Width
0   0   0  Disk       VMware   Virtual disk     1.0   Ultra4 Wide, 320 MB/sec

On 9 May 2012 15:08, Daniel J. Priem daniel.pr...@disy.net wrote:
attached. i didn't know where to download

Roman Matiyenko rmatiye...@gmail.com writes:
Hi Daniel,
Thanks. Where do I get lsiutil? I am on Oracle Solaris 11. The LSI website says that for the 9211-8i you don't need to install drivers as they come with the Solaris OS, so they don't have anything to download for Solaris.
Roman

On 9 May 2012 14:22, Daniel J. Priem daniel.pr...@disy.net wrote:
Hi,
Roman Matiyenko rmatiye...@gmail.com writes:
Now, if anyone is still reading, I have another question. The new Solaris 11 device naming convention hides the physical tree from me. I got just a list of long disk names, all starting with c0, but I need to know which disk is connected to which controller so that I can put the two sides of each mirror on two different controllers and tolerate a single controller failure. I need a way of figuring out the connection path for each disk.

lsiutil
select controller
select option 42

Main menu, select an option: [1-99 or e/p/w or 0 to quit] 42
mpt2 is /dev/cfg/c8
B___T___L  Type       Operating System Device Name
0   0   0  Disk       /dev/rdsk/c8t0d0s2
0   1   0  Disk       /dev/rdsk/c8t1d0s2
0   2   0  Tape       /dev/rmt/0
0   3   0  Disk       /dev/rdsk/c8t3d0s2
0   4   0  Disk       /dev/rdsk/c8t4d0s2

Best Regards
Daniel
--
disy Informationssysteme GmbH
Daniel Priem
Senior Netzwerk- und Systemadministrator
Tel: +49 721 1 600 6-000, Fax: -05, E-Mail: daniel.pr...@disy.net
Firmensitz: Erbprinzenstr. 4-12, 76133 Karlsruhe
Registergericht: Amtsgericht Mannheim, HRB 107964
Geschäftsführer: Claus Hofmann
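Building on the sas2ircu output above, a small loop (an untested sketch; adjust the controller indexes to whatever "sas2ircu LIST" reports) produces a per-controller WWN map that can be matched against the c0t<WWN>d0 device names:

#!/bin/sh
# print "controller N disk <guid>" pairs for every HBA that sas2ircu can see
for ctrl in 0 1 2; do
    ./sas2ircu $ctrl DISPLAY | awk -v c=$ctrl '/GUID/ {print "controller", c, "disk", $2}'
done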
[zfs-discuss] ZFS performance on LSI 9240-8i?
Hi all,

I have a bad, bad problem with our brand new server! The lengthy details are below but to cut the story short, on the same hardware (3 x LSI 9240-8i, 20 x 3TB 6Gb HDDs) I am getting ZFS sequential writes of 1.4GB/s on Solaris 10 (20 disks, 10 mirrors) and only 200-240MB/s on the latest Solaris 11.11 (same zpool config). By writing directly to raw disks I found that in S10 the speed is 140MB/s sequential writes per disk (consistent with the combined 1.4GB/s for my zpool) whereas it is only 24MB/s in Solaris 11 (consistent with the 240MB/s zpool: 10 mirrors at 24MB/s each). This must be the controller drivers, right? I downloaded drivers version 4.7 off the LSI site (it says for Solaris 10 and later) - they failed to attach on S11. Version 3.03 worked but the system would randomly crash, so I moved my experiments off S11 to S10. However, S10 has only the old implementation of iSCSI, which gives me other problems, so I decided to give S11 another go. Would there be any advice in this community?

Many thanks!
Roman

==
root@carbon:~# echo | format | grep Hitachi
  1. c5t8d1  ATA-Hitachi HUA72303-A5C0-2.73TB
  2. c5t9d1  ATA-Hitachi HUA72303-A5C0-2.73TB
  3. c5t10d1 ATA-Hitachi HUA72303-A5C0-2.73TB
  4. c5t11d1 ATA-Hitachi HUA72303-A5C0-2.73TB
  5. c5t13d1 ATA-Hitachi HUA72303-A5C0-2.73TB
  6. c5t14d1 ATA-Hitachi HUA72303-A5C0-2.73TB
  7. c5t15d1 ATA-Hitachi HUA72303-A5C0-2.73TB
  9. c6t9d1  ATA-Hitachi HUA72303-A5C0-2.73TB
 10. c6t10d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 11. c6t11d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 12. c6t13d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 13. c6t14d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 14. c6t15d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 15. c7t8d1  ATA-Hitachi HUA72303-A5C0-2.73TB
 17. c7t10d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 18. c7t11d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 19. c7t12d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 20. c7t13d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 21. c7t14d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 22. c7t15d1 ATA-Hitachi HUA72303-A5C0-2.73TB

Reading with dd from all disks (dd of=/dev/null bs=1024kb if=/dev/rdsk/c7t9d1 ...):

# iostat -xznM 2
                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
  614.5    0.0  153.6    0.0  0.0  1.0    0.0    1.6   0  98 c5t8d1
  595.5    0.0  148.9    0.0  0.0  1.0    0.0    1.7   0  99 c7t8d1
 1566.5    0.0  391.6    0.0  0.0  1.0    0.0    0.6   1  96 c6t8d1  # (SSD)
  618.5    0.0  154.6    0.0  0.0  1.0    0.0    1.6   0  99 c6t9d1
  616.5    0.0  154.1    0.0  0.0  1.0    0.0    1.6   0  99 c5t9d1
 1564.0    0.0  391.0    0.0  0.0  1.0    0.0    0.6   1  96 c7t9d1  # (SSD)
  616.0    0.0  154.0    0.0  0.0  1.0    0.0    1.6   0  98 c7t10d1
  554.0    0.0  138.5    0.0  0.0  1.0    0.0    1.8   0  99 c6t10d1
  598.5    0.0  149.6    0.0  0.0  1.0    0.0    1.7   0  99 c5t10d1
  588.5    0.0  147.1    0.0  0.0  1.0    0.0    1.7   0  98 c6t11d1
  590.5    0.0  147.6    0.0  0.0  1.0    0.0    1.7   0  98 c7t11d1
  591.5    0.0  147.9    0.0  0.0  1.0    0.0    1.7   0  99 c5t11d1
  600.5    0.0  150.1    0.0  0.0  1.0    0.0    1.6   0  98 c6t13d1
  617.5    0.0  154.4    0.0  0.0  1.0    0.0    1.6   0  99 c7t12d1
  611.0    0.0  152.8    0.0  0.0  1.0    0.0    1.6   0  99 c5t13d1
  625.0    0.0  156.3    0.0  0.0  1.0    0.0    1.6   0  99 c6t14d1
  592.5    0.0  148.1    0.0  0.0  1.0    0.0    1.7   0  99 c7t13d1
  596.0    0.0  149.0    0.0  0.0  1.0    0.0    1.7   0  99 c5t14d1
  598.5    0.0  149.6    0.0  0.0  1.0    0.0    1.6   0  98 c6t15d1
  618.5    0.0  154.6    0.0  0.0  1.0    0.0    1.6   0  98 c7t14d1
  606.5    0.0  151.6    0.0  0.0  1.0    0.0    1.6   0  98 c5t15d1
  625.0    0.0  156.3    0.0  0.0  1.0    0.0    1.6   0  98 c7t15d1
                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
  620.5    0.0  155.1    0.0  0.0  1.0    0.0    1.6   0  99 c5t8d1
  620.5    0.0  155.1    0.0  0.0  1.0    0.0    1.6   0  99 c7t8d1
 1581.0    0.0  395.2    0.0  0.0  1.0    0.0    0.6   1  96 c6t8d1
  611.5    0.0  152.9    0.0  0.0  1.0    0.0    1.6   0  99 c6t9d1
  587.5    0.0  146.9    0.0  0.0  1.0    0.0    1.7   0  99 c5t9d1
 1580.0    0.0  395.0    0.0  0.0  1.0    0.0    0.6   1  97 c7t9d1
  593.0    0.0  148.2    0.0  0.0  1.0    0.0    1.7   0  99 c7t10d1
  616.0    0.0  154.0    0.0  0.0  1.0    0.0    1.6   0  99 c6t10d1
  601.0    0.0  150.2    0.0  0.0  1.0    0.0    1.6   0  99 c5t10d1
  587.0    0.0  146.7    0.0  0.0  1.0    0.0    1.7   0  99 c6t11d1
  578.5    0.0  144.6    0.0  0.0  1.0    0.0    1.7   0  99 c7t11d1
  624.5    0.0  156.1    0.0  0.0  1.0    0.0    1.6   0  99 c5t11d1
  604.5    0.0  151.1    0.0  0.0  1.0    0.0    1.6   0  99 c6t13d1
  573.5    0.0  143.4    0.0  0.0  1.0    0.0    1.7   0  99 c7t12d1
  609.0    0.0  152.2    0.0  0.0  1.0    0.0    1.6   0  99 c5t13d1
  630.5    0.0  157.6    0.0  0.0  1.0    0.0    1.6   0  99 c6t14d1
  618.5    0.0 ...
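For reference, the raw-disk comparison boils down to the sketch below (the device list and sizes are illustrative; running the same thing in the write direction against /dev/rdsk devices destroys data, so only do that on disks that hold nothing yet):

# sequential read from several raw disks at once; watch iostat -xznM 2 in another terminal
for d in c5t8d1 c5t9d1 c6t9d1 c7t8d1; do      # extend the list to all 20 drives
    dd if=/dev/rdsk/$d of=/dev/null bs=1024k count=1024 &
done
wait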
Re: [zfs-discuss] ZFS performance on LSI 9240-8i?
/pci1000,9240@0 1 imraid_sas
/pci@0,0/pci15ad,7a0@16/pci1000,9240@0/sd@8,1 4 sd
/pci@0,0/pci15ad,7a0@16/pci1000,9240@0/sd@9,1 5 sd
/pci@0,0/pci15ad,7a0@16/pci1000,9240@0/sd@a,1 9 sd
/pci@0,0/pci15ad,7a0@16/pci1000,9240@0/sd@b,1 11 sd
/pci@0,0/pci15ad,7a0@16/pci1000,9240@0/sd@d,1 14 sd
/pci@0,0/pci15ad,7a0@16/pci1000,9240@0/sd@e,1 17 sd
/pci@0,0/pci15ad,7a0@16/pci1000,9240@0/sd@f,1 20 sd
/pci@0,0/pci15ad,7a0@17/pci1000,9240@0 2 imraid_sas
/pci@0,0/pci15ad,7a0@17/pci1000,9240@0/sd@8,1 3 sd
/pci@0,0/pci15ad,7a0@17/pci1000,9240@0/sd@9,1 7 sd
/pci@0,0/pci15ad,7a0@17/pci1000,9240@0/sd@a,1 8 sd
/pci@0,0/pci15ad,7a0@17/pci1000,9240@0/sd@b,1 12 sd
/pci@0,0/pci15ad,7a0@17/pci1000,9240@0/sd@c,1 15 sd
/pci@0,0/pci15ad,7a0@17/pci1000,9240@0/sd@d,1 18 sd
/pci@0,0/pci15ad,7a0@17/pci1000,9240@0/sd@e,1 21 sd
/pci@0,0/pci15ad,7a0@17/pci1000,9240@0/sd@f,1 23 sd

root@carbon:~# grep imraid /etc/driver_aliases
imraid_sas pciex1000,73 #

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422
Re: [zfs-discuss] ZFS performance on LSI 9240-8i?
hi

s11 comes with its own driver for some LSI SAS HCAs, but on the HCL I only see:

LSI SAS 9200-8e
http://www.oracle.com/webfolder/technetwork/hcl/data/components/details/lsi_logic/sol_11_11_11/9409.html
LSI MegaRAID SAS 9260-8i
http://www.oracle.com/webfolder/technetwork/hcl/data/components/details/lsi/sol_10_10_09/3264.html
LSI 6Gb SAS2008 daughtercard
http://www.oracle.com/webfolder/technetwork/hcl/data/components/details/lsi/sol_10_10_09/3263.html

regards

On 5/4/2012 8:25 AM, Roman Matiyenko wrote:
Hi all, I have a bad, bad problem with our brand new server! [...]
Re: [zfs-discuss] ZFS performance on LSI 9240-8i?
Roman,

If I were you, I would not use the 9240-8I. I would use the 9211-8I as a pure HBA with IT FW for ZFS.

Rocky

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Richard Elling
Sent: Friday, May 04, 2012 8:00 AM
To: Roman Matiyenko
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] ZFS performance on LSI 9240-8i?

On May 4, 2012, at 5:25 AM, Roman Matiyenko wrote:
Hi all, I have a bad, bad problem with our brand new server! The lengthy details are below but to cut the story short, on the same hardware (3 x LSI 9240-8i, 20 x 3TB 6Gb HDDs) I am getting ZFS sequential writes of 1.4GB/s on Solaris 10 (20 disks, 10 mirrors) and only 200-240MB/s on the latest Solaris 11.11 (same zpool config). [...] Would there be any advice in this community?

Look at one of the other distros, OpenIndiana is a good first step.
-- richard

Many thanks!
Roman
[...]
Re: [zfs-discuss] ZFS performance on LSI 9240-8i?
On Fri, 4 May 2012, Rocky Shek wrote: If I were you, I will not use 9240-8I. I will use 9211-8I as pure HBA with IT FW for ZFS. Is there IT FW for the 9240-8i? They seem to use the same SAS chipset. My next system will have 9211-8i with IT FW. Playing it safe. Good enough for Nexenta is good enough for me. Bob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance on LSI 9240-8i?
Hi,

We had several bad experiences with LSI cards (LSI 3081E, LSI SAS84016E), even with the official Solaris drivers provided by LSI. Finally we used the LSI SAS9201-16i card:
http://www.lsi.com/channel/france/products/storagecomponents/Pages/LSISAS9201-16i.aspx
This one works as expected on Nexenta and OpenIndiana.

Best regards,
Hugues

-----Original message-----
From: Roman Matiyenko rmatiye...@gmail.com
Sent: Fri 04-05-2012 14:25
Subject: [zfs-discuss] ZFS performance on LSI 9240-8i?
To: zfs-discuss@opensolaris.org
Hi all, I have a bad, bad problem with our brand new server! [...]
Re: [zfs-discuss] ZFS performance question over NFS
Hi Bob

I don't know what the request pattern from filebench looks like but it seems like your ZEUS RAM devices are not keeping up or else many requests are bypassing the ZEUS RAM devices. Note that very large synchronous writes will bypass your ZEUS RAM device and go directly to a log in the main store. Small (<= 128K) writes should directly benefit from the dedicated zil device. Find a copy of zilstat.ksh and run it while filebench is running in order to understand more about what is going on.
Bob

The pattern looks like:

   N-Bytes  N-Bytes/s  N-Max-Rate    B-Bytes  B-Bytes/s  B-Max-Rate  ops  <=4kB  4-32kB  >=32kB
   9588656    9588656     9588656   88399872   88399872    88399872   90      0       0      90
   6662280    6662280     6662280   87031808   87031808    87031808   83      0       0      83
   6366728    6366728     6366728   72790016   72790016    72790016   79      0       0      79
   6316352    6316352     6316352   83886080   83886080    83886080   80      0       0      80
   6687616    6687616     6687616   84594688   84594688    84594688   92      0       0      92
   4909048    4909048     4909048   69238784   69238784    69238784   73      0       0      73
   6605280    6605280     6605280   81924096   81924096    81924096   79      0       0      79
   6895336    6895336     6895336   81625088   81625088    81625088   85      0       0      85
   6532128    6532128     6532128   87486464   87486464    87486464   90      0       0      90
   6925136    6925136     6925136   86118400   86118400    86118400   83      0       0      83

So does it look good, bad or ugly ;)

Thomas
[zfs-discuss] ZFS performance question over NFS
Dear all.

We finally got all the parts for our new fileserver, following several recommendations we got over this list. We use:

Dell R715, 96GB RAM, dual 8-core Opterons
1 10GE Intel dual-port NIC
2 LSI 9205-8e SAS controllers
2 DataON DNS-1600 JBOD chassis
46 Seagate Constellation SAS drives
2 STEC ZEUS RAM

The base zpool config utilizes 42 drives plus the STECs as mirrored log devices. The Seagates are set up as a stripe of 7 six-drive RAIDZ2 chunks plus, as said, a dedicated ZIL made of the mirrored STECs.

As a quick'n'dirty check we ran filebench with the fileserver workload. Running locally we get

statfile1         5476 ops/s    0.0 mb/s    0.6 ms/op   179 us/op-cpu
deletefile1       5476 ops/s    0.0 mb/s    1.0 ms/op   454 us/op-cpu
closefile3        5476 ops/s    0.0 mb/s    0.0 ms/op     5 us/op-cpu
readfile1         5476 ops/s  729.5 mb/s    0.2 ms/op   128 us/op-cpu
openfile2         5477 ops/s    0.0 mb/s    0.8 ms/op   204 us/op-cpu
closefile2        5477 ops/s    0.0 mb/s    0.0 ms/op     5 us/op-cpu
appendfilerand1   5477 ops/s   42.8 mb/s    0.3 ms/op   184 us/op-cpu
openfile1         5477 ops/s    0.0 mb/s    0.9 ms/op   209 us/op-cpu
closefile1        5477 ops/s    0.0 mb/s    0.0 ms/op     6 us/op-cpu
wrtfile1          5477 ops/s  688.4 mb/s    0.4 ms/op   220 us/op-cpu
createfile1       5477 ops/s    0.0 mb/s    2.7 ms/op  1068 us/op-cpu

with a single remote client (similar Dell system) using NFS

statfile1           90 ops/s    0.0 mb/s   27.6 ms/op   145 us/op-cpu
deletefile1         90 ops/s    0.0 mb/s   64.5 ms/op   401 us/op-cpu
closefile3          90 ops/s    0.0 mb/s   25.8 ms/op    40 us/op-cpu
readfile1           90 ops/s   11.4 mb/s    3.1 ms/op   363 us/op-cpu
openfile2           90 ops/s    0.0 mb/s   66.0 ms/op   263 us/op-cpu
closefile2          90 ops/s    0.0 mb/s   22.6 ms/op   124 us/op-cpu
appendfilerand1     90 ops/s    0.7 mb/s    0.5 ms/op   101 us/op-cpu
openfile1           90 ops/s    0.0 mb/s   72.6 ms/op   269 us/op-cpu
closefile1          90 ops/s    0.0 mb/s   43.6 ms/op   189 us/op-cpu
wrtfile1            90 ops/s   11.2 mb/s    0.2 ms/op   211 us/op-cpu
createfile1         90 ops/s    0.0 mb/s  226.5 ms/op   709 us/op-cpu

the same remote client with zpool sync disabled on the server

statfile1          479 ops/s    0.0 mb/s    6.2 ms/op   130 us/op-cpu
deletefile1        479 ops/s    0.0 mb/s   13.0 ms/op   351 us/op-cpu
closefile3         480 ops/s    0.0 mb/s    3.0 ms/op    37 us/op-cpu
readfile1          480 ops/s   62.7 mb/s    0.8 ms/op   174 us/op-cpu
openfile2          480 ops/s    0.0 mb/s   14.1 ms/op   235 us/op-cpu
closefile2         480 ops/s    0.0 mb/s    6.0 ms/op   123 us/op-cpu
appendfilerand1    480 ops/s    3.7 mb/s    0.2 ms/op    53 us/op-cpu
openfile1          480 ops/s    0.0 mb/s   13.7 ms/op   235 us/op-cpu
closefile1         480 ops/s    0.0 mb/s   11.1 ms/op   190 us/op-cpu
wrtfile1           480 ops/s   60.3 mb/s    0.2 ms/op   233 us/op-cpu
createfile1        480 ops/s    0.0 mb/s   35.6 ms/op   683 us/op-cpu

Disabling the ZIL is no option, but I expected a much better performance; in particular the ZEUS RAM only gets us a speed-up of about 1.8x. Is this test realistic for a typical fileserver scenario or does it require many more clients to push the limits?

Thanks
Thomas
Re: [zfs-discuss] ZFS performance question over NFS
What are the specs on the client?

On Aug 18, 2011 10:28 AM, Thomas Nau thomas@uni-ulm.de wrote:
Dear all. We finally got all the parts for our new fileserver following several recommendations we got over this list. [...]
Re: [zfs-discuss] ZFS performance question over NFS
Tim, the client is identical to the server but with no SAS drives attached. Also, right now only one 1Gbit Intel NIC is available.

Thomas

On 18.08.2011 at 17:49, Tim Cook t...@cook.ms wrote:
What are the specs on the client?
On Aug 18, 2011 10:28 AM, Thomas Nau thomas@uni-ulm.de wrote:
Dear all. We finally got all the parts for our new fileserver following several recommendations we got over this list. [...]
Re: [zfs-discuss] ZFS performance question over NFS
On Thu, 18 Aug 2011, Thomas Nau wrote:
Tim, the client is identical to the server but with no SAS drives attached. Also, right now only one 1Gbit Intel NIC is available.

I don't know what the request pattern from filebench looks like, but it seems like your ZEUS RAM devices are not keeping up, or else many requests are bypassing the ZEUS RAM devices. Note that very large synchronous writes will bypass your ZEUS RAM device and go directly to a log in the main store. Small (<= 128K) writes should directly benefit from the dedicated zil device.

Find a copy of zilstat.ksh and run it while filebench is running in order to understand more about what is going on.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
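As a concrete starting point (a sketch; double-check the options against your copy of the script, and the dataset name is hypothetical), zilstat takes an interval like iostat, and the logbias property controls whether a dataset prefers the dedicated log device:

# sample ZIL activity every 10 seconds while filebench runs
./zilstat.ksh 10

# latency (the default) favours the slog for small sync writes; throughput bypasses it
zfs get logbias tank/nfs_share
zfs set logbias=latency tank/nfs_share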
Re: [zfs-discuss] ZFS performance falls off a cliff
sirket, could you please share your OS, zfs, and zpool versions? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance falls off a cliff
~# uname -a
SunOS nas01a 5.11 oi_147 i86pc i386 i86pc Solaris

~# zfs get version pool0
NAME   PROPERTY  VALUE    SOURCE
pool0  version   5        -

~# zpool get version pool0
NAME   PROPERTY  VALUE    SOURCE
pool0  version   28       default
Re: [zfs-discuss] ZFS Performance
On 2/25/2011 4:15 PM, Torrey McMahon wrote:
On 2/25/2011 3:49 PM, Tomas Ögren wrote:
On 25 February, 2011 - David Blasingame Oracle sent me these 2,6K bytes:
Hi All, In reading the ZFS Best practices, I'm curious if this statement is still true about 80% utilization.

It happens at about 90% for me.. all of a sudden, the mail server got butt slow.. killed an old snapshot to get to 85% free or so, then it got snappy again. S10u9 sparc.

Some of the recent updates have pushed the 80% watermark closer to 90% for most workloads.

Sorry folks. I was thinking of yet another change that was in the allocation algorithms. 80% is the number to stick with. ... now where did I put my cold medicine? :)
Re: [zfs-discuss] ZFS Performance
On Sun, Feb 27, 2011 at 7:35 PM, Brandon High bh...@freaks.com wrote: It moves from best fit to any fit at a certain point, which is at ~ 95% (I think). Best fit looks for a large contiguous space to avoid fragmentation while any fit looks for any free space. I got the terminology wrong, it's first-fit when there is space, moving to best-fit at 96% full. See http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/metaslab.c for details. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
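For anyone curious how close a pool is to that switchover, a quick look (a sketch; the tunable name comes from the metaslab.c source linked above, and it is not something you normally need to change):

# how full is the pool overall? (CAP column shows percent used; "tank" is a placeholder)
zpool list tank

# the first-fit/best-fit switch is driven by metaslab_df_free_pct (4 => switch once a
# metaslab drops below 4% free, i.e. ~96% full); read-only peek with mdb:
echo metaslab_df_free_pct/D | mdb -k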
Re: [zfs-discuss] ZFS Performance
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of David Blasingame Oracle
Keep pool space under 80% utilization to maintain pool performance.

For what it's worth, the same is true for any other filesystem too. What really matters is the availability of suitably large unused sections of the hard drive. The larger the total space in your storage, the higher the used percentage can be while still leaving enough unused space to perform reasonably well. The more sequential your IO operations are, the less fragmentation you'll experience, and the less of a problem there will be. If your workload is highly random, with a mixture of large and small operations, with lots of snapshots being created and destroyed all the time, then you'll be fragmenting the drive quite a lot and will experience this more. The 80% or 90% thing is just a rule of thumb.

But you positively DON'T want to hit 100% full. I've had this happen and been required to power cycle and remove things in single user mode in order to bring it back up. It's not as if 100% full is certain to cause a problem... I can look up details if someone wants to know... There is a specific condition that only occurs sometimes when 100% full, which essentially makes the system unusable.

But there is one specific thing, isn't there? Where ZFS will choose to use a different algorithm for something, when pool usage exceeds some threshold. Right? What is that?
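On the "never hit 100%" point, one simple guard (a sketch; the size and pool/dataset names are arbitrary) is to park a reservation on an empty dataset so user data can never fill the pool completely:

# keep a slice of the pool permanently unallocatable by normal datasets
zfs create tank/headroom
zfs set reservation=50G tank/headroom
# if the pool ever fills up, shrink or drop the reservation for emergency space
zfs set reservation=none tank/headroom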
Re: [zfs-discuss] ZFS Performance
In reading the ZFS Best practices, I'm curious if this statement is still true about 80% utilization.

It is, and in my experience it doesn't matter much if you have a full pool and add another VDEV; the existing VDEVs will still be full, and performance will be slow. For this reason, new systems are set up with more, smaller drives to help upgrade later by replacing the drives with larger ones. Hopefully we might see block pointer rewrite some time in the future to help rebalance pools.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
Re: [zfs-discuss] ZFS Performance
On Sun, Feb 27, 2011 at 6:59 AM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote: But there is one specific thing, isn't there? Where ZFS will choose to use a different algorithm for something, when pool usage exceeds some threshold. Right? What is that? It moves from best fit to any fit at a certain point, which is at ~ 95% (I think). Best fit looks for a large contiguous space to avoid fragmentation while any fit looks for any free space. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Performance
On 27/02/11 9:59 AM, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of David Blasingame Oracle Keep pool space under 80% utilization to maintain pool performance. For what it's worth, the same is true for any other filesystem too. I would expect COW puts more pressure on near-full behaviour compared to write-in-place filesystems. If that's not true, somebody correct me. --Toby What really matters is the availability of suitably large sized unused sections of the hard drive. ... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Performance
On Mon, Feb 28 at 0:30, Toby Thain wrote: I would expect COW puts more pressure on near-full behaviour compared to write-in-place filesystems. If that's not true, somebody correct me. Off the top of my head, I think it'd depend on the workload. Write-in-place will always be faster with large IOs than with smaller IOs, and write-in-place will always be faster than CoW with large enough IO because there's no overhead for choosing where the write goes (and with large enough IO, seek overhead ~= 0) With CoW, it probably matters more what the previous version of the LBAs you're overwriting looked like, plus how fragmented the free space is. Into a device with plenty of free space, small writes should be significantly faster than write-in-place. --eric -- Eric D. Mudama edmud...@bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS Performance
Hi All,

In reading the ZFS Best practices, I'm curious if this statement is still true about 80% utilization.

from: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide (Storage Pool Performance Considerations):

Keep pool space under 80% utilization to maintain pool performance. Currently, pool performance can degrade when a pool is very full and file systems are updated frequently, such as on a busy mail server. Full pools might cause a performance penalty, but no other issues.

Dave
Re: [zfs-discuss] ZFS Performance
Hi Dave, Still true. Thanks, Cindy On 02/25/11 13:34, David Blasingame Oracle wrote: Hi All, In reading the ZFS Best practices, I'm curious if this statement is still true about 80% utilization. from : http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide http://www.solarisinternals.com/wiki/index.php?title=ZFS_Best_Practices_Guideaction=editsection=12Storage Pool Performance Considerations . Keep pool space under 80% utilization to maintain pool performance. Currently, pool performance can degrade when a pool is very full and file systems are updated frequently, such as on a busy mail server. Full pools might cause a performance penalty, but no other issues. Dave ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Performance
On 25 February, 2011 - David Blasingame Oracle sent me these 2,6K bytes: Hi All, In reading the ZFS Best practices, I'm curious if this statement is still true about 80% utilization. It happens at about 90% for me.. all of a sudden, the mail server got butt slow.. killed an old snapshot to get to 85% free or so, then it got snappy again. S10u9 sparc. from : http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide http://www.solarisinternals.com/wiki/index.php?title=ZFS_Best_Practices_Guideaction=editsection=12Storage Pool Performance Considerations . Keep pool space under 80% utilization to maintain pool performance. Currently, pool performance can degrade when a pool is very full and file systems are updated frequently, such as on a busy mail server. Full pools might cause a performance penalty, but no other issues. Dave ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Performance
On 2/25/2011 3:49 PM, Tomas Ögren wrote: On 25 February, 2011 - David Blasingame Oracle sent me these 2,6K bytes: Hi All, In reading the ZFS Best practices, I'm curious if this statement is still true about 80% utilization. It happens at about 90% for me.. all of a sudden, the mail server got butt slow.. killed an old snapshot to get to 85% free or so, then it got snappy again. S10u9 sparc. Some of the recent updates have pushed the 80% watermark closer to 90% for most workloads. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS performance questions
I have an OpenSolaris (technically OI 147) box running ZFS with Comstar (zpool version 28, zfs version 5). The box is a 2950 with 32 GB of RAM and a Dell SAS5/e card connected to 6 Promise vTrak J610sD (dual controller SAS) disk shelves spread across both channels of the card (2 chains of 3 shelves).

We currently have:

4 x OCZ Vertex 2 SSDs configured as a ZIL (we've been experimenting without a dedicated ZIL, with 2 mirrors, and with 4 individual drives - these are not meant to be a permanent part of the array; they were installed to evaluate limited SSD benefits)
2 x 300GB 15k RPM hot spare drives - one on each channel
2 x 600GB 15k RPM hot spare drives - one on each channel
52 x 300GB 15k RPM disks configured as 4-disk RAIDZ (13 vdevs)
20 x 600GB 15k RPM disks configured as 4-disk RAIDZ (5 vdevs)
(Eventually there will be 16 more 600GB disks - 4 more vdevs - for a total of 22 vdevs.)

Most of our disk access is through COMSTAR via iSCSI. That said, even performance tests direct to the local disks reveal good, but not great, performance. Most of our sequential write performance tests show about 200 MB/sec to the storage, which seems pretty low given the disks and their individual performance.

I'd love to have configured the disks as mirrors, but I needed a minimum of 20 TB in the space provided and I could not achieve that when using mirrors.

Can anyone provide a link to good performance analysis resources so I can try to track down where my limited write performance is coming from?
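As a starting point for that analysis (standard tools only; the pool and file names below are placeholders), it helps to compare a local sequential write against the iSCSI path while watching per-vdev activity, so the bottleneck can be pinned to either the pool side or the COMSTAR/network side:

# local sequential write into the pool, bypassing COMSTAR/iSCSI entirely
dd if=/dev/zero of=/tank/ddtest.dat bs=1024k count=8192

# per-vdev breakdown while the test runs; one slow vdev or shelf shows up as uneven columns
zpool iostat -v tank 5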
[zfs-discuss] ZFS performance Tuning
Hi,

I am working with ZFS these days and I am facing some performance issues reported by the application team: they say writes are very slow on ZFS compared to UFS. Kindly send me some good references or book links. I will be very thankful to you.

BR,
Tayyab
Re: [zfs-discuss] ZFS performance Tuning
On Aug 4, 2010, at 3:22 AM, TAYYAB REHMAN wrote: Hi, i am working with ZFS now a days, i am facing some performance issues from application team, as they said writes are very slow in ZFS w.r.t UFS. Kindly send me some good reference or books links. i will be very thankful to you. Hi Tayyab, Please start with the ZFS Best Practices Guide. http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide -- Richard Elling rich...@nexenta.com +1-760-896-4422 Enterprise class storage for everyone www.nexenta.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs performance issue
Hi,

I just installed OpenSolaris on my Dell Optiplex 755 and created a raidz2 with a few slices on a single disk. I was expecting good read/write performance but I got a speed of 12-15MBps. How can I enhance the read/write performance of my raid?

Thanks,
Abhi.
Re: [zfs-discuss] zfs performance issue
Abhishek Gupta wrote:
Hi, I just installed OpenSolaris on my Dell Optiplex 755 and created a raidz2 with a few slices on a single disk. I was expecting good read/write performance but I got a speed of 12-15MBps. How can I enhance the read/write performance of my raid? Thanks, Abhi.

You absolutely DON'T want to do what you've done. Creating a ZFS pool (or, for that matter, any RAID device, whether hardware or software) out of slices/partitions of a single disk is a recipe for horrible performance. In essence, you reduce your performance to 1/N (or worse) of the whole disk, where N is the number of slices you created.

So, create your zpool using disks or partitions from different disks. It's OK to have more than one partition on a disk - just use them in different pools for reasonable performance.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
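For comparison, a pool built the intended way puts whole, independent disks in each vdev (the device names below are placeholders):

# redundancy and striping only help when each member is a separate physical disk
zpool create tank raidz2 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0
zpool status tank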
Re: [zfs-discuss] ZFS Performance on SATA Device
On 18/03/10 08:36 PM, Kashif Mumtaz wrote:
Hi, I did another test on both machines. And write performance on ZFS is extraordinarily slow.

Which build are you running? On snv_134, 2x dual-core CPUs @ 3GHz and 8Gb RAM (my desktop), I see these results:

$ time dd if=/dev/zero of=test.dbf bs=8k count=1048576
1048576+0 records in
1048576+0 records out
real    0m28.224s
user    0m0.490s
sys     0m19.061s

This is a dataset on a straight mirrored pool, using two SATA2 drives (320Gb Seagate).

On my Ultra24 with two mirrored 1Tb WD drives, 8GB memory and snv_125 I only get:

rich: ptime dd if=/dev/zero of=test.dbf bs=8k count=1048576
1048576+0 records in
1048576+0 records out
real     1:44.352133699
user        0.444280089
sys        13.526079085

rich: uname -a
SunOS ultra24 5.11 snv_125 i86pc i386 i86pc

rich: zpool status tank
  pool: tank
 state: ONLINE
status: The pool is formatted using an older on-disk format. The pool can still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'. Once this is done, the pool will no longer be accessible on older software versions.
 scrub: scrub completed after 0h30m with 0 errors on Mon Apr 19 02:36:08 2010
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0

errors: No known data errors

rich: ipstat -En c1t3d0
ipstat: Command not found.
rich: iostat -En c1t3d0
c1t3d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA Product: WDC WD1001FALS-0 Revision: 0K05 Serial No:
Size: 1000.20GB <1000204886016 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4264 Predictive Failure Analysis: 0

rich: psrinfo -v
Status of virtual processor 0 as of: 04/19/2010 14:23:42
  on-line since 12/16/2009 21:56:59.
  The i386 processor operates at 3000 MHz, and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 04/19/2010 14:23:42
  on-line since 12/16/2009 21:57:03.
  The i386 processor operates at 3000 MHz, and has an i387 compatible floating point processor.
Status of virtual processor 2 as of: 04/19/2010 14:23:42
  on-line since 12/16/2009 21:57:03.
  The i386 processor operates at 3000 MHz, and has an i387 compatible floating point processor.
Status of virtual processor 3 as of: 04/19/2010 14:23:42
  on-line since 12/16/2009 21:57:03.
  The i386 processor operates at 3000 MHz, and has an i387 compatible floating point processor.

Why are my drives so slow?

$ time dd if=test.dbf bs=8k of=/dev/null
1048576+0 records in
1048576+0 records out
real    0m5.749s
user    0m0.458s
sys     0m5.260s

James C. McPherson
--
Senior Software Engineer, Solaris
Sun Microsystems
http://www.jmcp.homeunix.com/blog
Re: [zfs-discuss] ZFS Performance on SATA Drive
hi, Thanks for all the replies. I have found the real culprit. The hard disk was faulty. I changed the hard disk, and now ZFS performance is much better. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Performance on SATA Drive
Hi, I did another test on both machines, and write performance on ZFS is extraordinarily slow. I did the following test on both machines.
For write: time dd if=/dev/zero of=test.dbf bs=8k count=1048576
For read: time dd if=/testpool/test.dbf of=/dev/null bs=8k
ZFS machine has 32GB memory, UFS machine has 16GB memory.
UFS machine test ###
time dd if=/dev/zero of=test.dbf bs=8k count=1048576
1048576+0 records in
1048576+0 records out
real 2m18.352s user 0m5.080s sys 1m44.388s
# iostat -xnmpz 10
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.6 107.9 4.8 62668.4 0.0 6.7 0.1 61.9 1 83 c0t0d0
0.0 0.2 0.0 0.2 0.0 0.0 0.0 0.8 0 0 c0t0d0s5
0.6 107.7 4.8 62668.2 0.0 6.7 0.1 62.0 1 83 c0t0d0s7
For read
# time dd if=test.dbf of=/dev/null bs=8k
1048576+0 records in
1048576+0 records out
real 1m21.285s user 0m4.701s sys 1m15.322s
For write it took 2 minutes 18 seconds and for read it took 1 minute 21 seconds.
## ZFS machine test ##
# time dd if=/dev/zero of=test.dbf bs=8k count=1048576
1048576+0 records in
1048576+0 records out
real 140m33.590s user 0m5.182s sys 2m33.025s
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 8.2 0.0 1037.0 0.0 33.3 0.0 4062.3 0 100 c0t0d0
0.0 8.2 0.0 1037.0 0.0 33.3 0.0 4062.3 0 100 c0t0d0s0
-
For read
# time dd if=test.dbf of=/dev/null bs=8k
1048576+0 records in
1048576+0 records out
real 0m59.177s user 0m4.471s sys 0m54.723s
For write it took 140 minutes and for read 59 seconds (less than UFS).
-
In ZFS, data was being written at around 1037 kw/s while the disk remained 100% busy. In UFS, data was being written at around 62668 kw/s while the disk was 83% busy.
Kindly help me: how can I tune the write performance on ZFS?
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Performance on SATA Drive
On 18/03/10 08:36 PM, Kashif Mumtaz wrote: Hi, I did another test on both machine. And write performance on ZFS extraordinary slow.

Which build are you running? On snv_134, 2x dual-core cpus @ 3GHz and 8Gb ram (my desktop), I see these results:
$ time dd if=/dev/zero of=test.dbf bs=8k count=1048576
1048576+0 records in
1048576+0 records out
real 0m28.224s user 0m0.490s sys 0m19.061s
This is a dataset on a straight mirrored pool, using two SATA2 drives (320Gb Seagate).
$ time dd if=test.dbf bs=8k of=/dev/null
1048576+0 records in
1048576+0 records out
real 0m5.749s user 0m0.458s sys 0m5.260s
James C. McPherson -- Senior Software Engineer, Solaris Sun Microsystems http://www.jmcp.homeunix.com/blog
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Performance on SATA Drive
Hi, Thanks for your reply. BOTH are Sun SPARC T1000 machines, each with a 1 TB SATA hard disk.
ZFS system: Memory 32 GB, Processor 1GHz 6 core, OS Solaris 10 10/09 s10s_u8wos_08a SPARC, Patch Cluster level 142900-02 (Dec 09)
UFS machine: Hard disk 1 TB SATA, Memory 16 GB, Processor 1GHz 6 core, Solaris 10 8/07 s10s_u4wos_12b SPARC
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Performance on SATA Drive
On Thu, Mar 18, 2010 at 03:36:22AM -0700, Kashif Mumtaz wrote: I did another test on both machine. And write performance on ZFS extraordinary slow. - In ZFS data was being write around 1037 kw/s while disk remain busy 100%. That is, as you say, such an extraordinarily slow number that we have to start at the very basics and eliminate fundamental problems. I have seen disks go bad in a way that they simply become very very slow. You need to be sure that this isn't your problem. Or perhaps there's some hardware issue when the disks are used in parallel? Check all the cables and connectors. Check logs for any errors. Do you have the opportunity to try testing write speed with dd to the raw disks? If the pool is mirrored, can you detach one side at a time? Test the detached disk with dd, and the pool with the other disk, one at a time and then concurrently. One slow disk will slow down the mirror (but I don't recall seeing such an imbalance in your iostat output either). Do you have some spare disks to try other tests with? Try a ZFS install on those, and see they also have the problem. Try a UFS install on the current disks, and see if they still have the problem. Can you swap the disks between the T1000s and see if the problem stays with the disks or the chassis? You have a gremlin to hunt... -- Dan. pgprooWSK6vzu.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
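A rough sketch of the detach-and-test procedure Dan describes, with assumed pool and device names; the dd read test is non-destructive, whereas a raw write test would destroy the detached disk's contents:

zpool status testpool                  # note the two devices that make up the mirror
zpool detach testpool c1t1d0           # temporarily drop one side
ptime dd if=/dev/rdsk/c1t1d0s0 of=/dev/null bs=64k count=16384   # raw read speed of the detached disk
zpool attach testpool c1t0d0 c1t1d0    # reattach; ZFS resilvers the disk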
Re: [zfs-discuss] ZFS Performance on SATA Drive
On 18.03.2010 21:31, Daniel Carosone wrote: You have a gremlin to hunt... Wouldn't Sun help here? ;) (sorry couldn't help myself, I've spent a week hunting gremlins until I hit the brick wall of the MPT problem) //Svein
-- Svein Skogen
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Performance on SATA Drive
On 18/03/10 10:05 PM, Kashif Mumtaz wrote: Hi, Thanks for your reply BOTH are Sun Sparc T1000 machines. Hard disk 1 TB sata on both ZFS system Memory32 GB , Processor 1GH 6 core os Solaris 10 10/09 s10s_u8wos_08a SPARC PatchCluster level 142900-02(Dec 09 ) UFS machine Hard disk 1 TB sata Memory 16 GB Processor Processor 1GH 6 core Solaris 10 8/07 s10s_u4wos_12b SPARC Since you are seeing this on a Solaris 10 update release, you should log a call with your support provider to get this investigated. James C. McPherson -- Senior Software Engineer, Solaris Sun Microsystems http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Performance on SATA Drive
James C. McPherson wrote: On 18/03/10 10:05 PM, Kashif Mumtaz wrote: Hi, Thanks for your reply BOTH are Sun Sparc T1000 machines. Hard disk 1 TB sata on both ZFS system Memory32 GB , Processor 1GH 6 core os Solaris 10 10/09 s10s_u8wos_08a SPARC PatchCluster level 142900-02(Dec 09 ) UFS machine Hard disk 1 TB sata Memory 16 GB Processor Processor 1GH 6 core Solaris 10 8/07 s10s_u4wos_12b SPARC
Since you are seeing this on a Solaris 10 update release, you should log a call with your support provider to get this investigated. James C. McPherson -- Senior Software Engineer, Solaris Sun Microsystems http://www.jmcp.homeunix.com/blog

I would generally agree with James, with the caveat that you could try to update to something a bit later than Update 4. That's pretty early on in the ZFS deployment in Solaris 10. At the minimum, grab the latest Recommended Patch set and apply that, then see what your issues are.
-- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800)
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Performance on SATA Drive
Erik Trimble wrote: James C. McPherson wrote: On 18/03/10 10:05 PM, Kashif Mumtaz wrote: Hi, Thanks for your reply BOTH are Sun Sparc T1000 machines. Hard disk 1 TB sata on both ZFS system Memory32 GB , Processor 1GH 6 core os Solaris 10 10/09 s10s_u8wos_08a SPARC PatchCluster level 142900-02(Dec 09 ) UFS machine Hard disk 1 TB sata Memory 16 GB Processor Processor 1GH 6 core Solaris 10 8/07 s10s_u4wos_12b SPARC Since you are seeing this on a Solaris 10 update release, you should log a call with your support provider to get this investigated. James C. McPherson -- Senior Software Engineer, Solaris Sun Microsystems http://www.jmcp.homeunix.com/blog I would generally agree with James, with the caveaut that you could try to update to something a bit latter than Update 4. That's pretty early-on in the ZFS deployment in Solaris 10. At the minimum, grab the latest Recommended Patch set and apply that, then see what your issues are. Oh, nevermind. I'm an idiot. I was looking at the UFS machine. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS Performance on SATA Drive
hi, I'm using Sun T1000 machines. One machine is installed with Solaris 10 with UFS and the other system with the ZFS file system; the ZFS machine is performing slowly. Running the following commands on both systems shows the disk gets busy immediately to 100%.
ZFS MACHINE
find / > /dev/null 2>&1
iostat -xnmpz 5
[r...@zfs-serv ktahir]# iostat -xnmpz 5
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.4 0.2 12.3 2.2 0.0 0.0 6.5 3.9 0 0 c0d0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.9 0 0 192.168.150.131:/export/home2
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
86.4 0.0 5527.4 0.0 0.0 1.0 0.0 11.2 0 97 c0d0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
87.4 0.0 5593.7 0.0 0.0 1.0 0.0 11.1 0 96 c0d0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
85.2 0.0 5452.8 0.0 0.0 1.0 0.0 11.3 0 96 c0d0
But on the UFS file system the average busy is 50%. Any idea why ZFS makes the disk more busy?
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Performance on SATA Drive
On Wed, 17 Mar 2010, Kashif Mumtaz wrote: but on UFS file system averge busy is 50% , any idea why ZFS makes disk more busy ?

Clearly there are many more reads per second occurring on the zfs filesystem than the ufs filesystem. Assuming that the application-level requests are really the same, this suggests that the system does not have enough RAM installed in order to cache the working set. Another issue could be filesystem block size since zfs defaults the block size to 128K but some applications (e.g. database) work better with 4K, 8K, or 16K block size. Regardless, I suggest measuring the statistics with a 30 second interval rather than 5 seconds since zfs is assured to do whatever it does within 30 seconds. Bob
-- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
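For example, something along these lines; the dataset name and the 8K size are assumptions, and recordsize only affects newly written files:

zfs set recordsize=8k testpool/data    # match an application doing small fixed-size I/O
iostat -xnmpz 30                       # 30-second samples average out ZFS's bursty txg writes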
Re: [zfs-discuss] ZFS Performance on SATA Drive
On Wed, Mar 17, 2010 at 10:15:53AM -0500, Bob Friesenhahn wrote: Clearly there are many more reads per second occuring on the zfs filesystem than the ufs filesystem.

yes

Assuming that the application-level requests are really the same

From the OP, the workload is a find /. So, ZFS makes the disks busier.. but is it find'ing faster as a result, or doing more reads per found file? The ZFS io pipeline will be able to use the cpu concurrency of the T1000 better than UFS, even for a single-threaded find, and may just be issuing IO faster. Count the number of lines printed and divide by the time taken to compare whether the extra work being done is producing extra output or not. However, it might also be worthwhile to look for a better / more representative benchmark and compare further using that. Also, to be clear, could you clarify whether the problem you see is that the numbers in iostat are larger, or that find runs slower, or that other processes are more impacted by find?

this suggests that the system does not have enough RAM installed in order to cache the working set.

Possibly, yes.

Another issue could be fileystem block size since zfs defaults the block size to 128K but some applications (e.g. database) work better with 4K, 8K, or 16K block size.

Unlikely to be relevant to fs metadata for find.

Regardless, I suggest measuring the statistics with a 30 second interval rather than 5 seconds since zfs is assured to do whatever it does within 30 seconds.

Relevant for write benchmarks more so than read.
-- Dan.
pgpXx2JnRah30.pgp Description: PGP signature
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
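A quick sketch of that comparison, using an arbitrary output path:

ptime find / > /var/tmp/find.out       # note the "real" time reported on stderr
wc -l /var/tmp/find.out                # files found; divide the count by the elapsed time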
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
ZFS has intelligent prefetching. AFAIK, Solaris disk drivers do not prefetch. Can you point me to any reference? I didn't find anything stating yay or nay, for either of these. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
Doesn't this mean that if you enable write back, and you have a single, non-mirrored raid-controller, and your raid controller dies on you so that you loose the contents of the nvram, you have a potentially corrupt file system? It is understood, that any single point of failure could result in failure, yes. If you have a CPU that performs miscalculations, makes mistakes, it can instruct bad things to be written to disk (I've had something like that happen before.) If you have RAM with bit errors in it that go undetected, you can have corrupted memory, and if that memory is destined to write to disk, you'll have bad data written to disk. If you have a non-redundant raid controller, which buffers writes, and the buffer gets destroyed or corrupted before the writes are put to disk, then the data has become corrupt. Heck, the same is true even with redundant raid controllers, if there are memory errors in one that go undetected. So you'll have to do your own calculation. Which is worse? - Don't get the benefit of accelerated hardware, for all the time that the hardware is functioning correctly, Or - Take the risk of acceleration, with possibility the accelerator could fail and cause harm to the data it was working on. I know I always opt for using the raid write-back. If I ever have a situation where I'm so scared of the raid card corrupting data, I would be equally scared of the CPU or SAS bus or system ram or whatever. In that case, I'd find a solution that makes entire machines redundant, rather than worrying about one little perc card. Yes it can happen. I've seen it happen. But not just to raid cards; everything else is vulnerable too. I'll take a 4x performance improvement for 99.999% of the time, and risk the corruption the rest of the time. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
One more thing I'd like to add here: The PERC cache measurably and significantly accelerates small disk writes. However, for read operations, it is insignificant compared to system ram, both in terms of size and speed. There is no significant performance improvement by enabling adaptive readahead in the PERC. I will recommend instead, the PERC should be enabled for Write Back, and have the readahead disabled. Fortunately this is the default configuration on a new perc volume, so unless you changed it, you should be fine. It may be smart to double check, and ensure your OS does adaptive readahead. In Linux (rhel/centos) you can check that the "readahead" service is loading. I noticed this is enabled by default in runlevel 5, but disabled by default in runlevel 3. Interesting. I don't know how to check solaris or opensolaris, to ensure adaptive readahead is enabled.

On 2/18/10 8:08 AM, Edward Ned Harvey sola...@nedharvey.com wrote: Ok, I've done all the tests I plan to complete. For highest performance, it seems:
· The measure I think is the most relevant for typical operation is the fastest random read / write / mix. (Thanks Bob, for suggesting I do this test.) The winner is clearly striped mirrors in ZFS
· The fastest sustained sequential write is striped mirrors via ZFS, or maybe raidz
· The fastest sustained sequential read is striped mirrors via ZFS, or maybe raidz
Here are the results:
· Results summary of Bob's method http://nedharvey.com/iozone_weezer/bobs%20method/iozone%20results%20summary.pdf
· Raw results of Bob's method http://nedharvey.com/iozone_weezer/bobs%20method/raw_results.zip
· Results summary of Ned's method http://nedharvey.com/iozone_weezer/neds%20method/iozone%20results%20summary.pdf
· Raw results of Ned's method http://nedharvey.com/iozone_weezer/neds%20method/raw_results.zip
From: Edward Ned Harvey [mailto:sola...@nedharvey.com] Sent: Saturday, February 13, 2010 9:07 AM To: opensolaris-disc...@opensolaris.org; zfs-discuss@opensolaris.org Subject: ZFS performance benchmarks in various configurations
I have a new server, with 7 disks in it. I am performing benchmarks on it before putting it into production, to substantiate claims I make, like "striping mirrors is faster than raidz" and so on. Would anybody like me to test any particular configuration? Unfortunately I don't have any SSD, so I can't do any meaningful test on the ZIL etc. Unless someone in the Boston area has a 2.5" SAS SSD they wouldn't mind lending for a few hours. ;-)
My hardware configuration: Dell PE 2970 with 8 cores. Normally 32G, but I pulled it all out to get it down to 4G of ram. (Easier to benchmark disks when the file operations aren't all cached.) ;-) Solaris 10 10/09. PERC 6/i controller. All disks are configured in PERC for Adaptive ReadAhead, and Write Back, JBOD. 7 disks present, each SAS 15krpm 160G. OS is occupying 1 disk, so I have 6 disks to play with.
I am currently running the following tests: Will test, including the time to flush(), various record sizes inside file sizes up to 16G, sequential write and sequential read. Not doing any mixed read/write requests. Not doing any random read/write.
iozone -Reab somefile.wks -g 17G -i 1 -i 0
Configurations being tested:
· Single disk
· 2-way mirror
· 3-way mirror
· 4-way mirror
· 5-way mirror
· 6-way mirror
· Two mirrors striped (or concatenated)
· Three mirrors striped (or concatenated)
· 5-disk raidz
· 6-disk raidz
· 6-disk raidz2
Hypothesized results:
· N-way mirrors write at the same speed of a single disk
· N-way mirrors read n-times faster than a single disk
· Two mirrors striped read and write 2x faster than a single mirror
· Three mirrors striped read and write 3x faster than a single mirror
· Raidz and raidz2: No hypothesis. Some people say they perform comparable to many disks working together. Some people say it's slower than a single disk. Waiting to see the results.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
hello, i have made some benchmarks with my napp-it zfs-server.
screenshot: http://www.napp-it.org/bench.pdf
- 2gb vs 4 gb vs 8 gb ram
- mirror vs raidz vs raidz2 vs raidz3
- dedup and compress enabled vs disabled
result in short:
8gb ram vs 2 Gb: + 10% .. +500% more power (green drives)
compress and dedup enabled: + 50% .. +300%
mirror vs Raidz: fastest is raidz, slowest mirror, raidz level +/-20%
gea
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On Feb 19, 2010, at 8:35 AM, Edward Ned Harvey wrote: One more thing I’d like to add here: The PERC cache measurably and significantly accelerates small disk writes. However, for read operations, it is insignificant compared to system ram, both in terms of size and speed. There is no significant performance improvement by enabling adaptive readahead in the PERC. I will recommend instead, the PERC should be enabled for Write Back, and have the readahead disabled. Fortunately this is the default configuration on a new perc volume, so unless you changed it, you should be fine. It may be smart to double check, and ensure your OS does adaptive readahead. In Linux (rhel/centos) you can check that the “readahead” service is loading. I noticed this is enabled by default in runlevel 5, but disabled by default in runlevel 3. Interesting. I don’t know how to check solaris or opensolaris, to ensure adaptive readahead is enabled. ZFS has intelligent prefetching. AFAIK, Solaris disk drivers do not prefetch. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On 19 feb 2010, at 17.35, Edward Ned Harvey wrote: The PERC cache measurably and significantly accelerates small disk writes. However, for read operations, it is insignificant compared to system ram, both in terms of size and speed. There is no significant performance improvement by enabling adaptive readahead in the PERC. I will recommend instead, the PERC should be enabled for Write Back, and have the readahead disabled. Fortunately this is the default configuration on a new perc volume, so unless you changed it, you should be fine.

If I understand correctly, ZFS nowadays will only flush data to non-volatile storage (such as a RAID controller NVRAM), and not all the way out to disks. (To solve performance problems with some storage systems, and I believe that it also is the right thing to do under normal circumstances.) Doesn't this mean that if you enable write back, and you have a single, non-mirrored raid controller, and your raid controller dies on you so that you lose the contents of the nvram, you have a potentially corrupt file system?
/ragge
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
If I understand correctly, ZFS now adays will only flush data to non volatile storage (such as a RAID controller NVRAM), and not all the way out to disks. (To solve performance problems with some storage systems, and I believe that it also is the right thing to do under normal circumstances.) Doesn't this mean that if you enable write back, and you have a single, non-mirrored raid-controller, and your raid controller dies on you so that you loose the contents of the nvram, you have a potentially corrupt file system?

ZFS requires that all writes be flushed to non-volatile storage. This is needed both for transaction group (txg) commits to ensure pool integrity and for the ZIL to satisfy the synchronous requirement of fsync/O_DSYNC etc. If the caches weren't flushed then it would indeed be quicker, but the pool would be susceptible to corruption. Sadly, some hardware doesn't honour cache flushes and this can cause corruption. Neil.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
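For completeness: the tunable that trades this safety away does exist, but as Neil notes it is only sane when every device in the pool sits behind genuinely non-volatile cache. A hedged /etc/system sketch:

* /etc/system -- only if ALL pool devices have battery- or flash-backed caches;
* with ordinary disk write caches this risks pool corruption on power loss
set zfs:zfs_nocacheflush = 1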
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
Ok, I've done all the tests I plan to complete. For highest performance, it seems:
. The measure I think is the most relevant for typical operation is the fastest random read /write / mix. (Thanks Bob, for suggesting I do this test.) The winner is clearly striped mirrors in ZFS
. The fastest sustained sequential write is striped mirrors via ZFS, or maybe raidz
. The fastest sustained sequential read is striped mirrors via ZFS, or maybe raidz
Here are the results:
. Results summary of Bob's method http://nedharvey.com/iozone_weezer/bobs%20method/iozone%20results%20summary.pdf
. Raw results of Bob's method http://nedharvey.com/iozone_weezer/bobs%20method/raw_results.zip
. Results summary of Ned's method http://nedharvey.com/iozone_weezer/neds%20method/iozone%20results%20summary.pdf
. Raw results of Ned's method http://nedharvey.com/iozone_weezer/neds%20method/raw_results.zip
From: Edward Ned Harvey [mailto:sola...@nedharvey.com] Sent: Saturday, February 13, 2010 9:07 AM To: opensolaris-disc...@opensolaris.org; zfs-discuss@opensolaris.org Subject: ZFS performance benchmarks in various configurations
I have a new server, with 7 disks in it. I am performing benchmarks on it before putting it into production, to substantiate claims I make, like striping mirrors is faster than raidz and so on. Would anybody like me to test any particular configuration? Unfortunately I don't have any SSD, so I can't do any meaningful test on the ZIL etc. Unless someone in the Boston area has a 2.5" SAS SSD they wouldn't mind lending for a few hours. ;-)
My hardware configuration: Dell PE 2970 with 8 cores. Normally 32G, but I pulled it all out to get it down to 4G of ram. (Easier to benchmark disks when the file operations aren't all cached.) ;-) Solaris 10 10/09. PERC 6/i controller. All disks are configured in PERC for Adaptive ReadAhead, and Write Back, JBOD. 7 disks present, each SAS 15krpm 160G. OS is occupying 1 disk, so I have 6 disks to play with.
I am currently running the following tests: Will test, including the time to flush(), various record sizes inside file sizes up to 16G, sequential write and sequential read. Not doing any mixed read/write requests. Not doing any random read/write.
iozone -Reab somefile.wks -g 17G -i 1 -i 0
Configurations being tested:
. Single disk
. 2-way mirror
. 3-way mirror
. 4-way mirror
. 5-way mirror
. 6-way mirror
. Two mirrors striped (or concatenated)
. Three mirrors striped (or concatenated)
. 5-disk raidz
. 6-disk raidz
. 6-disk raidz2
Hypothesized results:
. N-way mirrors write at the same speed of a single disk
. N-way mirrors read n-times faster than a single disk
. Two mirrors striped read and write 2x faster than a single mirror
. Three mirrors striped read and write 3x faster than a single mirror
. Raidz and raidz2: No hypothesis. Some people say they perform comparable to many disks working together. Some people say it's slower than a single disk. Waiting to see the results.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On Thu, 18 Feb 2010, Edward Ned Harvey wrote: Ok, I’ve done all the tests I plan to complete. For highest performance, it seems: · The measure I think is the most relevant for typical operation is the fastest random read /write / mix. (Thanks Bob, for suggesting I do this test.) The winner is clearly striped mirrors in ZFS

A most excellent set of tests. We could use some units in the PDF file though. While it would take quite some time and effort to accomplish, we could use a similar summary for full disk resilver times in each configuration.

· The fastest sustained sequential write is striped mirrors via ZFS, or maybe raidz

Note that while these tests may be file-sequential, with 8 threads working at once, what the disks see is not necessarily sequential. However, for initial sequential write, it may be that zfs aggregates the write requests and orders them on disk in such a way that subsequent sequential reads by the same number of threads in a roughly similar order would see a performance benefit. Bob
-- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
A most excellent set of tests. We could use some units in the PDF file though. Oh, hehehe. ;-) The units are written in the raw txt files. On your tests, the units were ops/sec, and in mine, they were Kbytes/sec. If you like, you can always grab the xlsx and modify it to your tastes, and create an updated pdf. Just substitute .xlsx instead of .pdf in the previous URL's. Or just drop the filename off the URL. My web server allows indexing on that directory. Personally, I only look at the chart which is normalized against a single disk, so units are intentionally not present. While it would take quite some time and effort to accomplish, we could use a similar summary for full disk resilver times in each configuration. Actually, that's easy. Although the zpool create happens instantly, all the hardware raid configurations required an initial resilver. And they were exactly what you expect. Write 1 Gbit/s until you reach the size of the drive. I watched the progress while I did other things, and it was incredibly consistent. I am assuming, with very high confidence, that ZFS would match that performance. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
A most excellent set of tests. We could use some units in the PDF file though. Oh, by the way, you originally requested the 12G file to be used in benchmark, and later changed to 4G. But by that time, two of the tests had already completed on the 12G, and I didn't throw away those results, but I didn't include them in the summary either. If you look in the raw results, you'll see a directory called 12G, and if you compare those results against the equivalent 4G counterpart, you'll see the 12G in fact performed somewhat lower. The reason is that there are sometimes cache hits during read operations, and the write back buffer is enabled in the PERC. So the smaller the data set, the more frequently these things will accelerate you. And consequently, the 4G performance was measured higher. This doesn't affect me at all. I wanted to know qualitative results, not quantitative. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On Thu, 18 Feb 2010, Edward Ned Harvey wrote: Actually, that's easy. Although the zpool create happens instantly, all the hardware raid configurations required an initial resilver. And they were exactly what you expect. Write 1 Gbit/s until you reach the size of the drive. I watched the progress while I did other things, and it was incredibly consistent. This sounds like an initial 'silver' rather than a 'resilver'. In a 'resilver' process it is necessary to read other disks in the vdev in order to reconstruct the disk content. As a result, we now have additional seeks and reads going on, which seems considerably different than pure writes. What I am interested in is the answer to these sort of questions: o Does a mirror device resilver faster than raidz? o Does a mirror device in a triple mirror resilver faster than a two-device mirror? o Does a raidz2 with 9 disks resilver faster or slower than one with 6 disks? The answer to these questions could vary depending on how well the pool has been aged and if it has been used for a while close to 100% full. Before someone pipes up and says that measuring this is useless since results like this are posted all over the internet, I challenge that someone to find this data already published somewhere. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
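One way to collect those numbers, sketched with assumed pool and device names, is to force a full rebuild onto a spare disk and read the elapsed time out of zpool status:

zpool replace tank c1t5d0 c1t6d0   # rebuild one member onto a spare, forcing a real resilver
zpool status tank                  # shows "resilver in progress" with an estimate, and
                                   # "resilvered ... in XhYm" once it completes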
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On Thu, Feb 18, 2010 at 10:39:48PM -0600, Bob Friesenhahn wrote: This sounds like an initial 'silver' rather than a 'resilver'.

Yes, in particular it will be entirely sequential. ZFS resilver is in txg order and involves seeking.

What I am interested in is the answer to these sort of questions: o Does a mirror device resilver faster than raidz? o Does a mirror device in a triple mirror resilver faster than a two-device mirror? o Does a raidz2 with 9 disks resilver faster or slower than one with 6 disks?

and, if we're wishing for comprehensive analysis: o What is the impact on concurrent IO benchmark loads, for each of the above.

The answer to these questions could vary depending on how well the pool has been aged and if it has been used for a while close to 100% full.

Indeed, which makes it even harder to compare results from different cases and test sources. To get usable relative-to-each-other results, one needs to compare idealised test cases with repeatable loads. This is weeks of work, at least, and can be fun to speculate about up front but rapidly gets very tiresome.
-- Dan.
pgp2nipoqXa1P.pgp Description: PGP signature
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
Richard Elling wrote: ... As you can see, so much has changed, hopefully for the better, that running performance benchmarks on old software just isn't very interesting. NB. Oracle's Sun OpenStorage systems do not use Solaris 10 and if they did, they would not be competitive in the market. The notion that OpenSolaris is worthless and Solaris 10 rules is simply bull* OpenSolaris isn't worthless, but no way in hell would I run it in production, based on my experiences running it at home from b111 to now. The mpt driver problems are just one of many show stoppers (is that resolved yet, or do we still need magic /etc/system voodoo?). Of course, Solaris 10 couldn't properly drive the Marvell attached disks in an X4500 prior to U6 either, unless you ran an IDR (pretty inexcusable in a storage-centric server release). -- Carson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
Never mind. I have no interest in performance tests for Solaris 10. The code is so old, that it does not represent current ZFS at all. Whatever. Regardless of what you say, it does show: . Which is faster, raidz, or a stripe of mirrors? . How much does raidz2 hurt performance compared to raidz? . Which is faster, raidz, or hardware raid 5? . Is a mirror twice as fast as a single disk for reading? Is a 3-way mirror 3x faster? And so on? I've seen and heard many people stating answers to these questions, and my results (not yet complete) already answer these questions, and demonstrate that all the previous assertions were partial truths. It's true, I am demonstrating no interest to compare performance of ZFS 3 versus ZFS 4. If you want that, test it yourself and don't complain about my tests. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
iozone -m -t 8 -T -O -r 128k -o -s 12G Actually, it seems that this is more than sufficient: iozone -m -t 8 -T -r 128k -o -s 4G Good news, cuz I kicked off the first test earlier today, and it seems like it will run till Wednesday. ;-) The first run, on a single disk, took 6.5 hrs, and I have it configured to repeat ... 2-way mirror, 3-way mirror, 4-way mirror, 5-way mirror, raidz 5 disks, raidz 6 disks, raidz2 6 disks, stripe of 2 mirrors, stripe of 3 mirrors ... I'll go stop it, and change to 4G. Maybe it'll be done tomorrow. ;-) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
Whatever. Regardless of what you say, it does show: · Which is faster, raidz, or a stripe of mirrors? · How much does raidz2 hurt performance compared to raidz? · Which is faster, raidz, or hardware raid 5? · Is a mirror twice as fast as a single disk for reading? Is a 3-way mirror 3x faster? And so on? I’ve seen and heard many people stating answers to these questions, and my results (not yet complete) already answer these questions, and demonstrate that all the previous assertions were partial truths.

I don't think he was complaining, I think he was saying he didn't need you to run iosnoop on the old version of ZFS. Solaris 10 has a really old version of ZFS. I know there are some pretty big differences in zfs versions from my own non-scientific benchmarks. It would make sense that people wouldn't be as interested in benchmarks of Solaris 10 ZFS seeing as there are literally hundreds scattered around the internet. I don't think he was telling you not to bother testing for your own purposes though.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On Sun, 14 Feb 2010, Edward Ned Harvey wrote: Never mind. I have no interest in performance tests for Solaris 10. The code is so old, that it does not represent current ZFS at all. Whatever. Regardless of what you say, it does show: Since Richard abandoned Sun (in favor of gmail), he has no qualms with suggesting to test the unstable version. ;-) Regardless of denials to the contrary, Solaris 10 is still the stable enterprise version of Solaris, and will be for quite some time. It has not yet achieved the status of Solaris 8. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On Sun, 14 Feb 2010, Edward Ned Harvey wrote: iozone -m -t 8 -T -O -r 128k -o -s 12G Actually, it seems that this is more than sufficient: iozone -m -t 8 -T -r 128k -o -s 4G Good news, cuz I kicked off the first test earlier today, and it seems like it will run till Wednesday. ;-) The first run, on a single disk, took 6.5 hrs, and I have it configured to repeat ... 2-way mirror, 3-way mirror, 4-way mirror, 5-way mirror, raidz 5 disks, raidz 6 disks, raidz2 6 disks, stripe of 2 mirrors, stripe of 3 mirrors ... I'll go stop it, and change to 4G. Maybe it'll be done tomorrow. ;-)

Probably even 2G is plenty since that gives 16GB of total file data. Keep in mind that with file data much larger than memory, these benchmarks are testing the hardware more than they are testing Solaris. If you wanted to test Solaris, then you would intentionally give it enough memory to work with since that is how it is expected to be used. The performance of Solaris when it is given enough memory to do reasonable caching is astounding. Bob
-- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On Sun, 14 Feb 2010, Thomas Burgess wrote: Solaris 10 has a really old version of ZFS. i know there are some pretty big differences in zfs versions from my own non scientific benchmarks. It would make sense that people wouldn't be as interested in benchmarks of solaris 10 ZFS seeing as there are literally hundreds scattered around the internet. Can you provide URLs for these useful benchmarks? I am certainly interested in seeing them. Even my own benchmarks that I posted almost two years ago are quite useless now. Solaris 10 ZFS is a continually moving target. OpenSolaris performance postings I have seen are not terribly far from Solaris 10. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On Feb 14, 2010, at 6:45 PM, Thomas Burgess wrote: Whatever. Regardless of what you say, it does show: · Which is faster, raidz, or a stripe of mirrors? · How much does raidz2 hurt performance compared to raidz? · Which is faster, raidz, or hardware raid 5? · Is a mirror twice as fast as a single disk for reading? Is a 3-way mirror 3x faster? And so on? I’ve seen and heard many people stating answers to these questions, and my results (not yet complete) already answer these questions, and demonstrate that all the previous assertions were partial truths. I don't think he was complaining, i think he was sayign he dind't need you to run iosnoop on the old version of ZFS. iosnoop runs fine on Solaris 10. I am sorta complaining, though. If you wish to advance ZFS, then use the latest bits. If you wish to discover the performance bugs in Solaris 10 that are already fixed in OpenSolaris, then go ahead, be my guest. Examples of improvements are: + intelligent prefetch algorithm is smarter + txg commit interval logic is improved + ZIL logic improved and added logbias property + stat() performance is improved + raidz write performance improved and raidz3 added + zfs caching improved + dedup changes touched many parts of ZFS + zfs_vdev_max_pending reduced and smarter + metaslab allocation improved + zfs write activity doesn't hog resource quite so much + a new scheduling class, SDC, added to better observe and manage ZFS thread scheduling + buffers can be shared between file system modules (fewer copies) As you can see, so much has changed, hopefully for the better, that running performance benchmarks on old software just isn't very interesting. NB. Oracle's Sun OpenStorage systems do not use Solaris 10 and if they did, they would not be competitive in the market. The notion that OpenSolaris is worthless and Solaris 10 rules is simply bull* Solaris 10 has a really old version of ZFS. i know there are some pretty big differences in zfs versions from my own non scientific benchmarks. It would make sense that people wouldn't be as interested in benchmarks of solaris 10 ZFS seeing as there are literally hundreds scattered around the internet. I don't think he was telling you not to bother testing for your own purposes though. Correct. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS performance benchmarks in various configurations
I have a new server, with 7 disks in it. I am performing benchmarks on it before putting it into production, to substantiate claims I make, like striping mirrors is faster than raidz and so on. Would anybody like me to test any particular configuration? Unfortunately I don't have any SSD, so I can't do any meaningful test on the ZIL etc. Unless someone in the Boston area has a 2.5 SAS SSD they wouldn't mind lending for a few hours. ;-) My hardware configuration: Dell PE 2970 with 8 cores. Normally 32G, but I pulled it all out to get it down to 4G of ram. (Easier to benchmark disks when the file operations aren't all cached.) ;-) Solaris 10 10/09. PERC 6/i controller. All disks are configured in PERC for Adaptive ReadAhead, and Write Back, JBOD. 7 disks present, each SAS 15krpm 160G. OS is occupying 1 disk, so I have 6 disks to play with. I am currently running the following tests: Will test, including the time to flush(), various record sizes inside file sizes up to 16G, sequential write and sequential read. Not doing any mixed read/write requests. Not doing any random read/write. iozone -Reab somefile.wks -g 17G -i 1 -i 0 Configurations being tested: . Single disk . 2-way mirror . 3-way mirror . 4-way mirror . 5-way mirror . 6-way mirror . Two mirrors striped (or concatenated) . Three mirrors striped (or concatenated) . 5-disk raidz . 6-disk raidz . 6-disk raidz2 Hypothesized results: . N-way mirrors write at the same speed of a single disk . N-way mirrors read n-times faster than a single disk . Two mirrors striped read and write 2x faster than a single mirror . Three mirrors striped read and write 3x faster than a single mirror . Raidz and raidz2: No hypothesis. Some people say they perform comparable to many disks working together. Some people say it's slower than a single disk. Waiting to see the results. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
Some thoughts below... On Feb 13, 2010, at 6:06 AM, Edward Ned Harvey wrote: I have a new server, with 7 disks in it. I am performing benchmarks on it before putting it into production, to substantiate claims I make, like “striping mirrors is faster than raidz” and so on. Would anybody like me to test any particular configuration? Unfortunately I don’t have any SSD, so I can’t do any meaningful test on the ZIL etc. Unless someone in the Boston area has a 2.5” SAS SSD they wouldn’t mind lending for a few hours. ;-) My hardware configuration: Dell PE 2970 with 8 cores. Normally 32G, but I pulled it all out to get it down to 4G of ram. (Easier to benchmark disks when the file operations aren’t all cached.) ;-) Solaris 10 10/09. PERC 6/i controller. All disks are configured in PERC for Adaptive ReadAhead, and Write Back, JBOD. 7 disks present, each SAS 15krpm 160G. OS is occupying 1 disk, so I have 6 disks to play with. Put the memory back in and limit the ARC cache size instead. x86 boxes have a tendency to change the memory bus speed depending on how much memory is in the box. Similarly, you can test primarycache settings rather than just limiting ARC size. I am currently running the following tests: Will test, including the time to flush(), various record sizes inside file sizes up to 16G, sequential write and sequential read. Not doing any mixed read/write requests. Not doing any random read/write. iozone -Reab somefile.wks -g 17G -i 1 -i 0 IMHO, sequential tests are a waste of time. With default configs, it will be difficult to separate the raw performance from prefetched performance. You might try disabling prefetch as an option. With sync writes, you will run into the zfs_immediate_write_sz boundary. Perhaps someone else can comment on how often they find interesting sequential workloads which aren't backup-related. Configurations being tested: · Single disk · 2-way mirror · 3-way mirror · 4-way mirror · 5-way mirror · 6-way mirror · Two mirrors striped (or concatenated) · Three mirrors striped (or concatenated) · 5-disk raidz · 6-disk raidz · 6-disk raidz2 Please add some raidz3 tests :-) We have little data on how raidz3 performs. Hypothesized results: · N-way mirrors write at the same speed of a single disk · N-way mirrors read n-times faster than a single disk · Two mirrors striped read and write 2x faster than a single mirror · Three mirrors striped read and write 3x faster than a single mirror · Raidz and raidz2: No hypothesis. Some people say they perform comparable to many disks working together. Some people say it’s slower than a single disk. Waiting to see the results. Please post results (with raw data would be nice ;-). If you would be so kind as to collect samples of iosnoop -Da I would be eternally grateful :-) -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
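A sketch of those two suggestions; the 4 GB cap is an example value, and the tunables shown are the /etc/system form used on Solaris 10 / OpenSolaris of that era:

* /etc/system: cap the ARC instead of pulling DIMMs (example: 4 GB)
set zfs:zfs_arc_max = 0x100000000
* optionally take file-level prefetch out of the comparison
set zfs:zfs_prefetch_disable = 1

On builds that support it, per-dataset caching can also be restricted with something like "zfs set primarycache=metadata testpool".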
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On Sat, 13 Feb 2010, Edward Ned Harvey wrote: Will test, including the time to flush(), various record sizes inside file sizes up to 16G, sequential write and sequential read. Not doing any mixed read/write requests. Not doing any random read/write. iozone -Reab somefile.wks -g 17G -i 1 -i 0 Make sure to also test with a command like iozone -m -t 8 -T -O -r 128k -o -s 12G I am eager to read your test report. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On Sat, 13 Feb 2010, Bob Friesenhahn wrote: Make sure to also test with a command like iozone -m -t 8 -T -O -r 128k -o -s 12G Actually, it seems that this is more than sufficient: iozone -m -t 8 -T -r 128k -o -s 4G since it creates a 4GB test file for each thread, with 8 threads. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
IMHO, sequential tests are a waste of time. With default configs, it will be difficult to separate the raw performance from prefetched performance. You might try disabling prefetch as an option. Let me clarify: Iozone does a nonsequential series of sequential tests, specifically for the purpose of identifying the performance tiers, separating the various levels of hardware accelerated performance from the raw disk performance. This is the reason why I took out all but 4G of the system RAM. In the (incomplete) results I have so far, it's easy to see these tiers for a single disk: . For filesizes 0 to 4M, a single disk writes 2.8 Gbit/sec and reads ~40-60 Gbit/sec. This boost comes from writing to PERC cache, and reading from CPU L2 cache. . For filesizes 4M to 128M, a single disk writes 2.8 Gbit/sec and reads 24 Gbit/sec. This boost comes from writing to PERC cache, and reading from system memory. . For filesizes 128M to 4G, a single disk writes 1.2 Gbit/sec and reads 24 Gbit/sec. This boost comes from reading system memory. . For filesizes 4G to 16G, a single disk writes 1.2 Gbit/sec and reads 1.2 Gbit/sec This is the raw disk performance. (SAS, 15krpm, 146G disks) Please add some raidz3 tests :-) We have little data on how raidz3 performs. Does this require a specific version of OS? I'm on Solaris 10 10/09, and man zpool doesn't seem to say anything about raidz3 ... I haven't tried using it ... does it exist? Please post results (with raw data would be nice ;-). If you would be so kind as to collect samples of iosnoop -Da I would be eternally grateful :-) I'm guessing iosnoop is an opensolaris thing? Is there an equivalent for solaris? I'll post both the raw results, and my simplified conclusions. Most people would not want the raw data. Most people just want to know What's the performance hit I take by using raidz2 instead of raidz? and so on. Or ... What's faster, raidz, or hardware raid-5? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On Sat, 13 Feb 2010, Edward Ned Harvey wrote: kind as to collect samples of iosnoop -Da I would be eternally grateful :-) I'm guessing iosnoop is an opensolaris thing? Is there an equivalent for solaris?

Iosnoop is part of the DTrace Toolkit by Brendan Gregg, which does work on Solaris 10. See http://www.brendangregg.com/dtrace.html. Bob
-- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
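One possible way to capture what Richard asked for, assuming the toolkit is unpacked under /opt/DTT (the path is an assumption) and run as root alongside a single benchmark pass:

cd /opt/DTT
./iosnoop -Da > /var/tmp/iosnoop_mirror2x.out &   # per-I/O trace with time deltas
# ... run the iozone pass ...
kill %1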
Re: [zfs-discuss] ZFS performance benchmarks in various configurations
On Feb 13, 2010, at 10:54 AM, Edward Ned Harvey wrote: Please add some raidz3 tests :-) We have little data on how raidz3 performs. Does this require a specific version of OS? I'm on Solaris 10 10/09, and man zpool doesn't seem to say anything about raidz3 ... I haven't tried using it ... does it exist? Never mind. I have no interest in performance tests for Solaris 10. The code is so old, that it does not represent current ZFS at all. IMHO, if you want to do performance tests, then you need to be on the very latest dev release. Otherwise, the results can't be carried forward to make a difference -- finding performance issues that are already fixed isn't a good use of your time. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
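On a recent dev build, the raidz3 case can be created like any other vdev type; a sketch with illustrative disk names:

zpool create testpool raidz3 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0
zpool status testpool    # the vdev appears as raidz3-0 (triple parity)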
[zfs-discuss] ZFS performance issues over 2.5 years.
Hi, We have approximately 3 million active users and a storage capacity of 300 TB in ZFS zpools. The ZFS is mounted on Sun Cluster using 3*T2000 servers connected with FC to SAN storage. Each zpool is a LUN in the SAN, which already provides RAID, so we're not doing raidz on top of it. We started using ZFS about 2.5 years ago on Solaris 10 U3 or U4 (I can't recall). Our storage growth is roughly 4TB a month. The zpool sizes range from 2TB up to the biggest at 32TB. We're using ZFS to store mail headers (less than 4k) and attachments (1k to 12mb). Currently the Sun Cluster handles approx. 20K NFS OPS.
File sizing:
1. 10 million files less than 4K a day.
2. In addition to the 10 million there are another 10 million of various sizes:
- 20% less than 4K
- 25% between 4K and 8K
- 50% between 9K and 100K
- 5% above 100K up to 12M
Total 20 million new files a day.
We're using two file hierarchies for storing files.
For the mail headers (less than 4K): /FF/YYMM/DDHH/SS/ABCDEFGH
Explanation: The first directory is for the mount point, from 00..FF (up to 256 directories); the second directory is year and month; the third directory is day and hour; the fourth is seconds. At the end we have a gzipped file up to 1K.
For the mail object: We're using single instancing/de-dup in our application (meaning no maildir or mbox). Mail objects can be 1K up to 12MB. The directory structure is as follows: /FF/FF/FF/FF/FF/FF/FF/FF/FF/file
Explanation: The first directory holds 256 directories, 00 to FF, and the other directories hold up to 256 directories, with the lower branches holding fewer directories than higher branches. At the end of the hierarchy there's a single file.
Mail operation: When a new mail arrives we split the mail into objects: a header, and each attachment is an object (even text within a body). The header files are stored by "timestamp" (/FF/YYMM/DDHH/SS/file), which may be an advantage for reads because when users read their mail the same day, the metadata of the directories and file can be in cache. But this is not the same for the attachments, which are stored in directories by their hex value.
Our main issue, or problem, over the last 2.5 years of using ZFS: when a zpool becomes full, the write operation becomes significantly slower. At first it happened around 90% zpool capacity and now, after 30-40 zpools, it happens around 80% capacity. The meaning of this for us is that if we define a zpool of 4TB, we can use only 3.2T (82%) effectively.
Is there a "best practice" from Sun/ZFS regarding building directory hierarchies with a huge volume of files (20M a day)? Also, how can we avoid performance degradation when we reach 80% of zpool capacity?
Regards Yariv
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
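Pending a real fix, the usual workaround is administrative: watch pool capacity and stop new writes before the slowdown point. A sketch, where the dataset name is an assumption and the 80% figure follows the observation above:

zpool list                        # the CAP column shows the percentage of pool space in use
zfs set quota=3200G pool01/mail   # hard ceiling at roughly 80% of a 4TB pool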
Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6
Final rant on this. Managed to get the box re-installed and the performance issue has vanished. So there is a performance bug in zfs somewhere. Not sure whether to file a bug, as I can't now provide any more information. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6
So I have poked and prodded the disks and they both seem fine. And yet my rpool is still slow. Any ideas on what to do now? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6
Please unsubscribe me COLLIER -Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of John-Paul Drawneek Sent: Thursday, September 03, 2009 2:13 AM To: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6 So I have poked and prodded the disks and they both seem fine. Any yet my rpool is still slow. Any ideas on what do do now. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6
No joy.

c1t0d0 89 MB/sec
c1t1d0 89 MB/sec
c2t0d0 123 MB/sec
c2t1d0 123 MB/sec

First two are the rpool -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6
I did not migrate my disks. I now have 2 pools - rpool is at 60% and is still dog slow. Also scrubbing the rpool causes the box to lock up. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6
On Tue, 1 Sep 2009, John-Paul Drawneek wrote: i did not migrate my disks. I now have 2 pools - rpool is at 60% as is still dog slow. Also scrubbing the rpool causes the box to lock up. This sounds like a hardware problem and not something related to fragmentation. Probably you have a slow/failing disk. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
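(If it is a failing or slow disk, the standard error and service-time counters are a quick first check before anything fancier. A minimal sketch; the device names are just examples borrowed from the output earlier in this thread:

# Soft/hard/transport error counters per device; a non-zero hard or
# transport count on one disk is a strong hint.
iostat -En c1t0d0 c1t1d0

# Service times under load; one disk with a much higher asvc_t than
# its partner in the same mirror is suspect.
iostat -xnz 5
)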
Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6
On Tue, 1 Sep 2009, Jpd wrote: Thanks. Any idea on how to work out which one? I can't find smart in ips, so what other ways are there? You could try using a script like this one to find pokey disks:

#!/bin/ksh
# Date: Mon, 14 Apr 2008 15:49:41 -0700
# From: Jeff Bonwick jeff.bonw...@sun.com
# To: Henrik Hjort hj...@dhs.nu
# Cc: zfs-discuss@opensolaris.org
# Subject: Re: [zfs-discuss] Performance of one single 'cp'
#
# No, that is definitely not expected.
#
# One thing that can hose you is having a single disk that performs
# really badly.  I've seen disks as slow as 5 MB/sec due to vibration,
# bad sectors, etc.  To see if you have such a disk, try my diskqual.sh
# script (below).  On my desktop system, which has 8 drives, I get:
#
# # ./diskqual.sh
# c1t0d0 65 MB/sec
# c1t1d0 63 MB/sec
# c2t0d0 59 MB/sec
# c2t1d0 63 MB/sec
# c3t0d0 60 MB/sec
# c3t1d0 57 MB/sec
# c4t0d0 61 MB/sec
# c4t1d0 61 MB/sec
#
# The diskqual test is non-destructive (it only does reads), but to
# get valid numbers you should run it on an otherwise idle system.

disks=`format </dev/null | grep ' c.t' | nawk '{print $2}'`

getspeed1()
{
        # Read 1024 x 64K = 67.108864 MB from the raw device and
        # convert the elapsed real time into MB/sec.
        ptime dd if=/dev/rdsk/${1}s0 of=/dev/null bs=64k count=1024 2>&1 |
            nawk '$1 == "real" { printf("%.0f\n", 67.108864 / $2) }'
}

getspeed()
{
        # Best out of 6
        for iter in 1 2 3 4 5 6
        do
                getspeed1 $1
        done | sort -n | tail -2 | head -1
}

for disk in $disks
do
        echo $disk `getspeed $disk` MB/sec
done

-- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6
As I understand it, when you expand a pool, the data do not automatically migrate to the other disks. You will have to rewrite the data somehow, usually a backup/restore. -Scott -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
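(One hedged way to force that rewrite without leaving ZFS, assuming there is enough free space: snapshot the filesystem, send it into a new dataset so its blocks get laid out fresh across all vdevs, then swap the names. The dataset names below are placeholders, and this is not practical for a root pool, only for data filesystems:

# Sketch only - check mountpoints and properties before destroying anything.
zfs snapshot tank/data@rewrite
zfs send tank/data@rewrite | zfs receive tank/data.new
# verify the copy, then retire the old dataset:
zfs destroy -r tank/data
zfs rename tank/data.new tank/data
)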
[zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 60% ?
Ok, had a pool which got full - so performance tanked. Ran off and got some more disks to create a new pool to put all the extra data on. Got the original pool down to 60% utilization, but the performance is still bad. Any ideas on how to get the performance back? Bad news is that the pool in question is rpool. And this is really starting to affect my productivity on the box. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance degradation when backups are running
You might want to also try toggling the Nagle tcp setting to see if that helps with your workload: ndd -get /dev/tcp tcp_naglim_def (save that value, default is 4095) ndd -set /dev/tcp tcp_naglim_def 1 If no (or a negative) difference, set it back to the original value ndd -set /dev/tcp tcp_naglim_def 4095 (or whatever it was) -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance degradation when backups are running
2008/9/30 Jean Dion [EMAIL PROTECTED]: iSCSI requires dedicated network and not a shared network or even VLAN. Backup cause large I/O that fill your network quickly. Like ans SAN today. Could you clarify why it is not suitable to use VLANs for iSCSI? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance degradation when backups are running
Simple. You cannot go faster than the slowest link. Any VLAN shares the bandwidth and does not provide dedicated bandwidth for each VLAN. That means if you have multiple VLANs coming out of the same wire of your server, you do not have "n" times the bandwidth but only a fraction of it. Simple network maths. Also, iSCSI works better with segregated IP network switches. Beware that some switches do not guarantee full 1 Gbit/s on all ports when all are active at the same time. Plan multiple uplinks if you have more than one switch. Once again, you cannot go faster than the slowest link. Jean gm_sjo wrote: 2008/9/30 Jean Dion [EMAIL PROTECTED]: iSCSI requires dedicated network and not a shared network or even VLAN. Backup cause large I/O that fill your network quickly. Like ans SAN today. Could you clarify why it is not suitable to use VLANs for iSCSI? -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance degradation when backups are running
On Mon, Sep 29, 2008 at 06:01:18PM -0700, Jean Dion wrote: Do you have dedicated iSCSI ports from your server to your NetApp? Yes, it's a dedicated redundant gigabit network. iSCSI requires dedicated network and not a shared network or even VLAN. Backup cause large I/O that fill your network quickly. Like ans SAN today. Backup are extremely demanding on hardware (CPU, Mem, I/O ports, disk etc). Not rare to see performance issues during backup with several thousands small files. Each small file cause seeks to your disk and file system. As the number of files and size you will be impact. That means, thousand of small files cause thousand of small I/O but not a lot of throughput. What statistics can I generate to observe this contention? ZFS pool I/O statistics are not that different when the backup is running. Bigger your file are more likely the block will be consecutive on the file system. Small file can be spread in the entire file system causing seeks, latency and bottleneck. Legato client and server contains tuning parameters to avoid such small file problems. Check your Legato buffer parameters. These buffer will use your server memory as disk cache. I'll ask our backup person to investigate those settings. I assume that Networker should not be buffering files since those files won't be read again. How can I see memory usage by ZFS and by applications? Here is a good source of network tuning parameters for your T2000 http://www.solarisinternals.com/wiki/index.php/Networks#Tunable_for_general_workloads_on_T1000.2FT2000 The soft_ring is one of the best one. Here is another interesting place to look http://www.solarisinternals.com/wiki/index.php/Solaris_Internals_and_Performance_FAQ Thanks. I'll review those documents. -- -Gary Mills--Unix Support--U of M Academic Computing and Networking- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance degradation when backups are running
For Solaris internal debugging tools look here http://opensolaris.org/os/community/advocacy/events/techdays/seattle/OS_SEA_POD_JMAURO.pdf;jsessionid=9B3E275EEB6F1A0E0BC191D8DEC0F965 ZFS specifics is available here http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide Jean Gary Mills wrote: On Mon, Sep 29, 2008 at 06:01:18PM -0700, Jean Dion wrote: Do you have dedicated iSCSI ports from your server to your NetApp? Yes, it's a dedicated redundant gigabit network. iSCSI requires dedicated network and not a shared network or even VLAN. Backup cause large I/O that fill your network quickly. Like ans SAN today. Backup are extremely demanding on hardware (CPU, Mem, I/O ports, disk etc). Not rare to see performance issues during backup with several thousands small files. Each small file cause seeks to your disk and file system. As the number of files and size you will be impact. That means, thousand of small files cause thousand of small I/O but not a lot of throughput. What statistics can I generate to observe this contention? ZFS pool I/O statistics are not that different when the backup is running. Bigger your file are more likely the block will be consecutive on the file system. Small file can be spread in the entire file system causing seeks, latency and bottleneck. Legato client and server contains tuning parameters to avoid such small file problems. Check your Legato buffer parameters. These buffer will use your server memory as disk cache. I'll ask our backup person to investigate those settings. I assume that Networker should not be buffering files since those files won't be read again. How can I see memory usage by ZFS and by applications? Here is a good source of network tuning parameters for your T2000 http://www.solarisinternals.com/wiki/index.php/Networks#Tunable_for_general_workloads_on_T1000.2FT2000 The soft_ring is one of the best one. Here is another interesting place to look http://www.solarisinternals.com/wiki/index.php/Solaris_Internals_and_Performance_FAQ Thanks. I'll review those documents. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance degradation when backups are running
Gary - Besides the network questions... What does your zpool status look like? Are you using compression on the file systems? (Was single-threaded and fixed in s10u4 or equiv patches) -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
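(Both are quick to check from the command line; "tank" below is just a placeholder pool name:

zpool status -v tank
zfs get -r compression tank
)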
Re: [zfs-discuss] ZFS performance degradation when backups are running
2008/9/30 Jean Dion [EMAIL PROTECTED]: Simple. You cannot go faster than the slowest link. That is indeed correct, but what is the slowest link when using a Layer 2 VLAN? You made a broad statement that iSCSI 'requires' a dedicated, standalone network. I do not believe this is the case. Any VLAN share the bandwidth workload and do not provide a dedicated bandwidth for each of them. That means if you have multiple VLAN coming out of the same wire of your server you do not have n time the bandwidth but only a fraction of it. Simple network maths. I can only assume that you are only referring to VLAN trunks, e.g. using a NIC on a server for both 'normal' traffic and having another virtual interface on it bound to a 'storage' VLAN. If this is the case then what you say is true, of course you are sharing the same physical link so ultimately that will be the limit. However, and this should be clarified before anyone gets the wrong idea, there is nothing wrong with segmenting a switch by using VLANs to have some ports for storage traffic and some ports for 'normal' traffic. You can have one/multiple NIC(s) for storage, and another/multiple NIC(s) for everything else (or however you please to use your interfaces!). These can be hooked up to switch ports that are on different physical VLANs with no performance degradation. It's best not to assume that every use of a VLAN is a trunk. Also iSCSI works better by using segregated IP network switches. Beware that some switches do not guaranty full 1Gbits speed on all ports when all active at the same time. Plan multiple uplinks if you have more than one switch. Once again you cannot go faster than the slowest link. I think it's fairly safe to assume that you're going to get per-port line-speed across anything other than the cheapest budget switches. Most SMB (and above) switches will be rated at say 48 Gbit/sec backplane on a 24 port item, for example. However, I am keen to see any benchmarks you may have that show the performance difference between running a single switch with Layer 2 VLANs vs. two separate switches. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance degradation when backups are running
A normal iSCSI setup splits network traffic at the physical layer, not the logical layer. That means separate physical ports and, if you can, separate physical PCI bridge chips. A shared link may be fine for light traffic, but we are talking about backup performance issues. The IP network and the number of small files are very often the bottlenecks. If you want performance, you do not put all your I/O across the same physical wire. Once again, you cannot go faster than the physical wire can support (CAT5E, CAT6, fibre), no matter whether it is Layer 2 or not. Using VLANs on a single port you "share" the bandwidth; Layer 2 does not create more gigabits of speed. iSCSI best practice calls for a separate physical network. Many books and white papers have been written about this. This is like any FC SAN implementation: we always split the workload between disk and tape using more than one HBA. Never forget, backups are intensive I/O and will fill the entire I/O path. Jean gm_sjo wrote: 2008/9/30 Jean Dion [EMAIL PROTECTED]: Simple. You cannot go faster than the slowest link. That is indeed correct, but what is the slowest link when using a Layer 2 VLAN? You made a broad statement that iSCSI 'requires' a dedicated, standalone network. I do not believe this is the case. Any VLAN share the bandwidth workload and do not provide a dedicated bandwidth for each of them. That means if you have multiple VLAN coming out of the same wire of your server you do not have "n" time the bandwidth but only a fraction of it. Simple network maths. I can only assume that you are only referring to VLAN trunks, eg using a NIC on a server for both 'normal' traffic and having another virtual interface on it bound to a 'storage' VLAN. If this is the case then what you say is true, of course you are sharing the same physical link so ultimately that will be the limit. However, and this should be clarified before anyone gets the wrong idea, there is nothing wrong with segmenting a switch by using VLANs to have some ports for storage traffic and some ports for 'normal' traffic. You can have one/multiple NIC(s) for storage, and another/multiple NIC(s) for everything else (or however you please to use your interfaces!). These can be hooked up to switch ports that are on different physical VLANs with no performance degredation. It's best not to assume that every use of a VLAN is a trunk. Also iSCSI works better by using segregated IP network switches. Beware that some switches do not guaranty full 1Gbits speed on all ports when all active at the same time. Plan multiple uplinks if you have more than one switch. Once again you cannot go faster than the slowest link. I think it's fairly safe to assume that you're going to get per-port line-speed across anything other than the cheapest budget switches. Most SMB (and above) switches will be rated at say 48gbit/sec backplane on a 24 port item, for example. However, I am keen to see any benchmarks you may have that shows the performance difference between running a single switch with layer 2 vlans Vs. two seperate switches. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance degradation when backups are running
On Tue, Sep 30, 2008 at 10:32:50AM -0700, William D. Hathaway wrote: Gary - Besides the network questions... Yes, I suppose I should see if traffic on the Iscsi network is hitting a limit of some sort. What does your zpool status look like? Pretty simple:

$ zpool status
  pool: space
 state: ONLINE
 scrub: none requested
config:

        NAME                                       STATE     READ WRITE CKSUM
        space                                      ONLINE       0     0     0
          c4t60A98000433469764E4A2D456A644A74d0    ONLINE       0     0     0
          c4t60A98000433469764E4A2D456A696579d0    ONLINE       0     0     0
          c4t60A98000433469764E4A476D2F6B385Ad0    ONLINE       0     0     0
          c4t60A98000433469764E4A476D2F664E4Fd0    ONLINE       0     0     0

errors: No known data errors

The four LUNs use the built-in I/O multipathing, with separate Iscsi networks, switches, and ethernet interfaces. Are you using compression on the file systems? (Was single-threaded and fixed in s10u4 or equiv patches) No, I've never enabled compression there. -- -Gary Mills--Unix Support--U of M Academic Computing and Networking- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance degradation when backups are running
On Mon, Sep 29, 2008 at 06:01:18PM -0700, Jean Dion wrote: Legato client and server contains tuning parameters to avoid such small file problems. Check your Legato buffer parameters. These buffer will use your server memory as disk cache. Our backup person tells me that there are no settings in Networker that affect buffering on the client side. Here is a good source of network tuning parameters for your T2000 http://www.solarisinternals.com/wiki/index.php/Networks#Tunable_for_general_workloads_on_T1000.2FT2000 The soft_ring is one of the best one. Those references are for network tuning. I don't want to change things blindly. How do I tell if they are necessary, that is if the network is the bottleneck in the I/O system? -- -Gary Mills--Unix Support--U of M Academic Computing and Networking- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
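(One low-tech way to answer that, without changing any tunables: watch the pool bandwidth while a backup runs and compare it to what a single gigabit iSCSI path can carry, roughly 110-120 MB/s after protocol overhead. A sketch, using the pool name from elsewhere in this thread:

# 5-second samples of per-pool bandwidth and IOPS while the backup runs
zpool iostat space 5

# per-LUN breakdown, to see whether one path/LUN is saturated
zpool iostat -v space 5

If the read bandwidth sits near the gigabit ceiling, the network is the limiting factor; if it is well below that while IMAP is still slow, look at small-I/O latency (iostat -xnz 5) and caching instead.)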
Re: [zfs-discuss] ZFS performance degradation when backups are running
2008/9/30 Jean Dion [EMAIL PROTECTED]: If you want performance you do not put all your I/O across the same physical wire. Once again you cannot go faster than the physical wire can support (CAT5E, CAT6, fibre). No matter if it is layer 2 or not. Using VLAN on single port you share the bandwidth and not creating more Gbits speed with Layer 2. iSCSI best practice require separated physical network. Many books, white papers are written about this. Yes, that's true, but I don't believe you mentioned single NIC implementations in your original statement. Just seeking clarification to help others :-) I think it's worth clarifying that iSCSI and VLANs is okay as long as people appreciate you will require seperate interfaces to get best performance. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS performance degradation when backups are running
gm_sjo wrote: 2008/9/30 Jean Dion [EMAIL PROTECTED]: If you want performance you do not put all your I/O across the same physical wire. Once again you cannot go faster than the physical wire can support (CAT5E, CAT6, fibre). No matter if it is layer 2 or not. Using VLAN on single port you share the bandwidth and not creating more Gbits speed with Layer 2. iSCSI best practice require separated physical network. Many books, white papers are written about this. Yes, that's true, but I don't believe you mentioned single NIC implementations in your original statement. Just seeking clarification to help others :-) I think it's worth clarifying that iSCSI and VLANs is okay as long as people appreciate you will require seperate interfaces to get best performance. Separate interfaces or networks may not be required, but properly sized networks are highly desirable. For example, a back-of-the-envelope analysis shows that a single 10GbE pipe is sufficient to drive 8 T10KB drives. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
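(Roughly, and from memory so treat the numbers as order-of-magnitude only: a T10000B drive streams at about 120 MB/s native, so 8 drives need about 8 x 120 = 960 MB/s, which is roughly 7.7 Gbit/s - inside a single 10GbE pipe with some headroom, but far beyond a bonded pair of gigabit links.)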
Re: [zfs-discuss] ZFS performance degradation when backups are running
Do you have dedicated iSCSI ports from your server to your NetApp? iSCSI requires a dedicated network, not a shared network or even a VLAN. Backups cause large I/O that fills your network quickly, like on any SAN today. Backups are extremely demanding on hardware (CPU, memory, I/O ports, disks, etc.). It is not rare to see performance issues during backups of several thousand small files. Each small file causes seeks on your disks and file system, and the impact grows with the number and size of the files. That means thousands of small files cause thousands of small I/Os but not a lot of throughput. The bigger your files are, the more likely their blocks are consecutive on the file system; small files can be spread across the entire file system, causing seeks, latency and bottlenecks. The Legato client and server contain tuning parameters to avoid such small-file problems. Check your Legato buffer parameters; these buffers will use your server memory as a disk cache. Here is a good source of network tuning parameters for your T2000 http://www.solarisinternals.com/wiki/index.php/Networks#Tunable_for_general_workloads_on_T1000.2FT2000 The soft_ring setting is one of the best ones. Here is another interesting place to look http://www.solarisinternals.com/wiki/index.php/Solaris_Internals_and_Performance_FAQ Jean Dion Storage Architect Data Management Ambassador -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS performance degradation when backups are running
We have a moderately sized Cyrus installation with 2 TB of storage and a few thousand simultaneous IMAP sessions. When one of the backup processes is running during the day, there's a noticable slowdown in IMAP client performance. When I start my `mutt' mail reader, it pauses for several seconds at `Selecting INBOX'. That behavior disappears when the backup finishes. The IMAP server is a T2000 with six ZFS filesystems that correspond to Cyrus partitions. Storage is four Iscsi LUNs from our Netapp filer. The backup in question is done with EMC Networker. I've looked at zpool I/O statistics when the backup is running, but there's nothing clearly wrong. I'm wondering if perhaps all the read activity by the backup system is causing trouble with ZFS' caching. Is there some way to examine this area? -- -Gary Mills--Unix Support--U of M Academic Computing and Networking- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
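(The ARC keeps its counters in kstat on reasonably recent Solaris 10 bits, so one way to examine this is to sample the hit/miss rate and ARC size while the backup runs and compare against a quiet period. A minimal sketch:

# current ARC size and target size
kstat -p zfs:0:arcstats:size zfs:0:arcstats:c

# hit/miss counters; sample twice a few minutes apart during the backup
# and again during normal load, and compare the deltas
kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses

A backup that streams the whole mail store once can evict the hot IMAP working set from the ARC, which would show up as a markedly worse hit/miss ratio while the backup runs.)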
Re: [zfs-discuss] ZFS-Performance: Raid-Z vs. Raid5/6 vs. mirrored
Ralf Bertling wrote: Hi list, as this matter pops up every now and then in posts on this list I just want to clarify that the real performance of RaidZ (in its current implementation) is NOT anything that follows from raidz-style data efficient redundancy or the copy-on-write design used in ZFS. In a M-Way mirrored setup of N disks you get the write performance of the worst disk and a read performance that is the sum of all disks (for streaming and random workloads, while latency is not improved) Apart from the write performance you get very bad disk utilization from that scenario. I beg to differ with very bad disk utilization. IMHO you get perfect disk utilization for M-way redundancy :-) In Raid-Z currently we have to distinguish random reads from streaming reads: - Write performance (with COW) is (N-M)*worst single disk write performance since all writes are streaming writes by design of ZFS (which is N-M-1 times faste than mirrored) - Streaming read performance is N*worst read performance of a single disk (which is identical to mirrored if all disks have the same speed) - The problem with the current implementation is that N-M disks in a vdev are currently taking part in reading a single byte from a it, which i turn results in the slowest performance of N-M disks in question. You will not be able to predict real-world write or sequential read performance with this simple analysis because there are many caches involved. The caching effects will dominate for many cases. ZFS actually works well with write caches, so it will be doubly difficult to predict write performance. You can predict small, random read workload performance, though, because you can largely discount the caching effects for most scenarios, especially JBODs. Now lets see if this really has to be this way (this implies no, doesn't it ;-) When reading small blocks of data (as opposed to streams discussed earlier) the requested data resides on a single disk and thus reading it does not require to send read commands to all disks in the vdev. Without detailed knowledge of the ZFS code, I suspect the problem is the logical block size of any ZFS operation always uses the full stripe. If true, I think this is a design error. No, the reason is that the block is checksummed and we check for errors upon read by verifying the checksum. If you search the zfs-discuss archives you will find this topic arises every 6 months or so. Here is a more interesting thread on the subject, dated November 2006: http://mail.opensolaris.org/pipermail/zfs-discuss/2006-November/035711.html You will also note that for fixed record length workloads, we tend to recommend the blocksize be matched with the ZFS recordsize. This will improve efficiency for reads, in general. Without that, random reads to a raid-z are almost as fast as mirrored data. The theoretical disadvantages come from disks that have different speed (probably insignificant in any real-life scenario) and the statistical probability that by chance a few particular random reads do in fact have to access the same disk drive to be fulfilled. (In a mirrored setup, ZFS can choose from all idle devices, whereas in RAID-Z it has to wait for the disk that holds the data to be ready processing its current requests). Looking more closely, this effect mostly affects latency (not performance) as random read-requests coming in should be distributed equally across all devices even bette if the queue of requests gets longer (this would however require ZFS to reorder requests for maximum performance. 
ZFS does re-order I/O. Array controllers re-order the re-ordered I/O. Disks then re-order I/O, just to make sure it was re-ordered again. So it is also difficult to develop meaningful models of disk performance in these complex systems. Since this seems to be a real issue for many ZFS users, it would be nice if someone who has more time than me to look into the code, can comment on the amount of work required to boost RaidZ read performance. Periodically, someone offers to do this... but I haven't seen an implementation. Doing so would level the tradeoff between read- write- performance and disk utilization significantly. Obviously if disk space (and resulting electricity costs) do not matter compared to getting maximum read performance, you will always be best of with 3 or even more way mirrors and a very large number of vdevs in your pool. Space, performance, reliability: pick two. sidebar The ZFS checksum has proven to be very effective at identifying data corruption in systems. In a traditional RAID-5 implementation, like SVM, the data is assumed to be correct if the read operation returned without an error. If you try to make SVM more reliable by adding a checksum, then you will end up at approximately the same place ZFS is: by distrusting the hardware you take a performance penalty, but improve your data
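(For what it's worth, the recordsize matching mentioned above is a per-filesystem property and only affects newly written blocks. A sketch with placeholder names, assuming an application that does fixed 8K records:

# match the dataset recordsize to the application's record size
zfs set recordsize=8k tank/db
zfs get recordsize tank/db
)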
[zfs-discuss] ZFS-Performance: Raid-Z vs. Raid5/6 vs. mirrored
Hi list, as this matter pops up every now and then in posts on this list I just want to clarify that the real performance of RaidZ (in its current implementation) is NOT anything that follows from raidz-style data efficient redundancy or the copy-on-write design used in ZFS. In an M-way mirrored setup of N disks you get the write performance of the worst disk and a read performance that is the sum of all disks (for streaming and random workloads, while latency is not improved). Apart from the write performance you get very bad disk utilization from that scenario. In Raid-Z currently we have to distinguish random reads from streaming reads:
- Write performance (with COW) is (N-M)*worst single disk write performance since all writes are streaming writes by design of ZFS (which is N-M-1 times faster than mirrored)
- Streaming read performance is N*worst read performance of a single disk (which is identical to mirrored if all disks have the same speed)
- The problem with the current implementation is that N-M disks in a vdev are currently taking part in reading a single byte from it, which in turn results in the slowest performance of the N-M disks in question.
Now let's see if this really has to be this way (this implies no, doesn't it ;-) When reading small blocks of data (as opposed to streams discussed earlier) the requested data resides on a single disk and thus reading it does not require sending read commands to all disks in the vdev. Without detailed knowledge of the ZFS code, I suspect the problem is that the logical block size of any ZFS operation always uses the full stripe. If true, I think this is a design error. Without that, random reads to a raid-z are almost as fast as mirrored data. The theoretical disadvantages come from disks that have different speed (probably insignificant in any real-life scenario) and the statistical probability that by chance a few particular random reads do in fact have to access the same disk drive to be fulfilled. (In a mirrored setup, ZFS can choose from all idle devices, whereas in RAID-Z it has to wait for the disk that holds the data to finish processing its current requests.) Looking more closely, this effect mostly affects latency (not performance), as random read-requests coming in should be distributed equally across all devices, even better if the queue of requests gets longer (this would however require ZFS to reorder requests for maximum performance). Since this seems to be a real issue for many ZFS users, it would be nice if someone who has more time than me to look into the code can comment on the amount of work required to boost RaidZ read performance. Doing so would level the tradeoff between read and write performance and disk utilization significantly. Obviously, if disk space (and resulting electricity costs) do not matter compared to getting maximum read performance, you will always be best off with 3- or even more-way mirrors and a very large number of vdevs in your pool. A further question that springs to mind is whether copies=N is also used to improve read performance. If so, you could have some read-optimized filesystems in a pool while others use maximum storage efficiency (as for backups). Regards, ralf -- Ralf Bertling ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
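(The copies property itself is easy to experiment with per filesystem, though whether the extra copies actually help read scheduling is exactly the open question above. Dataset names below are placeholders:

# store two copies of every block in this filesystem only
zfs set copies=2 tank/hot
# leave the backup filesystem at the default single copy
zfs get copies tank/hot tank/backup

Note that copies only applies to blocks written after the property is set, and it costs the corresponding amount of extra space.)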