Re: [zfs-discuss] ZFS performance on LSI 9240-8i?

2012-05-11 Thread Robert Milkowski


 Now, if anyone is still reading, I have another question. The new Solaris 11
 device naming convention hides the physical tree from me. I get just a list
 of long disk names all starting with c0 (see below), but I need to know which
 disk is connected to which controller, so that I can attach the two sides of
 each mirror to two different controllers in order to tolerate a single
 controller failure. I need a way of figuring out the connection path for each
 disk. I hope I managed to explain what I want.

See diskinfo(1M), for example:


$ diskinfo -T bay -o Rc -h
HDD00  -
HDD01  -
HDD02  c0t5000CCA00AC87F54d0
HDD03  c0t5000CCA00AA95838d0
HDD04  c0t5000CCA01510ECC0d0
HDD05  c0t5000CCA01515EE78d0
HDD06  c0t5000CCA01512DA3Cd0
HDD07  c0t5000CCA00AB3E1C8d0
HDD08  c0t5000CCA0151C1D18d0
HDD09  c0t5000CCA0151F7E08d0
HDD10  c0t5000CCA0151C7CA8d0
HDD11  c0t5000CCA00AA9D570d0
HDD12  c0t5000CCA0151CB180d0
HDD13  c0t5000CCA015208C98d0
HDD14  c0t5000CCA00AA97F04d0
HDD15  c0t5000CCA0151A287Cd0
HDD16  c0t5000CCA00AAA1544d0
HDD17  c0t5000CCA01521070Cd0
HDD18  c0t5000CCA00AA97EF4d0
HDD19  c0t5000CCA015214F84d0
HDD20  c0t5000CCA015214844d0
HDD21  c0t5000CCA00AAAD154d0
HDD22  c0t5000CCA00AA95558d0
HDD23  c0t5000CCA00AAA0D1Cd0


In your case you will probably have to put a configuration in place for your
disk slots (on Oracle's hardware it works out of the box). Go to
support.oracle.com and look for the document:

How To : Selecting a Physical Slot for a SAS Device with a WWN for
an Oracle Solaris 11 Installation [ID 1411444.1]
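
Alternatively, since these c0t<WWN>d0 names look like MPxIO pseudo-devices, one
way to see which HBA a given disk sits behind (assuming MPxIO is enabled, which
the c0t<WWN>d0 naming suggests) is to ask mpathadm for the initiator ports on
its paths, e.g. with one of the disks listed above:

$ mpathadm list lu
$ mpathadm show lu /dev/rdsk/c0t5000CCA00AC87F54d0s2

The "Paths" section of the second command's output lists the initiator port
(the controller) and the target port for each path to that disk.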




P.S. There is also the zpool status -l option, which is cool:

$ zpool status -l cwafseng3-0
  pool: pool-0
 state: ONLINE
  scan: scrub canceled on Thu Apr 12 13:52:13 2012
config:

        NAME                                                          STATE  READ WRITE CKSUM
        pool-0                                                        ONLINE    0     0     0
          raidz1-0                                                    ONLINE    0     0     0
            /dev/chassis/SUN-FIRE-X4270-M2-SERVER.unknown/HDD02/disk  ONLINE    0     0     0
            /dev/chassis/SUN-FIRE-X4270-M2-SERVER.unknown/HDD23/disk  ONLINE    0     0     0
            /dev/chassis/SUN-FIRE-X4270-M2-SERVER.unknown/HDD22/disk  ONLINE    0     0     0
            /dev/chassis/SUN-FIRE-X4270-M2-SERVER.unknown/HDD21/disk  ONLINE    0     0     0
            /dev/chassis/SUN-FIRE-X4270-M2-SERVER.unknown/HDD20/disk  ONLINE    0     0     0
            /dev/chassis/SUN-FIRE-X4270-M2-SERVER.unknown/HDD19/disk  ONLINE    0     0     0
            /dev/chassis/SUN-FIRE-X4270-M2-SERVER.unknown/HDD17/disk  ONLINE    0     0     0
            /dev/chassis/SUN-FIRE-X4270-M2-SERVER.unknown/HDD15/disk  ONLINE    0     0     0

errors: No known data errors

Best regards,
 Robert Milkowski
 http://milek.blogspot.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance on LSI 9240-8i?

2012-05-09 Thread Roman Matiyenko
Thanks for the tips, everybody!

Progress report:

OpenIndiana failed to recognise the LSI 9240-8i's. I installed the 4.7 drivers
from the LSI website (for Solaris 11 and up) but it started throwing
"component failed" messages. So I gave up on the 9240's and re-flashed
them into 9211-8i's (IT mode). Solaris 11 (11/11) recognised the 9211
adapters instantly, and so far they show perfect performance with the default
drivers: dd tests on raw disks, both reading and writing, and also dd
writing into a zpool built of 10 two-way mirrors. The speed is
around 1GB/s. There are still some hiccups in this sequential write
process (for 4-5 seconds the speed suddenly drops on all disks when
monitored by iostat, but then picks back up to the usual 140MB/s per disk).
This is so much better than Solaris 11 with the 9240's, which went persistently
at around 3-4MB/s per disk on a simple dd sequential write. I am pleased with
this performance.
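
In case anyone wants to reproduce the pool-level number: the test meant here is
just a large sequential dd into a file on the pool, roughly like this sketch
(the pool mount point is made up):

dd if=/dev/zero of=/tank/ddtest.bin bs=1024k count=10240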

Now, if anyone is still reading, I have another question. The new
Solaris 11 device naming convention hides the physical tree from me. I
get just a list of long disk names all starting with c0 (see below),
but I need to know which disk is connected to which controller, so that
I can attach the two sides of each mirror to two different controllers
in order to tolerate a single controller failure. I need a way of
figuring out the connection path for each disk. I hope I managed to
explain what I want.
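
Once the controller behind each WWN is known, the layout would look roughly
like the following sketch (illustrative only: the pairing simply reuses disk
names from the format output below and assumes each pair spans two different
controllers; the remaining mirrors would follow the same pattern):

zpool create tank \
  mirror c0t5000CCA225CEFC73d0 c0t5000CCA225CF016Cd0 \
  mirror c0t5000CCA225CEFD0Bd0 c0t5000CCA225CF016Dd0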


root@carbon:~# echo | format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
   0. c0t5000CCA225CEFC73d0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cefc73
   1. c0t5000CCA225CEFD0Bd0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cefd0b
   2. c0t5000CCA225CEFD12d0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cefd12
   3. c0t5000CCA225CEFEDEd0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cefede
   4. c0t5000CCA225CEFEE7d0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cefee7
   5. c0t5000CCA225CF016Cd0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cf016c
   6. c0t5000CCA225CF016Dd0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cf016d
   7. c0t5000CCA225CF016Ed0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cf016e
   8. c0t5000CCA225CF023Cd0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cf023c
   9. c0t5000CCA225CF042Cd0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cf042c
  10. c0t5000CCA225CF050Fd0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cf050f
  11. c0t5000CCA225CF0115d0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cf0115
  12. c0t5000CCA225CF0119d0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cf0119
  13. c0t5000CCA225CF0144d0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cf0144
  14. c0t5000CCA225CF0156d0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cf0156
  15. c0t5000CCA225CF0167d0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cf0167
  16. c0t5000CCA225CF0419d0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cf0419
  17. c0t5000CCA225CF0420d0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cf0420
  18. c0t5000CCA225CF0517d0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cf0517
  19. c0t5000CCA225CF0522d0 ATA-Hitachi HUA72303-A5C0-2.73TB
  /scsi_vhci/disk@g5000cca225cf0522
  20. c0t5001517BB27B5896d0 ATA-INTEL SSDSC2CW24-400i-223.57GB
  /scsi_vhci/disk@g5001517bb27b5896
  21. c0t5001517BB27DCE0Bd0 ATA-INTEL SSDSC2CW24-400i-223.57GB
  /scsi_vhci/disk@g5001517bb27dce0b
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance on LSI 9240-8i?

2012-05-09 Thread Roman Matiyenko
I followed this guide but instead of 2108it.bin I downloaded the
latest firmware file for 9211-8i from LSI web site. I now have three
9211's! :)

http://lime-technology.com/forum/index.php?topic=12767.msg124393#msg124393





On 4 May 2012 18:33, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote:
 On Fri, 4 May 2012, Rocky Shek wrote:


 If I were you, I would not use the 9240-8i.

 I would use the 9211-8i as a pure HBA with IT FW for ZFS.


 Is there IT FW for the 9240-8i?

 They seem to use the same SAS chipset.

 My next system will have 9211-8i with IT FW.  Playing it safe.  Good enough
 for Nexenta is good enough for me.

 Bob

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance on LSI 9240-8i?

2012-05-09 Thread Roman Matiyenko
Downloaded, unzipped and flying! It shows the GUID, which is part of the
/dev/rdsk/c0t* name! Thanks!!! And thanks again! This message goes
to the group.

root@carbon:~/bin/LSI-SAS2IRCU/SAS2IRCU_P13/sas2ircu_solaris_x86_rel#
./sas2ircu 0 DISPLAY | grep GUID
  GUID: 5000cca225cefd12
  GUID: 5000cca225cf0119
  GUID: 5000cca225cefd0b
  GUID: 5000cca225cf0420
  GUID: 5000cca225cf0517
  GUID: 5000cca225cf0115
  GUID: 5000cca225cf016d
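
A small follow-up sketch (assuming the three HBAs are adapters 0-2 and that
GUIDs are reported as shown above) to map each controller's disks to their
/dev/rdsk names:

for c in 0 1 2; do
  echo "== controller $c =="
  ./sas2ircu $c DISPLAY | awk '/GUID/ {print toupper($NF)}' | while read guid; do
    ls /dev/rdsk/c0t${guid}d0* 2>/dev/null | head -1
  done
done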




On 9 May 2012 15:32, Daniel J. Priem daniel.pr...@disy.net wrote:

 http://www.lsi.com/channel/products/storagecomponents/Pages/LSISAS9211-8i.aspx

 select SUPPORT & DOWNLOADS
 download SAS2IRCU_P13

 best regards
 daniel



 Roman Matiyenko rmatiye...@gmail.com writes:

 Thanks!

 The LSI 9211-8i's are not recognised by lsiutil. I run this S11 under VMware
 ESXi with the three PCI devices in pass-through mode. The main
 (virtual) disk controller is LSI as well, with a VMDK boot disk
 attached, and it is recognised. Nevertheless, many thanks for trying
 to help!

 Roman
 PS I got your other message with links, will see now...

 root@carbon:~/bin# ./lsiutil

 LSI Logic MPT Configuration Utility, Version 1.62, January 14, 2009

 1 MPT Port found

      Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC
  1.  mpt0              LSI Logic 53C1030 B0      102      01032920     0

 Select a device:  [1-1 or 0 to quit] 1

  1.  Identify firmware, BIOS, and/or FCode
  2.  Download firmware (update the FLASH)
  4.  Download/erase BIOS and/or FCode (update the FLASH)
  8.  Scan for devices
 10.  Change IOC settings (interrupt coalescing)
 11.  Change SCSI Initiator settings
 12.  Change SCSI Target settings
 20.  Diagnostics
 21.  RAID actions
 22.  Reset bus
 23.  Reset target
 42.  Display operating system names for devices
 59.  Dump PCI config space
 60.  Show non-default settings
 61.  Restore default settings
 69.  Show board manufacturing information
 99.  Reset port
  e   Enable expert mode in menus
  p   Enable paged mode
  w   Enable logging

 Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 42

 mpt0 is /dev/cfg/c4

  B___T___L  Type       Operating System Device Name
  0   0   0  Disk       /dev/rdsk/c4t0d0s2

 Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 8

 53C1030's host SCSI ID is 7

  B___T___L  Type       Vendor   Product          Rev   Negotiated Speed & Width
  0   0   0  Disk       VMware   Virtual disk     1.0   Ultra4 Wide, 320 MB/sec





 On 9 May 2012 15:08, Daniel J. Priem daniel.pr...@disy.net wrote:
 attached.
 i didn't know where to download




 Roman Matiyenko rmatiye...@gmail.com writes:

 Hi Daniel,
 Thanks. Where do I get lsiutil? I am on Oracle Solaris 11.
 The LSI website says that for the 9211-8i you don't need to install drivers, as
 they come with the Solaris OS. So they don't have anything to download for
 Solaris.
 Roman

 On 9 May 2012 14:22, Daniel J. Priem daniel.pr...@disy.net wrote:
 Hi,

 Roman Matiyenko rmatiye...@gmail.com writes:

 Now, if anyone is still reading, I have another question. The new
 Solaris 11 device naming convention hides the physical tree from me. I
 get just a list of long disk names all starting with c0 (see below),
 but I need to know which disk is connected to which controller, so that
 I can attach the two sides of each mirror to two different controllers
 in order to tolerate a single controller failure. I need a way of
 figuring out the connection path for each disk. I hope I managed to
 explain what I want.

 lsiutil
 select controller
 select option 42

 Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 42

 mpt2 is /dev/cfg/c8

  B___T___L  Type       Operating System Device Name
  0   0   0  Disk       /dev/rdsk/c8t0d0s2
  0   1   0  Disk       /dev/rdsk/c8t1d0s2
  0   2   0  Tape       /dev/rmt/0
  0   3   0  Disk       /dev/rdsk/c8t3d0s2
  0   4   0  Disk       /dev/rdsk/c8t4d0s2

 Main menu, select an option:  [1-99 or e/p/w or 0 to quit]

 Best Regards
 Daniel

 --
 disy Informationssysteme GmbH
 Daniel Priem
 Senior Netzwerk- und Systemadministrator
 Tel: +49 721 1 600 6-000, Fax: -05, E-Mail: daniel.pr...@disy.net

 Firmensitz: Erbprinzenstr. 4-12, 76133 Karlsruhe
 Registergericht: Amtsgericht Mannheim, HRB 107964
 Geschäftsführer: Claus Hofmann


 --
 disy Informationssysteme GmbH
 Daniel Priem
 Senior Netzwerk- und Systemadministrator
 Tel: +49 721 1 600 6-000, Fax: -05, E-Mail: daniel.pr...@disy.net

 Firmensitz: Erbprinzenstr. 4-12, 76133 Karlsruhe
 Registergericht: Amtsgericht Mannheim, HRB 107964
 Geschäftsführer: Claus Hofmann
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS performance on LSI 9240-8i?

2012-05-04 Thread Roman Matiyenko
Hi all,

I have a bad bad problem with our brand new server!

The lengthy details are below but to cut the story short, on the same
hardware (3 x LSI 9240-8i, 20 x 3TB 6gb HDDs) I am getting ZFS
sequential writes of 1.4GB/s on Solaris 10 (20 disks, 10 mirrors) and
only 200-240MB/s on latest Solaris 11.11 (same zpool config). By
writing directly to raw disks I found that in S10 the speed is 140MB/s
sequential writes per disk (consistent with combined 1.4GB/s for my
zpool) whereas only 24MB/s in Solaris 11 (consistent with 240MB/s
zpool, 10 mirrors 24MB/s each).
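
For anyone wanting to repeat the raw-disk comparison, the per-disk test meant
here is plain sequential dd against the raw device, roughly like this sketch
(device name illustrative; the write form destroys whatever is on that disk):

dd if=/dev/rdsk/c5t8d1 of=/dev/null bs=1024k count=4096   # sequential read
dd if=/dev/zero of=/dev/rdsk/c5t8d1 bs=1024k count=4096   # sequential write (destructive)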

This must be the controller drivers, right? I downloaded drivers
version 4.7 off the LSI site (it says "for Solaris 10 and later") - they
failed to attach on S11. Version 3.03 worked, but the system would
randomly crash, so I moved my experiments off S11 to S10. However, S10
has only the old implementation of iSCSI, which gives me other problems,
so I decided to give S11 another go.

Would there be any advice in this community?

Many thanks!

Roman

==


root@carbon:~# echo | format | grep Hitachi
  1. c5t8d1 ATA-Hitachi HUA72303-A5C0-2.73TB
  2. c5t9d1 ATA-Hitachi HUA72303-A5C0-2.73TB
  3. c5t10d1 ATA-Hitachi HUA72303-A5C0-2.73TB
  4. c5t11d1 ATA-Hitachi HUA72303-A5C0-2.73TB
  5. c5t13d1 ATA-Hitachi HUA72303-A5C0-2.73TB
  6. c5t14d1 ATA-Hitachi HUA72303-A5C0-2.73TB
  7. c5t15d1 ATA-Hitachi HUA72303-A5C0-2.73TB
  9. c6t9d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 10. c6t10d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 11. c6t11d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 12. c6t13d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 13. c6t14d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 14. c6t15d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 15. c7t8d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 17. c7t10d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 18. c7t11d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 19. c7t12d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 20. c7t13d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 21. c7t14d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 22. c7t15d1 ATA-Hitachi HUA72303-A5C0-2.73TB



Reading DD from all disks:
(dd of=/dev/null bs=1024kb if=/dev/rdsk/c7t9d1 )

# iostat -xznM 2

                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
  614.5    0.0  153.6    0.0  0.0  1.0    0.0    1.6   0  98 c5t8d1
  595.5    0.0  148.9    0.0  0.0  1.0    0.0    1.7   0  99 c7t8d1
 1566.5    0.0  391.6    0.0  0.0  1.0    0.0    0.6   1  96 c6t8d1  # (SSD)
  618.5    0.0  154.6    0.0  0.0  1.0    0.0    1.6   0  99 c6t9d1
  616.5    0.0  154.1    0.0  0.0  1.0    0.0    1.6   0  99 c5t9d1
 1564.0    0.0  391.0    0.0  0.0  1.0    0.0    0.6   1  96 c7t9d1  # (SSD)
  616.0    0.0  154.0    0.0  0.0  1.0    0.0    1.6   0  98 c7t10d1
  554.0    0.0  138.5    0.0  0.0  1.0    0.0    1.8   0  99 c6t10d1
  598.5    0.0  149.6    0.0  0.0  1.0    0.0    1.7   0  99 c5t10d1
  588.5    0.0  147.1    0.0  0.0  1.0    0.0    1.7   0  98 c6t11d1
  590.5    0.0  147.6    0.0  0.0  1.0    0.0    1.7   0  98 c7t11d1
  591.5    0.0  147.9    0.0  0.0  1.0    0.0    1.7   0  99 c5t11d1
  600.5    0.0  150.1    0.0  0.0  1.0    0.0    1.6   0  98 c6t13d1
  617.5    0.0  154.4    0.0  0.0  1.0    0.0    1.6   0  99 c7t12d1
  611.0    0.0  152.8    0.0  0.0  1.0    0.0    1.6   0  99 c5t13d1
  625.0    0.0  156.3    0.0  0.0  1.0    0.0    1.6   0  99 c6t14d1
  592.5    0.0  148.1    0.0  0.0  1.0    0.0    1.7   0  99 c7t13d1
  596.0    0.0  149.0    0.0  0.0  1.0    0.0    1.7   0  99 c5t14d1
  598.5    0.0  149.6    0.0  0.0  1.0    0.0    1.6   0  98 c6t15d1
  618.5    0.0  154.6    0.0  0.0  1.0    0.0    1.6   0  98 c7t14d1
  606.5    0.0  151.6    0.0  0.0  1.0    0.0    1.6   0  98 c5t15d1
  625.0    0.0  156.3    0.0  0.0  1.0    0.0    1.6   0  98 c7t15d1
                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
  620.5    0.0  155.1    0.0  0.0  1.0    0.0    1.6   0  99 c5t8d1
  620.5    0.0  155.1    0.0  0.0  1.0    0.0    1.6   0  99 c7t8d1
 1581.0    0.0  395.2    0.0  0.0  1.0    0.0    0.6   1  96 c6t8d1
  611.5    0.0  152.9    0.0  0.0  1.0    0.0    1.6   0  99 c6t9d1
  587.5    0.0  146.9    0.0  0.0  1.0    0.0    1.7   0  99 c5t9d1
 1580.0    0.0  395.0    0.0  0.0  1.0    0.0    0.6   1  97 c7t9d1
  593.0    0.0  148.2    0.0  0.0  1.0    0.0    1.7   0  99 c7t10d1
  616.0    0.0  154.0    0.0  0.0  1.0    0.0    1.6   0  99 c6t10d1
  601.0    0.0  150.2    0.0  0.0  1.0    0.0    1.6   0  99 c5t10d1
  587.0    0.0  146.7    0.0  0.0  1.0    0.0    1.7   0  99 c6t11d1
  578.5    0.0  144.6    0.0  0.0  1.0    0.0    1.7   0  99 c7t11d1
  624.5    0.0  156.1    0.0  0.0  1.0    0.0    1.6   0  99 c5t11d1
  604.5    0.0  151.1    0.0  0.0  1.0    0.0    1.6   0  99 c6t13d1
  573.5    0.0  143.4    0.0  0.0  1.0    0.0    1.7   0  99 c7t12d1
  609.0    0.0  152.2    0.0  0.0  1.0    0.0    1.6   0  99 c5t13d1
  630.5    0.0  157.6    0.0  0.0  1.0    0.0    1.6   0  99 c6t14d1
  618.5    0.0

Re: [zfs-discuss] ZFS performance on LSI 9240-8i?

2012-05-04 Thread Richard Elling
/pci1000,9240@0 1 imraid_sas
 /pci@0,0/pci15ad,7a0@16/pci1000,9240@0/sd@8,1 4 sd
 /pci@0,0/pci15ad,7a0@16/pci1000,9240@0/sd@9,1 5 sd
 /pci@0,0/pci15ad,7a0@16/pci1000,9240@0/sd@a,1 9 sd
 /pci@0,0/pci15ad,7a0@16/pci1000,9240@0/sd@b,1 11 sd
 /pci@0,0/pci15ad,7a0@16/pci1000,9240@0/sd@d,1 14 sd
 /pci@0,0/pci15ad,7a0@16/pci1000,9240@0/sd@e,1 17 sd
 /pci@0,0/pci15ad,7a0@16/pci1000,9240@0/sd@f,1 20 sd
 /pci@0,0/pci15ad,7a0@17/pci1000,9240@0 2 imraid_sas
 /pci@0,0/pci15ad,7a0@17/pci1000,9240@0/sd@8,1 3 sd
 /pci@0,0/pci15ad,7a0@17/pci1000,9240@0/sd@9,1 7 sd
 /pci@0,0/pci15ad,7a0@17/pci1000,9240@0/sd@a,1 8 sd
 /pci@0,0/pci15ad,7a0@17/pci1000,9240@0/sd@b,1 12 sd
 /pci@0,0/pci15ad,7a0@17/pci1000,9240@0/sd@c,1 15 sd
 /pci@0,0/pci15ad,7a0@17/pci1000,9240@0/sd@d,1 18 sd
 /pci@0,0/pci15ad,7a0@17/pci1000,9240@0/sd@e,1 21 sd
 /pci@0,0/pci15ad,7a0@17/pci1000,9240@0/sd@f,1 23 sd
 
 root@carbon:~# grep imraid /etc/driver_aliases
 imraid_sas pciex1000,73
 
 #
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422







___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance on LSI 9240-8i?

2012-05-04 Thread Hung-Sheng Tsao Ph.D.


hi
S11 comes with its own driver for some LSI SAS HBAs,
but on the HCL I only see:
LSI SAS 9200-8e
http://www.oracle.com/webfolder/technetwork/hcl/data/components/details/lsi_logic/sol_11_11_11/9409.html

LSI MegaRAID SAS 9260-8i
http://www.oracle.com/webfolder/technetwork/hcl/data/components/details/lsi/sol_10_10_09/3264.html

LSI 6Gb SAS2008 daughtercard
http://www.oracle.com/webfolder/technetwork/hcl/data/components/details/lsi/sol_10_10_09/3263.html


regards


On 5/4/2012 8:25 AM, Roman Matiyenko wrote:

Hi all,

I have a bad bad problem with our brand new server!

The lengthy details are below but to cut the story short, on the same
hardware (3 x LSI 9240-8i, 20 x 3TB 6gb HDDs) I am getting ZFS
sequential writes of 1.4GB/s on Solaris 10 (20 disks, 10 mirrors) and
only 200-240MB/s on latest Solaris 11.11 (same zpool config). By
writing directly to raw disks I found that in S10 the speed is 140MB/s
sequential writes per disk (consistent with combined 1.4GB/s for my
zpool) whereas only 24MB/s in Solaris 11 (consistent with 240MB/s
zpool, 10 mirrors 24MB/s each).

This must be the controller drivers, right? I downloaded drivers
version 4.7 off LSI site (says for Solaris 10 and later) - they
failed to attach on S11. Version 3.03 worked but the system would
randomly crash, so I moved my experiments off S11 to S10. However, S10
has only the old implementation of iSCSI, which gives me other problems
so I decided to give S11 another go.

Would there be any advice in this community?

Many thanks!

Roman

==


root@carbon:~# echo | format | grep Hitachi
   1. c5t8d1   ATA-Hitachi HUA72303-A5C0-2.73TB
   2. c5t9d1   ATA-Hitachi HUA72303-A5C0-2.73TB
   3. c5t10d1  ATA-Hitachi HUA72303-A5C0-2.73TB
   4. c5t11d1  ATA-Hitachi HUA72303-A5C0-2.73TB
   5. c5t13d1  ATA-Hitachi HUA72303-A5C0-2.73TB
   6. c5t14d1  ATA-Hitachi HUA72303-A5C0-2.73TB
   7. c5t15d1  ATA-Hitachi HUA72303-A5C0-2.73TB
   9. c6t9d1   ATA-Hitachi HUA72303-A5C0-2.73TB
  10. c6t10d1  ATA-Hitachi HUA72303-A5C0-2.73TB
  11. c6t11d1  ATA-Hitachi HUA72303-A5C0-2.73TB
  12. c6t13d1  ATA-Hitachi HUA72303-A5C0-2.73TB
  13. c6t14d1  ATA-Hitachi HUA72303-A5C0-2.73TB
  14. c6t15d1  ATA-Hitachi HUA72303-A5C0-2.73TB
  15. c7t8d1   ATA-Hitachi HUA72303-A5C0-2.73TB
  17. c7t10d1  ATA-Hitachi HUA72303-A5C0-2.73TB
  18. c7t11d1  ATA-Hitachi HUA72303-A5C0-2.73TB
  19. c7t12d1  ATA-Hitachi HUA72303-A5C0-2.73TB
  20. c7t13d1  ATA-Hitachi HUA72303-A5C0-2.73TB
  21. c7t14d1  ATA-Hitachi HUA72303-A5C0-2.73TB
  22. c7t15d1  ATA-Hitachi HUA72303-A5C0-2.73TB



Reading DD from all disks:
(dd of=/dev/null bs=1024kb if=/dev/rdsk/c7t9d1)

# iostat -xznM 2

                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
  614.5    0.0  153.6    0.0  0.0  1.0    0.0    1.6   0  98 c5t8d1
  595.5    0.0  148.9    0.0  0.0  1.0    0.0    1.7   0  99 c7t8d1
 1566.5    0.0  391.6    0.0  0.0  1.0    0.0    0.6   1  96 c6t8d1  # (SSD)
  618.5    0.0  154.6    0.0  0.0  1.0    0.0    1.6   0  99 c6t9d1
  616.5    0.0  154.1    0.0  0.0  1.0    0.0    1.6   0  99 c5t9d1
 1564.0    0.0  391.0    0.0  0.0  1.0    0.0    0.6   1  96 c7t9d1  # (SSD)
  616.0    0.0  154.0    0.0  0.0  1.0    0.0    1.6   0  98 c7t10d1
  554.0    0.0  138.5    0.0  0.0  1.0    0.0    1.8   0  99 c6t10d1
  598.5    0.0  149.6    0.0  0.0  1.0    0.0    1.7   0  99 c5t10d1
  588.5    0.0  147.1    0.0  0.0  1.0    0.0    1.7   0  98 c6t11d1
  590.5    0.0  147.6    0.0  0.0  1.0    0.0    1.7   0  98 c7t11d1
  591.5    0.0  147.9    0.0  0.0  1.0    0.0    1.7   0  99 c5t11d1
  600.5    0.0  150.1    0.0  0.0  1.0    0.0    1.6   0  98 c6t13d1
  617.5    0.0  154.4    0.0  0.0  1.0    0.0    1.6   0  99 c7t12d1
  611.0    0.0  152.8    0.0  0.0  1.0    0.0    1.6   0  99 c5t13d1
  625.0    0.0  156.3    0.0  0.0  1.0    0.0    1.6   0  99 c6t14d1
  592.5    0.0  148.1    0.0  0.0  1.0    0.0    1.7   0  99 c7t13d1
  596.0    0.0  149.0    0.0  0.0  1.0    0.0    1.7   0  99 c5t14d1
  598.5    0.0  149.6    0.0  0.0  1.0    0.0    1.6   0  98 c6t15d1
  618.5    0.0  154.6    0.0  0.0  1.0    0.0    1.6   0  98 c7t14d1
  606.5    0.0  151.6    0.0  0.0  1.0    0.0    1.6   0  98 c5t15d1
  625.0    0.0  156.3    0.0  0.0  1.0    0.0    1.6   0  98 c7t15d1
                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
  620.5    0.0  155.1    0.0  0.0  1.0    0.0    1.6   0  99 c5t8d1
  620.5    0.0  155.1    0.0  0.0  1.0    0.0    1.6   0  99 c7t8d1
 1581.0    0.0  395.2    0.0  0.0  1.0    0.0    0.6   1  96 c6t8d1
  611.5    0.0  152.9    0.0  0.0  1.0    0.0    1.6   0  99 c6t9d1
  587.5    0.0  146.9    0.0  0.0  1.0    0.0    1.7   0  99 c5t9d1
 1580.0    0.0  395.0    0.0  0.0  1.0    0.0    0.6   1  97 c7t9d1
  593.0    0.0  148.2    0.0  0.0  1.0    0.0    1.7   0  99 c7t10d1
  616.0    0.0  154.0    0.0  0.0  1.0    0.0    1.6   0  99 c6t10d1

Re: [zfs-discuss] ZFS performance on LSI 9240-8i?

2012-05-04 Thread Rocky Shek
Roman,

 

If I were you, I would not use the 9240-8i.

I would use the 9211-8i as a pure HBA with IT FW for ZFS.

 

Rocky

 

 

From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Richard Elling
Sent: Friday, May 04, 2012 8:00 AM
To: Roman Matiyenko
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] ZFS performance on LSI 9240-8i?

 

On May 4, 2012, at 5:25 AM, Roman Matiyenko wrote:





Hi all,

I have a bad bad problem with our brand new server!

The lengthy details are below but to cut the story short, on the same
hardware (3 x LSI 9240-8i, 20 x 3TB 6gb HDDs) I am getting ZFS
sequential writes of 1.4GB/s on Solaris 10 (20 disks, 10 mirrors) and
only 200-240MB/s on latest Solaris 11.11 (same zpool config). By
writing directly to raw disks I found that in S10 the speed is 140MB/s
sequential writes per disk (consistent with combined 1.4GB/s for my
zpool) whereas only 24MB/s in Solaris 11 (consistent with 240MB/s
zpool, 10 mirrors 24MB/s each).

This must be the controller drivers, right? I downloaded drivers
version 4.7 off LSI site (says for Solaris 10 and later) - they
failed to attach on S11. Version 3.03 worked but the system would
randomly crash, so I moved my experiments off S11 to S10. However, S10
has only the old implementation of iSCSI, which gives me other problems
so I decided to give S11 another go.

Would there be any advice in this community?

 

Look at one of the other distros, OpenIndiana is a good first step.

 -- richard






Many thanks!

Roman

==


root@carbon:~# echo | format | grep Hitachi
 1. c5t8d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 2. c5t9d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 3. c5t10d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 4. c5t11d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 5. c5t13d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 6. c5t14d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 7. c5t15d1 ATA-Hitachi HUA72303-A5C0-2.73TB
 9. c6t9d1 ATA-Hitachi HUA72303-A5C0-2.73TB
10. c6t10d1 ATA-Hitachi HUA72303-A5C0-2.73TB
11. c6t11d1 ATA-Hitachi HUA72303-A5C0-2.73TB
12. c6t13d1 ATA-Hitachi HUA72303-A5C0-2.73TB
13. c6t14d1 ATA-Hitachi HUA72303-A5C0-2.73TB
14. c6t15d1 ATA-Hitachi HUA72303-A5C0-2.73TB
15. c7t8d1 ATA-Hitachi HUA72303-A5C0-2.73TB
17. c7t10d1 ATA-Hitachi HUA72303-A5C0-2.73TB
18. c7t11d1 ATA-Hitachi HUA72303-A5C0-2.73TB
19. c7t12d1 ATA-Hitachi HUA72303-A5C0-2.73TB
20. c7t13d1 ATA-Hitachi HUA72303-A5C0-2.73TB
21. c7t14d1 ATA-Hitachi HUA72303-A5C0-2.73TB
22. c7t15d1 ATA-Hitachi HUA72303-A5C0-2.73TB



Reading DD from all disks:
(dd of=/dev/null bs=1024kb if=/dev/rdsk/c7t9d1 )

# iostat -xznM 2

                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
  614.5    0.0  153.6    0.0  0.0  1.0    0.0    1.6   0  98 c5t8d1
  595.5    0.0  148.9    0.0  0.0  1.0    0.0    1.7   0  99 c7t8d1
 1566.5    0.0  391.6    0.0  0.0  1.0    0.0    0.6   1  96 c6t8d1  # (SSD)
  618.5    0.0  154.6    0.0  0.0  1.0    0.0    1.6   0  99 c6t9d1
  616.5    0.0  154.1    0.0  0.0  1.0    0.0    1.6   0  99 c5t9d1
 1564.0    0.0  391.0    0.0  0.0  1.0    0.0    0.6   1  96 c7t9d1  # (SSD)
  616.0    0.0  154.0    0.0  0.0  1.0    0.0    1.6   0  98 c7t10d1
  554.0    0.0  138.5    0.0  0.0  1.0    0.0    1.8   0  99 c6t10d1
  598.5    0.0  149.6    0.0  0.0  1.0    0.0    1.7   0  99 c5t10d1
  588.5    0.0  147.1    0.0  0.0  1.0    0.0    1.7   0  98 c6t11d1
  590.5    0.0  147.6    0.0  0.0  1.0    0.0    1.7   0  98 c7t11d1
  591.5    0.0  147.9    0.0  0.0  1.0    0.0    1.7   0  99 c5t11d1
  600.5    0.0  150.1    0.0  0.0  1.0    0.0    1.6   0  98 c6t13d1
  617.5    0.0  154.4    0.0  0.0  1.0    0.0    1.6   0  99 c7t12d1
  611.0    0.0  152.8    0.0  0.0  1.0    0.0    1.6   0  99 c5t13d1
  625.0    0.0  156.3    0.0  0.0  1.0    0.0    1.6   0  99 c6t14d1
  592.5    0.0  148.1    0.0  0.0  1.0    0.0    1.7   0  99 c7t13d1
  596.0    0.0  149.0    0.0  0.0  1.0    0.0    1.7   0  99 c5t14d1
  598.5    0.0  149.6    0.0  0.0  1.0    0.0    1.6   0  98 c6t15d1
  618.5    0.0  154.6    0.0  0.0  1.0    0.0    1.6   0  98 c7t14d1
  606.5    0.0  151.6    0.0  0.0  1.0    0.0    1.6   0  98 c5t15d1
  625.0    0.0  156.3    0.0  0.0  1.0    0.0    1.6   0  98 c7t15d1
                    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
  620.5    0.0  155.1    0.0  0.0  1.0    0.0    1.6   0  99 c5t8d1
  620.5    0.0  155.1    0.0  0.0  1.0    0.0    1.6   0  99 c7t8d1
 1581.0    0.0  395.2    0.0  0.0  1.0    0.0    0.6   1  96 c6t8d1
  611.5    0.0  152.9    0.0  0.0  1.0    0.0    1.6   0  99 c6t9d1
  587.5    0.0  146.9    0.0  0.0  1.0    0.0    1.7   0  99 c5t9d1
 1580.0    0.0  395.0    0.0  0.0  1.0    0.0    0.6   1  97 c7t9d1
  593.0    0.0  148.2    0.0  0.0  1.0    0.0    1.7   0  99 c7t10d1
  616.0    0.0  154.0    0.0  0.0  1.0    0.0    1.6   0  99 c6t10d1
  601.0    0.0  150.2    0.0  0.0  1.0    0.0    1.6   0  99 c5t10d1
  587.0    0.0

Re: [zfs-discuss] ZFS performance on LSI 9240-8i?

2012-05-04 Thread Bob Friesenhahn

On Fri, 4 May 2012, Rocky Shek wrote:



If I were you, I would not use the 9240-8i.

I would use the 9211-8i as a pure HBA with IT FW for ZFS.


Is there IT FW for the 9240-8i?

They seem to use the same SAS chipset.

My next system will have 9211-8i with IT FW.  Playing it safe.  Good 
enough for Nexenta is good enough for me.


Bob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance on LSI 9240-8i?

2012-05-04 Thread Hugues LEPESANT
Hi,

 
We had several bad experiences with LSI cards (LSI 3081E, LSI SAS84016E).

Even with the official Solaris drivers provided by LSI.

Finally we used the LSI SAS9201-16i card.

http://www.lsi.com/channel/france/products/storagecomponents/Pages/LSISAS9201-16i.aspx

This one works as expected on Nexenta and OpenIndiana.

 
 
Best regards,

Hugues

 
-----Original Message-----
From: Roman Matiyenko rmatiye...@gmail.com
Sent: Fri 04-05-2012 14:25
Subject: [zfs-discuss] ZFS performance on LSI 9240-8i?
To: zfs-discuss@opensolaris.org;
Hi all,

I have a bad bad problem with our brand new server!

The lengthy details are below but to cut the story short, on the same
hardware (3 x LSI 9240-8i, 20 x 3TB 6gb HDDs) I am getting ZFS
sequential writes of 1.4GB/s on Solaris 10 (20 disks, 10 mirrors) and
only 200-240MB/s on latest Solaris 11.11 (same zpool config). By
writing directly to raw disks I found that in S10 the speed is 140MB/s
sequential writes per disk (consistent with combined 1.4GB/s for my
zpool) whereas only 24MB/s in Solaris 11 (consistent with 240MB/s
zpool, 10 mirrors 24MB/s each).

This must be the controller drivers, right? I downloaded drivers
version 4.7 off LSI site (says for Solaris 10 and later) - they
failed to attach on S11. Version 3.03 worked but the system would
randomly crash, so I moved my experiments off S11 to S10. However, S10
has only the old implementation of iSCSI, which gives me other problems
so I decided to give S11 another go.

Would there be any advice in this community?

Many thanks!

Roman

==


root@carbon:~# echo | format | grep Hitachi
      1. c5t8d1 ATA-Hitachi HUA72303-A5C0-2.73TB
      2. c5t9d1 ATA-Hitachi HUA72303-A5C0-2.73TB
      3. c5t10d1 ATA-Hitachi HUA72303-A5C0-2.73TB
      4. c5t11d1 ATA-Hitachi HUA72303-A5C0-2.73TB
      5. c5t13d1 ATA-Hitachi HUA72303-A5C0-2.73TB
      6. c5t14d1 ATA-Hitachi HUA72303-A5C0-2.73TB
      7. c5t15d1 ATA-Hitachi HUA72303-A5C0-2.73TB
      9. c6t9d1 ATA-Hitachi HUA72303-A5C0-2.73TB
     10. c6t10d1 ATA-Hitachi HUA72303-A5C0-2.73TB
     11. c6t11d1 ATA-Hitachi HUA72303-A5C0-2.73TB
     12. c6t13d1 ATA-Hitachi HUA72303-A5C0-2.73TB
     13. c6t14d1 ATA-Hitachi HUA72303-A5C0-2.73TB
     14. c6t15d1 ATA-Hitachi HUA72303-A5C0-2.73TB
     15. c7t8d1 ATA-Hitachi HUA72303-A5C0-2.73TB
     17. c7t10d1 ATA-Hitachi HUA72303-A5C0-2.73TB
     18. c7t11d1 ATA-Hitachi HUA72303-A5C0-2.73TB
     19. c7t12d1 ATA-Hitachi HUA72303-A5C0-2.73TB
     20. c7t13d1 ATA-Hitachi HUA72303-A5C0-2.73TB
     21. c7t14d1 ATA-Hitachi HUA72303-A5C0-2.73TB
     22. c7t15d1 ATA-Hitachi HUA72303-A5C0-2.73TB



Reading DD from all disks:
(dd of=/dev/null bs=1024kb if=/dev/rdsk/c7t9d1 )

# iostat -xznM 2

                   extended device statistics
   r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
 614.5    0.0  153.6    0.0  0.0  1.0    0.0    1.6   0  98 c5t8d1
 595.5    0.0  148.9    0.0  0.0  1.0    0.0    1.7   0  99 c7t8d1
 1566.5    0.0  391.6    0.0  0.0  1.0    0.0    0.6   1  96 c6t8d1 # (SSD)
 618.5    0.0  154.6    0.0  0.0  1.0    0.0    1.6   0  99 c6t9d1
 616.5    0.0  154.1    0.0  0.0  1.0    0.0    1.6   0  99 c5t9d1
 1564.0    0.0  391.0    0.0  0.0  1.0    0.0    0.6   1  96 c7t9d1# (SSD)
 616.0    0.0  154.0    0.0  0.0  1.0    0.0    1.6   0  98 c7t10d1
 554.0    0.0  138.5    0.0  0.0  1.0    0.0    1.8   0  99 c6t10d1
 598.5    0.0  149.6    0.0  0.0  1.0    0.0    1.7   0  99 c5t10d1
 588.5    0.0  147.1    0.0  0.0  1.0    0.0    1.7   0  98 c6t11d1
 590.5    0.0  147.6    0.0  0.0  1.0    0.0    1.7   0  98 c7t11d1
 591.5    0.0  147.9    0.0  0.0  1.0    0.0    1.7   0  99 c5t11d1
 600.5    0.0  150.1    0.0  0.0  1.0    0.0    1.6   0  98 c6t13d1
 617.5    0.0  154.4    0.0  0.0  1.0    0.0    1.6   0  99 c7t12d1
 611.0    0.0  152.8    0.0  0.0  1.0    0.0    1.6   0  99 c5t13d1
 625.0    0.0  156.3    0.0  0.0  1.0    0.0    1.6   0  99 c6t14d1
 592.5    0.0  148.1    0.0  0.0  1.0    0.0    1.7   0  99 c7t13d1
 596.0    0.0  149.0    0.0  0.0  1.0    0.0    1.7   0  99 c5t14d1
 598.5    0.0  149.6    0.0  0.0  1.0    0.0    1.6   0  98 c6t15d1
 618.5    0.0  154.6    0.0  0.0  1.0    0.0    1.6   0  98 c7t14d1
 606.5    0.0  151.6    0.0  0.0  1.0    0.0    1.6   0  98 c5t15d1
 625.0    0.0  156.3    0.0  0.0  1.0    0.0    1.6   0  98 c7t15d1
                   extended device statistics
   r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
 620.5    0.0  155.1    0.0  0.0  1.0    0.0    1.6   0  99 c5t8d1
 620.5    0.0  155.1    0.0  0.0  1.0    0.0    1.6   0  99 c7t8d1
 1581.0    0.0  395.2    0.0  0.0  1.0    0.0    0.6   1  96 c6t8d1
 611.5    0.0  152.9    0.0  0.0  1.0    0.0    1.6   0  99 c6t9d1
 587.5    0.0  146.9    0.0  0.0  1.0    0.0    1.7   0  99 c5t9d1
 1580.0    0.0  395.0    0.0  0.0  1.0    0.0    0.6   1  97 c7t9d1
 593.0    0.0  148.2    0.0  0.0  1.0    0.0    1.7   0  99 c7t10d1
 616.0    0.0  154.0    0.0  0.0  1.0    0.0    1.6   0  99 c6t10d1
 601.0    0.0  150.2    0.0

Re: [zfs-discuss] ZFS performance question over NFS

2011-08-19 Thread Thomas Nau
Hi Bob

 I don't know what the request pattern from filebench looks like but it seems 
 like your ZEUS RAM devices are not keeping up or
 else many requests are bypassing the ZEUS RAM devices.
 
 Note that very large synchronous writes will bypass your ZEUS RAM device and 
 go directly to a log in the main store.  Small (<= 128K) writes should
 directly benefit from the dedicated zil device.
 
 Find a copy of zilstat.ksh and run it while filebench is running in order to 
 understand more about what is going on.
 
 Bob

The pattern looks like:

   N-Bytes  N-Bytes/s N-Max-Rate    B-Bytes  B-Bytes/s B-Max-Rate  ops  <=4kB 4-32kB >=32kB
   9588656    9588656    9588656   88399872   88399872   88399872   90      0      0     90
   6662280    6662280    6662280   87031808   87031808   87031808   83      0      0     83
   6366728    6366728    6366728   72790016   72790016   72790016   79      0      0     79
   6316352    6316352    6316352   83886080   83886080   83886080   80      0      0     80
   6687616    6687616    6687616   84594688   84594688   84594688   92      0      0     92
   4909048    4909048    4909048   69238784   69238784   69238784   73      0      0     73
   6605280    6605280    6605280   81924096   81924096   81924096   79      0      0     79
   6895336    6895336    6895336   81625088   81625088   81625088   85      0      0     85
   6532128    6532128    6532128   87486464   87486464   87486464   90      0      0     90
   6925136    6925136    6925136   86118400   86118400   86118400   83      0      0     83

So does it look good, bad or ugly ;)

Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS performance question over NFS

2011-08-18 Thread Thomas Nau
Dear all.
We finally got all the parts for our new fileserver following several
recommendations we got over this list. We use

Dell R715, 96GB RAM, dual 8-core Opterons
1 10GE Intel dual-port NIC
2 LSI 9205-8e SAS controllers
2 DataON DNS-1600 JBOD chassis
46 Seagate constellation SAS drives
2 STEC ZEUS RAM


The base zpool config utilizes 42 drives plus the STECs as mirrored
log devices. The Seagates are set up as a stripe of 7 six-drive RAIDZ2
chunks plus, as said, a dedicated ZIL made of the mirrored STECs.

As a quick'n dirty check we ran filebench with the fileserver
workload. Running locally we get

statfile1            5476ops/s   0.0mb/s      0.6ms/op      179us/op-cpu
deletefile1          5476ops/s   0.0mb/s      1.0ms/op      454us/op-cpu
closefile3           5476ops/s   0.0mb/s      0.0ms/op        5us/op-cpu
readfile1            5476ops/s 729.5mb/s      0.2ms/op      128us/op-cpu
openfile2            5477ops/s   0.0mb/s      0.8ms/op      204us/op-cpu
closefile2           5477ops/s   0.0mb/s      0.0ms/op        5us/op-cpu
appendfilerand1      5477ops/s  42.8mb/s      0.3ms/op      184us/op-cpu
openfile1            5477ops/s   0.0mb/s      0.9ms/op      209us/op-cpu
closefile1           5477ops/s   0.0mb/s      0.0ms/op        6us/op-cpu
wrtfile1             5477ops/s 688.4mb/s      0.4ms/op      220us/op-cpu
createfile1          5477ops/s   0.0mb/s      2.7ms/op     1068us/op-cpu



with a single remote client (similar Dell System) using NFS

statfile1  90ops/s   0.0mb/s 27.6ms/op  145us/op-cpu
deletefile190ops/s   0.0mb/s 64.5ms/op  401us/op-cpu
closefile3 90ops/s   0.0mb/s 25.8ms/op   40us/op-cpu
readfile1  90ops/s  11.4mb/s  3.1ms/op  363us/op-cpu
openfile2  90ops/s   0.0mb/s 66.0ms/op  263us/op-cpu
closefile2 90ops/s   0.0mb/s 22.6ms/op  124us/op-cpu
appendfilerand190ops/s   0.7mb/s  0.5ms/op  101us/op-cpu
openfile1  90ops/s   0.0mb/s 72.6ms/op  269us/op-cpu
closefile1 90ops/s   0.0mb/s 43.6ms/op  189us/op-cpu
wrtfile1   90ops/s  11.2mb/s  0.2ms/op  211us/op-cpu
createfile190ops/s   0.0mb/s226.5ms/op  709us/op-cpu



the same remote client with zpool sync disabled on the server

statfile1 479ops/s   0.0mb/s  6.2ms/op  130us/op-cpu
deletefile1   479ops/s   0.0mb/s 13.0ms/op  351us/op-cpu
closefile3480ops/s   0.0mb/s  3.0ms/op   37us/op-cpu
readfile1 480ops/s  62.7mb/s  0.8ms/op  174us/op-cpu
openfile2 480ops/s   0.0mb/s 14.1ms/op  235us/op-cpu
closefile2480ops/s   0.0mb/s  6.0ms/op  123us/op-cpu
appendfilerand1   480ops/s   3.7mb/s  0.2ms/op   53us/op-cpu
openfile1 480ops/s   0.0mb/s 13.7ms/op  235us/op-cpu
closefile1480ops/s   0.0mb/s 11.1ms/op  190us/op-cpu
wrtfile1  480ops/s  60.3mb/s  0.2ms/op  233us/op-cpu
createfile1   480ops/s   0.0mb/s 35.6ms/op  683us/op-cpu
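
For completeness, the "sync disabled" run corresponds to toggling the
dataset-level sync property, roughly like this sketch (the dataset name is
made up, and this is for benchmarking only):

zfs get sync tank/export
zfs set sync=disabled tank/export
# ... run the benchmark ...
zfs set sync=standard tank/export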


Disabling the ZIL is not an option, but I expected much better performance;
in particular, the ZEUS RAM only gets us a speed-up of about 1.8x.

Is this test realistic for a typical fileserver scenario or does it require many
more clients to push the limits?

Thanks
Thomas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance question over NFS

2011-08-18 Thread Tim Cook
What are the specs on the client?
On Aug 18, 2011 10:28 AM, Thomas Nau thomas@uni-ulm.de wrote:
 Dear all.
 We finally got all the parts for our new fileserver following several
 recommendations we got over this list. We use

 Dell R715, 96GB RAM, dual 8-core Opterons
 1 10GE Intel dual-port NIC
 2 LSI 9205-8e SAS controllers
 2 DataON DNS-1600 JBOD chassis
 46 Seagate constellation SAS drives
 2 STEC ZEUS RAM


 The base zpool config utilizes 42 drives plus the STECs as mirrored
 log devices. The Seagates are set up as a stripe of 7 six-drive RAIDZ2
 chunks plus, as said, a dedicated ZIL made of the mirrored STECs.

 As a quick'n dirty check we ran filebench with the fileserver
 workload. Running locally we get

 statfile1 5476ops/s 0.0mb/s 0.6ms/op 179us/op-cpu
 deletefile1 5476ops/s 0.0mb/s 1.0ms/op 454us/op-cpu
 closefile3 5476ops/s 0.0mb/s 0.0ms/op 5us/op-cpu
 readfile1 5476ops/s 729.5mb/s 0.2ms/op 128us/op-cpu
 openfile2 5477ops/s 0.0mb/s 0.8ms/op 204us/op-cpu
 closefile2 5477ops/s 0.0mb/s 0.0ms/op 5us/op-cpu
 appendfilerand1 5477ops/s 42.8mb/s 0.3ms/op 184us/op-cpu
 openfile1 5477ops/s 0.0mb/s 0.9ms/op 209us/op-cpu
 closefile1 5477ops/s 0.0mb/s 0.0ms/op 6us/op-cpu
 wrtfile1 5477ops/s 688.4mb/s 0.4ms/op 220us/op-cpu
 createfile1 5477ops/s 0.0mb/s 2.7ms/op 1068us/op-cpu



 with a single remote client (similar Dell System) using NFS

 statfile1 90ops/s 0.0mb/s 27.6ms/op 145us/op-cpu
 deletefile1 90ops/s 0.0mb/s 64.5ms/op 401us/op-cpu
 closefile3 90ops/s 0.0mb/s 25.8ms/op 40us/op-cpu
 readfile1 90ops/s 11.4mb/s 3.1ms/op 363us/op-cpu
 openfile2 90ops/s 0.0mb/s 66.0ms/op 263us/op-cpu
 closefile2 90ops/s 0.0mb/s 22.6ms/op 124us/op-cpu
 appendfilerand1 90ops/s 0.7mb/s 0.5ms/op 101us/op-cpu
 openfile1 90ops/s 0.0mb/s 72.6ms/op 269us/op-cpu
 closefile1 90ops/s 0.0mb/s 43.6ms/op 189us/op-cpu
 wrtfile1 90ops/s 11.2mb/s 0.2ms/op 211us/op-cpu
 createfile1 90ops/s 0.0mb/s 226.5ms/op 709us/op-cpu



 the same remote client with zpool sync disabled on the server

 statfile1 479ops/s 0.0mb/s 6.2ms/op 130us/op-cpu
 deletefile1 479ops/s 0.0mb/s 13.0ms/op 351us/op-cpu
 closefile3 480ops/s 0.0mb/s 3.0ms/op 37us/op-cpu
 readfile1 480ops/s 62.7mb/s 0.8ms/op 174us/op-cpu
 openfile2 480ops/s 0.0mb/s 14.1ms/op 235us/op-cpu
 closefile2 480ops/s 0.0mb/s 6.0ms/op 123us/op-cpu
 appendfilerand1 480ops/s 3.7mb/s 0.2ms/op 53us/op-cpu
 openfile1 480ops/s 0.0mb/s 13.7ms/op 235us/op-cpu
 closefile1 480ops/s 0.0mb/s 11.1ms/op 190us/op-cpu
 wrtfile1 480ops/s 60.3mb/s 0.2ms/op 233us/op-cpu
 createfile1 480ops/s 0.0mb/s 35.6ms/op 683us/op-cpu


 Disabling the ZIL is not an option, but I expected much better performance;
 in particular, the ZEUS RAM only gets us a speed-up of about 1.8x.

 Is this test realistic for a typical fileserver scenario or does it
require many
 more clients to push the limits?

 Thanks
 Thomas
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance question over NFS

2011-08-18 Thread Thomas Nau
Tim,
the client is identical to the server but with no SAS drives attached.
Also, right now only one 1Gbit Intel NIC is available.

Thomas


Am 18.08.2011 um 17:49 schrieb Tim Cook t...@cook.ms:

 What are the specs on the client?
 
 On Aug 18, 2011 10:28 AM, Thomas Nau thomas@uni-ulm.de wrote:
  Dear all.
  We finally got all the parts for our new fileserver following several
  recommendations we got over this list. We use
  
  Dell R715, 96GB RAM, dual 8-core Opterons
  1 10GE Intel dual-port NIC
  2 LSI 9205-8e SAS controllers
  2 DataON DNS-1600 JBOD chassis
  46 Seagate constellation SAS drives
  2 STEC ZEUS RAM
  
  
  The base zpool config utilizes 42 drives plus the STECs as mirrored
  log devices. The Seagates are set up as a stripe of 7 six-drive RAIDZ2
  chunks plus, as said, a dedicated ZIL made of the mirrored STECs.
  
  As a quick'n dirty check we ran filebench with the fileserver
  workload. Running locally we get
  
  statfile1 5476ops/s 0.0mb/s 0.6ms/op 179us/op-cpu
  deletefile1 5476ops/s 0.0mb/s 1.0ms/op 454us/op-cpu
  closefile3 5476ops/s 0.0mb/s 0.0ms/op 5us/op-cpu
  readfile1 5476ops/s 729.5mb/s 0.2ms/op 128us/op-cpu
  openfile2 5477ops/s 0.0mb/s 0.8ms/op 204us/op-cpu
  closefile2 5477ops/s 0.0mb/s 0.0ms/op 5us/op-cpu
  appendfilerand1 5477ops/s 42.8mb/s 0.3ms/op 184us/op-cpu
  openfile1 5477ops/s 0.0mb/s 0.9ms/op 209us/op-cpu
  closefile1 5477ops/s 0.0mb/s 0.0ms/op 6us/op-cpu
  wrtfile1 5477ops/s 688.4mb/s 0.4ms/op 220us/op-cpu
  createfile1 5477ops/s 0.0mb/s 2.7ms/op 1068us/op-cpu
  
  
  
  with a single remote client (similar Dell System) using NFS
  
  statfile1 90ops/s 0.0mb/s 27.6ms/op 145us/op-cpu
  deletefile1 90ops/s 0.0mb/s 64.5ms/op 401us/op-cpu
  closefile3 90ops/s 0.0mb/s 25.8ms/op 40us/op-cpu
  readfile1 90ops/s 11.4mb/s 3.1ms/op 363us/op-cpu
  openfile2 90ops/s 0.0mb/s 66.0ms/op 263us/op-cpu
  closefile2 90ops/s 0.0mb/s 22.6ms/op 124us/op-cpu
  appendfilerand1 90ops/s 0.7mb/s 0.5ms/op 101us/op-cpu
  openfile1 90ops/s 0.0mb/s 72.6ms/op 269us/op-cpu
  closefile1 90ops/s 0.0mb/s 43.6ms/op 189us/op-cpu
  wrtfile1 90ops/s 11.2mb/s 0.2ms/op 211us/op-cpu
  createfile1 90ops/s 0.0mb/s 226.5ms/op 709us/op-cpu
  
  
  
  the same remote client with zpool sync disabled on the server
  
  statfile1 479ops/s 0.0mb/s 6.2ms/op 130us/op-cpu
  deletefile1 479ops/s 0.0mb/s 13.0ms/op 351us/op-cpu
  closefile3 480ops/s 0.0mb/s 3.0ms/op 37us/op-cpu
  readfile1 480ops/s 62.7mb/s 0.8ms/op 174us/op-cpu
  openfile2 480ops/s 0.0mb/s 14.1ms/op 235us/op-cpu
  closefile2 480ops/s 0.0mb/s 6.0ms/op 123us/op-cpu
  appendfilerand1 480ops/s 3.7mb/s 0.2ms/op 53us/op-cpu
  openfile1 480ops/s 0.0mb/s 13.7ms/op 235us/op-cpu
  closefile1 480ops/s 0.0mb/s 11.1ms/op 190us/op-cpu
  wrtfile1 480ops/s 60.3mb/s 0.2ms/op 233us/op-cpu
  createfile1 480ops/s 0.0mb/s 35.6ms/op 683us/op-cpu
  
  
  Disabling the ZIL is not an option, but I expected much better performance;
  in particular, the ZEUS RAM only gets us a speed-up of about 1.8x.
  
  Is this test realistic for a typical fileserver scenario or does it require 
  many
  more clients to push the limits?
  
  Thanks
  Thomas
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance question over NFS

2011-08-18 Thread Bob Friesenhahn

On Thu, 18 Aug 2011, Thomas Nau wrote:


Tim
the client is identical as the server but no SAS drives attached.
Also right now only one 1gbit Intel NIC Is available


I don't know what the request pattern from filebench looks like but it 
seems like your ZEUS RAM devices are not keeping up or else many 
requests are bypassing the ZEUS RAM devices.


Note that very large synchronous writes will bypass your ZEUS RAM 
device and go directly to a log in the main store.  Small (<= 128K) 
writes should directly benefit from the dedicated zil device.


Find a copy of zilstat.ksh and run it while filebench is running in 
order to understand more about what is going on.
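
A minimal sketch of running it (interval and count arguments assumed to work
like the other *stat tools):

./zilstat.ksh 1 10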


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance falls off a cliff

2011-05-13 Thread Aleksandr Levchuk
sirket, could you please share your OS, zfs, and zpool versions?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance falls off a cliff

2011-05-13 Thread Don
~# uname -a
SunOS nas01a 5.11 oi_147 i86pc i386 i86pc Solaris

~# zfs get version pool0
NAME   PROPERTY  VALUESOURCE
pool0  version   5-

~# zpool get version pool0
NAME   PROPERTY  VALUESOURCE
pool0  version   28   default
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance

2011-02-28 Thread Torrey McMahon

On 2/25/2011 4:15 PM, Torrey McMahon wrote:

On 2/25/2011 3:49 PM, Tomas Ögren wrote:

On 25 February, 2011 - David Blasingame Oracle sent me these 2,6K bytes:


  Hi All,

  In reading the ZFS Best practices, I'm curious if this statement is
  still true about 80% utilization.

It happens at about 90% for me.. all of a sudden, the mail server got
butt slow.. killed an old snapshot to get to 85% free or so, then it got
snappy again. S10u9 sparc.


Some of the recent updates have pushed the 80% watermark closer to 90% 
for most workloads.


Sorry folks. I was thinking of yet another change that was in the 
allocation algorithms. 80% is the number to stick with.


... now where did I put my cold medicine? :)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance

2011-02-28 Thread Brandon High
On Sun, Feb 27, 2011 at 7:35 PM, Brandon High bh...@freaks.com wrote:
 It moves from best fit to any fit at a certain point, which is at
 ~ 95% (I think). Best fit looks for a large contiguous space to avoid
 fragmentation while any fit looks for any free space.

I got the terminology wrong, it's first-fit when there is space,
moving to best-fit at 96% full.

See 
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/metaslab.c
for details.
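
A quick way to see how close a given pool is to those thresholds (pool name
illustrative):

$ zpool list tank
$ zpool get capacity tank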

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance

2011-02-27 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of David Blasingame Oracle
 
 Keep pool space under 80% utilization to maintain pool performance.

For what it's worth, the same is true for any other filesystem too.  What
really matters is the availability of suitably large sized unused sections
of the hard drive.  The larger the total space in your storage, the higher
the percentage of used can be, while maintaining enough unused space to
perform reasonably well.  The more sequential your IO operations are, the
less fragmentation you'll experience, and the less a problem there will be.
If your workload is highly random, with a mixture of large & small
operations, with lots of snapshots being created and destroyed all the time,
then you'll be fragmenting the drive quite a lot and experience this more.

The 80% or 90% thing is just a rule of thumb.  But you positively DON'T want
to hit 100% full.  I've had this happen and been required to power cycle and
remove things in single user mode in order to bring it back up.  It's not as
if 100% full is certain to cause a problem...  I can look up details if
someone wants to know...  There is a specific condition that only occurs
sometimes when 100% full, which essentially makes the system unusable.
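
One common guard against ever reaching 100% (a sketch only; the names and size
are made up) is to park a reservation on an otherwise empty dataset, so there
is always space that can be given back in an emergency:

zfs create tank/headroom
zfs set refreservation=20G tank/headroom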

But there is one specific thing, isn't there?  Where ZFS will choose to use
a different algorithm for something, when pool usage exceeds some threshold.
Right?  What is that?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance

2011-02-27 Thread Roy Sigurd Karlsbakk
 In reading the ZFS Best practices, I'm curious if this statement is
 still true about 80% utilization.

It is, and in my experience it doesn't matter much if you have a full pool and
add another VDEV: the existing VDEVs will still be full, and performance will
be slow. For this reason, new systems are set up with more, smaller drives to
help upgrade later by replacing the drives with larger ones. Hopefully, we
might see block pointer rewrite some time in the future to help rebalance pools.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of 
idioms of foreign origin. In most cases adequate and 
relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance

2011-02-27 Thread Brandon High
On Sun, Feb 27, 2011 at 6:59 AM, Edward Ned Harvey
opensolarisisdeadlongliveopensola...@nedharvey.com wrote:
 But there is one specific thing, isn't there?  Where ZFS will choose to use
 a different algorithm for something, when pool usage exceeds some threshold.
 Right?  What is that?

It moves from best fit to any fit at a certain point, which is at
~ 95% (I think). Best fit looks for a large contiguous space to avoid
fragmentation while any fit looks for any free space.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance

2011-02-27 Thread Toby Thain
On 27/02/11 9:59 AM, Edward Ned Harvey wrote:
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of David Blasingame Oracle

 Keep pool space under 80% utilization to maintain pool performance.
 
 For what it's worth, the same is true for any other filesystem too. 

I would expect COW puts more pressure on near-full behaviour compared to
write-in-place filesystems. If that's not true, somebody correct me.

--Toby

 What
 really matters is the availability of suitably large sized unused sections
 of the hard drive. ...
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance

2011-02-27 Thread Eric D. Mudama

On Mon, Feb 28 at  0:30, Toby Thain wrote:

I would expect COW puts more pressure on near-full behaviour compared to
write-in-place filesystems. If that's not true, somebody correct me.


Off the top of my head, I think it'd depend on the workload.

Write-in-place will always be faster with large IOs than with smaller
IOs, and write-in-place will always be faster than CoW with large
enough IO because there's no overhead for choosing where the write
goes (and with large enough IO, seek overhead ~= 0)

With CoW, it probably matters more what the previous version of the
LBAs you're overwriting looked like, plus how fragmented the free
space is.  Into a device with plenty of free space, small writes
should be significantly faster than write-in-place.

--eric

--
Eric D. Mudama
edmud...@bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Performance

2011-02-25 Thread David Blasingame Oracle

Hi All,

In reading the ZFS Best practices, I'm curious if this statement is 
still true about 80% utilization.


from :  
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide





 
http://www.solarisinternals.com/wiki/index.php?title=ZFS_Best_Practices_Guide&action=edit&section=12
Storage Pool Performance Considerations

.
Keep pool space under 80% utilization to maintain pool performance. 
Currently, pool performance can degrade when a pool is very full and 
file systems are updated frequently, such as on a busy mail server. Full 
pools might cause a performance penalty, but no other issues.




Dave


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance

2011-02-25 Thread Cindy Swearingen
Hi Dave,

Still true.

Thanks,

Cindy

On 02/25/11 13:34, David Blasingame Oracle wrote:
 Hi All,
 
 In reading the ZFS Best practices, I'm curious if this statement is 
 still true about 80% utilization.
 
 from :  
 http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
 
 
 
 
   
  http://www.solarisinternals.com/wiki/index.php?title=ZFS_Best_Practices_Guide&action=edit&section=12
  Storage Pool Performance Considerations
 
 .
 Keep pool space under 80% utilization to maintain pool performance. 
 Currently, pool performance can degrade when a pool is very full and 
 file systems are updated frequently, such as on a busy mail server. Full 
 pools might cause a performance penalty, but no other issues.
 
 
 
 Dave
 
 
 
 
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance

2011-02-25 Thread Tomas Ögren
On 25 February, 2011 - David Blasingame Oracle sent me these 2,6K bytes:

 Hi All,

 In reading the ZFS Best practices, I'm curious if this statement is  
 still true about 80% utilization.

It happens at about 90% for me.. all of a sudden, the mail server got
butt slow.. killed an old snapshot to get to 85% free or so, then it got
snappy again. S10u9 sparc.

 from :   
 http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

 


  
 http://www.solarisinternals.com/wiki/index.php?title=ZFS_Best_Practices_Guide&action=edit&section=12
 Storage Pool Performance Considerations

 .
 Keep pool space under 80% utilization to maintain pool performance.  
 Currently, pool performance can degrade when a pool is very full and  
 file systems are updated frequently, such as on a busy mail server. Full  
 pools might cause a performance penalty, but no other issues.

 

 Dave



 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance

2011-02-25 Thread Torrey McMahon

On 2/25/2011 3:49 PM, Tomas Ögren wrote:

On 25 February, 2011 - David Blasingame Oracle sent me these 2,6K bytes:


  Hi All,

  In reading the ZFS Best practices, I'm curious if this statement is
  still true about 80% utilization.

It happens at about 90% for me.. all of a sudden, the mail server got
butt slow.. killed an old snapshot to get to 85% free or so, then it got
snappy again. S10u9 sparc.


Some of the recent updates have pushed the 80% watermark closer to 90% 
for most workloads.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS performance questions

2010-11-24 Thread Don
I have an OpenSolaris (technically OI 147) box running ZFS with Comstar (zpool 
version 28, zfs version 5)

The box is a 2950 with 32 GB of RAM, Dell SAS5/e card connected to 6 Promise 
vTrak J610sD (dual controller SAS) disk shelves spread across both channels of 
the card (2 chains of 3 shelves).

We currently have:
4 x OCZ Vertex 2 SSD's configured as a ZIL (We've been experimenting without a 
dedicated ZIL, with 2 mirrors, and with 4 individual drives- these are not 
meant to be a permanent part of the array- they were installed to evaluate 
limited SSD benefits)
2 x 300GB 15k RPM Hot Spare drives- one on each channel
2 x 600GB 15k RPM Hot Spare drives- one on each channel
52 x 300GB 15k RPM disks configured as 4-disk RAIDZ (13 vdevs)
20 x 600GB 15k RPM disks configured as 4-disk RAIDZ (5 vdevs)

(Eventually there will be 16 more 600GB disks - 4 more vdevs for a total of 22
vdevs)

Most of our disk access is through COMSTAR via iSCSI. That said- even 
performance tests direct to the local disks reveal good, but not great 
performance.

Most of our sequential write performance tests show about 200 MB/sec to the
storage, which seems pretty low given the disks and their individual
performance.

I'd love to have configured the disks as mirrors but I needed a minimum of 20 
TB in the space provided and I could not achieve that when using mirrors.

Can anyone provide a link to good performance analysis resources so I can try 
to track down where my limited write performance is coming from?
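
(A reasonable starting point, before digging into DTrace, is simply to watch
per-vdev and per-device activity while a test is running; pool and device
names here are placeholders:

  # zpool iostat -v tank 10      (bandwidth/IOPS per vdev and per disk)
  # iostat -xnz 10               (service times and %busy per device)

If one vdev or one disk stands out with much higher asvc_t or %b than the
rest, that is usually the place to look first.)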
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS performance Tuning

2010-08-04 Thread TAYYAB REHMAN
Hi,
I am working with ZFS these days and I am facing some performance issues
reported by the application team: they say writes are very slow in ZFS compared to UFS.
Kindly send me some good references or book links. I will be very thankful
to you.

BR,
Tayyab
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance Tuning

2010-08-04 Thread Richard Elling
On Aug 4, 2010, at 3:22 AM, TAYYAB REHMAN wrote:
 Hi,
 i am working with ZFS now a days, i am facing some performance issues 
 from application team, as they said writes are very slow in ZFS w.r.t UFS. 
 Kindly send me some good reference or books links. i will be very thankful to 
 you.

Hi Tayyab, 
Please start with the ZFS Best Practices Guide.
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs performance issue

2010-05-10 Thread Abhishek Gupta

Hi,

I just installed OpenSolaris on my Dell Optiplex 755 and created raidz2 
with a few slices on a single disk. I was expecting a good read/write 
performance but I got the speed of 12-15MBps.

How can I enhance the read/write performance of my raid?
Thanks,
Abhi.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs performance issue

2010-05-10 Thread Erik Trimble

Abhishek Gupta wrote:

Hi,

I just installed OpenSolaris on my Dell Optiplex 755 and created 
raidz2 with a few slices on a single disk. I was expecting a good 
read/write performance but I got the speed of 12-15MBps.

How can I enhance the read/write performance of my raid?
Thanks,
Abhi.


You absolutely DON'T want to do what you've done.  Creating a ZFS pool 
(or, for that matter, any RAID device,whether hardware or software) out 
of slices/partitions of a single disk is a recipe for horrible performance.


In essence, you reduce your performance to 1/N (or worse) of the whole 
disk, where N is the number of slices you created.


So, create your zpool using disks or partitions from different disks.  
It's OK to have more than one partition on a disk - just use them in 
different pools for reasonable performance.
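
(As a sketch, with four separate disks -- the device names below are only
placeholders -- a sensible raidz2 would look like:

  # zpool create tank raidz2 c1t1d0 c1t2d0 c1t3d0 c1t4d0

rather than four slices of the same c0t0d0.)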


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Deive

2010-04-19 Thread Richard Skelton
 On 18/03/10 08:36 PM, Kashif Mumtaz wrote:
  Hi,
  I did another test on both machine. And write
 performance on ZFS extraordinary slow.
 
 Which build are you running?
 
 On snv_134, 2x dual-core cpus @ 3GHz and 8Gb ram (my
 desktop), I
 see these results:
 
 
 $ time dd if=/dev/zero of=test.dbf bs=8k
 count=1048576
 1048576+0 records in
 1048576+0 records out
 
 real  0m28.224s
 user  0m0.490s
 sys   0m19.061s
 
 This is a dataset on a straight mirrored pool, using
 two SATA2
 drives (320Gb Seagate).
On my Ultra24 with two mirrored 1Tb WD drives 8gb memory and snv_125

I only get :-
rich: ptime dd if=/dev/zero of=test.dbf bs=8k count=1048576
1048576+0 records in
1048576+0 records out

real     1:44.352133699
user        0.444280089
sys        13.526079085
rich: uname -a
SunOS ultra24 5.11 snv_125 i86pc i386 i86pc
rich: zpool status tank
  pool: tank
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: scrub completed after 0h30m with 0 errors on Mon Apr 19 02:36:08
2010
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  mirror-0  ONLINE   0 0 0
c1t3d0  ONLINE   0 0 0
c1t4d0  ONLINE   0 0 0

errors: No known data errors
rich: ipstat -En c1t3d0
ipstat: Command not found.
rich: iostat -En c1t3d0
c1t3d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD1001FALS-0 Revision: 0K05 Serial No:
Size: 1000.20GB 1000204886016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 4264 Predictive Failure Analysis: 0

rich: psrinfo -v
Status of virtual processor 0 as of: 04/19/2010 14:23:42
  on-line since 12/16/2009 21:56:59.
  The i386 processor operates at 3000 MHz,
and has an i387 compatible floating point processor.
Status of virtual processor 1 as of: 04/19/2010 14:23:42
  on-line since 12/16/2009 21:57:03.
  The i386 processor operates at 3000 MHz,
and has an i387 compatible floating point processor.
Status of virtual processor 2 as of: 04/19/2010 14:23:42
  on-line since 12/16/2009 21:57:03.
  The i386 processor operates at 3000 MHz,
and has an i387 compatible floating point processor.
Status of virtual processor 3 as of: 04/19/2010 14:23:42
  on-line since 12/16/2009 21:57:03.
  The i386 processor operates at 3000 MHz,
and has an i387 compatible floating point processor.



Why are my drives so slow?



 
 $ time dd if=test.dbf bs=8k of=/dev/null
 1048576+0 records in
 1048576+0 records out
 
 real  0m5.749s
 user  0m0.458s
 sys   0m5.260s
 
 
 James C. McPherson
 --
 Senior Software Engineer, Solaris
 Sun Microsystems
 http://www.jmcp.homeunix.com/blog
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Deive

2010-03-22 Thread Kashif Mumtaz
hi, Thanks for all the reply.

I have found the real culprit.
Hard disk was faulty. I  changed the hard disk.And now ZFS performance is much 
better.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Deive

2010-03-18 Thread Kashif Mumtaz
Hi,
I did another test on both machines, and write performance on ZFS is extraordinarily
slow.
I did the following test on both machines:
For write  
time dd if=/dev/zero of=test.dbf bs=8k count=1048576
For read  
time dd if=/testpool/test.dbf of=/dev/null bs=8k

ZFS machine has 32GB memory
UFS machine  has 16GB memory


   UFS machine test ###

time dd if=/dev/zero of=test.dbf bs=8k count=1048576

1048576+0 records in
1048576+0 records out

real    2m18.352s
user    0m5.080s
sys     1m44.388s

#iostat -xnmpz 10

    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.6  107.9    4.8 62668.4  0.0  6.7    0.1   61.9   1  83 c0t0d0
    0.0    0.2    0.0     0.2  0.0  0.0    0.0    0.8   0   0 c0t0d0s5
    0.6  107.7    4.8 62668.2  0.0  6.7    0.1   62.0   1  83 c0t0d0s7


For read 
# time dd if=test.dbf of=/dev/null bs=8k
1048576+0 records in
1048576+0 records out

real    1m21.285s
user    0m4.701s
sys     1m15.322s


For write it took 2.18 minutes and for read it took 1.21 minutes.

## ZFS machine test ##

# time dd if=/dev/zero of=test.dbf bs=8k count=1048576

1048576+0 records in
1048576+0 records out

real    140m33.590s
user    0m5.182s
sys     2m33.025s


                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    8.2    0.0  1037.0  0.0 33.3    0.0 4062.3   0 100 c0t0d0
    0.0    8.2    0.0  1037.0  0.0 33.3    0.0 4062.3   0 100 c0t0d0s0


-
For read
#time dd if=test.dbf of=/dev/null bs=8k
1048576+0 records in
1048576+0 records out

real    0m59.177s
user    0m4.471s
sys     0m54.723s

For write it took  140 minutes and for read 59 seconds(less then UFS)



-
In ZFS data was being write around 1037 kw/s while disk remain busy 100%.

In UFS data was being written around 62668 kw/s while disk is busy at 83%


Kindly help me how can I tune the writing performance on ZFS?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Deive

2010-03-18 Thread James C. McPherson

On 18/03/10 08:36 PM, Kashif Mumtaz wrote:

Hi,
I did another test on both machine. And write performance on ZFS extraordinary 
slow.


Which build are you running?

On snv_134, 2x dual-core cpus @ 3GHz and 8Gb ram (my desktop), I
see these results:


$ time dd if=/dev/zero of=test.dbf bs=8k count=1048576
1048576+0 records in
1048576+0 records out

real    0m28.224s
user    0m0.490s
sys     0m19.061s

This is a dataset on a straight mirrored pool, using two SATA2
drives (320Gb Seagate).

$ time dd if=test.dbf bs=8k of=/dev/null
1048576+0 records in
1048576+0 records out

real    0m5.749s
user    0m0.458s
sys     0m5.260s


James C. McPherson
--
Senior Software Engineer, Solaris
Sun Microsystems
http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Deive

2010-03-18 Thread Kashif Mumtaz
Hi, Thanks for your reply

BOTH are Sun Sparc T1000 machines.

Hard disk  1 TB sata on both

ZFS system: Memory 32 GB, Processor 1 GHz 6 core
os: Solaris 10 10/09 s10s_u8wos_08a SPARC
Patch Cluster level 142900-02 (Dec 09)


UFS machine 
Hard disk 1 TB sata
Memory 16 GB
Processor 1 GHz 6 core

 Solaris 10 8/07 s10s_u4wos_12b SPARC
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Deive

2010-03-18 Thread Daniel Carosone
On Thu, Mar 18, 2010 at 03:36:22AM -0700, Kashif Mumtaz wrote:
 I did another test on both machine. And write performance on ZFS 
 extraordinary slow.
 -
 In ZFS data was being write around 1037 kw/s while disk remain busy 100%.

That is, as you say, such an extraordinarily slow number that we have
to start at the very basics and eliminate fundamental problems. 

I have seen disks go bad in a way that they simply become very very
slow. You need to be sure that this isn't your problem.  Or perhaps
there's some hardware issue when the disks are used in parallel?

Check all the cables and connectors. Check logs for any errors.

Do you have the opportunity to try testing write speed with dd to the
raw disks?  If the pool is mirrored, can you detach one side at a
time? Test the detached disk with dd, and the pool with the other
disk, one at a time and then concurrently.  One slow disk will slow
down the mirror (but I don't recall seeing such an imbalance in your
iostat output either).
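
(Roughly, and with placeholder pool/device names, that test sequence would be:

  # zpool detach testpool c0t1d0
  # dd if=/dev/rdsk/c0t1d0s0 of=/dev/null bs=1024k count=2048    (raw read)
  # dd if=/dev/zero of=/dev/rdsk/c0t1d0s0 bs=1024k count=2048    (raw write; destroys the label)
  # zpool attach testpool c0t0d0 c0t1d0                          (re-mirror and resilver)

Repeat for the other disk, then run both dd's concurrently.)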

Do you have some spare disks to try other tests with? Try a ZFS
install on those, and see they also have the problem. Try a UFS
install on the current disks, and see if they still have the
problem.  Can you swap the disks between the T1000s and see if the
problem stays with the disks or the chassis?

You have a gremlin to hunt...

--
Dan.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Deive

2010-03-18 Thread Svein Skogen

On 18.03.2010 21:31, Daniel Carosone wrote:

 You have a gremlin to hunt...

Wouldn't Sun help here? ;)

(sorry couldn't help myself, I've spent a week hunting gremlins until I
hit the brick wall of the MPT problem)

//Svein

- -- 
- +---+---
  /\   |Svein Skogen   | sv...@d80.iso100.no
  \ /   |Solberg Østli 9| PGP Key:  0xE5E76831
   X|2020 Skedsmokorset | sv...@jernhuset.no
  / \   |Norway | PGP Key:  0xCE96CE13
|   | sv...@stillbilde.net
 ascii  |   | PGP Key:  0x58CD33B6
 ribbon |System Admin   | svein-listm...@stillbilde.net
Campaign|stillbilde.net | PGP Key:  0x22D494A4
+---+---
|msn messenger: | Mobile Phone: +47 907 03 575
|sv...@jernhuset.no | RIPE handle:SS16503-RIPE
- +---+---
 If you really are in a hurry, mail me at
   svein-mob...@stillbilde.net
 This mailbox goes directly to my cellphone and is checked
even when I'm not in front of my computer.
- 
 Picture Gallery:
  https://gallery.stillbilde.net/v/svein/
- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Deive

2010-03-18 Thread James C. McPherson

On 18/03/10 10:05 PM, Kashif Mumtaz wrote:

Hi, Thanks for your reply

BOTH are Sun Sparc T1000 machines.

Hard disk  1 TB sata on both

ZFS system Memory32 GB , Processor 1GH 6 core
os  Solaris 10 10/09 s10s_u8wos_08a SPARC
PatchCluster  level 142900-02(Dec 09 )


UFS machine
Hard disk 1 TB sata
Memory 16 GB
Processor Processor 1GH 6 core

  Solaris 10 8/07 s10s_u4wos_12b SPARC


Since you are seeing this on a Solaris 10 update
release, you should log a call with your support
provider to get this investigated.


James C. McPherson
--
Senior Software Engineer, Solaris
Sun Microsystems
http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Deive

2010-03-18 Thread Erik Trimble

James C. McPherson wrote:

On 18/03/10 10:05 PM, Kashif Mumtaz wrote:

Hi, Thanks for your reply

BOTH are Sun Sparc T1000 machines.

Hard disk  1 TB sata on both

ZFS system Memory32 GB , Processor 1GH 6 core
os  Solaris 10 10/09 s10s_u8wos_08a SPARC
PatchCluster  level 142900-02(Dec 09 )


UFS machine
Hard disk 1 TB sata
Memory 16 GB
Processor Processor 1GH 6 core

  Solaris 10 8/07 s10s_u4wos_12b SPARC


Since you are seeing this on a Solaris 10 update
release, you should log a call with your support
provider to get this investigated.


James C. McPherson
--
Senior Software Engineer, Solaris
Sun Microsystems
http://www.jmcp.homeunix.com/blog
I would generally agree with James, with the caveat that you could try
to update to something a bit later than Update 4.  That's pretty
early-on in the ZFS deployment in Solaris 10.


At the minimum, grab the latest Recommended Patch set and apply that, 
then see what your issues are.




--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Deive

2010-03-18 Thread Erik Trimble

Erik Trimble wrote:

James C. McPherson wrote:

On 18/03/10 10:05 PM, Kashif Mumtaz wrote:

Hi, Thanks for your reply

BOTH are Sun Sparc T1000 machines.

Hard disk  1 TB sata on both

ZFS system Memory32 GB , Processor 1GH 6 core
os  Solaris 10 10/09 s10s_u8wos_08a SPARC
PatchCluster  level 142900-02(Dec 09 )


UFS machine
Hard disk 1 TB sata
Memory 16 GB
Processor Processor 1GH 6 core

  Solaris 10 8/07 s10s_u4wos_12b SPARC


Since you are seeing this on a Solaris 10 update
release, you should log a call with your support
provider to get this investigated.


James C. McPherson
--
Senior Software Engineer, Solaris
Sun Microsystems
http://www.jmcp.homeunix.com/blog
I would generally agree with James, with the caveaut that you could 
try to update to something a bit latter than Update 4.  That's pretty 
early-on in the ZFS deployment in Solaris 10.


At the minimum, grab the latest Recommended Patch set and apply that, 
then see what your issues are.






Oh, nevermind. I'm an idiot.  I was looking at the UFS machine.



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Performance on SATA Deive

2010-03-17 Thread Kashif Mumtaz
hi ,

I'm using Sun T1000 machines. One machine is installed with Solaris 10 on UFS and
the other with ZFS, and the ZFS machine is performing slowly. Running the
following commands on both systems shows the disk gets busy immediately, up to 100%:

 ZFS MACHINE
find / > /dev/null 2>&1 &
iostat -xnmpz 5
[r...@zfs-serv ktahir]# iostat -xnmpz 5
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.4    0.2   12.3     2.2  0.0  0.0    6.5    3.9   0   0 c0d0
    0.0    0.0    0.0     0.0  0.0  0.0    0.0    0.9   0   0 192.168.150.131:/export/home2
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
   86.4    0.0 5527.4     0.0  0.0  1.0    0.0   11.2   0  97 c0d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
   87.4    0.0 5593.7     0.0  0.0  1.0    0.0   11.1   0  96 c0d0
                    extended device statistics
    r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
   85.2    0.0 5452.8     0.0  0.0  1.0    0.0   11.3   0  96 c0d0


but on the UFS file system the average busy is 50%.

any idea why ZFS makes the disk more busy?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Deive

2010-03-17 Thread Bob Friesenhahn

On Wed, 17 Mar 2010, Kashif Mumtaz wrote:


but on UFS file system averge busy is 50% ,

any idea why ZFS makes disk more busy ?


Clearly there are many more reads per second occuring on the zfs 
filesystem than the ufs filesystem.  Assuming that the 
application-level requests are really the same, this suggests that the 
system does not have enough RAM installed in order to cache the 
working set.  Another issue could be fileystem block size since zfs 
defaults the block size to 128K but some applications (e.g. database) 
work better with 4K, 8K, or 16K block size.
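
(If that is the case, the block size can be set per filesystem before the
data is written -- the dataset name below is just a placeholder:

  # zfs set recordsize=8k tank/db

Note it only affects files written after the change, so existing data would
need to be copied back in.)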


Regardless, I suggest measuring the statistics with a 30 second 
interval rather than 5 seconds since zfs is assured to do whatever it 
does within 30 seconds.
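
For example (substitute your own pool name):

  # iostat -xnz 30
  # zpool iostat -v tank 30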


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Deive

2010-03-17 Thread Daniel Carosone
On Wed, Mar 17, 2010 at 10:15:53AM -0500, Bob Friesenhahn wrote:
 Clearly there are many more reads per second occuring on the zfs  
 filesystem than the ufs filesystem.

yes

 Assuming that the application-level requests are really the same

From the OP, the workload is a find /.

So, ZFS makes the disks busier.. but is it find'ing faster as a
result, or doing more reads per found file?   The ZFS io pipeline will
be able to use the cpu concurrency of the T1000 better than UFS, even
for a single-threaded find, and may just be issuing IO faster.

Count the number of lines printed and divide by the time taken to
compare whether the extra work being done is producing extra output or
not.
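
(Concretely, something along the lines of:

  # ptime find / > /dev/null     (elapsed time)
  # find / | wc -l               (number of entries found)

run on both boxes gives a crude files-per-second figure to compare.)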

However, it might also be worthwhile to look for a better / more
representative benchmark and compare further using that.

Also, to be clear, could you clarify whether the problem you see
that the numbers in iostat are larger, or that find runs slower, or
that other processes are more impacted by find?

 this suggests that the system does not have 
 enough RAM installed in order to cache the working set.  

Possibly, yes. 

 Another issue 
 could be fileystem block size since zfs defaults the block size to 128K 
 but some applications (e.g. database) work better with 4K, 8K, or 16K 
 block size.

Unlikely to be relevant to fs metadata for find.

 Regardless, I suggest measuring the statistics with a 30 second interval 
 rather than 5 seconds since zfs is assured to do whatever it does within 
 30 seconds.

Relevant for write benchmarks more so than read.

--
Dan.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-20 Thread Edward Ned Harvey
 ZFS has intelligent prefetching.  AFAIK, Solaris disk drivers do not
 prefetch.

Can you point me to any reference?  I didn't find anything stating yay or
nay, for either of these.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-20 Thread Edward Ned Harvey
 Doesn't this mean that if you enable write back, and you have
 a single, non-mirrored raid-controller, and your raid controller
 dies on you so that you loose the contents of the nvram, you have
 a potentially corrupt file system?

It is understood, that any single point of failure could result in failure,
yes.  If you have a CPU that performs miscalculations, makes mistakes, it
can instruct bad things to be written to disk (I've had something like that
happen before.)  If you have RAM with bit errors in it that go undetected,
you can have corrupted memory, and if that memory is destined to write to
disk, you'll have bad data written to disk.  If you have a non-redundant
raid controller, which buffers writes, and the buffer gets destroyed or
corrupted before the writes are put to disk, then the data has become
corrupt.  Heck, the same is true even with redundant raid controllers, if
there are memory errors in one that go undetected.

So you'll have to do your own calculation.  Which is worse?
- Don't get the benefit of accelerated hardware, for all the time that the
hardware is functioning correctly,
Or
- Take the risk of acceleration, with possibility the accelerator could fail
and cause harm to the data it was working on.

I know I always opt for using the raid write-back.  If I ever have a
situation where I'm so scared of the raid card corrupting data, I would be
equally scared of the CPU or SAS bus or system ram or whatever.  In that
case, I'd find a solution that makes entire machines redundant, rather than
worrying about one little perc card.

Yes it can happen.  I've seen it happen.  But not just to raid cards;
everything else is vulnerable too.

I'll take a 4x performance improvement for 99.999% of the time, and risk the
corruption the rest of the time.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-19 Thread Edward Ned Harvey
One more thing I'd like to add here:

The PERC cache measurably and significantly accelerates small disk writes.
However, for read operations, it is insignificant compared to system ram,
both in terms of size and speed.  There is no significant performance
improvement by enabling adaptive readahead in the PERC.  I will recommend
instead, the PERC should be enabled for Write Back, and have the readahead
disabled.  Fortunately this is the default configuration on a new perc
volume, so unless you changed it, you should be fine.

It may be smart to double check, and ensure your OS does adaptive readahead.

In Linux (rhel/centos) you can check that the "readahead" service is
loading.  I noticed this is enabled by default in runlevel 5, but disabled
by default in runlevel 3.  Interesting.

I don't know how to check solaris or opensolaris, to ensure adaptive
readahead is enabled.
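
(On a rhel/centos box, assuming the stock init scripts, a quick check is
something like:

  # chkconfig --list | grep -i readahead
  # runlevel

which shows whether the readahead service is on for the runlevel you are
actually in.)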




On 2/18/10 8:08 AM, Edward Ned Harvey sola...@nedharvey.com wrote:

 Ok, I've done all the tests I plan to complete.  For highest performance, it
 seems:
 ·The measure I think is the most relevant for typical operation is the
 fastest random read /write / mix.  (Thanks Bob, for suggesting I do this
 test.)
 The winner is clearly striped mirrors in ZFS
 
 ·The fastest sustained sequential write is striped mirrors via ZFS, or
 maybe raidz
 
 ·The fastest sustained sequential read is striped mirrors via ZFS, or
 maybe raidz
 
  
 Here are the results:
 ·Results summary of Bob's method
 
http://nedharvey.com/iozone_weezer/bobs%20method/iozone%20results%20summary.pdf
 
 ·Raw results of Bob's method
 http://nedharvey.com/iozone_weezer/bobs%20method/raw_results.zip
 
 ·Results summary of Ned's method
 
http://nedharvey.com/iozone_weezer/neds%20method/iozone%20results%20summary.pdf
 
 ·Raw results of Ned's method
 http://nedharvey.com/iozone_weezer/neds%20method/raw_results.zip
 
  
  
  
  
  
 
 From: Edward Ned Harvey [mailto:sola...@nedharvey.com]
 Sent: Saturday, February 13, 2010 9:07 AM
 To: opensolaris-disc...@opensolaris.org; zfs-discuss@opensolaris.org
 Subject: ZFS performance benchmarks in various configurations
  
 I have a new server, with 7 disks in it.  I am performing benchmarks on it
 before putting it into production, to substantiate claims I make, like
 "striping mirrors is faster than raidz" and so on. Would anybody like me to
 test any particular configuration? Unfortunately I don't have any SSD, so I
 can't do any meaningful test on the ZIL etc.  Unless someone in the Boston
 area has a 2.5" SAS SSD they wouldn't mind lending for a few hours.  ;-)
  
 My hardware configuration:  Dell PE 2970 with 8 cores.  Normally 32G, but I
 pulled it all out to get it down to 4G of ram.  (Easier to benchmark disks
 when the file operations aren't all cached.)  ;-)  Solaris 10 10/09.  PERC 6/i
 controller.  All disks are configured in PERC for Adaptive ReadAhead, and
 Write Back, JBOD.  7 disks present, each SAS 15krpm 160G.  OS is occupying 1
 disk, so I have 6 disks to play with.
  
 I am currently running the following tests:
  
 Will test, including the time to flush(), various record sizes inside file
 sizes up to 16G, sequential write and sequential read. Not doing any mixed
 read/write requests.  Not doing any random read/write.
 iozone -Reab somefile.wks -g 17G -i 1 -i 0
  
 Configurations being tested:
 ·Single disk
 
 ·2-way mirror
 
 ·3-way mirror
 
 ·4-way mirror
 
 ·5-way mirror
 
 ·6-way mirror
 
 ·Two mirrors striped (or concatenated)
 
 ·Three mirrors striped (or concatenated)
 
 ·5-disk raidz
 
 ·6-disk raidz
 
 ·6-disk raidz2
 
  
 Hypothesized results:
 ·N-way mirrors write at the same speed of a single disk
 
 ·N-way mirrors read n-times faster than a single disk
 
 ·Two mirrors striped read and write 2x faster than a single mirror
 
 ·Three mirrors striped read and write 3x faster than a single mirror
 
 ·Raidz and raidz2:  No hypothesis. Some people say they perform
 comparable to many disks working together. Some people say it¹s slower than a
 single disk.  Waiting to see the results.
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-19 Thread Günther
hello
i have made some benchmarks with my napp-it zfs-server:
www.napp-it.org/bench.pdf

- 2gb vs 4 gb vs 8 gb ram
- mirror vs raidz vs raidz2 vs raidz3
- dedup and compress enabled vs disabled

result in short:
8gb ram vs 2 Gb: + 10% .. +500% more power (green drives)
compress and dedup enabled: + 50% .. +300%
mirror vs Raidz: fastest is raidz, slowest mirror, raidz level +/-20%

gea
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-19 Thread Richard Elling
On Feb 19, 2010, at 8:35 AM, Edward Ned Harvey wrote:
 One more thing I’d like to add here:
 
 The PERC cache measurably and significantly accelerates small disk writes.  
 However, for read operations, it is insignificant compared to system ram, 
 both in terms of size and speed.  There is no significant performance 
 improvement by enabling adaptive readahead in the PERC.  I will recommend 
 instead, the PERC should be enabled for Write Back, and have the readahead 
 disabled.  Fortunately this is the default configuration on a new perc 
 volume, so unless you changed it, you should be fine.
 
 It may be smart to double check, and ensure your OS does adaptive readahead.  
 In Linux (rhel/centos) you can check that the “readahead” service is loading. 
  I noticed this is enabled by default in runlevel 5, but disabled by default 
 in runlevel 3.  Interesting.
 
 I don’t know how to check solaris or opensolaris, to ensure adaptive 
 readahead is enabled.

ZFS has intelligent prefetching.  AFAIK, Solaris disk drivers do not prefetch.

 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-19 Thread Ragnar Sundblad

On 19 feb 2010, at 17.35, Edward Ned Harvey wrote:

 The PERC cache measurably and significantly accelerates small disk writes.  
 However, for read operations, it is insignificant compared to system ram, 
 both in terms of size and speed.  There is no significant performance 
 improvement by enabling adaptive readahead in the PERC.  I will recommend 
 instead, the PERC should be enabled for Write Back, and have the readahead 
 disabled.  Fortunately this is the default configuration on a new perc 
 volume, so unless you changed it, you should be fine.

If I understand correctly, ZFS nowadays will only flush data to
non volatile storage (such as a RAID controller NVRAM), and not
all the way out to disks. (To solve performance problems with some
storage systems, and I believe that it also is the right thing
to do under normal circumstances.)

Doesn't this mean that if you enable write back, and you have
a single, non-mirrored raid-controller, and your raid controller
dies on you so that you lose the contents of the nvram, you have
a potentially corrupt file system?

/ragge

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-19 Thread Neil Perrin



If I understand correctly, ZFS now adays will only flush data to
non volatile storage (such as a RAID controller NVRAM), and not
all the way out to disks. (To solve performance problems with some
storage systems, and I believe that it also is the right thing
to do under normal circumstances.)

Doesn't this mean that if you enable write back, and you have
a single, non-mirrored raid-controller, and your raid controller
dies on you so that you loose the contents of the nvram, you have
a potentially corrupt file system?


ZFS requires that all writes be flushed to non-volatile storage.
This is needed for both transaction group (txg) commits to ensure pool integrity
and for the ZIL to satisfy the synchronous requirement of fsync/O_DSYNC etc.
If the caches weren't flushed then it would indeed be quicker but the pool
would be susceptible to corruption. Sadly some hardware doesn't honour
cache flushes and this can cause corruption.

Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-18 Thread Edward Ned Harvey
Ok, I've done all the tests I plan to complete.  For highest performance, it
seems:

. The measure I think is the most relevant for typical operation is
the fastest random read /write / mix.  (Thanks Bob, for suggesting I do this
test.)
The winner is clearly striped mirrors in ZFS

. The fastest sustained sequential write is striped mirrors via ZFS,
or maybe raidz

. The fastest sustained sequential read is striped mirrors via ZFS,
or maybe raidz

 

Here are the results:

. Results summary of Bob's method
http://nedharvey.com/iozone_weezer/bobs%20method/iozone%20results%20summary.pdf

. Raw results of Bob's method
http://nedharvey.com/iozone_weezer/bobs%20method/raw_results.zip 

. Results summary of Ned's method
http://nedharvey.com/iozone_weezer/neds%20method/iozone%20results%20summary.pdf

. Raw results of Ned's method
http://nedharvey.com/iozone_weezer/neds%20method/raw_results.zip

 

 

 

 

 

From: Edward Ned Harvey [mailto:sola...@nedharvey.com] 
Sent: Saturday, February 13, 2010 9:07 AM
To: opensolaris-disc...@opensolaris.org; zfs-discuss@opensolaris.org
Subject: ZFS performance benchmarks in various configurations

 

I have a new server, with 7 disks in it.  I am performing benchmarks on it
before putting it into production, to substantiate claims I make, like
striping mirrors is faster than raidz and so on.  Would anybody like me to
test any particular configuration?  Unfortunately I don't have any SSD, so I
can't do any meaningful test on the ZIL etc.  Unless someone in the Boston
area has a 2.5 SAS SSD they wouldn't mind lending for a few hours.  ;-)

 

My hardware configuration:  Dell PE 2970 with 8 cores.  Normally 32G, but I
pulled it all out to get it down to 4G of ram.  (Easier to benchmark disks
when the file operations aren't all cached.)  ;-)  Solaris 10 10/09.  PERC
6/i controller.  All disks are configured in PERC for Adaptive ReadAhead,
and Write Back, JBOD.  7 disks present, each SAS 15krpm 160G.  OS is
occupying 1 disk, so I have 6 disks to play with.

 

I am currently running the following tests:

 

Will test, including the time to flush(), various record sizes inside file
sizes up to 16G, sequential write and sequential read.  Not doing any mixed
read/write requests.  Not doing any random read/write.

iozone -Reab somefile.wks -g 17G -i 1 -i 0

 

Configurations being tested:

. Single disk

. 2-way mirror

. 3-way mirror

. 4-way mirror

. 5-way mirror

. 6-way mirror

. Two mirrors striped (or concatenated)

. Three mirrors striped (or concatenated)

. 5-disk raidz

. 6-disk raidz

. 6-disk raidz2

 

Hypothesized results:

. N-way mirrors write at the same speed of a single disk

. N-way mirrors read n-times faster than a single disk

. Two mirrors striped read and write 2x faster than a single mirror

. Three mirrors striped read and write 3x faster than a single
mirror

. Raidz and raidz2:  No hypothesis.  Some people say they perform
comparable to many disks working together.  Some people say it's slower than
a single disk.  Waiting to see the results.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-18 Thread Bob Friesenhahn

On Thu, 18 Feb 2010, Edward Ned Harvey wrote:



Ok, I’ve done all the tests I plan to complete.  For highest performance, it 
seems:

· The measure I think is the most relevant for typical operation is the 
fastest random read
/write / mix.  (Thanks Bob, for suggesting I do this test.)
The winner is clearly striped mirrors in ZFS


A most excellent set of tests.  We could use some units in the PDF 
file though.


While it would take quite some time and effort to accomplish, we could 
use a similar summary for full disk resilver times in each 
configuration.



· The fastest sustained sequential write is striped mirrors via ZFS, or 
maybe raidz


Note that while these tests may be file-sequential, with 8 threads 
working at once, what the disks see is not necessarily sequential. 
However, for initial sequential write, it may be that zfs aggregates 
the write requests and orders them on disk in such a way that 
subsequent sequential reads by the same number of threads in a 
roughly similar order would see a performance benefit.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-18 Thread Edward Ned Harvey
 A most excellent set of tests.  We could use some units in the PDF
 file though.

Oh, hehehe.  ;-)  The units are written in the raw txt files.  On your
tests, the units were ops/sec, and in mine, they were Kbytes/sec.  If you
like, you can always grab the xlsx and modify it to your tastes, and create
an updated pdf.  Just substitute .xlsx instead of .pdf in the previous
URL's.  Or just drop the filename off the URL.  My web server allows
indexing on that directory.

Personally, I only look at the chart which is normalized against a single
disk, so units are intentionally not present.


 While it would take quite some time and effort to accomplish, we could
 use a similar summary for full disk resilver times in each
 configuration.

Actually, that's easy.  Although the zpool create happens instantly, all
the hardware raid configurations required an initial resilver.  And they
were exactly what you expect.  Write 1 Gbit/s until you reach the size of
the drive.  I watched the progress while I did other things, and it was
incredibly consistent.

I am assuming, with very high confidence, that ZFS would match that
performance.
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-18 Thread Edward Ned Harvey
 A most excellent set of tests.  We could use some units in the PDF
 file though.

Oh, by the way, you originally requested the 12G file to be used in
benchmark, and later changed to 4G.  But by that time, two of the tests had
already completed on the 12G, and I didn't throw away those results, but I
didn't include them in the summary either.

If you look in the raw results, you'll see a directory called 12G, and if
you compare those results against the equivalent 4G counterpart, you'll see
the 12G in fact performed somewhat lower.

The reason is that there are sometimes cache hits during read operations,
and the write back buffer is enabled in the PERC.  So the smaller the data
set, the more frequently these things will accelerate you.  And
consequently, the 4G performance was measured higher.

This doesn't affect me at all.  I wanted to know qualitative results, not
quantitative.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-18 Thread Bob Friesenhahn

On Thu, 18 Feb 2010, Edward Ned Harvey wrote:

Actually, that's easy.  Although the zpool create happens instantly, all
the hardware raid configurations required an initial resilver.  And they
were exactly what you expect.  Write 1 Gbit/s until you reach the size of
the drive.  I watched the progress while I did other things, and it was
incredibly consistent.


This sounds like an initial 'silver' rather than a 'resilver'.  In a 
'resilver' process it is necessary to read other disks in the vdev in 
order to reconstruct the disk content.  As a result, we now have 
additional seeks and reads going on, which seems considerably 
different than pure writes.


What I am interested in is the answer to these sort of questions:

 o Does a mirror device resilver faster than raidz?

 o Does a mirror device in a triple mirror resilver faster than a
   two-device mirror?

 o Does a raidz2 with 9 disks resilver faster or slower than one with
   6 disks?

The answer to these questions could vary depending on how well the 
pool has been aged and if it has been used for a while close to 100% 
full.
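
(One repeatable way to time these, with placeholder names, is to force a
full resilver onto a spare disk and read the elapsed time out of zpool status:

  # zpool replace tank c1t5d0 c1t6d0
  # zpool status tank        (the scan/scrub line reports resilver progress and time)

done once per pool layout, ideally on an aged, fairly full pool as noted.)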


Before someone pipes up and says that measuring this is useless since 
results like this are posted all over the internet, I challenge that 
someone to find this data already published somewhere.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-18 Thread Daniel Carosone
On Thu, Feb 18, 2010 at 10:39:48PM -0600, Bob Friesenhahn wrote:
 This sounds like an initial 'silver' rather than a 'resilver'. 

Yes, in particular it will be entirely sequential.

ZFS resilver is in txg order and involves seeking.

 What I am interested in is the answer to these sort of questions:

  o Does a mirror device resilver faster than raidz?

  o Does a mirror device in a triple mirror resilver faster than a
two-device mirror?

  o Does a raidz2 with 9 disks resilver faster or slower than one with
6 disks?

and, if we're wishing for comprehensive analysis:

  o What is the impact on concurrent IO benchmark loads, for each of the above. 

 The answer to these questions could vary depending on how well the pool 
 has been aged and if it has been used for a while close to 100% full.

Indeed, which makes it even harder to compare results from different
cases and test sources.  To get usable relative-to-each-other results,
one needs to compare idealised test cases with repeatable loads.
This is weeks of work, at least, and can be fun to speculate about up
front but rapidly gets very tiresome.

--
Dan.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-15 Thread Carson Gaspar

Richard Elling wrote:
...

As you can see, so much has changed, hopefully for the better, that running
performance benchmarks on old software just isn't very interesting.

NB. Oracle's Sun OpenStorage systems do not use Solaris 10 and if they did, they
would not be competitive in the market. The notion that OpenSolaris is worthless
and Solaris 10 rules is simply bull*


OpenSolaris isn't worthless, but no way in hell would I run it in 
production, based on my experiences running it at home from b111 to now. 
The mpt driver problems are just one of many show stoppers (is that 
resolved yet, or do we still need magic /etc/system voodoo?).


Of course, Solaris 10 couldn't properly drive the Marvell attached disks 
in an X4500 prior to U6 either, unless you ran an IDR (pretty 
inexcusable in a storage-centric server release).


--
Carson

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-14 Thread Edward Ned Harvey
 Never mind. I have no interest in performance tests for Solaris 10.

 The code is so old, that it does not represent current ZFS at all.

 

Whatever.  Regardless of what you say, it does show:

. Which is faster, raidz, or a stripe of mirrors?

. How much does raidz2 hurt performance compared to raidz?

. Which is faster, raidz, or hardware raid 5?

. Is a mirror twice as fast as a single disk for reading?  Is a
3-way mirror 3x faster?  And so on?

 

I've seen and heard many people stating answers to these questions, and my
results (not yet complete) already answer these questions, and demonstrate
that all the previous assertions were partial truths.

 

It's true, I am demonstrating no interest to compare performance of ZFS 3
versus ZFS 4.  If you want that, test it yourself and don't complain about
my tests.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-14 Thread Edward Ned Harvey
   iozone -m -t 8 -T -O -r 128k -o -s 12G
 
 Actually, it seems that this is more than sufficient:
 
iozone -m -t 8 -T -r 128k -o -s 4G

Good news, cuz I kicked off the first test earlier today, and it seems like
it will run till Wednesday.  ;-)  The first run, on a single disk, took 6.5
hrs, and I have it configured to repeat ... 2-way mirror, 3-way mirror,
4-way mirror, 5-way mirror, raidz 5 disks, raidz 6 disks, raidz2 6 disks,
stripe of 2 mirrors, stripe of 3 mirrors ...

I'll go stop it, and change to 4G.  Maybe it'll be done tomorrow.  ;-)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-14 Thread Thomas Burgess
 Whatever.  Regardless of what you say, it does show:

 · Which is faster, raidz, or a stripe of mirrors?

 · How much does raidz2 hurt performance compared to raidz?

 · Which is faster, raidz, or hardware raid 5?

 · Is a mirror twice as fast as a single disk for reading?  Is a
 3-way mirror 3x faster?  And so on?



 I’ve seen and heard many people stating answers to these questions, and my
 results (not yet complete) already answer these questions, and demonstrate
 that all the previous assertions were partial truths.




I don't think he was complaining, i think he was saying he didn't need you
to run iosnoop on the old version of ZFS.

Solaris 10 has a really old version of ZFS.  i know there are some pretty
big differences in zfs versions from my own non scientific benchmarks.  It
would make sense that people wouldn't be as interested in benchmarks of
solaris 10 ZFS seeing as there are literally hundreds scattered around the
internet.

I don't think he was telling you not to bother testing for your own purposes
though.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-14 Thread Bob Friesenhahn

On Sun, 14 Feb 2010, Edward Ned Harvey wrote:


 Never mind. I have no interest in performance tests for Solaris 10.

 The code is so old, that it does not represent current ZFS at all.

Whatever.  Regardless of what you say, it does show:


Since Richard abandoned Sun (in favor of gmail), he has no qualms with 
suggesting to test the unstable version. ;-)


Regardless of denials to the contrary, Solaris 10 is still the stable 
enterprise version of Solaris, and will be for quite some time.  It 
has not yet achieved the status of Solaris 8.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-14 Thread Bob Friesenhahn

On Sun, 14 Feb 2010, Edward Ned Harvey wrote:


 iozone -m -t 8 -T -O -r 128k -o -s 12G


Actually, it seems that this is more than sufficient:

   iozone -m -t 8 -T -r 128k -o -s 4G


Good news, cuz I kicked off the first test earlier today, and it seems like
it will run till Wednesday.  ;-)  The first run, on a single disk, took 6.5
hrs, and I have it configured to repeat ... 2-way mirror, 3-way mirror,
4-way mirror, 5-way mirror, raidz 5 disks, raidz 6 disks, raidz2 6 disks,
stripe of 2 mirrors, stripe of 3 mirrors ...

I'll go stop it, and change to 4G.  Maybe it'll be done tomorrow.  ;-)


Probably even 2G is plenty since that gives 16GB of total file data.

Keep in mind that with file data much larger than memory, these 
benchmarks are testing the hardware more than they are testing 
Solaris.  If you wanted to test Solaris, then you would intentionally 
give it enough memory to work with since that is how it is expected to 
be used.


The performance of Solaris when it is given enough memory to do 
reasonable caching is astounding.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-14 Thread Bob Friesenhahn

On Sun, 14 Feb 2010, Thomas Burgess wrote:


Solaris 10 has a really old version of ZFS.  i know there are some 
pretty big differences in zfs versions from my own non scientific 
benchmarks.  It would make sense that people wouldn't be as 
interested in benchmarks of solaris 10 ZFS seeing as there are 
literally hundreds scattered around the internet.


Can you provide URLs for these useful benchmarks?  I am certainly 
interested in seeing them.


Even my own benchmarks that I posted almost two years ago are quite 
useless now.  Solaris 10 ZFS is a continually moving target.


OpenSolaris performance postings I have seen are not terribly far from 
Solaris 10.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-14 Thread Richard Elling
On Feb 14, 2010, at 6:45 PM, Thomas Burgess wrote:
 
 Whatever.  Regardless of what you say, it does show:
 
 · Which is faster, raidz, or a stripe of mirrors?
 
 · How much does raidz2 hurt performance compared to raidz?
 
 · Which is faster, raidz, or hardware raid 5?
 
 · Is a mirror twice as fast as a single disk for reading?  Is a 3-way 
 mirror 3x faster?  And so on?
 
  
 I’ve seen and heard many people stating answers to these questions, and my 
 results (not yet complete) already answer these questions, and demonstrate 
 that all the previous assertions were partial truths.
 
  
 
 I don't think he was complaining, i think he was sayign he dind't need you to 
 run iosnoop on the old version of ZFS.

iosnoop runs fine on Solaris 10.

I am sorta complaining, though. If you wish to advance ZFS, then use the 
latest bits. If you wish to discover the performance bugs in Solaris 10 that are
already fixed in OpenSolaris, then go ahead, be my guest.  Examples of 
improvements are:
+ intelligent prefetch algorithm is smarter
+ txg commit interval logic is improved
+ ZIL logic improved and added logbias property
+ stat() performance is improved
+ raidz write performance improved and raidz3 added
+ zfs caching improved
+ dedup changes touched many parts of ZFS
+ zfs_vdev_max_pending reduced and smarter
+ metaslab allocation improved
+ zfs write activity doesn't hog resource quite so much
+ a new scheduling class, SDC, added to better observe and manage
   ZFS thread scheduling
+ buffers can be shared between file system modules (fewer copies)

As you can see, so much has changed, hopefully for the better, that running
performance benchmarks on old software just isn't very interesting.

NB. Oracle's Sun OpenStorage systems do not use Solaris 10 and if they did, they
would not be competitive in the market. The notion that OpenSolaris is worthless
and Solaris 10 rules is simply bull*

 Solaris 10 has a really old version of ZFS.  i know there are some pretty big 
 differences in zfs versions from my own non scientific benchmarks.  It would 
 make sense that people wouldn't be as interested in benchmarks of solaris 10 
 ZFS seeing as there are literally hundreds scattered around the internet.
 
 I don't think he was telling you not to bother testing for your own purposes 
 though.

Correct.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-13 Thread Edward Ned Harvey
I have a new server, with 7 disks in it.  I am performing benchmarks on it
before putting it into production, to substantiate claims I make, like
striping mirrors is faster than raidz and so on.  Would anybody like me to
test any particular configuration?  Unfortunately I don't have any SSD, so I
can't do any meaningful test on the ZIL etc.  Unless someone in the Boston
area has a 2.5 SAS SSD they wouldn't mind lending for a few hours.  ;-)

 

My hardware configuration:  Dell PE 2970 with 8 cores.  Normally 32G, but I
pulled it all out to get it down to 4G of ram.  (Easier to benchmark disks
when the file operations aren't all cached.)  ;-)  Solaris 10 10/09.  PERC
6/i controller.  All disks are configured in PERC for Adaptive ReadAhead,
and Write Back, JBOD.  7 disks present, each SAS 15krpm 160G.  OS is
occupying 1 disk, so I have 6 disks to play with.

 

I am currently running the following tests:

 

Will test, including the time to flush(), various record sizes inside file
sizes up to 16G, sequential write and sequential read.  Not doing any mixed
read/write requests.  Not doing any random read/write.

iozone -Reab somefile.wks -g 17G -i 1 -i 0

 

Configurations being tested:

. Single disk

. 2-way mirror

. 3-way mirror

. 4-way mirror

. 5-way mirror

. 6-way mirror

. Two mirrors striped (or concatenated)

. Three mirrors striped (or concatenated)

. 5-disk raidz

. 6-disk raidz

. 6-disk raidz2

 

Hypothesized results:

. N-way mirrors write at the same speed of a single disk

. N-way mirrors read n-times faster than a single disk

. Two mirrors striped read and write 2x faster than a single mirror

. Three mirrors striped read and write 3x faster than a single
mirror

. Raidz and raidz2:  No hypothesis.  Some people say they perform
comparable to many disks working together.  Some people say it's slower than
a single disk.  Waiting to see the results.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-13 Thread Richard Elling
Some thoughts below...

On Feb 13, 2010, at 6:06 AM, Edward Ned Harvey wrote:

 I have a new server, with 7 disks in it.  I am performing benchmarks on it 
 before putting it into production, to substantiate claims I make, like 
 “striping mirrors is faster than raidz” and so on.  Would anybody like me to 
 test any particular configuration?  Unfortunately I don’t have any SSD, so I 
 can’t do any meaningful test on the ZIL etc.  Unless someone in the Boston 
 area has a 2.5” SAS SSD they wouldn’t mind lending for a few hours.  ;-)
  
 My hardware configuration:  Dell PE 2970 with 8 cores.  Normally 32G, but I 
 pulled it all out to get it down to 4G of ram.  (Easier to benchmark disks 
 when the file operations aren’t all cached.)  ;-)  Solaris 10 10/09.  PERC 
 6/i controller.  All disks are configured in PERC for Adaptive ReadAhead, and 
 Write Back, JBOD.  7 disks present, each SAS 15krpm 160G.  OS is occupying 1 
 disk, so I have 6 disks to play with.

Put the memory back in and limit the ARC cache size instead. x86 boxes
have a tendency to change the memory bus speed depending on how much
memory is in the box.

Similarly, you can test primarycache settings rather than just limiting ARC 
size.
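
(For example -- the size and dataset names here are only illustrative -- the
ARC can be capped via /etc/system, which takes effect after a reboot:

  set zfs:zfs_arc_max=0x40000000        (cap ARC at 1 GB)

and caching behaviour can be varied per dataset for a test run with:

  # zfs set primarycache=metadata tank/bench
  # zfs set primarycache=all tank/bench          (back to the default) )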

 I am currently running the following tests:
  
 Will test, including the time to flush(), various record sizes inside file 
 sizes up to 16G, sequential write and sequential read.  Not doing any mixed 
 read/write requests.  Not doing any random read/write.
 iozone -Reab somefile.wks -g 17G -i 1 -i 0

IMHO, sequential tests are a waste of time.  With default configs, it will be 
difficult to separate the raw performance from prefetched performance.
You might try disabling prefetch as an option.
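
(The usual way to do that on Solaris 10 / OpenSolaris of this vintage is the
/etc/system tunable, reboot required:

  set zfs:zfs_prefetch_disable=1 )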

With sync writes, you will run into the zfs_immediate_write_sz boundary.

Perhaps someone else can comment on how often they find interesting 
sequential workloads which aren't backup-related.

 Configurations being tested:
 · Single disk
 · 2-way mirror
 · 3-way mirror
 · 4-way mirror
 · 5-way mirror
 · 6-way mirror
 · Two mirrors striped (or concatenated)
 · Three mirrors striped (or concatenated)
 · 5-disk raidz
 · 6-disk raidz
 · 6-disk raidz2

Please add some raidz3 tests :-)  We have little data on how raidz3 performs.

  
 Hypothesized results:
 · N-way mirrors write at the same speed of a single disk
 · N-way mirrors read n-times faster than a single disk
 · Two mirrors striped read and write 2x faster than a single mirror
 · Three mirrors striped read and write 3x faster than a single mirror
 · Raidz and raidz2:  No hypothesis.  Some people say they perform 
 comparable to many disks working together.  Some people say it’s slower than 
 a single disk.  Waiting to see the results.

Please post results (with raw data would be nice ;-).  If you would be so
kind as to collect samples of iosnoop -Da I would be eternally grateful :-)
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-13 Thread Bob Friesenhahn

On Sat, 13 Feb 2010, Edward Ned Harvey wrote:


Will test, including the time to flush(), various record sizes inside file 
sizes up to 16G,
sequential write and sequential read.  Not doing any mixed read/write 
requests.  Not doing any
random read/write.

iozone -Reab somefile.wks -g 17G -i 1 -i 0


Make sure to also test with a command like

  iozone -m -t 8 -T -O -r 128k -o -s 12G

I am eager to read your test report.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-13 Thread Bob Friesenhahn

On Sat, 13 Feb 2010, Bob Friesenhahn wrote:


Make sure to also test with a command like

 iozone -m -t 8 -T -O -r 128k -o -s 12G


Actually, it seems that this is more than sufficient:

  iozone -m -t 8 -T -r 128k -o -s 4G

since it creates a 4GB test file for each thread, with 8 threads.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-13 Thread Edward Ned Harvey
 IMHO, sequential tests are a waste of time.  With default configs, it will be
 difficult to separate the raw performance from prefetched performance.
 You might try disabling prefetch as an option.

 

Let me clarify:

 

Iozone does a nonsequential series of sequential tests, specifically for the
purpose of identifying the performance tiers, separating the various levels
of hardware accelerated performance from the raw disk performance.

 

This is the reason why I took out all but 4G of the system RAM.  In the
(incomplete) results I have so far, it's easy to see these tiers for a
single disk:  

. For filesizes 0 to 4M, a single disk 
writes 2.8 Gbit/sec and reads ~40-60 Gbit/sec.  
This boost comes from writing to PERC cache, and reading from CPU L2 cache.



. For filesizes 4M to 128M, a single disk 
writes 2.8 Gbit/sec and reads 24 Gbit/sec.  
This boost comes from writing to PERC cache, and reading from system memory.



. For filesizes 128M to 4G, a single disk 
writes 1.2 Gbit/sec and reads 24 Gbit/sec.
This boost comes from reading system memory.



. For filesizes 4G to 16G, a single disk
writes 1.2 Gbit/sec and reads 1.2 Gbit/sec
This is the raw disk performance.  (SAS, 15krpm, 146G disks)
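
For anyone reading the raw spreadsheets, iozone reports throughput in
kBytes/sec; a quick conversion to the Gbit/sec figures above (the 350000 input
is just an example value):

  echo 350000 | nawk '{ printf("%.1f Gbit/sec\n", $1 * 1024 * 8 / 1000000000) }'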

 

 

 Please add some raidz3 tests :-)  We have little data on how raidz3 performs.

 

Does this require a specific version of OS?  I'm on Solaris 10 10/09, and
man zpool doesn't seem to say anything about raidz3 ... I haven't tried
using it ... does it exist?

 

 

 Please post results (with raw data would be nice ;-).  If you would be so
 kind as to collect samples of iosnoop -Da I would be eternally grateful :-)

 

I'm guessing iosnoop is an opensolaris thing?  Is there an equivalent for
solaris?

 

I'll post both the raw results, and my simplified conclusions.  Most people
would not want the raw data.  Most people just want to know "What's the
performance hit I take by using raidz2 instead of raidz?" and so on.

 

Or ... "What's faster, raidz, or hardware raid-5?"

 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-13 Thread Bob Friesenhahn

On Sat, 13 Feb 2010, Edward Ned Harvey wrote:


 kind as to collect samples of iosnoop -Da I would be eternally 
 grateful :-)


I'm guessing iosnoop is an opensolaris thing?  Is there an equivalent for 
solaris?


Iosnoop is part of the DTrace Toolkit by Brendan Gregg, which does 
work on Solaris 10.  See http://www.brendangregg.com/dtrace.html.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance benchmarks in various configurations

2010-02-13 Thread Richard Elling
On Feb 13, 2010, at 10:54 AM, Edward Ned Harvey wrote:
  Please add some raidz3 tests :-)  We have little data on how raidz3
  performs.
  
 Does this require a specific version of OS?  I'm on Solaris 10 10/09, and 
 man zpool doesn't seem to say anything about raidz3 ... I haven't tried 
 using it ... does it exist?

Never mind. I have no interest in performance tests for Solaris 10.
The code is so old, that it does not represent current ZFS at all.
IMHO, if you want to do performance tests, then you need to be
on the very latest dev release.  Otherwise, the results can't be
carried forward to make a difference -- finding performance issues
that are already fixed isn't a good use of your time.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS performance issues over 2.5 years.

2009-12-16 Thread Yariv Graf

Hi,
We have approximately 3 million active users and have a storage capacity of
300 TB in ZFS zpools.
The ZFS is mounted on Sun cluster using 3*T2000 servers connected with FC to
SAN storage.
Each zpool is a LUN in the SAN, which already provides RAID, so we're not doing
raidz on top of it.

We started using ZFS about 2.5 years ago on Solaris10U3 or U4 (I can't
recall).
Our storage growth is roughly 4TB a month.

The zpool sizes range from 2 TB up to the biggest at 32 TB.
We're using ZFS to store mail headers (less than 4k) and attachments (1k to
12mb).

Currently the Sun cluster handles approx. 20K NFS OPS.
• File sizing:
  - 10 million files less than 4K a day.
  - In addition to those 10 million, there are another 10 million of varying sizes:
    - 20% less than 4K
    - 25% between 4K and 8K
    - 50% between 9K and 100K
    - 5% above 100K, up to 12M

Total 20 million new files a day.

We're using two file hierarchies for storing files:

For the mail headers (less than 4K):

/FF/YYMM/DDHH/SS/ABCDEFGH

Explanation:

First directory is for the mount point, from 00..FF (up to 256 directories);
Second directory is year and month;
Third directory is day and hour;
Fourth is seconds;
In the end we have a gzipped file of up to 1K.



For the mail object:
We're using single instancing/de-dup on our application (Meaning no maildir
or mbox).

Mail objects can be 1K up to 12MB.
Directory structure is as follows.

 /FF/FF/FF/FF/FF/FF/FF/FF/FF/file

Explanation:

First directory holds 256 directories 00 to FF and the other directories
hold up to 256 directories, with the lower branches holding fewer
directories than higher branches. At the end of the hierarchy there's a
single file.
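
As an illustration only (the post does not say how the hex directory names are
derived; the sketch below assumes the object path comes from a content hash,
and the message id, file name and depth are placeholders), the two layouts
might be built roughly like this:

  id=ABCDEFGH                              # placeholder message id
  bucket=`echo $id | cut -c1-2`            # 00..FF mount-point bucket (assumed)
  hdr=/$bucket/`date -u +%y%m`/`date -u +%d%H`/`date -u +%S`/$id

  sum=`digest -a md5 /tmp/attachment`      # assumed source of the hex digits
  obj=/`echo $sum | sed 's/../&\//g'`file  # /d4/1d/8c/.../file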


Mail operation:
When a new mail arrives we split it into objects: a header, and each
attachment is an object (even text within a body).
The header files are stored under a "timestamp" path (/FF/YYMM/DDHH/SS/file), so it
may be an advantage for reads: when users read their mail the
same day, the metadata of the directories and files can be in cache.



But this is not the same for the attachments, which are stored
in directories named by their hex value.

Our main issue, or problem, over the last 2.5 years of using ZFS:
When a zpool becomes full, the write operation becomes significantly slower.
At first it happened around 90% zpool capacity and now, after 30-40 zpools,
it happens around 80% capacity.
The meaning of this for us is that if we define a zpool of 4TB, we can use
only 3.2T (82%) effectively.


Is there a "best practice" from Sun/ZFS regarding building directory
hierarchies with a huge number of files (20M a day)? Also, how can we avoid
performance degradation when we reach 80% of zpool capacity?





Regards



Yariv
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6

2009-09-07 Thread John-Paul Drawneek
Final rant on this.

Managed to get the box re-installed and the performance issue has vanished.

So there is a performance bug in zfs some where.

Not sure to put in a bug log as I can't now provide any more information.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6

2009-09-03 Thread John-Paul Drawneek
So I have poked and prodded the disks and they both seem fine.

And yet my rpool is still slow.

Any ideas on what to do now?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6

2009-09-03 Thread Collier Minerich
Please unsubscribe me

COLLIER


-Original Message-
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of John-Paul Drawneek
Sent: Thursday, September 03, 2009 2:13 AM
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] zfs performance cliff when over 80% util, still
occuring when pool in 6

So I have poked and prodded the disks and they both seem fine.

Any yet my rpool is still slow.

Any ideas on what do do now.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6

2009-09-02 Thread John-Paul Drawneek
No joy.

c1t0d0 89 MB/sec
c1t1d0 89 MB/sec
c2t0d0 123 MB/sec
c2t1d0 123 MB/sec

First two are the rpool
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6

2009-09-01 Thread John-Paul Drawneek
I did not migrate my disks.

I now have 2 pools - rpool is at 60% and is still dog slow.

Also scrubbing the rpool causes the box to lock up.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6

2009-09-01 Thread Bob Friesenhahn

On Tue, 1 Sep 2009, John-Paul Drawneek wrote:


i did not migrate my disks.

I now have 2 pools - rpool is at 60% as is still dog slow.

Also scrubbing the rpool causes the box to lock up.


This sounds like a hardware problem and not something related to 
fragmentation.  Probably you have a slow/failing disk.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6

2009-09-01 Thread Bob Friesenhahn

On Tue, 1 Sep 2009, Jpd wrote:


Thanks.

Any idea on how to work out which one?

I can't find SMART tools in IPS, so what other ways are there?


You could try using a script like this one to find pokey disks:

#!/bin/ksh

# Date: Mon, 14 Apr 2008 15:49:41 -0700
# From: Jeff Bonwick jeff.bonw...@sun.com
# To: Henrik Hjort hj...@dhs.nu
# Cc: zfs-discuss@opensolaris.org
# Subject: Re: [zfs-discuss] Performance of one single 'cp'
# 
# No, that is definitely not expected.
# 
# One thing that can hose you is having a single disk that performs

# really badly.  I've seen disks as slow as 5 MB/sec due to vibration,
# bad sectors, etc.  To see if you have such a disk, try my diskqual.sh
# script (below).  On my desktop system, which has 8 drives, I get:
# 
# # ./diskqual.sh

# c1t0d0 65 MB/sec
# c1t1d0 63 MB/sec
# c2t0d0 59 MB/sec
# c2t1d0 63 MB/sec
# c3t0d0 60 MB/sec
# c3t1d0 57 MB/sec
# c4t0d0 61 MB/sec
# c4t1d0 61 MB/sec
# 
# The diskqual test is non-destructive (it only does reads), but to

# get valid numbers you should run it on an otherwise idle system.

disks=`format </dev/null | grep ' c.t' | nawk '{print $2}'`

getspeed1()
{
        # 1024 x 64k = 67.108864 MB read; ptime's "real" seconds give MB/sec
        ptime dd if=/dev/rdsk/${1}s0 of=/dev/null bs=64k count=1024 2>&1 |
            nawk '$1 == "real" { printf("%.0f\n", 67.108864 / $2) }'
}

getspeed()
{
# Best out of 6
for iter in 1 2 3 4 5 6
do
getspeed1 $1
done | sort -n | tail -2 | head -1
}

for disk in $disks
do
echo $disk `getspeed $disk` MB/sec
done


--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 6

2009-08-31 Thread Scott Meilicke
As I understand it, when you expand a pool, the data do not automatically 
migrate to the other disks. You will have to rewrite the data somehow, usually 
a backup/restore.

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs performance cliff when over 80% util, still occuring when pool in 60% ?

2009-08-30 Thread John-Paul Drawneek
Ok had a pool which got full - so performance tanked.

Ran off and got some more disks to create a new pool to put all the extra data.

Got the original pool down to 60% util, but the performance is still bad.

Any ideas on how to get the performance back?

Bad news is that the pool in question is rpool.

And this is really starting to affect my productivity on the box.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance degradation when backups are running

2008-10-01 Thread William D. Hathaway
You might want to also try toggling the Nagle tcp setting to see if that helps 
with your workload:
ndd -get /dev/tcp tcp_naglim_def 
(save that value, default is 4095)
ndd -set /dev/tcp tcp_naglim_def 1

If no (or a negative) difference, set it back to the original value
ndd -set /dev/tcp tcp_naglim_def 4095 (or whatever it was)
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance degradation when backups are running

2008-09-30 Thread gm_sjo
2008/9/30 Jean Dion [EMAIL PROTECTED]:
 iSCSI requires dedicated network and not a shared network or even VLAN.  
 Backup cause large I/O that fill your network quickly.  Like ans SAN today.

Could you clarify why it is not suitable to use VLANs for iSCSI?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance degradation when backups are running

2008-09-30 Thread Jean Dion




Simple. You cannot go faster than the slowest link.

Any VLANs share the bandwidth and do not provide dedicated
bandwidth for each of them. That means if you have multiple VLANs
coming out of the same wire of your server you do not have "n" times the
bandwidth but only a fraction of it. Simple network maths.

Also, iSCSI works better using segregated IP network switches.
Beware that some switches do not guarantee full 1 Gbit/s speed on all
ports when all are active at the same time. Plan multiple uplinks if you
have more than one switch. Once again, you cannot go faster than the
slowest link.

Jean


gm_sjo wrote:
 2008/9/30 Jean Dion [EMAIL PROTECTED]:
  iSCSI requires dedicated network and not a shared network or even VLAN.  Backup cause large I/O that fill your network quickly.  Like ans SAN today.

 Could you clarify why it is not suitable to use VLANs for iSCSI?

-- 




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance degradation when backups are running

2008-09-30 Thread Gary Mills
On Mon, Sep 29, 2008 at 06:01:18PM -0700, Jean Dion wrote:
 Do you have dedicated iSCSI ports from your server to your NetApp?  

Yes, it's a dedicated redundant gigabit network.

 iSCSI requires dedicated network and not a shared network or even VLAN.  
 Backup cause large I/O that fill your network quickly.  Like ans SAN today.
 
 Backup are extremely demanding on hardware (CPU, Mem, I/O ports, disk etc).  
 Not rare to see performance issues during backup with several thousands small 
 files.  Each small file cause seeks to your disk and file system.  
 
 As the number of files and size you will be impact.  That means, thousand of 
 small files cause thousand of small I/O but not a lot of throughput.  

What statistics can I generate to observe this contention?  ZFS pool I/O
statistics are not that different when the backup is running.

 Bigger your file are more likely the block will be consecutive on the file 
 system.  Small file can be spread in the entire file system causing seeks, 
 latency and bottleneck.
 
 Legato client and server contains tuning parameters to avoid such small file 
 problems.  Check your Legato buffer parameters.  These buffer will use your 
 server memory as disk cache.  

I'll ask our backup person to investigate those settings.  I assume that
Networker should not be buffering files since those files won't be read
again.  How can I see memory usage by ZFS and by applications?
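
Two era-appropriate ways to answer that last question, as a sketch (the exact
rows and kstat names vary by release):

  echo ::memstat | mdb -k          # kernel / anon / page-cache breakdown; newer
                                   # builds show a separate "ZFS File Data" row
  kstat -p zfs:0:arcstats:size     # current ARC size in bytes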

 Here is a good source of network tuning parameters for your T2000 
 http://www.solarisinternals.com/wiki/index.php/Networks#Tunable_for_general_workloads_on_T1000.2FT2000
 
 The soft_ring is one of the best one.
 
 Here is another interesting place to look
 http://www.solarisinternals.com/wiki/index.php/Solaris_Internals_and_Performance_FAQ

Thanks.  I'll review those documents.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance degradation when backups are running

2008-09-30 Thread Jean Dion




For Solaris internal debugging tools look here
http://opensolaris.org/os/community/advocacy/events/techdays/seattle/OS_SEA_POD_JMAURO.pdf

ZFS specifics is available here
http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

Jean



Gary Mills wrote:
 On Mon, Sep 29, 2008 at 06:01:18PM -0700, Jean Dion wrote:
  Do you have dedicated iSCSI ports from your server to your NetApp?

 Yes, it's a dedicated redundant gigabit network.

  iSCSI requires dedicated network and not a shared network or even VLAN.  Backup cause large I/O that fill your network quickly.  Like ans SAN today.

  Backup are extremely demanding on hardware (CPU, Mem, I/O ports, disk etc).  Not rare to see performance issues during backup with several thousands small files.  Each small file cause seeks to your disk and file system.

  As the number of files and size you will be impact.  That means, thousand of small files cause thousand of small I/O but not a lot of throughput.

 What statistics can I generate to observe this contention?  ZFS pool I/O
 statistics are not that different when the backup is running.

  Bigger your file are more likely the block will be consecutive on the file system.  Small file can be spread in the entire file system causing seeks, latency and bottleneck.

  Legato client and server contains tuning parameters to avoid such small file problems.  Check your Legato buffer parameters.  These buffer will use your server memory as disk cache.

 I'll ask our backup person to investigate those settings.  I assume that
 Networker should not be buffering files since those files won't be read
 again.  How can I see memory usage by ZFS and by applications?

  Here is a good source of network tuning parameters for your T2000
  http://www.solarisinternals.com/wiki/index.php/Networks#Tunable_for_general_workloads_on_T1000.2FT2000

  The soft_ring is one of the best one.

  Here is another interesting place to look
  http://www.solarisinternals.com/wiki/index.php/Solaris_Internals_and_Performance_FAQ

 Thanks.  I'll review those documents.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance degradation when backups are running

2008-09-30 Thread William D. Hathaway
Gary -
   Besides the network questions...

   What does your zpool status look like?


   Are you using compression on the file systems?
   (Was single-threaded and fixed in s10u4 or equiv patches)
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance degradation when backups are running

2008-09-30 Thread gm_sjo
2008/9/30 Jean Dion [EMAIL PROTECTED]:
 Simple. You cannot go faster than the slowest link.

That is indeed correct, but what is the slowest link when using a
Layer 2 VLAN? You made a broad statement that iSCSI 'requires' a
dedicated, standalone network. I do not believe this is the case.

 Any VLAN share the bandwidth workload and do not provide a dedicated
 bandwidth for each of them.   That means if you have multiple VLAN coming
 out of the same wire of your server you do not have n time the bandwidth
 but only a fraction of it.  Simple network maths.

I can only assume that you are only referring to VLAN trunks, eg using
a NIC on a server for both 'normal' traffic and having another virtual
interface on it bound to a 'storage' VLAN. If this is the case then
what you say is true, of course you are sharing the same physical link
so ultimately that will be the limit.

However, and this should be clarified before anyone gets the wrong
idea, there is nothing wrong with segmenting a switch by using VLANs
to have some ports for storage traffic and some ports for 'normal'
traffic. You can have one/multiple NIC(s) for storage, and
another/multiple NIC(s) for everything else (or however you please to
use your interfaces!). These can be hooked up to switch ports that are
on different physical VLANs with no performance degradation.

It's best not to assume that every use of a VLAN is a trunk.

 Also iSCSI works better by using segregated IP network switches.  Beware
 that some switches do not guaranty full 1Gbits speed on all ports when all
 active at the same time.   Plan multiple uplinks if you have more than one
 switch. Once again you cannot go faster than the slowest link.

I think it's fairly safe to assume that you're going to get per-port
line-speed across anything other than the cheapest budget switches.
Most SMB (and above) switches will be rated at say 48gbit/sec
backplane on a 24 port item, for example.

However, I am keen to see any benchmarks you may have that show the
performance difference between running a single switch with layer 2
VLANs vs. two separate switches.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance degradation when backups are running

2008-09-30 Thread Jean Dion




A normal iSCSI setup splits network traffic at the physical layer, not
the logical layer. That means dedicated physical ports and, if you can,
a separate physical PCI bridge chip. That will be fine for small traffic,
but we are talking about backup performance issues. The IP network and
the number of small files are very often the bottlenecks.

If you want performance you do not put all your I/O across the same
physical wire. Once again, you cannot go faster than the physical wire
can support (CAT5E, CAT6, fibre), no matter whether it is layer 2 or not.
Using VLANs on a single port you "share" the bandwidth; layer 2 does not
create more Gbit/s.

iSCSI best practice requires a separate physical network. Many books and
white papers have been written about this.

This is like any FC SAN implementation. We always split the workload
between disk and tape using more than one HBA. Never forget, backups
are intensive I/O and will fill the entire I/O path.

Jean


gm_sjo wrote:
 2008/9/30 Jean Dion [EMAIL PROTECTED]:
  Simple. You cannot go faster than the slowest link.

 That is indeed correct, but what is the slowest link when using a
 Layer 2 VLAN? You made a broad statement that iSCSI 'requires' a
 dedicated, standalone network. I do not believe this is the case.

  Any VLAN share the bandwidth workload and do not provide a dedicated
  bandwidth for each of them.   That means if you have multiple VLAN coming
  out of the same wire of your server you do not have "n" time the bandwidth
  but only a fraction of it.  Simple network maths.

 I can only assume that you are only referring to VLAN trunks, eg using
 a NIC on a server for both 'normal' traffic and having another virtual
 interface on it bound to a 'storage' VLAN. If this is the case then
 what you say is true, of course you are sharing the same physical link
 so ultimately that will be the limit.

 However, and this should be clarified before anyone gets the wrong
 idea, there is nothing wrong with segmenting a switch by using VLANs
 to have some ports for storage traffic and some ports for 'normal'
 traffic. You can have one/multiple NIC(s) for storage, and
 another/multiple NIC(s) for everything else (or however you please to
 use your interfaces!). These can be hooked up to switch ports that are
 on different physical VLANs with no performance degredation.

 It's best not to assume that every use of a VLAN is a trunk.

  Also iSCSI works better by using segregated IP network switches.  Beware
  that some switches do not guaranty full 1Gbits speed on all ports when all
  active at the same time.   Plan multiple uplinks if you have more than one
  switch. Once again you cannot go faster than the slowest link.

 I think it's fairly safe to assume that you're going to get per-port
 line-speed across anything other than the cheapest budget switches.
 Most SMB (and above) switches will be rated at say 48gbit/sec
 backplane on a 24 port item, for example.

 However, I am keen to see any benchmarks you may have that shows the
 performance difference between running a single switch with layer 2
 vlans Vs. two seperate switches.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance degradation when backups are running

2008-09-30 Thread Gary Mills
On Tue, Sep 30, 2008 at 10:32:50AM -0700, William D. Hathaway wrote:
 Gary -
Besides the network questions...

Yes, I suppose I should see if traffic on the Iscsi network is
hitting a limit of some sort.

What does your zpool status look like?

Pretty simple:

  $ zpool status
    pool: space
   state: ONLINE
   scrub: none requested
  config:

          NAME                                       STATE     READ WRITE CKSUM
          space                                      ONLINE       0     0     0
            c4t60A98000433469764E4A2D456A644A74d0    ONLINE       0     0     0
            c4t60A98000433469764E4A2D456A696579d0    ONLINE       0     0     0
            c4t60A98000433469764E4A476D2F6B385Ad0    ONLINE       0     0     0
            c4t60A98000433469764E4A476D2F664E4Fd0    ONLINE       0     0     0

  errors: No known data errors

The four LUNs use the built-in I/O multipathing, with separate Iscsi
networks, switches, and ethernet interfaces.

Are you using compression on the file systems?
(Was single-threaded and fixed in s10u4 or equiv patches)

No, I've never enabled compression there.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance degradation when backups are running

2008-09-30 Thread Gary Mills
On Mon, Sep 29, 2008 at 06:01:18PM -0700, Jean Dion wrote:
 
 Legato client and server contains tuning parameters to avoid such small file 
 problems.  Check your Legato buffer parameters.  These buffer will use your 
 server memory as disk cache.  

Our backup person tells me that there are no settings in Networker
that affect buffering on the client side.

 Here is a good source of network tuning parameters for your T2000 
 http://www.solarisinternals.com/wiki/index.php/Networks#Tunable_for_general_workloads_on_T1000.2FT2000
 
 The soft_ring is one of the best one.

Those references are for network tuning.  I don't want to change
things blindly.  How do I tell if they are necessary, that is if
the network is the bottleneck in the I/O system?
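
One low-risk way to check before tuning anything (a sketch, using only stock
tools) is to watch the iSCSI interfaces and the pool side by side while a
backup runs and see which saturates first:

  netstat -i 5           # per-interface packet rates; compare against wire speed
  iostat -xnz 5          # per-LUN service times and %b for the iSCSI devices
  zpool iostat space 5   # pool-level bandwidth over the same interval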

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance degradation when backups are running

2008-09-30 Thread gm_sjo
2008/9/30 Jean Dion [EMAIL PROTECTED]:
 If you want performance you do not put all your I/O across the same physical
 wire.  Once again you cannot go faster than the physical wire can support
 (CAT5E, CAT6, fibre).  No matter if it is layer 2 or not. Using VLAN on
 single port you share the bandwidth and not creating more Gbits speed with
 Layer 2.

 iSCSI best practice require separated physical network. Many books, white
 papers are written about this.

Yes, that's true, but I don't believe you mentioned single NIC
implementations in your original statement. Just seeking clarification
to help others :-)

I think it's worth clarifying that iSCSI over VLANs is okay as long as
people appreciate you will require separate interfaces to get the best
performance.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance degradation when backups are running

2008-09-30 Thread Richard Elling
gm_sjo wrote:
 2008/9/30 Jean Dion [EMAIL PROTECTED]:
   
 If you want performance you do not put all your I/O across the same physical
 wire.  Once again you cannot go faster than the physical wire can support
 (CAT5E, CAT6, fibre).  No matter if it is layer 2 or not. Using VLAN on
 single port you share the bandwidth and not creating more Gbits speed with
 Layer 2.

 iSCSI best practice require separated physical network. Many books, white
 papers are written about this.
 

 Yes, that's true, but I don't believe you mentioned single NIC
 implementations in your original statement. Just seeking clarification
 to help others :-)

 I think it's worth clarifying that iSCSI and VLANs is okay as long as
 people appreciate you will require seperate interfaces to get best
 performance.
   

Separate interfaces or networks may not be required, but properly sized
networks are highly desirable.  For example, a back-of-the-envelope analysis
shows that a single 10GbE pipe is sufficient to drive 8 T10KB drives.
 -- richard
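
Roughly, assuming ~120 MB/s native per T10000B drive (an assumption, not a
figure from this thread):

  8 drives x 120 MB/s = 960 MB/s ~ 7.7 Gbit/s

which fits under the ~10 Gbit/s (about 1.25 GB/s) a single 10GbE link carries,
before any compression on the drives.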

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance degradation when backups are running

2008-09-29 Thread Jean Dion
Do you have dedicated iSCSI ports from your server to your NetApp?  

iSCSI requires a dedicated network, not a shared network or even a VLAN.  Backups
cause large I/O that fills your network quickly, like any SAN today.

Backups are extremely demanding on hardware (CPU, memory, I/O ports, disks, etc.).
It is not rare to see performance issues during backups with several thousand small
files.  Each small file causes seeks on your disks and file system.

As the number of files and their sizes grow, you will be impacted.  That means thousands of
small files cause thousands of small I/Os but not a lot of throughput.

The bigger your files are, the more likely the blocks will be consecutive on the file
system.  Small files can be spread across the entire file system, causing seeks,
latency and bottlenecks.

The Legato client and server contain tuning parameters to avoid such small-file
problems.  Check your Legato buffer parameters.  These buffers will use your
server memory as disk cache.

Here is a good source of network tuning parameters for your T2000:
http://www.solarisinternals.com/wiki/index.php/Networks#Tunable_for_general_workloads_on_T1000.2FT2000

The soft_ring setting is one of the best ones.

Here is another interesting place to look
http://www.solarisinternals.com/wiki/index.php/Solaris_Internals_and_Performance_FAQ


Jean Dion
Storage Architect 
Data Management Ambassador
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS performance degradation when backups are running

2008-09-27 Thread Gary Mills
We have a moderately sized Cyrus installation with 2 TB of storage
and a few thousand simultaneous IMAP sessions.  When one of the
backup processes is running during the day, there's a noticeable
slowdown in IMAP client performance.  When I start my `mutt' mail
reader, it pauses for several seconds at `Selecting INBOX'.  That
behavior disappears when the backup finishes.

The IMAP server is a T2000 with six ZFS filesystems that correspond to
Cyrus partitions.  Storage is four Iscsi LUNs from our Netapp filer.
The backup in question is done with EMC Networker.  I've looked at
zpool I/O statistics when the backup is running, but there's nothing
clearly wrong.

I'm wondering if perhaps all the read activity by the backup system
is causing trouble with ZFS' caching.  Is there some way to examine
this area?
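
The ARC counters are the usual place to start looking; a minimal sketch (kstat
is standard on Solaris 10, while the arcstat.pl script is a separate download):

  kstat -p zfs:0:arcstats 5                             # all ARC counters every 5 seconds
  kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses    # cumulative hit/miss totals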

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS-Performance: Raid-Z vs. Raid5/6 vs. mirrored

2008-06-23 Thread Richard Elling
Ralf Bertling wrote:
 Hi list,
 as this matter pops up every now and then in posts on this list I just 
 want to clarify that the real performance of RaidZ (in its current 
 implementation) is NOT anything that follows from raidz-style data 
 efficient redundancy or the copy-on-write design used in ZFS.

 In a M-Way mirrored setup of N disks you get the write performance of 
 the worst disk and a read performance that is the sum of all disks 
 (for streaming and random workloads, while latency is not improved)
 Apart from the write performance you get very bad disk utilization 
 from that scenario.

I beg to differ with "very bad disk utilization."  IMHO you get perfect
disk utilization for M-way redundancy :-)

 In Raid-Z currently we have to distinguish random reads from streaming 
 reads:
 - Write performance (with COW) is (N-M)*worst single disk write 
 performance since all writes are streaming writes by design of ZFS 
 (which is N-M-1 times faste than mirrored)
 - Streaming read performance is N*worst read performance of a single 
 disk (which is identical to mirrored if all disks have the same speed)
 - The problem with the current implementation is that N-M disks in a 
 vdev are currently taking part in reading a single byte from a it, 
 which i turn results in the slowest performance of N-M disks in question.

You will not be able to predict real-world write or sequential
read performance with this simple analysis because there are
many caches involved.  The caching effects will dominate for
many cases.  ZFS actually works well with write caches, so
it will be doubly difficult to predict write performance.

You can predict small, random read workload performance,
though, because you can largely discount the caching effects
for most scenarios, especially JBODs.
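
A back-of-the-envelope version of that prediction, under the simplifying
assumptions used in this thread (each raidz/raidz2 vdev behaves like a single
spindle for small random reads, a mirror can serve each read from any copy,
and caches are ignored; the 150 IOPS per disk is just an example figure):

  pool of V vdevs, D disks per vdev:
    mirrors:        ~ V * D * 150 IOPS    e.g. 3 x 2-way mirrors -> ~900 IOPS
    raidz/raidz2:   ~ V * 150 IOPS        e.g. 1 x 6-disk raidz  -> ~150 IOPS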


 Now lets see if this really has to be this way (this implies no, 
 doesn't it ;-)
 When reading small blocks of data (as opposed to streams discussed 
 earlier) the requested data resides on a single disk and thus reading 
 it does not require to send read commands to all disks in the vdev. 
 Without detailed knowledge of the ZFS code, I suspect the problem is 
 the logical block size of any ZFS operation always uses the full 
 stripe. If true, I think this is a design error.

No, the reason is that the block is checksummed and we check
for errors upon read by verifying the checksum.  If you search
the zfs-discuss archives you will find this topic arises every 6
months or so.  Here is a more interesting thread on the subject,
dated November 2006:
http://mail.opensolaris.org/pipermail/zfs-discuss/2006-November/035711.html

You will also note that for fixed record length workloads, we
tend to recommend the blocksize be matched with the ZFS
recordsize. This will improve efficiency for reads, in general.

 Without that, random reads to a raid-z are almost as fast as mirrored 
 data. 
 The theoretical disadvantages come from disks that have different 
 speed (probably insignificant in any real-life scenario) and the 
 statistical probability that by chance a few particular random reads 
 do in fact have to access the same disk drive to be fulfilled. (In a 
 mirrored setup, ZFS can choose from all idle devices, whereas in 
 RAID-Z it has to wait for the disk that holds the data to be ready 
 processing its current requests).
 Looking more closely, this effect mostly affects latency (not 
 performance) as random read-requests coming in should be distributed 
 equally across all devices even bette if the queue of requests gets 
 longer (this would however require ZFS to reorder requests for 
 maximum performance.

ZFS does re-order I/O.  Array controllers re-order the re-ordered
I/O. Disks then re-order I/O, just to make sure it was re-ordered
again. So it is also difficult to develop meaningful models of disk
performance in these complex systems.


 Since this seems to be a real issue for many ZFS users, it would be 
 nice if someone who has more time than me to look into the code, can 
 comment on the amount of work required to boost RaidZ read performance.

Periodically, someone offers to do this... but I haven't seen an
implementation.


 Doing so would level the tradeoff between read- write- performance and 
 disk utilization significantly.
 Obviously if disk space (and resulting electricity costs) do not 
 matter compared to getting maximum read performance, you will always 
 be best of with 3 or even more way mirrors and a very large number of 
 vdevs in your pool.

Space, performance, reliability: pick two.

sidebar
The ZFS checksum has proven to be very effective at
identifying data corruption in systems.  In a traditional
RAID-5 implementation, like SVM, the data is assumed
to be correct if the read operation returned without an
error. If you try to make SVM more reliable by adding a
checksum, then you will end up at approximately the
same place ZFS is: by distrusting the hardware you take
a performance penalty, but improve your data integrity.

[zfs-discuss] ZFS-Performance: Raid-Z vs. Raid5/6 vs. mirrored

2008-06-22 Thread Ralf Bertling

Hi list,
as this matter pops up every now and then in posts on this list I just  
want to clarify that the real performance of RaidZ (in its current  
implementation) is NOT anything that follows from raidz-style data  
efficient redundancy or the copy-on-write design used in ZFS.


In a M-Way mirrored setup of N disks you get the write performance of  
the worst disk and a read performance that is the sum of all disks  
(for streaming and random workloads, while latency is not improved)
Apart from the write performance you get very bad disk utilization  
from that scenario.


In Raid-Z currently we have to distinguish random reads from streaming  
reads:
- Write performance (with COW) is (N-M)*worst single disk write  
performance since all writes are streaming writes by design of ZFS  
(which is N-M-1 times faster than mirrored)
- Streaming read performance is N*worst read performance of a single  
disk (which is identical to mirrored if all disks have the same speed)
- The problem with the current implementation is that N-M disks in a  
vdev are currently taking part in reading a single byte from it,  
which in turn results in the slowest performance of the N-M disks in  
question.


Now let's see if this really has to be this way (this implies "no",  
doesn't it ;-)
When reading small blocks of data (as opposed to streams discussed  
earlier) the requested data resides on a single disk and thus reading  
it does not require sending read commands to all disks in the vdev.  
Without detailed knowledge of the ZFS code, I suspect the problem is  
the logical block size of any ZFS operation always uses the full  
stripe. If true, I think this is a design error.
Without that, random reads to a raid-z are almost as fast as mirrored  
data.
The theoretical disadvantages come from disks that have different  
speed (probably insignificant in any real-life scenario) and the  
statistical probability that by chance a few particular random reads  
do in fact have to access the same disk drive to be fulfilled. (In a  
mirrored setup, ZFS can choose from all idle devices, whereas in RAID- 
Z it has to wait for the disk that holds the data to be ready  
processing its current requests).
Looking more closely, this effect mostly affects latency (not  
performance) as random read-requests coming in should be distributed  
equally across all devices, even better if the queue of requests gets  
longer (this would however require ZFS to reorder requests for maximum  
performance).


Since this seems to be a real issue for many ZFS users, it would be  
nice if someone who has more time than me to look into the code, can  
comment on the amount of work required to boost RaidZ read performance.


Doing so would level the tradeoff between read- write- performance and  
disk utilization significantly.
Obviously if disk space (and resulting electricity costs) do not  
matter compared to getting maximum read performance, you will always  
be best off with 3-way or even wider mirrors and a very large number of  
vdevs in your pool.


A further question that springs to mind is if copies=N is also used to  
improve read performance. If so, you could have some read-optimized  
filesystems in a pool while others use maximum storage efficiency (as  
for backups).


Regards,
ralf
--
Ralf Bertling
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

