Re: [zfs-discuss] Apple Time Machine

2006-08-08 Thread Robert Gordon


On Aug 8, 2006, at 12:34 AM, Darren J Moffat wrote:


Adam Leventhal wrote:
Needless to say, this was a pretty interesting piece of the keynote from a
technical point of view that had quite a few of us scratching our heads.
After talking to some Apple engineers, it seems like what they're doing is
more or less this:

When a file is modified, the kernel fires off an event which a user-land
daemon listens for. Every so often, the user-land daemon does something
like a snapshot of the affected portions of the filesystem with hard links
(including hard links to directories -- I'm not making this up). That
might be a bit off, but it's the impression I was left with.


Which sounds very similar to how NTFS does single instance storage  
and some other things.


The interesting thing here is that this means HFS+ and NTFS both  
have a file event monitoring framework that is exposed up into  
userland.  This is something that would be VERY useful for  
OpenSolaris, particularly if we could do it at the VFS layer.


Anyhow, very slick UI, sort of dubious back end, interesting possibility
for integration with ZFS.


:-)  Which is the opposite of what we tend to do, slick backend and no
GUI and an integration challenge on the CLI :-)


Both FreeBSD[1] and Apple[2] (of course) use kqueue for file event
notifications. OpenSolaris's FEM [3] starter-kit would be an interesting
place to visit and build upon..

-- Robert.

[1]: http://people.freebsd.org/~jlemon/papers/kqueue.pdf
[2]: http://developer.apple.com/documentation/Darwin/Reference/ManPages/man2/kqueue.2.html
[3]: http://cvs.opensolaris.org/source/xref/on/usr/src/uts/common/sys/fem.h

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Apple Time Machine

2006-08-08 Thread Tim Foster

Bryan Cantrill wrote:

So in short (and brace yourself, because I
know it will be a shock):  mentions by executives in keynotes don't always
accurately represent a technology.  DynFS, anyone?  ;)


I'm shocked and stunned, and not a little amazed!

I'll bet the OpenSolaris PPC guys are thrilled at the prospect of DTrace on
their platform.


cheers,
tim
--
Tim Foster, Sun Microsystems Inc, Operating Platforms Group
Engineering Operations    http://blogs.sun.com/timf
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] zil_disable

2006-08-08 Thread Robert Milkowski
Hello Eric,

Monday, August 7, 2006, 6:29:45 PM, you wrote:

ES Robert -

ES This isn't surprising (either the switch or the results).  Our long term
ES fix for tweaking this knob is:

ES 6280630 zil synchronicity

ES Which would add 'zfs set sync' as a per-dataset option.  A cut from the
ES comments (which aren't visible on opensolaris):

ES sync={deferred,standard,forced}

ES Controls synchronous semantics for the dataset.
ES 
ES When set to 'standard' (the default), synchronous
ES operations such as fsync(3C) behave precisely as defined
ES in fcntl.h(3HEAD).

ES When set to 'deferred', requests for synchronous
ES semantics are ignored.  However, ZFS still guarantees
ES that ordering is preserved -- that is, consecutive
ES operations reach stable storage in order.  (If a thread
ES performs operation A followed by operation B, then the
ES moment that B reaches stable storage, A is guaranteed to
ES be on stable storage as well.)  ZFS also guarantees that
ES all operations will be scheduled for write to stable
ES storage within a few seconds, so that an unexpected
ES power loss only takes the last few seconds of change
ES with it.

ES When set to 'forced', all operations become synchronous.
ES No operation will return until all previous operations
ES have been committed to stable storage.  This option can
ES be useful if an application is found to depend on
ES synchronous semantics without actually requesting them;
ES otherwise, it will just make everything slow, and is not
ES recommended.

ES There was a thread describing the usefulness of this (for builds that are
ES all-or-nothing over a long period of time), but I can't find it.

I remember the thread. Do you know if anyone is currently working on
it, and when it is expected to be integrated into snv?
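
For reference: until 6280630 integrates, the only knob is the global
zil_disable tunable, which hits every dataset on the host rather than being
per-dataset like the proposed 'zfs set sync'. A minimal sketch of how it is
usually flipped, assuming the stock /etc/system and mdb mechanisms:

// persistent - add to /etc/system and reboot
set zfs:zil_disable = 1

// or poke the running kernel as root; picked up on the next mount
# echo 'zil_disable/W 1' | mdb -kw

// revert
# echo 'zil_disable/W 0' | mdb -kw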

-- 
Best regards,
 Robert    mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] zil_disable

2006-08-08 Thread Robert Milkowski
Hello Neil,

Monday, August 7, 2006, 6:40:01 PM, you wrote:

NP Not quite, zil_disable is inspected on file system mounts.

I guess you're right that umount/mount will suffice - I just hadn't had time
to check it, and export/import worked.

Anyway, is there a way for file systems to make it active without
unmount/mount in current Nevada?

NP It's also looked at dynamically on every write for zvols.

Good to know, thank you.
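
So for file systems the sequence would be something like this (a sketch; the
dataset name pool/fs is just a placeholder):

// set the tunable, then remount so it is re-read
# echo 'zil_disable/W 1' | mdb -kw
# zfs umount pool/fs
# zfs mount pool/fs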


-- 
Best regards,
 Robert    mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID

2006-08-08 Thread Robert Milkowski
Hello Richard,

Monday, August 7, 2006, 6:54:37 PM, you wrote:

RE Hi Robert, thanks for the data.
RE Please clarify one thing for me.
RE In the case of the HW raid, was there just one LUN?  Or was it 12 LUNs?

Just one LUN, which was built on the 3510 from 12 LUNs in RAID-1(0).


-- 
Best regards,
 Robert    mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] ZFS/Thumper experiences

2006-08-08 Thread Robert Milkowski
Hello David,

Tuesday, August 8, 2006, 3:39:42 AM, you wrote:

DJO Thanks, interesting read. It'll be nice to see the actual
DJO results if Sun ever publishes them. 

You can bet I'll post some results, hopefully soon :)

-- 
Best regards,
 Robert    mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Querying ZFS version?

2006-08-08 Thread Luke Scharf
Although regular Solaris is good for what I'm doing at work, I prefer 
apt-get or yum for package management for a desktop.  So, I've been 
playing with Nexenta / GnuSolaris -- which appears to be the 
open-sourced Solaris kernel and low-level system utilities with Debian 
package management -- and a bunch of packages from Ubuntu.


The release I'm playing with (Alpha 5) does, indeed, have ZFS.  However, 
I can't determine what version of ZFS is included.  Dselect gives the 
following information, which doesn't ring any bells for me:

*** Req base sunwzfsr 5.11.40-1   5.11.40-1   ZFS (Root)

Is there a zfs version command that I don't see?

Thanks,
-Luke

--
Luke Scharf
Virginia Tech Unix Administration Services
Terascale Computing Facility



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Querying ZFS version?

2006-08-08 Thread Darren Reed

Luke Scharf wrote:

Although regular Solaris is good for what I'm doing at work, I prefer 
apt-get or yum for package management for a desktop.  So, I've been 
playing with Nexenta / GnuSolaris -- which appears to be the 
open-sourced Solaris kernel and low-level system utilities with Debian 
package management -- and a bunch of packages from Ubuntu.


The release I'm playing with (Alpha 5) does, indeed, have ZFS.  
However, I can't determine what version of ZFS is included.  Dselect 
gives the following information, which doesn't ring any bells for me:

 *** Req base sunwzfsr 5.11.40-1   5.11.40-1   ZFS (Root)


On Solaris,

pkginfo -l SUNWzfsr

would give you a package version for that part of ZFS..
and modinfo | grep zfs will tell you something about the kernel module 
rev.
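
If the usual package split is present, the companion packages can be checked
the same way - a sketch, assuming the standard SUNWzfsr/SUNWzfsu/SUNWzfskr
naming:

// userland root, userland /usr and kernel pieces respectively
# pkginfo -l SUNWzfsr SUNWzfsu SUNWzfskr | egrep 'PKGINST|VERSION'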


Darren

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Querying ZFS version?

2006-08-08 Thread George Wilson

Luke,

You can run 'zpool upgrade' to see what on-disk version you are capable 
of running. If you have the latest features then you should be running 
version 3:


hadji-2# zpool upgrade
This system is currently running ZFS version 3.

Unfortunately this won't tell you if you are running the latest fixes 
but it does tell you that you have all the latest features (at least up 
through snv_43).
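
Two related invocations may also help here (a quick sketch using the stock
zpool subcommands):

// list the on-disk versions this ZFS supports and what each one added
# zpool upgrade -v

// upgrade every imported pool to the newest supported version
// (one-way: older bits will no longer be able to import the pools)
# zpool upgrade -a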


Thanks,
George

Luke Scharf wrote:
Although regular Solaris is good for what I'm doing at work, I prefer 
apt-get or yum for package management for a desktop.  So, I've been 
playing with Nexenta / GnuSolaris -- which appears to be the 
open-sourced Solaris kernel and low-level system utilities with Debian 
package management -- and a bunch of packages from Ubuntu.


The release I'm playing with (Alpha 5) does, indeed, have ZFS.  However, 
I can't determine what version of ZFS is included.  Dselect gives the 
following information, which doesn't ring any bells for me:

*** Req base sunwzfsr 5.11.40-1   5.11.40-1   ZFS (Root)

Is there a zfs version command that I don't see?

Thanks,
-Luke




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Querying ZFS version?

2006-08-08 Thread Luke Scharf




George Wilson wrote:
Luke,
  
  
You can run 'zpool upgrade' to see what on-disk version you are capable
of running. If you have the latest features then you should be running
version 3:
  
  
hadji-2# zpool upgrade
  
This system is currently running ZFS version 3.
  
  
Unfortunately this won't tell you if you are running the latest fixes
but it does tell you that you have all the latest features (at least up
through snv_43).
  


That works; the Nexenta system says:

  [EMAIL PROTECTED]:~# zpool upgrade
This system is currently running ZFS version 2.

All pools are formatted using this version.
  

Which is the same as my Solaris x86 6/06 test-machine.

Thanks!
-Luke







___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Querying ZFS version?

2006-08-08 Thread Luke Scharf

Darren Reed wrote:

On Solaris,

pkginfo -l SUNWzfsr

would give you a package version for that part of ZFS..
and modinfo | grep zfs will tell you something about the kernel 
module rev.
No such luck.  Modinfo doesn't show the ZFS module as loaded; that's 
probably because I'm not running anything with ZFS on the machine at the 
moment.


No pkginfo on this system, which I think is part of the point of the 
distribution -- one package manager to rule them all.  Also, dselect / 
apt-get just has a one-line description that says "ZFS root components".  
Not really useful, even if you know what ZFS is - is "root components" 
those components used by the user root?  Or is it for putting the root 
partition on ZFS?  I'm assuming the former -- but the statement is quite 
ambiguous.
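
On the Debian-managed side, something like this may squeeze a little more
detail out of the packaging - a sketch, assuming Nexenta ships the usual
dpkg/apt tools and keeps the sunwzfsr name shown above:

// full package record for the ZFS root package, version included
# dpkg -s sunwzfsr

// every zfs-related package the package manager knows about
# dpkg -l | grep -i zfs
# apt-cache search zfs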


But "zpool upgrade" gives me an idea of what featureset to expect, 
which is what I was aiming for at this point.


Thanks,
-Luke



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] DTrace IO provider and oracle

2006-08-08 Thread przemolicc
Hello,

Solaris 10 GA + latest recommended patches:

while runing dtrace:

bash-3.00# dtrace -n 'io:::start { @[execname, args[2]->fi_pathname] = count(); }'
...
  vim       /zones/obsdb3/root/opt/sfw/bin/vim          296
  tnslsnr   none                                       2373
  fsflush   none                                       2952
  sched     none                                       9949
  ar60run   none                                      13590
  RACUST    none                                      39252
  RAXTRX    none                                      39789
  RAXMTR    none                                      40671
  FNDLIBR   none                                      64956
  oracle    none                                    2096052

How can I interpret 'none'? Is it possible to get the full path (like for vim)?
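
For what it's worth, a variant that might narrow it down - it assumes only the
stock io-provider members fi_fs and dev_statname - shows which filesystem type
and device the I/O with no pathname is hitting; a missing pathname usually
means the I/O is not associated with a regular file (raw/character devices,
zvols, or filesystem metadata):

# dtrace -n 'io:::start { @[execname, args[2]->fi_fs, args[1]->dev_statname] = count(); }'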

Regards
przemol
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zil_disable

2006-08-08 Thread Neil Perrin

Robert Milkowski wrote:

Hello Neil,

Monday, August 7, 2006, 6:40:01 PM, you wrote:

NP Not quite, zil_disable is inspected on file system mounts.

I guess you're right that umount/mount will suffice - I just hadn't had time
to check it, and export/import worked.

Anyway, is there a way for file systems to make it active without
unmount/mount in current Nevada?


No, sorry.

Neil
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zil_disable

2006-08-08 Thread Neil Perrin

Robert Milkowski wrote:

Hello Eric,

Monday, August 7, 2006, 6:29:45 PM, you wrote:

ES Robert -

ES This isn't surprising (either the switch or the results).  Our long term
ES fix for tweaking this knob is:

ES 6280630 zil synchronicity

ES Which would add 'zfs set sync' as a per-dataset option.  A cut from the
ES comments (which aren't visible on opensolaris):

ES sync={deferred,standard,forced}

ES Controls synchronous semantics for the dataset.
ES 
ES When set to 'standard' (the default), synchronous

ES operations such as fsync(3C) behave precisely as defined
ES in fcntl.h(3HEAD).

ES When set to 'deferred', requests for synchronous
ES semantics are ignored.  However, ZFS still guarantees
ES that ordering is preserved -- that is, consecutive
ES operations reach stable storage in order.  (If a thread
ES performs operation A followed by operation B, then the
ES moment that B reaches stable storage, A is guaranteed to
ES be on stable storage as well.)  ZFS also guarantees that
ES all operations will be scheduled for write to stable
ES storage within a few seconds, so that an unexpected
ES power loss only takes the last few seconds of change
ES with it.

ES When set to 'forced', all operations become synchronous.
ES No operation will return until all previous operations
ES have been committed to stable storage.  This option can
ES be useful if an application is found to depend on
ES synchronous semantics without actually requesting them;
ES otherwise, it will just make everything slow, and is not
ES recommended.

ES There was a thread describing the usefulness of this (for builds that are
ES all-or-nothing over a long period of time), but I can't find it.

I remember the thread. Do you know if anyone is currently working on
it, and when it is expected to be integrated into snv?


I'm slated to work on it after I finish up some other ZIL bugs and performance
fixes.

Neil
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID

2006-08-08 Thread Robert Milkowski
Hi.

  This time, some RAID5/RAID-Z benchmarks.
I connected the 3510 head unit with one link to the same server that the 3510
JBODs are connected to (using a second link). snv_44 is used, the server is a v440.

I also tried changing the max pending IO requests for the HW raid5 lun and checked
with DTrace that the larger value is really used - it is, but it doesn't change the
benchmark numbers.
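
For anyone repeating that tweak - a minimal sketch, assuming the lun sits
behind the ssd driver so the relevant knob is ssd_max_throttle; the exact
tunable name depends on the HBA/driver stack:

// /etc/system, then reboot; caps outstanding commands queued per lun
set ssd:ssd_max_throttle = 64

// or change it on the running kernel as root
# echo 'ssd_max_throttle/W 0t64' | mdb -kw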


1. ZFS on HW RAID5 with 6 disks, atime=off
IO Summary:  444386 ops 7341.7 ops/s, (1129/1130 r/w)  36.1mb/s,297us 
cpu/op,   6.6ms latency
IO Summary:  438649 ops 7247.0 ops/s, (1115/1115 r/w)  35.5mb/s,293us 
cpu/op,   6.7ms latency

2. ZFS with software RAID-Z with 6 disks, atime=off
IO Summary:  457505 ops 7567.3 ops/s, (1164/1164 r/w)  37.2mb/s,340us 
cpu/op,   6.4ms latency
IO Summary:  457767 ops 7567.8 ops/s, (1164/1165 r/w)  36.9mb/s,340us 
cpu/op,   6.4ms latency

3. UFS on HW RAID5 with 6 disks, noatime
IO Summary:  62776 ops 1037.3 ops/s, (160/160 r/w)   5.5mb/s,481us 
cpu/op,  49.7ms latency
IO Summary:  63661 ops 1051.6 ops/s, (162/162 r/w)   5.4mb/s,477us 
cpu/op,  49.1ms latency

4. UFS on HW RAID5 with 6 disks, noatime, S10U2 + patches (the same filesystem 
mounted as in 3)
IO Summary:  393167 ops 6503.1 ops/s, (1000/1001 r/w)  32.4mb/s,405us 
cpu/op,   7.5ms latency
IO Summary:  394525 ops 6521.2 ops/s, (1003/1003 r/w)  32.0mb/s,407us 
cpu/op,   7.7ms latency

5. ZFS with software RAID-Z with 6 disks, atime=off, S10U2 + patches (the same 
disks as in test #2)
IO Summary:  461708 ops 7635.5 ops/s, (1175/1175 r/w)  37.4mb/s,330us 
cpu/op,   6.4ms latency
IO Summary:  457649 ops 7562.1 ops/s, (1163/1164 r/w)  37.0mb/s,328us 
cpu/op,   6.5ms latency


In this benchmark software raid-5 with ZFS (raid-z to be precise) gives a 
little bit better performance than hardware raid-5. ZFS is also faster in both 
cases (HW and SW raid) than UFS on HW raid.

Something is wrong with UFS on snv_44 - the same ufs filesystem on s10U2 works 
as expected.
ZFS on S10U2 in this benchmark gives the same results as on snv_44.


 details 


// c2t43d0 is a HW raid5 made of 6 disks
// array is configured for random IO's
# zpool create HW_RAID5_6disks c2t43d0
#
# zpool create -f zfs_raid5_6disks raidz c3t16d0 c3t17d0 c3t18d0 c3t19d0 
c3t20d0 c3t21d0
#
# zfs set atime=off zfs_raid5_6disks HW_RAID5_6disks
#
# zfs create HW_RAID5_6disks/t1
# zfs create zfs_raid5_6disks/t1
#

# /opt/filebench/bin/sparcv9/filebench
filebench load varmail
  450: 3.175: Varmail Version 1.24 2005/06/22 08:08:30 personality successfully 
loaded
  450: 3.199: Usage: set $dir=<dir>
  450: 3.199:        set $filesize=<size>     defaults to 16384
  450: 3.199:        set $nfiles=<value>      defaults to 1000
  450: 3.199:        set $nthreads=<value>    defaults to 16
  450: 3.199:        set $meaniosize=<value>  defaults to 16384
  450: 3.199:        set $meandirwidth=<size> defaults to 100
  450: 3.199: (sets mean dir width and dir depth is calculated as log (width, 
nfiles)
  450: 3.199:  dirdepth therefore defaults to dir depth of 1 as in postmark
  450: 3.199:  set $meandir lower to increase depth beyond 1 if desired)
  450: 3.199:
  450: 3.199:        run runtime (e.g. run 60)
  450: 3.199: syntax error, token expected on line 51
filebench set $dir=/HW_RAID5_6disks/t1
filebench run 60
  450: 13.320: Fileset bigfileset: 1000 files, avg dir = 100.0, avg depth = 
0.5, mbytes=15
  450: 13.321: Creating fileset bigfileset...
  450: 15.514: Preallocated 812 of 1000 of fileset bigfileset in 3 seconds
  450: 15.515: Creating/pre-allocating files
  450: 15.515: Starting 1 filereader instances
  451: 16.525: Starting 16 filereaderthread threads
  450: 19.535: Running...
  450: 80.065: Run took 60 seconds...
  450: 80.079: Per-Operation Breakdown
closefile4          565ops/s   0.0mb/s    0.0ms/op      8us/op-cpu
readfile4           565ops/s   9.2mb/s    0.1ms/op     60us/op-cpu
openfile4           565ops/s   0.0mb/s    0.1ms/op     64us/op-cpu
closefile3          565ops/s   0.0mb/s    0.0ms/op     11us/op-cpu
fsyncfile3          565ops/s   0.0mb/s   12.9ms/op    147us/op-cpu
appendfilerand3     565ops/s   8.8mb/s    0.1ms/op    126us/op-cpu
readfile3           565ops/s   9.2mb/s    0.1ms/op     60us/op-cpu
openfile3           565ops/s   0.0mb/s    0.1ms/op     63us/op-cpu
closefile2          565ops/s   0.0mb/s    0.0ms/op     11us/op-cpu
fsyncfile2          565ops/s   0.0mb/s   12.9ms/op    102us/op-cpu
appendfilerand2     565ops/s   8.8mb/s    0.1ms/op     92us/op-cpu
createfile2         565ops/s   0.0mb/s    0.2ms/op    154us/op-cpu
deletefile1         565ops/s   0.0mb/s    0.1ms/op     86us/op-cpu

  450: 80.079:
IO Summary:  444386 ops 7341.7 ops/s, (1129/1130 r/w)  36.1mb/s,297us 
cpu/op,   6.6ms latency
  450: 80.079: Shutting down processes
filebench run 60
  450: 

RE: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID

2006-08-08 Thread Luke Lonergan
Does snv44 have the ZFS fixes to the I/O scheduler, the ARC and the prefetch 
logic?

These are great results for random I/O, I wonder how the sequential I/O looks?

Of course you'll not get great results for sequential I/O on the 3510 :-)

- Luke

Sent from my GoodLink synchronized handheld (www.good.com)


 -Original Message-
From:   Robert Milkowski [mailto:[EMAIL PROTECTED]
Sent:   Tuesday, August 08, 2006 10:15 AM Eastern Standard Time
To: zfs-discuss@opensolaris.org
Subject:[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID

Hi.

  This time, some RAID5/RAID-Z benchmarks.
I connected the 3510 head unit with one link to the same server that the 3510
JBODs are connected to (using a second link). snv_44 is used, the server is a v440.

I also tried changing the max pending IO requests for the HW raid5 lun and checked
with DTrace that the larger value is really used - it is, but it doesn't change the
benchmark numbers.


1. ZFS on HW RAID5 with 6 disks, atime=off
IO Summary:  444386 ops 7341.7 ops/s, (1129/1130 r/w)  36.1mb/s,297us 
cpu/op,   6.6ms latency
IO Summary:  438649 ops 7247.0 ops/s, (1115/1115 r/w)  35.5mb/s,293us 
cpu/op,   6.7ms latency

2. ZFS with software RAID-Z with 6 disks, atime=off
IO Summary:  457505 ops 7567.3 ops/s, (1164/1164 r/w)  37.2mb/s,340us 
cpu/op,   6.4ms latency
IO Summary:  457767 ops 7567.8 ops/s, (1164/1165 r/w)  36.9mb/s,340us 
cpu/op,   6.4ms latency

3. UFS on HW RAID5 with 6 disks, noatime
IO Summary:  62776 ops 1037.3 ops/s, (160/160 r/w)   5.5mb/s,481us 
cpu/op,  49.7ms latency
IO Summary:  63661 ops 1051.6 ops/s, (162/162 r/w)   5.4mb/s,477us 
cpu/op,  49.1ms latency

4. UFS on HW RAID5 with 6 disks, noatime, S10U2 + patches (the same filesystem 
mounted as in 3)
IO Summary:  393167 ops 6503.1 ops/s, (1000/1001 r/w)  32.4mb/s,405us 
cpu/op,   7.5ms latency
IO Summary:  394525 ops 6521.2 ops/s, (1003/1003 r/w)  32.0mb/s,407us 
cpu/op,   7.7ms latency

5. ZFS with software RAID-Z with 6 disks, atime=off, S10U2 + patches (the same 
disks as in test #2)
IO Summary:  461708 ops 7635.5 ops/s, (1175/1175 r/w)  37.4mb/s,330us 
cpu/op,   6.4ms latency
IO Summary:  457649 ops 7562.1 ops/s, (1163/1164 r/w)  37.0mb/s,328us 
cpu/op,   6.5ms latency


In this benchmark software raid-5 with ZFS (raid-z to be precise) gives a 
little bit better performance than hardware raid-5. ZFS is also faster in both 
cases (HW and SW raid) than UFS on HW raid.

Something is wrong with UFS on snv_44 - the same ufs filesystem on s10U2 works 
as expected.
ZFS on S10U2 in this benchmark gives the same results as on snv_44.


 details 


// c2t43d0 is a HW raid5 made of 6 disks
// array is configured for random IO's
# zpool create HW_RAID5_6disks c2t43d0
#
# zpool create -f zfs_raid5_6disks raidz c3t16d0 c3t17d0 c3t18d0 c3t19d0 
c3t20d0 c3t21d0
#
# zfs set atime=off zfs_raid5_6disks HW_RAID5_6disks
#
# zfs create HW_RAID5_6disks/t1
# zfs create zfs_raid5_6disks/t1
#

# /opt/filebench/bin/sparcv9/filebench
filebench load varmail
  450: 3.175: Varmail Version 1.24 2005/06/22 08:08:30 personality successfully 
loaded
  450: 3.199: Usage: set $dir=<dir>
  450: 3.199:        set $filesize=<size>     defaults to 16384
  450: 3.199:        set $nfiles=<value>      defaults to 1000
  450: 3.199:        set $nthreads=<value>    defaults to 16
  450: 3.199:        set $meaniosize=<value>  defaults to 16384
  450: 3.199:        set $meandirwidth=<size> defaults to 100
  450: 3.199: (sets mean dir width and dir depth is calculated as log (width, 
nfiles)
  450: 3.199:  dirdepth therefore defaults to dir depth of 1 as in postmark
  450: 3.199:  set $meandir lower to increase depth beyond 1 if desired)
  450: 3.199:
  450: 3.199:        run runtime (e.g. run 60)
  450: 3.199: syntax error, token expected on line 51
filebench set $dir=/HW_RAID5_6disks/t1
filebench run 60
  450: 13.320: Fileset bigfileset: 1000 files, avg dir = 100.0, avg depth = 
0.5, mbytes=15
  450: 13.321: Creating fileset bigfileset...
  450: 15.514: Preallocated 812 of 1000 of fileset bigfileset in 3 seconds
  450: 15.515: Creating/pre-allocating files
  450: 15.515: Starting 1 filereader instances
  451: 16.525: Starting 16 filereaderthread threads
  450: 19.535: Running...
  450: 80.065: Run took 60 seconds...
  450: 80.079: Per-Operation Breakdown
closefile4          565ops/s   0.0mb/s    0.0ms/op      8us/op-cpu
readfile4           565ops/s   9.2mb/s    0.1ms/op     60us/op-cpu
openfile4           565ops/s   0.0mb/s    0.1ms/op     64us/op-cpu
closefile3          565ops/s   0.0mb/s    0.0ms/op     11us/op-cpu
fsyncfile3          565ops/s   0.0mb/s   12.9ms/op    147us/op-cpu
appendfilerand3     565ops/s   8.8mb/s    0.1ms/op    126us/op-cpu
readfile3           565ops/s   9.2mb/s    0.1ms/op     60us/op-cpu
openfile3           565ops/s   0.0mb/s    0.1ms/op     63us/op-cpu

[zfs-discuss] Re: ZFS + /var/log + Single-User

2006-08-08 Thread Pierre Klovsjo
Thanks for your answer Eric!

I don't see any problem mounting a filesystem under 'legacy' options as long as 
I can have the freedom of ZFS features by being able to add/remove/play around 
with disks, really!
I tested 'zfs mount -a' and of course my /var/log/test became 
visible, and my ZONES (playing a little there as well).

1. What kind of drawbacks are there to having a filesystem mounted as 'legacy'?
2. What kind of 'features' of ZFS will remain?

Regards,

Pierre
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS + /var/log + Single-User

2006-08-08 Thread Robert Milkowski
Hello Pierre,

Tuesday, August 8, 2006, 4:51:20 PM, you wrote:

PK Thanks for your answer Eric!

PK I don't see any problem mounting a filesystem under 'legacy'
PK options as long as I can have the freedom of ZFS features by being
PK able to add/remove/play around with disks, really!
PK I tested 'zfs mount -a' and of course my /var/log/test
PK became visible, and my ZONES (playing a little there as well).

PK 1. What kind of drawbacks are there to having a filesystem mounted as 'legacy'?
PK 2. What kind of 'features' of ZFS will remain?

Legacy only means that ZFS won't mount/umount these file systems and
you should manage them manually or via /etc/vfstab. Nothing else.

There's one drawback when using legacy mountpoints - when you move a
pool of disks to a different server you also have to copy the proper
vfstab entries - but that's not a problem in your case. Other than that,
legacy mountpoints do not impose any restrictions, etc.
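
A minimal sketch of what that looks like in practice (dataset and mountpoint
names are just placeholders):

// tell ZFS to stop managing the mount itself
# zfs set mountpoint=legacy pool/varlog

// /etc/vfstab entry: no fsck device, no fsck pass, mount at boot
pool/varlog  -  /var/log  zfs  -  yes  -

// mount it by hand (or let it come up at boot from vfstab)
# mount /var/log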


-- 
Best regards,
 Robert    mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPEC SFS97 benchmark of ZFS,UFS,VxFS

2006-08-08 Thread Leon Koll

On 8/8/06, eric kustarz [EMAIL PROTECTED] wrote:

Leon Koll wrote:

 I performed a SPEC SFS97 benchmark on Solaris 10u2/Sparc with 4 64GB
 LUNs, connected via FC SAN.
 The filesystems that were created on LUNS: UFS,VxFS,ZFS.
 Unfortunately the ZFS test couldn't complete because the box was hung
 under very moderate load (3000 IOPs).
 Additional tests were done using UFS and VxFS that were built on ZFS
 raw devices (Zvolumes).
 Results can be seen here:
 http://napobo3.blogspot.com/2006/08/spec-sfs-bencmark-of-zfsufsvxfs.html


hiya leon,

Out of curiosity, how was the setup for each filesystem type done?

I wasn't sure what "4 ZFS'es" in "The bad news that the test on 4 ZFS'es
couldn't run at all" meant... so something like 'zpool status' would be
great.


Hi Eric,
here it is:

[EMAIL PROTECTED] ~ # zpool status
  pool: pool1
 state: ONLINE
 scrub: none requested
config:

        NAME                   STATE     READ WRITE CKSUM
        pool1                  ONLINE       0     0     0
          c4t00173801014Bd0    ONLINE       0     0     0

errors: No known data errors

  pool: pool2
 state: ONLINE
 scrub: none requested
config:

        NAME                   STATE     READ WRITE CKSUM
        pool2                  ONLINE       0     0     0
          c4t00173801014Cd0    ONLINE       0     0     0

errors: No known data errors

  pool: pool3
 state: ONLINE
 scrub: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        pool3                      ONLINE       0     0     0
          c4t001738010140001Cd0    ONLINE       0     0     0

errors: No known data errors

  pool: pool4
 state: ONLINE
 scrub: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        pool4                      ONLINE       0     0     0
          c4t0017380101400012d0    ONLINE       0     0     0

errors: No known data errors



Do you know what your limiting factor was for ZFS (CPU, memory, I/O...)?


Thanks to George Wilson who pointed me to the fact that the memory was
fully consumed.
I removed the line "set ncsize = 0x10" from /etc/system
and now the host isn't hung during the test anymore.
But performance is still an issue.
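
For anyone checking the same thing, the current value can be read off a
running kernel with mdb (a quick sketch):

// print the kernel's ncsize (DNLC size) as a decimal value
# echo 'ncsize/D' | mdb -k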

-- Leon
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Apple Time Machine

2006-08-08 Thread Frank Cusack

On August 8, 2006 3:04:09 PM +0930 Darren J Moffat [EMAIL PROTECTED] wrote:

Adam Leventhal wrote:

When a file is modified, the kernel fires off an event which a user-land
daemon listens for. Every so often, the user-land daemon does something
like a snapshot of the affected portions of the filesystem with hard links
(including hard links to directories -- I'm not making this up). That
might be a bit off, but it's the impression I was left with.


Which sounds very similar to how NTFS does single instance storage and some 
other things.


And how Google Desktop and the Mac OS X Spotlight features index data.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID

2006-08-08 Thread Robert Milkowski
Hello Luke,

Tuesday, August 8, 2006, 4:48:38 PM, you wrote:

LL Does snv44 have the ZFS fixes to the I/O scheduler, the ARC and the 
prefetch logic?

LL These are great results for random I/O, I wonder how the sequential I/O 
looks?

LL Of course you'll not get great results for sequential I/O on the 3510 :-)



filebench/singlestreamread v440



1. UFS, noatime, HW RAID5 6 disks, S10U2

 70MB/s

2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the same lun as in #1)

 87MB/s

3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2

 130MB/s
 

4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44

 133MB/s

 

ps.
With software RAID-Z I first got about 940MB/s: well, after the files were
created they were all cached and ZFS almost didn't touch the disks :)

OK, I changed the filesize to be well over the memory size of the server and
the above results are with that larger filesize.
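
For completeness, the runs above follow the same pattern as the varmail runs
earlier in the thread - a sketch, where the 32g filesize is just an
illustrative value chosen to exceed the server's RAM:

# /opt/filebench/bin/sparcv9/filebench
filebench load singlestreamread
filebench set $dir=/zfs_raid5_6disks/t1
filebench set $filesize=32g
filebench run 60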




filebench/singlestreamwrite v440

1. UFS, noatime, HW RAID-5 6 disks, S10U2

70MB/s

2. ZFS, atime=off, HW RAID-5 6 disks, S10U2 (the same lun as in #1)

52MB/s

3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2

148MB/s

4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44

147MB/s


So sequential writing in ZFS on HWR5 is actually worse than UFS.


-- 
Best regards,
 Robert    mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[2]: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID

2006-08-08 Thread Luke Lonergan
Robert,

On 8/8/06 9:11 AM, Robert Milkowski [EMAIL PROTECTED] wrote:

 1. UFS, noatime, HW RAID5 6 disks, S10U2
  70MB/s
 2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the same lun as in #1)
  87MB/s
 3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2
  130MB/s
 4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44
  133MB/s

Well, the UFS results are miserable, but the ZFS results aren't good - I'd
expect between 250-350MB/s from a 6-disk RAID5 with read() blocksize from
8kb to 32kb.

Most of my ZFS experiments have been with RAID10, but there were some
massive improvements to seq I/O with the fixes I mentioned - I'd expect that
this shows that they aren't in snv44.

- Luke


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[4]: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID

2006-08-08 Thread Robert Milkowski
Hello Luke,

Tuesday, August 8, 2006, 6:18:39 PM, you wrote:

LL Robert,

LL On 8/8/06 9:11 AM, Robert Milkowski [EMAIL PROTECTED] wrote:

 1. UFS, noatime, HW RAID5 6 disks, S10U2
  70MB/s
 2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the same lun as in #1)
  87MB/s
 3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2
  130MB/s
 4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44
  133MB/s

LL Well, the UFS results are miserable, but the ZFS results aren't good - I'd
LL expect between 250-350MB/s from a 6-disk RAID5 with read() blocksize from
LL 8kb to 32kb.

Well, right now I'm testing with a single 200MB/s FC link, so that's the upper
limit in this testing.

LL Most of my ZFS experiments have been with RAID10, but there were some
LL massive improvements to seq I/O with the fixes I mentioned - I'd expect that
LL this shows that they aren't in snv44.

So where did you get those fixes?


-- 
Best regards,
 Robert    mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID

2006-08-08 Thread Mark Maybee

Luke Lonergan wrote:

Robert,

On 8/8/06 9:11 AM, Robert Milkowski [EMAIL PROTECTED] wrote:



1. UFS, noatime, HW RAID5 6 disks, S10U2
70MB/s
2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the same lun as in #1)
87MB/s
3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2
130MB/s
4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44
133MB/s



Well, the UFS results are miserable, but the ZFS results aren't good - I'd
expect between 250-350MB/s from a 6-disk RAID5 with read() blocksize from
8kb to 32kb.

Most of my ZFS experiments have been with RAID10, but there were some
massive improvements to seq I/O with the fixes I mentioned - I'd expect that
this shows that they aren't in snv44.


Those fixes went into snv_45

-Mark
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[4]: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID

2006-08-08 Thread Luke Lonergan
Robert,

 LL Most of my ZFS experiments have been with RAID10, but there were some
 LL massive improvements to seq I/O with the fixes I mentioned - I'd expect
 that
 LL this shows that they aren't in snv44.
 
 So where did you get those fixes?

From the fine people who implemented them!

As Mark said, apparently they're available in snv_45 (yay!)

- Luke


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS RAID10

2006-08-08 Thread Robert Milkowski
Hi.

snv_44, v440

filebench/varmail results for ZFS RAID10 with 6 disks and 32 disks.
What is surprising is that the results for both cases are almost the same!



6 disks:

   IO Summary:  566997 ops 9373.6 ops/s, (1442/1442 r/w)  45.7mb/s,
299us cpu/op,   5.1ms latency
   IO Summary:  542398 ops 8971.4 ops/s, (1380/1380 r/w)  43.9mb/s,
300us cpu/op,   5.4ms latency


32 disks:
   IO Summary:  572429 ops 9469.7 ops/s, (1457/1457 r/w)  46.2mb/s,
301us cpu/op,   5.1ms latency
   IO Summary:  560491 ops 9270.6 ops/s, (1426/1427 r/w)  45.4mb/s,
300us cpu/op,   5.2ms latency

   

Using iostat I can see that with 6 disks in a pool I get about 100-200 IO/s per 
disk in a pool, and with 32 disk pool I get only 30-70 IO/s per disk in a pool. 
Each CPU is used at about 25% in SYS (there're 4 CPUs).

Something is wrong here.


# zpool status
  pool: zfs_raid10_32disks
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
zfs_raid10_32disks  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t16d0  ONLINE   0 0 0
c3t17d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t18d0  ONLINE   0 0 0
c3t19d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t20d0  ONLINE   0 0 0
c3t21d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t22d0  ONLINE   0 0 0
c3t23d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t24d0  ONLINE   0 0 0
c3t25d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t26d0  ONLINE   0 0 0
c3t27d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t32d0  ONLINE   0 0 0
c3t33d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t34d0  ONLINE   0 0 0
c3t35d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t36d0  ONLINE   0 0 0
c3t37d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t38d0  ONLINE   0 0 0
c3t39d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t40d0  ONLINE   0 0 0
c3t41d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t42d0  ONLINE   0 0 0
c3t43d0  ONLINE   0 0 0

errors: No known data errors
bash-3.00# zpool destroy zfs_raid10_32disks
bash-3.00# zpool create zfs_raid10_6disks mirror c3t42d0 c3t43d0 mirror c3t40d0 
c3t41d0 mirror c3t38d0 c3t39d0
bash-3.00# zfs set atime=off zfs_raid10_6disks
bash-3.00# zfs create zfs_raid10_6disks/t1
bash-3.00#
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re[2]: Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID

2006-08-08 Thread Doug Scott
 Robert,
 
 On 8/8/06 9:11 AM, Robert Milkowski
 [EMAIL PROTECTED] wrote:
 
  1. UFS, noatime, HW RAID5 6 disks, S10U2
   70MB/s
  2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the
 same lun as in #1)
   87MB/s
  3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2
   130MB/s
  4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44
   133MB/s
 
 Well, the UFS results are miserable, but the ZFS
 results aren't good - I'd
 expect between 250-350MB/s from a 6-disk RAID5 with
 read() blocksize from
 8kb to 32kb.
 
 Most of my ZFS experiments have been with RAID10, but
 there were some
 massive improvements to seq I/O with the fixes I
 mentioned - I'd expect that
 this shows that they aren't in snv44.
 
 - Luke

I don't think there is much chance of achieving anywhere near 350MB/s.
That is a hell of a lot of IO/s for 6 disks+raid(5/Z)+shared fibre. While you
can always get very good results from a single disk IO, your percentage
gain is always decreasing the more disks you add to the equation.

From a single 200MB/s fibre, expect somewhere between 160-180MB/s,
at best.

Doug
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID

2006-08-08 Thread Matthew Ahrens
On Tue, Aug 08, 2006 at 06:11:09PM +0200, Robert Milkowski wrote:
 filebench/singlestreamread v440
 
 1. UFS, noatime, HW RAID5 6 disks, S10U2
  70MB/s
 
 2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the same lun as in #1)
  87MB/s
 
 3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2
  130MB/s
  
 4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44
  133MB/s

FYI, Streaming read performance is improved considerably by Mark's
prefetch fixes which are in build 45.  (However, as mentioned you will
soon run into the bandwidth of a single fiber channel connection.)

--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID

2006-08-08 Thread Robert Milkowski
Hello Matthew,

Tuesday, August 8, 2006, 7:25:17 PM, you wrote:

MA On Tue, Aug 08, 2006 at 06:11:09PM +0200, Robert Milkowski wrote:
 filebench/singlestreamread v440
 
 1. UFS, noatime, HW RAID5 6 disks, S10U2
  70MB/s
 
 2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the same lun as in #1)
  87MB/s
 
 3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2
  130MB/s
  
 4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44
  133MB/s

MA FYI, Streaming read performance is improved considerably by Mark's
MA prefetch fixes which are in build 45.  (However, as mentioned you will
MA soon run into the bandwidth of a single fiber channel connection.)

I will probably re-test with snv_45 (waiting for SX).

FC is not that big a problem - if I find enough time I will just
add more FC cards.


-- 
Best regards,
 Robert    mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RAID10

2006-08-08 Thread Matthew Ahrens
On Tue, Aug 08, 2006 at 09:54:16AM -0700, Robert Milkowski wrote:
 Hi.
 
 snv_44, v440
 
 filebench/varmail results for ZFS RAID10 with 6 disks and 32 disks.
 What is surprising is that the results for both cases are almost the same!
 
 
 
 6 disks:
 
IO Summary:  566997 ops 9373.6 ops/s, (1442/1442 r/w)  45.7mb/s,
 299us cpu/op,   5.1ms latency
IO Summary:  542398 ops 8971.4 ops/s, (1380/1380 r/w)  43.9mb/s,
 300us cpu/op,   5.4ms latency
 
 
 32 disks:
IO Summary:  572429 ops 9469.7 ops/s, (1457/1457 r/w)  46.2mb/s,
 301us cpu/op,   5.1ms latency
IO Summary:  560491 ops 9270.6 ops/s, (1426/1427 r/w)  45.4mb/s,
 300us cpu/op,   5.2ms latency
 

 
 Using iostat I can see that with 6 disks in a pool I get about 100-200 IO/s 
 per disk in a pool, and with 32 disk pool I get only 30-70 IO/s per disk in a 
 pool. Each CPU is used at about 25% in SYS (there're 4 CPUs).
 
 Something is wrong here.

It's possible that you are CPU limited.  I'm guessing that your test
uses only one thread, so that may be the limiting factor.

We can get a quick idea of where that CPU is being spent if you can run
'lockstat -kgIW sleep 60' while your test is running, and send us the
first 100 lines of output.  It would be nice to see the output of
'iostat -xnpc 3' while the test is running, too.

--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS RAID10

2006-08-08 Thread Robert Milkowski
Hello Doug,

Tuesday, August 8, 2006, 7:28:07 PM, you wrote:

DS Looks like somewhere between the CPU and your disks you have a limitation 
of 9500 ops/sec.
DS How did you connect 32 disks to your v440?

Some 3510 JBODs connected directly over FC.

-- 
Best regards,
 Robert    mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: ZFS RAID10

2006-08-08 Thread Robert Milkowski
filebench in varmail by default creates 16 threads - I confirmed it with prstat, 
16 threads are created and running.


bash-3.00# lockstat -kgIW sleep 60|less
Profiling interrupt: 23308 events in 60.059 seconds (388 events/sec)

Count genr cuml rcnt nsec Hottest CPU+PILCaller
---
17615  76%  0.00 2016 cpu[2] thread_start
17255  74%  0.00 2001 cpu[2] idle
14439  62%  0.00 2015 cpu[2] disp_getwork
 4726  20%  0.00 2673 cpu[2] syscall_trap
 1010   4%  0.00 2625 cpu[2] fdsync
  998   4%  0.00 2630 cpu[2] fop_fsync
  988   4%  0.00 2632 cpu[2] zfs_fsync
  958   4%  0.00 2639 cpu[2] zil_commit
  839   4%  0.00 2814 cpu[2] fop_read
  765   3%  0.00 2624 cpu[0] write
  755   3%  0.00 2625 cpu[0] fop_write
  746   3%  0.00 2626 cpu[0] zfs_write
  739   3%  0.00 2751 cpu[1] copen
  705   3%  0.00 2712 cpu[1] vn_openat
  601   3%  0.00 2841 cpu[2] lookuppnat
  599   3%  0.00 2284 cpu0   (usermode)
  585   3%  0.00 2837 cpu[2] lookuppnvp
  546   2%  0.00 2653 cpu[2] zil_lwb_write_start
  541   2%  0.00 2726 cpu[0] pread
  493   2%  0.00 2762 cpu[1] read
  481   2%  0.00 2811 cpu[0] mutex_enter
  451   2%  0.00 2684 cpu0   zio_checksum_generate
  451   2%  0.00 2684 cpu0   fletcher_2_native
  439   2%  0.00 2740 cpu[2] uiomove
  413   2%  0.00 2523 cpu[1] zio_checksum
  401   2%  0.00 2969 cpu[0] lookupnameat
  384   2%  0.00 2755 cpu[1] zfs_read
  372   2%  0.00 2529 cpu[1] vn_createat
  371   2%  0.00 2653 cpu[0] lwp_mutex_timedlock
  352   2%  0.00 2914 cpu[2] pr_read_lwpusage
  321   1%  0.00 2777 cpu[0] bzero
  317   1%  0.00 2702 cpu[2] unlink
  314   1%  0.00 2695 cpu[2] vn_removeat
  313   1%  0.00 2760 cpu[1]+11  disp_getbest
  311   1%  0.00 2431 cpu[1] zil_lwb_commit
  296   1%  0.00 2774 cpu[2] bcopy_more
  289   1%  0.00 2796 cpu[1] copyout_more
  280   1%  0.00 2757 cpu0   zfs_grow_blocksize
  277   1%  0.00 2754 cpu0   dmu_object_set_blocksize
  277   1%  0.00 2592 cpu[1] dmu_write_uio
  276   1%  0.00 2912 cpu[2] traverse
  274   1%  0.00 2759 cpu0   dnode_set_blksz
  269   1%  0.00 2751 cpu0   fop_lookup
  263   1%  0.00 2675 cpu[0] lwp_upimutex_lock
  262   1%  0.00 2753 cpu0   dbuf_new_size
  261   1%  0.00 2942 cpu[1] dnode_hold_impl
  246   1%  0.00 2478 cpu[2] fop_create
  244   1%  0.00 2480 cpu[2] zfs_create
  244   1%  0.00 3080 cpu0   vdev_mirror_io_start
  212   1%  0.00 2755 cpu[2] mutex_vector_enter
  201   1%  0.00 2709 cpu0   zfs_lookup
  197   1%  0.00 2723 cpu[2] fop_remove
  194   1%  0.00 3007 cpu[2] zfs_zget
  194   1%  0.00 2720 cpu[2] zfs_remove
  182   1%  0.00 3040 cpu[2] fsop_root
  176   1%  0.00 3073 cpu[2] zfs_root
  174   1%  0.00 2841 cpu[1]+11  cv_wait
  171   1%  0.00 2593 cpu[0]+6   intr_thread
  165   1%  0.00 3246 cpu[1] dbuf_hold_impl
  163   1%  0.00 2465 cpu[2] zfs_get_data
  162   1%  0.00 2534 cpu[0] dbuf_read
  160   1%  0.00 2351 cpu[0] taskq_thread
  151   1%  0.00 3264 cpu[1] dbuf_hold
  147   1%  0.00 3162 cpu[2] dmu_bonus_hold
  143   1%  0.00 3770 cpu0   txg_sync_thread
  143   1%  0.00 3770 cpu0   spa_sync
  143   1%  0.00 3770 cpu0   dsl_pool_sync
  143   1%  0.00 3770 cpu0   dmu_objset_sync
  143   1%  0.00 3770 cpu0   dmu_objset_sync_dnodes
  141   1%  0.00 3798 cpu0   dsl_dataset_sync
  141   1%  0.00 2551 cpu[0] 

[zfs-discuss] Re: ZFS/Thumper experiences

2006-08-08 Thread Jochen M. Kaiser
Hello,

I really appreciate such information. Could you please give us some additional
insight regarding your statement that [you] "tried to drive ZFS to its limit,
[...] found that the results were less consistent or predictable"?
Especially when taking a closer look at the upcoming rdbms+thumper bundles,
erratic behavior of the system would not be a thing I'd appreciate in a
production environment.

In addition to that I'd like to know whether you've got some advice regarding
the disk setup. Is it advisable to place Solaris on a separate zfs mirror and
to use the rest of the disks in a single or multiple raid-z pool(s)?

Many thanks in advance,

Jochen
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS RAID10

2006-08-08 Thread Matthew Ahrens
On Tue, Aug 08, 2006 at 10:42:41AM -0700, Robert Milkowski wrote:
 filebench in varmail by default creates 16 threads - I confirmed it
 with prstat, 16 threads are created and running.

Ah, OK.  Looking at these results, it doesn't seem to be CPU bound, and
the disks are not fully utilized either.  However, because the test is
doing so many synchronous writes (e.g. by calling fsync()), we are
continually writing out the intent log.

Unfortunately, we are only able to issue a small number of concurrent
i/os while doing the intent log writes.  All the threads must wait for
the intent log blocks to be written before they can enqueue more data.
Therefore, we are essentially doing:

many threads call fsync().
one of them will flush the intent log, issuing a few writes to the disks
all of the threads wait for the writes to complete
repeat.

This test fundamentally requires waiting for lots of synchronous writes.
Assuming no other activity on the system, the performance of synchronous
writes does not scale with the number of drives; it scales with the
drive's write latency.

If you were to alter the test to not require everything to be done
synchronously, then you would see much different behavior.
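
If you want to watch that serialization directly, a hedged DTrace sketch -
it assumes nothing beyond the zil_commit kernel function that already shows
up in your lockstat output - times each commit:

# dtrace -n '
  fbt::zil_commit:entry { self->ts = timestamp; }
  fbt::zil_commit:return /self->ts/ {
          @["zil_commit latency (ns)"] = quantize(timestamp - self->ts);
          self->ts = 0;
  }'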

--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SPEC SFS97 benchmark of ZFS,UFS,VxFS

2006-08-08 Thread eric kustarz

Leon Koll wrote:


On 8/8/06, eric kustarz [EMAIL PROTECTED] wrote:


Leon Koll wrote:

 I performed a SPEC SFS97 benchmark on Solaris 10u2/Sparc with 4 64GB
 LUNs, connected via FC SAN.
 The filesystems that were created on LUNS: UFS,VxFS,ZFS.
 Unfortunately the ZFS test couldn't complete because the box was hung
 under very moderate load (3000 IOPs).
 Additional tests were done using UFS and VxFS that were built on ZFS
 raw devices (Zvolumes).
 Results can be seen here:
 
http://napobo3.blogspot.com/2006/08/spec-sfs-bencmark-of-zfsufsvxfs.html



hiya leon,

Out of curiosity, how was the setup for each filesystem type done?

I wasn't sure what "4 ZFS'es" in "The bad news that the test on 4 ZFS'es
couldn't run at all" meant... so something like 'zpool status' would be
great.



Hi Eric,
here it is:

[EMAIL PROTECTED] ~ # zpool status
  pool: pool1
 state: ONLINE
 scrub: none requested
config:

        NAME                   STATE     READ WRITE CKSUM
        pool1                  ONLINE       0     0     0
          c4t00173801014Bd0    ONLINE       0     0     0

errors: No known data errors

  pool: pool2
 state: ONLINE
 scrub: none requested
config:

        NAME                   STATE     READ WRITE CKSUM
        pool2                  ONLINE       0     0     0
          c4t00173801014Cd0    ONLINE       0     0     0

errors: No known data errors

  pool: pool3
 state: ONLINE
 scrub: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        pool3                      ONLINE       0     0     0
          c4t001738010140001Cd0    ONLINE       0     0     0

errors: No known data errors

  pool: pool4
 state: ONLINE
 scrub: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        pool4                      ONLINE       0     0     0
          c4t0017380101400012d0    ONLINE       0     0     0

errors: No known data errors



So having 4 pools isn't a recommended config - I would destroy those 4 
pools and just create 1 RAID-0 pool:
#zpool create sfsrocks c4t00173801014Bd0 c4t00173801014Cd0 \
    c4t001738010140001Cd0 c4t0017380101400012d0


each of those devices is a 64GB lun, right?
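
The full sequence would look roughly like this (a sketch - note that
zpool destroy erases the data on those pools):

// tear down the four single-lun pools
# zpool destroy pool1
# zpool destroy pool2
# zpool destroy pool3
# zpool destroy pool4

// one dynamically striped (RAID-0) pool across all four luns
# zpool create sfsrocks c4t00173801014Bd0 c4t00173801014Cd0 \
    c4t001738010140001Cd0 c4t0017380101400012d0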





Do you know what your limiting factor was for ZFS (CPU, memory, 
I/O...)?



Thanks to George Wilson who pointed me to the fact that the memory was
fully consumed.
I removed the line "set ncsize = 0x10" from /etc/system
and now the host isn't hung during the test anymore.
But performance is still an issue.



ah, you were limiting the # of dnlc entries... so you're still seeing 
ZFS max out at 2000 ops/s?  Let us know what happens when you switch to 
1 pool.


eric
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Lots of seeks?

2006-08-08 Thread Anton B. Rang
So while I'm feeling optimistic :-) we really ought to be able to do this in 
two I/O operations. If we have, say, 500K of data to write (including all of 
the metadata), we should be able to allocate a contiguous 500K block on disk 
and write that with a single operation. Then we update the überblock.

The only inherent problem preventing this right now is that we don't have 
general scatter/gather at the driver level (ugh). This is a bug that should be 
fixed, IMO. Then ZFS just needs to delay choosing physical block locations 
until they’re being written as part of a group. (Of course, as NetApp points 
out in their WAFL papers, the goal of optimizing writes can conflict with the 
goal of optimizing reads, so taken to an extreme, this optimization isn’t 
always desirable.)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of seeks?

2006-08-08 Thread Spencer Shepler
On Tue, Anton B. Rang wrote:
 So while I'm feeling optimistic :-) we really ought to be able to do this in 
 two I/O operations. If we have, say, 500K of data to write (including all of 
 the metadata), we should be able to allocate a contiguous 500K block on disk 
 and write that with a single operation. Then we update the überblock. 
 
 The only inherent problem preventing this right now is that we don't have 
 general scatter/gather at the driver level (ugh). 

Fixing this bug would help the NFS server significantly given the
general lack of continuity of incoming write data (split at mblk
boundaries).

Spencer
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS/Thumper experiences

2006-08-08 Thread Luke Lonergan
Jochen,

On 8/8/06 10:47 AM, Jochen M. Kaiser [EMAIL PROTECTED] wrote:

 I really appreciate such information. Could you please give us some additional
 insight regarding your statement that [you] "tried to drive ZFS to its limit,
 [...] found that the results were less consistent or predictable"?
 Especially when taking a closer look at the upcoming rdbms+thumper bundles,
 erratic behavior of the system would not be a thing I'd appreciate in a
 production environment.

Adrian's tests were done on code prior to fixing the I/O scheduler and
prefetch logic.  We at Greenplum worked with the ZFS team extensively to
locate the problems and establish predictable behavior from ZFS prior to the
release of the DBMS + Thumper system.

The fixes are apparently due sometime soon as part of the nv_45 release on
Solaris Express, and will be part of Sol10 U3.

- Luke 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re[2]: Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID

2006-08-08 Thread Luke Lonergan
Doug,

On 8/8/06 10:15 AM, Doug Scott [EMAIL PROTECTED] wrote:

 I dont think there is much chance of achieving anywhere near 350MB/s.
 That is a hell of a lot of IO/s for 6 disks+raid(5/Z)+shared fibre. While you
 can always get very good results from a single disk IO, your percentage
 gain is always decreasing the more disks you add to the equation.
 
 From a single 200MB/s fibre, expect some where between 160-180MB/s,
 at best.

Momentarily forgot about the sucky single FC limit - I've become so used to
calculating drive rate, which in this case would be 80MB/s per disk for
modern 15K RPM FC or SCSI drives - then multiply by the 5 drives in a 6
drive RAID5/Z.

We routinely get 950MB/s from 16 SATA disks on a single server with internal
storage.  We're getting 2,000 MB/s on 36 disks in an X4500 with ZFS.

- Luke


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss