Re: [zfs-discuss] Apple Time Machine
On Aug 8, 2006, at 12:34 AM, Darren J Moffat wrote: Adam Leventhal wrote: Needless to say, this was a pretty interesting piece of the keynote from a technical point of view that had quite a few of us scratching our heads. After talking to some Apple engineers, it seems like what they're doing is more or less this: When a file is modified, the kernel fires off an event which a user-land daemon listens for. Every so often, the user-land daemon does something like a snapshot of the affected portions of the filesystem with hard links (including hard links to directories -- I'm not making this up). That might be a bit off, but it's the impression I was left with. Which sounds very similar to how NTFS does single instance storage and some other things. The interesting thing here is that this means HFS+ and NTFS both have a file event monitoring framework that is exposed up into userland. This is something that would be VERY useful for OpenSolaris, particularly if we could do it at the VFS layer. Anyhow, very slick UI, sort of dubious back end, interesting possibility for integration with ZFS. :-) Which is the opposite of what we tend to do: slick back end, no GUI, and an integration challenge on the CLI :-) Both FreeBSD [1] and Apple [2] (of course) use kqueue for file event notifications. OpenSolaris's FEM [3] starter kit would be an interesting place to visit and build upon. -- Robert. [1]: http://people.freebsd.org/~jlemon/papers/kqueue.pdf [2]: http://developer.apple.com/documentation/Darwin/Reference/ManPages/man2/kqueue.2.html [3]: http://cvs.opensolaris.org/source/xref/on/usr/src/uts/common/sys/fem.h
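Until a FEM-based notification framework is exposed to userland, DTrace can approximate per-file event monitoring on OpenSolaris. A rough sketch, using only the stock syscall provider and the built-in fds[] array (the choice of aggregation keys is illustrative):

# dtrace -n 'syscall::write:entry { @[execname, fds[arg0].fi_pathname] = count(); }'

This counts writes by process and pathname. It observes events rather than delivering them to a daemon, so it is a diagnostic stand-in for kqueue-style notification, not a replacement.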
Re: [zfs-discuss] Apple Time Machine
Bryan Cantrill wrote: So in short (and brace yourself, because I know it will be a shock): mentions by executives in keynotes don't always accurately represent a technology. DynFS, anyone? ;) I'm shocked and stunned, and not a little amazed! I'll bet the OpenSolaris PPC guys are thrilled at the prospect of DTrace on their platform. cheers, tim -- Tim Foster, Sun Microsystems Inc, Operating Platforms Group Engineering Operations http://blogs.sun.com/timf
Re[2]: [zfs-discuss] zil_disable
Hello Eric, Monday, August 7, 2006, 6:29:45 PM, you wrote: ES Robert - ES This isn't surprising (either the switch or the results). Our long term ES fix for tweaking this knob is: ES 6280630 zil synchronicity ES Which would add 'zfs set sync' as a per-dataset option. A cut from the ES comments (which aren't visible on opensolaris): ES sync={deferred,standard,forced} ES Controls synchronous semantics for the dataset. ES ES When set to 'standard' (the default), synchronous ES operations such as fsync(3C) behave precisely as defined ES in fcntl.h(3HEAD). ES When set to 'deferred', requests for synchronous ES semantics are ignored. However, ZFS still guarantees ES that ordering is preserved -- that is, consecutive ES operations reach stable storage in order. (If a thread ES performs operation A followed by operation B, then the ES moment that B reaches stable storage, A is guaranteed to ES be on stable storage as well.) ZFS also guarantees that ES all operations will be scheduled for write to stable ES storage within a few seconds, so that an unexpected ES power loss only takes the last few seconds of change ES with it. ES When set to 'forced', all operations become synchronous. ES No operation will return until all previous operations ES have been committed to stable storage. This option can ES be useful if an application is found to depend on ES synchronous semantics without actually requesting them; ES otherwise, it will just make everything slow, and is not ES recommended. ES There was a thread describing the usefulness of this (for builds where ES all-or-nothing over a long period of time), but I can't find it. I remember the thread. Do you know if anyone is currently working on it, and when it is expected to be integrated into snv? -- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com
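For illustration, this is how the property proposed in 6280630 would presumably be used once integrated. This is proposed syntax only - it is not available in current builds - and the dataset name and output are hypothetical:

# zfs set sync=deferred tank/build
# zfs get sync tank/build
NAME        PROPERTY  VALUE     SOURCE
tank/build  sync      deferred  local

'deferred' would trade the synchronous guarantees of fsync(3C) for throughput while still preserving write ordering, which suits all-or-nothing workloads such as long builds.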
Re[2]: [zfs-discuss] zil_disable
Hello Neil, Monday, August 7, 2006, 6:40:01 PM, you wrote: NP Not quite, zil_disable is inspected on file system mounts. I guess you're right that umount/mount will suffice - I just didn't have time to check it, and export/import worked. Anyway, is there a way for file systems to make it active without an unmount/mount in current Nevada? NP It's also looked at dynamically on every write for zvols. Good to know, thank you. -- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com
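For reference, the sequence under discussion looks roughly like this (a sketch; zil_disable is an unsupported tunable, and the dataset name is hypothetical):

# echo zil_disable/W0t1 | mdb -kw     (set the tunable in the live kernel)
# zfs umount tank/fs && zfs mount tank/fs     (remount so the filesystem picks it up)

Because the variable is only inspected at mount time for filesystems, the remount is what makes it take effect; zvols, as Neil notes, check it on every write.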
Re[2]: [zfs-discuss] 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hello Richard, Monday, August 7, 2006, 6:54:37 PM, you wrote: RE Hi Robert, thanks for the data. RE Please clarify one thing for me. RE In the case of the HW raid, was there just one LUN? Or was it 12 LUNs? Just one LUN, which was built on the 3510 from 12 disks in RAID-1(0). -- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com
Re[2]: [zfs-discuss] ZFS/Thumper experiences
Hello David, Tuesday, August 8, 2006, 3:39:42 AM, you wrote: DJO Thanks, interesting read. It'll be nice to see the actual DJO results if Sun ever publishes them. You can bet I'll post some results, hopefully soon :) -- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com
[zfs-discuss] Querying ZFS version?
Although regular Solaris is good for what I'm doing at work, I prefer apt-get or yum for package management for a desktop. So, I've been playing with Nexenta / GnuSolaris -- which appears to be the open-sourced Solaris kernel and low-level system utilities with Debian package management -- and a bunch of packages from Ubuntu. The release I'm playing with (Alpha 5) does, indeed, have ZFS. However, I can't determine what version of ZFS is included. Dselect gives the following information, which doesn't ring any bells for me: *** Req base sunwzfsr 5.11.40-1 5.11.40-1 ZFS (Root) Is there a zfs version command that I don't see? Thanks, -Luke -- Luke Scharf Virginia Tech Unix Administration Services Terascale Computing Facility
Re: [zfs-discuss] Querying ZFS version?
Luke Scharf wrote: Is there a zfs version command that I don't see? On Solaris, pkginfo -l SUNWzfsr would give you a package version for that part of ZFS, and modinfo | grep zfs will tell you something about the kernel module rev. Darren
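Putting the two suggestions together as a quick transcript-style sketch (package and module names as given above; output fields vary by build):

# pkginfo -l SUNWzfsr | grep VERSION
# modinfo | grep -i zfs

The first reports the package version of the ZFS (Root) bits; the second shows the revision of the loaded zfs kernel module, which only appears once a pool or filesystem has caused the module to load.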
Re: [zfs-discuss] Querying ZFS version?
Luke, You can run 'zpool upgrade' to see what on-disk version you are capable of running. If you have the latest features then you should be running version 3: hadji-2# zpool upgrade This system is currently running ZFS version 3. Unfortunately this won't tell you if you are running the latest fixes but it does tell you that you have all the latest features (at least up through snv_43). Thanks, George Luke Scharf wrote: Is there a zfs version command that I don't see?
Re: [zfs-discuss] Querying ZFS version?
George Wilson wrote: You can run 'zpool upgrade' to see what on-disk version you are capable of running. If you have the latest features then you should be running version 3. That works; the Nexenta system says: [EMAIL PROTECTED]:~# zpool upgrade This system is currently running ZFS version 2. All pools are formatted using this version. Which is the same as my Solaris x86 6/06 test-machine. Thanks! -Luke
Re: [zfs-discuss] Querying ZFS version?
Darren Reed wrote: On Solaris, pkginfo -l SUNWzfsr would give you a package version for that part of ZFS, and modinfo | grep zfs will tell you something about the kernel module rev. No such luck. modinfo doesn't show the ZFS module as loaded; that's probably because I'm not running anything with ZFS on the machine at the moment. No pkginfo on this system, which I think is part of the point of the distribution -- one package manager to rule them all. Also, dselect / apt-get just has a one-line description that says 'ZFS root components'. Not really useful, even if you know what ZFS is -- does 'root components' mean the components used by the root user? Or is it for putting the root partition on ZFS? I'm assuming the former -- but the statement is quite ambiguous. But zpool upgrade gives me an idea of what featureset to expect, which is what I'm aiming for at this point. Thanks, -Luke
[zfs-discuss] DTrace IO provider and oracle
Hello, Solaris 10 GA + latest recommended patches. While running dtrace: bash-3.00# dtrace -n 'io:::start {@[execname, args[2]->fi_pathname] = count();}' ... vim /zones/obsdb3/root/opt/sfw/bin/vim 296 tnslsnr none 2373 fsflush none 2952 sched none 9949 ar60run none 13590 RACUST none 39252 RAXTRX none 39789 RAXMTR none 40671 FNDLIBR none 64956 oracle none 2096052 How can I interpret 'none'? Is it possible to get the full path (like for vim)? Regards przemol
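A variant worth noting (a sketch, not from the original post): the io provider's args[1] is the devinfo_t for the I/O, and its dev_pathname member is filled in even when no file-level pathname is available, so it can stand in when fi_pathname reports none:

# dtrace -n 'io:::start { @[execname, args[2]->fi_pathname, args[1]->dev_pathname] = count(); }'

fi_pathname is typically 'none' for I/O that is not associated with a regular file's vnode at that point, e.g. raw/direct device I/O such as Oracle on raw devices, or flushes issued by fsflush and sched on behalf of other processes.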
Re: [zfs-discuss] zil_disable
Robert Milkowski wrote: Anyway, is there a way for file systems to make it active without an unmount/mount in current Nevada? No, sorry. Neil
Re: [zfs-discuss] zil_disable
Robert Milkowski wrote: I remember the thread. Do you know if anyone is currently working on it, and when it is expected to be integrated into snv? I'm slated to work on it after I finish up some other ZIL bugs and performance fixes. Neil
[zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hi. This time some RAID5/RAID-Z benchmarks. This time I connected the 3510 head unit with one link to the same server the 3510 JBODs are connected to (using a second link). snv_44 is used, server is a v440. I also tried changing the max pending IO requests for the HW RAID5 LUN and checked with DTrace that the larger value is really used - it is, but it doesn't change the benchmark numbers. 1. ZFS on HW RAID5 with 6 disks, atime=off IO Summary: 444386 ops 7341.7 ops/s, (1129/1130 r/w) 36.1mb/s, 297us cpu/op, 6.6ms latency IO Summary: 438649 ops 7247.0 ops/s, (1115/1115 r/w) 35.5mb/s, 293us cpu/op, 6.7ms latency 2. ZFS with software RAID-Z with 6 disks, atime=off IO Summary: 457505 ops 7567.3 ops/s, (1164/1164 r/w) 37.2mb/s, 340us cpu/op, 6.4ms latency IO Summary: 457767 ops 7567.8 ops/s, (1164/1165 r/w) 36.9mb/s, 340us cpu/op, 6.4ms latency 3. UFS on HW RAID5 with 6 disks, noatime IO Summary: 62776 ops 1037.3 ops/s, (160/160 r/w) 5.5mb/s, 481us cpu/op, 49.7ms latency IO Summary: 63661 ops 1051.6 ops/s, (162/162 r/w) 5.4mb/s, 477us cpu/op, 49.1ms latency 4. UFS on HW RAID5 with 6 disks, noatime, S10U2 + patches (the same filesystem mounted as in 3) IO Summary: 393167 ops 6503.1 ops/s, (1000/1001 r/w) 32.4mb/s, 405us cpu/op, 7.5ms latency IO Summary: 394525 ops 6521.2 ops/s, (1003/1003 r/w) 32.0mb/s, 407us cpu/op, 7.7ms latency 5. ZFS with software RAID-Z with 6 disks, atime=off, S10U2 + patches (the same disks as in test #2) IO Summary: 461708 ops 7635.5 ops/s, (1175/1175 r/w) 37.4mb/s, 330us cpu/op, 6.4ms latency IO Summary: 457649 ops 7562.1 ops/s, (1163/1164 r/w) 37.0mb/s, 328us cpu/op, 6.5ms latency In this benchmark, software RAID-5 with ZFS (RAID-Z to be precise) gives slightly better performance than hardware RAID-5. ZFS is also faster in both cases (HW and SW RAID) than UFS on HW RAID. Something is wrong with UFS on snv_44 - the same UFS filesystem on S10U2 works as expected. ZFS on S10U2 in this benchmark gives the same results as on snv_44. details // c2t43d0 is a HW raid5 made of 6 disks // array is configured for random IO's # zpool create HW_RAID5_6disks c2t43d0 # # zpool create -f zfs_raid5_6disks raidz c3t16d0 c3t17d0 c3t18d0 c3t19d0 c3t20d0 c3t21d0 # # zfs set atime=off zfs_raid5_6disks HW_RAID5_6disks # # zfs create HW_RAID5_6disks/t1 # zfs create zfs_raid5_6disks/t1 # # /opt/filebench/bin/sparcv9/filebench filebench load varmail 450: 3.175: Varmail Version 1.24 2005/06/22 08:08:30 personality successfully loaded 450: 3.199: Usage: set $dir=dir 450: 3.199: set $filesize=size defaults to 16384 450: 3.199: set $nfiles=value defaults to 1000 450: 3.199: set $nthreads=value defaults to 16 450: 3.199: set $meaniosize=value defaults to 16384 450: 3.199: set $meandirwidth=size defaults to 100 450: 3.199: (sets mean dir width and dir depth is calculated as log (width, nfiles) 450: 3.199: dirdepth therefore defaults to dir depth of 1 as in postmark 450: 3.199: set $meandir lower to increase depth beyond 1 if desired) 450: 3.199: 450: 3.199: run runtime (e.g. run 60) 450: 3.199: syntax error, token expected on line 51 filebench set $dir=/HW_RAID5_6disks/t1 filebench run 60 450: 13.320: Fileset bigfileset: 1000 files, avg dir = 100.0, avg depth = 0.5, mbytes=15 450: 13.321: Creating fileset bigfileset... 450: 15.514: Preallocated 812 of 1000 of fileset bigfileset in 3 seconds 450: 15.515: Creating/pre-allocating files 450: 15.515: Starting 1 filereader instances 451: 16.525: Starting 16 filereaderthread threads 450: 19.535: Running... 450: 80.065: Run took 60 seconds...
450: 80.079: Per-Operation Breakdown
closefile4 565ops/s 0.0mb/s 0.0ms/op 8us/op-cpu
readfile4 565ops/s 9.2mb/s 0.1ms/op 60us/op-cpu
openfile4 565ops/s 0.0mb/s 0.1ms/op 64us/op-cpu
closefile3 565ops/s 0.0mb/s 0.0ms/op 11us/op-cpu
fsyncfile3 565ops/s 0.0mb/s 12.9ms/op 147us/op-cpu
appendfilerand3 565ops/s 8.8mb/s 0.1ms/op 126us/op-cpu
readfile3 565ops/s 9.2mb/s 0.1ms/op 60us/op-cpu
openfile3 565ops/s 0.0mb/s 0.1ms/op 63us/op-cpu
closefile2 565ops/s 0.0mb/s 0.0ms/op 11us/op-cpu
fsyncfile2 565ops/s 0.0mb/s 12.9ms/op 102us/op-cpu
appendfilerand2 565ops/s 8.8mb/s 0.1ms/op 92us/op-cpu
createfile2 565ops/s 0.0mb/s 0.2ms/op 154us/op-cpu
deletefile1 565ops/s 0.0mb/s 0.1ms/op 86us/op-cpu
450: 80.079: IO Summary: 444386 ops 7341.7 ops/s, (1129/1130 r/w) 36.1mb/s, 297us cpu/op, 6.6ms latency
450: 80.079: Shutting down processes
filebench run 60
450:
RE: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Does snv44 have the ZFS fixes to the I/O scheduler, the ARC and the prefetch logic? These are great results for random I/O; I wonder how the sequential I/O looks? Of course you'll not get great results for sequential I/O on the 3510 :-) - Luke Sent from my GoodLink synchronized handheld (www.good.com) -Original Message- From: Robert Milkowski [mailto:[EMAIL PROTECTED]] Sent: Tuesday, August 08, 2006 10:15 AM Eastern Standard Time To: zfs-discuss@opensolaris.org Subject: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID [...]
[zfs-discuss] Re: ZFS + /var/log + Single-User
Thanks for your answer Eric! I don't see any problem mounting a filesystem under 'legacy' options as long as I can have the freedom of ZFS features by being able to add/remove/play around with disks, really! I tested 'zfs mount -a' and of course my /var/log/test became visible, and my ZONES (playing a little there as well). 1. What kinds of drawbacks are there to having a filesystem mounted as 'legacy'? 2. What kinds of ZFS 'features' will remain? Regards, Pierre
Re: [zfs-discuss] Re: ZFS + /var/log + Single-User
Hello Pierre, Tuesday, August 8, 2006, 4:51:20 PM, you wrote: PK 1. What kinds of drawbacks are there to having a filesystem mounted as 'legacy'? PK 2. What kinds of ZFS 'features' will remain? legacy only means that ZFS won't mount/umount these file systems; you manage them manually or via /etc/vfstab. Nothing else. There's one drawback when using legacy mountpoints - when you move a pool of disks to a different server, you also have to copy the corresponding vfstab entries - but that's not a problem in your case. Other than that, legacy mountpoints do not impose any restrictions, etc. -- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com
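A minimal sketch of the legacy arrangement (pool and mountpoint names are hypothetical):

# zfs set mountpoint=legacy mypool/varlog
# mount -F zfs mypool/varlog /var/log/test

and the matching /etc/vfstab entry so it mounts at boot:

mypool/varlog  -  /var/log/test  zfs  -  yes  -

All other ZFS features (snapshots, quotas, adding disks to the pool, etc.) keep working exactly as with ZFS-managed mountpoints.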
Re: [zfs-discuss] SPEC SFS97 benchmark of ZFS,UFS,VxFS
On 8/8/06, eric kustarz [EMAIL PROTECTED] wrote: Leon Koll wrote: I performed a SPEC SFS97 benchmark on Solaris 10u2/Sparc with 4 64GB LUNs, connected via FC SAN. The filesystems that were created on the LUNs: UFS, VxFS, ZFS. Unfortunately the ZFS test couldn't complete because the box was hung under very moderate load (3000 IOPS). Additional tests were done using UFS and VxFS that were built on ZFS raw devices (Zvolumes). Results can be seen here: http://napobo3.blogspot.com/2006/08/spec-sfs-bencmark-of-zfsufsvxfs.html hiya leon, Out of curiosity, how was the setup for each filesystem type done? I wasn't sure what 4 ZFS'es in the bad news that the test on 4 ZFS'es couldn't run at all meant... so something like 'zpool status' would be great. Hi Eric, here it is: [EMAIL PROTECTED] ~ # zpool status pool: pool1 state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM pool1 ONLINE 0 0 0 c4t00173801014Bd0 ONLINE 0 0 0 errors: No known data errors pool: pool2 state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM pool2 ONLINE 0 0 0 c4t00173801014Cd0 ONLINE 0 0 0 errors: No known data errors pool: pool3 state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM pool3 ONLINE 0 0 0 c4t001738010140001Cd0 ONLINE 0 0 0 errors: No known data errors pool: pool4 state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM pool4 ONLINE 0 0 0 c4t0017380101400012d0 ONLINE 0 0 0 errors: No known data errors Do you know what your limiting factor was for ZFS (CPU, memory, I/O...)? Thanks to George Wilson who pointed me to the fact that the memory was fully consumed. I removed the line set ncsize = 0x10 from /etc/system and now the host isn't hung during the test anymore. But performance is still an issue. -- Leon
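For anyone retracing this, the effective DNLC sizing can be checked on the live system before and after removing the /etc/system override (a sketch; both commands are standard Solaris tools):

# echo ncsize/D | mdb -k     (current value of the ncsize tunable)
# kstat -n dnlcstats     (DNLC hit/miss counters, to see whether the cache is earning its memory)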
Re: [zfs-discuss] Apple Time Machine
On August 8, 2006 3:04:09 PM +0930 Darren J Moffat [EMAIL PROTECTED] wrote: Adam Leventhal wrote: When a file is modified, the kernel fires off an event which a user-land daemon listens for. Every so often, the user-land daemon does something like a snapshot of the affected portions of the filesystem with hard links (including hard links to directories -- I'm not making this up). That might be a bit off, but it's the impression I was left with. Which sounds very similar to how NTFS does single instance storage and some other things. And how Google Desktop and the Mac OS X Spotlight features index data. -frank
Re[2]: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hello Luke, Tuesday, August 8, 2006, 4:48:38 PM, you wrote: LL Does snv44 have the ZFS fixes to the I/O scheduler, the ARC and the prefetch logic? LL These are great results for random I/O, I wonder how the sequential I/O looks? LL Of course you'll not get great results for sequential I/O on the 3510 :-) filebench/singlestreamread v440 1. UFS, noatime, HW RAID5 6 disks, S10U2 70MB/s 2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the same lun as in #1) 87MB/s 3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2 130MB/s 4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44 133MB/s PS: with software RAID-Z I first got about 940MB/s: after the files were created they were all cached and ZFS almost didn't touch the disks :) So I changed the filesize to be well over the memory size of the server, and the above results are with that larger filesize. filebench/singlestreamwrite v440 1. UFS, noatime, HW RAID-5 6 disks, S10U2 70MB/s 2. ZFS, atime=off, HW RAID-5 6 disks, S10U2 (the same lun as in #1) 52MB/s 3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2 148MB/s 4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44 147MB/s So sequential writing in ZFS on HW RAID-5 is actually worse than UFS. -- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com
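For anyone reproducing the streaming numbers, the invocation would look roughly like this (a sketch; it assumes the singlestreamread/singlestreamwrite personalities expose a $filesize tunable the way varmail does, and the directory is hypothetical):

# /opt/filebench/bin/sparcv9/filebench
filebench> load singlestreamread
filebench> set $dir=/zfs_raid5_6disks/t1
filebench> set $filesize=20g
filebench> run 60

As noted above, the filesize must exceed RAM or the run measures the ARC rather than the disks.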
Re: Re[2]: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Robert, On 8/8/06 9:11 AM, Robert Milkowski [EMAIL PROTECTED] wrote: 1. UFS, noatime, HW RAID5 6 disks, S10U2 70MB/s 2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the same lun as in #1) 87MB/s 3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2 130MB/s 4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44 133MB/s Well, the UFS results are miserable, but the ZFS results aren't good - I'd expect between 250-350MB/s from a 6-disk RAID5 with read() blocksize from 8kb to 32kb. Most of my ZFS experiments have been with RAID10, but there were some massive improvements to seq I/O with the fixes I mentioned - I'd expect that this shows that they aren't in snv44. - Luke
Re[4]: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hello Luke, Tuesday, August 8, 2006, 6:18:39 PM, you wrote: LL Well, the UFS results are miserable, but the ZFS results aren't good - I'd LL expect between 250-350MB/s from a 6-disk RAID5 with read() blocksize from LL 8kb to 32kb. Well, right now I'm testing with a single 200MB/s FC link, so that's the upper limit in this testing. LL Most of my ZFS experiments have been with RAID10, but there were some LL massive improvements to seq I/O with the fixes I mentioned - I'd expect that LL this shows that they aren't in snv44. So where did you get those fixes? -- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com
Re: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Luke Lonergan wrote: Most of my ZFS experiments have been with RAID10, but there were some massive improvements to seq I/O with the fixes I mentioned - I'd expect that this shows that they aren't in snv44. Those fixes went into snv_45 -Mark
Re: Re[4]: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Robert, LL Most of my ZFS experiments have been with RAID10, but there were some LL massive improvements to seq I/O with the fixes I mentioned - I'd expect that LL this shows that they aren't in snv44. So where did you get those fixes? From the fine people who implemented them! As Mark said, apparently they're available in snv_45 (yay!) - Luke
[zfs-discuss] ZFS RAID10
Hi. snv_44, v440. filebench/varmail results for ZFS RAID10 with 6 disks and 32 disks. What is surprising is that the results for both cases are almost the same! 6 disks: IO Summary: 566997 ops 9373.6 ops/s, (1442/1442 r/w) 45.7mb/s, 299us cpu/op, 5.1ms latency IO Summary: 542398 ops 8971.4 ops/s, (1380/1380 r/w) 43.9mb/s, 300us cpu/op, 5.4ms latency 32 disks: IO Summary: 572429 ops 9469.7 ops/s, (1457/1457 r/w) 46.2mb/s, 301us cpu/op, 5.1ms latency IO Summary: 560491 ops 9270.6 ops/s, (1426/1427 r/w) 45.4mb/s, 300us cpu/op, 5.2ms latency Using iostat I can see that with 6 disks in a pool I get about 100-200 IO/s per disk in the pool, and with a 32-disk pool I get only 30-70 IO/s per disk in the pool. Each CPU is used at about 25% in SYS (there are 4 CPUs). Something is wrong here. # zpool status pool: zfs_raid10_32disks state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM zfs_raid10_32disks ONLINE 0 0 0 mirror ONLINE 0 0 0 c3t16d0 ONLINE 0 0 0 c3t17d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c3t18d0 ONLINE 0 0 0 c3t19d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c3t20d0 ONLINE 0 0 0 c3t21d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c3t22d0 ONLINE 0 0 0 c3t23d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c3t24d0 ONLINE 0 0 0 c3t25d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c3t26d0 ONLINE 0 0 0 c3t27d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c3t32d0 ONLINE 0 0 0 c3t33d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c3t34d0 ONLINE 0 0 0 c3t35d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c3t36d0 ONLINE 0 0 0 c3t37d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c3t38d0 ONLINE 0 0 0 c3t39d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c3t40d0 ONLINE 0 0 0 c3t41d0 ONLINE 0 0 0 mirror ONLINE 0 0 0 c3t42d0 ONLINE 0 0 0 c3t43d0 ONLINE 0 0 0 errors: No known data errors bash-3.00# zpool destroy zfs_raid10_32disks bash-3.00# zpool create zfs_raid10_6disks mirror c3t42d0 c3t43d0 mirror c3t40d0 c3t41d0 mirror c3t38d0 c3t39d0 bash-3.00# zfs set atime=off zfs_raid10_6disks bash-3.00# zfs create zfs_raid10_6disks/t1 bash-3.00#
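The per-disk IO/s figures above come from iostat; an invocation along these lines shows them (the exact flags are an assumption, any extended-device form works):

# iostat -xnz 5

-x gives extended per-device statistics, -n uses the c3t16d0-style names that match the pool listing, and -z suppresses idle devices.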
[zfs-discuss] Re: Re[2]: Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Luke Lonergan wrote: Well, the UFS results are miserable, but the ZFS results aren't good - I'd expect between 250-350MB/s from a 6-disk RAID5 with read() blocksize from 8kb to 32kb. [...] I don't think there is much chance of achieving anywhere near 350MB/s. That is a hell of a lot of IO/s for 6 disks + RAID(5/Z) + shared fibre. While you can always get very good results from a single disk IO, your percentage gain is always decreasing the more disks you add to the equation. From a single 200MB/s fibre, expect somewhere between 160-180MB/s, at best. Doug
Re: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
On Tue, Aug 08, 2006 at 06:11:09PM +0200, Robert Milkowski wrote: filebench/singlestreamread v440 1. UFS, noatime, HW RAID5 6 disks, S10U2 70MB/s 2. ZFS, atime=off, HW RAID5 6 disks, S10U2 (the same lun as in #1) 87MB/s 3. ZFS, atime=off, SW RAID-Z 6 disks, S10U2 130MB/s 4. ZFS, atime=off, SW RAID-Z 6 disks, snv_44 133MB/s FYI, streaming read performance is improved considerably by Mark's prefetch fixes, which are in build 45. (However, as mentioned, you will soon run into the bandwidth of a single Fibre Channel connection.) --matt
Re[2]: [zfs-discuss] Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Hello Matthew, Tuesday, August 8, 2006, 7:25:17 PM, you wrote: MA FYI, streaming read performance is improved considerably by Mark's MA prefetch fixes, which are in build 45. (However, as mentioned, you will MA soon run into the bandwidth of a single Fibre Channel connection.) I will probably re-test with snv_45 (waiting for SX). FC is not that big a problem - if I find enough time I will just add more FC cards. -- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com
Re: [zfs-discuss] ZFS RAID10
On Tue, Aug 08, 2006 at 09:54:16AM -0700, Robert Milkowski wrote: What is surprising is that the results for both cases are almost the same! Using iostat I can see that with 6 disks in a pool I get about 100-200 IO/s per disk in the pool, and with a 32-disk pool I get only 30-70 IO/s per disk in the pool. Each CPU is used at about 25% in SYS (there are 4 CPUs). Something is wrong here. It's possible that you are CPU limited. I'm guessing that your test uses only one thread, so that may be the limiting factor. We can get a quick idea of where that CPU is being spent if you can run 'lockstat -kgIW sleep 60' while your test is running, and send us the first 100 lines of output. It would be nice to see the output of 'iostat -xnpc 3' while the test is running, too. --matt
Re: [zfs-discuss] Re: ZFS RAID10
Hello Doug, Tuesday, August 8, 2006, 7:28:07 PM, you wrote: DS Looks like somewhere between the CPU and your disks you have a limitation of 9500 ops/sec. DS How did you connect 32 disks to your v440? Some 3510 JBODs connected directly over FC. -- Best regards, Robert mailto:[EMAIL PROTECTED] http://milek.blogspot.com
[zfs-discuss] Re: ZFS RAID10
filebench's varmail workload by default creates 16 threads - I confirmed it with prstat; 16 threads are created and running.

bash-3.00# lockstat -kgIW sleep 60 | less
Profiling interrupt: 23308 events in 60.059 seconds (388 events/sec)
Count genr cuml rcnt nsec Hottest CPU+PIL Caller
---
17615 76% 0.00 2016 cpu[2] thread_start
17255 74% 0.00 2001 cpu[2] idle
14439 62% 0.00 2015 cpu[2] disp_getwork
4726 20% 0.00 2673 cpu[2] syscall_trap
1010 4% 0.00 2625 cpu[2] fdsync
998 4% 0.00 2630 cpu[2] fop_fsync
988 4% 0.00 2632 cpu[2] zfs_fsync
958 4% 0.00 2639 cpu[2] zil_commit
839 4% 0.00 2814 cpu[2] fop_read
765 3% 0.00 2624 cpu[0] write
755 3% 0.00 2625 cpu[0] fop_write
746 3% 0.00 2626 cpu[0] zfs_write
739 3% 0.00 2751 cpu[1] copen
705 3% 0.00 2712 cpu[1] vn_openat
601 3% 0.00 2841 cpu[2] lookuppnat
599 3% 0.00 2284 cpu0 (usermode)
585 3% 0.00 2837 cpu[2] lookuppnvp
546 2% 0.00 2653 cpu[2] zil_lwb_write_start
541 2% 0.00 2726 cpu[0] pread
493 2% 0.00 2762 cpu[1] read
481 2% 0.00 2811 cpu[0] mutex_enter
451 2% 0.00 2684 cpu0 zio_checksum_generate
451 2% 0.00 2684 cpu0 fletcher_2_native
439 2% 0.00 2740 cpu[2] uiomove
413 2% 0.00 2523 cpu[1] zio_checksum
401 2% 0.00 2969 cpu[0] lookupnameat
384 2% 0.00 2755 cpu[1] zfs_read
372 2% 0.00 2529 cpu[1] vn_createat
371 2% 0.00 2653 cpu[0] lwp_mutex_timedlock
352 2% 0.00 2914 cpu[2] pr_read_lwpusage
321 1% 0.00 2777 cpu[0] bzero
317 1% 0.00 2702 cpu[2] unlink
314 1% 0.00 2695 cpu[2] vn_removeat
313 1% 0.00 2760 cpu[1]+11 disp_getbest
311 1% 0.00 2431 cpu[1] zil_lwb_commit
296 1% 0.00 2774 cpu[2] bcopy_more
289 1% 0.00 2796 cpu[1] copyout_more
280 1% 0.00 2757 cpu0 zfs_grow_blocksize
277 1% 0.00 2754 cpu0 dmu_object_set_blocksize
277 1% 0.00 2592 cpu[1] dmu_write_uio
276 1% 0.00 2912 cpu[2] traverse
274 1% 0.00 2759 cpu0 dnode_set_blksz
269 1% 0.00 2751 cpu0 fop_lookup
263 1% 0.00 2675 cpu[0] lwp_upimutex_lock
262 1% 0.00 2753 cpu0 dbuf_new_size
261 1% 0.00 2942 cpu[1] dnode_hold_impl
246 1% 0.00 2478 cpu[2] fop_create
244 1% 0.00 2480 cpu[2] zfs_create
244 1% 0.00 3080 cpu0 vdev_mirror_io_start
212 1% 0.00 2755 cpu[2] mutex_vector_enter
201 1% 0.00 2709 cpu0 zfs_lookup
197 1% 0.00 2723 cpu[2] fop_remove
194 1% 0.00 3007 cpu[2] zfs_zget
194 1% 0.00 2720 cpu[2] zfs_remove
182 1% 0.00 3040 cpu[2] fsop_root
176 1% 0.00 3073 cpu[2] zfs_root
174 1% 0.00 2841 cpu[1]+11 cv_wait
171 1% 0.00 2593 cpu[0]+6 intr_thread
165 1% 0.00 3246 cpu[1] dbuf_hold_impl
163 1% 0.00 2465 cpu[2] zfs_get_data
162 1% 0.00 2534 cpu[0] dbuf_read
160 1% 0.00 2351 cpu[0] taskq_thread
151 1% 0.00 3264 cpu[1] dbuf_hold
147 1% 0.00 3162 cpu[2] dmu_bonus_hold
143 1% 0.00 3770 cpu0 txg_sync_thread
143 1% 0.00 3770 cpu0 spa_sync
143 1% 0.00 3770 cpu0 dsl_pool_sync
143 1% 0.00 3770 cpu0 dmu_objset_sync
143 1% 0.00 3770 cpu0 dmu_objset_sync_dnodes
141 1% 0.00 3798 cpu0 dsl_dataset_sync
141 1% 0.00 2551 cpu[0]
[zfs-discuss] Re: ZFS/Thumper experiences
Hello, I really appreciate such information. Could you please give us some additional insight regarding your statement that [you] tried to drive ZFS to its limit, [...] found that the results were less consistent or predictable? Especially when taking a closer look at the upcoming RDBMS + Thumper bundles, erratic system behavior would not be something I'd appreciate in a production environment. In addition, I'd like to know whether you have any advice regarding the disk setup. Is it advisable to place Solaris on a separate ZFS mirror and to use the rest of the disks in a single or multiple RAID-Z pool(s)? Many thanks in advance, Jochen
Re: [zfs-discuss] Re: ZFS RAID10
On Tue, Aug 08, 2006 at 10:42:41AM -0700, Robert Milkowski wrote: filebench's varmail workload by default creates 16 threads - I confirmed it with prstat; 16 threads are created and running. Ah, OK. Looking at these results, it doesn't seem to be CPU bound, and the disks are not fully utilized either. However, because the test is doing so many synchronous writes (eg. by calling fsync()), we are continually writing out the intent log. Unfortunately, we are only able to issue a small number of concurrent i/os while doing the intent log writes. All the threads must wait for the intent log blocks to be written before they can enqueue more data. Therefore, we are essentially doing: 1. many threads call fsync(); 2. one of them flushes the intent log, issuing a few writes to the disks; 3. all of the threads wait for the writes to complete; 4. repeat. This test fundamentally requires waiting for lots of synchronous writes. Assuming no other activity on the system, the performance of synchronous writes does not scale with the number of drives; it scales with the drive's write latency. If you were to alter the test to not require everything to be done synchronously, then you would see much different behavior. --matt
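A quick way to see this serialization on a live system is to quantize fsync latency (a sketch; on Solaris, fsync(3C) enters the kernel via the fdsync syscall):

# dtrace -n 'syscall::fdsync:entry { self->ts = timestamp; } syscall::fdsync:return /self->ts/ { @["fsync latency (ns)"] = quantize(timestamp - self->ts); self->ts = 0; }'

If the distribution stays pinned near the per-write latency of a single drive no matter how many drives are in the pool, that is the behavior Matt describes: the threads are taking turns waiting on the same intent-log flush.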
Re: [zfs-discuss] SPEC SFS97 benchmark of ZFS,UFS,VxFS
Leon Koll wrote: Hi Eric, here it is: [the 'zpool status' output for the four single-LUN pools, quoted in full above] So having 4 pools isn't a recommended config - I would destroy those 4 pools and just create 1 RAID-0 pool: # zpool create sfsrocks c4t00173801014Bd0 c4t00173801014Cd0 c4t001738010140001Cd0 c4t0017380101400012d0 Each of those devices is a 64GB LUN, right? Do you know what your limiting factor was for ZFS (CPU, memory, I/O...)? Thanks to George Wilson who pointed me to the fact that the memory was fully consumed. I removed the line set ncsize = 0x10 from /etc/system and now the host isn't hung during the test anymore. But performance is still an issue. ah, you were limiting the # of dnlc entries... so you're still seeing ZFS max out at 2000 ops/s? Let us know what happens when you switch to 1 pool. eric
[zfs-discuss] Re: Lots of seeks?
So while I'm feeling optimistic :-) we really ought to be able to do this in two I/O operations. If we have, say, 500K of data to write (including all of the metadata), we should be able to allocate a contiguous 500K block on disk and write that with a single operation. Then we update the überblock. The only inherent problem preventing this right now is that we don't have general scatter/gather at the driver level (ugh). This is a bug that should be fixed, IMO. Then ZFS just needs to delay choosing physical block locations until they're being written as part of a group. (Of course, as NetApp points out in their WAFL papers, the goal of optimizing writes can conflict with the goal of optimizing reads, so taken to an extreme, this optimization isn't always desirable.)
Re: [zfs-discuss] Re: Lots of seeks?
On Tue, Anton B. Rang wrote: So while I'm feeling optimistic :-) we really ought to be able to do this in two I/O operations. If we have, say, 500K of data to write (including all of the metadata), we should be able to allocate a contiguous 500K block on disk and write that with a single operation. Then we update the überblock. The only inherent problem preventing this right now is that we don't have general scatter/gather at the driver level (ugh). Fixing this bug would help the NFS server significantly, given the general lack of contiguity of incoming write data (split at mblk boundaries). Spencer
Re: [zfs-discuss] Re: ZFS/Thumper experiences
Jochen, On 8/8/06 10:47 AM, Jochen M. Kaiser [EMAIL PROTECTED] wrote: Could you please give us some additional insight regarding your statement that [you] tried to drive ZFS to its limit, [...] found that the results were less consistent or predictable? Adrian's tests were done on code prior to fixing the I/O scheduler and prefetch logic. We at Greenplum worked with the ZFS team extensively to locate the problems and establish predictable behavior from ZFS prior to the release of the DBMS + Thumper system. The fixes are apparently due sometime soon as part of the snv_45 release on Solaris Express, and will be part of Sol10 U3. - Luke
Re: [zfs-discuss] Re: Re[2]: Re: 3510 HW RAID vs 3510 JBOD ZFS SOFTWARE RAID
Doug, On 8/8/06 10:15 AM, Doug Scott [EMAIL PROTECTED] wrote: I don't think there is much chance of achieving anywhere near 350MB/s. That is a hell of a lot of IO/s for 6 disks + RAID(5/Z) + shared fibre. While you can always get very good results from a single disk IO, your percentage gain is always decreasing the more disks you add to the equation. From a single 200MB/s fibre, expect somewhere between 160-180MB/s, at best. Momentarily forgot about the sucky single-FC limit - I've become so used to calculating drive rate, which in this case would be 80MB/s per disk for modern 15K RPM FC or SCSI drives, then multiplying by the 5 data drives in a 6-drive RAID5/Z. We routinely get 950MB/s from 16 SATA disks on a single server with internal storage. We're getting 2,000MB/s on 36 disks in an X4500 with ZFS. - Luke