Re: [zfs-discuss] ZFS Web administration interface
Just came across this myself. The commands you want to enable just the web admin interface are:

# svccfg
svc:> select system/webconsole
svc:/system/webconsole> setprop options/tcp_listen=true
svc:/system/webconsole> quit
# svcadm restart system/webconsole

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
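A quick way to confirm the change took effect (the console's default port, 6789, is assumed here):

# svcprop -p options/tcp_listen system/webconsole
true
# svcs -l system/webconsole

The console should then answer at https://<hostname>:6789, and the ZFS administration application should appear there.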
Re: [zfs-discuss] Recovering corrupted root pool
Rainer Orth [EMAIL PROTECTED] writes:

Yesterday evening, I tried Live Upgrade on a Sun Fire V60x running SX:CE 90 to SX:CE 93 with ZFS root (mirrored root pool called root). The LU itself ran without problems, but before rebooting the machine, I wanted to add some space to the root pool that had previously been in use for a UFS BE. Both disks (c0t0d0 and c0t1d0) were partitioned as follows:

Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       1 - 18810       25.91GB    (18810/0/0) 54342090
  1 unassigned    wm   18811 - 24618        8.00GB    (5808/0/0)  16779312
  2     backup    wm       0 - 24618       33.91GB    (24619/0/0) 71124291
  3 unassigned    wu       0                0         (0/0/0)            0
  4 unassigned    wu       0                0         (0/0/0)            0
  5 unassigned    wu       0                0         (0/0/0)            0
  6 unassigned    wu       0                0         (0/0/0)            0
  7 unassigned    wu       0                0         (0/0/0)            0
  8       boot    wu       0 -     0        1.41MB    (1/0/0)         2889
  9 unassigned    wu       0                0         (0/0/0)            0

Slice 0 is used by the root pool, slice 1 was used by the UFS BE. To achieve this, I ludeleted the now unused UFS BE and used # NOINUSE_CHECK=1 format to extend slice 0 by the size of slice 1, deleting the latter afterwards. I'm pretty sure that I've done this successfully before, even on a live system, but this time something went wrong: I remember an FMA message about one side of the root pool mirror being broken (something about an inconsistent label; unfortunately I didn't write down the exact message). Nonetheless, I rebooted the machine after luactivate sol_nv_93 (the new ZFS BE), but the machine didn't come up:

SunOS Release 5.11 Version snv_93 32-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
NOTICE: spa_import_rootpool: error 22

panic[cpu0]/thread=fec1cfe0: cannot mount root path /[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0:a
/[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0:a
fec351ac genunix:rootconf+10b (c0f040, 1, fec1c750)
fec351d0 genunix:vfs_mountroot+54 (fe800010, fec30fd8,)
fec351e4 genunix:main+b4 ()
panic: entering debugger (no dump device, continue to reboot)
skipping system dump - no dump device configured
rebooting...

I've managed a failsafe boot (from the same pool), and zpool import reveals

  pool: root
    id: 14475053522795106129
 state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        root          UNAVAIL  insufficient replicas
          mirror      UNAVAIL  corrupted data
            c0t1d0s0  ONLINE
            c0t0d0s0  ONLINE

Even restoring slice 1 on both disks to its old size and shrinking slice 0 accordingly doesn't help. I'm sure I've done this correctly since I could boot from the old sol_nv_b90_ufs BE, which was still on c0t0d0s1. I didn't have much success finding out what's going on here: I tried to remove either of the disks in case both sides of the mirror are inconsistent, but to no avail. I didn't have much luck with zdb either.
Here's the output of zdb -l /dev/rdsk/c0t0d0s0 and /dev/rdsk/c0t1d0s0:

c0t0d0s0:
LABEL 0
    version=10
    name='root'
    state=0
    txg=14643945
    pool_guid=14475053522795106129
    hostid=336880771
    hostname='erebus'
    top_guid=17627503873514720747
    guid=6121143629633742955
    vdev_tree
        type='mirror'
        id=0
        guid=17627503873514720747
        whole_disk=0
        metaslab_array=13
        metaslab_shift=28
        ashift=9
        asize=36409180160
        is_log=0
        children[0]
            type='disk'
            id=0
            guid=1526746004928780410
            path='/dev/dsk/c0t1d0s0'
            devid='id1,[EMAIL PROTECTED]/a'
            phys_path='/[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0:a'
            whole_disk=0
            DTL=160
        children[1]
            type='disk'
            id=1
            guid=6121143629633742955
            path='/dev/dsk/c0t0d0s0'
            devid='id1,[EMAIL PROTECTED]/a'
            phys_path='/[EMAIL PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0:a'
            whole_disk=0
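One precaution worth taking before resizing slices under a live pool as described above is to save the existing disk labels, so the exact geometry can be put back if the pool stops importing. A minimal sketch, assuming the device names from this thread:

# prtvtoc /dev/rdsk/c0t0d0s2 > /var/tmp/c0t0d0.vtoc
# prtvtoc /dev/rdsk/c0t1d0s2 > /var/tmp/c0t1d0.vtoc

and later, to restore a saved layout:

# fmthard -s /var/tmp/c0t0d0.vtoc /dev/rdsk/c0t0d0s2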
Re: [zfs-discuss] ZFS deduplication
Hi All

Is there any hope for deduplication on ZFS ?

Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems
Email [EMAIL PROTECTED]

There is always hope. Seriously though, looking at http://en.wikipedia.org/wiki/Comparison_of_revision_control_software there are a lot of choices for how we could implement this. SVN/K, Mercurial and Sun Teamware all come to mind. Simply ;) merge one of those with ZFS. It _could_ be as simple (with SVN as an example) as using directory listings to produce files which were then 'diffed'. You could then view the diffs as though they were changes made to lines of source code. Just add a tree subroutine to allow you to grab all the diffs that referenced changes to file 'xyz' and you would have easy access to all the changes of a particular file (or directory). With the speed-optimized ability to use ZFS snapshots with the tree subroutine to roll back a single file (or directory), you could undo / redo your way through the filesystem. Using an LKCD (http://www.faqs.org/docs/Linux-HOWTO/Linux-Crash-HOWTO.html) you could sit out on the play and watch from the sidelines -- returning to the OS when you thought you were 'safe' (and if not, jumping back out). Thus, Mertol, it is possible (and could work very well).

Rob

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs sparc boot Bad magic number in disk label
Hello Cindy, That did the trick. Thank you for your quick assessment and solution. Joe This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Formatting Problem of ZFS Adm Guide (pdf)
On Mon, 21 Jul 2008, W. Wayne Liauh wrote: Perhaps some considerations should be given to create those documents with OpenOffice.org or StarOffice/StarSuite. I would encourage Sun to continue using the system which has already been working for so many years so that it can focus on creating more good documentation with less cut-and-paste and more clarity. The complaints seem to be about flaws in the viewer programs rather than in the actual PDF. Bob == Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS deduplication
[EMAIL PROTECTED] wrote on 07/22/2008 08:05:01 AM: Hi All Is there any hope for deduplication on ZFS ? Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems Email [EMAIL PROTECTED] There is always hope. Seriously thought, looking at http://en.wikipedia. org/wiki/Comparison_of_revision_control_software there are a lot of choices of how we could implement this. SVN/K , Mercurial and Sun Teamware all come to mind. Simply ;) merge one of those with ZFS. It _could_ be as simple (with SVN as an example) of using directory listings to produce files which were then 'diffed'. You could then view the diffs as though they were changes made to lines of source code. Just add a tree subroutine to allow you to grab all the diffs that referenced changes to file 'xyz' and you would have easy access to all the changes of a particular file (or directory). With the speed optimized ability added to use ZFS snapshots with the tree subroutine to rollback a single file (or directory) you could undo / redo your way through the filesystem. dedup is not revision control, you seem to completely misunderstand the problem. Using a LKCD (http://www.faqs.org/docs/Linux-HOWTO/Linux-Crash-HOWTO.html ) you could sit out on the play and watch from the sidelines -- returning to the OS when you thought you were 'safe' (and if not, jumping backout). Now it seems you have veered even further off course. What are you implying the LKCD has to do with zfs, solaris, dedup, let alone revision control software? -Wade ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] L2ARC in Solaris 10?
When will L2ARC be available in Solaris 10? Thanks, Jeff ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS deduplication
To do dedup properly, it seems like there would have to be some overly complicated methodology for a sort of delayed dedup of the data. For speed, you'd want your writes to go straight into the cache and get flushed out as quickly as possibly, keep everything as ACID as possible. Then, a dedup scrubber would take what was written, do the voodoo magic of checksumming the new data, scanning the tree to see if there are any matches, locking the duplicates, run the usage counters up or down for that block of data, swapping out inodes, and marking the duplicate data as free space. It's a lofty goal, but one that is doable. I guess this is only necessary if deduplication is done at the file level. If done at the block level, it could possibly be done on the fly, what with the already implemented checksumming at the block level, but then your reads will suffer because pieces of files can potentially be spread all over hell and half of Georgia on the zdevs. Deduplication is going to require the judicious application of hallucinogens and man hours. I expect that someone is up to the task. On Tue, Jul 22, 2008 at 10:39 AM, [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote on 07/22/2008 08:05:01 AM: Hi All Is there any hope for deduplication on ZFS ? Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems Email [EMAIL PROTECTED] There is always hope. Seriously thought, looking at http://en.wikipedia. org/wiki/Comparison_of_revision_control_software there are a lot of choices of how we could implement this. SVN/K , Mercurial and Sun Teamware all come to mind. Simply ;) merge one of those with ZFS. It _could_ be as simple (with SVN as an example) of using directory listings to produce files which were then 'diffed'. You could then view the diffs as though they were changes made to lines of source code. Just add a tree subroutine to allow you to grab all the diffs that referenced changes to file 'xyz' and you would have easy access to all the changes of a particular file (or directory). With the speed optimized ability added to use ZFS snapshots with the tree subroutine to rollback a single file (or directory) you could undo / redo your way through the filesystem. dedup is not revision control, you seem to completely misunderstand the problem. Using a LKCD ( http://www.faqs.org/docs/Linux-HOWTO/Linux-Crash-HOWTO.html ) you could sit out on the play and watch from the sidelines -- returning to the OS when you thought you were 'safe' (and if not, jumping backout). Now it seems you have veered even further off course. What are you implying the LKCD has to do with zfs, solaris, dedup, let alone revision control software? -Wade ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- chris -at- microcozm -dot- net === Si Hoc Legere Scis Nimium Eruditionis Habes ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS deduplication
[EMAIL PROTECTED] wrote on 07/22/2008 09:58:53 AM: To do dedup properly, it seems like there would have to be some overly complicated methodology for a sort of delayed dedup of the data. For speed, you'd want your writes to go straight into the cache and get flushed out as quickly as possibly, keep everything as ACID as possible. Then, a dedup scrubber would take what was written, do the voodoo magic of checksumming the new data, scanning the tree to see if there are any matches, locking the duplicates, run the usage counters up or down for that block of data, swapping out inodes, and marking the duplicate data as free space. I agree, but what you are describing is file based dedup, ZFS already has the groundwork for dedup in the system (block level checksuming and pointers). It's a lofty goal, but one that is doable. I guess this is only necessary if deduplication is done at the file level. If done at the block level, it could possibly be done on the fly, what with the already implemented checksumming at the block level, exactly -- that is why it is attractive for ZFS, so much of the groundwork is done and needed for the fs/pool already. but then your reads will suffer because pieces of files can potentially be spread all over hell and half of Georgia on the zdevs. I don't know that you can make this statement without some study of an actual implementation on real world data -- and then because it is block based, you should see varying degrees of this dedup-flack-frag depending on data/usage. For instance, I would imagine that in many scenarios much od the dedup data blocks would belong to the same or very similar files. In this case the blocks were written as best they could on the first write, the deduped blocks would point to a pretty sequential line o blocks. Now on some files there may be duplicate header or similar portions of data -- these may cause you to jump around the disk; but I do not know how much this would be hit or impact real world usage. Deduplication is going to require the judicious application of hallucinogens and man hours. I expect that someone is up to the task. I would prefer the coder(s) not be seeing pink elephants while writing this, but yes it can and will be done. It (I believe) will be easier after the grow/shrink/evac code paths are in place though. Also, the grow/shrink/evac path allows (if it is done right) for other cool things like a base to build a roaming defrag that takes into account snaps, clones, live and the like. I know that some feel that the grow/shrink/evac code is more important for home users, but I think that it is super important for most of these additional features. -Wade On Tue, Jul 22, 2008 at 10:39 AM, [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote on 07/22/2008 08:05:01 AM: Hi All Is there any hope for deduplication on ZFS ? Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems Email [EMAIL PROTECTED] There is always hope. Seriously thought, looking at http://en.wikipedia. org/wiki/Comparison_of_revision_control_software there are a lot of choices of how we could implement this. SVN/K , Mercurial and Sun Teamware all come to mind. Simply ;) merge one of those with ZFS. It _could_ be as simple (with SVN as an example) of using directory listings to produce files which were then 'diffed'. You could then view the diffs as though they were changes made to lines of source code. 
Just add a tree subroutine to allow you to grab all the diffs that referenced changes to file 'xyz' and you would have easy access to all the changes of a particular file (or directory). With the speed optimized ability added to use ZFS snapshots with the tree subroutine to rollback a single file (or directory) you could undo / redo your way through the filesystem. dedup is not revision control, you seem to completely misunderstand the problem. Using a LKCD (http://www.faqs.org/docs/Linux-HOWTO/Linux-Crash-HOWTO.html ) you could sit out on the play and watch from the sidelines -- returning to the OS when you thought you were 'safe' (and if not, jumping backout). Now it seems you have veered even further off course. What are you implying the LKCD has to do with zfs, solaris, dedup, let alone revision control software? -Wade ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- chris -at- microcozm -dot- net === Si Hoc Legere Scis Nimium Eruditionis Habes ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS deduplication
On Tue, Jul 22, 2008 at 11:19 AM, [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote on 07/22/2008 09:58:53 AM: To do dedup properly, it seems like there would have to be some overly complicated methodology for a sort of delayed dedup of the data. For speed, you'd want your writes to go straight into the cache and get flushed out as quickly as possibly, keep everything as ACID as possible. Then, a dedup scrubber would take what was written, do the voodoo magic of checksumming the new data, scanning the tree to see if there are any matches, locking the duplicates, run the usage counters up or down for that block of data, swapping out inodes, and marking the duplicate data as free space. I agree, but what you are describing is file based dedup, ZFS already has the groundwork for dedup in the system (block level checksuming and pointers). It's a lofty goal, but one that is doable. I guess this is only necessary if deduplication is done at the file level. If done at the block level, it could possibly be done on the fly, what with the already implemented checksumming at the block level, exactly -- that is why it is attractive for ZFS, so much of the groundwork is done and needed for the fs/pool already. but then your reads will suffer because pieces of files can potentially be spread all over hell and half of Georgia on the zdevs. I don't know that you can make this statement without some study of an actual implementation on real world data -- and then because it is block based, you should see varying degrees of this dedup-flack-frag depending on data/usage. It's just a NonScientificWAG. I agree that most of the duplicated blocks will in most cases be part of identical files anyway, and thus lined up exactly as you'd want them. I was just free thinking and typing. For instance, I would imagine that in many scenarios much od the dedup data blocks would belong to the same or very similar files. In this case the blocks were written as best they could on the first write, the deduped blocks would point to a pretty sequential line o blocks. Now on some files there may be duplicate header or similar portions of data -- these may cause you to jump around the disk; but I do not know how much this would be hit or impact real world usage. Deduplication is going to require the judicious application of hallucinogens and man hours. I expect that someone is up to the task. I would prefer the coder(s) not be seeing pink elephants while writing this, but yes it can and will be done. It (I believe) will be easier after the grow/shrink/evac code paths are in place though. Also, the grow/shrink/evac path allows (if it is done right) for other cool things like a base to build a roaming defrag that takes into account snaps, clones, live and the like. I know that some feel that the grow/shrink/evac code is more important for home users, but I think that it is super important for most of these additional features. The elephants are just there to keep the coders company. There are tons of benefits for dedup, both for home and non-home users. I'm happy that it's going to be done. I expect the first complaints will come from those people who don't understand it, and their df and du numbers look different than their zpool status ones. Perhaps df/du will just have to be faked out for those folks, or we just apply the same hallucinogens to them instead. -Wade On Tue, Jul 22, 2008 at 10:39 AM, [EMAIL PROTECTED] wrote: [EMAIL PROTECTED] wrote on 07/22/2008 08:05:01 AM: Hi All Is there any hope for deduplication on ZFS ? 
Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems Email [EMAIL PROTECTED] There is always hope. Seriously thought, looking at http://en.wikipedia. org/wiki/Comparison_of_revision_control_software there are a lot of choices of how we could implement this. SVN/K , Mercurial and Sun Teamware all come to mind. Simply ;) merge one of those with ZFS. It _could_ be as simple (with SVN as an example) of using directory listings to produce files which were then 'diffed'. You could then view the diffs as though they were changes made to lines of source code. Just add a tree subroutine to allow you to grab all the diffs that referenced changes to file 'xyz' and you would have easy access to all the changes of a particular file (or directory). With the speed optimized ability added to use ZFS snapshots with the tree subroutine to rollback a single file (or directory) you could undo / redo your way through the filesystem. dedup is not revision control, you seem to completely misunderstand the problem. Using a LKCD (http://www.faqs.org/docs/Linux-HOWTO/Linux-Crash-HOWTO.html ) you could sit out on the play and watch from the sidelines -- returning to the OS when you thought you were 'safe' (and if not, jumping
Re: [zfs-discuss] ? SX:CE snv_91 - ZFS - raid and mirror - drive
Though possible, I don't think we would classify it as a best practice. -- richard Looking at http://opensolaris.org/os/community/volume_manager/ I see: Supports RAID-0, RAID-1, RAID-5, Root mirroring and Seamless upgrades and live upgrades (that would go nicely with my ZFS root mirror - right). I also don't see that there is a nice GUI for those that desire one ... Looking at http://evms.sourceforge.net/gui_screen/ I see some great screenshots and page http://evms.sourceforge.net/ says it supports: Ext2/3, JFS, ReiserFS, XFS, Swap, OCFS2, NTFS, FAT -- so it might be better to suggest adding ZFS there instead of focusing on non-ZFS solutions in this ZFS discussion group. Rob This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS deduplication
Chris Cosby wrote: On Tue, Jul 22, 2008 at 11:19 AM, [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote on 07/22/2008 09:58:53 AM: To do dedup properly, it seems like there would have to be some overly complicated methodology for a sort of delayed dedup of the data. For speed, you'd want your writes to go straight into the cache and get flushed out as quickly as possibly, keep everything as ACID as possible. Then, a dedup scrubber would take what was written, do the voodoo magic of checksumming the new data, scanning the tree to see if there are any matches, locking the duplicates, run the usage counters up or down for that block of data, swapping out inodes, and marking the duplicate data as free space. I agree, but what you are describing is file based dedup, ZFS already has the groundwork for dedup in the system (block level checksuming and pointers). It's a lofty goal, but one that is doable. I guess this is only necessary if deduplication is done at the file level. If done at the block level, it could possibly be done on the fly, what with the already implemented checksumming at the block level, exactly -- that is why it is attractive for ZFS, so much of the groundwork is done and needed for the fs/pool already. but then your reads will suffer because pieces of files can potentially be spread all over hell and half of Georgia on the zdevs. I don't know that you can make this statement without some study of an actual implementation on real world data -- and then because it is block based, you should see varying degrees of this dedup-flack-frag depending on data/usage. It's just a NonScientificWAG. I agree that most of the duplicated blocks will in most cases be part of identical files anyway, and thus lined up exactly as you'd want them. I was just free thinking and typing. No, you are right to be concerned over block-level dedup seriously impacting seeks. The problem is that, given many common storage scenarios, you will have not just similar files, but multiple common sections of many files. Things such as the various standard productivity app documents will not just have the same header sections, but internally, there will be significant duplications of considerable length with other documents from the same application. Your 5MB Word file is thus likely to share several (actually, many) multi-kB segments with other Word files. You will thus end up seeking all over the disk to read _most_ Word files. Which really sucks. I can list at least a couple more common scenarios where dedup has to potential to save at least some reasonable amount of space, yet will absolutely kill performance. For instance, I would imagine that in many scenarios much od the dedup data blocks would belong to the same or very similar files. In this case the blocks were written as best they could on the first write, the deduped blocks would point to a pretty sequential line o blocks. Now on some files there may be duplicate header or similar portions of data -- these may cause you to jump around the disk; but I do not know how much this would be hit or impact real world usage. Deduplication is going to require the judicious application of hallucinogens and man hours. I expect that someone is up to the task. I would prefer the coder(s) not be seeing pink elephants while writing this, but yes it can and will be done. It (I believe) will be easier after the grow/shrink/evac code paths are in place though. 
Also, the grow/shrink/evac path allows (if it is done right) for other cool things like a base to build a roaming defrag that takes into account snaps, clones, live and the like. I know that some feel that the grow/shrink/evac code is more important for home users, but I think that it is super important for most of these additional features. The elephants are just there to keep the coders company. There are tons of benefits for dedup, both for home and non-home users. I'm happy that it's going to be done. I expect the first complaints will come from those people who don't understand it, and their df and du numbers look different than their zpool status ones. Perhaps df/du will just have to be faked out for those folks, or we just apply the same hallucinogens to them instead. I'm still not convinced that dedup is really worth it for anything but very limited, constrained usage. Disk is just so cheap, that you _really_ have to have an enormous amount of dup before the performance penalties of dedup are countered. This in many ways reminds me the last year's discussion over file versioning in the
[zfs-discuss] Cannot attach mirror to SPARC zfs root pool
I just wanted to attach a second mirror to a ZFS root pool on an Ultra 1/170E running snv_93. I've followed the workarounds for CR 6680633 and 6680633 from the ZFS Admin Guide, but booting from the newly attached mirror fails like so:

Boot device: disk  File and args:
Can't mount root
Fast Data Access MMU Miss

while the original side of the mirror works just fine. Any advice on what could be wrong here?

Rainer
-
Rainer Orth, Faculty of Technology, Bielefeld University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Moving ZFS root pool to different system breaks boot
Recently, I needed to move the boot disks containing a ZFS root pool in an Ultra 1/170E running snv_93 to a different system (same hardware) because the original system was broken/unreliable. To my dismay, unlike with UFS, the new machine wouldn't boot:

WARNING: pool 'root' could not be loaded as it was last accessed by another system (host: hostid: 0x808f7fd8). See: http://www.sun.com/msg/ZFS-8000-EY

panic[cpu0]/thread=180e000: BAD TRAP: type=31 rp=180acc0 addr=0 mmu_fsr=0 occurred in module unix due to a NULL pointer dereference

: trap type = 0x31
pid=0, pc=0x1046de4, sp=0x180a561, tstate=0x4480001602, context=0x0
g1-g7: 0, 180b1c8, 3314f80, 306, 18c2c00, 10, 180e000

0180a9e0 unix:die+74 (10c1400, 180acc0, 0, 0, 10, 180aaa0)
  %l0-3: 0100
  %l4-7: 2000 010c1848 010c1800 1a7e
0180aac0 unix:trap+9d8 (180acc0, 0, 31, 1c00, 0, 5)
  %l0-3: c168 e000 01835bc0
  %l4-7: 0001 0001 01162800 0002
0180ac10 unix:ktl0+48 (0, 180e000, 180afe8, 3314b00, 2f, 3314b00)
  %l0-3: 0003 1400 004480001602 0101b5b0
  %l4-7: 000c 0003 0180acc0
0180ad60 genunix:lookuppnat+90 (0, 0, 1, 180afe0, 180b1c8, 0)
  %l0-3: 0118 cc23
  %l4-7: 003f 000a 01835bc0 0180afe8
0180ae20 genunix:vn_createat+11c (7fff, 1, 180b1d0, 80, 80, 1)
  %l0-3:
  %l4-7: 2102 0001 fdff
0180b020 genunix:vn_openat+164 (180b420, 2102, 2302, 200, 1, 100)
  %l0-3: 0001
  %l4-7: 0080 0300 0180b1c8
0180b270 genunix:vn_open+30 (, 1, 2302, 1a4, 0, 0)
  %l0-3: 0004 01877c00 03309418 03309418
  %l4-7: 000c 0003 03c114a8 062c
0180b340 zfs:spa_config_write+c8 (3c13928, 3c138a8, 0, 1, 3c137e8, 18d6c00)
  %l0-3: 03045870 03d19b10 030458c0 0134fc00
  %l4-7: 0018 0005 2000 0134fc00
0180b4b0 zfs:spa_config_sync+104 (33111c0, 0, 1, 33111e8, 0, 5)
  %l0-3: 03311660 033111c0 03c13928 0135
  %l4-7: 0134fc00 0135
0180b570 zfs:spa_import_common+470 (0, 134e400, 0, 1, 0, 33111c0)
  %l0-3: 01346d58 0001
  %l4-7: 0001 01346c00 0134e400 012c
0180b650 zfs:spa_import_rootpool+74 (183b3d0, 183b000, 134f000, 8, 134bc00, 134f000)
  %l0-3: 0180e000 01873000 0036 01815000
  %l4-7: 0035 0002 01843ab8 4885f531
0180b720 zfs:zfs_mountroot+54 (1899f28, 0, 18c2800, 708, 33b09f0, 13380cc)
  %l0-3: 01815400 018c8000 012ba000 011e8000
  %l4-7: 018c3400 012ba000 011e8000 018bbc00
0180b7e0 swapgeneric:rootconf+1ac (12dc400, 0, 1873400, 1873bb0, 18c35f0, 18bfc00)
  %l0-3: 01873400 018c0800 0304c6f0
  %l4-7: 018c2c00 012dc400 01873400 0001
0180b890 unix:stubs_common_code+70 (30f1000, 0, 4, 304c520, 30f1000, 1877f18)
  %l0-3: 0180b149 0180b211 0001
  %l4-7: 01818ab8 01142800
0180b950 genunix:vfs_mountroot+5c (600, 200, 800, 200, 1873400, 189a000)
  %l0-3: 0001d524 0064 0001d4c0 1d4c
  %l4-7: 05dc 1770 0640 018c5800
0180ba10 genunix:main+b4 (1815000, 180c000, 1835bc0, 18151f8, 1, 180e000)
  %l0-3: 01836b58 70002000 010c0400
  %l4-7: 0183ac00 0180c000 01836800

panic: entering debugger (no dump device, continue to reboot)

This seems to be the same issue as CR 6716241, which has been closed as `Not a defect'. I consider this completely unacceptable since this is a serious regression compared to UFS, which has no such requirement. There needs to be some sort of documented workaround for situations like this. Fortunately, the machine still had a UFS BE with snv_93, where I could import the root pool like this:

# zpool import -f -R /mnt root

Afterwards, the machine booted as expected. In the
Re: [zfs-discuss] Cannot attach mirror to SPARC zfs root pool
On Tue, 22 Jul 2008, Rainer Orth wrote: I just wanted to attach a second mirror to a ZFS root pool on an Ultra 1/170E running snv_93. I've followed the workarounds for CR 6680633 and 6680633 from the ZFS Admin Guide, but booting from the newly attached mirror fails like so:

I think you're running into CR 6668666. I'd try manually running installboot on the new disk and see if that fixes it.

Regards, markm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
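For SPARC, the boot block can be put on the newly attached disk by hand with something along these lines (a sketch; the device name here is a placeholder for the second half of the mirror):

# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t1d0s0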
Re: [zfs-discuss] Cannot attach mirror to SPARC zfs root pool
Mark J Musante writes:

On Tue, 22 Jul 2008, Rainer Orth wrote: I just wanted to attach a second mirror to a ZFS root pool on an Ultra 1/170E running snv_93. I've followed the workarounds for CR 6680633 and 6680633 from the ZFS Admin Guide, but booting from the newly attached mirror fails like so:

I think you're running into CR 6668666. I'd try manually running

oops, cut-and-paste error on my part: 6668666 was one of the two CRs mentioned in the zfs admin guide which I worked around.

installboot on the new disk and see if that fixes it.

Unfortunately, it didn't. Reconsidering now, I see that I ran installboot against slice 0 (reduced by 1 sector as required by CR 6680633) instead of slice 2 (whole disk). Doing so doesn't fix the problem either, though.

Regards.
Rainer
-
Rainer Orth, Faculty of Technology, Bielefeld University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS deduplication
FWIW, Sun's VTL products use ZFS and offer de-duplication services. http://www.sun.com/aboutsun/pr/2008-04/sunflash.20080407.2.xml -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS deduplication
[EMAIL PROTECTED] wrote on 07/22/2008 11:48:30 AM: Chris Cosby wrote: On Tue, Jul 22, 2008 at 11:19 AM, [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote on 07/22/2008 09:58:53 AM: To do dedup properly, it seems like there would have to be some overly complicated methodology for a sort of delayed dedup of the data. For speed, you'd want your writes to go straight into the cache and get flushed out as quickly as possibly, keep everything as ACID as possible. Then, a dedup scrubber would take what was written, do the voodoo magic of checksumming the new data, scanning the tree to see if there are any matches, locking the duplicates, run the usage counters up or down for that block of data, swapping out inodes, and marking the duplicate data as free space. I agree, but what you are describing is file based dedup, ZFS already has the groundwork for dedup in the system (block level checksuming and pointers). It's a lofty goal, but one that is doable. I guess this is only necessary if deduplication is done at the file level. If done at the block level, it could possibly be done on the fly, what with the already implemented checksumming at the block level, exactly -- that is why it is attractive for ZFS, so much of the groundwork is done and needed for the fs/pool already. but then your reads will suffer because pieces of files can potentially be spread all over hell and half of Georgia on the zdevs. I don't know that you can make this statement without some study of an actual implementation on real world data -- and then because it is block based, you should see varying degrees of this dedup-flack-frag depending on data/usage. It's just a NonScientificWAG. I agree that most of the duplicated blocks will in most cases be part of identical files anyway, and thus lined up exactly as you'd want them. I was just free thinking and typing. No, you are right to be concerned over block-level dedup seriously impacting seeks. The problem is that, given many common storage scenarios, you will have not just similar files, but multiple common sections of many files. Things such as the various standard productivity app documents will not just have the same header sections, but internally, there will be significant duplications of considerable length with other documents from the same application. Your 5MB Word file is thus likely to share several (actually, many) multi-kB segments with other Word files. You will thus end up seeking all over the disk to read _most_ Word files. Which really sucks. I can list at least a couple more common scenarios where dedup has to potential to save at least some reasonable amount of space, yet will absolutely kill performance. While you may have a point on some data sets, actual testing of this type of data (28.000+ of actual end user doc files) using xdelta with 4k and 8k block sizes shows that the similar blocks in these files are in the 2% range (~ 6% for 4k). That means a full read of each file on average would require 6% seeks to other disk areas. That is not bad, but this is the worst case picture as those duplicate blocks would need to live in the same offsets and have the same block boundaries to match under the proposed algo. To me this means word docs are not a good candidate for dedup at the block level -- but the actual cost to dedup anyways seems small. 
Of course you could come up with data that is pathologically bad for these benchmarks, but I do not believe it would be nearly as bad as you are making it out to be on real world data. For instance, I would imagine that in many scenarios much od the dedup data blocks would belong to the same or very similar files. In this case the blocks were written as best they could on the first write, the deduped blocks would point to a pretty sequential line o blocks. Now on some files there may be duplicate header or similar portions of data -- these may cause you to jump around the disk; but I do not know how much this would be hit or impact real world usage. Deduplication is going to require the judicious application of hallucinogens and man hours. I expect that someone is up to the task. I would prefer the coder(s) not be seeing pink elephants while writing this, but yes it can and will be done. It (I believe) will be easier after the grow/shrink/evac code paths are in place though. Also, the grow/shrink/evac path allows (if it is done right) for other cool things like a base to build a roaming defrag that takes into account snaps, clones, live and the like. I know
Re: [zfs-discuss] ZFS deduplication
On Tue, Jul 22, 2008 at 11:48 AM, Erik Trimble [EMAIL PROTECTED] wrote:

No, you are right to be concerned over block-level dedup seriously impacting seeks. The problem is that, given many common storage scenarios, you will have not just similar files, but multiple common sections of many files. Things such as the various standard productivity app documents will not just have the same header sections, but internally, there will be significant duplications of considerable length with other documents from the same application. Your 5MB Word file is thus likely to share several (actually, many) multi-kB segments with other Word files. You will thus end up seeking all over the disk to read _most_ Word files. Which really sucks. I can list at least a couple more common scenarios where dedup has the potential to save at least some reasonable amount of space, yet will absolutely kill performance.

This would actually argue in favor of dedup... If the blocks are common they are more likely to be in the ARC with dedup, thus avoiding a read altogether. There would likely be greater overhead in assembling smaller packets.

Here's some real life... I have 442 Word documents created by me and others over several years. Many were created from the same corporate templates. I generated the MD5 hash of every 8 KB of each file and came up with a total of 8409 hashes -- implying 65 MB of Word documents. Taking those hashes through sort | uniq -c | sort -n led to the following (a rough sketch of this measurement appears after this message):

      3 p9I7HgbxFme7TlPZmsD6/Q
      3 sKE3RBwZt8A6uz+tAihMDA
      3 uA4PK1+SQqD+h1Nv6vJ6fQ
      3 wQoU2g7f+dxaBMzY5rVE5Q
      3 yM0csnXKtRxjpSxg1Zma0g
      3 yyokNamrTcD7lQiitcVgqA
      4 jdsZZfIHtshYZiexfX3bQw
     17 pohs0DWPFwF8HJ8p/HnFKw
     19 s0eKyh/vT1LothTvsqtZOw
     64 CCn3F0CqsauYsz6uId7hIg

Note that CCn3F0CqsauYsz6uId7hIg is the MD5 hash of 8 KB of zeros. If compression is used as well, this block would not even be stored. If 512 byte blocks are used, the story is a bit different:

     81 DEf6rofNmnr1g5f7oaV75w
    109 3gP+ZaZ2XKqMkTQ6zGLP/A
    121 ypk+0ryBeMVRnnjYQD2ZEA
    124 HcuMdyNKV7FDYcPqvb2o3Q
    371 s0eKyh/vT1LothTvsqtZOw
    372 ozgGMCCoc+0/RFbFDO8MsQ
   8535 v2GerAzfP2jUluqTRBN+iw

As you might guess, that most common hash is a block of zeros. Most likely, however, these files will end up using 128K blocks for the first part of the file, smaller for the portions that don't fit. When I look at just 128K...

      1 znJqBX8RtPrAOV2I6b5Wew
      2 6tuJccWHGVwv3v4nee6B9w
      2 Qr//PMqqhMtuKfgKhUIWVA
      2 idX0awfYjjFmwHwi60MAxg
      2 s0eKyh/vT1LothTvsqtZOw
      3 +Q/cXnknPr/uUCARsaSIGw
      3 /kyIGuWnPH/dC5ETtMqqLw
      3 4G/QmksvChYvfhAX+rfgzg
      3 SCMoKuvPepBdQEBVrTccvA
      3 vbaNWd5IQvsGdQ9R8dIqhw

There is actually very little duplication in word files. Many of the dupes above are from various revisions of the same files.

Dedup Advantages:

(1) save space relative to the amount of duplication. this is highly dependent on workload, and ranges from 0% to 99%, but the distribution of possibilities isn't a bell curve (i.e. the average space saved isn't 50%).

I have evidence that shows 75% duplicate data on (mostly sparse) zone roots created and maintained over an 18-month period. I show other evidence above that it is not nearly as good for one person's copy of word documents. I suspect that it would be different if the file system that I did this on was on a file server where all of my colleagues also stored their documents (and revisions of mine that they have reviewed).

(2) noticeable write performance penalty (assuming block-level dedup on write), with potential write cache issues.

Depends on the approach taken.
(3) very significant post-write dedup time, at least on the order of 'zfs scrub'. Also, during such a post-write scenario, it more or less takes the zpool out of usage.

The ZFS competition that has this in shipping product today does not quiesce the file system during dedup passes.

(4) If dedup is done at block level, not at file level, it kills read performance, effectively turning all dedup'd files from sequential read to a random read. That is, block-level dedup drastically accelerates filesystem fragmentation.

Absent data that shows this, I don't accept this claim. Arguably the blocks that are duplicate are more likely to be in cache. I think that my analysis above shows that this is not a concern for my data set.

(5) Something no one has talked about, but is of concern. By removing duplication, you increase the likelihood that loss of the master segment will corrupt many more files. Yes, ZFS has self-healing and such. But, particularly in the case where there is no ZFS pool redundancy (or pool-level redundancy has been compromised), loss of one block can thus be many more times severe.

I believe this is true and likely a good topic for discussion. We need to think long and hard about what the real widespread benefits are of dedup
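A rough way to reproduce the per-block measurement described above with nothing more than dd and digest (a sketch, not the original poster's script; it assumes Solaris digest(1) -- md5sum works the same way elsewhere -- and a directory of .doc files, and its hex output will look different from the base64-style listing above even though the duplicate counts are comparable):

BS=8192
for f in *.doc; do
  size=`wc -c < "$f"`
  blocks=`expr \( $size + $BS - 1 \) / $BS`
  i=0
  while [ $i -lt $blocks ]; do
    # hash the i-th 8 KB block of the file
    dd if="$f" bs=$BS skip=$i count=1 2>/dev/null | digest -a md5
    i=`expr $i + 1`
  done
done | sort | uniq -c | sort -n | tail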
Re: [zfs-discuss] ZFS deduplication
On 7/22/08 11:48 AM, Erik Trimble [EMAIL PROTECTED] wrote: I'm still not convinced that dedup is really worth it for anything but very limited, constrained usage. Disk is just so cheap, that you _really_ have to have an enormous amount of dup before the performance penalties of dedup are countered. Again, I will argue that the spinning rust itself isn't expensive, but data management is. If I am looking to protect multiple PB (through remote data replication and backup), I need more than just the rust to store that. I need to copy this data, which takes time and effort. If the system can say these 500K blocks are the same as these 500K, don't bother copying them to the DR site AGAIN, then I have a less daunting data management task. De-duplication makes a lot of sense at some layer(s) within the data management scheme. Charles ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
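Part of this benefit already exists in the send/receive path: an incremental stream only carries the blocks that changed between two snapshots, so unchanged data never crosses the WAN; a deduplicating send would extend that to blocks that are merely identical. A minimal sketch of the incremental case, with pool, dataset, snapshot and host names as placeholders:

# zfs snapshot tank/data@today
# zfs send -i tank/data@yesterday tank/data@today | ssh drsite zfs receive tank/data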
Re: [zfs-discuss] ZFS deduplication
On Tue, 22 Jul 2008, Erik Trimble wrote: Dedup Disadvantages:

Obviously you do not work in the Sun marketing department, which is interested in this feature (due to some other companies marketing it). Note that the topic starter post came from someone in Sun's marketing department.

I think that deduplication is a potential diversion which draws attention away from the core ZFS things which are still not ideally implemented or do not yet exist at all. Compared with other filesystems, ZFS is still a toddler since it has only been deployed for a few years. ZFS is intended to be an enterprise filesystem, so let's give it more time to mature before hitting it with the feature stick.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [RFC] Improved versioned pointer algorithms
Btrfs does not suffer from this problem as far as I can see because it uses reference counting rather than a ZFS-style dead list. I was just wondering if ZFS devs recognize the problem and are working on a solution.

Daniel,

Correct me if I'm wrong, but how does reference counting solve this problem? The terminology is as follows:

1. Filesystem: A writable filesystem with no references or a parent.
2. Snapshot: Immutable point-in-time view of a filesystem.
3. Clone: A writable filesystem whose parent is a given snapshot.

Under this terminology, it is easy to see that the dead-list is equivalent to reference counting. The problem is rather that to have a clone, you need to have its snapshot around, since by definition it is a child of a snapshot (with the exception that by using zfs promote you can make a clone a direct child of the filesystem; it's like turning a grand-child into a child). So what is the terminology of btrfs?

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] OT: Formatting Problem of ZFS Adm Guide (pdf)
I doubt it. Star/OpenOffice are word processors... and like Word they are not suitable for typesetting documents. SGML, FrameMaker, and TeX/LaTeX are the only ones capable of doing that.

This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS deduplication
et == Erik Trimble [EMAIL PROTECTED] writes:

 et> Dedup Advantages:
 et> (1) save space

(2) coalesce data which is frequently used by many nodes in a large cluster into a small nugget of common data which can fit into RAM or L2 fast disk
(3) back up non-ZFS filesystems that don't have snapshots and clones
(4) make offsite replication easier on the WAN

but, yeah, aside from imagining ahead to possible disastrous problems with the final implementation, the imagined use cases should probably be carefully compared to existing large installations. Firstly, dedup may be more tempting as a bulleted marketing feature or a bloggable/banterable boasting point than it is valuable to real people. Secondly, the comparison may drive the implementation.

For example, should dedup happen at write time and be something that doesn't happen to data written before it's turned on, like recordsize or compression, to make it simpler in the user interface, and avoid problems with scrubs making pools uselessly slow? Or should it be scrub-like so that already-written filesystems can be thrown into the dedup bag and slowly squeezed, or so that dedup can run slowly during the business day over data written quickly at night (fast outside-business-hours backup)?

pgpHArHK13e1c.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS deduplication
On Tue, 22 Jul 2008, Miles Nordin wrote: scrubs making pools uselessly slow? Or should it be scrub-like so that already-written filesystems can be thrown into the dedup bag and slowly squeezed, or so that dedup can run slowly during the business day over data written quickly at night (fast outside-business-hours backup)? I think that the scrub-like model makes the most sense since ZFS write performance should not be penalized. It is useful to implement score-boarding so that a block is not considered for de-duplication until it has been duplicated a certain number of times. In order to decrease resource consumption, it is useful to perform de-duplication over a span of multiple days or multiple weeks doing just part of the job each time around. Deduping a petabyte of data seems quite challenging yet ZFS needs to be scalable to these levels. Bob == Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS deduplication
Bob Friesenhahn wrote: On Tue, 22 Jul 2008, Erik Trimble wrote: Dedup Disadvantages: Obviously you do not work in the Sun marketing department which is intrested in this feature (due to some other companies marketing it). Note that the topic starter post came from someone in Sun's marketing department. I think that dedupication is a potential diversion which draws attention away from the core ZFS things which are still not ideally implemented or do not yet exist at all. Compared with other filesystems, ZFS is still a toddler since it has only been deployed for a few years. ZFS is intended to be an enterprise filesystem so let's give it more time to mature before hiting it with the feature stick. Bob == Bob Friesenhahn [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ More than anything, Bob's reply is my major feeling on this. Dedup may indeed turn out to be quite useful, but honestly, there's no broad data which says that it is a Big Win (tm) _right_now_, compared to finishing other features. I'd really want a Engineering Study about the real-world use (i.e. what percentage of the userbase _could_ use such a feature, and what percentage _would_ use it, and exactly how useful would each segment find it...) before bumping it up in the priority queue of work to be done on ZFS. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS deduplication
On Tue, 22 Jul 2008, Miles Nordin wrote: scrubs making pools uselessly slow? Or should it be scrub-like so that already-written filesystems can be thrown into the dedup bag and slowly squeezed, or so that dedup can run slowly during the business day over data written quickly at night (fast outside-business-hours backup)? I think that the scrub-like model makes the most sense since ZFS write performance should not be penalized. It is useful to implement score-boarding so that a block is not considered for de-duplication until it has been duplicated a certain number of times. In order to decrease resource consumption, it is useful to perform de-duplication over a span of multiple days or multiple weeks doing just part of the job each time around. Deduping a petabyte of data seems quite challenging yet ZFS needs to be scalable to these levels. Bob Friesenhahn In case anyone (other than Bob) missed it, this is why I suggested File-Level Dedup: ... using directory listings to produce files which were then 'diffed'. You could then view the diffs as though they were changes made ... We could have: Block-Level (if we wanted to restore an exact copy of the drive - duplicate the 'dd' command) or Byte-Level (if we wanted to use compression - duplicate the 'zfs set compression=on rpool' _or_ 'bzip' commands) ... etc... assuming we wanted to duplicate commands which already implement those features, and provide more than we (the filesystem) needs at a very high cost (performance). So I agree with your comment about the need to be mindful of resource consumption, the ability to do this over a period of days is also useful. Indeed the Plan9 filesystem simply snapshots to WORM and has no delete - nor are they able to fill their drives faster than they can afford to buy new ones: Venti Filesystem http://www.cs.bell-labs.com/who/seanq/p9trace.html Rob This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [zfs-code] Peak every 4-5 second
Dear Mark/All,

Our trading system is writing to a local and/or array volume at 10k messages per second. Each message is about 700 bytes in size. Before ZFS, we used UFS. Even with UFS, there was a peak every 5 seconds due to fsflush invocation. However, each peak is only about ~5ms, and our application cannot recover from such high latency. So we used several tuning parameters (tune_r_* and autoup) to decrease the flush interval. As a result the peaks came down to ~1.5ms, but that is still too high for our application.

I believe that if we could reduce the ZFS sync interval down to ~1s, the peaks would be reduced to ~1ms or less. We would rather have a 1ms peak every second than a 5ms peak every 5 seconds :-)

Are there any tunables I can use to reduce the ZFS sync interval? If there is no such tunable, can I use mdb for the job? (See the sketch at the end of this message.) This is not a general-purpose setup and we are OK with the increased I/O rate.

Please advise/help. Thanks in advance.

tharindu

Mark Maybee wrote:

ZFS is designed to sync a transaction group about every 5 seconds under normal work loads. So your system looks to be operating as designed. Is there some specific reason why you need to reduce this interval? In general, this is a bad idea, as there is somewhat of a fixed overhead associated with each sync, so increasing the sync frequency could result in increased IO.
-Mark

Tharindu Rukshan Bamunuarachchi wrote:

Dear ZFS Gurus,

We are developing low-latency transaction processing systems for stock exchanges. A low-latency, high-performance file system is a critical component of our trading systems, and we have chosen ZFS as our primary file system. But we saw periodic disk write peaks every 4-5 seconds. Please refer to the first column of the output below (marked in bold). The output is generated by our own disk performance measuring tool, i.e. DTool (please find attachment).

Compared to UFS/VxFS, ZFS is performing very well, but we could not minimize the periodic peaks. We used the autoup and tune_r_fsflush flags for UFS tuning. Is there any ZFS-specific tuning which will reduce the file system flush interval of ZFS? I have tried all parameters specified in solarisinternals and google.com. I would like to go for a ZFS code change/recompile if necessary. Please advise.
Cheers
Tharindu

cpu4600-100 /tantan ./*DTool -f M -s 1000 -r 1 -i 1 -W*
System Tick = 100 usecs
Clock resolution 10
HR Timer created for 100usecs
z_FileName = M
i_Rate = 1
l_BlockSize = 1000
i_SyncInterval = 0
l_TickInterval = 100
i_TicksPerIO = 1
i_NumOfIOsPerSlot = 1

Max (us)| Min (us) | Avg (us) | MB/S | File Freq Distribution
336     | 4 | 10.5635 | 4.7688 | M 50(98.55), 200(1.09), 500(0.36), 2000(0.00), 5000(0.00), 1(0.00), 10(0.00), 20(0.00),
*1911*  | 4 | 10.3152 | 9.4822 | M 50(98.90), 200(0.77), 500(0.32), 2000(0.01), 5000(0.00), 1(0.00), 10(0.00), 20(0.00),
307     | 4 | 9.9386  | 9.5324 | M 50(99.03), 200(0.66), 500(0.31), 2000(0.00), 5000(0.00), 1(0.00), 10(0.00), 20(0.00),
331     | 4 | 9.9465  | 9.5332 | M 50(99.04), 200(0.72), 500(0.24), 2000(0.00), 5000(0.00), 1(0.00), 10(0.00), 20(0.00),
318     | 4 | 10.1241 | 9.5309 | M 50(99.07), 200(0.66), 500(0.27), 2000(0.00), 5000(0.00), 1(0.00), 10(0.00), 20(0.00),
303     | 4 | 9.9236  | 9.5296 | M 50(99.13), 200(0.59), 500(0.28), 2000(0.00), 5000(0.00), 1(0.00), 10(0.00), 20(0.00),
560     | 4 | 10.2604 | 9.4565 | M 50(98.82), 200(0.86), 500(0.31), 2000(0.01), 5000(0.00), 1(0.00), 10(0.00), 20(0.00),
376     | 4 | 9.9975  | 9.5176 | M 50(99.05), 200(0.63), 500(0.32), 2000(0.00), 5000(0.00), 1(0.00), 10(0.00), 20(0.00),
*9783*  | 4 | 10.8216 | 9.5301 | M 50(99.05), 200(0.58), 500(0.36), 2000(0.00), 5000(0.00), 1(0.01), 10(0.00), 20(0.00),
332     | 4 | 9.9345  | 9.5252 | M 50(99.06), 200(0.61), 500(0.33), 2000(0.00), 5000(0.00), 1(0.00), 10(0.00), 20(0.00),
355     | 4 | 9.9906  | 9.5315 | M 50(99.01), 200(0.69), 500(0.30), 2000(0.00), 5000(0.00), 1(0.00), 10(0.00), 20(0.00),
356     | 4 | 10.2341 | 9.5207 | M 50(98.96), 200(0.76), 500(0.28), 2000(0.00), 5000(0.00), 1(0.00), 10(0.00), 20(0.00),
320     | 4 | 9.8893  | 9.5279 | M 50(99.10), 200(0.59), 500(0.31), 2000(0.00), 5000(0.00), 1(0.00), 10(0.00), 20(0.00),
*10005* | 4 | 10.8956 | 9.5258 | M 50(99.07), 200(0.63), 500(0.29), 2000(0.00), 5000(0.00), 1(0.00), 10(0.01),
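On the mdb question above: the txg sync interval is controlled by a kernel variable rather than a documented zfs(1M) property, and the variable's name has differed across builds (txg_time in older Nevada builds, zfs_txg_timeout in later ones), so check which symbol your kernel actually has before writing it. A sketch of the unsupported kind of tuning Mark cautions against, assuming the later name:

Read the current interval (in seconds):

# echo "zfs_txg_timeout/D" | mdb -k

Set it to 1 second on the live kernel:

# echo "zfs_txg_timeout/W 0t1" | mdb -kw

Or persistently, via /etc/system:

set zfs:zfs_txg_timeout = 1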