Re: [zfs-discuss] ZFS Web administration interface

2008-07-22 Thread Ross
Just came across this myself; the commands you want to enable just the web admin 
interface are:

# svccfg
svc:> select system/webconsole
svc:/system/webconsole> setprop options/tcp_listen=true
svc:/system/webconsole> quit
# svcadm restart system/webconsole
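
If you want to confirm the property took before restarting, svcprop can read it
back:

# svcprop -p options/tcp_listen system/webconsole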
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recovering corrupted root pool

2008-07-22 Thread Rainer Orth
Rainer Orth [EMAIL PROTECTED] writes:

 Yesterday evening, I tried Live Upgrade on a Sun Fire V60x running SX:CE 90
 to SX:CE 93 with ZFS root (mirrored root pool called root).  The LU itself
 ran without problems, but before rebooting the machine, I wanted to add
 some space to the root pool that had previously been in use for an UFS BE.
 
 Both disks (c0t0d0 and c0t1d0) were partitioned as follows:
 
 Part  TagFlag Cylinders SizeBlocks
   0   rootwm   1 - 18810   25.91GB(18810/0/0) 54342090
   1 unassignedwm   18811 - 246188.00GB(5808/0/0)  16779312
   2 backupwm   0 - 24618   33.91GB(24619/0/0) 71124291
   3 unassignedwu   00 (0/0/0)0
   4 unassignedwu   00 (0/0/0)0
   5 unassignedwu   00 (0/0/0)0
   6 unassignedwu   00 (0/0/0)0
   7 unassignedwu   00 (0/0/0)0
   8   bootwu   0 - 01.41MB(1/0/0) 2889
   9 unassignedwu   00 (0/0/0)0
 
 Slice 0 is used by the root pool, slice 1 was used by the UFS BE.  To
 achieve this, I ludeleted the now unused UFS BE and used 
 
 # NOINUSE_CHECK=1 format
 
 to extend slice 0 by the size of slice 1, deleting the latter afterwards.
 I'm pretty sure that I've done this successfully before, even on a live
 system, but this time something went wrong: I remember an FMA message about
 one side of the root pool mirror being broken (something about an
 inconsistent label, unfortunately I didn't write down the exact message).
 Nonetheless, I rebooted the machine after luactivate sol_nv_93 (the new ZFS
 BE), but the machine didn't come up:
 
 SunOS Release 5.11 Version snv_93 32-bit
 Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
 Use is subject to license terms.
 NOTICE:
 spa_import_rootpool: error 22
 
 
 panic[cpu0]/thread=fec1cfe0: cannot mount root path /[EMAIL 
 PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL 
 PROTECTED]/pci8086,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0:a /[EMAIL 
 PROTECTED],0/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL 
 PROTECTED]/pci8086,[EMAIL PROTECTED],1/[EMAIL PROTECTED],0:a
 
 fec351ac genunix:rootconf+10b (c0f040, 1, fec1c750)
 fec351d0 genunix:vfs_mountroot+54 (fe800010, fec30fd8,)
 fec351e4 genunix:main+b4 ()
 
 panic: entering debugger (no dump device, continue to reboot)
 skipping system dump - no dump device configured
 rebooting...
 
 I've managed a failsafe boot (from the same pool), and zpool import reveals
 
   pool: root
 id: 14475053522795106129
  state: UNAVAIL
 status: The pool was last accessed by another system.
 action: The pool cannot be imported due to damaged devices or data.
see: http://www.sun.com/msg/ZFS-8000-EY
 config:
 
 root  UNAVAIL  insufficient replicas
   mirror  UNAVAIL  corrupted data
 c0t1d0s0  ONLINE
 c0t0d0s0  ONLINE
 
 Even restoring slice 1 on both disks to its old size and shrinking slice 0
 accordingly doesn't help.  I'm sure I've done this correctly since I could
 boot from the old sol_nv_b90_ufs BE, which was still on c0t0d0s1.
 
 I didn't have much success finding out what's going on here: I tried
 removing either of the disks in case both sides of the mirror are
 inconsistent, but to no avail.  I didn't have much luck with zdb either.
 Here's the output of zdb -l /dev/rdsk/c0t0d0s0 and /dev/rdsk/c0t1d0s0:
 
 c0t0d0s0:
 
 
 LABEL 0
 
 version=10
 name='root'
 state=0
 txg=14643945
 pool_guid=14475053522795106129
 hostid=336880771
 hostname='erebus'
 top_guid=17627503873514720747
 guid=6121143629633742955
 vdev_tree
 type='mirror'
 id=0
 guid=17627503873514720747
 whole_disk=0
 metaslab_array=13
 metaslab_shift=28
 ashift=9
 asize=36409180160
 is_log=0
 children[0]
 type='disk'
 id=0
 guid=1526746004928780410
 path='/dev/dsk/c0t1d0s0'
 devid='id1,[EMAIL PROTECTED]/a'
 phys_path='/[EMAIL PROTECTED],0/pci8086,[EMAIL 
 PROTECTED]/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],1/[EMAIL 
 PROTECTED],0:a'
 whole_disk=0
 DTL=160
 children[1]
 type='disk'
 id=1
 guid=6121143629633742955
 path='/dev/dsk/c0t0d0s0'
 devid='id1,[EMAIL PROTECTED]/a'
 phys_path='/[EMAIL PROTECTED],0/pci8086,[EMAIL 
 PROTECTED]/pci8086,[EMAIL PROTECTED]/pci8086,[EMAIL PROTECTED],1/[EMAIL 
 PROTECTED],0:a'
 whole_disk=0

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Rob Clark
 Hi All 
Is there any hope for deduplication on ZFS ? 
Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems
 Email [EMAIL PROTECTED]

There is always hope.

Seriously though, looking at 
http://en.wikipedia.org/wiki/Comparison_of_revision_control_software there are 
a lot of choices for how we could implement this.

SVN/K, Mercurial and Sun Teamware all come to mind. Simply ;) merge one of 
those with ZFS. 

It _could_ be as simple (with SVN as an example) as using directory listings to 
produce files which were then 'diffed'. You could then view the diffs as though 
they were changes made to lines of source code.
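
A minimal illustration of the directory-listing idea (paths here are only 
examples):

# ls -lR /tank/docs > /var/tmp/listing.1
  (later, after some files have changed)
# ls -lR /tank/docs > /var/tmp/listing.2
# diff /var/tmp/listing.1 /var/tmp/listing.2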

Just add a tree subroutine to allow you to grab all the diffs that referenced 
changes to file 'xyz' and you would have easy access to all the changes of a 
particular file (or directory).

With a speed-optimized ability to use ZFS snapshots with the tree 
subroutine to roll back a single file (or directory), you could undo / redo your 
way through the filesystem.

Using an LKCD (http://www.faqs.org/docs/Linux-HOWTO/Linux-Crash-HOWTO.html) you 
could sit out on the play and watch from the sidelines -- returning to the OS 
when you thought you were 'safe' (and if not, jumping back out).

Thus, Mertol, it is possible (and could work very well).

Rob
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sparc boot Bad magic number in disk label

2008-07-22 Thread Joe Stone
Hello Cindy,

That did the trick. Thank you for your quick assessment and solution.

Joe
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Formatting Problem of ZFS Adm Guide (pdf)

2008-07-22 Thread Bob Friesenhahn
On Mon, 21 Jul 2008, W. Wayne Liauh wrote:

 Perhaps some considerations should be given to create those 
 documents with OpenOffice.org or StarOffice/StarSuite.

I would encourage Sun to continue using the system which has already 
been working for so many years so that it can focus on creating more 
good documentation with less cut-and-paste and more clarity.

The complaints seem to be about flaws in the viewer programs rather 
than in the actual PDF.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Wade . Stuart
[EMAIL PROTECTED] wrote on 07/22/2008 08:05:01 AM:

  Hi All
 Is there any hope for deduplication on ZFS ?
 Mertol Ozyoney
 Storage Practice - Sales Manager
 Sun Microsystems
  Email [EMAIL PROTECTED]

 There is always hope.

 Seriously thought, looking at http://en.wikipedia.
 org/wiki/Comparison_of_revision_control_software there are a lot of
 choices of how we could implement this.

 SVN/K , Mercurial and Sun Teamware all come to mind. Simply ;) merge
 one of those with ZFS.

 It _could_ be as simple (with SVN as an example) of using directory
 listings to produce files which were then 'diffed'. You could then
 view the diffs as though they were changes made to lines of source code.

 Just add a tree subroutine to allow you to grab all the diffs that
 referenced changes to file 'xyz' and you would have easy access to
 all the changes of a particular file (or directory).

 With the speed optimized ability added to use ZFS snapshots with the
 tree subroutine to rollback a single file (or directory) you could
 undo / redo your way through the filesystem.



dedup is not revision control,  you seem to completely misunderstand the
problem.



 Using a LKCD (http://www.faqs.org/docs/Linux-HOWTO/Linux-Crash-HOWTO.html
 ) you could sit out on the play and watch from the sidelines --
 returning to the OS when you thought you were 'safe' (and if not,
 jumping backout).


Now it seems you have veered even further off course.  What are you
implying the LKCD has to do with zfs, solaris, dedup, let alone revision
control software?

-Wade

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] L2ARC is Solaris 10?

2008-07-22 Thread Jeff Taylor
When will L2ARC be available in Solaris 10?

Thanks,
Jeff

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Chris Cosby
To do dedup properly, it seems like there would have to be some overly
complicated methodology for a sort of delayed dedup of the data. For speed,
you'd want your writes to go straight into the cache and get flushed out as
quickly as possible, keeping everything as ACID as possible. Then, a dedup
scrubber would take what was written, do the voodoo magic of checksumming
the new data, scanning the tree to see if there are any matches, locking the
duplicates, running the usage counters up or down for that block of data,
swapping out inodes, and marking the duplicate data as free space. It's a
lofty goal, but one that is doable. I guess this is only necessary if
deduplication is done at the file level. If done at the block level, it
could possibly be done on the fly, what with the already implemented
checksumming at the block level, but then your reads will suffer because
pieces of files can potentially be spread all over hell and half of Georgia
on the zdevs. Deduplication is going to require the judicious application of
hallucinogens and man hours. I expect that someone is up to the task.

On Tue, Jul 22, 2008 at 10:39 AM, [EMAIL PROTECTED] wrote:

 [EMAIL PROTECTED] wrote on 07/22/2008 08:05:01 AM:

   Hi All
  Is there any hope for deduplication on ZFS ?
  Mertol Ozyoney
  Storage Practice - Sales Manager
  Sun Microsystems
   Email [EMAIL PROTECTED]
 
  There is always hope.
 
  Seriously thought, looking at http://en.wikipedia.
  org/wiki/Comparison_of_revision_control_software there are a lot of
  choices of how we could implement this.
 
  SVN/K , Mercurial and Sun Teamware all come to mind. Simply ;) merge
  one of those with ZFS.
 
  It _could_ be as simple (with SVN as an example) of using directory
  listings to produce files which were then 'diffed'. You could then
  view the diffs as though they were changes made to lines of source code.
 
  Just add a tree subroutine to allow you to grab all the diffs that
  referenced changes to file 'xyz' and you would have easy access to
  all the changes of a particular file (or directory).
 
  With the speed optimized ability added to use ZFS snapshots with the
  tree subroutine to rollback a single file (or directory) you could
  undo / redo your way through the filesystem.
 


 dedup is not revision control,  you seem to completely misunderstand the
 problem.



  Using a LKCD (
 http://www.faqs.org/docs/Linux-HOWTO/Linux-Crash-HOWTO.html
  ) you could sit out on the play and watch from the sidelines --
  returning to the OS when you thought you were 'safe' (and if not,
  jumping backout).
 

 Now it seems you have veered even further off course.  What are you
 implying the LKCD has to do with zfs, solaris, dedup, let alone revision
 control software?

 -Wade

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Wade . Stuart
[EMAIL PROTECTED] wrote on 07/22/2008 09:58:53 AM:

 To do dedup properly, it seems like there would have to be some
 overly complicated methodology for a sort of delayed dedup of the
 data. For speed, you'd want your writes to go straight into the
 cache and get flushed out as quickly as possibly, keep everything as
 ACID as possible. Then, a dedup scrubber would take what was
 written, do the voodoo magic of checksumming the new data, scanning
 the tree to see if there are any matches, locking the duplicates,
 run the usage counters up or down for that block of data, swapping
 out inodes, and marking the duplicate data as free space.
I agree, but what you are describing is file-based dedup. ZFS already has
the groundwork for dedup in the system (block-level checksumming and
pointers).
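
(For what it's worth, the per-dataset checksum property is the existing knob
here -- e.g., with an example pool name:

# zfs get checksum tank
# zfs set checksum=sha256 tank

and a strong hash like sha256 is presumably what dedup would want to key on.)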

 It's a
 lofty goal, but one that is doable. I guess this is only necessary
 if deduplication is done at the file level. If done at the block
 level, it could possibly be done on the fly, what with the already
 implemented checksumming at the block level,

exactly -- that is why it is attractive for ZFS,  so much of the groundwork
is done and needed for the fs/pool already.

 but then your reads
 will suffer because pieces of files can potentially be spread all
 over hell and half of Georgia on the zdevs.

I don't know that you can make this statement without some study of an
actual implementation on real world data -- and then because it is block
based,  you should see varying degrees of this dedup-flack-frag depending
on data/usage.

For instance, I would imagine that in many scenarios much of the dedup
data blocks would belong to the same or very similar files. In this case
the blocks were written as best they could on the first write,  the deduped
blocks would point to a pretty sequential line of blocks.  Now on some files
there may be duplicate header or similar portions of data -- these may
cause you to jump around the disk; but I do not know how much this would be
hit or impact real world usage.


 Deduplication is going
 to require the judicious application of hallucinogens and man hours.
 I expect that someone is up to the task.

I would prefer the coder(s) not be seeing pink elephants while writing
this,  but yes it can and will be done.  It (I believe) will be easier
after the grow/shrink/evac code paths are in place though. Also,  the
grow/shrink/evac path allows (if it is done right) for other cool things
like a base to build a roaming defrag that takes into account snaps,
clones, live and the like.  I know that some feel that the grow/shrink/evac
code is more important for home users,  but I think that it is super
important for most of these additional features.

-Wade

 On Tue, Jul 22, 2008 at 10:39 AM, [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] wrote on 07/22/2008 08:05:01 AM:

   Hi All
  Is there any hope for deduplication on ZFS ?
  Mertol Ozyoney
  Storage Practice - Sales Manager
  Sun Microsystems
   Email [EMAIL PROTECTED]
 
  There is always hope.
 
  Seriously thought, looking at http://en.wikipedia.
  org/wiki/Comparison_of_revision_control_software there are a lot of
  choices of how we could implement this.
 
  SVN/K , Mercurial and Sun Teamware all come to mind. Simply ;) merge
  one of those with ZFS.
 
  It _could_ be as simple (with SVN as an example) of using directory
  listings to produce files which were then 'diffed'. You could then
  view the diffs as though they were changes made to lines of source
code.
 
  Just add a tree subroutine to allow you to grab all the diffs that
  referenced changes to file 'xyz' and you would have easy access to
  all the changes of a particular file (or directory).
 
  With the speed optimized ability added to use ZFS snapshots with the
  tree subroutine to rollback a single file (or directory) you could
  undo / redo your way through the filesystem.
 


 dedup is not revision control,  you seem to completely misunderstand the
 problem.



  Using a LKCD
(http://www.faqs.org/docs/Linux-HOWTO/Linux-Crash-HOWTO.html
  ) you could sit out on the play and watch from the sidelines --
  returning to the OS when you thought you were 'safe' (and if not,
  jumping backout).
 

 Now it seems you have veered even further off course.  What are you
 implying the LKCD has to do with zfs, solaris, dedup, let alone revision
 control software?

 -Wade

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



 --
 chris -at- microcozm -dot- net
 === Si Hoc Legere Scis Nimium Eruditionis Habes
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Chris Cosby
On Tue, Jul 22, 2008 at 11:19 AM, [EMAIL PROTECTED] wrote:

 [EMAIL PROTECTED] wrote on 07/22/2008 09:58:53 AM:

  To do dedup properly, it seems like there would have to be some
  overly complicated methodology for a sort of delayed dedup of the
  data. For speed, you'd want your writes to go straight into the
  cache and get flushed out as quickly as possibly, keep everything as
  ACID as possible. Then, a dedup scrubber would take what was
  written, do the voodoo magic of checksumming the new data, scanning
  the tree to see if there are any matches, locking the duplicates,
  run the usage counters up or down for that block of data, swapping
  out inodes, and marking the duplicate data as free space.
 I agree,  but what you are describing is file based dedup,  ZFS already has
 the groundwork for dedup in the system (block level checksuming and
 pointers).

  It's a
  lofty goal, but one that is doable. I guess this is only necessary
  if deduplication is done at the file level. If done at the block
  level, it could possibly be done on the fly, what with the already
  implemented checksumming at the block level,

 exactly -- that is why it is attractive for ZFS,  so much of the groundwork
 is done and needed for the fs/pool already.

  but then your reads
  will suffer because pieces of files can potentially be spread all
  over hell and half of Georgia on the zdevs.

 I don't know that you can make this statement without some study of an
 actual implementation on real world data -- and then because it is block
 based,  you should see varying degrees of this dedup-flack-frag depending
 on data/usage.

It's just a NonScientificWAG. I agree that most of the duplicated blocks
will in most cases be part of identical files anyway, and thus lined up
exactly as you'd want them. I was just free thinking and typing.




 For instance,  I would imagine that in many scenarios much od the dedup
 data blocks would belong to the same or very similar files. In this case
 the blocks were written as best they could on the first write,  the deduped
 blocks would point to a pretty sequential line o blocks.  Now on some files
 there may be duplicate header or similar portions of data -- these may
 cause you to jump around the disk; but I do not know how much this would be
 hit or impact real world usage.


  Deduplication is going
  to require the judicious application of hallucinogens and man hours.
  I expect that someone is up to the task.

 I would prefer the coder(s) not be seeing pink elephants while writing
 this,  but yes it can and will be done.  It (I believe) will be easier
 after the grow/shrink/evac code paths are in place though. Also,  the
 grow/shrink/evac path allows (if it is done right) for other cool things
 like a base to build a roaming defrag that takes into account snaps,
 clones, live and the like.  I know that some feel that the grow/shrink/evac
 code is more important for home users,  but I think that it is super
 important for most of these additional features.

The elephants are just there to keep the coders company. There are tons of
benefits for dedup, both for home and non-home users. I'm happy that it's
going to be done. I expect the first complaints will come from those people
who don't understand it, and their df and du numbers look different than
their zpool status ones. Perhaps df/du will just have to be faked out for
those folks, or we just apply the same hallucinogens to them instead.




 -Wade

  On Tue, Jul 22, 2008 at 10:39 AM, [EMAIL PROTECTED] wrote:
  [EMAIL PROTECTED] wrote on 07/22/2008 08:05:01 AM:
 
Hi All
   Is there any hope for deduplication on ZFS ?
   Mertol Ozyoney
   Storage Practice - Sales Manager
   Sun Microsystems
Email [EMAIL PROTECTED]
  
   There is always hope.
  
   Seriously thought, looking at http://en.wikipedia.
   org/wiki/Comparison_of_revision_control_software there are a lot of
   choices of how we could implement this.
  
   SVN/K , Mercurial and Sun Teamware all come to mind. Simply ;) merge
   one of those with ZFS.
  
   It _could_ be as simple (with SVN as an example) of using directory
   listings to produce files which were then 'diffed'. You could then
   view the diffs as though they were changes made to lines of source
 code.
  
   Just add a tree subroutine to allow you to grab all the diffs that
   referenced changes to file 'xyz' and you would have easy access to
   all the changes of a particular file (or directory).
  
   With the speed optimized ability added to use ZFS snapshots with the
   tree subroutine to rollback a single file (or directory) you could
   undo / redo your way through the filesystem.
  
 

  dedup is not revision control,  you seem to completely misunderstand the
  problem.
 
 
 
   Using a LKCD
 (http://www.faqs.org/docs/Linux-HOWTO/Linux-Crash-HOWTO.html
   ) you could sit out on the play and watch from the sidelines --
   returning to the OS when you thought you were 'safe' (and if not,
   jumping 

Re: [zfs-discuss] ? SX:CE snv_91 - ZFS - raid and mirror - drive

2008-07-22 Thread Rob Clark
 Though possible, I don't think we would classify it as a best practice.
  -- richard

Looking at http://opensolaris.org/os/community/volume_manager/ I see:
Supports RAID-0, RAID-1, RAID-5, Root mirroring and Seamless upgrades and 
live upgrades (that would go nicely with my ZFS root mirror - right).

I also don't see that there is a nice GUI for those that desire one ...

Looking at http://evms.sourceforge.net/gui_screen/ I see some great screenshots 
and page http://evms.sourceforge.net/ says it supports: Ext2/3, JFS, ReiserFS, 
XFS, Swap, OCFS2, NTFS, FAT -- so it might be better to suggest adding ZFS 
there instead of focusing on non-ZFS solutions in this ZFS discussion group.

Rob
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Erik Trimble
Chris Cosby wrote:


 On Tue, Jul 22, 2008 at 11:19 AM, [EMAIL PROTECTED] 
 mailto:[EMAIL PROTECTED] wrote:

 [EMAIL PROTECTED]
 mailto:[EMAIL PROTECTED] wrote on 07/22/2008
 09:58:53 AM:

  To do dedup properly, it seems like there would have to be some
  overly complicated methodology for a sort of delayed dedup of the
  data. For speed, you'd want your writes to go straight into the
  cache and get flushed out as quickly as possibly, keep everything as
  ACID as possible. Then, a dedup scrubber would take what was
  written, do the voodoo magic of checksumming the new data, scanning
  the tree to see if there are any matches, locking the duplicates,
  run the usage counters up or down for that block of data, swapping
  out inodes, and marking the duplicate data as free space.
 I agree,  but what you are describing is file based dedup,  ZFS
 already has
 the groundwork for dedup in the system (block level checksuming and
 pointers).

  It's a
  lofty goal, but one that is doable. I guess this is only necessary
  if deduplication is done at the file level. If done at the block
  level, it could possibly be done on the fly, what with the already
  implemented checksumming at the block level,

 exactly -- that is why it is attractive for ZFS,  so much of the
 groundwork
 is done and needed for the fs/pool already.

  but then your reads
  will suffer because pieces of files can potentially be spread all
  over hell and half of Georgia on the zdevs.

 I don't know that you can make this statement without some study of an
 actual implementation on real world data -- and then because it is
 block
 based,  you should see varying degrees of this dedup-flack-frag
 depending
 on data/usage.

 It's just a NonScientificWAG. I agree that most of the duplicated 
 blocks will in most cases be part of identical files anyway, and thus 
 lined up exactly as you'd want them. I was just free thinking and typing.
  
No, you are right to be concerned over block-level dedup seriously 
impacting seeks.  The problem is that, given many common storage 
scenarios, you will have not just similar files, but multiple common 
sections of many files.  Things such as the various standard 
productivity app documents will not just have the same header sections, 
but internally, there will be significant duplications of considerable 
length with other documents from the same application.  Your 5MB Word 
file is thus likely to share several (actually, many) multi-kB segments 
with other Word files.  You will thus end up seeking all over the disk 
to read _most_ Word files.  Which really sucks.  I can list at least a 
couple more common scenarios where dedup has the potential to save at 
least some reasonable amount of space, yet will absolutely kill performance.


 For instance,  I would imagine that in many scenarios much od the
 dedup
 data blocks would belong to the same or very similar files. In
 this case
 the blocks were written as best they could on the first write,
  the deduped
 blocks would point to a pretty sequential line o blocks.  Now on
 some files
 there may be duplicate header or similar portions of data -- these may
 cause you to jump around the disk; but I do not know how much this
 would be
 hit or impact real world usage.


  Deduplication is going
  to require the judicious application of hallucinogens and man hours.
  I expect that someone is up to the task.

 I would prefer the coder(s) not be seeing pink elephants while
 writing
 this,  but yes it can and will be done.  It (I believe) will be easier
 after the grow/shrink/evac code paths are in place though. Also,  the
 grow/shrink/evac path allows (if it is done right) for other cool
 things
 like a base to build a roaming defrag that takes into account snaps,
 clones, live and the like.  I know that some feel that the
 grow/shrink/evac
 code is more important for home users,  but I think that it is super
 important for most of these additional features.

 The elephants are just there to keep the coders company. There are 
 tons of benefits for dedup, both for home and non-home users. I'm 
 happy that it's going to be done. I expect the first complaints will 
 come from those people who don't understand it, and their df and du 
 numbers look different than their zpool status ones. Perhaps df/du 
 will just have to be faked out for those folks, or we just apply the 
 same hallucinogens to them instead.

I'm still not convinced that dedup is really worth it for anything but 
very limited, constrained usage. Disk is just so cheap, that you 
_really_ have to have an enormous amount of dup before the performance 
penalties of dedup are countered.

This in many ways reminds me of last year's discussion over file 
versioning in the 

[zfs-discuss] Cannot attach mirror to SPARC zfs root pool

2008-07-22 Thread Rainer Orth
I just wanted to attach a second mirror to a ZFS root pool on an Ultra
1/170E running snv_93.

I've followed the workarounds for CR 6680633 and 6680633 from the ZFS Admin
Guide, but booting from the newly attached mirror fails like so:

Boot device: disk  File and args: 

Can't mount root
Fast Data Access MMU Miss

while the original side of the mirror works just fine.
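
For completeness, the attach step itself was just the usual zpool attach
(pool and device names here are only placeholders):

# zpool attach root c0t0d0s0 c0t1d0s0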

Any advice on what could be wrong here?

Rainer

-
Rainer Orth, Faculty of Technology, Bielefeld University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Moving ZFS root pool to different system breaks boot

2008-07-22 Thread Rainer Orth
Recently, I needed to move the boot disks containing a ZFS root pool in an
Ultra 1/170E running snv_93 to a different system (same hardware) because
the original system was broken/unreliable.

To my dismay, unlike with UFS, the new machine wouldn't boot:

WARNING: pool 'root' could not be loaded as it was last accessed by another 
system (host:  hostid: 0x808f7fd8).  See: http://www.sun.com/msg/ZFS-8000-EY

panic[cpu0]/thread=180e000: BAD TRAP: type=31 rp=180acc0 addr=0 mmu_fsr=0 
occurred in module unix due to a NULL pointer dereference

: trap type = 0x31
pid=0, pc=0x1046de4, sp=0x180a561, tstate=0x4480001602, context=0x0
g1-g7: 0, 180b1c8, 3314f80, 306, 18c2c00, 10, 180e000

0180a9e0 unix:die+74 (10c1400, 180acc0, 0, 0, 10, 180aaa0)
  %l0-3:    0100
  %l4-7: 2000 010c1848 010c1800 1a7e
0180aac0 unix:trap+9d8 (180acc0, 0, 31, 1c00, 0, 5)
  %l0-3: c168 e000  01835bc0
  %l4-7: 0001 0001 01162800 0002
0180ac10 unix:ktl0+48 (0, 180e000, 180afe8, 3314b00, 2f, 
3314b00)
  %l0-3: 0003 1400 004480001602 0101b5b0
  %l4-7: 000c 0003  0180acc0
0180ad60 genunix:lookuppnat+90 (0, 0, 1, 180afe0, 180b1c8, 0)
  %l0-3: 0118   cc23
  %l4-7: 003f 000a 01835bc0 0180afe8
0180ae20 genunix:vn_createat+11c (7fff, 1, 180b1d0, 80, 80, 1)
  %l0-3:    
  %l4-7: 2102 0001 fdff 
0180b020 genunix:vn_openat+164 (180b420, 2102, 2302, 200, 1, 100)
  %l0-3: 0001   
  %l4-7:  0080 0300 0180b1c8
0180b270 genunix:vn_open+30 (, 1, 2302, 1a4, 0, 0)
  %l0-3: 0004 01877c00 03309418 03309418
  %l4-7: 000c 0003 03c114a8 062c
0180b340 zfs:spa_config_write+c8 (3c13928, 3c138a8, 0, 1, 
3c137e8, 18d6c00)
  %l0-3: 03045870 03d19b10 030458c0 0134fc00
  %l4-7: 0018 0005 2000 0134fc00
0180b4b0 zfs:spa_config_sync+104 (33111c0, 0, 1, 33111e8, 0, 5)
  %l0-3: 03311660 033111c0 03c13928 0135
  %l4-7: 0134fc00 0135  
0180b570 zfs:spa_import_common+470 (0, 134e400, 0, 1, 0, 33111c0)
  %l0-3: 01346d58  0001 
  %l4-7: 0001 01346c00 0134e400 012c
0180b650 zfs:spa_import_rootpool+74 (183b3d0, 183b000, 134f000, 8, 
134bc00, 134f000)
  %l0-3: 0180e000 01873000 0036 01815000
  %l4-7: 0035 0002 01843ab8 4885f531
0180b720 zfs:zfs_mountroot+54 (1899f28, 0, 18c2800, 708, 33b09f0, 
13380cc)
  %l0-3: 01815400 018c8000 012ba000 011e8000
  %l4-7: 018c3400 012ba000 011e8000 018bbc00
0180b7e0 swapgeneric:rootconf+1ac (12dc400, 0, 1873400, 1873bb0, 
18c35f0, 18bfc00)
  %l0-3: 01873400  018c0800 0304c6f0
  %l4-7: 018c2c00 012dc400 01873400 0001
0180b890 unix:stubs_common_code+70 (30f1000, 0, 4, 304c520, 
30f1000, 1877f18)
  %l0-3: 0180b149 0180b211  0001
  %l4-7:  01818ab8  01142800
0180b950 genunix:vfs_mountroot+5c (600, 200, 800, 200, 1873400, 189a000)
  %l0-3: 0001d524 0064 0001d4c0 1d4c
  %l4-7: 05dc 1770 0640 018c5800
0180ba10 genunix:main+b4 (1815000, 180c000, 1835bc0, 18151f8, 1, 
180e000)
  %l0-3: 01836b58 70002000 010c0400 
  %l4-7: 0183ac00  0180c000 01836800

panic: entering debugger (no dump device, continue to reboot)

This seems to be the same issue as CR 6716241, which has been closed as
`Not a defect'.  I consider this completely unacceptable since this is a
serious regression compared to UFS, which has no such requirement.  There
needs to be some sort of documented workaround for situations like this.

Fortunately, the machine still had a UFS BE with snv_93, where I could
import the root pool like this:

# zpool import -f -R /mnt root

Afterwards, the machine booted as expected.  In the 

Re: [zfs-discuss] Cannot attach mirror to SPARC zfs root pool

2008-07-22 Thread Mark J Musante
On Tue, 22 Jul 2008, Rainer Orth wrote:

 I just wanted to attach a second mirror to a ZFS root pool on an Ultra 
 1/170E running snv_93.

 I've followed the workarounds for CR 6680633 and 6680633 from the ZFS 
 Admin Guide, but booting from the newly attached mirror fails like so:

I think you're running into CR 6668666.  I'd try manually running 
installboot on the new disk and see if that fixes it.
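
For reference, the ZFS Admin Guide gives the SPARC invocation along these lines
(device name is just an example):

# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t1d0s0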


Regards,
markm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cannot attach mirror to SPARC zfs root pool

2008-07-22 Thread Rainer Orth
Mark J Musante writes:

 On Tue, 22 Jul 2008, Rainer Orth wrote:
 
  I just wanted to attach a second mirror to a ZFS root pool on an Ultra 
  1/170E running snv_93.
 
  I've followed the workarounds for CR 6680633 and 6680633 from the ZFS 
  Admin Guide, but booting from the newly attached mirror fails like so:
 
 I think you're running into CR 6668666.  I'd try manually running 

oops, cut-and-paste error on my part: 6668666 was one of the two CRs
mentioned in the zfs admin guide which I worked around.

 installboot on the new disk and see if that fixes it.

Unfortunately, it didn't.  Reconsidering now, I see that I ran installboot
against slice 0 (reduced by 1 sector as required by CR 6680633) instead of
slice 2 (whole disk).  Doing so doesn't fix the problem either, though.

Regards.
Rainer

-
Rainer Orth, Faculty of Technology, Bielefeld University

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Richard Elling
FWIW,
Sun's VTL products use ZFS and offer de-duplication services.
http://www.sun.com/aboutsun/pr/2008-04/sunflash.20080407.2.xml
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Wade . Stuart
[EMAIL PROTECTED] wrote on 07/22/2008 11:48:30 AM:

 Chris Cosby wrote:
 
 
  On Tue, Jul 22, 2008 at 11:19 AM, [EMAIL PROTECTED]
  mailto:[EMAIL PROTECTED] wrote:
 
  [EMAIL PROTECTED]
  mailto:[EMAIL PROTECTED] wrote on 07/22/2008
  09:58:53 AM:
 
   To do dedup properly, it seems like there would have to be some
   overly complicated methodology for a sort of delayed dedup of the
   data. For speed, you'd want your writes to go straight into the
   cache and get flushed out as quickly as possibly, keep everything
as
   ACID as possible. Then, a dedup scrubber would take what was
   written, do the voodoo magic of checksumming the new data,
scanning
   the tree to see if there are any matches, locking the duplicates,
   run the usage counters up or down for that block of data,
swapping
   out inodes, and marking the duplicate data as free space.
  I agree,  but what you are describing is file based dedup,  ZFS
  already has
  the groundwork for dedup in the system (block level checksuming and
  pointers).
 
   It's a
   lofty goal, but one that is doable. I guess this is only
necessary
   if deduplication is done at the file level. If done at the block
   level, it could possibly be done on the fly, what with the
already
   implemented checksumming at the block level,
 
  exactly -- that is why it is attractive for ZFS,  so much of the
  groundwork
  is done and needed for the fs/pool already.
 
   but then your reads
   will suffer because pieces of files can potentially be spread all
   over hell and half of Georgia on the zdevs.
 
  I don't know that you can make this statement without some study of
an
  actual implementation on real world data -- and then because it is
  block
  based,  you should see varying degrees of this dedup-flack-frag
  depending
  on data/usage.
 
  It's just a NonScientificWAG. I agree that most of the duplicated
  blocks will in most cases be part of identical files anyway, and thus
  lined up exactly as you'd want them. I was just free thinking and
typing.
 
 No, you are right to be concerned over block-level dedup seriously
 impacting seeks.  The problem is that, given many common storage
 scenarios, you will have not just similar files, but multiple common
 sections of many files.  Things such as the various standard
 productivity app documents will not just have the same header sections,
 but internally, there will be significant duplications of considerable
 length with other documents from the same application.  Your 5MB Word
 file is thus likely to share several (actually, many) multi-kB segments
 with other Word files.  You will thus end up seeking all over the disk
 to read _most_ Word files.  Which really sucks.  I can list at least a
 couple more common scenarios where dedup has to potential to save at
 least some reasonable amount of space, yet will absolutely kill
performance.

While you may have a point on some data sets, actual testing of this type
of data (28,000+ actual end-user doc files) using xdelta with 4k and 8k
block sizes shows that the similar blocks in these files are in the 2%
range (~6% for 4k). That means a full read of each file on average would
require < 6% seeks to other disk areas. That is not bad, but this is the
worst-case picture, as those duplicate blocks would need to live at the same
offsets and have the same block boundaries to match under the proposed
algo. To me this means Word docs are not a good candidate for dedup at the
block level -- but the actual cost to dedup anyway seems small.  Of course
you could come up with data that is pathologically bad for these
benchmarks,  but I do not believe it would be nearly as bad as you are
making it out to be on real world data.





  For instance,  I would imagine that in many scenarios much od the
  dedup
  data blocks would belong to the same or very similar files. In
  this case
  the blocks were written as best they could on the first write,
   the deduped
  blocks would point to a pretty sequential line o blocks.  Now on
  some files
  there may be duplicate header or similar portions of data -- these
may
  cause you to jump around the disk; but I do not know how much this
  would be
  hit or impact real world usage.
 
 
   Deduplication is going
   to require the judicious application of hallucinogens and man
hours.
   I expect that someone is up to the task.
 
  I would prefer the coder(s) not be seeing pink elephants while
  writing
  this,  but yes it can and will be done.  It (I believe) will be
easier
  after the grow/shrink/evac code paths are in place though. Also,
the
  grow/shrink/evac path allows (if it is done right) for other cool
  things
  like a base to build a roaming defrag that takes into account
snaps,
  clones, live and the like.  I know 

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Mike Gerdts
On Tue, Jul 22, 2008 at 11:48 AM, Erik Trimble [EMAIL PROTECTED] wrote:
 No, you are right to be concerned over block-level dedup seriously
 impacting seeks.  The problem is that, given many common storage
 scenarios, you will have not just similar files, but multiple common
 sections of many files.  Things such as the various standard
 productivity app documents will not just have the same header sections,
 but internally, there will be significant duplications of considerable
 length with other documents from the same application.  Your 5MB Word
 file is thus likely to share several (actually, many) multi-kB segments
 with other Word files.  You will thus end up seeking all over the disk
 to read _most_ Word files.  Which really sucks.  I can list at least a
 couple more common scenarios where dedup has to potential to save at
 least some reasonable amount of space, yet will absolutely kill performance.

This would actually argue in favor of dedup... If the blocks are
common they are more likely to be in the ARC with dedup, thus avoiding
a read altogether.  There would likely be greater overhead in
assembling smaller packets

Here's some real life...

I have 442 Word documents created by me and others over several years.
 Many were created from the same corporate templates.  I generated the
MD5 hash of every 8 KB of each file and came up with a total of 8409
hashes - implying 65 MB of word documents.  Taking those hashes through
sort | uniq -c | sort -n led to the following:

  3 p9I7HgbxFme7TlPZmsD6/Q
  3 sKE3RBwZt8A6uz+tAihMDA
  3 uA4PK1+SQqD+h1Nv6vJ6fQ
  3 wQoU2g7f+dxaBMzY5rVE5Q
  3 yM0csnXKtRxjpSxg1Zma0g
  3 yyokNamrTcD7lQiitcVgqA
  4 jdsZZfIHtshYZiexfX3bQw
 17 pohs0DWPFwF8HJ8p/HnFKw
 19 s0eKyh/vT1LothTvsqtZOw
 64 CCn3F0CqsauYsz6uId7hIg

Note that CCn3F0CqsauYsz6uId7hIg is the MD5 hash of 8 KB of zeros.
If compression is used as well, this block would not even be stored.
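
For anyone who wants to try this on their own files, the per-block hashing was
conceptually something like the sketch below (this is a reconstruction rather
than the exact script I ran, and the hash encoding shown above is different,
but that's cosmetic):

#!/bin/sh
# Hash every 8 KB block of the named files, then count duplicate hashes.
BS=8192
for f in "$@"; do
    i=0
    while :; do
        dd if="$f" of=/tmp/blk.$$ bs=$BS skip=$i count=1 2>/dev/null
        [ -s /tmp/blk.$$ ] || break     # empty block means end of file
        digest -a md5 /tmp/blk.$$       # Solaris digest(1); md5sum elsewhere
        i=`expr $i + 1`
    done
done | sort | uniq -c | sort -n | tail  # most-duplicated blocks last
rm -f /tmp/blk.$$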

If 512 byte blocks are used, the story is a bit different:

 81 DEf6rofNmnr1g5f7oaV75w
109 3gP+ZaZ2XKqMkTQ6zGLP/A
121 ypk+0ryBeMVRnnjYQD2ZEA
124 HcuMdyNKV7FDYcPqvb2o3Q
371 s0eKyh/vT1LothTvsqtZOw
372 ozgGMCCoc+0/RFbFDO8MsQ
   8535 v2GerAzfP2jUluqTRBN+iw

As you might guess, the most common hash is a block of zeros.

Most likely, however, these files will end up using 128K blocks for
the first part of the file, smaller for the portions that don't fit.
When I look at just 128K...

  1 znJqBX8RtPrAOV2I6b5Wew
  2 6tuJccWHGVwv3v4nee6B9w
  2 Qr//PMqqhMtuKfgKhUIWVA
  2 idX0awfYjjFmwHwi60MAxg
  2 s0eKyh/vT1LothTvsqtZOw
  3 +Q/cXnknPr/uUCARsaSIGw
  3 /kyIGuWnPH/dC5ETtMqqLw
  3 4G/QmksvChYvfhAX+rfgzg
  3 SCMoKuvPepBdQEBVrTccvA
  3 vbaNWd5IQvsGdQ9R8dIqhw

There is actually very little duplication in word files.  Many of the
dupes above are from various revisions of the same files.

 Dedup Advantages:

 (1)  save space relative to the amount of duplication.  this is highly
 dependent on workload, and ranges from 0% to 99%, but the distribution
 of possibilities isn't a bell curve (i.e. the average space saved isn't
 50%).

I have evidence that shows 75% duplicate data on (mostly sparse) zone
roots created and maintained over a 18 month period.  I show other
evidence above that it is not nearly as good for one person's copy of
word documents.  I suspect that it would be different if the file
system that I did this on was on a file server where all of my
colleagues also stored their documents (and revisions of mine that
they have reviewed).

 (2)  noticable write performance penalty (assuming block-level dedup on
 write), with potential write cache issues.

Depends on the approach taken.

 (3)  very significant post-write dedup time, at least on the order of
 'zfs scrub'. Also, during such a post-write scenario, it more or less
 takes the zpool out of usage.

The ZFS competition that has this in shipping product today does not
quiesce the file system during dedup passes.

 (4) If dedup is done at block level, not at file level, it kills read
 performance, effectively turning all dedup'd files from sequential read
 to a random read.  That is, block-level dedup drastically accelerates
 filesystem fragmentation.

Absent data that shows this, I don't accept this claim.  Arguably the
blocks that are duplicate are more likely to be in cache.  I think
that my analysis above shows that this is not a concern for my data
set.

 (5)  Something no one has talked about, but is of concern. By removing
 duplication, you increase the likelihood that loss of the master
 segment will corrupt many more files. Yes, ZFS has self-healing and
 such.  But, particularly in the case where there is no ZFS pool
 redundancy (or pool-level redundancy has been compromised), loss of one
 block can thus be many more times severe.

I believe this is true and likely a good topic for discussion.

 We need to think long and hard about what the real widespread benefits
 are of dedup 

Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Charles Soto
On 7/22/08 11:48 AM, Erik Trimble [EMAIL PROTECTED] wrote:

 I'm still not convinced that dedup is really worth it for anything but
 very limited, constrained usage. Disk is just so cheap, that you
 _really_ have to have an enormous amount of dup before the performance
 penalties of dedup are countered.

Again, I will argue that the spinning rust itself isn't expensive, but data
management is.  If I am looking to protect multiple PB (through remote data
replication and backup), I need more than just the rust to store that.  I
need to copy this data, which takes time and effort.  If the system can say
these 500K blocks are the same as these 500K, don't bother copying them to
the DR site AGAIN, then I have a less daunting data management task.
De-duplication makes a lot of sense at some layer(s) within the data
management scheme.

Charles

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Bob Friesenhahn
On Tue, 22 Jul 2008, Erik Trimble wrote:

 Dedup Disadvantages:

Obviously you do not work in the Sun marketing department which is 
interested in this feature (due to some other companies marketing it). 
Note that the topic starter post came from someone in Sun's marketing 
department.

I think that deduplication is a potential diversion which draws 
attention away from the core ZFS things which are still not ideally 
implemented or do not yet exist at all.  Compared with other 
filesystems, ZFS is still a toddler since it has only been deployed 
for a few years.  ZFS is intended to be an enterprise filesystem so 
let's give it more time to mature before hitting it with the feature 
stick.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [RFC] Improved versioned pointer algorithms

2008-07-22 Thread Akhilesh Mritunjai
 Btrfs does not suffer from this problem as far as I can see because it
 uses reference counting rather than a ZFS-style dead list.  I was just
 wondering if ZFS devs recognize the problem and are working on a solution.

Daniel,

Correct me if I'm wrong, but how does reference counting solve this problem ?

The terminology is as following:

1. Filesystem : A writable filesystem with no references or a parent.
2. Snapshot: Immutable point-in-time view of a filesystem
3. Clone: A writable filesystem whose parent is a given snapshot

Under this terminology, it is easy to see that dead-list is equivalent to 
reference counting. The problem is rather that to have a clone, you need to 
have its snapshot around, since by definition it is a child of a snapshot 
(with an exception that by using zfs promote you can make a clone a direct 
child of the filesystem, it's like turning a grand-child into a child).

So what is the terminology of btrfs?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OT: Formatting Problem of ZFS Adm Guide (pdf)

2008-07-22 Thread Akhilesh Mritunjai
I doubt it. Star/OpenOffice are word processors... and like Word they are not 
suitable for typesetting documents.

SGML, FrameMaker and TeX/LaTeX are the only ones capable of doing that.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Miles Nordin
 et == Erik Trimble [EMAIL PROTECTED] writes:

et Dedup Advantages:

et (1) save space

(2) coalesce data which is frequently used by many nodes in a large
cluster into a small nugget of common data which can fit into RAM
or L2 fast disk

(3) back up non-ZFS filesystems that don't have snapshots and clones

(4) make offsite replication easier on the WAN

but, yeah, aside from imagining ahead to possible disastrous problems
with the final implementation, the imagined use cases should probably
be carefully compared to existing large installations.

Firstly, dedup may be more tempting as a bulleted marketing feature
or a bloggable/banterable boasting point than it is valuable to real
people.

Secondly, the comparison may drive the implementation.  For example,
should dedup happen at write time and be something that doesn't happen
to data written before it's turned on, like recordsize or compression,
to make it simpler in the user interface, and avoid problems with
scrubs making pools uselessly slow?  Or should it be scrub-like so
that already-written filesystems can be thrown into the dedup bag and
slowly squeezed, or so that dedup can run slowly during the business
day over data written quickly at night (fast outside-business-hours
backup)?


pgpHArHK13e1c.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Bob Friesenhahn
On Tue, 22 Jul 2008, Miles Nordin wrote:
 scrubs making pools uselessly slow?  Or should it be scrub-like so
 that already-written filesystems can be thrown into the dedup bag and
 slowly squeezed, or so that dedup can run slowly during the business
 day over data written quickly at night (fast outside-business-hours
 backup)?

I think that the scrub-like model makes the most sense since ZFS write 
performance should not be penalized.  It is useful to implement 
score-boarding so that a block is not considered for de-duplication 
until it has been duplicated a certain number of times.  In order to 
decrease resource consumption, it is useful to perform de-duplication 
over a span of multiple days or multiple weeks doing just part of the 
job each time around. Deduping a petabyte of data seems quite 
challenging yet ZFS needs to be scalable to these levels.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Erik Trimble
Bob Friesenhahn wrote:
 On Tue, 22 Jul 2008, Erik Trimble wrote:
   
 Dedup Disadvantages:
 

 Obviously you do not work in the Sun marketing department which is 
 intrested in this feature (due to some other companies marketing it). 
 Note that the topic starter post came from someone in Sun's marketing 
 department.

 I think that dedupication is a potential diversion which draws 
 attention away from the core ZFS things which are still not ideally 
 implemented or do not yet exist at all.  Compared with other 
 filesystems, ZFS is still a toddler since it has only been deployed 
 for a few years.  ZFS is intended to be an enterprise filesystem so 
 let's give it more time to mature before hiting it with the feature 
 stick.

 Bob
 ==
 Bob Friesenhahn
 [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
   

More than anything, Bob's reply is my major feeling on this.  Dedup may 
indeed turn out to be quite useful, but honestly, there's no broad data 
which says that it is a Big Win (tm) _right_now_, compared to finishing 
other features.  I'd really want an Engineering Study about the 
real-world use (i.e. what percentage of the userbase _could_ use such a 
feature, and what percentage _would_ use it, and exactly how useful 
would each segment find it...) before bumping it up in the priority 
queue of work to be done on ZFS.

-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Rob Clark
 On Tue, 22 Jul 2008, Miles Nordin wrote:
  scrubs making pools uselessly slow?  Or should it be scrub-like so
  that already-written filesystems can be thrown into the dedup bag and
  slowly squeezed, or so that dedup can run slowly during the business
  day over data written quickly at night (fast outside-business-hours
  backup)?
 
 I think that the scrub-like model makes the most sense since ZFS write 
 performance should not be penalized.  It is useful to implement 
 score-boarding so that a block is not considered for de-duplication 
 until it has been duplicated a certain number of times.  In order to 
 decrease resource consumption, it is useful to perform de-duplication 
 over a span of multiple days or multiple weeks doing just part of the 
 job each time around. Deduping a petabyte of data seems quite 
 challenging yet ZFS needs to be scalable to these levels.
 Bob Friesenhahn

In case anyone (other than Bob) missed it, this is why I suggested File-Level 
Dedup:

... using directory listings to produce files which were then 'diffed'. You 
could then view the diffs as though they were changes made ...


We could have:
Block-Level (if we wanted to restore an exact copy of the drive - duplicating 
the 'dd' command) or 
Byte-Level (if we wanted to use compression - duplicating the 'zfs set 
compression=on rpool' _or_ 'bzip' commands) ...
etc... 
assuming we wanted to duplicate commands which already implement those 
features, and provide more than we (the filesystem) need at a very high cost 
(performance).

So I agree with your comment about the need to be mindful of resource 
consumption, the ability to do this over a period of days is also useful.

Indeed the Plan9 filesystem simply snapshots to WORM and has no delete - nor 
are they able to fill their drives faster than they can afford to buy new ones:

Venti Filesystem
http://www.cs.bell-labs.com/who/seanq/p9trace.html

Rob
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [zfs-code] Peak every 4-5 second

2008-07-22 Thread Tharindu Rukshan Bamunuarachchi

Dear Mark/All,

Our trading system is writing to a local and/or array volume at 10k 
messages per second.
Each message is about 700 bytes in size.

Before ZFS, we used UFS.
Even with UFS, there was a peak every 5 seconds due to fsflush invocation.

However, each peak is about ~5ms.
Our application cannot recover from such high latency.

So we used several tuning parameters (tune_t_* and autoup) to decrease 
the flush interval.
As a result, peaks came down to ~1.5ms, but that is still too high for our 
application.

I believe that if we could reduce the ZFS sync interval down to ~1s, the peaks 
would be reduced to ~1ms or less.
We would rather have a 1ms peak every second than a 5ms peak every 5 seconds :-)

Is there any tunable so I can reduce the ZFS sync interval?
If there is no tunable, could I not use mdb for the job ...?
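
(What I had in mind is something along these lines, assuming the transaction
group timer is still the txg_time variable in this build -- please correct me
if that is the wrong knob:

# echo 'txg_time/W 0t1' | mdb -kw

i.e. forcing a sync roughly every second.)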

This is not a general-purpose workload and we are OK with an increased I/O rate.
Please advice/help.

Thanks in advance.
tharindu


Mark Maybee wrote:
 ZFS is designed to sync a transaction group about every 5 seconds
 under normal work loads.  So your system looks to be operating as
 designed.  Is there some specific reason why you need to reduce this
 interval?  In general, this is a bad idea, as there is somewhat of a
 fixed overhead associated with each sync, so increasing the sync
 frequency could result in increased IO.

 -Mark

 Tharindu Rukshan Bamunuarachchi wrote:
 Dear ZFS Gurus,

 We are developing low latency transaction processing systems for 
 stock exchanges.
 Low latency high performance file system is critical component of our 
 trading systems.

 We have choose ZFS as our primary file system.
 But we saw periodical disk write peaks every 4-5 second.

 Please refer to the first column of the output below (marked in bold).
 The output is generated by our own disk performance measuring tool, DTool 
 (please find attachment).

 Compared to UFS/VxFS, ZFS is performing very well, but we could not 
 minimize the periodic peaks.
 We used the autoup and tune_t_fsflushr flags for UFS tuning.

 Is there any ZFS-specific tuning which will reduce the file system 
 flush interval of ZFS?

 I have tried all parameters specified in solarisinternals and 
 google.com.
 I would like to go for ZFS code change/recompile if necessary.

 Please advice.

 Cheers
 Tharindu



 cpu4600-100 /tantan ./DTool -f M -s 1000 -r 1 -i 1 -W
 System Tick = 100 usecs
 Clock resolution 10
 HR Timer created for 100usecs
 z_FileName = M
 i_Rate = 1
 l_BlockSize = 1000
 i_SyncInterval = 0
 l_TickInterval = 100
 i_TicksPerIO = 1
 i_NumOfIOsPerSlot = 1
 Max (us)| Min (us)  | Avg (us)  | MB/S  | 
 File  Freq Distribution
   336   |  4|  10.5635  |  4.7688   |  M   
 50(98.55), 200(1.09), 500(0.36), 2000(0.00), 5000(0.00), 1(0.00), 
 10(0.00), 20(0.00),
   *1911 * |  4|  10.3152  |  9.4822   |  M   
 50(98.90), 200(0.77), 500(0.32), 2000(0.01), 5000(0.00), 1(0.00), 
 10(0.00), 20(0.00),
   307   |  4|  9.9386   |  9.5324   |  M   
 50(99.03), 200(0.66), 500(0.31), 2000(0.00), 5000(0.00), 1(0.00), 
 10(0.00), 20(0.00),
   331   |  4|  9.9465   |  9.5332   |  M   
 50(99.04), 200(0.72), 500(0.24), 2000(0.00), 5000(0.00), 1(0.00), 
 10(0.00), 20(0.00),
   318   |  4|  10.1241  |  9.5309   |  M   
 50(99.07), 200(0.66), 500(0.27), 2000(0.00), 5000(0.00), 1(0.00), 
 10(0.00), 20(0.00),
   303   |  4|  9.9236   |  9.5296   |  M   
 50(99.13), 200(0.59), 500(0.28), 2000(0.00), 5000(0.00), 1(0.00), 
 10(0.00), 20(0.00),
   560   |  4|  10.2604  |  9.4565   |  M   
 50(98.82), 200(0.86), 500(0.31), 2000(0.01), 5000(0.00), 1(0.00), 
 10(0.00), 20(0.00),
   376   |  4|  9.9975   |  9.5176   |  M   
 50(99.05), 200(0.63), 500(0.32), 2000(0.00), 5000(0.00), 1(0.00), 
 10(0.00), 20(0.00),
   *9783 * |  4|  10.8216  |  9.5301   |  M   
 50(99.05), 200(0.58), 500(0.36), 2000(0.00), 5000(0.00), 1(0.01), 
 10(0.00), 20(0.00),
   332   |  4|  9.9345   |  9.5252   |  M   
 50(99.06), 200(0.61), 500(0.33), 2000(0.00), 5000(0.00), 1(0.00), 
 10(0.00), 20(0.00),
   355   |  4|  9.9906   |  9.5315   |  M   
 50(99.01), 200(0.69), 500(0.30), 2000(0.00), 5000(0.00), 1(0.00), 
 10(0.00), 20(0.00),
   356   |  4|  10.2341  |  9.5207   |  M   
 50(98.96), 200(0.76), 500(0.28), 2000(0.00), 5000(0.00), 1(0.00), 
 10(0.00), 20(0.00),
   320   |  4|  9.8893   |  9.5279   |  M   
 50(99.10), 200(0.59), 500(0.31), 2000(0.00), 5000(0.00), 1(0.00), 
 10(0.00), 20(0.00),
   *10005* |  4|  10.8956  |  9.5258   |  M   
 50(99.07), 200(0.63), 500(0.29), 2000(0.00), 5000(0.00), 1(0.00), 
 10(0.01),