Re: [zfs-discuss] Re: RAIDZ2 vs. ZFS RAID-10

2007-01-04 Thread Casper . Dik

 Is there some reason why a small read on a raidz2 is not statistically very 
 likely to require I/O on only one device? Assuming a non-degraded pool of 
 course.

ZFS stores its checksums for RAIDZ/RAIDZ2 in such a way that all disks must
be read to compute and verify the checksum.


But why do ZFS reads require the computation of the RAIDZ checksum?

If the block checksum is fine, then you need not care about the
parity.

Casper


Re: [zfs-discuss] Re: RAIDZ2 vs. ZFS RAID-10

2007-01-04 Thread Robert Milkowski
Hello Anton,

Thursday, January 4, 2007, 3:46:48 AM, you wrote:

 Is there some reason why a small read on a raidz2 is not statistically very 
 likely to require I/O on only one device? Assuming a non-degraded pool of 
 course.

ABR ZFS stores its checksums for RAIDZ/RAIDZ2 in such a way that all
ABR disks must be read to compute and verify the checksum.

It's not about the checksum but about how a filesystem block is stored in
the raid-z[12] case - it is spread across all non-parity disks, so in order
to read one filesystem block you have to read from all disks except the
parity disks.

-- 
Best regards,
 Robert                        mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] zfs clones

2007-01-04 Thread Darren J Moffat

Matthew Ahrens wrote:
now wouldn't it be a more natural usage model when I intend to create a
clone, that by default the zfs clone command creates the needed snapshot
from the current image internally as part of taking the clone, unless I
explicitly specify that I want to take a clone of a specific snapshot?


While that might be convenient, hiding what's really going on can result 
in confusion.


So really, administrators need to be aware of the underlying snapshot.
Therefore, providing a "clone a fs by taking a snapshot under the covers"
feature would serve only as syntactic sugar.  I believe that the potential
for confusion outweighs this benefit.


I agree with both of you.

I agree with Frank that there should be the ability to run a single ZFS
command to create a clone without the admin needing to manually create a
snapshot first.  This isn't a reduction from two commands to one; in
scripting it is more like three or four commands, because you would need to
calculate a snapshot name, take the snapshot with that name, and then use it
to clone.


I agree completely with Matt that we should never hide the fact from the 
admin that for a clone you need a snapshot to first exist.


What I would like to see is that zfs clone, when given an additional
argument, creates the snapshot and the clone for you.  This means we need
a way to name the snapshots to ensure they will never clash; I'd suggest
using the date and time in an ISO format, eg:


$ zfs clone -s homes/template/user homes/bob

Would be equivalent to doing the following:

$ snapname=`date +%F:%T`
$ zfs snapshot homes/template/user@$snapname
$ zfs clone homes/template/user@$snapname homes/bob

In both cases running zfs list would show the snapshot.
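
Until such an option exists, the same behaviour can be approximated with a
small wrapper.  A minimal sketch, assuming a Bourne shell; the function name
zclone and the timestamp-based snapshot name are illustrative choices, not
an existing ZFS interface:

# Hedged sketch only: emulate the proposed 'zfs clone -s' behaviour.
zclone () {
    src=$1                      # e.g. homes/template/user
    dst=$2                      # e.g. homes/bob
    snapname=`date +%F:%T`      # ISO-style date/time, as suggested above
    zfs snapshot "$src@$snapname" || return 1
    zfs clone "$src@$snapname" "$dst"
}

$ zclone homes/template/user homes/bob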

Where this becomes really useful is if we also had '-r' for clones; for
example, cloning a whole part of the namespace might go something like this:


$ snapname=`date +%F:%T`
$ zfs snapshot -r homes@$snapname
$ for fs in `zfs list -r -H -o name -t filesystem homes` ; do
     zfs clone $fs@$snapname nhomes${fs#homes}   # homes/alice -> nhomes/alice
  done

If we had implicit snapshot with clones and a recursive capability
we could do this:

$ zfs clone -s -r homes nhomes

It's an ease-of-use thing in the UI, just like the way that when running
zpool to create a new pool you get the top-level filesystem without running
zfs create.


It could also be used when cloning zones using zoneadm(1M), which
currently uses @SUNWzone as a prefix of the snapshot name.


Now, I don't think this is particularly high priority; I'd much rather
see '-r' for zfs send/recv before I saw this.


--
Darren J Moffat


[zfs-discuss] ZFS -- Grub shell

2007-01-04 Thread Lubos Kocman
Hi,

I've tried to create a zfs pool on my c0d0s7 and use it as a tank for my
/export/home. Everything was perfect; I moved my files from a backup there
and everything was still OK.

I also deleted the old line with the ufs mountpoint /export/home from the vfstab file.

But after reboot there was just a bash shell. I've tried to boot up from the
command line but I wasn't successful. I have never done it before (only with Linux).

SXCR b53 amd64

If somebody can give me instructions for a grub command-line boot, I'll be glad.
 
 


Re: [zfs-discuss] ZFS -- Grub shell

2007-01-04 Thread Detlef Drewanz
I am not sure what your problem is. Does it not start dtlogin/gdm? Or is the
zpool not mounted? Just log in on the CLI and run svcs -x and df -k to see
whether all services are running and the zpool has been mounted. It can be
that your zpool was not mounted on /export (depending on your mountpoint) if
/export still has the existing subdirectory /export/home. But you should see
this in the error log from the failing zpool mount service.
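
For example (an illustrative sketch only -- the pool name "tank" and the
mountpoints are assumptions based on the original post, not output from the
poster's system):

# Illustrative only: pool and mountpoint names are assumptions.
$ svcs -x                  # any failed SMF services?
$ zpool status tank        # is the pool imported and ONLINE?
$ zfs list                 # does the dataset exist, and where does it mount?
$ df -k /export/home       # is /export/home really on zfs?
$ zfs mount -a             # retry the mounts once /export/home is empty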


Detlef

On 01/04/07 13:27, Lubos Kocman wrote:

Hi,

I've tried to create a zfs pool on my c0d0s7 and use it as a tank for my
/export/home. Everything was perfect; I moved my files from a backup there
and everything was still OK.

I also deleted the old line with the ufs mountpoint /export/home from the
vfstab file.

But after reboot there was just a bash shell. I've tried to boot up from the
command line but I wasn't successful. I have never done it before (only with
Linux).

SXCR b53 amd64

If somebody can give me instructions for a grub command-line boot, I'll be glad.
 
 


--
Detlef Drewanz  Systems Engineer/OS Ambassador
Sun Microsystems GmbH   Phone: (+49 30) 747096 856
Komturstrasse 18a   mailto:[EMAIL PROTECTED]
D-12099 Berlin  http://blogs.sun.com/solarium


Re: [zfs-discuss] Re: RAIDZ2 vs. ZFS RAID-10

2007-01-04 Thread Anton Rang

On Jan 4, 2007, at 3:25 AM, [EMAIL PROTECTED] wrote:

Is there some reason why a small read on a raidz2 is not statistically very
likely to require I/O on only one device? Assuming a non-degraded pool of
course.


ZFS stores its checksums for RAIDZ/RAIDZ2 in such a way that all disks must
be read to compute and verify the checksum.


But why do ZFS reads require the computation of the RAIDZ checksum?

If the block checksum is fine, then you need not care about the parity.


It's the block checksum that requires reading all of the disks.  If ZFS
stored sub-block checksums for the RAID-Z case then short reads could often
be satisfied without reading the whole block (and all disks).

So actually I mis-spoke slightly; rather than all disks, I should have said
all data disks.  In practice this has the same effect: no more than one read
may be processed at a time.


Anton



Re: [zfs-discuss] Re: RAIDZ2 vs. ZFS RAID-10

2007-01-04 Thread Casper . Dik

So actually I mis-spoke slightly; rather than all disks, I should have said
all data disks.  In practice this has the same effect: no more than one read
may be processed at a time.

But aren't short blocks sometimes stored on only a subset of disks?

Casper


Re: [zfs-discuss] Re: RAIDZ2 vs. ZFS RAID-10

2007-01-04 Thread Darren Dunham
 It's the block checksum that requires reading all of the disks.  If
 ZFS stored sub-block checksums for the RAID-Z case then short reads
 could often be satisfied without reading the whole block (and all
 disks).

What happens when a sub-block is missing (single disk failure)?  Surely
it doesn't have to discard the entire checksum and simply trust the
remaining blocks?

Also, even if it could read the data from a subset of the disks, isn't
it a feature that every read is also verifying the parity for
correctness/silent corruption?  I'm assuming that any short-read
optimization wouldn't be able to perform that check.

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 


Re: [zfs-discuss] Re: Re[2]: RAIDZ2 vs. ZFS RAID-10

2007-01-04 Thread Roch - PAE
Anton B. Rang writes:
   In our recent experience RAID-5, due to the 2 reads, an XOR calc and a
   write op per write instruction, is usually much slower than RAID-10
   (two write ops). Any advice is greatly appreciated.

   RAIDZ and RAIDZ2 do not suffer from this malady (the RAID5 write hole).
  
  1. This isn't the write hole.
  
  2. RAIDZ and RAIDZ2 suffer from read-modify-write overhead when
  updating a file in writes of less than 128K, but not when writing a
  new file or issuing large writes. 
   

I don't think this is stated correctly.

All filesystems will incur a read-modify-write when an application is
updating a portion of a block.  The read I/O only occurs if the block is not
already in the memory cache.  The write is potentially deferred, and
multiple block updates may occur per write I/O.

This is not RAIDZ specific.

ZFS stores files less than 128K (or less than the filesystem
recordsize)  as a single block.  Larger  files are stored as
multiple recordsize blocks. 
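
As a hedged illustration of the knob involved (the dataset name tank/db is
hypothetical, not taken from this thread):

# Hypothetical dataset name; shows the property that controls the block size.
$ zfs get recordsize tank/db
$ zfs set recordsize=8k tank/db   # e.g. match an application's 8K writes;
                                  # only files written afterwards are affected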

For RAID-Z a block spreads onto all devices of a group.

-r

   


Re: [zfs-discuss] Re: Re[2]: RAIDZ2 vs. ZFS RAID-10

2007-01-04 Thread Anton Rang


On Jan 4, 2007, at 10:26 AM, Roch - PAE wrote:


All filesystems will incur a read-modify-write when an application is
updating a portion of a block.


For most Solaris file systems it is the page size, rather than
the block size, that affects read-modify-write; hence 8K (SPARC)
or 4K (x86/x64) writes do not require read-modify-write for
UFS/QFS, even when larger block sizes are used.

When direct I/O is enabled, UFS and QFS will write directly to
disk (without reading) for 512-byte-aligned I/O.


The read I/O only occurs if the block is not already in memory cache.


Of course.


ZFS stores files less than 128K (or less than the filesystem
recordsize)  as a single block.  Larger  files are stored as
multiple recordsize blocks.


So appending to any file less than 128K will result in a read-modify-write
cycle (modulo read caching); while a write to a file which is not
record-size-aligned (by default, 128K) results in a read-modify-write cycle.



For RAID-Z a block spreads onto all devices of a group.


Which means that all devices are involved in the read and the write;
except, as I believe Casper pointed out, that very small blocks (less than
512 bytes per data device) will reside on a smaller set of disks.
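
A back-of-the-envelope sketch of that geometry (the disk count, sector size
and block sizes below are made-up examples, not figures from this thread):

#!/bin/sh
# Hypothetical raidz1 group: 4 data disks + 1 parity, 512-byte sectors.
NDATA=4
RECORDSIZE=131072                        # one full 128K block
PER_DISK=`expr $RECORDSIZE / $NDATA`
echo "bytes per data disk for a full 128K block: $PER_DISK"
# Even a tiny application read of that block touches all 4 data disks.
SMALL=1024                               # a 1K block (small file)
COLS=`expr \( $SMALL + 511 \) / 512`     # sectors, i.e. data disks, needed
echo "data disks used by a 1K block: $COLS"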

Anton



Re: [zfs-discuss] Re: zfs list and snapshots..

2007-01-04 Thread Wade . Stuart






[EMAIL PROTECTED] wrote on 01/03/2007 04:21:00 PM:

 [EMAIL PROTECTED] wrote:
  which is not the behavior I am seeing..

 Show me the output, and I can try to explain what you are seeing.
[9:36am] [~]:test% zfs create data/test
[9:36am] [~]:test% zfs set compression=on data/test
[9:37am] [/data/test]:test% zfs snapshot data/[EMAIL PROTECTED]
[9:37am] [/data/test]:test% cp -R /data/fileblast/export/spare/images .
[9:40am] [/data/test]:test% zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
data/test 13.4G  14.2T  13.4G  /data/test
data/[EMAIL PROTECTED]   61.2K  -  66.6K  -
[9:40am]  [/data/test]:test% du -sk images
14022392  images
[9:40am]  [/data/test]:test% zfs snapshot data/[EMAIL PROTECTED]
[9:41am]  [/data/test]:test% zfs snapshot data/[EMAIL PROTECTED]
[9:41am]  [/data/test]:test% cd images/
[9:41am]  [/data/test/images]:test% cd fullres
[9:41am]  [/data/test/images/fullres]:test% rm -rf [A-H]*
[9:42am]  [/data/test/images/fullres]:test% zfs snapshot data/[EMAIL PROTECTED]
[9:42am]  [/data/test/images/fullres]:test% zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
data/test 13.4G  14.2T  6.54G  /data/test
data/[EMAIL PROTECTED]   61.2K  -  66.6K  -
data/[EMAIL PROTECTED]   0  -  13.4G  -
data/[EMAIL PROTECTED]   0  -  13.4G  -
data/[EMAIL PROTECTED]   0  -  6.54G  -
[9:42am]  [/data/test/images/fullres]:test% cd ..
[9:42am]  [/data/test/images]:test% cd ..
[9:42am]  [/data/test]:test% du -sk images
6862197 images

What I would expect to see is:
data/[EMAIL PROTECTED]   6.86G  -  13.4G

This shows me that snap3 now is the most specific owner of 6.86G of
delta. Please note that snap2 also uses this same delta data, but is not
the most specific (newest and last node) owner, so it is not expected that
it would show this data in its usage. Removing snaps from earliest to
snap3 would free the total used space from the snaps destroyed. I do see
that the REFER column does fluctuate down -- but as the test becomes more
complex (more writes/deletes between deltas and more deltas) I do not see
any way to correlate the usage of the snaps.  In my original tests, which
were more complex, I lost 50G from any view in this list; it was only
recoverable by deleting snaps until I hit the one that actually owned the
data. This is a major problem for me because our snap policy is such that
we need to keep as many snaps as possible (assuring a low-water mark of free
space) and have advance notice of snap culling.  Every other snapping
fs/vm/system I have used has been able to show delta size ownership for
snaps -- so this has never been an issue for us before...



 AFAIK, the manpage is accurate.  The space used by a snapshot is exactly
 the amount of space that will be freed up when you run 'zfs destroy
 snapshot'.  Once that operation completes, 'zfs list' will show that the
 space used by adjacent snapshots has changed as a result.

 Unfortunately, at this time there is no way to answer the question how
 much space would be freed up if I were to delete these N snapshots.  We
 have some ideas on how to express this, but it will probably be some time
 before we are able to implement it.

If I have 100 snaps of a filesystem that are relatively low delta churn
and then delete half of the data out there, I would expect to see that
space go up in the used column for one of the snaps (in my test cases I am
deleting 50GB out of a 100GB filesystem and showing no usage increase on
any of the snaps).

That's probably because the 50GB that you deleted from the fs is shared
among the snapshots, so it is still the case that deleting any one snapshot
will not free up much space.

No, in my original test I had a few hundred snaps -- all with varying deltas.
In the middle of the snap history I unlinked a substantial portion of the
data, creating a large COW delta that should be easily spottable with
reporting tools.



I am planning on having many many snaps on our filesystems and
programmatically purging old snaps as space is needed -- when zfs list does
not attach delta usage to snaps it makes this impossible (without blindly
deleting snaps, waiting an unspecified period until zfs list is updated, and
repeating).

As I mentioned, you need only wait until 'zfs destroy' finishes to see the
updated accounting from 'zfs list'.

The problem is this is looking at the forest after you burn all the trees.
I want to be able to plan the cull before swinging the axe.


Also, another thing that is not really specified in the documentation is
where this delta space usage would be listed.

What do you mean by delta space usage?  AFAIK, ZFS does not use that term
anywhere, which is why it is not documented :-)

By delta I mean the COW blocks that the snap represents as delta from the
previous snap and the next snap or live -- in other words what makes this
snap different from the previous or next snap.



 

[zfs-discuss] Re: Re: RAIDZ2 vs. ZFS RAID-10

2007-01-04 Thread Anton B. Rang
 What happens when a sub-block is missing (single disk failure)?  Surely
 it doesn't have to discard the entire checksum and simply trust the
 remaining blocks?

The checksum is over the data, not the data+parity.  So when a disk fails,
the data is first reconstructed, and then the block checksum is computed.

 Also, even if it could read the data from a subset of the disks, isn't
 it a feature that every read is also verifying the parity for
 correctness/silent corruption?

It doesn't -- we only read the data, not the parity.  (See line 708 of
vdev_raidz.c.)  The parity is checked only when scrubbing.
 
 


[zfs-discuss] Scrubbing on active zfs systems (many snaps per day)

2007-01-04 Thread Wade . Stuart




From what I have read, it looks like there is a known issue with scrubbing
restarting when any of the other usages of the same code path run
(re-silver, snap, ...).  It looks like there is a plan to put in a marker so
that scrubbing knows where to start again after being preempted.  This is
good.  I am wondering if any thought has been put into a scrubbing service
that would do constant low-priority scrubs (either full with the restart
marker, or randomized).  I have noticed that the default scrub seems to be
very resource intensive and can cause significant slowdowns on the
filesystems; a much slower but constant scrub would be nice.

while (1) {
  scrub_very_slowly();
}
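
A rough userland approximation of that idea, as a hedged sketch only -- the
pool name, the sleep intervals, and the text matched in the zpool status
output are arbitrary assumptions, and this merely rate-limits whole scrubs
rather than truly throttling their I/O:

#!/bin/sh
# Hedged sketch; "tank" and the timings are arbitrary choices.
POOL=tank
while true; do
        zpool scrub $POOL
        # poll until the current scrub completes
        while zpool status $POOL | grep "scrub in progress" >/dev/null; do
                sleep 300
        done
        sleep 86400     # rest a day between scrubs
done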


Are there any plans in this area documented anywhere, or can someone give
insight into the development team's goals?

Thanks!
-Wade



Re: [zfs-discuss] Re: zfs list and snapshots..

2007-01-04 Thread Matthew Ahrens

Darren Dunham wrote:

Is the problem of displaying the potential space freed by multiple
destructions one of calculation (do you have to walk snapshot trees?) or
one of formatting and display?


Both, because you need to know, for each snapshot, how much of the data it
references was first referenced in each previous snapshot.  Displaying these
O(Nsnapshots^2) data points is nontrivial.


As I mentioned, you need only wait until 'zfs destroy' finishes to see the 
updated accounting from 'zfs list'.


So if reaching a hard target was necessary, we could just delete
snapshots in age order, checking space after each, until the target
space became available.  But there would be no way to see beforehand how
many that would be, or if it was worth starting the process if some
snapshots are special and not available for deletion.


That's correct.

--matt


Re: [zfs-discuss] Checksum errors...

2007-01-04 Thread eric kustarz



errors: The following persistent errors have been detected:

  DATASET                      OBJECT  RANGE
  z_tsmsun1_pool/tsmsrv1_pool  2620    8464760832-8464891904

Looks like I possibly have a single file that is corrupted.  My question is: how do I find the file?  Is it as simple as doing a find command using -inum 2620?



FYI, I'm finishing up:
6410433 'zpool status -v' would be more useful with filenames

Which will give you the complete path to the file (if applicable), so 
you don't have to do a 'find' on the inum.
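
In the meantime, the find approach sketched above should work.  A hedged
example -- the mountpoint /tsmsrv1_pool is a guess, and it assumes the
object number reported by zpool status matches the inode number of a plain
file:

# /tsmsrv1_pool is a guessed mountpoint; object 2620 from the report above.
$ find /tsmsrv1_pool -inum 2620 -print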


eric


Re: [zfs-discuss] Re: zfs list and snapshots..

2007-01-04 Thread Matthew Ahrens

[EMAIL PROTECTED] wrote:

[9:40am]  [/data/test]:test% zfs snapshot data/[EMAIL PROTECTED]
[9:41am]  [/data/test]:test% zfs snapshot data/[EMAIL PROTECTED]

...

[9:42am]  [/data/test/images/fullres]:test% zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
data/test 13.4G  14.2T  6.54G  /data/test
data/[EMAIL PROTECTED]   61.2K  -  66.6K  -
data/[EMAIL PROTECTED]   0  -  13.4G  -
data/[EMAIL PROTECTED]   0  -  13.4G  -
data/[EMAIL PROTECTED]   0  -  6.54G  -


When snap3 is deleted, no space will be freed (because it will still be 
referenced by snap2), therefore the space used by snap3 is 0.



What I would expect to see is:
data/[EMAIL PROTECTED]   6.86G  -  13.4G



This shows me that snap3 now is the most specific owner of 6.86G of
delta.


I understand that you *want* it to display a different value, but it is 
correctly showing the documented value.  How can we make the manpage 
better, to avoid this confusion in the future?



By delta I mean the COW blocks that the snap represents as delta from the
previous snap and the next snap or live -- in other words what makes this
snap different from the previous or next snap.


The best way I could come up with to define a bounded number of stats to 
express space usage for snapshots was the amount of space born and the 
amount killed.  Space born is the amount of space that is newly 
allocated in this snapshot (ie. not referenced in the prev snap, but 
referenced here).  Space killed is the amount of space that is newly 
freed in this snapshot (ie. referenced in the prev snap, but not 
referenced here).


We considered including these numbers, but decided against it, primarily 
because they can't actually answer the important question:  how much 
space will be freed if I delete these N snapshots?


You can't answer this question because you don't know *which* blocks 
were born and killed.  Consider 2 filesystems, A and B, both of which 
have lots of churn between every snapshot.  However, in A every block is 
referenced by exactly 2 snapshots, and in B every block is referenced by 
exactly 3 snapshots (excluding the first/last few).  The born/killed 
stats for A and B's snapshots may be the same, so there's no way to tell 
that to free up space in A, you must delete at least 2 adjacent snaps 
vs. for B, you must delete at least 3.
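
A made-up numeric illustration of that point (all figures invented, and the
churn is assumed to be a constant 10G per snapshot interval):

Pool A (every block referenced by exactly 2 snapshots):
    each snapshot:  born = 10G, killed = 10G
    destroy any 1 snapshot        -> frees ~0
    destroy 2 adjacent snapshots  -> frees ~10G

Pool B (every block referenced by exactly 3 snapshots):
    each snapshot:  born = 10G, killed = 10G   (identical stats to A)
    destroy 2 adjacent snapshots  -> frees ~0
    destroy 3 adjacent snapshots  -> frees ~10G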


To really answer this question (how much space would be freed if I 
deleted these N snapshots), you need to know for each snapshot, how much 
of the space that it references was first referenced in each of the 
previous snapshots.  We're working on a way to compute and graphically 
display these values, which should make them relatively easy to interpret.


--matt


Re: [zfs-discuss] Re: zfs list and snapshots..

2007-01-04 Thread Wade . Stuart




Matthew,

 I really do appreciate this discussion; thank you for taking the time to
go over this with me.



Matthew Ahrens [EMAIL PROTECTED] wrote on 01/04/2007 01:49:00 PM:

 [EMAIL PROTECTED] wrote:
  [9:40am]  [/data/test]:test% zfs snapshot data/[EMAIL PROTECTED]
  [9:41am]  [/data/test]:test% zfs snapshot data/[EMAIL PROTECTED]
 ...
  [9:42am]  [/data/test/images/fullres]:test% zfs list
  NAME   USED  AVAIL  REFER  MOUNTPOINT
  data/test 13.4G  14.2T  6.54G  /data/test
  data/[EMAIL PROTECTED]   61.2K  -  66.6K  -
  data/[EMAIL PROTECTED]   0  -  13.4G  -
  data/[EMAIL PROTECTED]   0  -  13.4G  -
  data/[EMAIL PROTECTED]   0  -  6.54G  -

 When snap3 is deleted, no space will be freed (because it will still be
 referenced by snap2), therefore the space used by snap3 is 0.

Where does it show (on any of the snaps) that they are holding 6.8G of disk
space hostage? I understand snap2 and snap3 both share that data; that's
why below I say that snap3, being the most specific owner, should list the
6.8G as used -- showing that you need to delete snaps from 0 to snap3 to
free 6.8G (more specifically, deleting snap1 gets you 61.3K, snap2 0, and
snap3 6.8G if you delete them in that order, and only guaranteed if you
delete them all).  If I just delete snap3 and leave snap2, I would expect
snap2 to become the most specific owner of that delta data, showing 6.8G
usage now.



  What I would expect to see is:
  data/[EMAIL PROTECTED]   6.86G  -  13.4G

  This shows me that snap3 now is the most specific owner of 6.86G of
  delta.

 I understand that you *want* it to display a different value, but it is
 correctly showing the documented value.  How can we make the manpage
 better, to avoid this confusion in the future?



  The amount of space consumed by this dataset and all its
 descendants.  This  is the value that is checked against
 this dataset's quota and  reservation.  The  space  used
 does  not  include  this dataset's reservation, but does
 take into account the  reservations  of  any  descendant
 datasets.  The  amount  of space that a dataset consumes
 from its parent, as well as the  amount  of  space  that
 will  be freed if this dataset is recursively destroyed,
 is the greater of its space used and its reservation.


Maybe kill this part below entirely and put in something like: the usage
column for snapshots is undetermined and may or may not reflect actual disk
usage associated with the snapshot, especially when blocks are freed between
snapshots.

 When  snapshots  (see  the  Snapshots   section)   are
 created,  their  space  is  initially shared between the
 snapshot and the file system, and possibly with previous
 snapshots.  As  the  file system changes, space that was
 previously shared becomes unique to  the  snapshot,  and
 counted  in  the  snapshot's  space  used. Additionally,
 deleting snapshots can  increase  the  amount  of  space
 unique to (and used by) other snapshots.







  By delta I mean the COW blocks that the snap represents as delta from
the
  previous snap and the next snap or live -- in other words what makes
this
  snap different from the previous or next snap.

 The best way I could come up with to define a bounded number of stats to
 express space usage for snapshots was the amount of space born and the
 amount killed.  Space born is the amount of space that is newly
 allocated in this snapshot (ie. not referenced in the prev snap, but
 referenced here).  Space killed is the amount of space that is newly
 freed in this snapshot (ie. referenced in the prev snap, but not
 referenced here).

For the common case you don't care about born or killed; you only care
about blocks that are referenced by this snap but not by the next (or by
live if you are the last snap) -- these cover both new files/blocks and
deleted files/blocks that this snap now owns.

Assuming you can get the set of COW blocks for a given snapshot, then
showing, for each snapshot in order from oldest to newest, the size_of(COW
blocks) that are in snapshot N but do not exist in snapshot N+1 (the set
N - N+1) would give me enough information to plan destroys.  The number here
is how much space would be freed by deleting the snapshots from the oldest
one to this one in sequence. This would allow us to plan deletions, or even
see where peak deltas happen in a long series of snaps. I understand this is
not as nice as doing a bidirectional difference test to gather more
information, such as what deleting a random snapshot in the series would
free, if anything, but that seems to require much more overhead.  I think
this covers the most common usage: deleting older snaps before deleting
random snaps in the series or the newest snap first.

Even if this was a long operation that was executed via a list flag  it
would sure help.  I am 

Re: [zfs-discuss] ZFS related (probably) hangs due to memory exhaustion(?) with snv53

2007-01-04 Thread Tomas Ögren
On 03 January, 2007 - [EMAIL PROTECTED] sent me these 0,5K bytes:

 
 Hmmm, so there is lots of evictable cache here (mostly in the MFU
 part of the cache)... could you make your core file available?
 I would like to take a look at it.
 
 Isn't this just like:
 6493923 nfsfind on ZFS filesystem quickly depletes memory in a 1GB system
 
 Which was introduced in b51(or 52) and fixed in snv_54.

I've upgraded to snv54 now and even forced 500k dnlc entries in memory
(ncsize=500000, arc_reduce_dnlc_percent=0).. Let's see how it copes with
that ;)
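
For reference, a hedged sketch of how those values would typically be forced
via /etc/system (tunable names per the discussion above; double-check the
module prefix against your build before using):

* /etc/system fragment (illustrative; verify tunable names for your build)
set ncsize=500000
set zfs:arc_reduce_dnlc_percent=0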

Currently seeing about 96% cache hit rate in name lookups even after
just 3h.. it's usually been around 20% or so when it's automatically
lowered to around 15k entries due to memory pressure..

/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


[zfs-discuss] odd versus even

2007-01-04 Thread Peter Tribble

I'm being a bit of a dunderhead at the moment and neither the site search
nor google are picking up the information I seek...

I'm setting up a thumper and I'm sure I recall some discussion of the
optimal number of drives in raidz1 and raidz2 vdevs. I also recall that it
was something like you would want an even number of disks for raidz1, and an
odd number for raidz2 (so you always have an odd number of data drives).
Have I remembered this correctly, or am I going delusional? And, if it is
the case, what is the reasoning behind it?

Thanks,

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/


Re: [zfs-discuss] Re: zfs list and snapshots..

2007-01-04 Thread Matthew Ahrens

[EMAIL PROTECTED] wrote:

Common case to me is, how much would be freed by deleting the snapshots in
order of age from oldest to newest always starting with the oldest.


That would be possible.  A given snapshot's "space used by this and all
prior snapshots" would be the previous snap's value of that stat plus the
next snap's "killed" (as defined in my previous email).  I can see that
exposing the killed value may be useful in some circumstances.
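
Spelled out as a recurrence (killed(N+1) being the space referenced by
snapshot N but no longer by snapshot N+1, and hence by nothing newer):

    freed(oldest..N) = freed(oldest..N-1) + killed(N+1)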


However, I think that the more general question (ie. for arbitrary 
ranges of snapshots) would be required in many cases.  For example, if 
you had a more complicated snapshot policy, like keep every monthly 
snapshot, but delete any others to make space.  I imagine that such a 
policy would be fairly common.


--matt


[zfs-discuss] ZFS direct IO

2007-01-04 Thread dudekula mastan
Hi All,

  As you all know, DIRECT IO is not supported by the ZFS file system. When
will the ZFS people add DIRECT IO support to ZFS? What is the roadmap for
ZFS direct IO? If you have any idea on this, please let me know.


  Thanks & Regards
  Masthan
