Re: FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs

2005-10-21 Thread Lowell Gilbert
user [EMAIL PROTECTED] writes:

 Folks,
 
 On Thu, 20 Oct 2005, Gayn Winters wrote:
 
   Imagine that each data block is marked with labels
   on change. It doesn't matter how many labels there
   are, there will be only one data block saved.
  
  In trying to follow this thread, I started looking around for a precise
  definition of snapshot.
  Man mksnap_ffs
  wasn't too helpful, and googling for snapshot etc. wasn't fruitful.
  I'm guessing that the original author of the thread (user at dhp.com)
  may also need such a definition.  Can someone provide a pointer to a
  specification or at least an RFC-like paper?
 
 
 I found one:
 
 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/ufs/ffs/README.snapshot?rev=1.4
 
 and further, I did some tests and discovered that what I was being told
 (by you folks) was indeed correct.
 
 No matter how many snapshots you have, the changes in blocks since the
 time before the first snapshot are only recorded in one of them.  That is
 to say, if I do the following:
 
 - create 4 1gig /dev/zero filled files
 - create a snapshot
 - overwrite one of those 1gig files with /dev/random
 
 My free space will have decreased by 1gig.  So far so good.
 
 If I then:
 
 - create a second snapshot
 - overwrite a different 1gig file with /dev/random
 
 My free space merely decreases by another 1gig.  It makes sense to me now
 because it has occurred to me that since the second file had not changed
 between the creation of the first and second snapshot, there is no reason
 for _both_ snapshots to _both_ say this 1gig random file used to be
 filled with zeros - it would be redundant.
 
 So that's great ... but I am curious, how do they know ?  I think my
 previous assumption (that the first _and_ the second snapshot file would
 _both_ have to record the change of file #2 from zero to random) was based
 on the notion that these snapshot files were totally autonomous and
 independent, and had no general organization behind them.  If that was the
 case, then I am still fairly certain both snapshots would need to record
 the change of the second file.

Yes, they both need to notice, but they can share the actual copy of
the data.

 So what is the behind the scenes organization that makes it possible for
 the snapshot files to not duplicate data like that ?

Without trying to give a whole course in filesystems (there are books
available if you want to go in depth), the data in the file is
held in a number of data blocks, but there is meta-data that tells
where the data is.  When a file is overwritten, the snapshots continue
to use the old version of the meta-data, which continues to point to
the old data, while the real filesystem creates a new meta-data
container pointing to new data blocks.  If you then make another
snapshot, the snapshot will use the new meta-data and its associated
underlying data. 

It's an application of the copy-on-write principle.
http://en.wikipedia.org/wiki/Copy-on-write
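
As a rough illustration of the idea above, here is a toy model in Python.
It is only a sketch of the principle, not how the FFS code is written:
the ToyFS class, its method names, and the 4-block "files" are all made
up for the example.  A file is just a list of block numbers (the
meta-data), a snapshot is a frozen copy of those lists, and a block is
freed only when nothing points at it any more.

```python
class ToyFS:
    """Toy model: a file is a list of block numbers; snapshots freeze that list."""

    def __init__(self):
        self.blocks = {}        # block number -> contents
        self.files = {}         # file name -> list of block numbers (the meta-data)
        self.snapshots = []     # each snapshot is a frozen copy of self.files
        self.next_block = 0

    def write(self, name, chunks):
        """(Over)write a file: new data always lands in freshly allocated blocks."""
        new_map = []
        for chunk in chunks:
            self.blocks[self.next_block] = chunk
            new_map.append(self.next_block)
            self.next_block += 1
        self.files[name] = new_map
        self._reclaim()

    def snapshot(self):
        """Copy only the meta-data; no data blocks are duplicated."""
        self.snapshots.append({n: tuple(m) for n, m in self.files.items()})

    def _reclaim(self):
        """Free blocks that neither the live filesystem nor any snapshot points to."""
        referenced = {b for m in self.files.values() for b in m}
        for snap in self.snapshots:
            referenced |= {b for m in snap.values() for b in m}
        for b in list(self.blocks):
            if b not in referenced:
                del self.blocks[b]

    def used(self):
        """Number of blocks currently occupied."""
        return len(self.blocks)


fs = ToyFS()
fs.write("file1", [b"zeros"] * 4)
fs.write("file2", [b"zeros"] * 4)
before = fs.used()

fs.snapshot()                           # first snapshot: meta-data only
fs.write("file1", [b"random"] * 4)      # old file1 blocks are kept for snapshot 1
after_first = fs.used()

fs.snapshot()                           # second snapshot
fs.write("file2", [b"random"] * 4)      # old file2 blocks are needed by BOTH snapshots
after_second = fs.used()

print(before, after_first, after_second)
```

This prints "8 12 16": the second overwrite costs 4 more blocks, not 8,
because both snapshots point at the same old copy of file2.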

 ALSO,
 
 I have noticed that if you:
 
 - dd 1gig /dev/zero file
 - create snapshot
 - overwrite that 1gig file with /dev/random
 
 (free space decreases by 1gig, as expected)
 
 - rewrite that 1gig file with /dev/zero again
 
 You _don't_ get that 1gig of free space back ... which surprises me, since
 it was all zeros before, and it's all zeros now ... how does the snapshot
 know those are different zeros ?  And what ramifications does this have
 for restoring, etc., if identical files do not get counted as identical in
 the snapshot ?

The snapshot doesn't know what the bits in the file are.  All it knows
is that the file's data used to be, say in block 1857 and now the
file's data are in block 1956.  The fact that both blocks are
identical is not detected.
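
A small Python sketch of that point, under the same simplified picture
(the overwrite() helper and the intermediate block number 1900 are
invented for illustration; only 1857 and 1956 come from the example
above).  The decision to keep or free a block looks only at block
numbers, never at what the block contains.

```python
blocks = {1857: b"\x00" * 8}     # block number -> contents (numbers are illustrative)
live_file = [1857]               # block list of the live file
snapshot = tuple(live_file)      # the snapshot records block numbers, nothing more


def overwrite(new_block_no, contents):
    """Put new contents in a new block; keep the old block if the snapshot points at it."""
    old_block_no = live_file[0]
    blocks[new_block_no] = contents
    live_file[0] = new_block_no
    if old_block_no not in snapshot:     # decided by block number, never by contents
        del blocks[old_block_no]


overwrite(1900, b"\x5a" * 8)     # stand-in for the /dev/random pass
overwrite(1956, b"\x00" * 8)     # the second /dev/zero pass

same_contents = blocks[1857] == blocks[1956]
blocks_in_use = sorted(blocks)
print(same_contents, blocks_in_use)
```

This prints "True [1857, 1956]": the two blocks hold identical zeros, yet
both stay allocated, one for the snapshot and one for the live file.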

If you're really interested in this, I suggest reading a decent
operating systems book.  It's a lot easier to understand the specific
implementation when you have a good grip on the standard terminology
and principles.
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs

2005-10-21 Thread user


On 21 Oct 2005, Lowell Gilbert wrote:

 The snapshot doesn't know what the bits in the file are.  All it knows
 is that the file's data used to be, say in block 1857 and now the
 file's data are in block 1956.  The fact that both blocks are
 identical is not detected.
 
 If you're really interested in this, I suggest reading a decent
 operating systems book.  It's a lot easier to understand the specific
 implementation when you have a good grip on the standard terminology
 and principles.


Thanks very much for your help.  I am going to read a book or two - my
plan was to start with the Design and Implementation of the 4.4BSD OS,
but I wanted to update it with more modern information - like snapshots,
etc., which I will do with those URLs we have already posted RE: the
snapshot work.

If you have any others, let me know.



Re: FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs

2005-10-21 Thread Lowell Gilbert
user [EMAIL PROTECTED] writes:

 On 21 Oct 2005, Lowell Gilbert wrote:
 
  The snapshot doesn't know what the bits in the file are.  All it knows
  is that the file's data used to be, say in block 1857 and now the
  file's data are in block 1956.  The fact that both blocks are
  identical is not detected.
  
  If you're really interested in this, I suggest reading a decent
  operating systems book.  It's a lot easier to understand the specific
  implementation when you have a good grip on the standard terminology
  and principles.
 
 
 Thanks very much for your help.  I am going to read a book or two - my
 plan was to start with the Design and Implementation of the 4.4BSD OS,
 but I wanted to update it with more modern information - like snapshots,
 etc., which I will do with those URLs we have already posted RE: the
 snapshot work.
 
 If you have any others, let me know.

Yes.  Start with something more basic, because McKusick's books assume
that you are already acquainted with the standard terminology.
Tanenbaum's are the usual recommendations.  And when you do get to
McKusick, you'll do a lot better with the new Design and
Implementation of the FreeBSD Operating System, which covers a lot of
these recent improvements.


Re: FreeBSD UFS2 snapshots, and math ...

2005-10-20 Thread Doug Poland
On Thu, Oct 20, 2005 at 11:01:42AM -0400, user wrote:
 
 Finally, are there any snapshot diag tools at all ?  Like, something that
 reports snapshot sizes, percent of disk used for snapshots, and maybe even
 a way for me to actually calculate what the percent change for time period
 X is for a particular filesystem ?
 
I find sysutils/freebsd-snapshot quite useful, although it doesn't do
everything you're asking for.

Be advised there are issues with snapshots of large filesystems.
 
  
http://www.freebsd.org/cgi/query-pr-summary.cgi?category=&severity=&priority=&class=&state=&sort=none&text=snapshot&responsible=&multitext=&originator=&release=

My experience shows that multiple snapshots of large filesystems, 30GB,
cause the system to become unresponsive when a snap is created.  If I
limit the snapshots to one per large FS, then the machine often hangs on
attempted reboots.

-- 
Regards,
Doug


Re: FreeBSD UFS2 snapshots, and math ...

2005-10-20 Thread user

Doug,

On Thu, 20 Oct 2005, Doug Poland wrote:

 On Thu, Oct 20, 2005 at 11:01:42AM -0400, user wrote:
  
  Finally, are there any snapshot diag tools at all ?  Like, something that
  reports snapshot sizes, percent of disk used for snapshots, and maybe even
  a way for me to actually calculate what the percent change for time period
  X is for a particular filesystem ?
  
 I find sysutils/freebsd-snapshot quite useful, although it doesn't do
 everything you're asking for.


Thanks - I will check that out.

Any comments on my math ?  (changing the same 5% of the FS all the time
vs. changing different 5%'s, and what that means for successive
snapshots) ?

thanks.



Re: FreeBSD UFS2 snapshots, and math ...

2005-10-20 Thread Lowell Gilbert
user [EMAIL PROTECTED] writes:

 I am trying to budget some disk space for filesystems with snapshots
 enabled on them.
 
 The following is simplified - I am just trying to get my concepts in
 order:
 
 Let's say I have a filesystem, and on that filesystem I create a snapshot
 every single night, and every night I delete the snapshot from 5 nights
 ago.  This means that at all times, I have four snapshots running on that
 filesystem, one from 1 day ago, one from 2 days ago, one from 3 days ago,
 and one from 4 days ago.
 
 Let's also assume that the percent change of the filesystem is 5% (every
 day 5% of the blocks in the filesystem are either changed or deleted).
 
 
 
 Does this mean that if that 5% change is a different 5% every day, that
 the one day ago snapshot will be size 5%_of_filesystem, and that the 2 day
 ago snapshot will be size 10%_of_filesystem, day 3 15% and day 4 20%, for
 a total of 50% of the total filesystem taken up with snapshot data ?

No.  One copy of each version of the file that exists in any
snapshot.  Regardless of how many snapshots it's in.

 Does that sound correct ?  When I say that the 5% change is a different 5%
 every day, what I mean is that it is not the same files/data being altered
 every day, but rather there is 5% of new data changed every day, relative
 to the previous night's snapshot.
 
 The second question is this:
 
 If the 5% data changed per day is the _same_ 5% every day (perhaps
 changing the same table in a DB every day, or perhaps changing the same
 block of lines in a text file every day) does that mean that every day
 simply represents 5%_of_filesystem, for a total of 20% of the total
 filesystem in use at all times for snapshot data ?

Whether it's the same data or not doesn't affect how much space you use.
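
To make the arithmetic concrete, here is a small simulation in Python.
It is a sketch under simplifying assumptions, not a measurement of a
real filesystem: the filesystem is modelled as 100 blocks, each with a
version number, 5 blocks change per day, and the four newest nightly
snapshots are kept.  The simulate() function and its parameters are
made up for this example.  "naive" adds up what each snapshot would
hold if it kept private copies; "shared" counts each distinct old
block version once, however many snapshots refer to it.

```python
def simulate(pick_blocks, days=10, total_blocks=100, keep=4):
    """Return (naive, shared) counts of preserved old blocks after `days` days."""
    versions = [0] * total_blocks      # live filesystem: version number of every block
    snapshots = []                     # each snapshot: tuple of block versions when taken
    for day in range(days):
        snapshots.append(tuple(versions))   # nightly snapshot
        snapshots = snapshots[-keep:]       # drop the snapshot from 5 nights ago
        for b in pick_blocks(day):          # the day's changes
            versions[b] += 1

    # Naive budget: every snapshot counted as if it kept its own private copies.
    naive = sum(
        sum(1 for b in range(total_blocks) if snap[b] != versions[b])
        for snap in snapshots
    )
    # Shared budget: one copy per distinct (block, old version) pair.
    shared = len({
        (b, snap[b])
        for snap in snapshots
        for b in range(total_blocks)
        if snap[b] != versions[b]
    })
    return naive, shared


different = simulate(lambda day: range(5 * day, 5 * day + 5))  # a different 5% each day
same = simulate(lambda day: range(0, 5))                       # the same 5% each day
print(different, same)
```

This prints "(50, 20) (20, 20)".  The 50% figure only appears if every
snapshot is charged separately for blocks it shares with the others; with
one copy per version, both cases settle at 20% in this model.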


Re: FreeBSD UFS2 snapshots, and math ...

2005-10-20 Thread user

Hello,

On 20 Oct 2005, Lowell Gilbert wrote:

 user [EMAIL PROTECTED] writes:
 
  Let's say I have a filesystem, and on that filesystem I create a snapshot
  every single night, and every night I delete the snapshot from 5 nights
  ago.  This means that at all times, I have four snapshots running on that
  filesystem, one from 1 day ago, one from 2 days ago, one from 3 days ago,
  and one from 4 days ago.
  
  Let's also assume that the percent change of the filesystem is 5% (every
  day 5% of the blocks in the filesystem are either changed or deleted).
  
  
  
  Does this mean that if that 5% change is a different 5% every day, that
  the one day ago snapshot will be size 5%_of_filesystem, and that the 2 day
  ago snapshot will be size 10%_of_filesystem, day 3 15% and day 4 20%, for
  a total of 50% of the total filesystem taken up with snapshot data ?
 
 No.  One copy of each version of the file that exists in any
 snapshot.  Regardless of how many snapshots it's in.


That doesn't make much sense to me ... if the snapshot keeps track of
changed_data_since_snapshot_was_taken, then ...

Well, think of it this way - let's say I have a 1G filesystem, which is
filled with a single 500M text file.  Now let's say I snapshot that FS.  
At this point, the snapshot takes up 0 bytes.  Now let's say the next day
I alter 10% (50M) of that single 500M file - now the snapshot takes up
that exact same amount of space, namely, 50M.

Now I create a second snapshot, which immediately takes up 0 bytes.  The
next day, I change a totally different 50M of my text file ... so now, the
first snapshot needs to keep track of yesterday's 50M of changes/deletions
as well as today's, because today's operates on totally different disk
blocks.  So now 2-day-ago snapshot is size 100M, and the snapshot from one
day ago is now 50M.

I think my interpretation is correct ... can you look over my and your
conclusions again ?


  The second question is this:
  
  If the 5% data changed per day is the _same_ 5% every day (perhaps
  changing the same table in a DB every day, or perhaps changing the same
  block of lines in a text file every day) does that mean that every day
  simply represents 5%_of_filesystem, for a total of 20% of the total
  filesystem in use at all times for snapshot data ?
 
 Whether it's the same data or not doesn't affect how much space you use.


Yeah ... see, I think it does matter, for the reasons above ... if, as in
this second example, I am changing the same blocks on disk every day, the
snapshot just needs to keep track of them once, namely this is what they
were during the snapshot, and you can change those same blocks all you
want, I just need to keep track of what they were when you took the
snapshot..

comments ?



Re: FreeBSD UFS2 snapshots, and math ...

2005-10-20 Thread Andrew P.
On 10/20/05, user [EMAIL PROTECTED] wrote:

 Hello,

 On 20 Oct 2005, Lowell Gilbert wrote:

  user [EMAIL PROTECTED] writes:
 
   Let's say I have a filesystem, and on that filesystem I create a snapshot
   every single night, and every night I delete the snapshot from 5 nights
   ago.  This means that at all times, I have four snapshots running on that
   filesystem, one from 1 day ago, one from 2 days ago, one from 3 days ago,
   and one from 4 days ago.
  
   Let's also assume that the percent change of the filesystem is 5% (every
   day 5% of the blocks in the filesystem are either changed or deleted).
  
   
  
   Does this mean that if that 5% change is a different 5% every day, that
   the one day ago snapshot will be size 5%_of_filesystem, and that the 2 day
   ago snapshot will be size 10%_of_filesystem, day 3 15% and day 4 20%, for
   a total of 50% of the total filesystem taken up with snapshot data ?
 
  No.  One copy of each version of the file that exists in any
  snapshot.  Regardless of how many snapshots it's in.


 That doesn't make much sense to me ... if the snapshot keeps track of
 changed_data_since_snapshot_was_taken, then ...

 Well, think of it this way - let's say I have a 1G filesystem, which is
 filled with a single 500M text file.  Now let's say I snapshot that FS.
 At this point, the snapshot takes up 0 bytes.  Now let's say the next day
 I alter 10% (50M) of that single 500M file - now the snapshot takes up
 that exact same amount of space, namely, 50M.

 Now I create a second snapshot, which immediately takes up 0 bytes.  The
 next day, I change a totally different 50M of my text file ... so now, the
 first snapshot needs to keep track of yesterday's 50M of changes/deletions
 as well as today's, because today's operates on totally different disk
 blocks.  So now 2-day-ago snapshot is size 100M, and the snapshot from one
 day ago is now 50M.

 I think my interpretation is correct ... can you look over my and your
 conclusions again ?


   The second question is this:
  
   If the 5% data changed per day is the _same_ 5% every day (perhaps
   changing the same table in a DB every day, or perhaps changing the same
   block of lines in a text file every day) does that mean that every day
   simply represents 5%_of_filesystem, for a total of 20% of the total
   filesystem in use at all times for snapshot data ?
 
  Whether it's the same data or not doesn't affect how much space you use.


 Yeah ... see, I think it does matter, for the reasons above ... if, as in
 this second example, I am changing the same blocks on disk every day, the
 snapshot just needs to keep track of them once, namely this is what they
 were during the snapshot, and you can change those same blocks all you
 want, I just need to keep track of what they were when you took the
 snapshot..

 comments ?



What makes you so reassured, I wonder.

Imagine that each data block is marked with labels
on change. It doesn't matter how many labels there
are, there will be only one data block saved.
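
A quick Python sketch of the label picture (purely illustrative: the
preserve() and delete_snapshot() helpers, the block number 7, and the
snapshot names are invented here, and this is not a claim about the
actual on-disk format).  The old contents are stored once, and each
snapshot that needs them is just one more label on that single copy.

```python
saved_blocks = {}   # block number -> {"data": old contents, "labels": snapshots needing it}


def preserve(block_no, old_data, snapshot_names):
    """Save the old contents once and label them with every snapshot that needs them."""
    entry = saved_blocks.setdefault(block_no, {"data": old_data, "labels": set()})
    entry["labels"].update(snapshot_names)


def delete_snapshot(name):
    """Remove one label everywhere; a saved block is freed when its last label goes."""
    for block_no in list(saved_blocks):
        labels = saved_blocks[block_no]["labels"]
        labels.discard(name)
        if not labels:
            del saved_blocks[block_no]


preserve(7, b"old contents", ["night1", "night2", "night3"])
copies_with_three_labels = len(saved_blocks)

delete_snapshot("night1")
copies_with_two_labels = len(saved_blocks)

delete_snapshot("night2")
delete_snapshot("night3")
copies_with_no_labels = len(saved_blocks)

print(copies_with_three_labels, copies_with_two_labels, copies_with_no_labels)
```

This prints "1 1 0": three labels or two, there is still one saved block,
and it goes away only when the last snapshot that needs it is deleted.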


RE: FreeBSD UFS2 snapshots, and math ...

2005-10-20 Thread Gayn Winters
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Andrew P.
 Sent: Thursday, October 20, 2005 12:35 PM
 To: user
 Cc: freebsd-questions@freebsd.org
 Subject: Re: FreeBSD UFS2 snapshots, and math ...

 Imagine that each data block is marked with labels
 on change. It doesn't matter how many labels there
 are, there will be only one data block saved.

In trying to follow this thread, I started looking around for a precise
definition of snapshot.
Man mksnap_ffs
wasn't too helpful, and googling for snapshot etc. wasn't fruitful.
I'm guessing that the original author of the thread (user at dhp.com)
may also need such a definition.  Can someone provide a pointer to a
specification or at least an RFC-like paper?

Thanks,

-gayn




RE: FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs

2005-10-20 Thread user

Folks,

On Thu, 20 Oct 2005, Gayn Winters wrote:

  Imagine that each data block is marked with labels
  on change. It doesn't matter how many labels there
  are, there will be only one data block saved.
 
 In trying to follow this thread, I started looking around for a precise
 definition of snapshot.
 Man mksnap_ffs
 wasn't too helpful, and googling for snapshot etc. wasn't fruitful.
 I'm guessing that the original author of the thread (user at dhp.com)
 may also need such a definition.  Can someone provide a pointer to a
 specification or at least an RFC-like paper?


I found one:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/ufs/ffs/README.snapshot?rev=1.4

and further, I did some tests and discovered that what I was being told
(by you folks) was indeed correct.

No matter how many snapshots you have, the changes in blocks since the
time before the first snapshot are only recorded in one of them.  That is
to say, if I do the following:

- create 4 1gig /dev/zero filled files
- create a snapshot
- overwrite one of those 1gig files with /dev/random

My free space will have decreased by 1gig.  So far so good.

If I then:

- create a second snapshot
- overwrite a different 1gig file with /dev/random

My free space merely decreases by another 1gig.  It makes sense to me now
because it has occurred to me that since the second file had not changed
between the creation of the first and second snapshot, there is no reason
for _both_ snapshots to _both_ say this 1gig random file used to be
filled with zeros - it would be redundant.

So that's great ... but I am curious, how do they know ?  I think my
previous assumption (that the first _and_ the second snapshot file would
_both_ have to record the change of file #2 from zero to random) was based
on the notion that these snapshot files were totally autonomous and
independent, and had no general organization behind them.  If that was the
case, then I am still fairly certain both snapshots would need to record
the change of the second file.

So what is the behind the scenes organization that makes it possible for
the snapshot files to not duplicate data like that ?

ALSO,

I have noticed that if you:

- dd 1gig /dev/zero file
- create snapshot
- overwrite that 1gig file with /dev/random

(free space decreases by 1gig, as expected)

- rewrite that 1gig file with /dev/zero again

You _don't_ get that 1gig of free space back ... which surprises me, since
it was all zeros before, and it's all zeros now ... how does the snapshot
know those are different zeros ?  And what ramifications does this have
for restoring, etc., if identical files do not get counted as identical in
the snapshot ?

thanks.



RE: FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs

2005-10-20 Thread Gayn Winters


 -Original Message-
 From: user [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, October 20, 2005 1:51 PM
 To: Gayn Winters
 Cc: 'Andrew P.'; freebsd-questions@freebsd.org
 Subject: RE: FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs
 
 
 
 Folks,
 
 On Thu, 20 Oct 2005, Gayn Winters wrote:
 
   Imagine that each data block is marked with labels
   on change. It doesn't matter how many labels there
   are, there will be only one data block saved.
  
  In trying to follow this thread, I started looking around for a precise
  definition of snapshot.
  Man mksnap_ffs
  wasn't too helpful, and googling for snapshot etc. wasn't fruitful.
  I'm guessing that the original author of the thread (user at dhp.com)
  may also need such a definition.  Can someone provide a pointer to a
  specification or at least an RFC-like paper?
 
 
 I found one:
 
 http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/ufs/ffs/README.snapshot?rev=1.4
 
 and further, I did some tests and discovered that what I was being told
 (by you folks) was indeed correct.
 
 No matter how many snapshots you have, the changes in blocks since the
 time before the first snapshot are only recorded in one of them.  That is
 to say, if I do the following:
 
 - create 4 1gig /dev/zero filled files
 - create a snapshot
 - overwrite one of those 1gig files with /dev/random
 
 My free space will have decreased by 1gig.  So far so good.
 
 If I then:
 
 - create a second snapshot
 - overwrite a different 1gig file with /dev/random
 
 My free space merely decreases by another 1gig.  It makes sense to me now
 because it has occurred to me that since the second file had not changed
 between the creation of the first and second snapshot, there is no reason
 for _both_ snapshots to _both_ say this 1gig random file used to be
 filled with zeros - it would be redundant.
 
 So that's great ... but I am curious, how do they know ?  I think my
 previous assumption (that the first _and_ the second snapshot file would
 _both_ have to record the change of file #2 from zero to random) was based
 on the notion that these snapshot files were totally autonomous and
 independent, and had no general organization behind them.  If that was the
 case, then I am still fairly certain both snapshots would need to record
 the change of the second file.
 
 So what is the behind the scenes organization that makes it possible for
 the snapshot files to not duplicate data like that ?
 
 ALSO,
 
 I have noticed that if you:
 
 - dd 1gig /dev/zero file
 - create snapshot
 - overwrite that 1gig file with /dev/random
 
 (free space decreases by 1gig, as expected)
 
 - rewrite that 1gig file with /dev/zero again
 
 You _don't_ get that 1gig of free space back ... which surprises me, since
 it was all zeros before, and it's all zeros now ... how does the snapshot
 know those are different zeros ?  And what ramifications does this have
 for restoring, etc., if identical files do not get counted as identical in
 the snapshot ?
 
 thanks.
 

I just finished skimming an old paper by McKusick on Soft Updates:
http://www.usenix.org/publications/library/proceedings/usenix99/full_papers/mckusick/mckusick.pdf
This paper is dated 1999.  Does anyone know if it accurately reflects
how soft updates and snapshots in FreeBSD 5.4 are implemented?  If so,
it would answer the above questions.

-gayn




Re: FreeBSD UFS2 snapshots, and math ...

2005-10-20 Thread Andrew P.
On 10/21/05, Gayn Winters [EMAIL PROTECTED] wrote:
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Andrew P.
  Sent: Thursday, October 20, 2005 12:35 PM
  To: user
  Cc: freebsd-questions@freebsd.org
  Subject: Re: FreeBSD UFS2 snapshots, and math ...

  Imagine that each data block is marked with labels
  on change. It doesn't matter how many labels there
  are, there will be only one data block saved.

 In trying to follow this thread, I started looking around for a precise
 definition of snapshot.
 Man mksnap_ffs
 wasn't too helpful, and googling for snapshot etc. wasn't fruitful.
 I'm guessing that the original author of the thread (user at dhp.com)
 may also need such a definition.  Can someone provide a pointer to a
 specification or at least an RFC-like paper?

 Thanks,

 -gayn




Here ya go: http://www.mckusick.com/softdep/