Re: FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs
user <[EMAIL PROTECTED]> writes:

> On 21 Oct 2005, Lowell Gilbert wrote:
>
> > The snapshot doesn't know what the bits in the file are. All it knows is that the file's data used to be, say, in "block 1857" and now the file's data are in "block 1956". The fact that both blocks are identical is not detected.
> >
> > If you're really interested in this, I suggest reading a decent operating systems book. It's a lot easier to understand the specific implementation when you have a good grip on the standard terminology and principles.
>
> Thanks very much for your help. I am going to read a book or two - my plan was to start with "The Design and Implementation of the 4.4BSD Operating System", but I wanted to update it with more modern information - like snapshots, etc., which I will do with those URLs we have already posted RE: the snapshot work.
>
> If you have any others, let me know.

Yes. Start with something more basic, because McKusick's books assume that you are already acquainted with the standard terminology. Tanenbaum's are the usual recommendations. And when you do get to McKusick, you'll do a lot better with the new "Design and Implementation of the FreeBSD Operating System," which covers a lot of these recent improvements.

___ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs
On 21 Oct 2005, Lowell Gilbert wrote:

> The snapshot doesn't know what the bits in the file are. All it knows is that the file's data used to be, say, in "block 1857" and now the file's data are in "block 1956". The fact that both blocks are identical is not detected.
>
> If you're really interested in this, I suggest reading a decent operating systems book. It's a lot easier to understand the specific implementation when you have a good grip on the standard terminology and principles.

Thanks very much for your help. I am going to read a book or two - my plan was to start with "The Design and Implementation of the 4.4BSD Operating System", but I wanted to update it with more modern information - like snapshots, etc., which I will do with those URLs we have already posted RE: the snapshot work.

If you have any others, let me know.
Re: FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs
user <[EMAIL PROTECTED]> writes:

> Folks,
>
> On Thu, 20 Oct 2005, Gayn Winters wrote:
>
> > > Imagine that each data block is marked with labels
> > > on change. It doesn't matter how many labels there
> > > are, there will be only one data block saved.
> >
> > In trying to follow this thread, I started looking around for a precise definition of snapshot. Man mksnap_ffs wasn't too helpful, and googling for "snapshot" etc. wasn't fruitful. I'm guessing that the original author of the thread (user at dhp.com) may also need such a definition. Can someone provide a pointer to a specification or at least an RFC-like paper?
>
> I found one:
>
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/ufs/ffs/README.snapshot?rev=1.4
>
> and further, I did some tests and discovered that what I was being told (by you folks) was indeed correct.
>
> No matter how many snapshots you have, the changes in blocks since the time before the first snapshot are only recorded in one of them. That is to say, if I do the following:
>
> - create 4 1gig /dev/zero filled files
> - create a snapshot
> - overwrite one of those 1gig files with /dev/random
>
> My free space will have decreased by 1gig. So far so good.
>
> If I then:
>
> - create a second snapshot
> - overwrite a different 1gig file with /dev/random
>
> My free space merely decreases by another 1gig. It makes sense to me now because it has occurred to me that since the second file had not changed between the creation of the first and second snapshot, there is no reason for _both_ snapshots to _both_ say "this 1gig random file used to be filled with zeros" - it would be redundant.
>
> So that's great ... but I am curious, how do they know ? I think my previous assumption (that the first _and_ the second snapshot file would _both_ have to record the change of file #2 from zero to random) was based on the notion that these snapshot files were totally autonomous and independent, and had no general organization behind them. If that was the case, then I am still fairly certain both snapshots would need to record the change of the second file.

Yes, they both need to notice, but they can share the actual copy of the data.

> So what is the behind-the-scenes organization that makes it possible for the snapshot files to not duplicate data like that ?

Without trying to give a whole course in filesystems (there are books available if you want to go in depth), the data in the file is held in a number of data blocks, but there is meta-data that tells where the data is. When a file is overwritten, the snapshots continue to use the old version of the meta-data, which continues to point to the old data, while the "real" filesystem creates a new meta-data container pointing to new data blocks. If you then make another snapshot, the snapshot will use the new meta-data and its associated underlying data. It's an application of the "copy-on-write" principle.

http://en.wikipedia.org/wiki/Copy-on-write

> ALSO,
>
> I have noticed that if you:
>
> - dd 1gig /dev/zero file
> - create snapshot
> - overwrite that 1gig file with /dev/random
>
> (free space decreases by 1gig, as expected)
>
> - rewrite that 1gig file with /dev/zero again
>
> You _don't_ get that 1gig of free space back ... which surprises me, since it was all zeros before, and it's all zeros now ... how does the snapshot know those are "different zeros" ? And what ramifications does this have for restoring, etc., if identical files do not get counted as identical in the snapshot ?

The snapshot doesn't know what the bits in the file are. All it knows is that the file's data used to be, say, in "block 1857" and now the file's data are in "block 1956". The fact that both blocks are identical is not detected.

If you're really interested in this, I suggest reading a decent operating systems book. It's a lot easier to understand the specific implementation when you have a good grip on the standard terminology and principles.
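To make the copy-on-write explanation concrete, here is a minimal Python sketch of the model described above: file data lives in numbered blocks, a snapshot is only a frozen copy of the block-pointer meta-data, and an overwrite gets new blocks instead of touching the old ones. `ToyFS`, `ZERO`, `RAND` and the demo functions are hypothetical names for illustration. This is not the UFS2 kernel code, whose details differ, and it ignores the space taken by the snapshot files' own meta-data; each "1gig" file is just 4 blocks.

```python
ZERO, RAND = b"\0", b"?"   # stand-ins for /dev/zero and /dev/random block contents


class ToyFS:
    """Hypothetical block store following the copy-on-write model described above."""

    def __init__(self):
        self.blocks = {}      # block number -> contents
        self.next_blk = 0     # next unused block number
        self.live = {}        # file name -> list of block numbers (the live meta-data)
        self.snapshots = {}   # snapshot name -> frozen copy of the meta-data

    def _alloc(self, data):
        blk = self.next_blk
        self.next_blk += 1
        self.blocks[blk] = data
        return blk

    def _referenced(self):
        # Every block still pointed to by the live files or by any snapshot.
        refs = {b for blks in self.live.values() for b in blks}
        for meta in self.snapshots.values():
            refs.update(b for blks in meta.values() for b in blks)
        return refs

    def _release(self, candidates):
        # Free a block only when nothing references it any more.
        refs = self._referenced()
        for b in candidates:
            if b not in refs:
                self.blocks.pop(b, None)

    def write_file(self, name, chunks):
        # An overwrite always gets fresh blocks; old and new contents are never compared.
        old = self.live.get(name, [])
        self.live[name] = [self._alloc(c) for c in chunks]
        self._release(old)

    def snapshot(self, snap):
        # A new snapshot copies block pointers only, so it costs no data blocks.
        self.snapshots[snap] = {f: list(blks) for f, blks in self.live.items()}

    def delete_snapshot(self, snap):
        meta = self.snapshots.pop(snap)
        self._release([b for blks in meta.values() for b in blks])

    def used(self):
        return len(self.blocks)


def demo_two_snapshots():
    """First test: four zero-filled files, two snapshots, two overwrites."""
    fs = ToyFS()
    for name in "abcd":
        fs.write_file(name, [ZERO] * 4)
    sizes = [fs.used()]
    fs.snapshot("s1")
    fs.write_file("a", [RAND] * 4)       # s1 keeps a's old blocks
    sizes.append(fs.used())
    fs.snapshot("s2")
    fs.write_file("b", [RAND] * 4)       # s1 and s2 share b's old blocks
    sizes.append(fs.used())
    return sizes


def demo_different_zeros():
    """Second test: zeros, snapshot, random, then zeros again."""
    fs = ToyFS()
    fs.write_file("f", [ZERO] * 4)
    fs.snapshot("s1")
    fs.write_file("f", [RAND] * 4)
    fs.write_file("f", [ZERO] * 4)       # same bytes as before, but new blocks
    return fs.used()


print(demo_two_snapshots())    # [16, 20, 24]
print(demo_different_zeros())  # 8, not 4
```

Each overwrite costs one file's worth of blocks however many snapshots exist, and writing zeros back does not return the space, because the model (like the snapshot described above) tracks block numbers, not contents.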
Re: FreeBSD UFS2 snapshots, and math ...
On 10/21/05, Gayn Winters <[EMAIL PROTECTED]> wrote:
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Andrew P.
> > Sent: Thursday, October 20, 2005 12:35 PM
> > To: user
> > Cc: freebsd-questions@freebsd.org
> > Subject: Re: FreeBSD UFS2 snapshots, and math ...
> >
> > Imagine that each data block is marked with labels
> > on change. It doesn't matter how many labels there
> > are, there will be only one data block saved.
>
> In trying to follow this thread, I started looking around for a precise definition of snapshot. Man mksnap_ffs wasn't too helpful, and googling for "snapshot" etc. wasn't fruitful. I'm guessing that the original author of the thread (user at dhp.com) may also need such a definition. Can someone provide a pointer to a specification or at least an RFC-like paper?
>
> Thanks,
>
> -gayn

Here ya go: http://www.mckusick.com/softdep/
RE: FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs
> -----Original Message-----
> From: user [mailto:[EMAIL PROTECTED]
> Sent: Thursday, October 20, 2005 1:51 PM
> To: Gayn Winters
> Cc: 'Andrew P.'; freebsd-questions@freebsd.org
> Subject: RE: FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs
>
> Folks,
>
> On Thu, 20 Oct 2005, Gayn Winters wrote:
>
> > > Imagine that each data block is marked with labels
> > > on change. It doesn't matter how many labels there
> > > are, there will be only one data block saved.
> >
> > In trying to follow this thread, I started looking around for a precise definition of snapshot. Man mksnap_ffs wasn't too helpful, and googling for "snapshot" etc. wasn't fruitful. I'm guessing that the original author of the thread (user at dhp.com) may also need such a definition. Can someone provide a pointer to a specification or at least an RFC-like paper?
>
> I found one:
>
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/ufs/ffs/README.snapshot?rev=1.4
>
> and further, I did some tests and discovered that what I was being told (by you folks) was indeed correct.
>
> No matter how many snapshots you have, the changes in blocks since the time before the first snapshot are only recorded in one of them. That is to say, if I do the following:
>
> - create 4 1gig /dev/zero filled files
> - create a snapshot
> - overwrite one of those 1gig files with /dev/random
>
> My free space will have decreased by 1gig. So far so good.
>
> If I then:
>
> - create a second snapshot
> - overwrite a different 1gig file with /dev/random
>
> My free space merely decreases by another 1gig. It makes sense to me now because it has occurred to me that since the second file had not changed between the creation of the first and second snapshot, there is no reason for _both_ snapshots to _both_ say "this 1gig random file used to be filled with zeros" - it would be redundant.
>
> So that's great ... but I am curious, how do they know ? I think my previous assumption (that the first _and_ the second snapshot file would _both_ have to record the change of file #2 from zero to random) was based on the notion that these snapshot files were totally autonomous and independent, and had no general organization behind them. If that was the case, then I am still fairly certain both snapshots would need to record the change of the second file.
>
> So what is the behind-the-scenes organization that makes it possible for the snapshot files to not duplicate data like that ?
>
> ALSO,
>
> I have noticed that if you:
>
> - dd 1gig /dev/zero file
> - create snapshot
> - overwrite that 1gig file with /dev/random
>
> (free space decreases by 1gig, as expected)
>
> - rewrite that 1gig file with /dev/zero again
>
> You _don't_ get that 1gig of free space back ... which surprises me, since it was all zeros before, and it's all zeros now ... how does the snapshot know those are "different zeros" ? And what ramifications does this have for restoring, etc., if identical files do not get counted as identical in the snapshot ?
>
> thanks.

I just finished skimming an old paper by McKusick on Soft Updates:

http://www.usenix.org/publications/library/proceedings/usenix99/full_papers/mckusick/mckusick.pdf

This paper is dated 1999. Does anyone know if it accurately reflects how soft updates and snapshots in FreeBSD 5.4 are implemented? If so, it would answer the above questions.

-gayn
RE: FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs
Folks,

On Thu, 20 Oct 2005, Gayn Winters wrote:

> > Imagine that each data block is marked with labels
> > on change. It doesn't matter how many labels there
> > are, there will be only one data block saved.
>
> In trying to follow this thread, I started looking around for a precise definition of snapshot. Man mksnap_ffs wasn't too helpful, and googling for "snapshot" etc. wasn't fruitful. I'm guessing that the original author of the thread (user at dhp.com) may also need such a definition. Can someone provide a pointer to a specification or at least an RFC-like paper?

I found one:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/ufs/ffs/README.snapshot?rev=1.4

and further, I did some tests and discovered that what I was being told (by you folks) was indeed correct.

No matter how many snapshots you have, the changes in blocks since the time before the first snapshot are only recorded in one of them. That is to say, if I do the following:

- create 4 1gig /dev/zero filled files
- create a snapshot
- overwrite one of those 1gig files with /dev/random

My free space will have decreased by 1gig. So far so good.

If I then:

- create a second snapshot
- overwrite a different 1gig file with /dev/random

My free space merely decreases by another 1gig. It makes sense to me now because it has occurred to me that since the second file had not changed between the creation of the first and second snapshot, there is no reason for _both_ snapshots to _both_ say "this 1gig random file used to be filled with zeros" - it would be redundant.

So that's great ... but I am curious, how do they know ? I think my previous assumption (that the first _and_ the second snapshot file would _both_ have to record the change of file #2 from zero to random) was based on the notion that these snapshot files were totally autonomous and independent, and had no general organization behind them. If that was the case, then I am still fairly certain both snapshots would need to record the change of the second file.

So what is the behind-the-scenes organization that makes it possible for the snapshot files to not duplicate data like that ?

ALSO,

I have noticed that if you:

- dd 1gig /dev/zero file
- create snapshot
- overwrite that 1gig file with /dev/random

(free space decreases by 1gig, as expected)

- rewrite that 1gig file with /dev/zero again

You _don't_ get that 1gig of free space back ... which surprises me, since it was all zeros before, and it's all zeros now ... how does the snapshot know those are "different zeros" ? And what ramifications does this have for restoring, etc., if identical files do not get counted as identical in the snapshot ?

thanks.
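The free-space tests described above can be scripted so that every before/after reading comes from the same source. Below is a minimal sketch using Python's standard `os.statvfs`; the helper names are hypothetical. It measures whole-filesystem free space only, which is what the tests above rely on, and cannot attribute space to an individual snapshot.

```python
import os


def fs_space(path):
    """Return total, free and available bytes for the filesystem containing `path`."""
    st = os.statvfs(path)
    return {
        "total": st.f_blocks * st.f_frsize,
        "free": st.f_bfree * st.f_frsize,    # includes blocks reserved for root
        "avail": st.f_bavail * st.f_frsize,  # space available to ordinary users
    }


def consumed(before, after):
    """Bytes of free space used up between two fs_space() readings."""
    return before["free"] - after["free"]


def used_percent(space):
    """Percentage of the filesystem's blocks that are in use."""
    if space["total"] == 0:
        return 0.0
    return 100.0 * (space["total"] - space["free"]) / space["total"]
```

Take one `fs_space()` reading on the filesystem's mount point (for example a hypothetical `/data`) before creating the snapshot and another after each overwrite, then pass the pair to `consumed()`; dividing by `2**30` gives the figure in GiB.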
RE: FreeBSD UFS2 snapshots, and math ...
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Andrew P.
> Sent: Thursday, October 20, 2005 12:35 PM
> To: user
> Cc: freebsd-questions@freebsd.org
> Subject: Re: FreeBSD UFS2 snapshots, and math ...
>
> Imagine that each data block is marked with labels
> on change. It doesn't matter how many labels there
> are, there will be only one data block saved.

In trying to follow this thread, I started looking around for a precise definition of snapshot. Man mksnap_ffs wasn't too helpful, and googling for "snapshot" etc. wasn't fruitful. I'm guessing that the original author of the thread (user at dhp.com) may also need such a definition. Can someone provide a pointer to a specification or at least an RFC-like paper?

Thanks,

-gayn
Re: FreeBSD UFS2 snapshots, and math ...
On 10/20/05, user <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> On 20 Oct 2005, Lowell Gilbert wrote:
>
> > user <[EMAIL PROTECTED]> writes:
> >
> > > Let's say I have a filesystem, and on that filesystem I create a snapshot every single night, and every night I delete the snapshot from 5 nights ago. This means that at all times, I have four snapshots running on that filesystem, one from 1 day ago, one from 2 days ago, one from 3 days ago, and one from 4 days ago.
> > >
> > > Let's also assume that the percent change of the filesystem is 5% (every day 5% of the blocks in the filesystem are either changed or deleted).
> > >
> > > Does this mean that, if that 5% change is a different 5% every day, the one-day-ago snapshot will be size 5%_of_filesystem, and the 2-day-ago snapshot will be size 10%_of_filesystem, day 3 15% and day 4 20%, for a total of 50% of the total filesystem taken up with snapshot data ?
> >
> > No. There is one copy of each version of the file that exists in any snapshot, regardless of how many snapshots it's in.
>
> That doesn't make much sense to me ... if the snapshot keeps track of changed_data_since_snapshot_was_taken, then ...
>
> Well, think of it this way - let's say I have a 1G filesystem, which contains a single 500M text file. Now let's say I snapshot that FS. At this point, the snapshot takes up 0 bytes. Now let's say the next day I alter 10% (50M) of that single 500M file - now the snapshot takes up that exact same amount of space, namely, 50M.
>
> Now I create a second snapshot, which immediately takes up 0 bytes. The next day, I change a totally different 50M of my text file ... so now, the first snapshot needs to keep track of yesterday's 50M of changes/deletions as well as today's, because today's operates on totally different disk blocks. So now the 2-day-ago snapshot is size 100M, and the snapshot from one day ago is now 50M.
>
> I think my interpretation is correct ... can you look over my and your conclusions again ?
>
> > > The second question is this:
> > >
> > > If the 5% data changed per day is the _same_ 5% every day (perhaps changing the same table in a DB every day, or perhaps changing the same block of lines in a text file every day) does that mean that every day simply represents 5%_of_filesystem, for a total of 20% of the total filesystem in use at all times for snapshot data ?
> >
> > Whether it's the same data or not doesn't affect how much space you use.
>
> Yeah ... see, I think it does matter, for the reasons above ... if, as in this second example, I am changing the same blocks on disk every day, the snapshot just needs to keep track of them once, namely "this is what they were during the snapshot, and you can change those same blocks all you want, I just need to keep track of what they were when you took the snapshot."
>
> comments ?

What makes you so sure, I wonder? Imagine that each data block is marked with labels on change. It doesn't matter how many labels there are, there will be only one data block saved.
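The labeling picture can be sketched in a few lines of Python, applied to the 500M-file example quoted above, split into ten hypothetical 50M chunks. Each preserved old version of a chunk is stored once and carries a label for every snapshot that still needs it. The function name `saved_versions` and the event format are illustrative assumptions, not anything taken from the FreeBSD sources.

```python
def saved_versions(num_chunks, events):
    """Replay ("snap", name) and ("write", chunk_ids) events.

    Returns {(chunk, version): set_of_snapshot_labels} for every old version
    that some snapshot still needs. Each key is stored once, however many
    labels it carries.
    """
    version = [0] * num_chunks    # current version number of each chunk
    frozen = {}                   # snapshot name -> versions at snapshot time
    for kind, arg in events:
        if kind == "snap":
            frozen[arg] = list(version)
        elif kind == "write":
            for c in arg:
                version[c] += 1
    labels = {}
    for name, seen in frozen.items():
        for c, v in enumerate(seen):
            if v != version[c]:   # live data moved on; this old version must be kept
                labels.setdefault((c, v), set()).add(name)
    return labels


CHUNK_MB = 50   # the quoted example: a 500M file as ten 50M pieces
events = [
    ("snap", "2-days-ago"),
    ("write", [0]),             # day 1: change the first 50M
    ("snap", "1-day-ago"),
    ("write", [1]),             # day 2: change a totally different 50M
]
kept = saved_versions(10, events)
per_snapshot = {s: CHUNK_MB * sum(s in lbls for lbls in kept.values())
                for s in ("2-days-ago", "1-day-ago")}
stored = CHUNK_MB * len(kept)
print(per_snapshot, stored)     # {'2-days-ago': 100, '1-day-ago': 50} 100
```

Both snapshots "see" the second 50M change, as expected above, but that old data carries two labels and is kept once, so the space actually consumed is 100M rather than 150M.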
Re: FreeBSD UFS2 snapshots, and math ...
Hello,

On 20 Oct 2005, Lowell Gilbert wrote:

> user <[EMAIL PROTECTED]> writes:
>
> > Let's say I have a filesystem, and on that filesystem I create a snapshot every single night, and every night I delete the snapshot from 5 nights ago. This means that at all times, I have four snapshots running on that filesystem, one from 1 day ago, one from 2 days ago, one from 3 days ago, and one from 4 days ago.
> >
> > Let's also assume that the percent change of the filesystem is 5% (every day 5% of the blocks in the filesystem are either changed or deleted).
> >
> > Does this mean that, if that 5% change is a different 5% every day, the one-day-ago snapshot will be size 5%_of_filesystem, and the 2-day-ago snapshot will be size 10%_of_filesystem, day 3 15% and day 4 20%, for a total of 50% of the total filesystem taken up with snapshot data ?
>
> No. There is one copy of each version of the file that exists in any snapshot, regardless of how many snapshots it's in.

That doesn't make much sense to me ... if the snapshot keeps track of changed_data_since_snapshot_was_taken, then ...

Well, think of it this way - let's say I have a 1G filesystem, which contains a single 500M text file. Now let's say I snapshot that FS. At this point, the snapshot takes up 0 bytes. Now let's say the next day I alter 10% (50M) of that single 500M file - now the snapshot takes up that exact same amount of space, namely, 50M.

Now I create a second snapshot, which immediately takes up 0 bytes. The next day, I change a totally different 50M of my text file ... so now, the first snapshot needs to keep track of yesterday's 50M of changes/deletions as well as today's, because today's operates on totally different disk blocks. So now the 2-day-ago snapshot is size 100M, and the snapshot from one day ago is now 50M.

I think my interpretation is correct ... can you look over my and your conclusions again ?

> > The second question is this:
> >
> > If the 5% data changed per day is the _same_ 5% every day (perhaps changing the same table in a DB every day, or perhaps changing the same block of lines in a text file every day) does that mean that every day simply represents 5%_of_filesystem, for a total of 20% of the total filesystem in use at all times for snapshot data ?
>
> Whether it's the same data or not doesn't affect how much space you use.

Yeah ... see, I think it does matter, for the reasons above ... if, as in this second example, I am changing the same blocks on disk every day, the snapshot just needs to keep track of them once, namely "this is what they were during the snapshot, and you can change those same blocks all you want, I just need to keep track of what they were when you took the snapshot."

comments ?
Re: FreeBSD UFS2 snapshots, and math ...
user <[EMAIL PROTECTED]> writes:

> I am trying to budget some disk space for filesystems with snapshots enabled on them.
>
> The following is simplified - I am just trying to get my concepts in order:
>
> Let's say I have a filesystem, and on that filesystem I create a snapshot every single night, and every night I delete the snapshot from 5 nights ago. This means that at all times, I have four snapshots running on that filesystem, one from 1 day ago, one from 2 days ago, one from 3 days ago, and one from 4 days ago.
>
> Let's also assume that the percent change of the filesystem is 5% (every day 5% of the blocks in the filesystem are either changed or deleted).
>
> Does this mean that, if that 5% change is a different 5% every day, the one-day-ago snapshot will be size 5%_of_filesystem, and the 2-day-ago snapshot will be size 10%_of_filesystem, day 3 15% and day 4 20%, for a total of 50% of the total filesystem taken up with snapshot data ?

No. There is one copy of each version of the file that exists in any snapshot, regardless of how many snapshots it's in.

> Does that sound correct ? When I say that the 5% change is a different 5% every day, what I mean is that it is not the same files/data being altered every day, but rather there is 5% of new data changed every day, relative to the previous night's snapshot.
>
> The second question is this:
>
> If the 5% data changed per day is the _same_ 5% every day (perhaps changing the same table in a DB every day, or perhaps changing the same block of lines in a text file every day) does that mean that every day simply represents 5%_of_filesystem, for a total of 20% of the total filesystem in use at all times for snapshot data ?

Whether it's the same data or not doesn't affect how much space you use.
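This answer can be checked against both scenarios in the original question with a small simulation. It is a sketch under simplifying assumptions: the filesystem is 100 equal chunks (1% each), a snapshot is taken each night and the one from 5 nights ago is deleted, 5 chunks change each day, and every preserved old version is stored exactly once. The space used by the snapshot files' own meta-data is ignored, and `retained_percent` is a hypothetical helper name.

```python
from collections import deque


def retained_percent(same_blocks, days=30, keep=4, fs_chunks=100, per_day=5):
    """Percent of the filesystem held only for snapshots, measured just before
    the next nightly snapshot (i.e. with `keep` snapshots aged 1..keep days)."""
    version = [0] * fs_chunks          # current version of each chunk
    snaps = deque()                    # frozen version lists, oldest first
    for day in range(days):
        snaps.append(list(version))    # tonight's snapshot
        if len(snaps) > keep:
            snaps.popleft()            # delete the snapshot from 5 nights ago
        # Today's 5% of blocks: always the same ones, or a fresh set each day.
        start = 0 if same_blocks else (day * per_day) % fs_chunks
        for c in range(start, start + per_day):
            version[c % fs_chunks] += 1
    # One stored copy per distinct (chunk, old version) that any snapshot needs.
    kept = {(c, v) for s in snaps for c, v in enumerate(s) if v != version[c]}
    return 100 * len(kept) / fs_chunks


print(retained_percent(same_blocks=False))   # 20.0
print(retained_percent(same_blocks=True))    # 20.0
```

Under these assumptions both cases come to 4 x 5% = 20% rather than the 50% estimated for the different-blocks case, which is the sense in which it doesn't matter whether the same data changes each day.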
Re: FreeBSD UFS2 snapshots, and math ...
Doug,

On Thu, 20 Oct 2005, Doug Poland wrote:

> On Thu, Oct 20, 2005 at 11:01:42AM -0400, user wrote:
> >
> > Finally, are there any snapshot diag tools at all ? Like, something that reports snapshot sizes, percent of disk used for snapshots, and maybe even a way for me to actually calculate what the percent change for time period X is for a particular filesystem ?
>
> I find sysutils/freebsd-snapshot quite useful, although it doesn't do everything you're asking for.

Thanks - I will check that out.

Any comments on my math ? (changing the same 5% of the FS all the time vs. changing a different 5% each time, and what that means for successive snapshots)

thanks.
Re: FreeBSD UFS2 snapshots, and math ...
On Thu, Oct 20, 2005 at 11:01:42AM -0400, user wrote:
>
> Finally, are there any snapshot diag tools at all ? Like, something that reports snapshot sizes, percent of disk used for snapshots, and maybe even a way for me to actually calculate what the percent change for time period X is for a particular filesystem ?
>

I find sysutils/freebsd-snapshot quite useful, although it doesn't do everything you're asking for.

Be advised there are issues with snapshots of large filesystems.

http://www.freebsd.org/cgi/query-pr-summary.cgi?category=&severity=&priority=&class=&state=&sort=none&text=snapshot&responsible=&multitext=&originator=&release=

My experience shows that multiple snapshots of large filesystems (>30GB) cause the system to become unresponsive when a snapshot is created. If I limit it to one snapshot per large filesystem, the machine often hangs on attempted reboots.

--
Regards,
Doug
FreeBSD UFS2 snapshots, and math ...
I am trying to budget some disk space for filesystems with snapshots enabled on them.

The following is simplified - I am just trying to get my concepts in order:

Let's say I have a filesystem, and on that filesystem I create a snapshot every single night, and every night I delete the snapshot from 5 nights ago. This means that at all times, I have four snapshots running on that filesystem, one from 1 day ago, one from 2 days ago, one from 3 days ago, and one from 4 days ago.

Let's also assume that the percent change of the filesystem is 5% (every day 5% of the blocks in the filesystem are either changed or deleted).

Does this mean that, if that 5% change is a different 5% every day, the one-day-ago snapshot will be size 5%_of_filesystem, and the 2-day-ago snapshot will be size 10%_of_filesystem, day 3 15% and day 4 20%, for a total of 50% of the total filesystem taken up with snapshot data ?

Does that sound correct ? When I say that the 5% change is a different 5% every day, what I mean is that it is not the same files/data being altered every day, but rather there is 5% of new data changed every day, relative to the previous night's snapshot.

The second question is this:

If the 5% data changed per day is the _same_ 5% every day (perhaps changing the same table in a DB every day, or perhaps changing the same block of lines in a text file every day) does that mean that every day simply represents 5%_of_filesystem, for a total of 20% of the total filesystem in use at all times for snapshot data ?

-

Finally, are there any snapshot diag tools at all ? Like, something that reports snapshot sizes, percent of disk used for snapshots, and maybe even a way for me to actually calculate what the percent change for time period X is for a particular filesystem ?

Thank you.