Ming Zhang wrote:
> On Sat, 2005-11-12 at 15:46 -0600, David Masover wrote:
> 
>>Ming Zhang wrote:
>>
>>>On Fri, 2005-11-11 at 16:56 -0800, Peter van Hardenberg wrote:
>>>
>>>
>>>>On November 11, 2005 05:59 am, John Gilmore wrote:
>>>>
>>>>
>>>>>Does anybody remember GoBack? It was a versioning
>>>>>system for windows 95/98 that was incredibly flexible and useful. Tracked
>>>>>all changes to the whole disk. Old versions of a file? no problem. grab an
>>>>>old version of a directory for reference temporarily? easy. Got a virus?
>>>>>revert the whole HD, and then grab the newer copies of your documents and
>>>>>saved games as needed.
>>>>
>>>>My thoughts on this:
>>>>
>>>>The versioning would be an audit plugin. When the file is modified, tag the 
>>>>current version, copy it into a sub-directory (oh, I don't know, say 
>>>>file/.revisions/<number/date>), and disable write access to it. You might not
>>>>even need extended filesystem attributes for this, but they would be handy 
>>>>for tagging particular versions.
>>>
>>>
>>>if a file is opened, modified 2 times, then closed. u will only generate
>>>1 version right? so "When the file is modified" is inaccurate.
>>
>>How about "When the transaction was completed?"  Why does it matter?
> 
> 
> then how u define a transaction? i mean we first need to choose a good
> event/period to define what is a good meaningful version.
> 
> 
> 
>>>>Copy-on-write would make this action extremely cheap, only adding a couple of
>>>>extra writes to make it work.
>>>
>>>
>>>add 1 line at the beginning of a 100MB text file will make this uncheap.
>>
>>Who has to work with 100 meg text files?  And why has this person not
>>broken them down into 100 kilobyte text files?  Storage efficiency isn't
>>really an issue there...
> 
> 
> yes, 100MB text file is an extreme example, but a common case can be u
> delete 1 frame in a streaming media file.

What do you mean by "streaming"?  (To me, "streaming media" usually
means "over the Internet", which makes no sense here.)

> basically, a cow is not good
> for a data shift situation. u have >99% data unchanged, just their
> offset in file is changed. this lead to all blocks changed, then COW
> will need to copy a lot.

When do you have a data shift situation where this is significant enough
to impact COW, but not significant enough to affect normal performance?

As far as I know, *nix has no way to insert data at the beginning of a file,
so if you're editing a large video file, say several gigs of DVD, you
have to write out several gigs worth of data all over again because you
want it shifted.
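
To make the shift problem concrete, here is a minimal Python sketch (mine,
not taken from any real filesystem). It assumes a COW layer that compares
fixed-size blocks by position; the 4096-byte block size and the names
split_blocks and changed_blocks are made up for illustration.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last one may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Count block positions whose contents differ between old and new."""
    old_blocks = split_blocks(old, block_size)
    new_blocks = split_blocks(new, block_size)
    changed = 0
    for i in range(max(len(old_blocks), len(new_blocks))):
        a = old_blocks[i] if i < len(old_blocks) else None
        b = new_blocks[i] if i < len(new_blocks) else None
        if a != b:
            changed += 1
    return changed


# Numbered lines, so a shifted block never happens to match the old one.
original = b"".join(b"%08d\n" % n for n in range(100_000))
appended = original + b"one more line\n"
prepended = b"one more line\n" + original

total = len(split_blocks(original))
print("blocks in file:      ", total)
print("changed by appending:", changed_blocks(original, appended))
print("changed by prepending:", changed_blocks(original, prepended))
```

Appending touches only the last block, while adding the same line at the
front changes every block position, even though almost none of the data is
new. A smarter COW could match blocks by content instead of position, but
that is a different (and more expensive) technique than the one sketched
here.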

The filesystem may eventually provide more intelligent ways of messing
with a file, and the COW system should be able to handle when a program
adds to or chops off the beginning of a file.

Until then, we can rely somewhat on programs optimizing for speed --
rather than rewrite several gigs, it could split the file into smaller
files (thus, only the file which was changed is copied), or make it a
sort of mini-FS in that it fragments the logical structure of the file
so that it writes as little as possible -- for instance, inserting a
clip in the middle might write to the end of a "project" file, instead
of shifting half of that file over first.  You'd keep versions of the
project file, not the stream (properly defragmented) you'd export when
you're done.
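
Here is a rough Python sketch of that "project file" idea, under my own
assumptions: the Project class, its store, and its pieces list are
hypothetical names, not any real editor's format. Clip data is only ever
appended to the store, and the small list of (offset, length) pieces is the
part that changes and would be worth versioning.

```python
class Project:
    """Append-only clip store plus a small edit list (a piece-table sketch)."""

    def __init__(self, initial=b""):
        # Stands in for the big data file; it is only ever appended to.
        self.store = bytearray(initial)
        # (offset, length) pairs into store, in playback order.
        self.pieces = [(0, len(initial))] if initial else []

    def __len__(self):
        return sum(length for _, length in self.pieces)

    def insert(self, pos, clip):
        """Insert clip at logical position pos without shifting stored data."""
        if not 0 <= pos <= len(self):
            raise ValueError("insert position out of range")
        if not clip:
            return
        new_piece = (len(self.store), len(clip))
        self.store += clip  # the only bulk write: the clip goes at the end
        logical = 0
        for i, (start, length) in enumerate(self.pieces):
            if pos < logical + length:
                cut = pos - logical
                left = (start, cut)
                right = (start + cut, length - cut)
                # Split the piece around the new clip, dropping empty halves.
                self.pieces[i:i + 1] = [
                    p for p in (left, new_piece, right) if p[1] > 0
                ]
                return
            logical += length
        self.pieces.append(new_piece)  # pos is at the very end

    def export(self):
        """Write out the properly ordered (defragmented) stream."""
        return b"".join(bytes(self.store[s:s + n]) for s, n in self.pieces)


project = Project(b"AAAABBBB")
project.insert(4, b"xx")  # insert a clip in the middle
print(bytes(project.store))
print(project.pieces)
print(project.export())
```

After the insert, the existing bytes in the store are untouched and the clip
sits at the end; only the short pieces list differs between versions. The
full shifted stream is produced once, at export time.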

For cases where developers didn't have to deal with the speed issues, we
don't have to worry about it.  In the case of audio editing, if it's
actually messing with the sound itself, no COW in the world will catch
that.  If it's a mixing/sequencing program, that's usually stored as a
"project", accompanied by lots of little WAV files, which don't change,
and a tiny "project" file describing how they go together, which does
change.

And for text files and office documents, the sizes just aren't usually
enough for us to care.  My biggest OpenOffice.org document probably
isn't a hundred kilobytes, and my disk space is measured in gigabytes.
It'd take over ten thousand revisions to fill a gig with copies of one
of those files.  Sure, we could make an Oasis plugin for OO.o to use, so
all the contents of the document are stored as individual files, turned
into a zipfile on demand to match the current standard -- but that's not
worth it in the short term, and only really helps with presentations in
the long term.
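
For what "turned into a zipfile on demand" might look like, here is a small
Python sketch using the standard zipfile module. The pack_parts and
unpack_parts helpers and the sample part contents are my own assumptions, and
the result is not a valid OpenDocument package (no mimetype entry or
manifest); it only shows the parts living separately and being zipped when
someone asks for the whole document.

```python
import io
import zipfile


def pack_parts(parts):
    """Zip a {name: bytes} mapping into one archive, built in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(parts):
            archive.writestr(name, parts[name])
    return buf.getvalue()


def unpack_parts(blob):
    """Split an archive back into a {name: bytes} mapping."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


# Each part would be its own small file on disk; a dict stands in for that.
parts = {
    "content.xml": b"<doc>first draft</doc>",
    "styles.xml": b"<styles/>",
}
revised = dict(parts, **{"content.xml": b"<doc>second draft</doc>"})

# Only the part that was edited differs between the two revisions.
changed = [name for name in parts if parts[name] != revised[name]]
print(changed)

blob = pack_parts(revised)  # built only when the single-file form is needed
print(unpack_parts(blob) == revised)
```

With the parts kept separate, a new revision only has to store the part that
changed, and the zip is just an export format.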

Actually, while I think it'd be nice to be able to do more advanced
splicing in a file (append or delete from the beginning or middle), I
think it's more important to come up with a sane way for a program to
access a file as a lot of little pieces, and to have a standard way of
serializing them for transport (email or otherwise).  Kind of like XML,
only it could be more efficient than the old model, instead of less.

Like XML in that XML allows programmers to dump internal structures to a
human-readable file without writing parsers and serializers.  Move the
serializing logic out to the FS, let it handle the performance, version
control, and export issues.
