Re: [zfs-discuss] A versioning FS

2006-10-13 Thread Joerg Schilling
Nicolas Williams [EMAIL PROTECTED] wrote:

 On Wed, Oct 11, 2006 at 08:24:13PM +0200, Joerg Schilling wrote:
  Before we start defining the first official functionality for this Sun
  feature, we should define a mapping for Mac OS, FreeBSD and Linux. It may
  make sense to define a sub directory for the attribute directory for
  keeping old versions of a file.

 Definitely a sub-directory would be needed yes, and I don't agree to the
 first part.

Why not?

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A versioning FS

2006-10-13 Thread Nicolas Williams
On Fri, Oct 13, 2006 at 11:03:51AM +0200, Joerg Schilling wrote:
 Nicolas Williams [EMAIL PROTECTED] wrote:
 
  On Wed, Oct 11, 2006 at 08:24:13PM +0200, Joerg Schilling wrote:
   Before we start defining the first official functionality for this Sun
   feature, we should define a mapping for Mac OS, FreeBSD and Linux. It may
   make sense to define a sub directory for the attribute directory for
   keeping old versions of a file.
 
  Definitely a sub-directory would be needed yes, and I don't agree to the
  first part.
 
 Why not?

Because I don't see how creating a sub-directory of the EA namespace for
storing FVs will step on the toes of anyone trying to map other
platforms' notions of EA onto Solaris'.  Is this being too optimistic?

Nico


Re: [zfs-discuss] A versioning FS

2006-10-11 Thread Joerg Schilling
Nicolas Williams [EMAIL PROTECTED] wrote:

 On Mon, Oct 09, 2006 at 12:44:34PM +0200, Joerg Schilling wrote:
  Nicolas Williams [EMAIL PROTECTED] wrote:
  
   You're arguing for treating FV as extended/named attributes :)
  
   I think that'd be the right thing to do, since we have tools that are
   aware of those already.  Of course, we're talking about somewhat magical
   attributes, but I think that's fine (though, IIRC, NFSv4 [RFC3530] has
   some strange verbiage limiting attributes to applications).
  
  I thought NFSv4 supports extended attributes. What limitation are you
  aware of?

 It does.  I meant this on pg. 12:

  [...]  Named attributes
are meant to be used by client applications as a method to associate
application specific data with a regular file or directory.

FreeBSD and Linux implement something different also called extended attributes.
There should be a possibility to map from FreeBSD/Linux to Solaris.

 and this on pg. 36:

Named attributes are intended for data needed by applications rather
than by an NFS client implementation.  NFS implementors are strongly
encouraged to define their new attributes as recommended attributes
by bringing them to the IETF standards-track process.

See above... Since the extended attributes appeared on Solaris (8 update???),
I have been looking for a way to map simple extended attribute implementations
such as those on Mac OS, FreeBSD and Linux to the more general implementation
on Solaris.

Before we start defining the first official functionality for this Sun feature,
we should define a mapping for Mac OS, FreeBSD and Linux. It may make sense to
define a sub directory for the attribute directory for keeping old versions
of a file.

Jörg



Re: [zfs-discuss] A versioning FS

2006-10-09 Thread przemolicc
On Fri, Oct 06, 2006 at 11:57:36AM -0700, Matthew Ahrens wrote:
 [EMAIL PROTECTED] wrote:
 On Fri, Oct 06, 2006 at 01:14:23AM -0600, Chad Leigh -- Shire.Net LLC 
 wrote:
 But I would dearly like to have a versioning capability.
 
 Me too.
 Example (real life scenario): there is a samba server for about 200
 concurrently connected users. They keep mainly doc/xls files on the
 server.  From time to time they (somehow) corrupt their files (they
 share the files so it is possible) so they are recovered from backup.
 With versioning they could be told that if their main file is
 corrupted they can open the previous version and keep working.
 ZFS snapshots are not a solution in this case because we would have to
 create snapshots for 400 filesystems (yes, each user has their own
 filesystem, and I said that there are 200 concurrent connections, but there
 are many more accounts on the server) each hour or so.
 
 I completely disagree.  In this scenario (and almost all others), use of 
 regular snapshots will solve the problem.  'zfs snapshot -r' is 
 extremely fast, and I'm working on some new features that will make 
 using snapshots for this even easier and better-performing.
 
 If you disagree, please tell us *why* you think snapshots don't solve 
 the problem.

Matt,

think of the night, when some (maybe 5 %) people still work. With snapshots
I would still have to create snapshots for 400 filesystems each hour because I
don't know which of them are in use. And what about the weekend? Still
400 snapshots each hour? And 'zfs list' will list me 400*24*2=19200 lines?
And how about organizations which have thousands of people and keep their
files on one server? Or ISPs/free e-mail account providers who have millions?
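
As a quick check of the arithmetic above, here is a minimal sketch. The function name is made up for illustration; the 400 filesystems, the hourly schedule and the two-day weekend are the figures from this message, and the one-week figure is simply the same formula extended.

```python
def snapshot_count(filesystems, snapshots_per_day, days):
    """Number of snapshots accumulated if every filesystem is snapshotted
    on the same fixed schedule and nothing is ever destroyed."""
    return filesystems * snapshots_per_day * days


# Figures from the message: 400 filesystems, one snapshot per hour.
print(snapshot_count(400, 24, 2))  # one weekend: 19200
print(snapshot_count(400, 24, 7))  # one full week: 67200
```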

Imagine just ordinary people who use ZFS in their homes and forget to
create snapshots? Or they turn their computer on once and then don't
turn it off: they work daily (and create a snapshot an hour) and don't
turn it off in the evening but leave it working and downloading some
films and music. Still one snapshot an hour? How many snapshots a
day, a week, a month? Thousands? And given that ZFS is _so_easy_ to use,
is managing so many snapshots a ZFS-like feature? (ZFS-like =
extremely easy).

The way ZFS is working right now is that it cares about disks
(checksumming), redundancy (raid*) and performance. Having versioning
would let ZFS care about people's mistakes. And people do make mistakes.
Yes, Matt, you are right that snapshots are a feature which might be used
here but it is not the most convenient in such scenarios. Snapshots are
probably much more useful than versioning in predictable scenarios: backup at
night, software development (commit new version) etc.  In a highly
unpredictable environment (many users working at _different_ hours in
different parts of the world) you would have to create many thousands of
snapshots. Dealing with them might be painful.

Matt, I agree with you that having snapshots *solves* the problem with
400 filesystems because in an SVM/UFS environment I _wouldn't_ have such
a solution. But I feel that versioning would be much more convenient here.
Imagine that you are the admin of the server and ZFS has versioning: having a
choice, what would you choose in this case?

przemol


Re: [zfs-discuss] A versioning FS

2006-10-09 Thread przemolicc
On Fri, Oct 06, 2006 at 02:08:34PM -0700, Erik Trimble wrote:
 Also, save-early-save-often  results in a version explosion, as does 
 auto-save in the app.  While this may indeed mean that you have all of 
 your changes around, figuring out which version has them can be 
 massively time-consuming.  Let's say you have auto-save set for 5 
 minutes (very common in MS Word). That gives you 12 versions per hour.  
 If you suddenly decide you want to back up a couple of hours, that 
 leaves you with looking at a whole bunch of files, trying to figure out 
 which one you want.  E.g. I want a file from about 3 hours ago. Do I 
 want the one from 2:45, 2:50, 2:55, 3:00, 3:05, 3:10, or 3:15 hours 
 ago?  And, what if I've mis-remembered, and it really was closer to 4 
 hours ago?  Yes, the data is eventually there. However, wouldn't a 
 1-hour snapshot capability have saved you an enormous amount of time, by 
 being able to simplify your search (and, yes, you won't have _exactly_ 
 the version you want, but odds are you will have something close, and 
 you can put all the time you would have spent searching the FV tree into 
 restarting work from the snapshot-ed version).

Erik,

versioning could be managed by a sort of versioning policy managed by
users. E.g. if a file which is going to be saved right now (auto-saving)
has a previous version saved within the last 30 minutes, don't create another
previous version.
10:00   open file f.xls
10:10   (...working...)
10:20   file.xls;1  (...auto save...)
10:30   (...working...)
10:40   (...auto save...) - don't create another version, because
        within the last 30 minutes there is another, previous
        version

Another policy might be based on the number of previous versions: e.g. if there
are more than 10, purge the oldest.
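
The two policies above could be sketched roughly as follows. This is only an illustration, not an existing ZFS interface: the function names are hypothetical, versions are represented by nothing more than their save timestamps, and the 30-minute interval and the limit of 10 are taken from the examples in this message.

```python
from datetime import datetime, timedelta

MIN_INTERVAL = timedelta(minutes=30)  # from the example: one version per 30 minutes
MAX_VERSIONS = 10                     # from the example: keep at most 10 versions


def should_create_version(version_times, now, min_interval=MIN_INTERVAL):
    """Return True if no previous version was saved within min_interval."""
    if not version_times:
        return True
    return now - max(version_times) >= min_interval


def purge_old_versions(version_times, max_versions=MAX_VERSIONS):
    """Keep only the newest max_versions timestamps (oldest are purged)."""
    return sorted(version_times)[-max_versions:]


def on_save(version_times, now):
    """Apply both policies to one save event; return the new version list."""
    if should_create_version(version_times, now):
        version_times = version_times + [now]
    return purge_old_versions(version_times)


# Replaying the timeline: the 10:20 auto-save creates a version,
# the 10:40 auto-save does not, because only 20 minutes have passed.
versions = on_save([], datetime(2006, 10, 9, 10, 20))
versions = on_save(versions, datetime(2006, 10, 9, 10, 40))
print(len(versions))
```

A real policy would presumably be a per-filesystem or per-user setting; here the thresholds are plain module constants to keep the sketch short.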


 [...]
 
 
 To me, FV is/was very useful in TOPS-20 and VMS, where you were looking 
 at a system DESIGNED with the idea in mind, already have a user base 
 trained to use and expect it, and virtually all usage was local (i.e. no 
 network filesharing). None of this is true in the UNIX/POSIX world.

Versioning could be turned off per filesystem. And also could be
inherited from a parent - exactly like current compression.

przemol


Re: [zfs-discuss] A versioning FS

2006-10-09 Thread Joerg Schilling
Erik Trimble [EMAIL PROTECTED] wrote:

  The only idea I get that matches these criteria is to have the versions
  in the extended attribute name space.
 
  Jörg
 

 Realistically speaking, that's my conclusion, if we want a nice clean, 
 well-designed solution. You need to hide the versioning info in the 
 meta-tags, and create a whole new API for accessing/manipulating them.   
 This easily solves (1) and (2) above, but (3) is the huge problem, as 
 having a new API means you need to change the SMB/NFS protocols to allow 
 for client machines to access the new API.  With the new Windows NTFS 

There is no need to extend NFS as NFS v4 already supports extended attributes.

Jörg



Re: [zfs-discuss] A versioning FS

2006-10-09 Thread Joerg Schilling
Nicolas Williams [EMAIL PROTECTED] wrote:

 On Sat, Oct 07, 2006 at 01:43:29PM +0200, Joerg Schilling wrote:
  The only idea I get that matches these criteria is to have the versions
  in the extended attribute name space.

 Indeed.  All that's needed then, CLI UI-wise, beyond what we have now is
 a way to rename version extended attributes to new files, or at least
 copy them (we have the latter).  And it nicely hides versions.  And it
 nicely provides an API for creating them on demand (magic extended
 attributes), and remote access.


The infrastructure is there - local or remote via NFSv4 - the problem
is that the extended attribute name space lacks definitions for usage.

Jörg



Re: [zfs-discuss] A versioning FS

2006-10-09 Thread Joerg Schilling
Nicolas Williams [EMAIL PROTECTED] wrote:

 You're arguing for treating FV as extended/named attributes :)

 I think that'd be the right thing to do, since we have tools that are
 aware of those already.  Of course, we're talking about somewhat magical
 attributes, but I think that's fine (though, IIRC, NFSv4 [RFC3530] has
 some strange verbiage limiting attributes to applications).

I thought NFSv4 supports extended attributes. What limitation are you
aware of?


Jörg



Re: [zfs-discuss] A versioning FS

2006-10-09 Thread Erik Trimble

Joseph Mocker wrote:
However would it be great if I could somehow easily FV  a file I am 
working on with some arbitrary (closed) application I am forced to use 
without the application really knowing about it, and with little or no 
actions I have to take to do so?



To paraphrase an old wives' tale:

That ain't gonna happen in my lifetime.

I think that this discussion thread has determined that you DON'T want 
to make file versioning visible to un-modified applications.


That said, given that it looks like a new FV API is the current favorite 
implementation way, I see no reason why you can't open your favorite 
app, edit FOO, and have ZFS do versioning on it behind the scenes.  You 
just won't be able to see the versions from inside your App. You'd need 
a FV-aware app (whether a GUI file browser or cmdline util) to access the 
old versions, and potentially copy an older version to a new filename, 
allowing you to edit it in your non-FV-aware App.


-Erik





Re: [zfs-discuss] A versioning FS

2006-10-09 Thread Nicolas Williams
On Mon, Oct 09, 2006 at 12:44:34PM +0200, Joerg Schilling wrote:
 Nicolas Williams [EMAIL PROTECTED] wrote:
 
  You're arguing for treating FV as extended/named attributes :)
 
  I think that'd be the right thing to do, since we have tools that are
  aware of those already.  Of course, we're talking about somewhat magical
  attributes, but I think that's fine (though, IIRC, NFSv4 [RFC3530] has
  some strange verbiage limiting attributes to applications).
 
 I thought NFSv4 supports extended attributes. What limitation are you
 aware of?

It does.  I meant this on pg. 12:

 [...]  Named attributes
   are meant to be used by client applications as a method to associate
   application specific data with a regular file or directory.

and this on pg. 36:

   Named attributes are intended for data needed by applications rather
   than by an NFS client implementation.  NFS implementors are strongly
   encouraged to define their new attributes as recommended attributes
   by bringing them to the IETF standards-track process.

and this on pg. 232:

17.1.  Named Attribute Definition

   The NFS version 4 protocol provides for the association of named
   attributes to files.  The name space identifiers for these attributes
   are defined as string names.  The protocol does not define the
   specific assignment of the name space for these file attributes.
   Even though the name space is not specifically controlled to prevent
   collisions, an IANA registry has been created for the registration of
   NFS version 4 named attributes.  Registration will be achieved
   through the publication of an Informational RFC and will require not
   only the name of the attribute but the syntax and semantics of the
   named attribute contents; the intent is to promote interoperability
   where common interests exist.  While application developers are
   allowed to define and use attributes as needed, they are encouraged
   to register the attributes with IANA.


Nico


Re: [zfs-discuss] A versioning FS

2006-10-09 Thread Jonathan Edwards


On Oct 8, 2006, at 23:54, Nicolas Williams wrote:


On Sun, Oct 08, 2006 at 11:16:21PM -0400, Jonathan Edwards wrote:

On Oct 8, 2006, at 22:46, Nicolas Williams wrote:

You're arguing for treating FV as extended/named attributes :)


kind of - but one of the problems with EAs is the increase/bloat in
the inode/dnode structures and corresponding incompatibilities with
other applications or tools.


This in a thread where folks [understandably] claim that storage is
cheap and abundant.  And I agree that it is.

Plus, I think you may be jumping to conclusions about the bloat of
extended attributes:


  Another approach might be to put it all
into the block storage rather than trying to stuff it into the
metadata on top.  If we look at the zfs on-disk structure instead and
simply extend the existing block pointer mappings to handle the diffs
along with a header block to handle the version numbers - this might
be an easier way out rather than trying to redefine or extend the
dnode structure.   Of course you'd still need a single attribute to
flag reading the version block header and corresponding diff blocks,
but this could go anywhere - even a magic acl perhaps .. i would
argue that the overall goal should be aimed toward the reduction of
complexity in the metadata nodes rather than attempting to extend
them and increase the seek/parse time.


Wait a minute -- the extended attribute idea is about *interfaces*, not
internal implementation.  I certainly did not argue that a file version
should be copied into an EA.


true, but I just find that the EA discussion is just as loaded as the FV
discussion that too often focuses on improvements in the metadata
space rather than the block data space.  I'm not talking about the file
version data .. rather the bplist for the file version data and possibly
causing this to live in the block data space instead of the dnode
DMU.  This way the FV will be completely accessible within the
filesystem block data structure instead of being abstracted back out
of the dnode DMU.  I would hold that the version data space
consumption should also be readily apparent on the filesystem level
and that versioned access should not impede the regular file
lookup or attribute caching.  It's a slight deviation from the typical
EA approach, but an important distinction to make to keep the
metadata structures relatively lean.

Let's keep interface and implementation details separate.  Most of this
thread has been about interfaces precisely because that's what users
will interact with; users won't care one bit about how it's all
implemented under the hood.


I'm not so sure you can separate the two without creating a hack.  I
would also argue that users (particularly the ones creating the
interfaces) will care about the implementation details since those
are the real underlying issues they'll be wrestling with.

.je


Re: [zfs-discuss] A versioning FS

2006-10-09 Thread David Dyer-Bennet

On 10/6/06, Erik Trimble [EMAIL PROTECTED] wrote:

David Dyer-Bennet wrote:
 On 10/6/06, Nicolas Williams [EMAIL PROTECTED] wrote:

  Maybe Erik would find it confusing.  I know I would find it
  _annoying_.
 
  Then leave it set to 1 version

 Per-directory?  Per-filesystem?

 Whatever.  What's the actual issue here?

 I don't recall that on TOPS-20 it was possible to not version.  What
 you could do is set your logout.cmd file to purge your space down to
 one copy when you logged out.
But see, that assumes you have a logout-type functionality to use. Which
indeed is possible for command-line usage, but then only in a very
limited way.   During a typical session, I access almost 20 NFS-mounted
directories. And anyone using autofs/automount trees gets even more.
You're saying that my logout script has to know about all of them to
keep things clean?  That's unrealistic.  And that still doesn't solve
the problem of people who use SAMBA or NFS from machines which don't
have an interactive shell logout system (i.e. Windows).


Seems entirely realistic to me that your logout script would know
about the things you routinely use.  People who don't log into any
system are more of a problem, though.  Various things come to mind,
like having a default number of files (so it doesn't expand without
limits), and maybe a regular cron job; but I've never worked in an
environment doing versioning for non-login users over the network, so
they're all theory, no idea how they'd work in practice.



 This worked fine for the users I knew; even on a system that didn't
 have as much as a gigabyte of disk storage total to support a few
 dozen software engineers.

The problem is we are comparing apples to oranges in user bases here.
TOPS-20 systems had a couple of dozen users (or, at most, a few
hundred).  VMS only slightly more.  UNIX/POSIX systems have 10s of
thousands.  Plus, the number of files being created under typical modern
systems is at least two (and probably three or four) orders of magnitude
greater.  I've got 100,000 files under /usr in Solaris, and almost 1,000
under my home directory.  And I don't have anything significant in my
/home (no source code, no build/test trees, just misc business stuff).
What is manageable with a few files quickly becomes unwieldy with more
than a few dozen.


I have to ask again -- is this theory?  Or have you actually worked on
a versioning filesystem?  And specifically on TOPS-20?  (I remember,
vaguely, that people found VMS versioning MUCH less comfortable to
work with than TOPS-20, and I don't know at this distance if that was
just because it was different, or because of subtle UI differences).

I don't think the number of files under /usr is relevant; how often do
you edit them by hand?  I'd expect an installation procedure to clean
up old versions when it was done installing new software; but if not a
simple purge would settle the matter.

I don't recall my directories having much fewer files then than now.
I have more *directories* now, but the number of files in a directory
is set by human issues and by development process issues, not by disk
space available.


This is what Nico and I are talking about:  if you turn on file
versioning automatically (even for just a directory, and not a whole
filesystem), the number of files being created explodes geometrically.


I don't see it; new versions are created *when you do something* to a
file; not from the file just sitting there.  And the number of files I
poke in a day, again, isn't controlled much by the disk space
available, it's controlled by *my time*, and so has stayed more
constant over the years.


  The above should be simple to do however -- a program does an open of
  a file name foo.bar.  ZFS / the file system routine would use the
  most recent version by default if no version info is given.

 How can version information be given without changing the APIs or
 putting the version number/string into the file name?

 The version number is part of the file name in all the examples I know
 about.  I'd find it useless without that; it has to be a real part of
 the filesystem, usable by everybody, not a special addon accessible
 only with one or two dedicated applications.

 Putting the version number/string into the file name is hard for me to
 accept.  It's what would lead to polluting my directories.

 Set your ls default to not show versions.  Isn't the problem then
 solved?  Maybe add that option to the GUI filesystem explorer as well.

But this requires modifying all the relevant apps, which is the same
amount of work as modifying them to use a new FV API.  It's not
transparent to the end-user.


I think the relevant apps are very different in the two cases.  File
listing tools are much rarer than file using tools, and in my case you
only need to modify the file listing tools.  In your case, you have to
modify every single file using tool.


 In practice, it never was a problem that I noticed, or that other
 people 

Re: [zfs-discuss] A versioning FS

2006-10-09 Thread David Dyer-Bennet

On 10/7/06, Erik Trimble [EMAIL PROTECTED] wrote:

Chad Leigh -- Shire.Net LLC wrote:
 Plus, the number of files being created under typical
 modern systems is at least two (and probably three or four) orders
 of magnitude greater.  I've got 100,000 files under /usr in Solaris,
 and almost 1,000 under my home directory.

 wimp :-)  I count 88,148 in my main home directory.  I'll bet just
 running gnome and firefox will get you in the ballpark of 1,000 :-/

 None (well, maybe 1 or 2)  of which you edit and hence would not
 generate versions.

 Chad

Richard actually brings up a good point, which answers another question
Chad had for me:  exactly how many files do I edit?   Which directly
impacts the directory pollution problem I've been talking about.

There are essentially three scenarios:

(a)  FV is turned on on a per-file basis

(b) FV is turned on on a per-directory basis

(c) FV is turned on on a per-filesystem basis


Now, I think we can all see that you get geometric file explosion in case
(c), as absolutely anything that writes to the filesystem gets
versioned.  Things like Web Browser caches alone would kill you.


Web browser caches (as normally used) would *never* generate a single
additional file version.  The web browsers use a naming algorithm to
prevent overwriting the same file, and that's the situation when a new
version is created.  They delete the files they decide they don't need
directly, rather than by overwriting the same name.

Your use of "writes to the filesystem" suggests to me you're thinking
of a different implementation of versioning than was in TOPS-20 and
VMS, and that (I think) most of us are discussing here.   The kind of
versioning I'm talking about works by keeping old versions of a file
*when it's overwritten by a new version*.  It's the operation of
creating a new file with the same name as an old file that triggers
it; in current Unix semantics the old file is deleted, but in the kind
of FV I'm talking about, the old version is *kept* and the new version
is given an incremented version number to keep the names unique.

It has nothing to do with writing to files; if you update a file in
place, a new version isn't generated.
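
To make the distinction concrete, here is a toy in-memory model of the semantics described above. It is only a sketch of the behaviour as this message describes it, not TOPS-20, VMS or ZFS code; the class and method names are invented for illustration.

```python
class VersionedDirectory:
    """Toy model: creating a name that already exists keeps the old file."""

    def __init__(self):
        self._files = {}  # name -> list of contents; index 0 is version 1

    def create(self, name, data):
        """Create a file; an existing name gets a new, incremented version."""
        versions = self._files.setdefault(name, [])
        versions.append(data)
        return len(versions)  # the version number just created

    def update_in_place(self, name, data):
        """Write to the newest version; no new version is generated."""
        self._files[name][-1] = data

    def read(self, name, version=None):
        """Read the newest version by default, or a specific older one."""
        versions = self._files[name]
        return versions[-1] if version is None else versions[version - 1]


d = VersionedDirectory()
d.create("foo.bar", "first draft")       # version 1
d.create("foo.bar", "second draft")      # same name again: version 2, 1 is kept
d.update_in_place("foo.bar", "edited")   # still version 2
print(d.read("foo.bar"), "|", d.read("foo.bar", 1))
```

In this model a browser cache that always writes uniquely named files never calls create() twice on the same name, so it never accumulates old versions.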
--
David Dyer-Bennet, mailto:[EMAIL PROTECTED], http://www.dd-b.net/dd-b/
RKBA: http://www.dd-b.net/carry/
Pics: http://www.dd-b.net/dd-b/SnapshotAlbum/
Dragaera/Steven Brust: http://dragaera.info/


Re: [zfs-discuss] A versioning FS

2006-10-08 Thread Ian Collins
David Dyer-Bennet wrote:


 Actually, save early and often is exactly why versioning is
 important.  If you discover you've gone down a blind alley in some
 code, it makes it easy to get back to the earlier spots.  This, in my
 experience, happens at a detail level where you won't (in fact can't)
 be doing checkins to version control.

Isn't that what your editor's undo command is for?

Ian



Re: [zfs-discuss] A versioning FS

2006-10-08 Thread Nicolas Williams
On Fri, Oct 06, 2006 at 06:22:01PM -0700, Joseph Mocker wrote:
 Nicolas Williams wrote:
 Automatically capturing file versions isn't possible in the general case
 with applications that aren't aware of FV.
 
 Don't snapshots have the same problem. A snapshot could potentially be 
 taken when a file is partially written or updated, no?

And backups in general.

Nico


Re: [zfs-discuss] A versioning FS

2006-10-08 Thread Erik Trimble

Joerg Schilling wrote:

Erik Trimble [EMAIL PROTECTED] wrote:

  
In order for an FV implementation to be useful for this stated purpose, 
it must fulfill the following requirements:


(1)  Clean interface for users.  That is, one must NOT be presented with 
a complete list of all versions unless explicitly asked for it, and it 
should be simple to select a version based on some reasonable criteria 
(date of creation/modification, version number, etc.)


(2)  Simple way to decide if a file should be versioned or not. Either 
automatically version all files (or none at all), or provide a mechanism 
to turn FV on/off on a per-file or per-directory basis.


(3)  Network-FS awareness.  Without this, FV is severely limited. Given 
my preconditions above (that is, the current usage pattern of us in the 
non-FS world), limiting FV to those on the local system restricts its 
usefulness to the point where it isn't worth the effort.



The only idea I get that matches these criteria is to have the versions
in the extended attribute name space.

Jörg

  
Realistically speaking, that's my conclusion, if we want a nice clean, 
well-designed solution. You need to hide the versioning info in the 
meta-tags, and create a whole new API for accessing/manipulating them.   
This easily solves (1) and (2) above, but (3) is the huge problem, as 
having a new API means you need to change the SMB/NFS protocols to allow 
for client machines to access the new API.  With the new Windows NTFS 
versioning, we at least have something to hook into for Windows, but 
UNIX clients will need to have a whole new suite of tools written, and a 
raft of current apps modified to take advantage of FV.


That said, FV may very well be worth it, and it certainly is worthy of a 
community-driven exploratory implementation.


-Erik


Re: [zfs-discuss] A versioning FS

2006-10-08 Thread Nicolas Williams
On Sat, Oct 07, 2006 at 01:43:29PM +0200, Joerg Schilling wrote:
 The only idea I get that matches these criteria is to have the versions
 in the extended attribute name space.

Indeed.  All that's needed then, CLI UI-wise, beyond what we have now is
a way to rename version extended attributes to new files, or at least
copy them (we have the latter).  And it nicely hides versions.  And it
nicely provides an API for creating them on demand (magic extended
attributes), and remote access.

Nico


Re: [zfs-discuss] A versioning FS

2006-10-08 Thread Wee Yeh Tan

On 10/7/06, Ben Gollmer [EMAIL PROTECTED] wrote:

On Oct 6, 2006, at 6:15 PM, Nicolas Williams wrote:
 What I'm saying is that I'd like to be able to keep multiple
 versions of
 my files without "echo *" or "ls" showing them to me by default.

Hmm, what about file.txt - ._file.txt.1, ._file.txt.2, etc? If you
don't like the _ you could use @ or some other character.


You missed Nicolas's point.

It does not matter which delimiter you use.  I still want my
"for i in *; do ..." to work as it does now.

We want to differentiate files that are created intentionally from
those that are just versions.  If files start showing up on their
own, a lot of my scripts will break.  Still, an FV-aware
shell/program/API can accept an environment setting that may quiesce
the version output. E.g. export show-version=off/on.
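
A rough sketch of that idea, under stated assumptions: versions are taken to be visible as "name;N" entries (the VMS-style suffix used elsewhere in this thread), and the setting is read from a hypothetical SHOW_VERSIONS environment variable (written with an underscore, since shell variable names cannot contain a hyphen). Neither is an existing ZFS or shell feature.

```python
import os
import re

# Hypothetical naming convention: "file.txt;3" is version 3 of file.txt.
VERSION_SUFFIX = re.compile(r";\d+$")


def visible_entries(entries, environ=None):
    """Return the names a version-aware listing would show.

    Versions are hidden unless the (hypothetical) SHOW_VERSIONS
    setting is "on", so an existing "for i in *" loop sees only
    the files that were created intentionally.
    """
    environ = os.environ if environ is None else environ
    if environ.get("SHOW_VERSIONS", "off") == "on":
        return list(entries)
    return [name for name in entries if not VERSION_SUFFIX.search(name)]


names = ["file.txt", "file.txt;1", "file.txt;2", "notes"]
print(visible_entries(names, {}))                       # versions hidden
print(visible_entries(names, {"SHOW_VERSIONS": "on"}))  # versions shown
```

The default is "off" so that unmodified scripts keep their current behaviour.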


--
Just me,
Wire ...


Re: [zfs-discuss] A versioning FS

2006-10-08 Thread Jonathan Edwards


On Oct 8, 2006, at 21:40, Wee Yeh Tan wrote:


On 10/7/06, Ben Gollmer [EMAIL PROTECTED] wrote:

On Oct 6, 2006, at 6:15 PM, Nicolas Williams wrote:
 What I'm saying is that I'd like to be able to keep multiple
 versions of
 my files without echo * or ls showing them to me by default.

Hmm, what about file.txt - ._file.txt.1, ._file.txt.2, etc? If you
don't like the _ you could use @ or some other character.


You missed Nicolas's point.

It does not matter which delimiter you use.  I still want my for i in
*; do ... to work as per now.

We want to differentiate files that are created intentionally from
those that are just versions.  If files start showing up on their
own, a lot of my scripts will break.  Still, an FV-aware
shell/program/API can accept an environment setting that may quiesce
the version output. E.g. export show-version=off/on.



if we're talking implementation - i think it would make more sense to
store the block version differences in the base dnode itself rather than
creating new dnode structures to handle the different versions.  You'd
then structure different tools or flags to handle the versions (copy them
to a new file/dnode, etc) - standard or existing tools don't need to know
about the underlying versions.

.je


Re: [zfs-discuss] A versioning FS

2006-10-08 Thread Nicolas Williams
On Thu, Oct 05, 2006 at 05:25:17PM -0700, David Dyer-Bennet wrote:
 No, any sane VC protocol must specifically forbid the checkin of the
 stuff I want versioning (or file copies or whatever) for.  It's
 partial changes, probably doesn't compile, nearly certainly doesn't
 work.  This level of work product *cannot* be committed to the
 repository.
 
 [...]
 
 One of the big problems with CVS and SVN and Microsoft SourceSafe is
 that you don't have the benefits of version control most of the time,
 because all commits are *public*.

I think what you're saying is something like this: a VC repository is
one thing, but when I'm working on something not ready to put into that
repository I still want versioning in my workspace.

That's still VC though!

In Teamware you use SCCS for version control in your workspace, then, if
you have wx (a script built atop Teamware) you collapse the SCCS deltas
to remove all the intermediate work and 'putback' just the end result to
the parent repository.

In Teamware the distinction between repository and workspace isn't :)

But you can work that way in many other VCs.  In PRCS, for example, you
can checkout a project, check it into a new repository, check in changes
as you go, then later do this again with the trunk, merge, then
check-in to the original repository.  Or you can use one repository and
delete unsightly history.

Mercurial supports the model of development we use in ON based on
Teamware.  So you can also get version control for your intermediate
versions using Mercurial and lose the unsightly history when you're
ready to commit your changes to the gate.

It's been a while since I've used ClearCase, but I'm pretty sure there's
something like this there as well.

And, in any case, I think any good VC supports this.  And all should,
because with file versioning a la VMS I don't get a lot of things I
need, like comments, branches, history, merges, etc...

Nico


Re: [zfs-discuss] A versioning FS

2006-10-08 Thread Nicolas Williams
On Mon, Oct 09, 2006 at 09:27:14AM +0800, Wee Yeh Tan wrote:
 On 10/7/06, David Dyer-Bennet [EMAIL PROTECTED] wrote:
 I've never encountered branch being used that way, anywhere.  It's
 used for things like developing release 2.0 while still supporting 1.5
 and 1.6.
 
 However, especially with merge in svn it might be feasible to use a
 branch that way.  What's the operation to update the branch from the
 trunk in that scenario?
 
 You merge the changes from the main trunk.

I think David meant something else.  History of intermediate changes is
often useless, particularly if some of those changes don't build.

In ON development we've used Teamware for years, and for years we've had
a policy that intermediate deltas must be collapsed.  We have a script,
'wx', that can do that trivially, and good thing too, because collapsing
deltas without it is a pain.

(I.e., in Teamware terms, if you bringover version 1.7 of some file,
check-in 1.8, then 1.9, then putback to the parent workspace you'll be
creating versions 1.8 and 1.9 in the parent when no one needs to see 1.8,
so what you want to do is collapse those two deltas, which then become
version 1.8, and that's what you putback.)
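
A toy model of the collapse described above, as a hedged sketch only: it is not how wx or SCCS work internally, and the function name and the (major, minor) tuple representation are made up for illustration. It keeps just the newest local content and numbers it as the successor of the version that was brought over.

```python
def collapse_deltas(history, base):
    """history: list of (version, content) tuples in check-in order, with
    versions as (major, minor) tuples. base: the version brought over.
    Returns the single delta to put back: base's successor carrying the
    content of the newest local check-in, or None if nothing changed."""
    local = [(v, c) for v, c in history if v > base]
    if not local:
        return None  # nothing was checked in locally
    major, minor = base
    _, final_content = local[-1]
    return ((major, minor + 1), final_content)


if __name__ == "__main__":
    workspace = [((1, 7), "original"), ((1, 8), "half done"), ((1, 9), "done")]
    print(collapse_deltas(workspace, (1, 7)))
```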

But this is a lame argument for FV!  Because any good VC lets you
version intermediate work without polluting the main trunk when you're
done.

Nico


Re: [zfs-discuss] A versioning FS

2006-10-08 Thread Nicolas Williams
On Sun, Oct 08, 2006 at 10:28:06PM -0400, Jonathan Edwards wrote:
 On Oct 8, 2006, at 21:40, Wee Yeh Tan wrote:
 On 10/7/06, Ben Gollmer [EMAIL PROTECTED] wrote:
 Hmm, what about file.txt - ._file.txt.1, ._file.txt.2, etc? If you
 don't like the _ you could use @ or some other character.
 
 It does not matter which delimiter you use.  I still want my for i in
 *; do ... to work as per now.

.prefix might be acceptable, but it rubs me the wrong way because of
this:

 We want to differentiate files that are created intentionally from
 those that are just versions.  If files start showing up on their
 own, a lot of my scripts will break.  Still, an FV-aware
 shell/program/API can accept an environment setting that may quiesce
 the version output. E.g. export show-version=off/on.

Exactly.

 if we're talking implementation - i think it would make more sense to
 store the block version differences in the base dnode itself rather than
 creating new dnode structures to handle the different versions.  You'd
 then structure different tools or flags to handle the versions (copy them
 to a new file/dnode, etc) - standard or existing tools don't need to know
 about the underlying versions.

You're arguing for treating FV as extended/named attributes :)

I think that'd be the right thing to do, since we have tools that are
aware of those already.  Of course, we're talking about somewhat magical
attributes, but I think that's fine (though, IIRC, NFSv4 [RFC3530] has
some strange verbiage limiting attributes to applications).

Nico


Re: [zfs-discuss] A versioning FS

2006-10-08 Thread Wee Yeh Tan

On 10/9/06, Jonathan Edwards [EMAIL PROTECTED] wrote:

 We want to differentiate files that are created intentionally from
 those that are just versions.  If files start showing up on their
 own, a lot of my scripts will break.  Still, an FV-aware
 shell/program/API can accept an environment setting that may quiesce
 the version output. E.g. export show-version=off/on.

if we're talking implementation - i think it would make more sense to
store the block version differences in the base dnode itself rather than
creating new dnode structures to handle the different versions.  You'd
then structure different tools or flags to handle the versions (copy them
to a new file/dnode, etc) - standard or existing tools don't need to know
about the underlying versions.


The beauty of extending the dnode is that it will continue to behave
nicely through renames or multiple hardlinks.  However, handling
Erik's concerns about recovering deleted files will require a bit more
work (mainly concerns about how a user will recover his file(s)).
There may also be performance considerations if mass version
purging happens often.


--
Just me,
Wire ...


Re: [zfs-discuss] A versioning FS

2006-10-08 Thread Nicolas Williams
On Sun, Oct 08, 2006 at 11:16:21PM -0400, Jonathan Edwards wrote:
 On Oct 8, 2006, at 22:46, Nicolas Williams wrote:
 You're arguing for treating FV as extended/named attributes :)
 
 kind of - but one of the problems with EAs is the increase/bloat in  
 the inode/dnode structures and corresponding incompatibilities with  
 other applications or tools.

This in a thread where folks [understandably] claim that storage is
cheap and abundant.  And I agree that it is.

Plus, I think you may be jumping to conclusions about the bloat of
extended attributes:

   Another approach might be to put it all  
 into the block storage rather than trying to stuff it into the  
 metadata on top.  If we look at the zfs on-disk structure instead and  
 simply extend the existing block pointer mappings to handle the diffs  
 along with a header block to handle the version numbers - this might  
 be an easier way out rather than trying to redefine or extend the  
 dnode structure.   Of course you'd still need a single attribute to  
 flag reading the version block header and corresponding diff blocks,  
 but this could go anywhere - even a magic acl perhaps .. i would  
 argue that the overall goal should be aimed toward the reduction of  
 complexity in the metadata nodes rather than attempting to extend  
 them and increase the seek/parse time.

Wait a minute -- the extended attribute idea is about *interfaces*, not
internal implementation.  I certainly did not argue that a file version
should be copied into an EA.

Let's keep interface and implementation details separate.  Most of this
thread has been about interfaces precisely because that's what users
will interact with; users won't care one bit about how it's all
implemented under the hood.

Nico


Re: [zfs-discuss] A versioning FS

2006-10-08 Thread Nicolas Williams
On Fri, Oct 06, 2006 at 07:37:47PM -0600, Chad Leigh -- Shire.Net LLC wrote:
 On Oct 6, 2006, at 7:33 PM, Erik Trimble wrote:
 This is what Nico and I are talking about:  if you turn on file  
 versioning automatically (even for just a directory, and not a  
 whole filesystem), the number of files being created explodes  
 geometrically.
 
 But it doesn't.  Unless you are editing geometrically more files.

Perhaps my filing habits aren't very good, as I have many files that
I've edited over the years in very few directories.  Why punish me?

(Also, I believe in the search better, search more, file/sort less model
that Gmail and friends promote.  Filing is a pain.  Searching should be
easy and fast.  Until we get to where searching is always simpler/faster
than scrolling through directory listings I simply could not accept
in-your-face FV.)

Nico


Re: [zfs-discuss] A versioning FS

2006-10-08 Thread Joseph Mocker

Nicolas Williams wrote:

On Thu, Oct 05, 2006 at 05:25:17PM -0700, David Dyer-Bennet wrote:
  

No, any sane VC protocol must specifically forbid the checkin of the
stuff I want versioning (or file copies or whatever) for.  It's
partial changes, probably doesn't compile, nearly certainly doesn't
work.  This level of work product *cannot* be committed to the
repository.

[...]

One of the big problems with CVS and SVN and Microsoft SourceSafe is
that you don't have the benefits of version control most of the time,
because all commits are *public*.



I think what you're saying is something like this: a VC repository is
one thing, but when I'm working on something not ready to put into that
repository I still want versioning in my workspace.

That's still VC though!
  
This is just one class of problem that I think VC might be useful for. 
We could go on, specific case by case coming up with best practices for 
applications, but it seems to me that FV is trying to solve a general 
problem in general way. Whether that is a good idea or bad idea I don't 
know.


However, wouldn't it be great if I could somehow easily FV a file I am
working on with some arbitrary (closed) application I am forced to use 
without the application really knowing about it, and with little or no 
actions I have to take to do so?







Re: [zfs-discuss] A versioning FS

2006-10-07 Thread Ben Gollmer

On Oct 6, 2006, at 6:15 PM, Nicolas Williams wrote:
What I'm saying is that I'd like to be able to keep multiple versions of
my files without echo * or ls showing them to me by default.


Hmm, what about file.txt - ._file.txt.1, ._file.txt.2, etc? If you  
don't like the _ you could use @ or some other character.


I'd like an option for ls(1), find(1) and friends to show file versions,
and a way to copy (or, rather, un-hide) selected versions files so that
I could now refer to them as usual -- when I do this I don't care to see
version numbers in the file name, I just want to give them names.


ln -s ._file.txt.1 first_published_draft.txt
ln -s ._file.txt.5 second_published_draft.txt


And, maybe, I'd like a way to write globs that match file versions
(think of extended globbing, as in KSH).


Hmm, I'm not exactly sure what you mean by this, but using a dotfile  
scheme would allow you to easily glob for the file names.



Similarly with applications that keep files open but keep writing
transactions in ways that the OS can't isolate without input from the
app.  E.g., databases.  fsync(2) helps here, but lots and lots of
fsync(2)s would result in no useful versioning.


Presumably you'd create a different fs for your database, turning the  
versioning property off. You'd be likely to want to adjust other fs  
parameters anyway, judging from some recent posts discussing how to  
get the best database performance.


--
Ben






Re: [zfs-discuss] A versioning FS

2006-10-07 Thread Erik Trimble

Chad Leigh -- Shire.Net LLC wrote:
Plus, the number of files being created under typical 
modern systems is at least two (and probably three or four) orders 
of magnitude greater.  I've got 100,000 files under /usr in Solaris, 
and almost 1,000 under my home directory.


wimp :-)  I count 88,148 in my main home directory.  I'll bet just
running gnome and firefox will get you in the ballpark of 1,000 :-/


None (well, maybe 1 or 2)  of which you edit and hence would not 
generate versions.


Chad


Richard actually brings up a good point, which answers another question 
Chad had for me:  exactly how many files do I edit?   Which directly 
impacts the directory pollution problem I've been talking about.


There are essentially three scenarios:

(a)  FV is turned on on a per-file basis

(b) FV is turned on on a per-directory basis

(c) FV is turned on on a per-filesystem basis


Now, I think we can all see that you get geometric file explosion in case 
(c), as absolutely anything that writes to the filesystem gets 
versioned.  Things like Web Browser caches alone would kill you.


In case (b), there's quite a bit of explosion, too.  There are lots of 
apps which create, update, and destroy files frequently in various 
directories. Most Office and similar large user apps do this. So it is 
very, very easy to have many versions quickly.  This can be somewhat 
mitigated by NOT turning on FV in directories which are commonly used as 
temp dirs (e.g. ~/tmp)


In case (a), you are down to files you actively tell FV to use, which I 
agree can be quite manageable.  I tend to actively edit a couple of 
dozen files frequently, so that number can be manageable, so long as the 
number of versions is held down to some limit. 

However, in both case (a) and (b) for netFS users, exactly how are they 
supposed to indicate that they want FV turned on?  There are no semantics 
for doing this in any netFS protocol, so we'd have to have custom 
API/tools for them to run to turn on FV.



Also, something to think about:  under FV, do old versions of a file 
which was deleted (via unlink() or similar) also get deleted?



-Erik






Re: [zfs-discuss] A versioning FS

2006-10-07 Thread Erik Trimble

Chad Leigh -- Shire.Net LLC wrote:
But see, that assumes you have a logout-type functionality to use. 
Which indeed is possible for command-line usage, but then only in a 
very limited way.   During a typical session, I access almost 20 
NFS-mounted directories. And anyone using autofs/automount trees gets 
even more. You're saying that my logout script has to know about all 
of them to keep things clean?  That's unrealistic.
It is up to you to come up with a scheme to keep things clean, the 
same way you do now anyway (downloads, etc),


Which is entirely reasonable if the number of places where FV is enabled is 
limited, but completely unrealistic if FV is turned on for a large 
number of places.  And much more difficult for those restricted to 
accessing File Versioned directories over a netFS, where scripting 
cleanups can be difficult or highly impractical.


  And that still doesn't solve the problem of people who use SAMBA or 
NFS from machines which don't have an interactive shell logout system 
(i.e. Windows).
It is still mounted on their desktops and they can still delete files 
with FV the same way they do now


No real issue.
Well   If the versions of everything are kept in the same directory, 
then you are going to have a VERY bad user experience with people using 
GUI file browsers.  Cleaning up multiple versions of the same file name 
is going to be tricky, and you will find people very frequently 
accidentally delete the wrong thing.  More importantly, people are going 
to consider it a big hassle to have to keep things tidy by hand.   If 
the versioning is kept somewhere different than the current file 
version, then this mitigates things a bit, but you still don't want to 
require people to clean this stuff up via a GUI.   And, with Windows, 
asking users to use the command prompt for what is normally a GUI 
operation isn't acceptable, from a general usability standpoint.



This worked fine for the users I knew; even on a system that didn't
The problem is we are comparing apples to oranges in user bases here. 
TOPS-20 systems had a couple of dozen users (or, at most, a few 
hundred).  VMS only slightly more.  UNIX/POSIX systems have 10s of 
thousands.

Rarely.  Most of them have in the same range as VMS now or then.
Very, Very few VMS systems that I know about had more than a couple 
hundred users.  MIT's main VMS server had only about 2000, with less 
than half that active. A couple of Fortune 500 companies I've worked at 
in the 90s had VMS systems, and they had very restricted user bases.  
VMS simply was never used as a general-purpose file server, and if there 
were a fairly large number of users, they were logged in via some custom 
app, and never really used the system in the manner we are discussing here.


On the other hand, virtually all the companies I've worked for have had 
a UNIX-based file server, with at least a hundred or more UIDs.  And 
with Single Sign-on and LDAP becoming the way to go, even mid-sized 
companies have systems with over a 1000 users.  10,000 active users 
isn't hard to come up with at all.  And, given that Enterprises are a 
main target for ZFS, millions of users are entirely within reason.


But this requires modifying all the relevant apps, which is the same 
amount of work as modifying them to use a new FV API.  It's not 
transparent to the end-user.


Because the semantics of a file name are different on a unix/posix 
system than they are on a VMS or TOPS-20 system, which had more 
structured filenames.  I would say that the version cannot be an 
actual part of the file name but would have to be meta data.  However, 
it could display as part of the file name and the underlying system can 
be made to do the right thing


ie,

foo gets you the latest foo

Specifically entering in  foo;7 gets you version 7 or the latest if 
there are less than 7 versions available.  The app can think of it as 
being part of the file name, but the underlying system would have to 
know how to do the right thing in extracting the version out and 
making it meta data.  Takes some thinking and I am not claiming to 
have all the answers right now, but hardly undoable.


No app changes are necessary.


No, this is untrue.  Remember that you can't use any character to 
indicate FV, as all characters are valid POSIX file names. (well, except 
'/'). You CAN'T say foo;8 gives me version 8 of the file foo, 
because there very well might be a completely different file name 
foo;8 that is NOT any version of the file foo.  VMS and TOPS had 
reserved characters for file versioning, and thus you were set. This 
isn't true in UNIX filesystems.
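
A small sketch of the point above, assuming nothing beyond an ordinary POSIX filesystem: "foo;8" is accepted as a regular file name, so it can coexist with "foo" and carry unrelated contents. The helper name is made up for illustration.

```python
import os
import tempfile


def semicolon_names_are_ordinary():
    """Create 'foo' and 'foo;8' side by side and return the directory
    listing plus the contents of 'foo;8'."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "foo"), "w") as f:
            f.write("current contents of foo\n")
        # Nothing special happens here: ';' is just another byte in the name.
        with open(os.path.join(d, "foo;8"), "w") as f:
            f.write("an unrelated file\n")
        listing = sorted(os.listdir(d))
        with open(os.path.join(d, "foo;8")) as f:
            other = f.read()
    return listing, other


if __name__ == "__main__":
    print(semicolon_names_are_ordinary())
```

Any scheme that interprets "foo;8" as version 8 of foo would therefore have to decide what happens to existing files with such names.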


The only way to do FV in the POSIX concept is to either keep the file 
versions in a separate file tree than the current files, or to use 
some sort of an API to access them, and otherwise keep them normally 
hidden from view.


You can't dodge this by simply saying oh, well, then change the FV 
delimiter if it causes you problems.  Aside from 

Re: [zfs-discuss] A versioning FS

2006-10-07 Thread Joerg Schilling
Jeremy Teo [EMAIL PROTECTED] wrote:

 A couple of use cases I was considering off hand:

 1. Oops i truncated my file
 2. Oops i saved over my file
 3. Oops an app corrupted my file.
 4. Oops i rm -rf the wrong directory.
 All of which can be solved by periodic snapshots, but versioning gives
 us immediacy.

I am sure that the same people who accidentally type rm -rf * 
would type rm -rf *\;*

And note that this feature would cause a need to change a lot 
of utilities including all shells (see path name expansion).

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] A versioning FS

2006-10-07 Thread Joerg Schilling
Nicolas Williams [EMAIL PROTECTED] wrote:

 On Fri, Oct 06, 2006 at 12:02:16PM -0700, Matthew Ahrens wrote:
  In my opinion, the marginal benefit of per-write(2) versions over 
  snapshots (which can be per-transaction, ie. every ~5 seconds) does not 
  outweigh the complexity of implementation and use/administration.

 Per-write(2) versions would be worse than useless in many, if not most
 cases.  Even per-close(2) versions wouldn't always be useful.

Even if there is a proper way to find the right time for a micro snapshot,
if the versions live in the standard namespace of the filesystem, it would
cause POSIX compatibility problems and we would need to change many programs.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] A versioning FS

2006-10-07 Thread Joerg Schilling
David Dyer-Bennet [EMAIL PROTECTED] wrote:

 On 10/6/06, Erik Trimble [EMAIL PROTECTED] wrote:
  First of all, let's agree that this discussion of File Versioning makes
  no more reference to its usage as Version Control.  That is, we aren't
  going to talk about it being useful for source code, other than in the
  context where a source code file is a document, like any other text
  document.  File Versioning and Version Control are separate things, with
  different purposes and feature sets.

 Hmm; the most important uses of file versioning come, in my opinion,
 when working on source code.  But for handling very different
 situations than source control does.

  OK. So, now we're on to FV.  As Nico pointed out, FV is going to need a
  new API.  Using the VMS convention of simply creating file names with a
  version string afterwards is unacceptable, as it creates enormous
  directory pollution, not to mention user confusion.  So, FV has to be
  invisible to non-aware programs.

 Strongly disagree, twice.

 Having FV invisible to programs not updated to specially support it is
 IMHO unacceptable, and would render the feature useless.

Making it visible to programs causes many problems with POSIX compatibility and
will force changes to many programs.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] A versioning FS

2006-10-07 Thread Joerg Schilling
Erik Trimble [EMAIL PROTECTED] wrote:


 In order for an FV implementation to be useful for this stated purpose, 
 it must fulfill the following requirements:

 (1)  Clean interface for users.  That is, one must NOT be presented with 
 a complete list of all versions unless explicitly asked for it, and it 
 should be simple to select a version based on some reasonable criteria 
 (date of creation/modification, version number, etc.)

 (2)  Simple way to decide if a file should be versioned or not. Either 
 automatically version all files (or none at all), or provide a mechanism 
 to turn FV on/off on a per-file or per-directory basis.

 (3)  Network-FS awareness.  Without this, FV is severely limited. Given 
 my preconditions above (that is, the current usage pattern of us in the 
 non-FS world), limiting FV to those on the local system restricts its 
 usefulness to the point where it isn't worth the effort.

The only idea I get that matches these criteria is to have the versions
in the extended attribute name space.

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Michael Schuster
I seem to remember that one could configure the max. number of versions VMS 
would retain for you on a per-file basis - setting this to 1 would de facto 
turn off versioning.
IFF versioning were implemented in ZFS, AND was made configurable on a 
per-file basis (everything else wouldn't make any sense at all, IMO), the 
default could be set to 1, to avoid the various horror scenarios that have 
been painted here, and people could increase the number of versions they want 
for those files that need it.
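
A hedged sketch of that retention policy as a toy in-memory model; the class and method names are invented here and do not correspond to any ZFS or VMS interface. With the default limit of 1 only the current contents survive, which is the "versioning de facto off" case; raising the limit keeps that many of the most recent saves.

```python
from collections import deque


class VersionedFile:
    """Toy model: keep at most `limit` versions, newest last."""

    def __init__(self, limit=1):
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self._versions = deque(maxlen=limit)

    def save(self, data):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._versions.append(data)

    def current(self):
        return self._versions[-1]

    def versions(self):
        return list(self._versions)


if __name__ == "__main__":
    doc = VersionedFile(limit=3)
    for text in ["draft 1", "draft 2", "draft 3", "draft 4"]:
        doc.save(text)
    print(doc.versions())
```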


cheers
Michael

Chad Leigh -- Shire.Net LLC wrote:


On Oct 5, 2006, at 5:40 PM, Erik Trimble wrote:


And, try thinking of a directory with a few dozen files in it, each with
a dozen or more versions. that's hideous, from a normal user standpoint.
VMS's implementation of filename;version is completely unwieldy if
you have more than a few files,


No it is not. I  worked for DEC and used VMS up through 1993 and never 
found it unwieldy.  Even if I had 100 versions of one file.  It is


1) what you are used to

2) what you are trained to do

that makes it unwieldy or not

I find the unix conventions of storing a file and file~ or any of the 
other myriad billion ways of doing it that each app has invented to be 
much more unwieldy.


Yes, you have to purge your directories once in a while. The same way 
you have to clean up any file mess you make on your computer (download 
area, desktop, etc).



or more than a few versions. And, in
modern typical use, it is _highly_ likely both will be true.


So what if you have more than a few versions of a file.

Beauty is in the eye of the beholder, and just because YOU find it 
unwieldy does not make it so for the general user or anyone else.


I would LOVE to have a VMS style (sorry, my TOPS-20 usage was very 
little so I have no remembrance of it there) file versioning built in to 
the system.


save early, save often ONLY makes sense with a file versioning system, 
or else you lose previous edits if you decide you have gone down a wrong 
alley.


Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net









--
Michael Schuster  +49 89 46008-2974 / x62974
Recursion, n.: see 'Recursion'


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread przemolicc
On Fri, Oct 06, 2006 at 01:14:23AM -0600, Chad Leigh -- Shire.Net LLC wrote:
 
 But I would dearly like to have a versioning capability.

Me too.
Example (real-life scenario): there is a Samba server with about 200
concurrently connected users. They keep mainly doc/xls files on the
server.  From time to time they (somehow) corrupt their files (they
share the files, so it is possible), and the files are then recovered
from backup.  With versioning, they could be told that if their main
file is corrupted they can open the previous version and keep working.
ZFS snapshots are not a solution in this case because we would have to
create snapshots for 400 filesystems (yes, each user has their own
filesystem, and I said that there are 200 concurrent connections, but
there are many more accounts on the server) each hour or so.


przemol




Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Jeremy Teo

A couple of use cases I was considering off hand:

1. Oops i truncated my file
2. Oops i saved over my file
3. Oops an app corrupted my file.
4. Oops i rm -rf the wrong directory.
All of which can be solved by periodic snapshots, but versioning gives
us immediacy.

So is immediacy worth it to you folks? I'd rather not embark on writing
and finishing code on something no one wants besides me.
--
Regards,
Jeremy


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Nicolas Williams
On Fri, Oct 06, 2006 at 11:25:29PM +0800, Jeremy Teo wrote:
 A couple of use cases I was considering off hand:
 
 1. Oops i truncated my file
 2. Oops i saved over my file
 3. Oops an app corrupted my file.
 4. Oops i rm -rf the wrong directory.
 All of which can be solved by periodic snapshots, but versioning gives
 us immediacy.

There's been talk of making every transaction a snapshot.

Of course, there'd be no information as to whether a transaction
includes a file close, or truncation, or whatever.

IMO a file versioning API would be good, but file versioning should
normally be invisible, particularly to applications that are not aware
of it (which would be every application to date).

So think about the interfaces first.

I think ls(1) would have to be made version-aware.  And
cp(1)/mv(1)/ln(1).  That would be enough for a start.

Then add find/sfind and tar/star support.

And GNOME support.

Nico


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Ed Plese
On Fri, Oct 06, 2006 at 09:40:22AM +0200, [EMAIL PROTECTED] wrote:
 Example (real life scenario): there is a samba server for about 200
 concurrent connected users. They keep mainly doc/xls files on the
 server.  From time to time they (somehow) corrupt their files (they
 share the files so it is possible) so they are recovered from backup.
 With versioning they could be told that if their main file is
 corrupted they can open the previous version and keep working.
 ZFS snapshots are not a solution in this case because we would have to
 create snapshots for 400 filesystems (yes, each user has its filesystem
 and I said that there are 200 concurrent connections but there are many more
 accounts on the server) each hour or so.

Why is creating that many snapshots a problem?  The somewhat recent addition
of recursive snapshots (zfs snapshot -r) reduces this to a single command.
Taking individual snapshots of each filesystem can take a decent amount
of time, but I was under the impression that recursive snapshots would
be much faster due to the snapshots being committed in a single transaction.
Is this not correct?


Ed Plese


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Matthew Ahrens

[EMAIL PROTECTED] wrote:

On Fri, Oct 06, 2006 at 01:14:23AM -0600, Chad Leigh -- Shire.Net LLC wrote:

But I would dearly like to have a versioning capability.


Me too.
Example (real life scenario): there is a samba server for about 200
concurrent connected users. They keep mainly doc/xls files on the
server.  From time to time they (somehow) corrupt their files (they
share the files so it is possible) so they are recovered from backup.
With versioning they could be told that if their main file is
corrupted they can open the previous version and keep working.
ZFS snapshots are not a solution in this case because we would have to
create snapshots for 400 filesystems (yes, each user has its filesystem
and I said that there are 200 concurrent connections but there are many more
accounts on the server) each hour or so.


I completely disagree.  In this scenario (and almost all others), use of 
regular snapshots will solve the problem.  'zfs snapshot -r' is 
extremely fast, and I'm working on some new features that will make 
using snapshots for this even easier and better-performing.


If you disagree, please tell us *why* you think snapshots don't solve 
the problem.


--matt


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Matthew Ahrens

Jeremy Teo wrote:

A couple of use cases I was considering off hand:

1. Oops i truncated my file
2. Oops i saved over my file
3. Oops an app corrupted my file.
4. Oops i rm -rf the wrong directory.
All of which can be solved by periodic snapshots, but versioning gives
us immediacy.

So is immediacy worth it to you folks? I'd rather not embark on writing
and finishing code on something no one wants besides me.


In my opinion, the marginal benefit of per-write(2) versions over 
snapshots (which can be per-transaction, ie. every ~5 seconds) does not 
outweigh the complexity of implementation and use/administration.


--matt


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Joseph Mocker

Matthew Ahrens wrote:



If you disagree, please tell us *why* you think snapshots don't solve 
the problem.


Technically there's a race condition here. If you're taking regular 
snapshots, you might see


10:25 - snapshot 1 - myfile.xls version 21
10:26 -- myfile.xls version 22
10:27 -- myfile.xls version 23 - corrupted
10:30 - snapshot 2 - myfile.xls version 23 - corrupted

So if you need to roll back to a previous version, the most recent 
non-corrupt version (22)  is lost.
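
A minimal sketch of the timeline above, assuming snapshots at 10:25 and
10:30 and saves at 10:26 and 10:27. The function names and the dictionary
layout are illustrative only; nothing here is a ZFS interface.

```python
# Minutes past 10:00 mapped to the file version saved at that minute.
saves = {26: 22, 27: 23}      # version 21 already existed before 10:25
snapshot_times = [25, 30]     # snapshots taken at 10:25 and 10:30


def version_at(minute, saves, initial=21):
    """Return the file version that is current at the given minute."""
    current = initial
    for when in sorted(saves):
        if when <= minute:
            current = saves[when]
    return current


def versions_captured(snapshot_times, saves):
    """Return the set of versions preserved by at least one snapshot."""
    return {version_at(t, saves) for t in snapshot_times}


captured = versions_captured(snapshot_times, saves)
all_versions = {21} | set(saves.values())
lost = sorted(all_versions - captured)
print("captured:", sorted(captured))
print("lost:", lost)
```

Any version that is both created and overwritten between two snapshots
ends up in the lost list, however short the snapshot interval is.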


Snapshots are a decent alternative, but not as comprehensive, or perhaps
as automatic, as people would like.


 --joe


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread David Dyer-Bennet

On 10/6/06, Matthew Ahrens [EMAIL PROTECTED] wrote:

Jeremy Teo wrote:
 A couple of use cases I was considering off hand:

 1. Oops i truncated my file
 2. Oops i saved over my file
 3. Oops an app corrupted my file.
 4. Oops i rm -rf the wrong directory.
 All of which can be solved by periodic snapshots, but versioning gives
 us immediacy.

 So is immediacy worth it to you folks? I'd rather not embark on writing
 and finishing code on something no one wants besides me.

In my opinion, the marginal benefit of per-write(2) versions over
snapshots (which can be per-transaction, ie. every ~5 seconds) does not
outweigh the complexity of implementation and use/administration.


It may quite possibly not be worth adding the second, fairly similar,
facility.  In addition to the points you cite, trying to explain to
average users what the two are and when to use each one would be
fairly challenging.

All the arguments about piles of versions seem to apply in spades to
taking snapshots every 5 seconds.  And given the snapshot hierarchy,
it's much harder to find your file in the snapshot you want.  Let's say
your file is 5 or 10 directories down, quite common in source trees in
my experience; you have to go back to the top and navigate to
~/.zfs/weirdsnapshotdirectoryname/foo/bar/mumble/bag/baz/etc/the-file-I-want.cpp
in each snapshot that might have the version you're looking for.

I'd say the snapshot system is not as good as file versioning for the
tasks I think file versioning is best for.  However, snapshotting at
very frequent intervals would definitely capture close enough to the
version I need to retrieve to make it a tolerable alternative.  The
user interface to it for retrieving a file is rather harder to use, it
seems to me, and that might possibly discourage use when it would have
been helpful.
--
David Dyer-Bennet, mailto:[EMAIL PROTECTED], http://www.dd-b.net/dd-b/
RKBA: http://www.dd-b.net/carry/
Pics: http://www.dd-b.net/dd-b/SnapshotAlbum/
Dragaera/Steven Brust: http://dragaera.info/


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Joseph Mocker

Chad Leigh -- Shire.Net LLC wrote:



disclaimer:  I have not used zfs snapshots a lot as I am still  
experimenting with zfs, but they appear to be similar to freebsd  
snapshots, with which I am familiar.


The user experience with snapshots, in terms of file versioning (#1,  
#2, maybe #3) is much worse than a true file versioning user  
experience.  People are oriented to their files, not to snapshots.   
And I may not want versioning with all my files (object files etc)  
which you would get with the snapshots.


disclaimer: ditto

I tend to agree with Chad though. If you are taking snapshots every 5
seconds like Matthew suggests in an earlier reply, how does a user easily
go back to previous versions without encountering a bunch of duplicated
versions in the myriad snapshots that are being taken?  If the
latest snapshot is number 2000, for example, and my file was last
changed in snapshot 450, how do I easily figure that out without walking
through snapshots 1999 - 451 before finding it?
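
One way the walk could be automated is sketched below. It assumes that
every snapshot is visible as a directory under a single root (as under
.zfs/snapshot), that the snapshot names sort in chronological order, and
that two copies with the same modification time and size are the same
version. The function name is made up for this example.

```python
import os


def distinct_versions(snapshot_root, relative_path):
    """List the snapshots in which the file first appears in a new state.

    Two snapshots are treated as holding the same version when the file's
    modification time and size match (an assumption, not a guarantee).
    """
    seen = set()
    result = []
    for name in sorted(os.listdir(snapshot_root)):
        candidate = os.path.join(snapshot_root, name, relative_path)
        try:
            st = os.stat(candidate)
        except (FileNotFoundError, NotADirectoryError):
            continue  # the file did not exist in this snapshot
        key = (st.st_mtime_ns, st.st_size)
        if key not in seen:
            seen.add(key)
            result.append(name)
    return result
```

This still stats the file once per snapshot, so it hides the duplicates
from the user rather than avoiding the walk.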


 --joe


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Erik Trimble
First of all, let's agree that this discussion of File Versioning makes 
no more reference to its usage as Version Control.  That is, we aren't 
going to talk about it being useful for source code, other than in the 
context where a source code file is a document, like any other text 
document.  File Versioning and Version Control are separate things, with 
different purposes and feature sets.



OK. So, now we're on to FV.  As Nico pointed out, FV is going to need a 
new API.  Using the VMS convention of simply creating file names with a 
version string afterwards is unacceptable, as it creates enormous 
directory pollution, not to mention user confusion.  So, FV has to be 
invisible to non-aware programs.


Now we have a problem:  how do we access FV for non-local (e.g. 
SAMBA/NFS) clients?  Since the VAST majority of usefulness of FV is in 
the network file server arena, unless we can use FV over the network, it 
is useless.  You can't modify the SMB or NFS protocol (easily or 
quickly) to add FV functionality (look how hard it was to add ACLs to 
these protocols).


About the only way I can think around this problem is to store versions 
in a special subdir of each directory (e.g. .zfs_version), which would 
then be browsable over the network, using tools not normally FV-aware.  
But this puts us back into the problem of a directory which potentially 
has hundreds or thousands of files.
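
A user-space sketch of the subdirectory idea, assuming the .zfs_version
name suggested above and a made-up "name.N" naming scheme inside it. A real
implementation would live in the filesystem; this only shows the layout.

```python
import os
import re

VERSION_DIR = ".zfs_version"  # directory name from the proposal above


def next_version_number(store, name):
    """Return one more than the highest version already stored for `name`."""
    pattern = re.compile(re.escape(name) + r"\.(\d+)")
    matches = (pattern.fullmatch(entry) for entry in os.listdir(store))
    numbers = [int(m.group(1)) for m in matches if m]
    return max(numbers, default=0) + 1


def save_with_version(path, data):
    """Write `data` to `path`, archiving any existing file first.

    Returns the path of the archived copy, or None if the file is new.
    """
    directory, name = os.path.split(os.path.abspath(path))
    archived = None
    if os.path.exists(path):
        store = os.path.join(directory, VERSION_DIR)
        os.makedirs(store, exist_ok=True)
        number = next_version_number(store, name)
        archived = os.path.join(store, f"{name}.{number}")
        os.replace(path, archived)
    with open(path, "wb") as handle:
        handle.write(data)
    return archived
```

The working directory gains only one extra entry, but the version
directory itself grows by one file per save, which is the crowding
problem described above.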


Also, save-early-save-often  results in a version explosion, as does 
auto-save in the app.  While this may indeed mean that you have all of 
your changes around, figuring out which version has them can be 
massively time-consuming.  Let's say you have auto-save set for 5 
minutes (very common in MS Word). That gives you 12 versions per hour.  
If you suddenly decide you want to back up a couple of hours, that 
leaves you with looking at a whole bunch of files, trying to figure out 
which one you want.  E.g. I want a file from about 3 hours ago. Do I 
want the one from 2:45, 2:50, 2:55, 3:00, 3:05, 3:10, or 3:15 hours 
ago?  And, what if I've mis-remembered, and it really was closer to 4 
hours ago?  Yes, the data is eventually there. However, wouldn't a 
1-hour snapshot capability have saved you an enormous amount of time, by 
being able to simplify your search (and, yes, you won't have _exactly_ 
the version you want, but odds are you will have something close, and 
you can put all the time you would have spent searching the FV tree into 
restarting work from the snapshot-ed version).


Remember, FV's main audience is going to be naive users, not us 
technical users, who generally have the problem that FV solves under 
control (yes, FV would make it easier for us, but we're not the primary 
target).  Version explosion (and the consequential problem of picking 
the right version to edit) is a huge problem for the naive audience.


Also, a big difference between Snapshots and FV tends to be who controls 
EOL-ing a version/Snapshot.  Snapshots tend to be done by the Admin, and 
their aging strictly controlled and defined (e.g. we keep hourly 
snapshots for 1 week). File versioning is typically under the control 
of the End-User, as its utility is much more nebulously defined.   
Certainly, there is no ability to truncate based on number of versions 
(e.g. we only allow 100 versions to be kept), since the frequency of 
versioning a file varies widely.  Aging on a version is possibly a 
better answer, but this runs into a problem of user education, where we 
have to retrain our users to stop making frequent copies of important 
documents (like they do now, in absence of FV), but _do_ remember to dig 
through the FV archive periodically to save a desirable old copy.   
Also, if  managing FV is to be a User task, how are they to do it over 
NFS/SAMBA?  And, logging into the NFS server to do a cleanup isn't an 
acceptable answer.


Also, FV is only useful for apps which do a close() on a file (or at 
least, I'm assuming we wait for a file to signal that it is closed 
before taking a version - otherwise, we do what? take a version every X 
minutes while the file still open? I shudder to think about the 
implementation of this, and its implications...).  How many apps keep a 
file open for a long period of time?  FV isn't useful to them, only an 
unlimited undo functionality INSIDE the app.


Lastly, consider the additional storage requirement of FV, and exactly 
how much utility you gain for sacrificing disk space.
Look at this scenario:  I'm editing a file, making 1MB of change per 5 
minutes (a likely scenario when actively editing any Office-style 
document), of which only 50% do I actually make permanent (the rest 
being temp edits for ideas I decide to change or throw out).  If I'm 
auto-saving every 5 minutes, that means I use 12MB of version space per 
hour. If I took an hourly snapshot, then I need only 6MB of storage.  The 
situation gets worse, for the primary usefulness of FV is for files 
which are frequently 
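
The arithmetic in the scenario above can be written out as a short
sketch. It assumes, as the scenario does, that each saved version costs
only the data changed since the previous save; the constant names are
made up for this example.

```python
MB_CHANGED_PER_SAVE = 1   # MB of change per auto-save interval
INTERVAL_MINUTES = 5      # auto-save interval
KEPT_FRACTION = 0.5       # share of the edits that end up permanent

saves_per_hour = 60 // INTERVAL_MINUTES
fv_mb_per_hour = saves_per_hour * MB_CHANGED_PER_SAVE
snapshot_mb_per_hour = fv_mb_per_hour * KEPT_FRACTION

print(saves_per_hour, "versions per hour")
print(fv_mb_per_hour, "MB per hour with file versioning")
print(snapshot_mb_per_hour, "MB per hour with one hourly snapshot")
```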

Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Erik Trimble

Chad Leigh -- Shire.Net LLC wrote:
disclaimer:  I have not used zfs snapshots a lot as I am still 
experimenting with zfs, but they appear to be similar to freebsd 
snapshots, with which I am familiar.


The user experience with snapshots, in terms of file versioning (#1, 
#2, maybe #3) is much worse than a true file versioning user 
experience.  People are oriented to their files, not to snapshots.  
And I may not want versioning with all my files (object files etc) 
which you would get with the snapshots.


Chad

You can't turn off and on File Versioning at the file level. At least, I 
can't imagine trying to support (i.e. write) this kind of functionality 
into ZFS.  File Versioning would be a tunable parameter for each 
filesystem.   So, you'd have to store your object files on a different 
filesystem than your code. Which would make snapshots no different than 
FV, w/r/t keeping versions of the code, and not the object files.


-Erik


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Chad Leigh -- Shire.Net LLC


On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:

First of all, let's agree that this discussion of File Versioning  
makes no more reference to its usage as Version Control.  That is,  
we aren't going to talk about it being useful for source code,  
other than in the context where a source code file is a document,  
like any other text document.  File Versioning and Version Control  
are separate things, with different purposes and feature sets.



OK. So, now we're on to FV.  As Nico pointed out, FV is going to  
need a new API.  Using the VMS convention of simply creating file  
names with a version string afterwards is unacceptable, as it  
creates enormous directory pollution,


Assumption, not supported.  Eye of the  beholder.


not to mention user confusion.


Assumption, not supported.


So, FV has to be invisible to non-aware programs.


yes



Now we have a problem:  how do we access FV for non-local (e.g.  
SAMBA/NFS) clients?  Since the VAST majority of usefulness of FV is  
in the network file server arena,


Assumption, and definitely not supported.   It is very useful outside  
of the file sharing arena.



unless we can use FV over the network, it is useless.


Wrong

You can't modify the SMB or NFS protocol (easily or quickly) to add  
FV functionality (look how hard it was to add ACLs to these  
protocols).


About the only way I can think around this problem is to store  
versions in a special subdir of each directory (e.g. .zfs_version),  
which would then be browsable over the network, using tools not  
normally FV-aware.  But this puts us back into the problem of a  
directory which potentially has hundreds or thousands of files.


This directory way of doing it is not a good way.  It fails the ease  
of use to the end user test.


The VMS way is far superior.  The problem is that you have to make  
sure that apps that are not FV aware have no problems, which means  
you cannot just append something to the actual file name. It has to  
be some sort of meta data.




Also, save-early-save-often  results in a version explosion, as  
does auto-save in the app.


Does not have to.  In VMS it is configurable on how many versions you  
want to save before it does an auto purge. A simple purge command  
then cleans things up for you.  Very minimal requirements for  
retraining the user.  Set the default configuration to be a max of  
1 version and you have no problems unless you turn it on.
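
A minimal sketch of a version limit with automatic purging, in the
spirit of the behaviour described above. It is not the VMS implementation;
the function name and the `keep` parameter are made up for this example.

```python
def record_version(versions, new_version, keep):
    """Return the version list after adding `new_version` and purging
    the oldest entries so that at most `keep` versions remain."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    return (versions + [new_version])[-keep:]


history = []
for number in range(1, 6):
    history = record_version(history, number, keep=3)
print(history)  # the three newest versions remain
```

With keep=1 the list never holds more than the current file, which
corresponds to the "max of 1 version" default suggested above.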


While this may indeed mean that you have all of your changes  
around, figuring out which version has them can be massively time- 
consuming.


Your assumption.  (And much less hard than using snapshots).

Let's say you have auto-save set for 5 minutes (very common in MS  
Word). That gives you 12 versions per hour.


So?

If you suddenly decide you want to back up a couple of hours, that  
leaves you with looking at a whole bunch of files, trying to figure  
out which one you want.  E.g. I want a file from about 3 hours ago.  
Do I want the one from 2:45, 2:50, 2:55, 3:00, 3:05, 3:10, or 3:15  
hours ago?


Look at the file create time.  Take a quick look at the contents if  
you are confused.  At least you HAVE the capability to go back.


  And, what if I've mis-remembered, and it really was closer to 4  
hours ago?


Simple file system tools help me find it.

Yes, the data is eventually there. However, wouldn't a 1-hour  
snapshot capability have saved you an enormous amount of time,


No.  Managing the versions is not hard like you say.  I lived on VMS  
for years and it was never a problem.  It is your mindset and your
preconceived notions that are the problem.


by being able to simplify your search (and, yes, you won't have  
_exactly_ the version you want, but odds are you will have  
something close, and you can put all the time you would have spent  
searching the FV tree into restarting work from the snapshot-ed  
version).


I would much rather take an extra 2 minutes futzing around with the  
FV saved versions than trying to recreate what I had done.  And  
snapshots are not user friendly from a UI perspective -- funny  
strange directories and having to dig around in them.




Remember, FV's main audience is going to be naive users, not us  
technical users,


No, it is US technical users as much as the naive user.

who generally have the problem that FV solves under control (yes,  
FV would make it easier for us, but we're not the primary target).


We do?  I have often edited system files and then wanted to go back  
to something I deleted earlier as I realized it was the wrong one.


Version explosion (and the consequential problem of picking the  
right version to edit) is a huge problem for the naive audience.




This statement is naive itself and is unsupportable.  Where are the  
usability tests that support this?  VMS has a LONG HISTORY and is/was  
used by a lot of what you call naive users.  FV never caused any  
problems that I encountered or indeed that DEC encountered as 

Re: [zfs-discuss] A versioning FS

2006-10-06 Thread David Dyer-Bennet

On 10/6/06, Erik Trimble [EMAIL PROTECTED] wrote:

First of all, let's agree that this discussion of File Versioning makes
no more reference to its usage as Version Control.  That is, we aren't
going to talk about it being useful for source code, other than in the
context where a source code file is a document, like any other text
document.  File Versioning and Version Control are separate things, with
different purposes and feature sets.


Hmm; the most important uses of file versioning come, in my opinion,
when working on source code.  But for handling very different
situations than source control does.


OK. So, now we're on to FV.  As Nico pointed out, FV is going to need a
new API.  Using the VMS convention of simply creating file names with a
version string afterwards is unacceptable, as it creates enormous
directory pollution, not to mention user confusion.  So, FV has to be
invisible to non-aware programs.


Strongly disagree, twice.

Having FV invisible to programs not updated to specially support it is
IMHO unacceptable, and would render the feature useless.

I remember it being a bit inconvenient on VMS.  It wasn't on TOPS-20.
I'll have to look into what the TOPS-20 conventions were again (I used
TOPS-20 from 1977 to 1985, but hardly touched it since), but I found
them very friendly and easy to work with, not confusing, etc.  They
weren't *that* different from the VMS approach, but this is probably
one of those situations where tiny tweaks to user interface make a
huge difference to user experience.


Also, FV is only useful for apps which do a close() on a file (or at
least, I'm assuming we wait for a file to signal that it is closed
before taking a version - otherwise, we do what? take a version every X
minutes while the file still open? I shudder to think about the
implementation of this, and its implications...).  How many apps keep a
file open for a long period of time?  FV isn't useful to them, only an
unlimited undo functionality INSIDE the app.


It's the rewrite scenario; when we open or rename a file on top of an
existing file, the new file gets an incremented version number, and
the old file stays around.
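
A toy in-memory model of that rewrite rule, as a sketch only. The class
and method names are made up for this example, and nothing here reflects
how TOPS-20 or VMS actually stored versions.

```python
class VersionedDirectory:
    """Toy directory where writing over a name keeps the old data."""

    def __init__(self):
        self._files = {}  # name -> list of contents, oldest first

    def create(self, name, data):
        """Store `data` under `name` and return its version number (from 1)."""
        versions = self._files.setdefault(name, [])
        versions.append(data)
        return len(versions)

    def versions(self, name):
        """Return the version numbers currently held for `name`."""
        return list(range(1, len(self._files.get(name, [])) + 1))

    def read(self, name, version):
        """Return the contents of one specific version."""
        return self._files[name][version - 1]


directory = VersionedDirectory()
directory.create("notes.txt", "first draft")
latest = directory.create("notes.txt", "second draft")
print(latest, directory.versions("notes.txt"))
```

No new call is needed on the writing side: the application creates the
file as it always did, and only the fate of the old contents changes.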


Lastly, consider the additional storage requirement of FV, and exactly
how much utility you gain for sacrificing disk space.


It was something we could afford, and did afford, on TOPS-20 systems
where having three RP06 disk pack systems (at 200MB each) was
considered rather a lot of storage.  Today it's a complete non-issue.
Disk space is free.


To me, FV is/was very useful in TOPS-20 and VMS, where you were looking
at a system DESIGNED with the idea in mind, already have a user base
trained to use and expect it, and virtually all usage was local (i.e. no
network filesharing). None of this is true in the UNIX/POSIX world.


When TOPS-20 was introduced, essentially nobody was used to file
versioning.  When VMS was introduced, very few people were used to
file versioning (and the TOPS-20 community mostly moved to Unix rather
than VMS).  TOPS-20 wasn't the first system I used (it was the fifth,
I think), or even the first timesharing system (the third, I believe).
File versioning was one of those instant love features; it was
instantly obvious how it worked, how to use it, and how beneficial it
was.

I see network file access as a non-issue; the version gets treated as
part of the file name, just as it did on all the previous systems that
supported file versioning.

I'm *still* not really sure it's actually worth the trouble of adding,
if 5-second snapshots are really feasible.  They're less convenient to
use by quite a bit, but the important use cases arise relatively
rarely, and the value is high when they arise, so that's not *too* big
an issue.  And it adds code complexity and user confusion (I don't
think versioning is terribly complex to understand, but certainly
snapshots plus versioning is more complex than snapshots alone).  But
if people are going to decide against file versioning, I'd prefer it
to be based on a more accurate understanding of how it plays to users
:-).
--
David Dyer-Bennet, mailto:[EMAIL PROTECTED], http://www.dd-b.net/dd-b/
RKBA: http://www.dd-b.net/carry/
Pics: http://www.dd-b.net/dd-b/SnapshotAlbum/
Dragaera/Steven Brust: http://dragaera.info/


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Nicolas Williams
On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net LLC wrote:
 On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:
 OK. So, now we're on to FV.  As Nico pointed out, FV is going to  
 need a new API.  Using the VMS convention of simply creating file  
 names with a version string afterwards is unacceptable, as it  
 creates enormous directory pollution,
 
 Assumption, not supported.  Eye of the  beholder.

No, you really need an API, otherwise you have to guess when to snapshot
versions of files.

 not to mention user confusion.
 
 Assumption, not supported.

Maybe Erik would find it confusing.  I know I would find it _annoying_.

 So, FV has to be invisible to non-aware programs.
 
 yes

Interesting that you agree with this when you disagree with Erik's other
points!  To me this statement implies FV APIs.

 Now we have a problem:  how do we access FV for non-local (e.g.  
 SAMBA/NFS) clients?  Since the VAST majority of usefulness of FV is  
 in the network file server arena,
 
 Assumption, and definitely not supported.   It is very useful outside  
 of the file sharing arena.

I agree with you, and I agree with Erik.  We, Sun engineers that is,
need to look at the big picture, and network access is part of the big
picture.

 unless we can use FV over the network, it is useless.
 
 Wrong

Yes, but we have to provide for it.

 You can't modify the SMB or NFS protocol (easily or quickly) to add  
 FV functionality (look how hard it was to add ACLs to these  
 protocols).
 
 About the only way I can think around this problem is to store  
 versions in a special subdir of each directory (e.g. .zfs_version),  
 which would then be browsable over the network, using tools not  
 normally FV-aware.  But this puts us back into the problem of a  
 directory which potentially has hundreds or thousands of files.
 
 This directory way of doing it is not a good way.  It fails the ease  
 of use to the end user test.

No, it doesn't: it doesn't preclude having FV-aware UIs that make it
easier to access versions.  All Erik's .zfs_version proposal is about is
remote access, not a user interface.

 The VMS way is far superior.  The problem is that you have to make  
 sure that apps that are not FV aware have no problems, which means  
 you cannot just append something to the actual file name. It has to  
 be some sort of meta data.

I.e., APIs.

The big question though is: how to snapshot file versions when they are
touched/created by applications that are not aware of FV?

Certainly not with every write(2).  At fsync(2), close(2), open(2) for
write/append?  What if an application deals in multiple files?  Etc...

Automatically capturing file versions isn't possible in the general case
with applications that aren't aware of FV.

 While this may indeed mean that you have all of your changes  
 around, figuring out which version has them can be massively time- 
 consuming.
 
 Your assumption.  (And much less hard than using snapshots).

I agree that with ZFS snapshots it could be hard to find the file
versions you want.  I don't agree that the same isn't true with FV
*except* where you have FV-aware applications.

 Yes, any time you do a close() or equivalent. The idea is not to  
 implement a universal undo stack.

Or open(2) for write, fsync(2)s, unlinks.  Maybe.  It could work for
some apps and not for others.

(I really wouldn't want building code to result in lots of file versions
of intermediate and end-result files!)
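
To make the trade-off concrete, here is a small sketch that counts how
many versions a trigger policy would create for three made-up system-call
traces. The trigger set and the traces are assumptions for illustration,
not a proposed ZFS setting.

```python
def versions_captured(events, triggers):
    """Count the events in a system-call trace that would create a version."""
    return sum(1 for event in events if event in triggers)


TRIGGERS = {"close", "fsync"}  # hypothetical policy

editor_trace = ["open", "write", "write", "close"]  # save and exit
database_trace = ["open"] + ["write"] * 1000        # never closes the file
build_trace = ["open", "write", "close"] * 50       # 50 object files

editor = versions_captured(editor_trace, TRIGGERS)
database = versions_captured(database_trace, TRIGGERS)
build = versions_captured(build_trace, TRIGGERS)
print(editor, database, build)
```

The same policy gives the editor one useful version, gives the
long-running writer none, and gives the build fifty versions nobody asked
for.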

Nico


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Chad Leigh -- Shire.Net LLC


On Oct 6, 2006, at 3:53 PM, Nicolas Williams wrote:

On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net  
LLC wrote:

On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:

OK. So, now we're on to FV.  As Nico pointed out, FV is going to
need a new API.  Using the VMS convention of simply creating file
names with a version string afterwards is unacceptable, as it
creates enormous directory pollution,


Assumption, not supported.  Eye of the  beholder.


No, you really need an API, otherwise you have to guess when to snapshot
versions of files.


What does "snapshot versions of files" mean?

My line "Assumption, not supported.  Eye of the beholder" was in
reference to "enormous directory pollution".





not to mention user confusion.


Assumption, not supported.


Maybe Erik would find it confusing.  I know I would find it _annoying_.


Then leave it set to 1 version




So, FV has to be invisible to non-aware programs.


yes


Interesting that you agree with this when you disagree with Erik's other
points!  To me this statement implies FV APIs.


It has to do with the implementation details.  I don't know what sort  
of APIs you are saying are  needed.  Maybe they are needed and maybe  
they would be handy. I am not disputing that.


The above should be simple to do however -- a program does an open of  
a file name foo.bar.  ZFS / the file system routine would use the  
most recent version by default if no version info is given.
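
A sketch of that lookup rule, assuming the filesystem keeps a set of
version numbers per file (possibly with gaps after purges). The function
name and its arguments are made up for this example.

```python
def resolve_version(available, requested=None):
    """Pick the version an open() would return.

    `available` is the set of version numbers still stored for the file;
    `requested` is None when the caller gave only the plain file name.
    """
    if not available:
        raise FileNotFoundError("no versions stored")
    if requested is None:
        return max(available)
    if requested not in available:
        raise FileNotFoundError(f"version {requested} not stored")
    return requested


print(resolve_version({3, 4, 7}))     # plain name: newest version
print(resolve_version({3, 4, 7}, 4))  # explicit version
```

A program that knows nothing about versions only ever takes the first
path, so it sees exactly one file per name.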





Now we have a problem:  how do we access FV for non-local (e.g.
SAMBA/NFS) clients?  Since the VAST majority of usefulness of FV is
in the network file server arena,


Assumption, and definitely not supported.   It is very useful outside
of the file sharing arena.


I agree with you, and I agree with Erik.  We, Sun engineers that is,
need to look at the big picture, and network access is part of the big
picture.


Sure




unless we can use FV over the network, it is useless.


Wrong


Yes, but we have to provide for it.


I never said that file sharing is not useful (in this or any
context).  I just said that FV is not useless outside of
over-the-network use.  Even if it did not support file-sharing
scenarios, at least in the beginning, it would still be of great use.
Apache not supporting lockfiles on NFS file systems does not make
Apache or NFS useless; in the same way, FV that is not 100% in every
nook and cranny is not useless.


I would find it of tremendous use just in managing system and  
configuration files.





You can't modify the SMB or NFS protocol (easily or quickly) to add
FV functionality (look how hard it was to add ACLs to these
protocols).

About the only way I can think around this problem is to store
versions in a special subdir of each directory (e.g. .zfs_version),
which would then be browsable over the network, using tools not
normally FV-aware.  But this puts us back into the problem of a
directory which potentially has hundreds or thousands of files.


This directory way of doing it is not a good way.  It fails the ease
of use to the end user test.


No, it doesn't: it doesn't preclude having FV-aware UIs that make it
easier to access versions.  All Erik's .zfs_version proposal is about is
remote access, not a user interface.


one UI is the command line shell




The VMS way is far superior.  The problem is that you have to make
sure that apps that are not FV aware have no problems, which means
you cannot just append something to the actual file name. It has to
be some sort of meta data.


I.e., APIs.


Well, file system level meta data that the file system uses may or  
may not need APIs to expose it -- depends on how the final  
implementation works.  However, I never came out against APIs




The big question though is: how to snapshot file versions when they are
touched/created by applications that are not aware of FV?


Don't use the word snapshot as it may draw in unintended comparisons  
to snapshot features.




Certainly not with every write(2).


no


At fsync(2), close(2), open(2) for
write/append?


probably


What if an application deals in multiple files?


so?


Etc...

Automatically capturing file versions isn't possible in the general case
with applications that aren't aware of FV.


In most cases it is possible.  At worst you make a copy on open and  
work on the copy, making it the most recent version.





While this may indeed mean that you have all of your changes
around, figuring out which version has them can be massively time-
consuming.


Your assumption.  (And much less hard than using snapshots).


I agree that with ZFS snapshots it could be hard to find the file
versions you want.  I don't agree that the same isn't true with FV
*except* where you have FV-aware applications.


How so?  The shell / desktop is enough of a UI to deal with it.




Yes, any time you do a close() or equivalent. The idea is not to
implement a universal undo stack.


Or open(2) for write, fsync(2)s, 

Re: [zfs-discuss] A versioning FS

2006-10-06 Thread David Dyer-Bennet

On 10/6/06, Nicolas Williams [EMAIL PROTECTED] wrote:

On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net LLC wrote:
 On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:
 OK. So, now we're on to FV.  As Nico pointed out, FV is going to
 need a new API.  Using the VMS convention of simply creating file
 names with a version string afterwards is unacceptable, as it
 creates enormous directory pollution,

 Assumption, not supported.  Eye of the  beholder.

No, you really need an API, otherwise you have to guess when to snapshot
versions of files.


First of all, "snapshot versions of files" is a very confusing phrase,
especially in this discussion.  But, if you mean what I think you
mean, then the existing file API gives you all the information you
need.  Whenever you create a new file, you create a new version.  The
only thing that changes is, if an *old* version already exists, it
doesn't get deleted the way it used to.


 Now we have a problem:  how do we access FV for non-local (e.g.
 SAMBA/NFS) clients?  Since the VAST majority of usefulness of FV is
 in the network file server arena,

 Assumption, and definitely not supported.   It is very useful outside
 of the file sharing arena.

I agree with you, and I agree with Erik.  We, Sun engineers that is,
need to look at the big picture, and network access is part of the big
picture.


Yes, I have to agree here also.  So much of people's file access is
over a network these days that a local-only facility isn't very
interesting / useful.


 You can't modify the SMB or NFS protocol (easily or quickly) to add
 FV functionality (look how hard it was to add ACLs to these
 protocols).
 
 About the only way I can think around this problem is to store
 versions in a special subdir of each directory (e.g. .zfs_version),
 which would then be browsable over the network, using tools not
 normally FV-aware.  But this puts us back into the problem of a
 directory which potentially has hundreds or thousands of files.

 This directory way of doing it is not a good way.  It fails the ease
 of use to the end user test.

No, it doesn't: it doesn't preclude having FV-aware UIs that make it
easier to access versions.  All Erik's .zfs_version proposal is about is
remote access, not a user interface.


Requiring special software to access this kind of feature is death.
People don't want to learn new tools; they want to learn existing
tools.  Depending on the user, that's ls, or awk, or grep, or find, or
Emacs dired, or this or that or the other thing.

One of the reasons ZFS snapshots (and other snapshots, in my limited
experience) work easily is that they appear as ordinary files within
the directory structure, and do *not* require special tools to access.


 The VMS way is far superior.  The problem is that you have to make
 sure that apps that are not FV aware have no problems, which means
 you cannot just append something to the actual file name. It has to
 be some sort of meta data.

I.e., APIs.


I don't think I understand the issues being raised here.  My
off-the-cuff impression is that they don't exist at all, or are at
least moderate molehills not mountains.

When writing an application for TOPS-20 or VMS, you didn't have to do
anything to specifically deal with file versioning.  It just worked.
If the user wanted the most recent version of the file, they typed the
name without the version, or else with the most current version.  If
they *did* want an older version, they had to type very slightly more,
by appending the version number.  And (on TOPS-20) of course we had
filename completion and inline help to make it easy to refresh your
memory on what versions existed in the middle of doing this.

So, one small feature built into the filesystem OPEN code: if a
version is not specified for a file, use the most recent version.
NO special code in any application is needed.
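
A minimal sketch of that lookup rule, not TOPS-20's or VMS's actual code. It assumes a hypothetical storage convention in which version N of a file is kept under the name `name;N`; the `resolve` helper and the sample names are illustrative only.

```python
import re

# Hypothetical convention: "name;N" holds version N of "name".
VERSION_RE = re.compile(r"^(?P<base>.+);(?P<ver>\d+)$")

def resolve(requested, stored_names):
    """Return the stored name an open() of `requested` would use, or None."""
    if VERSION_RE.match(requested):
        # An explicit version is used only if it exists.
        return requested if requested in stored_names else None
    versions = []
    for name in stored_names:
        m = VERSION_RE.match(name)
        if m and m.group("base") == requested:
            versions.append(int(m.group("ver")))
    if not versions:
        return None
    # No version given: fall back to the most recent one.
    return f"{requested};{max(versions)}"

stored = {"foo.bar;1", "foo.bar;2", "foo.bar;3", "notes.txt;1"}
print(resolve("foo.bar", stored))    # plain name
print(resolve("foo.bar;2", stored))  # explicit older version
```

The application only ever passes a name; all of the version handling sits in the lookup.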

There are public-access TOPS-20 systems on the net today (I've got an
account on one, though that data is at home and I'm in Palo Alto this
week).  And I've still got the small TOPS-20 system manual (I didn't
keep the big twenty-something volume set though) where I can look up
the details when I'm home.  This technology isn't completely lost yet
:-).
--
David Dyer-Bennet, mailto:[EMAIL PROTECTED], http://www.dd-b.net/dd-b/
RKBA: http://www.dd-b.net/carry/
Pics: http://www.dd-b.net/dd-b/SnapshotAlbum/
Dragaera/Steven Brust: http://dragaera.info/


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Erik Trimble

Chad,

I think our problem is that we look at FV from different angles. I look 
at it from the point of view of people who have NEVER used FV, and you 
look at it from the view of people who have ALWAYS used FV.


For those of us who have never had FV available, technical users have 
used VC tools for important files forever (scripts, config files, etc), 
and will continue to use VC for those purposes, even if FV is 
implemented, as VC has decided advantages for these uses (history, 
management, etc.).   For the technical user, FV is primarily useful for 
when editing documents which were never put under VC in the pre-FV 
era.   This is virtually identical to the usage for naive users. That 
is, FV is highly useful for keeping multiple copies of documents under 
active editing.



In order for an FV implementation to be useful for this stated purpose, 
it must fulfill the following requirements:


(1)  Clean interface for users.  That is, one must NOT be presented with 
a complete list of all versions unless explicitly asked for it, and it 
should be simple to select a version based on some reasonable criteria 
(date of creation/modification, version number, etc.)


(2)  Simple way to decide if a file should be versioned or not. Either 
automatically version all files (or none at all), or provide a mechanism 
to turn FV on/off on a per-file or per-directory basis.


(3)  Network-FS awareness.  Without this, FV is severely limited. Given 
my preconditions above (that is, the current usage pattern of us in the 
non-FV world), limiting FV to those on the local system restricts its 
usefulness to the point where it isn't worth the effort.



So, we have two scenarios for the implementation here:

(a)  FV requires no special API, and all programs using the Filesystem 
automatically have access to versions


(b)  FV uses a new API, so versions are only available to applications 
using the new API



For case (a), you are going to have to store the versions as files 
_somewhere_, in which case you run into the "directory pollution" 
problem I quote (if you store the versions next to the current 
version), or the "where is my version" problem that you quote w/r/t 
snapshots (if you store them elsewhere).


In case (b), you will have to re-write _all_ FS-access apps to make them 
FV-aware, in the same manner work had to be done to make apps 
ACL-aware.  And, to get requirement (3) above, you have to modify the 
network FS protocols to support the API calls.



Also, regardless of which implementation mechanism you use (a) or (b), 
you will need some sort of tool to indicate which files are to be 
versioned (to satisfy requirement (2) above), how many versions are to 
be kept, and other FV administration utilities.  These tools will all 
need to be netFS-aware/usable.



Disk space consumption is NOT irrelevant. Else, why is there so much 
concern around the ZFS compression project?  Disk is NOT cheap - on the 
desktop, yes, but I'm sorry, networked disk systems are not really 
cheap, and tape archivers less so.  Allocating several GB of disk space 
per end-user is not uncommon, so 1000 users requires multi-terabyte 
systems just for normal storage (i.e. no 
backups/versions/snapshots/archives).  Take a look at what a typical 
system costs:  $10+/GB for workgroup-level storage (Sun 3510FC class, 
1-20TB), $30+/GB for nice mid-level SAN storage arrays (Sun 6920-class, 
10TB).  If I have to increase my storage requirements 25-50% for FV, 
most of which is unused versions, this is a decidedly non-trivial 
amount.  This applies as well to the 5-second snapshot proposal.
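
To put rough numbers on this, here is a small worked calculation. The per-GB prices and the 25-50% overhead come from the paragraph above; the figure of 5 GB per user is an assumption standing in for "several GB", and the prices are the lower bounds quoted, so treat the output as an order-of-magnitude illustration only.

```python
USERS = 1000
GB_PER_USER = 5  # assumption: "several GB" taken as 5 GB
# Dollars per GB, lower bounds quoted in the text.
COST_PER_GB = {"workgroup": 10, "mid-level SAN": 30}

base_gb = USERS * GB_PER_USER

def extra_cost(overhead, cost_per_gb):
    """Cost of the additional space that file versions would consume."""
    return base_gb * overhead * cost_per_gb

for tier, price in COST_PER_GB.items():
    low = extra_cost(0.25, price)
    high = extra_cost(0.50, price)
    print(f"{tier}: ${low:,.0f} to ${high:,.0f} extra for versions")
```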



For source code, FV isn't really needed - the problem has already been 
solved.  If your particular VC/editor/IDE doesn't handle the problem 
correctly, then switch.  There are many VC and IDE combinations on all 
platforms which provide a solution to the same problem FV solves.  
Mercurial, RationalRose, BitKeeper, Git, and others on the VC side; 
NetBeans, CodeWarrior, Visual Studio, and even Emacs can be configured 
to handle the problem on the IDE side.



-Erik




Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Nicolas Williams
On Fri, Oct 06, 2006 at 04:06:37PM -0600, Chad Leigh -- Shire.Net LLC wrote:
 On Oct 6, 2006, at 3:53 PM, Nicolas Williams wrote:
 On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net  
 LLC wrote:
 On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:
 OK. So, now we're on to FV.  As Nico pointed out, FV is going to
 need a new API.  Using the VMS convention of simply creating file
 names with a version string afterwards is unacceptable, as it
 creates enormous directory pollution,
 
 Assumption, not supported.  Eye of the  beholder.
 
 No, you really need an API, otherwise you have to guess when to  
 snapshot
 versions of files.
 
 What does snapshot versions of files mean?

The act of creating file versions ala VMS.

 My line "Assumption, not supported.  Eye of the beholder" was in
 reference to "enormous directory pollution"

Ah.  'Twasn't clear.

 
 not to mention user confusion.
 
 Assumption, not supported.
 
 Maybe Erik would find it confusing.  I know I would find it  
 _annoying_.
 
 Then leave it set to 1 version

Per-directory?  Per-filesystem?

 
 So, FV has to be invisible to non-aware programs.
 
 yes
 
 Interesting that you agree with this when you disagree with Erik's  
 other
 points!  To me this statement implies FV APIs.
 
 It has to do with the implementation details.  I don't know what sort  
 of APIs you are saying are  needed.  Maybe they are needed and maybe  
 they would be handy. I am not disputing that.
 
 The above should be simple to do however -- a program does an open of  
 a file name foo.bar.  ZFS / the file system routine would use the  
 most recent version by default if no version info is given.

How can version information be given without changing the APIs or
putting the version number/string into the file name?

Putting the version number/string into the file name is hard for me to
accept.  It's what would lead to polluting my directories.

Now, if the default is 1 version (i.e., keep the current version only),
then I might live with it because I'd never change that setting.

But if we don't encode the version number/string in the file name and
instead enhance APIs and UIs so that by default I can keep N>1 versions
without them polluting my directories, THEN I would set N>1.

 one UI is the command line shell

Indeed!  And command-line tools, like ls(1), find(1), etc...

What I'm saying is that I'd like to be able to keep multiple versions of
my files without echo * or ls showing them to me by default.

I'd like an option for ls(1), find(1) and friends to show file versions,
and a way to copy (or, rather, un-hide) selected file versions so that
I could now refer to them as usual -- when I do this I don't care to see
version numbers in the file name, I just want to give them names.
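
A minimal sketch of that listing behaviour, assuming versions are kept as metadata beside each name rather than inside it. The `list_names` helper and the dictionary layout are hypothetical; no real ls(1) option is implied.

```python
def list_names(directory, show_versions=False):
    """directory maps each file name to its list of version numbers."""
    if not show_versions:
        # Default view: one entry per file, versions stay hidden.
        return sorted(directory)
    return sorted(
        f"{name} (version {v})"
        for name, versions in directory.items()
        for v in versions
    )

directory = {"a.txt": [1, 2], "b.txt": [1], "README": [1]}
print(list_names(directory))                      # what echo * or ls would show
print(list_names(directory, show_versions=True))  # only on request
```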

And, maybe, I'd like a way to write globs that match file versions
(think of extended globbing, as in KSH).

GUIs would, presumably, have a way to show/hide file versions, search for
them, select them, etc...

 Certainly not with every write(2).
 
 no

Good.

 At fsync(2), close(2), open(2) for
 write/append?
 
 probably

Which?

 What if an application deals in multiple files?
 
 so?

So, file versions aren't useful unless the application explicitly
tells the OS when to make them.

Similarly with applications that keep files open but keep writing
transactions in ways that the OS can't isolate without input from the
app.  E.g., databases.  fsync(2) helps here, but lots and lots of
fsync(2)s would result in no useful versioning.

Nico
-- 


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread David Dyer-Bennet

On 10/6/06, Nicolas Williams [EMAIL PROTECTED] wrote:

On Fri, Oct 06, 2006 at 04:06:37PM -0600, Chad Leigh -- Shire.Net LLC wrote:
 On Oct 6, 2006, at 3:53 PM, Nicolas Williams wrote:



 Maybe Erik would find it confusing.  I know I would find it
 _annoying_.

 Then leave it set to 1 version

Per-directory?  Per-filesystem?


Whatever.  What's the actual issue here?

I don't recall that on TOPS-20 it was possible to not version.  What
you could do is set your logout.cmd file to purge your space down to
one copy when you logged out.
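
As a rough illustration of such a purge (not the TOPS-20 command itself), the sketch below trims every file down to its newest `keep` versions. The `purge` name and the name-to-versions mapping are assumptions made for the example.

```python
def purge(directory, keep=1):
    """Return a copy of `directory` with only the newest `keep` versions per file."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    return {name: sorted(versions)[-keep:] for name, versions in directory.items()}

home = {"thesis.txt": [1, 2, 3, 4], "todo.txt": [7]}
print(purge(home))          # what a logout script keeping one copy would leave
print(purge(home, keep=2))  # a slightly more generous policy
```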

This worked fine for the users I knew; even on a system that didn't
have as much as a gigabyte of disk storage total to support a few
dozen software engineers.


 The above should be simple to do however -- a program does an open of
 a file name foo.bar.  ZFS / the file system routine would use the
 most recent version by default if no version info is given.

How can version information be given without changing the APIs or
putting the version number/string into the file name?


The version number is part of the file name in all the examples I know
about.  I'd find it useless without that; it has to be a real part of
the filesystem, usable by everybody, not a special addon accessible
only with one or two dedicated applications.


Putting the version number/string into the file name is hard for me to
accept.  It's what would lead to polluting my directories.


Set your ls default to not show versions.  Isn't the problem then
solved?  Maybe add that option to the GUI filesystem explorer as well.

In practice, it never was a problem that I noticed, or that other
people noticed.  And remember that this was on slower systems with
smaller screens and often rather slower screen update.

Do you not like the idea based on theory, or did you actually use
TOPS-20 for a while and find the versioning troublesome?


 one UI is the command line shell

Indeed!  And command-line tools, like ls(1), find(1), etc...

What I'm saying is that I'd like to be able to keep multiple versions of
my files without echo * or ls showing them to me by default.


And I find that completely unacceptable; useless.  The whole point of
putting versioning in the filesystem is that that makes it accessible
to all programs.


 What if an application deals in multiple files?

 so?

So, file versions aren't useful unless the application explicitly
tells the OS when to make them.


File versions are created when a file is created.  In the scenario
where, today, an existing file would be overwritten (deleted), instead
the old file is kept and the new file is given the version number +1
of the old file.


Similarly with applications that keep files open but keep writing
transactions in ways that the OS can't isolate without input from the
app.  E.g., databases.  fsync(2) helps here, but lots and lots of
fsync(2)s would result in no useful versioning.


None of those are candidates for file versioning, and a darned good thing, too.
--
David Dyer-Bennet, mailto:[EMAIL PROTECTED], http://www.dd-b.net/dd-b/
RKBA: http://www.dd-b.net/carry/
Pics: http://www.dd-b.net/dd-b/SnapshotAlbum/
Dragaera/Steven Brust: http://dragaera.info/


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Joseph Mocker

Nicolas Williams wrote:


On Fri, Oct 06, 2006 at 03:30:20PM -0600, Chad Leigh -- Shire.Net LLC wrote:
 


On Oct 6, 2006, at 3:08 PM, Erik Trimble wrote:
   

OK. So, now we're on to FV.  As Nico pointed out, FV is going to  
need a new API.  Using the VMS convention of simply creating file  
names with a version string afterwards is unacceptable, as it  
creates enormous directory pollution,
 


Assumption, not supported.  Eye of the  beholder.
   



No, you really need an API, otherwise you have to guess when to snapshot
versions of files.
 

David Dyer-Bennet's post gives a hint of how this could be done without 
any API. Simply augment a few system calls like open(), unlink(), etc. 
Calls that can potentially change files. Since you can't change a file 
unless it is open()'ed with various write flags like O_WRONLY, O_RDWR, etc, 
this could be an ideal place to create the version.


One could probably write a poor man's FV LD_PRELOAD library to do this 
without the filesystem's knowledge at all.
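
The idea can be sketched without LD_PRELOAD at all. The Python wrapper below stands in for an interposed open(): before a file is opened in a writing mode, the existing contents are copied aside. The `versioned_open` name and the `path;N` naming of the copies are hypothetical, and a real interposer would be a C shared library wrapping open(2).

```python
import os
import shutil
import tempfile

def versioned_open(path, mode="r"):
    """Open `path`; before any writing mode, keep a copy of the old file.

    Copies go to a hypothetical "<path>;N" name, N counting up from 1.
    """
    writing = any(flag in mode for flag in "wa+")
    if writing and os.path.exists(path):
        n = 1
        while os.path.exists(f"{path};{n}"):
            n += 1
        # Whole-file copy: simple, but wasteful compared with what a
        # copy-on-write filesystem could do internally.
        shutil.copy2(path, f"{path};{n}")
    return open(path, mode)

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "notes.txt")
for text in ("first", "second", "third"):
    with versioned_open(target, "w") as f:
        f.write(text)
print(sorted(os.listdir(workdir)))
```

Every open for write costs a full copy here, which is the space inefficiency noted below.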


It wouldn't be as efficient with space as could be done at the 
filesystem level, but as someone said, disk is cheap.




Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Joseph Mocker

Nicolas Williams wrote:



The big question though is: how to snapshot file versions when they are
touched/created by applications that are not aware of FV?

Certainly not with every write(2).  At fsync(2), close(2), open(2) for
write/append?  What if an application deals in multiple files?  Etc...

Automatically capturing file versions isn't possible in the general case
with applications that aren't aware of FV.
 

Don't snapshots have the same problem?  A snapshot could potentially be 
taken when a file is partially written or updated, no?


For example, I start to write a large file, zfs's buffers fill up and it 
flushes them to disk during the middle of the file I'm writing. If a 
snapshot came along at about the same time, the file would be 
incomplete/corrupt, no?





Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Erik Trimble

David Dyer-Bennet wrote:

On 10/6/06, Nicolas Williams [EMAIL PROTECTED] wrote:


 Maybe Erik would find it confusing.  I know I would find it
 _annoying_.

 Then leave it set to 1 version

Per-directory?  Per-filesystem?


Whatever.  What's the actual issue here?

I don't recall that on TOPS-20 it was possible to not version.  What
you could do is set your logout.cmd file to purge your space down to
one copy when you logged out.
But see, that assumes you have a logout-type functionality to use. Which 
indeed is possible for command-line usage, but then only in a very 
limited way.   During a typical session, I access almost 20 NFS-mounted 
directories. And anyone using autofs/automount trees gets even more. 
You're saying that my logout script has to know about all of them to 
keep things clean?  That's unrealistic.  And that still doesn't solve 
the problem of people who use SAMBA or NFS from machines which don't 
have an interactive shell logout system (i.e. Windows).



This worked fine for the users I knew; even on a system that didn't
have as much as a gigabyte of disk storage total to support a few
dozen software engineers.

The problem is we are comparing apples to oranges in user bases here. 
TOPS-20 systems had a couple of dozen users (or, at most, a few 
hundred).  VMS only slightly more.  UNIX/POSIX systems have 10s of 
thousands.  Plus, the number of files being created under typical modern 
systems is at least two (and probably three or four) orders of magnitude 
greater.  I've got 100,000 files under /usr in Solaris, and almost 1,000 
under my home directory.  And I don't have anything significant in my 
/home (no source code, no build/test trees, just misc business stuff).   
What is manageable with a few files quickly becomes unwieldy with more 
than a few dozen.


This is what Nico and I are talking about:  if you turn on file 
versioning automatically (even for just a directory, and not a whole 
filesystem), the number of files being created explodes geometrically.



 The above should be simple to do however -- a program does an open of
 a file name foo.bar.  ZFS / the file system routine would use the
 most recent version by default if no version info is given.

How can version information be given without changing the APIs or
putting the version number/string into the file name?


The version number is part of the file name in all the examples I know
about.  I'd find it useless without that; it has to be a real part of
the filesystem, usable by everybody, not a special addon accessible
only with one or two dedicated applications.


Putting the version number/string into the file name is hard for me to
accept.  It's what would lead to polluting my directories.


Set your ls default to not show versions.  Isn't the problem then
solved?  Maybe add that option to the GUI filesystem explorer as well.

But this requires modifying all the relevant apps, which is the same 
amount of work as modifying them to use a new FV API.  It's not 
transparent to the end-user.



In practice, it never was a problem that I noticed, or that other
people noticed.  And remember that this was on slower systems with
smaller screens and often rather slower screen update.

Do you not like the idea based on theory, or did you actually use
TOPS-20 for a while and find the versioning troublesome?

Putting the file version number as part of the file name breaks things. 
Apps unaware of the special significance of this format will tend to 
write similar names, which can screw everything royally. 


Example:

Say we use file;version

In emacs, I edit FOO;2

it will write out a temp file FOO;2~.  So, how does the FS deal with 
this the next time it needs to create a new version?


The problem lies in that under VMS, the ';' was a special character, and 
unusable in normal naming. I suspect a similar situation exists under 
TOPS-20.  No such luck in a POSIX filesystem - all printable (and many 
unprintable) characters are valid for use in filenames. So you _CAN'T_ 
use them to delineate File Versioning, without risking blowing the 
entire scheme when some random app decides to either use your FV marker 
for its own needs, or something similar to the emacs case above.
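
The ambiguity can be shown with a small parser, assuming the `file;version` convention from the example above. The `split_version` helper and the sample names are illustrative; no real filesystem parses names this way.

```python
import re

# A name ending in ";<digits>" is treated as a version suffix.
VERSION_RE = re.compile(r"^(?P<base>.+);(?P<ver>\d+)$")

def split_version(name):
    """Split "base;N" into (base, N); any other name is unversioned."""
    m = VERSION_RE.match(name)
    if m:
        return m.group("base"), int(m.group("ver"))
    return name, None

# A stored version of FOO.
print(split_version("FOO;2"))
# An editor backup of that version: parsed as a brand-new, unversioned file.
print(split_version("FOO;2~"))
# An application's own file name: misread as version 2006 of "budget".
print(split_version("budget;2006"))
```

Because POSIX reserves no such character, the filesystem cannot tell the second and third cases apart from deliberate use of the marker.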





 one UI is the command line shell

Indeed!  And command-line tools, like ls(1), find(1), etc...

What I'm saying is that I'd like to be able to keep multiple versions of
my files without echo * or ls showing them to me by default.


And I find that completely unacceptable; useless.  The whole point of
putting versioning in the filesystem is that that makes it accessible
to all programs.

But, because of the explosion in the number of files, you CAN'T 
automatically show all versions. Users will NEVER accept this. The only 
clean way to do this is to show file versions only upon request. Not by 
default.




 What if an application deals in multiple files?

 so?

So, file versions aren't useful unless the application explicitly
tells the OS 

Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Chad Leigh -- Shire.Net LLC


On Oct 6, 2006, at 7:33 PM, Erik Trimble wrote:



This is what Nico and I are talking about:  if you turn on file  
versioning automatically (even for just a directory, and not a  
whole filesystem), the number of files being created explodes  
geometrically.


But it doesn't.  Unless you are editing geometrically more files.

Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net







Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Erik Trimble

Joseph Mocker wrote:

Nicolas Williams wrote:



The big question though is: how to snapshot file versions when they are
touched/created by applications that are not aware of FV?

Certainly not with every write(2).  At fsync(2), close(2), open(2) for
write/append?  What if an application deals in multiple files?  Etc...

Automatically capturing file versions isn't possible in the general case
with applications that aren't aware of FV.
 

Don't snapshots have the same problem?  A snapshot could potentially be 
taken when a file is partially written or updated, no?


For example, I start to write a large file, zfs's buffers fill up and 
it flushes them to disk during the middle of the file I'm writing. If 
a snapshot came along at about the same time, the file would be 
incomplete/corrupt, no?


The developers can answer this definitively, but I believe the answer to 
your questions is NO.  That is, if there is anything in the buffer 
waiting to be written when a snapshot request comes along, the buffer is 
written out so that the file is consistent with the last write().  So, 
snapshotting should NEVER cause a file corruption in this manner. That 
said, if you are doing the following:


1. App issues write() for data A
2. snapshot request
3. App issues write for data B

Then yes, the snapshot file will only contain data A, and not data B, 
which might lead to an inconsistency in the app's behavior, if both A 
and B were important to be written together.  But if that were the case, 
then the app should have written A and B atomically.
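
The three-step sequence above can be mimicked with a toy in-memory file. This is only a model of the ordering, not of how ZFS implements snapshots; `ToyFile` is a hypothetical class made up for the illustration.

```python
class ToyFile:
    """In-memory stand-in for a file that can be snapshotted."""

    def __init__(self):
        self.data = b""

    def write(self, chunk):
        self.data += chunk

    def snapshot(self):
        # A snapshot is a frozen copy of whatever has been written so far.
        return bytes(self.data)

f = ToyFile()
f.write(b"A")        # 1. app issues write() for data A
snap = f.snapshot()  # 2. snapshot request
f.write(b"B")        # 3. app issues write() for data B

print(snap)    # the snapshot holds A only
print(f.data)  # the live file holds A and B
```

The snapshot is internally consistent, but it captures a state the application may never have intended to be visible on its own.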


So, if you are writing to a file, it works better to write everything at 
once in a stream, rather than a character (or byte) at a time. :-)


-Erik


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Chad Leigh -- Shire.Net LLC


On Oct 6, 2006, at 7:33 PM, Erik Trimble wrote:


David Dyer-Bennet wrote:

On 10/6/06, Nicolas Williams [EMAIL PROTECTED] wrote:


 Maybe Erik would find it confusing.  I know I would find it
 _annoying_.

 Then leave it set to 1 version

Per-directory?  Per-filesystem?


Whatever.  What's the actual issue here?

I don't recall that on TOPS-20 it was possible to not version.  What
you could do is set your logout.cmd file to purge your space down to
one copy when you logged out.
But see, that assumes you have a logout-type functionality to use.  
Which indeed is possible for command-line usage, but then only in a  
very limited way.   During a typical session, I access almost 20  
NFS-mounted directories. And anyone using autofs/automount trees  
gets even more. You're saying that my logout script has to know  
about all of them to keep things clean?  That's unrealistic.


It is up to you to come up with a scheme to keep things clean, the  
same way you do now anyway (downloads, etc),


  And that still doesn't solve the problem of people who use SAMBA  
or NFS from machines which don't have an interactive shell logout  
system (i.e. Windows).


It is still mounted on their desktops and they can still delete files  
with FV the same way they do now


No real issue.




This worked fine for the users I knew; even on a system that didn't
have as much as a gigabyte of disk storage total to support a few
dozen software engineers.

The problem is we are comparing apples to oranges in user bases  
here. TOPS-20 systems had a couple of dozen users (or, at most, a  
few hundred).  VMS only slightly more.  UNIX/POSIX systems have 10s  
of thousands.


Rarely.  Most of them have in the same range as VMS now or then.

Plus, the number of files being created under typical modern  
systems is at least two (and probably three or four) orders of  
magnitude greater.  I've got 100,000 files under /usr in Solaris,


so?  You are not editing these are you?


and almost 1,000 under my home directory.


again, FV only matters when you edit them

And I don't have anything significant in my /home (no source code,  
no build/test trees, just misc business stuff).   What is manageable  
with a few files quickly becomes unwieldy with more than a few dozen.


I think you admitted you had not used FV before.  Is that the case?   
Then how can you speak about what becomes unwieldy?


FV is not any more unwieldy with 1000 files in a dir than with 10.   
Most people are not editing the 1000 files sitting in their directory.




This is what Nico and I are talking about:  if you turn on file  
versioning automatically (even for just a directory, and not a  
whole filesystem), the number of files being created explodes  
geometrically.


Again, it does not.  Files are only versioned when they are edited.



 The above should be simple to do however -- a program does an open of
 a file name foo.bar.  ZFS / the file system routine would use the
 most recent version by default if no version info is given.

How can version information be given without changing the APIs or
putting the version number/string into the file name?


The version number is part of the file name in all the examples I know
about.  I'd find it useless without that; it has to be a real part of
the filesystem, usable by everybody, not a special addon accessible
only with one or two dedicated applications.

Putting the version number/string into the file name is hard for me to
accept.  It's what would lead to polluting my directories.


Set your ls default to not show versions.  Isn't the problem then
solved?  Maybe add that option to the GUI filesystem explorer as well.


But this requires modifying all the relevant apps, which is the  
same amount of work as modifying them to use a new FV API.  It's  
not transparent to the end-user.


Because the semantics of a file name are different on a unix/posix  
system than they are on a VMS or TOPS-20 system, which had more  
structured filenames.  I would say that the version cannot be an  
actual part of the file name but would have to be meta data.   
However, it could display as part of the file name and the underlying  
system can be made to do the right thing


ie,

foo gets you the latest foo

Specifically entering in  foo;7 gets you version 7 or the latest if  
there are less than 7 versions available.  The app can think of it as  
being part of the file name, but the underlying system would have to  
know how to do the right thing in extracting the version out and  
making it meta data.  Takes some thinking and I am not claiming to  
have all the answers right now, but hardly undoable.


No app changes are necessary.
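
One way to read that rule is sketched below. The `pick_version` helper is hypothetical, and the handling of an older version that has already been purged is an assumption, since the text above does not cover that case.

```python
def pick_version(requested, available):
    """Choose a version number.

    requested: int, or None when the name carried no version.
    available: non-empty list of existing version numbers.
    """
    latest = max(available)
    if requested is None or requested > latest:
        # No version given, or one beyond the newest: use the latest.
        return latest
    if requested in available:
        return requested
    # Assumption: an older version that was purged is simply not found.
    raise FileNotFoundError(f"version {requested} no longer exists")

existing = [3, 4, 5]  # versions 1 and 2 were purged earlier
print(pick_version(None, existing))  # "foo"
print(pick_version(7, existing))     # "foo;7" with fewer than 7 versions
print(pick_version(4, existing))     # "foo;4"
```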





In practice, it never was a problem that I noticed, or that other
people noticed.  And remember that this was on slower systems with
smaller screens and often rather slower screen update.

Do you not like the idea based on theory, or did you actually use
TOPS-20 for a while and find the 

Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Richard Elling - PAE

Erik Trimble wrote:
The problem is we are comparing apples to oranges in user bases here. 
TOPS-20 systems had a couple of dozen users (or, at most, a few 
hundred).  VMS only slightly more.  UNIX/POSIX systems have 10s of 
thousands.  


IIRC, I had about a dozen files under VMS, not counting versions.

Plus, the number of files being created under typical modern 
systems is at least two (and probably three or four) orders of magnitude 
greater.  I've got 100,000 files under /usr in Solaris, and almost 1,000 
under my home directory.  


wimp :-)  I count 88,148 in my main home directory.  I'll bet just
running gnome and firefox will get you in the ballpark of 1,000 :-/
 -- richard


Re: [zfs-discuss] A versioning FS

2006-10-06 Thread Chad Leigh -- Shire.Net LLC


On Oct 6, 2006, at 10:18 PM, Richard Elling - PAE wrote:


Erik Trimble wrote:
The problem is we are comparing apples to oranges in user bases  
here. TOPS-20 systems had a couple of dozen users (or, at most, a  
few hundred).  VMS only slightly more.  UNIX/POSIX systems have  
10s of thousands.


IIRC, I had about a dozen files under VMS, not counting versions.


You mean in your system?  There was a lot more than that...



Plus, the number of files being created under typical  
modern systems is at least two (and probably three or four) orders  
of magnitude greater.  I've got 100,000 files under /usr in  
Solaris, and almost 1,000 under my home directory.


wimp :-)  I count 88,148 in my main home directory.  I'll bet just
running gnome and firefox will get you in the ballpark of 1,000 :-/


None (well, maybe 1 or 2) of which you edit, and which hence would not
generate versions.


Chad

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net







Re: [zfs-discuss] A versioning FS

2006-10-05 Thread Brian Hechinger
On Thu, Oct 05, 2006 at 11:19:19AM -0700, David Dyer-Bennet wrote:
 On 10/5/06, Jeremy Teo [EMAIL PROTECTED] wrote:
 What would a version FS buy us that cron+ zfs snapshots doesn't?
 
 Finer granularity; no chance of missing a change.
 
 TOPS-20 did this, and it was *tremendously* useful.  Snapshots, source
 control, and other alternatives aren't, in fact, alternatives.
 They're useful in and of themselves, very useful indeed, but they
 don't address the same needs as versioning.

VMS _still_ does this, and it's one of my favorite features of the OS.

-brian


Re: [zfs-discuss] A versioning FS

2006-10-05 Thread Richard Elling - PAE

Brian Hechinger wrote:

On Thu, Oct 05, 2006 at 11:19:19AM -0700, David Dyer-Bennet wrote:

On 10/5/06, Jeremy Teo [EMAIL PROTECTED] wrote:

What would a version FS buy us that cron+ zfs snapshots doesn't?

Finer granularity; no chance of missing a change.

TOPS-20 did this, and it was *tremendously* useful.  Snapshots, source
control, and other alternatives aren't, in fact, alternatives.
They're useful in and of themselves, very useful indeed, but they
don't address the same needs as versioning.


VMS _still_ does this, and it's one of my favorite features of the OS.


It is a real PITA if you are unfortunate enough to use quotas :-(
 -- richard


Re: [zfs-discuss] A versioning FS

2006-10-05 Thread Casper . Dik

Brian Hechinger wrote:
 On Thu, Oct 05, 2006 at 11:19:19AM -0700, David Dyer-Bennet wrote:
 On 10/5/06, Jeremy Teo [EMAIL PROTECTED] wrote:
 What would a version FS buy us that cron+ zfs snapshots doesn't?
 Finer granularity; no chance of missing a change.

 TOPS-20 did this, and it was *tremendously* useful.  Snapshots, source
 control, and other alternatives aren't, in fact, alternatives.
 They're useful in and of themselves, very useful indeed, but they
 don't address the same needs as versioning.
 
 VMS _still_ does this, and it's one of my favorite features of the OS.

It is a real PITA if you are unfortunate enough to use quotas :-(

It's one of the things I hated about VMS; so I quickly wrote a script
which on logout purged all extra copies and renamed all files back to
*;1.

Casper


Re: [zfs-discuss] A versioning FS

2006-10-05 Thread David Dyer-Bennet

On 10/5/06, Erik Trimble [EMAIL PROTECTED] wrote:


Doing versioning at the file-system layer allows block-level changes to
be stored, so it doesn't consume enormous amounts of extra space. In
fact, it's more efficient than any versioning software (CVS, SVN,
teamware, etc) for storing versions.


Comparing to cvs/svn misses the point; as I said, they address
completely different needs.


However, there are three BIG drawbacks for using versioning in your FS
(that assumes that it is a tunable parameter and can be turned off for a
FS when not desired):

(1)  File listing semantics become a bit of a mess.  VMS stores versions
as filename;version.  That is, it uses the semi-colon as a
divider.  Now, I'm not at all sure how we can make ZFS POSIX-compliant
and still do something like this.  Versioning filesystems tend to be a
complete mess - it is hard to present usable information about which
versions are available, and at the same time keep things clean. Even
keeping versions in a hidden dir (say .zfs_versions) in each directory
still leaves that directory filled with a huge mess of files.


Complete mess is certainly not my experience (I worked with TOPS-20
from 1977 to 1985 and VMS from 1979 to 1985).  The key is that you
need to *clean up*; specifically, you need to use the command which
deletes all but the most recent copy of each file in a directory at
the end of pretty much each work session.

It's trivial to present information on which versions are available;
you simply list each one as a file, which has the date info any file
has, and the version number.
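
As a rough illustration of the session-end cleanup described here, the sketch below computes which entries such a command would delete. It is plain Python over a hypothetical listing of `name;version` strings, not the actual TOPS-20 or VMS command, and the `keep` parameter is an assumption added for the example.

```python
from collections import defaultdict


def purge_plan(entries, keep=1):
    """Return the 'name;version' entries a purge would delete.

    The newest `keep` versions of each file survive; everything
    older is listed for deletion.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    by_name = defaultdict(list)
    for entry in entries:
        base, _, version = entry.rpartition(";")
        by_name[base].append(int(version))
    doomed = []
    for base, versions in sorted(by_name.items()):
        for v in sorted(versions)[:-keep]:
            doomed.append(f"{base};{v}")
    return doomed


listing = ["a.txt;1", "a.txt;3", "a.txt;2", "b.c;5"]
print(purge_plan(listing))
```

Listing the available versions is the same grouping step without the deletion: each version is just another directory entry with its own date and number.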


(2)  File Versioning is no replacement for source code control, as you
miss all the extra features (tagging, branching, comments, etc) that go
with a file version check-in.


It's very definitely not an alternative or replacement for source code
control, no.  It provides a very useful feature to use *alongside*
source control.  Source code control is also not a replacement for
file versioning (I end up creating spare copies of files with funny
names for things I'd otherwise get from versioning; and I end up
losing time through not having thought to create such a file, whereas
versioning is automatic).


(3)  Many apps continuously save either temp copies or actual copies of
the file you are working on. This leads to a version explosion, where
you end up with 100s of versions of a commonly used file.  This tends to
be worse than useless, as people have an incredibly hard time figuring
out which (older) version they might actually want to look at.  And,
this problem ISN'T ever going to go away, as it would require apps to
understand filesystem features for ZFS, which isn't going to happen.


Files treated that way are often deleted at the end of the session
automatically, so no problem there.  Or else they'll be cleaned up
when you do your session-end cleanup.  What the heck was that command
on TOPS-20 anyway?  Maybe purge?  Sorry, 20-year-old memories are
fuzzy on some details.

File versioning worked a lot better on TOPS-20 than on VMS, as I
remember it.  The facility looked the same, but actually working with
it was much cleaner and easier.

Making it somewhat controllable would be useful.  Starting with maybe
an inheritable default, so some directory trees could be set not to
version.
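
One way to picture an inheritable default is a lookup that walks from a file up through its parent directories and uses the nearest explicit setting. The sketch below assumes a hypothetical `settings` mapping from directory path to on/off; it is not an existing ZFS property or API.

```python
from pathlib import PurePosixPath


def versioning_enabled(path, settings, default=True):
    """Return the nearest explicit setting on `path` or an ancestor."""
    p = PurePosixPath(path)
    for candidate in (p, *p.parents):
        key = str(candidate)
        if key in settings:
            return settings[key]
    return default


# Hypothetical configuration: version the home tree, but not its tmp dir.
settings = {"/home/u": True, "/home/u/tmp": False}
print(versioning_enabled("/home/u/src/a.c", settings))
print(versioning_enabled("/home/u/tmp/x.o", settings))
print(versioning_enabled("/usr/lib/libc.so", settings, default=False))
```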


I'd discourage File Versioning at this late stage in UNIX.  Source Code
control systems fulfill the need for serious uses, and casual usage is
obviated by the mantra of save early, save often that has been beaten
into the userbase. Trying to change that is a recipe for disaster.


Actually, save early and often is exactly why versioning is
important.  If you discover you've gone down a blind alley in some
code, it makes it easy to get back to the earlier spots.  This, in my
experience, happens at a detail level where you won't (in fact can't)
be doing checkins to version control.
--
David Dyer-Bennet, mailto:[EMAIL PROTECTED], http://www.dd-b.net/dd-b/
RKBA: http://www.dd-b.net/carry/
Pics: http://www.dd-b.net/dd-b/SnapshotAlbum/
Dragaera/Steven Brust: http://dragaera.info/


Re: [zfs-discuss] A versioning FS

2006-10-05 Thread Brian Hechinger
On Thu, Oct 05, 2006 at 04:08:13PM -0700, David Dyer-Bennet wrote:

 when you do your session-end cleanup.  What the heck was that command
 on TOPS-20 anyway?  Maybe purge?  Sorry, 20-year-old memories are
 fuzzy on some details.

It's PURGE under VMS, so knowing DEC, it was named PURGE under TOPS-20
as well.

Hmm, gotta get the DECsystem-2020 powered up one of these days.

-brian


Re: [zfs-discuss] A versioning FS

2006-10-05 Thread Erik Trimble
On Thu, 2006-10-05 at 16:08 -0700, David Dyer-Bennet wrote:
 On 10/5/06, Erik Trimble [EMAIL PROTECTED] wrote:
 
  Doing versioning at the file-system layer allows block-level changes to
  be stored, so it doesn't consume enormous amounts of extra space. In
  fact, it's more efficient than any versioning software (CVS, SVN,
  teamware, etc) for storing versions.
 
 Comparing to cvs/svn misses the point; as I said, they address
 completely different needs.
 
I was making a general point, to make it clear FS versioning isn't a
disk pig.


  However, there are three BIG drawbacks for using versioning in your FS
  (that assumes that it is a tunable parameter and can be turned off for a
  FS when not desired):
 
  (1)  File listing semantics become a bit of a mess.  VMS stores versions
  as filename;version.  That is, it uses the semi-colon as a
  divider.  Now, I'm not at all sure how we can make ZFS POSIX-compliant
  and still do something like this.  Versioning filesystems tend to be a
  complete mess - it is hard to present usable information about which
  versions are available, and at the same time keep things clean. Even
  keeping versions in a hidden dir (say .zfs_versions) in each directory
  still leaves that directory filled with a huge mess of files.
 
 Complete mess is certainly not my experience (I worked with TOPS-20
 from 1977 to 1985 and VMS from 1979 to 1985).  The key is that you
 need to *clean up*; specifically, you need to use the command which
 deletes all but the most recent copy of each file in a directory at
 the end of pretty much each work session.
 
 It's trivial to present information on which versions are available;
 you simply list each one as a file, which has the date info any file
 has, and the version number.
 

I stand by the complete mess statement. _You_ have trained yourself to
get around the problem by eliminating most of the reason for file
versioning - you delete everything when you log out.  Normal users (or
even most scripts) aren't going to do this. Indeed, I would argue that
it makes no sense to implement versioning if all you are going to use it
for is on a per-session basis.

And try thinking of a directory with a few dozen files in it, each with
a dozen or more versions. That's hideous from a normal user's standpoint.
VMS's implementation of filename;version is completely unwieldy if
you have more than a few files, or more than a few versions. And, in
modern typical use, it is _highly_ likely both will be true.


  (2)  File Versioning is no replacement for source code control, as you
  miss all the extra features (tagging, branching, comments, etc) that go
  with a file version check-in.
 
 It's very definitely not an alternative or replacement for source code
 control, no.  It provides a very useful feature to use *alongside*
 source control.  Source code control is also not a replacement for
 file versioning (I end up creating spare copies of files with funny
 names for things I'd otherwise get from versioning; and I end up
 losing time through not having thought to create such a file, whereas
 versioning is automatic).

File versioning would certainly be nice in many cases, but I think it's
better implemented in the application (think of Photoshop's unlimited
undo feature, though better than that), than in the FS, where it creates
a whole lot of clutter and confusion real fast, where it is only
specifically useful for a very limited selection of files.


  (3)  Many apps continuously save either temp copies or actual copies of
  the file you are working on. This leads to a version explosion, where
  you end up with 100s of versions of a commonly used file.  This tends to
  be worse than useless, as people have an incredibly hard time figuring
  out which (older) version they might actually want to look at.  And,
  this problem ISN'T ever going to go away, as it would require apps to
  understand filesystem features for ZFS, which isn't going to happen.
 
 Files treated that way are often deleted at the end of the session
 automatically, so no problem there.  Or else they'll be cleaned up
 when you do your session-end cleanup.  What the heck was that command
 on TOPS-20 anyway?  Maybe purge?  Sorry, 20-year-old memories are
 fuzzy on some details.

So, here's a question:  if I delete file X;1, do I delete X;x ?  That
is, do I delete all versions of a file when I delete the actual file?
what about deleting a (non-head) version?  And, exactly how many
different files have to be cleaned up when you logout?  How does this
get configured? Who does the configuring? What if I _want_ versions of
some files, but not the others?  

And, what about network-sharing?  For non-interactive use?  (i.e. via
SAMBA, or other apps where you're not looking at the FS via a command
prompt?)

 File versioning worked a lot better on TOPS-20 than on VMS, as I
 remember it.  The facility looked the same, but actually working with
 it was much cleaner and easier.
 
 Making it 

Re: [zfs-discuss] A versioning FS

2006-10-05 Thread David Dyer-Bennet

A lot of this we're clearly not going to agree on and I've said what I
had to contribute.  There's one remaining point, though...

On 10/5/06, Erik Trimble [EMAIL PROTECTED] wrote:

On Thu, 2006-10-05 at 16:08 -0700, David Dyer-Bennet wrote:



 Actually, save early and often is exactly why versioning is
 important.  If you discover you've gone down a blind alley in some
 code, it makes it easy to get back to the earlier spots.  This, in my
 experience, happens at a detail level where you won't (in fact can't)
 be doing checkins to version control.

Then, IMHO, you aren't using VC properly.  File Versioning should NEVER,
EVER, EVER be used for anything around VC.  It might be useful for
places VC isn't traditionally used (Office documents, small scripts,
etc.), but the example you provide is one which is easily solved by use
of frequent checkins to VC - indeed, that's what VC is supposed to be
for.


No, any sane VC protocol must specifically forbid the checkin of the
stuff I want versioning (or file copies or whatever) for.  It's
partial changes, probably doesn't compile, nearly certainly doesn't
work.  This level of work product *cannot* be committed to the
repository.

Well, unless you have a better VCS than CVS or SVN.  I first met this
as an obscure, buggy, expensive, short-lived SUN product, actually; I
believe it was called NSE, the Network Software Engineering
environment.  And I used one commercial product (written by an NSE
user after NSE was discontinued) that supported the feature needed.
Both of these had what I might call a two-level VCS.  Each developer
had one or more private repositories (the way people have working
directories now with SVN), but you had full VCS checkin/checkout (and
compare and rollback and so forth) within that.  Then, when your code
was ready for the repository, you did a commit step that pushed it
up from your private repository to the public repository.

One of the big problems with CVS and SVN and Microsoft SourceSafe is
that you don't have the benefits of version control most of the time,
because all commits are *public*.
--
David Dyer-Bennet, mailto:[EMAIL PROTECTED], http://www.dd-b.net/dd-b/
RKBA: http://www.dd-b.net/carry/
Pics: http://www.dd-b.net/dd-b/SnapshotAlbum/
Dragaera/Steven Brust: http://dragaera.info/


Re: [zfs-discuss] A versioning FS

2006-10-05 Thread Erik Trimble
On Thu, 2006-10-05 at 17:25 -0700, David Dyer-Bennet wrote:
 
 Well, unless you have a better VCS than CVS or SVN.  I first met this
 as an obscure, buggy, expensive, short-lived SUN product, actually; I
 believe it was called NSE, the Network Software Engineering
 environment.  And I used one commercial product (written by an NSE
 user after NSE was discontinued) that supported the feature needed.
 Both of these had what I might call a two-level VCS.  Each developer
 had one or more private repositories (the way people have working
 directories now with SVN), but you had full VCS checkin/checkout (and
 compare and rollback and so forth) within that.  Then, when your code
 was ready for the repository, you did a commit step that pushed it
 up from your private repository to the public repository.
 
 One of the big problems with CVS and SVN and Microsoft SourceSafe is
 that you don't have the benefits of version control most of the time,
 because all commits are *public*.

Just FYI:  that buggy, expensive, short-lived SUN product eventually
became Teamware. 

Check out (no pun intended)  Mercurial and similar products, which have
similar behavior to Teamware - each developer has a workspace for
code, and you can do VC inside that workspace without having to do a
putback into the main tree.  That way, you do frequent VC checkins,
but don't putback to the main tree until things actually work. Or, at
least, you _claim_ them to work. 

:-)




-- 
Erik Trimble
Java System Support
Mailstop:  usca14-102
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)



Re: [zfs-discuss] A versioning FS

2006-10-05 Thread Wee Yeh Tan

On 10/6/06, David Dyer-Bennet [EMAIL PROTECTED] wrote:

One of the big problems with CVS and SVN and Microsoft SourceSafe is
that you don't have the benefits of version control most of the time,
because all commits are *public*.


David,

That is exactly what branch is for in CVS and SVN.  Dunno much about
M$ SourceSafe.

--
Just me,
Wire ...


Re: [zfs-discuss] A versioning FS

2006-10-05 Thread Frank Cusack

On October 5, 2006 5:25:17 PM -0700 David Dyer-Bennet [EMAIL PROTECTED] wrote:

Well, unless you have a better VCS than CVS or SVN.  I first met this
as an obscure, buggy, expensive, short-lived SUN product, actually; I
believe it was called NSE, the Network Software Engineering
environment.  And I used one commercial product (written by an NSE
user after NSE was discontinued) that supported the feature needed.
Both of these had what I might call a two-level VCS.  Each developer
had one or more private repositories (the way people have working
directories now with SVN), but you had full VCS checkin/checkout (and
compare and rollback and so forth) within that.  Then, when your code
was ready for the repository, you did a commit step that pushed it
up from your private repository to the public repository.


I wouldn't call that 2-level, it's simply branching, and all VCS/SCM
systems have this, even rcs.  Some expose all changes in the private
branch to everyone (modulo protection mechanisms), some only expose changes
that are put back (to use Sun teamware terminology).

Both CVS and SVN have this.

-frank


Re: [zfs-discuss] A versioning FS

2006-10-05 Thread Chad Leigh -- Shire.Net LLC


On Oct 5, 2006, at 7:47 PM, Chad Leigh -- Shire.Net LLC wrote:

I find the unix conventions of storying a file and file~ or any  
of the other myriad billion ways of doing it that each app has  
invented to be much more unwieldy.



sorry,  storing a file, not storying

---
Chad Leigh -- Shire.Net LLC
Your Web App and Email hosting provider
chad at shire.net







Re: [zfs-discuss] A versioning FS

2006-10-05 Thread Chad Lewis


On Oct 5, 2006, at 6:48 PM, Frank Cusack wrote:

On October 5, 2006 5:25:17 PM -0700 David Dyer-Bennet [EMAIL PROTECTED] wrote:

Well, unless you have a better VCS than CVS or SVN.  I first met this
as an obscure, buggy, expensive, short-lived SUN product, actually; I
believe it was called NSE, the Network Software Engineering
environment.  And I used one commercial product (written by an NSE
user after NSE was discontinued) that supported the feature needed.
Both of these had what I might call a two-level VCS.  Each developer
had one or more private repositories (the way people have working
directories now with SVN), but you had full VCS checkin/checkout (and
compare and rollback and so forth) within that.  Then, when your code
was ready for the repository, you did a commit step that pushed it
up from your private repository to the public repository.


I wouldn't call that 2-level, it's simply branching, and all VCS/SCM
systems have this, even rcs.  Some expose all changes in the private
branch to everyone (modulo protection mechanisms), some only expose
changes that are put back (to use Sun teamware terminology).

Both CVS and SVN have this.

-frank



David is describing a different behavior. Even a branch is still
ultimately on the single, master server with CVS, SVN, and most other
versioning systems. Teamware, and a few other versioning systems, let
you have more arbitrary parent and child relationships.

In Teamware, you can create a project gate, have a variety of people
check code into this project gate, and do all of this without ever
touching the parent gate. When the project is done, you then check in
the changes to the project gate's parent.

The gate parent may itself be a child of some other gate, making the
above project gate a grand-child of some higher gate. You can also
change a child's parent, so you could in fact skip the parent and go
straight to the grandparent if you wish.

For that matter, you can re-parent the parent to sync with the former
child if you had some reason to do so.

A Teamware putback really isn't a matter of exposure. Until you do a
putback to the parent, the code is not physically (or even logically)
present in the parent.
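
The parent/child relationships described above can be modelled with a toy sketch like the one below. This is not how Teamware is implemented; the `Workspace` class and its method names are invented for illustration, and changes are reduced to plain strings.

```python
class Workspace:
    """A toy workspace ('gate') that holds changes and may have a parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.changes = []  # changes present in this workspace

    def checkin(self, change):
        """Record a change locally; the parent is not touched."""
        self.changes.append(change)

    def putback(self):
        """Copy changes the parent lacks into it; return what was copied."""
        if self.parent is None:
            raise ValueError(f"{self.name} has no parent to put back to")
        new = [c for c in self.changes if c not in self.parent.changes]
        self.parent.changes.extend(new)
        return new

    def reparent(self, new_parent):
        """Point this workspace at a different parent."""
        self.parent = new_parent


main = Workspace("main")
project = Workspace("project", parent=main)
dev = Workspace("dev", parent=project)

dev.checkin("fix-1")   # visible only in dev
dev.putback()          # now also in project, still not in main
dev.reparent(main)     # skip the project gate
dev.checkin("fix-2")
dev.putback()          # main receives both changes directly
print(main.changes)
```

Until `putback` runs, the parent's list simply does not contain the change, which is the point being made about code not being present in the parent at all.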


Teamware's biggest drawbacks are a lack of change sets (like how
Subversion tracks simultaneous, individual changes as a group) and that
it only runs via file access (no network protocol; local filesystem or
NFS only).

Mercurial seems to be similar to Teamware in terms of parenting, but
with network protocol support built in. Which is presumably why
OpenSolaris will be using it.

ckl




Re: [zfs-discuss] A versioning FS

2006-10-05 Thread Frank Cusack

On October 5, 2006 7:02:29 PM -0700 Chad Lewis [EMAIL PROTECTED] wrote:


On Oct 5, 2006, at 6:48 PM, Frank Cusack wrote:


On October 5, 2006 5:25:17 PM -0700 David Dyer-Bennet [EMAIL PROTECTED] wrote:

Well, unless you have a better VCS than CVS or SVN.  I first met this
as an obscure, buggy, expensive, short-lived SUN product, actually; I
believe it was called NSE, the Network Software Engineering
environment.  And I used one commercial product (written by an NSE
user after NSE was discontinued) that supported the feature needed.
Both of these had what I might call a two-level VCS.  Each developer
had one or more private repositories (the way people have working
directories now with SVN), but you had full VCS checkin/checkout (and
compare and rollback and so forth) within that.  Then, when your code
was ready for the repository, you did a commit step that pushed it
up from your private repository to the public repository.


I wouldn't call that 2-level, it's simply branching, and all VCS/SCM
systems have this, even rcs.  Some expose all changes in the private
branch to everyone (modulo protection mechanisms), some only expose
changes that are put back (to use Sun teamware terminology).

Both CVS and SVN have this.

-frank



David is describing a different behavior. Even a branch is still
ultimately on the single, master server with CVS, SVN, and most other
versioning systems.  Teamware, and a few other versioning systems, let
you have more arbitrary parent and child relationships.


How are branches not arbitrary parent and child relationships?  (except
in cvs where branches pretty much suck but still it's close)


A Teamware putback really isn't a matter of exposure. Until you do a
putback to the parent, the code is not physically (or even logically)
present in the parent.


That is what I meant by exposure -- whether or not private code is
available to others.  But how does that matter?

The difference between teamware (or git or bk or mercurial) and cvs (or
svn or p4) here is that with the latter, everyone can see all private
branches and everyone can see each change in a private branch (again,
modulo protections).  That doesn't matter to the main branch.  The code
is not in the main branch logically (physically doesn't matter) until
you integrate or putback.

My point is that having a private branch, where you can check in changes
to your heart's content, and re-branch at will, and don't have to follow
must compile rules, can be handled by most any VCS.  Which is what
David was saying is needed for it to replace the functionality of a
versioned filesystem.

Some of them (eg p4) handle branching much better than others, making
this easier, but all of them can do it.

Wow, I'm surprised teamware doesn't have changelists or a similar concept.
Talk about stone ages. :-)

-frank