Sorry, I planned to post this about two weeks ago :-) but here it goes...

I've been chatting with Peter about persistence for s5, and the
version-control-like functionality we discussed some time ago on the
list.  He said both things should go hand in hand, and I tend to agree.
One feature I've always wanted for version control is a "horizon": a
configurable limit on how many revisions to store.  By setting that to 1,
you effectively get persistence without version control.

The basic idea
==============

Peter is building a replication mechanism into s5.  There are two separate
concepts, "site" and "host", with the site being a tree of vobjects (same
concept as s4), and a host being a "realisation" of a site; the
implication being that a site may be "clustered" on a bunch of hosts.

So the replication system is used to pass update data between hosts that
share a given object, but also to handle remote "mirror" objects (same
concept as s4 remote vobject).  Both are done by some variation of the
subscriber pattern.

Now, the first thing that falls out of this is that you could have
cluster setups where only one host does persistence; on cluster "bootup",
this host would read the objects and initialise the whole cluster.

What we're thinking is to take this one level further, and implement
persistence itself as a "host"; so even in a single-host setup,
persistence would be a discrete thing that sits in a corner and
communicates with the host via the inter-host replication "protocol".
That would also allow it to run in a separate process, if you want.
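
To make the "persistence is just another host" idea concrete, here is a
rough sketch in Python.  None of these names (Host, PersistenceHost,
subscribe, publish, bootstrap) are s5 API; they are made up for
illustration, and the "disk" is only a dict.  The point is just that the
persistence component gets its updates through the same subscriber
interface as any other host.

```python
class Host:
    """Hypothetical host: holds object state and notifies subscribers."""

    def __init__(self, name):
        self.name = name
        self.objects = {}       # object id -> current state
        self.subscribers = []   # other hosts interested in our updates

    def subscribe(self, other):
        self.subscribers.append(other)

    def publish(self, object_id, state):
        """Apply a local change and replicate it to every subscriber."""
        self.objects[object_id] = state
        for sub in self.subscribers:
            sub.receive(object_id, state)

    def receive(self, object_id, state):
        """Apply an update that came from another host."""
        self.objects[object_id] = state


class PersistenceHost(Host):
    """Persistence modelled as just another host (storage is a dict here)."""

    def __init__(self, name):
        super().__init__(name)
        self.store = {}

    def receive(self, object_id, state):
        super().receive(object_id, state)
        self.store[object_id] = state   # a real one would write to disk

    def bootstrap(self, host):
        """On "bootup", push every stored object to a freshly started host."""
        for object_id, state in self.store.items():
            host.receive(object_id, state)


live = Host("live")
disk = PersistenceHost("disk")
live.subscribe(disk)
live.publish("obj-1", {"x": 1})

# Simulate a restart: a new host is initialised from the persistence host.
restarted = Host("live-2")
disk.bootstrap(restarted)
```

Since the two sides only talk through receive(), nothing here cares
whether the persistence host lives in the same process or not.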

Now, in version control terms, you can think of each host as a branch. So
synchronisation between hosts is equivalent to a merge; and updating a
mirror object is like updating a working copy (the "update" command in
bzr/svn/cvs, or roughly "pull" in git).  In both cases, the "client" host
will send a reference (the id of the revision it already has, which may
be "none").
For an update, the "sending" host may calculate what is more efficient to
send, a full copy or a delta.  For a merge, we want instead to send all
revisions the "client" host doesn't have. (Maybe we could tie the horizon
setting in here; if the "client" sends its horizon preference with the
request, and the number of "new" revisions exceeds the horizon, just send
the last N.)

(Although according to Peter, "host == branch" is not entirely correct; a
host could have more than one site, so it's probably more accurate to say
a site is a repository, in the bzr sense.)
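
The merge request described above (send everything the "client" doesn't
have, capped by its horizon) could be sketched like this in Python.  The
function name and arguments are hypothetical, and it assumes a simple
linear history, which real branches won't always have.

```python
def revisions_to_send(history, have_id=None, horizon=None):
    """Pick the revisions a merge should send to the "client" host.

    history  -- revision ids on the sending host, oldest first
    have_id  -- id of the newest revision the client already has, or None
    horizon  -- max number of revisions the client wants, or None for all
    """
    if have_id is not None and have_id in history:
        missing = history[history.index(have_id) + 1:]
    else:
        # No reference, or one we don't know: send everything.
        # (Treating an unknown reference this way is an assumption.)
        missing = list(history)

    if horizon is not None and len(missing) > horizon:
        # More new revisions than the client wants to keep: just the last N.
        missing = missing[len(missing) - horizon:]
    return missing


history = ["r1", "r2", "r3", "r4"]
catch_up = revisions_to_send(history, have_id="r2")
persistence_only = revisions_to_send(history, horizon=1)
```

With a horizon of 1 the client only ever receives the latest revision,
which is the "persistence without version control" case.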

Version control: what's stored
==============================

A version control branch corresponds to a "concrete site", by which I mean
the information about one site as seen by one host (as opposed to the
"full site", which is the most up-to-date version as seen by the whole
network of hosts that hold that site).  Only one piece of information is
held at this level: a full list of all object ids in the site (so we can
track things like the revision in which an object was created or
deleted).

The real bulk of it is at vobject level.  You could say a branch is made
of a soup of vobject histories.  (By the term "soup" I mean, they aren't
stored in any kind of hierarchy or order.)  A given vobject history is
"tagged" by the (immutable) object id.  Each revision of a vobject in
history holds:

- type list
- child list
- payload, if any (e.g. properties)
- security capabilities
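
As a rough picture of the list above, one revision record and the "soup"
might look like this in Python.  The class and field names are my own
invention for illustration, the "folder" type is made up, and the GUID
is simply a random UUID.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class VObjectRevision:
    """One revision of one vobject; every part is a sequence."""
    types: tuple = ()
    children: tuple = ()
    payload: tuple = ()         # e.g. lines of text, or a single value
    capabilities: tuple = ()
    revision_id: str = field(default_factory=lambda: str(uuid.uuid4()))


# The "soup": no hierarchy or order, just histories keyed by object id.
soup = {}


def commit(object_id, revision):
    """Append a revision to the history tagged with this object id."""
    soup.setdefault(object_id, []).append(revision)
    return revision.revision_id


first = commit("obj-1", VObjectRevision(types=("folder",),
                                        children=("child-a",)))
second = commit("obj-1", VObjectRevision(types=("folder",),
                                         children=("child-a", "child-b")))
```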

Version control: how it's stored
================================

Atomicity is of course important, and a "transparent" version control
system is only useful if merging is smart, because there will be no human
element to resolve conflicts.  This all ties into how historic information
should be stored internally.

Most of the brainpower in version control projects like bzr over the last
few years has gone into line-based approaches.  bzr has since evolved
past the weave format, but the weave is very easy to explain: it is a
sequence of line groups, where each group is marked with the revision id
where it got added or removed.

So, a weave of numbers instead of lines, using {} to represent an addition
and [] for a removal, and letters for revision ids, could look like this:

    {a537[b[c1]b{b94}]c{c20}68}

This would represent a revision "a" being 537168, "b" being 5379468, and
"c" being 5372068.

Or go to http://bazaar-vcs.org/BzrWeaveFormat?highlight=%28weave%29 for a
proper explanation :-)
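
For what it's worth, the toy notation in the example can be decoded
mechanically.  The Python sketch below is not bzr's weave code, just a
reader for my {} / [] notation.  It assumes that every item is a single
character and every revision id a single letter, and that you tell it
which revisions are ancestors of the one you want (here I assume "b" and
"c" both descend from "a").

```python
def parse_weave(text):
    """Return a list of (item, inserted_by, deleted_by) triples."""
    items = []
    inserts = []      # stack of revision ids whose {..} blocks are open
    deletes = set()   # revision ids whose [..] ranges are open
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "{":               # "{a" opens an addition by revision a
            inserts.append(text[i + 1])
            i += 2
        elif ch == "}":             # "}" closes the innermost addition
            inserts.pop()
            i += 1
        elif ch == "[":             # "[b" opens a removal by revision b
            deletes.add(text[i + 1])
            i += 2
        elif ch == "]":             # "]b" closes the removal by revision b
            deletes.discard(text[i + 1])
            i += 2
        else:                       # anything else is content
            items.append((ch, frozenset(inserts), frozenset(deletes)))
            i += 1
    return items


def extract(items, ancestry):
    """Rebuild one revision, given the set of revision ids it includes."""
    ancestry = set(ancestry)
    return "".join(
        item for item, added, removed in items
        if added <= ancestry and not (removed & ancestry)
    )


weave = parse_weave("{a537[b[c1]b{b94}]c{c20}68}")
rev_a = extract(weave, {"a"})
rev_b = extract(weave, {"a", "b"})
rev_c = extract(weave, {"a", "c"})
```

An item is kept when every revision that added it is in the ancestry and
no revision that removed it is.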

The important point here is that weaves and more recent formats in the
same vein operate on sequences.  We often think of them as operating on
lines, but essentially, they're about sequences.  Just as my example had
sequences of numbers instead, we can easily use them for sequences of
pretty much anything.

Now, look at the 5 things we're storing in our version control:

- global list of ids
- vobject list of types
- vobject list of children
- vobject payload
- vobject list of capabilities

See a pattern?  ;-)  The only one that "doesn't belong" is the payload, but
then, most payloads will be a small discrete value (like a number), while
occasionally we'll have long chunks of text, where it's important to be
smart.  So I think we can treat payload as bzr treats text files, and
store it as a sequence of lines.

This gives us "smart merge" for pretty much everything; with this kind of
format, the situations where you get a conflict are MUCH rarer than with
the usual approach of one delta per revision.
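
To illustrate the "it's all sequences" point, here is a small Python
sketch that uses the standard difflib module to compute a delta over a
child list and over payload lines with the same code.  This is a plain
two-way diff, not a weave and not a merge; the helper names and the
sample ids are made up.

```python
from difflib import SequenceMatcher


def sequence_delta(old, new):
    """Describe how to turn `old` into `new` as (tag, old_items, new_items)."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [(tag, old[i1:i2], new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


def apply_delta(delta):
    """Rebuild the new sequence from a delta made by sequence_delta."""
    result = []
    for tag, old_items, new_items in delta:
        # "equal" chunks are carried over; every other tag contributes
        # whatever the new side holds (nothing at all for a "delete").
        result.extend(old_items if tag == "equal" else new_items)
    return result


# Same code for a list of child ids...
children_old = ["obj-a", "obj-b", "obj-c"]
children_new = ["obj-a", "obj-c", "obj-d"]
children_delta = sequence_delta(children_old, children_new)

# ...and for a payload treated as a sequence of lines.
payload_old = "first line\nsecond line\n".splitlines()
payload_new = "first line\nsecond line, edited\nthird line\n".splitlines()
payload_delta = sequence_delta(payload_old, payload_new)
```

The only requirement on the items is that they are hashable, so ids,
type names, capabilities and text lines all work.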

I'll probably start this project by writing a "libmerge" or something like
that in C++, implementing a version of whatever is the latest tech in bzr,
for handling arbitrary (STL) sequences.

Revisions and transactions
==========================

Revisions are identified by GUIDs, rather than numbers, because numbers
change during merges.

Based on the actor model of s5, we came to this revision model: by
default, when an object (actor) finishes a "request", a revision is
committed.  By "request" I mean, it can call other methods **in the same
object** to help out, and these methods won't trigger commits when they
return.  A revision corresponds to all changes that were made in response
to a message from another object (local or remote). Tying into the actor
model, this revision will *only* commit the changes to that vobject;
that's especially important, since at that point a different thread may be
running something else, for a different vobject.  Also, of course, if
there were no changes, there's no point introducing a new revision.

It's important to note, during a "request", the object may (and probably
will, in many cases) send messages to other local objects. These will
trigger commits when they return!  And if those other objects call back to
the "first" object, then that will cause a commit too... which may end up
committing some changes that were made by the original method.  We think,
in normal usage, that shouldn't be a problem, so it's a reasonable default
behaviour.
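
Here is a single-threaded Python sketch of that default, including the
callback case.  Everything in it (VObject.receive, the handler functions,
snapshots as dicts) is invented for illustration; it only shows when
commits would happen, not how s5 would implement them.

```python
import copy


class VObject:
    def __init__(self, name):
        self.name = name
        self.state = {}
        self.history = [{}]     # committed snapshots, oldest first

    def receive(self, handler, *args):
        """Entry point for a message coming from *another* object."""
        handler(self, *args)
        self._commit_if_changed()

    def _commit_if_changed(self):
        # No changes since the last revision: no new revision.
        if self.state != self.history[-1]:
            self.history.append(copy.deepcopy(self.state))


def set_colour(obj, colour):
    obj.state["colour"] = colour


def notify(obj, sender):
    obj.state["seen"] = sender.name
    sender.receive(set_colour, "red")   # calls back to the first object


def resize(obj, size, neighbour):
    obj.state["size"] = size
    # Message to another object.  Its callback commits on return, and that
    # commit also picks up our still-uncommitted "size" change.
    neighbour.receive(notify, obj)
    obj.state["done"] = True            # ends up in a second revision


a, b = VObject("a"), VObject("b")
a.receive(resize, 10, b)
```

After this runs, "a" has two new revisions: the first one, triggered by
the callback, already contains the "size" change made by the original
method.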

You can, of course, escape the default.  I imagine having a method at the
host level, which unconditionally commits a new revision, taking a set of
objects as an argument.  (Well, not quite unconditionally -- rather, as
long as there have actually been any changes.)

The other thing you can do is a larger revision, essentially stopping
auto-commit for some time.  The way this happens internally is more
similar to a bzr "microbranch" than an SQL transaction, but we can still
call it a transaction.  So one host method "branches" current execution
line from the latest host revision.  From that point, all new revisions
from methods in "child requests" (or same call stack) will be on that
"microbranch", whether explicit or automatic.  It's important to note,
since this is a branch, those methods won't see any concurrent changes to
the objects, made by calls outside the branch. This is intentional and
important.  (I think.)  Then, of course, there would be a method to
explicitly reconcile the "microbranch", or it would happen automatically
at the end of the request where it was created.  What makes this branch
"micro" is that, at the merge point, all "internal" revisions are
discarded; what gets committed to the main branch is one single revision,
accumulating all changes.

Peter, blow me off if that sounds too hard to do :-) it would imply the
ability of having more than one "version" of the same object in memory in
the same host, and knowing which one is the "right" one for a given
call...
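
A very rough Python sketch of the "microbranch" behaviour described in
this section, with object state reduced to a dict.  The Microbranch
class and its methods are hypothetical; it ignores deletions and does no
conflict detection at all, which a real version obviously couldn't.

```python
import copy


class Microbranch:
    """Hypothetical transaction: private copy, squashed on merge."""

    def __init__(self, main_state):
        self.base = copy.deepcopy(main_state)    # snapshot at branch point
        self.state = copy.deepcopy(main_state)   # what code in the branch sees
        self.internal_revisions = []             # discarded at merge time

    def commit(self, changes):
        """Record one internal revision (a dict of key -> new value)."""
        self.state.update(changes)
        self.internal_revisions.append(dict(changes))

    def merge_into(self, main_state):
        """Apply one single accumulated revision to the main state."""
        squashed = {key: value for key, value in self.state.items()
                    if key not in self.base or self.base[key] != value}
        main_state.update(squashed)
        self.internal_revisions.clear()
        return squashed


main = {"x": 1, "y": 1}
branch = Microbranch(main)
branch.commit({"x": 2})
main["y"] = 5                       # concurrent change outside the branch
seen_inside = branch.state["y"]     # the branch still sees the old value
branch.commit({"x": 3, "z": 9})
squashed = branch.merge_into(main)
```

The two internal commits collapse into one revision that only carries
the final values, and the concurrent change to "y" survives because the
branch never touched it.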

Horizons
========

Off the top of my head, I think we'd like to be able to set a horizon per
host, per type, and per vobject, in increasing order of precedence
(vobject overrides type, which overrides host).  What if a vobject has two
different types that specify a horizon?  Respect the first?  The last?
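
One possible way to resolve the effective horizon, sketched in Python.
The function is hypothetical, and the "first type that sets a horizon
wins" rule is an arbitrary pick to make the sketch complete; the
question above is still open.

```python
def resolve_horizon(host_horizon, type_horizons, vobject_horizon=None):
    """Return the horizon that applies to one vobject.

    host_horizon    -- host-wide default (int, or None for unlimited)
    type_horizons   -- horizons of the vobject's types, in type-list
                       order; None for a type that doesn't set one
    vobject_horizon -- per-object override, or None
    """
    if vobject_horizon is not None:
        return vobject_horizon          # vobject overrides type
    for horizon in type_horizons:
        if horizon is not None:
            return horizon              # assumption: first type wins
    return host_horizon                 # fall back to the host setting


from_type = resolve_horizon(10, [None, 3, 5])
from_object = resolve_horizon(10, [3], vobject_horizon=1)
from_host = resolve_horizon(10, [None])
```

Switching to "last type wins" would only mean iterating over
reversed(type_horizons) instead.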

What else
=========

The "protocol" for replication is a whole other can of worms; I'll let
Peter talk about that when he wants.  One point he asked me to remember is
that "cluster" replication propagates capability lists, while "mirror"
replication doesn't.  Probably.

best,
                                               Lalo Martins

-- 
      So many of our dreams at first seem impossible,
       then they seem improbable, and then, when we
       summon the will, they soon become inevitable.
                           -----
personal:                    http://lalo.hystericalraisins.net/
technical:                    http://www.hystericalraisins.net/
GNU: never give up freedom                 http://www.gnu.org/


_______________________________________________
vos-d mailing list
vos-d@interreality.org
http://www.interreality.org/cgi-bin/mailman/listinfo/vos-d
