On 2009-12-01 10.52, Niclas Hedhman wrote:
Another option would be to not allow any changes, only additions
(provided that S3 can handle atomic additions), and resolve the state
in higher level code. For some systems there could be an explosion of
storage, which could be solved by having a kind of GC chewing along in
the background.
This seems to be the direction that for example XVM and Clojure are
going in, where time is explicitly modeled, and hence objects are always
immutable. If you listen to Rich Hickey's presentation about Clojure on
InfoQ he makes it really clear that by not modeling time we are getting
into serious trouble, and it doesn't take much complexity to get these
issues either. I don't particularly like the idea of coding in Lisp
again, but his assertions on objects, state and time are sound.
Quick summary: today object references refer to a state structure directly:
ref -> struct
When you have several concurrent users of this, making changes screws
things up:
ref1 -> struct
ref2 -> struct
If ref1 modifies the struct, the user of ref2 will see the changes immediately.
So what you do is introduce one more level of references, and it becomes:
ref1 -> {id,time1} -> struct1
ref2 -> {id,time1} -> struct1
If the code with the ref1 changes a field in the object it becomes:
ref1 -> {id,time2} -> struct2
ref2 -> {id,time1} -> struct1
And so the view that ref2 has is unchanged. All code executes in
transactions, and so when a new transaction is started it will then see
the new state that the tx for ref1 produced:
ref3 -> {id,time2} -> struct2
This is very similar to what we do with UnitOfWork, except in Clojure
it's done on a language level, and they call it STM. (VERY simplified
here, but this is the gist of it, AFAICT)
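
To make the extra level of references concrete, here is a rough Java sketch. It is not how Clojure's STM or our UnitOfWork is implemented; the names PersonState, Versioned and Identity are made up for illustration, and "time" is just a counter.

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable state: a change yields a new instance instead of mutating this one.
final class PersonState {
    final String name;
    PersonState(String name) { this.name = name; }
    PersonState withName(String newName) { return new PersonState(newName); }
}

// The {id,time} -> struct level: one immutable state pinned to a point in time.
final class Versioned<T> {
    final String id;
    final long time;
    final T state;
    Versioned(String id, long time, T state) {
        this.id = id;
        this.time = time;
        this.state = state;
    }
}

// The identity itself: it only ever points at the latest version.
final class Identity<T> {
    private final AtomicReference<Versioned<T>> current;

    Identity(String id, T initial) {
        current = new AtomicReference<>(new Versioned<>(id, 1, initial));
    }

    // What a transaction gets when it starts: a stable view that never changes.
    Versioned<T> snapshot() { return current.get(); }

    // Publishes a new version; returns false if another commit happened since 'seen'.
    boolean commit(Versioned<T> seen, T newState) {
        Versioned<T> next = new Versioned<>(seen.id, seen.time + 1, newState);
        return current.compareAndSet(seen, next);
    }
}

class VersionedRefDemo {
    public static void main(String[] args) {
        Identity<PersonState> person = new Identity<>("person-1", new PersonState("Niclas"));
        Versioned<PersonState> ref1 = person.snapshot();
        Versioned<PersonState> ref2 = person.snapshot();

        // The code holding ref1 changes a field and commits a new version.
        person.commit(ref1, ref1.state.withName("Rickard"));

        // ref2 still holds the version it started with.
        System.out.println(ref2.state.name + " @" + ref2.time);

        // A transaction started now sees the committed state.
        Versioned<PersonState> ref3 = person.snapshot();
        System.out.println(ref3.state.name + " @" + ref3.time);
    }
}
```

In this sketch a commit based on a stale snapshot simply fails, which is one possible conflict policy; a real STM would typically retry the transaction.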
The point is that time, more specifically the time of execution of a
UnitOfWork, is handled explicitly. Basically, if a UoW starts at time
122, then it will see all state as it was at time 122, regardless of
whether another UoW changed objects to some other state after that. In
our terms, I think each entity would be stored using <id,version> rather
than just <id>, and yes, then you'd have some form of GC to clean up
unused state.
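
As a sketch of what <id,version> storage with a background GC could look like, assuming an in-memory map instead of S3 and a simple counter as the clock (VersionedStore and all of its methods are hypothetical, not existing Qi4j API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Append-only store keyed by <id,version>; nothing is ever overwritten.
final class VersionedStore {
    private final Map<String, NavigableMap<Long, String>> versions = new HashMap<>();
    private long clock = 0;

    // Adds a new <id,version> entry and returns the version it was written at.
    synchronized long put(String id, String state) {
        long version = ++clock;
        versions.computeIfAbsent(id, k -> new TreeMap<>()).put(version, state);
        return version;
    }

    // The current time; a UoW would record this when it starts.
    synchronized long now() { return clock; }

    // State of 'id' as it was at time 'asOf', or null if it did not exist yet
    // (or if that version has already been collected).
    synchronized String get(String id, long asOf) {
        NavigableMap<Long, String> history = versions.get(id);
        if (history == null) return null;
        Map.Entry<Long, String> entry = history.floorEntry(asOf);
        return entry == null ? null : entry.getValue();
    }

    // GC: drops versions that no UoW started at or after 'oldestActive' can see.
    // Returns the number of versions removed.
    synchronized int collect(long oldestActive) {
        int removed = 0;
        for (NavigableMap<Long, String> history : versions.values()) {
            Long keep = history.floorKey(oldestActive);
            if (keep == null) continue;
            NavigableMap<Long, String> dead = history.headMap(keep, false);
            removed += dead.size();
            dead.clear();
        }
        return removed;
    }
}

class VersionedStoreDemo {
    public static void main(String[] args) {
        VersionedStore store = new VersionedStore();
        store.put("entity-1", "state A");
        long uowStart = store.now();          // a UoW starts here
        store.put("entity-1", "state B");     // another UoW changes the entity later

        // The first UoW still sees the state as of its start time.
        System.out.println(store.get("entity-1", uowStart));
        // A UoW starting now sees the newer state.
        System.out.println(store.get("entity-1", store.now()));
    }
}
```

The GC needs to know the start time of the oldest UoW still running; how that would be tracked across a distributed setup is left open here.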
However, loading/storing state using this seems to me to be the easy
part. Doing querying and indexing using time as a key factor seems like
a bigger headache to me, although Rich does describe some pretty fancy
BTree structures that are immutable as well, which *could* help in that
regard.
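
To show why immutable trees could help with indexing over time, here is a much simpler stand-in than the structures Rich describes: a plain binary search tree in Java where an insert copies only the path from the root to the changed node and shares everything else. IndexNode is a made-up name, and this is not a BTree, just the same sharing idea in its smallest form.

```java
// Immutable index node: once built, a node and its subtrees never change.
final class IndexNode {
    final String key;
    final String value;
    final IndexNode left;
    final IndexNode right;

    IndexNode(String key, String value, IndexNode left, IndexNode right) {
        this.key = key;
        this.value = value;
        this.left = left;
        this.right = right;
    }

    // Returns the root of a NEW index containing key=value.
    // Only the nodes on the path to the key are copied; the old root stays valid.
    static IndexNode insert(IndexNode root, String key, String value) {
        if (root == null) return new IndexNode(key, value, null, null);
        int c = key.compareTo(root.key);
        if (c < 0) return new IndexNode(root.key, root.value, insert(root.left, key, value), root.right);
        if (c > 0) return new IndexNode(root.key, root.value, root.left, insert(root.right, key, value));
        return new IndexNode(key, value, root.left, root.right);
    }

    // Looks up a key in the index version identified by 'root'.
    static String find(IndexNode root, String key) {
        while (root != null) {
            int c = key.compareTo(root.key);
            if (c == 0) return root.value;
            root = c < 0 ? root.left : root.right;
        }
        return null;
    }
}

class ImmutableIndexDemo {
    public static void main(String[] args) {
        IndexNode time1 = IndexNode.insert(null, "m", "entity-1");
        time1 = IndexNode.insert(time1, "c", "entity-2");

        // A later transaction adds an entry; it gets a new root for time2.
        IndexNode time2 = IndexNode.insert(time1, "x", "entity-3");

        // A query running against the time1 root does not see the new entry.
        System.out.println(IndexNode.find(time1, "x"));
        System.out.println(IndexNode.find(time2, "x"));
    }
}
```

Each root is effectively the index as of one point in time, so a UoW could hold on to the root that was current when it started. The tree is unbalanced, which a real index would have to address.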
In short, I think the whole industry is slowly waking up to the idea
that all state is always immutable, and that we use GC to deal with
cleaning up old unused state, but to get those notions into persistence
is going to take a while.
OR, it might also be one of those cases where if you try, a TON of
complexity just falls away and making it fast, concurrent and scalable
is simply a trivial side effect of immutability.
Interesting subject, for sure :-)
/Rickard