On 2009-12-01 10.52, Niclas Hedhman wrote:
> Another option would be to not allow any changes, only additions
> (provided that S3 can handle atomic additions), and resolve the state
> in higher-level code. For some systems there could be an explosion of
> storage, which could be solved by having a kind of GC chewing along in
> the background.
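The append-only idea in the quote can be sketched in a few lines. This is a minimal in-memory illustration only: an ordinary Python list stands in for the storage (it says nothing about what S3 itself guarantees), and the names `AppendOnlyStore`, `resolve` and `compact` are hypothetical. Writes only ever add records, current state is resolved by folding the additions, and a naive "GC" compacts away superseded records.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Addition:
    entity_id: str
    attr: str
    value: Any
    seq: int  # global position in the log, assigned on append


class AppendOnlyStore:
    """Hypothetical store: records are only ever appended, never changed."""

    def __init__(self) -> None:
        self._log: List[Addition] = []
        self._next_seq = 0

    def append(self, entity_id: str, attr: str, value: Any) -> int:
        seq = self._next_seq
        self._next_seq += 1
        self._log.append(Addition(entity_id, attr, value, seq))
        return seq

    def resolve(self, entity_id: str, upto: Optional[int] = None) -> Dict[str, Any]:
        """Rebuild state in higher-level code by folding the additions in order."""
        state: Dict[str, Any] = {}
        for rec in self._log:  # the log is kept in seq order
            if upto is not None and rec.seq > upto:
                break
            if rec.entity_id == entity_id:
                state[rec.attr] = rec.value  # later additions win
        return state

    def compact(self) -> int:
        """Naive GC: keep only the latest addition per (entity, attr).

        Note that this throws away history, so resolve(upto=...) for
        compacted positions no longer reflects the past.
        """
        latest: Dict[Tuple[str, str], Addition] = {}
        for rec in self._log:
            latest[(rec.entity_id, rec.attr)] = rec
        removed = len(self._log) - len(latest)
        self._log = sorted(latest.values(), key=lambda r: r.seq)
        return removed
```

The storage "explosion" shows up as the log growing with every change; `compact` is the background cleanup, and deciding when it is safe to run it is exactly the hard part.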

This seems to be the direction that, for example, XVM and Clojure are going in, where time is explicitly modeled and objects are therefore always immutable. If you listen to Rich Hickey's presentation about Clojure on InfoQ, he makes it really clear that by not modeling time we get into serious trouble, and it doesn't take much complexity to run into these issues either. I don't particularly like the idea of coding in Lisp again, but his assertions about objects, state and time are sound.

Quick summary: today object references refer to a state structure directly:
ref -> struct

When several concurrent users share this structure, changes made by one of them screw things up for the others:
ref1 -> struct
ref2 -> struct

If ref1 modifies the struct, the user of ref2 sees the change immediately.
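In Python terms, the "ref -> struct" situation is plain aliasing of a mutable object. The dict and its fields below are made up purely for illustration:

```python
# Two references to the same mutable structure (the "ref -> struct" case).
struct = {"name": "Alice", "balance": 100}
ref1 = struct
ref2 = struct

ref1["balance"] = 50     # a change made through ref1...
print(ref2["balance"])   # ...is immediately visible through ref2: prints 50
```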

So what you do is introduce one more level of indirection, and it becomes:
ref1 -> {id,time1} -> struct1
ref2 -> {id,time1} -> struct1

If the code holding ref1 changes a field in the object, it becomes:
ref1 -> {id,time2} -> struct2
ref2 -> {id,time1} -> struct1

And so the view that ref2 has is unchanged. All code executes in transactions, so when a new transaction is started it sees the new state that ref1's transaction produced:
ref3 -> {id,time2} -> struct2
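The ref -> {id,time} -> struct walkthrough can be mimicked with immutable dataclasses. This is a sketch under simplifying assumptions: `Customer`, `VersionRef` and `World` are hypothetical names, a single integer clock stands in for time, and `update` commits immediately instead of at the end of a transaction.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Customer:  # hypothetical entity; frozen, so the "struct" is immutable
    name: str
    balance: int


@dataclass(frozen=True)
class VersionRef:  # the extra level of indirection: {id, time} -> struct
    id: str
    time: int
    state: Customer


class World:
    """Holds the latest committed VersionRef per id (hypothetical)."""

    def __init__(self) -> None:
        self.clock = 0
        self.latest: Dict[str, VersionRef] = {}

    def create(self, id: str, state: Customer) -> VersionRef:
        self.clock += 1
        self.latest[id] = VersionRef(id, self.clock, state)
        return self.latest[id]

    def begin(self, id: str) -> VersionRef:
        # a new transaction starts from the latest committed version
        return self.latest[id]

    def update(self, ref: VersionRef, **changes) -> VersionRef:
        # never mutate struct1: build struct2 and point a new {id, time2} at it
        self.clock += 1
        new_ref = VersionRef(ref.id, self.clock, replace(ref.state, **changes))
        self.latest[ref.id] = new_ref
        return new_ref
```

Usage following the three steps: `ref1` and `ref2` both start at time 1; after `ref1 = world.update(ref1, balance=50)`, `ref2` still sees balance 100 at time 1, while a fresh `world.begin("c1")` (ref3) sees time 2 and balance 50.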

This is very similar to what we do with UnitOfWork, except that in Clojure it's done at the language level, and they call it STM (software transactional memory). (VERY simplified here, but this is the gist of it, AFAICT.)
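To make the transaction part concrete, here is a very simplified optimistic-concurrency sketch for a single cell. It is not how Clojure's STM is actually implemented (that handles multiple refs per transaction with MVCC and more); `Ref` and `atomically` are hypothetical names. The transaction function works on an immutable snapshot and returns a new value, and the commit only succeeds if nobody else committed in the meantime; otherwise it retries.

```python
import threading
from typing import Any, Callable, Tuple


class Ref:
    """Hypothetical transactional cell: an immutable value plus a version."""

    def __init__(self, value: Any) -> None:
        self._lock = threading.Lock()
        self.version = 0
        self.value = value

    def snapshot(self) -> Tuple[int, Any]:
        with self._lock:
            return self.version, self.value

    def compare_and_set(self, expected_version: int, new_value: Any) -> bool:
        with self._lock:
            if self.version != expected_version:
                return False  # another transaction committed first
            self.version += 1
            self.value = new_value
            return True


def atomically(ref: Ref, fn: Callable[[Any], Any], max_retries: int = 100) -> Any:
    """Run fn against a consistent snapshot; retry if the commit conflicts."""
    for _ in range(max_retries):
        version, value = ref.snapshot()
        new_value = fn(value)  # fn should be pure: it only computes a new value
        if ref.compare_and_set(version, new_value):
            return new_value
    raise RuntimeError("too many conflicting commits")
```

With several threads each calling `atomically(account, lambda v: v + 1)`, no increment is lost, because a transaction built on a stale snapshot is simply re-run.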

The point is that time, more specifically the time of execution of a UnitOfWork, is handled explicitly. Basically, if a UoW starts at time 122, it will see all state as it was at time 122, regardless of whether another UoW has since changed objects to some other state. In our terms, I think each entity would be stored under <id,version> rather than just <id>, and yes, then you'd need some form of GC to clean up unused state.
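A store keyed by <id,version> with "as of" reads and a simple GC could look roughly like this. It is an in-memory sketch; `VersionedStore`, `read(..., as_of=...)` and `gc(oldest_active=...)` are hypothetical names, and commit times are plain integers. A read returns the newest version committed at or before the UoW's start time, and GC drops every version that no still-running UoW can see.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple


class VersionedStore:
    """Hypothetical entity store keyed by <id, version> instead of just <id>."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[int]] = {}     # id -> sorted commit times
        self._data: Dict[Tuple[str, int], Any] = {}  # (id, version) -> state

    def write(self, entity_id: str, commit_time: int, state: Any) -> None:
        # a new version is added; earlier versions are never overwritten
        bisect.insort(self._versions.setdefault(entity_id, []), commit_time)
        self._data[(entity_id, commit_time)] = state

    def read(self, entity_id: str, as_of: int) -> Optional[Any]:
        """State as it was at time as_of: the latest version <= as_of."""
        times = self._versions.get(entity_id, [])
        i = bisect.bisect_right(times, as_of)
        return None if i == 0 else self._data[(entity_id, times[i - 1])]

    def gc(self, oldest_active: int) -> int:
        """Drop versions invisible to every UoW that started at >= oldest_active."""
        removed = 0
        for entity_id, times in self._versions.items():
            # keep the newest version <= oldest_active and everything after it
            cutoff = max(bisect.bisect_right(times, oldest_active) - 1, 0)
            for t in times[:cutoff]:
                del self._data[(entity_id, t)]
            del times[:cutoff]
            removed += cutoff
        return removed
```

For example, with versions committed at 100, 120 and 130, a UoW that started at 122 reads the version from 120, and `gc(oldest_active=122)` can only reclaim the version from 100.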

However, loading and storing state this way seems to me to be the easy part. Querying and indexing with time as a key factor seems like a bigger headache, although Rich does describe some pretty fancy BTree structures that are immutable as well, which *could* help in that regard.
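The core trick behind such immutable trees is path copying. The sketch below uses a plain binary search tree, which is much simpler than the structures Rich describes. `Node`, `insert` and `lookup` are hypothetical names. An insert copies only the nodes on the path to the change and shares everything else, so an index root can be kept per point in time and queried "as of" that time.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Node:
    key: Any
    value: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node: Optional[Node], key: Any, value: Any) -> Node:
    """Return a new root; the old tree is left untouched (path copying)."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return Node(node.key, node.value, insert(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value, node.left, insert(node.right, key, value))
    return Node(key, value, node.left, node.right)  # new value for an existing key


def lookup(node: Optional[Node], key: Any) -> Optional[Any]:
    while node is not None:
        if key < node.key:
            node = node.left
        elif key > node.key:
            node = node.right
        else:
            return node.value
    return None
```

If `t1` is the index at one time and `t2 = insert(t1, "z", 4)` the index after a later commit, both stay queryable, and subtrees that the insert did not touch are the very same objects in both versions.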

In short, I think the whole industry is slowly waking up to the idea that all state is always immutable, and that we use GC to clean up old, unused state, but getting those notions into persistence is going to take a while.

OR, it might also be one of those cases where, if you try, a TON of complexity just falls away, and making it fast, concurrent and scalable becomes a trivial side effect of immutability.

Interesting subject, for sure :-)

/Rickard

_______________________________________________
qi4j-dev mailing list
http://lists.ops4j.org/mailman/listinfo/qi4j-dev
