From the previous post we have that the client sends events to the
server by PUTting them to the /changes Atom feed. But what happens next?
How are those changes applied to the snapshots for read?
One important thing to realize is that for the changes to be accepted,
only the following needs to happen:
1) Validate that the changes are consistent. This includes checking the
version of the written state, and optionally of the read state. If the
changes only include new entities, validation can be skipped.
2) The event (e.g. the XML from the previous post) needs to be
transactionally persisted. It must include a pointer to the previous event.
3) The EntityStore needs to transactionally store a pointer to the most
recent event that was posted.
See the attached image for an example. Starting from the pointer in the
EntityStore (2 in the image), the series of events forms a linked
list which can be traversed back to the beginning of the series of
changes. This list can optionally be stored outside of the changes
themselves, to optimize for traversal rather than retrieval.
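The append-and-traverse mechanics of steps 2 and 3 could be sketched roughly as below. This is a minimal in-memory sketch; UnitOfWorkEvent, EventStore, and the map-based storage are illustrative assumptions, not the actual Qi4j API, and a real store would put the event write and the head-pointer update in one transaction against durable storage.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a UnitOfWork event carrying a pointer to its predecessor.
class UnitOfWorkEvent {
    final String id;          // e.g. "aG324JWH"
    final String previousId;  // pointer to the previous event, null for the first
    final String payload;     // the serialized XML from the previous post
    UnitOfWorkEvent(String id, String previousId, String payload) {
        this.id = id; this.previousId = previousId; this.payload = payload;
    }
}

class EventStore {
    private final Map<String, UnitOfWorkEvent> events = new HashMap<>();
    private String head; // the EntityStore's pointer to the most recent event

    // Steps 2 and 3: persist the event with a pointer to its predecessor,
    // then move the head pointer. In a real store both writes would share
    // one transaction.
    synchronized void append(String id, String payload) {
        events.put(id, new UnitOfWorkEvent(id, head, payload));
        head = id;
    }

    // Traverse the linked list backwards from the head, newest first.
    java.util.List<String> history() {
        java.util.List<String> ids = new java.util.ArrayList<>();
        for (String id = head; id != null; id = events.get(id).previousId) {
            ids.add(id);
        }
        return ids;
    }
}
```

Note that the writer only ever does two small writes per UnitOfWork; traversal cost is pushed entirely to the readers.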
What we want to do now is achieve eventual consistency, that is, the
read snapshots accessed by EntityStoreUnitOfWork.getEntityState() (which
internally calls /entity?id=1234) need to have the latest snapshot of
"1234" somehow. To get this, the read store goes to /changes, which is
an Atom feed. The result will be something like:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Changes</title>
  <updated>2009-04-13T12:30:05Z</updated>
  <entry>
    <title>Add new task to project</title>
    <link href="http://example.org/changes/unitofwork/aG324JWH"/>
    <id>urn:uuid:aG324JWH</id>
    <updated>2009-04-13T12:30:05Z</updated>
  </entry>
  <entry>
    <title>Create project</title>
    <link href="http://example.org/changes/unitofwork/bz452HSQ"/>
    <id>urn:uuid:bz452HSQ</id>
    <updated>2009-04-10T10:21:15Z</updated>
  </entry>
</feed>
---
The feed includes the linked list of UnitOfWork events that have been
persisted, newest first. If there are lots of events, the feed can be
chunked (say, 100 entries per page) and traversed backwards in time
using /changes?start=db325JH2 to indicate which event should come first
in the page. The reader keeps track of how far back it has read, simply
traverses back to just before what it has already seen, then fetches the
UnitOfWorks one at a time and applies them locally to the Entity
snapshots. For performance we can optionally allow the feed to include
the state directly using the <content> tag. This should be indicated in
the URL though, to allow both traversal (links as entries) and retrieval
(content as entries).
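The reader's catch-up loop could look roughly like the sketch below. FeedPage, FeedSource, and catchUp are hypothetical names for illustration, not part of Qi4j; the sketch assumes pages list entry ids newest first, as in the feed above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative abstraction over one page of GET /changes?start=...
class FeedPage {
    final List<String> entryIds;   // newest first, as in the feed above
    final String nextStart;        // start parameter of the next page, or null
    FeedPage(List<String> entryIds, String nextStart) {
        this.entryIds = entryIds; this.nextStart = nextStart;
    }
}

interface FeedSource {
    FeedPage fetch(String start); // null start means the newest page
}

class FeedReader {
    private String lastApplied; // how far back we have already read

    // Walk backwards through the pages until we hit what we have already
    // applied, then return the new UnitOfWork ids oldest-first, ready to
    // be applied to the local Entity snapshots.
    List<String> catchUp(FeedSource source) {
        Deque<String> toApply = new ArrayDeque<>();
        String start = null;
        outer:
        while (true) {
            FeedPage page = source.fetch(start);
            for (String id : page.entryIds) {
                if (id.equals(lastApplied)) break outer;
                toApply.addFirst(id); // reverse into oldest-first order
            }
            if (page.nextStart == null) break; // reached the very first event
            start = page.nextStart;
        }
        if (!toApply.isEmpty()) lastApplied = toApply.peekLast();
        return new ArrayList<>(toApply);
    }
}
```

A second call to catchUp after nothing has changed returns an empty list, since the first entry of the newest page matches what was already applied.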
Note now that there can be any number of readers here, and the writer
behind the /changes URL does not have to know who they are. Either the
readers update every once in a while, or they can get open feeds where
the server simply holds the connection open until data becomes
available, to minimize the lag between receiving a change and applying it.
Note also that these feeds can be used for all sorts of fun stuff...
more on that in later posts.
To conclude, with this simple REST-based scheme we can achieve extremely
good performance for writing (since the writer does not have to update
snapshots, only log the events), and also arbitrary reader scalability,
as all you have to do is add more readers to the feed. Either all
readers can have all state, or you can do consistent hashing for content
routing in order to do data partitioning.
This also provides an answer as to what the version is of each entity.
It is not "1,2,3,4" or something like that. Instead it is the id of the
last applied UnitOfWork!
In this scheme the reader can choose the level of consistency of data.
Either the client can just get "whatever is there" when /entity is
called, or if greater consistency is required, then the reader can call
the feed to ensure that there are no more changes to be applied to the
entity being accessed.
The reader can also choose how to interact with the change feed. If the
reader is on the same network you might want to do HTTP streaming calls,
so that whenever data is available in the writer it is pushed to the
readers. At the other end of the spectrum you have WAN access which
happens once a day to fetch the changes for the past 24h. Consistency is
lower, but there is also less demand on the network, especially if
content is included in the feed and the feed is also gzipped.
Continued in part 5.
_______________________________________________
qi4j-dev mailing list
[email protected]
http://lists.ops4j.org/mailman/listinfo/qi4j-dev