In the previous post we saw that the client sends events to the server by PUTting them to the /changes Atom feed. But what happens next? How are those changes applied to the snapshots used for reads?

One important thing to realize is that for the changes to be accepted, only the following needs to happen:

1) Validate that the changes are consistent. This includes checking the version of the written and, optionally, the read state. If the changes only include new entities then validation can be skipped.

2) The event (e.g. the XML from the previous post) needs to be transactionally persisted. It must include a pointer to the previous event.

3) The EntityStore needs to transactionally store a pointer to the most recent event that was posted.
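The three steps above can be sketched in a few lines. This is a minimal illustration, not Qi4j's actual API: the names ChangeEvent, eventLog, and accept() are assumptions, and in a real store steps 2 and 3 would share a single transaction rather than a synchronized block.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class WriteSide {
    // A change event points back to its predecessor, forming a linked list.
    public record ChangeEvent(String id, String previousId, String payloadXml) {}

    private final Map<String, ChangeEvent> eventLog = new HashMap<>();
    private String latestEventId; // the EntityStore's pointer to the newest event

    public String latestEventId() { return latestEventId; }

    public synchronized void accept(String eventId, String payloadXml,
                                    String expectedLatest, boolean onlyNewEntities) {
        // 1) Consistency check; skipped when the change only creates new entities.
        if (!onlyNewEntities && !Objects.equals(expectedLatest, latestEventId)) {
            throw new IllegalStateException("Concurrent modification: expected "
                    + expectedLatest + " but latest is " + latestEventId);
        }
        // 2) Persist the event with a pointer to the previous event.
        eventLog.put(eventId, new ChangeEvent(eventId, latestEventId, payloadXml));
        // 3) Store the pointer to the most recent event.
        latestEventId = eventId;
    }
}
```

Note how the optimistic check in step 1 compares the client's expected latest event against the store's pointer, which is what makes step 3 the single point of serialization.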

See the attached image for an example. Starting from the pointer in the EntityStore ("2" in the image), the series of events forms a linked list that can be traversed back to the beginning of the series of changes. This list can optionally be stored outside of the changes themselves, to optimize for traversal rather than retrieval.
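The backwards traversal can be sketched as follows; the Event record and the log map are illustrative assumptions, not Qi4j types:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class EventChain {
    // Each event carries the id of the event that preceded it (null for the first).
    public record Event(String id, String previousId) {}

    // Walk from the newest event back to the first, following previousId links.
    public static List<String> traverseBack(Map<String, Event> log, String latestId) {
        List<String> ids = new ArrayList<>();
        for (String id = latestId; id != null; id = log.get(id).previousId()) {
            ids.add(id);
        }
        return ids;
    }
}
```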

What we want to achieve now is eventual consistency: the read snapshots accessed by EntityStoreUnitOfWork.getEntityState(), which internally calls /entity?id=1234, need to somehow have the latest snapshot of "1234". To get this, the read-store will go to /changes, which is an Atom feed. The result will be something like:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Changes</title>
  <updated>2009-04-13T12:30:05Z</updated>

  <entry>
    <title>Add new task to project</title>
    <link href="http://example.org/changes/unitofwork/aG324JWH"/>
    <id>urn:uuid:aG324JWH</id>
    <updated>2009-04-13T12:30:05Z</updated>
  </entry>

  <entry>
    <title>Create project</title>
    <link href="http://example.org/changes/unitofwork/bz452HSQ"/>
    <id>urn:uuid:bz452HSQ</id>
    <updated>2009-04-10T10:21:15Z</updated>
  </entry>
</feed>
---
The feed includes the linked list of UnitOfWork events that have been persisted, most recent first. If there are lots of events, the feed can be chunked (let's say 100 events per page) and then traversed backwards in time using /changes?start=db325JH2 to indicate which event should come first in the feed. The reader keeps track of how far back it has read, traverses back to just before what it has already seen, and then gets the UnitOfWork events one at a time and applies them locally to the Entity snapshots. For performance we can optionally allow the feed to include the state directly using the <content> tag. This should be indicated in the URL though, so that both traversal (links as entries) and retrieval (content as entries) remain possible.
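The reader's catch-up loop might look like the following sketch. FeedReader and catchUp() are hypothetical names, and the "apply" step is reduced to recording the event id; a real reader would fetch each UnitOfWork and update its snapshots there.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class FeedReader {
    private String lastApplied;                 // how far back this reader has read
    private final List<String> appliedOrder = new ArrayList<>();

    public List<String> appliedOrder() { return appliedOrder; }

    // feedNewestFirst mirrors the /changes feed: most recent entry first.
    public void catchUp(List<String> feedNewestFirst) {
        Deque<String> toApply = new ArrayDeque<>();
        for (String eventId : feedNewestFirst) {
            if (eventId.equals(lastApplied)) break; // reached what we already have
            toApply.push(eventId);                  // reverse into oldest-first order
        }
        while (!toApply.isEmpty()) {
            String eventId = toApply.pop();
            appliedOrder.add(eventId);              // a real reader applies the UnitOfWork here
            lastApplied = eventId;
        }
    }
}
```

The key design point is that the reader stops traversing as soon as it sees an event it has already applied, so repeated polls only pay for the new tail of the feed.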

Note now that there can be any number of readers here, and the writer behind the /changes URL does not have to know who they are. Either the readers poll every once in a while, or they can use open feeds, where the server simply holds the connection open until data becomes available, to minimize the lag between receiving a change and applying it.

Note also that these feeds can be used for all sorts of fun stuff... more on that in later posts.

To conclude, with this simple REST-based scheme we can achieve extremely good performance for writing (since the writer does not have to update snapshots, only log the events), and also arbitrary reader scalability, as all you have to do is add more readers to the feed. Either all readers can have all state, or you can do consistent hashing for content routing in order to do data partitioning.

This also provides an answer as to what the version is of each entity. It is not "1,2,3,4" or something like that. Instead it is the id of the last applied UnitOfWork!

In this scheme the reader can choose the level of consistency of the data. Either the client just gets "whatever is there" when /entity is called, or, if greater consistency is required, the reader can first call the feed to ensure that there are no pending changes to be applied to the entity being accessed.
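The two read modes described above can be sketched like this. ReadSide, the Consistency enum, and the two injected callbacks are all assumptions made for illustration; they stand in for the local snapshot lookup and the feed catch-up described earlier.

```java
import java.util.function.Supplier;

public class ReadSide {
    public enum Consistency { WHATEVER_IS_THERE, LATEST }

    private final Supplier<String> snapshot;   // local snapshot lookup (assumed)
    private final Runnable catchUpWithFeed;    // poll /changes and apply (assumed)

    public ReadSide(Supplier<String> snapshot, Runnable catchUpWithFeed) {
        this.snapshot = snapshot;
        this.catchUpWithFeed = catchUpWithFeed;
    }

    public String getEntityState(Consistency level) {
        if (level == Consistency.LATEST) {
            catchUpWithFeed.run(); // ensure no pending changes before answering
        }
        return snapshot.get();
    }
}
```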

The reader can also choose how to interact with the change feed. If the reader is on the same network you might want to use HTTP streaming calls, so that whenever data is available in the writer it is immediately sent to the readers. On the other end of the spectrum you have WAN access that happens once a day to get the changes for the past 24h. Consistency is lower, but there is also less demand on the network, especially if content is included in the feed and the feed is gzipped.

Continued in part 5.

_______________________________________________
qi4j-dev mailing list
[email protected]
http://lists.ops4j.org/mailman/listinfo/qi4j-dev