With the eventual consistency feature made available through the Atom feeds described in the previous post, the next question is: what kinds of readers can make use of it?

Here is the current list I have managed to figure out:
* Entity readers
* Entity indexers
* Backup
* Logging
* Reporting

I will go through them and their characteristics below.

Entity readers
==============
The most obvious one is the reader that answers /entity requests. This creates the main feedback loop between clients and servers: clients read from /entity, use domain logic to come up with a change set, which is then sent to /changes. Readers may either be required to be Consistent, meaning they can only answer requests once they have processed all changes, or they can be Available and Partition tolerant, meaning that calls to /entity simply return whatever is available right now, even if the data is slightly out of date. It all depends on the client's requirements (response time vs accuracy vs availability).

What happens here is that the reader gets the /changes feed and applies the changes to its local database, which contains the snapshots of the entities to be read (either all of them or a partition). Each change can update the snapshot in place, or optionally create a new snapshot, so that you can easily traverse the database back in time if you want to.
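
To make this concrete, here is a minimal sketch in Java of what such a reader could look like. The ChangeEvent and EntityReader names and fields are made up for illustration; a real reader would follow the /changes Atom feed over HTTP and use a proper database rather than an in-memory map.

// Illustrative sketch only: ChangeEvent and EntityReader are hypothetical names.
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ChangeEvent {
    final String changeId;   // id of the entry in the /changes feed
    final String entityId;   // which entity the change applies to
    final String newState;   // serialized snapshot of the entity after the change

    ChangeEvent(String changeId, String entityId, String newState) {
        this.changeId = changeId;
        this.entityId = entityId;
        this.newState = newState;
    }
}

class EntityReader {
    // Local snapshot store: entity id -> latest known state
    private final Map<String, String> snapshots = new ConcurrentHashMap<>();
    private volatile String lastProcessedChangeId;

    // Apply a page of the /changes feed to the local snapshots
    void apply(List<ChangeEvent> changes) {
        for (ChangeEvent change : changes) {
            snapshots.put(change.entityId, change.newState);
            lastProcessedChangeId = change.changeId;
        }
    }

    // Available/Partition-tolerant read: return whatever is known right now,
    // which may be slightly out of date
    String read(String entityId) {
        return snapshots.get(entityId);
    }

    // A Consistent reader would compare this against the head of the feed
    // before answering
    String lastProcessedChangeId() {
        return lastProcessedChangeId;
    }
}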

Entity indexers
===============
The same feed could also be used to update the indexes used by /query REST requests. There could also be many indexers running in parallel, to cover various needs: one reader could use RDF, one Lucene, and one Neo4j. REST routing would be used to figure out which one is used where. For example, /query/queries could return a feed of named queries, each of which points to a different backend:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Named queries</title>
  <link href="http://example.org/query/queries"/>

  <entry>
    <title>User by name</title>
    <link href="http://example.org/query/rdf/User_by_name"/>
  </entry>

  <entry>
    <title>User friends</title>
    <link href="http://neo4j.example.org/query/neo4j/User_friends"/>
  </entry>

  <entry>
    <title>Message by content</title>
    <link href="http://example.org/query/lucene/Message_by_content"/>
  </entry>
</feed>
Note that both paths and hostnames change: the client reads this feed to find out where each specific query lives. Each of these backends could be clustered, or changed as the system evolves, without having to update the client.
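
As a rough sketch of the client side, using only the JDK's XML APIs and illustrative names, resolving a named query to its backend URL could look like this:

// Illustrative sketch: reads the /query/queries feed above and looks up the
// link for a named query. Error handling is omitted.
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class NamedQueryResolver {
    private static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Returns the href of the entry whose title matches the given query name,
    // or null if the feed has no such entry
    static String resolve(String feedUrl, String queryName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new URL(feedUrl).openStream());

        NodeList entries = doc.getElementsByTagNameNS(ATOM_NS, "entry");
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            String title = entry.getElementsByTagNameNS(ATOM_NS, "title").item(0).getTextContent();
            if (queryName.equals(title)) {
                Element link = (Element) entry.getElementsByTagNameNS(ATOM_NS, "link").item(0);
                return link.getAttribute("href");
            }
        }
        return null;
    }
}

With the feed above, resolve("http://example.org/query/queries", "User friends") would come back with http://neo4j.example.org/query/neo4j/User_friends, so the client never hardcodes which backend answers which query.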

Backup
======
In my experience backups are a very complicated matter. The naive developer might think that if they simply allow the admin to back up the application database once a day, everything is just fine. In reality I have never met a customer who is satisfied with this, not in terms of making the backups, but rather in terms of restoring from such nightly snapshots. If the database breaks and a restore is requested, using a nightly snapshot means that the customer loses, on average, 12 hours' worth of operation. This is simply not acceptable. When I was working on SiteVision we therefore had to come up with some funky tricks to do partial restoration of databases (so that a minimal amount of the current working database is lost), either by selecting parts of databases to be restored, or even on an individual object basis. This is non-trivial, and ensuring that the result is consistent is theoretically impossible. But if you talk to an average developer who is not familiar with deployment and administration concerns, this is what they will suggest.

Another major problem is taking backups of the online database. Some databases do support taking backups while the system is running, but this still costs performance. In the SiteVision case I had one customer who never got a full backup, simply because their monitoring system restarted SiteVision every night at 2.15am for responding too slowly, which was exactly when the backup was being made. They therefore had a whole lot of partial backups, but not a single complete one. They eventually had a crash, which is when this was found out... (funny note: the admin folks actually thought it was normal to restart the server every night. That's just sad.)

With the EventSourced approach this problem becomes muuuuch simpler to handle. Simply create a separate backup server which reads the /changes feed and applies it to a local database. When a backup is requested, stop reading changes from the master, take the snapshot backup, and record the id of the last change that was read. Also back up all the changes that have been made; if the changes are stored in plain files you can easily use incremental backups to minimize how much is copied each time. When the backup is done, resume reading changes so that the backup copy eventually catches up and has more or less the current state.
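
Roughly, the backup sequence could look like the sketch below; the ChangeFeedReader, SnapshotDatabase and BackupStore interfaces are hypothetical stand-ins for whatever you use to follow the feed and store the data.

// Illustrative sketch of the backup sequence; only the order of operations matters here.
interface ChangeFeedReader {
    void pause();                     // stop applying changes from the master
    void resume();                    // continue and catch up again
    String lastAppliedChangeId();     // id of the last change applied locally
}

interface SnapshotDatabase {
    byte[] snapshot();                // dump of the local copy
}

interface BackupStore {
    void storeSnapshot(byte[] snapshot, String lastChangeId);
    void storeChangesUpTo(String lastChangeId);   // e.g. incremental copy of plain change files
}

class BackupServer {
    private final ChangeFeedReader feedReader;
    private final SnapshotDatabase localDb;
    private final BackupStore backupStore;

    BackupServer(ChangeFeedReader feedReader, SnapshotDatabase localDb, BackupStore backupStore) {
        this.feedReader = feedReader;
        this.localDb = localDb;
        this.backupStore = backupStore;
    }

    void takeBackup() {
        feedReader.pause();
        String lastChangeId = feedReader.lastAppliedChangeId();
        backupStore.storeSnapshot(localDb.snapshot(), lastChangeId);
        backupStore.storeChangesUpTo(lastChangeId);
        feedReader.resume();
    }
}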

When restoring from backup, load the snapshot and then apply the changes from the time of the backup up until now, or, if there are some changes you don't want to include, such as "Delete the whole database", filter those out. This way the customer will not lose any important business data, and you can ensure that the database is in a consistent state when resuming operation.
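
The restore side, sketched with the same made-up types (and reusing the ChangeEvent shape from the reader sketch earlier), is essentially a snapshot load plus a filtered replay:

// Illustrative sketch: reload the snapshot, then re-apply the recorded changes,
// skipping any the operator has chosen to exclude.
import java.util.List;
import java.util.function.Predicate;

interface RestorableDatabase {
    void loadSnapshot(byte[] snapshot);
    void apply(ChangeEvent change);
}

class RestoreProcedure {
    static void restore(RestorableDatabase target,
                        byte[] snapshot,
                        List<ChangeEvent> changesSinceBackup,
                        Predicate<ChangeEvent> include) {
        target.loadSnapshot(snapshot);               // start from the backed-up snapshot
        for (ChangeEvent change : changesSinceBackup) {
            if (include.test(change)) {              // drop e.g. "Delete the whole database"
                target.apply(change);
            }
        }
    }
}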

Logging
=======
In a sense this is the simplest case: have a separate server that gets the changes and stores them locally, optionally in a format that makes them easier to search. This can be used to find out what has happened in a system, whether for debugging, for legal reasons, or similar. It is also possible to let the logger listen to several /changes feeds and get a syndicated view of everything that is going on across multiple systems. This can also be extended into a monitoring system where you, for example, measure the number of changes per hour, or the number of "New object X" messages per hour, or similar. Lots of interesting monitoring things become trivial.
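
As a small illustration of the monitoring idea (assuming each change carries a timestamp), counting changes per hour is just a matter of bucketing them as they stream in:

// Illustrative sketch: counts how many changes were seen in each hour.
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class ChangeRateMonitor {
    // hour bucket (timestamp truncated to the hour) -> number of changes seen
    private final Map<Instant, AtomicLong> changesPerHour = new ConcurrentHashMap<>();

    void record(Instant changeTimestamp) {
        Instant hour = changeTimestamp.truncatedTo(ChronoUnit.HOURS);
        changesPerHour.computeIfAbsent(hour, h -> new AtomicLong()).incrementAndGet();
    }

    long countForHour(Instant hour) {
        AtomicLong count = changesPerHour.get(hour.truncatedTo(ChronoUnit.HOURS));
        return count == null ? 0 : count.get();
    }
}

The same bucketing works for "number of 'New object X' messages per hour": key on the pair of hour and change type instead.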

Reporting
=========
One of the main culprits in perpetuating the "Objects in RDBMS" fallacy is that customers want to do reporting on live data. This is so wrong that I don't even want to spend time explaining why, because that is a whole essay on its own. With the EventSourced approach we now have a way to take the messages and massage them into an RDBMS (yes, plain SQL: tables, rows and columns) in a format that is suitable for whatever report is needed. It can be done as many times as necessary, and if the reporting needs change, simply throw away the tables and start all over, reading /changes from day one up until the last message. This provides a much better basis for doing advanced reports, and also ensures that reporting does not affect domain modeling or the performance of the online application server.
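
A reporting projection can then be as dumb as the sketch below: plain JDBC, one table shaped purely for the report, fed by the changes it cares about. The "UserRegistered" change type and the table name are made up for illustration; the whole table can be dropped and rebuilt from /changes at any time.

// Illustrative sketch of a reporting projection using plain JDBC.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Timestamp;

class UserReportProjection {
    private final Connection connection;

    UserReportProjection(Connection connection) throws SQLException {
        this.connection = connection;
        try (Statement stmt = connection.createStatement()) {
            stmt.execute("CREATE TABLE IF NOT EXISTS user_registrations (" +
                         "user_id VARCHAR(64), registered_at TIMESTAMP)");
        }
    }

    // Called for each change read from /changes, from day one up to the latest
    void handle(String changeType, String userId, Timestamp registeredAt) throws SQLException {
        if (!"UserRegistered".equals(changeType)) {
            return;   // this projection only cares about one kind of change
        }
        try (PreparedStatement insert = connection.prepareStatement(
                "INSERT INTO user_registrations (user_id, registered_at) VALUES (?, ?)")) {
            insert.setString(1, userId);
            insert.setTimestamp(2, registeredAt);
            insert.executeUpdate();
        }
    }
}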

Conclusion
==========
This is a short rundown of what you can do with readers in an EventSourced system. As you can see, a whole bunch of typically complex problems become muuuch easier to deal with.

Continued in part 6.
