With the eventual consistency feature available through the Atom feeds,
as described in the previous post, the next question is: what kinds of
readers can make use of it?
Here is the list I have come up with so far:
* Entity readers
* Entity indexers
* Backup
* Logging
* Reporting
I will go through them and their characteristics below.
Entity readers
==============
The most obvious one is the reader that answers /entity requests. This
creates the main feedback loop between clients and servers: clients read
from /entity, use domain logic to come up with a change set, which is
then sent to /changes. Readers may either be required to be Consistent,
in that they can only answer requests once they have processed all
changes, or they can be Available and Partition tolerant, in the sense
that calls to /entity simply return what's available right now, and the
data might be slightly out of date. It all depends on the client's
requirements (response time vs accuracy vs availability).
What happens here is that the reader gets the /changes feed and applies
the changes to its local database, which contains the snapshots of the
entities to be read (either all of them or a partition). Each change can
cause the snapshot to be updated, or optionally create a new snapshot,
so that you can easily traverse the database back in time if you want to.
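To make this concrete, here is a minimal sketch of such a reader in
Java. The Change type and the in-memory snapshot map are hypothetical
stand-ins; the real feed parsing and persistence are elided.

import java.util.HashMap;
import java.util.Map;

class Change {
    final String entityId;
    final long revision;
    final String snapshot; // new snapshot data produced by this change

    Change(String entityId, long revision, String snapshot) {
        this.entityId = entityId;
        this.revision = revision;
        this.snapshot = snapshot;
    }
}

class EntityReader {
    // latest snapshot per entity; a real reader would use a database,
    // and could keep old snapshots around for traversing back in time
    private final Map<String, String> snapshots = new HashMap<>();
    private long lastProcessedRevision = 0;

    // apply one change from the /changes feed to the local snapshots
    void apply(Change change) {
        snapshots.put(change.entityId, change.snapshot);
        lastProcessedRevision = change.revision;
    }

    // a Consistent reader refuses to answer until it has processed
    // everything up to the head of the feed; an AP reader just answers
    // with whatever it has right now
    String read(String entityId, long feedHead, boolean requireConsistent) {
        if (requireConsistent && lastProcessedRevision < feedHead) {
            throw new IllegalStateException("reader has not caught up with the feed");
        }
        return snapshots.get(entityId);
    }
}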
Entity indexers
===============
The same feed could also be used to update the index behind /query REST
requests. There could be many indexers running in parallel, to cover
various needs: one reader could use RDF, one could use Lucene, and one
could use Neo4j. REST routing would then determine which one is used
where. For example, /query/queries could return a feed of named
queries, each of which points to a different backend:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Named queries</title>
  <link href="http://example.org/query/queries"/>
  <entry>
    <title>User by name</title>
    <link href="http://example.org/query/rdf/User_by_name"/>
  </entry>
  <entry>
    <title>User friends</title>
    <link href="http://neo4j.example.org/query/neo4j/User_friends"/>
  </entry>
  <entry>
    <title>Message by content</title>
    <link href="http://example.org/query/lucene/Message_by_content"/>
  </entry>
</feed>
Note that both paths and hostnames can change; the client reads this
feed to find out where the specific queries are. Each of these backends
could be clustered, or changed as the system evolves, without having to
update the client.
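As a sketch of the client side, assuming the feed above, a client could
resolve a named query to its current location with nothing but the
JDK's built-in XML parser (the feed URL and query names are just the
illustrative ones from the example):

import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class QueryDirectory {
    private static final String ATOM = "http://www.w3.org/2005/Atom";

    // returns the href of the entry whose title matches queryName,
    // or null if the feed does not list it
    static String resolve(String feedUrl, String queryName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        try (InputStream in = new URL(feedUrl).openStream()) {
            Document doc = factory.newDocumentBuilder().parse(in);
            NodeList entries = doc.getElementsByTagNameNS(ATOM, "entry");
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                String title = entry.getElementsByTagNameNS(ATOM, "title")
                        .item(0).getTextContent();
                if (title.equals(queryName)) {
                    Element link = (Element) entry
                            .getElementsByTagNameNS(ATOM, "link").item(0);
                    return link.getAttribute("href");
                }
            }
            return null;
        }
    }
}

So resolve("http://example.org/query/queries", "User friends") would
return the Neo4j backend's URL, and the client never has to hardcode it.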
Backup
======
In my experience, backups are a very complicated matter. The naive
developer might think that if they simply let the admin back up the
application database once a day, everything is fine. In reality I have
never met a customer who is satisfied with this: not in terms of making
the backups, but rather in terms of restoring from such nightly
snapshots. If the database breaks and a restore is requested, using a
nightly snapshot means that the customer loses on average 12 hours'
worth of operations. This is simply not acceptable. When I was working
on SiteVision we therefore had to come up with some funky tricks to do
partial restoration of databases (so that a minimal amount of the
current working database is lost), either by selecting parts of
databases to be restored, or even on an individual object basis. This
is non-trivial, and ensuring that the result is consistent is
theoretically impossible. But if you talk to an average developer who
is not familiar with deployment and administration concerns, this is
what they will suggest.
Another major problem is taking backups of the online database. Some
databases do support taking backups while the system is running, but
this still costs the system performance. In the SiteVision case I had
one customer who never got a full backup, simply because their
monitoring system restarted SiteVision every night at 2.15am for
responding too slowly, which was exactly when the backup was being
made. They therefore had a whole lot of partial backups, but not a
single completed one. This was only discovered when they eventually had
a crash... (funny note: the admin folks actually thought it was normal
to restart the server every night. That's just sad)
With the EventSourced approach this problem becomes muuuuch simpler to
handle. Simply create a separate backup server which reads the changes
feed and applies the changes locally to a database. When a backup is
requested, stop reading changes from the master, take the snapshot
backup, and include the id of the last read message. Also back up all
the changes that have been made. If the changes are stored in plain
files you can easily use incremental backups to minimize how much is
copied each time. When the backup is done, resume reading changes so
that the backup copy eventually catches up and has more or less the
current state.
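As a sketch, the essential ordering looks something like this. The
ChangeFeed and SnapshotStore abstractions are hypothetical; the point
is that no changes are applied while the copy is made, and that the
snapshot records the id of the last change it contains.

interface ChangeFeed {
    String nextChangeId(); // blocks until the next change arrives
}

interface SnapshotStore {
    void apply(String changeId);              // apply one change locally
    void snapshotTo(String dir, String upTo); // copy the db, record last change id
}

class BackupServer {
    private final Object lock = new Object();
    private String lastApplied;

    // runs continuously, mirroring the master's /changes feed
    void followFeed(ChangeFeed feed, SnapshotStore store) {
        while (true) {
            String id = feed.nextChangeId();
            synchronized (lock) { // backup() holds this while copying
                store.apply(id);
                lastApplied = id;
            }
        }
    }

    // stops change application for the duration of the copy, so the
    // snapshot is consistent and tagged with the last included change;
    // afterwards followFeed() simply catches up with the master again
    void backup(SnapshotStore store) {
        synchronized (lock) {
            store.snapshotTo("/backups/" + System.currentTimeMillis(), lastApplied);
        }
    }
}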
When restoring from backup, get the snapshot, and then apply the
changes from the time of the backup up until now; if there are changes
you don't want to include, such as "Delete the whole database", filter
those out. This way the customer will not lose any important business
data, and you can ensure that the database is in a consistent state
when resuming operation.
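A restore is then just the snapshot plus a filtered replay; something
like this sketch, again with hypothetical abstractions. The filter is
where a "Delete the whole database" change would be skipped, and the
snapshot's recorded change id is assumed to appear in the archive.

import java.util.List;
import java.util.function.Predicate;

interface RestorableStore {
    void loadFrom(String snapshotDir); // load the snapshot copy
    void apply(String changeId);       // re-apply one archived change
}

class Restore {
    // snapshotUpTo is the change id recorded with the snapshot;
    // archive is the full list of change ids, oldest first
    static void restore(RestorableStore store, String snapshotDir,
                        String snapshotUpTo, List<String> archive,
                        Predicate<String> include) {
        store.loadFrom(snapshotDir);
        // everything up to and including snapshotUpTo is already in the snapshot
        int first = archive.indexOf(snapshotUpTo) + 1;
        for (String id : archive.subList(first, archive.size())) {
            if (include.test(id)) { // e.g. skip "Delete the whole database"
                store.apply(id);
            }
        }
    }
}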
Logging
=======
In a sense this is the simplest case: have a separate server that gets
the changes and stores them locally, optionally in a format that makes
them easier to search. This can be used for finding out what has
happened in a system, whether for debugging, legal, or similar
purposes. It is also possible to let the logger listen to several
/changes feeds and get a syndicated view of everything that is going on
across multiple systems. This can also be extended into a monitoring
system where you, for example, measure the number of changes per hour,
or the number of "New object X" messages per hour, or similar. Lots of
interesting monitoring tasks become trivial.
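As an illustration, counting changes per hour is only a few lines once
the logger has each change's timestamp (the feed plumbing is elided,
and the class name is made up):

import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Map;
import java.util.TreeMap;

class ChangeMonitor {
    // change counts keyed by the hour in which the change occurred
    private final Map<Instant, Long> perHour = new TreeMap<>();

    // call this for every change read from a /changes feed
    void record(Instant changeTimestamp) {
        perHour.merge(changeTimestamp.truncatedTo(ChronoUnit.HOURS), 1L, Long::sum);
    }

    long changesInHour(Instant hour) {
        return perHour.getOrDefault(hour.truncatedTo(ChronoUnit.HOURS), 0L);
    }
}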
Reporting
=========
One of the main culprits in perpetuating the "Objects in RDBMS" fallacy
is that customers want to do reporting on live data. This is so wrong I
don't even want to spend time explaining why, because that's a whole
essay on its own. With the EventSourced approach we now have a way to
take the messages and massage them into an RDBMS (yes, plain SQL:
tables, rows and columns) in a format that is suitable for whatever
report is needed. This can be done as many times as necessary, and if
the reporting needs change, simply throw away the tables and start all
over, reading /changes from day one up until the last message. This
provides a much better basis for doing advanced reports, and also
ensures that reporting does not impact domain modeling or the online
application server's performance.
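As a sketch of such a projection, assuming a change can be reduced to a
couple of columns (the table layout and the [entityId, name] change
shape are made up; any JDBC-accessible database will do):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

class ReportProjection {
    private final Connection db;

    ReportProjection(Connection db) {
        this.db = db;
    }

    // throw the old table away and replay the whole /changes feed;
    // each element is a hypothetical [entityId, name] pair
    void rebuild(Iterable<String[]> changes) throws SQLException {
        try (Statement s = db.createStatement()) {
            s.execute("DROP TABLE IF EXISTS user_report");
            s.execute("CREATE TABLE user_report (entity_id VARCHAR(64), name VARCHAR(255))");
        }
        try (PreparedStatement insert = db.prepareStatement(
                "INSERT INTO user_report (entity_id, name) VALUES (?, ?)")) {
            for (String[] change : changes) {
                insert.setString(1, change[0]);
                insert.setString(2, change[1]);
                insert.executeUpdate();
            }
        }
    }
}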
Conclusion
==========
This is a short rundown of what you can do with readers in an
EventSourced system. As you can see, a whole bunch of typically complex
problems become muuuch easier to deal with.
Continued in part 6.