Upfront TLDR: We want to do stuff (reindex documents, bust cache) when
changed data from DC1 shows up in DC2.

Full Story:
We're planning on adding data centers throughout the US.  Our platform is
used for business communications.  Each DC currently utilizes elastic
search and redis.  A message can be sent from one user to another, and the
intent is that it would be seen in near-real-time.  This means that 2
people may be using different data centers, and the messages need to
propagate from one to the other.

On the plus side, we know we get this with Cassandra (fist pump) but the
other pieces, not so much.  Even if they did work, there's all sorts of
race conditions that could pop up from having different pieces of our
architecture communicating over different channels.  From this, we've
arrived at the idea that since Cassandra is the authoritative data source,
we might be able to trigger events in DC2 based on activity coming through
either the commit log or some other means.  One idea was to use a CF with a
low gc time as a means of transporting messages between DCs, and watching
the commit logs for deletes to that CF in order to know when we need to do
things like reindex a document (or a new document), bust cache, etc.
 Facebook did something similar with their modifications to MySQL to
include cache keys in the replication log.

Assuming this is sane, I'd want to avoid having the same event register on
3 servers, thus registering 3 items in the queue when only one should be
there.  So, for any piece of data replicated from the other DC, I'd need a
way to determine if it was supposed to actually trigger the event or not.
 (Maybe it looks at the token and determines if the current server falls in
the token range?)  Or is there a better way?

So, my questions to all ye Cassandra users:

1. Is this is even sane?
2. Is anyone doing it?


-- 
Jon Haddad
http://www.rustyrazorblade.com
skype: rustyrazorblade

Reply via email to