> Thanks for the additional insight on this -- think of a CDN that needs to
> respond to requests, distributed around the globe. Ultimately you would hope
> that each edge location could respond as quickly as possible (RF=N) but if
> each of the ring members keep open/active connections to each other, and a
> request comes in to an edge location that does not contain a copy of the
> data, does it request the data from the node that does, then cache it (in
> the case of more requests coming into that edge location with the same
> request) or does it reply once and forget it, requiring *each* subsequent
> request to that node to always phone back home to the node that actually
> contains it?
> The CDN/edge-server scenario works particularly well to illustrate my goals,
> if visualizing that helps.
> Look forward to your thoughts.

Nodes never cache data on behalf of other nodes. A node holds only the
data it owns according to the ring topology and the replication factor
(to the extent that the data has actually been replicated). The node
you happen to talk to is merely the "coordinator" of a request:
essentially a proxy with intelligent routing to the correct hosts.
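
To make the coordinator role concrete, here is a toy sketch. It is not
Cassandra's implementation: the ToyRing class, the one-token-per-node
layout and the MD5 hashing are all assumptions made for illustration.
What it shows is that any node can coordinate a request, but only the
replicas chosen by the ring ever hold the data.

```python
import bisect
import hashlib


class ToyRing:
    """Toy consistent-hash ring (hypothetical, for illustration only)."""

    def __init__(self, nodes, rf):
        self.rf = rf  # assumed to be <= len(nodes)
        # Assumption: each node owns exactly one token on the ring.
        self.ring = sorted((self._token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]
        self.store = {n: {} for n in nodes}

    @staticmethod
    def _token(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def replicas(self, key):
        # Walk clockwise from the key's token and take the next rf nodes.
        start = bisect.bisect_right(self.tokens, self._token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

    def write(self, coordinator, key, value):
        # The coordinator only routes; the replicas store the value.
        for node in self.replicas(key):
            self.store[node][key] = value

    def read(self, coordinator, key):
        # Forward to a replica and return the answer; nothing is kept
        # on the coordinator afterwards.
        owner = self.replicas(key)[0]
        return self.store[owner].get(key)
```

Reading the same key through a non-replica coordinator a thousand times
means a thousand forwarded requests in this model.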

In the CDN situation, if you mean having a group of servers in each
"place" (a location that is distinct in network topology, such as a
separate geographic region), then a better fit than RF=N is probably
to use multi-site support: specify how many copies you want in each
location and have all clients talk to their most local "site".

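As a rough illustration of "a certain number of copies for each
location", the sketch below picks replicas independently inside every
site. The function name, the site and node names and the hashing are
hypothetical; in practice this would be configured in Cassandra rather
than coded by hand.

```python
import hashlib


def token(value):
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


def replicas_per_site(key, nodes_by_site, copies_per_site):
    """Pick copies_per_site[site] nodes in every site for the given key."""
    placement = {}
    key_token = token(key)
    for site, nodes in nodes_by_site.items():
        ordered = sorted(nodes, key=token)
        # First node whose token is past the key's token, wrapping to 0.
        start = next(
            (i for i, n in enumerate(ordered) if token(n) > key_token), 0
        )
        count = min(copies_per_site.get(site, 0), len(ordered))
        placement[site] = [
            ordered[(start + i) % len(ordered)] for i in range(count)
        ]
    return placement
```

A client in a given site would then read from placement[site] only, so
requests stay local as long as that site holds at least one copy.
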
But that's assuming you want to try to model this using just
Cassandra's replication to begin with. Dynamically caching wherever
data is accessed is a good idea for a CDN use-case (probably), but is
not something that Cassandra does itself, internally. It's really
difficult to know what the best solution is for a CDN; and in your
case you imply that it's really *not* a CDN and it's just an analogy
;)
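
If you did want caching at the point of access, it would have to live
in front of Cassandra, in your own application layer. A minimal sketch
of such a read-through cache follows; the EdgeCache class, the fetch
callback and the fixed TTL are assumptions for illustration, not
anything Cassandra provides.

```python
import time


class EdgeCache:
    """Read-through cache with a fixed TTL (hypothetical helper)."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self.fetch = fetch      # callable that reads from the backing store
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested
        self.entries = {}       # key -> (value, expiry time)

    def get(self, key):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]
        # Miss or expired: go back to the store and remember the answer.
        value = self.fetch(key)
        self.entries[key] = (value, now + self.ttl)
        return value
```

Here fetch would be whatever function performs the actual read against
the cluster; stale reads within the TTL are the price of the cache.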

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
