How fast:

"How fast" is almost meaningless to ask in the abstract, since it depends a lot on what's between CouchDB and your chat clients.

After a change is written to the database, the internal change listener gets the update almost immediately. From there it is either pushed down the long-lived ?feed=continuous connection to a _changes consumer (e.g. another Couch doing pull replication, or a client), or sent in two HTTP requests (usually with keep-alive) to the destination CouchDB of a push replication. For *one* pull replication or _changes hop, the best case (the feed is up to date and the consumer is waiting for the next entry from the server) is the time it takes the producer (CouchDB) and the consumer (CouchDB, a client, etc.) to (de)serialize and send/receive one line of JSON text. Nothing more. This can be really fast.

Should you use CouchDB?

Let's assume this project gets interesting and you need multiple nodes like you described. You could partition your clients between CouchDB nodes using a consistent hash on a normalized name of the user to divide up the resources of a cluster. You would then filter the replications such that each Couch only receives messages intended for its connected users.

The biggest hurdle here is checkpointing. Since replication needs to know where to begin if it's restarted, you need a replication topology or strategy that is both resilient to network outages and doesn't require checking the entire chat history of everything should you need to change your replication pattern (in response to failure, scaling, reconfiguration, etc.).

If I were doing it this way I would maybe keep an "inbox" and an "outbox" database on every node. You could even name the outbox something like "ramdisk/outbox" and mount a RAM disk as "ramdisk" in the CouchDB storage directory so that "outbox.couch" gets stored in there. When your clients send messages you could store them in the outbox and trust that when they arrive at the right "inbox" on some CouchDB they will be persisted there. You could even round-robin through many outboxes, or have one per hour or so.
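As a rough illustration of the partitioning idea above, here is a minimal Python sketch of a consistent hash ring keyed on a normalized user name. The node names, the HashRing class, the normalization rule (NFKC plus case folding) and the 64 virtual points per node are all assumptions made up for the example; none of it is CouchDB API.

```python
import bisect
import hashlib
import unicodedata


def normalize_user(name):
    """Normalize a user name so equivalent spellings hash identically.

    The exact rule (NFKC, trim, casefold) is an assumption for this sketch.
    """
    return unicodedata.normalize("NFKC", name).strip().casefold()


def ring_position(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent hash ring with several points per node."""

    def __init__(self, nodes, points_per_node=64):
        # Each node is placed on the ring many times to even out the load.
        self._ring = sorted(
            (ring_position("%s#%d" % (node, i)), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, user):
        """Return the node that would hold this user's inbox."""
        pos = ring_position(normalize_user(user))
        # First ring point clockwise from the user, wrapping past the end.
        idx = bisect.bisect_right(self._positions, pos) % len(self._ring)
        return self._ring[idx][1]


# Made-up node names, purely for illustration.
nodes = ["couch-a.example.com", "couch-b.example.com", "couch-c.example.com"]
ring = HashRing(nodes)
print(ring.node_for("Johnny"))
```

The reason to prefer a ring over something like hash(user) % N is that taking a node out only reassigns the users that lived on that node, which limits how much replication state has to be rebuilt when the cluster changes.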
Small, disposable outboxes keep your storage down and open up interesting replication patterns for pushing messages through a redundantly connected graph of Couches without building up a massive database that will be hard to replicate (except the inboxes at the edges).

Using CouchDB for a chat server is an interesting idea, but I don't know of anyone using CouchDB for replication that is this 'gossipy'. I think BigCouch might do some every-node-to-every-node replication for keeping cluster information and database metadata up to date around the cluster, but that information tends to be small and changes infrequently.

However, to me this sounds like a lot of work for something that might be better solved using technologies like ZeroMQ, particularly if logging all messages is optional.

Anyway, I'm happy to talk about all of this further since I think it's kind of fascinating. I've been thinking a lot recently about how flood replication could function efficiently in a dynamic environment, but it's mostly open questions right now.

I hope that provides some direction and food for thought. Please let me know if anything didn't make sense or you have other interesting ideas or questions. I think it could be made to work, but it's not a natural fit at scale for the existing replication model at this time.

Cheers,
Randall

On Fri, Dec 17, 2010 at 13:57, Johnny Weng Luu <[email protected]> wrote:
> Hi
>
> Im designing a chat app and i thought about this design:
>
> Clients are connected to the nearest couchdb and listening for changes (chat
> texts).
> If one client posts a new message it will be inserted in that client's
> couchdb node.
> The change will be propagated to other couchdb nodes in the cluster.
> The clients connected to those couchdb nodes will get that message.
>
> But this design is heavily dependent on how fast couchdb propagates changes
> to other nodes.
> Is this a good design with couchdb or is it not intended for this design?
>
> How else could you design a chat application with couchdb?
>
> /Johnny
