I made an app that does exactly what you described, although it was a
"for-fun" hack just to showcase couch to my buddies (it was called
tweetmesexy, so you can imagine how much fun it actually was)... What
I ended up doing was:
1) Each new comment is a new doc that references the tweet id
2) Use view collation to get the tweet(s) and comments via a single
HTTP call (http://wiki.apache.org/couchdb/View_collation)
3) Run a script (via cron or whatever) to move the comments into the
tweet doc (and delete the comment docs) when the tweet is no longer
"hot". This is not required, but in our case it allowed us to do some
nifty analytics thanks to couch's incremental map/reduce.
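
To make steps 2 and 3 a bit more concrete, here is a rough Python sketch of the collation trick. It is only a simulation, not what tweetmesexy actually ran: real couch map functions are written in JavaScript and live in a design doc, and the field names (`type`, `tweet_id`, `created_at`, `text`) and function names are made up for illustration. The idea is that a tweet emits the key [tweet_id, 0] and each comment emits [tweet_id, 1, created_at], so the tweet sorts directly ahead of its own comments.

```python
def map_doc(doc):
    """Yield (key, doc) rows, roughly what a view's map function would emit."""
    if doc["type"] == "tweet":
        yield (doc["_id"], 0), doc
    elif doc["type"] == "comment":
        yield (doc["tweet_id"], 1, doc["created_at"]), doc


def build_view(docs):
    """Build the index; a view keeps its rows sorted by key."""
    rows = [row for doc in docs for row in map_doc(doc)]
    return sorted(rows, key=lambda row: row[0])


def tweet_with_comments(view, tweet_id):
    """Step 2: one scan returns the tweet followed by its comments."""
    return [doc for key, doc in view if key[0] == tweet_id]


def archive(docs, tweet_id):
    """Step 3: fold the comments into the tweet and drop the comment docs.

    Assumes the tweet exists, so the first row is the tweet itself.
    """
    rows = tweet_with_comments(build_view(docs), tweet_id)
    tweet, comments = rows[0], rows[1:]
    archived = {**tweet, "comments": [c["text"] for c in comments]}
    gone = {doc["_id"] for doc in rows}
    kept = [doc for doc in docs if doc["_id"] not in gone]
    return kept + [archived]
```

Against a real view, the scan in `tweet_with_comments` would be a single GET with a key range along the lines of startkey=["t1"] and endkey=["t1",{}], since objects collate after everything else.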
As for whether couch is a good fit for "update-heavy" applications, I
think an RDBMS has advantages in a true "update" scenario (like 'update
stats set counter=counter+1'). But remember, you are only using the
word "update" because couch's awesomeness allows you to even consider
storing the comments inline with the doc. Technically you can do the
same with an SQL database by using a serialized blob, and you would
have the same conflict issues (without the built-in revision love).
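
If it helps to see why inline comments conflict and one-doc-per-comment does not, here is a toy Python sketch. `TinyStore` is a hypothetical in-memory stand-in, not a couch client, and it uses integer revisions as a simplification (real couch revs are strings, and a stale write comes back as a 409 Conflict).

```python
class Conflict(Exception):
    """Raised when a write is based on a stale revision."""


class TinyStore:
    """In-memory stand-in for a document store with revision checks."""

    def __init__(self):
        self.docs = {}

    def put(self, doc_id, body, rev=None):
        """Store body only if rev matches the doc's current revision."""
        current = self.docs.get(doc_id)
        current_rev = current["_rev"] if current else None
        if rev != current_rev:
            raise Conflict(doc_id)
        new_rev = (current_rev or 0) + 1
        self.docs[doc_id] = {**body, "_rev": new_rev}
        return new_rev


def demo():
    store = TinyStore()
    rev = store.put("t1", {"text": "hello", "comments": []})
    # Two users read the same revision, then both try to append inline.
    store.put("t1", {"text": "hello", "comments": ["first"]}, rev=rev)
    try:
        store.put("t1", {"text": "hello", "comments": ["second"]}, rev=rev)
        inline_conflict = False
    except Conflict:
        inline_conflict = True
    # One doc per comment: every write creates a new id, so nothing collides.
    store.put("c1", {"tweet_id": "t1", "text": "first"})
    store.put("c2", {"tweet_id": "t1", "text": "second"})
    return inline_conflict, sorted(store.docs)
```

The second inline writer has to re-read and retry, which is exactly what hurts with hundreds of simultaneous commenters; the separate comment docs never touch each other.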
So assuming I'm correct that the structure of your data will be
similar whether you use an SQL database or couch, you would be well
served by couch:
1) You can archive the comments inline, as I mentioned above, and run
cool map/reduce on the tweet and comments together
2) Simple master-master replication, allowing you to scale writes to
your heart's content
3) With SQL you'll need multiple queries (or go the ugly join route)
to get the comments and the tweet, vs a single HTTP call
Bottom line: structuring your data the way you would in an SQL
database does not negate the other advantages of couch.
Troy
On Dec 28, 2009, at 10:09 PM, Sean Clark Hess wrote:
Our system will have comments related to live data - imagine people
commenting on tweets right after they are written.
I'm having trouble deciding how to model it. It makes a lot of sense
to make one document containing all the comments for each data
segment, but we could theoretically have hundreds of users commenting
on the same segment at once.
Would data consistency become a nightmare? With an RDBMS you would
have a comments table, and insert a new row for each comment -
preventing conflicts. I could do the same thing with couch, by adding
a separate document for each comment, but it seems to violate a
fundamental principle of couch.
Is Couch DB a bad fit for an update-heavy system? Updates will only be
heavy within the first minute or so after the data is released, then
it will switch to a very read-heavy system.
Thanks for your help