Jamie hits a lot of points. I'll cover some; mind the overlap.
Couch has a built-in indexing mechanism based on b-trees. Every view
gets its own b-tree. The larger your data set, the longer it takes to
build the b-tree.
Couch also knows when the b-tree was built in relation to data on disk
via internal sequence identifiers for each record. If your index is
older than data on disk it will update the b-tree with the new data.
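To make the sequence idea concrete, here is a rough sketch in Python. It is a toy model only, not Couch's on-disk format or API: ToyDatabase, ToyView and by_type are names made up for illustration, and a plain dict stands in for the b-tree. It just shows an index remembering the last sequence it saw and catching up on newer writes.

```python
# Toy model only: not CouchDB's real storage format or API.
class ToyDatabase:
    """Documents plus a monotonically increasing update sequence."""

    def __init__(self):
        self.update_seq = 0
        self.changes = []  # (seq, doc_id, doc) in write order

    def put(self, doc_id, doc):
        self.update_seq += 1
        self.changes.append((self.update_seq, doc_id, doc))


class ToyView:
    """An index that remembers the last sequence it has processed."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.indexed_seq = 0
        self.rows = {}  # doc_id -> list of (key, value) emitted for that doc

    def refresh(self, db):
        """Index only the writes made since the last refresh."""
        processed = 0
        for seq, doc_id, doc in db.changes:
            if seq <= self.indexed_seq:
                continue  # already reflected in the index
            self.rows[doc_id] = list(self.map_fn(doc))
            processed += 1
        self.indexed_seq = db.update_seq
        return processed

    def query(self, db):
        self.refresh(db)  # a stale index is brought up to date on read
        return sorted(kv for rows in self.rows.values() for kv in rows)


def by_type(doc):
    # Hypothetical map function: index documents by their "type" field.
    yield (doc["type"], 1)


db = ToyDatabase()
view = ToyView(by_type)
db.put("a", {"type": "user"})
db.put("b", {"type": "post"})
first = view.refresh(db)   # indexes both documents
db.put("c", {"type": "user"})
second = view.refresh(db)  # indexes only the new document
```

The second refresh only touches the one new write, which is why keeping a view warm is cheap once the initial build is done.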
Riak has no native indexing mechanism. If you want to reuse m/r
results you need to stash them somewhere.
The subscription mechanism for changes is very nice but works because
couch is not a distributed system. It is a replicated system. That
feature should be worked into all RDBMSs. Very handy.
The big win for couch is that you can arrange it in all sorts of
interesting topologies for replication. It can be used in offline
systems that need to sync when they get back online.
The couch guys are also working to scale couch down so you can use
couch on phones and other portable devices.
Like riak, couch is erlang based so you get all the erlang love. But
unlike riak, couch is only accessible over http. Riak also has a
Protocol Buffers interface and native Erlang access.
Couch uses an append-only log (a "write-only log"), like bitcask (the default backend
for riak). They both need to be compacted to reclaim space. But unlike
couch, riak can also use other backends in the same cluster which
gives you flexibility.
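A minimal sketch of why an append-only store grows and what compaction buys you. This is a toy in-memory model, not couch's or bitcask's file format; ToyAppendOnlyStore and its methods are hypothetical names.

```python
# Toy model only: an in-memory stand-in for an append-only data file.
class ToyAppendOnlyStore:
    def __init__(self):
        self.log = []    # every write ever made, oldest first
        self.index = {}  # key -> position of the latest live entry in the log

    def put(self, key, value):
        # Updates never overwrite; they append a new entry.
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def delete(self, key):
        # A delete is just another append: a tombstone entry.
        self.log.append((key, None))
        self.index.pop(key, None)

    def get(self, key):
        pos = self.index.get(key)
        return None if pos is None else self.log[pos][1]

    def compact(self):
        # Copy only the latest live entry per key into a fresh log.
        live = sorted(self.index.items(), key=lambda item: item[1])
        self.log = [(key, self.log[pos][1]) for key, pos in live]
        self.index = {key: i for i, (key, _) in enumerate(self.log)}


store = ToyAppendOnlyStore()
store.put("a", 1)
store.put("a", 2)   # old value of "a" is now dead space
store.put("b", 3)
store.delete("b")   # "b" and its tombstone are both dead space
size_before = len(store.log)
store.compact()
size_after = len(store.log)
```

Four entries go in, but only one is still live, so compaction shrinks the log to a single entry.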
As already touched on, imho the biggest difference between couch
and riak is that in order to scale couch you need to implement a
sharding layer to split your data between multiple couches (see
BigCouch, Lounge). Riak is a distributed system so all you need to do to
scale riak is add more nodes. I once tweeted something like "couch:
divide and conquer. Riak: one ring to rule them all."
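To illustrate the difference, here is a rough Python sketch comparing naive modulo sharding with a generic consistent-hash ring. It is a textbook-style toy, not riak's actual ring or claim algorithm, and ToyRing, the node names and the vnode count are all made up for the example.

```python
import bisect
import hashlib


def _hash(value):
    # Stable hash so results do not depend on Python's per-process hash seed.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ToyRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.tokens = []  # sorted list of (token, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.tokens, (_hash(f"{node}#{i}"), node))

    def owner(self, key):
        # First token at or after the key's hash, wrapping around the ring.
        idx = bisect.bisect_right(self.tokens, (_hash(key), ""))
        return self.tokens[idx % len(self.tokens)][1]


keys = [f"key{i}" for i in range(1000)]

# Ring: add a fourth node and see which keys change owner.
ring = ToyRing(["n1", "n2", "n3"])
before = {k: ring.owner(k) for k in keys}
ring.add_node("n4")
after = {k: ring.owner(k) for k in keys}
moved_ring = [k for k in keys if before[k] != after[k]]

# Naive sharding layer: shard = hash mod number of couches.
moved_mod = [k for k in keys if _hash(k) % 3 != _hash(k) % 4]

print(len(moved_ring), "keys moved on the ring")
print(len(moved_mod), "keys moved with modulo sharding")
```

On the ring, every key that moves lands on the new node and the rest stay put. With modulo sharding, going from three shards to four reshuffles keys between existing shards as well.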
Best, Alexander
@siculars on twitter
http://siculars.posterous.com
Sent from my iPhone
On Jan 28, 2011, at 8:29, Jamie Talbot <[email protected]> wrote:
Hey Joshua,
I'm relatively new to Riak, but have done quite a bit of
investigation into CouchDB, so this is as much to confirm my own
understanding as anything. With that disclaimer out of the way,
here's what I understand about the two.
Couch has excellent database consistency - killing the server
process dead won't lose you any data, and recovering after a crash
is very quick. Fault tolerance I would say is Riak's biggest
selling point, with the ability to configure how many nodes can fail
before results can no longer be returned or written. You can kind
of achieve fault tolerance with Couch by load-balancing behind a
proxy, but it's a kludge compared to the fault-tolerance that is at
the very heart of Riak.
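As a back-of-the-envelope illustration of the configurable part, here is a tiny Python sketch. It assumes a simplified strict-quorum model with N replicas, R replies needed for a read and W acknowledgements needed for a write. It ignores things like fallback replicas, so treat it as a rough mental model rather than how Riak actually behaves; the function name is hypothetical.

```python
def availability(n, r, w, failed):
    """Simplified model: can a key still be read or written when `failed`
    of its `n` replicas are unreachable?"""
    alive = n - failed
    return {"can_read": alive >= r, "can_write": alive >= w}


# Three replicas, majority reads and writes: one failure is tolerated.
one_down = availability(n=3, r=2, w=2, failed=1)
# Two failures break both reads and writes under this model.
two_down = availability(n=3, r=2, w=2, failed=2)
# Tuning for cheap reads: reads survive a failure, writes do not.
read_heavy = availability(n=3, r=1, w=3, failed=1)
```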
Both CouchDB and Riak have map/reduce functionality available
through REST, using Erlang or Javascript. With Couch, querying the
data can be problematic though, especially on large sets of data as
you have to pre-define views of how you want to extract data and
then wait for them to be built. It's certainly not true that you
can just choose any old design and then figure things out later.
Building views can take a long time - on a few hundred million rows
in my sample, it took a number of weeks to build one relatively
minor view (though hardware was quite limited). This makes RAD with
CouchDB difficult, and was a significant business risk. The upside
here is that once built, I could query 7 years of ISP data at year,
month, day, hour, minute granularity, across any cross-section of
services in a handful of milliseconds. This was incredible, and
pretty addictive - it's lightning fast, for very specific use cases.
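For anyone unfamiliar with the pattern, here is a rough Python sketch of the idea behind that kind of view: emit a compound key of service plus time components, then roll the values up at whatever key prefix you want. It is a toy that recomputes everything in memory, whereas Couch keeps the results in the view's b-tree. The field names (service, timestamp, bytes) and function names are invented for illustration and are not from my actual data.

```python
from collections import defaultdict


def map_usage(doc):
    # Hypothetical map function: key is [service, year, month, day, hour, minute].
    year, month, day, hour, minute = doc["timestamp"]
    yield ([doc["service"], year, month, day, hour, minute], doc["bytes"])


def build_view(docs, map_fn):
    # Emit all rows and keep them sorted by key, as a view index would.
    rows = [row for doc in docs for row in map_fn(doc)]
    rows.sort(key=lambda row: row[0])
    return rows


def reduce_grouped(rows, group_level):
    # Sum values per key prefix; the prefix length picks the granularity.
    totals = defaultdict(int)
    for key, value in rows:
        totals[tuple(key[:group_level])] += value
    return dict(totals)


docs = [
    {"service": "dsl", "timestamp": (2010, 1, 28, 8, 29), "bytes": 100},
    {"service": "dsl", "timestamp": (2010, 1, 28, 9, 0), "bytes": 50},
    {"service": "dsl", "timestamp": (2010, 2, 1, 0, 0), "bytes": 25},
    {"service": "mail", "timestamp": (2010, 1, 28, 8, 29), "bytes": 7},
]
rows = build_view(docs, map_usage)
per_year = reduce_grouped(rows, 2)   # service + year
per_month = reduce_grouped(rows, 3)  # service + year + month
```

The catch is the same as described above: the key layout has to be decided up front, because a different cross-section means a different view and another build.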
The space requirements for Couch are enormous though, as updates and
even deletes increase the size of the DB, until compacted. Riak too
will use additional space to store duplicate copies of data on
different nodes, to provide fault tolerance, though from my
experiments the overhead is nothing like recent versions of CouchDB
for my specific use cases. Your mileage will vary greatly, based on
your configuration of Riak and the characteristics of your Couch
views.
Riak, from what I understand, is not currently particularly
well-suited to retrieving large amounts of data sequentially by key, but
CouchDB works very quickly here, as long as you have defined a
suitable view.
Couch does bi-directional replication, though I did find that a
little flaky, sometimes dying for no reason. No data loss of
course, and it did eventually sync, but frustrating nonetheless.
This was as of the previous version. Riak does replication of data
as part of its architecture, but if you want to scale to multiple
datacentres, you need the enterprise, non-free version.
Scalability is hard with Couch, from what I can tell - certainly not
the ability just to add a new node for better performance like you
can with Riak. For me, this is a killer feature of Riak.
Couch has a nice subscription mechanism for changes to the database,
which allows you to set triggers and the like. Don't be fooled by
the talk of document versioning though - it is built in, but it
exists purely so that MVCC (replication and concurrency) can work,
and old versions of documents are specifically removed whenever the
database is compacted.
This page has a high-level comparison of a number of NoSQL options,
including Riak and CouchDB, which was generally considered to be
pretty reasonable: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Hopefully that's a reasonable representation of the two systems. I
will let more seasoned pros correct and expand on the above as
necessary!
Cheers,
Jamie.
PS: Hello, list!
On Fri, Jan 28, 2011 at 21:44, Joshua Partogi
<[email protected]> wrote:
Hi there.
Has anyone here done any comparison between Riak and CouchDB? I am
interested to see how similar and different Riak is compared to Couch.
If this can be added to the Riak wiki, I think it would be great for
all of us here.
Thanks heaps.
Kind regards,
Joshua.
--
http://twitter.com/jpartogi
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com