Re: when couchdb is not right for my use case

CGS Thu, 10 May 2012 08:28:33 -0700

Hi,

Just an opinion, which may or may not work for you. I am not a
developer, so if anyone knows that I am wrong, please, by all means,
correct me.

CouchDB's core is implemented in Erlang, with JavaScript as the default
language for views. To take advantage of Erlang's strengths, parallel
view builds may be more appropriate than serial ones. Erlang is not as
fast as C/C++ for raw computation, so serial queries tend to be slower
than in some other databases. There are other advantages, but I am not
going to go into details. To make a long story short, I think having
several smaller databases is a better design than using one huge
database, for two reasons: the work can run in parallel, and each
database takes less time to search. So, if you plan to have a large
database, BigCouch may be a more suitable solution than plain CouchDB.
It may not be the fastest solution on the market, but it is one of the
safest for your data (my opinion here, not a proven fact).
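
To illustrate the "several smaller databases" idea, here is a minimal
sketch. It makes no CouchDB calls: the in-memory dicts are hypothetical
stand-ins for separate databases, and the names (NUM_SHARDS, shard_for,
put, search_all) are made up for this example. It only shows the
routing-by-hash and fan-out structure, not a real deployment.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: each dict plays the role of one small database.
NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(doc_id):
    """Map a document id to a shard index using a stable hash."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def put(doc_id, doc):
    """Store the document in the shard chosen by its id."""
    shards[shard_for(doc_id)][doc_id] = doc


def search_shard(shard, predicate):
    """Scan a single shard; each scan covers only a fraction of the data."""
    return [doc for doc in shard.values() if predicate(doc)]


def search_all(predicate):
    """Fan the same search out to every shard and merge the results."""
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        parts = list(pool.map(lambda s: search_shard(s, predicate), shards))
    return [doc for part in parts for doc in part]


# Usage: 100 fake log entries, every tenth one an HTTP 500.
for i in range(100):
    put("log-%d" % i, {"n": i, "status": 500 if i % 10 == 0 else 200})

errors = search_all(lambda doc: doc["status"] == 500)
```

With real databases each search would be a separate request handled by
the server, so the fan-out is where the parallelism would come from;
Python threads here only mimic that shape.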

About the logs being too noisy: that's a matter of taste, I think. From
my experience with Erlang, logging does not slow the application down
much if it is structured properly (I doubt the CouchDB devs failed to
take that into consideration). To give a simple example, Erlang has two
advantages here: the "let it crash" philosophy and message passing for
communication between processes. Suppose you build your logger as an
independent gen_server (which I think is the case in CouchDB) that
receives messages from the other processes and writes them to disk. The
main drawback is that if its message queue grows beyond some limit, the
log server may crash (and be restarted by its supervisor according to a
predefined restart strategy), and you lose some events from the log, but
that doesn't affect the rest of the application. Another drawback of
this architecture is that log events are not strictly ordered: different
processes compute at different speeds, so their messages reach the log
server in a more or less arbitrary interleaving. To conclude this part,
the noise in the CouchDB log is not so relevant for the overall speed of
the application (please note that I wrote "not so relevant", which is
different from "not relevant").
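
The logger design described above can be approximated outside Erlang. A
minimal sketch in Python, under stated assumptions: a thread and a
bounded queue stand in for the gen_server and its mailbox, and instead of
crashing and being restarted, this version simply drops events when the
queue is full. The class name LogServer and its methods are hypothetical;
this is not how CouchDB implements logging.

```python
import queue
import threading


class LogServer:
    """Independent log writer fed through a bounded queue."""

    def __init__(self, sink, max_pending=1000):
        self._queue = queue.Queue(maxsize=max_pending)
        self._sink = sink  # any object with append(), e.g. a list
        self._lock = threading.Lock()
        self.dropped = 0
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def log(self, message):
        """Enqueue without blocking the caller; drop the event if full."""
        try:
            self._queue.put_nowait(message)
            return True
        except queue.Full:
            with self._lock:
                self.dropped += 1
            return False

    def stop(self):
        """Send a sentinel and wait for the worker to drain the queue."""
        self._queue.put(None)
        self._thread.join()

    def _run(self):
        while True:
            message = self._queue.get()
            if message is None:
                break
            self._sink.append(message)


# Usage: the callers never wait on the log; overflow only loses events.
sink = []
server = LogServer(sink, max_pending=2)
accepted = [server.log(m) for m in ("a", "b", "c")]  # third one is dropped
server.start()
server.stop()
```

The point of the design is the same as in the Erlang case: the rest of
the application pays only the cost of handing a message over, and a
flooded logger loses log lines rather than slowing everything else down.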

About revision management, well, it depends on what gets in your way:
the size of the database or the updating of your documents. If it is the
size of the database, there are a few ways to deal with that (some
solutions already exist). One way may be a round-robin scheme over a set
of databases, which gives each database time for compaction. Of course,
this is just a rudimentary example, and it has a lot of drawbacks if you
apply it as is. If it is the time needed to fetch the revision number
before updating a document, there are ways to speed up the process, but
those depend on your hardware. I remember one dirty trick I used to
avoid revision management: sending a delete command before sending the
new version of the document. But, of course, that only works if you
don't need any of the key-value pairs stored in the previous version.
Another solution may be to keep track of the revisions of the most
important documents in RAM, but that requires some RAM, as you can
imagine. So, yep, solutions exist if you really like CouchDB, even if
they require a bit of extra work. :)
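
The "keep the revisions in RAM" idea could look roughly like this. A
minimal sketch, assuming a bounded least-recently-used cache so the
memory cost stays predictable; the class RevCache and its method names
are hypothetical, and nothing here talks to CouchDB.

```python
from collections import OrderedDict


class RevCache:
    """Bounded in-memory map from document id to its last known revision."""

    def __init__(self, capacity=10000):
        self._revs = OrderedDict()
        self._capacity = capacity

    def remember(self, doc_id, rev):
        """Record the revision returned by the last successful write."""
        self._revs[doc_id] = rev
        self._revs.move_to_end(doc_id)
        if len(self._revs) > self._capacity:
            self._revs.popitem(last=False)  # evict least recently used

    def lookup(self, doc_id):
        """Return the cached revision, or None if it must be fetched."""
        rev = self._revs.get(doc_id)
        if rev is not None:
            self._revs.move_to_end(doc_id)
        return rev

    def forget(self, doc_id):
        """Drop a stale entry, e.g. after an update conflict."""
        self._revs.pop(doc_id, None)


# Usage with made-up ids and revisions.
cache = RevCache(capacity=2)
cache.remember("doc-a", "1-aaa")
cache.remember("doc-b", "1-bbb")
cache.lookup("doc-a")             # marks doc-a as recently used
cache.remember("doc-c", "1-ccc")  # evicts doc-b
```

CouchDB returns the new revision in the response to a successful write,
so the cache can be refreshed without an extra read. If an update is
rejected with a 409 conflict, the cached value is stale: forget it and
fetch the current revision before retrying.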

A sharded database with replication seems a very nice combination. So, I
think it comes down to the project design here.

I hope this message helps you in designing your project. I do not know
the details of your project, so if any of my suggestions are unsuitable,
please forgive me. Also, if someone knows that my suggestions may not
work or that I gave wrong information, please correct me. In the end, I
am a simple user with limited knowledge.

Good luck!

CGS

On Thu, May 10, 2012 at 9:45 AM, bryan rasmussen
<[email protected]> wrote:

> Hi,
>
> I really like working with couchdb, one of the benefits it gives at
> the beginning of a project is the ability to play with data, to
> determine the right data structure that one actually needs (since I'm
> an XML guy this is pretty important to me[I also think couchdb does
> this much better than XQuery based DBs - too strongly typed])
>
> So anyway, because I like couchdb I have embarked on an Apache/Solr
> logs analysis project for which couchdb does not seem to be
> well-suited (which I knew beforehand, but I was using couchdb as a quick
> proof of concept for some of the things I wanted to do.)
>
> the drawbacks are:
>
> logs pile up quickly, so the project is write intensive. Since the
> data is being used internally for reports it is not likely to be read
> intensive.
> Should not need any revision management.
> A lot of the benefits of db replication will not be useful.
> Lots of views to data need to be provided.
>
> So has anyone ever had a similar situation, and what did you move to
> as your DB? Or how did you structure your couchdb solution to make it
> more suitable?
>
> Thanks,
> Bryan Rasmussen
>
