Hi,

Just an opinion, which may or may not work for you. I am not a developer, so if anyone knows that I am wrong, please, by all means, correct me.
CouchDB's core is implemented in Erlang, with JavaScript on the user-facing side (for example, for writing views). To take advantage of Erlang's strengths, and not only those, parallel designs may be more appropriate than serial ones. Erlang is not as fast as C/C++, so serial queries are slower than in some other databases. There are other advantages, but I am not going to go into details here.

To make a long story short, I think having several smaller databases is a better design than one huge database, for two reasons: the work can be done in parallel, and each search covers less data. So if you plan to have a large database, BigCouch may be a more suitable solution than plain CouchDB. It may not be the fastest solution on the market, but it is one of the safest for your data (my opinion here, not a proven fact).

About the logs being too noisy: that is a matter of taste, I think. From my experience with Erlang, logging does not slow the application down much if it is structured properly (and I doubt the CouchDB devs failed to take that into consideration). To give a simple example, Erlang has two advantages here: the "let it crash" philosophy and message passing between processes. If you build your logger as an independent gen_server (which I think is the case in CouchDB) that listens for messages from the other gen_servers and writes them to disk, the only real drawback is that if its message queue grows beyond a predefined limit, the log server will simply crash (and be restarted by its supervisor according to a predefined restart strategy). You lose some log events, but the rest of the application is not affected. Another drawback of this architecture is that log events are not strictly ordered: different processes compute at different speeds, so their messages reach the log server sooner or later and get written in a more or less chaotic order.
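As a rough illustration of the logging arrangement described above, here is a minimal sketch in Python (not Erlang, and not CouchDB's actual logger). The names `LogServer`, `log`, `capacity` and `sink` are hypothetical. It also simplifies the failure mode: when the bounded mailbox is full it drops the event and counts it, instead of crashing and being restarted. The point it tries to show is the same, though: the workers never wait on the logger, so a slow or overloaded logger costs you some log events and nothing else.

```python
import queue
import threading


class LogServer:
    """Toy stand-in for an independent logging process with a bounded mailbox."""

    def __init__(self, capacity, sink):
        self.mailbox = queue.Queue(maxsize=capacity)  # bounded message queue
        self.sink = sink            # a list here; a log file in real life
        self.dropped = 0            # events lost because the mailbox was full
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._worker.start()

    def log(self, event):
        """Called by worker threads. Never blocks; returns False if the event was lost."""
        try:
            self.mailbox.put_nowait(event)
            return True
        except queue.Full:
            with self._lock:
                self.dropped += 1
            return False

    def _run(self):
        # Events are written in arrival order, which is not necessarily
        # the order in which the workers produced them.
        while True:
            event = self.mailbox.get()
            if event is None:       # sentinel: shut down
                break
            self.sink.append(event)

    def stop(self):
        self.mailbox.put(None)      # blocks until there is room for the sentinel
        self._worker.join()


if __name__ == "__main__":
    written = []
    server = LogServer(capacity=100, sink=written)
    server.start()

    def worker(name):
        for i in range(5):
            server.log(f"{name}: event {i}")

    threads = [threading.Thread(target=worker, args=(f"w{n}",)) for n in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    server.stop()
    print(len(written), "events written,", server.dropped, "dropped")
```

Running the demo several times should show the interleaving of the workers' events changing from run to run, which is the "chaotic order" effect mentioned above.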
To conclude this part: the noise in the CouchDB log is not so relevant for the overall speed of the application (please note that I wrote "not so relevant", which is different from "not relevant").

About revision management: it depends on what gets in your way, the size of the database or the updating of your documents. If it is the size of the database, there are a few ways around that (some solutions are already on the market). One way may be a round-robin scheme over a set of databases, which gives each database time for compaction. Of course, this is just a rudimentary example, and it has a lot of drawbacks if you apply it exactly as stated. If it is the time needed to fetch the revision number before updating a document, there are ways to speed that up, but they depend on your hardware. I remember one dirty trick I used to avoid revision management: sending a delete command before sending the new version of the document. Of course, that only works if you do not need any of the key-value pairs stored in the previous version. Another solution may be to keep track of the revisions of the most important documents in RAM, but that requires some RAM, as you can imagine. So, yep, there are solutions if you really like CouchDB, even if they require a bit of extra work. :)

A sharded database with replication seems a very nice combination. So I think it comes down to the project design here.

I hope this message helps you in designing your project. I do not know your project, so if my suggestions are unsuitable, please forgive me. Also, if someone knows that my suggestions may not work or that I gave wrong info, please correct me. In the end, I am a simple user with limited knowledge.

Good luck!
CGS

On Thu, May 10, 2012 at 9:45 AM, bryan rasmussen <[email protected]> wrote:
> Hi,
>
> I really like working with couchdb, one of the benefits it gives at
> the beginning of a project is the ability to play with data, to
> determine the right data structure that one actually needs (since I'm
> an XML guy this is pretty important to me[I also think couchdb does
> this much better than XQuery based DBs - too strongly typed])
>
> So anyway, because I like couchdb I have embarked on a apache/solr
> logs analysis project for which couchdb does not seem to be
> well-suited (which I knew beforehand but was using couchdb as quick
> proof of concept for some of the things I wanted to do.)
>
> the drawbacks are:
>
> logs pile up quickly, so the project is write intensive. Since the
> data is being used internally for reports it is not likely to be read
> intensive.
> Should not need any revision management.
> A lot of the benefits of db replication will not be useful.
> Lots of views to data need to be provided.
>
> So has anyone ever had a similar situation, and what did you move to
> as your DB. Or how did you structure you couchdb solution to make it
> more suitable?
>
> Thanks,
> Bryan Rasmussen
>
