Ladislav,

For someone who doesn't really understand CouchDB, your points make a lot of sense to me :-)
As you say, unless we write our MR in Erlang it's probably not a great fit for our document size. But provided we don't have to rebuild the views too often, it does make for a good document store and ETL mechanism for feeding an OLAP engine.

Thanks, this and Matthieu's responses help greatly.

Mike

-----Original Message-----
From: Ladislav Thon [mailto:[email protected]]
Sent: 25 May 2012 16:17
To: [email protected]
Subject: Re: Am I doing something fundamentally wrong?

> Clearly I seem to be a bit of a lone voice on this, as everyone skirts
> around the why do views take so long to build, why do they only run on one
> CPU and why do they take up so much space

I don't really understand CouchDB, so I'm not afraid to (try to) answer this and be (quite possibly) wrong :-)

1. "why do views take so long to build" -- because a JavaScript function has to be executed for every document. This function is executed by a separate process (the view server), in this case the "couchjs" process. As your documents are fairly large, this might incur significant serialization/deserialization overhead. Views can also be written in Erlang, AFAIK, and I bet that would be a hell of a lot faster.

2. "why do they only run on one CPU" -- because the order of processing still matters here (to make incremental view updates possible), even if it is called "map reduce". It would surely be possible to write a parallel implementation, but it's trickier than it looks at first sight. No one has just done it yet.

3. "why do they take up so much space" -- because of CouchDB's append-only B-tree nature. Your views seem to have pretty random keys (as the keys are the IDs of your documents), which means that a lot of inner nodes have to be created only to be discarded a few moments later. Note that "discarded" here means that no pointer points at them any more, but they are still lying on the disk.

All in all, I'd say that CouchDB isn't exactly a great fit for an OLAP-style workload.

LT
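
To make point 3 a bit more concrete, here is a toy sketch in Python of an append-only, copy-on-write tree. It is not CouchDB's actual B-tree or file format: it assumes a plain binary search tree, and the names `Node`, `insert`, `live_count` and `simulate` are made up for illustration. It only shows the general effect: every insert rewrites the whole path from the root, so the "file" keeps growing with nodes that nothing points at any more.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """An immutable tree node; once 'written' it is never changed."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int, log: List[Node]) -> Node:
    """Return a new root containing key.

    Every node on the path from the root to the new leaf is copied and
    appended to log, which stands in for the append-only file.
    """
    if root is None:
        node = Node(key)
    elif key < root.key:
        node = Node(root.key, insert(root.left, key, log), root.right)
    elif key > root.key:
        node = Node(root.key, root.left, insert(root.right, key, log))
    else:
        return root  # key already present, nothing new to write here
    log.append(node)
    return node


def live_count(root: Optional[Node]) -> int:
    """Count the nodes still reachable from the current root."""
    if root is None:
        return 0
    return 1 + live_count(root.left) + live_count(root.right)


def simulate(keys: List[int]) -> Tuple[int, int]:
    """Insert keys one by one; return (nodes written, nodes still live)."""
    log: List[Node] = []
    root: Optional[Node] = None
    for key in keys:
        root = insert(root, key, log)
    return len(log), live_count(root)


if __name__ == "__main__":
    # Random, unique keys, loosely standing in for document IDs used as view keys.
    keys = random.Random(0).sample(range(10_000), 500)
    written, live = simulate(keys)
    print(f"nodes written: {written}, still live: {live}, dead: {written - live}")
```

In this model the dead nodes are the gap between what was written and what is still reachable from the latest root; in an append-only file that space only comes back when the file is rewritten.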
