On 04/19/2010 09:41 AM, Adam Kocoloski wrote:
On Apr 17, 2010, at 11:09 AM, Eric Casteleijn wrote:
On 04/16/2010 04:46 AM, wolfgang haefelinger wrote:
Thanks, Robert, for your answer. However, it is not exactly what I was
looking for (due to my inappropriate problem description).
Firstly, I do want to have the document instead of the time stamp in
order to avoid that additional document fetch. That's obviously easy
to fix:
function(doc) {
  emit([doc.name, doc.timestamp], doc);
}
Don't do that. It's unnecessary, because you can always call any view with
'?include_docs=true', which adds a 'doc' member containing the document to
each row. Worse, it's harmful, as it makes the indexes stored on disk many
times larger than they need to be. Depending on the size of your documents
this can make a huge difference. Anecdotally, gwibber used to do this, and
when I changed it, the indexes stored on disk shrank by some 90%.
If you always want the whole document, just emit null for a value and always
call the view with include_docs.
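
As a minimal sketch of that suggestion, the map function below emits the same key as the earlier one but null as the value. The `emit` stub and the `emittedRows` array are stand-ins so the snippet runs outside CouchDB; inside a design document only the anonymous function body would be stored. The sample document is made up for illustration.

```javascript
// Stand-in for CouchDB's built-in emit(); only needed to run this outside CouchDB.
const emittedRows = [];
function emit(key, value) {
  emittedRows.push({ key: key, value: value });
}

// Map function: same key as before, but no document copy in the index.
const mapNullValue = function (doc) {
  emit([doc.name, doc.timestamp], null);
};

// Hypothetical sample document, for illustration only.
mapNullValue({ _id: "a", name: "wolfgang", timestamp: 1271400000 });
console.log(JSON.stringify(emittedRows));
```

The view would then be queried with '?include_docs=true' appended, so that each returned row carries the full document in its 'doc' member.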
If there are cases where you don't want the whole document, decide which data
you need and only emit that.
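
A sketch of the second option, emitting only the fields a client needs. The field names `text` and `sender` are assumptions chosen for illustration and do not come from the thread; the `emit` stub and `emittedSummaries` array again exist only so the snippet runs outside CouchDB.

```javascript
// Stand-in for CouchDB's built-in emit(); only needed to run this outside CouchDB.
const emittedSummaries = [];
function emit(key, value) {
  emittedSummaries.push({ key: key, value: value });
}

// Map function: emit a small summary object instead of the whole document.
// "text" and "sender" are hypothetical field names.
const mapSummary = function (doc) {
  emit([doc.name, doc.timestamp], { text: doc.text, sender: doc.sender });
};

// Hypothetical sample document with an extra field that stays out of the index.
mapSummary({
  _id: "msg-1",
  name: "wolfgang",
  timestamp: 1271400000,
  text: "hello",
  sender: "eric",
  raw: "large payload that is not needed in the view"
});
console.log(JSON.stringify(emittedSummaries));
```

Only the emitted summary is stored in the view index, so the index grows with the size of the selected fields rather than with the size of the whole document.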
Hi Eric, I don't think it's correct to have a blanket recommendation to always
use include_docs=true. For large range queries on a view the query performance
will be much better - up to 10x better throughput on large DBs in my experience
- if the doc is already included. Yes, the view index will balloon in size,
but some people may be willing to make that tradeoff. Cheers,
Oops, thanks for catching that Adam, and my apologies, that was rather
myopic. I didn't think about the other side of the tradeoff, but that
makes a lot of sense.
I still wonder in that case if there is something you can do to shrink
the stored views somewhat: gwibber had a number of views that emitted
the whole document, but those documents (typically representing a
twitter or identi.ca message) weren't very large in themselves. My
database, after compaction, was somewhere between 70 and 80 MB, whereas
the indexes took over a GB. Since gwibber+desktopcouch run on the
desktop, where only one client typically talks to couch, I still think
we made the right decision to sacrifice speed for disk space. On a
server, both are important though, considering we host multiple couchdbs
per user. Luckily we don't compute the views for the gwibber dbs server
side, but I'm sure it's something we'll run into again elsewhere.
--
eric casteleijn
https://code.launchpad.net/~thisfred
Canonical Ltd.