On 04/19/2010 09:41 AM, Adam Kocoloski wrote:
On Apr 17, 2010, at 11:09 AM, Eric Casteleijn wrote:

On 04/16/2010 04:46 AM, wolfgang haefelinger wrote:
Thanks Robert

for your answer. However, it is not exactly what I was looking for
(due to my inappropriate problem description).

Firstly, I do want to have the document instead of the time stamp in
order to avoid that additional document fetch. That's obviously easy
to fix:

function(doc) {
  emit([doc.name, doc.timestamp], doc);
}

Don't do that. It's unnecessary, because you can call any view with
'?include_docs=true' and it will add a 'doc' member to each row, containing the
document. Worse, it's harmful, as it makes the indexes stored on disk many
times larger than they need to be. Depending on the size of your documents this
can make a huge difference. As anecdotal evidence: gwibber used to do this, and
when I changed it, the indexes stored on disk decreased some 90% in size.

If you always want the whole document, just emit null for a value and always 
call the view with include_docs.
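
A minimal sketch of that variant, reusing the key from the map function quoted
above. The emit stub and the sample document are assumptions added only so the
snippet runs outside CouchDB; in a design document you would store just the map
function itself.

```javascript
// Stand-in for CouchDB's built-in emit(), so the sketch runs on its own.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: same key as before, but null instead of the whole document.
function map(doc) {
  emit([doc.name, doc.timestamp], null);
}

// Hypothetical sample document, for illustration only.
map({ _id: "a1", name: "wolfgang", timestamp: 1271400000 });
```

Calling the resulting view with '?include_docs=true' then adds the 'doc' member
to each row at query time, so nothing but the key is stored in the index.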

If there are cases where you don't want the whole document, decide which data 
you need and only emit that.
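
Roughly like this, as a sketch only: the 'text' and 'sender' fields are
hypothetical and stand for whatever the client actually displays. The emit stub
and sample document are again assumptions so the snippet runs outside CouchDB.

```javascript
// Stand-in for CouchDB's built-in emit(), so the sketch runs on its own.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function: emit only the fields the client needs, not the whole doc.
// 'text' and 'sender' are hypothetical field names.
function map(doc) {
  emit([doc.name, doc.timestamp], { text: doc.text, sender: doc.sender });
}

// Hypothetical sample document with an extra field that is left out of the index.
map({
  _id: "a1",
  name: "wolfgang",
  timestamp: 1271400000,
  text: "hello",
  sender: "eric",
  avatar: "large inline data"
});
```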

Hi Eric, I don't think it's correct to have a blanket recommendation to always 
use include_docs=true.  For large range queries on a view the query performance 
will be much better - up to 10x better throughput on large DBs in my experience 
- if the doc is already included.  Yes, the view index will balloon in size, 
but some people may be willing to make that tradeoff.  Cheers,

Oops, thanks for catching that Adam, and my apologies, that was rather myopic. I didn't think about the other side of the tradeoff, but that makes a lot of sense.

I still wonder in that case if there is something you can do to shrink the stored views somewhat: gwibber had a number of views that emitted the whole document, but those documents (typically representing a twitter or identi.ca message) weren't very large in themselves. My database, after compaction, was somewhere between 70 and 80 MB, whereas the indexes took over a GB. Since gwibber+desktopcouch run on the desktop, where only one client typically talks to couch, I still think we made the right decision to sacrifice speed for disk space. On a server, both are important though, considering we host multiple couchdbs per user. Luckily we don't compute the views for the gwibber dbs server side, but I'm sure it's something we'll run into again elsewhere.

--
eric casteleijn
https://code.launchpad.net/~thisfred
Canonical Ltd.
