On Apr 19, 2010, at 10:10 AM, Eric Casteleijn wrote:
> On 04/19/2010 09:41 AM, Adam Kocoloski wrote:
>> On Apr 17, 2010, at 11:09 AM, Eric Casteleijn wrote:
>>
>>> On 04/16/2010 04:46 AM, wolfgang haefelinger wrote:
>>>> Thanks Robert
>>>>
>>>> for your answer. However, it is not exactly what I was looking for
>>>> (due to my inappropriate problem description).
>>>>
>>>> Firstly, I do want to have the document instead of the time stamp in
>>>> order to avoid that additional document fetch. That's obviously easy
>>>> to fix:
>>>>
>>>> function(doc) {
>>>>   emit([doc.name, doc.timestamp], doc);
>>>> }
>>>
>>> Don't do that. It's unnecessary, because you can always call any view with
>>> '?include_docs=true' and it will add a 'doc' member to each row containing
>>> the document. Worse, it's harmful: it makes the indexes stored on disk many
>>> times larger than they need to be. Depending on the size of your documents
>>> this can make a huge difference; anecdotally, gwibber used to do this, and
>>> when I changed it, the indexes stored on disk shrank by some 90%.
>>>
>>> If you always want the whole document, just emit null for a value and
>>> always call the view with include_docs.
>>>
>>> If there are cases where you don't want the whole document, decide which
>>> data you need and only emit that.
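A minimal sketch of that advice (the field names, view name, and the tiny emit() harness below are illustrative, not taken from gwibber or CouchDB internals):

```javascript
// Lean map function per the advice above: the index stores only keys,
// so the on-disk view stays small. (Field names are illustrative.)
var leanMap = function (doc) {
  emit([doc.name, doc.timestamp], null);
};

// Tiny emit() harness so the sketch runs outside CouchDB.
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

leanMap({ name: "wolfgang", timestamp: "2010-04-16T04:46:00Z" });

// No copy of the document is stored in the row; querying the view with
//   GET /db/_design/app/_view/by_name_ts?include_docs=true
// makes CouchDB attach a 'doc' member to each row at read time instead.
console.log(JSON.stringify(rows[0].value));
```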
>>
>> Hi Eric, I don't think it's correct to make a blanket recommendation to
>> always use include_docs=true. For large range queries on a view, query
>> performance will be much better - up to 10x higher throughput on large DBs
>> in my experience - if the doc is already included in the index. Yes, the
>> view index will balloon in size, but some people may be willing to make
>> that tradeoff. Cheers,
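That tradeoff can be sketched in a few lines (the in-memory "database" and row shapes below are illustrative, not CouchDB internals): with a null-valued index, the server does one by-id document lookup per row at query time, which is exactly the work a doc-valued index pre-pays by duplicating documents on disk.

```javascript
// Illustrative stand-in for the database's by-id document store.
var docsById = {
  a: { _id: "a", name: "n1", timestamp: 1 },
  b: { _id: "b", name: "n2", timestamp: 2 }
};

// Rows as a lean (null-valued) index would store them: id and key only.
var leanRows = [
  { id: "a", key: ["n1", 1], value: null },
  { id: "b", key: ["n2", 2], value: null }
];

// include_docs=true adds one extra by-id lookup per row...
var withDocs = leanRows.map(function (row) {
  return { id: row.id, key: row.key, value: row.value, doc: docsById[row.id] };
});

// ...which a doc-valued index avoids, at the cost of storing every
// document a second time inside the index.
console.log(withDocs[1].doc.name);
```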
>
> Oops, thanks for catching that Adam, and my apologies, that was rather
> myopic. I didn't think about the other side of the tradeoff, but that makes a
> lot of sense.
>
> I still wonder in that case whether there is something you can do to shrink
> the stored views somewhat: gwibber had a number of views that emitted the
> whole document, but those documents (typically representing a twitter or
> identi.ca message) weren't very large in themselves. My database, after
> compaction, was somewhere between 70 and 80 MB, whereas the indexes took
> over a GB. Since gwibber+desktopcouch run on the desktop, where typically
> only one client talks to couch, I still think we made the right decision to
> sacrifice speed for disk space. On a server, though, both are important,
> considering we host multiple couchdbs per user. Luckily we don't compute the
> views for the gwibber dbs server side, but I'm sure it's something we'll run
> into again elsewhere.
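A back-of-envelope check of those numbers (all figures are the ones quoted in this thread; the arithmetic is only illustrative):

```javascript
// Figures from the thread: compacted DB "between 70 and 80 MB",
// indexes "over a GB", and a "some 90%" reduction after the change.
var dbSizeMB = 75;
var indexSizeMB = 1024;
var reduction = 0.9;

// Rough index size once the views stop emitting whole documents.
var afterMB = indexSizeMB * (1 - reduction);

// Roughly 102 MB - back in the same ballpark as the database itself.
console.log(afterMB.toFixed(0));
```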
>
Were the view indices also compacted? If so, that's very surprising to me. I
should double-check our numbers, but I seem to remember the compacted view
indices for our case (which had similarly-sized documents) being comparable in
size to the DBs.
There are a few things we can do to decrease the size of uncompacted view
indices. Chief among them is to put a lower bound on the size of a view index
write, as Henrik Jensen reported last month (COUCHDB-700).

Cheers,
Adam
> --
> eric casteleijn
> https://code.launchpad.net/~thisfred
> Canonical Ltd.
>