On May 27, 2010, at 3:16 AM, Robert Buck wrote:

> On Wed, May 26, 2010 at 9:01 PM, Randall Leeds <[email protected]> 
> wrote:
>> On Wed, May 26, 2010 at 15:39, J Chris Anderson <[email protected]> wrote:
>>> 
>>> On May 26, 2010, at 1:53 PM, Robert Buck wrote:
>>> 
>>>> Hi Folks,
>>>> 
>>>> Thank you for kindly answering my last round of questions. Here is
>>>> another question related to Couch:
>>>> 
>>>> What sort of locality of reference exists in Couch with respect to
>>>> retrieval of state ? Is locality of reference solely at the document
>>>> level, or is locality of reference also exhibited elsewhere that
>>>> developers can take advantage of ?
>>>> 
>>> 
>>> Hmm, I'm not sure what you mean by locality of reference (I know it in the 
>>> context of performance optimizations)
>> 
>> Same.
>> 
>> If you're using it this way, I guess the best answer is that documents
>> might be very sparsely spread out across the disk in a large database
>> that has not been compacted for a long time. Documents updated around
>> the same time might be closer to the tail of the file. We really rely
>> on the filesystem cache to make this something we can forget about.
>> 
>> Does that answer anything at all?
> 
> That's good, that confirms what I have read, just wanted to verify.
> Some database technologies allow you to "group" data more closely
> together, some call this a container, others call it a segment. From
> what I read Couch apparently has no such feature.
> 
> Thanks so much for your help.

Another thing you can do for performance is to use document ids that are near
each other in the keyspace. This gives better locality in the filesystem
cache. Luckily, CouchDB does a good job of this by default if you use the
built-in uuid generator.

There was some analysis and benchmarking on this list a few months ago about
the difference between sorted and random ids; sorry, I don't have the links
handy. The upshot: small, sequential-ish ids are the fastest.
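
To make "sequential-ish" concrete, here is a minimal Python sketch of a
client-side id generator. It is only an illustration, not CouchDB's own code:
the class name, the 26/6 hex split, and the step size are all assumptions I
picked for the example. The idea is that ids share a random prefix and end in
an increasing counter, so documents created together sort next to each other.

```python
import random


class SequentialIdGenerator:
    """Hypothetical helper: ids share a random hex prefix and end in an
    increasing hex counter, so consecutive ids are neighbours in the keyspace."""

    def __init__(self, prefix_len=26, suffix_len=6, max_step=16, rng=None):
        # All sizes here are illustrative assumptions, not CouchDB settings.
        self._rng = rng or random.Random()
        self._prefix_len = prefix_len
        self._suffix_len = suffix_len
        self._max_step = max_step
        self._limit = 16 ** suffix_len  # counter must fit in the suffix
        self._start_new_prefix()

    def _start_new_prefix(self):
        # Pick a fresh random prefix and reset the counter.
        bits = self._rng.getrandbits(4 * self._prefix_len)
        self._prefix = f"{bits:0{self._prefix_len}x}"
        self._counter = 0

    def next_id(self):
        # Advance by a small random step so ids increase but are not guessable
        # one by one.
        self._counter += self._rng.randint(1, self._max_step)
        if self._counter >= self._limit:
            # Suffix overflowed: move to a new prefix and start counting again.
            self._start_new_prefix()
            self._counter = self._rng.randint(1, self._max_step)
        return f"{self._prefix}{self._counter:0{self._suffix_len}x}"


if __name__ == "__main__":
    gen = SequentialIdGenerator()
    for _ in range(5):
        print(gen.next_id())
```

Ids from one generator sort in creation order until the suffix overflows, at
which point a new prefix starts a new run. If you let the server assign ids
instead, you don't need anything like this on the client.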

Chris

