On Mar 2, 2011, at 9:30 AM, Rob Pettefar wrote:

> On 02/03/2011 13:05, Bruno Rohée wrote:
>> On Wed, Mar 2, 2011 at 12:33 PM, Rob Pettefar
>> <[email protected]>  wrote:
>>>  Hi guys
>>> I've got a question about improving the speed at which views are updated in
>>> our system:
>>> 
>>> Currently we use a set of database documents to make up whole files after
>>> they have been requested out of the system. When submitted back into the
>>> database the old docs that held data are deleted and new docs are created in
>>> their place. This was done for simplicity of design. However, when a large
>>> file is submitted into the system, this involves deleting and creating a
>>> large number of docs (we are looking at around 4,000 deletes and 4,000 new
>>> docs).
>>> The views then take some time to update after this has happened.
>>> 
>>> If we were instead to modify the contents of the 4,000 documents (perhaps
>>> with some deletions and creations), would this reduce the number of updates
>>> the system has to put through the views, and thus reduce the time
>>> needed to update the views?
>> I think it's pretty dependent on your data, whether your new documents
>> are mostly identical or mostly different from the old ones. If it's
>> the former the process can be sped up quite a bit as the map function
>> will only be called on the changed documents; if it's the latter, not
>> much speed gain is to be expected IMHO.
> This would probably involve writing over the content of the document, even
> with the same data as before, incurring a new revision number. I guess that
> this would cause the map functions to be run over it again.
> However I think the key thing here is a question of how mass deletions are 
> treated by the view updater.

Hi Rob, the view updater walks the database update feed and splits the entries 
into normal documents and deleted ones.  The deleted documents are not sent to 
the view server OS process, but otherwise they traverse a pretty similar path 
through the code.  In the end the updater does batch modifications of the view 
indexes, removing the KVs corresponding to old versions of documents and 
inserting the KVs from the map phase of the MR job.
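To make that flow concrete, here is a toy Python model of it. This is only a sketch under stated assumptions, not the actual CouchDB updater: the class `ToyViewIndex`, the helper functions, and the counters are all hypothetical names invented for illustration, and the "index" is a plain dict rather than a B-tree. The only thing borrowed from CouchDB is the `_deleted` flag on a document.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]
KV = Tuple[Any, Any]


class ToyViewIndex:
    """Hypothetical stand-in for a view index; counts the work each pass does."""

    def __init__(self, map_fn: Callable[[Doc], Iterable[KV]]) -> None:
        self.map_fn = map_fn
        self.rows_by_doc: Dict[str, List[KV]] = {}
        self.map_calls = 0
        self.removed_kvs = 0
        self.inserted_kvs = 0

    def update(self, changes: List[Doc]) -> None:
        # Split the update feed into deleted and normal documents.
        deleted = [d for d in changes if d.get("_deleted")]
        live = [d for d in changes if not d.get("_deleted")]

        # Deleted docs never reach the map function, but their old KVs go.
        for doc in deleted:
            self.removed_kvs += len(self.rows_by_doc.pop(doc["_id"], []))

        # Normal docs: drop KVs from the old version, then insert fresh ones.
        for doc in live:
            self.removed_kvs += len(self.rows_by_doc.pop(doc["_id"], []))
            self.map_calls += 1
            rows = list(self.map_fn(doc))
            self.rows_by_doc[doc["_id"]] = rows
            self.inserted_kvs += len(rows)


def by_file(doc: Doc) -> Iterable[KV]:
    # Assumed map function: one row per document, keyed on a "file" field.
    yield doc["file"], doc["_id"]


def in_place(n: int) -> List[Doc]:
    # Strategy A: rewrite the same n documents.
    return [{"_id": f"old-{i}", "file": "a", "rev": 2} for i in range(n)]


def delete_recreate(n: int) -> List[Doc]:
    # Strategy B: delete the n old documents and create n new ones.
    gone = [{"_id": f"old-{i}", "_deleted": True} for i in range(n)]
    new = [{"_id": f"new-{i}", "file": "a"} for i in range(n)]
    return gone + new


def second_pass_cost(changes: List[Doc], n: int) -> Tuple[int, int, int]:
    # Build the index over n docs, then measure only the second update pass.
    index = ToyViewIndex(by_file)
    index.update([{"_id": f"old-{i}", "file": "a"} for i in range(n)])
    before = (index.map_calls, index.removed_kvs, index.inserted_kvs)
    index.update(changes)
    after = (index.map_calls, index.removed_kvs, index.inserted_kvs)
    return (after[0] - before[0], after[1] - before[1], after[2] - before[2])
```

Calling `second_pass_cost(in_place(n), n)` and `second_pass_cost(delete_recreate(n), n)` returns the same (map calls, KVs removed, KVs inserted) triple in this model. The delete-and-recreate feed is twice as long, so there are more entries to walk and look up, but the map and index-modification work is identical.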

The key point is that even when you modify documents the view updater still 
needs to delete all the KVs associated with the old version of the document.  
Deleting and then re-creating documents might introduce a few extra lookups, 
but in my opinion you aren't likely to see any major indexing speedup if you 
re-architect to do updates instead.  Happy to be proven wrong though.  Best,

Adam
