On Dec 9, 2010, at 10:49 AM, Paul Davis wrote:

> On Thu, Dec 9, 2010 at 10:47 AM, Jan Lehnardt <[email protected]> wrote:
>>
>> On 9 Dec 2010, at 15:37, Paul Davis wrote:
>>
>>> On Thu, Dec 9, 2010 at 7:51 AM, Jan Lehnardt <[email protected]> wrote:
>>>> Hi Huw,
>>>>
>>>> On 9 Dec 2010, at 13:32, Huw Selley wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I read on http://guide.couchdb.org/draft/performance.html that
>>>>>
>>>>> "Views load a batch of updates from disk, pass them through the view
>>>>> engine, and then write the view rows out. Each batch is a few hundred
>>>>> documents, so the writer can take advantage of the bulk efficiencies we
>>>>> see in the next section."
>>>>>
>>>>> Is there a method to change the batch size? I would like to try to measure
>>>>> the impact of using smaller and larger batches.
>>>>
>>>> Thanks for helping to profile things. You may want to take this to
>>>> [email protected] as it is the development-related mailing list.
>>>>
>>>> For tuning these values, see src/couchdb/couch_view_updater.erl
>>>>
>>>> The `update()` function has these lines:
>>>>
>>>>     {ok, MapQueue} = couch_work_queue:new(100000, 500),
>>>>     {ok, WriteQueue} = couch_work_queue:new(100000, 500),
>>>>
>>>> They set up a queue for mapping and writing each. The parameters are
>>>>
>>>>     couch_work_queue:new(MaxSize, MaxItems)
>>>>
>>>> If either maximum is hit, the queue is deemed full.
>>>>
>>>> Note: This is from about 30 seconds of looking at the source, so I
>>>> might miss a subtlety or three.
>>>>
>>>> Cheers
>>>> Jan
>>>> --
>>>>
>>>
>>> The only real subtlety is that we don't wait for a minimum amount to
>>> be inserted into the queue. Playing with larger or smaller queues on
>>> either side might be an interesting bit. Also, for testing it might
>>> not be a bad idea to add config values for these values.
>>
>> Good thinking, I made a patch:
>>
>> https://github.com/janl/couchdb/commit/547691a9f4b9895086f2763af84e1cc459e4d72c
>>
>> Branch:
>>
>> https://github.com/janl/couchdb/tree/config-view-batches
>>
>> "Compiles for me".
>>
>> To make this proper, we probably want to move the lookups into
>> couch_view_group:init/3 and pass the values down, but it should
>> be ok as is.
>>
>
> Probably not worth it. ets lookups like that are quick, and as a
> percentage of a view build are going to be fairly inconsequential.
> Though, you could make a case about coupling.
This issue was initially raised in
https://issues.apache.org/jira/browse/COUCHDB-700. I'm not opposed to
making the queue sizes configurable, but I think the more important fix
by far is to be able to configure a minimum number of items in the work
unit sent to the reducer.

Cheers, Adam
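
For anyone following along, here is a rough Python sketch of the queue behaviour discussed in this thread. It is a toy model only, not the Erlang code in couch_work_queue: the class name `WorkQueue`, the non-blocking `put`, the `>=` comparison, and especially the `min_items` parameter are assumptions made for illustration. It shows a queue that counts as full when either the size cap or the item cap is reached, plus a hypothetical minimum batch size of the kind suggested above.

```python
from collections import deque


class WorkQueue:
    """Toy model of a bounded work queue; not CouchDB's couch_work_queue."""

    def __init__(self, max_size, max_items, min_items=1):
        self.max_size = max_size    # cap on the summed size of queued items
        self.max_items = max_items  # cap on the number of queued items
        self.min_items = min_items  # hypothetical: smallest batch worth handing out
        self.items = deque()
        self.size = 0
        self.closed = False

    def is_full(self):
        # The queue is deemed full as soon as either maximum is hit.
        return self.size >= self.max_size or len(self.items) >= self.max_items

    def put(self, item, item_size):
        # Returns False when full; a real producer would block instead.
        if self.is_full():
            return False
        self.items.append((item, item_size))
        self.size += item_size
        return True

    def close(self):
        # Signals that the producer has no more items to add.
        self.closed = True

    def take_batch(self):
        # Hold back until min_items are queued, unless the queue is full
        # or the producer has finished (otherwise the consumer could stall).
        too_few = len(self.items) < self.min_items
        if too_few and not self.closed and not self.is_full():
            return []
        batch = [item for item, _ in self.items]
        self.items.clear()
        self.size = 0
        return batch
```

With `min_items=1` the consumer takes whatever is queued, which corresponds to not waiting for a minimum amount. Raising `min_items` trades latency for larger work units, and the caps still bound how much is buffered.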
