On 03/02/15 10:54, Howard Chu wrote:
> Howard Chu wrote:
>> Emmanuel Lécharny wrote:
>>> On 03/02/15 09:41, Howard Chu wrote:
>>>> Emmanuel Lécharny wrote:
>>>>> On 03/02/15 05:11, Howard Chu wrote:
>>>>>> Another option here is simply to perform batching. Now that we have
>>>>>> the TXN api exposed in the backend interface, we could just batch up
>>>>>> e.g. 500 entries per txn, much like slapadd -q already does.
>>>>>> Ultimately we ought to be able to get syncrepl refresh to occur at
>>>>>> nearly the same speed as slapadd -q.
>>>>>
>>>>> Batching is ok, except that you never know how many entries you're
>>>>> going to have, thus you will have to actually write the data after a
>>>>> period of time, even if you don't have the 500 entries.
>>>>
>>>> This isn't a problem - we know exactly when refresh completes, so we
>>>> can finish the batch regardless of how many entries are left over.
>>>
>>> True for Refresh. I was thinking more specifically of updates when we
>>> are connected.
>>
>> None of this is for the Persist phase, I have only been talking about
>> refresh.

Thanks for the clarification.
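A minimal sketch of the batching idea discussed above: entries are grouped into transactions of 500, and because the end of the refresh phase is known, the last partial batch is simply committed there. `FakeBackend`, `txn_begin`, `add` and `txn_commit` are hypothetical stand-ins, not the actual slapd backend TXN API.

```python
# Hypothetical sketch: batch refresh writes into transactions of BATCH_SIZE
# entries and flush the final partial batch when refresh completes.
# FakeBackend is an assumed stand-in, not the real slapd backend interface.
from typing import Iterable, List

BATCH_SIZE = 500  # entries per transaction, the figure used in the thread


class FakeBackend:
    """Records committed batches instead of writing to a database."""

    def __init__(self) -> None:
        self.commits: List[List[str]] = []
        self._pending: List[str] = []
        self._in_txn = False

    def txn_begin(self) -> None:
        self._in_txn = True
        self._pending = []

    def add(self, entry: str) -> None:
        assert self._in_txn, "add() called outside a transaction"
        self._pending.append(entry)

    def txn_commit(self) -> None:
        self.commits.append(self._pending)
        self._pending = []
        self._in_txn = False


def refresh(backend: FakeBackend, entries: Iterable[str],
            batch_size: int = BATCH_SIZE) -> None:
    """Apply a refresh stream, committing every `batch_size` entries."""
    in_batch = 0
    for entry in entries:
        if in_batch == 0:
            backend.txn_begin()
        backend.add(entry)
        in_batch += 1
        if in_batch == batch_size:
            backend.txn_commit()
            in_batch = 0
    # Refresh is complete: commit whatever is left, however few entries.
    if in_batch:
        backend.txn_commit()
```

With 1234 entries this yields three transactions of 500, 500 and 234 entries; no timer is needed because the end of refresh is an explicit event, which is the point made in the quoted exchange.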
>>
>>>> Testing this out with the experimental ITS#8040 patch - with lazy
>>>> commit the 2.8M entries (2.5GB data) takes ~10 minutes for the refresh
>>>> to pull them across. With batching 500 entries/txn+lazy commit it
>>>> takes ~7 minutes, a decent improvement. It's still 2x slower than
>>>> slapadd -q though, which loads the data in 3-1/2 minutes.
>>>
>>> Not bad at all. What makes it 2x slower, btw?
>>
>> Still looking into it. slapadd -q uses 2 threads, one to parse the LDIF
>> and one to write to the DB. syncrepl consumer only uses 1 thread.
>> Probably if we split reading from the network apart from writing to the
>> DB, that would make the difference.

That would be worth a try, although I would expect disk access to be the
bottleneck here, and using two threads might swamp memory, up to a point.

Interesting problem, interesting benchmark to conduct ;-)

Emmanuel.
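A hedged sketch of the two-thread split suggested above: one thread reads entries (standing in for the network side) and another writes them in batched transactions. The memory concern can be bounded with a fixed-size queue, so a fast reader blocks instead of buffering without limit. `QUEUE_MAX`, the `_DONE` sentinel and the list-append "commit" are assumptions for illustration only; Python's GIL also means this shows the structure, not the performance of a C implementation.

```python
# Hypothetical sketch: a reader thread feeds a writer thread through a
# bounded queue. The bound gives back-pressure, capping in-flight memory.
# The "commit" here is just a list append standing in for a DB transaction.
import queue
import threading
from typing import Iterable, List

BATCH_SIZE = 500
QUEUE_MAX = 2000  # assumed cap on buffered entries; tune to memory budget
_DONE = object()  # sentinel marking the end of the refresh phase


def reader(source: Iterable[str], q: "queue.Queue[object]") -> None:
    """Producer: take entries from the source and hand them to the writer."""
    for entry in source:
        q.put(entry)  # blocks while the queue is full (back-pressure)
    q.put(_DONE)


def writer(q: "queue.Queue[object]", commits: List[List[str]],
           batch_size: int) -> None:
    """Consumer: group entries into transactions of batch_size."""
    batch: List[str] = []
    while True:
        item = q.get()
        if item is _DONE:
            break
        batch.append(item)  # type: ignore[arg-type]
        if len(batch) == batch_size:
            commits.append(batch)  # stand-in for a txn commit
            batch = []
    if batch:  # refresh finished: flush the partial batch
        commits.append(batch)


def pipelined_refresh(source: Iterable[str], batch_size: int = BATCH_SIZE,
                      queue_max: int = QUEUE_MAX) -> List[List[str]]:
    """Run reader and writer concurrently; return the committed batches."""
    q: "queue.Queue[object]" = queue.Queue(maxsize=queue_max)
    commits: List[List[str]] = []
    t_read = threading.Thread(target=reader, args=(source, q))
    t_write = threading.Thread(target=writer, args=(q, commits, batch_size))
    t_read.start()
    t_write.start()
    t_read.join()
    t_write.join()
    return commits
```

The queue size is the knob that trades memory for overlap: if the disk is indeed the bottleneck, the queue stays full and the reader simply waits, so memory use stays at roughly `queue_max` entries rather than growing with the data set.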