On 11 January 2018 at 19:41, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
> Two, what to do when the memory limit is reached. With the old
> accounting, this was easy, because we'd decide for each subtransaction
> independently whether to spill it to disk, when it has reached its 4096
> limit. Now, we are looking at a global limit, so we have to find a
> transaction to spill in some other way. The proposed patch searches
> through the entire list of transactions to find the largest one. But as
> the patch says:
>
> "XXX With many subtransactions this might be quite slow, because we'll
> have to walk through all of them. There are some options how we could
> improve that: (a) maintain some secondary structure with transactions
> sorted by amount of changes, (b) not looking for the entirely largest
> transaction, but e.g. for transaction using at least some fraction of
> the memory limit, and (c) evicting multiple transactions at once, e.g.
> to free a given portion of the memory limit (e.g. 50%)."

AIUI, spilling to disk doesn't affect absorbing future updates; we would
just keep accumulating them in memory, right? We won't need to unspill
until it comes time to commit.

Is there any actual advantage to picking the largest transaction? It
means fewer spills and fewer unspills at commit time, but that's just a
bigger spike of I/O and a greater chance of spilling more than necessary
to get by. In the end it'll be more or less the same amount of data read
back, just all in one big spike when spilling and one big spike when
committing. If you spilled smaller transactions you would have a small
amount of I/O more frequently and would have to read back small amounts
for many commits, but it would add up to the same amount of I/O (or less,
if you avoid spilling more than necessary).

The real aim should be to try to pick the transaction that will be
committed furthest in the future. That gives you the most memory to use
for live transactions for the longest time and could let you process the
maximum number of transactions without spilling them.
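As an aside on the search cost in the quoted comment: option (a) there, a
secondary structure keyed by accumulated change size, could be sketched
roughly as a max-heap with lazy invalidation, since a transaction's size
keeps growing as changes arrive. This is only an illustration (in Python,
not PostgreSQL C, and the names are made up), not what the patch does:

```python
import heapq

class TxnSizeTracker:
    """Illustrative sketch of option (a): find the largest transaction in
    O(log n) via a max-heap, invalidating stale entries lazily instead of
    updating the heap on every size change."""

    def __init__(self):
        self._heap = []   # entries: (-size, xid); stale entries linger
        self._size = {}   # current accumulated change size per xid

    def add_changes(self, xid, nbytes):
        # Accumulate decoded changes; push a fresh entry rather than
        # rebalancing, leaving the old entry to be discarded on pop.
        self._size[xid] = self._size.get(xid, 0) + nbytes
        heapq.heappush(self._heap, (-self._size[xid], xid))

    def pop_largest(self):
        # Skip entries whose recorded size no longer matches (stale).
        while self._heap:
            neg_size, xid = heapq.heappop(self._heap)
            if self._size.get(xid) == -neg_size:
                del self._size[xid]
                return xid, -neg_size
        return None
```

The price is that the heap can hold stale duplicates, so it trades memory
for avoiding a full walk of all transactions on every eviction.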
So either the oldest transaction (in the expectation that it's been open
a while and appears to be a long-lived batch job that will stay open for
a long time) or the youngest transaction (in the expectation that all
transactions are more or less equally long-lived) might make sense.

--
greg
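The three heuristics discussed here reduce to different orderings over
the in-memory transactions. A minimal sketch (hypothetical names, not
the patch's actual code) of victim selection under each policy:

```python
def pick_victim(txns, policy):
    """Pick the transaction to spill. `txns` maps xid -> (start_order, size),
    where start_order is a stand-in for the transaction's start LSN.
    Illustrative only; the real reorderbuffer tracks more state than this."""
    if policy == "largest":
        # What the proposed patch does: fewest evictions, biggest I/O spikes.
        return max(txns, key=lambda xid: txns[xid][1])
    if policy == "oldest":
        # Bet that an old transaction is a long-lived batch job that will
        # commit far in the future.
        return min(txns, key=lambda xid: txns[xid][0])
    if policy == "youngest":
        # Bet that all transactions live about equally long, so the
        # newest one will commit last.
        return max(txns, key=lambda xid: txns[xid][0])
    raise ValueError(f"unknown policy: {policy}")
```

For example, with xid 10 started first and xid 11 carrying the most
changes, "oldest" picks 10, "largest" picks 11, and "youngest" picks the
most recently started transaction.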