On 19/09/2018 20:23, Peter Geoghegan wrote: > Attached is v5, So. I don't know much about the btree code, so don't believe anything I say.
I was very interested in the bloat test case that you posted on 2018-07-09 and I tried to understand it more. The current method for inserting a duplicate value into a btree is going to the leftmost point for that value and then move right until we find some space or we get "tired" of searching, in which case just make some space right there. The problem is that it's tricky to decide when to stop searching, and there are scenarios when we stop too soon and repeatedly miss all the good free space to the right, leading to bloat even though the index is perhaps quite empty. I tried playing with the getting-tired factor (it could plausibly be a reloption), but that wasn't very successful. You can use that to postpone the bloat, but you won't stop it, and performance becomes terrible. You propose to address this by appending the tid to the index key, so each key, even if its "payload" is a duplicate value, is unique and has a unique place, so we never have to do this "tiresome" search. This makes a lot of sense, and the results in the bloat test you posted are impressive and reproducible. I tried a silly alternative approach by placing a new duplicate key in a random location. This should be equivalent since tids are effectively random. I didn't quite get this to fully work yet, but at least it doesn't blow up, and it gets the same regression test ordering differences for pg_depend scans that you are trying to paper over. ;-) As far as the code is concerned, I agree with Andrey Lepikhov that one more abstraction layer that somehow combines the scankey and the tid or some combination like that would be useful, instead of passing the tid as a separate argument everywhere. I think it might help this patch move along if it were split up a bit, for example 1) suffix truncation, 2) tid stuff, 3) new split strategies. That way, it would also be easier to test out each piece separately. For example, how much space does suffix truncation save in what scenario, are there any performance regressions, etc. In the last few versions, the patches have still been growing significantly in size and functionality, and most of the supposed benefits are not readily visible in tests. And of course we need to think about how to handle upgrades, but you have already started a separate discussion about that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services