Fabien COELHO <coe...@cri.ensmp.fr> writes: >>>> I remain of the opinion that we ought to simply rip out support for >>>> zipfian with s < 1.
>>> +1 to that >> If this is done, some people with zipfian distribution that currently >> work might be unhappy. > After giving it some thought, I think that this cannot be fully fixed for > 12. Just to clarify --- my complaint about "over engineering" referred to the fact that a cache exists at all; fooling around with the overflow behavior isn't really going to answer that. The bigger picture here is that a benchmarking tool that contains its own performance surprises is not a nice thing to have. It's not very hard to imagine somebody wasting a lot of time trying to explain weird results, only to find out that the cause is unstable performance of random_zipfian. Or worse, people might draw totally incorrect conclusions if they fail to drill down enough to realize that there's a problem in pgbench rather than on the server side. > Given the constraint of Jim Gray's approximated method for s in (0, 1), > which really does zipfian for the first two integers and then uses an > exponential approximation, the only approach is that the parameters must > be computed in a partial eval preparation phase before the bench code is > run. This means that only (mostly) constants would be allowed as > parameters when s is in (0, 1), but I think that this is acceptable > because anyway the method fundamentaly requires it. Yeah, if we could store all the required harmonic numbers before the test run timing starts, that would address the concern about stable performance. But I have to wonder whether zipfian with s < 1 is useful enough to justify so much code. > The attached other attached patch illustrate what I call poor performance > for stupid parameters (no point in doing zipfian on 2 integers…) : > ... > Maybe the implementation could impose that s is at least 1.001 to avoid > the lower performance? I was wondering about that too. It seems like it'd be a wise idea to further constrain s and/or n to ensure that the s > 1 code path doesn't do anything too awful ... unless someone comes up with a better implementation that has stable performance without such constraints. regards, tom lane