Hi,

2014-07-04 19:05 GMT+09:00 Andres Freund <and...@2ndquadrant.com>:

> On 2014-07-04 11:59:23 +0200, Fabien COELHO wrote:
> >
> > >Yea. I certainly disagree with the patch in it's current state because it
> > >copies the same 15 lines several times with a two word difference.
> > >Independent of whether we want those options, I don't think that's going
> > >to fly.
> >
> > I liked a simple static string for the different variants, which means
> > replication. Factorizing out the (large) common part will mean malloc &
> > sprintf. Well, why not.
>
> It sucks from a maintenance POV. And I don't see the overhead of malloc
> being relevant here...
>
> > >>OTOH, we've almost reached the consensus that supporting gaussian
> > >>and exponential options in \setrandom. So I think that you should
> > >>separate those two features into two patches, and we should apply
> > >>the \setrandom one first. Then we can discuss whether the other patch
> > >>should be applied or not.
> >
> > >Sounds like a good plan.
> >
> > Sigh. I'll do that as it seems to be a blocker...
>
I still agree with Fabien-san. I cannot understand why our logical proposal
isn't accepted...

> I think we also need documentation about the actual mathematical
> behaviour of the randomness generators.
>
> > +     <para>
> > +      With the gaussian option, the larger the <replaceable>threshold</>,
> > +      the more frequently values close to the middle of the interval are drawn,
> > +      and the less frequently values close to the <replaceable>min</> and
> > +      <replaceable>max</> bounds.
> > +      In other worlds, the larger the <replaceable>threshold</>,
> > +      the narrower the access range around the middle.
> > +      the smaller the threshold, the smoother the access pattern
> > +      distribution. The minimum threshold is 2.0 for performance.
> > +     </para>
>
> The only way to actually understand the distribution here is to create a
> table, insert random values, and then look at the result. That's not a
> good thing.
>
That's right.
Therefore, we created command-line options that make the parametrized Gaussian
distribution easy to understand. When you want to know what a parameter value
means for the distribution, you can use the command-line option as follows
(shown here for the exponential variant):

[nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
starting vacuum...end.
transaction type: Exponential distribution TPC-B (sort of)
scaling factor: 1
exponential threshold: 10.00000
decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
highest/lowest percent of the range: 9.5% 0.0%

[nttcom@localhost postgresql]$ contrib/pgbench/pgbench --exponential=5
starting vacuum...end.
transaction type: Exponential distribution TPC-B (sort of)
scaling factor: 1
exponential threshold: 5.00000
decile percents: 39.6% 24.0% 14.6% 8.8% 5.4% 3.3% 2.0% 1.2% 0.7% 0.4%
highest/lowest percent of the range: 4.9% 0.0%

If you have a better method than ours, please share it with us.

> > The caveat that I have is that without these options there is:
> >
> >   (1) no return about the actual distributions in the final summary, which
> >       depend on the threshold value, and
> >
> >   (2) no included mean to test the feature, so the first patch is less
> >       meaningful if the feature cannot be used simply and require a custom script.
>
> I personally agree that we likely want that as an additional
> feature. Even if just because it makes the results easier to compare.
>
If we can have a positive and logical discussion, I will agree with the
proposal to separate the patches. However, I think the hacker most opposed to
it decided based on his feelings... Actually, he didn't respond to our
proposal for understanding the parametrized distribution... So I also think
it is a blocker. The command-line feature is also needed. Besides, is there
another good method? Please share it with us.

Best regards,
--
Mitsumasa KONDO
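
For reference, the "decile percents" lines above can be reproduced analytically. The following is a minimal sketch, not pgbench code: it assumes the exponential option draws from a density proportional to exp(-threshold * x) truncated to [0, 1) and mapped onto the key range. The function names are hypothetical and chosen only for illustration.

```python
import math


def exponential_decile_percents(threshold, buckets=10):
    """Percent of draws landing in each equal-width bucket of the range.

    Assumption (not taken from the pgbench source): the density is
    proportional to exp(-threshold * x) on [0, 1), so the mass of
    bucket k is the difference of the CDF at its two edges.
    """
    norm = 1.0 - math.exp(-threshold)  # total mass before normalization
    return [
        100.0
        * (math.exp(-threshold * k / buckets)
           - math.exp(-threshold * (k + 1) / buckets))
        / norm
        for k in range(buckets)
    ]


def format_percents(values):
    """Format the percentages with one decimal, as in the summary lines above."""
    return " ".join(f"{v:.1f}%" for v in values)


if __name__ == "__main__":
    for t in (10.0, 5.0):
        print(f"threshold {t}: {format_percents(exponential_decile_percents(t))}")
```

Under that assumption the sketch gives the same decile figures as the two runs quoted above (63.2% ... for threshold 10, 39.6% ... for threshold 5). Calling it with buckets=100 gives the share of the first and last percent of the range, which for threshold 10 is about 9.5% and 0.0%.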