Hello Heikki,

If you reject it, you can also remove the gaussian and exponential random
distribution which is near useless without a mean to add a minimal
pseudo-random stage, for which "(x * something) % size" is a reasonable
approximation, hence the modulo submission.

I'm confused. The gaussian and exponential patch was already committed.

Yes.

They are significant patches that really involved significant work, and which are only really useful with a complementary stupid 10 lines patch which is being rejected without understanding why it is needed.

Are you saying that it's not actually useful, and should be reverted? That doesn't make sense to me, gaussian and exponential distributed values seems quite useful to me in test scripts.

Yes and no.

Currently these distributions are achieved by mapping a continuous function onto integers, so that neighboring integers get neighboring number of draws, say with size=7:

  #draws     10 6 3 1 0 0 0  // some exponential distribution
  int drawn   0 1 2 3 4 5 6

Although having an exponential distribution of accesses on tuples is quite reasonable, the likelyhood there would be so much correlation between neighboring values is not realistic at all. You need some additional shuffling to get there.

I don't understand what that pseudo-random stage you're talking about is. Can you elaborate?

The pseudo random stage is just a way to scatter the values. A basic approach to achieve this is "i' = (i * large-prime) % size", if you have a modulo. For instance with prime=5 you may get something like:

  #draws     10 6 3 1 0 0 0
  int drawn   0 1 2 3 4 5 6 (i)
  scattered   0 5 3 1 6 4 2 (i' = 5 i % 7)

So the distribution becomes:

  #draws     10 1 0 3 0 6 0
  scattered   0 1 2 3 4 5 6

Which is more interesting from a testing perspective because it removes the neighboring value correlation.

A better approach is to use a hash function. "i' = hash(i) % size",
although it skews the distribution some more, but the quality of the shuffling is much better than with the mult-modulo version above.
Note that you need a modulo as well...

I must say that I'm appaled by a decision process which leads to such results, with significant patches passed, and the tiny complement to make it really useful (I mean not on the paper or on the feature list, but in real life) is rejected...

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to