Re: random() (was Re: New GUC to sample log queries)

2018-12-29 Thread Tom Lane
I wrote: > Thomas Munro writes: >> I was going to suggest that we might be able to use a single >> not-visible-to-users number that is mixed into the existing recipe, so >> that we only ever read urandom once for the cluster. > Yeah, I was thinking along similar lines, but there's a problem: >

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Tom Lane
I wrote: > Looking at this, I seem to remember that we considered doing exactly this > awhile ago, but refrained because there was concern about depleting the > system's reserve of entropy if we have a high backend spawn rate, and it > didn't seem like there was a security reason to insist on

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Tom Lane
I wrote: > Thomas Munro writes: >> +1, but I wonder if just separating them is enough. Is our seeding >> algorithm good enough for this new purpose? The initial seed is 100% >> predictable to a logged in user (it's made from the backend PID and >> backend start time, which we tell you), and not

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Tom Lane
I wrote: > On further reflection, it seems likely that in most installations a lot > of processes never invoke drandom()/setseed() at all, making such work > in InitProcessGlobals a waste of cycles. Probably a better idea is to > have drandom() initialize the seed on first use, if it wasn't

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Tom Lane
Fabien COELHO writes: >> but I don't feel a need for replacing the algorithm. > Hmmm. Does it mean that you would veto any change, even if the speed > concern is addressed (i.e. faster/not slower with better quality)? Well, not veto exactly, but I'd be suspicious of it. First, erand48 has

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Fabien COELHO
Hello again, - lrand48 (48 bits state as 3 uint16)is 29 ops (10 =, 8 *, 7 +, 4 >>) - xoshiro256** (256 bits states as 4 uint64) is 24 ops (18 if rot in hw) 8 =, 2 *, 2 +, 5 <<, 5 ^, 2 | Small benchmark on my laptop with gcc-7.3 -O3: - pg_lrand48 takes 4.0 seconds to generate 1

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Adrien NAYRAT
On 12/26/18 7:45 PM, Tom Lane wrote: I wonder whether we should establish a project policy to avoid use of random() for internal purposes, ie try to get to a point where drandom() is the only caller in the backend. FYI I picked the idea from auto_explain. Maybe we will have to change

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Fabien COELHO
- lrand48 (48 bits state as 3 uint16)is 29 ops (10 =, 8 *, 7 +, 4 >>) - xoshiro256** (256 bits states as 4 uint64) is 24 ops (18 if rot in hw) 8 =, 2 *, 2 +, 5 <<, 5 ^, 2 | See http://vigna.di.unimi.it/xorshift/ Small benchmark on my laptop with gcc-7.3 -O3: - pg_lrand48

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Fabien COELHO
About costs, not counting array accesses: - lrand48 (48 bits state as 3 uint16)is 29 ops (10 =, 8 *, 7 +, 4 >>) - xorshift+ (128 bits state as 2 uint64) is 13 ops ( 5 =, 0 *, 1 +, 3 >>, 4 ^) - xororshift128+ (idem) is 17 ops ( 6 =, 0 *, 1 +, 5 >>, 3 ^, 2

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Fabien COELHO
Hello Tom, Another idea, which would be a lot less prone to breakage by add-on code, is to change drandom() and setseed() to themselves use pg_erand48() with a private seed. The pg_erand48 code looks like crumbs from the 70's optimized for 16 bits architectures (which it is probably not,

Re: random() (was Re: New GUC to sample log queries)

2018-12-27 Thread Tom Lane
Fabien COELHO writes: >> Another idea, which would be a lot less prone to breakage by >> add-on code, is to change drandom() and setseed() to themselves >> use pg_erand48() with a private seed. > The pg_erand48 code looks like crumbs from the 70's optimized for 16 bits > architectures (which it

Re: random() (was Re: New GUC to sample log queries)

2018-12-27 Thread Tom Lane
I wrote: > Peter Geoghegan writes: >> We're already making fairly broad assumptions about our having control >> of the backend's PRNG state within InitProcessGlobals(). How should >> this affect the new drandom()/setseed() private state, if at all? > I would think that InitProcessGlobals would

Re: random() (was Re: New GUC to sample log queries)

2018-12-27 Thread Tom Lane
Thomas Munro writes: > On Thu, Dec 27, 2018 at 3:55 PM Peter Geoghegan wrote: >> On Wed, Dec 26, 2018 at 6:39 PM Tom Lane wrote: >>> The point here is not to be cryptographically strong at every single >>> place where the backend might want a random number; I think we're >>> all agreed that we

Re: random() (was Re: New GUC to sample log queries)

2018-12-27 Thread Tom Lane
Peter Geoghegan writes: > On Wed, Dec 26, 2018 at 6:39 PM Tom Lane wrote: >> Now, we could probably fix that with some less intrusive patch than >> #define'ing random() --- in particular, if we give drandom and setseed >> their own private PRNG state, we've really fixed the security exposure >>

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Thomas Munro
On Thu, Dec 27, 2018 at 3:55 PM Peter Geoghegan wrote: > On Wed, Dec 26, 2018 at 6:39 PM Tom Lane wrote: > > The point here is not to be cryptographically strong at every single > > place where the backend might want a random number; I think we're > > all agreed that we don't need that. To me,

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Peter Geoghegan
On Wed, Dec 26, 2018 at 6:39 PM Tom Lane wrote: > The point here is not to be cryptographically strong at every single > place where the backend might want a random number; I think we're > all agreed that we don't need that. To me, the point is to ensure that > the user-accessible random

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Michael Paquier
On Wed, Dec 26, 2018 at 08:46:25PM -0500, Tom Lane wrote: > One thing I was wondering is if we should try to enforce a policy > like this by putting, say, > > #define random() pg_random() > > into port.h or so. That would have the advantages of not having to touch > any existing calls and not

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Tom Lane
Michael Paquier writes: > On Wed, Dec 26, 2018 at 01:45:00PM -0500, Tom Lane wrote: >> A quick grep says that there's a dozen or so callers, so this patch >> certainly isn't the only offender ... but should we make an effort >> to convert them all to use, say, pg_erand48()? I think all the >>

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Peter Geoghegan
On Wed, Dec 26, 2018 at 1:21 PM Tom Lane wrote: > One thing we'd have to think about if we want to take this seriously > is whether a process-wide PRNG state is really adequate; if you're > trying to make a particular call site be deterministic, you'd likely > wish it weren't interfered with by

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Tom Lane
Peter Geoghegan writes: > On Wed, Dec 26, 2018 at 12:19 PM Tom Lane wrote: >> Replacing random() might actually make that easier not harder, since >> we'd have more control over what happens when. > That does seem useful. I'm in favor. But why does the function to seed > the internal PRNG have

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Peter Geoghegan
On Wed, Dec 26, 2018 at 12:19 PM Tom Lane wrote: > Replacing random() might actually make that easier not harder, since > we'd have more control over what happens when. That does seem useful. I'm in favor. But why does the function to seed the internal PRNG have to be loadable? Can't it just be

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Tom Lane
Peter Geoghegan writes: > I'm beginning to think that the technique that I came up with to make > "getting tired" deterministic ought to be supporting as a debugging > option if we're to do away with internal use of the generic/seedable > backend PRNG. I have no objection to providing such a

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Tom Lane
Peter Geoghegan writes: > On Wed, Dec 26, 2018 at 10:45 AM Tom Lane wrote: >> I wonder whether we should establish a project policy to avoid use >> of random() for internal purposes, ie try to get to a point where >> drandom() is the only caller in the backend. A quick grep says >> that there's

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Peter Geoghegan
On Wed, Dec 26, 2018 at 10:45 AM Tom Lane wrote: > I wonder whether we should establish a project policy to avoid use > of random() for internal purposes, ie try to get to a point where > drandom() is the only caller in the backend. A quick grep says > that there's a dozen or so callers, so this

random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Tom Lane
Alvaro Herrera writes: > Thanks! I pushed this with two changes -- one was to reword the docs a > bit more, and the other was to compute in_sample only if it's going to > be used (when exceeded is true). I hope this won't upset any compilers ... > I wonder if there's any sensible way to verify