Re: New GUC to sample log queries

2019-01-04 Thread Adrien Mobile
Le 4 janvier 2019 13:16:48 GMT+01:00, Peter Eisentraut a écrit : >On 19/11/2018 14:40, Tomas Vondra wrote: >> On 11/19/18 2:57 AM, Michael Paquier wrote: >>> On Sun, Nov 18, 2018 at 12:18:33PM +0100, Dmitry Dolgov wrote: Since it's hard to come up with a concise name that will mention

Re: New GUC to sample log queries

2019-01-04 Thread Peter Eisentraut
On 19/11/2018 14:40, Tomas Vondra wrote: > On 11/19/18 2:57 AM, Michael Paquier wrote: >> On Sun, Nov 18, 2018 at 12:18:33PM +0100, Dmitry Dolgov wrote: >>> Since it's hard to come up with a concise name that will mention sampling >>> rate >>> in the context of min_duration_statement, I think

Re: random() (was Re: New GUC to sample log queries)

2018-12-29 Thread Tom Lane
I wrote: > Thomas Munro writes: >> I was going to suggest that we might be able to use a single >> not-visible-to-users number that is mixed into the existing recipe, so >> that we only ever read urandom once for the cluster. > Yeah, I was thinking along similar lines, but there's a problem: >

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Tom Lane
I wrote: > Looking at this, I seem to remember that we considered doing exactly this > awhile ago, but refrained because there was concern about depleting the > system's reserve of entropy if we have a high backend spawn rate, and it > didn't seem like there was a security reason to insist on

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Tom Lane
I wrote: > Thomas Munro writes: >> +1, but I wonder if just separating them is enough. Is our seeding >> algorithm good enough for this new purpose? The initial seed is 100% >> predictable to a logged in user (it's made from the backend PID and >> backend start time, which we tell you), and not

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Tom Lane
I wrote: > On further reflection, it seems likely that in most installations a lot > of processes never invoke drandom()/setseed() at all, making such work > in InitProcessGlobals a waste of cycles. Probably a better idea is to > have drandom() initialize the seed on first use, if it wasn't

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Tom Lane
Fabien COELHO writes: >> but I don't feel a need for replacing the algorithm. > Hmmm. Does it mean that you would veto any change, even if the speed > concern is addressed (i.e. faster/not slower with better quality)? Well, not veto exactly, but I'd be suspicious of it. First, erand48 has

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Fabien COELHO
Hello again, - lrand48 (48 bits state as 3 uint16)is 29 ops (10 =, 8 *, 7 +, 4 >>) - xoshiro256** (256 bits states as 4 uint64) is 24 ops (18 if rot in hw) 8 =, 2 *, 2 +, 5 <<, 5 ^, 2 | Small benchmark on my laptop with gcc-7.3 -O3: - pg_lrand48 takes 4.0 seconds to generate 1

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Adrien NAYRAT
On 12/26/18 7:45 PM, Tom Lane wrote: I wonder whether we should establish a project policy to avoid use of random() for internal purposes, ie try to get to a point where drandom() is the only caller in the backend. FYI I picked the idea from auto_explain. Maybe we will have to change

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Fabien COELHO
- lrand48 (48 bits state as 3 uint16)is 29 ops (10 =, 8 *, 7 +, 4 >>) - xoshiro256** (256 bits states as 4 uint64) is 24 ops (18 if rot in hw) 8 =, 2 *, 2 +, 5 <<, 5 ^, 2 | See http://vigna.di.unimi.it/xorshift/ Small benchmark on my laptop with gcc-7.3 -O3: - pg_lrand48

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Fabien COELHO
About costs, not counting array accesses: - lrand48 (48 bits state as 3 uint16)is 29 ops (10 =, 8 *, 7 +, 4 >>) - xorshift+ (128 bits state as 2 uint64) is 13 ops ( 5 =, 0 *, 1 +, 3 >>, 4 ^) - xororshift128+ (idem) is 17 ops ( 6 =, 0 *, 1 +, 5 >>, 3 ^, 2

Re: random() (was Re: New GUC to sample log queries)

2018-12-28 Thread Fabien COELHO
Hello Tom, Another idea, which would be a lot less prone to breakage by add-on code, is to change drandom() and setseed() to themselves use pg_erand48() with a private seed. The pg_erand48 code looks like crumbs from the 70's optimized for 16 bits architectures (which it is probably not,

Re: random() (was Re: New GUC to sample log queries)

2018-12-27 Thread Tom Lane
Fabien COELHO writes: >> Another idea, which would be a lot less prone to breakage by >> add-on code, is to change drandom() and setseed() to themselves >> use pg_erand48() with a private seed. > The pg_erand48 code looks like crumbs from the 70's optimized for 16 bits > architectures (which it

Re: random() (was Re: New GUC to sample log queries)

2018-12-27 Thread Tom Lane
I wrote: > Peter Geoghegan writes: >> We're already making fairly broad assumptions about our having control >> of the backend's PRNG state within InitProcessGlobals(). How should >> this affect the new drandom()/setseed() private state, if at all? > I would think that InitProcessGlobals would

Re: random() (was Re: New GUC to sample log queries)

2018-12-27 Thread Tom Lane
Thomas Munro writes: > On Thu, Dec 27, 2018 at 3:55 PM Peter Geoghegan wrote: >> On Wed, Dec 26, 2018 at 6:39 PM Tom Lane wrote: >>> The point here is not to be cryptographically strong at every single >>> place where the backend might want a random number; I think we're >>> all agreed that we

Re: random() (was Re: New GUC to sample log queries)

2018-12-27 Thread Tom Lane
Peter Geoghegan writes: > On Wed, Dec 26, 2018 at 6:39 PM Tom Lane wrote: >> Now, we could probably fix that with some less intrusive patch than >> #define'ing random() --- in particular, if we give drandom and setseed >> their own private PRNG state, we've really fixed the security exposure >>

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Thomas Munro
On Thu, Dec 27, 2018 at 3:55 PM Peter Geoghegan wrote: > On Wed, Dec 26, 2018 at 6:39 PM Tom Lane wrote: > > The point here is not to be cryptographically strong at every single > > place where the backend might want a random number; I think we're > > all agreed that we don't need that. To me,

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Peter Geoghegan
On Wed, Dec 26, 2018 at 6:39 PM Tom Lane wrote: > The point here is not to be cryptographically strong at every single > place where the backend might want a random number; I think we're > all agreed that we don't need that. To me, the point is to ensure that > the user-accessible random

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Michael Paquier
On Wed, Dec 26, 2018 at 08:46:25PM -0500, Tom Lane wrote: > One thing I was wondering is if we should try to enforce a policy > like this by putting, say, > > #define random() pg_random() > > into port.h or so. That would have the advantages of not having to touch > any existing calls and not

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Tom Lane
Michael Paquier writes: > On Wed, Dec 26, 2018 at 01:45:00PM -0500, Tom Lane wrote: >> A quick grep says that there's a dozen or so callers, so this patch >> certainly isn't the only offender ... but should we make an effort >> to convert them all to use, say, pg_erand48()? I think all the >>

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Peter Geoghegan
On Wed, Dec 26, 2018 at 1:21 PM Tom Lane wrote: > One thing we'd have to think about if we want to take this seriously > is whether a process-wide PRNG state is really adequate; if you're > trying to make a particular call site be deterministic, you'd likely > wish it weren't interfered with by

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Tom Lane
Peter Geoghegan writes: > On Wed, Dec 26, 2018 at 12:19 PM Tom Lane wrote: >> Replacing random() might actually make that easier not harder, since >> we'd have more control over what happens when. > That does seem useful. I'm in favor. But why does the function to seed > the internal PRNG have

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Peter Geoghegan
On Wed, Dec 26, 2018 at 12:19 PM Tom Lane wrote: > Replacing random() might actually make that easier not harder, since > we'd have more control over what happens when. That does seem useful. I'm in favor. But why does the function to seed the internal PRNG have to be loadable? Can't it just be

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Tom Lane
Peter Geoghegan writes: > I'm beginning to think that the technique that I came up with to make > "getting tired" deterministic ought to be supporting as a debugging > option if we're to do away with internal use of the generic/seedable > backend PRNG. I have no objection to providing such a

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Tom Lane
Peter Geoghegan writes: > On Wed, Dec 26, 2018 at 10:45 AM Tom Lane wrote: >> I wonder whether we should establish a project policy to avoid use >> of random() for internal purposes, ie try to get to a point where >> drandom() is the only caller in the backend. A quick grep says >> that there's

Re: random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Peter Geoghegan
On Wed, Dec 26, 2018 at 10:45 AM Tom Lane wrote: > I wonder whether we should establish a project policy to avoid use > of random() for internal purposes, ie try to get to a point where > drandom() is the only caller in the backend. A quick grep says > that there's a dozen or so callers, so this

random() (was Re: New GUC to sample log queries)

2018-12-26 Thread Tom Lane
Alvaro Herrera writes: > Thanks! I pushed this with two changes -- one was to reword the docs a > bit more, and the other was to compute in_sample only if it's going to > be used (when exceeded is true). I hope this won't upset any compilers ... > I wonder if there's any sensible way to verify

Re: New GUC to sample log queries

2018-11-30 Thread Alvaro Herrera
On 2018-Nov-30, Sergei Kornilov wrote: > > Ah, I feared that some compiler would not be smart enough to be silent > > about that. I hope it does not emit a warning with this patch? > Yes, with this change build is clean. Thank you Thanks, pushed. > PS: currently i use old gcc (Debian

Re: New GUC to sample log queries

2018-11-30 Thread Sergei Kornilov
Hi > Ah, I feared that some compiler would not be smart enough to be silent > about that. I hope it does not emit a warning with this patch? Yes, with this change build is clean. Thank you PS: currently i use old gcc (Debian 4.9.2-10+deb8u1) 4.9.2 regards, Sergei

Re: New GUC to sample log queries

2018-11-30 Thread Alvaro Herrera
On 2018-Nov-30, Sergei Kornilov wrote: > Hello > > I can not build current HEAD cleanly. I got warning: > > > postgres.c: In function ‘check_log_duration’: > > postgres.c:2254:17: warning: ‘in_sample’ may be used uninitialized in this > > function [-Wmaybe-uninitialized] > > if ((exceeded &&

Re: New GUC to sample log queries

2018-11-30 Thread Sergei Kornilov
Hello I can not build current HEAD cleanly. I got warning: > postgres.c: In function ‘check_log_duration’: > postgres.c:2254:17: warning: ‘in_sample’ may be used uninitialized in this > function [-Wmaybe-uninitialized] > if ((exceeded && in_sample) || log_duration) Should not we have such

Re: New GUC to sample log queries

2018-11-30 Thread Adrien NAYRAT
On 11/30/18 7:42 AM, Nikolay Samokhvalov wrote: On Thu, Nov 29, 2018 at 1:49 PM Alvaro Herrera > wrote: Thanks!  I pushed this with two changes -- one was to reword the docs a bit more, and the other was to compute in_sample only if it's going to be

Re: New GUC to sample log queries

2018-11-30 Thread Adrien NAYRAT
On 11/29/18 10:48 PM, Alvaro Herrera wrote: Thanks! I pushed this with two changes -- one was to reword the docs a bit more, and the other was to compute in_sample only if it's going to be used (when exceeded is true). I hope this won't upset any compilers ... Thanks! I wonder if there's

Re: New GUC to sample log queries

2018-11-29 Thread Nikolay Samokhvalov
On Thu, Nov 29, 2018 at 1:49 PM Alvaro Herrera wrote: > Thanks! I pushed this with two changes -- one was to reword the docs a > bit more, and the other was to compute in_sample only if it's going to > be used (when exceeded is true). I hope this won't upset any compilers ... > This is a

Re: New GUC to sample log queries

2018-11-29 Thread Alvaro Herrera
Thanks! I pushed this with two changes -- one was to reword the docs a bit more, and the other was to compute in_sample only if it's going to be used (when exceeded is true). I hope this won't upset any compilers ... I wonder if there's any sensible way to verify the behavior in an automated

Re: New GUC to sample log queries

2018-11-29 Thread Adrien Nayrat
On 11/29/18 6:35 PM, Alvaro Herrera wrote: > > Sounds good to me (we -> you, I suppose?). I'd tweak the extra_desc for > log_min_duration_statement too, because we no longer log "all" when the > rate is <1; maybe "Zero prints all queries, subject to > log_statement_sample_rate. -1 turns this

Re: New GUC to sample log queries

2018-11-29 Thread Alvaro Herrera
On 2018-Nov-29, Adrien NAYRAT wrote: > > =# select name, short_desc, extra_desc, category from pg_settings where > > category like 'Reporting%When%' ; > > ─[ RECORD 1 > > ] > > name │

Re: New GUC to sample log queries

2018-11-29 Thread Alvaro Herrera
=# select name, short_desc, extra_desc, category from pg_settings where category like 'Reporting%When%' ; ─[ RECORD 1 ] name │ log_min_duration_statement short_desc │ Sets the minimum

Re: New GUC to sample log queries

2018-11-25 Thread Adrien Nayrat
On 11/26/18 12:40 AM, Thomas Munro wrote: > On Wed, Nov 21, 2018 at 9:06 PM Adrien Nayrat > wrote: >> Thanks for your comments. Here is the updated patch. I fixed a warning for >> missing parentheses in this expression: >> if ((exceeded && in_sample) || log_duration) > > Hi Adrien, > > GCC 4.8

Re: New GUC to sample log queries

2018-11-25 Thread Thomas Munro
On Wed, Nov 21, 2018 at 9:06 PM Adrien Nayrat wrote: > Thanks for your comments. Here is the updated patch. I fixed a warning for > missing parentheses in this expression: > if ((exceeded && in_sample) || log_duration) Hi Adrien, GCC 4.8 says: guc.c:3328:3: error: initialization makes integer

Re: New GUC to sample log queries

2018-11-22 Thread Vik Fearing
On 21/11/2018 09:06, Adrien Nayrat wrote: > On 11/19/18 2:52 PM, Dmitry Dolgov wrote: >>> On Mon, Nov 19, 2018 at 2:40 PM Tomas Vondra >>> wrote: >>> >>> On 11/19/18 2:57 AM, Michael Paquier wrote: On Sun, Nov 18, 2018 at 12:18:33PM +0100, Dmitry Dolgov wrote: > Since it's hard to come

Re: New GUC to sample log queries

2018-11-21 Thread Adrien Nayrat
On 11/19/18 2:52 PM, Dmitry Dolgov wrote: >> On Mon, Nov 19, 2018 at 2:40 PM Tomas Vondra >> wrote: >> >> On 11/19/18 2:57 AM, Michael Paquier wrote: >>> On Sun, Nov 18, 2018 at 12:18:33PM +0100, Dmitry Dolgov wrote: Since it's hard to come up with a concise name that will mention sampling

Re: New GUC to sample log queries

2018-11-19 Thread Dmitry Dolgov
> On Mon, Nov 19, 2018 at 2:40 PM Tomas Vondra > wrote: > > On 11/19/18 2:57 AM, Michael Paquier wrote: > > On Sun, Nov 18, 2018 at 12:18:33PM +0100, Dmitry Dolgov wrote: > >> Since it's hard to come up with a concise name that will mention sampling > >> rate > >> in the context of

Re: New GUC to sample log queries

2018-11-19 Thread Tomas Vondra
On 11/18/18 10:52 AM, Adrien Nayrat wrote: ... Alors, I wonder if we should use the same logic for other parameters, such as log_statement_stats log_parser_stats log_planner_stats log_executor_stats > It was mentioned in this thread >

Re: New GUC to sample log queries

2018-11-19 Thread Tomas Vondra
On 11/19/18 2:57 AM, Michael Paquier wrote: On Sun, Nov 18, 2018 at 12:18:33PM +0100, Dmitry Dolgov wrote: Since it's hard to come up with a concise name that will mention sampling rate in the context of min_duration_statement, I think it's fine to name this configuration "log_sample_rate",

Re: New GUC to sample log queries

2018-11-18 Thread Michael Paquier
On Sun, Nov 18, 2018 at 12:18:33PM +0100, Dmitry Dolgov wrote: > Since it's hard to come up with a concise name that will mention sampling rate > in the context of min_duration_statement, I think it's fine to name this > configuration "log_sample_rate", as long as it's dependency from >

Re: New GUC to sample log queries

2018-11-18 Thread Dmitry Dolgov
> On Sun, 18 Nov 2018 at 10:52, Adrien Nayrat > wrote: > > For me, the question is how to name this GUC? Is "log_sample_rate" is enough? > I am not sure because it is only related to queries logged by > log_min_duration_statements. Since it's hard to come up with a concise name that will mention

Re: New GUC to sample log queries

2018-11-18 Thread Adrien Nayrat
On 11/18/18 12:47 AM, Dmitry Dolgov wrote: > This patch went through last few commitfests without any activity, but cfbot > says it still applies and doesn't break any tests. From what I see there was > some agreement about naming, so the patch is indeed needs more review. Could > anyone from the

Re: New GUC to sample log queries

2018-11-17 Thread Dmitry Dolgov
> On Mon, 16 Jul 2018 at 23:07, Tomas Vondra > wrote: > > On 07/16/2018 05:24 PM, Robert Haas wrote: > > On Sun, Jul 15, 2018 at 6:53 AM, Vik Fearing > > wrote: > >> Hmm. Not sure if that last word should be _sample, _sampling, _rate, or > >> a combination of those. > > > > +1 for rate or

Re: New GUC to sample log queries

2018-07-16 Thread Tomas Vondra
On 07/16/2018 05:24 PM, Robert Haas wrote: > On Sun, Jul 15, 2018 at 6:53 AM, Vik Fearing > wrote: >> Hmm. Not sure if that last word should be _sample, _sampling, _rate, or >> a combination of those. > > +1 for rate or sample_rate. I think "sample" or "sampling" without > "rate" will not be

Re: New GUC to sample log queries

2018-07-16 Thread Adrien Nayrat
On 07/16/2018 05:24 PM, Robert Haas wrote: > On Sun, Jul 15, 2018 at 6:53 AM, Vik Fearing > wrote: >> Hmm. Not sure if that last word should be _sample, _sampling, _rate, or >> a combination of those. > > +1 for rate or sample_rate. I think "sample" or "sampling" without > "rate" will not be

Re: New GUC to sample log queries

2018-07-15 Thread Vik Fearing
On 10/07/18 20:34, Adrien Nayrat wrote: > On 06/27/2018 11:13 PM, Adrien Nayrat wrote: >>> 3) Is it intentional to only sample with log_min_duration_statement and >>> not also with log_duration? It seems like it should affect both. In >>> both cases, the name is too generic. Something called

Re: New GUC to sample log queries

2018-07-10 Thread Adrien Nayrat
On 06/27/2018 11:13 PM, Adrien Nayrat wrote: >> 3) Is it intentional to only sample with log_min_duration_statement and >> not also with log_duration? It seems like it should affect both. In >> both cases, the name is too generic. Something called "log_sample_rate" >> should sample

Re: New GUC to sample log queries

2018-06-27 Thread Adrien Nayrat
On 06/24/2018 08:41 PM, Vik Fearing wrote: > On 24/06/18 13:22, Adrien Nayrat wrote: >> Attached third version of the patch with documentation. > > Hi. I'm reviewing this. Hi, thanks for your review. > >> exceeded = (log_min_duration_statement == 0 || >>

Re: New GUC to sample log queries

2018-06-24 Thread Vik Fearing
On 24/06/18 13:22, Adrien Nayrat wrote: > Attached third version of the patch with documentation. Hi. I'm reviewing this. > exceeded = (log_min_duration_statement == 0 || > (log_min_duration_statement > 0 && >

Re: New GUC to sample log queries

2018-05-31 Thread Michael Paquier
On Thu, May 31, 2018 at 02:37:07PM +0200, Adrien Nayrat wrote: > On 05/31/2018 03:34 AM, David Rowley wrote: >> On 31 May 2018 at 06:44, Adrien Nayrat wrote: >>> Here is a naive SELECT only bench with a dataset which fit in ram (scale >>> factor >>> = 100) and PGDATA and log on a ramdisk: >>>

Re: New GUC to sample log queries

2018-05-31 Thread Adrien Nayrat
On 05/31/2018 03:34 AM, David Rowley wrote: > On 31 May 2018 at 06:44, Adrien Nayrat wrote: >> Here is a naive SELECT only bench with a dataset which fit in ram (scale >> factor >> = 100) and PGDATA and log on a ramdisk: >> shared_buffers = 4GB >> seq_page_cost = random_page_cost = 1.0 >>

Re: New GUC to sample log queries

2018-05-30 Thread David Rowley
On 31 May 2018 at 06:44, Adrien Nayrat wrote: > Here is a naive SELECT only bench with a dataset which fit in ram (scale > factor > = 100) and PGDATA and log on a ramdisk: > shared_buffers = 4GB > seq_page_cost = random_page_cost = 1.0 > logging_collector = on (no rotation) It would be better

New GUC to sample log queries

2018-05-30 Thread Adrien Nayrat
Hello hackers, In case of OLTP trafic it is hard to catch fast queries in logs (for example, you want to know parameters for only few queries). You have to put log_min_duration_statement to 0, do a reload, wait a few seconds/minutes, back log_min_duration_statement to a previous value and reload