Re: [HACKERS] log_destination=file

2017-09-10 Thread Dmitry Dolgov
> On 7 September 2017 at 15:46, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> It seems that for this particular workload it was about 20-25% slower.​

Hmm...looks like I provided misleading data, sorry. The numbers from
previous
email are correct and I'm able to reproduce them, but surprisingly for me
only
for one particular situation when `log_statement=all;
log_destination=stderr`
and stderr is sent to a console with a tmux session running in it (and
apparently
it has some impact on the performance, but I'm not sure how exactly). In all
other cases (when stderr is sent to a plain console, /dev/null, or we send
logs
to a file) numbers are different, and the difference between patch and
master
is quite less significant.

average latency:

clients patch   master

10  0.321   0.286

20  0.669   0.602

30  1.016   0.942

40  1.358   1.280

50  1.727   1.637


Re: [HACKERS] log_destination=file

2017-09-08 Thread Dmitry Dolgov
> On 8 September 2017 at 01:32, Magnus Hagander  wrote:
>
> 1. where was stderr actually sent? To the console, to /dev/null or to a
file?

To the console (but I can also try other options, although I'm not sure if
it would have any impact).

> 2. Could you run the same thing (on the same machine) with the logging
collector on and logging to file, without the patch? I'd assume that gives
performance similar to running with the patch, but it would be good to
confirm that.

Sure, I'll do it this weekend.


Re: [HACKERS] log_destination=file

2017-09-07 Thread Magnus Hagander
On Thu, Sep 7, 2017 at 7:02 AM, Tom Lane  wrote:

> Dmitry Dolgov <9erthali...@gmail.com> writes:
> >> On 31 August 2017 at 14:49, Tom Lane  wrote:
> >> pgbench with log_statement = all would be a pretty easy test case.
>
> > It seems that for this particular workload it was about 20-25% slower.
>
> Ouch.  That seems like rather a large hit :-(.  I actually expected
> it to be close to negligible, but I don't think 20% qualifies.
>

Agreed, that's a lot more than I expected. A few questions though:

1. where was stderr actually sent? To the console, to /dev/null or to a
file?

2. Could you run the same thing (on the same machine) with the logging
collector on and logging to file, without the patch? I'd assume that gives
performance similar to running with the patch, but it would be good to
confirm that.

And thanks for running the benchmark, saved me some time!


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] log_destination=file

2017-09-07 Thread Tom Lane
Dmitry Dolgov <9erthali...@gmail.com> writes:
>> On 31 August 2017 at 14:49, Tom Lane  wrote:
>> pgbench with log_statement = all would be a pretty easy test case.

> It seems that for this particular workload it was about 20-25% slower.

Ouch.  That seems like rather a large hit :-(.  I actually expected
it to be close to negligible, but I don't think 20% qualifies.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] log_destination=file

2017-09-07 Thread Dmitry Dolgov
Hi

> On 31 August 2017 at 14:49, Tom Lane  wrote:
>> Are you actually asking for a benchmark of if logging gets slower?
>
> Yes.
>
>> If so,
>> could you suggest a workload to make an actual benchmark of it (where
>> logging would be high enough that it could be come a bottleneck -- and
not
>> writing the log data to disk, but the actual logging). I'm not sure what
a
>> good one would be.
>
> pgbench with log_statement = all would be a pretty easy test case.

This part of the discussion caught my attention, and I tried to perform such
easy test. I was doing:

pgbench -S -j2 -c${clients} -T500 test

with `log_statement=all` and `log_destination=stderr`, what I assume should
be
enough to get some approximation for numbers.

scaling factor: 100
average latency:

clients patch   master

10  1.827   1.456

20  4.027   3.300

30  6.284   4.921

40  8.409   6.767

50  10.985 8.646

It seems that for this particular workload it was about 20-25% slower.


Re: [HACKERS] log_destination=file

2017-09-01 Thread Magnus Hagander
On Fri, Sep 1, 2017 at 6:44 PM, Tom Lane  wrote:

> Magnus Hagander  writes:
> > On Fri, Sep 1, 2017 at 6:00 PM, Greg Stark  wrote:
> >> So what happens now with these messages? My understanding is that
> >> they're missing from the CSV logs and are simply inserted into the
> >> text logs without any log_line_prefix? The logging collector doesn't
> >> recognize these messages and reformat them for the CSV logs does it?
>
> > Yeah, that's pretty much it.
>
> > Fixing that is also on my plan, but for later :)
>
> Keep in mind that you've got to be really, really conservative about
> adding functionality in the logging collector process.  If it ever
> crashes, we have problems.  If memory serves, the postmaster is able
> to restart it, but we cannot restore the original stdout/stderr
> destination, which is problematic if that's where its output had
> been going.  So it's pretty close to being as mission-critical as
> the postmaster itself.
>

Yeah. That's one of the reasons I'm not trying to do it all in one batch.

But yeah, the postmaster restarts it just fine. And with the WIP patch I
posted earlier, the message from the postmaster that it did gets lost if
you are logging to stderr. It does end up in the CSV file though. And I'm
sure there are plenty of other corner cases around that one to consider.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] log_destination=file

2017-09-01 Thread Tom Lane
Magnus Hagander  writes:
> On Fri, Sep 1, 2017 at 6:00 PM, Greg Stark  wrote:
>> So what happens now with these messages? My understanding is that
>> they're missing from the CSV logs and are simply inserted into the
>> text logs without any log_line_prefix? The logging collector doesn't
>> recognize these messages and reformat them for the CSV logs does it?

> Yeah, that's pretty much it.

> Fixing that is also on my plan, but for later :)

Keep in mind that you've got to be really, really conservative about
adding functionality in the logging collector process.  If it ever
crashes, we have problems.  If memory serves, the postmaster is able
to restart it, but we cannot restore the original stdout/stderr
destination, which is problematic if that's where its output had
been going.  So it's pretty close to being as mission-critical as
the postmaster itself.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] log_destination=file

2017-09-01 Thread Magnus Hagander
On Fri, Sep 1, 2017 at 6:00 PM, Greg Stark  wrote:

> On 31 August 2017 at 13:49, Tom Lane  wrote:
> > Magnus Hagander  writes:
> >> On Thu, Aug 31, 2017 at 2:34 PM, Tom Lane  wrote:
> > Yes, it's pretty important, because of assorted stuff not-under-our-
> > control that doesn't know about ereport and will just print to stderr
> > anyway.  Some examples: dynamic linker can't-resolve-symbol failure
> > messages, glibc malloc corruption error messages, just about any external
> > module in plperl or plpython.  I don't find this to be negotiable.
>
> So what happens now with these messages? My understanding is that
> they're missing from the CSV logs and are simply inserted into the
> text logs without any log_line_prefix? The logging collector doesn't
> recognize these messages and reformat them for the CSV logs does it?
>

Yeah, that's pretty much it.

Fixing that is also on my plan, but for later :)



> I'm actually asking because I'm more concerned with JSON logs or
> msgpack logs. Currently these are supported with an emit_log_hook but
> they can't capture these non-ereport logs either.
>
> Also the CSV and emit_log_hook based logs don't have any convenient
> way to turn them on and off and control the location and filename of
> the logs. It would be nice if we could have something like
>
> log_destinations='stderr=text,syslog=json,postgresql-%Y-%m-%
> d_%H%M%S.csv=csv'
>

That could definitely be an interesting improvement as well. Decoupling the
format from the destination would make a lot of sense. (Arguments can
certainly be made for example for csv-to-syslog, when you have a custom
syslog server)



> >> Are you actually asking for a benchmark of if logging gets slower?
> >
> > Yes.
>
> Personally I don't think it's "performance" so much as operational
> issues that are more concerning. For all we know there are people out
> there who tried to use the logging collector and found it didn't work
> well on some system -- perhaps it interacted with systemd or something
> else on the system -- and they switched back to just using stderr. I
> don't know how to flush these users out though if there are any. Just
> making this change early in a release cycle is the best we can do.
>

The most common use of stderr I think is debian/ubuntu which uses that and
then pg_ctl based redirect. Or at least used to (I remember being annoyed
by that at least once..)

AFAIK all other major platforms already use the logging collector in the
standard packages.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] log_destination=file

2017-09-01 Thread Greg Stark
On 31 August 2017 at 13:49, Tom Lane  wrote:
> Magnus Hagander  writes:
>> On Thu, Aug 31, 2017 at 2:34 PM, Tom Lane  wrote:
> Yes, it's pretty important, because of assorted stuff not-under-our-
> control that doesn't know about ereport and will just print to stderr
> anyway.  Some examples: dynamic linker can't-resolve-symbol failure
> messages, glibc malloc corruption error messages, just about any external
> module in plperl or plpython.  I don't find this to be negotiable.

So what happens now with these messages? My understanding is that
they're missing from the CSV logs and are simply inserted into the
text logs without any log_line_prefix? The logging collector doesn't
recognize these messages and reformat them for the CSV logs does it?

I'm actually asking because I'm more concerned with JSON logs or
msgpack logs. Currently these are supported with an emit_log_hook but
they can't capture these non-ereport logs either.

Also the CSV and emit_log_hook based logs don't have any convenient
way to turn them on and off and control the location and filename of
the logs. It would be nice if we could have something like

log_destinations='stderr=text,syslog=json,postgresql-%Y-%m-%d_%H%M%S.csv=csv'

>> Are you actually asking for a benchmark of if logging gets slower?
>
> Yes.

Personally I don't think it's "performance" so much as operational
issues that are more concerning. For all we know there are people out
there who tried to use the logging collector and found it didn't work
well on some system -- perhaps it interacted with systemd or something
else on the system -- and they switched back to just using stderr. I
don't know how to flush these users out though if there are any. Just
making this change early in a release cycle is the best we can do.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] log_destination=file

2017-08-31 Thread Tom Lane
Magnus Hagander  writes:
> On Thu, Aug 31, 2017 at 2:34 PM, Tom Lane  wrote:
>> Right, because the decision whether to redirect stdout/stderr can't
>> be changed on the fly.

> Right.

> We could of course also say we only care about things generated by our
> ereport framework, in which case we don't need to redirect stderr and can
> just use a "regular pipe". But IIRC that's functionality we also
> specifically wanted (though I don't think I've ever needed it myself,
> presumably others have).

Yes, it's pretty important, because of assorted stuff not-under-our-
control that doesn't know about ereport and will just print to stderr
anyway.  Some examples: dynamic linker can't-resolve-symbol failure
messages, glibc malloc corruption error messages, just about any external
module in plperl or plpython.  I don't find this to be negotiable.

> Are you actually asking for a benchmark of if logging gets slower?

Yes.

> If so,
> could you suggest a workload to make an actual benchmark of it (where
> logging would be high enough that it could be come a bottleneck -- and not
> writing the log data to disk, but the actual logging). I'm not sure what a
> good one would be.

pgbench with log_statement = all would be a pretty easy test case.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] log_destination=file

2017-08-31 Thread Magnus Hagander
On Thu, Aug 31, 2017 at 2:34 PM, Tom Lane  wrote:

> Magnus Hagander  writes:
> > My understanding is that the main reason for this is that we cannot
> change
> > logging_collector without restarting postmaster, whereas we can change
> > log_destination.
>
> Right, because the decision whether to redirect stdout/stderr can't
> be changed on the fly.
>

Right.

We could of course also say we only care about things generated by our
ereport framework, in which case we don't need to redirect stderr and can
just use a "regular pipe". But IIRC that's functionality we also
specifically wanted (though I don't think I've ever needed it myself,
presumably others have).




> > My suggestion is we work around this by just always starting the logging
> > collector, even if we're not planning to use it.
>
> Umm
>
> > Do people see an actual problem with that? I agree it's an extra round of
> > indirection there, but is that a problem? It would also cause one more
> > backgorund process to always be running, but again, is that really a
> > problem? The overhead is not exactly large.
>
> You just made three assertions about "this isn't a problem" without
> providing any evidence in support of any of them.  Maybe with some
> measurements we could have a real discussion.
>

Well, not entirely. The first two are questions "is this really a problem".

The overhead of an extra process certainly isn't much, I'm wiling to say
that as an assertion.

The other two, less so, that's more question.

Are you actually asking for a benchmark of if logging gets slower? If so,
could you suggest a workload to make an actual benchmark of it (where
logging would be high enough that it could be come a bottleneck -- and not
writing the log data to disk, but the actual logging). I'm not sure what a
good one would be.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] log_destination=file

2017-08-31 Thread Tom Lane
Magnus Hagander  writes:
> My understanding is that the main reason for this is that we cannot change
> logging_collector without restarting postmaster, whereas we can change
> log_destination.

Right, because the decision whether to redirect stdout/stderr can't
be changed on the fly.

> My suggestion is we work around this by just always starting the logging
> collector, even if we're not planning to use it.

Umm

> Do people see an actual problem with that? I agree it's an extra round of
> indirection there, but is that a problem? It would also cause one more
> backgorund process to always be running, but again, is that really a
> problem? The overhead is not exactly large.

You just made three assertions about "this isn't a problem" without
providing any evidence in support of any of them.  Maybe with some
measurements we could have a real discussion.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] log_destination=file

2017-08-31 Thread Magnus Hagander
Attached is a very much VIP (AKA rough draft) for $subject.

Right now we have two parameters controlling logging destination, and they
work in interesting combinations:

log_destination=stderr, logging_collector=off -> log to stderr (makes sense)
log_destination=stderr, logging_collector=on  -> log to file (*highly*
illogical for most users, to set stderr when they want file)
log_destination=csvlog, logging_collector=on -> log to csvfile (makes sense)
log_destination=csvlog, logging_collector=off -> fail (ugh)

(beyond that we also have syslog and eventlog, but those are different and
not touched by this patch)

My understanding is that the main reason for this is that we cannot change
logging_collector without restarting postmaster, whereas we can change
log_destination.

My suggestion is we work around this by just always starting the logging
collector, even if we're not planning to use it. Then we'd just have:

log_destination = stderr -> log to stderr
log_destination = file -> log to file
log_destination = csvlog -> log to csv file

The main difference here is that log_destination=stderr would also go
through the logging collector which would then assemble it and write it out
to it's own stderr.

Do people see an actual problem with that? I agree it's an extra round of
indirection there, but is that a problem? It would also cause one more
backgorund process to always be running, but again, is that really a
problem? The overhead is not exactly large.

It would make configuration a lot more logical for logging. It would also
make it possible to switch between all logging configurations without
restarting.

The attached WIP is mostly working for the simple cases. It fails
completely if the syslogger is restarted at this point, simply because I
haven't figured out how to pass the original stderr down to the syslogger.
I'm sure there are also many other smaller issues around it, but I wanted
to get the discussion done before I spend the time to go through those.

Thoughts?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 95180b2ef5..7c4eb5743f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1764,7 +1764,7 @@ ServerLoop(void)
 		}
 
 		/* If we have lost the log collector, try to start a new one */
-		if (SysLoggerPID == 0 && Logging_collector)
+		if (SysLoggerPID == 0)
 			SysLoggerPID = SysLogger_Start();
 
 		/*
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 3255b42c7d..15fc606d24 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -57,10 +57,8 @@
 
 
 /*
- * GUC parameters.  Logging_collector cannot be changed after postmaster
- * start, but the rest can change at SIGHUP.
+ * GUC parameters. Can change at SIGHUP.
  */
-bool		Logging_collector = false;
 int			Log_RotationAge = HOURS_PER_DAY * MINS_PER_HOUR;
 int			Log_RotationSize = 10 * 1024;
 char	   *Log_directory = NULL;
@@ -182,6 +180,10 @@ SysLoggerMain(int argc, char *argv[])
 	 * assumes that all interesting messages generated in the syslogger will
 	 * come through elog.c and will be sent to write_syslogger_file.
 	 */
+	/*
+	 * XXX: this does not work when we want to use stderr in the syslogger.
+	 * needs fixing!
+	 */
 	if (redirection_done)
 	{
 		int			fd = open(DEVNULL, O_WRONLY, 0);
@@ -370,10 +372,11 @@ SysLoggerMain(int argc, char *argv[])
 		if (!rotation_requested && Log_RotationSize > 0 && !rotation_disabled)
 		{
 			/* Do a rotation if file is too big */
-			if (ftell(syslogFile) >= Log_RotationSize * 1024L)
+			if (syslogFile != NULL &&
+ftell(syslogFile) >= Log_RotationSize * 1024L)
 			{
 rotation_requested = true;
-size_rotation_for |= LOG_DESTINATION_STDERR;
+size_rotation_for |= LOG_DESTINATION_FILE;
 			}
 			if (csvlogFile != NULL &&
 ftell(csvlogFile) >= Log_RotationSize * 1024L)
@@ -390,7 +393,7 @@ SysLoggerMain(int argc, char *argv[])
 			 * was sent by pg_rotate_logfile.
 			 */
 			if (!time_based_rotation && size_rotation_for == 0)
-size_rotation_for = LOG_DESTINATION_STDERR | LOG_DESTINATION_CSVLOG;
+size_rotation_for = LOG_DESTINATION_FILE | LOG_DESTINATION_CSVLOG;
 			logfile_rotate(time_based_rotation, size_rotation_for);
 		}
 
@@ -522,9 +525,6 @@ SysLogger_Start(void)
 	pid_t		sysloggerPid;
 	char	   *filename;
 
-	if (!Logging_collector)
-		return 0;
-
 	/*
 	 * If first time through, create the pipe which will receive stderr
 	 * output.
@@ -581,7 +581,10 @@ SysLogger_Start(void)
 	first_syslogger_file_time = time(NULL);
 	filename = logfile_getname(first_syslogger_file_time, NULL);
 
-	syslogFile = logfile_open(filename, "a", false);
+	if (Log_destination & LOG_DESTINATION_FILE)
+		syslogFile = logfile_open(filename, "a", false);
+	else
+		syslogFile = NULL;