Hi,

When one tests postgres in some of the popular CI systems (all of those
that use docker for windows), some of the tests fail in weird ways. Like

https://www.postgresql.org/message-id/20210303052011.ybplxw6q4tafwogk%40alap3.anarazel.de

> t/003_recovery_targets.pl ............ 7/9
> #   Failed test 'multiple conflicting settings'
> #   at t/003_recovery_targets.pl line 151.
> 
> #   Failed test 'recovery end before target reached is a fatal error'
> #   at t/003_recovery_targets.pl line 177.
> t/003_recovery_targets.pl ............ 9/9 # Looks like you failed 2 tests of 9.
> t/003_recovery_targets.pl ............ Dubious, test returned 2 (wstat 512, 0x200)
> Failed 2/9 subtests
> 
> I think it's pretty dangerous if we have a substantial number of tests
> that aren't run on windows - I think a lot of us just assume that the
> BF would catch windows specific problems...

A lot of debugging later I figured out that the problem is that postgres
decides not to write anything to stderr, but send everything to the
windows event log instead.  This includes error messages when starting
postgres with wrong parameters or such...

The reason for that is that elog.c and pg_ctl.c use
src/port/win32security.c:pgwin32_is_service() to detect whether they're
running as a service:

static void
send_message_to_server_log(ErrorData *edata)
...
                /*
                 * In a win32 service environment, there is no usable stderr. Capture
                 * anything going there and write it to the eventlog instead.
                 *
                 * If stderr redirection is active, it was OK to write to stderr above
                 * because that's really a pipe to the syslogger process.
                 */
                else if (pgwin32_is_service())
                        write_eventlog(edata->elevel, buf.data, buf.len);
...
void
write_stderr(const char *fmt,...)
...
        /*
         * On Win32, we print to stderr if running on a console, or write to
         * eventlog if running as a service
         */
        if (pgwin32_is_service())       /* Running as a service */
        {
                write_eventlog(ERROR, errbuf, strlen(errbuf));


But pgwin32_is_service() doesn't actually reliably detect whether we're
running as a service - it's a heuristic that also triggers when running
postgres within a windows docker container (presumably because the
container itself is run from within a service?).


ISTM that that's a problem, and is likely to become more of a problem
going forward (assuming that docker on windows will become more
popular).


My opinion is that the whole attempt at guessing whether we are running
as a service is a bad idea. This isn't the first time it has been a
problem, see e.g. [1].

Why don't we instead have pgwin32_doRegister() include a parameter that
indicates we're running as a service and remove all the heuristics?
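
To make that a bit more concrete, here is a rough sketch of what an
explicit flag could look like. The flag name --running-as-service, the
LogDestination enum and both function names are made up for illustration
and don't exist in postgres; the sketch also leaves out the actual
service registration and eventlog calls.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical destinations for startup/error messages. */
typedef enum
{
	DEST_STDERR,
	DEST_EVENTLOG
} LogDestination;

/*
 * Hypothetical flag name, not an existing postgres option. The idea is
 * that the service registration code would add it to the command line
 * it registers, so nothing else ever passes it.
 */
#define SERVICE_FLAG "--running-as-service"

/* Return true only if the explicit flag is present in argv. */
bool
started_as_service(int argc, char **argv)
{
	for (int i = 1; i < argc; i++)
	{
		if (strcmp(argv[i], SERVICE_FLAG) == 0)
			return true;
	}
	return false;
}

/* Pick the destination from the explicit flag alone, no guessing. */
LogDestination
choose_log_destination(int argc, char **argv)
{
	return started_as_service(argc, argv) ? DEST_EVENTLOG : DEST_STDERR;
}
```

With something along these lines, a plain "postgres -D data" started
inside a container would keep writing to stderr regardless of which
memberships the process happens to have, and only a command line that
was registered with the flag would go to the event log.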


I tried to look around to see if there's a simple way to drop the
problematic memberships that trigger pgwin32_is_service() - but there
seem to be no commandline tools doing so (but there are C APIs).

Does anybody have an alternative way of fixing this?


Greetings,

Andres Freund

[1]
commit ff30aec759bdc4de78912d91f650ec8fd95ff6bc
Author: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date:   2017-03-17 11:14:01 +0200

    Fix and simplify check for whether we're running as Windows service.

