On 12/12/2016 02:32 AM, Heikki Linnakangas wrote:
On 12/12/2016 05:58 AM, Michael Paquier wrote:
On Sun, Dec 11, 2016 at 9:06 AM, Andrew Dunstan <and...@dunslane.net> wrote:

jacana (mingw, 64 bit compiler, no openssl) is currently hung on "make
check". After starting the autovacuum launcher there are 120 messages in its
log about "Could not acquire random number". Then nothing.

So I suspect the problem here is commit
fe0a0b5993dfe24e4b3bcf52fa64ff41a444b8f1, although I haven't looked in

Shouldn't we want the postmaster to shut down if it's not going to go
further? Note that frogmouth, also mingw, which builds with openssl, doesn't
have this issue.

Did you unlock it in some way at the end? Here is the shape of the
report for others:
And here is of course the interesting bit:
2016-12-10 17:25:38.822 EST [584c80e2.ddc:2] LOG:  could not acquire
random number
2016-12-10 17:25:39.869 EST [584c80e2.ddc:3] LOG:  could not acquire
random number
2016-12-10 17:25:40.916 EST [584c80e2.ddc:4] LOG:  could not acquire
random number

I am not seeing any problems with MSVC without openssl, so this is a
problem specific to MinGW. I am starting to wonder if it is actually a
good idea to cache the crypt context and then re-use it. Using a new
context every time would definitely not be good performance-wise, though.

Actually, looking at the config.log on jacana, it's trying to use /dev/urandom:

configure:15028: checking for /dev/urandom
configure:15041: result: yes
configure:15054: checking which random number source to use
configure:15073: result: /dev/urandom

And looking closer at configure.in, I can see why:

  elif test "$PORTNAME" = x"win32" ; then

That test is broken. It looks like the x"$VAR" = x"constant" idiom, but the left side of the comparison doesn't have the 'x'. Oops.
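
To make the failure mode concrete, here is a minimal shell sketch of the two comparisons. The value assigned to PORTNAME is assumed for illustration, and the "fixed" form simply completes the idiom described above; it is not necessarily the exact change that was committed.

```shell
#!/bin/sh
# PORTNAME is the variable from the quoted configure.in line;
# the value is assumed here for illustration.
PORTNAME="win32"

# Broken form: the left side lacks the 'x' prefix, so the test
# compares "win32" against "xwin32" and can never succeed.
if test "$PORTNAME" = x"win32"; then
  broken="match"
else
  broken="no match"
fi

# Completed idiom: both sides carry the 'x' prefix.
if test x"$PORTNAME" = x"win32"; then
  fixed="match"
else
  fixed="no match"
fi

echo "broken test: $broken"
echo "fixed test:  $fixed"
```

With the broken form the win32 branch is never taken, which would explain why configure fell through to /dev/urandom on jacana.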

Fixed that, let's see if it made jacana happy again.

This makes me wonder if we should work a bit harder to get a good error message if acquiring a random number fails for any reason. This needs to work in the frontend as well as the backend, but we could still have an elog(LOG, ...) there, inside an #ifndef FRONTEND block.

I see you have now improved the messages in postmaster.c, which is good.

However, the bigger problem (ISTM) is that when this failed I had a system which was running but where every connection immediately failed:

   ============== creating temporary instance            ==============
   ============== initializing database system           ==============
   ============== starting postmaster                    ==============

   pg_regress: postmaster did not respond within 120 seconds
 for the reason
   make: *** [check] Error 2

Should one or more of these errors be fatal? Or should we at least get pg_regress to try to shut down the postmaster if it can't connect after 120 seconds?
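
As a rough illustration of the second option, here is a shell sketch of "start a server, wait for it to become ready, and stop it on timeout". It is not pg_regress code (pg_regress is written in C); the function name start_and_wait and the readiness-file convention are hypothetical stand-ins for however readiness would really be detected.

```shell
#!/bin/sh
# Hypothetical helper: start_and_wait READY_FILE TIMEOUT_SECONDS COMMAND [ARGS...]
# Runs COMMAND in the background and polls once per second for READY_FILE.
# Returns 0 if the file appears, 2 if the command exits early, and 1 on
# timeout, in which case the background process is stopped first.
start_and_wait() {
  ready_file=$1
  timeout=$2
  shift 2

  "$@" &
  pid=$!

  waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if [ -e "$ready_file" ]; then
      return 0
    fi
    if ! kill -0 "$pid" 2>/dev/null; then
      return 2   # the process died before becoming ready
    fi
    sleep 1
    waited=$((waited + 1))
  done

  # Timed out: do not leave the process running behind us.
  kill "$pid" 2>/dev/null || true
  wait "$pid" 2>/dev/null || true
  return 1
}
```

The point of the sketch is only the last branch: on timeout the caller stops the process it started, so a failed run does not leave a server behind.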

[In answer to Michael's question above, I forcibly shut down the postmaster by hand. Otherwise it would still be running, and we would not have got the report on the buildfarm server.]


