Re: [HACKERS] jacana hung after failing to acquire random number

Andrew Dunstan Mon, 12 Dec 2016 05:41:37 -0800


On 12/12/2016 02:32 AM, Heikki Linnakangas wrote:

On 12/12/2016 05:58 AM, Michael Paquier wrote:
On Sun, Dec 11, 2016 at 9:06 AM, Andrew Dunstan <and...@dunslane.net> wrote:
jacana (mingw, 64 bit compiler, no openssl) is currently hung on "make
check". After starting the autovacuum launcher there are 120 messages in its
log saying "could not acquire random number". Then nothing.


So I suspect the problem here is commit
fe0a0b5993dfe24e4b3bcf52fa64ff41a444b8f1, although I haven't looked in
detail.


Shouldn't the postmaster shut down if it can't get any further? Note that
frogmouth, also mingw, which builds with openssl, doesn't have this issue.
Did you unblock it in some way at the end? Here is the report, for
others:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2016-12-10%2022%3A00%3A15
And here, of course, is the interesting bit:
2016-12-10 17:25:38.822 EST [584c80e2.ddc:2] LOG:  could not acquire random number
2016-12-10 17:25:39.869 EST [584c80e2.ddc:3] LOG:  could not acquire random number
2016-12-10 17:25:40.916 EST [584c80e2.ddc:4] LOG:  could not acquire random number

I am not seeing any problems with MSVC without openssl, so that's a
problem specific to MinGW. I am beginning to wonder if it is actually a
good idea to cache the crypt context and then re-use it. Acquiring a new
context every time would definitely hurt performance, though.
Actually, looking at the config.log on jacana, it's trying to use /dev/urandom:
configure:15028: checking for /dev/urandom
configure:15041: result: yes
configure:15054: checking which random number source to use
configure:15073: result: /dev/urandom

And looking closer at configure.in, I can see why:

  elif test "$PORTNAME" = x"win32" ; then
    USE_WIN32_RANDOM=1
That test is broken. It looks like the x"$VAR" = x"constant" idiom, but the left side of the comparison doesn't have the 'x'. Oops.
Fixed that, let's see if it made jacana happy again.
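To make the bug concrete, here is a minimal shell sketch comparing the broken and fixed forms of the test. It is simplified: the function names are hypothetical, this is not the actual configure.in text, and the real script also checks that /dev/urandom exists before choosing it (assumed here, as jacana's config.log showed). The x prefix is a portability idiom meant to keep an empty value, or one starting with "-", from being misparsed by older test implementations; it only works if both sides carry it.

```shell
#!/bin/sh
# Simplified sketch of the random-source selection (no openssl case).
# Function names are hypothetical; /dev/urandom is assumed to exist.

# Broken: only the right-hand side has the x prefix, so for PORTNAME=win32
# test compares "win32" with "xwin32", which is never equal.
choose_source_broken() {
  portname=$1
  if test "$portname" = x"win32" ; then
    echo "win32"
  else
    echo "/dev/urandom"
  fi
}

# Fixed: both sides carry the x prefix, so "xwin32" = "xwin32" matches.
choose_source_fixed() {
  portname=$1
  if test x"$portname" = x"win32" ; then
    echo "win32"
  else
    echo "/dev/urandom"
  fi
}

echo "broken: $(choose_source_broken win32)"   # prints "broken: /dev/urandom"
echo "fixed:  $(choose_source_fixed win32)"    # prints "fixed:  win32"
```

With the broken form, a win32 build silently falls through to the /dev/urandom branch, which matches what jacana's configure reported.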
This makes me wonder if we should work a bit harder to get a good error message if acquiring a random number fails for any reason. This needs to work in the frontend as well as the backend, but we could still have an elog(LOG, ...) there, inside an #ifndef FRONTEND block.



I see you have now improved the messages in postmaster.c, which is good.

However, the bigger problem (ISTM) is that when this failed I had a system which was running but where every connection immediately failed:


   ============== creating temporary instance            ==============
   ============== initializing database system           ==============
   ============== starting postmaster                    ==============

   pg_regress: postmaster did not respond within 120 seconds
   Examine c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/regress/log/postmaster.log for the reason
   make: *** [check] Error 2

Should one or more of these errors be fatal? Or should we at least get pg_regress to try to shut down the postmaster if it can't connect after 120 seconds?
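The second option, giving up after a timeout and taking the server down rather than leaving it running, could look roughly like the sketch below. This is a hypothetical shell helper illustrating the policy, not pg_regress's actual C code: wait_or_stop polls a readiness command once per second and, if it never succeeds within the limit, runs a stop command and returns failure.

```shell
#!/bin/sh
# Hypothetical helper: wait_or_stop LIMIT CHECK_CMD STOP_CMD
# Polls CHECK_CMD once per second for up to LIMIT seconds. Returns 0 as soon
# as CHECK_CMD succeeds; otherwise runs STOP_CMD and returns 1, so a server
# that never accepts connections is not left running.
wait_or_stop() {
  limit=$1
  check_cmd=$2
  stop_cmd=$3
  i=0
  while [ "$i" -lt "$limit" ]; do
    if sh -c "$check_cmd" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "server did not respond within $limit seconds; stopping it" >&2
  sh -c "$stop_cmd"
  return 1
}
```

Against a real server one might call, for example, wait_or_stop 120 'pg_isready -q -p 5432' 'pg_ctl stop -D "$PGDATA" -m immediate'; the port and data directory there are assumptions. Immediate mode is chosen because a postmaster that rejects every connection is unlikely to complete a clean shutdown.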

[In answer to Michael's question above, I forcibly shut down the postmaster by hand. Otherwise it would still be running, and we would not have got the report on the buildfarm server.]


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
