> openSSL v9.8.2 on Windows 2000 Server in support of PostgreSQL 8.2 - my
> original bug report to the Postgres developers can be found at:
> http://archives.postgresql.org/pgsql-bugs/2006-12/msg00122.php
>
> A follow-up to my report posted by Tom Lane, a member of Postgres' core
> dev team, includes the following:
>
> [quote]
> [ pokes around... ] I'm inclined to say it's an OpenSSL bug --- the
> message string corresponds to WSAEINTR, which to the extent I know
> anything about Windows (which is admittedly nil) seems like it should be
> treated the same as EINTR on Unix, ie, retry. That is how Postgres
> treats it on regular unsecured socket connections. But a look in the
> OpenSSL sources finds no indication that they treat it specially,
> which means they're going to think it's a hard error.
> [/quote]
>
> Basically, I've had my server brought down on three occasions by full
> hard drives. They fill with Postgres' log files, logging the error
> message: "SSL SYSCALL error: A blocking operation was interrupted by a
> call to WSACancelBlockingCall." As I understand Tom's follow-up to my
> bug report, some part of my client-server interaction is causing what
> should be a soft error, but OpenSSL reports it to Postgres as a hard
> error, and Postgres in turn dutifully logs it... again and again and again.
According to MSDN, this error occurs only when WSACancelBlockingCall is called. If we assume that it is OpenSSL that calls WSACancelBlockingCall, then note that it does so just before WSACleanup, which in turn "terminates use of the winsock interface", after which all Winsock calls will fail. Googling also suggests that the error code in question can be returned if the socket is closed elsewhere. In other words, under these premises there is no reason to treat it as "a soft error."

One can argue that OpenSSL shouldn't call BIO_sock_cleanup (that's where WSACancelBlockingCall and WSACleanup are called), at least not the function as it stands, unless it knows that the application is exiting [but it never knows]. And it shouldn't have registered it as a signal handler. The latter can also be the cause of the problem, i.e. OpenSSL simply overrides the application's signal handler: if the application sets one, the signal won't be handled the way the application expects. Even though signal(3)-ing on Windows is something of a misnomer...

I can see that Postgres aliases WSAEINTR to EINTR, but I can't find any evidence that this error code is in any way equivalent to Unix EINTR, or that it should be treated in a "just instantly retry the same call" manner rather than as a fatal condition. I'd insist on starting by removing signal(3) from b_sock.c [according to http://cvs.openssl.org/chngview?cn=16556].

A.
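
To make the handler-override concern concrete, here is a minimal, portable C sketch. It is not OpenSSL code: every name in it (app_handler, lib_handler, run_scenario) is hypothetical, and SIGINT is chosen only for illustration. It assumes nothing beyond standard signal() and raise(), and shows that when a library installs its own handler after the application has installed one, the application's handler is never invoked.

```c
#include <signal.h>

/* Hypothetical counters: how often each handler was invoked. */
static volatile sig_atomic_t app_calls = 0;
static volatile sig_atomic_t lib_calls = 0;

/* Handler the application installs for its own purposes. */
static void app_handler(int sig) { (void)sig; app_calls++; }

/* Handler a library installs without saving the previous one
 * (stand-in for a cleanup routine registered via signal()). */
static void lib_handler(int sig) { (void)sig; lib_calls++; }

/*
 * Installs the application's handler, optionally lets the "library"
 * replace it, delivers one SIGINT, and returns how many times the
 * application's handler ran.
 */
int run_scenario(int lib_installs_handler)
{
    app_calls = 0;
    lib_calls = 0;

    signal(SIGINT, app_handler);          /* application sets its handler */
    if (lib_installs_handler)
        signal(SIGINT, lib_handler);      /* library silently replaces it */

    raise(SIGINT);                        /* handler runs before raise() returns */

    signal(SIGINT, SIG_DFL);              /* restore default disposition */
    return (int)app_calls;
}
```

Calling run_scenario(1) models the library registering its own handler, and run_scenario(0) models the library leaving signal handling alone, which is what removing signal(3) from b_sock.c would amount to in this simplified picture.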
