Re: [HACKERS] Overhauling our interrupt handling

2015-01-15 Thread Kyotaro HORIGUCHI
Hello,

> > I think I should finilize my commitfest item for this issue, with
> > .. "Rejected"?
> 
> Fine with me.

done.

> > > 0001: Replace walsender's latch with the general shared latch.
> > > 
> > >   New patch that removes ImmediateInteruptOK behaviour from 
> > > walsender. I
> > >   think that's a rather good idea, because walsender currently seems 
> > > to
> > >   assume WaitLatchOrSocket is reentrant - which I don't think is 
> > > really
> > >   guaranteed.
> > >   Hasn't been reviewed yet, but I think it's not far from being
> > >   committable.
> > 
> > Deesn't this patchset containing per-socket basis non-blocking
> > control for win32? It should make the code (above the win32
> > socket layer itself) more simpler.
> 
> I don't think so - we still rely on it unfortunately.

Does "it" mean win32_noblock?  Or the nonblocking bare win32
socket? The win32-per-sock-blkng-cntl patch in the below message
should cover both of them.

http://www.postgresql.org/message-id/54060ae5.5020...@vmware.com

If you are saying it should be a patch separate from this, I'll
do so.

regareds,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Overhauling our interrupt handling

2015-01-15 Thread Andres Freund
Hi,

On 2015-01-15 15:05:08 +0900, Kyotaro HORIGUCHI wrote:
> Hello, I'd synced up this at last.
> 
> I think I should finilize my commitfest item for this issue, with
> .. "Rejected"?

Fine with me.

> > All the patches in the series up to 0008 hav ecommit messages providing
> > more detail. A short description of each patch follows:
> > 
> > 0001: Replace walsender's latch with the general shared latch.
> > 
> >   New patch that removes ImmediateInteruptOK behaviour from walsender. I
> >   think that's a rather good idea, because walsender currently seems to
> >   assume WaitLatchOrSocket is reentrant - which I don't think is really
> >   guaranteed.
> >   Hasn't been reviewed yet, but I think it's not far from being
> >   committable.
> 
> Deesn't this patchset containing per-socket basis non-blocking
> control for win32? It should make the code (above the win32
> socket layer itself) more simpler.

I don't think so - we still rely on it unfortunately.

> > 0004: Process 'die' interrupts while reading/writing from the client socket.
> > 
> >   This is the reason Horiguchi-san started this thread.
> > 
> >   I think the important debate here is whether we think it's
> >   acceptable that there are situations (a full socket buffer, but a
> >   alive connection) where the client previously got an error, but
> >   not anymore afterwards. I think that's much better than having
> >   unkillable connections, but I can see others being of a different
> >   opinion.
> 
> 
> This patch yields a code a bit confusion like following.
> 
> | secure_raw_write(Port *port, const void *ptr, size_t len)
> | {
> ..
> | w = WaitLatchOrSocket(MyLatch,
> | if (w & WL_LATCH_SET)
> ...
> | errno = EINTR;
> | else if (w & WL_SOCKET_WRITEABLE)
> | goto wloop;
> |
> | errno = save_errno;
> 
> The errno set when WL_LATCH_SET always vanishes. Specifically,
> the EINTR set by SIGTERM(WL_LATCH_SET) is overwritten by
> EAGAIN. As the result, pg_terminte_backend() cannot kill the
> backend till the write blocking is released. errno = save_errno
> should be the alternative of the line "errno = EINTR" and I
> confirmed that the change leads to the desirable (as of me)
> behavior.

Ugh, that's the result stupid last minute "cleanup". You're right.

> > 0006: Don't allow immediate interupts during authentication anymore.
> > 
> >   So far we've set ImmediateInterruptOK to true during large parts
> >   of the client authentication - that's not all that pretty,
> >   interrupts might arrive while we're in some system routines.
> > 
> >   Due to patches 0003/0004 we now are able to safely serve
> >   interrupts during client communication which is the major are
> >   where we want to adhere to authentication_timeout.
> > 
> >   I additionally placed some CHECK_FOR_INTERRUPTS()s in some
> >   somewhat randomly chosen places in auth.c. Those don't completely
> >   get back the previous 'appruptness' (?) of reacting to
> >   interrupts, but that's partially for the better, because we don't
> >   interrupt foreign code anymore.
> 
> Simplly as a comment on style, this patch introduces checks of
> ClientAuthInProgress twice successively into
> ProcessInterrupts(). Isn't it better to make it like following?
> 
> | /* As in quickdie, ...
> | if (ClientAuthInProgress)
> | {
> |if (whereToSendOutput == DestRemote) whereToSendOutput = DestNone;
> |ereport(FATAL,

Hm, yes.

> > 0008: Remove remnants of ImmediateInterruptOK handling.
> > 
> >   Now that ImmediateInterruptOK is never set to true anymore, we can
> >   remove related code and comments.
> > 
> >   New and not reviewed.
> 
> walreceiver.c still has WalRcvImmediateInterruptOK as mentioned
> below, apart from whether it should be changed or not, the
> following comment remains.

> | * This is very much like what regular backends do with ImmediateInterruptOK,
> | * ProcessInterrupts() etc.

Yep. As mentioned below, it doesn't use the same infrastructure, so I'd
rather treat this separately. This set is more than big enough.

Thanks for looking!

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Overhauling our interrupt handling

2015-01-14 Thread Kyotaro HORIGUCHI
Hello, I'd synced up this at last.

I think I should finilize my commitfest item for this issue, with
.. "Rejected"?

> This mail is a answer to
> http://archives.postgresql.org/message-id/20150110022542.GF12509%40alap3.anarazel.de
> but I decided that it's a good go move it into a new thread since the
> patchseries has outgrown its original target. It's invasive and touches
> many arcane areas of the code, so that I think more eyes than a long
> deep thread is likely to have, are a good thing.
> 
> As a short description of the goal of the patchseries:
> This started out as steps on the way to be able to safely handle
> terminate_backend() et al when we're reading/writing from the
> client. E.g. because the client is dead and we haven't noticed yet and
> are stuck writing, or because we want to implement a idle_in_transaction
> timeout.
> 
> Making these changes allowed me to see that not doing significant work
> in signal handlers can make code much simpler and more robust. The
> number of bugs postgres had in the past that involved doing too much in
> signal handlers is significant.  Thus I've since extended the
> patchseries to get rid of nearly all nontrivial work in signal
> handlers.

It sounds pretty nice.

> All the patches in the series up to 0008 hav ecommit messages providing
> more detail. A short description of each patch follows:
> 
> 0001: Replace walsender's latch with the general shared latch.
> 
>   New patch that removes ImmediateInteruptOK behaviour from walsender. I
>   think that's a rather good idea, because walsender currently seems to
>   assume WaitLatchOrSocket is reentrant - which I don't think is really
>   guaranteed.
>   Hasn't been reviewed yet, but I think it's not far from being
>   committable.

Deesn't this patchset containing per-socket basis non-blocking
control for win32? It should make the code (above the win32
socket layer itself) more simpler.

> 0002: Use a nonblocking socket for FE/BE communication and block using
>   latches.
> 
>   Has previously been reviewed by Heikki. I think Noah also had a
>   look, although I'm not sure how close that was.
> 
>   I think this can be committed soon.
>
> 0003: Introduce and use infrastructure for interrupt processing during client 
> reads.
> 
>   From here on ImmediateInterruptOK isn't set during client
>   communication. Normal interrupts and sinval/async interrupts are
>   processed outside of signal handlers. Especially the sinval/async
>   greatly simplify the respective code.
> 
>   Has been reviewed by Heikki in an earlier version - I think I
>   addressed all the review comments.
> 
>   I'd like somebody else to look at the sinval/async.c changes
>   before committing. I really like the simplification this change
>   allowed.
> 
>   I think this patch might not be safe without 0004 because I can't
>   convince myself that it's safe to interrupt latch waits with work
>   that actually might also use the same latch (sinval
>   interrupts). But it's easier to review this way as 0004 is quite
>   big.
> 
> 0004: Process 'die' interrupts while reading/writing from the client socket.
> 
>   This is the reason Horiguchi-san started this thread.
> 
>   I think the important debate here is whether we think it's
>   acceptable that there are situations (a full socket buffer, but a
>   alive connection) where the client previously got an error, but
>   not anymore afterwards. I think that's much better than having
>   unkillable connections, but I can see others being of a different
>   opinion.


This patch yields a code a bit confusion like following.

| secure_raw_write(Port *port, const void *ptr, size_t len)
| {
..
| w = WaitLatchOrSocket(MyLatch,
| if (w & WL_LATCH_SET)
...
| errno = EINTR;
| else if (w & WL_SOCKET_WRITEABLE)
| goto wloop;
|
| errno = save_errno;

The errno set when WL_LATCH_SET always vanishes. Specifically,
the EINTR set by SIGTERM(WL_LATCH_SET) is overwritten by
EAGAIN. As the result, pg_terminte_backend() cannot kill the
backend till the write blocking is released. errno = save_errno
should be the alternative of the line "errno = EINTR" and I
confirmed that the change leads to the desirable (as of me)
behavior.


> 0005: Move deadlock and other interrupt handling in proc.c out of signal 
> handlers.
> 
>   I'm surprised how comparatively simple this turned out to be. For
>   robustness and understanding I think this is a great improvement.
> 
>   New and not reviewed at all. Needs significant review. But works
>   in my simple testcases.
> 
> 0006: Don't allow immediate interupts during authentication anymore.
> 
>   So far we've set ImmediateInterruptOK to true during large parts
>   of the client authentication - that's not all that pretty,
>   interrupts might arrive while we're in so