Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2015-02-23 Thread Andres Freund
Hi,

On 2014-07-26 18:16:01 +0200, Andres Freund wrote:
 On 2014-07-26 11:32:24 -0400, Tom Lane wrote:
  MauMau maumau...@gmail.com writes:
   [ sinval catchup signal - ProcessCatchupEvent - WaitLatch - deadlock ]
  
  Ugh.
  
  One line of thought is that it's pretty unsafe to be doing anything
  as complicated as transaction start/commit in a signal handler, even one
  that is sure it's not interrupting anything else.
 
 Yea, that's really not nice.

MauMau, we don't do this anymore. Could you verify that the issue is
fixed for you?

I'd completely forgotten that this thread made me work on moving
everything complicated out of signal handlers...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-08-12 Thread MauMau

Robert Haas robertmh...@gmail.com writes:

I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
problem.  As the commit message says, it's dead simple.


From: Tom Lane t...@sss.pgh.pa.us

While I have no great objection to back-porting Heikki's patch, it seems
like a very large stretch to call this a root-cause fix.  At best it's
band-aiding one symptom in a rather fragile way.


Thank you, Robert san.  I'll be waiting for it to be back-ported to the next 
9.1/9.2 release.


Yes, I think this failure is only one potential symptom caused by the 
implemnentation mistake -- handling both latch wakeup and other tasks that 
wait on a latch in the SIGUSR1 handler.  Although there may be no such tasks 
now, I'd like to correct and clean up the implementation as follows to avoid 
similar problems in the future.  I think it's enough to do this only for 
9.5.  Please correct me before I go deeper in the wrong direction.


* The SIGUSR1 handler only does latch wakeup.  Any other task is done in 
other signal handlers such as SIGUSR2.  Many daemon postgres processes 
follow this style, but the normal backend, autovacuum daemons, and 
background workers don't now.


* InitializeLatchSupport() in unix_latch.c calls pqsignal(SIGUSR1, 
latch_sigusr1_handler).  Change the argument of latch_sigusr1_handler() 
accordingly.


* Remove SIGUSR1 handler registration and process-specific SIGUSR1 handler 
functions from all processes.  We can eliminate many SIGUSR1 handler 
functions which have the same contents.


Regards
MauMau





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-08-12 Thread Robert Haas
On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Fri, Aug 8, 2014 at 10:30 AM, MauMau maumau...@gmail.com wrote:
 I've tracked down the real root cause.  The fix is very simple.  Please
 check the attached one-liner patch.

 I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
 problem.  As the commit message says, it's dead simple.

 While I have no great objection to back-porting Heikki's patch, it seems
 like a very large stretch to call this a root-cause fix.  At best it's
 band-aiding one symptom in a rather fragile way.

Yeah, that's true, but I'm not clear that we have another
back-patchable fix, so doing something almost-certainly-harmless to
alleviate the immediate pain seems worthwhile.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-08-12 Thread Andres Freund
On 2014-08-12 11:04:00 -0400, Robert Haas wrote:
 On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Fri, Aug 8, 2014 at 10:30 AM, MauMau maumau...@gmail.com wrote:
  I've tracked down the real root cause.  The fix is very simple.  Please
  check the attached one-liner patch.
 
  I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
  problem.  As the commit message says, it's dead simple.
 
  While I have no great objection to back-porting Heikki's patch, it seems
  like a very large stretch to call this a root-cause fix.  At best it's
  band-aiding one symptom in a rather fragile way.
 
 Yeah, that's true, but I'm not clear that we have another
 back-patchable fix, so doing something almost-certainly-harmless to
 alleviate the immediate pain seems worthwhile.

Isn't that still leaving the very related issue of waits due to hot
pruning open?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-08-12 Thread Robert Haas
On Tue, Aug 12, 2014 at 11:06 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-08-12 11:04:00 -0400, Robert Haas wrote:
 On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Fri, Aug 8, 2014 at 10:30 AM, MauMau maumau...@gmail.com wrote:
  I've tracked down the real root cause.  The fix is very simple.  Please
  check the attached one-liner patch.
 
  I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
  problem.  As the commit message says, it's dead simple.
 
  While I have no great objection to back-porting Heikki's patch, it seems
  like a very large stretch to call this a root-cause fix.  At best it's
  band-aiding one symptom in a rather fragile way.

 Yeah, that's true, but I'm not clear that we have another
 back-patchable fix, so doing something almost-certainly-harmless to
 alleviate the immediate pain seems worthwhile.

 Isn't that still leaving the very related issue of waits due to hot
 pruning open?

Yes.  Do you have a back-patchable solution for that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-08-12 Thread Andres Freund
On 2014-08-12 11:56:41 -0400, Robert Haas wrote:
 On Tue, Aug 12, 2014 at 11:06 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2014-08-12 11:04:00 -0400, Robert Haas wrote:
  On Mon, Aug 11, 2014 at 7:40 PM, Tom Lane t...@sss.pgh.pa.us wrote:
   Robert Haas robertmh...@gmail.com writes:
   On Fri, Aug 8, 2014 at 10:30 AM, MauMau maumau...@gmail.com wrote:
   I've tracked down the real root cause.  The fix is very simple.  Please
   check the attached one-liner patch.
  
   I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
   problem.  As the commit message says, it's dead simple.
  
   While I have no great objection to back-porting Heikki's patch, it seems
   like a very large stretch to call this a root-cause fix.  At best it's
   band-aiding one symptom in a rather fragile way.
 
  Yeah, that's true, but I'm not clear that we have another
  back-patchable fix, so doing something almost-certainly-harmless to
  alleviate the immediate pain seems worthwhile.
 
  Isn't that still leaving the very related issue of waits due to hot
  pruning open?
 
 Yes.  Do you have a back-patchable solution for that?

The easiest thing I can think of is sprinkling a few
SetConfigOption('synchronous_commit', 'off',
PGC_INTERNAL, PGC_S_OVERRIDE,
GUC_ACTION_LOCAL, true, ERROR);
in some strategic places. From a quick look:
* all of autovacuum
* locally in ProcessCompletedNotifies
* locally in ProcessIncomingNotify
* locally in ProcessCatchupEvent
* locally in InitPostgres

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-08-12 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-08-12 11:56:41 -0400, Robert Haas wrote:
 Yes.  Do you have a back-patchable solution for that?

 The easiest thing I can think of is sprinkling a few
 SetConfigOption('synchronous_commit', 'off',
 PGC_INTERNAL, PGC_S_OVERRIDE,
 GUC_ACTION_LOCAL, true, ERROR);

This still seems to me to be applying a band-aid that covers over some
symptoms; it's not dealing with the root cause that we've overloaded
the signal handling mechanism too much.   What reason is there to think
that there are no other symptoms of that?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-08-12 Thread Andres Freund
On 2014-08-12 13:11:55 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-08-12 11:56:41 -0400, Robert Haas wrote:
  Yes.  Do you have a back-patchable solution for that?
 
  The easiest thing I can think of is sprinkling a few
  SetConfigOption('synchronous_commit', 'off',
  PGC_INTERNAL, PGC_S_OVERRIDE,
  GUC_ACTION_LOCAL, true, ERROR);
 
 This still seems to me to be applying a band-aid that covers over some
 symptoms; it's not dealing with the root cause that we've overloaded
 the signal handling mechanism too much.   What reason is there to think
 that there are no other symptoms of that?

I'm not arguing against fixing that. I think we need to do both,
although I am wary of fixing the signal handling in the back
branches. Fixing the signal handling won't get rid of the problem that
one e.g. might not be able to log in anymore if all synchronous standbys
are down and login caused hot pruning.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-08-11 Thread Robert Haas
On Fri, Aug 8, 2014 at 10:30 AM, MauMau maumau...@gmail.com wrote:
 I've tracked down the real root cause.  The fix is very simple.  Please
 check the attached one-liner patch.

 I confirmed that the fix is already in 9.3 and 9.5devel, so I just copied
 the code fragment from 9.5devel to 9.2.9.  The attached patch is for 9.2.9.
 I didn't check 9.4 and other versions.  Why wasn't the fix applied to 9.2?

It was considered a performance improvement - i.e. a feature - rather
than a bug fix, so it was only added to master.  That was after the
release of 9.2 and before the release of 9.3, so it's in newer
branches but not older ones.

Author: Heikki Linnakangas heikki.linnakan...@iki.fi
Branch: master Release: REL9_3_BR [c9d7dbacd] 2013-01-29 10:43:33 +0200

Skip truncating ON COMMIT DELETE ROWS temp tables, if the transaction hasn't
touched any temporary tables.

We could try harder, and keep track of whether we've inserted to any temp
tables, rather than accessed them, and which temp tables have been inserted
to. But this is dead simple, and already covers many interesting scenarios.

I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
problem.  As the commit message says, it's dead simple.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-08-11 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Fri, Aug 8, 2014 at 10:30 AM, MauMau maumau...@gmail.com wrote:
 I've tracked down the real root cause.  The fix is very simple.  Please
 check the attached one-liner patch.

 I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
 problem.  As the commit message says, it's dead simple.

While I have no great objection to back-porting Heikki's patch, it seems
like a very large stretch to call this a root-cause fix.  At best it's
band-aiding one symptom in a rather fragile way.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-08-08 Thread MauMau
I've tracked down the real root cause.  The fix is very simple.  Please 
check the attached one-liner patch.


The cause is that the temporary relations are truncated unconditionally 
regardless of whether they are accessed in the transaction or not.  That is, 
the following sequence of steps result in the hang:


1. A session creates a temporary table with ON COMMIT DELETE ROWS.  It adds 
the temporary table to the list of relations that should be truncated at 
transaction commit.


2. The session receives a sinval catchup signal (SIGUSR1) from another 
session.  It starts a transaction and processes sinval messages in the 
SIGUSR1 signal handler.  No WAL is output while processing the sinval 
messages.


3. When the transaction commits, the list of temporary relations are checked 
to see if they need to be truncated.


4. The temporary table created in step 1 is truncated.  To truncate a 
relation, Access Exclusive lock is acquired on it.  When hot standby is 
used, acquiring an Access Exclusive lock generates a WAL record 
(RM_STANDBY_ID, XLOG_STANDBY_LOCK).


5. The transaction waits on a latch for a reply from a synchronous standby, 
because it wrote some WAL.  But the latch wait never returns, because the 
latch needs to receive SIGUSR1 but the SIGUSR1 handler is already in 
progress from step 2.



The correct behavior is for the transaction not to truncate the temporary 
table in step 4, because the transaction didn't use the temporary table.


I confirmed that the fix is already in 9.3 and 9.5devel, so I just copied 
the code fragment from 9.5devel to 9.2.9.  The attached patch is for 9.2.9. 
I didn't check 9.4 and other versions.  Why wasn't the fix applied to 9.2?


Finally, I found a very easy way to reproduce the problem:

1. On terminal session 1, start psql and run:
 CREATE TEMPORARY TABLE t (c int) ON COMMIT DELETE ROWS;
Leave the psql session open.

2. On terminal session 2, run:
 pgbench -c8 -t500 -s1 -n -f test.sql dbname
[test.sql]
CREATE TEMPORARY TABLE t (c int) ON COMMIT DELETE ROWS;
DROP TABLE t;

3. On the psql session on terminal session 1, run any SQL statement.  It 
doesn't reply.  The backend is stuck at SyncRepWaitForLSN().


Regards
MauMau


sinval_catchup_hang_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-28 Thread MauMau

From: MauMau maumau...@gmail.com
I must add one thing.  After some client processes closed the connection 
without any hang, their server processes were stuck with a stack trace 
like this (I'll look for and show the exact stack trace tomorrow):


I found two kinds of stack traces:

#0  0x003199ec488f in poll () from /lib64/libc.so.6
#1  0x00609f24 in WaitLatchOrSocket ()
#2  0x0063ad92 in SyncRepWaitForLSN ()
#3  0x004ad474 in CommitTransaction ()
#4  0x004aef53 in CommitTransactionCommand ()
#5  0x0064b547 in shmem_exit ()
#6  0x0064b625 in proc_exit_prepare ()
#7  0x0064b6a8 in proc_exit ()
#8  0x00668a94 in PostgresMain ()
#9  0x00617f2c in ServerLoop ()
#10 0x0061ae96 in PostmasterMain ()
#11 0x005b2ccf in main ()

#0  0x003f4badf258 in poll () from /lib64/libc.so.6
#1  0x00619b94 in WaitLatchOrSocket ()
#2  0x00640c4c in SyncRepWaitForLSN ()
#3  0x00491c18 in RecordTransactionCommit ()
#4  0x00491d98 in CommitTransaction ()
#5  0x00493135 in CommitTransactionCommand ()
#6  0x00653fc5 in ProcessCatchupEvent ()
#7  0x006540ed in HandleCatchupInterrupt ()
#8  0x006533e3 in procsignal_sigusr1_handler ()
#9  signal handler called
#10 0x003f4bae96b0 in recv () from /lib64/libc.so.6
#11 0x005b75f6 in secure_read ()
#12 0x005c223b in pq_recvbuf ()
#13 0x005c263b in pq_getbyte ()
#14 0x0066e081 in PostgresMain ()
#15 0x00627d81 in PostmasterMain ()
#16 0x005c4803 in main ()


I'll try the fix tomorrow if possible.  What kind of problems do you hink 
of for back-patching?


I could reproduce the problem with 9.2.8, but have not yet with 9.5dev. 
I'll try with 9.2.9, and create the fix.


Regards
MauMau




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-28 Thread Andres Freund
On 2014-07-26 20:20:05 +0200, Andres Freund wrote:
 On 2014-07-26 13:58:38 -0400, Tom Lane wrote:
 
  Andres Freund and...@2ndquadrant.com writes:
   That'd require either renegging on SA_RESTART or
   using WaitLatchOrSocket() and nonblocking send/recv.
  
  Yeah, I was wondering about using WaitLatchOrSocket for client I/O too.
  We already have a hook that lets us do the actual recv even when using
  OpenSSL, and in principle that function could do interrupt-service-like
  functions if it got kicked off the recv().
 
 I've started playing with this. Looks clearly worthwile.
 
 I think if we do it right we pretty much can get rid of the whole
 prepare_for_client_read() machinery and handle everything via
 ProcessInterrupts(). EnableCatchupInterrupt() et al don't really fill me
 with joy.

One thing I am wondering about around this is: Why are we only
processing catchup events when DoingCommandRead? There's other paths
where we can wait for data from the client for a long time. Obviously we
don't want to process async.c stuff from inside copy, but I don't see
why that's the case for sinval.c.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-28 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 One thing I am wondering about around this is: Why are we only
 processing catchup events when DoingCommandRead? There's other paths
 where we can wait for data from the client for a long time. Obviously we
 don't want to process async.c stuff from inside copy, but I don't see
 why that's the case for sinval.c.

It might be all right to do it during copy, but I'd just as soon treat
that as a separate issue.  If you merge it into the basic patch then it
might be hard to get rid of if there are problems.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-28 Thread Andres Freund
On 2014-07-28 15:29:57 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  One thing I am wondering about around this is: Why are we only
  processing catchup events when DoingCommandRead? There's other paths
  where we can wait for data from the client for a long time. Obviously we
  don't want to process async.c stuff from inside copy, but I don't see
  why that's the case for sinval.c.
 
 It might be all right to do it during copy, but I'd just as soon treat
 that as a separate issue.  If you merge it into the basic patch then it
 might be hard to get rid of if there are problems.

Yea, not planning to merge it. Just wondering to make sure I understand
all the implications.

Another thing I'm wondering about - also not for the basic patch - is
accepting termination while writing to the client. It's rather annoying
that we currently don't allow to pg_terminate_backend() when writing to
the client.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-26 Thread MauMau

From: Robert Haas robertmh...@gmail.com

I think the problem here is that it actually is possible for one
session to access the temporary objects of another session:
Now, we could prohibit that specific thing.  But at the very least, it
has to be possible for one session to drop another session's temporary
objects, because autovacuum does it eventually, and superusers will
want to do it sooner to shut autovacuum up.  So it's hard to reason
about whether and to what extent it's safe to not send sinval messages
for temporary objects.


I was a bit surprised to know that one session can access the data of 
another session's temporary tables.  That implenentation nay be complicating 
the situation -- extra sinval messages.




I think you might be approaching this problem from the wrong end,
though.  The question in my mind is: why does the
StartTransactionCommand() / CommitTransactionCommand() pair in
ProcessCatchupEvent() end up writing a commit record?  The obvious
possibility that occurs to me is that maybe rereading the invalidated
catalog entries causes a HOT prune, and maybe there ought to be some
way for a transaction that has only done HOT pruning to commit
asynchronously, just as we already do for transactions that only
modify temporary tables.  Or, failing that, maybe there's a way to
suppress synchronous commit for this particular transaction.


I could figure out what log record was output in the transaction started in 
ProcessCatchupEvent() by inserting elog() in XLogInsert().  The log record 
was (RM_STANDBY_ID, XLOG_STANDBY_LOCK).


The cause of the hang turned out clear.  It was caused as follows:

1. When a transaction commits which used a temporary table created with ON 
COMMIT DELETE ROWS, the sinval catchup signal (SIGUSR1) was issued from 
smgrtruncate().  This is because the temporary table is truncated at 
transaction end.


2. Another session, which is waiting for a client request, receives SIGUSR1. 
It calls ProcessCatchupEvent().


3. ProcessCatchupEvent() calls StartTransactionCommand(), emits the 
XLOG_STANDBY_LOCK WAL record, and then calls CommitTransactionCommand().


4. It then calls SyncRepWaitForLSN(), which in turn waits on the latch.

5. But the WaitLatch() never returns, because the session is already running 
inside the SIGUSR1 handler in step 2.  WaitLatch() needs SIGUSR1 to 
complete.


I think there is a problem with the latch and SIGUSR1 mechanism.  How can we 
fix this problem?


Regards
MauMau




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-26 Thread MauMau

From: Andres Freund and...@2ndquadrant.com

I think we should do what the first paragraph in
http://archives.postgresql.org/message-id/20140707155113.GB1136%40alap3.anarazel.de
outlined. As Tom says somewhere downthread that requires some code
review, but other than that it should get rid of a fair amount of
problems.


As mentioned in the mail I've just sent,  there seems to be a problem around 
the latch and/or sinval catchup implementation.


Or, is it bad that many things are done in SIGUSR1 handler?  If some 
processing in SIGUSR1 handler requires waiting on a latch, it hangs at 
WaitLatch().  Currently, the only processing in the backend which requires a 
latch may be to wait for the sync standby.  However, in the future, the 
latch may be used for more tasks.


Another problem is, who knows WaitLatch() can return prematurely (before the 
actual waited-for event does SetLatch()) due to the SIGUSR1 issued for 
sinval catchup event?


How should we tackle these problem?

Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-26 Thread Tom Lane
MauMau maumau...@gmail.com writes:
 [ sinval catchup signal - ProcessCatchupEvent - WaitLatch - deadlock ]

Ugh.

One line of thought is that it's pretty unsafe to be doing anything
as complicated as transaction start/commit in a signal handler, even one
that is sure it's not interrupting anything else.  The only excuse
ProcessCatchupEvent has for that is that it's trying to ensure proper
cleanup from an error.  Perhaps with closer analysis we could convince
ourselves that errors thrown from AcceptInvalidationMessages() wouldn't
need transaction abort cleanup.

For that matter, it's not exactly clear that an error thrown out of
there would result in good things even with the transaction
infrastructure.  Presuming that we're waiting for client input, an
error longjmp out of ProcessCatchupEvent could mean taking control
away from OpenSSL, and I bet that won't end well.  Maybe we should
just be doing

PG_TRY
AcceptInvalidationMessages();
PG_CATCH
elog(FATAL, curl up and die);

ProcessIncomingNotify is also using
StartTransactionCommand()/CommitTransactionCommand(), so that would
need some thought too.

Or we could try to get rid of the need to do anything beyond setting
a flag in the interrupt handler itself.  But I'm afraid that's probably
unworkable, especially now that we use SA_RESTART on all signals.

Another line of thought is that we've been way too uncritical about
shoving different kinds of events into the SIGUSR1 multiplexor.
It might be a good idea to separate high level interrupts from
low level ones, using say SIGUSR1 for the former and SIGUSR2
for the latter.  However, that doesn't sound very back-patchable,
even assuming that we can come up with a clean division.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-26 Thread Andres Freund
On 2014-07-26 11:32:24 -0400, Tom Lane wrote:
 MauMau maumau...@gmail.com writes:
  [ sinval catchup signal - ProcessCatchupEvent - WaitLatch - deadlock ]
 
 Ugh.
 
 One line of thought is that it's pretty unsafe to be doing anything
 as complicated as transaction start/commit in a signal handler, even one
 that is sure it's not interrupting anything else.

Yea, that's really not nice.

 The only excuse
 ProcessCatchupEvent has for that is that it's trying to ensure proper
 cleanup from an error.  Perhaps with closer analysis we could convince
 ourselves that errors thrown from AcceptInvalidationMessages() wouldn't
 need transaction abort cleanup.

Hm. I'm not convinced that's going to be easy.

Wouldn't it be better to move the catchup interrupt processing out of
the signal handler? For normal backends we only enable when reading from
the client and DoingCommandRead is set. How about setting a variable in
the signal handler and doing the actual catchup processing after the
recv() returned EINTR? That'd require either renegging on SA_RESTART or
using WaitLatchOrSocket() and nonblocking send/recv.  I think that'd be
a much safer model and after researching it a bit for the idle in
transaction timeout thing it doesn't look super hard. Even windows seems
to already support everything necessary.

 Or we could try to get rid of the need to do anything beyond setting
 a flag in the interrupt handler itself.  But I'm afraid that's probably
 unworkable, especially now that we use SA_RESTART on all signals.

Yea :(. Several platforms actually ignore SA_RESTART for
send/recv/select/... under some circumstances (notably linux), but it'd
probably be hard to get it right for all.

I don't think we can continue to use the blocking calls for much longer
unless we allow them to be interruptible. But I doubt that change would
be backpatchable.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-26 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 Wouldn't it be better to move the catchup interrupt processing out of
 the signal handler? For normal backends we only enable when reading from
 the client and DoingCommandRead is set. How about setting a variable in
 the signal handler and doing the actual catchup processing after the
 recv() returned EINTR?

Only it won't.  See SA_RESTART.  I think turning that off is a nonstarter,
as per previous discussions.

 That'd require either renegging on SA_RESTART or
 using WaitLatchOrSocket() and nonblocking send/recv.

Yeah, I was wondering about using WaitLatchOrSocket for client I/O too.
We already have a hook that lets us do the actual recv even when using
OpenSSL, and in principle that function could do interrupt-service-like
functions if it got kicked off the recv().

Anything in this line is going to be a bigger change than I'd want to
back-patch, though.  Are we OK with not fixing the problem in the back
branches?  Given the shortage of field complaints, that might be all
right.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-26 Thread Andres Freund
On 2014-07-26 13:58:38 -0400, Tom Lane wrote:

 Andres Freund and...@2ndquadrant.com writes:
  That'd require either renegging on SA_RESTART or
  using WaitLatchOrSocket() and nonblocking send/recv.
 
 Yeah, I was wondering about using WaitLatchOrSocket for client I/O too.
 We already have a hook that lets us do the actual recv even when using
 OpenSSL, and in principle that function could do interrupt-service-like
 functions if it got kicked off the recv().

I've started playing with this. Looks clearly worthwile.

I think if we do it right we pretty much can get rid of the whole
prepare_for_client_read() machinery and handle everything via
ProcessInterrupts(). EnableCatchupInterrupt() et al don't really fill me
with joy.

I'm not yet entirely sure where the interrupt processing should happen,
but I guess that'll fall out of the work at some point. The important
bit imo is to *not* not do anything but return with BIO_set_retry_*()
from my_sock_read/write(). That then allows us to implement stuff like
the idle transaction timeout with much fewer problems.

I probably won't finish doing this before leaving on holidays, so nobody
should hesitate to look themselves if interested. If not, I plan to pick
this up again.  I think it's a prerequisite to getting rid of the FATAL
for recovery conflict interrupts which I really would like to do.

 Anything in this line is going to be a bigger change than I'd want to
 back-patch, though.

Agreed. I don't think it will, but it very well could have performance
implications. Besides the obvious risk of bugs...

 Are we OK with not fixing the problem in the back
 branches?  Given the shortage of field complaints, that might be all
 right.

I'm not really comfortable with that. How about simply flagging a couple
contexts to not do the SyncRepWaitForLsn() dance? Possibly just by doing
something ugly like
SetConfigOption('synchronous_commit', 'off', PGC_INTERNAL,
PGC_S_OVERRIDE, GUC_ACTION_LOCAL, true, ERROR)?
during startup, inval and similar transaction commands? Not pretty, but
it looks simple enough to be backpatchable.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-26 Thread MauMau

From: Tom Lane t...@sss.pgh.pa.us

[ sinval catchup signal - ProcessCatchupEvent - WaitLatch - deadlock ]


I must add one thing.  After some client processes closed the connection 
without any hang, their server processes were stuck with a stack trace like 
this (I'll look for and show the exact stack trace tomorrow):


WaitLatchOrSocket
SyncRepWaitForLSN
CommitTransaction
CommitTransactionCommand
ProcessCatchupEvent
...
shmem_exit
proc_exit_prepare
proc_exit
PostgresMain
...

The process appears to be hanging during session termination.  So, it's not 
the problem only during client request wait.




Another line of thought is that we've been way too uncritical about
shoving different kinds of events into the SIGUSR1 multiplexor.
It might be a good idea to separate high level interrupts from
low level ones, using say SIGUSR1 for the former and SIGUSR2
for the latter.  However, that doesn't sound very back-patchable,
even assuming that we can come up with a clean division.


This seems to be one step in the right direction.  There are two issues in 
the current implementation:


[Issue 1]
[ sinval catchup signal - ProcessCatchupEvent - WaitLatch - deadlock ]
This is (partly) because the latch wakeup and other processing use the same 
SIGUSR1 in normal backend, autovacuum launcher/worker, and the background 
worker with database access.  On the other hand, other background daemon 
processes properly use SIGUSR1 only for latch wakeup, and SIGUSR2 for other 
tasks.


[Issue 2]
WaitLatch() returns prematurely due to the sinval catchup signal, even 
though the target event (e.g. reply from standby) hasn't occurred and called 
SetLatch() yet.  This is because procsignal_sigusr1_handler() always calls 
latch_sigusr1_handler().

This is probably not related to the cause of the hang.

So, as you suggest, I think it would be a good idea to separate 
latch_sigusr1_handler() call into a different function solely for it like 
other daemon processes,  and leave the rest in procsignal_sigusr1_handler() 
and rename function to procsignal_sigusr2_handler() or procsignal_handler(). 
Originally, it's not natural that the current procsignal_sigusr1_handler() 
contains latch_sigusr1_handler() call, because latch processing is not based 
on the procsignal mechanism (SetLatch() doesn't call SendProcSignal()).


I'll try the fix tomorrow if possible.  What kind of problems do you hink of 
for back-patching?


Regards
MauMau




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-24 Thread Robert Haas
On Wed, Jul 23, 2014 at 12:17 PM, MauMau maumau...@gmail.com wrote:
 From: Tom Lane t...@sss.pgh.pa.us
 This seems like a pretty unsafe suggestion, because the smgr level doesn't
 know or care whether relations are temp; files are files.  In any case it
 would only paper over one specific instance of whatever problem you're
 worried about, because sinval messages definitely do need to be sent in
 general.

 I'm sorry I don't show the exact problem yet.  Apart from that, I understood
 that you insist it's not appropriate for smgr to be aware of whether the
 file is a temporary relation, in terms of module layering.  However, it
 doesn't seem so in the current implementation.  md.c, which is a layer under
 or part of smgr, uses SmgrIsTemp().  In addition, as the name suggests,
 SmgrIsTemp() is a function of smgr, which is defined in smgr.h.  So, it's
 not inappropriate for smgr to use it.

 What I wanted to ask is whether and why sinval messages are really necessary
 for session-private objects like temp relations.  I thought shared inval is,
 as the name indicates, for objects shared among sessions.  If so, sinval
 for session-private objects doesn't seem to match the concept.

I think the problem here is that it actually is possible for one
session to access the temporary objects of another session:

rhaas=# create temp table fructivore (a int);
CREATE TABLE
rhaas=# select 'fructivore'::regclass::oid;
  oid
---
 24578
(1 row)

Switch windows:

rhaas=# select 24578::regclass;
   regclass
--
 pg_temp_2.fructivore
(1 row)

rhaas=# alter table pg_temp_2.fructivore add column b int;
ALTER TABLE

Now, we could prohibit that specific thing.  But at the very least, it
has to be possible for one session to drop another session's temporary
objects, because autovacuum does it eventually, and superusers will
want to do it sooner to shut autovacuum up.  So it's hard to reason
about whether and to what extent it's safe to not send sinval messages
for temporary objects.

I think you might be approaching this problem from the wrong end,
though.  The question in my mind is: why does the
StartTransactionCommand() / CommitTransactionCommand() pair in
ProcessCatchupEvent() end up writing a commit record?  The obvious
possibility that occurs to me is that maybe rereading the invalidated
catalog entries causes a HOT prune, and maybe there ought to be some
way for a transaction that has only done HOT pruning to commit
asynchronously, just as we already do for transactions that only
modify temporary tables.  Or, failing that, maybe there's a way to
suppress synchronous commit for this particular transaction.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-24 Thread Andres Freund
On 2014-07-24 11:17:15 -0400, Robert Haas wrote:
 I think you might be approaching this problem from the wrong end,
 though.

Yep.

  The question in my mind is: why does the
 StartTransactionCommand() / CommitTransactionCommand() pair in
 ProcessCatchupEvent() end up writing a commit record?  The obvious
 possibility that occurs to me is that maybe rereading the invalidated
 catalog entries causes a HOT prune, and maybe there ought to be some
 way for a transaction that has only done HOT pruning to commit
 asynchronously, just as we already do for transactions that only
 modify temporary tables.  Or, failing that, maybe there's a way to
 suppress synchronous commit for this particular transaction.

I think we should do what the first paragraph in
http://archives.postgresql.org/message-id/20140707155113.GB1136%40alap3.anarazel.de
outlined. As Tom says somewhere downthread that requires some code
review, but other than that it should get rid of a fair amount of
problems.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-23 Thread Tom Lane
MauMau maumau...@gmail.com writes:
 Looking at smgrtruncate(), the sinval message is sent even when the 
 truncated relation is a temporary relation.  However, I think the sinval 
 message is not necessary for temp relations, because each session doesn't 
 see the temp relations of other sessions.

This seems like a pretty unsafe suggestion, because the smgr level doesn't
know or care whether relations are temp; files are files.  In any case it
would only paper over one specific instance of whatever problem you're
worried about, because sinval messages definitely do need to be sent in
general.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Should smgrtruncate() avoid sending sinval message for temp relations

2014-07-23 Thread MauMau

From: Tom Lane t...@sss.pgh.pa.us

This seems like a pretty unsafe suggestion, because the smgr level doesn't
know or care whether relations are temp; files are files.  In any case it
would only paper over one specific instance of whatever problem you're
worried about, because sinval messages definitely do need to be sent in
general.


I'm sorry I don't show the exact problem yet.  Apart from that, I understood 
that you insist it's not appropriate for smgr to be aware of whether the 
file is a temporary relation, in terms of module layering.  However, it 
doesn't seem so in the current implementation.  md.c, which is a layer under 
or part of smgr, uses SmgrIsTemp().  In addition, as the name suggests, 
SmgrIsTemp() is a function of smgr, which is defined in smgr.h.  So, it's 
not inappropriate for smgr to use it.


What I wanted to ask is whether and why sinval messages are really necessary 
for session-private objects like temp relations.  I thought shared inval is, 
as the name indicates, for objects shared among sessions.  If so, sinval 
for session-private objects doesn't seem to match the concept.


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers