[HACKERS] clog_redo causing very long recovery time

2011-05-02 Thread Joseph Conway
I'm working with a client that uses Postgres on what amounts to an
appliance.

The database is therefore subject to occasional torture such as, in this
particular case, running out of disk space while performing a million
plus queries (of mixed varieties, many using plpgsql with exception
handling -- more on that later), and eventually being power-cycled. Upon
restart, clog_redo was called approx 885000 times (CLOG_ZEROPAGE) during
recovery, which took almost 2 hours on their hardware. I should note
that this is on Postgres 8.3.x.

After studying the source, I can only see one possible way that this
could have occurred:

In varsup.c:GetNewTracsactionId(), ExtendCLOG() needs to succeed on a
freshly zeroed clog page, and ExtendSUBTRANS() has to fail. Both of
these calls can lead to a page being pushed out of shared buffers and to
disk, so given a lack of disk space, sufficient clog buffers, but lack
of subtrans buffers, this could happen. At that point the transaction id
does not get advanced, so clog zeros the same page, extendSUBTRANS()
fails again, rinse and repeat.

I believe in the case above, subtrans buffers were exhausted due to the
extensive use of plpgsql with exception handling.

I can simulate this failure with the attached debug-clog patch which
makes use of two pre-existing debug GUCs to selectively interject an
ERROR in between calls to ExtendCLOG() and ExtendSUBTRANS(). If you want
to test this yourself, apply this patch and use the function in
test_clog.sql to generate a million or so transactions. After the first
32K or before (based on when clog gets its first opportunity to zero a
new page) you should start seeing messages about injected ERRORs. Let a
few hundred thousand ERRORs go by, then kill postgres. Recovery will
take ages, because clog_redo is calling fsync hundreds of thousands of
times in order to zero the same page over and over.

The attached fix-clogredo diff is my proposal for a fix for this.

Thoughts/alternatives, etc appreciated.

Thanks,

Joe


-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support

diff --git a/src/backend/access/transam/clog.c 
b/src/backend/access/transam/clog.c
index 52224b1..317bc2e 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -36,6 +36,7 @@
 #include "access/slru.h"
 #include "access/transam.h"
 #include "postmaster/bgwriter.h"
+#include "utils/guc.h"
 
 /*
  * Defines for CLOG page sizes.  A page is the same BLCKSZ as is used
@@ -355,6 +356,9 @@ ExtendCLOG(TransactionId newestXact)
/* Zero the page and make an XLOG entry about it */
ZeroCLOGPage(pageno, true);
 
+   /* steal this variable for test -- means we've been here */
+   Debug_print_rewritten = true;
+
LWLockRelease(CLogControlLock);
 }
 
diff --git a/src/backend/access/transam/varsup.c 
b/src/backend/access/transam/varsup.c
index 8838d42..e55a67b 100644
--- a/src/backend/access/transam/varsup.c
+++ b/src/backend/access/transam/varsup.c
@@ -21,6 +21,7 @@
 #include "storage/pmsignal.h"
 #include "storage/proc.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
 
 
 /* Number of OIDs to prefetch (preallocate) per XLOG write */
@@ -107,6 +108,11 @@ GetNewTransactionId(bool isSubXact)
 * Extend pg_subtrans too.
 */
ExtendCLOG(xid);
+   if (Debug_print_rewritten && Debug_pretty_print)
+   {
+   Debug_print_rewritten = false;
+   elog(ERROR,"injected ERROR");
+   }
ExtendSUBTRANS(xid);
 
/*
diff -cNr postgresql-8.3.13.orig/src/backend/access/transam/clog.c 
postgresql-8.3.13/src/backend/access/transam/clog.c
*** postgresql-8.3.13.orig/src/backend/access/transam/clog.cTue Dec 14 
03:51:20 2010
--- postgresql-8.3.13/src/backend/access/transam/clog.c Thu Apr 28 12:04:52 2011
***
*** 74,79 
--- 75,81 
  
  #define ClogCtl (&ClogCtlData)
  
+ static int last_pageno = -1;
  
  static intZeroCLOGPage(int pageno, bool writeXlog);
  static bool CLOGPagePrecedes(int page1, int page2);
***
*** 471,476 
--- 476,488 
  
memcpy(&pageno, XLogRecGetData(record), sizeof(int));
  
+   /* avoid repeatedly zeroing the same page */
+   if (InRecovery && pageno == last_pageno)
+   return;
+ 
+   /* save state */
+   last_pageno = pageno;
+ 
LWLockAcquire(CLogControlLock, LW_EXCLUSIVE);
  
slotno = ZeroCLOGPage(pageno, false);
create language plpgsql;
\i /path/to/share/contrib/dblink.sql

CREATE OR REPLACE FUNCTION test_clog(howmany int) RETURNS int AS $_$
DECLARE
 i int;
 arr text[];
 dbname text;
BEGIN
 dbname := current_database();
 arr := dblink_get_connections();
 IF arr IS NOT NULL THEN
  PERFORM dblink_disconnect('conn');
 END IF;
 EXECUTE $$SELECT dblink_connect('conn','dbname=$$ || dbname || $$')$$;
 PERFORM db

Re: [HACKERS] Re: making write location work (was: Efficient transaction-controlled synchronous replication)

2011-03-24 Thread Joseph Conway
On 3/24/11 8:16 AM, Kevin Grittner wrote:
> Simon Riggs  wrote:
>> Robert Haas  wrote:
>  
>>> At least as I understand it, it's not our project policy to carry
>>> around code that doesn't accomplish anything useful. I have no
>>> objection to keeping the field; I simply think that if we're
>>> going to have it, we should make it work
>  
>> What a stupid conversation.
>  
> That hardly seems like a convincing response.  Adding a column to a
> view when the column contains meaninful values seems less likely to
> break things than initially adding it with a different value,
> identical to another column, and then changing the semantics.
>  
> +1 for either dropping it or making it work.

+1

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Row-level Locks & SERIALIZABLE transactions, postgres vs. Oracle

2010-07-17 Thread Joseph Conway
On 7/17/10 12:09 PM, Kevin Grittner wrote:
> Joe Conway  wrote:
>  
>> Should I be installing Florian's patch in addition to yours when I
>> start testing?
>  
> There's some manual fix-up needed, primarily because we need to
> differentiate between SERIALIZABLE and REPEATABLE READ isolation
> levels, and therefore replaced the IsXactIsoLevelSerializable macro
> with IsXactIsoLevelXactSnapshotBased and
> IsXactIsoLevelFullySerializable.  If you can wait until tomorrow,
> I'll create a merged patch for you and confirm that it passes the
> modified Florian's pgbench test (with the FOR SHARED clause
> removed).  I'll include a crude hack to pgbench I had to use to test
> this, with an explanation of why it was needed.  I'm still trying to
> put together better testing techniques for the long term.
>  
>> Also, where can I get the latest and greatest version of your
>> patch?
>  
> There's always the git repository, but I'll post a new patch
> tomorrow, based on what I've recently found.

That all sounds great. I'll concentrate today on understanding the
theory and high level design.

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] walreceiver is uninterruptible on win32

2010-04-12 Thread Joseph Conway
Fujii Masao wrote:
> If adding new shared library is too big change at this point, I think
> that we should postpone the fix only for dblink to 9.1 or later. Since
> no one has complained about this long-term problem of dblink, I'm not
> sure it really should be fixed right now. Thought?

I would agree with this. No one has ever complained that I am aware of.

Joe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers