Re: [HACKERS] error: could not find pg_class tuple for index 2662
On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote: > daveg writes: > > Here is the update: the problem happens with vacuum full alone, no reindex > > is needed to trigger it. I updated the script to avoid reindexing after > > vacuum. Over the past two days there are still many ocurrances of this > > error coincident with the vacuum. > > Well, that jives with the assumption that the one case we saw in > the buildfarm was the same thing, because the regression tests were > certainly only doing a VACUUM FULL and not a REINDEX of pg_class. > But it doesn't get us much closer to understanding what's happening. > In particular, it seems to knock out most ideas associated with race > conditions, because the VAC FULL should hold exclusive lock on pg_class > until it's completely done (including index rebuilds). > > I think we need to start adding some instrumentation so we can get a > better handle on what's going on in your database. If I were to send > you a source-code patch for the server that adds some more logging > printout when this happens, would you be willing/able to run a patched > build on your machine? Yes we can run an instrumented server so long as the instrumentation does not interfere with normal operation. However, scheduling downtime to switch binaries is difficult, and generally needs to be happen on a weekend, but sometimes can be expedited. I'll look into that. > (BTW, just to be perfectly clear ... the "could not find pg_class tuple" > errors always mention index 2662, right, never any other number?) Yes, only index 2662, never any other. I'm attaching a somewhat redacted log for two different databases on the same instance around the time of vacuum full of pg_class in each database. My observations so far are: - the error occurs at commit of vacuum full of pg_class - in these cases error hits autovacuum after it waited for a lock on pg_class - in these two cases there was a new process startup while the vacuum was running. Don't know if this is relevant. - while these hit autovacuum, the error does hit other processs (just not in these sessions). Unknown if autovacuum is a required component. -dg -- David Gould da...@sonic.net 510 536 1443510 282 0869 If simplicity worked, the world would be overrun with insects. NOTES: -- most statements for db c01 in one catalog vacuum session -- database c01 is oid 16404 -- applications run as user 'app' -- vacuums run as user postgres -- somewhat redacted MM:SS.ms piduser log text - - - 03:00.350 2716 c01 postgres vacuum full pg_catalog.pg_class; 03:00.507 2866 LOG: process 2866 acquired ShareUpdateExclusiveLock on relation 2604 of database 16416 after 10143.750 ms 03:01.222 21685 c01 app 11100.670 ms 03:01.223 21685 c01 app SELECT DISTINCT 03:01.224 21685 c01 app 1.524 ms 03:02.031 3001 c01 app connection authorized: user=app database=c01 03:04.952 2787 LOG: process 2787 still waiting for AccessShareLock on relation 1259 of database 16412 after 5001.035 ms 03:05.065 2787 LOG: process 2787 acquired AccessShareLock on relation 1259 of database 16412 after 5114.253 ms 03:05.564 2977 LOG: process 2977 still waiting for AccessShareLock on relation 1259 of database 16404 after 5000.970 ms 03:05.640 2731 LOG: process 2731 still waiting for RowExclusiveLock on relation 1259 of database 16404 after 5000.186 ms 03:06.045 2977 LOG: process 2977 acquired AccessShareLock on relation 1259 of database 16404 after 5482.389 ms 03:06.045 2731 LOG: process 2731 acquired RowExclusiveLock on relation 1259 of database 16404 after 5405.652 ms 03:06.046 2731 ERROR: could not find pg_class tuple for index 2662 03:06.046 2731 CONTEXT: automatic vacuum of table "c01.pg_catalog.pg_index" 03:06.046 2731 ERROR: could not find pg_class tuple for index 2662 03:06.046 2716 c01 postgres 5696.537 ms 03:06.056 3001 c01 app SET SESSION TIME ZONE 'UTC'; ... 03:06.057 2716 c01 postgres vacuum full pg_catalog.pg_rewrite; 03:06.066 3001 c01 app 10.459 ms 03:06.091 3001 c01 app disconnection: session time: 0:00:04.085 user=app database=c01 host=xxxl01 03:08.908 3006 LOG: process 3006 still waiting for AccessShareLock on relation 2659 of database 16407 after 5000.778 ms 03:11.777 21685 c01 app SELECT 03:11.779 21685 c01 app 2.296 ms 03:11.779 21685 c01 app SELECT 03:11.780 21685 c01 app 0.328 ms 03:11.798 21685 c01 app SELECT 03:11.799 21685 c01 app 0.348 ms 03:11.800 21685 c01 app SELECT 03:11.800 21685 c01 app 0.205 ms 03:11.804 21685 c01 app SELECT 03:11.805 21685 c01 app 0.589 ms 03:11.806 21685 c01 app SELECT DISTINCT 03:11.809 21685 c01 app 3.552 ms 03:11.810 21685 c01 app SELECT 03:12.554 21685 c01 app 744.233 ms 03:12.555 21685 c01 app SELECT DISTINCT 03:12.556 21685 c01 app 1.136 ms 03:13.153 2716 c01 post
Re: [HACKERS] SSI heap_insert and page-level predicate locks
On Wed, 2011-06-08 at 17:29 -0500, Kevin Grittner wrote: > Heikki Linnakangas wrote: > > heap_insert() calls CheckForSerializableConflictIn(), which checks if > > there is a predicate lock on the whole relation, or on the page we're > > inserting to. It does not check for tuple-level locks, because there > > can't be any locks on a tuple that didn't exist before. > > AFAICS, the check for page lock is actually unnecessary. A page-level > > lock on a heap only occurs when tuple-level locks are promoted. It is > > just a coarser-grain representation of holding locks on all tuples on > > the page, *that exist already*. It is not a "gap" lock like the index > > locks are, it doesn't need to conflict with inserting new tuples on > the > > page. In fact, if heap_insert chose to insert the tuple on some other > > heap page, there would have been no conflict. > > Absolutely correct. Patch attached. I like the change, but the comment is slightly confusing. Perhaps something like: "For a heap insert, we only need to check for table locks. Our new tuple can't possibly conflict with existing tuple locks, and heap page locks are only consolidated versions of tuple locks. The index insert will check for any other predicate locks." would be a little more clear? (the last sentence is optional, and I only included it because the original comment mentioned indexes). Same for heap_update(). Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lazy vxid locks, v3
On Wed, 2011-07-20 at 13:41 -0400, Robert Haas wrote: > I took another look at v2 of my lazy vxid locks patch and realized > that it was pretty flaky in a couple of different ways. Here's a > version that I think is a bit more robust, but considering the extent > of the revisions, it probably needs another round of review from > someone before I commit it. > > Any review appreciated; I would prefer not to have to wait until > October to get this committed, since there is quite a bit of follow-on > work that I would like to do as well. FWIW, the performance > characteristics are basically identical to the previous versions, > AFAICT. > fpLocalTransactionId is redundant with the lxid, and the explanation is that one that they have different locking semantics. That looks reasonable, and it avoided the need for the careful ordering while starting/ending a transaction that was present in v2. However, it also looks like you're using it for another purpose: In VirtualXactLockTableCleanup(): /* * If fpVXIDLock has been cleared without touching fpLocalTransactionId, * that means someone transferred the lock to the main lock table. */ if (!fastpath && LocalTransactionIdIsValid(lxid)) Is the "&& LocalTransactionIdIsValid(lxid)" a guard against calling VirtualXactLockTableCleanup twice? Can that happen? Or is it just defensive coding to avoid making an additional assumption? Regards, Jeff Davis PS: In the recent sinval synch patch, you had a typo: "If we haven't catch up completely". Other than that, it looked good. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] python cleanup
On 07/25/2011 12:03 PM, Tom Lane wrote: Andrew Dunstan writes: On 07/25/2011 10:52 AM, Tom Lane wrote: What is features.h, and have its authors read the POSIX standard? AFAICS they have no business defining this symbol. [andrew@emma ~]$ rpm -q -f /usr/include/features.h glibc-headers-2.13-1.x86_64 Oh, for some reason I was thinking this was mingw-specific. [ pokes around ... ] I still think it's a bad idea for the header files to be defining this, but they'll probably point at the part of the POSIX spec that says the results are undefined if the macro is changed after the first system header is #included. I can't immediately think of any way to actually do what you were trying to do (ie, save and restore the definition of the macro). I wonder whether it would be good enough to do this: #include postgres.h #include everything else we want except python headers #undef _POSIX_C_SOURCE #undef _XOPEN_SOURCE #include python headers ... rest of .c file ... This should only fail if (a) some macro imported from system headers attempts to test the value of a feature macro, and (b) the results vary between the system default setting and the setting the python headers selected. Neither of these things seem very probable. OK, attached gives a clean build and passes regression on my Windows box that builds with Python. I had to undefine a few more things and save and restore our *snprintf settings (with code borrowed from plperl.h, where we did this sort of cleanup a while ago). cheers andrew diff --git a/src/pl/plpython/plpython.c b/src/pl/plpython/plpython.c index e03d7ce..da34a17 100644 --- a/src/pl/plpython/plpython.c +++ b/src/pl/plpython/plpython.c @@ -6,6 +6,56 @@ * */ +#include "postgres.h" + +/* system stuff */ +#include +#include + +/* postgreSQL stuff */ +#include "catalog/pg_proc.h" +#include "catalog/pg_type.h" +#include "commands/trigger.h" +#include "executor/spi.h" +#include "funcapi.h" +#include "fmgr.h" +#include "mb/pg_wchar.h" +#include "miscadmin.h" +#include "nodes/makefuncs.h" +#include "parser/parse_type.h" +#include "tcop/tcopprot.h" +#include "access/transam.h" +#include "access/xact.h" +#include "utils/builtins.h" +#include "utils/hsearch.h" +#include "utils/lsyscache.h" +#include "utils/memutils.h" +#include "utils/rel.h" +#include "utils/syscache.h" +#include "utils/typcache.h" + +/* + * Undefine some things that get (re)defined in the + * Python headers. They aren't used below and we've + * already included all the headers we need, so this + * should be pretty safe. + */ + +#undef _POSIX_C_SOURCE +#undef _XOPEN_SOURCE +#undef HAVE_STRERROR +#undef HAVE_TZNAME + +/* + * Sometimes python carefully scribbles on our *printf macros. + * So we undefine them here and redefine them after it's done its dirty deed. + */ + +#ifdef USE_REPL_SNPRINTF +#undef snprintf +#undef vsnprintf +#endif + #if defined(_MSC_VER) && defined(_DEBUG) /* Python uses #pragma to bring in a non-default libpython on VC++ if * _DEBUG is defined */ @@ -84,34 +134,6 @@ typedef int Py_ssize_t; PyObject_HEAD_INIT(type) size, #endif -#include "postgres.h" - -/* system stuff */ -#include -#include - -/* postgreSQL stuff */ -#include "catalog/pg_proc.h" -#include "catalog/pg_type.h" -#include "commands/trigger.h" -#include "executor/spi.h" -#include "funcapi.h" -#include "fmgr.h" -#include "mb/pg_wchar.h" -#include "miscadmin.h" -#include "nodes/makefuncs.h" -#include "parser/parse_type.h" -#include "tcop/tcopprot.h" -#include "access/transam.h" -#include "access/xact.h" -#include "utils/builtins.h" -#include "utils/hsearch.h" -#include "utils/lsyscache.h" -#include "utils/memutils.h" -#include "utils/rel.h" -#include "utils/syscache.h" -#include "utils/typcache.h" - /* define our text domain for translations */ #undef TEXTDOMAIN #define TEXTDOMAIN PG_TEXTDOMAIN("plpython") @@ -119,6 +141,23 @@ typedef int Py_ssize_t; #include #include +/* put back our snprintf and vsnprintf */ +#ifdef USE_REPL_SNPRINTF +#ifdef snprintf +#undef snprintf +#endif +#ifdef vsnprintf +#undef vsnprintf +#endif +#ifdef __GNUC__ +#define vsnprintf(...) pg_vsnprintf(__VA_ARGS__) +#define snprintf(...) pg_snprintf(__VA_ARGS__) +#else +#define vsnprintf pg_vsnprintf +#define snprintfpg_snprintf +#endif /* __GNUC__ */ +#endif /* USE_REPL_SNPRINTF */ + PG_MODULE_MAGIC; /* convert Postgresql Datum or tuple into a PyObject. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libedit memory stomp is apparently fixed in OS X Lion
On Sun, Jul 31, 2011 at 6:45 PM, Alvaro Herrera wrote: > Excerpts from Dave Page's message of sáb jul 30 15:32:03 -0400 2011: >> On Sat, Jul 30, 2011 at 8:25 PM, Tom Lane wrote: >> > I think you had better plan on incorporating GNU readline into installer >> > builds for Lion. >> >> Unfortunately the licence makes that a non-starter. > > A fixed version of libedit, then? Yes, that's the best we can do. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libedit memory stomp is apparently fixed in OS X Lion
Excerpts from Dave Page's message of sáb jul 30 15:32:03 -0400 2011: > On Sat, Jul 30, 2011 at 8:25 PM, Tom Lane wrote: > > I think you had better plan on incorporating GNU readline into installer > > builds for Lion. > > Unfortunately the licence makes that a non-starter. A fixed version of libedit, then? -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] error: could not find pg_class tuple for index 2662
daveg writes: > Here is the update: the problem happens with vacuum full alone, no reindex > is needed to trigger it. I updated the script to avoid reindexing after > vacuum. Over the past two days there are still many ocurrances of this > error coincident with the vacuum. Well, that jives with the assumption that the one case we saw in the buildfarm was the same thing, because the regression tests were certainly only doing a VACUUM FULL and not a REINDEX of pg_class. But it doesn't get us much closer to understanding what's happening. In particular, it seems to knock out most ideas associated with race conditions, because the VAC FULL should hold exclusive lock on pg_class until it's completely done (including index rebuilds). I think we need to start adding some instrumentation so we can get a better handle on what's going on in your database. If I were to send you a source-code patch for the server that adds some more logging printout when this happens, would you be willing/able to run a patched build on your machine? (BTW, just to be perfectly clear ... the "could not find pg_class tuple" errors always mention index 2662, right, never any other number?) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] error: could not find pg_class tuple for index 2662
On Thu, Jul 28, 2011 at 11:31:31PM -0700, daveg wrote: > On Thu, Jul 28, 2011 at 07:45:01PM -0400, Robert Haas wrote: > > REINDEX. My guess is that this is happening either right around the > > time the VACUUM FULL commits or right around the time the REINDEX > > commits. It'd be helpful to know which, if you can figure it out. > > I'll update my vacuum script to skip reindexes after vacuum full for 9.0 > servers and see if that makes the problem go away. Thanks for reminding > me that they are not needed. However, I suspect it is the vacuum, not the > reindex causing the problem. I'll update when I know. Here is the update: the problem happens with vacuum full alone, no reindex is needed to trigger it. I updated the script to avoid reindexing after vacuum. Over the past two days there are still many ocurrances of this error coincident with the vacuum. -dg -- David Gould da...@sonic.net 510 536 1443510 282 0869 If simplicity worked, the world would be overrun with insects. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers