Re: [HACKERS] [ADMIN] reindexdb hangs
Tom Lane wrote: > Alvaro Herrera <[EMAIL PROTECTED]> writes: > > I am unsure if I should backpatch to 8.1: the code in cluster.c has > > changed, and while it is relatively easy to modify the patch, this is a > > rare bug and nobody has reported it in CLUSTER (not many people clusters > > temp tables, it seems). Should I patch only REINDEX? How far back? > > I'd say go as far back as you can conveniently modify the patch for. > This is a potential data-loss bug (even if only for temporary data) > so we ought to take it seriously. OK, I fixed it all the way back that it was needed: 7.4 for CLUSTER and 8.1 for REINDEX. Before 7.4 there wasn't a database-wide version of CLUSTER. This wasn't a very serious bug in any case, because it would have thrown an ERROR if it tried to cluster a remote temp table. So apparently no one ever saw it, because I can't remember any report about it. For REINDEX the story is very different, because what happens is that queries start silently returning wrong data. My test case was alvherre=# create temp table foo (a int primary key); NOTICE: CREATE TABLE / PRIMARY KEY creará el índice implícito «foo_pkey» para la tabla «foo» CREATE TABLE alvherre=# insert into foo select * from generate_series(1,4); INSERT 0 4 -- "reindex database alvherre" in another session alvherre=# select * from foo where a = 400; a - (0 rows) If you now REINDEX this table in the current session, it correctly returns one tuple. So if somebody is executing a procedure which involve temp tables and someone else concurrently does a REINDEX DATABASE, the queries of the first session are automatically corrupted. It seems like the worst kind of bug. Maybe we need additional protections on the bufmgr to prevent this kind of problem. We only introduced REINDEX DATABASE as a way to reindex user indexes in 8.1. Before that, it only considered system catalogs, which ISTM are never temp. Many thanks to dx k9 for the original report. -- Alvaro Herrera http://www.PlanetPostgreSQL.org/ "Endurecerse, pero jamás perder la ternura" (E. Guevara) ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [ADMIN] reindexdb hangs
Alvaro Herrera <[EMAIL PROTECTED]> writes: > I am unsure if I should backpatch to 8.1: the code in cluster.c has > changed, and while it is relatively easy to modify the patch, this is a > rare bug and nobody has reported it in CLUSTER (not many people clusters > temp tables, it seems). Should I patch only REINDEX? How far back? I'd say go as far back as you can conveniently modify the patch for. This is a potential data-loss bug (even if only for temporary data) so we ought to take it seriously. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [ADMIN] reindexdb hangs
Tom Lane wrote: > Alvaro Herrera <[EMAIL PROTECTED]> writes: > > I'm not sure I follow. Are you suggesting adding a new function, > > similar to pg_class_ownercheck, which additionally checks for temp-ness? > > No, I was just suggesting adding the check for temp-ness in cluster() > and cluster_rel() where we do pg_class_ownercheck. We already have the > rel open there and so it's cheap to do the temp-ness check. I applied a patch along these lines to HEAD and 8.2. I am unsure if I should backpatch to 8.1: the code in cluster.c has changed, and while it is relatively easy to modify the patch, this is a rare bug and nobody has reported it in CLUSTER (not many people clusters temp tables, it seems). Should I patch only REINDEX? How far back? -- Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC "The eagle never lost so much time, as when he submitted to learn of the crow." (William Blake) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [ADMIN] reindexdb hangs
Alvaro Herrera <[EMAIL PROTECTED]> writes: > I'm not sure I follow. Are you suggesting adding a new function, > similar to pg_class_ownercheck, which additionally checks for temp-ness? No, I was just suggesting adding the check for temp-ness in cluster() and cluster_rel() where we do pg_class_ownercheck. We already have the rel open there and so it's cheap to do the temp-ness check. I guess it's a question of which path you are more concerned about making cheap, of course. Your proposal was to filter before putting the rel into the list at all, which certainly saves cycles when we reject a remote temp table; but it adds cycles for all other tables. I argue that the remote-temp-table case is uncommon (else we'd have had many more trouble reports) and therefore optimizing it at the cost of the normal case isn't a win. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [ADMIN] reindexdb hangs
Tom Lane wrote: > Alvaro Herrera <[EMAIL PROTECTED]> writes: > > Tom Lane wrote: > >> Yeah, an extra fetch of the pg_class row doesn't seem all that nice. > >> I think you'd want to check it in approximately the same two places > >> where pg_class_ownercheck() is applied (one for the 1-xact and one for > >> the multi-xact path). > > > Actually, the 1-xact path does not need it, because the check is already > > elsewhere. > > Yeah, but if you do it there it's one added comparison > (isOtherTempNamespace is very cheap, and you can get the namespace > cheaply from the already-open rel). This way you need an extra syscache > lookup because you are insisting on doing the check in a place where you > don't have easy access to the pg_class row. Doesn't seem better. I'm not sure I follow. Are you suggesting adding a new function, similar to pg_class_ownercheck, which additionally checks for temp-ness? -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [ADMIN] reindexdb hangs
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Yeah, an extra fetch of the pg_class row doesn't seem all that nice. >> I think you'd want to check it in approximately the same two places >> where pg_class_ownercheck() is applied (one for the 1-xact and one for >> the multi-xact path). > Actually, the 1-xact path does not need it, because the check is already > elsewhere. Yeah, but if you do it there it's one added comparison (isOtherTempNamespace is very cheap, and you can get the namespace cheaply from the already-open rel). This way you need an extra syscache lookup because you are insisting on doing the check in a place where you don't have easy access to the pg_class row. Doesn't seem better. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] [ADMIN] reindexdb hangs
Tom Lane wrote: > > I examined cluster.c and it does seem to be missing a check too. I'm > > not sure where to add one though; the best choice would be the place > > where the list of rels is built, but that scans only pg_index, so it > > doesn't have access to the namespace of each rel. So one idea would be > > to get the pg_class row for each candidate, but that seems slow. > > Another idea would be to just add all the candidates and silently skip > > the temp indexes in cluster_rel. > > Yeah, an extra fetch of the pg_class row doesn't seem all that nice. > I think you'd want to check it in approximately the same two places > where pg_class_ownercheck() is applied (one for the 1-xact and one for > the multi-xact path). Actually, the 1-xact path does not need it, because the check is already elsewhere. We only need logic enough to skip temp tables silently while building the list in the multi-xact path. What this means is that very few people, if any, clusters temp tables; because as soon as you do, a plain CLUSTER command in another session errors out with alvherre=# cluster; ERROR: cannot cluster temporary tables of other sessions So, patch attached. > Are there any other commands to be worried about? I can't see any > besides VACUUM/ANALYZE, and those seem covered. I can't think of any. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support Index: src/backend/commands/cluster.c === RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/commands/cluster.c,v retrieving revision 1.162 diff -c -p -r1.162 cluster.c *** src/backend/commands/cluster.c 19 May 2007 01:02:34 - 1.162 --- src/backend/commands/cluster.c 29 Aug 2007 22:07:04 - *** cluster_rel(RelToCluster *rvtc, bool rec *** 276,281 --- 276,287 } /* + * Note: we don't need to check whether the table is a temp for a + * remote session here, because it will be checked in + * check_index_is_clusterable, below. + */ + + /* * Check that the index still exists */ if (!SearchSysCacheExists(RELOID, *** swap_relation_files(Oid r1, Oid r2, Tran *** 995,1004 } /* ! * Get a list of tables that the current user owns and ! * have indisclustered set. Return the list in a List * of rvsToCluster ! * with the tableOid and the indexOid on which the table is already ! * clustered. */ static List * get_tables_to_cluster(MemoryContext cluster_context) --- 1001,1037 } /* ! * Returns whether a pg_class tuple belongs to a temp namespace which is not ! * our backend's. ! */ ! static bool ! relid_is_other_temp(Oid class_oid) ! { ! HeapTuple tuple; ! Form_pg_class classForm; ! bool istemp; ! ! tuple = SearchSysCache(RELOID, ! ObjectIdGetDatum(class_oid), ! 0, 0, 0); ! if (!HeapTupleIsValid(tuple)) ! ereport(ERROR, ! (errcode(ERRCODE_UNDEFINED_TABLE), ! errmsg("relation with OID %u does not exist", class_oid))); ! ! classForm = (Form_pg_class) GETSTRUCT(tuple); ! istemp = isOtherTempNamespace(classForm->relnamespace); ! ! ReleaseSysCache(tuple); ! ! return istemp; ! } ! ! /* ! * Get a list of tables that the current user owns, have indisclustered set, ! * and are not temp tables of remote backends. Return the list in a List * of ! * rvsToCluster with the tableOid and the indexOid on which the table is ! * already clustered. */ static List * get_tables_to_cluster(MemoryContext cluster_context) *** get_tables_to_cluster(MemoryContext clus *** 1031,1036 --- 1064,1072 if (!pg_class_ownercheck(index->indrelid, GetUserId())) continue; + if (relid_is_other_temp(index->indrelid)) + continue; + /* * We have to build the list in a different memory context so it will * survive the cross-transaction processing Index: src/backend/commands/indexcmds.c === RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/commands/indexcmds.c,v retrieving revision 1.162 diff -c -p -r1.162 indexcmds.c *** src/backend/commands/indexcmds.c 25 Aug 2007 19:08:19 - 1.162 --- src/backend/commands/indexcmds.c 25 Aug 2007 22:11:18 - *** ReindexDatabase(const char *databaseName *** 1292,1297 --- 1292,1301 if (classtuple->relkind != RELKIND_RELATION) continue; + /* Skip temp tables of other backends; we can't reindex them at all */ + if (isOtherTempNamespace(classtuple->relnamespace)) + continue; + /* Check user/system classification, and optionally skip */ if (IsSystemClass(classtuple)) { ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not
Re: [HACKERS] [ADMIN] reindexdb hangs
Alvaro Herrera <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> Hmm, there is not any filter in ReindexDatabase() to exclude temp tables >> of other backends, but it sure seems like there needs to be. CLUSTER >> might have the same issue. I think we fixed this in VACUUM long ago, >> but we need to check the other commands that grovel over all of a database. > Was this ever fixed? I think it wasn't, because I don't see any check > in ReindexDatabase. Here is a patch to add one. No, that's still on the todo list. Your patch looks reasonable. > I examined cluster.c and it does seem to be missing a check too. I'm > not sure where to add one though; the best choice would be the place > where the list of rels is built, but that scans only pg_index, so it > doesn't have access to the namespace of each rel. So one idea would be > to get the pg_class row for each candidate, but that seems slow. > Another idea would be to just add all the candidates and silently skip > the temp indexes in cluster_rel. Yeah, an extra fetch of the pg_class row doesn't seem all that nice. I think you'd want to check it in approximately the same two places where pg_class_ownercheck() is applied (one for the 1-xact and one for the multi-xact path). Are there any other commands to be worried about? I can't see any besides VACUUM/ANALYZE, and those seem covered. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [ADMIN] reindexdb hangs
Tom Lane wrote: > "dx k9" <[EMAIL PROTECTED]> writes: > > [ stuck reindex ] > > It turns out it was a temporary database and temporary table, that just > > wasn't there maybe it thought it was there from some type of snapshot then > > the next minute it was gone. > > Hmm, there is not any filter in ReindexDatabase() to exclude temp tables > of other backends, but it sure seems like there needs to be. CLUSTER > might have the same issue. I think we fixed this in VACUUM long ago, > but we need to check the other commands that grovel over all of a database. Was this ever fixed? I think it wasn't, because I don't see any check in ReindexDatabase. Here is a patch to add one. I examined cluster.c and it does seem to be missing a check too. I'm not sure where to add one though; the best choice would be the place where the list of rels is built, but that scans only pg_index, so it doesn't have access to the namespace of each rel. So one idea would be to get the pg_class row for each candidate, but that seems slow. Another idea would be to just add all the candidates and silently skip the temp indexes in cluster_rel. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. Index: src/backend/commands/indexcmds.c === RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/commands/indexcmds.c,v retrieving revision 1.162 diff -c -p -r1.162 indexcmds.c *** src/backend/commands/indexcmds.c 25 Aug 2007 19:08:19 - 1.162 --- src/backend/commands/indexcmds.c 25 Aug 2007 22:11:18 - *** ReindexDatabase(const char *databaseName *** 1292,1297 --- 1292,1301 if (classtuple->relkind != RELKIND_RELATION) continue; + /* Skip temp tables of other backends; we can't reindex them at all */ + if (isOtherTempNamespace(classtuple->relnamespace)) + continue; + /* Check user/system classification, and optionally skip */ if (IsSystemClass(classtuple)) { ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [ADMIN] reindexdb hangs
"dx k9" <[EMAIL PROTECTED]> writes: > [ stuck reindex ] > It turns out it was a temporary database and temporary table, that just > wasn't there maybe it thought it was there from some type of snapshot then > the next minute it was gone. Hmm, there is not any filter in ReindexDatabase() to exclude temp tables of other backends, but it sure seems like there needs to be. CLUSTER might have the same issue. I think we fixed this in VACUUM long ago, but we need to check the other commands that grovel over all of a database. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly