Re: [HACKERS] Buffer statistics for pg_stat_statements
Cedric Villemain cedric.villem...@dalibo.com wrote: On Friday 18 December 2009 09:44:40, Takahiro Itagaki wrote: I'd like to add per-query buffer usage into contrib/pg_stat_statements. Here is a patch to add buffer usage columns into pg_stat_statements. It also changes initialization of the result TupleDesc from manually coded routines to get_call_result_type(). Perhaps it can be useful to have the percentage for hit/read ratio computed in the view? I think different DBAs want different derived values; some of them might want the buffer hit ratio, but others might request numbers per query. I'd like to provide only raw values from pg_stat_statements to keep it simple, but I added a computational expression of hit percentage in the documentation. ! bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit / ! nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent ! FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5; ! -[ RECORD 1 ]- ! query | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2; ! calls | 3000 ! total_time | 9.6090010002 ! rows | 2836 ! hit_percent | 99.977897200936 Regards, --- Takahiro Itagaki NTT Open Source Software Center pg_stat_statements_bufusage_20091222.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
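[Editor's note: the nullif() guard in the documented SQL expression above can be sketched outside SQL. This is a hypothetical Python illustration, not part of the patch; the function name and signature are invented for the example.]

```python
def hit_percent(shared_blks_hit, shared_blks_read):
    """Mirror of the documented expression:
    100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0)

    Returns None when no blocks were touched at all, the same way
    SQL's nullif(..., 0) turns the divisor into NULL instead of
    raising a division-by-zero error.
    """
    total = shared_blks_hit + shared_blks_read
    if total == 0:
        return None          # analogous to NULL from nullif(total, 0)
    return 100.0 * shared_blks_hit / total
```

The point of the guard is that a statement that has not yet read or hit any shared buffers reports an undefined ratio rather than an error.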
Re: [HACKERS] alpha3 release schedule?
On Sun, 2009-12-20 at 19:11 -0500, Robert Haas wrote: On Sun, Dec 20, 2009 at 3:42 PM, Simon Riggs si...@2ndquadrant.com wrote: On Sat, 2009-12-19 at 20:59 +0200, Heikki Linnakangas wrote: I put them on the TODO list at https://wiki.postgresql.org/wiki/Hot_Standby_TODO, under the must-fix category. I notice you also re-arranged other items on there, specifically the notion that starting from a shutdown checkpoint is somehow important. It's definitely not any kind of bug. We've discussed this on-list and I've requested that you justify this. So far, nothing you've said on that issue has been at all convincing for me or others. The topic is already mentioned on the HS todo, since if one person requests something we should track that, just in case others eventually agree. But having said that, it clearly isn't a priority, so rearranging the item like that was not appropriate, unless you were thinking of doing it yourself, though that wasn't marked. This doesn't match my recollection of the previous discussion on this topic. I am not sure that I'd call it a bug, but I'd definitely like to see it fixed, and I think I mentioned that previously, though I don't have the email in front ATM. I am also not aware that anyone other than yourself has opined that we should not worry about fixing it, although I might be wrong about that too. At any rate, clearly not a priority seems like an overstatement relative to my memory of that conversation. Please check the thread then. Nobody but me has opined that we should not worry about fixing it, but then nobody else other than Heikki has suggested it is even a feature worthy of inclusion, ever. One person agreed with my position, nobody has spoken in favour of Heikki's position. However, I had already included the feature on the todo; it was further down the todo before a second copy was added, second copy now removed. 
If you are saying being able to start Hot Standby from a shutdown checkpoint is an important feature for you, then say so, and why. Please also be careful that you don't mix this up with other improvements, nor say they all need fixing. This isn't a general discussion on those points. There are other important things. -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] New VACUUM FULL
Simon Riggs si...@2ndquadrant.com wrote: I think we either need to implement that or document that vacuum will not skip all-visible pages when running VACUUM FULL. All-visible does not always mean filled enough, no? We will need to check both visibility and fillfactor when we optimize copying tuples. Old VACUUM FULL was substantially faster than this on tables that had nothing to remove. Yeah, that's why traditional VACUUM FULL is still kept as INPLACE vacuum. Please can you arrange for the cluster operation to skip rebuilding indexes if the table is not reduced in size? Do you think we should discard the rewritten table when we find the VACUUM FULL (or CLUSTER) is useless? We could save the time to reindex, but we've already consumed time to rewrite tables. It will still be slower than VACUUM FULL INPLACE because it is read-only. Instead, I'd suggest having conditional database-wide maintenance commands, something like: VACUUM FULL WHERE the table is fragmented We don't have to support the feature by SQL syntax; it could be done in client tools. How about pg_maintain or pg_ctl maintain that cleans up each relation with the appropriate command? We could replace vacuumdb, clusterdb, and reindexdb with it then. Part of the reason why these happen is that the code hasn't been refactored much at all from its original use for cluster. There are almost zero comments to explain the additional use of this code for VACUUM, or at least to explain it still all works even when there is no index. Ok, I'll check for additional comments. The flow of code might still be confusing because vacuum() calls cluster(), but we would need major code replacement to refactor it. I'm not sure we need the code rewrites for it. Also, I think we need additional messages for VACUUM VERBOSE (and CLUSTER VERBOSE), though it might be adjusted in another patch. 
Regards, --- Takahiro Itagaki NTT Open Source Software Center
Re: [HACKERS] fdw validation function vs zero catalog id
I wrote: The validator is run for the generic options specified to CREATE/ALTER FDW, SERVER and USER MAPPING (+ possible future SQL/MED objects). In this case the catalog is always known. Also we can assume that the catalog is known, if a user runs the validator directly. So yes, AFAICS there is no case for the or zero. Updated patch attached. This now also removes the or zero note from the documentation and modifies postgresql_fdw_validator() to assume that a valid catalog oid is always passed. regards, Martin *** a/doc/src/sgml/ref/create_foreign_data_wrapper.sgml --- b/doc/src/sgml/ref/create_foreign_data_wrapper.sgml *** *** 74,83 CREATE FOREIGN DATA WRAPPER replaceable class=parametername/replaceable take two arguments: one of type typetext[]/type, which will contain the array of options as stored in the system catalogs, and one of type typeoid/type, which will be the OID of the ! system catalog containing the options, or zero if the context is ! not known. The return type is ignored; the function should ! indicate invalid options using ! the functionereport()/function function. /para /listitem /varlistentry --- 74,82 take two arguments: one of type typetext[]/type, which will contain the array of options as stored in the system catalogs, and one of type typeoid/type, which will be the OID of the ! system catalog containing the options. The return type is ignored; ! the function should indicate invalid options using the ! functionereport()/function function. /para /listitem /varlistentry *** a/src/backend/commands/foreigncmds.c --- b/src/backend/commands/foreigncmds.c *** *** 88,94 optionListToArray(List *options) * This is used by CREATE/ALTER of FOREIGN DATA WRAPPER/SERVER/USER MAPPING. */ static Datum ! transformGenericOptions(Datum oldOptions, List *options, Oid fdwvalidator) { --- 88,95 * This is used by CREATE/ALTER of FOREIGN DATA WRAPPER/SERVER/USER MAPPING. */ static Datum ! transformGenericOptions(Oid catalogId, ! 
Datum oldOptions, List *options, Oid fdwvalidator) { *** *** 162,168 transformGenericOptions(Datum oldOptions, result = optionListToArray(resultOptions); if (fdwvalidator) ! OidFunctionCall2(fdwvalidator, result, (Datum) 0); return result; } --- 163,169 result = optionListToArray(resultOptions); if (fdwvalidator) ! OidFunctionCall2(fdwvalidator, result, ObjectIdGetDatum(catalogId)); return result; } *** *** 384,390 CreateForeignDataWrapper(CreateFdwStmt *stmt) nulls[Anum_pg_foreign_data_wrapper_fdwacl - 1] = true; ! fdwoptions = transformGenericOptions(PointerGetDatum(NULL), stmt-options, fdwvalidator); if (PointerIsValid(DatumGetPointer(fdwoptions))) --- 385,393 nulls[Anum_pg_foreign_data_wrapper_fdwacl - 1] = true; ! fdwoptions = transformGenericOptions(ForeignDataWrapperRelationId, ! PointerGetDatum(NULL), ! stmt-options, fdwvalidator); if (PointerIsValid(DatumGetPointer(fdwoptions))) *** *** 501,507 AlterForeignDataWrapper(AlterFdwStmt *stmt) datum = PointerGetDatum(NULL); /* Transform the options */ ! datum = transformGenericOptions(datum, stmt-options, fdwvalidator); if (PointerIsValid(DatumGetPointer(datum))) repl_val[Anum_pg_foreign_data_wrapper_fdwoptions - 1] = datum; --- 504,513 datum = PointerGetDatum(NULL); /* Transform the options */ ! datum = transformGenericOptions(ForeignDataWrapperRelationId, ! datum, ! stmt-options, ! fdwvalidator); if (PointerIsValid(DatumGetPointer(datum))) repl_val[Anum_pg_foreign_data_wrapper_fdwoptions - 1] = datum; *** *** 667,673 CreateForeignServer(CreateForeignServerStmt *stmt) nulls[Anum_pg_foreign_server_srvacl - 1] = true; /* Add server options */ ! srvoptions = transformGenericOptions(PointerGetDatum(NULL), stmt-options, fdw-fdwvalidator); if (PointerIsValid(DatumGetPointer(srvoptions))) --- 673,681 nulls[Anum_pg_foreign_server_srvacl - 1] = true; /* Add server options */ ! srvoptions = transformGenericOptions(ForeignServerRelationId, ! PointerGetDatum(NULL), ! 
stmt-options, fdw-fdwvalidator); if (PointerIsValid(DatumGetPointer(srvoptions))) *** *** 765,771 AlterForeignServer(AlterForeignServerStmt *stmt) datum = PointerGetDatum(NULL); /* Prepare the options array */ ! datum = transformGenericOptions(datum, stmt-options, fdw-fdwvalidator); if (PointerIsValid(DatumGetPointer(datum))) --- 773,781 datum = PointerGetDatum(NULL); /* Prepare the options array */ ! datum =
Re: [HACKERS] New VACUUM FULL
On Tue, 2009-12-22 at 18:11 +0900, Takahiro Itagaki wrote: Old VACUUM FULL was substantially faster than this on tables that had nothing to remove. Yeah, that's why traditional VACUUM FULL is still kept as INPLACE vacuum. Please can you arrange for the cluster operation to skip rebuilding indexes if the table is not reduced in size? Do you think we should discard the rewritten table when we find the VACUUM FULL (or CLUSTER) is useless? We could save the time to reindex, but we've already consumed time to rewrite tables. The main purpose of doing a new VF is to compact space, so its role has slightly changed from earlier versions. We need much clearer docs about this. On a production system, it seems better to skip the re-indexing, which could take a long, long time. I don't see any way to skip completely re-writing the table though, other than scanning the table with a normal VACUUM first, as old VF does. I would be inclined towards the idea that if somebody does a VF of a whole database then we should look out for and optimise for tables with no changes, but when operating on a single table we just do as instructed and rebuild everything, including indexes. That seems like we should do it, but we're running out of time. For now, I think we can easily and should skip the index build, if appropriate. That just takes a little reworking of code, which appears necessary anyway. We should just document that this works a little differently now and a complete db VF is now neither necessary nor a good thing. And running VACUUM; REINDEX DATABASE foo; will now be very pointless. It will still be slower than VACUUM FULL INPLACE because it is read-only. Understood, let's document that. -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Small typos in Hot Standby docs
On Sat, 2009-12-19 at 18:42 -0800, John Naylor wrote: Here's a patch: Thanks John, much appreciated. -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] New VACUUM FULL
On Tue, 2009-12-22 at 18:11 +0900, Takahiro Itagaki wrote: Instead, I'd suggest having conditional database-wide maintenance commands, something like: VACUUM FULL WHERE the table is fragmented We don't have to support the feature by SQL syntax; it could be done in client tools. How about pg_maintain or pg_ctl maintain that cleans up each relation with the appropriate command? We could replace vacuumdb, clusterdb, and reindexdb with it then. Some broad thoughts... Our perception of acceptable change is much higher than most users'. If we tell people use VACUUM FULL or vacuumdb -f, then that command should, if possible, continue to work well across many releases. vacuumdb in most people's minds is the command you run to ease maintenance and make everything right, rather than a specific set of features. We have "It just works" as a principle. I think the corollary of that is that we should also have "It just continues to work the same way." -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] New VACUUM FULL
Simon Riggs si...@2ndquadrant.com wrote: Our perception of acceptable change is much higher than most users'. If we tell people use VACUUM FULL or vacuumdb -f, then that command should, if possible, continue to work well across many releases. vacuumdb in most people's minds is the command you run to ease maintenance and make everything right, rather than a specific set of features. We have "It just works" as a principle. I think the corollary of that is that we should also have "It just continues to work the same way." I used VACUUM FULL because we were discussing dropping VFI completely, but I won't replace the behavior if hot-standby can support VFI. We can use any keywords without making them reserved in the VACUUM (...) syntax. VACUUM (REWRITE), the first idea, can be used here. We can also adopt entirely different syntax for it -- ex, ALTER TABLE REWRITE or SHRINK. I think the ALTER TABLE idea is not so bad because it does _not_ support database-wide maintenance. REWRITE is not the best maintenance in normal cases because a database should contain some rarely-updated tables. Regards, --- Takahiro Itagaki NTT Open Source Software Center
Re: [HACKERS] Small Bug in GetConflictingVirtualXIDs
On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote: On Monday 21 December 2009 16:48:52 Simon Riggs wrote: Giving the drop database a snapshot is not the answer. I expect Andres to be able to fix this with a simple patch that would not affect the case of normal running. Actually it's less simple than I had thought at first - I don't think the code ever handled that correctly. I might be wrong there, my knowledge of the involved code is a bit sparse... The whole conflict resolution builds on the concept of waiting for a VXid, but an idle backend does not have a valid vxid. That's correct, right? Yes, that's correct. I'll take this one back then. Sure, the code should be modifiable to handle that case mostly transparently (simply ignoring an invalid localTransactionId when the database id is valid), but ... I am inclined to just unconditionally kill the users of the database. It's not like that would be an issue in production. I can't see a case where it's important to run a session to its end on a database which was dropped on the master. Opinions on that? I don't see any mileage in making the Startup process wait for an idle session, so no real reason to wait for others either. -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Table size does not include toast size
2009/12/21 Tom Lane t...@sss.pgh.pa.us: Greg Smith g...@2ndquadrant.com writes: To answer Rafael's concerns directly: you're right that this is confusing. pg_relation_size is always going to do what it does right now just because of how that fits into the design of the database. However, the documentation should be updated to warn against the issue with TOAST here. And it should be easier to get the total you'd like to see here: main relation + toasted parts, since that's what most DBAs want in this area. Perhaps invent pg_table_size() = base table + toast table + toast index and pg_indexes_size() = all other indexes for table giving us the property pg_table_size + pg_indexes_size = pg_total_relation_size Did you mean: pg_table_size() = base table + toast table pg_indexes_size() = base indexes + toast indexes ? I think the 8.4 documentation already makes it apparent that pg_relation_size is a pretty low-level number. If we invent other functions with obvious names, that should be sufficient. regards, tom lane
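[Editor's note: Tom Lane's proposed decomposition above can be sketched as simple arithmetic. This is a hypothetical Python illustration of the grouping and its invariant; the function names mirror the proposed SQL functions but are invented for the example.]

```python
def pg_table_size(base_table, toast_table, toast_index):
    # Proposed grouping: the heap plus everything TOAST needs to store
    # out-of-line values (TOAST table and its index).
    return base_table + toast_table + toast_index

def pg_indexes_size(*index_sizes):
    # All other (user-created) indexes on the table.
    return sum(index_sizes)

def pg_total_relation_size(base_table, toast_table, toast_index, *index_sizes):
    # Invariant the proposal is built around:
    # pg_table_size + pg_indexes_size = pg_total_relation_size
    return (pg_table_size(base_table, toast_table, toast_index)
            + pg_indexes_size(*index_sizes))
```

With this split, a DBA who wants "the table, as I think of it" reads one number, and the two pieces always sum to the existing total.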
Re: [HACKERS] Buffer statistics for pg_stat_statements
2009/12/22 Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp: Cedric Villemain cedric.villem...@dalibo.com wrote: On Friday 18 December 2009 09:44:40, Takahiro Itagaki wrote: I'd like to add per-query buffer usage into contrib/pg_stat_statements. Here is a patch to add buffer usage columns into pg_stat_statements. It also changes initialization of the result TupleDesc from manually coded routines to get_call_result_type(). Perhaps it can be useful to have the percentage for hit/read ratio computed in the view? I think different DBAs want different derived values; some of them might want the buffer hit ratio, but others might request numbers per query. I'd like to provide only raw values from pg_stat_statements to keep it simple, but I added a computational expression of hit percentage in the documentation. Yes, you are right. ! bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit / ! nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent ! FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5; ! -[ RECORD 1 ]- ! query | UPDATE pgbench_branches SET bbalance = bbalance + $1 WHERE bid = $2; ! calls | 3000 ! total_time | 9.6090010002 ! rows | 2836 ! hit_percent | 99.977897200936 Regards, --- Takahiro Itagaki NTT Open Source Software Center
Re: [HACKERS] alpha3 release schedule?
On 22.12.09 9:34, Simon Riggs wrote: If you are saying being able to start Hot Standby from a shutdown checkpoint is an important feature for you, then say so, and why. I think it's not so much an important feature but more the removal of a footgun. Imagine a reporting database where all transactions but a few daily bulk imports are read-only. To spread the load, you do your bulk loads on the master, but run the reporting queries against a read-only HS slave. Now you take the master down for maintenance. Since all clients but the bulk loader use the slave already, and since the bulk loads can be deferred until after the maintenance window closes again, you don't actually do a fail-over. Now you're already pointing at your foot with the gun. All it takes to ruin your day is *some* reason for the slave to restart. Maybe due to a junior DBA's typo, or maybe due to a bug in postgres. Anyway, once the slave is down, it won't come up until you manage to get the master up and running again. And this limitation is pretty surprising, since one would assume that if the slave survives a *crash* of the master, it'd certainly survive a simple *shutdown*. best regards, Florian Pflug
Re: [HACKERS] Streaming replication and non-blocking I/O
On Tue, Dec 22, 2009 at 6:30 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I think we can just use load_external_function() to load the library and call WalReceiverMain from AuxiliaryProcessMain(). Ie. hard-code the library name. Walreceiver is quite tightly coupled with the rest of the backend anyway, so I don't think we need to come up with a pluggable API at the moment. Please? I am really interested in replacing walsender and walreceiver with something which uses a communication bus like Spread instead of a single point-to-point connection. ISTM if we start with something tightly coupled it'll be hard to decouple later. Whereas if we start with a limited interface we'll learn just how much information is really required by the modules and will have fewer surprises later when we find surprising interdependencies. -- greg
Re: [HACKERS] alpha3 release schedule?
On Tue, Dec 22, 2009 at 8:34 AM, Simon Riggs si...@2ndquadrant.com wrote: If you are saying being able to start Hot Standby from a shutdown checkpoint is an important feature for you, then say so, and why. Can you explain the consequences of missing this? It sounds to me like if I lose my master and it happened to be while it was shut down for whatever reason then I'll be stuck and won't be able to use my standby. If that's true it seems like it's a major problem. Or does it just mean I would have to follow a different procedure when failing over? I'm not sure if it's relevant but one thing to realize is that a lot of MySQL people are used to doing failovers to do regular maintenance tasks like creating indexes or making schema changes. Besides, a lot of sites build in regular failovers to ensure that their failover procedure works. In both cases they usually want to do a clean shutdown of the master to ensure they don't lose any transactions during the failover. -- greg
[HACKERS] Tuplestore should remember the memory context it's created in
With regards to this bug report: http://archives.postgresql.org/pgsql-bugs/2009-12/msg00241.php I think we should change tuplestore code so that callers of tuplestore_put* don't need to switch to the correct memory context (and resource owner, after this patch) before the call. Instead, tuplestore_begin_heap() should memorize the context and resource owner used to create the tuplestore, and use that in tuplestore_put* functions. AFAICS it is always a bug to be in a different memory context in tuplestore_put* than in tuplestore_begin_heap(), so it would be more robust to not put the burden on the callers. Patch against CVS HEAD to do that and fix the reported bug attached. Now that the tuplestore_put* switches to the right memory context, we could remove that from all the callers, but this patch only does it for pl_exec.c. Thoughts? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
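[Editor's note: the "capture the allocation context at creation time" pattern Heikki proposes can be sketched in miniature. This is a hypothetical Python analogy, not the actual C patch; here a plain list stands in for a PostgreSQL MemoryContext, and the class and method names are invented for the example.]

```python
class Tuplestore:
    """Analogy for the proposed tuplestore behavior: remember the memory
    context (here, an arena list) passed at creation time, the way
    tuplestore_begin_heap() would, and always allocate into it on put,
    regardless of which context the caller happens to be in at that moment.
    """

    def __init__(self, context):
        # Captured once at creation; callers never switch contexts themselves.
        self.context = context

    def put(self, tup):
        # Equivalent of tuplestore_put* switching to the remembered context
        # internally instead of trusting the caller's current context.
        self.context.append(tup)
```

The robustness argument is visible even at this scale: whatever "current context" the caller is working in when it calls put(), the tuples land in the store's own context, so the caller-side context switch (the thing pl_exec.c previously had to get right) simply cannot be forgotten.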
Re: [HACKERS] Streaming replication and non-blocking I/O
Fujii Masao wrote: On Tue, Dec 22, 2009 at 3:30 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I think we can just use load_external_function() to load the library and call WalReceiverMain from AuxiliaryProcessMain(). Ie. hard-code the library name. Walreceiver is quite tightly coupled with the rest of the backend anyway, so I don't think we need to come up with a pluggable API at the moment. That's the way I did it yesterday, see 'replication' branch in my git repository, but it looks like I fumbled the commit so that some of the changes were committed as part of the merge commit with origin/master (=CVS HEAD). Sorry about that. Umm.., I still cannot find the place where the walreceiver module is loaded by using load_external_function() in your 'replication' branch. Also the compilation of that branch fails. Is the 'pushed' branch the latest? Sorry if I'm missing something. Ah, I see. The changes were not included in the merge commit after all, but I had simply forgotten to git add them. Sorry about that, should be there now. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Tuplestore should remember the memory context it's created in
On Tue, Dec 22, 2009 at 11:45 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: ... AFAICS it is always a bug to be in a different memory context in tuplestore_put* than in tuplestore_begin_heap(), so it would be more robust to not put the burden on the callers. ... Patch against CVS HEAD to do that and fix the reported bug attached. Now that the tuplestore_put* switches to the right memory context, we could remove that from all the callers, but this patch only does it for pl_exec.c. Thoughts? I thought there were comments specifically explaining why it was done that way but I don't recall what they said. Perhaps it was a performance concern since it's going to happen for every tuple put in the tuplestore and usually you'll just be in the same memory context anyways. It would certainly be a lot less confusing the way you describe though. -- greg
Re: [HACKERS] Tuplestore should remember the memory context it's created in
On Tue, Dec 22, 2009 at 11:45 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: With regards to this bug report: http://archives.postgresql.org/pgsql-bugs/2009-12/msg00241.php Hm, do we have any build farm members running with work_mem set to the minimum? -- greg
Re: [HACKERS] alpha3 release schedule?
On Tue, 2009-12-22 at 12:32 +0100, Florian Pflug wrote: On 22.12.09 9:34 , Simon Riggs wrote: If you are saying being able to start Hot Standby from a shutdown checkpoint is an important feature for you, then say so, and why. I think it's not so much an important feature but more the removal of a footgun. Image a reporting database where all transactions but a few daily bulk imports are read-only. To spread the load, you do your bulk loads on the master, but run the reporting queries against a read-only HS slave. Now you take the master down for maintenance. Since all clients but the bulk loader use the slave already, and since the bulk loads can be deferred until after the maintenance window closes again, you don't actually do a fail-over. Now you're already pointing at your foot with the gun. All it takes to ruin your day is *some* reason for the slave to restart. Maybe due to a junior DBA's typo, or maybe due to a bug in postgres. Anway, once the slave is down, it won't come up until you manage to get the master up and running again. And this limitation is pretty surprising, since one would assume that if the slave survives a *crash* of the master, it'd certainly survive a simple *shutdown*. Well, you either wait for master to come up again and restart, or you flip into normal mode and keep running queries from there. You aren't prevented from using the server, except by your own refusal to failover. That's not enough for me to raise the priority for this feature. But it was already on the list and remains there now. If someone does add this, it will require careful thought about how to avoid introducing further subtle ways to break HS, all of which will need testing and re-testing to avoid regression. So I'm not personally going to be working on it, for this release and likely the next also, nor will I encourage others to do so, for anyone looking to assist. There are more important things for us to do, IMHO. 
-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] alpha3 release schedule?
On Tue, 2009-12-22 at 11:41 +, Greg Stark wrote:
> On Tue, Dec 22, 2009 at 8:34 AM, Simon Riggs si...@2ndquadrant.com wrote:
>> If you are saying being able to start Hot Standby from a shutdown checkpoint is an important feature for you, then say so, and why.
> Can you explain the consequences of missing this? It sounds to me like if I lose my master and it happened to be while it was shut down for whatever reason then I'll be stuck and won't be able to use my standby. If that's true it seems like it's a major problem. Or does it just mean I would have to follow a different procedure when failing over?

Failover isn't prevented in this case. If we were going to spend time on anything it would be to make failover and switchback easier so that people aren't afraid of it. I've spent a few weeks trying to remove the shutdown checkpoint, but no luck so far. Switchback optimization is probably something for next release now, unless you're looking for a project?

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] New VACUUM FULL
On Tue, 2009-12-22 at 19:45 +0900, Takahiro Itagaki wrote:
> I used VACUUM FULL because we were discussing dropping VFI completely, but I won't replace the behavior if hot standby can support VFI.

HS can't support VFI now, by definition. We agreed to spend the time getting rid of VFI, and working on this with you is part of that. If we can just skip the index rebuild, I think that's all the additional code changes we need. I'll improve the docs as I review-to-commit.

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Streaming replication and non-blocking I/O
Greg Stark wrote:
> On Tue, Dec 22, 2009 at 6:30 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
>> I think we can just use load_external_function() to load the library and call WalReceiverMain from AuxiliaryProcessMain(). Ie. hard-code the library name. Walreceiver is quite tightly coupled with the rest of the backend anyway, so I don't think we need to come up with a pluggable API at the moment.
> Please? I am really interested in replacing walsender and walreceiver with something which uses a communication bus like Spread instead of a single point-to-point connection.

I think you'd still need to be able to request older WAL segments to resync after a lost connection, restore from base backup etc., which don't really fit into a publish/subscribe style communication bus. I'm sure it could all be solved, though. It would be a pretty cool feature, for scaling to a large number of slaves.

> ISTM if we start with something tightly coupled it'll be hard to decouple later. Whereas if we start with a limited interface we'll learn just how much information is really required by the modules and will have fewer surprises later when we find surprising interdependencies.

I'm all ears if you have a concrete proposal. I'm not too worried about it being hard to decouple later. The interface is actually quite limited already, as the communication between processes is done via shared memory. It probably wouldn't be hard to turn it into an API, but I don't think there's a hurry to do that until someone actually steps up to write an alternative walreceiver/walsender.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] alpha3 release schedule?
Simon Riggs wrote:
> If someone does add this, it will require careful thought about how to avoid introducing further subtle ways to break HS, all of which will need testing and re-testing to avoid regression.

Well, I *did* add that, but you removed it...

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Table size does not include toast size
--On 22. Dezember 2009 11:46:32 +0100 Cédric Villemain cedric.villemain.deb...@gmail.com wrote:
> Did you mean: pg_table_size() = base table + toast table, pg_indexes_size() = base indexes + toast indexes?

Since you always have a TOAST index automatically, it makes sense to include it in pg_table_size().

-- Thanks Bernd
Re: [HACKERS] Table size does not include toast size
--On 21. Dezember 2009 12:02:02 -0500 Greg Smith g...@2ndquadrant.com wrote:
> Tom Lane wrote:
>> Perhaps invent pg_table_size() = base table + toast table + toast index and pg_indexes_size() = all other indexes for table giving us the property pg_table_size + pg_indexes_size = pg_total_relation_size
> Right; that's exactly the way I'm computing things now, I just have to crawl way too much catalog data to do it. I also agree that if we provide pg_table_size, the issue of "pg_relation_size doesn't do what I want" goes away without needing to even change the existing documentation--people don't come to that section looking for relation, they're looking for table. Bernd, there's a basic spec if you have time to work on this.

I'll see if I can get some time for it during the Christmas vacation (it's been on my radar for a long time). I'm still working on the NOT NULL pg_constraint representation and would like to propose a patch for that fairly soon.

-- Thanks Bernd
Re: [HACKERS] Streaming replication and non-blocking I/O
On Tue, Dec 22, 2009 at 8:49 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
> Ah, I see. The changes were not included in the merge commit after all, but I had simply forgotten to git add them. Sorry about that, should be there now.

Thanks for doing git push again! But the compilation still fails. Attached patch addresses this problem.

Regards,

-- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center

*** a/src/backend/Makefile
--- b/src/backend/Makefile
***************
*** 34,41 ****
  endif
  
  OBJS = $(SUBDIROBJS) $(LOCALOBJS) $(top_builddir)/src/port/libpgport_srv.a
  
! # We put libpgport into OBJS, so remove it from LIBS; also add libldap and libpq
! LIBS := $(filter-out -lpgport, $(LIBS)) $(LDAP_LIBS_BE) $(libpq)
  
  # The backend doesn't need everything that's in LIBS, however
  LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
--- 34,41 ----
  OBJS = $(SUBDIROBJS) $(LOCALOBJS) $(top_builddir)/src/port/libpgport_srv.a
  
! # We put libpgport into OBJS, so remove it from LIBS; also add libldap
! LIBS := $(filter-out -lpgport, $(LIBS)) $(LDAP_LIBS_BE)
  
  # The backend doesn't need everything that's in LIBS, however
  LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, $(LIBS))
*** a/src/backend/postmaster/walreceiverproc/Makefile
--- b/src/backend/postmaster/walreceiverproc/Makefile
***************
*** 18,24 ****
  OBJS = walreceiverproc.o
  
  SHLIB_LINK = $(libpq)
  NAME = walreceiverproc
  
! all: all-shared-lib
  
  include $(top_srcdir)/src/Makefile.shlib
--- 18,24 ----
  SHLIB_LINK = $(libpq)
  NAME = walreceiverproc
  
! all: submake-libpq all-shared-lib
  
  include $(top_srcdir)/src/Makefile.shlib
Re: [HACKERS] alpha3 release schedule?
On Tue, 2009-12-22 at 16:09 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> If someone does add this, it will require careful thought about how to avoid introducing further subtle ways to break HS, all of which will need testing and re-testing to avoid regression.
> Well, I *did* add that, but you removed it...

It was already on there when you added the second one. It is still there now, even after I removed the duplicate entry. By "add" I meant to write the feature, test it and then support it afterwards, not to re-discuss editing the Wiki.

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Table size does not include toast size
Cédric Villemain cedric.villemain.deb...@gmail.com writes:
> 2009/12/21 Tom Lane t...@sss.pgh.pa.us:
>> Perhaps invent pg_table_size() = base table + toast table + toast index and pg_indexes_size() = all other indexes for table giving us the property pg_table_size + pg_indexes_size = pg_total_relation_size
> Did you mean: pg_table_size() = base table + toast table, pg_indexes_size() = base indexes + toast indexes?

No.

regards, tom lane
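Neither proposed function exists yet at this point; purely as an illustration of the semantics Tom confirms here, the two values could be approximated today from the existing size functions and catalogs (the table name is a placeholder):

```sql
-- Hypothetical sketch: pg_table_size()/pg_indexes_size() as proposed,
-- approximated with functions that already exist.
SELECT pg_relation_size(c.oid)
       + CASE WHEN c.reltoastrelid <> 0
              THEN pg_total_relation_size(c.reltoastrelid)  -- toast heap + toast index
              ELSE 0 END AS table_size,
       (SELECT coalesce(sum(pg_relation_size(indexrelid)), 0)
          FROM pg_index
         WHERE indrelid = c.oid) AS indexes_size            -- all non-TOAST indexes
FROM pg_class c
WHERE c.oid = 'my_table'::regclass;                         -- placeholder name
-- Intended invariant: table_size + indexes_size = pg_total_relation_size('my_table')
```

The TOAST index falls on the table side of the split because pg_total_relation_size() of the TOAST relation already includes it, matching Bernd's point upthread.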
Re: [HACKERS] Tuplestore should remember the memory context it's created in
Greg Stark gsst...@mit.edu writes: On Tue, Dec 22, 2009 at 11:45 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: AFAICS it is always a bug to be in a different memory context in tuplestore_put* than in tuplestore_begin_heap(), so it would be more robust to not put the burden on the callers. I thought there were comments specifically explaining why it was done that way but I don't recall what they said. I think it was just a performance optimization. It's probably not measurable though; even in the in-memory case there's at least a palloc inside the put() function, no? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] alpha3 release schedule?
On 22.12.09 13:21 , Simon Riggs wrote:
> On Tue, 2009-12-22 at 12:32 +0100, Florian Pflug wrote:
>> Imagine a reporting database where all transactions but a few daily bulk imports are read-only. [...] And this limitation is pretty surprising, since one would assume that if the slave survives a *crash* of the master, it'd certainly survive a simple *shutdown*.
> Well, you either wait for master to come up again and restart, or you flip into normal mode and keep running queries from there. You aren't prevented from using the server, except by your own refusal to failover.

Very true. However, that "refusal" as you put it might actually be the most sensible thing to do in a lot of setups. Not everyone needs extreme up-time guarantees, and for those people setting up, testing and *continuously* exercising fail-over is just not worth the effort. Especially since fail-over with asynchronous replication is tricky to get right if you want to avoid data loss. So I still believe that there are very real use-cases for HS where this limitation can be quite a PITA.

But you are of course free to work on whatever you feel like, and probably need to satisfy your client's needs first. So I'm in no way implying that this is a must-fix issue, or that you're in any way obliged to take care of it. I merely wanted to make the point that there *are* valid use-cases where this behavior is not ideal.

best regards, Florian Pflug
Re: [HACKERS] Tuplestore should remember the memory context it's created in
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Patch against CVS HEAD to do that and fix the reported bug attached. Now that the tuplestore_put* switches to the right memory context, we could remove that from all the callers, but this patch only does it for pl_exec.c. BTW, I'm not convinced that the owner-switchery you added to pl_exec.c is necessary/appropriate. Under what circumstances would that be a good idea? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] alpha3 release schedule?
On Tue, Dec 22, 2009 at 3:32 PM, Florian Pflug fgp.phlo@gmail.com wrote:
>> Well, you either wait for master to come up again and restart, or you flip into normal mode and keep running queries from there. You aren't prevented from using the server, except by your own refusal to failover.
> Very true. However, that "refusal" as you put it might actually be the most sensible thing to do in a lot of setups. [...] Especially since fail-over with asynchronous replication is tricky to get right if you want to avoid data loss.

To say nothing of the fact that the replica might not be a suitable master at all. It could be running on inferior hardware, or be on a separate network that is perhaps too slow to reach from production services. HA is not the only use case for HS, or even the main one in my experience.

-- greg
Re: [HACKERS] alpha3 release schedule?
On Tue, 2009-12-22 at 16:32 +0100, Florian Pflug wrote: But you are of course free to work on whatever you feel like, and probably need to satisfy your client's needs first. Alluding to me as whimsical or mercenary isn't likely to change my mind. IMHO this isn't one of the more important features, for the majority, in this release. I do intend to check that. If there are people that believe otherwise, knock yourselves out. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tuplestore should remember the memory context it's created in
Tom Lane wrote: Greg Stark gsst...@mit.edu writes: On Tue, Dec 22, 2009 at 11:45 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: AFAICS it is always a bug to be in a different memory context in tuplestore_put* than in tuplestore_begin_heap(), so it would be more robust to not put the burden on the callers. I thought there were comments specifically explaining why it was done that way but I don't recall what they said. I think it was just a performance optimization. It's probably not measurable though; even in the in-memory case there's at least a palloc inside the put() function, no? Yes. And many of the callers do the memory context switching dance anyway. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] alpha3 release schedule?
On Tue, 2009-12-22 at 15:38 +, Greg Stark wrote: On Tue, Dec 22, 2009 at 3:32 PM, Florian Pflug fgp.phlo@gmail.com wrote: Well, you either wait for master to come up again and restart, or you flip into normal mode and keep running queries from there. You aren't prevented from using the server, except by your own refusal to failover. Very true. However, that refusal as you put it might actually be the most sensible thing to do in a lot of setups. Not everyone needs extreme up-time guarantees, and for those people setting up, testing and *continuously* exercising fail-over is just not worth the effort. Especially since fail-over with asynchronous replication is tricky to get right if you want to avoid data loss. To say nothing that the replica might not be a suitable master at all. It could be running on inferior hardware or be on a separate network perhaps too slow to reach from production services. HA is not the only use case for HS or even the main one in my experience I can invent scenarios in which all the outstanding issues give problems. What I have to do is balance which of those is more likely and which have useful workarounds. This is about priority and in particular, my priority. IMHO my time would be misplaced to work upon this issue, though I will check that other users feel that way also. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] alpha3 release schedule?
Simon Riggs wrote: By add I meant to write the feature, test and then support it afterwards, not to re-discuss editing the Wiki. That's exactly what I meant too. I *did* write the feature, but you removed it before committing. I can extract the removed parts from the git repository and send you as a new patch for review, if you'd like. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] remove redundant ownership checks
2009/12/22 KaiGai Kohei kai...@ak.jp.nec.com: (2009/12/21 12:53), Robert Haas wrote: On Thu, Dec 17, 2009 at 7:19 PM, Tom Lanet...@sss.pgh.pa.us wrote: KaiGai Koheikai...@ak.jp.nec.com writes: [ patch to remove EnableDisableRule's permissions check ] I don't particularly like this patch, mainly because I disagree with randomly removing permissions checks without any sort of plan about where they ought to go. So where ought they to go? If we're going to start moving these checks around we need a very well-defined notion of where permissions checks should be made, so that everyone knows what to expect. I have not seen any plan for that. This seems to me to get right the heart of the matter. When I submitted my machine-readable explain patch, you critiqued it for implementing half of an abstraction layer, and it seems to me that our current permissions-checking logic has precisely the same issue. We consistently write code that starts by checking permissions and then moves right along to implementing the action. Those two things need to be severed. I see two ways to do this. Given a function that (A) does some prep work, (B) checks permissions, and (C) performs the action, we could either: 1. Make the existing function do (A) and (B) and then call another function to do (C), or 2. Make the existing function do (A), call another function to do (B), and then do (C) itself. I'm not sure which will work better, but I think making a decision about which way to do it and how to name the functions would be a big step towards having a coherent plan for this project. We have mixed policy in the current implementation. The point is what database object shall be handled in this function. If we consider a rewrite rule as a database object, not a property of the relation, it seems to me a correct manner to apply permission checks in the EnableDisableRule(), because it handles a given rewrite rule. 
If we consider a rewrite rule as a property of a relation, not an independent database object, it seems to me we should apply permission checks in ATPrepCmd(), which handles a relation, rather than EnableDisableRule(). My patch stands on the latter perspective, because pg_rewrite entries don't have their own ownership and access privileges, and they are always owned by a certain relation.

That's somewhat separate from the point I was making, but it's a good point all the same.

...Robert
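For concreteness, the code path being debated is the one taken when a rule is toggled through ALTER TABLE; a minimal illustration (table and rule names are invented for the example):

```sql
-- Hypothetical example; names are invented.
CREATE TABLE t (id int);
CREATE RULE t_noop AS ON DELETE TO t DO INSTEAD NOTHING;

-- These commands travel through ATPrepCmd() before reaching
-- EnableDisableRule(), which is where the ownership check under
-- discussion currently lives.
ALTER TABLE t DISABLE RULE t_noop;
ALTER TABLE t ENABLE RULE t_noop;
```

KaiGai's argument is that since the rule is addressed through the relation here, the check belongs with the ALTER TABLE machinery rather than in EnableDisableRule() itself.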
Re: [HACKERS] alpha3 release schedule?
On Tue, 2009-12-22 at 18:17 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> By add I meant to write the feature, test and then support it afterwards, not to re-discuss editing the Wiki.
> That's exactly what I meant too. I *did* write the feature, but you removed it before committing.

I removed it because you showed it wouldn't work. If you want to fix that problem, test, commit and support it, go right ahead.

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Tuplestore should remember the memory context it's created in
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Tom Lane wrote: I think it was just a performance optimization. It's probably not measurable though; even in the in-memory case there's at least a palloc inside the put() function, no? Yes. And many of the callers do the memory context switching dance anyway. Yeah, I was just noticing that. We should go around and clean those up if we apply this change. Looking at the CVS history, I think the reason tuplestore doesn't do its own memory context switch is that it was cloned from tuplesort, which didn't either at the time. But several years ago we changed tuplesort to be safer about this (it actually keeps its own memory context now), so it's just inconsistent that tuplestore still exposes the risk. The ownership business is another story though. tuplesort doesn't make any attempt to defend itself against resource-owner changes. If we need this for tuplestore I bet we need it for tuplesort too; but do we? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] alpha3 release schedule?
Simon Riggs wrote:
> On Tue, 2009-12-22 at 18:17 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> By add I meant to write the feature, test and then support it afterwards, not to re-discuss editing the Wiki.
>> That's exactly what I meant too. I *did* write the feature, but you removed it before committing.
> I removed it because you showed it wouldn't work.

I did? I believe this is the discussion that led to you removing it (6th of December, thread "Hot Standby, recent changes"):

Simon Riggs wrote:
> On Sun, 2009-12-06 at 12:32 +0200, Heikki Linnakangas wrote:
>> 4. Need to handle the case where master is started up with wal_standby_info=true, shut down, and restarted with wal_standby_info=false, while the standby server runs continuously. And the code in StartupXLog() to initialize recovery snapshot from a shutdown checkpoint needs to check that too.
> I don't really understand the use case for shutting down the server and then using it as a HS base backup. Why would anyone do that? Why would they have their server down for potentially hours, when they can take the backup while the server is up? If the server is idle, it can be up-and-idle just as easily as down-and-idle, in which case we wouldn't need to support this at all. Adding yards of code for this capability isn't important to me. I'd rather strip the whole lot out than keep fiddling with a low priority area. Please justify this as a real world solution before we continue trying to support it.

The issue I mentioned had nothing to do with starting from a shutdown checkpoint (it's still a problem if you keep the standby running through the restart cycle on the master), but maybe you thought it was? Or was there something else?

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Largeobject Access Controls (r2460)
2009/12/22 KaiGai Kohei kai...@ak.jp.nec.com:
> (2009/12/21 9:39), KaiGai Kohei wrote:
>> (2009/12/19 12:05), Robert Haas wrote:
>>> On Fri, Dec 18, 2009 at 9:48 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>>> Robert Haas robertmh...@gmail.com writes:
>>>>> Oh. This is more complicated than it appeared on the surface. It seems that the string BLOB COMMENTS actually gets inserted into custom dumps somewhere, so I'm not sure whether we can just change it. Was this issue discussed at some point before this was committed? Changing it would seem to require inserting some backward compatibility code here. Another option would be to add a separate section for BLOB METADATA, and leave BLOB COMMENTS alone. Can anyone comment on what the Right Thing To Do is here?
>>>> The BLOB COMMENTS label is, or was, correct for what it contained. If this patch has usurped it to contain other things
>>> It has.
>>>> I would argue that that is seriously wrong. pg_dump already has a clear notion of how to handle ACLs for objects. ACLs for blobs ought to be made to fit into that structure, not dumped in some random place because that saved a few lines of code.
>>> OK. Hopefully KaiGai or Takahiro can suggest a fix.
>
> The patch has grown larger than I expected, because the way large objects are handled is quite different from every other object class. Here are three points:
>
> 1) A new BLOB ACLS section was added. It is a single-purpose section describing the GRANT/REVOKE statements on large objects, and the BLOB COMMENTS section was reverted to hold only descriptions. Because we must assume that a database can hold a massive number of large objects, it is not reasonable to store them using dumpACL(), which chains an ACL entry onto the list of TOC entries before these are dumped; pg_dump might have to register a massive number of large objects in local memory. Currently, we also store GRANT/REVOKE statements in the BLOB COMMENTS section, which is confusing.
> Even if pg_restore is launched with the --no-privileges option, it cannot ignore GRANT/REVOKE statements on large objects. This fix makes it possible to distinguish ACLs on large objects from other properties, and to handle them correctly.
>
> 2) The BLOBS section was separated out per database user. Currently, the BLOBS section carries no information about the owner of the large objects to be restored, so we tried to alter their ownership in the BLOB COMMENTS section, which was incorrect. The --use-set-session-authorization option requires restoring the ownership of objects without ALTER ... OWNER TO statements, so we need to set up the correct database username in the section properties. This patch renames hasBlobs() to getBlobs() and changes its purpose: it registers DO_BLOBS, DO_BLOB_COMMENTS and DO_BLOB_ACLS for each large-object owner, if necessary. For example, if there are five users owning large objects, getBlobs() registers five TOC entries, one per user, and dumpBlobs(), dumpBlobComments() and dumpBlobAcls() are each invoked five times, once per username.
>
> 3) _LoadBlobs(). For regular database object classes, _printTocEntry() can inject an ALTER xxx OWNER TO ... statement on the restored object based on the ownership described in the section header. However, we cannot use this infrastructure for large objects as-is, because one BLOBS section can restore multiple large objects. _LoadBlobs() is the routine that restores the large objects within a given section. This patch modifies it to inject an ALTER LARGE OBJECT loid OWNER TO user statement for each large object, based on the ownership of the section (if --use-set-session-authorization is not given).
>
> $ diffstat pgsql-fix-pg_dump-blob-privs.patch
>  pg_backup_archiver.c |    4
>  pg_backup_custom.c   |   11 !
>  pg_backup_files.c    |    9 !
>  pg_backup_tar.c      |    9 !
>  pg_dump.c            |  312 +++!!
>  pg_dump.h            |    9 !
>  pg_dump_sort.c       |    8 !
>  7 files changed, 68 insertions(+), 25 deletions(-), 269 modifications(!)
I will review this sooner if I have time, but please make sure it gets added to the next CommitFest so we don't lose it. I think it also needs to be added here, since AFAICS this is a must-fix for 8.5: http://wiki.postgresql.org/wiki/PostgreSQL_8.5_Open_Items

Thanks,

...Robert
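For reference, these are the kinds of statements the reworked BLOB ACLS and BLOBS sections have to reproduce on restore; the OID and role name below are placeholders:

```sql
-- Placeholder OID 12345 and role webuser; statements as discussed in
-- this thread for the 8.5 cycle.
GRANT SELECT, UPDATE ON LARGE OBJECT 12345 TO webuser;   -- BLOB ACLS section
ALTER LARGE OBJECT 12345 OWNER TO webuser;               -- injected by _LoadBlobs()
COMMENT ON LARGE OBJECT 12345 IS 'description';          -- BLOB COMMENTS section
```

Splitting these across sections is what lets pg_restore's --no-privileges skip the GRANT without also losing the comments.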
Re: [HACKERS] alpha3 release schedule?
On Tue, 2009-12-22 at 18:40 +0200, Heikki Linnakangas wrote:
> The issue I mentioned had nothing to do with starting from a shutdown checkpoint (it's still a problem if you keep the standby running through the restart cycle on the master), but maybe you thought it was? Or was there something else?

Strangely enough, that exact same problem already happens with archive_mode, and we see a fix coming for that soon also. That fix takes the same approach as HS already takes. HS will flip out when it sees the next record (checkpoint). The only way out is to re-take the base backup, just the same. Even after that fix is applied, HS will still work as well as archive mode, so if anything HS is ahead of other functionality.

Fixing obscure cases where people actively try to get past configuration options is not a priority. I'm not sure why you see it as important, especially when you've argued we don't even need the parameter in the first place. You've been perfectly happy for *years* with the situation that recovery would fail if max_prepared_transactions was not set correctly. You're not going to tell me you never noticed? Why is avoidance of obvious misconfiguration of HS such a heavy priority when nothing else ever was?

I'm going to concentrate on fixing important issues. I'd rather you helped with those.

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Small change of the HS document
On Tue, 2009-12-22 at 11:25 +0900, Fujii Masao wrote:
> I found there is no <primary> index tag for the HS parameters in config.sgml. Attached patch adds that tag.

Thanks

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] alpha3 release schedule?
Simon Riggs wrote:
> You've been perfectly happy for *years* with the situation that recovery would fail if max_prepared_transactions was not set correctly. You're not going to tell me you never noticed? Why is avoidance of obvious misconfiguration of HS such a heavy priority when nothing else ever was?

That's not a priority, and I never said it was. It almost sounds like we're in violent agreement: this issue of flipping wal_standby_info in the master has nothing to do with the removal of the capability to start the standby from a shutdown checkpoint. So what *was* the reason? Was there something wrong with it? If not, please put it back.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Backup history file should be replicated in Streaming Replication?
On Thu, 2009-11-26 at 17:02 +0900, Fujii Masao wrote:
> On Thu, Nov 26, 2009 at 4:55 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
>> Fujii Masao wrote:
>>> In current SR, since a backup history file is not replicated, the standby always starts an archive recovery without a backup history file, and a wrong minRecoveryPoint might be used. This is not a problem for SR itself, but would cause trouble when SR cooperates with Hot Standby.
>> But the backup history file is included in the base backup you start replication from, right?
> No. A backup history file is created by pg_stop_backup(). So it's not included in the base backup.

The backup history file is a slightly quirky way of doing things and was designed when the transfer mechanism was file-based. Why don't we just write a new xlog record that contains the information we need? Copy the contents of the backup history file into the WAL record, just like we do with prepared transactions. That way it will be streamed to the standby without any other code being needed for SR, while we don't need to retest warm standby to check that it still works also.

(The thread diverges onto a second point, and this first point seems to have been a little forgotten.)

-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Backup history file should be replicated in Streaming Replication?
On Tue, 2009-12-22 at 14:40 +0900, Fujii Masao wrote: On Sat, Dec 19, 2009 at 1:03 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I don't think it's worthwhile to modify pg_stop_backup() like that. We should address the general problem. At the moment, you're fine if you also configure WAL archiving and log file shipping, but it would be nice to have some simpler mechanism to avoid the problem. For example, a GUC in master to retain all log files (including backup history files) for X days. Or some way to register the standby with the master so that the master knows it's out there, and still needs the logs, even when it's not connected. I propose the new GUC replication_reserved_segments (better name?) which determines the maximum number of WAL files held for the standby. Design: * Only the WAL files which are replication_reserved_segments segments older than the current write segment can be recycled. IOW, we can think of a standby which falls replication_reserved_segments segments behind as still connected to the primary, and the WAL files needed for the active standby are not recycled. (I don't fully understand your words above, sorry.) Possibly an easier way would be to have a size limit, not a number of segments. Something like max_reserved_wal = 240GB. We made that change to shared_buffers a few years back and it was really useful. The problem I foresee is that doing it this way puts an upper limit on the size of database we can replicate. While we do the base backup and transfer it we must store WAL somewhere. Forcing us to store the WAL on the master while this happens could be very limiting. * Disjoin the standby which falls more than replication_reserved_segments segments behind, in order to avoid the disk full failure, i.e., the primary server's PANIC error. This would only be possible in the asynchronous replication case. Or at the start of replication.
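For a sense of scale, the two proposals (a segment count vs. a size limit) are easy to convert between, since WAL segments are 16 MB by default. A minimal sketch; the parameter names `replication_reserved_segments` and `max_reserved_wal` are just the examples floated in this thread, not settled names:

```python
# Default WAL segment size in PostgreSQL (16 MB).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024

def segments_to_bytes(n_segments):
    """Disk space pinned in pg_xlog by reserving n_segments WAL files."""
    return n_segments * WAL_SEGMENT_SIZE

def size_limit_to_segments(max_reserved_wal_bytes):
    """Whole segments covered by a byte-sized limit like max_reserved_wal."""
    return max_reserved_wal_bytes // WAL_SEGMENT_SIZE

# Reserving 1024 segments pins 16 GB of pg_xlog on the master.
print(segments_to_bytes(1024) // 1024 ** 3)         # 16

# The 240 GB example from the thread corresponds to 15360 segments.
print(size_limit_to_segments(240 * 1024 ** 3))      # 15360
```

Either form puts the same hard cap on how far behind a standby may fall before it is disconnected; the difference is only which unit the DBA reasons in.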
-- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Tuplestore should remember the memory context it's created in
Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Patch against CVS HEAD to do that and fix the reported bug attached. Now that the tuplestore_put* switches to the right memory context, we could remove that from all the callers, but this patch only does it for pl_exec.c. BTW, I'm not convinced that the owner-switchery you added to pl_exec.c is necessary/appropriate. Under what circumstances would that be a good idea? A PL/pgSQL function normally runs in whatever resource owner is current when the function is called. When we allocate the tuplestore for return tuples, it's associated with the current resource owner. But if you have an exception block, we start a new subtransaction and switch to the subtransaction resource owner. If you have a RETURN NEXT/QUERY in the block, the tuplestore (or the temporary file backing it, to be precise) is initialized into the subtransaction resource owner, which is released at subtransaction commit. The subtransaction resource owner is not the right owner for the tuplestore holding return tuples. We already take care to use the right memory context for the tuplestore, but now that temp files are associated with resource owners, we need to use the right resource owner as well. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Segfault from PL/Perl Returning vstring
On Dec 21, 2009, at 9:04 PM, Andrew Dunstan wrote: I cannot reproduce this. I tested with perl 5.10.1 which is the latest reported stable release at http://www.cpan.org/src/README.html, on an 8.4.2 UTF8 database, and with the same Safe and Encode module versions as above. I've replicated it all the way back to 8.0. I'd be happy to give you a login to my box. Best, David
Re: [HACKERS] alpha3 release schedule?
Simon Riggs wrote: On Mon, 2009-12-21 at 18:42 +0900, Hiroyuki Yamada wrote: Do you think this problem is must-fix for the final release ? We should be clear that this is a behaviour I told you about, not a shock discovery by yourself. There is no permanent freeze, just a wait, from which the Startup process wakes up at the appropriate time. There is no crash or hang as is usually implied by the word freeze. It remains to be seen whether this is a priority for usability enhancement in this release. There are other issues as well and it is doubtful that every user will be fully happy with the functionality in this release. I will work on things in the order in which I understand them to be important for the majority, given my time and budget constraints and the resolvability of the issues. When you report bugs, I say thanks. When you start agitating about already-documented restrictions and I see which other software you promote, I think you may have other motives. Regrettably that reduces the weight I give your claims, in relation to other potential users. Simon, where did this come from? Other software? I think Simon's comments are way off base here and only serve to increase tension in this discussion. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] Backup history file should be replicated in Streaming Replication?
On Wed, Dec 23, 2009 at 2:42 AM, Simon Riggs si...@2ndquadrant.com wrote: The backup history file is a slightly quirky way of doing things and was designed when the transfer mechanism was file-based. Why don't we just write a new xlog record that contains the information we need? Copy the contents of the backup history file into the WAL record, just like we do with prepared transactions. That way it will be streamed to the standby without any other code being needed for SR, and we don't need to retest warm standby to check that it still works. This means that we can replace a backup history file with the corresponding xlog record. I think that we should simplify the code by making the replacement complete rather than just adding a new xlog record. Thought? BTW, in the current SR code, the capability to replicate a backup history file has been implemented. But if there is a better and simpler idea, I'll adopt it. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Backup history file should be replicated in Streaming Replication?
On Wed, 2009-12-23 at 03:28 +0900, Fujii Masao wrote: On Wed, Dec 23, 2009 at 2:42 AM, Simon Riggs si...@2ndquadrant.com wrote: The backup history file is a slightly quirky way of doing things and was designed when the transfer mechanism was file-based. Why don't we just write a new xlog record that contains the information we need? Copy the contents of the backup history file into the WAL record, just like we do with prepared transactions. That way it will be streamed to the standby without any other code being needed for SR, and we don't need to retest warm standby to check that it still works. This means that we can replace a backup history file with the corresponding xlog record. I think that we should simplify the code by making the replacement complete rather than just adding a new xlog record. Thought? We can't do that because it would stop file-based archiving from working. I don't think we should deprecate that for another release at least. -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Small Bug in GetConflictingVirtualXIDs
On Tuesday 22 December 2009 11:42:30 Simon Riggs wrote: On Tue, 2009-12-22 at 03:19 +0100, Andres Freund wrote: On Monday 21 December 2009 16:48:52 Simon Riggs wrote: Giving the drop database a snapshot is not the answer. I expect Andres to be able to fix this with a simple patch that would not affect the case of normal running. Actually it's less simple than I had thought at first - I don't think the code ever handled that correctly. I might be wrong there, my knowledge of the involved code is a bit sparse... The whole conflict resolution builds on the concept of waiting for a VXid, but an idle backend does not have a valid vxid. That's correct, right? Yes, that's correct. I'll take this one back then. So you're writing a fix, or shall I? Andres
Re: [HACKERS] alpha3 release schedule?
On Tue, 2009-12-22 at 19:30 +0200, Heikki Linnakangas wrote: Simon Riggs wrote: You've been perfectly happy for *years* with the situation that recovery would fail if max_prepared_transactions was not set correctly. You're not going to tell me you never noticed? Why is avoidance of obvious misconfiguration of HS such a heavy priority when nothing else ever was? That's not a priority, and I never said it was. It almost sounds like we're in violent agreement: this issue of flipping wal_standby_info in the master has nothing to do with the removal of the capability to start standby from a shutdown checkpoint. I removed the capability to start at shutdown checkpoints because you said it would cause this bug. That gives two choices: fix the bug, or remove the feature. I don't think it is a priority to support that feature, so I removed it in favour of other work. I will work on issues in priority order and this was already on the list and remains so. I don't have endless time, so realistically, given its current priority, it is unlikely to be addressed in this release. -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] alpha3 release schedule?
On 22.12.09 16:45 , Simon Riggs wrote: On Tue, 2009-12-22 at 16:32 +0100, Florian Pflug wrote: But you are of course free to work on whatever you feel like, and probably need to satisfy your client's needs first. Alluding to me as whimsical or mercenary isn't likely to change my mind. Simon, you *completely* misunderstood my last paragraph! I never intended to call you whimsical or mercenary, and I honestly don't believe I did. The only thing I attributed to you was seeing HS mostly as a solution for HA setups, whereas I felt that it has quite a few use-cases besides that. Plus that your view of what the important use-cases are is influenced by the projects you usually work on, and that it's perfectly reasonable for your priorities to reflect that view. None of this was meant as an insult of any kind. best regards, Florian Pflug
[HACKERS] Streaming Rep - 2-phase backups and reducing time to full replication
Some ideas to improve the current behaviour of SR http://wiki.postgresql.org/wiki/Streaming_Replication The current startup process is copied below. (7) raises some issues if it is a very long step, notably that the master may fill with data and then break off the connection before replication is fully configured. 7. Make a base backup of the primary server, load this data onto the standby. 8. Set up XLOG archiving, connections and authentication in the standby server like the primary, so that the standby might work as a primary after failover. 9. Create a recovery command file in the standby server; the following parameters are required for streaming replication. 10. Start postgres in the standby server. It will start streaming replication. It occurs to me to ask which files we need as a minimum before we can begin step (10)? If we could start step (10) before step (7) is complete then we would avoid many of our space problems and replication would enter safe mode considerably sooner, in some cases dozens of hours earlier. We read the recovery.conf at the start of StartupXLog(). So by that point we need only the following files: * All *.conf files * pg_control (and probably much of the rest of the global/ directory) Some very quick surgery on a current-version data directory shows this is correct, apart from the call to RelationCacheInitFileRemove(), which can be altered to accept a missing directory as proof that the file has been removed. If we then think of the starting procedure as happening in two parts: i) sufficient startup to get to the point where we bring up the walreceiver, while the startup process awaits further confirmation ii) following further confirmation, the startup process begins recovering the database
So if we do the base backup in two stages, the sequence of actions could become: 9. Create a recovery command file in the standby server with parameters required for streaming replication. 7. (a) Make a base backup of minimal essential files from the primary server, load this data onto the standby. 10. Start postgres in the standby server. It will start streaming replication. 7. (b) Continue with the second phase of the base backup, copying all remaining files, ending with pg_stop_backup(). * The next step is to waken the startup process so it can continue recovery. We don't need to introduce another call, we just need a mechanism for telling the startup process to sleep because we are doing a two-phase backup, and another mechanism for waking it when the whole backup is complete. That sounds like a recovery.conf parameter and an additional kind of trigger file, perhaps the backup file? This seems like a simple and useful option for 8.5 -- Simon Riggs www.2ndQuadrant.com
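The "dozens of hours earlier" claim above is easy to motivate with back-of-the-envelope arithmetic. A minimal sketch, with made-up illustrative numbers (cluster size and transfer rate are assumptions, not figures from the thread): the one-phase scheme cannot begin streaming until the whole base backup has been copied, and the master must retain every WAL segment generated during that window.

```python
def base_backup_hours(db_size_gb, transfer_mb_per_s):
    """Hours to copy a base backup at a sustained transfer rate."""
    return db_size_gb * 1024 / transfer_mb_per_s / 3600

def wal_retained_gb(copy_hours, wal_mb_per_s):
    """WAL (in GB) the master must hold while the copy is in flight."""
    return copy_hours * 3600 * wal_mb_per_s / 1024

# Hypothetical: a 2 TB cluster copied at a sustained 50 MB/s takes
# roughly half a day, during which the one-phase scheme buffers all WAL.
hours = base_backup_hours(2048, 50)
print(round(hours, 2))                      # ~11.65 hours

# At a modest 2 MB/s of WAL generation, that is ~82 GB of pg_xlog
# the master must not recycle before streaming can start.
print(round(wal_retained_gb(hours, 2)))     # ~82
```

Under the two-phase scheme the standby starts consuming WAL almost immediately after step 7(a), so this retention window shrinks to roughly the time needed to copy the minimal essential files.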
Re: [HACKERS] Tuplestore should remember the memory context it's created in
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Tom Lane wrote: BTW, I'm not convinced that the owner-switchery you added to pl_exec.c is necessary/appropriate. Under what circumstances would that be a good idea? A PL/pgSQL function normally runs in whatever resource owner is current when the function is called. When we allocate the tuplestore for return tuples, it's associated with the current resource owner. But if you have an exception block, we start a new subtransaction and switch to the subtransaction resource owner. If you have a RETURN NEXT/QUERY in the block, the tuplestore (or the temporary file backing it, to be precise) is initialized into the subtransaction resource owner, which is released at subtransaction commit. Got it. So doesn't tuplesort have the same issue? The patch definitely requires more than zero comments. regards, tom lane
Re: [HACKERS] Backup history file should be replicated in Streaming Replication?
On Wed, Dec 23, 2009 at 2:49 AM, Simon Riggs si...@2ndquadrant.com wrote: (I don't fully understand your words above, sorry.) NM ;-) Possibly an easier way would be to have a size limit, not a number of segments. Something like max_reserved_wal = 240GB. We made that change to shared_buffers a few years back and it was really useful. For me, a number of segments is more intuitive, because checkpoint_segments, which is closely connected with the new parameter, still indicates a number of segments. But if more people like a size limit, I'll make that change. The problem I foresee is that doing it this way puts an upper limit on the size of database we can replicate. While we do the base backup and transfer it we must store WAL somewhere. Forcing us to store the WAL on the master while this happens could be very limiting. Yes. If the size of pg_xlog is small relative to the size of the database, the primary would not be able to hold all the WAL files required for the standby, so we would need to use the restore_command, which retrieves the WAL files from the master's archival area. I'm OK with such an extra operation being required in that special case, for now. * Disjoin the standby which falls more than replication_reserved_segments segments behind, in order to avoid the disk full failure, i.e., the primary server's PANIC error. This would only be possible in the asynchronous replication case. Or at the start of replication. Yes. I think that avoidance of the primary's PANIC error should be given priority over a smooth start of replication. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] alpha3 release schedule?
On Tue, 2009-12-22 at 19:53 +0100, Florian Pflug wrote: None of this was meant as an insult of any kind. Then I apologise completely. I've clearly been working too hard and will retire for some rest (even though that is not listed as a task on the Wiki). -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Backup history file should be replicated in Streaming Replication?
Simon Riggs si...@2ndquadrant.com writes: The backup history file is a slightly quirky way of doing things and was designed when the transfer mechanism was file-based. Why don't we just write a new xlog record that contains the information we need? Certainly not. The history file is, in the restore-from-archive case, needed to *find* the xlog data. However, it's not clear to me why SR should have any need for it. It's not doing restore from archive. regards, tom lane
Re: [HACKERS] alpha3 release schedule?
On Dec 22, 2009, at 11:02 AM, Simon Riggs wrote: I've clearly been working too hard and will retire for some rest (even though that is not listed as a task on the Wiki). Someone add it! David
Re: [HACKERS] Backup history file should be replicated in Streaming Replication?
On Wed, Dec 23, 2009 at 3:41 AM, Simon Riggs si...@2ndquadrant.com wrote: This means that we can replace a backup history file with the corresponding xlog record. I think that we should simplify the code by making the replacement complete rather than just adding a new xlog record. Thought? We can't do that because it would stop file-based archiving from working. I don't think we should deprecate that for another release at least. Umm... ISTM that we can do that even in the file-based archiving case; * pg_stop_backup writes the xlog record corresponding to a backup history file, and requests the WAL file switch. * In PITR or warm standby, the startup process just marks the database as inconsistent until it has read that xlog record. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] alpha3 release schedule?
On Tue, Dec 22, 2009 at 11:04:29AM -0800, David Wheeler wrote: On Dec 22, 2009, at 11:02 AM, Simon Riggs wrote: I've clearly been working too hard and will retire for some rest (even though that is not listed as a task on the Wiki). Someone add it! Done! :) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Backup history file should be replicated in Streaming Replication?
On Tue, 2009-12-22 at 14:02 -0500, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: The backup history file is a slightly quirky way of doing things and was designed when the transfer mechanism was file-based. Why don't we just write a new xlog record that contains the information we need? Certainly not. The history file is, in the restore-from-archive case, needed to *find* the xlog data. However, it's not clear to me why SR should have any need for it. It's not doing restore from archive. Definitely should not make this *just* in WAL, files still required, agreed. It's needed to find the place where the backup stopped, so it defines the safe stopping point. We could easily pass that info via WAL, when streaming. It doesn't actually matter until we try to failover. -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Backup history file should be replicated in Streaming Replication?
On Wed, Dec 23, 2009 at 4:09 AM, Simon Riggs si...@2ndquadrant.com wrote: It's needed to find the place where the backup stopped, so it defines the safe stopping point. We could easily pass that info via WAL, when streaming. It doesn't actually matter until we try to failover. Right. And, it's also needed to cooperate with HS which begins accepting read-only queries after a recovery reaches that safe stopping point. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] Backup history file should be replicated in Streaming Replication?
On Wed, 2009-12-23 at 04:15 +0900, Fujii Masao wrote: On Wed, Dec 23, 2009 at 4:09 AM, Simon Riggs si...@2ndquadrant.com wrote: It's needed to find the place where the backup stopped, so it defines the safe stopping point. We could easily pass that info via WAL, when streaming. It doesn't actually matter until we try to failover. Right. And, it's also needed to cooperate with HS which begins accepting read-only queries after a recovery reaches that safe stopping point. Agreed, hence my interest! -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Backup history file should be replicated in Streaming Replication?
Fujii Masao masao.fu...@gmail.com writes: On Wed, Dec 23, 2009 at 4:09 AM, Simon Riggs si...@2ndquadrant.com wrote: It's needed to find the place where the backup stopped, so it defines the safe stopping point. We could easily pass that info via WAL, when streaming. It doesn't actually matter until we try to failover. Right. And, it's also needed to cooperate with HS which begins accepting read-only queries after a recovery reaches that safe stopping point. OK, so the information needed in the WAL record doesn't even include most of what is in the history file, correct? What you're actually talking about is identifying a WAL position. Maybe you'd want to provide the backup label for identification purposes, but not much else. In that case I concur that this is a better solution than hacking up something to pass over the history file. regards, tom lane
Re: [HACKERS] Segfault from PL/Perl Returning vstring
David E. Wheeler wrote: On Dec 21, 2009, at 9:04 PM, Andrew Dunstan wrote: I cannot reproduce this. I tested with perl 5.10.1 which is the latest reported stable release at http://www.cpan.org/src/README.html, on an 8.4.2 UTF8 database, and with the same Safe and Encode module versions as above. I've replicated it all the way back to 8.0. I'd be happy to give you a login to my box. If a trivial plperl function can crash the server this is surely worth some research. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] LIKE INCLUDING COMMENTS code is a flight of fancy
Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp writes: Tom Lane t...@sss.pgh.pa.us wrote: I suggest that we might want to just rip out the support for copying comments on indexes. We have two related ToDo items below. They are a bit inconsistent, but they mean we should forbid COMMENT on columns of an index, or must have full support of the feature. Which direction should we go? As for me, forbidding comments on index columns seems to be the saner way, because an index can have arbitrary key names in some cases. - Forbid COMMENT on columns of an index Postgres currently allows comments to be placed on the columns of an index, but pg_dump doesn't handle them and the column names themselves are implementation-dependent. http://archives.postgresql.org/message-id/27676.1237906...@sss.pgh.pa.us - pg_dump / pg_restore: Add dumping of comments on index columns and composite type columns http://archives.postgresql.org/pgsql-hackers/2009-03/msg00931.php (XXX: Comments on composite type columns can work now?) I'm for forbidding comments on index columns. The amount of work required to support the feature fully seems far out of proportion to its value. In any case, if pg_dump drops such comments (which I had forgotten, but it seems true after a quick look at the code), then we could certainly get away with having LIKE not copy them. That would fix the immediate problem. regards, tom lane
Re: [HACKERS] Tuplestore should remember the memory context it's created in
Tom Lane wrote: Got it. So doesn't tuplesort have the same issue? Tuplesort has the same general problem that the caller of puttuple needs to be in the right resource owner. Which ought to be fixed, especially since tuplesort doesn't require that for the memory context anymore. But we don't use tuplesort to return tuples from functions, so it's not broken in a user-visible way. Unless you can think of another scenario like that. The patch definitely requires more than zero comments. Sure. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Backup history file should be replicated in Streaming Replication?
Simon Riggs wrote: On Wed, 2009-12-23 at 04:15 +0900, Fujii Masao wrote: On Wed, Dec 23, 2009 at 4:09 AM, Simon Riggs si...@2ndquadrant.com wrote: It's needed to find the place where the backup stopped, so it defines the safe stopping point. We could easily pass that info via WAL, when streaming. It doesn't actually matter until we try to failover. Right. And, it's also needed to cooperate with HS which begins accepting read-only queries after a recovery reaches that safe stopping point. Agreed, hence my interest! Yeah, that's a great idea. I was just having a chat with Magnus this morning, and he asked whether the current patch already provides this, or whether it would be possible to write a stand-alone utility to connect to a master and stream WAL files to an archive directory, without setting up a full-blown standby instance. We came to the conclusion that backup history files wouldn't be copied as the patch stands, because the standby has to specifically request them. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Backup history file should be replicated in Streaming Replication?
Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: The backup history file is a slightly quirky way of doing things and was designed when the transfer mechanism was file-based. Why don't we just write a new xlog record that contains the information we need? Certainly not. The history file is, in the restore-from-archive case, needed to *find* the xlog data. Hmm, not really. The backup_label file tells where the checkpoint record is. And that is still needed. AFAICS the backup history file is only needed to determine the point where the backup was completed, ie. the minimum safe stopping point for WAL replay. I think we could get away without the backup history file altogether. It's kind of nice to have them in the archive directory, for the DBA, to easily see the locations where base backups were taken. But if we write the backup stop location in WAL, the system doesn't really need the history file for anything. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] LIKE INCLUDING COMMENTS code is a flight of fancy
Tom Lane wrote: I'm for forbidding comments on index columns. The amount of work required to support the feature fully seems far out of proportion to its value. In any case, if pg_dump drops such comments (which I had forgotten, but it seems true after a quick look at the code), then we could certainly get away with having LIKE not copy them. That would fix the immediate problem. If we're going to keep the comments we should dump them. I don't mind dropping them altogether - it's hardly a killer feature. We should just be consistent, IMNSHO. cheers andrew
Re: [HACKERS] Tuplestore should remember the memory context it's created in
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Tom Lane wrote: Got it. So doesn't tuplesort have the same issue? Tuplesort has the same general problem that the caller of puttuple needs to be in the right resource owner. Which ought to be fixed, especially since tuplesort doesn't require that for the memory context anymore. But we don't use tuplesort to return tuples from functions, so it's not broken in a user-visible way. Unless you can think of another scenario like that. (1) create a cursor whose plan involves a sort that will spill to disk (2) enter subtransaction (3) fetch from cursor (causing the sort to actually happen) (4) leave subtransaction (5) fetch some more from cursor Too busy to develop a test case right now, but ISTM it ought to fail. regards, tom lane
Re: [HACKERS] Possible patch for better index name choosing
I wrote: Attached is a WIP patch for addressing the problems mentioned in this thread: http://archives.postgresql.org/pgsql-hackers/2009-12/msg01764.php ... There is one thing that is not terribly nice about the behavior, which is that CREATE TABLE LIKE INCLUDING INDEXES is unable to generate smart names for expression indexes; ... The reason for this is that the patch depends on FigureColname which works on untransformed parse trees, and we don't have access to such a tree when copying an existing index. There seem to be three possible responses to that: ... 3. Implement a separate FigureIndexColname function that works as much like FigureColname as it can, but takes a transformed parse tree. I fooled around with this solution and decided that it is a lot messier than it's worth. In the first place, we can't make a FigureColname-like function that just takes a transformed tree: there is no way to interpret Vars without some context. You need at least a table OID, and more than that if you'd like to handle cases like multiple-relation expressions or non-RELATION RTEs. For the case at hand of index expressions, a table OID would be enough, but that doesn't leave much room for imagining the function could be used for anything else in future. Worse, for the problematic case (CREATE TABLE LIKE) we actually do not have a table OID because the target table doesn't exist yet. We could finesse that by passing the source table's OID instead, but that seems pretty klugy itself. In the second place, the number of corner cases where we'd generate output different from FigureColname is much greater than I realized. As an example, if foo is a type name then foo(x) and x::foo produce the same parsed tree, but FigureColname will treat them differently. Seeing that CREATE TABLE LIKE doesn't try to reproduce the source table's index names anyway, I'm inclined to just go with the patch as-is and not try to make it handle this one case nicely. 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backup history file should be replicated in Streaming Replication?
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: I think we could get away without the backup history file altogether. Hmmm ... actually I was confusing these with timeline history files, which are definitely not something we can drop. You might be right that the backup history file could be part of WAL instead. On the other hand, it's quite comforting that the history file is plain ASCII and can be examined without any special tools. I'm -1 for removing it, even if we decide to duplicate the info in a WAL record. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backup history file should be replicated in Streaming Replication?
* Heikki Linnakangas heikki.linnakan...@enterprisedb.com [091222 15:47]: I was just having a chat with Magnus this morning, and he asked if the current patch already provides or if it would be possible to write a stand-alone utility to connect to a master and stream WAL files to an archive directory, without setting up a full-blown standby instance. We came to the conclusion that backup history files wouldn't be copied as the patch stands, because the standby has to specifically request them. Please, please, please... I've been watching the SR from the sidelines, basically because I want my WAL fsync'ed on 2 physically separate machines before the client's COMMIT returns... And no, I'm not really interested in trying to do raid over NBD or DRBD, and deal with the problems and pitfalls that entails... Being able to write a utility that connects as an SR client, but just synchronously writes WAL into an archive directory setup is exactly what I want... And once SR has settled, it's something I'm interested in working on... a. -- Aidan Van Dyk ai...@highrise.ca http://www.highrise.ca/ Create like a god, command like a king, work like a slave.
Re: [HACKERS] LIKE INCLUDING COMMENTS code is a flight of fancy
Andrew Dunstan and...@dunslane.net writes: Tom Lane wrote: I'm for forbidding comments on index columns. The amount of work required to support the feature fully seems far out of proportion to its value. In any case, if pg_dump drops such comments (which I had forgotten, but it seems true after a quick look at the code), then we could certainly get away with having LIKE not copy them. That would fix the immediate problem. If we're going to keep the comments we should dump them. I don't mind dropping them altogether - it's hardly a killer feature. We should just be consistent, IMNSHO. Well, let's just forbid them then. It'll take just a few extra lines in comment.c to throw an error if the target table has the wrong relkind. Then we can pull out the troublesome code in parse_utilcmd.c. It strikes me also that this changes the terms of discussion for the other patch I was working on. I was mistakenly assuming that we could not change the naming convention for individual index columns because it would cause errors during dump/reload of per-column comments. But since pg_dump never has supported that, there is no such risk. I propose re-instating my previous idea of replacing pg_expression_n with the name chosen by FigureColname. This not only makes the index column names a bit more useful, but it fixes the disadvantage I was on about for CREATE TABLE LIKE: it can get the FigureColname name from the index's pg_attribute entry, so there's no problem with producing smart index-expression column names when cloning indexes. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Backup history file should be replicated in Streaming Replication?
On Tue, 2009-12-22 at 22:46 +0200, Heikki Linnakangas wrote: Simon Riggs wrote: On Wed, 2009-12-23 at 04:15 +0900, Fujii Masao wrote: On Wed, Dec 23, 2009 at 4:09 AM, Simon Riggs si...@2ndquadrant.com wrote: It's needed to find the place where the backup stopped, so it defines the safe stopping point. We could easily pass that info via WAL, when streaming. It doesn't actually matter until we try to failover. Right. And, it's also needed to cooperate with HS which begins accepting read-only queries after a recovery reaches that safe stopping point. Agreed, hence my interest! Yeah, that's a great idea. I was just having a chat with Magnus this morning, and he asked if the current patch already provides or if it would be possible to write a stand-alone utility to connect to a master and stream WAL files to an archive directory, without setting up a full-blown standby instance. We came to the conclusion that backup history files wouldn't be copied as the patch stands, because the standby has to specifically request them. There isn't any need to write that utility. Read my post about 2-phase backup and you'll see we are a couple of lines of code away from that. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] join ordering via Simulated Annealing
Hi, I've been playing with using a Simulated Annealing-type algorithm for determining join ordering for relations. To get into context see http://archives.postgresql.org/pgsql-hackers/2009-05/msg00098.php (there's also a TODO in the wiki). There's a nice paper on that in http://reference.kfupm.edu.sa/content/h/e/heuristic_and_randomized_optimization_fo_87585.pdf (also linked from that thread) and someone even wrote a patch: http://archives.postgresql.org/pgsql-hackers/2009-05/msg00736.php This generally aims at being able to replace GEQO. I have some rough prototype code, but I'm not even asking people to look at it. It's stuffed with silly things and writes three Graphviz-style .dot files in /tmp for each algorithm step. But I'd like to at least present the idea. There are three main problems that have to be solved to get a good Simulated Annealing implementation:
 o) choosing the starting point for the algorithm
 o) generating the next point
 o) computing the cost in the current point
The code I have now creates the initial plan by doing something similar to what gimme_tree does in GEQO, but without any kind of heuristic to try to favour joins with join restrictions (the idea is that it doesn't matter, since we only want to get *any* plan and we only do it once), but ideally it would be somehow chosen randomly from the space of all possible join orderings. I keep a binary tree of relations in memory, where leaves are RelOptInfos with 1 relid and the root is a RelOptInfo with all relids. Each iteration of the algorithm picks two nodes at random (with some restrictions, but that's not important) and tries to swap them around, meaning that a tree like (use a monospace font for best results):

          (1 2 3 4)
      *(1 2)     (3 4)
     (1)  (2)  *(3) (4)

where the parenthesised things are the two chosen nodes would get transformed into

          (1 2 3 4)
       (3)      (1 2 4)
             (1 2)    (4)
            (1)  (2)

that is, the (1 2) node and its subtree get swapped with the (3) node and the upper-level nodes get changed accordingly.
Sometimes a swap like that will produce an invalid join ordering - the swap is then reverted. If the join order given by the tree after the swap is legal, the paths are recomputed, much like in geqo_eval, and if the root node's cheapest path is not cheaper than before the swap, the swap gets reverted. Simulated Annealing algorithms permit uphill moves in terms of cost, with some probability that's decreasing as time passes, but that's not implemented yet. After a fixed amount of moves, the algorithm stops and returns the root node of the tree as the single RelOptInfo. The issues with the approach are:
 o) the initial state is not really a random plan, it's usually a left-deep tree (and is very inefficient) and this might skew results.
 o) is swapping random nodes like that a good way of exploring the solution space? The SA paper suggests something much simpler, but some of the moves proposed there don't really make sense when using the make_join_rel machinery:
   *) changing the inner and outer relation of a join doesn't make sense, because make_join_rel is symmetrical
   *) changing the join executing strategy (nested loop, merge join, etc.) doesn't make sense, because make_join_rel considers all possible paths for a join
 o) each swap needs a full recalculation of the whole solution (geqo_eval does that, for instance), maybe it's possible to leave the subtrees of swapped nodes intact and only recalculate the nodes above the two swapped nodes?
 o) because make_join_rel scribbles on the PlannerInfo, the same hack as in geqo_eval is necessary: truncating join_rel_list after building all joinrels and nulling out the join_rel_hash
 o) I use make_join_rel to determine whether a join is legal, which does lots of other things. That looks like wasted effort, although in the end each time I need to build the final rel to assess the resulting path's cost.
To follow the SA algorithm spirit more closely it would be necessary to only consider one, random path for each relation and make using different paths a move that the algorithm can choose while exploring the solution space. A cheaper way of determining the current solution's cost would be nice, too - fully rebuilding the final RelOptInfo after each move is annoying. Lastly, I'm lacking good testcases or even a testing approach: I'm generating silly queries and looking at how they get optimised, but if someone has a real dataset and examples of joins that cannot be planned with the standard planner, I would be interested to compare the results my prototype gets with those produced by GEQO. The code, a module that you can LOAD into a backend, is here (if you really want to see it): http://git.wulczer.org/?p=saio.git Set the saio GUC to true and saio_cutoff to the number of iterations you want. Cheers, Jan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
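The not-yet-implemented uphill-move rule described above is usually the Metropolis acceptance criterion: always accept a cheaper solution, and accept a more expensive one with probability exp(-delta/temperature), which shrinks as the temperature cools. A minimal Python sketch, detached from the planner (the cost function, cooling schedule, and all names here are illustrative, not taken from the saio code):

```python
import math
import random

def anneal(cost, propose, steps=2000, start_temp=10.0):
    """Generic simulated-annealing loop over a scalar cost."""
    for i in range(steps):
        candidate = propose(cost)        # stand-in for a random node swap
        delta = candidate - cost
        # linear cooling schedule; clamp so we never divide by zero
        temp = max(start_temp * (1.0 - i / steps), 1e-9)
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            cost = candidate             # keep the move
        # otherwise the move is reverted, i.e. we keep the old cost
    return cost

random.seed(0)
# toy cost landscape: each "move" perturbs the cost, drifting downward
final = anneal(100.0, lambda c: c + random.uniform(-2.0, 1.5))
print(final < 100.0)   # the walk should end well below the start
```

The point of the uphill acceptances early on is to escape local minima such as the inefficient left-deep starting tree; as the temperature drops, the loop degenerates into the pure keep-if-cheaper rule the prototype already uses.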
Re: [HACKERS] join ordering via Simulated Annealing
I will follow it. Thank you. 2009/12/23 Jan Urbański wulc...@wulczer.org: Hi, I've been playing with using a Simulated Annealing-type algorithm for determining join ordering for relations. [...]
Re: [HACKERS] creating index names automatically?
Peter Eisentraut pete...@gmx.net writes: Could we create an option to create index names automatically, so you'd only have to write CREATE INDEX ON foo (a); which would pick a name like foo_a_idx. Having done all the groundwork to support that nicely, I find that it doesn't work because of bison limitations :-(. AFAICT, the only way we could support this syntax would be to make ON a reserved word. Or at least more reserved than it is now. We used up all the wiggle room we had by making CONCURRENTLY non-reserved. Now ON is reserved according to SQL99, but I'm a bit hesitant to make it so in our grammar for such a marginal feature as this. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] creating index names automatically?
I wrote: The trouble with changing the index attnames for expressions is that it increases the risk of collisions with attnames for regular index columns. You can hit that case today:

regression=# create table foo (f1 int, f2 text);
CREATE TABLE
regression=# create index fooi on foo(f1, lower(f2));
CREATE INDEX
regression=# \d fooi
      Index "public.fooi"
     Column      |  Type   | Definition
-----------------+---------+------------
 f1              | integer | f1
 pg_expression_2 | text    | lower(f2)
btree, for table "public.foo"

regression=# alter table foo rename f1 to pg_expression_2;
ERROR:  duplicate key value violates unique constraint "pg_attribute_relid_attnam_index"
DETAIL:  Key (attrelid, attname)=(64621, pg_expression_2) already exists.

but it's not exactly probable that someone would name a column pg_expression_N. The risk goes up quite a lot if we might use simple names like abs or lower for expression columns. It strikes me that the easiest way to deal with this is just to get rid of the code in renameatt() that tries to rename index columns to agree with the underlying table columns. That code is not nearly bright enough to deal with collisions, and furthermore it seems rather inconsistent to try to rename index columns (which are not very user-visible in the first place) while not renaming the indexes themselves (which surely are user-visible). There was some marginal excuse for doing it back when \d didn't show the index column definition; but now that it does, I don't think the behavior is worth expending effort on. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] creating index names automatically?
Tom Lane wrote: Peter Eisentraut pete...@gmx.net writes: Could we create an option to create index names automatically, so you'd only have to write CREATE INDEX ON foo (a); which would pick a name like foo_a_idx. Having done all the groundwork to support that nicely, I find that it doesn't work because of bison limitations :-(. AFAICT, the only way we could support this syntax would be to make ON a reserved word. Or at least more reserved than it is now. We used up all the wiggle room we had by making CONCURRENTLY non-reserved. And here's Simon talking about making CONCURRENTLY more reserved so that people stop creating indexes named concurrently ... http://database-explorer.blogspot.com/2009/09/create-index-concurrently.html -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] creating index names automatically?
Alvaro Herrera alvhe...@commandprompt.com writes: Tom Lane wrote: ... AFAICT, the only way we could support this syntax would be to make ON a reserved word. Or at least more reserved than it is now. We used up all the wiggle room we had by making CONCURRENTLY non-reserved. And here's Simon talking about making CONCURRENTLY more reserved so that people stop creating indexes named concurrently ... http://database-explorer.blogspot.com/2009/09/create-index-concurrently.html Hmm. It would actually work if we made CONCURRENTLY reserved instead; and that would fix Simon's gripe too. That's kind of weird from a standards-compliance POV, but in terms of the risk of breaking applications it might be better than reserving ON. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] creating index names automatically?
I wrote: Hmm. It would actually work if we made CONCURRENTLY reserved instead; and that would fix Simon's gripe too. That's kind of weird from a standards-compliance POV, but in terms of the risk of breaking applications it might be better than reserving ON. Wait a minute. I must have been looking at the wrong keyword list --- ON already is reserved. The problem is exactly that it can't tell whether CREATE INDEX CONCURRENTLY ON ... means to default the index name or to create an index named CONCURRENTLY. So really the *only* way to fix this is to make CONCURRENTLY be at least type_func_name_keyword. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] creating index names automatically?
On Dec 22, 2009, at 7:31 PM, Tom Lane wrote: Wait a minute. I must have been looking at the wrong keyword list --- ON already is reserved. The problem is exactly that it can't tell whether CREATE INDEX CONCURRENTLY ON ... means to default the index name or to create an index named CONCURRENTLY. So really the *only* way to fix this is to make CONCURRENTLY be at least type_func_name_keyword. +1 if it prevents indexes from being named CONCURRENTLY. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] join ordering via Simulated Annealing
Jan Urbański wulc...@wulczer.org writes: I've been playing with using a Simulated Annealing-type algorithm for determining join ordering for relations. Cool. The code I have now creates the initial plan by doing something similar to what gimme_tree does in GEQO, but without any kind of heuristic to try to favour joins with join restrictions (the idea is that it doesn't matter, since we only want to get *any* plan and we only do it once), but ideally it would be somehow chosen randomly from the space of all possible join orderings. FWIW, I think that's probably a bad idea. In a query where there are a lot of join order constraints, you can waste enormous amounts of time trying to find a join order that is even legal at all, let alone any good. Pure randomness is not as nice as it seems. You might want to study the CVS history of GEQO a bit and try to avoid falling into some of the traps we already fell into with that ;-) o) each swap needs a full recalculation of the whole solution (geqo_eval does that, for instance), maybe it's possible to leave the subtrees of swapped nodes intact and only recalculate the nodes above the two swapped nodes? Even if you could make this work, it'd probably be an infeasible space-for-time tradeoff, because you'd have to keep around all the Path infrastructure of the lower nodes in order to not start from scratch. As we were forcibly reminded a few months ago, one of the important properties of the GEQO code is to be able to do planning with a limited amount of memory even for very large relation sets. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] creating index names automatically?
David E. Wheeler wrote: On Dec 22, 2009, at 7:31 PM, Tom Lane wrote: Wait a minute. I must have been looking at the wrong keyword list --- ON already is reserved. The problem is exactly that it can't tell whether CREATE INDEX CONCURRENTLY ON ... means to default the index name or to create an index named CONCURRENTLY. So really the *only* way to fix this is to make CONCURRENTLY be at least type_func_name_keyword. +1 if it prevents indexes from being named CONCURRENTLY. Yeah, if you really want to have an index named like that you can use double quotes. Seems a sensible compromise. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] creating index names automatically?
Alvaro Herrera alvhe...@commandprompt.com writes: David E. Wheeler wrote: +1 if it prevents indexes from being named CONCURRENTLY. Yeah, if you really want to have an index named like that you can use double quotes. Seems a sensible compromise. Well, this will also break tables and columns named concurrently. I think the odds of it being a problem are small, but still it is a reserved word that shouldn't be reserved according to the SQL spec. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers