Re: [HACKERS] Crash bug in 8.2.3 on Solaris 10/Sparc
Zoltan Boszormenyi wrote:
> We compiled GCC-4.1.2 on this machine, recompiled PostgreSQL with the
> new GCC without --enable-integer-datetimes and it fixed the problem we
> experienced. It seems that my suspicion was right: GCC-3.4.3 on
> Solaris 10/Sparc is buggy.

I tried the original S10 gcc (3.4.3) on two different machines with different kernel updates and both work fine. Judging from our offlist communication and Tom's comment, it looks more like a problem in linking/loading. Maybe some library mismatch. I'm not able to say more without a core.

Zdenek

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
> Without a switch, because both full page writes and the corresponding
> logical log are included in WAL, this will increase WAL size slightly
> (maybe about five percent or so). If everybody is happy with this, we
> don't need a switch.

Sorry, I still don't understand that. What is the corresponding logical log? It seems to me that a full-page WAL record has enough info to produce a dummy LSN WAL entry. So instead of just cutting the full-page WAL record, you could replace it with an LSN WAL entry when archiving the log. Then all that is needed is the one flag, no extra space?

Andreas
Re: [HACKERS] Autovacuum vs statement_timeout
Tom Lane wrote:
> I seem to remember that we'd agreed that autovacuum should ignore any
> globally set statement_timeout, on the grounds that a poorly chosen
> setting could indefinitely prevent large tables from being vacuumed.
> But I do not see anything in autovacuum.c that resets the variable.
> Am I just being blind? (Quite possible, as I'm tired and under the
> weather.) The thing that brought this to mind was the idea that Mark
> Shuttleworth's open problem might be triggered in part by a statement
> timeout interrupting autovacuum at an inopportune point --- some logs
> he sent me offlist show that he is using statement_timeout ...

statement_timeout interrupts seem to go through the PG_CATCH block and clean up the entry from the vacuum cycle array as they should. But a SIGINT leading to a "terminating connection due to administrator command" error does not.

After the recent change in CVS HEAD, CREATE DATABASE tries to kill(SIGINT) any autovacuum process in the template database. That seems very dangerous now; it could easily leave stale entries in the cycle id array. However, it doesn't explain Mark Shuttleworth's problem, because the 8.2 behavior is to throw a "source database is being accessed by other users" error instead of killing autovacuum. Maybe there's something else killing autovacuum processes?

I think we need to add the xid of the vacuum transaction to the vacuum cycle array, and clean up orphaned entries in _bt_start_vacuum. We're going to have a hard time plugging every leak one-by-one otherwise.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
On Fri, 2007-03-30 at 10:22 +0200, Zeugswetter Andreas ADI SD wrote:
> > Without a switch, because both full page writes and the corresponding
> > logical log are included in WAL, this will increase WAL size slightly
> > (maybe about five percent or so). If everybody is happy with this, we
> > don't need a switch.
>
> Sorry, I still don't understand that. What is the corresponding logical
> log? It seems to me that a full-page WAL record has enough info to
> produce a dummy LSN WAL entry. So instead of just cutting the full-page
> WAL record you could replace it with an LSN WAL entry when archiving
> the log. Then all that is needed is the one flag, no extra space?

The full page write is required for crash recovery, but that isn't required during archive recovery because the base backup provides the safe base. Archive recovery needs the normal xlog record, which in some cases has been optimised away because the backup block is present, since the full block already contains the changes. If you want to remove the backup blocks, you need to put back the information that was optimised away, otherwise you won't be able to do the archive recovery correctly. Hence a slight increase in WAL volume to allow it to be compressed does make sense.

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
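As a rough model of the archive-time compression being discussed (a toy Python sketch, not the patch's actual record format: the dict layout and field names are invented for illustration): each record keeps both the backup block and the logical change, so the archiver can strip the page image and retain the part archive recovery needs.

```python
# Toy model of archive-time WAL compression: record layout and field
# names are invented for illustration only.

def compress_for_archive(wal_records):
    """Strip full-page (backup block) images, keeping the logical change
    that archive recovery replays against the base backup."""
    out = []
    for rec in wal_records:
        if rec.get("backup_block") is not None:
            slim = dict(rec, backup_block=None)
            # The logical change must have been kept in the record (the
            # "slight WAL size increase" discussed above) -- otherwise
            # stripping the page image would lose the update.
            assert slim["logical"] is not None
            out.append(slim)
        else:
            out.append(rec)
    return out
```

Crash recovery would still use the unstripped WAL; only the archived copy is compressed this way.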
[HACKERS] Minor changes to Recovery related code
I'd like to make the following changes to recovery related code over the next few days/weeks. If anybody insists I do this by freeze or not at all, then I'll submit patches for 1, 3, 4, 5, 10 before Saturday night. I'd rather take a bit more time and do this in one drop; there are some code dependencies between these changes and other patches from Koichi-san and myself.

1. Current xlog should be archived at shutdown for smart shutdown
   - check that archiver is active prior to shutdown checkpoint request
   - if (shutdown checkpoint && XLogArchivingActive()) RequestXLogSwitch()
   - for smart shutdown, have archiver complete its work before exiting

2. pg_stop_backup() should wait until all archive files are safely archived before returning

3. Need a %r parameter for restore_command, to allow the restore command to be passed the name of the file containing the last restartpoint. This will allow the restore command to clean down old archive files more safely/cleanly in Warm Standby operation.
   - change pg_standby to accept the parameter and use %r rather than the -k parameter

4. Add an option to pg_standby to have it complete all outstanding archive files after it has been triggered, minimising data loss at the slight expense of cut-over time

5. log_restartpoint option in recovery.conf to LOG each restartpoint, so we can understand whether restartable recovery will be effective or not

6. Refactor recovery.conf so that it uses a GUC-like parser

7. Refactor all xlog _desc routines into one module, so these can be more easily used by an xlogviewer utility

8. Give the xlogviewer utility a freshen-up so it can be part of the main release, possibly including further refactoring of xlog.c

9. Another round of doc updates to highlight the use of pg_standby and Koichi-san's work. I think Doug Knight might have some additional code examples to include as well, from previous discussions.

10. Changes to ensure WAL-avoiding operations and hot backups cannot be executed simultaneously.
One of these two options, ISTM:

a) Following a change to archive_command while the server is running, log the xid of the WAL-avoiding operations when they start and have pg_start_backup() wait for those xids to complete before continuing.

b) Introduce a new parameter, archive_mode = on | off, that can only be set at server start. If archive_mode = on then XLogArchivingActive(); archiving only takes place when archive_command is not ''. This allows archive_command to be changed while the server is running, yet without any danger from WAL-avoiding operations.

[7 & 8 would be complete by about 5-6 weeks from now. Others much earlier]

Comments?

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
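To make item 3 concrete, here is a sketch of what a warm-standby recovery.conf line might look like once %r exists (the archive path and option usage are invented for illustration; only %f and %p exist today):

```
# %f = requested WAL file name, %p = path to restore it to,
# %r = name of the file containing the last restartpoint (the proposed
#      new parameter) -- everything older than %r could then be pruned
#      from the archive safely.
restore_command = 'pg_standby /mnt/archive %f %p %r'
```

The point of %r is that the restore script itself gets told how far back it may clean, instead of guessing via the current -k parameter.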
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
> Archive recovery needs the normal xlog record, which in some cases has
> been optimised away because the backup block is present, since the full
> block already contains the changes.

Aah, I didn't know that optimization existed. I agree that removing that optimization is good/ok.

Andreas
[HACKERS] Proposal: include PL/Proxy into core
PL/Proxy is a small PL whose goal is to allow creating proxy functions that call the actual functions in a remote database. Basic design is: the function body describes how to deduce the final database. It's either

    CONNECT 'connstr';  -- connect to exactly this db

or, when partitioning is used:

    -- partitions are described under that name
    CLUSTER 'namestr';
    -- calculate an int4 based on function parameters
    -- and use that to pick a partition
    RUN ON hashtext(username);

The actual function call info (arguments, result fields) is deduced by looking at the function's own signature, so a function foo(int4, text) returns setof text will result in the query select * from foo($1::int4, $2::text) being executed.

Announcement with more examples: http://archives.postgresql.org/pgsql-announce/2007-03/msg5.php
Documentation: https://developer.skype.com/SkypeGarage/DbProjects/PlProxy
Patch: http://plproxy.projects.postgresql.org/plproxy_core.diff.gz

Now, why put it into core?

1) A much simpler replacement for various other clustering solutions that try to cluster regular SQL.

2) A nicer replacement for dblink.

3) PLs need much more intimate knowledge of the PostgreSQL core than regular modules. The API for PLs has been changing every major revision of PostgreSQL.

4) It promotes the db-access-thru-functions design, which has proven to be a killer feature of PostgreSQL. In a sense it is using PostgreSQL as an appserver which provides a fixed API via functions for external users, but hides the internal layout from them, so it can be changed invisibly to external users.

5) The language is ready feature-wise - there's no need for it to grow into "Remote PL/pgSQL", because all logic can be put into the remote function.

Some objections that may be:

1) It is not a universal solves-everything tool for remote access/clustering. But those solve-everything tools have a very hard time maturing, and will not be exactly simple. Much better to have a simple tool that works well.

2) You can't use it for all the things you can use dblink for.
PL/Proxy is easier to use for simple result fetching. For complicated access, using full-blown PLs (plperl, plpython) is better. From such a POV dblink is replaced.

3) It is possible for the PL to live outside core; the pain is not that big. Sure, it's possible. We just feel that its usefulness : lines-of-code ratio is very high, so it's worthy of being built into the PostgreSQL core, thus also giving PostgreSQL the opportunity to boast being clusterable out-of-the-box.

4) What about all the existing apps that don't access the database through functions? Those are a target for the solves-everything tools...

5) It is too new a product. We think this is offset by the small scope of the task it takes on, and it already works well in that scope.

-- marko
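To make the proposal concrete, here is a sketch of a proxy function in the syntax described above (the cluster name, function name, and parameter are invented for illustration):

```sql
-- Hypothetical example: route the call to one partition of the
-- 'userdb' cluster, chosen by hashing the username.
CREATE FUNCTION get_user_email(username text)
RETURNS SETOF text AS $$
    CLUSTER 'userdb';
    RUN ON hashtext(username);
$$ LANGUAGE plproxy;

-- The signature drives the remote call, so each partition is expected
-- to provide: get_user_email(text) RETURNS SETOF text.
```

The caller just does SELECT * FROM get_user_email('bob'); which partition actually executes it is invisible to the application.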
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
Simon Riggs wrote:
> The full page write is required for crash recovery, but that isn't
> required during archive recovery because the base backup provides the
> safe base.

Is that always true? Could the backup not pick up a partially-written page? Assuming it's being written to as the backup is in progress. (We are talking about when disk blocks are smaller than PG blocks here, so we can't guarantee an atomic write for a PG block?)

-- Richard Huxton Archonet Ltd
Re: [HACKERS] Proposal: include PL/Proxy into core
On Friday 30 March 2007 12:36, Marko Kreen wrote:
> Patch: http://plproxy.projects.postgresql.org/plproxy_core.diff.gz

Note a possible oversight in your makefile:

    + #REGRESS_OPTS = --dbname=$(PL_TESTDB) --load-language=plpgsql --load-language=plproxy
    + REGRESS_OPTS = --dbname=regression --load-language=plpgsql --load-language=plproxy
Re: [HACKERS] Proposal: include PL/Proxy into core
On 3/30/07, Cédric Villemain [EMAIL PROTECTED] wrote:
> Note a possible oversight in your makefile:
>
>     + #REGRESS_OPTS = --dbname=$(PL_TESTDB) --load-language=plpgsql --load-language=plproxy
>     + REGRESS_OPTS = --dbname=regression --load-language=plpgsql --load-language=plproxy

Heh. The problem is I had 'regression' hardwired into the regtests, so I could not use $(PL_TESTDB). If the proposal is accepted and we want to always run the PL/Proxy regtests, there should be some dynamic way of passing the main dbname and also connstrings for partitions into the regression tests. ATM I thought it could stay as-is. (Actually I forgot that change after I had done it :)

-- marko
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
On Fri, 2007-03-30 at 11:27 +0100, Richard Huxton wrote:
> Is that always true? Could the backup not pick up a partially-written
> page? Assuming it's being written to as the backup is in progress. (We
> are talking about when disk blocks are smaller than PG blocks here, so
> we can't guarantee an atomic write for a PG block?)

Any page written during a backup has a backup block that would not be removable by Koichi's tool, so yes, you'd still be safe. i.e. between pg_start_backup() and pg_stop_backup() we always use full page writes, even if you are running in full_page_writes=off mode.

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] UPDATE using sub selects
Hi,

> No. Offhand I think you'd either need to relax EXPR_SUBLINK to allow
> multiple output columns, or invent a ROW_SUBLINK SubLinkType that is
> just like EXPR_SUBLINK except for allowing multiple output columns.
> The latter would probably be less likely to break other things...

Yeah, I was looking at EXPR_SUBLINK and its single-column use case and arrived at the same conclusion that inventing a new sublink type would be better too. It is indeed becoming a not-so-simple and narrow fix, as you had mentioned earlier in your first response :)

I have invented a ROWEXPR_SUBLINK type that handles multiple output columns. The trouble is that since eventually the columns of the parent have to be part of the query's targetList, I am sending the entire subquery as one of the entries in that list, and the targetList gets populated with entries dependent on the subquery much later, via make_subplan. This breaks code in rewriteTargetList (which expects every list entry to be of type TargetEntry) and expand_targetlist (which expects the targets to be present in attrno order; the entries added because of the subquery will not be in order as compared to normal SET colname = expr targets).

Is there a simpler way of doing things? Should I try generating a resjunk TargetEntry in transformUpdateStmt and have its expr point to the subquery and see if that works?

Regards, Nikhils -- EnterpriseDB http://www.enterprisedb.com
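For reference, the kind of statement this sublink work would enable is a multi-column UPDATE fed by a single sub-select (the table and column names below are invented for illustration):

```sql
-- Hypothetical example of the target syntax: both columns come from
-- one sub-select, which is what a multi-column (row) sublink has to
-- represent internally.
UPDATE accounts
   SET (balance, updated_at) = (SELECT t.amount, t.ts
                                  FROM transfers t
                                 WHERE t.acct_id = accounts.id)
 WHERE id IN (SELECT acct_id FROM transfers);
```

With only EXPR_SUBLINK, each column would need its own scalar sub-select, evaluating the subquery once per target column.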
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
Simon Riggs wrote:
> Any page written during a backup has a backup block that would not be
> removable by Koichi's tool, so yes, you'd still be safe. i.e. between
> pg_start_backup() and pg_stop_backup() we always use full page writes,
> even if you are running in full_page_writes=off mode.

Ah, that's OK then.

-- Richard Huxton Archonet Ltd
Re: [HACKERS] Proposal: include PL/Proxy into core
On Friday, 2007-03-30 at 13:36, Marko Kreen wrote:
> PL/Proxy is a small PL whose goal is to allow creating proxy functions
> that call the actual functions in a remote database. [...]
>
> Some objections that may be:
>
> 1) It is not a universal solves-everything tool for remote
> access/clustering. But those solve-everything tools have a very hard
> time maturing, and will not be exactly simple. Much better to have a
> simple tool that works well.
The current pl/proxy proposed here for inclusion is already an almost complete redesign and rewrite based on our experience of using the initial version in production databases, so you can expect ver 2.x robustness, maintainability and code cleanness from it.

> 5) It is too new a product. We think this is offset by the small scope
> of the task it takes on, and it already works well in that scope.

Also, it is actively used serving thousands of requests per second in a 24/7 live environment, which means that it should be reasonably well tested.

Together with our lightweight connection pooler https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer pl/proxy can be used to implement the vision of building a "DB-bus" over a database farm of diverse postgresql servers, as shown in SLIDE3: of https://developer.skype.com/SkypeGarage/DbProjects/SkypePostgresqlWhitepaper . The connection pooler is not strictly needed and can be left out for smaller configurations with maybe less than about 10 databases and/or concurrent db connections.

(btw, the connection pooler's name PgBouncer comes from its initial focus of bouncing around single-transaction db calls.)

-- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
[HACKERS] Proposal: Adding CP51932 encoding
Hi,

Here is a proposal to add a client-side encoding CP51932. CP51932 is an encoding for Japanese, defined by Microsoft. It is similar to the existing EUC_JP encoding, but some characters are mapped differently. Unfortunately, for most Windows apps "EUC-JP" means CP51932. As a result, for example, if a web page is encoded with EUC-JP, web browsers will encode users' input as CP51932 and save data to a PostgreSQL database as CP51932, which is not what PostgreSQL expects, of course. Adding a CP51932 <-> EUC_JP conversion should solve the problem (CP51932 <-> UTF-8 and SJIS should be added too). If there's no objection, patches for current will be posted for review (the patches have been developed by a Japanese developer, not me). Comments and suggestions are welcome.

-- Tatsuo Ishii SRA OSS, Inc. Japan
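Python's standard library has no cp51932 codec, so the snippet below can only illustrate the underlying hazard: the same Japanese text yields different byte sequences under different encodings, so a client and server that disagree about the encoding silently exchange garbage. (The best-known CP51932/EUC-JP divergence is the wave dash U+301C vs. fullwidth tilde U+FF5E mapping.)

```python
# Illustration only: euc_jp stands in for the family of Japanese
# encodings under discussion; cp51932 itself is not in Python's stdlib.
text = "日本語"

euc_bytes = text.encode("euc_jp")
utf8_bytes = text.encode("utf_8")

# Same characters, different wire bytes -- a client that labels its
# bytes with the wrong encoding hands the server garbage.
assert euc_bytes != utf8_bytes
assert euc_bytes.decode("euc_jp") == text  # round trip is lossless
```

A server-side CP51932 <-> EUC_JP conversion lets the server repair exactly the codepoints where the two variants disagree, instead of storing the Windows variant verbatim.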
Re: [HACKERS] Autovacuum vs statement_timeout
Heikki Linnakangas [EMAIL PROTECTED] writes:
> statement_timeout interrupts seem to go through the PG_CATCH block and
> clean up the entry from the vacuum cycle array as they should. But a
> SIGINT leading to a "terminating connection due to administrator
> command" error does not.

Hm, that's an interesting thought, but there are no "terminating connection" messages in Shuttleworth's logs either. So we still lack the right idea there. (BTW, it would be SIGTERM not SIGINT.)

> I think we need to add the xid of the vacuum transaction to the vacuum
> cycle array, and clean up orphaned entries in _bt_start_vacuum. We're
> going to have a hard time plugging every leak one-by-one otherwise.

You're thinking too small --- what this thought actually suggests is that PG_CATCH can't be used to clean up shared memory at all, and I don't think we want to accept that. (I see several other places already where we assume we can do that. We could convert each one into an on_proc_exit cleanup operation, maybe, but that seems messy and not very scalable.) I'm thinking we may want to redesign elog(FATAL) processing so that we escape out to the outer level before calling proc_exit, thereby allowing CATCH blocks to run first.

Note for the archives: I've argued for some time that SIGTERM'ing individual backends is an insufficiently tested code path to be exposed as standard functionality. Looks like that's still true. This is not a bug for database shutdown, because we don't really care if we leave perfectly clean shared memory behind --- it's only a bug if you try to SIGTERM an individual vacuum process while leaving the system up.

regards, tom lane
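The failure mode can be sketched in miniature (a toy Python model, not PostgreSQL code: the class names, the list standing in for the shared cycle array, and the error strings are all invented). An ERROR unwinds through the catch block so the cleanup runs; a FATAL that exits without unwinding never reaches it, leaving a stale entry behind:

```python
shared = {"cycle_ids": []}     # stands in for the shared vacuum cycle array

class Error(Exception):        # models elog(ERROR): unwinds, PG_CATCH runs
    pass

class Fatal(SystemExit):       # models 8.2's elog(FATAL): exits, CATCH skipped
    pass

def vacuum_index(fail=None):
    shared["cycle_ids"].append(42)        # _bt_start_vacuum registers an entry
    try:
        if fail == "error":
            raise Error("canceling statement due to statement timeout")
        if fail == "fatal":
            raise Fatal(1)                # not an Error: bypasses the handler
    except Error:
        shared["cycle_ids"].remove(42)    # PG_CATCH-style cleanup
        return
    shared["cycle_ids"].remove(42)        # normal _bt_end_vacuum path
```

Tom's proposed redesign corresponds to making the Fatal path unwind like Error first, so the cleanup runs before the process actually exits.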
Re: [HACKERS] CREATE INDEX and HOT - revised design
Pavan Deolasee [EMAIL PROTECTED] writes:
> How about storing the snapshot which we used during planning in
> CachedPlanSource, if at least one index was seen unusable because its
> CREATE INDEX transaction was seen as in-progress?

I'm getting tired of repeating this, but: the planner doesn't use a snapshot. System catalogs run on SnapshotNow.

regards, tom lane
Re: [HACKERS] CREATE INDEX and HOT - revised design
Tom Lane wrote:
> I'm getting tired of repeating this, but: the planner doesn't use a
> snapshot. System catalogs run on SnapshotNow.

I am really sorry if I sound foolish here. I am NOT suggesting that we use a snapshot to read the system catalogs. I understand that system catalogs run on SnapshotNow, and all transactions, irrespective of when they started, would see the changes to system catalogs as soon as the transaction updating the system catalog commits.

What I am suggesting is to use ActiveSnapshot (actually Florian's idea) to decide whether the transaction that created the index was still running when we started. Isn't it the case that some snapshot will be active when we plan? The active snapshot may change later in the same transaction if we are running in read-committed mode, and we may need to invalidate the plan. Here is what I suggest to do in get_relation_info():

+    if (index->indcreatexid != InvalidTransactionId)
+    {
+        Assert(ActiveSnapshot);
+        if (XidInMVCCSnapshot(index->indcreatexid, ActiveSnapshot))
+        {
+            index_close(indexRelation, NoLock);
+            continue;
+        }
+        /*
+         * Otherwise the index is usable
+         */
+    }

Is there a problem with this? I really appreciate all the help I am receiving on this. But there is hardly anything else that I can do other than post my thoughts and get feedback, until we find a clean solution :-(

Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] CREATE INDEX and HOT - revised design
Tom Lane wrote:
> Pavan Deolasee [EMAIL PROTECTED] writes:
>> How about storing the snapshot which we used during planning in
>> CachedPlanSource, if at least one index was seen unusable because its
>> CREATE INDEX transaction was seen as in-progress?
>
> I'm getting tired of repeating this, but: the planner doesn't use a
> snapshot. System catalogs run on SnapshotNow.

But it would still do that - it would just compare the createxid of the index against some snapshot, and the query would be replanned if the cached result of this comparison differs from the one the current snapshot yields. It might well be that this won't work, because the planner is invoked in situations where there is no active snapshot - I'm not sure if your comment refers to that case, or not.

greetings, Florian Pflug
Re: [HACKERS] [pgsql-www] Help: reading the source
1) Wrong list, you want -hackers. 2) Did you have a specific question based on what you have read, or are you asking for suggested reading?

On 3/29/07, 李宁 [EMAIL PROTECTED] wrote:
> Dear everyone, I'm a college student, and I am now working on
> PostgreSQL storage management as my thesis topic. I wish I could get
> help here about where I should begin. I am reading the book "Database
> System Implementation" and reading the source code, mainly in
> storage/smgr. I hope someone can give me some suggestions or
> instruction. Thank you!
>
> Sincerely yours, ninglee
Re: [HACKERS] CREATE INDEX and HOT - revised design
Pavan Deolasee [EMAIL PROTECTED] writes:
> What I am suggesting is to use ActiveSnapshot (actually Florian's idea)
> to decide whether the transaction that created the index was still
> running when we started. Isn't it the case that some snapshot will be
> active when we plan?

I do not think you can assume that the plan won't be used later with some older snapshot. Consider recursive plpgsql functions for a counterexample: the inner occurrence might be the first to arrive at a given line of the function, hence the first to plan it, yet when we return to the outer instance we might revert to an older snapshot.

regards, tom lane
[HACKERS] Oracle indemnifies PostgreSQL on its patents
Here is a surprising article about how Oracle has made open source projects, like PostgreSQL, safe from claims of infringing Oracle patents: http://www.cbronline.com/article_news.asp?guid=A0F5F220-5940-470D-8564-CEA7E2D2B954. Oracle, like IBM, Sony, RedHat, and Novell, is now a member of the Open Invention Network, whose members all offer patent indemnification.

-- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] CREATE INDEX and HOT - revised design
Tom Lane wrote:
> Pavan Deolasee [EMAIL PROTECTED] writes:
>> What I am suggesting is to use ActiveSnapshot (actually Florian's
>> idea) to decide whether the transaction that created the index was
>> still running when we started. Isn't it the case that some snapshot
>> will be active when we plan?
>
> I do not think you can assume that the plan won't be used later with
> some older snapshot. Consider recursive plpgsql functions for a
> counterexample: the inner occurrence might be the first to arrive at a
> given line of the function, hence the first to plan it, yet when we
> return to the outer instance we might revert to an older snapshot.

So maybe we'd need to use the SerializableSnapshot created at the start of each transaction for this check, and not the ActiveSnapshot? Could that work?

What about doing PREPARE myplan AS SELECT ... outside of a transaction? Will this be executed inside a transaction? Or is a query always planned upon its first execution, and not when PREPARE is issued?

greetings, Florian Pflug
Re: [HACKERS] CREATE INDEX and HOT - revised design
On 3/30/07, Florian G. Pflug [EMAIL PROTECTED] wrote:
> What about doing PREPARE myplan AS SELECT ... outside of a transaction?
> Will this be executed inside a transaction?

I checked that. PREPARE runs with ActiveSnapshot set.

Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] CREATE INDEX and HOT - revised design
Florian G. Pflug [EMAIL PROTECTED] writes:
> So maybe we'd need to use the SerializableSnapshot created at the start
> of each transaction for this check, and not the ActiveSnapshot? Could
> that work?

That might work, but it doesn't seem to address the core objection: there's no mechanism to cause the query to be replanned once the snapshot is new enough, because no relcache inval will happen. So most likely existing backends will keep using old plans that don't consider the index.

regards, tom lane
Re: [HACKERS] CREATE INDEX and HOT - revised design
On Fri, 2007-03-30 at 11:44 +0530, Pavan Deolasee wrote:
> ISTM that the run-another-transaction-afterwards idea would have the
> same problem with plan invalidation. When the second transaction
> commits, the relcache invalidation event is generated. The event may
> get consumed by other backends, but the index may still not be usable
> to them because their xid < xcreat. If no more relcache invalidation
> events are generated after that, the backends would continue to use
> the cached plan, even if the index becomes usable to them later.

Sounds like we need to allow CREATE INDEX invalidation events to be processed at the end of the current transaction in the *receiving* backend. That way we don't need to do the run-another-transaction thing, and it seems a helluva lot cleaner way of doing this.

Messages of SHAREDINVALRELCACHE_ID would be augmented by a boolean "deferred" flag on the SharedInvalRelcacheMsg struct. Received messages marked deferred would be stored in a third kind of InvalidationList, then processed during AtEOXact_Inval() whether the receiving transaction commits or not. (see src/backend/utils/cache/inval.c)

Not sure how we'd know to *send* the message marked as deferred, but it seems like we can work that out also. That seems to allow CCI to not have to wait until the end of time either.

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] CREATE INDEX and HOT - revised design
On 3/30/07, Tom Lane [EMAIL PROTECTED] wrote:
> That might work, but it doesn't seem to address the core objection:
> there's no mechanism to cause the query to be replanned once the
> snapshot is new enough, because no relcache inval will happen. So most
> likely existing backends will keep using old plans that don't consider
> the index.

Can't we store the snapshot (or maybe the transaction id) which was used to plan the query in CachedPlanSource, if and only if at least one index was seen unusable? In RevalidateCachedPlan() we then check if the snapshot has changed and replan the query in that case. That would make the index usable in subsequent transactions in the same session, though we may not be able to use the index in the same transaction, even if it's running in read-committed mode. Would that be acceptable?

Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
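Pavan's scheme can be modeled in a few lines (an illustrative Python sketch with invented names, not the actual plancache code): the snapshot is remembered only when planning had to skip an index, and a later execution under a different snapshot forces a replan.

```python
class CachedPlanSource:
    def __init__(self, query):
        self.query = query
        self.plan = None
        self.plan_snapshot = None   # set only if an index was unusable

def revalidate(src, snapshot, planner):
    """Return a valid plan, replanning if a skipped index might now be
    usable (i.e. the snapshot has moved since planning time)."""
    must_replan = (src.plan is None or
                   (src.plan_snapshot is not None and
                    snapshot != src.plan_snapshot))
    if must_replan:
        src.plan, skipped_index = planner(src.query, snapshot)
        # Remember the snapshot only when an index was skipped; plans
        # built with all indexes usable never need this recheck.
        src.plan_snapshot = snapshot if skipped_index else None
    return src.plan
```

Plans that never skipped an index pay nothing; only the "index was in-progress" case carries the extra snapshot comparison.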
Re: [HACKERS] CREATE INDEX and HOT - revised design
Tom Lane wrote: Florian G. Pflug [EMAIL PROTECTED] writes: Tom Lane wrote: I do not think you can assume that the plan won't be used later with some older snapshot. So maybe we'd need to use the SerializableSnapshot created at the start of each transaction for this check, and not the ActiveSnapshot? Could that work? That might work, but it doesn't seem to address the core objection: there's no mechanism to cause the query to be replanned once the snapshot is new enough, because no relcache inval will happen. So most likely existing backends will keep using old plans that don't consider the index. Pavan suggested storing the IndexSnapshot in the cached plan, and comparing it to the IndexSnapshot when the query is executed. If those two snapshots differ, the query would be replanned. My idea was to store, together with the cached plan, a list of xids that are assumed to be uncommitted according to the IndexSnapshot. The query is replanned if upon execution the IndexSnapshot assumes that one of these xids is committed. Those two ideas seem to be mostly equivalent; mine seems to be a bit more fine-grained, but at the cost of more work upon each query execution. greetings, Florian Pflug
Re: [HACKERS] Autovacuum vs statement_timeout
I wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: statement_timeout interrupts seem to go through the PG_CATCH-block and clean up the entry from the vacuum cycle array as they should. But a SIGINT leading to a terminating connection due to administrator command error does not. Hm, that's an interesting thought, but there are no terminating connection messages in Shuttleworth's logs either. So we still lack the right idea there. (BTW it would be SIGTERM not SIGINT.) Hold it ... stop the presses ... the reason we saw no terminating connection messages was he was grepping his logs for lines containing ERROR. Once we look for FATAL too, there are a pile of 'em. I'm not 100% convinced that any are from autovacuum processes, but clearly *something* is throwing SIGTERM around with abandon in his test environment. So at this point your theory above looks like a plausible mechanism for the vacuum cycle array to slowly fill up and eventually make _bt_start_vacuum fail (or, perhaps, fail sooner than that due to a repeat vacuum attempt). I think we need to add the xid of the vacuum transaction in the vacuum cycle array, and clean up orphaned entries in _bt_start_vacuum. We're going to have a hard time plugging every leak one-by-one otherwise. You're thinking too small --- what this thought actually suggests is that PG_CATCH can't be used to clean up shared memory at all, and I don't think we want to accept that. (I see several other places already where we assume we can do that. We could convert each one into an on_proc_exit cleanup operation, maybe, but that seems messy and not very scalable.) I'm thinking we may want to redesign elog(FATAL) processing so that we escape out to the outer level before calling proc_exit, thereby allowing CATCH blocks to run first. I was hoping we could do that just as an 8.3 change, but it's now starting to look like we might have to back-patch it, depending on how much we care about surviving random SIGTERM attempts. 
I'd like to wait for some report from Mark about what's causing all the SIGTERMs before we evaluate that. regards, tom lane
Re: [HACKERS] Autovacuum vs statement_timeout
Tom Lane wrote: I wrote: Heikki Linnakangas [EMAIL PROTECTED] writes: statement_timeout interrupts seem to go through the PG_CATCH-block and clean up the entry from the vacuum cycle array as they should. But a SIGINT leading to a terminating connection due to administrator command error does not. Hm, that's an interesting thought, but there are no terminating connection messages in Shuttleworth's logs either. So we still lack the right idea there. (BTW it would be SIGTERM not SIGINT.) Hold it ... stop the presses ... the reason we saw no terminating connection messages was he was grepping his logs for lines containing ERROR. Once we look for FATAL too, there are a pile of 'em. I'm not 100% convinced that any are from autovacuum processes, but clearly *something* is throwing SIGTERM around with abandon in his test environment. So at this point your theory above looks like a plausible mechanism for the vacuum cycle array to slowly fill up and eventually make _bt_start_vacuum fail (or, perhaps, fail sooner than that due to a repeat vacuum attempt). Hmmm, remember that DatabaseCancelAutovacuumActivity is called on CREATE DATABASE; but what it does is send SIGINT, not SIGTERM. Also, it's not in 8.2. SIGINT does terminate the autovac process however. I haven't read the whole problem report completely, so I'm not sure whether this has something to do with it or not. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] CREATE INDEX and HOT - revised design
On 3/30/07, Florian G. Pflug [EMAIL PROTECTED] wrote: My idea was to store a list of xid's together with the cached plan that are assumed to be uncommitted according to the IndexSnapshot. The query is replanned if upon execution the IndexSnapshot assumes that one of these xid's is committed. Actually, if we are using the Serializable Snapshot then there is no chance to replan the query before the transaction completes, and the next transaction to start in the session must see the index and hence we must replan. So it would be enough just to associate a transaction id with the cached plan. If this xid is set and our transaction id is different from that, we replan. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Autovacuum vs statement_timeout
Alvaro Herrera [EMAIL PROTECTED] writes: Hmmm, remember that DatabaseCancelAutovacuumActivity is called on CREATE DATABASE; but what it does is send SIGINT, not SIGTERM. Also, it's not in 8.2. SIGINT does terminate the autovac process however. I haven't read the whole problem report completely, so I'm not sure this has something to do or not. AFAICT, SIGINT should be okay, because it will lead to an ERROR not a FATAL elog; so control should fall out through the CATCH block before the autovacuum process quits. The problem is with FATAL elogs. Mark reports that the only FATAL lines in his logs are instances of "FATAL: terminating connection due to administrator command" and "FATAL: database launchpad_ftest does not exist", and the latter presumably isn't coming out from within the btree vacuum code, so I don't see any other explanation for a FATAL exit than SIGTERM. regards, tom lane
Re: [HACKERS] Proposal: include PL/Proxy into core
Marko Kreen [EMAIL PROTECTED] writes: Now, why put it into core? I don't think you have made a sufficient case for that. I think it should stay as an outside project for awhile and see what sort of userbase it attracts. If it becomes sufficiently popular I'd be willing to consider adding it to core, but that remains to be seen. We can barely keep up maintaining what's in core now --- we need to be very strict about adding stuff that doesn't really have to be in core, and this evidently doesn't, since you've got it working ... regards, tom lane
Re: [HACKERS] CREATE INDEX and HOT - revised design
Pavan Deolasee wrote: On 3/30/07, Florian G. Pflug [EMAIL PROTECTED] wrote: My idea was to store a list of xid's together with the cached plan that are assumed to be uncommitted according to the IndexSnapshot. The query is replanned if upon execution the IndexSnapshot assumes that one of these xid's is committed. Actually, if we are using Serializable Snapshot then there is no chance to replan the query before the transaction completes and the next transaction to start in the session must see the index and hence we must replan. So it would be enough just to associate a transaction id with the cached plan. If this xid is set and our transaction id is different than that, we replan. I believe this is true for the CREATE INDEX scenario. However, comparing either the snapshot or the result of xid checks seems like it might be useful for other things besides CREATE INDEX. I'm specifically thinking about TRUNCATE here - the create index + HOT problems sound quite similar to the problems a non-exclusive-locking TRUNCATE would face. greetings, Florian Pflug
Re: [HACKERS] CREATE INDEX and HOT - revised design
Pavan Deolasee [EMAIL PROTECTED] writes: Actually, if we are using Serializable Snapshot then there is no chance to replan the query before the transaction completes and the next transaction to start in the session must see the index and hence we must replan. So it would be enough just to associate a transaction id with the cached plan. If this xid is set and our transaction id is different than that, we replan. Hm. So anytime we reject a potentially useful index as being not valid yet, we mark the plan as only good for this top-level transaction? That seems possibly workable --- in particular it doesn't get more complicated as soon as you consider multiple such indexes. regards, tom lane
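[Editor's note] The xid-stamping scheme Tom and Pavan converge on above can be sketched in a few lines of C. This is an illustration only: the struct and function names (CachedPlan, note_unusable_index, plan_needs_replan) are invented for the example and are not the actual CachedPlanSource API.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical subset of a cached-plan struct; field names are
 * illustrative, not the real CachedPlanSource layout. */
typedef struct CachedPlan
{
    TransactionId planXid;      /* xid that planned while an index was
                                 * unusable, or InvalidTransactionId if no
                                 * index was rejected during planning */
} CachedPlan;

/* Planner side: remember the current xid only when a potentially
 * useful index was rejected as not-yet-valid. */
void
note_unusable_index(CachedPlan *plan, TransactionId currentXid)
{
    plan->planXid = currentXid;
}

/* Executor side: replan if the plan was restricted to a top-level
 * transaction other than the current one. */
int
plan_needs_replan(const CachedPlan *plan, TransactionId currentXid)
{
    return plan->planXid != InvalidTransactionId
        && plan->planXid != currentXid;
}
```

Note how this stays simple with multiple not-yet-valid indexes: they all collapse into the same single xid stamp, which is the property Tom points out.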
Re: [HACKERS] CREATE INDEX and HOT - revised design
On Fri, 2007-03-30 at 13:54 -0400, Tom Lane wrote: Pavan Deolasee [EMAIL PROTECTED] writes: Actually, if we are using Serializable Snapshot then there is no chance to replan the query before the transaction completes and the next transaction to start in the session must see the index and hence we must replan. So it would be enough just to associate a transaction id with the cached plan. If this xid is set and our transaction id is different than that, we replan. Hm. So anytime we reject a potentially useful index as being not valid yet, we mark the plan as only good for this top-level transaction? That seems possibly workable --- in particular it doesn't get more complicated as soon as you consider multiple such indexes. I like that because it's specific in dealing with the exact issue we have - it doesn't rely on many other things happening correctly. ...and it also seems to provide a new route to avoiding the CIC wait. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Proposal: include PL/Proxy into core
Hannu, Marko, I, personally, think that it's worth talking about integrating these. However, the old versions were definitely NOT ready for integration, and the new versions went on the internet like a week ago. Heck, I haven't even downloaded them yet. Can we address these on the 8.4 timeline? That will give the rest of us in the community time to download, try and debug the new SkyTools. I know I'm planning on testing them and will know a lot more about your code/performance in a few months. Is there a reason why getting PL/proxy into 8.3 is critical? -- Josh Berkus PostgreSQL @ Sun San Francisco
Re: [HACKERS] Arrays of Complex Types
On Fri, Mar 02, 2007 at 03:40:16PM -0800, David Fetter wrote: Folks, I'd like to take the TODO item that reads, Add support for arrays of complex types, but before I start patching, I'd like to see whether what I'm about to do makes any sense: After several rounds of patches, it appears that it might be easier to create a new typtype entry, which I'll tentatively call 'a', because it seems a little fragile and a lot inelegant and hard to maintain to have typtype='c' and typrelid=InvalidOid mean, this is an array of complex types. I'd like to see about making this new typtype available for arrays of DOMAINs eventually, but that's not a requirement right this instant. What parts of the code would need a once-over? Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote! Consider donating to PostgreSQL: http://www.postgresql.org/about/donate
Re: [HACKERS] Minor changes to Recovery related code
Simon Riggs [EMAIL PROTECTED] writes: I'd like to make the following changes to recovery related code over the next few days/weeks. If anybody insists I do this by freeze or not at all, then I'll submit patches for 1,3,4,5,10 before Saturday night. I'd rather take a bit more time and do this in one drop and there are some code dependencies between these changes and other patches from Koichi-san and myself. Well, I've got a proposal for a pg_proc change that I haven't even started coding yet, so personally I won't hold you to having the patch submitted as long as the design is agreed to before feature freeze. However: 2. pg_stop_backup() should wait until all archive files are safely archived before returning Not sure I agree with that one. If it fails, you can't tell whether the action is done and it failed while waiting for the archiver, or if you need to redo it. 6. refactor recovery.conf so that it uses a GUC-like parser I would suggest that after feature freeze is not the time for code beautification efforts like this. (I rather doubt that it's worth doing at all actually, but definitely not now.) 7. refactor all xlog _desc routines into one module, so these can be more easily used by xlogviewer utility Even more so. 8. get xlogviewer utility a freshen-up so it can be part of main release, possibly including further refactoring of xlog.c This is not happening for 8.3, either. 10. Changes to ensure WAL-avoiding operations and hot backups cannot be executed simultaneously. One of these two options, ISTM: b) Introduce a new parameter, archive_mode = on | off that can only be set at server start. If archive_mode = on then XLogArchivingActive(); archiving only takes place when archive_command is not ''. This allows archive_command to be changed while server running, yet without any danger from WAL-avoiding operations. 
I think I'd go with (b) since a lot of people felt it should've been like that from the beginning, and found the magic empty string behavior confusing. [7 & 8 would be complete by about 5-6 weeks from now. Others much earlier] We are hoping to go beta in less time than that. While I'm willing to cut a little slack, anything you can't submit before about mid-April is not going to make it into 8.3. regards, tom lane
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
Simon Riggs [EMAIL PROTECTED] writes: Any page written during a backup has a backup block that would not be removable by Koichi's tool, so yes, you'd still be safe. How does it know not to do that? regards, tom lane
Re: [HACKERS] UPDATE using sub selects
NikhilS [EMAIL PROTECTED] writes: I have invented a ROWEXPR_SUBLINK type that handles multiple output columns. The trouble is that since eventually columns of the parents have to be part of the query's targetList, I am sending the entire subquery as one of the entries in that list and the targetList gets populated with entries dependent on the subquery much later via make_subplan. This breaks code in rewriteTargetList (which expects every list entry to be of type TargetEntry), and expand_targetlist (which expects the targets to be present in attrno order, the entries added because of the subquery will not be in order as compared to normal SET colname = expr targets). Hmm. That sounds like it would be a horrid mess. You need to decouple the execution of the subplan from the use of its outputs, apparently. There is some precedent for this in the way that InitPlans are handled: the result of the subplan is stored into a ParamList array entry that's later referenced by a Param node in the parent's expression tree. That would generalize easily enough to setting more than one Param, but I'm not clear on where you'd want to stick the subplan itself in the plan tree, nor on what controls how often it needs to get evaluated. An alternative approach is to put the subplan into the rangetable and use Vars to reference its outputs. Again it's not quite clear what drives re-execution of the subplan. It strikes me though that an approach like this might also serve for SQL2003's LATERAL construct, which'd be a nice thing to support. regards, tom lane
Re: [HACKERS] Oracle indemnifies PostgreSQL on its patents
Bruce, Oracle, like IBM, Sony, RedHat, and Novell, is now a member of the Open Invention Network, whose members all offer patent indemnification. Hey! We could go back to using ARC! ;-) -- Josh Berkus PostgreSQL @ Sun San Francisco
Re: [HACKERS] Arrays of Complex Types
David Fetter [EMAIL PROTECTED] writes: After several rounds of patches, it appears that it might be easier to create a new typtype entry, which I'll tentatively call 'a' because it seems a little fragile and a lot inelegant and hard to maintain to have typtype='c' and typrelid=InvalidOid mean, this is an array of complex types. Uh, wouldn't it be typtype = 'c' and typelem != 0 ? I'd like to see about making this new typtype available for arrays of DOMAINs eventually, but that's not a requirement right this instant. Hmm. It might not be a bad idea to switch to 'a' for arrays over regular scalar types too. Right now we have some klugy rules involving looking at typlen to decide whether an array type is a normal array. (There are also some special subscriptable types like name and point, which should continue to not use 'a' because they are not general purpose arrays. So the 'a' marker wouldn't be entirely redundant with typelem being nonzero, rather checking for 'a' would replace the places where we test both typelem and typlen.) OTOH this is a lot of hacking for something that I'm not convinced is really needed. Anyway, the point is that I dislike the idea of doing arrays for complex types differently from those for scalars --- either both should use a new typtype, or neither. If you try to do it differently then you'll have more complexity, not less, since there are a lot of places that shouldn't need to care. get_element_type() is an example. What parts of the code would need a once-over? A lot :-( ... probably every place that touches typtype or typelem would need at least a look. It'd be a good idea to take the opportunity to start using macros for the values of typtype, as we do for relkind but for some reason never adopted for typtype. regards, tom lane
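[Editor's note] Tom's suggestion to use macros for typtype values might look like the sketch below. The letter codes for base, composite, domain, and pseudo types reflect the values pg_type already stores; TYPTYPE_ARRAY is the hypothetical new entry proposed in this thread, and the macro names themselves are illustrative, not an existing header.

```c
#include <assert.h>

/* Sketch of typtype macros, in the style of the existing RELKIND_* macros.
 * 'b', 'c', 'd', 'p' match pg_type's current letter codes; 'a' is the
 * proposed new code for true arrays of any element type. */
#define TYPTYPE_BASE      'b'
#define TYPTYPE_COMPOSITE 'c'
#define TYPTYPE_DOMAIN    'd'
#define TYPTYPE_PSEUDO    'p'
#define TYPTYPE_ARRAY     'a'   /* proposed in this thread */

/* Example predicate: a single typtype test could replace the places
 * that currently examine both typelem and typlen. */
int
type_is_true_array(char typtype)
{
    return typtype == TYPTYPE_ARRAY;
}
```

Named constants like these also make the "once-over" Tom describes easier: every raw 'c' or 'b' comparison that needs auditing becomes greppable.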
Re: [PATCHES] [HACKERS] Full page writes improvement, code update
On Fri, 2007-03-30 at 16:35 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: Any page written during a backup has a backup block that would not be removable by Koichi's tool, so yes, you'd still be safe. How does it know not to do that? Not sure what you mean, but I'll take a stab... I originally questioned Koichi-san's request for a full_page_compress parameter, which is how it would tell whether/not. After explanation, I accepted the need for a parameter, but I think we're looking for a new name for it. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Minor changes to Recovery related code
On Fri, 2007-03-30 at 16:34 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I'd like to make the following changes to recovery related code over the next few days/weeks. If anybody insists I do this by freeze or not at all, then I'll submit patches for 1,3,4,5,10 before Saturday night. I'd rather take a bit more time and do this in one drop and there are some code dependencies between these changes and other patches from Koichi-san and myself. Well, I've got a proposal for a pg_proc change that I haven't even started coding yet, so personally I won't hold you to having the patch submitted as long as the design is agreed to before feature freeze. However: Cool 2. pg_stop_backup() should wait until all archive files are safely archived before returning Not sure I agree with that one. If it fails, you can't tell whether the action is done and it failed while waiting for the archiver, or if you need to redo it. There's a slight delay between pg_stop_backup() completing and the archiver doing its stuff. Currently if somebody does a -m fast straight after the pg_stop_backup() the backup may be unusable. We need a way to plug that small hole. I suggest that pg_stop_backup() polls once per second until pg_xlog/archive_status/LOG.ready disappears, in which case it ends successfully. If it does this for more than 60 seconds it ends successfully but produces a WARNING. 6. refactor recovery.conf so that it uses a GUC-like parser I would suggest that after feature freeze is not the time for code beautification efforts like this. (I rather doubt that it's worth doing at all actually, but definitely not now.) OK 7. refactor all xlog _desc routines into one module, so these can be more easily used by xlogviewer utility Even more so. OK 8. get xlogviewer utility a freshen-up so it can be part of main release, possibly including further refactoring of xlog.c This is not happening for 8.3, either. OK 10. 
Changes to ensure WAL-avoiding operations and hot backups cannot be executed simultaneously. One of these two options, ISTM: b) Introduce a new parameter, archive_mode = on | off that can only be set at server start. If archive_mode = on then XLogArchivingActive(); archiving only takes place when archive_command is not ''. This allows archive_command to be changed while server running, yet without any danger from WAL-avoiding operations. I think I'd go with (b) since a lot of people felt it should've been like that from the beginning, and found the magic empty string behavior confusing. OK We are hoping to go beta in less time than that. While I'm willing to cut a little slack, anything you can't submit before about mid-April is not going to make it into 8.3. No probs, you struck a line through the more time consuming items. :-) -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Minor changes to Recovery related code
Simon Riggs wrote: On Fri, 2007-03-30 at 16:34 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: 2. pg_stop_backup() should wait until all archive files are safely archived before returning Not sure I agree with that one. If it fails, you can't tell whether the action is done and it failed while waiting for the archiver, or if you need to redo it. There's a slight delay between pg_stop_backup() completing and the archiver doing its stuff. Currently if somebody does a -m fast straight after the pg_stop_backup() the backup may be unusable. We need a way to plug that small hole. I suggest that pg_stop_backup() polls once per second until pg_xlog/archive_status/LOG.ready disappears, in which case it ends successfully. If it does this for more than 60 seconds it ends successfully but produces a WARNING. I fear that ending successfully despite having not archived all WALs will make this feature less worthwhile. If a DBA knows what he is doing, he can code a perfectly safe backup script using 8.2 too. He'll just have to check the current WAL position after pg_stop_backup() (there is a function for that, right?), and wait until the corresponding WAL has been archived. In reality, however, I fear that most people will just create a script that does 'echo select pg_stop_backup | psql' or something similar. If they're a bit more careful, they will enable ON_ERROR_STOP, and check the return value of psql. I believe that those are the people who would really benefit from a pg_stop_backup() that waits for archiving to complete. But they probably won't check for WARNINGs. Maybe doing it the other way round would be an option? pg_stop_backup() could wait for the archiver to complete forever, but spit out a warning every 60 seconds or so: WARNING: still waiting for WAL archiving of wal ??? to complete. If someone really wants a 60-second timeout, he can just use statement_timeout.
Anyway, just my 0.02 eurocents, maybe I'm totally mistaken about the postgresql dba's out there... greetings, Florian Pflug
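[Editor's note] The polling behaviour Simon proposes (wait for the segment's .ready file to disappear, give up with a warning after 60 seconds) can be sketched as below. The function name, return codes, and path handling are hypothetical; a real pg_stop_backup() would perform this check server-side against pg_xlog/archive_status.

```c
#include <assert.h>
#include <unistd.h>     /* access(), sleep() */

/* Illustrative return codes for the sketch */
#define WAIT_OK      0  /* segment was archived: .ready file is gone */
#define WAIT_WARNING 1  /* timed out: end successfully but warn */

/* Poll once per second until the archiver renames LOG.ready away,
 * or timeout_secs elapse. */
int
wait_for_archive(const char *ready_path, int timeout_secs)
{
    int waited;

    for (waited = 0; waited < timeout_secs; waited++)
    {
        if (access(ready_path, F_OK) != 0)
            return WAIT_OK;     /* .ready gone: archive_command succeeded */
        sleep(1);
    }
    /* Still not archived: per Simon's proposal, succeed anyway and
     * let the caller emit a WARNING. Florian's variant would instead
     * loop forever, warning every 60 seconds. */
    return WAIT_WARNING;
}
```

Florian's objection maps directly onto the WAIT_WARNING branch: a naive script checking only psql's exit status never sees it.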
[HACKERS] Last minute mini-proposal (I know, I know) for PQexecf()
While cleaning up some pg_migrator code (http://pgfoundry.org/projects/pg-migrator/) it occurred to me that a typical libpq client application spends a lot of code constructing SQL commands. The code typically looks like this: a) allocate enough room to hold the command b) sprintf( command, text, argument, argument, argument, ... ) c) PQexec( conn, command ) d) free( command ) In most cases, the amount of memory that you allocate in step a) is just an educated guess. It's typically more room than you need, occasionally less room than you need (and you get a buffer overflow exploit), and it's rarely maintained properly when you modify the command text (or the argument list). I'd like to see a new variant on PQexec(): PGresult * PQexecf(PGconn *conn, const char *fmt, ...); PQexecf() simply performs steps a, b, c, and d for you. And you call it like this: PQexecf( conn, text, argument, argument, argument, ... ) PQexecf() is just a wrapper around the already existing createPQExpBuffer(), enlargePQExpBuffer(), printfPQExpBuffer(), and PQexec() so it introduces no new code (other than assembling the wrapper) and doesn't change any existing code. PQexecf() is similar to PQexecParams() but it's much simpler to use (and should be very familiar to C programmers). PQexecf() is not intended as a replacement for PQprepare() and PQexecPrepared() - you should use prepare/exec when you want to execute a command many times. I could eliminate a lot of client-side code if PQexecf() were available - and the code that I could remove is the code that's most likely to be buggy and least likely to be properly maintained. I've thrown together an UNTESTED prototype (below), just to get the idea across - you'll recognize that most of this code is identical to printfPQExpBuffer().
In the prototype, I'm keeping a static PQExpBuffer that grows to hold the largest string ever required by the client application - that part seems to be a point for discussion, but since the detail is hidden in the implementation, we could adjust the code later if necessary (without changing the interface). Of course, I could include an implementation of PQexecf() in each of my client applications if it were not available in libpq, but that would be silly and I'd have to invent my own createPQExpBuffer() / enlargePQExpBuffer() code since those functions are not an official part of libpq (and won't even be available to a Win32 client application). Is it just too late to even think about this for 8.3? (Bruce laughed at me when I suggested the idea :-) -- Korry [EMAIL PROTECTED] http://www.enterprisedb.com

PGresult *
PQexecf(PGconn *conn, const char *fmt, ...)
{
    static PQExpBuffer str;
    va_list            args;

    if (str == NULL)
        str = createPQExpBuffer();
    else
        resetPQExpBuffer(str);  /* reuse the buffer, starting each call empty */

    for (;;)
    {
        /*
         * Try to format the given string into the available space; but if
         * there's hardly any space, don't bother trying, just fall through
         * to enlarge the buffer first.
         */
        if (str->maxlen > str->len + 16)
        {
            size_t  avail = str->maxlen - str->len - 1;
            int     nprinted;

            va_start(args, fmt);
            nprinted = vsnprintf(str->data + str->len, avail, fmt, args);
            va_end(args);

            /*
             * Note: some versions of vsnprintf return the number of chars
             * actually stored, but at least one returns -1 on failure. Be
             * conservative about believing whether the print worked.
             */
            if (nprinted >= 0 && nprinted < (int) avail - 1)
            {
                /* Success.  Note nprinted does not include trailing null. */
                str->len += nprinted;
                break;
            }
        }

        /* Double the buffer size and try again. */
        if (!enlargePQExpBuffer(str, str->maxlen))
            return PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR); /* oops, out of memory */
    }

    return PQexec(conn, str->data);
}
Re: [HACKERS] Last minute mini-proposal (I know, I know) for PQexecf()
[EMAIL PROTECTED] writes: I'd like to see a new variant on PQexec(): PGresult * PQexecf(PGconn *conn, const char *fmt, ...); Way too late for 8.3 --- if we were going to do something like this, we should think first and program later. In particular, blindly adopting the sprintf format string definition doesn't seem very helpful. The sorts of escapes I'd want to have are properly quoted SQL identifier, properly quoted SQL literal, etc. A large fraction of what sprintf knows about is more or less irrelevant to the task of creating SQL commands. Also, how does this interact with parameterized or prepared commands? If we wanted PQexecf we'd soon want PQexecParamsf, etc. I don't think we really want so much duplicate logic there --- it'd be better to decouple the string-building functionality from the query-sending functionality. Probably better to consider something like PQformatQuery() that passes back a malloc'd string. regards, tom lane
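[Editor's note] To make the decoupling concrete: a string-builder along the lines of the PQformatQuery() Tom mentions could be sketched as follows. Only the name comes from Tom's message; the body is a plain measure-then-allocate vsnprintf wrapper written for this example, and a real version would presumably understand SQL-specific escapes (quoted literal, quoted identifier) rather than bare sprintf codes.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: build a command string, leaving execution to the
 * caller (PQexec, PQexecParams, PQprepare, ...).  Returns a malloc'd
 * string the caller must free, or NULL on formatting/allocation failure. */
char *
PQformatQuery(const char *fmt, ...)
{
    va_list args;
    int     needed;
    char   *buf;

    /* First pass: measure the formatted length (C99 vsnprintf). */
    va_start(args, fmt);
    needed = vsnprintf(NULL, 0, fmt, args);
    va_end(args);
    if (needed < 0)
        return NULL;

    buf = malloc((size_t) needed + 1);
    if (buf == NULL)
        return NULL;

    /* Second pass: actually format into the exact-sized buffer. */
    va_start(args, fmt);
    vsnprintf(buf, (size_t) needed + 1, fmt, args);
    va_end(args);
    return buf;
}
```

Because the builder returns a plain string, the same result can feed PQexec today and a hypothetical PQexecParams wrapper tomorrow, which is exactly the duplicate-logic problem Tom raises.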
Re: [HACKERS] [PATCHES] Full page writes improvement, code update
Simon; Tom; This is Koichi writing. Your question is how to determine which WAL records were generated between pg_start_backup and pg_stop_backup, and here's an answer. XLogInsert() already has logic to determine if an inserted WAL record falls between pg_start_backup and pg_stop_backup. Currently it is used to remove full page writes when full_page_writes=off. We can use this to mark WAL records. We have one bit not used in the WAL record header, the last bit of xl_info, where the upper four bits are used to indicate the resource manager and three of the rest are used to indicate the number of full page writes included in the record. So in my proposal, this unused bit is used to mark that full page writes must not be removed at offline optimization by pg_compresslog. Sorry, I didn't have mailing list capability from home and have just completed my subscription from home. I had to create a new thread to continue my post. Sorry for the confusion. Please refer to the original thread about this discussion. Best Regards; -- Koichi Suzuki
Re: [HACKERS] Last minute mini-proposal (I know, I know) for PQexecf()
Hm, my first thought was that you should just be using bind parameters instead of interpolating variables directly into the query. But the more I think about it the more I like your idea. It's true that using parameters takes away most of the use cases for the kind of interface you suggest. But there are still cases that remain. And in those cases it would be possible to do it more cleanly and conveniently than with a stock sprintf. In particular, cases like when I want to insert one of a small number of constants and want to be sure the planner plans and caches separate plans for each value; or when I want to insert entirely different subexpressions depending on some parameter; or most commonly of all, I want to vary the order of the ORDER BY expressions and still have every chance of using indexes. Aside from the convenience, I think it would be interesting from an injection-safety point of view. We can offer a %-escape for strings with SQL quoting and a separate %-escape for unquoted SQL text which is documented as being the wrong thing to use for user-provided data. And we can ensure that all escapes except for this raw SQL escape are injection-safe. But anything you provide you should provide both in PQexec form and PQprepare form as well (and I suppose in PQexecParams form). This might seem pointless; if you're interpolating some values, why not interpolate them all? The answer is that you quite often want to interpolate a few specific values, often values that don't have many possible values and might affect the plan, but definitely don't want to interpolate user-provided values that have many possible values. A typical example might be something like: SELECT * FROM invoices WHERE customer_id = ? ORDER BY { order_by_clauses[column_selected] } You certainly don't want to plan a new query for every possible user, but you don't mind caching 5 different plans for the five display columns depending on which the user has clicked on.
-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
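A tiny sketch of the escape scheme Gregory describes, in Python for illustration only. The escape names (%L for a quoted literal, %I for a quoted identifier, %R for raw SQL spliced in verbatim) are hypothetical, not part of any proposed libpq API; the point is that everything except the raw escape is injection-safe by construction:

```python
def format_query(template, *args):
    """Format a query using hypothetical %-escapes:
    %L -> value quoted as an SQL string literal (injection-safe)
    %I -> value quoted as an SQL identifier (injection-safe)
    %R -> raw SQL text, documented as unsafe for user-provided data
    %% -> a literal percent sign
    """
    out = []
    it = iter(args)
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == '%' and i + 1 < len(template):
            spec = template[i + 1]
            if spec == 'L':
                # Single quotes are doubled, so user data cannot escape the literal.
                out.append("'" + str(next(it)).replace("'", "''") + "'")
            elif spec == 'I':
                # Double quotes are doubled inside a quoted identifier.
                out.append('"' + str(next(it)).replace('"', '""') + '"')
            elif spec == 'R':
                # Raw SQL text: the caller takes full responsibility.
                out.append(str(next(it)))
            elif spec == '%':
                out.append('%')
            else:
                raise ValueError("unknown escape %%%s" % spec)
            i += 2
        else:
            out.append(ch)
            i += 1
    return ''.join(out)
```

With this, the invoices example above would interpolate the ORDER BY clause with %R while still passing customer_id through a safe escape (or better, a real bind parameter).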
Re: [HACKERS] Modifying TOAST thresholds
[EMAIL PROTECTED] (Tom Lane) wrote: I'm willing to do the code changes to separate TOAST_THRESHOLD from the toast chunk size, but I do not have the time or facilities to do any performance testing for different parameter choices. Anyone want to work on that?

What have you got in mind there? I might be able to do something, assuming it doesn't cause heartburn that I'll be offline April 6-14th.

Are we simply talking about having the option of #defining a different threshold at which items get thrown out to TOAST? Or one of the more sophisticated options? Or is the idea more simply that we might consider having the default set somewhat lower than it is at present?

And I guess a good question is: what's going to be regarded as a meaningful test? I've got a couple of local test cases I could draw from; unfortunately, the interaction with TOAST will more than likely be pretty trivial, showing off "yeah, cutting the threshold was a good idea." And that may not be fair to everyone's case.

[The good news is, of course, that if the end deliverable is a single #define parameter that's used as the denominator of the fraction, delivery during beta time is, indeed, quite trivial...]

-- output = (cbbrowne @ gmail.com) http://linuxdatabases.info/info/slony.html Outside of a dog, a book is man's best friend. Inside of a dog, it's too dark to read. -Groucho Marx
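The "denominator of the fraction" idea can be sketched as follows; this is an illustrative model only (the constants and the PAGE_OVERHEAD allowance are made up for the example, not the actual MaximumBytesPerTuple() arithmetic), showing how a single tuples-per-page parameter determines the threshold above which a tuple becomes a TOAST candidate:

```python
BLCKSZ = 8192        # default PostgreSQL page size
PAGE_OVERHEAD = 32   # rough allowance for page/tuple headers (illustrative)

def toast_threshold(tuples_per_page):
    # A tuple wider than its share of the page becomes a TOAST candidate;
    # tuples_per_page is the single tunable denominator under discussion.
    return (BLCKSZ - PAGE_OVERHEAD) // tuples_per_page

def needs_toasting(tuple_size, tuples_per_page=4):
    return tuple_size > toast_threshold(tuples_per_page)
```

Raising the denominator lowers the threshold, pushing more values out-of-line; performance testing would then amount to sweeping that one parameter.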
Re: [HACKERS] CREATE INDEX and HOT - revised design
On 3/31/07, Simon Riggs [EMAIL PROTECTED] wrote: On Fri, 2007-03-30 at 13:54 -0400, Tom Lane wrote: Hm. So anytime we reject a potentially useful index as being not valid yet, we mark the plan as only good for this top-level transaction? That seems possibly workable --- in particular it doesn't get more complicated as soon as you consider multiple such indexes.

I like that because it's specific in dealing with the exact issue we have - it doesn't rely on many other things happening correctly.

Ok. Cool. I will finish this work then.

...and it also seems to provide a new route to avoiding the CIC wait.

Yeah, though I would like to take that up later. Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
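Tom's scheme can be modeled with a small sketch; the class and field names here are hypothetical, not PostgreSQL's plancache code, but they capture the rule: a plan built while rejecting a not-yet-valid index is stamped with the creating transaction id, and any later transaction that finds it must replan:

```python
class CachedPlan:
    def __init__(self, query, valid_only_in_xact=None):
        self.query = query
        # Transaction id of the creating transaction when the planner had to
        # reject a not-yet-valid index; None means the plan is generally valid.
        self.valid_only_in_xact = valid_only_in_xact

class PlanCache:
    def __init__(self):
        self._plans = {}

    def store(self, query, plan):
        self._plans[query] = plan

    def lookup(self, query, current_xact):
        plan = self._plans.get(query)
        if plan is None:
            return None
        if (plan.valid_only_in_xact is not None
                and plan.valid_only_in_xact != current_xact):
            # Plan was only good for its creating top-level transaction;
            # drop it so the query gets replanned with the now-valid index.
            del self._plans[query]
            return None
        return plan
```

Note how multiple rejected indexes need no extra bookkeeping: one transaction stamp covers them all, which is the simplicity Tom points out.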
Re: [HACKERS] CREATE INDEX and HOT - revised design
On 3/30/07, Tom Lane [EMAIL PROTECTED] wrote: I do not think you can assume that the plan won't be used later with some older snapshot. Consider recursive plpgsql functions for a counterexample: the inner occurrence might be the first to arrive at a given line of the function, hence the first to plan it, yet when we return to the outer instance we might revert to an older snapshot.

Thanks for making me aware of such scenarios. What it also means is that a transaction may use an older snapshot after it created the index. So to be on the safe side, we should not use an index created in the same transaction if we saw HOT-updated RECENTLY_DEAD or DELETE_IN_PROGRESS tuples while building the index (with HOT we don't want to index these tuples). Not such a bad restriction to live with. Maybe we can do something with command ids to solve this once we get the basic stuff ready.

Thanks, Pavan -- EnterpriseDB http://www.enterprisedb.com
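Pavan's conservative rule boils down to one check; the names below are hypothetical (this is not PostgreSQL's relcache code), but the logic matches the restriction described: within the creating transaction, avoid the index if the build skipped any tuples that a still-live older snapshot might need:

```python
class IndexInfo:
    def __init__(self, name, created_in_xact, unindexed_hot_tuples_seen):
        self.name = name
        self.created_in_xact = created_in_xact
        # True if the build skipped HOT-updated RECENTLY_DEAD or
        # DELETE_IN_PROGRESS tuples, so an older snapshot could miss rows.
        self.unindexed_hot_tuples_seen = unindexed_hot_tuples_seen

def index_usable(index, current_xact):
    # Outside the creating transaction the index is fine (normal MVCC
    # visibility applies); inside it, only use the index if the build
    # indexed every tuple an older snapshot of this transaction could see.
    if (index.created_in_xact == current_xact
            and index.unindexed_hot_tuples_seen):
        return False
    return True
```

The command-id refinement Pavan mentions would relax this by comparing the snapshot's command id against the index-creating command, instead of disabling the index for the whole transaction.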
Re: [HACKERS] UPDATE using sub selects
Hi,

On 3/31/07, Tom Lane [EMAIL PROTECTED] wrote: NikhilS [EMAIL PROTECTED] writes: I have invented a ROWEXPR_SUBLINK type that handles multiple output columns. The trouble is that since eventually columns of the parent have to be part of the query's targetList, I am sending the entire subquery as one of the entries in that list, and the targetList gets populated with entries dependent on the subquery much later via make_subplan. This breaks code in rewriteTargetList (which expects every list entry to be of type TargetEntry) and expand_targetlist (which expects the targets to be present in attrno order; the entries added because of the subquery will not be in order compared to normal SET colname = expr targets).

Hmm. That sounds like it would be a horrid mess. You need to decouple the execution of the subplan from the use of its outputs, apparently. There is some precedent for this in the way that InitPlans are handled: the result of the subplan is stored into a ParamList array entry that's later referenced by a Param node in the parent's expression tree. That would generalize easily enough to setting more than one Param, but I'm not clear on where you'd want to stick the subplan itself in the plan tree, nor on what controls how often it needs to get evaluated.

Yes, I have tried this already. As you suspect, it seems that the subplan does not get evaluated if it's not part of the targetList at all.

An alternative approach is to put the subplan into the rangetable and use Vars to reference its outputs. Again it's not quite clear what drives re-execution of the subplan. It strikes me though that an approach like this might also serve for SQL2003's LATERAL construct, which'd be a nice thing to support.

Ok, I will try this out. Regards, Nikhils -- EnterpriseDB http://www.enterprisedb.com
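The InitPlan-style decoupling Tom describes can be sketched abstractly; this is an illustrative model in Python, not executor code, and the function and parameter names are made up. The subplan runs once, publishes each output column into a shared parameter array, and the parent's expressions then reference those parameter slots instead of the subquery itself:

```python
def run_initplan(subplan, params, param_ids):
    # Evaluate the subplan once and publish each output column into the
    # shared parameter array, the way InitPlans set ParamList entries.
    row = subplan()
    for pid, value in zip(param_ids, row):
        params[pid] = value

# Hypothetical multi-column subquery, standing in for the subselect in
# something like UPDATE t SET (a, b) = (SELECT x, y FROM s WHERE ...)
def subplan():
    return (42, 'new-name')

params = {}
run_initplan(subplan, params, [0, 1])

# The parent plan's target list then contains Param references (here,
# dictionary lookups) rather than the subquery itself, so rewriteTargetList
# and expand_targetlist see ordinary entries in attrno order.
new_tuple = {'a': params[0], 'b': params[1]}
```

The open questions from the thread map directly onto this sketch: where run_initplan hangs in the plan tree, and what would force it to run again per outer row rather than once per query.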