Re: [HACKERS] Move src/tools/backend/ to wiki
Alvaro Herrera wrote: I think getting rid of the FAQ completely is not necessarily a good idea; it seems useful as a collection of interesting questions. Moving the contents to new pages is probably OK. Also, as the answers mature on the Wiki, perhaps it'd be possible to move them to the SGML docs (and reduce the Wiki answer to just a pointer). Yes, that's about the right plan. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rmgr hooks and contrib/rmgr_hook
On Sat, 2008-09-13 at 10:59 +0300, Heikki Linnakangas wrote: The importance of the WAL will only increase as more people start to use it for PITR, replication etc. Agreed. The 2nd use case, however, I find pretty unconvincing. I can't think of a good example of that. Anything that needs to define its own resource manager is very low-level stuff, and probably needs to go into the core anyway. New indexes are a big one, but I listed others also. Indexes have always been able to be added dynamically. Now they can be recovered correctly as well. Other data structures can be maintained by trigger code that writes new types of WAL. That was always possible before, now they can be recoverable too. If we have extensible functions, triggers, indexes, why not WAL? What is the problem with making WAL extensible? It carries no penalty at all for standard WAL records, since the internal design for WAL already caters for exactly this. So, let's focus on the 1st use case. No, lets look at both...you can't just wave away half the use cases. If you look at all of the use cases the argument for doing it externally quickly falls apart since it severely limits what can be achieved. I think a better approach for that is to implement the filters as external programs, like pglesslog. It allows more flexibility, although it also means that you can't rely on existing backend functions to manipulate the WAL. I'd love to see a WAL toolkit on pgfoundry, with tools like the filter to only restore a single database, pglesslog, a WAL record viewer etc. A while ago, you also talked about replacing the Slony triggers in the master with a tool that reads the WAL; another good example of an external tool that needs to read WAL. The toolkit could provide some sort of a framework and common user interface to read and write WAL files for all those tools. This patch provides exactly the toolkit you describe, just internally. As you point out, doing it other ways means you can't access internal functions easily and can't maintain internal data structures correctly either. So doing it externally is *not* a substitute and this is not a simple discussion of include/exclude from core. I'm lost as to why suggesting we limit the functionality is going to be a good thing? If external tools really are so good, then we can do that *as well*. But this is only a plugin API, so the tools will be developed externally anyway. A while ago, you also talked about replacing the Slony triggers in the master with a tool that reads the WAL Writes the WAL you mean? Slony triggers could write data to WAL rather than log tables and the slon daemon can be implemented as an rmgr plugin. Or many other options. Another tool like that is pglesslog, although that one couldn't actually be implemented with these hooks. Sounds like we'll want to integrate that into synch replication some how, so suggestions as to how to do that welcome - if you're not already doing it via some other plugin in synch rep code? I'm sure there's more tricks like that people would find useful, if the tools existed. Agreed. So lets make them exist. If there's an argument against doing this, I've not heard it made clearly by anybody. When we discussed it first on hackers there was no objection, so I wrote the patch. If people want to see this blocked now, we need some good reasons. I've got nothing riding on the acceptance of this patch, I just think its a good thing. That's why I deprioritised it. If there's some hidden threat to national security or whatever, tell me off list. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Sat, 2008-09-13 at 10:48 +0100, Florian G. Pflug wrote: The main idea was to invert the meaning of the xid array in the snapshot struct - instead of storing all the xid's between xmin and xmax that are to be considering in-progress, the array contained all the xid's xmin that are to be considered completed. The downside is that the size of the read-only snapshot is theoretically unbounded, which poses a bit of a problem if it's supposed to live inside shared memory... Why do it inverted? That clearly has problems. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Autovacuum and Autoanalyze
Disabling autovacuum can have catastrophic effects, since it disables the ANALYZing of tables. Can we have a mode where we disable autoVACUUM yet enable autoANALYZE? ANALYZE times are fairly bounded because of the way we do sampling. VACUUM times are not bounded at all, and typically O(n). So it makes sense to switch off the VACUUM at certain times, but never good to switch off ANALYZE. While we're there, it would be useful if CREATE TABLE AS SELECT was followed by an automatic ANALYZE. Especially important for temp tables. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rmgr hooks and contrib/rmgr_hook
Simon Riggs [EMAIL PROTECTED] writes: On Sat, 2008-09-13 at 10:59 +0300, Heikki Linnakangas wrote: The 2nd use case, however, I find pretty unconvincing. I can't think of a good example of that. Anything that needs to define its own resource manager is very low-level stuff, and probably needs to go into the core anyway. New indexes are a big one, but I listed others also. Indexes have always been able to be added dynamically. Now they can be recovered correctly as well. Hm, so currently if you want to add a new indexam you can't just insert into pg_am and make them recoverable. You basically have to build in your new index access method into Postgres with the new rmgr. That is annoying and a problem worth tackling. But I'm a bit worried about having this be an external plugin. There's no way to looking at a WAL file to know whether it will be recoverable with the plugins available. Worse, there's a risk you could have a plugin but not the *right* plugin. Perhaps this could be tackled simply by having startup insert a record listing all the rmgr's in use with identifying information and their version numbers. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch
* Single Evaluation: with foo(i) as (select random() as i) select * from foo union all select * from foo; i --- 0.233165248762816 0.62126633618027 (2 rows) The standard specifies that non-recursive WITH should be evaluated once. What shall we do? I don't think there's an easy way to fix this as Tom suggested. Maybe we should not allow WITH clause without RECURISVE for 8.4? This is a still remaing issue... * Binary recursion and subselect strangeness: with recursive foo(i) as (values (1) union all select * from (select i+1 from foo where i 10 union all select i+1 from foo where i X) t) select * from foo; Produces 10 rows of output regardless of what X is. This should be fixed for 8.4. Also, this is non-linear recursion, which the standard seems to disallow. I will try to fix this. However detecting the query being not a non-linear one is not so easy. I have implemented rejection of non-linear recursion and now this type of query will not be executed anyway. * Multiple recursive references: with recursive foo(i) as (values (1) union all select i+1 from foo where i 10 union all select i+1 from foo where i 20) select * from foo; ERROR: Left hand side of UNION ALL must be a non-recursive term in a recursive query If we're going to allow non-linear recursion (which the standard does not), this seems like it should be a valid case. I will try to disallow this. Non-linear recursion is not allowed now. * Strange result with except: with recursive foo(i) as (values (1) union all select * from (select i+1 from foo where i 10 except select i+1 from foo where i 5) t) select * from foo; ERROR: table foo has 0 columns available but 1 columns specified This query works if you replace except with union. This should be fixed for 8.4. I will try to fix this. This is a non-linear recursion too and will not be executed anyway. * Aggregates allowed: with recursive foo(i) as (values(1) union all select max(i)+1 from foo where i 10) select * from foo; Aggregates should be blocked according to the standard. Also, causes an infinite loop. This should be fixed for 8.4. I will try to fix this. Fixed. * DISTINCT should supress duplicates: with recursive foo(i) as (select distinct * from (values(1),(2)) t union all select distinct i+1 from foo where i 10) select * from foo; This outputs a lot of duplicates, but they should be supressed according to the standard. This query is essentially the same as supporting UNION for recursive queries, so we should either fix both for 8.4 or block both for consistency. I'm not sure if it's possible to fix this. Will look into. Ok, now this type of DISTINCT is not allowed. Included is the latest patches against CVS HEAD. -- Tatsuo Ishii SRA OSS, Inc. Japan recursive_query.patch.gz Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Mechanism to transmit multiple event notifications with one signal
Hi, On Fri, Sep 12, 2008 at 12:41 AM, Tom Lane [EMAIL PROTECTED] wrote: Markus Wanner [EMAIL PROTECTED] writes: Tom Lane wrote: Sooner or later we shall have to bite the bullet and set up a multiplexing system to transmit multiple event types to backends with just one signal. We already did it for signals to the postmaster. Agreed. However, it's non-trivial if you want reliable queues (i.e. no message skipped, as with signals) for varying message sizes. No, that's not what I had in mind at all, just the ability to deliver one of a specified set of event notifications --- ie, get around the fact that Unix only gives us two user-definable signal types. For signals sent from other backends, it'd be sufficient to put a bitmask field into PGPROC entries, which the sender could OR bits into before sending the one real signal event (either SIGUSR1 or SIGUSR2). I'm not sure what to do if we need signals sent from processes that aren't connected to shared memory; but maybe we need not cross that bridge here. (Also, I gather that the Windows implementation could already support a bunch more signal types without much trouble.) There was discussion of event notification to backends from other process in the thread, Synchronous Log Shipping Replication (http://archives.postgresql.org/pgsql-hackers/2008-09/msg00802.php). The problem is that it's difficult to define new event notification using signal to backends since they already use SIGUSR1 and SIGUSR2. Attached is a WIP patch for the mechanism to transmit multiple event notifications with one signal. This mechanism would be useful for various uses, at least I would use this for implementing synchronous replication. In this patch, * I put a bitmask field into PGPROC for the communication of event type as advised by Tom in the above thread. * I replaced CatchupInterruptHandler() by new signal handler for SIGUSR1, adjusted the codes around catchup interrupt to this mechanism. * remaining work is adjusting the comments around catchup interrupt. Comments welcome. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center multiplexer_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 8.3.3 compiler warnings with gcc 4.3
Tom Lane [EMAIL PROTECTED] writes: Devrim =?ISO-8859-1?Q?G=DCND=DCZ?= [EMAIL PROTECTED] writes: I'm getting these on Fedora-9: tqual.c:115: warning: inlining failed in call to ‘SetHintBits’: call is They're just cosmetic. We don't generally worry about fixing cosmetic warnings in back branches. Are they? It seems like these were marked inline for a reason. SetHintBits has at least one conditional in it which can often be optimized out (it's either InvalidTransactionId or else has been tested already). In the tuplestore case there's not much point in having that function if it's not going to be inlined. I'm imagining that it was put in because someone did verify that the non-inlined version was consuming significant time. -- Perhaps a bit assumption though. Also, is 8.3 really a back branch? It's the current release and anyone downloading Postgres from source on the web site will be getting these warnings -- and there are a lot of them. Enough that it looks like something's gone wrong with the build. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rmgr hooks and contrib/rmgr_hook
On Mon, 2008-09-15 at 10:47 +0100, Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: On Sat, 2008-09-13 at 10:59 +0300, Heikki Linnakangas wrote: The 2nd use case, however, I find pretty unconvincing. I can't think of a good example of that. Anything that needs to define its own resource manager is very low-level stuff, and probably needs to go into the core anyway. New indexes are a big one, but I listed others also. Indexes have always been able to be added dynamically. Now they can be recovered correctly as well. Hm, so currently if you want to add a new indexam you can't just insert into pg_am and make them recoverable. You basically have to build in your new index access method into Postgres with the new rmgr. That is annoying and a problem worth tackling. Agreed. But I'm a bit worried about having this be an external plugin. There's no way to looking at a WAL file to know whether it will be recoverable with the plugins available. Worse, there's a risk you could have a plugin but not the *right* plugin. That risk was discussed and is handled in the plugin. You are limited to only insert data into WAL that has a current plugin that says it will handle redo for that type. Perhaps this could be tackled simply by having startup insert a record listing all the rmgr's in use with identifying information and their version numbers. Non-standard plugins in use are listed when in use, so we can all see what's going on. Plugins can issue their own startup messages if they choose, with version numbers and other details. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] New shapshot RPMs (Sep 15 2008) are ready for testing
Hi, I just released new RPM sets, which is based on Sep 15 10:00 AM EEST CVS snapshot. These packages *do* require a dump/reload, even from previous 8.4 packages, because of a catversion update. Over the last week, we have 70 more testers (now 220!) . As usual, please find detailed info from: http://yum.pgsqlrpms.org A mini howto about 8.4devel release + RPMs is here: http://yum.pgsqlrpms.org/news-8.4devel-ready-for-testing.php The tarball I used is here: http://yum.pgsqlrpms.org/srpms/8.4 (I will remove tarballs after a few weeks...) Please report any packaging related errors to me. If you find any PostgreSQL 8.4 bugs, please post them to [EMAIL PROTECTED] or fill out this form: http://www.postgresql.org/support/submitbug Regards, -- Devrim GÜNDÜZ, RHCE devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr http://www.gunduz.org signature.asc Description: This is a digitally signed message part
Re: [HACKERS] pg_regress inputdir
Tom Lane wrote: But I think Alvaro is worried about something at a higher level: the regression test process as a whole has some directory layout assumptions built into it, particularly in regards to where to find .so's. The only information about the location of the .so's is in the test files themselves, which seems reasonable, because they are created and installed at the same time as the .so's that they are presumably supposed to test. So I see no problem here. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] no XLOG during COPY?
On Thu, 2008-09-11 at 15:25 -0400, Andrew Dunstan wrote: Great, thanks (and also to Guillaume). It looks to me like the simple way around this issue would be to provide an option to have pg_restore emit: begin; truncate foo; copy foo ... commit; The truncate will be trivial as there won't be any data or indexes at that stage anyway. Not sure which stage you're talking about. If this is a parallel restore and you are running a create in one session and a load in another, then ISTM you have no way of knowing that for certain. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Sat, 2008-09-13 at 10:48 +0100, Florian G. Pflug wrote: The current read-only snapshot (which current meaning the corresponding state on the master at the time the last replayed wal record was generated) was maintained in shared memory. It' xmin field was continually updated with the (newly added) XLogRecord.xl_xmin field, which contained the xid of the oldest running query on the master, with a pruning step after each ReadOnlySnapshot.xmin update to remove all entries xmin from the xid array. If a commit was seen for an xid, that xid was added to the ReadOnlySnapshot.xid array. The advantage of this concept is that it handles snapshotting on the slave without too much additional work for the master (The only change is the addition of the xl_xmin field to XLogRecord). It especially removes that need to track ShmemVariableCache-nextXid. Snapshots only need to know which transactions are currently running during WAL apply. The standby can't remove any tuples itself, so it doesn't need to know what the master's OldestXmin is. So passing xl_xmin from master to standby seems not necessary to me. The standby's OldestXmin needs to be passed through to the master, not the other way around so that WAL records for tuple removal are not generated. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review Report: propose to include 3 new functions into intarray and intagg
Markus Wanner wrote: Dmitry Koterov wrote: But, what about intarray patch? Does somebody plan to review it? I'd prefer to include it too. If you approve, I'll correct the code style in intarray contrib patch too. I've already volunteered for reviewing it as well. I just felt like splitting things up... Looking at the intarray patch: I find it a bit unfriendly to have a function that depends on sorted input, but doesn't check it. But that's probably not a good enough reason to reject an otherwise simple and useful function. Also, we already have uniq, which doesn't strictly speaking require sorted input, but it does if you want to eliminate all duplicates from the array. _int_group_count_sort seems a bit special purpose. Why does it bother to sort the output? That's wasted time if you don't need sorted output, or if you want the array sorted by the integer value instead of frequency. If you want sorted output, you can just sort it afterwards. Also, it's requiring sorted input for a small performance gain, but there's a lot more precedence in the existing intarray functions to not require sorted input, but to sort the input instead (union, intersect, same, overlap). I realize that the current implementation is faster for the use case where the input is sorted, and output needs to be sorted, but if we go down that path we'll soon have dozens of different variants of various functions, with different ordering requirements of inputs and outputs. So, I'd suggest changing _int_group_count_sort so that it doesn't require sorted input, and doesn't sort the output. The binary search function looks good to me (I think I'd prefer naming it bsearch(), though, though I can see that it was named bidx in reference to the existing idx function). Also, as Markus pointed out, the SGML docs need to be updated. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] no XLOG during COPY?
Simon Riggs wrote: On Thu, 2008-09-11 at 15:25 -0400, Andrew Dunstan wrote: Great, thanks (and also to Guillaume). It looks to me like the simple way around this issue would be to provide an option to have pg_restore emit: begin; truncate foo; copy foo ... commit; The truncate will be trivial as there won't be any data or indexes at that stage anyway. Not sure which stage you're talking about. If this is a parallel restore and you are running a create in one session and a load in another, then ISTM you have no way of knowing that for certain. Er, who doesn't know what for certain, exactly? pg_restore will certainly know that it has created the table in another session and can thus safely truncate the table in the same transaction as the data load. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction Snapshots and Hot Standby
Simon Riggs [EMAIL PROTECTED] writes: So passing xl_xmin from master to standby seems not necessary to me. The standby's OldestXmin needs to be passed through to the master, not the other way around so that WAL records for tuple removal are not generated. I think most people were pretty leery of doing it that way because it means activity on the standby database can cause the master to bloat. The consensus seemed to be headed towards having WAL replay on the standby stall if it meets a tuple removal record for a tuple that's visible to a query running on it. Probably with a mechanism to configure a maximum amount of time it can be stalled before shooting those queries. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review Report: propose to include 3 new functions into intarray and intagg
Hi, sorry for not having completed this review, yet. As you are obviously looking at the patch as well, I'll try to quickly write down my points so far. Trying to compile the intarray module, I now receive an error: error: ‘INT4OID’ undeclared (first use in this function) That can be solved by including catalog/pg_type.h from contrib/intarr/_int_op.c. The PG_FUNCTION_INFO_V1 and prototype definition certainly belong to the top of the file, where all others are. Some lines are longer than 80 columns and again comments are a bit sparse or even useless (no additional things, please). Heikki Linnakangas wrote: I find it a bit unfriendly to have a function that depends on sorted input, but doesn't check it. But that's probably not a good enough reason to reject an otherwise simple and useful function. Also, we already have uniq, which doesn't strictly speaking require sorted input, but it does if you want to eliminate all duplicates from the array. I think it's a performance optimization which is absolutely required in some cases. Some time ago I've also had to rip out the sorting step from certain intarray module functions to save processing time. One option already mentioned somewhere would be saving a 'sorted' property for the array. Then again, I think such a thing would certainly have to be done globally, for all kinds of arrays. _int_group_count_sort seems a bit special purpose. Why does it bother to sort the output? That's wasted time if you don't need sorted output, or if you want the array sorted by the integer value instead of frequency. If you want sorted output, you can just sort it afterwards. Agreed. IMO the normal GROUP BY and ORDER BY stuff of the database itself should be used for such a thing. However, that means turning an array into a set of rows... Also, it's requiring sorted input for a small performance gain, but there's a lot more precedence in the existing intarray functions to not require sorted input, but to sort the input instead (union, intersect, same, overlap). ..and exactly these are the functions I had to wrap again to strip the sorting step, due to poor performance for known-sorted arrays. I realize that the current implementation is faster for the use case where the input is sorted, and output needs to be sorted, but if we go down that path we'll soon have dozens of different variants of various functions, with different ordering requirements of inputs and outputs. Agreed. However, given the OP is using that in production, there seems to be a use case for the optimization, where we have none for the same function without it. So, I'd suggest changing _int_group_count_sort so that it doesn't require sorted input, and doesn't sort the output. The binary search function looks good to me (I think I'd prefer naming it bsearch(), though, though I can see that it was named bidx in reference to the existing idx function). Also, as Markus pointed out, the SGML docs need to be updated. As is, it should probably also carry the '_int' prefix, because it's not a general purpose array function. So propose to name it '_int_bsearch'. Overall I think these functions are overly specialized and should be replaced be more general counterparts in core. However, until we have that, it's hard to refuse such a thing for contrib. By that reasoning, however, the intarray would have to provide methods for sorted as well as asorted input arrays as well. I'm closing my part of reviewing of this patch now. Dmitry, how do you want to proceed with these patches? Do you have time for some cleaning up and writing documentation? Regards Markus Wanner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Mon, 2008-09-15 at 13:13 +0100, Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: So passing xl_xmin from master to standby seems not necessary to me. The standby's OldestXmin needs to be passed through to the master, not the other way around so that WAL records for tuple removal are not generated. I think most people were pretty leery of doing it that way because it means activity on the standby database can cause the master to bloat. The consensus seemed to be headed towards having WAL replay on the standby stall if it meets a tuple removal record for a tuple that's visible to a query running on it. Probably with a mechanism to configure a maximum amount of time it can be stalled before shooting those queries. Well, my impression from all inputs is there is no single preferred route. Any one of the approaches seems to have a possible objection, depending upon the needs of the user. So I'm going to give options. In any case it's not just a two horse race. There are other options, favoured by some people, that you've not personally commented on in any of your summaries (thats up to you, of course). And Standby causing master to bloat is not such a big problem. It's no different to running queries directly on the master. But please don't take it that I don't see the problem or think it the best solution in all cases. Certainly, halting WAL replay for any length of time might mean it can never catch up again, so that won't be acceptable for many cases. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous Log Shipping Replication
On Fri, Sep 12, 2008 at 7:41 PM, Simon Riggs [EMAIL PROTECTED] wrote: On Thu, 2008-09-11 at 18:17 +0300, Heikki Linnakangas wrote: Fujii Masao wrote: I think that this case would often happen. So, we should establish a certain solution or procedure to the case where TLI of the master doesn't match TLI of the slave. If we only allow the case where TLI of both servers is the same, the configuration after failover always needs to get the base backup on the new master. It's unacceptable for many users. But, I think that it's the role of admin or external tools to copy history files to the slave from the master. Hmm. There's more problems than the TLI with that. For the original master to catch up by replaying WAL from the new slave, without restoring from a full backup, the original master must not write to disk *any* WAL that hasn't made it to the slave yet. That is certainly not true for asynchronous replication, but it also throws off the idea of flushing the WAL concurrently to the local disk and to the slave in synchronous mode. I agree that having to get a new base backup to get the old master catch up with the new master sucks, so I hope someone sees a way around that. If we were going to recover from failed-over standby back to original master just via WAL logs we would need all of the WAL files from the point of failover. So you'd need to be storing all WAL file just in case the old master recovers. I can't believe doing that would be the common case, because its so impractical and most people would run out of disk space and need to delete WAL files. No. The original master doesn't need all WAL files. It needs WAL file which its pg_control points as latest checkpoint location and subsequent files. It should be clear that to make this work you must run with a base backup that was derived correctly on the current master. You can do that by re-copying everything, or you can do that by just shipping changed blocks (rsync etc). So I don't see a problem in the first place. PITR doesn't always need a base backup. We can do PITR from the data files just after crash if they aren't corrupted (i.e. not media crash). As the situation demands, most users would like to choose the setup procedure that bad influence on the cluster is smaller. They would choose the procedure without a base backup if there are few WAL files to be replayed. Meanwhile, they would use a base backup if the indispensable WAL files have already been deleted. But, in that case, they might not take new base backup and use old one (e.g. taken 2 days before). Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rmgr hooks and contrib/rmgr_hook
Gregory Stark [EMAIL PROTECTED] writes: Simon Riggs [EMAIL PROTECTED] writes: Indexes have always been able to be added dynamically. Now they can be recovered correctly as well. Hm, so currently if you want to add a new indexam you can't just insert into pg_am and make them recoverable. You basically have to build in your new index access method into Postgres with the new rmgr. That is annoying and a problem worth tackling. I concur with Heikki that that's not exactly a compelling use-case. I've never heard of anyone building a non-core index AM at all; much less trying to use it in a production context. Given the obvious potential for version-mismatch-type problems, it's hard to believe that anyone ever would try. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies
D'Arcy J.M. Cain wrote: On Fri, 12 Sep 2008 06:53:55 +1000 Brendan Jurd [EMAIL PROTECTED] wrote: Josh has assigned your patch to me for an initial review. And me. Thanks for the reviews, guys. I've adjusted the patch per your comments (I think), and applied it. //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous Log Shipping Replication
On Fri, Sep 12, 2008 at 12:17 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote: Fujii Masao wrote: I think that this case would often happen. So, we should establish a certain solution or procedure to the case where TLI of the master doesn't match TLI of the slave. If we only allow the case where TLI of both servers is the same, the configuration after failover always needs to get the base backup on the new master. It's unacceptable for many users. But, I think that it's the role of admin or external tools to copy history files to the slave from the master. Hmm. There's more problems than the TLI with that. For the original master to catch up by replaying WAL from the new slave, without restoring from a full backup, the original master must not write to disk *any* WAL that hasn't made it to the slave yet. That is certainly not true for asynchronous replication, but it also throws off the idea of flushing the WAL concurrently to the local disk and to the slave in synchronous mode. Yes. If the master fails after writing WAL to disk and before sending it to the slave, at least latest WAL file would be inconsistent between both servers. So, regardless of using a base backup, in a setup procedure, we need to delete those inconsistent WAL files or overwrite them. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Mon, Sep 15, 2008 at 8:40 AM, Simon Riggs [EMAIL PROTECTED] wrote: On Mon, 2008-09-15 at 13:13 +0100, Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: So passing xl_xmin from master to standby seems not necessary to me. The standby's OldestXmin needs to be passed through to the master, not the other way around so that WAL records for tuple removal are not generated. I think most people were pretty leery of doing it that way because it means activity on the standby database can cause the master to bloat. The consensus seemed to be headed towards having WAL replay on the standby stall if it meets a tuple removal record for a tuple that's visible to a query running on it. Probably with a mechanism to configure a maximum amount of time it can be stalled before shooting those queries. Well, my impression from all inputs is there is no single preferred route. Any one of the approaches seems to have a possible objection, depending upon the needs of the user. So I'm going to give options. In any case it's not just a two horse race. There are other options, favoured by some people, that you've not personally commented on in any of your summaries (thats up to you, of course). And Standby causing master to bloat is not such a big problem. It's no different to running queries directly on the master. But please don't take it that I don't see the problem or think it the best solution in all cases. It's not a problem, but a relative disadvantage. What makes warm standby really attractive relative to other data transfer solutions is that there are no side effects on the master outside of setting the archive command...communication is one way. Any 'master bloat' style approach seems to be increasingly fragile if you want to increase the number of standby servers, if it's even possible to do that. I'm not saying it should be the only way to do it (and it may not even be possible)...but it would be very attractive to be able to run a hot standby much the same as a warm standby is running today...I could, for example easily script a second standby but keep it a week behind for example. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous Log Shipping Replication
Simon Riggs escribió: On Fri, 2008-09-12 at 17:08 +0300, Heikki Linnakangas wrote: It should be clear that to make this work you must run with a base backup that was derived correctly on the current master. You can do that by re-copying everything, or you can do that by just shipping changed blocks (rsync etc). So I don't see a problem in the first place. Hmm, built-in rsync capability would be cool. Probably not in the first phase, though.. Built-in? Why? I mean make base backup using rsync. That way only changed data blocks need be migrated, so much faster. Why rsync? Just compare the LSNs ... -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies
On Mon, Sep 15, 2008 at 10:53 PM, Magnus Hagander [EMAIL PROTECTED] wrote: Thanks for the reviews, guys. I've adjusted the patch per your comments (I think), and applied it. Cool. I had a look at the commitdiff and I think you missed one of the error messages (it's still all on one line). It's at src/backend/libpq/hba.c line 841. I've included a patch to split it up below. Cheers, BJ --- src/backend/libpq/hba.c +++ src/backend/libpq/hba.c @@ -840,9 +840,10 @@ hba_syntax: if (line_item) ereport(LOG, (errcode(ERRCODE_CONFIG_FILE_ERROR), -errmsg(invalid entry in file \%s\ at line %d, token \%s\, - HbaFileName, line_num, - (char *) lfirst(line_item; +errmsg(invalid entry for token \%s\, + (char *) lfirst(line_item)), +errdetail(In file \%s\, line %d, + HbaFileName, line_num))); else ereport(LOG, (errcode(ERRCODE_CONFIG_FILE_ERROR), -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous Log Shipping Replication
Alvaro Herrera wrote: Simon Riggs escribió: On Fri, 2008-09-12 at 17:08 +0300, Heikki Linnakangas wrote: It should be clear that to make this work you must run with a base backup that was derived correctly on the current master. You can do that by re-copying everything, or you can do that by just shipping changed blocks (rsync etc). So I don't see a problem in the first place. Hmm, built-in rsync capability would be cool. Probably not in the first phase, though.. Built-in? Why? I mean make base backup using rsync. That way only changed data blocks need be migrated, so much faster. Why rsync? Just compare the LSNs ... True, that's much better. Only works for data files, though, so we'll still need something else for clog etc. But the volume of the other stuff is much smaller, so I support we don't need to bother delta compressing them. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction Snapshots and Hot Standby
Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: So passing xl_xmin from master to standby seems not necessary to me. The standby's OldestXmin needs to be passed through to the master, not the other way around so that WAL records for tuple removal are not generated. I think most people were pretty leery of doing it that way because it means activity on the standby database can cause the master to bloat. The consensus seemed to be headed towards having WAL replay on the standby stall if it meets a tuple removal record for a tuple that's visible to a query running on it. Probably with a mechanism to configure a maximum amount of time it can be stalled before shooting those queries. Yes, but we can quite easily provide an option on top of that to advertise the slave xmin back to the master, for those who prefer some bloat on master over stalling replay or killing queries in the slave. In fact, I think a lot of people will choose some compromise, where the slave xmin is advertised back to the master, but the master will obey it only up to some limit, after which the slave will need to stall or kill queries again. It's not something that needs to be done in the first phase, but should be straightforward to add after the basics are working. In any case, we'll need the capability in the slave to notice when it's about to remove a tuple that's still visible to a snapshot in the slave. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rmgr hooks and contrib/rmgr_hook
Tom Lane [EMAIL PROTECTED] writes: Gregory Stark [EMAIL PROTECTED] writes: Simon Riggs [EMAIL PROTECTED] writes: Indexes have always been able to be added dynamically. Now they can be recovered correctly as well. Hm, so currently if you want to add a new indexam you can't just insert into pg_am and make them recoverable. You basically have to build in your new index access method into Postgres with the new rmgr. That is annoying and a problem worth tackling. I concur with Heikki that that's not exactly a compelling use-case. I've never heard of anyone building a non-core index AM at all; much less trying to use it in a production context. Given the obvious potential for version-mismatch-type problems, it's hard to believe that anyone ever would try. Well wasn't GIST such an instance until we put it in core? IIRC it lived in contrib for a long time. It happens that the route they took was to implement it without recoverability until it was in core then add logging. I suspect we would lean on any new method to have logging before it was merged in though. I think the version-mismatch problems are fairly important though which is why I was suggesting providing checks for that in postgres. Simon's right though that the plugin could check for it itself. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies
Brendan Jurd [EMAIL PROTECTED] writes: Cool. I had a look at the commitdiff and I think you missed one of the error messages (it's still all on one line). It's at src/backend/libpq/hba.c line 841. I've included a patch to split it up below. Hm, that doesn't seem like a great improvement --- in particular, it violates the style guideline that detail messages should be complete sentences. I think the error text is all right as-is, really. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies
Tom Lane wrote: Brendan Jurd [EMAIL PROTECTED] writes: Cool. I had a look at the commitdiff and I think you missed one of the error messages (it's still all on one line). It's at src/backend/libpq/hba.c line 841. I've included a patch to split it up below. Hm, that doesn't seem like a great improvement --- in particular, it violates the style guideline that detail messages should be complete sentences. I think the error text is all right as-is, really. I think I need to hit myself over the head with those guidelines repeatedly, because I think the *changed* messages are in violation of this, and need to be changed again... //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Mon, 2008-09-15 at 16:26 +0300, Heikki Linnakangas wrote: In any case, we'll need the capability in the slave to notice when it's about to remove a tuple that's still visible to a snapshot in the slave. Agreed. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rmgr hooks and contrib/rmgr_hook
On Mon, 2008-09-15 at 08:52 -0400, Tom Lane wrote: Gregory Stark [EMAIL PROTECTED] writes: Simon Riggs [EMAIL PROTECTED] writes: Indexes have always been able to be added dynamically. Now they can be recovered correctly as well. Hm, so currently if you want to add a new indexam you can't just insert into pg_am and make them recoverable. You basically have to build in your new index access method into Postgres with the new rmgr. That is annoying and a problem worth tackling. I concur with Heikki that that's not exactly a compelling use-case. I've never heard of anyone building a non-core index AM at all; much less trying to use it in a production context. Given the obvious potential for version-mismatch-type problems, it's hard to believe that anyone ever would try. The lack of a chicken is not an argument against the use case for an egg. But in any case, Bizgres was exactly this case, so they already did. We just forced the authors to produce a code fork to do it, confusing people rather than attracting people to Postgres. We have plugin APIs with possible version mismatches in other contexts, but I don't see us doing anything about that. In the context of WAL, the basic WAL format has not changed in about 6 releases, so its been one of the most stable file formats. Certain message types have changed, but messages are all independent across rmgrs, so insulated from change. The version mismatch idea presumes that a code author would structure their code in two pieces: one to generate the WAL and one to read it. Seems much more likely to me that authors would have one module containing both as a way of avoiding the problem altogether. So I'm not sure what to check, and against what? When people do write useful plugins in the future they will be potentially usable with any server at 8.4 or above. If we had had this feature a few releases ago, we could have made GIN available to earlier releases, for example. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Mon, 2008-09-15 at 09:07 -0400, Merlin Moncure wrote: Well, my impression from all inputs is there is no single preferred route. Any one of the approaches seems to have a possible objection, depending upon the needs of the user. So I'm going to give options. In any case it's not just a two horse race. There are other options, favoured by some people, that you've not personally commented on in any of your summaries (thats up to you, of course). And Standby causing master to bloat is not such a big problem. It's no different to running queries directly on the master. But please don't take it that I don't see the problem or think it the best solution in all cases. It's not a problem, but a relative disadvantage. In some circumstances, yes, in others, not. But its not the only consideration and different people attach different weightings in their decision making. What makes warm standby really attractive relative to other data transfer solutions is that there are no side effects on the master outside of setting the archive command...communication is one way. Any 'master bloat' style approach seems to be increasingly fragile if you want to increase the number of standby servers, if it's even possible to do that. I'm not saying it should be the only way to do it (and it may not even be possible)...but it would be very attractive to be able to run a hot standby much the same as a warm standby is running today...I could, for example easily script a second standby but keep it a week behind for example. I hope to provide options for all users, not just a single approach. If you choose never to use the option to link standby and master, I respect that and understand. I've listened to your requirements and believe I'm including things to allow it to work for you the way you've said. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rmgr hooks and contrib/rmgr_hook
Simon Riggs [EMAIL PROTECTED] writes: The version mismatch idea presumes that a code author would structure their code in two pieces: one to generate the WAL and one to read it. Seems much more likely to me that authors would have one module containing both as a way of avoiding the problem altogether. So I'm not sure what to check, and against what? No, the danger is that someone generates a backup with one version of the plugin and then restores with a different version of the plugin. That would be frightfully easy to do when doing a minor upgrade, for example. Or on a standby database if you've installed a new version of the plugin since the standby was built. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support! -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rmgr hooks and contrib/rmgr_hook
Simon Riggs wrote: On Mon, 2008-09-15 at 08:52 -0400, Tom Lane wrote: I've never heard of anyone building a non-core index AM at all; much less trying to use it in a production context. Given the obvious potential for version-mismatch-type problems, it's hard to believe that anyone ever would try. The lack of a chicken is not an argument against the use case for an egg. But in any case, Bizgres was exactly this case, so they already did. We just forced the authors to produce a code fork to do it, confusing people rather than attracting people to Postgres. Are you referring to the bitmap index patch? IIRC, there was some non-trivial changes to indexam API in there, as well as issues with VACUUM. If anything, that patch supports the assumption that anything that needs WAL-logging is working at such a low-level that it needs to be in core anyway. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rmgr hooks and contrib/rmgr_hook
Simon Riggs [EMAIL PROTECTED] writes: We have plugin APIs with possible version mismatches in other contexts, but I don't see us doing anything about that. In the context of WAL, the basic WAL format has not changed in about 6 releases, so its been one of the most stable file formats. Er, that's simply false. Read the revision history for xlog_internal.h. The version mismatch idea presumes that a code author would structure their code in two pieces: one to generate the WAL and one to read it. No, the version mismatch problem is that you might try to read the WAL with a different version of the plugin than you wrote it with. Or maybe with a completely unrelated plugin that was unfortunate enough to choose the same rmgr ID. We can't afford to insert complete versioning information into each WAL record, so it's going to be pretty difficult to avoid this risk. When people do write useful plugins in the future they will be potentially usable with any server at 8.4 or above. If we had had this feature a few releases ago, we could have made GIN available to earlier releases, for example. Well, the initial commit for GIN looked like this: 2006-05-02 07:28 teodor * contrib/tsearch2/Makefile, contrib/tsearch2/ginidx.c, contrib/tsearch2/tsearch.sql.in, contrib/tsearch2/expected/tsearch2.out, contrib/tsearch2/sql/tsearch2.sql, src/backend/access/Makefile, src/backend/access/gin/Makefile, src/backend/access/gin/README, src/backend/access/gin/ginarrayproc.c, src/backend/access/gin/ginbtree.c, src/backend/access/gin/ginbulk.c, src/backend/access/gin/gindatapage.c, src/backend/access/gin/ginentrypage.c, src/backend/access/gin/ginget.c, src/backend/access/gin/gininsert.c, src/backend/access/gin/ginscan.c, src/backend/access/gin/ginutil.c, src/backend/access/gin/ginvacuum.c, src/backend/access/gin/ginxlog.c, src/backend/access/transam/rmgr.c, src/backend/commands/cluster.c, src/backend/commands/opclasscmds.c, src/backend/commands/vacuum.c, src/backend/utils/adt/selfuncs.c, src/backend/utils/init/globals.c, src/backend/utils/misc/guc.c, src/include/access/gin.h, src/include/access/rmgr.h, src/include/catalog/catversion.h, src/include/catalog/pg_am.h, src/include/catalog/pg_amop.h, src/include/catalog/pg_amproc.h, src/include/catalog/pg_opclass.h, src/include/catalog/pg_operator.h, src/include/catalog/pg_proc.h, src/include/utils/selfuncs.h, src/test/regress/data/array.data, src/test/regress/expected/arrays.out, src/test/regress/expected/create_index.out, src/test/regress/expected/create_table.out, src/test/regress/expected/opr_sanity.out, src/test/regress/expected/sanity_check.out, src/test/regress/input/copy.source, src/test/regress/output/copy.source, src/test/regress/output/misc.source, src/test/regress/sql/arrays.sql, src/test/regress/sql/create_index.sql, src/test/regress/sql/create_table.sql, src/test/regress/sql/opr_sanity.sql: GIN: Generalized Inverted iNdex. text[], int4[], Tsearch2 support for GIN. Had the only core source file touched been rmgr.c, then maybe this argument would be valid ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies
On Mon, Sep 15, 2008 at 11:46 PM, Magnus Hagander [EMAIL PROTECTED] wrote: Tom Lane wrote: Hm, that doesn't seem like a great improvement --- in particular, it violates the style guideline that detail messages should be complete sentences. I think the error text is all right as-is, really. I think I need to hit myself over the head with those guidelines repeatedly, because I think the *changed* messages are in violation of this, and need to be changed again... Right, I was just applying the same DETAIL structure as was used elsewhere in the patch. Tom's got a point about the complete sentences though. I still feel that the primary message is too long once the arguments have been substituted in. How about just tweaking the DETAIL portion so that it is a complete sentence? Or perhaps the DETAIL portion would be more appropriate as a CONTEXT? Cheers, BJ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies
Brendan Jurd [EMAIL PROTECTED] writes: Or perhaps the DETAIL portion would be more appropriate as a CONTEXT? +1 regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Synchronous Log Shipping Replication
Fujii Masao wrote: On Fri, Sep 12, 2008 at 12:17 AM, Heikki Linnakangas [EMAIL PROTECTED] wrote: Hmm. There's more problems than the TLI with that. For the original master to catch up by replaying WAL from the new slave, without restoring from a full backup, the original master must not write to disk *any* WAL that hasn't made it to the slave yet. That is certainly not true for asynchronous replication, but it also throws off the idea of flushing the WAL concurrently to the local disk and to the slave in synchronous mode. Yes. If the master fails after writing WAL to disk and before sending it to the slave, at least latest WAL file would be inconsistent between both servers. So, regardless of using a base backup, in a setup procedure, we need to delete those inconsistent WAL files or overwrite them. And if you're unlucky, the changes in the latest WAL file might already have been flushed to data files as well. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Integrating hash index tupledesc hack a bit better
A few days ago I complained that the lossy-hash-indexes patch needed a better solution for letting catalog/index.c know that hash indexes will really store only hashcodes and not the underlying column type. The current version of the patch puts some hard-wired knowledge into index.c: /* HACK: make hash always use int4 as storage (really it's uint32) */ if (opclassTup-opcmethod == HASH_AM_OID) keyType = INT4OID; else keyType = opclassTup-opckeytype; which might not exactly be a stop-ship issue but it still seems pretty distasteful. We could avoid the problem if all hash opclasses were marked as having storage type int4, but that seems to just move the problem over to opclass creation --- I doubt we want to insist on user-defined opclasses changing to accommodate this, so we'd have to force this marking internally in CREATE OPERATOR CLASS. The idea that seems most attractive to me at the moment is to add a column to pg_am, probably defined like pg_opclass.opckeytype but applying to all opclasses of the index type. Another possibility is the idea I floated before of adding an AM entry point to let it manipulate the index tupledesc in a more general fashion during CREATE INDEX. However it's unclear how much flexibility that would really buy us given the assumptions that are wired into various places, so I'm thinking it's probably just useless complication. Comments, better ideas? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Mon, 2008-09-15 at 16:26 +0300, Heikki Linnakangas wrote: In any case, we'll need the capability in the slave to notice when it's about to remove a tuple that's still visible to a snapshot in the slave. Looks like what I'll do is this: Alter functions in pruneheap.c so that we bubble up the latest xid that is being removed as part of block cleaning. We then add that xid into the WAL record for cleaning. If latest xid of clean is ahead of oldestxmin of running queries on standby then Startup process needs to take action, of some kind. Re-examining the tuples in WAL apply seems bad plan, since we'd have to touch stuff in the block twice and juggle the locks. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rmgr hooks and contrib/rmgr_hook
Simon Riggs [EMAIL PROTECTED] writes: Bottom line is that any backup of Postgres needs to include plugin directories, otherwise parts of the application could stop working following restore. This patch doesn't change that. No, backups of executables are normally not the same backups as the data and in many cases -- minor upgrades for example -- cannot be. * add the rmgr bms to the long header of each WAL file * change !RmgrIdIsValid() so that it causes FATAL by default. This then allows people to correct a mistake and retry. We provide an option to treat such errors as corrupt data and assume we have reached the end of WAL. I'm not sure but I think this just begs the question. The problem is to ensure that the rmgrid means the same thing on the restoring database as it does on the original database, or at least a compatible version. I think this would mean having a long text description and version number to compare. And as Tom points out startup isn't often enough. Would WAL headers even be often enough? We would have to ensure there was never two versions of the plugin in the same WAL file. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql is not translate-aware
Tom Lane wrote: Alvaro Herrera [EMAIL PROTECTED] writes: Actually this is wrong -- since the library is going to run with postgres text domain, we need to add the files to the backend's nls.mk: Can't we give it its own text domain? It seems fundamentally wrong for a plug-in language to require core support for its messages. (Now that I think about it, that may have been the reason we don't have localization for it already.) I suppose this must be possible, since e.g. glibc manages to have its own messages separate from whatever app it's linked with. What glibc does is use dgettext() instead of plain gettext(), so they are able to pass the domain along the message. See here, at the bottom: http://github.com/bneumeier/glibc/tree/master/include/libintl.h where _libc_intl_domainname is defined as libc elsewhere. I guess one way to go about this is to add some way to pass the domain from the caller, i.e. ereport(ERROR, (errdomain(plpgsql), errmsg( ... ))) but it seems a bit fragile, if only because it's easy to forget to add it to new code, and easy to overlook a message that should have it. Another idea would be to redefine errmsg() and friends within the plpgsql sources, but that seems worse because every other library will need to do the same. Refinement of the previous idea: maybe we can add a compiler flag, to be set in Makefiles, that defines the domain and passes it down to EVALUATE_MESSAGE. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rmgr hooks and contrib/rmgr_hook
On Mon, 2008-09-15 at 16:43 +0100, Gregory Stark wrote: Simon Riggs [EMAIL PROTECTED] writes: Bottom line is that any backup of Postgres needs to include plugin directories, otherwise parts of the application could stop working following restore. This patch doesn't change that. No, backups of executables are normally not the same backups as the data and in many cases -- minor upgrades for example -- cannot be. So you advise your clients to do backup in two halves. Then complain that this is a bad way to do a backup because it may cause insurmountable problems on restore. And then seek to reject a patch because of that, even though similar problems already exist in other parts of the system. I'm sorry, but that is circular, then faulted logic. If you do a minor upgrade that changes the on-disk format of *any* part of the system then you have just introduced a limitation into what software can be used for restore. That could be a new version of a custom datatype just as easily as it could be an rmgr plugin. Shall we ban custom datatypes? Should we add a version number into every varlen header just in case somebody switched release levels, then forgot? * add the rmgr bms to the long header of each WAL file * change !RmgrIdIsValid() so that it causes FATAL by default. This then allows people to correct a mistake and retry. We provide an option to treat such errors as corrupt data and assume we have reached the end of WAL. I'm not sure but I think this just begs the question. The problem is to ensure that the rmgrid means the same thing on the restoring database as it does on the original database, or at least a compatible version. I think this would mean having a long text description and version number to compare. Why is that any different to using functions or other plugins? If you restore data into a database where the functions have the same names, yet do different things then you are in trouble. Period. If you don't use an rmgr plugin at all then you have nothing different to do, nor will you see any different messages. If you use *any* external server software, expect to read the instructions or have problems. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Friday 12 September 2008 07:44:36 Csaba Nagy wrote: And then the actually interesting question: what will the slave do with views, rules, triggers ? I guess triggers are out of the question to be executed, what about rules ? Probably must be also ignored... user functions will probably get errors if they try to update something... Views should probably function correctly. If we dont have rules, we dont get views, so those need to be. Really we should allow anything that would work in the context of a read only transaction. (ie. functions that dont change anything should be fine to call) -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] rmgr hooks and contrib/rmgr_hook
On Mon, 2008-09-15 at 10:04 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: The version mismatch idea presumes that a code author would structure their code in two pieces: one to generate the WAL and one to read it. No, the version mismatch problem is that you might try to read the WAL with a different version of the plugin than you wrote it with. Or maybe with a completely unrelated plugin that was unfortunate enough to choose the same rmgr ID. We can't afford to insert complete versioning information into each WAL record, so it's going to be pretty difficult to avoid this risk. I'm happy to include additional things into the patch, but I don't see anything to force rejection of the patch entirely, from what has been said. Bottom line is that any backup of Postgres needs to include plugin directories, otherwise parts of the application could stop working following restore. This patch doesn't change that. I proposed a registration scheme to avoid one of those problems. If a plugin changed its file format, it would clearly need to include a version test within it. It wouldn't be the fault of the plugin API if the plugin author didn't handle that. But if they can work out how to build an index AM and write WAL, I'm sure they can handle version management between software components and informative error messages if problems occur. And if they can't, they'll get a bad rep and nobody will use the plugin. Few ideas: * add the rmgr bms to the long header of each WAL file * change !RmgrIdIsValid() so that it causes FATAL by default. This then allows people to correct a mistake and retry. We provide an option to treat such errors as corrupt data and assume we have reached the end of WAL. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] EncodeDateTime performance
I'm trying to optimize postgres performance on a headless solid state hardware platform (no fans or disks). I have the database stored on a USB 2.0 flash drive (hdparm benchmarks reads at 10 MB/s). Performance is limited by the 533Mhz CPU. Hardware: IXP425 XScale (big endian) 533Mhz 64MB RAM USB 2.0 Flash Drive Software: Linux 2.6.21.4 postgres 8.2.5 I created a fresh database using initdb, then added one table. Here is the create table: CREATE TABLE archivetbl ( DateTime timestamp without time zone, StationNum smallint, DeviceDateTime timestamp without time zone, DeviceNum smallint, Tagname character(64), Value double precision, Online boolean ) WITH (OIDS=FALSE); ALTER TABLE archivetbl OWNER TO novatech; I populated the table with 38098 rows. I'm doing this simple query: select * from archivetbl; It takes 79 seconds to complete the query (when postgres is compiled with -O2). I'm running the query from pgadmin3 over TCP/IP. oprofile is showing that memset (via dopr) is using about 60% of the CPU. I traced back further and noticed most of the usage was coming from EncodeDateTime. I'm not quite sure why oprofile is showing that memset is hogging so much CPU. Regardless, I found way to eliminate most of the sprintf calls that were taking place in my situation. I made some modifications to EncodeDateTime and have attached them as a patch. These changes alone reduced the query time of the select * from archivetbl; from 79 seconds to just 35 seconds. This patch is against 8.2.5. Since I'm cross compiling changing versions is a bit of a pain, but if someone thinks the performance has changed much in this area I could probably get the latest version cross compiling. Regards, George McCollister diff -Naur postgresql-8.2.5/src/backend/utils/adt/datetime.c postgresql-8.2.5.new/src/backend/utils/adt/datetime.c --- postgresql-8.2.5/src/backend/utils/adt/datetime.c 2007-06-12 10:58:39.0 -0500 +++ postgresql-8.2.5.new/src/backend/utils/adt/datetime.c 2008-09-15 12:16:32.0 -0500 @@ -3287,6 +3287,53 @@ return TRUE; } /* EncodeTimeOnly() */ +void ymdhm(char * buf, int year, int mon, int day, int hour, int min) +{ + buf[0] = (char)((year / 1000) % 10) + '0'; + buf[1] = (char)((year / 100) % 10) + '0'; + buf[2] = (char)((year / 10) % 10) + '0'; + buf[3] = (char)(year % 10) + '0'; + buf[4] = '-'; + buf[5] = (char)((mon / 10) % 10) + '0'; + buf[6] = (char)(mon % 10) + '0'; + buf[7] = '-'; + buf[8] = (char)((day / 10) % 10) + '0'; + buf[9] = (char)(day % 10) + '0'; + buf[10] = ' '; + buf[11] = (char)((hour / 10) % 10) + '0'; + buf[12] = (char)(hour % 10) + '0'; + buf[13] = ':'; + buf[14] = (char)((min / 10) % 10) + '0'; + buf[15] = (char)(min % 10) + '0'; + buf[16] = '\0'; +} + +void append_seconds(char * buf, int sec) +{ + buf[0] = ':'; + buf[1] = (char)((sec / 10) % 10) + '0'; + buf[2] = (char)(sec % 10) + '0'; + buf[3] = '\0'; +} + +#ifdef HAVE_INT64_TIMESTAMP +void append_seconds_and_fsecs(char * buf, int sec, fsec_t fsec) +{ + buf[0] = ':'; + buf[1] = (char)((sec / 10) % 10) + '0'; + buf[2] = (char)(sec % 10) + '0'; + buf[3] = '.'; + buf[4] = (char)((fsec / 10) % 10) + '0'; + buf[5] = (char)((fsec / 1) % 10) + '0'; + buf[6] = (char)((fsec / 1000) % 10) + '0'; + buf[7] = (char)((fsec / 100) % 10) + '0'; + buf[8] = (char)((fsec / 10) % 10) + '0'; + buf[9] = (char)(fsec % 10) + '0'; + buf[10] = '\0'; +} +#endif + + /* EncodeDateTime() * Encode date and time interpreted as local time. @@ -3315,9 +3362,8 @@ case USE_ISO_DATES: /* Compatible with ISO-8601 date formats */ - sprintf(str, %04d-%02d-%02d %02d:%02d, - (tm-tm_year 0) ? tm-tm_year : -(tm-tm_year - 1), - tm-tm_mon, tm-tm_mday, tm-tm_hour, tm-tm_min); + ymdhm(str, (tm-tm_year 0) ? tm-tm_year : -(tm-tm_year - 1), +tm-tm_mon, tm-tm_mday, tm-tm_hour, tm-tm_min); /* * Print fractional seconds if any. The field widths here should @@ -3329,7 +3375,7 @@ #ifdef HAVE_INT64_TIMESTAMP if (fsec != 0) { -sprintf(str + strlen(str), :%02d.%06d, tm-tm_sec, fsec); +append_seconds_and_fsecs(str + strlen(str), tm-tm_sec, fsec); TrimTrailingZeros(str); } #else @@ -3340,7 +3386,7 @@ } #endif else -sprintf(str + strlen(str), :%02d, tm-tm_sec); +append_seconds(str + strlen(str), tm-tm_sec); /* * tzp == NULL indicates that we don't want *any* time zone info -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies
Tom Lane wrote: Brendan Jurd [EMAIL PROTECTED] writes: Or perhaps the DETAIL portion would be more appropriate as a CONTEXT? +1 Is this the proper syntax for errcontext() messags? //Magnus Index: hba.c === RCS file: /cvsroot/pgsql/src/backend/libpq/hba.c,v retrieving revision 1.167 diff -c -r1.167 hba.c *** hba.c 15 Sep 2008 12:32:56 - 1.167 --- hba.c 15 Sep 2008 17:38:01 - *** *** 660,666 (errcode(ERRCODE_CONFIG_FILE_ERROR), errmsg(invalid IP address \%s\: %s, token, gai_strerror(ret)), ! errdetail(In file \%s\, line %d, HbaFileName, line_num))); if (cidr_slash) *cidr_slash = '/'; --- 660,666 (errcode(ERRCODE_CONFIG_FILE_ERROR), errmsg(invalid IP address \%s\: %s, token, gai_strerror(ret)), ! errcontext(File \%s\, line %d, HbaFileName, line_num))); if (cidr_slash) *cidr_slash = '/'; *** *** 697,703 (errcode(ERRCODE_CONFIG_FILE_ERROR), errmsg(invalid IP mask \%s\: %s, token, gai_strerror(ret)), ! errdetail(In file \%s\, line %d, HbaFileName, line_num))); if (gai_result) pg_freeaddrinfo_all(hints.ai_family, gai_result); --- 697,703 (errcode(ERRCODE_CONFIG_FILE_ERROR), errmsg(invalid IP mask \%s\: %s, token, gai_strerror(ret)), ! errcontext(File \%s\, line %d, HbaFileName, line_num))); if (gai_result) pg_freeaddrinfo_all(hints.ai_family, gai_result); *** *** 773,779 (errcode(ERRCODE_CONFIG_FILE_ERROR), errmsg(invalid authentication method \%s\, token), ! errdetail(In file \%s\ line %d, HbaFileName, line_num))); goto hba_other_error; } --- 773,779 (errcode(ERRCODE_CONFIG_FILE_ERROR), errmsg(invalid authentication method \%s\, token), ! errcontext(File \%s\ line %d, HbaFileName, line_num))); goto hba_other_error; } *** *** 784,790 (errcode(ERRCODE_CONFIG_FILE_ERROR), errmsg(invalid authentication method \%s\: not supported on this platform, token), ! errdetail(In file \%s\ line %d, HbaFileName, line_num))); goto hba_other_error; } --- 784,790 (errcode(ERRCODE_CONFIG_FILE_ERROR), errmsg(invalid authentication method \%s\: not supported on this platform, token), ! errcontext(File \%s\ line %d, HbaFileName, line_num))); goto hba_other_error; } *** *** 796,802 ereport(LOG, (errcode(ERRCODE_CONFIG_FILE_ERROR), errmsg(krb5 authentication is not supported on local sockets), ! errdetail(In file \%s\ line %d, HbaFileName, line_num))); goto hba_other_error; } --- 796,802 ereport(LOG, (errcode(ERRCODE_CONFIG_FILE_ERROR), errmsg(krb5 authentication is not supported on local sockets), ! errcontext(File \%s\ line %d, HbaFileName, line_num))); goto hba_other_error; } *** *** 840,852 if (line_item) ereport(LOG, (errcode(ERRCODE_CONFIG_FILE_ERROR), ! errmsg(invalid entry in file \%s\ at line %d, token \%s\, ! HbaFileName, line_num, ! (char *) lfirst(line_item; else ereport(LOG, (errcode(ERRCODE_CONFIG_FILE_ERROR), ! errmsg(missing field in file \%s\ at end of line %d, HbaFileName, line_num))); /* Come here if suitable message already logged */ --- 840,855 if (line_item) ereport(LOG, (errcode(ERRCODE_CONFIG_FILE_ERROR), ! errmsg(invalid token \%s\, ! (char *) lfirst(line_item)), ! errcontext(File %s, line %d, ! HbaFileName, line_num))); else ereport(LOG, (errcode(ERRCODE_CONFIG_FILE_ERROR), ! errmsg(missing field at end of line %d, ! line_num), ! errcontext(File %s, line %d, HbaFileName, line_num))); /* Come here if suitable message already logged */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies
Magnus Hagander [EMAIL PROTECTED] writes: Or perhaps the DETAIL portion would be more appropriate as a CONTEXT? +1 Is this the proper syntax for errcontext() messags? Hmm, I think I'd suggest following the wording already in use in ts_locale.c: errcontext(line %d of configuration file \%s\, stp-lineno, stp-filename); if only to save translation effort. Also, in the last example, don't include the line number in the errmsg. Otherwise it looks ok to me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transaction Snapshots and Hot Standby
Simon Riggs wrote: On Sat, 2008-09-13 at 10:48 +0100, Florian G. Pflug wrote: The main idea was to invert the meaning of the xid array in the snapshot struct - instead of storing all the xid's between xmin and xmax that are to be considering in-progress, the array contained all the xid's xmin that are to be considered completed. The downside is that the size of the read-only snapshot is theoretically unbounded, which poses a bit of a problem if it's supposed to live inside shared memory... Why do it inverted? That clearly has problems. Because it solves the problem of sponteaously apprearing XIDs in the WAL. At least prior to 8.3 with virtual xids, a transaction might have allocated it's xid long before actually writing anything to disk, and therefore long before this XID ever shows up in the WAL. And with a non-inverted snapshot such an XID would be considered to be completed by transactions on the slave... So, one either needs to periodically log a snapshot on the master or log XID allocations which both seem to cause considerable additional load on the master. With an inverted snapshot, it's sufficient to log the current RecentXmin - a values that is readily available on the master, and therefore the cost amounts to just one additional 4-byte field per xlog entry. regards, Florian Pflug smime.p7s Description: S/MIME Cryptographic Signature
Re: [HACKERS] Transaction Snapshots and Hot Standby
Simon Riggs wrote: On Sat, 2008-09-13 at 10:48 +0100, Florian G. Pflug wrote: The current read-only snapshot (which current meaning the corresponding state on the master at the time the last replayed wal record was generated) was maintained in shared memory. It' xmin field was continually updated with the (newly added) XLogRecord.xl_xmin field, which contained the xid of the oldest running query on the master, with a pruning step after each ReadOnlySnapshot.xmin update to remove all entries xmin from the xid array. If a commit was seen for an xid, that xid was added to the ReadOnlySnapshot.xid array. The advantage of this concept is that it handles snapshotting on the slave without too much additional work for the master (The only change is the addition of the xl_xmin field to XLogRecord). It especially removes that need to track ShmemVariableCache-nextXid. Snapshots only need to know which transactions are currently running during WAL apply. The standby can't remove any tuples itself, so it doesn't need to know what the master's OldestXmin is. I used the logged xmin value to track the shared snapshot's xmin, which in turn allowed me to prune the xid array, eliminating all xids that xmin. regards, Florian Pflug smime.p7s Description: S/MIME Cryptographic Signature
Re: [HACKERS] Transaction Snapshots and Hot Standby
On Mon, 2008-09-15 at 19:20 +0100, Florian G. Pflug wrote: Simon Riggs wrote: On Sat, 2008-09-13 at 10:48 +0100, Florian G. Pflug wrote: The main idea was to invert the meaning of the xid array in the snapshot struct - instead of storing all the xid's between xmin and xmax that are to be considering in-progress, the array contained all the xid's xmin that are to be considered completed. The downside is that the size of the read-only snapshot is theoretically unbounded, which poses a bit of a problem if it's supposed to live inside shared memory... Why do it inverted? That clearly has problems. Because it solves the problem of sponteaously apprearing XIDs in the WAL. At least prior to 8.3 with virtual xids, a transaction might have allocated it's xid long before actually writing anything to disk, and therefore long before this XID ever shows up in the WAL. And with a non-inverted snapshot such an XID would be considered to be completed by transactions on the slave... So, one either needs to periodically log a snapshot on the master or log XID allocations which both seem to cause considerable additional load on the master. With an inverted snapshot, it's sufficient to log the current RecentXmin - a values that is readily available on the master, and therefore the cost amounts to just one additional 4-byte field per xlog entry. I think I understand what you're saying now, though I think it mostly/only applies before your brilliant idea in 8.3. If we have a transaction history that looks like this: ReadA, WriteB, WriteA (both commit in either order) then pre-8.3 we would have xidA xidB, whereas at 8.3 and above we see that xidA is actually higher than xidB. Now, TransactionIds are assigned in the order of their first page write and *not* in the order of transaction start as was previously the case, which isn't very obvious. So when we replay WAL, we know that WAL is only written with locks held, so that WriteA and WriteB must be independent of each other. So that means the locks held by Read1 can be ignored and we can assume that the above history is the same as WriteB, WriteA So if we took a snapshot in the middle of WriteB we would be safe to say that only transaction B was in progress and that transaction A was not yet started. So the snapshot we derive on the standby is different to the one we would have derived on the client, yet the serializable order is the same. In general, this means that all reads on a transaction prior to the first write can be reordered later so that they can be assumed to occur exactly prior to the first write of a transaction. (Please shoot me down, if incorrect). So when we see the first WAL record of a transaction we know that there are no in progress transactions with a *lower* xid that we have not yet seen. So we cannot be confused about whether a transaction is in-progress, or not. Almost. Now having written all of that I see there is an obvious race condition between assignment of an xid and actions that result in the acquisition of WALInsertLock. So even though the above seems mostly correct, there is still a gap to plug, but a much smaller one. So when writing the first WAL record for xid=48, it is possible that xid=47 has just been assigned and is also just about to write a WAL record. Thus the absence of a WAL record for xid=47 is not evidence that xid=47 is complete because it was read-only. We might handle this with the inverted technique you describe, but there should be an easier way to track dense packing of xid sequence. We expect xids to be written to WAL in the order assigned, so we might check whether a newly assigned xid is 1 higher than the last highest value to have inserted into WAL. If it is not, then we can write a short WAL record to inform readers of WAL that the missing xids in sequence are in progress also. So readers of WAL will see xids in the correct sequence and are thus able to construct valid snapshots direct from WAL. I think I should measure how often that occurs to see what problem or overhead this might cause, if any. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies
Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Or perhaps the DETAIL portion would be more appropriate as a CONTEXT? +1 Is this the proper syntax for errcontext() messags? Hmm, I think I'd suggest following the wording already in use in ts_locale.c: errcontext(line %d of configuration file \%s\, stp-lineno, stp-filename); if only to save translation effort. Also, in the last example, don't include the line number in the errmsg. Otherwise it looks ok to me. Updated and applied. //Magnus -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] 8.3 vs HEAD difference in Interval output?
Unless I'm compiling stuff wrong, it seems HEAD is giving me slightly different output on Intervals than 8.3 in the roundoff of seconds. 8.3 was rounding to the nearest fraction of a second, HEAD seems to be truncating. In the psql output below it shows 8.3.1 outputting 6.70 secs while the similar output for head is showing 6.69 secs. At first glance it seems this might be because HEAD defaults to USE_INTEGER_DATETIMES, which leads to HAVE_INT64_TIMESTAMP which leads to sprintf(cp, %s%d.%02d secs, is_nonzero ? : , tm-tm_sec, ((int) sec) / 1); in EncodeInterval in datetime.c which doesn't seem to be doing any rounding. Am I interpreting this right? If so, shall I submit a patch that rounds it to hundredths of a second (hundredths seems hardcoded in the sprintf), or perhaps just silently add that fix to the EncodeInterval patch I'm doing any for SQL Standard and ISO intervals? psql (8.4devel) Type help for help. regression=# set datestyle to sql; SET regression=# select '1 year 2 mons 3 days 04:05:06.69'::interval; interval - @ 1 year 2 mons 3 days 4 hours 5 mins 6.69 secs (1 row) Welcome to psql 8.3.1, the PostgreSQL interactive terminal. ... pg83=# set datestyle to sql; SET pg83=# select '1 year 2 mons 3 days 04:05:06.69'::interval; interval - @ 1 year 2 mons 3 days 4 hours 5 mins 6.70 secs (1 row) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 8.3 vs HEAD difference in Interval output?
Ron Mayer [EMAIL PROTECTED] writes: Unless I'm compiling stuff wrong, it seems HEAD is giving me slightly different output on Intervals than 8.3 in the roundoff of seconds. 8.3 was rounding to the nearest fraction of a second, HEAD seems to be truncating. Yeah, that's surely because of the change to integer timestamps. Am I interpreting this right? If so, shall I submit a patch that rounds it to hundredths of a second (hundredths seems hardcoded in the sprintf), or perhaps just silently add that fix to the EncodeInterval patch I'm doing any for SQL Standard and ISO intervals? This is not the only place where the float-timestamps code has rounding behavior that doesn't appear in the integer-timestamps code. I don't know if we want to buy into making them act the same ... after all, if they acted exactly the same, we'd not have bothered with writing the integer code. In this example, rounding to hundredths doesn't seem like a particularly good idea; it seems to me it should give you the exact down-to-the-microsecond value. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 8.3 vs HEAD difference in Interval output?
On Mon, Sep 15, 2008 at 4:58 PM, in message [EMAIL PROTECTED], Tom Lane [EMAIL PROTECTED] wrote: Ron Mayer [EMAIL PROTECTED] writes: Unless I'm compiling stuff wrong, it seems HEAD is giving me slightly different output on Intervals than 8.3 in the roundoff of seconds. 8.3 was rounding to the nearest fraction of a second, HEAD seems to be truncating. Yeah, that's surely because of the change to integer timestamps. Am I interpreting this right? If so, shall I submit a patch that rounds it to hundredths of a second (hundredths seems hardcoded in the sprintf), or perhaps just silently add that fix to the EncodeInterval patch I'm doing any for SQL Standard and ISO intervals? This is not the only place where the float-timestamps code has rounding behavior that doesn't appear in the integer-timestamps code. I don't know if we want to buy into making them act the same ... after all, if they acted exactly the same, we'd not have bothered with writing the integer code. In this example, rounding to hundredths doesn't seem like a particularly good idea; it seems to me it should give you the exact down-to-the-microsecond value. I find the results on 8.3.3 with integer timestamps surprising: ccdev=# select version(); version -- PostgreSQL 8.3.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070115 (prerelease) (SUSE Linux) (1 row) ccdev=# select '1 year 2 mons 3 days 04:05:06.69'::interval; interval -- 1 year 2 mons 3 days 04:05:06.69 (1 row) ccdev=# select '1 year 2 mons 3 days 04:05:06.69'::interval(1); interval -- 1 year 2 mons 3 days 04:05:06.70 (1 row) ccdev=# select '1 year 2 mons 3 days 04:05:06.69'::interval(2); interval -- 1 year 2 mons 3 days 04:05:06.70 (1 row) ccdev=# select '1 year 2 mons 3 days 04:05:06.69'::interval(3); interval -- 1 year 2 mons 3 days 04:05:06.70 (1 row) ccdev=# select '1 year 2 mons 3 days 04:05:06.69'::interval(4); interval -- 1 year 2 mons 3 days 04:05:06.70 (1 row) ccdev=# select '1 year 2 mons 3 days 04:05:06.69'::interval(5); interval -- 1 year 2 mons 3 days 04:05:06.70 (1 row) ccdev=# select '1 year 2 mons 3 days 04:05:06.69'::interval(6); interval -- 1 year 2 mons 3 days 04:05:06.69 (1 row) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 8.3 vs HEAD difference in Interval output?
Kevin Grittner [EMAIL PROTECTED] wrote: I find the results on 8.3.3 with integer timestamps surprising: Even more surprising is the behavior for interval(1) here: ccdev=# select '1 year 2 mons 3 days 04:05:06.64321'::interval; interval - 1 year 2 mons 3 days 04:05:06.64321 (1 row) ccdev=# select '1 year 2 mons 3 days 04:05:06.64321'::interval(1); interval -- 1 year 2 mons 3 days 04:05:06.60 (1 row) ccdev=# select '1 year 2 mons 3 days 04:05:06.64321'::interval(2); interval -- 1 year 2 mons 3 days 04:05:06.64 (1 row) ccdev=# select '1 year 2 mons 3 days 04:05:06.64321'::interval(3); interval --- 1 year 2 mons 3 days 04:05:06.643 (1 row) etc. That trailing zero should be considered a bug. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 8.3 vs HEAD difference in Interval output?
Kevin Grittner wrote: ...not the only place where the float-timestamps code has rounding behavior that doesn't appear in the integer-timestamps code. ... I find the results on 8.3.3 with integer timestamps surprising: Agreed it's surprising and agreed there are more places. Sounds like I should keep that separate and perhaps later submit a separate patch to identify and/or remove surprising rounding behavior. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 8.3 vs HEAD difference in Interval output?
Ron Mayer [EMAIL PROTECTED] writes: Sounds like I should keep that separate and perhaps later submit a separate patch to identify and/or remove surprising rounding behavior. Agreed. It's easier to review and get consensus on patches if you keep separate issues separate. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] 8.3 vs HEAD difference in Interval output?
Unless I'm compiling stuff wrong, it seems HEAD is giving me slightly different output on Intervals than 8.3 in the roundoff of seconds. 8.3 was rounding to the nearest fraction of a second, HEAD seems to be truncating. In the psql output below it shows 8.3.1 outputting 6.70 secs while the similar output for head is showing 6.69 secs. At first glance it seems this might be because HEAD defaults to USE_INTEGER_DATETIMES, which leads to HAVE_INT64_TIMESTAMP which leads to sprintf(cp, %s%d.%02d secs, is_nonzero ? : , tm-tm_sec, ((int) sec) / 1); in EncodeInterval in datetime.c which doesn't seem to be doing any rounding. Am I interpreting this right? If so, shall I submit a patch that rounds it to hundredths of a second (hundredths seems hardcoded in the sprintf), or perhaps just silently add that fix to the EncodeInterval patch I'm doing any for SQL Standard and ISO intervals? psql (8.4devel) Type help for help. regression=# set datestyle to sql; SET regression=# select '1 year 2 mons 3 days 04:05:06.69'::interval; interval - @ 1 year 2 mons 3 days 4 hours 5 mins 6.69 secs (1 row) Welcome to psql 8.3.1, the PostgreSQL interactive terminal. ... pg83=# set datestyle to sql; SET pg83=# select '1 year 2 mons 3 days 04:05:06.69'::interval; interval - @ 1 year 2 mons 3 days 4 hours 5 mins 6.70 secs (1 row) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 8.3 vs HEAD difference in Interval output?
Tom Lane wrote: This is not the only place where the float-timestamps code has rounding behavior that doesn't appear in the integer-timestamps code. Yeah... For that matter, I find this surprising as well: regression=# select '0.7 secs'::interval, ('7 secs'::interval/10); interval | ?column? -+- 00:00:00.69 | 00:00:00.70 (1 row) I'll start making a list of them for a future patch down the road. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Coping with nLocks overflow
We have recently seen one definite and one probable report of overflow of the nLocks field of a backend's local lock table: http://archives.postgresql.org/pgsql-bugs/2008-09/msg00021.php While it's still unclear exactly why 8.3 seems more prone to this than earlier releases, the general shape of the problem seems clear enough. We have many code paths that intentionally take a lock on some object and leave it locked until end of transaction. Repeat such a command on the same object enough times within one transaction, and voila, overflow. What's news, perhaps, is that we've reached a performance level where this can actually happen within transactions of lengths that people might try to run. I'm considering that a simple solution to this might be to widen nLocks from int to int64. This would definitely fix it on machines that have working int64 arithmetic, and if there are any left that do not, they're probably not fast enough to encounter the overflow in real-world usage anyway. For machines that aren't native 64-bit it would add a couple of cycles to lock acquisition/release, but that's hardly likely to be measurable compared to all the other work done in LockAcquire/LockRelease. Comments, objections? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposed patch: make SQL interval-literal syntax work per spec
Tom Lane wrote: support for SQL-spec interval literals. I decided to go look at exactly how unfinished it was, and it turns out that it's actually pretty close. Hence the attached proposed patch ;-) Is this code handling negative interval literals right? I think I quote the relevant spec part at the bottom. If I'm reading the spec right, I find this surprising. regression=# select interval '-1-1'; interval --- -1 days -01:00:00 regression=# select interval '1-1'; interval -- 1 year 1 mon Also if I read the spec right, ISTM the sign should apply to the entire interval literal. But if I compiled the patch right, the negative sign applies only to the first part it encounters? regression=# select interval '-1 2:3:4'; interval --- -1 days +02:03:04 (1 row) If I understand right, this'll add confusion to SQL standard output for mixed year-month and day-time intervals that have both negative and positive components too. :( This looks to me like the relevant part of SQL 200N(8?), which seems to me to allow a sign inside the quotes: interval literal ::= INTERVAL [ sign ] interval string interval qualifier interval string ::= quote unquoted interval string quote unquoted interval string ::= [ sign ] { year-month literal | day-time literal } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposed patch: make SQL interval-literal syntax work per spec
Ron Mayer [EMAIL PROTECTED] writes: Is this code handling negative interval literals right? I think I quote the relevant spec part at the bottom. We support independent signs for the different components of the interval. I'm not sure how well the spec copes with that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposed patch: make SQL interval-literal syntax work per spec
Tom Lane wrote: Ron Mayer [EMAIL PROTECTED] writes: Is this code handling negative interval literals right? I think I quote the relevant spec part at the bottom. We support independent signs for the different components of the Even so it surprises me that: '-1-1'::interval gives me a day-hour interval while '1-1'::interval gives me a year-month interval. I'm not sure how well the spec copes with that. If I'm read the spec right, SQL 2008 expects -1 12:34:56 to be what we'd see as -1 -12:34:56, and doesn't handle with different signs for different components at all. SQL 92 seems to have been simpler, apparently requiring the negative sign to stay outside the quotes. ==SQL 92=== interval literal ::= INTERVAL [ sign ] interval string interval qualifier interval string ::= quote { year-month literal | day-time literal } quote === SQL 200N seems to allow a sign inside the string: ==SQL 200N= interval literal ::= INTERVAL [ sign ] interval string interval qualifier interval string ::= quote unquoted interval string quote unquoted interval string ::= [ sign ] { year-month literal | day-time literal } === -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal of SE-PostgreSQL patches (for CommitFest:Sep)
Peter, Abhijit, Could you tell me the current status of reviewing process in the CommitFest:Sep? I can understand the patches contain several new concept and a bit large, and it is a tough work to review them comprehensively. I'm not unconcern anything even if reviwing comments are partial one. However, I cannot improve anything without any comments. :( Did the previous message which I posted help you to understand the patches? If not, please feel free to ask me anything. Thanks, KaiGai Kohei wrote: Hello, The latest SE-PostgreSQL patches are updated here: [1/4] http://sepgsql.googlecode.com/files/sepostgresql-sepgsql-8.4devel-3-r1005.patch [2/4] http://sepgsql.googlecode.com/files/sepostgresql-pg_dump-8.4devel-3-r1005.patch [3/4] http://sepgsql.googlecode.com/files/sepostgresql-policy-8.4devel-3-r1005.patch [4/4] http://sepgsql.googlecode.com/files/sepostgresql-docs-8.4devel-3-r1005.patch They contains rebasing to the CVS HEAD, because we cannot apply the previous ones correctly due to progress in the base version. Rest of changes are here: - A new PGACE hook: pgaceIsAllowExecutorRunHook(). It enables to control overriding ExecutorRun_hook, because several important hooks are invoked from standard_ExecutorRun(). - T_SEvalItem related equal() functions are added to nodes/equalfuncs.c. # I've left for implement them. - Fix typo in src/include/security/pgace.h BTW, I thought I have to show the overview of the patch to help reviwers. The main patch ([1/4]) is especially big and contains new concepts. The following explanation shows several key concept of SE-PostgreSQL. I'm happy if it can reduce the load of reviwers. No need to say, please feel free to ask me anything. :-) Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei [EMAIL PROTECTED] -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] NDirectFileRead and Write
Here is a revised patch to renewal NDirectFileRead/Write. Tom Lane [EMAIL PROTECTED] wrote: * Let's rename them BufFileReadCount and BufFileWriteCount to reflect their actual purpose. * In any case I agree that the current arrangement with execdebug.h declaring variables defined in bufmgr.c is just weird. - NDirectFile{Read|Write} are renamed to BufFile{Read|Write}Count. - They are still declared in execdebug.h and buffile.c includes it. I did not touch messages in ShowBufferUsage() in the patch. I think it still has meaning because BufFile counters are kinds of direct block access. ShowBufferUsage() ! Shared blocks: ... ! Local blocks: ... ! Direct blocks: R read, W written Regards, --- ITAGAKI Takahiro NTT Open Source Software Center BufFileCount.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plpgsql is not translate-aware
Alvaro Herrera [EMAIL PROTECTED] writes: Tom Lane wrote: Can't we give it its own text domain? It seems fundamentally wrong for a plug-in language to require core support for its messages. Another idea would be to redefine errmsg() and friends within the plpgsql sources, but that seems worse because every other library will need to do the same. Refinement of the previous idea: maybe we can add a compiler flag, to be set in Makefiles, that defines the domain and passes it down to EVALUATE_MESSAGE. It seems like something like that could work. I'd be inclined to change the ereport() macro itself, not errmsg/errdetail/etc. The domain name could be passed in at ereport and then used when evaluating the messages. A possible problem is that some places use _() (ie, gettext) directly instead of leaving it to the elog.c code to apply translation. But I suppose we might be able to redefine _() too. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Add default_val to pg_settings
Attached is an updated and separate version of my patch exposing the internal GUC boot_val as default_val, which failed to attach itself to the already committed changes to the pg_settings view. Brief reminder of what it does: postgres=# select name,setting,default_val from pg_settings where name='shared_buffers'; name | setting | default_val +-+- shared_buffers | 4096| 1024 General justification for this change with a longer example is at http://archives.postgresql.org/pgsql-hackers/2008-09/msg00233.php Based on feedback the first time around, I updated the documentation for this column to read Default value assumed at server startup if the parameter is not otherwise set. Would only take a quick search/replace of the patch to change default_val to something else if there are still any objections there. boot_val is another candidate name, I feel that would make the purpose of the column less obvious to users of pg_settings even if it is more correct. I'm really more concerned about the feature than exactly what it's named though. I didn't bother to expose the reset_val since I can't think of any obvious use cases for wanting to know it. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MDIndex: doc/src/sgml/catalogs.sgml === RCS file: /home/gsmith/cvsrepo/pgsql/doc/src/sgml/catalogs.sgml,v retrieving revision 2.174 diff -c -r2.174 catalogs.sgml *** doc/src/sgml/catalogs.sgml 15 Sep 2008 18:43:41 - 2.174 --- doc/src/sgml/catalogs.sgml 16 Sep 2008 03:18:42 - *** *** 6422,6427 --- 6422,6433 values)/entry /row row + entrystructfielddefault_val/structfield/entry + entrytypetext/type/entry + entryDefault value assumed at server startup if the parameter is not + otherwise set/entry + /row + row entrystructfieldsourcefile/structfield/entry entrytypetext/type/entry entryInput file the current value was set from (NULL for values set in Index: src/backend/utils/misc/guc.c === RCS file: /home/gsmith/cvsrepo/pgsql/src/backend/utils/misc/guc.c,v retrieving revision 1.472 diff -c -r1.472 guc.c *** src/backend/utils/misc/guc.c10 Sep 2008 19:16:22 - 1.472 --- src/backend/utils/misc/guc.c16 Sep 2008 03:44:09 - *** *** 6087,6092 --- 6087,6094 { case PGC_BOOL: { + struct config_bool *lconf = (struct config_bool *) conf; + /* min_val */ values[9] = NULL; *** *** 6095,6100 --- 6097,6106 /* enumvals */ values[11] = NULL; + + /* default_val */ + snprintf(buffer, sizeof(buffer), %s, lconf-boot_val ? on : off); + values[12] = pstrdup(buffer); } break; *** *** 6112,6117 --- 6118,6127 /* enumvals */ values[11] = NULL; + + /* default_val */ + snprintf(buffer, sizeof(buffer), %d, lconf-boot_val); + values[12] = pstrdup(buffer); } break; *** *** 6129,6139 --- 6139,6155 /* enumvals */ values[11] = NULL; + + /* default_val */ + snprintf(buffer, sizeof(buffer), %g, lconf-boot_val); + values[12] = pstrdup(buffer); } break; case PGC_STRING: { + struct config_string *lconf = (struct config_string *) conf; + /* min_val */ values[9] = NULL; *** *** 6142,6152 --- 6158,6180 /* enumvals */ values[11] = NULL; + + /* default_val */ + if (lconf-boot_val == NULL) + { + values[12] = NULL; + } else + { + snprintf(buffer, sizeof(buffer), %s, lconf-boot_val); + values[12] = pstrdup(buffer); + } }