Re: [HACKERS] Unreproducible bug in snapshot import code
On Tue, Oct 25, 2011 at 2:41 AM, Gurjeet Singh singh.gurj...@gmail.com wrote:
> On Mon, Oct 24, 2011 at 1:08 PM, Alvaro Herrera alvhe...@commandprompt.com wrote:
>> Excerpts from Gurjeet Singh's message of lun oct 24 13:55:44 -0300 2011:
>>> Starting from line 89 of terminal1.txt we see this snippet:
>>>
>>> postgres=# rollback;
>>> ROLLBACK
>>> postgres=# begin TRANSACTION ;
>>> BEGIN
>>> postgres=# set transaction snapshot '02C8-1';
>>> ERROR:  SET TRANSACTION SNAPSHOT must be called before any query
>>> postgres=# rollback;
>>> ROLLBACK
>>>
>>> As you can see, SET TRANSACTION SNAPSHOT was the first statement in that transaction, and yet the ERROR message says that it is not.
>>
>> Maybe the tab-completion feature issued a query before the set transaction command.
>
> I have tried reproducing the bug starting from one and two transactions before the one shown in the snippet, and I used tab-completion to get the same screen output as terminal1.txt, and yet it's not reproducible.

I could reproduce it when I typed TAB just after typing "set" in "set transaction snapshot". As Tom and Alvaro pointed out, tab-completion issues a query, which prevents the SET TRANSACTION SNAPSHOT command from being the first statement of the transaction.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] So, is COUNT(*) fast now?
Hello,

my experience is that as soon as index-only scans are available they are used - sometimes just because of the simple logic that a user thinks it is faster. Even when the index is ridiculously long, just to have all the info in the index...

Regards
Wolfgang Wilhelm

From: Tom Lane t...@sss.pgh.pa.us
To: Robert Haas robertmh...@gmail.com
Cc: Kevin Grittner kevin.gritt...@wicourts.gov; pgsql-hackers@postgresql.org
Sent: 21:35 Monday, 24 October 2011
Subject: Re: [HACKERS] So, is COUNT(*) fast now?

Robert Haas robertmh...@gmail.com writes:
> But even though Tom's statement that most indexes are one column might be a slight exaggeration, I suspect it probably is true that the optimizations he's talking about for large numbers of columns won't produce any material benefit even for a 3- or 4-column index. Which makes me think maybe we should focus our efforts elsewhere.

Right. If we thought the average was something like ten, it might be worth pursuing optimizations similar to slot_getallattrs. If it's around two or three, almost certainly not. Your point about people trying to create wider indexes to exploit index-only scans is an interesting one, but I think it's premature to optimize on the basis of hypotheses about what people might do in future.

Not sure about your other idea of returning multiple tuples per amgettuple call. The trouble with that is that it will add complexity (and hence cycles) at the nodeIndexscan level, because now nodeIndexscan will have to buffer those tuples, keep track of whether it's fetching forward or backward, etc etc. Plus another layer of the same in indexam.c (index_getnext etc). I'm not at all convinced that it's likely to be a net win.

I wonder how trustworthy the measure of the visibilitymap_test call site as a consumer of cycles really is. I've frequently noticed that oprofile blames remarkably large fractions of the runtime on individual statements that appear to be quite trivial.
I'm not sure if that represents real hardware-level effects such as cache line switching, or whether it's just measurement artifacts. Keep in mind that sampling-based measurements are always subject to sampling artifacts. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online base backup from the hot-standby
On 25.10.2011 08:12, Fujii Masao wrote:
> On Tue, Oct 25, 2011 at 12:24 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
>> On 24.10.2011 15:29, Fujii Masao wrote:
>>> +<listitem>
>>> +<para>
>>> +  Copy the pg_control file from the cluster directory to the global
>>> +  sub-directory of the backup. For example:
>>> +<programlisting>
>>> +cp $PGDATA/global/pg_control /mnt/server/backupdir/global
>>> +</programlisting>
>>> +</para>
>>> +</listitem>
>>
>> Why is this step required? The control file is overwritten by information from the backup_label anyway, no?
>
> Yes, when recovery starts, the control file is overwritten. But before that, we retrieve the minimum recovery point from the control file. Then it's used as the backup end location. During recovery, pg_stop_backup() cannot write an end-of-backup record. So, in a standby-only backup, another way to retrieve the backup end location (instead of an end-of-backup record) is required. Ishiduka-san used the control file for that, according to your suggestion ;)
> http://archives.postgresql.org/pgsql-hackers/2011-05/msg01405.php

Oh :-)

>>> +<para>
>>> +  Again connect to the database as a superuser, and execute
>>> +  <function>pg_stop_backup</>. This terminates the backup mode, but does not
>>> +  perform a switch to the next WAL segment, create a backup history file, or
>>> +  wait for all required WAL segments to be archived,
>>> +  unlike during normal processing.
>>> +</para>
>>> +</listitem>
>>
>> How do you ensure that all the required WAL segments have been archived, then?
>
> The patch doesn't provide any capability to ensure that; IOW, it assumes that's a user responsibility. If a user wants to ensure it, he/she needs to calculate the backup start and end WAL files from the results of pg_start_backup() and pg_stop_backup() respectively, and wait until those files have appeared in the archive. Also, if a required WAL file has not been archived yet, the user might need to execute pg_switch_xlog() on the master.

Frankly, I think this whole thing is too fragile. The procedure is superficially similar to what you do on the master - run pg_start_backup(), rsync the data directory, run pg_stop_backup() - but it is actually subtly different and more complicated. If you don't know that, and don't follow the full procedure, you get a corrupt backup. And the backup might look OK, and might even sometimes work, which means that you won't notice in quick testing. That's a *huge* foot-gun. I think we need to step back and find a way to make this a) less complicated, or at least b) more robust, so that if you don't follow the procedure, you get an error.

With pg_basebackup, we have a fighting chance of getting this right, because we have more control over how the backup is made. For example, we can co-operate with the buffer manager to avoid torn pages, eliminating the need for full_page_writes=on, and we can include a control file with the correct end-of-backup location automatically, without requiring user intervention. pg_basebackup is less flexible than the pg_start/stop_backup method, and unfortunately you're more likely to need the flexibility in a more complicated setup with a hot standby server and all, but making the generic pg_start/stop_backup method work seems infeasible at the moment.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
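[Editor's note: the "calculate the backup start and end WAL files from the results of pg_start_backup() and pg_stop_backup()" step mentioned above can be sketched as follows, assuming the default 16 MB segment size and the pre-9.3 "xlogid/xrecoff" LSN format those functions returned at the time. The function name is invented for the example, and the real computation has an extra edge case when the stop location falls exactly on a segment boundary.]

```python
def wal_file_for_lsn(lsn, timeline=1, seg_size=16 * 1024 * 1024):
    # Map an LSN such as '0/2000028' (as returned by pg_start_backup() /
    # pg_stop_backup() on 9.x) to the 24-character name of the WAL segment
    # that contains it, mirroring the server's XLogFileName() formatting:
    # 8 hex digits each for timeline, xlogid, and segment-within-xlogid.
    xlogid_hex, xrecoff_hex = lsn.split('/')
    xlogid = int(xlogid_hex, 16)
    seg = int(xrecoff_hex, 16) // seg_size
    return "%08X%08X%08X" % (timeline, xlogid, seg)

# A user would wait until every segment from the start file through the
# stop file has appeared in the WAL archive:
start_file = wal_file_for_lsn('0/2000028')   # '000000010000000000000002'
stop_file = wal_file_for_lsn('0/3A1B0C0')    # '000000010000000000000003'
```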
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Mon, Oct 24, 2011 at 7:13 AM, Florian Pflug f...@phlo.org wrote:
> I think Simon's theory that we're starting recovery from the wrong place, i.e. should start with an earlier WAL location, is probably correct. The question is, why?

Err, that's not what I said, and I don't mean that. Having said that, what I said about pg_control being invalid would imply that, so it is wrong also.

We are starting recovery at the right place, but we are initialising the clog and subtrans incorrectly. Precisely, the oldestActiveXid is being derived later than it should be, which can cause problems if this then means that whole pages are uninitialised in subtrans.

The bug only shows up if you do enough transactions (2048 is always enough) to move to the next subtrans page between the redo pointer and the checkpoint record, while at the same time we do not have a long-running transaction that spans those two points. That's just enough to happen reasonably frequently on busy systems, and yet just enough to have slipped through testing.

We must either:

1. During CreateCheckpoint(), derive oldestActiveXid before we derive the redo location, or
2. Change the way subtrans pages are initialized during recovery so we don't rely on oldestActiveXid.

I need to think some more before a decision on this in my own mind, but I lean towards doing (1) as a longer-term fix and doing (2) as a short-term fix for existing releases. I expect to have a fix later today.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
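[Editor's note: Simon's "2048 is always enough" figure follows from the pg_subtrans page layout: each transaction's parent XID is stored as a 4-byte entry, so an 8 kB page covers exactly 2048 XIDs, and any two XIDs 2048 apart necessarily fall on different pages. A sketch of the arithmetic, with constants taken from the 9.x source and an invented helper name:]

```python
BLCKSZ = 8192
SUBTRANS_XACTS_PER_PAGE = BLCKSZ // 4   # one 4-byte parent XID per transaction

def subtrans_page(xid):
    # Corresponds to the TransactionIdToPage() macro in
    # src/backend/access/transam/subtrans.c
    return xid // SUBTRANS_XACTS_PER_PAGE

oldest_running = 100_000                 # XID still active at the redo pointer
at_checkpoint = oldest_running + 2048    # 2048 transactions later

# The two XIDs land on adjacent pages, so if oldestActiveXid is derived at
# the checkpoint record instead of the redo pointer, the earlier page is
# never initialised during recovery.
assert subtrans_page(at_checkpoint) == subtrans_page(oldest_running) + 1
```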
[HACKERS] debug query execution
Hi. I would like to ask which sources are responsible for executing queries in PostgreSQL. (I would like to run a simple query and debug its execution on the PostgreSQL server, to understand how PostgreSQL does query processing internally.) regards, Vadym
Re: [HACKERS] Online base backup from the hot-standby
On Tue, Oct 25, 2011 at 3:44 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
>>>> +<para>
>>>> +  Again connect to the database as a superuser, and execute
>>>> +  <function>pg_stop_backup</>. This terminates the backup mode, but does not
>>>> +  perform a switch to the next WAL segment, create a backup history file, or
>>>> +  wait for all required WAL segments to be archived,
>>>> +  unlike during normal processing.
>>>> +</para>
>>>> +</listitem>
>>>
>>> How do you ensure that all the required WAL segments have been archived, then?
>>
>> The patch doesn't provide any capability to ensure that; IOW, it assumes that's a user responsibility. If a user wants to ensure it, he/she needs to calculate the backup start and end WAL files from the results of pg_start_backup() and pg_stop_backup() respectively, and wait until those files have appeared in the archive. Also, if a required WAL file has not been archived yet, the user might need to execute pg_switch_xlog() on the master.
>
> Frankly, I think this whole thing is too fragile. The procedure is superficially similar to what you do on the master - run pg_start_backup(), rsync the data directory, run pg_stop_backup() - but it is actually subtly different and more complicated. If you don't know that, and don't follow the full procedure, you get a corrupt backup. And the backup might look OK, and might even sometimes work, which means that you won't notice in quick testing. That's a *huge* foot-gun. I think we need to step back and find a way to make this a) less complicated, or at least b) more robust, so that if you don't follow the procedure, you get an error.

One idea to make this more robust is to change PostgreSQL so that it writes buffer pages to a temporary space instead of the database files during a backup. This means that there are no torn pages in the database files of the backup. After the backup, the data blocks are written back to the database files over time. When recovery starts from that backup (i.e., backup_label is found), it clears the temporary space in the backup first, and continues recovery using the database files, which contain no torn pages. OTOH, in crash recovery (i.e., backup_label is not found), recovery is performed using both the database files and the temporary space. This whole approach would make the standby-only backup available even if FPW is disabled in the master, and you wouldn't need to care about the order in which the control file is backed up. But this idea looks like overkill. It seems very complicated to implement, and likely to invite other bugs. I don't have any other good and simple idea for now.

> With pg_basebackup, we have a fighting chance of getting this right, because we have more control over how the backup is made. For example, we can co-operate with the buffer manager to avoid torn pages, eliminating the need for full_page_writes=on, and we can include a control file with the correct end-of-backup location automatically, without requiring user intervention. pg_basebackup is less flexible than the pg_start/stop_backup method, and unfortunately you're more likely to need the flexibility in a more complicated setup with a hot standby server and all, but making the generic pg_start/stop_backup method work seems infeasible at the moment.

Yes. So should we give up supporting the manual procedure, and extend pg_basebackup for the standby-only backup first? I can live with this.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs si...@2ndquadrant.com wrote:
> We are starting recovery at the right place, but we are initialising the clog and subtrans incorrectly. Precisely, the oldestActiveXid is being derived later than it should be, which can cause problems if this then means that whole pages are uninitialised in subtrans.
>
> The bug only shows up if you do enough transactions (2048 is always enough) to move to the next subtrans page between the redo pointer and the checkpoint record, while at the same time we do not have a long-running transaction that spans those two points. That's just enough to happen reasonably frequently on busy systems, and yet just enough to have slipped through testing.
>
> We must either:
>
> 1. During CreateCheckpoint(), derive oldestActiveXid before we derive the redo location, or
> 2. Change the way subtrans pages are initialized during recovery so we don't rely on oldestActiveXid.
>
> I need to think some more before a decision on this in my own mind, but I lean towards doing (1) as a longer-term fix and doing (2) as a short-term fix for existing releases. I expect to have a fix later today.

(1) looks the best way forwards in all cases. Patch attached. Will be backpatched to 9.0.

I think it is possible to avoid taking XidGenLock during GetRunningTransactions() now, but I haven't included that change in this patch.

Any other comments before commit?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

[Attachment: oldestActiveXid_fixed.v1.patch]

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
2011/10/25 Shigeru Hanada shigeru.han...@gmail.com: I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a contrib module. I think that this module would be the basis of further SQL/MED development for core, e.g. join-push-down and ANALYZE support. I have not looked at the code itself, but I wonder if we shouldn't consider making this a part of core-proper, not just a contrib module. The fact that it isn't *already* available in core surprises a lot of people... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online base backup from the hot-standby
On Tue, Oct 25, 2011 at 10:50, Fujii Masao masao.fu...@gmail.com wrote:
> On Tue, Oct 25, 2011 at 3:44 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
>>>>> +<para>
>>>>> +  Again connect to the database as a superuser, and execute
>>>>> +  <function>pg_stop_backup</>. This terminates the backup mode, but does not
>>>>> +  perform a switch to the next WAL segment, create a backup history file, or
>>>>> +  wait for all required WAL segments to be archived,
>>>>> +  unlike during normal processing.
>>>>> +</para>
>>>>> +</listitem>
>>>>
>>>> How do you ensure that all the required WAL segments have been archived, then?
>>>
>>> The patch doesn't provide any capability to ensure that; IOW, it assumes that's a user responsibility. If a user wants to ensure it, he/she needs to calculate the backup start and end WAL files from the results of pg_start_backup() and pg_stop_backup() respectively, and wait until those files have appeared in the archive. Also, if a required WAL file has not been archived yet, the user might need to execute pg_switch_xlog() on the master.
>>
>> Frankly, I think this whole thing is too fragile. The procedure is superficially similar to what you do on the master - run pg_start_backup(), rsync the data directory, run pg_stop_backup() - but it is actually subtly different and more complicated. If you don't know that, and don't follow the full procedure, you get a corrupt backup. And the backup might look OK, and might even sometimes work, which means that you won't notice in quick testing. That's a *huge* foot-gun. I think we need to step back and find a way to make this a) less complicated, or at least b) more robust, so that if you don't follow the procedure, you get an error.
>
> One idea to make this more robust is to change PostgreSQL so that it writes buffer pages to a temporary space instead of the database files during a backup. This means that there are no torn pages in the database files of the backup. After the backup, the data blocks are written back to the database files over time. When recovery starts from that backup (i.e., backup_label is found), it clears the temporary space in the backup first, and continues recovery using the database files, which contain no torn pages. OTOH, in crash recovery (i.e., backup_label is not found), recovery is performed using both the database files and the temporary space. This whole approach would make the standby-only backup available even if FPW is disabled in the master, and you wouldn't need to care about the order in which the control file is backed up. But this idea looks like overkill. It seems very complicated to implement, and likely to invite other bugs. I don't have any other good and simple idea for now.
>
>> With pg_basebackup, we have a fighting chance of getting this right, because we have more control over how the backup is made. For example, we can co-operate with the buffer manager to avoid torn pages, eliminating the need for full_page_writes=on, and we can include a control file with the correct end-of-backup location automatically, without requiring user intervention. pg_basebackup is less flexible than the pg_start/stop_backup method, and unfortunately you're more likely to need the flexibility in a more complicated setup with a hot standby server and all, but making the generic pg_start/stop_backup method work seems infeasible at the moment.
>
> Yes. So should we give up supporting the manual procedure, and extend pg_basebackup for the standby-only backup first? I can live with this.

I don't think we should necessarily give up completely. But doing it the pg_basebackup way *first* seems reasonable - because it's going to be the easiest one to get right, given that we have more control there. Doesn't mean we shouldn't extend it in the future...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Oct 25, 2011, at 11:13, Simon Riggs wrote:
> On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs si...@2ndquadrant.com wrote:
>> We are starting recovery at the right place, but we are initialising the clog and subtrans incorrectly. Precisely, the oldestActiveXid is being derived later than it should be, which can cause problems if this then means that whole pages are uninitialised in subtrans.
>>
>> The bug only shows up if you do enough transactions (2048 is always enough) to move to the next subtrans page between the redo pointer and the checkpoint record, while at the same time we do not have a long-running transaction that spans those two points. That's just enough to happen reasonably frequently on busy systems, and yet just enough to have slipped through testing.
>>
>> We must either:
>>
>> 1. During CreateCheckpoint(), derive oldestActiveXid before we derive the redo location
>
> (1) looks the best way forwards in all cases.

Let me see if I understand this. The problem seems to be that we currently derive oldestActiveXid at the end of the checkpoint, just before writing the checkpoint record. Since we use oldestActiveXid to initialize SUBTRANS, this is wrong. Records written before that checkpoint record (but after the REDO location, of course) may very well contain XIDs earlier than that wrongly derived oldestActiveXid, and if we attempt to touch those XIDs' SUBTRANS state, we error out.

Your patch seems sensible, because the checkpoint logically occurs at the REDO location, not the checkpoint record's location, so we ought to log an oldestActiveXID corresponding to that location.

What I don't understand is how this affects the CLOG. How does oldestActiveXID factor into CLOG initialization?

best regards,
Florian Pflug

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Idea: Always consistent in-database cache using SSI mechanisms
On Tue, Oct 25, 2011 at 00:00, Alexander Korotkov aekorot...@gmail.com wrote:
> On Tue, Oct 25, 2011 at 1:46 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
>> Alexander Korotkov aekorot...@gmail.com wrote:
>>> Couldn't the predicate locking implementation in SSI be used for in-database cache invalidation?
>>
>> It would not necessarily be limited to *in-database* caches. The main thing would be to design a good API to the predicate locking portion of SSI, which I think is about 80% of the SSI code. Dan and I both have an interest in such further use, and there have been others who have talked about potential uses for the non-blocking predicate locking.
>
> I think the API would need to be based around a listen/notify model.

IIRC, I discussed this with Dan Ports at pgcon, as it was one of the things he had been looking into as well. You might want to talk to him about it.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online base backup from the hot-standby
On Tue, Oct 25, 2011 at 7:19 PM, Magnus Hagander mag...@hagander.net wrote:
> I don't think we should necessarily give up completely. But doing it the pg_basebackup way *first* seems reasonable - because it's going to be the easiest one to get right, given that we have more control there. Doesn't mean we shouldn't extend it in the future...

Agreed. The question is: how far should we change pg_basebackup to get this right? I think it's not difficult to change it so that it backs up the control file at the end. But eliminating the need for full_page_writes=on seems not easy, no? So I'm not inclined to do that in at least the first commit. Otherwise, I'm afraid the patch would become huge.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online base backup from the hot-standby
On Tue, Oct 25, 2011 at 13:54, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Oct 25, 2011 at 7:19 PM, Magnus Hagander mag...@hagander.net wrote: I don't think we should necessarily give up completely. But doing a pg_basebackup way *first* seems reasonable - because it's going to be the easiest one to get right, given that we have more control there. Doesn't mean we shouldn't extend it in the future... Agreed. The question is -- how far should we change pg_basebackup to get right? I think it's not difficult to change it so that it backs up the control file at the end. But eliminating the need for full_page_writes=on seems not easy. No? So I'm not inclined to do that in at least first commit. Otherwise, I'm afraid the patch would become huge. It's more server side of base backups than the actual pg_basebackup tool of course, but I'm sure that's what we're all referring to here. Personally, I'd see the fpw stuff as part of the infrastructure needed. Meaning that the fpw stuff should go in *first*, and the pg_basebackup stuff later. If we want something to go in early, that could be as simple as a version of pg_basebackup that runs against the slave but only if full_page_writes=on on the master. If it's not, it throws an error. Then we can improve upon that by adding handling of fpw=off, first by infrastructure, then by tool. Doing it piece by piece like that is probably a good idea, since as you say, all at once will be pretty huge. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
Magnus Hagander mag...@hagander.net writes: 2011/10/25 Shigeru Hanada shigeru.han...@gmail.com: I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a contrib module. I think that this module would be the basis of further SQL/MED development for core, e.g. join-push-down and ANALYZE support. I have not looked at the code itself, but I wonder if we shouldn't consider making this a part of core-proper, not just a contrib module. The fact that it isn't *already* available in core surprises a lot of people... We've just spent a whole lot of blood and sweat on making the extension mechanism work nicely. I don't understand this urge to not use it. ATM I'm not sure it's even a good idea to push pgsql_fdw into contrib. Once we do that its release schedule will get locked to core's --- wouldn't it be better to keep flexibility for now, while it's in such active development? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
On Tue, Oct 25, 2011 at 14:08, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: 2011/10/25 Shigeru Hanada shigeru.han...@gmail.com: I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a contrib module. I think that this module would be the basis of further SQL/MED development for core, e.g. join-push-down and ANALYZE support. I have not looked at the code itself, but I wonder if we shouldn't consider making this a part of core-proper, not just a contrib module. The fact that it isn't *already* available in core surprises a lot of people... We've just spent a whole lot of blood and sweat on making the extension mechanism work nicely. I don't understand this urge to not use it. We're back to the old discussion, I guess.. I'm happy to see it as an extension, but I think it should be included with the standard installation. Like we do with for example pl/pgsql (which I realize has a dependency on the backend anyway, so it can't be done another way easily) and pl/perl (which doesn't, AFAIK, so it's a better example) ATM I'm not sure it's even a good idea to push pgsql_fdw into contrib. Once we do that its release schedule will get locked to core's --- wouldn't it be better to keep flexibility for now, while it's in such active development? I would be happy to keep it outside, and integrate it in the final CF for example :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
>> ATM I'm not sure it's even a good idea to push pgsql_fdw into contrib. Once we do that its release schedule will get locked to core's --- wouldn't it be better to keep flexibility for now, while it's in such active development?
>
> I would be happy to keep it outside, and integrate it in the final CF for example :)

Right now, file_fdw is the only FDW module that we have in core; however, it is inadequate for proving the new concept of the FDW feature of utilizing an external RDBMS, such as join push-down of foreign tables. I think the pgsql_fdw module should also be included in the core distribution as a basis for future enhancement, unless we think we don't need any working module in-tree when an enhancement of FDW is proposed.

Thanks,
--
KaiGai Kohei kai...@kaigai.gr.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
* Kohei KaiGai (kai...@kaigai.gr.jp) wrote:
> Right now, file_fdw is the only FDW module that we have in core,

Erm, I guess I'm a bit confused about why we've got that in core while not putting pgsql_fdw in core. This all gets back to previous discussions around 'recommended' contrib modules (which should really be installed by default on the filesystem through the distros, ala Debian's "Recommends:" approach) and 'other' contrib modules. I'm in favor of making that distinction. I would still have pgsql_fdw, file_fdw, etc, be packaged more-or-less the same way and still use the CREATE EXTENSION framework, of course. It would be nice if we didn't have to lock the release schedule of those recommended modules to the core release schedule, or even to each other, but that's a separate issue, imv.

Thanks,

Stephen
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug f...@phlo.org wrote:
> What I don't understand is how this affects the CLOG. How does oldestActiveXID factor into CLOG initialization?

It is an entirely different error. Chris' clog error was caused by a file read error. The file was opened, we did a seek within the file, and then the call to read() failed to return a complete page from the file. The xid shown is 22811359, which is the nextXid in the control file. So StartupCLOG() must have failed trying to read the clog page from disk. That isn't a Hot Standby problem or a recovery problem, nor is it certain it's a PostgreSQL problem.

OTOH, SlruPhysicalReadPage() does cope gracefully with missing clog files during recovery, so maybe we can think of a way to make recovery cope with a SLRU_READ_FAILED error gracefully also. Any ideas?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
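[Editor's note: the failing page can be located from the XID in the error. pg_clog stores two status bits per transaction, so an 8 kB page covers 32768 XIDs, and each SLRU segment file holds 32 pages. A sketch of the arithmetic, with constants assumed from the 9.x clog/slru source and an invented helper name:]

```python
BLCKSZ = 8192
CLOG_XACTS_PER_PAGE = BLCKSZ * 4     # two status bits per XID -> four XIDs per byte
SLRU_PAGES_PER_SEGMENT = 32

def clog_location(xid):
    # Which clog page holds this XID's status, and which pg_clog segment
    # file (named as four hex digits) holds that page.
    page = xid // CLOG_XACTS_PER_PAGE
    segment = page // SLRU_PAGES_PER_SEGMENT
    return page, "%04X" % segment

# nextXid from Chris' control file: the short read() would have happened
# on this page during StartupCLOG().
page, segfile = clog_location(22811359)
```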
Re: [HACKERS] Online base backup from the hot-standby
On 11-10-25 02:44 AM, Heikki Linnakangas wrote: With pg_basebackup, we have a fighting chance of getting this right, because we have more control over how the backup is made. For example, we can co-operate with the buffer manager to avoid torn-pages, eliminating the need for full_page_writes=on, and we can include a control file with the correct end-of-backup location automatically, without requiring user intervention. pg_basebackup is less flexible than the pg_start/stop_backup method, and unfortunately you're more likely to need the flexibility in a more complicated setup with a hot standby server and all, but making the generic pg_start/stop_backup method work seems infeasible at the moment. Would pg_basebackup be able to work with the buffer manager on the slave to avoid full_page_writes=on needing to be set on the master? (the point of this is to be able to take the base backup without having the backup program contact the master). If so could pg_start_backup() not just put the buffer manager into the same state? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online base backup from the hot-standby
On 25.10.2011 15:56, Steve Singer wrote: On 11-10-25 02:44 AM, Heikki Linnakangas wrote: With pg_basebackup, we have a fighting chance of getting this right, because we have more control over how the backup is made. For example, we can co-operate with the buffer manager to avoid torn-pages, eliminating the need for full_page_writes=on, and we can include a control file with the correct end-of-backup location automatically, without requiring user intervention. pg_basebackup is less flexible than the pg_start/stop_backup method, and unfortunately you're more likely to need the flexibility in a more complicated setup with a hot standby server and all, but making the generic pg_start/stop_backup method work seems infeasible at the moment. Would pg_basebackup be able to work with the buffer manager on the slave to avoid full_page_writes=on needing to be set on the master? (the point of this is to be able to take the base backup without having the backup program contact the master). In theory, yes. I'm not sure how difficult it would be in practice. Currently, the walsender process just scans and copies everything in the data directory, at the filesystem level. It would have to go through the buffer manager instead, to avoid reading a page at the same time that the buffer manager is writing it out. If so could pg_start_backup() not just put the buffer manager into the same state? No. The trick that pg_basebackup (= walsender) can do is to co-operate with the buffer manager when reading each page. An external program cannot do that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
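To illustrate the torn-page hazard being discussed, here is a deliberately crude Python sketch (all names invented, pages shrunk to four bytes): a filesystem-level copy can observe a page that is half old image and half new, which is exactly what full_page_writes=on papers over, and what per-page coordination with the buffer manager would avoid.

```python
PAGE = 4                      # bytes per "page" in this toy model
storage = bytearray(b"AAAA")  # one page of the data directory

def filesystem_copy_during_write(new_image):
    """Copy the page the way a raw filesystem-level backup does:
    the concurrent writer may be half-way through replacing it."""
    half = PAGE // 2
    storage[:half] = new_image[:half]   # writer has flushed this much...
    snapshot = bytes(storage)           # ...when the copier reads the page
    storage[half:] = new_image[half:]   # writer finishes afterwards
    return snapshot

torn = filesystem_copy_during_write(b"BBBB")
# torn == b"BBAA": neither the old page image nor the new one.
# A reader that locks the page (as the buffer manager can) would
# only ever see b"AAAA" or b"BBBB".
```

This is only a model of the interleaving, not of any real backend code; the point is that without coordination the copier's snapshot is not any page image that ever validly existed.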
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
On Tue, Oct 25, 2011 at 3:08 PM, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: 2011/10/25 Shigeru Hanada shigeru.han...@gmail.com: I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a contrib module. I think that this module would be the basis of further SQL/MED development for core, e.g. join-push-down and ANALYZE support. I have not looked at the code itself, but I wonder if we shouldn't consider making this a part of core-proper, not just a contrib module. The fact that it isn't *already* available in core surprises a lot of people... We've just spent a whole lot of blood and sweat on making the extension mechanism work nicely. I don't understand this urge to not use it. ATM I'm not sure it's even a good idea to push pgsql_fdw into contrib. Once we do that its release schedule will get locked to core's --- wouldn't it be better to keep flexibility for now, while it's in such active development? Simple question - do FDW internals need work? -- marko
Re: [HACKERS] Unreproducible bug in snapshot import code
On Tue, Oct 25, 2011 at 2:33 AM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Oct 25, 2011 at 2:41 AM, Gurjeet Singh singh.gurj...@gmail.com wrote: On Mon, Oct 24, 2011 at 1:08 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Gurjeet Singh's message of lun oct 24 13:55:44 -0300 2011: Starting from line 89 of terminal1.txt we see this snippet: postgres=# rollback; ROLLBACK postgres=# begin TRANSACTION ; BEGIN postgres=# set transaction snapshot '02C8-1'; ERROR: SET TRANSACTION SNAPSHOT must be called before any query postgres=# rollback; ROLLBACK As you can see the SET TRANSACTION SNAPSHOT was the first statement in that transaction, and yet the ERROR message says that it is not. Maybe the tab-completion feature issued a query before the set transaction command. I have tried reproducing the bug starting from 1 and 2 transactions before the one shown in the snippet, and I used tab-completion to get the same screen-output as terminal1.txt and yet it's not reproducible. I could reproduce it when I typed TAB just after typing set in set transaction snapshot. As Tom and Alvaro pointed out, the tab-completion issues a query, which prevents the set transaction snapshot command. Great! That settles it then. Reproducible, but not a bug. Thanks, -- Gurjeet Singh EnterpriseDB Corporation The Enterprise PostgreSQL Company
[HACKERS] lexemes in prefix search going through dictionary modifications
I am currently using the prefix search feature in text search. I find that the prefix characters are treated the same as a normal lexeme and passed through stemming and stopword dictionaries. This seems like a bug to me. db=# select to_tsquery('english', 's:*'); NOTICE: text-search query contains only stop words or doesn't contain lexemes, ignored to_tsquery (1 row) db=# select to_tsquery('simple', 's:*'); to_tsquery 's':* (1 row) I also think that this is a mistake. It should only be highlighting s. db=# select ts_headline('sushant', to_tsquery('simple', 's:*')); ts_headline <b>sushant</b> Thanks, Sushant.
Re: [HACKERS] lexemes in prefix search going through dictionary modifications
On Oct25, 2011, at 17:26 , Sushant Sinha wrote: I am currently using the prefix search feature in text search. I find that the prefix characters are treated the same as a normal lexeme and passed through stemming and stopword dictionaries. This seems like a bug to me. Hm, I don't think so. If they don't pass through stopword dictionaries, then queries containing stopwords will fail to find any rows - which is probably not what one would expect. Here's an example: Query for records containing the* and car*. The @@-operator returns true, because the stopword is removed from both the tsvector and the tsquery (the 'english' dictionary drops 'these' as a stopword and stems 'cars' to 'car'. Both the tsvector and the query end up being just 'car') postgres=# select to_tsvector('english', 'these cars') @@ to_tsquery('english', 'the:* car:*'); ?column? -- t (1 row) Here's what happens when stopwords aren't removed from the query (Now, the tsvector ends up being 'car', but the query is 'the:* car:*') postgres=# select to_tsvector('english', 'these cars') @@ to_tsquery('simple', 'the:* car:*'); ?column? -- f (1 row) best regards, Florian Pflug
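The same point can be made with a toy Python model (the stopword list and all helper names are invented, and stemming is ignored entirely): if stopwords are stripped from documents but left in queries, any query containing a stopword prefix can never match.

```python
STOPWORDS = {"the", "these"}

def to_tsvector(words):
    # documents are indexed with stopwords removed
    return {w for w in words if w not in STOPWORDS}

def to_tsquery_terms(terms, strip_stopwords=True):
    # the query may or may not get the same stopword treatment
    if strip_stopwords:
        terms = [t for t in terms if t not in STOPWORDS]
    return terms

def matches(vector, terms):
    # every prefix term must match some lexeme in the vector
    return all(any(lex.startswith(t) for lex in vector) for t in terms)

doc = to_tsvector(["these", "cars"])   # -> {"cars"}

# stopwords stripped from both sides: the query still finds the row
matches(doc, to_tsquery_terms(["the", "car"]))                   # True

# stopwords kept in the query only: 'the' can never match anything
matches(doc, to_tsquery_terms(["the", "car"], strip_stopwords=False))  # False
```

This mirrors the two SQL examples above: the 'english'-vs-'english' case returns t, the 'english'-vs-'simple' case returns f.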
[HACKERS] isolationtester's dry run mode
This patch adds a -n option to isolationtester. With this option, instead of running the tests, it prints the permutations that would be run. This is very useful for generating test specs which need to remove permutations that are not valid (because they would run a command on a blocked session). Objections? Sample run: $ ./isolationtester -n ~-/specs/classroom-scheduling.spec permutation rx1 wy1 c1 ry2 wx2 c2 permutation rx1 wy1 ry2 c1 wx2 c2 permutation rx1 wy1 ry2 wx2 c1 c2 permutation rx1 wy1 ry2 wx2 c2 c1 permutation rx1 ry2 wy1 c1 wx2 c2 permutation rx1 ry2 wy1 wx2 c1 c2 permutation rx1 ry2 wy1 wx2 c2 c1 permutation rx1 ry2 wx2 wy1 c1 c2 permutation rx1 ry2 wx2 wy1 c2 c1 permutation rx1 ry2 wx2 c2 wy1 c1 permutation ry2 rx1 wy1 c1 wx2 c2 permutation ry2 rx1 wy1 wx2 c1 c2 permutation ry2 rx1 wy1 wx2 c2 c1 permutation ry2 rx1 wx2 wy1 c1 c2 permutation ry2 rx1 wx2 wy1 c2 c1 permutation ry2 rx1 wx2 c2 wy1 c1 permutation ry2 wx2 rx1 wy1 c1 c2 permutation ry2 wx2 rx1 wy1 c2 c1 permutation ry2 wx2 rx1 c2 wy1 c1 permutation ry2 wx2 c2 rx1 wy1 c1 -- Álvaro Herrera alvhe...@alvh.no-ip.org isotester-dryrun.patch Description: Binary data
Re: [HACKERS] isolationtester's dry run mode
Excerpts from Alvaro Herrera's message of mar oct 25 13:32:31 -0300 2011: This patch adds a -n option to isolationtester. With this option, instead of running the tests, it prints the permutations that would be run. This is very useful for generating test specs which need to remove permutations that are not valid (because they would run a command on a blocked session). Forgot to credit the author: this patch was developed by Alexander Shulgin. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Range Types - typo + NULL string constructor
On Mon, 2011-10-24 at 13:15 +0300, Heikki Linnakangas wrote: Hmm, I don't think that's safe. After Oid wraparound, a range type oid might get reused for some other range type, and the cache would return stale values. Extremely unlikely to happen by accident, but could be exploited by an attacker. Any ideas on how to remedy that? I don't have another plan for making it perform well. Plugging it into the cache invalidation mechanism seems like overkill, but I suppose that would solve the problem. Aren't there a few other cases like this floating around the code? I know the single-xid cache is potentially vulnerable to xid wraparound for the same reason. Regards, Jeff Davis
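For what it's worth, the hazard is easy to model in a few lines of Python (all names here are invented; this is only a caricature, not PostgreSQL's typcache): a cache keyed only by a reusable identifier has no way to notice that the identifier now denotes something else.

```python
cache = {}

def cached_range_info(oid, catalog):
    # Looks the entry up once and then trusts the cache forever --
    # it never notices that the OID may have been reassigned after
    # wraparound, which is exactly the staleness Heikki describes.
    if oid not in cache:
        cache[oid] = catalog[oid]
    return cache[oid]

catalog = {16384: "int4range"}
cached_range_info(16384, catalog)      # caches "int4range"

# Simulate OID wraparound: 16384 is dropped and reused for a new type.
catalog[16384] = "textrange"

# The cache still hands back the stale definition.
stale = cached_range_info(16384, catalog)   # "int4range", not "textrange"
```

Hooking the cache into the invalidation machinery (so a drop of the type evicts the entry) is the standard cure, which is why Jeff's "overkill" option is also the safe one.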
Re: [HACKERS] lexemes in prefix search going through dictionary modifications
On Tue, 2011-10-25 at 18:05 +0200, Florian Pflug wrote: On Oct25, 2011, at 17:26 , Sushant Sinha wrote: I am currently using the prefix search feature in text search. I find that the prefix characters are treated the same as a normal lexeme and passed through stemming and stopword dictionaries. This seems like a bug to me. Hm, I don't think so. If they don't pass through stopword dictionaries, then queries containing stopwords will fail to find any rows - which is probably not what one would expect. I think what you are calling a feature is really a bug. I am fairly sure that when someone says to_tsquery('english', 's:*') one is looking for an entry that has a *non-stopword* word that starts with 's'. And especially so in a text search configuration that eliminates stop words. Does it even make sense to stem, abbreviate, or apply synonyms to a few letters? It will be so unpredictable. -Sushant.
Re: [HACKERS] patch for distinguishing PG instances in event log v2
2011/7/16 MauMau maumau...@gmail.com: Hello, The attached file is a revised patch which reflects all review comments by Magnus in: http://archives.postgresql.org/pgsql-hackers/2011-07/msg00839.php I made sure the previous tests (both custom and default PostgreSQL event source) succeeded. I'm submitting this to the currently open CommitFest 2011-09 shortly. Please review it again. I have applied this patch after another round of rather extensive modifications. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)
On Mon, Oct 24, 2011 at 3:05 PM, Alexander Korotkov aekorot...@gmail.com wrote: If we allow user to specify own gist_penalty function, then such function should deal with: 1) GiST-specific data structures such as GISTENTRY. 2) Decomposing ranges using range_deserialize. 3) Infinities, which we could handle in general penalty functions. That's why I prefer to implement subtype_diff. I forgot another argument for having subtype_diff: 4) In my picksplit algorithm it would be more natural to use subtype_diff for measuring overlap than the penalty function. -- With best regards, Alexander Korotkov.
Re: [HACKERS] (PATCH) Adding CORRESPONDING to Set Operations
On Tue, October 25, 2011 19:49, Kerem Kat wrote: On Mon, Oct 24, 2011 at 20:52, Erik Rijkers e...@xs4all.nl wrote: On Wed, October 19, 2011 15:01, Kerem Kat wrote: Adding CORRESPONDING to Set Operations I have corrected the ordering of the targetlists of subqueries. And added 12 regression tests for column list ordering. Can you confirm that the order has changed for you? Yes, this one is OK. thanks, Erik Rijkers
[HACKERS] Firing order of RI triggers
I've looked into the cause of bug #6268, http://archives.postgresql.org/pgsql-bugs/2011-10/msg00223.php It's fairly simple: we're firing RI triggers in the wrong order. What's happening is that we update the tuple and queue RI_FKey_check_upd and RI_FKey_cascade_upd events for the update action, *in that order*. When RI_FKey_check_upd runs, it checks things and quite properly complains that there's no matching PK, since the row that used to have the matching PK is now obsolete. Had RI_FKey_cascade_upd fired first, all would have been well, because when RI_FKey_check_upd fired for this particular update, it would've seen the new tuple is already obsolete and done nothing. The reason they fire in the wrong order is that triggers for a single event are fired in name order, and the names being used are things like RI_ConstraintTrigger_53569. Most of the time, the trigger with higher OID is going to sort last ... and createForeignKeyTriggers creates the check triggers before the action triggers. You might wonder why this doesn't mean that all self-referential foreign key situations are broken all the time. Well, the answer is that the problem is usually masked by the optimization that avoids firing a check trigger at all if the referencing field didn't change --- see AfterTriggerSaveEvent. In the test case given in the bug, the first UPDATE within the transaction doesn't see the problem because of this. But in the second UPDATE of the same row, that optimization is disabled, so the check trigger fires and fails. As far as I can see, the only practical way to fix this is to change the names given to RI triggers so that cascade actions will fire before check triggers. Just changing the order of creation would fix it 99.99% of the time, but fail on the times when the first trigger had OID 9 and the second OID 100, for example. And I definitely don't think we want to mess with the general rule that triggers fire in name order. 
I'm thinking we could do RI_ConstraintTrigger_a_ for action triggers and RI_ConstraintTrigger_c_ for checking triggers, and then the names would be guaranteed to sort correctly. I'm not sure if this is something we can back-patch --- I don't see any dependencies in our own code on what names RI triggers have, but I'm afraid there is client-side code out there that knows it. In any case, changing the name assignments would not fix things for existing triggers; but if we did back-patch then any affected users could just drop and re-create the problematic FK constraint. Or maybe we could back-patch a change in creation order and rely on that usually working. Given the lack of prior complaints that might be good enough. Comments? regards, tom lane
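A quick Python sketch of the two naming schemes (the OIDs are arbitrary) shows both the current misfire and why the _a_/_c_ prefixes would guarantee the right order regardless of OID digits:

```python
def firing_order(names):
    # triggers for a single event fire in name order
    return sorted(names)

# Today: both kinds share one prefix, and the check trigger is created
# first (lower OID), so it sorts -- and fires -- before the action
# trigger.  That is the bug.
old = firing_order(["RI_ConstraintTrigger_53569",   # check trigger
                    "RI_ConstraintTrigger_53570"])  # action trigger

# Proposed: 'a' (action) sorts before 'c' (check), so the OID digits no
# longer matter -- even across a digit-count boundary like 9 vs 100,
# the case that would defeat a mere change in creation order.
new = firing_order(["RI_ConstraintTrigger_c_9",     # check trigger
                    "RI_ConstraintTrigger_a_100"])  # action trigger

print(old[0])   # RI_ConstraintTrigger_53569 -- check fires first
print(new[0])   # RI_ConstraintTrigger_a_100 -- action fires first
```

The 9-vs-100 pair is exactly the 0.01% case: as strings, "RI_ConstraintTrigger_100" sorts before "RI_ConstraintTrigger_9", so name order and OID order disagree.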
Re: [HACKERS] lexemes in prefix search going through dictionary modifications
On Oct25, 2011, at 18:47 , Sushant Sinha wrote: On Tue, 2011-10-25 at 18:05 +0200, Florian Pflug wrote: On Oct25, 2011, at 17:26 , Sushant Sinha wrote: I am currently using the prefix search feature in text search. I find that the prefix characters are treated the same as a normal lexeme and passed through stemming and stopword dictionaries. This seems like a bug to me. Hm, I don't think so. If they don't pass through stopword dictionaries, then queries containing stopwords will fail to find any rows - which is probably not what one would expect. I think what you are saying a feature is really a bug. I am fairly sure that when someone says to_tsquery('english', 's:*') one is looking for an entry that has a *non-stopword* word that starts with 's'. And specially so in a text search configuration that eliminates stop words. But the whole idea of removing stopwords from the query is that users *don't* need to be aware of the precise list of stopwords. The way I see it, stopwords are simply an optimization that helps reduce the size of your fulltext index. Assume, for example, that the postgres mailing list archive search used tsearch (which I think it does, but I'm not sure). It'd then probably make sense to add postgres to the list of stopwords, because it's bound to appear in nearly every mail. But wouldn't you want searches which include 'postgres*' to turn up empty? Quite certainly not. Does it even make sense to stem, abbreviate, synonym for a few letters? It will be so unpredictable. That depends on the language. In german (my native tongue), one can concatenate nouns to form new nouns. It's thus not entirely unreasonable that one would want the prefix to be stemmed to its singular form before being matched. Also, suppose you're using a dictionary which corrects common typos. Who says you wouldn't want that to be applied to prefix queries? 
best regards, Florian Pflug
Re: [HACKERS] (PATCH) Adding CORRESPONDING to Set Operations
On Mon, Oct 24, 2011 at 20:52, Erik Rijkers e...@xs4all.nl wrote: On Wed, October 19, 2011 15:01, Kerem Kat wrote: Adding CORRESPONDING to Set Operations Initial patch, filename: corresponding_clause_v2.patch I had a quick look at the behaviour of this patch. Btw, the examples in your email were typoed (one select is missing): SELECT 1 a, 2 b, 3 c UNION CORRESPONDING 4 b, 5 d, 6 c, 7 f; should be: SELECT 1 a, 2 b, 3 c UNION CORRESPONDING select 4 b, 5 d, 6 c, 7 f; and SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) 4 b, 5 d, 6 c, 7 f; should be: SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) select 4 b, 5 d, 6 c, 7 f; Yes you are correct, mea culpa. But there is also a small bug, I think: the order in the CORRESPONDING BY list should be followed, according to the standard (foundation, p. 408): 2) If corresponding column list is specified, then let SL be a select list of those column names explicitly appearing in the corresponding column list in the order that these column names appear in the corresponding column list. Every column name in the corresponding column list shall be a column name of both T1 and T2. That would make this wrong, I think: SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(c,b) select 5 d, 6 c, 7 f, 4 b ; b | c ---+--- 2 | 3 4 | 6 (2 rows) i.e., I think it should show columns in the order c, b (and not b, c); the order of the CORRESPONDING BY phrase. (but maybe I'm misreading the text of the standard; I find it often difficult to follow) It wasn't a misread, I checked the draft, in my version same explanation is at p.410. I have corrected the ordering of the targetlists of subqueries. And added 12 regression tests for column list ordering. Can you confirm that the order has changed for you? 
Thanks, Erik Rijkers Regards, Kerem KAT *** a/doc/src/sgml/queries.sgml --- b/doc/src/sgml/queries.sgml *** *** 1225,1230 --- 1225,1233 primaryEXCEPT/primary /indexterm indexterm zone=queries-union +primaryCORRESPONDING/primary + /indexterm + indexterm zone=queries-union primaryset union/primary /indexterm indexterm zone=queries-union *** *** 1241,1249 The results of two queries can be combined using the set operations union, intersection, and difference. The syntax is synopsis ! replaceablequery1/replaceable UNION optionalALL/optional replaceablequery2/replaceable ! replaceablequery1/replaceable INTERSECT optionalALL/optional replaceablequery2/replaceable ! replaceablequery1/replaceable EXCEPT optionalALL/optional replaceablequery2/replaceable /synopsis replaceablequery1/replaceable and replaceablequery2/replaceable are queries that can use any of --- 1244,1252 The results of two queries can be combined using the set operations union, intersection, and difference. The syntax is synopsis ! replaceablequery1/replaceable UNION optionalALL/optional optionalCORRESPONDING optionalBY (replaceableselect_list/replaceable)/optional/optional replaceablequery2/replaceable ! replaceablequery1/replaceable INTERSECT optionalALL/optional optionalCORRESPONDING optionalBY (replaceableselect_list/replaceable)/optional/optional replaceablequery2/replaceable ! replaceablequery1/replaceable EXCEPT optionalALL/optional optionalCORRESPONDING optionalBY (replaceableselect_list/replaceable)/optional/optional replaceablequery2/replaceable /synopsis replaceablequery1/replaceable and replaceablequery2/replaceable are queries that can use any of *** *** 1283,1288 --- 1286,1299 /para para + literalCORRESPONDING/ returns all columns that are in both replaceablequery1/ and replaceablequery2/ with the same name. + /para + + para + literalCORRESPONDING BY/ returns all columns in the column list that are also in both replaceablequery1/ and replaceablequery2/ with the same name. 
+ /para + + para In order to calculate the union, intersection, or difference of two queries, the two queries must be quoteunion compatible/quote, which means that they return the same number of columns and *** a/doc/src/sgml/sql.sgml --- b/doc/src/sgml/sql.sgml *** *** 859,865 [ WHERE replaceable class=PARAMETERcondition/replaceable ] [ GROUP BY replaceable class=PARAMETERexpression/replaceable [, ...] ] [ HAVING replaceable class=PARAMETERcondition/replaceable [, ...] ] ! [ { UNION | INTERSECT | EXCEPT } [ ALL ] replaceable class=PARAMETERselect/replaceable ] [ ORDER BY replaceable class=parameterexpression/replaceable [ ASC | DESC | USING replaceable class=parameteroperator/replaceable ] [ NULLS { FIRST | LAST } ] [, ...] ] [ LIMIT { replaceable class=PARAMETERcount/replaceable | ALL } ] [ OFFSET replaceable class=PARAMETERstart/replaceable ] --- 859,865 [ WHERE replaceable class=PARAMETERcondition/replaceable ] [ GROUP BY
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
Chris, can you rearrange the backup so you copy the pg_control file as the first act after the pg_start_backup? I tried this and it doesn't seem to make any difference. I also tried the patch and I can no longer reproduce the subtrans error, however instead it now starts up, but never gets to the point where it'll accept connections. It starts up but if I try to do anything I always get FATAL: the database system is starting up... even if the load is removed from the primary, the standby still never finishes starting up. Attached below is a log of one of these startup attempts. In my testing with the patch applied approx 3 in 10 attempts start up successfully, 7 in 10 attempts go into the db is starting up state... the pg_clog error is still there, but seems much harder to reproduce now... I've seen it only once since applying the patch (out of probably 50 or 60 under-load startup attempts). It does seem to be moody like that though... it will be very difficult to reproduce for a while, and then it will happen damn-near every time for a while... weirdness. On a bit of a side note, I've been thinking of changing my scripts so that they perform an initial rsync prior to doing the startbackup-rsync-stopbackup, just so that the second rsync will be faster... so that the backup is in progress for a shorter period of time, as while it is running it will stop other standbys from starting up... this shouldn't cause any issues, eh? 
2011-10-25 13:43:24.035 MDT [15072]: [1-1] LOG: database system was interrupted; last known up at 2011-10-25 13:43:11 MDT 2011-10-25 13:43:24.035 MDT [15072]: [2-1] LOG: creating missing WAL directory pg_xlog/archive_status 2011-10-25 13:43:24.037 MDT [15072]: [3-1] LOG: entering standby mode DEBUG: received replication command: IDENTIFY_SYSTEM DEBUG: received replication command: START_REPLICATION 2/CF00 2011-10-25 13:43:24.041 MDT [15073]: [1-1] LOG: streaming replication successfully connected to primary 2011-10-25 13:43:24.177 MDT [15092]: [1-1] FATAL: the database system is starting up 2011-10-25 13:43:24.781 MDT [15072]: [4-1] DEBUG: checkpoint record is at 2/CF81A478 2011-10-25 13:43:24.781 MDT [15072]: [5-1] DEBUG: redo record is at 2/CF20; shutdown FALSE 2011-10-25 13:43:24.781 MDT [15072]: [6-1] DEBUG: next transaction ID: 0/4634700; next OID: 1188228 2011-10-25 13:43:24.781 MDT [15072]: [7-1] DEBUG: next MultiXactId: 839; next MultiXactOffset: 1686 2011-10-25 13:43:24.781 MDT [15072]: [8-1] DEBUG: oldest unfrozen transaction ID: 1669, in database 1 2011-10-25 13:43:24.781 MDT [15072]: [9-1] DEBUG: transaction ID wrap limit is 2147485316, limited by database with OID 1 2011-10-25 13:43:24.783 MDT [15072]: [10-1] DEBUG: resetting unlogged relations: cleanup 1 init 0 2011-10-25 13:43:24.791 MDT [15072]: [11-1] DEBUG: initializing for hot standby 2011-10-25 13:43:24.791 MDT [15072]: [12-1] LOG: consistent recovery state reached at 2/CF81A4D0 2011-10-25 13:43:24.791 MDT [15072]: [13-1] LOG: redo starts at 2/CF20 2011-10-25 13:43:25.019 MDT [15072]: [14-1] LOG: consistent state delayed because recovery snapshot incomplete 2011-10-25 13:43:25.019 MDT [15072]: [15-1] CONTEXT: xlog redo running xacts: nextXid 4634700 latestCompletedXid 4634698 oldestRunningXid 4634336; 130 xacts: 4634336 4634337 4634338 4634339 4634340 4634341 4634342 4634343 4634344 4634345 4634346 4634347 4634348 4634349 4634350 4634351 4634352 4634353 4634354 4634355 4634356 4634357 4634358 
4634359 4634360 4634361 4634362 4634363 4634364 4634365 4634366 4634367 4634368 4634369 4634370 4634371 4634515 4634516 4634517 4634518 4634519 4634520 4634521 4634522 4634523 4634524 4634525 4634526 4634527 4634528 4634529 4634530 4634531 4634532 4634533 4634534 4634535 4634536 4634537 4634538 4634539 4634540 4634541 4634542 4634543 4634385 4634386 4634387 4634388 4634389 4634390 4634391 4634392 4634393 4634394 4634395 4634396 4634397 4634398 4634399 4634400 4634401 4634402 4634403 4634404 4634405 4634406 4634407 4634408 4634409 4634410 4634411 4634412 4634413 4634414 4634415 4634416 4634417 4634418 4634419 4634420 4634579 4634580 4634581 4634582 4634583 4634584 4634585 4634586 4634587 4634588 4634589 4634590 4634591 4634592 4634593 4634594 4634595 4634596 4634597 4634598 4634599 4634600 4634601 4634602 4634603 4634604 4634605 4634606 4634607; subxid ovf 2011-10-25 13:43:25.240 MDT [15130]: [1-1] FATAL: the database system is starting up DEBUG: standby sync_rep_test has now caught up with primary 2011-10-25 13:43:26.304 MDT [15167]: [1-1] FATAL: the database system is starting up 2011-10-25 13:43:27.366 MDT [15204]: [1-1] FATAL: the database system is starting up 2011-10-25 13:43:28.426 MDT [15241]: [1-1] FATAL: the database system is starting up 2011-10-25 13:43:29.461 MDT [15275]: [1-1] FATAL: the database system is starting up and so on... On Tue, Oct 25, 2011 at 6:51 AM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug f...@phlo.org
Re: [HACKERS] Firing order of RI triggers
Excerpts from Tom Lane's message of mar oct 25 14:57:43 -0300 2011: I'm not sure if this is something we can back-patch --- I don't see any dependencies in our own code on what names RI triggers have, but I'm afraid there is client-side code out there that knows it. Yeah, sounds possible. In any case, changing the name assignments would not fix things for existing triggers; but if we did back-patch then any affected users could just drop and re-create the problematic FK constraint. Or maybe we could back-patch a change in creation order and rely on that usually working. Given the lack of prior complaints that might be good enough. The latter looks reasonable ... particularly if the symptoms of a botched order would be immediately visible -- the user could just drop and reload the constraints to fix the order in the very unlikely case that they are reversed. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Firing order of RI triggers
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Tom Lane's message of mar oct 25 14:57:43 -0300 2011: ... Or maybe we could back-patch a change in creation order and rely on that usually working. Given the lack of prior complaints that might be good enough. The latter looks reasonable ... particularly if the symptoms of a botched order would be immediately visible -- the user could just drop and reload the constraints to fix the order in the very unlikely case that they are reversed. Well, the symptoms would probably be just like in the bug report: you'd get unexpected failures from double updates of a self-referential row in a single transaction. That's a sufficiently weird corner case that most people probably wouldn't exercise it right away. But given that this problem has been there from day one and nobody noticed before, I'm not too concerned about the intersection of people who have an issue and people who are unlucky enough to get an end-of-decade trigger OID. I think 100% solution in HEAD and 99.99% solution in back branches should be good enough. regards, tom lane
Re: [HACKERS] lexemes in prefix search going through dictionary modifications
On Tue, 2011-10-25 at 19:27 +0200, Florian Pflug wrote: Assume, for example, that the postgres mailing list archive search used tsearch (which I think it does, but I'm not sure). It'd then probably make sense to add postgres to the list of stopwords, because it's bound to appear in nearly every mail. But wouldn't you want searches which include 'postgres*' to turn up empty? Quite certainly not. That improves recall for a postgres:* query and certainly doesn't help other queries like post:*. But more importantly it affects precision for all queries like a:*, an:*, and:*, s:*, 't:*', the:*, etc. (When that is the only search it also affects recall, as no row matches an empty tsquery.) Since stopwords are smaller, it means prefix search for a few characters is meaningless. And I would argue that is when the prefix search is more important -- only when you know a few characters. -Sushant.
Re: [HACKERS] Update on documentation builds on OSX w/ macports
On Oct20, 2011, at 05:07 , Dan Ports wrote: On Thu, Oct 20, 2011 at 02:02:09AM +0200, Florian Pflug wrote: I've patched the ports for openjade, iso8879 and docbook-dsssl, and added a new port for docbook-sgml-4.2. These patches are sitting in the macports trac now, waiting to be applied. I'll try to take a look at them in the next couple days (with my MacPorts hat on), unless someone beats me to it. And an effective hat that was! Thanks for cleaning up and applying all of those patches, Dan! Should someone want to try this out, beware that clang (at least the version shipped with XCode 4.2) doesn't like openjade. Also, universal builds of openjade seem to be broken. On my machine, I thus need to say port install openjade -universal configure.compiler=llvm-gcc-4.2 instead of just port install openjade I've submitted a patch for that, so hopefully that problem too will go away in the near future. Anyway, after having installed openjade, a simple port install docbook-sgml-4.2 should be enough to make our documentation build work. best regards, Florian Pflug
[HACKERS] TOAST versus VACUUM, or missing chunk number 0 for toast value identified
I believe I have reproduced the behavior described by Andrew Hammond in http://archives.postgresql.org/pgsql-general/2011-10/msg00928.php This is using the regression database:

1. In session 1, do: set default_statistics_target TO 10000; analyze tenk1; (We need the large stats target to ensure that tenk1's pg_statistic entries require toasting.)

2. Attach to session 1 with a debugger and set a breakpoint at CommitTransaction's call to CallXactCallbacks (or anyplace after ProcArrayEndTransaction and before AtEOXact_Inval).

3. In session 2, do: select count(*) from tenk1 where fivethous < 2500; (This loads up session 2's syscaches with toasted pg_statistic entries.)

4. In session 1, again do: analyze tenk1; and wait for it to stop at the breakpoint.

5. In session 3 (or you can use session 2 for this), do: vacuum verbose pg_statistic; You should see it removing toast entries that were generated in step 1 and obsoleted in step 4.

6. In session 2, again do: select count(*) from tenk1 where fivethous < 2500; and voila: ERROR: missing chunk number 0 for toast value 53668 in pg_toast_2619

What has happened here is that the second ANALYZE has marked itself committed in pg_clog and no longer running in the ProcArray, so VACUUM feels entitled to remove toast tuples that the ANALYZE deleted. However, the ANALYZE has not yet sent out the sinval messages that would inform session 2 that its syscache entries are obsolete. In Andrew's report, presumably the machine was under enough load to slow down ANALYZE at just this point, and there was a concurrent autovacuum that would have done the rest of the deed. The problem could only be seen for a short interval, which squares with his report, and with a similar one from Tim Uckun back in September.
Ordinarily, sending out sinval messages post-commit is okay because we don't release locks until after that, and we suppose that our locks prevent any other transactions from getting to the point of using syscache entries that might have been invalidated by our transaction. However, *we have carefully hacked on ANALYZE until it doesn't take any locks that would block concurrent queries on the analyzed table.* So the normal protection against stale syscache entries simply doesn't work for pg_statistic fetches.

I'm not sure about a good way to fix this. When we last dealt with a similar failure, Heikki suggested that we forcibly detoast all fields in a tuple that we're putting into the syscaches: http://archives.postgresql.org/pgsql-hackers/2011-08/msg00661.php I don't much like that, though, as it seems expensive, and I'm worried about possible circularity of needing to know about all toastable fields while making a syscache entry, and anyway it's papering over a symptom rather than solving the actual problem that we're relying on a stale syscache entry.

We could fix it by not using a syscache anymore for pg_statistic entries, but that's probably not acceptable from a performance standpoint. A clean fix would be to add locking that blocks would-be users of pg_statistic entries when an ANALYZE is about to commit. This isn't much fun from a performance standpoint either, but at least it should be relatively cheap most of the time.

Thoughts? regards, tom lane
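The race described above can be modeled in miniature outside PostgreSQL. The sketch below is plain Python (not PostgreSQL code; all names are made up): the "writer" stands in for ANALYZE, whose commit becomes visible before its invalidation message goes out, so the "reader" keeps using a stale cache entry even though it runs strictly after the commit is visible:

```python
import threading
import time

cache = {"stats": "v1"}          # the reader's "syscache" copy
store = {"stats": "v1"}          # the shared catalog
committed = threading.Event()    # commit visible to others (pg_clog/ProcArray)
invalidated = threading.Event()  # sinval message delivered

def writer():
    store["stats"] = "v2"        # ANALYZE replaces the pg_statistic row
    committed.set()              # commit becomes visible to everyone ...
    time.sleep(0.3)              # ... but the sinval broadcast is delayed
    invalidated.set()            # (the window between ProcArrayEndTransaction
                                 #  and AtEOXact_Inval)

def reader(result):
    committed.wait()             # reader runs after the commit is visible
    if invalidated.is_set():     # it would refresh only on an invalidation,
        cache["stats"] = store["stats"]
    result.append(cache["stats"])  # so it uses the stale "v1" entry --
                                   # analogous to chasing a deleted toast chunk

result = []
t1 = threading.Thread(target=writer)
t2 = threading.Thread(target=reader, args=(result,))
t1.start(); t2.start(); t1.join(); t2.join()
print(result[0])  # "v1": stale data read after the commit was visible
```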
[HACKERS] isolationtester and invalid permutations
Instead of simply aborting a spec that specifies running commands on blocked sessions (what we call an invalid permutation), it seems more useful to report the problem, clean up the sessions, and continue with the next permutation. This, in conjunction with the dry-run patch I submitted earlier, makes it easier to arrive at a working spec: dry-run the spec; copy the generated permutation lines into the spec; run the spec normally, which reports the invalid permutations; comment out the invalid permutations from the spec; done. The attached patch, again from Alexander Shulgin (with some tweaks from me), does that. Comments? -- Álvaro Herrera alvhe...@alvh.no-ip.org isotester-allow-invalid-perms.patch
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
That isn't a Hot Standby problem, a recovery problem, nor is it certain it's a PostgreSQL problem. Do you have any theories on this that I could help investigate? It happens even when using pg_basebackup, and it persists until another sync is performed, so the files must be in some state that it can't recover from. Without understanding the internals, just viewing from an outside perspective, I don't really see how this could not be a PostgreSQL problem.
Re: [HACKERS] Online base backup from the hot-standby
On Tue, Oct 25, 2011 at 9:03 PM, Magnus Hagander mag...@hagander.net wrote: On Tue, Oct 25, 2011 at 13:54, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Oct 25, 2011 at 7:19 PM, Magnus Hagander mag...@hagander.net wrote: I don't think we should necessarily give up completely. But doing a pg_basebackup way *first* seems reasonable - because it's going to be the easiest one to get right, given that we have more control there. Doesn't mean we shouldn't extend it in the future... Agreed. The question is -- how far should we change pg_basebackup to get it right? I think it's not difficult to change it so that it backs up the control file at the end. But eliminating the need for full_page_writes=on seems not easy, no? So I'm not inclined to do that in at least the first commit; otherwise, I'm afraid the patch would become huge. It's more the server side of base backups than the actual pg_basebackup tool of course, but I'm sure that's what we're all referring to here. Personally, I'd see the fpw stuff as part of the infrastructure needed. Meaning that the fpw stuff should go in *first*, and the pg_basebackup stuff later. Agreed. I'll extract the FPW stuff from the patch that I submitted, and revise it as the infrastructure patch. The changes to pg_start_backup() etc. that Ishiduka-san made are also server-side infrastructure. I will extract them as another infrastructure patch. Ishiduka-san, if you have time, feel free to try the above, barring objection. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] TOAST versus VACUUM, or missing chunk number 0 for toast value identified
On Tue, Oct 25, 2011 at 9:06 PM, Tom Lane t...@sss.pgh.pa.us wrote: What has happened here is that the second ANALYZE has marked itself committed in pg_clog and no longer running in the ProcArray, so VACUUM feels entitled to remove toast tuples that the ANALYZE deleted. However, the ANALYZE has not yet sent out the sinval messages that would inform session 2 that its syscache entries are obsolete. In Andrew's report, presumably the machine was under enough load to slow down ANALYZE at just this point, and there was a concurrent autovacuum that would have done the rest of the deed. The problem could only be seen for a short interval, which squares with his report, and with a similar one from Tim Uckun back in September. Ordinarily, sending out sinval messages post-commit is okay because we don't release locks until after that, and we suppose that our locks prevent any other transactions from getting to the point of using syscache entries that might have been invalidated by our transaction. However, *we have carefully hacked on ANALYZE until it doesn't take any locks that would block concurrent queries on the analyzed table.* So the normal protection against stale syscache entries simply doesn't work for pg_statistic fetches. This is very similar to one of the issues that reared its ugly head in regards to Simon's now-reverted patch to lower DDL locking strength. You identified some other issues there as well, but *one* of the issues was that, as in this case, the sinval mechanism fails to provide the necessary synchronization guarantees unless the lock required to reread the updated data conflicts with the lock required to change the data. In that case, the data meant the pg_class entry or the pg_attribute entry whereas here it means the pg_statistic entry, but I believe the principle is the same.
And there as here, (1) there is a fundamental conflict between what the sinval mechanism requires for correctness and what is actually desirable in terms of lock levels from a user experience point of view and (2) it is relatively easy to write code that looks superficially safe but which actually contains subtle race conditions. IIRC, you never thought Simon's patch looked safe, but I'm guessing that this pg_statistic bug has been around for a long time. So I'm wondering if we ought to rethink our position that users of the sinval machinery must provide their own external synchronization through heavyweight locking, and instead build the synchronization into the sinval mechanism itself.

One idea I had was to include in every sinval message the XID of the sending transaction, and to force clients receiving a message to do XactLockTableWait() for each such XID. That would force the backend reloading its cache to wait until the committing transaction reaches the lock-release phase. If we sent out sinval messages just before removing ourselves from the ProcArray, I think that would more-or-less fix this bug (although maybe I'm missing some reason why it's not practical to send them that early) except that I don't see any way to handle the sinval-reset case, which seems to more or less kill this idea in its tracks.

But maybe there's some other mechanism whereby we could combine sending the sinval messages slightly earlier (before ProcArrayEndTransaction) with blocking anyone who processes those messages until after the committing backend finishes ProcArrayEndTransaction. For example, you could add an additional LWLock, which has to be held in exclusive mode by a committing transaction that sends any sinval messages. It must be acquired before sending the sinval messages and can't be released until after ProcArrayEndTransaction() is complete.
Anyone processing a sinval message must acquire and release the lock in shared mode before reloading their caches, so that we guarantee that at the time you reread the catalogs, any transactions involved in sending those messages are visible. That's actually a bit coarse-grained; there's probably a better mechanism, but I'm just throwing this out to see if the basic idea has any legs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
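As a toy illustration of the extra-lock idea above (again plain Python, not PostgreSQL code; the real proposal concerns an LWLock held across the sinval send and ProcArrayEndTransaction), the reader blocks on the committer's lock before reloading, so by the time it rereads the "catalog" the commit is fully complete:

```python
import threading
import time

sinval_lock = threading.Lock()   # stands in for the proposed extra LWLock
store = {"stats": "v1"}          # the shared catalog
cache = {"stats": "v1"}          # the reader's cached copy
sinval_sent = threading.Event()  # sinval message delivered

def committer():
    store["stats"] = "v2"        # the transaction's catalog update
    with sinval_lock:            # exclusive: held across both commit steps
        sinval_sent.set()        # sinval goes out slightly early ...
        time.sleep(0.3)          # ... ProcArrayEndTransaction happens here
    # lock released only once the commit is fully visible

def reader(result):
    sinval_sent.wait()           # reader sees the invalidation ...
    with sinval_lock:            # ... but must wait for the commit to finish
        cache["stats"] = store["stats"]   # reload now sees committed data
    result.append(cache["stats"])

result = []
ts = [threading.Thread(target=committer),
      threading.Thread(target=reader, args=(result,))]
for t in ts: t.start()
for t in ts: t.join()
print(result[0])  # "v2": the reload cannot observe a half-visible commit
```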
Re: [HACKERS] TOAST versus VACUUM, or missing chunk number 0 for toast value identified
Robert Haas robertmh...@gmail.com writes: On Tue, Oct 25, 2011 at 9:06 PM, Tom Lane t...@sss.pgh.pa.us wrote: Ordinarily, sending out sinval messages post-commit is okay because we don't release locks until after that, and we suppose that our locks prevent any other transactions from getting to the point of using syscache entries that might have been invalidated by our transaction. However, *we have carefully hacked on ANALYZE until it doesn't take any locks that would block concurrent queries on the analyzed table.* So the normal protection against stale syscache entries simply doesn't work for pg_statistic fetches. This is very similar to one of the issues that reared its ugly head in regards to Simon's now-reverted patch to lower DDL locking strength. You identified some other issues there as well, but *one* of the issues was that, as in this case, the sinval mechanism fails to provide the necessary synchronization guarantees unless the lock required to reread the updated data conflicts with the lock required to change the data. Right. We may take as little as AccessShareLock on a relation before examining its pg_statistic entries, and ANALYZE isn't taking anything that would block that. So I'm wondering if we ought to rethink our position that users of the sinval machinery must provide their own external synchronization through heavyweight locking, and instead build the synchronization into the sinval mechanism itself. Yeah, it's starting to feel like we need a basic redesign of sinval ... although I'd not care to back-patch that, so we also need to think of a sane solution for the back branches. If we sent out sinval messages just before removing ourselves from the ProcArray, I think that would more-or-less fix this bug (although maybe I'm missing some reason why it's not practical to send them that early) except that I don't see any way to handle the sinval-reset case, which seems to more or less kill this idea in its tracks. 
The other reason that doesn't work is there's a race condition: someone might load their cache entry immediately after the sinval message went past, but before the updating transaction commits. But maybe there's some other mechanism whereby we could combine sending the sinval messages slightly earlier (before ProcArrayEndTransaction) with blocking anyone who processes those messages until after the committing backend finishes ProcArrayEndTransaction. For example, you could add an additional LWLock, which has to be held in exclusive mode by a committing transaction that sends any sinval messages. Doesn't sound very scalable :-(. Even given your recent changes to reduce the overhead of checking for sinval messages, I'm not sure that it'd be practical to move the sinval message processing to just-before-we-look-up-a-cache-entry. Right now, we do AcceptInvalidationMessages basically once per table per query (or maybe it's twice or so, but anyway a very small multiplier on that). If we try to do it every time through SearchSysCache, we are probably talking two to three orders of magnitude more checks, which ISTM is certain to push the sinval queue back up to the top of the heap for contention. But in any case, this isn't the core of the problem. The real point here is that we need a guarantee that a syscache entry we're going to use is/was valid as of some suitable time point later than the start of our current transaction. (Once we have taken a snapshot, VACUUM will know that it can't remove any tuples that were deleted after the time of that snapshot; so even for SnapshotNow fetches, it's important to have an MVCC snapshot to protect toast-table dereferences.) Perhaps rather than tying the problem into SearchSysCache, we should attach the overhead to GetTransactionSnapshot, which is called appealingly few times per query. 
But right offhand it seems like that only protects us against the toast-tuple-deletion problem, not against the more general one of getting a stale view of the status of some relation. regards, tom lane