[HACKERS] Re: [pgeu-general] REMINDER: FOSDEM 2012 - PostgreSQL Devroom: Call for Speakers
Please keep in mind that the Call for Speakers is open until December 20th. Only a few days left, so now is a good time to submit your proposal ;-) Has anyone applied yet?

--
Emanuel Calvo
Helpame.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] ToDo: conditional ALTER TABLE
Hello

I am working on quiet dumps now, and I found a small issue. pg_dump produces statements such as:

ALTER TABLE ONLY public.b DROP CONSTRAINT b_fk_fkey;
ALTER TABLE ONLY public.a DROP CONSTRAINT a_pkey;
DROP TABLE IF EXISTS public.b;
DROP TABLE IF EXISTS public.a;

Currently there is no conditional ALTER. These statements must come before the DROPs, but then they can fail when the tables are missing. So some form like ALTER TABLE IF EXISTS ... would be useful.

Regards
Pavel Stehule
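A sketch of the output pg_dump could emit under the proposed clause — note the IF EXISTS on ALTER TABLE is the suggestion being made here, not syntax that exists yet:

```sql
-- Proposed (hypothetical) conditional form: skip silently when the
-- table is missing instead of raising an error, mirroring the
-- behavior DROP TABLE IF EXISTS already has.
ALTER TABLE IF EXISTS public.b DROP CONSTRAINT b_fk_fkey;
ALTER TABLE IF EXISTS public.a DROP CONSTRAINT a_pkey;
DROP TABLE IF EXISTS public.b;
DROP TABLE IF EXISTS public.a;
```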
[HACKERS] [PATCH] Enable min/max optimization for bool_and/bool_or/every
Hi list,

As discussed on the pgsql-general list, the bool_and() and bool_or() aggregate functions behave exactly like min() and max() would over booleans. While it's not likely that people would have an appropriate index on a boolean column, it seems it wouldn't cost us anything to take advantage of this optimization, as it requires no code changes at all, simply value changes in the pg_aggregate catalog.

Before:

db=# explain analyze select bool_and(b) from bools;
 Aggregate  (cost=1693.01..1693.02 rows=1 width=1)
   ->  Seq Scan on bools  (cost=0.00..1443.01 rows=11 width=1)
 Total runtime: 29.736 ms

After:

db=# explain analyze select bool_and(b) from bools;
 Result  (cost=0.03..0.04 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..0.03 rows=1 width=1)
           ->  Index Scan using bools_b_idx on bools  (cost=0.00..3300.28 rows=11 width=1)
                 Index Cond: (b IS NOT NULL)
 Total runtime: 0.109 ms

Original discussion here: http://archives.postgresql.org/message-id/CABRT9RAGwQEP+EFhVpZ6=b4cjecue2-qcpb_zdrnpgqna8x...@mail.gmail.com

PS: It seems that the min/max optimization isn't documented in the manual (apart from release notes), so I didn't include any doc changes in this patch.

Regards,
Marti
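The "After" plan corresponds to the planner rewriting the aggregate into an ordered, index-backed lookup. Roughly, the min/max optimization turns the query into something equivalent to the following (table and index names taken from the example above):

```sql
-- bool_and(b) under the min/max optimization is effectively the
-- smallest non-NULL value of b (false sorts before true),
-- fetched from one end of the btree index:
SELECT b
FROM bools
WHERE b IS NOT NULL
ORDER BY b ASC
LIMIT 1;
```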
Re: [HACKERS] JSON for PG 9.2
Merlin Moncure mmonc...@gmail.com writes:
> Getting back on point, I'm curious about your statement: without writing a single line of C. I took a look at the pl/scheme docs and was pretty impressed -- what exactly would be involved to get a guile-based ECMAscript working over the pl/scheme implementation? How would that interact exactly with the stated topic -- JSON support? Do you even need a json type if you have strong library based parsing and composition features?

My understanding is that JSON is a subset of ECMAscript, so if you get the latter you already have the former. Now, someone would have to check whether plscheme still builds with guile-2.0, and given that, how exactly you get pl/ecmascript (or pl/js) out of that. I won't be in a position to spend time on that this year…

Regards,
-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Page Checksums
On Mon, Dec 19, 2011 at 4:21 AM, Josh Berkus j...@agliodbs.com wrote:
> On 12/18/11 5:55 PM, Greg Stark wrote:
>> There is another way to look at this problem. Perhaps it's worth having a checksum *even if* there are ways for the checksum to be spuriously wrong. Obviously having an invalid checksum can't be a fatal error then, but it might still be useful information. Right now people don't really know if their system can experience torn pages or not, and having some way of detecting them could be useful. And if you have other unexplained symptoms then having checksum errors might be enough evidence that the investigation should start with the hardware and get the sysadmin looking at hardware logs and running memtest sooner.
>
> Frankly, if I had torn pages, even if it was just hint bits missing, I would want that to be logged. That's expected if you crash, but if you start seeing bad CRC warnings when you haven't had a crash? That means you have a HW problem. As long as the CRC checks are by default warnings, then I don't see a problem with this; it's certainly better than what we have now.

It is an important problem, and also a big one, hence why it still exists. Throwing WARNINGs for normal events would not help anybody; thousands of false positives would just make Postgres appear to be less robust than it really is. That would be a credibility disaster. VMware already has its own distro, so if they like this patch they can use it.

The only sensible way to handle this is to change the page format as discussed. IMHO the only sensible way that can happen is if we also support an online upgrade feature. I will take on the online upgrade feature if others work on the page format issues, but none of this is possible for 9.2, ISTM.
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Page Checksums
On Monday, December 19, 2011 12:10:11 PM Simon Riggs wrote:
> The only sensible way to handle this is to change the page format as discussed. IMHO the only sensible way that can happen is if we also support an online upgrade feature. I will take on the online upgrade feature if others work on the page format issues, but none of this is possible for 9.2, ISTM.

Totally with you that it's not 9.2 material. But I think if somebody actually wants to implement that, that person would need to start discussing and implementing rather soon if it should be ready for 9.3. Just because it's not geared towards the next release doesn't mean it's off-topic.

Andres
Re: [HACKERS] Page Checksums
On Mon, Dec 19, 2011 at 6:10 AM, Simon Riggs si...@2ndquadrant.com wrote:
> Throwing WARNINGs for normal events would not help anybody; thousands of false positives would just make Postgres appear to be less robust than it really is. That would be a credibility disaster. VMWare already have their own distro, so if they like this patch they can use it.

Agreed on all counts. It seems to me that it would be possible to plug this hole by keeping track of which pages in shared_buffers have had unlogged changes to them since the last FPI. When you go to evict such a page, you write some kind of WAL record for it - either an FPI, or maybe a partial page image containing just the parts that might have been changed (like all the tuple headers, or whatever). This would be expensive, of course.

> The only sensible way to handle this is to change the page format as discussed. IMHO the only sensible way that can happen is if we also support an online upgrade feature. I will take on the online upgrade feature if others work on the page format issues, but none of this is possible for 9.2, ISTM.

I'm not sure that I understand the dividing line you are drawing here. However, with respect to the implementation of this particular feature, it would be nice if we could arrange things so that the space cost of the feature need only be paid by people who are using it. I think it would be regrettable if everyone had to give up 4 bytes per page because some people want checksums. Maybe I'll feel differently if it turns out that the overhead of turning on checksumming is modest, but that's not what I'm expecting.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] why do we need create tuplestore for each fetch?
On Thu, Dec 15, 2011 at 8:30 AM, 高增琦 pgf...@gmail.com wrote:
> I found this several days ago when I tried to debug a fetch from a cursor. I have sent a mail to this list, but no one replied... Maybe this is a very simple problem; please help me, thanks a lot... Here is the example:
>
> create table t (a int);
> insert into t values (1),(3),(5),(7),(9);
> insert into t select a+1 from t;
> begin;
> declare c cursor for select * from t order by a;
> fetch 3 in c;
> fetch 3 in c;
> fetch 3 in c;
>
> In 'PortalRun', a fetch statement will be treated with PORTAL_UTIL_SELECT, and then a tuplestore will be created in 'FillPortalStore' in the fetch statement's portal. In 'FillPortalStore', all results will be stored in that tuplestore. Then, back in 'PortalRun', 'PortalRunSelect' will send these results to the client... My problem is: why do we need to create that tuplestore as intermediate storage? Why not just send the results to the client directly?

Good question. I wouldn't expect it to matter very much for a three-row fetch, but maybe it does for larger ones? What is your motivation for investigating this?

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Review: Non-inheritable check constraints
On Fri, Dec 16, 2011 at 2:06 PM, Alvaro Herrera alvhe...@commandprompt.com wrote:
> Yeah. Nikhil, Alex, this is the merged patch. Have a look that it still works for you (particularly the pg_dump bits) and I'll commit it. I adjusted the regression test a bit too.

It seems hard to believe that ATExecDropConstraint() doesn't need any adjustment.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] archive_keepalive_command
On Fri, Dec 16, 2011 at 10:01 AM, Simon Riggs si...@2ndquadrant.com wrote:
> To provide some form of keepalive on quiet systems the archive_keepalive_command provides a generic hook to implement keepalives. This is implemented as a separate command to avoid storing keepalive messages in the archive, or at least allow overwrites using a single filename like keepalive.

This may be stupid of me, but I don't see the point of this. If you want keepalives, why use log shipping rather than SR? Implementing a really-high-latency method of passing protocol messages through the archive seems like a complex solution to a non-problem (but, like I say, I may be missing something).

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] archive_keepalive_command
On 12/19/2011 08:17 AM, Robert Haas wrote:
> If you want keepalives, why use log shipping rather than SR? Implementing a really-high-latency method of passing protocol messages through the archive seems like a complex solution to a non-problem

The problem being addressed is: how can people using archiving compute time-based lag usefully? Thinking about an answer to that question that made sense for SR drove us toward keepalive timestamp sharing. This is trying to introduce a mechanism good enough to do the same thing for regular archive recovery.

In the archiving case, the worst case waiting to trip you up is always the one where not enough activity happened to generate a new WAL file yet. If people want lag to move correctly in that case anyway, a message needs to be transferred from archiver to recovery system. Simon is suggesting that we do that via shipping a new small file in that case, rather than trying to muck with putting it into the WAL data or something like that. It's a bit hackish, but a) no more hackish than people are used to for PITR, and b) in a way that avoids touching database code in the critical path for SR. This idea might eliminate the last of the reasons I was speculating on for adding more timestamps into the WAL stream.

-- Greg Smith, 2ndQuadrant US, g...@2ndquadrant.com, Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
Re: [HACKERS] Page Checksums
* Aidan Van Dyk (ai...@highrise.ca) wrote:
> But the scary part is you don't know how long *ago* the crash was. Because a hint-bit-only change w/ a torn-page is a non-event in PostgreSQL *DESIGN*, on crash recovery, it doesn't do anything to try and scrub every page in the database.

Fair enough, but could we distinguish these two cases? In other words, would it be possible to detect if a page was torn due to a 'traditional' crash and not complain in that case, but complain if there's a CRC failure and it *doesn't* look like a torn page? Perhaps that's a stretch, but if we can figure out that a page is torn already, then perhaps it's not so far-fetched..

Thanks, Stephen (who is no expert on WAL/torn pages/etc)
Re: [HACKERS] Page Checksums
* Aidan Van Dyk (ai...@highrise.ca) wrote:
> #) Anybody investigated putting the CRC in a relation fork, but not right in the data block? If the CRC contains a timestamp, and is WAL logged before the write, at least on reading a block with a wrong checksum, if a warning is emitted, the timestamp could be looked at by whoever is reading the warning and know that the block was written shortly before the crash $X $PERIODS ago

I do like the idea of putting the CRC info in a relation fork, if it can be made to work decently, as we might be able to then support it on a per-relation basis, and maybe even avoid the on-disk format change.. Of course, I'm sure there's all kinds of problems with that approach, but it might be worth some thinking about.

Thanks, Stephen
Re: [HACKERS] Command Triggers
On 12/18/2011 09:02 PM, Robert Haas wrote:
> If we're actually going to expose the parse tree, I think JSON (or even XML) would be a far better way to expose that than the existing nodeToString() output. Sure, you could make do with the nodeToString() output for some things, especially in PL/perl or PL/python. But JSON would be far better, since it's a standard format rather than something we just made up, and could be used in PL/pgsql as well, given proper support functions.

That seems pretty sensible; I wouldn't want to make it a hard requirement though. There are three major ways this could go for 9.2:

1) Nothing is changed in core; the best that can be done is whatever you can cram into an extension.

2) 9.2 is released with Command Triggers using nodeToString() output as the detailed description. That part is labeled experimental, and the API is expected to change completely in the next release.

3) 9.2 gets enough JSON support to also have an early nodeToJSON API instead. Now it's labeled early release, and some API changes may be needed in future releases.

From the perspective of enabling a developer community to spring up around working in this area, I don't see a huge difference between (2) and (3). We're going in a direction I don't think is very well explored yet, by anyone. The idea that we're going to learn enough to accumulate a comprehensive, stable API before release is rather optimistic. There's something to be said for Dimitri's suggested approach from the perspective of "ship it knowing it works for features A, B, then let's see what else the users think to do with it". We can't determine what feature-complete looks like from what other people are doing anymore; the only way to find out is to see what the world at large thinks of after release.

Think of it as being like the EXPLAIN plan output. That shipped as just text; programs popped up to parse it anyway, and they suffered some breaks with each new release. That was still a major win.
Then, new easier-to-parse formats appeared, making the bar to entry for writing new such tools much lower. And the construction of a better API for output benefited from seeing what people had actually been struggling with before then.

-- Greg Smith, 2ndQuadrant US, g...@2ndquadrant.com, Baltimore, MD
Re: [HACKERS] Page Checksums
Excerpts from Stephen Frost's message of lun dic 19 11:18:21 -0300 2011:
> * Aidan Van Dyk (ai...@highrise.ca) wrote:
>> #) Anybody investigated putting the CRC in a relation fork, but not right in the data block? If the CRC contains a timestamp, and is WAL logged before the write, at least on reading a block with a wrong checksum, if a warning is emitted, the timestamp could be looked at by whoever is reading the warning and know that the block was written shortly before the crash $X $PERIODS ago
>
> I do like the idea of putting the CRC info in a relation fork, if it can be made to work decently, as we might be able to then support it on a per-relation basis, and maybe even avoid the on-disk format change.. Of course, I'm sure there's all kinds of problems with that approach, but it might be worth some thinking about.

I think the main objection to that idea was that if you lose a single page of CRCs you have hundreds of data pages which no longer have good CRCs.

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Page Checksums
On Mon, Dec 19, 2011 at 9:14 AM, Stephen Frost sfr...@snowman.net wrote:
> * Aidan Van Dyk (ai...@highrise.ca) wrote:
>> But the scary part is you don't know how long *ago* the crash was. Because a hint-bit-only change w/ a torn-page is a non event in PostgreSQL *DESIGN*, on crash recovery, it doesn't do anything to try and scrub every page in the database.
>
> Fair enough, but, could we distinguish these two cases? In other words, would it be possible to detect if a page was torn due to a 'traditional' crash and not complain in that case, but complain if there's a CRC failure and it *doesn't* look like a torn page?

No.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Command Triggers
Excerpts from Dimitri Fontaine's message of dom dic 18 16:54:11 -0300 2011:
> Tom Lane t...@sss.pgh.pa.us writes:
>> Hmm ... I don't think that I *am* ok with that. ISTM that we'd then find ourselves with any changes in utility statement parse trees amounting to a user-visible API break, and that's not an acceptable situation.
>
> Oh, you mean like exposing the parser for syntax coloring etc. I failed to see it's the same case. Do we have an acceptable proposal on that front yet?

The conclusion that was reached during the developers' meeting was that those interested should use a technique similar to the one used by the ecpg parser, namely use some sort of tool to transform the gram.y source code into something else (different productions). That idea is not useful to you here, I'm afraid.

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Patch to allow users to kill their own queries
On 12/18/2011 07:31 AM, Magnus Hagander wrote:
> * I restructured the if statements, because I had a hard time following the comments around that ;) I find this one easier - but I'm happy to change back if you think your version was more readable.

That looks fine. I highlighted this because I had a feeling there was still some gain to be had here, just didn't see it myself. This works.

> * The error message in pg_signal_backend breaks the abstraction, because it specifically talks about terminating the other backend - when it's not supposed to know about that in that function. I think we either need to get rid of the hint completely, or we need to find a way to issue it from the caller. Or pass it as a parameter. It's fine for now since we only have two signals, but we might have more in the future..

I feel that including a hint in the pg_terminate_backend case is a UI requirement. If someone has made it as far as discovering that function exists, tries calling it, and it fails, the friendly thing to do is point them toward a direction that might work better. Little things like that make a huge difference in how friendly the software appears to its users; this is even more true in cases where version improvements actually expand what can and cannot be done.

My quick-and-dirty side thinks that just documenting the potential future issues would be enough: "Due to the limited number of callers of this function, the hint message here can be certain that pg_terminate_backend provides the only path to reach this point. If more callers to pg_signal_backend appear, a more generic hint mechanism might be needed here." If you must have this more generic mechanism available, I would accept refactoring to provide it instead. What I wouldn't want to live with is a commit of this where the hint goes away completely. It's taken a long time chopping the specification to get this feature sorted out; we might as well make what's left be the best it can be now.
-- Greg Smith, 2ndQuadrant US, g...@2ndquadrant.com, Baltimore, MD
Re: [HACKERS] review: CHECK FUNCTION statement
On 12/17/2011 04:00 PM, Pavel Stehule wrote:
> I use it for checking of my most large plpgsql project - it is about 300KB plpgsql procedures - but this code is not free - and this module helps to find lot of bugs.

Great. If you continue to check against that regularly, that makes me feel better. I was guessing you had a large body of such source code around, and knowing it executed correctly against all of it improves my confidence here.

-- Greg Smith, 2ndQuadrant US, g...@2ndquadrant.com, Baltimore, MD
Re: [HACKERS] pgstat wait timeout
OS: I am using Windows Server 2003.

Version upgrade: hmm.. is there any fix/change related to this issue in 9.0.6? If yes, I will upgrade in the next scheduled downtime (I am using this as a production server)... Postgres queries are very occasionally used (a set of calls once in 30 minutes), so I guess I am not calling my DB component heavily.
Re: [HACKERS] RangeVarGetRelid()
On Fri, Dec 16, 2011 at 10:32 AM, Noah Misch n...@leadboat.com wrote:
> I agree, but my point is that so far we have no callbacks that differ only in that detail. I accept that we'd probably want to avoid that. To illustrate what I had in mind, here's a version of your patch that has five callers sharing a callback. The patch is against d039fd51f79e, just prior to your recent commits.

I don't think RenameRelation() ought to wait until we've taken the lock before doing this:

+   aclresult = pg_namespace_aclcheck(namespaceId, GetUserId(), ACL_CREATE);
+   if (aclresult != ACLCHECK_OK)
+       aclcheck_error(aclresult, ACL_KIND_NAMESPACE,
+                      get_namespace_name(namespaceId));

I'm not sure how we can distinguish in a principled way between permissions checks that must be conducted before locking the relation and those that can be done afterwards. And I don't really want to make that distinction, anyhow. But I see your point, otherwise.

What I'm tempted to do is define a new function RangeVarGetRelidCheckOwner(RangeVar *relation, LOCKMODE lockmode, bool system_tables_ok) that will arrange to look up the OID, check that you own it, optionally check that it's not a system table, and acquire the specified lock. Internally that will call RangeVarGetRelidExtended() with a callback; the callback arg can be used to pass through the system_tables_ok flag.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] pg_upgrade with plpython is broken
Upgrading an instance containing plpython from <= 8.4 to >= 9.0 is broken because the module plpython.so was renamed to plpython2.so, and so the pg_upgrade check for loadable libraries fails thus:

Your installation references loadable libraries that are missing from the new installation.

etc. Installing a symlink fixes the issue. Should we teach pg_upgrade about this renaming, or should we install the symlink as part of the standard installation?
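The symlink workaround mentioned above can be sketched as follows. This is an assumption about the fix, not an official procedure: the real library directory would come from `pg_config --pkglibdir`; a temporary directory stands in for it here.

```shell
# Stand-in for the PostgreSQL package library directory
# (on a real install: LIBDIR=$(pg_config --pkglibdir)).
LIBDIR=$(mktemp -d)

# The module as shipped in >= 9.0, after the rename.
touch "$LIBDIR/plpython2.so"

# Compatibility link under the old name that <= 8.4 dumps reference,
# so pg_upgrade's loadable-library check can resolve "plpython".
ln -s plpython2.so "$LIBDIR/plpython.so"

readlink "$LIBDIR/plpython.so"
```

With the link in place, a LOAD of the old library name resolves to the renamed module and the pg_upgrade check should pass.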
[HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries
I recently upgraded my postgres server from 9.0 to 9.1.2 and I am finding a peculiar problem. I have a program that periodically adds rows to this table using INSERT. Typically the number of rows is just 1-2 thousand when the table already has 500K rows. Whenever the program is adding rows, the performance of the search query on the same table is very bad. The query uses the GIN index and the tsearch ranking function ts_rank_cd. This never happened earlier with postgres 9.0. Is there a known issue with Postgres 9.1? Or how to report this problem?

-Sushant.
Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries
On 19-12-2011 12:30, Sushant Sinha wrote:
> I recently upgraded my postgres server from 9.0 to 9.1.2 and I am finding a peculiar problem. I have a program that periodically adds rows to this table using INSERT. Typically the number of rows is just 1-2 thousand when the table already has 500K rows. Whenever the program is adding rows, the performance of the search query on the same table is very bad. The query uses the gin index and the tsearch ranking function ts_rank_cd.

How bad is bad? It seems you are suffering from a doesn't-fit-in-cache problem, no?

> This never happened earlier with postgres 9.0. Is there a known issue with Postgres 9.1? Or how to report this problem?

Test case? Query times? Query plans? Are you sure the compile options are the same? What about the configuration parameters? What is the exact version of both installations?

-- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
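One knob worth checking in a case like this (an assumption about the cause, not a confirmed diagnosis for this report): GIN indexes have a fastupdate pending list that buffers insertions, and concurrent searches must scan that unsorted pending list until it is flushed. Disabling it trades slower inserts for more predictable search latency:

```sql
-- Index and table names here are illustrative; look up the real
-- GIN index name in pg_indexes for the affected table.
ALTER INDEX articles_tsv_gin_idx SET (fastupdate = off);

-- Flush whatever is already queued in the pending list.
VACUUM ANALYZE articles;
```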
Re: [HACKERS] review: CHECK FUNCTION statement
2011/12/19 Greg Smith g...@2ndquadrant.com:
> On 12/17/2011 04:00 PM, Pavel Stehule wrote:
>> I use it for checking of my most large plpgsql project - it is about 300KB plpgsql procedures - but this code is not free - and this module helps to find lot of bugs.
>
> Great. If you continue to check against that regularly, that makes me feel better. I was guessing you had a large body of such source code around, and knowing it executed correctly against all of it improves my confidence here.

I am not alone: a subset is used in plpgsql_lint, and I know about some commercial users that use it too.

https://github.com/okbob/plpgsql_lint

But the code in CHECK FUNCTION is a little newer. It could be interesting to test some code written by people with a background in another database, because they use different patterns - I don't use explicit cursors, for example; on the other hand, I used exceptions intensively in my last project. We can ask the people from LedgerSMB about testing, if somebody has a contact.

Regards
Pavel
Re: [HACKERS] Autonomous subtransactions
On Sun, Dec 18, 2011 at 10:28, Gianni Ciolli gianni.cio...@2ndquadrant.it wrote:
> http://wiki.postgresql.org/wiki/Autonomous_subtransactions
>
> It is meant to be an ongoing project, requesting comments and contributions, rather than a conclusive document.

In addition to what Jim Nasby said, this proposal seems a bit inflexible. In particular:

1. It limits us to exactly 2 autonomous transactions at any time (the main one and the subtransaction).

2. There's no reason why two autonomous transactions should have a main/sub relationship. They are autonomous -- they should not depend on the state of the outer transaction.

Now, the initial implementation may well have such limits, but I think you should design your proposal to accommodate the above two features in the future without having to redesign the syntax.

Maybe look at SAVEPOINTs for inspiration. There can be multiple savepoints in a single transaction, and they can be released/rolled back to, at any time.

Regards,
Marti
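For reference, the SAVEPOINT flexibility being pointed to looks like this in standard PostgreSQL syntax (the table is illustrative):

```sql
BEGIN;
INSERT INTO log VALUES ('step 1');
SAVEPOINT a;
INSERT INTO log VALUES ('step 2');
SAVEPOINT b;                -- multiple savepoints can be live at once
INSERT INTO log VALUES ('step 3');
ROLLBACK TO SAVEPOINT a;    -- undoes steps 2 and 3, keeps step 1
RELEASE SAVEPOINT a;        -- a savepoint can be dropped at any time
COMMIT;                     -- only 'step 1' is committed
```

An autonomous-transaction design with the same shape would allow several independent transactions to be opened, committed, or aborted at arbitrary points, rather than a single fixed main/sub pair.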
Re: [HACKERS] Command Triggers
On Mon, Dec 19, 2011 at 9:31 AM, Greg Smith g...@2ndquadrant.com wrote:
> That seems pretty sensible; I wouldn't want to make it a hard requirement though. There are three major ways this could go for 9.2:
>
> 1) Nothing is changed in core, the best that can be done is whatever you can cram into an extension.
>
> 2) 9.2 is released with Command Triggers using nodeToString() output as the detailed description. That part is labeled experimental and the API is expected to change completely in the next release
>
> 3) 9.2 gets enough JSON support to also have an early nodeToJSON API instead. Now it's labeled early release and some API changes may be needed in future releases.
>
> From the perspective of enabling a developer community to spring up around working in this area, I don't see a huge difference between (2) and (3).

I do. Anyone coding in PL/pgsql is going to find the nodeToString() output unusable, and we can easily provide a built-in JSON datatype with enough richness to make that problem go away in time for 9.2. People in PL/python and PL/perl may be a bit better off, but I see no reason to ship something for 9.2 and then break it for 9.3 when we could perfectly well make that compatibility break before the release. (And, in case it's not clear, yes, I am volunteering to do the work, if it comes down to that.)

But before we get bogged down in the data representation, I think we need to make a more fundamental decision: to what extent are we OK with exporting the parse tree at all, in any form? Tom is arguing that we shouldn't, and I see his point: the recent DROP command rework would have broken everybody's command triggers if we had adopted this proposal, and that would be a real shame, because I don't particularly like the idea that we can't continue to improve the code and refactor things because someone out there might be depending on an older and less well-considered behavior. On the flip side, I don't really see any other way to make this feature work at all.
I think that AFTER CREATE triggers and BEFORE DROP triggers could potentially be implemented by just passing OIDs in, and that might be useful enough for many people. But what about ALTER? I don't see that you're going to be able to write any sort of meaningful triggers around ALTER without passing at least some of the parse tree information to the trigger function. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
Hi, On Monday, December 19, 2011 05:12:13 PM Robert Haas wrote: On Mon, Dec 19, 2011 at 9:31 AM, Greg Smith g...@2ndquadrant.com wrote: That seems pretty sensible; I wouldn't want to make it a hard requirement though. There are three major ways this could go for 9.2: 1) Nothing is changed in core, the best that can be done is whatever you can cram into an extension. 2) 9.2 is released with Command Triggers using nodeToString() output as the detailed description. That part is labeled experimental and the API is expected to change completely in the next release 3) 9.2 gets enough JSON support to also have an early nodeToJSON API instead. Now it's labeled early release and some API changes may be needed in future releases. From the perspective of enabling a developer community to spring up around working in this area, I don't see a huge difference between (2) and (3). I do. Anyone coding in PL/pgsql is going to find the nodeToString() output unusable, and we can easily provide a built-in JSON datatype with enough richness to make that problem go away in time for 9.2. People in PL/python and PL/perl may be a bit better off, but I see no reason to ship something for 9.2 and then break it for 9.3 when we could perfectly well make that compatibility break before the release. (And, in case it's not clear, yes, I am volunteering to do the work, if it comes down to that.) I personally think you're underestimating the complexity of providing a sensible json compatibility shim on top of the nodestring representation. But I would like to be proven wrong ;) What's your idea? But before we get bogged down in the data representation, I think we need to make a more fundamental decision: to what extent are we OK with exporting the parse tree at all, in any form? 
Tom is arguing that we shouldn't, and I see his point: the recent DROP command rework would have broken everybody's command triggers if we had adopted this proposal, and that would be a real shame, because I don't particularly like the idea that we can't continue to improve the code and refactor things because someone out there might be depending on an older and less well-considered behavior. I don't see any realistic way to present the data in a way that's abstracted from the internals for now. The infrastructure is completely new and we don't really know what it will be used for. I personally have no problem requiring that any nontrivial nodestring access has to be done via C functions for now. Building a completely new form of parse tree seems way too much effort/flamebait for me. Andres
Re: [HACKERS] Command Triggers
2011/12/19 Robert Haas robertmh...@gmail.com: On Mon, Dec 19, 2011 at 9:31 AM, Greg Smith g...@2ndquadrant.com wrote: That seems pretty sensible; I wouldn't want to make it a hard requirement though. There are three major ways this could go for 9.2: 1) Nothing is changed in core, the best that can be done is whatever you can cram into an extension. 2) 9.2 is released with Command Triggers using nodeToString() output as the detailed description. That part is labeled experimental and the API is expected to change completely in the next release 3) 9.2 gets enough JSON support to also have an early nodeToJSON API instead. Now it's labeled early release and some API changes may be needed in future releases. From the perspective of enabling a developer community to spring up around working in this area, I don't see a huge difference between (2) and (3). I do. Anyone coding in PL/pgsql is going to find the nodeToString() output unusable, and we can easily provide a built-in JSON datatype with enough richness to make that problem go away in time for 9.2. People in PL/python and PL/perl may be a bit better off, but I see no reason to ship something for 9.2 and then break it for 9.3 when we could perfectly well make that compatibility break before the release. (And, in case it's not clear, yes, I am volunteering to do the work, if it comes down to that.) Absolutely. Parsing such a document is not a good way for PL/pgSQL. We can enhance the diagnostics statement for trigger values. There should be a lot of variables, and it is cheap because we can fetch the content only when it is requested: STATEMENT_NAME, OBJECT_SCHEMA, OBJECT_NAME, OBJECT_TYPE, OBJECT_OID, ATTRIBUT_NAME, ATTRIBUT_OID, ATTRIBUT_TYPE, STATEMENT_PARAMETERS_ARRAY. PL/pgSQL is not good for recursive data - it would be nice if all the necessary data could be flat. 
Regards Pavel But before we get bogged down in the data representation, I think we need to make a more fundamental decision: to what extent are we OK with exporting the parse tree at all, in any form? Tom is arguing that we shouldn't, and I see his point: the recent DROP command rework would have broken everybody's command triggers if we had adopted this proposal, and that would be a real shame, because I don't particularly like the idea that we can't continue to improve the code and refactor things because someone out there might be depending on an older and less well-considered behavior. On the flip side, I don't really see any other way to make this feature work at all. I think that AFTER CREATE triggers and BEFORE DROP triggers could potentially be implemented by just passing OIDs in, and that might be useful enough for many people. But what about ALTER? I don't see that you're going to be able to write any sort of meaningful triggers around ALTER without passing at least some of the parse tree information to the trigger function. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Command Triggers
On Mon, Dec 19, 2011 at 11:20 AM, Andres Freund and...@anarazel.de wrote: I do. Anyone coding in PL/pgsql is going to find the nodeToString() output unusable, and we can easily provide a built-in JSON datatype with enough richness to make that problem go away in time for 9.2. People in PL/python and PL/perl may be a bit better off, but I see no reason to ship something for 9.2 and then break it for 9.3 when we could perfectly well make that compatibility break before the release. (And, in case it's not clear, yes, I am volunteering to do the work, if it comes down to that.) I personally think you're underestimating the complexity of providing a sensible json compatibility shim on top of the nodestring representation. But I would like to be proven wrong ;) What's your idea? I haven't gotten that far down into the minutiae yet. :-) But the reason I think it won't be too bad is because the current representation is darn close to JSON already:

{VAR :varno 2 :varattno 1 :vartype 25 :vartypmod -1 :varcollid 100 :varlevelsup 0 :varnoold 2 :varoattno 1 :location 9998}

=

{"_":"VAR","varno":2,"varattno":1,"vartype":25,"vartypmod":-1,"varcollid":100,"varlevelsup":0,"varnoold":2,"varoattno":1,"location":9998}

If you've got something like:

{OPEXPR :opno 98 :opfuncid 67 :opresulttype 16 :opretset false :opcollid 0 :inputcollid 100 :args ({VAR :varno 2 :varattno 1 :vartype 25 :vartypmod -1 :varcollid 100 :varlevelsup 0 :varnoold 2 :varoattno 1 :location 9998} {VAR :varno 1 :varattno 1 :vartype 25 :vartypmod -1 :varcollid 100 :varlevelsup 0 :varnoold 1 :varoattno 1 :location 10009}) :location 10007}

...then the value for the "args" label will just be an object rather than an integer. But before we get bogged down in the data representation, I think we need to make a more fundamental decision: to what extent are we OK with exporting the parse tree at all, in any form? 
Tom is arguing that we shouldn't, and I see his point: the recent DROP command rework would have broken everybody's command triggers if we had adopted this proposal, and that would be a real shame, because I don't particularly like the idea that we can't continue to improve the code and refactor things because someone out there might be depending on an older and less well-considered behavior. I don't see any realistic way to present the data in a way that's abstracted from the internals for now. The infrastructure is completely new and we don't really know what it will be used for. That's my feeling as well, but I'm hoping Tom or someone else has a better idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
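[Editor's note] Robert's point that the node-string syntax is already darn close to JSON can be illustrated mechanically. The following is an illustrative Python sketch, not PostgreSQL code (a real implementation would live in outfuncs.c, as Tom suggests downthread); the "_" key for the node tag is a hypothetical convention taken from Robert's example, and the tokenizer only handles the simplified inputs shown:

```python
import json

def parse_node(tokens, pos):
    # expects tokens[pos] == '{'; returns (dict, position after the closing '}')
    assert tokens[pos] == '{'
    node = {"_": tokens[pos + 1]}          # node tag, e.g. VAR or OPEXPR
    pos += 2
    while tokens[pos] != '}':
        label = tokens[pos].lstrip(':')    # field labels look like :varno
        value, pos = parse_value(tokens, pos + 1)
        node[label] = value
    return node, pos + 1

def parse_value(tokens, pos):
    tok = tokens[pos]
    if tok == '{':                         # nested node
        return parse_node(tokens, pos)
    if tok == '(':                         # a List of values
        items = []
        pos += 1
        while tokens[pos] != ')':
            item, pos = parse_value(tokens, pos)
            items.append(item)
        return items, pos + 1
    if tok in ('true', 'false'):           # booleans
        return tok == 'true', pos + 1
    try:                                   # integers, else keep as string
        return int(tok), pos + 1
    except ValueError:
        return tok, pos + 1

def nodestring_to_json(s):
    # crude tokenizer: isolate structural characters, then split on whitespace
    for ch in '{}()':
        s = s.replace(ch, ' %s ' % ch)
    node, _ = parse_node(s.split(), 0)
    return json.dumps(node)
```

For example, `nodestring_to_json("{VAR :varno 2 :vartype 25}")` yields `{"_": "VAR", "varno": 2, "vartype": 25}`, and a parenthesized `:args` list becomes a JSON array of objects, which is the shape Robert describes.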
Re: [HACKERS] pgstat wait timeout
On Mon, Dec 19, 2011 at 10:02 AM, pratikchirania pratik.chira...@hp.com wrote: Version upgrade: hmm.. Is there any fix/change related to this issue in 9.0.6? You could read the release notes for those minor version upgrades. Based on a quick look through the commit logs, and a quick grep of release-9-0.sgml, I don't think so. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Escaping : in .pgpass - code or docs bug?
On Sat, Dec 17, 2011 at 3:27 AM, Ross Reedstrom reeds...@rice.edu wrote: On Fri, Dec 16, 2011 at 02:55:09PM +, Richard Huxton wrote: According to the docs [1], you should escape embedded colons in .pgpass (fair enough). Below is PG 9.1.1, user = te:st, db = te:st, password = te:st

$ cat ~/.pgpass
*:*:te:st:te:st:te:st
$ psql91 -U te:st -d te:st
te:st=>

$ cat ~/.pgpass
*:*:te\:st:te\:st:te:st
$ psql91 -U te:st -d te:st
te:st=>

$ cat ~/.pgpass
*:*:te\:st:te\:st:te\:st
$ psql91 -U te:st -d te:st
psql: FATAL: password authentication failed for user "te:st"
password retrieved from file "/home/richardh/.pgpass"

I'm a bit puzzled how it manages without the escaping in the first case. There's a lack of consistency though that either needs documenting or fixing. Hmm, seems the code in fe-connect.c that reads the password out of .pgpass does this:

if ((t = pwdfMatchesString(t, hostname)) == NULL ||
    (t = pwdfMatchesString(t, port)) == NULL ||
    (t = pwdfMatchesString(t, dbname)) == NULL ||
    (t = pwdfMatchesString(t, username)) == NULL)
[...]

pwdfMatchesString 'eats' the string buffer until the next unmatched character or unescaped colon. If it falls out the bottom of that, the rest of the line is returned as the candidate password. Since the code that does the backslash detection is in pwdfMatchesString(), and the password never goes through that function, the escapes are not cleaned up. This should either be fixed by changing the documentation to say not to escape colons or backslashes in the password part only, or by modifying this function (PasswordFromFile) to silently unescape the password string. It already copies it. My vote is for a doc correction in the back-branches and a behavior change in master. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
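[Editor's note] The asymmetry Ross describes can be modeled in a few lines. This is an illustrative Python sketch, not the actual libpq C code; the names mirror pwdfMatchesString()/PasswordFromFile() but the logic is simplified (e.g. '*' is treated as a whole-field wildcard). The point it demonstrates is that the four matched fields are unescaped as they are consumed, while the password remainder is returned verbatim, which is exactly why escaping the password field breaks authentication:

```python
def match_field(line, pos, wanted):
    """Consume one colon-terminated field starting at line[pos], honoring
    backslash escapes; return the position past the ':' delimiter if the
    field equals `wanted` (or is the '*' wildcard), else None."""
    buf = []
    while pos < len(line) and line[pos] != ':':
        if line[pos] == '\\' and pos + 1 < len(line):
            pos += 1                  # drop the backslash, keep the next char
        buf.append(line[pos])
        pos += 1
    field = ''.join(buf)
    if field == '*' or field == wanted:
        return pos + 1                # skip the ':' delimiter
    return None

def password_from_line(line, host, port, db, user):
    """Return the password if all four fields match, else None.  Note the
    remainder is returned *without* unescaping -- the behavior at issue."""
    pos = 0
    for wanted in (host, port, db, user):
        pos = match_field(line, pos, wanted)
        if pos is None:
            return None
    return line[pos:]
```

With the entry `*:*:te\:st:te\:st:te:st` this returns the intended password `te:st`, but with the "consistently escaped" entry `*:*:te\:st:te\:st:te\:st` it returns the literal string `te\:st`, reproducing the failure in Richard's third test.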
Re: [HACKERS] RangeVarGetRelid()
On Mon, Dec 19, 2011 at 10:11:23AM -0500, Robert Haas wrote: On Fri, Dec 16, 2011 at 10:32 AM, Noah Misch n...@leadboat.com wrote: I agree, but my point is that so far we have no callbacks that differ only in that detail. I accept that we'd probably want to avoid that. To illustrate what I had in mind, here's a version of your patch that has five callers sharing a callback. The patch is against d039fd51f79e, just prior to your recent commits. I don't think RenameRelation() ought to wait until we've taken the lock before doing this:

+ aclresult = pg_namespace_aclcheck(namespaceId, GetUserId(), ACL_CREATE);
+ if (aclresult != ACLCHECK_OK)
+     aclcheck_error(aclresult, ACL_KIND_NAMESPACE,
+                    get_namespace_name(namespaceId));

I'm not sure how we can distinguish in a principled way between permissions checks that must be conducted before locking the relation and those that can be done afterwards. And I don't really want to make that distinction, anyhow. Here's the rule I used: the pre-lock checks must prove that the user could, by some command working as-designed, have taken a lock at least as strong as the one we're about to acquire. How does that work out in practice? Relation owners can always take AccessExclusiveLock by way of a DROP command, so ownership is always sufficient as a pre-lock test. The above namespace check is no exception; the object owner has authority to take the AccessExclusiveLock independent of his authority to ultimately complete the rename. (Changes arising from the recent discussions about locking namespaces during DDL would complicate the situation in other ways.) But I see your point, otherwise. What I'm tempted to do is define a new function RangeVarGetRelidCheckOwner(RangeVar *relation, LOCKMODE lockmode, bool system_tables_ok) that will arrange to look up the OID, check that you own it, optionally check that it's not a system table, and acquire the specified lock. 
Internally that will call RangeVarGetRelidExtended() with a callback; the callback arg can be used to pass through the system_tables_ok flag. Is the system catalog check special? Thanks, nm
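[Editor's note] The callback arrangement under discussion can be sketched as follows. This is an illustrative Python sketch with hypothetical names and a toy catalog; RangeVarGetRelidExtended() itself is C. The point is that the caller supplies the permission check, which runs after the name lookup but before the lock would be taken, and extra state (here, the system_tables_ok flag) rides along as the callback argument:

```python
def range_var_get_relid(catalog, relname, user, callback, callback_arg):
    """Sketch of the RangeVarGetRelidExtended() pattern: resolve the name,
    then run a caller-supplied permission callback *before* locking."""
    entry = catalog.get(relname)
    if entry is None:
        raise LookupError('relation "%s" does not exist' % relname)
    callback(entry, user, callback_arg)     # may raise before we lock
    # ... LockRelationOid(entry["oid"], lockmode) would happen here ...
    return entry["oid"]

def check_owner(entry, user, system_tables_ok):
    """Shared pre-lock callback in the spirit of the proposed
    RangeVarGetRelidCheckOwner(): ownership is a sufficient pre-lock test
    (an owner could take AccessExclusiveLock via DROP anyway, per Noah's
    rule above), and the system-catalog check passes through as the arg."""
    if entry["is_system"] and not system_tables_ok:
        raise PermissionError("permission denied: system catalog")
    if entry["owner"] != user:
        raise PermissionError("must be owner of relation")
```

Several call sites can then share `check_owner` and differ only in the `callback_arg` they pass, which is the deduplication the thread is aiming for.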
Re: [HACKERS] array behavior
On Fri, Dec 16, 2011 at 00:15, amit sehas cu...@yahoo.com wrote: There is the TOAST mechanism for oversized tuples but that is still considered to be a single tuple. Is there any circumstance in which an attribute which is an array will be broken up into individual tuples which are somehow associated with the main tuple. I think you've misunderstood TOAST. That's exactly what TOAST does now. TOAST works by storing individual *attributes* (values) in a separate table -- not entire tuples. All smaller columns in a row are still stored together with other rows, and only large fields are split out. and additionally it may have undesirable performance impact if the queries are not even interested in seeing the array when fetching the object ? If your query doesn't touch the TOASTed field, it's not even read from the TOAST table. However, large values that are smaller than the TOAST threshold (2kB compressed) sometimes cause problems, since they're stored inline in the heap, bloating the table size and thus increasing the time it takes to scan it. Regards, Marti
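[Editor's note] A toy model of Marti's point, that attributes are TOASTed individually rather than whole tuples, might look like this. Illustrative Python only; the 2 kB figure is the compressed-size threshold mentioned above, and the real backend heuristics are considerably more involved:

```python
import zlib

TOAST_THRESHOLD = 2000   # roughly the ~2 kB compressed threshold, simplified

def plan_storage(row):
    """Toy model of per-attribute TOASTing: each *value* is considered
    separately; only values still large after compression move out of
    line.  The unit of the decision is the value, not the tuple."""
    inline, toasted = {}, {}
    for col, value in row.items():
        compressed = zlib.compress(value)
        if len(compressed) > TOAST_THRESHOLD:
            toasted[col] = compressed              # stored out of line
        else:
            inline[col] = min(value, compressed, key=len)  # keep shorter form
    return inline, toasted
```

A row with one huge incompressible field and several small ones ends up with only the huge field "toasted"; the small columns stay inline in the heap, which also illustrates Marti's caveat about sub-threshold bloat.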
Re: [HACKERS] pgstat wait timeout
On 12/19/2011 11:45 AM, Robert Haas wrote: On Mon, Dec 19, 2011 at 10:02 AM, pratikchirania pratik.chira...@hp.com wrote: Version upgrade: hmm.. Is there any fix/change related to this issue in 9.0.6? You could read the release notes for those minor version upgrades. Based on a quick look through the commit logs, and a quick grep of release-9-0.sgml, I don't think so. Would this be alleviated by setting stats_temp_directory to point to a ramdisk? cheers andrew
Re: [HACKERS] Page Checksums
On Mon, Dec 19, 2011 at 09:34:51AM -0500, Robert Haas wrote: On Mon, Dec 19, 2011 at 9:14 AM, Stephen Frost sfr...@snowman.net wrote: * Aidan Van Dyk (ai...@highrise.ca) wrote: But the scary part is you don't know how long *ago* the crash was. Because a hint-bit-only change w/ a torn-page is a non event in PostgreSQL *DESIGN*, on crash recovery, it doesn't do anything to try and scrub every page in the database. Fair enough, but, could we distinguish these two cases? In other words, would it be possible to detect if a page was torn due to a 'traditional' crash and not complain in that case, but complain if there's a CRC failure and it *doesn't* look like a torn page? No. Would you be so kind as to elucidate this a bit? Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Review: Non-inheritable check constraints
It seems hard to believe that ATExecDropConstraint() doesn't need any adjustment. Yeah, I think we could return early for "only"-type constraints. Also, shouldn't the systable scan break out of the while loop after a matching constraint name has been found? As of now, it's doing a full scan of all the constraints for the given relation.

@@ -6755,6 +6765,7 @@ ATExecDropConstraint(Relation rel, const char *constrName,
     HeapTuple   tuple;
     bool        found = false;
     bool        is_check_constraint = false;
+    bool        is_only_constraint = false;

     /* At top level, permission check was done in ATPrepCmd, else do it */
     if (recursing)
@@ -6791,6 +6802,12 @@ ATExecDropConstraint(Relation rel, const char *constrName,
         /* Right now only CHECK constraints can be inherited */
         if (con->contype == CONSTRAINT_CHECK)
             is_check_constraint = true;
+
+        if (con->conisonly)
+        {
+            Assert(is_check_constraint);
+            is_only_constraint = true;
+        }

         /*
          * Perform the actual constraint deletion
@@ -6802,6 +6819,9 @@ ATExecDropConstraint(Relation rel, const char *constrName,
         performDeletion(&conobj, behavior);
         found = true;
+
+        /* constraint found - break from the while loop now */
+        break;
     }

     systable_endscan(scan);

@@ -6830,7 +6850,7 @@ ATExecDropConstraint(Relation rel, const char *constrName,
      * routines, we have to do this one level of recursion at a time; we can't
      * use find_all_inheritors to do it in one pass.
      */
-    if (is_check_constraint)
+    if (is_check_constraint && !is_only_constraint)
         children = find_inheritance_children(RelationGetRelid(rel), lockmode);
     else
         children = NIL;

PFA, revised version containing the above changes based on Alvaro's v4 patch. Regards, Nikhils

Attachment: non_inh_check_v5.0.patch
Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries
On Mon, Dec 19, 2011 at 17:30, Sushant Sinha sushant...@gmail.com wrote: This never happened earlier with postgres 9.0 Is there a known issue with Postgres 9.1? Or how to report this problem? What *did* change in 9.1 is that there's new GIN cost estimation in the planner. If you do EXPLAIN ANALYZE for your queries, do the plans differ between 9.0 and 9.1? E.g. doing an index scan instead of a seq scan or vice versa. The query uses the gin index and the tsearch ranking function ts_rank_cd. Another thought -- have you read about the GIN fast updates feature? This existed in 9.0 too. Instead of updating the index directly, GIN appends all changes to a sequential list, which needs to be scanned in whole for read queries. The periodic autovacuum process has to merge these values back into the index. Maybe the solution is to tune autovacuum to run more often on the table. http://www.postgresql.org/docs/9.1/static/gin-implementation.html Regards, Marti
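[Editor's note] The fast-update behavior Marti describes can be modeled with a toy index. Illustrative Python only, not the actual GIN code: inserts are cheap because they only append to a pending list, but every search pays to scan that whole list until a "vacuum" merges it back into the main index:

```python
class ToyGinIndex:
    """Toy model of GIN fast update: inserts append to a pending list; a
    query must scan the whole pending list on top of the index probe;
    vacuum() merges pending entries into the main index."""
    def __init__(self):
        self.index = {}         # term -> set of row ids (the real index)
        self.pending = []       # (term, rowid) pairs awaiting merge

    def insert(self, term, rowid):
        self.pending.append((term, rowid))   # cheap, work deferred

    def search(self, term):
        hits = set(self.index.get(term, ()))
        scanned = 0
        for t, rowid in self.pending:        # full pending-list scan
            scanned += 1
            if t == term:
                hits.add(rowid)
        return hits, scanned                 # scanned ~ extra read cost

    def vacuum(self):
        """Merge the pending list into the index, as autovacuum does."""
        for t, rowid in self.pending:
            self.index.setdefault(t, set()).add(rowid)
        self.pending.clear()
```

After a burst of inserts, search cost grows linearly with the pending list; after a vacuum it drops back to the plain index probe, which is why tuning autovacuum to run more often on the table (or disabling fastupdate, as Jesper reports downthread) changes the latency profile.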
Re: [HACKERS] Page Checksums
On Monday, December 19, 2011 03:33:22 PM Alvaro Herrera wrote: Excerpts from Stephen Frost's message of lun dic 19 11:18:21 -0300 2011: * Aidan Van Dyk (ai...@highrise.ca) wrote: #) Anybody investigated putting the CRC in a relation fork, but not right in the data block? If the CRC contains a timestamp, and is WAL logged before the write, at least on reading a block with a wrong checksum, if a warning is emitted, the timestamp could be looked at by whoever is reading the warning and know tht the block was written shortly before the crash $X $PERIODS ago I do like the idea of putting the CRC info in a relation fork, if it can be made to work decently, as we might be able to then support it on a per-relation basis, and maybe even avoid the on-disk format change.. Of course, I'm sure there's all kinds of problems with that approach, but it might be worth some thinking about. I think the main objection to that idea was that if you lose a single page of CRCs you have hundreds of data pages which no longer have good CRCs. Which I find a pretty non-argument because there is lots of SPOF data in a cluster (WAL, control record) anyway... If recent data starts to fail you have to restore from backup anyway. Andres
Re: [HACKERS] Page Checksums
* David Fetter (da...@fetter.org) wrote: On Mon, Dec 19, 2011 at 09:34:51AM -0500, Robert Haas wrote: On Mon, Dec 19, 2011 at 9:14 AM, Stephen Frost sfr...@snowman.net wrote: Fair enough, but, could we distinguish these two cases? In other words, would it be possible to detect if a page was torn due to a 'traditional' crash and not complain in that case, but complain if there's a CRC failure and it *doesn't* look like a torn page? No. Would you be so kind as to elucidate this a bit? I'm guessing, based on some discussion on IRC, that it's because we don't really 'detect' torn pages today, when it's due to a hint-bit-only update. With all the trouble due to hint-bit updates, and if they're written out or not, makes me wish we could just avoid doing hint-bit only updates to disk somehow.. Or log them when we do them. Both of those have their own drawbacks, of course. Thanks, Stephen
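[Editor's note] Why a checksum alone cannot tell a benign torn hint-bit write from real corruption can be demonstrated concretely. Illustrative Python only; it assumes 8 kB pages and 512-byte sectors, and stores the expected CRC out of band for simplicity:

```python
import zlib

PAGE, SECTOR = 8192, 512   # 8 kB page, 512-byte disk sectors (assumed)

def torn_write(old, new, sectors_written):
    """Simulate a crash mid-write: the first N sectors of the new page
    image reach disk; the remaining sectors still hold the old image."""
    cut = sectors_written * SECTOR
    return new[:cut] + old[cut:]

old = bytes(PAGE)                  # page image before the update
new = bytearray(old)
new[5000] ^= 0x01                  # a hint-bit-only change, never WAL-logged
new = bytes(new)

crc_expected = zlib.crc32(new)     # checksum computed over the new image
on_disk = torn_write(old, new, sectors_written=1)   # crash after one sector

assert zlib.crc32(on_disk) != crc_expected   # the CRC check fails...
assert on_disk == old                        # ...yet the page equals a
                                             # perfectly valid old image
```

The verification failure is indistinguishable from genuine corruption, even though the only lost update was a semantically harmless hint bit, and (because the change was never WAL-logged) crash recovery will not rewrite the page either. That is the "No." above, unpacked.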
Re: [HACKERS] Page Checksums
* Andres Freund (and...@anarazel.de) wrote: On Monday, December 19, 2011 03:33:22 PM Alvaro Herrera wrote: I do like the idea of putting the CRC info in a relation fork, if it can be made to work decently, as we might be able to then support it on a per-relation basis, and maybe even avoid the on-disk format change.. I think the main objection to that idea was that if you lose a single page of CRCs you have hundreds of data pages which no longer have good CRCs. Which I find a pretty non-argument because there is lots of SPOF data in a cluster (WAL, control record) anyway... If recent data starts to fail you have to restore from backup anyway. I agree with Andres on this one.. Also, if we use a CRC on the pages in the CRC fork, hopefully we'd be able to detect when a bad block impacted the CRC fork and differentiate that from a whole slew of bad blocks in the heap.. There might be an issue there with handling locking and having to go through the page-level lock on the CRC page, which locks a lot more pages in the heap and therefore reduces scalability.. Don't we have a similar issue with the visibility map though? Thanks, Stephen
Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries
On Mon, 2011-12-19 at 19:08 +0200, Marti Raudsepp wrote: Another thought -- have you read about the GIN fast updates feature? This existed in 9.0 too. Instead of updating the index directly, GIN appends all changes to a sequential list, which needs to be scanned in whole for read queries. The periodic autovacuum process has to merge these values back into the index. Maybe the solution is to tune autovacuum to run more often on the table. http://www.postgresql.org/docs/9.1/static/gin-implementation.html Regards, Marti This is probably the problem. Is running VACUUM ANALYZE under psql the same as autovacuum? -Sushant.
Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries
On Mon, 2011-12-19 at 12:41 -0300, Euler Taveira de Oliveira wrote: On 19-12-2011 12:30, Sushant Sinha wrote: I recently upgraded my postgres server from 9.0 to 9.1.2 and I am finding a peculiar problem.I have a program that periodically adds rows to this table using INSERT. Typically the number of rows is just 1-2 thousand when the table already has 500K rows. Whenever the program is adding rows, the performance of the search query on the same table is very bad. The query uses the gin index and the tsearch ranking function ts_rank_cd. How bad is bad? It seems you are suffering from don't-fit-on-cache problem, no? The memory is 32GB and the entire database is just 22GB. Even vmstat 1 does not show any disk activity. I was not able to isolate the performance numbers since I have observed this only on the production box where the number of requests keep increasing as the box gets loaded. But a query that takes 1sec normally is taking more than 10secs (not sure whether it got the same number of CPU cycles). Is there a way to find that? -Sushant.
Re: [HACKERS] Command Triggers
Andres Freund and...@anarazel.de writes: On Monday, December 19, 2011 05:12:13 PM Robert Haas wrote: I do. Anyone coding in PL/pgsql is going to find the nodeToString() output unusable, and we can easily provide a built-in JSON datatype with enough richness to make that problem go away in time for 9.2. I hear the sound of goalposts moving. I personally think you're underestimating the complexity of providing a sensible json compatibility shim on top of the nodestring representation. But I would like to be proven wrong ;) If we were going to do this at all, the way to do it is to flush the existing nodestring representation and redefine it as being JSON. No? If this is as easy as people are claiming, it should not be hard to revise the lower-level bits of outfuncs/readfuncs to make the text representation compatible. And there's no reason we can't change what is stored in pg_rewrite. But before we get bogged down in the data representation, I think we need to make a more fundamental decision: to what extent are we OK with exporting the parse tree at all, in any form? Tom is arguing that we shouldn't, and I see his point: the recent DROP command rework would have broken everybody's command triggers if we had adopted this proposal, and that would be a real shame, because I don't particularly like the idea that we can't continue to improve the code and refactor things because someone out there might be depending on an older and less well-considered behavior. The problem that changing the nodestring representation could help with is making user-written triggers roughly as robust as C code is, to wit that you don't have to change it as long as the specific fields it touches aren't redefined. The question is whether that's good enough. The DROP command changes provide a pretty strong clue that it isn't. Admittedly, that's not the sort of change we make too often. 
But I will be seriously annoyed if we start getting the sort of pushback on parsetree changes that we've been hearing from certain quarters about configuration file changes. Those structures are *internal* and we have got to have the flexibility to whack them around. regards, tom lane
Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries
On 2011-12-19 18:08, Marti Raudsepp wrote: The query uses the gin index and the tsearch ranking function ts_rank_cd. Another thought -- have you read about the GIN fast updates feature? This existed in 9.0 too. Instead of updating the index directly, GIN appends all changes to a sequential list, which needs to be scanned in whole for read queries. The periodic autovacuum process has to merge these values back into the index. Maybe the solution is to tune autovacuum to run more often on the table. http://www.postgresql.org/docs/9.1/static/gin-implementation.html I have to say that I consistently have to turn fastupdate off for our heavily updated gin-indexes. The overall performance gain may be measurable, but it's not intolerable without it. The spikes seen from the applications when cleanup happens, either in the foreground or in the background, are not tolerable (multiple seconds). I may just not have experimented sufficiently with the various sizes of work_mem, but I would indeed love to have a connection-only fastupdate, so that within a single transaction it could use the fastupdate technique without pushing index updates onto unrelated queries at random. -- Jesper
Re: [HACKERS] archive_keepalive_command
On Mon, Dec 19, 2011 at 1:17 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Dec 16, 2011 at 10:01 AM, Simon Riggs si...@2ndquadrant.com wrote: To provide some form of keepalive on quiet systems the archive_keepalive_command provides a generic hook to implement keepalives. This is implemented as a separate command to avoid storing keepalive messages in the archive, or at least allow overwrites using a single filename like keepalive. This may be stupid of me, but I don't see the point of this. If you want keepalives, why use log shipping rather than SR? On Dec 12, you said "It also strikes me that anything that is based on augmenting the walsender/walreceiver protocol leaves anyone who is using WAL shipping out in the cold." I'm not clear from the comments you or Simon have made how important you think that use case still is. Not wanting to leave anyone out in the cold, I proposed something to enhance file based replication also. In any case, multiple others have requested this feature, so it's worth doing even if you have changed your mind. Implementing a really-high-latency method of passing protocol messages through the archive seems like a complex solution to a non-problem (but, like I say, I may be missing something). So a) it is a problem, and b) it's not complex. The proposed method doesn't necessarily use the archive. Allowing users to specify how the keepalive will work makes it a flexible solution to a widely recognised problem. This proposal doesn't replace the protocol keepalive for streaming replication, it provides exactly the same thing for file based replication users. Many people use both streaming and file-based, so need a way to measure latency that acts similarly no matter which one is currently in use. 
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Review: Non-inheritable check constraints
Excerpts from Alex Hunsaker's message of vie dic 16 18:07:05 -0300 2011: On Fri, Dec 16, 2011 at 14:01, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Alex Hunsaker's message of vie dic 16 17:50:12 -0300 2011: On Fri, Dec 16, 2011 at 12:06, Alvaro Herrera alvhe...@commandprompt.com wrote: Yeah. Nikhil, Alex, this is the merged patch. Have a look that it still works for you (particularly the pg_dump bits) and I'll commit it. I adjusted the regression test a bit too. Other than the version checks seeming to be off by one, it looks fine. Uhm ... you're right that convalidated is present in 9.1 [...] So I don't think we really need to add a separate branch for 9.1 here, but it certainly needs a comment improvement. Hrm... What am I missing? I was saying that it should all be >= 9.2. There are no convalidated=false check constraints in 9.1, so the extra branch is useless. This is sufficient:

@@ -6019,8 +6019,13 @@ getTableAttrs(TableInfo *tblinfo, int numTables)
 			tbinfo->dobj.name);
 		resetPQExpBuffer(q);
-		if (g_fout->remoteVersion >= 90100)
+		if (g_fout->remoteVersion >= 90200)
 		{
+			/*
+			 * conisonly and convalidated are new in 9.2 (actually, the latter
+			 * is there in 9.1, but it wasn't ever false for check constraints
+			 * until 9.2).
+			 */
 			appendPQExpBuffer(q, SELECT tableoid, oid, conname,
 				pg_catalog.pg_get_constraintdef(oid) AS consrc,
 				conislocal, convalidated, conisonly

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Command Triggers
On 12/19/2011 11:12 AM, Robert Haas wrote: But before we get bogged down in the data representation, I think we need to make a more fundamental decision: to what extent are we OK with exporting the parse tree at all, in any form? Tom is arguing that we shouldn't, and I see his point: the recent DROP command rework would have broken everybody's command triggers if we had adopted this proposal, and that would be a real shame, because I don't particularly like the idea that we can't continue to improve the code and refactor things because someone out there might be depending on an older and less well-considered behavior. Any interface here would need to be in the same sense Linux uses the term: subject to change in every major version, and maybe even in a minor one if that's the best way to solve a higher priority issue. An example we've been consuming that comes to mind is the API for keeping processes from being killed by the OOM killer. It's far from stable: http://archives.postgresql.org/message-id/4ce5e437.7080...@2ndquadrant.com but it's still possible for users of it to keep up with new releases, and when feasible some work toward backward compatibility is done (but not always) As a tool author, I would expect anything working at the level where the data needed is only available from the parse tree would need to be re-tested against each new version, and then have version specific changes as needed. Setting the expectations bar any higher for consumers of that interface would be unrealistic. The minority of people who'd like to use this feature shouldn't want to see PostgreSQL development hamstrung for the majority either, and the standards for breakage here should be clear from the beginning--unlike those for, say, GUC changes between releases. 
-- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
[HACKERS] Re: WIP patch: Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation
On Sat, Dec 17, 2011 at 3:52 PM, Phil Sorber p...@omniti.com wrote: Attached is a patch that addresses the todo item Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation. http://archives.postgresql.org/pgsql-general/2010-10/msg00332.php Instead of returning an error, they now return NULL if the OID is found in pg_class when using SnapshotAny. I only applied it to 4 functions: select pg_relation_size, pg_total_relation_size, pg_table_size and pg_indexes_size. These are the ones that were using relation_open(). I changed them to using try_relation_open(). For three of them I had to move the try_relation_open() call up one level in the call stack and change the parameter types for some support functions from Oid to Relation. They now also call a new function relation_recently_dead() which is what checks for relation in SnapshotAny. All regression tests passed. Is SnapshotAny the snapshot I should be using? It seems to get the correct results. I can drop a table and I get NULL. Then after a vacuumdb it returns an error. Something I'd like to point out myself is that I need to be using ObjectIdAttributeNumber instead of Anum_pg_class_relfilenode. Probably just luck that I got the intended results before. This will also change the logic in a few other places. Still not clear on the semantics of Snapshot{Any|Self|Now|Dirty}. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
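As a sketch of the intended user-visible behavior described above (hypothetical session; not taken from the patch itself):

```sql
CREATE TABLE t (i int);
SELECT pg_relation_size('t'::regclass);   -- returns a size in bytes
DROP TABLE t;
-- With the patch, calling the function with the table's old OID yields
-- NULL instead of an error, for as long as the dropped pg_class row is
-- still found under SnapshotAny (here 12345 stands in for t's old OID):
SELECT pg_relation_size(12345);
-- Per Phil's description, once VACUUM has removed the dead catalog row,
-- the call raises an error again.
```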
Re: [HACKERS] Autonomous subtransactions
On Sun, Dec 18, 2011 at 4:22 PM, Jim Nasby j...@nasby.net wrote: On Dec 18, 2011, at 2:28 AM, Gianni Ciolli wrote: I have written some notes about autonomous subtransactions, which have already been touched (at least) in two separate threads; please find them at http://wiki.postgresql.org/wiki/Autonomous_subtransactions The document seems to mix the terms subtransaction and autonomous transaction. That's going to generate a ton of confusion, because both terms already have meaning associated with them: - Autonomous transaction means you can execute something outside of your current transaction and it is in no way effected by the current transaction (doesn't matter if T0 commits or not). - Subtransactions are an alternative to savepoints. They allow you to break a large transaction into smaller chunks, but if T0 doesn't commit then none of the subtransactions do either. OK, perhaps we should just stick to the term Autonomous Transaction. That term is in common use, even if the usage is otherwise exactly the same as a subtransaction i.e. main transaction stops until the subtransaction is complete. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Autonomous subtransactions
On Mon, Dec 19, 2011 at 3:52 PM, Marti Raudsepp ma...@juffo.org wrote: On Sun, Dec 18, 2011 at 10:28, Gianni Ciolli gianni.cio...@2ndquadrant.it wrote: http://wiki.postgresql.org/wiki/Autonomous_subtransactions It is meant to be an ongoing project, requesting comments and contributions, rather than a conclusive document. In addition to what Jim Nasby said, this proposal seems a bit inflexible. In particular: 1. It limits us to exactly 2 autonomous transactions at any time (the main one and the subtransaction). It's not clear to me why you think there would be a limitation to exactly 2 autonomous transactions. The intention would be to have a theoretically unlimited number, subject to resource limits, just as normal subtransactions already do. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Command Triggers
On Mon, Dec 19, 2011 at 12:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: I personally think you're underestimating the complexity of providing a sensible json compatibility shim on top of the nodestring representation. But I would like to be proven wrong ;) If we were going to do this at all, the way to do it is to flush the existing nodestring representation and redefine it as being JSON. No? If this is as easy as people are claiming, it should not be hard to revise the lower-level bits of outfuncs/readfuncs to make the text representation compatible. And there's no reason we can't change what is stored in pg_rewrite. I thought about that. A quick inspection suggests that this would slightly increase the size of the stored rules, because the node type would need to become a key, which would add at least a few more characters, and also because we'd need more quoting. That is: {THING :wump 1} becomes, at a minimum: {"_": "THING", "wump": 1} And that's using a somewhat-lame single-character label for the node tag. Now, I suspect that in practice the cost of the stored rules becoming slightly larger is negligible, so maybe it's worth it. But before we get bogged down in the data representation, I think we need to make a more fundamental decision: to what extent are we OK with exporting the parse tree at all, in any form? Tom is arguing that we shouldn't, and I see his point: the recent DROP command rework would have broken everybody's command triggers if we had adopted this proposal, and that would be a real shame, because I don't particularly like the idea that we can't continue to improve the code and refactor things because someone out there might be depending on an older and less well-considered behavior. The problem that changing the nodestring representation could help with is making user-written triggers roughly as robust as C code is, to wit that you don't have to change it as long as the specific fields it touches aren't redefined.
The question is whether that's good enough. Agreed. The DROP command changes provide a pretty strong clue that it isn't. Admittedly, that's not the sort of change we make too often. But I will be seriously annoyed if we start getting the sort of pushback on parsetree changes that we've been hearing from certain quarters about configuration file changes. Those structures are *internal* and we have got to have the flexibility to whack them around. Yes. Maybe we should try to split the baby here and defer the question of whether to expose any of the parse tree internals, and if so how much, to a future release. It seems to me that we could design a fairly useful set of functionality around AFTER-CREATE, BEFORE-DROP, and maybe even AFTER-ALTER triggers without exposing any parse tree details. For CREATE and ALTER, that would make it the client's responsibility to inspect the system catalogs and figure out what had happened and what to do about it, which is admittedly not ideal, but it would be more than we have now, and it would then give us the option to consider requests to expose more information in future releases on a case-by-case basis, rather than making a blanket decision about whether to expose the parse tree or not. I have a sneaking suspicion that, while we probably can't get by without exposing any parse tree information ever, the amount we truly need to expose might end up being only a small percentage of the total... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RangeVarGetRelid()
On Mon, Dec 19, 2011 at 11:55 AM, Noah Misch n...@leadboat.com wrote: Here's the rule I used: the pre-lock checks must prove that the user could, by some command working as-designed, have taken a lock at least as strong as the one we're about to acquire. How does that work out in practice? Relation owners can always take AccessExclusiveLock by way of a DROP command, so ownership is always sufficient as a pre-lock test. The above namespace check is no exception; the object owner has authority to take the AccessExclusiveLock independent of his authority to ultimately complete the rename. (Changes arising from the recent discussions about locking namespaces during DDL would complicate the situation in other ways.) Yes: and I have it in mind to work on that for the current release cycle, time permitting. It seems to me that there's no real value in consolidating this callback if there's a change coming down the pipe that would require us to split it right back out again. Also, while I see the point that it wouldn't be a security issue to defer this particular check until after the lock is taken, I still don't think it's a good idea. I basically don't see any point in splitting up the security checks across multiple functions. Our permissions-checking logic is already rather diffuse; I think that finding more reasons for some of it to be done in one function and some of it to be done in some other function is going in the wrong direction. It makes it easier for future people editing the code to rearrange things in ways that subtly break security without realizing it, and it's also one more obstacle for things like sepgsql which would like to get control whenever we do a security check. (Dimitri's recent comments about command triggers suggest that MAC isn't the only application of an option to override the default access control policy.)
It's also not terribly consistent from a user perspective: hmm, if I don't have permission to do operation X because of reason A, then my command fails immediately; if I don't have permission because of reason B, then it waits for a lock and then fails. Some amount of inconsistency there is probably inevitable, because, for example, we won't know whether you have permission on an entire inheritance hierarchy until we start walking it. But I don't really see the point in going out of our way to add more of it. I think it's important to keep in mind that most of the code that is going into those callbacks is just being moved. We're not really ending up with any more code overall, except to the extent that there are a couple of lines of net increase for each callback because, well, it has a comment, and maybe a declaration, and some curly braces at the beginning and the end. The ownership check is two lines of code; it doesn't matter whether we duplicate that or not. In many cases, I think the callbacks are actually increasing readability, by pulling a bunch of related checks into their own function instead of cluttering up the main code path with them. I don't want to end up with a lot of needless code duplication, but I also don't feel any powerful need to squeeze the number of callback functions down to an absolute minimum just because I can. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums
On 12/19/2011 07:50 AM, Robert Haas wrote: On Mon, Dec 19, 2011 at 6:10 AM, Simon Riggs si...@2ndquadrant.com wrote: The only sensible way to handle this is to change the page format as discussed. IMHO the only sensible way that can happen is if we also support an online upgrade feature. I will take on the online upgrade feature if others work on the page format issues, but none of this is possible for 9.2, ISTM. I'm not sure that I understand the dividing line you are drawing here. There are three likely steps to reaching checksums:

1) Build a checksum mechanism into the database. This is the straightforward part that multiple people have now done.

2) Rework hint bits to make the torn page problem go away. Checksums go elsewhere? More WAL logging to eliminate the bad situations? Eliminate some types of hint bit writes? It seems every alternative has trade-offs that will require serious performance testing to really validate.

3) Finally tackle in-place upgrades that include a page format change. One basic mechanism was already outlined: a page converter that knows how to handle two page formats, some metadata to track which pages have been converted, a daemon to do background conversions. Simon has some new ideas here too (online upgrade involves two clusters kept in sync on different versions, a slightly different concept than the current in-place upgrade).

My recollection is that the in-place page upgrade work was pushed out of the critical path before due to lack of immediate need. It wasn't necessary until a) a working catalog upgrade tool was validated and b) a bite-size feature change to test it on appeared. We have (a) now in pg_upgrade, and CRCs could be (b)--if the hint bit issues are sorted first. What Simon was saying is that he's got some interest in (3), but wants no part of (2). I don't know how much time each of these will take.
I would expect that (2) and (3) have similar scopes though--many days, possibly a few months, of work--which means they both dwarf (1). The part that's been done is the visible tip of a mostly underwater iceberg. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
[HACKERS] reprise: pretty print viewdefs
A client has again raised with me the ugliness of pretty printed viewdefs. We looked at this a couple of years ago, but discussion went off into the weeds slightly, so I dropped it, but as requested I'm revisiting it. The problem can be simply seen in the screenshot here: http://developer.postgresql.org/~adunstan/view_before.png This is ugly, unreadable and unusable, IMNSHO. Calling it pretty is just laughable. The simple solution I originally proposed was to put a line feed and some space before every target field in pretty print mode. This is a two-line patch. The downsides are a) maybe not everyone will like the change and b) it will produce superfluous newlines, e.g. before CASE expressions. We can avoid the superfluous newlines at the cost of some code complexity. Whether it's worth it is a judgement call. If we don't want to do this unconditionally, we'd need either a new function which would make it available or a new flavor of pg_get_viewdef - if the latter I'm not really sure what the new signatures should be - oid/text | something not boolean, but what? Personally, I'd much rather just do it. Does anyone *really* like the present output? cheers andrew
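For context, the pretty flag already exists as the boolean second argument of pg_get_viewdef; the complaint is about what that mode emits (illustrative call; output not shown):

```sql
-- Existing signature: pg_get_viewdef(view regclass, pretty boolean).
SELECT pg_get_viewdef('some_view'::regclass, true);
-- Andrew's two-line patch would make pretty mode emit a line feed and
-- some leading space before every target field, instead of running the
-- whole SELECT list together on one long line.
```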
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
On 12/14/2011 09:02 AM, Shigeru Hanada wrote: Here I'd like to propose three incremental patches: 1) fdw_helper_funcs_v3.patch...: This is not specific to pgsql_fdw, but probably useful for every FDWs which use FDW options... 2) pgsql_fdw_v5.patch: This patch provides simple pgsql_fdw which does *NOT* support any push-down... 3) pgsql_fdw_pushdown_v1.patch: This patch adds limited push-down capability to pgsql_fdw which is implemented by previous patch... ... To implement [expression which uses user-defined function], I added exprFunction to nodefuncs.c which returns Oid of function which is used in the expression node, but I'm not sure that it should be there. Should we have it inside pgsql_fdw? After failing to bring some light onto this during my general update, will try again here. We now have 3 updated patches that refactor things from how this was originally presented, with one asked implementation question. There's also a spawned off Join push-down for foreign tables patch off in another thread. I don't think it's really clear to everyone what state this feature proposal is in. We've gotten bits of review here from KaiGai and Heikki, big picture comments from Robert and Tom. Given how these are structured, is fdw_helper_funcs_v3.patch at the point where it should be considered for committer review? Maybe pgsql_fdw_v5.patch too? The others seem to be more in flux to me, due to all the recent pushdown changes. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums
Greg Smith g...@2ndquadrant.com wrote: 2) Rework hint bits to make the torn page problem go away. Checksums go elsewhere? More WAL logging to eliminate the bad situations? Eliminate some types of hint bit writes? It seems every alternative has trade-offs that will require serious performance testing to really validate. I'm wondering whether we're not making a mountain out of a mole-hill here. In real life, on one single crash, how many torn pages with hint-bit-only updates do we expect on average? What's the maximum possible? In the event of a crash recovery, can we force all tables to be seen as needing autovacuum? Would there be a way to limit this to some subset which *might* have torn pages somehow? It seems to me that on a typical production system you would probably have zero or one such page per OS crash, with zero being far more likely than one. If we can get that one fixed (if it exists) before enough time has elapsed for everyone to forget the OS crash, the idea that we would be scaring the users and negatively affecting the perception of reliability seems far-fetched. The fact that they can *have* page checksums in PostgreSQL should do a lot to *enhance* the PostgreSQL reputation for reliability in some circles, especially those getting pounded with FUD from competing products. If a site has so many OS or hardware failures that they lose track -- well, they really should be alarmed. Of course, the fact that you may hit such a torn page in a situation where all data is good means that it shouldn't be more than a warning. This seems as though it eliminates most of the work people have been suggesting as necessary, and makes the submitted patch fairly close to what we want. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why do we need two snapshots per query?
This feature has now passed through review by Dimitri with him no longer having anything to say about it. I've marked it ready for committer now. Seems the main decision left here is whether another committer wants to take a look at this, or if Robert wants to take a spin on the buildfarm wheel by committing it himself. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Checksums
On Mon, Dec 19, 2011 at 12:07 PM, David Fetter da...@fetter.org wrote: On Mon, Dec 19, 2011 at 09:34:51AM -0500, Robert Haas wrote: On Mon, Dec 19, 2011 at 9:14 AM, Stephen Frost sfr...@snowman.net wrote: * Aidan Van Dyk (ai...@highrise.ca) wrote: But the scary part is you don't know how long *ago* the crash was. Because a hint-bit-only change w/ a torn-page is a non-event in PostgreSQL *DESIGN*, on crash recovery, it doesn't do anything to try and scrub every page in the database. Fair enough, but, could we distinguish these two cases? In other words, would it be possible to detect if a page was torn due to a 'traditional' crash and not complain in that case, but complain if there's a CRC failure and it *doesn't* look like a torn page? No. Would you be so kind as to elucidate this a bit? Well, basically, Stephen's proposal was pure hand-waving. :-) I don't know of any magic trick that would allow us to know whether a CRC failure looks like a torn page. The only information we're going to get is the knowledge of whether the CRC matches or not. If it doesn't, it's fundamentally impossible for us to know why. We know the page contents are not as expected - that's it! It's been proposed before that we could examine the page, consider all the unset hint bits that could be set, and try all combinations of setting and clearing them to see whether any of them produce a valid CRC. But, as Tom has pointed out previously, that has a really quite large chance of making a page that's *actually* been corrupted look OK. If you have 30 or so unset hint bits, odds are very good that some combination will produce the 32-bit CRC you're expecting. To put this another way, we currently WAL-log just about everything. We get away with NOT WAL-logging some things when we don't care about whether they make it to disk. Hint bits, killed index tuple pointers, etc. cause no harm if they don't get written out, even if some other portion of the same page does get written out.
But as soon as you CRC the whole page, now absolutely every single bit on that page becomes critical data which CANNOT be lost. IOW, it now requires the same sort of protection that we already need for our other critical updates - i.e. WAL logging. Or you could introduce some completely new mechanism that serves the same purpose, like MySQL's double-write buffer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
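Robert's try-all-hint-bit-combinations point can be put in numbers: with about 30 unset hint bits there are 2^30 candidate pages to test against a 32-bit CRC, so under the rough assumption that each candidate's CRC is an independent 1-in-2^32 event, the chance that a genuinely corrupted page still finds some matching combination is 1 - (1 - 2^-32)^(2^30), around 22%, and it climbs quickly with more unset bits. A back-of-envelope check:

```sql
-- Probability that at least one of 2^30 hint-bit combinations matches
-- an arbitrary 32-bit CRC (independence of candidate CRCs assumed):
-- roughly 0.22.
SELECT 1 - power(1 - 1/4294967296.0, 1073741824.0) AS p_false_match;

-- With 32 unset bits (2^32 tries) it is already about 1 - 1/e, ~0.63:
SELECT 1 - power(1 - 1/4294967296.0, 4294967296.0) AS p_false_match_32;
```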
Re: [HACKERS] Page Checksums
On Mon, Dec 19, 2011 at 2:16 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: It seems to me that on a typical production system you would probably have zero or one such page per OS crash, with zero being far more likely than one. If we can get that one fixed (if it exists) before enough time has elapsed for everyone to forget the OS crash, the idea that we would be scaring the users and negatively affecting the perception of reliability seems far-fetched. The problem is that you can't fix them. If you come to a page with a bad CRC, you only have two choices: take it seriously, or don't. If you take it seriously, then you're complaining about something that may be completely benign. If you don't take it seriously, then you're ignoring something that may be a sign of data corruption. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Autonomous subtransactions
On Mon, Dec 19, 2011 at 20:34, Simon Riggs si...@2ndquadrant.com wrote: It's not clear to me why you think there would be a limitation to exactly 2 autonomous transactions. Sorry my bad, I didn't read the proposal carefully. Nesting does indeed allow multiple autonomous subtransactions. Maybe that's just my paranoia, but the fact that subtransactions aren't named means it's pretty easy to accidentally get out of step and commit the wrong subtransaction. I see app developers often messing up BEGIN/COMMIT/ROLLBACK already. This is why I like the SAVEPOINT style; it's obvious when there's a bug. (I do realize that allowing subtransactions to commit out of order also makes it failure prone) Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Escaping : in .pgpass - code or docs bug?
On 19/12/11 16:48, Robert Haas wrote: On Sat, Dec 17, 2011 at 3:27 AM, Ross Reedstromreeds...@rice.edu wrote: This should either be fixed by changing the documentation to say to not escape colons or backslashes in the password part, only, or modify this function (PasswordFromFile) to silently unescape the password string. It already copies it. My vote is for a doc correction in the back-branches and a behavior change in master. Seems sensible - presumably mentioning this will be corrected in 9.2? It's clearly not what you'd call urgent since nobody else seems to have noticed before now. -- Richard Huxton Archonet Ltd -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
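For reference, a .pgpass line has five colon-separated fields, and the documentation says a literal : or \ must be escaped with a backslash; the report is that PasswordFromFile never strips those escapes back out of the password field. A hypothetical entry:

```
# hostname:port:database:username:password
localhost:5432:mydb:alice:pa\:ss\\word
```

With the documented escaping, the intended password is pa:ss\word; per the bug report, the current code instead uses the raw pa\:ss\\word bytes.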
Re: [HACKERS] Escaping : in .pgpass - code or docs bug?
Excerpts from Richard Huxton's message of lun dic 19 16:33:31 -0300 2011: On 19/12/11 16:48, Robert Haas wrote: On Sat, Dec 17, 2011 at 3:27 AM, Ross Reedstromreeds...@rice.edu wrote: This should either be fixed by changing the documentation to say to not escape colons or backslashes in the password part, only, or modify this function (PasswordFromFile) to silently unescape the password string. It already copies it. My vote is for a doc correction in the back-branches and a behavior change in master. Seems sensible - presumably mentioning this will be corrected in 9.2? It's clearly not what you'd call urgent since nobody else seems to have noticed before now. Yeah, it is a pretty old bug -- this code was clearly written by some rookie that didn't know very well what he was doing. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Autonomous subtransactions
Excerpts from Marti Raudsepp's message of lun dic 19 16:32:06 -0300 2011: (I do realize that allowing subtransactions to commit out of order also makes it failure prone) Uhm? You can't commit savepoints out of order. You can release an older one, but then all the ones following it disappear and can't be released separately. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
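Alvaro's point is just the standard SAVEPOINT semantics, which can be seen directly:

```sql
BEGIN;
SAVEPOINT a;
SAVEPOINT b;
RELEASE SAVEPOINT a;   -- keeps a's work in the transaction, and also
                       -- destroys b, which was established after a
RELEASE SAVEPOINT b;   -- fails: that savepoint no longer exists
```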
Re: [HACKERS] Page Checksums
Robert Haas robertmh...@gmail.com wrote: On Mon, Dec 19, 2011 at 2:16 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: It seems to me that on a typical production system you would probably have zero or one such page per OS crash, with zero being far more likely than one. If we can get that one fixed (if it exists) before enough time has elapsed for everyone to forget the OS crash, the idea that we would be scaring the users and negatively affecting the perception of reliability seems far-fetched. The problem is that you can't fix them. If you come to a page with a bad CRC, you only have two choices: take it seriously, or don't. If you take it seriously, then you're complaining about something that may be completely benign. If you don't take it seriously, then you're ignoring something that may be a sign of data corruption. I was thinking that we would warn when such was found, set hint bits as needed, and rewrite with the new CRC. In the unlikely event that it was a torn hint-bit-only page update, it would be a warning about something which is a benign side-effect of the OS or hardware crash. The argument was that it could happen months later, and people might not remember the crash. My response to that is: don't let it wait that long. By forcing a vacuum of all possibly-affected tables (or all tables if there's no way to rule any of them out), you keep it within recent memory. Of course, it would also make sense to document that such an error after an OS or hardware crash might be benign or may indicate data corruption or data loss, and give advice on what to do. There is obviously no way for PostgreSQL to automagically fix real corruption flagged by a CRC failure, under any circumstances. There's also *always* a possibility that a CRC error is a false positive -- if only the bytes in the CRC were damaged. We're talking quantitative changes here, not qualitative.
I'm arguing that the extreme measures suggested to achieve the slight quantitative improvements are likely to cause more problems than they solve. A better use of resources to improve the false positive numbers would be to be more aggressive about setting hint bits -- perhaps when a page is written with any tuples with transaction IDs before the global xmin, the hint bits should be set and the CRC calculated before write, for example. (But that would be a different patch.) -Kevin
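The "torn hint-bit-only page" problem discussed above can be made concrete with a toy sketch (Python, not PostgreSQL code; the page/sector sizes and the single-byte "hint bit" are illustrative assumptions): once hint bits are covered by the checksum, a crash that persists only part of a hint-bit-only write produces a page that is logically fine but fails CRC verification.

```python
import zlib

PAGE_SIZE = 8192
SECTOR = 512   # assume the disk writes 512-byte sectors atomically

def page_crc(page: bytes) -> int:
    return zlib.crc32(page) & 0xFFFFFFFF

old_page = bytearray(PAGE_SIZE)          # page image as last written safely
new_page = bytearray(old_page)
new_page[4096] |= 0x01                   # a "hint bit" flip: no logical data change
stored_crc = page_crc(bytes(new_page))   # CRC computed over the whole new image

# OS crash mid-write: only the first sector (which would carry the new CRC)
# reaches disk; the sector containing the flipped hint bit does not.
torn_page = bytes(new_page[:SECTOR]) + bytes(old_page[SECTOR:])

# The torn page is logically identical to the old page, yet the stored
# checksum no longer matches -- a benign event now reads as corruption.
assert page_crc(torn_page) != stored_crc
```

This is exactly why the thread weighs warn-and-rewrite against treating every CRC failure as real corruption.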
Re: [HACKERS] Autonomous subtransactions
On Mon, Dec 19, 2011 at 21:43, Alvaro Herrera alvhe...@commandprompt.com wrote: (I do realize that allowing subtransactions to commit out of order also makes it failure prone) Uhm? You can't commit savepoints out of order. You can release an older one, but then all the ones following it disappear and can't be released separately. We're confused about the terminology already :) I was talking about autonomous subtransactions as in COMMIT SUBTRANSACTION from the proposal. Earlier I commented that it would be nice if the syntax didn't require autonomous transactions to be strictly nested. Regards, Marti
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On Sat, Dec 17, 2011 at 6:11 AM, Simon Riggs si...@2ndquadrant.com wrote: A simpler example shows it better. Tom's idea prevents 2 rows being returned, but doesn't prevent zero rows. I don't think it prevents either one. Suppose we read and return a tuple, and then someone HOT-updates it and commits. SnapshotNow is very happy to also return the updated version of that same tuple on the next iteration of the scan. See commit c0f03aae0469e758964faac0fb741685170c39a5. That's a useful improvement, but not the only thing. ISTM we can run a scan as we do now, without locking. If the scan returns zero rows we scan again, but take a definition lock first. ALTER TABLE holds the definition lock while running, which won't be a problem because we would only use that on shorter AT statements. Definition lock being the type of lock described earlier on this thread, that we already have code for. So we have code for all the parts we need to make this work. So that means we'd be able to plug the gaps noted, without reducing performance on the common code paths. I don't see that any objections made so far remain valid with that approach. I think a retry loop is a possibly useful tool for solving this problem, but I'm very skeptical about the definition locks, unless we can arrange to have them taken just before commit (regardless of when during the transaction ALTER TABLE was executed) and released immediately afterwards. What I think might be even better is something along the lines of what Noah Misch did with RangeVarGetRelid -- don't try to lock out concurrent activity, just detect it and redo the operation if anything bad happens while we're in process. In this case, that would mean having a mechanism to know whether any concurrent transaction had updated the catalog we're scanning during the scan. Yet another option, which I wonder whether we're dismissing too lightly, is to just call GetSnapshotData() and do the scan using a plain old MVCC snapshot.
Sure, it's more expensive than SnapshotNow, but is it expensive enough to worry about? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
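Simon's "scan lock-free first, retry under a definition lock only on an empty result" idea can be sketched as a toy model (Python; `definition_lock`, `catalog`, and the lookup API are invented for this illustration, not PostgreSQL's actual structures):

```python
import threading

# Fast path: the catalog scan runs with no lock, as today. Slow path: a scan
# that finds nothing is repeated while holding a lock that a concurrent
# "ALTER"-style operation would also have to take, closing the race window.
definition_lock = threading.Lock()
catalog = {("public", "t"): {"relid": 16384}}

def scan(name):
    return catalog.get(name)

def lookup_with_retry(name):
    row = scan(name)            # lock-free scan on the common code path
    if row is not None:
        return row
    with definition_lock:       # zero rows: rescan while DDL is blocked out
        return scan(name)

assert lookup_with_retry(("public", "t"))["relid"] == 16384
assert lookup_with_retry(("public", "missing")) is None
```

The point of the shape is that the common case pays nothing extra; only the suspicious empty result pays for the lock and rescan.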
Re: [HACKERS] RangeVarGetRelid()
On Mon, Dec 19, 2011 at 01:43:18PM -0500, Robert Haas wrote: On Mon, Dec 19, 2011 at 11:55 AM, Noah Misch n...@leadboat.com wrote: Here's the rule I used: the pre-lock checks must prove that the user could, by some command working as-designed, have taken a lock at least as strong as the one we're about to acquire. How does that work out in practice? Relation owners can always take AccessExclusiveLock by way of a DROP command, so ownership is always sufficient as a pre-lock test. The above namespace check is no exception; the object owner has authority to take the AccessExclusiveLock independent of his authority to ultimately complete the rename. (Changes arising from the recent discussions about locking namespaces during DDL would complicate the situation in other ways.) Yes: and I have it in mind to work on that for the current release cycle, time permitting. It seems to me that there's no real value in consolidating this callback if there's a change coming down the pipe that would require us to split it right back out again. Also, while I see the point that it wouldn't be a security issue to defer this particular check until after the lock is taken, I still don't think it's a good idea. I basically don't see any point in splitting up the security checks across multiple functions. Our permissions-checking logic is already rather diffuse; I think that finding more reasons for some of it to be done in one function and some of it to be done in some other function is going in the wrong direction. It makes it easier for future people editing the code to rearrange things in ways that subtly break security without realizing it, and it's also one more obstacle for things like sepgsql which would like to get control whenever we do a security check. (Dimitri's recent comments about command triggers suggest that MAC isn't the only application of an option to override the default access control policy.)
It's also not terribly consistent from a user perspective: hmm, if I don't have permission to do operation X because of reason A, then my command fails immediately; if I don't have permission because of reason B, then it waits for a lock and then fails. Some amount of inconsistency there is probably inevitable, because, for example, we won't know whether you have permission on an entire inheritance hierarchy until we start walking it. But I don't really see the point in going out of our way to add more of it. I think it's important to keep in mind that most of the code that is going into those callbacks is just being moved. We're not really ending up with any more code overall, except to the extent that there are a couple of lines of net increase for each callback because, well, it has a comment, and maybe a declaration, and some curly braces at the beginning and the end. The ownership check is two lines of code; it doesn't matter whether we duplicate that or not. In many cases, I think the callbacks are actually increasing readability, by pulling a bunch of related checks into their own function instead of cluttering up the main code path with them. I don't want to end up with a lot of needless code duplication, but I also don't feel any powerful need to squeeze the number of callback functions down to an absolute minimum just because I can. I'm satisfied that you and I have a common understanding of the options and their respective merits. There's plenty of room for judgement in choosing which merits to seize at the expense of others. Your judgements here seem entirely reasonable. nm
Re: [HACKERS] Page Checksums
On 12/19/2011 02:44 PM, Kevin Grittner wrote: I was thinking that we would warn when such was found, set hint bits as needed, and rewrite with the new CRC. In the unlikely event that it was a torn hint-bit-only page update, it would be a warning about something which is a benign side-effect of the OS or hardware crash. The argument was that it could happen months later, and people might not remember the crash. My response to that is: don't let it wait that long. By forcing a vacuum of all possibly-affected tables (or all tables if there's no way to rule any of them out), you keep it within recent memory. Cleanup that requires a VACUUM of potentially unbounded size to sort out doesn't sound like a great path to wander down. Ultimately any CRC implementation is going to want a scrubbing feature like those found in RAID arrays eventually, one that wanders through all database pages looking for literal bitrot. And pushing priority requests for things to check to the top of its queue may end up being a useful feature there. But if you need all that infrastructure just to get the feature launched, that's a bit hard to stomach. Also, as someone who follows Murphy's Law as my chosen religion, I would expect this situation could be exactly how flaky hardware would first manifest itself: server crash and a bad CRC on the last thing written out. And in that case, the last thing you want to do is assume things are fine, then kick off a VACUUM that might overwrite more good data with bad. The sort of bizarre, should-never-happen cases are the ones I'd most like to see more protection against, rather than excusing them and going on anyway. There's also *always* a possibility that a CRC error is a false positive -- if only the bytes in the CRC were damaged. We're talking quantitative changes here, not qualitative.
The main way I expect to validate this sort of thing is with an as yet unwritten function to grab information about a data block from a standby server for this purpose, something like this: Master: Computed CRC A, Stored CRC B; error raised because A!=B Standby: Computed CRC C, Stored CRC D If C==D and A==C, the corruption is probably overwritten bits of the CRC B. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
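The cross-check Greg describes is really just a truth table over the four CRC values; a minimal sketch (Python, illustrative only; the function name and return labels are invented) makes the decision rule explicit:

```python
def diagnose(master_computed, master_stored, standby_computed, standby_stored):
    """Classify a CRC failure by comparing against a healthy standby's copy."""
    if master_computed == master_stored:
        return "no error"
    if standby_computed == standby_stored and master_computed == standby_computed:
        # The master's page *contents* agree with the standby's verified copy,
        # so the damaged bytes are probably the stored CRC itself: a false
        # positive rather than data corruption.
        return "probably corrupt stored CRC"
    return "possible data corruption"

# Master raised A != B; standby has C == D and A == C:
assert diagnose(0xA, 0xB, 0xA, 0xA) == "probably corrupt stored CRC"
# Standby disagrees too: treat it as real corruption.
assert diagnose(0xA, 0xB, 0xC, 0xD) == "possible data corruption"
```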
Re: [HACKERS] Autonomous subtransactions
Excerpts from Marti Raudsepp's message of lun dic 19 16:50:22 -0300 2011: On Mon, Dec 19, 2011 at 21:43, Alvaro Herrera alvhe...@commandprompt.com wrote: (I do realize that allowing subtransactions to commit out of order also makes it failure prone) Uhm? You can't commit savepoints out of order. You can release an older one, but then all the ones following it disappear and can't be released separately. We're confused about the terminology already :) Yeah. The code talks about savepoints as subtransactions (because that's the name they had at first), so if you guys are going to talk about autonomous transactions as subtransactions too, then the code is going to end up pretty schizo. I was talking about autonomous subtransactions as in COMMIT SUBTRANSACTION from the proposal. Earlier I commented that it would be nice if the syntax didn't require autonomous transactions to be strictly nested. Oh ... Probably. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Page Checksums
On 19.12.2011 21:27, Robert Haas wrote: To put this another way, we currently WAL-log just about everything. We get away with NOT WAL-logging some things when we don't care about whether they make it to disk. Hint bits, killed index tuple pointers, etc. cause no harm if they don't get written out, even if some other portion of the same page does get written out. But as soon as you CRC the whole page, now absolutely every single bit on that page becomes critical data which CANNOT be lost. IOW, it now requires the same sort of protection that we already need for our other critical updates - i.e. WAL logging. Or you could introduce some completely new mechanism that serves the same purpose, like MySQL's double-write buffer. Double-writes would be a useful option also to reduce the size of WAL that needs to be shipped in replication. Or you could just use a filesystem that does CRCs... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
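The double-write idea Heikki mentions can be sketched in miniature (Python; a simplified single-page version of MySQL's scheme -- the file layout, function names, and 4-byte CRC header are all invented for this illustration): write the page to a scratch area and fsync it before overwriting the page in place, so a torn in-place write can be repaired from the scratch copy on recovery instead of requiring full-page WAL images.

```python
import os
import zlib

def checksummed(page: bytes) -> bytes:
    """Prefix the page with a CRC32 of its contents."""
    return zlib.crc32(page).to_bytes(4, "big") + page

def looks_valid(buf: bytes) -> bool:
    return len(buf) > 4 and int.from_bytes(buf[:4], "big") == zlib.crc32(buf[4:])

def write_page(datafile: str, dwfile: str, page: bytes) -> None:
    buf = checksummed(page)
    with open(dwfile, "wb") as f:      # 1. durable copy in the double-write area
        f.write(buf)
        f.flush()
        os.fsync(f.fileno())
    with open(datafile, "wb") as f:    # 2. only then overwrite the real location
        f.write(buf)
        f.flush()
        os.fsync(f.fileno())

def recover(datafile: str, dwfile: str) -> bytes:
    with open(datafile, "rb") as f:
        data = f.read()
    if looks_valid(data):
        return data[4:]
    with open(dwfile, "rb") as f:      # torn in-place write: use the copy
        dw = f.read()
    assert looks_valid(dw), "both copies damaged"
    return dw[4:]
```

Whichever copy was torn, the other is intact: either the double-write copy finished before the in-place write began, or the in-place page was never touched.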
Re: [HACKERS] Autonomous subtransactions
On Mon, Dec 19, 2011 at 05:52:40PM +0200, Marti Raudsepp wrote: On Sun, Dec 18, 2011 at 10:28, Gianni Ciolli gianni.cio...@2ndquadrant.it wrote: http://wiki.postgresql.org/wiki/Autonomous_subtransactions It is meant to be an ongoing project, requesting comments and contributions, rather than a conclusive document. In addition to what Jim Nasby said, this proposal seems a bit inflexible. In particular: 1. It limits us to exactly 2 autonomous transactions at any time (the main one and the subtransaction). Thanks for pointing this out; it shows me that the only example I provided was too basic, so I have added a second one, plus one related remark. The spec doesn't actually impose a limit of only 2 total transactions: if we have the capability to pause the current transaction (T0) to start a new one (T1), and then resume T0 after the conclusion of T1, then we can apply that capability more than once in the same transaction, resulting in a sequence of transactions T1, T2, ..., Tn all subordinate to T0, possibly interspersed with statements from T0. Also, if T0 is a function then it is possible to peek at the (very) near future and eliminate any empty interlude of T0 between T_k and T_(k+1). 2. There's no reason why two autonomous transactions should have a main / sub relationship. They are autonomous -- they should not depend on the state of the outer transaction. COMMIT/ROLLBACK of any of T1,T2,... does not depend on COMMIT/ROLLBACK of T0; the parent transaction exists only to simplify the logical model. It can happen that T0 contains zero statements, so that it is very similar to a sequence of independent transactions. The advantage to me is that we allow autonomous transactions, but there is always one master transaction, so that it looks more like a generalisation of the existing model as opposed to something radically different (and therefore less likely to happen).
The behaviour is indeed similar to the one that can be obtained with dblink: it allows, for instance, an umbrella transaction which doesn't modify the database, and whose only purpose is to execute other transactions, which commit or roll back independently of each other and of the umbrella transaction. With respect to dblink, this spec brings a simpler syntax, which doesn't require managing a second connection to the same db, and one significant difference: you reuse the current session instead of opening a new one (with some caveats, as noted in the Wiki). Regards, Dr. Gianni Ciolli - 2ndQuadrant Italia PostgreSQL Training, Services and Support gianni.cio...@2ndquadrant.it | www.2ndquadrant.it
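The key semantic claim above -- T1..Tn commit or roll back independently, and the umbrella T0's fate does not undo them -- can be modeled as a toy sketch (Python; the `Database` class and method names are invented for this illustration, not part of the proposal):

```python
# Toy model of the umbrella-transaction semantics: each autonomous
# transaction's effects become durable on its own COMMIT, so a later
# rollback of the umbrella transaction T0 leaves them in place.

class Database:
    def __init__(self):
        self.committed = []

    def run_autonomous(self, label, commits=True):
        if commits:
            self.committed.append(label)   # durable regardless of T0's outcome

db = Database()
# T0 begins, pausing itself to run each autonomous transaction in turn...
db.run_autonomous("T1")
db.run_autonomous("T2", commits=False)     # T2 rolls back on its own
db.run_autonomous("T3")
# ...then T0 itself rolls back; T1 and T3 nonetheless remain committed.
assert db.committed == ["T1", "T3"]
```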
Re: [HACKERS] Autonomous subtransactions
On Mon, Dec 19, 2011 at 09:32:06PM +0200, Marti Raudsepp wrote: Maybe that's just my paranoia, but the fact that subtransactions aren't named means it's pretty easy to accidentally get out of step and commit the wrong subtransaction. I see app developers often messing up BEGIN/COMMIT/ROLLBACK already. This is why I like the SAVEPOINT style; it's obvious when there's a bug. For each COMMIT/ROLLBACK there is only one possible BEGIN SUBTRANSACTION, therefore naming is not strictly necessary, unlike in the SAVEPOINT case where you can have more than one option. However, ISTM that allowing each subtransaction to be optionally named could provide a nice safety measure that the careful developer can use to reduce the chances of human error. If subtransactions are not named, no check is made; but if they are named, then the Xs in BEGIN SUBTRANSACTION X and COMMIT/ROLLBACK SUBTRANSACTION X must match, otherwise an exception is raised (and a bug is detected). Regards, Dr. Gianni Ciolli - 2ndQuadrant Italia PostgreSQL Training, Services and Support gianni.cio...@2ndquadrant.it | www.2ndquadrant.it
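Because BEGIN/COMMIT SUBTRANSACTION pair up strictly (LIFO), the optional-name check described above amounts to parenthesis matching; a minimal sketch (Python; `Session`, `SubtransactionError`, and the method names are invented for this illustration):

```python
class SubtransactionError(Exception):
    pass

class Session:
    def __init__(self):
        self.stack = []   # names of open subtransactions, innermost last

    def begin_subtransaction(self, name=None):
        self.stack.append(name)

    def commit_subtransaction(self, name=None):
        if not self.stack:
            raise SubtransactionError("no subtransaction in progress")
        expected = self.stack.pop()
        # Unnamed COMMIT always matches; a given name must equal the name
        # on the matching (innermost) BEGIN, or the developer has a bug.
        if name is not None and expected is not None and name != expected:
            raise SubtransactionError(f"expected {expected!r}, got {name!r}")

s = Session()
s.begin_subtransaction("outer")
s.begin_subtransaction("inner")
s.commit_subtransaction("inner")       # ok: matches the innermost BEGIN
try:
    s.commit_subtransaction("inner")   # bug: innermost is now "outer"
except SubtransactionError:
    pass
else:
    raise AssertionError("mismatch should have been detected")
```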
Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries
Marti Raudsepp ma...@juffo.org writes: On Mon, Dec 19, 2011 at 17:30, Sushant Sinha sushant...@gmail.com wrote: This never happened earlier with postgres 9.0 Is there a known issue with Postgres 9.1? Or how to report this problem? What *did* change in 9.1 is that there's new GIN cost estimation in the planner. If you do EXPLAIN ANALYZE for your queries, do the plans differ between 9.0 and 9.1? I trolled the commit log a bit, and AFAICS the only significant GIN changes between 9.1 and reasonably late-model 9.0 are the cost estimation patch and this one: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=73912e7fbd1b52c51d914214abbec1cda64595f2 which makes me wonder if maybe the OP has a very large fraction of empty or null entries in his data. Previously those would have resulted in no insertion traffic on a GIN index, but now they do. Another thought -- have you read about the GIN fast updates feature? This existed in 9.0 too. Yeah, so it seems unlikely to be that, or at least not that by itself. regards, tom lane
Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries
Jesper Krogh jes...@krogh.cc writes: I have to say that I consistently have to turn fastupdate off for our heavily updated gin-indexes. The overall performance gain may be measurable, but it's not intolerable without. The spikes seen from the applications when cleanup happens, either in the foreground or in the background, are not tolerable (multiple seconds). Well, that's why there's a provision to turn it off: if response time spikes are a bigger deal to you than overall performance, you probably don't want bulk updates. The theory is that you should be able to tune things so that the bulk updates are done by autovacuum, but if you can't get that to work sufficiently reliably, fastupdate=off is the best answer. regards, tom lane
Re: [HACKERS] Review: Non-inheritable check constraints
On Mon, Dec 19, 2011 at 12:07 PM, Nikhil Sontakke nikkh...@gmail.com wrote: It seems hard to believe that ATExecDropConstraint() doesn't need any adjustment. Yeah, I think we could return early for only-type constraints. It's not just that. Suppose that C inherits from B which inherits from A. We add an only constraint to B and a non-only constraint to A. Now, what happens in each of the following scenarios? 1. We drop the constraint from B without specifying ONLY. 2. We drop the constraint from B *with* ONLY. 3. We drop the constraint from A without specifying ONLY. 4. We drop the constraint from A *with* ONLY. Off the top of my head, I suspect that #1 should be an error; #2 should succeed, leaving only the inherited version of the constraint on B; #3 should remove the constraint from A and leave it on B but I'm not sure what should happen to C, and I have no clear vision of what #4 should do. As a followup question, if we do #2 followed by #4, or #4 followed by #2, do we end up with the same final state in both cases? Here's another scenario. B inherits from A. We add a constraint to A using ONLY, and then drop it without ONLY. Does that work or fail? Also, what happens when we add matching constraints to B and A, in each case using ONLY, and then remove the constraint from A without using ONLY? Does anything happen to B's constraint? Why or why not? Just to be clear, I like the feature. But I've done some work on this code before, and it is amazingly easy to screw it up and end up with bugs... so I think lots of careful thought is in order. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
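Robert's scenarios are easier to reason about with explicit per-table bookkeeping; here is a toy sketch (Python) loosely mirroring pg_constraint's conislocal/coninhcount idea. This is an illustration of one plausible answer to scenario #2, not what the committed code does; the dict layout and `drop_only` are invented.

```python
# C inherits from B inherits from A; an ONLY constraint was added to B and a
# non-only (inherited) constraint to A. Each entry records whether the table
# has a locally defined copy and from how many parents it inherits one.
constraints = {
    "A": {"local": True,  "inhcount": 0},  # defined directly on A
    "B": {"local": True,  "inhcount": 1},  # B's ONLY copy merged with A's inherited one
    "C": {"local": False, "inhcount": 1},  # purely inherited, via B
}

def drop_only(table):
    """DROP CONSTRAINT with ONLY: detach just the local definition."""
    c = constraints[table]
    c["local"] = False
    if c["inhcount"] == 0:
        del constraints[table]   # nothing left keeping the constraint alive

# Scenario #2: drop from B *with* ONLY -- the inherited copy survives,
# which is the outcome Robert suspects is correct.
drop_only("B")
assert constraints["B"] == {"local": False, "inhcount": 1}
assert "C" in constraints        # C's inherited copy is untouched
```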
Re: [HACKERS] Review: Non-inheritable check constraints
Excerpts from Nikhil Sontakke's message of lun dic 19 14:07:02 -0300 2011: PFA, revised version containing the above changes based on Alvaro's v4 patch. Committed with these fixes, and with my fix for the pg_dump issue noted by Alex, too. Thanks. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
[HACKERS] [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes
Hi list, Since PostgreSQL 9.1, GIN has new cost estimation code. This code assumes that the only expression type it's going to see is OpExpr. However, ScalarArrayOpExpr has also been possible in earlier versions. Estimating col op ANY (array) queries segfaults in 9.1 if there's a GIN index on the column. Case in point: create table words (word text); create index on words using gin (to_tsvector('english', word)); explain analyze select * from words where to_tsvector('english', word) @@ any ('{foo}'); (It seems that RowCompareExpr and NullTest clauses are impossible for a GIN index -- at least my efforts to find such cases failed) Attached is an attempted fix for the issue. I split out the code for the extract call and now run that for each array element, adding together the average of (partialEntriesInQuals, exactEntriesInQuals, searchEntriesInQuals) for each array element. After processing all quals, I multiply the entries by the number of array_scans (which is the product of all array lengths) to get the total cost. This required a fair bit of refactoring, but I tried to follow the patterns for OpExpr pretty strictly -- discounting scans over NULL elements, returning 0 cost when none of the array elements can match, accounting for cache effects when there are multiple scans, etc. But it's also possible that I have no idea what I'm really doing. :) I also added regression tests for this to tsearch and pg_trgm. 
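The averaging-and-scaling scheme described above can be sketched in back-of-envelope form (Python; the numbers and function are illustrative only, not selfuncs.c): average the per-element entry counts, then multiply by the number of index scans, which is the product of the lengths of all ScalarArrayOpExpr arrays in the quals.

```python
from math import prod
from statistics import mean

def estimate_entries(per_element_entries, array_lengths):
    """per_element_entries: entries extracted for each element of one array qual;
    array_lengths: lengths of all array quals (their product = number of scans)."""
    avg_entries = mean(per_element_entries)   # average cost of one element's scan
    scans = prod(array_lengths)               # one scan per combination of elements
    return avg_entries * scans

# e.g. ANY('{a,b,c}') whose three elements extract 2, 4 and 3 entries:
assert estimate_entries([2, 4, 3], [3]) == 9.0
# two array quals of lengths 2 and 3 -> 6 scans:
assert estimate_entries([2, 2], [2, 3]) == 12.0
```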
Regards, Marti diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out index e7af7d4..250d853 100644 --- a/contrib/pg_trgm/expected/pg_trgm.out +++ b/contrib/pg_trgm/expected/pg_trgm.out @@ -3486,6 +3486,16 @@ explain (costs off) Index Cond: (t ~~* '%BCD%'::text) (4 rows) +explain (costs off) + select * from test2 where t like any ('{%bcd%,qua%}'); + QUERY PLAN +- + Bitmap Heap Scan on test2 + Recheck Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[])) + - Bitmap Index Scan on test2_idx_gin + Index Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[])) +(4 rows) + select * from test2 where t like '%BCD%'; t --- @@ -3509,6 +3519,13 @@ select * from test2 where t ilike 'qua%'; quark (1 row) +select * from test2 where t like any ('{%bcd%,qua%}'); + t + + abcdef + quark +(2 rows) + drop index test2_idx_gin; create index test2_idx_gist on test2 using gist (t gist_trgm_ops); set enable_seqscan=off; @@ -3528,6 +3545,16 @@ explain (costs off) Index Cond: (t ~~* '%BCD%'::text) (2 rows) +explain (costs off) + select * from test2 where t like any ('{%bcd%,qua%}'); + QUERY PLAN +- + Bitmap Heap Scan on test2 + Recheck Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[])) + - Bitmap Index Scan on test2_idx_gist + Index Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[])) +(4 rows) + select * from test2 where t like '%BCD%'; t --- @@ -3551,3 +3578,10 @@ select * from test2 where t ilike 'qua%'; quark (1 row) +select * from test2 where t like any ('{%bcd%,qua%}'); + t + + abcdef + quark +(2 rows) + diff --git a/contrib/pg_trgm/sql/pg_trgm.sql b/contrib/pg_trgm/sql/pg_trgm.sql index ea902f6..ac969e6 100644 --- a/contrib/pg_trgm/sql/pg_trgm.sql +++ b/contrib/pg_trgm/sql/pg_trgm.sql @@ -47,10 +47,13 @@ explain (costs off) select * from test2 where t like '%BCD%'; explain (costs off) select * from test2 where t ilike '%BCD%'; +explain (costs off) + select * from test2 where t like any ('{%bcd%,qua%}'); select * from test2 where t like '%BCD%'; select * from test2 where t like '%bcd%'; select * 
from test2 where t ilike '%BCD%'; select * from test2 where t ilike 'qua%'; +select * from test2 where t like any ('{%bcd%,qua%}'); drop index test2_idx_gin; create index test2_idx_gist on test2 using gist (t gist_trgm_ops); set enable_seqscan=off; @@ -58,7 +61,10 @@ explain (costs off) select * from test2 where t like '%BCD%'; explain (costs off) select * from test2 where t ilike '%BCD%'; +explain (costs off) + select * from test2 where t like any ('{%bcd%,qua%}'); select * from test2 where t like '%BCD%'; select * from test2 where t like '%bcd%'; select * from test2 where t ilike '%BCD%'; select * from test2 where t ilike 'qua%'; +select * from test2 where t like any ('{%bcd%,qua%}'); diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index d06809e..64b732e 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -6591,6 +6591,70 @@ find_index_column(Node *op, IndexOptInfo *index) } /* + * Estimate gin costs for a single search value. Returns false if no match + * is possible, true otherwise. All pointer arguments must be initialized by + * the caller. + */ +static bool +gin_costestimate_search(Oid extractProcOid, Datum value, int strategy_op,
Re: [HACKERS] Page Checksums
Greg Smith g...@2ndquadrant.com wrote: But if you need all that infrastructure just to get the feature launched, that's a bit hard to stomach. Triggering a vacuum or some hypothetical scrubbing feature? Also, as someone who follows Murphy's Law as my chosen religion, If you don't think I pay attention to Murphy's Law, I should recap our backup procedures -- which involves three separate forms of backup, each to multiple servers in different buildings, real-time, plus idle-time comparison of the databases of origin to all replicas with reporting of any discrepancies. And off-line snapshot backups on disk at a records center controlled by a different department. That's in addition to RAID redundancy and hardware health and performance monitoring. Some people think I border on the paranoid on this issue. I would expect this situation could be exactly how flaky hardware would first manifest itself: server crash and a bad CRC on the last thing written out. And in that case, the last thing you want to do is assume things are fine, then kick off a VACUUM that might overwrite more good data with bad. Are you arguing that autovacuum should be disabled after crash recovery? I guess if you are arguing that a database VACUUM might destroy recoverable data when hardware starts to fail, I can't argue. And certainly there are way too many people who don't ensure that they have a good backup before firing up PostgreSQL after a failure, so I can see not making autovacuum more aggressive than usual, and perhaps even disabling it until there is some sort of confirmation (I have no idea how) that a backup has been made. That said, a database VACUUM would be one of my first steps after ensuring that I had a copy of the data directory tree, personally. I guess I could even live with that as recommended procedure rather than something triggered through autovacuum and not feel that the rest of my posts on this are too far off track. 
The main way I expect to validate this sort of thing is with an as yet unwritten function to grab information about a data block from a standby server for this purpose, something like this: Master: Computed CRC A, Stored CRC B; error raised because A!=B Standby: Computed CRC C, Stored CRC D If C==D and A==C, the corruption is probably overwritten bits of the CRC B. Are you arguing we need *that* infrastructure to get the feature launched? -Kevin
Re: [HACKERS] JSON for PG 9.2
On Dec 18, 2011, at 4:41 AM, Magnus Hagander wrote: We can hopefully get around this for the extensions in contrib (and reasonably well has already), but few large companies are going to be happy to go to pgxn and download an extension that has a single maintainer (not the team, and in most cases not even a team), usually no defined lifecycle, no support, etc. (I'm pretty sure you won't get support included for random pgxn modules when you buy a contract from EDB, or CMD, or us, or PGX, or anybody really - whereas if the datatype is in core, you *will* get this) I support having a JSON type in core, but question the assertions here. *Some* organizations won't use PGXN, usually because they require things through a different ecosystem (RPMs, .debs, StackBuilder, etc.). But many others will. There are a *lot* of companies out there that use CPAN, easy_install, and Gem. The same sorts of places will use PGXN. Oh, and at PGX, we'll happily provide support for random modules, so long as you pay for our time. We're not picky (and happy to send improvements back upstream), though we might recommend you switch to something better. But such evaluations are based on quality, not simply on what ecosystem it came from. If we can find a way to have a stable part in core and then have addons that can provide these tons of interesting features (which I agree there are) until such time that they can be considered stable enough for core, I think that's the best compromise. +1, though I think the core type will at least need some basic operators and indexing support. Best, David
Re: [HACKERS] Lots of FSM-related fragility in transaction commit
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 08.12.2011 08:20, Tom Lane wrote: I thought that removing the unreadable file would let me restart, but after rm 52860_fsm and trying again to start the server, there's a different problem: LOG: consistent recovery state reached at 0/5D71BA8 LOG: redo starts at 0/5100050 WARNING: page 18 of relation base/27666/52860 is uninitialized CONTEXT: xlog redo visible: rel 1663/27666/52860; blk 18 PANIC: WAL contains references to invalid pages The bug here is that we consider having immediately reached consistency during crash recovery. That fixes only part of the problem I was on about, though: I don't believe that this type of situation should occur in the first place. We should not be throwing ERROR during post-commit attempts to unlink the physical files of deleted relations. After some investigation I believe that the problem was introduced by the changes to support multiple forks in a relation. Specifically, instead of just doing smgrdounlink() during post-commit, we now do something like this: for (fork = 0; fork <= MAX_FORKNUM; fork++) { if (smgrexists(srel, fork)) smgrdounlink(srel, fork, false); } and if you look into mdexists() you'll find that it throws ERROR for any unexpected errno. This makes smgrdounlink's attempts to be robust rather pointless. I'm not entirely sure whether that behavior of mdexists is a good thing, but it strikes me that the smgrexists call is pretty unnecessary here in the first place. We should just get rid of it and do smgrdounlink unconditionally. The only reason it's there is to keep smgrdounlink from whinging about ENOENT on non-primary forks, which is simply stupid for smgrdounlink to be doing anyway --- it is actually *more* willing to complain about ENOENT on non-primary than primary forks, which has no sense that I can detect. Accordingly I propose the attached patch for HEAD, and similar changes in all branches that have multiple forks.
Note that this makes the warning-report conditions identical for both code paths in mdunlink. regards, tom lane diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index d2fecb1ecb9171fc2c6dd2887d846c8a16779bb0..c5a2541807e148c29c790b0fe12a52c0389f4626 100644 *** a/src/backend/access/transam/twophase.c --- b/src/backend/access/transam/twophase.c *** FinishPreparedTransaction(const char *gi *** 1366,1373 for (fork = 0; fork <= MAX_FORKNUM; fork++) { ! if (smgrexists(srel, fork)) ! smgrdounlink(srel, fork, false); } smgrclose(srel); } --- 1366,1372 for (fork = 0; fork <= MAX_FORKNUM; fork++) { ! smgrdounlink(srel, fork, false); } smgrclose(srel); } diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c index c383011b5f6538fa2b90fa5f7778da7ff59fa679..4177a6c5cf95d5e1a1b49b9f11c17098b2158529 100644 *** a/src/backend/access/transam/xact.c --- b/src/backend/access/transam/xact.c *** xact_redo_commit_internal(TransactionId *** 4595,4605 for (fork = 0; fork <= MAX_FORKNUM; fork++) { ! if (smgrexists(srel, fork)) ! { ! XLogDropRelation(xnodes[i], fork); ! smgrdounlink(srel, fork, true); ! } } smgrclose(srel); } --- 4595,4602 for (fork = 0; fork <= MAX_FORKNUM; fork++) { ! XLogDropRelation(xnodes[i], fork); ! smgrdounlink(srel, fork, true); } smgrclose(srel); } *** xact_redo_abort(xl_xact_abort *xlrec, Tr *** 4738,4748 for (fork = 0; fork <= MAX_FORKNUM; fork++) { ! if (smgrexists(srel, fork)) ! { ! XLogDropRelation(xlrec->xnodes[i], fork); ! smgrdounlink(srel, fork, true); ! } } smgrclose(srel); } --- 4735,4742 for (fork = 0; fork <= MAX_FORKNUM; fork++) { ! XLogDropRelation(xlrec->xnodes[i], fork); ! 
smgrdounlink(srel, fork, true); } smgrclose(srel); } diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c index fff933d403e7ca5e7b551b987e1684cd6a6e98a7..450e292f4013f9be5418f482654cd33c14c6d344 100644 *** a/src/backend/catalog/storage.c --- b/src/backend/catalog/storage.c *** smgrDoPendingDeletes(bool isCommit) *** 361,368 srel = smgropen(pending->relnode, pending->backend); for (i = 0; i <= MAX_FORKNUM; i++) { ! if (smgrexists(srel, i)) ! smgrdounlink(srel, i, false); } smgrclose(srel); } --- 361,367 srel = smgropen(pending->relnode, pending->backend); for (i = 0; i <= MAX_FORKNUM; i++) { ! smgrdounlink(srel, i, false); } smgrclose(srel); } diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c index
Re: [HACKERS] JSON for PG 9.2
On Dec 19, 2011, at 3:39 PM, David E. Wheeler wrote: Well, no, JSON is formally “a lightweight data-interchange format.” It’s derived from JavaScript syntax, but it is not a programming language, so I wouldn’t say it was accurate to describe it as a subset of JS or ECMAScript. Bah, it says “It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999.” But my point still holds: it is not a programming language, and one does not need a PL to have a JSON data type. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Dec 19, 2011, at 2:49 AM, Dimitri Fontaine wrote: My understanding is that JSON is a subset of ECMAscript Well, no, JSON is formally “a lightweight data-interchange format.” It’s derived from JavaScript syntax, but it is not a programming language, so I wouldn’t say it was accurate to describe it as a subset of JS or ECMAScript. http://json.org/ IOW, one does not need a new PL to get this type. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] PATCH: tracking temp files in pg_stat_database
Hello everybody, this patch adds two counters to pg_stat_database to track temporary files - number of temp files and number of bytes. I see this as a useful feature, as temporary files often cause a lot of IO (because of low work_mem etc.). log_temp_files is useful, but you have to parse the log, and only temp files exceeding a size limit are logged, so the actual amount of I/O is unknown. The patch is rather simple: 1) two new fields in PgStat_StatDBEntry (n_temp_files, n_temp_bytes) 2) report/recv function in pgstat.c 3) FileClose modified to log stats for all temp files (see below) 4) two new fields added to pg_stat_database (temp_files, temp_bytes) I had to modify FileClose to call stat() on each temp file as this should log all temp files (not just when log_temp_files >= 0). But the impact of this change should be negligible, considering that the user is already using temp files. I haven't updated the docs yet - let's see if the patch is acceptable at all first. Tomas diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 2253ca8..55d20dc 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -574,6 +574,8 @@ CREATE VIEW pg_stat_database AS pg_stat_get_db_tuples_updated(D.oid) AS tup_updated, pg_stat_get_db_tuples_deleted(D.oid) AS tup_deleted, pg_stat_get_db_conflict_all(D.oid) AS conflicts, +pg_stat_get_db_temp_files(D.oid) AS temp_files, +pg_stat_get_db_temp_bytes(D.oid) AS temp_bytes, pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset FROM pg_database D; diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index 24f4cde..97c7004 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -286,7 +286,7 @@ static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len); static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len); static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len); static void 
pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len); - +static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len); /* * Public functions called from postmaster follow @@ -1339,6 +1339,29 @@ pgstat_report_recovery_conflict(int reason) pgstat_send(&msg, sizeof(msg)); } + +/* + * pgstat_report_tempfile() - + * + * Tell the collector about a temporary file. + * + */ +void +pgstat_report_tempfile(size_t filesize) +{ + PgStat_MsgTempFile msg; + + if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts) + return; + + pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_TEMPFILE); + msg.m_databaseid = MyDatabaseId; + msg.m_filesize = filesize; + pgstat_send(&msg, sizeof(msg)); +} + +; + /* -- * pgstat_ping() - * @@ -3185,6 +3208,10 @@ PgstatCollectorMain(int argc, char *argv[]) pgstat_recv_recoveryconflict((PgStat_MsgRecoveryConflict *) msg, len); break; + case PGSTAT_MTYPE_TEMPFILE: + pgstat_recv_tempfile((PgStat_MsgTempFile *) msg, len); + break; + default: break; } @@ -3266,6 +3293,8 @@ pgstat_get_db_entry(Oid databaseid, bool create) result->n_conflict_snapshot = 0; result->n_conflict_bufferpin = 0; result->n_conflict_startup_deadlock = 0; + result->n_temp_files = 0; + result->n_temp_bytes = 0; result->stat_reset_timestamp = GetCurrentTimestamp(); @@ -4177,6 +4206,8 @@ pgstat_recv_resetcounter(PgStat_MsgResetcounter *msg, int len) dbentry->n_tuples_updated = 0; dbentry->n_tuples_deleted = 0; dbentry->last_autovac_time = 0; + dbentry->n_temp_bytes = 0; + dbentry->n_temp_files = 0; dbentry->stat_reset_timestamp = GetCurrentTimestamp(); @@ -4403,6 +4434,24 @@ pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int len) } /* -- + * pgstat_recv_tempfile() - + * + * Process a PGSTAT_MTYPE_TEMPFILE message. 
+ * -- + */ +static void +pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len) +{ + PgStat_StatDBEntry *dbentry; + + dbentry = pgstat_get_db_entry(msg->m_databaseid, true); + + dbentry->n_temp_bytes += msg->m_filesize; + dbentry->n_temp_files += 1; + +} + +/* -- * pgstat_recv_funcstat() - * * Count what the backend has done. diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c index 9540279..5e40d12 100644 --- a/src/backend/storage/file/fd.c +++ b/src/backend/storage/file/fd.c @@
Re: [HACKERS] Review: Non-inheritable check constraints
PFA, revised version containing the above changes based on Alvaro's v4 patch. Committed with these fixes, and with my fix for the pg_dump issue noted by Alex, too. Thanks. Thanks a lot Alvaro! Regards, Nikhils -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Review: Non-inheritable check constraints
Excerpts from Nikhil Sontakke's message of Mon Dec 19 22:32:57 -0300 2011: PFA, revised version containing the above changes based on Alvaro's v4 patch. Committed with these fixes, and with my fix for the pg_dump issue noted by Alex, too. Thanks. Thanks a lot Alvaro! You're welcome. But did you see Robert Haas' response upthread? It looks like there's still some work to do -- at least some research to determine that we're correctly handling all those cases. As the committer I'm responsible for it, but I don't have resources to get into it any time soon. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: WIP patch: Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation
On Mon, Dec 19, 2011 at 1:27 PM, Phil Sorber p...@omniti.com wrote: On Sat, Dec 17, 2011 at 3:52 PM, Phil Sorber p...@omniti.com wrote: Attached is a patch that addresses the todo item Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation. http://archives.postgresql.org/pgsql-general/2010-10/msg00332.php Instead of returning an error, they now return NULL if the OID is found in pg_class when using SnapshotAny. I only applied it to 4 functions: pg_relation_size, pg_total_relation_size, pg_table_size and pg_indexes_size. These are the ones that were using relation_open(). I changed them to use try_relation_open(). For three of them I had to move the try_relation_open() call up one level in the call stack and change the parameter types for some support functions from Oid to Relation. They now also call a new function, relation_recently_dead(), which checks whether the relation is visible to SnapshotAny. All regression tests passed. Is SnapshotAny the snapshot I should be using? It seems to get the correct results. I can drop a table and I get NULL. Then after a vacuumdb it returns an error. Something I'd like to point out myself is that I need to be using ObjectIdAttributeNumber instead of Anum_pg_class_relfilenode. Probably just luck that I got the intended results before. This will also change the logic in a few other places. I'm still not clear on the semantics of Snapshot{Any|Self|Now|Dirty}. I've attached the updated version of my patch with the changes mentioned in the previous email. 
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c new file mode 100644 index 2ee5966..bc2c862 *** a/src/backend/utils/adt/dbsize.c --- b/src/backend/utils/adt/dbsize.c *** *** 15,20 --- 15,21 #include <sys/stat.h> #include "access/heapam.h" + #include "access/sysattr.h" #include "catalog/catalog.h" #include "catalog/namespace.h" #include "catalog/pg_tablespace.h" *** *** 24,32 --- 25,35 #include "storage/fd.h" #include "utils/acl.h" #include "utils/builtins.h" + #include "utils/fmgroids.h" #include "utils/rel.h" #include "utils/relmapper.h" #include "utils/syscache.h" + #include "utils/tqual.h" /* Return physical size of directory contents, or 0 if dir doesn't exist */ *** calculate_relation_size(RelFileNode *rfn *** 281,286 --- 284,322 return totalsize; } + /* + * relation_recently_dead + * + * Check to see if a relation exists in SnapshotAny + */ + static bool + relation_recently_dead(Oid relOid) + { + Relation classRel; + ScanKeyData key[1]; + HeapScanDesc scan; + bool status; + + classRel = heap_open(RelationRelationId, AccessShareLock); + + ScanKeyInit(&key[0], + ObjectIdAttributeNumber, + BTEqualStrategyNumber, F_OIDEQ, + ObjectIdGetDatum(relOid)); + + scan = heap_beginscan(classRel, SnapshotAny, 1, key); + + if (heap_getnext(scan, ForwardScanDirection) != NULL) + status = true; + else + status = false; + + heap_endscan(scan); + heap_close(classRel, AccessShareLock); + + return status; + } + Datum pg_relation_size(PG_FUNCTION_ARGS) { *** pg_relation_size(PG_FUNCTION_ARGS) *** 289,295 Relation rel; int64 size; ! rel = relation_open(relOid, AccessShareLock); size = calculate_relation_size(&(rel->rd_node), rel->rd_backend, forkname_to_number(text_to_cstring(forkName))); --- 325,339 Relation rel; int64 size; ! rel = try_relation_open(relOid, AccessShareLock); ! ! if (rel == NULL) ! { ! if (relation_recently_dead(relOid)) ! PG_RETURN_NULL(); ! else ! elog(ERROR, "could not open relation with OID %u", relOid); ! 
} size = calculate_relation_size(&(rel->rd_node), rel->rd_backend, forkname_to_number(text_to_cstring(forkName))); *** calculate_toast_table_size(Oid toastreli *** 339,352 * those won't have attached toast tables, but they can have multiple forks. */ static int64 ! calculate_table_size(Oid relOid) { int64 size = 0; - Relation rel; ForkNumber forkNum; - rel = relation_open(relOid, AccessShareLock); - /* * heap size, including FSM and VM */ --- 383,393 * those won't have attached toast tables, but they can have multiple forks. */ static int64 ! calculate_table_size(Relation rel) { int64 size = 0; ForkNumber forkNum; /* * heap size, including FSM and VM */ *** calculate_table_size(Oid relOid) *** 360,367 if (OidIsValid(rel->rd_rel->reltoastrelid)) size += calculate_toast_table_size(rel->rd_rel->reltoastrelid); - relation_close(rel, AccessShareLock); - return size; } --- 401,406 *** calculate_table_size(Oid relOid) *** 371,382 * Can be applied safely to an index, but you'll just get zero. */ static int64 ! calculate_indexes_size(Oid
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On Mon, Dec 19, 2011 at 12:05:09PM -0500, Robert Haas wrote: Yet another option, which I wonder whether we're dismissing too lightly, is to just call GetSnapshotData() and do the scan using a plain old MVCC snapshot. Sure, it's more expensive than SnapshotNow, but is it expensive enough to worry about? Good point. For the most part, we already regard a catalog scan as too expensive for bulk use, hence relcache and catcache. That's not license to slow them down recklessly, but it's worth discovering how much of a hit we'd actually face. I created a function that does this in a loop: HeapTuple t; CatalogCacheFlushCatalog(ProcedureRelationId); t = SearchSysCache1(PROCOID, ObjectIdGetDatum(42) /* int4in */); if (!HeapTupleIsValid(t)) elog(ERROR, "cache lookup failed for function 42"); ReleaseSysCache(t); Then, I had pgbench call the function once per client with various numbers of clients and a loop iteration count such that the total number of scans per run was always 1920. Results for master and for a copy patched to use MVCC snapshots in catcache.c only: 2 clients, master: 4:30.66elapsed 4 clients, master: 4:26.82elapsed 32 clients, master: 4:25.30elapsed 2 clients, master: 4:25.67elapsed 4 clients, master: 4:26.58elapsed 32 clients, master: 4:26.40elapsed 2 clients, master: 4:27.54elapsed 4 clients, master: 4:26.60elapsed 32 clients, master: 4:27.20elapsed 2 clients, mvcc-catcache: 4:35.13elapsed 4 clients, mvcc-catcache: 4:30.40elapsed 32 clients, mvcc-catcache: 4:37.91elapsed 2 clients, mvcc-catcache: 4:28.13elapsed 4 clients, mvcc-catcache: 4:27.06elapsed 32 clients, mvcc-catcache: 4:32.84elapsed 2 clients, mvcc-catcache: 4:32.47elapsed 4 clients, mvcc-catcache: 4:24.35elapsed 32 clients, mvcc-catcache: 4:31.54elapsed I see roughly a 2% performance regression. However, I'd expect any bulk losses to come from increased LWLock contention, which just doesn't materialize in a big way on this 2-core box. 
If anyone would like to rerun this on a larger machine, I can package it up for reuse. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ToDo: conditional ALTER TABLE
On Mon, Dec 19, 2011 at 10:25:34AM +0100, Pavel Stehule wrote: I am working on quiet dumps now. I found a small issue. pg_dump produces statements like ALTER TABLE ONLY public.b DROP CONSTRAINT b_fk_fkey; ALTER TABLE ONLY public.a DROP CONSTRAINT a_pkey; DROP TABLE IF EXISTS public.b; DROP TABLE IF EXISTS public.a; Actually there is no conditional ALTER. These statements must come before the DROPs, but then they can fail when the tables are missing. So some form like ALTER TABLE IF EXISTS ... would be useful. If ALTER TABLE is the only ALTER command you'd need to change this way, I think your proposal is good. nm (who has never used pg_dump --clean) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
Noah Misch n...@leadboat.com writes: On Mon, Dec 19, 2011 at 12:05:09PM -0500, Robert Haas wrote: Yet another option, which I wonder whether we're dismissing too lightly, is to just call GetSnapshotData() and do the scan using a plain old MVCC snapshot. Sure, it's more expensive than SnapshotNow, but is it expensive enough to worry about? That might actually be workable ... I created a function that does this in a loop: HeapTuple t; CatalogCacheFlushCatalog(ProcedureRelationId); t = SearchSysCache1(PROCOID, ObjectIdGetDatum(42) /* int4in */); if (!HeapTupleIsValid(t)) elog(ERROR, "cache lookup failed for function 42"); ReleaseSysCache(t); ... but this performance test seems to me to be entirely misguided, because it's testing a situation that isn't going to occur much in the field, precisely because the syscache should prevent constant reloads of the same syscache entry. Poking around a bit, it looks to me like one of the bigger users of non-cache-fronted SnapshotNow scans is dependency.c. So maybe testing the speed of large cascaded drops would be a more relevant test case. For instance, create a schema containing a few thousand functions, and see how long it takes to drop the schema. Another thing that would be useful to know is what effect such a change would have on the time to run the regression tests with CLOBBER_CACHE_ALWAYS. That has nothing to do with any real-world performance concerns, but it would be good to see if we're going to cause a problem for the long-suffering buildfarm member that does that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Non-inheritable check constraints
But did you see Robert Haas' response upthread? It looks like there's still some work to do -- at least some research to determine that we're correctly handling all those cases. As the committer I'm responsible for it, but I don't have resources to get into it any time soon. Yeah, am definitely planning to test out those scenarios and will respond some time late today. Regards, Nikhils
Re: [HACKERS] RangeVarGetRelid()
On Mon, Dec 19, 2011 at 3:12 PM, Noah Misch n...@leadboat.com wrote: I'm satisfied that you and I have a common understanding of the options and their respective merits. There's plenty of room for judgement in choosing which merits to seize at the expense of others. Your judgements here seem entirely reasonable. Thanks. I'm not trying to be difficult. After staring at this for quite a while longer, it seemed to me that the logic for renaming a relation was similar enough to the logic for changing a schema that the two callbacks could reasonably be combined using a bit of conditional logic; and that, further, the same callback could be used, with a small amount of additional modification, for ALTER TABLE. Here's a patch to do that. I also notice that cluster() - which doesn't have a callback - has exactly the same needs as ReindexRelation() - which does. So that case can certainly share code; though I'm not quite sure what to call the shared callback, or which file to put it in. RangeVarCallbackForStorageRewrite? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company rangevargetrelid-callback-round3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON for PG 9.2
On Mon, Dec 19, 2011 at 6:26 PM, David E. Wheeler da...@kineticode.com wrote: +1, though I think the core type will at least need some basic operators and indexing support. And I'm willing to do that, but I thought it best to submit a bare bones patch first, in the hopes of minimizing the number of objectionable things therein. For example, if you want to be able to index a JSON column, you have to decide on some collation order that is consistent with JSON's notion of equality, and it's not obvious what is most logical. Heck, equality itself isn't 100% obvious. If there's adequate support for including JSON in core, and nobody objects to my implementation, then I'll throw some ideas for those things up against the wall and see what sticks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes
Marti Raudsepp ma...@juffo.org writes: Since PostgreSQL 9.1, GIN has new cost estimation code. This code assumes that the only expression type it's going to see is OpExpr. However, ScalarArrayOpExpr has also been possible in earlier versions. Estimating col op ANY (array) queries segfaults in 9.1 if there's a GIN index on the column. Ugh. I think we subconsciously assumed that ScalarArrayOpExpr couldn't appear here because GIN doesn't set amsearcharray, but of course that's wrong. (It seems that RowCompareExpr and NullTest clauses are impossible for a GIN index -- at least my efforts to find such cases failed) No, those are safe for the moment --- indxpath.c has a hard-wired assumption that RowCompareExpr is only usable with btree, and NullTest is only considered to be indexable if amsearchnulls is set. Still, it'd likely be better if this code ignored unrecognized qual expression types rather than Assert'ing they're not there. It's not like the cases it *does* handle are done so perfectly that ignoring an unknown qual could be said to bollix the estimate unreasonably. Attached is an attempted fix for the issue. I split out the code for the extract call and now run that for each array element, adding together the average of (partialEntriesInQuals, exactEntriesInQuals, searchEntriesInQuals) for each array element. After processing all quals, I multiply the entries by the number of array_scans (which is the product of all array lengths) to get the total cost. This required a fair bit of refactoring, but I tried to follow the patterns for OpExpr pretty strictly -- discounting scans over NULL elements, returning 0 cost when none of the array elements can match, accounting for cache effects when there are multiple scans, etc. But it's also possible that I have no idea what I'm really doing. :) Hmm. I am reminded of how utterly unreadable diff -u format is for anything longer than single-line changes :-( ... but I think I don't like this refactoring much. 
Will take a closer look tomorrow. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: Non-inheritable check constraints
Hi Robert, First of all, let me state that this ONLY feature has not messed around with existing inheritance semantics. It allows attaching a constraint to any table (which can be part of any hierarchy) without the possibility of it ever playing any part in future or existing inheritance hierarchies. It is specific to that table, period. It's not just that. Suppose that C inherits from B which inherits from A. We add an only constraint to B and a non-only constraint to A. Now, what happens in each of the following scenarios? An example against latest HEAD should help here: create table A(ff1 int); create table B () inherits (A); create table C () inherits (B); alter table A add constraint Achk check (ff1 > 10); The above will attach Achk to A, B and C. alter table only B add constraint Bchk check (ff1 > 0); The above will attach Bchk ONLY to table B. 1. We drop the constraint from B without specifying ONLY. postgres=# alter table B drop constraint Achk; ERROR: cannot drop inherited constraint "achk" of relation "b" The above is existing inheritance-based behaviour. Now let's look at the ONLY constraint: postgres=# alter table B drop constraint Bchk; ALTER TABLE Since this constraint is not part of any hierarchy, it can be removed. postgres=# alter table only B add constraint bchk check (ff1 > 0); ALTER TABLE postgres=# alter table only B drop constraint Bchk; ALTER TABLE So only constraints do not need the "only B" qualification to be deleted. They work both ways and can always be deleted without any issues. 2. We drop the constraint from B *with* ONLY. postgres=# alter table only B drop constraint Achk; ERROR: cannot drop inherited constraint "achk" of relation "b" The above is existing inheritance-based behavior. So regardless of ONLY, an inherited constraint cannot be removed from the middle of the hierarchy. 3. We drop the constraint from A without specifying ONLY. 
postgres=# alter table A drop constraint Achk; ALTER TABLE This removes the constraint from the entire hierarchy across A, B and C. Again existing inheritance behavior. 4. We drop the constraint from A *with* ONLY. postgres=# alter table only A drop constraint Achk; ALTER TABLE This converts the Achk constraint belonging to B into a local one. C still has it as an inherited constraint from B. We can now delete those constraints as per existing inheritance semantics. However I hope the difference between these and ONLY constraints is clear. The Achk constraint associated with B can get inherited in the future whereas only constraints will not be. Off the top of my head, I suspect that #1 should be an error; It's an error for inherited constraints, but not for only constraints. #2 should succeed, leaving only the inherited version of the constraint on B; Yeah, only-constraint removal succeeds, whereas inherited constraints cannot be removed. #3 should remove the constraint from A and leave it on B but I'm not sure what should happen to C, This removes the constraint across the entire hierarchy. and I have no clear vision of what #4 should do. This removes the constraint from A, but maintains the inheritance relationship between B and C. Again standard existing inheritance semantics. As a followup question, if we do #2 followed by #4, or #4 followed by #2, do we end up with the same final state in both cases? Yeah. #2 is not able to do much really because we do not allow inherited constraints to be removed from the middle of the hierarchy. Here's another scenario. B inherits from A. We add a constraint to A using ONLY, and then drop it without ONLY. Does that work or fail? The constraint gets added to A and since it is an only constraint, its removal both with and without "only A" works just fine. Also, what happens if we add matching constraints to B and A, in each case using ONLY, and then remove the constraint from A without using ONLY? Does anything happen to B's constraint? Why or why not? 
Again the key differentiation here is that only constraints are bound to that table and won't ever be inherited. So this works just fine. postgres=# alter table only A add constraint A2chk check (ff1 > 10); ALTER TABLE postgres=# alter table only B add constraint A2chk check (ff1 > 10); ALTER TABLE Just to be clear, I like the feature. But I've done some work on this code before, and it is amazingly easy to screw it up and end up with bugs... so I think lots of careful thought is in order. Agreed. I just tried out the scenarios laid out by you both with and without the committed patch and AFAICS, normal inheritance semantics have been preserved properly even after the commit. Regards, Nikhils