Re: [HACKERS] Incorrect behaviour when using a GiST index on points
On Thu, Feb 16, 2012 at 11:43 AM, Alexander Korotkov aekorot...@gmail.com wrote: The described differences lead to incorrect behaviour of the GiST index. The question is: what is the correct way to fix it? Should on_pb also use the FP* macros, or should the consistent method behave like on_pb? Any comments on this? The current behaviour definitely indicates a bug, and I'm ready to fix it. The only question: is this bug in on_pb or in gist? -- With best regards, Alexander Korotkov.
Re: [HACKERS] Google Summer of Code? Call for mentors.
On Thu, Feb 16, 2012 at 12:56 PM, Dave Page dp...@pgadmin.org wrote: On Wed, Feb 15, 2012 at 10:18 PM, Josh Berkus j...@agliodbs.com wrote: Hackers, The call is now open for Google Summer of Code. If you are interested in being a GSoC mentor this summer, please reply to this email. I want to gauge whether or not we should participate this summer. I'll play, but I think I'm going to have to be very sure about any project before I agree to mentor it this year (not that there was anything wrong with my student last year - quite the opposite in fact - I just need to be sure I'm spending my time on the right things). FYI, I found that I'm eligible this year. So, if PostgreSQL participates this year, I'll put together some proposals on indexing. -- With best regards, Alexander Korotkov.
Re: [HACKERS] Bugs/slowness inserting and indexing cubes
On Wed, Feb 15, 2012 at 7:28 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alexander Korotkov aekorot...@gmail.com writes: On Wed, Feb 15, 2012 at 4:26 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: So, I think we should go with your original fix and simply do nothing in gistRelocateBuildBuffersOnSplit() if the page doesn't have a buffer. I agree. OK, I won't object. So, I think we can just commit the first version of the fix now. -- With best regards, Alexander Korotkov.

*** a/src/backend/access/gist/gistbuildbuffers.c
--- b/src/backend/access/gist/gistbuildbuffers.c
***************
*** 607,617 **** gistRelocateBuildBuffersOnSplit(GISTBuildBuffers *gfbb, GISTSTATE *giststate,
  	if (!found)
  	{
  		/*
! 		 * Node buffer should exist at this point. If it didn't exist before,
! 		 * the insertion that caused the page to split should've created it.
  		 */
! 		elog(ERROR, "node buffer of page being split (%u) does not exist",
! 			 blocknum);
  	}

  	/*
--- 607,616 ----
  	if (!found)
  	{
  		/*
! 		 * Page without buffer could be produced by split of root page. So
! 		 * we've just nothing to do here when there is no buffer.
  		 */
! 		return;
  	}

  	/*

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] leakproof
On 2012-02-20 06:37, Don Baccus wrote: On Feb 19, 2012, at 7:24 PM, Tom Lane wrote: It's not clear to me whether pure/leakproof functions are meant to be a strict subset of immutable functions Superset, not subset, unless my guessing is wrong. How could pure be anything but a subset of immutable, if immutable functions are not necessarily leakproof/pure, and all leakproof/pure functions are immutable? If the latter is not the case, pure leads to confusion as well. What about discreet? -- Yeb
Re: [HACKERS] leakproof
Greg Stark wrote: I suspect this is wrong for similar reasons as pure, but I'll throw it out there: hermetic. I personally have no problem with leakproof, but if it does not tickle the right associations with many people: what about secure? It is also not self-explanatory, but it might give people the idea that it's more a problem of restricting access to information than anything else. On the downside, security has been over- and abused so much already. Yours, Laurenz Albe
Re: [HACKERS] wal_buffers
On Mon, Feb 20, 2012 at 4:10 AM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Feb 20, 2012 at 3:08 AM, Robert Haas robertmh...@gmail.com wrote: On Sun, Feb 19, 2012 at 9:46 AM, Euler Taveira de Oliveira eu...@timbira.com wrote: On 19-02-2012 02:24, Robert Haas wrote: I have attached tps scatterplots. The obvious conclusion appears to be that, with only 16MB of wal_buffers, the buffer wraps around with some regularity Isn't it useful to print some messages on the log when we have wrap around? In this case, we have an idea that wal_buffers needs to be increased. I was thinking about that. I think that what might be more useful than a log message is a counter somewhere in shared memory. Logging imposes a lot of overhead, which is exactly what we don't want here, and the volume might be quite high on a system that is bumping up against this problem. Of course then the question is... how would we expose the counter value? There is no existing statistics view suitable to include such information. What about defining pg_stat_xlog or something? Perhaps pg_stat_perf so we don't need to find a new home every time. Thinking about it, I think renaming pg_stat_bgwriter would make more sense. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Google Summer of Code? Call for mentors.
Hello, On 01-02-2012 20:59, Josh Berkus wrote: Hackers, The call is now open for Google Summer of Code. If you are interested in being a GSoC mentor this summer, please reply to this email. I want to gauge whether or not we should participate this summer. Yes, I'll be available to mentor a student on phpPgAdmin this year again. Thanks, -- Jehan-Guillaume (ioguix) de Rorthais http://www.dalibo.com
Re: Scaling XLog insertion (was Re: [HACKERS] Moving more work outside WALInsertLock)
On Sun, Feb 19, 2012 at 3:01 AM, Jeff Janes jeff.ja...@gmail.com wrote: I've tested your v9 patch. I no longer see any inconsistencies or lost transactions in the recovered database. But occasionally I get databases that fail to recover at all. It has always been with the exact same failed assertion, at xlog.c line 2154. I've only seen this 4 times out of 2202 cycles of crash and recovery, so it must be some rather obscure situation.

LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/180001B0
LOG:  unexpected pageaddr 0/15084000 in log file 0, segment 25, offset 540672
LOG:  redo done at 0/19083FD0
LOG:  last completed transaction was at log time 2012-02-17 11:13:50.369488-08
LOG:  checkpoint starting: end-of-recovery immediate
TRAP: FailedAssertion("!((((((uint64) (NewPageEndPtr).xlogid * (uint64) (((uint32) 0xffffffff) / ((uint32) (16 * 1024 * 1024))) * ((uint32) (16 * 1024 * 1024))) + ((NewPageEndPtr).xrecoff - 1)) / 8192) % (XLogCtl->XLogCacheBlck + 1)) == nextidx)", File: "xlog.c", Line: 2154)
LOG:  startup process (PID 5390) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure

I could reproduce this when I made the server crash just after executing select pg_switch_xlog().

$ initdb -D data
$ pg_ctl -D data start
$ psql -c "select pg_switch_xlog()"
$ pg_ctl -D data stop -m i
$ pg_ctl -D data start
...
LOG:  redo done at 0/16E3B0C
TRAP: FailedAssertion("!((((((uint64) (NewPageEndPtr).xlogid * (uint64) (((uint32) 0xffffffff) / ((uint32) (16 * 1024 * 1024))) * ((uint32) (16 * 1024 * 1024))) + ((NewPageEndPtr).xrecoff - 1)) / 8192) % (XLogCtl->XLogCacheBlck + 1)) == nextidx)", File: "xlog.c", Line: 2154)
LOG:  startup process (PID 16361) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure

Though I've not read the new patch yet, I suspect the xlog switch code still has a bug.
Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
I wrote: Shigeru Hanada wrote: - Since a rescan is done by rewinding the cursor, is it necessary to have any other remote isolation level than READ COMMITTED? There is only one query issued per transaction. If multiple foreign tables on a foreign server are used in a local query, multiple queries are executed in a remote transaction. So IMO isolation levels are useful even if a remote query is executed only once. Oh, I see. You are right. I thought some more about this and changed my mind. If your query involves foreign scans on two foreign tables on the same foreign server, these should always see the same snapshot, because that's how it works with two scans in one query on local tables. So I think it should be REPEATABLE READ in all cases - SERIALIZABLE is not necessary as long as all you do is read. Yours, Laurenz Albe
Re: [HACKERS] Qual evaluation cost estimates for GIN indexes
I looked into the complaint here of poor estimation for GIN indexscans: http://archives.postgresql.org/pgsql-performance/2012-02/msg00028.php At first glance it sounds like a mistake in selectivity estimation, but it isn't: the rowcount estimates are pretty nearly dead on. The problem is in the planner's estimate of the cost of executing the @@ operator. We have pg_proc.procost set to 1 for ts_match_vq, but actually it's a good deal more expensive than that. Some experimentation suggests that @@ might be about 500 times as expensive as a simple integer comparison. I don't propose pushing its procost up that much, but surely at least 10 would be appropriate, maybe even 100. However ... if you just alter pg_proc.procost in Marc's example, the planner *still* picks a seqscan, even though its estimate of the seqscan cost surely does go up. The reason is that its estimate of the GIN indexscan cost goes up just as much, since we charge one qual eval cost per returned tuple in gincostestimate. It is easy to tell from the actual runtimes that that is not what's happening in a GIN indexscan; we are not re-executing the @@ operator for every tuple. But the planner's cost model doesn't know that. Hello, many thanks for your feedback. I've repeated my test with a table using plain storage, which halved the query time. This confirms that detoasting is the major issue for cost estimation, but even with plain storage the table scan remains about 30% slower compared to the index scan. I've also looked for complex tsqueries where the planner would make a better choice when the statistics are available but found none. In some cases I got an identical plan, in other an inversion of the plans (with NOT operator(s)). In all cases where the plans differed, the planner chose the worse one, with severe time differences. So a naive 'empirical' question: In case of an inverted index in non lossy situation, shouldn't the planner also invert its cost assumptions? 
best regards, Marc Mamin

toast impact: query: select id from table where v @@ 'fooblablabla'::tsquery
toasted table, analyzed: 813 ms (table scan) http://explain.depesz.com/s/EoP
plain storage, analyzed: 404 ms (table scan) http://explain.depesz.com/s/iGX
without analyze: 280 ms (index scan) http://explain.depesz.com/s/5aGL

other queries: v @@ '(lexeme1 | lexeme4) & !(lexeme2 | lexeme3)'::tsquery http://explain.depesz.com/s/BC7 (index scan in both cases)

plan switch: v @@ '! fooblablabla'::tsquery
plain storage, analyzed: 2280 ms (index scan!) http://explain.depesz.com/s/gCt
without analyze: 760 ms (table scan!) http://explain.depesz.com/s/5aGL

There are a couple of issues that would have to be addressed to make this better: 1. If we shouldn't charge procost per row, what should we charge? It's probably reasonable to assume that the primitive GIN-index-entry comparison operations have cost equal to one cpu_operator_cost regardless of what's assigned to the user-visible operators, but I'm not convinced that that's sufficient to model complicated operators. It might be okay to charge that much per index entry visited rather than driving it off the number of heap tuples returned. The code in gincostestimate goes to considerable lengths to estimate the number of index pages fetched, and it seems like it might be able to derive the number of index entries visited too, but it's not trying to account for any CPU costs at the moment. 2. What about lossy operators, or lossy bitmap scans? If either of those things happen, we *will* re-execute the @@ operator at visited tuples, so discounting its high cost would be a mistake. But the planner has no information about either effect, since we moved all support for lossiness to runtime. I think it was point #2 that led us to not consider these issues before. But now that we've seen actual cases where the planner makes a poor decision because it's not modeling this effect, I think we ought to try to do something about it.
I haven't got time to do anything about this for 9.2, and I bet you don't either, but it ought to be on the TODO list to try to improve this. BTW, an entirely different line of thought is "why on earth is @@ so frickin expensive, when it's comparing already-processed tsvectors with only a few entries to an already-processed tsquery with only one entry??". This test case suggests to me that there's something unnecessarily slow in there, and a bit of micro-optimization effort might be well repaid. regards, tom lane
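For anyone who wants to experiment with Tom's procost suggestion on their own data, the per-call cost estimate of the function behind the @@ operator can be raised with ALTER FUNCTION rather than by poking pg_proc directly. This is a sketch; the value 100 is illustrative, not a tested recommendation:

```sql
-- Raise the planner's estimated per-call cost of ts_match_vq, the function
-- underlying the tsvector @@ tsquery operator (default procost is 1, i.e.
-- one cpu_operator_cost per call).
ALTER FUNCTION ts_match_vq(tsvector, tsquery) COST 100;
```

Note that, as discussed above, this alone does not change the plan choice in Marc's example, because gincostestimate currently charges the same qual cost once per returned tuple, inflating the indexscan estimate by the same factor.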
Re: [HACKERS] 16-bit page checksums for 9.2
On Mon, Feb 20, 2012 at 12:42 AM, Robert Haas robertmh...@gmail.com wrote: I think we could do worse than this patch, but I don't really believe it's ready for commit. We don't have a single performance number showing how much of a performance regression this causes, either on the master or on the standby, on any workload, much less those where a problem is reasonably forseeable from the design you've chosen. Many controversial aspects of the patch, such as the way you're using the buffer header spinlocks as a surrogate for x-locking the buffer, or the right way of obtaining the bit-space the patch requires, haven't really been discussed, and to the extent that they have been discussed, they have not been agreed. These points are being discussed here. If you have rational objections, say what they are. On the former point, you haven't updated src/backend/storage/buffer/README at all; I've updated the appropriate README file which relates to page LSNs to explain. but updating is not by itself sufficient. Before we change the buffer-locking rules, we ought to have some discussion of whether it's OK to do that, and why it's necessary. Why it's necessary would presumably include demonstrating that the performance of the straightforward implementation stinks, and that with this change is better. What straightforward implementation is that?? This *is* the straightforward one. God knows what else we'd break if we drop the lock, reacquire as an exclusive, then drop it again and reacquire in shared mode. Code tends to assume that when you take a lock you hold it until you release; doing otherwise would allow all manner of race conditions to emerge. And do you really think that would be faster? You haven't made any effort to do that whatsoever, or if you have, you haven't posted the results here. 
I'm pretty un-excited by that change, first because I think it's a modularity violation and possibly unsafe, and second because I believe we've already got a problem with buffer header spinlock contention which this might exacerbate. Another disadvantage of the current scheme is that there's no particularly easy way to know that your whole cluster has checksums. No matter how we implement checksums, you'll have to rewrite every table in the cluster in order to get them fully turned on. But with the current design, there's no easy way to know how much of the cluster is actually checksummed. You can read every block and check. Using what tool? Pageinspect? If you shut checksums off, they'll linger until those pages are rewritten, and there's no easy way to find the relations from which they need to be removed, either. We can't have it both ways. Either we have an easy upgrade, or we don't. I'm told that an easy upgrade is essential. Easy upgrade and removal of checksums are unrelated issues AFAICS. I'm tempted to suggest a relation-level switch: when you want checksums, you use ALTER TABLE to turn them on, and when you don't want them any more you use ALTER TABLE to shut them off again, in each case rewriting the table. That way, there's never any ambiguity about what's in the data pages in a given relation: either they're all checksummed, or none of them are. This moves the decision about whether checksums are enabled or disabled quite a bit further away from the data itself, and also allows you to determine (by catalog inspection) which parts of the cluster do in fact have checksums. It might be kind of a pain to implement, though: you'd have to pass the information about how any given relation was configured down to the place where we validate page sanity. I'm not sure whether that's practical. It's not practical as the only mechanism, given the requirement for upgrade. Why not? The version stuff is catered for by the current patch.
Your patch reuses the version number field for an unrelated purpose and includes some vague hints of how we might do versioning using some of the page-level flag bits. Since bit-space is fungible, I think it makes more sense to keep the version number where it is and carve the checksum out of whatever bit space you would have moved the version number into. Whether that's pd_tli or pd_flags is somewhat secondary; the point is that you can certainly arrange to leave the 8 bits that currently contain the version number as they are. If you've got some rational objections, please raise them. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Potential reference miscounts and segfaults in plpython.c
On 20/02/12 04:29, Tom Lane wrote: Jan Urbański wulc...@wulczer.org writes: On 18/02/12 21:17, Tom Lane wrote: Dave Malcolm at Red Hat has been working on a static code analysis tool for Python-related C code. He reports here on some preliminary results for plpython.c: https://bugzilla.redhat.com/show_bug.cgi?id=795011 Here's a patch that fixes everything I was sure was an actual bug. The rest of the warnings seem to be caused by the tool not knowing that elog(ERROR) throws a longjmp, and things like "we never unref this object, so it can't disappear mid-execution". My only comment is whether elog(ERROR) is appropriate, ie, do we consider these to be internal errors that users will never see in practice? AFAICS these errors can only happen on out-of-memory conditions or other internal errors (like trying to create a list with a negative length). Cheers, Jan
[HACKERS] array_to_json re-encodes ARRAY of json type
If we pass an ARRAY of json type to the array_to_json() function, the function seems to re-encode the JSON text. But shouldn't the following examples produce the same result? I'm not sure why we don't have a special case for the json type in datum_to_json() -- do we need to pass through json values in it?

=# \x
=# SELECT '["A"]'::json, array_to_json(ARRAY['A']), array_to_json(ARRAY['"A"'::json]);
-[ RECORD 1 ]-+-----------
json          | ["A"]
array_to_json | ["A"]
array_to_json | ["\"A\""]

-- Itagaki Takahiro
Re: [HACKERS] Potential reference miscounts and segfaults in plpython.c
On Mon, Feb 20, 2012 at 5:05 AM, Jan Urbański wulc...@wulczer.org wrote: On 20/02/12 04:29, Tom Lane wrote: Jan Urbański wulc...@wulczer.org writes: On 18/02/12 21:17, Tom Lane wrote: Dave Malcolm at Red Hat has been working on a static code analysis tool for Python-related C code. He reports here on some preliminary results for plpython.c: https://bugzilla.redhat.com/show_bug.cgi?id=795011 Here's a patch that fixes everything I was sure was an actual bug. The rest of the warnings seem to be caused by the tool not knowing that elog(ERROR) throws a longjmp, and things like "we never unref this object, so it can't disappear mid-execution". My only comment is whether elog(ERROR) is appropriate, ie, do we consider these to be internal errors that users will never see in practice? AFAICS these errors can only happen on out-of-memory conditions or other internal errors (like trying to create a list with a negative length). We typically want out of memory to use ereport, so that it gets translated. For example, in fd.c we have:

ereport(FATAL,
        (errcode(ERRCODE_OUT_OF_MEMORY),
         errmsg("out of memory")));

Trying to create a list with a negative length sounds similar. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] 16-bit page checksums for 9.2
On Mon, Feb 20, 2012 at 4:18 AM, Simon Riggs si...@2ndquadrant.com wrote: What straightforward implementation is that?? This *is* the straightforward one. God knows what else we'd break if we drop the lock, reacquire as an exclusive, then drop it again and reacquire in shared mode. Code tends to assume that when you take a lock you hold it until you release; doing otherwise would allow all manner of race conditions to emerge. And do you really think that would be faster? I don't know, but neither do you, because you apparently haven't tried it. Games where we drop the shared lock and get an exclusive lock are used in numerous places in the source code; see, for example, heap_update(), so I don't believe that there is any reason to reject that a priori. Indeed, I can think of at least one pretty good reason to do it exactly that way: it's the way we've handled all previous problems of this type, and in general it's better to make new code look like existing code than to invent something new, especially when you haven't made any effort to quantify the problem or the extent to which the proposed solution mitigates it. If you've got some rational objections, please raise them. Look, I think that the objections I have been raising are rational. YMMV, and obviously does. The bottom line here is that I can't stop you from committing this patch, and we both know that. And, really, at the end of the day, I have no desire to keep you from committing this patch, even if I had the power to do so, because I agree that the feature has merit. But I can object to it and, as the patch stands now, I do object to it, for the following specific reasons: 1. You haven't posted any meaningful performance test results, of either normal cases or worst cases. 2. You're fiddling with the buffer locking in a way that I'm very doubtful about without having tried any of the alternatives. 3. You're rearranging the page header in a way that I find ugly and baroque. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Displaying accumulated autovacuum cost
On Mon, Feb 20, 2012 at 2:49 AM, Fujii Masao masao.fu...@gmail.com wrote: In pg_stat_statements--1.0--1.1.sql, should we complain if the script is sourced in psql, as follows?

\echo Use "ALTER EXTENSION pg_stat_statements UPDATE TO '1.1'" to load this file. \quit

Yeah, maybe. I don't know if we want to put that in every file forever, but I guess it probably makes sense to do it here.

+DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \
+	pg_stat_statements--unpackaged--1.0.sql

Though I'm not familiar with CREATE EXTENSION: why did you exclude 1.0.sql from DATA? In hstore/Makefile, 1.0.sql is included. Do you think we should prevent the old version (i.e., 1.0) of pg_stat_statements from being used in 9.2? I'm not sure. My feeling is that we probably don't want to ship all the old scripts forever. People should install the latest version, and use the upgrade scripts to get there if they have an older one. So my gut feeling here is to change hstore to exclude that file rather than adding it here. Any other opinions? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] array_to_json re-encodes ARRAY of json type
On 02/20/2012 07:30 AM, Itagaki Takahiro wrote: If we pass an ARRAY of json type to the array_to_json() function, the function seems to re-encode the JSON text. But shouldn't the following examples produce the same result? I'm not sure why we don't have a special case for the json type in datum_to_json() -- do we need to pass through json values in it?

=# \x
=# SELECT '["A"]'::json, array_to_json(ARRAY['A']), array_to_json(ARRAY['"A"'::json]);
-[ RECORD 1 ]-+-----------
json          | ["A"]
array_to_json | ["A"]
array_to_json | ["\"A\""]

Hmm, maybe. The trouble is that datum_to_json doesn't know what type it's getting, only the type category. We could probably fudge it by faking a false one for JSON, say with a lower case 'j', which should be fairly future-proof, where the category is detected - for efficiency reasons we do this for the whole array rather than for each element of the array. There's another case I have on my list to fix too - some numeric output such as NaN and Infinity are not legal JSON numeric values, and need to be quoted in order to avoid generating illegal JSON. cheers andrew
Re: [HACKERS] Qual evaluation cost estimates for GIN indexes
On Mon, Feb 20, 2012 at 10:18:31AM +0100, Marc Mamin wrote: I looked into the complaint here of poor estimation for GIN indexscans: http://archives.postgresql.org/pgsql-performance/2012-02/msg00028.php At first glance it sounds like a mistake in selectivity estimation, but it isn't: the rowcount estimates are pretty nearly dead on. The problem is in the planner's estimate of the cost of executing the @@ operator. We have pg_proc.procost set to 1 for ts_match_vq, but actually it's a good deal more expensive than that. Some experimentation suggests that @@ might be about 500 times as expensive as a simple integer comparison. I don't propose pushing its procost up that much, but surely at least 10 would be appropriate, maybe even 100. However ... if you just alter pg_proc.procost in Marc's example, the planner *still* picks a seqscan, even though its estimate of the seqscan cost surely does go up. The reason is that its estimate of the GIN indexscan cost goes up just as much, since we charge one qual eval cost per returned tuple in gincostestimate. It is easy to tell from the actual runtimes that that is not what's happening in a GIN indexscan; we are not re-executing the @@ operator for every tuple. But the planner's cost model doesn't know that. Hello, many thanks for your feedback. I've repeated my test with a table using plain storage, which halved the query time. This confirms that detoasting is the major issue for cost estimation, but even with plain storage the table scan remains about 30% slower compared to the index scan. Hi Marc, Do you happen to know in which function the extra time for the toast storage is spent -- zlib compression? I saw a mention of the LZ4 compression algorithm, BSD licensed, as a Google summer of code project: http://code.google.com/p/lz4/ -- it compresses almost 7X faster than zlib (-1) and decompresses at 6X.
Regards, Ken
Re: [HACKERS] Qual evaluation cost estimates for GIN indexes
Hi Marc, Do you happen to know in which function the extra time for the toast storage is spent -- zlib compression? I saw a mention of the LZ4 compression algorithm, BSD licensed, as a Google summer of code project: http://code.google.com/p/lz4/ -- it compresses almost 7X faster than zlib (-1) and decompresses at 6X. Regards, Ken Hi, No, and my concern is more about cost estimation for tsqueries / GIN indexes than about the detoasting issue. regards, Marc Mamin
Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label
On 2012-02-05 10:09, Kohei KaiGai wrote: The attached part-1 patch moves related routines from hooks.c to label.c because of references to static variables. The part-2 patch implements the above mechanism. I took a short look at this patch but am stuck getting the regression test to run properly. First, patch 2 misses the file sepgsql.sql.in and therefore the creation function command for sepgsql_setcon is missing. When that was solved, ./test_psql failed on the message that the psql file may not be of object type unconfined_t. Once it was changed to bin_t, the result output for the domain transition gives differences on this output (the other parts of label.sql were ok):

--
-- Test for Dynamic Domain Transition
--
-- validation of transaction aware dynamic-transition
/usr/bin/runcon: /usr/local/pgsql/bin/psql: Permission denied
/usr/bin/runcon: /usr/local/pgsql/bin/psql: Permission denied
/usr/bin/runcon: /usr/local/pgsql/bin/psql: Permission denied

However when I connect to the regression database directly, I can execute the first setcon command but get

regression=# SELECT sepgsql_setcon('unconfined_u:unconfined_r:unconfined_t:s0:c0.c12');
ERROR:  SELinux: security policy violation

The logfile shows

LOG:  SELinux: denied { dyntransition } scontext=unconfined_u:unconfined_r:unconfined_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0:c0.c12 tclass=process

So maybe this is because my start domain is not s0-s0:c0.c1023. However, when trying to run bash or psql in domain unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 I get permission denied. Distribution is FC15, sestatus:

SELinux status:                 enabled
SELinuxfs mount:                /selinux
Current mode:                   enforcing
Mode from config file:          enforcing
Policy version:                 24
Policy from config file:        targeted

-- Yeb Havinga http://www.mgrid.net/ Mastering Medical Data
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
Albe Laurenz laurenz.a...@wien.gv.at wrote: If your query involves foreign scans on two foreign tables on the same foreign server, these should always see the same snapshot, because that's how it works with two scans in one query on local tables. That makes sense. So I think it should be REPEATABLE READ in all cases - SERIALIZABLE is not necessary as long as all you do is read. That depends on whether you only want to see states of the database which are consistent with later states of the database and any invariants enforced by triggers or other software. See this example of how a read-only transaction can see a bogus state at REPEATABLE READ or less strict transaction isolation: http://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions Perhaps if the transaction using the pgsql_fdw is running at the SERIALIZABLE transaction isolation level, it should run the queries at that level, otherwise at REPEATABLE READ. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incorrect behaviour when using a GiST index on points
Alexander Korotkov aekorot...@gmail.com writes: On Thu, Feb 16, 2012 at 11:43 AM, Alexander Korotkov aekorot...@gmail.com wrote: Described differences leads to incorrect behaviour of GiST index. The question is: what is correct way to fix it? Should on_pb also use FP* or consistent method should behave like on_pb? Any comments on this? Current behaviour definitely indicates a bug, and I'm ready to fix it. The only question: is this bug in on_pb or gist?

I'm inclined to think the right answer is to make on_pb use the FP* macros, for consistency with other geometric operators. But it's worth asking whether that will actually fix the problem. I've thought for some time that we'd eventually find cases where geo_ops' use of fuzzy comparisons breaks index behavior entirely, because it destroys natural assumptions like the transitive law. So that could eventually lead us to rip out the FP* macros everywhere. In any case, this doesn't seem like something we could back-patch; it'd be a behavioral change in HEAD only. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REASSIGN OWNED lacks support for FDWs
On Wed, Feb 15, 2012 at 12:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: As per http://archives.postgresql.org/pgsql-general/2012-02/msg00304.php there is no switch case in shdepReassignOwned for foreign data wrappers. The obvious short-term answer (and probably the only back-patchable one) is to add a case for that object type. But after all the refactoring that's been done in the general area of this type of command, I'm a bit surprised that shdepReassignOwned still looks like this. Can't we merge this knowledge into someplace where it doesn't have to be maintained separately? Hmm. I guess we could add function pointers to the ObjectProperty array in objectaddress.c. Then we could just search the array for the catalog ID and call the associated function through the function pointer, rather than having a switch in shdepReassignOwned(). Since anyone adding a new object type ought to be looking at objectaddress.c anyway, that would be one less place for people to forget to update. However, I'm not 100% sure that's an improvement. Switches by object type are probably not going to go away... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] REASSIGN OWNED lacks support for FDWs
Robert Haas robertmh...@gmail.com writes: On Wed, Feb 15, 2012 at 12:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: As per http://archives.postgresql.org/pgsql-general/2012-02/msg00304.php there is no switch case in shdepReassignOwned for foreign data wrappers. The obvious short-term answer (and probably the only back-patchable one) is to add a case for that object type. But after all the refactoring that's been done in the general area of this type of command, I'm a bit surprised that shdepReassignOwned still looks like this. Can't we merge this knowledge into someplace where it doesn't have to be maintained separately? Hmm. I guess we could add function pointers to the ObjectProperty array in objectaddress.c. Then we could just search the array for the catalog ID and call the associated function through the function pointer, rather than having a switch in shdepReassignOwned(). Since anyone adding a new object type ought to be looking at objectaddress.c anyway, that would be one less place for people to forget to update. I was wondering more whether there isn't some single entry point that would allow access to ALTER OWNER functionality for any object type. If we still are in a situation where new shdepReassignOwned-specific code has to be written for every object type, it's not really much better. BTW, code freeze for the upcoming releases is Thursday ... is anyone going to actually fix this bug before then? I'm unlikely to find the time myself. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
Kevin Grittner wrote: If your query involves foreign scans on two foreign tables on the same foreign server, these should always see the same snapshot, because that's how it works with two scans in one query on local tables. That makes sense. So I think it should be REPEATABLE READ in all cases - SERIALIZABLE is not necessary as long as all you do is read. That depends on whether you only want to see states of the database which are consistent with later states of the database and any invariants enforced by triggers or other software. See this example of how a read-only transaction can see a bogus state at REPEATABLE READ or less strict transaction isolation: http://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions Perhaps if the transaction using the pgsql_fdw is running at the SERIALIZABLE transaction isolation level, it should run the queries at that level, otherwise at REPEATABLE READ. I read the example carefully, and it seems to me that it is necessary for the read-only transaction (T3) to be SERIALIZABLE so that T1 is aborted and the state that T3 saw remains valid. If I understand right, I agree with your correction. Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
Albe Laurenz laurenz.a...@wien.gv.at wrote: I read the example carefully, and it seems to me that it is necessary for the read-only transaction (T3) to be SERIALIZABLE so that T1 is aborted and the state that T3 saw remains valid. Correct. If I understand right, I agree with your correction. :-) -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incorrect behaviour when using a GiST index on points
On Mon, Feb 20, 2012 at 7:22 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alexander Korotkov aekorot...@gmail.com writes: On Thu, Feb 16, 2012 at 11:43 AM, Alexander Korotkov aekorot...@gmail.com wrote: Described differences leads to incorrect behaviour of GiST index. The question is: what is correct way to fix it? Should on_pb also use FP* or consistent method should behave like on_pb? Any comments on this? Current behaviour definitely indicates a bug, and I'm ready to fix it. The only question: is this bug in on_pb or gist? I'm inclined to think the right answer is to make on_pb use the FP* macros, for consistency with other geometric operators. But it's worth asking whether that will actually fix the problem. I've thought for some time that we'd eventually find cases where geo_ops' use of fuzzy comparisons breaks index behavior entirely, because it destroys natural assumptions like the transitive law. So that could eventually lead us to rip out the FP* macros everywhere. In any case, this doesn't seem like something we could back-patch; it'd be a behavioral change in HEAD only.

Analyzing GiST interface methods in more detail, I found one other place where fuzzy comparison is used: the gist_box_same function. And it's a really scary thing. It means that an entry of an internal page is not extended when an inserted index tuple extends it by less than EPSILON. Consequently, current GiST search behaviour is neither exact search nor fuzzy search with the given EPSILON; it is something in between. See the following example for proof.
test=# create table test(a box);
CREATE TABLE
test=# insert into test (select (case when i%2 = 0 then box(point(1,1),point(1,1)) else box(point(2,2),point(2,2)) end) from generate_series(1,300) i);
INSERT 0 300
test=# create index test_idx on test using gist(a);
CREATE INDEX
test=# insert into test values (box(point(1.009,1.009), point(1.009,1.009)));
INSERT 0 1
test=# select * from test where a box(point(1.018,1.018), point(1.018,1.018));
              a
-----------------------------
 (1.009,1.009),(1.009,1.009)
(1 row)

test=# set enable_seqscan = off;
SET
test=# select * from test where a box(point(1.018,1.018), point(1.018,1.018));
 a
---
(0 rows)

But I believe we still can back-patch it in one of the following ways:
1) Fix consistent and same functions. Add a notice to the release notes that users should rebuild their indexes if they want correct behaviour.
2) Leave the same function as is. Add kluges to the consistent functions which provide correct search on current, not-truly-correct trees.
But anyway, +1 for ripping out the FP* macros everywhere in HEAD, because I really dislike the way FP* are currently implemented. Why is EPSILON 1.0E-06? We don't know anything about the nature of the data that users store in the geometric datatypes. The smallest double precision value is 5E-324. For some datasets EPSILON can easily cover the whole range.
-- With best regards, Alexander Korotkov.
Re: [HACKERS] 16-bit page checksums for 9.2
On 2/20/12 5:57 AM, Robert Haas wrote: 3. You're rearranging the page header in a way that I find ugly and baroque. Guys, are we talking about an on-disk format change? If so, this may need to be kicked out of 9.2 ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 16-bit page checksums for 9.2
On Mon, Feb 20, 2012 at 12:53 PM, Josh Berkus j...@agliodbs.com wrote: On 2/20/12 5:57 AM, Robert Haas wrote: 3. You're rearranging the page header in a way that I find ugly and baroque. Guys, are we talking about an on-disk format change? If so, this may need to be kicked out of 9.2 ...

Yes, we are. Simon's gone to some pains to make it backward compatible, but IMHO it's a lot messier and less future-proof than it could be with some more work, and if we commit this patch then we WILL be stuck with this for a very long time.

The fact is that, thus far, no real effort has been made by anyone to evict anything from the current CommitFest. I did that for the last two cycles, but as the minutes for last year's development meeting succinctly observed, "Haas ... doesn't like being the schedule jerk." My resolve to be less of a schedule jerk is being sorely tested, though: aside from this patch, which has its share of issues, there's also Alvaro's foreign key stuff, which probably needs a lot more review than it has so far gotten, and several large patches from Dimitri, which do cool stuff but need a lot more work, and Peter Geoghegan's pg_stat_statements patch, which is awesome functionality but sounds like it's still a little green, and parallel pg_dump, which is waiting on a rebase after some refactoring I did to simplify things, and pgsql_fdw, and Heikki's xlog insert scaling patch, which fortunately seems to be in the capable hands of Fujii Masao and Jeff Janes, but certainly isn't ready to go today. Any of these individually could probably eat up a solid week of my time, and then there are the other 40 as-yet-undealt-with patches. I said five weeks ago that it probably wouldn't hurt anything to let the CommitFest run six weeks, and Tom opined that it would probably require two months. But the sad reality is that, after five weeks, we're not even half done with this CommitFest. 
That's partly because most people did not review as much code as they submitted, partly because a bunch of people played fast and loose with the submission deadline, and partly because we didn't return any patches that still needed major rehashing after the first round of review, leading to a situation where supposedly code-complete patches are showing up for the first time four or five weeks after the submission deadline. Now none of that is the fault of this patch specifically: honestly, if I had to pick just two more patches to get committed before beta, this would probably be one of them. But that doesn't make me happy with where it's at today, not to mention the fact that there are people who submitted their code a lot earlier who still haven't gotten the attention this patch has, which is still less than the attention a patch of this magnitude probably needs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wal_buffers
On Mon, Feb 20, 2012 at 3:59 AM, Simon Riggs si...@2ndquadrant.com wrote: There is no existing statistics view suitable to include such information. What about defining pg_stat_xlog or something? Perhaps pg_stat_perf so we don't need to find a new home every time. Thinking about it, I think renaming pg_stat_bgwriter would make more sense. When we created pg_stat_reset_shared(text), we seemed to be contemplating the idea of multiple sets of shared counters identified by names -- bgwriter for the background writer, and maybe other things for other subsystems. So we'd have to think about how to adjust that. I do agree with you that it seems a shame to invent a whole new view for one counter... Another thought is that I'm not sure it makes sense to run this through the stats system at all. We could regard it as a shared memory counter protected by one of the LWLocks involved, which would probably be quite a bit cheaper - just one machine instruction to increment it at need. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Future of our regular expression code
Jay, Good links, and I've also looked at a few others with benchmarks. I believe most of the benchmarks were done before PCRE implemented JIT. I haven't found a benchmark with JIT enabled, so I'm not sure if it will make a difference. Also, I'm not sure how accurately the benchmarks will show how they will perform in an RDBMS environment. The optimizer probably is a very important variable in many complex queries. I'm leaning towards trying to implement RE2 and PCRE and running some benchmarks to see which performs best. Also, would it be possible to set a session variable (let's say PGREGEXTYPE) and set it to ARE (the current algorithm), RE2, or PCRE, so that users could choose which implementation they want (unless we find a single implementation that beats the others in almost all categories)? Or is this a bad idea? Just a thought.

On Mon, Feb 20, 2012 at 12:09 AM, Jay Levitt jay.lev...@gmail.com wrote: Stephen Frost wrote: Alright, I'll bite.. Which existing regexp implementation that's well written, well maintained, and which is well protected against malicious regexes should we be considering then? FWIW, there's a benchmark here that compares a number of regexp engines, including PCRE, TRE and Russ Cox's RE2: http://lh3lh3.users.sourceforge.net/reb.shtml The fastest backtracking-style engine seems to be oniguruma, which is native to Ruby 1.9 and thus not only supports Unicode but I'd bet performs pretty well on it, on account of its being developed in Japan. But it goes pathological on regexen containing '|'; the only safe choice among PCRE-style engines is RE2, but of course that doesn't support backreferences. Russ's page on RE2 (http://code.google.com/p/re2/) says: If you absolutely need backreferences and generalized assertions, then RE2 is not for you, but you might be interested in irregexp, Google Chrome's regular expression engine. 
That's here: http://blog.chromium.org/2009/02/irregexp-google-chromes-new-regexp.html Sadly, it's in Javascript. Seems like if you need a safe, performant regexp implementation, your choice is (a) finish PLv8 and support it on all platforms, or (b) add backreferences to RE2 and precompile it to C with Comeau (if that's still around), or... Jay -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Future of our regular expression code
Billy Earney billy.ear...@gmail.com writes: Also would it be possible to set a session variable (lets say PGREGEXTYPE) and set it to ARE (current alg), RE2, or PCRE, that way users could choose which implementation they want (unless we find a single implementation that beats the others in almost all categories)? Or is this a bad idea? We used to have a GUC that selected the default mode for Spencer's package (ARE, ERE, or BRE), and eventually gave it up on the grounds that it did more harm than good. In particular, you really cannot treat the regex operators as immutable if their behavior varies depending on a GUC, which is more or less fatal from an optimization standpoint. So I'd say a GUC that switches engines, and thereby brings in subtler but no less real incompatibilities than the old one did, would be a pretty bad idea. Also, TBH I have exactly zero interest in supporting pluggable regex engines in Postgres. Regex is not sufficiently central to what we do to justify the work of coping with N different APIs and sets of idiosyncrasies. (Perl evidently sees that differently, and with some reason.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Future of our regular expression code
Tom, Thanks for your reply. So is the group leaning towards just maintaining the current regex code base, or looking into introducing a new library (RE2, PCRE, etc)? Or is this still open for discussion? Thanks! Billy On Mon, Feb 20, 2012 at 3:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Billy Earney billy.ear...@gmail.com writes: Also would it be possible to set a session variable (lets say PGREGEXTYPE) and set it to ARE (current alg), RE2, or PCRE, that way users could choose which implementation they want (unless we find a single implementation that beats the others in almost all categories)? Or is this a bad idea? We used to have a GUC that selected the default mode for Spencer's package (ARE, ERE, or BRE), and eventually gave it up on the grounds that it did more harm than good. In particular, you really cannot treat the regex operators as immutable if their behavior varies depending on a GUC, which is more or less fatal from an optimization standpoint. So I'd say a GUC that switches engines, and thereby brings in subtler but no less real incompatibilities than the old one did, would be a pretty bad idea. Also, TBH I have exactly zero interest in supporting pluggable regex engines in Postgres. Regex is not sufficiently central to what we do to justify the work of coping with N different APIs and sets of idiosyncrasies. (Perl evidently sees that differently, and with some reason.) regards, tom lane
Re: [HACKERS] Future of our regular expression code
Billy Earney billy.ear...@gmail.com writes: Thanks for your reply. So is the group leaning towards just maintaining the current regex code base, or looking into introducing a new library (RE2, PCRE, etc)? Or is this still open for discussion? Well, introducing a new library would create compatibility issues that we'd just as soon not deal with, so I think that that's only likely to be seriously entertained if we decide that Spencer's code is unmaintainable. That's not a foregone conclusion; IMO the only fact in evidence is that the Tcl community isn't getting it done. Since Brendan Jurd has volunteered to try to split that code out into a standalone library, I think such a decision really has to wait until we see if (a) he's successful and (b) the result attracts some kind of community around it. So in short, let's give him a couple years and then if things are no better we'll revisit this issue. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 16-bit page checksums for 9.2
Guys, are we talking about an on-disk format change? If so, this may need to be kicked out of 9.2 ... Yes, we are. Simon's gone to some pains to make it backward compatible, but IMHO it's a lot messier and less future-proof than it could be with some more work, and if we commit this patch than we WILL be stuck with this for a very long time. Yeah. I'd personally prefer to boot this to 9.3, then. It's not like there's not enough features for 9.2, and I really don't want this feature to cause 5 others to be delayed. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server
2012/02/21 0:58 Kevin Grittner kevin.gritt...@wicourts.gov: Albe Laurenz laurenz.a...@wien.gv.at wrote: I read the example carefully, and it seems to me that it is necessary for the read-only transaction (T3) to be SERIALIZABLE so that T1 is aborted and the state that T3 saw remains valid. Correct. Hm, agreed that isolation levels up to REPEATABLE READ are not sufficient for pgsql_fdw's usage. I'll examine the example and fix pgsql_fdw. Thanks.
Re: [HACKERS] 16-bit page checksums for 9.2
On Sun, Feb 19, 2012 at 05:04:06PM -0500, Robert Haas wrote: Another disadvantage of the current scheme is that there's no particularly easy way to know that your whole cluster has checksums. No matter how we implement checksums, you'll have to rewrite every table in the cluster in order to get them fully turned on. But with the current design, there's no easy way to know how much of the cluster is actually checksummed. If you shut checksums off, they'll linger until those pages are rewritten, and there's no easy way to find the relations from which they need to be removed, either. Yes, pg_upgrade makes this hard. If you are using pg_dump to restore, and set the checksum GUC before you do the restore, and never turn it off, then you will have a fully checksum'ed database. If you use pg_upgrade, and enable the checksum GUC, your database will become progressively checksummed, and as Simon pointed out, the only clean way is VACUUM FULL. It is quite hard to estimate the checksum coverage of a database with mixed checksumming --- one cool idea would be for ANALYZE to report how many of the pages it saw were checksummed. Yeah, crazy, but it might be enough. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 16-bit page checksums for 9.2
On Mon, Feb 20, 2012 at 6:22 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Feb 20, 2012 at 12:53 PM, Josh Berkus j...@agliodbs.com wrote: On 2/20/12 5:57 AM, Robert Haas wrote: 3. You're rearranging the page header in a way that I find ugly and baroque. Guys, are we talking about an on-disk format change? If so, this may need to be kicked out of 9.2 ... Yes, we are. Simon's gone to some pains to make it backward compatible, but IMHO it's a lot messier and less future-proof than it could be with some more work, and if we commit this patch than we WILL be stuck with this for a very long time. No, we aren't. The format is not changed, though there are 3 new bits added. The format is designed to be extensible and we have room for 10 more bits, which allows 1024 new page formats, which at current usage rate should last us for approximately 5000 years of further development. Previously, we were limited to 251 more page formats, so this new mechanism is more future proof than the last. Alternatively you might say that we have dropped from 1271 page formats to only 1024, which is hardly limiting. This has already been discussed and includes points made by Bruce and Heikki in the final design. The more work required is not described clearly, nor has anyone but RH requested it. In terms of messier, RH's only alternate suggestion is to split the checksum into multiple pieces, which I don't think yer average reader would call less ugly, less messy or more future proof. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 16 February 2012 21:11, Peter Geoghegan pe...@2ndquadrant.com wrote: * # XXX: This test currently fails!: #verify_normalizes_correctly(SELECT cast('1' as dnotnull);,SELECT cast(? as dnotnull);,conn, domain literal canonicalization/cast) It appears to fail because the CoerceToDomain node gives its location to the constant node as starting from cast, so we end up with SELECT ?('1' as dnotnull);. I'm not quite sure if this points to there being a slight tension with my use of the location field in this way, or if this is something that could be fixed as a bug in core (albeit a highly obscure one), though I suspect the latter.

So I looked at this in more detail today, and it turns out that it has nothing to do with CoerceToDomain in particular. The same effect can be observed by doing this:

select cast('foo' as text);

It turns out that this happens for the same reason as the location of the Const token in the following query:

select integer 5;

being given such that the string select ? results. Resolving this one issue resolves some others, as it allows me to greatly simplify the get_constant_length() logic. Here is the single, hacky change I've made just for now to the core parser to quickly see if it all works as expected:

*** transformTypeCast(ParseState *pstate, Ty
*** 2108,2113 ****
--- 2108,2116 ----
  	if (location < 0)
  		location = tc->typeName->location;
+ 	if (IsA(expr, Const))
+ 		location = ((Const *) expr)->location;
+ 
  	result = coerce_to_target_type(pstate, expr, inputType, targetType, targetTypmod, COERCION_EXPLICIT,

After making this change, I can get all my regression tests to pass (once I change the normalised representation of certain queries to look like select integer ? rather than select ?, which is better anyway), including the CAST()/CoerceToDomain one that previously failed. So far so good. Clearly this change is a quick and dirty workaround, and something better is required. 
The question I'd pose to the maintainer of this code is: what business does the coerce_to_target_type call have changing the location of the Const node resulting from coercion under the circumstances described? I understand that the location of the CoerceToDomain should be at CAST, but why should the underlying Const's position be the same? Do you agree that this is a bug, and if so, would you please facilitate me by committing a fix? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 16-bit page checksums for 9.2
On Mon, Feb 20, 2012 at 11:09 PM, Bruce Momjian br...@momjian.us wrote: On Sun, Feb 19, 2012 at 05:04:06PM -0500, Robert Haas wrote: Another disadvantage of the current scheme is that there's no particularly easy way to know that your whole cluster has checksums. No matter how we implement checksums, you'll have to rewrite every table in the cluster in order to get them fully turned on. But with the current design, there's no easy way to know how much of the cluster is actually checksummed. If you shut checksums off, they'll linger until those pages are rewritten, and there's no easy way to find the relations from which they need to be removed, either. Yes, pg_upgrade makes this hard. If you are using pg_dump to restore, and set the checksum GUC before you do the restore, and never turn it off, then you will have a fully checksum'ed database. If you use pg_upgrade, and enable the checksum GUC, your database will become progressively checksummed, and as Simon pointed out, the only clean way is VACUUM FULL. It is quite hard to estimate the checksum coverage of a database with mixed checksumming --- one cool idea would be for ANALYZE to report how many of the pages it saw were checksummed. Yeah, crazy, but it might be enough.

Well, I didn't say VACUUM FULL was the only clean way of knowing whether every block is checksummed; it's a very intrusive way. If you want a fast upgrade with pg_upgrade, rewriting every block is not really a grand plan, but if you want it... If we did that, I think I would prefer to do it with these commands

VACUUM ENABLE CHECKSUM;   -- whole database only
VACUUM DISABLE CHECKSUM;

rather than use a GUC. We can then add an option to pg_upgrade. That way, we scan the whole database, adding checksums, and then record it in pg_database. When we remove it, we do the same thing in reverse, then record it. So there are no worries about turning GUCs on and off, and we know for certain where our towel is. 
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
On 20 February 2012 23:16, Peter Geoghegan pe...@2ndquadrant.com wrote: Clearly this change is a quick and dirty workaround, and something better is required. The question I'd pose to the maintainer of this code is: what business does the coerce_to_target_type call have changing the location of the Const node resulting from coercion under the circumstances described? I understand that the location of the CoerceToDomain should be at CAST, but why should the underlying Const's position be the same? Another look around shows that the CoerceToDomain struct takes its location from the new Const location in turn, so my dirty little hack will break the location of the CoerceToDomain struct, giving an arguably incorrect caret position in some error messages. It would suit me if MyCoerceToDomain->arg (or the arg of a similar node related to coercion, like CoerceViaIO) pointed to a Const node with, potentially, and certainly in the case of my original CoerceToDomain test case, a distinct location to the coercion node itself. Can we do that? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan pe...@2ndquadrant.com writes: Here is the single, hacky change I've made just for now to the core parser to quickly see if it all works as expected:

*** transformTypeCast(ParseState *pstate, Ty
*** 2108,2113 ****
--- 2108,2116 ----
      if (location < 0)
          location = tc->typeName->location;

+     if (IsA(expr, Const))
+         location = ((Const *) expr)->location;
+
      result = coerce_to_target_type(pstate, expr, inputType,
                                     targetType, targetTypmod,
                                     COERCION_EXPLICIT,

This does not look terribly sane to me. AFAICS, the main effect of this would be that if you have an error in coercing a literal to some specified type, the error message would point at the literal and not at the cast operator. That is, in examples like these:

regression=# select 42::point;
ERROR:  cannot cast type integer to point
LINE 1: select 42::point;
                 ^
regression=# select cast (42 as point);
ERROR:  cannot cast type integer to point
LINE 1: select cast (42 as point);
               ^

you're proposing to move the error pointer to the 42, and that does not seem like an improvement, especially not if it only happens when the cast subject is a simple constant rather than an expression.

regards, tom lane
Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)
Peter Geoghegan pe...@2ndquadrant.com writes: Another look around shows that the CoerceToDomain struct takes its location from the new Const location in turn, so my dirty little hack will break the location of the CoerceToDomain struct, giving an arguably incorrect caret position in some error messages. It would suit me if MyCoerceToDomain->arg (or the arg of a similar node related to coercion, like CoerceViaIO) pointed to a Const node with, potentially, and certainly in the case of my original CoerceToDomain test case, a location distinct from that of the coercion node itself.

Sorry, I'm not following. What about that isn't true already?

regards, tom lane
Re: [HACKERS] Displaying accumulated autovacuum cost
On Mon, Feb 20, 2012 at 11:00 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Feb 20, 2012 at 2:49 AM, Fujii Masao masao.fu...@gmail.com wrote:

+DATA = pg_stat_statements--1.1.sql pg_stat_statements--1.0--1.1.sql \
+	pg_stat_statements--unpackaged--1.0.sql

Though I'm not familiar with CREATE EXTENSION, why did you exclude 1.0.sql from DATA? In hstore/Makefile, 1.0.sql is included. Do you think we should prevent the old version (i.e., 1.0) of pg_stat_statements from being used in 9.2?

I'm not sure. My feeling is that we probably don't want to ship all the old scripts forever. People should install the latest version, and use the upgrade scripts to get there if they have an older one. So my gut feeling here is to change hstore to exclude that file rather than adding it here. Any other opinions?

Agreed. But I wonder why the VERSION option is usable in CREATE EXTENSION if people should always use the latest version. Maybe I'm missing something... Anyway, shipping v1.0 of pg_stat_statements seems less useful in 9.2, so I agree to exclude it.

Regards,

-- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] 16-bit page checksums for 9.2
On Mon, Feb 20, 2012 at 11:23:42PM +0000, Simon Riggs wrote:

If you use pg_upgrade, and enable the checksum GUC, your database will become progressively checksummed, and as Simon pointed out, the only clean way is VACUUM FULL. It is quite hard to estimate the checksum coverage of a database with mixed checksumming --- one cool idea would be for ANALYZE to report how many of the pages it saw were checksummed. Yeah, crazy, but it might be enough.

Well, I didn't say VACUUM FULL was the only clean way of knowing whether every block is checksummed; it's a very intrusive way. If you want a fast upgrade with pg_upgrade, rewriting every block is not really a grand plan, but if you want it...

If we did that, I think I would prefer to do it with these commands

    VACUUM ENABLE CHECKSUM;   -- whole database only
    VACUUM DISABLE CHECKSUM;

rather than use a GUC. We can then add an option to pg_upgrade. That way, we scan the whole database, adding checksums, and then record it in pg_database. When we remove them, we do the same thing in reverse, then record it. So there are no worries about turning GUCs on and off, and we know for certain where our towel is.

Yes, that would work, but I suspect most users are going to want to enable checksums when they are seeing some odd behavior and want to test the hardware. Requiring them to rewrite the database to do that makes the feature less useful, and once they have the problem fixed, they might want to turn it off again.

Now, one argument is that they could enable checksums, see no checksum errors, but still be getting checksum failures from pages that were written before checksums were enabled. I think we just need to document that checksums only cover writes performed _after_ the feature is enabled.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
Hello. I don't have any attachment to PQskipRemainingResults(), but I think that some formal method to skip up to Command-Complete ('C') without breaking the session is necessary, because current libpq does so.

On Thu, Feb 16, 2012 at 02:24:19PM +0900, Kyotaro HORIGUCHI wrote: The choices of the libpq user at that point are: - Continue to read succeeding tuples. - Throw away the remaining results.

There is also a third choice, which may be even more popular than those ones - PQfinish().

That's it. I had implicitly assumed not to tear down the current session.

I think we already have such a function - PQsetRowProcessor(). Considering the user can use that to install a skipping callback or simply set some flag in its own per-connection state,

PQsetRowProcessor() sets the row processor not on a PGresult but on the PGconn. So using PQsetRowProcessor() makes no sense for the PGresult on which the user is currently working; another interface function would be needed to do that on a PGresult. But of course users can do the skipping by signalling with flags within their own code, without PQsetRowProcessor() or PQskipRemainingResults(). Or, returning to the beginning, an implementation using PG_TRY() to inhibit the longjmp out of the row processor itself makes that possible too. Although it is possible in those ways, it requires (currently) undocumented (in human-readable languages :-) knowledge about the client protocol and the way libpq handles it, which might change with no mention in the release notes.

I suspect the need is not that big.

I think so, too. But the current implementation of libpq does so for out-of-memory while receiving result rows. So I think some formal (documented, in other words) way to do that is indispensable.

But if you want to set error state for skipping, I would instead generalize PQsetRowProcessorErrMsg() to support setting the error state outside of the callback. That would also help the external processing with 'return 2'. But I would keep the requirement that row processing must be ongoing; standalone error setting does not make sense. Thus the name can stay.

Mmm... I consider the cause of the problem proposed here to be the exceptions raised by certain server-side functions called in the row processor, especially in C/C++ code. And we shouldn't use PG_TRY() to catch them there, where it would execute so frequently. I think 'return 2' is not applicable to this case. Some aid for non-local exit from row processors (from PQexec and the like, from the user's point of view) is needed, and I think it should be formal.

There seem to be two ways to do it:

1) Replace the PGresult under the PGconn. This seems to require that PQsetRowProcessorErrMsg() take a PGconn as argument instead of a PGresult. This also means the callback should get the PGconn as argument. Kind of makes sense, even.

2) Convert the current partial PGresult to error state. That also makes sense; the current use of ->rowProcessorErrMsg to transport the message so the caller can set the error later feels weird.

I guess I slightly prefer 2) to 1).

The former might be inappropriate from the point of view of the 'undocumented knowledge' above. The latter seems to miss the point about exceptions.

regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] Speed dblink using alternate libpq tuple storage
On Tue, Feb 21, 2012 at 02:11:35PM +0900, Kyotaro HORIGUCHI wrote: I don't have any attachment to PQskipRemainingResults(), but I think that some formal method to skip until Command-Complete('C') without breaking session is necessary because current libpq does so.

We have such a function: PQgetResult(). The only question is how to flag that the rows should be dropped.

I think we already have such a function - PQsetRowProcessor(). Considering the user can use that to install a skipping callback or simply set some flag in its own per-connection state, PQsetRowProcessor() sets the row processor not on a PGresult but on the PGconn. So using PQsetRowProcessor() makes no sense for the PGresult on which the user is currently working. Another interface function is needed to do that on a PGresult.

If we are talking about skipping incoming result rows, it's a PGconn feature, not a PGresult one, because you want to drive network traffic for that, yes? Also, as the row handler is on the connection, changing its behavior is connection context, not result context.

But of course users can do the skipping by signalling with flags within their own code, without PQsetRowProcessor or PQskipRemainingResults. Or, returning to the beginning, an implementation using PG_TRY() to inhibit the longjmp out of the row processor itself makes that possible too. Although it is possible in those ways, it requires (currently) undocumented (in human-readable languages :-) knowledge about the client protocol and the way libpq handles it, which might change with no mention in the release notes.

You might be right that how to do it may not be obvious. OK, let's see how it looks. But please do it like this:

- PQgetRowProcessor() that returns the existing row processor.
- PQskipResult() that just replaces the row processor, then calls PQgetResult() to eat the result. Its behaviour then fully matches PQgetResult().

I guess the main thing that annoys me with skipping is that it would require an additional magic flag inside libpq. With PQgetRowProcessor() that is no longer needed; PQskipResult() becomes just a helper function that the user could implement as well.

The only question here is what should happen if there are several incoming result sets (several queries in one PQexec). Should we leave it to the user to loop over them? There seem to be two approaches here:

1) int PQskipResult(PGconn)
2) int PQskipResult(PGconn, int skipAll)

Both cases returning:

 1 - got a result set, there might be more
 0 - PQgetResult() returned NULL, connection is empty
-1 - error

Although 1) mirrors regular PQgetResult() better, most users might not know that function, as they are using the sync API. They'd have a simpler time with 2). So 2) then?

-- marko