Re: [HACKERS] Performance problem in textanycat/anytextcat
Tom Lane wrote: So I think that labeling textanycat/anytextcat as immutable was a thinko, and we instead ought to label them volatile so that the planner can inline them no matter what the behavior of the underlying text cast is. That feels backwards, having to label functions as more volatile than they really are, just to allow optimizations elsewhere. Marking textanycat as not immutable would forbid using it in expression indexes, too. ... The planner will not inline a function if the resulting expression would be more volatile than the function claims itself to be, because sometimes the point of such a function is to hide the expression's volatility. ... Could we inline the function anyway, if it came from an operator? Presumably if you want to hide an expression's volatility, you use an explicit function call - I can't imagine using an operator for that purpose. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
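For readers following along, the markings under discussion live in pg_proc. A quick catalog sketch (run against a live server of that era) shows how to inspect them:

```sql
-- provolatile is 'i' (immutable), 's' (stable) or 'v' (volatile).
SELECT oid, proname, provolatile
FROM pg_proc
WHERE proname IN ('textanycat', 'anytextcat');
```

Tom's test later in the thread flips this marking with a direct UPDATE on pg_proc, which is safe only for experimentation.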
Re: [HACKERS] Patch for PKST timezone
On Wed, May 12, 2010 at 12:39 AM, Tom Lane t...@sss.pgh.pa.us wrote: Joachim Wieland j...@mcknight.de writes: Good that we found that inconsistency now, so let's add PKST. OK, done. BTW, I notice that PKT was labeled (not in zic), which is not the case, per this discussion. I seem to recall having noticed some others that seemed to be mislabeled the same way. What process did you use to compare this list to the zic files, and do we need to revisit it? I used a modified version of zic.c that outputs the data while generating the binary timezone files. Generating the timezone offset files from that then involved some scripts and some manual work. It seems that we should have an automated process for that, at least for checking against our current set. I'll see if I can come up with that. PKST for example was valid only for a single year in the past, but in the newer timezone data it is now valid forever. Ideally we can run a script that tells us about such changes whenever we bring in new timezone data. Joachim
Re: [HACKERS] pg_upgrade and extra_float_digits
Andrew Dunstan wrote: Bruce Momjian wrote: Maybe I have misunderstood. How exactly is the server version being hacked here? I know it's only for testing, but it still seems to me that lying to a program as heavily version dependent as pg_dump is in general a bad idea. The code in pg_dump 9.0 is:

    /*
     * If supported, set extra_float_digits so that we can dump float data
     * exactly (given correctly implemented float I/O code, anyway)
     */
    if (g_fout->remoteVersion >= 90000)
        do_sql_command(g_conn, "SET extra_float_digits TO 3");
    else if (g_fout->remoteVersion >= 70400)
-->     do_sql_command(g_conn, "SET extra_float_digits TO 2");

The indicated line had to be changed to '3'. I did not change anything else, and it was only done in my private CVS tree. Oh, I see. It is pg_dump that you hacked. That wasn't clear to me from what you first said. But do earlier server versions accept a value of 3? The 8.4 docs say "The value can be set as high as 2." That is the other thing I had to hack --- the 8.4 backend version had to be changed to accept '3'. The good thing is this has to be done only once --- once I have the dump file, I can use it in testing repeatedly because 8.4 does not change. Eventually the idea would be to have the build farm run such tests (with a properly created dump file) so we can learn quickly if the backend data format is changed. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com
Re: [HACKERS] pg_upgrade and extra_float_digits
Bruce Momjian br...@momjian.us writes: Andrew Dunstan wrote: But do earlier server versions accept a value of 3? The 8.4 docs say The value can be set as high as 2. That is the other thing I had to hack --- the 8.4 backend version had to be changed to accept '3'. The good thing is this has to be done only once --- once I have the dump file, I can use it in testing repeatedly because 8.4 does not change. Eventually the idea would be to have the build farm run such tests (with a properly created dump file) so we can learn quickly if the backend data format is changed. If we're thinking of doing that, it would be better to back-patch the change that allowed '3'. regards, tom lane
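For context on why the setting matters at all: float8 output at default precision does not round-trip every value, and extra_float_digits adds guard digits so that it does. A rough sketch of what one would see in psql (exact digits may vary by platform and version):

```sql
SET extra_float_digits = 0;
SELECT (0.1::float8 + 0.2::float8);  -- prints as 0.3 at default precision
SET extra_float_digits = 2;          -- the pre-9.0 maximum under discussion
SELECT (0.1::float8 + 0.2::float8);  -- guard digits expose the true stored value
```

This is why pg_dump wants the maximum setting: without it, distinct stored doubles can dump as the same text.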
Re: [HACKERS] Performance problem in textanycat/anytextcat
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Tom Lane wrote: So I think that labeling textanycat/anytextcat as immutable was a thinko, and we instead ought to label them volatile so that the planner can inline them no matter what the behavior of the underlying text cast is. That feels backwards, having to label functions as more volatile than they really are, just to allow optimizations elsewhere. Well, it's also bogus to label functions as less volatile than they really are. An error in that direction is unsafe, while marking a function as more volatile than it truly is cannot result in wrong behavior. Marking textanycat as not immutable would forbid using it in expression indexes, too. True. On the other hand, the current state of affairs allows one to create an index on expressions that aren't really immutable, with ensuing hilarity. The basic problem here is that these functions are polymorphic and their actual volatility can vary depending on the argument datatype. We don't have any way to describe that in the present pg_proc definition. It does seem like we ought to think about improving this, but that's clearly a research project. In terms of what we could reasonably do for 9.0, I think it's change the volatility marking or nothing ... and changing the marking looks like the better bet to me. ... The planner will not inline a function if the resulting expression would be more volatile than the function claims itself to be, because sometimes the point of such a function is to hide the expression's volatility. ... Could we inline the function anyway, if it came from an operator? No, that's just a crock with no semantic justification at all. regards, tom lane
Re: [HACKERS] Keepalive for max_standby_delay
On Sun, 2010-05-16 at 00:05 +0300, Heikki Linnakangas wrote: Heikki Linnakangas wrote: Simon Riggs wrote: WALSender sleeps even when it might have more WAL to send, it doesn't check it just unconditionally sleeps. At least WALReceiver loops until it has no more to receive. I just can't imagine why that's useful behaviour. Good catch. That should be fixed. I also note that walsender doesn't respond to signals, while it's sending a large batch. That's analogous to the issue that was addressed recently in the archiver process. Attached patch rearranges the walsender loops slightly to fix the above. XLogSend() now only sends up to MAX_SEND_SIZE bytes (== XLOG_SEG_SIZE / 2) in one round and returns to the main loop after that even if there's unsent WAL, and the main loop no longer sleeps if there's unsent WAL. That way the main loop gets to respond to signals quickly, and we also get to update the shared memory status and PS display more often when there's a lot of catching up to do. Comments? 8MB at a time still seems like a large batch to me. libpq is going to send it in smaller chunks anyway, so I don't see the importance of trying to keep the batch that large. It just introduces delay into the sending process. We should be sending chunks that match libpq better. -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Performance problem in textanycat/anytextcat
I wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Marking textanycat as not immutable would forbid using it in expression indexes, too. True. On the other hand, the current state of affairs allows one to create an index on expressions that aren't really immutable, with ensuing hilarity. It strikes me that we could avoid any possible functional regression here by having CREATE INDEX perform expression preprocessing (in particular, function inlining) before it tests to see if the index expression contains any non-immutable functions. regards, tom lane
Re: [HACKERS] [PATCH] Add SIGCHLD catch to psql
* Tom Lane (t...@sss.pgh.pa.us) wrote: A saner approach, which would also help for other corner cases such as out-of-disk-space, would be to check for write failures on the output file and abandon the query if any occur. I had considered this, but I'm not sure we really need to catch *every* write failure. Perhaps just catching if the '\n' at the end of a row fails to be written out would be sufficient? Then turning around and setting cancel_query might be enough.. I'll write that up and test if it works. Thanks, Stephen
Re: [HACKERS] Keepalive for max_standby_delay
On Sat, 2010-05-15 at 19:50 +0100, Simon Riggs wrote: On Sat, 2010-05-15 at 18:24 +0100, Simon Riggs wrote: I will recode using that concept. Startup gets new pointer when it runs out of data to replay. That might or might not include an updated keepalive timestamp, since there's no exact relationship between chunks sent and chunks received. Startup might ask for a new chunk when half a chunk has been received, or when multiple chunks have been received. New version, with some other cleanup of wait processing. New logic is that when Startup asks for next applychunk of WAL it saves the lastChunkTimestamp. That is then the base time used by WaitExceedsMaxStandbyDelay(), except when latestXLogTime is later. Since multiple receivechunks can arrive from primary before Startup asks for next applychunk we use the oldest receivechunk timestamp, not the latest. Doing it this way means the lastChunkTimestamp doesn't change when new keepalives arrive, so we have a stable and reasonably accurate recordSendTimestamp for each WAL record. The keepalive is sent as the first part of a new message, if any. So partial chunks of data always have an accurate timestamp, even if that is slightly older as a result. Doesn't make much difference except with very large chunks. I think that addresses the points raised on this thread and others. -- Simon Riggs www.2ndQuadrant.com *** a/doc/src/sgml/protocol.sgml --- b/doc/src/sgml/protocol.sgml *** *** 4232,4247 The commands accepted in walsender mode are: /varlistentry /variablelist /para ! /listitem ! /varlistentry ! /variablelist ! /para ! para A single WAL record is never split across two CopyData messages. When a WAL record crosses a WAL page boundary, however, and is therefore already split using continuation records, it can be split at the page boundary. In other words, the first main WAL record and its continuation records can be split across different CopyData messages. 
/para /listitem /varlistentry --- 4232,4283 /varlistentry /variablelist /para ! para A single WAL record is never split across two CopyData messages. When a WAL record crosses a WAL page boundary, however, and is therefore already split using continuation records, it can be split at the page boundary. In other words, the first main WAL record and its continuation records can be split across different CopyData messages. + /para + /listitem + /varlistentry + varlistentry + term + Keepalive (B) + /term + listitem + para + variablelist + varlistentry + term + Byte1('k') + /term + listitem + para + Identifies the message as a keepalive. + /para + /listitem + /varlistentry + varlistentry + term + TimestampTz + /term + listitem + para + The current timestamp on the primary server when the keepalive was sent. + /para + /listitem + /varlistentry + /variablelist + /para + para +If varnamewal_level/ is set to literalhot_standby/ then a keepalive +is sent once per varnamewal_sender_delay/. The keepalive is sent after +WAL data has been sent, if any. + /para + /listitem + /varlistentry + /variablelist /para /listitem /varlistentry *** a/src/backend/access/transam/xlog.c --- b/src/backend/access/transam/xlog.c *** *** 1938,1944 UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force) UpdateControlFile(); minRecoveryPoint = newMinRecoveryPoint; ! ereport(DEBUG2, (errmsg(updated min recovery point to %X/%X, minRecoveryPoint.xlogid, minRecoveryPoint.xrecoff))); } --- 1938,1944 UpdateControlFile(); minRecoveryPoint = newMinRecoveryPoint; ! ereport(DEBUG3, (errmsg(updated min recovery point to %X/%X, minRecoveryPoint.xlogid, minRecoveryPoint.xrecoff))); } *** *** 9212,9218 retry: * While walreceiver is active, wait for new WAL to arrive * from primary. */ ! receivedUpto = GetWalRcvWriteRecPtr(); if (XLByteLT(*RecPtr, receivedUpto)) { /* --- 9212,9218 * While walreceiver is active, wait for new WAL to arrive * from primary. */ ! 
receivedUpto = GetWalRcvNextChunk(); if (XLByteLT(*RecPtr, receivedUpto)) { /* *** a/src/backend/replication/walreceiver.c --- b/src/backend/replication/walreceiver.c *** *** 407,412 XLogWalRcvProcessMsg(unsigned char type, char *buf, Size len) --- 407,428 XLogWalRcvWrite(buf, len, recptr); break; } + case 'k':/* keepalive */ + { +
Re: [HACKERS] [PATCH] Add SIGCHLD catch to psql
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: A saner approach, which would also help for other corner cases such as out-of-disk-space, would be to check for write failures on the output file and abandon the query if any occur. I had considered this, but I'm not sure we really need to catch *every* write failure. Perhaps just catching if the '\n' at the end of a row fails to be written out would be sufficient? If you're combining this with the FETCH_COUNT logic then it seems like it'd be sufficient to check ferror(fout) once per fetch chunk, and just fall out of that loop then. I don't want psql issuing query cancels on its own authority, either. regards, tom lane
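For anyone not familiar with the loop being discussed: with FETCH_COUNT set, psql wraps the query in a cursor and pulls rows in batches, so there is a natural per-chunk point at which ferror(fout) can be tested. The SQL psql issues under the hood is roughly the following sketch (cursor name approximated; big_table is a stand-in):

```sql
BEGIN;
DECLARE _psql_cursor NO SCROLL CURSOR FOR
    SELECT * FROM big_table;            -- big_table is a stand-in
FETCH FORWARD 100 FROM _psql_cursor;    -- one chunk; repeated until empty
CLOSE _psql_cursor;
COMMIT;
```

Falling out of that FETCH loop on a write error simply stops fetching further chunks; no cancel message needs to be sent to the server.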
Re: [HACKERS] Performance problem in textanycat/anytextcat
On Sat, May 15, 2010 at 9:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: I noticed by accident that there are some cases where the planner fails to inline the SQL functions that underlie the || operator for text vs non-text cases. The reason is that these functions are marked immutable, but their expansion involves a coercion to text that might not be immutable. The planner will not inline a function if the resulting expression would be more volatile than the function claims itself to be, because sometimes the point of such a function is to hide the expression's volatility. In this case, however, we don't want to hide the true nature of the expression, and we definitely don't want to pay the performance price of calling a SQL function. That price is pretty significant, eg on a rather slow machine I get

regression=# select count(localtimestamp || i::text) from generate_series(1,10) i;
 count
-------
    10
(1 row)
Time: 12512.624 ms

regression=# update pg_proc set provolatile = 'v' where oid = 2004;
UPDATE 1
Time: 7.104 ms

regression=# select count(localtimestamp || i::text) from generate_series(1,10) i;
 count
-------
    10
(1 row)
Time: 4967.086 ms

so the added overhead more than doubles the cost of this case. There's also a possibility of an outright wrong behavior, since the immutable marking will allow the concatenation of two constants to be folded to a constant in contexts where perhaps it shouldn't be. Continuing the above example, 'now'::timestamp || 'foo' will be folded to a constant on sight, which is wrong because the coercion to text depends on DateStyle and ought to react to a later change in DateStyle. So I think that labeling textanycat/anytextcat as immutable was a thinko, and we instead ought to label them volatile so that the planner can inline them no matter what the behavior of the underlying text cast is. Couldn't you apply this argument to any built-in immutable function whatsoever? 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] pg_upgrade and extra_float_digits
Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Andrew Dunstan wrote: But do earlier server versions accept a value of 3? The 8.4 docs say The value can be set as high as 2. That is the other thing I had to hack --- the 8.4 backend version had to be changed to accept '3'. The good thing is this has to be done only once --- once I have the dump file, I can use it in testing repeatedly because 8.4 does not change. Eventually the idea would be to have the build farm run such tests (with a properly created dump file) so we can learn quickly if the backend data format is changed. If we're thinking of doing that, it would be better to back-patch the change that allowed '3'. Yeah. It's going to require some fancy dancing to get the buildfarm to do it. Each buildfarm run is for a specific branch, and all the built artefacts are normally thrown away. I'd have to work out a way of stashing the binaries from a build on one branch for use in the pg_upgrade tests in the run on another branch. It's doable but could get messy. cheers andrew
Re: [HACKERS] Performance problem in textanycat/anytextcat
Robert Haas robertmh...@gmail.com writes: Couldn't you apply this argument to any built-in immutable function whatsoever? No, only the ones that are built on top of other functions that aren't immutable. I did go looking for other potential problems of the same ilk. The only one I can find at present is to_timestamp(double), which is an immutable SQL function but it uses timestamptz + interval, which is marked as not immutable. I believe the reason for that is that if the interval includes month or day components then the addition result can depend on the timezone setting. However, the usage in to_timestamp() involves only a pure seconds component so the immutable marking should be correct. Still, we might want to think about reimplementing to_timestamp() as a pure C function sometime, because the current implementation is many times slower than it needs to be. regards, tom lane
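For reference, the documented equivalent of to_timestamp(double) is exactly the pure-seconds interval arithmetic Tom describes, along the lines of:

```sql
-- to_timestamp(1284352323.5) is equivalent to:
SELECT TIMESTAMP WITH TIME ZONE 'epoch' + 1284352323.5 * INTERVAL '1 second';
```

Since the interval here can never contain month or day components, the timezone-dependent cases of timestamptz + interval cannot arise.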
Re: [HACKERS] Synchronous replication patch built on SR
On Fri, 2010-05-14 at 15:15 -0400, Robert Haas wrote: On Fri, May 14, 2010 at 9:33 AM, Boszormenyi Zoltan z...@cybertec.at wrote: If min_sync_replication_clients == 0, then the replication is async. If min_sync_replication_clients == max_wal_senders then the replication is fully synchronous. If 0 < min_sync_replication_clients < max_wal_senders then the replication is partially synchronous, i.e. the master can wait only for say, 50% of the clients to report back before it's considered synchronous and the relevant transactions get released from the wait. That's an interesting design and in some ways pretty elegant, but it rules out some things that people might easily want to do - for example, synchronous replication to the other server in the same data center that acts as a backup for the master; and asynchronous replication to a reporting server located off-site. The design above allows the case you mention: min_sync_replication_clients = 1 max_wal_senders = 2 It works well in failure cases, such as the case where the local backup server goes down. It seems exactly what we need to me, though not sure about names. One of the things that I think we will probably need/want to change eventually is the fact that the master has no real knowledge of who the replication slaves are. That might be something we want to change in order to be able to support more configurability. Inventing syntax out of whole cloth and leaving semantics to the imagination of the reader: CREATE REPLICATION SLAVE reporting_server (mode asynchronous, xid_feedback on); CREATE REPLICATION SLAVE failover_server (mode synchronous, xid_feedback off, break_synchrep_timeout 30); I am against labelling servers as synchronous/asynchronous. We've had this discussion a few times since 2008. There is significant advantage in having the user specify the level of robustness, so that it can vary from transaction to transaction, just as already happens at commit. That way the user gets to say what happens. 
Look for threads on transaction controlled robustness. As alluded to above, if you label the servers you also need to say what happens when one or more of them are down. e.g. synchronous to B AND async to C, except when B is not available, in which case make C synchronous. With N servers, you end up needing to specify O(N^2) rules for what happens, so it only works neatly for 2, maybe 3 servers. -- Simon Riggs www.2ndQuadrant.com
Re: [HACKERS] Performance problem in textanycat/anytextcat
On Sun, May 16, 2010 at 1:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Couldn't you apply this argument to any built-in immutable function whatsoever? No, only the ones that are built on top of other functions that aren't immutable. Built on top of? I don't get it. It seems like anything of the form immutablefunction(volatilefunction()) is vulnerable to this, and you can give a volatile function as an argument to any function you like. If you're saying we're testing for immutability by looking only at the outermost function call, that seems pretty broken. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] Performance problem in textanycat/anytextcat
On Sun, May 16, 2010 at 1:20 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, May 16, 2010 at 1:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Couldn't you apply this argument to any built-in immutable function whatsoever? No, only the ones that are built on top of other functions that aren't immutable. Built on top of? I don't get it. It seems like anything of the form immutablefunction(volatilefunction()) is vulnerable to this, and you can give a volatile function as an argument to any function you like. If you're saying we're testing for immutability by looking only at the outermost function call, that seems pretty broken. you mean we shouldn't allow this?

select version();
 version
---
 PostgreSQL 8.4.0 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-10), 64-bit
(1 row)

create table t1 (col1 int);
create function f1(int) returns double precision as $$ select random() * $1; $$ language sql immutable;
create index idx on t1(f1(col1));

then, welcome to the club... there were various conversations on this same topic -- Jaime Casanova www.2ndQuadrant.com Soporte y capacitación de PostgreSQL
Re: [HACKERS] Performance problem in textanycat/anytextcat
Robert Haas robertmh...@gmail.com writes: On Sun, May 16, 2010 at 1:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: No, only the ones that are built on top of other functions that aren't immutable. Built on top of? I don't get it. It seems like anything of the form immutablefunction(volatilefunction()) is vulnerable to this, Uh, no, you misunderstand completely. The problematic case is where the function's own expansion contains a non-immutable function. In particular, what we have for these functions is that textanycat(a,b) expands to a || b::text, and depending on what the type of b is, the cast from that to text might not be immutable. This is entirely independent of whether the argument expressions are volatile or not. Rather, the problem is that inlining the function definition could by itself increase the expression's apparent volatility. (Decreasing the volatility is not problematic. Increasing it is.) regards, tom lane
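The DateStyle hazard raised upthread is easy to demonstrate by hand; the same expression yields different text under different settings, which is precisely what premature constant-folding would freeze (a sketch; exact output formats depend on the server):

```sql
SET DateStyle = 'ISO';
SELECT 'now'::timestamp || ' foo';  -- e.g. '2010-05-16 21:00:00 foo'
SET DateStyle = 'Postgres';
SELECT 'now'::timestamp || ' foo';  -- same expression, different spelling
```

An immutable marking licenses the planner to fold the concatenation once and ignore the later SET.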
Re: [HACKERS] Performance problem in textanycat/anytextcat
On Sun, May 16, 2010 at 2:59 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sun, May 16, 2010 at 1:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: No, only the ones that are built on top of other functions that aren't immutable. Built on top of? I don't get it. It seems like anything of the form immutablefunction(volatilefunction()) is vulnerable to this, Uh, no, you misunderstand completely. The problematic case is where the function's own expansion contains a non-immutable function. In particular, what we have for these functions is that textanycat(a,b) expands to a || b::text, and depending on what the type of b is, the cast from that to text might not be immutable. This is entirely independent of whether the argument expressions are volatile or not. Rather, the problem is that inlining the function definition could by itself increase the expression's apparent volatility. (Decreasing the volatility is not problematic. Increasing it is.) I guess my point is that the actual volatility of an expression is presumably independent of whether it gets inlined. (If inlining is changing the semantics, that's a problem.) So if inlining is changing its apparent volatility, then there's something wrong with the way we're computing apparent volatility. No? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
[HACKERS] Sort of a planner regression 8.3->8.4 (due to EXISTS inlining) and related stuff
Hi all, After having received several reports of worse plans on 8.4 compared to 8.3, and recently once more one from 'vaxerdec' on IRC, I tried to investigate the difference. Reducing the (large and ugly, automatically generated) queries to a reproducible testcase I ended up with the following pattern:

explain SELECT 1 FROM c WHERE EXISTS (
    SELECT * FROM a JOIN b USING (b_id) WHERE b.c_id = c.c_id)
AND c.value = 1;

8.3 planned this as:

Index Scan using c_value_key on c  (cost=0.00..24.83 rows=1 width=0)
  Index Cond: (value = 1)
  Filter: (subplan)
  SubPlan
    ->  Nested Loop  (cost=0.00..16.56 rows=1 width=12)
          ->  Index Scan using b__c_id on b  (cost=0.00..8.27 rows=1 width=8)
                Index Cond: (c_id = $0)
          ->  Index Scan using a__b_id on a  (cost=0.00..8.27 rows=1 width=8)
                Index Cond: (a.b_id = b.b_id)

which is quite good for such a kind of query. From 8.4 onwards this gets planned as:

Nested Loop Semi Join  (cost=1543.00..7708.29 rows=1 width=0)
  Join Filter: (c.c_id = b.c_id)
  ->  Index Scan using c_value_key on c  (cost=0.00..8.27 rows=1 width=4)
        Index Cond: (value = 1)
  ->  Hash Join  (cost=1543.00..7075.02 rows=5 width=4)
        Hash Cond: (b.b_id = a.b_id)
        ->  Seq Scan on b  (cost=0.00..2164.01 rows=150001 width=8)
        ->  Hash  (cost=722.00..722.00 rows=5 width=4)
              ->  Seq Scan on a  (cost=0.00..722.00 rows=5 width=4)

which is the near-equivalent (with s/Semi/IN/) of what 8.3 produces for the above query with IN instead of EXISTS. This kind of plan obviously is horrid. The behavioral change was introduced in Tom's initial commit to support semi joins, "Implement SEMI and ANTI joins in the planner and executor", from 2008-08-14 (I tried the commits before and after). Seeing that 8.3 didn't inline EXISTS but did inline IN, and showed the bad plan with IN, it's pretty evident that the inlining is the problem and not the patch itself. Unsurprisingly 8.4 produces a similar plan to 8.3 if one uses a volatile function or an OFFSET 0, as that stops inlining. Two questions: 1. Is there a reason this cannot be optimized? 
In the semi join case it doesn't seem to be *that* complex to push down qualifiers, resulting in a plan like:

Nested Loop Semi Join
  ->  Index Scan using c_value_key on c
        Index Cond: (value = 1)
  ->  Nested Loop
        ->  Index Scan using b__c_id on b
              Index Cond: (b.c_id = c.c_id)
        ->  Index Scan using a__b_id on a
              Index Cond: (a.b_id = b.b_id)

or, a bit more complex:

Nested Loop Semi Join
  ->  Nested Loop Semi Join
        ->  Index Scan using c_value_key on c
              Index Cond: (value = 1)
        ->  Index Scan using b__c_id on b
              Index Cond: (b.c_id = c.c_id)
  ->  Index Scan using a__b_id on a
        Index Cond: (a.b_id = b.b_id)

2. I can construct several cases off the top of my head where avoiding inlining might yield significantly better plans unless variables are pushed down more aggressively into EXISTS/IN. While it would solve this issue, I doubt that generating a separate path without inlining is viable (code complexity, plan time)? I think it's pretty annoying to have so much worse plans in 8.4 than earlier for relatively obvious queries, but I don't see any good, low-impact fix. Greetings, Andres PS: Tom: Do you like to get directly addressed on bugs/mails like this one or not? I am not really sure what the policy regarding that is on the pg mailing lists. Is there one?
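For anyone hitting this in the field, the workaround Andres mentions can be applied directly: an OFFSET 0 (or a volatile function) in the subquery keeps 8.4 from flattening it into a semi-join, restoring the 8.3-style subplan. A sketch against the test schema:

```sql
EXPLAIN SELECT 1 FROM c
WHERE EXISTS (
    SELECT * FROM a JOIN b USING (b_id)
    WHERE b.c_id = c.c_id
    OFFSET 0  -- fence: prevents pull-up into a semi-join
) AND c.value = 1;
```

The cost is that the fenced form is pinned even in cases where the semi-join would have been the better plan.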
Re: [HACKERS] Sort of a planner regression 8.3->8.4 (due to EXISTS inlining) and related stuff
Testschema:

ROLLBACK;
BEGIN;
CREATE TABLE a (
    a_id serial PRIMARY KEY NOT NULL,
    b_id integer
);
CREATE INDEX a__b_id ON a USING btree (b_id);
CREATE TABLE b (
    b_id serial NOT NULL,
    c_id integer
);
CREATE INDEX b__c_id ON b USING btree (c_id);
CREATE TABLE c (
    c_id serial PRIMARY KEY NOT NULL,
    value integer UNIQUE
);
INSERT INTO b (b_id, c_id) SELECT g.i, g.i FROM generate_series(1, 5) g(i);
INSERT INTO a (b_id) SELECT g.i FROM generate_series(1, 5) g(i);
COMMIT;
ANALYZE;
Re: [HACKERS] Sort of a planner regression 8.3->8.4 (due to EXISTS inlining) and related stuff
On Sun, May 16, 2010 at 7:07 PM, Andres Freund and...@anarazel.de wrote: Reducing the (large and ugly, automatically generated queries) to a reproducible testcase I ended up with the following pattern: explain SELECT 1 FROM c WHERE EXISTS ( SELECT * FROM a JOIN b USING (b_id) WHERE b.c_id = c.c_id) AND c.value = 1; 8.3 planned this to: Index Scan using c_value_key on c (cost=0.00..24.83 rows=1 width=0) Index Cond: (value = 1) Filter: (subplan) SubPlan - Nested Loop (cost=0.00..16.56 rows=1 width=12) - Index Scan using b__c_id on b (cost=0.00..8.27 rows=1 width=8) Index Cond: (c_id = $0) - Index Scan using a__b_id on a (cost=0.00..8.27 rows=1 width=8) Index Cond: (a.b_id = b.b_id) Which is quite good for such a kind of query. From 8.4 onwards this gets planned to [something bad] I believe this is a result of a limitation we've discussed previously, namely, that the planner presently uses a limited, special-case kludge to consider partial index scans, and the executor uses another kludge to execute them. http://archives.postgresql.org/pgsql-hackers/2009-09/msg00525.php http://archives.postgresql.org/pgsql-hackers/2009-10/msg00994.php http://archives.postgresql.org/pgsql-hackers/2009-12/msg01755.php I believe that Tom is planning to fix this for 9.1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] pg_upgrade and extra_float_digits
Andrew Dunstan wrote: Eventually the idea would be to have the build farm run such tests (with a properly created dump file) so we can learn quickly if the backend data format is changed. If we're thinking of doing that, it would be better to back-patch the change that allowed '3'. Yeah. It's going to require some fancy dancing to get the buildfarm to do it. Each buildfarm run is for a specific branch, and all the built artefacts are normally thrown away. I'd have to work out a way of stashing the binaries from a build on one branch for use in the pg_upgrade tests in the run on another branch. It's doable but could get messy. Uh, that is not actually a problem. You just need to set extra_float_digits=-3 to create the dump file, which is only done once for each major version. You can _load_ that dump file into an unmodified old cluster and test just fine. I will write up some instructions in the next few days. -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
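For reference, the mechanism being relied on here can be seen directly in psql; the exact digits printed vary by platform, which is precisely why the setting matters when comparing dumps (the queries below are illustrative, not taken from the thread):

```sql
-- extra_float_digits shifts how many significant digits float8 output
-- carries.  Raising it makes dumped values round-trip exactly; lowering
-- it hides platform-specific rounding noise when diffing dumps.
SET extra_float_digits = 3;    -- maximum precision, for exact restore
SELECT float8 '0.1' * 3;       -- full binary representation is printed
SET extra_float_digits = -3;   -- reduced precision, for stable diffs
SELECT float8 '0.1' * 3;       -- trailing noise digits are rounded away
```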
Re: [HACKERS] pg_upgrade and extra_float_digits
Bruce Momjian wrote: Andrew Dunstan wrote: Eventually the idea would be to have the build farm run such tests (with a properly created dump file) so we can learn quickly if the backend data format is changed. If we're thinking of doing that, it would be better to back-patch the change that allowed '3'. Yeah. It's going to require some fancy dancing to get the buildfarm to do it. Each buildfarm run is for a specific branch, and all the built artefacts are normally thrown away. I'd have to work out a way of stashing the binaries from a build on one branch for use in the pg_upgrade tests in the run on another branch. It's doable but could get messy. Uh, that is not actually a problem. You just need to set extra_float_digits=-3 to create the dump file, which is only done once for each major version. You can _load_ that dump file into an unmodified old cluster and test just fine. I will write up some instructions in the next few days. You are missing the point I was making. A buildfarm run does not normally have available to it any binaries for a version other than the one it is building. There is no notion of a multi-branch buildfarm run. Each run is for a particular branch and is a separate miracle. So I'm not concerned about the structure of the dump file but about what will be used to load it into an old cluster during a buildfarm run. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Row-level Locks SERIALIZABLE transactions, postgres vs. Oracle
On May 14, 2010, at 22:54, Robert Haas wrote: On Thu, May 13, 2010 at 5:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: Florian Pflug f...@phlo.org writes: All in all, I believe that SHARE and UPDATE row-level locks should be changed to cause concurrent UPDATEs to fail with a serialization error. I don't see an argument for doing that for FOR SHARE locks, and it already happens for FOR UPDATE (at least if the row actually gets updated). AFAICS this proposal mainly breaks things, in pursuit of an unnecessary and probably-impossible-anyway goal of making FK locking work with only user-level snapshots. After giving this considerable thought and testing the behavior at some length, I think the OP has it right. One thing I sometimes need to do is denormalize a copy of a field, e.g. [snipped example] I've whipped up a quick and still rather dirty patch that implements the behavior I proposed, at least for the case of conflicts between FOR UPDATE locks and updates. With the patch, any attempt to UPDATE or FOR UPDATE lock a row that has concurrently been FOR UPDATE locked will cause a serialization error. (The same for an actually updated row of course, but that happened before too). While this part of the patch was fairly straightforward, making FOR SHARE conflict too seems to be much harder. The assumption that a lock becomes irrelevant once the transaction(s) that held it complete is built deeply into the multixact machinery that powers SHARE locks. That machinery therefore assumes that once all members of a multixact have completed, the multixact is dead also. But my proposal depends on a SERIALIZABLE transaction being able to find out whether any of the lockers of a row are invisible under its snapshot - for which it'd need any multixact containing invisible xids to outlive its snapshot. 
best regards, Florian Pflug serializable_share_lock.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
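To make the proposed behavior concrete, here is a two-session sketch; the table and the exact error wording are assumptions for illustration, not taken from the patch:

```sql
-- Assumed table: CREATE TABLE t (id int PRIMARY KEY, v int);

-- Session 1:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT v FROM t WHERE id = 1 FOR UPDATE;

-- Session 2, while session 1 is still open:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT v FROM t WHERE id = 1 FOR UPDATE;
-- Unpatched: this blocks until session 1 commits, then succeeds,
-- even though session 1's lock was invisible to session 2's snapshot.
-- With the proposed patch: it fails immediately with a serialization
-- error, because the row is FOR UPDATE locked by a concurrent
-- transaction.
```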
Re: [HACKERS] pg_upgrade and extra_float_digits
Andrew Dunstan wrote: Uh, that is not actually a problem. You just need to set extra_float_digits=-3 to create the dump file, which is only done once for each major version. You can _load_ that dump file into an unmodified old cluster and test just fine. I will write up some instructions in the next few days. You are missing the point I was making. A buildfarm run does not normally have available to it any binaries for a version other than the one it is building. There is no notion of a multi-branch buildfarm run. Each run is for a particular branch and is a separate miracle. So I'm not concerned about the structure of the dump file but about what will be used to load it into an old cluster during a buildfarm run. Thank you. I understand now. Imagine finding out on the build farm right away when we break binary compatibility --- that would be cool. However, that might be overkill. My testing seems to be working just fine. In fact the only diff I see is: CREATE PROCEDURAL LANGUAGE plpgsql; --- CREATE OR REPLACE PROCEDURAL LANGUAGE plpgsql; and that is a known change. I might end up adding my regression dump files to our ftp site (400k for each major version), and just having people use them for testing. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Row-level Locks SERIALIZABLE transactions, postgres vs. Oracle
On May 14, 2010, at 16:32, Kevin Grittner wrote: Florian Pflug f...@phlo.org wrote: I must admit that I wasn't able to find an explicit reference to Oracle's behavior in their docs, so I had to resort to experiments. They do have examples showing how to do FK-like constraints with triggers, and those don't contain any warning whatsoever about problems in SERIALIZABLE mode, though. But still, if there is word on this from Oracle somewhere, I'd love to hear about it. I suspect that in trying to emulate Oracle on this, you may run into an issue which posed challenges for the SSI implementation which didn't come up in the Cahill prototype implementations: Oracle, and all other MVCC databases I've read about outside of PostgreSQL, use an update-in-place-with-rollback-log technique. Access to any version of a given row or index entry goes through a single location, with possible backtracking through the log after that, which simplifies management of certain concurrency issues. Do they perhaps use an in-RAM lock table, pointing to the base location of the row for these SELECT FOR UPDATE locks? (Just guessing; I've never used Oracle, myself.) Thanks for the heads up. I think my proposal dodges this, though, since UPDATE as well as FOR SHARE and FOR UPDATE already follow the ctid chain to find the most recent tuple and fail with a serialization error (within a REPEATABLE READ or stricter transaction) should this tuple be inaccessible to the transaction's snapshot. Btw, I've just posted a quick-and-dirty patch that implements the parts of my proposal that deal with FOR UPDATE vs. UPDATE conflicts in response to Robert Haas' mail on this thread, just in case you're interested. best regards, Florian Pflug -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Stefan's bug (was: max_standby_delay considered harmful)
On Wed, May 12, 2010 at 10:41 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, May 12, 2010 at 10:36 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote: Excerpts from Robert Haas's message of mié may 12 20:48:41 -0400 2010: On Wed, May 12, 2010 at 3:55 PM, Robert Haas robertmh...@gmail.com wrote: I am wondering if we are not correctly handling the case where we get a shutdown request while we are still in the PM_STARTUP state. It looks like we might go ahead and switch to PM_RECOVERY and then PM_RECOVERY_CONSISTENT without noticing the shutdown. There is some logic to handle the shutdown when the startup process exits, but if the startup process never exits it looks like we might get stuck. I can reproduce the behavior Stefan is seeing consistently using the attached patch. W1: postgres -D ~/pgslave W2: pg_ctl -D ~/pgslave stop If there's anything to learn from this patch, it's that sleep is uninterruptible on some platforms. This is why sleeps elsewhere are broken down in loops, sleeping in small increments and checking interrupts each time. Maybe some of the sleeps in the new HS code need to be handled this way? I don't think the problem is that the sleep is uninterruptible. I think the problem is that a smart shutdown request received while in the PM_STARTUP state is not acted upon until we enter the PM_RUN state. That is, there's a race condition between the SIGUSR1 that the startup process sends to the postmaster to signal that recovery has begun and the SIGTERM being sent by pg_ctl. However, since I haven't succeeded in producing a fix yet, take that with a grain of salt... I'm replying to this thread rather than the max_standby_delay thread because this line of discussion is not actually related to max_standby_delay. 
Fujii Masao previously reported this problem and proposed this fix: http://archives.postgresql.org/pgsql-hackers/2010-04/msg00592.php I responded by proposing that we instead simply adjust things so that a smart shutdown is always handled at the end of recovery, just prior to entering normal running. On further reflection, though, I've concluded that was a dumb idea. Currently, when a smart shutdown occurs during recovery, we handle it immediately UNLESS it happens before we receive the signal that tells us it's OK to start the background writer, in which case we get confused and lock everyone out of the database until a fast or immediate shutdown request arrives. So this is just a race condition, plain and simple. Therefore I think Fujii Masao's original idea was the best, but I have what I believe is an equivalent but simpler implementation, which is attached. Thoughts? Should we try to fix this in 8.4 also, or just in HEAD? 8.3 and 8.2 never handle a smart shutdown prior to entering normal running, and while that seems pretty useless, doing something different would be a behavior change, so that seems like a non-starter. 8.4 has the same behavior as HEAD, though it's not documented in the release notes, so it's not clear how intentional the change was. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company fix_smart_shutdown_in_recovery.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and extra_float_digits
Bruce Momjian wrote: Thank you. I understand now. Imagine finding out on the build farm right away when we break binary compatibility --- that would be cool. I'm not saying we can't do that, just that it will not be a trivial change. And yes it would be cool, although I hope we would know before we committed such a change that that would be the outcome. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Row-level Locks SERIALIZABLE transactions, postgres vs. Oracle
On Sun, May 16, 2010 at 9:07 PM, Florian Pflug f...@phlo.org wrote: On May 14, 2010, at 22:54, Robert Haas wrote: On Thu, May 13, 2010 at 5:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: Florian Pflug f...@phlo.org writes: All in all, I believe that SHARE and UPDATE row-level locks should be changed to cause concurrent UPDATEs to fail with a serialization error. I don't see an argument for doing that for FOR SHARE locks, and it already happens for FOR UPDATE (at least if the row actually gets updated). AFAICS this proposal mainly breaks things, in pursuit of an unnecessary and probably-impossible-anyway goal of making FK locking work with only user-level snapshots. After giving this considerable thought and testing the behavior at some length, I think the OP has it right. One thing I sometimes need to do is denormalize a copy of a field, e.g. [snipped example] I've whipped up a quick and still rather dirty patch that implements the behavior I proposed, at least for the case of conflicts between FOR UPDATE locks and updates. With the patch, any attempt to UPDATE or FOR UPDATE lock a row that has concurrently been FOR UPDATE locked will cause a serialization error. (The same for an actually updated row of course, but that happened before too). While this part of the patch was fairly straightforward, making FOR SHARE conflict too seems to be much harder. The assumption that a lock becomes irrelevant once the transaction(s) that held it complete is built deeply into the multixact machinery that powers SHARE locks. That machinery therefore assumes that once all members of a multixact have completed, the multixact is dead also. But my proposal depends on a SERIALIZABLE transaction being able to find out whether any of the lockers of a row are invisible under its snapshot - for which it'd need any multixact containing invisible xids to outlive its snapshot. Thanks for putting this together. 
I suggest adding it to the open CommitFest - even if we decide to go forward with this, I don't imagine anyone is going to be excited about changing it during beta. https://commitfest.postgresql.org/action/commitfest_view/open -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and extra_float_digits
Bruce Momjian br...@momjian.us writes: Andrew Dunstan wrote: It's going to require some fancy dancing to get the buildfarm to do it. Each buildfarm run is for a specific branch, and all the built artefacts are normally thrown away. Uh, that is not actually a problem. You just need to set extra_float_digits=-3 to create the dump file, which is only done once for each major version. Wrong. In the first place, we're not going to start carrying something as large as a pg_dump of the regression database as part of the source code for the buildfarm. Even if we wanted to, it wouldn't work because the results aren't platform-independent --- there are float differences and probably row ordering differences to worry about. In the second place, it won't only be done once, unless you imagine that we never change the regression tests for back branches; a casual perusal of the CVS logs will disprove that idea. The only thing that's really going to work here is to generate the dump on the fly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Performance problem in textanycat/anytextcat
Robert Haas robertmh...@gmail.com writes: I guess my point is that the actual volatility of an expression is presumably independent of whether it gets inlined. (If inlining is changing the semantics, that's a problem.) So if inlining is changing its apparent volatility, then there's something wrong with the way we're computing apparent volatility. No? Well, the way we're computing volatility is certainly an oversimplification of reality, as was already noted upthread --- the fundamental issue here is that polymorphism of the textcat functions renders it impossible to know their true volatility status without knowing the specific datatype they're being applied to. Perhaps we could generalize the pg_proc representation to accommodate that, but it's certainly in the nature of a research project. And there are *many* cases where we're already approximating for other reasons; the timestamptz plus interval case that I mentioned earlier is one. So long as any approximations are in the direction of supposing the expression is more volatile than it really is, they are not dangerous. But right at the moment, the textcat functions are on the wrong side of that, because they're claiming to be immutable when they sometimes aren't. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
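The polymorphism hazard Tom describes is easy to see with timestamptz, whose text output depends on the DateStyle setting; the queries below are an illustrative sketch, not taken from the thread:

```sql
-- 'now: ' || now() resolves to the polymorphic textanycat, which
-- applies timestamptz's text-output cast.  That cast is stable, not
-- immutable, because its result depends on session settings -- so the
-- concatenation cannot honestly be immutable either.
SET datestyle = 'ISO, MDY';
SELECT 'now: ' || now();        -- ISO-style rendering
SET datestyle = 'Postgres, MDY';
SELECT 'now: ' || now();        -- same instant, different text
```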
Re: [HACKERS] Sort of a planner regression 8.3-8.4 (due to EXISTS inlining) and related stuff
Robert Haas robertmh...@gmail.com writes: I believe this is a result of a limitation we've discussed previously, namely, that the planner presently uses a limited, special-case kludge to consider partial index scans, and the executor uses another kludge to execute them. Yeah. To restore this case to something like what previous versions did, we need to be able to push an inner-indexscan parameter down through multiple join levels, which neither the planner nor executor can deal with at the moment. I am planning to work on this for 9.1. It may be worth pointing out that while the current code sucks for the case where a nestloop-with-inner-indexscan would be the best plan, the previous code sucked for every other case; because the previous code was only capable of generating the equivalent of a nestloop join. We have to continue down this path in order to get to the place we need to be. It's too bad that all the work didn't get done in one development cycle, but sometimes life's like that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Keepalive for max_standby_delay
On Sun, May 16, 2010 at 6:05 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Heikki Linnakangas wrote: Simon Riggs wrote: WALSender sleeps even when it might have more WAL to send; it doesn't check, it just unconditionally sleeps. At least WALReceiver loops until it has no more to receive. I just can't imagine why that's useful behaviour. Good catch. That should be fixed. I also note that walsender doesn't respond to signals while it's sending a large batch. That's analogous to the issue that was addressed recently in the archiver process. Attached patch rearranges the walsender loops slightly to fix the above. XLogSend() now only sends up to MAX_SEND_SIZE bytes (== XLOG_SEG_SIZE / 2) in one round and returns to the main loop after that even if there's unsent WAL, and the main loop no longer sleeps if there's unsent WAL. That way the main loop gets to respond to signals quickly, and we also get to update the shared memory status and PS display more often when there's a lot of catching up to do. Comments? The main loop also needs to respond quickly to the 'X' message from the standby? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and extra_float_digits
Tom Lane wrote: Bruce Momjian br...@momjian.us writes: Andrew Dunstan wrote: It's going to require some fancy dancing to get the buildfarm to do it. Each buildfarm run is for a specific branch, and all the built artefacts are normally thrown away. Uh, that is not actually a problem. You just need to set extra_float_digits=-3 to create the dump file, which is only done once for each major version. Wrong. In the first place, we're not going to start carrying something as large as a pg_dump of the regression database as part of the source code for the buildfarm. Even if we wanted to, it wouldn't work because the results aren't platform-independent --- there are float differences and probably row ordering differences to worry about. In the second place, it won't only be done once, unless you imagine that we never change the regression tests for back branches; a casual perusal of the CVS logs will disprove that idea. The only thing that's really going to work here is to generate the dump on the fly. This whole discussion leads me to the conclusion that we need to look more imaginatively at our testing regime. When the buildfarm was created it (via pg_regress) covered a lot of what we needed to test, but that is becoming less and less true. Not only does pg_upgrade need testing, but we need to devise some sort of automated testing regime for SR and HS, among other things. pg_regress is showing its age a bit, I think. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Sort of a planner regression 8.3-8.4 (due to EXISTS inlining) and related stuff
On Monday 17 May 2010 04:10:46 Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: I believe this is a result of a limitation we've discussed previously, namely, that the planner presently uses a limited, special-case kludge to consider partial index scans, and the executor uses another kludge to execute them. It may be worth pointing out that while the current code sucks for the case where a nestloop-with-inner-indexscan would be the best plan, the previous code sucked for every other case; because the previous code was only capable of generating the equivalent of a nestloop join. We have to continue down this path in order to get to the place we need to be. It's too bad that all the work didn't get done in one development cycle, but sometimes life's like that. Yes, I realize that. That's why I didn't report it as an actual bug... And it's way easier to deal with 8.4's deficiency than with the former behaviour. Thanks, Andres PS: I think it led me to an actual bug, expect a report tomorrow... -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_upgrade and extra_float_digits
Andrew Dunstan and...@dunslane.net writes: This whole discussion leads me to the conclusion that we need to look more imaginatively at our testing regime. When the buildfarm was created it (via pg_regress) covered a lot of what we needed to test, but that is becoming less and less true. Not only does pg_upgrade need testing but we need to devise some sort of automated testing regime for SR and HS, among other things. pg_regress is showing its age a bit, I think. The regression tests have never pretended to test more than a fraction of what might be interesting to test. Crash recovery, in particular, has always been interesting and has never been tested in any mechanized way. They don't really exercise concurrent behavior in any meaningful way either. I don't think they're showing their age so much as we're starting to get more ambitious about what we would like to have routine testing for. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Keepalive for max_standby_delay
On Mon, May 17, 2010 at 1:11 AM, Simon Riggs si...@2ndquadrant.com wrote: On Sat, 2010-05-15 at 19:50 +0100, Simon Riggs wrote: On Sat, 2010-05-15 at 18:24 +0100, Simon Riggs wrote: I will recode using that concept. Startup gets new pointer when it runs out of data to replay. That might or might not include an updated keepalive timestamp, since there's no exact relationship between chunks sent and chunks received. Startup might ask for a new chunk when half a chunk has been received, or when multiple chunks have been received. New version, with some other cleanup of wait processing. New logic is that when Startup asks for next applychunk of WAL it saves the lastChunkTimestamp. That is then the base time used by WaitExceedsMaxStandbyDelay(), except when latestXLogTime is later. Since multiple receivechunks can arrive from primary before Startup asks for next applychunk we use the oldest receivechunk timestamp, not the latest. Doing it this way means the lastChunkTimestamp doesn't change when new keepalives arrive, so we have a stable and reasonably accurate recordSendTimestamp for each WAL record. The keepalive is sent as the first part of a new message, if any. So partial chunks of data always have an accurate timestamp, even if that is slightly older as a result. Doesn't make much difference except with very large chunks. I think that addresses the points raised on this thread and others. Is it OK that this keepalive message cannot be used by HS in file-based log-shipping? Even in SR, the startup process cannot use the keepalive until walreceiver has been started up. WalSndKeepAlive() always calls initStringInfo(), which seems to cause a memory leak. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers