Re: [HACKERS] Online base backup from the hot-standby
> * Not corresponded yet
> * full_page_writes = off
>   - If the primary has full_page_writes = off, archive recovery may not
>     act normally. Therefore the standby may need to check from the WAL
>     whether full_page_writes = off.
>
> Isn't having a standby enough to force full_page_writes = on in all cases
> (bypassing the configuration)? What's the meaning?

Thanks. This has the following two problems.

* pg_start_backup() must set full_page_writes to 'on' on the master that is
  actually writing the WAL, not on the standby.
* The standby doesn't necessarily connect to the master that is actually
  writing WAL. (Ex. Standby2 in cascade replication: Master - Standby1 - Standby2)

I'm worried about how I should solve these problems.

Regards.

Jun Ishizuka
NTT Software Corporation
TEL: 045-317-7018
E-Mail: ishizuka@po.ntts.co.jp

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Online base backup from the hot-standby
>> * Not corresponded yet
>> * full_page_writes = off
>>   - If the primary has full_page_writes = off, archive recovery may not
>>     act normally. Therefore the standby may need to check from the WAL
>>     whether full_page_writes = off.
>
> Isn't having a standby enough to force full_page_writes = on in all cases
> (bypassing the configuration)? What's the meaning?

Yeah. full_page_writes is a WAL generation parameter. Standbys don't
generate WAL. I think you just have to insist that the master has it on.

> Thanks. This has the following two problems.
>
> * pg_start_backup() must set full_page_writes to 'on' on the master that
>   is actually writing the WAL, not on the standby.
> * The standby doesn't necessarily connect to the master that is actually
>   writing WAL. (Ex. Standby2 in cascade replication: Master - Standby1 - Standby2)
>
> I'm worried about how I should solve these problems.
Re: [HACKERS] Backup's from standby
Thanks for your reply.

What will happen if you issue *checkpoint* on the STANDBY? I presume that it
will flush the data to disk. Will there be any conflict with the master's
WAL (like the checkpoint location...)?

Senthil
Re: [HACKERS] WIP: Fast GiST index build
I found that I forgot to remove levelstep and buffersize from reloptions.c.
An updated patch is attached.

--
With best regards,
Alexander Korotkov.

gist_fast_build-0.14.1.patch.gz
Description: GNU Zip compressed data
Re: [HACKERS] some missing internationalization in pg_basebackup
On ons, 2011-08-10 at 11:39 +0200, Magnus Hagander wrote:
> On Tue, Aug 9, 2011 at 13:38, Peter Eisentraut pete...@gmx.net wrote:
>> I noticed that the progress reporting code in pg_basebackup does not
>> allow for translation. This would normally be easy to fix, but this code
>> has a number of tricky issues, including the INT64_FORMAT, possibly some
>> plural concerns, and some space alignment issues that are hidden in some
>> of those hardcoded numbers.
>
> Maybe it can/should be rewritten not to try to do all those things in one
> step, thus making it easier somehow? I'm afraid I don't know enough about
> the translation system to know exactly what would go all the way there,
> though..

I've fixed this now.

During testing, I noticed that the final kB number always overshoots the
projected total by 1 kB, e.g.,

52763/52762 kB (100%), 1/1 tablespace

This happens even in a trivial test (base backup of the regression test
database, for example), so is it possible that there is some counting bug in
the code? To the user, it could appear as though Postgres can't count.
Re: [HACKERS] walprotocol.h vs frontends
On Tue, Aug 16, 2011 at 00:05, Simon Riggs si...@2ndquadrant.com wrote:
> On Mon, Aug 15, 2011 at 10:32 PM, Magnus Hagander mag...@hagander.net wrote:
>>> At present the WALSender only sends from one file at a time, so sending
>>> a message when we open a new file would be straightforward.
>>
>> Are you sure? We can receive a single message spanning multiple files...
>
> You're right. That was the way the original code ran but I thought we had
> stopped that when we introduced MAX_SEND_SIZE.
>
> The comment says don't cross a logfile boundary within one message. What
> that actually does is prevent us incrementing a logid value, which happens
> every 255 files. I read that as meaning WAL file, which is not what it
> means at all.

That has bitten me many times. log file != wal file.. It's not exactly
intuitive.

So right now what we do is allow a single packet to span multiple files, but
since MAX_SEND_SIZE is 128KB it will always be smaller than a single file,
so we can only ever span two files at most. Unless someone has tweaked their
xlog segment size to insane values.

I ended up writing a loop that can deal with it spanning more than 2 files
as well - but that also turned out to be less code than the special case for
2 files, so I'll be keeping that :-)

> That is all just a little bizarre, especially since libpq sends data in
> 8KB chunks anyway.
>
> So I suggest we change XLogSend() so that it only ever sends up to the end
> of a file. That way all "w" messages will relate to just one file, and we
> can have another message to initiate a new file. And then, as you say,
> give full metadata for the new file.

That could be an idea in itself. It is not necessary for the log receiver,
since it can use the #define FRONTEND hack (which only works if you build
inside the source tree of course, not if it's an extension), but it might
make the protocol simpler to deal with for third parties who want to do
something similar.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
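The "send only up to the end of a file" idea above can be sketched as a
small calculation. This is a hypothetical illustration, not PostgreSQL
source: the constants assume the default 16MB WAL segment size and the
128kB MAX_SEND_SIZE mentioned in the thread.

```python
# Hypothetical sketch (not XLogSend() itself): cap each send so that a
# single "w" message never crosses a WAL segment boundary.

WAL_SEG_SIZE = 16 * 1024 * 1024   # default WAL segment size, 16MB
MAX_SEND_SIZE = 128 * 1024        # max payload per message, 128kB

def bytes_to_send(start_pos: int, available: int) -> int:
    """How many bytes to put in the next message, starting at absolute WAL
    position start_pos, with 'available' bytes ready to ship."""
    n = min(available, MAX_SEND_SIZE)
    # Distance from start_pos to the end of its segment.
    to_seg_end = WAL_SEG_SIZE - (start_pos % WAL_SEG_SIZE)
    # Never cross into the next segment; a new-file message would go first.
    return min(n, to_seg_end)
```

With this cap, a message that starts 100 bytes before a segment boundary
carries exactly those 100 bytes, and the next message can open the new file
with its own metadata.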
Re: [HACKERS] some missing internationalization in pg_basebackup
On Tue, Aug 16, 2011 at 10:33, Peter Eisentraut pete...@gmx.net wrote:
> On ons, 2011-08-10 at 11:39 +0200, Magnus Hagander wrote:
>> On Tue, Aug 9, 2011 at 13:38, Peter Eisentraut pete...@gmx.net wrote:
>>> I noticed that the progress reporting code in pg_basebackup does not
>>> allow for translation. This would normally be easy to fix, but this
>>> code has a number of tricky issues, including the INT64_FORMAT,
>>> possibly some plural concerns, and some space alignment issues that are
>>> hidden in some of those hardcoded numbers.
>>
>> Maybe it can/should be rewritten not to try to do all those things in
>> one step, thus making it easier somehow? I'm afraid I don't know enough
>> about the translation system to know exactly what would go all the way
>> there, though..
>
> I've fixed this now.
>
> During testing, I noticed that the final kB number always overshoots the
> projected total by 1 kB, e.g.,
>
> 52763/52762 kB (100%), 1/1 tablespace
>
> This happens even in a trivial test (base backup of the regression test
> database, for example), so is it possible that there is some counting bug
> in the code? To the user, it could appear as though Postgres can't count.

Hmm. I bet that's the backup label file. It doesn't exist in the data
directory anymore when we do these backups, but it is synthesized into the
stream. Thus it's not counted.

I wonder if we should bother counting it, or just adjust pg_basebackup so
that it always ends up at exactly 100% by the numbers as well in cases like
this.

Note that the progress indicator will *always* count wrong when you choose
to include WAL, since we just don't know how much WAL there should be. I
guess in this case we could just advance the end counter as we go, making
sure it doesn't shoot above 100%. Does that seem reasonable?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
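The "advance the end counter as we go" idea could look like this minimal
sketch. It is not pg_basebackup's code; the function name and shape are
purely illustrative of the clamping behavior discussed above.

```python
# Hypothetical sketch: when uncounted data arrives (streamed WAL, the
# synthesized backup label), move the projected total up along with the
# done counter so the display never exceeds 100%.

def progress(done_kb: int, total_kb: int) -> tuple[int, int]:
    """Return (percent, adjusted_total) for display."""
    total_kb = max(total_kb, done_kb)   # move the goalposts, never >100%
    pct = done_kb * 100 // total_kb if total_kb else 0
    return pct, total_kb
```

Applied to the example above, 52763/52762 kB would display as
52763/52763 kB (100%) instead of overshooting the projected total.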
Re: [HACKERS] walprotocol.h vs frontends
On mån, 2011-08-15 at 18:39 +0100, Peter Geoghegan wrote:
>> If you want to upgrade a system running 8.3 (that uses float based
>> timestamps) using pg_upgrade, you must compile 9.0 (or 8.4 or 9.1) with
>> --disable-integer-datetimes. If at some point in the future you then
>> want to upgrade to 9.2 with pg_upgrade, you will again need to build 9.2
>> with --disable-integer-datetimes. If we remove the code for floating
>> point representations of datetime then you won't be able to do that.
>
> I'm pretty surprised that pg_upgrade pushes that onus onto its users -
> for many users, the need to build their own binaries is a greater barrier
> to upgrading than doing a logical restore. Maybe that's simply considered
> a matter for package managers to worry about, but that doesn't sit well
> with me.

Well, pg_upgrade only moves the heap files, it doesn't look into them or
change them.

Possibly, this sort of issue could be better handled in the future by making
this a cluster, database, or table flag instead of a compile-time option.
That way, at least newly created things could move to the new recommended
behavior. The way it is set up now, we will possibly never get rid of the
legacy behavior, unless we break pg_upgrade at some point.
[HACKERS] Some problems about cascading replication
Hi,

When I tested PITR on git master with max_wal_senders > 0, I found that the
following inappropriate log message was always output even though cascading
replication is not in progress. The attached patch fixes this problem.

LOG: terminating all walsender processes to force cascaded standby(s) to
update timeline and reconnect

When making the patch, I found another problem with cascading replication:
when promoting a cascading standby, postmaster sends SIGUSR2 to any
cascading walsenders to kill them. But there is a corner-case where such a
walsender fails to receive SIGUSR2 and survives a standby promotion
unexpectedly. This happens when postmaster sends SIGUSR2 before the
walsender marks itself as a WAL sender, because postmaster sends SIGUSR2
only to the processes marked as a WAL sender.

To avoid the corner-case, I changed walsender so that it checks again
whether recovery is in progress after marking itself as a WAL sender. If
recovery is not in progress even though the walsender is a cascading one, it
does the same thing as the SIGUSR2 signal handler does, and then exits
later. The attached patch also includes this fix.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index c0a32a3..2ec39dd 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2328,7 +2328,7 @@ reaper(SIGNAL_ARGS)
 		 * XXX should avoid the need for disconnection. When we do,
 		 * am_cascading_walsender should be replaced with RecoveryInProgress()
 		 */
-		if (max_wal_senders > 0)
+		if (max_wal_senders > 0 && CountChildren(BACKEND_TYPE_WALSND) > 0)
 		{
 			ereport(LOG,
 					(errmsg("terminating all walsender processes to force cascaded "
 							"standby(s) to update timeline and reconnect")));
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 0eadf64..ef6894c 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -368,6 +368,22 @@ StartReplication(StartReplicationCmd *cmd)
 		SendPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE);
 
 	/*
+	 * When promoting a cascading standby, postmaster sends SIGUSR2 to
+	 * any cascading walsenders to kill them. But there is a corner-case where
+	 * such walsender fails to receive SIGUSR2 and survives a standby promotion
+	 * unexpectedly. This happens when postmaster sends SIGUSR2 before
+	 * the walsender marks itself as a WAL sender, because postmaster sends
+	 * SIGUSR2 to only the processes marked as a WAL sender.
+	 *
+	 * To avoid this corner-case, if recovery is NOT in progress even though
+	 * the walsender is cascading one, we do the same thing as SIGUSR2 signal
+	 * handler does, i.e., set walsender_ready_to_stop to true. Which causes
+	 * the walsender to end later.
+	 */
+	if (am_cascading_walsender && !RecoveryInProgress())
+		walsender_ready_to_stop = true;
+
+	/*
 	 * We assume here that we're logging enough information in the WAL for
 	 * log-shipping, since this is checked in PostmasterMain().
 	 */
Re: [HACKERS] Re: Should we have an optional limit on the recursion depth of recursive CTEs?
On Mon, Aug 15, 2011 at 23:49, Greg Stark st...@mit.edu wrote:
> On Mon, Aug 15, 2011 at 9:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> ... and that would be a seriously bad API. There are not SUSET
>> restrictions on other resources such as work_mem. Why do we need one for
>> this?
>
> I think a better analogy would be imposing a maximum number of rows a
> query can output. That might be a sane thing to have for some
> circumstances but it's not useful in general.

Uh. You mean like LIMIT, which we already have?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
[HACKERS] Allowing same cursor name in nested levels
Hi Tom,

While going through a few test-cases, I found that we cannot have two open
cursors with the same name even though they are in two different functions.

Here is what I mean:
1. I have two functions, func1 and func2.
2. func1 calls func2.
3. Both have a cursor with the same name, say mycursor.
4. Somehow I forgot to close it.
5. Executing func1 throws an error 'cursor "mycursor" already in use'.

Is this expected behavior?

I just mingled around the code and later appended a cursor count to the
cursor name to allow the same cursor name in nested levels. I have run make
check and didn't find any side-effect other than one expected output change.

PFA, patch for the same. Please let me know your views / suggestions.

Thanks

Test-case:
===
CREATE OR REPLACE FUNCTION func1() RETURNS void AS $$
DECLARE
  res int;
  mycursor CURSOR FOR SELECT 'foo';
BEGIN
  OPEN mycursor;
  SELECT func2() INTO res;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION func2() RETURNS int AS $$
DECLARE
  mycursor CURSOR FOR SELECT 'bar';
BEGIN
  OPEN mycursor;
  return 1;
END;
$$ LANGUAGE plpgsql;

SELECT func1();
ERROR:  cursor "mycursor" already in use
CONTEXT:  PL/pgSQL function "func2" line 5 at OPEN
SQL statement "SELECT func2()"
PL/pgSQL function "func1" line 7 at SQL statement

--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Phone: +91 20 30589500
Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

diff --git a/src/pl/plpgsql/src/gram.y b/src/pl/plpgsql/src/gram.y
index 92b54dd..46e7442 100644
--- a/src/pl/plpgsql/src/gram.y
+++ b/src/pl/plpgsql/src/gram.y
@@ -509,6 +509,7 @@ decl_statement : decl_varname decl_const decl_datatype decl_collate decl_notnull
 						char		buf[1024];
 						char	   *cp1;
 						char	   *cp2;
+						static unsigned int cursor_count = 0;
 
 						/* pop local namespace for cursor args */
 						plpgsql_ns_pop();
@@ -539,7 +540,13 @@ decl_statement : decl_varname decl_const decl_datatype decl_collate decl_notnull
 								*cp2++ = *cp1;
 							*cp2++ = *cp1++;
 						}
-						strcpy(cp2, "'::pg_catalog.refcursor");
+
+						/*
+						 * Append cursor count after cursor name to remove
+						 * conflict of same named cursor in different
+						 * nested levels.
+						 */
+						snprintf(cp2, sizeof(buf) - strlen(buf), ":%u'::pg_catalog.refcursor", cursor_count++);
 						curname_def->query = pstrdup(buf);
 						new->default_val = curname_def;
diff --git a/src/test/regress/expected/plpgsql.out b/src/test/regress/expected/plpgsql.out
index bed34c8..4bdccf3 100644
--- a/src/test/regress/expected/plpgsql.out
+++ b/src/test/regress/expected/plpgsql.out
@@ -3284,15 +3284,15 @@ begin
 end;
 $$ language plpgsql;
 select forc01();
-NOTICE:  5 from c
-NOTICE:  6 from c
-NOTICE:  7 from c
-NOTICE:  9 from c
-NOTICE:  10 from c
-NOTICE:  41 from c2
-NOTICE:  42 from c2
-NOTICE:  43 from c2
-NOTICE:  after loop, c2 = c2
+NOTICE:  5 from c:5
+NOTICE:  6 from c:5
+NOTICE:  7 from c:5
+NOTICE:  9 from c:5
+NOTICE:  10 from c:5
+NOTICE:  41 from c2:6
+NOTICE:  42 from c2:6
+NOTICE:  43 from c2:6
+NOTICE:  after loop, c2 = c2:6
 NOTICE:  41 from special_name
 NOTICE:  42 from special_name
 NOTICE:  43 from special_name
Re: [HACKERS] some missing internationalization in pg_basebackup
Magnus Hagander mag...@hagander.net writes:
> Note that the progress indicator will *always* count wrong when you
> choose to include WAL, since we just don't know how much WAL there should
> be. I guess in this case we could just advance the end counter as we go,
> making sure it doesn't shoot above 100%. Does that seem reasonable?

Yes. If you're interested in the progress being made and the goal is moving,
it's better to know about that.

Regards,

--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] pg_stat_replication vs StandbyReplyMessage
On Mon, Aug 15, 2011 at 13:50, Robert Haas robertmh...@gmail.com wrote:
> On Mon, Aug 15, 2011 at 7:03 AM, Magnus Hagander mag...@hagander.net wrote:
>> The pg_stat_replication view exposes all the fields in
>> StandbyReplyMessage *except* for the timestamp when the message was
>> generated. On an active system this is not all that interesting, but on
>> a mostly idle system that allows the monitoring to react faster than the
>> timeout that actually kicks the other end off - and could be useful in
>> manual debugging scenarios.
>>
>> Any particular reason why this was not exposed as its own column?
>
> I wondered the same thing. Sounds like a good idea.

I can go do that. Care to argue^Wbikeshed for a specific name?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] index-only scans
On 08/14/2011 12:31 AM, Heikki Linnakangas wrote:
>> The same idea could of course be used to calculate the effective cache
>> hit ratio for each table. Cache hit ratio would have the problem of
>> feedback loops, though.
>
> Yeah, I'm not excited about making the planner and statistics more
> dynamic. Feedback loops and plan instability are not fun.

I might be a little out of my league here... but I was thinking about the
cache hit ratio and feedback loops. I understand automatic tuning would be
hard, but making automatic tuning easier (by using pg_tune, for example)
would be a big plus for most use cases.

To make it easier to tune the page read costs automatically, it would be
nice if there were four variables instead of the current two:

- random_page_cost is the cost of reading a random page from storage.
  Currently it is not; it is the cost of accessing a random page, taking
  into account that it might be in memory.
- seq_page_cost is the cost of reading pages sequentially from storage.
- memory_page_cost is the cost of reading a page in memory.
- cache_hit_ratio is the expected cache hit ratio.

memory_page_cost would be server-global, random and seq page costs
tablespace-specific, and cache_hit_ratio relation-specific.

You would get the current behavior by tuning the *_page_costs
realistically, and setting cache_hit_ratio globally so that the expected
random_page_cost / seq_page_cost ratio stays the same as now.

The biggest advantage of this would be that the correct values are much
easier to detect automatically compared to the current situation. This can
be done using the pg_statio_* views and IO speed testing. They should not
be tuned automatically by PostgreSQL, at least not the cache_hit_ratio, as
that leads to the possibility of feedback loops and plan instability. The
variables would also be much easier to understand.

There is the question of whether one should be allowed to tune the
*_page_costs at all. If I am not missing something, it is possible to
detect the correct values programmatically, and they do not change if you
do not change the hardware. The cache hit ratio is the real reason why they
are currently so important for tuning.

An example of why the current random_page_cost and seq_page_cost tuning is
not adequate is that you can only set random_page_cost per tablespace. That
would make perfect sense if random_page_cost were the cost of accessing a
page in storage. But it is not; it is a combination of that and caching
effects, so it actually varies per relation (and over time). How do you set
it correctly for a query where one relation is fully cached and another one
is not?

Another problem is that if you use random_page_cost == seq_page_cost, you
are effectively saying that everything is in cache. But if everything is in
cache, the cost of page access relative to the cpu_*_costs is way off. The
more random_page_cost and seq_page_cost differ, the more they mean the
storage access costs; when they are the same, they mean the memory page
cost. There can be an order of magnitude of difference between a storage
page cost and a memory page cost, so it is hard to tune the cpu_*_costs
realistically for cases where sometimes data is in cache and sometimes not.

Ok, enough hand waving for one post :) Sorry if this all is obvious /
discussed before. My googling didn't turn up anything directly related,
although these have some similarity:

- Per-table random_page_cost for tables that we know are always cached
  [http://archives.postgresql.org/pgsql-hackers/2008-04/msg01503.php]
- Script to compute random page cost
  [http://archives.postgresql.org/pgsql-hackers/2002-09/msg00503.php]
- The science of optimization in practical terms?
  [http://archives.postgresql.org/pgsql-hackers/2009-02/msg00718.php],
  getting really interesting starting from here:
  [http://archives.postgresql.org/pgsql-hackers/2009-02/msg00787.php]

 - Anssi
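The four-variable proposal above reduces to a simple blend. This is a
hypothetical sketch of the idea, not PostgreSQL's actual cost model; the
function and parameter names follow the proposal and are not real GUCs.

```python
# Hypothetical sketch: how a per-relation cache_hit_ratio and a global
# memory_page_cost could combine into the single per-page cost the planner
# effectively uses today.

def effective_page_cost(storage_page_cost: float,
                        memory_page_cost: float,
                        cache_hit_ratio: float) -> float:
    """Blend storage and memory costs by the expected cache hit ratio."""
    return (cache_hit_ratio * memory_page_cost
            + (1.0 - cache_hit_ratio) * storage_page_cost)

# Fully cached relation: the page cost collapses to the memory page cost,
# instead of pretending random and sequential storage I/O cost the same.
hot = effective_page_cost(storage_page_cost=4.0,
                          memory_page_cost=0.1,
                          cache_hit_ratio=1.0)

# Uncached relation in the same query keeps a realistic storage cost.
cold = effective_page_cost(4.0, 0.1, 0.0)
```

This illustrates the point about mixed queries: the same storage costs yield
very different effective page costs for a hot and a cold relation, which a
single per-tablespace random_page_cost cannot express.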
Re: [HACKERS] Re: Should we have an optional limit on the recursion depth of recursive CTEs?
On 08/16/2011 04:56 AM, Magnus Hagander wrote:
> On Mon, Aug 15, 2011 at 23:49, Greg Stark st...@mit.edu wrote:
>> On Mon, Aug 15, 2011 at 9:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> ... and that would be a seriously bad API. There are not SUSET
>>> restrictions on other resources such as work_mem. Why do we need one
>>> for this?
>>
>> I think a better analogy would be imposing a maximum number of rows a
>> query can output. That might be a sane thing to have for some
>> circumstances but it's not useful in general.
>
> Uh. You mean like LIMIT, which we already have?

There is no LIMIT imposed on a query by a server setting, which would be the
right analogy here.

cheers

andrew
Re: [HACKERS] WIP: Fast GiST index build
Looking at the calculation of levelStep:

+	/*
+	 * Calculate levelStep by the available amount of memory. We should be
+	 * able to load into main memory one page of each underlying node buffer
+	 * (which are levelStep levels below). That gives a constraint over
+	 * maintenance_work_mem. Also we should be able to have a subtree of
+	 * levelStep levels in cache. That gives a constraint over
+	 * effective_cache_size.
+	 *
+	 * The i'th underlying level of a sub-tree can consist of
+	 * maxIndexTuplesPerPage^i pages at maximum. So, a subtree of levelStep
+	 * levels can't be greater than 2 * maxIndexTuplesPerPage ^ levelStep
+	 * pages. We use some more reserve because we probably can't take the
+	 * whole effective cache, and use the formula
+	 * 4 * maxIndexTuplesPerPage ^ levelStep = effectiveCache. We use
+	 * similar logic with maintenance_work_mem. We should be able to store
+	 * at least the last pages of all buffers where we are emptying the
+	 * current buffer to.
+	 */
+	effectiveMemory = Min(maintenance_work_mem * 1024 / BLCKSZ,
+						  effective_cache_size);
+	levelStep = (int) log((double) effectiveMemory / 4.0) /
+		log((double) maxIndexTuplesPerPage);
+

I can see that that's equal to the formula given in the paper, log_B(M/4B),
but I couldn't see any explanation for that formula in the paper. Your
explanation makes sense, but where did it come from?

It seems a bit pessimistic: while it's true that a subtree can't be larger
than 2 * maxIndexTuplesPerPage ^ levelStep, you can put a tighter upper
bound on it. The max size of a subtree of depth n can be calculated as the
geometric series:

r^0 + r^1 + r^2 + ... + r^n = (1 - r^(n + 1)) / (1 - r)

where r = maxIndexTuplesPerPage. For r=2 those formulas are equal, but for a
large r and small n (which is typical), the 2 * maxIndexTuplesPerPage ^
levelStep formula gives a value that's almost twice as large as the real max
size of a subtree.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
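The gap between the two bounds can be checked numerically. This is a
hypothetical sketch, not the patch's code; the function names are
illustrative, with r standing for maxIndexTuplesPerPage.

```python
# Hypothetical sketch: compare the patch's loose subtree bound,
# 2 * r^levelStep, against the exact geometric-series maximum.

def loose_bound(r: int, level_step: int) -> int:
    return 2 * r ** level_step

def exact_max(r: int, level_step: int) -> int:
    # r^0 + r^1 + ... + r^n = (r^(n+1) - 1) / (r - 1), for r > 1
    return (r ** (level_step + 1) - 1) // (r - 1)

# For a typical fanout (say r = 100 index tuples per page) and
# levelStep = 2, the loose bound is nearly twice the exact maximum.
r, n = 100, 2
ratio = loose_bound(r, n) / exact_max(r, n)
```

For r = 100 and levelStep = 2 the loose bound gives 20000 pages while the
exact maximum is 10101, matching the "almost twice as large" observation.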
[HACKERS] non-ipv6 vs hostnames
Accidentally specifying an IPv6 address in pg_hba.conf on a system that
doesn't have IPv6 support gives the following error:

LOG: specifying both host name and CIDR mask is invalid: "::1/128"

Which is obviously wrong, because I didn't do that. Do we need to detect and
special-case IPv6 addresses in this case?

FWIW, the line was simply:

host    replication     all     ::1/128     trust

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Re: [HACKERS] Backup's from standby
On Tue, Aug 16, 2011 at 2:34 AM, senthilnathan senthilnatha...@gmail.com wrote:
> Thanks for your reply.
>
> What will happen if you issue *checkpoint* on the STANDBY? I presume that
> it will flush the data to disk.

It will perform a restartpoint.

http://www.postgresql.org/docs/9.0/static/wal-configuration.html

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Backup's from standby
On Tue, Aug 16, 2011 at 7:34 AM, senthilnathan senthilnatha...@gmail.com wrote:
> Thanks for your reply.
>
> What will happen if you issue *checkpoint* on the STANDBY? I presume that
> it will flush the data to disk.

Yes. On the standby that is known as a restartpoint.

> Will there be any conflict with the master WAL (like checkpoint
> location...)?

No, there is no effect on the master. The restartpoint happens as a
background process on the standby, so replay will continue while we restart.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_stat_replication vs StandbyReplyMessage
On Tue, Aug 16, 2011 at 10:51 AM, Magnus Hagander mag...@hagander.net wrote:
> On Mon, Aug 15, 2011 at 13:50, Robert Haas robertmh...@gmail.com wrote:
>> On Mon, Aug 15, 2011 at 7:03 AM, Magnus Hagander mag...@hagander.net wrote:
>>> The pg_stat_replication view exposes all the fields in
>>> StandbyReplyMessage *except* for the timestamp when the message was
>>> generated. On an active system this is not all that interesting, but on
>>> a mostly idle system that allows the monitoring to react faster than
>>> the timeout that actually kicks the other end off - and could be useful
>>> in manual debugging scenarios.
>>>
>>> Any particular reason why this was not exposed as its own column?
>>
>> I wondered the same thing. Sounds like a good idea.
>
> I can go do that. Care to argue^Wbikeshed for a specific name?

reply_timestamp

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] walprotocol.h vs frontends
On Tue, Aug 16, 2011 at 9:35 AM, Magnus Hagander mag...@hagander.net wrote:
> So right now what we do is allow a single packet to span multiple files,
> but since MAX_SEND_SIZE is 128KB it will always be smaller than a single
> file, so we can only ever span two files at most. Unless someone has
> tweaked their xlog segment size to insane values.

We should limit MAX_SEND_SIZE to be the same as or less than
WAL_SEGMENT_SIZE. We gain nothing by continuing to allow the existing code
to work in the way it does.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] WIP: Fast GiST index build
On Tue, Aug 16, 2011 at 4:04 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
> I can see that that's equal to the formula given in the paper,
> log_B(M/4B), but I couldn't see any explanation for that formula in the
> paper. Your explanation makes sense, but where did it come from?

I couldn't find it either. But it has to reserve memory for both the
sub-tree and the active buffers. We are reserving memory for the sub-tree in
effective_cache_size and memory for the last pages of buffers in
maintenance_work_mem.

> It seems a bit pessimistic: while it's true that a subtree can't be
> larger than 2 * maxIndexTuplesPerPage ^ levelStep, you can put a tighter
> upper bound on it. The max size of a subtree of depth n can be calculated
> as the geometric series:
>
> r^0 + r^1 + r^2 + ... + r^n = (1 - r^(n + 1)) / (1 - r)
>
> where r = maxIndexTuplesPerPage. For r=2 those formulas are equal, but
> for a large r and small n (which is typical), the
> 2 * maxIndexTuplesPerPage ^ levelStep formula gives a value that's almost
> twice as large as the real max size of a subtree.

Thus, we can calculate:

levelstep = min(log_r(1 + effective_cache_size_in_pages * (r - 1)) - 1,
                log_r(maintenance_work_mem_in_pages - 1))

and get a more precise result. But we also need at least a very rough
estimate of the memory occupied by the node buffers hash table and path
items.

--
With best regards,
Alexander Korotkov.
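The refined calculation above can be written out directly. This is a
hypothetical sketch, not the patch's code; the variable names are
illustrative, with r standing for maxIndexTuplesPerPage.

```python
# Hypothetical sketch of the refined levelstep formula:
# min(log_r(1 + cache_pages*(r-1)) - 1, log_r(mwm_pages - 1)),
# floored to an integer number of levels.
import math

def level_step(r: int,
               effective_cache_pages: int,
               maintenance_work_mem_pages: int) -> int:
    # Constraint from effective_cache_size, using the exact
    # geometric-series subtree size instead of the looser 2 * r^levelStep.
    by_cache = math.log(1 + effective_cache_pages * (r - 1), r) - 1
    # Constraint from maintenance_work_mem (last page of each buffer).
    by_mwm = math.log(maintenance_work_mem_pages - 1, r)
    return int(min(by_cache, by_mwm))
```

Whichever resource is scarcer wins: a small maintenance_work_mem caps the
result even when the effective cache would allow a deeper step.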
Re: [HACKERS] src/backend/storage/ipc/README
On Tue, Aug 16, 2011 at 12:37 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: $SUBJECT is wildly out-of-date. Is there any point in keeping this, given the large (and actually correct) comment block near the top of sinvaladt.c? Huh, I never noticed that file before. Yeah, it seems useless as it stands. I wonder however if we could repurpose it as a slightly higher-level summary? If you just look at sinval(adt).c, you'll find out all about how inval messages get transmitted, but nothing about why we need them in the first place. There's some material about that over in src/backend/utils/cache/inval.c, which is not the easiest place to find if you're looking at storage/ipc/. That's for sure. IMHO, the fact that this functionality is broken up between three source files in two completely separate parts of the source tree is a mistake to begin with. It appears to me to be an attempt to create proper layers of abstraction, but in reality I think it's just an obstacle to understanding and improving the code. I would suggest merging inval.c into sinval.c so that it's all in src/backend/storage/ipc. That having been said, some more documentation of what sinval is for and what its restrictions are would be great. I wonder if it shouldn't go into a file called README.sinval or somesuch, though. The functionality in that directory is fairly diverse and you could conceivably end up with README.procarray, README.shmem, etc. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Some problems about cascading replication
On Tue, Aug 16, 2011 at 9:55 AM, Fujii Masao masao.fu...@gmail.com wrote: When I tested the PITR on git master with max_wal_senders 0, I found that the following inappropriate log message was always output even though cascading replication is not in progress. Attached patch fixes this problem. LOG: terminating all walsender processes to force cascaded standby(s) to update timeline and reconnect When making the patch, I found another problem about cascading replication; When promoting a cascading standby, postmaster sends SIGUSR2 to any cascading walsenders to kill them. But there is a corner-case where such a walsender fails to receive SIGUSR2 and survives a standby promotion unexpectedly. This happens when postmaster sends SIGUSR2 before the walsender marks itself as a WAL sender, because postmaster sends SIGUSR2 only to the processes marked as a WAL sender. To avoid the corner-case, I changed walsender so that it checks whether recovery is in progress or not again after marking itself as a WAL sender. If recovery is not in progress even though the walsender is a cascading one, it does the same thing as the SIGUSR2 signal handler does, and then exits later. Attached patch also includes this fix. Looks like valid problems and appropriate fixes to me. Will commit. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pg_stat_replication vs StandbyReplyMessage
On Tue, Aug 16, 2011 at 8:54 AM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, Aug 16, 2011 at 10:51 AM, Magnus Hagander mag...@hagander.net wrote: On Mon, Aug 15, 2011 at 13:50, Robert Haas robertmh...@gmail.com wrote: On Mon, Aug 15, 2011 at 7:03 AM, Magnus Hagander mag...@hagander.net wrote: The pg_stat_replication view exposes all the fields in StandbyReplyMessage *except* for the timestamp when the message was generated. On an active system this is not all that interesting, but on a mostly idle system that allows the monitoring to react faster than the timeout that actually kicks the other end off - and could be useful in manual debugging scenarios. Any particular reason why this was not exposed as its own column? I wondered the same thing. Sounds like a good idea. I can go do that. Care to argue^Wbikeshed for a specific name? reply_timestamp Works for me. I'd suggest that we rename it that way in StandbyReplyMessage, so that the name in the struct and the name in the system view match. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Re: Should we have an optional limit on the recursion depth of recursive CTEs?
On Tue, Aug 16, 2011 at 7:23 AM, Andrew Dunstan and...@dunslane.net wrote: There is no LIMIT imposed on a query by a server setting, which would be the right analogy here. I am not sure I understand any of these analogies. I think Peter's point is that it's not very difficult to write (perhaps accidentally) a CTE that goes into infinite recursion. In general, we can't detect that situation, because it's equivalent to the halting problem. But there's an old joke about a Turing test (where a computer program must try to fool a human into believing that it is also human) where the person asks the computer: What would the following program do? 10 PRINT HELLO 20 GOTO 10 And gets back an infinite stream of HELLO HELLO HELLO HELLO HELLO I don't think it's going to be feasible to implement a security restriction that keeps untrusted users from hosing the machine with a long running CTE; there are nearly infinitely many ways for an untrusted user who can run queries to hose the machine, and plugging one of them imperfectly is going to get us pretty much nowhere. On the other hand, there is perhaps a reasonable argument to be made that we should cut off CTE processing at some point to prevent *inadvertent* exhaustion of system resources. Or even query processing more generally. In fact, we already have some things sort of like this: you can use statement_timeout to kill queries that run for too long, and we just recently added temp_file_limit to kill those that eat too much temp file space. I can see a good case for memory_limit and query_cpu_limit and maybe some others. cte_recursion_depth_limit wouldn't be all that high on my personal list, I guess, but the concept doesn't seem completely insane. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
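Absent a server-side limit, the usual query-level workaround is to carry an explicit depth column in the recursive term. A self-contained sketch using SQLite's WITH RECURSIVE (the syntax is standard SQL, so the shape is the same in PostgreSQL; SQLite is used here only so the example runs without a server):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Without the "depth < 5" guard this recursive CTE would never terminate;
# the explicit depth column is the per-query version of the proposed
# recursion-depth limit.
rows = conn.execute("""
    WITH RECURSIVE t(n, depth) AS (
        SELECT 1, 0
        UNION ALL
        SELECT n + 1, depth + 1 FROM t WHERE depth < 5
    )
    SELECT n FROM t
""").fetchall()
print([n for (n,) in rows])   # -> [1, 2, 3, 4, 5, 6]
```

The argument in the thread is precisely that this guard is easy to forget, which is what a server-enforced cutoff (or statement_timeout) would backstop.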
Re: [HACKERS] Allowing same cursor name in nested levels
Jeevan Chalke jeevan.cha...@enterprisedb.com writes: 1. I have two functions func1 and func2. 2. func1 calls func2 3. Both has cursor with same name, say mycursor 4. Somehow I forgot closing it 5. executing func1 throws an error 'cursor mycursor already in use' Is this expected behavior??? Yes ... or at least, it's always been like that. I just mingled around the code and later appended a cursor count to the cursor name to allow same cursor name in nested levels. That would break code that expects the cursor name to be what it said it should be. It is documented that you can refer to cursors by name across multiple functions. regards, tom lane
Re: [HACKERS] synchronized snapshots
On Mon, Aug 15, 2011 at 6:46 PM, Joachim Wieland j...@mcknight.de wrote: On Mon, Aug 15, 2011 at 6:09 PM, Jim Nasby j...@nasby.net wrote: I suspect that all the other cases of BEGIN failing would be syntax errors, so you would immediately know in testing that something was wrong. A missing file is definitely not a syntax error, so we can't really depend on user testing to ensure this is handled correctly. IMO, that makes it critical that that error puts us in an aborted transaction. Why can we not just require the user to verify if his BEGIN query failed or succeeded? Is that really too much to ask for? Also see what Robert wrote about proxies in between that keep track of the transaction state. Consider they see a BEGIN query that fails. How would they know if the session is now in an aborted transaction or not in a transaction at all? I think the point here is that we should be consistent. Currently, you can make BEGIN fail by doing it on the standby, and asking for READ WRITE mode: rhaas=# begin transaction read write; ERROR: cannot set transaction read-write mode during recovery After doing that, you are NOT in a transaction context: rhaas=# select 1; ?column? -- 1 (1 row) So whatever this does should be consistent with that, at least IMHO. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_replication vs StandbyReplyMessage
On Tue, Aug 16, 2011 at 2:25 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Aug 16, 2011 at 8:54 AM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, Aug 16, 2011 at 10:51 AM, Magnus Hagander mag...@hagander.net wrote: On Mon, Aug 15, 2011 at 13:50, Robert Haas robertmh...@gmail.com wrote: On Mon, Aug 15, 2011 at 7:03 AM, Magnus Hagander mag...@hagander.net wrote: The pg_stat_replication view exposes all the fields in StandbyReplyMessage *except* for the timestamp when the message was generated. On an active system this is not all that interesting, but on a mostly idle system that allows the monitoring to react faster than the timeout that actually kicks the other end off - and could be useful in manual debugging scenarios. Any particular reason why this was not exposed as it's own column? I wondered the same thing. Sounds like a good idea. I can go do that. Care to argue^Wbikeshed for a specific name? reply_timestamp Works for me. I'd suggest that we rename it that way in StandbyReplyMessage, so that the name in the struct and the name in the system view match. -1 The field is named same as equivalent field in other messages. The field on the view is a summary of all previous messages, which is a different thing. Perhaps we should call it last_reply_timestamp to make that clearer, though long titles are annoying. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_stat_replication vs StandbyReplyMessage
On Tue, Aug 16, 2011 at 16:00, Simon Riggs si...@2ndquadrant.com wrote: On Tue, Aug 16, 2011 at 2:25 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Aug 16, 2011 at 8:54 AM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, Aug 16, 2011 at 10:51 AM, Magnus Hagander mag...@hagander.net wrote: On Mon, Aug 15, 2011 at 13:50, Robert Haas robertmh...@gmail.com wrote: On Mon, Aug 15, 2011 at 7:03 AM, Magnus Hagander mag...@hagander.net wrote: The pg_stat_replication view exposes all the fields in StandbyReplyMessage *except* for the timestamp when the message was generated. On an active system this is not all that interesting, but on a mostly idle system that allows the monitoring to react faster than the timeout that actually kicks the other end off - and could be useful in manual debugging scenarios. Any particular reason why this was not exposed as it's own column? I wondered the same thing. Sounds like a good idea. I can go do that. Care to argue^Wbikeshed for a specific name? reply_timestamp Works for me. I'd suggest that we rename it that way in StandbyReplyMessage, so that the name in the struct and the name in the system view match. -1 The field is named same as equivalent field in other messages. The field on the view is a summary of all previous messages, which is a different thing. Perhaps we should call it last_reply_timestamp to make that clearer, though long titles are annoying. We don't say last_replay_location either, we just say replay_location. Adding the last_ part is just annoying. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] non-ipv6 vs hostnames
Magnus Hagander mag...@hagander.net writes: Accidentally specifying an IPv6 address in pg_hba.conf on a system that doesn't have ipv6 support gives the following error: LOG: specifying both host name and CIDR mask is invalid: ::1/128 Which is obviously wrong, because I didn't do that. Do we need to detect and special-case ipv6 addresses in this case? Doesn't really seem worth going out of our way for that. Systems with no IPv6 support are a dying breed, and will be more so by the time 9.2 gets deployed. regards, tom lane
Re: [HACKERS] non-ipv6 vs hostnames
On Tue, Aug 16, 2011 at 16:12, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: Accidentally specifying an IPv6 address in pg_hba.conf on a system that doesn't have ipv6 support gives the following error: LOG: specifying both host name and CIDR mask is invalid: ::1/128 Which is obviously wrong, because I didn't do that. Do we need to detect and special-case ipv6 addresses in this case? Doesn't really seem worth going out of our way for that. Systems with no IPv6 support are a dying breed, and will be more so by the time 9.2 gets deployed. Well, I got this on a win64 build. It's *supposed* to have ipv6. I wonder if it breaks on windows just because there is no ipv6 address on the machine... Unfortunately I shut the machine down and won't have time to test more right now, but I'll try to figure that out later unless beaten to it... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Re: Should we have an optional limit on the recursion depth of recursive CTEs?
Robert Haas robertmh...@gmail.com writes: I don't think it's going to be feasible to implement a security restriction that keeps untrusted users from hosing the machine with a long running CTE; there are nearly infinitely many ways for an untrusted user who can run queries to hose the machine, and plugging one of them imperfectly is going to get us pretty much nowhere. On the other hand, there is perhaps a reasonable argument to be made that we should cut off CTE processing at some point to prevent *inadvertent* exhaustion of system resources. Or even query processing more generally. Indeed: the real question here is why a recursive CTE is any worse than, say, an accidentally unconstrained join (or three or four...). However, we already have a perfectly suitable general mechanism for that; it's called statement_timeout. I think we've already had the discussion about whether there should be a system-wide SUSET maximum statement_timeout, and rejected it on the grounds that there was not a very clear need for it. In fact, we already have some things sort of like this: you can use statement_timeout to kill queries that run for too long, and we just recently added temp_file_limit to kill those that eat too much temp file space. I can see a good case for memory_limit and query_cpu_limit and maybe some others. temp_file_limit got accepted because it was constraining a resource not closely related to run time. I don't think that it provides a precedent in support of any of these other ideas. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Should we have an optional limit on the recursion depth of recursive CTEs?
On Tue, Aug 16, 2011 at 10:26 AM, Tom Lane t...@sss.pgh.pa.us wrote: In fact, we already have some things sort of like this: you can use statement_timeout to kill queries that run for too long, and we just recently added temp_file_limit to kill those that eat too much temp file space. I can see a good case for memory_limit and query_cpu_limit and maybe some others. temp_file_limit got accepted because it was constraining a resource not closely related to run time. I don't think that it provides a precedent in support of any of these other ideas. Well, CPU usage might be somewhat closely related to query runtime, but memory usage sure isn't. But we digress from $SUBJECT... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] synchronized snapshots
Excerpts from Robert Haas's message of mar ago 16 09:59:04 -0400 2011: On Mon, Aug 15, 2011 at 6:46 PM, Joachim Wieland j...@mcknight.de wrote: Also see what Robert wrote about proxies in between that keep track of the transaction state. Consider they see a BEGIN query that fails. How would they know if the session is now in an aborted transaction or not in a transaction at all? I think the point here is that we should be consistent. Currently, you can make BEGIN fail by doing it on the standby, and asking for READ WRITE mode: rhaas=# begin transaction read write; ERROR: cannot set transaction read-write mode during recovery After doing that, you are NOT in a transaction context: rhaas=# select 1; ?column? -- 1 (1 row) So whatever this does should be consistent with that, at least IMHO. I think we argued about a very similar problem years ago and the outcome was that you should be left in an aborted transaction block; otherwise running a dumb SQL script (which has no way to abort if it fails) could wreak serious havoc (?). I think this failure to behave in that fashion on the standby is something to be fixed, not imitated. What this says is that a driver or app seeing BEGIN fail should issue ROLLBACK before going further -- which seems the intuitive way to behave to me. No? -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] synchronized snapshots
On Tue, Aug 16, 2011 at 10:43 AM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Robert Haas's message of mar ago 16 09:59:04 -0400 2011: On Mon, Aug 15, 2011 at 6:46 PM, Joachim Wieland j...@mcknight.de wrote: Also see what Robert wrote about proxies in between that keep track of the transaction state. Consider they see a BEGIN query that fails. How would they know if the session is now in an aborted transaction or not in a transaction at all? I think the point here is that we should be consistent. Currently, you can make BEGIN fail by doing it on the standby, and asking for READ WRITE mode: rhaas=# begin transaction read write; ERROR: cannot set transaction read-write mode during recovery After doing that, you are NOT in a transaction context: rhaas=# select 1; ?column? -- 1 (1 row) So whatever this does should be consistent with that, at least IMHO. I think we argued about a very similar problem years ago and the outcome was that you should be left in an aborted transaction block; otherwise running a dumb SQL script (which has no way to abort if it fails) could wreak serious havoc (?). I think this failure to behave in that fashion on the standby is something to be fixed, not imitated. What this says is that a driver or app seeing BEGIN fail should issue ROLLBACK before going further -- which seems the intuitive way to behave to me. No? Maybe. But if we're going to change the behavior of BEGIN, then (1) we need to think about backward compatibility and (2) we should change it across the board. It's not for this patch to go invent something that's inconsistent with what we're already doing elsewhere. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup and wal streaming
On Sat, Feb 26, 2011 at 20:59, Magnus Hagander mag...@hagander.net wrote: On Sat, Feb 26, 2011 at 20:48, Yeb Havinga yebhavi...@gmail.com wrote: On 2011-02-26 18:19, Magnus Hagander wrote: Attached is an updated version of the patch that includes these changes, as well as Windows support and an initial cut at a ref page for pg_receivexlog (needs some more detail still). I'm testing a bit more (with the previous version, sorry) and got the following while doing a stream backup from a cluster that was at that moment doing a pgbench run with 1 synchronous standby. mgrid@mg79:~$ pg_basebackup --xlog=stream -D /data -vP -h mg73 -U repuser Password: xlog start point: 15/72C8 pg_basebackup: starting background WAL receiver pg_basebackup: got WAL data offset 14744, expected 16791960424 ) 5148915/5148026 kb g(100%) 1/1 tablespaces xlog end point: 15/80568878 pg_basebackup: waiting for background process to finish streaming... pg_basebackup: child process exited with error 1 Hmm, strange. What platform are you on? I saw something similar *once* on Windows, but it then passed my tests a lot of times in a row so I figured it was just a didn't clean properly thing. Clearly there's a bug around. What's the size of the latest WAL file that it did work on? Is it 16791960424 bytes? That's way way to large, but perhaps it's not switching segment properly? (that value is supposedly the current write position in the file..) I'm in total monkey test mode here, so I don't even know if I'm not supposed to do the streaming variant while other stuff is going on. Oh yes, that's one of the main reasons to use it, so you should definitely be able to do that! I've posted a new version of this patch at http://archives.postgresql.org/pgsql-hackers/2011-08/msg00776.php - forgot there was an open thread on this, sorry. 
-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] plpython crash
After backporting plpython.c from HEAD, this is the error message I get: ERROR: key pg.dropped.6 not found in mapping HINT: To return null in a column, add the value None to the mapping with the key named after the column. CONTEXT: while creating return value PL/Python function myfunc What does it mean? -- Jean-Baptiste Quenot
Re: [HACKERS] Some problems about cascading replication
On 16.08.2011 16:25, Simon Riggs wrote: On Tue, Aug 16, 2011 at 9:55 AM, Fujii Masaomasao.fu...@gmail.com wrote: When I tested the PITR on git master with max_wal_senders 0, I found that the following inappropriate log meesage was always output even though cascading replication is not in progress. Attached patch fixes this problem. LOG: terminating all walsender processes to force cascaded standby(s) to update timeline and reconnect When making the patch, I found another problem about cascading replication; When promoting a cascading standby, postmaster sends SIGUSR2 to any cascading walsenders to kill them. But there is a orner-case where such walsender fails to receive SIGUSR2 and survives a standby promotion unexpectedly. This happens when postmaster sends SIGUSR2 before the walsender marks itself as a WAL sender, because postmaster sends SIGUSR2 to only the processes marked as a WAL sender. To avoid the corner-case, I changed walsender so that it checks whether recovery is in progress or not again after marking itself as a WAL sender. If recovery is not in progress even though the walsender is cascading one, it does the same thing as SIGUSR2 signal handler does, and then exits later. Attached patch also includes this fix. Looks like valid problems and appropriate fixes to me. Will commit. I think there's a race condition here. If a walsender is just starting up, it might not have registered itself as a walsender yet. It's actually been there before this patch to suppress the log message. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] some missing internationalization in pg_basebackup
On Tue, Aug 16, 2011 at 11:35, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Magnus Hagander mag...@hagander.net writes: Note that the progress indicator will *always* count wrong when you choose to include WAL, since we just don't know how much WAL there should be. I guess in this case we could just advance the end counter as well as we go, making sure it doesn't shoot above 100%. Does that seem reasonable? Yes. If you're interested in the progress being made and the goal is moving, it's better to know about that. Done and applied. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] plpython crash
On 16/08/11 16:52, Jean-Baptiste Quenot wrote: After backporting plpython.c from HEAD, this is the error message I get: ERROR: key pg.dropped.6 not found in mapping HINT: To return null in a column, add the value None to the mapping with the key named after the column. CONTEXT: while creating return value PL/Python function myfunc What does it mean? Ah, interesting, I think that this means that you are returning a table type and that table has a dropped column. The code should skip over dropped columns, but apparently it does not and tries to find a value for that column in the mapping you are returning. I'll try to reproduce it here. Jan
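The behavior Jan describes can be sketched outside of C (hypothetical names; the real code is the C conversion logic in plpython.c, working on a tuple descriptor): when turning a returned mapping into a composite value, attributes flagged as dropped must be skipped rather than looked up, since their placeholder name is the "pg.dropped.N" form seen in the error.

```python
# Hypothetical sketch of the mapping-to-row conversion: each attribute is
# a (name, isdropped) pair; dropped attributes get a null placeholder and
# are never probed in the user's mapping.
def mapping_to_row(attributes, mapping):
    row = []
    for name, isdropped in attributes:
        if isdropped:
            row.append(None)          # dropped column: emit a null, no lookup
            continue
        if name not in mapping:
            # this is the error Jean-Baptiste hit, raised for a dropped
            # column only if the skip above is missing
            raise KeyError("key %s not found in mapping" % name)
        row.append(mapping[name])
    return row

attrs = [("a", False), ("pg.dropped.6", True), ("b", False)]
print(mapping_to_row(attrs, {"a": 1, "b": 2}))   # -> [1, None, 2]
```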
Re: [HACKERS] Re: Should we have an optional limit on the recursion depth of recursive CTEs?
On 16 August 2011 14:43, Robert Haas robertmh...@gmail.com wrote: What would the following program do? 10 PRINT HELLO 20 GOTO 10 And gets back an infinite stream of HELLO HELLO HELLO HELLO HELLO heh, that's pretty funny. It also complements my view, because the Turing test is only failed because the human eventually thinks hmm, he's taking way too long to get to the '...and so on infinitum' bit. I don't think it's going to be feasible to implement a security restriction that keeps untrusted users from hosing the machine with a long running CTE; there are nearly infinitely many ways for an untrusted user who can run queries to hose the machine, and plugging one of them imperfectly is going to get us pretty much nowhere. Unless that happens to be the exact area that is a problem for you, due perhaps to a poorly written application. We're protecting against Murphy, not Machiavelli - if your users are malicious, or are motivated by seeing if they can somehow hose the machine for kicks, clearly all bets are off. This mindset happens to pretty well meet the needs of industry, IMHO. That said, I admit the case for making a separate SUSET GUC is the least compelling one I've made on this thread, if only because of the glaring inconsistency with other areas. On the other hand, there is perhaps a reasonable argument to be made that we should cut off CTE processing at some point to prevent *inadvertent* exhaustion of system resources. Or even query processing more generally. In fact, we already have some things sort of like this: you can use statement_timeout to kill queries that run for too long, and we just recently added temp_file_limit to kill those that eat too much temp file space. statement_timeout is far too blunt an instrument to deal with this problem. For one thing, it may vary based on many external factors, whereas number of iterations is a consistent, useful metric for the WITH query in isolation.
For another, it prevents the DBA from managing known problems with deployed apps per database - maybe they have a reporting query that is expected to take a really long time. Sure, they can increase statement_timeout when that is run, but that's another thing to remember. I can see a good case for memory_limit and query_cpu_limit and maybe some others. cte_recursion_depth_limit wouldn't be all that high on my personal list, I guess, but the concept doesn't seem completely insane. I agree that those things would be much better than this. This is still a useful, easy-to-implement feature though. On 16 August 2011 15:26, Tom Lane t...@sss.pgh.pa.us wrote: Indeed: the real question here is why a recursive CTE is any worse than, say, an accidentally unconstrained join (or three or four...). It's much worse because an unconstrained join query will not all-of-a-sudden fail to have a terminating condition. It will, for the most part, take forever or practically forever predictably and consistently, even as the contents of tables change over time. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] Online base backup from the hot-standby
On 11-08-16 02:09 AM, Jun Ishiduka wrote: Thanks. This has the following two problems. * pg_start_backup() must set 'on' to full_page_writes of the master that is actual writing of the WAL, but not the standby. Is there any way to tell from the WAL segments if they contain the full page data? If so could you verify this on the second slave when it is brought up? Or can you track this on the first slave and produce an error in either pg_start_backup() or pg_stop_backup()? I see in xlog.h XLR_BKP_REMOVABLE; the comment above it says that this flag is used to indicate that the archiver can compress the full page blocks to non-full page blocks. I am not familiar with where in the code this actually happens, but will this cause issues if the first standby is processing WAL files from the archive? * The standby doesn't need to connect to the master that's actual writing WAL. (Ex. Standby2 in Cascade Replication: Master - Standby1 - Standby2) I'm worried how I should clear these problems. Regards. Jun Ishizuka NTT Software Corporation TEL:045-317-7018 E-Mail: ishizuka@po.ntts.co.jp
Re: [HACKERS] plpython crash
On 16/08/11 17:06, Jan Urbański wrote: On 16/08/11 16:52, Jean-Baptiste Quenot wrote: After backporting plpython.c from HEAD, this is the error message I get: ERROR: key pg.dropped.6 not found in mapping HINT: To return null in a column, add the value None to the mapping with the key named after the column. CONTEXT: while creating return value PL/Python function myfunc What does it mean? I did a couple of simple tests and can't see how the code could fail to skip dropped columns. It seems like you're missing this commit: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=41282111e6cc73aca4b63dffe950ba7a63e4bd8a Could you try running this query? (assuming your function is called 'myfunc') select proname, relname, attname, attisdropped from pg_proc join pg_class on (prorettype = reltype) join pg_attribute on (attrelid = pg_class.oid) where proname = 'myfunc'; Cheers, Jan
Re: [HACKERS] non-ipv6 vs hostnames
On tis, 2011-08-16 at 16:17 +0200, Magnus Hagander wrote: Well, I got this on a win64 build. It's *supposed* to have ipv6. I wonder if it breaks on windows just because there is no ipv6 address on the machine... It would mean that getaddrinfo() of ::1 failed. That seems weird.
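A quick way to reproduce the check Peter describes in isolation: ask getaddrinfo() to parse "::1" as a numeric literal. With AI_NUMERICHOST no DNS lookup is involved, so this only exercises whether the resolver library on the machine understands IPv6 literals at all. This is a standalone POSIX sketch (the Windows build uses the same API via Winsock), not the actual initdb/pg_hba code path.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Returns 0 if "::1" parses as a numeric IPv6 address, else the
 * getaddrinfo() error code.  A nonzero result here is exactly the
 * failure mode being hypothesized for the win64 machine. */
int check_ipv6_loopback(void)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;   /* forbid any name resolution */

    rc = getaddrinfo("::1", NULL, &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);
    return rc;
}
```

On a machine with working IPv6 support in the resolver library this returns 0 even when no IPv6 address is configured on any interface, which is why a failure would be surprising.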
Re: [HACKERS] plpython crash
Dear Jan, Sorry I typed the wrong git commands. With the latest plpython from branch master I got the same gdb backtrace as reported before. I managed to wrap up a testcase that fails 100% of the time on my setup: https://gist.github.com/1149512 Hope it crashes on your side too :-) This is the result on PG 9.0.4: https://gist.github.com/1149543 This is the result on PG 9.0.4 with plpython.c backported from HEAD: https://gist.github.com/1149558 Cheers, -- Jean-Baptiste Quenot
Re: [HACKERS] plpython crash
On 16/08/11 19:07, Jean-Baptiste Quenot wrote: Dear Jan, Sorry I typed the wrong git commands. With the latest plpython from branch master I got the same gdb backtrace as reported before. I managed to wrap up a testcase that fails 100% of the time on my setup: https://gist.github.com/1149512 Hope it crashes on your side too :-) Awesome, it segfaults for me with HEAD ;) Now it's just a simple matter of programming... I'll take a look at it this evening. Jan
Re: [HACKERS] WIP: Fast GiST index build
Why is there ever a buffer on the root node? It seems like a waste of time to load N tuples from the heap into the root buffer, only to empty the buffer after it fills up. You might as well pull tuples directly from the heap. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] WIP: Fast GiST index build
On Tue, Aug 16, 2011 at 9:43 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Why is there ever a buffer on the root node? It seems like a waste of time to load N tuples from the heap into the root buffer, only to empty the buffer after it fills up. You might as well pull tuples directly from the heap. Yes, seems reasonable. The buffer on the root node was in the paper, but now I don't see the need for it either. -- With best regards, Alexander Korotkov.
Re: [HACKERS] WIP: Fast GiST index build
On 16.08.2011 21:46, Alexander Korotkov wrote: On Tue, Aug 16, 2011 at 9:43 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Why is there ever a buffer on the root node? It seems like a waste of time to load N tuples from the heap into the root buffer, only to empty the buffer after it fills up. You might as well pull tuples directly from the heap. Yes, seems reasonable. The buffer on the root node was in the paper, but now I don't see the need for it either. Here's a version of the patch with a bunch of minor changes: * No more buffer on the root node. Aside from the root buffer being pointless, this simplifies gistRelocateBuildBuffersOnSplit slightly as it doesn't need the special case for the root block anymore. * Moved the code to create a new root item from gistplacetopage() to gistRelocateBuildBuffersOnSplit(). Seems better to keep the buffering-related code away from the normal codepath, for the sake of readability. * Changed the levelStep calculation to use the more accurate upper bound on subtree size that we discussed. * Changed the levelStep calculation so that it doesn't just take min(maintenance_work_mem, effective_cache_size) and calculate the levelStep from that. maintenance_work_mem determines the max number of page buffers that can be held in memory at a time, while effective_cache_size determines the max size of the subtree. Those are subtly different things. * Renamed NodeBuffer to GISTNodeBuffer, to avoid cluttering the namespace * Plus misc comment, whitespace, formatting and naming changes. I think this patch is in pretty good shape now. Could you re-run the performance tests you have on the wiki page, please, to make sure the performance hasn't regressed? It would also be nice to get some testing on the levelStep and pagesPerBuffer estimates, and the point where we switch to the buffering method.
I'm particularly interested to know if there are any corner cases with very skewed data distributions or strange GUC settings where the estimates fail badly. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
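To make the distinction above concrete, here is a toy version of the levelStep calculation. This is not the patch's actual formula, just the shape of it under simplifying assumptions: a subtree hanging off one buffered node has roughly fanout^step pages and must fit in effective_cache_size, while emptying one buffer can touch roughly fanout^step sub-buffers, all of which must fit in maintenance_work_mem at once. Both limits end up capping the same power of the fanout, but by different quantities.

```c
/* fanout:      assumed average number of index entries per internal page
 * max_buffers: how many node buffers maintenance_work_mem can hold
 * cache_pages: effective_cache_size expressed in pages
 *
 * Returns the largest levelStep such that a subtree of that height
 * still satisfies both limits; always at least 1 so the build can
 * descend.  A hypothetical sketch, not the code in the patch. */
int choose_level_step(double fanout, double max_buffers, double cache_pages)
{
    int    step = 1;
    double subtree = fanout * fanout;   /* pages in a subtree of height step+1 */

    while (subtree <= cache_pages &&    /* subtree still fits in cache */
           subtree <= max_buffers)      /* its buffers still fit in work_mem */
    {
        step++;
        subtree *= fanout;
    }
    return step;
}
```

With a fanout of 100, a million-buffer memory budget and a billion-page cache budget this picks levelStep = 3; shrink the memory budget to 5000 buffers and it falls back to 1, which is the kind of skew the testing request above is about.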
Re: [HACKERS] VACUUM FULL versus relcache init files
On Mon, Aug 15, 2011 at 9:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: This might be the last bug from my concurrent-vacuum-full testing --- at least, I have no remaining unexplained events from about two full days of running the tests. The ones that are left involve backends randomly failing like this: psql: FATAL: could not read block 0 in file base/130532080/130545668: read only 0 of 8192 bytes usually during startup, though I have one example of a backend being repeatedly unable to access pg_proc due to similar errors. I believe what this traces to is stale relfilenode information taken from relcache init files, which contain precomputed relcache data intended to speed up backend startup. There is a curious little dance between write_relcache_init_file and RelationCacheInitFileInvalidate that is intended to ensure that when a process creates a new relcache init file that is already stale (because someone else invalidated the information concurrently), the bad file will get unlinked and not used. I think I invented that logic, so it's my fault that it doesn't work. It works fine as long as you consider only the two processes directly involved; but a third process can get fooled into using stale data. The scenario requires two successive invalidations, as for example from vacuum full on two system catalogs in a row, plus a stream of incoming new backends. It goes like this: 1. Process A vacuums a system catalog, unlinks init file, sends sinval messages, does second unlink (which does nothing). 2. Process B starts, observes lack of init file, begins to construct a new one. It gets to the end of AcceptInvalidationMessages in write_relcache_init_file. Since it started after A sent the first sinvals, it sees no incoming sinval messages and has no reason to think its new init file isn't good. 3. Meanwhile, Process A vacuums another system catalog, unlinks init file (doing nothing), and finally sends its sinval messages just after B looked for them. 
Now it will block trying to get RelCacheInitLock, which B holds. 4. Process B renames its already-stale init file into place, then releases RelCacheInitLock. 5. Process A gets the lock and removes the stale init file. Now process B is okay, because it will see A's second sinvals before it tries to make any use of the relcache data it has. And the stale init file is definitely gone after step 5. However, between steps 4 and 5 there is a window for Process C to start, read the stale init file, and attempt to use it. Since C started after A's second set of sinval messages, it doesn't see them and doesn't know it has stale data. As far as I can see at the moment, the only way to make this bulletproof is to turn both creation and deletion of the init file into atomic operations that include sinval messaging. What I have in mind is Creator: must take RelCacheInitLock, check for incoming invals, rename the new file into place if none, release RelCacheInitLock. (This is the same as what it does now.) Destroyer: must take RelCacheInitLock, unlink the init file, send its sinvals, release RelCacheInitLock. This guarantees that we serialize the sending of the sinval messages so that anyone who sees a bad init file in place *must* see the sinval messages afterwards, so long as they join the sinval messaging ring before looking for the init file (which they do). I don't think it's any worse than the current scheme from a parallelism point of view: the destroyer is holding RelCacheInitLock a bit longer than before, but that should not be a performance critical situation. Anybody see a hole in that? I don't. Seems more robust than the old way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
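The proposed creator/destroyer protocol can be sketched as code. This is only an illustration of the ordering, not the real relcache code: the lock is marked by comments, the init file is a flag, and the sinval stream is a counter standing in for "how many invalidation messages have been sent".

```c
/* Shared state, mutated only while holding RelCacheInitLock. */
static int  init_file_exists = 0;
static long sinvals_sent = 0;     /* stand-in for sinval stream position */

/* Destroyer (the proposed change): unlink and send sinvals as one atomic
 * unit, so any process that sees a stale file must also see these
 * messages once it joins the sinval ring. */
void invalidate_init_file(void)
{
    /* --- take RelCacheInitLock --- */
    init_file_exists = 0;         /* unlink the init file */
    sinvals_sent++;               /* send sinval messages */
    /* --- release RelCacheInitLock --- */
}

/* Creator: after building a candidate file, re-check under the same lock
 * that no invalidations arrived since it started building; only then
 * rename the file into place.  Returns 1 if installed, 0 if discarded. */
int install_init_file(long sinvals_seen_while_building)
{
    int installed = 0;

    /* --- take RelCacheInitLock --- */
    if (sinvals_seen_while_building == sinvals_sent)
    {
        init_file_exists = 1;     /* rename new file into place */
        installed = 1;
    }
    /* --- release RelCacheInitLock --- */
    return installed;
}
```

The key property is that the destroyer's unlink and its messages are indivisible under the lock, so the Process C window between steps 4 and 5 above cannot arise: a creator that missed the messages also fails the recheck and discards its file.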
Re: [HACKERS] error: could not find pg_class tuple for index 2662
Robert Haas robertmh...@gmail.com writes: On Thu, Aug 11, 2011 at 5:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: What's bothering me at the moment is that the CLOBBER_CACHE_ALWAYS hack, which was meant to expose exactly this sort of problem, failed to do so --- buildfarm member jaguar has been running with that flag for ages and never showed this problem. I'm thinking that we should take out the hack in AcceptInvalidationMessages and instead put in #ifdeffed code that causes ReceiveSharedInvalidMessages to forcibly always call the reset function. Any thoughts about that? I'm OK with that, but while we're whacking this around, is there any chance that we could reduce the number of layers of function calls that are happening in here? I'm not excited about that. We have got three layers that each have significant complexities of their own to deal with. The only way to reduce the layer count is to make some piece of code deal with more of those complexities at once. I don't want to go there, especially not in the name of very marginal cycle-shaving. It would be nice to move the short-circuit test I recently inserted at the top of SIGetDataEntries() somewhere higher up in the call stack, but right now the layers of abstraction are so thick that it's not exactly clear how to do that. Glad you didn't try, because it would be wrong. The fact that there are no outstanding messages in sinvaladt.c doesn't prove there are none yet unprocessed inside ReceiveSharedInvalidMessages. Anyway, to get back to the original point: I'm getting less excited about redoing the CLOBBER_CACHE_ALWAYS code at a different level. The only thing that can really promise is that we find places outside the cache code that are mistakenly holding onto cache entry pointers. 
It can't do very much for the sort of problems I've been finding over the past week, first because you need some other process actively changing the underlying information (else the use of a stale cache entry isn't a problem), and second because when you're looking for bugs involving use of stale cache entries, over-enthusiastic flushing of the cache can actually mask the bugs. I'd still like to think of a better test methodology, but I don't think force clobbers inside ReceiveSharedInvalidMessages is it. regards, tom lane
Re: [HACKERS] synchronized snapshots
On Aug 15, 2011, at 5:46 PM, Joachim Wieland wrote: On Mon, Aug 15, 2011 at 6:09 PM, Jim Nasby j...@nasby.net wrote: I suspect that all the other cases of BEGIN failing would be syntax errors, so you would immediately know in testing that something was wrong. A missing file is definitely not a syntax error, so we can't really depend on user testing to ensure this is handled correctly. IMO, that makes it critical that that error puts us in an aborted transaction. Why can we not just require the user to verify if his BEGIN query failed or succeeded? Is that really too much to ask for? It's something else that you have to remember to get right. psql, for example, will blindly continue on unless you remembered to tell it to exit on an error. Also, an invalid transaction seems to be the result of least surprise... if you cared enough to begin a transaction, you're going to expect that either everything between that and the COMMIT succeeds or fails, not something in-between. Also see what Robert wrote about proxies in between that keep track of the transaction state. Consider they see a BEGIN query that fails. How would they know if the session is now in an aborted transaction or not in a transaction at all? AFAIK a proxy can tell if a transaction is in progress or not via libpq. Worst-case, it just needs to send an extra ROLLBACK. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Re: [HACKERS] error: could not find pg_class tuple for index 2662
On Tue, Aug 16, 2011 at 3:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: It would be nice to move the short-circuit test I recently inserted at the top of SIGetDataEntries() somewhere higher up in the call stack, but right now the layers of abstraction are so thick that it's not exactly clear how to do that. Glad you didn't try, because it would be wrong. The fact that there are no outstanding messages in sinvaladt.c doesn't prove there are none yet unprocessed inside ReceiveSharedInvalidMessages. Oh. Good point. Anyway, to get back to the original point: I'm getting less excited about redoing the CLOBBER_CACHE_ALWAYS code at a different level. The only thing that can really promise is that we find places outside the cache code that are mistakenly holding onto cache entry pointers. It can't do very much for the sort of problems I've been finding over the past week, first because you need some other process actively changing the underlying information (else the use of a stale cache entry isn't a problem), and second because when you're looking for bugs involving use of stale cache entries, over-enthusiastic flushing of the cache can actually mask the bugs. I'd still like to think of a better test methodology, but I don't think force clobbers inside ReceiveSharedInvalidMessages is it. Makes sense. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] A note about hash-based catcache invalidations
I'm looking into the idea I mentioned earlier: All is not entirely lost, however: there's still some possible performance benefit to be gained here, if we go to the scheme of identifying victim catcache entries by hashvalue only. Currently, each heap_update in a cached catalog has to issue two sinval messages (per cache!): one against the old TID and one against the new TID. We'd be able to reduce that to one message in the common case where the hashvalue remains the same because the cache key columns didn't change. Removing the tuple ID from sinval messages turns out to have slightly wider impact than I thought at first, because the current coding passes those to callback functions registered with CacheRegisterSyscacheCallback, and there are a lot of 'em. However, only one of them seems to be doing anything with the tuplePtr argument, namely PlanCacheFuncCallback. We can make it work with the hash value instead, which will be about as good as what we're doing now. Any objections to that plan? regards, tom lane
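As an illustration of why identifying entries by hash value saves a message: the old and new tuple versions hash identically whenever the cache key columns didn't change, so one message covers both. The hash below is a toy FNV-1a over a string key, standing in for the real catcache hash functions; the function names here are hypothetical.

```c
#include <stdint.h>

/* Toy stand-in for the catcache hash of a tuple's key columns. */
static uint32_t key_hash(const char *key)
{
    uint32_t h = 2166136261u;          /* FNV-1a offset basis */
    for (; *key; key++)
    {
        h ^= (uint8_t) *key;
        h *= 16777619u;                /* FNV-1a prime */
    }
    return h;
}

/* How many sinval messages a heap_update must emit per cache under the
 * hashvalue-only scheme: one if the key columns (and hence the hash)
 * are unchanged, two if the update changed the keys.  Under the current
 * TID-based scheme it is always two, one per tuple version. */
int messages_for_update(const char *old_keys, const char *new_keys)
{
    return key_hash(old_keys) == key_hash(new_keys) ? 1 : 2;
}
```

Updates that only touch non-key columns (the common case for, say, pg_proc updates that don't rename the function) thus halve the sinval traffic for that catalog's caches.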
Re: [HACKERS] synchronized snapshots
On Tue, 2011-08-16 at 11:01 -0500, Jim Nasby wrote: Also, an invalid transaction seems to be the result of least surprise... if you cared enough to begin a transaction, you're going to expect that either everything between that and the COMMIT succeeds or fails, not something in-between. Agreed. Perhaps we need a new utility command to set the snapshot to make the error handling a little more obvious? Regards, Jeff Davis
Re: [HACKERS] synchronized snapshots
On Aug 16, 2011, at 5:40 PM, Jeff Davis wrote: On Tue, 2011-08-16 at 11:01 -0500, Jim Nasby wrote: Also, an invalid transaction seems to be the result of least surprise... if you cared enough to begin a transaction, you're going to expect that either everything between that and the COMMIT succeeds or fails, not something in-between. Agreed. Perhaps we need a new utility command to set the snapshot to make the error handling a little more obvious? Well, it appears we have a larger problem, as Robert pointed out that trying to start a writable transaction on a hot standby leaves you not in a transaction (which I feel is a problem). So IMHO the right thing to do here is make it so that runtime errors in BEGIN leave you in an invalid transaction. Then we can decide on the API for synchronized snapshots that makes sense instead of working around the behavior of BEGIN. I guess the big question to answer now is: what's the backwards compatibility impact of changing how BEGIN deals with runtime errors? -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net
[HACKERS] Finding tables dropped by DROP TABLE CASCADE
Hi, I'm working on implementing a query cache for pgpool-II. The query cache must be deleted if related tables are dropped. Finding table oids from DROP TABLE t1, t2, t3... is easy. The problem is DROP TABLE CASCADE. It seems there's no easy way to find the table oids which will be deleted by DROP TABLE CASCADE. Any idea? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
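There is no SQL-level API for this, but the information is in pg_depend: what CASCADE drops is the transitive closure of objects depending on the named table. A toy sketch of that walk over an in-memory edge list follows; a real client would instead issue repeated (or recursive) queries against pg_depend by refobjid, and the names and structure here are illustrative only.

```c
#define MAX_DEPS 16

/* One pg_depend-style edge: "dependent" depends on "referenced". */
typedef struct { int dependent; int referenced; } DepEdge;

/* Collect into out[] every oid that transitively depends on start_oid,
 * i.e. everything DROP ... CASCADE would take along with it.  Returns
 * the number of collected oids; nout is how many are already in out[]. */
static int collect_cascade(const DepEdge *edges, int nedges,
                           int start_oid, int *out, int nout)
{
    int i, j;

    for (i = 0; i < nedges; i++)
    {
        if (edges[i].referenced == start_oid)
        {
            int seen = 0;       /* skip duplicates (diamond dependencies) */
            for (j = 0; j < nout; j++)
                if (out[j] == edges[i].dependent) { seen = 1; break; }
            if (!seen)
            {
                out[nout++] = edges[i].dependent;
                nout = collect_cascade(edges, nedges,
                                       edges[i].dependent, out, nout);
            }
        }
    }
    return nout;
}

/* Worked example: oids 2 and 3 depend on 1; oid 4 depends on 2.
 * Dropping 1 with CASCADE therefore takes 2, 4 and 3 with it. */
int cascade_demo(void)
{
    DepEdge edges[] = { {2, 1}, {3, 1}, {4, 2} };
    int out[MAX_DEPS];

    return collect_cascade(edges, 3, 1, out, 0);
}
```

The catch, as discussed downthread, is that pgpool-II would have to query the catalogs before the DROP executes, and catalog-snooping of this sort cannot see dependencies added concurrently, which is what makes sinval-based approaches attractive.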
Re: [HACKERS] Finding tables dropped by DROP TABLE CASCADE
Just a suggestion, but... Why not use an external wrapper function around reportDependentObjects in dependency.c to find the list of Oids for a cascade deletion based on a list of objects? Wouldn't that be possible? Regards, -- Michael Paquier http://michael.otacoo.com
Re: [HACKERS] synchronized snapshots
Jim Nasby j...@nasby.net writes: Well, it appears we have a larger problem, as Robert pointed out that trying to start a writable transaction on a hot standby leaves you not in a transaction (which I feel is a problem). So IMHO the right thing to do here is make it so that runtime errors in BEGIN leave you in an invalid transaction. Then we can decide on the API for synchronized snapshots that makes sense instead of working around the behavior of BEGIN. I'm not convinced by the above argument, because it requires that you pretend there's a significant difference between syntax errors and run time errors (whatever those are). Syntax errors in a BEGIN command are not going to leave you in an aborted transaction, because the backend is not going to recognize the command as a BEGIN at all. This means that frontends *must* be capable of dealing with the case that a failed BEGIN didn't start a transaction. (Either that, or they just assume their commands are always syntactically perfect, which seems like pretty fragile programming to me; and the more weird nonstandard options we load onto BEGIN, the less tenable the position becomes. For example, if you feed BEGIN option-foo to a server that's a bit older than you thought it was, you will get a syntax error.) If we have some failure cases that start a transaction and some that do not, we just have a mess, IMO. I think we'd be far better off to maintain the position that a failed BEGIN does not start a transaction, under any circumstances. To do that, we cannot have this new option attached to the BEGIN, which is a good thing anyway IMO from a standards compatibility point of view. It'd be better to make it a separate utility statement. There is no logical problem in doing that, and we already have a precedent for utility statements that do something special before the transaction snapshot is taken: see LOCK. 
In fact, now that I think about it, setting the transaction snapshot from a utility statement would be functionally useful because then you could take locks beforehand. And as a bonus, we don't have a backwards compatibility problem to solve. regards, tom lane
Re: [HACKERS] synchronized snapshots
On Tue, Aug 16, 2011 at 8:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: Jim Nasby j...@nasby.net writes: Well, it appears we have a larger problem, as Robert pointed out that trying to start a writable transaction on a hot standby leaves you not in a transaction (which I feel is a problem). So IMHO the right thing to do here is make it so that runtime errors in BEGIN leave you in an invalid transaction. Then we can decide on the API for synchronized snapshots that makes sense instead of working around the behavior of BEGIN. I'm not convinced by the above argument, because it requires that you pretend there's a significant difference between syntax errors and run time errors (whatever those are). Syntax errors in a BEGIN command are not going to leave you in an aborted transaction, because the backend is not going to recognize the command as a BEGIN at all. This means that frontends *must* be capable of dealing with the case that a failed BEGIN didn't start a transaction. (Either that, or they just assume their commands are always syntactically perfect, which seems like pretty fragile programming to me; and the more weird nonstandard options we load onto BEGIN, the less tenable the position becomes. For example, if you feed BEGIN option-foo to a server that's a bit older than you thought it was, you will get a syntax error.) If we have some failure cases that start a transaction and some that do not, we just have a mess, IMO. More or less agreed. I think we'd be far better off to maintain the position that a failed BEGIN does not start a transaction, under any circumstances. Also agreed. To do that, we cannot have this new option attached to the BEGIN, ... Eh, why not? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Finding tables dropped by DROP TABLE CASCADE
On Tue, Aug 16, 2011 at 8:10 PM, Tatsuo Ishii is...@postgresql.org wrote: I'm working on implementing a query cache for pgpool-II. The query cache must be deleted if related tables are dropped. Finding table oids from DROP TABLE t1, t2, t3... is easy. The problem is DROP TABLE CASCADE. It seems there's no easy way to find the table oids which will be deleted by DROP TABLE CASCADE. Any idea? Presumably it would also need to be invalidated if someone did ALTER TABLE (which might recurse into unspecified children). It sort of seems like what you want to do is snoop the sinval traffic... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Finding tables dropped by DROP TABLE CASCADE
Presumably it would also need to be invalidated if someone did ALTER TABLE (which might recurse into unspecified children). Good point. For DROP TABLE/ALTER TABLE, I need to take care of its children. It sort of seems like what you want to do is snoop the sinval traffic... It's hard for pgpool-II since there's no API in PostgreSQL for that. Maybe I will look into the system catalog to find out the children. I'm not sure if I can deal with CASCADE by the same method though. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp
Re: [HACKERS] synchronized snapshots
Robert Haas robertmh...@gmail.com writes: On Tue, Aug 16, 2011 at 8:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think we'd be far better off to maintain the position that a failed BEGIN does not start a transaction, under any circumstances. Also agreed. To do that, we cannot have this new option attached to the BEGIN, ... Eh, why not? Maybe I wasn't paying close enough attention to the thread, but I had the idea that there was some implementation reason why not. If not, we could still load the option onto BEGIN ... but I still find myself liking the idea of a separate command better, because of the locking issue. regards, tom lane
Re: [HACKERS] synchronized snapshots
On Tue, Aug 16, 2011 at 8:53 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Aug 16, 2011 at 8:35 PM, Tom Lane t...@sss.pgh.pa.us wrote: I think we'd be far better off to maintain the position that a failed BEGIN does not start a transaction, under any circumstances. Also agreed. To do that, we cannot have this new option attached to the BEGIN, ... Eh, why not? Maybe I wasn't paying close enough attention to the thread, but I had the idea that there was some implementation reason why not. If not, we could still load the option onto BEGIN ... but I still find myself liking the idea of a separate command better, because of the locking issue. Why does it matter whether you take the locks before or after the snapshot? If you're concerned with minimizing the race, what you should do is take all relevant locks in the parent before exporting the snapshot. I am not wild about adding another toplevel command for this. It seems a rather narrow use case, and attaching it to BEGIN feels natural to me. There may be some small benefit also in terms of minimizing the amount of sanity checking that must be done - for example, at BEGIN time, you don't have to check for the case where a snapshot has already been set. If we did add another toplevel command, what would we call it? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Finding tables dropped by DROP TABLE CASCADE
On Tue, Aug 16, 2011 at 8:52 PM, Tatsuo Ishii is...@postgresql.org wrote: Presumably it would also need to be invalidated if someone did ALTER TABLE (which might recurse into unspecified children). Good point. For DROP TABLE/ALTER TABLE, I need to take care of its children. It sort of seems like what you want to do is snoop the sinval traffic... It's hard for pgpool-II since there's no API in PostgreSQL for that. Maybe I will look into the system catalog to find out the children. I'm not sure if I can deal with CASCADE by the same method though. It's possible, but not too easy. Maybe we should have a special LISTEN channel that plays back (some subset of? some decoded version of?) the sinval messaging. I bet the pgAdmin guys would like an automated way of knowing when tables had been created/dropped, too... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] synchronized snapshots
On Tue, 2011-08-16 at 20:35 -0400, Tom Lane wrote: I'm not convinced by the above argument, because it requires that you pretend there's a significant difference between syntax errors and run time errors (whatever those are). After a syntax error like COMMMIT the transaction will remain inside the failed transaction block, but an error during COMMIT (e.g. deferred constraint check failure) will exit the transaction block. I think we'd be far better off to maintain the position that a failed BEGIN does not start a transaction, under any circumstances. To do that, we cannot have this new option attached to the BEGIN, which is a good thing anyway IMO from a standards compatibility point of view. It'd be better to make it a separate utility statement. +1 for a utility statement. Much clearer from the user's standpoint what kind of errors they might expect, and whether the session will remain in a transaction block. Regards, Jeff Davis
Re: [HACKERS] Finding tables dropped by DROP TABLE CASCADE
On 08/16/2011 08:52 PM, Tatsuo Ishii wrote: Presumably it would also need to be invalidated if someone did ALTER TABLE (which might recurse into unspecified children). Good point. For DROP TABLE/ALTER TABLE, I need to take care of its children. It sort of seems like what you want to do is snoop the sinval traffic... It's hard for pgpool-II since there's no API in PostgreSQL for that. Maybe I will look into the system catalog to find out the children. I'm not sure if I can deal with CASCADE by the same method though. Not sure how much it will help, but I have implemented the logic to drop dependent tables and other objects from catalog info, in Pyrseas. The relevant code is in https://github.com/jmafc/Pyrseas/blob/master/pyrseas/dbobject/table.py In particular, _from_catalog() at line 375 fetches the information using the query and inhquery SELECTs just above it. Then in diff_map() starting at line 641 it issues the SQL to drop the various dependent objects and tables. If it's mostly inherited tables that you're concerned with, the inhquery and the code dealing with inhstack should be helpful. Joe
Re: [HACKERS] synchronized snapshots
On Tue, 2011-08-16 at 21:08 -0400, Robert Haas wrote: attaching it to BEGIN feels natural to me. My only objection is that people have different expectations about whether the session will remain in a transaction block when they encounter an error. So, it's hard to make this work without surprising about half the users. And there are some fairly significant consequences to users who guess that they will remain in a transaction block in case of an error; or who are just careless and don't consider that an error may occur. If we did add another toplevel command, what would we call it? SET TRANSACTION SNAPSHOT perhaps? Regards, Jeff Davis
Re: [HACKERS] synchronized snapshots
On Tue, Aug 16, 2011 at 9:54 PM, Jeff Davis pg...@j-davis.com wrote: If we did add another toplevel command, what would we call it? SET TRANSACTION SNAPSHOT perhaps? Hmm, that's not bad, but I think we'd have to partially reserve TRANSACTION to make it work, which doesn't seem worth the pain it would cause. We could do something like TRANSACTION SNAPSHOT IS 'xyz', but that's a bit awkward. I still like BEGIN SNAPSHOT 'xyz' -- or even adding a generic options list like we use for some other commands, i.e. BEGIN (snapshot 'xyz'), which would leave the door open to arbitrary amounts of future nonstandard fiddling without the need for any more keywords. I understand the point about the results of a BEGIN failure leaving you outside a transaction, but that really only matters if you're doing psql dumb_script.sql, which is impractical for this feature anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Re: [COMMITTERS] pgsql: Adjust total size in pg_basebackup progress report when reality
On Wed, Aug 17, 2011 at 12:00 AM, Magnus Hagander mag...@hagander.net wrote:
> Adjust total size in pg_basebackup progress report when reality changes
>
> When streaming including WAL, the size estimate will always be incorrect, since we don't know how much WAL is included. To make sure the output doesn't look completely unreasonable, this patch increases the total size whenever we go past the estimate, to make sure we never go above 100%.

http://developer.postgresql.org/pgdocs/postgres/app-pgbasebackup.html says:

> Enables progress reporting. Turning this on will deliver an approximate progress report during the backup. Since the database may change during the backup, this is only an approximation and may not end at exactly 100%. In particular, when WAL log is included in the backup, the total amount of data cannot be estimated in advance, and in this case the progress report will only count towards the total amount of data without WAL.

ISTM that the last sentence needs to be changed slightly, because this commit changed the total amount so that it contains the amount of WAL.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
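(The clamping rule the commit message describes can be stated in a few lines. This is an illustration of the rule only, not the pg_basebackup source: whenever the bytes actually received exceed the advance estimate, the total is bumped up so the reported percentage never passes 100.)

```python
def update_progress(done, total):
    """Return (percent, adjusted_total) for a progress report.

    `total` is the size estimated before the backup started; streamed
    WAL is not counted in that estimate, so `done` can overshoot it.
    Growing the total whenever that happens keeps the report <= 100%.
    """
    if done > total:
        total = done
    return 100 * done // total, total
```

For example, with an estimate of 100 MB, 120 MB received reports 100% with the total quietly adjusted to 120, instead of an impossible 120%.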
Re: [HACKERS] Some problems about cascading replication
On Tue, Aug 16, 2011 at 11:56 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
> I think there's a race condition here. If a walsender is just starting up, it might not have registered itself as a walsender yet. It's actually been there before this patch to suppress the log message.

Right. To address this problem, I changed the patch so that a dead-end walsender (i.e., a cascading walsender running even though recovery is not in progress) always emits the log message. This change would cause duplicate log messages if standby promotion is requested while multiple walsenders, including a dead-end one, are running. But since that is unlikely to happen, I don't think it's worth writing code to suppress those duplicate log messages. Comments?

I attached the updated version of the patch.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index c0a32a3..90dad2c 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2328,10 +2328,11 @@ reaper(SIGNAL_ARGS)
 		 * XXX should avoid the need for disconnection.  When we do,
 		 * am_cascading_walsender should be replaced with RecoveryInProgress()
 		 */
-		if (max_wal_senders > 0)
+		if (max_wal_senders > 0 && CountChildren(BACKEND_TYPE_WALSND) > 0)
 		{
 			ereport(LOG,
-					(errmsg("terminating all walsender processes to force cascaded standby(s) to update timeline and reconnect")));
+					(errmsg("terminating all walsender processes to force cascaded "
+							"standby(s) to update timeline and reconnect")));
 			SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
 		}
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 0eadf64..813c998 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -368,6 +368,35 @@ StartReplication(StartReplicationCmd *cmd)
 	SendPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE);
 
 	/*
+	 * When promoting a cascading standby, postmaster sends SIGUSR2 to
+	 * any cascading walsenders to kill them. But there is a corner case where
+	 * such a walsender fails to receive SIGUSR2 and survives a standby
+	 * promotion unexpectedly. This happens when postmaster sends SIGUSR2
+	 * before the walsender marks itself as a WAL sender, because postmaster
+	 * sends SIGUSR2 only to the processes marked as a WAL sender.
+	 *
+	 * To avoid this corner case, if recovery is NOT in progress even though
+	 * the walsender is a cascading one, we do the same thing as the SIGUSR2
+	 * signal handler does, i.e., set walsender_ready_to_stop to true, which
+	 * causes the walsender to end later.
+	 *
+	 * When terminating cascading walsenders, postmaster usually writes
+	 * the log message announcing the terminations. But there is a race
+	 * condition here. If there is no walsender except this process before
+	 * reaching here, postmaster thinks that there is no walsender and
+	 * suppresses that log message. To handle this case, we always emit that
+	 * log message here. This might cause duplicate log messages, but that is
+	 * unlikely to happen, so it's not worth writing code to suppress them.
+	 */
+	if (am_cascading_walsender && !RecoveryInProgress())
+	{
+		ereport(LOG,
+				(errmsg("terminating walsender process to force cascaded standby "
+						"to update timeline and reconnect")));
+		walsender_ready_to_stop = true;
+	}
+
+	/*
 	 * We assume here that we're logging enough information in the WAL for
 	 * log-shipping, since this is checked in PostmasterMain().
 	 */
[HACKERS] Re: [COMMITTERS] pgsql: Fix bogus comment that claimed that the new BACKUP METHOD line i
On Tue, Aug 16, 2011 at 6:24 PM, Heikki Linnakangas heikki.linnakan...@iki.fi wrote:
> Fix bogus comment that claimed that the new BACKUP METHOD line in backup_label was new in 9.0. Spotted by Fujii Masao.

Thanks for fixing that. What about the remainder of the patch which I submitted? Unless I'm missing something, the patch enables us to handle correctly the case where recovery starts from a backup taken with pg_basebackup, without adding a new field like backupEndRequired into pg_control. Since it doesn't change the content of pg_control, it's applicable to v9.1, I think.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
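(For context on the BACKUP METHOD line being discussed: it is one of the key/value lines in the backup_label file, and its presence or absence is how recovery can tell how the backup was taken. A rough sketch of reading it, in Python rather than the server's C code; the sample label content below is illustrative.)

```python
def backup_method(backup_label):
    """Scan backup_label text for its BACKUP METHOD line and return the
    value, or None if the label predates the line (it was added in 9.1,
    not 9.0 as the bogus comment claimed)."""
    for line in backup_label.splitlines():
        if line.startswith("BACKUP METHOD:"):
            return line.split(":", 1)[1].strip()
    return None

label = "START WAL LOCATION: 0/2000020 (file 000000010000000000000002)\n" \
        "BACKUP METHOD: streamed\n"
print(backup_method(label))  # -> streamed
```

A pre-9.1 label, which lacks the line entirely, yields None, so callers can fall back to the old behaviour for old backups.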