Re: [HACKERS] Patch to add regression tests for SCHEMA
Reviews are to be posted to pgsql-hackers, and then linked from the CommitFest site. Putting reviews only on the CommitFest site is bad practice. Indeed. Sorry, shame on me! I did not have the original mail in my mailbox because I deleted it, I did not want to create a new thread because that is /also/ bad practice, as I was recently told, and I was not motivated to fetch and reinstate the messages in my mailbox for a short one-liner review. Well, I'll do better next time. Anyway, all of Robins' test cases are basically a very good thing, especially as he tries corner cases, including checking for expected errors and permission denials. -- Fabien.
Re: [HACKERS] Patch to add regression tests for SCHEMA
Completely agree. The poster was kind enough to notify me by email about his update there, but had he not, I wouldn't have dreamt that there was feedback on the site, two months down the line. -- Robins Tharakan On 8 May 2013 09:13, Robert Haas wrote: > On Tue, May 7, 2013 at 7:26 PM, Robins Tharakan > wrote: > > Here is an updated patch that uses different schema / role names for > > different tests (as per commitfest site feedback). > > I'm not sure what's going on here. Reviews are to be posted to > pgsql-hackers, and then linked from the CommitFest site. Putting > reviews only on the CommitFest site is bad practice. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company >
[HACKERS] remove src/tools/make_keywords?
It doesn't look as though it is used or usable.
Re: [HACKERS] XLogFlush invoked about twice as much after 9.2 group commit enhancement
On Tue, May 7, 2013 at 12:48 PM, Jeff Janes wrote: > Anyway, I don't see this behavior change when turning on wal_debug and > looking in the logfiles for 'xlog flush request' messages. That could have everything to do with the hardware you're using. In general, the higher the cost of an fsync, the more useful it is to amortize that cost among concurrently committing transactions. -- Peter Geoghegan
Re: [HACKERS] XLogFlush invoked about twice as much after 9.2 group commit enhancement
> Are you sure you properly cleared out the stats between profiling sessions? > Also, XLogFlush gets called by background processes like autovac, > checkpointer and bgwriter, in addition to being called by committing > processes. If one profiled session contained a checkpoint and the other did > not, or one just idled a lot longer between when the benchmark finished and > when you shut down the server, perhaps that explains it. > > Anyway, I don't see this behavior change when turning on wal_debug and > looking in the logfiles for 'xlog flush request' messages. Yes, I did reset the stats between profiling sessions. And you are right that I probably did not account for the XLogFlush() calls made by other processes. Also, after doing a few more runs (using pgbench -j 4 -c 32 -T 60) and observing the results, I am getting convinced that there is no surprise in XLogFlush() being called more often: the patch enables backends to return from XLogFlush() more quickly in more cases than before, which in turn lets them proceed to further transactions, which again call XLogFlush(). So the increased number of XLogFlush() calls should be accounted for by the increase in throughput that we see; roughly speaking, if throughput doubles, the number of commit-time XLogFlush() calls should roughly double too. In my earlier mail it may have been wrong of me to conclude that the rise in throughput and the rise in the number of XLogFlush() invocations are not proportional. I think they are, though I could not measure it rigorously. Am I interpreting it correctly? -- Amit Langote
Re: [HACKERS] Patch to add regression tests for SCHEMA
On Tue, May 7, 2013 at 7:26 PM, Robins Tharakan wrote: > Here is an updated patch that uses different schema / role names for > different tests (as per commitfest site feedback). I'm not sure what's going on here. Reviews are to be posted to pgsql-hackers, and then linked from the CommitFest site. Putting reviews only on the CommitFest site is bad practice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] XLogFlush invoked about twice as many times after 9.2 group commit enhancement
On Tue, May 7, 2013 at 7:52 PM, Amit Langote wrote: > So, any rise in number of XLogFlush() calls should roughly > be accounted for by increased throughput. Am I right in interpreting > it this way? I think so. There certainly isn't any question that the increased throughput and the increased number of XLogFlush() calls are because of the new group commit behavior. The cost of a WAL write + flush is more effectively amortized, and so XLogFlush() calls become cheaper. I'm not prepared to make any predictions as to exactly how they might relate. -- Peter Geoghegan
Re: [HACKERS] local_preload_libraries logspam
On Tue, May 7, 2013 at 7:01 PM, Tom Lane wrote: > On the other hand, if we have it as DEBUG2 in the EXEC_BACKEND code > path, I would be willing to argue that that's too low. If you're > starting to feel a need to inquire into the backend's behavior, knowing > about loaded modules seems like one of the first things you need to know > about. Hence, I'd vote for DEBUG1. +1 to DEBUG1. -- Peter Geoghegan
Re: [HACKERS] Make targets of doc links used by phpPgAdmin static
Peter Eisentraut wrote: > On Tue, 2013-05-07 at 00:32 -0500, Karl O. Pinc wrote: > > Attached is a documentation patch against head which makes > > static the targets of the on-line PG html documentation that > > are referenced by the phpPgAdmin help system. > > done I wonder about backpatching this to 9.2? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] XLogFlush invoked about twice as many times after 9.2 group commit enhancement
> Why is that surprising? Most of those XLogFlush() calls will recheck > the flushed-up-to point, and realize that another backend assumed the > role of group commit leader, and flushed their WAL for them, so aside > from the wait, the call to XLogFlush is cheap for that individual > backend. It's being invoked twice as many times because backends *can* > invoke it twice as many times. After going through it again, I am getting convinced it is not surprising: since backends are now able to return quickly from XLogFlush(), on average they should also be able to proceed to their next transactions faster, and hence cause XLogFlush() to be invoked more often. So any rise in the number of XLogFlush() calls should roughly be accounted for by the increased throughput. Am I right in interpreting it this way? -- Amit Langote
Re: [HACKERS] local_preload_libraries logspam
Peter Eisentraut writes: > On Tue, 2013-05-07 at 14:28 -0700, Peter Geoghegan wrote: >> It seems like an oversight to me that local_preload_libraries causes a >> new log message to appear each time a new connection is established. > It is correct in my view (but perhaps I have just gotten used to it), > but I wouldn't mind if you wanted to make an informed change. It seems reasonable to me to reduce it to DEBUG1 level. In most use-cases, that message will appear on every backend start, rendering its usefulness debatable. Peter's characterization as "log spam" might be excessive, but probably not by much. On the other hand, if we have it as DEBUG2 in the EXEC_BACKEND code path, I would be willing to argue that that's too low. If you're starting to feel a need to inquire into the backend's behavior, knowing about loaded modules seems like one of the first things you need to know about. Hence, I'd vote for DEBUG1. regards, tom lane
Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4
(2013/05/07 22:40), Heikki Linnakangas wrote: On 26.04.2013 11:51, KONDO Mitsumasa wrote: So I fixed CreateRestartPoint at the branching point of executing MinRecoveryPoint. It seems to fix this problem, but the attached patch is plain. I didn't understand this. I committed a fix for the issue where recycled WAL files get in the way of recovery, but I'm not sure if you're describing the same issue or something else. But before we dig any deeper into this, can you verify if you're still seeing any issues on 9.3beta1? Yes, my fix was very plain and fixed the wrong point. I have looked at your patch, and your fix is right. I posted about another problem, namely that promotion and PITR cannot proceed. It might be a different problem, but I could not reproduce it... Please wait for more! I will try to analyze and test on 9.2.4 with your patch and on 9.3beta1. Best regards, -- NTT Open Source Software Center Mitsumasa KONDO
Re: [HACKERS] [COMMITTERS] pgsql: Fix permission tests for views/tables proven empty by constraint
Greg Stark writes: > If we just reverted your fix and didn't fix it in 9.2 that would also > fix the crash right? The bug was only that it leaked the fact that the > view was provably empty from the definition? Well, it might fail to report a permissions violation when the not-allowed-to-be-accessed relation could be proven to yield no rows. I agree that it's a bit hard to call that a security issue as long as you assume that the attacker has access to the system catalogs; and even if you don't assume that, being able to discern that there's a check constraint on some table doesn't seem like a big leakage. I had originally thought that the issue only occurred in 9.2, but it turns out that the appendrel form of the problem occurs at least as far back as 8.4; for example the following admittedly-artificial query

select * from ((select f1 as x from t1 offset 0)
union all (select f2 as x from t2 offset 0)) ss
where false;

will not throw an error in any current release, even if the caller lacks select privilege on t1 and/or t2. With manipulation of the outer WHERE clause, you could find out about the nature of any check constraints on t1/t2. It's easier to see the bug in 9.2 because you no longer need a UNION ALL, but that doesn't much change the argument about whether it's a security issue. Given that forms of the bug have been around for a long time without anyone noticing, it might be okay to leave it unfixed in the back branches. regards, tom lane
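[Editor's note: a hedged end-to-end sketch of the leak Tom describes, with role and table names of my own invention, not from the thread:]

create table t1 (f1 int check (f1 > 0));
create table t2 (f2 int);
create role snooper login;   -- deliberately granted nothing
set role snooper;
select * from ((select f1 as x from t1 offset 0)
union all (select f2 as x from t2 offset 0)) ss
where false;
-- on affected releases this returns zero rows instead of raising
-- "permission denied for relation t1"; varying the outer WHERE clause
-- then lets the caller probe t1's check constraints
reset role;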
Re: [HACKERS] Make targets of doc links used by phpPgAdmin static
On Tue, 2013-05-07 at 00:32 -0500, Karl O. Pinc wrote: > Attached is a documentation patch against head which makes > static the targets of the on-line PG html documentation that > are referenced by the phpPgAdmin help system. done
Re: [HACKERS] local_preload_libraries logspam
On Tue, 2013-05-07 at 14:28 -0700, Peter Geoghegan wrote: > It seems like an oversight to me that local_preload_libraries causes a > new log message to appear each time a new connection is established. It is correct in my view (but perhaps I have just gotten used to it), but I wouldn't mind if you wanted to make an informed change.
Re: [HACKERS] Patch to add regression tests for SCHEMA
Hi, Here is an updated patch that uses different schema / role names for different tests (as per commitfest site feedback). -- Robins Tharakan On 18 March 2013 17:16, robins wrote: > Hi, > > Attached is an updated patch that uses better schema / role names. > > -- > Robins Tharakan > > On 18 March 2013 12:59, robins wrote: > >> Thanks Alvaro. >> >> Since the tests were running fine, I didn't bother with elaborate names, >> but since you're concerned I guess I'll make that change and re-submit. >> >> And yes, I've already submitted (to Commitfest) another patch related to >> regression tests for SEQUENCE. >> Would submit the SCHEMA patch once the above change is done. >> -- >> Robins >> >> >> On 18 March 2013 09:42, Alvaro Herrera wrote: >> >>> robins wrote: >>> > Hi, >>> > >>> > Please find attached a patch to take 'make check' code-coverage of >>> SCHEMA >>> > from 33% to 98%. >>> > >>> > Any feedback is more than welcome. >>> >>> I think you should use more explicit names for shared objects such as >>> roles -- i.e. not "r1" but "regression_user_1" and so on. (But be >>> careful about role names used by other tests). The issue is that these >>> tests might be run in a database that contains other stuff; certainly we >>> don't want to drop or otherwise affect previously existing roles. >>> >>> > p.s.: I am currently working on more regression tests (USER / VIEW / >>> > DISCARD etc). Please let me know if I need to post these as one large >>> > patch, instead of submitting one patch at a time. >>> >>> I think separate patches is better. Are you adding these patches to the >>> upcoming commitfest, for evaluation? https://commitfest.postgresql.org >>> >>> -- >>> Álvaro Herrera http://www.2ndQuadrant.com/ >>> PostgreSQL Development, 24x7 Support, Training & Services >>> >> >> > regress_schema_v3.patch Description: Binary data
[HACKERS] Re: [COMMITTERS] pgsql: Fix permission tests for views/tables proven empty by constraint
On Tue, May 7, 2013 at 6:59 PM, Tom Lane wrote: > The main downside of this > is that fully dummy appendrels (those with no live children at all, > such as in your example) wouldn't be recognized by IS_DUMMY_PATH, > so the quality of planning in the outer query would be slightly > degraded. But such cases are probably sufficiently unusual that this > might be an okay price to pay for a back branch That's kind of dismaying. ORMs have a tendency to create queries like this and people may have even written such queries by hand and tested them to determine that postgres was able to exclude the useless relation. To have them install a security update and discover that something they had previously tested no longer worked would be annoying. If we just reverted your fix and didn't fix it in 9.2 that would also fix the crash right? The bug was only that it leaked the fact that the view was provably empty from the definition? But it's been that way for a long time and after all:

xxx=> select * from pg_views where viewname = 'v';
 schemaname | viewname | viewowner |  definition
------------+----------+-----------+----------------
 public     | v        | stark     |  SELECT 1     +
            |          |           |   WHERE false;
(1 row)

-- greg
Re: [HACKERS] Add some regression tests for SEQUENCE
Hi, Have provided an updated patch as per Fabien's recent response on the Commitfest site. Any and all feedback is appreciated. -- Robins Tharakan regress_sequence_v4.patch Description: Binary data
Re: [HACKERS] about index inheritance
On Mon, May 6, 2013 at 9:30 AM, Vincenzo Melandri wrote: > Hi guys, > > My first post here :) > I stumbled into the same problem as this guy > http://www.postgresql.org/message-id/4be2835a.5020...@cybertec.at > , so since I have some spare time recently, I've set up the development > environment for postgresql and I think I may be able to contribute to the > feature of index inheritance, which is currently unsupported but listed in > TODOs. > > I've spent some time reading the docs and I took a look at the code. Is > anybody out there working on this already? I don't want to overlap someone > else's effort, plus I'll gladly take any advice or join the community efforts > if any, 'cause this feature seems pretty huge to me at first glance.. This is a really hard problem. If you pick this as your first project hacking on PostgreSQL, you will almost certainly fail. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
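[Editor's note: for context, a minimal illustration of the limitation being discussed; the table names are mine:]

create table parent (id int primary key);
create table child () inherits (parent);
-- the parent's primary-key index is not inherited: it indexes and
-- enforces uniqueness only for rows physically stored in parent, so a
-- duplicate id can be inserted into child, and scans covering the whole
-- inheritance hierarchy cannot use it for child's rows.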
[HACKERS] local_preload_libraries logspam
It seems like an oversight to me that local_preload_libraries causes a new log message to appear each time a new connection is established. Is there any sympathy for the view that we should have a way of turning this off, or simply not log such messages? We could still have it appear at DEBUG2 level, as sometimes happens in the EXEC_BACKEND case (granted, this is just so that there is behavior equivalent to the !EXEC_BACKEND case for shared_preload_libraries). -- Peter Geoghegan
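[Editor's note: for readers unfamiliar with the setting, a hedged sketch of how it is typically applied; the module name is illustrative, and in this era the library must live under $libdir/plugins to be loadable this way:]

-- in postgresql.conf, or passed per connection at startup, e.g.
--   PGOPTIONS='-c local_preload_libraries=my_module' psql
-- local_preload_libraries = 'my_module'
-- every new connection then loads the module and, as discussed here,
-- logs a message about having done so.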
Re: [HACKERS] issues with dropped columns in plpgsql code again
On 7 May 2013 21:23, Pavel Stehule wrote:
> sorry
>
> my test
>
> create table f1(a int, b int, c varchar, dropped_column numeric, d varchar);
>
> create or replace function f1_trg()
> returns trigger as $$
> declare _f1_var f1;
> begin raise notice 'run trigger';
> _f1_var := new;
> return _f1_var;
> end;
> $$ language plpgsql;
>
> create trigger xx before insert on f1 for row execute procedure f1_trg();
>
> insert into f1 values(1,1,'aaa',1.1,'aaa');
> alter table f1 drop column dropped_column;
>
> insert into f1 values(1,1,'aaa','aaa');

Fails for me as well. I managed to run the last query only after restarting the session or disabling the trigger. Checked on PostgreSQL 9.2.4 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2, 64-bit. regards, Szymon
Re: [HACKERS] XLogFlush invoked about twice as much after 9.2 group commit enhancement
On Tue, May 7, 2013 at 2:20 AM, Amit Langote wrote: > Hello, > > I have been trying to understand how the group commit implementation works > the way it does after the 9.2 group commit enhancement patch > (9b38d46d9f5517dab67dda1dd0459683fc9cda9f on REL9_2_STABLE). I admit > it's a pretty old commit, though I seek some clarification as to how it > provides the performance gain that it does. Also, I have observed some > behavior in this regard that I could not understand. > > Profiling results show that XLogFlush is called about twice as much > after this patch while the XLogWrite count remains about the same as > before. Are you sure you properly cleared out the stats between profiling sessions? Also, XLogFlush gets called by background processes like autovac, checkpointer and bgwriter, in addition to being called by committing processes. If one profiled session contained a checkpoint and the other did not, or one just idled a lot longer between when the benchmark finished and when you shut down the server, perhaps that explains it. Anyway, I don't see this behavior change when turning on wal_debug and looking in the logfiles for 'xlog flush request' messages. > I used "pgbench -c 32 -t 1000 pgbench" in both cases with the TPS result > (after applying the patch) not being significantly different (as in > not twice as much on my system). 1000 is a very small number of transactions to run a benchmark for. What was the duration? What were the actual TPS numbers? Does your hardware honor fsyncs? Cheers, Jeff
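[Editor's note: for anyone wanting to repeat Jeff's check, a hedged sketch; wal_debug is a developer option that exists only when the server is compiled with WAL_DEBUG defined, so this may not be available on a stock build, and setting it requires superuser:]

-- in a session on a WAL_DEBUG-enabled build:
set wal_debug = on;
-- run the workload, then count flush requests in the server log,
-- e.g. with: grep -c 'xlog flush request' postmaster.log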
Re: [HACKERS] issues with dropped columns in plpgsql code again
sorry

my test

create table f1(a int, b int, c varchar, dropped_column numeric, d varchar);

create or replace function f1_trg()
returns trigger as $$
declare _f1_var f1;
begin
  raise notice 'run trigger';
  _f1_var := new;
  return _f1_var;
end;
$$ language plpgsql;

create trigger xx before insert on f1 for row execute procedure f1_trg();

insert into f1 values(1,1,'aaa',1.1,'aaa');
alter table f1 drop column dropped_column;

insert into f1 values(1,1,'aaa','aaa');

2013/5/7 Pavel Stehule
> Hello
>
> One user reported an issue with triggers related to dropped columns.
>
> When I checked his code, I found more possible problems.
>
> He uses a trigger of the form:
>
> $$
> DECLARE somevar targettable;
> BEGIN
>   somevar := NEW;
>   -- do something with somevar
>   RETURN somevar;
> END;
> $$
>
> When I dropped a column (a numeric column) I had to end the session,
> because I got an error in the assignment:
>
> LOCATION: exec_stmt_raise, pl_exec.c:2985
> ERROR: 22P02: invalid input syntax for type numeric: "aaa"
> CONTEXT: PL/pgSQL function f1_trg() line 4 at assignment
> LOCATION: set_var_from_str, numeric.c:3253
>
> Regards
>
> Pavel
[HACKERS] issues with dropped columns in plpgsql code again
Hello

One user reported an issue with triggers related to dropped columns. When I checked his code, I found more possible problems. He uses a trigger of the form:

$$
DECLARE somevar targettable;
BEGIN
  somevar := NEW;
  -- do something with somevar
  RETURN somevar;
END;
$$

When I dropped a column (a numeric column) I had to end the session, because I got an error in the assignment:

LOCATION: exec_stmt_raise, pl_exec.c:2985
ERROR: 22P02: invalid input syntax for type numeric: "aaa"
CONTEXT: PL/pgSQL function f1_trg() line 4 at assignment
LOCATION: set_var_from_str, numeric.c:3253

Regards

Pavel
Re: [HACKERS] pg_dump --snapshot
* Andres Freund (and...@2ndquadrant.com) wrote: > > It helps in that once we have the lock, things aren't changing under us. > > The closer we can keep that to when the transaction starts, the better.. > > If you look at my example the timing where we take the snapshot isn't > the problem. While we wait for a lock on one object the not-yet-locked > objects can still change, get dropped et al. That window is so big that > the timing around the snapshot acquisition and trying to get the first > lock is insignificant. I wasn't talking about just getting the first lock but rather all the locks.. I agree that we can get stuck behind something else which is already holding a lock and that's certainly a problem. > Remember this is only a problem *if* there is > concurrent DDL. And in that case we very, very likely will have access > exclusive locks and thus will wait for them and have the described > problem. Yes, I understand that issue. > I don't think that's entirely possible. Think e.g. of constraints, > determination of HOTability, ... All those need to look at the database > state as it's valid now, not as it was valid back when our snapshot > started. There will certainly still need to be parts of the system which are using SnapshotNow, yes. Catalog MVCC is a rather massive discussion which isn't going to be solved here. > In the logical decoding patches (submitted previously, will get > resubmitted when 9.3 is getting somewhere), we export a snapshot once we > have read enough WAL that all changes from that point henceforth can be > streamed out. That snapshot is obviously very useful to get a new node > up and running. At that point, you take a snapshot and want to export it for pg_dump to use to get the same view of the DB and then dump/restore it into a new database where you'll then start applying changes from the logical replication..? Interesting, but certainly also quite a specialized use-case, no? How do you plan to handle the locking concerns which have been raised during this discussion? Do you take locks out before creating the snapshot to pass to pg_dump? Thanks, Stephen
Re: [HACKERS] \watch stuck on execution of commands returning no tuples
On Tue, May 7, 2013 at 2:36 PM, Ian Lawrence Barwick wrote: > As a British native speaker involved in translating some PostgreSQL-related > Japanese text, all I can say is "yes please". (Although for true Japanese > support, the grammar would have to be pretty much reversed, with the verb > being placed last; and WHERE would come before SELECT, which might > challenge the parser a little). I am personally of the opinion that whoever designed SQL was far too concerned about making it look like English and insufficiently concerned about making it pleasant to use. Since the target list comes before the FROM clause, you see (or must write) what you want to select from which table aliases before defining what those table aliases mean. Overall, you end up with an organization where you define the aliases in the middle, and then half the stuff that uses those definitions is at the beginning (in the target list) while the other half is at the end (in the WHERE clause). Strange! But, it's a standard, so, we live with it. And, none of the query languages I've designed have gained quite as much traction as SQL, so who am I to complain? >> (I don't know whether I'm joking, so don't ask.) > > I think I am joking. Keep me posted as things develop. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] \watch stuck on execution of commands returning no tuples
2013/5/8 Robert Haas : > On Thu, May 2, 2013 at 7:53 PM, Bruce Momjian wrote: >> OK, what other database supports British spelling of commands? Can we >> call this a compatibility feature. ;-) The feature has been there >> since 2000: >> >> commit ebe0b236909732c75d665c73363bd4ac7a7aa138 >> Author: Bruce Momjian >> Date: Wed Nov 8 21:28:06 2000 + >> >> Add ANALYSE spelling of ANALYZE for vacuum. > > Maybe we should also support ANALIZAR, ANALYSER, and 分析する. As a British native speaker involved in translating some PostgreSQL-related Japanese text, all I can say is "yes please". (Although for true Japanese support, the grammar would have to be pretty much reversed, with the verb being placed last; and WHERE would come before SELECT, which might challenge the parser a little). > (I don't know whether I'm joking, so don't ask.) I think I am joking. Regards Ian Barwick
Re: [HACKERS] Assertion failure at standby promotion
On 06.05.2013 13:08, Heikki Linnakangas wrote: On 03.05.2013 18:17, Fujii Masao wrote: Hi, I got the following assertion failure when I promoted the standby.

2013-05-04 00:12:31 JST sby1 LOG: received promote request
2013-05-04 00:12:31 JST sby1 FATAL: terminating walreceiver process due to administrator command
2013-05-04 00:12:31 JST sby1 LOG: redo done at 0/6FFE038
2013-05-04 00:12:31 JST sby1 LOG: last completed transaction was at log time 2013-05-04 00:12:25.878909+09
2013-05-04 00:12:31 JST sby1 LOG: selected new timeline ID: 2
2013-05-04 00:12:31 JST sby1 LOG: archive recovery complete
2013-05-04 00:12:31 JST sby1 LOG: checkpoint starting:
TRAP: FailedAssertion("!(sentPtr <= sendTimeLineValidUpto)", File: "walsender.c", Line: 1465)
2013-05-04 00:12:31 JST sby1 LOG: autovacuum launcher started

The way to reproduce this is:
1. Create one master A, one standby B, and one cascade standby C.
2. Run pgbench -i -s 10
3. Promote the standby B before pgbench -i finishes

I was able to reproduce this. The assertion checks that if the system is promoted at WAL location X, we must not have already sent WAL at > X to the client. As the code stands, that assumption is wrong; the walsender will merrily stream WAL that hasn't been replayed yet, and the system can be promoted before replaying all the WAL that has been streamed to a cascading standby. The comment in GetStandbyFlushRecPtr(), which is the function that determines how far the WAL may be streamed to a cascading standby, says this:

/*
 * We can safely send what's already been replayed. Also, if walreceiver
 * is streaming WAL from the same timeline, we can send anything that
 * it has streamed, but hasn't been replayed yet.
 */

There seem to be two bugs here: 1. This used to work in 9.2, because the startup process would always replay all the WAL present in pg_xlog before promoting (the WAL present in pg_xlog was streamed from master). But the refactorings in xlog.c in 9.3devel broke that, so that the startup process can promote earlier. 2. Even after fixing the logic in xlog.c, there is still a corner case where the startup process can promote before all the WAL that has been received from walreceiver has been replayed. That happens if the WAL streaming is terminated at a page boundary, rather than at a record boundary. For example, we might have received WAL up to the page boundary at 0/5BFA000, but the last *complete* record that we have received ends at 0/5BF9BD8. To fix the second issue, I think two things need to happen. First, we need to suppress the check in walsender. Second, we need to teach the WAL replay to back off when that happens. At the moment, the replay in the cascading standby gets stuck, trying to fetch the next page containing the rest of the partial WAL record. Instead, it should throw away the partial record it has, and resync at the end of the last replayed record. I came up with the attached patch for this (fix-standby-promotion-assert-fail-2.patch). You will also need to apply fast-promotion-quick-fix.patch to work around the bug in fast promotion I reported here: http://www.postgresql.org/message-id/5188cffa.3020...@vmware.com. pg_receivexlog has a variant of the same bug. If the server has sent WAL up to a segment boundary, and the segment boundary splits a WAL record in half, and the server is then promoted so that the promotion checkpoint (or end-of-recovery) record goes to the previous segment, pg_receivexlog will not fetch the segment containing the checkpoint record correctly.
When following a timeline switch, it should restart streaming from the new timeline at the point where the timeline switch occurred, rather than at the point on the old timeline where it stopped. Usually it's the same thing, but not if the streaming was paused at a page or segment boundary. To fix this for pg_receivexlog, I added the start position of the next timeline to the result set that the server sends at the end of WAL streaming. Previously, it only sent the TLI of the next timeline, and it was assumed that the starting point is the same as the point where streaming was stopped. It's a bit late to change it, but I have to say I don't like this whole business of relaying WAL to a cascading standby that hasn't been replayed. It would be a lot simpler if you could assume that whatever gets streamed downstream has already been replayed upstream. I understand that it's nice from a performance point of view, and because a cascading standby isn't then lagged behind if the replay gets stalled in the upstream server because of a hot standby conflict. But still.. I'm going to sleep over this and continue testing tomorrow. Please have a look. I think 9.2 has the same bug, BTW. Without support for timeline switches over streaming replication, it was just more difficult to hit. Looking closer, I believe 9.2 is OK. Recovery loops back to retry the whole record correctly; it was just the timeline switch over streaming replication.
Re: [HACKERS] \watch stuck on execution of commands returning no tuples
On Thu, May 2, 2013 at 7:53 PM, Bruce Momjian wrote:
> OK, what other database supports British spelling of commands? Can we call this a compatibility feature. ;-) The feature has been there since 2000:
>
> commit ebe0b236909732c75d665c73363bd4ac7a7aa138
> Author: Bruce Momjian
> Date: Wed Nov 8 21:28:06 2000 +
>
> Add ANALYSE spelling of ANALYZE for vacuum.

Maybe we should also support ANALIZAR, ANALYSER, and 分析する.

(I don't know whether I'm joking, so don't ask.)

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
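[Editor's note: for what it's worth, the British spelling really does parse today; a quick check:]

VACUUM ANALYZE;   -- American spelling
VACUUM ANALYSE;   -- British spelling, accepted since the commit quoted above
ANALYSE;          -- works standalone as well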
Re: [HACKERS] pg_dump --snapshot
On 2013-05-06 13:07:17 -0400, Tom Lane wrote: > I'm afraid that this is institutionalizing a design deficiency in > pg_dump; namely that it takes its snapshot before acquiring locks. I have suggested this before, but if pg_dump would use SELECT FOR SHARE in the queries it uses to build DDL it would detect most if not all modifications for most database objects, including tables. Sure, it would error out, but that's far better than a silently corrupt dump:

S1: =# CREATE TABLE testdump();
S2: =# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
S2: =# SELECT count(*) FROM pg_class; -- acquire snapshot
S1: =# ALTER TABLE testdump ADD COLUMN a text;
S2: =# SELECT * FROM pg_class cls
    -#   JOIN pg_attribute att ON (cls.oid = att.attrelid)
    -#   WHERE cls.oid = 'testdump'::regclass FOR UPDATE;
ERROR: could not serialize access due to concurrent update

The serialization failure could be caught and translated into some error message explaining that concurrent DDL prevented pg_dump from working correctly. I don't immediately see a case where that would prevent valid backups from working. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Fix permission tests for views/tables proven empty by constraint
Robert Haas writes: > On Wed, May 1, 2013 at 6:27 PM, Tom Lane wrote: >> Fix permission tests for views/tables proven empty by constraint exclusion. > I believe that this commit is responsible for the fact that the > following test case now crashes the server: > rhaas=# create or replace view foo (x) AS (select 1 union all select 2); > CREATE VIEW > rhaas=# select * from foo where false; > The connection to the server was lost. Attempting reset: Failed. OK, after looking a bit closer, this is actually a situation I had been wondering about last week, but I failed to construct a test case proving there was a problem. The issue is that the "append rel" code path also needs changes to handle the fact that subqueries that are appendrel members might get proven empty by constraint exclusion. (Even aside from the crash introduced here, the excluded subqueries fail to contribute to the final rangetable for permissions checks.) Unfortunately this blows a bit of a hole in the minimal fix I committed last week; there's no nice way to include these dummy subqueries into the plan tree that's passed on to setrefs.c. Ideally we'd add an out-of-band transmission path for them, and I think I will fix it that way in HEAD, but that requires adding a new List field to PlannerInfo. I'm afraid to do that in 9.2 for fear of breaking existing planner plugin modules. One way to fix it in 9.2 is to teach setrefs.c to thumb through the append_rel_list looking for excluded child subqueries to pull up their rangetables. Icky, but it shouldn't cost very many cycles in typical queries. The main downside of this solution is that it would add at least several dozen lines of new-and-poorly-tested code to a back branch, where it would get no additional testing to speak of before being loosed on users. The other way I can think of to handle it without PlannerInfo changes is to lobotomize set_append_rel_pathlist so that it doesn't omit dummy children from the AppendPath it constructs. The main downside of this is that fully dummy appendrels (those with no live children at all, such as in your example) wouldn't be recognized by IS_DUMMY_PATH, so the quality of planning in the outer query would be slightly degraded. But such cases are probably sufficiently unusual that this might be an okay price to pay for a back branch. Thoughts? regards, tom lane
Re: [HACKERS] corrupt pages detected by enabling checksums
On Tue, 2013-05-07 at 13:20 -0400, Robert Haas wrote: > Hmm. Rereading your last email, I see your point: since we now have > HEAP_XLOG_VISIBLE, this is much less of an issue than it would have > been before. I'm still not convinced that simplifying that code is a > good idea, but maybe it doesn't really hurt us much in practice. Given that there's not a big impact one way or another, I don't mind whether this particular patch is committed or not. Whichever you think is more understandable and safer at this late hour. Also, to be clear, the fact that I posted a patch was not meant to advocate either way; merely to present the options. Not sure exactly what you mean about the code simplification, but I agree that anything more substantial than this patch should be left for 9.4. Regards, Jeff Davis
Re: [HACKERS] pg_dump --snapshot
On Tue, May 7, 2013 at 10:02 AM, Dimitri Fontaine wrote: > Rather than take some locks, you can now prevent the database objects > from changing with an event trigger. pg_dump could install that event > trigger in a preparing transaction, then do its work as currently, then > when done either remove or disable the event trigger. > > All the event trigger has to do is unconditionnaly raise an exception > with a message explaining that no DDL command is accepted during when a > dump is in progress. > I'm thinking of a case where a hot standby is executing a pg_dump and DDL is issued on the master -- would that cause any unexpected problems on the hot standby?
Re: [HACKERS] corrupt pages detected by enabling checksums
On Mon, May 6, 2013 at 5:04 PM, Jeff Davis wrote: > On Mon, 2013-05-06 at 15:31 -0400, Robert Haas wrote: >> On Wed, May 1, 2013 at 3:04 PM, Jeff Davis wrote: >> > Regardless, you have a reasonable claim that my patch had effects that >> > were not necessary. I have attached a draft patch to remedy that. Only >> > rudimentary testing was done. >> >> This looks reasonable to me. > > Can you please explain the scenario that loses many VM bits at once > during a crash, and results in a bunch of already-all-visible heap pages > being dirtied for no reason? Hmm. Rereading your last email, I see your point: since we now have HEAP_XLOG_VISIBLE, this is much less of an issue than it would have been before. I'm still not convinced that simplifying that code is a good idea, but maybe it doesn't really hurt us much in practice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_dump --snapshot
On 2013-05-07 16:50:52 +0100, Greg Stark wrote: > What's the worst case for using an old snapshot? If I try to access a > table that doesn't exist any longer I'll get an error. That doesn't > really seem that bad for the use case I described. It's worse for the > full table dump but for an explicit list of tables, eh. Seems ok to > me. It's worse than that: you can get a dump that dumps successfully but doesn't restore: http://archives.postgresql.org/message-id/20130507141526.GA6117%40awork2.anarazel.de But that's not really related to snapshots. And not related to the patch. Imo the whole focus on the time between taking the snapshot and taking the locks is misguided and not really the problem. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pg_dump --snapshot
On 2013-05-07 11:01:48 -0400, Stephen Frost wrote: > * Andres Freund (and...@2ndquadrant.com) wrote: > > On 2013-05-07 08:54:54 -0400, Stephen Frost wrote: > > > Agreed- but it also isn't currently possible to make it any smaller. > > > > Uh. Why not? I think this is what needs to be fixed instead of making > > the hole marginally smaller elsewhere. > > If we're able to fix it- how would we allow users to take advantage of > that fix when starting their own snapshot and feeding it to pg_dump? Depends on the fix. I think the most realistic one is making sure central definitions haven't changed. In that case they wouldn't have to do anything. If we get some DDL lock or something they would probably need to do something like SELECT pg_prepare_database_dump(); or something - likely the same pg_dump would use. > > You can trivially reproduce the > > problem with pg_dump today: > > Agreed. The idea was to simply avoid making it worse. Your argument > seems to be that it's already horrible and easily broken and therefore > we can go ahead and make it worse and no one will complain because no > one has complained about how bad it is already. I don't follow that. It doesn't change *anything* with the fundamental problems. We could also say we forbid ALTER TABLE taking exclusive locks because that makes the problem far more noticeable. > > > My suggestion was to lock everything that pg_dump locks, which we > > > clearly have locks for since pg_dump is acquiring them. Also, I don't > > > believe it'd be that difficult to identify what pg_dump would lock, at > > > least in a 'default' whole-database run. This is more of a stop-gap > > > than a complete solution. > > > > The problem is that locking - as shown above - doesn't really help all > > that much. > > It helps in that once we have the lock, things aren't changing under us. > The closer we can keep that to when the transaction starts, the better.. If you look at my example the timing where we take the snapshot isn't the problem. While we wait for a lock on one object the not-yet-locked objects can still change, get dropped et al. That window is so big that the timing around the snapshot acquisition and trying to get the first lock is insignificant. Remember this is only a problem *if* there is concurrent DDL. And in that case we very, very likely will have access exclusive locks and thus will wait for them and have the described problem. > > You would have to do it like: > > 1) start txn > > 2) acquire DDL prevention lock > > 3) assert we do not yet have a snapshot > > 4) acquire snapshot > > 5) lock objects > > 6) release DDL lock > > 7) dump objects/data > > 8) commit txn > > We'd need a '4.5' to get the list of objects, no? Yea, I had folded that into lock objects. > > pg_dump doesn't prevent you from running CREATE TEMPORARY TABLE? That > > would make it unrunnable in many situations. Especially as we cannot > > easily (without using several connections at once) release locks before > > ending a transaction. > > I agree that we wouldn't want pg_dump preventing all catalog updates, > but wouldn't we be able to draw a distinction between objects being > modified and objects being added? I don't easily see how. Often enough object creation modifies rows at some point. So we would need intelligence at some higher level. I guess it might be possible to reuse the event trigger infrastructure if it could do all that already... > > > This will be an interesting thing to consider when > > > implementing MVCC for the catalog.
> > > I think using proper mvcc snapshot for catalog scans doesn't, cannot even > > in all cases, imply having to use the user's transaction's snapshot, just > > one that guarantees a consistent result while a query is running. > > The hope would be to see a consistent view of what you can access while > the transaction is running.. I don't think that's entirely possible. Think e.g. of constraints, determination of HOTability, ... All those need to look at the database state as it's valid now, not as it was valid back when our snapshot started. > > c) Quite possibly the snapshot we need needs to meet some special > > criteria that pg_dump cannot guarantee. > > That's a much more difficult case, of course. It would help to have a > better understanding of what, exactly, Simon's use-case for this feature > is to answer if this is an issue or not. I am sure it's an issue for at least one of Simon's use cases, because one of them is also mine ;). But I think there are many use cases that require this. In the logical decoding patches (submitted previously, will get resubmitted when 9.3 is getting somewhere), we export a snapshot once we have read enough WAL that all changes from that point henceforth can be streamed out. That snapshot is obviously very useful to get a new node up and running. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support
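[Editor's note: for readers following along, a sketch of the snapshot-export machinery this thread builds on; pg_export_snapshot() has existed since 9.2, while the pg_dump --snapshot flag is the patch being discussed, and the snapshot id shown is made up:]

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();   -- returns e.g. '000003F1-1'
-- the exporting transaction must stay open; in another shell one would
-- then run the proposed invocation:
--   pg_dump --snapshot=000003F1-1 mydb > dump.sql
COMMIT;  -- only once the importer/dump is done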
Re: [HACKERS] pg_dump --snapshot
Robert Haas writes: > On Mon, May 6, 2013 at 1:07 PM, Tom Lane wrote: >> I'm afraid that this is institutionalizing a design deficiency in >> pg_dump; namely that it takes its snapshot before acquiring locks. >> Ideally that would happen the other way around. I don't have a good >> idea how we could fix that --- but a feature that allows imposition >> of an outside snapshot will permanently foreclose ever fixing it. I don't even understand how you could take locks for a later snapshot; it seems to me you're talking about something much harder to design and code than making the catalogs fully MVCC compliant. >> What's more, this would greatly widen the risk window between when >> the snapshot is taken and when we have all the locks and can have >> some confidence that the DB isn't changing under us. Rather than take some locks, you can now prevent the database objects from changing with an event trigger. pg_dump could install that event trigger in a preparing transaction, then do its work as currently, then when done either remove or disable the event trigger. All the event trigger has to do is unconditionally raise an exception with a message explaining that no DDL command is accepted while a dump is in progress. > Now, I'll grant you that this technique couldn't be used together with > the proposed --snapshot option, but so what? Many people do DDL > infrequently enough that this is not a problem in practice. But even > if it is a problem, I don't see why that can't simply be a documented > limitation of --snapshot. IME it's a problem in practice, because people tend to want to run their pg_dump and their rollouts within the same maintenance window. I mean, those lucky guys out there having such a thing as a maintenance window. Again, it seems to me that the proper long term solution is both fully MVCC catalogs and reduced locking for most commands. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
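[Editor's note: a minimal sketch of the event trigger Dimitri describes, assuming the 9.3 event trigger facility; the names are illustrative:]

CREATE FUNCTION forbid_ddl_during_dump() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
  RAISE EXCEPTION 'no DDL command is accepted while a dump is in progress';
END;
$$;

CREATE EVENT TRIGGER no_ddl_during_dump
  ON ddl_command_start
  EXECUTE PROCEDURE forbid_ddl_during_dump();

-- ... run pg_dump here ...

DROP EVENT TRIGGER no_ddl_during_dump;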
Re: [HACKERS] Recovery target 'immediate'
On 7 May 2013 13:50, Heikki Linnakangas wrote: >> Can I suggest that we discuss a range of related changes together? So >> we have a roadmap of agreed changes in this area. That will be more >> efficient than discussing each one individually; often each one makes >> sense only as part of the wider context. > > Sure, do you have something else in mind related to this? Not right this second. But I feel it would be better to consider things in a more top-down "what do we need in this area?" approach than the almost random mechanisms we use now. Given each of us seems to be equally surprised by what others are thinking, it would make sense to have a broader topic-level discussion, make a list of the thoughts and priorities in each area and discuss things as a whole. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Recovery target 'immediate'
On Tue, May 7, 2013 at 9:38 PM, Simon Riggs wrote: > On 3 May 2013 14:40, Heikki Linnakangas wrote: >> On 03.05.2013 16:29, Bruce Momjian wrote: >>> >>> On Fri, May 3, 2013 at 01:02:08PM +0200, Cédric Villemain wrote: This changes the existing API which will confuse people that know it and invalidate everything written in software and on wikis as to how to do it. That means all the "in case of fire break glass" instructions are all wrong and need to be rewritten and retested. >>> >>> >>> Yes, *that* is the main reason *not* to make the change. It has a >>> pretty bad cost in backwards compatibility loss. There is a gain, but >>> I don't think it outweighs the cost. >> >> >> So, is there a way to add this feature without breaking the API? > > > Yes, by adding a new parameter exclusively used to control this feature, > something like recovery_target_immediate = 'on/off'. We just need to add a named restore point when ending the backup (in pg_stop_backup() ?). No API change required. Just document that some predefined target names are set during backup. >>> >>> >>> So we auto-add a restore point based on the backup label. Does that >>> work for everyone? >> >> >> Unfortunately, no. There are cases where you want to stop right after >> reaching consistency, but the point where you reach consistency is not at >> the end of a backup. For example, if you take a backup using an atomic >> filesystem snapshot, there are no pg_start/stop_backup calls, and the system >> will reach consistency after replaying all the WAL in pg_xlog. You might >> think that you can just not create a recovery.conf file in that case, or >> create a dummy recovery.conf file with restore_command='/bin/false'. >> However, then the system will not find the existing timeline history files >> in the archive, and can pick a TLI that's already in use. I found this out >> the hard way, and actually ended up writing a restore_command that restores >> timeline history files normally, but returns non-zero for any real other >> files; it wasn't pretty. >> >> Another case is that you take a backup from a standby server; you can't >> write a restore-point WAL record in a standby. >> >> If we want to avoid adding a new option for this, how about a magic restore >> point called "consistent" or "immediate": >> >> recovery_target_name='immediate' >> >> That would stop recovery right after reaching consistency, but there >> wouldn't be an actual restore point record in the WAL stream. > > recovery_target_name='something' > > ...works for me. Either consistent or immediate works. > > I request that the docs recommend this be used in conjunction with > pause_at_recovery_target = on, so that the user can begin inspecting > the database at the first available point and then roll forward from > that point if desired. And, we should forbid users from setting recovery_target_inclusive to false when recovery_target_name is set to something like 'immediate'? Because in this case, recovery would always end before reaching the consistent state and fail. Regards, -- Fujii Masao
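[Editor's note: to make the two alternatives in this thread concrete, a hedged sketch; pg_create_restore_point() is the existing mechanism (9.1+), while the 'immediate' magic name is only a proposal here, and the label is illustrative:]

-- existing approach: an explicit named restore point on the primary
SELECT pg_create_restore_point('end_of_backup');
-- then in recovery.conf:
--   recovery_target_name = 'end_of_backup'
--
-- proposed approach, with no WAL record required:
--   recovery_target_name = 'immediate'
--   pause_at_recovery_target = on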
Re: [HACKERS] pg_dump --snapshot
On Mon, May 6, 2013 at 1:07 PM, Tom Lane wrote: >> Exported snapshots allow you to coordinate a number of actions >> together, so they all see a common view of the database. So this patch >> allows a very general approach to this, much more so than pg_dump >> allows currently since the exact timing of the snapshot is not >> controlled by the user. > I'm afraid that this is institutionalizing a design deficiency in > pg_dump; namely that it takes its snapshot before acquiring locks. > Ideally that would happen the other way around. I don't have a good > idea how we could fix that --- but a feature that allows imposition > of an outside snapshot will permanently foreclose ever fixing it. > > What's more, this would greatly widen the risk window between when > the snapshot is taken and when we have all the locks and can have > some confidence that the DB isn't changing under us. I don't find this argument very convincing. The way you could fix the ordering problem is:

1. take locks on all the objects you think you need to dump
2. update your snapshot and check whether the list of objects you think you need to dump has changed
3. if yes, go to step 1, else done

Now, I'll grant you that this technique couldn't be used together with the proposed --snapshot option, but so what? Many people do DDL infrequently enough that this is not a problem in practice. But even if it is a problem, I don't see why that can't simply be a documented limitation of --snapshot. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_dump --snapshot
Greg Stark writes: > [ ideas about dumping some past state of a table ] > If I try to access a table whose schema has changed then I might use > the wrong tupledesc and see rows that don't decode properly. That > would be a disaster. Can we protect against that by noticing that the > pg_class row isn't visible to our snapshot and throw an error? Would > that be sufficient to protect against all schema changes? Would it > cause massive false positives based on whether vacuum had happened to > have run recently? No, no, and I'm not sure :-(. It might be sufficient to notice whether the pg_class row and all relevant pg_attribute rows are visible in your snapshot, at least for the narrow purpose of deciding whether you can dump data. (This would probably not, for instance, be enough to give you reliable info about check or foreign key constraints.) In general though, any such facility would surely block vacuuming on the table, indeed probably *all* tables in the database, which would be pretty disastrous in the long run. I think a better answer for people who need such a facility is to keep a WAL archive and use PITR when they actually need to get back yesterday's data. regards, tom lane
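[Editor's note: a minimal sketch of the archive-plus-PITR setup Tom recommends, with illustrative paths, label, and target time:]

-- postgresql.conf on the primary (shown here as comments):
--   wal_level = archive
--   archive_mode = on
--   archive_command = 'cp %p /archive/%f'
-- take a base backup:
SELECT pg_start_backup('nightly');
--   ... copy the data directory aside ...
SELECT pg_stop_backup();
-- to get back yesterday's data, restore the base backup and create a
-- recovery.conf containing:
--   restore_command = 'cp /archive/%f %p'
--   recovery_target_time = '2013-05-06 12:00:00'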
Re: [HACKERS] XLogFlush invoked about twice as many times after 9.2 group commit enhancement
On Tue, May 7, 2013 at 8:13 AM, Amit Langote wrote: > Profiling results show that XLogFlush() is called about twice as many > times after this patch while for XLogWrite() count remains about same > as before. This patch modifies XLogFlush() such that it offers the > said performance gain by alleviating the lock contention on > WALWriteLock using the new LWLockAcquireOrWait(). I do not however > understand why XLogFlush is invoked twice as many times (as seen from > the profiling results) as an effect of this patch. Why should this > patch make XLogFlush() being invoked twice as many times? Why is that surprising? Most of those XLogFlush() calls will recheck the flushed-up-to point, and realize that another backend assumed the role of group commit leader, and flushed their WAL for them, so aside from the wait, the call to XLogFlush is cheap for that individual backend. It's being invoked twice as many times because backends *can* invoke it twice as many times. For the record, the new group commit code did more than just alleviate lock contention. If it was true that amelioration of lock contention fully explained the improvements, then surely sleeping on an exclusive lock on WALWriteLock within XLogFlush would always hurt throughput, when in fact it can considerably help throughput, as demonstrated by 9.3's commit_delay implementation. -- Peter Geoghegan
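[Editor's note: as a concrete reference for the knobs Peter alludes to, a hedged example; the values are arbitrary, and in 9.3 changing commit_delay requires superuser privileges:]

SET commit_delay = 1000;   -- microseconds the group-commit leader sleeps
                           -- inside XLogFlush() before flushing
SET commit_siblings = 5;   -- only delay when at least this many other
                           -- transactions are concurrently active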
Re: [HACKERS] pg_dump --snapshot
Greg,

* Greg Stark (st...@mit.edu) wrote:
> One natural way to do it would be to make an option to pg_dump which
> caused it to do all the normal pre-dump things it would normally do,
> then export a snapshot and wait for the user. (Alternately it could
> even create a prepared transaction which iirc keeps the locks until
> it's committed). That gives users a way to get a snapshot that is
> guaranteed to work until that transaction is exited.

Right, that was one of the suggestions that I made up-thread a bit.

> But I don't think that would really make users happy. I think the
> usual use case for this feature would be to dump a single table or
> small number of tables as of some time in the past that they didn't
> plan in advance that they would need. They might have a cron job
> periodically export a snapshot "just in case" and then want to use it
> later.

The snapshot has to be 'alive' in order to be joined, so unless we're
changing something else, I don't think this cronjob approach would
actually work (unless it forks off and keeps the transaction open, but
I didn't read that from what you wrote..).

> They wouldn't be happy if they had to create a prepared
> transaction for each such snapshot which locked every table in their
> database until they decide they don't actually need it. That would
> mean they could never do any ddl.

If they don't lock the table then any DDL they do would break the
backup *anyway*. This is a pretty clear example of exactly the problem
with simply adding this option to pg_dump.. The view of pg_class would
be from when the snapshot was taken, but without locks, anything could
have changed between then and when the backup tries to run.

> The use case of wanting to dump a single table as of a few hours ago
> (e.g. before some application data loss bug) is pretty compelling. If
> we could do it, I think it would be worth quite a bit.

I certainly like the idea, but it seems rather beyond what Simon was
after with this patch, aiui, and it does come with some other concerns,
like dealing with what happens if the table was modified after the
snapshot was taken (possible because there wasn't a lock on it).

> What's the worst case for using an old snapshot? If I try to access a
> table that doesn't exist any longer I'll get an error. That doesn't
> really seem that bad for the use case I described. It's worse for the
> full table dump, but for an explicit list of tables, eh. Seems ok to
> me.

I'm not convinced that risk of error is the only issue with this..
What if a column was dropped and then a new one put in its place (with
the same name)?

> If I try to access a table whose schema has changed then I might use
> the wrong tupledesc and see rows that don't decode properly. That
> would be a disaster. Can we protect against that by noticing that the
> pg_class row isn't visible to our snapshot and throw an error? Would
> that be sufficient to protect against all schema changes? Would it
> cause massive false positives based on whether vacuum had happened to
> have run recently?

The way this is handled now is that we use syscache to get whatever the
current view of the object is when we're running queries against it.
The problem is that the 'current' view ends up being different from
what we see in pg_class / pg_attribute. Andres provided a good example
which illustrates this upthread.

	Thanks,

		Stephen

signature.asc
Description: Digital signature
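For concreteness, the export/import handshake being discussed (snapshot
ID invented; note the exporting transaction has to stay open the whole
time, which is exactly why a fire-and-forget cron job can't work):

    Session 1:
      BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
      SELECT pg_export_snapshot();   -- returns e.g. '000003A1-1'
      -- ... must stay open while anyone is importing ...

    Session 2:
      BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
      SET TRANSACTION SNAPSHOT '000003A1-1';
      -- this session now sees session 1's snapshot; the proposed
      -- pg_dump --snapshot='000003A1-1' would import it the same way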
Re: [HACKERS] pg_dump --snapshot
On Tue, May 7, 2013 at 1:54 PM, Stephen Frost wrote:
> I believe the point that Tom is making is that we shouldn't paint
> ourselves into a corner by letting users provide old snapshots to
> pg_dump which haven't acquired any of the necessary locks. The goal, at
> least as I read it, is to come up with a workable design (and I don't
> know that we have, but still) which provides a way for the locks to be
> taken at least as quickly as what pg_dump does today and which we could
> modify down the road to take the locks pre-snapshot (presuming we can
> figure out a way to make that work).

One natural way to do it would be to make an option to pg_dump which
caused it to do all the normal pre-dump things it would normally do,
then export a snapshot and wait for the user. (Alternately it could
even create a prepared transaction which iirc keeps the locks until
it's committed). That gives users a way to get a snapshot that is
guaranteed to work until that transaction is exited.

But I don't think that would really make users happy. I think the
usual use case for this feature would be to dump a single table or
small number of tables as of some time in the past that they didn't
plan in advance that they would need. They might have a cron job
periodically export a snapshot "just in case" and then want to use it
later. They wouldn't be happy if they had to create a prepared
transaction for each such snapshot which locked every table in their
database until they decide they don't actually need it. That would
mean they could never do any ddl.

The use case of wanting to dump a single table as of a few hours ago
(e.g. before some application data loss bug) is pretty compelling. If
we could do it, I think it would be worth quite a bit.

What's the worst case for using an old snapshot? If I try to access a
table that doesn't exist any longer I'll get an error. That doesn't
really seem that bad for the use case I described. It's worse for the
full table dump, but for an explicit list of tables, eh. Seems ok to
me.

If I try to access a table whose schema has changed then I might use
the wrong tupledesc and see rows that don't decode properly. That
would be a disaster. Can we protect against that by noticing that the
pg_class row isn't visible to our snapshot and throw an error? Would
that be sufficient to protect against all schema changes? Would it
cause massive false positives based on whether vacuum had happened to
have run recently?

-- greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] XLogFlush invoked about twice as many times after 9.2 group commit enhancement
Hello,

I have been trying to understand how the group commit implementation
works after the 9.2 group commit enhancement patch
(9b38d46d9f5517dab67dda1dd0459683fc9cda9f on REL9_2_STABLE). I have
observed some behavior in this regard that I could not understand.

Profiling results show that XLogFlush() is called about twice as many
times after this patch, while the XLogWrite() count remains about the
same as before. This patch modifies XLogFlush() such that it offers the
said performance gain by alleviating the lock contention on
WALWriteLock, using the new LWLockAcquireOrWait(). I do not, however,
understand why XLogFlush() is invoked twice as many times (as seen from
the profiling results) as an effect of this patch. Why should this
patch make XLogFlush() be invoked twice as many times?

I used "pgbench -c 32 -t 1000 pgbench" in both cases, with the TPS
result after applying the patch not being significantly different
(certainly not twice as much on my system). I used:

pgbench scale = 10
shared_buffers = 256MB
max_connections = 1000
checkpoint_segments = 15
synchronous_commit = on

Comments?

-- Amit Langote

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump --snapshot
* Andres Freund (and...@2ndquadrant.com) wrote:
> On 2013-05-07 08:54:54 -0400, Stephen Frost wrote:
> > Agreed- but it also isn't currently possible to make it any smaller.
>
> Uh. Why not? I think this is what needs to be fixed instead of making
> the hole marginally smaller elsewhere.

If we're able to fix it- how would we allow users to take advantage of
that fix when starting their own snapshot and feeding it to pg_dump?

> You can trivially reproduce the
> problem with pg_dump today:

Agreed. The idea was to simply avoid making it worse. Your argument
seems to be that it's already horrible and easily broken and therefore
we can go ahead and make it worse and no one will complain because no
one has complained about how bad it is already. I don't follow that.

> > My suggestion was to lock everything that pg_dump locks, which we
> > clearly have locks for since pg_dump is acquiring them. Also, I don't
> > believe it'd be that difficult to identify what pg_dump would lock, at
> > least in a 'default' whole-database run. This is more of a stop-gap
> > than a complete solution.
>
> The problem is that locking - as shown above - doesn't really help all
> that much.

It helps in that once we have the lock, things aren't changing under
us. The closer we can keep that to when the transaction starts, the
better..

> You would have to do it like:
> 1) start txn
> 2) acquire DDL prevention lock
> 3) assert we do not yet have a snapshot
> 4) acquire snapshot
> 5) lock objects
> 6) release DDL lock
> 7) dump objects/data
> 8) commit txn

We'd need a '4.5' to get the list of objects, no?

> Unfortunately most of these steps cannot easily/safely be exposed to
> sql. And again, this is a very old situation that doesn't really have
> anything to do with snapshot exporting.

Perhaps we can't expose the individual components, but perhaps we could
have a function which does some/all of the pieces (except the pieces
that must be done by pg_dump) and which users need to grant explicit
execute rights to for their "backup" user?

> pg_dump doesn't prevent you from running CREATE TEMPORARY TABLE? That
> would make it unrunnable in many situations. Especially as we cannot
> easily (without using several connections at once) release locks before
> ending a transaction.

I agree that we wouldn't want pg_dump preventing all catalog updates,
but wouldn't we be able to draw a distinction between objects being
modified and objects being added?

> > It's really a problem for just about everything that uses transactions
> > and locking, isn't it? pg_dump just happens to have it worst since it
> > wants to go and touch every object in the database. It's certainly
> > possible for people to connect to the DB, look at pg_class and then try
> > to access some object which no longer exists (even though it's in
> > pg_class).
>
> Well, normal sql shouldn't need to touch pg_class and will know
> beforehand which locks it will need. But I have to say I more than once
> wished we would throw an error if an object's definition is "newer" than
> the one we started out with.

I agree- that would be nice. I'd also contend that 'normal' SQL quite
often looks at pg_class; I know that we have tons of code which does.
We also basically lock all users out of the database when we're doing
DDL changes to avoid any issues, but it'd certainly be nice if we
didn't have to..

> > This will be an interesting thing to consider when
> > implementing MVCC for the catalog.
> I think using proper mvcc snapshots for catalog scans doesn't, cannot
> even in all cases, imply having to use the user's transaction's
> snapshot, just one that guarantees a consistent result while a query
> is running.

The hope would be to see a consistent view of what you can access while
the transaction is running..

> I am not sure if the correct fix is locking and not just making sure
> the definition of objects hasn't changed since the snapshot started.

I'm guessing that the intent of the locking is to make sure that the
objects don't change under us. :) The problem, as you've explained, is
that the object might change before we get our lock in place. That
window of opportunity gets a lot larger when it's moved outside of our
control.

> But if we go for locking, creating a function which makes sure that
> the source transaction has a certain strong lock wouldn't be that
> hard. We have all the data for it.

Agreed.

> a) exporting a snapshot to a file was discussed and deemed
> unacceptably risky. That's why pg_export_snapshot() exports it itself
> into some internal format somewhere. The user doesn't need to know
> where/how.

It wasn't my intent to export the 'snapshot' to a file but rather to
simply dump out what pg_dump gets when it runs pg_export_snapshot(),
allowing others to connect in and use that snapshot.

> b) When importing a snapshot the source transaction needs to be alive,
> otherwise we cannot guarantee the global xmin hasn't advanced too
> much.

That would
Re: [HACKERS] pg_dump --snapshot
Hi,

On 2013-05-07 08:54:54 -0400, Stephen Frost wrote:
> * Andres Freund (and...@2ndquadrant.com) wrote:
> > > All of which I think I agree with, but I don't agree with the
> > > conclusion that this larger window is somehow acceptable because
> > > there's a very small window (one which can't be made any smaller,
> > > today..) which exists today.
> >
> > The window isn't that small currently:
>
> Agreed- but it also isn't currently possible to make it any smaller.

Uh. Why not? I think this is what needs to be fixed instead of making
the hole marginally smaller elsewhere. You can trivially reproduce the
problem with pg_dump today:

S1: $ psql postgres
    =# CREATE DATABASE pgdumptest;
    =# CREATE DATABASE pgrestoretest;
    =# \c pgdumptest
    =# CREATE TABLE tbl(id serial primary key, data_a int8, data_b float8);
    =# INSERT INTO tbl(data_a, data_b) SELECT random()::int, random() FROM generate_series(1, 10);
    =# BEGIN;
    =# ALTER TABLE tbl RENAME COLUMN data_a TO data_swap;
    =# ALTER TABLE tbl RENAME COLUMN data_b TO data_a;
    =# ALTER TABLE tbl RENAME COLUMN data_swap TO data_b;

S2: $ pg_dump pgdumptest > /tmp/pgdump.sql

S1: =# COMMIT;

S2: $ psql pgrestoretest -f /tmp/pgdump.sql
    psql:/tmp/pgdump.sql:87: ERROR: invalid input syntax for integer: "0.944006722886115313"
    CONTEXT: COPY tbl, line 1, column data_a: "0.944006722886115313"

A ddl upgrade script taking some seconds isn't exactly anything
unusual...

> > > Alright, then let's provide a function which will do that and tell
> > > people to use it instead of just using pg_export_snapshot(), which
> > > clearly doesn't do that.
> >
> > If it were clear cut what to lock and we had locks for
> > everything. Maybe. But we don't have locks for everything.
>
> My suggestion was to lock everything that pg_dump locks, which we
> clearly have locks for since pg_dump is acquiring them. Also, I don't
> believe it'd be that difficult to identify what pg_dump would lock, at
> least in a 'default' whole-database run. This is more of a stop-gap
> than a complete solution.

The problem is that locking - as shown above - doesn't really help all
that much. You would have to do it like:

1) start txn
2) acquire DDL prevention lock
3) assert we do not yet have a snapshot
4) acquire snapshot
5) lock objects
6) release DDL lock
7) dump objects/data
8) commit txn

Unfortunately most of these steps cannot easily/safely be exposed to
sql. And again, this is a very old situation that doesn't really have
anything to do with snapshot exporting.

> > So we would need to take locks preventing any modification on any of
> > the system catalogs, which doesn't really seem like a good thing,
> > especially as we can't release them from sql during the dump, where
> > we can allow creation of temp tables and everything without problems.
>
> That's already an issue when pg_dump runs, no? Not sure why this is
> different.

pg_dump doesn't prevent you from running CREATE TEMPORARY TABLE? That
would make it unrunnable in many situations. Especially as we cannot
easily (without using several connections at once) release locks before
ending a transaction.

> > > I believe the main argument here is really around "you should think
> > > about these issues before just throwing this in" and not "it must be
> > > perfect before it goes in". Perhaps "it shouldn't make things *worse*
> > > than they are now" would also be apt..
> > That's not how I read 8465.1367860...@sss.pgh.pa.us :(
>
> I believe the point that Tom is making is that we shouldn't paint
> ourselves into a corner by letting users provide old snapshots to
> pg_dump which haven't acquired any of the necessary locks. The goal, at
> least as I read it, is to come up with a workable design (and I don't
> know that we have, but still) which provides a way for the locks to be
> taken at least as quickly as what pg_dump does today and which we could
> modify down the road to take the locks pre-snapshot (presuming we can
> figure out a way to make that work).
>
> The proposed patch certainly doesn't make any attempt to address that
> issue and would encourage users to open themselves up to this risk more
> than they are exposed today w/ pg_dump.

I fail to see a difference that is big enough to worry overly much
about. The above problem is easy enough to encounter without any
snapshot exporting and I can't remember a single report.

> > I think there is no point in fixing it somewhere else. The problem is
> > in pg_dump, not the snapshot import/export.
>
> It's really a problem for just about everything that uses transactions
> and locking, isn't it? pg_dump just happens to have it worst since it
> wants to go and touch every object in the database. It's certainly
> possible for people to connect to the DB, look at pg_class and then try
> to access some object which no longer exists (even though it's in
> pg_class).

Well, normal sql shouldn't need to touch pg_class and will know
beforehand which locks it will need. But I have to say I more than once
wished we would throw an error if an object's definition is "newer"
than the one we started out with.
[HACKERS] Re: [BUGS] BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog
On 08.04.2013 18:58, Jeff Bohmer wrote:
> On Apr 6, 2013, at 1:24 PM, Jeff Janes wrote:
>> On Sat, Apr 6, 2013 at 1:24 AM, Heikki Linnakangas wrote:
>>> Perhaps we should improve the documentation to make it more explicit
>>> that backup_label must be included in the backup. The docs already
>>> say that, though, so I suspect that people making this mistake have
>>> not read the docs very carefully anyway.
>>
>> I don't think the docs are very clear on that. They say "This file
>> will of course be archived as a part of your backup dump file", but
>> "will be" does not imply "must be". Elsewhere it emphasizes that the
>> label you gave to pg_start_backup is written into the file, but
>> doesn't really say what the file itself is there for. To me it seems
>> to imply that the file is there for your convenience, to hold that
>> label, and not as a critical part of the system. Patch attached, which
>> I hope can be back-patched. I'll also add it to commitfest-Next.
>
> I think this documentation update would be helpful.

Committed that.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
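The procedure being documented, in miniature (label and paths invented);
the backup_label file that pg_start_backup() creates in the data
directory is the critical piece that must travel with the backup:

    =# SELECT pg_start_backup('nightly');
    $ tar -cf /backups/base.tar -C $PGDATA .    # includes backup_label
    =# SELECT pg_stop_backup();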
Re: [HACKERS] Failing start-up archive recovery at Standby mode in PG9.2.4
On 26.04.2013 11:51, KONDO Mitsumasa wrote:
> Hi, I discovered the cause of the problem. I think that Horiguchi's
> discovery is a separate problem. The problem is in CreateRestartPoint.
> In recovery mode, PG should not write WAL records, because PG does not
> know the latest WAL file location. But in this problem case, PG (the
> standby) writes a WAL file at the restart point during archive
> recovery. In recovery mode, I think a restart point should only update
> the minimum recovery point. Here is the standby's pg_xlog directory
> when the problem occurs:
>
> [mitsu-ko@localhost postgresql-9.2.4-c]$ ls Standby/pg_xlog/
> 00020003  00020007  0002000B  0003.history
> 00020004  00020008  0002000C  0003000E
> 00020005  00020009  0002000D  0003000F
> 00020006  0002000A  0002000E  archive_status
>
> This problem case is here:
>
> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 0: attempting to remove WAL segments older than log file 00030002
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3568
> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 0: recycled transaction log file "00010002"
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3607
> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 0: recycled transaction log file "00020002"
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RemoveOldXlogFiles, xlog.c:3607
> [Standby] 2013-04-26 04:26:44 EDT LOG: 0: restartpoint complete: wrote 9 buffers (0.2%); 0 transaction log file(s) added, 0 removed, 2 recycled; write=0.601 s, sync=1.178 s, total=1.781 s; sync files=3, longest=1.176 s, average=0.392 s
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: LogCheckpointEnd, xlog.c:7893
> [Standby] 2013-04-26 04:26:44 EDT LOG: 0: recovery restart point at 0/90FE448
> [Standby] 2013-04-26 04:26:44 EDT DETAIL: last completed transaction was at log time 2013-04-26 04:25:53.203725-04
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: CreateRestartPoint, xlog.c:8601
> [Standby] 2013-04-26 04:26:44 EDT LOG: 0: restartpoint starting: xlog
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: LogCheckpointStart, xlog.c:7821
> cp: cannot stat `../arc/0003000F': No such file or directory
> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 0: could not restore file "0003000F" from archive: return code 256
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: RestoreArchivedFile, xlog.c:3323
> [Standby] 2013-04-26 04:26:44 EDT LOG: 0: unexpected pageaddr 0/200 in log file 0, segment 15, offset 0
> [Standby] 2013-04-26 04:26:44 EDT LOCATION: ValidXLOGHeader, xlog.c:4395
> cp: cannot stat `../arc/0003000F': No such file or directory
> [Standby] 2013-04-26 04:26:44 EDT DEBUG: 0: could not restore file "0003000F" from archive: return code 256
>
> In recovery, PG normally looks for a WAL file in the archive first. If
> the proper WAL file is not there (the archive command failed), it then
> looks in the pg_xlog directory. Normally the proper WAL file is not in
> pg_xlog, but in this case a file with the proper name - with mistaken,
> invalid contents - is in pg_xlog, so recovery fails and the standby is
> broken. So I fixed CreateRestartPoint at the branch that decides
> whether to update the minimum recovery point. It seems to fix this
> problem, but the attached patch is plain.

I didn't understand this. I committed a fix for the issue where
recycled WAL files get in the way of recovery, but I'm not sure if
you're describing the same issue or something else. But before we dig
any deeper into this, can you verify if you're still seeing any issues
on 9.3beta1?

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump --snapshot
* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 7 May 2013 01:18, Stephen Frost wrote:
> > * Simon Riggs (si...@2ndquadrant.com) wrote:
> >> If anybody really wanted to fix pg_dump, they could do. If that was so
> >> important, why block this patch, but allow parallel pg_dump to be
> >> committed without it?
> >
> > Because parallel pg_dump didn't make the problem any *worse*..? This
> > does.
>
> Sorry, not accurate. Patch makes nothing *worse*.

I really don't understand that claim. It clearly increases the time
between transaction start and when the locks are all acquired. If this
is used programmatically (instead of someone hand-typing commands into
psql), perhaps it doesn't increase it very much, but increasing it at
all is clearly going in the "worse" direction.

> The existing API *can* be misused in the way you say, and so also
> could pg_dump if the patch is allowed.

And if the patch isn't allowed, then pg_dump isn't any worse off than
it's always been. The corollary is that pg_dump *is* worse off if the
patch goes in. The patch is absolutely *encouraging* such misuse of
this API because there's almost no other way to use this new option to
pg_dump except as an abuse of the API.

> However, there is no reason to suppose that such misuse would be
> common;

The concern is exactly this point. It wouldn't simply be common: were
this patch applied and the option used, it would be the predominant
case, and tools would be built which either completely ignore the
problem (most likely) or which attempt to duplicate what pg_dump
already does, likely very poorly.

> no reason why a timing gap would *necessarily* occur in the
> way your previous example showed, or if it did why it would
> necessarily present a problem for the user. Especially if we put
> something in the docs.

I don't believe papering over this with documentation is really a
solution. I did suggest a few other options which don't seem to be
getting much traction from anyone, so perhaps they're not workable, but
I'll reiterate them for fun anyway:

- Provide a way for users (perhaps even pg_dump) to acquire locks on
  essentially everything in the DB. This could be a plpgsql function or
  perhaps even a new lock type (a sketch of such a function follows
  after this message). Grander ideas might also support the filtering
  options which pg_dump supports.

- Have pg_dump output the snapshot which it has acquired and shared,
  for others to then connect and use. It would do this after acquiring
  all of the locks, of course.

> Snapshots have no WARNING on them.

The documentation is certainly fallible, though this is certainly a
problem which all applications need to address in some way. We address
it as best we can in pg_dump today and the goal is to continue to do
so.

> There is no guarantee in any
> transaction that the table you want will not be dropped before you try
> to access it.

Hence why we do the best we can in pg_dump by immediately locking the
objects, once we've identified what they are.

> > However,
> > there really isn't a way for users to use this new option correctly-
>
> Not accurate.

Then please articulate how they would use it correctly? Would they
write a function which goes and acquires all of the locks which pg_dump
would (assuming they've figured out what locks pg_dump needs), prior to
exporting the snapshot and calling pg_dump? If so, perhaps we could
simplify their lives by providing such a function for them to use
instead of asking each user to write it?
> The argument against this is essentially that we don't trust the user
> to use it well, so we won't let them have it at all. Which makes no
> sense since they already have this API and don't need our permission
> to use it. All that blocking this patch does is to remove any chance
> the user has of coordinating pg_dump with other actions; preventing
> that causes more issues for the user, and so doing nothing is not safe
> or correct either. A balanced viewpoint needs to include the same
> level of analysis on both sides, not just a deep look at the worst
> case on one side and claim everything is rosy with the current
> situation.

I'm not sure where the claim that everything is rosy was made; I've
certainly not seen it nor made any claim that it is. This whole
discussion started with Tom pointing out that pg_dump is essentially
already busted and that this change would end up making that situation
worse.

Perhaps in very select cases users will do the right thing and acquire
all the locks they need before exporting the snapshot and calling
pg_dump, but that description of how to use this option correctly
certainly wasn't articulated anywhere in the initial description or
contents of the patch. There's also nothing done to make that any
easier to do, nor a way to move users using this system to a new
methodology when we figure out how to make pg_dump do this better, thus
putting us in a situation of *always* having this issue, even if we fix
it for 'normal' pg_dump.

	Thanks,

		Stephen
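One possible shape of the lock-everything helper suggested above,
sketched for illustration only (the function name is invented; ACCESS
SHARE matches what pg_dump itself takes, and a real version would also
need to cover non-table objects and pg_dump's filtering options):

    CREATE FUNCTION lock_all_tables() RETURNS void
    LANGUAGE plpgsql AS $$
    DECLARE
        r record;
    BEGIN
        FOR r IN
            SELECT c.oid::regclass AS tbl
              FROM pg_class c
              JOIN pg_namespace n ON n.oid = c.relnamespace
             WHERE c.relkind = 'r'
               AND n.nspname NOT IN ('pg_catalog', 'information_schema')
        LOOP
            -- locks are held until the calling transaction ends
            EXECUTE format('LOCK TABLE %s IN ACCESS SHARE MODE', r.tbl);
        END LOOP;
    END;
    $$;

The backup user would call this first, then pg_export_snapshot(), then
hand the snapshot ID to pg_dump - though, as discussed in this thread,
objects can still change in the gap before the locks are all taken.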
Re: [HACKERS] pg_dump --snapshot
* Andres Freund (and...@2ndquadrant.com) wrote:
> > All of which I
> > think I agree with, but I don't agree with the conclusion that this
> > larger window is somehow acceptable because there's a very small window
> > (one which can't be made any smaller, today..) which exists today.
>
> The window isn't that small currently:

Agreed- but it also isn't currently possible to make it any smaller.

> b) Locking all relations in a big database can take a second or more,
> even if there are no conflicting locks.

Yes, I've noticed.. :( You can also run out of locks, which is quite
painful.

> > Alright, then let's provide a function which will do that and tell
> > people to use it instead of just using pg_export_snapshot(), which
> > clearly doesn't do that.
>
> If it were clear cut what to lock and we had locks for
> everything. Maybe. But we don't have locks for everything.

My suggestion was to lock everything that pg_dump locks, which we
clearly have locks for since pg_dump is acquiring them. Also, I don't
believe it'd be that difficult to identify what pg_dump would lock, at
least in a 'default' whole-database run. This is more of a stop-gap
than a complete solution.

> So we would
> need to take locks preventing any modification on any of the system
> catalogs, which doesn't really seem like a good thing, especially as
> we can't release them from sql during the dump, where we can allow
> creation of temp tables and everything without problems.

That's already an issue when pg_dump runs, no? Not sure why this is
different.

> Also, as explained above, the problem already exists in larger
> timeframes than referenced in this thread, so I really don't see how
> anything that's only based on plain locks on user objects can solve
> the issue in a relevant enough way.

The point is to try and avoid making the problem worse..

> I am comparing the time between 'snapshot acquisition' and 'getting
> the object list' with the time between 'getting the object list' and
> 'locking the object list'. What I am saying is that in many scenarios
> the second part will be the bigger part.

I can see how that can happen, sure.

> > I believe the main argument here is really around "you should think
> > about these issues before just throwing this in" and not "it must be
> > perfect before it goes in". Perhaps "it shouldn't make things *worse*
> > than they are now" would also be apt..
>
> That's not how I read 8465.1367860...@sss.pgh.pa.us :(

I believe the point that Tom is making is that we shouldn't paint
ourselves into a corner by letting users provide old snapshots to
pg_dump which haven't acquired any of the necessary locks. The goal, at
least as I read it, is to come up with a workable design (and I don't
know that we have, but still) which provides a way for the locks to be
taken at least as quickly as what pg_dump does today and which we could
modify down the road to take the locks pre-snapshot (presuming we can
figure out a way to make that work).

The proposed patch certainly doesn't make any attempt to address that
issue and would encourage users to open themselves up to this risk more
than they are exposed today w/ pg_dump.

> I think there is no point in fixing it somewhere else. The problem is
> in pg_dump, not the snapshot import/export.

It's really a problem for just about everything that uses transactions
and locking, isn't it? pg_dump just happens to have it worst since it
wants to go and touch every object in the database.
It's certainly possible for people to connect to the DB, look at
pg_class and then try to access some object which no longer exists
(even though it's in pg_class). This will be an interesting thing to
consider when implementing MVCC for the catalog.

> You did suggest how it can be fixed? You mean
> 20130506214515.gl4...@tamriel.snowman.net?

I suggested how it might be done. :) There are undoubtedly issues with
an all-database-objects lock, but it would certainly reduce the time
between transaction start and getting all the locks acquired, and
shrink the window that much more. If we did implement such a beast, how
could we ensure that the locks were taken immediately after transaction
start if the snapshot is being passed to pg_dump? Basically, if we
*did* solve this issue for pg_dump in some way in the future, how would
we use it if pg_dump can accept an outside snapshot?

One other thought did occur to me- we could simply have pg_dump export
the snapshot that it gets to stdout, a file, whatever, and systems
which are trying to do this magic "everyone gets the same view" could
glom onto the snapshot created by pg_dump, after all the locks have
been acquired..

	Thanks,

		Stephen

signature.asc
Description: Digital signature
Re: [HACKERS] Recovery target 'immediate'
On 07.05.2013 15:38, Simon Riggs wrote:
> On 3 May 2013 14:40, Heikki Linnakangas wrote:
>> If we want to avoid adding a new option for this, how about a magic
>> restore point called "consistent" or "immediate":
>>
>> recovery_target_name='immediate'
>>
>> That would stop recovery right after reaching consistency, but there
>> wouldn't be an actual restore point record in the WAL stream.
>
> recovery_target_name='something' ...works for me. Either consistent or
> immediate works.
>
> I request that the docs recommend this be used in conjunction with
> pause_at_recovery_target = on, so that the user can begin inspecting
> the database at the first available point and then roll forward from
> that point if desired. That would cover my concern that this stopping
> point is arbitrary and not intrinsically worth stopping at in and of
> itself.

Sounds good. I've added this to the TODO.

> Can I suggest that we discuss a range of related changes together? So
> we have a roadmap of agreed changes in this area. That will be more
> efficient than discussing each one individually; often each one makes
> sense only as part of the wider context.

Sure, do you have something else in mind related to this?

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Recovery target 'immediate'
On 3 May 2013 14:40, Heikki Linnakangas wrote:
> On 03.05.2013 16:29, Bruce Momjian wrote:
>> On Fri, May 3, 2013 at 01:02:08PM +0200, Cédric Villemain wrote:
>>> This changes the existing API which will confuse people that know it
>>> and invalidate everything written in software and on wikis as to how
>>> to do it. That means all the "in case of fire break glass"
>>> instructions are all wrong and need to be rewritten and retested.
>>
>> Yes, *that* is the main reason *not* to make the change. It has a
>> pretty bad cost in backwards compatibility loss. There is a gain, but
>> I don't think it outweighs the cost.
>
> So, is there a way to add this feature without breaking the API?

Yes, by adding a new parameter exclusively used to control this
feature, something like recovery_target_immediate = 'on/off'.

>>> We just need to add a named restore point when ending the backup (in
>>> pg_stop_backup() ?). No API change required. Just document that some
>>> predefined target names are set during backup.
>>
>> So we auto-add a restore point based on the backup label. Does that
>> work for everyone?
>
> Unfortunately, no. There are cases where you want to stop right after
> reaching consistency, but the point where you reach consistency is not
> at the end of a backup. For example, if you take a backup using an
> atomic filesystem snapshot, there are no pg_start/stop_backup calls,
> and the system will reach consistency after replaying all the WAL in
> pg_xlog. You might think that you can just not create a recovery.conf
> file in that case, or create a dummy recovery.conf file with
> restore_command='/bin/false'. However, then the system will not find
> the existing timeline history files in the archive, and can pick a TLI
> that's already in use. I found this out the hard way, and actually
> ended up writing a restore_command that restores timeline history
> files normally, but returns non-zero for any other files; it wasn't
> pretty.
>
> Another case is that you take a backup from a standby server; you
> can't write a restore-point WAL record in a standby.
>
> If we want to avoid adding a new option for this, how about a magic
> restore point called "consistent" or "immediate":
>
> recovery_target_name='immediate'
>
> That would stop recovery right after reaching consistency, but there
> wouldn't be an actual restore point record in the WAL stream.

recovery_target_name='something' ...works for me. Either consistent or
immediate works.

I request that the docs recommend this be used in conjunction with
pause_at_recovery_target = on, so that the user can begin inspecting
the database at the first available point and then roll forward from
that point if desired. That would cover my concern that this stopping
point is arbitrary and not intrinsically worth stopping at in and of
itself.

Can I suggest that we discuss a range of related changes together? So
we have a roadmap of agreed changes in this area. That will be more
efficient than discussing each one individually; often each one makes
sense only as part of the wider context.

-- Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
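What the proposal would look like in use (the option spelling here is
the one floated in this thread; the committed syntax may end up
different):

    # recovery.conf
    restore_command = 'cp /mnt/server/archive/%f %p'
    recovery_target_name = 'immediate'  # stop as soon as consistency is reached
    pause_at_recovery_target = 'on'     # inspect first, roll forward if desired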
[HACKERS] Fast promotion failure
While testing the bug from the "Assertion failure at standby promotion"
thread, I bumped into a different bug in fast promotion. When the first
checkpoint after fast promotion is performed, there is no guarantee
that the checkpointer process is running with the correct, new,
ThisTimeLineID. In CreateCheckPoint(), we have this:

	/*
	 * An end-of-recovery checkpoint is created before anyone is allowed to
	 * write WAL. To allow us to write the checkpoint record, temporarily
	 * enable XLogInsertAllowed. (This also ensures ThisTimeLineID is
	 * initialized, which we need here and in AdvanceXLInsertBuffer.)
	 */
	if (flags & CHECKPOINT_END_OF_RECOVERY)
		LocalSetXLogInsertAllowed();

That ensures that ThisTimeLineID is updated when performing an
end-of-recovery checkpoint, but it doesn't get executed with fast
promotion. The consequence is that the checkpoint is created with the
old timeline, and subsequent recovery from it will fail.

I ran into this with the attached script. It sets up a master (M), a
standby (B), and a cascading standby (C). I'm not sure why, but when I
tried to simplify the script by removing the cascading standby, it
started to work. The bug occurs in standby B, so I'm not sure why the
presence of the cascading standby makes any difference. Maybe it just
affects the timing.

- Heikki

fast-promotion-bug.sh
Description: application/shellscript

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] add long options to pgbench (submission 1)
Hello Robert,

>> This very minor patch adds a corresponding long option to all short
>> (one letter) options of pgbench. [...]
>
> I don't really have an opinion on whether this is worth doing, but
> we'd probably want to update all of our client utilities, not just
> pgbench, if we did.

The current status is that "official" clients already have systematic
long options. I have checked psql, pg_dump, pg_dumpall, pg_restore,
pg_basebackup, createdb, createuser, and createlang. Some commands in
contrib possibly still lack long options.

As for the rationale: when I type interactively I tend to use short
options, but in a script I like long options, so that I do not need to
look them up in the documentation as often when reading the script
later. The other rationale is that adding long options is cheap and
straightforward.

-- Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
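For illustration, the intended equivalence (long-option spellings as in
the submitted patch; they may differ in the committed version):

    $ pgbench -c 32 -j 4 -T 60 bench
    $ pgbench --client=32 --jobs=4 --time=60 bench   # same run, long form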
[HACKERS] XLogFlush invoked about twice as much after 9.2 group commit enhancement
Hello,

I have been trying to understand how the group commit implementation
works after the 9.2 group commit enhancement patch
(9b38d46d9f5517dab67dda1dd0459683fc9cda9f on REL9_2_STABLE). I admit
it's a pretty old commit, but I am seeking some clarification as to how
it provides the performance gain that it does. Also, I have observed
some behavior in this regard that I could not understand.

Profiling results show that XLogFlush is called about twice as often
after this patch, while the XLogWrite count remains about the same as
before. XLogFlush is the place where this patch offers the said
performance gain, by alleviating the lock contention on WALWriteLock
using the new LWLockAcquireOrWait(). So far so good. What I do not
understand is why XLogFlush is invoked twice as often (as seen from the
profiling results) as an effect of this patch.

I used "pgbench -c 32 -t 1000 pgbench" in both cases, with the TPS
result (after applying the patch) not being significantly different
(certainly not twice as much on my system). Using:

pgbench scale = 10
shared_buffers = 256MB
max_connections = 1000
checkpoint_segments = 15
synchronous_commit = on

Comments?

-- Amit Langote

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] spoonbill vs. -HEAD
On 04/04/2013 02:18 AM, Tom Lane wrote:
> Stefan Kaltenbrunner writes:
>> On 04/03/2013 12:59 AM, Tom Lane wrote:
>>> BTW, on further thought it seems like maybe this is an OpenBSD bug,
>>> at least in part: what is evidently happening is that the temporary
>>> blockage of SIGINT during the handler persists even after we've
>>> longjmp'd back to the main loop. But we're using sigsetjmp(..., 1) to
>>> establish that longjmp handler --- so why isn't the original signal
>>> mask reinstalled when we return to the main loop? If (your version
>>> of) OpenBSD is getting this wrong, it'd explain why we've not seen
>>> similar behavior elsewhere.
>>
>> hmm, trawling the OpenBSD CVS history brings up this:
>> http://www.openbsd.org/cgi-bin/cvsweb/src/sys/arch/sparc64/sparc64/machdep.c?r1=1.143;sortby=date#rev1.143
>
> That's about alternate signal stacks, which we're not using. I put
> together a simple test program (attached) and tried it on spoonbill,
> and it says that the signal *does* get unblocked when control returns
> to the sigsetjmp(...,1). So now I'm really confused. Somehow the
> results we're getting in a full-fledged backend do not match up with
> the results gotten by this test program ... but why?

As a followup to this - I spent some time upgrading spoonbill to the
latest OpenBSD release (5.3) the other day, and it now passes a full
regression test run when run manually. I will add it to the regular
reporting schedule again. It seems at least part of the problem was an
operating-system-level issue that got fixed in either 5.2 or 5.3
(spoonbill was on 5.1 before).

Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers