Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress
On Fri, Sep 30, 2016 at 2:05 PM, Kyotaro HORIGUCHI wrote:
> At Fri, 30 Sep 2016 14:00:15 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI
> wrote in <20160930.140015.150178454.horiguchi.kyot...@lab.ntt.co.jp>
>> I see no problem in setting progressAt in XLogInsertRecord.
>> But I doubt GetProgressRecPtr is harmful, especially when
>
> But I suspect that GetProgressRecPtr could be harmful.

Well, you can maximize its effects by setting NUM_XLOGINSERT_LOCKS == nproc and reducing checkpoint_timeout. That's what I did, but..
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] IF (NOT) EXISTS in psql-completion
At Thu, 29 Sep 2016 16:16:00 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20160929.161600.224338668.horiguchi.kyot...@lab.ntt.co.jp>
> That looks better. I'll change the API as follows.
>
> COMPLETE_WITH_QUERY(query);
> COMPLETE_WITH_QUERY_KW(query, kwlist);
> COMPLETE_WITH_SCHEMA_QUERY(squery);
> COMPLETE_WITH_SCHEMA_QUERY_KW(squery, kwlist);

Done in my environment.

> > >> 4. Introduce word shift and removal feature to psql-completion
> > >>
> > >> This is the second core piece for the flexibility of the completion
> > >> code. The word shift feature is the ability to omit the first several
> > >> words in *MatchesN macros. For example this allows completing
> > >> create-schema's schema elements with natural code. (Currently
> > >> those syntaxes that can be schema elements are using
> > >> TailMatches instead of Matches; as a result HeadMatches is
> > >> not available there). The word removal feature is the ability
> > >> to (destructively) clip multiple successive words in the
> > >> previous_words list. This feature allows succeeding completion
> > >> code not to care about the removed words, such as UNIQUE,
> > >> CONCURRENTLY, VERBOSE and so on.
> > >
> > > I am thinking the commit's description should be inside a README.
>
> Currently psql and tab-complete.c/psql_completion() have no such
> document. If this should be written as a README, perhaps I should
> write about completion in general. On the other hand, per-macro
> explanations are written in tab-complete-macros.h but the usages
> are not.

I'll try to write a README. Before proposing a new patch including it, and since I'm not confident in my ability to write documentation, I'd like to ask everyone here what shape we should aim for. The following is the first part of the document I have written so far. Please help me with any corrections, suggestions, opinions, or anything else!
# The completion macro part might be better moved into
# comments for the macros in tab-complete-macros.h.

== src/bin/psql/README.completion

Word completion of interactive psql
===

psql supports word completion on interactive input. The core function of
the feature is psql_completion_internal in tab-complete.c. A bunch of
macros are provided in order to make the completion code easier to read
and maintain. The console input to be examined is stored in
char **previous_words in reverse order, but maintainers of
psql_completion_internal don't need to be aware of that detail. Most of
the operations can be described using the provided macros.

Basic structure of the completion code
--

The main part of the function is just a series of completion
definitions, where the first match wins. Each definition basically has
the following shape:

  if (*matching expression*)
      *corresponding completion, then return*

The matching expression checks against all input words before the word
currently being entered. Completion chooses the words prefixed by the
letters already entered. For example, for "CREATE " the word list to be
matched is ["CREATE"] and the prefix for completion is empty. For
"CREATE INDEX id", the list is ["CREATE", "INDEX"] and the prefix is
"id".

Matching expression macros
--

There are four types of matching expression macros.

- MatchesN(word1, word2 .., wordN)
  true iff the word list is exactly the same as the parameters.

- HeadMatchesN(word1, word2 .., wordN)
  true iff the first N words in the word list match the parameters.

- TailMatchesN(word1, word2 .., wordN)
  true iff the last N words in the word list match the parameters.

- MidMatchesN(pos, word1, word2 .., wordN)
  true iff the N successive words starting at pos in the word list match
  the parameters. The position is 1-based.

Completion macros
-

There are three types of completion macros.

- COMPLETE_WITH_QUERY(query), COMPLETE_WITH_QUERY_KW(query, addon)
  Suggest completion words acquired using the given query. The details
  of the query are described in the comment for _complete_from_query().
  Word matching is case-sensitive. The latter takes an additional
  parameter, which should be a query fragment starting with " UNION "
  followed by a query string that yields some additional words. This
  addon can be an ADDLISTN() macro for case-sensitive suggestions.

- COMPLETE_WITH_SCHEMA_QUERY(squery), COMPLETE_WITH_SCHEMA_QUERY_KW(squery, addon)
  Suggest based on a "schema query", which is a struct containing
  parameters. The details are described in the comment for
  _complete_from_query(). Word matching is case-sensitive. Just as with
  COMPLETE_WITH_QUERY_KW, the latter form takes the same kind of query
  fragment.

- COMPLETE_WITH_LIST_CS(list)
  Suggest completion words given as an array of strings. Word matching
  is case-sensitive.

-
Re: [HACKERS] [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"
On Fri, Sep 30, 2016 at 1:26 PM, Kyotaro HORIGUCHI wrote:
> Hello,
>
> At Fri, 30 Sep 2016 13:11:21 +0900, Masahiko Sawada
> wrote in
Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress
Sorry, I wrote the wrong thing.

At Fri, 30 Sep 2016 14:00:15 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20160930.140015.150178454.horiguchi.kyot...@lab.ntt.co.jp>
> Sorry, I might have torn off this thread somehow..
>
> At Thu, 29 Sep 2016 11:26:29 -0400, David Steele wrote
> in <30095aea-3910-dbb7-1790-a579fb93f...@pgmasters.net>
> > On 9/28/16 10:32 PM, Michael Paquier wrote:
> > > On Thu, Sep 29, 2016 at 7:45 AM, David Steele
> > > wrote:
> > >>
> > >> In general I agree with the other comments that this could end up being
> > >> a problem. On the other hand, since the additional locks are only taken
> > >> at checkpoint or archive_timeout it may not be that big a deal.
> > >
> > > Yes, I did some tests on my laptop a couple of months back, that has 4
> > > cores. After reducing NUM_XLOGINSERT_LOCKS from 8 to 4 to increase
> > > contention and performing a bunch of INSERTs using 4 clients on 4
> > > different relations I could not catch a difference.. Autovacuum was
> > > disabled to eliminate any noise. I tried checkpoint_segments at 30s to
> > > see its effects, as well as larger values to see the impact with the
> > > standby snapshot taken by the bgwriter. Other thoughts are welcome.
> >
> > I don't have any better ideas than that.
>
> I see no problem in setting progressAt in XLogInsertRecord.
> But I doubt GetProgressRecPtr is harmful, especially when

But I suspect that GetProgressRecPtr could be harmful.

> NUM_XLOGINSERT_LOCKS is *large*. So reducing the number seems
> rather to alleviate the impact. But it actually doesn't seem so
> harmful up to 8. (Even though I don't like the locking in
> GetProgressRecPtr..)
>
> Currently the possibly harmful call of GetProgressRecPtr is the one
> in BackgroundWriterMain. It should be called at the interval
> BgWriterDelay, and any time the bgwriter receives SIGUSR1. But I
> don't see who issues SIGUSR1 to the bgwriter..
>
> > >> +++ b/src/backend/postmaster/checkpointer.c
> > >> + /* OK, it's time to switch */
> > >> + elog(LOG, "Request XLog Switch");
> > >>
> > >> LOG level seems a bit much here, perhaps DEBUG1?
> > >
> > > That's from Horiguchi-san's patch, and those would be definitely
> > > better as DEBUG1 by looking at it. Now and in order to keep things
> > > simple I think that we had better discard this patch for now. I was
> > > planning to come back to this thing anyway once we are done with the
> > > first problem.
> >
> > I still see this:
> >
> > +++ b/src/backend/postmaster/checkpointer.c
> > /* OK, it's time to switch */
> > + elog(LOG, "Request XLog Switch");
> >
> > > Well for now attached are two patches, that could just be squashed
> > > into one.
>
> Mmmm. Sorry, this was some private instant debugging of mine that
> spilled out.. But I don't mind leaving it at DEBUG2 if it seems
> useful.
>
> > Yes, I think that makes sense.
> >
> > More importantly, there is a regression. With your new patch the
> > xlogs are switching on archive_timeout again even with no changes.
> > The v12 worked fine.
>
> As Michael mentioned in this or another thread, that is a separate
> issue that he wants to solve separately. I personally doubt that this
> patch (v11 and v13) can be evaluated alone without it, but perhaps we
> can review it together with the excessive-switching problem?
>
> > The differences are all in 0002-hs-checkpoints-v12-2.patch and as far
> > as I can see the patch does not work correctly without these changes.
> > Am I missing something?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress
Sorry, I might have torn off this thread somehow..

At Thu, 29 Sep 2016 11:26:29 -0400, David Steele wrote in <30095aea-3910-dbb7-1790-a579fb93f...@pgmasters.net>
> On 9/28/16 10:32 PM, Michael Paquier wrote:
> > On Thu, Sep 29, 2016 at 7:45 AM, David Steele
> > wrote:
> >>
> >> In general I agree with the other comments that this could end up being
> >> a problem. On the other hand, since the additional locks are only taken
> >> at checkpoint or archive_timeout it may not be that big a deal.
> >
> > Yes, I did some tests on my laptop a couple of months back, that has 4
> > cores. After reducing NUM_XLOGINSERT_LOCKS from 8 to 4 to increase
> > contention and performing a bunch of INSERTs using 4 clients on 4
> > different relations I could not catch a difference.. Autovacuum was
> > disabled to eliminate any noise. I tried checkpoint_segments at 30s to
> > see its effects, as well as larger values to see the impact with the
> > standby snapshot taken by the bgwriter. Other thoughts are welcome.
>
> I don't have any better ideas than that.

I see no problem in setting progressAt in XLogInsertRecord.
But I doubt GetProgressRecPtr is harmful, especially when
NUM_XLOGINSERT_LOCKS is *large*. So reducing the number seems
rather to alleviate the impact. But it actually doesn't seem so
harmful up to 8. (Even though I don't like the locking in
GetProgressRecPtr..)

Currently the possibly harmful call of GetProgressRecPtr is the one
in BackgroundWriterMain. It should be called at the interval
BgWriterDelay, and any time the bgwriter receives SIGUSR1. But I
don't see who issues SIGUSR1 to the bgwriter..

> >> +++ b/src/backend/postmaster/checkpointer.c
> >> + /* OK, it's time to switch */
> >> + elog(LOG, "Request XLog Switch");
> >>
> >> LOG level seems a bit much here, perhaps DEBUG1?
> >
> > That's from Horiguchi-san's patch, and those would be definitely
> > better as DEBUG1 by looking at it. Now and in order to keep things
> > simple I think that we had better discard this patch for now. I was
> > planning to come back to this thing anyway once we are done with the
> > first problem.
>
> I still see this:
>
> +++ b/src/backend/postmaster/checkpointer.c
> /* OK, it's time to switch */
> + elog(LOG, "Request XLog Switch");
>
> > Well for now attached are two patches, that could just be squashed
> > into one.

Mmmm. Sorry, this was some private instant debugging of mine that
spilled out.. But I don't mind leaving it at DEBUG2 if it seems
useful.

> Yes, I think that makes sense.
>
> More importantly, there is a regression. With your new patch the
> xlogs are switching on archive_timeout again even with no changes.
> The v12 worked fine.

As Michael mentioned in this or another thread, that is a separate
issue that he wants to solve separately. I personally doubt that this
patch (v11 and v13) can be evaluated alone without it, but perhaps we
can review it together with the excessive-switching problem?

> The differences are all in 0002-hs-checkpoints-v12-2.patch and as far
> as I can see the patch does not work correctly without these changes.
> Am I missing something?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: [HACKERS] Tracking wait event for latches
On Fri, Sep 30, 2016 at 1:48 AM, Robert Haas wrote:
> It seems to me that you haven't tested this patch very carefully,
> because as far as I can see it breaks wait event reporting for both
> heavyweight locks and buffer pins - or in other words two out of the
> three existing cases covered by the mechanism you are patching.

Oops. The WaitLatch calls overwrite the other reports when another
event has already been set.

> The heavyweight lock portion is broken because WaitOnLock() reports a
> Lock wait before calling ProcSleep(), which now clobbers it. Instead
> of reporting that we're waiting for Lock/relation, a LOCK TABLE
> statement that hangs now reports IPC/ProcSleep. Similarly, a conflict
> over a tuple lock is now also reported as IPC/ProcSleep, and ditto for
> any other kind of lock, which were all reported separately before.
> Obviously, that's no good. I think it would be just fine to push down
> setting the wait event into ProcSleep(), doing it when we actually
> WaitLatch() or ResolveRecoveryConflictWithLock(), but we ought to
> report that we're waiting for the heavyweight lock, not ProcSleep().

There is a somewhat similar problem with
ResolveRecoveryConflictWithBufferPin(); by this reasoning, what we
should report waiting for there is a buffer pin and not
IPC/WaitForSignal.

> As for buffer pins, note that LockBufferForCleanup() calls
> ProcWaitForSignal(), which now overwrites the wait event set in by its
> caller. I think we could just make ProcWaitForSignal() take a wait
> event as an argument; then LockBufferForCleanup() could pass an
> appropriate constant and other callers of ProcWaitForSignal() could
> pass their own wait events.

Agreed. So I changed the patch this way.

> The way that we're constructing the wait event ID that ultimately gets
> advertised in pg_stat_activity is a bit silly in this patch. We start
> with an event ID (which is an integer) and we then do an array lookup
> (via GetWaitEventClass) to get a second integer which is then XOR'd
> back into the first integer (via pgstat_report_wait_start) before
> storing it in the PGPROC. New idea: let's change
> pgstat_report_wait_start() to take a single integer argument which it
> stores without interpretation. Let's separate the construction of the
> wait event into a separate macro, say make_wait_event(). Then
> existing callers of pgstat_report_wait_start(a, b) will instead do
> pgstat_report_wait_start(make_wait_event(a, b)), but callers that want
> to construct the wait event and push it through a few levels of the
> call stack before advertising it only need to pass one integer. If
> done correctly, this should be cleaner and faster than what you've got
> right now.

Hm, OK. So ProcSleep() and ProcWaitForSignal() need an additional
argument in the shape of a uint32 wait_event, and the signature of
WaitLatch needs to change to take a uint32 as an extra argument as
well. As a result, all the callers of WaitLatch are forced to use the
new routine pgstat_make_wait_event() (I renamed it), so pgstat.h needs
to be included wherever WaitLatch() is called. A consequence is that
keeping classId in WaitEventEntries becomes completely useless:
including it had the advantage of reducing the places where pgstat.h
is included, but make_wait_event is needed to build the wait_event
value anyway...

> I am not a huge fan of the "WE_" prefix you've used. It's not
> terrible, but "we" doesn't obviously stand for "wait event" and it's
> also a word itself. I think a little more verbosity would add
> clarity. Maybe we could go with WAIT_EVENT_ instead.

OK. Switched to that. Hopefully you find the result cleaner.
--
Michael

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index 8ca1c1c..0841fd7 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -17,6 +17,7 @@
 #include "access/xact.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "pgstat.h"
 #include "storage/latch.h"
 #include "utils/hsearch.h"
 #include "utils/memutils.h"
@@ -491,12 +492,18 @@ pgfdw_get_result(PGconn *conn, const char *query)
 		while (PQisBusy(conn))
 		{
 			int			wc;
+			uint32		wait_event;
+
+			/* Define event to wait for */
+			wait_event = pgstat_make_wait_event(WAIT_EXTENSION,
+												WAIT_EVENT_EXTENSION);
 
 			/* Sleep until there's something to do */
 			wc = WaitLatchOrSocket(MyLatch,
 								   WL_LATCH_SET | WL_SOCKET_READABLE,
 								   PQsocket(conn),
-								   -1L);
+								   -1L,
+								   wait_event);
 			ResetLatch(MyLatch);
 
 			CHECK_FOR_INTERRUPTS();
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index f400785..bb975c1 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -679,6 +679,42 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser
        buffer in question.
+
+
+ Activity: The server
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Thu, Sep 29, 2016 at 9:17 PM, Michael Paquier wrote:
> On Tue, Aug 30, 2016 at 11:04 AM, Michael Paquier
> wrote:
>> On Mon, Aug 29, 2016 at 9:39 PM, Craig Ringer
>> wrote:
>>> Cool. I'll mark as waiting on author pending that.
>>>
>>> It'll be good to get this footgun put away.
>>
>> OK, so done. I have put the renaming of pg_xlog to pg_wal on top of the
>> patch stack as that's the one causing no discussion, it seems: a lot of
>> people like pg_wal. I have added as well handling for the renaming in
>> pg_basebackup by using PQserverVersion. One thing to note is that a
>> connection needs to be made to the target server *before* creating the
>> soft link of pg_xlog/pg_wal because we need to know the version of the
>> target server. pg_upgrade is handled as well. And that's all in 0001.
>>
>> Patch 0002 does pg_clog -> pg_trans, "trans" standing for
>> "transaction". Switching to pg_trans_status or pg_xact_status is not
>> that complicated a change anyway...
>
> Any input to offer on those patches? If there is nothing happening, I
> guess that the best move is just to move it to the next CF. At least I
> can see that the flow would be to get those renamings done.

+1 for pg_xlog -> pg_wal.

Of the existing suggestions, I would like to add my vote for the following renames, matching pg_multixact:

pg_clog -> pg_xact
pg_subtrans -> pg_subxact

If longer names are on the table, I would consider expanding all three of those:

pg_clog -> pg_transaction
pg_subtrans -> pg_subtransaction
pg_multixact -> pg_multitransaction

They sound eminently non-deletable.

--
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Sep 29, 2016 at 8:05 PM, Robert Haas wrote:
> OK, another theory: Dilip is, I believe, reinitializing for each run,
> and you are not.

Yes, I am reinitializing for each run.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"
Hello,

At Fri, 30 Sep 2016 13:11:21 +0900, Masahiko Sawada wrote in
Re: [HACKERS] [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"
On Fri, Sep 30, 2016 at 7:08 AM, Tom Lane wrote:
> Alvaro Herrera writes:
>> Tom Lane wrote:
>>> I wouldn't even put a lot of faith in the errno being meaningful,
>>> considering that it does close() calls before capturing the errno.
>
>> So we do close() in a bunch of places while closing shop, which calls
>> _close() on Windows; this function sets errno.
>
> But only on failure, no? The close()s usually shouldn't fail, and
> therefore shouldn't change errno, it's just that you can't trust that
> 100%.
>
> I think likely what's happening is that we're seeing a leftover value from
> some previous syscall that set GetLastError's result (and, presumably,
> wasn't fatal so far as pg_upgrade was concerned).

Does that mean the read() syscall could not read BLOCKSZ bytes, probably because the _vm file is corrupted?

> if ((src_fd = open(fromfile, O_RDONLY, 0)) < 0)

Could it happen that the read() syscall stopped reading where it hit bytes representing '\n' characters (0x5c6e), because we don't open the file with the O_BINARY flag?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] Bug in to_timestamp().
The following review has been posted through the commitfest application:

make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: tested, passed

Appreciate your support. I've tested the v6 patch again; it looks good to me, and the code in v6 does not differ much from the v4 patch. Ready for committer review. Thanks!

Regards,
Amul Sul

The new status of this patch is: Ready for Committer
Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE
On Thu, Sep 29, 2016 at 6:19 PM, David Fetter wrote:
> On Thu, Sep 29, 2016 at 11:12:11AM +1300, Thomas Munro wrote:
>> On Mon, Sep 26, 2016 at 5:11 PM, Thomas Munro
>> wrote:
>> > On Mon, Sep 26, 2016 at 1:18 PM, Thomas Munro
>> > wrote:
>> >>
>> >> On Mon, Sep 19, 2016 at 4:02 PM, David Fetter wrote:
>> >> >
>> >> > [training_wheels_004.patch]
>> >>
>> >> [review]
>>
>> Ping.
>
> Please find attached the next revision.

I'm not sold on ERRCODE_SYNTAX_ERROR. There's nothing wrong with the syntax, since parsing succeeded. It would be cool if we could use ERRCODE_E_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED, though I'm not sure what error class 38 really means. Does require_where's hook function count as an 'external routine', and on that basis is it allowed to raise this error? Thoughts, anyone?

I am still seeing the underscore problem when building the docs. Please find attached fix-docs.patch, which applies on top of training_wheels_005.patch. It fixes that problem for me.

The regression test fails. The expected error messages show the old wording, so I think you forgot to add a file. Also, should contrib/require_where/Makefile define REGRESS = require_where? That would allow 'make check' from inside that directory, which is convenient and matches other extensions. Please find attached fix-regression-test.patch, which also applies on top of training_wheels_005.patch and fixes those things for me, and also adds a couple of extra test cases.

Robert Haas mentioned upthread that the authors section was too short, and your follow-up v3 patch addressed that. Somehow two authors got lost on their way to the v5 patch. Please find attached fix-authors.patch, which also applies on top of training_wheels_005.patch to restore them.

It would be really nice to be able to set this to 'Ready for Committer' in this CF. Do you want to post a v6 patch, or are you happy for me to ask a committer to look at v5 + these three corrections?
--
Thomas Munro
http://www.enterprisedb.com

fix-regression-test.patch
Description: Binary data

fix-docs.patch
Description: Binary data

fix-authors.patch
Description: Binary data
Re: [HACKERS] Notice lock waits
On Fri, Sep 30, 2016 at 3:00 AM, Jeff Janes wrote:
> On Wed, Sep 28, 2016 at 11:57 PM, Haribabu Kommi
> wrote:
>>
>> Providing the details of a lock wait to the client is good. I feel this
>> message is useful for the cases where the user/administrator is trying
>> to perform some SQL operations.
>>
>> I also feel that adding a GUC variable for these logs to show them to
>> the user may not be good. Changing the existing GUC may be better.
>
> I don't think it would be a good idea to refactor the existing GUC
> (log_lock_waits) to accomplish this.
>
> There would have to be four states: log only, notice only, both log and
> notice, and neither. But non-superusers can't be allowed to change the
> log flag, only the notice flag. It is probably possible to implement that,
> but it seems complicated both to implement and to explain/document. I
> think that adding another GUC is better than greatly complicating an
> existing one.

Yes, understood. Changing the existing GUC would make it complex.

> What do you think of Jim Nasby's idea of making it a settable level,
> rather than just on or off?

I haven't clearly understood how the settable level would work here. Would the behavior differ based on log_min_messages or something similar?

The notification messages are good. If we are going to add this facility only for lock waits, then a simple GUC is enough. If we are going to extend it to other messages as well, then I'd prefer something like the log_statement GUC, taking some input from the user to decide which messages are sent to the user.

Regards,
Hari Babu
Fujitsu Australia
Re: [HACKERS] Hash Indexes
On Thu, Sep 29, 2016 at 8:53 PM, Peter Geoghegan wrote:
> On Fri, Sep 30, 2016 at 1:14 AM, Robert Haas wrote:
>>> I, for one, agree with this position.
>>
>> Well, I, for one, find it frustrating. It seems pretty unhelpful to
>> bring this up only after the code has already been written. The first
>> post on this thread was on May 10th. The first version of the patch
>> was posted on June 16th. This position was first articulated on
>> September 15th.
>
> Really, what do we have to lose at this point? It's not very difficult
> to do what Andres proposes.

Well, first of all, I can't, because I don't really understand what tests he has in mind. Maybe somebody else does, in which case perhaps they could do the work and post the results. If the tests really are simple, that shouldn't be much of a burden.

But, second, suppose we do the tests and find out that the hash-over-btree idea completely trounces hash indexes. What then? I don't think that would really prove anything because, as I said in my email to Andres, the current hash index code is severely under-optimized, so it's not really an apples-to-apples comparison. But even if it did prove something, is the idea then that Amit (with help from Mithun and Ashutosh Sharma) should throw away the ~8 months of development work that's been done on hash indexes in favor of starting all over with a new and probably harder project to build a whole new AM, and just leave hash indexes broken? That doesn't seem like a very reasonable thing to ask.

Leaving hash indexes broken fixes no problem that we have. On the other hand, applying those patches (after they've been suitably reviewed and fixed up) does fix several things. For one thing, we can stop shipping a totally broken feature in release after release. For another thing, those hash indexes do in fact outperform btree on some workloads, and with more work they can probably beat btree on more workloads.
And if somebody later wants to write hash-over-btree and that turns out to be better still, great! I'm not blocking anyone from doing that. The only argument that's been advanced for not fixing hash indexes is that we'd then have to give people accurate guidance on whether to choose hash or btree, but that would also be true of a hypothetical hash-over-btree AM.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Learning to hack Postgres - Keeping track of ctids
On 30 September 2016 at 04:15, Emrul wrote:
> Hi,
>
> I'm working on an idea to implement a graph database in Postgres. At the
> moment it's just a learning exercise.
>
> What I'd like to do is store a reference to all the links from one record
> using an array type that stores links to all related tables.
>
> At first, I succeeded in doing this using primary key ids and this works
> fine. However, I'd like to be able to bypass the index lookup altogether
> by storing the ctids in my array instead of the primary key ids.
>
> Trouble of course is that ctids can get changed (for instance by
> vacuuming). So my question is: how can I keep my ctid references up to
> date - is there any way to detect when a ctid is changed?

I expect that you'd have to teach the heapam code, vacuum, etc to do the required housekeeping.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hash Indexes
On Thu, Sep 29, 2016 at 8:29 PM, Andres Freund wrote:
> On September 29, 2016 5:28:00 PM PDT, Robert Haas
> wrote:
>> On Thu, Sep 29, 2016 at 8:16 PM, Andres Freund
>> wrote:
>>>> Well, I, for one, find it frustrating. It seems pretty unhelpful to
>>>> bring this up only after the code has already been written.
>>>
>>> I brought this up in person at pgcon too.
>>
>> To whom? In what context?
>
> Amit, over dinner.

OK, well, I can't really comment on that, then, except to say that if you waited three months to follow up on the mailing list, you probably can't blame Amit if he thought that it was more of a casual suggestion than a serious objection. Maybe it was? I don't know.

For my part, I don't really understand how you think that we could find anything out via relatively simple tests. The hash index code is horribly under-maintained, which is why Amit is able to get large performance improvements out of improving it. If you compare it to btree in some way, it's probably going to lose. But I don't think that answers the question of whether a hash AM that somebody's put some work into will win or lose against a hypothetical hash-over-btree AM that nobody's written. Even if it wins, is that really a reason to leave the hash index code itself in a state of disrepair? We probably would have removed it already except that the infrastructure is used for hash joins and hash aggregation, so we really can't.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Hash Indexes
On Fri, Sep 30, 2016 at 1:14 AM, Robert Haas wrote:
>> I, for one, agree with this position.
>
> Well, I, for one, find it frustrating. It seems pretty unhelpful to
> bring this up only after the code has already been written. The first
> post on this thread was on May 10th. The first version of the patch
> was posted on June 16th. This position was first articulated on
> September 15th.

Really, what do we have to lose at this point? It's not very difficult to do what Andres proposes.

--
Peter Geoghegan
Re: [HACKERS] Hash Indexes
On Fri, Sep 30, 2016 at 1:29 AM, Andres Freund wrote: >>To whom? In what context? > > Amit, over dinner. In case it matters, I also talked to Amit about this privately. -- Peter Geoghegan
Re: [HACKERS] Hash Indexes
On September 29, 2016 5:28:00 PM PDT, Robert Haas wrote: >On Thu, Sep 29, 2016 at 8:16 PM, Andres Freund >wrote: >>> Well, I, for one, find it frustrating. It seems pretty unhelpful to >>> bring this up only after the code has already been written. >> >> I brought this up in person at pgcon too. > >To whom? In what context? Amit, over dinner. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [HACKERS] Hash Indexes
On Thu, Sep 29, 2016 at 8:16 PM, Andres Freund wrote: >> Well, I, for one, find it frustrating. It seems pretty unhelpful to >> bring this up only after the code has already been written. > > I brought this up in person at pgcon too. To whom? In what context? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Hash Indexes
On 2016-09-29 20:14:40 -0400, Robert Haas wrote: > On Thu, Sep 29, 2016 at 8:07 PM, Peter Geoghegan wrote: > > On Wed, Sep 28, 2016 at 8:06 PM, Andres Freund wrote: > >> On 2016-09-28 15:04:30 -0400, Robert Haas wrote: > >>> Andres already > >>> stated that he thinks working on btree-over-hash would be more > >>> beneficial than fixing hash, but at this point it seems like he's the > >>> only one who takes that position. > >> > >> Note that I did *NOT* take that position. I was saying that I think we > >> should evaluate whether that's not a better approach, doing some simple > >> performance comparisons. > > > > I, for one, agree with this position. > > Well, I, for one, find it frustrating. It seems pretty unhelpful to > bring this up only after the code has already been written. I brought this up in person at pgcon too.
Re: [HACKERS] Hash Indexes
On Thu, Sep 29, 2016 at 8:07 PM, Peter Geoghegan wrote: > On Wed, Sep 28, 2016 at 8:06 PM, Andres Freund wrote: >> On 2016-09-28 15:04:30 -0400, Robert Haas wrote: >>> Andres already >>> stated that he thinks working on btree-over-hash would be more >>> beneficial than fixing hash, but at this point it seems like he's the >>> only one who takes that position. >> >> Note that I did *NOT* take that position. I was saying that I think we >> should evaluate whether that's not a better approach, doing some simple >> performance comparisons. > > I, for one, agree with this position. Well, I, for one, find it frustrating. It seems pretty unhelpful to bring this up only after the code has already been written. The first post on this thread was on May 10th. The first version of the patch was posted on June 16th. This position was first articulated on September 15th. But, by all means, please feel free to do the performance comparison and post the results. I'd be curious to see them myself. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Declarative partitioning - another take
On Thu, Sep 29, 2016 at 8:09 AM, Amit Langote wrote: > I removed DEPENDENCY_IGNORE. Does the following look good or am I still > missing something? You missed your commit message, but otherwise looks fine. >> Also, I think this should be rephrased a bit to be more clear about >> how the partitioning key works, like this: The optional >> PARTITION BY clause specifies a method of >> partitioning the table. The table thus created is called a >> partitioned table. The parenthesized list of >> expressions forms the partitioning key for the >> table. When using range partitioning, the partioning key can include >> multiple columns or expressions, but for list partitioning, the >> partitioning key must consist of a single column or expression. If no >> btree operator class is specified when creating a partitioned table, >> the default btree operator class for the datatype will be used. If >> there is none, an error will be reported. > > Revised text along these lines. It seems you copied my typo: my text has "partioning" for "partitioning" and your patch now has that, too. > Things were that way initially, that is, the parent relations had no > relfilenode. I abandoned that project however. The storage-less parent > thing seemed pretty daunting to me to handle right away. For example, > optimizer and the executor code needed to be taught about the parent rel > appendrel member that used to be included as part of the result of > scanning the inheritance set but was no longer. Even if we leave the empty relfilenode around for now -- in the long run I think it should die -- I think we should prohibit the creation of subsidiary objects on the parent which are only sensible if it has rows - e.g. indexes. It makes no sense to disallow non-inheritable constraints while allowing indexes, and it could box us into a corner later. >> +/* >> + * Run the expressions through eval_const_expressions. This is >> + * not just an optimization, but is necessary, because eventually >> + * the planner will be comparing them to similarly-processed qual >> + * clauses, and may fail to detect valid matches without this. >> + * We don't bother with canonicalize_qual, however. >> + */ >> >> I'm a bit confused by this, because I would think this processing >> ought to have been done before storing anything in the system >> catalogs. I don't see why it should be necessary to do it again after >> pulling data back out of the system catalogs. > The pattern matches what's done for other expressions that optimizer deals > with, such as CHECK, index key, and index predicate expressions. That's kind of a non-answer answer, but OK. Still, I think you shouldn't just copy somebody else's comment blindly into a new place. Reference the original comment, or write your own. >> How can it be valid to have no partitioning expressions? > Keys that are simply column names are resolved to attnums and stored > likewise. If some key is an expression, then corresponding attnum is 0 > and the expression itself is added to the list that gets stored into > partexprbin. It is doing the same thing as index expressions. Oh, right. Oops. >> +if (classform->relkind != relkind && >> +(relkind == RELKIND_RELATION && >> +classform->relkind != RELKIND_PARTITIONED_TABLE)) >> >> That's broken. Note that all of the conditions are joined using &&, >> so if any one of them fails then we won't throw an error. In >> particular, it's no longer possible to throw an error when relkind is >> not RELKIND_RELATION. > You are right. I guess it would have to be the following: > > +if ((classform->relkind != relkind && > + classform->relkind != RELKIND_PARTITIONED_TABLE) || > +(classform->relkind == RELKIND_PARTITIONED_TABLE && > + relkind != RELKIND_RELATION)) > > Such hackishness could not be helped because we have a separate DROP > command for every distinct relkind, except we overload DROP TABLE for both > regular and partitioned tables. Maybe this would be better: if (relkind == RELKIND_PARTITIONED_TABLE) syntax_relkind = RELKIND_RELATION; else syntax_relkind = relkind; if (classform->relkind != syntax_relkind) DropErrorMsgWrongType(rel->relname, classform->relkind, relkind); >> Why isn't this logic being invoked from transformCreateStmt()? Then >> we could use the actual parseState for the query instead of a fake >> one. > Because we need an open relation for it to work, which in this case there > won't be until after we have performed heap_create_with_catalog() in > DefineRelation(). Mainly because we need to perform transformExpr() on > expressions. That's similar to how cookConstraint() on the new CHECK > constraints cannot be performed earlier. Am I missing something? Hmm, yeah, I guess that's the same thing. I guess I got on this line of thinking because the function
Re: [HACKERS] Hash Indexes
On Wed, Sep 28, 2016 at 8:06 PM, Andres Freund wrote: > On 2016-09-28 15:04:30 -0400, Robert Haas wrote: >> Andres already >> stated that he thinks working on btree-over-hash would be more >> beneficial than fixing hash, but at this point it seems like he's the >> only one who takes that position. > > Note that I did *NOT* take that position. I was saying that I think we > should evaluate whether that's not a better approach, doing some simple > performance comparisons. I, for one, agree with this position. -- Peter Geoghegan
Re: [HACKERS] [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"
Alvaro Herrera writes: > Tom Lane wrote: >> I wouldn't even put a lot of faith in the errno being meaningful, >> considering that it does close() calls before capturing the errno. > So we do close() in a bunch of places while closing shop, which calls > _close() on Windows; this function sets errno. But only on failure, no? The close()s usually shouldn't fail, and therefore shouldn't change errno, it's just that you can't trust that 100%. I think likely what's happening is that we're seeing a leftover value from some previous syscall that set GetLastError's result (and, presumably, wasn't fatal so far as pg_upgrade was concerned). regards, tom lane
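The pattern being discussed (close() clobbering errno before it is captured) is conventionally avoided by saving errno at the failure site. A minimal hypothetical sketch, not the actual pg_upgrade code:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Illustration of the errno-preservation pattern under discussion:
 * capture errno at the point of failure, before cleanup calls such
 * as close() can overwrite it. */
int
read_first_block(const char *path, char *buf, size_t len)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;              /* errno already set by open() */

    if (read(fd, buf, len) < 0)
    {
        int save_errno = errno; /* capture before close() */

        close(fd);              /* close() may change errno */
        errno = save_errno;     /* restore for the caller */
        return -1;
    }
    close(fd);
    return 0;
}
```

With this shape, the caller can report strerror(errno) for the read failure even though a close() intervened.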
Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)
Michael Paquier writes: > Oops. Are you using gcc to see that? I compiled with clang and on > Windows without noticing it. Peter already noted that you'd only see it on platforms that define PG_FLUSH_DATA_WORKS. regards, tom lane
Re: [HACKERS] [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"
Alvaro Herrera writes: > Moreover I think getErrorText() as a whole is misconceived and should be > removed altogether (why pstrdup the string?). Indeed. I think bouncing the error back to the caller is misguided to start with, seeing that the caller is just going to do pg_fatal anyway. We should rewrite these functions to just error out internally, which will make it much easier to provide decent error reporting indicating which call failed. regards, tom lane
Re: [HACKERS] [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"
I wrote: > But what gets my attention in this connection is that it doesn't > seem to be taking the trouble to open the files in binary mode. > Could that lead to the reported failure? Not sure, but it seems > like at the least it could result in corrupted VM files. On further thought, it definitely would lead to some sort of hard-to-diagnose error. Assuming there are some byte pairs that look like \r\n, Windows read() would collapse them to \n, which not only produces garbage vismap data but throws off the page boundaries in the file. rewriteVisibilityMap() would not notice that, since it contains exactly no sanity checking on the page headers, but what it would notice is getting a short read on what it'll think is a partial last page. Then it does this: if ((bytesRead = read(src_fd, buffer, BLCKSZ)) != BLCKSZ) { close(dst_fd); close(src_fd); return getErrorText(); } The read() won't have set errno, since by its lights there's nothing wrong. So while we know that getErrorText saw errno == EINVAL, there's no way to guess what call that might've been left over from. BTW, just to add to the already long list of mortal sins in this bit of code, I believe it does 100% the wrong thing on big-endian hardware. uint16 new_vmbits = 0; ... /* Copy new visibility map bit to new format page */ memcpy(new_cur, &new_vmbits, BITS_PER_HEAPBLOCK); That's going to drop the two new bytes down in exactly the wrong order if big-endian. Oh, and for even more fun, this bit will dump core on alignment-picky machines, if the vmbuf local array starts on an odd boundary: ((PageHeader) new_vmbuf)->pd_checksum = pg_checksum_page(new_vmbuf, new_blkno); We might've caught these things earlier if the buildfarm testing included cross-version upgrades, but of course that requires a lot of test infrastructure that's not there ... regards, tom lane
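The byte-order hazard described above (memcpy of a uint16 emits bytes in host order) is conventionally avoided by storing each byte explicitly. A hedged sketch, assuming for illustration that the on-disk format wants the low-order byte first:

```c
#include <stdint.h>

/* Write a 16-bit value into a byte buffer low-order byte first,
 * regardless of host endianness, instead of memcpy'ing the raw
 * uint16 (which would emit the bytes in host order and thus come
 * out backwards on big-endian hardware). */
void
put_uint16_le(uint8_t *dest, uint16_t val)
{
    dest[0] = (uint8_t) (val & 0xFF);
    dest[1] = (uint8_t) (val >> 8);
}
```

The shifts and masks operate on values, not memory, so the result is identical on little- and big-endian machines; an explicit byte store also sidesteps any alignment requirement.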
Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)
On Fri, Sep 30, 2016 at 1:31 AM, Jeff Janes wrote: > On Thu, Sep 29, 2016 at 8:33 AM, Peter Eisentraut > wrote: >> >> On 9/26/16 10:34 PM, Michael Paquier wrote: >> > I thought that as long as the error string is shown to the user, it >> > does not matter much if errno is still saved or not. All the callers >> > of durable_rename() on frontends don't check for strerror(errno) >> > afterwards. Do you think it matters? Changing that back is trivial. >> > Sorry for not mentioning directly in the thread that I modified that >> > when dropping the last patch set. >> >> Actually, I think the equivalent backend code only does this to save the >> errno across the close call because the elog call comes after the close. >> So it's OK to not do that in the frontend. >> >> With that in mind, I have committed the v3 series now. Thanks! > I'm getting compiler warnings: > > file_utils.c: In function 'fsync_pgdata': > file_utils.c:86: warning: passing argument 2 of 'walkdir' from incompatible > pointer type > file_utils.c:36: note: expected 'int (*)(const char *, bool, const char *)' > but argument is of type 'void (*)(const char *, bool, const char *)' > file_utils.c:88: warning: passing argument 2 of 'walkdir' from incompatible > pointer type > file_utils.c:36: note: expected 'int (*)(const char *, bool, const char *)' > but argument is of type 'void (*)(const char *, bool, const char *)' > file_utils.c:89: warning: passing argument 2 of 'walkdir' from incompatible > pointer type > file_utils.c:36: note: expected 'int (*)(const char *, bool, const char *)' > but argument is of type 'void (*)(const char *, bool, const char *)' Oops. Are you using gcc to see that? I compiled with clang and on Windows without noticing it. -- Michael
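The warnings come from passing callbacks whose return type does not match the function-pointer type walkdir() expects. A reduced illustration with hypothetical names, not the actual file_utils.c code, mirroring the signature shown in the warning text:

```c
#include <stdbool.h>

/* walkdir() here expects a callback returning int, matching the
 * "expected 'int (*)(const char *, bool, const char *)'" in the
 * warning above. */
typedef int (*walkdir_action) (const char *fname, bool isdir,
                               const char *progname);

/* Declaring the callback with a void return type is what makes gcc
 * emit "passing argument 2 of 'walkdir' from incompatible pointer
 * type"; declaring it to return int fixes the mismatch. */
int
fsync_entry(const char *fname, bool isdir, const char *progname)
{
    (void) fname;
    (void) isdir;
    (void) progname;
    return 0;                   /* report success */
}

int
walkdir(const char *path, walkdir_action action, const char *progname)
{
    /* real code would recurse over path, applying action to each entry */
    return action(path, true, progname);
}
```

Clang is simply quieter about this particular mismatch by default, which would explain why it went unnoticed there.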
Re: [HACKERS] [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"
Tom Lane wrote: > Thomas Kellerer writes: > > for some reason pg_upgrade failed on Windows 10 for me, with an error > > message that one specific _vm file couldn't be copied. > > Hmm ... a _vm file would go through rewriteVisibilityMap(), which is new > code for 9.6 and hasn't really gotten that much testing. Its error > reporting is shamefully bad --- you can't tell which step failed, and > I wouldn't even put a lot of faith in the errno being meaningful, > considering that it does close() calls before capturing the errno. So we do close() in a bunch of places while closing shop, which calls _close() on Windows; this function sets errno. Then we call getErrorText(), which calls _dosmaperr() on the result of GetLastError(). But the last-error stuff is not set by _close; I suppose GetLastError() returns 0 in that case, which prompts _dosmaperr to set errno to 0. http://stackoverflow.com/questions/20056851/getlasterror-errno-formatmessagea-and-strerror-s So this wouldn't quite have the effect you say; I think it'd say "Failure while copying ...: Success" instead. However surely we should have errno save/restore. Other than that, I think the _dosmaperr() call should go entirely. Moreover I think getErrorText() as a whole is misconceived and should be removed altogether (why pstrdup the string?). There are very few places in pg_upgrade that require _dosmaperr; I can see only copyFile and linkFile. All the others should just be doing strerror() only, at least according to the manual. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"
Thomas Kellerer writes: > for some reason pg_upgrade failed on Windows 10 for me, with an error message > that one specific _vm file couldn't be copied. Hmm ... a _vm file would go through rewriteVisibilityMap(), which is new code for 9.6 and hasn't really gotten that much testing. Its error reporting is shamefully bad --- you can't tell which step failed, and I wouldn't even put a lot of faith in the errno being meaningful, considering that it does close() calls before capturing the errno. But what gets my attention in this connection is that it doesn't seem to be taking the trouble to open the files in binary mode. Could that lead to the reported failure? Not sure, but it seems like at the least it could result in corrupted VM files. Has anyone tested vismap upgrades on Windows, and made an effort to validate that the output wasn't garbage? regards, tom lane
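Opening in binary mode means passing O_BINARY on Windows; PostgreSQL wraps this as the PG_BINARY macro, which expands to 0 on Unix. A minimal sketch of the idea, with a hypothetical helper name:

```c
#include <fcntl.h>

/* PG_BINARY-style portability shim: O_BINARY only exists on Windows
 * (and a few other C runtimes), so define it away elsewhere. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Open a data file with no newline translation, so byte sequences
 * that happen to look like line endings in page data are read back
 * exactly as stored rather than being rewritten by the C runtime's
 * text mode. */
int
open_data_file(const char *path)
{
    return open(path, O_RDONLY | O_BINARY, 0);
}
```

On Unix the extra flag is a no-op, so the same call works on every platform; only on Windows does omitting it silently corrupt binary reads.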
[HACKERS] Congrats on the on-time release!
Hackers, I wanted to congratulate everyone involved (and it's a long list of people) in having our first on-schedule major release since 9.3. Especially the RMT, which did a lot to make that happen. Getting the release train to run on time is a major accomplishment, and will help both development and adoption. -- Josh Berkus Red Hat OSAS (any opinions are my own)
Re: [HACKERS] Order of operations in SubPostmasterMain()
On 2016-09-29 15:46:00 -0400, Tom Lane wrote: > I noticed that buildfarm member culicidae, which is running an > EXEC_BACKEND build on Linux, occasionally falls over like this: > > FATAL: could not reattach to shared memory (key=6280001, > addr=0x7fa9df845000): Invalid argument > > That's probably because Andres failed to disable ASLR on that machine, Intentionally so, FWIW. Want to enable PIE as well soon-ish. Newer versions of windows / msvc want to enable ASLR by default, and I think that'll break parallel query bgworkers, because they pass the entry point around as a pointer. So I don't think we'll be able to duck this issue for much longer. > but the exact cause isn't too important right now. What I'm finding > distressing is that that message doesn't show up on the client side, > making it look like a postmaster crash: Indeed. > The reason we don't see anything is that backend libpq hasn't been > initialized yet, so it won't send the ereport message to the client. > > The original design of SubPostmasterMain() intended that such cases > would be reported, because it runs BackendInitialize (which calls > pq_init()) before CreateSharedMemoryAndSemaphores (which is where > the reattach used to happen --- the comments at postmaster.c 4730ff > and 4747ff reflect this expectation). We broke it when we added > the earlier reattach call at lines 4669ff. > > We could probably refactor things enough so that we do pq_init() > before PGSharedMemoryReAttach(). It would be a little bit ugly, > and it would fractionally increase the chance of a reattach failure > because pq_init() palloc's a few KB worth of buffers. I'm not quite > sure if it's worth it; thoughts? In any case the mentioned comments > are obsolete and need to be moved/rewritten. Hm. It doesn't really seem necessary to directly allocate that much. Seems pretty harmless to essentially let it start at 0 bytes (so we can repalloc, but don't use a lot of memory)? 
> Another thing that I'm finding distressing while I look at this is > the placement of ClosePostmasterPorts(). That needs to happen *early*, > because as long as the child process is holding open the postmaster side > of the postmaster death pipe, we are risking unhappiness. Not only is > it not particularly early, it's after process_shared_preload_libraries() > which could invoke arbitrarily stupid stuff. Whether or not we do > anything about moving pq_init(), I'm thinking we'd better move the > ClosePostmasterPorts() call so that it's done as soon as possible, > probably right after read_backend_variables() which loads the data > it needs. That seems like a good idea. Besides the benefit you mention, it looks like that'll also reduce duplication. Greetings, Andres Freund
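The "start at 0 bytes and repalloc" idea suggested above can be sketched as a buffer that defers all allocation until first use. A hedged, hypothetical stand-in using malloc/realloc rather than the backend's palloc/repalloc:

```c
#include <stdlib.h>
#include <string.h>

/* Lazily-grown buffer: zero capacity until data actually arrives,
 * so creating one costs nothing up front. */
typedef struct
{
    char   *data;
    size_t  len;
    size_t  cap;
} GrowBuf;

void
growbuf_append(GrowBuf *buf, const char *bytes, size_t n)
{
    if (buf->len + n > buf->cap)
    {
        size_t newcap = buf->cap ? buf->cap * 2 : 64;

        while (newcap < buf->len + n)
            newcap *= 2;
        /* realloc(NULL, ...) behaves like malloc, so the first append
         * performs the initial allocation */
        buf->data = realloc(buf->data, newcap);
        buf->cap = newcap;
    }
    memcpy(buf->data + buf->len, bytes, n);
    buf->len += n;
}
```

Doubling keeps appends amortized O(1), and a buffer that is never written never allocates, which is the property that would shrink the pre-reattach footprint.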
Re: [HACKERS] Order of operations in SubPostmasterMain()
Greg Stark writes: > On Thu, Sep 29, 2016 at 8:46 PM, Tom Lane wrote: >> We could probably refactor things enough so that we do pq_init() >> before PGSharedMemoryReAttach(). It would be a little bit ugly, >> and it would fractionally increase the chance of a reattach failure >> because pq_init() palloc's a few KB worth of buffers. I'm not quite >> sure if it's worth it; thoughts? In any case the mentioned comments >> are obsolete and need to be moved/rewritten. > Alternately we could call pq_init in the error path if it hasn't been > called yet. I'm sure there are problems with doing that in general but > for the specific errors that can happen before pq_init it might be > feasible since they obviously can't have very much shared state yet to > have corrupted. Thanks for the suggestion, but I don't think it helps much. Most of the refactoring pain comes from the fact that we'd have to set up MyProcPort earlier (since pqcomm.c needs that to be valid), and it's not very clear just where to do that. Having pq_init be called behind the scenes does nothing to fix that issue. regards, tom lane
[HACKERS] Learning to hack Postgres - Keeping track of ctids
Hi, I'm working on an idea to implement a graph database in Postgres. At the moment it's just a learning exercise. What I'd like to do is store a reference to all the links from one record using an array type that stores links to all related tables. At first, I've succeeded in doing this using primary key Ids and this works fine. However, I'd like to be able to bypass the index lookup altogether by storing the ctids in my array instead of the primary key ids. Trouble of course is that ctids can get changed (like for instance vacuuming). So my question is: how can I keep my ctid references up to date - is there any way to detect when a ctid is changed?
Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability
On 9/29/16 12:31 PM, Jeff Janes wrote: > With that in mind, I have committed the v3 series now. > > > I'm getting compiler warnings: Fixed. > > file_utils.c: In function 'fsync_pgdata': > file_utils.c:86: warning: passing argument 2 of 'walkdir' from > incompatible pointer type > file_utils.c:36: note: expected 'int (*)(const char *, bool, const char > *)' but argument is of type 'void (*)(const char *, bool, const char *)' > file_utils.c:88: warning: passing argument 2 of 'walkdir' from > incompatible pointer type > file_utils.c:36: note: expected 'int (*)(const char *, bool, const char > *)' but argument is of type 'void (*)(const char *, bool, const char *)' > file_utils.c:89: warning: passing argument 2 of 'walkdir' from > incompatible pointer type > file_utils.c:36: note: expected 'int (*)(const char *, bool, const char > *)' but argument is of type 'void (*)(const char *, bool, const char *)' -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Order of operations in SubPostmasterMain()
On Thu, Sep 29, 2016 at 8:46 PM, Tom Lane wrote: > We could probably refactor things enough so that we do pq_init() > before PGSharedMemoryReAttach(). It would be a little bit ugly, > and it would fractionally increase the chance of a reattach failure > because pq_init() palloc's a few KB worth of buffers. I'm not quite > sure if it's worth it; thoughts? In any case the mentioned comments > are obsolete and need to be moved/rewritten. Just speaking off the cuff without reviewing the code in detail... Alternately we could call pq_init in the error path if it hasn't been called yet. I'm sure there are problems with doing that in general but for the specific errors that can happen before pq_init it might be feasible since they obviously can't have very much shared state yet to have corrupted. -- greg
Re: [HACKERS] pageinspect: Hash index support
On 9/29/16 4:00 PM, Peter Eisentraut wrote: > Since the commit fest is drawing to a close, I'll set this patch as > returned with feedback. Actually, the CF app informs me that moving to the next CF is more appropriate, so I have done that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pageinspect: Hash index support
I think we should look into handling the different page types better. The hash_page_stats function was copied from btree, which only has one type. It's not clear whether all the values apply to each page type. At least they should be null if they don't apply. BRIN has a separate function for each page type, which might make more sense. I suggest taking the test suite that I posted and expanding the tests so that we see output for each different page type. Besides that, I would still like better data types for some of the output columns, as previously discussed. In addition to what I already pointed out, the ctid column can be of type ctid instead of text. Since the commit fest is drawing to a close, I'll set this patch as returned with feedback. Please continue working on it, since there is clearly renewed interest in hash indexes, and we'll need this type of functionality for that. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
[HACKERS] Order of operations in SubPostmasterMain()
I noticed that buildfarm member culicidae, which is running an EXEC_BACKEND build on Linux, occasionally falls over like this: FATAL: could not reattach to shared memory (key=6280001, addr=0x7fa9df845000): Invalid argument That's probably because Andres failed to disable ASLR on that machine, but the exact cause isn't too important right now. What I'm finding distressing is that that message doesn't show up on the client side, making it look like a postmaster crash: *** /home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/../pgsql/contrib/intarray/expected/_int.out 2016-09-15 22:02:39.512951557 + --- /home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/contrib/intarray/results/_int.out 2016-09-29 17:52:56.921948612 + *** *** 1,568 ! lots ... --- 1,3 ! psql: server closed the connection unexpectedly ! This probably means the server terminated abnormally ! before or while processing the request. The reason we don't see anything is that backend libpq hasn't been initialized yet, so it won't send the ereport message to the client. The original design of SubPostmasterMain() intended that such cases would be reported, because it runs BackendInitialize (which calls pq_init()) before CreateSharedMemoryAndSemaphores (which is where the reattach used to happen --- the comments at postmaster.c 4730ff and 4747ff reflect this expectation). We broke it when we added the earlier reattach call at lines 4669ff. We could probably refactor things enough so that we do pq_init() before PGSharedMemoryReAttach(). It would be a little bit ugly, and it would fractionally increase the chance of a reattach failure because pq_init() palloc's a few KB worth of buffers. I'm not quite sure if it's worth it; thoughts? In any case the mentioned comments are obsolete and need to be moved/rewritten. Another thing that I'm finding distressing while I look at this is the placement of ClosePostmasterPorts(). 
That needs to happen *early*, because as long as the child process is holding open the postmaster side of the postmaster death pipe, we are risking unhappiness. Not only is it not particularly early, it's after process_shared_preload_libraries() which could invoke arbitrarily stupid stuff. Whether or not we do anything about moving pq_init(), I'm thinking we'd better move the ClosePostmasterPorts() call so that it's done as soon as possible, probably right after read_backend_variables() which loads the data it needs. Comments? regards, tom lane
Re: [HACKERS] pageinspect: Hash index support
I wrote some tests for pageinspect, attached here (hash stuff in a separate patch). I think all the output numbers ought to be deterministic (I excluded everything that might contain xids). Please test. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services Attachments: 0001-Add-tests-for-pageinspect.patch, 0002-Add-tests-for-pageinspect-hash.patch
Re: [HACKERS] PL/Python adding support for multi-dimensional arrays
On 09/23/2016 10:27 PM, Jim Nasby wrote: On 9/23/16 2:42 AM, Heikki Linnakangas wrote: How do we handle single-dimensional arrays of composite types at the moment? At a quick glance, it seems that the composite types are just treated like strings, when they're in an array. That's probably OK, but it means that there's nothing special about composite types in multi-dimensional arrays. In any case, we should mention that in the docs. That is how they're handled, but I'd really like to change that. I've held off because I don't know how to handle the backwards incompatibility that would introduce. (I've been wondering if we might add a facility to allow specifying default TRANSFORMs that should be used for specific data types in specific languages.) The converse case (a composite with arrays) suffers the same problem (array is just treated as a string). I take that back, I don't know what I was talking about. Without this patch, an array of composite types can be returned, using any of the three representations for the composite type explained in the docs: a string, a sequence, or a dictionary. So, all these work, and return the same value: create table foo (a int4, b int4); CREATE FUNCTION comp_array_string() RETURNS foo[] AS $$ return ["(1, 2)"] $$ LANGUAGE plpythonu; CREATE FUNCTION comp_array_sequence() RETURNS foo[] AS $$ return [[1, 2]] $$ LANGUAGE plpythonu; CREATE FUNCTION comp_array_dict() RETURNS foo[] AS $$ return [{"a": 1, "b": 2}] $$ LANGUAGE plpythonu; Jim, I was confused, but you agreed with me. Were you also confused, or am I missing something? Now, back to multi-dimensional arrays. I can see that the Sequence representation is problematic, with arrays, because if you have a python list of lists, like [[1, 2]], it's not immediately clear if that's a one-dimensional array of tuples, or two-dimensional array of integers. Then again, we do have the type definitions available. So is it really ambiguous? 
The string and dict representations don't have that ambiguity at all, so I don't see why we wouldn't support those, at least. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch: Implement failover on libpq connect level.
This patch does not apply on the latest code. It fails as follows: libpq-failover-9.patch:176: trailing whitespace. thread.o pgsleep.o libpq-failover-9.patch:184: trailing whitespace. check: libpq-failover-9.patch:185: trailing whitespace. $(prove_check) libpq-failover-9.patch:186: trailing whitespace. libpq-failover-9.patch:194: trailing whitespace. rm -rf tmp_check error: patch failed: doc/src/sgml/libpq.sgml:792 error: doc/src/sgml/libpq.sgml: patch does not apply error: patch failed: src/interfaces/libpq/Makefile:36 error: src/interfaces/libpq/Makefile: patch does not apply error: patch failed: src/interfaces/libpq/fe-connect.c:299 error: src/interfaces/libpq/fe-connect.c: patch does not apply error: patch failed: src/interfaces/libpq/libpq-fe.h:62 error: src/interfaces/libpq/libpq-fe.h: patch does not apply error: patch failed: src/interfaces/libpq/libpq-int.h:334 error: src/interfaces/libpq/libpq-int.h: patch does not apply error: patch failed: src/interfaces/libpq/test/expected.out:1 error: src/interfaces/libpq/test/expected.out: patch does not apply error: patch failed: src/test/perl/PostgresNode.pm:398 error: src/test/perl/PostgresNode.pm: patch does not apply On Tue, Sep 27, 2016 at 2:49 PM, Victor Wagner wrote: 1). >* if there is more than one host in the connect string and >* target_server_type is not specified explicitly, set >* target_server_type to "master", because default mode of >* operation is failover, and so, we need to connect to RW >* host, and keep other nodes of the cluster in the connect >* string just in case master would fail and one of standbys >* would be promoted to master. I am slightly confused. As per this, target_server_type=master means the user is expecting failover. What if the user passes target_server_type=any and requests a sequential connection; isn't this also a case for failover? I think by default it should be "any" for any number of hosts, and the parameters "sequential" and "random" will decide whether we want just failover or failover with load balancing.
2). >For some reason DNS resolving was concentrated in one place before my >changes. So, I've tried to not change this decision. My intention was not to have a replacement function for "pg_getaddrinfo_all"; I just suggested having a local function which calls pg_getaddrinfo_all for every host/port pair read earlier. That way we need not maintain the nodes struct. This also reduces the complexity of connectDBStart. FUNC (host, port, addrs) { CALL pg_getaddrinfo_all(host, port, newaddrs); addrs->ai_next = newaddrs; } 3). I think you should run a spellcheck once. Also, there are some formatting issues with respect to comments and curly braces of controlled blocks which need to be fixed. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com
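The helper shape Mithun sketches above can be made concrete. The following is a minimal sketch only, using a stubbed resolver in place of pg_getaddrinfo_all() and stand-in struct and function names; it is not libpq's actual API, it just illustrates chaining each host/port pair's results onto one ai_next list so the caller needs no separate nodes struct:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for struct addrinfo; only the ai_next chaining matters here. */
struct fake_addrinfo
{
    char host[64];
    char port[8];
    struct fake_addrinfo *ai_next;
};

/* Stub resolver standing in for pg_getaddrinfo_all(): returns a
 * single-node list for the given host/port pair. */
static struct fake_addrinfo *
stub_getaddrinfo_all(const char *host, const char *port)
{
    struct fake_addrinfo *a = calloc(1, sizeof(*a));

    strncpy(a->host, host, sizeof(a->host) - 1);
    strncpy(a->port, port, sizeof(a->port) - 1);
    return a;
}

/* The suggested helper: resolve one host/port pair and append the result
 * to the chain built so far, so connection startup only ever walks a
 * single ai_next chain. */
static struct fake_addrinfo *
append_resolved(struct fake_addrinfo *addrs, const char *host, const char *port)
{
    struct fake_addrinfo *newaddrs = stub_getaddrinfo_all(host, port);
    struct fake_addrinfo *tail;

    if (addrs == NULL)
        return newaddrs;
    for (tail = addrs; tail->ai_next != NULL; tail = tail->ai_next)
        ;
    tail->ai_next = newaddrs;
    return addrs;
}

/* Length of the chain, for checking the result. */
static int
addr_chain_length(struct fake_addrinfo *a)
{
    int n = 0;

    for (; a != NULL; a = a->ai_next)
        n++;
    return n;
}
```

Called once per host/port pair parsed from the connect string, this leaves the caller with one flat list to try in order.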
Re: [HACKERS] Let file_fdw access COPY FROM PROGRAM
Corey Huinker writes: > [ file_fdw_program_v3.diff ] Pushed with cosmetic adjustments, mostly more work on the comments and documentation. I did not push the proposed test case; it's unportable. The upthread suggestion to add a TAP test would have been all right, because --enable-tap-tests requires Perl, but the basic regression tests must not. I'm a bit dubious that it'd be worth the work to create such a test anyway, when COPY FROM PROGRAM itself hasn't got one. What *would* be worth some effort is allowing ANALYZE on a file_fdw table reading from a program. I concur that that probably can't be the default behavior, but always falling back to the 10-block default with no pg_stats stats is a really horrid prospect. One idea is to invent another table-level FDW option "analyze". If we could make that default to true for files and false for programs, it'd preserve the desired default behavior, but it would add a feature for plain files too: if they're too unstable to be worth analyzing, you could turn it off. Another thought is that maybe manual ANALYZE should go through in any case, and the FDW option would only be needed to control auto-analyze. Although I'm not sure what to think about scripted cases like vacuumdb --analyze. Maybe we'd need two flags, one permitting explicit ANALYZE and one for autoanalyze, which could have different defaults. Another thing that felt a little unfinished was the cost estimation behavior. Again, it's not clear how to do any better by default, but is it worth the trouble to provide an FDW option to let the user set the cost estimate for reading the table? I'm not sure honestly. Since there's only one path for the FDW itself, the cost estimate doesn't matter in simple cases, and I'm not sure how much it matters even in more complicated ones. It definitely sucks that we don't have a rows estimate that has anything to do with reality, but allowing ANALYZE would be enough to handle that.
regards, tom lane
Re: [HACKERS] Notice lock waits
On Wed, Sep 28, 2016 at 11:57 PM, Haribabu Kommi wrote: > > > On Sat, Aug 6, 2016 at 3:00 AM, Jeff Janes wrote: > >> One time too many, I ran some minor change using psql on a production >> server and was wondering why it was taking so much longer than it did >> on the test server. Only to discover, after messing around with >> opening new windows and running queries against pg_stat_activity and >> pg_locks and so on, that it was waiting for a lock. >> >> So I created a new guc, notice_lock_waits, which acts like >> log_lock_waits but sends the message as NOTICE so it will show up on >> interactive connections like psql. >> >> I turn it on in my .psqlrc, as it doesn't make much sense for me to >> turn it on in non-interactive sessions. >> >> A general facility for promoting selected LOG messages to NOTICE would >> be nice, but I don't know how to design or implement that. This is >> much easier, and I find it quite useful. >> >> I have it PGC_SUSET because it does send some tiny amount of >> information about the blocking process (the PID) to the blocked >> process. That is probably too paranoid, because the PID can be seen >> by anyone in the pg_locks table anyway. >> >> Do you think this is useful and generally implemented in the correct >> way? If so, I'll try to write some sgml documentation for it. >> > > > Providing the details of the lock wait to the client is good. I feel this > message > is useful for the cases where a user/administrator is trying to perform some > SQL operations. > > I also feel that adding a GUC variable for these logs to show them to the > user may not be good. Changing the existing GUC may be better. > I don't think it would be a good idea to refactor the existing GUC (log_lock_waits) to accomplish this. There would have to be four states: log only, notice only, both log and notice, and neither. But non-superusers can't be allowed to change the log flag, only the notice flag.
It is probably possible to implement that, but it seems complicated both to implement and to explain/document. I think that adding another GUC is better than greatly complicating an existing one. What do you think of Jim Nasby's idea of making it a settable level, rather than just on or off? Thanks, Jeff
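The four-state complication Jeff describes can be made concrete with a toy sketch. The flag names and the permission check below are hypothetical, not server code; the point is that a single combined GUC would need a per-bit permission rule, since only superusers may affect what goes to the server log:

```c
#include <assert.h>

/* Hypothetical bits a combined lock-wait GUC would have to encode. */
#define LW_LOG    0x1           /* send message to the server log */
#define LW_NOTICE 0x2           /* send message to the client as NOTICE */

/* Would this GUC change be allowed? Flipping the LOG bit must be
 * restricted to superusers, while the NOTICE bit is freely settable.
 * Needing a rule like this per-bit is exactly what makes folding both
 * behaviors into one GUC awkward to implement and to document. */
static int
lock_wait_guc_change_allowed(int oldval, int newval, int is_superuser)
{
    if (((oldval ^ newval) & LW_LOG) && !is_superuser)
        return 0;               /* would change the log flag: refused */
    return 1;
}
```

A separate notice_lock_waits GUC avoids the problem entirely, because the two flags then carry their own, independent permission levels.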
Re: [HACKERS] Tuplesort merge pre-reading
On Thu, Sep 29, 2016 at 11:38 AM, Peter Geoghegan wrote: > On Thu, Sep 29, 2016 at 2:59 PM, Robert Haas wrote: >>> Maybe that was the wrong choice of words. What I mean is that it seems >>> somewhat unprincipled to give over an equal share of memory to each >>> active-at-least-once tape, ... >> >> I don't get it. If the memory is being used for prereading, then the >> point is just to avoid doing many small I/Os instead of one big I/O, >> and that goal will be accomplished whether the memory is equally >> distributed or not; indeed, it's likely to be accomplished BETTER if >> the memory is equally distributed than if it isn't. > > I think it could hurt performance if preloading loads runs on a tape > that won't be needed until some subsequent merge pass, in preference > to using that memory proportionately, giving more to larger input runs > for *each* merge pass (giving memory proportionate to the size of each > run to be merged from each tape). For tapes with a dummy run, the > appropriate amount of memory for their next merge pass is zero. OK, true. But I still suspect that unless the amount of data you need to read from a tape is zero or very small, the size of the buffer doesn't matter. For example, if you have a 1GB tape and a 10GB tape, I doubt there's any benefit in making the buffer for the 10GB tape 10x larger. They can probably be the same. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
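The two policies under discussion can be contrasted with toy allocators; this is a sketch only, not tuplesort.c's code. With Robert's example of a 1GB and a 10GB tape sharing, say, 22 units of preread memory, the proportional scheme gives 2 and 20 while the equal scheme gives 11 each, and a tape holding only a dummy run (size 0) gets nothing under the proportional scheme:

```c
#include <assert.h>

/* Equal shares: every active tape gets the same slice of 'total'. */
static void
distribute_equal(long total, int ntapes, long out[])
{
    int i;

    for (i = 0; i < ntapes; i++)
        out[i] = total / ntapes;
}

/* Proportional shares: each tape's slice is proportional to the size of
 * the run it will feed into the next merge pass; dummy runs (size 0)
 * get no memory. */
static void
distribute_proportional(long total, int ntapes, const long runsize[], long out[])
{
    long sum = 0;
    int i;

    for (i = 0; i < ntapes; i++)
        sum += runsize[i];
    for (i = 0; i < ntapes; i++)
        out[i] = (sum > 0) ? total * runsize[i] / sum : 0;
}
```

Robert's point is that once every buffer is large enough to make I/O sequential, the extra bookkeeping of the proportional scheme may not buy anything.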
Re: [HACKERS] Tracking wait event for latches
On Wed, Sep 28, 2016 at 9:40 PM, Michael Paquier wrote: > On Wed, Sep 28, 2016 at 9:45 PM, Robert Haas wrote: >> On Wed, Sep 28, 2016 at 8:38 AM, Michael Paquier >> wrote: >>> So should I change back the patch to have only one argument for the >>> eventId, and guess classId from it? >> >> Why would you need to guess? > > Incorrect wording from me perhaps? I just meant that processing needs > to know what is the classId coming for a specific eventId. > >> But, yes, I think one argument is much preferable. > > OK. Here is the wanted patch. I have reduced the routines of WaitLatch > & friends to use only one argument, and added what is the classId > associated with a given eventId in an array of multiple fields, giving > something like that: > + const struct wait_event_entry WaitEventEntries[] = { > + /* Activity */ > + {WAIT_ACTIVITY, "ArchiverMain"}, > [...] > > I have cleaned up as well the inclusions of pgstat.h that I added > previously. Patch is attached. It seems to me that you haven't tested this patch very carefully, because as far as I can see it breaks wait event reporting for both heavyweight locks and buffer pins - or in other words two out of the three existing cases covered by the mechanism you are patching. The heavyweight lock portion is broken because WaitOnLock() reports a Lock wait before calling ProcSleep(), which now clobbers it. Instead of reporting that we're waiting for Lock/relation, a LOCK TABLE statement that hangs now reports IPC/ProcSleep. Similarly, a conflict over a tuple lock is now also reported as IPC/ProcSleep, and ditto for any other kind of lock, which were all reported separately before. Obviously, that's no good. I think it would be just fine to push down setting the wait event into ProcSleep(), doing it when we actually WaitLatch() or ResolveRecoveryConflictWithLock(), but we ought to report that we're waiting for the heavyweight lock, not ProcSleep().
As for buffer pins, note that LockBufferForCleanup() calls ProcWaitForSignal(), which now overwrites the wait event set in by its caller. I think we could just make ProcWaitForSignal() take a wait event as an argument; then LockBufferForCleanup() could pass an appropriate constant and other callers of ProcWaitForSignal() could pass their own wait events. The way that we're constructing the wait event ID that ultimately gets advertised in pg_stat_activity is a bit silly in this patch. We start with an event ID (which is an integer) and we then do an array lookup (via GetWaitEventClass) to get a second integer which is then XOR'd back into the first integer (via pgstat_report_wait_start) before storing it in the PGPROC. New idea: let's change pgstat_report_wait_start() to take a single integer argument which it stores without interpretation. Let's separate the construction of the wait event into a separate macro, say make_wait_event(). Then existing callers of pgstat_report_wait_start(a, b) will instead do pgstat_report_wait_start(make_wait_event(a, b)), but callers that want to construct the wait event and push it through a few levels of the call stack before advertising it only need to pass one integer. If done correctly, this should be cleaner and faster than what you've got right now. I am not a huge fan of the "WE_" prefix you've used. It's not terrible, but "we" doesn't obviously stand for "wait event" and it's also a word itself. I think a little more verbosity would add clarity. Maybe we could go with WAIT_EVENT_ instead. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
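Robert's proposed make_wait_event() can be sketched as follows. The 8/24-bit split and all names here are illustrative assumptions, not the layout PostgreSQL actually adopted; the point is only that class and event are packed into a single integer once, passed down the call stack as one value, and stored at report time without any array lookup or re-encoding:

```c
#include <assert.h>
#include <stdint.h>

/* Pack a wait-event class and event ID into one integer (assumed layout:
 * class in the high byte, event in the low 24 bits). */
#define make_wait_event(classId, eventId) \
    ((((uint32_t) (classId)) << 24) | ((uint32_t) (eventId)))
#define wait_event_class(w) ((w) >> 24)
#define wait_event_id(w)    ((w) & 0x00FFFFFF)

/* Analogue of pgstat_report_wait_start(): store the single integer
 * without interpretation; decoding happens only when the value is
 * displayed, e.g. in pg_stat_activity. */
static uint32_t current_wait_event = 0;

static void
report_wait_start(uint32_t wait_event)
{
    current_wait_event = wait_event;
}
```

Existing callers would change mechanically from report(a, b) to report(make_wait_event(a, b)), while callers that thread the event through several stack frames only carry one integer.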
Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)
On Thu, Sep 29, 2016 at 8:33 AM, Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 9/26/16 10:34 PM, Michael Paquier wrote: > > I thought that as long as the error string is shown to the user, it > > does not matter much if errno is still saved or not. All the callers > > of durable_rename() on frontends don't check for strerrno(errno) > > afterwards. Do you think it matters? Changing that back is trivial. > > Sorry for not mentioning directly in the thread that I modified that > > when dropping the last patch set. > > Actually, I think the equivalent backend code only does this to save the > errno across the close call because the elog call comes after the close. > So it's OK to not do that in the frontend. > > With that in mind, I have committed the v3 series now. > I'm getting compiler warnings: file_utils.c: In function 'fsync_pgdata': file_utils.c:86: warning: passing argument 2 of 'walkdir' from incompatible pointer type file_utils.c:36: note: expected 'int (*)(const char *, bool, const char *)' but argument is of type 'void (*)(const char *, bool, const char *)' file_utils.c:88: warning: passing argument 2 of 'walkdir' from incompatible pointer type file_utils.c:36: note: expected 'int (*)(const char *, bool, const char *)' but argument is of type 'void (*)(const char *, bool, const char *)' file_utils.c:89: warning: passing argument 2 of 'walkdir' from incompatible pointer type file_utils.c:36: note: expected 'int (*)(const char *, bool, const char *)' but argument is of type 'void (*)(const char *, bool, const char *)' Cheers, Jeff
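The warnings above come from passing callbacks declared to return void where walkdir() expects a pointer to a function returning int. A minimal reproduction of the mismatch and the obvious fix, with hypothetical names rather than the actual file_utils.c code:

```c
#include <assert.h>
#include <stdbool.h>

/* The signature walkdir() declares for its action callback. */
typedef int (*walkdir_action)(const char *path, bool isdir, const char *progname);

/* A callback declared 'void (*)(...)' triggers the incompatible-pointer
 * warning when passed as walkdir_action; declaring it with the exact
 * expected type, returning an int status, resolves it. */
static int
fsync_fname_compat(const char *path, bool isdir, const char *progname)
{
    (void) path;
    (void) isdir;
    (void) progname;
    return 0;                   /* 0 = success */
}

/* Toy walker that just invokes the action once. */
static int
walk_one(walkdir_action action)
{
    return action("pg_wal", false, "frontend");
}
```

Returning a status from the callbacks also lets walkdir() stop early on error instead of pressing on blindly.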
Re: [HACKERS] pageinspect: Hash index support
On 09/29/2016 11:58 AM, Peter Eisentraut wrote: On 9/27/16 10:10 AM, Jesper Pedersen wrote: contrib/pageinspect/pageinspect--1.5--1.6.sql | 59 contrib/pageinspect/pageinspect--1.5.sql | 279 -- contrib/pageinspect/pageinspect--1.6.sql | 346 ++ I think there is still a misunderstanding about these extension files. You only need to supply pageinspect--1.5--1.6.sql and leave all the other ones alone. Ok. Left the 'DATA' section in the Makefile, and the control file as is. Best regards, Jesper >From b48623beda6aad43116d46ff44de55bdc7547325 Mon Sep 17 00:00:00 2001 From: jesperpedersenDate: Fri, 5 Aug 2016 10:16:32 -0400 Subject: [PATCH] pageinspect: Hash index support --- contrib/pageinspect/Makefile | 10 +- contrib/pageinspect/hashfuncs.c | 408 ++ contrib/pageinspect/pageinspect--1.5--1.6.sql | 59 contrib/pageinspect/pageinspect.control | 2 +- doc/src/sgml/pageinspect.sgml | 146 - 5 files changed, 609 insertions(+), 16 deletions(-) create mode 100644 contrib/pageinspect/hashfuncs.c create mode 100644 contrib/pageinspect/pageinspect--1.5--1.6.sql diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile index a98237e..c73d728 100644 --- a/contrib/pageinspect/Makefile +++ b/contrib/pageinspect/Makefile @@ -2,13 +2,13 @@ MODULE_big = pageinspect OBJS = rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \ - brinfuncs.o ginfuncs.o $(WIN32RES) + brinfuncs.o ginfuncs.o hashfuncs.o $(WIN32RES) EXTENSION = pageinspect -DATA = pageinspect--1.5.sql pageinspect--1.4--1.5.sql \ - pageinspect--1.3--1.4.sql pageinspect--1.2--1.3.sql \ - pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \ - pageinspect--unpackaged--1.0.sql +DATA = pageinspect--1.6.sql pageinspect--1.5--1.6.sql \ + pageinspect--1.4--1.5.sql pageinspect--1.3--1.4.sql \ + pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \ + pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql PGFILEDESC = "pageinspect - functions to inspect contents of database pages" ifdef USE_PGXS diff --git 
a/contrib/pageinspect/hashfuncs.c b/contrib/pageinspect/hashfuncs.c new file mode 100644 index 000..a76b683 --- /dev/null +++ b/contrib/pageinspect/hashfuncs.c @@ -0,0 +1,408 @@ +/* + * hashfuncs.c + * Functions to investigate the content of HASH indexes + * + * Copyright (c) 2016, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/pageinspect/hashfuncs.c + */ + +#include "postgres.h" + +#include "access/hash.h" +#include "access/htup_details.h" +#include "funcapi.h" +#include "miscadmin.h" +#include "utils/builtins.h" + +PG_FUNCTION_INFO_V1(hash_metapage_info); +PG_FUNCTION_INFO_V1(hash_page_items); +PG_FUNCTION_INFO_V1(hash_page_stats); + +/* + * structure for single hash page statistics + * + */ +typedef struct HashPageStat +{ + uint32 live_items; + uint32 dead_items; + uint32 page_size; + uint32 free_size; + char *type; + + /* opaque data */ + BlockNumber hasho_prevblkno; + BlockNumber hasho_nextblkno; + Bucket hasho_bucket; + uint16 hasho_flag; + uint16 hasho_page_id; +} HashPageStat; + + +/* + * Verify that the given bytea contains a HASH page, or die in the attempt. + * A pointer to the page is returned. 
+ */ +static Page +verify_hash_page(bytea *raw_page, bool metap) +{ + Page page; + int raw_page_size; + HashPageOpaque pageopaque; + + raw_page_size = VARSIZE(raw_page) - VARHDRSZ; + + if (raw_page_size != BLCKSZ) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("input page too small"), + errdetail("Expected size %d, got %d", + BLCKSZ, raw_page_size))); + + page = VARDATA(raw_page); + pageopaque = (HashPageOpaque) PageGetSpecialPointer(page); + if (pageopaque->hasho_page_id != HASHO_PAGE_ID) + ereport(ERROR, +(errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not a HASH page"), + errdetail("Expected %08x, got %08x.", + HASHO_PAGE_ID, pageopaque->hasho_page_id))); + + if (metap) + { + if (pageopaque->hasho_flag != LH_META_PAGE) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is not a metadata page"), + errdetail("Expected %08x, got %08x.", + LH_META_PAGE, pageopaque->hasho_flag))); + } + else + { + if (pageopaque->hasho_flag == LH_META_PAGE) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("page is a metadata page"), + errdetail("Flags %08x.", + pageopaque->hasho_flag))); + } + + return page; +} + +/* - + * GetHashPageStatistics() + * + * Collect statistics of single hash page + * - + */ +static void +GetHashPageStatistics(Page page, HashPageStat
Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.
> >> The attached patch adds the >> dependencies from create_foreignscan_plan() which is called for any >> foreign access. I have also added testcases to test the functionality. >> Let me know your comments on the patch. > > > Hmm. I'm not sure that that's a good idea. > > I was thinking the changes to setrefs.c proposed by Amit to collect those > dependencies would be probably OK, but I wasn't sure that it's a good idea > that he used PlanCacheFuncCallback as the syscache inval callback function > for the foreign object caches because it invalidates not only generic plans > but query trees, as you mentioned downthread. So, I was thinking to modify > his patch so that we add a new syscache inval callback function for the > caches that is much like PlanCacheFuncCallback but only invalidates generic > plans. PlanCacheFuncCallback() invalidates the query tree only when invalItems are added to the plan source. The patch adds the dependencies in root->glob->invalItems, which standard_planner() copies into PlannedStmt::invalItems. This is then copied into the gplan->stmt_list. Thus PlanCacheFuncCallback never invalidates the query tree. I have verified this under the debugger. Am I missing something? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] pageinspect: Hash index support
On 9/27/16 10:10 AM, Jesper Pedersen wrote: > contrib/pageinspect/pageinspect--1.5--1.6.sql | 59 > contrib/pageinspect/pageinspect--1.5.sql | 279 -- > contrib/pageinspect/pageinspect--1.6.sql | 346 ++ I think there is still a misunderstanding about these extension files. You only need to supply pageinspect--1.5--1.6.sql and leave all the other ones alone. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Set log_line_prefix and application name in test drivers
Tom Lane wrote: > Robert Haas writes: > > As long as we get %t and %p in there we're going to be way ahead, really. > > Could we get consensus on just changing the default to > > log_line_prefix = '%t [%p] ' > > and leaving the rest out of it? +1 from me. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Tuplesort merge pre-reading
On Thu, Sep 29, 2016 at 2:59 PM, Robert Haas wrote: >> Maybe that was the wrong choice of words. What I mean is that it seems >> somewhat unprincipled to give over an equal share of memory to each >> active-at-least-once tape, ... > > I don't get it. If the memory is being used for prereading, then the > point is just to avoid doing many small I/Os instead of one big I/O, > and that goal will be accomplished whether the memory is equally > distributed or not; indeed, it's likely to be accomplished BETTER if > the memory is equally distributed than if it isn't. I think it could hurt performance if preloading loads runs on a tape that won't be needed until some subsequent merge pass, in preference to using that memory proportionately, giving more to larger input runs for *each* merge pass (giving memory proportionate to the size of each run to be merged from each tape). For tapes with a dummy run, the appropriate amount of memory for their next merge pass is zero. I'm not arguing that it would be worth it to do that, but I do think that that's the sensible way of framing the idea of using a uniform amount of memory for every maybe-active tape up front. I'm not too concerned about this because I'm never too concerned about multiple merge pass cases, which are relatively rare and relatively unimportant. Let's just get our story straight. -- Peter Geoghegan
Re: [HACKERS] pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)
On 9/26/16 10:34 PM, Michael Paquier wrote: > I thought that as long as the error string is shown to the user, it > does not matter much if errno is still saved or not. All the callers > of durable_rename() on frontends don't check for strerrno(errno) > afterwards. Do you think it matters? Changing that back is trivial. > Sorry for not mentioning directly in the thread that I modified that > when dropping the last patch set. Actually, I think the equivalent backend code only does this to save the errno across the close call because the elog call comes after the close. So it's OK to not do that in the frontend. With that in mind, I have committed the v3 series now. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] "Re: Question about grant create on database and pg_dump/pg_dumpall
Robert Haaswrites: > On Thu, Sep 29, 2016 at 4:04 AM, Haribabu Kommi > wrote: >> I am also not sure whether pg_dumpall -g and then individual pg_dump >> is the more widely used approach or not? > That's the approach I normally recommend. The fundamental thing we have to do in order to move forward on this is to rethink what's the division of labor between pg_dump and pg_dumpall. I find the patch as presented quite unacceptable because it's made no effort to do that (or even to touch the documentation). What do people think of this sketch: 1. pg_dump without --create continues to do what it does today, ie it just dumps objects within the database, assuming that database-level properties will already be set correctly for the target database. 2. pg_dump with --create creates the target database and also sets all database-level properties (ownership, ACLs, ALTER DATABASE SET, etc etc). 3. pg_dumpall loses all code relating to individual-database creation and property setting and instead relies on pg_dump --create to do that. This would leave only the code relating to "pg_dumpall -g" (ie, dump roles and tablespaces) within pg_dumpall itself. One thing that would still be messy is that presumably "pg_dumpall -g" would issue ALTER ROLE SET commands, but it's unclear what to do with ALTER ROLE IN DATABASE SET commands. Should those become part of "pg_dump --create"'s charter? It seems like not, but I'm not certain. Another thing that requires some thought is that pg_dumpall is currently willing to dump ACLs and other properties for template1/template0, though it does not invoke pg_dump on them. If we wanted to preserve that behavior while still moving the code that does those things to pg_dump, pg_dump would have to grow an option that would let it do that. But I'm not sure how much of that behavior is actually sensible. 
This would probably take a pg_dump archive version bump, since I think we don't currently record enough information for --create to do this (and we can't just cram the extra commands into the DATABASE entry, since we don't know whether --create will be specified to pg_restore). But we've done those before. Thoughts? Is there a better way to look at this? regards, tom lane
Re: [HACKERS] Fix checkpoint skip logic on idle systems by tracking LSN progress
On 9/28/16 10:32 PM, Michael Paquier wrote: On Thu, Sep 29, 2016 at 7:45 AM, David Steelewrote: In general I agree with the other comments that this could end up being a problem. On the other hand, since the additional locks are only taken at checkpoint or archive_timeout it may not be that big a deal. Yes, I did some tests on my laptop a couple of months back, that has 4 cores. After reducing NUM_XLOGINSERT_LOCKS from 8 to 4 to increase contention and performing a bunch of INSERT using 4 clients on 4 different relations I could not catch a difference.. Autovacuum was disabled to eliminate any noise. I tried checkpoint_segments at 30s to see its effects, as well as larger values to see the impact with the standby snapshot taken by the bgwriter. Other thoughts are welcome. I don't have any better ideas than that. +++ b/src/backend/postmaster/checkpointer.c + /* OK, it's time to switch */ + elog(LOG, "Request XLog Switch"); LOG level seems a bit much here, perhaps DEBUG1? That's from Horiguchi-san's patch, and those would be definitely better as DEBUG1 by looking at it. Now and in order to keep things simple I think that we had better discard this patch for now. I was planning to come back to this thing anyway once we are done with the first problem. I still see this: +++ b/src/backend/postmaster/checkpointer.c /* OK, it's time to switch */ + elog(LOG, "Request XLog Switch"); Well for now attached are two patches, that could just be squashed into one. Yes, I think that makes sense. More importantly, there is a regression. With your new patch the xlogs are switching on archive_timeout again even with no changes. The v12 worked fine. The differences are all in 0002-hs-checkpoints-v12-2.patch and as far as I can see the patch does not work correctly without these changes. Am I missing something? 
-- -David da...@pgmasters.net
Re: [HACKERS] Tuplesort merge pre-reading
On 09/29/2016 05:41 PM, Heikki Linnakangas wrote: Here's a new patch version, addressing the points you made. Please have a look! Bah, I fumbled the initSlabAllocator() function, attached is a fixed version. - Heikki >From bd74cb9c32b3073637d6932f3b4552598fcdc92a Mon Sep 17 00:00:00 2001 From: Heikki LinnakangasDate: Wed, 14 Sep 2016 17:29:11 +0300 Subject: [PATCH 1/1] Change the way pre-reading in external sort's merge phase works. Don't pre-read tuples into SortTuple slots during merge. Instead, use the memory for larger read buffers in logtape.c. We're doing the same number of READTUP() calls either way, but managing the pre-read SortTuple slots is much more complicated. Also, the on-tape representation is more compact than SortTuples, so we can fit more pre-read tuples into the same amount of memory this way. And we have better cache-locality, when we use just a small number of SortTuple slots. Now that we only hold one tuple from each tape in the SortTuple slots, we can greatly simplify the "batch memory" management. We now maintain a small set of fixed-sized slots, to hold the tuples, and fall back to palloc() for larger tuples. We use this method during all merge phases, not just the final merge, and also when randomAccess is requested, and also in the TSS_SORTEDONTAPE. In other words, it's used whenever we do an external sort. Reviewed by Peter Geoghegan and Claudio Freire. --- src/backend/utils/sort/logtape.c | 153 - src/backend/utils/sort/tuplesort.c | 1208 +--- src/include/utils/logtape.h|1 + 3 files changed, 565 insertions(+), 797 deletions(-) diff --git a/src/backend/utils/sort/logtape.c b/src/backend/utils/sort/logtape.c index 7745207..4152da1 100644 --- a/src/backend/utils/sort/logtape.c +++ b/src/backend/utils/sort/logtape.c @@ -52,12 +52,17 @@ * not clear this helps much, but it can't hurt. (XXX perhaps a LIFO * policy for free blocks would be better?) 
* + * To further make the I/Os more sequential, we can use a larger buffer + * when reading, and read multiple blocks from the same tape in one go, + * whenever the buffer becomes empty. LogicalTapeAssignReadBufferSize() + * can be used to set the size of the read buffer. + * * To support the above policy of writing to the lowest free block, * ltsGetFreeBlock sorts the list of free block numbers into decreasing * order each time it is asked for a block and the list isn't currently * sorted. This is an efficient way to handle it because we expect cycles * of releasing many blocks followed by re-using many blocks, due to - * tuplesort.c's "preread" behavior. + * the larger read buffer. * * Since all the bookkeeping and buffer memory is allocated with palloc(), * and the underlying file(s) are made with OpenTemporaryFile, all resources @@ -79,6 +84,7 @@ #include "storage/buffile.h" #include "utils/logtape.h" +#include "utils/memutils.h" /* * Block indexes are "long"s, so we can fit this many per indirect block. @@ -131,9 +137,18 @@ typedef struct LogicalTape * reading. */ char *buffer; /* physical buffer (separately palloc'd) */ + int buffer_size; /* allocated size of the buffer */ long curBlockNumber; /* this block's logical blk# within tape */ int pos; /* next read/write position in buffer */ int nbytes; /* total # of valid bytes in buffer */ + + /* + * Desired buffer size to use when reading. To keep things simple, we + * use a single-block buffer when writing, or when reading a frozen + * tape. But when we are reading and will only read forwards, we + * allocate a larger buffer, determined by read_buffer_size. + */ + int read_buffer_size; } LogicalTape; /* @@ -228,6 +243,53 @@ ltsReadBlock(LogicalTapeSet *lts, long blocknum, void *buffer) } /* + * Read as many blocks as we can into the per-tape buffer. + * + * The caller can specify the next physical block number to read, in + * datablocknum, or -1 to fetch the next block number from the internal block. 
+ * If datablocknum == -1, the caller must've already set curBlockNumber. + * + * Returns true if anything was read, 'false' on EOF. + */ +static bool +ltsReadFillBuffer(LogicalTapeSet *lts, LogicalTape *lt, long datablocknum) +{ + lt->pos = 0; + lt->nbytes = 0; + + do + { + /* Fetch next block number (unless provided by caller) */ + if (datablocknum == -1) + { + datablocknum = ltsRecallNextBlockNum(lts, lt->indirect, lt->frozen); + if (datablocknum == -1L) +break; /* EOF */ + lt->curBlockNumber++; + } + + /* Read the block */ + ltsReadBlock(lts, datablocknum, (void *) (lt->buffer + lt->nbytes)); + if (!lt->frozen) + ltsReleaseBlock(lts, datablocknum); + + if (lt->curBlockNumber < lt->numFullBlocks) + lt->nbytes += BLCKSZ; + else + { + /* EOF */ + lt->nbytes += lt->lastBlockBytes; + break; + } + + /* Advance to next block, if we have buffer space left */ + datablocknum = -1; + }
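The refill loop in the patch is easier to see in isolation. Below is a self-contained toy model of the idea (names and sizes are invented for illustration; this is not the PostgreSQL code): refilling a multi-block read buffer in one go turns many single-block reads into a few larger sequential ones.

```c
#include <assert.h>
#include <string.h>

#define BLCKSZ 8                 /* toy block size; PostgreSQL's is 8192 */
#define READ_BUFFER_BLOCKS 4     /* blocks fetched per refill */

/* A toy "logical tape": a flat byte array carved into BLCKSZ blocks. */
typedef struct
{
    const char *data;            /* backing storage */
    int         nblocks;         /* total blocks on the tape */
    int         next_block;      /* next block number to read */
    char        buffer[READ_BUFFER_BLOCKS * BLCKSZ];
    int         pos;             /* next read position in buffer */
    int         nbytes;          /* valid bytes currently in buffer */
    int         refills;         /* how many physical "I/Os" we issued */
} ToyTape;

/*
 * Refill the read buffer with as many consecutive blocks as fit,
 * mimicking the ltsReadFillBuffer() idea: one larger sequential read
 * replaces several single-block reads.
 */
static int
fill_buffer(ToyTape *lt)
{
    lt->pos = 0;
    lt->nbytes = 0;
    while (lt->nbytes + BLCKSZ <= (int) sizeof(lt->buffer) &&
           lt->next_block < lt->nblocks)
    {
        memcpy(lt->buffer + lt->nbytes,
               lt->data + (size_t) lt->next_block * BLCKSZ,
               BLCKSZ);
        lt->next_block++;
        lt->nbytes += BLCKSZ;
    }
    if (lt->nbytes > 0)
        lt->refills++;
    return lt->nbytes > 0;       /* false on EOF */
}

/* Read one byte from the tape, refilling the buffer when it runs dry. */
static int
tape_getc(ToyTape *lt)
{
    if (lt->pos >= lt->nbytes && !fill_buffer(lt))
        return -1;               /* EOF */
    return (unsigned char) lt->buffer[lt->pos++];
}
```

With a 4-block buffer, reading an 8-block tape costs two refills instead of eight, which is the sequential-I/O win the comment above describes.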
Re: [HACKERS] PL/Python adding support for multi-dimensional arrays
On 27 September 2016 at 14:58, Heikki Linnakangas wrote: > On 09/27/2016 02:04 PM, Dave Cramer wrote: > >> On 26 September 2016 at 14:52, Dave Cramer wrote: >> >>> This crashes with arrays with non-default lower bounds: postgres=# SELECT * FROM test_type_conversion_array_int4('[2:4]={1,2,3}'); INFO: ([1, 2, ], ) server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. Attached patch fixes this bug, and adds a test for it. >>> > I spent some more time massaging this: > > * Changed the loops from iterative to recursive style. I think this indeed > is slightly easier to understand. > > * Fixed another segfault, with too deeply nested lists: > > CREATE or replace FUNCTION test_type_conversion_mdarray_toodeep() RETURNS > int[] AS $$ > return [[1]] > $$ LANGUAGE plpythonu; > > * Also, in PLySequence_ToArray(), we must check that the 'len' of the > array doesn't overflow. > > * Fixed reference leak in the loop in PLySequence_ToArray() to count the > number of dimensions. > > I'd like to see some updates to the docs for this. The manual doesn't currently say anything about multi-dimensional arrays in pl/python, but it should've mentioned that they're not supported. Now that it is supported, it should mention that, and explain briefly that a multi-dimensional array is mapped to a python list of lists. If the code passes I'll fix the docs >>> >> > Please do, thanks! > > see attached Dave Cramer da...@postgresintl.com www.postgresintl.com 0002-WIP-Multi-dimensional-arrays-in-PL-python.patch Description: Binary data
Re: [HACKERS] Hash Indexes
On Thu, Sep 29, 2016 at 6:04 AM, Robert Haas wrote: > On Wed, Sep 28, 2016 at 3:04 PM, Robert Haas wrote: >> I'll write another email with my thoughts about the rest of the patch. > > I think that the README changes for this patch need a fairly large > amount of additional work. Here are a few things I notice: > > - The confusion between buckets and pages hasn't been completely > cleared up. If you read the beginning of the README, the terminology > is clearly set forth. It says: > >>> A hash index consists of two or more "buckets", into which tuples are >>> placed whenever their hash key maps to the bucket number. Each bucket in >>> the hash index comprises one or more index pages. The bucket's first page >>> is permanently assigned to it when the bucket is created. Additional pages, >>> called "overflow pages", are added if the bucket receives too many tuples >>> to fit in the primary bucket page." > > But later on, you say: > >>> Scan will take a lock in shared mode on the primary bucket or on one of the >>> overflow page. > > So the correct terminology here would be "primary bucket page" not > "primary bucket". > > - In addition, notice that there are two English errors in this > sentence: the word "the" needs to be added to the beginning of the > sentence, and the last word needs to be "pages" rather than "page". > There are a considerable number of similar minor errors; if you can't > fix them, I'll make a pass over it and clean it up. > > - The whole "lock definitions" section seems to me to be pretty loose > and imprecise about what is happening. For example, it uses the term > "split-in-progress" without first defining it. The sentence quoted > above says that scans take a lock in shared mode either on the primary > page or on one of the overflow pages, but it's not OK to document code by saying that it will do either A or B without explaining which one!
In > fact, I think that a scan will take a content lock first on the > primary bucket page and then on each overflow page in sequence, > retaining a pin on the primary buffer page throughout the scan. So it > is not one or the other but both in a particular sequence, and that > can and should be explained. > > Another problem with this section is that even when it's precise about > what is going on, it's probably duplicating what is or should be in > the following sections where the algorithms for each operation are > explained. In the original wording, this section explains what each > lock protects, and then the following sections explain the algorithms > in the context of those definitions. Now, this section contains a > sketch of the algorithm, and then the following sections lay it out > again in more detail. The question of what each lock protects has > been lost. Here's an attempt at some text to replace what you have > here: > > === > Concurrency control for hash indexes is provided using buffer content > locks, buffer pins, and cleanup locks. Here as elsewhere in > PostgreSQL, cleanup lock means that we hold an exclusive lock on the > buffer and have observed at some point after acquiring the lock that > we hold the only pin on that buffer. For hash indexes, a cleanup lock > on a primary bucket page represents the right to perform an arbitrary > reorganization of the entire bucket, while a cleanup lock on an > overflow page represents the right to perform a reorganization of just > that page. Therefore, scans retain a pin on both the primary bucket > page and the overflow page they are currently scanning, if any. > I don't think we take cleanup lock on overflow page, so I will edit that part. > Splitting a bucket requires a cleanup lock on both the old and new > primary bucket pages. VACUUM therefore takes a cleanup lock on every > bucket page in turn in order to remove tuples.
It can also remove tuples > copied to a new bucket by any previous split operation, because the > cleanup lock taken on the primary bucket page guarantees that no scans > which started prior to the most recent split can still be in progress. > After cleaning each page individually, it attempts to take a cleanup > lock on the primary bucket page in order to "squeeze" the bucket down > to the minimum possible number of pages. > === > > As I was looking at the old text regarding deadlock risk, I realized > what may be a serious problem. Suppose process A is performing a scan > of some hash index. While the scan is suspended, it attempts to take > a lock and is blocked by process B. Process B, meanwhile, is running > VACUUM on that hash index. Eventually, it will do > LockBufferForCleanup() on the hash bucket on which process A holds a > buffer pin, resulting in an undetected deadlock. In the current > coding, A would hold a heavyweight lock and B would attempt to acquire > a conflicting heavyweight lock, and the deadlock detector would kill > one of them. This patch probably breaks that. I
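The cleanup-lock rule being discussed — an exclusive content lock that can only be taken while the caller holds the sole pin — can be modeled in a few lines. This is a hedged toy sketch, not the real buffer manager; the real LockBufferForCleanup() blocks and waits rather than failing, which is exactly what makes the undetected-deadlock scenario above possible.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of buffer pins and cleanup locks.  A pin means "some backend
 * may be mid-scan on this page"; a cleanup lock asserts nobody else is.
 */
typedef struct
{
    int  pin_count;      /* number of backends holding a pin */
    bool exclusive;      /* content lock currently held exclusively? */
} ToyBuffer;

static void
pin(ToyBuffer *buf)
{
    buf->pin_count++;
}

static void
unpin(ToyBuffer *buf)
{
    buf->pin_count--;
}

/*
 * Succeeds only if the caller's pin is the only one, i.e. no scan can be
 * stopped partway through the page.  (LockBufferForCleanup() would block
 * here instead; we just report whether it would have to wait.)
 */
static bool
try_cleanup_lock(ToyBuffer *buf)
{
    if (buf->pin_count == 1 && !buf->exclusive)
    {
        buf->exclusive = true;
        return true;
    }
    return false;
}
```

In this model, a suspended scan's pin is enough to make VACUUM's cleanup-lock attempt wait indefinitely, which is the deadlock risk described in the paragraph above.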
Re: [HACKERS] "Re: Question about grant create on database and pg_dump/pg_dumpall
On Thu, Sep 29, 2016 at 4:04 AM, Haribabu Kommi wrote: > I am also not sure whether pg_dumpall -g and then individual pg_dump > is the more widely used approach or not? That's the approach I normally recommend. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Tuplesort merge pre-reading
On 09/29/2016 01:52 PM, Peter Geoghegan wrote: * Variables like maxTapes have a meaning that is directly traceable back to Knuth's description of polyphase merge. I don't think that you should do anything to them, on general principle. Ok. I still think that changing maxTapes would make sense, when there are fewer runs than tapes, but that is actually orthogonal to the rest of the patch, so let's discuss that separately. I've changed the patch to not do that. * Everything or almost everything that you've added to mergeruns() should probably be in its own dedicated function. This function can have a comment where you acknowledge that it's not perfectly okay that you claim memory per-tape, but it's simpler and faster overall. Ok. * I think you should be looking at the number of active tapes, and not state->Level or state->currentRun in this new function. Actually, maybe this wouldn't be the exact definition of an active tape that we establish at the beginning of beginmerge() (this considers tapes with dummy runs to be inactive for that merge), but it would look similar. The general concern I have here is that you shouldn't rely on state->Level or state->currentRun from a distance for the purposes of determining which tapes need some batch memory. This is also where you might say something like: "we don't bother with shifting memory around tapes for each merge step, even though that would be fairer. That's why we don't use the beginmerge() definition of activeTapes -- instead, we use our own broader definition of the number of active tapes that doesn't exclude dummy-run-tapes with real runs, making the allocation reasonably sensible for every merge pass". I'm not sure I understood what your concern was, but please have a look at this new version, if the comments in initTapeBuffers() explain that well enough. * The "batchUsed" terminology really isn't working here, AFAICT. 
For one thing, you have two separate areas where caller tuples might reside: The small per-tape buffers (sized MERGETUPLEBUFFER_SIZE per tape), and the logtape.c controlled preread buffers (sized up to MaxAllocSize per tape). Which of these two things is batch memory? I think it might just be the first one, but KiBs of memory do not suggest "batch" to me. Isn't that really more like what you might call double buffer memory, used not to save overhead from palloc (having many palloc headers in memory), but rather to recycle memory efficiently? So, these two things should have two new names of their own, I think, and neither should be called "batch memory" IMV. I see several assertions remain here and there that were written with my original definition of batch memory in mind -- assertions like: Ok. I replaced "batch" terminology with "slab allocator" and "slab slots", I hope this is less confusing. This isn't exactly like e.g. the slab allocator in the Linux kernel, as it has a fixed number of slots, so perhaps an "object pool" might be more accurate. But I like "slab" because it's not used elsewhere in the system. I also didn't use the term "cache" for the "slots", as might be typical for slab allocators, because "cache" is such an overloaded term. case TSS_SORTEDONTAPE: Assert(forward || state->randomAccess); Assert(!state->batchUsed); (Isn't state->batchUsed always true now?) Good catch. It wasn't, because mergeruns() set batchUsed only after checking for the TSS_SORTEDONTAPE case, even though it set up the batch memory arena before it. So if replacement selection was able to produce a single run batchUsed was false. Fixed, the slab allocator (as it's now called) is now always used in TSS_SORTEDONTAPE case. And: case TSS_FINALMERGE: Assert(forward); Assert(state->batchUsed || !state->tuples); (Isn't state->tuples only really of interest to datum-tuple-case routines, now that you've made everything use large logtape.c preread buffers?) Yeah, fixed. 
* Is is really necessary for !state->tuples cases to do that MERGETUPLEBUFFER_SIZE-based allocation? In other words, what need is there for pass-by-value datum cases to have this scratch/double buffer memory at all, since the value is returned to caller by-value, not by-reference? This is related to the problem of it not being entirely clear what batch memory now is, I think. True, fixed. I still set slabAllocatorUsed (was batchUsed), but it's initialized as a dummy 0-slot arena when !state->tuples. Here's a new patch version, addressing the points you made. Please have a look! - Heikki >From a958cea32550825aa0ea487f58ac87c2c3620cda Mon Sep 17 00:00:00 2001 From: Heikki LinnakangasDate: Wed, 14 Sep 2016 17:29:11 +0300 Subject: [PATCH 1/1] Change the way pre-reading in external sort's merge phase works. Don't pre-read tuples into SortTuple slots during merge. Instead, use the memory for larger read buffers in logtape.c. We're doing the same number
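The slab-slot scheme described above — a small fixed set of equally sized slots for caller tuples, with a fallback to ordinary allocation for oversized tuples — can be sketched stand-alone. This is an illustrative model, not the tuplesort code; the slot size and slot count are invented (in the patch the slot count tracks the number of input tapes).

```c
#include <assert.h>
#include <stdlib.h>

#define SLAB_SLOT_SIZE 64   /* fixed slot size; big enough for typical tuples */
#define SLAB_NSLOTS     4   /* roughly one slot per input tape */

typedef struct
{
    char  arena[SLAB_NSLOTS][SLAB_SLOT_SIZE];
    void *freelist[SLAB_NSLOTS];
    int   nfree;
} Slab;

static void
slab_init(Slab *s)
{
    int i;

    s->nfree = 0;
    for (i = 0; i < SLAB_NSLOTS; i++)
        s->freelist[s->nfree++] = s->arena[i];
}

/* Does this pointer lie inside the fixed arena? */
static int
slab_owns(const Slab *s, const void *p)
{
    const char *c = p;

    return c >= (const char *) s->arena &&
           c < (const char *) s->arena + sizeof(s->arena);
}

/* Hand out a fixed slot when possible; fall back to malloc() otherwise. */
static void *
slab_alloc(Slab *s, size_t size)
{
    if (size <= SLAB_SLOT_SIZE && s->nfree > 0)
        return s->freelist[--s->nfree];
    return malloc(size);
}

/* Recycle arena slots; release fallback allocations normally. */
static void
slab_free(Slab *s, void *p)
{
    if (slab_owns(s, p))
        s->freelist[s->nfree++] = p;
    else
        free(p);
}
```

The point of the design is that the common case (a tuple that fits a slot) recycles memory with no allocator overhead at all, while rare large tuples quietly take the slow path.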
Re: [HACKERS] Bug in to_timestamp().
Robert Haas writes: > On Thu, Sep 29, 2016 at 4:54 AM, amul sul wrote: >> Commitfest status left untouched, I am not sure how to deal with >> "Returned With Feedback" patch. Is there any chance that this can be >> considered again for committer review? > You may be able to set the status back to "Ready for Committer", or > else you can move the entry to the next CommitFest and do it there. I already switched it back to "Needs Review". AFAIK none of the statuses are immovable. regards, tom lane
Re: [HACKERS] Set log_line_prefix and application name in test drivers
Christoph Berg writes: > Re: Tom Lane 2016-09-29 <16946.1475157...@sss.pgh.pa.us> >> Personally I'm also on board with using this for regression testing: >> log_line_prefix = '%t [%p] %q%a ' >> but I doubt it can be sold as a general-purpose default. > I don't think it makes much sense to log only %a unconditionally, Right; this is helpful for the regression tests, now that Peter has set up pg_regress to set the application name, but I can't see trying to push it on the rest of the world. > Possibly the longer version could be added as an example in the > documentation. I suspect that simply having a nonempty default in the first place is going to do more to raise peoples' awareness than anything we could do in the documentation. But perhaps an example along these lines would be useful for showing proper use of %q. regards, tom lane
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Sep 29, 2016 at 10:14 AM, Tomas Vondra wrote: >> It's not impossible that the longer runs could matter - performance >> isn't necessarily stable across time during a pgbench test, and the >> longer the run the more CLOG pages it will fill. > > Sure, but I'm not doing just a single pgbench run. I do a sequence of > pgbench runs, with different client counts, with ~6h of total runtime. > There's a checkpoint in between the runs, but as those benchmarks are on > unlogged tables, that flushes only very few buffers. > > Also, the clog SLRU has 128 pages, which is ~1MB of clog data, i.e. ~4M > transactions. On some kernels (3.10 and 3.12) I can get >50k tps with 64 > clients or more, which means we fill the 128 pages in less than 80 seconds. > > So half-way through the run only 50% of clog pages fits into the SLRU, and > we have a data set with 30M tuples, with uniform random access - so it seems > rather unlikely we'll get transaction that's still in the SLRU. > > But sure, I can do a run with larger data set to verify this. OK, another theory: Dilip is, I believe, reinitializing for each run, and you are not. Maybe somehow the effect Dilip is seeing only happens with a newly-initialized set of pgbench tables. For example, maybe the patches cause a huge improvement when all rows have the same XID, but the effect fades rapidly once the XIDs spread out... I'm not saying any of what I'm throwing out here is worth the electrons upon which it is printed, just that there has to be some explanation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Bug in to_timestamp().
On Thu, Sep 29, 2016 at 4:54 AM, amul sul wrote: > On Thu, Sep 29, 2016 at 2:48 AM, Tom Lane wrote: >> Artur Zakirov writes: >>> - now DCH_cache_getnew() is called after parse_format(). Because now >>> parse_format() can raise an error and in the next attempt >>> DCH_cache_search() could return broken cache entry. >> >> I started looking at your 0001-to-timestamp-format-checking-v4.patch >> and this point immediately jumped out at me. Currently the code relies >> ... without any documentation ... on no elog being thrown out of >> parse_format(). That's at the very least trouble waiting to happen. >> There's a hack to deal with errors from within the NUMDesc_prepare >> subroutine, but it's a pretty ugly and underdocumented hack. And what >> you had here was randomly different from that solution, too. >> >> After a bit of thought it seemed to me that a much cleaner fix is to add >> a "valid" flag to the cache entries, which we can leave clear until we >> have finished parsing the new format string. That avoids adding extra >> data copying as you suggested, removes the need for PG_TRY, and just >> generally seems cleaner and more bulletproof. >> >> I've pushed a patch that does it that way. The 0001 patch will need >> to be rebased over that (might just require removal of some hunks, >> not sure). >> >> I also pushed 0002-to-timestamp-validation-v2.patch with some revisions >> (it'd broken acceptance of BC dates, among other things, but I think >> I fixed everything). >> >> Since you told us earlier that you'd be on vacation through the end of >> September, I'm assuming that nothing more will happen on this patch during >> this commitfest, so I will mark the CF entry Returned With Feedback. > > On behalf of Artur I've rebased the patch, removing the hunk dealing with broken > cache entries by copying them, which is no longer required after the 83bed06 > commit. > > Commitfest status left untouched, I am not sure how to deal with > "Returned With Feedback" patch.
Is there any chance that this can be > considered again for committer review? You may be able to set the status back to "Ready for Committer", or else you can move the entry to the next CommitFest and do it there. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
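The "valid" flag scheme Tom describes in the quoted text — claim a cache entry first, mark it valid only once parsing has completed, and have lookups skip not-yet-valid entries — can be sketched stand-alone. All names here are invented for illustration; the real code lives in formatting.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NCACHE 4

/* Toy format cache: an entry is usable only once "valid" is set. */
typedef struct
{
    char key[32];
    bool valid;      /* set only after parsing completed without error */
    int  nitems;     /* stand-in for the parsed representation */
} CacheEntry;

static CacheEntry cache[NCACHE];

/* Lookups never return an entry whose parse did not finish. */
static CacheEntry *
cache_search(const char *key)
{
    int i;

    for (i = 0; i < NCACHE; i++)
        if (cache[i].valid && strcmp(cache[i].key, key) == 0)
            return &cache[i];
    return NULL;
}

/* Claim a (possibly half-filled) entry and mark it not-yet-valid. */
static CacheEntry *
cache_getnew(const char *key)
{
    CacheEntry *e = &cache[0];   /* toy eviction: always reuse slot 0 */

    strncpy(e->key, key, sizeof(e->key) - 1);
    e->key[sizeof(e->key) - 1] = '\0';
    e->valid = false;
    return e;
}

/* Toy parse: fails on '!' to simulate elog(ERROR) escaping mid-parse. */
static bool
parse_format(CacheEntry *e, const char *fmt)
{
    if (strchr(fmt, '!'))
        return false;            /* error thrown before parse finished */
    e->nitems = (int) strlen(fmt);
    e->valid = true;             /* only now is the entry trustworthy */
    return true;
}
```

Because the flag flips only at the very end, an error thrown mid-parse leaves behind an entry that later lookups simply ignore, with no data copying and no PG_TRY needed.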
Re: [HACKERS] Set log_line_prefix and application name in test drivers
Re: Tom Lane 2016-09-29 <16946.1475157...@sss.pgh.pa.us> > Robert Haas writes: > > As long as we get %t and %p in there we're going to be way ahead, really. > > Could we get consensus on just changing the default to > > log_line_prefix = '%t [%p] ' > > and leaving the rest out of it? I think pretty much everybody agrees > that those fields are useful, whereas the rest of it is a lot more > context-dependent. (For example, there are a whole lot of real > installations where neither %u nor %d would be worth the log space.) Nod. In many installations %u and %d will be effectively constants. > Personally I'm also on board with using this for regression testing: > > log_line_prefix = '%t [%p] %q%a ' > > but I doubt it can be sold as a general-purpose default. I don't think it makes much sense to log only %a unconditionally, except maybe for making more applications actually set it. If we were to add more fields I'd go for log_line_prefix = '%t [%p] %q%u@%d/%a ' but the above-proposed '%t [%p] ' is already fixing 95% of the problem (and the remaining 5% are unclear). Possibly the longer version could be added as an example in the documentation. So, in short, +1. Christoph
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 09/29/2016 03:47 PM, Robert Haas wrote: On Wed, Sep 28, 2016 at 9:10 PM, Tomas Vondra wrote: I feel like we must be missing something here. If Dilip is seeing huge speedups and you're seeing nothing, something is different, and we don't know what it is. Even if the test case is artificial, it ought to be the same when one of you runs it as when the other runs it. Right? Yes, definitely - we're missing something important, I think. One difference is that Dilip is using longer runs, but I don't think that's a problem (as I demonstrated how stable the results are). It's not impossible that the longer runs could matter - performance isn't necessarily stable across time during a pgbench test, and the longer the run the more CLOG pages it will fill. Sure, but I'm not doing just a single pgbench run. I do a sequence of pgbench runs, with different client counts, with ~6h of total runtime. There's a checkpoint in between the runs, but as those benchmarks are on unlogged tables, that flushes only very few buffers. Also, the clog SLRU has 128 pages, which is ~1MB of clog data, i.e. ~4M transactions. On some kernels (3.10 and 3.12) I can get >50k tps with 64 clients or more, which means we fill the 128 pages in less than 80 seconds. So half-way through the run only 50% of clog pages fits into the SLRU, and we have a data set with 30M tuples, with uniform random access - so it seems rather unlikely we'll get transaction that's still in the SLRU. But sure, I can do a run with larger data set to verify this. I wonder what CPU model is Dilip using - I know it's x86, but not which generation it is. I'm using E5-4620 v1 Xeon, perhaps Dilip is using a newer model and it makes a difference (although that seems unlikely). The fact that he's using an 8-socket machine seems more likely to matter than the CPU generation, which isn't much different. Maybe Dilip should try this on a 2-socket machine and see if he sees the same kinds of results. Maybe.
I wouldn't expect a major difference between 4 and 8 sockets, but I may be wrong. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
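The SLRU sizing arithmetic in the exchange above checks out; as a worked example (constants taken from the discussion — 8192-byte pages, 128 SLRU pages, 2 commit-status bits per transaction):

```c
#include <assert.h>

#define CLOG_BLCKSZ 8192L         /* bytes per clog page */
#define CLOG_SLRU_PAGES 128L      /* pages kept in the clog SLRU */
#define CLOG_XACTS_PER_BYTE 4L    /* 2 commit-status bits per transaction */

/* Total clog bytes the SLRU can cache. */
static long
clog_slru_bytes(void)
{
    return CLOG_SLRU_PAGES * CLOG_BLCKSZ;
}

/* Transactions whose status fits in the cached pages. */
static long
clog_slru_xacts(void)
{
    return clog_slru_bytes() * CLOG_XACTS_PER_BYTE;
}

/* Seconds to generate a full SLRU's worth of transactions at a given tps. */
static long
clog_slru_fill_seconds(long tps)
{
    return clog_slru_xacts() / tps;
}
```

At 50k tps that is roughly 83 seconds, matching the "less than 80 seconds" (for >50k tps) estimate quoted above.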
Re: [HACKERS] Set log_line_prefix and application name in test drivers
Re: Peter Eisentraut 2016-09-29 <21d2719f-36ff-06d2-5856-25ed48b96...@2ndquadrant.com> > > Christoph/Debian: > > log_line_prefix = '%t [%p-%l] %q%u@%d ' > > Peter: > > log_line_prefix = '%t [%p]: [%l] %qapp=%a ' > > I'm aware of two existing guidelines on log line formats: syslog and > pgbadger. Syslog output looks like this: > > Sep 28 00:58:56 hostname syslogd[46]: some text here > > pgbadger by default asks for this: > > log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h ' > > I don't know why it wants that "-1" there, and I'm actually not sure > what the point of %l is in practice. Those are separate issues that are > having their own lively discussions at times. I could drop the [%l] > from my proposal if that causes concerns. [%l-1] is originally from pgfouine, and I vaguely remember that it used to be something like [%l-%c] where %c was some extra line numbering removed in later (7.something?) PG versions. In any case, the -1 isn't useful. I'm happy to remove %l as well. Log lines won't be out of order anyway, and one needs to look at %p anyway to correlate them. %l doesn't help there. Christoph
Re: [HACKERS] Set log_line_prefix and application name in test drivers
Robert Haas writes: > As long as we get %t and %p in there we're going to be way ahead, really. Could we get consensus on just changing the default to log_line_prefix = '%t [%p] ' and leaving the rest out of it? I think pretty much everybody agrees that those fields are useful, whereas the rest of it is a lot more context-dependent. (For example, there are a whole lot of real installations where neither %u nor %d would be worth the log space.) Personally I'm also on board with using this for regression testing: log_line_prefix = '%t [%p] %q%a ' but I doubt it can be sold as a general-purpose default. regards, tom lane
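For readers trying the proposals out, here is roughly what the regression-testing variant would produce. The sample log lines are illustrative, not taken from a real server; %t is the timestamp, %p the backend PID, %a the application name, and everything after %q is emitted only by session processes.

```
# postgresql.conf
log_line_prefix = '%t [%p] %q%a '

# session backend (psql sets application_name):
2016-09-29 14:00:15 UTC [12345] psql LOG:  statement: SELECT 1;

# non-session process (%q truncates the prefix here):
2016-09-29 14:00:20 UTC [12340] LOG:  checkpoint starting: time
```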
Re: [HACKERS] Tuplesort merge pre-reading
On Thu, Sep 29, 2016 at 6:52 AM, Peter Geoghegan wrote: >> How is it awkward? > > Maybe that was the wrong choice of words. What I mean is that it seems > somewhat unprincipled to give over an equal share of memory to each > active-at-least-once tape, ... I don't get it. If the memory is being used for prereading, then the point is just to avoid doing many small I/Os instead of one big I/O, and that goal will be accomplished whether the memory is equally distributed or not; indeed, it's likely to be accomplished BETTER if the memory is equally distributed than if it isn't. I can imagine that there might be a situation in which it makes sense to give a bigger tape more resources than a smaller one; for example, if one were going to divide N tapes across K worker processes and make individual workers or groups of workers responsible for sorting particular tapes, one would of course want to divide up the data across the available processes relatively evenly, rather than having some workers responsible for only a small amount of data and others for a very large amount of data. But such considerations are irrelevant here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Set log_line_prefix and application name in test drivers
On Wed, Sep 28, 2016 at 10:30 PM, Peter Eisentraut wrote: > On 9/28/16 6:13 PM, Robert Haas wrote: >> Christoph/Debian: >> log_line_prefix = '%t [%p-%l] %q%u@%d ' >> Peter: >> log_line_prefix = '%t [%p]: [%l] %qapp=%a ' > > I'm aware of two existing guidelines on log line formats: syslog and > pgbadger. Syslog output looks like this: > > Sep 28 00:58:56 hostname syslogd[46]: some text here > > pgbadger by default asks for this: > > log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h ' > > I don't know why it wants that "-1" there, and I'm actually not sure > what the point of %l is in practice. Those are separate issues that are > having their own lively discussions at times. I could drop the [%l] > from my proposal if that causes concerns. > > On balance, I think my proposal is more in line with existing > wide-spread conventions. As long as we get %t and %p in there we're going to be way ahead, really. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Wed, Sep 28, 2016 at 9:10 PM, Tomas Vondra wrote: >> I feel like we must be missing something here. If Dilip is seeing >> huge speedups and you're seeing nothing, something is different, and >> we don't know what it is. Even if the test case is artificial, it >> ought to be the same when one of you runs it as when the other runs >> it. Right? >> > Yes, definitely - we're missing something important, I think. One difference > is that Dilip is using longer runs, but I don't think that's a problem (as I > demonstrated how stable the results are). It's not impossible that the longer runs could matter - performance isn't necessarily stable across time during a pgbench test, and the longer the run the more CLOG pages it will fill. > I wonder what CPU model is Dilip using - I know it's x86, but not which > generation it is. I'm using E5-4620 v1 Xeon, perhaps Dilip is using a newer > model and it makes a difference (although that seems unlikely). The fact that he's using an 8-socket machine seems more likely to matter than the CPU generation, which isn't much different. Maybe Dilip should try this on a 2-socket machine and see if he sees the same kinds of results. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] postgresql infinite loop
Guo Yong writes: > I have a postgresql infinite-loop problem. > When inserting data, a postgresql process falls into an infinite loop and cpu > usage is 100%. perf top shows that LWLockAcquire is the most costly. > The call stack is below. Given that stack trace, I'd have to guess that this is caused by a circular chain of right-links in a btree index. How it got that way is hard to say, but reindexing the index ought to fix it. If you can show a recipe for producing an index that's broken this way from a standing start, we'd be very interested. regards, tom lane
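Why a circular chain of right-links loops forever, and how such a cycle could be detected, can be shown with a toy page model. This is an illustration, not the nbtree code: _bt_moveright() follows right-links assuming the chain eventually terminates, and a standard two-pointer (Floyd) walk is one way to notice when it doesn't.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy btree page: only the right-sibling link matters here. */
typedef struct ToyPage
{
    struct ToyPage *right;
} ToyPage;

/*
 * Walk the right-link chain with two pointers moving at different
 * speeds.  On a healthy level the fast pointer hits the end; on a
 * corrupted, circular chain the two pointers eventually meet.
 */
static bool
level_has_cycle(ToyPage *start)
{
    ToyPage *slow = start;
    ToyPage *fast = start;

    while (fast != NULL && fast->right != NULL)
    {
        slow = slow->right;
        fast = fast->right->right;
        if (slow == fast)
            return true;    /* right-link chain loops back on itself */
    }
    return false;           /* chain terminates normally */
}
```

A naive move-right loop on the cyclic level below would never return, which matches the 100% CPU, never-finishing insert in the report.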
[HACKERS] postgresql infinite loop
I have a postgresql infinite-loop problem. When inserting data, a postgresql process falls into an infinite loop and cpu usage is 100%. perf top shows that LWLockAcquire is the most costly. The call stack is below. Other processes are normal and can process insert operations, but after some minutes another process falls into the infinite loop. I can reproduce the problem easily; within about an hour, one of the processes is the first to go into the abnormal state. The postgresql version is 9.2.15. postgresql is running in a container whose OS is centos 7.2.1511; the OS of the physical machine is centos 7.1.1503. #0 UnpinBuffer (buf=0x7fb47d11bdf8, fixOwner=) at bufmgr.c:1201 #1 0x0063947d in ReleaseAndReadBuffer (buffer=buffer@entry=2463, relation=relation@entry=0x7fb47cc58f98, blockNum=423) at bufmgr.c:1089 #2 0x00482b6f in _bt_relandgetbuf (rel=rel@entry=0x7fb47cc58f98, obuf=obuf@entry=2463, blkno=, access=access@entry=1) at nbtpage.c:639 #3 0x004863d8 in _bt_moveright (rel=rel@entry=0x7fb47cc58f98, buf=2463, keysz=keysz@entry=1, scankey=scankey@entry=0x178a350, nextkey=nextkey@entry=0 '\000', access=access@entry=1) at nbtsearch.c:194 #4 0x0048669f in _bt_search (rel=rel@entry=0x7fb47cc58f98, keysz=keysz@entry=1, scankey=scankey@entry=0x178a350, nextkey=nextkey@entry=0 '\000', bufP=bufP@entry=0x7fff0e5f8064, access=access@entry=2) at nbtsearch.c:86 #5 0x0048249f in _bt_doinsert (rel=rel@entry=0x7fb47cc58f98, itup=itup@entry=0x1788200, checkUnique=checkUnique@entry=UNIQUE_CHECK_YES, heapRel=heapRel@entry=0x7fb47cc581d8) at nbtinsert.c:118 #6 0x0048520d in btinsert (fcinfo=) at nbtree.c:257 #7 0x00728cfa in FunctionCall6Coll (flinfo=flinfo@entry=0x1758040, collation=collation@entry=0, arg1=arg1@entry=140413164162968, arg2=arg2@entry=140733434529104, arg3=arg3@entry=140733434529360, arg4=arg4@entry=24658596, arg5=arg5@entry=140413164159448, arg6=1) at fmgr.c:1440 #8 0x0047f109 in index_insert (indexRelation=indexRelation@entry=0x7fb47cc58f98, values=values@entry=0x7fff0e5f8550,
isnull=isnull@entry=0x7fff0e5f8650 "", heap_t_ctid=heap_t_ctid@entry=0x17842a4, heapRelation=heapRelation@entry=0x7fb47cc581d8, checkUnique=checkUnique@entry=UNIQUE_CHECK_YES) at indexam.c:216 #9 0x0058db1d in ExecInsertIndexTuples (slot=slot@entry=0x1751b30, tupleid=tupleid@entry=0x17842a4, estate=estate@entry=0x1750fc0) at execUtils.c:1089 #10 0x0059adc7 in ExecInsert (canSetTag=1 '\001', estate=0x1750fc0, planSlot=0x1751b30, slot=0x1751b30) at nodeModifyTable.c:248 #11 ExecModifyTable (node=node@entry=0x17511e0) at nodeModifyTable.c:854 #12 0x00584478 in ExecProcNode (node=node@entry=0x17511e0) at execProcnode.c:376 #13 0x00581c40 in ExecutePlan (dest=0x176f238, direction=, numberTuples=0, sendTuples=0 '\000', operation=CMD_INSERT, planstate=0x17511e0, estate=0x1750fc0) at execMain.c:1411 #14 standard_ExecutorRun (queryDesc=0x1758850, direction=, count=0) at execMain.c:315 #15 0x0065e3ae in ProcessQuery (plan=, sourceText=0x177ba80 "INSERT INTO token (id, expires, extra, valid, user_id, trust_id) VALUES ('d810c3f542414cfa86778249be70ab93', '2016-09-26T06:32:46.097976'::timestamp, '{\"token_data\": {\"token\": {\"methods\": [\"password\"]"..., params=0x0, dest=0x176f238, completionTag=0x7fff0e5f8c80 "") at pquery.c:185 #16 0x0065e5dd in PortalRunMulti (portal=portal@entry=0x1638380, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x176f238, altdest=altdest@entry=0x176f238, completionTag=completionTag@entry=0x7fff0e5f8c80 "") at pquery.c:1275 #17 0x0065f1a8 in PortalRun (portal=0x1638380, count=9223372036854775807, isTopLevel=, dest=0x176f238, altdest=0x176f238, completionTag=0x7fff0e5f8c80 "") at pquery.c:812 #18 0x0065b100 in PostgresMain (argc=, argv=, dbname=, username=) at postgres.c:1073 #19 0x006182c6 in BackendRun (port=0x163e9a0) at postmaster.c:3813 #20 BackendStartup (port=0x163e9a0) at postmaster.c:3484 #21 ServerLoop () at postmaster.c:1497 #22 0x00619077 in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x1616690) at postmaster.c:1226 #23 
0x0045bb45 in main (argc=1, argv=0x1616690) at main.c:230
Re: [HACKERS] WAL logging problem in 9.4.3?
Hello,

At Thu, 29 Sep 2016 16:59:55 +0900, Michael Paquier wrote in
> On Mon, Sep 26, 2016 at 5:03 PM, Kyotaro HORIGUCHI wrote:
> > Hello, I return to this before my things:)
> >
> > Though I haven't played with the patch yet..
>
> Be sure to run the test cases in the patch or base your tests on them then!

All items of 006_truncate_opt fail on ed0b228 and they are fixed with the patch.

> > Though I don't know how it actually impacts the performance, it
> > seems to me that we can live with truncated_to and sync_above in
> > RelationData and BufferNeedsWAL(rel, buf) instead of
> > HeapNeedsWAL(rel, buf). Anyway up to one entry for one relation
> > seems to exist at once in the hash.
>
> TBH, I still think that the design of this patch as proposed is pretty
> cool and easy to follow.

It is clean from a certain viewpoint, but the additional hash, especially the hash search on every HeapNeedsWAL call, seems unacceptable to me. Do you find it acceptable?

The attached patch is a quick-and-dirty hack of Michael's patch, just as a PoC of my proposal quoted above. This also passes the 006 test. The major changes are the following.

- Moved sync_above and truncated_to into RelationData.
- Cleaning up is done in AtEOXact_cleanup instead of explicitly calling smgrDoPendingSyncs().
* BufferNeedsWAL (a replacement of HeapNeedsWAL) no longer requires hash_search. It just refers to the additional members in the given Relation.
X I feel that I have dropped one of the features of the original patch during the hack, but I don't recall it clearly now:(
X I haven't considered relfilenode replacement, which didn't matter for the original patch. (But there are only a few places to consider.)

What do you think about this?
regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 38bba16..02e33cc 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -34,6 +34,28 @@ * the POSTGRES heap access method used for all POSTGRES * relations. * + * WAL CONSIDERATIONS + * All heap operations are normally WAL-logged. but there are a few + * exceptions. Temporary and unlogged relations never need to be + * WAL-logged, but we can also skip WAL-logging for a table that was + * created in the same transaction, if we don't need WAL for PITR or + * WAL archival purposes (i.e. if wal_level=minimal), and we fsync() + * the file to disk at COMMIT instead. + * + * The same-relation optimization is not employed automatically on all + * updates to a table that was created in the same transacton, because + * for a small number of changes, it's cheaper to just create the WAL + * records than fsyncing() the whole relation at COMMIT. It is only + * worthwhile for (presumably) large operations like COPY, CLUSTER, + * or VACUUM FULL. Use heap_register_sync() to initiate such an + * operation; it will cause any subsequent updates to the table to skip + * WAL-logging, if possible, and cause the heap to be synced to disk at + * COMMIT. + * + * To make that work, all modifications to heap must use + * HeapNeedsWAL() to check if WAL-logging is needed in this transaction + * for the given block. + * *- */ #include "postgres.h" @@ -55,6 +77,7 @@ #include "access/xlogutils.h" #include "catalog/catalog.h" #include "catalog/namespace.h" +#include "catalog/storage.h" #include "miscadmin.h" #include "pgstat.h" #include "storage/bufmgr.h" @@ -2331,12 +2354,6 @@ FreeBulkInsertState(BulkInsertState bistate) * The new tuple is stamped with current transaction ID and the specified * command ID. 
* - * If the HEAP_INSERT_SKIP_WAL option is specified, the new tuple is not - * logged in WAL, even for a non-temp relation. Safe usage of this behavior - * requires that we arrange that all new tuples go into new pages not - * containing any tuples from other transactions, and that the relation gets - * fsync'd before commit. (See also heap_sync() comments) - * * The HEAP_INSERT_SKIP_FSM option is passed directly to * RelationGetBufferForTuple, which see for more info. * @@ -2440,7 +2457,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid, MarkBufferDirty(buffer); /* XLOG stuff */ - if (!(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation)) + if (BufferNeedsWAL(relation, buffer)) { xl_heap_insert xlrec; xl_heap_header xlhdr; @@ -2639,12 +2656,10 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples, int ndone; char *scratch = NULL; Page page; - bool needwal; Size saveFreeSpace; bool need_tuple_data = RelationIsLogicallyLogged(relation); bool need_cids = RelationIsAccessibleInLogicalDecoding(relation); - needwal
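For illustration, the BufferNeedsWAL() idea described above — consulting two fields kept directly in the relation instead of probing a hash on every call — can be sketched as a minimal, self-contained program. The struct and field names here are stand-ins for the real PostgreSQL structures, not the actual patch code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Mock of the two fields the proposal moves into RelationData. */
typedef struct Relation
{
    bool        needs_wal;      /* stand-in for RelationNeedsWAL() */
    BlockNumber sync_above;     /* blocks >= this are fsync'd at COMMIT,
                                 * or InvalidBlockNumber if none */
} Relation;

/*
 * Sketch of the proposed BufferNeedsWAL(): no hash lookup, just a
 * check against per-relation state.  WAL is needed unless the block
 * falls in the range that will be synced to disk at commit anyway.
 */
static bool
BufferNeedsWAL(const Relation *rel, BlockNumber blkno)
{
    if (!rel->needs_wal)
        return false;           /* temp/unlogged relation */
    if (rel->sync_above != InvalidBlockNumber && blkno >= rel->sync_above)
        return false;           /* covered by the at-commit fsync */
    return true;
}
```

The point of the sketch is only the shape of the check: a couple of field comparisons per call, versus a hash_search per HeapNeedsWAL call in the original patch.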
Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.
On 2016/09/29 20:03, Ashutosh Bapat wrote:

> On Tue, Apr 5, 2016 at 10:54 AM, Amit Langote wrote:
>> How about the attached that teaches extract_query_dependencies() to add a foreign table and associated foreign data wrapper and foreign server to invalItems. Also, it adds plan cache callbacks for respective caches.
>
> Although this is a better solution than the previous one, it invalidates the query tree along with the generic plan. Invalidating the query tree and the generic plan requires parsing the query again and generating the plan. Instead I think we should invalidate only the generic plan and not the query tree.

Agreed.

> The attached patch adds the dependencies from create_foreignscan_plan(), which is called for any foreign access. I have also added testcases to test the functionality. Let me know your comments on the patch.

Hmm. I'm not sure that that's a good idea. I was thinking the changes to setrefs.c proposed by Amit to collect those dependencies would probably be OK, but I wasn't sure that it's a good idea that he used PlanCacheFuncCallback as the syscache inval callback function for the foreign object caches, because it invalidates not only generic plans but query trees, as you mentioned downthread. So, I was thinking to modify his patch so that we add a new syscache inval callback function for those caches that is much like PlanCacheFuncCallback but only invalidates generic plans.

Best regards,
Etsuro Fujita

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.
On 2016/09/29 20:06, Ashutosh Bapat wrote:

> On Wed, Aug 24, 2016 at 1:55 PM, Etsuro Fujita wrote:
>> On 2016/04/04 23:24, Tom Lane wrote:
>>> A related issue, now that I've seen this example, is that altering FDW-level or server-level options won't cause a replan either. I'm not sure there's any very good fix for that. Surely we don't want to try to identify all tables belonging to the FDW or server and issue relcache invals on all of them.
>>
>> While improving join pushdown in postgres_fdw, I happened to notice that the fetch_size option in 9.6 has the same issue. A forced replan discussed above would fix that issue, but I'm not sure that that's a good idea, because the fetch_size option, which postgres_fdw gets at GetForeignRelSize, is not used for anything in creating a plan. So, as far as the fetch_size option is concerned, a better solution is (1) to consult that option at execution time, probably at BeginForeignScan, not at planning time such as GetForeignRelSize (and (2) to not cause that replan when altering that option). Maybe I'm missing something, though.
>
> As explained in my latest patch, an FDW knows which of the options, when changed, render a plan invalid. If we have to track the changes for each option, an FDW will need to be consulted to check whether that option affects the plan or not. That may be overkill, and there is a high chance that FDW authors wouldn't bother implementing that hook.

I thought it would be better to track those dependencies to avoid useless replans, but I changed my mind; I agree with Amit's approach. In addition to what you mentioned, ISTM that users don't change such options frequently, so I think Amit's approach is more practical.

> Let me know your views on my latest patch on this thread.

OK, will do.

Best regards,
Etsuro Fujita

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Push down more full joins in postgres_fdw
On 2016/09/28 18:35, Etsuro Fujita wrote: Attached is an updated version of the patch. I found a minor bug in that patch; the relation_index added to PgFdwRelationInfo was defined as Index, but I used the modifier %d to print that. So, I changed the data type to int. Also, I added a bit more comments. Please find attached an updated version of the patch. Best regards, Etsuro Fujita postgres-fdw-more-full-join-pushdown-v5.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.
On Wed, Aug 24, 2016 at 1:55 PM, Etsuro Fujita wrote:
> On 2016/04/04 23:24, Tom Lane wrote:
>>
>> Amit Langote writes:
>>>
>>> On 2016/04/04 15:17, Rajkumar Raghuwanshi wrote:
>>>> Observation: Prepare statement execution plan is not getting changed even after altering foreign table to point to new schema.
>
>>> I wonder if performing relcache invalidation upon ATExecGenericOptions() is the correct solution for this problem. The attached patch fixes the issue for me.
>
>> A forced relcache inval will certainly fix it, but I'm not sure if that's the best (or only) place to put it.
>
>> A related issue, now that I've seen this example, is that altering FDW-level or server-level options won't cause a replan either. I'm not sure there's any very good fix for that. Surely we don't want to try to identify all tables belonging to the FDW or server and issue relcache invals on all of them.
>
> While improving join pushdown in postgres_fdw, I happened to notice that the fetch_size option in 9.6 has the same issue. A forced replan discussed above would fix that issue, but I'm not sure that that's a good idea, because the fetch_size option, which postgres_fdw gets at GetForeignRelSize, is not used for anything in creating a plan. So, as far as the fetch_size option is concerned, a better solution is (1) to consult that option at execution time, probably at BeginForeignScan, not at planning time such as GetForeignRelSize (and (2) to not cause that replan when altering that option). Maybe I'm missing something, though.

As explained in my latest patch, an FDW knows which of the options, when changed, render a plan invalid. If we have to track the changes for each option, an FDW will need to be consulted to check whether that option affects the plan or not. That may be overkill, and there is a high chance that FDW authors wouldn't bother implementing that hook.

Let me know your views on my latest patch on this thread.
-- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.
On Tue, Apr 5, 2016 at 10:54 AM, Amit Langote wrote:
> On 2016/04/05 0:23, Tom Lane wrote:
>> Amit Langote writes:
>>> On Mon, Apr 4, 2016 at 11:24 PM, Tom Lane wrote:
>>>> A related issue, now that I've seen this example, is that altering FDW-level or server-level options won't cause a replan either. I'm not sure there's any very good fix for that. Surely we don't want to try to identify all tables belonging to the FDW or server and issue relcache invals on all of them.
>>
>>> Hm, some kind of PlanInvalItem-based solution could work maybe?
>>
>> Hm, so we'd expect that whenever an FDW consulted the options while making a plan, it'd have to record a plan dependency on them? That would be a clean fix maybe, but I'm worried that third-party FDWs would fail to get the word about needing to do this.
>
> I would imagine that that level of granularity may be a little too much; I mean tracking dependencies at the level of individual FDW/foreign table/foreign server options. I think it should really be possible to do the entire thing in core instead of requiring this to be made a concern of FDW authors. How about the attached that teaches extract_query_dependencies() to add a foreign table and associated foreign data wrapper and foreign server to invalItems. Also, it adds plan cache callbacks for respective caches.

Although this is a better solution than the previous one, it invalidates the query tree along with the generic plan. Invalidating the query tree and the generic plan requires parsing the query again and generating the plan. Instead I think we should invalidate only the generic plan and not the query tree.

The attached patch adds the dependencies from create_foreignscan_plan(), which is called for any foreign access. I have also added testcases to test the functionality. Let me know your comments on the patch.
-- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index d97e694..7ca1102 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -2480,26 +2480,114 @@ EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st5('foo', 1); Filter: (t1.c8 = $1) Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" = $1::integer)) (4 rows) EXECUTE st5('foo', 1); c1 | c2 | c3 | c4 |c5| c6 | c7 | c8 ++---+--+--+++- 1 | 1 | 1 | Fri Jan 02 00:00:00 1970 PST | Fri Jan 02 00:00:00 1970 | 1 | 1 | foo (1 row) +-- changing foreign table option should change the plan +PREPARE st6 AS SELECT * FROM ft4; +PREPARE st7 AS UPDATE ft4 SET c1 = c1; +PREPARE st8 AS UPDATE ft4 SET c1 = random(); +EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st6; +QUERY PLAN +-- + Foreign Scan on public.ft4 + Output: c1, c2, c3 + Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 3" +(3 rows) + +EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st7; + QUERY PLAN + + Update on public.ft4 + -> Foreign Update on public.ft4 + Remote SQL: UPDATE "S 1"."T 3" SET c1 = c1 +(3 rows) + +EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st8; + QUERY PLAN +- + Update on public.ft4 + Remote SQL: UPDATE "S 1"."T 3" SET c1 = $2 WHERE ctid = $1 + -> Foreign Scan on public.ft4 + Output: random(), c2, c3, ctid + Remote SQL: SELECT c2, c3, ctid FROM "S 1"."T 3" FOR UPDATE +(5 rows) + +ALTER FOREIGN TABLE ft4 OPTIONS (SET table_name 'T 4'); +EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st6; +QUERY PLAN +-- + Foreign Scan on public.ft4 + Output: c1, c2, c3 + Remote SQL: SELECT c1, c2, c3 FROM "S 1"."T 4" +(3 rows) + +EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st7; + QUERY PLAN + + Update on public.ft4 + -> Foreign Update on public.ft4 + Remote SQL: UPDATE "S 1"."T 4" SET c1 = c1 +(3 rows) + +EXPLAIN (VERBOSE, COSTS OFF) EXECUTE st8; + QUERY PLAN +- + Update on public.ft4 + Remote SQL: 
UPDATE "S 1"."T 4" SET c1 = $2 WHERE ctid = $1 + -> Foreign Scan on public.ft4 + Output: random(), c2, c3, ctid + Remote SQL: SELECT c2, c3, ctid FROM "S
Re: [HACKERS] PATCH: Exclude additional directories in pg_basebackup
On 9/28/16 9:55 PM, Peter Eisentraut wrote: > On 9/28/16 2:45 AM, Michael Paquier wrote: >> After all that fixed, I have moved the patch to "Ready for Committer". >> Please use the updated patch though. > > Committed after some cosmetic changes. Thank you, Peter! -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tuplesort merge pre-reading
On Thu, Sep 29, 2016 at 10:49 AM, Heikki Linnakangas wrote:
>> Do I have that right? If so, this seems rather awkward. Hmm.
>
> How is it awkward?

Maybe that was the wrong choice of words. What I mean is that it seems somewhat unprincipled to give over an equal share of memory to each active-at-least-once tape, regardless of how much that matters in practice. One tape could have several runs in memory at once, while another only has a fraction of a single much larger run. Maybe this is just the first time that I find myself on the *other* side of a discussion about an algorithm that seems brute-force compared to what it might replace, but is actually better overall. :-) Now, that could be something that I just need to get over.

In any case, I still think:

* Variables like maxTapes have a meaning that is directly traceable back to Knuth's description of polyphase merge. I don't think that you should do anything to them, on general principle.

* Everything or almost everything that you've added to mergeruns() should probably be in its own dedicated function. This function can have a comment where you acknowledge that it's not perfectly okay that you claim memory per-tape, but it's simpler and faster overall.

* I think you should be looking at the number of active tapes, and not state->Level or state->currentRun, in this new function. Actually, maybe this wouldn't be the exact definition of an active tape that we establish at the beginning of beginmerge() (this considers tapes with dummy runs to be inactive for that merge), but it would look similar. The general concern I have here is that you shouldn't rely on state->Level or state->currentRun from a distance for the purposes of determining which tapes need some batch memory. This is also where you might say something like: "we don't bother with shifting memory around tapes for each merge step, even though that would be fairer. That's why we don't use the beginmerge() definition of activeTapes -- instead, we use our own broader definition of the number of active tapes that doesn't exclude dummy-run-tapes with real runs, making the allocation reasonably sensible for every merge pass".

* The "batchUsed" terminology really isn't working here, AFAICT. For one thing, you have two separate areas where caller tuples might reside: the small per-tape buffers (sized MERGETUPLEBUFFER_SIZE per tape), and the logtape.c-controlled preread buffers (sized up to MaxAllocSize per tape). Which of these two things is batch memory? I think it might just be the first one, but KiBs of memory do not suggest "batch" to me. Isn't that really more like what you might call double-buffer memory, used not to save overhead from palloc (having many palloc headers in memory), but rather to recycle memory efficiently? So, these two things should have two new names of their own, I think, and neither should be called "batch memory" IMV. I see several assertions remain here and there that were written with my original definition of batch memory in mind -- assertions like:

case TSS_SORTEDONTAPE:
    Assert(forward || state->randomAccess);
    Assert(!state->batchUsed);

(Isn't state->batchUsed always true now?) And:

case TSS_FINALMERGE:
    Assert(forward);
    Assert(state->batchUsed || !state->tuples);

(Isn't state->tuples only really of interest to datum-tuple-case routines, now that you've made everything use large logtape.c preread buffers?)

* Is it really necessary for !state->tuples cases to do that MERGETUPLEBUFFER_SIZE-based allocation? In other words, what need is there for pass-by-value datum cases to have this scratch/double-buffer memory at all, since the value is returned to caller by-value, not by-reference? This is related to the problem of it not being entirely clear what batch memory now is, I think.
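To make the memory-division point above concrete, the equal-share scheme under discussion amounts to something like the following. The names are illustrative only — this is not the actual tuplesort.c/logtape.c code — but the arithmetic is the one being debated: the available workMem is split evenly across the tapes considered active, with each tape's share capped at the allocator's limit:

```c
#include <assert.h>
#include <stddef.h>

#define MaxAllocSize ((size_t) 0x3fffffff)  /* 1 GB - 1, as in PostgreSQL's palloc */

/*
 * Illustrative per-tape preread buffer sizing: every tape that is
 * active at least once in the merge gets an equal share of the
 * remaining memory, regardless of how many runs it actually holds.
 */
static size_t
per_tape_buffer_size(size_t avail_mem, int num_active_tapes)
{
    size_t share;

    if (num_active_tapes <= 0)
        return 0;
    share = avail_mem / (size_t) num_active_tapes;
    if (share > MaxAllocSize)
        share = MaxAllocSize;
    return share;
}
```

The objection in the text is precisely that this split is uniform: a tape holding several small runs and a tape holding a fraction of one huge run receive the same share.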
-- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_basebackup stream xlog to tar
On Mon, Sep 5, 2016 at 10:01 AM, Michael Paquier wrote:
> On Sat, Sep 3, 2016 at 10:35 PM, Magnus Hagander wrote:
> > Ugh. That would be nice to have, but I think that's outside the scope of
> > this patch.
>
> A test for this patch that could have value would be to use
> pg_basebackup -X stream -Ft, then untar pg_xlog.tar and look at the
> size of the segments. If you have an idea to untar something without
> the in-core perl support, because we need to have the TAP stuff able to
> run on at least 5.8.8, I'd welcome an idea. Honestly I still have
> none, and that's why the recovery tests do not use tarballs in their
> tests when using pg_basebackup. In short let's not add something more
> for this patch.
>
> > PFA is an updated version of this patch that:
> > * documents a magic value passed to zlib (which is in their documentation as
> > being a magic value, but has no define)
> > * fixes the padding of tar files
> > * adds a most basic test that the -X stream -Ft does produce a tarfile
>
> So I had a more serious look at this patch, and it basically makes
> more generic the operations done for the plain mode by adding a set of
> routines that can be used by both tar and plain mode to work on the
> WAL files streamed. Elegant.
>
> +
> +The transaction log files will be written to
> + the base.tar file.
> +
> Nit: number of spaces here.

Fixed.

> -mark_file_as_archived(const char *basedir, const char *fname)
> +mark_file_as_archived(StreamCtl *stream, const char *fname)
> Just passing WalMethod as argument would be enough, but... My patch
> adding the fsync calls to pg_basebackup could just make use of
> StreamCtl, so let's keep it as you suggest.

Yeah, I think it's cleaner to pass the whole structure around really. If not now, we'd need it eventually. That makes all callers more consistent.

> static bool
> existsTimeLineHistoryFile(StreamCtl *stream)
> {
> [...]
> + return stream->walmethod->existsfile(histfname);
> }
> existsfile returns always false for the tar method. This does not
> matter much because pg_basebackup exits immediately in case of a
> failure, but I think that this deserves a comment in ReceiveXlogStream
> where existsTimeLineHistoryFile is called.

OK, added. As you say, the behaviour is expected, but it makes sense to mention it clearly there.

> I find the use of existsfile() in open_walfile() rather confusing
> because this relies on the fact that existsfile() returns always
> false for the tar mode. We could add an additional field in WalMethod
> to store the method type and use that more, but that may make the code
> more confusing than what you propose. What do you think?

Yeah, I'm not sure that helps. The point is that the abstraction is supposed to take care of that. But if it's confusing, then clearly a comment is warranted there, so I've added that. Do you think that makes it clear enough?

> + int (*unlink) (const char *pathname);
> The unlink method is used nowhere. This could just be removed.

That's clearly a missed cleanup. Removed, thanks.

> -static void
> +void
> print_tar_number(char *s, int len, uint64 val)
> This could be an independent patch. Or not.

Could be, but we don't really have any other uses for it.

> I think that I found another bug regarding the contents of the
> segments. I did pg_basebackup -F t -X stream, then untared pg_xlog.tar
> which contained segment 1/0/2, then:
> $ pg_xlogdump 00010002
> pg_xlogdump: FATAL: could not find a valid record after 0/200
> I'd expect this segment to have records, up to a XLOG_SWITCH record.

Ugh. That's definitely broken yes. It seeked back and overwrote the tar header with the data, instead of starting where the file part was supposed to be. It worked fine on compressed files, and it's when implementing that that it broke.
So what's our basic rule for these perl tests - are we allowed to use pg_xlogdump from within a pg_basebackup test? If so that could actually be a useful test - do the backup, extract the xlog and verify that it contains the SWITCH record. I also noticed that using -Z5 created a .tar.gz and -z created a .tar (which was compressed). Because compresslevel is set to -1 with -z, meaning default. Again, apologies for getting late back into the game here. //Magnus diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml index 9f1eae1..a4236a5 100644 --- a/doc/src/sgml/ref/pg_basebackup.sgml +++ b/doc/src/sgml/ref/pg_basebackup.sgml @@ -180,7 +180,8 @@ PostgreSQL documentation target directory, the tar contents will be written to standard output, suitable for piping to for example gzip. This is only possible if -the cluster has no additional tablespaces. +the cluster has no additional tablespaces and transaction +log
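As an aside on the tar-format details touched on above, a helper like print_tar_number() writes a value into a tar header field as a zero-padded octal string. The following is a hedged, standalone sketch of that idea only — the real pg_basebackup helper may also handle values too large to fit in octal (the base-256 extension), which is omitted here:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

/*
 * Illustrative version of a print_tar_number()-style helper: format
 * val into a tar header numeric field of the given length as octal,
 * zero-padded on the left and NUL-terminated.  (Real tar headers also
 * allow a base-256 encoding for values that don't fit; not shown.)
 */
static void
print_tar_number(char *s, int len, uint64_t val)
{
    /* len - 1 octal digits plus the terminating NUL */
    snprintf(s, (size_t) len, "%0*llo", len - 1, (unsigned long long) val);
}
```

For example, a 12-byte size field holding 1024 comes out as the octal string "00000002000", which is also why tar padding bugs (data written over a header, or a member not padded to a 512-byte boundary) corrupt everything that follows in the archive.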
Re: [HACKERS] Add support for restrictive RLS policies
Hi Stephen,

>> 4. It will be good if we have an example for this in section
>> "5.7. Row Security Policies"
>
> I haven't added one yet, but will plan to do so.

I think you are going to add this in this patch itself, right?

I have reviewed your latest patch and it fixes almost all my review comments. I also agree with your responses to a couple of the comments, such as the responses on ALTER POLICY and tab completion. No issues with that.

However, in the documentation, PERMISSIVE and RESTRICTIVE are actually literals and not parameters as such. Also, can we combine these two options into one, like below (similar to how we document CASCADE and RESTRICT for DROP POLICY)?

PERMISSIVE
RESTRICTIVE

... explain PERMISSIVE ...
... explain RESTRICTIVE ...

Apart from this, the changes look excellent to me.

Thanks
--
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Re: [HACKERS] less expensive pg_buffercache on big shmem
On 09/29/2016 02:45 AM, Ivan Kartyshov wrote:
>> Secondly, I see this bit added to the loop over buffers:
>>
>> if (bufHdr->tag.forkNum == -1)
>> {
>> fctx->record[i].blocknum = InvalidBlockNumber;
>> continue;
>> }
>>
>> and I have no idea why this is needed (when it was not needed before).
>
> This helps to skip unused buffer pages. It is valuable with big, not-yet-warmed-up shared memory.

That seems like an unrelated change. Checking for forkNum, instead of e.g. BM_VALID, seems a bit surprising, too.

> On 09/02/2016 12:01 PM, Robert Haas wrote:
>> I think we certainly want to lock the buffer header, because otherwise we might get a torn read of the buffer tag, which doesn't seem good. But it's not obvious to me that there's any point in taking the lock on the buffer mapping partition; I'm thinking that doesn't really do anything unless we lock them all, and we all seem to agree that's going too far.
>
> Replaced the consistent method with a semiconsistent one (which locks buffer headers without the partition lock). Made some additional fixes thanks to reviewers.

This patch also does:

+++ b/contrib/pg_buffercache/pg_buffercache--1.2--1.3.sql
@@ -0,0 +1,6 @@
+/* contrib/pg_buffercache/pg_buffercache--1.2--1.3.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_buffercache(text) UPDATE TO '1.3'" to load this file. \quit
+
+ALTER FUNCTION pg_buffercache_pages() PARALLEL SAFE;

Why? That seems unrelated to what's been discussed in this thread.

I have committed the straightforward part of this, removing the partition-locking. I think we're done here for this commitfest, but feel free to post new patches for that PARALLEL SAFE and the quick-check for unused buffers, if you think it's worthwhile.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug in to_timestamp().
2016-09-29 13:54 GMT+05:00 amul sul: > > On Thu, Sep 29, 2016 at 2:48 AM, Tom Lane wrote: > > > > I started looking at your 0001-to-timestamp-format-checking-v4.patch > > and this point immediately jumped out at me. Currently the code relies > > ... without any documentation ... on no elog being thrown out of > > parse_format(). That's at the very least trouble waiting to happen. > > There's a hack to deal with errors from within the NUMDesc_prepare > > subroutine, but it's a pretty ugly and underdocumented hack. And what > > you had here was randomly different from that solution, too. > > > > After a bit of thought it seemed to me that a much cleaner fix is to add > > a "valid" flag to the cache entries, which we can leave clear until we > > have finished parsing the new format string. That avoids adding extra > > data copying as you suggested, removes the need for PG_TRY, and just > > generally seems cleaner and more bulletproof. > > > > I've pushed a patch that does it that way. The 0001 patch will need > > to be rebased over that (might just require removal of some hunks, > > not sure). > > > > I also pushed 0002-to-timestamp-validation-v2.patch with some revisions > > (it'd broken acceptance of BC dates, among other things, but I think > > I fixed everything). Thank you for committing the 0002 part of the patch! I wanted to fix cache functions too, but wasn't sure about that. > > > > Since you told us earlier that you'd be on vacation through the end of > > September, I'm assuming that nothing more will happen on this patch during > > this commitfest, so I will mark the CF entry Returned With Feedback. > > Behalf of Artur I've rebased patch, removed hunk dealing with broken > cache entries by copying it, which is no longer required after 83bed06 > commit. > > Commitfest status left untouched, I am not sure how to deal with > "Returned With Feedback" patch. Is there any chance that, this can be > considered again for committer review? 
Thank you for fixing the patch! Today I have access to the internet and am able to fix and test the patch.

I've looked at your 0001-to-timestamp-format-checking-v5.patch. It seems good to me. I suppose it is necessary to fix the is_char_separator() declaration, from:

static bool is_char_separator(char *str);

to:

static bool is_char_separator(const char *str);

because now the *str argument in parse_format() is const. I attached a new version of the patch, which fixes the is_char_separator() declaration too. Sorry for the confusion!

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

0001-to-timestamp-format-checking-v6.patch Description: Binary data

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Tuplesort merge pre-reading
On 09/28/2016 07:20 PM, Peter Geoghegan wrote:
> On Wed, Sep 28, 2016 at 5:11 PM, Peter Geoghegan wrote:
>> This is why I never pursued batch memory for non-final merges.
>
> Isn't that what you're doing here? You're pretty much always setting "state->batchUsed = true". Wait. I guess you feel you have to, since it wouldn't be okay to use almost no memory per tape on non-final merges. You're able to throw out so much code here in large part because you give almost all memory over to logtape.c (e.g., you don't manage each tape's share of "slots" anymore -- better to give everything to logtape.c). So, with your patch, you would actually only have one caller tuple in memory at once per tape for a merge that doesn't use batch memory (if you made it so that a merge *could* avoid the use of batch memory, as I suggest).

Correct.

> In summary, under your scheme, the "batchUsed" variable contains a tautological value, since you cannot sensibly not use batch memory. (This is even true with !state->tuples callers).

I suppose.

> Do I have that right? If so, this seems rather awkward. Hmm.

How is it awkward?

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] An extra error for client disconnection on Windows
Hello, At Tue, 13 Sep 2016 10:00:32 -0300, Alvaro Herrera wrote in <20160913130032.GA391646@alvherre.pgsql> > Michael Paquier wrote: > > On Tue, Sep 13, 2016 at 10:42 AM, Kyotaro HORIGUCHI > > wrote: > > > If we take a policy of trying to imitate the behavior of some > > > reference platform (specifically Linux) on other platforms, this > > > disguising is required. Another potential policy on this problem > > > is "following the platform's behavior". From this viewpoint, this > > > message should be shown to users because Windows says > > > so. Especially for socket operations, the simulation layer is > > > intending the former for non-error behaviors, but I'm not sure > > > about the behaviors on errors. > > > > The more you hack Windows, the more you'll notice that it is full of > > caveats and behavior exceptions, and that it runs in its own way like nothing > > else in this world... This patch looks like a tempest in a teapot in > > the end. Why is it actually a problem to show this message? Just > > useless noise? If that's the only reason, let's drop the patch and move > > on. > > Yeah, I looked into this a few days ago and that was my conclusion also: > let's drop this. OK, agreed. Thanks. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] Bug in to_timestamp().
On Thu, Sep 29, 2016 at 2:48 AM, Tom Lane wrote: > Artur Zakirov writes: >> - now DCH_cache_getnew() is called after parse_format(). Because now >> parse_format() can raise an error and in the next attempt >> DCH_cache_search() could return a broken cache entry. > > I started looking at your 0001-to-timestamp-format-checking-v4.patch > and this point immediately jumped out at me. Currently the code relies > ... without any documentation ... on no elog being thrown out of > parse_format(). That's at the very least trouble waiting to happen. > There's a hack to deal with errors from within the NUMDesc_prepare > subroutine, but it's a pretty ugly and underdocumented hack. And what > you had here was randomly different from that solution, too. > > After a bit of thought it seemed to me that a much cleaner fix is to add > a "valid" flag to the cache entries, which we can leave clear until we > have finished parsing the new format string. That avoids adding extra > data copying as you suggested, removes the need for PG_TRY, and just > generally seems cleaner and more bulletproof. > > I've pushed a patch that does it that way. The 0001 patch will need > to be rebased over that (might just require removal of some hunks, > not sure). > > I also pushed 0002-to-timestamp-validation-v2.patch with some revisions > (it'd broken acceptance of BC dates, among other things, but I think > I fixed everything). > > Since you told us earlier that you'd be on vacation through the end of > September, I'm assuming that nothing more will happen on this patch during > this commitfest, so I will mark the CF entry Returned With Feedback. On behalf of Artur, I've rebased the patch and removed the hunk that dealt with broken cache entries by copying them, which is no longer required after commit 83bed06. I left the commitfest status untouched; I am not sure how to deal with a "Returned With Feedback" patch. Is there any chance that this can be considered again for committer review? Thanks!
Regards, Amul 0001-to-timestamp-format-checking-v5.patch Description: Binary data
Re: [HACKERS] asynchronous and vectorized execution
Hello, thank you for the comment. At Fri, 23 Sep 2016 18:15:40 +0530, Amit Khandekarwrote in
Re: [HACKERS] Floating point comparison inconsistencies of the geometric types
> Well, those two results are not contradictory -- notice that you > switched the order of the values in the comparison. I don't think > you've really found the explanation yet. I am sorry, I copy-pasted the wrong example. The "select_views" test runs: > SELECT name, #thepath FROM iexit ORDER BY 1, 2 and expects to get rows in this order: > I- 580Ramp |8 > I- 580/I-680 Ramp |2 With the collation on my laptop, this is actually true: > regression=# select 'I- 580Ramp' < 'I- 580/I-680 > Ramp'; > ?column? > -- > t > (1 row) On the Linux server I am testing, it is not: > regression=# select 'I- 580Ramp' < 'I- 580/I-680 > Ramp'; > ?column? > -- > f > (1 row) The latter should be the case in your environment as well, as the test was also failing for you. This is not consistent with the expected test result. Do you know how this test can still pass on master?
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Tue, Aug 30, 2016 at 11:04 AM, Michael Paquier wrote: > On Mon, Aug 29, 2016 at 9:39 PM, Craig Ringer > wrote: >> Cool. I'll mark as waiting on author pending that. >> >> It'll be good to get this footgun put away. > > OK, so done. I have put the renaming of pg_xlog to pg_wal on top of the patch > stack as that's the one generating no discussion, it seems: a lot of people > like pg_wal. I have added as well handling for the renaming in > pg_basebackup by using PQserverVersion. One thing to note is that a > connection needs to be made to the target server *before* creating the > soft link of pg_xlog/pg_wal, because we need to know the version of the > target server. pg_upgrade is handled as well. And that's all in 0001. > > Patch 0002 does pg_clog -> pg_trans, "trans" standing for > "transaction". Switching to pg_trans_status or pg_xact_status is not > that complicated to change anyway... Any input to offer for those patches? If there is nothing happening, I guess that the best move is just to move it to the next CF. At least I can see that the flow would be to get those renamings done. -- Michael
Re: [HACKERS] IF (NOT) EXISTS in psql-completion
Hello, At Tue, 20 Sep 2016 16:50:29 +0900, Michael Paquier wrote in > On Mon, Sep 19, 2016 at 6:11 PM, Pavel Stehule > wrote: > > I am thinking the commit's description should be inside a README > > Horiguchi-san, your patch has some whitespace issues, you may want to > do a run with git diff --check. Here are some things I have spotted: > src/bin/psql/tab-complete.c:1074: trailing whitespace. > +"MATERIALIZED VIEW", > src/bin/psql/tab-complete.c:2621: trailing whitespace. > + COMPLETE_WITH_QUERY(Query_for_list_of_roles, Thank you very much for pointing it out. I put in a pre-commit hook to check for that, so as not to make such a mistake again. http://stackoverflow.com/questions/591923/make-git-automatically-remove-trailing-whitespace-before-committing/22704385#22704385 > This set of patches is making psql tab completion move into a better > shape, particularly with 0001 that removes the legendary huge if-elif > and just has the routine return immediately in case of a keyword match. > Things could be shortened a little bit more by, for example, not doing > the refactoring of the tab macros, because they are just needed in > tab-complete.c. The other patches introduce further improvements for > the existing infrastructure, but that's a lot of things just for > adding IF [NOT] EXISTS, to be honest. It was the motive for this, but even excluding it, some syntaxes with optional keywords can be simplified or enriched with the new macros. CREATE SCHEMA's schema elements, CREATE INDEX and some other syntaxes are simplified using the feature. > Testing a bit, I have noticed that, for example, after typing > "create table if", if I attempt to do a tab completion, "not exists" > does not show up. I suspect that the other commands are failing at > that as well. I suppose it is "create table if ", with a space at the tail. It is a general issue with the suggestion of combined keywords in the whole tab-completion mechanism (or a limitation of readline).
Some syntaxes have explicit completion for such cases. For example, "create foreign " gets a suggestion of "DATA WRAPPER" since it has an explicit suggestion step. > /* ALTER FOREIGN */ > if (Matches2("ALTER", "FOREIGN")) > COMPLETE_WITH_LIST2("DATA WRAPPER", "TABLE"); It is apparently solvable, but needs additional code to suggest the remaining words at every step. It should be another issue. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] "Re: Question about grant create on database and pg_dump/pg_dumpall
On Mon, Sep 26, 2016 at 2:29 PM, Rafia Sabih wrote: > On Tue, Jul 5, 2016 at 06:39 AM, Haribabu Kommi > kommi(dot)haribabu(at)gmail(dot)com wrote: > > Still I feel the GRANT statements should be present, as the create > database statement > is generated only with the -C option. So the attached patch produces the GRANT > statements based > on the -x option. > > > The attached patch does the job fine. However, I am a little skeptical > about this addition, since it is clearly mentioned in the documentation of > pg_dump that it would not restore global objects, so why expect this? > This addition makes pg_dump -C somewhat special, as now it is restoring > these grant statements. Only if we consider the popular method of > dump-restore mentioned in the thread (https://www.postgresql.org/message-id/E1VYMqi-0001P4-P4%40wrigleys.postgresql.org) with pg_dumpall -g > and then individual pg_dump, then it would be helpful to have this patch. > Thanks for your comments. I am also not sure whether pg_dumpall -g followed by individual pg_dump is the more widely used approach or not. If it is the case, it is better to fix the grant statements with the -C option; otherwise I agree that this patch is not required. Any opinions from other members? Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] WAL logging problem in 9.4.3?
On Mon, Sep 26, 2016 at 5:03 PM, Kyotaro HORIGUCHI wrote: > Hello, I return to this before my other things :) > > Though I haven't played with the patch yet.. Be sure to run the test cases in the patch or base your tests on them, then! > Though I don't know how it actually impacts the performance, it > seems to me that we can live with truncated_to and sync_above in > RelationData and BufferNeedsWAL(rel, buf) instead of > HeapNeedsWAL(rel, buf). Anyway, at most one entry per relation > seems to exist at once in the hash. TBH, I still think that the design of this patch as proposed is pretty cool and easy to follow. -- Michael
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Sep 29, 2016 at 12:56 PM, Dilip Kumar wrote: > On Thu, Sep 29, 2016 at 6:40 AM, Tomas Vondra > wrote: >> Yes, definitely - we're missing something important, I think. One difference >> is that Dilip is using longer runs, but I don't think that's a problem (as I >> demonstrated how stable the results are). >> >> I wonder what CPU model Dilip is using - I know it's x86, but not which >> generation it is. I'm using an E5-4620 v1 Xeon; perhaps Dilip is using a newer >> model and it makes a difference (although that seems unlikely). > > I am using "Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz" > Another difference is that the machine on which Dilip is doing tests has 8 sockets. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Sep 29, 2016 at 6:40 AM, Tomas Vondra wrote: > Yes, definitely - we're missing something important, I think. One difference > is that Dilip is using longer runs, but I don't think that's a problem (as I > demonstrated how stable the results are). > > I wonder what CPU model Dilip is using - I know it's x86, but not which > generation it is. I'm using an E5-4620 v1 Xeon; perhaps Dilip is using a newer > model and it makes a difference (although that seems unlikely). I am using "Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz" -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] IF (NOT) EXISTS in psql-completion
Thank you for reviewing! At Mon, 19 Sep 2016 11:11:03 +0200, Pavel Stehule wrote in
Re: [HACKERS] Handling dropped attributes in pglogical_proto
On 29 September 2016 at 12:53, Petr Jelinek wrote: > On 29/09/16 05:33, Michael Paquier wrote: >> On Wed, Sep 28, 2016 at 11:25 PM, Konstantin Knizhnik >> wrote: >>> But if the table was just altered and some attribute was removed from the table, >>> then rel->natts can be greater than natts. >> >> This is part of pglogical, so you may want to reply on the dedicated >> thread or send a patch directly to them. By the way, this code path >> may need to care as well about attisdropped. It is never good not to >> check for it. >> > > Agreed, this does not belong on hackers and should be reported on github. > > But just as a note, the rel variable is not Relation, it's > PGLogicalRelation, which gets populated by a relation message from the > upstream, so if you observe the behavior mentioned in the original email, > you are probably doing something unexpected there (Konstantin is not > using vanilla pglogical). Also the attmap should never contain > attisdropped attributes. Yeah, and if you're relying on attno equivalence you're also going to have to deal with it in dump/restore unless you only create new nodes using physical copy. (See BDR's bdr_dump.) BDR relies on attno equivalence. pglogical avoids doing so because of the pain it caused in BDR. On a side note, do NOT report bugs on ANY software when you're talking about a modified version unless you clearly disclose that it's modified. You'll waste everyone's time. We already spent a while looking into a logical decoding issue you reported that turned out to be caused by your team's patches to add 2PC decoding support, which you didn't mention in the report. If you want anyone to listen to your reports you really need to be very explicit about whether it's a modified version or not. If it's modified, make sure you can reproduce the problem in the unmodified version or explain the situation more.
If you want to ask things about BDR and pglogical, please use bdr-l...@2ndquadrant.com, since it's now clear that in-core logical replication is going in another direction and -hackers isn't the place for it. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Notice lock waits
On Sat, Aug 6, 2016 at 3:00 AM, Jeff Janes wrote: > One time too many, I ran some minor change using psql on a production > server and was wondering why it was taking so much longer than it did > on the test server. Only to discover, after messing around with > opening new windows and running queries against pg_stat_activity and > pg_locks and so on, that it was waiting for a lock. > > So I created a new guc, notice_lock_waits, which acts like > log_lock_waits but sends the message as NOTICE so it will show up on > interactive connections like psql. > > I turn it on in my .psqlrc, as it doesn't make much sense for me to > turn it on in non-interactive sessions. > > A general facility for promoting selected LOG messages to NOTICE would > be nice, but I don't know how to design or implement that. This is > much easier, and I find it quite useful. > > I have it PGC_SUSET because it does send some tiny amount of > information about the blocking process (the PID) to the blocked > process. That is probably too paranoid, because the PID can be seen > by anyone in the pg_locks table anyway. > > Do you think this is useful and generally implemented in the correct > way? If so, I'll try to write some sgml documentation for it. > Providing the details of a lock wait to the client is good. I feel this message is useful for cases where a user or administrator is trying to perform some SQL operations. I also feel that adding a new GUC variable to show these logs to the user may not be good; changing the existing GUC may be better. I am not sure whether it is really beneficial to provide all LOG messages as NOTICE messages with a generic framework; it may be unnecessary overhead for some users, but I am not 100% sure. Regards, Hari Babu Fujitsu Australia