Re: [HACKERS] Tracking wait event for latches
On Sat, Sep 24, 2016 at 1:44 AM, Michael Paquier wrote:
> After digesting all the comments given, I have produced the patch
> attached that uses the following categories:
> +const char *const WaitEventNames[] = {
> +	/* activity */
> +	"ArchiverMain",
> +	"AutoVacuumMain",
> +	"BgWriterHibernate",
> +	"BgWriterMain",
> +	"CheckpointerMain",
> +	"PgStatMain",
> +	"RecoveryWalAll",
> +	"RecoveryWalStream",
> +	"SysLoggerMain",
> +	"WalReceiverMain",
> +	"WalSenderMain",
> +	"WalWriterMain",
> +	/* client */
> +	"SecureRead",
> +	"SecureWrite",
> +	"SSLOpenServer",
> +	"WalReceiverWaitStart",
> +	"WalSenderWaitForWAL",
> +	"WalSenderWriteData",
> +	/* Extension */
> +	"Extension",
> +	/* IPC */
> +	"BgWorkerShutdown",
> +	"BgWorkerStartup",
> +	"ExecuteGather",
> +	"MessageQueueInternal",
> +	"MessageQueuePutMessage",
> +	"MessageQueueReceive",
> +	"MessageQueueSend",
> +	"ParallelFinish",
> +	"ProcSignal",
> +	"ProcSleep",
> +	"SyncRep",
> +	/* timeout */
> +	"BaseBackupThrottle",
> +	"PgSleep",
> +	"RecoveryApplyDelay",
> +};
> I have moved WalSenderMain as it tracks a main loop, so it was strange
> to not have it in Activity. I also kept SecureRead and SecureWrite
> because they refer to the functions of the same name. For the rest, I
> was convinced by what has been discussed, and it makes things clearer.
> My head no longer spins when I try to keep the list of names in sync
> with its associated enum, which is good. I have also rewritten the
> docs to match.
-extern int WaitEventSetWait(WaitEventSet *set, long timeout, WaitEvent *occurred_events, int nevents);
-extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout);
+extern int WaitEventSetWait(WaitEventSet *set, long timeout,
+				WaitEvent *occurred_events, int nevents,
+				uint8 classId, uint16 eventId);
+extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout,
+				uint8 classId, uint16 eventId);
 extern int WaitLatchOrSocket(volatile Latch *latch, int wakeEvents,
-				pgsocket sock, long timeout);
+				pgsocket sock, long timeout, uint8 classId,
+				uint16 eventId);

+	WaitLatch(MyLatch, WL_LATCH_SET, 0, WAIT_IPC, WE_MQ_RECEIVE);

If the class really is strictly implied by the WaitEventIdentifier, then do we really need to supply it everywhere when calling the various wait functions? That's going to be quite a few functions: WaitLatch, WaitLatchOrSocket, and WaitEventSetWait for now, and if some more patches out there have legs then also ConditionVariableWait, BarrierWait, and possibly further kinds of wait points. And then all their call sites will have to supply the wait class ID, even though it is implied by the other ID. Perhaps the array that currently holds the names should instead hold { classId, displayName } so we could just look it up?

-- 
Thomas Munro
http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: Question about grant create on database and pg_dump/pg_dumpall
On Tue, Jul 5, 2016 at 06:39 AM, Haribabu Kommi kommi(dot)haribabu(at)gmail(dot)com wrote:
> Still I feel the GRANT statements should be present, as the CREATE
> DATABASE statement is generated only with the -C option. So the
> attached patch produces the GRANT statements based on the -x option.

The attached patch does the job fine. However, I am a little skeptical about this addition: since the documentation of pg_dump clearly states that it does not restore global objects, why expect this? This addition makes pg_dump -C somewhat special, as it now restores these GRANT statements. Only if we consider the popular dump-restore method mentioned in the thread (https://www.postgresql.org/message-id/E1VYMqi-0001P4-P4%40wrigleys.postgresql.org), i.e. pg_dumpall -g followed by individual pg_dump runs, would this patch be helpful.

Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE
On Mon, Sep 26, 2016 at 5:11 PM, Thomas Munro wrote:
> It seems that the version of docbook that you get if you follow the
> instructions[1] ...

And I mean these instructions:

[1] https://www.postgresql.org/docs/devel/static/docguide-toolsets.html

-- 
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE
On Mon, Sep 26, 2016 at 1:18 PM, Thomas Munro wrote:
> On Mon, Sep 19, 2016 at 4:02 PM, David Fetter wrote:
> >
> > [training_wheels_004.patch]
>
> openjade:filelist.sgml:144:16:E: character "_" invalid: only parameter
> literal, "CDATA", "ENDTAG", "MD", "MS", "PI", "PUBLIC", "SDATA",
> "STARTTAG", "SYSTEM" and parameter separators allowed
> openjade:contrib.sgml:138:2:W: cannot generate system identifier for
> general entity "require"
>
> The documentation doesn't build here, I think because require_where is
> not an acceptable entity name. It works for me if I change the
> underscore to a minus in various places.

It seems that the version of docbook that you get if you follow the instructions[1] on CentOS is OK with the underscore in entity names, but the version you get if you follow the instructions for macOS + MacPorts doesn't like it. I didn't investigate exactly which component or version was behind that, but it's clear that other entity names use hyphens instead of underscores, so I would recommend making this change to your patch so we don't change the version requirements for building the docs:

diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 7ca62a4..48ca717 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -135,7 +135,7 @@ CREATE EXTENSION module_name FROM unpackaged;
- &require_where;
+ &require-where;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 8079cbd..4552273 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -141,7 +141,7 @@
-<!ENTITY require_where SYSTEM "require_where.sgml">
+<!ENTITY require-where SYSTEM "require_where.sgml">

-- 
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE
On Mon, Sep 26, 2016 at 5:48 AM, Thomas Munro wrote:
> On Mon, Sep 19, 2016 at 4:02 PM, David Fetter wrote:
> >
> > [training_wheels_004.patch]
>
> openjade:filelist.sgml:144:16:E: character "_" invalid: only parameter
> literal, "CDATA", "ENDTAG", "MD", "MS", "PI", "PUBLIC", "SDATA",
> "STARTTAG", "SYSTEM" and parameter separators allowed
> openjade:contrib.sgml:138:2:W: cannot generate system identifier for
> general entity "require"
>
> The documentation doesn't build here, I think because require_where is
> not an acceptable entity name. It works for me if I change the
> underscore to a minus in various places. That fixes these errors:
>
> +
> + Here is an example showing how to set up a database cluster with
> + require_where.
> +
> +$ psql -U postgres
> +# SHOW shared_preload_libraries; /* Make sure not to clobber
> something by accident */
> +
> +If you found something,
> +# ALTER SYSTEM SET
> shared_preload_libraries='the,stuff,you,found,require_where';
> +
> +Otherwise,
> +# ALTER SYSTEM SET shared_preload_libraries='require_where';
> +
> +Then restart PostgreSQL
> +
> +
>
> Could use a full stop (period) on the end of that sentence. Also it
> shouldn't be inside the "screen" tags. Maybe "If you found
> something," and "Otherwise," shouldn't be either, or should somehow be
> marked up so as not to appear to be text from the session.
>
> postgres=# delete from foo;
> ERROR: DELETE requires a WHERE clause
> HINT: To delete all rows, use "WHERE true" or similar.
>
> Maybe one of those messages could use some indication of where this is
> coming from, for surprised users encountering this non-standard
> behaviour for the first time?

+1. I think the hint message should be clearer about where this is coming from. Maybe it could mention the GUC name, or suggest turning the GUC off if you really want to run DELETE without a WHERE clause, or something along those lines.
> FWIW I saw something similar enforced globally by the DBA team at a
> large company with many database users. I think experienced users
> probably initially felt mollycoddled when they first encountered the
> error but I'm sure that some were secretly glad of its existence from
> time to time... I think it's a useful feature for users who want it,
> and a nice little demonstration of how extensible Postgres is.
>
> --
> Thomas Munro
> http://www.enterprisedb.com

-- 
Rushabh Lathia
Re: [HACKERS] Use of SizeOfIptrData - is that obsolete?
On Fri, Sep 23, 2016 at 12:04 AM, Tom Lane wrote:
>
> I thought removing the comment altogether was not appropriate, because
> it remains true that you want to work really hard to ensure that
> sizeof(ItemPointerData) is 6. We're just giving up on the pretense of
> support for compilers that don't believe that.

Makes sense.

> I'm half tempted to introduce a StaticAssert about it, but refrained
> for the moment.

I also thought about that, and it probably makes sense, at least to see how the buildfarm behaves. One reason to do so is that I did not find any discussion or evidence of why the SizeOfIptrData magic is no longer necessary, and to see whether it was an unintentional change at some point.

> > While htup.h refactoring happened in 9.5, I don't see any point in
> > back patching this.
>
> Agreed. Pushed to HEAD only.

Thanks.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Possibly too stringent Assert() in b-tree code
On Mon, Sep 26, 2016 at 8:54 AM, Amit Kapila wrote:
> On Sun, Sep 25, 2016 at 9:31 PM, Tom Lane wrote:
>> Amit Kapila writes:
>>> On Fri, Sep 23, 2016 at 12:21 AM, Tom Lane wrote:
>>>> This is clearly an oversight in Simon's patch fafa374f2, which
>>>> introduced this code without any consideration for the possibility
>>>> that the page doesn't have a valid special area. ... but I'm not
>>>> very clear on whether this is a safe fix, mainly because I don't
>>>> understand what the reuse WAL record really accomplishes. Maybe we
>>>> need to instead generate a reuse record with some special
>>>> transaction ID indicating worst-case assumptions?
>>
>>> I think it is basically to ensure that the queries on standby should
>>> not be considering the page being recycled as valid and if there is
>>> any pending query to which this page is visible, cancel it. If this
>>> understanding is correct, then I think the fix you are proposing is
>>> correct.
>>
>> After thinking about it some more, I do not understand why we are
>> emitting WAL here at all in *any* case, either for a new page or for
>> a deleted one that we're about to recycle. Why wouldn't the
>> appropriate actions have been taken when the page was deleted, or
>> even before that when its last tuple was deleted?
>
> It seems to me that we do take actions for conflict resolution during
> the page deletion (that looks to be covered by XLOG_HEAP2_CLEANUP_INFO
> which we emit in vacuum), but not sure if that is sufficient.
> Consider a case where the new transaction is started on standby after
>

By "new transaction" here, I mean some newer snapshot with a valid MyPgXact->xmin.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Possibly too stringent Assert() in b-tree code
On Sun, Sep 25, 2016 at 9:31 PM, Tom Lane wrote:
> Amit Kapila writes:
>> On Fri, Sep 23, 2016 at 12:21 AM, Tom Lane wrote:
>>> This is clearly an oversight in Simon's patch fafa374f2, which
>>> introduced this code without any consideration for the possibility
>>> that the page doesn't have a valid special area. ...
>>> but I'm not very clear on whether this is a safe fix, mainly because
>>> I don't understand what the reuse WAL record really accomplishes.
>>> Maybe we need to instead generate a reuse record with some special
>>> transaction ID indicating worst-case assumptions?
>
>> I think it is basically to ensure that the queries on standby should
>> not be considering the page being recycled as valid and if there is
>> any pending query to which this page is visible, cancel it. If this
>> understanding is correct, then I think the fix you are proposing is
>> correct.
>
> After thinking about it some more, I do not understand why we are
> emitting WAL here at all in *any* case, either for a new page or for a
> deleted one that we're about to recycle. Why wouldn't the appropriate
> actions have been taken when the page was deleted, or even before that
> when its last tuple was deleted?
>

It seems to me that we do take actions for conflict resolution during page deletion (that looks to be covered by XLOG_HEAP2_CLEANUP_INFO, which we emit in vacuum), but I am not sure that is sufficient. Consider a case where a new transaction is started on the standby after the page deletion and its snapshot still precedes the xact value on the page; such transactions must be cancelled before we can reuse the page. I think the fact that, before recycling the page, we ensure that no transaction is interested in that page on the master indicates something similar is required on the standby as well.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
[HACKERS] Supporting huge pages on Windows
Hello,

The attached patch implements huge_pages on Windows. I'll add this to the CommitFest.

The performance improvement was about 2% with the following select-only pgbench run. The scaling factor is 200, where the database size is roughly 3GB. I ran the benchmark on my Windows 10 PC with 6 CPU cores and 16GB of RAM.

pgbench -c18 -j6 -T60 -S bench

Before running pgbench, I used pg_prewarm to cache all pgbench tables and indexes (excluding the history table) in the 4GB shared buffers. The averages of running pgbench three times are:

huge_pages=off: 70412 tps
huge_pages=on : 72100 tps

The purpose of the pg_ctl.c modification is to retain the "Lock pages in memory" Windows user right in postgres. That user right is necessary for large-page support. The current pg_ctl removes all privileges when spawning postgres, which is overkill; the system administrator should be able to assign appropriate privileges to the PostgreSQL service account.

Credit: This patch is based on Thomas Munro's earlier patch.

Regards,
Takayuki Tsunakawa

Attachment: win_large_page.patch
Re: [HACKERS] [RFC] Change the default of update_process_title to off
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Thomas Munro
>
> Another database vendor recommends granting SeLockMemoryPrivilege so
> that it can use large pages on Windows when using several GB of buffer
> pool. I wonder if that might help Postgres on Windows. This could be
> useful as a starting point to test that theory:
>
> https://www.postgresql.org/message-id/CAEepm%3D075-bgHi_VDt4SCAmt%2Bo_%2B1XaRap2zh7XwfZvT294oHA%40mail.gmail.com

Sorry for my late reply, and thank you. I've created a patch based on yours, and I'll submit it shortly in a separate thread.

Regards,
Takayuki Tsunakawa
Re: [HACKERS] Stopping logical replication protocol
On 26 Sep. 2016 02:16, "Vladimir Gordiychuk" wrote:
>
> > It looks like you didn't import my updated patches, so I've rebased
> > your new patches on top of them.
>
> Yes, I forgot it, sorry. Thanks for your fixes.
>
> > I didn't see you explain why this was removed:
> >
> > -/* fast path */
> > -/* Try to flush pending output to the client */
> > -if (pq_flush_if_writable() != 0)
> > -	WalSndShutdown();
> > -
> > -if (!pq_is_send_pending())
> > -	return;
>
> I removed it because, while decoding a long transaction, we always
> exited the function via the fast path and never received messages from
> the client. Now we always enter the 'for' loop and execute
> /* Check for input from the client */
> ProcessRepliesIfAny();

Makes sense.

> > Some of the changes to pg_recvlogical look to be copied from
> > receivelog.c, most notably the functions ProcessKeepalive and
> > ProcessXLogData.
>
> During refactoring of that huge function I also noticed it, but
> decided not to include additional changes in an already huge patch. I
> only split the method that did everything into a few small functions.

Good decision. This improves things already and makes future refactoring easier.

> > I was evaluating the tests and I don't think they actually
> > demonstrate that the patch works. There's nothing done to check that
> > pg_recvlogical exited because of client-initiated CopyDone. While
> > looking into that I found that it also never actually hits
> > ProcessCopyDone(...) because libpq handles the CopyDone reply from
> > the server itself and treats it as end-of-stream. So the loop in
> > StreamLogicalLog will just end and ProcessCopyDone() is dead code.
>
> The main idea was to stop logical replication as fast as possible,
> because if stopping replication takes more than a second it becomes
> painful.

You should rely on timing in tests as little as possible. Some of the test hosts are VERY slow. If possible something deterministic should be used.
> Increasing this timeout to 3 or 5 seconds hides the problems this PR
> tries to fix, which is why I think these lines should be restored to
> their state from the previous patch.

Per above, I don't agree.

> ```
> ok($spendTime <= 5, # allow extra time for slow machines
>    "pg_recvlogical exited promptly on signal when decoding");
>
> ok((time() - $cancelTime) <= 3, # allow extra time for slow machines
>    "pg_recvlogical exited promptly on sigint when idle"
> );
> ```
>
> Also I do not understand why we do
>
> $proc->pump while $proc->pumpable;
>
> after sending a signal to the process. Why can we not just wait for
> replication to stop?

Because verbose output can produce a lot of writes. If we don't pump the buffer, pg_recvlogical could block writing on its socket. Also it lets us capture stderr, which we need in order to observe that pg_recvlogical actually ended copy on user command rather than just receiving all the input available.
Re: [HACKERS] store narrow values in hash indexes?
On Sat, Sep 24, 2016 at 1:03 AM, Amit Kapila wrote:
> On Sat, Sep 24, 2016 at 1:02 AM, Robert Haas wrote:
>> Currently, hash indexes always store the hash code in the index, but
>> not the actual Datum. It's recently been noted that this can make a
>> hash index smaller than the corresponding btree index would be if the
>> column is wide. However, if the index is being built on a fixed-width
>> column with a typlen <= sizeof(Datum), we could store the original
>> value in the hash index rather than the hash code without using any
>> more space. That would complicate the code, but I bet it would be
>> faster: we wouldn't need to set xs_recheck, we could rule out hash
>> collisions without visiting the heap, and we could support index-only
>> scans in such cases.
>
> What exactly do you mean by Datum? Is it for datatypes that fit into
> 64 bits, like integer?

Yeah, I mean whatever is small enough to fit into the space currently being used to store the hashcode, along with any accompanying padding bytes that we can also use.

> I think if we are able to support index only scans for hash indexes
> for some data types, that will be a huge plus. Surely, there is some
> benefit without index only scans as well, which is we can avoid the
> recheck, but I am not sure if that alone can give us any big
> performance boost. As you say, it might lead to some complication in
> code, but I think it is worth trying.

Yeah, the recheck is probably not that expensive if we have to retrieve the heap page anyway.

> Won't it add some requirements for pg_upgrade as well?

I have nothing to add to what Bruce already said.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] VACUUM's ancillary tasks
On Mon, Aug 29, 2016 at 1:26 PM, Vik Fearing wrote:
> While doing this, I noticed that ALTER TABLE should also re-analyze
> the table for obvious reasons, so I have that one set
> "changes_since_analyze" to the number of rows rewritten as well.

I noticed that ATExecAlterColumnType does this:

    /*
     * Drop any pg_statistic entry for the column, since it's now wrong type
     */
    RemoveStatistics(RelationGetRelid(rel), attnum);

What if there is no rewrite, because the type is binary compatible (ATColumnChangeRequiresRewrite returns false)? Then I think your patch won't attract an autovacuum daemon to ANALYZE the table and produce new statistics, because it won't touch changes_since_analyze. I wonder if we should simply keep the stats if no rewrite is required. Aren't the statistical properties of binary-compatible types also compatible? Alternatively, we should consider bumping changes_since_analyze in this case too, if you're going to do it in the rewrite case.

-- 
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] Allowing GIN array_ops to work on anyarray
The following review has been posted through the commitfest application:

make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: tested, passed

I am resending this due to an earlier error message from the mailing list.

These are my comments for the review of https://commitfest.postgresql.org/10/708/

I built and installed (make world / make install-world) the github branch REL9_6_STABLE and applied the patch (patch -p1 < patch/gin-true-anyarray-opclass-1.patch). I then upgraded my development database (PostgreSQL 9.5) using pg_upgrade. This database has one table with a UUID array gin index. The database was upgraded correctly to PostgreSQL 9.6, and I was able to successfully connect to it from a web application which uses the database. There were no conflicts, so I expect others to be able to upgrade without issues.

I then dropped the database and re-created it, making sure that I no longer used my prior operator class definition. I re-started my web application and confirmed it works. This verifies the feature works as designed. The following is no longer required:

CREATE OPERATOR CLASS _uuid_ops DEFAULT FOR TYPE _uuid USING gin AS
    OPERATOR 1 &&(anyarray, anyarray),
    OPERATOR 2 @>(anyarray, anyarray),
    OPERATOR 3 <@(anyarray, anyarray),
    OPERATOR 4 =(anyarray, anyarray),
    FUNCTION 1 uuid_cmp(uuid, uuid),
    FUNCTION 2 ginarrayextract(anyarray, internal, internal),
    FUNCTION 3 ginqueryarrayextract(anyarray, internal, smallint, internal, internal, internal, internal),
    FUNCTION 4 ginarrayconsistent(internal, smallint, anyarray, integer, internal, internal, internal, internal),
    STORAGE uuid;

I also ran "make installcheck-world" and all the tests passed.

I am not sure what "Spec compliant" means, so I did not select it as tested or passed. What should I do to validate that this change is "Spec compliant"?
The new status of this patch is: Waiting on Author
Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE
On Mon, Sep 26, 2016 at 1:18 PM, Thomas Munro wrote:
> underscore to a minus in various places. That fixes these errors:

Correction: s/these errors:/the above errors./

-- 
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] PoC: Make it possible to disallow WHERE-less UPDATE and DELETE
On Mon, Sep 19, 2016 at 4:02 PM, David Fetter wrote:
>
> [training_wheels_004.patch]

openjade:filelist.sgml:144:16:E: character "_" invalid: only parameter
literal, "CDATA", "ENDTAG", "MD", "MS", "PI", "PUBLIC", "SDATA",
"STARTTAG", "SYSTEM" and parameter separators allowed
openjade:contrib.sgml:138:2:W: cannot generate system identifier for
general entity "require"

The documentation doesn't build here, I think because require_where is not an acceptable entity name. It works for me if I change the underscore to a minus in various places. That fixes these errors:

+
+ Here is an example showing how to set up a database cluster with
+ require_where.
+
+$ psql -U postgres
+# SHOW shared_preload_libraries; /* Make sure not to clobber something by accident */
+
+If you found something,
+# ALTER SYSTEM SET shared_preload_libraries='the,stuff,you,found,require_where';
+
+Otherwise,
+# ALTER SYSTEM SET shared_preload_libraries='require_where';
+
+Then restart PostgreSQL
+
+

Could use a full stop (period) on the end of that sentence. Also it shouldn't be inside the "screen" tags. Maybe "If you found something," and "Otherwise," shouldn't be either, or should somehow be marked up so as not to appear to be text from the session.

postgres=# delete from foo;
ERROR: DELETE requires a WHERE clause
HINT: To delete all rows, use "WHERE true" or similar.

Maybe one of those messages could use some indication of where this is coming from, for surprised users encountering this non-standard behaviour for the first time?

FWIW I saw something similar enforced globally by the DBA team at a large company with many database users. I think experienced users probably initially felt mollycoddled when they first encountered the error but I'm sure that some were secretly glad of its existence from time to time... I think it's a useful feature for users who want it, and a nice little demonstration of how extensible Postgres is.
-- 
Thomas Munro
http://www.enterprisedb.com
Re: [HACKERS] [GENERAL] C++ port of Postgres
On Thu, Sep 1, 2016 at 1:41 AM, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
> [trimmed cc list because of big attachments]
>
> On 8/16/16 4:22 PM, Jim Nasby wrote:
> > Joy, do you have an idea what a *minimally invasive* patch for C++
> > support would look like? That's certainly the first step here.
>
> I developed a minimally invasive patch for C++ support a few years ago
> shortly after I wrote that blog post. Since there appears to have been
> some interest here now, I have updated that and split it up into
> logical chunks.
>
> So here you go.

I looked at a random selection of these patches this morning.

> 0001-Fix-use-of-offsetof.patch

I agree that this is already invalid C because it's not constant. It is accepted by GCC's C front end no matter what switches you give it, and clearly many other compilers too, but not clang's C front end with "-pedantic".

> 0002-Use-return-instead-of-exit-in-configure.patch

Makes sense. Any reason not to #include instead like you did for the 0003 patch?

> 0003-Add-missing-include-files-to-configure-tests.patch

Makes sense.

> 0014-Set-up-for-static-asserts-in-C.patch

+#if __cpp_static_assert >= 201411
+#define _Static_assert(condition, errmessage) static_assert(condition, errmessage)

Don't you mean 201103L? C++11 introduced two-argument static_assert, not C++14.

> 0021-Workaround-for-using-typdef-ed-ints-in-loops.patch
>
> Types made from int don't have a ++ operator in C++, so they can't be
> used in for loops without further work.

ForkNumber is not a typedef'd int: it's an enum. Since it's called a "fork *number*" and clearly treated as a number in various places including loops that want to increment it, perhaps it really should be a typedef of an int, instead of an enum.
Then either macros or an enum ForkNumberEnum could define the names. (You can assign enums to ints, and compare enums and ints; you just can't assign ints to enums. So it'd be no problem to have "typedef int ForkNumber" and then "enum ForkNumberEnum { ... }" to define the friendly names, but never actually use the type ForkNumberEnum.)

> 0023-Add-C-linkage-to-replacement-declaration-of-fdatasyn.patch

-extern int fdatasync(int fildes);
+extern "C" int fdatasync(int fildes);

Doesn't this need to be made conditional on __cplusplus?

> 0024-Make-inet_net_-to-C-linkage.patch

Same.

> 0025-Add-C-linkage-to-functions-exported-by-plugins.patch

-extern void _PG_init(void);
+extern "C" void _PG_init(void);

Why this way in some places...

+extern "C" {
 extern void _PG_init(void);
+}

... and this way in single-function declaration cases in other places?

-char *widget_out(WIDGET *widget);
+char *widget_out(WIDGET * widget);

Noise.

> 0027-Hack-Disable-volatile-that-causes-mysterious-compile.patch
>
> Subject: [PATCH 27/27] Hack: Disable volatile that causes mysterious
> compiler errors

I don't grok the reason for the volatile qualifier in this code in the first place, but if it actually does something useful, here's one way to fix it. The specific reason for the error reported by GCC is that QueuePosition (a struct) is copied by assignment:

    pos = oldpos = QUEUE_BACKEND_POS(MyBackendId);

That means that it invokes the default assignment operator, and you can't do that with a volatile and a non-volatile object either way around according to C++ (though clang seems less fussy than GCC in this case).
You could try to untangle that by supplying a suitably qualified explicit member operator:

#ifdef __cplusplus
	QueuePosition &operator=(const volatile QueuePosition &other)
	{
		page = other.page;
		offset = other.offset;
		return *this;
	}
#endif

But that doesn't help with assignments going the other way. How about just defining a macro in the style of the existing macros and then using that in place of all those incompatible assignment operations:

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 716f1c3..469018f 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -194,6 +194,12 @@ typedef struct QueuePosition
 		(x).offset = (z); \
 	} while (0)
 
+#define COPY_QUEUE_POS(x, y) \
+	do { \
+		(x).page = (y).page; \
+		(x).offset = (y).offset; \
+	} while (0)
+
 #define QUEUE_POS_EQUAL(x,y) \
 	((x).page == (y).page && (x).offset == (y).offset)
 
@@ -1757,7 +1763,8 @@ asyncQueueReadAllNotifications(void)
 	LWLockAcquire(AsyncQueueLock, LW_SHARED);
 	/* Assert checks that we have a valid state entry */
 	Assert(MyProcPid == QUEUE_BACKEND_PID(MyBackendId));
-	pos = oldpos = QUEUE_BACKEND_POS(MyBackendId);
+	COPY_QUEUE_POS(pos, QUEUE_BACKEND_POS(MyBackendId));
+	COPY_QUEUE_POS(oldpos, pos);
 	head = QUEUE_HEAD;
 	LWLockRelease(AsyncQueueLock);
@@ -1861,7 +1868,7 @@ asyncQueueReadAllNotifications(void)
 	{
 		/* Update shared state */
Re: [HACKERS] PATCH: two slab-like memory allocators
On 25/09/16 22:17, Tomas Vondra wrote:
> On 09/25/2016 08:48 PM, Petr Jelinek wrote:
>
>> Slab:
>> In general it seems understandable; the initial description helps to
>> understand what's happening well enough.
>>
>> One thing I don't understand, however, is why the freelist is both an
>> array and a doubly linked list, and why there is a new implementation
>> of said doubly linked list given that we have dlist.
>>
>
> Hmm, perhaps that should be explained better.
>
> In AllocSet, we only have a global freelist of chunks, i.e. we have a
> list of free chunks for each possible size (there are 11 sizes,
> starting with 8 bytes and then doubling the size). So freelist[0] is
> a list of free 8B chunks, freelist[1] is a list of free 16B chunks,
> etc.
>
> In Slab, the freelist has two levels - first there's a bitmap on each
> block (which is possible, as the chunks have constant size), tracking
> which chunks of that particular block are free. This makes it trivial
> to check that all chunks on the block are free, and free the whole
> block (which is impossible with AllocSet).
>
> Second, the freelist at the context level tracks blocks with a given
> number of free chunks - so freelist[0] tracks completely full blocks,
> freelist[1] is a list of blocks with 1 free chunk, etc. This is used
> to reuse space on almost-full blocks first, in the hope that some of
> the less full blocks will get completely empty (and freed to the OS).
>
> Is that clear?

Ah okay, that makes sense. The documentation of this could be improved, though, as it's all squashed into a single sentence that wasn't quite clear to me.

>
>>> +/*
>>> + * SlabChunk
>>> + *	The prefix of each piece of memory in an SlabBlock
>>> + *
>>> + * NB: this MUST match StandardChunkHeader as defined by
>>> + * utils/memutils.h.
>>> + */
>>
>> Is this true? Why? And if it is, then why doesn't SlabChunk actually
>> match StandardChunkHeader?
>
> It is true, a lot of stuff in the MemoryContext infrastructure relies
> on that.
For example when you do pfree(ptr), we actually do something like > > header = (StandardChunkHeader*)(ptr - sizeof(StandardChunkHeader)) > > to get the chunk header - which includes a pointer to the memory context > and other useful stuff. > > This also means we can put additional fields before StandardChunkHeader > as that does not break this pointer arithmetic, i.e. SlabChunkData is > effectively defined like this: > > typedef struct SlabChunkData > { > /* block owning this chunk */ > void *block; > > /* standard header */ > StandardChunkHeader header; > } SlabChunkData; > Yes, but your struct then does not match StandardChunkHeader exactly, so it should be explained in more detail (aset.c, where this comment is also present, has a struct that matches StandardChunkHeader, so it's sufficient there). >> >>> +static void >>> +SlabCheck(MemoryContext context) >>> +{ >>> + /* FIXME */ >>> +} >> >> Do you plan to implement this interface? >> > > Yes, although I'm not sure what checks should go there. The only thing I > can think of right now is checking that the number of free chunks on a > block (according to the bitmap) matches the freelist index. > Yeah, this context does not seem like it needs too much checking. The freelist vs. free chunks check sounds OK to me. I guess GenSlab then will call the checks for the underlying contexts. >>> +#define SLAB_DEFAULT_BLOCK_SIZE 8192 >>> +#define SLAB_LARGE_BLOCK_SIZE (8 * 1024 * 1024) >> >> I am guessing this is based on max_cached_tuplebufs? Maybe these >> could be written with the same style? >> > > Not sure I understand what you mean by "based on"? I don't quite > remember how I came up with those constants, but I guess 8kB and 8MB > seemed like good values. > > Also, what style do you mean? I've used the same style as for ALLOCSET_* > constants in the same file. > I mean using 8 * 1024 for SLAB_DEFAULT_BLOCK_SIZE so that it's more readable.
ALLOCSET_* does that too (with the exception of ALLOCSET_SEPARATE_THRESHOLD, which I have no idea why it's different from the rest of the code). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Optimization for lazy_scan_heap
Hello, Some initial comments on optimize_lazy_scan_heap_v2.patch. >- while (next_unskippable_block < nblocks) >+ while (next_unskippable_block < nblocks && >+ !FORCE_CHECK_PAGE(next_unskippable_block)) Don't we need a similar check of !FORCE_CHECK_PAGE(next_unskippable_block) in the code below, which appears at line 556 of vacuumlazy.c? next_unskippable_block = 0; if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0) { while (next_unskippable_block < nblocks) > { >if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0) >break; >+ n_all_frozen++; >} >else >{ >if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0) >break; >+ >+ /* Count the number of all-frozen pages */ >+ if (vmstatus & VISIBILITYMAP_ALL_FROZEN) >+ n_all_frozen++; >} I think it's better to have just one place where we increment n_all_frozen, before the if-else block. >} >vacuum_delay_point(); >next_unskippable_block++; >+ n_skipped++; >} >} Also, instead of incrementing n_skipped every time, it can be set to 'next_unskippable_block' or 'next_unskippable_block - blkno' once at the end of the loop. Thank you, Rahila Syed On Thu, Aug 25, 2016 at 11:52 AM, Masahiko Sawada wrote: > On Thu, Aug 25, 2016 at 1:41 AM, Anastasia Lubennikova > wrote: > > The following review has been posted through the commitfest application: > > make installcheck-world: tested, passed > > Implements feature: not tested > > Spec compliant: not tested > > Documentation: not tested > > > > Hi, > > I haven't tested the performance yet, but the patch itself looks pretty > good > > and a reasonable improvement. > > I have a question about removing the comment. It seems to be a really > tricky > > moment. How do we know that the all-frozen block hasn't changed since the > > moment we checked it? > > I think that we don't need to check whether the all-frozen block hasn't > changed since we checked it. > Even if the all-frozen block has changed after we saw it as an > all-frozen page, we can update the relfrozenxid.
> Because any new XIDs just added to that page are newer than the > GlobalXmin we computed. > > Am I missing something? > > In this patch, since we count the number of all-frozen pages even > during a non-aggressive scan, I thought that we don't need to check > whether these blocks were all-frozen pages. > > > I'm going to test the performance this week. > > I wonder if you could send a test script or describe the steps to test > it? > > This optimization reduces the number of visibility map scans when the > table is almost all-visible or frozen. > So it would be effective for a very large table (more than several > hundred GB) consisting almost entirely of all-visible or all-frozen pages. > > For example, > 1. Create a very large table. > 2. Execute VACUUM FREEZE to freeze all pages of the table. > 3. Measure the execution time of VACUUM FREEZE. > I hope that the second VACUUM FREEZE becomes a little faster. > > I modified the comment, and attached the v2 patch. > > Regards, > > -- > Masahiko Sawada
Re: [HACKERS] PATCH: two slab-like memory allocators
On 09/25/2016 08:48 PM, Petr Jelinek wrote: Hi Tomas, On 02/08/16 17:44, Tomas Vondra wrote: This patch actually includes two new memory allocators (not one). Very brief summary (for more detailed explanation of the ideas, see comments at the beginning of slab.c and genslab.c): Slab * suitable for fixed-length allocations (other pallocs fail) * much simpler than AllocSet (no global freelist management etc.) * free space is tracked per block (using a simple bitmap) * which allows freeing the block once all chunks are freed (AllocSet will hold the memory forever, in the hope of reusing it) GenSlab --- * suitable for non-fixed-length allocations, but with chunks of mostly the same size (initially unknown, the context will tune itself) * a combination of AllocSet and Slab (or a sequence of Slab allocators) * the goal is to do most allocations in Slab context * there's always a single 'current' Slab context, and every now and then it's replaced with a new generation (with the chunk size computed from recent requests) * the AllocSet context is used for chunks too large for current Slab So it's just a wrapper around the other two allocators to make this use case easier to handle. Do you expect there will be use for the GenSlab eventually outside of the reorderbuffer? Yes, you might say it's just a wrapper around the other two allocators, but it *also* includes the logic of recomputing chunk size etc. I haven't thought about other places that might benefit from these new allocators very much - in general, it's useful for places that produce a stream of equally-sized items (GenSlab relaxes this), that are pfreed() in ~FIFO manner (i.e. roughly in the order of allocation). So none of this is meant as a universal replacement of AllocSet, but in the suitable cases the results seem really promising. For example for the simple test query in [1], the performance improvement is this:
N  | master   | patched
1  | 100ms    | 100ms
5  | 15000ms  | 350ms
10 | 146000ms | 700ms
20 | ?        | 1400ms
That's a fairly significant improvement, and the submitted version of the patches should perform even better (~2x, IIRC). I agree that it improves performance quite nicely and that the reorderbuffer could use this. About the code. I am not quite sure that this needs to be split into two patches especially if 1/3 of the second patch is the removal of the code added by the first one and otherwise it's quite small and straightforward. That is unless you expect the GenSlab to not go in. I don't know - it seemed natural to first introduce the Slab, as it's easier to discuss it separately, and it works for 2 of the 3 contexts needed in reorderbuffer. GenSlab is an improvement of Slab, or rather based on it, so that it works for the third context. And it introduces some additional ideas (particularly the generational design, etc.) Of course, none of this means it has to be committed in two separate chunks, or that I don't expect GenSlab to get committed ... Slab: In general it seems understandable, the initial description helps to understand what's happening well enough. One thing I don't understand however is why the freelist is both array and doubly linked list and why there is new implementation of said doubly linked list given that we have dlist. Hmm, perhaps that should be explained better. In AllocSet, we only have a global freelist of chunks, i.e. we have a list of free chunks for each possible size (there's 11 sizes, starting with 8 bytes and then doubling the size). So freelist[0] is a list of free 8B chunks, freelist[1] is a list of free 16B chunks, etc. In Slab, the freelist has two levels - first there's a bitmap on each block (which is possible, as the chunks have constant size), tracking which chunks of that particular block are free. This makes it trivial to check that all chunks on the block are free, and free the whole block (which is impossible with AllocSet).
Second, the freelist at the context level tracks blocks with a given number of free chunks - so freelist[0] tracks completely full blocks, freelist[1] is a list of blocks with 1 free chunk, etc. This is used to reuse space on almost full blocks first, in the hope that some of the less full blocks will get completely empty (and freed to the OS). Is that clear? +/* + * SlabContext is our standard implementation of MemoryContext. + * Really? Meh, that's clearly a bogus comment. +/* + * SlabChunk + * The prefix of each piece of memory in an SlabBlock + * + * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h. + */ Is this true? Why? And if it is then why doesn't the SlabChunk actually match the StandardChunkHeader? It is true, a lot of stuff in MemoryContext infrastructure relies on that. For example when you do pfree(ptr), we actually do something like header = (StandardChunkHeader*)(ptr - sizeof(StandardChunkHeader))
Re: [HACKERS] Better tracking of free space during SP-GiST index build
On 09/25/2016 08:33 PM, Oleg Bartunov wrote: On Sat, Sep 24, 2016 at 11:32 PM, Tomas Vondra wrote: On 09/22/2016 07:37 PM, Tom Lane wrote: Tomas Vondra writes: ... I've tried increasing the cache size to 768 entries, with vast majority of them (~600) allocated to leaf pages. Sadly, this seems to only increase the CREATE INDEX duration a bit, without making the index significantly smaller (still ~120MB). Yeah, that's in line with my results: not much further gain from a larger cache. Though if you were testing with the same IRRExplorer data, it's not surprising that our results would match. Would be good to try some other cases... Agreed, but I don't have any other data sets at hand. One possibility would be to generate something randomly (e.g. it's not particularly difficult to generate random IP addresses), but I'd much rather use some real-world data sets. Tomas, I have one real dataset, which I used for testing spgist (https://www.postgresql.org/message-id/caf4au4zxd2xov0a__fu7xohxsiwjzm1z2xhs-ffat1dzb9u...@mail.gmail.com) Let me know if you need it. Sure, that would be useful. I think it would be useful to make a repository of such data sets, so that patch authors & reviewers can get a reasonable collection of data sets if needed, instead of scrambling every time. Opinions? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] PATCH: two slab-like memory allocators
Hi Tomas, On 02/08/16 17:44, Tomas Vondra wrote: > > This patch actually includes two new memory allocators (not one). Very > brief summary (for more detailed explanation of the ideas, see comments > at the beginning of slab.c and genslab.c): > > Slab > > * suitable for fixed-length allocations (other pallocs fail) > * much simpler than AllocSet (no global freelist management etc.) > * free space is tracked per block (using a simple bitmap) > * which allows freeing the block once all chunks are freed (AllocSet > will hold the memory forever, in the hope of reusing it) > > GenSlab > --- > * suitable for non-fixed-length allocations, but with chunks of mostly > the same size (initially unknown, the context will tune itself) > * a combination of AllocSet and Slab (or a sequence of Slab allocators) > * the goal is to do most allocations in Slab context > * there's always a single 'current' Slab context, and every now and > then it's replaced with a new generation (with the chunk size computed > from recent requests) > * the AllocSet context is used for chunks too large for current Slab > So it's just a wrapper around the other two allocators to make this use case easier to handle. Do you expect there will be use for the GenSlab eventually outside of the reorderbuffer? > So none of this is meant as a universal replacement of AllocSet, but in > the suitable cases the results seem really promising. For example for > the simple test query in [1], the performance improvement is this: > > N | master | patched > - > 1 | 100ms | 100ms > 5 | 15000ms | 350ms > 10 | 146000ms | 700ms > 20 | ? | 1400ms > > That's a fairly significant improvement, and the submitted version of > the patches should perform even better (~2x, IIRC). > I agree that it improves performance quite nicely and that the reorderbuffer could use this. About the code.
I am not quite sure that this needs to be split into two patches especially if 1/3 of the second patch is the removal of the code added by the first one and otherwise it's quite small and straightforward. That is unless you expect the GenSlab to not go in. Slab: In general it seems understandable, the initial description helps to understand what's happening well enough. One thing I don't understand however is why the freelist is both array and doubly linked list and why there is new implementation of said doubly linked list given that we have dlist. > +/* > + * SlabContext is our standard implementation of MemoryContext. > + * Really? > +/* > + * SlabChunk > + * The prefix of each piece of memory in an SlabBlock > + * > + * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h. > + */ Is this true? Why? And if it is then why doesn't the SlabChunk actually match the StandardChunkHeader? > +#define SlabPointerIsValid(pointer) PointerIsValid(pointer) What's the point of this given that it's defined in the .c file? > +static void * > +SlabAlloc(MemoryContext context, Size size) > +{ > + Slabset = (Slab) context; > + SlabBlock block; > + SlabChunk chunk; > + int idx; > + > + AssertArg(SlabIsValid(set)); > + > + Assert(size == set->chunkSize); I wonder if there should be stronger protection (ie, elog) for the size matching. > +static void * > +SlabRealloc(MemoryContext context, void *pointer, Size size) > +{ > + Slabset = (Slab)context; > + > + /* can't do actual realloc with slab, but let's try to be gentle */ > + if (size == set->chunkSize) > + return pointer; > + > + /* we can't really do repalloc with this allocator */ > + Assert(false); This IMHO should definitely be elog. > +static void > +SlabCheck(MemoryContext context) > +{ > + /* FIXME */ > +} Do you plan to implement this interface? > +#define SLAB_DEFAULT_BLOCK_SIZE 8192 > +#define SLAB_LARGE_BLOCK_SIZE(8 * 1024 * 1024) I am guessing this is based on max_cached_tuplebufs? 
Maybe these could be written with the same style? GenSlab: Since this is a relatively simple wrapper, it looks mostly OK to me. The only issue I have here is that I am not quite sure about those FIXME functions (Free, Realloc, GetChunkSpace). It's slightly weird to call into mcxt, but I guess the alternative there is to error out, so this is probably preferable. Would want to hear other opinions here. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Better tracking of free space during SP-GiST index build
On Sat, Sep 24, 2016 at 11:32 PM, Tomas Vondra wrote: > On 09/22/2016 07:37 PM, Tom Lane wrote: >> >> Tomas Vondra writes: >> >>> ... I've tried increasing the cache size to 768 >>> entries, with vast majority of them (~600) allocated to leaf pages. >>> Sadly, this seems to only increase the CREATE INDEX duration a bit, >>> without making the index significantly smaller (still ~120MB). >> >> >> Yeah, that's in line with my results: not much further gain from a >> larger cache. Though if you were testing with the same IRRExplorer >> data, it's not surprising that our results would match. Would be >> good to try some other cases... >> > > Agreed, but I don't have any other data sets at hand. One possibility would > be to generate something randomly (e.g. it's not particularly difficult to > generate random IP addresses), but I'd much rather use some real-world data > sets. Tomas, I have one real dataset, which I used for testing spgist (https://www.postgresql.org/message-id/caf4au4zxd2xov0a__fu7xohxsiwjzm1z2xhs-ffat1dzb9u...@mail.gmail.com) Let me know if you need it. > >>> >>> >>> One thing I'd change is making the SpGistLUPCache dynamic, i.e. >>> storing the size and lastUsedPagesMap on the meta page. That >>> should allow us resizing the cache and tweak lastUsedPagesMap in >>> the future. >> >> >> Yeah, probably a good idea. I had thought of bumping >> SPGIST_MAGIC_NUMBER again if we want to revisit the cache size; but >> keeping it as a separate field won't add noticeable cost, and it >> might save some trouble. >> > > I see you plan to track only the cache size, while I proposed to track also > the map, i.e. number of pages per category. I think that'd be useful in case we > come up with better values (e.g. more entries for leaf pages), or even > a somewhat adaptive way.
> > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Stopping logical replication protocol
>It looks like you didn't import my updated patches, so I've rebased your new patches on top of them. Yes, I forgot to, sorry. Thanks for your fixes. >I did't see you explain why this was removed: -/* fast path */ -/* Try to flush pending output to the client */ -if (pq_flush_if_writable() != 0) -WalSndShutdown(); - -if (!pq_is_send_pending()) -return; I removed it because, while decoding a long transaction, we always exited the function via the fast path and never received messages from the client. Now we always enter the 'for' loop and execute /* Check for input from the client */ ProcessRepliesIfAny(); >Some of the changes to pg_recvlogical look to be copied from >receivelog.c, most notably the functions ProcessKeepalive and >ProcessXLogData . While refactoring that huge function I also noticed it, but decided not to include additional changes in an already large patch. I only split the method that did everything into a few small functions. >I was evaluating the tests and I don't think they actually demonstrate >that the patch works. There's nothing done to check that >pg_recvlogical exited because of client-initiated CopyDone. While >looking into that I found that it also never actually hits >ProcessCopyDone(...) because libpq handles the CopyDone reply from the >server its self and treats it as end-of-stream. So the loop in >StreamLogicalLog will just end and ProcessCopyDone() is dead code. The main idea was to stop logical replication as fast as possible, because if stopping replication takes more than 1 second it becomes painful. That's why my tests did not check pg_recvlogical output, only the time spent stopping replication (the CopyDone exchange). Increasing this timeout to 3 or 5 seconds would hide the problems this patch tries to fix, which is why I think these lines should be restored to their state from the previous patch.
``` ok($spendTime <= 5, # allow extra time for slow machines "pg_recvlogical exited promptly on signal when decoding"); ok((time() - $cancelTime) <= 3, # allow extra time for slow machines "pg_recvlogical exited promptly on sigint when idle" ); ``` Also, I do not understand why we do $proc->pump while $proc->pumpable; after sending a signal to the process, why can we not just wait for replication to stop? 2016-09-23 8:01 GMT+03:00 Craig Ringer: > On 19 September 2016 at 07:12, Vladimir Gordiychuk > wrote: > > New patch in attach. 0001 and 0002 without changes. > > 0003 - patch contain improvements for pg_recvloginca, now pg_recvloginca > > after receive SIGINT will send CopyDone package to postgresql and wait > from > > server CopyDone. For backward compatible after repeat send SIGINT > > pg_recvloginca will continue immediately without wait CopyDone from > server. > > 0004 patch contain regression tests on scenarios that fix 0001 and 0002 > > patches. > > Great. > > Thanks for this. > > > First observation (which I'll fix in updated patch): > > > > It looks like you didn't import my updated patches, so I've rebased > your new patches on top of them. > > Whitespace issues: > > $ git am ~/Downloads/0003-Add-ability-for-pg_recvlogical-safe-stop- replication.patch > Applying: Add ability for pg_recvlogical safe stop replication > /home/craig/projects/2Q/postgres/.git/rebase-apply/patch:140: indent > with spaces. >msgBuf + hdr_len + bytes_written, > /home/craig/projects/2Q/postgres/.git/rebase-apply/patch:550: indent > with spaces. > /* Backward compatible, allow force interrupt logical replication > /home/craig/projects/2Q/postgres/.git/rebase-apply/patch:551: indent > with spaces. > * after second SIGINT without wait CopyDone from server > /home/craig/projects/2Q/postgres/.git/rebase-apply/patch:552: indent > with spaces. > */ > warning: 4 lines add whitespace errors. > > > Remember to use "git log --check" before sending patches.
> > Also, comment style, please do > > /* this */ > > or > > /* > * this > */ > > not > > /* this > */ > > > > > I did't see you explain why this was removed: > > -/* fast path */ > -/* Try to flush pending output to the client */ > -if (pq_flush_if_writable() != 0) > -WalSndShutdown(); > - > -if (!pq_is_send_pending()) > -return; > > > > I fixed a warning introduced here: > > > pg_recvlogical.c: In function ‘ProcessXLogData’: > pg_recvlogical.c:289:2: warning: ISO C90 forbids mixed declarations > and code [-Wdeclaration-after-statement] > int bytes_left = msgLength - hdr_len; > ^ > > > Some of the changes to pg_recvlogical look to be copied from > receivelog.c, most notably the functions ProcessKeepalive and > ProcessXLogData . I thought that rather than copying them, shouldn't > the existing ones be modified to meet your needs? But it looks like > the issue is that a fair chunk of the rest of pg_recvlogical doesn't > re-use code from receivelog.c either; for example, pg_recvlogical's > sendFeedback differs from receivelog.c's sendFeedback. So >
Re: [HACKERS] Password identifiers, protocol aging and SCRAM protocol
On 9/3/16 8:36 AM, Michael Paquier wrote: > > Attached is a new series: * [PATCH 1/8] Refactor SHA functions and move them to src/common/ I'd like to see more code comments in sha.c (though I realize this was copied directly from pgcrypto.) I tested by building with and without --with-openssl and running make check for the project as a whole and the pgcrypto extension. I notice that the copyright from pgcrypto/sha1.c was carried over but not the copyright from pgcrypto/sha2.c. I'm no expert on how this works, but I believe the copyright from sha2.c must be copied over. Also, are there any plans to expose these functions directly to the user without loading pgcrypto? Now that the functionality is in core it seems that would be useful. In addition, it would make this patch stand on its own rather than just being a building block. * [PATCH 2/8] Move encoding routines to src/common/ I wonder if it is confusing to have two of encode.h/encode.c. Perhaps they should be renamed to make them distinct? * [PATCH 3/8] Switch password_encryption to an enum Does not apply on HEAD (98c2d3332): error: patch failed: src/backend/commands/user.c:139 error: src/backend/commands/user.c: patch does not apply error: patch failed: src/include/commands/user.h:15 error: src/include/commands/user.h: patch does not apply From here on I used 39b691f251 for review and testing. It seems you are keeping on/off for backwards compatibility; shouldn't the default now be "md5"? -#password_encryption = on +#password_encryption = on # on, off, md5 or plain * [PATCH 4/8] Refactor decision-making of password encryption into a single routine +++ b/src/backend/commands/user.c + new_record[Anum_pg_authid_rolpassword - 1] = + CStringGetTextDatum(encrypted_passwd); pfree(encrypted_passwd) here or let it get freed with the context? * [PATCH 5/8] Create generic routine to fetch password and valid until values for a role Couldn't md5_crypt_verify() be made more general and take the hash type?
For instance, password_crypt_verify() with the last param as the new password type enum. * [PATCH 6/8] Support for SCRAM-SHA-256 authentication +++ b/contrib/passwordcheck/passwordcheck.c + case PASSWORD_TYPE_SCRAM: + /* unfortunately not much can be done here */ + break; Why can't we at least do the same check as md5 to make sure the username was not used as the password? +++ b/src/backend/libpq/auth.c +* without relying on the length word, but we hardly care about protocol +* version or older anymore.) Do you mean protocol version 2 or older? +++ b/src/backend/libpq/crypt.c return STATUS_ERROR; /* empty password */ + Looks like a stray LF. +++ b/src/backend/parser/gram.y + SAVEPOINT SCHEMA SCRAM SCROLL SEARCH SECOND_P SECURITY SELECT SEQUENCE Doesn't this belong in patch 7? Even in patch 7 it doesn't appear that SCRAM is a keyword since the protocol specified after USING is quoted. I tested this patch using both md5 and scram and was able to get both of them working separately. However, it doesn't look like they can be used in conjunction since the pg_hba.conf entry must specify either md5 or scram (though the database can easily contain a mixture). This would probably make a migration very unpleasant. Is there any chance of a mixed mode that will allow new passwords to be set as scram while still honoring the old md5 passwords? Or does that cause too many complications with the protocol? * [PATCH 7/8] Add clause PASSWORD val USING protocol to CREATE/ALTER ROLE +++ b/doc/src/sgml/ref/create_role.sgml +Sets the role's password using the wanted protocol. How about "Sets the role's password using the requested protocol." +an unencrypted password. If the presented password string is already +in MD5-encrypted or SCRAM-encrypted format, then it is stored encrypted +as-is. How about, "If the password string is..." * [PATCH 8/8] Add regression tests for passwords OK. On the whole I find this patch set easier to digest than what was submitted for 9.6.
It is more targeted but still provides very valuable functionality. I'm a bit concerned that a mixture of md5/scram could cause confusion and think this may warrant discussion somewhere in the documentation since the idea is for users to migrate from md5 to scram. -- -David da...@pgmasters.net
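The migration concern can be made concrete: in the submitted design each pg_hba.conf line names exactly one authentication method, so a mixed deployment has to partition users (or databases, or address ranges) between entries instead of letting one entry accept both password formats. The entries below are purely illustrative, assuming the patch's "scram" method keyword:

```
# TYPE  DATABASE  USER           ADDRESS        METHOD
host    all       legacy_user    0.0.0.0/0      md5
host    all       migrated_user  0.0.0.0/0      scram
```

A user whose stored password has not yet been re-hashed can only match the md5 line, which is why a mixed mode on a single entry would make the migration much smoother.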
[HACKERS] CommitFest 2016-09 status summary
Hi all, The current status summary is: Needs review: 65 Waiting on author: 34 Ready for Committer: 9 Committed: 88 Moved to next CF: 1 Rejected: 11 Returned with feedback: 11 TOTAL: 219 The current progress is ~51%. We're in the last week of this CF and we are still left with about ~30% of patches needing review and ~16% waiting on the author to take action. So if you have time please consider reviewing at least one patch. We need your help! Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog: http://fabriziomello.github.io >> Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello >> Github: http://github.com/fabriziomello
Re: [HACKERS] Possibly too stringent Assert() in b-tree code
Amit Kapila writes: > On Fri, Sep 23, 2016 at 12:21 AM, Tom Lane wrote: >> This is clearly an oversight in Simon's patch fafa374f2, which introduced >> this code without any consideration for the possibility that the page >> doesn't have a valid special area. ... >> but I'm not very clear on whether this is a safe fix, mainly because >> I don't understand what the reuse WAL record really accomplishes. >> Maybe we need to instead generate a reuse record with some special >> transaction ID indicating worst-case assumptions? > I think it is basically to ensure that the queries on standby should > not be considering the page being recycled as valid and if there is > any pending query to which this page is visible, cancel it. If this > understanding is correct, then I think the fix you are proposing is > correct. After thinking about it some more, I do not understand why we are emitting WAL here at all in *any* case, either for a new page or for a deleted one that we're about to recycle. Why wouldn't the appropriate actions have been taken when the page was deleted, or even before that when its last tuple was deleted? In other words, if I'm left to fix it, I will do so by reverting fafa374f2 lock, stock, and barrel. I think it was ill-considered and unnecessary. regards, tom lane
Re: [HACKERS] Development build with uuid-ossp support - macOS
Enrique M writes: > Is there a way to build postgresql and install with uuid-ossp without > having to build the documentation? I don't really need the documentation > for my test. Use "make all contrib" instead of "make world". regards, tom lane
Re: [HACKERS] Development build with uuid-ossp support - macOS
I am going to try and switch to macports instead... I see the documentation provides the macports command for installing the toolset... https://www.postgresql.org/docs/9.6/static/docguide-toolsets.html Thank you. On Sat, Sep 24, 2016 at 4:52 PM Enrique M wrote: > I am trying to do a macOS build of postgresql (9.6 stable branch from > github) with the uuid-ossp contrib by typing "make world" but it fails due > to an openjade error (I did install openjade using homebrew using this > setup https://github.com/petere/homebrew-sgml). > > Is there a way to build postgresql and install with uuid-ossp without > having to build the documentation? I don't really need the documentation > for my test. > > Thank you, > > Enrique > > >
Re: [HACKERS] pg_basebackup creates a corrupt file for pg_stat_tmp and pg_replslot on a backup location
Hi Peter, > I just wanted to update you, I have taken this commit fest entry patch > to review because I think it will be addresses as part of "Exclude > additional directories in pg_basebackup", which I'm also reviewing. > Therefore, I'm not actually planning on discussing this patch further. > Please correct me if this assessment does not match your expectations. Thanks for the update. I am absolutely OK with it. I feel it would be a good idea to review "Exclude additional directories in pg_basebackup" which also addresses the issue reported by me. With Regards, Ashutosh Sharma EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Patch: Implement failover on libpq connect level.
I have some more comments on libpq-failover-8.patch.

+ /* FIXME need to check that port is numeric */

Is this still applicable?

1)

+ /*
+  * if there is more than one host in the connect string and
+  * target_server_type is not specified explicitely, set
+  * target_server_type to "master"
+  */

I think we need comments explaining why we change the default value based on the number of elements in the connection string, i.e. why the default is not "any" when the node count > 1.

2)

+ /* loop over all the host specs in the node variable */
+ for (node = nodes; node->host != NULL || node->port != NULL; node++)
+ {

I think instead of the loop we could move this code into a separate function, say pg_add_to_addresslist(host, port, ). This helps in removing temp variables like "struct node info" and several heap calls around it.

3)

+static char *
+get_port_from_addrinfo(struct addrinfo * ai)
+{
+    char port[6];
+
+    sprintf(port, "%d", htons(((struct sockaddr_in *) ai->ai_addr)->sin_port));
+    return strdup(port);
+}

This function needs some comments. Also, we use strdup in many places, but nowhere do we handle memory allocation failure.

4)

+ /*
+  * consult connection options and check if RO connection is OK RO
+  * connection is OK if readonly connection is explicitely
+  * requested or if replication option is set, or just one host is
+  * specified in the connect string.
+  */
+ if (conn->pghost == NULL || !conn->target_server_type ||
+     strcmp(conn->target_server_type, "any") == 0)

The comments are not in sync with the code.

On Wed, Sep 14, 2016 at 12:28 AM, Robert Haas wrote:
> On Fri, Sep 9, 2016 at 4:49 AM, Victor Wagner wrote:
> > Random permutation is much more computationally complex than random
> > picking.
>
> It really isn't. The pseudocode given at
> https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle is all of 4
> lines long, and one of those lines is a comment. The C implementation
> is longer, but not by much.
>
> Mind you, I haven't read the patch, so I don't know whether using
> something like this would make the code simpler or more complicated.
> However, if the two modes of operation are (1) try all hosts in random
> order and (2) try all hosts in the listed order, it's worth noting
> that the code for the second thing can be used to implement the first
> thing just by shuffling the list before you begin.
>
> > So, using random permutation instead of random pick wouldn't help.
> > And keeping somewhere number of elements in the list wouldn't help
> > either unless we change linked list to completely different data
> > structure which allows random access.
>
> Is there a good reason not to use an array rather than a linked list?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Write Ahead Logging for Hash Indexes
Hi All,

> I forgot to mention that Ashutosh has tested this patch for a day
> using Jeff's tool and he didn't find any problem. Also, he has found
> a way to easily reproduce the problem. Ashutosh, can you share your
> changes to the script using which you have reproduced the problem?

I made slight changes in Jeff Janes's patch (crash_REL10.patch) to reproduce the issue reported by him earlier [1]. I changed crash_REL10.patch such that a torn page write happens only for a bitmap page in a hash index. As described by Amit in [2] and [3], we were not logging a full page image for the bitmap page, which is why we would not be able to restore a bitmap page that is torn, eventually resulting in the error below.

38422 01000 2016-09-19 16:25:50.061 PDT:WARNING: page verification failed, calculated checksum 65067 but expected 21260
38422 01000 2016-09-19 16:25:50.061 PDT:CONTEXT: xlog redo at 3F/22053B50 for Hash/ADD_OVFL_PAGE: bmsize 4096, bmpage_found T
38422 XX001 2016-09-19 16:25:50.071 PDT:FATAL: invalid page in block 9 of relation base/16384/17334

Jeff Janes's original patch had the following condition for a torn page write:

if (JJ_torn_page > 0 && counter++ > JJ_torn_page && !RecoveryInProgress())
    nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ/3);

I added some more conditions to the above if () to ensure that the issue associated with the bitmap page gets reproduced easily:

if (JJ_torn_page > 0 && counter++ > JJ_torn_page && !RecoveryInProgress() &&
    bucket_opaque->hasho_page_id == HASHO_PAGE_ID &&
    bucket_opaque->hasho_flag & LH_BITMAP_PAGE)
    nbytes = FileWrite(v->mdfd_vfd, buffer, 24);

Also, please note that I am allowing only 24 bytes of data, i.e. just the page header, to be written to the disk file, so that the issue is observed as soon as even a single overflow page is added to the hash index.

Note: You can find Jeff Janes's patch for torn page write at [4].
[1] - https://www.postgresql.org/message-id/CAMkU%3D1zGcfNTWWxnud8j_p0vb1ENGcngkwqZgPUEnwZmAN%2BXQw%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAA4eK1LmQZGnYhSHXDDCOsSb_0U-gsxReEmSDRgCZr%3DAdKbTEg%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/CAA4eK1%2Bk9tGPw7vOACRo%2B4h5n%3DutHnSuGgMoJrLANybAjNBW9w%40mail.gmail.com
[4] - https://www.postgresql.org/message-id/CAMkU%3D1xRt8jBBB7g_7K41W00%3Dbr9UrxMVn_rhWhKPLaHfEdM5A%40mail.gmail.com

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Hash Indexes
On 25/09/16 18:18, Amit Kapila wrote:
> On Sat, Sep 24, 2016 at 10:49 PM, Greg Stark wrote:
>> On Thu, Sep 22, 2016 at 3:23 AM, Tom Lane wrote:
>>> But to kick the hash AM as such to the curb is to say "sorry, there
>>> will never be O(1) index lookups in Postgres".
>>
>> Well there's plenty of halfway solutions for that. We could move hash
>> indexes to contrib or even have them in core as experimental_hash or
>> unlogged_hash until the day they achieve their potential. We definitely
>> shouldn't discourage people from working on hash indexes
>
> Okay, but to me it appears that naming it as experimental_hash or moving
> it to contrib could discourage people, or at the very least people will
> be less motivated. Thinking on those lines a year or so back would have
> been a wise direction, but now, when a lot of work is already done for
> hash indexes (patches to make them wal-enabled, more concurrent and
> performant, and a pageinspect module are available) and still more is in
> progress, that sounds like a step backward rather than a step forward.

+1

I think so too - I've seen many email threads over the years on this list that essentially state "we need hash indexes wal logged to make progress with them"... and Amit et al have done this (more than this obviously - made 'em better too), and I'm astonished that folk are suggesting anything other than 'commit this great patch now!'...

regards
Mark