Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization
On Wed, Oct 18, 2017 at 5:32 AM, Fabien COELHO wrote: > > Hello Masahiko-san, > >>> Attached the updated version patch. >> >> >> Applies, compiles, make check & tap test ok, doc is fine. >> >> All is well for me: the feature is useful, code is simple and clean, it is >> easy to invoke, easy to extend as well, which I'm planning to do once it >> gets in. >> >> I switched the patch to "Ready for Committers". No doubt they will have >> their own opinions about it. Let's wait and see. > > > The patch needs a rebase in the documentation because of the xml-ilization > of the sgml doc. > Thank you for the notification! Attached rebased patch. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center pgbench_custom_initialization_v16.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.
Amit Langote wrote: > On 2017/10/18 1:52, Alvaro Herrera wrote: > > Alvaro Herrera wrote: > >> Robert Haas wrote: > >>> Implement table partitioning. > >> > >> Is it intentional that you can use ALTER TABLE OWNER TO on the parent > >> table, and that this does not recurse to modify the partitions' owners? > >> This doesn't seem to be mentioned in comments nor documentation, so it > >> seems an oversight to me. > > Hmm, I would say of it that the new partitioning didn't modify the > behavior that existed for inheritance. > > That said, I'm not sure if the lack of recursive application of ownership > change to descendant tables is unintentional. My view is that the fact that partitioning uses inheritance is just an implementation detail. We shouldn't let historical behavior for inheritance dictate behavior for partitioning. Inheritance has many undesirable warts. > > The alter table docs say that ONLY must be specified if one does not > > want to modify descendants, so I think this is a bug. > > Just to clarify, if we do think of it as a bug, then it will apply to the > inheritance case as well, right? I'd leave it alone. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: Is anything preventing us from allowing write to foreign tables from standby?
Hi guys, please help me to understand the proposal. Take a simple configuration: two "live" systems S1 and S2, and for each of them a replica, R1 and R2. So S1 sends data to R1 and S2 to R2. S1 has foreign tables on S2 with write access, meaning you can change data from S1 where the information actually lives in the foreign table sitting as a real table on S2. So what does the system do after a change is committed? S1 would write to S2, and R1 to R2. Assuming that I'd switch off replication from S2 to R2, everything would be fine. That is what you want, isn't it? What would happen when the DBA forgets to switch off the replication from S2 to R2 in your scenario? A higher workload? What happens when R2 fails? In the S2 -> R2 configuration the changes of S2 are saved until R2 is up again, aren't they? What would happen in the case that R1 should write to R2? Is there a write, or does it fail because the foreign table on R2 isn't available? Regards, Wolfgang

Michael Paquier wrote on Wednesday, 18 October 2017 at 3:49: On Wed, Oct 18, 2017 at 9:14 AM, Craig Ringer wrote: > Superficially at least, it sounds like a good idea. Indeed. > We should only need a virtual xid when we're working with foreign > tables since we don't do any local heap changes. > > How's it work with savepoints? That's one thing to worry about. At least to me, it feels like cheating to allow an INSERT query to happen for a transaction which is actually read-only, because XactReadOnly is set to true when the transaction starts. I am wondering if we should extend BEGIN TRANSACTION with a sort of "WRITE ONLY FOREIGN" mode, which allows read queries as well as write queries for foreign tables, because we know that those will not generate WAL locally. This way it would be possible to block INSERT queries as well when they happen in a transaction which should be intrinsically read-only.

    + if (rte->relkind == 'f')
    + continue;

Better to use RELKIND_FOREIGN_TABLE here.
-- Michael
Re: [HACKERS] v10 telease note for pg_basebackup refers to old --xlog-method argument
Pushed. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] alter table doc fix
Amit Langote wrote: > Hi. > > Noticed that an alter table sub-command's name in Description (where it's > OWNER) differs from that in synopsis (where it's OWNER TO). Attached > patch to make them match, if the difference is unintentional. I agree -- pushed. This paragraph The actions for identity columns (ADD GENERATED, SET etc., DROP IDENTITY), as well as the actions TRIGGER, CLUSTER, OWNER, and TABLESPACE never recurse to descendant tables; that is, they always act as though ONLY were specified. Adding a constraint recurses only for CHECK constraints that are not marked NO INHERIT. is a bit annoying, though I think it'd be worse if we "fix" it to be completely strict about the subcommands it refers to. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] 64-bit queryId?
On Thu, Oct 12, 2017 at 2:46 AM, Michael Paquier wrote: > On Thu, Oct 12, 2017 at 9:05 AM, Robert Haas wrote: >> On Wed, Oct 4, 2017 at 9:00 PM, Michael Paquier >> wrote: >>> v4 looks correct to me. Testing it through (pgbench and some custom >>> queries) I have not spotted issues. If the final decision is to use >>> 64-bit query IDs, then this patch could be pushed. >> >> Cool. Committed, thanks for the review. > > The final result looks fine for me. Thanks. Sorry for replying so late, but I have a perhaps naive question about the hashtable handling with this new version. IIUC, the shared hash table is now created with HASH_BLOBS instead of HASH_FUNCTION, so since sizeof(pgssHashKey) != sizeof(uint32) the hash table will use tag_hash() to compute the hash key. tag_hash() uses all the bits present in the given struct, so this can be problematic if padding bits are not zeroed, which isn't guaranteed by the C standard for local variables. With the current pgssHashKey definition, there shouldn't be padding bits, so it should be safe. But I wonder if adding an explicit memset() of the key in pgss_store() could prevent extension authors from getting duplicate entries if they rely on this code, or prevent future issues in the unlikely case of adding other fields to pgssHashKey.
Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.
This check is odd (tablecmds.c ATExecAttachPartition line 13861):

    /* Temp parent cannot have a partition that is itself not a temp */
    if (rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP &&
        attachrel->rd_rel->relpersistence != RELPERSISTENCE_TEMP)
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("cannot attach a permanent relation as partition of temporary relation \"%s\"",
                        RelationGetRelationName(rel))));

This seems to work (i.e. it's allowed to create a temp partition on a permanent parent and not vice-versa, which you'd think makes sense) but it's illusory, because if two sessions try to create temp partitions for overlapping values, the second session fails with a silly error message. To be more precise, do this in one session:

    CREATE TABLE p (a int, b int) PARTITION BY RANGE (a);
    CREATE TEMP TABLE p1 PARTITION OF p FOR VALUES FROM (0) TO (10);

then without closing that one, in another session repeat the second command:

    alvherre=# CREATE TEMP TABLE p1 PARTITION OF p FOR VALUES FROM (0) TO (10);
    ERROR:  partition "p1" would overlap partition "p1"

which is not what I would have expected. Maybe there are combinations of different persistence values that can be allowed to differ (an unlogged partition is probably OK with a permanent parent), but I don't think the current check is good enough. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.
On Wed, Oct 18, 2017 at 4:53 AM, Alvaro Herrera wrote: > My view is that the fact that partitioning uses inheritance is just an > implementation detail. We shouldn't let historical behavior for > inheritance dictate behavior for partitioning. Inheritance has many > undesirable warts. I am OK with filing down warts over time, but to be clear, I think it's too late to change things like this in v10, which has shipped. >> > The alter table docs say that ONLY must be specified if one does not >> > want to modify descendants, so I think this is a bug. >> >> Just to clarify, if we do think of it as a bug, then it will apply to the >> inheritance case as well, right? > > I'd leave it alone. I don't think it's a good idea for table partitioning and table inheritance to behave differently. Generally, I think we don't want to end up with three categories of behavior: commands that recurse always, commands that recurse to partitions but not inheritance children, and commands that don't recurse. If we now think that ALTER TABLE .. OWNER TO should recurse, then we should change that to do so in all cases and document it as a backward incompatibility. I think this issue has very little to do with table partitioning per se. As Amit says, this is a longstanding behavior and it would have been far stranger than where we are if the commit to implement table partitioning had changed it. I suggest starting new threads for changes you want to make instead of tacking them all onto this one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.
Robert Haas writes: > On Wed, Oct 18, 2017 at 4:53 AM, Alvaro Herrera > wrote: >> My view is that the fact that partitioning uses inheritance is just an >> implementation detail. We shouldn't let historical behavior for >> inheritance dictate behavior for partitioning. Inheritance has many >> undesirable warts. > I don't think it's a good idea for table partitioning and table > inheritance to behave differently. I'm with Robert on this one. I'd be in favor of changing both cases to make ALTER OWNER recurse by default. regards, tom lane
Re: [HACKERS] SIGSEGV in BRIN autosummarize
On Tue, Oct 17, 2017 at 09:07:40AM -0500, Justin Pryzby wrote: > On Tue, Oct 17, 2017 at 09:34:24AM -0400, Tom Lane wrote: > > Justin Pryzby writes: > > > On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote: > > >> Anyway, can you give this patch a try? > > > > The trick in this sort of situation is to make sure you build binaries > > that match your existing install in every way except having the added > > patch, and maybe getting installed into a different directory. > > I'm familiar with that process; but, these are PG10 binaries from PGDG for > centos6 x64. I had to add symlinks for the postgis library, but otherwise it seems > to be working fine (although I didn't preserve as many configure options as your > message would suggest I should have). On Tue, Oct 17, 2017 at 12:49:55PM -0400, Tom Lane wrote: > So what I'm thinking is that you need an error during perform_work_item, > and/or more than one work_item picked up in the calling loop, to make this bug manifest. You would need to enter perform_work_item in a .. in our case probably due to interruption by LOCK TABLE, yes? On Tue, Oct 17, 2017 at 12:49:55PM -0400, Tom Lane wrote: > Alvaro Herrera writes: > > And I think that's because we're not > > checking that the namespace OID is a valid value before calling > > get_namespace_name on it. > > The part of your patch that adds a check on avw_database is clearly > correct and necessary. I'm thinking the change you propose in > perform_work_item is just overcomplicating code that's okay as it > stands. We don't need to optimize for the schema-went-away case. No crashes in ~28hr. It occurs to me that it's a weaker test due to not preserving most compilation options. If I understand, our crash isn't explained by the avw_database test anyway (?) Should I make clean and recompile with all non-prefix options and a minimal patch (avw_database==MyDatabaseId || continue) ?
Or recompile with existing options but no patch to first verify crash occurs with locally compiled binary? Justin
Re: [HACKERS] SIGSEGV in BRIN autosummarize
Justin Pryzby wrote: > No crashes in ~28hr. It occurs to me that it's a weaker test due to not > preserving most compilation options. And the previous code crashes in 45 minutes? That's solid enough for me; I'll clean up the patch and push in the next few days. I think what you have now should be sufficient for the time being for your production system. > If I understand, our crash isn't explained by the avw_database test > anyway (?) I don't see why you would think that -- I disagree. > Should I make clean and recompile with all non-prefix options and a minimal > patch (avw_database==MyDatabaseId || continue) ? > > Or recompile with existing options but no patch to first verify crash occurs > with locally compiled binary? I don't think either of these actions is necessary. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.
On Wed, Oct 18, 2017 at 11:27 AM, Alvaro Herrera wrote: > Maybe there are combinations of different persistence values that can be > allowed to differ (an unlogged partition is probably OK with a permanent > parent), but I don't think the current check is good enough. This is also a sort of long-standing historical problem, I think. This comment in expand_inherited_rtentry has been with us for a while:

    /*
     * It is possible that the parent table has children that are temp
     * tables of other backends.  We cannot safely access such tables
     * (because of buffering issues), and the best thing to do seems to be
     * to silently ignore them.
     */

I do not find that to be a particularly sensible approach, and it's probably even less sensible in the partitioning world than it was with table inheritance. I think what we ought to do is prohibit it, but it wasn't the job of the table partitioning commit to whack this around. This is not the first example of a case where we've failed to put in place sufficiently-strong checks to prohibit references to temp tables in places where they don't make sense. See e.g. 16925c1e1fa236e4d7d6c8b571890e7c777f75d7, 948d6ec90fd35d6e1a59d0b1af8d6efd8442f0ad, 0688d84041f7963a2a00468c53aec7bb6051ef5c, a13b01853084b6c6f9c34944bc19b3dd7dc4ceb2, 5fa3418304b06967fa54052b183bf96e1193d85e, 7d6e6e2e9732adfb6a93711ca6a6e42ba039fbdb, 82eed4dba254b8fda71d429b29d222ffb4e93fca. We really did not do a good job plugging all of the cases where temp tables ought to be forbidden when that feature went in, and IMHO this is more of the tail end of that work than anything specific to partitioning. Your opinion may differ, of course. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] SIGSEGV in BRIN autosummarize
On Wed, Oct 18, 2017 at 06:54:09PM +0200, Alvaro Herrera wrote: > Justin Pryzby wrote: > > > No crashes in ~28hr. It occurs to me that it's a weaker test due to not > > preserving most compilation options. > > And the previous code crashes in 45 minutes? That's solid enough for > me; I'll clean up the patch and push in the next few days. I think what > you have now should be sufficient for the time being for your production > system. No - the crash happened 4 times since adding BRIN+autosummarize 6 days ago, and in one instance occurred twice within 3 hours (while I was trying to query logs for the preceding crash).

    [pryzbyj@database ~]$ sudo grep -hE 'in postgres|Saved core' /var/log/messages*
    Oct 13 17:22:45 database kernel: postmaster[32127] general protection ip:4bd467 sp:7ffd9b349990 error:0 in postgres[40+692000]
    Oct 13 17:22:47 database abrt[32387]: Saved core dump of pid 32127 (/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-13-17:22:47-32127 (15040512 bytes)
    Oct 14 18:05:35 database kernel: postmaster[26500] general protection ip:84a177 sp:7ffd9b349b88 error:0 in postgres[40+692000]
    Oct 14 18:05:35 database abrt[27564]: Saved core dump of pid 26500 (/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-14-18:05:35-26500 (24137728 bytes)
    Oct 16 23:21:22 database kernel: postmaster[31543] general protection ip:4bd467 sp:7ffe08a94890 error:0 in postgres[40+692000]
    Oct 16 23:21:22 database abrt[570]: Saved core dump of pid 31543 (/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-16-23:21:22-31543 (25133056 bytes)
    Oct 17 01:58:36 database kernel: postmaster[8646]: segfault at 8 ip 0084a177 sp 7ffe08a94a88 error 4 in postgres[40+692000]
    Oct 17 01:58:38 database abrt[9192]: Saved core dump of pid 8646 (/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-17-01:58:38-8646 (7692288 bytes)

> > If I understand, our crash isn't explained by the avw_database test > > anyway (?) > > I don't see why you would think that -- I disagree.
No problem - apparently I read too far into Tom's thoughts regarding memory context. I'll continue running with the existing patch and come back if the issue recurs. Thanks Justin
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On Tue, Aug 12, 2014 at 1:52 PM, Heikki Linnakangas wrote: > On 08/06/2014 08:37 PM, Jeff Janes wrote: >> >> But now it looks like 0002 needs a rebase > > I've committed the refactoring patch, and here's a rebased and improved > version of the Windows SChannel implementation over that. > > Server-side support is now implemented too, but it's all very crude and > work-in-progress. CRLs are not supported, intermediary CAs are not > supported, and probably many other bells and whistles are missing too. But > the basics work, including cert authentication. Consider this a Proof of > Concept. Heikki, do you have any plans to work more on this? Or does anyone else? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.
Robert Haas wrote: > On Wed, Oct 18, 2017 at 11:27 AM, Alvaro Herrera > wrote: > > Maybe there are combinations of different persistence values that can be > > allowed to differ (an unlogged partition is probably OK with a permanent > > parent), but I don't think the current check is good enough. > > This is also a sort of long-standing historical problem, I think. Sure. Actually, the code I'm calling attention to is ATExecAttachPartition(), which was specifically written for partitioning. It looks like it was copied verbatim from ATExecAddInherit, but there's no shared code there AFAICS. I'm okay with prohibiting the case of different persistence values as you suggest. And I do suggest back-patching that prohibition to pg10. Let me add that I'm not looking to blame anyone for what I report here. I'm very excited about the partitioning stuff and I'm happy with what was done for pg10. I'm now working on more partitioning-related changes, which means I review the existing code as I go along, so I just report things that look wrong to me as I discover them, with an interest in seeing them fixed, or documented, or at least discussed and explicitly agreed upon. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] SIGSEGV in BRIN autosummarize
Justin Pryzby wrote: > On Wed, Oct 18, 2017 at 06:54:09PM +0200, Alvaro Herrera wrote: > > And the previous code crashes in 45 minutes? That's solid enough for > > me; I'll clean up the patch and push in the next few days. I think what > > you have now should be sufficient for the time being for your production > > system. > > No - the crash happened 4 times since adding BRIN+autosummarize 6 days ago, > and > in once instance occured twice within 3 hours (while I was trying to query > logs > for the preceding crash). Oh, okay. Then we don't know enough yet, ISTM. > [pryzbyj@database ~]$ sudo grep -hE 'in postgres|Saved core' > /var/log/messages* > Oct 13 17:22:45 database kernel: postmaster[32127] general protection > ip:4bd467 sp:7ffd9b349990 error:0 in postgres[40+692000] > Oct 13 17:22:47 database abrt[32387]: Saved core dump of pid 32127 > (/usr/pgsql-10/bin/postgres) to > /var/spool/abrt/ccpp-2017-10-13-17:22:47-32127 (15040512 bytes) > Oct 14 18:05:35 database kernel: postmaster[26500] general protection > ip:84a177 sp:7ffd9b349b88 error:0 in postgres[40+692000] > Oct 14 18:05:35 database abrt[27564]: Saved core dump of pid 26500 > (/usr/pgsql-10/bin/postgres) to > /var/spool/abrt/ccpp-2017-10-14-18:05:35-26500 (24137728 bytes) > Oct 16 23:21:22 database kernel: postmaster[31543] general protection > ip:4bd467 sp:7ffe08a94890 error:0 in postgres[40+692000] > Oct 16 23:21:22 database abrt[570]: Saved core dump of pid 31543 > (/usr/pgsql-10/bin/postgres) to > /var/spool/abrt/ccpp-2017-10-16-23:21:22-31543 (25133056 bytes) > Oct 17 01:58:36 database kernel: postmaster[8646]: segfault at 8 ip > 0084a177 sp 7ffe08a94a88 error 4 in postgres[40+692000] > Oct 17 01:58:38 database abrt[9192]: Saved core dump of pid 8646 > (/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-17-01:58:38-8646 > (7692288 bytes) Do you still have those core dumps? If so, would you please verify the database that autovacuum was running in? 
Just open each with gdb (using the original postgres binary, not the one you just installed) and do "print MyDatabaseId". > I'll continue running with the existing patch and come back if the issue > recurs. Thanks. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] SIGSEGV in BRIN autosummarize
On Wed, Oct 18, 2017 at 07:22:27PM +0200, Alvaro Herrera wrote: > Do you still have those core dumps? If so, would you please verify the > database that autovacuum was running in? Just open each with gdb (using > the original postgres binary, not the one you just installed) and do > "print MyDatabaseId".

    [pryzbyj@database ~]$ gdb ccpp-2017-10-16-23:21:22-31543/coredump -ex 'p MyDatabaseId' -ex q 2>/dev/null |tail -5
    Core was generated by `postgres: autovacuum worker process gtt '.
    Program terminated with signal 11, Segmentation fault.
    #0  index_close (relation=0x324647603246466, lockmode=1) at indexam.c:178
    178  LockRelId relid = relation->rd_lockInfo.lockRelId;
    $1 = 16400

    [pryzbyj@database ~]$ gdb ccpp-2017-10-14-18:05:35-26500/coredump -ex 'p MyDatabaseId' -ex q 2>/dev/null |tail -5
    Core was generated by `postgres: autovacuum worker process gtt '.
    Program terminated with signal 11, Segmentation fault.
    #0  pfree (pointer=0x298c740) at mcxt.c:954
    954  (*context->methods->free_p) (context, pointer);
    $1 = 16400

    gtt=# SELECT oid,datname FROM pg_database;
    13456 | template0
    16400 | gtt
    13457 | postgres
        1 | template1

The gtt DB is where the (only) BRIN indices are (not sure what to conclude from that?) Justin
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
Robert Haas writes: > Heikki, do you have any plans to work more on this? > Or does anyone else? FWIW, I have some interest in the Apple Secure Transport patch that is in the CF queue, and will probably pick that up at some point if no one beats me to it (but it's not real high on my to-do list). I won't be touching the Windows version though. I suspect that the folk who might be competent to review the Windows code may have correspondingly little interest in the macOS patch. This is a bit of a problem, since it would be good for someone to look at both of them, with an eye to whether there are any places in our SSL abstraction API that ought to be rethought now that we have actual non-OpenSSL implementations to compare to. regards, tom lane
Re: [HACKERS] SIGSEGV in BRIN autosummarize
Alvaro Herrera writes: > And the previous code crashes in 45 minutes? That's solid enough for > me; I'll clean up the patch and push in the next few days. I think what > you have now should be sufficient for the time being for your production > system. I'm still of the opinion that the presented patch isn't preventing any crashes. The failure to check avw_database would generally lead to tasks silently getting dropped in the mistaken belief that the target table has gone away. I could believe that a crash occurs if the given schema OID exists in some other DB, and the given table OID does too, but it's for a table/index of the wrong kind ... but how likely is that? Not very likely at all, given the way we generate OIDs. (And if it did happen, I'd say the cause is a failure to recheck relkind somewhere downstream of the autovac task dispatch code, anyway.) regards, tom lane
Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.
On Wed, Oct 18, 2017 at 1:18 PM, Alvaro Herrera wrote: > I'm okay with prohibiting the case of different persistence values as > you suggest. And I do suggest to back-patch that prohibition to pg10. I disagree. There's nothing any more broken about the way this works with partitioning in v10 than the way it works with inheritance in 9.6 or prior. Table inheritance has had warts for years, and the fact that we now have table partitioning doesn't make all of those same warts into must-fix-now bugs. They are still just warts, and they should be cleaned up through future development as we find them and have the time to do something about them. They should be documented as incompatibilities where appropriate. They should not be jammed into stable branches because users don't like it when DDL works one way in 10.1 and another way in 10.2. They don't even really like it when 10.0 works differently from 11.0, but you have to be willing to see bad decisions revisited at some point if you want progress to happen, and I certainly do. > Let me add that I'm not looking to blame anyone for what I report here. > I'm very excited about the partitioning stuff and I'm happy of what was > done for pg10. I'm now working on more partitioning-related changes > which means I review the existing code as I go along, so I just report > things that look wrong to me as I discover them, just with an interest > in seeing them fixed, or documented, or at least discussed and > explicitly agreed upon. Fair enough, but when you reply on the thread where I committed the patch and propose back-patching to the release that contained it, you make it sound like it's a bug in the patch, and I don't think either of the two things you just raised are. My complaint is not that I think you are accusing me of any sort of wrongdoing but that you're trying to justify back-patching what I think is new development by characterizing it as a bug fix. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement
On Wed, Oct 18, 2017 at 2:50 PM, Tom Lane wrote: > Robert Haas writes: >> Heikki, do you have any plans to work more on this? >> Or does anyone else? > > FWIW, I have some interest in the Apple Secure Transport patch that > is in the CF queue, and will probably pick that up at some point if > no one beats me to it (but it's not real high on my to-do list). > I won't be touching the Windows version though. I suspect that the > folk who might be competent to review the Windows code may have > correspondingly little interest in the macOS patch. This is a bit > of a problem, since it would be good for someone to look at both of > them, with an eye to whether there are any places in our SSL abstraction > API that ought to be rethought now that we have actual non-OpenSSL > implementations to compare to. Well, the best way to handle that might be to get some of this stuff done before we get too much later into the release cycle, so that there's time to tinker with it before the release goes out the door (or is deep in beta). However, if nobody's working on this and the other patch is someplace far down your to-do list, then I guess that isn't going to happen. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] 64-bit queryId?
On Wed, Oct 18, 2017 at 9:20 AM, Julien Rouhaud wrote: > Sorry for replying so late, but I have a perhaps naive question about > the hashtable handling with this new version. > > IIUC, the shared hash table is now created with HASH_BLOBS instead of > HASH_FUNCTION, so since sizeof(pgssHashKey) != sizeof(uint32) the hash > table will use tag_hash() to compute the hash key. > > tag_hash() uses all the bits present in the given struct, so this can > be problematic if padding bits are not zeroed, which isn't guaranteed by > the C standard for local variables. > > With the current pgssHashKey definition, there shouldn't be padding bits, > so it should be safe. But I wonder if adding an explicit memset() of > the key in pgss_store() could prevent extension authors from getting > duplicate entries if they rely on this code, or prevent future issues > in the unlikely case of adding other fields to pgssHashKey. I guess we should probably add an additional comment to the definition of pgssHashKey warning of the danger. I'm OK with adding a memset if somebody can promise me it will get optimized away by all reasonably commonly-used compilers, but I'm not that keen on adding more cycles to protect against a hypothetical danger. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Re: heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead bug (Was: amcheck (B-Tree integrity checking tool))
On Mon, Oct 16, 2017 at 8:09 PM, Noah Misch wrote: > That presupposes construction of two pieces of software, the server and the > checker, such that every disagreement is a bug in the server. But checkers > get bugs just like servers get bugs. You make a good point, which is that *some* code must be wrong when an error is raised and hardware is not to blame, but ISTM that the nuance of that really matters. The checker seems much less likely to be where bugs are, for three reasons: * There is far less code for us to maintain as compared to the volume of backend code that is effectively tested (again, not including the hidden universe of complex, unauditable firmware code that could be involved these days). * Much of the actual checking (as much as possible) is outsourced to core code that is already critically important. If that has bugs in it, then it is unlikely to be defined as an amcheck bug. * Knowing all this, we can go out of our way to do a good job of getting the design right the first time. (A sound design is far more important than actually having zero bugs.) Obviously there could be unambiguous bugs; I'm not arguing otherwise. I just hope that we can push this model as far as we need to, and perhaps accommodate verifiability as a goal for *future* development projects. We're almost doing that today; debuggability of on-disk structures is something that the community already values. This is the logical next step, IMV. > Checkers do provide a sort of > double-entry bookkeeping. When a reproducible test case prompts a checker > complaint, we'll know *some* code is wrong. I really like your double entry bookkeeping analogy. A tiny discrepancy will bubble up, even in a huge organization, and yet the underlying principles are broad and not all that complicated. > That's an admirable contribution. Thank you. I just hope that it becomes something that other contributors have some sense of ownership over. 
> I'm essentially saying that the server is innocent until proven guilty. It > would be cool to have a self-contained specification of PostgreSQL data files, > but where the server disagrees with the spec without causing problem > behaviors, we'd ultimately update the spec to fit the server. I might not have done a good job of explaining my position. I agree with everything you say here. I would like to see amcheck become a kind of vehicle for discussing things that we already discuss. You get a nice tool at the end, that clarifies and increases confidence in the original understanding over time (or acts as a canary-in-the-coalmine forcing function when the original understanding turns out to be faulty). The tool itself is ultimately just a bonus. Bringing it back to the concrete freeze-the-dead issue, and the question of an XID-cutoff for safely interrogating CLOG: I guess it will be possible to assess a HOT chain as a whole. We can probably do this safely while holding a super-exclusive lock on the buffer. I can probably find a way to ensure this only needs to happen in a rare slow path, when it looks like the invariant might be violated but we need to make sure (I'm already following this pattern in a couple of places). Realistically, there will be some amount of "try it and see" here. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] Faster processing at Gather node
On Tue, Oct 17, 2017 at 5:39 PM, Andres Freund wrote: > The precise query doesn't really matter, the observations here are more > general, I hope. > > 1) nodeGather.c re-projects every row from workers. As far as I can tell >that's now always exactly the same targetlist as it's coming from the >worker. Projection capability was added in 8538a6307049590 (without >checking whether it's needed afaict), but I think it was in turn often >obsoleted by 992b5ba30dcafdc222341505b072a6b009b248a7. My >measurement shows that removing the projection yields quite massive >speedups in queries like yours, which is not too surprising. That seems like an easy and worthwhile optimization. >I suspect this just needs a tlist_matches_tupdesc check + an if >around ExecProject(). And a test; right now tests pass unless >force_parallel_mode is used, even if just commenting out the >projection unconditionally. So, for this to fail, we'd need a query that uses parallelism but where the target list contains a parallel-restricted function. Also, the function should really be such that we'll reliably get a failure rather than only with some small probability. I'm not quite sure how to put together such a test case, but there's probably some way. > 2) The spinlocks both on the sending and receiving side are quite hot: > >rafia query leader: > + 36.16% postgres postgres[.] shm_mq_receive > + 19.49% postgres postgres[.] s_lock > + 13.24% postgres postgres[.] SetLatch > >To test that theory, here are the timings for >1) spinlocks present > time: 6593.045 >2) spinlock acquisition replaced by *full* memory barriers, which on x86 > is a lock; addl $0,0(%%rsp) > time: 5159.306 >3) spinlocks replaced by read/write barriers as appropriate. > time: 4687.610 > >By my understanding of shm_mq.c's logic, something like 3) ought to >be doable in a correct manner. There should be, in normal >circumstances, only one end modifying each of the protected >variables. 
Doing so instead of using full block atomics with locked >instructions is very likely to yield considerably better performance. I think the sticking point here will be the handling of the mq_detached flag. I feel like I fixed a bug at some point where this had to be checked or set under the lock at the same time we were checking or setting mq_bytes_read and/or mq_bytes_written, but I don't remember the details. I like the idea, though. Not sure what happened to #3 on your list... you went from #2 to #4. > 4) Modulo computations in shm_mq are expensive: > >that we end up with a full blown div makes sense - the compiler can't >know anything about ringsize, therefore it can't optimize to a mask. >I think we should force the size of the ringbuffer to be a power of >two, and use a mask instead of a size for the buffer. This seems like it would require some redesign. Right now we let the caller choose any size they want and subtract off our header size to find the actual queue size. We can waste up to MAXALIGN-1 bytes, but that's not much. This would waste up to half the bytes provided, which is probably not cool. > 5) There's a *lot* of latch interactions. The biggest issue actually is >the memory barrier implied by a SetLatch, waiting for the latch >barely shows up. > >Commenting out the memory barrier - which is NOT CORRECT - improves >timing: >before: 4675.626 >after: 4125.587 > >The correct fix obviously is not to break latch correctness. I think >the big problem here is that we perform a SetLatch() for every read >from the latch. I think it's a little bit of an overstatement to say that commenting out the memory barrier is not correct. When we added that code, we removed this comment: - * Presently, when using a shared latch for interprocess signalling, the - * flag variable(s) set by senders and inspected by the wait loop must - * be protected by spinlocks or LWLocks, else it is possible to miss events - * on machines with weak memory ordering (such as PPC). 
This restriction - * will be lifted in future by inserting suitable memory barriers into - * SetLatch and ResetLatch. It seems to me that in any case where the data is protected by an LWLock, the barriers we've added to SetLatch and ResetLatch are redundant. I'm not sure if that's entirely true in the spinlock case, because S_UNLOCK() is only documented to have release semantics, so maybe the load of latch->is_set could be speculated before the lock is dropped. But I do wonder if we're just multiplying barriers endlessly instead of trying to think of ways to minimize them (e.g. have a variant of SpinLockRelease that acts as a full barrier instead of a release barrier, and then avoid a second barrier when checking the latch state). All that having been said, a batch variant for reading tuples in bulk might make sense.
Re: [HACKERS] [GENERAL] huge RAM use in multi-command ALTER of table heirarchy
Hi, I just ran into this again in another context (see original discussion, quoted below). Some time ago, while initially introducing non-default stats targets, I set our non-filtering columns to "STATISTICS 10", lower than the default, to minimize the size of pg_statistic, which (at least at one point) I perceived to have become bloated and to be causing issues (partially due to having an excessive number of "daily" granularity partitions, a problem I've since mitigated). The large number of columns with non-default stats targets was (I think) causing pg_dump --section=pre-data to take 10+ minutes, which makes pg_upgrade more disruptive than necessary, so now I'm going back and fixing it. [pryzbyj@database ~]$ time sed '/SET STATISTICS 10;$/!d; s//SET STATISTICS -1;/' /srv/cdrperfbackup/ts/2017-10-17/pg_dump-section\=pre-data |psql -1q ts server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. connection to server was lost [pryzbyj@database ~]$ dmesg |tail -n2 Out of memory: Kill process 6725 (postmaster) score 550 or sacrifice child Killed process 6725, UID 26, (postmaster) total-vm:13544792kB, anon-rss:8977764kB, file-rss:8kB So I'm hoping to encourage someone to commit the change contemplated earlier. Thanks in advance. Justin On Tue, Jul 18, 2017 at 07:26:30PM -0400, Tom Lane wrote: > Justin Pryzby writes: > > I've seen this before while doing SET STATISTICS on a larger number of > > columns > > using xargs, but just came up while doing ADD of a large number of columns. > > Seems to be roughly linear in the number of children but superlinear WRT > > columns. > > I think it has to do with catalog updates / cache invalidation with many > > ALTERs*children*columns? > > I poked into this a bit. 
The operation is necessarily roughly O(N^2) in > the number of columns, because we rebuild the affected table's relcache > entry after each elementary ADD COLUMN operation, and one of the principal > components of that cost is reading all the pg_attribute entries. However, > that should only be a time cost not a space cost. Eventually I traced the > O(N^2) space consumption to RememberToFreeTupleDescAtEOX, which seems to > have been introduced in Simon's commit e5550d5fe, and which strikes me as > a kluge of the first magnitude. Unless I am missing something, that > function's design concept can fairly be characterized as "let's leak > memory like there's no tomorrow, on the off chance that somebody somewhere > is ignoring basic coding rules". > > I tried ripping that out, and the peak space consumption of your example > (with 20 child tables and 1600 columns) decreased from ~3GB to ~200MB. > Moreover, the system still passes make check-world, so it's not clear > to me what excuse this code has to live. > > It's probably a bit late in the v10 cycle to be taking any risks in > this area, but I'd vote for ripping out RememberToFreeTupleDescAtEOX > as soon as the v11 cycle opens, unless someone can show an example > of non-broken coding that requires it. (And if so, there ought to > be a regression test incorporating that.) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?
It'd be nice if SECURITY DEFINER functions could see what user invoked them, but current_user is the DEFINER user, naturally, since that's how this is done in fmgr_security_definer(). I was thinking that fmgr_security_definer() could keep a global pointer to a linked list (with automatic nodes) of the save_userid values. Then we could have a SQL function for accessing these, something like pg_current_user(level int) returning text, where level 0 is current_user, level 1 is "the previous current_user in the stack", and so on, returning null when level is beyond the top-level. This seems like a simple, small, easy patch, and since I [think I] need it I suspect others probably do as well. Thoughts? Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?
Alternatively, a way to get at the OuterUserId? Or the outer-most current_user in the function stack? I should explain why I need this: for audit functionality where I want the triggers' procedures to be SECURITY DEFINER so only they can write to audit tables and such, but I want them to see the current_user of the *caller*, rather than current_user being the DEFINER's name. Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?
2017-10-18 22:01 GMT+02:00 Nico Williams : > It'd be nice if SECURITY DEFINER functions could see what user invoked > them, but current_user is the DEFINER user, naturally, since that's how > this is done in fmgr_security_definer(). > > I was thinking that fmgr_security_definer() could keep a global pointer > to a linked list (with automatic nodes) of the save_userid values. Then > we could have a SQL function for accessing these, something like > pg_current_user(level int) returning text, where level 0 is > current_user, level 1 is "the previous current_user in the stack", and > so on, returning null when level is beyond the top-level. > > This seems like a simple, small, easy patch, and since I [think I] need > it I suspect others probably do as well. > > Thoughts? > there is a function session_user() already regards Pavel > Nico > -- > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?
On Wed, Oct 18, 2017 at 10:15:01PM +0200, Pavel Stehule wrote: > there is a function session_user() already But it doesn't do this. Are you saying that I should add a session_user(int)? Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] Faster processing at Gather node
Hi, On 2017-10-18 15:46:39 -0400, Robert Haas wrote: > > 2) The spinlocks both on the sending and receiving side are quite hot: > > > >rafia query leader: > > + 36.16% postgres postgres[.] shm_mq_receive > > + 19.49% postgres postgres[.] s_lock > > + 13.24% postgres postgres[.] SetLatch > > > >To test that theory, here are the timings for > >1) spinlocks present > > time: 6593.045 > >2) spinlock acquisition replaced by *full* memory barriers, which on x86 > > is a lock; addl $0,0(%%rsp) > > time: 5159.306 > >3) spinlocks replaced by read/write barriers as appropriate. > > time: 4687.610 > > > >By my understanding of shm_mq.c's logic, something like 3) ought to > >be doable in a correct manner. There should be, in normal > >circumstances, only one end modifying each of the protected > >variables. Doing so instead of using full block atomics with locked > >instructions is very likely to yield considerably better performance. > > I think the sticking point here will be the handling of the > mq_detached flag. I feel like I fixed a bug at some point where this > had to be checked or set under the lock at the same time we were > checking or setting mq_bytes_read and/or mq_bytes_written, but I don't > remember the details. I like the idea, though. Hm. I'm a bit confused/surprised by that. What'd be the worst that can happen if we don't immediately detect that the other side is detached? At least if we only do so in the non-blocking paths, the worst that could seemingly happen is that we read/write at most one superfluous queue entry, rather than reporting an error? Or is the concern that detaching might be detected *too early*, before reading the last entry from a queue? > Not sure what happened to #3 on your list... you went from #2 to #4. Threes are boring. > > 4) Modulo computations in shm_mq are expensive: > > > >that we end up with a full blown div makes sense - the compiler can't > >know anything about ringsize, therefore it can't optimize to a mask. 
> >I think we should force the size of the ringbuffer to be a power of > >two, and use a mask instead of a size for the buffer. > > This seems like it would require some redesign. Right now we let the > caller choose any size they want and subtract off our header size to > find the actual queue size. We can waste up to MAXALIGN-1 bytes, but > that's not much. This would waste up to half the bytes provided, > which is probably not cool. Yea, I think it'd require a shm_mq_estimate_size(Size queuesize), that returns the next power-of-two queuesize + overhead. > > 5) There's a *lot* of latch interactions. The biggest issue actually is > >the memory barrier implied by a SetLatch, waiting for the latch > >barely shows up. > > > >Commenting out the memory barrier - which is NOT CORRECT - improves > >timing: > >before: 4675.626 > >after: 4125.587 > > > >The correct fix obviously is not to break latch correctness. I think > >the big problem here is that we perform a SetLatch() for every read > >from the latch. > > I think it's a little bit of an overstatement to say that commenting > out the memory barrier is not correct. When we added that code, we > removed this comment: > > - * Presently, when using a shared latch for interprocess signalling, the > - * flag variable(s) set by senders and inspected by the wait loop must > - * be protected by spinlocks or LWLocks, else it is possible to miss events > - * on machines with weak memory ordering (such as PPC). This restriction > - * will be lifted in future by inserting suitable memory barriers into > - * SetLatch and ResetLatch. > > It seems to me that in any case where the data is protected by an > LWLock, the barriers we've added to SetLatch and ResetLatch are > redundant. I'm not sure if that's entirely true in the spinlock case, > because S_UNLOCK() is only documented to have release semantics, so > maybe the load of latch->is_set could be speculated before the lock is > dropped. 
But I do wonder if we're just multiplying barriers endlessly > instead of trying to think of ways to minimize them (e.g. have a > variant of SpinLockRelease that acts as a full barrier instead of a > release barrier, and then avoid a second barrier when checking the > latch state). I'm not convinced by this. Imo the multiplying largely comes from superfluous actions, like the per-entry SetLatch calls here, rather than per batch. After all, I'd benchmarked this *after* an experimental conversion of shm_mq to not use spinlocks - so there's actually no external barrier providing these guarantees that could be combined with SetLatch()'s barrier. Presumably part of the cost here comes from the fact that the barriers actually do have an influence over the ordering. > All that having been said, a batch variant for reading tuples in bulk > might make sense. I thi
Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?
On Wed, Oct 18, 2017 at 1:26 PM, Nico Williams wrote: > On Wed, Oct 18, 2017 at 10:15:01PM +0200, Pavel Stehule wrote: > > there is a function session_user() already > > But it doesn't do this. Are you saying that I should add a > session_user(int)? > > Regardless of the merits of the proposed feature, the function "session_user" is SQL-defined and should not be modified or enhanced. I could see "calling_role()" being useful - it returns the same value as "current_role" normally and in security invoker functions while in a security definer function it would return whatever current_role would have returned if the function was a security invoker (i.e., the role that the system will put back into effect once the security definer function returns). Introducing the concept of a stack at the SQL level here seems, at first glance, to be over-complicating things. David J.
Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains
On Tue, Oct 17, 2017 at 6:02 AM, Alvaro Herrera wrote: > Robert Haas wrote: >> I haven't really followed this thread in depth, but I'm very nervous >> about the idea that we should ever be examining the raw-xmin of a >> tuple that has been marked HEAP_XMIN_FROZEN for anything other than >> forensic purposes. > > Yeah, me too. If you see another way to fix the problem, let's discuss > it. Well, I guess what I don't understand is, suppose we have a HOT chain that looks like this, where [x,y] denotes a tuple with an xmin of x and an xmax of y. [a,b]->[b,c]->[c,d] If b is eligible to be frozen, then b is committed and all-visible, which means that the [a,b] tuple should be pruned altogether. IOW, I don't think that a non-root tuple should ever have a frozen xmin; if that happens, I think we've already screwed up. So I don't quite understand how this problem arises in the first place. I think that the way we are supposed to be guaranteeing this is to first prune the page and then freeze it. If we ever freeze without first pruning, I think that's a bug. Now it could be that the problem is that there's a race: after we prune the page, somehow the xmin horizon advances before we freeze. But that also seems like something that shouldn't happen - our notion of the freeze threshold should be frozen at the beginning of the scan and should not advance, and it should be prior to our freeze horizon, which could be allowed to advance but not retreat in the course of the scan. Apologies if this is stupid; please clue me in on what I'm missing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?
On Wed, Oct 18, 2017 at 01:43:30PM -0700, David G. Johnston wrote: > Regardless of the merits of the proposed feature, the function > "session_user" is SQL-defined and should not be modified or enhanced. > > I could see "calling_role()" being useful - it returns the same value > as "current_role" normally and in security invoker functions while in > a security definer function it would return whatever current_role > would have returned if the function was a security invoker (i.e., the > role that the system will put back into effect once the security > definer function returns). That... could be awkward where lots of SECURITY DEFINER functions may be user-callable, but also called from each other. But it would be minimally useful. More useful than this, for me, would be a way to get the top-most user. > Introducing the concept of a stack at the SQL level here seems, at > first glance, to be over-complicating things. Because of the current implementation of invocation of SECURITY DEFINER functions, a stack is trivial to build, since it's a list of nodes allocated on the C stack in fmgr_security_definer(). Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?
On Wed, Oct 18, 2017 at 2:08 PM, Nico Williams wrote: > On Wed, Oct 18, 2017 at 01:43:30PM -0700, David G. Johnston wrote: > > More useful than this, for me, would be a way to get the top-most user. > > That would be "session_user"? > Introducing the concept of a stack at the SQL level here seems, at > > first glance, to be over-complicating things. > > Because of the current implementation of invocation of SECURITY DEFINER > functions, a stack is trivial to build, since it's a list of nodes > allocated on the C stack in fmgr_security_definer(). > Not saying it's difficult (or not) to code in C; but exposing that to SQL seems like a big step. If I was in position to dive deeper I wouldn't foreclose on the stack idea but I'd be inclined to see if something else could be made to work with reasonable effort. David J.
Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?
On Wed, Oct 18, 2017 at 02:13:29PM -0700, David G. Johnston wrote: > > More useful than this, for me, would be a way to get the top-most user. > > That would be "session_user"? It's not quite since there's a difference between SET SESSION AUTHORIZATION and SET SESSION ROLE. But yes, it's what I'm using now. > > Introducing the concept of a stack at the SQL level here seems, at > > > first glance, to be over-complicating things. > > > > Because of the current implementation of invocation of SECURITY DEFINER > > functions, a stack is trivial to build, since it's a list of nodes > > allocated on the C stack in fmgr_security_definer(). > > Not saying its difficult (or not) to code in C; but exposing that to SQL > seems like a big step. Really? Why? I mean, there's an implicit function invocation stack already. Reifying some bits of the function call stack is useful. I can't think of how this particular reification would be dangerous or set a bad precedent. Hmmm, oh, I forgot about GET DIAGNOSTICS! The stack is already exposed to SQL. Maybe we could add a CURRENT_USER item to GET STACKED DIAGNOSTICS or to the PG_CONTEXT. > If I was in position to dive deeper I wouldn't foreclose on the stack idea > but I'd be inclined to see if something else could be made to work with > reasonable effort. I would think that the more general approach, if easy enough to implement, would be better. I can (and will) live with using session_user instead of current_user, for now. But I'm willing to contribute a patch. Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?
On Wed, Oct 18, 2017 at 2:30 PM, Nico Williams wrote: > On Wed, Oct 18, 2017 at 02:13:29PM -0700, David G. Johnston wrote: > > > More useful than this, for me, would be a way to get the top-most user. > > > > That would be "session_user"? > > It's not quite since there's a difference between SET SESSION > AUTHORIZATION and SET SESSION ROLE. > > But yes, it's what I'm using now. > True, though at that point the superuser who wants to cover their tracks could probably just edit your functions... > Really? Why? I mean, there's an implicit function invocation stack > already. Reifying some bits of the function call stack is useful. I > can't think of how this particular reification would be dangerous or set > a bad precedent. > Nothing concrete... > > Hmmm, oh, I forgot about GET DIAGNOSTICS! The stack is already exposed > to SQL. Maybe we could add a CURRENT_USER item to GET STACKED > DIAGNOSTICS or to the PG_CONTEXT. > Ideally if implementing what you describe we'd want it accessible from any procedural language, not just pl/pgsql. Also, GET STACKED DIAGNOSTICS is documented as being exposed only within an exception handler. > > If I was in position to dive deeper I wouldn't foreclose on the stack > idea > > but I'd be inclined to see if something else could be made to work with > > reasonable effort. > > I would think that the more general approach, if easy enough to > implement, would be better. I can (and will) live with using > session_user instead of current_user, for now. But I'm willing to > contribute a patch. I'd probably expose the stack as an array... David J.
Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains
On Wed, Oct 18, 2017 at 1:54 PM, Robert Haas wrote: > Well, I guess what I don't understand is, suppose we have a HOT chain > that looks like this, where [x,y] denotes a tuple with an xmin of x > and an xmax of y. > > [a,b]->[b,c]->[c,d] > > If b is eligible to be frozen, then b is committed and all-visible, > which means that the [a,b] tuple should be pruned altogether. IOW, I > don't think that a non-root tuple should ever have a frozen xmin; if > that happens, I think we've already screwed up. So I don't quite > understand how this problem arises in the first place. Technically, we don't freeze the heap-only tuples here, because we cannot: freezing tuples renders them visible (we don't set XMIN_FROZEN). Instead, we set hint bits with WAL-logging, on the assumption that nobody will ever *need* to interrogate the clog -- see commit 20b65522 from several weeks back. To be clear, this *does* mean that hint bits stop being mere hints, which I find troubling. I described these heap-only tuples as "logically frozen" over on the "heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead bug" thread, which is where I brought up my general concern. > I think that the way we are supposed to be guaranteeing this is to > first prune the page and then freeze it. There is a race where we cannot prune the page, though. That's why we had to revisit what I suppose was a tacit assumption, and address its problems in the commit that started this thread (commit a5736bf7). > But that also seems like something > that shouldn't happen - our notion of the freeze threshold should be > frozen at the beginning of the scan and should not advance, and it > should be prior to our freeze horizon, which could be allowed to > advance but not retreat in the course of the scan. It is not allowed to retreat -- kind of. 
(Alvaro mentioned something in passing about an *alternative* fix where it really was allowed to retreat in the simplest sense [1], but I don't think that that's going to happen.) > Apologies if this is stupid; please clue me on what I'm missing. I don't think that your questions are stupid at all. It intuitively seems bad that xmin freezing is only something we can do to HOT root and non-HOT tuples, while xmax freezing needs to happen to heap-only tuples, as well as HOT root tuples and non-HOT tuples. But, as I said, I'm still playing catch-up on MultiXacts, and I too feel like I might still be missing important details. [1] https://postgr.es/m/20171017100200.ruszi2c6qqwetce5@alvherre.pgsql -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?
On Wed, Oct 18, 2017 at 02:45:47PM -0700, David G. Johnston wrote: > On Wed, Oct 18, 2017 at 2:30 PM, Nico Williams > wrote: > > On Wed, Oct 18, 2017 at 02:13:29PM -0700, David G. Johnston wrote: > > > > More useful than this, for me, would be a way to get the top-most user. > > > > > > That would be "session_user"? > > > > It's not quite since there's a difference between SET SESSION > > AUTHORIZATION and SET SESSION ROLE. > > > > But yes, it's what I'm using now. > > True, though at that point the superuser who wants to cover their tracks > could probably just edit your functions... I don't worry about superusers. However, I'd like for there to be a way to drop privileges permanently for a session. Something like SET SESSION AUTHORIZATION WITH NORESET (ala MySQL) or SET SESSION AUTHENTICATION. > > Hmmm, oh, I forgot about GET DIAGNOSTICS! The stack is already exposed > > to SQL. Maybe we could add a CURRENT_USER item to GET STACKED > > DIAGNOSTICS or to the PG_CONTEXT. > > Ideally if implementing what you describe we'd want it accessible from any > procedural language, not just pl/pgsql. Good point. So a function. Got it. > I'd probably expose the stack as an array... I agree, but that would be more expensive, since it means marshalling all the information, even if the caller only wants one specific item. Whereas accessing a specific frame by number is much simpler and more performant (no allocation). It's also easier not to have to do something like parsing the PG_CONTEXT, instead accessing each of any number of attributes we might expose from each frame. Nico -- -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 64-bit queryId?
On Thu, Oct 19, 2017 at 4:12 AM, Robert Haas wrote: > On Wed, Oct 18, 2017 at 9:20 AM, Julien Rouhaud wrote: >> Sorry for replying so late, but I have a perhaps naive question about >> the hashtable handling with this new version. >> >> IIUC, the shared hash table is now created with HASH_BLOBS instead of >> HASH_FUNCTION, so since sizeof(pgssHashKey) != sizeof(uint32) the hash >> table will use tag_hash() to compute the hash key. >> >> tag_hash() uses all the bits present in the given struct, so this can >> be problematic if padding bits are not zeroed, which isn't guaranteed by >> the C standard for local variables. >> >> With the current pgssHashKey definition, there shouldn't be padding bits, >> so it should be safe. But I wonder if adding an explicit memset() of >> the key in pgss_store() could prevent extension authors from having >> duplicate entries if they rely on this code, or prevent future issues >> in the unlikely case of adding other fields to pgssHashKey. > > I guess we should probably add an additional comment to the definition of > pgssHashKey warning of the danger. I'm OK with adding a memset if > somebody can promise me it will get optimized away by all reasonably > commonly-used compilers, but I'm not that keen on adding more cycles > to protect against a hypothetical danger. A comment is a suitable answer for me too. There is no guarantee that such a memset would get optimized away by all compilers. It likely would be, but making a hard promise is difficult. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA
Hi, On 2017-06-05 16:22:58 +0300, Sokolov Yura wrote: > Patch changes the way LWLock is acquired. > > Before patch, lock is acquired using following procedure: > 1. first LWLock->state is checked for ability to take a lock, and CAS > performed (either lock could be acquired or not). And it is retried if > status were changed. > 2. if lock is not acquired at step 1, Proc is put into wait list, using > LW_FLAG_LOCKED bit of LWLock->state as a spin-lock for wait list. > In fact, profile shows that LWLockWaitListLock spends a lot of CPU on > contended LWLock (up to 20%). > Additional CAS loop is inside pg_atomic_fetch_or_u32 for setting > LW_FLAG_HAS_WAITERS. And releasing wait list lock is another CAS loop > on the same LWLock->state. > 3. after that state is checked again, because lock could be released > before Proc were queued into wait list. > 4. if lock were acquired at step 3, Proc should be dequeued from wait > list (another spin lock and CAS loop). And this operation could be > quite heavy, because whole list should be traversed. > > Patch makes lock acquiring in single CAS loop: > 1. LWLock->state is read, and ability for lock acquiring is detected. > If there is possibility to take a lock, CAS is tried. > If CAS were successful, lock is acquired (same as original version). > 2. but if lock is currently held by other backend, we check ability for > taking WaitList lock. If wait list lock is not held by anyone, CAS is > performed for taking WaitList lock and setting LW_FLAG_HAS_WAITERS at once. > If CAS were successful, then LWLock were still held at the moment wait > list lock were held - this proves correctness of new algorithm. And > Proc is queued to wait list then. > 3. Otherwise spin_delay is performed, and loop returns to step 1. Right, something like this didn't use to be possible because we had both the LWLock->state and LWLock->mutex. But after 008608b9d5106 that's not a concern anymore. > Algorithm for LWLockWaitForVar is also refactored. New version is: > 1.
If lock is not held by anyone, it immediately exits. > 2. Otherwise it is checked for ability to take WaitList lock, because > variable is protected with it. If so, CAS is performed, and if it is > successful, loop breaks to step 4. > 3. Otherwise spin_delay is performed, and loop returns to step 1. > 4. var is checked for change. > If it were changed, we unlock wait list lock and exit. > Note: it could not change in following steps because we are holding > wait list lock. > 5. Otherwise CAS on setting necessary flags is performed. If it succeeds, > then queue Proc to wait list and exit. > 6. If CAS failed, then there is possibility for LWLock to be already > released - if so then we should unlock wait list and exit. > 7. Otherwise loop returns to step 5. > > So invariant is preserved: > - either we found LWLock free, > - or we found changed variable, > - or we set flags and queue self while LWLock were held. > > Spin_delay is not performed at step 7, because we should release wait > list lock as soon as possible. That seems unconvincing - by not delaying you're more likely to *increase* the time till the current locker that holds the lock can release the lock. > Additionally, taking wait list lock is skipped in both algorithms if > SpinDelayStatus.spins is less than part of spins_per_delay loop > iterations (so, it is initial iterations, and some iterations after > pg_usleep, because SpinDelayStatus.spins is reset after sleep). I don't think it's a good idea to change this at the same time as the other changes you're proposing here. I think it's a worthwhile thing to pursue, but that change also has quite the potential for regressing in some cases. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: protocol version negotiation (Re: Libpq PGRES_COPY_BOTH - version compatibility)
Hi Robert, Thanks very much for your quick response. PFA the patch containing the BE changes for pgwire v3.1, correctly formatted using pgindent this time 😊 A few salient points: >> SendServerProtocolVersionMessage should be adjusted to use the new >> facilities added by commit 1de09ad8eb1fa673ee7899d6dfbb2b49ba204818. The new functionality is for sending 64bit ints. I think 32bits is sufficient for the information we want to pass around in the protocol negotiation phase, so I left this part unchanged. >> Also, this really doesn't belong in guc.c at all. We should be separating >> out >> these options in ProcessStartupPacket() just as we do for existing protocol- >> level options. When we actually have some options, I think they should be >> segregated into a separate list hanging off of the port, instead of letting >> them >> get mixed into >> port->guc_options, but for right now we don't have any yet, so a bunch >> of this complexity can go away. You are right, it is more elegant to make this a part of the port struct; I made the necessary changes in the patch. Thanks, Badrul >> -Original Message- >> From: Robert Haas [mailto:robertmh...@gmail.com] >> Sent: Friday, October 13, 2017 11:16 AM >> To: Badrul Chowdhury >> Cc: Tom Lane ; Satyanarayana Narlapuram >> ; Craig Ringer >> ; Peter Eisentraut ; Magnus >> Hagander ; PostgreSQL-development > hack...@postgresql.org> >> Subject: Re: [HACKERS] Re: protocol version negotiation (Re: Libpq >> PGRES_COPY_BOTH - version compatibility) >> >> On Fri, Oct 6, 2017 at 5:07 PM, Badrul Chowdhury >> wrote: >> > I added a mechanism to fall back to v3.0 if the BE fails to start when FE >> initiates a connection with v3.1 (with optional startup parameters). This >> completely eliminates the need to backpatch older servers, ie newer FE can >> connect to older BE. Please let me know what you think. >> >> Well, I think it needs a good bit of cleanup before we can really get to the >> substance of the patch. 
>> >> +fe_utils \ >> interfaces \ >> backend/replication/libpqwalreceiver \ >> backend/replication/pgoutput \ >> -fe_utils \ >> >> Useless change, omit. >> >> +if (whereToSendOutput != DestRemote || >> +PG_PROTOCOL_MAJOR(FrontendProtocol) < 3) >> +return -1; >> + >> +int sendStatus = 0; >> >> Won't compile on older compilers. We generally aim for C89 compliance, with >> a few exceptions for newer features. >> >> Also, why initialize sendStatus and then overwrite the value in the very next >> line of code? >> >> Also, the PG_PROTOCOL_MAJOR check here seems to be redundant with the >> one in the caller. >> >> +/* NegotiateServerProtocol packet structure >> + * >> + * [ 'Y' | msgLength | min_version | max_version | param_list_len >> | list of param names ] >> + */ >> + >> >> Please pgindent your patches. I suspect you'll find this gets garbled. >> >> Is there really any reason to separate NegotiateServerProtocol and >> ServerProtocolVersion into separate functions? >> >> -libpq = -L$(libpq_builddir) -lpq >> +libpq = -L$(libpq_builddir) -lpq -L$(top_builddir)/src/common >> -lpgcommon -L$(top_builddir)/src/fe_utils -lpgfeutils >> +$libpq->AddReference($libpgcommon, $libpgfeutils, $libpgport); >> >> I haven't done any research to try to figure out why you did this, but I >> don't >> think these are likely to be acceptable changes. >> >> SendServerProtocolVersionMessage should be adjusted to use the new >> facilities added by commit 1de09ad8eb1fa673ee7899d6dfbb2b49ba204818. >> >> -/* Check we can handle the protocol the frontend is using. 
*/ >> - >> -if (PG_PROTOCOL_MAJOR(proto) < >> PG_PROTOCOL_MAJOR(PG_PROTOCOL_EARLIEST) || >> -PG_PROTOCOL_MAJOR(proto) > >> PG_PROTOCOL_MAJOR(PG_PROTOCOL_LATEST) || >> -(PG_PROTOCOL_MAJOR(proto) == >> PG_PROTOCOL_MAJOR(PG_PROTOCOL_LATEST) && >> - PG_PROTOCOL_MINOR(proto) > >> PG_PROTOCOL_MINOR(PG_PROTOCOL_LATEST))) >> -ereport(FATAL, >> -(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >> - errmsg("unsupported frontend protocol %u.%u: server >> supports %u.0 to %u.%u", >> -PG_PROTOCOL_MAJOR(proto), PG_PROTOCOL_MINOR(proto), >> -PG_PROTOCOL_MAJOR(PG_PROTOCOL_EARLIEST), >> -PG_PROTOCOL_MAJOR(PG_PROTOCOL_LATEST), >> -PG_PROTOCOL_MINOR(PG_PROTOCOL_LATEST)))); >> >> The way you've arranged things here looks like it'll cause us to accept >> connections even for protocol versions 4.x or higher; I don't think we want >> that. I suggest checking the major version number at this point in the code; >> then, the code path for version 3+ needs no additional check and the code >> path for version 2 can enforce 2.0. >> >> +bool >> +is_optional(const char *guc_name) >> +{ >> +const char *optionalPrefix = "_pq_"; >> +bool isOptional = false; >> + >> +/* "_pq_" must be a proper
Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA
Hi, On 2017-09-08 22:35:39 +0300, Sokolov Yura wrote: > From 722a8bed17f82738135264212dde7170b8c0f397 Mon Sep 17 00:00:00 2001 > From: Sokolov Yura > Date: Mon, 29 May 2017 09:25:41 + > Subject: [PATCH 1/6] More correct use of spinlock inside LWLockWaitListLock. > > SpinDelayStatus should be function global to count iterations correctly, > and produce more correct delays. > > Also if spin didn't spin, do not count it in spins_per_delay correction. > It wasn't necessary before because delayStatus was used only in contended > cases. I'm not convinced this is beneficial. Adds a bit of overhead to the case where LW_FLAG_LOCKED can be set immediately. OTOH, it's not super likely to make a large difference if the lock has to be taken anyway... > + > +/* just shortcut to not declare lwlock_stats variable at the end of function > */ > +static void > +add_lwlock_stats_spin_stat(LWLock *lock, SpinDelayStatus *delayStatus) > +{ > + lwlock_stats *lwstats; > + > + lwstats = get_lwlock_stats_entry(lock); > + lwstats->spin_delay_count += delayStatus->delays; > +} > #endif /* LWLOCK_STATS > */ This seems like a pretty independent change. > /* > * Internal function that tries to atomically acquire the lwlock in the > passed > - * in mode. > + * in mode. If it could not grab the lock, it doesn't puts proc into wait > + * queue. > * > - * This function will not block waiting for a lock to become free - that's > the > - * callers job. > + * It is used only in LWLockConditionalAcquire. > * > - * Returns true if the lock isn't free and we need to wait. > + * Returns true if the lock isn't free. > */ > static bool > -LWLockAttemptLock(LWLock *lock, LWLockMode mode) > +LWLockAttemptLockOnce(LWLock *lock, LWLockMode mode) This now has become a fairly special cased function; I'm not convinced it makes much sense with the current naming and functionality.
> +/* > + * Internal function that tries to atomically acquire the lwlock in the > passed > + * in mode or put it self into waiting queue with waitmode. > + * This function will not block waiting for a lock to become free - that's > the > + * callers job. > + * > + * Returns true if the lock isn't free and we are in a wait queue. > + */ > +static inline bool > +LWLockAttemptLockOrQueueSelf(LWLock *lock, LWLockMode mode, LWLockMode > waitmode) > +{ > + uint32 old_state; > + SpinDelayStatus delayStatus; > + boollock_free; > + uint32 mask, > + add; > + > + AssertArg(mode == LW_EXCLUSIVE || mode == LW_SHARED); > + > + if (mode == LW_EXCLUSIVE) > + { > + mask = LW_LOCK_MASK; > + add = LW_VAL_EXCLUSIVE; > + } > + else > + { > + mask = LW_VAL_EXCLUSIVE; > + add = LW_VAL_SHARED; > + } > + > + init_local_spin_delay(&delayStatus); The way you moved this around has the disadvantage that we now do this - a number of writes - even in the very common case where the lwlock can be acquired directly. > + /* > + * Read once outside the loop. Later iterations will get the newer value > + * either via compare & exchange or with read after perform_spin_delay. > + */ > + old_state = pg_atomic_read_u32(&lock->state); > + /* loop until we've determined whether we could acquire the lock or not > */ > + while (true) > + { > + uint32 desired_state; > + > + desired_state = old_state; > + > + lock_free = (old_state & mask) == 0; > + if (lock_free) > { > - if (lock_free) > + desired_state += add; > + if (pg_atomic_compare_exchange_u32(&lock->state, > + >&old_state, desired_state)) > { > /* Great! Got the lock. 
*/ > #ifdef LOCK_DEBUG > if (mode == LW_EXCLUSIVE) > lock->owner = MyProc; > #endif > - return false; > + break; > + } > + } > + else if ((old_state & LW_FLAG_LOCKED) == 0) > + { > + desired_state |= LW_FLAG_LOCKED | LW_FLAG_HAS_WAITERS; > + if (pg_atomic_compare_exchange_u32(&lock->state, > + >&old_state, desired_state)) > + { > + LWLockQueueSelfLocked(lock, waitmode); > + break; > } > - else > - return true;/* somebody else has the lock */ > + } > + else > + { > +
Re: [HACKERS] 64-bit queryId?
On Thu, Oct 19, 2017 at 1:08 AM, Michael Paquier wrote: > On Thu, Oct 19, 2017 at 4:12 AM, Robert Haas wrote: >> On Wed, Oct 18, 2017 at 9:20 AM, Julien Rouhaud wrote: >>> With the current pgssHashKey definition, there shouldn't be padding bits, >>> so it should be safe. But I wonder if adding an explicit memset() of >>> the key in pgss_store() could prevent extension authors from having >>> duplicate entries if they rely on this code, or prevent future issues >>> in the unlikely case of adding other fields to pgssHashKey. >> >> I guess we should probably add an additional comment to the definition of >> pgssHashKey warning of the danger. I'm OK with adding a memset if >> somebody can promise me it will get optimized away by all reasonably >> commonly-used compilers, but I'm not that keen on adding more cycles >> to protect against a hypothetical danger. > > A comment is a suitable answer for me too. I agree, and I'm perfectly fine with adding a comment around pgssHashKey. PFA a patch to warn about the danger. diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index b04b4d6ce1..829ee69f51 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -124,7 +124,10 @@ typedef enum pgssVersion /* * Hashtable key that defines the identity of a hashtable entry. We separate - * queries by user and by database even if they are otherwise identical. + * queries by user and by database even if they are otherwise identical. Be + * careful when adding new fields, tag_hash() is used to compute the hash key, + * so we rely on the fact that no padding bit is present in this structure. + * Otherwise, we'd have to zero the key variable in pgss_store. */ typedef struct pgssHashKey { -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers