Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-10-18 Thread Masahiko Sawada
On Wed, Oct 18, 2017 at 5:32 AM, Fabien COELHO  wrote:
>
> Hello Masahiko-san,
>
>>> Attached the updated version patch.
>>
>>
>> Applies, compiles, make check & tap test ok, doc is fine.
>>
>> All is well for me: the feature is useful, code is simple and clean, it is
>> easy to invoke, easy to extend as well, which I'm planning to do once it
>> gets in.
>>
>> I switched the patch to "Ready for Committers". No doubt they will have
>> their own opinions about it. Let's wait and see.
>
>
> The patch needs a rebase in the documentation because of the xml-ilization
> of the sgml doc.
>

Thank you for the notification! Attached rebased patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v16.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

2017-10-18 Thread Alvaro Herrera
Amit Langote wrote:
> On 2017/10/18 1:52, Alvaro Herrera wrote:
> > Alvaro Herrera wrote:
> >> Robert Haas wrote:
> >>> Implement table partitioning.
> >>
> >> Is it intentional that you can use ALTER TABLE OWNER TO on the parent
> >> table, and that this does not recurse to modify the partitions' owners?
> >> This doesn't seem to be mentioned in comments nor documentation, so it
> >> seems an oversight to me.
> 
> Hmm, I would say of it that the new partitioning didn't modify the
> behavior that existed for inheritance.
> 
> That said, I'm not sure if the lack of recursive application of ownership
> change to descendant tables is unintentional.

My view is that the fact that partitioning uses inheritance is just an
implementation detail.  We shouldn't let historical behavior for
inheritance dictate behavior for partitioning.  Inheritance has many
undesirable warts.

> > The alter table docs say that ONLY must be specified if one does not
> > want to modify descendants, so I think this is a bug.
> 
> Just to clarify, if we do think of it as a bug, then it will apply to the
> inheritance case as well, right?

I'd leave it alone.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Is anything preventing us from allowing write to foreign tables from standby?

2017-10-18 Thread Wolfgang Wilhelm
Hi guys,
please help me to understand the proposal.
Take a simple configuration: Two "live" systems S1 and S2 and for each of them 
a Replica R1 and R2. So S1 sends data to R1 and S2 to R2.
S1 has foreign tables on S2 with write access, meaning you can change a few 
data from S1 where information is actually in the foreign table sitting as real 
table on S2.

So what does the system after a change and committing it do? S1 would write to 
S2, R1 to R2. Assuming that I'd switch off replication from S2 to R2 everything 
would be fine. That is what you want, don't you?What would happen when the DBA 
forgets to switch of the replication from S2 to R2 in your scenario? A higher 
workload?

What happens when R2 fails? In the S2 -> R2 configuration the changes of S2 are 
saved until R2 is up again, isn't it?What would happen in the case that R1 
should write to R2? Is there a write or does it fail because the foreign table 
on R2 isn't available?

Regards,Wolfgang 

Michael Paquier  schrieb am 3:49 Mittwoch, 
18.Oktober 2017:
 

 On Wed, Oct 18, 2017 at 9:14 AM, Craig Ringer  wrote:
> Superficially at least, it sounds like a good idea.

Indeed.

> We should only need a virtual xid when we're working with foreign
> tables since we don't do any local heap changes.
>
> How's it work with savepoints?

That's one thing to worry about.

At least to me, it feels like cheating to allow an INSERT query to
happen for a transaction which is read-only actually read-only because
XactReadOnly is set to true when the transaction starts. I am
wondering if we should extend BEGIN TRANSACTION with a sort of "WRITE
ONLY FOREIGN" mode, which allows read queries as well as write queries
for foreign tables, because we know that those will not generate WAL
locally. This way it would be possible to block as well INSERT queries
happening in a transaction which should be intrinsically read-only.

+        if (rte->relkind == 'f')
+            continue;
Better to use RELKIND_FOREIGN_TABLE here.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


   

Re: [HACKERS] v10 telease note for pg_basebackup refers to old --xlog-method argument

2017-10-18 Thread Alvaro Herrera
Pushed.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] alter table doc fix

2017-10-18 Thread Alvaro Herrera
Amit Langote wrote:
> Hi.
> 
> Noticed that a alter table sub-command's name in Description (where it's
> OWNER) differs from that in synopsis (where it's OWNER TO).  Attached
> patch to make them match, if the difference is unintentional.

I agree -- pushed.

This paragraph

   
The actions for identity columns (ADD
GENERATED, SET etc., DROP
IDENTITY), as well as the actions
TRIGGER, CLUSTER, 
OWNER,
and TABLESPACE never recurse to descendant tables;
that is, they always act as though ONLY were specified.
Adding a constraint recurses only for CHECK constraints
that are not marked NO INHERIT.
   

is a bit annoying, though I think it'd be worse if we "fix" it to be
completely strict about the subcommands it refers to.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 64-bit queryId?

2017-10-18 Thread Julien Rouhaud
On Thu, Oct 12, 2017 at 2:46 AM, Michael Paquier
 wrote:
> On Thu, Oct 12, 2017 at 9:05 AM, Robert Haas  wrote:
>> On Wed, Oct 4, 2017 at 9:00 PM, Michael Paquier
>>  wrote:
>>> v4 looks correct to me. Testing it through (pgbench and some custom
>>> queries) I have not spotted issues. If the final decision is to use
>>> 64-bit query IDs, then this patch could be pushed.
>>
>> Cool.  Committed, thanks for the review.
>
> The final result looks fine for me. Thanks.

Sorry for replying so late, but I have a perhaps naive question about
the hashtable handling with this new version.

IIUC, the shared hash table is now created with HASH_BLOBS instead of
HASH_FUNCTION, so since sizeof(pgssHashKey) != sizeof(uint32) the hash
table will use tag_hash() to compute the hash key.

tag_hash() uses all the bits present in the given struct, so this can
be problematic if padding bits are not zeroed, which isn't garanted by
C standard for local variable.

WIth current pgssHashKey definition, there shouldn't be padding bits,
so it should be safe.  But I wonder if adding an explicit memset() of
the key in pgss_store() could avoid extension authors to have
duplicate entries if they rely on this code, or prevent future issue
in the unlikely case of adding other fields to pgssHashKey.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

2017-10-18 Thread Alvaro Herrera
This check is odd (tablecmds.c ATExecAttachPartition line 13861):

/* Temp parent cannot have a partition that is itself not a temp */
if (rel->rd_rel->relpersistence == RELPERSISTENCE_TEMP &&
attachrel->rd_rel->relpersistence != RELPERSISTENCE_TEMP)
ereport(ERROR,
(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 errmsg("cannot attach a permanent relation as partition of 
temporary relation \"%s\"",
RelationGetRelationName(rel;

This seems to work (i.e. it's allowed to create a temp partition on a
permanent parent and not vice-versa, which you'd think makes sense) but
it's illusory, because if two sessions try to create temp partitions for
overlapping values, the second session fails with a silly error message.
To be more precise, do this in one session:

CREATE TABLE p (a int, b int) PARTITION BY RANGE (a);
CREATE TEMP TABLE p1 PARTITION OF p FOR VALUES FROM (0) TO (10);

then without closing that one, in another session repeat the second
command:

alvherre=# CREATE TEMP TABLE p1 PARTITION OF p FOR VALUES FROM (0) TO (10);
ERROR:  partition "p1" would overlap partition "p1"

which is not what I would have expected.

Maybe there are combinations of different persistence values that can be
allowed to differ (an unlogged partition is probably OK with a permanent
parent), but I don't think the current check is good enough.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

2017-10-18 Thread Robert Haas
On Wed, Oct 18, 2017 at 4:53 AM, Alvaro Herrera  wrote:
> My view is that the fact that partitioning uses inheritance is just an
> implementation detail.  We shouldn't let historical behavior for
> inheritance dictate behavior for partitioning.  Inheritance has many
> undesirable warts.

I am OK with filing down warts over time, but to be clear, I think
it's too late to change things like this in v10, which has shipped.

>> > The alter table docs say that ONLY must be specified if one does not
>> > want to modify descendants, so I think this is a bug.
>>
>> Just to clarify, if we do think of it as a bug, then it will apply to the
>> inheritance case as well, right?
>
> I'd leave it alone.

I don't think it's a good idea for table partitioning and table
inheritance to behave differently.  Generally, I think we don't want
to end up with three categories of behavior: commands that recurse
always, commands that recurse to partitions but not inheritance
children, and commands that don't recurse.  If we now think that ALTER
TABLE .. OWNER TO should recurse, then we should change that to do so
in all cases and document it as a backward incompatibility.

I think this issue has very little to do with table partitioning per
se.  As Amit says, this is a longstanding behavior and it would have
been far stranger than where we are if the commit to implement table
partitioning had changed it.  I suggest starting new threads for
changes you want to make instead of tacking them all onto this one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

2017-10-18 Thread Tom Lane
Robert Haas  writes:
> On Wed, Oct 18, 2017 at 4:53 AM, Alvaro Herrera  
> wrote:
>> My view is that the fact that partitioning uses inheritance is just an
>> implementation detail.  We shouldn't let historical behavior for
>> inheritance dictate behavior for partitioning.  Inheritance has many
>> undesirable warts.

> I don't think it's a good idea for table partitioning and table
> inheritance to behave differently.

I'm with Robert on this one.  I'd be in favor of changing both cases
to make ALTER OWNER recurse by default.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Justin Pryzby
On Tue, Oct 17, 2017 at 09:07:40AM -0500, Justin Pryzby wrote:
> On Tue, Oct 17, 2017 at 09:34:24AM -0400, Tom Lane wrote:
> > Justin Pryzby  writes:
> > > On Tue, Oct 17, 2017 at 12:59:16PM +0200, Alvaro Herrera wrote:
> > >> Anyway, can give this patch a try?
> > 
> > The trick in this sort of situation is to make sure you build binaries
> > that match your existing install in every way except having the added
> > patch, and maybe getting installed into a different directory.
>
> I'm familiar with that process; but, these are PG10 binaries from PGDG for
> centos6 x64.  I had to add symlinks for postgis library, but otherwise seems 
> to
> be working fine (although I didn't preserve as many configure options as your
> message would suggest I should have).

On Tue, Oct 17, 2017 at 12:49:55PM -0400, Tom Lane wrote:
> So what I'm thinking is that you need an error during perform_work_item,
> and/or more than one work_item picked up in the calling loop, to make this
> bug manifest.  You would need to enter perform_work_item in a

..in our case probably due to interruption by LOCK TABLE, yes?

On Tue, Oct 17, 2017 at 12:49:55PM -0400, Tom Lane wrote:
> Alvaro Herrera  writes:
> > And I think that's because we're not
> > checking that the namespace OID is a valid value before calling
> > get_namespace_name on it.
> 
> The part of your patch that adds a check on avw_database is clearly
> correct and necessary.  I'm thinking the change you propose in
> perform_work_item is just overcomplicating code that's okay as it
> stands.  We don't need to optimize for the schema-went-away case.

No crashes in ~28hr.  It occurs to me that it's a weaker test due to not
preserving most compilation options.  If I understand, our crash isn't
explained by the avw_database test anyway (?)

Should I make clean and recompile with all non-prefix options and a minimal
patch (avw_database==MyDatabaseId || continue) ?

Or recompile with existing options but no patch to first verify crash occurs
with locally compiled binary?

Justin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Alvaro Herrera
Justin Pryzby wrote:

> No crashes in ~28hr.  It occurs to me that it's a weaker test due to not
> preserving most compilation options.

And the previous code crashes in 45 minutes?  That's solid enough for
me; I'll clean up the patch and push in the next few days.  I think what
you have now should be sufficient for the time being for your production
system.

> If I understand, our crash isn't explained by the avw_database test
> anyway (?)

I don't see why you would think that -- I disagree.

> Should I make clean and recompile with all non-prefix options and a minimal
> patch (avw_database==MyDatabaseId || continue) ?
> 
> Or recompile with existing options but no patch to first verify crash occurs
> with locally compiled binary?

I don't think either of these actions is necessary.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

2017-10-18 Thread Robert Haas
On Wed, Oct 18, 2017 at 11:27 AM, Alvaro Herrera
 wrote:
> Maybe there are combinations of different persistence values that can be
> allowed to differ (an unlogged partition is probably OK with a permanent
> parent), but I don't think the current check is good enough.

This is also a sort of long-standing historical problem, I think.
This comment in expand_inherited_rtentry has been with us for a while:

/*
 * It is possible that the parent table has children that are temp
 * tables of other backends.  We cannot safely access such tables
 * (because of buffering issues), and the best thing to do seems to be
 * to silently ignore them.
 */

I do not find that to be a particularly sensible approach, and it's
probably even less sensible in the partitioning world than it was with
table inheritance.  I think what we ought to do is prohibit it, but it
wasn't the job of the table partitioning commit to whack this around.

This is not the first example of a case where we've failed to put in
place sufficiently-strong checks to prohibit references to temp tables
in places where they don't make sense.  See e.g.
16925c1e1fa236e4d7d6c8b571890e7c777f75d7,
948d6ec90fd35d6e1a59d0b1af8d6efd8442f0ad,
0688d84041f7963a2a00468c53aec7bb6051ef5c,
a13b01853084b6c6f9c34944bc19b3dd7dc4ceb2,
5fa3418304b06967fa54052b183bf96e1193d85e,
7d6e6e2e9732adfb6a93711ca6a6e42ba039fbdb,
82eed4dba254b8fda71d429b29d222ffb4e93fca.  We really did not do a good
job plugging all of the cases where temp tables ought to be forbidden
when that feature went in, and IMHO this is more of the tail end of
that work than anything specific to partitioning.  Your opinion may
differ, of course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Justin Pryzby
On Wed, Oct 18, 2017 at 06:54:09PM +0200, Alvaro Herrera wrote:
> Justin Pryzby wrote:
> 
> > No crashes in ~28hr.  It occurs to me that it's a weaker test due to not
> > preserving most compilation options.
> 
> And the previous code crashes in 45 minutes?  That's solid enough for
> me; I'll clean up the patch and push in the next few days.  I think what
> you have now should be sufficient for the time being for your production
> system.

No - the crash happened 4 times since adding BRIN+autosummarize 6 days ago, and
in once instance occured twice within 3 hours (while I was trying to query logs
for the preceding crash).

[pryzbyj@database ~]$ sudo grep -hE 'in postgres|Saved core' /var/log/messages*
Oct 13 17:22:45 database kernel: postmaster[32127] general protection ip:4bd467 
sp:7ffd9b349990 error:0 in postgres[40+692000]
Oct 13 17:22:47 database abrt[32387]: Saved core dump of pid 32127 
(/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-13-17:22:47-32127 
(15040512 bytes)
Oct 14 18:05:35 database kernel: postmaster[26500] general protection ip:84a177 
sp:7ffd9b349b88 error:0 in postgres[40+692000]
Oct 14 18:05:35 database abrt[27564]: Saved core dump of pid 26500 
(/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-14-18:05:35-26500 
(24137728 bytes)
Oct 16 23:21:22 database kernel: postmaster[31543] general protection ip:4bd467 
sp:7ffe08a94890 error:0 in postgres[40+692000]
Oct 16 23:21:22 database abrt[570]: Saved core dump of pid 31543 
(/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-16-23:21:22-31543 
(25133056 bytes)
Oct 17 01:58:36 database kernel: postmaster[8646]: segfault at 8 ip 
0084a177 sp 7ffe08a94a88 error 4 in postgres[40+692000]
Oct 17 01:58:38 database abrt[9192]: Saved core dump of pid 8646 
(/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-17-01:58:38-8646 
(7692288 bytes)

> > If I understand, our crash isn't explained by the avw_database test
> > anyway (?)
> 
> I don't see why you would think that -- I disagree.

No problem - apparently I read too far into Tom's thoughts regarding memory
context.

I'll continue runnning with the existing patch and come back if the issue
recurs.

Thanks
Justin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement

2017-10-18 Thread Robert Haas
On Tue, Aug 12, 2014 at 1:52 PM, Heikki Linnakangas
 wrote:
> On 08/06/2014 08:37 PM, Jeff Janes wrote:
>>
>> But now it looks like 0002 needs a rebase
>
> I've committed the refactoring patch, and here's a rebased and improved
> version of the Windows SChannel implementation over that.
>
> Server-side support is now implemented too, but it's all very crude and
> work-in-progress. CRLs are not supported, intermediary CAs are not
> supported, and probably many other bells and whistles are missing too. But
> the basics work, including cert authentication. Consider this a Proof of
> Concept.

Heikki, do you have any plans to work more on this?

Or does anyone else?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

2017-10-18 Thread Alvaro Herrera
Robert Haas wrote:
> On Wed, Oct 18, 2017 at 11:27 AM, Alvaro Herrera
>  wrote:
> > Maybe there are combinations of different persistence values that can be
> > allowed to differ (an unlogged partition is probably OK with a permanent
> > parent), but I don't think the current check is good enough.
> 
> This is also a sort of long-standing historical problem, I think.

Sure.

Actually, the code I'm calling attention to is ATExecAttachPartition()
which was specifically written for partitioning.  Looks like it was copied
verbatim from ATExecAddInherit, but there's no shared code there AFAICS.

I'm okay with prohibiting the case of different persistence values as
you suggest.  And I do suggest to back-patch that prohibition to pg10.

Let me add that I'm not looking to blame anyone for what I report here.
I'm very excited about the partitioning stuff and I'm happy of what was
done for pg10.  I'm now working on more partitioning-related changes
which means I review the existing code as I go along, so I just report
things that look wrong to me as I discover them, just with an interest
in seeing them fixed, or documented, or at least discussed and
explicitly agreed upon.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Alvaro Herrera
Justin Pryzby wrote:
> On Wed, Oct 18, 2017 at 06:54:09PM +0200, Alvaro Herrera wrote:

> > And the previous code crashes in 45 minutes?  That's solid enough for
> > me; I'll clean up the patch and push in the next few days.  I think what
> > you have now should be sufficient for the time being for your production
> > system.
> 
> No - the crash happened 4 times since adding BRIN+autosummarize 6 days ago, 
> and
> in once instance occured twice within 3 hours (while I was trying to query 
> logs
> for the preceding crash).

Oh, okay.  Then we don't know enough yet, ISTM.

> [pryzbyj@database ~]$ sudo grep -hE 'in postgres|Saved core' 
> /var/log/messages*
> Oct 13 17:22:45 database kernel: postmaster[32127] general protection 
> ip:4bd467 sp:7ffd9b349990 error:0 in postgres[40+692000]
> Oct 13 17:22:47 database abrt[32387]: Saved core dump of pid 32127 
> (/usr/pgsql-10/bin/postgres) to 
> /var/spool/abrt/ccpp-2017-10-13-17:22:47-32127 (15040512 bytes)
> Oct 14 18:05:35 database kernel: postmaster[26500] general protection 
> ip:84a177 sp:7ffd9b349b88 error:0 in postgres[40+692000]
> Oct 14 18:05:35 database abrt[27564]: Saved core dump of pid 26500 
> (/usr/pgsql-10/bin/postgres) to 
> /var/spool/abrt/ccpp-2017-10-14-18:05:35-26500 (24137728 bytes)
> Oct 16 23:21:22 database kernel: postmaster[31543] general protection 
> ip:4bd467 sp:7ffe08a94890 error:0 in postgres[40+692000]
> Oct 16 23:21:22 database abrt[570]: Saved core dump of pid 31543 
> (/usr/pgsql-10/bin/postgres) to 
> /var/spool/abrt/ccpp-2017-10-16-23:21:22-31543 (25133056 bytes)
> Oct 17 01:58:36 database kernel: postmaster[8646]: segfault at 8 ip 
> 0084a177 sp 7ffe08a94a88 error 4 in postgres[40+692000]
> Oct 17 01:58:38 database abrt[9192]: Saved core dump of pid 8646 
> (/usr/pgsql-10/bin/postgres) to /var/spool/abrt/ccpp-2017-10-17-01:58:38-8646 
> (7692288 bytes)

Do you still have those core dumps?  If so, would you please verify the
database that autovacuum was running in?  Just open each with gdb (using
the original postgres binary, not the one you just installed) and do
"print MyDatabaseId".

> I'll continue runnning with the existing patch and come back if the issue
> recurs.

Thanks.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Justin Pryzby
On Wed, Oct 18, 2017 at 07:22:27PM +0200, Alvaro Herrera wrote:
> Do you still have those core dumps?  If so, would you please verify the
> database that autovacuum was running in?  Just open each with gdb (using
> the original postgres binary, not the one you just installed) and do
> "print MyDatabaseId".

[pryzbyj@database ~]$ gdb ccpp-2017-10-16-23:21:22-31543/coredump -ex 'p 
MyDatabaseId' -ex q 2>/dev/null |tail -5
Core was generated by `postgres: autovacuum worker process   gtt '.
Program terminated with signal 11, Segmentation fault.
#0  index_close (relation=0x324647603246466, lockmode=1) at indexam.c:178
178 LockRelId   relid = relation->rd_lockInfo.lockRelId;
$1 = 16400

[pryzbyj@database ~]$ gdb ccpp-2017-10-14-18:05:35-26500/coredump -ex 'p 
MyDatabaseId' -ex q 2>/dev/null |tail -5
Core was generated by `postgres: autovacuum worker process   gtt '.
Program terminated with signal 11, Segmentation fault.
#0  pfree (pointer=0x298c740) at mcxt.c:954
954 (*context->methods->free_p) (context, pointer);
$1 = 16400

gtt=# SELECT oid,datname FROM pg_database;
 13456 | template0
 16400 | gtt
 13457 | postgres
 1 | template1

The gtt DB is where the (only) BRIN indicies are (not sure what to conclude
from that?)

Justin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement

2017-10-18 Thread Tom Lane
Robert Haas  writes:
> Heikki, do you have any plans to work more on this?
> Or does anyone else?

FWIW, I have some interest in the Apple Secure Transport patch that
is in the CF queue, and will probably pick that up at some point if
no one beats me to it (but it's not real high on my to-do list).
I won't be touching the Windows version though.  I suspect that the
folk who might be competent to review the Windows code may have
correspondingly little interest in the macOS patch.  This is a bit
of a problem, since it would be good for someone to look at both of
them, with an eye to whether there are any places in our SSL abstraction
API that ought to be rethought now that we have actual non-OpenSSL
implementations to compare to.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SIGSEGV in BRIN autosummarize

2017-10-18 Thread Tom Lane
Alvaro Herrera  writes:
> And the previous code crashes in 45 minutes?  That's solid enough for
> me; I'll clean up the patch and push in the next few days.  I think what
> you have now should be sufficient for the time being for your production
> system.

I'm still of the opinion that the presented patch isn't preventing any
crashes.  The failure to check avw_database would generally lead to
tasks silently getting dropped in the mistaken belief that the target
table has gone away.  I could believe that a crash occurs if the given
schema OID exists in some other DB, and the given table OID does too,
but it's for a table/index of the wrong kind ... but how likely is that?
Not very likely at all, given the way we generate OIDs.

(And if it did happen, I'd say the cause is a failure to recheck relkind
somewhere downstream of the autovac task dispatch code, anyway.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Implement table partitioning.

2017-10-18 Thread Robert Haas
On Wed, Oct 18, 2017 at 1:18 PM, Alvaro Herrera  wrote:
> I'm okay with prohibiting the case of different persistence values as
> you suggest.  And I do suggest to back-patch that prohibition to pg10.

I disagree.  There's nothing any more broken about the way this works
with partitioning in v10 than the way it works with inheritance in 9.6
or prior.  Table inheritance has had warts for years, and the fact
that we now have table partitioning doesn't make all of those same
warts into must-fix-now bugs.  They are still just warts, and they
should be cleaned up through future development as we find them and
have the time to do something about them.  They should be documented
as incompatibilities where appropriate.  They should not be jammed
into stable branches because users don't like it when DDL works one
way in 10.1 and another way in 10.2.  They don't even really like it
when 10.0 works differently from 11.0, but you have to be willing to
see bad decisions revisited at some point if you want progress to
happen, and I certainly do.

> Let me add that I'm not looking to blame anyone for what I report here.
> I'm very excited about the partitioning stuff and I'm happy of what was
> done for pg10.  I'm now working on more partitioning-related changes
> which means I review the existing code as I go along, so I just report
> things that look wrong to me as I discover them, just with an interest
> in seeing them fixed, or documented, or at least discussed and
> explicitly agreed upon.

Fair enough, but when you reply on the thread where I committed the
patch and propose back-patching to the release that contained it, you
make it sound like it's a bug in the patch, and I don't think either
of the two things you just raised are.  My complaint is not that I
think you are accusing me of any sort of wrongdoing but that you're
trying to justify back-patching what I think is new development by
characterizing it as a bug fix.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement

2017-10-18 Thread Robert Haas
On Wed, Oct 18, 2017 at 2:50 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Heikki, do you have any plans to work more on this?
>> Or does anyone else?
>
> FWIW, I have some interest in the Apple Secure Transport patch that
> is in the CF queue, and will probably pick that up at some point if
> no one beats me to it (but it's not real high on my to-do list).
> I won't be touching the Windows version though.  I suspect that the
> folk who might be competent to review the Windows code may have
> correspondingly little interest in the macOS patch.  This is a bit
> of a problem, since it would be good for someone to look at both of
> them, with an eye to whether there are any places in our SSL abstraction
> API that ought to be rethought now that we have actual non-OpenSSL
> implementations to compare to.

Well, the best way to handle that might be to get some of this stuff
done before we get too much later into the release cycle, so that
there's time to tinker with it before the release goes out the door
(or is deep in beta).  However, if nobody's working on this and the
other patch is someplace far down your to-do list, then I guess that
isn't going to happen.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 64-bit queryId?

2017-10-18 Thread Robert Haas
On Wed, Oct 18, 2017 at 9:20 AM, Julien Rouhaud  wrote:
> Sorry for replying so late, but I have a perhaps naive question about
> the hashtable handling with this new version.
>
> IIUC, the shared hash table is now created with HASH_BLOBS instead of
> HASH_FUNCTION, so since sizeof(pgssHashKey) != sizeof(uint32) the hash
> table will use tag_hash() to compute the hash key.
>
> tag_hash() uses all the bits present in the given struct, so this can
> be problematic if padding bits are not zeroed, which isn't garanted by
> C standard for local variable.
>
> WIth current pgssHashKey definition, there shouldn't be padding bits,
> so it should be safe.  But I wonder if adding an explicit memset() of
> the key in pgss_store() could avoid extension authors to have
> duplicate entries if they rely on this code, or prevent future issue
> in the unlikely case of adding other fields to pgssHashKey.

I guess we should probably add additional comment to the definition of
pgssHashKey warning of the danger.  I'm OK with adding a memset if
somebody can promise me it will get optimized away by all reasonably
commonly-used compilers, but I'm not that keen on adding more cycles
to protect against a hypothetical danger.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead bug (Was: amcheck (B-Tree integrity checking tool))

2017-10-18 Thread Peter Geoghegan
On Mon, Oct 16, 2017 at 8:09 PM, Noah Misch  wrote:
> That presupposes construction of two pieces of software, the server and the
> checker, such that every disagreement is a bug in the server.  But checkers
> get bugs just like servers get bugs.

You make a good point, which is that *some* code must be wrong when an
error is raised and hardware is not to blame, but ISTM that the nuance
of that really matters. The checker seems much less likely to be where
bugs are, for three reasons:

* There is far less code for us to maintain as compared to the volume
of backend code that is effectively tested (again, not including the
hidden universe of complex, unauditable firmware code that could be
involved these days).

* Much of the actual checking (as much as possible) is outsourced to
core code that is already critically important. If that has bugs in
it, then it is unlikely to be defined as an amcheck bug.

* Knowing all this, we can go out of our way to do a good job of
getting the design right the first time. (A sound design is far more
important than actually having zero bugs.)

Obviously there could be unambiguous bugs; I'm not arguing otherwise.
I just hope that we can push this model as far as we need to, and
perhaps accommodate verifiability as a goal for *future* development
projects. We're almost doing that today; debuggability of on-disk
structures is something that the community already values. This is the
logical next step, IMV.

> Checkers do provide a sort of
> double-entry bookkeeping.  When a reproducible test case prompts a checker
> complaint, we'll know *some* code is wrong.

I really like your double entry bookkeeping analogy. A tiny
discrepancy will bubble up, even in a huge organization, and yet the
underlying principles are broad and not all that complicated.

> That's an admirable contribution.

Thank you. I just hope that it becomes something that other
contributors have some sense of ownership over.

> I'm essentially saying that the server is innocent until proven guilty.  It
> would be cool to have a self-contained specification of PostgreSQL data files,
> but where the server disagrees with the spec without causing problem
> behaviors, we'd ultimately update the spec to fit the server.

I might not have done a good job of explaining my position. I agree
with everything you say here. I would like to see amcheck become a
kind of vehicle for discussing things that we already discuss. You get
a nice tool at the end, that clarifies and increases confidence in the
original understanding over time (or acts as a canary-in-the-coalmine
forcing function when the original understanding turns out to be
faulty). The tool itself is ultimately just a bonus.

Bringing it back to the concrete freeze-the-dead issue, and the
question of an XID-cutoff for safely interrogating CLOG: I guess it
will be possible to assess a HOT chain as a whole. We can probably do
this safely while holding a super-exclusive lock on the buffer. I can
probably find a way to ensure this only needs to happen in a rare slow
path, when it looks like the invariant might be violated but we need
to make sure (I'm already following this pattern in a couple of
places). Realistically, there will be some amount of "try it and see"
here.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] Faster processing at Gather node

2017-10-18 Thread Robert Haas
On Tue, Oct 17, 2017 at 5:39 PM, Andres Freund  wrote:
> The precise query doesn't really matter, the observations here are more
> general, I hope.
>
> 1) nodeGather.c re-projects every row from workers. As far as I can tell
>that's now always exactly the same targetlist as it's coming from the
>worker. Projection capability was added in 8538a6307049590 (without
>checking whether it's needed afaict), but I think it in turn often
>obsoleted by 992b5ba30dcafdc222341505b072a6b009b248a7.  My
>measurement shows that removing the projection yields quite massive
>speedups in queries like yours, which is not too surprising.

That seems like an easy and worthwhile optimization.

>I suspect this just needs a tlist_matches_tupdesc check + an if
>around ExecProject(). And a test, right now tests pass unless
>force_parallel_mode is used even if just commenting out the
>projection unconditionally.

So, for this to fail, we'd need a query that uses parallelism but
where the target list contains a parallel-restricted function.  Also,
the function should really be such that we'll reliably get a failure
rather than only with some small probability.  I'm not quite sure how
to put together such a test case, but there's probably some way.

> 2) The spinlocks both on the the sending and receiving side a quite hot:
>
>rafia query leader:
> +   36.16%  postgres  postgres[.] shm_mq_receive
> +   19.49%  postgres  postgres[.] s_lock
> +   13.24%  postgres  postgres[.] SetLatch
>
>To test that theory, here are the timings for
>1) spinlocks present
>   time: 6593.045
>2) spinlocks acuisition replaced by *full* memory barriers, which on x86 
> is a lock; addl $0,0(%%rsp)
>   time: 5159.306
>3) spinlocks replaced by read/write barriers as appropriate.
>   time: 4687.610
>
>By my understanding of shm_mq.c's logic, something like 3) aught to
>be doable in a correct manner. There should be, in normal
>circumstances, only be one end modifying each of the protected
>variables. Doing so instead of using full block atomics with locked
>instructions is very likely to yield considerably better performance.

I think the sticking point here will be the handling of the
mq_detached flag.  I feel like I fixed a bug at some point where this
had to be checked or set under the lock at the same time we were
checking or setting mq_bytes_read and/or mq_bytes_written, but I don't
remember the details.  I like the idea, though.

Not sure what happened to #3 on your list... you went from #2 to #4.

> 4) Modulo computations in shm_mq are expensive:
>
>that we end up with a full blown div makes sense - the compiler can't
>know anything about ringsize, therefore it can't optimize to a mask.
>I think we should force the size of the ringbuffer to be a power of
>two, and use a maks instead of a size for the buffer.

This seems like it would require some redesign.  Right now we let the
caller choose any size they want and subtract off our header size to
find the actual queue size.  We can waste up to MAXALIGN-1 bytes, but
that's not much.  This would waste up to half the bytes provided,
which is probably not cool.

> 5) There's a *lot* of latch interactions. The biggest issue actually is
>the memory barrier implied by a SetLatch, waiting for the latch
>barely shows up.
>
>Commenting out the memory barrier - which is NOT CORRECT - improves
>timing:
>before: 4675.626
>after: 4125.587
>
>The correct fix obviously is not to break latch correctness. I think
>the big problem here is that we perform a SetLatch() for every read
>from the latch.

I think it's a little bit of an overstatement to say that commenting
out the memory barrier is not correct.  When we added that code, we
removed this comment:

- * Presently, when using a shared latch for interprocess signalling, the
- * flag variable(s) set by senders and inspected by the wait loop must
- * be protected by spinlocks or LWLocks, else it is possible to miss events
- * on machines with weak memory ordering (such as PPC).  This restriction
- * will be lifted in future by inserting suitable memory barriers into
- * SetLatch and ResetLatch.

It seems to me that in any case where the data is protected by an
LWLock, the barriers we've added to SetLatch and ResetLatch are
redundant.  I'm not sure if that's entirely true in the spinlock case,
because S_UNLOCK() is only documented to have release semantics, so
maybe the load of latch->is_set could be speculated before the lock is
dropped.  But I do wonder if we're just multiplying barriers endlessly
instead of trying to think of ways to minimize them (e.g. have a
variant of SpinLockRelease that acts as a full barrier instead of a
release barrier, and then avoid a second barrier when checking the
latch state).

All that having been said, a batch variant for reading tuples in bulk
might make sense.  

Re: [HACKERS] [GENERAL] huge RAM use in multi-command ALTER of table heirarchy

2017-10-18 Thread Justin Pryzby
Hi,

I just ran into this again in another context (see original dicussion, quoted
below).

Some time ago, while initially introducting non-default stats target, I set our
non-filtering columns to "STATISTICS 10", lower than default, to minimize the
size of pg_statistic, which (at least at one point) I perceived to have become
bloated and causing issue (partially due to having an excessive number of
"daily" granularity partitions, a problem I've since mitigated).

The large number of columns with non-default stats target was (I think) causing
pg_dump --section=pre-data to take 10+ minutes, which makes pg_upgrade more
disruptive than necessary, so now I'm going back and fixing it.

[pryzbyj@database ~]$ time sed '/SET STATISTICS 10;$/!d; s//SET STATISTICS 
-1;/' /srv/cdrperfbackup/ts/2017-10-17/pg_dump-section\=pre-data |psql -1q ts
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost

[pryzbyj@database ~]$ dmesg |tail -n2
Out of memory: Kill process 6725 (postmaster) score 550 or sacrifice child
Killed process 6725, UID 26, (postmaster) total-vm:13544792kB, 
anon-rss:8977764kB, file-rss:8kB

So I'm hoping to encourage someone to commit the change contemplated earlier.

Thanks in advance.

Justin

On Tue, Jul 18, 2017 at 07:26:30PM -0400, Tom Lane wrote:
> Justin Pryzby  writes:
> > I've seen this before while doing SET STATISTICS on a larger number of 
> > columns
> > using xargs, but just came up while doing ADD of a large number of columns.
> > Seems to be roughly linear in number of children but superlinear WRT 
> > columns.
> > I think having to do with catalog update / cache invalidation with many
> > ALTERs*children*columns?
> 
> I poked into this a bit.  The operation is necessarily roughly O(N^2) in
> the number of columns, because we rebuild the affected table's relcache
> entry after each elementary ADD COLUMN operation, and one of the principal
> components of that cost is reading all the pg_attribute entries.  However,
> that should only be a time cost not a space cost.  Eventually I traced the
> O(N^2) space consumption to RememberToFreeTupleDescAtEOX, which seems to
> have been introduced in Simon's commit e5550d5fe, and which strikes me as
> a kluge of the first magnitude.  Unless I am missing something, that
> function's design concept can fairly be characterized as "let's leak
> memory like there's no tomorrow, on the off chance that somebody somewhere
> is ignoring basic coding rules".
> 
> I tried ripping that out, and the peak space consumption of your example
> (with 20 child tables and 1600 columns) decreased from ~3GB to ~200MB.
> Moreover, the system still passes make check-world, so it's not clear
> to me what excuse this code has to live.
> 
> It's probably a bit late in the v10 cycle to be taking any risks in
> this area, but I'd vote for ripping out RememberToFreeTupleDescAtEOX
> as soon as the v11 cycle opens, unless someone can show an example
> of non-broken coding that requires it.  (And if so, there ought to
> be a regression test incorporating that.)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?

2017-10-18 Thread Nico Williams
It'd be nice if SECURITY DEFINER functions could see what user invoked
them, but current_user is the DEFINER user, naturally, since that's how
this is done in fmgr_security_definer().

I was thinking that fmgr_security_definer() could keep a global pointer
to a linked list (with automatic nodes) of the save_userid values.  Then
we could have a SQL function for accessing these, something like
pg_current_user(level int) returning text, where level 0 is
current_user, level 1 is "the previous current_user in the stack", and
so on, returning null when level is beyond the top-level.

This seems like a simple, small, easy patch, and since I [think I] need
it I suspect others probably do as well.

Thoughts?

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?

2017-10-18 Thread Nico Williams
Alternatively, a way to get at the OuterUserId?  Or the outer-most
current_user in the function stack?

I should explain why I need this: for audit functionality where I want
the triggers' procedures to be SECURITY DEFINER so only they can write
to audit tables and such, but I want them to see the current_user of the
*caller*, rather than current_user being the DEFINER's name.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?

2017-10-18 Thread Pavel Stehule
2017-10-18 22:01 GMT+02:00 Nico Williams :

> It'd be nice if SECURITY DEFINER functions could see what user invoked
> them, but current_user is the DEFINER user, naturally, since that's how
> this is done in fmgr_security_definer().
>
> I was thinking that fmgr_security_definer() could keep a global pointer
> to a linked list (with automatic nodes) of the save_userid values.  Then
> we could have a SQL function for accessing these, something like
> pg_current_user(level int) returning text, where level 0 is
> current_user, level 1 is "the previous current_user in the stack", and
> so on, returning null when level is beyond the top-level.
>
> This seems like a simple, small, easy patch, and since I [think I] need
> it I suspect others probably do as well.
>
> Thoughts?
>

there is a function session_user() already

regards

Pavel


> Nico
> --
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?

2017-10-18 Thread Nico Williams
On Wed, Oct 18, 2017 at 10:15:01PM +0200, Pavel Stehule wrote:
> there is a function session_user() already

But it doesn't do this.  Are you saying that I should add a
session_user(int)?

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] Faster processing at Gather node

2017-10-18 Thread Andres Freund
Hi,

On 2017-10-18 15:46:39 -0400, Robert Haas wrote:
> > 2) The spinlocks both on the the sending and receiving side a quite hot:
> >
> >rafia query leader:
> > +   36.16%  postgres  postgres[.] shm_mq_receive
> > +   19.49%  postgres  postgres[.] s_lock
> > +   13.24%  postgres  postgres[.] SetLatch
> >
> >To test that theory, here are the timings for
> >1) spinlocks present
> >   time: 6593.045
> >2) spinlocks acuisition replaced by *full* memory barriers, which on x86 
> > is a lock; addl $0,0(%%rsp)
> >   time: 5159.306
> >3) spinlocks replaced by read/write barriers as appropriate.
> >   time: 4687.610
> >
> >By my understanding of shm_mq.c's logic, something like 3) aught to
> >be doable in a correct manner. There should be, in normal
> >circumstances, only be one end modifying each of the protected
> >variables. Doing so instead of using full block atomics with locked
> >instructions is very likely to yield considerably better performance.
> 
> I think the sticking point here will be the handling of the
> mq_detached flag.  I feel like I fixed a bug at some point where this
> had to be checked or set under the lock at the same time we were
> checking or setting mq_bytes_read and/or mq_bytes_written, but I don't
> remember the details.  I like the idea, though.

Hm. I'm a bit confused/surprised by that. What'd be the worst that can
happen if we don't immediately detect that the other side is detached?
At least if we only do so in the non-blocking paths, the worst that
seems that could happen is that we read/write at most one superflous
queue entry, rather than reporting an error? Or is the concern that
detaching might be detected *too early*, before reading the last entry
from a queue?


> Not sure what happened to #3 on your list... you went from #2 to #4.

Threes are boring.


> > 4) Modulo computations in shm_mq are expensive:
> >
> >that we end up with a full blown div makes sense - the compiler can't
> >know anything about ringsize, therefore it can't optimize to a mask.
> >I think we should force the size of the ringbuffer to be a power of
> >two, and use a maks instead of a size for the buffer.
> 
> This seems like it would require some redesign.  Right now we let the
> caller choose any size they want and subtract off our header size to
> find the actual queue size.  We can waste up to MAXALIGN-1 bytes, but
> that's not much.  This would waste up to half the bytes provided,
> which is probably not cool.

Yea, I think it'd require a shm_mq_estimate_size(Size queuesize), that
returns the next power-of-two queuesize + overhead.


> > 5) There's a *lot* of latch interactions. The biggest issue actually is
> >the memory barrier implied by a SetLatch, waiting for the latch
> >barely shows up.
> >
> >Commenting out the memory barrier - which is NOT CORRECT - improves
> >timing:
> >before: 4675.626
> >after: 4125.587
> >
> >The correct fix obviously is not to break latch correctness. I think
> >the big problem here is that we perform a SetLatch() for every read
> >from the latch.
> 
> I think it's a little bit of an overstatement to say that commenting
> out the memory barrier is not correct.  When we added that code, we
> removed this comment:
> 
> - * Presently, when using a shared latch for interprocess signalling, the
> - * flag variable(s) set by senders and inspected by the wait loop must
> - * be protected by spinlocks or LWLocks, else it is possible to miss events
> - * on machines with weak memory ordering (such as PPC).  This restriction
> - * will be lifted in future by inserting suitable memory barriers into
> - * SetLatch and ResetLatch.
> 
> It seems to me that in any case where the data is protected by an
> LWLock, the barriers we've added to SetLatch and ResetLatch are
> redundant. I'm not sure if that's entirely true in the spinlock case,
> because S_UNLOCK() is only documented to have release semantics, so
> maybe the load of latch->is_set could be speculated before the lock is
> dropped.  But I do wonder if we're just multiplying barriers endlessly
> instead of trying to think of ways to minimize them (e.g. have a
> variant of SpinLockRelease that acts as a full barrier instead of a
> release barrier, and then avoid a second barrier when checking the
> latch state).

I'm not convinced by this. Imo the multiplying largely comes from
superflous actions, like the per-entry SetLatch calls here, rather than
per batch.

After all I'd benchmarked this *after* an experimental conversion of
shm_mq to not use spinlocks - so there's actually no external barrier
providing these guarantees that could be combined with SetLatch()'s
barrier.

Presumably part of the cost here comes from the fact that the barriers
actually do have an influence over the ordering.


> All that having been said, a batch variant for reading tuples in bulk
> might make sense.  I thi

Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?

2017-10-18 Thread David G. Johnston
On Wed, Oct 18, 2017 at 1:26 PM, Nico Williams 
wrote:

> On Wed, Oct 18, 2017 at 10:15:01PM +0200, Pavel Stehule wrote:
> > there is a function session_user() already
>
> But it doesn't do this.  Are you saying that I should add a
> session_user(int)?
>
>
​Regardless of the merits of the proposed feature, the function
"session_user" is SQL-defined and should not be modified or enhanced.

I could see "calling_role()" being useful - it returns the same value as
"current_role" normally and in security invoker functions while in a
security definer function it would return whatever current_role would have
returned if the function was a security invoker (i.e., the role that the
system will put back into effect once the security definer function
returns).

Introducing the concept of a stack at the SQL level here seems, at first
glance, to be over-complicating things.

David J.


Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains

2017-10-18 Thread Robert Haas
On Tue, Oct 17, 2017 at 6:02 AM, Alvaro Herrera  wrote:
> Robert Haas wrote:
>> I haven't really followed this thread in depth, but I'm very nervous
>> about the idea that we should ever be examining the raw-xmin of a
>> tuple that has been marked HEAP_XMIN_FROZEN for anything other than
>> forensic purposes.
>
> Yeah, me too.  If you see another way to fix the problem, let's discuss
> it.

Well, I guess what I don't understand is, suppose we have a HOT chain
that looks like this, where [x,y] denotes a tuple with an xmin of x
and an xmax of y.

[a,b]->[b,c]->[c,d]

If b is eligible to be frozen, then b is committed and all-visible,
which means that the [a,b] tuple should be pruned altogether.  IOW, I
don't think that a non-root tuple should ever have a frozen xmin; if
that happens, I think we've already screwed up.  So I don't quite
understand how this problem arises in the first place.

I think that the way we are supposed to be guaranteeing this is to
first prune the page and then freeze it.  If we ever freeze without
first pruning, I think that's a bug.  Now it could be that the problem
is that there's a race: after we prune the page, somehow the xmin
horizon advances before we freeze.  But that also seems like something
that shouldn't happen - our notion of the freeze threshold should be
frozen at the beginning of the scan and should not advance, and it
should be prior to our freeze horizon, which could be allowed to
advance but not retreat in the course of the scan.

Apologies if this is stupid; please clue me on what I'm missing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?

2017-10-18 Thread Nico Williams
On Wed, Oct 18, 2017 at 01:43:30PM -0700, David G. Johnston wrote:
> Regardless of the merits of the proposed feature, the function
> "session_user" is SQL-defined and should not be modified or enhanced.
> 
> I could see "calling_role()" being useful - it returns the same value
> as "current_role" normally and in security invoker functions while in
> a security definer function it would return whatever current_role
> would have returned if the function was a security invoker (i.e., the
> role that the system will put back into effect once the security
> definer function returns).

That... could be awkward where lots of SECURITY DEFINER functions may be
user-callable, but also called from each other.  But it would be
minimally useful.

More useful than this, for me, would be a way to get the top-most user.

> Introducing the concept of a stack at the SQL level here seems, at
> first glance, to be over-complicating things.

Because of the current implementation of invocation of SECURITY DEFINER
functions, a stack is trivial to build, since it's a list of nodes
allocated on the C stack in fmgr_security_definer().

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?

2017-10-18 Thread David G. Johnston
On Wed, Oct 18, 2017 at 2:08 PM, Nico Williams 
wrote:

> On Wed, Oct 18, 2017 at 01:43:30PM -0700, David G. Johnston wrote:
>
> More useful than this, for me, would be a way to get the top-most user.
>
>
​That would be "session_user"?​

> Introducing the concept of a stack at the SQL level here seems, at
> > first glance, to be over-complicating things.
>
> Because of the current implementation of invocation of SECURITY DEFINER
> functions, a stack is trivial to build, since it's a list of nodes
> allocated on the C stack in fmgr_security_definer().
>

​Not saying its difficult (or not) to code in C; but exposing that to SQL
seems like a big step.

If I was in position to dive deeper I wouldn't foreclose on the stack idea
but I'd be inclined to see if something else could be made to work with
reasonable effort.

David J.​


Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?

2017-10-18 Thread Nico Williams
On Wed, Oct 18, 2017 at 02:13:29PM -0700, David G. Johnston wrote:
> > More useful than this, for me, would be a way to get the top-most user.
> 
> That would be "session_user"?

It's not quite since there's a difference between SET SESSION
AUTHORIZATION and SET SESSION ROLE.

But yes, it's what I'm using now.

> > Introducing the concept of a stack at the SQL level here seems, at
> > > first glance, to be over-complicating things.
> >
> > Because of the current implementation of invocation of SECURITY DEFINER
> > functions, a stack is trivial to build, since it's a list of nodes
> > allocated on the C stack in fmgr_security_definer().
> 
> Not saying its difficult (or not) to code in C; but exposing that to SQL
> seems like a big step.

Really?  Why?  I mean, there's an implicit function invocation stack
already.  Reifying some bits of the function call stack is useful.  I
can't think of how this particular reification would be dangerous or set
a bad precedent.

Hmmm, oh, I forgot about GET DIAGNOSTICS!  The stack is already exposed
to SQL.  Maybe we could add a CURRENT_USER item to GET STACKED
DIAGNOSTICS or to the PG_CONTEXT.

> If I was in position to dive deeper I wouldn't foreclose on the stack idea
> but I'd be inclined to see if something else could be made to work with
> reasonable effort.

I would think that the more general approach, if easy enough to
implement, would be better.  I can (and will) live with using
session_user instead of current_user, for now.  But I'm willing to
contribute a patch.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?

2017-10-18 Thread David G. Johnston
On Wed, Oct 18, 2017 at 2:30 PM, Nico Williams 
wrote:

> On Wed, Oct 18, 2017 at 02:13:29PM -0700, David G. Johnston wrote:
> > > More useful than this, for me, would be a way to get the top-most user.
> >
> > That would be "session_user"?
>
> It's not quite since there's a difference between SET SESSION
> AUTHORIZATION and SET SESSION ROLE.
>
> But yes, it's what I'm using now.
>

​True, though at that point the superuser who wants to cover their tracks
could probably just edit your functions...​


> Really?  Why?  I mean, there's an implicit function invocation stack
> already.  Reifying some bits of the function call stack is useful.  I
> can't think of how this particular reification would be dangerous or set
> a bad precedent.
>

​Nothing concrete...​
​

>
> Hmmm, oh, I forgot about GET DIAGNOSTICS!  The stack is already exposed
> to SQL.  Maybe we could add a CURRENT_USER item to GET STACKED
> DIAGNOSTICS or to the PG_CONTEXT.
>

Ideally if implementing what you describe we'd want it accessible from any
procedural language​, not just pl/pgsql.

Also, GET STACKED DIAGNOSTICS is documented as being exposed only within an
exception handler.


> > If I was in position to dive deeper I wouldn't foreclose on the stack
> idea
> > but I'd be inclined to see if something else could be made to work with
> > reasonable effort.
>
> I would think that the more general approach, if easy enough to
> implement, would be better.  I can (and will) live with using
> session_user instead of current_user, for now.  But I'm willing to
> contribute a patch


​I'd probably expose the stack as an array...

David J.
​


Re: [HACKERS] [COMMITTERS] pgsql: Fix traversal of half-frozen update chains

2017-10-18 Thread Peter Geoghegan
On Wed, Oct 18, 2017 at 1:54 PM, Robert Haas  wrote:
> Well, I guess what I don't understand is, suppose we have a HOT chain
> that looks like this, where [x,y] denotes a tuple with an xmin of x
> and an xmax of y.
>
> [a,b]->[b,c]->[c,d]
>
> If b is eligible to be frozen, then b is committed and all-visible,
> which means that the [a,b] tuple should be pruned altogether.  IOW, I
> don't think that a non-root tuple should ever have a frozen xmin; if
> that happens, I think we've already screwed up.  So I don't quite
> understand how this problem arises in the first place.

Technically, we don't freeze the heap-only tuples here, because we
cannot. because freezing tuples renders them visible (we don't set
XMIN_FROZEN). Instead, we set hint bits with WAL-logging, on the
assumption that nobody will ever *need* to interrogate the clog -- see
commit 20b65522 from several weeks back. To be clear, this *does* mean
that hint bits stop being mere hints, which I find troubling.

I described these heap-only tuples as "logically frozen" over on the
"heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead
bug" thread, which is where I brought up my general concern.

> I think that the way we are supposed to be guaranteeing this is to
> first prune the page and then freeze it.

There is a race where we cannot prune the page, though. That's why we
had to revisit what I suppose was a tacit assumption, and address its
problems in the commit that started this thread (commit a5736bf7).

> But that also seems like something
> that shouldn't happen - our notion of the freeze threshold should be
> frozen at the beginning of the scan and should not advance, and it
> should be prior to our freeze horizon, which could be allowed to
> advance but not retreat in the course of the scan.

It is not allowed retreat -- kind of. (Alvaro mentioned something in
passing about an *alternative* fix where it really was allowed to
retreat in the simplest sense [1], but I don't think that that's going
to happen.)

> Apologies if this is stupid; please clue me on what I'm missing.

I don't think that your questions are stupid at all. It intuitively
seems bad that xmin freezing is only something we can do to HOT root
and non-HOT tuples, while xmax freezing needs to happen to heap-only
tuples, as well as HOT root tuples and non-HOT tuples. But, as I said,
I'm still playing catch-up on MultiXacts, and I too feel like I might
still be missing important details.

[1] https://postgr.es/m/20171017100200.ruszi2c6qqwetce5@alvherre.pgsql
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Interest in a SECURITY DEFINER function current_user stack access mechanism?

2017-10-18 Thread Nico Williams
On Wed, Oct 18, 2017 at 02:45:47PM -0700, David G. Johnston wrote:
> On Wed, Oct 18, 2017 at 2:30 PM, Nico Williams 
> wrote:
> > On Wed, Oct 18, 2017 at 02:13:29PM -0700, David G. Johnston wrote:
> > > > More useful than this, for me, would be a way to get the top-most user.
> > >
> > > That would be "session_user"?
> >
> > It's not quite since there's a difference between SET SESSION
> > AUTHORIZATION and SET SESSION ROLE.
> >
> > But yes, it's what I'm using now.
> 
> True, though at that point the superuser who wants to cover their tracks
> could probably just edit your functions...

I don't worry about superusers.

However, I'd like for there to be a way to drop privileges permanently
for a session.  Something like SET SESSION AUTHORIZATION WITH NORESET
(ala MySQL) or SET SESSION AUTHENTICATION.

> > Hmmm, oh, I forgot about GET DIAGNOSTICS!  The stack is already exposed
> > to SQL.  Maybe we could add a CURRENT_USER item to GET STACKED
> > DIAGNOSTICS or to the PG_CONTEXT.
> 
> Ideally if implementing what you describe we'd want it accessible from any
> procedural language​, not just pl/pgsql.

Good point.  So a function.  Got it.

> I'd probably expose the stack as an array...

I agree, but that would be more expensive, since it means marshalling
all the information, even if the caller only wants one specific item.
Whereas accessing a specific frame by number is much simpler and
performant (no allocation).

It's also easier to not have to do something like.. parsing than the
PG_CONTEXT, instead accessing each of any number of attributes we might
expose from each frame.

Nico
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 64-bit queryId?

2017-10-18 Thread Michael Paquier
On Thu, Oct 19, 2017 at 4:12 AM, Robert Haas  wrote:
> On Wed, Oct 18, 2017 at 9:20 AM, Julien Rouhaud  wrote:
>> Sorry for replying so late, but I have a perhaps naive question about
>> the hashtable handling with this new version.
>>
>> IIUC, the shared hash table is now created with HASH_BLOBS instead of
>> HASH_FUNCTION, so since sizeof(pgssHashKey) != sizeof(uint32) the hash
>> table will use tag_hash() to compute the hash key.
>>
>> tag_hash() uses all the bits present in the given struct, so this can
>> be problematic if padding bits are not zeroed, which isn't garanted by
>> C standard for local variable.
>>
>> WIth current pgssHashKey definition, there shouldn't be padding bits,
>> so it should be safe.  But I wonder if adding an explicit memset() of
>> the key in pgss_store() could avoid extension authors to have
>> duplicate entries if they rely on this code, or prevent future issue
>> in the unlikely case of adding other fields to pgssHashKey.
>
> I guess we should probably add additional comment to the definition of
> pgssHashKey warning of the danger.  I'm OK with adding a memset if
> somebody can promise me it will get optimized away by all reasonably
> commonly-used compilers, but I'm not that keen on adding more cycles
> to protect against a hypothetical danger.

A comment is an adapted answer for me too. There is no guarantee that
memset improvements will get committed. They will likely be, but
making a hard promise is difficult.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA

2017-10-18 Thread Andres Freund
Hi,

On 2017-06-05 16:22:58 +0300, Sokolov Yura wrote:
> Patch changes the way LWLock is acquired.
> 
> Before patch, lock is acquired using following procedure:
> 1. first LWLock->state is checked for ability to take a lock, and CAS
>   performed (either lock could be acquired or not). And it is retried if
>   status were changed.
> 2. if lock is not acquired at first 1, Proc is put into wait list, using
>   LW_FLAG_LOCKED bit of LWLock->state as a spin-lock for wait list.
>   In fact, profile shows that LWLockWaitListLock spends a lot of CPU on
>   contendend LWLock (upto 20%).
>   Additional CAS loop is inside pg_atomic_fetch_or_u32 for setting
>   LW_FLAG_HAS_WAITERS. And releasing wait list lock is another CAS loop
>   on the same LWLock->state.
> 3. after that state is checked again, because lock could be released
>   before Proc were queued into wait list.
> 4. if lock were acquired at step 3, Proc should be dequeued from wait
>   list (another spin lock and CAS loop). And this operation could be
>   quite heavy, because whole list should be traversed.
> 
> Patch makes lock acquiring in single CAS loop:
> 1. LWLock->state is read, and ability for lock acquiring is detected.
>   If there is possibility to take a lock, CAS tried.
>   If CAS were successful, lock is aquired (same to original version).
> 2. but if lock is currently held by other backend, we check ability for
>   taking WaitList lock. If wait list lock is not help by anyone, CAS
>   perfomed for taking WaitList lock and set LW_FLAG_HAS_WAITERS at once.
>   If CAS were successful, then LWLock were still held at the moment wait
>   list lock were held - this proves correctness of new algorithm. And
>   Proc is queued to wait list then.
> 3. Otherwise spin_delay is performed, and loop returns to step 1.

Right, something like this didn't use to be possible because we'd both
the LWLock->state and LWLock->mutex. But after 008608b9d5106 that's not
a concern anymore.

> Algorithm for LWLockWaitForVar is also refactored. New version is:
> 1. If lock is not held by anyone, it immediately exit.
> 2. Otherwise it is checked for ability to take WaitList lock, because
>   variable is protected with it. If so, CAS is performed, and if it is
>   successful, loop breaked to step 4.
> 3. Otherwise spin_delay perfomed, and loop returns to step 1.
> 4. var is checked for change.
>   If it were changed, we unlock wait list lock and exit.
>   Note: it could not change in following steps because we are holding
>   wait list lock.
> 5. Otherwise CAS on setting necessary flags is performed. If it succeed,
>   then queue Proc to wait list and exit.
> 6. If Case failed, then there is possibility for LWLock to be already
>   released - if so then we should unlock wait list and exit.
> 7. Otherwise loop returns to step 5.
> 
> So invariant is preserved:
> - either we found LWLock free,
> - or we found changed variable,
> - or we set flags and queue self while LWLock were held.
> 
> Spin_delay is not performed at step 7, because we should release wait
> list lock as soon as possible.

That seems unconvincing - by not delaying you're more likely to
*increase* the time till the current locker that holds the lock can
release the lock.

> Additionally, taking wait list lock is skipped in both algorithms if
> SpinDelayStatus.spins is less than part of spins_per_delay loop
> iterations (so, it is initial iterations, and some iterations after
> pg_usleep, because SpinDelayStatus.spins is reset after sleep).

I quite strongly think it's a good idea to change this at the same time
as the other changes you're proposing here.  I think it's a worthwhile
thing to pursue, but that change also has quit ethe potential for
regressing in some cases.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: protocol version negotiation (Re: Libpq PGRES_COPY_BOTH - version compatibility)

2017-10-18 Thread Badrul Chowdhury
Hi Robert,

Thanks very much for your quick response. PFA the patch containing the BE 
changes for pgwire v3.1, correctly formatted using pgindent this time 😊

A few salient points:

>> SendServerProtocolVersionMessage should be adjusted to use the new
>> facilities added by commit 1de09ad8eb1fa673ee7899d6dfbb2b49ba204818.

The new functionality is for sending 64bit ints. I think 32bits is sufficient 
for the information we want to pass around in the protocol negotiation phase, 
so I left this part unchanged.

>> Also, this really doesn't belong in guc.c at all.  We should be separating 
>> out
>> these options in ProcessStartupPacket() just as we do for existing protocol-
>> level options.  When we actually have some options, I think they should be
>> segregated into a separate list hanging off of the port, instead of letting 
>> them
>> get mixed into
>> port->guc_options, but for right now we don't have any yet, so a bunch
>> of this complexity can go away.

You are right, it is more elegant to make this a part of the port struct; I 
made the necessary changes in the patch.

Thanks,
Badrul

>> -Original Message-
>> From: Robert Haas [mailto:robertmh...@gmail.com]
>> Sent: Friday, October 13, 2017 11:16 AM
>> To: Badrul Chowdhury 
>> Cc: Tom Lane ; Satyanarayana Narlapuram
>> ; Craig Ringer
>> ; Peter Eisentraut ; Magnus
>> Hagander ; PostgreSQL-development > hack...@postgresql.org>
>> Subject: Re: [HACKERS] Re: protocol version negotiation (Re: Libpq
>> PGRES_COPY_BOTH - version compatibility)
>> 
>> On Fri, Oct 6, 2017 at 5:07 PM, Badrul Chowdhury 
>> wrote:
>> > I added a mechanism to fall back to v3.0 if the BE fails to start when FE
>> initiates a connection with v3.1 (with optional startup parameters). This
>> completely eliminates the need to backpatch older servers, ie newer FE can
>> connect to older BE. Please let me know what you think.
>> 
>> Well, I think it needs a good bit of cleanup before we can really get to the
>> substance of the patch.
>> 
>> +fe_utils \
>>  interfaces \
>>  backend/replication/libpqwalreceiver \
>>  backend/replication/pgoutput \
>> -fe_utils \
>> 
>> Useless change, omit.
>> 
>> +if (whereToSendOutput != DestRemote ||
>> +PG_PROTOCOL_MAJOR(FrontendProtocol) < 3)
>> +return -1;
>> +
>> +int sendStatus = 0;
>> 
>> Won't compile on older compilers.  We generally aim for C89 compliance, with
>> a few exceptions for newer features.
>> 
>> Also, why initialize sendStatus and then overwrite the value in the very next
>> line of code?
>> 
>> Also, the PG_PROTOCOL_MAJOR check here seems to be redundant with the
>> one in the caller.
>> 
>> +/* NegotiateServerProtocol packet structure
>> + *
>> + * [ 'Y' | msgLength | min_version | max_version | param_list_len
>> | list of param names ]
>> + */
>> +
>> 
>> Please pgindent your patches.  I suspect you'll find this gets garbled.
>> 
>> Is there really any reason to separate NegotiateServerProtocol and
>> ServerProtocolVersion into separate functions?
>> 
>> -libpq = -L$(libpq_builddir) -lpq
>> +libpq = -L$(libpq_builddir) -lpq -L$(top_builddir)/src/common
>> -lpgcommon -L$(top_builddir)/src/fe_utils -lpgfeutils
>> +$libpq->AddReference($libpgcommon, $libpgfeutils, $libpgport);
>> 
>> I haven't done any research to try to figure out why you did this, but I 
>> don't
>> think these are likely to be acceptable changes.
>> 
>> SendServerProtocolVersionMessage should be adjusted to use the new
>> facilities added by commit 1de09ad8eb1fa673ee7899d6dfbb2b49ba204818.
>> 
>> -/* Check we can handle the protocol the frontend is using. */
>> -
>> -if (PG_PROTOCOL_MAJOR(proto) <
>> PG_PROTOCOL_MAJOR(PG_PROTOCOL_EARLIEST) ||
>> -PG_PROTOCOL_MAJOR(proto) >
>> PG_PROTOCOL_MAJOR(PG_PROTOCOL_LATEST) ||
>> -(PG_PROTOCOL_MAJOR(proto) ==
>> PG_PROTOCOL_MAJOR(PG_PROTOCOL_LATEST) &&
>> - PG_PROTOCOL_MINOR(proto) >
>> PG_PROTOCOL_MINOR(PG_PROTOCOL_LATEST)))
>> -ereport(FATAL,
>> -(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> - errmsg("unsupported frontend protocol %u.%u: server 
>> supports %
>> u.0 to %u.%u",
>> -PG_PROTOCOL_MAJOR(proto), PG_PROTOCOL_MINOR(proto),
>> -PG_PROTOCOL_MAJOR(PG_PROTOCOL_EARLIEST),
>> -PG_PROTOCOL_MAJOR(PG_PROTOCOL_LATEST),
>> -PG_PROTOCOL_MINOR(PG_PROTOCOL_LATEST;
>> 
>> The way you've arranged things here looks like it'll cause us to accept
>> connections even for protocol versions 4.x or higher; I don't think we want
>> that.  I suggest checking the major version number at this point in the code;
>> then, the code path for version 3+ needs no additional check and the code
>> path for version 2 can enforce 2.0.
>> 
>> +bool
>> +is_optional(const char *guc_name)
>> +{
>> +const char *optionalPrefix = "_pq_";
>> +bool isOptional = false;
>> +
>> +/* "_pq_" must be a proper

Re: [HACKERS] Fix performance degradation of contended LWLock on NUMA

2017-10-18 Thread Andres Freund
Hi,

On 2017-09-08 22:35:39 +0300, Sokolov Yura wrote:
> From 722a8bed17f82738135264212dde7170b8c0f397 Mon Sep 17 00:00:00 2001
> From: Sokolov Yura 
> Date: Mon, 29 May 2017 09:25:41 +
> Subject: [PATCH 1/6] More correct use of spinlock inside LWLockWaitListLock.
> 
> SpinDelayStatus should be function global to count iterations correctly,
> and produce more correct delays.
> 
> Also if spin didn't spin, do not count it in spins_per_delay correction.
> It wasn't necessary before cause delayStatus were used only in contented
> cases.

I'm not convinced this is benefial. Adds a bit of overhead to the
casewhere LW_FLAG_LOCKED can be set immediately. OTOH, it's not super
likely to make a large difference if the lock has to be taken anyway...


> +
> +/* just shortcut to not declare lwlock_stats variable at the end of function 
> */
> +static void
> +add_lwlock_stats_spin_stat(LWLock *lock, SpinDelayStatus *delayStatus)
> +{
> + lwlock_stats *lwstats;
> +
> + lwstats = get_lwlock_stats_entry(lock);
> + lwstats->spin_delay_count += delayStatus->delays;
> +}
>  #endif   /* LWLOCK_STATS 
> */

This seems like a pretty independent change.


>  /*
>   * Internal function that tries to atomically acquire the lwlock in the 
> passed
> - * in mode.
> + * in mode. If it could not grab the lock, it doesn't puts proc into wait
> + * queue.
>   *
> - * This function will not block waiting for a lock to become free - that's 
> the
> - * callers job.
> + * It is used only in LWLockConditionalAcquire.
>   *
> - * Returns true if the lock isn't free and we need to wait.
> + * Returns true if the lock isn't free.
>   */
>  static bool
> -LWLockAttemptLock(LWLock *lock, LWLockMode mode)
> +LWLockAttemptLockOnce(LWLock *lock, LWLockMode mode)

This now has become a fairly special cased function, I'm not convinced
it makes much sense with the current naming and functionality.


> +/*
> + * Internal function that tries to atomically acquire the lwlock in the 
> passed
> + * in mode or put it self into waiting queue with waitmode.
> + * This function will not block waiting for a lock to become free - that's 
> the
> + * callers job.
> + *
> + * Returns true if the lock isn't free and we are in a wait queue.
> + */
> +static inline bool
> +LWLockAttemptLockOrQueueSelf(LWLock *lock, LWLockMode mode, LWLockMode 
> waitmode)
> +{
> + uint32  old_state;
> + SpinDelayStatus delayStatus;
> + boollock_free;
> + uint32  mask,
> + add;
> +
> + AssertArg(mode == LW_EXCLUSIVE || mode == LW_SHARED);
> +
> + if (mode == LW_EXCLUSIVE)
> + {
> + mask = LW_LOCK_MASK;
> + add = LW_VAL_EXCLUSIVE;
> + }
> + else
> + {
> + mask = LW_VAL_EXCLUSIVE;
> + add = LW_VAL_SHARED;
> + }
> +
> + init_local_spin_delay(&delayStatus);

The way you moved this around has the disadvantage that we now do this -
a number of writes - even in the very common case where the lwlock can
be acquired directly.


> + /*
> +  * Read once outside the loop. Later iterations will get the newer value
> +  * either via compare & exchange or with read after perform_spin_delay.
> +  */
> + old_state = pg_atomic_read_u32(&lock->state);
> + /* loop until we've determined whether we could acquire the lock or not 
> */
> + while (true)
> + {
> + uint32  desired_state;
> +
> + desired_state = old_state;
> +
> + lock_free = (old_state & mask) == 0;
> + if (lock_free)
>   {
> - if (lock_free)
> + desired_state += add;
> + if (pg_atomic_compare_exchange_u32(&lock->state,
> + 
>&old_state, desired_state))
>   {
>   /* Great! Got the lock. */
>  #ifdef LOCK_DEBUG
>   if (mode == LW_EXCLUSIVE)
>   lock->owner = MyProc;
>  #endif
> - return false;
> + break;
> + }
> + }
> + else if ((old_state & LW_FLAG_LOCKED) == 0)
> + {
> + desired_state |= LW_FLAG_LOCKED | LW_FLAG_HAS_WAITERS;
> + if (pg_atomic_compare_exchange_u32(&lock->state,
> + 
>&old_state, desired_state))
> + {
> + LWLockQueueSelfLocked(lock, waitmode);
> + break;
>   }
> - else
> - return true;/* somebody else has the lock */
> + }
> + else
> + {
> + 

Re: [HACKERS] 64-bit queryId?

2017-10-18 Thread Julien Rouhaud
On Thu, Oct 19, 2017 at 1:08 AM, Michael Paquier
 wrote:
> On Thu, Oct 19, 2017 at 4:12 AM, Robert Haas  wrote:
>> On Wed, Oct 18, 2017 at 9:20 AM, Julien Rouhaud  wrote:
>>> WIth current pgssHashKey definition, there shouldn't be padding bits,
>>> so it should be safe.  But I wonder if adding an explicit memset() of
>>> the key in pgss_store() could avoid extension authors to have
>>> duplicate entries if they rely on this code, or prevent future issue
>>> in the unlikely case of adding other fields to pgssHashKey.
>>
>> I guess we should probably add additional comment to the definition of
>> pgssHashKey warning of the danger.  I'm OK with adding a memset if
>> somebody can promise me it will get optimized away by all reasonably
>> commonly-used compilers, but I'm not that keen on adding more cycles
>> to protect against a hypothetical danger.
>
> A comment is an adapted answer for me too.

I agree, and I'm perfectly fine with adding a comment around pgssHashKey.

PFA a patch to warn about the danger.
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
b/contrib/pg_stat_statements/pg_stat_statements.c
index b04b4d6ce1..829ee69f51 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -124,7 +124,10 @@ typedef enum pgssVersion
 
 /*
  * Hashtable key that defines the identity of a hashtable entry.  We separate
- * queries by user and by database even if they are otherwise identical.
+ * queries by user and by database even if they are otherwise identical.  Be
+ * careful when adding new fields, tag_hash() is used to compute the hash key,
+ * so we rely on the fact that no padding bit is present in this structure.
+ * Otherwise, we'd have to zero the key variable in pgss_store.
  */
 typedef struct pgssHashKey
 {

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers