Re: [HACKERS] Comment typo in publicationcmd.c

2017-04-07 Thread Magnus Hagander
On Fri, Apr 7, 2017 at 9:00 AM, Masahiko Sawada 
wrote:

> Hi all,
>
> Attached fixes a typo in publicationcmd.c file.
>
> s/om/on/
>

Applied, thanks.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Compiler warning in costsize.c

2017-04-07 Thread Michael Paquier
On Tue, Apr 4, 2017 at 9:42 PM, Michael Paquier
 wrote:
> On Wed, Apr 5, 2017 at 2:54 AM, Tom Lane  wrote:
>> (I'm personally not that much in love with PG_USED_FOR_ASSERTS_ONLY,
>> because it tends to confuse pgindent.)
>
> I would be inclined to just do that; any other solution I can think of
> is uglier than that.

Actually, no. Looking at this issue again, the warning is triggered
because the Assert() clause is only present under USE_ASSERT_CHECKING. So
instead of removing PG_USED_FOR_ASSERTS_ONLY, here is a simpler
patch that fixes the problem. There is no need to put the variable *rte
within ifdefs either.
-- 
Michael


costsize-warning-fix-2.patch
Description: Binary data



Re: [HACKERS] Undefined psql variables

2017-04-07 Thread Fabien COELHO


Hello Corey,


\if defined varname
\if sql boolean expression to send to server
\if compare value operator value


I'm still thinking :-)

Independently of my aesthetic complaint against having a pretty
unusual keyword-prefix syntax, how would you envision a \set assignment
variant? Would \if have a different expression syntax somehow?


Any further thoughts?


My current opinion:

 - I'm fine if \set stays as it is, i.e. no expression.

 - I agree that some client-side expressions are needed, along the lines
   of the semantics suggested by Tom, i.e. definition and comparisons.

 - I'm really against the prefix syntax suggested by Tom.


I wish I could get an explanation of why the :?varname (or some other 
variant) syntax I suggested has a "namespace" issue.


The advantage that I see is that although it is obviously ugly, it is ugly 
in continuity with the various :["'?]varname syntaxes already offered, and 
it allows getting rid of "defined varname", which does not look like SQL. 
A second advantage is that with the "defined" proposal


   \if defined var1 and defined var2 or defined var3 and sqlrt() >= ..

would probably never work, as it cannot be embedded in another 
expression, while it would work with


   \if :?var1 and :?var2 or :?var3 and ...


Moreover, I would like the condition syntax to be basically SQL & psql 
variables, without explicit prefixes, with a transparent decision whether 
it is evaluated client side or server side.


As client-side expressions are pretty simple, ISTM that some regex could 
be used for this purpose, eg for integer and boolean comparisons:


 ^\s*\d+\s*(=|<>|!=|<|<=|>|>=)\s*\d+\s*$
 ^\s*(bool...)\s*(=|<>|!=)\s*(bool...)\s*$
 ^\s*(NOT\s*)?(bool...)\s*$

So one could just write the expression without having to say where it is 
executed, e.g.


 \if :VERSION_NUM < 11

would lead to

 \if 10 < 11

which would be caught by the first regex and evaluated with a few lines of code.
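
To make the "few lines of code" part concrete, here is a small standalone
sketch (made-up function name, plain C with POSIX regexes, not actual psql
code) of how such a client-side integer comparison could be matched and
evaluated:

#include <regex.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Sketch only: evaluate "<integer> <op> <integer>" client-side.
 * Returns false if the input does not look like such a comparison,
 * in which case a real implementation could fall back to the server.
 */
static bool
eval_int_comparison(const char *expr, bool *result)
{
    /* POSIX ERE version of ^\s*\d+\s*(=|<>|!=|<|<=|>|>=)\s*\d+\s*$ */
    const char *pattern =
        "^[[:space:]]*([0-9]+)[[:space:]]*(=|<>|!=|<=|>=|<|>)"
        "[[:space:]]*([0-9]+)[[:space:]]*$";
    regex_t     re;
    regmatch_t  m[4];
    bool        ok = false;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return false;

    if (regexec(&re, expr, 4, m, 0) == 0)
    {
        long        lhs = strtol(expr + m[1].rm_so, NULL, 10);
        long        rhs = strtol(expr + m[3].rm_so, NULL, 10);
        const char *op = expr + m[2].rm_so;
        int         oplen = (int) (m[2].rm_eo - m[2].rm_so);

        if (oplen == 1 && op[0] == '=')
            *result = (lhs == rhs);
        else if (oplen == 2)
        {
            if (op[0] == '<' && op[1] == '>')
                *result = (lhs != rhs);
            else if (op[0] == '!')
                *result = (lhs != rhs);
            else if (op[0] == '<')
                *result = (lhs <= rhs);
            else
                *result = (lhs >= rhs);
        }
        else if (op[0] == '<')
            *result = (lhs < rhs);
        else
            *result = (lhs > rhs);
        ok = true;
    }
    regfree(&re);
    return ok;
}

int
main(void)
{
    bool        r;

    /* after substitution, "\if :VERSION_NUM < 11" arrives here as "10 < 11" */
    if (eval_int_comparison("10 < 11", &r))
        printf("client-side result: %s\n", r ? "true" : "false");
    return 0;
}

Anything that does not match one of the client-side patterns would simply be
shipped to the server instead.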

--
Fabien.




Re: [HACKERS] SCRAM authentication, take three

2017-04-07 Thread Craig Ringer
On 7 April 2017 at 15:59, Heikki Linnakangas  wrote:
> On 04/07/2017 10:38 AM, Magnus Hagander wrote:

>> Not sure if it would be doable in the code, but we could also have:
>> host all all example.com scram method=sha256plus,sha256
>>
>> or something like that. Which would fit within the current syntax of the
>> file. But I think it might not be enough, because then you couldn't have
>> two entries with different scram methods for the same combination of the
>> other fields -- the hba *matching* doesn't look at the options fields.
>
> You can't have two entries with the same type+database+user+address
> combination, period. (Or if you do, the second one is ignored.)

So we need a methods= option that accepts a list, for users who want to
constrain the allowed methods. This is just how things like SSH work;
e.g. ssh_config might contain

Ciphers aes128-cbc,3des-cbc

if you feel like using the old dodgy stuff today.

If the user doesn't supply a methods= list, the server offers the full
list of supported methods in the 'B' message, and the client can
authenticate with any one of them.
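
Purely for illustration, an hba entry under such a scheme might look
something like this (hypothetical syntax, not something the server accepts
today):

# TYPE  DATABASE  USER  ADDRESS      METHOD
host    all       all   example.com  scram methods=scram-sha-256-plus,scram-sha-256

Entries that omit the option would keep the current behaviour, i.e. the
server advertises everything it supports.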

I'm aware there are some compat concerns there, but existing clients
will already have no idea what the scram method is, so now's our
chance to lock it in as containing a *list* of permitted methods, even
though to start with it's hard-coded.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] SCRAM authentication, take three

2017-04-07 Thread Magnus Hagander
On Fri, Apr 7, 2017 at 9:59 AM, Heikki Linnakangas  wrote:

> On 04/07/2017 10:38 AM, Magnus Hagander wrote:
>
>> So here's a wild idea. What if we just call it "sha256"? Does the user
>> actually care about it being scram, or is scram just an implementation
>> detail for them? That way when the next one shows up, it'll be sha512 or
>> whatever. It happens to use scram under the hood, but does the user have
>> to
>> or does the user want to care about that?
>>
>> (One could argue the same way that the user shouldn't have to or want to
>> care about the hashing algorithm -- but if that's the case then we should
>> only have one entry, it would be "scram", and the system would decide from
>> there. And I think this discussion already indicates we don't think this
>> is
>> enough)
>>
>
> I think the "SCRAM" part is more important than "SHA-256", so -1 on that.
>

If that is the important part, then I agree :) I am not entirely sure that
the scram part *is* more important though.

I think most users will be a lot more comfortable with "sha256" than
"scram" though. But I guess that says using scram-sha-256 is the correct
way.



> The main argument against using just "scram" is that it's misleading, because we
> implement SCRAM-SHA-256, rather than SCRAM-SHA-1, which was the first SCRAM
> mechanism, commonly called just SCRAM. As long as that's the only SCRAM
> variant we have, that's not too bad, but it becomes more confusing if we
> ever implement SCRAM-SHA-512 or SCRAM-something-else in the future. That's
> the point Noah made, and it's a fair point, but the question is whether we
> consider that to be more important than having a short name for what we
> have now.


Yeah, I agree we should be prepared for the future. And having "scram" and
"scram-sha-512" would definitely fall under confusing.


The channel binding aspect is actually more important to think about right
>> now, as that we will hopefully implement in the next release or two.
>>
>> In [1], Michael wrote:
>>
>> There is also the channel binding to think about... So we could have a
>>> list of keywords perhaps associated with SASL? Imagine for example:
>>> sasl $algo,$channel_binding
>>> Giving potentially:
>>> sasl scram_sha256
>>> sasl scram_sha256,channel
>>> sasl scram_sha512
>>> sasl scram_sha512,channel
>>> In the case of the patch of this thread just the first entry would
>>> make sense, once channel binding support is added a second
>>> keyword/option could be added. And there are of course other methods
>>> that could replace SCRAM..
>>>
>>
>> It should also be possible to somehow specify "use channel binding, if the
>> client supports it".
>>
>
> Is that really a type of authentication? We already have the idea of
> authentication method options, used for most other things except md5 which
> doesn't have any. So it could be "sha256 channelbind=on", "sha256
> channelbind=off" or "sha256 channelbind=negotiate" or something like that?
>

> Technically, the channel-binding variant is a separate SASL mechanism,
> i.e. it has a separate name, SCRAM-SHA-256-PLUS. I'm not sure if
> users/admins think of it that way.


I bet they don't.



I don't think "sasl" is interesting to a user, it's the actual mechanisms
>> (e.g "scram-sha256") that matter. So I'd suggest that we allow a list of
>> algorithms in the method field. If we go with the longer "scram-sha-256"
>> name, it would look like this:
>>
>> # TYPE  DATABASE  USER  ADDRESS      METHOD
>> host    all       all   example.com  scram-sha-256-plus, scram-sha-256
>>
>> The problem again is that those names are quite long. Is that OK?
>>
>
> Not sure if it would be doable in the code, but we could also have:
> host all all example.com scram method=sha256plus,sha256
>
> or something like that. Which would fit within the current syntax of the
> file. But I think it might not be enough, because then you couldn't have
> two entries with different scram methods for the same combination of the
> other fields -- the hba *matching* doesn't look at the options fields.
>


> You can't have two entries with the same type+database+user+address
> combination, period. (Or if you do, the second one is ignored.)

That's exactly my point.

//Magnus


Re: [HACKERS] pgbench - allow to store select results into variables

2017-04-07 Thread Fabien COELHO



If I understand correctly, the patch is moved because of the unrelated
issue that variables cannot be utf8 in pgbench, and it is a condition
to consider this patch that existing pgbench variables (set with \set)
can be utf8?


I'm not sure if it is "unrelated" because the new feature relies on
existing pgbench variable infrastructure.


Sure. I meant that the constraint on variable names exists before the 
patch and the patch is not related to variable names, but the patch is 
about variables, obviously.


As "psql" variables can be utf8 and that the same scanner is used, but the 
variables values are not stritcly the same (they are typed in pgbench), 
I'm wondering whether the effort should be do share more code/abstraction 
between psql & pgbench or just adjust/replicate the needed small 
functions/code excerpts.


--
Fabien.




Re: [HACKERS] pgbench - allow to store select results into variables

2017-04-07 Thread Fabien COELHO


Hello Tatsuo,


Ok, I will move the patch to the next cf.


Done.


If I understand correctly, the patch is moved because of the unrelated 
issue that variables cannot be utf8 in pgbench, and it is a condition to 
consider this patch that existing pgbench variables (set with \set) can be 
utf8?


--
Fabien.




Re: [HACKERS] pgbench - allow to store select results into variables

2017-04-07 Thread Tatsuo Ishii
> If I understand correctly, the patch is moved because of the unrelated
> issue that variables cannot be utf8 in pgbench, and it is a condition
> to consider this patch that existing pgbench variables (set with \set)
> can be utf8?

I'm not sure if it is "unrelated" because the new feature relies on
existing pgbench variable infrastructure.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-07 Thread Heikki Linnakangas

On 04/06/2017 11:16 PM, Simon Riggs wrote:

or it
can just ignore the list and send what it wants anyway, probably leading
to client disconnect.

It would need to follow one of the requested protocols, but mark the
request as doomed. Otherwise we'd be revealing information. That's
what SCRAM does now.


It's not a secret today, what authentication method the server requires. 
You can't really hide it, anyway, as the client could probe with 
different lists of supported methods, and see which method the server 
picks in each case.


- Heikki





Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-07 Thread Craig Ringer
On 7 April 2017 at 16:33, Heikki Linnakangas  wrote:

> That list of supported authentication methods would need to be included in
> the startup message. Unfortunately, there is no way to add options to the
> startup message, without breaking compatibility with old servers. If there
> is an option in the startup message that the server doesn't understand, it
> will treat it as a GUC, and you get an "unrecognized configuration
> parameter" after authentication.

sasl.mechanisms = 'SCRAM_SHA256'

:p

No, I'm not seriously suggesting we abuse that.

Personally I think it's reasonable enough to let the server send a 'B'
message with supported auth modes. I'm not overly concerned about the
small information leak that provides. We're unlikely to be able to
convincingly fake execution of any and all SASL auth methods the
client may request, and since they may require any arbitrary number of
message exchanges we'd basically end up blackholing clients that try
an unsupported auth-method.

No thanks. It's one area I'd rather honestly say "nope, we don't
support that". In which case the client can easily enough probe for
all known methods, and we might as well just tell it up front.

This is hardly new. Most servers with negotiable auth, like SMTP, IMAP,
etc, have the server supply a list of auth mechs.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-04-07 Thread Mithun Cy
On Thu, Apr 6, 2017 at 4:12 AM, Andres Freund  wrote:
> On 2017-03-13 18:45:00 +0530, Mithun Cy wrote:
>> I have implemented a similar logic now. The prewarm bgworker will
>> launch a sub-worker per database in the dump file. And, each
>> sub-worker will load its database block info. The sub-workers will be
>> launched only after previous one is finished. All of this will only
>> start if the database has reached a consistent state.
>
> Hm. For replay performance it'd possibly be good to start earlier,
> before reaching consistency.  Is there an issue starting earlier?

Thanks, Andres, for the detailed review. I will try to address the comments
in my next post; I thought it important to reply to the above comment before
that. Earlier patches used to start loading blocks before reaching a
consistent state. Then Robert, while reviewing, found a basic flaw in my
approach [1]: the function DropRelFileNodesAllBuffers does not expect others
to load blocks concurrently while it is getting rid of buffered blocks. So we
have to delay loading until the database reaches a consistent state, so that
we can connect to each database and take a relation lock before loading any
of its blocks.


[1] cannot load blocks without holding relation lock



-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] SCRAM authentication, take three

2017-04-07 Thread Heikki Linnakangas

On 04/07/2017 10:38 AM, Magnus Hagander wrote:

So here's a wild idea. What if we just call it "sha256"? Does the user
actually care about it being scram, or is scram just an implementation
detail for them? That way when the next one shows up, it'll be sha512 or
whatever. It happens to use scram under the hood, but does the user have to
or does the user want to care about that?

(One could argue the same way that the user shouldn't have to or want to
care about the hashing algorithm -- but if that's the case then we should
only have one entry, it would be "scram", and the system would decide from
there. And I think this discussion already indicates we don't think this is
enough)


I think the "SCRAM" part is more important than "SHA-256", so -1 on that.

The main argument against using just "scram" is that it's misleading, because we 
implement SCRAM-SHA-256, rather than SCRAM-SHA-1, which was the first 
SCRAM mechanism, commonly called just SCRAM. As long as that's the only 
SCRAM variant we have, that's not too bad, but it becomes more confusing 
if we ever implement SCRAM-SHA-512 or SCRAM-something-else in the 
future. That's the point Noah made, and it's a fair point, but the 
question is whether we consider that to be more important than having a 
short name for what we have now.



The channel binding aspect is actually more important to think about right
now, as that we will hopefully implement in the next release or two.

In [1], Michael wrote:


There is also the channel binding to think about... So we could have a
list of keywords perhaps associated with SASL? Imagine for example:
sasl $algo,$channel_binding
Giving potentially:
sasl scram_sha256
sasl scram_sha256,channel
sasl scram_sha512
sasl scram_sha512,channel
In the case of the patch of this thread just the first entry would
make sense, once channel binding support is added a second
keyword/option could be added. And there are of course other methods
that could replace SCRAM..


It should also be possible to somehow specify "use channel binding, if the
client supports it".


Is that really a type of authentication? We already have the idea of
authentication method options, used for most other things except md5 which
doesn't have any. So it could be "sha256 channelbind=on", "sha256
channelbind=off" or "sha256 channelbind=negotiate" or something like that?


Technically, the channel-binding variant is a separate SASL mechanism, 
i.e. it has a separate name, SCRAM-SHA-256-PLUS. I'm not sure if 
users/admins think of it that way.



I don't think "sasl" is interesting to a user, it's the actual mechanisms
(e.g "scram-sha256") that matter. So I'd suggest that we allow a list of
algorithms in the method field. If we go with the longer "scram-sha-256"
name, it would look like this:

# TYPE  DATABASE  USER  ADDRESS      METHOD
host    all       all   example.com  scram-sha-256-plus, scram-sha-256

The problem again is that those names are quite long. Is that OK?


Not sure if it would be doable in the code, but we could also have:
host all all example.com scram method=sha256plus,sha256

or something like that. Which would fit within the current syntax of the
file. But I think it might not be enough, because then you couldn't have
two entries with different scram methods for the same combination of the
other fields -- the hba *matching* doesn't look at the options fields.


You can't have two entries with the same type+database+user+address 
combination, period. (Or if you do, the second one is ignored.)


- Heikki





Re: [HACKERS] Compiler warning in costsize.c

2017-04-07 Thread Michael Paquier
On Fri, Apr 7, 2017 at 12:38 AM, Michael Paquier
 wrote:
> On Tue, Apr 4, 2017 at 9:42 PM, Michael Paquier
>  wrote:
>> On Wed, Apr 5, 2017 at 2:54 AM, Tom Lane  wrote:
>>> (I'm personally not that much in love with PG_USED_FOR_ASSERTS_ONLY,
>>> because it tends to confuse pgindent.)
>>
>> I would be inclined to just do that; any other solution I can think of
>> is uglier than that.
>
> Actually, no. Looking at this issue again, the warning is triggered
> because the Assert() clause is only present under USE_ASSERT_CHECKING. So
> instead of removing PG_USED_FOR_ASSERTS_ONLY, here is a simpler
> patch that fixes the problem. There is no need to put the variable *rte
> within ifdefs either.

Bah. This actually fixes nothing. Attached is a different patch that
really addresses the problem, by removing the variable because we
don't want planner_rt_fetch() to run for non-Assert builds.
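
Schematically, the change amounts to something like this (a sketch of the
pattern only, not the actual costsize.c hunk; the specific assertion shown is
illustrative):

/* before: a local variable that exists only so an Assert() can inspect it */
RangeTblEntry *rte PG_USED_FOR_ASSERTS_ONLY = planner_rt_fetch(rel->relid, root);

Assert(rte->rtekind == RTE_RELATION);

/* after: no local variable; Assert() expands to a no-op without
 * USE_ASSERT_CHECKING, so planner_rt_fetch() is never evaluated in
 * non-assert builds */
Assert(planner_rt_fetch(rel->relid, root)->rtekind == RTE_RELATION);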
-- 
Michael


costsize-warning-fix-3.patch
Description: Binary data



Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-07 Thread Heikki Linnakangas

On 04/07/2017 11:57 AM, Craig Ringer wrote:

On 7 April 2017 at 16:33, Heikki Linnakangas  wrote:


That list of supported authentication methods would need to be included in
the startup message. Unfortunately, there is no way to add options to the
startup message, without breaking compatibility with old servers. If there
is an option in the startup message that the server doesn't understand, it
will treat it as a GUC, and you get an "unrecognized configuration
parameter" after authentication.


sasl.mechanisms = 'SCRAM_SHA256'

:p

No, I'm not seriously suggesting we abuse that.


Hmm, that's not such a bad idea, actually. It only goes back to 9.2, 
though. Before that, the prefix needed to be listed in 
custom_variable_classes, or you got an error. 9.2 is the oldest 
supported version, but libpq should still be able to connect to older 
versions.


- Heikki





Re: [HACKERS] Supporting huge pages on Windows

2017-04-07 Thread Magnus Hagander
On Wed, Apr 5, 2017 at 9:15 AM, Tsunakawa, Takayuki <
tsunakawa.ta...@jp.fujitsu.com> wrote:

> From: pgsql-hackers-ow...@postgresql.org
> > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Andres Freund
> > As I asked before, why can't we delete all privs and add the explicitly
> > needed ones back (using AdjustTokenPrivileges)?
>
> I tried it with pg_ctl.c attached to an earlier mail today, i.e. delete
> all privs with CreateRestrictedToken(DISABLE_ALL_PRIVILEGE) and enable
> Lock Pages in Memory with AdjustTokenPrivileges().  But it didn't work;
> AdjustTokenPrivileges() failed to enable the priv.  It's probably that
> CreateRestrictedToken() deletes (unassigns?) the privs from the access
> token, so subsequent AdjustTokenPrivileges() can no longer enable the priv.
>
>
Once you have used CreateRestrictedToken(), you can no longer add
*anything* to it. It's not just that privileges are removed; there's a special flag
on the token that says it's restricted (can be checked with
IsTokenRestricted()).

I think what you'd need to do is enumerate what privileges the user has
*before* calling CreateRestrictedToken(), using GetTokenInformation(). And
then pass those into PrivilegesToDelete (except for
SeChangeNotifyPrivilege) in the call to CreateRestrictedToken(), instead of
using DISABLE_MAX_PRIVILEGE. (and add the privilege needed for huge pages
before you start that whole process -- that needs to be added in the token
used *before* we create the restricted one).

At least that's my guess from reading the docs and trying to remember :)
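
Roughly something like this, I think (an untested sketch pieced together from
the API documentation, not actual pg_ctl code; error handling is omitted, and
keeping the lock-memory privilege out of the delete list is my reading of
what's needed for it to survive):

#include <windows.h>
#include <stdlib.h>

static HANDLE
create_restricted_token_keeping_lock_pages(void)
{
    HANDLE      procToken;
    HANDLE      restricted = NULL;
    TOKEN_PRIVILEGES enable;
    LUID        luidNotify,
                luidLock;
    DWORD       len = 0;
    TOKEN_PRIVILEGES *held;
    LUID_AND_ATTRIBUTES *todelete;
    DWORD       ndelete = 0;
    DWORD       i;

    OpenProcessToken(GetCurrentProcess(),
                     TOKEN_QUERY | TOKEN_ADJUST_PRIVILEGES | TOKEN_DUPLICATE,
                     &procToken);

    /* enable Lock Pages in Memory on the original token first */
    LookupPrivilegeValue(NULL, SE_LOCK_MEMORY_NAME, &luidLock);
    enable.PrivilegeCount = 1;
    enable.Privileges[0].Luid = luidLock;
    enable.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
    AdjustTokenPrivileges(procToken, FALSE, &enable, 0, NULL, NULL);

    /* enumerate what the token currently holds */
    GetTokenInformation(procToken, TokenPrivileges, NULL, 0, &len);
    held = (TOKEN_PRIVILEGES *) malloc(len);
    GetTokenInformation(procToken, TokenPrivileges, held, len, &len);

    /* delete everything except SeChangeNotifyPrivilege and the lock privilege */
    LookupPrivilegeValue(NULL, SE_CHANGE_NOTIFY_NAME, &luidNotify);
    todelete = (LUID_AND_ATTRIBUTES *)
        malloc(held->PrivilegeCount * sizeof(LUID_AND_ATTRIBUTES));
    for (i = 0; i < held->PrivilegeCount; i++)
    {
        LUID        l = held->Privileges[i].Luid;

        if ((l.LowPart == luidNotify.LowPart && l.HighPart == luidNotify.HighPart) ||
            (l.LowPart == luidLock.LowPart && l.HighPart == luidLock.HighPart))
            continue;
        todelete[ndelete++] = held->Privileges[i];
    }

    /* explicit delete list instead of DISABLE_MAX_PRIVILEGE */
    CreateRestrictedToken(procToken, 0,
                          0, NULL,          /* SIDs to disable */
                          ndelete, todelete,    /* privileges to delete */
                          0, NULL,          /* SIDs to restrict */
                          &restricted);

    free(todelete);
    free(held);
    CloseHandle(procToken);
    return restricted;
}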

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256

2017-04-07 Thread Heikki Linnakangas

On 04/07/2017 05:30 AM, Michael Paquier wrote:

On Fri, Apr 7, 2017 at 2:47 AM, Heikki Linnakangas  wrote:

On 04/06/2017 08:42 PM, Heikki Linnakangas wrote:

There is for example this portion in the new tables:
+static const Codepoint prohibited_output_chars[] =
+{
+   0xD800, 0xF8FF, /* C.3, C.5 */

   - Start Table C.5 -
   D800-DFFF; [SURROGATE CODES]
   - End Table C.5 -
This indicates a range of values. Wouldn't it be better to split this
table in two, one for the range of codepoints and another one with the
single entries?


I considered that, but there are relatively few singular codepoints in
the tables, so it wouldn't save much space. In this patch, singular
codepoints are represented by a range like "0x3000, 0x3000".


I am really wondering if this should not reflect the real range
reported by the RFC. I understand that you have grouped things to save
a couple of bytes, but that would protect from any updates of the
codepoints within those ranges (unlikely to happen I agree).


It just means that there will be some more work required to apply the 
changes to the current lists. I constructed the lists manually to begin 
with, copy-pasting the lists from the RFC, and moving and merging 
entries by hand. I wouldn't mind doing that by hand again, if the lists 
change. But as you said, it seems unlikely that they would change any 
time soon.
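
For what it's worth, lookups against a table of inclusive ranges stay trivial
whichever representation is used; a sketch with made-up names (not the actual
SASLprep code):

#include <stdbool.h>
#include <stddef.h>

typedef struct
{
    unsigned int first;         /* first codepoint in the range, inclusive */
    unsigned int last;          /* last codepoint in the range, inclusive */
} codepoint_range;

/* singular codepoints are just ranges with first == last, e.g. {0x3000, 0x3000} */
static const codepoint_range prohibited[] =
{
    {0xD800, 0xF8FF},           /* C.3, C.5 */
    {0x3000, 0x3000}
};

static bool
codepoint_is_prohibited(unsigned int cp)
{
    size_t      i;

    for (i = 0; i < sizeof(prohibited) / sizeof(prohibited[0]); i++)
    {
        if (cp >= prohibited[i].first && cp <= prohibited[i].last)
            return true;
    }
    return false;
}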



You may want to add a .gitignore in src/common/unicode for norm_test
and norm_test_table.h.


Added, and pushed, with some more comment fixes.

Many thanks, Michael!

- Heikki





Re: [HACKERS] SCRAM authentication, take three

2017-04-07 Thread Heikki Linnakangas

On 04/07/2017 08:21 AM, Noah Misch wrote:

On Thu, Apr 06, 2017 at 09:46:29PM +0300, Heikki Linnakangas wrote:

On 04/06/2017 08:36 AM, Noah Misch wrote:

On Tue, Mar 07, 2017 at 02:36:13PM +0200, Heikki Linnakangas wrote:

I didn't include the last-minute changes to the way you specify this in
pg_hba.conf. So it's still just "scram". I agree in general that we should
think about how to extend that too, but I think the proposed syntax was
overly verbose for what we actually support right now. Let's discuss that as
a separate thread, as well.


[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.


I don't think we will come up with anything better than what we have now, so
I have removed this from the open items list.


Michael shared[1] better pg_hba.conf syntax on 2016-11-05.  I agreed[2] with
his framing of the problem and provided two syntax alternatives, on
2017-01-18.  Michael implemented[3] a variation of one of those on 2017-02-20,
which you declined in your 2017-03-07 commit with just the explanation quoted
above.  I say Michael came up with something better five months ago.


OK. My feeling is that we should have a relatively short and 
easy-to-pronounce name for it. People editing pg_hba.conf with a text 
editor will need to type in the keyword, and "scram" is a lot easier to 
remember than "scram-sha-256". The word will also be used in 
conversations, "hey, Noah, can you add example.com to the hba file, with 
scram, please?" If md5 had a more difficult name, I think we would've 
come up with a shorthand for it back in the day, too.


I might be wrong, of course. I don't set up PostgreSQL installations for 
a living, so I might be out of touch of what's important.



Reserving, as HEAD does today, keyword "scram" to mean "type of SCRAM we
introduced first" will look ugly in 2027.  Cryptographic hash functions have a
short shelf life compared to PostgreSQL.


I don't think that's such a big deal. Firstly, I don't think it would be 
too bad for "scram" to mean "the type of SCRAM we introduced first". 
Secondly, we can add an alias later, if we add support for a new 
mechanism in the SCRAM family.


Our MD5 authentication method was introduced in 2001, I expect 
SCRAM-SHA-256 to also last about 15 years before we consider replacing 
it. Note that the big problem with our MD5 authentication is not 
actually the hash algorithm. There are still no practical pre-image 
attacks on MD5, even though it's thoroughly broken for collisions. If we 
had "SCRAM-MD5", it would still be OK. So I'd hazard a guess that 
whatever will eventually replace SCRAM-SHA-256, will not be SCRAM with a 
different hash algorithm, but something else entirely.


The channel binding aspect is actually more important to think about 
right now, as that we will hopefully implement in the next release or two.


In [1], Michael wrote:

There is also the channel binding to think about... So we could have a
list of keywords perhaps associated with SASL? Imagine for example:
sasl $algo,$channel_binding
Giving potentially:
sasl scram_sha256
sasl scram_sha256,channel
sasl scram_sha512
sasl scram_sha512,channel
In the case of the patch of this thread just the first entry would
make sense, once channel binding support is added a second
keyword/option could be added. And there are of course other methods
that could replace SCRAM..


It should also be possible to somehow specify "use channel binding, if 
the client supports it".


I don't think "sasl" is interesting to a user, it's the actual 
mechanisms (e.g "scram-sha256") that matter. So I'd suggest that we 
allow a list of algorithms in the method field. If we go with the longer 
"scram-sha-256" name, it would look like this:


# TYPE  DATABASE  USER  ADDRESS      METHOD
host    all       all   example.com  scram-sha-256-plus, scram-sha-256


The problem again is that those names are quite long. Is that OK?

In [2], you wrote:

The latest versions document this precisely, but I agree with Peter's concern
about plain "scram".  Suppose it's 2025 and PostgreSQL support SASL mechanisms
OAUTHBEARER, SCRAM-SHA-256, SCRAM-SHA-256-PLUS, and SCRAM-SHA3-512.  What
should the pg_hba.conf options look like at that time?  I don't think having a
single "scram" option fits in such a world.  I see two strategies that fit:

1. Single "sasl" option, with a GUC, similar to ssl_ciphers, controlling the
   mechanisms to offer.
2. Separate options "scram_sha_256", "scram_sha3_512", "oauthbearer", etc.


The example I gave above is like option 2. The problem with option 1 is 
that different SASL mechanisms can have very different properties. You 
might want to allow "NTLM" from a trusted network, but require "OTP" 
from the public internet, for example. So I don't think a single GUC 
would be flexible enough.
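
For example, with hypothetical method names used purely to illustrate the 
point, one might want per-line control along these lines:

# TYPE  DATABASE  USER  ADDRESS     METHOD
# trusted internal network:
host    all       all   10.0.0.0/8  ntlm
# everything else:
host    all       all   0.0.0.0/0   otp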


That said, a GUC with a more narrow scope might be OK. If we 

Re: [HACKERS] SCRAM authentication, take three

2017-04-07 Thread Magnus Hagander
Jumping in late on this one; apologies if these opinions have already been
brought up and discarded.

On Fri, Apr 7, 2017 at 9:28 AM, Heikki Linnakangas  wrote:

> On 04/07/2017 08:21 AM, Noah Misch wrote:
>
>> On Thu, Apr 06, 2017 at 09:46:29PM +0300, Heikki Linnakangas wrote:
>>
>>> On 04/06/2017 08:36 AM, Noah Misch wrote:
>>>
 On Tue, Mar 07, 2017 at 02:36:13PM +0200, Heikki Linnakangas wrote:

> I didn't include the last-minute changes to the way you specify this in
> pg_hba.conf. So it's still just "scram". I agree in general that we
> should
> think about how to extend that too, but I think the proposed syntax was
> overly verbose for what we actually support right now. Let's discuss
> that as
> a separate thread, as well.
>

 [Action required within three days.  This is a generic notification.]

 The above-described topic is currently a PostgreSQL 10 open item.

>>>
>>> I don't think we will come up with anything better than what we have
>>> now, so
>>> I have removed this from the open items list.
>>>
>>
>> Michael shared[1] better pg_hba.conf syntax on 2016-11-05.  I agreed[2]
>> with
>> his framing of the problem and provided two syntax alternatives, on
>> 2017-01-18.  Michael implemented[3] a variation of one of those on
>> 2017-02-20,
>> which you declined in your 2017-03-07 commit with just the explanation
>> quoted
>> above.  I say Michael came up with something better five months ago.
>>
>
> OK. My feeling is that we should have a relatively short and
> easy-to-pronounce name for it. People editing pg_hba.conf with a text
> editor will need to type in the keyword, and "scram" is a lot easier to
> remember than "scram-sha-256". The word will also be used in conversations,
> "hey, Noah, can you add example.com to the hba file, with scram, please?"
> If md5 had a more difficult name, I think we would've come up with a
> shorthand for it back in the day, too.
>
> I might be wrong, of course. I don't set up PostgreSQL installations for a
> living, so I might be out of touch of what's important.
>
> Reserving, as HEAD does today, keyword "scram" to mean "type of SCRAM we
>> introduced first" will look ugly in 2027.  Cryptographic hash functions
>> have a
>> short shelf life compared to PostgreSQL.
>>
>
> I don't think that's such a big deal. Firstly, I don't think it would be
> too bad for "scram" to mean "the type of SCRAM we introduced first".
> Secondly, we can add an alias later, if we add support for a new mechanism
> in the SCRAM family.
>
> Our MD5 authentication method was introduced in 2001, I expect
> SCRAM-SHA-256 to also last about 15 years before we consider replacing it.
> Note that the big problem with our MD5 authentication is not actually the
> hash algorithm. There are still no practical pre-image attacks on MD5, even
> though it's thoroughly broken for collisions. If we had "SCRAM-MD5", it
> would still be OK. So I'd hazard a guess that whatever will eventually
> replace SCRAM-SHA-256, will not be SCRAM with a different hash algorithm,
> but something else entirely.
>

So here's a wild idea. What if we just call it "sha256"? Does the user
actually care about it being scram, or is scram just an implementation
detail for them? That way when the next one shows up, it'll be sha512 or
whatever. It happens to use scram under the hood, but does the user have to
or does the user want to care about that?

(One could argue the same way that the user shouldn't have to or want to
care about the hashing algorithm -- but if that's the case then we should
only have one entry, it would be "scram", and the system would decide from
there. And I think this discussion already indicates we don't think this is
enough)




>
> The channel binding aspect is actually more important to think about right
> now, as that we will hopefully implement in the next release or two.
>
> In [1], Michael wrote:
>
>> There is also the channel binding to think about... So we could have a
>> list of keywords perhaps associated with SASL? Imagine for example:
>> sasl $algo,$channel_binding
>> Giving potentially:
>> sasl scram_sha256
>> sasl scram_sha256,channel
>> sasl scram_sha512
>> sasl scram_sha512,channel
>> In the case of the patch of this thread just the first entry would
>> make sense, once channel binding support is added a second
>> keyword/option could be added. And there are of course other methods
>> that could replace SCRAM..
>>
>
> It should also be possible to somehow specify "use channel binding, if the
> client supports it".
>

Is that really a type of authentication? We already have the idea of
authentication method options, used for most other things except md5 which
doesn't have any. So it could be "sha256 channelbind=on", "sha256
channelbind=off" or "sha256 channelbind=negotiate" or something like that?



> I don't think "sasl" is interesting to a user, it's the actual mechanisms
> (e.g "scram-sha256") that 

Re: [HACKERS] Letting the client choose the protocol to use during a SASL exchange

2017-04-07 Thread Heikki Linnakangas

On 04/06/2017 11:05 PM, Tom Lane wrote:

Perhaps we could turn this around: have the client send (in the connection
request packet) a list of auth protocols it thinks it is able to handle.
(I'm envisioning this as being more or less fixed for any one version of
any one client, since it would basically mean "I have code to do X, Y, or
Z".)  Then the server can pick one that is allowed by pg_hba.conf, or it
can just ignore the list and send what it wants anyway, probably leading
to client disconnect.


That list of supported authentication methods would need to be included 
in the startup message. Unfortunately, there is no way to add options to 
the startup message, without breaking compatibility with old servers. If 
there is an option in the startup message that the server doesn't 
understand, it will treat it as a GUC, and you get an "unrecognized 
configuration parameter" after authentication.


It would be nice to change that, so that the server would ignore 
parameters that it doesn't understand that begin with "optional_" 
prefix, for example. But it won't help us right now.


- Heikki





[HACKERS] pg_export_snapshot doc

2017-04-07 Thread Tatsuo Ishii
pg_export_snapshot() cannot be used during recovery (i.e. on standby
servers), but it's not documented. IMO this is a bug and should be
fixed. Patch attached.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index cb0a36a..9923e67 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18778,6 +18778,7 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup());
 Snapshots are exported with the pg_export_snapshot function,
 shown in , and
 imported with the  command.
+This function cannot be used during recovery.

 




Re: [HACKERS] SCRAM authentication, take three

2017-04-07 Thread Álvaro Hernández Tortosa



On 07/04/17 11:05, Magnus Hagander wrote:
On Fri, Apr 7, 2017 at 9:59 AM, Heikki Linnakangas wrote:


On 04/07/2017 10:38 AM, Magnus Hagander wrote:

So here's a wild idea. What if we just call it "sha256"? Does the user
actually care about it being scram, or is scram just an implementation
detail for them? That way when the next one shows up, it'll be sha512 or
whatever. It happens to use scram under the hood, but does the user have to
or does the user want to care about that?

(One could argue the same way that the user shouldn't have to or want to
care about the hashing algorithm -- but if that's the case then we should
only have one entry, it would be "scram", and the system would decide from
there. And I think this discussion already indicates we don't think this is
enough)


I think the "SCRAM" part is more important than "SHA-256", so -1 on that.


If that is the important part, then I agree :) I am not entirely sure 
that the scram part *is* more important though.


I agree it is much more important. Needed, I'd say. "SHA-256" could 
refer to other mechanisms that just simply hash the value (maybe with a 
salt, or not) with that hash algorithm. SCRAM is a different beast, with 
much more functionality than that. So yes, it matters a lot :)




I think most users will be a lot more comfortable with "sha256" than 
"scram" though. But I guess that says using scram-sha-256 is the 
correct way.


I don't like UPPERCASE, but the RFC links to the IANA registry 
where SCRAM methods are all uppercase and with dashes: SCRAM-SHA-256 and 
SCRAM-SHA-256-PLUS. I'd use those names, they are standardized.





The main argument against using just "scram" is that it's misleading,
because we implement SCRAM-SHA-256, rather than SCRAM-SHA-1, which
was the first SCRAM mechanism, commonly called just SCRAM. As long
as that's the only SCRAM variant we have, that's not too bad, but
it becomes more confusing if we ever implement SCRAM-SHA-512 or
SCRAM-something-else in the future. That's the point Noah made,
and it's a fair point, but the question is whether we consider
that to be more important than having a short name for what we
have now.


Yeah, I agree we should be prepared for the future. And having "scram" 
and "scram-sha-512" would definitely fall under confusing.


The channel binding aspect is actually more important to think about
right now, as that we will hopefully implement in the next release or two.

In [1], Michael wrote:

There is also the channel binding to think about... So we could have a
list of keywords perhaps associated with SASL? Imagine for example:
sasl $algo,$channel_binding
Giving potentially:
sasl scram_sha256
sasl scram_sha256,channel
sasl scram_sha512
sasl scram_sha512,channel
In the case of the patch of this thread just the first entry would
make sense, once channel binding support is added a second
keyword/option could be added. And there are of course other methods
that could replace SCRAM..


It should also be possible to somehow specify "use channel binding, if the
client supports it".


Is that really a type of authentication? We already have the idea of
authentication method options, used for most other things except md5 which
doesn't have any. So it could be "sha256 channelbind=on", "sha256
channelbind=off" or "sha256 channelbind=negotiate" or something like that?


> Technically, the channel-binding variant is a separate SASL mechanism,
> i.e. it has a separate name, SCRAM-SHA-256-PLUS. I'm not sure if
> users/admins think of it that way.



I bet they don't.


Probably. But let's not underestimate channel binding: it is the 
"greatest" feature of SCRAM, and where security really shines. I'd 
encourage the use of channel binding and would definitely make it explicit.


As for the options, there's no way to negotiate; the client picks. 
It could still be three-valued: on, off, only-channel-binding (or 
whatever you want to call them). That will only change what mechanisms 
the server will be advertising to clients.




Álvaro



--

Álvaro Hernández Tortosa





[HACKERS] ExecPrepareExprList and per-query context

2017-04-07 Thread Amit Langote
As of b8d7f053c5c, ExecPrepareExprList is (must be?) used instead of
ExecPrepareExpr when the caller wants to initialize expressions in a list,
for example, FormIndexDatum.  ExecPrepareExpr doesn't require the caller
to have switched to per-query context, because it itself will.  Same is
not however true for the new ExecPrepareExprList.  That means the List
node that it creates might be in a context that is not necessarily
per-query context, where it previously would be.  That breaks third-party
users of FormIndexDatum that rely on the list to have been created in
per-query context (pg_bulkload was broken by this).

Should ExecPrepareExprList also switch to estate->es_query_cxt?  Or maybe
ExecPrepareExpr could itself detect that the passed-in node is a List and
create the list of ExprState nodes by itself.  I guess the reason for
separating the list case is that ExecInitExpr() does not take Lists anymore.
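
A sketch of the first option, mirroring what ExecPrepareExpr itself does
(illustrative only, not a tested patch):

List *
ExecPrepareExprList(List *nodes, EState *estate)
{
	List	   *result = NIL;
	ListCell   *lc;
	MemoryContext oldcontext;

	/* Ensure the List cells, not just the ExprStates, go into per-query memory */
	oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);

	foreach(lc, nodes)
	{
		Expr	   *e = (Expr *) lfirst(lc);

		result = lappend(result, ExecPrepareExpr(e, estate));
	}

	MemoryContextSwitchTo(oldcontext);

	return result;
}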

Thanks,
Amit






Re: [HACKERS] Declarative partitioning - another take

2017-04-07 Thread Etsuro Fujita

On 2016/12/14 16:20, Etsuro Fujita wrote:

On 2016/12/09 19:46, Maksim Milyutin wrote:

I would like to work on two tasks:
 - insert (and eventually update) tuple routing for foreign partition.
 - the ability to create an index on the parent and have all of the
children inherit it;

The first one has been implemented in pg_pathman somehow, but the code
relies on dirty hacks, so the FDW API has to be improved. As for the
extended index support, it doesn't look like a super-hard task.



That would be great!  I'd like to help review the first one.


There seems to be no work on the first one, so I'd like to work on that.

Best regards,
Etsuro Fujita






[HACKERS] Performance issue with postgres9.6

2017-04-07 Thread Prakash Itnal
Hello,

We currently use postgres 9.3 in our products. Recently we upgraded to
postgres 9.6. But with 9.6 we have seen a drastic reduction in throughput.
After analyzing carefully, I found that "planner time" in 9.6 is very high.
Below are the details:

Scenario:
1 Create a table with 10 rows.
2 Execute simple query: select * from subscriber where s_id = 100;
3 No update/delete/insert; tried vacuum, full vacuum; by default we enable
auto-vacuum

9.3: Avg of "Total runtime" : *0.24ms* [actual throughput: *650 TPS*]
9.6: Avg of Total time: *0.56ms* (Avg of "Planning time" : 0.38ms + Avg of
"Execution time" : 0.18ms) [actual throughput: *80 TPS*]

Check the attachments for more details.

Below is the configuration setting. Full configuration can be found in
attachment.
shared_buffers = 128MB
effective_cache_size = 256MB

Note that we use master-slave (one master - one slave) setup. I could see
no difference even when I take out slave.

I tried all possibilities of increasing shared memory, maintenance_work_mem,
asynchronous commit, etc., but nothing showed any major improvement. Kindly
help identify what is missing!

PS: We use postgres for small scale so the values are less. The size of the
DB is also just around 180MB.

-- 
Cheers,
Prakash
psql (9.3.14)
Type "help" for help.

perftestdb=# select count(*) from subscriber ;
 count  

 10
(1 row)

perftestdb=# \d subscriber
Table "public.subscriber"
Column| Type  | Modifiers 
--+---+---
 s_id | integer   | not null
 sub_nbr  | character varying(15) | not null
 bit_1| smallint  | 
 bit_2| smallint  | 
 bit_3| smallint  | 
 bit_4| smallint  | 
 bit_5| smallint  | 
 bit_6| smallint  | 
 bit_7| smallint  | 
 bit_8| smallint  | 
 bit_9| smallint  | 
 bit_10   | smallint  | 
 hex_1| smallint  | 
 hex_2| smallint  | 
 hex_3| smallint  | 
 hex_4| smallint  | 
 hex_5| smallint  | 
 hex_6| smallint  | 
 hex_7| smallint  | 
 hex_8| smallint  | 
 hex_9| smallint  | 
 hex_10   | smallint  | 
 byte2_1  | smallint  | 
 byte2_2  | smallint  | 
 byte2_3  | smallint  | 
 byte2_4  | smallint  | 
 byte2_5  | smallint  | 
 byte2_6  | smallint  | 
 byte2_7  | smallint  | 
 byte2_8  | smallint  | 
 byte2_9  | smallint  | 
 byte2_10 | smallint  | 
 msc_location | integer   | 
 vlr_location | integer   | 
Indexes:
"subscriber_pkey" PRIMARY KEY, btree (s_id)
"subscriber_by_sub_nbr" UNIQUE, btree (sub_nbr)
Referenced by:
TABLE "access_info" CONSTRAINT "access_info_s_id_fkey" FOREIGN KEY (s_id) 
REFERENCES subscriber(s_id)
TABLE "special_facility" CONSTRAINT "special_facility_s_id_fkey" FOREIGN 
KEY (s_id) REFERENCES subscriber(s_id)

perftestdb=#  explain analyze select * from subscriber where s_id = 100;
 QUERY PLAN 
 
-
 Index Scan using subscriber_pkey on subscriber  (cost=0.29..8.31 rows=1 
width=88) (actual time=0.049..0.055 rows=1 loops=1)
   Index Cond: (s_id = 100)
 Total runtime: 0.231 ms
(3 rows)

perftestdb=#  explain analyze select * from subscriber where s_id = 100;
 QUERY PLAN 
 
-
 Index Scan using subscriber_pkey on subscriber  (cost=0.29..8.31 rows=1 
width=88) (actual time=0.059..0.066 rows=1 loops=1)
   Index Cond: (s_id = 100)
 Total runtime: 0.246 ms
(3 rows)

perftestdb=#  explain analyze select * from subscriber where s_id = 100;
 QUERY PLAN 
 
-
 Index Scan using subscriber_pkey on subscriber  (cost=0.29..8.31 rows=1 
width=88) (actual time=0.059..0.066 rows=1 loops=1)
   Index Cond: (s_id = 100)
 Total runtime: 0.249 ms
(3 rows)

perftestdb=#  explain analyze select * from subscriber where s_id = 100;
 QUERY PLAN 

Re: [HACKERS] Declarative partitioning - another take

2017-04-07 Thread Maksim Milyutin

On 07.04.2017 13:05, Etsuro Fujita wrote:

On 2016/12/14 16:20, Etsuro Fujita wrote:

On 2016/12/09 19:46, Maksim Milyutin wrote:

I would like to work on two tasks:
 - insert (and eventually update) tuple routing for foreign partition.
 - the ability to create an index on the parent and have all of the
children inherit it;

The first one has been implemented in pg_pathman somehow, but the code
relies on dirty hacks, so the FDW API has to be improved. As for the
extended index support, it doesn't look like a super-hard task.



That would be great!  I'd like to help review the first one.


There seems to be no work on the first one, so I'd like to work on that.


Yes, you can start to work on this, I'll join later as a reviewer.



--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] src/interfaces/libpq shipping nmake-related Makefiles

2017-04-07 Thread Magnus Hagander
On Fri, Apr 7, 2017 at 6:29 AM, Tom Lane  wrote:

> Andres Freund  writes:
> > On 2017-04-07 13:07:59 +0900, Michael Paquier wrote:
> >> On Fri, Apr 7, 2017 at 1:01 PM, Tom Lane  wrote:
> >>> Still, it's not very clear why we need to cater for building just libpq
> >>> rather than the whole distribution, and a user of win32.mak presumably
> >>> has the option to do the latter.
>
> >> Indeed. Those recent reports indicate that removing win32.c would be a
> >> bad move.
>
> > For me they indicate the contrary, that we're currently not properly
> > maintaining it so that longstanding errors crop up.
>


Is it broken in HEAD only or also in 9.6?

I think whoever uses win32.mak to build is quite unlikely to be tracking
HEAD. They would notice at release time. (Since we don't have a buildfarm
animal doing it)



> Yeah.  For win32.mak, the key question is whether there is still anybody
> who'd have an insurmountable problem with building the whole distro via
> src/tools/msvc/ rather than just building libpq with win32.mak.  Given our
> lack of infrastructure for testing win32.mak, continuing support for it
> seems like quite an expensive proposition from the developer-time
> standpoint.  I don't really want to do that if it's only going to save
> somebody an occasional few minutes of build time.
>


Insurmountable, probably not. The big difference is that you don't need
*any* dependencies to build a libpq using win32.mak, but you need many of
them (to start with, perl...) to build using the built-in one. For people
who want to build it, it certainly saves a lot more than "a few minutes".
For somebody who doesn't have ready scripts it takes a *long* time to set
up a build environment to do our regular msvc builds.

I think the question is more, is there any need for people to do that at
all, or are those people just going to be using the pre-built binaries
anyway? That question I can't say I know the answer to.


bcc32.mak is in a different category because it's basically the only
> solution if you want to build libpq in Borland C.  But the lack of
> user input suggests that maybe nobody cares about that anymore.
>

I think there's always been even fewer users of bcc32.mak than of the
win32.mak one.



> Borland C, per se, has been dead since the 90s according to wikipedia.
> There are successor products with different names, none of which I can
> recall anybody ever mentioning on the PG lists.  I speculate that
> people are taking libpq.dll built with MSVC and using it in those
> products, if they're using them with PG at all.
>

The compiler is still called bcc (bcc32c to be specific - see
https://www.embarcadero.com/free-tools/ccompiler).  Apparently it's clang
based now. I don't know if our mak file even works with that anymore
though, it wouldn't surprise me if it doesn't. But the non-change-of-name
could be why we're not seeing questions about it.

FWIW, I've suggested we drop it before, so no objections to that part from
me (and if we do, there's some #ifdefs around it in headers as well).

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


[HACKERS] [PATCH] Document the order of changing certain settings when using hot-standby servers

2017-04-07 Thread Yorick Peterse
The attached patch updates the hot-standby documentation (in the high
availability section) so it explicitly mentions that certain settings
need to be applied to servers in a particular order. For example, it
states that if you increase a certain setting (e.g. max_connections) you
need to do so on a primary first, before applying it to any secondaries.

Previously this was not explicitly mentioned and could lead to one
thinking they can just apply the setting to all servers in parallel,
resulting in standby servers refusing to start.

The exact phrasing currently used in the patch may be a bit rough, I'm
open to suggestions on how to best improve the writing.

The patch is based on the master branch and applies cleanly to it.

Yorick
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 51359d6236..bbb0e1aab2 100644
*** a/doc/src/sgml/high-availability.sgml
--- b/doc/src/sgml/high-availability.sgml
***
*** 2098,2104  LOG:  database system is ready to accept read only connections
  be equal to or greater than the value on the primary. If these parameters
  are not set high enough then the standby will refuse to start.
  Higher values can then be supplied and the server
! restarted to begin recovery again.  These parameters are:
  

 
--- 2098,2108 
  be equal to or greater than the value on the primary. If these parameters
  are not set high enough then the standby will refuse to start.
  Higher values can then be supplied and the server
! restarted to begin recovery again. If you want to increase these values you
! should do so on the primary first, before applying the changes to any
! standby servers. If you instead want to decrease these values you should do
! so on the standbys first, before applying the changes to the primary. These
! parameters are:
  

 



Re: [HACKERS] [PATCH] Document the order of changing certain settings when using hot-standby servers

2017-04-07 Thread Aleksander Alekseev
Hi Yorick,

> The attached patch updates the hot-standby documentation (in the high
> availability section) so it explicitly mentions that certain settings
> need to be applied to servers in a particular order. For example, it
> states that if you increase a certain setting (e.g. max_connections)
> you need to do so on a primary first, before applying it to any
> secondaries.

I'm sorry to inform you that your description of max_connections is,
let's say, not quite accurate. I've just increased max_connections on a
standby without doing anything on a master and nothing wrong happened.

-- 
Best regards,
Aleksander Alekseev




Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256

2017-04-07 Thread Michael Paquier
On Fri, Apr 7, 2017 at 8:58 PM, Heikki Linnakangas  wrote:
> On 04/07/2017 05:30 AM, Michael Paquier wrote:
>> I am really wondering if this should not reflect the real range
>> reported by the RFC. I understand that you have grouped things to save
>> a couple of bytes, but that would protect from any updates of the
>> codepoints within those ranges (unlikely to happen I agree).
>
> It just means that there will be some more work required to apply the
> changes to the current lists. I constructed the lists manually to begin
> with, copy-pasting the lists from the RFC, and moving and merging entries by
> hand. I wouldn't mind doing that by hand again, if the lists change. But as
> you said, it seems unlikely that they would change any time soon.

Yeah, I don't mind either. That's simple enough to change should that happen.

>> You may want to add a .gitignore in src/common/unicode for norm_test
>> and norm_test_table.h.
>
> Added, and pushed, with some more comment fixes.

Nice. There are still a couple of important items pending for SCRAM,
so I would think it is better not to do the refactoring now (but
rework it in PG11) and instead polish the documentation a bit more. Your
thoughts on that?
-- 
Michael




Re: [HACKERS] [PATCH] Document the order of changing certain settings when using hot-standby servers

2017-04-07 Thread Stephen Frost
Aleksander, Yorick,

* Aleksander Alekseev (a.aleks...@postgrespro.ru) wrote:
> > The attached patch updates the hot-standby documentation (in the high
> > availability section) so it explicitly mentions that certain settings
> > need to be applied to servers in a particular order. For example, it
> > states that if you increase a certain setting (e.g. max_connections)
> > you need to do so on a primary first, before applying it to any
> > secondaries.
> 
> I'm sorry to inform you that your description of max_connections is,
> let's say, not quite accurate. I've just increased max_connections on a
> standby without doing anything on the master, and nothing went wrong.

Right, the logic there is reversed: reduction has to be done on the
primary first and then WAL replayed on the replica, while increasing has
to be done on the secondary first and then on the primary.

I do think that we should add the (correct!) information into the docs
explicitly, perhaps even as a 'Note', since it can be quite confusing
otherwise.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCH] Document the order of changing certain settings when using hot-standby servers

2017-04-07 Thread Yorick Peterse
Ha! It seems I indeed had it the other way around. I suppose that's what
happens when writing a patch late at night. Somewhat ironically I did
have the order correct in my Git commit message.

Attached is an updated version of the patch that corrects the order in
the documentation.

Yorick
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 51359d6236..434afe5d43 100644
*** a/doc/src/sgml/high-availability.sgml
--- b/doc/src/sgml/high-availability.sgml
***
*** 2098,2104  LOG:  database system is ready to accept read only connections
  be equal to or greater than the value on the primary. If these parameters
  are not set high enough then the standby will refuse to start.
  Higher values can then be supplied and the server
! restarted to begin recovery again.  These parameters are:
  

 
--- 2098,2108 
  be equal to or greater than the value on the primary. If these parameters
  are not set high enough then the standby will refuse to start.
  Higher values can then be supplied and the server
! restarted to begin recovery again. If you want to increase these values you
! should do so on any standby servers first, before applying the changes to
! the primary. If you instead want to decrease these values you should do so
! on the primary first, before applying the changes to any standby servers.
! These parameters are:
  

 



Re: [HACKERS] src/interfaces/libpq shipping nmake-related Makefiles

2017-04-07 Thread Heikki Linnakangas

On 04/07/2017 02:00 PM, Magnus Hagander wrote:

On Fri, Apr 7, 2017 at 6:29 AM, Tom Lane  wrote:


Yeah.  For win32.mak, the key question is whether there is still anybody
who'd have an insurmountable problem with building the whole distro via
src/tools/msvc/ rather than just building libpq with win32.mak.  Given our
lack of infrastructure for testing win32.mak, continuing support for it
seems like quite an expensive proposition from the developer-time
standpoint.  I don't really want to do that if it's only going to save
somebody an occasional few minutes of build time.


Insurmountable, probably not. The big difference is that you don't need
*any* dependencies to build a libpq using win32.mak, but you need many of
them (to start with, perl...) to build using the built-in one. For people
who want to build it, it certainly saves a lot more than "a few minutes".
For somebody who doesn't have ready scripts it takes a *long* time to set
up a build environment to do our regular msvc builds.

I think the question is more, is there any need for people to do that at
all, or are those people just going to be using the pre-built binaries
anyway? That question I can't say I know the answer to.


It does seem handy, if all you want is libpq. Clearly not many people 
use it, though.


I just tested it. After adding all the missing files to the makefile, 
I'm getting an error:



.\Release\libpq.dll.manifest : general error c1010070: Failed to load and parse
the manifest. The system cannot find the file specified.
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Windows Kits\10\bin\x86\mt.
XE"' : return code '0x1f'
Stop.
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio 14.
\VC\BIN\nmake.EXE"' : return code '0x2'
Stop.


This seems to be the same as the 2nd error that was reported back in 2013: 
https://www.postgresql.org/message-id/CAJ2%3DPVQcW8UGNnSy%3DOw%3DvUK2zpjowTkzUS1B864REa7LOT140Q%40mail.gmail.com.


Despite the failure, it built a libpq.dll file, and it seems to work. I 
have no idea what the manifest file is.


I could easily add the missing files to win32.mak, but unless someone 
else steps up to the plate and fixes the manifest issue, I don't think 
we have much choice but to remove it.


- Heikki





Re: [HACKERS] SCRAM authentication, take three

2017-04-07 Thread Michael Paquier
On Sat, Apr 8, 2017 at 1:59 AM, Robert Haas  wrote:
> On Fri, Apr 7, 2017 at 3:59 AM, Heikki Linnakangas  wrote:
>> I think the "SCRAM" part is more important than "SHA-256", so -1 on that.
>
> I agree.  The point here isn't that we're using a better hashing
> method, even if a lot of people *think* that's the point.  The point
> is we're using a modern algorithm that has nice properties like "you
> can't impersonate the client by stealing the verifier, or even by
> snooping the exchange".
>
> But "sasl" might be even better.

FWIW, my opinion has not changed much on the matter, I would still
favor "sasl" as the keyword used in pg_hba.conf. What has changed in
my mind though is that defining no mechanisms with an additional
option means that all possible choices are sent to the client. But if
you define a list of mechanisms, then we'll just send back to the
client the specified list as a possible choice of exchange mechanism:
host all all blah.com sasl mechanism=scram-sha-256-plus
Here for example the user would not be allowed to use SCRAM-SHA-256,
just SCRAM with channel binding.

Such an option makes sense once we add support for one more mechanism
in SASL, like channel binding, but that's by far a generic approach
that can serve us for years to come, and by admitting that nothing
listed means all possible options we don't need any immediate action.
-- 
Michael




Re: [HACKERS] monitoring.sgml missing tag

2017-04-07 Thread Peter Eisentraut
On 4/7/17 16:50, Andres Freund wrote:
> On 2017-04-07 22:47:55 +0200, Erik Rijkers wrote:
>> monitoring.sgml has one </row> tag missing
> 
> Is that actually an issue? SGML allows skipping certain close tags, and
> IIRC row is one of them.

The issue is a weird one.  For some reason, older tool chains complain
about this, but newer ones only complain about it when you use certain
warning options.  The mistake here was basically that the osx calls in
the makefile didn't enable those options, so users of newer tool chains
didn't see any complaints.  I have fixed that now.

For clarification, SGML allows applications of SGML to declare whether
they want to allow omitting tags.  HTML (<5) does so.  DocBook does not.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Improve 64bit atomics support.

2017-04-07 Thread Alvaro Herrera
Andres Freund wrote:
> Improve 64bit atomics support.
> 
> When adding atomics back in b64d92f1a, I added 64bit support as
> optional; there wasn't yet a direct user in sight.  That turned out to
> be a bit short-sighted, it'd already have been useful a number of times.

Seems like this killed an arapaima:
  
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=arapaima=2017-04-07%2022%3A06%3A59

Program terminated with signal 6, Aborted.
#0  0x00c6a402 in __kernel_vsyscall ()
#0  0x00c6a402 in __kernel_vsyscall ()
#1  0x00284b10 in raise () from /lib/libc.so.6
#2  0x00286421 in abort () from /lib/libc.so.6
#3  0x084d967e in ExceptionalCondition (
conditionName=0xe19dac "(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & 
~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", 
errorType=0xe19831 "UnalignedPointer", 
fileName=0xe19d88 "../../../src/include/port/atomics.h", lineNumber=428)
at assert.c:54
#4  0x00e189b0 in pg_atomic_init_u64 ()
at ../../../src/include/port/atomics.h:428
#5  test_atomic_uint64 () at regress.c:1007
#6  0x00e1905d in test_atomic_ops (fcinfo=0x9362584) at regress.c:1097
#7  0x08273ab2 in ExecInterpExpr (state=0x9362510, econtext=0x93622e0, 
isnull=0xbfbc1a4b "") at execExprInterp.c:650
#8  0x082990a7 in ExecEvalExprSwitchContext (node=0x9362294)
at ../../../src/include/executor/executor.h:289
#9  ExecProject (node=0x9362294)
at ../../../src/include/executor/executor.h:323
#10 ExecResult (node=0x9362294) at nodeResult.c:132
#11 0x0827c6d5 in ExecProcNode (node=0x9362294) at execProcnode.c:416
#12 0x0827a08d in ExecutePlan (queryDesc=0x92daf38, 
direction=ForwardScanDirection, count=0, execute_once=1 '\001')
at execMain.c:1651
#13 standard_ExecutorRun (queryDesc=0x92daf38, 
direction=ForwardScanDirection, count=0, execute_once=1 '\001')
at execMain.c:360
#14 0x083cd8bb in PortalRunSelect (portal=0x92d78a8, 
forward=, count=0, dest=0x9337728) at pquery.c:933
#15 0x083cef1e in PortalRun (portal=0x92d78a8, count=2147483647, 
isTopLevel=1 '\001', run_once=1 '\001', dest=0x9337728, 
altdest=0x9337728, completionTag=0xbfbc1c6a "") at pquery.c:774
#16 0x083cb38f in exec_simple_query (
query_string=0x93360b0 "SELECT test_atomic_ops();") at postgres.c:1105
#17 0x083cc640 in PostgresMain (argc=1, argv=0x92e4160, 
dbname=0x92e3fe0 "regression", username=0x92e3fc4 "postgres")
at postgres.c:4075
#18 0x08349eaf in BackendStartup () at postmaster.c:4317
#19 ServerLoop () at postmaster.c:1729
#20 0x0834db50 in PostmasterMain (argc=8, argv=0x92b9968) at postmaster.c:1337
#21 0x082c01e2 in main (argc=8, argv=Cannot access memory at address 0x5aa5
) at main.c:228


-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Interval for launching the table sync worker

2017-04-07 Thread Peter Eisentraut
On 4/7/17 01:10, Masahiko Sawada wrote:
> It's not critical but it could be problem. So I thought we should fix
> it before the PostgreSQL 10 release. If it's not appropriate as an
> open item I'll remove it.

You wrote that you "sent" a patch, but I don't see a patch anywhere.

I think a nonintrusive patch for this could be considered.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WAL logging problem in 9.4.3?

2017-04-07 Thread Alvaro Herrera
Kyotaro HORIGUCHI wrote:

> The attached patch is quiiiccck-and-dirty-hack of Michael's patch
> just as a PoC of my proposal quoted above. This also passes the
> 006 test.  The major changes are the following.
> 
> - Moved sync_above and truncted_to into  RelationData.

Interesting.  I wonder if it's possible that a relcache invalidation
would cause these values to get lost for some reason, because that would
be dangerous.

I suppose the rationale is that this shouldn't happen because any
operation that does things this way must hold an exclusive lock on the
relation.  But that doesn't guarantee that the relcache entry is
completely stable, does it?  If we can get proof of that, then this
technique should be safe, I think.

In your version of the patch, which I spent some time skimming, I am
missing comments on various functions.  I added some as I went along,
including one XXX indicating it must be filled.

RecordPendingSync() should really live in relcache.c (and probably get a
different name).

> X I feel that I have dropped one of the features of the original
>   patch during the hack, but I don't recall it clearly now:(

Hah :-)

> X I haven't considered relfilenode replacement, which didn't matter
>   for the original patch. (but there's few places to consider).

Hmm ...  Please provide.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0c3e2b0..aa1b97d 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -34,6 +34,28 @@
  *   the POSTGRES heap access method used for all POSTGRES
  *   relations.
  *
+ * WAL CONSIDERATIONS
+ *   All heap operations are normally WAL-logged, but there are a few
+ *   exceptions. Temporary and unlogged relations never need to be
+ *   WAL-logged, but we can also skip WAL-logging for a table that was
+ *   created in the same transaction, if we don't need WAL for PITR or
+ *   WAL archival purposes (i.e. if wal_level=minimal), and we fsync()
+ *   the file to disk at COMMIT instead.
+ *
+ *   The same-relation optimization is not employed automatically on all
+ *   updates to a table that was created in the same transaction, because
+ *   for a small number of changes, it's cheaper to just create the WAL
+ *   records than fsyncing() the whole relation at COMMIT. It is only
+ *   worthwhile for (presumably) large operations like COPY, CLUSTER,
+ *   or VACUUM FULL. Use heap_register_sync() to initiate such an
+ *   operation; it will cause any subsequent updates to the table to skip
+ *   WAL-logging, if possible, and cause the heap to be synced to disk at
+ *   COMMIT.
+ *
+ *   To make that work, all modifications to heap must use
+ *   HeapNeedsWAL() to check if WAL-logging is needed in this transaction
+ *   for the given block.
+ *
  *-
  */
 #include "postgres.h"
@@ -56,6 +78,7 @@
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/namespace.h"
+#include "catalog/storage.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "storage/bufmgr.h"
@@ -2356,12 +2379,6 @@ ReleaseBulkInsertStatePin(BulkInsertState bistate)
  * The new tuple is stamped with current transaction ID and the specified
  * command ID.
  *
- * If the HEAP_INSERT_SKIP_WAL option is specified, the new tuple is not
- * logged in WAL, even for a non-temp relation.  Safe usage of this behavior
- * requires that we arrange that all new tuples go into new pages not
- * containing any tuples from other transactions, and that the relation gets
- * fsync'd before commit.  (See also heap_sync() comments)
- *
  * The HEAP_INSERT_SKIP_FSM option is passed directly to
  * RelationGetBufferForTuple, which see for more info.
  *
@@ -2465,7 +2482,7 @@ heap_insert(Relation relation, HeapTuple tup, CommandId 
cid,
MarkBufferDirty(buffer);
 
/* XLOG stuff */
-   if (!(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation))
+   if (BufferNeedsWAL(relation, buffer))
{
xl_heap_insert xlrec;
xl_heap_header xlhdr;
@@ -2664,12 +2681,10 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, 
int ntuples,
int ndone;
char   *scratch = NULL;
Pagepage;
-   boolneedwal;
SizesaveFreeSpace;
boolneed_tuple_data = RelationIsLogicallyLogged(relation);
boolneed_cids = 
RelationIsAccessibleInLogicalDecoding(relation);
 
-   needwal = !(options & HEAP_INSERT_SKIP_WAL) && 
RelationNeedsWAL(relation);
saveFreeSpace = RelationGetTargetPageFreeSpace(relation,

 

Re: [HACKERS] [COMMITTERS] pgsql: Improve 64bit atomics support.

2017-04-07 Thread Andres Freund
On 2017-04-07 16:36:09 -0700, Andres Freund wrote:
> On 2017-04-07 19:55:21 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > Improve 64bit atomics support.
> > > 
> > > When adding atomics back in b64d92f1a, I added 64bit support as
> > > optional; there wasn't yet a direct user in sight.  That turned out to
> > > be a bit short-sighted, it'd already have been useful a number of times.
> > 
> > Seems like this killed an arapaima:
> >   
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=arapaima=2017-04-07%2022%3A06%3A59
> > 
> > Program terminated with signal 6, Aborted.
> > #0  0x00c6a402 in __kernel_vsyscall ()
> > #0  0x00c6a402 in __kernel_vsyscall ()
> > #1  0x00284b10 in raise () from /lib/libc.so.6
> > #2  0x00286421 in abort () from /lib/libc.so.6
> > #3  0x084d967e in ExceptionalCondition (
> > conditionName=0xe19dac "(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & 
> > ~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", 
> > errorType=0xe19831 "UnalignedPointer", 
> > fileName=0xe19d88 "../../../src/include/port/atomics.h", lineNumber=428)
> > at assert.c:54
> > #4  0x00e189b0 in pg_atomic_init_u64 ()
> > at ../../../src/include/port/atomics.h:428
> 
> Gah, that's fairly annoying :(.  We can't trivially force alignment in
> the generic fallback case, because not all compilers support that.  We
> don't really need it in the fallback case, because things are protected by
> a lock - but that means we'll have to make a bunch of locks conditional
> :/

Pushed an attempt at fixing this along those lines, let's hope that
works.
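
For the archives, the shape of the fix is roughly the following (a sketch,
not necessarily the committed diff); PG_HAVE_ATOMIC_U64_SIMULATION is what
the lock-based fallback defines:

static inline void
pg_atomic_init_u64(volatile pg_atomic_uint64 *ptr, uint64 val)
{
    /*
     * Only real 64bit atomics need the alignment guarantee; the simulated
     * (spinlock-protected) fallback works at any alignment.
     */
#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
    AssertPointerAlignment(ptr, 8);
#endif
    pg_atomic_init_u64_impl(ptr, val);
}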

- Andres




Re: [HACKERS] SCRAM authentication, take three

2017-04-07 Thread Robert Haas
On Fri, Apr 7, 2017 at 6:32 PM, Michael Paquier
 wrote:
> On Sat, Apr 8, 2017 at 1:59 AM, Robert Haas  wrote:
>> On Fri, Apr 7, 2017 at 3:59 AM, Heikki Linnakangas  wrote:
>>> I think the "SCRAM" part is more important than "SHA-256", so -1 on that.
>>
>> I agree.  The point here isn't that we're using a better hashing
>> method, even if a lot of people *think* that's the point.  The point
>> is we're using a modern algorithm that has nice properties like "you
>> can't impersonate the client by stealing the verifier, or even by
>> snooping the exchange".
>>
>> But "sasl" might be even better.
>
> FWIW, my opinion has not changed much on the matter, I would still
> favor "sasl" as the keyword used in pg_hba.conf. What has changed in
> my mind though is that defining no mechanisms with an additional
> option means that all possible choices are sent to the client. But if
> you define a list of mechanisms, then we'll just send back to the
> client the specified list as a possible choice of exchange mechanism:
> host all all blah.com sasl mechanism=scram-sha-256-plus
> Here for example the user would not be allowed to use SCRAM-SHA-256,
> just SCRAM with channel binding.
>
> Such an option makes sense once we add support for one more mechanism
> in SASL, like channel binding, but that's by far a generic approach
> that can serve us for years to come, and by admitting that nothing
> listed means all possible options we don't need any immediate action.

Yes, that all seems quite sensible.  What exactly is the counter-argument?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2017-04-07 Thread Peter Eisentraut
On 4/6/17 14:32, Pavel Stehule wrote:
> I like to see any proposals about syntax or implementation.
> 
> Using PRAGMA is one variant - introduced by PLpgSQL origin - Ada
> language. The PRAGMA syntax can be used for PRAGMA autonomous with well
> known syntax. It scales well  - it supports function, block or command
> level.

I had pragmas implemented in the original autonomous transactions patch
(https://www.postgresql.org/message-id/659a2fce-b6ee-06de-05c0-c8ed6a019...@2ndquadrant.com).
 However, the difference there is that the behavior is lexical, specific
to plpgsql, whereas here you are really just selecting run time
behavior.  So a GUC, and also something that could apply to other
places, should be considered.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-07 Thread Claudio Freire
On Fri, Apr 7, 2017 at 9:56 PM, Andres Freund  wrote:
> Hi,
>
>
> On 2017-04-07 19:43:39 -0300, Claudio Freire wrote:
>> On Fri, Apr 7, 2017 at 5:05 PM, Andres Freund  wrote:
>> > Hi,
>> >
>> > I've *not* read the history of this thread.  So I really might be
>> > missing some context.
>> >
>> >
>> >> From e37d29c26210a0f23cd2e9fe18a264312fecd383 Mon Sep 17 00:00:00 2001
>> >> From: Claudio Freire 
>> >> Date: Mon, 12 Sep 2016 23:36:42 -0300
>> >> Subject: [PATCH] Vacuum: allow using more than 1GB work mem
>> >>
>> >> Turn the dead_tuples array into a structure composed of several
>> >> exponentially bigger arrays, to enable usage of more than 1GB
>> >> of work mem during vacuum and thus reduce the number of full
>> >> index scans necessary to remove all dead tids when the memory is
>> >> available.
>> >
>> >>   * We are willing to use at most maintenance_work_mem (or perhaps
>> >>   * autovacuum_work_mem) memory space to keep track of dead tuples.  We
>> >> - * initially allocate an array of TIDs of that size, with an upper limit 
>> >> that
>> >> + * initially allocate an array of TIDs of 128MB, or an upper limit that
>> >>   * depends on table size (this limit ensures we don't allocate a huge 
>> >> area
>> >> - * uselessly for vacuuming small tables).  If the array threatens to 
>> >> overflow,
>> >> - * we suspend the heap scan phase and perform a pass of index cleanup 
>> >> and page
>> >> - * compaction, then resume the heap scan with an empty TID array.
>> >> + * uselessly for vacuuming small tables). Additional arrays of 
>> >> increasingly
>> >> + * large sizes are allocated as they become necessary.
>> >> + *
>> >> + * The TID array is thus represented as a list of multiple segments of
>> >> + * varying size, beginning with the initial size of up to 128MB, and 
>> >> growing
>> >> + * exponentially until the whole budget of 
>> >> (autovacuum_)maintenance_work_mem
>> >> + * is used up.
>> >
>> > When the chunk size is 128MB, I'm a bit unconvinced that using
>> > exponential growth is worth it. The allocator overhead can't be
>> > meaningful in comparison to collecting 128MB dead tuples, the potential
>> > waste is pretty big, and it increases memory fragmentation.
>>
>> The exponential strategy is mainly to improve lookup time (ie: to
>> avoid large segment lists).
>
> Well, if we were to do binary search on the segment list, that'd not be
> necessary.

True, but the initial lookup might be slower in the end, since the
array would be bigger and cache locality worse.

Why do you say exponential growth fragments memory? AFAIK, all those
allocations are well beyond the point where malloc starts mmaping
memory, so each of those segments should be a mmap segment,
independently freeable.

>> >> + if (seg->num_dead_tuples >= seg->max_dead_tuples)
>> >> + {
>> >> + /*
>> >> +  * The segment is overflowing, so we must allocate 
>> >> a new segment.
>> >> +  * We could have a preallocated segment descriptor 
>> >> already, in
>> >> +  * which case we just reinitialize it, or we may 
>> >> need to repalloc
>> >> +  * the vacrelstats->dead_tuples array. In that 
>> >> case, seg will no
>> >> +  * longer be valid, so we must be careful about 
>> >> that. In any case,
>> >> +  * we must update the last_dead_tuple copy in the 
>> >> overflowing
>> >> +  * segment descriptor.
>> >> +  */
>> >> + Assert(seg->num_dead_tuples == 
>> >> seg->max_dead_tuples);
>> >> + seg->last_dead_tuple = 
>> >> seg->dt_tids[seg->num_dead_tuples - 1];
>> >> + if (vacrelstats->dead_tuples.last_seg + 1 >= 
>> >> vacrelstats->dead_tuples.num_segs)
>> >> + {
>> >> + int new_num_segs = 
>> >> vacrelstats->dead_tuples.num_segs * 2;
>> >> +
>> >> + vacrelstats->dead_tuples.dt_segments = 
>> >> (DeadTuplesSegment *) repalloc(
>> >> +(void *) 
>> >> vacrelstats->dead_tuples.dt_segments,
>> >> +
>> >> new_num_segs * sizeof(DeadTuplesSegment));
>> >
>> > Might be worth breaking this into some sub-statements, it's quite hard
>> > to read.
>>
>> Breaking what precisely? The comment?
>
> No, the three-line statement computing the new value of
> dead_tuples.dt_segments.  I'd at least assign dead_tuples to a local
> variable, to cut the length of the statement down.

Ah, alright. Will try to do that.

>> >> +/*
>> >>   *   lazy_tid_reaped() -- is a particular tid deletable?
>> >>   *
>> >>   *   This has the right signature to be an 
>> >> IndexBulkDeleteCallback.
>> >>   *
>> >> - *   Assumes 

Re: [HACKERS] Push down more UPDATEs/DELETEs in postgres_fdw

2017-04-07 Thread Robert Haas
On Wed, Mar 22, 2017 at 6:20 AM, Etsuro Fujita
 wrote:
> On 2017/02/22 19:57, Rushabh Lathia wrote:
>> Marked this as Ready for Committer.
>
> I noticed that this item in the CF app was incorrectly marked as Committed.
> This patch isn't committed, so I returned it to the previous status.  I also
> rebased the patch.  Attached is a new version of the patch.

Sorry, I marked the wrong patch as committed.  Apologies for that.

This doesn't apply any more because of recent changes.

git diff --check complains:
contrib/postgres_fdw/postgres_fdw.c:3653: space before tab in indent.

+/* Shouldn't contain the target relation. */
+Assert(target_rel == 0);

This comment should give a reason.

 void
 deparseDirectUpdateSql(StringInfo buf, PlannerInfo *root,
Index rtindex, Relation rel,
+   RelOptInfo *foreignrel,
List *targetlist,
List *targetAttrs,
List *remote_conds,

Could you add a comment explaining the meaning of these various
arguments?  It takes rtindex, rel, and foreignrel, which apparently
are all different things, but the meaning is not explained.

 /*
+ * Add a RETURNING clause, if needed, to an UPDATE/DELETE on a join.
+ */
+static void
+deparseExplicitReturningList(List *rlist,
+ List **retrieved_attrs,
+ deparse_expr_cxt *context)
+{
+deparseExplicitTargetList(rlist, true, retrieved_attrs, context);
+}

Do we really want to add a function for one line of code?

+/*
+ * Look for conditions mentioning the target relation in the given join tree,
+ * which will be pulled up into the WHERE clause.  Note that this is safe due
+ * to the same reason stated in comments in deparseFromExprForRel.
+ */

The comments for deparseFromExprForRel do not seem to address the
topic of why this is safe.  Also, the answer to the question "safe
from what?" is not clear.

-deparseReturningList(buf, root, rtindex, rel, false,
- returningList, retrieved_attrs);
+if (foreignrel->reloptkind == RELOPT_JOINREL)
+deparseExplicitReturningList(returningList, retrieved_attrs, );
+else
+deparseReturningList(buf, root, rtindex, rel, false,
+ returningList, retrieved_attrs);

Why do these cases need to be handled differently?  Maybe add a brief comment?

+if ((outerrel->reloptkind == RELOPT_BASEREL &&
+ outerrel->relid == target_rel) ||
+(innerrel->reloptkind == RELOPT_BASEREL &&
+ innerrel->relid == target_rel))

1. Surely it's redundant to check the RelOptKind if the RTI matches?

2. Generally, the tests in this patch against various RelOptKind
values should be adapted to use the new macros introduced in
7a39b5e4d11229ece930a51fd7cb29e535db4494.

The regression tests remove every remaining case where an update or
delete fails to get pushed to the remote side.  I think we should
still test that path, because we've still got that code.  Maybe use a
non-pushable function in the join clause, or something.

The new test cases could use some brief comments explaining their purpose.

 if (plan->returningLists)
+{
 returningList = (List *) list_nth(plan->returningLists, subplan_index);

+/*
+ * If UPDATE/DELETE on a join, create a RETURNING list used in the
+ * remote query.
+ */
+if (fscan->scan.scanrelid == 0)
+returningList = make_explicit_returning_list(resultRelation,
+ rel,
+ returningList);
+}

Again, the comment doesn't really explain why we're doing this.  And
initializing returningList twice in a row seems strange, too.

I am unfortunately too tired to finish properly reviewing this tonight.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Peter Geoghegan
On Fri, Apr 7, 2017 at 12:28 PM, Alvaro Herrera
 wrote:
> Peter Geoghegan wrote:
>> On Fri, Apr 7, 2017 at 11:37 AM, Andres Freund  wrote:
>> > Write Amplification Reduction Method (WARM)
>> > - fair number of people don't think it's ready for v10.
>
> Given the number of votes against putting this on pg10, I am going to
> back off from this patch now, with an eye towards putting it in pg11 as
> soon as the tree opens.  Either I or Pavan are going to post another
> version of this patch series, within the next couple of weeks, so that
> others can base their testing, review and suggestions.

My offer to work with you on amcheck verification of WARM invariants
remains open. If nothing else, structuring things so that verification
is possible may clarify your design. Formalizing the preconditions,
postconditions, and legal states for on-disk structures might just be
a useful exercise, even if verification never actually finds a
problem.

I anticipate that amcheck verification will become my main focus for
Postgres 11, in any case.

-- 
Peter Geoghegan





Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-07 Thread Andres Freund
Hi,

I've *not* read the history of this thread.  So I really might be
missing some context.


> From e37d29c26210a0f23cd2e9fe18a264312fecd383 Mon Sep 17 00:00:00 2001
> From: Claudio Freire 
> Date: Mon, 12 Sep 2016 23:36:42 -0300
> Subject: [PATCH] Vacuum: allow using more than 1GB work mem
> 
> Turn the dead_tuples array into a structure composed of several
> exponentially bigger arrays, to enable usage of more than 1GB
> of work mem during vacuum and thus reduce the number of full
> index scans necessary to remove all dead tids when the memory is
> available.

>   * We are willing to use at most maintenance_work_mem (or perhaps
>   * autovacuum_work_mem) memory space to keep track of dead tuples.  We
> - * initially allocate an array of TIDs of that size, with an upper limit that
> + * initially allocate an array of TIDs of 128MB, or an upper limit that
>   * depends on table size (this limit ensures we don't allocate a huge area
> - * uselessly for vacuuming small tables).  If the array threatens to 
> overflow,
> - * we suspend the heap scan phase and perform a pass of index cleanup and 
> page
> - * compaction, then resume the heap scan with an empty TID array.
> + * uselessly for vacuuming small tables). Additional arrays of increasingly
> + * large sizes are allocated as they become necessary.
> + *
> + * The TID array is thus represented as a list of multiple segments of
> + * varying size, beginning with the initial size of up to 128MB, and growing
> + * exponentially until the whole budget of (autovacuum_)maintenance_work_mem
> + * is used up.

When the chunk size is 128MB, I'm a bit unconvinced that using
exponential growth is worth it. The allocator overhead can't be
meaningful in comparison to collecting 128MB dead tuples, the potential
waste is pretty big, and it increases memory fragmentation.


> + * Lookup in that structure proceeds sequentially in the list of segments,
> + * and with a binary search within each segment. Since segment's size grows
> + * exponentially, this retains O(N log N) lookup complexity.

N log N is a horrible lookup complexity.  That's the complexity of
*sorting* an entire array.  I think you might be trying to argue that
it's log(N) * log(N)? One log(N) for the exponentially growing size of
segments, one for the binary search?

Afaics you could quite easily make it O(2 log(N)) by simply also doing
binary search over the segments.  Might not be worth it due to the small
constant involved normally.
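
Something like this sketch is what I have in mind; the multiarray type and
function names here are invented, and it assumes last_dead_tuple is kept up
to date for every segment (not just the full ones):

static DeadTuplesSegment *
find_candidate_segment(DeadTuplesMultiArray *dead_tuples, ItemPointer itemptr)
{
    int         low = 0;
    int         high = dead_tuples->last_seg;

    while (low < high)
    {
        int         mid = low + (high - low) / 2;
        DeadTuplesSegment *seg = &dead_tuples->dt_segments[mid];

        /* last_dead_tuple is the upper bound of the TIDs stored in seg */
        if (ItemPointerCompare(itemptr, &seg->last_dead_tuple) > 0)
            low = mid + 1;
        else
            high = mid;
    }

    /* the caller then binary-searches dt_tids[] of just this one segment */
    return &dead_tuples->dt_segments[low];
}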


> + * If the array threatens to overflow, we suspend the heap scan phase and
> + * perform a pass of index cleanup and page compaction, then resume the heap
> + * scan with an array of logically empty but already preallocated TID 
> segments
> + * to be refilled with more dead tuple TIDs.

Hm, it's not really the array that overflows, it's m_w_m that'd be
exceeded, right?


>  /*
> + * Minimum (starting) size of the dead_tuples array segments. Will allocate
> + * space for 128MB worth of tid pointers in the first segment, further 
> segments
> + * will grow in size exponentially. Don't make it too small or the segment 
> list
> + * will grow bigger than the sweetspot for search efficiency on big vacuums.
> + */
> +#define LAZY_MIN_TUPLES  Max(MaxHeapTuplesPerPage, (128<<20) / 
> sizeof(ItemPointerData))

That's not really the minimum, no? s/MIN/INIT/?


> +typedef struct DeadTuplesSegment
> +{
> + int num_dead_tuples;/* # of entries in the 
> segment */
> + int max_dead_tuples;/* # of entries 
> allocated in the segment */
> + ItemPointerData last_dead_tuple;/* Copy of the last dead tuple 
> (unset
> + 
>  * until the segment is fully
> + 
>  * populated) */
> + unsigned short padding;
> + ItemPointer dt_tids;/* Array of dead tuples */
> +}DeadTuplesSegment;

Whenever padding is needed, it should have an explanatory comment.  It's
certainly not obvious to me why it's needed here.


> @@ -1598,6 +1657,11 @@ lazy_vacuum_index(Relation indrel,
>   ivinfo.num_heap_tuples = vacrelstats->old_rel_tuples;
>   ivinfo.strategy = vac_strategy;
>  
> + /* Finalize the current segment by setting its upper bound dead tuple */
> + seg = DeadTuplesCurrentSegment(vacrelstats);
> + if (seg->num_dead_tuples > 0)
> + seg->last_dead_tuple = seg->dt_tids[seg->num_dead_tuples - 1];

Why don't we just maintain this here, for all of the segments?  Seems a
bit easier.


> @@ -1973,7 +2037,8 @@ count_nondeletable_pages(Relation onerel, LVRelStats 
> *vacrelstats)
>  static void
>  lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
>  {
> - longmaxtuples;
> + longmaxtuples,
> + mintuples;
>   int   

Re: [HACKERS] monitoring.sgml missing tag

2017-04-07 Thread Erik Rijkers

On 2017-04-07 22:50, Andres Freund wrote:

On 2017-04-07 22:47:55 +0200, Erik Rijkers wrote:

monitoring.sgml has one </row> tag missing


Is that actually an issue? SGML allows skipping certain close tags, and
IIRC row is one of them.  We'll probably move to xml at some point not too
far away, but I don't think it makes much sense to fix these 
one-by-one.



Well, I have only used  make oldhtml  before now so maybe I am doing 
something wrong.


I try to run  make html.

First, I got this (just showing first few of a 75x repeat):

$ time ( cd /home/aardvark/pg_stuff/pg_sandbox/pgsql.HEAD/doc/src/sgml;  
make html; )

osx -D . -D . -x lower postgres.sgml >postgres.xml.tmp
osx:monitoring.sgml:1278:12:E: document type does not allow element 
"ROW" here
osx:monitoring.sgml:1282:12:E: document type does not allow element 
"ROW" here
osx:monitoring.sgml:1286:12:E: document type does not allow element 
"ROW" here

...
osx:monitoring.sgml:1560:12:E: document type does not allow element 
"ROW" here
osx:monitoring.sgml:1564:13:E: end tag for "ROW" omitted, but OMITTAG NO 
was specified

osx:monitoring.sgml:1275:8: start tag was here
make: *** [postgres.xml] Error 1


After closing that tag with </row>,  make html  still fails:



$ time ( cd /home/aardvark/pg_stuff/pg_sandbox/pgsql.HEAD/doc/src/sgml;  
make html; )

osx -D . -D . -x lower postgres.sgml >postgres.xml.tmp
'/opt/perl-5.24/bin/perl' -p -e 
's/\[(aacute|acirc|aelig|agrave|amp|aring|atilde|auml|bull|copy|eacute|egrave|gt|iacute|lt|mdash|nbsp|ntilde|oacute|ocirc|oslash|ouml|pi|quot|scaron|uuml) 
*\]/\&\1;/gi;' -e '$_ .= qq{XML V4.2//EN" 
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd;>\n} if $. == 
1;'  postgres.xml

rm postgres.xml.tmp
xmllint --noout --valid postgres.xml
xsltproc --stringparam pg.version '10devel'  stylesheet.xsl postgres.xml
runtime error: file stylesheet-html-common.xsl line 41 element 
call-template

The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 41 element 
call-template

The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 41 element 
call-template

The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 41 element 
call-template

The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 41 element 
call-template

The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 30 element 
call-template

The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 30 element 
call-template

The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 30 element 
call-template

The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 30 element 
call-template

The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 30 element 
call-template

The called template 'id.attribute' was not found.
runtime error: file stylesheet-html-common.xsl line 30 element 
call-template

The called template 'id.attribute' was not found.
no result for postgres.xml
make: *** [html-stamp] Error 9

real4m23.641s
user4m22.304s
sys 0m0.914s


Any hints welcome...

thanks


$ cat /etc/redhat-release
CentOS release 6.6 (Final)





[HACKERS] mvstats triggers 32bit warnings

2017-04-07 Thread Andres Freund
Hi,

compiling on linux 32 bit I get a lot of warnings due to printf format.
I suspect most of them should just be cured by using %zd or %zu instead
of %ld.
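
For instance, a minimal sketch of that kind of fix (the variable names below
are placeholders rather than the exact expressions in mvdistinct.c):

Size        actual = VARSIZE_ANY(data);     /* "data" stands in for the real argument */
Size        expected = offsetof(MVNDistinct, items);

/* %zu matches Size/size_t on 32-bit and 64-bit builds alike, unlike %ld */
elog(ERROR, "invalid MVNDistinct size %zu (expected at least %zu)",
     actual, expected);

The warnings in question: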

/home/andres/src/postgresql/src/backend/statistics/mvdistinct.c: In function 
‘statext_ndistinct_deserialize’:
/home/andres/src/postgresql/src/backend/statistics/mvdistinct.c:241:15: 
warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has 
type ‘unsigned int’ [-Wformat=]
   elog(ERROR, "invalid MVNDistinct size %ld (expected at least %ld)",
   ^
/home/andres/src/postgresql/src/include/utils/elog.h:202:23: note: in 
definition of macro ‘elog’
   elog_finish(elevel, __VA_ARGS__); \
   ^~~
/home/andres/src/postgresql/src/backend/statistics/mvdistinct.c:241:15: 
warning: format ‘%ld’ expects argument of type ‘long int’, but argument 4 has 
type ‘unsigned int’ [-Wformat=]
   elog(ERROR, "invalid MVNDistinct size %ld (expected at least %ld)",
   ^
/home/andres/src/postgresql/src/include/utils/elog.h:202:23: note: in 
definition of macro ‘elog’
   elog_finish(elevel, __VA_ARGS__); \
   ^~~
/home/andres/src/postgresql/src/backend/statistics/mvdistinct.c:277:13: 
warning: format ‘%ld’ expects argument of type ‘long int’, but argument 2 has 
type ‘unsigned int’ [-Wformat=]
  errmsg("invalid MVNDistinct size %ld (expected at least %ld)",
 ^
/home/andres/src/postgresql/src/include/utils/elog.h:107:14: note: in 
definition of macro ‘ereport_domain’
errfinish rest; \
  ^~~~
/home/andres/src/postgresql/src/backend/statistics/mvdistinct.c:275:3: note: in 
expansion of macro ‘ereport’
   ereport(ERROR,
   ^~~
/home/andres/src/postgresql/src/backend/statistics/mvdistinct.c:277:13: 
warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has 
type ‘Size {aka unsigned int}’ [-Wformat=]
  errmsg("invalid MVNDistinct size %ld (expected at least %ld)",
 ^
/home/andres/src/postgresql/src/include/utils/elog.h:107:14: note: in 
definition of macro ‘ereport_domain’
errfinish rest; \
  ^~~~
/home/andres/src/postgresql/src/backend/statistics/mvdistinct.c:275:3: note: in 
expansion of macro ‘ereport’
   ereport(ERROR,
   ^~~
In file included from /home/andres/src/postgresql/src/include/postgres.h:48:0,
 from 
/home/andres/src/postgresql/src/backend/statistics/dependencies.c:14:
/home/andres/src/postgresql/src/backend/statistics/dependencies.c: In function 
‘statext_dependencies_deserialize’:
/home/andres/src/postgresql/src/backend/statistics/dependencies.c:514:15: 
warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has 
type ‘unsigned int’ [-Wformat=]
   elog(ERROR, "invalid MVDependencies size %ld (expected at least %ld)",
   ^
/home/andres/src/postgresql/src/include/utils/elog.h:202:23: note: in 
definition of macro ‘elog’
   elog_finish(elevel, __VA_ARGS__); \
   ^~~
/home/andres/src/postgresql/src/backend/statistics/dependencies.c:514:15: 
warning: format ‘%ld’ expects argument of type ‘long int’, but argument 4 has 
type ‘unsigned int’ [-Wformat=]
   elog(ERROR, "invalid MVDependencies size %ld (expected at least %ld)",
   ^
/home/andres/src/postgresql/src/include/utils/elog.h:202:23: note: in 
definition of macro ‘elog’
   elog_finish(elevel, __VA_ARGS__); \
   ^~~
/home/andres/src/postgresql/src/backend/statistics/dependencies.c:550:15: 
warning: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has 
type ‘unsigned int’ [-Wformat=]
   elog(ERROR, "invalid dependencies size %ld (expected at least %ld)",
   ^
/home/andres/src/postgresql/src/include/utils/elog.h:202:23: note: in 
definition of macro ‘elog’
   elog_finish(elevel, __VA_ARGS__); \
   ^~~
/home/andres/src/postgresql/src/backend/statistics/dependencies.c:550:15: 
warning: format ‘%ld’ expects argument of type ‘long int’, but argument 4 has 
type ‘Size {aka unsigned int}’ [-Wformat=]
   elog(ERROR, "invalid dependencies size %ld (expected at least %ld)",
   ^
- Andres




Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-07 Thread Claudio Freire
On Fri, Apr 7, 2017 at 10:06 PM, Claudio Freire  wrote:
>>> >> + if (seg->num_dead_tuples >= seg->max_dead_tuples)
>>> >> + {
>>> >> + /*
>>> >> +  * The segment is overflowing, so we must allocate 
>>> >> a new segment.
>>> >> +  * We could have a preallocated segment descriptor 
>>> >> already, in
>>> >> +  * which case we just reinitialize it, or we may 
>>> >> need to repalloc
>>> >> +  * the vacrelstats->dead_tuples array. In that 
>>> >> case, seg will no
>>> >> +  * longer be valid, so we must be careful about 
>>> >> that. In any case,
>>> >> +  * we must update the last_dead_tuple copy in the 
>>> >> overflowing
>>> >> +  * segment descriptor.
>>> >> +  */
>>> >> + Assert(seg->num_dead_tuples == 
>>> >> seg->max_dead_tuples);
>>> >> + seg->last_dead_tuple = 
>>> >> seg->dt_tids[seg->num_dead_tuples - 1];
>>> >> + if (vacrelstats->dead_tuples.last_seg + 1 >= 
>>> >> vacrelstats->dead_tuples.num_segs)
>>> >> + {
>>> >> + int new_num_segs = 
>>> >> vacrelstats->dead_tuples.num_segs * 2;
>>> >> +
>>> >> + vacrelstats->dead_tuples.dt_segments = 
>>> >> (DeadTuplesSegment *) repalloc(
>>> >> +(void *) 
>>> >> vacrelstats->dead_tuples.dt_segments,
>>> >> +
>>> >> new_num_segs * sizeof(DeadTuplesSegment));
>>> >
>>> > Might be worth breaking this into some sub-statements, it's quite hard
>>> > to read.
>>>
>>> Breaking what precisely? The comment?
>>
>> No, the three-line statement computing the new value of
>> dead_tuples.dt_segments.  I'd at least assign dead_tuples to a local
>> variable, to cut the length of the statement down.
>
> Ah, alright. Will try to do that.

Attached is an updated patch set with the requested changes.

Segment allocation still follows the exponential strategy, and segment
lookup is still linear.

I rebased the early free patch (patch 3) to apply on top of the v9
patch 2 (it needed some changes). I recognize the early free patch
didn't get nearly as much scrutiny, so I'm fine with committing only 2
if that one's ready to go but 3 isn't.

If it's decided to go for fixed 128M segments and a binary search of
segments, I don't think I can get that ready and tested before the
commitfest ends.
From 9b8f90f19d558a7e6a32cb253d89819f7c300598 Mon Sep 17 00:00:00 2001
From: Claudio Freire 
Date: Mon, 12 Sep 2016 23:36:42 -0300
Subject: [PATCH 1/2] Vacuum: allow using more than 1GB work mem

Turn the dead_tuples array into a structure composed of several
exponentially bigger arrays, to enable usage of more than 1GB
of work mem during vacuum and thus reduce the number of full
index scans necessary to remove all dead tids when the memory is
available.
---
 src/backend/commands/vacuumlazy.c| 346 ---
 src/test/regress/expected/vacuum.out |   8 +
 src/test/regress/sql/vacuum.sql  |  10 +
 3 files changed, 299 insertions(+), 65 deletions(-)

diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 005440e..4f0cf1b 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -12,11 +12,25 @@
  *
  * We are willing to use at most maintenance_work_mem (or perhaps
  * autovacuum_work_mem) memory space to keep track of dead tuples.  We
- * initially allocate an array of TIDs of that size, with an upper limit that
+ * initially allocate an array of TIDs of 128MB, or an upper limit that
  * depends on table size (this limit ensures we don't allocate a huge area
- * uselessly for vacuuming small tables).  If the array threatens to overflow,
+ * uselessly for vacuuming small tables). Additional arrays of increasingly
+ * large sizes are allocated as they become necessary.
+ *
+ * The TID array is thus represented as a list of multiple segments of
+ * varying size, beginning with the initial size of up to 128MB, and growing
+ * exponentially until the whole budget of (autovacuum_)maintenance_work_mem
+ * is used up.
+ *
+ * Lookup in that structure proceeds sequentially in the list of segments,
+ * and with a binary search within each segment. Since segment's size grows
+ * exponentially, this retains O(log N) lookup complexity (2 log N to be
+ * precise).
+ *
+ * If the multiarray's total size threatens to grow beyond maintenance_work_mem,
  * we suspend the heap scan phase and perform a pass of index cleanup and page
- * compaction, then resume the heap scan with an empty TID array.
+ * compaction, then resume the heap scan with an array of logically empty but
+ * already preallocated 

Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-07 Thread Joe Conway
On 04/07/2017 05:36 PM, Robert Haas wrote:
> On Fri, Apr 7, 2017 at 5:22 PM, Joe Conway  wrote:
>> 1) commit the 0002 patch now before the feature freeze and follow up
>>with the regression test patch when ready in a couple of days
>> 2) hold off on both patches until ready
>> 3) push both patches to the next commitfest/pg11
>>
>> Some argue this is an open issue against the new partitioning feature in
>> pg10 and therefore should be addressed now, and others do not. I can see
>> both sides of that argument.
>>
>> In any case, thoughts on what to do?
> 
> Speaking only for myself, I'm OK with any of those options, provided
> that that "a couple" means what my dictionary says it means.

Thanks. I'd prefer not to do #1 actually, and I think we can adhere to
the dictionary meaning of "a couple" (i.e. by EOD Sunday). Assuming we
are ready by Sunday I will push both together (#2) or else I will move
them both together to the next CF (#3).

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] valgrind errors around dsa.c

2017-04-07 Thread Thomas Munro
On Sat, Apr 8, 2017 at 8:57 AM, Thomas Munro
 wrote:
> On Sat, Apr 8, 2017 at 4:49 AM, Andres Freund  wrote:
>> Hi,
>>
>> newly added tests exercise parallel bitmap scans.  And they trigger
>> valgrind errors:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink=2017-04-07%2007%3A10%3A01
>>
>>
>> ==4567== VALGRINDERROR-BEGIN
>> ==4567== Conditional jump or move depends on uninitialised value(s)
>> ==4567==at 0x5FD62A: check_for_freed_segments (dsa.c:2219)
>> ==4567==by 0x5FD97E: dsa_get_address (dsa.c:934)
>> ==4567==by 0x5FDA2A: init_span (dsa.c:1339)
>> ==4567==by 0x5FE6D1: ensure_active_superblock (dsa.c:1696)
>> ==4567==by 0x5FEBBD: alloc_object (dsa.c:1452)
>> ==4567==by 0x5FEBBD: dsa_allocate_extended (dsa.c:693)
>> ==4567==by 0x3C7A83: pagetable_allocate (tidbitmap.c:1536)
>> ==4567==by 0x3C7A83: pagetable_create (simplehash.h:342)
>> ==4567==by 0x3C7A83: tbm_create_pagetable (tidbitmap.c:323)
>> ==4567==by 0x3C8DAD: tbm_get_pageentry (tidbitmap.c:1246)
>> ==4567==by 0x3C98A1: tbm_add_tuples (tidbitmap.c:432)
>> ==4567==by 0x22510C: btgetbitmap (nbtree.c:460)
>> ==4567==by 0x21A8D1: index_getbitmap (indexam.c:726)
>> ==4567==by 0x38AD48: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:91)
>> ==4567==by 0x37D353: MultiExecProcNode (execProcnode.c:621)
>> ==4567==  Uninitialised value was created by a heap allocation
>> ==4567==at 0x602FD5: palloc (mcxt.c:872)
>> ==4567==by 0x5FF73B: create_internal (dsa.c:1242)
>> ==4567==by 0x5FF8F5: dsa_create_in_place (dsa.c:473)
>> ==4567==by 0x37CA32: ExecInitParallelPlan (execParallel.c:532)
>> ==4567==by 0x38C324: ExecGather (nodeGather.c:152)
>> ==4567==by 0x37D247: ExecProcNode (execProcnode.c:551)
>> ==4567==by 0x39870F: ExecNestLoop (nodeNestloop.c:156)
>> ==4567==by 0x37D1B7: ExecProcNode (execProcnode.c:512)
>> ==4567==by 0x3849D4: fetch_input_tuple (nodeAgg.c:686)
>> ==4567==by 0x387764: agg_retrieve_direct (nodeAgg.c:2306)
>> ==4567==by 0x387A11: ExecAgg (nodeAgg.c:2117)
>> ==4567==by 0x37D217: ExecProcNode (execProcnode.c:539)
>> ==4567==
>>
>> It could be that these are spurious due to shared memory - valgrind
>> doesn't track definedness across processes - but the fact that memory
>> allocated by palloc is the source of the undefined memory makes me doubt
>> that.
>
> Thanks.  Will post a fix for this later today.

Fix attached.

Explanation:  Whenever segments are destroyed because they no longer
contain any live blocks, the shared variable
control->freed_segment_counter advances.  Each attached backend has
its own local variable area->freed_segment_counter, and if it sees
that the former differs from the latter it checks all attached
segments to see if any need to be detached.  I failed to initialise
the backend-local version, with the consequence that if you were very
unlucky your backend could fail to detach from a no-longer needed
segment until another segment was eventually freed, causing the
shared counter to move again.  More likely, it would notice that they
are different because one holds uninitialised junk, perform a spurious
scan for dead segments, and then get them in sync.
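
The fix itself boils down to one assignment in the common create/attach path
(sketch below; the wrapper function name is invented, the fields are the ones
described above):

static void
init_freed_segment_tracking(dsa_area *area)
{
    /*
     * Start the backend-local counter in sync with the shared one, so the
     * comparison later done by check_for_freed_segments() is meaningful
     * instead of depending on whatever palloc() happened to hand back.
     */
    area->freed_segment_counter = area->control->freed_segment_counter;
}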

-- 
Thomas Munro
http://www.enterprisedb.com


initialise-freed-segment-counter.patch
Description: Binary data



Re: [HACKERS] [PATCH] Add GUCs for predicate lock promotion thresholds

2017-04-07 Thread Kevin Grittner
On Mon, Feb 27, 2017 at 7:35 AM, Dagfinn Ilmari Mannsåker
 wrote:
> Kevin Grittner  writes:
>
>> It occurred to me that it would make sense to allow these settings
>> to be attached to a database or table (though *not* a role).  I'll
>> look at that when I get back if you don't get to it first.
>
> Attached is a draft patch (no docs or tests) on top of the previous one,
> adding reloptions for the per-relation and per-page thresholds.  That
> made me realise that the corresponding GUCs don't need to be PGC_SIGHUP,
> but could be PGC_SUSET or PGC_USERSET.  I'll adjust the original patch
> if you agree.  I have not got around to adding per-database
> versions of the setting, but could have a stab at that.

Unfortunately, I was unable to get the follow-on patch to allow
setting by relation into a shape I liked.  Let's see what we can do
for the next release.  The first patch was applied with only very
minor tweaks.  I realize that nothing would break if individual
users could set their granularity thresholds on individual
connections, but as someone who has filled the role of DBA, the
thought kinda made my skin crawl.  I left it as PGC_SIGHUP for now;
we can talk about loosening that up later, but I think we should
have one or more use-cases that outweigh the opportunities for
confusion and bad choices by individual programmers to justify that.

--
Kevin Grittner




Re: [HACKERS] recent deadlock regression test failures

2017-04-07 Thread Kevin Grittner
On Fri, Apr 7, 2017 at 9:24 PM, Thomas Munro
 wrote:

> 2.  Did I understand correctly that it is safe to scan the list of
> SERIALIZABLEXACTs and access the possibleUnsafeConflicts list while
> holding only SerializableXactHashLock,

Yes.

> and that 'inLink' is the correct link to be following?

If you're starting from the blocked (read-only) transaction (which
you are), inLink is the one to follow.

Note: It would be better form to use the SxactIsDeferrableWaiting()
macro than repeat the bit-testing code directly in your function.
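
In other words, something along these lines (the flag name is from memory,
so treat it as an assumption):

static bool
blocked_waiting_for_safe_snapshot(SERIALIZABLEXACT *sxact)
{
    /*
     * Equivalent to open-coding
     *     (sxact->flags & SXACT_FLAG_DEFERRABLE_WAITING) != 0
     * but using the existing macro.
     */
    return SxactIsDeferrableWaiting(sxact);
}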

--
Kevin Grittner




Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Masahiko Sawada
On Sat, Apr 8, 2017 at 1:09 PM, Masahiko Sawada  wrote:
> On Sat, Apr 8, 2017 at 3:37 AM, Andres Freund  wrote:
>> Hi,
>>
>> When I started writing this, there were the following reamining CF
>> items, minus bugfix ones which aren't bound by the code freeze.
>>
>> I think it makes sense to go through those and see whether it's
>> realistic to commit any of them.
>>
>> Ready for Committer:
>>
>> Add GUCs for predicate lock promotion thresholds:
>> - claimed by Kevin, should be easy enough
>>
>> initdb configurable wal_segment_size
>> - parts have been committed
>> - significantly derailed by segment naming discussion
>> - possibly committable if we decide to skip the naming bit? But also a
>>   bit late given that it touches some quite sensitive code.
>>
>> Create fdw_outerpath for foreign
>> - haven't really followed discussion
>> - only marked as ready-for-committer 2017-04-04
>>
>> Vacuum: allow usage of more than 1GB of work mem
>> - hm, maybe?  Will take a look.
>>
>> Unique Joins
>> - Tom's discussing things with David, not sure.
>>
>> Push down more UPDATEs/DELETEs in postgres_fdw
>> - claimed by Robert?
>>
>> postgres_fdw: support parameterized foreign joins
>> - think that depends on fdw_outerpath?
>>
>>
>> Waiting on Author:
>>
>> SQL statements statistics counter view (pg_stat_sql)
>> - the code doesn't look quite ready
>> - don't think we quite have design agreement, e.g. I don't like where it's
>>   hooked into query execution
>>
>> Write Amplification Reduction Method (WARM)
>> - fair number of people don't think it's ready for v10.
>> - can't move to next fest because it's waiting-on-author, which doesn't
>>   allow that.  Doesn't strike me as a useful restriction.
>>
>> BRIN optimize memory allocation
>> - I think Alvaro has indicated that he wants to take care of that?
>>
>> Indexes with Included Columns (was Covering + unique indexes)
>> - Don't think concerns about #columns on truncated tuples have been
>>   addressed.  Should imo be returned-with-feedback.
>>
>>
>> Needs-Review:
>>
>> Better estimate merging for duplicate vars in clausesel.c
>> - has been submitted pretty late (2017-02-24) and discussion is ongoing
>> - I'm inclined to punt on this one to the next release, previous
>>   proposal along that line got some pushback
>>
>> new plpgsql extra_checks
>> - Winner of the "most opaque CF title" award
>> - hasn't received a whole lot of review
>> - don't think we're even close to having design agreement
>>
>> Generic type subscripting
>> - still some review back and forth
>> - probably should be punted
>>
>>
>> Any comments?
>>
>
> Hi,
>
> Could you consider the item 2PC on FDW as well? It was marked as "Move
> to Next CF" early yesterday, but I'm not sure of the reason.
>

Oops, I meant "Transactions involving multiple postgres foreign servers"[1].

[1] https://commitfest.postgresql.org/13/928/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recent deadlock regression test failures

2017-04-07 Thread Thomas Munro
On Sat, Apr 8, 2017 at 9:47 AM, Kevin Grittner  wrote:
> On Fri, Apr 7, 2017 at 3:47 PM, Thomas Munro
>  wrote:
>> On Sat, Apr 8, 2017 at 6:35 AM, Kevin Grittner  wrote:
>>> On Fri, Apr 7, 2017 at 12:52 PM, Andres Freund  wrote:
>>>
 I'd rather fix the issue, than remove the tests entirely.  Seems quite
 possible to handle blocking on Safesnapshot in a similar manner as 
 pg_blocking_pids?
>>>
>>> I'll see what I can figure out.
>>
>> Ouch.  These are the other ways I thought of to achieve this:
>>
>> https://www.postgresql.org/message-id/CAEepm%3D1MR4Ug9YsLtOS4Q9KAU9aku0pZS4RhBN%3D0LY3pJ49Ksg%40mail.gmail.com
>>
>> I'd be happy to write one of those, but it may take a day as I have
>> some other commitments.
>
> Please give it a go.  I'm dealing with putting out fires with
> customers while trying to make sure I have tested the predicate
> locking GUCs patch sufficiently.  (I think it's ready to go, and has
> passed all tests so far, but there are a few more I want to run.)
> I'm not sure I can come up with a solution faster than you, given
> that.  Since it is an improvement to performance for a test-only
> environment on a feature that is already committed and not causing
> problems for production environments, hopefully people will tolerate
> a fix a day or two out.  If not, we'll have to revert and get it
> into the first CF for v11.

Here is an attempt at option 2 from the menu I posted above.  Questions:

1.  Does anyone object to this extension of pg_blocking_pids()'s
remit?  If so, I could make it a separate function (that was option
3).

2.  Did I understand correctly that it is safe to scan the list of
SERIALIZABLEXACTs and access the possibleUnsafeConflicts list while
holding only SerializableXactHashLock, and that 'inLink' is the
correct link to be following?

-- 
Thomas Munro
http://www.enterprisedb.com


safe-snapshot-blocker-introspection.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance improvement for joins where outer side is unique

2017-04-07 Thread Tom Lane
David Rowley  writes:
[ unique_joins_2017-04-07b.patch ]

It turned out that this patch wasn't as close to committable as I'd
thought, but after a full day of whacking at it, I got to a place
where I thought it was OK.  So, pushed.

[ and that's a wrap for v10 feature freeze, I think ]

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Masahiko Sawada
On Sat, Apr 8, 2017 at 3:37 AM, Andres Freund  wrote:
> Hi,
>
> When I started writing this, there were the following remaining CF
> items, minus bugfix ones which aren't bound by the code freeze.
>
> I think it makes sense to go through those and see whether it's
> realistic to commit any of them.
>
> Ready for Committer:
>
> Add GUCs for predicate lock promotion thresholds:
> - claimed by Kevin, should be easy enough
>
> initdb configurable wal_segment_size
> - parts have been committed
> - significantly derailed by segment naming discussion
> - possibly committable if we decide to skip the naming bit? But also a
>   bit late given that it touches some quite sensitive code.
>
> Create fdw_outerpath for foreign
> - haven't really followed discussion
> - only marked as ready-for-committer 2017-04-04
>
> Vacuum: allow usage of more than 1GB of work mem
> - hm, maybe?  Will take a look.
>
> Unique Joins
> - Tom's discussing things with David, not sure.
>
> Push down more UPDATEs/DELETEs in postgres_fdw
> - claimed by Robert?
>
> postgres_fdw: support parameterized foreign joins
> - think that depends on fdw_outerpath?
>
>
> Waiting on Author:
>
> SQL statements statistics counter view (pg_stat_sql)
> - the code doesn't look quite ready
> - don't think we quite have design agreement, e.g. I don't like where it's
>   hooked into query execution
>
> Write Amplification Reduction Method (WARM)
> - fair number of people don't think it's ready for v10.
> - can't move to next fest because it's waiting-on-author, which doesn't
>   allow that.  Doesn't strike me as a useful restriction.
>
> BRIN optimize memory allocation
> - I think Alvaro has indicated that he wants to take care of that?
>
> Indexes with Included Columns (was Covering + unique indexes)
> - Don't think concerns about #columns on truncated tuples have been
>   addressed.  Should imo be returned-with-feedback.
>
>
> Needs-Review:
>
> Better estimate merging for duplicate vars in clausesel.c
> - has been submitted pretty late (2017-02-24) and discussion is ongoing
> - I'm inclined to punt on this one to the next release, previous
>   proposal along that line got some pushback
>
> new plpgsql extra_checks
> - Winner of the "most opaque CF title" award
> - hasn't received a whole lot of review
> - don't think we're even close to having design agreement
>
> Generic type subscripting
> - still some review back and forth
> - probably should be punted
>
>
> Any comments?
>

Hi,

Could you consider the item 2PC on FDW as well? It was marked as "Move
to Next CF" early yesterday, but I'm not sure of the reason.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recent deadlock regression test failures

2017-04-07 Thread Tom Lane
Thomas Munro  writes:
> Here is an attempt at option 2 from the menu I posted above.  Questions:

> 1.  Does anyone object to this extension of pg_blocking_pids()'s
> remit?  If so, I could make it a separate function (that was option
> 3).

It seems an entirely principle-free change in the function's definition.

I'm not actually clear on why Kevin wanted this change in
isolationtester's wait behavior anyway, so maybe some clarification
on that would be a good idea.  But if we need it, I think probably
a dedicated function would be a good thing.  We want the wait-checking
query to be as trivial as possible at the SQL level, so whatever
semantic oddities it needs to have should be pushed into C code.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Andres Freund
On 2017-04-07 16:28:03 -0300, Alvaro Herrera wrote:
> Peter Geoghegan wrote:
> > > - can't move to next fest because it's waiting-on-author, which doesn't
> > >   allow that.  Doesn't strike me as a useful restriction.
> > 
> > I agree that that CF app restriction makes little sense.
> 
> What the restriction means is that if a patch is in waiting-on-author,
> the proper "close" action is to return-with-feedback.  There is no point
> in moving the patch to the next commitfest if there is no further patch
> version.

That's true if the patch has been in that state for a while, but if you
find some relatively minor issues, and then move it soon after, it seems
to make sense to keep it open in the next CF.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Undefined psql variables

2017-04-07 Thread Pavel Stehule
2017-04-07 21:04 GMT+02:00 Fabien COELHO :

>
> Hello Pavel,
>
> I wish I could have an explanation about why the :?varname (or some other
>>> variant) syntax I suggested has a "namespace" issue.
>>>
>>> The advantage that I see is that although it is obviously ugly, it is
>>> ugly
>>> in the continuity of the various :["'?]varname syntaxes already offered
>>> and
>>> it allows to get rid of "defined varname" which does not look like SQL. A
>>> second advantage is that with the "defined" proposal
>>>
>>
>> I don't think this argument is valid - \if doesn't look like SQL either.
>>
>
> Sure. I'm talking about the expressions after the "\if", which should be as
> close to SQL as possible, I think. At least that is what Tom required about the
> expression syntax in pgbench, and I tend to agree that psql should avoid
> mixing in another language if possible.
>
>\if defined var1 and defined var2 or defined var3 and sqlrt() >= ..
>>>
>>> Would probably never work work, as it cannot be embedded in another
>>> expression, while it would work with
>>>
>>>\if :?var1 and :?var2 or :?var3 and ...
>>>
>>> I don't see any reason why the first should not work and the second should
>> work
>>
>
> Because of the mix of client-side and server-side stuff which needs to be
> interpreted. Let us consider:
>
>   \if EXISTS (SELECT * FROM tbl WHERE id=3) AND defined foo
>
> The "exists" is obviously executed server-side, but "defined foo" needs to
> be interpreted client-side, and it means that some parser client side would
> have been able to catch it in the middle of everything else. This example
> also illustrate my "does not look like SQL" point, as the first part is
> clearly SQL and the part after AND is not.
>
> With the second approach, ... "AND :?foo", the ":?foo" reference would be
> substituted directly by psql lexer and replaced on the fly by the answer,
> resulting in "AND TRUE" or "AND FALSE" depending, then the whole result
> (from EXISTS to TRUE/FALSE) could be interpreted server side to get an
> answer.
>
> Basically, catching :?varname seems easier/safer than catching "defined
> varname". I think that Tom's intent is that the defined expressions could
> not be mixed with SQL server side stuff, but I do not see why not, it is
> easy to imagine some use case where it would make sense.
>
> I have a different opinion - the condition expression does not need to be SQL.
>> This language is oriented toward client-side operations. What is the
>> benefit of a server-side expression?
>>
>
> Because I think it is legitimate to be able to write things like:
>
>   \if NOT pg_extension_is_loaded('units')
> \echo 'this application requires the great units extension'
> \q
>   \endif
>
>   \if (SELECT version FROM app_version) >= 2.0
> \echo 'application already installed at 2.0'
> \q
>   \endif
>
>
Your proposal disallows client-side expressions. I agree that it is not possible
to mix server-side and client-side expressions. But I am skeptical that the benefit
of server-side expressions is higher than the loss of client-side expressions.
If we disallow server-side expressions, then your examples are only two
lines longer, but the implementation can be much simpler.

SELECT version FROM  app_version
\gset
\if :version >= 2.0
 ...

Still, I don't think a server-side expression in \if is a good idea.

Regards

Pavel




> --
> Fabien.
>


Re: [HACKERS] Making clausesel.c Smarter

2017-04-07 Thread Claudio Freire
On Fri, Apr 7, 2017 at 2:28 AM, David Rowley
 wrote:
>> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
>> DEFAULT_INEQ_SEL)
>> + {
>> + /* No point in checking null selectivity, DEFAULT_INEQ_SEL
>> implies we have no stats */
>> + s2 = DEFAULT_RANGE_INEQ_SEL;
>> + }
>> + else
>> + {
>> ...
>> +   s2 = rqlist->hibound + rqlist->lobound - 1.0;
>> +   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
>> +   notnullsel = 1.0 - nullsel;
>> +
>> +   /* Adjust for double-exclusion of NULLs */
>> +   s2 += nullsel;
>> +   if (s2 <= 0.0) {
>> +  if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
>> +  {
>> +   /* Most likely contradictory clauses found */
>> +   s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
>> +  }
>> +  else
>> +  {
>> +   /* Could be a rounding error */
>> +   s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>> +  }
>> +   }
>> + }
>>
>> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
>> cautious) estimation of the amount of rounding error that could be
>> there with 32-bit floats.
>>
>> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
>> an estimation based on stats, maybe approximate, but one that makes
>> sense (ie: preserves the monotonicity of the CPF), and as such
>> negative values are either sign of a contradiction or rounding error.
>> The previous code left the possibility of negatives coming out of
>> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
>> but that doesn't seem possible coming out of scalarineqsel.
>
> I notice this does change the estimates for contradictory clauses such as:
>
> create table a (value int);
> insert into a select x/100 from generate_Series(1,1) x;
> analyze a;
> explain analyze select * from a where value between 101 and -1;
>
> We now get:
>
>  QUERY PLAN
> -
>  Seq Scan on a  (cost=0.00..195.00 rows=1 width=4) (actual
> time=1.904..1.904 rows=0 loops=1)
>Filter: ((value >= 101) AND (value <= '-1'::integer))
>Rows Removed by Filter: 1
>  Planning time: 0.671 ms
>  Execution time: 1.950 ms
> (5 rows)
>
> where before we'd get:
>
>   QUERY PLAN
> --
>  Seq Scan on a  (cost=0.00..195.00 rows=50 width=4) (actual
> time=0.903..0.903 rows=0 loops=1)
>Filter: ((value >= 101) AND (value <= '-1'::integer))
>Rows Removed by Filter: 1
>  Planning time: 0.162 ms
>  Execution time: 0.925 ms
> (5 rows)
>
> Which comes from the 1 * 0.005 selectivity estimate from tuples *
> DEFAULT_RANGE_INEQ_SEL
>
> I've attached a patch to this effect.

+/*
+ * A zero or slightly negative s2 should be converted into
+ * a small positive value; we probably are dealing with a
+ * very tight range and got a bogus result due to roundoff
+ * errors. However, if s2 is very negative, then we
+ * probably have default selectivity estimates on one or
+ * both sides of the range that we failed to recognize
+ * above for some reason.
+ */
+if (s2 <= 0.0)

That comment seems outdated

Otherwise, the patch LGTM, but I'd like to solve the quadratic
behavior too... are you going to try? Otherwise I could take a stab at
it myself. It doesn't seem very difficult.

Also, can you add a test case? Cost values could make the test
fragile, so if that gives you trouble I'll understand, but it'd be
best to try and test this if possible.
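
To make the thresholds concrete while a regression test is sorted out, here is
a tiny standalone program that mimics the clamping logic from the hunk quoted
above (an illustration only, not the actual clausesel.c code; the bound values
fed to it are made up):

#include <stdio.h>

#define DEFAULT_RANGE_INEQ_SEL 0.005

/*
 * Toy version of the clamping logic: combine the two one-sided selectivities
 * of "x >= lo AND x <= hi", correct for the double-exclusion of NULLs, and
 * clamp negative results.
 */
static double
range_sel(double hibound, double lobound, double nullsel)
{
    double notnullsel = 1.0 - nullsel;
    double s2 = hibound + lobound - 1.0;

    s2 += nullsel;
    if (s2 <= 0.0)
    {
        if (s2 <= -1.0e-4 * notnullsel - 1.0e-6)
            s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;  /* contradiction */
        else
            s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;            /* likely roundoff */
    }
    return s2;
}

int
main(void)
{
    /* "value BETWEEN 101 AND -1" on 0..100: both one-sided estimates are ~0,
     * so the clause is recognized as contradictory and the row estimate
     * collapses to ~1 row instead of tuples * DEFAULT_RANGE_INEQ_SEL. */
    printf("contradictory range: %.2e\n", range_sel(0.0, 0.0, 0.0));
    /* an exactly-zero result is treated as roundoff, not a contradiction */
    printf("zero-width range:    %.4f\n", range_sel(0.5, 0.5, 0.0));
    /* a sane range covering about 30% of the non-null rows */
    printf("normal range:        %.3f\n", range_sel(0.65, 0.65, 0.0));
    return 0;
}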


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-07 Thread Claudio Freire
On Fri, Apr 7, 2017 at 5:05 PM, Andres Freund  wrote:
> Hi,
>
> I've *not* read the history of this thread.  So I really might be
> missing some context.
>
>
>> From e37d29c26210a0f23cd2e9fe18a264312fecd383 Mon Sep 17 00:00:00 2001
>> From: Claudio Freire 
>> Date: Mon, 12 Sep 2016 23:36:42 -0300
>> Subject: [PATCH] Vacuum: allow using more than 1GB work mem
>>
>> Turn the dead_tuples array into a structure composed of several
>> exponentially bigger arrays, to enable usage of more than 1GB
>> of work mem during vacuum and thus reduce the number of full
>> index scans necessary to remove all dead tids when the memory is
>> available.
>
>>   * We are willing to use at most maintenance_work_mem (or perhaps
>>   * autovacuum_work_mem) memory space to keep track of dead tuples.  We
>> - * initially allocate an array of TIDs of that size, with an upper limit 
>> that
>> + * initially allocate an array of TIDs of 128MB, or an upper limit that
>>   * depends on table size (this limit ensures we don't allocate a huge area
>> - * uselessly for vacuuming small tables).  If the array threatens to 
>> overflow,
>> - * we suspend the heap scan phase and perform a pass of index cleanup and 
>> page
>> - * compaction, then resume the heap scan with an empty TID array.
>> + * uselessly for vacuuming small tables). Additional arrays of increasingly
>> + * large sizes are allocated as they become necessary.
>> + *
>> + * The TID array is thus represented as a list of multiple segments of
>> + * varying size, beginning with the initial size of up to 128MB, and growing
>> + * exponentially until the whole budget of (autovacuum_)maintenance_work_mem
>> + * is used up.
>
> When the chunk size is 128MB, I'm a bit unconvinced that using
> exponential growth is worth it. The allocator overhead can't be
> meaningful in comparison to collecting 128MB dead tuples, the potential
> waste is pretty big, and it increases memory fragmentation.

The exponential strategy is mainly to improve lookup time (ie: to
avoid large segment lists).

>> + * Lookup in that structure proceeds sequentially in the list of segments,
>> + * and with a binary search within each segment. Since segment's size grows
>> + * exponentially, this retains O(N log N) lookup complexity.
>
> N log N is a horrible lookup complexity.  That's the complexity of
> *sorting* an entire array.  I think you might be trying to argue that
> it's log(N) * log(N)? Once log(n) for the exponentially growing size of
> segments, one for the binary search?
>
> Afaics you could quite easily make it O(2 log(N)) by simply also doing
> binary search over the segments.  Might not be worth it due to the small
> constant involved normally.

It's a typo, yes, I meant O(log N) (which is equivalent to O(2 log N))

>> + * If the array threatens to overflow, we suspend the heap scan phase and
>> + * perform a pass of index cleanup and page compaction, then resume the heap
>> + * scan with an array of logically empty but already preallocated TID 
>> segments
>> + * to be refilled with more dead tuple TIDs.
>
> Hm, it's not really the array that overflows, it's m_w_m that'd be
> exceeded, right?

Yes, will rephrase. Although that's how the original comment expressed
the same concept.

>>  /*
>> + * Minimum (starting) size of the dead_tuples array segments. Will allocate
>> + * space for 128MB worth of tid pointers in the first segment, further 
>> segments
>> + * will grow in size exponentially. Don't make it too small or the segment 
>> list
>> + * will grow bigger than the sweetspot for search efficiency on big vacuums.
>> + */
>> +#define LAZY_MIN_TUPLES  Max(MaxHeapTuplesPerPage, (128<<20) / 
>> sizeof(ItemPointerData))
>
> That's not really the minimum, no? s/MIN/INIT/?

Ok

>> +typedef struct DeadTuplesSegment
>> +{
>> + int num_dead_tuples;/* # of entries in the 
>> segment */
>> + int max_dead_tuples;/* # of entries 
>> allocated in the segment */
>> + ItemPointerData last_dead_tuple;/* Copy of the last dead tuple 
>> (unset
>> +
>>   * until the segment is fully
>> +
>>   * populated) */
>> + unsigned short padding;
>> + ItemPointer dt_tids;/* Array of dead tuples */
>> +}DeadTuplesSegment;
>
> Whenever padding is needed, it should have an explanatory comment.  It's
> certainly not obvious to me why it's needed here.

Ok

>> @@ -1598,6 +1657,11 @@ lazy_vacuum_index(Relation indrel,
>>   ivinfo.num_heap_tuples = vacrelstats->old_rel_tuples;
>>   ivinfo.strategy = vac_strategy;
>>
>> + /* Finalize the current segment by setting its upper bound dead tuple 
>> */
>> + seg = DeadTuplesCurrentSegment(vacrelstats);
>> + if (seg->num_dead_tuples > 0)
>> + 

Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-07 Thread Claudio Freire
On Fri, Apr 7, 2017 at 7:43 PM, Claudio Freire  wrote:
>>> + * Lookup in that structure proceeds sequentially in the list of segments,
>>> + * and with a binary search within each segment. Since segment's size grows
>>> + * exponentially, this retains O(N log N) lookup complexity.
>>
>> N log N is a horrible lookup complexity.  That's the complexity of
>> *sorting* an entire array.  I think you might be trying to argue that
>> it's log(N) * log(N)? Once log(n) for the exponentially growing size of
>> segments, one for the binary search?
>>
>> Afaics you could quite easily make it O(2 log(N)) by simply also doing
>> binary search over the segments.  Might not be worth it due to the small
>> constant involved normally.
>
> It's a typo, yes, I meant O(log N) (which is equivalent to O(2 log N))


To clarify, lookup over the segments is linear, so it's O(M) with M
the number of segments, then the binary search is O(log N) with N the
number of dead tuples.

So lookup is O(M + log N), but M < log N because of the segment's
exponential growth, therefore the lookup is O(2 log N)
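
For anyone skimming the thread, here is a small self-contained sketch of that
two-level lookup, simplified to sorted int64 keys instead of ItemPointers (so
it is an illustration, not the patch's code). The posted patch walks the
segment list sequentially, while this sketch uses the binary-search-over-
segments variant Andres suggests; the inner search is the same either way.

#include <stdint.h>
#include <stdio.h>

typedef struct
{
    int         ntuples;    /* entries used in this segment */
    int64_t    *keys;       /* sorted keys (stand-in for sorted TIDs) */
    int64_t     last_key;   /* copy of keys[ntuples - 1], cached so the outer
                             * search never touches the segment's array */
} Segment;

/* Binary search over segments (by cached upper bound), then within one. */
static int
key_is_dead(const Segment *segs, int nsegs, int64_t key)
{
    int lo = 0, hi = nsegs - 1;

    while (lo < hi)             /* find first segment whose last_key >= key */
    {
        int mid = (lo + hi) / 2;

        if (segs[mid].last_key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (segs[lo].last_key < key)
        return 0;               /* larger than anything we have */

    {                           /* standard binary search inside that segment */
        const Segment *s = &segs[lo];
        int l = 0, r = s->ntuples - 1;

        while (l <= r)
        {
            int m = (l + r) / 2;

            if (s->keys[m] == key)
                return 1;
            if (s->keys[m] < key)
                l = m + 1;
            else
                r = m - 1;
        }
    }
    return 0;
}

int
main(void)
{
    int64_t a[] = {1, 4, 9};
    int64_t b[] = {12, 20, 21, 35};
    Segment segs[] = {{3, a, 9}, {4, b, 35}};

    printf("20 dead? %d\n", key_is_dead(segs, 2, 20));  /* 1 */
    printf("10 dead? %d\n", key_is_dead(segs, 2, 10));  /* 0 */
    printf("99 dead? %d\n", key_is_dead(segs, 2, 99));  /* 0 */
    return 0;
}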


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging problem in 9.4.3?

2017-04-07 Thread Alvaro Herrera
I have claimed this patch as committer FWIW.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging problem in 9.4.3?

2017-04-07 Thread Alvaro Herrera
Alvaro Herrera wrote:

> I suppose the rationale is that this shouldn't happen because any
> operation that does things this way must hold an exclusive lock on the
> relation.  But that doesn't guarantee that the relcache entry is
> completely stable, does it?  If we can get proof of that, then this
> technique should be safe, I think.

It occurs to me that in order to test this we could run the recovery
tests (including Michael's new 006 file, which you didn't include in
your patch) under -D CLOBBER_CACHE_ALWAYS.  I think that'd be sufficient
proof that it is solid.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables vs GRANT

2017-04-07 Thread Tom Lane
Keith Fiske  writes:
> On Fri, Apr 7, 2017 at 2:46 PM, Tom Lane  wrote:
>> Joe Conway  writes:
>>> Apparently INSERT and SELECT on the parent partitioned table skip normal
>>> acl checks on the partitions. Is that intended behavior?

>> Yes, this matches normal inheritance behavior.

> Should that really be normal partitioning behavior though?

Yes, it should.  Consider the alternatives:

1. Owner must remember to run around and grant permissions on all child
tables along with the parent.

2. The system silently(?) doesn't show you some rows that are supposed
to be visible when scanning the parent table.

If you want RLS, use RLS; this is not that, and is not a good substitute.

(We've been around on this topic before, btw.  See the archives.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-07 Thread Andres Freund
Hi,


On 2017-04-07 19:43:39 -0300, Claudio Freire wrote:
> On Fri, Apr 7, 2017 at 5:05 PM, Andres Freund  wrote:
> > Hi,
> >
> > I've *not* read the history of this thread.  So I really might be
> > missing some context.
> >
> >
> >> From e37d29c26210a0f23cd2e9fe18a264312fecd383 Mon Sep 17 00:00:00 2001
> >> From: Claudio Freire 
> >> Date: Mon, 12 Sep 2016 23:36:42 -0300
> >> Subject: [PATCH] Vacuum: allow using more than 1GB work mem
> >>
> >> Turn the dead_tuples array into a structure composed of several
> >> exponentially bigger arrays, to enable usage of more than 1GB
> >> of work mem during vacuum and thus reduce the number of full
> >> index scans necessary to remove all dead tids when the memory is
> >> available.
> >
> >>   * We are willing to use at most maintenance_work_mem (or perhaps
> >>   * autovacuum_work_mem) memory space to keep track of dead tuples.  We
> >> - * initially allocate an array of TIDs of that size, with an upper limit 
> >> that
> >> + * initially allocate an array of TIDs of 128MB, or an upper limit that
> >>   * depends on table size (this limit ensures we don't allocate a huge area
> >> - * uselessly for vacuuming small tables).  If the array threatens to 
> >> overflow,
> >> - * we suspend the heap scan phase and perform a pass of index cleanup and 
> >> page
> >> - * compaction, then resume the heap scan with an empty TID array.
> >> + * uselessly for vacuuming small tables). Additional arrays of 
> >> increasingly
> >> + * large sizes are allocated as they become necessary.
> >> + *
> >> + * The TID array is thus represented as a list of multiple segments of
> >> + * varying size, beginning with the initial size of up to 128MB, and 
> >> growing
> >> + * exponentially until the whole budget of 
> >> (autovacuum_)maintenance_work_mem
> >> + * is used up.
> >
> > When the chunk size is 128MB, I'm a bit unconvinced that using
> > exponential growth is worth it. The allocator overhead can't be
> > meaningful in comparison to collecting 128MB dead tuples, the potential
> > waste is pretty big, and it increases memory fragmentation.
> 
> The exponential strategy is mainly to improve lookup time (ie: to
> avoid large segment lists).

Well, if we were to do binary search on the segment list, that'd not be
necessary.

> >> + if (seg->num_dead_tuples >= seg->max_dead_tuples)
> >> + {
> >> + /*
> >> +  * The segment is overflowing, so we must allocate a 
> >> new segment.
> >> +  * We could have a preallocated segment descriptor 
> >> already, in
> >> +  * which case we just reinitialize it, or we may 
> >> need to repalloc
> >> +  * the vacrelstats->dead_tuples array. In that case, 
> >> seg will no
> >> +  * longer be valid, so we must be careful about 
> >> that. In any case,
> >> +  * we must update the last_dead_tuple copy in the 
> >> overflowing
> >> +  * segment descriptor.
> >> +  */
> >> + Assert(seg->num_dead_tuples == seg->max_dead_tuples);
> >> + seg->last_dead_tuple = 
> >> seg->dt_tids[seg->num_dead_tuples - 1];
> >> + if (vacrelstats->dead_tuples.last_seg + 1 >= 
> >> vacrelstats->dead_tuples.num_segs)
> >> + {
> >> + int new_num_segs = 
> >> vacrelstats->dead_tuples.num_segs * 2;
> >> +
> >> + vacrelstats->dead_tuples.dt_segments = 
> >> (DeadTuplesSegment *) repalloc(
> >> +(void *) 
> >> vacrelstats->dead_tuples.dt_segments,
> >> +
> >> new_num_segs * sizeof(DeadTuplesSegment));
> >
> > Might be worth breaking this into some sub-statements, it's quite hard
> > to read.
> 
> Breaking what precisely? The comment?

No, the three-line statement computing the new value of
dead_tuples.dt_segments.  I'd at least assign dead_tuples to a local
variable, to cut the length of the statement down.


> >> + while (vacrelstats->dead_tuples.num_segs < 
> >> new_num_segs)
> >> + {
> >> + /* Initialize as "unallocated" */
> >> + DeadTuplesSegment *nseg = 
> >> &(vacrelstats->dead_tuples.dt_segments[
> >> +  
> >> vacrelstats->dead_tuples.num_segs]);
> >
> > dito.
> 
> I don't really get what you're asking here.

Trying to simplify/shorten the statement.


> >> +/*
> >>   *   lazy_tid_reaped() -- is a particular tid deletable?
> >>   *
> >>   *   This has the right signature to be an 
> >> IndexBulkDeleteCallback.
> >>   *
> >> - *   Assumes 

Re: [pgsql-www] [HACKERS] Small issue in online devel documentation build

2017-04-07 Thread Bruce Momjian
On Fri, Mar 24, 2017 at 07:01:46AM +0100, Fabien COELHO wrote:
> 
> Hello Peter,
> 
> >I think the fix belongs into the web site CSS, so there is nothing to
> >commit into PostgreSQL here.
> 
> Indeed, the changes were only for the "remove nesting" solution.
> 
> >I will close the commit fest entry, but I have added a section to the open
> >items list so we keep track of it. 
> >(https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items#Documentation_tool_chain)
> 
> I put forward that the quick workaround a colleague of mine suggested (aka
> something like code code { font-size: 100%; important! }) could also be
> applied to the web site CSS while waiting for a more definite answer which
> might take some pretty unknown time close to never?

Sorry I am just getting back to this.  Below I am going to cover only
the problem with the font size of nested code tags, and I am going to
confirm what most people already figured out.

The basic problem was already posted by Fabien, with an image example. 
The cause of the fonts being too large on Chrome is an interaction of
Chrome's default font size for different blocks, the JavaScript that is
meant to fix such mismatches, and the new nested code blocks in the PG
10 docs.

First, the JavaScript:

https://github.com/postgres/pgweb/blob/master/media/js/monospacefix.js

There is no git history for this file except for its initial checkin in
2011, but I am pretty sure I wrote it.  What it does is to create 
and  blocks, find the font point size, and compute a ratio.  If the
ratio is not 1, , , and  blocks are adjusted in size to
match .  The complex part is that the JavaScript conditionally
injects CSS into the web-page to accomplish this.

The reason the PG 10 docs look fine on Linux Firefox is because the font
point sizes match, so no CSS is injected.  They don't match on Chrome,
so the CSS is injected.  When the CSS hits double-embedded code blocks
(a code element nested inside another code element), it makes the font too
large because it double-adjusts.

The fix, as Fabien identified, is to conditionally inject additional CSS
to be _more_ specific than the first CSS and set the font-size to a
simple '1em' so the first CSS is not called twice.  I don't think
'important!' is necessary but it would be good to test this.

Attached is a patch that can be applied to pgweb which should fix all of
this.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +
diff --git a/media/js/monospacefix.js b/media/js/monospacefix.js
new file mode 100644
index 8523932..42ce913
*** a/media/js/monospacefix.js
--- b/media/js/monospacefix.js
*** if (newMonoSize != 1)
*** 19,24 
  {
  document.write(''
  	+ '#docContainer tt, #docContainer pre, #docContainer code'
! 	+ '{font-size: ' + newMonoSize.toFixed(1) + 'em;}\n');
  }
  
--- 19,27 
  {
  document.write(''
  	+ '#docContainer tt, #docContainer pre, #docContainer code'
! 	+ '{font-size: ' + newMonoSize.toFixed(1) + 'em;}\n'
! 	/* prevent embedded code tags from changing font size */
! 	+ '#docContainer code code'
! 	+ '{font-size: 1em;}\n');
  }
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Improve 64bit atomics support.

2017-04-07 Thread Andres Freund
On 2017-04-07 19:55:21 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > Improve 64bit atomics support.
> > 
> > When adding atomics back in b64d92f1a, I added 64bit support as
> > optional; there wasn't yet a direct user in sight.  That turned out to
> > be a bit short-sighted, it'd already have been useful a number of times.
> 
> Seems like this killed an arapaima:
>   
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=arapaima=2017-04-07%2022%3A06%3A59
> 
> Program terminated with signal 6, Aborted.
> #0  0x00c6a402 in __kernel_vsyscall ()
> #0  0x00c6a402 in __kernel_vsyscall ()
> #1  0x00284b10 in raise () from /lib/libc.so.6
> #2  0x00286421 in abort () from /lib/libc.so.6
> #3  0x084d967e in ExceptionalCondition (
> conditionName=0xe19dac "(((uintptr_t) ((uintptr_t)(ptr)) + ((8) - 1)) & 
> ~((uintptr_t) ((8) - 1))) != (uintptr_t)(ptr)", 
> errorType=0xe19831 "UnalignedPointer", 
> fileName=0xe19d88 "../../../src/include/port/atomics.h", lineNumber=428)
> at assert.c:54
> #4  0x00e189b0 in pg_atomic_init_u64 ()
> at ../../../src/include/port/atomics.h:428

Gah, that's fairly annoying :(.  We can't trivially force alignment in
the generic fallback case, because not all compilers support that.  We
don't really need it in the fallback case, because things are protected by
a lock - but that means we'll have to make a bunch of locks conditional
:/
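
To see the failure mode outside the tree: on a 32-bit ABI a bare uint64 struct
member typically gets only 4-byte alignment, which is exactly what the
assertion above rejects. A minimal standalone sketch (compile with and without
-m32 to compare the offsets; _Alignas is the C11 spelling of "force 8-byte
alignment", i.e. the thing not all of our supported compilers have):

#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

typedef struct
{
    uint32_t a;
    uint64_t val;               /* only 4-byte aligned on i386 */
} naive;

typedef struct
{
    uint32_t a;
    _Alignas(8) uint64_t val;   /* C11: force 8-byte alignment */
} forced;

int
main(void)
{
    /* 4 vs 8 with gcc -m32; both 8 on x86-64, where the ABI already aligns */
    printf("offset of val in naive struct:  %zu\n", offsetof(naive, val));
    printf("offset of val in forced struct: %zu\n", offsetof(forced, val));
    return 0;
}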

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging problem in 9.4.3?

2017-04-07 Thread Tom Lane
Alvaro Herrera  writes:
> Interesting.  I wonder if it's possible that a relcache invalidation
> would cause these values to get lost for some reason, because that would
> be dangerous.

> I suppose the rationale is that this shouldn't happen because any
> operation that does things this way must hold an exclusive lock on the
> relation.  But that doesn't guarantee that the relcache entry is
> completely stable,

It ABSOLUTELY is not safe.  Relcache flushes can happen regardless of
how strong a lock you hold.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-07 Thread Andres Freund
On 2017-04-07 22:06:13 -0300, Claudio Freire wrote:
> On Fri, Apr 7, 2017 at 9:56 PM, Andres Freund  wrote:
> > Hi,
> >
> >
> > On 2017-04-07 19:43:39 -0300, Claudio Freire wrote:
> >> On Fri, Apr 7, 2017 at 5:05 PM, Andres Freund  wrote:
> >> > Hi,
> >> >
> >> > I've *not* read the history of this thread.  So I really might be
> >> > missing some context.
> >> >
> >> >
> >> >> From e37d29c26210a0f23cd2e9fe18a264312fecd383 Mon Sep 17 00:00:00 2001
> >> >> From: Claudio Freire 
> >> >> Date: Mon, 12 Sep 2016 23:36:42 -0300
> >> >> Subject: [PATCH] Vacuum: allow using more than 1GB work mem
> >> >>
> >> >> Turn the dead_tuples array into a structure composed of several
> >> >> exponentially bigger arrays, to enable usage of more than 1GB
> >> >> of work mem during vacuum and thus reduce the number of full
> >> >> index scans necessary to remove all dead tids when the memory is
> >> >> available.
> >> >
> >> >>   * We are willing to use at most maintenance_work_mem (or perhaps
> >> >>   * autovacuum_work_mem) memory space to keep track of dead tuples.  We
> >> >> - * initially allocate an array of TIDs of that size, with an upper 
> >> >> limit that
> >> >> + * initially allocate an array of TIDs of 128MB, or an upper limit that
> >> >>   * depends on table size (this limit ensures we don't allocate a huge 
> >> >> area
> >> >> - * uselessly for vacuuming small tables).  If the array threatens to 
> >> >> overflow,
> >> >> - * we suspend the heap scan phase and perform a pass of index cleanup 
> >> >> and page
> >> >> - * compaction, then resume the heap scan with an empty TID array.
> >> >> + * uselessly for vacuuming small tables). Additional arrays of 
> >> >> increasingly
> >> >> + * large sizes are allocated as they become necessary.
> >> >> + *
> >> >> + * The TID array is thus represented as a list of multiple segments of
> >> >> + * varying size, beginning with the initial size of up to 128MB, and 
> >> >> growing
> >> >> + * exponentially until the whole budget of 
> >> >> (autovacuum_)maintenance_work_mem
> >> >> + * is used up.
> >> >
> >> > When the chunk size is 128MB, I'm a bit unconvinced that using
> >> > exponential growth is worth it. The allocator overhead can't be
> >> > meaningful in comparison to collecting 128MB dead tuples, the potential
> >> > waste is pretty big, and it increases memory fragmentation.
> >>
> >> The exponential strategy is mainly to improve lookup time (ie: to
> >> avoid large segment lists).
> >
> > Well, if we were to do binary search on the segment list, that'd not be
> > necessary.
> 
> True, but the initial lookup might be slower in the end, since the
> array would be bigger and cache locality worse.
> 
> Why do you say exponential growth fragments memory? AFAIK, all those
> allocations are well beyond the point where malloc starts mmaping
> memory, so each of those segments should be a mmap segment,
> independently freeable.

Not all platforms have that, and even on platforms with it, frequent,
unevenly sized, very large allocations can lead to enough fragmentation
that further allocations are harder and fragment / enlarge the
pagetable.


> >> Yes, the benchmarks are upthread. The earlier runs were run on my
> >> laptop and made little sense, so I'd ignore them as inaccurate. The
> >> latest run[1] with a pgbench scale of 4000 gave an improvement in CPU
> >> time (ie: faster) of about 20%. Anastasia did another one[2] and saw
> >> improvements as well, roughly 30%, though it's not measuring CPU time
> >> but rather elapsed time.
> >
> > I'd be more concerned about cases that'd already fit into memory, not ones
> > where we avoid doing another scan - and I think you mostly measured that?
> >
> > - Andres
> 
> Well, scale 400 is pretty much as big as you can get with the old 1GB
> limit, and also suffered no significant regression. Although, true, it
> didn't significantly improve either.

Aren't more interesting cases those where not that many dead tuples are
found, but the indexes are pretty large?  IIRC the index vacuum scans
still visit every leaf index tuple, no?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables vs GRANT

2017-04-07 Thread Keith Fiske
On Fri, Apr 7, 2017 at 8:41 PM, Tom Lane  wrote:

> Keith Fiske  writes:
> > On Fri, Apr 7, 2017 at 2:46 PM, Tom Lane  wrote:
> >> Joe Conway  writes:
> >>> Apparently INSERT and SELECT on the parent partitioned table skip
> normal
> >>> acl checks on the partitions. Is that intended behavior?
>
> >> Yes, this matches normal inheritance behavior.
>
> > Should that really be normal partitioning behavior though?
>
> Yes, it should.  Consider the alternatives:
>
> 1. Owner must remember to run around and grant permissions on all child
> tables along with the parent.
>

I'm not following. That's what Joe is saying is happening now. The child
tables are not getting the parent privileges, so this is what the owner must
remember to do every time they add a new child if they want the role to be
able to interact directly with the children. They can select, insert, etc.
with the parent, but any direct interaction with the child is denied. I
know you're all trying to make the planner work so queries work efficiently
from the parent, but they'll never be as good as being able to hit the
child tables directly if they know where the data they want is. Why even
leave the child tables visible at all if they can't be interacted with the
same as the parent? I thought that was supposed to be one of the advantages
of doing partitioning this way vs. how Oracle & MySQL do it.


> 2. The system silently(?) doesn't show you some rows that are supposed
> to be visible when scanning the parent table.
>

> If you want RLS, use RLS; this is not that, and is not a good substitute.
>

Agreed. It appears the rows are visible if the role has select privileges
on the parent. But they cannot select directly from children. Not sure what
this has to do with RLS.


>
> (We've been around on this topic before, btw.  See the archives.)
>
> regards, tom lane
>


Re: [HACKERS] mvstats triggers 32bit warnings

2017-04-07 Thread Alvaro Herrera
Andres Freund wrote:
> Hi,
> 
> compiling on linux 32 bit I get a lot of warnings due to printf format.
> I suspect most of them should just be cured by using %zd or %zu instead
> of %ld.

You're right, they are.  Confirmed, and pushed fix using %zd.  I suppose
%zu would be "more correct", but this doesn't raise warnings anymore.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] monitoring.sgml missing tag

2017-04-07 Thread Peter Eisentraut
On 4/7/17 16:47, Erik Rijkers wrote:
> monitoring.sgml has one  tag missing

fixed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables vs GRANT

2017-04-07 Thread Keith Fiske
On Fri, Apr 7, 2017 at 2:46 PM, Tom Lane  wrote:

> Joe Conway  writes:
> > Apparently INSERT and SELECT on the parent partitioned table skip normal
> > acl checks on the partitions. Is that intended behavior?
>
> Yes, this matches normal inheritance behavior.
>
> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Should that really be normal partitioning behavior though? Pretty sure
people would expect child tables to have consistent permissions in a
partition set and I'd think setting them on the parent should be what they
expect the children to have.

Keith


Re: [HACKERS] partitioned tables and contrib/sepgsql

2017-04-07 Thread Robert Haas
On Fri, Apr 7, 2017 at 5:22 PM, Joe Conway  wrote:
> On 04/07/2017 11:37 AM, Mike Palmiotto wrote:
 I found some missing bits in the 0002 patch -- new version attached.
 Will wait on new regression tests before committing, but I expect we'll
 have those by end of today and be able to commit the rest tomorrow.
>>>
>>> Attached are the regression test updates for partitioned tables.
>>
>> Actually attached this time.
>
> Based on my review and testing of the 0002 patch I believe it is
> correct. However Mike and I just went through the regression test patch
> line by line and there are issues he needs to address -- there is no way
> that is happening by tonight as the output is very verbose and we need
> to be sure we are both testing the correct things and getting the
> correct behaviors.
>
> Based on that I can:
>
> 1) commit the 0002 patch now before the feature freeze and follow up
>with the regression test patch when ready in a couple of days
> 2) hold off on both patches until ready
> 3) push both patches to the next commitfest/pg11
>
> Some argue this is an open issue against the new partitioning feature in
> pg10 and therefore should be addressed now, and others do not. I can see
> both sides of that argument.
>
> In any case, thoughts on what to do?

Speaking only for myself, I'm OK with any of those options, provided
that that "a couple" means what my dictionary says it means.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-07 Thread Claudio Freire
On Fri, Apr 7, 2017 at 10:12 PM, Andres Freund  wrote:
> On 2017-04-07 22:06:13 -0300, Claudio Freire wrote:
>> On Fri, Apr 7, 2017 at 9:56 PM, Andres Freund  wrote:
>> > Hi,
>> >
>> >
>> > On 2017-04-07 19:43:39 -0300, Claudio Freire wrote:
>> >> On Fri, Apr 7, 2017 at 5:05 PM, Andres Freund  wrote:
>> >> > Hi,
>> >> >
>> >> > I've *not* read the history of this thread.  So I really might be
>> >> > missing some context.
>> >> >
>> >> >
>> >> >> From e37d29c26210a0f23cd2e9fe18a264312fecd383 Mon Sep 17 00:00:00 2001
>> >> >> From: Claudio Freire 
>> >> >> Date: Mon, 12 Sep 2016 23:36:42 -0300
>> >> >> Subject: [PATCH] Vacuum: allow using more than 1GB work mem
>> >> >>
>> >> >> Turn the dead_tuples array into a structure composed of several
>> >> >> exponentially bigger arrays, to enable usage of more than 1GB
>> >> >> of work mem during vacuum and thus reduce the number of full
>> >> >> index scans necessary to remove all dead tids when the memory is
>> >> >> available.
>> >> >
>> >> >>   * We are willing to use at most maintenance_work_mem (or perhaps
>> >> >>   * autovacuum_work_mem) memory space to keep track of dead tuples.  We
>> >> >> - * initially allocate an array of TIDs of that size, with an upper 
>> >> >> limit that
>> >> >> + * initially allocate an array of TIDs of 128MB, or an upper limit 
>> >> >> that
>> >> >>   * depends on table size (this limit ensures we don't allocate a huge 
>> >> >> area
>> >> >> - * uselessly for vacuuming small tables).  If the array threatens to 
>> >> >> overflow,
>> >> >> - * we suspend the heap scan phase and perform a pass of index cleanup 
>> >> >> and page
>> >> >> - * compaction, then resume the heap scan with an empty TID array.
>> >> >> + * uselessly for vacuuming small tables). Additional arrays of 
>> >> >> increasingly
>> >> >> + * large sizes are allocated as they become necessary.
>> >> >> + *
>> >> >> + * The TID array is thus represented as a list of multiple segments of
>> >> >> + * varying size, beginning with the initial size of up to 128MB, and 
>> >> >> growing
>> >> >> + * exponentially until the whole budget of 
>> >> >> (autovacuum_)maintenance_work_mem
>> >> >> + * is used up.
>> >> >
>> >> > When the chunk size is 128MB, I'm a bit unconvinced that using
>> >> > exponential growth is worth it. The allocator overhead can't be
>> >> > meaningful in comparison to collecting 128MB dead tuples, the potential
>> >> > waste is pretty big, and it increases memory fragmentation.
>> >>
>> >> The exponential strategy is mainly to improve lookup time (ie: to
>> >> avoid large segment lists).
>> >
>> > Well, if we were to do binary search on the segment list, that'd not be
>> > necessary.
>>
>> True, but the initial lookup might be slower in the end, since the
>> array would be bigger and cache locality worse.
>>
>> Why do you say exponential growth fragments memory? AFAIK, all those
>> allocations are well beyond the point where malloc starts mmaping
>> memory, so each of those segments should be a mmap segment,
>> independently freeable.
>
> Not all platforms have that, and even on platforms with it, frequent,
> unevenly sized, very large allocations can lead to enough fragmentation
> that further allocations are harder and fragment / enlarge the
> pagetable.

I wouldn't call this frequent. You can get at most slightly more than
a dozen such allocations given the current limits.
And allocation sizes are quite regular - you get 128M or multiples of
128M, so each free block can be reused for N smaller allocations if
needed. I don't think it has much potential to fragment memory.

This isn't significantly different from tuplesort or any other code
that can do big allocations, and the differences favor less
fragmentation than those, so I don't see why this would need special
treatment.

My point being that it has not been simple to get to a point where this
beats the original single binary search even in CPU time.
If we're to scrap this implementation and go for a double binary
search, I'd like to have a clear measurable benefit to chase from
doing so. Fragmentation is hard to measure, and I cannot get CPU-bound
vacuums on the test hardware I have to test lookup performance at big
scales.

>> >> Yes, the benchmarks are upthread. The earlier runs were run on my
>> >> laptop and made little sense, so I'd ignore them as inaccurate. The
>> >> latest run[1] with a pgbench scale of 4000 gave an improvement in CPU
>> >> time (ie: faster) of about 20%. Anastasia did another one[2] and saw
>> >> improvements as well, roughly 30%, though it's not measuring CPU time
>> >> but rather elapsed time.
>> >
>> > I'd be more concerned about cases that'd already fit into memory, not ones
>> > where we avoid doing another scan - and I think you mostly measured that?
>> >
>> > - Andres
>>
>> Well, scale 400 is pretty much as big as you can get with the old 1GB
>> limit, and also suffered no 

[HACKERS] shift_sjis_2004 related authority files are remaining

2017-04-07 Thread Kyotaro HORIGUCHI
Hi, I happened to notice that the backend/utils/mb/Unicode directory
contains two encoding authority files, which I believe should not
be there.

euc-jis-2004-std.txt
sjis-0213-2004-std.txt

And what is more astonishing, make distclean didn't do its work.

| $ make distclean
| rm -f 

The Makefile there is missing the definition of TEXTS.

# Sorry for the earlier bogus patch from me.

The attached is the *first patch* that fixes distclean and adds
the two files into GENERICTEXTS.

=

I don't attach the *second* patch since it's too large for the
triviality and can be made by the following steps.

$ cd src/backend/utils/mb/Unicode
$ git rm *.txt
$ git commit


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From 59c069baaee7a4125fe7071e999c9b2a9d0e40d2 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 7 Apr 2017 14:33:56 +0900
Subject: [PATCH 1/2] Fix distclean of src/backend/utils/mb/Unicode

The Makefile there is missing the target of make distclean.
This restores the definition.

diff --git a/src/backend/utils/mb/Unicode/Makefile b/src/backend/utils/mb/Unicode/Makefile
index 8f3afa0..c06b7a1 100644
--- a/src/backend/utils/mb/Unicode/Makefile
+++ b/src/backend/utils/mb/Unicode/Makefile
@@ -68,7 +68,9 @@ WINTEXTS = CP866.TXT CP874.TXT CP936.TXT \
 	CP1256.TXT CP1257.TXT CP1258.TXT
 
 GENERICTEXTS = $(ISO8859TEXTS) $(WINTEXTS) \
-	KOI8-R.TXT KOI8-U.TXT
+	KOI8-R.TXT KOI8-U.TXT sjis-0213-2004-std.txt euc-jis-2004-std.txt
+
+TEXTS = $(GENERICTEXTS) $(ISO8859TEXTS) $(WINTEXTS)
 
 all: $(MAPS)
 
-- 
2.9.2


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Append implementation

2017-04-07 Thread Amit Khandekar
On 6 April 2017 at 07:33, Andres Freund  wrote:
> On 2017-04-05 14:52:38 +0530, Amit Khandekar wrote:
>> This is what the earlier versions of my patch had done : just add up
>> per-subplan parallel_workers (1 for non-partial subplan and
>> subpath->parallel_workers for partial subplans) and set this total as
>> the Append parallel_workers.
>
> I don't think that's great, consider e.g. the case that you have one
> very expensive query, and a bunch of cheaper ones. Most of those workers
> wouldn't do much while waiting for the the expensive query.  What I'm
> basically thinking we should do is something like the following
> pythonesque pseudocode:
>
> best_nonpartial_cost = -1
> best_nonpartial_nworkers = -1
>
> for numworkers in 1...#max workers:
>worker_work = [0 for x in range(0, numworkers)]
>
>nonpartial_cost += startup_cost * numworkers
>
># distribute all nonpartial tasks over workers.  Assign tasks to the
># worker with the least amount of work already performed.
>for task in all_nonpartial_subqueries:
>least_busy_worker = worker_work.smallest()
>least_busy_worker += task.total_nonpartial_cost
>
># the nonpartial cost here is the largest amount any single worker
># has to perform.
>nonpartial_cost += worker_work.largest()
>
>total_partial_cost = 0
>for task in all_partial_subqueries:
>total_partial_cost += task.total_nonpartial_cost
>
># Compute resources needed by partial tasks. First compute how much
># cost we can distribute to workers that take shorter than the
># "busiest" worker doing non-partial tasks.
>remaining_avail_work = 0
>for i in range(0, numworkers):
>remaining_avail_work += worker_work.largest() - worker_work[i]
>
># Equally divide up remaining work over all workers
>if remaining_avail_work < total_partial_cost:
>   nonpartial_cost += (worker_work.largest - remaining_avail_work) / 
> numworkers
>
># check if this is the best number of workers
>if best_nonpartial_cost == -1 or best_nonpartial_cost > nonpartial_cost:
>   best_nonpartial_cost = worker_work.largest
>   best_nonpartial_nworkers = nworkers
>
> Does that make sense?

Yeah, I gather what you are trying to achieve is: allocate a number of
workers such that the total cost does not exceed the cost of the first
non-partial plan (i.e. the costliest one, because the plans are sorted
by descending cost).

So for non-partial costs such as (20, 10, 5, 2), allocate only 2
workers, because the 2nd worker will execute (10, 5, 2) while the 1st
worker executes (20).

But for costs such as (4, 4, 4, ... 20 times), the logic would give
us 20 workers because we want to finish the Append in 4 time units,
and this is what we want to avoid when we go with the
don't-allocate-too-many-workers approach.
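
To illustrate with those two cost lists, here is a throwaway standalone sketch
of just the "hand each non-partial subplan to the least busy worker" step from
the pseudocode (costs only; startup cost and the partial-subplan part are
ignored, and all the names are made up):

#include <stdio.h>

#define MAX_WORKERS 64

/*
 * Hand each subplan (already sorted by descending cost) to the currently
 * least busy worker and report how much the busiest worker ends up doing.
 * Assumes nworkers <= MAX_WORKERS.
 */
static double
busiest_worker(const double *costs, int ncosts, int nworkers)
{
    double load[MAX_WORKERS] = {0};
    double max = 0;
    int    i, w;

    for (i = 0; i < ncosts; i++)
    {
        int least = 0;

        for (w = 1; w < nworkers; w++)
            if (load[w] < load[least])
                least = w;
        load[least] += costs[i];
    }
    for (w = 0; w < nworkers; w++)
        if (load[w] > max)
            max = load[w];
    return max;
}

int
main(void)
{
    double a[] = {20, 10, 5, 2};
    double b[20];
    int    n, i;

    for (i = 0; i < 20; i++)
        b[i] = 4;

    for (n = 1; n <= 4; n++)
        printf("(20,10,5,2), %2d workers -> busiest does %5.1f\n",
               n, busiest_worker(a, 4, n));
    for (n = 2; n <= 16; n *= 2)
        printf("20 x cost 4, %2d workers -> busiest does %5.1f\n",
               n, busiest_worker(b, 20, n));
    return 0;
}

With (20, 10, 5, 2) the busiest worker stays at 20 once there are 2 workers,
so extra workers buy nothing, whereas the uniform 20 x 4 case keeps improving
as workers are added -- which is the behaviour described above.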

>
>
>> BTW all of the above points apply only for non-partial plans.
>
> Indeed. But I think that's going to be a pretty common type of plan,

Yes it is.

> especially if we get partitionwise joins.

About that I am not sure, because we already have support for parallel
joins, so wouldn't the join subpaths corresponding to all of the
partitions be partial paths ? I may be wrong about that.

But if the subplans are foreign scans, then yes all would be
non-partial plans. This may provoke  off-topic discussion, but here
instead of assigning so many workers to all these foreign plans and
all those workers waiting for the results, a single asynchronous
execution node (which is still in the making) would be desirable
because it would do the job of all these workers.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Duplicate assignment in Unicode/convutils.pm

2017-04-07 Thread Kyotaro HORIGUCHI
Hi, I found that convutils.pm contains a harmless duplicate
assignment.

> my $out = {f => $fname, l => $.,
>code => hex($1),
>ucs => hex($2),
>comment => $4,
>direction => BOTH,
>f => $fname,
>l => $.
> };

Of course this is utterly harmless but wrong.

The attached patch fixes this, following the other Perl files around it.
No similar mistakes were found there.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/utils/mb/Unicode/convutils.pm b/src/backend/utils/mb/Unicode/convutils.pm
index 479bfe9..42b4ffa 100644
--- a/src/backend/utils/mb/Unicode/convutils.pm
+++ b/src/backend/utils/mb/Unicode/convutils.pm
@@ -47,12 +47,11 @@ sub read_source
 		if (!/^0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)\s+(#.*)$/)
 		{
 			print STDERR "READ ERROR at line $. in $fname: $_\n";
 			exit;
 		}
-		my $out = {f => $fname, l => $.,
-   code => hex($1),
+		my $out = {code => hex($1),
    ucs => hex($2),
    comment => $4,
    direction => BOTH,
    f => $fname,
    l => $.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-04-07 Thread Beena Emerson
I ran tests and the following are the details:

Machine details:
Architecture:  ppc64le
Byte Order:Little Endian
CPU(s):192
On-line CPU(s) list:   0-191
Thread(s) per core:8
Core(s) per socket:1
Socket(s): 24
NUMA node(s):  4
Model: IBM,8286-42A

size \ clients |     16      |     32      |     64      |     128
---------------+-------------+-------------+-------------+-------------
16MB           | 18895.63486 | 28799.48759 | 37855.39521 | 27968.88309
32MB           | 18313.1461  | 29201.44954 | 40733.80051 | 32458.74147
64MB           | 18055.73141 | 30875.28687 | 42713.54447 | 38009.60542
128MB          | 18234.31424 | 33208.65419 | 48604.5593  | 45498.27689
256MB          | 19524.36498 | 35740.19032 | 54686.16898 | 54060.11168
512MB          | 20351.90719 | 37426.72174 | 55045.60719 | 56194.99349
1024MB         | 19667.67062 | 35696.19194 | 53666.60373 | 54353.0614

I did not see any degradation; in fact, larger segment sizes showed a
performance improvement at higher client counts.

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


[HACKERS] Comment typo in publicationcmd.c

2017-04-07 Thread Masahiko Sawada
Hi all,

Attached fixes a typo in publicationcmd.c file.

s/om/on/

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_typo_in_publicationcmds_c.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-04-07 Thread Andres Freund
On 2017-04-07 13:55:51 -0400, Robert Haas wrote:
> On Wed, Apr 5, 2017 at 5:54 AM, Amit Langote
>  wrote:
> > Marked as ready for committer.
> 
> Andres seems to have changed the status of this patch to "Needs
> review" and then, 30 seconds later, to "Waiting on author"
> there's no actual email on the thread explaining what his concerns
> were.  I'm going to set this back to "Ready for Committer" and push it
> out to the next CommitFest.  I think this would be a great feature,
> but I think it's not entirely clear that we have consensus on the
> design, so let's revisit it for next release.

I was kind of looking for the appropriate status of "not entirely clear
that we have consensus on the design" - which isn't really
ready-for-committer, but not waiting-on-author either...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recent deadlock regression test failures

2017-04-07 Thread Kevin Grittner
On Fri, Apr 7, 2017 at 12:52 PM, Andres Freund  wrote:

> I'd rather fix the issue than remove the tests entirely.  Seems quite
> possible to handle blocking on SafeSnapshot in a similar manner to
> pg_blocking_pids?

I'll see what I can figure out.

-- 
Kevin Grittner


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Robert Haas
On Fri, Apr 7, 2017 at 2:53 PM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Andres Freund wrote:
>>> Write Amplification Reduction Method (WARM)
>>> - fair number of people don't think it's ready for v10.
>
>> I'm going over this one now with Pavan, with the intent of getting it in
>> committable shape.
>
> I have to agree with Andres that this is not something to push in, on the
> last day before feature freeze, when a number of people aren't comfortable
> with it.  It looks much more like a feature to push at the start of a
> development cycle.

I strongly agree.  Testing has found some noticeable regressions in
some cases as well, even if there were no outright bugs.  I'm frankly
astonished by the ongoing unwillingness to admit that the objections
(by multiple people) to this patch have any real merit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance issue with postgres9.6

2017-04-07 Thread Tom Lane
Tomas Vondra  writes:
> On 04/07/2017 06:31 PM, Merlin Moncure wrote:
>> I think your math is off.  Looking at your attachments, planning time
>> is 0.056ms, not 0.56ms.  This is in no way relevant to performance on
>> the order of your measured TPS.   How are you measuring TPS?

> Not sure where did you get the 0.056ms?

I don't see that either, but:

> What I see is this in the 9.3 explains:
>   Total runtime: 0.246 ms
> and this in those from 9.6:
>   Planning time: 0.396 ms
>   Execution time: 0.181 ms
> That is roughly 0.25ms vs. 0.6ms (0.4+0.2), as reported by Prakash.

9.3's EXPLAIN did not measure planning time at all.  The "Total runtime"
it reports corresponds to "Execution time" in the newer version.  So
these numbers indicate that 9.6 is significantly *faster*, not slower,
than 9.3, at least so far as execution of this one example is concerned.

The OP may well be having some performance issue with 9.6, but the
presented material completely fails to demonstrate it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Undefined psql variables

2017-04-07 Thread Fabien COELHO


Hello Pavel,


I wish I could have an explanation about why the :?varname (or some other
variant) syntax I suggested has a "namespace" issue.

The advantage that I see is that although it is obviously ugly, it is ugly
in the continuity of the various :["'?]varname syntaxes already offered and
it allows to get rid of "defined varname" which does not look like SQL. A
second advantage is that with the "defined" proposal


I don't think this argument is valid - \if doesn't look like SQL either.


Sure. I'm talking about the expressions after the "\if", which should be 
as close to SQL as possible, I think. At least that is what Tom required 
for the expression syntax in pgbench, and I tend to agree that psql 
should avoid mixing in another language if possible.



   \if defined var1 and defined var2 or defined var3 and sqlrt() >= ..

Would probably never work work, as it cannot be embedded in another
expression, while it would work with

   \if :?var1 and :?var2 or :?var3 and ...


I don't see any reason why the first should not work and the second should work


Because of the mix of client-side and server-side stuff which needs to be 
interpreted. Let us consider:


  \if EXISTS (SELECT * FROM tbl WHERE id=3) AND defined foo

The "exists" is obviously executed server-side, but "defined foo" needs to 
be interpreted client-side, and it means that some parser client side 
would have been able to catch it in the middle of everything else. This 
example also illustrate my "does not look like SQL" point, as the first 
part is clearly SQL and the part after AND is not.


With the second approach, ... "AND :?foo", the ":?foo" reference would be 
substituted directly by the psql lexer and replaced on the fly by the 
answer, resulting in "AND TRUE" or "AND FALSE" as appropriate; then the 
whole result (from EXISTS to TRUE/FALSE) could be interpreted server-side 
to get an answer.
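
To make that concrete, here is a purely hypothetical sketch (neither
":?foo" nor \if expressions of this kind exist yet; this is just the
proposal under discussion, reusing the "tbl" example above):

  \set foo 1
  -- the lexer would turn :?foo into TRUE before anything reaches the
  -- server, so the next line would effectively become
  --   \if EXISTS (SELECT 1 FROM tbl WHERE id=3) AND TRUE
  \if EXISTS (SELECT 1 FROM tbl WHERE id=3) AND :?foo
    \echo 'row 3 exists and foo is defined'
  \endif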


Basically, catching :?varname seems easier and safer than catching "defined 
varname". I think that Tom's intent is that the defined expressions should 
not be mixed with server-side SQL, but I do not see why not; it is easy to 
imagine use cases where it would make sense.



I have a different opinion - the condition expression does not necessarily
have to be SQL. This language is oriented toward client-side operations. What
is the benefit of a server-side expression?


Because I think it is legitimate to be able to write things like:

  \if NOT pg_extension_is_loaded('units')
\echo 'this application requires the great units extension'
\q
  \endif

  \if (SELECT version FROM app_version) >= 2.0
\echo 'application already installed at 2.0'
\q
  \endif

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Warn users about duplicate configuration parameters

2017-04-07 Thread Aleksander Alekseev
Hi.

Recently I've discovered that if there are multiple values of the same
parameter in postgresql.conf, PostgreSQL will silently use the last one.
This does not look like the best approach to me. For instance, a user can
find the first value in the config file and expect that it will be used.
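
For instance (parameter and values chosen purely for illustration), with a
fragment like the following only the second assignment takes effect and the
first is silently ignored:

  # postgresql.conf
  shared_buffers = 128MB
  # ... many lines later, perhaps from an include file ...
  shared_buffers = 512MB    # this is the value actually used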

I suggest to warn users about duplicated parameters. Here is a
corresponding patch.

Thoughts?

-- 
Best regards,
Aleksander Alekseev
diff --git a/src/backend/utils/misc/guc-file.l b/src/backend/utils/misc/guc-file.l
index f01b814..6aa60a4 100644
--- a/src/backend/utils/misc/guc-file.l
+++ b/src/backend/utils/misc/guc-file.l
@@ -304,6 +304,13 @@ ProcessConfigFileInternal(GucContext context, bool applySettings, int elevel)
 			}
 			/* Now mark it as present in file */
 			record->status |= GUC_IS_IN_FILE;
+
+			/* Warn the user about duplicate configuration parameter */
+			ereport(elevel,
+(errcode(ERRCODE_DUPLICATE_OBJECT),
+errmsg("duplicate configuration parameter \"%s\" overrides previous value in file \"%s\" line %u",
+		item->name,
+		item->filename, item->sourceline)));
 		}
 		else if (strchr(item->name, GUC_QUALIFIER_SEPARATOR) == NULL)
 		{


pgp126mUyo0K4.pgp
Description: OpenPGP digital signature


[HACKERS] pgbench --progress-timestamp no longer works correctly

2017-04-07 Thread Jeff Janes
--progress-timestamp is supposed to make -P report a Unix Epoch time stamp,
for easy correlation with the entries in other log files (like the postgres
server log file using %n).
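
For reference, a usage sketch (the database name "bench" is only a
placeholder):

  # -P 5 prints a progress line every 5 seconds; with --progress-timestamp
  # each line starts with a Unix epoch timestamp instead of seconds elapsed
  pgbench -c 8 -j 2 -T 60 -P 5 --progress-timestamp bench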

But that broke in this commit:

commit 1d63f7d2d180c8708bc12710254eb7b45823440f
Author: Tom Lane 
Date:   Mon Jan 2 13:41:51 2017 -0500

Use clock_gettime(), if available, in instr_time measurements.


The commit before that one changed pgbench to make it tolerate the change
in clock, but it overlooked --progress-timestamp.

Cheers,

Jeff


[HACKERS] recent deadlock regression test failures

2017-04-07 Thread Andres Freund
Hi,

There's two machines that recently report changes in deadlock detector
output:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2017-04-05%2018%3A58%3A04
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=friarbird&dt=2017-04-07%2004%3A20%3A01

both just failed twice in a row:
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=hyrax&br=HEAD
https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=friarbird&br=HEAD
without previous errors of the same ilk.

I don't think any recent changes are supposed to affect deadlock
detector behaviour?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transactions involving multiple postgres foreign servers

2017-04-07 Thread Masahiko Sawada
On Wed, Mar 29, 2017 at 11:14 PM, Masahiko Sawada  wrote:
> On Wed, Mar 22, 2017 at 2:49 AM, Masahiko Sawada  
> wrote:
>> On Thu, Mar 16, 2017 at 2:37 PM, Vinayak Pokale
>>  wrote:
>>> The following review has been posted through the commitfest application:
>>> make installcheck-world:  tested, passed
>>> Implements feature:   tested, passed
>>> Spec compliant:   tested, passed
>>> Documentation:tested, passed
>>>
>>> I have tested the latest patch and it looks good to me,
>>> so I marked it "Ready for committer".
>>> Anyway, it would be great if anyone could also have a look at the patches 
>>> and send comments.
>>>
>>> The new status of this patch is: Ready for Committer
>>>
>>
>> Thank you for updating but I found a bug in 001 patch. Attached latest 
>> patches.
>> The differences are
>>   * Fixed a bug.
>>   * Ran pgindent.
>>   * Separated the patch supporting GetPrepareID API.
>>
>
> Since previous patches conflict with current HEAD, I attached latest
> set of patches.
>

Vinayak, why did you mark this patch as "Move to next CF"? AFAIU
there has been no discussion yet.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Warn users about duplicate configuration parameters

2017-04-07 Thread David G. Johnston
On Fri, Apr 7, 2017 at 8:29 AM, Aleksander Alekseev <
a.aleks...@postgrespro.ru> wrote:

> Andres, Tatsuo,
>
> Thank you for sharing your thoughts.
>
> > -1 - I frequently just override earlier parameters by adding an
> > include at the end of the file.  Also, with postgresql.auto.conf it's
> > even more common to override parameters.
>
> -1 from me too, for the same reason Andres said.
>
> I see no problem here. After all, these are just warnings. We can even add
> a GUC that disables them specifically for experienced users who know what
> they are doing. And/or add a special case for postgresql.auto.conf.
>
>
-1 for learning how the configuration system works via warning messages.

We've recently added pg_file_settings to provide a holistic view and the
docs cover the topic quite well.
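
For example, a quick way to see every assignment of a given parameter across
the configuration files (a sketch; 'shared_buffers' is just an arbitrary
parameter to inspect):

  -- "applied" is false for assignments overridden by a later one
  SELECT sourcefile, sourceline, setting, applied
  FROM pg_file_settings
  WHERE name = 'shared_buffers'
  ORDER BY seqno;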

David J.


Re: [HACKERS] SCRAM authentication, take three

2017-04-07 Thread Robert Haas
On Fri, Apr 7, 2017 at 3:59 AM, Heikki Linnakangas  wrote:
> I think the "SCRAM" part is more important than "SHA-256", so -1 on that.

I agree.  The point here isn't that we're using a better hashing
method, even if a lot of people *think* that's the point.  The point
is we're using a modern algorithm that has nice properties like "you
can't impersonate the client by stealing the verifier, or even by
snooping the exchange".

But "sasl" might be even better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recent deadlock regression test failures

2017-04-07 Thread Andrew Dunstan


On 04/07/2017 12:57 PM, Andres Freund wrote:
> Hi,
>
> There's two machines that recently report changes in deadlock detector
> output:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2017-04-05%2018%3A58%3A04
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=friarbird&dt=2017-04-07%2004%3A20%3A01
>
> both just failed twice in a row:
> https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=hyrax&br=HEAD
> https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=friarbird&br=HEAD
> without previous errors of the same ilk.
>
> I don't think any recent changes are supposed to affect deadlock
> detector behaviour?
>


Both these machines have CLOBBER_CACHE_ALWAYS set. And on both machines
recent changes have made the isolation tests run much much longer.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] recent deadlock regression test failures

2017-04-07 Thread Tom Lane
Andrew Dunstan  writes:
> On 04/07/2017 12:57 PM, Andres Freund wrote:
>> I don't think any recent changes are supposed to affect deadlock
>> detector behaviour?

> Both these machines have CLOBBER_CACHE_ALWAYS set. And on both machines
> recent changes have made the isolation tests run much much longer.

Ouch.  I see friarbird's run time for the isolation tests has gone from an
hour and change to over 5 hours in one fell swoop.  hyrax not much better.
Oddly, non-CCA animals don't seem to have changed much.

Eyeing recent patches, it seems like the culprit must be Kevin's
addition to isolationtester's wait query:

@@ -231,6 +231,14 @@ main(int argc, char **argv)
appendPQExpBuffer(&wait_query, ",%s", backend_pids[i]);
appendPQExpBufferStr(&wait_query, "}'::integer[]");
 
+   /* Also detect certain wait events. */
+   appendPQExpBufferStr(&wait_query,
+" OR EXISTS ("
+"  SELECT * "
+"  FROM pg_catalog.pg_stat_activity "
+"  WHERE pid = $1 "
+"  AND wait_event IN ('SafeSnapshot'))");
+
res = PQprepare(conns[0], PREP_WAITING, wait_query.data, 0, NULL);
if (PQresultStatus(res) != PGRES_COMMAND_OK)
{

This seems a tad ill-considered.  We sweated a lot of blood not so long
ago to get the runtime of that query down, and this seems to have blown
it up again.  And done so for every isolation test case, not only the
ones where it could possibly matter.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Remove unused argument in btree_xlog_split

2017-04-07 Thread Robert Haas
On Thu, Apr 6, 2017 at 8:21 AM, Aleksander Alekseev
 wrote:
> Hi Robert,
>
>> Hmm.  I don't see anything wrong with that, particularly, but it seems
>> we also don't need the distinction between XLOG_BTREE_SPLIT_L and
>> XLOG_BTREE_SPLIT_L_ROOT or likewise between XLOG_BTREE_SPLIT_R and
>> XLOG_BTREE_SPLIT_R_ROOT -- in which case I think this patch should go
>> a little further and do all of that together.
>
> Thank you for sharing your thoughts on this patch. Here is a new
> version.

Thanks.  Please add this to the next CommitFest, as there seems to be
no urgency (and some risk) in committing it right before feature
freeze.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Alvaro Herrera
Andres Freund wrote:

> Unique Joins
> - Tom's discussing things with David, not sure.

This one was already included-and-removed from 9.6, Tom had said he'd
give it priority during the current cycle as I recall.  It seems unfair
that it's still waiting for review on the last day of pg10's last
commitfest.

> Write Amplification Reduction Method (WARM)
> - fair number of people don't think it's ready for v10.

I'm going over this one now with Pavan, with the intent of getting it in
committable shape.

I may be biased, but the claimed performance gains are so large that I
can't let it slip through without additional effort.

> BRIN optimize memory allocation
> - I think Alvaro has indicated that he wants to take care of that?

I am happy to see it move to pg11 to give priority to WARM.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Peter Geoghegan
On Fri, Apr 7, 2017 at 11:37 AM, Andres Freund  wrote:
> Write Amplification Reduction Method (WARM)
> - fair number of people don't think it's ready for v10.
> - can't move to next fest because it's waiting-on-author, which doesn't
>   allow that.  Doesn't strike me as a useful restriction.

I agree that that CF app restriction makes little sense.

> Indexes with Included Columns (was Covering + unique indexes)
> - Don't think concerns about #columns on truncated tuples have been
>   addressed.  Should imo be returned-with-feedback.

+1.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables vs GRANT

2017-04-07 Thread Tom Lane
Joe Conway  writes:
> Apparently INSERT and SELECT on the parent partitioned table skip normal
> acl checks on the partitions. Is that intended behavior?

Yes, this matches normal inheritance behavior.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remaining 2017-03 CF entries

2017-04-07 Thread Tom Lane
Alvaro Herrera  writes:
> Andres Freund wrote:
>> Write Amplification Reduction Method (WARM)
>> - fair number of people don't think it's ready for v10.

> I'm going over this one now with Pavan, with the intent of getting it in
> committable shape.

I have to agree with Andres that this is not something to push in, on the
last day before feature freeze, when a number of people aren't comfortable
with it.  It looks much more like a feature to push at the start of a
development cycle.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance issue with postgres9.6

2017-04-07 Thread Tomas Vondra

On 04/07/2017 06:31 PM, Merlin Moncure wrote:

On Fri, Apr 7, 2017 at 5:16 AM, Prakash Itnal  wrote:

Hello,

We currently use postgres 9.3 in our products. Recently we upgraded to
postgres 9.6. But with 9.6 we have seen a drastic reduction in throughput.
After analyzing carefully I found that "planner time" in 9.6 is very high.
Below are the details:

Scenario:
1 Create a table with 10 rows.
2 Execute simple query: select * from subscriber where s_id = 100;
3 No update/delete/insert; tried vacuum, full vacuum; by default we enable
auto-vacuum

9.3: Avg of "Total runtime" : 0.24ms [actual throughput: 650 TPS]
9.6: Avg of Total time: 0.56ms (Avg of "Planning time" : 0.38ms + Avg of
"Execution time" : 0.18ms) [actual throughput: 80 TPS]


I think your math is off.  Looking at your attachments, planning time
is 0.056ms, not 0.56ms.  This is in no way relevant to performance on
the order of your measured TPS.   How are you measuring TPS?



Not sure where you got the 0.056ms from. What I see is this in the 9.3 
explains:


 Total runtime: 0.246 ms

and this in those from 9.6:

 Planning time: 0.396 ms

 Execution time: 0.181 ms


That is roughly 0.25ms vs. 0.6ms (0.4+0.2), as reported by Prakash.

Obviously, this "just" 2x slowdown, so it does not match the drop from 
650 to 80 tps. Also, 0.25ms would be ~4000 tps, so I guess this was just 
an example of a query that slowed down.


Prakash, are you using packages (which ones?), or have you compiled from 
sources? Can you provide pg_config output from both versions, and also 
'select * from pg_settings' (the full config)?


It might also be useful to collect profiles, i.e. (1) install debug 
symbols (2) run the query in a loop and (3) collect profiles from that 
one backend using 'perf'.
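
Something along these lines, as a rough sketch (<backend_pid> is whatever
"SELECT pg_backend_pid();" returns in the session being profiled, and the
query loop can be driven with something like "pgbench -n -f query.sql"):

  # attach perf to the single backend while the query runs in a loop
  perf record -g -p <backend_pid> -- sleep 30
  perf report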


I assume you're using the same hardware / machine for the tests?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel bitmapscan isn't exercised in regression tests

2017-04-07 Thread Andres Freund
On 2017-04-06 13:43:55 -0700, Andres Freund wrote:
> On 2017-04-06 10:00:32 +0530, Dilip Kumar wrote:
> > On Tue, Apr 4, 2017 at 5:51 AM, Dilip Kumar  wrote:
> > > Sure I can do that, In attached patch, I only fixed the problem of not
> > > executing the bitmap test.  Now, I will add few cases to cover other
> > > parts especially rescan and prefetching logic.
> > 
> > I have added two test cases to cover rescan, prefetch and lossy pages
> > logic for parallel bitmap.  I have removed the existing case because
> > these two new cases will be enough to cover that part as well.
> > 
> > Now, nodeBitmapHeapScan.c has 95.5% of line coverage.
> 
> Great!  Pushed.

At some point it might also be a good idea to compare parallel and
non-parallel results.  It's obviously quite possible to break semantics
with parallelism...
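
A rough sketch of that kind of cross-check, reusing one of the regression
tables (tenk1) and the existing GUC to toggle parallelism (the real test
would also need the enable_* settings used there to actually force a
parallel bitmap scan):

  SET max_parallel_workers_per_gather = 4;
  SELECT count(*), sum(hundred) FROM tenk1 WHERE thousand < 500;
  SET max_parallel_workers_per_gather = 0;
  SELECT count(*), sum(hundred) FROM tenk1 WHERE thousand < 500;  -- must match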

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

