Re: [HACKERS] UPDATE of partition key

2017-05-11 Thread Amit Khandekar
>> 3.
>> +   longer satisfy the partition constraint of the containing partition. In 
>> that
>> +   case, if there is some other partition in the partition tree for which 
>> this
>> +   row satisfies its partition constraint, then the row is moved to that
>> +   partition. If there isn't such a partition, an error will occur.
>>
>> Doesn't this error case indicate that this needs to be integrated with
>> Default partition patch of Rahila or that patch needs to take care
>> this error case?
>> Basically, if there is no matching partition, then move it to default 
>> partition.
>
> Will have a look on this. Thanks for pointing this out.

I tried update row movement with both my patch and the default-partition
patch applied, and it looks like it works as expected:

1. When an update changes the partition key such that the row does
not fit into any of the non-default partitions, the row is moved to
the default partition.
2. If the row does fit into a non-default partition, the row moves
into that partition.
3. If a row from the default partition is updated such that it fits into
one of the non-default partitions, it moves into that partition. I
think we can debate whether the row should stay in the default
partition or move. I think it should be moved, since the row now has a
suitable partition.



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [POC] hash partitioning

2017-05-11 Thread Ashutosh Bapat
On Fri, May 12, 2017 at 8:08 AM, Amit Langote
 wrote:
> On 2017/05/12 11:20, Robert Haas wrote:
>> On Thu, May 11, 2017 at 10:15 PM, Amit Langote
>>  wrote:
>>> On 2017/05/12 10:42, Robert Haas wrote:
 On Thu, May 11, 2017 at 12:02 PM, Dilip Kumar  
 wrote:
> We need to add PARTITION_STRATEGY_HASH as well, we don't support NULL
> for hash also, right?

 I think it should.

 Actually, I think that not supporting nulls for range partitioning may
 have been a fairly bad decision.
>>>
>>> I think the relevant discussion concluded [1] that way, because we
>>> couldn't decide which interface to provide for specifying where NULLs are
>>> placed or because we decided to think about it later.
>>
>> Yeah, but I have a feeling that marking the columns NOT NULL is going
>> to make it really hard to support that in the future when we get the
>> syntax hammered out.  If it had only affected the partition
>> constraints that'd be different.
>
> So, adding keycol IS NOT NULL (like we currently do for expressions) in
> the implicit partition constraint would be more future-proof than
> generating an actual catalogued NOT NULL constraint on the keycol?  I now
> tend to think it would be better.  Directly inserting into a range
> partition with a NULL value for a column currently generates a "null value
> in column \"%s\" violates not-null constraint" instead of perhaps more
> relevant "new row for relation \"%s\" violates partition constraint".
> That said, we *do* document the fact that a NOT NULL constraint is added
> on range key columns, but we might as well document instead that we don't
> currently support routing tuples with NULL values in the partition key
> through a range-partitioned table and so NULL values cause error.

in get_partition_for_tuple() we have
    if (key->strategy == PARTITION_STRATEGY_RANGE)
    {
        /* Disallow nulls in the range partition key of the tuple */
        for (i = 0; i < key->partnatts; i++)
            if (isnull[i])
                ereport(ERROR,
                        (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
                         errmsg("range partition key of row contains null")));
    }

Instead of throwing an error here, we should probably return -1 and
let the error be "no partition of relation \"%s\" found for row",
which is the real error: there is no partition which can accept NULL.
If in the future we decide to support NULL values in partition keys, we
just need to remove the above code from get_partition_for_tuple() and
everything will work as is. I am assuming that we don't add any
implicit/explicit NOT NULL constraint right now.
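
Something like the following rough, untested sketch is what I have in
mind; it reuses the failure-reporting variables that the existing
"no matching partition" case in that function already sets, so the
details may need adjusting:

    if (key->strategy == PARTITION_STRATEGY_RANGE)
    {
        bool    key_has_null = false;

        /* A NULL key column means no range partition can accept the row. */
        for (i = 0; i < key->partnatts; i++)
            if (isnull[i])
                key_has_null = true;

        if (key_has_null)
        {
            /* Fail the lookup just like the "no matching partition" case. */
            result = -1;
            *failed_at = parent;
            break;      /* caller raises "no partition ... found for row" */
        }
    }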

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] UPDATE of partition key

2017-05-11 Thread Amit Khandekar
On 12 May 2017 at 08:30, Amit Kapila  wrote:
> On Thu, May 11, 2017 at 5:41 PM, Amit Khandekar  
> wrote:
>> On 11 May 2017 at 17:23, Amit Kapila  wrote:
>>> On Fri, Mar 17, 2017 at 4:07 PM, Amit Khandekar  
>>> wrote:
 On 4 March 2017 at 12:49, Robert Haas  wrote:
> On Thu, Mar 2, 2017 at 11:53 AM, Amit Khandekar  
> wrote:
>> I think it does not make sense running after row triggers in case of
>> row-movement. There is no update happened on that leaf partition. This
>> reasoning can also apply to BR update triggers. But the reasons for
>> having a BR trigger and AR triggers are quite different. Generally, a
>> user needs to do some modifications to the row before getting the
>> final NEW row into the database, and hence [s]he defines a BR trigger
>> for that. And we can't just silently skip this step only because the
>> final row went into some other partition; in fact the row-movement
>> itself might depend on what the BR trigger did with the row. Whereas,
>> AR triggers are typically written for doing some other operation once
>> it is made sure the row is actually updated. In case of row-movement,
>> it is not actually updated.
>
> How about running the BR update triggers for the old partition and the
> AR update triggers for the new partition?  It seems weird to run BR
> update triggers but not AR update triggers.  Another option would be
> to run BR and AR delete triggers and then BR and AR insert triggers,
> emphasizing the choice to treat this update as a delete + insert, but
> (as Amit Kh. pointed out to me when we were in a room together this
> week) that precludes using the BEFORE trigger to modify the row.
>
>>>
>>> I also find the current behavior with respect to triggers quite odd.
>>> The two points that appears odd are (a) Executing both before row
>>> update and delete triggers on original partition sounds quite odd.
>> Note that *before* trigger gets fired *before* the update happens. The
>> actual update may not even happen, depending upon what the trigger
>> does. And then in our case, the update does not happen; not just that,
>> it is transformed into delete-insert. So then we should fire
>> before-delete trigger.
>>
>
> Sure, I am aware of that point, but it doesn't seem obvious that both
> update and delete BR triggers get fired for original partition.  As a
> developer, it might be obvious to you that as you have used delete and
> insert interface, it is okay that corresponding BR/AR triggers get
> fired, however, it is not so obvious for others, rather it appears
> quite odd.

I agree that it seems a bit odd that we are firing both update and
delete triggers on the same partition. But if you look at it from the
perspective that the update => delete+insert is a user-visible
operation, it does not seem that odd.

> If we try to compare it with the non-partitioned update,
> there also it is internally a delete and insert operation, but we
> don't fire triggers for those.

For a non-partitioned table, the delete+insert is internal, whereas
for a partitioned table, it is completely visible to the user.

>
>>> (b) It seems inconsistent to consider behavior for row and statement
>>> triggers differently
>>
>> I am not sure whether we should compare row and statement triggers.
>> Statement triggers are anyway fired only per-statement, depending upon
>> whether it is update or insert or delete. It has nothing to do with
>> how the rows are modified.
>>
>
> Okay.  The broader point I was trying to convey was that the way this
> patch defines the behavior of triggers doesn't sound good to me.  It
> appears to me that in this thread multiple people have raised points
> around trigger behavior and you should try to consider those.

I understand that there is no single solution which will provide
completely intuitive trigger behaviour. Skipping the BR delete trigger
should be fine. But then, for consistency, we should skip the BR insert
trigger as well, the theory being that the delete+insert is not
something the user explicitly issued, so we should not fire triggers
for it. But I feel both should be fired, to avoid consequences
unexpected to the user who has installed those triggers.

The only specific concern of yours is that of firing *both* update and
insert triggers on the same table, right? My explanation for this was:
we have done this before for UPSERT, and we documented the same. We can
do it here also.

>  Apart from the options, Robert has suggested, another option could be that
> we allow firing BR-AR update triggers for original partition and BR-AR
> insert triggers for the new partition.  In this case, one can argue
> that we have not actually updated the row in the original partition,
> so there is no need to fire AR update triggers,

Yes that's what I think. If there is no 

Re: [HACKERS] UPDATE of partition key

2017-05-11 Thread Amit Khandekar
On 12 May 2017 at 10:01, Amit Kapila  wrote:
> On Fri, May 12, 2017 at 9:27 AM, Amit Kapila  wrote:
>> On Thu, May 11, 2017 at 5:45 PM, Amit Khandekar  
>> wrote:
>>> On 11 May 2017 at 17:24, Amit Kapila  wrote:
 Few comments:
 1.
 Operating directly on partition doesn't allow update to move row.
 Refer below example:
 create table t1(c1 int) partition by range(c1);
 create table t1_part_1 partition of t1 for values from (1) to (100);
 create table t1_part_2 partition of t1 for values from (100) to (200);
 insert into t1 values(generate_series(1,11));
 insert into t1 values(generate_series(110,120));

 postgres=# update t1_part_1 set c1=122 where c1=11;
 ERROR:  new row for relation "t1_part_1" violates partition constraint
 DETAIL:  Failing row contains (122).
>>>
>>> Yes, as Robert said, this is expected behaviour. We move the row only
>>> within the partition subtree that has the update table as its root. In
>>> this case, it's the leaf partition.
>>>
>>
>> Okay, but what is the technical reason behind it?  Is it because the
>> current design doesn't support it or is it because of something very
>> fundamental to partitions?
No, we can do that if we decide to allow updates to move rows to a
table outside the partition subtree. The reason is more a matter of
semantics. I think the user who is running UPDATE on a partitioned
table should not necessarily be aware of the structure of the complete
partition tree outside of the current subtree. It is always safer to
return an error instead of silently moving the data outside of the
subtree.

>>
>
> One plausible theory is that as SELECTs on a partition just return
> the rows of that partition, the update should also behave in the same way.

Yes, right. And even inserts fail if we try to insert data that does
not fit into the current subtree.


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [POC] hash partitioning

2017-05-11 Thread Ashutosh Bapat
On Fri, May 12, 2017 at 7:12 AM, Robert Haas  wrote:
> On Thu, May 11, 2017 at 12:02 PM, Dilip Kumar  wrote:
>> We need to add PARTITION_STRATEGY_HASH as well, we don't support NULL
>> for hash also, right?
>
> I think it should.
>
+1

As long as we can hash a NULL value, we should place a row with a NULL
key in the corresponding partition, most probably the one with
remainder 0.
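
For illustration only (the POC patch may well compute this differently,
and combine_hash()/hash_datum() below are made-up helper names): if a
NULL key column simply contributes nothing to the combined hash, a row
whose key columns are all NULL hashes to 0 and lands in the partition
with remainder 0.

    /* Illustrative only; combine_hash() and hash_datum() are made-up helpers. */
    uint64      rowhash = 0;
    int         i;

    for (i = 0; i < key->partnatts; i++)
    {
        if (isnull[i])
            continue;           /* a NULL column contributes nothing */
        rowhash = combine_hash(rowhash, hash_datum(values[i], key, i));
    }

    partidx = rowhash % modulus;    /* all-NULL key => remainder 0 */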

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur

2017-05-11 Thread Michael Paquier
On Fri, May 12, 2017 at 1:28 PM, Tsunakawa, Takayuki
 wrote:
> Likewise, when the first host has already reached max_connections, libpq 
> doesn't attempt the connection against later hosts.

It seems to me that the feature is behaving as intended. In short,
libpq attempts to connect to the next host only if a connection cannot
be established at all. If there is a failure once the exchange with the
server has begun, it is treated as a hard failure. This is actually an
important property for authentication and SSL connection failures.
-- 
Michael




Re: [HACKERS] UPDATE of partition key

2017-05-11 Thread Amit Kapila
On Fri, May 12, 2017 at 9:27 AM, Amit Kapila  wrote:
> On Thu, May 11, 2017 at 5:45 PM, Amit Khandekar  
> wrote:
>> On 11 May 2017 at 17:24, Amit Kapila  wrote:
>>> Few comments:
>>> 1.
>>> Operating directly on partition doesn't allow update to move row.
>>> Refer below example:
>>> create table t1(c1 int) partition by range(c1);
>>> create table t1_part_1 partition of t1 for values from (1) to (100);
>>> create table t1_part_2 partition of t1 for values from (100) to (200);
>>> insert into t1 values(generate_series(1,11));
>>> insert into t1 values(generate_series(110,120));
>>>
>>> postgres=# update t1_part_1 set c1=122 where c1=11;
>>> ERROR:  new row for relation "t1_part_1" violates partition constraint
>>> DETAIL:  Failing row contains (122).
>>
>> Yes, as Robert said, this is expected behaviour. We move the row only
>> within the partition subtree that has the update table as its root. In
>> this case, it's the leaf partition.
>>
>
> Okay, but what is the technical reason behind it?  Is it because the
> current design doesn't support it or is it because of something very
> fundamental to partitions?
>

One plausible theory is that as SELECTs on a partition just return
the rows of that partition, the update should also behave in the same way.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] Improvement in log message of logical replication worker

2017-05-11 Thread Masahiko Sawada
Hi,

When I created a subscription with the copy_data option for a
publication that includes several tables, I got the following log
messages.

[7019] LOG:  starting logical replication worker for subscription "hoge_sub"
[7047] LOG:  logical replication apply for subscription hoge_sub started
[7047] LOG:  starting logical replication worker for subscription "hoge_sub"
[7049] LOG:  logical replication sync for subscription hoge_sub, table
pgbench_accounts started
[7047] LOG:  starting logical replication worker for subscription "hoge_sub"
[7051] LOG:  logical replication sync for subscription hoge_sub, table
pgbench_branches started
[7049] LOG:  logical replication synchronization worker finished processing
[7051] LOG:  logical replication synchronization worker finished processing
[7047] LOG:  starting logical replication worker for subscription "hoge_sub"
[7047] LOG:  starting logical replication worker for subscription "hoge_sub"
[7056] LOG:  logical replication sync for subscription hoge_sub, table
pgbench_history started
[7057] LOG:  logical replication sync for subscription hoge_sub, table
pgbench_tellers started
[7056] LOG:  logical replication synchronization worker finished processing
[7057] LOG:  logical replication synchronization worker finished processing

PID 7019 is the logical replication launcher, PID 7047 is an apply
worker, and the other processes are table sync workers. I set
max_sync_workers_per_subscription = 2.

I got the same log message 'starting logical replication worker for
subscription' a total of 5 times, but 4 of them are for launching table
sync workers and one is for launching the apply worker; we cannot
distinguish them. Also, I got the same log message 'logical replication
synchronization worker finished processing' a total of 4 times, and I
think it would be better to show the table name in the finish log
message as well. Any thoughts?

The attached small patch adds the relid to these log messages if it is
valid. I'd like to propose it for PG10 if possible, but since it's not
a bug and not an open item, we can add it to the next CF.
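
For example, the finish message could then look something like this
(just an illustration; the attached patch may word it differently):

    ereport(LOG,
            (errmsg("logical replication table synchronization worker for subscription \"%s\", table \"%s\" has finished",
                    MySubscription->name,
                    get_rel_name(MyLogicalRepWorker->relid))));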

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


log_message_improvement_for_logical_repl.patch
Description: Binary data



[HACKERS] [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur

2017-05-11 Thread Tsunakawa, Takayuki
Hello,

I found a problem with libpq connection failover.  When the connection 
attempt to an earlier host in the host list fails with certain errors, libpq 
doesn't try to connect to the remaining hosts.  For example, when you specify 
a wrong port on which some non-postgres program is listening, or some 
non-postgres program is unexpectedly using PG's port, you get an error like 
this:

$ psql -h localhost -p 23
psql: received invalid response to SSL negotiation: 
$ psql -h localhost -p 23 -d "sslmode=disable"
psql: expected authentication request from server, but received 

Likewise, when the first host has already reached max_connections, libpq 
doesn't attempt the connection against later hosts.

The attached patch fixes this.  I'll add this item to the PostgreSQL 10 Open 
Items.


Regards
Takayuki Tsunakawa



libpq-reconnect-on-error.patch
Description: libpq-reconnect-on-error.patch



[HACKERS] Hash Functions

2017-05-11 Thread Jeff Davis
https://www.postgresql.org/message-id/camp0ubeo3fzzefie1vmc1ajkkrpxlnzqooaseu6o-c+...@mail.gmail.com

In that thread, I pointed out some important considerations for the
hash functions themselves. This is a follow-up, after I looked more
carefully.

1. The hash functions as they exist today aren't portable -- they can
return different results on different machines. That means using these
functions for hash partitioning would yield different contents for the
same partition on different architectures (and that's bad, considering
they are logical partitions and not some internal detail).

The core hashing algorithm is based on 32-bit chunks, so it's only
portable if the input is an int32 (or array of int32s). That's not
true for varchar (etc.) or numeric (which is based on an array of
int16s). There is a hack for int8, but see issue #2 below.

We could try to mark portable vs. unportable hash functions, but it
seems quite valuable to be able to logically partition on varchar, so
I think we should have some portable answer there. Another annoyance
is that we would have to assume container types are unportable, or
make them inherit the portability of the underlying type's hash
function.

We could revert 26043592 (copied Tom because that was his patch) to
make hash_any() go back to being portable -- do we know what that
speedup actually was? Maybe the benefit is smaller on newer
processors? Another option is to try to do some combination of
byteswapping and word-at-a-time, which might be better than
byte-at-a-time if the byteswapping is done with a native instruction.

2. hashint8() is a little dubious. If the input is positive, it XORs
the high bits; if negative, it XORs the complement of the high bits.
That works for compatibility with hashint2/4, but it does not cause
good hash properties[1]. I prefer that we either (a) upcast int2/4 to
int8 before hashing and then hash the whole 64 bits; or (b) if the
value is within range, downcast to int4, otherwise hash the whole 64
bits.
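
A rough, untested sketch of option (b), just to show what I mean (this
is not the current implementation):

Datum
hashint8(PG_FUNCTION_ARGS)
{
    int64       val = PG_GETARG_INT64(0);

    if (val >= PG_INT32_MIN && val <= PG_INT32_MAX)
        return hash_uint32((uint32) (int32) val);   /* matches hashint2/4 */

    return hash_any((unsigned char *) &val, sizeof(val));
}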

3. We should downcast numeric to int4/8 before hashing if it's in
range, so that it can be a part of the same opfamily as int2/4/8.

4. text/varchar should use hashbpchar() so that they can be part of
the same opfamily. Trailing blanks seem unlikely to be significant for
most real-world data anyway.

5. For salts[2], I don't think it's too hard to support them in an
optional way. We just allow the function to be a two-argument function
with a default. Places that care about specifying the salt may do so
if the function has pronargs==2, otherwise it just gets the default
value. If we have salts, I don't think having 64-bit hashes is very
important. If we run out of bits, we can just salt the hash function
differently and get more hash bits. This is not urgent and I believe
we should just implement salts when and if some algorithm needs them.
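
For instance, a two-argument variant might look like the sketch below
(purely illustrative; no such function exists today, and a real
implementation would mix the seed into the hash's initial state rather
than into the input):

Datum
hashint4_salted(PG_FUNCTION_ARGS)
{
    int32       val = PG_GETARG_INT32(0);
    uint32      seed = PG_GETARG_UINT32(1);     /* defaulted at the SQL level */

    /* Illustrative only: fold the seed into the input before hashing. */
    return hash_uint32(((uint32) val) ^ seed);
}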

Regards,
Jeff Davis

[1] You can see a kind of mirroring in the hash outputs indicating bad mixing:

postgres=# select hashint8((2^32-100)::int8);
 hashint8
----------
 41320869
(1 row)

postgres=# select hashint8((-2^32-100)::int8);
 hashint8
----------
 41320869
(1 row)

postgres=# select hashint8((2^32-101)::int8);
  hashint8
-------------
 -1214482326
(1 row)

postgres=# select hashint8((-2^32-101)::int8);
  hashint8
-------------
 -1214482326
(1 row)


[2] Not sure I'm using the term "salt" properly. I really just mean a
way to affect the initial state, which I think is good enough for our
purposes.




Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-05-11 Thread Bossart, Nathan
On 5/11/17, 7:20 PM, "Michael Paquier"  wrote:
> Browsing the code

Thanks for taking a look.

> It seems to me that you don't need those extra square brackets around
> the column list. Same remark for vacuum.sgml.

I’ll remove them.  My intent was to ensure that we didn’t accidentally suggest 
that statements like “VACUUM , foo, bar;” were accepted, but the extra brackets 
don’t seem to fix that, and I don’t foresee much confusion anyway.  

> In short for all that, it is enough to have "[, ... ]" to document
> that a list is accepted.

That makes sense, I’ll fix it.

> It seems to me that it would have been less invasive to loop through
> vacuum() for each relation. Do you foresee advantages in allowing
> vacuum() to handle multiple relations? I am not sure if it is worth
> complicating the current logic more, considering that you also have
> some extra logic to carry the option values.

That was the approach I first prototyped.  The main disadvantage that I found 
was that the command wouldn’t fail-fast if one of the tables or columns didn’t 
exist, and I thought that it might be frustrating to encounter such an error in 
the middle of vacuuming several large tables.  It’s easy enough to change the 
logic to emit a warning and simply move on to the next table, but that seemed 
like it could be easily missed among the rest of the vacuum log statements 
(especially with the verbose option specified).  What are your thoughts on this?

In the spirit of simplifying things a bit, I do think it is possible to 
eliminate one of the new node types, since the fields for each are almost 
identical.

> + * used for error messages.  In the case that relid is valid, rels
> + * must have exactly one element corresponding to the specified relid.
> s/rels/relations/ as variable name?

Agreed, that seems nicer.

Nathan





Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression

2017-05-11 Thread Tom Lane
Andres Freund  writes:
> On 2017-05-11 17:21:18 -0400, Tom Lane wrote:
>> I doubt my machine is 6X faster than yours,
>> so this indicates that the subtransaction overhead is pretty real.

> Isn't that pretty much the point?  The whole open_share_lock()
> optimization looks like it really only can make a difference with
> subtransactions?

Uh, no; I'm pretty sure that that code is older than subtransactions.
The point of it is to avoid taking and releasing a lock over and over
within a single transaction.

>> Hm.  I don't think that's a sufficient code change, because if you do it
>> like that then the lock remains held after nextval() returns.

> Hm?  That's not new, is it?  We previously did a LockRelationOid(seq->relid) 
> and
> then relation_open(seq->relid, NoLock)?

Right, but the existing code is *designed* to hold the lock till end of
top-level transaction, regardless of what happens in any subtransaction.
My understanding of your complaint is that you do not think that's OK
for any lock stronger than AccessShareLock.

regards, tom lane




Re: [HACKERS] UPDATE of partition key

2017-05-11 Thread Amit Kapila
On Thu, May 11, 2017 at 5:45 PM, Amit Khandekar  wrote:
> On 11 May 2017 at 17:24, Amit Kapila  wrote:
>> Few comments:
>> 1.
>> Operating directly on partition doesn't allow update to move row.
>> Refer below example:
>> create table t1(c1 int) partition by range(c1);
>> create table t1_part_1 partition of t1 for values from (1) to (100);
>> create table t1_part_2 partition of t1 for values from (100) to (200);
>> insert into t1 values(generate_series(1,11));
>> insert into t1 values(generate_series(110,120));
>>
>> postgres=# update t1_part_1 set c1=122 where c1=11;
>> ERROR:  new row for relation "t1_part_1" violates partition constraint
>> DETAIL:  Failing row contains (122).
>
> Yes, as Robert said, this is expected behaviour. We move the row only
> within the partition subtree that has the update table as its root. In
> this case, it's the leaf partition.
>

Okay, but what is the technical reason behind it?  Is it because the
current design doesn't support it or is it because of something very
fundamental to partitions?  Is it because we can't find root partition
from leaf partition?

+ is_partitioned_table =
+ root_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE;
+
+ if (is_partitioned_table)
+ ExecSetupPartitionTupleRouting(
+ root_rel,
+ /* Build WITH CHECK OPTION constraints for leaf partitions */
+ ExecInitPartitionWithCheckOptions(mtstate, root_rel);
+ /* Build a projection for each leaf partition rel. */
+ ExecInitPartitionReturningProjection(mtstate, root_rel);
..
+ /* It's not a partitioned table after all; error out. */
+ ExecPartitionCheckEmitError(resultRelInfo, slot, estate);

When we are anyway going to give an error if the table is not a
partitioned table, isn't it better to give it early, when we first
identify that?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] PG 10 release notes

2017-05-11 Thread Tom Lane
Michael Paquier  writes:
> Bruce, the release notes do not mention yet that support for cleartext
> passwords is removed. This has been done recently by Heikki in
> eb61136d. This has as consequence that CREATE ROLE PASSWORD
> UNENCRYPTED returns an error, and that password_encryption loses its
> value 'off' compared to last releases.

The release notes only say that they're current through 4-22.  The
usual process is to update them in batches every so often.  It'd be
great if Bruce has time to do another round before Monday, but the
current situation is not really broken.

regards, tom lane




Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-05-11 Thread Tom Lane
Robert Haas  writes:
> On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada  
> wrote:
>> ... I'd like to propose to change relation
>> extension lock management so that it works using LWLock instead.

> That's not a good idea because it'll make the code that executes while
> holding that lock noninterruptible.

Is that really a problem?  We typically only hold it over one kernel call,
which ought to be noninterruptible anyway.  Also, the CheckpointLock is
held for far longer, and we've not heard complaints about that one.

I'm slightly suspicious of the claim that we don't need deadlock
detection.  There are places that e.g. touch FSM while holding this
lock.  It might be all right but it needs close review, not just an
assertion that it's not a problem.

regards, tom lane




Re: [HACKERS] PG 10 release notes

2017-05-11 Thread Michael Paquier
On Mon, May 8, 2017 at 10:50 PM, Bruce Momjian  wrote:
> [other stuff]

Bruce, the release notes do not mention yet that support for cleartext
passwords is removed. This has been done recently by Heikki in
eb61136d. This has as consequence that CREATE ROLE PASSWORD
UNENCRYPTED returns an error, and that password_encryption loses its
value 'off' compared to last releases.
-- 
Michael




Re: [HACKERS] multi-column range partition constraint

2017-05-11 Thread Robert Haas
On Wed, May 10, 2017 at 10:21 PM, Amit Langote
 wrote:
>> Next update on this issue by Thursday 5/11.
>
> Attached updated patches.

Thanks.  0001, at least, really needs a pgindent run.  Also, my
compiler has this apparently-justifiable complaint:

partition.c:1767:5: error: variable 'cur_op_intp' is used uninitialized whenever
  'for' loop exits because its condition is false
  [-Werror,-Wsometimes-uninitialized]
foreach(opic, op_infos)
^~~
../../../src/include/nodes/pg_list.h:162:30: note: expanded from macro 'foreach'
for ((cell) = list_head(l); (cell) != NULL; (cell) = lnext(cell))
^~

If by some mischance the condition intp->opfamily_id ==
key->partopfamily[l - 1] is never satisfied, this is going to seg
fault, which isn't good.  It's pretty easy to imagine how this could
be caused by corrupted system catalog contents or some other bug, so I
think you should initialize the variable to NULL and elog() if it's
still NULL when the loop exits.  There is a similar problem in one
other place.

But actually, I think maybe this logic should just go away altogether,
because backtracking when we realize that an unbounded bound follows
is pretty ugly. Can't we just fix the loop that precedes it so that it
treats next-bound-unbounded the same as this-is-the-last-bound (i.e.
when it is not the case that j - i < num_or_arms)?

+if (partexprs_item)
+partexprs_item_saved = *partexprs_item;

Is there a reason why you're saving the contents of the ListCell
instead of just a pointer to it?  If key->partexprs == NIL,
partexprs_item_saved will not get initialized, but the next loop will
still point to it.  That's dangerously close to a live bug, and at any
rate a compiler warning seems likely.

I don't really understand the motivation behind the
nulltest_generated[] array.  In the upper loop, we'll use any given
value of i in only one loop iteration, so having logic to prevent a
nulltest more than once doesn't seem like it will ever actually do
anything.  In the lower loop, it's actually doing something, but if
(num_or_arms == 0) /* Only do this the first time through */ would
have the same effect, I think.

I suggest num_or_arms -> current_or_arm and maxnum_or_arms -> num_or_arms.

The for_both_cell loop seems to have two different exit conditions:
either we can run off the lists, or we can find j - i sufficiently
large.  But shouldn't those things happen at the same time?

+ * If both lower_or_arms and upper_or_arms are empty, we append a
+ * constant-true expression.  That happens if all of the literal values in
+ * both the lower and upper bound lists are UNBOUNDED.
  */
-if (!OidIsValid(operoid))
+if (lower_or_arms == NIL && upper_or_arms == NIL)
+result = lappend(result, makeBoolConst(true, false));

I think it would be better to instead say, at the very end after all
else is done:

if (result == NIL)
result = list_make1(makeBoolConst(true, false));

Next update by tomorrow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()

2017-05-11 Thread Tom Lane
Neha Khatri  writes:
> On Fri, May 12, 2017 at 12:56 AM, Tom Lane  wrote:
>> Neha Khatri  writes:
>>> Following are pg_controldata interfaces that might require change:
>>> Latest checkpoint location:
>>> Prior checkpoint location:
>>> Latest checkpoint's REDO location:
>>> Minimum recovery ending location:
>>> Backup start location:
>>> Backup end location:

>> My inclination is to leave these messages alone.  They're not really
>> inconsistent with anything.  Where we seem to be ending up is that
>> "lsn" will be used in things like function and column names, but
>> documentation will continue to spell out phrases like "WAL location".

> Are you indicating that the above phrases do not require change because
> those are consistent with other references. Or the other thread [1]
> (renaming 'transaction log') should take care of it.

Personally I'm happy to leave those particular messages as they are.
Yes, a case could be made for changing them to say "LSN", and a different
case could be made for changing them to say "WAL location", but I don't
think either of those are real improvements.  Also, this'd be propagating
the compatibility problem into yet a new place, because there are surely
user-written scripts out there that grep the output for exactly these
spellings.

It's a judgment call though, and others might have different opinions.

regards, tom lane




Re: [HACKERS] UPDATE of partition key

2017-05-11 Thread Amit Kapila
On Thu, May 11, 2017 at 5:41 PM, Amit Khandekar  wrote:
> On 11 May 2017 at 17:23, Amit Kapila  wrote:
>> On Fri, Mar 17, 2017 at 4:07 PM, Amit Khandekar  
>> wrote:
>>> On 4 March 2017 at 12:49, Robert Haas  wrote:
 On Thu, Mar 2, 2017 at 11:53 AM, Amit Khandekar  
 wrote:
> I think it does not make sense running after row triggers in case of
> row-movement. There is no update happened on that leaf partition. This
> reasoning can also apply to BR update triggers. But the reasons for
> having a BR trigger and AR triggers are quite different. Generally, a
> user needs to do some modifications to the row before getting the
> final NEW row into the database, and hence [s]he defines a BR trigger
> for that. And we can't just silently skip this step only because the
> final row went into some other partition; in fact the row-movement
> itself might depend on what the BR trigger did with the row. Whereas,
> AR triggers are typically written for doing some other operation once
> it is made sure the row is actually updated. In case of row-movement,
> it is not actually updated.

 How about running the BR update triggers for the old partition and the
 AR update triggers for the new partition?  It seems weird to run BR
 update triggers but not AR update triggers.  Another option would be
 to run BR and AR delete triggers and then BR and AR insert triggers,
 emphasizing the choice to treat this update as a delete + insert, but
 (as Amit Kh. pointed out to me when we were in a room together this
 week) that precludes using the BEFORE trigger to modify the row.

>>
>> I also find the current behavior with respect to triggers quite odd.
>> The two points that appears odd are (a) Executing both before row
>> update and delete triggers on original partition sounds quite odd.
> Note that *before* trigger gets fired *before* the update happens. The
> actual update may not even happen, depending upon what the trigger
> does. And then in our case, the update does not happen; not just that,
> it is transformed into delete-insert. So then we should fire
> before-delete trigger.
>

Sure, I am aware of that point, but it doesn't seem obvious that both
update and delete BR triggers get fired for original partition.  As a
developer, it might be obvious to you that as you have used delete and
insert interface, it is okay that corresponding BR/AR triggers get
fired, however, it is not so obvious for others, rather it appears
quite odd.  If we try to compare it with the non-partitioned update,
there also it is internally a delete and insert operation, but we
don't fire triggers for those.

>> (b) It seems inconsistent to consider behavior for row and statement
>> triggers differently
>
> I am not sure whether we should compare row and statement triggers.
> Statement triggers are anyway fired only per-statement, depending upon
> whether it is update or insert or delete. It has nothing to do with
> how the rows are modified.
>

Okay.  The broader point I was trying to convey was that the way this
patch defines the behavior of triggers doesn't sound good to me.  It
appears to me that in this thread multiple people have raised points
around trigger behavior and you should try to consider those.   Apart
from the options, Robert has suggested, another option could be that
we allow firing BR-AR update triggers for original partition and BR-AR
insert triggers for the new partition.  In this case, one can argue
that we have not actually updated the row in the original partition,
so there is no need to fire AR update triggers, but I feel that is
what we do for non-partitioned table update and it should be okay here
as well.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] WITH clause in CREATE STATISTICS

2017-05-11 Thread Tom Lane
Alvaro Herrera  writes:
> Hmm, yeah, makes sense.  Here's a patch for this approach.

I did some code review on this patch.  The attached update makes the
following hopefully-uncontroversial changes:

* fix inconsistencies in the set of fields in CreateStatsStmt

* get rid of struct CreateStatsArgument, which seems to be a leftover

* tighten up parse checks in CreateStatistics(), and improve comments

* add missing check for ownership of target table; the documentation
  says this is checked, and surely it is a security bug that it's not

* adjust regression test to compensate for the above, because it
  fails otherwise

* change end of regression test to suppress cascade drop messages;
  it's astonishing that this test hasn't caused intermittent buildfarm
  failures due to unstable drop order

I did not review the rest of the regression test changes, nor the
tab-complete changes.  I can however report that DROP STATISTICS 
doesn't successfully complete any stats object names.

Also, while reading the docs changes, I got rather ill from the
inconsistent and not very grammatical handling of "statistics" as a
noun, particularly the inability to decide from one sentence to the next
if that is singular or plural.  In the attached, I fixed this in the
ref/*_statistics.sgml files by always saying "statistics object" instead.
If people agree that that reads better, we should make an effort to
propagate that terminology elsewhere in the docs, and into error messages,
objectaddress.c output, etc.

Although I've not done anything about it here, I'm not happy about the
handling of dependencies for stats objects.  I do not think that cloning
RemoveStatistics as RemoveStatisticsExt is sane at all.  The former is
meant to deal with cleaning up pg_statistic rows that we know will be
there, so there's no real need to expend pg_depend overhead to track them.
For objects that are only loosely connected, the right thing is to use
the dependency system; in particular, this design has no real chance of
working well with cross-table stats.  Also, it's really silly to have
*both* this hard-wired mechanism and a pg_depend entry; the latter is
surely redundant if you have the former.  IMO we should revert
RemoveStatisticsExt and instead handle things by making stats objects
auto-dependent on the individual column(s) they reference (not the whole
table).
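
What I have in mind is roughly the following (untested; statoid, rel,
numcols and attnums are names invented for illustration):

    ObjectAddress   myself,
                    referenced;
    int             i;

    ObjectAddressSet(myself, StatisticExtRelationId, statoid);

    for (i = 0; i < numcols; i++)
    {
        ObjectAddressSubSet(referenced, RelationRelationId,
                            RelationGetRelid(rel), attnums[i]);
        recordDependencyOn(&myself, &referenced, DEPENDENCY_AUTO);
    }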

I'm also of the opinion that having an AUTO dependency, rather than
a NORMAL dependency, on the stats object's schema is the wrong semantics.
There isn't any other case where you can drop a non-empty schema without
saying CASCADE, and I'm mystified why this case should act that way.

Lastly, I tried the example given in the CREATE STATISTICS reference page,
and it doesn't seem to work.  Without the stats object, the two queries
are both estimated at one matching row, whereas they really produce 100
and 0 rows respectively.  With the stats object, they seem to both get
estimated at ~100 rows, which is a considerable improvement for one case
but a considerable disimprovement for the other.  If this is not broken,
I'd like to know why not.  What's the point of an expensive extended-
stats mechanism if it can't get this right?

regards, tom lane

diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index b10b734..32e17ee 100644
*** a/doc/src/sgml/perform.sgml
--- b/doc/src/sgml/perform.sgml
*** WHERE tablename = 'road';
*** 1132,1139 
   To inspect functional dependencies on a statistics
   stts, you may do this:
  
! CREATE STATISTICS stts WITH (dependencies)
!ON (zip, city) FROM zipcodes;
  ANALYZE zipcodes;
  SELECT stxname, stxkeys, stxdependencies
FROM pg_statistic_ext
--- 1132,1139 
   To inspect functional dependencies on a statistics
   stts, you may do this:
  
! CREATE STATISTICS stts (dependencies)
!ON zip, city FROM zipcodes;
  ANALYZE zipcodes;
  SELECT stxname, stxkeys, stxdependencies
FROM pg_statistic_ext
*** EXPLAIN (ANALYZE, TIMING OFF) SELECT * F
*** 1219,1226 
   Continuing the above example, the n-distinct coefficients in a ZIP
   code table may look like the following:
  
! CREATE STATISTICS stts2 WITH (ndistinct)
!ON (zip, state, city) FROM zipcodes;
  ANALYZE zipcodes;
  SELECT stxkeys AS k, stxndistinct AS nd
FROM pg_statistic_ext
--- 1219,1226 
   Continuing the above example, the n-distinct coefficients in a ZIP
   code table may look like the following:
  
! CREATE STATISTICS stts2 (ndistinct)
!ON zip, state, city FROM zipcodes;
  ANALYZE zipcodes;
  SELECT stxkeys AS k, stxndistinct AS nd
FROM pg_statistic_ext
diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml
index 11580bf..ef847b9 100644
*** a/doc/src/sgml/planstats.sgml
--- b/doc/src/sgml/planstats.sgml
*** EXPLAIN (ANALYZE, TIMING OFF) SELECT * F
*** 

Re: [HACKERS] Race conditions with WAL sender PID lookups

2017-05-11 Thread Masahiko Sawada
On Thu, May 11, 2017 at 10:48 AM, Michael Paquier
 wrote:
> Hi all,
>
> I had my eyes on the WAL sender code this morning, and I have noticed
> that walsender.c is not completely consistent with the PID lookups it
> does in walsender.c. In two code paths, the PID value is checked
> without holding the WAL sender spin lock (WalSndRqstFileReload and
> pg_stat_get_wal_senders), which looks like a very bad idea contrary to
> what the new WalSndWaitStopping() does and what InitWalSenderSlot() is
> doing for ages.
>
> Any thoughts about the patch attached to make things more consistent?
> It seems to me that having some safeguards would be nice for
> robustness.

+1. I think this is a sensible change.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Adding support for Default partition in partitioning

2017-05-11 Thread Robert Haas
On Thu, May 11, 2017 at 10:07 AM, Rahila Syed  wrote:
> Please find attached an updated patch with review comments and bugs reported
> till date implemented.

You haven't done anything about the repeated suggestion that this
should also cover range partitioning.

+        /*
+         * If the partition is the default partition switch
+         * back to PARTITION_STRATEGY_LIST
+         */
+        if (spec->strategy == PARTITION_DEFAULT)
+            result_spec->strategy = PARTITION_STRATEGY_LIST;
+        else
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+                     errmsg("invalid bound specification for a list partition"),
+                     parser_errposition(pstate, exprLocation(bound))));

This is incredibly ugly.  I don't know exactly what should be done
about it, but I think PARTITION_DEFAULT is a bad idea and has got to
go.  Maybe add a separate isDefault flag to PartitionBoundSpec.

+/*
+ * Skip if it's a partitioned table.  Only RELKIND_RELATION
+ * relations (ie, leaf partitions) need to be scanned.
+ */
+if (part_rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)

What about foreign table partitions?

Doesn't it strike you as a bit strange that get_qual_for_default()
doesn't return a qual?  Functions should generally have names that
describe what they do.

+bound_datums = list_copy(spec->listdatums);
+
+boundspecs = get_qual_for_default(parent, defid);
+
+foreach(cell, bound_datums)
+{
+Node *value = lfirst(cell);
+boundspecs = lappend(boundspecs, value);
+}

There's an existing function that you can use to concatenate two lists
instead of open-coding it.
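
Presumably the whole thing could be reduced to something like:

    boundspecs = get_qual_for_default(parent, defid);
    boundspecs = list_concat(boundspecs, list_copy(spec->listdatums));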

Also, I think that before you ask anyone to spend too much more time
and energy reviewing this, you should really add the documentation and
regression tests which you mentioned as a TODO.  And run the code
through pgindent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [POC] hash partitioning

2017-05-11 Thread Amit Langote
On 2017/05/12 11:20, Robert Haas wrote:
> On Thu, May 11, 2017 at 10:15 PM, Amit Langote
>  wrote:
>> On 2017/05/12 10:42, Robert Haas wrote:
>>> On Thu, May 11, 2017 at 12:02 PM, Dilip Kumar  wrote:
 We need to add PARTITION_STRATEGY_HASH as well, we don't support NULL
 for hash also, right?
>>>
>>> I think it should.
>>>
>>> Actually, I think that not supporting nulls for range partitioning may
>>> have been a fairly bad decision.
>>
>> I think the relevant discussion concluded [1] that way, because we
>> couldn't decide which interface to provide for specifying where NULLs are
>> placed or because we decided to think about it later.
> 
> Yeah, but I have a feeling that marking the columns NOT NULL is going
> to make it really hard to support that in the future when we get the
> syntax hammered out.  If it had only affected the partition
> constraints that'd be different.

So, adding keycol IS NOT NULL (like we currently do for expressions) in
the implicit partition constraint would be more future-proof than
generating an actual catalogued NOT NULL constraint on the keycol?  I now
tend to think it would be better.  Directly inserting into a range
partition with a NULL value for a column currently generates a "null value
in column \"%s\" violates not-null constraint" instead of perhaps more
relevant "new row for relation \"%s\" violates partition constraint".
That said, we *do* document the fact that a NOT NULL constraint is added
on range key columns, but we might as well document instead that we don't
currently support routing tuples with NULL values in the partition key
through a range-partitioned table and so NULL values cause error.

Can we still decide to do that instead?
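
For each key column, the constraint-generation code could then emit
something like the rough, untested sketch below, similar to what we
already do for expressions:

    NullTest   *nulltest = makeNode(NullTest);

    nulltest->arg = (Expr *) makeVar(1, key->partattrs[i],
                                     key->parttypid[i],
                                     key->parttypmod[i],
                                     key->parttypcoll[i],
                                     0);
    nulltest->nulltesttype = IS_NOT_NULL;
    nulltest->argisrow = false;
    nulltest->location = -1;
    result = lappend(result, nulltest);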

Thanks,
Amit





Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()

2017-05-11 Thread Neha Khatri
On Fri, May 12, 2017 at 12:56 AM, Tom Lane  wrote:

> Neha Khatri  writes:
> > [In case forgotten] pg_controldata and pg_waldump interfaces should also
> be
> > considered for this standardization.
>
> > Following are pg_controldata interfaces that might require change:
>
> >   Latest checkpoint location:
> >   Prior checkpoint location:
> >   Latest checkpoint's REDO location:
> >   Minimum recovery ending location:
> >   Backup start location:
> >   Backup end location:
>
> My inclination is to leave these messages alone.  They're not really
> inconsistent with anything.  Where we seem to be ending up is that
> "lsn" will be used in things like function and column names, but
> documentation will continue to spell out phrases like "WAL location".
>
> There is another open thread about converting said phrases to be
> more consistent --- a lot of them currently say "transaction log
> location", which is not a very good choice because it invites
> confusion with pg_xact nee pg_clog.  But I think that's mostly
> just documentation changes, and in any case it's better done as
> a separate patch.
>
>
Are you indicating that the above phrases do not require change because
they are consistent with other references, or that the other thread [1]
(renaming 'transaction log') should take care of it?

Regards,
Neha

[1]
https://www.postgresql.org/message-id/89ba433e-8990-0aad-238f-55e1d7280ece%402ndquadrant.com


Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-05-11 Thread Masahiko Sawada
On Thu, May 11, 2017 at 6:16 PM, Petr Jelinek
 wrote:
> On 11/05/17 10:10, Masahiko Sawada wrote:
>> On Thu, May 11, 2017 at 4:06 PM, Michael Paquier
>>  wrote:
>>> On Wed, May 10, 2017 at 11:57 AM, Masahiko Sawada  
>>> wrote:
 Barring any objections, I'll add these two issues to open item.
>>>
>>> It seems to me that those open items have not been added yet to the
>>> list. If I am following correctly, they could be defined as follows:
>>> - Dropping subscription may stuck if done during tablesync.
>>> -- Analyze deadlock issues with DROP SUBSCRIPTION and apply worker process.
>
> I think the solution to this is to reintroduce the LWLock that was
> removed and replaced with the exclusive lock on catalog [1]. I am afraid
> that correct way of handling this is to do both LWLock and catalog lock
> (first LWLock under which we kill the workers and then catalog lock so
> that something that prevents launcher from restarting them is held till
> the end of transaction).

I agree to reintroduce the LWLock, and to stop the logical rep workers
first and then modify the catalog. That way we can reduce the catalog
lock level (maybe to RowExclusiveLock) so that the apply worker can see
it. Also, I think we need to do more to avoid holding the LWLock until
the end of the transaction, because holding an LWLock until the end of
the transaction is not a good idea and could cause a deadlock. For
example, we could commit the transaction in DropSubscription after
cleaning up the pg_subscription record and all its dependencies, and
then start a new transaction for the remaining work. Of course, we
would also need to disallow DROP SUBSCRIPTION from being executed
inside a user transaction.

>
>>> -- Avoid orphaned tablesync worker if apply worker exits before
>>> changing its status.
>>
>
> The behavior question I have about this is if sync workers should die
> when apply worker dies (ie they are tied to apply worker) or if they
> should be tied to the subscription.
>
> I guess taking down all the sync workers when apply worker has exited is
> easier to solve. Of course it means that if apply worker restarts in
> middle of table synchronization, the table synchronization will have to
> start from scratch. That being said, in normal operation apply worker
> should only exit/restart if subscription has changed or has been
> dropped/disabled and I think sync workers want to exit/restart in that
> situation as well.

I agree that sync workers are tied to the apply worker.

>
> So for example having shmem detach hook for an apply worker (or reusing
> the existing one) that searches for all the other workers for same
> subscription and shuts them down as well sounds like solution to this.

Seems like a reasonable solution.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] [POC] hash partitioning

2017-05-11 Thread Robert Haas
On Thu, May 11, 2017 at 10:15 PM, Amit Langote
 wrote:
> On 2017/05/12 10:42, Robert Haas wrote:
>> On Thu, May 11, 2017 at 12:02 PM, Dilip Kumar  wrote:
>>> We need to add PARTITION_STRATEGY_HASH as well, we don't support NULL
>>> for hash also, right?
>>
>> I think it should.
>>
>> Actually, I think that not supporting nulls for range partitioning may
>> have been a fairly bad decision.
>
> I think the relevant discussion concluded [1] that way, because we
> couldn't decide which interface to provide for specifying where NULLs are
> placed or because we decided to think about it later.

Yeah, but I have a feeling that marking the columns NOT NULL is going
to make it really hard to support that in the future when we get the
syntax hammered out.  If it had only affected the partition
constraints that'd be different.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-05-11 Thread Michael Paquier
On Fri, May 12, 2017 at 9:47 AM, Bossart, Nathan  wrote:
> Attached is a more complete first attempt at adding this functionality.  I 
> added two node types: one for parsing the “relation and columns” list in the 
> grammar, and one for holding the relation information we need for each call 
> to vacuum_rel(…)/analyze_rel(…).  I also added assertions and comments for 
> some undocumented assumptions that we currently rely upon.
>
> Adjustments to the documentation for VACUUM/ANALYZE and new checks in the 
> VACUUM regression test are included in this patch as well.
>
> Looking forward to any feedback that you have.

Browsing the code

 
-ANALYZE [ VERBOSE ] [ table_name [ ( column_name [, ...] ) ] ]
+ANALYZE [ VERBOSE ] [ table_name [ [ ( column_name [, ...] ) ] [, ...] ]
 
It seems to me that you don't need those extra square brackets around
the column list. Same remark for vacuum.sgml.

 
  
-  The name (possibly schema-qualified) of a specific table to
+  The name (possibly schema-qualified) of the specific tables to
   analyze.  If omitted, all regular tables, partitioned tables, and
   materialized views in the current database are analyzed (but not
-  foreign tables).  If the specified table is a partitioned table, both the
+  foreign tables).  If a specified table is a partitioned table, both the
   inheritance statistics of the partitioned table as a whole and
   statistics of the individual partitions are updated.
  
Don't think that's needed. table_name is still referencing a single
table name. And similar remark for vacuum.sgml.

In short for all that, it is enough to have "[, ... ]" to document
that a list is accepted.

/* Now go through the common routine */
-   vacuum(vacstmt->options, vacstmt->relation, InvalidOid, ,
-  vacstmt->va_cols, NULL, isTopLevel);
+   vacuum(vacstmt->options, vacstmt->rels, InvalidOid, ,
+  NULL, isTopLevel);
 }
It seems to me that it would have been less invasive to loop through
vacuum() for each relation. Do you foresee advantages in allowing
vacuum() to handle multiple relations? I am not sure if it is worth
complicating the current logic more, considering that you also have
some extra logic to carry the option values.
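
In rough terms, I was thinking of something like the sketch below in
ExecVacuum() (the VacuumRelation field names are assumed from your
patch, so adjust as needed):

    ListCell   *lc;

    foreach(lc, vacstmt->rels)
    {
        VacuumRelation *vrel = (VacuumRelation *) lfirst(lc);

        vacuum(vacstmt->options, vrel->relation, InvalidOid, &params,
               vrel->va_cols, NULL, isTopLevel);
    }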

+ * used for error messages.  In the case that relid is valid, rels
+ * must have exactly one element corresponding to the specified relid.
s/rels/relations/ as variable name?
-- 
Michael




Re: [HACKERS] [POC] hash partitioning

2017-05-11 Thread Amit Langote
On 2017/05/12 10:42, Robert Haas wrote:
> On Thu, May 11, 2017 at 12:02 PM, Dilip Kumar  wrote:
>> We need to add PARTITION_STRATEGY_HASH as well, we don't support NULL
>> for hash also, right?
> 
> I think it should.
> 
> Actually, I think that not supporting nulls for range partitioning may
> have been a fairly bad decision.

I think the relevant discussion concluded [1] that way, because we
couldn't decide which interface to provide for specifying where NULLs are
placed or because we decided to think about it later.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CA%2BTgmoZN_Zf7MBb48O66FAJgFe0S9_NkLVeQNBz6hsxb6Og93w%40mail.gmail.com



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Addition of pg_dump --no-publications

2017-05-11 Thread Michael Paquier
On Thu, May 11, 2017 at 3:19 PM, Michael Paquier
 wrote:
> I imagine that pg_dump -s would be the basic operation that users
> would do first before creating a subscription on a secondary node, but
> what I find surprising is that publications are dumped by default. I
> don't find it confusing that those are actually included by default, to
> be consistent with the way subscriptions are handled; what I find
> confusing is that there are no options to not dump them, and no
> options to bypass their restore.
>
> So, any opinions about having pg_dump/pg_restore --no-publications?

And that's really a boring patch, giving the attached.
-- 
Michael


pgdump-no-pubs.patch
Description: Binary data



Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-05-11 Thread Robert Haas
On Wed, May 10, 2017 at 8:39 PM, Masahiko Sawada  wrote:
> Currently, the relation extension lock is implemented using the
> heavyweight lock manager, and almost all functions (except for
> brin_page_cleanup) that use LockRelationForExtension take it in
> ExclusiveLock mode. But it doesn't actually need multiple lock modes,
> deadlock detection, or any of the other functionality that the
> heavyweight lock manager provides. I think it's enough to use something
> like an LWLock. So I'd like to propose changing relation extension lock
> management so that it works using an LWLock instead.

That's not a good idea because it'll make the code that executes while
holding that lock noninterruptible.

Possibly something based on condition variables would work better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Bug in pg_dump --table and --exclude-table for declarative partition table handling.

2017-05-11 Thread Robert Haas
On Thu, May 11, 2017 at 9:33 AM, Tom Lane  wrote:
> Uh ... what in that is creating the already-extant parent?

/me looks embarrassed.

Never mind.  I didn't read what you wrote carefully enough.

>> I think one answer to the original complaint might be to add a new
>> flag to pg_dump, something like --recursive-selection, maybe -r for
>> short, which makes --table, --exclude-table, and --exclude-table-data
>> cascade to inheritance descendents.
>
> Yeah, you could do it like that.  Another way to do it would be to
> create variants of all the selection switches, along the lines of
> "--table-all=foo" meaning "foo plus its children".  Then you could
> have some switches recursing and others not within the same command.
> But maybe that's more flexibility than needed ... and I'm having a
> hard time coming up with nice switch names, anyway.

I don't think that's as good.  It's a lot more typing than what I
proposed and I don't think anyone is really going to want the
flexibility.

> Anyway, I'm still of the opinion that it's fine to leave this as a
> future feature.  If we've gotten away without it this long for
> inherited tables, it's unlikely to be critical for partitioned tables.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-05-11 Thread Bossart, Nathan
On 5/11/17, 6:32 PM, "Tom Lane"  wrote:
> You probably won't get much in the near term, because we're in
> stabilize-the-release mode not develop-new-features mode.
> Please add your patch to the pending commitfest
> https://commitfest.postgresql.org/14/
> so that we remember to look at it when the time comes.

Understood.  I’ve added it to the commitfest.

Nathan




Re: [HACKERS] [POC] hash partitioning

2017-05-11 Thread Robert Haas
On Thu, May 11, 2017 at 12:02 PM, Dilip Kumar  wrote:
> We need to add PARTITION_STRATEGY_HASH as well, we don't support NULL
> for hash also, right?

I think it should.

Actually, I think that not supporting nulls for range partitioning may
have been a fairly bad decision.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Row Level Security Documentation

2017-05-11 Thread Rod Taylor
Of course, better thoughts appear immediately after hitting the send button.

This version of the table attempts to stipulate which section of the
process the rule applies to.
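
As a concrete example of what the table's rows describe (relation, policy,
and predicate names here are purely illustrative), a FOR ALL policy
contributes its USING expression to the rows a statement may see or return,
and its WITH CHECK expression to the new tuples it writes:

CREATE TABLE accounts (owner text, balance numeric);
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

CREATE POLICY accounts_owner ON accounts
    FOR ALL
    USING (owner = current_user)          -- applied to WHERE/RETURNING rows
    WITH CHECK (owner = current_user);    -- applied to new tuples on INSERT/UPDATE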

On Thu, May 11, 2017 at 8:40 PM, Rod Taylor  wrote:

> I think the biggest piece missing is something to summarize the giant
> blocks of text.
>
> Attached is a table that has commands and policy types, and a "yes" if it
> applies.
>
> --
> Rod Taylor
>



-- 
Rod Taylor
diff --git a/doc/src/sgml/ref/create_policy.sgml b/doc/src/sgml/ref/create_policy.sgml
index 3b24e5e95e..4c997a956d 100644
--- a/doc/src/sgml/ref/create_policy.sgml
+++ b/doc/src/sgml/ref/create_policy.sgml
@@ -389,6 +389,72 @@ CREATE POLICY name ON 
 

+
+   Policies Applied During Statement
+
+ 
+  
+   Policy
+   SELECT
+   INSERT
+   UPDATE
+   DELETE
+  
+ 
+ 
+  
+   FOR ALL ... USING
+   WHERE clause
+   RETURNING clause
+   WHERE and RETURNING clause
+   WHERE and RETURNING clause
+  
+  
+   FOR ALL ... WITH CHECK
+   
+   new tuple
+   new tuple
+   new tuple
+  
+  
+   FOR SELECT ... USING
+   WHERE clause
+   RETURNING clause
+   WHERE and RETURNING clause
+   WHERE and RETURNING clause
+  
+  
+   FOR INSERT ... WITH CHECK
+   
+   new tuple
+   
+   
+  
+  
+   FOR UPDATE ... USING
+   
+   
+   WHERE clause
+   
+  
+  
+   FOR UPDATE ... WITH CHECK
+   
+   
+   new tuple
+   
+  
+  
+   FOR DELETE ... USING
+   
+   
+   
+   WHERE clause
+  
+ 
+
+   
+
   
  
 



Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-05-11 Thread Tom Lane
"Bossart, Nathan"  writes:
> Looking forward to any feedback that you have.

You probably won't get much in the near term, because we're in
stabilize-the-release mode not develop-new-features mode.
Please add your patch to the pending commitfest
https://commitfest.postgresql.org/14/
so that we remember to look at it when the time comes.

regards, tom lane




Re: [HACKERS] snapbuild woes

2017-05-11 Thread Andres Freund
On 2017-05-11 14:54:26 -0700, Andres Freund wrote:
> On 2017-05-11 14:51:55 -0700,  wrote:
> > On 2017-05-08 00:10:12 -0700, Andres Freund wrote:
> > > I plan to commit the next pending patch after the back branch releases
> > > are cut - it's an invasive fix and the issue doesn't cause corruption
> > > "just" slow slot creation. So it seems better to wait for a few days,
> > > rather than hurry it into the release.
> > 
> > Now that that's done, here's an updated version of that patch.  Note the
> > new logic to trigger xl_running_xact's to be logged at the right spot.
> > Works well in my testing.
> > 
> > I plan to commit this fairly soon, unless somebody wants a bit more time
> > to look into it.
> > 
> > Unless somebody protests, I'd like to slightly revise how the on-disk
> > snapshots are stored on master - given the issues this bug/commit showed
> > with the current method - but I can see one could argue that that should
> > rather be done next release.
> 
> As usual I forgot my attachment.

This patch also seems to offer a way to do your 0005 in a possibly more
efficient manner.  We don't ever need to assume a transaction is
timetravelling anymore.  Could you check whether the attached, hasty,
patch resolves the performance problems you measured?  Also, do you have
a "reference" workload for that?

Regards,

Andres
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 0f2dcb84be..4ddd10fcf0 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -929,21 +929,31 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 {
 	int			nxact;
 
-	bool		forced_timetravel = false;
+	bool		needs_snapshot = false;
+	bool		needs_timetravel = false;
+
 	bool		sub_needs_timetravel = false;
-	bool		top_needs_timetravel = false;
 
 	TransactionId xmax = xid;
 
+	if (builder->state == SNAPBUILD_START)
+		return;
+
+
 	/*
-	 * If we couldn't observe every change of a transaction because it was
-	 * already running at the point we started to observe we have to assume it
-	 * made catalog changes.
-	 *
-	 * This has the positive benefit that we afterwards have enough
-	 * information to build an exportable snapshot that's usable by pg_dump et
-	 * al.
+	 * Transactions preceding BUILDING_SNAPSHOT will neither be decoded, nor
+	 * will it be part of a snapshot.  So we don't even need to record
+	 * anything.
 	 */
+	if (builder->state == SNAPBUILD_BUILDING_SNAPSHOT &&
+		TransactionIdPrecedes(xid, SnapBuildNextPhaseAt(builder)))
+	{
+		/* ensure that only commits after this are getting replayed */
+		if (builder->start_decoding_at <= lsn)
+			builder->start_decoding_at = lsn + 1;
+		return;
+	}
+
 	if (builder->state < SNAPBUILD_CONSISTENT)
 	{
 		/* ensure that only commits after this are getting replayed */
@@ -951,12 +961,13 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 			builder->start_decoding_at = lsn + 1;
 
 		/*
-		 * We could avoid treating !SnapBuildTxnIsRunning transactions as
-		 * timetravel ones, but we want to be able to export a snapshot when
-		 * we reached consistency.
+		 * If we're building an exportable snapshot, force recording of the
+		 * xid, even if the transaction doesn't modify the catalog.
 		 */
-		forced_timetravel = true;
-		elog(DEBUG1, "forced to assume catalog changes for xid %u because it was running too early", xid);
+		if (builder->building_full_snapshot)
+		{
+			needs_timetravel = true;
+		}
 	}
 
 	for (nxact = 0; nxact < nsubxacts; nxact++)
@@ -964,23 +975,13 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		TransactionId subxid = subxacts[nxact];
 
 		/*
-		 * If we're forcing timetravel we also need visibility information
-		 * about subtransaction, so keep track of subtransaction's state.
-		 */
-		if (forced_timetravel)
-		{
-			SnapBuildAddCommittedTxn(builder, subxid);
-			if (NormalTransactionIdFollows(subxid, xmax))
-xmax = subxid;
-		}
-
-		/*
 		 * Add subtransaction to base snapshot if it DDL, we don't distinguish
 		 * to toplevel transactions there.
 		 */
-		else if (ReorderBufferXidHasCatalogChanges(builder->reorder, subxid))
+		if (ReorderBufferXidHasCatalogChanges(builder->reorder, subxid))
 		{
 			sub_needs_timetravel = true;
+			needs_snapshot = true;
 
 			elog(DEBUG1, "found subtransaction %u:%u with catalog changes.",
  xid, subxid);
@@ -990,21 +991,26 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 			if (NormalTransactionIdFollows(subxid, xmax))
 xmax = subxid;
 		}
+		/*
+		 * If we're forcing timetravel we also need visibility information
+		 * about subtransaction, so keep track of subtransaction's state, even
+		 * if not catalog modifying.
+		 */
+		else if (needs_timetravel)
+		{
+			SnapBuildAddCommittedTxn(builder, subxid);
+			if (NormalTransactionIdFollows(subxid, xmax))
+xmax = subxid;
+		}
 	}
 
-	if 

Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Bruce Momjian
On Thu, May 11, 2017 at 10:41:45PM +0300, Konstantin Knizhnik wrote:
> This is why I have provided a second implementation which replaces
> literals with parameters after raw parsing.  Certainly it is slower
> than the first approach, but it still provides a significant advantage
> in performance: more than two times at pgbench.  Then I tried to run
> the regression tests and found several situations where type analysis
> is not performed correctly when literals are replaced with parameters.

So the issue is that per-command output from the parser, SelectStmt,
only has strings for identifiers, e.g. table and column names, so you
can't be sure it is the same as the cached entry you matched.  I suppose
if you cleared the cache every time someone created an object or changed
search_path, it might work.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




[HACKERS] Row Level Security Documentation

2017-05-11 Thread Rod Taylor
I think the biggest piece missing is something to summarize the giant
blocks of text.

Attached is a table that has commands and policy types, and a "yes" if it
applies.

-- 
Rod Taylor
diff --git a/doc/src/sgml/ref/create_policy.sgml b/doc/src/sgml/ref/create_policy.sgml
index 3b24e5e95e..c737f9e884 100644
--- a/doc/src/sgml/ref/create_policy.sgml
+++ b/doc/src/sgml/ref/create_policy.sgml
@@ -389,6 +389,96 @@ CREATE POLICY name ON 
 

+
+   Policies Applied During Statement
+
+ 
+  
+   Policy
+   SELECT
+   INSERT
+   INSERT RETURNING
+   UPDATE WHERE
+   UPDATE RETURNING
+   DELETE WHERE
+   DELETE RETURNING
+  
+ 
+ 
+  
+   FOR ALL ... USING
+   yes
+   
+   
+   yes
+   yes
+   yes
+   yes
+  
+  
+   FOR ALL ... WITH CHECK
+   
+   yes
+   yes
+   yes
+   yes
+   
+   
+  
+  
+   FOR SELECT ... USING
+   yes
+   
+   yes
+   yes
+   yes
+   yes
+   yes
+  
+  
+   FOR INSERT ... WITH CHECK
+   
+   yes
+   yes
+   
+   
+   
+   
+  
+  
+   FOR UPDATE ... USING
+   
+   
+   
+   yes
+   yes
+   
+   
+  
+  
+   FOR UPDATE ... WITH CHECK
+   
+   
+   
+   yes
+   yes
+   
+   
+  
+  
+   FOR DELETE ... USING
+   
+   
+   
+   
+   
+   yes
+   yes
+  
+ 
+
+   
+
   
  
 



Re: [HACKERS] A design for amcheck heapam verification

2017-05-11 Thread Peter Geoghegan
On Mon, May 1, 2017 at 6:39 PM, Peter Geoghegan  wrote:
> On Mon, May 1, 2017 at 6:20 PM, Tom Lane  wrote:
>> Maybe you can fix this by assuming that your own session's advertised xmin
>> is a safe upper bound on everybody else's RecentGlobalXmin.  But I'm not
>> sure if that rule does what you want.
>
> That's what you might ultimately need to fall back on (that, or
> perhaps repeated calls to GetOldestXmin() to recheck, in the style of
> pg_visibility).

I spent only a few hours writing a rough prototype, and came up with
something that does an IndexBuildHeapScan() scan following the
existing index verification steps. Its amcheck callback does an
index_form_tuple() call, hashes the resulting IndexTuple (heap TID and
all), and tests it for membership of a bloom filter generated as part
of the main B-Tree verification phase. The IndexTuple memory is freed
immediately following hashing.

The general idea here is that whatever IndexTuples ought to be in the
index following a fresh REINDEX also ought to have been found in the
index already. IndexBuildHeapScan() takes care of almost all of the
details for us.

I think I can do this correctly when only an AccessShareLock is
acquired on heap + index, provided I also do a separate
GetOldestXmin() before even the index scan begins, and do a final
recheck of the saved GetOldestXmin() value against heap tuple xmin
within the new IndexBuildHeapScan() callback (if we still think that
it should have been found by the index scan, then actually throw a
corruption related error). When there is only a ShareLock (for
bt_parent_index_check() calls), the recheck isn't necessary. I think I
should probably also make the IndexBuildHeapScan()-passed indexInfo
structure "ii_Unique = false", since waiting for the outcome of a
concurrent conflicting unique index insertion isn't useful, and can
cause deadlocks.

While I haven't really made my mind up, this design is extremely
simple, and effectively tests IndexBuildHeapScan() at the same time as
everything else. The addition of the bloom filter itself isn't
trivial, but the code added to verify_nbtree.c is.

The downside of going this way is that I cannot piggyback other types
of heap verification on the IndexBuildHeapScan() scan. Still, perhaps
it's worth it. Perhaps I should implement this bloom-filter-index-heap
verification step as one extra option for the existing B-Tree
verification functions. I may later add new verification functions
that examine and verify the heap and related SLRUs alone.
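
For reference, the existing B-Tree verification entry points that such an
option would extend are invoked roughly like this (index name is
illustrative; the heap-verification option itself is only a prototype and
has no user-facing syntax yet):

CREATE EXTENSION amcheck;
SELECT bt_index_check('pgbench_accounts_pkey'::regclass);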

-- 
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/




Re: [HACKERS] WITH clause in CREATE STATISTICS

2017-05-11 Thread Tom Lane
Alvaro Herrera  writes:
> BTW the new castNode() family of macros don't work with Value nodes
> (because the tags are different depending on what's stored, but each
> type does not have its own struct.  Oh well.)

Yeah.  Value nodes are pretty much of a wart --- it's not clear to me
that they add anything compared to just having a separate node type
for each of the possible subclasses.  It's never bothered anyone enough
to try to replace them, though.

I'll take a look at the patch later.

regards, tom lane




Re: [HACKERS] WITH clause in CREATE STATISTICS

2017-05-11 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera  writes:
> > Tom Lane wrote:
> >> Have you thought further about the upthread suggestion to just borrow
> >> SELECT's syntax lock stock and barrel?
> 
> > Bison seems to like the productions below.  Is this what you had in
> > mind?  These mostly mimic joined_table and table_ref, stripping out the
> > rules that we don't need.
> 
> I'd suggest just using the from_list production and then complaining
> at runtime if what you get is too complicated.  Otherwise, you have
> to maintain a duplicate set of productions, and you're going to be
> unable to throw anything more informative than "syntax error" when
> somebody tries to exceed the implementation limits.

Hmm, yeah, makes sense.  Here's a patch for this approach.  I ended up
using ON again for the list of columns.  I suppose the checks in
CreateStatistics() could still be improved, but I'd go ahead and push
this tomorrow morning and we can hammer out those details later on, if
it's still needed.  Better to avoid shipping beta with an outdated
grammar ...
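
For readers skimming the diff below, the user-visible change amounts to
(example lifted from the perform.sgml hunk):

-- before this patch
CREATE STATISTICS stts WITH (dependencies) ON (zip, city) FROM zipcodes;
-- after this patch
CREATE STATISTICS stts (dependencies) ON zip, city FROM zipcodes;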

BTW the new castNode() family of macros don't work with Value nodes
(because the tags are different depending on what's stored, but each
type does not have its own struct.  Oh well.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index b10b734b90..32e17ee5f8 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1132,8 +1132,8 @@ WHERE tablename = 'road';
  To inspect functional dependencies on a statistics
  stts, you may do this:
 
-CREATE STATISTICS stts WITH (dependencies)
-   ON (zip, city) FROM zipcodes;
+CREATE STATISTICS stts (dependencies)
+   ON zip, city FROM zipcodes;
 ANALYZE zipcodes;
 SELECT stxname, stxkeys, stxdependencies
   FROM pg_statistic_ext
@@ -1219,8 +1219,8 @@ EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 
AND b = 10;
  Continuing the above example, the n-distinct coefficients in a ZIP
  code table may look like the following:
 
-CREATE STATISTICS stts2 WITH (ndistinct)
-   ON (zip, state, city) FROM zipcodes;
+CREATE STATISTICS stts2 (ndistinct)
+   ON zip, state, city FROM zipcodes;
 ANALYZE zipcodes;
 SELECT stxkeys AS k, stxndistinct AS nd
   FROM pg_statistic_ext
diff --git a/doc/src/sgml/planstats.sgml b/doc/src/sgml/planstats.sgml
index 11580bfd22..ef847b9633 100644
--- a/doc/src/sgml/planstats.sgml
+++ b/doc/src/sgml/planstats.sgml
@@ -526,7 +526,7 @@ EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 
AND b = 1;
 multivariate statistics on the two columns:
 
 
-CREATE STATISTICS stts WITH (dependencies) ON (a, b) FROM t;
+CREATE STATISTICS stts (dependencies) ON a, b FROM t;
 ANALYZE t;
 EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1;
   QUERY PLAN   
@@ -569,7 +569,7 @@ EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t GROUP 
BY a, b;
 calculation, the estimate is much improved:
 
 DROP STATISTICS stts;
-CREATE STATISTICS stts WITH (dependencies, ndistinct) ON (a, b) FROM t;
+CREATE STATISTICS stts (dependencies, ndistinct) ON a, b FROM t;
 ANALYZE t;
 EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t GROUP BY a, b;
QUERY PLAN  
  
diff --git a/doc/src/sgml/ref/create_statistics.sgml 
b/doc/src/sgml/ref/create_statistics.sgml
index edbcf5840b..84ff52df04 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -22,8 +22,8 @@ PostgreSQL documentation
  
 
 CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
-WITH ( option [= value] [, ... ] )
-ON ( column_name, 
column_name [, ...])
+[ ( statistic_type [, ... ] ) 
]
+ON column_name, column_name [, ...]
 FROM table_name
 
 
@@ -75,6 +75,19 @@ CREATE STATISTICS [ IF NOT EXISTS ] statistics_na

 

+statistic_type
+
+ 
+  A statistic type to be enabled for this statistics.  Currently
+  supported types are ndistinct, which enables
+  n-distinct coefficient tracking,
+  and dependencies, which enables functional
+  dependencies.
+ 
+
+   
+
+   
 column_name
 
  
@@ -94,42 +107,6 @@ CREATE STATISTICS [ IF NOT EXISTS ] statistics_na

 
   
-
-  
-   Parameters
-
- 
-  statistics parameters
- 
-
-   
-The WITH clause can specify options
-for the statistics. Available options are listed below.
-   
-
-   
-
-   
-dependencies (boolean)
-
- 
-  Enables functional dependencies for the statistics.
- 
-
-   
-
-   
-ndistinct (boolean)
-
- 
-  Enables ndistinct coefficients for the statistics.
- 
-
-   
-
-   
-
-  
  
 
  
@@ -158,7 +135,7 @@ CREATE TABLE t1 (
 INSERT INTO 

Re: [HACKERS] snapbuild woes

2017-05-11 Thread Andres Freund
On 2017-04-15 05:18:49 +0200, Petr Jelinek wrote:
> >>> From 3318a929e691870f3c1ca665bec3bfa8ea2af2a8 Mon Sep 17 00:00:00 2001
> >>> From: Petr Jelinek 
> >>> Date: Sun, 26 Feb 2017 01:07:33 +0100
> >>> Subject: [PATCH 3/5] Prevent snapshot builder xmin from going backwards
> >>
> >> A bit more commentary would be good. What does that protect us against?
> >>
> > 
> > I think I explained that in the email. We might export snapshot with
> > xmin smaller than global xmin otherwise.
> > 
> 
> Updated commit message with explanation as well.


> From ae60b52ae0ca96bc14169cd507f101fbb5dfdf52 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek 
> Date: Sun, 26 Feb 2017 01:07:33 +0100
> Subject: [PATCH 3/5] Prevent snapshot builder xmin from going backwards
> 
> Logical decoding snapshot builder may encounter xl_running_xacts with
> older xmin than the xmin of the builder. This can happen because
> LogStandbySnapshot() sometimes sees already committed transactions as
> running (there is difference between "running" in terms for WAL and in
> terms of ProcArray). When this happens we must make sure that the xmin
> of snapshot builder won't go back otherwise the resulting snapshot would
> show some transaction as running even though they have already
> committed.
> ---
>  src/backend/replication/logical/snapbuild.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/src/backend/replication/logical/snapbuild.c 
> b/src/backend/replication/logical/snapbuild.c
> index ada618d..3e34f75 100644
> --- a/src/backend/replication/logical/snapbuild.c
> +++ b/src/backend/replication/logical/snapbuild.c
> @@ -1165,7 +1165,8 @@ SnapBuildProcessRunningXacts(SnapBuild *builder, 
> XLogRecPtr lsn, xl_running_xact
>* looking, it's correct and actually more efficient this way since we 
> hit
>* fast paths in tqual.c.
>*/
> - builder->xmin = running->oldestRunningXid;
> + if (TransactionIdFollowsOrEquals(running->oldestRunningXid, 
> builder->xmin))
> + builder->xmin = running->oldestRunningXid;
>  
>   /* Remove transactions we don't need to keep track off anymore */
>   SnapBuildPurgeCommittedTxn(builder);
> -- 
> 2.7.4

I still don't understand.  The snapshot's xmin is solely managed via
xl_running_xacts, so I don't see how the WAL/procarray difference can
play a role here.  ->committed isn't pruned before xl_running_xacts
indicates it can be done, so I don't understand your explanation above.

I'd be ok with adding this as a paranoia check, but I still don't
understand when it could practically be hit.

- Andres




Re: [HACKERS] snapbuild woes

2017-05-11 Thread Andres Freund
On 2017-05-11 14:51:55 -0700,  wrote:
> On 2017-05-08 00:10:12 -0700, Andres Freund wrote:
> > I plan to commit the next pending patch after the back branch releases
> > are cut - it's an invasive fix and the issue doesn't cause corruption
> > "just" slow slot creation. So it seems better to wait for a few days,
> > rather than hurry it into the release.
> 
> Now that that's done, here's an updated version of that patch.  Note the
> new logic to trigger xl_running_xact's to be logged at the right spot.
> Works well in my testing.
> 
> I plan to commit this fairly soon, unless somebody wants a bit more time
> to look into it.
> 
> Unless somebody protests, I'd like to slightly revise how the on-disk
> snapshots are stored on master - given the issues this bug/commit showed
> with the current method - but I can see one could argue that that should
> rather be done next release.

As usual I forgot my attachment.

- Andres
>From 3ecec368e3015f5082f120b1a428d1108203a356 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 4 May 2017 16:48:00 -0700
Subject: [PATCH] Fix race condition leading to hanging logical slot creation.

The snapshot assembly during the creation of logical slots relied on
waiting for transactions in xl_running_xacts to end, by checking for
their commit/abort records.  Unfortunately, despite locking, it is
possible to see an xl_running_xacts record listing transactions as
running that have already WAL-logged a commit/abort record, as the
locking just prevents the ProcArray from being adjusted, and the commit
record has to be logged first.

That led to either delayed or hanging snapshot creation, because
snapbuild.c would wait "forever" to see commit/abort records for some
transactions.  That hang resolved only if an xl_running_xacts record
without any running transactions happened to be logged, which is far
from certain on a busy server.

It's impractical to prevent that via more heavyweight locking, the
likelihood of deadlocks and significantly increased contention would
be too big.

Instead change the initial snapshot creation to be solely based on
tracking the oldest running transaction via
xl_running_xacts->oldestRunningXid - that actually ends up
significantly simplifying the code.  That has two disadvantages:
1) Because we cannot fully "trust" the contents of xl_running_xacts,
   we cannot use it to build the initial snapshot.  Instead we have to
   wait twice for all running transactions to finish.
2) Previously a slot, unless the race occurred, could be created as soon
   as all transactions perceived as running had been seen to commit or
   abort; now we have to wait for the next xl_running_xacts record.
To address that, trigger logging a new xl_running_xacts record from
within snapbuild.c exactly when necessary.

Unfortunately snapbuild.c's SnapBuild is stored on disk, one of the
stupider ideas of a certain Mr Freund, so we can't change it in a
minor release.  As this is going to be backpatched, we have to play
around a bit to keep on-disk compatibility.  A later commit will
rejigger that on master.

Author: Andres Freund, based on a quite different patch from Petr Jelinek
Analyzed-By: Petr Jelinek
Reviewed-By: Petr Jelinek
Discussion: https://postgr.es/m/f37e975c-908f-858e-707f-058d3b1eb...@2ndquadrant.com
Backpatch: 9.4-, where logical decoding has been introduced
---
 contrib/test_decoding/expected/ondisk_startup.out |  15 +-
 contrib/test_decoding/specs/ondisk_startup.spec   |   8 +-
 src/backend/replication/logical/decode.c  |   3 -
 src/backend/replication/logical/reorderbuffer.c   |   2 +-
 src/backend/replication/logical/snapbuild.c   | 416 ++
 src/include/replication/snapbuild.h   |  25 +-
 6 files changed, 220 insertions(+), 249 deletions(-)

diff --git a/contrib/test_decoding/expected/ondisk_startup.out b/contrib/test_decoding/expected/ondisk_startup.out
index 65115c830a..c7b1f45b46 100644
--- a/contrib/test_decoding/expected/ondisk_startup.out
+++ b/contrib/test_decoding/expected/ondisk_startup.out
@@ -1,21 +1,30 @@
 Parsed test spec with 3 sessions
 
-starting permutation: s2txid s1init s3txid s2alter s2c s1insert s1checkpoint s1start s1insert s1alter s1insert s1start
-step s2txid: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT txid_current() IS NULL;
+starting permutation: s2b s2txid s1init s3b s3txid s2alter s2c s2b s2txid s3c s2c s1insert s1checkpoint s1start s1insert s1alter s1insert s1start
+step s2b: BEGIN;
+step s2txid: SELECT txid_current() IS NULL;
 ?column?   
 
 f  
 step s1init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding'); 
-step s3txid: BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT txid_current() IS NULL;
+step s3b: BEGIN;
+step s3txid: SELECT txid_current() IS NULL;
 ?column?   
 
 f  
 step s2alter: ALTER TABLE do_write ADD COLUMN addedbys2 int;
 step s2c: COMMIT;
+step s2b: BEGIN;
+step s2txid: SELECT txid_current() IS NULL;
+?column?   

Re: [HACKERS] snapbuild woes

2017-05-11 Thread Peter Geoghegan
On Thu, May 11, 2017 at 2:51 PM, Andres Freund  wrote:
> Now that that's done, here's an updated version of that patch.  Note the
> new logic to trigger xl_running_xact's to be logged at the right spot.
> Works well in my testing.

You forgot the patch. :-)


-- 
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/




Re: [HACKERS] snapbuild woes

2017-05-11 Thread Andres Freund
On 2017-05-08 00:10:12 -0700, Andres Freund wrote:
> I plan to commit the next pending patch after the back branch releases
> are cut - it's an invasive fix and the issue doesn't cause corruption
> "just" slow slot creation. So it seems better to wait for a few days,
> rather than hurry it into the release.

Now that that's done, here's an updated version of that patch.  Note the
new logic to trigger xl_running_xact's to be logged at the right spot.
Works well in my testing.

I plan to commit this fairly soon, unless somebody wants a bit more time
to look into it.

Unless somebody protests, I'd like to slightly revise how the on-disk
snapshots are stored on master - given the issues this bug/commit showed
with the current method - but I can see one could argue that that should
rather be done next release.

- Andres




Re: [HACKERS] CTE inlining

2017-05-11 Thread Yaroslav
Ilya Shkuratov wrote
> First of all, for such a replacement to be valid, the CTE must
> 1. be non-writable (e.g. be of the form SELECT ...),
> 2. not use VOLATILE or STABLE functions,
> 3. ... (maybe there must be more restrictions?)

What about simple things like this?

CREATE OR REPLACE FUNCTION z(numeric) RETURNS boolean AS $$
BEGIN
RETURN $1 <> 0;
END;
$$ LANGUAGE plpgSQL IMMUTABLE COST 1000;

-- This one works:
WITH T AS (
SELECT 1.0 AS v1, 0.0 AS v2
UNION ALL
SELECT 3.0, 1.0
UNION ALL
SELECT 2.0, 0.0
), a AS (
SELECT *
  FROM t
 WHERE z(v2)
)
SELECT *
  FROM a
 WHERE v1/v2 > 1.5;

-- This one gives 'division by zero':
WITH T AS (
SELECT 1.0 AS v1, 0.0 AS v2
UNION ALL
SELECT 3.0, 1.0
UNION ALL
SELECT 2.0, 0.0
)
SELECT *
  FROM (
   SELECT *
 FROM t
WHERE z(v2)
   ) AS a
 WHERE v1/v2 > 1.5;




-
WBR, Yaroslav Schekin.




Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression

2017-05-11 Thread Andres Freund
On 2017-05-11 17:21:18 -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > I ran this script
> 
> > CREATE SEQUENCE seq1;
> 
> > DO LANGUAGE plpythonu $$
> > plan = plpy.prepare("SELECT nextval('seq1')")
> > for i in range(0, 1000):
> > plpy.execute(plan)
> > $$;
> 
> > and timed the "DO".
> 
> It occurred to me that plpy.execute is going to run a subtransaction for
> each call, which makes this kind of a dubious test case, both because of
> the extra subtransaction overhead and because subtransaction lock
> ownership effects will possibly skew the results.  So I tried to
> reproduce the test using plain plpgsql,
> 
> CREATE SEQUENCE IF NOT EXISTS seq1;
> 
> \timing on
> 
> do $$
> declare x int;
> begin
>   for i in 0 .. 1000 loop
> x := nextval('seq1');
>   end loop;
> end
> $$;
> 
> On HEAD, building without cassert, I got a fairly reproducible result
> of about 10.6 seconds.  I doubt my machine is 6X faster than yours,
> so this indicates that the subtransaction overhead is pretty real.

Isn't that pretty much the point?  The whole open_share_lock()
optimization looks like it really only can make a difference with
subtransactions?


> > I compared the stock releases with a patched version that replaces the
> > body of open_share_lock() with just
> > return relation_open(seq->relid, AccessShareLock);
> 
> Hm.  I don't think that's a sufficient code change, because if you do it
> like that then the lock remains held after nextval() returns.

Hm?  That's not new, is it?  We previously did a LockRelationOid(seq->relid) and
then relation_open(seq->relid, NoLock)?


> This means
> that (a) subsequent calls aren't hitting the shared lock manager at all,
> they're just incrementing a local lock counter, and whether it would be
> fastpath or not is irrelevant; and (b) this means that the semantic issues
> Andres is worried about remain in place, because we will hold the lock
> till transaction end.

My concern was aborted subtransactions, and those should release their
lock via resowner stuff at abort?


> I experimented with something similar, just replacing open_share_lock
> as above, and I got runtimes of just about 12 seconds, which surprised
> me a bit.

Yea, I suspect we'd need something like the current open_share_lock,
except with the resowner trickery removed.


> I'd have thought the locallock-already-exists code path would be
> faster than that.

It's a hash-table lookup, via dynahash no less; that's definitely going
to be more expensive than a single seq->lxid != thislxid...


Greetings,

Andres Freund




Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression

2017-05-11 Thread Tom Lane
Andres Freund  writes:
> On 2017-05-11 16:27:48 -0400, Peter Eisentraut wrote:
>> (So without contention fast-path locking beats the extra dance that
>> open_share_lock() does.)

> That's kind of surprising, I really wouldn't have thought it'd be faster
> without.  I guess it's the overhead of sigsetjmp().  Cool.

My results (posted nearby) lead me to suspect that the improvement
Peter sees from 9.1 to 9.2 has little to do with fastpath locking
and a lot to do with some improvement or other in subtransaction
lock management.

regards, tom lane




Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression

2017-05-11 Thread Tom Lane
Peter Eisentraut  writes:
> I ran this script

> CREATE SEQUENCE seq1;

> DO LANGUAGE plpythonu $$
> plan = plpy.prepare("SELECT nextval('seq1')")
> for i in range(0, 1000):
> plpy.execute(plan)
> $$;

> and timed the "DO".

It occurred to me that plpy.execute is going to run a subtransaction for
each call, which makes this kind of a dubious test case, both because of
the extra subtransaction overhead and because subtransaction lock
ownership effects will possibly skew the results.  So I tried to
reproduce the test using plain plpgsql,

CREATE SEQUENCE IF NOT EXISTS seq1;

\timing on

do $$
declare x int;
begin
  for i in 0 .. 1000 loop
x := nextval('seq1');
  end loop;
end
$$;

On HEAD, building without cassert, I got a fairly reproducible result
of about 10.6 seconds.  I doubt my machine is 6X faster than yours,
so this indicates that the subtransaction overhead is pretty real.

> I compared the stock releases with a patched version that replaces the
> body of open_share_lock() with just
> return relation_open(seq->relid, AccessShareLock);

Hm.  I don't think that's a sufficient code change, because if you do it
like that then the lock remains held after nextval() returns.  This means
that (a) subsequent calls aren't hitting the shared lock manager at all,
they're just incrementing a local lock counter, and whether it would be
fastpath or not is irrelevant; and (b) this means that the semantic issues
Andres is worried about remain in place, because we will hold the lock
till transaction end.

I experimented with something similar, just replacing open_share_lock
as above, and I got runtimes of just about 12 seconds, which surprised
me a bit.  I'd have thought the locallock-already-exists code path
would be faster than that.

I then further changed the code so that nextval_internal ends with
"relation_close(seqrel, AccessShareLock);" rather than NoLock,
so that the lock is actually released between calls.  This boosted
the runtime up to 15.5 seconds, or a 50% penalty over HEAD.

My conclusion is that in low-overhead cases, such as using a sequence
to assign default values during COPY IN, the percentage overhead from
acquiring and releasing the lock could be pretty dire.  Still, we might
not have much choice if we want nice semantics.

regards, tom lane




Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression

2017-05-11 Thread Andres Freund
Hi,


On 2017-05-11 16:27:48 -0400, Peter Eisentraut wrote:
> On 5/10/17 12:24, Andres Freund wrote:
> > Upthread I theorized whether
> > that's actually still meaningful given fastpath locking and such, but I
> > guess we'll have to evaluate that.
> 
> I did some testing.

That's with the open_share_lock stuff ripped out entirely, replaced by a
plain lock acquisition within the current subxact?


> (These were within each other's variance over several runs.)
> 
> 9.2 unpatched
> Time: 64868.305 ms
> 
> 9.2 patched
> Time: 60585.317 ms
> 
> (So without contention fast-path locking beats the extra dance that
> open_share_lock() does.)

That's kind of surprising, I really wouldn't have thought it'd be faster
without.  I guess it's the overhead of sigsetjmp().  Cool.


- Andres




Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression

2017-05-11 Thread Tom Lane
Peter Eisentraut  writes:
> On 5/10/17 12:24, Andres Freund wrote:
>> Upthread I theorized whether
>> that's actually still meaningful given fastpath locking and such, but I
>> guess we'll have to evaluate that.

> [ with or without contention, fast-path locking beats the extra dance that
> open_share_lock() does. ]

That is pretty cool.  It would be good to verify the same on master,
but assuming it holds up, I think it's ok to remove open_share_lock().

regards, tom lane




Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression

2017-05-11 Thread Andres Freund
On 2017-05-11 11:35:22 -0400, Peter Eisentraut wrote:
> On 5/10/17 12:24, Andres Freund wrote:
> > The issue isn't the strength, but that we currently have this weird
> > hackery around open_share_lock():
> > /*
> >  * Open the sequence and acquire AccessShareLock if needed
> >  *
> >  * If we haven't touched the sequence already in this transaction,
> >  * we need to acquire AccessShareLock.  We arrange for the lock to
> >  * be owned by the top transaction, so that we don't need to do it
> >  * more than once per xact.
> >  */
> > 
> > This'd probably need to be removed, as we'd otherwise would get very
> > weird semantics around aborted subxacts.
> 
> Can you explain in more detail what you mean by this?

Well, right now we don't do proper lock-tracking for sequences, always
assigning them to the toplevel transaction.  But that doesn't seem
proper when nextval() would conflict with ALTER SEQUENCE et al, because
then locks would continue to be held by aborted savepoints.

- Andres




Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Konstantin Knizhnik

On 05/11/2017 10:52 PM, Andres Freund wrote:

On 2017-05-11 22:48:26 +0300, Konstantin Knizhnik wrote:

On 05/11/2017 09:31 PM, Tom Lane wrote:

Bruce Momjian  writes:

Good point.  I think we need to do some measurements to see if the
parser-only stage is actually significant.  I have a hunch that
commercial databases have much heavier parsers than we do.

FWIW, gram.y does show up as significant in many of the profiles I take.
I speculate that this is not so much that it eats many CPU cycles, as that
the constant tables are so large as to incur lots of cache misses.  scan.l
is not quite as big a deal for some reason, even though it's also large.

regards, tom lane

Yes, my results show that pg_parse_query does not add that much overhead:
206k TPS for my first variant with string-literal substitution and the
modified query text used as the hash key, vs. 181k TPS for the version
that patches the raw parse tree constructed by pg_parse_query.

Those numbers and your statement seem to contradict each other?


Oops, my parse error:( I incorrectly read Tom's statement.
Actually, I was also afraid that the cost of parsing would be significant, which is
why my first attempt was to avoid parsing.
But then I found out that most of the time is spent in analysis and planning (see
the attached profile):

pg_parse_query: 4.23%
pg_analyze_and_rewrite: 8.45%
pg_plan_queries: 15.49%



- Andres



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression

2017-05-11 Thread Peter Eisentraut
On 5/10/17 12:24, Andres Freund wrote:
> Upthread I theorized whether
> that's actually still meaningful given fastpath locking and such, but I
> guess we'll have to evaluate that.

I did some testing.

I ran this script

CREATE SEQUENCE seq1;

DO LANGUAGE plpythonu $$
plan = plpy.prepare("SELECT nextval('seq1')")
for i in range(0, 1000):
plpy.execute(plan)
$$;

and timed the "DO".

I compared 9.1 (pre fast-path locking) with 9.2 (with fast-path locking).

I compared the stock releases with a patched version that replaces the
body of open_share_lock() with just

return relation_open(seq->relid, AccessShareLock);

First, a single session:

9.1 unpatched
Time: 57634.098 ms

9.1 patched
Time: 58674.655 ms

(These were within each other's variance over several runs.)

9.2 unpatched
Time: 64868.305 ms

9.2 patched
Time: 60585.317 ms

(So without contention fast-path locking beats the extra dance that
open_share_lock() does.)

Then, with four concurrent sessions (one timed, three just running):

9.1 unpatched
Time: 73123.661 ms

9.1 patched
Time: 78019.079 ms

So the "hack" avoids the slowdown from contention here.

9.2 unpatched
Time: 73308.873 ms

9.2 patched
Time: 70353.971 ms

Here, fast-path locking beats out the "hack".

If I add more concurrent sessions, everything gets much slower but the
advantage of the patched version stays about the same.  But I can't
reliably test that on this hardware.

(One wonders whether these differences remain relevant if this is run
within the context of an INSERT or COPY instead of just a tight loop.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Andres Freund
On 2017-05-11 22:48:26 +0300, Konstantin Knizhnik wrote:
> On 05/11/2017 09:31 PM, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > Good point.  I think we need to do some measurements to see if the
> > > parser-only stage is actually significant.  I have a hunch that
> > > commercial databases have much heavier parsers than we do.
> > FWIW, gram.y does show up as significant in many of the profiles I take.
> > I speculate that this is not so much that it eats many CPU cycles, as that
> > the constant tables are so large as to incur lots of cache misses.  scan.l
> > is not quite as big a deal for some reason, even though it's also large.
> > 
> > regards, tom lane
> Yes, my results show that pg_parse_query does not add that much overhead:
> 206k TPS for my first variant with string-literal substitution and the
> modified query text used as the hash key, vs. 181k TPS for the version
> that patches the raw parse tree constructed by pg_parse_query.

Those numbers and your statement seem to contradict each other?

- Andres




Re: [HACKERS] WITH clause in CREATE STATISTICS

2017-05-11 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Have you thought further about the upthread suggestion to just borrow
>> SELECT's syntax lock stock and barrel?

> Bison seems to like the productions below.  Is this what you had in
> mind?  These mostly mimic joined_table and table_ref, stripping out the
> rules that we don't need.

I'd suggest just using the from_list production and then complaining
at runtime if what you get is too complicated.  Otherwise, you have
to maintain a duplicate set of productions, and you're going to be
unable to throw anything more informative than "syntax error" when
somebody tries to exceed the implementation limits.

> Note that I had to put back the FOR keyword in there; without the FOR,
> that is "CREATE STATS any_name opt_name_list expr_list FROM" there was
> one shift/reduce conflict, which I tried to solve by splitting
> opt_name_list using two rules (one with "'(' name_list ')'" and one
> without), but that gave rise to two reduce/reduce conflicts, and I didn't
> see any way to deal with them.

That's fine.  I was going to suggest using ON after the stats type list
just because it seemed to read better.  FOR is about as good.

> (Note that I ended up realizing that stats_type_list is unnecessary
> since we can use name_list.)

Check.

regards, tom lane




Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Konstantin Knizhnik

On 05/11/2017 09:31 PM, Tom Lane wrote:

Bruce Momjian  writes:

Good point.  I think we need to do some measurements to see if the
parser-only stage is actually significant.  I have a hunch that
commercial databases have much heavier parsers than we do.

FWIW, gram.y does show up as significant in many of the profiles I take.
I speculate that this is not so much that it eats many CPU cycles, as that
the constant tables are so large as to incur lots of cache misses.  scan.l
is not quite as big a deal for some reason, even though it's also large.

regards, tom lane

Yes, my results show that pg_parse_query does not add that much overhead:
206k TPS for my first variant with string-literal substitution and the
modified query text used as the hash key, vs. 181k TPS for the version
that patches the raw parse tree constructed by pg_parse_query.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Konstantin Knizhnik

On 05/11/2017 06:12 PM, Bruce Momjian wrote:

On Wed, May 10, 2017 at 07:11:07PM +0300, Konstantin Knizhnik wrote:

I am going to continue work on this patch, and I will be glad to receive any
feedback and suggestions for its improvement.
In most cases, applications are not accessing Postgres directly, but going
through some connection pooling layer, and so they are not able to use
prepared statements.
But for simple OLTP, Postgres spends more time on building the query plan than
on execution itself. And it is possible to speed up Postgres about two times on
such a workload!
Another alternative is a true shared plan cache.  Maybe it is an even more
promising approach, but it is definitely much more invasive and harder to
implement.

Can we back up and get an overview of what you are doing and how you are
doing it?  Our TODO list suggests this order for successful patches:

Desirability -> Design -> Implement -> Test -> Review -> Commit

You kind of started at the Implementation/patch level, which makes it
hard to evaluate.

I think everyone agrees on the Desirability of the feature, but the
Design is the tricky part.  I think the design questions are:

*  What information is stored about cached plans?
*  How are the cached plans invalidated?
*  How is a query matched against a cached plan?

Looking at the options, ideally the plan would be cached at the same
query stage as the stage where the incoming query is checked against the
cache.  However, caching and checking at the same level offers no
benefit, so they are going to be different.  For example, caching a
parse tree at the time it is created, then checking at the same point if
the incoming query is the same doesn't help you because you already had
to create the parse tree to get to that point.

A more concrete example is prepared statements.  They are stored at the
end of planning and matched in the parser.  However, you can easily do
that since the incoming query specifies the name of the prepared query,
so there is no trick to matching.
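
For example (names are illustrative), the client supplies the statement name
explicitly, so matching is trivial:

PREPARE fetch_account (int) AS
    SELECT * FROM accounts WHERE aid = $1;
EXECUTE fetch_account(42);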

The desire is to cache as late as possible so you cache more work and
you have more detail about the referenced objects, which helps with
cache invalidation.  However, you also want to do cache matching as
early as possible to improve performance.

So, let's look at some options.  One interesting idea from Doug Doole
was to do it between the tokenizer and parser.  I think they are glued
together so you would need a way to run the tokenizer separately and
compare that to the tokens you stored for the cached plan.  The larger
issue is that prepared plans already are checked after parsing, and we
know they are a win, so matching any earlier than that just seems like
overkill and likely to lead to lots of problems.

So, you could do it after parsing but before parse-analysis, which is
kind of what prepared queries do.  One tricky problem is that we don't
bind the query string tokens to database objects until after parse
analysis.

Doing matching before parse-analysis is going to be tricky, which is why
there are so many comments about the approach.  Changing search_path can
certainly affect it, but creating objects in earlier-mentioned schemas
can also change how an object reference in a query is resolved.  Even
obscure things like the creation of a new operator that has higher
precedence in the query could change the plan, though I am not sure if
our prepared query system even handles that properly.
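
For instance (schema and table names are illustrative), the same query text
can start resolving to a different object without search_path changing at all:

SET search_path = app, public;
SELECT * FROM t;          -- resolves to public.t while app.t does not exist
CREATE TABLE app.t (id int);
SELECT * FROM t;          -- the same query text now resolves to app.t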

Anyway, that is my feedback.  I would like to get an overview of what
you are trying to do and the costs/benefits of each option so we can
best guide you.


Sorry for the lack of an overview.
I started with a small prototype just to investigate whether such an
optimization makes sense or not.
When I got more than a two-times performance advantage on standard pgbench, I
came to the conclusion that this optimization can be really useful, and I am
now trying to find the best way to implement it.

I started with the simplest approach, in which string literals are replaced
with parameters. It is done before parsing, and can be done very fast - you
just need to locate the data in quotes.
But this approach is not safe or universal: you will not be able to speed up
most existing queries without rewriting them.
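
Schematically (an illustrative query, not taken from the patch), this first
approach only generalizes the quoted literals in the query text before it is
parsed:

-- incoming query text
SELECT balance FROM accounts WHERE owner = 'alice';

-- generalized text used for matching, with the extracted value bound as a parameter
SELECT balance FROM accounts WHERE owner = $1;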

This is why I have provided a second implementation, which replaces literals
with parameters after raw parsing.
Certainly it is slower than the first approach, but it still provides a
significant advantage in performance: more than two times at pgbench.
Then I tried to run the regression tests and found several situations where
type analysis is not performed correctly when literals are replaced with
parameters.

So my third attempt is to replace constant nodes with parameters in the
already-analyzed tree.

Now answering your questions:

*  What information is stored about cached plans?

The key used to locate a cached plan is the raw parse tree; the value is the saved CachedPlanSource.

*  How are the cached plans invalidated?

In the same way as plans for explicitly prepared statements.

*  How is a 

Re: [HACKERS] WITH clause in CREATE STATISTICS

2017-05-11 Thread Alvaro Herrera
Tom Lane wrote:

> Have you thought further about the upthread suggestion to just borrow
> SELECT's syntax lock stock and barrel?  That is, it'd look something
> like
> 
> CREATE STATISTICS name [(list of stats types)] expression-list FROM ...
> [ WHERE ... ] [ WITH (options) ]
> 
> This would certainly support any FROM stuff we might want later.
> Yeah, you'd need to reject any features you weren't supporting at
> execution, but doing that would allow issuing a friendlier message than
> "syntax error" anyway.
> 
> If you don't have the cycles for that, I'd be willing to look into it.

Bison seems to like the productions below.  Is this what you had in
mind?  These mostly mimic joined_table and table_ref, stripping out the
rules that we don't need.

Note that I had to put back the FOR keyword in there; without the FOR,
that is with "CREATE STATS any_name opt_name_list expr_list FROM", there was
one shift/reduce conflict, which I tried to solve by splitting
opt_name_list using two rules (one with "'(' name_list ')'" and one
without), but that gave rise to two reduce/reduce conflicts, and I didn't
see any way to deal with them.

(Note that I ended up realizing that stats_type_list is unnecessary
since we can use name_list.)


CreateStatsStmt:
CREATE opt_if_not_exists STATISTICS any_name
opt_name_list FOR expr_list FROM stats_joined_table
{
CreateStatsStmt *n = 
makeNode(CreateStatsStmt);
n->defnames = $4;
n->stat_types = $5;
n->if_not_exists = $2;

n->relation = $9;
n->keys = $7;
$$ = (Node *)n;
}
;

stats_joined_table:
  '(' stats_joined_table ')'
{
}
| stats_table_ref CROSS JOIN stats_table_ref
{
}
| stats_table_ref join_type JOIN stats_table_ref 
join_qual
{
}
| stats_table_ref JOIN stats_table_ref join_qual
{
}
| stats_table_ref NATURAL join_type JOIN stats_table_ref
{
}
| stats_table_ref NATURAL JOIN stats_table_ref
{
}
;

stats_table_ref:
relation_expr opt_alias_clause
{
}
| stats_joined_table
{
}
| '(' stats_joined_table ')' alias_clause
{
}
;
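
With these productions, a statement along the following lines should be
accepted syntactically (table and column names are invented; execution would
still have to reject join shapes we don't support yet):

CREATE STATISTICS order_stats (ndistinct, dependencies)
    FOR o.customer_id, c.region
    FROM orders o JOIN customers c ON (o.customer_id = c.id);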



-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Andres Freund


On May 11, 2017 11:31:02 AM PDT, Tom Lane  wrote:
>Bruce Momjian  writes:
>> Good point.  I think we need to do some measurements to see if the
>> parser-only stage is actually significant.  I have a hunch that
>> commercial databases have much heavier parsers than we do.
>
>FWIW, gram.y does show up as significant in many of the profiles I
>take.
>I speculate that this is not so much that it eats many CPU cycles, as
>that
>the constant tables are so large as to incur lots of cache misses. 
>scan.l
>is not quite as big a deal for some reason, even though it's also
>large.

And that there are a lot of unpredictable branches, leading to a lot of pipeline 
stalls.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Tom Lane
Bruce Momjian  writes:
> Good point.  I think we need to do some measurements to see if the
> parser-only stage is actually significant.  I have a hunch that
> commercial databases have much heavier parsers than we do.

FWIW, gram.y does show up as significant in many of the profiles I take.
I speculate that this is not so much that it eats many CPU cycles, as that
the constant tables are so large as to incur lots of cache misses.  scan.l
is not quite as big a deal for some reason, even though it's also large.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical decoding truncate

2017-05-11 Thread Euler Taveira
2017-05-11 4:23 GMT-03:00 Friedrich, Steffen <
steffen.friedr...@dieboldnixdorf.com>:

> I am writing a logical decoding output plugin decoding WAL to SQL which is
> finally applied to target database.
>
>
>
> Is it possible to decode a TRUNCATE statement and the tables involved?
>
>
Yes, use event triggers. You can decode whatever DDL command you want.


> Assuming the SQL statement "TRUNCATE x, y;",  I am interested in decoding
> the operation TRUNCATE and the corresponding tables x and y so that I can
> reconstruct the SQL statement/transaction.
>
> Is that possible?
>
> If so, can you please provide an example or point me into the right
> direction?
>
>
>
Take a look at BDR code (bdr_queue_ddl_commands and
bdr_queue_dropped_objects) [1]. It implements DDL replication too.


[1] https://github.com/2ndQuadrant/bdr
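
If it turns out that the tables hit by a TRUNCATE are not visible through the
event trigger machinery directly, another sketch is an ordinary statement-level
TRUNCATE trigger on each replicated table that writes the affected relation
into a queue table, which your output plugin then decodes as a normal insert
(all names below are invented):

CREATE TABLE replication_queue (
    queued_at timestamptz DEFAULT now(),
    command   text,
    relation  text
);

CREATE FUNCTION queue_truncate() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- TG_TABLE_SCHEMA / TG_TABLE_NAME identify the table being truncated
    INSERT INTO replication_queue (command, relation)
    VALUES ('TRUNCATE',
            quote_ident(TG_TABLE_SCHEMA) || '.' || quote_ident(TG_TABLE_NAME));
    RETURN NULL;
END;
$$;

CREATE TRIGGER x_truncate_queue
    BEFORE TRUNCATE ON x
    FOR EACH STATEMENT EXECUTE PROCEDURE queue_truncate();
-- a similar trigger is needed on y and on every other replicated table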


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Bruce Momjian
On Thu, May 11, 2017 at 05:39:58PM +, Douglas Doole wrote:
> One interesting idea from Doug Doole was to do it between the tokenizer 
> and
> parser.  I think they are glued together so you would need a way to run 
> the
> tokenizer separately and compare that to the tokens you stored for the
> cached plan.
> 
> 
> When I did this, we had the same problem that the tokenizer and parser were
> tightly coupled. Fortunately, I was able to do as you suggest and run the
> tokenizer separately to do my analysis. 
> 
> So my model was to do statement generalization before entering the compiler at
> all. I would tokenize the statement to find the literals and generate a new
> statement string with placeholders. The new string would then be passed to the
> compiler which would then tokenize and parse the reworked statement.
> 
> This means we incurred the cost of tokenizing twice, but the tokenizer was
> lightweight enough that it wasn't a problem. In exchange I was able to do
> statement generalization without touching the compiler - the compiler saw the
> generalized statement text as any other statement and handled it in the exact
> same way. (There was just a bit of new code around variable binding.)

Good point.  I think we need to do some measurements to see if the
parser-only stage is actually significant.  I have a hunch that
commercial databases have much heavier parsers than we do.

This split would also not work if the scanner feeds changes back into
the parser.  I know C does that for typedefs but I don't think we do.

Ideally I would like to see percentage-of-execution numbers for typical
queries for scan, parse, parse-analysis, plan, and execute to see where
the wins are.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v2] Progress command to monitor progression of long running SQL queries

2017-05-11 Thread David Fetter
On Thu, May 11, 2017 at 05:24:16PM +0200, Remi Colinet wrote:
> 2017-05-10 21:52 GMT+02:00 David Fetter :
> 
> > On Wed, May 10, 2017 at 06:40:31PM +0200, Remi Colinet wrote:
> > > Hello,
> > >
> > > This is version 2 of the new command name PROGRESS which I wrote in
> > > order to monitor long running SQL queries in a Postgres backend
> > > process.
> >
> > Perhaps I missed something important in the discussion, but was there
> > a good reason that this isn't a function callable from SQL, i.e. not
> > restricted to the psql client?

Please don't top post. http://www.caliburn.nl/topposting.html

> That's good point.
> 
> I will probably convert the new command to a SQL function.

Great!

> I was fumbling between a table or a SQL function. Oracle uses
> v$session_longops to track progression of long running SQL queries
> which have run for more than 6 seconds.  But a function is much
> better to provide parameters such as the backend pid and a format
> specification for the output.

Once it's a function, it can very easily be called in a system (or
user-defined) view.
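
For instance, assuming a hypothetical function signature such as
pg_progress(pid int, format text), a wrapping view could look like this
(both the function and the view name are made up):

CREATE VIEW progress_of_active_queries AS
    SELECT a.pid, a.query, p.*
    FROM pg_stat_activity a,
         LATERAL pg_progress(a.pid, 'text') AS p
    WHERE a.state = 'active';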

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Douglas Doole
>
> One interesting idea from Doug Doole was to do it between the tokenizer
> and parser.  I think they are glued together so you would need a way to run
> the tokenizer separately and compare that to the tokens you stored for the
> cached plan.
>

When I did this, we had the same problem that the tokenizer and parser were
tightly coupled. Fortunately, I was able to do as you suggest and run the
tokenizer separately to do my analysis.

So my model was to do statement generalization before entering the compiler
at all. I would tokenize the statement to find the literals and generate a
new statement string with placeholders. The new string would then be passed
to the compiler which would then tokenize and parse the reworked statement.

This means we incurred the cost of tokenizing twice, but the tokenizer was
lightweight enough that it wasn't a problem. In exchange I was able to do
statement generalization without touching the compiler - the compiler saw
the generalized statement text as any other statement and handled it in the
exact same way. (There was just a bit of new code around variable binding.)


Re: [HACKERS] Time based lag tracking for logical replication

2017-05-11 Thread Simon Riggs
On 11 May 2017 at 18:13, Andres Freund  wrote:

>>New patch, v3.
>>
>>Applying in 90 minutes, barring objections.
>
> Could you please wait till tomorrow?  I've bigger pending fixes for related 
> code pending/being tested that I plan to push today.  I'd also like to take a 
> look before...

Sure.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITH clause in CREATE STATISTICS

2017-05-11 Thread Alvaro Herrera
Tom Lane wrote:

> Have you thought further about the upthread suggestion to just borrow
> SELECT's syntax lock stock and barrel?  That is, it'd look something
> like
> 
> CREATE STATISTICS name [(list of stats types)] expression-list FROM ...
> [ WHERE ... ] [ WITH (options) ]
> 
> This would certainly support any FROM stuff we might want later.
> Yeah, you'd need to reject any features you weren't supporting at
> execution, but doing that would allow issuing a friendlier message than
> "syntax error" anyway.

Hmm, no, I hadn't looked at this angle.

> If you don't have the cycles for that, I'd be willing to look into it.

I'll give it a try today, and pass the baton if I'm unable to.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Safer and faster get_attstatsslot()

2017-05-11 Thread Tom Lane
Simon Riggs  writes:
> On 11 May 2017 at 17:41, Tom Lane  wrote:
>> ...because code that worked fine for Peter and me failed
>> erratically in the buildfarm.

> I think it's always a little bit too exciting for me also.

> I suggest we have a commit tree and a main tree, with automatic
> copying from commit -> main either
> 1. 24 hours after commit
> 2. or earlier if we have a full set of green results from people
> running the full suite on the commit tree

Meh.  We don't really need that in normal development, and for security
patches there's still a problem: we don't want to wait around 24 hours
after the code is public.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITH clause in CREATE STATISTICS

2017-05-11 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Hmm ... I'm not sure that I buy that particular argument.  If you're
>> concerned that the grammar could not handle "FROM x JOIN y USING (z)",
>> wouldn't it also have a problem with "FROM x JOIN y ON (z)"?

> Tomas spent some time trying to shoehorn the whole join syntax into the
> FROM clause, but stopped once he realized that the joined_table
> production uses table_ref, which allows things like TABLESAMPLE, SRFs,
> LATERAL, etc which presumably we don't want to accept in CREATE STATS.
> I didn't look into it any further.  But because of the other
> considerations, I did end up changing the ON to FOR.

Have you thought further about the upthread suggestion to just borrow
SELECT's syntax lock stock and barrel?  That is, it'd look something
like

CREATE STATISTICS name [(list of stats types)] expression-list FROM ...
[ WHERE ... ] [ WITH (options) ]

This would certainly support any FROM stuff we might want later.
Yeah, you'd need to reject any features you weren't supporting at
execution, but doing that would allow issuing a friendlier message than
"syntax error" anyway.

If you don't have the cycles for that, I'd be willing to look into it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time based lag tracking for logical replication

2017-05-11 Thread Andres Freund


On May 11, 2017 8:08:11 AM PDT, Simon Riggs  wrote:
>On 11 May 2017 at 14:12, Petr Jelinek 
>wrote:
>
>>> Attached patch is Petr's patch, slightly rebased with added pacing
>>> delay, similar to that used by HSFeedback.
>>>
>>
>> This looks reasonable. I would perhaps change:
>>> +   /*
>>> +* Track lag no more than once per
>WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS
>>> +*/
>>
>> to something like this for extra clarity:
>>> +   /*
>>> +* Track lag no more than once per
>WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS
>>> +* to avoid flooding the lag tracker on busy servers.
>>> +*/
>
>New patch, v3.
>
>Applying in 90 minutes, barring objections.

Could you please wait till tomorrow?  I've got bigger fixes for related 
code pending/being tested that I plan to push today.  I'd also like to take a 
look before...

Thanks,

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WITH clause in CREATE STATISTICS

2017-05-11 Thread Alvaro Herrera
Tom Lane wrote:

> Hmm ... I'm not sure that I buy that particular argument.  If you're
> concerned that the grammar could not handle "FROM x JOIN y USING (z)",
> wouldn't it also have a problem with "FROM x JOIN y ON (z)"?
> 
> It might work anyway, since the grammar should know whether ON or USING
> is needed to complete the JOIN clause.  But I think you'd better check
> whether the complete join syntax works there, even if we're not going
> to support it now.

Tomas spent some time trying to shoehorn the whole join syntax into the
FROM clause, but stopped once he realized that the joined_table
production uses table_ref, which allows things like TABLESAMPLE, SRFs,
LATERAL, etc which presumably we don't want to accept in CREATE STATS.
I didn't look into it any further.  But because of the other
considerations, I did end up changing the ON to FOR.

So the attached is the final version which I intend to push shortly.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
*** a/doc/src/sgml/perform.sgml
--- b/doc/src/sgml/perform.sgml
***
*** 1132,1138  WHERE tablename = 'road';
   To inspect functional dependencies on a statistics
   stts, you may do this:
  
! CREATE STATISTICS stts WITH (dependencies)
 ON (zip, city) FROM zipcodes;
  ANALYZE zipcodes;
  SELECT stxname, stxkeys, stxdependencies
--- 1132,1138 
   To inspect functional dependencies on a statistics
   stts, you may do this:
  
! CREATE STATISTICS stts (dependencies)
 ON (zip, city) FROM zipcodes;
  ANALYZE zipcodes;
  SELECT stxname, stxkeys, stxdependencies
***
*** 1219,1225  EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 
AND b = 10;
   Continuing the above example, the n-distinct coefficients in a ZIP
   code table may look like the following:
  
! CREATE STATISTICS stts2 WITH (ndistinct)
 ON (zip, state, city) FROM zipcodes;
  ANALYZE zipcodes;
  SELECT stxkeys AS k, stxndistinct AS nd
--- 1219,1225 
   Continuing the above example, the n-distinct coefficients in a ZIP
   code table may look like the following:
  
! CREATE STATISTICS stts2 (ndistinct)
 ON (zip, state, city) FROM zipcodes;
  ANALYZE zipcodes;
  SELECT stxkeys AS k, stxndistinct AS nd
*** a/doc/src/sgml/planstats.sgml
--- b/doc/src/sgml/planstats.sgml
***
*** 526,532  EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND 
b = 1;
  multivariate statistics on the two columns:
  
  
! CREATE STATISTICS stts WITH (dependencies) ON (a, b) FROM t;
  ANALYZE t;
  EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1;
QUERY PLAN  
 
--- 526,532 
  multivariate statistics on the two columns:
  
  
! CREATE STATISTICS stts (dependencies) ON (a, b) FROM t;
  ANALYZE t;
  EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1;
QUERY PLAN  
 
***
*** 569,575  EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t GROUP BY 
a, b;
  calculation, the estimate is much improved:
  
  DROP STATISTICS stts;
! CREATE STATISTICS stts WITH (dependencies, ndistinct) ON (a, b) FROM t;
  ANALYZE t;
  EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t GROUP BY a, b;
 QUERY PLAN 
   
--- 569,575 
  calculation, the estimate is much improved:
  
  DROP STATISTICS stts;
! CREATE STATISTICS stts (dependencies, ndistinct) ON (a, b) FROM t;
  ANALYZE t;
  EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t GROUP BY a, b;
 QUERY PLAN 
   
*** a/doc/src/sgml/ref/create_statistics.sgml
--- b/doc/src/sgml/ref/create_statistics.sgml
***
*** 22,29  PostgreSQL documentation
   
  
  CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
! WITH ( option [= 
value] [, ... ] )
! ON ( column_name, 
column_name [, ...])
  FROM table_name
  
  
--- 22,29 
   
  
  CREATE STATISTICS [ IF NOT EXISTS ] statistics_name
! [ ( statistic_type [, ... ] 
) ]
! FOR ( column_name, 
column_name [, ...])
  FROM table_name
  
  
***
*** 75,80  CREATE STATISTICS [ IF NOT EXISTS ] statistics_na
--- 75,93 
 
  
 
+ statistic_type
+ 
+  
+   A statistic type to be enabled for this statistics.  Currently
+   supported types are ndistinct, which enables
+   n-distinct coefficient tracking,
+   and dependencies, which enables functional
+   dependencies.
+  
+ 
+
+ 
+
  column_name
  
   
***
*** 94,135  CREATE STATISTICS [ IF NOT EXISTS ] statistics_na
 
  

- 
-   
-Parameters
- 
-  
-   

Re: [HACKERS] Safer and faster get_attstatsslot()

2017-05-11 Thread Simon Riggs
On 11 May 2017 at 17:41, Tom Lane  wrote:
> ...because code that worked fine for Peter and me failed
> erratically in the buildfarm.

I think it's always a little bit too exciting for me also.

I suggest we have a commit tree and a main tree, with automatic
copying from commit -> main either
1. 24 hours after commit
2. or earlier if we have a full set of green results from people
running the full suite on the commit tree

That way we don't have to polute the main tree with all this jumping around.

Many alternate ideas possible.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] export import bytea from psql

2017-05-11 Thread Pavel Stehule
2017-05-11 17:16 GMT+02:00 Daniel Verite :

> Pavel Stehule wrote:
>
> > It is similar to my first or second proposal - rejected by Tom :(
>
> Doesn't it differ? ISTM that in the patches/discussion related to:
> https://commitfest.postgresql.org/11/787/
> it was proposed to change \set in one way or another,
> and also in the point #1 of this present thread:
>
> > > 1. using psql variables - we just need to write commands \setfrom
> > > \setfrombf - this should be very simple implementation. The value can
> be
> > > used more times. On second hand - the loaded variable can hold lot of
> > > memory (and it can be invisible for user).
>
>
> My comment is: let's not change \set or even create a variant
> of it just for importing verbatim file contents, in text or binary,
> because interpolating psql variables differently in the query itself
> is all we need.
>

My memory is wrong - Tom didn't reject this idea


http://www.postgresql-archive.org/proposal-psql-setfileref-td5918727i20.html

There was 100% agreement on format and then discussion finished without any
result.

Regards

Pavel


>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>


Re: [HACKERS] eval_const_expresisions and ScalarArrayOpExpr

2017-05-11 Thread Tom Lane
Heikki Linnakangas  writes:
> On 05/11/2017 06:21 PM, Tom Lane wrote:
>>> On a side-note, I find it a bit awkward that ScalarArrayOpExpr uses a
>>> 2-element List to hold the scalar and array arguments. Wouldn't it be
>>> much more straightforward to have explicit "Expr *scalararg" and "Expr
>>> *arrayarg" fields?

>> I think it's modeled on OpExpr, which also uses a list though you could
>> argue for separate leftarg and rightarg fields instead.

> Yeah, I think that would be better for OpExpr, too. (For an unary 
> operator, rightarg would be unused.)

I should think leftarg is the one that would go unused, for a normal
prefix unary operator.

But it seems like changing this would be way more invasive than it's
really worth.  We'd save a couple of List cells per operator, which
is certainly something, but I'm afraid you'd be touching a heck of
a lot of code.  And there are a nontrivial number of places that
share argument-processing with FuncExprs, which such a change would
break.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Safer and faster get_attstatsslot()

2017-05-11 Thread Tom Lane
Monday's round of security patches was a lot more exciting than I would
have liked, because code that worked fine for Peter and me failed
erratically in the buildfarm.  What eventually emerged was that I'd added
some missing free_attstatsslot() calls in rangetypes_selfuncs.c, and
naively copied the first argument (atttype) from the matching
get_attstatsslot() calls.  One of those atttype values was in fact
wrong for the slot in question; this had been missed for years because
get_attstatsslot() doesn't actually do anything with that argument.
I think that at one point we had, or at least in the original conception
intended to have, an Assert that the atttype matched the actual stats
array element type found in the pg_statistic row; but we had to remove
it because in some cases the type in pg_statistic is only
binary-compatible with the datatype the applied operator is expecting.

So the existing API for get_attstatsslot()/free_attstatsslot() is just
seriously bug-prone.  It would be better if the caller did not have
to supply any type information; indeed really what we'd want is for
get_attstatsslot() to pass back the actual type it found in pg_statistic.

I also realized as I looked at the code that it's exceedingly inefficient
if the array element type is pass-by-reference --- then it'll incur a
separate palloc, copy, and pfree for each element.  We'd be a lot better
off to copy the stats array as a whole, especially since that would come
for free in the probably-common case that the array has to be detoasted.
This code was written with very small stats targets in mind, like about
10, and it just doesn't look very good when you're imagining 1000 or more
entries in the stats array.

So attached is a proposed redesign that makes the API for
get_attstatsslot()/free_attstatsslot() simpler and hopefully more
foolproof.  I've not made any particular attempt to performance-test
it, but it really ought to be a significant win for pass-by-ref element
types.  It will add an array-copying step that wasn't there before when
the element type is pass-by-value and the array isn't toasted, but that
seems like an acceptable price.

BTW, this patch makes the skewColType and skewColTypmod fields of Hash
plan nodes unnecessary, as the only reason for them was to satisfy
get_attstatsslot().  I didn't remove them here but it would make sense
to do so.

Comments?  Is this something that'd be OK to push now, or had I better
sit on it till v11?  Being freshly burned, I kinda want to fix it now,
but I recognize that my judgment may be colored by that.

regards, tom lane

diff --git a/contrib/intarray/_int_selfuncs.c b/contrib/intarray/_int_selfuncs.c
index 9b4a22f..3d92025 100644
*** a/contrib/intarray/_int_selfuncs.c
--- b/contrib/intarray/_int_selfuncs.c
*** _int_matchsel(PG_FUNCTION_ARGS)
*** 136,146 
  	int			nmcelems = 0;
  	float4		minfreq = 0.0;
  	float4		nullfrac = 0.0;
! 	Form_pg_statistic stats;
! 	Datum	   *values = NULL;
! 	int			nvalues = 0;
! 	float4	   *numbers = NULL;
! 	int			nnumbers = 0;
  
  	/*
  	 * If expression is not "variable @@ something" or "something @@ variable"
--- 136,142 
  	int			nmcelems = 0;
  	float4		minfreq = 0.0;
  	float4		nullfrac = 0.0;
! 	AttStatsSlot sslot;
  
  	/*
  	 * If expression is not "variable @@ something" or "something @@ variable"
*** _int_matchsel(PG_FUNCTION_ARGS)
*** 193,198 
--- 189,196 
  	 */
  	if (HeapTupleIsValid(vardata.statsTuple))
  	{
+ 		Form_pg_statistic stats;
+ 
  		stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
  		nullfrac = stats->stanullfrac;
  
*** _int_matchsel(PG_FUNCTION_ARGS)
*** 200,228 
  		 * For an int4 array, the default array type analyze function will
  		 * collect a Most Common Elements list, which is an array of int4s.
  		 */
! 		if (get_attstatsslot(vardata.statsTuple,
! 			 INT4OID, -1,
  			 STATISTIC_KIND_MCELEM, InvalidOid,
! 			 NULL,
! 			 &values, &nvalues,
! 			 &numbers, &nnumbers))
  		{
  			/*
  			 * There should be three more Numbers than Values, because the
  			 * last three (for intarray) cells are taken for minimal, maximal
  			 * and nulls frequency. Punt if not.
  			 */
! 			if (nnumbers == nvalues + 3)
  			{
  /* Grab the lowest frequency. */
! minfreq = numbers[nnumbers - (nnumbers - nvalues)];
  
! mcelems = values;
! mcefreqs = numbers;
! nmcelems = nvalues;
  			}
  		}
  	}
  
  	/* Process the logical expression in the query, using the stats */
  	selec = int_query_opr_selec(GETQUERY(query) + query->size - 1,
--- 198,227 
  		 * For an int4 array, the default array type analyze function will
  		 * collect a Most Common Elements list, which is an array of int4s.
  		 */
! 		if (get_attstatsslot(&sslot, vardata.statsTuple,
  			 STATISTIC_KIND_MCELEM, InvalidOid,
! 			 ATTSTATSSLOT_VALUES | ATTSTATSSLOT_NUMBERS))
  		{
+ 			Assert(sslot.valuetype == INT4OID);
+ 
  			/*
  			 * There should be three 

Re: [HACKERS] eval_const_expresisions and ScalarArrayOpExpr

2017-05-11 Thread Heikki Linnakangas

On 05/11/2017 06:21 PM, Tom Lane wrote:

Heikki Linnakangas  writes:

Eval_const_expressions() doesn't know about ScalarArrayOpExpr.
...
That seems like an oversight. I guess that scenario doesn't happen very
often in practice, but there's no reason not to do it when it does.
Patch attached.


Yeah, I think it's a lack-of-round-tuits situation.  Your patch reminds
me of a more ambitious attempt I made awhile back to reduce the amount
of duplicative boilerplate in eval_const_expressions.  I think I wrote
it during last year's beta period, stashed it because I didn't feel like
arguing for applying it right then, and then forgot about it.


Hmph, now we're almost in beta period again. :-(.


Blowing the dust off, it's attached. It fixes ArrayRef and RowExpr as
well as ScalarArrayOpExpr, with a net growth of only 20 lines
(largely comments).


Nice!


On a side-note, I find it a bit awkward that ScalarArrayOpExpr uses a
2-element List to hold the scalar and array arguments. Wouldn't it be
much more straightforward to have explicit "Expr *scalararg" and "Expr
*arrayarg" fields?


I think it's modeled on OpExpr, which also uses a list though you could
argue for separate leftarg and rightarg fields instead.


Yeah, I think that would be better for OpExpr, too. (For a unary 
operator, rightarg would be unused.)


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v2] Progress command to monitor progression of long running SQL queries

2017-05-11 Thread Remi Colinet
Thx for your comment.

Based on other comments, I will probably convert the command into a SQL
function.
This allows passing arguments such as the ones which can be used in the
current command (VERBOSE, FORMAT).

This will avoid keyword collisions.

Regards
Remi

2017-05-10 22:04 GMT+02:00 Pavel Stehule :

> Hi
>
> 2017-05-10 18:40 GMT+02:00 Remi Colinet :
>
>> Hello,
>>
>> This is version 2 of the new command name PROGRESS which I wrote in order
>> to monitor long running SQL queries in a Postgres backend process.
>>
>
> This patch introduce lot of reserved keyword. Why? It can breaks lot of
> applications. Cannot you enhance EXPLAIN command?
>
> Regards
>
> Pavel
>
>


Re: [HACKERS] [PATCH v2] Progress command to monitor progression of long running SQL queries

2017-05-11 Thread Remi Colinet
2017-05-11 4:05 GMT+02:00 Michael Paquier :

> On Thu, May 11, 2017 at 1:40 AM, Remi Colinet 
> wrote:
> > This is version 2 of the new command name PROGRESS which I wrote in
> order to
> > monitor long running SQL queries in a Postgres backend process.
>
> It would be a good idea to add this patch to the next commit fest if
> you are expecting some feedback:
> https://commitfest.postgresql.org/14/
> But please don't expect immediate feedback, the primary focus these
> days is to stabilize the upcoming release.
>

Yes, I understand. I will submit the patch to the commit fest once I
have received some feedback.
I already received some interesting suggestions such as using a SQL
function instead of a whole new command.

The purpose for the moment is to gather feedback and advice about such a
feature.

>
> > New command justification
> > 
> >
> > [root@rco pg]# git diff --stat master..
> > [...]
> >  58 files changed, 5972 insertions(+), 2619 deletions(-)
>
> For your first patch, you may want something less... Challenging.
> Please note as well that it would be nice if you review other patches
> to get more familiar with the community process, it is expected that
> for each patch submitted, an equivalent amount of review is done.
> That's hard to measure but for a patch as large as that you will need
> to pick up at least one patch equivalent in difficulty.
>

Acked


> Regarding the patch, this is way to close to the progress facility
> already in place. So why don't you extend it for the executor? We can
> likely come up with something that can help, though I see the part
> where the plan run by a backend is shared among all processes as a
> major bottleneck in performance. You have at least three concepts
> different here:
> - Store plans run in shared memory to allow access to any other processes.
> - Allow a process to look at the plan run by another one with a SQL
> interface.
> - Track the progress of a running plan, via pg_stat_activity.
> All in all, I think that a new command is not adapted, and that you
> had better split each concept before implementing something
> over-engineered like the patch attached.
>

I initially had a look at the progress facility but this seemed to me very
far from the current goal.
Btw, I will reconsider it. Things may have changed.

The plan and its planstate are not permanently shared with other backends.
This would be useless and very costly.

As long as the PROGRESS command is not used, the patch has a very low
impact on the executor.
A few counters and flags are added/updated by the patch in the executor.
This is all that is needed to track the progress of execution.

For instance in src/backend/executor/execProcnode.c, I've added:

ExecInitNode()
+ result->plan_rows = 0;

ExecProcNode()
+ node->plan_rows++;

ExecEndNode()
+ node->plan_rows = 0;

This is enough to compute the number of rows fetched at this
step of the plan.
A few counters have been added to track sorts, scans, hashes, tuple stores.

The disk space used by nodes is also traced. This shows the disk resources
needed to complete a SQL query.
This was not initially a purpose of the patch, but I find this very handy.

Some debugging code will also be removed, making the change in the executor
very small.
The code change in the executor is less than 50 lines and it is about 150
lines in the tuple sort/store code mainly to add functions to fetch sort
and store status.

The execution tree is dumped in shared memory only when requested by a
monitoring backend sending a signal to the monitored backend.
The monitoring backend first sends a signal which is noted by the backend
running the long SQL query.
Once the long SQL query can deal with the request (interrupts enabled), it
dumps its execution tree in shared memory and then continues its
execution.

So the dump of the execution tree is done only when it is requested.
This does not seem to incur any performance hit. Or I missed something.

Once the concept is mature, I would split the patch into small consistent
parts to allow a smooth and clean merge.

Regards
Remi


> --
> Michael
>


Re: [HACKERS] [POC] hash partitioning

2017-05-11 Thread Dilip Kumar
On Wed, May 3, 2017 at 6:39 PM, amul sul  wrote:
> On Thu, Apr 27, 2017 at 1:42 AM, Robert Haas  wrote:
>
>>I spent some time today looking at these patches.  It seems like there
>>is some more work still needed here to produce something committable
>>regardless of which way we go, but I am inclined to think that Amul's
>>patch is a better basis for work going forward than Nagata-san's
>>patch. Here are some general comments on the two patches:
>
> Thanks for your time.
>
> [...]
>
>> - Neither patch contains any documentation updates, which is bad.
>
> Fixed in the attached version.

I have done an initial review of the patch and I have some comments.  I
will continue the review
and testing and report the results soon.

-
Patch need to be rebased



if (key->strategy == PARTITION_STRATEGY_RANGE)
{
/* Disallow nulls in the range partition key of the tuple */
for (i = 0; i < key->partnatts; i++)
if (isnull[i])
ereport(ERROR,
(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
errmsg("range partition key of row contains null")));
}

We need to add PARTITION_STRATEGY_HASH as well; we don't support NULL
for hash either, right?


RangeDatumContent **content;/* what's contained in each range bound datum?
  * (see the above enum); NULL for list
  * partitioned tables */

This will be NULL for hash as well, so we need to change the comments.
-

  bool has_null; /* Is there a null-accepting partition? false
  * for range partitioned tables */
  int null_index; /* Index of the null-accepting partition; -1

Comments needs to be changed for these two members as well


+/* One bound of a hash partition */
+typedef struct PartitionHashBound
+{
+ int modulus;
+ int remainder;
+ int index;
+} PartitionHashBound;

It would be good to add some comments to explain the structure members.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] renaming "transaction log"

2017-05-11 Thread Tom Lane
Peter Eisentraut  writes:
> Most documentation and error messages still uses the term "transaction
> log" to refer to the write-ahead log.  Here is a patch to rename that,
> which I think should be done, to match the xlog -> wal renaming in APIs.

This will need to be rebased over d10c626de, which made a few of the
same/similar changes but did not try to hit stuff that wasn't adjacent
to what it was changing anyway.  I'm +1 for the idea though.  I think
we should also try to standardize on the terminology "WAL location"
for LSNs, not variants such as "WAL position" or "log position".

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Concurrent ALTER SEQUENCE RESTART Regression

2017-05-11 Thread Peter Eisentraut
On 5/10/17 12:24, Andres Freund wrote:
> The issue isn't the strength, but that we currently have this weird
> hackery around open_share_lock():
> /*
>  * Open the sequence and acquire AccessShareLock if needed
>  *
>  * If we haven't touched the sequence already in this transaction,
>  * we need to acquire AccessShareLock.  We arrange for the lock to
>  * be owned by the top transaction, so that we don't need to do it
>  * more than once per xact.
>  */
> 
> This'd probably need to be removed, as we'd otherwise would get very
> weird semantics around aborted subxacts.

Can you explain in more detail what you mean by this?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v2] Progress command to monitor progression of long running SQL queries

2017-05-11 Thread Remi Colinet
That's a good point.

I will probably convert the new command to a SQL function.

I was fumbling between a table and a SQL function. Oracle uses
v$session_longops to track the progress of long-running SQL queries which
have run for more than 6 seconds.
But a function is much better for providing parameters such as the backend pid
and a format specification for the output.

Regards
Remi

2017-05-10 21:52 GMT+02:00 David Fetter :

> On Wed, May 10, 2017 at 06:40:31PM +0200, Remi Colinet wrote:
> > Hello,
> >
> > This is version 2 of the new command name PROGRESS which I wrote in
> > order to monitor long running SQL queries in a Postgres backend
> > process.
>
> Perhaps I missed something important in the discussion, but was there
> a good reason that this isn't a function callable from SQL, i.e. not
> restricted to the psql client?
>
> Best,
> David.
> --
> David Fetter  http://fetter.org/
> Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
> Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>


Re: [HACKERS] eval_const_expresisions and ScalarArrayOpExpr

2017-05-11 Thread Tom Lane
Heikki Linnakangas  writes:
> Eval_const_expressions() doesn't know about ScalarArrayOpExpr.
> ...
> That seems like an oversight. I guess that scenario doesn't happen very 
> often in practice, but there's no reason not to do it when it does. 
> Patch attached.

Yeah, I think it's a lack-of-round-tuits situation.  Your patch reminds
me of a more ambitious attempt I made awhile back to reduce the amount
of duplicative boilerplate in eval_const_expressions.  I think I wrote
it during last year's beta period, stashed it because I didn't feel like
arguing for applying it right then, and then forgot about it.  Blowing
the dust off, it's attached.  It fixes ArrayRef and RowExpr as well as
ScalarArrayOpExpr, with a net growth of only 20 lines (largely comments).

> On a side-note, I find it a bit awkward that ScalarArrayOpExpr uses a 
> 2-element List to hold the scalar and array arguments. Wouldn't it be 
> much more straightforward to have explicit "Expr *scalararg" and "Expr 
> *arrayarg" fields?

I think it's modeled on OpExpr, which also uses a list though you could
argue for separate leftarg and rightarg fields instead.

regards, tom lane

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index a1dafc8..95489f4 100644
*** a/src/backend/optimizer/util/clauses.c
--- b/src/backend/optimizer/util/clauses.c
*** static List *find_nonnullable_vars_walke
*** 115,120 
--- 115,123 
  static bool is_strict_saop(ScalarArrayOpExpr *expr, bool falseOK);
  static Node *eval_const_expressions_mutator(Node *node,
  			   eval_const_expressions_context *context);
+ static bool contain_non_const_walker(Node *node, void *context);
+ static bool ece_function_is_safe(Oid funcid,
+ 	 eval_const_expressions_context *context);
  static List *simplify_or_arguments(List *args,
  	  eval_const_expressions_context *context,
  	  bool *haveNull, bool *forceTrue);
*** estimate_expression_value(PlannerInfo *r
*** 2443,2448 
--- 2446,2482 
  	return eval_const_expressions_mutator(node, );
  }
  
+ /*
+  * The generic case in eval_const_expressions_mutator is to recurse using
+  * expression_tree_mutator, which will copy the given node unchanged but
+  * const-simplify its arguments (if any) as far as possible.  If the node
+  * itself does immutable processing, and each of its arguments were reduced
+  * to a Const, we can then reduce it to a Const using evaluate_expr.  (Some
+  * node types need more complicated logic; for example, a CASE expression
+  * might be reducible to a constant even if not all its subtrees are.)
+  */
+ #define ece_generic_processing(node) \
+ 	expression_tree_mutator((Node *) (node), eval_const_expressions_mutator, \
+ 			(void *) context)
+ 
+ /*
+  * Check whether all arguments of the given node were reduced to Consts.
+  * By going directly to expression_tree_walker, contain_non_const_walker
+  * is not applied to the node itself, only to its children.
+  */
+ #define ece_all_arguments_const(node) \
+ 	(!expression_tree_walker((Node *) (node), contain_non_const_walker, NULL))
+ 
+ /* Generic macro for applying evaluate_expr */
+ #define ece_evaluate_expr(node) \
+ 	((Node *) evaluate_expr((Expr *) (node), \
+ 			exprType((Node *) (node)), \
+ 			exprTypmod((Node *) (node)), \
+ 			exprCollation((Node *) (node
+ 
+ /*
+  * Recursive guts of eval_const_expressions/estimate_expression_value
+  */
  static Node *
  eval_const_expressions_mutator(Node *node,
  			   eval_const_expressions_context *context)
*** eval_const_expressions_mutator(Node *nod
*** 2758,2763 
--- 2792,2816 
  newexpr->location = expr->location;
  return (Node *) newexpr;
  			}
+ 		case T_ScalarArrayOpExpr:
+ 			{
+ ScalarArrayOpExpr *saop;
+ 
+ /* Copy the node and const-simplify its arguments */
+ saop = (ScalarArrayOpExpr *) ece_generic_processing(node);
+ 
+ /* Make sure we know underlying function */
+ set_sa_opfuncid(saop);
+ 
+ /*
+  * If all arguments are Consts, and it's a safe function, we
+  * can fold to a constant
+  */
+ if (ece_all_arguments_const(saop) &&
+ 	ece_function_is_safe(saop->opfuncid, context))
+ 	return ece_evaluate_expr(saop);
+ return (Node *) saop;
+ 			}
  		case T_BoolExpr:
  			{
  BoolExpr   *expr = (BoolExpr *) node;
*** eval_const_expressions_mutator(Node *nod
*** 2982,3022 
  			}
  		case T_ArrayCoerceExpr:
  			{
! ArrayCoerceExpr *expr = (ArrayCoerceExpr *) node;
! Expr	   *arg;
! ArrayCoerceExpr *newexpr;
! 
! /*
!  * Reduce constants in the ArrayCoerceExpr's argument, then
!  * build a new ArrayCoerceExpr.
!  */
! arg = (Expr *) eval_const_expressions_mutator((Node *) expr->arg,
! 			  context);
  
! newexpr = makeNode(ArrayCoerceExpr);
! newexpr->arg = arg;
! newexpr->elemfuncid = 

Re: [HACKERS] export import bytea from psql

2017-05-11 Thread Daniel Verite
Pavel Stehule wrote:

> It is similar to my first or second proposal - rejected by Tom :(

Doesn't it differ? ISTM that in the patches/discussion related to:
https://commitfest.postgresql.org/11/787/
it was proposed to change \set in one way or another,
and also in the point #1 of this present thread:

> > 1. using psql variables - we just need to write commands \setfrom
> > \setfrombf - this should be very simple implementation. The value can be
> > used more times. On second hand - the loaded variable can hold lot of
> > memory (and it can be invisible for user).


My comment is: let's not change \set or even create a variant
of it just for importing verbatim file contents, in text or binary,
because interpolating psql variables differently in the query itself
is all we need.
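
For text contents this is already possible today by combining backtick
expansion with the :'var' interpolation form (the file path and target table
below are only examples):

\set fcontent `cat /tmp/note.txt`
INSERT INTO docs (body) VALUES (:'fcontent');  -- :'fcontent' is injected as a quoted literal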


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cached plans and statement generalization

2017-05-11 Thread Bruce Momjian
On Wed, May 10, 2017 at 07:11:07PM +0300, Konstantin Knizhnik wrote:
> I am going to continue work on this patch. I will be glad to receive any
> feedback and suggestions for its improvement.
> In most cases, applications are not accessing Postgres directly, but using
> some connection pooling layer, and so they are not able to use prepared
> statements.
> But on simple OLTP workloads Postgres spends more time on building the query plan than on
> execution itself. And it is possible to speed up Postgres about two times on
> such a workload!
> Another alternative is a true shared plan cache.  Maybe it is an even more
> promising approach, but definitely much more invasive and harder to
> implement.

Can we back up and get an overview of what you are doing and how you are
doing it?  Our TODO list suggests this order for successful patches:

Desirability -> Design -> Implement -> Test -> Review -> Commit

You kind of started at the Implementation/patch level, which makes it
hard to evaluate.

I think everyone agrees on the Desirability of the feature, but the
Design is the tricky part.  I think the design questions are:

*  What information is stored about cached plans?
*  How are the cached plans invalidated?
*  How is a query matched against a cached plan?

Looking at the options, ideally the plan would be cached at the same
query stage as the stage where the incoming query is checked against the
cache.  However, caching and checking at the same level offers no
benefit, so they are going to be different.  For example, caching a
parse tree at the time it is created, then checking at the same point if
the incoming query is the same doesn't help you because you already had
to create the parse tree to get to that point.

A more concrete example is prepared statements.  They are stored at the
end of planning and matched in the parser.  However, you can easily do
that since the incoming query specifies the name of the prepared query,
so there is no trick to matching.
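
For example (statement and table names invented):

PREPARE get_balance(int) AS
    SELECT abalance FROM pgbench_accounts WHERE aid = $1;  -- plan stored under the name

EXECUTE get_balance(42);   -- matched purely by name, no comparison of query text needed
EXECUTE get_balance(43);   -- reuses the same cached plan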

The desire is to cache as late as possible so you cache more work and
you have more detail about the referenced objects, which helps with
cache invalidation.  However, you also want to do cache matching as
early as possible to improve performance.

So, let's look at some options.  One interesting idea from Doug Doole
was to do it between the tokenizer and parser.  I think they are glued
together so you would need a way to run the tokenizer separately and
compare that to the tokens you stored for the cached plan.  The larger
issue is that prepared plans already are checked after parsing, and we
know they are a win, so matching any earlier than that just seems like
overkill and likely to lead to lots of problems.

So, you could do it after parsing but before parse-analysis, which is
kind of what prepared queries do.  One tricky problem is that we don't
bind the query string tokens to database objects until after parse
analysis.

Doing matching before parse-analysis is going to be tricky, which is why
there are so many comments about the approach.  Changing search_path can
certainly affect it, but creating objects in earlier-mentioned schemas
can also change how an object reference in a query is resolved.  Even
obscure things like the creation of a new operator that has higher
precedence in the query could change the plan, though I am not sure if
our prepared query system even handles that properly.
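
For example, a cache keyed only on the raw query text could silently go stale
after something like this (schema and table names invented):

SET search_path = app, public;
SELECT count(*) FROM t;        -- app.t does not exist yet, so this resolves to public.t
CREATE TABLE app.t (x int);    -- an earlier-mentioned schema gains its own "t"
SELECT count(*) FROM t;        -- identical text, but it must now resolve to app.t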

Anyway, that is my feedback.  I would like to get an overview of what
you are trying to do and the costs/benefits of each option so we can
best guide you.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Time based lag tracking for logical replication

2017-05-11 Thread Simon Riggs
On 11 May 2017 at 14:12, Petr Jelinek  wrote:

>> Attached patch is Petr's patch, slightly rebased with added pacing
>> delay, similar to that used by HSFeedback.
>>
>
> This looks reasonable. I would perhaps change:
>> +   /*
>> +* Track lag no more than once per 
>> WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS
>> +*/
>
> to something like this for extra clarity:
>> +   /*
>> +* Track lag no more than once per 
>> WALSND_LOGICAL_LAG_TRACK_INTERVAL_MS
>> +* to avoid flooding the lag tracker on busy servers.
>> +*/

New patch, v3.

Applying in 90 minutes, barring objections.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Add-support-for-time-based-lag-tracking-to-logical-170429.v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] eval_const_expresisions and ScalarArrayOpExpr

2017-05-11 Thread Heikki Linnakangas
Eval_const_expressions() doesn't know about ScalarArrayOpExpr. We 
simplify the arguments, but if all the arguments are constants, we don't 
take the obvious step of replacing the whole expression with a boolean 
Const. For example:


postgres=# explain select * from foo where 1 IN (1,2,3);
 QUERY PLAN
-
 Result  (cost=0.00..35.50 rows=2550 width=4)
   One-Time Filter: (1 = ANY ('{1,2,3}'::integer[]))
   ->  Seq Scan on foo  (cost=0.00..35.50 rows=2550 width=4)
(3 rows)

I would've expected that to be fully evaluated at plan-time, and 
optimized away.
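
That is, roughly this shape instead (costs elided, output sketched by hand):

postgres=# explain select * from foo where 1 IN (1,2,3);
            QUERY PLAN
----------------------------------
 Seq Scan on foo  (cost=... rows=... width=...)
(1 row)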


That seems like an oversight. I guess that scenario doesn't happen very 
often in practice, but there's no reason not to do it when it does. 
Patch attached.


On a side-note, I find it a bit awkward that ScalarArrayOpExpr uses a 
2-element List to hold the scalar and array arguments. Wouldn't it be 
much more straightforward to have explicit "Expr *scalararg" and "Expr 
*arrayarg" fields?


- Heikki



0001-Const-eval-ScalarArrayOpExprs.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapbuild woes

2017-05-11 Thread Petr Jelinek
On 11/05/17 16:33, Stas Kelvich wrote:
> 
>> On 10 May 2017, at 11:43, Petr Jelinek  wrote:
>>
>> On 09/05/17 22:11, Erik Rijkers wrote:
>>> On 2017-05-09 21:00, Petr Jelinek wrote:
 On 09/05/17 19:54, Erik Rijkers wrote:
> On 2017-05-09 11:50, Petr Jelinek wrote:
>

 Ah okay, so this is same issue that's reported by both Masahiko Sawada
 [1] and Jeff Janes [2].

 [1]
 https://www.postgresql.org/message-id/CAD21AoBYpyqTSw%2B%3DES%2BxXtRGMPKh%3DpKiqjNxZKnNUae0pSt9bg%40mail.gmail.com

 [2]
 https://www.postgresql.org/message-id/flat/CAMkU%3D1xUJKs%3D2etq2K7bmbY51Q7g853HLxJ7qEB2Snog9oRvDw%40mail.gmail.com

>>>
>>> I don't understand why you come to that conclusion: both Masahiko Sawada
>>> and Jeff Janes have a DROP SUBSCRIPTION in the mix; my cases haven't. 
>>> Isn't that a real difference?
>>>
>>
> 
> Just noted another one issue/feature with snapshot builder — when logical 
> decoding is in progress
> and vacuum full command is being issued quite a big amount of files appears 
> in pg_logical/mappings
> and reside there till the checkpoint. Also having consequent vacuum full’s 
> before checkpoint yields even
> more files.
> 
> Having two pgbench-filled instances with logical replication between them:
> 
> for i in {1..100}; do psql -c 'vacuum full' && ls -la pg_logical/mappings | 
> wc -l; done;
> VACUUM
>   73
> VACUUM
>  454
> VACUUM
> 1146
> VACUUM
> 2149
> VACUUM
> 3463
> VACUUM
> 5088
> <...>
> VACUUM
>44708
> <…> // checkpoint happens somewhere here
> VACUUM
>20921
> WARNING:  could not truncate file "base/16384/61773": Too many open files in 
> system
> ERROR:  could not fsync file 
> "pg_logical/mappings/map-4000-4df-0_A4EA29F8-5aa5-5ae6": Too many open files 
> in system
> 
> 
> I’m not sure whether this is boils down to some of the previous issues 
> mentioned here or not, so posting
> here as observation.
> 

This has nothing to do with the snapshot builder, so this is not the thread
for it. See the comment section near the bottom of
src/backend/access/heap/rewriteheap.c for explanation of why it does
what it does.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()

2017-05-11 Thread Tom Lane
Neha Khatri  writes:
> [In case forgotten] pg_controldata and pg_waldump interfaces should also be
> considered for this standardization.

> Following are pg_controldata interfaces that might require change:

>   Latest checkpoint location:
>   Prior checkpoint location:
>   Latest checkpoint's REDO location:
>   Minimum recovery ending location:
>   Backup start location:
>   Backup end location:

My inclination is to leave these messages alone.  They're not really
inconsistent with anything.  Where we seem to be ending up is that
"lsn" will be used in things like function and column names, but
documentation will continue to spell out phrases like "WAL location".

There is another open thread about converting said phrases to be
more consistent --- a lot of them currently say "transaction log
location", which is not a very good choice because it invites
confusion with pg_xact nee pg_clog.  But I think that's mostly
just documentation changes, and in any case it's better done as
a separate patch.

> The pg_waldump help options(and probably documentation) would also require
> change:
>   -e, --end=RECPTR   stop reading at log position RECPTR
>   -s, --start=RECPTR start reading at log position RECPTR

Yeah, probably s/log position/WAL location/ would be better here.
But again that seems like material for a separate patch.  The
current patch does do s/position/location/ in a few places, but
it's not trying to apply that policy everywhere.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should pg_current_wal_location() become pg_current_wal_lsn()

2017-05-11 Thread Tom Lane
Michael Paquier  writes:
> On Thu, May 11, 2017 at 9:15 AM, Bruce Momjian  wrote:
>> On Wed, May 10, 2017 at 01:09:36PM -0700, Joe Conway wrote:
> On 05/10/2017 12:22 PM, Tom Lane wrote:
>>> Hm, well, anybody else want to vote?

>>> +1 for #2

>> Same, +1 for #2 (apply this patch)

> #1, do nothing.

OK, by my count we have

For patch:
Joe Conway
Stephen Frost
Kyotaro Horiguchi (*)
Petr Jelinek (*)
Tom Lane
Bruce Momjian
David Rowley
David Steele
Euler Taveira

For doing nothing:
Peter Eisentraut
Robert Haas
Michael Paquier

I think Kyotaro-san and Petr would have preferred standardizing
on "location" over "lsn", but they were definitely not for doing
nothing.  With or without their votes, it's pretty clear that the
ayes have it.

regards, tom lane




Re: [HACKERS] snapbuild woes

2017-05-11 Thread Stas Kelvich

> On 10 May 2017, at 11:43, Petr Jelinek  wrote:
> 
> On 09/05/17 22:11, Erik Rijkers wrote:
>> On 2017-05-09 21:00, Petr Jelinek wrote:
>>> On 09/05/17 19:54, Erik Rijkers wrote:
 On 2017-05-09 11:50, Petr Jelinek wrote:
 
>>> 
>>> Ah okay, so this is same issue that's reported by both Masahiko Sawada
>>> [1] and Jeff Janes [2].
>>> 
>>> [1]
>>> https://www.postgresql.org/message-id/CAD21AoBYpyqTSw%2B%3DES%2BxXtRGMPKh%3DpKiqjNxZKnNUae0pSt9bg%40mail.gmail.com
>>> 
>>> [2]
>>> https://www.postgresql.org/message-id/flat/CAMkU%3D1xUJKs%3D2etq2K7bmbY51Q7g853HLxJ7qEB2Snog9oRvDw%40mail.gmail.com
>>> 
>> 
>> I don't understand why you come to that conclusion: both Masahiko Sawada
>> and Jeff Janes have a DROP SUBSCRIPTION in the mix; my cases haven't. 
>> Isn't that a real difference?
>> 
> 

Just noted another issue/feature with the snapshot builder — when logical
decoding is in progress and a vacuum full command is issued, quite a large
number of files appear in pg_logical/mappings and reside there until the
checkpoint. Also, consecutive vacuum full commands before a checkpoint yield
even more files.

Having two pgbench-filled instances with logical replication between them:

for i in {1..100}; do psql -c 'vacuum full' && ls -la pg_logical/mappings | wc 
-l; done;
VACUUM
  73
VACUUM
 454
VACUUM
1146
VACUUM
2149
VACUUM
3463
VACUUM
5088
<...>
VACUUM
   44708
<…> // checkpoint happens somewhere here
VACUUM
   20921
WARNING:  could not truncate file "base/16384/61773": Too many open files in 
system
ERROR:  could not fsync file 
"pg_logical/mappings/map-4000-4df-0_A4EA29F8-5aa5-5ae6": Too many open files in 
system


I’m not sure whether this boils down to some of the previous issues
mentioned here or not, so I'm posting it here as an observation.


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






Re: [HACKERS] Adding support for Default partition in partitioning

2017-05-11 Thread Rahila Syed
Hello,

Please find attached an updated patch that addresses the review comments
and the bugs reported till date.

>1.
>In following block, we can just do with def_index, and we do not need
>found_def flag. We can check if def_index is -1 or not to decide if default
>partition is present.
found_def is used to set boundinfo->has_default, which is used at a couple of
other places to check if a default partition exists. The implementation is
similar to that of has_null.
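
(For reference, at the SQL level has_default simply records whether a default
partition has been declared — a rough sketch assuming the syntax this patch
implements; table names are illustrative:)

    CREATE TABLE plist (a int) PARTITION BY LIST (a);
    CREATE TABLE plist_1 PARTITION OF plist FOR VALUES IN (1, 2);
    CREATE TABLE plist_def PARTITION OF plist DEFAULT;  -- boundinfo->has_default becomes true

    INSERT INTO plist VALUES (1);   -- routed to plist_1
    INSERT INTO plist VALUES (7);   -- matches no other partition, routed to plist_def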

>3.
>In following function isDefaultPartitionBound, first statement "return false"
>is not needed.
It is needed in order to return false if the node is not a DefElem.

Todo:
Add regression tests
Documentation

Thank you,
Rahila Syed



On Fri, May 5, 2017 at 1:30 AM, Jeevan Ladhe 
wrote:

> Hi Rahila,
>
> I have started reviewing your latest patch, and here are my initial
> comments:
>
> 1.
> In following block, we can just do with def_index, and we do not need
> found_def
> flag. We can check if def_index is -1 or not to decide if default
> partition is
> present.
>
> @@ -166,6 +172,8 @@ RelationBuildPartitionDesc(Relation rel)
>   /* List partitioning specific */
>   PartitionListValue **all_values = NULL;
>   bool found_null = false;
> + bool found_def = false;
> + int def_index = -1;
>   int null_index = -1;
>
> 2.
> In check_new_partition_bound, in case of PARTITION_STRATEGY_LIST, remove
> following duplicate declaration of boundinfo, because it is confusing and
> after
> your changes it is not needed as its not getting overridden in the if block
> locally.
> if (partdesc->nparts > 0)
> {
> PartitionBoundInfo boundinfo = partdesc->boundinfo;
> ListCell   *cell;
>
>
> 3.
> In following function isDefaultPartitionBound, first statement "return
> false"
> is not needed.
>
> + * Returns true if the partition bound is default
> + */
> +bool
> +isDefaultPartitionBound(Node *value)
> +{
> + if (IsA(value, DefElem))
> + {
> + DefElem *defvalue = (DefElem *) value;
> + if(!strcmp(defvalue->defname, "DEFAULT"))
> + return true;
> + return false;
> + }
> + return false;
> +}
>
> 4.
> As mentioned in my previous set of comments, following if block inside a
> loop
> in get_qual_for_default needs a break:
>
> + foreach(cell1, bspec->listdatums)
> + {
> + Node *value = lfirst(cell1);
> + if (isDefaultPartitionBound(value))
> + {
> + def_elem = true;
> + *defid  = inhrelid;
> + }
> + }
>
> 5.
> In the grammar the rule default_part_list is not needed:
>
> +default_partition:
> + DEFAULT  { $$ = (Node *)makeDefElem("DEFAULT", NULL, @1); }
> +
> +default_part_list:
> + default_partition { $$ = list_make1($1); }
> + ;
> +
>
> Instead you can simply declare default_partition as a list and write it as:
>
> default_partition:
> DEFAULT
> {
> Node *def = (Node *)makeDefElem("DEFAULT", NULL, @1);
> $$ = list_make1(def);
> }
>
> 6.
> You need to change the output of the describe command, which is currently
> as below:
>
> postgres=# \d+ test;
>                                Table "public.test"
>  Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
> --------+---------+-----------+----------+---------+---------+--------------+-------------
>  a      | integer |           |          |         | plain   |              |
>  b      | date    |           |          |         | plain   |              |
> Partition key: LIST (a)
> Partitions: pd FOR VALUES IN (DEFAULT),
>             test_p1 FOR VALUES IN (4, 5)
>
> What about changing the Partitions output as below:
> Partitions: *pd DEFAULT,* test_p1 FOR VALUES IN (4, 5)
>
> 7.
> You need to handle tab completion for DEFAULT.
> e.g.
> If I partially type following command:
> CREATE TABLE pd PARTITION OF test DEFA
> and then press tab, I get following completion:
> CREATE TABLE pd PARTITION OF test FOR VALUES
>
> I did some primary testing and did not find any problem so far.
>
> I will review and test further and let you know my comments.
>
> Regards,
> Jeevan Ladhe
>
> On Thu, May 4, 2017 at 6:09 PM, Rajkumar Raghuwanshi <
> rajkumar.raghuwan...@enterprisedb.com> wrote:
>
>> On Thu, May 4, 2017 at 5:14 PM, Rahila Syed 
>> wrote:
>>
>>> The syntax implemented in this patch is as follows,
>>>
>>> CREATE TABLE p11 PARTITION OF p1 DEFAULT;
>>>
>> Applied v9 patches, the table description is still showing the old pattern
>> for the default partition. Is it expected?
>>
>> create table lpd (a int, b int, c varchar) partition by list(a);
>> create table lpd_d partition of lpd DEFAULT;
>>
>> \d+ lpd
>>                                       Table "public.lpd"
>>  Column |       Type        | Collation | Nullable | Default | Storage  | Stats target | Description
>> --------+-------------------+-----------+----------+---------+----------+--------------+-------------
>>  a      | integer           |           |          |         | plain    |              |
>>  b      | integer           |           |          |         | plain    |              |
>>  c      | character varying |           |          |         | extended |              |
>> Partition key: LIST (a)
>> Partitions: lpd_d FOR VALUES IN (DEFAULT)
>>
>
>



Re: [HACKERS] If subscription to foreign table valid ?

2017-05-11 Thread Petr Jelinek
Hi,

On 11/05/17 14:25, tushar wrote:
> Hi,
> 
> I observed that we cannot publish a "foreign table" in a Publication
> 
> postgres=# create foreign table t (n int) server db1_server options
> (table_name 't1');
> CREATE FOREIGN TABLE
> 
> postgres=# create publication pub for table t;
> ERROR:  "t" is not a table
> DETAIL:  Only tables can be added to publications.
> postgres=#
> 
> but same thing is not true for Subscription
> 
> postgres=# create foreign table t (n int) server db1_server options
> (table_name 't');
> CREATE FOREIGN TABLE
> postgres=# alter subscription sub refresh publication ;
> NOTICE:  added subscription for table public.t
> ALTER SUBSCRIPTION
> 
> Is this an expected behavior? If we cannot publish it, then how can we add a
> subscription for it?
> 

Thanks for the report. What you can publish and what you can subscribe to are
not necessarily the same (we can write to relations whose changes we can't
capture from WAL; for example, an unlogged table can't be published but can be
subscribed to).
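
To illustrate that asymmetry with a rough sketch (table and publication names
are made up, and the exact error text is omitted since it depends on the
check involved):

    -- publishing side: an unlogged table cannot be added to a publication
    CREATE UNLOGGED TABLE audit_log (id int, msg text);
    CREATE PUBLICATION pub FOR TABLE audit_log;   -- rejected, per the above

    -- subscribing side: if the publisher publishes a regular table audit_log,
    -- the local target may itself be unlogged and rows are still applied to it
    CREATE UNLOGGED TABLE audit_log (id int, msg text);
    CREATE SUBSCRIPTION sub CONNECTION '...' PUBLICATION some_pub;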

However, foreign tables indeed can't be subscribed to. I originally
planned to have foreign tables allowed on the subscriber, but it turned out
to be more complex to implement than I had anticipated, so I ripped the
code for that out of the original patch.

We do check for this, but only during replication, which we have to do
because the fact that relation 't' was a foreign table during ALTER
SUBSCRIPTION does not mean that it won't be something else half an hour later.

I think it does make sense to add a check for this to CREATE/ALTER
SUBSCRIPTION though, so that the user is informed about the mistake
immediately rather than by errors in the logs later.

I'll look into writing a patch for this. I don't think it's a beta blocker
though.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] New command to monitor progression of long running queries

2017-05-11 Thread Remi Colinet
That's a good point.

A command is more straightforward because it targets only one backend.
The user is supposed to know which backend pid is taking a long time to
complete, based on pg_stat_activity.

This is somewhat the same approach as the EXPLAIN command. But its use is
limited to the psql utility, and it adds one more command.

I see 2 possible choices:

1 - either convert the command into a table.
This is the way it is done in Oracle Database, with the v$session_longops view.
Obviously, this requires probing the status of each backend. This
inconvenience can be mitigated by using a threshold of a few seconds before
reporting a backend's progress; v$session_longops only reports long-running
queries after at least 6 seconds of execution.
This is less efficient than directly targeting a given pid or backend id,
but it is far better for SQL.

2 - either convert the command into a function.
The advantage of a function is that it can accept parameters. The parameters
could be the pid of the backend, the verbosity level, and the format (text,
json, ...).
This would not reduce the options of the current command. And then a view
could be created on top of the function.


Maybe a mix of both, a function with parameters and a view created on top of
the function, is the solution.
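
Just to make that concrete, a purely hypothetical sketch (none of these
objects exist today; the function name, signature and columns are invented
for illustration):

    -- hypothetical function probing one backend, with verbosity/format options
    SELECT * FROM pg_progress_backend(12345, 'verbose', 'text');

    -- hypothetical view layered on top of it, probing every active backend
    CREATE VIEW pg_progress AS
      SELECT a.pid, p.*
        FROM pg_stat_activity a,
             LATERAL pg_progress_backend(a.pid, 'normal', 'text') AS p
       WHERE a.state = 'active';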

Regards
Remi

2017-05-06 5:57 GMT+02:00 Jaime Casanova :

> On 5 May 2017 at 22:38, Vinayak Pokale  wrote:
> >
> > On Mon, Apr 17, 2017 at 9:09 PM, Remi Colinet 
> > wrote:
> >>
> >> Hello,
> >>
> >> I've implemented a new command named PROGRESS to monitor progression of
> >> long running SQL queries in a backend process.
> >>
> > Thank you for the patch.
> >
>
> sorry if i'm bikeshedding too soon but... why a command instead of a
> function?
> something like pg_progress_backend() will be in sync with
> pg_cancel_backend()/pg_terminate_backend() and the result of such a
> function could be usable by a tool to examine a slow query status
>
> --
> Jaime Casanova  www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] Bug in pg_dump --table and --exclude-table for declarative partition table handling.

2017-05-11 Thread Tom Lane
Robert Haas  writes:
> On Thu, May 11, 2017 at 9:02 AM, Tom Lane  wrote:
>> You could argue that it would be better for pg_dump to emit something
>> like
>> 
>> CREATE TABLE c (...);
>> ALTER TABLE c INHERIT p;
>> 
>> so that if p isn't around, at least c gets created.  And I think it
>> *would* be better, probably.  But if that isn't a new feature then
>> I don't know what is.  And partitioning really ought to track the
>> behavior of inheritance here.

> Hmm, I think that'd actually be worse, because it would break the use
> case where you want to restore the old contents of one particular
> inheritance child.  So you drop that child (but not the parent or any
> other children) and then do a selective restore of that one child.
> Right now that works fine, but it'll fail with an error if we try to
> create the already-extant parent.

Uh ... what in that is creating the already-extant parent?  What
I'm envisioning is simply that the TOC entry for the child table
contains those two statements rather than "CREATE TABLE c (...)
INHERITS (p)".  The equivalent thing for the partitioned case is
to use a separate ATTACH PARTITION command rather than naming the
parent immediately in the child's CREATE TABLE.  This is independent
of the question of how --table and friends ought to behave.
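
In other words, the child's TOC entry could contain something along these
lines (an illustrative sketch with made-up names, not what pg_dump emits
today):

    -- inheritance child: creatable even if the parent p is absent
    CREATE TABLE c (a int, b text);
    ALTER TABLE c INHERIT p;

    -- partition: the equivalent, attached to its parent in a separate step
    CREATE TABLE c_part (a int, b text);
    ALTER TABLE p_parted ATTACH PARTITION c_part FOR VALUES IN (1);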

> I think one answer to the original complaint might be to add a new
> flag to pg_dump, something like --recursive-selection, maybe -r for
> short, which makes --table, --exclude-table, and --exclude-table-data
> cascade to inheritance descendents.

Yeah, you could do it like that.  Another way to do it would be to
create variants of all the selection switches, along the lines of
"--table-all=foo" meaning "foo plus its children".  Then you could
have some switches recursing and others not within the same command.
But maybe that's more flexibility than needed ... and I'm having a
hard time coming up with nice switch names, anyway.

Anyway, I'm still of the opinion that it's fine to leave this as a
future feature.  If we've gotten away without it this long for
inherited tables, it's unlikely to be critical for partitioned tables.

regards, tom lane




Re: [HACKERS] If subscription to foreign table valid ?

2017-05-11 Thread Robert Haas
On Thu, May 11, 2017 at 8:25 AM, tushar  wrote:
> I observed that we cannot publish a "foreign table" in a Publication
>
> postgres=# create foreign table t (n int) server db1_server options
> (table_name 't1');
> CREATE FOREIGN TABLE
>
> postgres=# create publication pub for table t;
> ERROR:  "t" is not a table
> DETAIL:  Only tables can be added to publications.
> postgres=#
>
> but same thing is not true for Subscription
>
> postgres=# create foreign table t (n int) server db1_server options
> (table_name 't');
> CREATE FOREIGN TABLE
> postgres=# alter subscription sub refresh publication ;
> NOTICE:  added subscription for table public.t
> ALTER SUBSCRIPTION
>
> Is this an expected behavior? If we cannot publish it, then how can we add a
> subscription for it?

This is not a complete test case, but it does appear to be odd behavior.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] New command to monitor progression of long running queries

2017-05-11 Thread Remi Colinet
Do you have more details about the failed tests?

Regards,
Remi

2017-05-06 5:38 GMT+02:00 Vinayak Pokale :

>
> On Mon, Apr 17, 2017 at 9:09 PM, Remi Colinet 
> wrote:
>
>> Hello,
>>
>> I've implemented a new command named PROGRESS to monitor progression of
>> long running SQL queries in a backend process.
>>
>> Thank you for the patch.
>
> I am testing your patch but after applying your patch other regression
> tests failed.
>
> $ make installcheck
> 13 of 178 tests failed.
>
> Regards,
> Vinayak
>


[HACKERS] Re: transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-05-11 Thread Robert Haas
On Thu, May 11, 2017 at 3:38 AM, Noah Misch  wrote:
> This PostgreSQL 10 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

Kevin has not posted to this mailing list since May 3rd.  I don't know
whether he's gone on vacation or been without email access for some
other reason, but I think we'd better assume that he's not likely to
respond to emails demanding immediate action regardless of how many of
them we send.

I'm not prepared to write a patch for this issue, but it seems like
Thomas is on top of that.  If nobody else steps up to the plate I
guess I'm willing to take responsibility for reviewing and committing
that patch once it's in final form, but at this point I don't think
it's going to be possible to get that done before Monday's planned
wrap.

In formal terms, if Kevin forfeits ownership of this item and nobody
else volunteers to adopt it, put me down as owner with a next-update
date of Friday, May 19th, the day after beta1 is expected to ship.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SCRAM in the PG 10 release notes

2017-05-11 Thread Tom Lane
Heikki Linnakangas  writes:
> I updated the List of Drivers in the Wiki. I added a few drivers that 
> were missing, like the ODBC driver, and the pgtclng driver, as well as a 
> Go and Rust driver that I'm aware of. I reformatted it, and added a 
> column to indicate whether each driver uses libpq or not.

> https://wiki.postgresql.org/wiki/List_of_drivers

> There is a similar list in our docs:

> https://www.postgresql.org/docs/devel/static/external-interfaces.html

> Should we update the list in the docs, adding every driver we know of? 
> Or curate the list somehow, adding only more popular drivers? Or perhaps 
> add a link to the Wiki page from the docs?

Certainly, if that list is out of date, we need to do something about it.

ISTM that the list probably doesn't change fast enough that tracking it
in our SGML docs is a huge problem.  I think it'd likely be good enough
to just add the missing drivers to the list in the docs.  (If you do,
please back-patch that.)

> We can use this list in the Wiki to track which drivers implement SCRAM.

+1.  That *is* something likely to change faster than we can update
the SGML docs.

regards, tom lane




Re: [HACKERS] Bug in pg_dump --table and --exclude-table for declarative partition table handling.

2017-05-11 Thread Robert Haas
On Thu, May 11, 2017 at 9:02 AM, Tom Lane  wrote:
> Jeevan Ladhe  writes:
>> On Thu, May 11, 2017 at 4:47 PM, Ashutosh Bapat <
>>> We add PARTITION OF clause for a table which is partition, so if the
>>> parent is not present while restoring, the restore is going to fail.
>
>> +1
>> But, similarly for inheritance, if we dump a child table, its dump is broken
>> as of today. When we dump a child table we append an "inherits(parenttab)"
>> clause at the end of the DDL. Later, when we try to restore this table on a
>> database not having the parenttab, the restore fails.
>> So, I consider this a bug.
>
> It sounds like exactly what I'd expect.  In particular, given that pg_dump
> has worked that way for inherited tables from the beginning, it's hard to
> see any must-fix bugs here.

I agree.

> You could argue that it would be better for pg_dump to emit something
> like
>
> CREATE TABLE c (...);
> ALTER TABLE c INHERIT p;
>
> so that if p isn't around, at least c gets created.  And I think it
> *would* be better, probably.  But if that isn't a new feature then
> I don't know what is.  And partitioning really ought to track the
> behavior of inheritance here.

Hmm, I think that'd actually be worse, because it would break the use
case where you want to restore the old contents of one particular
inheritance child.  So you drop that child (but not the parent or any
other children) and then do a selective restore of that one child.
Right now that works fine, but it'll fail with an error if we try to
create the already-extant parent.

I think one answer to the original complaint might be to add a new
flag to pg_dump, something like --recursive-selection, maybe -r for
short, which makes --table, --exclude-table, and --exclude-table-data
cascade to inheritance descendants.  Then if you want to dump your
partitioned table's definition without picking up the partitions, you
can say pg_dump -t splat, and if you want the children as well you can
say pg_dump -r -t splat.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



