[HACKERS] How to refer to resource files from UDFs written in C

2017-06-08 Thread Supun Nakandala
Hi hackers,

I am trying to extend PostgreSQL by adding a UDT and UDFs for a custom use
case, and I am using C language extensions to do that.

However, one of the C functions needs to read a text file. The compiled
*.so files are placed in the "pg_config --pkglibdir" directory, so I tried
copying my text files there as well, but it didn't work. I found that when
these shared libraries are loaded, they run with a different working
directory. In this case, what is the best way to refer to my text files
from the C code, other than giving an absolute path, which can change from
system to system?
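
For reference, this is roughly the kind of approach I have been
considering, assuming the file gets installed under the server's share
directory; the helper and the file name below are only made up for
illustration:

#include "postgres.h"
#include "miscadmin.h"      /* my_exec_path */

/*
 * Build an absolute path to a data file shipped with the extension,
 * resolving the installation's share directory at run time instead of
 * hard-coding an absolute path.
 */
static void
build_data_file_path(char *dest)    /* dest must be MAXPGPATH bytes long */
{
    char        sharepath[MAXPGPATH];

    get_share_path(my_exec_path, sharepath);
    snprintf(dest, MAXPGPATH, "%s/extension/my_data_file.txt", sharepath);
}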

Thank you.
Supun


Re: [HACKERS] A bug in mapping attributes in ATExecAttachPartition()

2017-06-08 Thread Amit Langote
On 2017/06/08 19:25, Ashutosh Bapat wrote:
> On Thu, Jun 8, 2017 at 3:13 PM, Amit Langote
>>> I think this code could be removed entirely in light of commit
>>> 3ec76ff1f2cf52e9b900349957b42d28128b7bc7.
>>
>> I am assuming you think that because now we emit IS NOT NULL constraint
>> internally for any partition keys that do not allow null values (including
>> all the keys of range partitions as of commit
>> 3ec76ff1f2cf52e9b900349957b42d28128b7bc7).  But those IS NOT NULL
>> constraint expressions are inconclusive as far as the application of
>> predicate_implied_by() to determine if we can skip the scan is concerned.
>> So even if predicate_implied_by() returned 'true', we cannot conclude,
>> just based on that result, that there are not any null values in the
>> partition keys.
> 
> I am not able to understand this. Are you saying that
> predicate_implied_by() returns true even when it's not implied when
> NOT NULL constraints are involved? That sounds like a bug in
> predicate_implied_by(), which should be fixed instead of adding code
> to pepper over it?

No, it's not a bug in predicate_implied_by().  What I meant is that
predicate_implied_by() isn't exactly designed for ATExecAttachPartition's
purpose; in particular, its treatment of IS NOT NULL constraints is not
suitable for this application.  To prove that the table cannot contain
NULLs where the partition constraint forbids them, we must look for
explicit NOT NULL constraints on the partition key columns instead of
relying on predicate_implied_by()'s proof.  Here is why I think that is
so:

The header comment of predicate_implied_by_simple_clause() (which is where
the individual pairs of expressions are compared and/or passed to
operator_predicate_proof()) says that the treatment of the IS NOT NULL
predicate is based on the assumption that the 'restrictions' passed to
predicate_implied_by() are a query's WHERE clause restrictions, *not*
CHECK constraints that are enforced when inserting data into a table.

 * When the predicate is of the form "foo IS NOT NULL", we can conclude that
 * the predicate is implied if the clause is a strict operator or function
 * that has "foo" as an input.  In this case the clause must yield NULL when
 * "foo" is NULL, which we can take as equivalent to FALSE because we know
 * we are within an AND/OR subtree of a WHERE clause.  (Again, "foo" is
 * already known immutable, so the clause will certainly always fail.)
 * Also, if the clause is just "foo" (meaning it's a boolean variable),
 * the predicate is implied since the clause can't be true if "foo" is NULL.

Note in particular this part: "which we can take as equivalent to FALSE
because we know we are within an AND/OR subtree of a WHERE clause."

That is not what we are passing to predicate_implied_by() in
ATExecAttachPartition().  We are passing it the table's CHECK constraint
clauses, which behave differently when they yield NULL on NULL input: they
*allow* the row to be inserted.  That means there can be rows with NULLs
in the partition key even if predicate_implied_by() said there cannot be,
so we would end up *wrongly* skipping the validation scan if we relied on
predicate_implied_by()'s result alone.  That's why there is code to check
for explicit NOT NULL constraints on the partition key columns.  If such
constraints exist, it is then OK to assume that the partition constraint
is satisfied by the existing constraints.  One more thing: if any
partition key happens to be an expression, for which there cannot be a NOT
NULL constraint, we simply give up on skipping the scan, because we have
no declared knowledge that those keys are non-null, which is required for
partitions that do not accept null values.
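
To illustrate the difference with a trivial example (made up here, not
taken from the patch or its tests): a CHECK constraint whose expression
evaluates to NULL accepts the row, whereas the same expression in a WHERE
clause rejects it.

CREATE TABLE chk_demo (a int CHECK (a > 0));
INSERT INTO chk_demo VALUES (NULL);         -- accepted: CHECK yields NULL, which passes
SELECT count(*) FROM chk_demo WHERE a > 0;  -- finds 0 rows: WHERE treats NULL as false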

Does that make sense?

Thanks,
Amit

PS: Also interesting to note is the difference in behavior between
ExecQual() and ExecCheck() on NULL result.





Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-06-08 Thread Mithun Cy
I have merged Rafia's patch for the cosmetic changes. I have also made
some of the changes you recommended on top of that, but kept a few as
they are, since Rafia's opinion is needed on those.

On Wed, Jun 7, 2017 at 5:57 PM, Robert Haas  wrote:
> My experience is the opposite.  If I Google "from another database",
> including the quotes, I get 516,000 hits; if I do the same using "of
> another database", I get 110,000 hits.  I think that means the former
> usage is more popular, although obviously to some degree it's a matter
> of taste.

-- Open.

>
> + *database, tablespace, filenode, forknum, blocknum in
>
> The file doesn't contain the spaces you added here, which is probably
> why Mithun did it as he did, but actually we don't really need this
> level of detail in the file header comment anyway.  I think you could
> drop the specific mention of how blocks are identified.

-- Fixed. It has been removed now.

> + * GUC variable to decide if autoprewarm worker has to be started when
>
> if->whether? has to be->should be?

> Actually this patch uses "if" in various places where I would use
> "whether", but that's probably a matter of taste to some extent.

-- Fixed. I have changed "if" to "whether".

>
> - * Start of prewarm per-database worker. This will try to load blocks of one
> - * database starting from block info position state->prewarm_start_idx to
> - * state->prewarm_stop_idx.
> + * Connect to the database and load the blocks of that database starting from
> + * the position state->prewarm_start_idx to state->prewarm_stop_idx.
>
> That's a really good rephrasing.  It's more compact and more
> imperative.  The only thing that seems a little off is that you say
> "starting from" and then mention both the start and stop indexes.
> Maybe say "between" instead.

-- It is a half-open interval, so I rewrote it using the [start, stop) notation.

>
> - * Quit if we've reached records for another database. Unless the
> - * previous blocks were of global objects which were combined with
> - * next database's block infos.
> + * Quit if we've reached records for another database. If previous
> + * blocks are of some global objects, then continue pre-warming.
>
> Good.
>
> - * When we reach a new relation, close the old one.  Note, however,
> - * that the previous try_relation_open may have failed, in which case
> - * rel will be NULL.
> + * As soon as we encounter a block of a new relation, close the old
> + * relation. Note, that rel will be NULL if try_relation_open failed
> + * previously, in that case there is nothing to close.
>
> I wrote the original comment here, so you may not be too surprised to
> here that I liked it as it was.  Actually, your rewrite of the first
> sentence seems like it might be better, but I think the second one is
> not.  By deleting "however", you've made the comma after "Note"
> redundant, and by changing "in which case" to "in that case", you've
> made a dependent clause into a comma splice.  I hate comma splices.
>
> https://en.wikipedia.org/wiki/Comma_splice

-- Open

> + * $PGDATA/AUTOPREWARM_FILE to a dsm. Further, these BlockInfoRecords are
> I would probably capitalize DSM here.  There's no data structure
> called dsm (lower-case) and we have other examples where it is
> capitalized in documentation and comment text (see, e.g.
> custom-scan.sgml).

-- Fixed.

>
> + * Since there could be at most one worker for prewarm, locking is not
>
> could -> can?

-- Fixed.

>
> - * For block info of a global object whose database will be 0
> - * try to combine them with next non-zero database's block
> - * infos to load.
> + * For the BlockRecordInfos of a global object, combine them
> + * with BlockRecordInfos of the next non-global object.
>
> Good.  Or even "Combine BlockRecordInfos for a global object with the
> next non-global object", which I think is even more compact.

-- Fixed.

>
> + * If we are here with database having InvalidOid, then only
> + * BlockRecordInfos belonging to global objects exist. Since, we can
> + * not connect with InvalidOid skip prewarming for these objects.
>
> If we reach this point with current_db == InvalidOid, ...

--Fixed.

>
> + *Sub-routine to periodically call dump_now().
>
> Every existing use of the word subroutine in our code base spells it
> with no hyphen.
>
> [rhaas pgsql]$ git grep -- Subroutine | wc -l
>   47
> [rhaas pgsql]$ git grep -- Sub-routine | wc -l
>0
>
> Personally, I find this change worse, because calling something a
> subroutine is totally generic, like saying that you met a "person" or
> that something was "nice".  Calling it a loop gives at least a little
> bit of specific information.

-- Fixed. We call it a loop now.

>
> + * Call dum_now() at regular intervals defined by GUC variable 

Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-06-08 Thread Mithun Cy
On Tue, Jun 6, 2017 at 3:48 PM, Neha Sharma
 wrote:
> Hi,
>
> I have been testing this feature for a while and below are the observations
> for few scenarios.
>
> Observation:
> scenario 1: If we set pg_prewarm.dump_interval = -1.0, we get an additional
> warning message in the logfile, and instead of the auto-dump task ending,
> it executes successfully.
> [centos@test-machine bin]$ more logfile
> 2017-06-06 08:39:53.127 GMT [21905] WARNING:  invalid value for parameter
> "pg_prewarm.dump_interval": "-1.0"
> 2017-06-06 08:39:53.127 GMT [21905] HINT:  Valid units for this parameter
> are "ms", "s", "min", "h", and "d".
>
> postgres=# show pg_prewarm.dump_interval;
>  pg_prewarm.dump_interval
> --------------------------
>  5min
> (1 row)
>
> scenario 2: If we set pg_prewarm.dump_interval = 0.0, we get an additional
> warning message in the logfile; the task is started and the worker thread
> is active, but the dump_interval duration is set to the default 5 min
> (300 sec) instead of 0.
>
> [centos@test-machine bin]$ ps -ef | grep prewarm
> centos   21980 21973  0 08:54 ?00:00:00 postgres: bgworker:
> autoprewarm
>
> [centos@test-machine bin]$ more logfile
> 2017-06-06 09:20:52.436 GMT [3] WARNING:  invalid value for parameter
> "pg_prewarm.dump_interval": "0.0"
> 2017-06-06 09:20:52.436 GMT [3] HINT:  Valid units for this parameter
> are "ms", "s", "min", "h", and "d".
> postgres=# show pg_prewarm.dump_interval;
>  pg_prewarm.dump_interval
> --------------------------
>  5min
> (1 row)

dump_interval is an integer custom GUC variable, so passing a floating
point value is illegal here. The reason we only throw a warning and set it
to the default of 5 mins is the existing behavior:

@define_custom_variable

/*
 * Assign the string value(s) stored in the placeholder to the real
 * variable.  Essentially, we need to duplicate all the active and stacked
 * values, but with appropriate validation and datatype adjustment.
 *
 * If an assignment fails, we report a WARNING and keep going.  We don't
 * want to throw ERROR for bad values, because it'd bollix the add-on
 * module that's presumably halfway through getting loaded.  In such cases
 * the default or previous state will become active instead.
 */

/* First, apply the reset value if any */
if (pHolder->reset_val)
(void) set_config_option(name, pHolder->reset_val,
 pHolder->gen.reset_scontext,
 pHolder->gen.reset_source,
 GUC_ACTION_SET, true, WARNING, false);

I think this should be fine, as it is the defined behavior for all
add-on modules. Please let me know if you have any suggestions.
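
For reference, here is a rough sketch of how an integer GUC like this is
typically declared in a module's _PG_init(); the bounds, defaults and
descriptions below are only illustrative, not necessarily what the patch
uses:

#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"
#include <limits.h>

PG_MODULE_MAGIC;

static int  dump_interval = 300;    /* seconds */

void
_PG_init(void)
{
    DefineCustomIntVariable("pg_prewarm.dump_interval",
                            "Sets the interval between automatic buffer dumps.",
                            NULL,
                            &dump_interval,
                            300,                /* boot value: 5 min */
                            -1,                 /* illustrative minimum */
                            INT_MAX / 1000,     /* illustrative maximum */
                            PGC_SIGHUP,
                            GUC_UNIT_S,         /* value is in seconds */
                            NULL, NULL, NULL);
}

Because the GUC is an integer, strings like "-1.0" or "0.0" fail to parse,
which is where the WARNING above comes from.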




[HACKERS] Transition tables vs ON CONFLICT

2017-06-08 Thread Thomas Munro
[Moving this to its own thread, for earlier discussion see the
transition-tables-vs-wCTE thread[1].]

On Wed, Jun 7, 2017 at 10:47 AM, Peter Geoghegan  wrote:
> I suppose you'll need two tuplestores for the ON CONFLICT DO UPDATE
> case -- one for updated tuples, and the other for inserted tuples.

On Fri, Jun 9, 2017 at 11:28 AM, Thomas Munro
 wrote:
> I discussed this off-list with Andrew Gierth and we came up with a
> fourth way:  Use separate insert and update tuplestores (as originally
> suggested by Peter) and use the trigger event (INSERT, UPDATE) to
> decide which one a trigger should see, as described in option 2 above,
> but disallow INSERT OR UPDATE triggers with transition tables so that
> we don't have to choose any of the surprising behaviours described
> above.  Triggers with multiple trigger events are a PostgreSQL
> extension, so by not allowing them with transition tables we don't
> reduce our compliance.  If you want to be invoked twice when you run
> ON CONFLICT statements (like option 3 above) then you'll need to
> create two triggers, one for INSERT and the other for UPDATE, and each
> will see only the transition tuples resulting from inserts or updates
> respectively.
>
> The door is still open for us to allow INSERT OR UPDATE with
> transition tables in future releases if someone can figure out what
> that should do.

Here is a patch implementing the above.  It should be applied on top
of transition-tuples-from-wctes-v2.patch[2].

This is patch 3 of a stack of 3 patches addressing currently known
problems with transition tables.

[1] 
https://www.postgresql.org/message-id/CAEepm%3D3HZY%2B2Vr5P3pvVYfKLrwhPWT6vGLtBOeCh6K5Cwb8L7w%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAEepm%3D2ZQ%2BmujsvWXhOqaNxpc2-0hDev6q7a%2BXrbOn2%3Dcr7%3D0A%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


transition-tuples-from-on-conflict-v1.patch
Description: Binary data



Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Thomas Munro
On Wed, Jun 7, 2017 at 5:36 PM, Thomas Munro
 wrote:
> I spent a couple of hours drafting a proof-of-concept to see if my
> hunch was right.  It seems to work correctly so far and isn't huge
> (but certainly needs more testing and work):
>
>  6 files changed, 156 insertions(+), 109 deletions(-)

[Adding Andrew Gierth]

Here is a new version of the patch to fix transition tables with
wCTEs, rebased on top of
transition-tuples-from-child-tables-v10.patch[1] which must be applied
first.

This is patch 2 of a stack of 3 patches addressing currently known
problems with transition tables.

[1] 
https://www.postgresql.org/message-id/CAEepm%3D1Ei_0yN%2BvKTHHsTYdajaY59LBMUunxmpfhBU-eQQzqxA%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


transition-tuples-from-wctes-v2.patch
Description: Binary data



Re: [HACKERS] transition table behavior with inheritance appears broken (was: Declarative partitioning - another take)

2017-06-08 Thread Thomas Munro
On Mon, May 22, 2017 at 5:51 PM, Amit Langote
 wrote:
> On 2017/05/20 9:01, Thomas Munro wrote:
>> Sent too soon.  Several variables should also be renamed to make clear
>> they refer to the transition capture state in effect, instead of vague
>> names like 'transitions'.  Sorry for the version churn.
>
> Ah, I was kind of getting distracted by it earlier too; thanks.
>
> Anyway, the patch looks good to me.

[Adding Andrew Gierth]

Here is a rebased version of the patch to fix transition tables with
inheritance.  Fixes a typo in an error message ("not support on
partitions" -> "... supported ..."), and changes regression test
triggers to be single-event (only one of INSERT, UPDATE or DELETE),
because a later patch will not allow multi-event triggers with TTs.

This is patch 1 of a stack of 3 patches addressing currently known
problems with transition tables.

-- 
Thomas Munro
http://www.enterprisedb.com


transition-tuples-from-child-tables-v10.patch
Description: Binary data



Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity

2017-06-08 Thread Tom Lane
Andres Freund  writes:
> On 2017-06-08 23:05:53 -0400, Tom Lane wrote:
>> ... The first
>> attached patch does it that way, and it seems nice and clean, but I ran
>> into a complete dead end while trying to extend it to handle related cases
>> such as disallowing SRF-inside-aggregate or nested SRFs in FROM.

> But, do we really need to handle those?  IOW, isn't handling
> CASE/COALESCE, which we can recognize early in parse analysis,
> sufficient to error out here?  It'd be a lot nicer if we could error
> about SRF arguments to aggregates et al. during analysis rather than
> execution, but there's not a comparably huge need to improve there.

Yes, we already have guards for those cases, but they return fairly opaque
error messages to the tune of "set-valued function called in context that
cannot accept a set", because the executor hasn't enough context to do
better.  I'd like the messages to be more specific, like "set-valued
function cannot appear within CASE" and so on.  Anyway the point here is
to evaluate different approaches to changing the parser on the basis of
whether they *could* be extended to handle related cases.  Whether we
actually do that right now is less important.

regards, tom lane




Re: [HACKERS] UPDATE of partition key

2017-06-08 Thread Amit Khandekar
On 7 June 2017 at 20:19, Amit Khandekar  wrote:
> On 7 June 2017 at 16:42, Amit Khandekar  wrote:
>> The column bitmap set returned by GetUpdatedColumns() refers to
>> attribute numbers w.r.t. the root partition, while the
>> mtstate->resultRelInfo[] entries have attnos w.r.t. the leaf partitions.
>> So we need to do something similar to map_partition_varattnos() to map
>> the updated columns' attnos to the leaf partitions
>
> I was wrong about this. Each of the mtstate->resultRelInfo[] entries has
> its own corresponding RangeTblEntry with its own updatedCols, whose attnos
> are already adjusted to refer to its own table's attributes. So we don't
> have to do the mapping; we just need to get the updated columns separately
> for each ResultRelInfo, rather than from the root relinfo.
>
>> and walk down the
>> partition constraint expressions to find if the attnos are present
>> there.
>
> But this we will need to do.

Attached is the v9 patch. It covers the two parts discussed upthread:
1. Prevent triggers from causing the row movement.
2. Set up the tuple routing in ExecInitModifyTable(), but only if a
partition key is modified. See the new function IsPartitionKeyUpdate().

I have rebased the patch over commit 15ce775faa428dc9, which prevents
triggers from violating partition constraints. With that change, for the
call to ExecFindPartition() in ExecInsert we need to fetch
mtstate->rootResultRelInfo when the operation is part of update row
movement, because the root partition is not available in the
resultRelInfo for UPDATE.

I have added many more test scenarios to update.sql that cover the above.

I am yet to test the concurrency part using the isolation tester.
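
For anyone new to the thread, here is a minimal illustration of the row
movement the patch enables; the table names are made up and are not the
ones used in update.sql:

CREATE TABLE measurement (city_id int, logdate date) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_h1 PARTITION OF measurement
    FOR VALUES FROM ('2017-01-01') TO ('2017-07-01');
CREATE TABLE measurement_h2 PARTITION OF measurement
    FOR VALUES FROM ('2017-07-01') TO ('2018-01-01');

INSERT INTO measurement VALUES (1, '2017-03-01');

-- The UPDATE changes the partition key so that the row no longer fits
-- measurement_h1; with the patch, the row is moved into measurement_h2
-- instead of the statement failing with a partition constraint violation.
UPDATE measurement SET logdate = '2017-08-01' WHERE city_id = 1;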


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


update-partition-key_v9.patch
Description: Binary data



Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity

2017-06-08 Thread Andres Freund
Hi,

On 2017-06-08 23:05:53 -0400, Tom Lane wrote:
> I spent some time experimenting with this, and immediately found out
> that information_schema.user_mapping_options contains an instance of the
> problematic usage :-(.  However, that view also turns out to be a poster
> child for why our old SRF semantics are a bad idea, because it's on the
> hairy edge of failing completely. [...]

Yuck.


> Anyway, to the subject at hand.  My thought when I wrote the previous
> message was to implement a "top down" mechanism whereby contexts like
> CASE and COALESCE would record their presence in the ParseState while
> recursively analyzing their arguments, and then SRF calls would be
> responsible for throwing an error by inspecting that context.  The first
> attached patch does it that way, and it seems nice and clean, but I ran
> into a complete dead end while trying to extend it to handle related cases
> such as disallowing SRF-inside-aggregate or nested SRFs in FROM.

But, do we really need to handle those?  IOW, isn't handling
CASE/COALESCE, which we can recognize early in parse analysis,
sufficient to error out here?  It'd be a lot nicer if we could error
about SRF arguments to aggregates et al. during analysis rather than
execution, but there's not a comparably huge need to improve there.


> I'm inclined to go with the "bottom up" approach, but I wonder if anyone
> has any comments or better ideas?

I'll have a look at the patches tomorrow, but I'm too tired to
meaningfully read code tonight.

- Andres




Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity

2017-06-08 Thread Tom Lane
I wrote:
> As to *how* to throw an error, I think it should be possible to teach
> parse analysis to detect such cases, with something like the
> ParseExprKind mechanism that could be checked to see if we're inside
> a subexpression that restricts what's allowed.  There are some other
> checks like no-nested-aggregates that perhaps could be folded in as
> well.

I spent some time experimenting with this, and immediately found out
that information_schema.user_mapping_options contains an instance of the
problematic usage :-(.  However, that view also turns out to be a poster
child for why our old SRF semantics are a bad idea, because it's on the
hairy edge of failing completely.  It's got two SRF invocations in its
tlist, which it relies on to execute in lockstep and produce the same
number of rows --- but then it puts a CASE around one of them, with an
ELSE NULL.  So in old versions, that only works because if the CASE
doesn't return a set result at runtime then we just run the tlist the
number of times suggested by the other SRF.  And in HEAD, it only works
because we run the two SRFs in lockstep behind the scenes and then the
CASE is discarding individual results not the whole execution of the
second SRF.  If you don't have a headache at this point, re-read the above
until you do.  Or if you fully understand that and still think it's fine,
consider what will happen if the CASE's test expression generates
different results from one call to the next --- which I think is actually
possible here, because pg_has_role() operates on CatalogSnapshot time and
might possibly change its mind partway through the query.  Pre-v10, that
would have generated completely messed-up output, with a hard-to-predict
number of rows and unexpected combinations of the two SRFs' outputs.

Rewriting this view to use a LATERAL SRF call is just enormously cleaner.

Anyway, to the subject at hand.  My thought when I wrote the previous
message was to implement a "top down" mechanism whereby contexts like
CASE and COALESCE would record their presence in the ParseState while
recursively analyzing their arguments, and then SRF calls would be
responsible for throwing an error by inspecting that context.  The first
attached patch does it that way, and it seems nice and clean, but I ran
into a complete dead end while trying to extend it to handle related cases
such as disallowing SRF-inside-aggregate or nested SRFs in FROM.  The
trouble is that when we are parsing the arguments of a SRF or aggregate,
we don't actually know yet whether it's a SRF or aggregate or plain
function.  We can't find that out until we look up the pg_proc entry,
which we can't do without knowing the argument types, so we have to do
parse analysis of the arguments first.

I then considered a "bottom up" approach where the idea is for each SRF
to report its presence in the ParseState, and then the outer construct
complains if any SRF reported its presence within the outer construct's
arguments.  This isn't hard to implement, just keep a "p_last_srf" pointer
in the ParseState, as in the second attached patch.  This feels more ugly
than the first approach; for one thing, if we want to extend it to
multiple cases of "X cannot contain a Y" then we need a ParseState field
for each kind of Y we care about.  Also, it will tend to complain about
the last Y within a given X construct, not the first one; but only a true
geek would ever even notice that, let alone feel that it's strange.

I didn't actually fix the "bottom up" approach to complain about nested
SRFs.  It's clear to me how to do so, but because we analyze function
arguments before calling ParseFuncOrColumn, we'd have to have the callers
of that function remember the appropriate p_last_srf value (ie, from just
before they analyze the arguments) and then pass it to ParseFuncOrColumn.
That would add a bunch of uninteresting clutter to the patch so I didn't
do it here.

I'm inclined to go with the "bottom up" approach, but I wonder if anyone
has any comments or better ideas?

regards, tom lane

diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index cbcd6cf..98bcfa0 100644
*** a/src/backend/catalog/information_schema.sql
--- b/src/backend/catalog/information_schema.sql
*** CREATE VIEW user_mapping_options AS
*** 2936,2947 
  SELECT authorization_identifier,
 foreign_server_catalog,
 foreign_server_name,
!CAST((pg_options_to_table(um.umoptions)).option_name AS sql_identifier) AS option_name,
 CAST(CASE WHEN (umuser <> 0 AND authorization_identifier = current_user)
 OR (umuser = 0 AND pg_has_role(srvowner, 'USAGE'))
!OR (SELECT rolsuper FROM pg_authid WHERE rolname = current_user) THEN (pg_options_to_table(um.umoptions)).option_value
   ELSE NULL END AS character_data) AS option_value
! FROM _pg_user_mappings um;

Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-08 Thread Mengxing Liu
Thank you very much! I followed your advice and here is the result.

SerializableXactHashLock 73
predicate_lock_manager 605
WALWriteLock 3
SerializableFinishedListLock 665

There were more than 90 events each time.
SerializableXactHashLock and SerializableFinishedListLock are both used in SSI.
I think that's why PG is so slow in a high-contention environment.
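
For reference, counts like these can be tallied with a query along the
following lines, assuming the sampled pg_stat_activity output has been
loaded into a table (the table name below is made up):

SELECT wait_event, count(*)
FROM wait_samples              -- hypothetical table holding the \watch output
WHERE wait_event IS NOT NULL
GROUP BY wait_event
ORDER BY count(*) DESC;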


> -Original Messages-
> From: "Robert Haas" 
> Sent Time: 2017-06-08 01:30:58 (Thursday)
> To: "Mengxing Liu" 
> Cc: kgrittn , "Alvaro Herrera" , 
> "pgsql-hackers@postgresql.org" 
> Subject: Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from 
> rw-conflict tracking in serializable transactions
> 
> On Tue, Jun 6, 2017 at 12:16 PM, Mengxing Liu
>  wrote:
> > I think disk I/O is not the bottleneck in our experiment, but the global 
> > lock is.
> 
> A handy way to figure this kind of thing out is to run a query like
> this repeatedly during the benchmark:
> 
> SELECT wait_event_type, wait_event FROM pg_stat_activity;
> 
> I often do this by using psql's \watch command, often \watch 0.5 to
> run it every half-second.  I save all the results collected during the
> benchmark using 'script' and then analyze them to see which wait
> events are most frequent.  If your theory is right, you ought to see
> that SerializableXactHashLock occurs as a wait event very frequently.
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


--
Mengxing Liu


Re: [HACKERS] walsender termination error messages worse in v10

2017-06-08 Thread Alvaro Herrera
Petr Jelinek wrote:

> It took me a while to understand why walreceiver does this originally,
> but it did make sense to me once I understood the [...], so I did it
> the same way.

In other words, the coding pattern needs a comment in both places.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Fix comment in shm_mq.c

2017-06-08 Thread Masahiko Sawada
Hi,

Attached patch for $subject.

/*
 * Set sender's latch, unless queue is detached.
 */
static shm_mq_result
shm_mq_notify_receiver(volatile shm_mq *mq)

I think shm_mq_notify_receiver() actually sets the receiver's latch, not
the sender's latch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_comment_in_shm_mq_c.patch
Description: Binary data



Re: [HACKERS] logical replication busy-waiting on a lock

2017-06-08 Thread Petr Jelinek
On 09/06/17 01:06, Andres Freund wrote:
> Hi,
> 
> On 2017-06-03 04:50:00 +0200, Petr Jelinek wrote:
>> One thing I don't like is the GetLastLocalTransactionId() that I had to
>> add because we clear the proc->lxid before we get to AtEOXact_Snapshot()
>> but I don't have better solution there.
> 
> I dislike that quite a bit - how about we instead just change the
> information we keep in exportedSnapshots?  I.e. store a pointer to an
> struct ExportedSnapshot
> {
> char *exportedAs;
> Snapshot snapshot;
> }
> 
> Then we can get rid of that relatively ugly list_length() business too.
> 

That sounds reasonable, I'll do that.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] walsender termination error messages worse in v10

2017-06-08 Thread Petr Jelinek
On 08/06/17 23:57, Andres Freund wrote:
> On 2017-06-03 00:55:22 +0200, Petr Jelinek wrote:
>> On 02/06/17 23:45, Andres Freund wrote:
>>> Hi Petr,
>>>
>>> On 2017-06-02 22:57:37 +0200, Petr Jelinek wrote:
 On 02/06/17 20:51, Andres Freund wrote:
> I don't understand why the new block is there, nor does the commit
> message explain it.
>

 Hmm, that particular change can actually be reverted. It was needed for
 one of those custom replication commands which were replaced by normal
 query support. I have missed it during the rewrite.
>>>
>>> Doesn't appear to be quite that simple, I get regression test failures
>>> in that case.
>>>
>>
>> Hmm, looks like we still use it for normal COPY handling. So basically
>> the problem is that if we run COPY TO STDOUT and then consume it using
>> libpqrcv_receive, it will end with a normal PGRES_COMMAND_OK, but we
>> need to call PQgetResult() in that case, otherwise libpq thinks the
>> command is still active and any following command will fail; however, if
>> we call PQgetResult() on a dead connection we get the error you
>> complained about.
> 
> Should this possibly be handled at the caller level?  This is a bit too
> much magic for my taste.

It is, I agree with that. I guess the main issue with handling it in the
callers is that they can't access the libpq API, so there would need to be
some additional abstraction (we did some of the necessary parts of that
for libpqrcv_exec).

> 
> Looking at the callers, the new code isn't super-obvious either:
> 
> len = walrcv_receive(wrconn, &buf, &fd);
> 
> if (len != 0)
> {
> /* Process the data */
> for (;;)
> {
> CHECK_FOR_INTERRUPTS();
> 
> if (len == 0)
> {
> break;
> }
> else if (len < 0)
> {
> ereport(LOG,
> (errmsg("data stream from publisher has ended")));
> endofstream = true;
> break;
> }
> 
> The len < 0, hidden inside a len != 0, which in the loop again checks if
> len == 0 (because it's decremented in the loop)?  And there are two
> different
>   len = walrcv_receive(wrconn, &buf, &fd);
> statements?
> 

The logic there is that you have one main loop and one busy loop. While
data keep flowing, the work happens as the data are processed, and we
don't need to do the bookkeeping of timeouts, latching, and other
maintenance tasks (like checking table synchronization in logical
replication), because it all happens naturally; that's what the inner
loop is for. But when the upstream goes idle and does not send anything
(or there is a network issue, etc.), we do need to do these things
(otherwise we'd miss timeouts, the table synchronization worker would
wait forever on apply, etc.), and that's what the outer loop is for.

We only enter the inner loop if there was some data; that's why the first
len != 0 check is there. The inner loop then bases its decisions on the
return code of walrcv_receive(), so the check is repeated inside (it also
keeps calling walrcv_receive() as long as there is data, so it needs to
check the result of those calls too).
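
Compressed into a sketch (not the actual code; error handling and the
maintenance work are elided), the intended shape is roughly:

for (;;)
{
    /* outer, "idle" loop: timeouts, latch waits, maintenance tasks */
    len = walrcv_receive(wrconn, &buf, &fd);

    if (len != 0)
    {
        /* inner, "busy" loop: keep going while data keeps arriving */
        for (;;)
        {
            if (len == 0)
                break;          /* drained; fall back to the idle loop */
            else if (len < 0)
            {
                /* data stream from the publisher has ended */
                break;
            }

            /* ... process the message in buf ... */

            len = walrcv_receive(wrconn, &buf, &fd);
        }
    }

    /* ... send feedback, check timeouts, wait on the latch ... */
}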

It took me a while to understand why walreceiver does this originally,
but it did make sense to me once I understood the split between the main
(idle) loop and busy loop, so I did it the same way.

>> I guess it would make sense to do conditional exit on
>> (PQstatus(streamConn) == CONNECTION_BAD) like libpqrcv_PQexec does. It's
>> quite ugly code-wise though.
> 
> I think that's fine for now.  It'd imo be a good idea to improve matters
> here a bit, but for now I've just applied your patch.
> 

Okay, thanks.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Thomas Munro
On Thu, Jun 8, 2017 at 11:50 AM, Thomas Munro
 wrote:
> On Thu, Jun 8, 2017 at 11:05 AM, Thomas Munro
>  wrote:
>> 1.  Keep the current behaviour. [...]
>>
>> 2.  Make a code change that would split the 'new table' tuplestore in
>> two: an insert tuplestore and an update tuplestore (for new images;
>> old images could remain in the old tuplestore that is also used for
>> deletes) as Peter suggests.  That raises two questions for me: (1)
>> Does it really make sense for the 'when' clause, which sounds like it
>> only controls when we fire a trigger, also to affect which transition
>> tuples it sees?  There is something slightly fishy about that.  (2)
>> Assuming yes, should an AFTER INSERT OR UPDATE trigger see the union
>> of the the two tuplestores?  Trigger authors would need to be aware a
>> single statement can produce a mixture of updates and inserts, but
>> only if they explicitly invite such problems into their lives with
>> that incantation.
>
> A third option would be for an AFTER INSERT OR UPDATE trigger to be
> invoked twice, once for the inserts and once again for the updates.
> No union required, but also surprising.
>
> Any other ideas?

I discussed this off-list with Andrew Gierth and we came up with a
fourth way:  Use separate insert and update tuplestores (as originally
> suggested by Peter) and use the trigger event (INSERT, UPDATE) to
decide which one a trigger should see, as described in option 2 above,
but disallow INSERT OR UPDATE triggers with transition tables so that
we don't have to choose any of the surprising behaviours described
> above.  Triggers with multiple trigger events are a PostgreSQL
extension, so by not allowing them with transition tables we don't
reduce our compliance.  If you want to be invoked twice when you run
ON CONFLICT statements (like option 3 above) then you'll need to
create two triggers, one for INSERT and the other for UPDATE, and each
will see only the transition tuples resulting from inserts or updates
respectively.

The door is still open for us to allow INSERT OR UPDATE with
transition tables in future releases if someone can figure out what
that should do.

I will draft a patch for this.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] logical replication busy-waiting on a lock

2017-06-08 Thread Andres Freund
Hi,

On 2017-06-03 04:50:00 +0200, Petr Jelinek wrote:
> One thing I don't like is the GetLastLocalTransactionId() that I had to
> add because we clear the proc->lxid before we get to AtEOXact_Snapshot()
> but I don't have better solution there.

I dislike that quite a bit - how about we instead just change the
information we keep in exportedSnapshots?  I.e. store a pointer to an
struct ExportedSnapshot
{
char *exportedAs;
Snapshot snapshot;
}

Then we can get rid of that relatively ugly list_length() business too.

- Andres




Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-06-08 Thread Andres Freund
On 2017-06-07 11:51:12 +0200, Petr Jelinek wrote:
> On 07/06/17 03:00, Andres Freund wrote:
> > On 2017-06-06 19:36:13 +0200, Petr Jelinek wrote:
> > 
> >> As a side note, we are starting to have several IsSomeTypeOfProcess
> >> functions for these kinds of things. I wonder if the bgworker
> >> infrastructure should somehow provide this type of stuff (the proposed
> >> bgw_type might help there), as well as maybe the ability to register an
> >> interrupt handler for the worker (it's really hard to do via a custom
> >> SIGTERM handler, as is visible from the worker, launcher and walsender
> >> issues we are fixing). Obviously this is a PG11-related thought.
> > 
> > Maybe it's also a sign that the bgworker infrastructure isn't really the
> > best thing to use for internal processes like parallel workers and lrep
> > processes - it's really built so core code *doesn't* know anything
> > specific about them, which isn't really what we want in some of these
> > cases.  That's not to say that bgworkers and parallelism/lrep shouldn't
> > share infrastructure, don't get me wrong.
> >
> 
> Well, the nice thing about bgworkers is that they provide the basic
> infrastructure.

Right.


> The main reason the lrep stuff is using it is that I didn't want to add
> another special backend kind to 20 different places, but it turns out it
> still needs to be in some. So either we need to add more support for
> these kinds of things to bgworkers, or write something for internal
> workers that shares the infrastructure with bgworkers, as you say.

Yea, I think we should radically unify a lot of the related
infrastructure between all processes.  We've grown a lot of duplication.


> >> @@ -2848,6 +2849,14 @@ ProcessInterrupts(void)
> >>ereport(FATAL,
> >>(errcode(ERRCODE_ADMIN_SHUTDOWN),
> >> errmsg("terminating logical 
> >> replication worker due to administrator command")));
> >> +  else if (IsLogicalLauncher())
> >> +  {
> >> +  ereport(DEBUG1,
> >> +  (errmsg("logical replication launcher 
> >> shutting down")));
> >> +
> >> +  /* The logical replication launcher can be stopped at 
> >> any time. */
> >> +  proc_exit(0);
> >> +  }
> > 
> > We could use PgBackendStatus->st_backendType for these, merging these
> > different paths.
> > 
> 
> Hmm, that's not that easy; st_backendType will be the generic bgworker
> type since the bgw_type patch hasn't landed yet (it is discussed in
> yet another thread [1]). That seems like an argument for going forward
> with it (the bgw_type patch) in PG10.

Yea.  I left it as is for now.


> I don't mind; it has some overlap with your proposed fixes for latching,
> so if you are willing, go ahead.

Done.

Greetings,

Andres Freund




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Robert Haas
On Tue, Jun 6, 2017 at 8:19 PM, Peter Geoghegan  wrote:
> On Tue, Jun 6, 2017 at 5:01 PM, Peter Geoghegan  wrote:
>> Also, ISTM that the code within ENRMetadataGetTupDesc() probably
>> requires more explanation, resource management wise.

Why so?  I'm not saying you're wrong, but I don't see what's confusing
about the status quo.  I may be missing something important.

> Also, it's not clear why it should be okay that the new type of
> ephemeral RTEs introduced don't have permissions checks. There are
> currently cases where the user cannot see data that they inserted
> themselves (e.g., through RETURNING), on the theory that a before row
> trigger might have modified the final contents of the tuple in a way
> that the original inserter isn't supposed to know details about.

Right, but the trigger has to have been created by someone who has
TRIGGER permission on the table, and such an individual can trivially
steal the entire contents of every table row subsequently touched by
any DML statement by installing a FOR-EACH-ROW trigger that records
the contents of the tuple someplace.  So the fact that they can now
also do that by installing a FOR-EACH-STATEMENT trigger doesn't seem
to be expanding the vulnerability surface.  If anything, the new thing
gives trigger-creators less power than they have already, since the
tuplestore only lets you see what tuples were modified, whereas a
before-row trigger could block or alter the modifications.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-06-08 Thread Peter Geoghegan
On Wed, Jun 7, 2017 at 12:34 PM, Alex K  wrote:
> (1) One of my mentors--Alvaro Herrera--suggested that I have a look at
> UPSERT.

> It may be a good point to be able to achieve the same functionality
> as with ON CONFLICT DO NOTHING, when COPY actually inserts tuples
> and error handling is turned on. It could additionally reduce the number
> of failed subtransactions and reduce XID consumption, while still
> ignoring some common errors like unique index violations.

Alvaro and I talked about this informally at PGCon.

> Adding full support for ON CONFLICT DO NOTHING/UPDATE to COPY seems
> to be a large separate task and is out of the current project scope, but
> maybe there is a relatively simple way to internally perform the tuple
> inserts with ON CONFLICT DO NOTHING? I have added Peter Geoghegan to cc,
> as I understand he is the major contributor of UPSERT in PostgreSQL. It
> would be great if he could answer this question.

I think that there is a way of making COPY use "speculative
insertion", so that it behaves the same as ON CONFLICT DO NOTHING with
no inference specification. Whether or not this is useful depends on a
lot of things.
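
For comparison, here is a minimal SQL-level sketch of the behavior being
described; the table names are made up:

-- The net effect COPY could aim for: rows that conflict with an existing
-- unique index entry are silently skipped instead of raising an error.
INSERT INTO target_table
SELECT * FROM staging_table
ON CONFLICT DO NOTHING;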

You seem to be talking about doing this as an optimization on top of a
base feature that does the main thing you want (captures all errors
within an implementation level subxact without passing them to the
client). That could make sense, as a way of preventing extreme bloat
for a very bad case where almost all inserts have conflicts. (This
seems quite possible, whereas it seems much less likely that users
would have an input file simply full of ill-formed tuples.)

I think that you need to more formally identify what errors your new
COPY error handling will need to swallow. I'm not sure if it's
possible to avoid using subtransactions altogether, but speculative
insertion would help if you find that you can do it without
subtransactions. Using subtransactions is always going to be a bit
ugly, because you'll need to continually reassess whether or not
you're batching insertions together at the right granularity (that is,
that you've weighed the rate of XID consumption against how much work
you lose when a batched transaction has to be "replayed" to include
things that are known to be valid). And, if you care about duplicate
violations, then you can't really be sure that replaying a "known
good" tuple will stay good from one moment to the next.

My advice right now is: see if you can figure out a way of doing what
you want without subtransactions at all, possibly by cutting some
scope. For example, maybe it would be satisfactory to have the
implementation just ignore constraint violations, but still raise
errors for invalid input for types. Is there really much value in
ignoring errors due to invalid encoding? It's not as if such problems
can be reliably detected today. If you use the wrong encoding, and
ignore some errors that COPY would generally raise, then there is an
excellent chance that you'll still insert some remaining rows with
text that has been incorrectly interpreted as valid in the database
encoding -- some text datums are bound to accidentally appear valid.
There are probably similar issues with other types. It's not clear
what the point is at which the user is no longer helped by ignoring
problems, because we cannot reliably detect *all* problems at the
level of each row.

If you must ignore errors within the input functions of types, then
maybe you can optionally let the user do that by way of a "dry run",
where the entire input file is examined for basic structural soundness
ahead of considering constraints. Any errors are saved then and there,
in a format that can be used to make sure that those entries are
skipped on a later COPY. As a further enhancement, in the future, the
user might then be able to define special transform functions that
correct the errors for those rows only. You kind of need to see all
the faulty rows together to do something like that, so a dry run could
make a lot of sense.

-- 
Peter Geoghegan




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Robert Haas
On Thu, Jun 8, 2017 at 5:01 PM, Andrew Gierth
 wrote:
>> "Robert" == Robert Haas  writes:
>  Robert> unless some other committer volunteers.  (Of course, anyone
>  Robert> could step in to do the work, as Thomas already has to a
>  Robert> considerable degree, but without a committer involved it
>  Robert> doesn't fix the problem.)
>
> I can probably take this on if needed.

I would certainly not complain if you are willing to do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] walsender termination error messages worse in v10

2017-06-08 Thread Andres Freund
On 2017-06-03 00:55:22 +0200, Petr Jelinek wrote:
> On 02/06/17 23:45, Andres Freund wrote:
> > Hi Petr,
> > 
> > On 2017-06-02 22:57:37 +0200, Petr Jelinek wrote:
> >> On 02/06/17 20:51, Andres Freund wrote:
> >>> I don't understand why the new block is there, nor does the commit
> >>> message explain it.
> >>>
> >>
> >> Hmm, that particular change can actually be reverted. It was needed for
> >> one of those custom replication commands which were replaced by normal
> >> query support. I have missed it during the rewrite.
> > 
> > Doesn't appear to be quite that simple, I get regression test failures
> > in that case.
> > 
> 
> Hmm, looks like we still use it for normal COPY handling. So basically
> the problem is that if we run COPY TO STDOUT and then consume it using
> libpqrcv_receive, it will end with a normal PGRES_COMMAND_OK, but we
> need to call PQgetResult() in that case, otherwise libpq thinks the
> command is still active and any following command will fail; however, if
> we call PQgetResult() on a dead connection we get the error you complained
> about.

Should this possibly be handled at the caller level?  This is a bit too
much magic for my taste.

Looking at the callers, the new code isn't super-obvious either:

len = walrcv_receive(wrconn, &buf, &fd);

if (len != 0)
{
/* Process the data */
for (;;)
{
CHECK_FOR_INTERRUPTS();

if (len == 0)
{
break;
}
else if (len < 0)
{
ereport(LOG,
(errmsg("data stream from publisher has ended")));
endofstream = true;
break;
}

The len < 0, hidden inside a len != 0, which in the loop again checks if
len == 0 (because it's decremented in the loop)?  And there are two different
len = walrcv_receive(wrconn, &buf, &fd);
statements?

> I guess it would make sense to do conditional exit on
> (PQstatus(streamConn) == CONNECTION_BAD) like libpqrcv_PQexec does. It's
> quite ugly code-wise though.

I think that's fine for now.  It'd imo be a good idea to improve matters
here a bit, but for now I've just applied your patch.

- Andres




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Tom Lane
Robert Haas  writes:
> I don't really like either option, because, on the one hand, this is a
> pretty invasive thing to go rip out -- maybe we don't have to rip out
> the entire patch series, but just some of the later patches? -- and on
> the other hand, keeping it in the tree will require work, and I'm
> busy.  But there don't seem to be any other options.

I was wondering if we could simply remove the syntax (and documentation),
and leave the infrastructure in place until things are fixed.  I think
everybody wants this feature, there's just concern about stabilizing it
in time for v10.

regards, tom lane




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Thomas Munro
On Fri, Jun 9, 2017 at 8:41 AM, Andrew Gierth
 wrote:
>> "Thomas" == Thomas Munro  writes:
>
>  Thomas> So, afterTriggers.query_stack is used to handle the reentrancy
>  Thomas> that results from triggers running further statements that
>  Thomas> might fire triggers.  It isn't used for dealing with extra
>  Thomas> ModifyTable nodes that can appear in a plan because of wCTEs.
>  Thomas> Could it also be used for that purpose?  I think that would
>  Thomas> only work if it is the case that each ModifyTable node begin
>  Thomas> and then runs to completion (ie no interleaving of wCTE
>  Thomas> execution) and then its AS trigger fires, which I'm not sure
>  Thomas> about.
>
> There's a problem with this which I didn't see anyone mention (though
> I've not read the whole history); existing users of wCTEs rely on the
> fact that no AFTER triggers are run until _all_ modifications are
> completed.  If you change that, then you can't use wCTEs to insert into
> tables with FK checks without knowing the order in which the individual
> wCTEs are executed, which is currently a bit vague and non-obvious
> (which doesn't cause issues at the moment only because nothing actually
> depends on the order).
>
> for example:
> create table foo (id integer primary key);
> create table foo_bar (foo_id integer references foo, bar_id integer);
> with i1 as (insert into foo values (1))
>   insert into foo_bar values (1,2),(1,3);
>
> This would fail the FK check if each insert did its own trigger firing,
> since the foo_bar insert is actually run _first_.
>
> Firing triggers earlier than they currently are would thus break
> existing working queries.

Thanks, and I agree.  Later I came to the conclusion that we should
neither change nor depend on the order of execution, but in order to
avoid doing those things we simply couldn't keep using query_stack as
working space for the tuplestores -- the access pattern does not fit
stack semantics.  That's why I proposed having the ModifyTable nodes
own their own transition tuplestores, and passing them explicitly to
the sites where they're needed (various trigger.c routines), instead
of accessing them via that global stack.  That's the approach of the
WIP patch I posted here:

https://www.postgresql.org/message-id/CAEepm%3D1K7F08QPu%2BkGcNQ-bCcYBa3QX%3D9AeE%3Dj0doCmgqVs4Tg%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Andrew Gierth
> "Robert" == Robert Haas  writes:

 Robert> unless some other committer volunteers.  (Of course, anyone
 Robert> could step in to do the work, as Thomas already has to a
 Robert> considerable degree, but without a committer involved it
 Robert> doesn't fix the problem.)

I can probably take this on if needed.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Andrew Gierth
> "Thomas" == Thomas Munro  writes:

 Thomas> So, afterTriggers.query_stack is used to handle the reentrancy
 Thomas> that results from triggers running further statements that
 Thomas> might fire triggers.  It isn't used for dealing with extra
 Thomas> ModifyTable nodes that can appear in a plan because of wCTEs.
 Thomas> Could it also be used for that purpose?  I think that would
 Thomas> only work if it is the case that each ModifyTable node begins
 Thomas> and then runs to completion (ie no interleaving of wCTE
 Thomas> execution) and then its AS trigger fires, which I'm not sure
 Thomas> about.

There's a problem with this which I didn't see anyone mention (though
I've not read the whole history); existing users of wCTEs rely on the
fact that no AFTER triggers are run until _all_ modifications are
completed.  If you change that, then you can't use wCTEs to insert into
tables with FK checks without knowing the order in which the individual
wCTEs are executed, which is currently a bit vague and non-obvious
(which doesn't cause issues at the moment only because nothing actually
depends on the order).

for example:
create table foo (id integer primary key);
create table foo_bar (foo_id integer references foo, bar_id integer);
with i1 as (insert into foo values (1))
  insert into foo_bar values (1,2),(1,3);

This would fail the FK check if each insert did its own trigger firing,
since the foo_bar insert is actually run _first_.

Firing triggers earlier than they currently are would thus break
existing working queries.

-- 
Andrew (irc:RhodiumToad)




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Peter Geoghegan
On Thu, Jun 8, 2017 at 1:20 PM, Robert Haas  wrote:
> I am not sure that I entirely agree with you on these points, but I
> don't want to debate it further and especially not on a public mailing
> list.

Fair enough.

> I'm going to spend some time over the next ~24 hours studying
> this thread and the patches and determining whether or not I'm willing
> to take responsibility for this patch.

Thank you.

-- 
Peter Geoghegan




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Robert Haas
On Thu, Jun 8, 2017 at 2:55 PM, Peter Geoghegan  wrote:
> It does seem like Kevin could have done better there. However, I think
> that Kevin was taking responsibility for the situation, and that it
> wasn't accurate to suggest he blamed others. I thought his account of
> the constraints that he operated under was a simple factual account.
>
> I also don't think it's useful for you to discourage revert on the
> grounds that it's a slippery slope. Admitting fault doesn't need to be
> made any harder.

I am not sure that I entirely agree with you on these points, but I
don't want to debate it further and especially not on a public mailing
list.  I'm going to spend some time over the next ~24 hours studying
this thread and the patches and determining whether or not I'm willing
to take responsibility for this patch.  If I decide that I am, I will
then work on trying to fix the technical problems.  If I decide that I
am not, then I think it will be necessary to take Kevin up on his
offer to revert, unless some other committer volunteers.   (Of course,
anyone could step in to do the work, as Thomas already has to a
considerable degree, but without a committer involved it doesn't fix
the problem.)

I don't really like either option, because, on the one hand, this is a
pretty invasive thing to go rip out -- maybe we don't have to rip out
the entire patch series, but just some of the later patches? -- and on
the other hand, keeping it in the tree will require work, and I'm
busy.  But there don't seem to be any other options.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] jsonb_to_tsvector should be immutable

2017-06-08 Thread Andrew Dunstan


On 06/08/2017 03:06 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 06/08/2017 02:26 PM, Tom Lane wrote:
>>> Yeah, if the (regconfig,text) one is considered immutable, I don't see
>>> why the other two aren't.  The justification for the other three being
>>> only stable is that they depend on default_text_search_config.
>> Yes, agreed it should be done consistently with text.
> You going to fix it, or shall I?
>
>   


I'll do it.

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] jsonb_to_tsvector should be immutable

2017-06-08 Thread Tom Lane
Andrew Dunstan  writes:
> On 06/08/2017 02:26 PM, Tom Lane wrote:
>> Yeah, if the (regconfig,text) one is considered immutable, I don't see
>> why the other two aren't.  The justification for the other three being
>> only stable is that they depend on default_text_search_config.

> Yes, agreed it should be done consistently with text.

You going to fix it, or shall I?

regards, tom lane




Re: [HACKERS] jsonb_to_tsvector should be immutable

2017-06-08 Thread Andrew Dunstan


On 06/08/2017 02:26 PM, Tom Lane wrote:
> Josh Berkus  writes:
>> select proname, prosrc, proargtypes, provolatile from pg_proc where
>> proname = 'to_tsvector';
> Slightly more readable version:
>
> regression=# select oid::regprocedure, provolatile, proparallel from pg_proc 
> where proname = 'to_tsvector';
>              oid              | provolatile | proparallel 
> ------------------------------+-------------+-------------
>  to_tsvector(jsonb)           | s           | s
>  to_tsvector(regconfig,text)  | i           | s
>  to_tsvector(text)            | s           | s
>  to_tsvector(json)            | s           | s
>  to_tsvector(regconfig,jsonb) | s           | s
>  to_tsvector(regconfig,json)  | s           | s
> (6 rows)
>
>> Both of the _byid functions should be marked immutable, no?  Otherwise
>> how can users use the new functions for indexing?
> Yeah, if the (regconfig,text) one is considered immutable, I don't see
> why the other two aren't.  The justification for the other three being
> only stable is that they depend on default_text_search_config.
>
> (You could argue that none of these should be immutable because text
> search configurations are changeable, but we already decided to ignore
> that for the (regconfig,text) case.)
>
>   



Yes, agreed it should be done consistently with text.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Peter Geoghegan
On Thu, Jun 8, 2017 at 11:32 AM, Robert Haas  wrote:
> I understood the same.  However, he didn't review or commit any of the
> bug fix patches that Thomas posted in May, or even respond to the
> mailing list threads.  I eventually reviewed and committed them to
> avoid having the feature reverted; it took me only a few hours.
> Later, he said he would review the TransitionCaptureState patch Thomas
> posted at PGCon, later said again on-list that he would do so by
> Friday or anyway Monday, and it's now Thursday.  In other words, I am
> not going just by this particular email, but by the fact that he
> hasn't committed so much as a one line bug fix or posted any reviews
> in a long time.  The last non-reverted commit he made related to this
> feature was on April 6th, two days after the initial commit.

It does seem like Kevin could have done better there. However, I think
that Kevin was taking responsibility for the situation, and that it
wasn't accurate to suggest he blamed others. I thought his account of
the constraints that he operated under was a simple factual account.

I also don't think it's useful for you to discourage revert on the
grounds that it's a slippery slope. Admitting fault doesn't need to be
made any harder.

-- 
Peter Geoghegan




Re: [HACKERS] strange error message from slave when connection to master cannot be established

2017-06-08 Thread Robert Haas
On Thu, Jun 8, 2017 at 1:17 PM, Pavel Stehule  wrote:
> 2017-06-08 17:19 GMT+02:00 Robert Haas :
>> On Wed, Jun 7, 2017 at 12:30 PM, Pavel Stehule 
>> wrote:
>> > I got a strange error message - a false message - that max_connections is
>> > lower on the slave than on the master, although these numbers were the
>> > same. The issue was a wrong connection string in recovery.conf, so the
>> > slave could not check the master
>> > and used some defaults.
>>
>> I don't understand this problem report.
>
> I am sorry. I was doing a training session, so I did not have enough time to
> gather enough data.
>
> We started the slave and the start failed. The error message said something
> about max_connections being lower on the slave than on the master. The slave
> had 30, the master had 30.  But we had an issue in the ssl configuration of
> the slave's connection string. And although the slave didn't connect to the
> master, we got a message about different max_connections - 30 was compared
> with the default of 100.

I don't think that changing the SSL configuration can make any
difference to this error message, but replaying an
XLOG_PARAMETER_CHANGE WAL record might.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] HACKERS[PROPOSAL] split ProcArrayLock into multiple parts

2017-06-08 Thread Jim Van Fleet
pgsql-hackers-ow...@postgresql.org wrote on 06/07/2017 04:06:57 PM:

...
> > 
> > Did you intend to attach a patch?
> Yes I do -- tomorrow or Thursday -- needs a little cleaning up ...
meant Friday

> 
> > > Sokolov Yura has a patch which, to me, looks good for pgbench rw
> > > performance.  Does not do so well with hammerdb (about the same 
> as base) on
> > > single socket and two socket.
> > 
> > Any idea why?  I think we will have to understand *why* certain things
> > help in some situations and not others, not just *that* they do, in
> > order to come up with a good solution to this problem.
> Looking at the data now -- the LWLockAcquire philosophy is different -- 
> at first glance I would have guessed "about the same" as the base, 
> but I cannot yet explain why we have super pgbench rw performance 
> and "the same" hammerdb performance. 
(data taken from perf cycles when I invoked the performance data gathering 
script, generally in the middle of the run)
In the hammerdb two-socket run, the ProcArrayLock is the bottleneck in 
LWLockAcquire (GetSnapshotData accounts for about 75% of the calls to 
LWLockAcquire). With Sokolov's patch, LWLockAcquire (with LWLockAttemptLock 
included) is a little over 9%; pgbench, on the other hand, has 
LWLockAcquire at 1.3%, with GetSnapshotData accounting for only 11% of the 
calls to LWLockAcquire. 

What I think that means is that there is no ProcArrayLock bottleneck in 
pgbench. GetSnapshotData walks the entire proc chain of PGXACTs, so the lock 
is held a rather long time. Guessing that the other locks are held for a much 
shorter time, Sokolov's patch handles the other locks better because of 
spinning. We see much more time in LWLockAcquire with hammerdb because of the 
spinning -- with the ProcArrayLock, spinning does not help much because of 
the longer hold time.

The spin count is relatively high (100/2), so I made it much smaller 
(20/2) in the hope that the spin would still handle the shorter-hold-time 
locks but not be a bother with long hold times.

Running pgbench with 96 users, the throughput was slightly lower at 70K tps vs 
75K tps (vs a base of 40K tps at 96 threads and a peak of 58K at 64 threads); 
hammerdb two-socket was slightly better (about 3%) than the peak base.

What all this tells me is that LWLockAcquire would (probably) benefit from 
some spinning.
> 
> > 
> > -- 
> > Robert Haas
> > EnterpriseDB: http://www.enterprisedb.com
> > The Enterprise PostgreSQL Company
> > 




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Robert Haas
On Thu, Jun 8, 2017 at 2:05 PM, Peter Geoghegan  wrote:
> On Thu, Jun 8, 2017 at 10:59 AM, Robert Haas  wrote:
>> More generally, I don't think there's ever a
>> time when it's OK to commit a patch that you're not willing to put at
>> least some effort into fixing up afterwards.
>
> Kevin said "It has become clear that the scope of problems being found
> now exceed what I can be sure of being able to fix in time to make for
> a stable release, in spite of the heroic efforts Thomas has been
> putting in". I think it's clear that Kevin is willing to put in some
> work. The issue is that he is unable to *guarantee* that he'll be able
> to put in *sufficient* time, and in light of that concedes that it
> might be best to revert and revisit for Postgres 11. He is being
> cautious, and does not want to *risk* unduly holding up the release.
>
> That was my understanding, at least.

I understood the same.  However, he didn't review or commit any of the
bug fix patches that Thomas posted in May, or even respond to the
mailing list threads.  I eventually reviewed and committed them to
avoid having the feature reverted; it took me only a few hours.
Later, he said he would review the TransitionCaptureState patch Thomas
posted at PGCon, later said again on-list that he would do so by
Friday or anyway Monday, and it's now Thursday.  In other words, I am
not going just by this particular email, but by the fact that he
hasn't committed so much as a one line bug fix or posted any reviews
in a long time.  The last non-reverted commit he made related to this
feature was on April 6th, two days after the initial commit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] jsonb_to_tsvector should be immutable

2017-06-08 Thread Tom Lane
Josh Berkus  writes:
> select proname, prosrc, proargtypes, provolatile from pg_proc where
> proname = 'to_tsvector';

Slightly more readable version:

regression=# select oid::regprocedure, provolatile, proparallel from pg_proc 
where proname = 'to_tsvector';
             oid              | provolatile | proparallel 
------------------------------+-------------+-------------
 to_tsvector(jsonb)           | s           | s
 to_tsvector(regconfig,text)  | i           | s
 to_tsvector(text)            | s           | s
 to_tsvector(json)            | s           | s
 to_tsvector(regconfig,jsonb) | s           | s
 to_tsvector(regconfig,json)  | s           | s
(6 rows)

> Both of the _byid functions should be marked immutable, no?  Otherwise
> how can users use the new functions for indexing?

Yeah, if the (regconfig,text) one is considered immutable, I don't see
why the other two aren't.  The justification for the other three being
only stable is that they depend on default_text_search_config.

(You could argue that none of these should be immutable because text
search configurations are changeable, but we already decided to ignore
that for the (regconfig,text) case.)

regards, tom lane




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Bruce Momjian
On Thu, Jun  8, 2017 at 11:05:43AM -0700, Peter Geoghegan wrote:
> On Thu, Jun 8, 2017 at 10:59 AM, Robert Haas  wrote:
> > More generally, I don't think there's ever a
> > time when it's OK to commit a patch that you're not willing to put at
> > least some effort into fixing up afterwards.
> 
> Kevin said "It has become clear that the scope of problems being found
> now exceed what I can be sure of being able to fix in time to make for
> a stable release, in spite of the heroic efforts Thomas has been
> putting in". I think it's clear that Kevin is willing to put in some
> work. The issue is that he is unable to *guarantee* that he'll be able
> to put in *sufficient* time, and in light of that concedes that it
> might be best to revert and revisit for Postgres 11. He is being
> cautious, and does not want to *risk* unduly holding up the release.
> 
> That was my understanding, at least.

I think we can all agree that Kevin should have communicated this
earlier, rather than requiring Robert to push him on the issue.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




[HACKERS] jsonb_to_tsvector should be immutable

2017-06-08 Thread Josh Berkus
Wanted to pull this out of my general report, because nobody seems to
have seen it:

P3: apparently jsonb_to_tsvector with lang parameter isn't immutable?
This means that it can't be used for indexing:

libdata=# create index bookdata_fts on bookdata using gin ((
to_tsvector('english',bookdata)));
ERROR:  functions in index expression must be marked IMMUTABLE

... and indeed it's not:

select proname, prosrc, proargtypes, provolatile from pg_proc where
proname = 'to_tsvector';
   proname   |         prosrc         | proargtypes | provolatile 
-------------+------------------------+-------------+-------------
 to_tsvector | jsonb_to_tsvector      | 3802        | s
 to_tsvector | to_tsvector_byid       | 3734 25     | i
 to_tsvector | to_tsvector            | 25          | s
 to_tsvector | json_to_tsvector       | 114         | s
 to_tsvector | jsonb_to_tsvector_byid | 3734 3802   | s
 to_tsvector | json_to_tsvector_byid  | 3734 114    | s

Both of the _byid functions should be marked immutable, no?  Otherwise
how can users use the new functions for indexing?
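
For illustration, if the (regconfig, jsonb) variant were marked immutable as
suggested, the index from the example above would be accepted.  A rough sketch
of what that looks like (assuming the bookdata table from the example; the
ALTER FUNCTION route requires superuser and simply mirrors the proposed
catalog change):

ALTER FUNCTION to_tsvector(regconfig, jsonb) IMMUTABLE;

CREATE INDEX bookdata_fts ON bookdata
    USING gin (to_tsvector('english', bookdata));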



-- 
Josh Berkus
Containers & Databases Oh My!




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Peter Geoghegan
On Thu, Jun 8, 2017 at 10:59 AM, Robert Haas  wrote:
> More generally, I don't think there's ever a
> time when it's OK to commit a patch that you're not willing to put at
> least some effort into fixing up afterwards.

Kevin said "It has become clear that the scope of problems being found
now exceed what I can be sure of being able to fix in time to make for
a stable release, in spite of the heroic efforts Thomas has been
putting in". I think it's clear that Kevin is willing to put in some
work. The issue is that he is unable to *guarantee* that he'll be able
to put in *sufficient* time, and in light of that concedes that it
might be best to revert and revisit for Postgres 11. He is being
cautious, and does not want to *risk* unduly holding up the release.

That was my understanding, at least.

-- 
Peter Geoghegan




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Robert Haas
On Thu, Jun 8, 2017 at 1:48 PM, Heikki Linnakangas  wrote:
> On 06/08/2017 08:36 PM, Robert Haas wrote:
>>
>> I freely admit I encouraged you to commit this.  I did not imagine
>> that would be followed immediately by abdicating all responsibility
>> for it.  My mistake, I guess.
>
> Robert, chill out.

That's probably good advice, but ...

> Kevin offered to revert. It's perhaps not the best way
> forward - I'm not familiar with the details here - but it's certainly one
> way to take responsibility.

... he also proposed to then commit it again for some future release
cycle, and what then?  Revert it again if it turns out to have any
bugs, and commit it a third time in some release cycle after that?
It's a big, invasive patch.  I don't think we want it going in and out
of the tree repeatedly.  More generally, I don't think there's ever a
time when it's OK to commit a patch that you're not willing to put at
least some effort into fixing up afterwards.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench tap tests & minor fixes

2017-06-08 Thread Fabien COELHO


Hello Nikolay,


I did some investigation: the code there really supposes that there is always
a \n at the end of the line, and truncates the line. It is done in

/* Get location of the ending newline */
end_offset = expr_scanner_offset(sstate) - 1;

just two lines above the code we are discussing.

When you have a one-line script \sleep 2ms with no end-of-line symbol at the
end, it will cut off the "s" instead of the "\n".

You've fixed it, but now it will leave the \n in all sleeps in multiline
scripts.

So this should be fixed in both expr_scanner_get_substring cases, keeping the
last symbol only if it is not "\n".


Indeed, this is a mess. The code assumes everything is a line ending with 
'\n', but this is not always the case.



Also, if someone could run a test on windows, it would be great.

I'll try to ask a friend of mine to run this on windows...


That would be great!

Here is a v7 which chomps the final newline only if there is one.

Thanks again,

--
Fabien.diff --git a/src/bin/pgbench/exprscan.l b/src/bin/pgbench/exprscan.l
index dc1367b..40a2a52 100644
--- a/src/bin/pgbench/exprscan.l
+++ b/src/bin/pgbench/exprscan.l
@@ -360,6 +360,20 @@ expr_scanner_get_substring(PsqlScanState state,
 }
 
 /*
+ * get current expression line without ending newline
+ */
+char *
+expr_scanner_get_line(PsqlScanState state, int start_offset, int end_offset)
+{
+	const char *p = state->scanbuf;
+	/* chomp eols */
+	while (end_offset > start_offset &&
+		   (p[end_offset] == '\n' || p[end_offset] == '\r'))
+		end_offset--;
+	return expr_scanner_get_substring(state, start_offset, end_offset + 1);
+}
+
+/*
  * Get the line number associated with the given string offset
  * (which must not be past the end of where we've lexed to).
  */
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index ae36247..383aa78 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -229,7 +229,7 @@ typedef struct SimpleStats
 typedef struct StatsData
 {
 	time_t		start_time;		/* interval start time, for aggregates */
-	int64		cnt;			/* number of transactions */
+	int64		cnt;			/* number of transactions, including skipped */
 	int64		skipped;		/* number of transactions skipped under --rate
  * and --latency-limit */
 	SimpleStats latency;
@@ -329,7 +329,7 @@ typedef struct
 	bool		prepared[MAX_SCRIPTS];	/* whether client prepared the script */
 
 	/* per client collected stats */
-	int64		cnt;			/* transaction count */
+	int64		cnt;			/* client transaction count, for -t */
 	int			ecnt;			/* error count */
 } CState;
 
@@ -2045,7 +2045,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 	if (INSTR_TIME_IS_ZERO(now))
 		INSTR_TIME_SET_CURRENT(now);
 	now_us = INSTR_TIME_GET_MICROSEC(now);
-	while (thread->throttle_trigger < now_us - latency_limit)
+	while (thread->throttle_trigger < now_us - latency_limit &&
+		   (nxacts <= 0 || st->cnt < nxacts)) /* with -t, do not overshoot */
 	{
 		processXactStats(thread, st, &now, true, agg);
 		/* next rendez-vous */
@@ -2053,6 +2054,12 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 		thread->throttle_trigger += wait;
 		st->txn_scheduled = thread->throttle_trigger;
 	}
+
+	if (nxacts > 0 && st->cnt >= nxacts)
+	{
+		st->state = CSTATE_FINISHED;
+		break;
+	}
 }
 
 st->state = CSTATE_THROTTLE;
@@ -2364,15 +2371,8 @@ doCustom(TState *thread, CState *st, StatsData *agg)
  */
 			case CSTATE_END_TX:
 
-/*
- * transaction finished: calculate latency and log the
- * transaction
- */
-if (progress || throttle_delay || latency_limit ||
-	per_script_stats || use_log)
-	processXactStats(thread, st, &now, false, agg);
-else
-	thread->stats.cnt++;
+/* transaction finished: calculate latency and do log */
+processXactStats(thread, st, &now, false, agg);
 
 if (is_connect)
 {
@@ -2381,7 +2381,6 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 	INSTR_TIME_SET_ZERO(now);
 }
 
-++st->cnt;
 if ((st->cnt >= nxacts && duration <= 0) || timer_exceeded)
 {
 	/* exit success */
@@ -2519,17 +2518,20 @@ processXactStats(TState *thread, CState *st, instr_time *now,
 {
 	double		latency = 0.0,
 lag = 0.0;
+	bool		detailed = progress || throttle_delay || latency_limit ||
+		per_script_stats || use_log;
 
-	if ((!skipped) && INSTR_TIME_IS_ZERO(*now))
-		INSTR_TIME_SET_CURRENT(*now);
-
-	if (!skipped)
+	if (detailed && !skipped)
 	{
+		if (INSTR_TIME_IS_ZERO(*now))
+			INSTR_TIME_SET_CURRENT(*now);
+
 		/* compute latency & lag */
 		latency = INSTR_TIME_GET_MICROSEC(*now) - st->txn_scheduled;
 		lag = INSTR_TIME_GET_MICROSEC(st->txn_begin) - st->txn_scheduled;
 	}
 
+	/* detailed thread stats */
 	if (progress || throttle_delay || latency_limit)
 	{
 		accumStats(&thread->stats, skipped, latency, lag);
@@ -2539,7 +2541,13 @@ processXactStats(TState *thread, CState *st, instr_time *now,
 			thread->latency_late++;
 	}
 	else
+	{
+		/* 

Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity

2017-06-08 Thread Tom Lane
"'Andres Freund'"  writes:
> On 2017-06-08 11:57:49 -0400, Regina Obe wrote:
>> My main concern in these cases is the short-circuiting not happening.

> Note there's also no short-circuiting e.g. for aggregates inside case
> either.

Well, depends.  If const-folding manages to get rid of the aggregate
call altogether, it won't be computed.
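
For illustration (with a hypothetical table t having an integer column x):
aggregate arguments are evaluated over all input rows before the CASE is
applied, so the query below can still fail on division by zero even though
the WHEN branch appears to guard it:

select case when min(x) > 0 then sum(1/x) else 0 end from t;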

regards, tom lane




Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Heikki Linnakangas

On 06/08/2017 08:36 PM, Robert Haas wrote:

I freely admit I encouraged you to commit this.  I did not imagine
that would be followed immediately by abdicating all responsibility
for it.  My mistake, I guess.


Robert, chill out. Kevin offered to revert. It's perhaps not the best 
way forward - I'm not familiar with the details here - but it's 
certainly one way to take responsibility.


- Heikki





Re: [HACKERS] PG10 transition tables, wCTEs and multiple operations on the same table

2017-06-08 Thread Robert Haas
On Wed, Jun 7, 2017 at 5:47 PM, Kevin Grittner  wrote:
> It has become clear that the scope of problems being found now
> exceed what I can be sure of being able to fix in time to make for a
> stable release, in spite of the heroic efforts Thomas has been
> putting in.  I had hoped to get this into the first or second CF of
> this release, same with the release before, and same with the
> release before that.  At least landing it in the final CF drew the
> level of review and testing needed to polish it, but it's far from
> ideal timing (or procedure).  I can revert from v10 and deal with
> all of this for the first CF of some future release, but if someone
> feels they can deal with it in v10 I'll stand back and offer what
> help I can.

I really don't know what to make of this.  I can't ever remember a
committer taking this attitude before.  Aside from committing one
patch that Thomas submitted shortly after the original commit, you
haven't been willing to write, review, or commit a single line of
code.  The problem doesn't seem to be that the amount of effort is any
more than it would be for any other feature this size; it's in better
shape than some.  The problem appears to be, rather, that the scope of
problems you were willing to try to fix in a timely fashion was
basically zero, which isn't very realistic for a patch of this size no
matter who has reviewed it or in what level of detail.

We don't have a lot of formal documentation around the
responsibilities of PostgreSQL committers, but I think one that is
pretty clearly acknowledged is "you are responsible for what you
commit".  If you're not willing to be responsible for your own
patches, turn in your commit bit.  Then, when you submit a patch,
it'll either not get committed, or it'll be the responsibility of
whoever does.

I freely admit I encouraged you to commit this.  I did not imagine
that would be followed immediately by abdicating all responsibility
for it.  My mistake, I guess.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] strange error message from slave when connection to master cannot be established

2017-06-08 Thread Pavel Stehule
2017-06-08 17:19 GMT+02:00 Robert Haas :

> On Wed, Jun 7, 2017 at 12:30 PM, Pavel Stehule 
> wrote:
> > I got a strange error message - a false message - that max_connections is
> > lower on the slave than on the master, although these numbers were the
> > same. The issue was a wrong connection string in recovery.conf, so the
> > slave could not check the master and used some defaults.
>
> I don't understand this problem report.
>

I am sorry. I was doing a training session, so I did not have enough time to
gather enough data.

We started the slave and the start failed. The error message said something
about max_connections being lower on the slave than on the master. The slave
had 30, the master had 30.  But we had an issue in the ssl configuration of
the slave's connection string. And although the slave didn't connect to the
master, we got a message about different max_connections - 30 was compared
with the default of 100.

Regards

Pavel


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] Notes on testing Postgres 10b1

2017-06-08 Thread Josh Berkus
On 06/07/2017 06:37 PM, Peter Eisentraut wrote:
> On 6/7/17 21:19, Josh Berkus wrote:
>> The user's first thought is going to be a network issue, or a bug, or
>> some other problem, not a missing PK.  Yeah, they can find that
>> information in the logs, but only if they think to look for it in the
>> first place, and in some environments (AWS, containers, etc.) logs can
>> be very hard to access.
> 
> You're not going to get very far with using this feature if you are not
> looking in the logs for errors.  These are asynchronously operating
> background workers, so the only way they can communicate problems is
> through the log.

Well, we *could* provide a system view, as we now do for archiving, and
for the same reasons.

The issue isn't that the error detail is in the log.  It's somehow
letting the user know that they need to look at the log, as opposed to
somewhere else.  Consider that this is asynchronous for the user as well;
they are likely to find out about the broken replication well after it
happens, and thus have a lot of log to search through.

Activity logs are a *terrible* UI for debugging systems problems.  I
realize that there is information it's hard for us to provide any other
way.  But the logs should be our "monitoring of last resort", where we
put stuff after we've run out of ideas on where else to put it, because
they are the hardest thing to access for a user.

-- 
Josh Berkus
Containers & Databases Oh My!




Re: [HACKERS] Notes on testing Postgres 10b1

2017-06-08 Thread Josh Berkus
On 06/07/2017 07:01 PM, Petr Jelinek wrote:
> On 08/06/17 03:50, Josh Berkus wrote:
>> On 06/07/2017 06:25 PM, Petr Jelinek wrote:
>>> On 08/06/17 03:19, Josh Berkus wrote:

 Peter and Petr:

 On 06/07/2017 05:24 PM, Peter Eisentraut wrote:
> On 6/7/17 01:01, Josh Berkus wrote:
>> * Having defaults on the various _workers all devolve from max_workers
>> is also great.
>
> I'm not aware of anything like that happening.
>
>> P1. On the publishing node, logical replication relies on the *implied*
>> correspondence of the application_name and the replication_slot both
>> being named the same as the publication in order to associate a
>> particular publication with a particular replication connection.
>> However, there's absolutely nothing preventing me from also creating a
>> binary replication connection by the same name.  It really seems like we
>> need a field in pg_stat_replication or pg_replication_slots which lists
>> the publication.
>
> I'm not quite sure what you are getting at here.  The application_name
> seen on the publisher side is the subscription name.  You can create a
> binary replication connection using the same application_name, but
> that's already been possible before.  But the publications don't care
> about any of this.

 My point is that there is no system view where I can see, on the origin
 node, what subscribers are subscribing to which publications.  You can
 kinda guess that from pg_stat_replication etc., but it's not dependable
 information.

>>>
>>> That's like wanting the foreign server to show you which foreign tables
>>> exist on the local server. This is not a tightly coupled system and you
>>> are able to setup both sides without them being connected to each other
>>> at the time of setup, so there is no way publisher can know anything.
>>
>> Why wouldn't the publisher know who's connected once the replication
>> connection has been made and the subscription has started?  Or is it just
>> a log position, and the publisher really has no idea how many
>> publications are being consumed?
>>
> 
> The plugin knows while the connection exists, but that's the thing: it goes
> through a pluggable interface (that can be used by other plugins, without
> publications), so there would have to be some abstracted way for plugins
> to provide some extra information for the pg_stat_replication or a similar
> view. I am afraid it's a bit too late to design something like that in
> the PG10 cycle.

OK, consider it a feature request for PG11, then.


-- 
Josh Berkus
Containers & Databases Oh My!




Re: [HACKERS] UPDATE of partition key

2017-06-08 Thread Robert Haas
On Thu, Jun 8, 2017 at 7:01 AM, Amit Kapila  wrote:
> On Thu, Jun 8, 2017 at 1:33 AM, Robert Haas  wrote:
>> On Wed, Jun 7, 2017 at 5:46 AM, Amit Kapila  wrote:
>>> As far as I understand, it is to ensure that for deleted rows, nothing
>>> more needs to be done.  For example, see the below check in
>>> ExecUpdate/ExecDelete.
>>> if (!ItemPointerEquals(tupleid, &hufd.ctid))
>>> {
>>> ..
>>> }
>>> ..
>>>
>>> Also a similar check in ExecLockRows.  Now for deleted rows, if the
>>> t_ctid wouldn't point to itself, then in the mentioned functions, we
>>> were not in a position to conclude that the row is deleted.
>>
>> Right, so we would have to find all such checks and change them to use
>> some other method to conclude that the row is deleted.  What method
>> would we use?
>
> I think before doing the above check we can simply check whether ctid.ip_blkid
> contains InvalidBlockNumber, and if so return an error.

Hmm, OK.  That case never happens today?
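
For reference, a minimal sketch of the check being suggested (assuming the
existing ItemPointer/BlockNumber macros and the hufd structure returned by
heap_update; an illustration only, not an actual patch):

	/* tuple was "moved": its ctid carries an invalid block number */
	if (!BlockNumberIsValid(ItemPointerGetBlockNumber(&hufd.ctid)))
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("tuple to be updated was moved to another partition")));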

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity

2017-06-08 Thread 'Andres Freund'
On 2017-06-08 11:57:49 -0400, Regina Obe wrote:
> My main concern in these cases is the short-circuiting not happening.

Note there's also no short-circuiting e.g. for aggregates inside case
either.

- Andres




Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-08 Thread Robert Haas
On Wed, Jun 7, 2017 at 5:47 AM, Ashutosh Bapat
 wrote:
> On Sat, Jun 3, 2017 at 2:11 AM, Robert Haas  wrote:
>>
>> + errmsg("default partition contains row(s)
>> that would overlap with partition being created")));
>>
>> It doesn't really sound right to talk about rows overlapping with a
>> partition.  Partitions can overlap with each other, but not rows.
>> Also, it's not really project style to use ambiguously plural forms
>> like "row(s)" in error messages.  Maybe something like:
>>
>> new partition constraint for default partition \"%s\" would be
>> violated by some row
>
> The partition constraint is an implementation detail here. We enforce
> partition bounds through constraints, and we call such constraints
> partition constraints. But a user may not necessarily understand this
> term or may interpret it differently. Adding "new" adds to the confusion,
> as the default partition is not new.

I see your point.  We could say "updated partition constraint" instead
of "new partition constraint" to address that to some degree.

> My suggestion in an earlier mail
> was ""default partition contains rows that conflict with the partition
> bounds of "part_xyz"", with a note that we should use a better word
> than "conflict". So, Jeevan seems to have used overlap, which again is
> not correct. How about "default partition contains row/s which would
> fit the partition "part_xyz" being created or attached." with a hint
> to move those rows to the new partition's table in case of attach. I
> don't think the hint would be so straightforward, i.e. to create the table
> with SELECT INTO and then ATTACH.

The problem is that none of these actually sound very good.  Neither
conflict nor overlap nor fit actually express the underlying idea very
clearly, at least IMHO.  I'm not opposed to using some wording along
these lines if we can think of a clear way to word it, but I think my
wording is better than using some unclear word for this concept.  I
can't immediately think of a way to adjust your wording so that it
seems completely clear.

> Also, the error code ERRCODE_CHECK_VIOLATION, which is an "integrity
> constraint violation" code, seems misleading. We aren't violating any
> integrity here. In fact I am not able to understand how adding
> an object could violate an integrity constraint. The nearest errorcode seems to
> be ERRCODE_INVALID_OBJECT_DEFINITION, which is also used for
> partitions with overlapping bounds.

I think that calling a constraint failure a check violation is not too
much of a stretch, even if it's technically a partition constraint
rather than a CHECK constraint.  However, your proposal also seems
reasonable.  I'm happy to go with whatever most people like best.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-08 Thread Robert Haas
On Wed, Jun 7, 2017 at 1:59 AM, amul sul  wrote:
> But Ashutosh's suggestion make sense, we might have constraints other
> than that partitioning constraint on default partition.  If those
> constraints refutes the new partition's constraints, we should skip
> the scan.

Right.  If the user adds a constraint to the default partition that is
identical to the new partition constraint, that should cause the scan
to be skipped.

Ideally, we could do even better.  For example, if the user is
creating a new partition FOR VALUES IN (7), and the default partition
has CHECK (key != 7), we could perhaps deduce that the combination of
the existing partition constraint (which must certainly hold) and the
additional CHECK constraint (which must also hold, at least assuming
it's not marked NOT VALID) are sufficient to prove the new check
constraint.  But I'm not sure whether predicate_refuted_by() is smart
enough to figure that out.  However, it should definitely be smart
enough to figure out that if somebody's added the new partitioning
constraint as a CHECK constraint on the default partition, we don't
need to scan it.

The reason somebody might want to do that, just to be clear, is that
they could do this in multiple steps: first, add the new CHECK
constraint as NOT VALID.  Then VALIDATE CONSTRAINT.  Then add the new
non-default partition.  This would result in holding an exclusive lock
for a lesser period of time than if they did it all together as one
operation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Long binded parameter value in the postgres log

2017-06-08 Thread Masahiko Sawada
On Thu, Jun 8, 2017 at 5:00 PM, Nikitin Nikolay
 wrote:
> Hi!
>
>
>
> We insert many rows with long text and bytea (about 500 MB) values. In the
> postgres config we set log_min_duration_statement to 120 seconds.
>
> If these statements run for more than 120 seconds, they will be written into
> the postgres log with parameter values.
>
> As a result, the postgres log increases by 500MB for each statement.
>
>
>
> I think postgres should have a "max bind value log size" parameter. If a
> bind value is bigger than this parameter, it will be truncated.
>
> And the log would record the value's real size along with the truncated part.
>

In this case, I think you can use log_error_verbosity = terse, which
excludes the logging of DETAIL, HINT, QUERY, and CONTEXT error
information. On the other hand, since the log of a particularly
large SQL statement or its parameters can indeed eat up the free disk space,
it may be a good idea to have such a threshold.
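
For example, with the settings from the report, something along these lines
keeps the slow statements in the log but drops the DETAIL line that carries
the bind values (a sketch of postgresql.conf settings, not a recommendation):

log_min_duration_statement = 120s
log_error_verbosity = terse		# no DETAIL/HINT/QUERY/CONTEXT in log lines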

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] PostgreSQL 10 changes in exclusion constraints - did something change? CASE WHEN behavior oddity

2017-06-08 Thread Regina Obe

> But this line of thinking does strengthen my feeling that throwing an
> error is the right thing to do for the moment.  If we allow v10 to accept
> such cases but do something different from what we used to, that
> will greatly complicate any future attempt to try to restore the old
> behavior.

>   regards, tom lane

Agreed.  The other side benefit of throwing an error instead of just doing
something different is you'll find out how rampant the old behavior is :).

People are more likely to know to complain when their apps break than they
are if it just silently starts doing something different.

My main concern in these cases is the short-circuiting not happening.
Because in these cases, the code goes into areas that it shouldn't, which is
likely to mess up some logic in hard-to-troubleshoot ways.
I think erroring out is the best compromise.

Thanks,
Regina





Re: [HACKERS] List of hostaddrs not supported

2017-06-08 Thread David G. Johnston
On Thu, Jun 8, 2017 at 8:18 AM, Robert Haas  wrote:

> Whatever you put in the hostaddr field - or any field other than host
> and port - is one entry.  There is no notion of a list of entries in
> any other field, and no attempt to split any other field on a comma or
> any other symbol.
>
[...]


> I think the argument is about whether I made the right decision when I
> scoped the feature, not about whether there's a defect in the
> implementation.
>

Implementation comes into play if the scope decision stands.

I have no immediate examples but it doesn't seem that we usually go to
great lengths to infer user intent and show hints based upon said
inference.  But we also don't forbid doing so.  So the question of whether
we should implement better error messages here seems to be in scope -
especially since we do allow for lists in some of the sibling fields.

These are already failing so I'd agree that explicit rejection isn't
necessary - the question seems restricted to usability.  Though I suppose
we need to consider whether there is any problem with the current setup if
indeed our intended separator is also an allowable character - i.e., do we
want to future-proof the syntax by requiring quotes now?

David J.


Re: [HACKERS] strange error message from slave when connection to master cannot be established

2017-06-08 Thread Robert Haas
On Wed, Jun 7, 2017 at 12:30 PM, Pavel Stehule  wrote:
> I got a strange error message - a false message - that max_connections is
> lower on the slave than on the master, although these numbers were the
> same. The issue was a wrong connection string in recovery.conf, so the
> slave could not check the master and used some defaults.

I don't understand this problem report.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] List of hostaddrs not supported

2017-06-08 Thread Robert Haas
On Thu, Jun 8, 2017 at 10:50 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> It doesn't seem like a problem to me if somebody else wants to extend
>> it to hostaddr, though.  Whether that change belongs in v10 or v11 is
>> debatable.  I would object to adding this as an open item with me as
>> the owner because doesn't seem to me to be a must-fix issue, but I
>> don't mind someone else doing the work.
>
> If you want to define multiple-hostaddrs as a future feature, that
> seems fine, but I think Heikki is describing actual bugs.  The minimum
> that I think needs to be done for v10 is to make libpq reject a hostaddr
> string with the wrong number of entries (either different from the
> host list, or different from 1).

Whatever you put in the hostaddr field - or any field other than host
and port - is one entry.  There is no notion of a list of entries in
any other field, and no attempt to split any other field on a comma or
any other symbol.  The fact that ::1,::1 looks to you like two entries
rather than a single malformed entry is just a misunderstanding on
your part, just like you'd be wrong if you thought that
password=foo,bar is a list of passwords rather than a password
containing a comma.

I think the argument is about whether I made the right decision when I
scoped the feature, not about whether there's a defect in the
implementation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Does pg_upgrade really support "make installcheck"?

2017-06-08 Thread Tom Lane
Andrew Dunstan  writes:
> The whole thing is explained here:
> 

> And since then the buildfarm has acquired a separate optional module
> that tests pg_upgrade. It's enabled by default in the sample config file.

> So it's not like we don't have buildfarm coverage - we do in fact.
> Re-enabling this in the Makefile would a) result in breakage on some
> members and b) make most members do redundant work.

> I vote for improving the docs.

WFM, I'll go do that (and maybe improve the comment in the makefile)

regards, tom lane




Re: [HACKERS] List of hostaddrs not supported

2017-06-08 Thread Tom Lane
Robert Haas  writes:
> It doesn't seem like a problem to me if somebody else wants to extend
> it to hostaddr, though.  Whether that change belongs in v10 or v11 is
> debatable.  I would object to adding this as an open item with me as
> the owner because doesn't seem to me to be a must-fix issue, but I
> don't mind someone else doing the work.

If you want to define multiple-hostaddrs as a future feature, that
seems fine, but I think Heikki is describing actual bugs.  The minimum
that I think needs to be done for v10 is to make libpq reject a hostaddr
string with the wrong number of entries (either different from the
host list, or different from 1).

regards, tom lane




Re: [HACKERS] Does pg_upgrade really support "make installcheck"?

2017-06-08 Thread Andrew Dunstan


On 06/08/2017 03:04 AM, Neha Khatri wrote:
> On 6/7/17, Tom Lane  wrote:
>> src/bin/pg_upgrade/TESTING claims (much further down in the file
>> than I'd like):
>>
>>  The shell script test.sh in this directory performs more or less this
>>  procedure.  You can invoke it by running
>>  make check
>>  or by running
>>  make installcheck
>>  if "make install" (or "make install-world") were done beforehand.
>>
>> However, the second alternative doesn't really work:
>>
>> $ make installcheck
>> make: Nothing to be done for `installcheck'.
>> $
>>
>> Did this ever work, or could it easily be made to work?
> It seems it would work if the following two lines are uncommented from
> the src/bin/pg_upgrade/Makefile:
>
>   # disabled because it upsets the build farm
>   # installcheck: test.sh
>   #  MAKE=$(MAKE) bindir=$(bindir) libdir=$(libdir) $(SHELL) $<
>
> As the comment says, it was purposely disabled, probably to avoid
> failure on certain build farm members.
> Attached the result of make installcheck after enabling the
> intallcheck target (just to be sure if that is what you are looking
> for).
>
>> If not, we need to fix that documentation.
> If the attached is result is what you are after, should the
> documentation be updated to mention that make installcheck is
> purposely disabled, providing the reason for it. Or, should the
> intallcheck target be enabled in the Makefile to find out if specific
> buildfarm members still complain about it.



The whole thing is explained here:


And since then the buildfarm has acquired a separate optional module
that tests pg_upgrade. It's enabled by default in the sample config file.

So it's not like we don't have buildfarm coverage - we do in fact.
Re-enabling this in the Makefile would a) result in breakage on some
members and b) make most members do redundant work.

I vote for improving the docs.

Let's also note that this test is not great anyway for a couple of
reasons. First, it doesn't test as much as it might, only the core
regression database. And second (and probably more importantly) it
doesn't test cross-version upgrade, which, after all, is the reason for
pg_upgrade's existence. There is an experimental buildfarm module that
addresses both these issues, but it needs a bit of work to make it
production ready. It runs (without reporting) on the animal crake, and
has been stable since some time in April. I recently started work on
bringing it up to production standard. It does take up a lot of space,
since it's saving out binaries and databases that are normally removed
at the end of each buildfarm run. On crake the static space required is
3.2GB. That's in contrast to roughly 0.5 GB used for all the supported
branches for everything else.


cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] List of hostaddrs not supported

2017-06-08 Thread Robert Haas
On Thu, Jun 8, 2017 at 4:36 AM, Heikki Linnakangas  wrote:
> So, this is all quite confusing. I think we should support a list of
> hostaddrs, to go with the list of hostnames. It seems like a strange
> omission. Looking at the archives, it was mentioned a few times when this
> was developed and reviewed, latest Takayuki Tsunakawa asked [1] the same
> question, but it was then forgotten about.

I didn't really forget about it; I just didn't think it seemed
important.  There seemed to be a danger of scope creep, too.  For
example, you could argue that multiplicity ought to also be permitted
for passwords.  The status quo is that you can use different passwords
for different hosts if you specify the password via your .pgpass file,
but not if you include it in the connection string, and somebody could
argue that's weird and inconsistent.  But if you allow multiple
passwords then maybe you ought to also allow multiple usernames.  And
what do you do about the possibility that a password contains a
literal comma?  And if you allow a different password for each host,
maybe you ought to allow a different sslcert, too, for people not
using passwords to authenticate.  Maybe hostaddr is more
closely-related than any of that stuff, but I just made a judgement
call that host by itself was going to be a problem but host+port was
enough to make a credible minimal patch, and I didn't want to go
beyond what was absolutely required for fear of biting off more than I
could chew.

It doesn't seem like a problem to me if somebody else wants to extend
it to hostaddr, though.  Whether that change belongs in v10 or v11 is
debatable.  I would object to adding this as an open item with me as
the owner because doesn't seem to me to be a must-fix issue, but I
don't mind someone else doing the work.
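
To make the current scoping concrete (hypothetical host names): host and port
accept comma-separated lists, while hostaddr - like password and the other
fields - is taken as a single value:

psql "host=node1.example.com,node2.example.com port=5432,5433 dbname=postgres"

# under the current scoping this is one (malformed) address, not a list
psql "host=node1,node2 hostaddr=10.0.0.1,10.0.0.2 dbname=postgres"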

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages

2017-06-08 Thread Dilip Kumar
On Thu, May 18, 2017 at 8:07 PM, Robert Haas  wrote:

Thanks for the feedback and sorry for the delayed response.

> You might need to adjust effective_cache_size.

You are right. But effective_cache_size has an impact on the
number of pages_fetched when it's used for a parameterized path (i.e.
the inner side of a nested loop). But in our case, where we see the
wrong number of pages being estimated (Q10), it was for the
non-parameterized path.  However, I have also tested with a high
effective_cache_size and did not observe any change.

> The Mackert and Lohman
> formula isn't exactly counting unique pages fetched.

Right.

> It will count
> the same page twice if it thinks the page will be evicted from the
> cache after the first fetch and before the second one.

That happens only when loop_count > 1.  If loop_count = 1, then it seems it
doesn't consider effective_cache_size. But, actually, multiple
tuples can fall into the same page.
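
For reference, the Mackert and Lohman approximation used by
index_pages_fetched() (roughly paraphrasing the comment in costsize.c, where
T is the number of pages in the table, N the number of tuples, s the
selectivity, and b the number of buffer pages assumed available) is:

PF = min(2TNs/(2T+Ns), T)              when T <= b
     2TNs/(2T+Ns)                      when T > b and Ns <= 2Tb/(2T-b)
     b + (Ns - 2Tb/(2T-b))*(T-b)/T     when T > b and Ns > 2Tb/(2T-b)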

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Broken hint bits (freeze)

2017-06-08 Thread Amit Kapila
On Thu, Jun 8, 2017 at 6:49 PM, Dmitriy Sarafannikov
 wrote:
>
>> Why didn't rsync made the copies on master and replica same?
>
> Because rsync was running with —size-only flag.
>

IIUC the situation, the new WAL and the updated pg_control file have been
copied, but not the updated data files, due to which the WAL has not been
replayed on the replicas?  If so, why was the pg_control file copied - its
size shouldn't have changed?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] List of hostaddrs not supported

2017-06-08 Thread Tom Lane
Heikki Linnakangas  writes:
> So, this is all quite confusing. I think we should support a list of 
> hostaddrs, to go with the list of hostnames. It seems like a strange 
> omission.

+1, if it's not too large a patch.  It could be argued that this is
a new feature and should wait for v11, but the behavior you describe
is weird enough that it could also be called a bug fix.

If it seems too hard to fix fully for v10, maybe we could insert
some checking code that just counts the number of entries in the
two lists and insists they match (which we'd keep later), plus
some code to reject >1 hostaddr (which would eventually go away).

regards, tom lane




Re: [HACKERS] Broken hint bits (freeze)

2017-06-08 Thread Dmitriy Sarafannikov

> Why didn't rsync make the copies on master and replica the same?

Because rsync was running with the --size-only flag.

> I haven't looked in detail, but it sounds like a slightly risky proposition
> to manipulate the tuples by writing C functions of the form you have
> in your code.  I would have preferred some way to avoid this problem
> by ensuring that replicas are properly synced (complete data of master
> via WAL) or by disabling autovacuum.

Avoiding this problem is a good approach, but what do we do with the already
corrupted data?  Can you explain in more detail what you mean?



Re: [HACKERS] Broken hint bits (freeze)

2017-06-08 Thread Amit Kapila
On Fri, Jun 2, 2017 at 4:20 PM, Dmitriy Sarafannikov
 wrote:
> Thanks for all.
>
> We found the source of the problem. It was mistake in upgrade to 9.6.
>
> We upgrade replica with rsync as it is in the documentation:
> rsync --verbose --relative --archive --hard-links --size-only old_pgdata 
> new_pgdata remote_dir
>
> We must provide 100% read-only availability of our shard (master + 2 
> replicas).
> So we can’t stop master and both replicas, upgrade them one by one and start 
> them.
> We do it as follows:
> Close master from load, stop master, upgrade it, stop 1st replica, upgrade 
> it, start 1st replica,
> stop 2nd replica, upgrade it, start 2nd replica, start master, open master.
> But upgraded replicas died under load without statistics and we decided to 
> perform
> analyze on master before upgrading replicas. In this case statistics would be 
> copied to replicas by rsync.
> The upgrade algorithm became as follows:
> Close master, stop master, close master from replicas (iptables), upgrade 
> master,
> start master, perform analyze, stop master, stop 1st replica, upgrade 1st 
> replica,
> start 1st replica, stop 2nd replica, upgrade 2nd replica, start 2nd replica,
> start master, open master.
>
> If autovacuum starts vacuuming relations while we are performing analyze, wal 
> records
> generated by it will not be replayed on replicas, because next step is 
> stopping
> master with checkpoint and new redo location LSN (newer than these wal 
> records)
> will appear in pg_control file, which then will be copied by rsync to 
> replicas.
>
> If it was a simple vacuum, we most likely would not see the consequences,
> because it marks
> tuples as deleted, and some of the next new tuples will be placed there, and
> due to FPW
> replicas will receive the correct page, identical to the master.
> But if it was a vacuum to prevent wraparound, we will see a situation like ours.
> Tuples on
> the master will be frozen, but on the replicas not. And that will not change
> unless somebody updates a tuple on this page.
>

Why didn't rsync make the copies on master and replica the same?

> It’s dangerous, because, if we perform switchover to replica, «corrupted» page
> will be delivered to all replicas after next update of any tuple from this 
> page.
>
> We reproduced this case in our test environment and this assumption was 
> confirmed.
>
> Starting and stopping master after running pg_upgrade but before rsync to 
> collect statistics
> was a bad idea.
>
> We know how to find such «corrupted» tuples. And we want to fix this by 
> manually
> freezing tuples via calling specially written C functions. Functions are 
> «copy-pasted»
> and simplified code from vacuum functions with SQL interface (see attachment).
> Can you look at them? Do you think it is safe to use them for fixing 
> corrupted pages
> or is there a better way not to lose data?
>

I haven't looked in detail, but it sounds like a slightly risky proposition
to manipulate the tuples by writing C functions of the form you have
in your code.  I would have preferred some way to avoid this problem
by ensuring that replicas are properly synced (complete data of master
via WAL) or by disabling autovacuum.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Parallel Aggregation support for aggregate functions that use transitions not implemented for array_agg

2017-06-08 Thread Ashutosh Bapat
On Wed, Jun 7, 2017 at 10:55 PM, Robert Haas  wrote:
> On Tue, Jun 6, 2017 at 3:23 PM, David Fetter  wrote:
>> I'd bet on lack of tuits.
>
> I expect that was part of it.  Another thing to consider is that, for
> numeric aggregates, the transition values don't generally get larger
> as you aggregate, but for something like string_agg(), they will.
> It's not clear how much work we'll really save by parallelizing that
> sort of thing.  Maybe it will be great?

+1, I was thinking the same.  There might be some cases where the
output of array_agg/string_agg is not a lot wider but the underlying
scans are large, e.g. a HAVING clause containing another aggregate and
very small group sizes.  I am not sure how frequent such use cases are.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [BUGS] Crash observed during the start of the Postgres process

2017-06-08 Thread K S, Sandhya (Nokia - IN/Bangalore)
Hi Tom Lane,

After removing our patch to change FATAL to LOG, we are not observing the crash 
now. 

Thank you for your support. We were stuck on this issue for a while.

Regards,
Sandhya

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Friday, May 12, 2017 10:08 PM
To: K S, Sandhya (Nokia - IN/Bangalore) 
Cc: 'Merlin Moncure' ; 'pgsql-hackers@postgresql.org' 
; 'pgsql-b...@postgresql.org' 
; Itnal, Prakash (Nokia - IN/Bangalore) 
; T, Rasna (Nokia - IN/Bangalore) 
Subject: Re: [BUGS] Crash observed during the start of the Postgres process

"K S, Sandhya (Nokia - IN/Bangalore)"  writes:
> I have filtered the logs based on PID (19825) to see if this helps to
> debug the issue further.

Is this really a stock Postgres build?

The proximate cause of the PANIC is that the startup process is seeing
other processes active even though it hasn't reached consistency.  This
is bad on any number of levels, quite aside from that particular PANIC,
because those other processes are presumably seeing non-consistent
database state.  Looking elsewhere in the log, we see that indeed there
seem to be several backend processes happily executing commands.
For instance, here's the trace of one of them starting up:

[19810-58f473ff.4d62-187] 2017-04-17 07:51:28.783 GMT < > DEBUG:  0: forked new backend, pid=19850 socket=10
[19810-58f473ff.4d62-188] 2017-04-17 07:51:28.783 GMT < > LOCATION:  BackendStartup, postmaster.c:3884
[19850-58f47400.4d8a-1] 2017-04-17 07:51:28.783 GMT <  > LOG:  57P03: the database system is starting up
[19850-58f47400.4d8a-2] 2017-04-17 07:51:28.783 GMT <  > LOCATION:  ProcessStartupPacket, postmaster.c:2143
[19850-58f47400.4d8a-3] 2017-04-17 07:51:28.784 GMT <  authentication> DEBUG:  0: postgres child[19850]: starting with (

Now, that LOG message proves that this backend has observed that the
database is not ready to allow connections.  So why did it only emit the
message as LOG and keep going?  The code for this in 9.3 looks like

    /*
     * If we're going to reject the connection due to database state, say so
     * now instead of wasting cycles on an authentication exchange.  (This also
     * allows a pg_ping utility to be written.)
     */
    switch (port->canAcceptConnections)
    {
        case CAC_STARTUP:
            ereport(FATAL,
                    (errcode(ERRCODE_CANNOT_CONNECT_NOW),
                     errmsg("the database system is starting up")));
            break;
        ...

I can't draw any other conclusion but that you've hacked something
to make FATAL act like LOG.  Which is a fatal mistake.  Errors that
are marked FATAL are generally ones where allowing the process to
keep going is not an acceptable outcome.

regards, tom lane




[HACKERS] Long binded parameter value in the postgres log

2017-06-08 Thread Nikitin Nikolay
Hi!

We insert many rows with long text and bytea values (about 500 MB).  In the
postgres config we set log_min_duration_statement to 120 seconds.
If these statements run for more than 120 seconds, they are written into the
postgres log together with their parameter values.
As a result, the postgres log grows by 500 MB for each such statement.

I think postgres should have a "max bind value log size" parameter.  If a
bind value is bigger than this parameter, it would be truncated,
and the log would record its real size along with the truncated value.
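
A minimal sketch of the idea, where "max_bind_value_log_size" is a
hypothetical parameter and none of this is existing PostgreSQL code:

    /*
     * Hedged sketch: clamp a bind value to a configurable length before it is
     * appended to the log line, recording the value's original size.
     */
    static char *
    clip_bind_value_for_log(const char *value, int max_bind_value_log_size)
    {
        size_t      len = strlen(value);

        if (max_bind_value_log_size < 0 || len <= (size_t) max_bind_value_log_size)
            return pstrdup(value);      /* short enough, log as-is */

        return psprintf("%.*s... (value is %lu bytes, truncated)",
                        max_bind_value_log_size, value, (unsigned long) len);
    }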

Regards,
Nikolay Nikitin.

Email address: nikolay.niki...@infowatch.com
PostgreSQL version: 9.4.11



Re: [HACKERS] UPDATE of partition key

2017-06-08 Thread Amit Kapila
On Thu, Jun 8, 2017 at 1:33 AM, Robert Haas  wrote:
> On Wed, Jun 7, 2017 at 5:46 AM, Amit Kapila  wrote:
>> As far as I understand, it is to ensure that for deleted rows, nothing
>> more needs to be done.  For example, see the below check in
>> ExecUpdate/ExecDelete.
>> if (!ItemPointerEquals(tupleid, ))
>> {
>> ..
>> }
>> ..
>>
>> There is also a similar check in ExecLockRows.  Now for deleted rows, if the
>> t_ctid didn't point to itself, then in the mentioned functions we
>> would not be in a position to conclude that the row is deleted.
>
> Right, so we would have to find all such checks and change them to use
> some other method to conclude that the row is deleted.  What method
> would we use?
>

I think before doing the above check we can simply check whether
ctid.ip_blkid contains InvalidBlockNumber and, if so, return an error.
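
Roughly like the following (a sketch only, with my own variable names, not a
patch), assuming the caller has the old heap tuple at hand:

    /*
     * Hedged sketch: if the old tuple's ctid block number was set to
     * InvalidBlockNumber to mark "row moved to another partition", raise an
     * error instead of trying to follow the update chain.
     */
    if (!BlockNumberIsValid(BlockIdGetBlockNumber(&oldtup->t_data->t_ctid.ip_blkid)))
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("tuple to be updated was moved to another partition")));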

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-08 Thread Jeevan Ladhe
Thanks Ashutosh,

On Thu, Jun 8, 2017 at 4:04 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Thu, Jun 8, 2017 at 2:54 PM, Ashutosh Bapat
>  wrote:
> > On Wed, Jun 7, 2017 at 2:08 AM, Jeevan Ladhe
> >  wrote:
> >
> >>
> >>>
> >>> The code in check_default_allows_bound() to check whether the default
> >>> partition
> >>> has any rows that would fit new partition looks quite similar to the
> code
> >>> in
> >>> ATExecAttachPartition() checking whether all rows in the table being
> >>> attached
> >>> as a partition fit the partition bounds. One thing that
> >>> check_default_allows_bound() misses is, if there's already a
> constraint on
> >>> the
> >>> default partition refutes the partition constraint on the new
> partition,
> >>> we can
> >>> skip the scan of the default partition since it can not have rows that
> >>> would
> >>> fit the new partition. ATExecAttachPartition() has code to deal with a
> >>> similar
> >>> case i.e. the table being attached has a constraint which implies the
> >>> partition
> >>> constraint. There may be more cases which check_default_allows_bound()
> >>> does not
> >>> handle but ATExecAttachPartition() handles. So, I am wondering whether
> >>> it's
> >>> better to somehow take out the common code into a function and use it.
> We
> >>> will
> >>> have to deal with a difference through. The first one would throw an
> error
> >>> when
> >>> finding a row that satisfies partition constraints whereas the second
> one
> >>> would
> >>> throw an error when it doesn't find such a row. But this difference
> can be
> >>> handled through a flag or by negating the constraint. This would also
> take
> >>> care
> >>> of Amit Langote's complaint about foreign partitions. There's also
> another
> >>> difference that the ATExecAttachPartition() queues the table for scan
> and
> >>> the
> >>> actual scan takes place in ATRewriteTable(), but there is not such
> queue
> >>> while
> >>> creating a table as a partition. But we should check if we can reuse
> the
> >>> code to
> >>> scan the heap for checking a constraint.
> >>>
> >>> In case of ATTACH PARTITION, probably we should schedule scan of
> default
> >>> partition in the alter table's work queue like what
> >>> ATExecAttachPartition() is
> >>> doing for the table being attached. That would fit in the way alter
> table
> >>> works.
> >>
> >
> > I tried refactoring existing code so that it can be used for default
> > partitioning as well. The code to validate the partition constraints
> > against the table being attached in ATExecAttachPartition() is
> > extracted out into a set of functions. For default partition we reuse
> > those functions to check whether it contains any row that would fit
> > the partition being attached. While creating a new partition, the
> > function to skip validation is reused but the scan portion is
> > duplicated from ATRewriteTable since we are not in ALTER TABLE
> > context. The names of the functions, their declaration will require
> > some thought.
> >
> > There's one test failing because for ATTACH partition the error comes
> > from ATRewriteTable instead of check_default_allows_bound(). Maybe
> > we want to use the same message in both places or somehow make ATRewriteTable
> > give a different message while validating the default partition.
> >
> > Please review the patch and let me know if the changes look good.
>
> From the discussion on thread [1], it appears that having a NOT NULL constraint
> embedded within an expression may cause a scan to be skipped when it
> shouldn't be. For default partitions such a case may arise. If an
> existing partition accepts NULL and we try to attach a default
> partition, it would get a NOT NULL partition constraint but it will be
> buried within an expression like !(key = any(array[1, 2, 3]) OR key is
> null) where the existing partition/s accept values 1, 2, 3 and null.
> We need to check whether the refactored code handles this case
> correctly. v19 patch does not have this problem since it doesn't try
> to skip the scan based on the constraints of the table being attached.
> Please try following cases 1. a default partition accepting nulls
> exists and we attach a partition to accept NULL values 2. a NULL
> accepting partition exists and we try to attach a table as default
> partition. In both the cases default partition should be checked for
> rows with NULL partition keys. In both the cases, if the default
> partition table has a NOT NULL constraint we should be able to skip
> the scan and should scan the table when such a constraint does not
> exist.
>

I will review your refactoring patch as well as test the above cases.

Regards,
Jeevan Ladhe


Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-08 Thread Ashutosh Bapat
On Thu, Jun 8, 2017 at 2:54 PM, Ashutosh Bapat
 wrote:
> On Wed, Jun 7, 2017 at 2:08 AM, Jeevan Ladhe
>  wrote:
>
>>
>>>
>>> The code in check_default_allows_bound() to check whether the default
>>> partition
>>> has any rows that would fit new partition looks quite similar to the code
>>> in
>>> ATExecAttachPartition() checking whether all rows in the table being
>>> attached
>>> as a partition fit the partition bounds. One thing that
>>> check_default_allows_bound() misses is, if there's already a constraint on
>>> the
>>> default partition refutes the partition constraint on the new partition,
>>> we can
>>> skip the scan of the default partition since it can not have rows that
>>> would
>>> fit the new partition. ATExecAttachPartition() has code to deal with a
>>> similar
>>> case i.e. the table being attached has a constraint which implies the
>>> partition
>>> constraint. There may be more cases which check_default_allows_bound()
>>> does not
>>> handle but ATExecAttachPartition() handles. So, I am wondering whether
>>> it's
>>> better to somehow take out the common code into a function and use it. We
>>> will
>>> have to deal with a difference through. The first one would throw an error
>>> when
>>> finding a row that satisfies partition constraints whereas the second one
>>> would
>>> throw an error when it doesn't find such a row. But this difference can be
>>> handled through a flag or by negating the constraint. This would also take
>>> care
>>> of Amit Langote's complaint about foreign partitions. There's also another
>>> difference that the ATExecAttachPartition() queues the table for scan and
>>> the
>>> actual scan takes place in ATRewriteTable(), but there is not such queue
>>> while
>>> creating a table as a partition. But we should check if we can reuse the
>>> code to
>>> scan the heap for checking a constraint.
>>>
>>> In case of ATTACH PARTITION, probably we should schedule scan of default
>>> partition in the alter table's work queue like what
>>> ATExecAttachPartition() is
>>> doing for the table being attached. That would fit in the way alter table
>>> works.
>>
>
> I tried refactoring existing code so that it can be used for default
> partitioning as well. The code to validate the partition constraints
> against the table being attached in ATExecAttachPartition() is
> extracted out into a set of functions. For default partition we reuse
> those functions to check whether it contains any row that would fit
> the partition being attached. While creating a new partition, the
> function to skip validation is reused but the scan portion is
> duplicated from ATRewriteTable since we are not in ALTER TABLE
> context. The names of the functions, their declaration will require
> some thought.
>
> There's one test failing because for ATTACH partition the error comes
> from ATRewriteTable instead of check_default_allows_bound(). Maybe
> we want to use the same message in both places or somehow make ATRewriteTable
> give a different message while validating the default partition.
>
> Please review the patch and let me know if the changes look good.

From the discussion on thread [1], it appears that having a NOT NULL constraint
embedded within an expression may cause a scan to be skipped when it
shouldn't be. For default partitions such a case may arise. If an
existing partition accepts NULL and we try to attach a default
partition, it would get a NOT NULL partition constraint but it will be
buried within an expression like !(key = any(array[1, 2, 3]) OR key is
null) where the existing partition/s accept values 1, 2, 3 and null.
We need to check whether the refactored code handles this case
correctly. v19 patch does not have this problem since it doesn't try
to skip the scan based on the constraints of the table being attached.
Please try following cases 1. a default partition accepting nulls
exists and we attach a partition to accept NULL values 2. a NULL
accepting partition exists and we try to attach a table as default
partition. In both the cases default partition should be checked for
rows with NULL partition keys. In both the cases, if the default
partition table has a NOT NULL constraint we should be able to skip
the scan and should scan the table when such a constraint does not
exist.

[1] 
http://www.postgresql-archive.org/A-bug-in-mapping-attributes-in-ATExecAttachPartition-td5965298.html
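
For the "explicit NOT NULL constraint lets us skip the scan" case above, a
minimal sketch (my own names; default_rel, key_attno and scan_needed are
hypothetical, and the tuple-descriptor access style follows the 9.6/10
sources):

    /*
     * Hedged sketch: does the default partition have an explicit NOT NULL
     * constraint on this key column?  If so, it cannot contain rows with a
     * NULL partition key, and the scan for such rows could be skipped.
     */
    Form_pg_attribute att = RelationGetDescr(default_rel)->attrs[key_attno - 1];

    if (att->attnotnull)
        scan_needed = false;    /* no NULL partition keys can exist */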

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] A bug in mapping attributes in ATExecAttachPartition()

2017-06-08 Thread Ashutosh Bapat
On Thu, Jun 8, 2017 at 3:13 PM, Amit Langote
 wrote:
> On 2017/06/08 1:44, Robert Haas wrote:
>> On Wed, Jun 7, 2017 at 7:47 AM, Ashutosh Bapat
>>  wrote:
>>> In ATExecAttachPartition() there's following code
>>>
>>> 13715 partnatts = get_partition_natts(key);
>>> 13716 for (i = 0; i < partnatts; i++)
>>> 13717 {
>>> 13718 AttrNumber  partattno;
>>> 13719
>>> 13720 partattno = get_partition_col_attnum(key, i);
>>> 13721
>>> 13722 /* If partition key is an expression, must not skip
>>> validation */
>>> 13723 if (!partition_accepts_null &&
>>> 13724 (partattno == 0 ||
>>> 13725  !bms_is_member(partattno, not_null_attrs)))
>>> 13726 skip_validate = false;
>>> 13727 }
>>>
>>> partattno obtained from the partition key is the attribute number of
>>> the partitioned table but not_null_attrs contains the attribute
>>> numbers of attributes of the table being attached which have NOT NULL
>>> constraint on them. But the attribute numbers of partitioned table and
>>> the table being attached may not agree i.e. partition key attribute in
>>> partitioned table may have a different position in the table being
>>> attached. So, this code looks buggy. Probably we don't have a test
>>> which tests this code with different attribute order between
>>> partitioned table and the table being attached. I didn't get time to
>>> actually construct a testcase and test it.
>
> There seem to be a couple of bugs here:
>
> 1. When partition's key attributes differ in ordering from the parent,
>predicate_implied_by() will give up due to structural inequality of
>Vars in the expressions.  By fixing this, we can get it to return
>'true' when it's really so.
>
> 2. As you said, we store partition's attribute numbers in the
>not_null_attrs bitmap, but then check partattno (which is the parent's
>attribute number which might differ) against the bitmap, which seems
>like it might produce incorrect result.  If, for example,
>predicate_implied_by() set skip_validate to true, and the above code
>failed to set skip_validate to false where it should have, then we
>would wrongly end up skipping the scan.  That is, rows in the partition
>will contain null values whereas the partition constraint does not
>allow it.  It's hard to reproduce this currently, because with
>    different ordering of attributes, predicate_refuted_by() never returns
>true (as mentioned in 1 above), so skip_validate does not need to be
>set to false again.
>
> Consider this example:
>
> create table p (a int, b char) partition by list (a);
>
> create table p1 (b char not null, a int check (a in (1)));
> insert into p1 values ('b', null);
>
> Note that not_null_attrs for p1 will contain 1 corresponding to column b,
> which matches key attribute of the parent, that is 1, corresponding to
> column a.  Hence we end up wrongly concluding that p1's partition key
> column does not allow nulls.
>
>> I think this code could be removed entirely in light of commit
>> 3ec76ff1f2cf52e9b900349957b42d28128b7bc7.
>
> I am assuming you think that because now we emit IS NOT NULL constraint
> internally for any partition keys that do not allow null values (including
> all the keys of range partitions as of commit
> 3ec76ff1f2cf52e9b900349957b42d28128b7bc7).  But those IS NOT NULL
> constraint expressions are inconclusive as far as the application of
> predicate_implied_by() to determine if we can skip the scan is concerned.
> So even if predicate_implied_by() returned 'true', we cannot conclude,
> just based on that result, that there are not any null values in the
> partition keys.

I am not able to understand this. Are you saying that
predicate_implied_by() returns true even when it's not implied when
NOT NULL constraints are involved? That sounds like a bug in
predicate_implied_by(), which should be fixed instead of adding code
to paper over it?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] A bug in mapping attributes in ATExecAttachPartition()

2017-06-08 Thread Amit Langote
On 2017/06/08 1:44, Robert Haas wrote:
> On Wed, Jun 7, 2017 at 7:47 AM, Ashutosh Bapat
>  wrote:
>> In ATExecAttachPartition() there's following code
>>
>> 13715 partnatts = get_partition_natts(key);
>> 13716 for (i = 0; i < partnatts; i++)
>> 13717 {
>> 13718 AttrNumber  partattno;
>> 13719
>> 13720 partattno = get_partition_col_attnum(key, i);
>> 13721
>> 13722 /* If partition key is an expression, must not skip
>> validation */
>> 13723 if (!partition_accepts_null &&
>> 13724 (partattno == 0 ||
>> 13725  !bms_is_member(partattno, not_null_attrs)))
>> 13726 skip_validate = false;
>> 13727 }
>>
>> partattno obtained from the partition key is the attribute number of
>> the partitioned table but not_null_attrs contains the attribute
>> numbers of attributes of the table being attached which have NOT NULL
>> constraint on them. But the attribute numbers of partitioned table and
>> the table being attached may not agree i.e. partition key attribute in
>> partitioned table may have a different position in the table being
>> attached. So, this code looks buggy. Probably we don't have a test
>> which tests this code with different attribute order between
>> partitioned table and the table being attached. I didn't get time to
>> actually construct a testcase and test it.

There seem to be a couple of bugs here:

1. When partition's key attributes differ in ordering from the parent,
   predicate_implied_by() will give up due to structural inequality of
   Vars in the expressions.  By fixing this, we can get it to return
   'true' when it's really so.

2. As you said, we store partition's attribute numbers in the
   not_null_attrs bitmap, but then check partattno (which is the parent's
   attribute number which might differ) against the bitmap, which seems
   like it might produce incorrect result.  If, for example,
   predicate_implied_by() set skip_validate to true, and the above code
   failed to set skip_validate to false where it should have, then we
   would wrongly end up skipping the scan.  That is, rows in the partition
   will contain null values whereas the partition constraint does not
   allow it.  It's hard to reproduce this currently, because with
   different ordering of attributes, predicate_refuted_by() never returns
   true (as mentioned in 1 above), so skip_validate does not need to be
   set to false again.

Consider this example:

create table p (a int, b char) partition by list (a);

create table p1 (b char not null, a int check (a in (1)));
insert into p1 values ('b', null);

Note that not_null_attrs for p1 will contain 1 corresponding to column b,
which matches key attribute of the parent, that is 1, corresponding to
column a.  Hence we end up wrongly concluding that p1's partition key
column does not allow nulls.

> I think this code could be removed entirely in light of commit
> 3ec76ff1f2cf52e9b900349957b42d28128b7bc7.

I am assuming you think that because now we emit IS NOT NULL constraint
internally for any partition keys that do not allow null values (including
all the keys of range partitions as of commit
3ec76ff1f2cf52e9b900349957b42d28128b7bc7).  But those IS NOT NULL
constraint expressions are inconclusive as far as the application of
predicate_implied_by() to determine if we can skip the scan is concerned.
So even if predicate_implied_by() returned 'true', we cannot conclude,
just based on that result, that there are not any null values in the
partition keys.

The code in question is there to check if there are explicit NOT NULL
constraints on the partition keys.  It cannot be true for expression keys,
so we give up on skip_validate in that case anyway.  But if 1) there are
no expression keys, 2) all the partition key columns of the table being
attached have NOT NULL constraint, and 3) predicate_implied_by() returned
'true', we can skip the scan.
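
One hypothetical way to make the not_null_attrs membership test meaningful
despite the ordering mismatch described in (2) is to translate the parent's
key attribute number into the attached table's numbering by column name
before consulting the bitmap.  A rough sketch (my own names, not from any
posted patch; get_attname/get_attnum are the usual lsyscache helpers):

    /*
     * Hedged sketch, not the actual fix: map a parent partition-key attno to
     * the attached table's attno by column name, so that the check against
     * not_null_attrs (built from the attached table) uses consistent numbering.
     */
    static AttrNumber
    map_parent_key_attno(Oid parent_relid, Oid attached_relid, AttrNumber parent_attno)
    {
        char       *attname = get_attname(parent_relid, parent_attno);

        if (attname == NULL)
            return InvalidAttrNumber;   /* shouldn't happen for a valid key */
        return get_attnum(attached_relid, attname);
    }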

Thoughts?

I am working on a patch to fix the above mentioned issues and will post
the same no later than Friday.

Thanks,
Amit





Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-08 Thread Ashutosh Bapat
On Wed, Jun 7, 2017 at 2:08 AM, Jeevan Ladhe
 wrote:

>
>>
>> The code in check_default_allows_bound() to check whether the default
>> partition
>> has any rows that would fit new partition looks quite similar to the code
>> in
>> ATExecAttachPartition() checking whether all rows in the table being
>> attached
>> as a partition fit the partition bounds. One thing that
>> check_default_allows_bound() misses is, if there's already a constraint on
>> the
>> default partition refutes the partition constraint on the new partition,
>> we can
>> skip the scan of the default partition since it can not have rows that
>> would
>> fit the new partition. ATExecAttachPartition() has code to deal with a
>> similar
>> case i.e. the table being attached has a constraint which implies the
>> partition
>> constraint. There may be more cases which check_default_allows_bound()
>> does not
>> handle but ATExecAttachPartition() handles. So, I am wondering whether
>> it's
>> better to somehow take out the common code into a function and use it. We
>> will
>> have to deal with a difference through. The first one would throw an error
>> when
>> finding a row that satisfies partition constraints whereas the second one
>> would
>> throw an error when it doesn't find such a row. But this difference can be
>> handled through a flag or by negating the constraint. This would also take
>> care
>> of Amit Langote's complaint about foreign partitions. There's also another
>> difference that the ATExecAttachPartition() queues the table for scan and
>> the
>> actual scan takes place in ATRewriteTable(), but there is not such queue
>> while
>> creating a table as a partition. But we should check if we can reuse the
>> code to
>> scan the heap for checking a constraint.
>>
>> In case of ATTACH PARTITION, probably we should schedule scan of default
>> partition in the alter table's work queue like what
>> ATExecAttachPartition() is
>> doing for the table being attached. That would fit in the way alter table
>> works.
>

I tried refactoring existing code so that it can be used for default
partitioning as well. The code to validate the partition constraints
against the table being attached in ATExecAttachPartition() is
extracted out into a set of functions. For default partition we reuse
those functions to check whether it contains any row that would fit
the partition being attached. While creating a new partition, the
function to skip validation is reused but the scan portion is
duplicated from ATRewriteTable since we are not in ALTER TABLE
context. The names of the functions, their declaration will require
some thought.

There's one test failing because for ATTACH partition the error comes
from ATRewriteTable instead of check_default_allows_bound(). Maybe
we want to use the same message in both places or somehow make ATRewriteTable
give a different message while validating the default partition.

Please review the patch and let me know if the changes look good.


default_partition_refactor_v20.tar.gz
Description: GNU Zip compressed data



[HACKERS] List of hostaddrs not supported

2017-06-08 Thread Heikki Linnakangas
While testing libpq and GSS the other day, I was surprised by the 
behavior of the host and hostaddr libpq options, if you specify a list 
of hostnames.


I did this, and it took me quite a while to figure out what was 
going on:



$ psql "dbname=postgres hostaddr=::1  host=localhost,localhost user=krbtestuser" -c 
"SELECT 'hello'"
psql: GSSAPI continuation error: Unspecified GSS failure.  Minor code may 
provide more information
GSSAPI continuation error: Server postgres/localhost,localhost@PG.EXAMPLE not 
found in Kerberos database


That was a pilot error; I specified a list of hostnames, but only one 
hostaddr. But I would've expected to get a more helpful error, pointing 
that out.


Some thoughts on this:

1. You cannot actually specify a list of hostaddrs. Trying to do so 
gives an error:



psql: could not translate host name "::1,::1" to address: Name or service not known


That error message is a bit inaccurate, in that it wasn't really a "host 
name" that it tried to translate, but a raw address in string format. 
That's even more confusing if you make the mistake of specifying 
"hostaddr=localhost":



psql: could not translate host name "localhost" to address: Name or service not known


But in the first case, could we detect that there is a comma in the 
string, and give an error along the lines of "list of hostaddr's not 
supported". (Or better yet, support multiple hostaddrs)


2. The documentation is not very clear on the fact that multiple 
hostaddrs are not supported, nor on what happens if you specify a single 
hostaddr, but a list of hostnames. (The list of hostnames gets treated 
as a single hostname, that's what.)



So, this is all quite confusing. I think we should support a list of 
hostaddrs, to go with the list of hostnames. It seems like a strange 
omission. Looking at the archives, it was mentioned a few times when 
this was developed and reviewed; most recently Takayuki Tsunakawa asked [1] the 
same question, but it was then forgotten about.


[1] 
https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1F63FB5E%40G01JPEXMBYT05


- Heikki





Re: [HACKERS] Re: Alter subscription..SET - NOTICE message is coming for table which is already removed

2017-06-08 Thread Masahiko Sawada
On Thu, Jun 8, 2017 at 5:36 AM, Peter Eisentraut
 wrote:
> On 5/30/17 13:25, Masahiko Sawada wrote:
>> I think this cause is that the relation status entry could be deleted
>> by ALTER SUBSCRIPTION REFRESH before corresponding table sync worker
>> starting. Attached patch fixes issues reported on this thread so far.
>
> I have committed the part of the patch that changes the
> SetSubscriptionRelState() calls.
>

Thank you!

> I think there was a mistake in your patch, in that the calls in
> LogicalRepSyncTableStart() used true once and false once.  I think all
> the calls in tablesync.c should be the same.

Yes, you're right.

> (If you look at the patch again, notice that I have changed the
> insert_ok argument to update_only, so true and false are flipped.)

Okay.

> I'm not convinced about the change to the GetSubscriptionRelState()
> argument.  In the examples given, no tables are removed from any
> publications, so I don't see how the claimed situation can happen.  I
> would like to see more reproducible examples.

In process_syncing_tables_for_apply(), the apply worker gets the list of
all non-ready tables and tries to launch table sync workers
accordingly. But after it gets the list and before it launches the
workers, these tables can be removed from the publication, so a launched
table sync worker cannot find its rel state in pg_subscription_rel. It
completely depends on timing and happens rarely.

The reproduction steps were provided by tushar, but I could also reproduce
it with the following steps.

X cluster ->
=# select 'create table t' || generate_series(1,100) || '(c int);';\gexec -- create 100 tables
=# create table t (c int);  -- create one more table
=# create publication all_pub for all tables;
=# create publication one_pub for table t;

Y Cluster ->
(create the same 101 tables as well)
=# create subscription hoge_sub connection 'host=localhost  port=5432' publication one_pub;
=# alter subscription hoge_sub set publication all_pub; select pg_sleep(1); alter subscription hoge_sub set publication one_pub;
*Error occurs here*

> Right now, if the subscription rel state disappears before the sync
> worker starts, the error kills the sync worker, so things should
> continue working correctly.  Perhaps the error message isn't the best.
>

The change to GetSubscriptionRelState in that patch solves the error
message problem you mentioned. GetSubscriptionRelState returning
SUBREL_STATE_UNKNOWN means that the subscription rel state could not
be found at the time, so we can emit the error with an appropriate
message.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Does pg_upgrade really support "make installcheck"?

2017-06-08 Thread Neha Khatri
On 6/7/17, Tom Lane  wrote:
> src/bin/pg_upgrade/TESTING claims (much further down in the file
> than I'd like):
>
>   The shell script test.sh in this directory performs more or less this
>   procedure.  You can invoke it by running
>   make check
>   or by running
>   make installcheck
>   if "make install" (or "make install-world") were done beforehand.
>
> However, the second alternative doesn't really work:
>
> $ make installcheck
> make: Nothing to be done for `installcheck'.
> $
>
> Did this ever work, or could it easily be made to work?

It seems it would work if the following two lines are uncommented from
the src/bin/pg_upgrade/Makefile:

  # disabled because it upsets the build farm
  # installcheck: test.sh
  #  MAKE=$(MAKE) bindir=$(bindir) libdir=$(libdir) $(SHELL) $<

As the comment says, it was purposely disabled, probably to avoid
failures on certain build farm members.
Attached is the result of make installcheck after enabling the
installcheck target (just to be sure that this is what you are looking
for).

> If not, we need to fix that documentation.

If the attached result is what you are after, should the
documentation be updated to mention that make installcheck is
purposely disabled, providing the reason for it?  Or should the
installcheck target be enabled in the Makefile to find out whether
specific buildfarm members still complain about it?

-- 
Regards,
Neha


pgu_installcheck.log
Description: Binary data



[HACKERS] src/include/Makefile is missing statistics directory

2017-06-08 Thread Kyotaro HORIGUCHI
Hello, I noticed that src/include/statistics is not installed by
make install.

The commit 7b504eb282ca2f5104b5c00b4f05a forgot to do that.

Master and 10 beta 1 are affected.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 9b871ac36a0867e106200c66179ce593a25988c2 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 8 Jun 2017 14:55:47 +0900
Subject: [PATCH] Install include/statistics directory on make install

Commit 7b504eb282ca2f5104b5c00b4f05a3ef6bb1385b forgot to add
include/statistics to subdirs in src/include/Makefile. This commit
adds it.
---
 src/include/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/include/Makefile b/src/include/Makefile
index 8c6e888..a689d35 100644
--- a/src/include/Makefile
+++ b/src/include/Makefile
@@ -20,7 +20,7 @@ all: pg_config.h pg_config_ext.h pg_config_os.h
 SUBDIRS = access bootstrap catalog commands common datatype \
 	executor fe_utils foreign \
 	lib libpq mb nodes optimizer parser postmaster regex replication \
-	rewrite storage tcop snowball snowball/libstemmer tsearch \
+	rewrite statistics storage tcop snowball snowball/libstemmer tsearch \
 	tsearch/dicts utils port port/atomics port/win32 port/win32_msvc \
 	port/win32_msvc/sys port/win32/arpa port/win32/netinet \
 	port/win32/sys portability
-- 
2.9.2

