Re: [HACKERS] why not parallel seq scan for slow functions

2018-02-16 Thread Amit Kapila
On Fri, Feb 16, 2018 at 9:29 AM, Ashutosh Bapat
 wrote:
> On Thu, Feb 15, 2018 at 7:47 PM, Amit Kapila  wrote:
>> On Thu, Feb 15, 2018 at 4:18 PM, Ashutosh Bapat
>>  wrote:
>>> I happened to look at the patch for something else. But here are some
>>> comments. If any of those have been already discussed, please feel
>>> free to ignore. I have gone through the thread cursorily, so I might
>>> have missed something.
>>>
>>> In grouping_planner() we call query_planner() first which builds the
>>> join tree and creates paths, then calculate the target for scan/join
>>> rel which is applied on the topmost scan join rel. I am wondering
>>> whether we can reverse this order to calculate the target list of
>>> scan/join relation and pass it to standard_join_search() (or the hook)
>>> through query_planner().
>>>
>>
>> I think the reason for doing in that order is that we can't compute
>> target's width till after query_planner().  See the below note in
>> code:
>>
>> /*
>> * Convert the query's result tlist into PathTarget format.
>> *
>> * Note: it's desirable to not do this till after query_planner(),
>> * because the target width estimates can use per-Var width numbers
>> * that were obtained within query_planner().
>> */
>> final_target = create_pathtarget(root, tlist);
>>
>> Now, I think we can try to juggle the code in a way that the width can
>> be computed later, but the rest can be done earlier.
>
> /* Convenience macro to get a PathTarget with valid cost/width fields */
> #define create_pathtarget(root, tlist) \
> set_pathtarget_cost_width(root, make_pathtarget_from_tlist(tlist))
> create_pathtarget already works that way. We will need to split it.
>
> Create the PathTarget without widths. Apply the width estimates once we
> know the widths of the Vars, somewhere around here in query_planner():
> /*
>  * We should now have size estimates for every actual table involved in
>  * the query, and we also know which if any have been deleted from the
>  * query by join removal; so we can compute total_table_pages.
>  *
>  * Note that appendrels are not double-counted here, even though we don't
>  * bother to distinguish RelOptInfos for appendrel parents, because the
>  * parents will still have size zero.
>  *
> The next step is building the join tree. Set the pathtarget before that.
>
>> However, I think
>> that will be a somewhat major change
>
> I agree.
>
>>  still won't address all kinds of
>> cases (like for ordered paths) unless we can try to get all kinds of
>> targets pushed down in the call stack.
>
> I didn't understand that.
>

The places where we use a target different from the target that is
pushed down via the query planner can cause a similar problem (e.g., see
the usage of adjust_paths_for_srfs), because the cost of that target
wouldn't be taken into account in Gather's costing.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: pgsql: Do execGrouping.c via expression eval machinery, take two.

2018-02-16 Thread Andres Freund
Hi,

On 2018-02-16 22:48:39 +, Andres Freund wrote:
> Do execGrouping.c via expression eval machinery, take two.
> 
> This has a performance benefit on its own, although not hugely so. The
> primary benefit is that it will allow JITing of tuple deforming and
> comparator invocations.
> 
> Large parts of this were previously committed (773aec7aa), but the
> commit contained an omission around cross-type comparisons and was
> thus reverted.
> 
> Author: Andres Freund
> Discussion: 
> https://postgr.es/m/20171129080934.amqqkke2zjtek...@alap3.anarazel.de

This triggered a failure on rhinoceros, in the sepgsql test:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros=2018-02-16%2023%3A45%3A02

The relevant diff is:
+ LOG:  SELinux: allowed { execute } 
scontext=unconfined_u:unconfined_r:sepgsql_regtest_superuser_t:s0-s0:c0.c255 
tcontext=system_u:object_r:sepgsql_proc_exec_t:s0 tclass=db_procedure 
name="pg_catalog.int4eq(integer,integer)"
and that's because we now invoke the function access hook for the
grouping equality function, which we previously didn't.

I personally think the new behaviour makes more sense, but does somebody
want to argue differently? The only argument against it that I can see is
that there are some other cases where we also don't yet invoke it, but
that seems weak.

I never fully grasped what the exact use-case for the function execute
hook is, so maybe Kaigai and/or Robert could comment?

Greetings,

Andres Freund



Re: Disabling src/test/[ssl|ldap] when not building with SSL/LDAP support

2018-02-16 Thread Michael Paquier
On Fri, Feb 16, 2018 at 03:32:46PM -0500, Peter Eisentraut wrote:
> Maybe that would work.
> 
> We still need a way to configure whether we want to run tests that open
> TCP/IP listen sockets.

For this an environment variable seems suited to me.  Say, if
PG_TAP_ALLOW_INSECURE is set, then the tests marked "insecure" are allowed
to run.  If someone tries to run them without it set, they are just skipped
with a nice log message to say so.  The cool part about this design is that
all the tests that are not part of check-world could be integrated into it.
Most developers don't run regression tests on a shared resource.  Let me
think about a full-fledged patch first.
Documentation also matters a lot.
--
Michael


signature.asc
Description: PGP signature


Re: ALTER TABLE ADD COLUMN fast default

2018-02-16 Thread Tomas Vondra


On 02/16/2018 10:46 PM, Andrew Dunstan wrote:
> On Tue, Feb 13, 2018 at 6:28 PM, Andrew Dunstan
>  wrote:
>> On Fri, Feb 9, 2018 at 3:54 PM, Andrew Dunstan
>>  wrote:
>>> On Mon, Feb 5, 2018 at 7:49 AM, Andrew Dunstan
>>>  wrote:
 On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro
  wrote:
> On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan
>  wrote:
>> Yeah, thanks. revised patch attached
>
> FYI the identity regression test started failing recently with this
> patch applied (maybe due to commit
> 533c5d8bddf0feb1785b3da17c0d17feeaac76d8?)
>

 Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it.

>>>
>>>
>>> Here's a version that fixes the above issue and also the issue with
>>> VACUUM that Tomas Vondra reported. I'm still working on the issue with
>>> aggregates that Tomas also reported.
>>>
>>
>> This version fixes that issue. Thanks to Tomas for his help in finding
>> where the problem was. There are now no outstanding issues that I'm
>> aware of.
>>
> 
> 
> The attached version largely fixes the performance degradation
> that Tomas Vondra reported, replacing an O(N^2) piece of code in
> slot_getmissingattrs() with one that is O(N).
> 

Seems fine to me - for comparison, numbers for master and v9/v10 patch
versions look like this:

             create.sql    create-alter.sql
   -----------------------------------------
   master        1120              1320
   v9            1100                50
   v10           1130              1370

This particular machine has a laptop-class CPU, so the timings are
somewhat noisy - which explains the small tps differences.

What I find interesting is that if I do VACUUM FULL after running the
SQL script, I get this:

             create.sql    create-alter.sql
   -----------------------------------------
   master        1120              1450
   v9            1100              1320
   v10           1100              1450

That is, the throughput for the create-alter case jumps up by about
100 tps for some reason. Not sure why, though ...

Anyway, I consider the performance to be OK. But perhaps Andres could
comment on this too, as he requested the benchmarks.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: missing toast table for pg_policy

2018-02-16 Thread Tom Lane
Joe Conway  writes:
> On 02/16/2018 05:07 PM, Andres Freund wrote:
>> On 2018-02-16 16:56:15 -0500, Joe Conway wrote:
>>> Looking at the issue, the problem seems to be missing toast table for
>>> pg_policy. Also attached is a one line patch. It isn't clear to me
>>> whether this is a candidate for backpatching.

>> Don't think it is - it'd not take effect on already initdb'ed clusters.

> Yep, knew that, but...

>> If problematic for < master users I think you'll have to restart cluster
>> with allow_system_table_mods, manually create/drop toasted column. IIRC
>> that should add a toast table even after dropping.

> I wasn't sure if we would want to backpatch and put the manual procedure
> steps into the release notes.

The example you give seems like pretty bad practice to me.  I don't think
we should back-patch unless it's possible to trigger the problem with a
more realistic policy expression.

(Also, one can always work around it by putting the complicated condition
into a function, which would likely be a better idea anyway from a
maintenance standpoint.)
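
For illustration, that might look roughly like this (table, column, and
function names are invented, and the function body stands in for whatever
complicated condition the policy needs):

CREATE FUNCTION policy_check(row_owner text) RETURNS boolean AS $$
    -- ... the complicated condition goes here ...
    SELECT row_owner = current_user
$$ LANGUAGE sql STABLE;

CREATE POLICY p ON t USING (policy_check(owner));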

regards, tom lane



Re: missing toast table for pg_policy

2018-02-16 Thread Joe Conway
On 02/16/2018 05:07 PM, Andres Freund wrote:
> Hi,
> 
> On 2018-02-16 16:56:15 -0500, Joe Conway wrote:
>> Looking at the issue, the problem seems to be missing toast table for
>> pg_policy. Also attached is a one line patch. It isn't clear to me
>> whether this is a candidate for backpatching.
> 
> Don't think it is - it'd not take effect on already initdb'ed clusters.

Yep, knew that, but...

> If problematic for < master users I think you'll have to restart cluster
> with allow_system_table_mods, manually create/drop toasted column. IIRC
> that should add a toast table even after dropping.

I wasn't sure if we would want to backpatch and put the manual procedure
steps into the release notes.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



signature.asc
Description: OpenPGP digital signature


Re: missing toast table for pg_policy

2018-02-16 Thread Andres Freund
Hi,

On 2018-02-16 16:56:15 -0500, Joe Conway wrote:
> Looking at the issue, the problem seems to be missing toast table for
> pg_policy. Also attached is a one line patch. It isn't clear to me
> whether this is a candidate for backpatching.

Don't think it is - it'd not take effect on already initdb'ed clusters.

If it's problematic for users of released branches, I think you'll have to
restart the cluster with allow_system_table_mods and manually create/drop a
toasted column. IIRC that should leave a toast table behind even after the
column is dropped.
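
As a rough sketch, the manual procedure would be something like the
following, run after restarting with allow_system_table_mods on (the dummy
column name is arbitrary):

ALTER TABLE pg_catalog.pg_policy ADD COLUMN toastable_dummy text;
ALTER TABLE pg_catalog.pg_policy DROP COLUMN toastable_dummy;
-- per the above, the toast table created for the text column should
-- stay in place even after the column is dropped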

Greetings,

Andres Freund



Re: Add more information_schema columns

2018-02-16 Thread Peter Eisentraut
On 2/13/18 18:39, Tom Lane wrote:
> Andres Freund  writes:
>> Do we have a policy about catversion bumps for information schema
>> changes? A cluster from before this commit fails the regression tests
>> after the change, but still mostly works...
> 
> I think historically we've not bumped catversion, on the grounds that
> there's no incompatibility with the backend as such.  However, it is
> kind of annoying that not updating means the regression tests fail.
> Informally, I'm sure most developers take "catversion bump" to mean
> "you need to initdb".  So I'd support saying that an information_schema
> change should include a catversion bump if it involves any changes in
> regression test results.

I will do that in the future if that is the preference.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ALTER TABLE ADD COLUMN fast default

2018-02-16 Thread Andrew Dunstan
On Tue, Feb 13, 2018 at 6:28 PM, Andrew Dunstan
 wrote:
> On Fri, Feb 9, 2018 at 3:54 PM, Andrew Dunstan
>  wrote:
>> On Mon, Feb 5, 2018 at 7:49 AM, Andrew Dunstan
>>  wrote:
>>> On Mon, Feb 5, 2018 at 7:19 AM, Thomas Munro
>>>  wrote:
 On Fri, Jan 26, 2018 at 1:23 PM, Andrew Dunstan
  wrote:
> Yeah, thanks. revised patch attached

 FYI the identity regression test started failing recently with this
 patch applied (maybe due to commit
 533c5d8bddf0feb1785b3da17c0d17feeaac76d8?)

>>>
>>> Thanks. Probably the same bug Tomas Vondra found a few days ago. I'm on it.
>>>
>>
>>
>> Here's a version that fixes the above issue and also the issue with
>> VACUUM that Tomas Vondra reported. I'm still working on the issue with
>> aggregates that Tomas also reported.
>>
>
> This version fixes that issue. Thanks to Tomas for his help in finding
> where the problem was. There are now no outstanding issues that I'm
> aware of.
>


The attached version largely fixes the performance degradation
that Tomas Vondra reported, replacing an O(N^2) piece of code in
slot_getmissingattrs() with one that is O(N).

cheers

andrew



-- 
Andrew Dunstan            https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


fast_default-v10.patch
Description: Binary data


Re: Cancelling parallel query leads to segfault

2018-02-16 Thread Peter Eisentraut
On 2/14/18 13:56, Andres Freund wrote:
> With your example I can reliably trigger the issue if I shut down the
> server while the query is running:

OK, that way I can see it too.  Fix pushed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add PGDLLIMPORT to enable_hashagg

2018-02-16 Thread Brian Cloutier
On Fri, Feb 9, 2018 at 1:01 PM, Robert Haas  wrote:

>
> Committed.


Thanks for committing this! We forgot to ask though, could you please
backport this patch to 10 and maybe even 9.6? As-is I don't think these
variables will be available until PG 11.


Re: rename sgml files?

2018-02-16 Thread Tom Lane
Peter Eisentraut  writes:
> On 2/12/18 16:19, Tom Lane wrote:
>> At that point, back-patching documentation fixes would become effectively
>> impossible except through manual intervention in the patching process.

> Are you not using git cherry-pick?

Yes, when it works, which it tends not to in cases that are even a little
bit complicated.  I have zero faith that it works across a file rename,
and would not like to give up the option of using patch(1) instead.  (See,
eg, recent discussions about the fragility of "git apply" vs "patch".)

regards, tom lane



Re: Disabling src/test/[ssl|ldap] when not building with SSL/LDAP support

2018-02-16 Thread Peter Eisentraut
On 2/12/18 23:00, Michael Paquier wrote:
> Hm.  Wouldn't it be enough to just spread the use of
> TestLib::check_pg_config and use SKIP on the tests where things cannot
> be run then?  I see no need to invent an extra facility if all the
> control is already in pg_config.h.  For slapd, you can actually just
> check if it can be executed in the PATH defined at the top of
> 001_auth.pl in $slapd, and skip the tests if the thing cannot be run.
> We have enough flexibility in perl and TAP to allow that to work
> reliably.

Maybe that would work.

We still need a way to configure whether we want to run tests that open
TCP/IP listen sockets.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FOR EACH ROW triggers on partitioned tables

2018-02-16 Thread Peter Eisentraut
On 2/15/18 16:55, Alvaro Herrera wrote:
> Amit Langote wrote:
>> Do you mean to fire these triggers only if the parent table (not a child
>> table/partition) is addressed in the DML, right?  If the table directly
>> addressed in the DML is a partition whose parent has a row-level trigger,
>> then that trigger should not get fired I suppose.
> 
> No, I think that would be strange and cause data inconsistencies.
> Inserting directly into the partition is seen as a performance
> optimization (compared to inserted into the partitioned table), so we
> don't get to skip firing the triggers defined on the parent because the
> behavior would become different.  In other words, the performance
> optimization breaks the database.
> 
> Example: suppose the trigger is used to maintain an audit record trail.

Although this situation could probably be addressed by not giving
permission to write directly into the partitions, I can't think of an
example where one would want a trigger that is only fired when writing
into the partition root rather than into the partition directly.
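
To make Alvaro's audit-trail example concrete (all names are invented, and
this assumes the row-level triggers on partitioned tables proposed here):

CREATE TABLE measurement (ts timestamptz, val int) PARTITION BY RANGE (ts);
CREATE TABLE measurement_2018 PARTITION OF measurement
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
CREATE TABLE measurement_audit (logged_at timestamptz, usr text, val int);

CREATE FUNCTION audit_measurement() RETURNS trigger AS $$
BEGIN
    INSERT INTO measurement_audit VALUES (now(), current_user, NEW.val);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurement_audit_trg
    AFTER INSERT ON measurement
    FOR EACH ROW EXECUTE PROCEDURE audit_measurement();

-- If the trigger fired only for inserts routed through the parent, the
-- second insert would leave no audit record, and the audit trail would
-- silently diverge from the data:
INSERT INTO measurement VALUES ('2018-02-16', 1);
INSERT INTO measurement_2018 VALUES ('2018-02-17', 2);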

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: rename sgml files?

2018-02-16 Thread Peter Eisentraut
On 2/12/18 16:19, Tom Lane wrote:
> At that point, back-patching documentation fixes would become effectively
> impossible except through manual intervention in the patching process.

Are you not using git cherry-pick?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [bug-fix] Cannot select big bytea values (~600MB)

2018-02-16 Thread Alvaro Herrera
Tom Lane wrote:
> Anna Akenteva  writes:
> > [ widen StringInfoData max length to size_t ]
> 
> I find this scary as heck.  Have you spent any time looking at the
> side effects?  There are probably hundreds of places that expect that
> stringinfos won't get larger than 1GB.

See these commits:
fa2fa9955280 42f50cb8fa98 b66adb7b0c83
and the discussion threads linked in the commit messages.

-- 
Álvaro Herrera            https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [bug-fix] Cannot select big bytea values (~600MB)

2018-02-16 Thread Tom Lane
Andres Freund  writes:
> On 2018-02-16 09:58:29 -0500, Tom Lane wrote:
>> Anna Akenteva  writes:
>>> [ widen StringInfoData max length to size_t ]

>> I find this scary as heck.  Have you spent any time looking at the
>> side effects?  There are probably hundreds of places that expect that
>> stringinfos won't get larger than 1GB.

> FWIW, I think we're going to have to bite that bullet sooner rather than
> later. I do agree it's not going to fix this issue for real, and that
> we're not going to backpatch it.

I'm not necessarily saying we shouldn't consider widening this.
I'm just saying it's going to take a good deal of cross-checking for
consequences, in particular that nothing is at risk of integer overflow
if it's presented with a very long StringInfo.

One way to limit the side effects would be to have StringInfos default to
only allowing 1GB of content as before, and you have to do something extra
at creation time to let one get bigger.

There's still the problem that the wire protocol will limit us to 2GB
(or maybe 4GB if you want to be brave^Wfoolhardy and assume clients think
the width fields are unsigned).  I can't get hugely excited about moving
the goalposts only from 1GB to 2GB ...

regards, tom lane



Re: master check fails on Windows Server 2008

2018-02-16 Thread Tom Lane
Marina Polyakova  writes:
> Hello, hackers! I got a permanent failure of master (commit 
> 2a41507dab0f293ff241fe8ae326065998668af8) check on Windows Server 2008. 
> Regression output and diffs as well as config.pl are attached.

Weird.  AFAICS the cost estimates for those two plans should be quite
different, so this isn't just a matter of the estimates maybe being
a bit platform-dependent.  (And that test has been there nearly a
year without causing reported problems.)

To dig into it a bit more, I tweaked the test case to show the costs
for both plans, and got an output diff as attached.  Could you try
the same experiment on your Windows box?  In order to force the choice
in the other direction, you'd need to temporarily disable enable_sort,
not enable_hashagg as I did here, but the principle is the same.
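
In psql terms, the experiment on your end would look roughly like this (the
query is the one from the stats_ext test):

EXPLAIN SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
SET enable_sort = off;
EXPLAIN SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
RESET enable_sort;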

regards, tom lane

diff --git a/src/test/regress/sql/stats_ext.sql b/src/test/regress/sql/stats_ext.sql
index 46acaad..1082660 100644
*** a/src/test/regress/sql/stats_ext.sql
--- b/src/test/regress/sql/stats_ext.sql
*** EXPLAIN (COSTS off)
*** 177,184 
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, b, c, d;
  
! EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
  
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
--- 177,188 
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, b, c, d;
  
! EXPLAIN --(COSTS off)
!  SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
! set enable_hashagg = 0;
! EXPLAIN --(COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
+ reset enable_hashagg;
  
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
*** /home/postgres/pgsql/src/test/regress/expected/stats_ext.out	Mon Feb 12 14:53:46 2018
--- /home/postgres/pgsql/src/test/regress/results/stats_ext.out	Fri Feb 16 11:23:11 2018
***
*** 309,323 
   ->  Seq Scan on ndistinct
  (5 rows)
  
! EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
!  QUERY PLAN  
! -
!  HashAggregate
 Group Key: b, c, d
!->  Seq Scan on ndistinct
  (3 rows)
  
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
   QUERY PLAN  
--- 309,336 
   ->  Seq Scan on ndistinct
  (5 rows)
  
! EXPLAIN --(COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
!   QUERY PLAN  
! --
!  HashAggregate  (cost=291.00..307.32 rows=1632 width=20)
 Group Key: b, c, d
!->  Seq Scan on ndistinct  (cost=0.00..191.00 rows=1 width=12)
  (3 rows)
  
+ set enable_hashagg = 0;
+ EXPLAIN --(COSTS off)
+  SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
+  QUERY PLAN 
+ 
+  GroupAggregate  (cost=1026.89..1168.21 rows=1632 width=20)
+Group Key: b, c, d
+->  Sort  (cost=1026.89..1051.89 rows=1 width=12)
+  Sort Key: b, c, d
+  ->  Seq Scan on ndistinct  (cost=0.00..191.00 rows=1 width=12)
+ (5 rows)
+ 
+ reset enable_hashagg;
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
   QUERY PLAN  

==



Re: [HACKERS] [bug-fix] Cannot select big bytea values (~600MB)

2018-02-16 Thread Andres Freund
Hi,

On 2018-02-16 09:58:29 -0500, Tom Lane wrote:
> Anna Akenteva  writes:
> > [ widen StringInfoData max length to size_t ]
> 
> I find this scary as heck.  Have you spent any time looking at the
> side effects?  There are probably hundreds of places that expect that
> stringinfos won't get larger than 1GB.

FWIW, I think we're going to have to bite that bullet sooner rather than
later. I do agree it's not going to fix this issue for real, and that
we're not going to backpatch it.

Greetings,

Andres Freund



Re: After an error - pg_replication_slot is dropped

2018-02-16 Thread Petr Jelinek
On 16/02/18 12:38, tushar wrote:
> On 02/16/2018 04:02 PM, Petr Jelinek wrote:
>> It's because you are creating temporary slot. Temporary slots are
>> removed on error, that's a documented behavior.
> 
> Thanks for pointing out but It looks weird behavior - even a small mea
> culpa can remove the slot. Temporary table - doesn't go automatically
> after an error ?
> 

Temporary tables have transactional properties; slots don't (even
non-temporary ones). For example, if you create a replication slot in a
transaction and then abort it, the slot will survive. That's the price we
pay for the ability to create slots on a standby.
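
For comparison, a slot created without the temporary flag (the third
argument in your example, which defaults to false) survives the error:

SELECT * FROM pg_create_logical_replication_slot('regression_slot2',
                                                 'test_decoding', false);
SELECT 1/0;                                  -- provoke an error
SELECT slot_name FROM pg_replication_slots;  -- the slot is still listed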

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



[PATCH] pg_hba.conf : new auth option : clientcert=verify-full

2018-02-16 Thread Julian Markwort
Dear Postgresql Hackers,

as of now, pg_hba.conf allows us to enable authentication by
certificate through the auth-method "cert", in which case the user must
provide a valid certificate with a certificate common name (CN) matching
the database user's name or an entry in a pg_ident map.

Additionally, for every other auth-method it is possible to set the
auth-option "clientcert=1", so clients must present a valid certificate
at login. The logic behind this only checks the validity of the
certificate itself; the certificate common name (CN) is not
relevant.

I wrote a very small patch that adds another auth-option:
- clientcert=verify-full (analogous to server certificates; you could
also use 2 instead of verify-full for backwards compatibility, or
verify-ca instead of 1)
which also checks the certificate common name, so all 3 factors get
checked:
1.) the auth-method passes, e.g. scram or md5 password
2.) the client cert is in the truststore
3.) the CN is correct

(The patch simply makes use of the function that is used for the
auth-method "cert", to avoid code duplication.)

Have a nice weekend,
Julian Markwort

diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 3014b17a..5757aa99 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -347,6 +347,7 @@ void
 ClientAuthentication(Port *port)
 {
 	int			status = STATUS_ERROR;
+	int			status_verify_cert_full = STATUS_ERROR;
 	char	   *logdetail = NULL;
 
 	/*
@@ -600,10 +601,23 @@ ClientAuthentication(Port *port)
 			break;
 	}
 
+	if(port->hba->clientcert_verify_full)
+	{
+#ifdef USE_SSL
+			status_verify_cert_full = CheckCertAuth(port);
+#else
+			Assert(false);
+#endif
+	}
+	else
+	{
+		status_verify_cert_full = STATUS_OK;
+	}
+
 	if (ClientAuthentication_hook)
 		(*ClientAuthentication_hook) (port, status);
 
-	if (status == STATUS_OK)
+	if (status == STATUS_OK && status_verify_cert_full == STATUS_OK)
 		sendAuthRequest(port, AUTH_REQ_OK, NULL, 0);
 	else
 		auth_failed(port, status, logdetail);
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index acf625e4..a5b0683d 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -1675,10 +1675,17 @@ parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline,
 			*err_msg = "clientcert can only be configured for \"hostssl\" rows";
 			return false;
 		}
-		if (strcmp(val, "1") == 0)
+		if (strcmp(val, "1") == 0
+			|| strcmp(val, "verify-ca") == 0)
 		{
 			hbaline->clientcert = true;
 		}
+		else if(strcmp(val, "2") == 0
+|| strcmp(val, "verify-full") == 0)
+		{
+			hbaline->clientcert = true;
+			hbaline->clientcert_verify_full = true;
+		}
 		else
 		{
 			if (hbaline->auth_method == uaCert)
diff --git a/src/include/libpq/hba.h b/src/include/libpq/hba.h
index 5f68f4c6..309db47d 100644
--- a/src/include/libpq/hba.h
+++ b/src/include/libpq/hba.h
@@ -87,6 +87,7 @@ typedef struct HbaLine
 	char	   *ldapprefix;
 	char	   *ldapsuffix;
 	bool		clientcert;
+	bool		clientcert_verify_full;
 	char	   *krb_realm;
 	bool		include_realm;
 	bool		compat_realm;


smime.p7s
Description: S/MIME cryptographic signature


Re: spelling of enable_partition_wise_join

2018-02-16 Thread Peter Eisentraut
On 2/14/18 03:28, Ashutosh Bapat wrote:
> On Wed, Feb 14, 2018 at 2:15 AM, Alvaro Herrera  
> wrote:
>> Peter Eisentraut wrote:
>>> I wonder how others feel about this, but the spelling of
>>> enable_partition_wise_join feels funny to me every time I look at it.  I
>>> would write it enable_partitionwise_join.
>>
>> +1
>>
>> See also: https://postgr.es/m/20171005134847.shzldz2ublrb3ny2@alvherre.pgsql
> 
> To that I replied with
> https://www.postgresql.org/message-id/CAFjFpRcsZnxCen88a-16R5EYqPCwFYnFThM%2Bmjagu%3DB1QvxPVA%40mail.gmail.com.
> I didn't get any further response, so nothing was changed that time.
> Alvaro, Peter, Gavin have voted for partitionwise in this thread and
> Robert had similar objections earlier. Looks like we should change it.

done

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pearltidy source code has been removed (pgindent)

2018-02-16 Thread Steven Lembark




-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com    +1 888 359 3508



Re: [HACKERS] [bug-fix] Cannot select big bytea values (~600MB)

2018-02-16 Thread Tom Lane
Anna Akenteva  writes:
> [ widen StringInfoData max length to size_t ]

I find this scary as heck.  Have you spent any time looking at the
side effects?  There are probably hundreds of places that expect that
stringinfos won't get larger than 1GB.

Also, I don't entirely see how this fixes your stated goal of being
able to select a bytea value whose textual representation exceeds
1GB.  The wire protocol can't support that either, and even if it did,
I wonder how many client programs could cope.  Extremely wide tuple
values create pain points in many places.

> And as it seems like quite a serious issue, would it be possible to 
> backport a fix for it to earlier versions?

Since this is an ABI break with very widely visible effects, there is
no chance whatsoever that it would be back-patched.

regards, tom lane



[HACKERS] [bug-fix] Cannot select big bytea values (~600MB)

2018-02-16 Thread Anna Akenteva

Hello!

If I create a big bytea value and try to select it from a table, I get 
an error, something like: "ERROR:  invalid memory alloc request size 
...".


So basically we can insert data into a table but then we can't even work 
with it. Sounds like a bug. Attaching a patch that fixes it (applies to 
2a41507dab0f293ff241fe8ae326065998668af8).


And as it seems like quite a serious issue, would it be possible to 
backport a fix for it to earlier versions?





HOW TO RECREATE:
1) generate some random data (in this case, 600 MB):
dd if=/dev/urandom of=rand.dat bs=1M count=600

2) postgres=# select lo_import('/PATH/TO/rand.dat');
 lo_import
---
 16397 [USE THIS ID FOR THE NEXT STEP]
(1 row)

3) postgres=# create table big_data as select (string_agg(data,'')) as 
data from pg_largeobject where loid =16397;

SELECT 1

4) postgres=# select * from big_data;
ERROR:  invalid memory alloc request size 1468006403



--
Anna Akenteva
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 9d4efc0..85296e9 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -1143,7 +1143,7 @@ HandleParallelMessage(ParallelContext *pcxt, int i, StringInfo msg)
 
 		default:
 			{
-elog(ERROR, "unrecognized message type received from parallel worker: %c (message length %d bytes)",
+elog(ERROR, "unrecognized message type received from parallel worker: %c (message length %zu bytes)",
 	 msgtype, msg->len);
 			}
 	}
diff --git a/src/backend/lib/stringinfo.c b/src/backend/lib/stringinfo.c
index 798a823..d99b8bb 100644
--- a/src/backend/lib/stringinfo.c
+++ b/src/backend/lib/stringinfo.c
@@ -45,7 +45,7 @@ makeStringInfo(void)
 void
 initStringInfo(StringInfo str)
 {
-	int			size = 1024;	/* initial default buffer size */
+	size_t		size = 1024;	/* initial default buffer size */
 
 	str->data = (char *) palloc(size);
 	str->maxlen = size;
@@ -80,7 +80,7 @@ appendStringInfo(StringInfo str, const char *fmt,...)
 	for (;;)
 	{
 		va_list		args;
-		int			needed;
+		size_t		needed;
 
 		/* Try to format the data. */
 		va_start(args, fmt);
@@ -110,10 +110,10 @@ appendStringInfo(StringInfo str, const char *fmt,...)
  * to redo va_start before you can rescan the argument list, and we can't do
  * that from here.
  */
-int
+size_t
 appendStringInfoVA(StringInfo str, const char *fmt, va_list args)
 {
-	int			avail;
+	size_t		avail;
 	size_t		nprinted;
 
 	Assert(str != NULL);
@@ -127,24 +127,20 @@ appendStringInfoVA(StringInfo str, const char *fmt, va_list args)
 	if (avail < 16)
 		return 32;
 
-	nprinted = pvsnprintf(str->data + str->len, (size_t) avail, fmt, args);
+	nprinted = pvsnprintf(str->data + str->len, avail, fmt, args);
 
-	if (nprinted < (size_t) avail)
+	if (nprinted < avail)
 	{
 		/* Success.  Note nprinted does not include trailing null. */
-		str->len += (int) nprinted;
+		str->len += nprinted;
 		return 0;
 	}
 
 	/* Restore the trailing null so that str is unmodified. */
 	str->data[str->len] = '\0';
 
-	/*
-	 * Return pvsnprintf's estimate of the space needed.  (Although this is
-	 * given as a size_t, we know it will fit in int because it's not more
-	 * than MaxAllocSize.)
-	 */
-	return (int) nprinted;
+	/* Return pvsnprintf's estimate of the space needed. */
+	return nprinted;
 }
 
 /*
@@ -189,7 +185,7 @@ appendStringInfoSpaces(StringInfo str, int count)
 	if (count > 0)
 	{
 		/* Make more room if needed */
-		enlargeStringInfo(str, count);
+		enlargeStringInfo(str, (size_t) count);
 
 		/* OK, append the spaces */
 		while (--count >= 0)
@@ -205,7 +201,7 @@ appendStringInfoSpaces(StringInfo str, int count)
  * if necessary. Ensures that a trailing null byte is present.
  */
 void
-appendBinaryStringInfo(StringInfo str, const char *data, int datalen)
+appendBinaryStringInfo(StringInfo str, const char *data, size_t datalen)
 {
 	Assert(str != NULL);
 
@@ -231,7 +227,7 @@ appendBinaryStringInfo(StringInfo str, const char *data, int datalen)
  * if necessary. Does not ensure a trailing null-byte exists.
  */
 void
-appendBinaryStringInfoNT(StringInfo str, const char *data, int datalen)
+appendBinaryStringInfoNT(StringInfo str, const char *data, size_t datalen)
 {
 	Assert(str != NULL);
 
@@ -261,21 +257,19 @@ appendBinaryStringInfoNT(StringInfo str, const char *data, int datalen)
  * This is the desired and indeed critical behavior!
  */
 void
-enlargeStringInfo(StringInfo str, int needed)
+enlargeStringInfo(StringInfo str, size_t needed)
 {
-	int			newlen;
+	size_t		newlen;
 
 	/*
 	 * Guard against out-of-range "needed" values.  Without this, we can get
 	 * an overflow or infinite loop in the following.
 	 */
-	if (needed < 0)/* should not happen */
-		elog(ERROR, "invalid string enlargement request size: %d", needed);
-	if (((Size) needed) >= (MaxAllocSize - (Size) str->len))
+	if (needed >= (MaxAllocHugeSize - str->len))
 		

Re: autovacuum: change priority of the vacuumed tables

2018-02-16 Thread Masahiko Sawada
On Fri, Feb 16, 2018 at 7:50 PM, Ildus Kurbangaliev
 wrote:
> On Fri, 16 Feb 2018 17:42:34 +0900
> Masahiko Sawada  wrote:
>
>> On Thu, Feb 15, 2018 at 10:16 PM, Grigory Smolkin
>>  wrote:
>> > On 02/15/2018 09:28 AM, Masahiko Sawada wrote:
>> >
>> >> Hi,
>> >>
>> >> On Thu, Feb 8, 2018 at 11:01 PM, Ildus Kurbangaliev
>> >>  wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> Attached patch adds 'autovacuum_table_priority' to the current
>> >>> list of automatic vacuuming settings. It's used in sorting of
>> >>> vacuumed tables in autovacuum worker before actual vacuum.
>> >>>
>> >>> The idea is to give possibility to the users to prioritize their
>> >>> tables in autovacuum process.
>> >>>
>> >> Hmm, I couldn't understand the benefit of this patch. Would you
>> >> elaborate it a little more?
>> >>
>> >> Multiple autovacuum worker can work on one database. So even if a
>> >> table that you want to vacuum first is the back of the list and
>> >> there other worker would pick up it. If the vacuuming the table
>> >> gets delayed due to some big tables are in front of that table I
>> >> think you can deal with it by increasing the number of autovacuum
>> >> workers.
>> >>
>> >> Regards,
>> >>
>> >> --
>> >> Masahiko Sawada
>> >> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
>> >> NTT Open Source Software Center
>> >>
>> >
>> > Database can contain thousands of tables and often updates/deletes
>> > concentrate mostly in only a handful of tables.
>> > Going through thousands of less bloated tables can take ages.
>> > Currently autovacuum know nothing about prioritizing it`s work with
>> > respect to user`s understanding of his data and application.
>>
>> Understood. I have a question; please imagine the following case.
>>
>> Suppose that there are 1000 tables in a database, and one table of
>> them (table-A) has the highest priority while other 999 tables have
>> same priority. Almost tables (say 800 tables) including table-A need
>> to get vacuumed at some point, so with your patch an AV worker listed
>> 800 tables and table-A will be at the head of the list. Table-A will
>> get vacuumed first but this AV worker has to vacuum other 799 tables
>> even if table-A requires vacuum later again.
>>
>> If an another AV worker launches during table-A being vacuumed, the
>> new AV worker would include table-A but would not process it because
>> concurrent AV worker is processing it. So it would vacuum other tables
>> instead. Similarly, this AV worker can not get the new table list
>> until finish to vacuum all other tables. (Note that it might skip some
>> tables if they are already vacuumed by other AV worker.) On the other
>> hand, if another new AV worker launches after table-A got vacuumed and
>> requires vacuuming again, the new AV worker puts the table-A at the
>> head of list. It processes table-A first but, again, it has to vacuum
>> other tables before getting new table list next time that might
>> include table-A.
>>
>> Is this the expected behavior? I'd rather expect postgres to vacuum it
>> before other lower priority tables whenever the table having the
>> highest priority requires vacuuming, but it wouldn't.
>
> Yes, this is the expected behavior. The patch is the way to give the
> user at least some control of the sorting, later it could be extended
> with something more sophisticated.
>

Since the user doesn't know that each AV worker processes tables based on
its own table list, which differs from the lists other workers have, I
think it's hard for the user to understand this parameter. I'd say that the
user would expect a high-priority table to get vacuumed at any time.

I think what you want to solve is vacuuming some tables preferentially
when there are many tables requiring vacuuming. Right? If so, I think
prioritizing the table only within each worker's list would not solve the
fundamental issue. In the example, table-A will still need to wait for the
other 799 tables to get vacuumed, and will keep bloating while those other
tables are vacuumed. To deal with that, I think we need some kind of
per-database queue in shared memory to control the order of tables waiting
for vacuuming, together with a smart algorithm. Thoughts?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: After an error - pg_replication_slot is dropped

2018-02-16 Thread tushar

On 02/16/2018 04:02 PM, Petr Jelinek wrote:

It's because you are creating temporary slot. Temporary slots are
removed on error, that's a documented behavior.


Thanks for pointing that out, but it looks like weird behavior - even a
small mea culpa can remove the slot. A temporary table doesn't go away
automatically after an error, does it?


--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




master check fails on Windows Server 2008

2018-02-16 Thread Marina Polyakova
Hello, hackers! I got a permanent failure of master (commit 
2a41507dab0f293ff241fe8ae326065998668af8) check on Windows Server 2008. 
Regression output and diffs as well as config.pl are attached.


I used the following commands:
build.bat > build.txt
vcregress.bat check > check.txt

Binary search has shown that this failure begins with commit 
bed9ef5a16239d91d97a1fa2efd9309c3cbbc4b2 (Rework the stats_ext test). On 
the previous commit (70ec3f1f8f0b753c38a1a582280a02930d7cac5f) the check 
passes.
I'm trying to figure out what went wrong, and any suspicions are 
welcome.


About the system: Windows Server 2008 R2 Standard, Service Pack 1, 
64-bit.

--
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

*** C:/Users/buildfarm/mpolyakova/postgrespro_master/src/test/regress/expected/stats_ext.out	Fri Feb 16 12:56:00 2018
--- C:/Users/buildfarm/mpolyakova/postgrespro_master/src/test/regress/results/stats_ext.out	Fri Feb 16 13:09:47 2018
***
*** 312,322 
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
   QUERY PLAN  
! -
!  HashAggregate
 Group Key: b, c, d
 ->  Seq Scan on ndistinct
! (3 rows)
  
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, d;
--- 312,324 
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY b, c, d;
  QUERY PLAN 
! ---
!  GroupAggregate
 Group Key: b, c, d
+->  Sort
+  Sort Key: b, c, d
   ->  Seq Scan on ndistinct
! (5 rows)
  
  EXPLAIN (COSTS off)
   SELECT COUNT(*) FROM ndistinct GROUP BY a, d;

==

test tablespace   ... ok
parallel group (20 tests):  boolean char name varchar text int2 int4 txid int8 
bit float8 float4 oid pg_lsn regproc rangetypes enum money uuid numeric
 boolean  ... ok
 char ... ok
 name ... ok
 varchar  ... ok
 text ... ok
 int2 ... ok
 int4 ... ok
 int8 ... ok
 oid  ... ok
 float4   ... ok
 float8   ... ok
 bit  ... ok
 numeric  ... ok
 txid ... ok
 uuid ... ok
 enum ... ok
 money... ok
 rangetypes   ... ok
 pg_lsn   ... ok
 regproc  ... ok
test strings  ... ok
test numerology   ... ok
parallel group (20 tests):  lseg box path line point tstypes reltime circle 
interval date time timetz macaddr8 tinterval abstime macaddr inet timestamptz 
timestamp polygon
 point... ok
 lseg ... ok
 line ... ok
 box  ... ok
 path ... ok
 polygon  ... ok
 circle   ... ok
 date ... ok
 time ... ok
 timetz   ... ok
 timestamp... ok
 timestamptz  ... ok
 interval ... ok
 abstime  ... ok
 reltime  ... ok
 tinterval... ok
 inet ... ok
 macaddr  ... ok
 macaddr8 ... ok
 tstypes  ... ok
parallel group (9 tests):  geometry horology expressions oidjoins misc_sanity 
type_sanity comments regex opr_sanity
 geometry ... ok
 horology ... ok
 regex... ok
 oidjoins ... ok
 type_sanity  ... ok
 opr_sanity   ... ok
 misc_sanity  ... ok
 comments ... ok
 expressions  ... ok
test insert   ... ok
test insert_conflict  ... ok
test create_function_1... ok
test create_type  ... ok
test create_table ... ok
test create_function_2... ok
parallel group (3 tests):  copy copydml copyselect
 copy ... ok
 copyselect   ... ok
 copydml  ... ok
parallel group (3 tests):  create_misc create_operator create_procedure
 create_misc  ... ok
 create_operator  ... ok
 create_procedure ... ok
parallel group (2 tests):  create_view create_index
 create_index ... ok
 create_view  ... ok
parallel group (15 tests):  create_aggregate create_function_3 create_cast 
roleattributes create_am typed_table hash_func vacuum drop_if_exists 
constraints 

Re: [HACKERS] Pluggable storage

2018-02-16 Thread Alexander Korotkov
Hi, Haribabu!

On Mon, Feb 5, 2018 at 2:22 PM, Haribabu Kommi 
wrote:

>
> On Tue, Jan 9, 2018 at 11:42 PM, Haribabu Kommi 
> wrote:
>
>>
>> Updated patches are attached.
>>
>
> To integrate the columnar store with the pluggable storage API, I found
> that
> there are couple of other things also that needs to be supported.
>
> 1. Choosing the right table access method for a particular table?
>
> I am thinking of adding a new table option to let the user select the
> correct table
> access method that the user wants for the table. HEAP is the default access
> method. This approach may be simple and doesn't need any syntax changes.
>
> Or Is it fine to add syntax "USING method" to CREATE TABLE similar like
> CREATE INDEX?
>
> comments?
>

Sure, the user needs to specify a particular table access method when
creating a table.  "USING method" is good for me.  I think it's better to
reuse already existing syntax rather than reinventing something new.
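
Roughly like this, with a made-up access method name, mirroring the
existing index syntax:

CREATE TABLE events (id int, payload jsonb) USING my_columnar;
CREATE INDEX ON events USING btree (id);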


> 2. As the columnar storage needs many other relations that need to be
> created along with the main relation.
>

That's an interesting point.  While experimenting with some new index access
methods, I also had to define some kind of "internal" relations, invisible
to the user but essential for the main relation's functionality.  I've made
them in a hackish manner, and I think a legal mechanism for them would be
very good.


> As these extra relations are internal to the storage and shouldn't be
> visible
> directly from pg_class and these will be stored in the storage specific
> catalog tables. A dependency is created for the original table as these
> storage
> specific tables must be created/dropped/altered whenever there is a change
> with the original table.
>

I think that internal relations should be visible in pg_class, otherwise
it wouldn't be possible to define dependencies on them.  But they should
have a different relkind, so they wouldn't get mixed up with regular
relations.

Is it fine to add new API while creating/altering/drop the table to get the
> control?
> or to use only exiting processutility hook?
>

I believe that we need some generic way of defining internal relations,
not hooks.  To me it seems like a useful feature for further development of
both index access methods and table access methods.

During the developer meeting [1], I proposed reordering the patches so that
the refactoring patches go first and the API introduction patches go
afterwards.  That would make it possible to commit some of the refactoring
patches earlier without necessarily committing the API in the same release
cycle.  If there is no objection, I will do that reordering.

BTW, EnterpriseDB has announced the zheap table access method (heap with
undo log) [2].  I think this is great, and I'm looking forward to zheap
being published on the mailing lists.  But I'm concerned about its
compatibility with the pluggable table access method API.  Does zheap use
the table AM API from this thread?  Or does it just override the current
heap and need to be adapted to use the table AM API?  Or does it implement
its own API?

*Links*

1. https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2018_Developer_Meeting#Minutes
2. http://rhaas.blogspot.com.by/2018/01/do-or-undo-there-is-no-vacuum.html

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: non-bulk inserts and tuple routing

2018-02-16 Thread Etsuro Fujita

(2018/02/16 18:23), Amit Langote wrote:

On 2018/02/16 18:12, Etsuro Fujita wrote:

In the patch you added the comments:

+   wcoList = linitial(node->withCheckOptionLists);
+
+   /*
+* Convert Vars in it to contain this partition's attribute numbers.
+* Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
+* reference to calculate attno's for the returning expression of
+* this partition.  In the INSERT case, that refers to the root
+* partitioned table, whereas in the UPDATE tuple routing case the
+* first partition in the mtstate->resultRelInfo array.  In any case,
+* both that relation and this partition should have the same
columns,
+* so we should be able to map attributes successfully.
+*/
+   wcoList = map_partition_varattnos(wcoList, firstVarno,
+ partrel, firstResultRel, NULL);

This would be just nitpicking, but I think it would be better to arrange
these comments, maybe by dividing these to something like this:

/*
 * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
 * reference to calculate attno's for the returning expression of
 * this partition.  In the INSERT case, that refers to the root
 * partitioned table, whereas in the UPDATE tuple routing case the
 * first partition in the mtstate->resultRelInfo array.  In any case,
 * both that relation and this partition should have the same columns,
 * so we should be able to map attributes successfully.
 */
wcoList = linitial(node->withCheckOptionLists);

/*
 * Convert Vars in it to contain this partition's attribute numbers.
 */
wcoList = map_partition_varattnos(wcoList, firstVarno,
  partrel, firstResultRel, NULL);

I'd say the same thing to the comments you added for RETURNING.


Good idea.  Done.


Thanks.  I fixed a typo (s/Converti/Convert/) and adjusted these 
comments a bit further to match the preceding code/comments.  Attached 
is the updated version.



Another thing I noticed about comments in the patch is:

+* In the case of INSERT on partitioned tables, there is only one
+* plan.  Likewise, there is only one WCO list, not one per
+* partition.  For UPDATE, there would be as many WCO lists as
+* there are plans, but we use the first one as reference.  Note
+* that if there are SubPlans in there, they all end up attached
+* to the one parent Plan node.

The patch calls ExecInitQual with mtstate->mt_plans[0] for initializing
WCO constraints, which would change the place to attach SubPlans in WCO
constraints from the parent PlanState to the subplan's PlanState, which
would make the last comment obsolete.  Since this change would be more
consistent with PG10, I don't think we need to update the comment as such;
I would vote for just removing that comment from here.


I have thought about removing it too, so done.


OK.


Updated patch attached.


Thanks, I think the patch is in good shape, so I'll mark this as ready 
for committer.


Best regards,
Etsuro Fujita
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b3933df..118452b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2470,7 +2470,7 @@ CopyFrom(CopyState cstate)
 		PartitionTupleRouting *proute;
 
 		proute = cstate->partition_tuple_routing =
-			ExecSetupPartitionTupleRouting(NULL, cstate->rel, 1, estate);
+			ExecSetupPartitionTupleRouting(NULL, cstate->rel);
 
 		/*
 		 * If we are capturing transition tuples, they may need to be
@@ -2607,6 +2607,14 @@ CopyFrom(CopyState cstate)
 			 */
 			saved_resultRelInfo = resultRelInfo;
 			resultRelInfo = proute->partitions[leaf_part_index];
+			if (resultRelInfo == NULL)
+			{
+resultRelInfo = ExecInitPartitionInfo(NULL,
+	  saved_resultRelInfo,
+	  proute, estate,
+	  leaf_part_index);
+Assert(resultRelInfo != NULL);
+			}
 
 			/* We do not yet have a way to insert into a foreign partition */
 			if (resultRelInfo->ri_FdwRoutine)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 4048c3e..06a7607 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -44,18 +44,23 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
  *
  * Note that all the relations in the partition tree are locked using the
  * RowExclusiveLock mode upon return from this function.
+ *
+ * While we allocate the arrays of pointers of ResultRelInfo and
+ * TupleConversionMap for all partitions here, actual objects themselves are
+ * lazily allocated for a given partition if a tuple is actually routed to it;
+ * see ExecInitPartitionInfo.  However, if the function is invoked for update
+ * tuple routing, caller would already 

Re: autovacuum: change priority of the vacuumed tables

2018-02-16 Thread Ildus Kurbangaliev
On Fri, 16 Feb 2018 17:42:34 +0900
Masahiko Sawada  wrote:

> On Thu, Feb 15, 2018 at 10:16 PM, Grigory Smolkin
>  wrote:
> > On 02/15/2018 09:28 AM, Masahiko Sawada wrote:
> >  
> >> Hi,
> >>
> >> On Thu, Feb 8, 2018 at 11:01 PM, Ildus Kurbangaliev
> >>  wrote:  
> >>>
> >>> Hi,
> >>>
> >>> Attached patch adds 'autovacuum_table_priority' to the current
> >>> list of automatic vacuuming settings. It's used in sorting of
> >>> vacuumed tables in autovacuum worker before actual vacuum.
> >>>
> >>> The idea is to give possibility to the users to prioritize their
> >>> tables in autovacuum process.
> >>>  
> >> Hmm, I couldn't understand the benefit of this patch. Would you
> >> elaborate it a little more?
> >>
> >> Multiple autovacuum worker can work on one database. So even if a
> >> table that you want to vacuum first is the back of the list and
> >> there other worker would pick up it. If the vacuuming the table
> >> gets delayed due to some big tables are in front of that table I
> >> think you can deal with it by increasing the number of autovacuum
> >> workers.
> >>
> >> Regards,
> >>
> >> --
> >> Masahiko Sawada
> >> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> >> NTT Open Source Software Center
> >>  
> >
> > Database can contain thousands of tables and often updates/deletes
> > concentrate mostly in only a handful of tables.
> > Going through thousands of less bloated tables can take ages.
> > Currently autovacuum know nothing about prioritizing it`s work with
> > respect to user`s understanding of his data and application.  
> 
> Understood. I have a question; please imagine the following case.
> 
> Suppose that there are 1000 tables in a database, and one table of
> them (table-A) has the highest priority while other 999 tables have
> same priority. Almost tables (say 800 tables) including table-A need
> to get vacuumed at some point, so with your patch an AV worker listed
> 800 tables and table-A will be at the head of the list. Table-A will
> get vacuumed first but this AV worker has to vacuum other 799 tables
> even if table-A requires vacuum later again.
> 
> If an another AV worker launches during table-A being vacuumed, the
> new AV worker would include table-A but would not process it because
> concurrent AV worker is processing it. So it would vacuum other tables
> instead. Similarly, this AV worker can not get the new table list
> until finish to vacuum all other tables. (Note that it might skip some
> tables if they are already vacuumed by other AV worker.) On the other
> hand, if another new AV worker launches after table-A got vacuumed and
> requires vacuuming again, the new AV worker puts the table-A at the
> head of list. It processes table-A first but, again, it has to vacuum
> other tables before getting new table list next time that might
> include table-A.
> 
> Is this the expected behavior? I'd rather expect postgres to vacuum it
> before other lower priority tables whenever the table having the
> highest priority requires vacuuming, but it wouldn't.

Yes, this is the expected behavior. The patch is a way to give the
user at least some control over the sorting; later it could be extended
with something more sophisticated.
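
For illustration, the option is set like the other per-table autovacuum
settings (the table name is made up; the value semantics are as defined in
the patch):

ALTER TABLE hot_table SET (autovacuum_table_priority = 1);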


-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Server Crash while executing pg_replication_slot_advance (second time)

2018-02-16 Thread tushar

Hi,

Getting another server crash against the latest sources of v11 while
executing pg_replication_slot_advance a second time. This issue is also
reproducible with the patch given at



Test-Case scenario-

postgres=#  SELECT * FROM 
pg_create_logical_replication_slot('regression_slot1', 'test_decoding', 
true);

    slot_name |    lsn
--+---
 regression_slot1 | 0/4000258
(1 row)

postgres=# select * from pg_replication_slots;
    slot_name |    plugin | slot_type | datoid | database | 
temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | 
confirmed_flush_lsn

--+---+---++--+---+++--+--+-+-
 regression_slot1 | test_decoding | logical   |  13220 | postgres | 
t | t  |   8756 |  |  557 | 0/4000220   | 
0/4000258

(1 row)

postgres=# select pg_switch_wal();
 pg_switch_wal
---
 0/4000270
(1 row)

postgres=# select pg_switch_wal();
 pg_switch_wal
---
 0/500
(1 row)

postgres=# SELECT end_lsn FROM 
pg_replication_slot_advance('regression_slot1', '0/500');

  end_lsn
---
 0/500
(1 row)

postgres=# select * from pg_replication_slots;
    slot_name |    plugin | slot_type | datoid | database | 
temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | 
confirmed_flush_lsn

--+---+---++--+---+++--+--+-+-
 regression_slot1 | test_decoding | logical   |  13220 | postgres | 
t | t  |   8756 |  |  557 | 0/4000220   | 
0/500

(1 row)

postgres=# select pg_switch_wal();
 pg_switch_wal
---
 0/578
(1 row)

postgres=# select pg_switch_wal();
 pg_switch_wal
---
 0/600
(1 row)

postgres=# SELECT end_lsn FROM 
pg_replication_slot_advance('regression_slot1', '0/600');

server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> \q

Stack trace -

Core was generated by `postgres: centos postgres [local] 
SELECT  '.

Program terminated with signal 6, Aborted.
#0  0x003746e325e5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install 
glibc-2.12-1.192.el6.x86_64 keyutils-libs-1.4-5.el6.x86_64 
krb5-libs-1.10.3-57.el6.x86_64 libcom_err-1.41.12-22.el6.x86_64 
libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-48.el6_8.4.x86_64 
zlib-1.2.3-29.el6.x86_64

(gdb) bt
#0  0x003746e325e5 in raise () from /lib64/libc.so.6
#1  0x003746e33dc5 in abort () from /lib64/libc.so.6
#2  0x00a11f8a in ExceptionalCondition (conditionName=0xad5f00 
"!(((RecPtr) % 8192 >= (((uintptr_t) ((sizeof(XLogPageHeaderData))) + 
((8) - 1)) & ~((uintptr_t) ((8) - 1)",
    errorType=0xad5eed "FailedAssertion", fileName=0xad5ee0 
"xlogreader.c", lineNumber=243) at assert.c:54
#3  0x0055df1d in XLogReadRecord (state=0x2d859c0, 
RecPtr=83886080, errormsg=0x7ffd6a4359f8) at xlogreader.c:243
#4  0x008516bc in pg_logical_replication_slot_advance 
(startlsn=83886080, moveto=100663296) at slotfuncs.c:370
#5  0x00851a21 in pg_replication_slot_advance 
(fcinfo=0x7ffd6a435bb0) at slotfuncs.c:488
#6  0x006defdc in ExecMakeTableFunctionResult 
(setexpr=0x2d5e260, econtext=0x2d5df58, argContext=0x2da97f0, 
expectedDesc=0x2d5f780, randomAccess=0 '\000') at execSRF.c:231
#7  0x006f1782 in FunctionNext (node=0x2d5de40) at 
nodeFunctionscan.c:95
#8  0x006de80d in ExecScanFetch (node=0x2d5de40, 
accessMtd=0x6f16cb , recheckMtd=0x6f1abd 
) at execScan.c:95
#9  0x006de8ad in ExecScan (node=0x2d5de40, accessMtd=0x6f16cb 
, recheckMtd=0x6f1abd ) at execScan.c:162
#10 0x006f1b0a in ExecFunctionScan (pstate=0x2d5de40) at 
nodeFunctionscan.c:270
#11 0x006dcd66 in ExecProcNodeFirst (node=0x2d5de40) at 
execProcnode.c:446
#12 0x006d3c05 in ExecProcNode (node=0x2d5de40) at 
../../../src/include/executor/executor.h:246
#13 0x006d6559 in ExecutePlan (estate=0x2d5dc28, 
planstate=0x2d5de40, use_parallel_mode=0 '\000', operation=CMD_SELECT, 
sendTuples=1 '\001', numberTuples=0,
    direction=ForwardScanDirection, dest=0x2d63628, execute_once=1 
'\001') at execMain.c:1721
#14 0x006d41d7 in standard_ExecutorRun (queryDesc=0x2cce878, 
direction=ForwardScanDirection, count=0, execute_once=1 '\001') at 
execMain.c:361
#15 0x006d3ff3 in ExecutorRun (queryDesc=0x2cce878, 
direction=ForwardScanDirection, count=0, execute_once=1 '\001') at 
execMain.c:304
#16 0x008afeeb in PortalRunSelect (portal=0x2d0e838, 
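
(For reference; my annotation, not part of the report: the mangled assertion
text above is the XRecOffIsValid() sanity check near the top of
XLogReadRecord(), roughly paraphrased below.)

    /* xlog_internal.h: a record may not start inside the page header */
    #define XRecOffIsValid(xlrp) \
        ((xlrp) % XLOG_BLCKSZ >= SizeOfXLogShortPHD)

    /* xlogreader.c, XLogReadRecord(): a caller-supplied start position
     * must already point at a valid record offset */
    Assert(XRecOffIsValid(RecPtr));

The start position in frame #3 (RecPtr=83886080, i.e. 0/5000000, an exact
segment boundary with page offset 0) is what trips the check here.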

Re: After an error - pg_replication_slot is dropped

2018-02-16 Thread Petr Jelinek
Hi,

On 16/02/18 10:51, tushar wrote:
> Hi,
> 
> Please refer to this straightforward scenario against the latest sources of v11.
> 
> [centos@centos-cpula bin]$ ./psql  postgres
> psql (11devel)
> Type "help" for help.
> 
> postgres=#  SELECT * FROM
> pg_create_logical_replication_slot('regression_slot1', 'test_decoding',
> true);
>     slot_name |    lsn
> --+---
>  regression_slot1 | 0/40001E8
> (1 row)
> 
> postgres=#
> postgres=# select * from pg_replication_slots;
>     slot_name |    plugin | slot_type | datoid | database |
> temporary | active | active_pid | xmin | catalog_xmin | restart_lsn |
> confirmed_flush_lsn
> --+---+---++--+---+++--+--+-+-
> 
>  regression_slot1 | test_decoding | logical   |  13220 | postgres |
> t | t  |  28015 |  |  557 | 0/40001B0 |
> 0/40001E8
> (1 row)
> 
> -- Try to create the same slot again; getting an error, which is expected
> postgres=#
> postgres=#  SELECT * FROM
> pg_create_logical_replication_slot('regression_slot1', 'test_decoding',
> true);
> ERROR:  replication slot "regression_slot1" already exists
> postgres=#
> 
> --No slot found
> postgres=# select * from pg_replication_slots;
>  slot_name | plugin | slot_type | datoid | database | temporary | active
> | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
> ---++---++--+---+++--+--+-+-
> 
> (0 rows)
> 

It's because you are creating a temporary slot. Temporary slots are
removed on error; that's documented behavior.
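
(A small illustration for comparison, mine rather than from the original mail:
leaving out the third argument, which defaults to false, creates a persistent
slot that survives errors and has to be dropped explicitly.)

    SELECT * FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding');
    -- an ERROR later in the same session no longer removes the slot
    SELECT slot_name, temporary FROM pg_replication_slots;
    SELECT pg_drop_replication_slot('regression_slot1');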

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Server crash in pg_replication_slot_advance function

2018-02-16 Thread Masahiko Sawada
On Fri, Feb 16, 2018 at 6:55 PM, amul sul  wrote:
> On Fri, Feb 16, 2018 at 3:06 PM, amul sul  wrote:
>> On Fri, Feb 16, 2018 at 1:44 PM, tushar  
>> wrote:
>>> Hi,
>> []
>>> postgres=# SELECT end_lsn FROM
>>> pg_replication_slot_advance('regression_slot1', '0/271');
>>> server closed the connection unexpectedly
>>> This probably means the server terminated abnormally
>>>  before or while processing the request.
>>> !>
>>>
>>
>> I am able to reproduce this on the latest master head. The problem is in the
>> following hunk of pg_replication_slot_advance(), where the oldest LSN value
>> is accessed after releasing the replication slot:
>>
>>
>> 476 if (moveto < startlsn)
>> 477 {
>> 478 ReplicationSlotRelease();
>> 479 ereport(ERROR,
>> 480 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> 481  errmsg("cannot move slot to %X/%X, minimum is %X/%X",
>> 482 (uint32) (moveto >> 32), (uint32) moveto,
>> 483 (uint32)
>> (MyReplicationSlot->data.confirmed_flush >> 32),
>> 484 (uint32)
>> (MyReplicationSlot->data.confirmed_flush;
>> 485 }
>> 486
>>
>
>  Attached patch proposes a required fix.
>

The change looks good to me. Thank you.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Server crash in pg_replication_slot_advance function

2018-02-16 Thread Petr Jelinek
On 16/02/18 10:55, amul sul wrote:
> On Fri, Feb 16, 2018 at 3:06 PM, amul sul  wrote:
>> On Fri, Feb 16, 2018 at 1:44 PM, tushar  
>> wrote:
>>> Hi,
>> []
>>> postgres=# SELECT end_lsn FROM
>>> pg_replication_slot_advance('regression_slot1', '0/271');
>>> server closed the connection unexpectedly
>>> This probably means the server terminated abnormally
>>>  before or while processing the request.
>>> !>
>>>
>>
>> I am able to reproduce this on the latest master head. The problem is in the
>> following hunk of pg_replication_slot_advance(), where the oldest LSN value
>> is accessed after releasing the replication slot:
>>
>>
>> 476 if (moveto < startlsn)
>> 477 {
>> 478 ReplicationSlotRelease();
>> 479 ereport(ERROR,
>> 480 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
>> 481  errmsg("cannot move slot to %X/%X, minimum is %X/%X",
>> 482 (uint32) (moveto >> 32), (uint32) moveto,
>> 483 (uint32)
>> (MyReplicationSlot->data.confirmed_flush >> 32),
>> 484 (uint32)
>> (MyReplicationSlot->data.confirmed_flush;
>> 485 }
>> 486
>>

Yeah, of course we can't use MyReplicationSlot after calling
ReplicationSlotRelease().
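
(A rough sketch for illustration, not the attached patch itself: one way to
avoid touching the slot after release is to copy the minimum LSN into a local
variable while the slot is still held, along these lines.)

    XLogRecPtr  minlsn;

    /* read the value while we still hold MyReplicationSlot */
    minlsn = MyReplicationSlot->data.confirmed_flush;

    if (moveto < minlsn)
    {
        ReplicationSlotRelease();
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("cannot move slot to %X/%X, minimum is %X/%X",
                        (uint32) (moveto >> 32), (uint32) moveto,
                        (uint32) (minlsn >> 32), (uint32) minlsn)));
    }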

> 
>  Attached patch proposes a required fix.
> 

Looks correct to me.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Server crash in pg_replication_slot_advance function

2018-02-16 Thread tushar

On 02/16/2018 03:25 PM, amul sul wrote:

  Attached patch proposes a required fix.


Thanks. The issue seems to be fixed with this patch; now getting the
expected error - ERROR:  cannot move slot to 0/290, minimum is 0/298


--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: Server crash in pg_replication_slot_advance function

2018-02-16 Thread amul sul
On Fri, Feb 16, 2018 at 3:06 PM, amul sul  wrote:
> On Fri, Feb 16, 2018 at 1:44 PM, tushar  wrote:
>> Hi,
> []
>> postgres=# SELECT end_lsn FROM
>> pg_replication_slot_advance('regression_slot1', '0/271');
>> server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>>  before or while processing the request.
>> !>
>>
>
> I am able to reproduce this on the latest master head, the problem is in the
> following hunk of pg_replication_slot_advance() where oldest LSN value
> is accessed after releasing replication slot:
>
>
> 476 if (moveto < startlsn)
> 477 {
> 478 ReplicationSlotRelease();
> 479 ereport(ERROR,
> 480 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> 481  errmsg("cannot move slot to %X/%X, minimum is %X/%X",
> 482 (uint32) (moveto >> 32), (uint32) moveto,
> 483 (uint32)
> (MyReplicationSlot->data.confirmed_flush >> 32),
> 484 (uint32)
> (MyReplicationSlot->data.confirmed_flush;
> 485 }
> 486
>

 Attached patch proposes a required fix.


Regards,
Amul


fix_crash_in_pg_replication_slot_advance.patch
Description: Binary data


After an error - pg_replication_slot is dropped

2018-02-16 Thread tushar

Hi,

Please refer to this straightforward scenario against the latest sources of v11.

[centos@centos-cpula bin]$ ./psql  postgres
psql (11devel)
Type "help" for help.

postgres=#  SELECT * FROM 
pg_create_logical_replication_slot('regression_slot1', 'test_decoding', 
true);

    slot_name |    lsn
--+---
 regression_slot1 | 0/40001E8
(1 row)

postgres=#
postgres=# select * from pg_replication_slots;
    slot_name |    plugin | slot_type | datoid | database | 
temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | 
confirmed_flush_lsn

--+---+---++--+---+++--+--+-+-
 regression_slot1 | test_decoding | logical   |  13220 | postgres | 
t | t  |  28015 |  |  557 | 0/40001B0 | 
0/40001E8

(1 row)

-- Try to create the same slot again; getting an error, which is expected
postgres=#
postgres=#  SELECT * FROM 
pg_create_logical_replication_slot('regression_slot1', 'test_decoding', 
true);

ERROR:  replication slot "regression_slot1" already exists
postgres=#

--No slot found
postgres=# select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary | 
active | active_pid | xmin | catalog_xmin | restart_lsn | 
confirmed_flush_lsn

---++---++--+---+++--+--+-+-
(0 rows)

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: Server crash in pg_replication_slot_advance function

2018-02-16 Thread amul sul
On Fri, Feb 16, 2018 at 1:44 PM, tushar  wrote:
> Hi,
[]
> postgres=# SELECT end_lsn FROM
> pg_replication_slot_advance('regression_slot1', '0/271');
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
>  before or while processing the request.
> !>
>

I am able to reproduce this on the latest master head. The problem is in the
following hunk of pg_replication_slot_advance(), where the oldest LSN value
is accessed after releasing the replication slot:


476 if (moveto < startlsn)
477 {
478 ReplicationSlotRelease();
479 ereport(ERROR,
480 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
481  errmsg("cannot move slot to %X/%X, minimum is %X/%X",
482 (uint32) (moveto >> 32), (uint32) moveto,
483 (uint32)
(MyReplicationSlot->data.confirmed_flush >> 32),
484 (uint32)
(MyReplicationSlot->data.confirmed_flush;
485 }
486

Regards,
Amul



Re: non-bulk inserts and tuple routing

2018-02-16 Thread Amit Langote
Fujita-san,

Thanks for the review.

On 2018/02/16 18:12, Etsuro Fujita wrote:
> (2018/02/16 13:42), Amit Langote wrote:
>> Attached v9.  Thanks for the review!
> 
> Thanks for the updated patch!  In the patch you added the comments:
> 
> +   wcoList = linitial(node->withCheckOptionLists);
> +
> +   /*
> +    * Convert Vars in it to contain this partition's attribute numbers.
> +    * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
> +    * reference to calculate attno's for the returning expression of
> +    * this partition.  In the INSERT case, that refers to the root
> +    * partitioned table, whereas in the UPDATE tuple routing case the
> +    * first partition in the mtstate->resultRelInfo array.  In any case,
> +    * both that relation and this partition should have the same
> columns,
> +    * so we should be able to map attributes successfully.
> +    */
> +   wcoList = map_partition_varattnos(wcoList, firstVarno,
> + partrel, firstResultRel, NULL);
> 
> This would be just nitpicking, but I think it would be better to arrange
> these comments, maybe by dividing them into something like this:
> 
>    /*
>     * Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
>     * reference to calculate attno's for the returning expression of
>     * this partition.  In the INSERT case, that refers to the root
>     * partitioned table, whereas in the UPDATE tuple routing case the
>     * first partition in the mtstate->resultRelInfo array.  In any case,
>     * both that relation and this partition should have the same columns,
>     * so we should be able to map attributes successfully.
>     */
>    wcoList = linitial(node->withCheckOptionLists);
> 
>    /*
>     * Convert Vars in it to contain this partition's attribute numbers.
>     */
>    wcoList = map_partition_varattnos(wcoList, firstVarno,
>  partrel, firstResultRel, NULL);
> 
> I'd say the same thing to the comments you added for RETURNING.

Good idea.  Done.

> Another thing I noticed about comments in the patch is:
> 
> +    * In the case of INSERT on partitioned tables, there is only one
> +    * plan.  Likewise, there is only one WCO list, not one per
> +    * partition.  For UPDATE, there would be as many WCO lists as
> +    * there are plans, but we use the first one as reference.  Note
> +    * that if there are SubPlans in there, they all end up attached
> +    * to the one parent Plan node.
> 
> The patch calls ExecInitQual with mtstate->mt_plans[0] for initializing
> WCO constraints, which would change the place to attach SubPlans in WCO
> constraints from the parent PlanState to the subplan's PlanState, which
> would make the last comment obsolete.  Since this change would be more
> consistent with PG10, I don't think we need to update the comment as such;
> I would vote for just removing that comment from here.

I have thought about removing it too, so done.

Updated patch attached.

Thanks,
Amit
From 03c0f265537fc618d5f0de2c7b7e487b89af014d Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 19 Dec 2017 16:20:09 +0900
Subject: [PATCH v10] During tuple-routing, initialize per-partition objects
 lazily

Those objects include ResultRelInfo, tuple conversion map,
WITH CHECK OPTION quals and RETURNING projections.

This means we don't allocate these objects for partitions that are
never inserted into.
---
 src/backend/commands/copy.c|  10 +-
 src/backend/executor/execPartition.c   | 309 -
 src/backend/executor/nodeModifyTable.c | 131 ++
 src/include/executor/execPartition.h   |   9 +-
 4 files changed, 252 insertions(+), 207 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b3933df9af..118452b602 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -2470,7 +2470,7 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute;
 
proute = cstate->partition_tuple_routing =
-   ExecSetupPartitionTupleRouting(NULL, cstate->rel, 1, 
estate);
+   ExecSetupPartitionTupleRouting(NULL, cstate->rel);
 
/*
 * If we are capturing transition tuples, they may need to be
@@ -2607,6 +2607,14 @@ CopyFrom(CopyState cstate)
 */
saved_resultRelInfo = resultRelInfo;
resultRelInfo = proute->partitions[leaf_part_index];
+   if (resultRelInfo == NULL)
+   {
+   resultRelInfo = ExecInitPartitionInfo(NULL,
+   
  saved_resultRelInfo,
+ 
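
(The hunk above is cut off by the archive.  For orientation only, a rough
paraphrase of the lazy-initialization pattern the patch introduces in the
COPY tuple-routing path follows; the exact argument list of
ExecInitPartitionInfo() is approximated, not copied from the patch.)

    saved_resultRelInfo = resultRelInfo;
    resultRelInfo = proute->partitions[leaf_part_index];
    if (resultRelInfo == NULL)
    {
        /*
         * First tuple routed to this partition: only now build its
         * ResultRelInfo, tuple conversion map, WCO quals and RETURNING
         * projection, instead of doing that for every partition up front.
         */
        resultRelInfo = ExecInitPartitionInfo(NULL, /* no ModifyTableState in COPY */
                                              saved_resultRelInfo,
                                              proute, estate,
                                              leaf_part_index);
    }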

Re: non-bulk inserts and tuple routing

2018-02-16 Thread Etsuro Fujita

(2018/02/16 13:42), Amit Langote wrote:

On 2018/02/16 12:41, Etsuro Fujita wrote:

(2018/02/16 10:49), Amit Langote wrote:

I think you're right.  If node->returningLists is non-NULL at all,
ExecInitModifyTable() would've initialized the needed slot and expression
context.  I added Assert()s to that effect.


OK, but one thing I'd like to ask is:

+   /*
+* Use the slot that would have been set up in ExecInitModifyTable()
+* for the output of the RETURNING projection(s).  Just make sure to
+* assign its rowtype using the RETURNING list.
+*/
+   Assert(mtstate->ps.ps_ResultTupleSlot != NULL);
+   tupDesc = ExecTypeFromTL(returningList, false);
+   ExecAssignResultType(&mtstate->ps, tupDesc);
+   slot = mtstate->ps.ps_ResultTupleSlot;

Do we need that assignment here?


I guess you mean the assignment of rowtype, that is, the
ExecAssignResultType() line.


That's right.


On looking at this some more, it looks like
we don't need to ExecAssignResultType here, as you seem to be suspecting,
because we want the RETURNING projection output to use the rowtype of the
first of returningLists and that's what mtstate->ps.ps_ResultTupleSlot has
been set to use in the first place.


Yeah, I think so, too.


So, removed the ExecAssignResultType().


OK, thanks.


Attached v9.  Thanks for the review!


Thanks for the updated patch!  In the patch you added the comments:

+   wcoList = linitial(node->withCheckOptionLists);
+
+   /*
+* Convert Vars in it to contain this partition's attribute numbers.
+* Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
+* reference to calculate attno's for the returning expression of
+* this partition.  In the INSERT case, that refers to the root
+* partitioned table, whereas in the UPDATE tuple routing case the
+* first partition in the mtstate->resultRelInfo array.  In any 
case,
+* both that relation and this partition should have the same 
columns,

+* so we should be able to map attributes successfully.
+*/
+   wcoList = map_partition_varattnos(wcoList, firstVarno,
+ partrel, firstResultRel, NULL);

This would be just nitpicking, but I think it would be better to arrange
these comments, maybe by dividing them into something like this:


   /*
* Use the WITH CHECK OPTIONS list of the first resultRelInfo as a
* reference to calculate attno's for the returning expression of
* this partition.  In the INSERT case, that refers to the root
* partitioned table, whereas in the UPDATE tuple routing case the
* first partition in the mtstate->resultRelInfo array.  In any 
case,
* both that relation and this partition should have the same 
columns,

* so we should be able to map attributes successfully.
*/
   wcoList = linitial(node->withCheckOptionLists);

   /*
* Convert Vars in it to contain this partition's attribute numbers.
*/
   wcoList = map_partition_varattnos(wcoList, firstVarno,
 partrel, firstResultRel, NULL);

I'd say the same thing to the comments you added for RETURNING.

Another thing I noticed about comments in the patch is:

+* In the case of INSERT on partitioned tables, there is only one
+* plan.  Likewise, there is only one WCO list, not one per
+* partition.  For UPDATE, there would be as many WCO lists as
+* there are plans, but we use the first one as reference.  Note
+* that if there are SubPlans in there, they all end up attached
+* to the one parent Plan node.

The patch calls ExecInitQual with mtstate->mt_plans[0] for initializing 
WCO constraints, which would change the place to attach SubPlans in WCO 
constraints from the parent PlanState to the subplan's PlanState, which 
would make the last comment obsolete.  Since this change would be more 
consistent with PG10, I don't think we need to update the comment as 
such; I would vote for just removing that comment from here.


Best regards,
Etsuro Fujita



Re: Contention preventing locking

2018-02-16 Thread Michail Nikolaev
Hello.

Just want to note that this work also correlates with the
https://www.postgresql.org/message-id/CAEepm%3D18buPTwNWKZMrAXLqja1Tvezw6sgFJKPQ%2BsFFTuwM0bQ%40mail.gmail.com
paper.
It provides a more general way to address the issue compared to single
optimisations (but they could do the job too, of course).

Thanks,
Michail.


Thu, 15 Feb 2018 at 19:00, Konstantin Knizhnik :

> Hi,
>
> PostgreSQL performance degrades significantly under high contention.
> You can look at the attached YCSB results (ycsb-zipf-pool.png) to
> estimate the level of this degradation.
>
> Postgres acquires two kinds of heavyweight locks during an update: it
> locks the TID of the updated tuple and the XID of the transaction that
> created this version.
> In the debugger I see the following picture: if several transactions are
> trying to update the same record, then first of all they compete for the
> SnapshotDirty.xmax transaction lock in EvalPlanQualFetch.
> Then in heap_tuple_update they try to lock the TID of the updated
> tuple: heap_acquire_tuplock in heapam.c.
> After that, transactions wait for completion of the transaction that
> updated the tuple: XactLockTableWait in heapam.c.
>
> So in heap_acquire_tuplock all competing transactions are waiting for the
> TID of the updated version. When the transaction which changed this tuple
> is committed, one of the competitors will be granted this lock and proceed,
> creating a new version of the tuple. Then all other competitors will be
> awakened and ... find out that the locked tuple is not the latest version
> any more. Then they will locate the new version and try to lock it... The
> more competitors we have, the more attempts they all have to perform
> (quadratic complexity).
>
> My idea was that we can avoid such performance degradation if we somehow
> queue competing transactions so that they get control one by one,
> without doing useless work.
> First of all I tried to replace the TID lock with a PK (primary key) lock.
> Unlike the TID, the PK of a record is not changed during a hot update. The
> second idea is that instead of immediately releasing the lock after the
> update, we can hold it until the end of the transaction. And this
> optimization really does improve performance - it corresponds to the
> pg_f1_opt configuration in the attached diagram.
> For some workloads it provides up to a two times improvement compared
> with vanilla Postgres. But there are many problems with a correct
> (non-prototype) implementation of this approach:
> handling different types of PK, including compound keys, PK updates, ...
>
> Another approach, which I have recently implemented, is much simpler
> and addresses another lock kind: the transaction lock.
> The idea of this approach is mostly the same: all competing transactions
> are waiting for the transaction which is changing the tuple.
> Then one of them is given a chance to proceed, and now all other
> transactions are waiting for this transaction, and so on:
>
> T1<-T2,T3,T4,T5,T6,T7,T8,T9
> T2<-T3,T4,T5,T6,T7,T8,T9
> T3<-T4,T5,T6,T7,T8,T9
> ...
>
> <- here corresponds to "wait for" dependency between transactions.
>
> If we change this picture to
>
> T1<-T2, T2<-T3, T3<-T4, T4<-T5, T5<-T6, T6<-T7, T7<-T8, T8<-T9
>
> then the number of lock requests can be significantly reduced.
> And it can be implemented with a very simple patch (attached xlock.patch).
> I hope that I have not done something incorrect here.
> The effect of this simple patch is even larger: more than 3 times for
> 50..70 clients.
> Please look at the attached xlock.pdf spreadsheet.
>
> Unfortunately the combination of these two approaches gives a worse result
> than xlock.patch alone.
> Certainly this patch also requires further improvement; for example, it
> will not work correctly when transactions abort due to deadlock
> or some other reason.
> I just want to know the opinion of the community on whether the proposed
> approaches to reduce contention are really promising and it makes sense
> to continue work in this direction.
>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
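
(An editorial illustration, not from the thread: the contention pattern
described above is easy to reproduce with a single hot row updated from many
concurrent sessions, for example by having all pgbench clients run the same
UPDATE.)

    CREATE TABLE hot (id int PRIMARY KEY, counter int NOT NULL);
    INSERT INTO hot VALUES (1, 0);

    -- run concurrently from many sessions/clients:
    UPDATE hot SET counter = counter + 1 WHERE id = 1;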


Re: autovacuum: change priority of the vacuumed tables

2018-02-16 Thread Masahiko Sawada
On Thu, Feb 15, 2018 at 10:16 PM, Grigory Smolkin
 wrote:
> On 02/15/2018 09:28 AM, Masahiko Sawada wrote:
>
>> Hi,
>>
>> On Thu, Feb 8, 2018 at 11:01 PM, Ildus Kurbangaliev
>>  wrote:
>>>
>>> Hi,
>>>
>>> The attached patch adds 'autovacuum_table_priority' to the current list of
>>> automatic vacuuming settings. It's used for sorting the tables to be
>>> vacuumed in an autovacuum worker before the actual vacuum.
>>>
>>> The idea is to give users the possibility to prioritize their tables
>>> in the autovacuum process.
>>>
>> Hmm, I couldn't understand the benefit of this patch. Would you
>> elaborate on it a little more?
>>
>> Multiple autovacuum workers can work on one database. So even if a
>> table that you want to vacuum first is at the back of the list, another
>> worker could pick it up. If vacuuming of the table gets delayed
>> because some big tables are in front of it, I think you can deal
>> with that by increasing the number of autovacuum workers.
>>
>> Regards,
>>
>> --
>> Masahiko Sawada
>> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
>> NTT Open Source Software Center
>>
>
> A database can contain thousands of tables, and often updates/deletes
> concentrate mostly in only a handful of tables.
> Going through thousands of less bloated tables can take ages.
> Currently autovacuum knows nothing about prioritizing its work with respect
> to the user's understanding of his data and application.

Understood. I have a question; please imagine the following case.

Suppose that there are 1000 tables in a database, and one of them
(table-A) has the highest priority while the other 999 tables have the
same priority. Most tables (say 800 tables), including table-A, need
to get vacuumed at some point, so with your patch an AV worker lists
800 tables and table-A will be at the head of the list. Table-A will
get vacuumed first, but this AV worker has to vacuum the other 799 tables
even if table-A requires vacuuming again later.

If another AV worker launches while table-A is being vacuumed, the
new AV worker would include table-A but would not process it, because
a concurrent AV worker is processing it. So it would vacuum other tables
instead. Similarly, this AV worker cannot get a new table list
until it finishes vacuuming all the other tables. (Note that it might skip
some tables if they have already been vacuumed by another AV worker.) On the
other hand, if another new AV worker launches after table-A got vacuumed and
it requires vacuuming again, the new AV worker puts table-A at the
head of its list. It processes table-A first but, again, it has to vacuum
other tables before getting a new table list next time, which might
include table-A.

Is this the expected behavior? I'd rather expect postgres to vacuum the
table with the highest priority ahead of lower-priority tables whenever
it requires vacuuming, but with the patch it wouldn't.

> Also it would be great to sort tables according to dead/live tuple ratio
> and relfrozenxid.

Yeah, for anti-wraparound vacuum on the database, it would be a good
idea to sort the list by relfrozenxid, as discussed on another
thread [1].

[1] 
https://www.postgresql.org/message-id/CA%2BTgmobT3m%3D%2BdU5HF3VGVqiZ2O%2Bv6P5wN1Gj%2BPrq%2Bhj7dAm9AQ%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Server crash in pg_replication_slot_advance function

2018-02-16 Thread tushar

Hi,

While playing around with the newly implemented function
'pg_replication_slot_advance', I found a server crash.


Please refer to the test-case scenario:

centos@centos-cpula bin]$ ./psql  postgres
psql (11devel)
Type "help" for help.

postgres=#  SELECT * FROM 
pg_create_logical_replication_slot('regression_slot1', 'test_decoding', 
true);

    slot_name |    lsn
--+---
 regression_slot1 | 0/1636AE8
(1 row)

postgres=# select * from pg_replication_slots;
    slot_name |    plugin | slot_type | datoid | database | 
temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | 
confirmed_flush_lsn

--+---+---++--+---+++--+--+-+-
 regression_slot1 | test_decoding | logical   |  13220 | postgres | 
t | t  |  25910 |  |  557 | 0/1636AB0 | 
0/1636AE8

(1 row)

postgres=# select pg_switch_wal();
 pg_switch_wal
---
 0/1636B38
(1 row)

postgres=# select pg_switch_wal();
 pg_switch_wal
---
 0/278
(1 row)

postgres=# SELECT end_lsn FROM 
pg_replication_slot_advance('regression_slot1', '0/278');

  end_lsn
---
 0/278
(1 row)

postgres=# select * from pg_replication_slots;
2018-02-16 08:09:57.989 GMT [25910] LOG:  statement: select * from 
pg_replication_slots;
    slot_name |    plugin | slot_type | datoid | database | 
temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | 
confirmed_flush_lsn

--+---+---++--+---+++--+--+-+-
 regression_slot1 | test_decoding | logical   |  13220 | postgres | 
t | t  |  25910 |  |  557 | 0/228 | 
0/278

(1 row)

postgres=# SELECT end_lsn FROM 
pg_replication_slot_advance('regression_slot1', '0/271');

server closed the connection unexpectedly
This probably means the server terminated abnormally
 before or while processing the request.
!>

Stack trace -

Core was generated by `postgres: centos postgres [local] 
SELECT  '.

Program terminated with signal 11, Segmentation fault.
#0  0x008519a8 in pg_replication_slot_advance 
(fcinfo=0x7fff5c9c4630) at slotfuncs.c:479

479            ereport(ERROR,
Missing separate debuginfos, use: debuginfo-install 
glibc-2.12-1.192.el6.x86_64 keyutils-libs-1.4-5.el6.x86_64 
krb5-libs-1.10.3-57.el6.x86_64 libcom_err-1.41.12-22.el6.x86_64 
libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-48.el6_8.4.x86_64 
zlib-1.2.3-29.el6.x86_64

(gdb) bt
#0  0x008519a8 in pg_replication_slot_advance 
(fcinfo=0x7fff5c9c4630) at slotfuncs.c:479
#1  0x006defdc in ExecMakeTableFunctionResult 
(setexpr=0x20f7db0, econtext=0x20f7aa8, argContext=0x20fd690, 
expectedDesc=0x20f92d0, randomAccess=0 '\000') at execSRF.c:231
#2  0x006f1782 in FunctionNext (node=0x20f7990) at 
nodeFunctionscan.c:95
#3  0x006de80d in ExecScanFetch (node=0x20f7990, 
accessMtd=0x6f16cb , recheckMtd=0x6f1abd 
) at execScan.c:95
#4  0x006de8ad in ExecScan (node=0x20f7990, accessMtd=0x6f16cb 
, recheckMtd=0x6f1abd ) at execScan.c:162
#5  0x006f1b0a in ExecFunctionScan (pstate=0x20f7990) at 
nodeFunctionscan.c:270
#6  0x006dcd66 in ExecProcNodeFirst (node=0x20f7990) at 
execProcnode.c:446
#7  0x006d3c05 in ExecProcNode (node=0x20f7990) at 
../../../src/include/executor/executor.h:246
#8  0x006d6559 in ExecutePlan (estate=0x20f7778, 
planstate=0x20f7990, use_parallel_mode=0 '\000', operation=CMD_SELECT, 
sendTuples=1 '\001', numberTuples=0,
    direction=ForwardScanDirection, dest=0x20fd288, execute_once=1 
'\001') at execMain.c:1721
#9  0x006d41d7 in standard_ExecutorRun (queryDesc=0x2065878, 
direction=ForwardScanDirection, count=0, execute_once=1 '\001') at 
execMain.c:361
#10 0x006d3ff3 in ExecutorRun (queryDesc=0x2065878, 
direction=ForwardScanDirection, count=0, execute_once=1 '\001') at 
execMain.c:304
#11 0x008afeeb in PortalRunSelect (portal=0x20a5838, forward=1 
'\001', count=0, dest=0x20fd288) at pquery.c:932
#12 0x008afb79 in PortalRun (portal=0x20a5838, 
count=9223372036854775807, isTopLevel=1 '\001', run_once=1 '\001', 
dest=0x20fd288, altdest=0x20fd288,

    completionTag=0x7fff5c9c4f00 "") at pquery.c:773
#13 0x008a9ba0 in exec_simple_query (query_string=0x2040438 
"SELECT end_lsn FROM pg_replication_slot_advance('regression_slot1', 
'0/271');") at postgres.c:1120
#14 0x008ade34 in PostgresMain (argc=1, argv=0x206bec0, 
dbname=0x206bd20 "postgres", username=0x203cf58 "centos") at postgres.c:4144

#15 0x0080d11a in BackendRun (port=0x2063d00) at postmaster.c:4412
#16 0x0080c88e in BackendStartup (port=0x2063d00) at 
postmaster.c:4084

#17 0x00808cbb in ServerLoop () at postmaster.c:1757
#18 

Re: [HACKERS] Can ICU be used for a database's default sort order?

2018-02-16 Thread Andrey Borodin
Hi everyone! 

> On 10 Feb 2018, at 20:45, Andrey Borodin  wrote:
> 
> I'm planning to provide a review
> 

So, I was looking into the patch.
The patch adds:
1. The ability to specify a collation provider (with version) in --locale for
initdb and createdb.
2. Changes to locale checks.
3. ICU as the default collation provider. For example, "ru_RU@icu.153.80.32.1"
is the default on my machine with the patch.
4. Tests and the necessary changes to documentation.

With the patch I get correct ICU ordering by default:
postgres=# select unnest(array['е','ё','ж']) order by 1;
 unnest 

 е
 ё
 ж
(3 rows)

While the libc locale provides an incorrect order (I also get this same
ordering by default without the patch):

postgres=# select c from unnest(array['е','ё','ж']) c order by c collate 
"ru_RU";
 c 
---
 е
 ж
 ё
(3 rows)


Unfortunately, neither "ru_RU@icu.153.80.32.1" (exposed by LC_COLLATE and other
places) nor "ru_RU@icu" can be used in a COLLATE SQL clause.
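
As a side note (my addition, not from the review): in stock PostgreSQL 10 and
later an ICU collation can still be used explicitly by creating it first,
assuming the server was built with ICU support, e.g.:

    CREATE COLLATION ru_icu (provider = icu, locale = 'ru-RU');
    SELECT c FROM unnest(array['е','ё','ж']) c ORDER BY c COLLATE ru_icu;
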
Also, the patch removes compatibility with MSVC 1800 (Visual Studio 2013) on
Windows XP and Windows Server 2003. This is done to use newer locale-related
functions in the VS2013 build.

If the database was initialized with the default locale without this patch, one
cannot connect to it anymore:
psql: FATAL:  could not find out the collation provider for datcollate
"ru_RU.UTF-8" of database "postgres"
This problem is mentioned in the commit message of the patch. I think that this
problem should be addressed somehow.
What do you think?

Overall, the patch looks like solid and thoughtful work, and it adds important functionality.

Best regards, Andrey Borodin.


Re: [bug fix] Cascaded standby cannot start after a clean shutdown

2018-02-16 Thread Michael Paquier
On Fri, Feb 16, 2018 at 04:19:00PM +0900, Michael Paquier wrote:
> Wait a minute here, when recycled past WAL segments would be filled with
> zeros before being used.

Please feel free to ignore this part.  I pushed the "Send" button
without seeing it, and I was thinking under which circumstances segments
get zero-filled, which is moot for this thread.
--
Michael

