Re: [HACKERS] pg_basebackup: Allow use of arbitrary compression program

2017-04-11 Thread Michael Harris
Hi,

Thanks for the feedback!

>> 2) The current logic either uses zlib if compiled in, or offers no
>> compression at all, controlled by a series of #ifdef/#endif. I would
>> prefer that the user can either use zlib or an external program
>> without having to recompile, so I would remove the #ifdefs and replace
>> them with run time branching.
>
>
> Not sure how that would work or be needed. The reasonable thing would be:
> if zlib is available when building, the choices would be "no compression",
> "zlib compression" or "external compression". If there was no zlib available
> when building, the choices would be "no compression" or "external
> compression".

That's exactly how I intend it to work. I had thought that the current
structure of the code would not allow that, but looking at it more
closely I see that it does, so I don't have to re-organize the
#ifdefs.

Regards // Mike




Re: [HACKERS] Possible problem in Custom Scan API

2017-04-11 Thread Alexander Korotkov
On Wed, Apr 12, 2017 at 12:59 AM, Dmitry Ivanov 
wrote:

> Tom Lane wrote:
>
>> Uh, no, construction of a custom plan node is entirely driven by the
>> PlanCustomPath method as far as I can see.  You're free to ignore what
>> create_scan_plan did and insert your own tlist.
>>
>
> Are you sure? Even if it's true, this targetlist should still contain each
> and every Var that's been requested. If I'm correct, the only way to ensure
> that is to call build_path_tlist(), which is static (oops!). Perhaps I
> could make my own, but it uses replace_nestloop_params() (again, static),
> and the problem goes further and further.
>

As I understand it, making build_path_tlist a non-static function would
solve the problem.
Tom, do you think that's possible?
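
For reference, the change itself would be small -- a sketch only (the
signature below is the one createplan.c uses today; which header should
carry the extern declaration, e.g. optimizer/planmain.h, is a guess):

    /*
     * createplan.c: drop the "static" qualifier from build_path_tlist()
     * and add a matching declaration to a planner header so that custom
     * scan providers can call it:
     */
    extern List *build_path_tlist(PlannerInfo *root, Path *path);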

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Why does logical replication launcher set application_name?

2017-04-11 Thread Kuntal Ghosh
On Wed, Apr 12, 2017 at 6:02 AM, Robert Haas  wrote:
> On Tue, Apr 11, 2017 at 8:11 PM, Michael Paquier
>  wrote:
>> On Wed, Apr 12, 2017 at 3:40 AM, Tom Lane  wrote:
>>> I notice looking at pg_stat_activity that the logical replication launcher
>>> sets its application_name to "logical replication launcher".  This seems
>>> inconsistent (no other standard background process sets application_name),
>>> redundant with other columns that already tell you what it is, and an
>>> unreasonable consumption of horizontal space in the tabular output.
>>> Can we drop that?  If we do have to have something like that, what about
>>> putting it in the "query" field where it's much less likely to be
>>> substantially wider than any other entry in the column?
>>
>> It seems to me that the logic behind that is to be able to easily
>> identify the logical replication launcher in pg_stat_activity, so using
>> the query field instead sounds fine to me, as we need a way other than
>> backend_type to tell what this bgworker is.
>
> Wait, why do we need two ways?
>
For backend_type=background worker, application_name shows the name of
the background worker (BackgroundWorker->bgw_name). I think we need
some way to distinguish among different background workers, but
application_name may not be the right field in which to show that
information.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Craig Ringer
On 12 April 2017 at 08:22, Michael Paquier  wrote:
> On Wed, Apr 12, 2017 at 9:12 AM, Craig Ringer
>  wrote:
>> Well, you can get a lot of information on timings in crake's latest
>> builds for example, or nightjar's.
>>
>> Here's an interesting fact: almost all the time taken up in the scripts
>> test set seems to be consumed in running initdb over and over.
>>
>> Is it worth adding an init(cache => 1) option to PostgresNode, where we
>> stash an initdb'd db in tmp_check/  and just do a simple fs copy of it ?
>
> This looks like a good idea to me, but I don't much like the idea of
> an option in the init() routine, as that's hard to maintain.

How so?

> It would
> make sense to me to be able to override the initialization with an
> environment variable instead

Yeah, that's reasonable.

Patch attached. It supports setting CACHE_INITDB=0 to disable the cache.

With cache: 3m55.063s (user 7.7s, sys 0m11.833s)

Without cache:  4m11.229s (user 0m16.963s, sys 0m12.384s)

so in a serial run it doesn't make a ton of difference.

Adding some timing to initdb runs shows that initdb takes about 1s and a
cached copy about 0.1s with our Perl-based recursive copy code. So
it's hardly an overwhelming benefit, but it might be more beneficial with
prove -j .


BTW, I suggest adding --timer to our default PROVE_FLAGS, so we can
collect more data from the buildfarm on what takes a while. Sample
output:

[13:20:47] t/001_stream_rep.pl .. ok  50445 ms ( 0.01 usr  0.00 sys +  2.16 cusr  3.57 csys =  5.74 CPU)
[13:21:38] t/002_archiving.pl ... ok   5883 ms ( 0.01 usr  0.00 sys +  0.19 cusr  0.45 csys =  0.65 CPU)
[13:21:44] t/003_recovery_targets.pl  ok  26197 ms ( 0.00 usr  0.00 sys +  0.70 cusr  1.75 csys =  2.45 CPU)
[13:22:10] t/004_timeline_switch.pl . ok  10049 ms ( 0.01 usr  0.00 sys +  0.24 cusr  0.53 csys =  0.78 CPU)
[13:22:20] t/005_replay_delay.pl  ok  10224 ms ( 0.00 usr  0.01 sys +  0.18 cusr  0.43 csys =  0.62 CPU)
[13:22:30] t/006_logical_decoding.pl  ok   7570 ms ( 0.00 usr  0.00 sys +  0.25 cusr  0.32 csys =  0.57 CPU)
[13:22:38] t/007_sync_rep.pl  ok  20693 ms ( 0.00 usr  0.00 sys +  0.51 cusr  1.24 csys =  1.75 CPU)
[13:22:58] t/008_fsm_truncation.pl .. ok  25399 ms ( 0.01 usr  0.00 sys +  0.26 cusr  0.52 csys =  0.79 CPU)
[13:23:24] t/009_twophase.pl  ok  58531 ms ( 0.01 usr  0.00 sys +  0.49 cusr  0.92 csys =  1.42 CPU)
[13:24:22] t/010_logical_decoding_timelines.pl .. ok   8135 ms ( 0.00 usr  0.01 sys +  0.34 cusr  0.76 csys =  1.11 CPU)
[13:24:30] t/011_crash_recovery.pl .. ok   4334 ms ( 0.00 usr  0.00 sys +  0.12 cusr  0.16 csys =  0.28 CPU)


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From d2b30831803eb3edb2e39b2e49b89bc74a65ee37 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 12 Apr 2017 13:25:19 +0800
Subject: [PATCH] Cache initdb runs in TAP tests

---
 src/test/perl/PostgresNode.pm  | 51 +++---
 src/test/perl/RecursiveCopy.pm |  5 +
 src/test/perl/TestLib.pm   |  5 +
 3 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index cb84f1f..eef9f66 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -391,13 +391,28 @@ sub init
 
 	$params{allows_streaming} = 0 unless defined $params{allows_streaming};
 	$params{has_archiving}= 0 unless defined $params{has_archiving};
+	$params{cache_initdb}	  = 1 unless defined $params{cache_initdb};
 
 	mkdir $self->backup_dir;
 	mkdir $self->archive_dir;
 
-	TestLib::system_or_bail('initdb', '-D', $pgdata, '-A', 'trust', '-N',
-		@{ $params{extra} });
-	TestLib::system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata);
+	# Respect CACHE_INITDB=0 in the environment if set
+	$params{cache_initdb} = 0
+		if (defined($ENV{'CACHE_INITDB'}) && !$ENV{'CACHE_INITDB'});
+
+	# We don't cache initdb results for non-default runs
+	# and they're too uncommon to care about anyway.
+	$params{cache_initdb} = 0
+		if defined($params{extra});
+
+	if ($params{cache_initdb})
+	{
+		$self->_initdb_cached(%params);
+	}
+	else
+	{
+		$self->_initdb($pgdata, %params);
+	}
 
 	open my $conf, '>>', "$pgdata/postgresql.conf";
 	print $conf "\n# Added by PostgresNode.pm\n";
@@ -446,6 +461,36 @@ sub init
 	$self->enable_archiving if $params{has_archiving};
 }
 
+# Wrapper to actually run initdb itself
+sub _initdb
+{
+	my ($self, $pgdata, %params) = @_;
+
+	TestLib::system_or_bail('initdb', '-D', $pgdata, '-A', 'trust', '-N',
+		@{ $params{extra} });
+	TestLib::system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata);
+}
+
+sub _initdb_cached
+{
+	my ($self, %params) = @_;
+	my $pgdata = 

Re: [HACKERS] Interval for launching the table sync worker

2017-04-11 Thread Masahiko Sawada
On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
 wrote:
> On 4/10/17 08:10, Petr Jelinek wrote:
>> I don't think the solution is quite this simple. This will cause all table
>> sync workers to be delayed, which means concurrency will suffer and the
>> initial sync of all tables will take much longer, especially if there is
>> little data. We need a way to either detect whether we are launching the
>> same worker that was already launched before, or alternatively whether we
>> are launching a crashed worker, and only then apply the delay.
>
> Perhaps instead of a global last_start_time, we store a per-relation
> last_start_time in SubscriptionRelState?

I was thinking the same. But a problem is that the list of
SubscriptionRelState is refreshed whenever the syncing table state
becomes invalid (table_state_valid = false). I guess we need to
improve this logic, including GetSubscriptionNotReadyRelations().

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Interval for launching the table sync worker

2017-04-11 Thread Peter Eisentraut
On 4/10/17 08:10, Petr Jelinek wrote:
> I don't think the solution is quite this simple. This will cause all table
> sync workers to be delayed, which means concurrency will suffer and the
> initial sync of all tables will take much longer, especially if there is
> little data. We need a way to either detect whether we are launching the
> same worker that was already launched before, or alternatively whether we
> are launching a crashed worker, and only then apply the delay.

Perhaps instead of a global last_start_time, we store a per-relation
last_start_time in SubscriptionRelState?
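
For illustration, that might look roughly like this (a sketch; the
existing fields are as I recall them from pg_subscription_rel.h, and the
new field name is only a placeholder):

    typedef struct SubscriptionRelState
    {
        Oid         relid;
        XLogRecPtr  lsn;
        char        state;

        /*
         * Proposed addition: when a sync worker was last launched for this
         * relation, so the launcher can throttle restarts per relation
         * instead of globally.
         */
        TimestampTz last_start_time;
    } SubscriptionRelState;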

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-11 Thread Pavan Deolasee
On Wed, Apr 12, 2017 at 9:23 AM, Amit Kapila 
wrote:

> On Tue, Apr 11, 2017 at 10:50 PM, Pavan Deolasee
>
> >
> > And I fixed them as quickly as humanly possible.
> >
>
> Yes, you have responded to them quickly, but I didn't get a chance to
> re-verify all of those.  However, I think the main point Robert wants
> to make is that somebody needs to dig into the complete patch to see if
> there are any problems with it.
>

There are no two ways about that. I don't even claim that more problems
won't be found during in-depth review. I was only responding to his view
that I did not do much to address the regressions reported during the
review/tests.


>
> > 5. Added code to set a CLEAR pointer to a WARM pointer when we know that
> > the entire chain is WARM. This should address the workload Dilip ran and
> > found a regression with (I don't think he got a chance to confirm that).
> >
>
> Have you by any chance tried to reproduce it at your end?


I did reproduce it and verified that the new technique helps that case [1]
(see the last paragraph). I did not go to extra lengths to check whether
there are more cases which can still cause regression, such as when the
recheck is applied only once to each tuple (so the new technique yields no
benefit), and whether that still causes regression and by how much. However,
I ran a pure pgbench workload (only HOT updates) with a smallish scale
factor so that everything fits in memory, and did not find any regression.

Having said that, it's my view (others need not agree with it) that we
need to distinguish between CPU and IO load, since WARM is designed to
address IO problems and not so much CPU problems. We also need to see
things in totality and probably measure both updates and selects if we are
going to WARM-update all tuples once and read them once. That doesn't mean
we shouldn't perform more tests, and I am more than willing to fix any
regression we find in even a remotely real-world use case.

Thanks,
Pavan

[1]
https://www.postgresql.org/message-id/CABOikdOTstHK2y0rDk%2BY3Wx9HRe%2BbZtj3zuYGU%3DVngneiHo5KQ%40mail.gmail.com

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] GCC 7 warnings

2017-04-11 Thread Tom Lane
Peter Eisentraut  writes:
> Attached is a more refined patch that I propose for PG10 now.  Compared
> to the previous rushed version, this one uses some more precise
> arithmetic to size some of the buffers.

Generally +1 for switching the snprintf calls to use sizeof() rather
than repeating the declared sizes of the arrays.

The change in setup_formatted_log_time seems a bit weird:

-   char        msbuf[8];
+   char        msbuf[10];

The associated call is

sprintf(msbuf, ".%03d", (int) (saved_timeval.tv_usec / 1000));

Now a human can see that saved_timeval.tv_usec must be 0..999999, so
that the %d format item must always emit exactly 3 characters, which
means that really 5 bytes would be enough.  I wouldn't expect a
compiler to know that, but if it's making a generic assumption about
the worst-case width of %d, shouldn't it conclude that we might need
as many as 13 bytes for the buffer?  Why does msbuf[10] satisfy it
if msbuf[8] doesn't?

IOW, if we're going to touch this at all, I'd be inclined to go with
msbuf[16] or so, as being more likely to satisfy compilers that have
decided to try to warn about this.  And maybe we should use snprintf,
just for luck.
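
For concreteness, the combination being suggested would be something
like this (a sketch only, reusing the variable names from
setup_formatted_log_time):

    char        msbuf[16];

    /*
     * tv_usec is 0..999999, so this emits ".000" through ".999"; the
     * oversized buffer plus snprintf/sizeof keeps the compiler happy no
     * matter what it assumes about the worst-case width of %d.
     */
    snprintf(msbuf, sizeof(msbuf), ".%03d",
             (int) (saved_timeval.tv_usec / 1000));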

regards, tom lane




Re: [HACKERS] logical replication apply to run with sync commit off by default

2017-04-11 Thread Peter Eisentraut
On 3/24/17 10:49, Petr Jelinek wrote:
> Rebase after table copy patch got committed.

Could you please send another rebase of this, perhaps with some more
documentation, as discussed downthread.

Also, I wonder why we don't offer the other values of
synchronous_commit.  In any case, we should keep the values consistent:
on should be on and local should be local, not a mix of the two.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-11 Thread Amit Kapila
On Tue, Apr 11, 2017 at 10:50 PM, Pavan Deolasee
 wrote:
>
>
> On Tue, Apr 11, 2017 at 7:10 PM, Robert Haas  wrote:
>>
>>
>>
>> Yes, and as Andres says, you don't help with those, and then you're
>> upset when your own patch doesn't get attention.
>
>
> I am not upset; I was obviously a bit disappointed, which I think is a very
> natural emotion after spending weeks on it. I am not blaming any one
> individual (excluding myself), nor the community at large, for the outcome.
> And I've moved on. I know everyone is busy getting the release ready and I
> see no point in discussing this endlessly. We have enough on our plates for
> the next few weeks.
>
>>
>>
>>   Amit and others who have started to
>> dig into this patch a little bit found real problems pretty quickly
>> when they started digging.
>
>
> And I fixed them as quickly as humanly possible.
>

Yes, you have responded to them quickly, but I didn't get a chance to
re-verify all of those.  However, I think the main point Robert wants
to make is that somebody needs to dig into the complete patch to see if
there are any problems with it.

>>
>>   Those problems should be addressed, and
>> review should continue (from whatever source) until no more problems
>> can be found.
>
>
> Absolutely.
>
>>
>>  The patch may
>> or may not have any data-corrupting bugs, but regressions have been
>> found and not addressed.
>
>
> I don't know why you say that regressions are not addressed. Here are a few
> things I did to address the regressions/reviews/concerns, apart from fixing
> all the bugs discovered, but please let me know if there are things I've not
> addressed.
>
> 1. Improved the interesting attrs patch that Alvaro wrote to address the
> regression discovered in fetching more heap attributes. The patch that got
> committed in fact improved certain synthetic workloads over the then-master.
> 2. Based on Petr's and your feedback, disabled WARM on toasted attributes to
> reduce the overhead of fetching/decompressing the attributes.
> 3. Added code to avoid doing a second index scan when the index does not
> contain any WARM pointers. This should address the situation Amit brought up
> where only one of the indexes receives WARM inserts.
> 4. Added code to kill wrong index pointers to do online cleanup.
> 5. Added code to set a CLEAR pointer to a WARM pointer when we know that the
> entire chain is WARM. This should address the workload Dilip ran and found
> a regression with (I don't think he got a chance to confirm that).
>

Have you by any chance tried to reproduce it at your end?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [sqlsmith] ERROR: badly formatted node string "RESTRICTINFO...

2017-04-11 Thread Ashutosh Bapat
On Tue, Apr 11, 2017 at 10:02 PM, Tom Lane  wrote:
> Ashutosh Bapat  writes:
>> On Tue, Apr 11, 2017 at 8:11 PM, Tom Lane  wrote:
>>> OK.  I was trying to be noninvasive, but this does look more sensible.
>>> I think this might read better if we inverted the test (so that
>>> the outer-join-joinclause-must-be-pushable case would come first).
>
>> Ok. In fact, thinking more about it, we should probably test
>> JOIN_INNER explicitly instead of !IS_OUTER_JOIN(), since SEMI joins are
>> not considered outer joins and I am not sure how we would be
>> treating the quals for those.
>
> No, that's correct as-is --- or at least, if it's not correct, there
> are a bunch of other places that are also not correct.

Hmm, when we support SEMI and ANTI join push-down, we will have a
bunch of other places to change. This is one of them.

>
> Thinking about this further, though, it seems like a more straightforward
> solution to the original problem is to not store quals in a Plan's
> fdw_private list in the first place.  Putting them there is at best
> useless overhead that the finished plan will have to drag around;
> and it's not terribly hard to imagine future changes that would make
> having never-processed-by-setrefs.c qual trees in a plan be broken in
> other ways.  Can't we use a field in the rel's PgFdwRelationInfo to
> carry the remote_exprs forward from postgresGetForeignPlan to
> postgresPlanDirectModify?
>

I was thinking of reusing fdw_recheck_quals, renaming it. We don't
push DML with joins down to the foreign server, so it's OK to set
fdw_recheck_quals (or whatever name we choose) to NIL in the join and
upper-relation cases, as we do today, without changing anything else.
When we support DML with join pushdown, we will have to use a subquery
for the scan and thus will not need fdw_recheck_quals to be set.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] GCC 7 warnings

2017-04-11 Thread Peter Eisentraut
On 4/11/17 13:57, Alvaro Herrera wrote:
> Tom Lane wrote:
>> Peter Eisentraut  writes:
> 
>>> d) Replace most of the problematic code with psprintf() and dynamically
>>> sized buffers.
>>
>> +1 for (c) as you have it.  Later we might think about selectively
>> doing (d), but it seems like more work for probably not much benefit.
> 
> Yeah -- also it's possible some of these code paths must not attempt to
> palloc() for robustness reasons.  I would go for c) only for now, and
> only try d) for very specific cases where there are no such concerns.

Attached is a more refined patch that I propose for PG10 now.  Compared
to the previous rushed version, this one uses some more precise
arithmetic to size some of the buffers.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


v2-0001-Fix-new-warnings-from-GCC-7.patch
Description: invalid/octet-stream



Re: [HACKERS] logical replication worker and statistics

2017-04-11 Thread Peter Eisentraut
On 4/10/17 13:06, Stas Kelvich wrote:
> 
>> On 10 Apr 2017, at 19:50, Peter Eisentraut 
>>  wrote:
>>
>> On 4/10/17 05:49, Stas Kelvich wrote:
>>> Here is a small patch to call statistics reporting in the logical worker.
>>> Originally I thought that stats collection during logical replication
>>> would have to manually account for the amounts of changed tuples, but it
>>> seems that this is already handled smoothly at the relation level. So a
>>> call to pgstat_report_stat() is enough.
>>
>> I wonder whether we need a similar call somewhere in tablesync.c.  It
>> seems to work without it, though.
> 
> I thought it spins up the same worker from worker.c.

It depends on which of the various tablesync scenarios happens.  If the
apply lags behind the tablesync, then the apply doesn't process commit
messages until it has caught up.  So the part of the code you patched
wouldn't be called.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] SUBSCRIPTIONS and pg_upgrade

2017-04-11 Thread Noah Misch
On Tue, Apr 11, 2017 at 11:21:24PM -0400, Peter Eisentraut wrote:
> On 4/9/17 22:16, Noah Misch wrote:
> > [Action required within three days.  This is a generic notification.]
> 
> Patches have been posted.  Discussion is still going on a bit.

By what day should the community look for your next update?




Re: [HACKERS] error handling in RegisterBackgroundWorker

2017-04-11 Thread Tom Lane
"David G. Johnston"  writes:
> On Tue, Apr 11, 2017 at 11:10 AM, Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> wrote:
>> On 4/11/17 11:47, David G. Johnston wrote:
>>> ​A potential middle-ground is to start, but then only allow superuser
>>> connections.

>> Then you might as well start and allow all connections.  I don't see
>> what this buys.

> If "leave it offline until it gets fixed" is on the table then there is
> some underlying reason that we'd not want application (or replication)
> users connecting to the database while it is in a degraded state.  Even if
> one accepts that premise, that doesn't mean that an administrator shouldn't
> be allowed to log in and do ad-hoc stuff; the goal of the prevention is to
> disallow programmed external actors that assume/require that these
> background worker processes are active from connecting while they are not.
> This middle-ground accommodates that goal in a precise manner.

Only if you assume that those external scripts aren't connecting as
superuser.  Unfortunately, that assumption is probably widely violated,
especially in the sort of less-well-run installations that are most
at risk for the kind of problem we're speculating about here.

> I don't have an opinion as to which extreme is better so in the absence I'm
> in favor of "put control in the hands of the administrator" - this just
> provides a slightly more usable environment for the administrator to
> operate within.

Yeah, "usable environment" is key here.  A point I meant to bring up
earlier today is that we do already have a solution for degraded
operation, ie, run a standalone backend.  But that's so awful, from
any standpoint including UI friendliness (no psql, let alone pgadmin or
other GUI tools), performance (no bgwriter, walwriter, stats collector,
or autovacuum), or reliability (no automatic checkpoints, never mind
replication), that nobody in their right mind would choose to use it
until their back was very hard against the wall.  So what we're
really discussing here is intermediate operating modes where you have
at least some of those facilities.  I doubt there are black-and-white
answers, but I still believe in the likely usefulness of a mode where
we start as much of that stuff as we can.

regards, tom lane




Re: [HACKERS] SUBSCRIPTIONS and pg_upgrade

2017-04-11 Thread Peter Eisentraut
On 4/9/17 22:16, Noah Misch wrote:
> [Action required within three days.  This is a generic notification.]

Patches have been posted.  Discussion is still going on a bit.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] SUBSCRIPTIONS and pg_upgrade

2017-04-11 Thread Peter Eisentraut
On 4/10/17 20:55, Stephen Frost wrote:
>> Proposal: Dump all subscriptions in DISABLED state.  Remove current
>> pg_dump --no-subscription-connect option.
> I like this idea in general, I'm just not sure if it's the right answer
> when we're talking about pg_upgrade.  At a minimum, if we go with this
> approach, we should produce a warning when subscriptions exist during
> the pg_upgrade that the user will need to go re-enable them.

It's not clear what to do with a subscription after a dump/restore or a
pg_upgrade anyway.  You can't just continue streaming where you left
off.  Most likely, you will want to truncate the target tables and
resync them.  In some cases, you might just accept the data gap and
continue streaming at the current state.  But in any case you'll need to
decide based on what you're actually doing.  Just telling the user "turn
it back on and you're ready to go" isn't necessarily the right answer.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Tom Lane
Andres Freund  writes:
> The time in initdb is largely spent in regprocin and server start/stop,
> so that doesn't surprise me.

Yeah.  We had a plan for removing the regprocin overhead via preprocessing
of the .bki file, but I've not heard anything about that project for
months :-(

regards, tom lane




Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Andres Freund
On 2017-04-12 10:39:22 +0800, Craig Ringer wrote:
> On 12 April 2017 at 08:22, Michael Paquier  wrote:
> > On Wed, Apr 12, 2017 at 9:12 AM, Craig Ringer
> >  wrote:
> >>
> >> We should also have a debug --no-fsync option for initdb, or maybe allow it
> >> to take -o options to pass to child postgres so we can pass fsync=off .
> >
> > That's an idea as well...
> 
> OK, looking at this, initdb already runs postgres -F then does its own
> fsync()s at the end.
> 
> It already has a -N / --no-sync option too. Which is already used by
> PostgresNode.pm . So as much as can be done has already been done
> here.

The time in initdb is largely spent in regprocin and server start/stop,
so that doesn't surprise me.

- Andres




Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Andrew Dunstan


On 04/11/2017 08:12 PM, Craig Ringer wrote:
>
>
> On 12 Apr. 2017 03:14, "Andrew Dunstan" wrote:
>
>
>
> On 04/11/2017 12:08 PM, Tom Lane wrote:
> > Robert Haas writes:
> >> On Tue, Apr 11, 2017 at 11:32 AM, Andrew Dunstan wrote:
> >>> This buildfarm run as you can see takes 33m32s, and the Tap
> tests take a
> >>> combined 19m52s of that time.
> >> I don't think it's quite fair to complain about the TAP tests
> taking a
> >> long time to run as a general matter.  Many people here have long
> >> wanted a separate test-suite for long running tests that can be
> run to
> >> really check everything possible, and apparently, TAP tests are it.
> >> What I think would be more useful is to break down where the
> time is
> >> getting spent.  It may be that some of those tests are not adding
> >> value proportionate to their runtime.
>
>
>
> Well, you can get a lot of information on timings in crake's latest
> builds for example, or nightjar's.
>
> Here's an interesting fact: almost all the time taken up in the
> scripts
> test set seems to be consumed in running initdb over and over.
>
>
>
> Is it worth adding an init(cache => 1) option to PostgresNode, where
> we stash an initdb'd db in tmp_check/  and just do a simple fs copy of
> it ? 
>
> Even default to caching and allow an option to disable the cached copy.
>
> We're going to need to initdb a lot in the TAP tests. We often need a
> clean state. Tests also get harder to debug the more you pack into a
> single test run and more difficult to run individually to test some
> specific failure. So IMO the best thing is to try to make that repeat
> initdb as fast as possible.
>
> It reduces our coverage of initdb only incredibly slightly - all that
> repeat runs will do is help find very uncommon intermittent failures.
> And we rerun the buildfarm all the time so it's not like there's a
> shortage of initdb runs anyway.
>
> We should also have a debug --no-fsync option for initdb, or maybe
> allow it to take -o options to pass to child postgres so we can pass
> fsync=off .


Absolutely worth it I should say.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Craig Ringer
On 12 April 2017 at 08:22, Michael Paquier  wrote:
> On Wed, Apr 12, 2017 at 9:12 AM, Craig Ringer
>  wrote:
>>
>> We should also have a debug --no-fsync option for initdb, or maybe allow it
>> to take -o options to pass to child postgres so we can pass fsync=off .
>
> That's an idea as well...

OK, looking at this, initdb already runs postgres -F then does its own
fsync()s at the end.

It already has a -N / --no-sync option too. Which is already used by
PostgresNode.pm . So as much as can be done has already been done
here.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] error handling in RegisterBackgroundWorker

2017-04-11 Thread Noah Misch
On Mon, Apr 10, 2017 at 11:22:38PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > On Mon, Apr 10, 2017 at 09:36:59PM -0400, Peter Eisentraut wrote:
> >> If history had been different, we could have implemented, say,
> >> autovacuum or walreceiver on the background worker framework.  I think
> >> unifying some of that might actually be a future project.  Would it be
> >> OK if these processes just logged a warning and didn't start if there
> >> was a misconfiguration?
> 
> > No.  I can't think of any background worker so unimportant that I'd want the
> > warning only.  Certainly, then, the ones you list are far too important.
> 
> Well, I dunno.  We allow the system to start without a functioning stats
> collector (cf what happens if we fail to create a working loopback
> socket).  But lack of stats will effectively disable autovacuum.  So at
> the very least we have some inconsistent decisions here.

Yep.  I mentioned "complicated dependencies" as a factor, and that's relevant
to the stats collector.  While creating a loopback network socket is not
highly complicated, it does fail in the field, and the user owning PostgreSQL
may lack the means to fix it.  RegisterBackgroundWorker() failure, on the
other hand, implies the DBA changed a PGC_POSTMASTER setting like
shared_preload_libraries or max_logical_replication_workers without raising
max_worker_processes.  Best to get unmistakable feedback and immediately raise
max_worker_processes or rollback the causative GUC change.

> Personally I'd err on the side of "starting up degraded is better than
> not starting at all".  Or maybe we should invent a GUC to let DBAs
> express their preference on that?

Maybe.  I share Peter's doubts.  Also, not all degradation is equal, and one
user may cherish worker A alone while another user cherishes worker B alone.
Still, maybe.




Re: [HACKERS] [sqlsmith] ERROR: badly formatted node string "RESTRICTINFO...

2017-04-11 Thread Amit Kapila
On Wed, Apr 12, 2017 at 12:40 AM, Tom Lane  wrote:
> Amit Kapila  writes:
>
>> I think there is a possibility of doing ExecInitNode in a parallel
>> worker for a parallel-unsafe subplan, because we pass a list of all
>> the subplans stored in the planned statement.
>
> It's more than a possibility.  If you run Andreas' test case against
> HEAD, now that the can't-transmit-RestrictInfo failure is fixed,
> you will find that the parallel worker actually creates and immediately
> destroys an FDW remote connection.  (The easiest way to prove that is
> to turn on log_connections/log_disconnections.  BTW, is there any
> convenient way to attach a debugger to a parallel worker process as it's
> being launched?
>

What I generally do to debug the parallel-worker case is to add a
"while (1) {}" style loop at the beginning of ParallelQueryMain() or
ParallelWorkerMain(), depending on the area I want to debug; in this
case it would be okay to add such a loop in ParallelQueryMain().  Then,
while the worker is waiting there, attach the debugger to that process
and resume debugging.  It is easier to identify the worker process if
we add an infinite loop, but one can use a sleep or some other form of
wait or debug-break mechanism as well.
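
For concreteness, a minimal sketch of that approach (the function
signature is the one in execParallel.c; using pg_usleep() inside the
loop is the sleep variant mentioned above):

    /* Debugging aid only: park the worker so a debugger can attach. */
    static volatile bool hold_worker = true;    /* flip to false from gdb */

    static void
    ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
    {
        while (hold_worker)
            pg_usleep(100000L);     /* 100ms, rather than spinning hard */

        /* ... existing body of ParallelQueryMain() continues here ... */
    }

Attach with e.g. "gdb -p <worker pid>", then "set var hold_worker = 0"
and continue.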

>  I couldn't manage to catch the worker in the act.)
>
> So I think this is clearly a Bad Thing and we'd better do something
> about it.  The consequences for postgres_fdw aren't so awful perhaps,
> but other non-parallel-safe FDWs might have bigger problems with this
> behavior.
>
>> However, the worker will
>> never execute such a plan as we don't generate a plan where unsafe
>> sublan/initplan is referenced in the node passed to the worker.  If we
>> want to avoid passing parallel-unsafe subplans to workers, then I
>> think we can maintain a list of parallel safe subplans along with
>> subplans in PlannerGlobal and PlannedStmt or maybe keep a parallel
>> safe flag in Plan so that we can pass only parallel safe plans to
>> workers.
>
> Right, we could, say, leave a hole in the subplan list corresponding
> to any subplan that's not parallel-safe.  That seems like a good idea
> anyway because right now there's clearly no cross-check preventing
> a worker from trying to run such a subplan.
>
> Anyone want to draft a patch for this?
>

Yes, I will work on it this week, possibly today or tomorrow, and will
either produce a patch or, if I face any problems, post an update about
them here.
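
For reference, the "leave a hole" idea could be roughly as simple as the
following when preparing the plan that is shipped to the workers (a
sketch only; the variable name pstmt and the exact placement are
assumptions to be settled in the patch):

    ListCell   *lc;

    /*
     * Blank out parallel-unsafe subplans in the copy sent to workers, so
     * a worker can never ExecInitNode() one of them.
     */
    foreach(lc, pstmt->subplans)
    {
        Plan       *subplan = (Plan *) lfirst(lc);

        if (subplan && !subplan->parallel_safe)
            lfirst(lc) = NULL;
    }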


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] SUBSCRIPTIONS and pg_upgrade

2017-04-11 Thread Peter Eisentraut
On 4/10/17 13:58, Peter Eisentraut wrote:
> Proposal: Dump subscriptions if running as superuser.  If not, check if
> there are subscriptions and warn about that.  Remove current pg_dump
> --include-subscriptions option.

Patch for this part.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-pg_dump-Dump-subscriptions-by-default.patch
Description: invalid/octet-stream



Re: [HACKERS] Partitioned tables and relfilenode

2017-04-11 Thread Amit Langote
On 2017/04/12 2:41, Robert Haas wrote:
> On Tue, Apr 11, 2017 at 9:45 AM, Tom Lane  wrote:
>> Amit Langote  writes:
>>> 2. DefineQueryRewrite() may try to scan a partitioned table in the case of
>>> converting a table to a view, where we must make sure that the table being
>>> converted is empty.  That's checked by scanning the heap, which we should
>>> not do for a partitioned table.  Nor should we try to drop the storage
>>> once we are ready to make the table into a RELKIND_VIEW relation (because
>>> all the other checks passed, okaying the conversion).
>>
>> It looks like this patch intends to allow converting a partitioned table
>> to a view.  I would lobby for refusing the command, instead.  There is
>> no good reason to allow it, and it might well be a user error.
> 
> Yeah, I agree.

Alright.  So I made it into two patches instead: 0001 fixes the bug that
validateCheckConstraint() tries to scan partitioned tables and 0002 makes
trying to convert a partitioned table to a view a user error.

Thanks,
Amit
>From a71b2d8c38c1ca2ca3430bdc3f4f850d66e486d6 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 11 Apr 2017 16:04:50 +0900
Subject: [PATCH 1/2] Fix a few places still scanning partitioned tables

Commit c94e6942ce missed validateCheckConstraint.
---
 src/backend/commands/tablecmds.c  | 8 ++--
 src/test/regress/expected/alter_table.out | 6 ++
 src/test/regress/sql/alter_table.sql  | 7 +++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index a6a9f54b13..a02904c85c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8120,8 +8120,12 @@ validateCheckConstraint(Relation rel, HeapTuple constrtup)
 	bool		isnull;
 	Snapshot	snapshot;
 
-	/* VALIDATE CONSTRAINT is a no-op for foreign tables */
-	if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+	/*
+	 * VALIDATE CONSTRAINT is a no-op for foreign tables and partitioned
+	 * tables.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+		rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
 		return;
 
 	constrForm = (Form_pg_constraint) GETSTRUCT(constrtup);
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 2227f2d977..883a5c9864 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -3372,3 +3372,9 @@ ERROR:  partition constraint is violated by some row
 -- cleanup
 drop table p;
 drop table p1;
+-- validate constraint on partitioned tables should only scan leaf partitions
+create table parted_validate_test (a int) partition by list (a);
+create table parted_validate_test_1 partition of parted_validate_test for values in (0, 1);
+alter table parted_validate_test add constraint parted_validate_test_chka check (a > 0) not valid;
+alter table parted_validate_test validate constraint parted_validate_test_chka;
+drop table parted_validate_test;
diff --git a/src/test/regress/sql/alter_table.sql b/src/test/regress/sql/alter_table.sql
index 8cd6786a90..eb1b4b536f 100644
--- a/src/test/regress/sql/alter_table.sql
+++ b/src/test/regress/sql/alter_table.sql
@@ -2228,3 +2228,10 @@ alter table p attach partition p1 for values from (1, 2) to (1, 10);
 -- cleanup
 drop table p;
 drop table p1;
+
+-- validate constraint on partitioned tables should only scan leaf partitions
+create table parted_validate_test (a int) partition by list (a);
+create table parted_validate_test_1 partition of parted_validate_test for values in (0, 1);
+alter table parted_validate_test add constraint parted_validate_test_chka check (a > 0) not valid;
+alter table parted_validate_test validate constraint parted_validate_test_chka;
+drop table parted_validate_test;
-- 
2.11.0

>From e788b5663db1e62ee4d6f6a7a9c111b049f25fbe Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 12 Apr 2017 11:12:11 +0900
Subject: [PATCH 2/2] Disallow converting partitioned tables to a view

---
 src/backend/rewrite/rewriteDefine.c | 6 ++
 src/test/regress/sql/rules.sql  | 5 +
 2 files changed, 11 insertions(+)

diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index df32f2c3ae..eab3f6062d 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -422,6 +422,12 @@ DefineQueryRewrite(char *rulename,
 			HeapScanDesc scanDesc;
 			Snapshot	snapshot;
 
+			if (event_relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+				ereport(ERROR,
+						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+						 errmsg("could not convert partitioned table \"%s\" to a view",
+								RelationGetRelationName(event_relation))));
+
 			snapshot = RegisterSnapshot(GetLatestSnapshot());
 			scanDesc = heap_beginscan(event_relation, snapshot, 0, NULL);
 			if (heap_getnext(scanDesc, ForwardScanDirection) != NULL)
diff --git 

Re: [HACKERS] error handling in RegisterBackgroundWorker

2017-04-11 Thread Noah Misch
On Tue, Apr 11, 2017 at 11:33:34AM -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > I think there is no clear agreement here, and no historically consistent
> > behavior.  I'm prepared to let it go and cross it off the list of open
> > items.  I believe we should keep thinking about it, but it's not
> > something that has to hold up beta.
> 
> Agreed, this doesn't seem like a must-fix-for-beta consideration.

Definitely not a beta blocker, agreed.  Would it be okay to release v10 final
with the logical replication launcher soft-failing this way?




Re: [HACKERS] Remove pg_stat_progress_vacuum from Table 28.2

2017-04-11 Thread Amit Langote
On 2017/04/12 0:22, Fujii Masao wrote:
> On Fri, Apr 7, 2017 at 9:53 AM, Amit Langote
>  wrote:
>> On 2017/04/07 0:56, Fujii Masao wrote:
>>> On Tue, Apr 4, 2017 at 10:19 AM, Amit Langote
>>>  wrote:
 It seems pg_stat_progress_vacuum is not supposed to appear in the table
 titled "Collected Statistics Views".  It was added by c16dc1aca.  Attached
 patch fixes that.
>>>
>>> Instead, it should appear in the table of "Dynamic Statistics Views"
>>> because it reports dynamic info, i.e., progress, about VACUUM activity?
>>
>> I thought the same at first, but then realized we have an entirely separate
>> section 28.4. Progress Reporting.
> 
> Yes, but I don't think that removing that from 28.2 is an improvement
> for users.

Perhaps you're right.  Here's a patch that moves pg_stat_progress_vacuum
to the Dynamic Statistics Views table.

Thanks,
Amit
>From adfdaaa612c15cfa5c9d63cc80846336fe0bc0ca Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 12 Apr 2017 10:27:41 +0900
Subject: [PATCH] Move pg_stat_progress_vacuum to Table 28.1

Because it sounds more like a Dynamic Statistics View than a Collected
Statistics View.
---
 doc/src/sgml/monitoring.sgml | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index d42a461ad9..2a83671b53 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -324,6 +324,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  pg_stat_progress_vacuumpg_stat_progress_vacuum
+  One row for each backend (including autovacuum worker processes) running
+   VACUUM, showing current progress.
+   See .
+  
+ 
+
 

   
@@ -515,12 +523,6 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   yet included in pg_stat_user_functions).
  
 
- 
-  pg_stat_progress_vacuumpg_stat_progress_vacuum
-  One row for each backend (including autovacuum worker processes) running
-  VACUUM, showing current progress.
-  See .
- 
 

   
-- 
2.11.0




Re: [HACKERS] Some thoughts about SCRAM implementation

2017-04-11 Thread Bruce Momjian
On Tue, Apr 11, 2017 at 08:58:30PM -0400, Stephen Frost wrote:
> > > But just a bit more is needed to make it really a big announcement and
> > > provide real value to (I guess, mostly but very interesting) enterprise
> > > customers, for which MITM and impersonation are big things. The good news
> > > is that adding channel binding is like an inverse Pareto: 20% of extra
> > > effort (I bet significantly less) leads to an 80% improvement.
> > 
> > I don't see why channel binding is a big deal for enterprises because I
> > assume they are already using SSL:
> 
> Channel binding should be used with SSL to ensure that there is no
> man-in-the-middle attack being performed.  It's necessary when the
> end-points aren't performing full, mutual, certificate-based
> verification.

And which enterprises are using SSL without certificates?  And I thought
channel binding required certificates anyway, e.g.:


https://en.wikipedia.org/wiki/Salted_Challenge_Response_Authentication_Mechanism#Channel_binding

For instance, for the tls-server-end-point channel binding, it is the
server's TLS certificate.

> > I think the big win for SCRAM is the inability to replay md5 packets
> > after recording 16k sessions (salt was only 32-bit, so a 50% chance of
> > replay after 16 sessions), and storage of SHA256 hashes instead of MD5
> > in pg_authid, though the value of that is mostly a check-box item
> > because collisions are not a problem for the way we use MD5.
> 
> There are a lot of wins to having SCRAM implemented.  I disagree
> strongly that securing PG from attacks based on latent information
> gathering (backups which include pg_authid) is just a "check-box" item.

Well, they have the entire backup so I don't think cracking the password
is a huge win, though the password does potentially give them access to
future data.  And it prevents them from reusing the password on another
server, but _again_, I still think the big win is to prevent replay
after 16k packets are sniffed.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] RENAME RULE doesn't work with partitioned tables

2017-04-11 Thread Amit Langote
On 2017/04/12 2:20, Robert Haas wrote:
> On Tue, Apr 11, 2017 at 5:54 AM, Amit Langote
>  wrote:
>> Just noticed that RangeVarCallbackForRenameRule() was not updated to
>> handle partitioned tables, causing the following bug:
>>
>> create table parted_table (a int) partition by list (a);
>> create table part partition of parted_table for values in (1);
>> create rule parted_table_insert as on insert to parted_table
>>   do instead insert into part values (new.*);
>> alter rule parted_table_insert on parted_table
>>   rename to parted_table_insert_redirect;
>> -- ERROR:  "parted_table" is not a table or view
>>
>> Attached fixes this and adds a test.  Added to open items list.
> 
> Committed.

Thanks.

Regards,
Amit






Re: [HACKERS] dropping a partition may cause deadlock

2017-04-11 Thread Amit Langote
On 2017/04/11 22:18, Robert Haas wrote:
> On Sun, Apr 9, 2017 at 7:57 PM, Noah Misch  wrote:
>> The above-described topic is currently a PostgreSQL 10 open item.  Robert,
>> since you committed the patch believed to have created it, you own this open
>> item.  If some other commit is more relevant or if this does not belong as a
>> v10 open item, please let us know.  Otherwise, please observe the policy on
>> open item ownership[1] and send a status update within three calendar days of
>> this message.  Include a date for your subsequent status update.  Testers may
>> discover new open items at any time, and I want to plan to get them all fixed
>> well in advance of shipping v10.  Consequently, I will appreciate your 
>> efforts
>> toward speedy resolution.  Thanks.
> 
> I have committed the patch.

Thanks.

Regards,
Amit






Re: [HACKERS] Some thoughts about SCRAM implementation

2017-04-11 Thread Stephen Frost
Bruce,

* Bruce Momjian (br...@momjian.us) wrote:
> On Tue, Apr 11, 2017 at 02:53:24PM +0200, Álvaro Hernández Tortosa wrote:
> > Let's put ourselves in the shoes of potential users. Why would anyone
> > want to use SCRAM? What for? The hashing mechanism is better, no question.
> > And it brings some added benefits, true. So it's "better". But the real
> > gain comes from using channel binding, which avoids impersonation and MITM
> > attacks. This is the deal breaker. SCRAM without channel binding is like
> > Coke Zero without caffeine and mixed with water. Don't get me wrong, the
> > work behind it is great.
> > 
> > But just a bit more is needed to make it really a big announcement and
> > provide real value to (I guess, mostly but very interesting) enterprise
> > customers, for which MITM and impersonation are big things. The good news
> > is that adding channel binding is like an inverse Pareto: 20% of extra
> > effort (I bet significantly less) leads to an 80% improvement.
> 
> I don't see why channel binding is a big deal for enterprises because I
> assume they are already using SSL:

Channel binding should be used with SSL to ensure that there is no
man-in-the-middle attack being performed.  It's necessary when the
end-points aren't performing full, mutual, certificate-based
verification.

> I think the big win for SCRAM is the inability to replay md5 packets
> after recording 16k sessions (salt was only 32-bit, so a 50% chance of
> replay after 16 sessions), and storage of SHA256 hashes instead of MD5
> in pg_authid, though the value of that is mostly a check-box item
> because collisions are not a problem for the way we use MD5.

There are a lot of wins to having SCRAM implemented.  I disagree
strongly that securing PG from attacks based on latent information
gathering (backups which include pg_authid) is just a "check-box" item.

Thanks!

Stephen




Re: [HACKERS] src/interfaces/libpq shipping nmake-related Makefiles

2017-04-11 Thread Bruce Momjian
On Tue, Apr 11, 2017 at 04:51:03PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> Yeah, very long ago.  A quick search of our archives shows that the number
> of mentions of Borland pretty much fell off a cliff after 2009 (excluding
> the repeated conversations about dropping support, that is).  I found one
> report suggesting that it was already broken in 2012:
> 
> https://www.postgresql.org/message-id/flat/AD61A3A7C80949178643FE5D2811C35F%40LynnPC
> 
> It seems pretty safe to say that nobody's using this build method
> anymore.  As best I can tell from perusing the archives, the reason
> we used to expend a lot of sweat on it was that there was a freely
> available version of Borland C and none of MSVC.  But that stopped
> being true a long time ago, so there's not much reason to concern
> ourselves with it anymore.

Agreed, I was just pointing out why it remained so long.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-11 Thread Bruce Momjian
On Mon, Apr 10, 2017 at 04:34:50PM -0700, Andres Freund wrote:
> Hi,
> 
> 
> On 2017-04-08 23:36:13 +0530, Pavan Deolasee wrote:
> > On Wed, Apr 5, 2017 at 11:57 PM, Andres Freund  wrote:
> > 
> > > On 2017-04-05 09:36:47 -0400, Robert Haas wrote:
> > > > By the way, the "Converting WARM chains back to HOT chains" section of
> > > > README.WARM seems to be out of date.  Any chance you could update that
> > > > to reflect the current state and thinking of the patch?
> > >
> > > I propose we move this patch to the next CF.  That shouldn't prevent you
> > > working on it, although focusing on review of patches that still might
> > > make it wouldn't hurt either.
> > >
> > >
> > Thank you all for the  reviews, feedback, tests, criticism. And apologies
> > for keep pushing it till the last minute even though it was clear to me
> > quite some time back the patch is not going to make it.
> 
> What confuses me about that position is that people were advocating to
> actually commit till literally hours before the CF closed.

Yes, I was surprised by that too and have privately emailed people on
this topic.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Some thoughts about SCRAM implementation

2017-04-11 Thread Bruce Momjian
On Tue, Apr 11, 2017 at 02:53:24PM +0200, Álvaro Hernández Tortosa wrote:
> Let's put ourselves in the shoes of potential users. Why would anyone
> want to use SCRAM? What for? The hashing mechanism is better, no question.
> And it brings some added benefits, true. So it's "better". But the real
> gain comes from using channel binding, which avoids impersonation and MITM
> attacks. This is the deal breaker. SCRAM without channel binding is like
> Coke Zero without caffeine and mixed with water. Don't get me wrong, the
> work behind it is great.
> 
> But just a bit more is needed to make it really a big announcement and
> provide real value to (I guess, mostly but very interesting) enterprise
> customers, for which MITM and impersonation are big things. The good news
> is that adding channel binding is like an inverse Pareto: 20% of extra
> effort (I bet significantly less) leads to an 80% improvement.

I don't see why channel binding is a big deal for enterprises because I
assume they are already using SSL:


https://en.wikipedia.org/wiki/Salted_Challenge_Response_Authentication_Mechanism#Channel_binding

I think the big win for SCRAM is the inability to replay md5 packets
after recording 16k sessions (salt was only 32-bit, so a 50% chance of
replay after 16 sessions), and storage of SHA256 hashes instead of MD5
in pg_authid, though the value of that is mostly a check-box item
because collisions are not a problem for the way we use MD5.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why does logical replication launcher set application_name?

2017-04-11 Thread Robert Haas
On Tue, Apr 11, 2017 at 8:11 PM, Michael Paquier
 wrote:
> On Wed, Apr 12, 2017 at 3:40 AM, Tom Lane  wrote:
>> I notice looking at pg_stat_activity that the logical replication launcher
>> sets its application_name to "logical replication launcher".  This seems
>> inconsistent (no other standard background process sets application_name),
>> redundant with other columns that already tell you what it is, and an
>> unreasonable consumption of horizontal space in the tabular output.
>> Can we drop that?  If we do have to have something like that, what about
>> putting it in the "query" field where it's much less likely to be
>> substantially wider than any other entry in the column?
>
> It seems to me that the logic behind that is to be able to identify
> easily the logical replication launcher in pg_stat_activity, so using
> the query field instead sounds fine for me as we need another way than
> backend_type to guess what is this bgworker.

Wait, why do we need two ways?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Michael Paquier
On Wed, Apr 12, 2017 at 9:12 AM, Craig Ringer
 wrote:
> Well, you can get a lot of information on timings in crake's latest
> builds for example, or nightjar's.
>
> Here's an interesting fact: almost all the time taken up in the scripts
> test set seems to be consumed in running initdb over and over.
>
> Is it worth adding an init(cache => 1) option to PostgresNode, where we
> stash an initdb'd db in tmp_check/  and just do a simple fs copy of it ?

This looks like a good idea to me, but I don't like much the idea of
an option in the init() routine as that's hard to maintain. It would
make sense to me to be able to override the initialization with an
environment variable instead, or just make it the default and get the
base image stored in tmp_install/. Base backups are out of scope
though.

> It reduces our coverage of initdb only incredibly slightly - all that repeat
> runs will do is help find very uncommon intermittent failures. And we rerun
> the buildfarm all the time so it's not like there's a shortage of initdb
> runs anyway.

initdb is always run with the same set of options when init() is
called, so this is not something to worry about. And tests inherent to
initdb should happen in src/bin/initdb.

> We should also have a debug --no-fsync option for initdb, or maybe allow it
> to take -o options to pass to child postgres so we can pass fsync=off .

That's an idea as well...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Craig Ringer
On 12 Apr. 2017 03:14, "Andrew Dunstan" 
wrote:



On 04/11/2017 12:08 PM, Tom Lane wrote:
> Robert Haas  writes:
>> On Tue, Apr 11, 2017 at 11:32 AM, Andrew Dunstan
>>  wrote:
>>> This buildfarm run as you can see takes 33m32s, and the Tap tests take a
>>> combined 19m52s of that time.
>> I don't think it's quite fair to complain about the TAP tests taking a
>> long time to run as a general matter.  Many people here have long
>> wanted a separate test-suite for long running tests that can be run to
>> really check everything possible, and apparently, TAP tests are it.
>> What I think would be more useful is to break down where the time is
>> getting spent.  It may be that some of those tests are not adding
>> value proportionate to their runtime.



Well, you can get a lot of information on timings in crake's latest
builds for example, or nightjar's.

Here's an interesting fact: almost all the time taken up in the scripts
test set seems to be consumed in running initdb over and over.



Is it worth adding an init(cache => 1) option to PostgresNode, where we
stash an initdb'd db in tmp_check/  and just do a simple fs copy of it ?

Even default to caching and allow an option to disable the cached copy.

We're going to need to initdb a lot in the TAP tests. We often need a clean
state. Tests also get harder to debug the more you pack into a single test
run and more difficult to run individually to test some specific failure.
So IMO the best thing is to try to make that repeat initdb as fast as
possible.

It reduces our coverage of initdb only incredibly slightly - all that
repeat runs will do is help find very uncommon intermittent failures. And
we rerun the buildfarm all the time so it's not like there's a shortage of
initdb runs anyway.

We should also have a debug --no-fsync option for initdb, or maybe allow it
to take -o options to pass to child postgres so we can pass fsync=off .


Re: [HACKERS] Why does logical replication launcher set application_name?

2017-04-11 Thread Michael Paquier
On Wed, Apr 12, 2017 at 3:40 AM, Tom Lane  wrote:
> I notice looking at pg_stat_activity that the logical replication launcher
> sets its application_name to "logical replication launcher".  This seems
> inconsistent (no other standard background process sets application_name),
> redundant with other columns that already tell you what it is, and an
> unreasonable consumption of horizontal space in the tabular output.
> Can we drop that?  If we do have to have something like that, what about
> putting it in the "query" field where it's much less likely to be
> substantially wider than any other entry in the column?

It seems to me that the logic behind that is to be able to identify
easily the logical replication launcher in pg_stat_activity, so using
the query field instead sounds fine for me as we need another way than
backend_type to guess what is this bgworker.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] error handling in RegisterBackgroundWorker

2017-04-11 Thread David G. Johnston
On Tue, Apr 11, 2017 at 11:10 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 4/11/17 11:47, David G. Johnston wrote:
> > ​A potential middle-ground is to start, but then only allow superuser
> > connections.
>
> Then you might as well start and allow all connections.  I don't see
> what this buys.
>
>
​If "leave it offline until it gets fixed" is on the table then there is
some underlying reason that we'd not want application (or replication)
users connecting to the database while it is in a degraded state.  Even if
one accepts that premise, that doesn't mean that an administrator shouldn't
be allowed to log in and do ad-hoc stuff; the goal of the prevention is to
disallow programmed external actors that assume/require that these
background worker processes are active from connecting while they are not.
This middle-ground accommodates that goal in a precise manner.

I don't have an opinion as to which extreme is better, so in the absence of
one I'm in favor of "put control in the hands of the administrator" - this just
provides a slightly more usable environment for the administrator to
operate within.

David J.


Re: [HACKERS] Variable substitution in psql backtick expansion

2017-04-11 Thread Fabien COELHO



Hmmm. Although I do not buy this, it could work as a replacement for \set
which it seems cannot be upgraded because some people may rely on it to
just store whatever comes after it in a variable.


I have no strong opinion on how expressive expressions should be, but
having a separate \expr (or \setexpr, etc) gives us a green field to
develop them.


Yep.

One possible approach would be to reuse the pgbench expression engine in order 
to avoid redeveloping yet another lexer & parser & evaluator. This would 
mean some abstraction work, but it looks like the simplest & most 
effective approach right now. Currently it supports an SQL-expression 
subset over int & float, and there is an ongoing submission to add 
booleans and a few functions. If it is done this way, this suggests that 
variable management should/could be merged as well, but there are some 
differences (psql variables are not typed, it relies on a list, and there is a 
"namespace" thing I'm not sure I understood...).


Pavel also suggested some support for TEXT, although I would like to see a 
use case. That could be another extension to the engine.


A drawback is that pgbench needs more powerful client-side expressions 
than psql, so this adds some otherwise-unneeded complexity to psql, but 
avoiding another engine seems desirable.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible problem in Custom Scan API

2017-04-11 Thread Dmitry Ivanov

Tom Lane wrote:

Uh, no, construction of a custom plan node is entirely driven by the
PlanCustomPath method as far as I can see.  You're free to ignore what
create_scan_plan did and insert your own tlist.


Are you sure? Even if it's true, this targetlist should still contain each 
and every Var that's been requested. If I'm correct, the only way to ensure 
that is to call build_path_tlist(), which is static (oops!). Perhaps I 
could make my own, but it uses replace_nestloop_params() (again, static), 
and the problem goes further and further.


This effectively means that it would be nice if I could just use the 
existing machinery. The fix would be quite trivial.


By the way, what if our CustomScan is a top node? Let's take a look at 
create_plan():


    /* Recursively process the path tree, demanding the correct tlist result */
    plan = create_plan_recurse(root, best_path, CP_EXACT_TLIST);

    ...
    if (!IsA(plan, ModifyTable))
        apply_tlist_labeling(plan->targetlist, root->processed_tlist);
    ...

If I spoil the targetlist, everything blows up in apply_tlist_labeling():

Assert(list_length(dest_tlist) == list_length(src_tlist));

since the lengths may differ. It's much safer to use the tlist provided by 
the core code, but alas, sometimes it's not an option.



In particular, if what
your custom path actually is is a rescan of something like an
IndexOnlyScan, why don't you just copy the IOS's result tlist?


I'm actively working on a prototype of partition pruning. It could be much 
simpler if I just patched the core, but we have a working extension for 9.5 
and 9.6, and we can't throw it away just yet. I wouldn't bother you if I 
didn't encounter a broken query :)




--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

2017-04-11 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-11 17:25:52 -0400, Tom Lane wrote:
>> What was the previous behavior for such cases?  If it was reasonably
>> sane, we probably have to preserve it.  If it was unpredictable or
>> completely wacko, maybe we don't.

> Previously we'd stash the result in a new tuplestore, because it
> happened inside ExecMakeTableFunctionResult()'s fallback path.  The
> inner tuplestore (from the proper SRF) would get evaluated via the
> isDone mechanism.

> That'd imo be a fair amount of work to emulate, because we'd have to
> manually go over the tuplestore.

Yeah.  I don't have a big problem with saying that things that aren't
themselves SRFs are evaluated as though in a projection step atop the
SRF calculation.  We've already crossed that bridge with respect to
expressions around SRFs in the tlist --- for instance this:

regression=# select f1, case when f1>0 then generate_series(1,2) else null end 
as c from int4_tbl;
     f1      | c 
-------------+---
           0 |  
           0 |  
      123456 | 1
      123456 | 2
     -123456 |  
     -123456 |  
  2147483647 | 1
  2147483647 | 2
 -2147483647 |  
 -2147483647 |  
(10 rows)

gives different results than it used to:

regression96=# select f1, case when f1>0 then generate_series(1,2) else null 
end as c from int4_tbl;
     f1      | c 
-------------+---
           0 |  
      123456 | 1
      123456 | 2
     -123456 |  
  2147483647 | 1
  2147483647 | 2
 -2147483647 |  
(7 rows)

Now, that old behavior matches what you got in the RangeFunction case:

regression96=# select * from int4_tbl, cast(case when f1>0 then 
generate_series(1,2) else null end as int);
     f1      | int4 
-------------+------
           0 |      
      123456 |    1
      123456 |    2
     -123456 |      
  2147483647 |    1
  2147483647 |    2
 -2147483647 |      
(7 rows)

So it would make sense to me for our new behavior to still match the
targetlist case.

Not sure if that's exactly the same as what you're saying or not.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans

2017-04-11 Thread Alexander Korotkov
On Tue, Apr 11, 2017 at 8:24 PM, Alexander Kuzmenkov <
a.kuzmen...@postgrespro.ru> wrote:

> I would like to propose a patch that speeds up the queries of the form
> 'select
> count(*) ... where ...',  where the restriction clauses can be satisfied
> by some
> indexes. At the moment, such queries use index-only scans followed by
> aggregation. Index-only scans are only possible for indexes that are
> capable of
> returning indexed tuples, that is, support the 'amgettuple' access method.
> They
> are not applicable to indexes such as GIN and RUM. However, it is possible
> to
> improve count(*) performance for indexes that support bitmap scans. Having
> performed a bitmap index scan or a combination of such, the bits in bitmap
> can
> be counted to obtain the final result. Of course, the bitmap pages that are
> lossy or not visible to all existing transactions will still require heap
> access.
>

That's a cool feature for FTS users! Please, register this patch to the
next commitfest.

This patch has some important limitations:
> * It only supports targetlist consisting of a single expression that can be
> projected from count(*).
> * count(expr) is not supported. We could support it for cases where the
> "expr is not null" restriction can be satisfied with an index.
> * The current implementation does not support parallel execution. It could
> be
> implemented during the PostgreSQL 11 release cycle.
> * For some indexes, the bitmap index scan will always require rechecking
> all
> the tuples. Bitmap count plans should not be used in such cases. For now,
> this
> check is not implemented.
>

Does this limitation cause a performance drawback?  When the bitmap index scan
requires recheck of all tuples, the alternative to Bitmap Count is still
Aggregate + Bitmap Heap Scan.  Thus, I think Bitmap Count would have the same
performance or even be slightly faster.  That's worth testing.

Also, what is the planning overhead of this patch?  That's worth testing too.
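
To make the idea being reviewed concrete, here is a rough sketch of the
counting loop it implies; recheck_page_on_heap() and the surrounding
variables are placeholders, not code from the actual patch:

/*
 * After the bitmap index scan(s) have built a TIDBitmap, exact pages that
 * are all-visible can be counted without touching the heap; lossy pages
 * and pages that are not all-visible still need a heap visit, as the
 * proposal says.
 */
TBMIterator *iterator = tbm_begin_iterate(tbm);
TBMIterateResult *page;
int64       ntuples = 0;
Buffer      vmbuffer = InvalidBuffer;

while ((page = tbm_iterate(iterator)) != NULL)
{
    if (page->ntuples >= 0 &&       /* exact page, not lossy */
        VM_ALL_VISIBLE(heapRel, page->blockno, &vmbuffer))
        ntuples += page->ntuples;   /* just count the bits */
    else
        ntuples += recheck_page_on_heap(heapRel, page); /* placeholder */
}
tbm_end_iterate(iterator);
if (BufferIsValid(vmbuffer))
    ReleaseBuffer(vmbuffer);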

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

2017-04-11 Thread Andres Freund
On 2017-04-11 17:25:52 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Tom, do you have any opinion on the volatility stuff?
> 
> What was the previous behavior for such cases?  If it was reasonably
> sane, we probably have to preserve it.  If it was unpredictable or
> completely wacko, maybe we don't.

Previously we'd stash the result in a new tuplestore, because it
happened inside ExecMakeTableFunctionResult()'s fallback path.  The
inner tuplestore (from the proper SRF) would get evaluated via the
isDone mechanism.

That'd imo be a fair amount of work to emulate, because we'd have to
manually go over the tuplestore.

But given that we do *not* have similar semantics for volatiles in the
targetlist, I'm quite unconvinced that that's necessary.  Consider
e.g. my previous example of
  SELECT * FROM CAST(srf() * volatile_func() AS whatnot)
rewritten into a saner version as
  SELECT srf * volatile_func() FROM srf() AS srf;
here volatile_func() would, both before and now, get re-evaluated if there's a
rewind, and would only be invoked if the row is actually evaluated.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

2017-04-11 Thread Tom Lane
Andres Freund  writes:
> Tom, do you have any opinion on the volatility stuff?

What was the previous behavior for such cases?  If it was reasonably
sane, we probably have to preserve it.  If it was unpredictable or
completely wacko, maybe we don't.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

2017-04-11 Thread Andres Freund
On 2017-01-30 18:54:50 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > Wonder if we there's an argument to be made for implementing this
> > roughly similarly to split_pathtarget_at_srf - instead of injecting a
> > ProjectSet node we'd add a FunctionScan node below a Result node.
>
> Yeah, possibly.  That would have the advantage of avoiding an ExecProject
> step when the SRFs aren't buried, which would certainly be the expected
> case.
>
> If you don't want to make ExecInitExpr responsible, then the planner would
> have to do something like split_pathtarget_at_srf anyway to decompose the
> expressions, no matter which executor representation we use.

Working on this I came across a few things:

Splitting things away from the FunctionScan node doesn't work entirely
naturally, due to ORDINALITY.  Consider e.g. cases where there's no
function to evaluate anymore, because it got inlined, but we still want
ORDINALITY.  We could obviously support FunctionScan nodes without any
associated (or empty) RangeTblFunctions, but that seems awkward.

Secondly, doing non-function stuff gets interesting when volatility is
involved: Consider something like FROM CAST(srf() * volatile_func() AS
whatnot); when, and how often, does volatile_func() get evaluated?  If
we put it somewhere around the projection path it'll only get evaluated
if the rows are actually retrieved and will get re-evaluated upon
projection.  If we put it somewhere below the tuplestores that
nodeFunctionscan./execSRF.c generate for each RangeTblFunction, it'll
get called evaluated regardless of being retrieved.  The latter is
noticeably more complicated, because of SRFM_Materialize SRFs.   Our
policy around when it's ok to re-evaluate volatile functions isn't clear
enough to me, to say which one is right and which one is wrong.

For now there's simply another RangeTblFunction field with a separate
non-FuncExpr expression that can reference the SRF output via scan
Vars.  That's evaluated in FunctionNext's !simple branch, where we
conveniently have a separate slot already.  The separation currently
happens in create_functionscan_plan().

Tom, do you have any opinion on the volatility stuff?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Andrew Dunstan


On 04/11/2017 03:58 PM, Andres Freund wrote:
> On 2017-04-11 15:52:34 -0400, Tom Lane wrote:
>> Andrew Dunstan  writes:
>>>> The other thing that might be useful here is to push on parallelizing
>>>> buildfarm runs.
>>> One reason I haven't supported "make -j nn" everywhere (it is supported
>>> for making core, contrib and test_modules) is that the output tends to
>>> get rather jumbled, and I didn't want to impose that extra burden on
>>> people trying to decipher the results.
>> Agreed, that would be a mess.
> Doesn't make's -Otarget largely solve this?
>


Possibly, but I have just gone to a not inconsiderable amount of trouble
to disaggregate the TAP tests so we get not only the output but all the
logs for each step separately. Say there's a problem in pg_rewind.
Compare this

with having to pick apart this:

So I don't really want the output from these run in parallel to be run
all together, even if they aren't jumbled.

Now it might make sense for other pieces to be run like that. I will
have a look. But the ones that are taking the most time are ones that
seem least suitable.

Afterthought: It might be possible to make it a runtime option.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible problem in Custom Scan API

2017-04-11 Thread Tom Lane
Dmitry Ivanov  writes:
>> Reading between the lines, I think the problem may be that you're not
>> being careful about how you set up custom_scan_tlist.  But since the
>> core code has zero involvement in that decision, it's hard to see why
>> it would be a core code bug.

> I'm sorry, but you didn't understand. It's *the core code* that builds the 
> targetlist, and sometimes it may decide to provide a physical targetlist, 
> not the one that's *really needed*.

Uh, no, construction of a custom plan node is entirely driven by the
PlanCustomPath method as far as I can see.  You're free to ignore what
create_scan_plan did and insert your own tlist.  In particular, if what
your custom path actually is is a rescan of something like an
IndexOnlyScan, why don't you just copy the IOS's result tlist?
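
To illustrate, a minimal sketch of a PlanCustomPath callback that ignores
the tlist chosen by create_scan_plan() and reuses the child's result tlist
instead; everything except the callback signature and the CustomScan fields
(my_plan_custom_path, my_scan_methods) is hypothetical:

static Plan *
my_plan_custom_path(PlannerInfo *root, RelOptInfo *rel,
                    CustomPath *best_path, List *tlist,
                    List *clauses, List *custom_plans)
{
    CustomScan *cscan = makeNode(CustomScan);
    Plan       *child = (Plan *) linitial(custom_plans);

    /*
     * Ignore 'tlist' as passed down by create_scan_plan() and promise only
     * what the child (e.g. an IndexOnlyScan) can actually produce.
     */
    cscan->scan.plan.targetlist = copyObject(child->targetlist);
    cscan->custom_scan_tlist = copyObject(child->targetlist);

    cscan->scan.scanrelid = 0;      /* not scanning a single base relation */
    cscan->scan.plan.qual = extract_actual_clauses(clauses, false);
    cscan->custom_plans = custom_plans;
    cscan->methods = &my_scan_methods;  /* hypothetical CustomScanMethods */

    return &cscan->scan.plan;
}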

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] src/interfaces/libpq shipping nmake-related Makefiles

2017-04-11 Thread Tom Lane
Bruce Momjian  writes:
> I am fine with removing things, but I do remember one reason the Borland
> part was kept is that some tool would only work with the
> Borland-compiled library, not gcc or MSVC, but that was long ago.

Yeah, very long ago.  A quick search of our archives shows that the number
of mentions of Borland pretty much fell off a cliff after 2009 (excluding
the repeated conversations about dropping support, that is).  I found one
report suggesting that it was already broken in 2012:

https://www.postgresql.org/message-id/flat/AD61A3A7C80949178643FE5D2811C35F%40LynnPC

It seems pretty safe to say that nobody's using this build method
anymore.  As best I can tell from perusing the archives, the reason
we used to expend a lot of sweat on it was that there was a freely
available version of Borland C and none of MSVC.  But that stopped
being true a long time ago, so there's not much reason to concern
ourselves with it anymore.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-11 Thread Claudio Freire
On Tue, Apr 11, 2017 at 4:17 PM, Robert Haas  wrote:
> On Tue, Apr 11, 2017 at 2:59 PM, Claudio Freire  
> wrote:
>> On Tue, Apr 11, 2017 at 3:53 PM, Robert Haas  wrote:
>>> 1TB / 8kB per page * 60 tuples/page * 20% * 6 bytes/tuple = 9216MB of
>>> maintenance_work_mem
>>>
>>> So we'll allocate 128MB+256MB+512MB+1GB+2GB+4GB which won't be quite
>>> enough so we'll allocate another 8GB, for a total of 16256MB, but more
>>> than three-quarters of that last allocation ends up being wasted.
>>> I've been told on this list before that doubling is the one true way
>>> of increasing the size of an allocated chunk of memory, but I'm still
>>> a bit unconvinced.
>>
>> There you're wrong. The allocation is capped to 1GB, so wastage has an
>> upper bound of 1GB.
>
> Ah, OK.  Sorry, didn't really look at the code.  I stand corrected,
> but then it seems a bit strange to me that the largest and smallest
> allocations are only 8x different. I still don't really understand
> what that buys us.

Basically, attacking the problem (that, I think, you mentioned) of
very small systems in which overallocation for small vacuums was an
issue.

The "slow start" behavior of starting with smaller segments tries to
improve the situation for small vacuums, not big ones.

By starting at 128M and growing up to 1GB, overallocation is bound to
the range 128M-1GB and is proportional to the amount of dead tuples,
not table size, as it was before. Starting at 128M helps the initial
segment search, but I could readily go for starting at 64M, I don't
think it would make a huge difference. Removing exponential growth,
however, would.
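
A minimal sketch of the sizing rule being described (illustrative names,
not the patch's own):

#define INITIAL_SEG_BYTES   (128L * 1024 * 1024)    /* first segment: 128MB */
#define MAX_SEG_BYTES      (1024L * 1024 * 1024)    /* hard cap: 1GB */

/* Size of the next segment, given how many are already allocated. */
static long
next_segment_size(int nsegments_allocated)
{
    long        size = INITIAL_SEG_BYTES;

    /* double per allocated segment, but never beyond the 1GB cap */
    while (nsegments_allocated-- > 0 && size < MAX_SEG_BYTES)
        size *= 2;

    return size;
}

Under the current 12GB limit that yields 128MB, 256MB, 512MB and then 1GB
segments - the at-most-15-segments figure mentioned below - and wastage is
bounded by the size of the last (at most 1GB) segment.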

As the patch stands, small systems (say 32-bit systems) without
overcommit and with slowly-changing data can now set high m_w_m
without running into overallocation issues with autovacuum reserving
too much virtual space, as it will reserve memory only proportional to
the amount of dead tuples. Previously, it would reserve all of m_w_m
regardless of whether it was needed or not, with the only exception
being really small tables, so m_w_m=1GB was unworkable in those cases.
Now it should be fine.

> What would we lose if we just made 'em all 128MB?

TBH, not that much. We'd need 8x as many compares to find the segment,
which forces a switch to binary search of the segments, and that is less
cache-friendly. So it's more complex code with less cache locality, and I'm
just not sure what the benefit is given the current limits.

The only aim of this multiarray approach was making *virtual address
space reservations* proportional to the amount of actual memory
needed, as opposed to configured limits. It doesn't need to be a tight
fit, because calling palloc on its own doesn't actually use that
memory, at least on big allocations like these - the OS will not map
the memory pages until they're first touched. That's true in most
modern systems, and many ancient ones too.

In essence, the patch as it is proposed, doesn't *need* a binary
search, because the segment list can only grow up to 15 segments at
its biggest, and that's a size small enough that linear search will
outperform (or at least perform as well as) binary search. Reducing
the initial segment size wouldn't change that. If the 12GB limit is
lifted, or the maximum segment size reduced (from 1GB to 128MB for
example), however, that would change.

I'd be more in favor of lifting the 12GB limit than of reducing the
maximum segment size, for the reasons above. Raising the 12GB limit
has concrete and readily apparent benefits, whereas using bigger (or
smaller) segments is far more debatable. Yes, that will need a binary
search. But, I was hoping that could be a second (or third) patch, to
keep things simple, and benefits measurable.

Also, the plan, as discussed in this very long thread, was to
eventually try to turn segments into bitmaps if dead tuple density was
big enough. That benefits considerably from big segments, since lookup
on a bitmap is O(1) - the bigger the segments, the faster the lookup,
as the search on the segment list would be dominant.

So... what shall we do?

At this point, I've given all my arguments for the current design. If
the more senior developers don't agree, I'll be happy to try your way.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication status in logical replication

2017-04-11 Thread Simon Riggs
On 22 March 2017 at 02:50, Masahiko Sawada  wrote:

> When using logical replication, I ran into a situation where the
> pg_stat_replication.state is not updated until any wal record is sent
> after started up. For example, I set up logical replication with 2
> subscriber and restart the publisher server, but I see the following
> status for a while (maybe until autovacuum run).
...

> Attached patch fixes this behavior by updating WalSndCaughtUp before
> trying to read next WAL if already caught up.

Looks like a bug that we should fix in PG10, with backpatch to 9.4 (or
as far as it goes).

Objections to commit?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible problem in Custom Scan API

2017-04-11 Thread Dmitry Ivanov

Tom Lane wrote:

Reading between the lines, I think the problem may be that you're not
being careful about how you set up custom_scan_tlist.  But since the
core code has zero involvement in that decision, it's hard to see why
it would be a core code bug.


I'm sorry, but you didn't understand. It's *the core code* that builds the 
targetlist, and sometimes it may decide to provide a physical targetlist, 
not the one that's *really needed*. The broader targetlist has Vars that 
cannot be supplied by the IndexOnlyScan node, hence the error.


We cannot come up with our own targetlist because we don't know whether it's a 
top node and we should return exactly the same tuple (CP_EXACT_TLIST), or 
whether it's just a stray optimization that made us think so.


Append works only because it doesn't allow projections, and it can never 
get a physical targetlist if an index doesn't contain all columns.


But everything changes when we use CustomScan: PostgreSQL will not pay 
attention. For example, if our CustomScan is a child of NestLoop, the 
former will call this (create_nestloop_plan):


inner_plan = create_plan_recurse(root, best_path->innerjoinpath, 0);

Since NestLoop can make projections, it doesn't enforce CP_EXACT_TLIST 
flag, and our CustomScan may end up with a full physical targetlist (see 
the code of create_scan_plan() if you don't believe me), INSTEAD OF the one 
it really has been asked to produce. Meanwhile, Append will behave as it 
should, since it doesn't care about physical tlists.


It's not just my imagination.


--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] src/interfaces/libpq shipping nmake-related Makefiles

2017-04-11 Thread Bruce Momjian
On Tue, Apr 11, 2017 at 03:14:23PM +0200, Magnus Hagander wrote:
> On Tue, Apr 11, 2017 at 4:18 AM, Michael Paquier 
> wrote:
> 
> On Tue, Apr 11, 2017 at 1:45 AM, Tom Lane  wrote:
> > Magnus Hagander  writes:
> > Are these votes for getting rid of both win32.mak and bcc32.mak?
> >
> >> PFA a patch that does this. Did I miss something? :)
> >
> > Perhaps we should get rid of the WIN32_ONLY_COMPILER symbol altogether;
> > given this patch, "#ifdef WIN32_ONLY_COMPILER" could be replaced by
> > "#ifdef _MSC_VER".
> 
> That's here in the patch for people wondering:
> -#if defined(_MSC_VER) || defined(__BORLANDC__)
> +#if defined(_MSC_VER)
>  #define WIN32_ONLY_COMPILER
>  #endif
> +1 for the renaming.
> 
> 
> Ok, since we have two votes for it, I will go ahead and do that (as a separate
> patch pushed together).'

I am fine with removing things, but I do remember one reason the Borland
part was kept is that some tool would only work with the
Borland-compiled library, not gcc or MSVC, but that was long ago.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Tom Lane
Andres Freund  writes:
> On 2017-04-11 15:52:34 -0400, Tom Lane wrote:
>> Agreed, that would be a mess.

> Doesn't make's -Otarget largely solve this?

Given a sufficiently new make (which would include exactly zero of
my own buildfarm critters), that would help.  I think it does nothing
much for "prove -j" though, and certainly not for MSVC.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Andres Freund
On 2017-04-11 15:52:34 -0400, Tom Lane wrote:
> Andrew Dunstan  writes:
> >> The other thing that might be useful here is to push on parallelizing
> >> buildfarm runs.
> 
> > One reason I haven't supported "make -j nn" everywhere (it is supported
> > for making core, contrib and test_modules) is that the output tends to
> > get rather jumbled, and I didn't want to impose that extra burden on
> > people trying to decipher the results.
> 
> Agreed, that would be a mess.

Doesn't make's -Otarget largely solve this?

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SCRAM authentication, take three

2017-04-11 Thread Bruce Momjian
On Tue, Apr 11, 2017 at 08:10:23AM +0300, Heikki Linnakangas wrote:
> On 04/11/2017 04:52 AM, Peter Eisentraut wrote:
> Good question. We would need to decide the order of preference for those.
> 
> That question won't arise in practice. Firstly, if the server can do
> scram-sha-256-plus, it presumably can also do scram-sha-512-plus. Unless
> there's a change in the way the channel binding works, such that the
> scram-sha-512-plus variant needs a newer version of OpenSSL or something.
> Secondly, the user's pg_authid row will contain a SCRAM-SHA-256 or
> SCRAM-SHA-512 verifier, not both, so that will dictate which one to use.

It seems openssl has had to deal with cipher preferences for years and
invented ssl_ciphers.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Tom Lane
Andrew Dunstan  writes:
>> The other thing that might be useful here is to push on parallelizing
>> buildfarm runs.

> One reason I haven't supported "make -j nn" everywhere (it is supported
> for making core, contrib and test_modules) is that the output tends to
> get rather jumbled, and I didn't want to impose that extra burden on
> people trying to decipher the results.

Agreed, that would be a mess.  I was thinking in terms of running steps in
parallel if they have independent output log files, so that that problem
wouldn't arise.  AFAIK, for example, we could run the per-subdirectory
"make check" tests in src/bin/ in parallel without incurring any
legibility issues.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible problem in Custom Scan API

2017-04-11 Thread Tom Lane
Dmitry Ivanov  writes:
> Tom Lane wrote:
>> Uh, why would you see that?  The planner would never generate an
>> IndexOnlyScan in the first place if the query required any columns
>> not available from the index.

> True, but as you can see, create_append_plan() produces its own targetlist:

> static Plan *
> create_append_plan(PlannerInfo *root, AppendPath *best_path)
> {
>   Append *plan;
>   List   *tlist = build_path_tlist(root, &best_path->path);
>   ...

> If we replace Append with some custom node, the plan will instantly become 
> invalid (it won't be able to build a projection from 'custom_scan_tlist'
> to 'targetlist'). However, this doesn't mean that it's unable to produce 
> the same result.

You haven't really convinced me that anything is wrong there.  The append
plan's tlist isn't going to contain unwanted variables either.

Reading between the lines, I think the problem may be that you're not
being careful about how you set up custom_scan_tlist.  But since the
core code has zero involvement in that decision, it's hard to see why
it would be a core code bug.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-11 Thread Robert Haas
On Tue, Apr 11, 2017 at 2:59 PM, Claudio Freire  wrote:
> On Tue, Apr 11, 2017 at 3:53 PM, Robert Haas  wrote:
>> 1TB / 8kB per page * 60 tuples/page * 20% * 6 bytes/tuple = 9216MB of
>> maintenance_work_mem
>>
>> So we'll allocate 128MB+256MB+512MB+1GB+2GB+4GB which won't be quite
>> enough so we'll allocate another 8GB, for a total of 16256MB, but more
>> than three-quarters of that last allocation ends up being wasted.
>> I've been told on this list before that doubling is the one true way
>> of increasing the size of an allocated chunk of memory, but I'm still
>> a bit unconvinced.
>
> There you're wrong. The allocation is capped to 1GB, so wastage has an
> upper bound of 1GB.

Ah, OK.  Sorry, didn't really look at the code.  I stand corrected,
but then it seems a bit strange to me that the largest and smallest
allocations are only 8x different.  I still don't really understand
what that buys us.  What would we lose if we just made 'em all 128MB?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Andrew Dunstan


On 04/11/2017 12:08 PM, Tom Lane wrote:
> Robert Haas  writes:
>> On Tue, Apr 11, 2017 at 11:32 AM, Andrew Dunstan
>>  wrote:
>>> This buildfarm run as you can see takes 33m32s, and the Tap tests take a
>>> combined 19m52s of that time.
>> I don't think it's quite fair to complain about the TAP tests taking a
>> long time to run as a general matter.  Many people here have long
>> wanted a separate test-suite for long running tests that can be run to
>> really check everything possible, and apparently, TAP tests are it.
>> What I think would be more useful is to break down where the time is
>> getting spent.  It may be that some of those tests are not adding
>> value proportionate to their runtime.



Well, you can get a lot of information on timings in crake's latest
builds for example, or nightjar's.

Here's an interesting fact: almost all the time taken up in the scripts
test set seems to be consumed in running initdb over and over. The
actual tests take almost no time at all. Other test sets might have
similar stats; I am still digging.

+1 for doing some assessments on test value, and also for possibly
creating several test categories, based on both speed and criticality. I
could imaging having some very long running tests, using TAP or
pg_regress,  and optionally enabling them in a buildfarm critter.

I'm not at all opposed to having TAP tests - I have just put in quite a
bit of time extending and improving buildfarm coverage of them. And I'm
probably about to do some more work to enable similar functionality for
MSVC.


> The other thing that might be useful here is to push on parallelizing
> buildfarm runs.  Admittedly that will do nothing for the oldest and
> slowest buildfarm critters, but for reasonably modern hardware the
> serialization of the tests really handicaps you.  We seem to have
> fixed that for manual application of "make check-world", at least
> if you know the right magic incantations to parallelize it; but
> AFAIK the buildfarm script is entirely serial.


Yeah, it is, and changing that won't be simple. Say 3 steps run in
parallel and all fail - what do we do? I guess we can just pick the first (by
some definition) and discard the others. Parallelizing individual steps
might be simpler; I'd have to take a look.

Honestly, I'm not that concerned about how long it takes the buildfarm
client per se to run, although I am working on facilities for it where
timing will matter to some people.

And anyone who is concerned about how long it takes on their machine can
run without --enable-tap-tests.

Not sure what the magic incantation is other than "make -j nn"?

One reason I haven't supported "make -j nn" everywhere (it is supported
for making core, contrib and test_modules) is that the output tends to
get rather jumbled, and I didn't want to impose that extra burden on
people trying to decipher the results.


cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] ERROR: badly formatted node string "RESTRICTINFO...

2017-04-11 Thread Tom Lane
Amit Kapila  writes:
> On Tue, Apr 11, 2017 at 5:29 AM, Tom Lane  wrote:
>> I think the fact that we see this problem at all may indicate an
>> oversight in commit 5e6d8d2bbbcace304450b309e79366c0da4063e4 ("Allow
>> parallel workers to execute subplans").  If the worker were to actually
>> run the initplan, bad things would happen (the worker would create its
>> own remote connection, which we don't want).  Now, in the problem plan
>> the InitPlan is actually attached to the topmost Gather, which I think
>> is safe because it'll be run by the master, but I wonder if we're being
>> careful enough about non-parallel-safe plans for initplans/subplans.

> Initplans are never marked parallel safe, only subplans that are
> generated for parallel safe paths are marked as parallel safe.

OK ...

>> Also, even if the worker never actually executes the plan node, just
>> doing ExecInitNode on it in a parallel worker might be more than a
>> non-parallel-safe FDW is prepared to cope with.

> I think there is a possibility of doing ExecInitNode in a parallel
> worker for a parallel-unsafe subplan, because we pass a list of all
> the sublans stored in planned statement.

It's more than a possibility.  If you run Andreas' test case against
HEAD, now that the can't-transmit-RestrictInfo failure is fixed,
you will find that the parallel worker actually creates and immediately
destroys an FDW remote connection.  (The easiest way to prove that is
to turn on log_connections/log_disconnections.  BTW, is there any
convenient way to attach a debugger to a parallel worker process as it's
being launched?  I couldn't manage to catch the worker in the act.)

So I think this is clearly a Bad Thing and we'd better do something
about it.  The consequences for postgres_fdw aren't so awful perhaps,
but other non-parallel-safe FDWs might have bigger problems with this
behavior.

> However, the worker will
> never execute such a plan as we don't generate a plan where unsafe
> sublan/initplan is referenced in the node passed to the worker.  If we
> want to avoid passing parallel-unsafe subplans to workers, then I
> think we can maintain a list of parallel safe subplans along with
> subplans in PlannerGlobal and PlannedStmt or maybe keep a parallel
> safe flag in Plan so that we can pass only parallel safe plans to
> workers.

Right, we could, say, leave a hole in the subplan list corresponding
to any subplan that's not parallel-safe.  That seems like a good idea
anyway because right now there's clearly no cross-check preventing
a worker from trying to run such a subplan.

Anyone want to draft a patch for this?
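
A minimal sketch of that approach; the function itself is hypothetical, but
Plan->parallel_safe and PlannedStmt->subplans are the existing fields it
would rely on:

/*
 * Sketch: before handing the plan to parallel workers, blank out any
 * subplan that is not parallel-safe.  A worker then cannot ExecInitNode
 * such a subplan (e.g. a postgres_fdw scan that would open a remote
 * connection); any attempt to reference the hole fails loudly instead.
 */
static void
strip_unsafe_subplans(PlannedStmt *pstmt)
{
    ListCell   *lc;

    foreach(lc, pstmt->subplans)
    {
        Plan       *subplan = (Plan *) lfirst(lc);

        if (subplan != NULL && !subplan->parallel_safe)
            lfirst(lc) = NULL;      /* leave a hole in the list */
    }
}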

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] scram and \password

2017-04-11 Thread Heikki Linnakangas

On 04/10/2017 08:42 AM, Michael Paquier wrote:

As there have been some conflicts because of the commit of SASLprep,
here is a rebased set of patches. A couple of things worth noting:
- SASLprep does an allocation of the prepared password string. It is
definitely better to do all the ground work in pg_saslprep but this
costs a free() call with a #ifdef FRONTEND at the end of
scram_build_verifier().
- Patch 0005 does that:
+   /*
+    * Hash password using SCRAM-SHA-256 when connecting to servers
+    * newer than Postgres 10, and hash with MD5 otherwise.
+    */
+   if (pset.sversion < 10)
+       encrypted_password = PQencryptPassword(pw1, user, "md5");
+   else
+       encrypted_password = PQencryptPassword(pw1, user, "scram");
Actually I am thinking that guessing the hashing function according to
the value of password_encryption would make the most sense. Thoughts?


Thanks! I've been busy on the other thread on future-proofing the 
protocol with SASL mechanism negotiation; I'll come back to this 
once we get that settled. By the end of the week, I presume.


Not sure about the password-encryption thing, there are good arguments 
for either behavior..


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-11 Thread Claudio Freire
On Tue, Apr 11, 2017 at 3:59 PM, Claudio Freire  wrote:
> On Tue, Apr 11, 2017 at 3:53 PM, Robert Haas  wrote:
>> 1TB / 8kB per page * 60 tuples/page * 20% * 6 bytes/tuple = 9216MB of
>> maintenance_work_mem
>>
>> So we'll allocate 128MB+256MB+512MB+1GB+2GB+4GB which won't be quite
>> enough so we'll allocate another 8GB, for a total of 16256MB, but more
>> than three-quarters of that last allocation ends up being wasted.
>> I've been told on this list before that doubling is the one true way
>> of increasing the size of an allocated chunk of memory, but I'm still
>> a bit unconvinced.
>
> There you're wrong. The allocation is capped to 1GB, so wastage has an
> upper bound of 1GB.

And total m_w_m for vacuum is still capped to 12GB (as big as you can get
with 32-bit integer indices).

So you can get at most 15 segments (a binary search is thus not worth
it), and overallocate by at most 1GB (the maximum segment size).

At least that's my rationale.

Removing the 12GB limit requires a bit of care (there are some 32-bit
counters still around I believe).


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-11 Thread Claudio Freire
On Tue, Apr 11, 2017 at 3:53 PM, Robert Haas  wrote:
> 1TB / 8kB per page * 60 tuples/page * 20% * 6 bytes/tuple = 9216MB of
> maintenance_work_mem
>
> So we'll allocate 128MB+256MB+512MB+1GB+2GB+4GB which won't be quite
> enough so we'll allocate another 8GB, for a total of 16256MB, but more
> than three-quarters of that last allocation ends up being wasted.
> I've been told on this list before that doubling is the one true way
> of increasing the size of an allocated chunk of memory, but I'm still
> a bit unconvinced.

There you're wrong. The allocation is capped to 1GB, so wastage has an
upper bound of 1GB.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bumping HASH_VERSION to 3

2017-04-11 Thread Robert Haas
On Tue, Apr 11, 2017 at 2:23 PM, Bruce Momjian  wrote:
> On Fri, Mar 31, 2017 at 02:48:05PM -0400, Tom Lane wrote:
>> +1, as long as we're clear on what will happen when pg_upgrade'ing
>> an installation containing hash indexes.  I think a minimum requirement is
>> that it succeed and be able to start up, and allow the user to manually
>> REINDEX such indexes afterwards.  Bonus points for:
>>
>> 1. teaching pg_upgrade to create a script containing the required REINDEX
>> commands.  (I think it's produced scripts for similar requirements in the
>> past.)
>>
>> 2. marking the index invalid so that the system would silently ignore it
>> until it's been reindexed.  I think there might be adequate infrastructure
>> for that already thanks to REINDEX CONCURRENTLY, and it'd just be a matter
>> of getting pg_upgrade to hack the indexes' catalog state.  (If not, it's
>> probably not worth the trouble.)
>
> We already have code to do all of that, but it was removed from
> pg_upgrade in 9.5.  You can still see it in 9.4:
>
> 
> contrib/pg_upgrade/version_old_8_3.c::old_8_3_invalidate_hash_gin_indexes()
>
> I would be happy to restore that code and make it work for PG 10.

Cool!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-04-11 Thread Robert Haas
On Fri, Apr 7, 2017 at 9:12 PM, Andres Freund  wrote:
>> Why do you say exponential growth fragments memory? AFAIK, all those
>> allocations are well beyond the point where malloc starts mmaping
>> memory, so each of those segments should be a mmap segment,
>> independently freeable.
>
> Not all platforms have that, and even on platforms with it, frequent,
> unevenly sized, very large allocations can lead to enough fragmentation
> that further allocations are harder and fragment / enlarge the
> pagetable.

Such a thing is completely outside my personal experience.  I've never
heard of a case where a 64-bit platform fails to allocate memory
because something (what?) is fragmented.  Page table memory usage is a
concern at some level, but probably less so for autovacuum workers
than for most backends, because autovacuum workers (where most
vacuuming is done) exit after one pass through pg_class.  Although I
think our memory footprint is a topic that could use more energy, I
don't really see any reason to think that pagetable bloat caused my
unevenly sized allocations in short-lived processes is the place to
start worrying.

That having been said, IIRC, I did propose quite a ways upthread that
we use a fixed chunk size, just because it would use less actual
memory, never mind the size of the page table.  I mean, if you
allocate in chunks of 64MB, which I think is what I proposed, you'll
never waste more than 64MB.  If you allocate in
exponentially-increasing chunk sizes starting at 128MB, you could
easily waste much more.  Let's imagine a 1TB table where 20% of the
tuples are dead due to some large bulk operation (a bulk load failed,
or a bulk delete succeeded, or a bulk update happened).  Back of the
envelope calculation:

1TB / 8kB per page * 60 tuples/page * 20% * 6 bytes/tuple = 9216MB of
maintenance_work_mem

So we'll allocate 128MB+256MB+512MB+1GB+2GB+4GB which won't be quite
enough so we'll allocate another 8GB, for a total of 16256MB, but more
than three-quarters of that last allocation ends up being wasted.
I've been told on this list before that doubling is the one true way
of increasing the size of an allocated chunk of memory, but I'm still
a bit unconvinced.

On the other hand, if we did allocate fixed chunks of, say, 64MB, we
could end up with an awful lot of them.  For example, in the example
above, 9216MB/64MB = 144 chunks.  Is that number of mappings going to
make the VM system unhappy on any of the platforms we care about?  Is
that a bigger or smaller problem than what you (Andres) are worrying
about?  I don't know.
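
For concreteness, a tiny standalone calculation of the two policies sketched
above (uncapped doubling from 128MB, versus fixed 64MB chunks) for the
9216MB example:

#include <stdio.h>

int
main(void)
{
    long long   need_mb = 9216;     /* the 1TB, 20%-dead example above */
    long long   total_mb = 0;
    long long   chunk_mb = 128;
    int         nchunks = 0;

    /* doubling policy, with no per-chunk cap (the assumption above) */
    while (total_mb < need_mb)
    {
        total_mb += chunk_mb;
        chunk_mb *= 2;
        nchunks++;
    }
    printf("doubling:   %d chunks, %lld MB allocated, %lld MB unused\n",
           nchunks, total_mb, total_mb - need_mb);

    /* fixed 64MB chunks */
    nchunks = (int) ((need_mb + 63) / 64);
    printf("fixed 64MB: %d chunks, %lld MB allocated\n",
           nchunks, (long long) nchunks * 64);
    return 0;
}

It prints 7 chunks / 16256 MB for the doubling case and 144 chunks for
fixed 64MB, matching the figures above.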

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Why does logical replication launcher set application_name?

2017-04-11 Thread Tom Lane
I notice looking at pg_stat_activity that the logical replication launcher
sets its application_name to "logical replication launcher".  This seems
inconsistent (no other standard background process sets application_name),
redundant with other columns that already tell you what it is, and an
unreasonable consumption of horizontal space in the tabular output.
Can we drop that?  If we do have to have something like that, what about
putting it in the "query" field where it's much less likely to be
substantially wider than any other entry in the column?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible problem in Custom Scan API

2017-04-11 Thread Dmitry Ivanov

Tom Lane wrote:

Uh, why would you see that?  The planner would never generate an
IndexOnlyScan in the first place if the query required any columns
not available from the index.


True, but as you can see, create_append_plan() produces its own targetlist:

static Plan *
create_append_plan(PlannerInfo *root, AppendPath *best_path)
{
    Append     *plan;
    List       *tlist = build_path_tlist(root, &best_path->path);
    ...

If we replace Append with some custom node, the plan will instantly become 
invalid (it won't be able to build a projection from 'custom_scan_tlist' 
to 'targetlist'). However, this doesn't mean that it's unable to produce 
the same result.



--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] bumping HASH_VERSION to 3

2017-04-11 Thread Bruce Momjian
On Fri, Mar 31, 2017 at 02:48:05PM -0400, Tom Lane wrote:
> +1, as long as we're clear on what will happen when pg_upgrade'ing
> an installation containing hash indexes.  I think a minimum requirement is
> that it succeed and be able to start up, and allow the user to manually
> REINDEX such indexes afterwards.  Bonus points for:
> 
> 1. teaching pg_upgrade to create a script containing the required REINDEX
> commands.  (I think it's produced scripts for similar requirements in the
> past.)
> 
> 2. marking the index invalid so that the system would silently ignore it
> until it's been reindexed.  I think there might be adequate infrastructure
> for that already thanks to REINDEX CONCURRENTLY, and it'd just be a matter
> of getting pg_upgrade to hack the indexes' catalog state.  (If not, it's
> probably not worth the trouble.)

We already have code to do all of that, but it was removed from
pg_upgrade in 9.5.  You can still see it in 9.4:


contrib/pg_upgrade/version_old_8_3.c::old_8_3_invalidate_hash_gin_indexes()

I would be happy to restore that code and make it work for PG 10.
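
For reference, the invalidation half of that boils down to something like
the sketch below (modeled loosely on the old code; the function name and
query here are illustrative, not copied from it):

#include <stdio.h>
#include "libpq-fe.h"

/*
 * Run against each database of the new cluster: mark every hash index
 * invalid so the server ignores it until the user runs the REINDEX
 * commands that pg_upgrade would also write out to a script file.
 */
static void
invalidate_hash_indexes(PGconn *conn)
{
    PGresult   *res;

    res = PQexec(conn,
                 "UPDATE pg_catalog.pg_index i "
                 "SET indisvalid = false "
                 "FROM pg_catalog.pg_class c, pg_catalog.pg_am a "
                 "WHERE i.indexrelid = c.oid "
                 "  AND c.relam = a.oid "
                 "  AND a.amname = 'hash'");

    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "could not invalidate hash indexes: %s",
                PQerrorMessage(conn));
    PQclear(res);
}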

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] error handling in RegisterBackgroundWorker

2017-04-11 Thread Peter Eisentraut
On 4/11/17 11:47, David G. Johnston wrote:
> ​A potential middle-ground is to start, but then only allow superuser
> connections.

Then you might as well start and allow all connections.  I don't see
what this buys.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Peter Eisentraut
On 4/11/17 12:08, Tom Lane wrote:
> The other thing that might be useful here is to push on parallelizing
> buildfarm runs.  Admittedly that will do nothing for the oldest and
> slowest buildfarm critters, but for reasonably modern hardware the
> serialization of the tests really handicaps you.

Especially the recovery and subscription tests do a lot of waiting for
one of the nodes to catch up.  You can get around that by running several of
these tests in parallel, which is well supported now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible problem in Custom Scan API

2017-04-11 Thread Tom Lane
Dmitry Ivanov  writes:
> In theory, CustomScans should be able to use any Plan nodes (i.e. 
> 'custom_plans' list), but as far as I can understand, there's no way to 
> override behavior of use_physical_tlist(), which means that we might see 
> something like this:

> ERROR:  variable not found in subplan target list

> if we use child IndexOnlyScan and the index does not include some of the 
> relation's columns.

Uh, why would you see that?  The planner would never generate an
IndexOnlyScan in the first place if the query required any columns
not available from the index.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GCC 7 warnings

2017-04-11 Thread Alvaro Herrera
Tom Lane wrote:
> Peter Eisentraut  writes:

> > d) Replace most of the problematic code with psprintf() and dynamically
> > sized buffers.
> 
> +1 for (c) as you have it.  Later we might think about selectively
> doing (d), but it seems like more work for probably not much benefit.

Yeah -- also it's possible some of these code paths must not attempt to
palloc() for robustness reasons.  I would go for c) only for now, and
only try d) for very specific cases where there are no such concerns.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Merge join for GiST

2017-04-11 Thread Andrew Borodin
2017-04-11 14:17 GMT+05:00 Alexander Korotkov :
> FYI, I've implemented this algorithm for pgsphere.  See following branch.
> https://github.com/akorotkov/pgsphere/tree/experimental
> It's implemented as crossmatch() function which takes as arguments names of
> two indexes over spoint and maximum distance (it checks not overlapping but
> proximity of points).  This function returns setof pairs of TIDs.
>
> Later, Dmitry Ivanov made it a custom scan node.
> https://github.com/akorotkov/pgsphere/tree/crossmatch_cnode
>
> You also can find some experimental evaluation here:
> http://www.adass2016.inaf.it/images/presentations/10_Korotkov.pdf
>
> Feel free to experiment with this code and/or ask any question.

Wow, that's cool! Thanks! That code actually answers a lot of questions.
I'll try to generalize that for the && operator.

Best regards, Andrey Borodin.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioned tables and relfilenode

2017-04-11 Thread Robert Haas
On Tue, Apr 11, 2017 at 9:45 AM, Tom Lane  wrote:
> Amit Langote  writes:
>> 2. DefineQueryRewrite() may try to scan a partitioned table in the case of
>> converting a table to view, where we must make sure that the table being
>> converted is empty.  It's checked by scanning the heap, which we should
>> not do for a partitioned table.  Nor should we try to drop the storage
>> once ready to make the table into a RELKIND_VIEW relation (because all
>> other checks passed okaying the conversion).
>
> It looks like this patch intends to allow converting a partitioned table
> to a view.  I would lobby for refusing the command, instead.  There is
> no good reason to allow it, and it might well be a user error.

Yeah, I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical rep worker for table sync can start and stop very frequently unexpectedly

2017-04-11 Thread Masahiko Sawada
On Wed, Apr 12, 2017 at 2:05 AM, Fujii Masao  wrote:
> Hi,
>
> I observed $subject when I ran the following steps.
>
> 1. start the publisher server
> 2. start the subscriber server
> 3. create table with primary key and publication for it
> 4. insert several records into the table
> 5. create table with primary key and subscription
> 6. wait for table sync to finish
> 7. drop the subscription and recreate the subscription
>
> Then the launcher starts the worker, and the worker starts
> another worker for table sync. But that worker for table sync
> exits immediately because of primary key violation.
> Then the worker tries to start new worker for table sync immediately
> and it also exits immediately because of primary key violation.
> This sequence repeats very fast...
>
> Attached is the script that I used to reproduce the issue.
>

I encountered the same situation[1] and am proposing to add an interval
before relaunching the table sync worker after a failed attempt.

[1] 
https://www.postgresql.org/message-id/CAD21AoDCnyRJDUY%3DESVVe68AukvOP2dFomTeBFpAd1TiFbjsGg%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Quorum commit for multiple synchronous replication.

2017-04-11 Thread Masahiko Sawada
On Thu, Apr 6, 2017 at 4:17 PM, Masahiko Sawada  wrote:
> On Thu, Apr 6, 2017 at 10:51 AM, Noah Misch  wrote:
>> On Thu, Apr 06, 2017 at 12:48:56AM +0900, Fujii Masao wrote:
>>> On Wed, Apr 5, 2017 at 3:45 PM, Noah Misch  wrote:
>>> > On Mon, Dec 19, 2016 at 09:49:58PM +0900, Fujii Masao wrote:
>>> >> Regarding this feature, there are some loose ends. We should work on
>>> >> and complete them until the release.
>>> >>
>>> >> (1)
>>> >> Which synchronous replication method, priority or quorum, should be
>>> >> chosen when neither FIRST nor ANY is specified in s_s_names? Right now,
>>> >> a priority-based sync replication is chosen for keeping backward
>>> >> compatibility. However some hackers argued to change this decision
>>> >> so that a quorum commit is chosen because they think that most users
>>> >> prefer to a quorum.
>>> >>
>>> >> (2)
>>> >> There will be still many source comments and documentations that
>>> >> we need to update, for example, in high-availability.sgml. We need to
>>> >> check and update them throughly.
>>> >>
>>> >> (3)
>>> >> The priority value is assigned to each standby listed in s_s_names
>>> >> even in quorum commit though those priority values are not used at all.
>>> >> Users can see those priority values in pg_stat_replication.
>>> >> Isn't this confusing? If yes, it might be better to always assign 1 as
>>> >> the priority, for example.
>>> >
>>> > [Action required within three days.  This is a generic notification.]
>>> >
>>> > The above-described topic is currently a PostgreSQL 10 open item.  Fujii,
>>> > since you committed the patch believed to have created it, you own this 
>>> > open
>>> > item.  If some other commit is more relevant or if this does not belong 
>>> > as a
>>> > v10 open item, please let us know.  Otherwise, please observe the policy 
>>> > on
>>> > open item ownership[1] and send a status update within three calendar 
>>> > days of
>>> > this message.  Include a date for your subsequent status update.  Testers 
>>> > may
>>> > discover new open items at any time, and I want to plan to get them all 
>>> > fixed
>>> > well in advance of shipping v10.  Consequently, I will appreciate your 
>>> > efforts
>>> > toward speedy resolution.  Thanks.
>>> >
>>> > [1] 
>>> > https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
>>>
>>> Thanks for the notice!
>>>
>>> Regarding the item (2), Sawada-san told me that he will work on it after
>>> this CommitFest finishes. So we would receive the patch for the item from
>>> him next week. If there will be no patch even after the end of next week
>>> (i.e., April 14th), I will. Let's wait for Sawada-san's action at first.
>>
>> Sounds reasonable; I will look for your update on 14Apr or earlier.
>>
>>> The items (1) and (3) are not bugs. So I don't think that they need to be
>>> resolved before the beta release. After the feature freeze, many users
>>> will try and play with many new features including quorum-based syncrep.
>>> Then if many of them complain about (1) and (3), we can change the code
>>> at that timing. So we need more time that users can try the feature.
>>
>> I've moved (1) to a new section for things to revisit during beta.  If 
>> someone
>> feels strongly that the current behavior is Wrong and must change, speak up 
>> as
>> soon as you reach that conclusion.  Absent such arguments, the behavior won't
>> change.
>>
>>> BTW, IMO (3) should be fixed so that pg_stat_replication reports NULL
>>> as the priority if quorum-based sync rep is chosen. It's less confusing.
>>
>> Since you do want (3) to change, please own it like any other open item,
>> including the mandatory status updates.
>
> I agree to report NULL as the priority. I'll send a patch for this as well.
>
> Regards,
>

Attached are two draft patches. The first makes pg_stat_replication report
NULL as the sync priority when quorum-based sync replication is used. To
avoid extra changes, I haven't touched the code that sets the standby
priority for now. The second improves the comments and documentation. If
there is anything more we need to mention in the documentation, please give
me feedback.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


report_null_as_sync_priority.patch
Description: Binary data


quorum_repl_doc_improve.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] my open items vs. my vacation

2017-04-11 Thread Robert Haas
I've managed to clean up all but one of the open items[1] attributable
to my v10 commits; the exception is the issue with pg_dump and
partitioned tables, which I still hope to get resolved this week.

On Friday, I'm leaving on vacation; my first day back in the office
will be April 24th.  While I expect to check email, including
pgsql-hackers, periodically while on vacation, my ability to spend
serious time addressing open items will obviously be reduced.
Therefore, I'd like to request that the RMT accept the following
blanket status update for open items attributable to my commits which
arise after April 13th: "I will begin investigating this no later than
April 24th, my first day back from vacation, and will provide a
further update by that same day."

Most of my close colleagues at EnterpriseDB will be in the office
during this time, and I have asked the people with whom I work closely
to follow up on items related to their own commits and, even, where
possible, other open items with my name on them.  However, as none of
those individuals are committers[2], they will not be able to do more
than propose patches and review those proposed by others.  It would be
great if other committers are able to help out with committing any
unintimidating patches which may be proposed.

I apologize for any disruption this may cause, but I'm hopeful that it
won't be too bad.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] https://wiki.postgresql.org/wiki/PostgreSQL_10_Open_Items
[2] Something which I hope will change at an appropriate point in time.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] index-only count(*) for indexes supporting bitmap scans

2017-04-11 Thread Alexander Kuzmenkov

Hi,

I would like to propose a patch that speeds up queries of the form 'select
count(*) ... where ...', where the restriction clauses can be satisfied by
some indexes. At the moment, such queries use index-only scans followed by
aggregation. Index-only scans are only possible for indexes that are capable
of returning indexed tuples, that is, support the 'amgettuple' access method.
They are not applicable to indexes such as GIN and RUM. However, it is
possible to improve count(*) performance for indexes that support bitmap
scans. Having performed a bitmap index scan or a combination of such, the
bits in the bitmap can be counted to obtain the final result. Of course, the
bitmap pages that are lossy or not visible to all existing transactions will
still require heap access.

One kind of application that can benefit from this change is full-text
search with pagination. To show a search results page, the application has
to know the results that go on the current page, and the total number of
results. Getting one page is fast when the desired sorting order can be
provided by an index. For example, results can be sorted by date with a
separate btree index, or by relevance with a RUM index. However, getting the
total number of results is more difficult. With text search indexes, it
requires a bitmap heap scan, which can be rather slow due to obligatory heap
access. A well-known hack for this is using the approximate data from
'explain' results. The proposed change allows the user to obtain the precise
number of results in an efficient and idiomatic manner.

The performance of this approach was tested on an archive of the
pgsql-hackers mailing list. The detailed results for two sample queries can
be found in the attached file 'benchmark.txt'. The first test demonstrates
full-text search with a RUM index, ordering the results by rank. For a query
with low selectivity, getting the top results is much faster than counting
them all with a bitmap heap scan. With the bitmap count execution plan, the
results can be counted much faster. A similar test is done with a GIN index,
where the results are restricted and ordered by date using another btree
index. Again, it shows a significant speedup of the count(*) query for the
bitmap count scan compared to the bitmap heap scan. These results
demonstrate that the bitmap count plan can indeed be useful for full-text
search scenarios.

Structurally, the patch consists of two major parts: a specialized executor
node and the generation of corresponding paths and plans. The optimizer
behavior here is similar to that of the min/max aggregate optimization. The
main entry point is preprocess_count_aggs(); it is called by
grouping_planner(). It searches for the count(*) expression in the target
list, and, if possible, generates a special path for it (struct
BitmapCountPath). This path is then added to the UPPERREL_GROUP_AGG
upperrel, and competes with other paths at the further stages of planning.

The executor node (nodeBitmapCount.c) is similar to the bitmap heap scan
node, with the main difference being that it does not access the heap for
tuples that are known to satisfy the restriction and to be visible to all
transactions.
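
The counting loop could look roughly like the sketch below. This is not the
patch's actual code (the heap-fallback helper in particular is hypothetical);
it only illustrates the point that exact, non-recheck bitmap pages whose heap
page is all-visible can be counted without touching the heap:

static uint64
count_bitmap_result(Relation heapRel, TIDBitmap *tbm)
{
    TBMIterator *it = tbm_begin_iterate(tbm);
    TBMIterateResult *page;
    Buffer      vmbuffer = InvalidBuffer;
    uint64      ntuples = 0;

    while ((page = tbm_iterate(it)) != NULL)
    {
        if (page->ntuples >= 0 && !page->recheck &&
            VM_ALL_VISIBLE(heapRel, page->blockno, &vmbuffer))
            ntuples += page->ntuples;   /* exact page, no heap access needed */
        else
            ntuples += count_page_from_heap(heapRel, page); /* hypothetical */
    }
    tbm_end_iterate(it);

    if (BufferIsValid(vmbuffer))
        ReleaseBuffer(vmbuffer);

    return ntuples;
}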

This patch has some important limitations:
* It only supports a targetlist consisting of a single expression that can
be projected from count(*).
* count(expr) is not supported. We could support it for cases where the
"expr is not null" restriction can be satisfied with an index.
* The current implementation does not support parallel execution. It could
be implemented during the PostgreSQL 11 release cycle.
* For some indexes, the bitmap index scan will always require rechecking all
the tuples. Bitmap count plans should not be used in such cases. For now,
this check is not implemented.

I would be glad to hear your comments on this patch.

Regards,
Alexander Kuzmenkov

===
= The data

test1=# \d pglist
   Table "public.pglist"
   Column   |Type | Collation | Nullable | Default 
+-+---+--+-
 id | integer |   |  | 
 sent   | timestamp without time zone |   |  | 
 subject| text|   |  | 
 author | text|   |  | 
 body_plain | text|   |  | 
 fts| tsvector|   |  | 
Indexes:
"idx_pglist_rum_fts" rum (fts)
"idx_pglist_fts" gin (fts)
"idx_pglist_sent" btree (sent)


test1=# select min(sent), max(sent), count(*) from pglist;
         min         |         max         |  count  
---------------------+---------------------+---------
 1997-06-24 11:31:09 | 2016-04-27 23:46:29 | 1013770
(1 row)



Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)

2017-04-11 Thread Pavan Deolasee
On Tue, Apr 11, 2017 at 7:10 PM, Robert Haas  wrote:

>
>
> Yes, and as Andres says, you don't help with those, and then you're
> upset when your own patch doesn't get attention.


I am not upset; I was obviously a bit disappointed, which I think is a very
natural emotion after spending weeks on it. I am not blaming any one
individual (excluding myself) for that and neither the community at large
for the outcome. And I've moved on. I know everyone is busy getting the
release ready and I see no point discussing this endlessly. We have enough
on our plates for next few weeks.


>
>   Amit and others who have started to
> dig into this patch a little bit found real problems pretty quickly
> when they started digging.


And I fixed them as quickly as humanly possible.


>   Those problems should be addressed, and
> review should continue (from whatever source) until no more problems
> can be found.


Absolutely.


>  The patch may
> or may not have any data-corrupting bugs, but regressions have been
> found and not addressed.


I don't know why you say that regressions are not addressed. Here are a few
things I did to address the regressions/reviews/concerns, apart from fixing
all the bugs discovered, but please let me know if there are things I've
not addressed.

1. Improved the interesting attrs patch that Alvaro wrote to address the
regression discovered in fetching more heap attributes. The patch that got
committed in fact improved certain synthetic workloads over the then-current master.
2. Based on Petr and your feedback, disabled WARM on toasted attributes to
reduce overhead of fetching/decompressing the attributes.
3. Added code to avoid doing second index scan when the index does not
contain any WARM pointers. This should address the situation Amit brought
up where only one of the indexes receive WARM inserts.
4. Added code to kill wrong index pointers to do online cleanup.
5. Added code to set a CLEAR pointer to a WARM pointer when we know that
the entire chain is WARM. This should address the workload Dilip ran and
found a regression (I don't think he got a chance to confirm that)
6. Enhanced stats collector to collect information about candidate WARM
chains and added mechanism to control WARM cleanup at the heap as well as
index level, based on configurable parameters. This gives user better
control over the additional work that is required for WARM cleanup.
7. Added table level option to disable WARM if nothing else works.
8. Added a mechanism to disable WARM when more than 50% of the indexes are
being updated. I ran some benchmarks with different percentages of indexes getting
updated and thought this is a good threshold.

I may have missed something, but there is no intention to ignore known
regressions/reviews. Of course, I don't think that every regression will be
solvable, like if you run a CPU-bound workload, setting it up in a way such
that you repeatedly exercise the area where WARM is doing additional work,
without providing any benefit, maybe you can still find a regression. I am
willing to fix them as long as they are fixable and we are comfortable with
the additional code complexity. IMHO certain trade-offs are good, but I
understand that not everybody will agree with my views and that's ok.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] FDW and parallel execution

2017-04-11 Thread PostgreSQL - Hans-Jürgen Schönig
hello ...

Did you check out Antonin Houska's patches?
We basically got code which can do that.

many thanks,

hans


On 04/02/2017 03:30 PM, Konstantin Knizhnik wrote:
> Hi hackers and personally Robet (you are the best expert in both areas).
> I want to ask one more question concerning parallel execution and FDW.
> Below are two plans for the same query (TPC-H Q5): one for normal
> tables, another for FDW to vertical representation of the same data.
> FDW supports analyze function and is expected to produce the similar
> statistic as for original tables.
>
> Query plans are the following:
>
> Normal tables:
>
>  Sort  (cost=2041588.48..2041588.98 rows=200 width=48)
>Sort Key: (sum((lineitem.l_extendedprice * ('1'::double precision -
> lineitem.l_discount DESC
>->  Finalize GroupAggregate  (cost=2041335.76..2041580.83 rows=200
> width=48)
>  Group Key: nation.n_name
>  ->  Gather Merge  (cost=2041335.76..2041568.33 rows=1400
> width=48)
>Workers Planned: 7
>->  Partial GroupAggregate 
> (cost=2040335.64..2040396.71 rows=200 width=48)
>  Group Key: nation.n_name
>  ->  Sort  (cost=2040335.64..2040345.49 rows=3938
> width=40)
>Sort Key: nation.n_name
>->  Hash Join  (cost=605052.97..2040100.48
> rows=3938 width=40)
>  Hash Cond: ((orders.o_custkey =
> customer.c_custkey) AND (nation.n_nationkey = customer.c_nationkey))
>  ->  Hash Join 
> (cost=525126.37..1951647.85 rows=98414 width=52)
>Hash Cond: (lineitem.l_orderkey
> = orders.o_orderkey)
>->  Hash Join 
> (cost=3802.22..1404473.37 rows=654240 width=52)
>  Hash Cond:
> (lineitem.l_suppkey = supplier.s_suppkey)
>  ->  Parallel Seq Scan on
> lineitem  (cost=0.00..1361993.36 rows=8569436 width=16)
>  ->  Hash 
> (cost=3705.97..3705.97 rows=7700 width=44)
>->  Hash Join 
> (cost=40.97..3705.97 rows=7700 width=44)
>  Hash Cond:
> (supplier.s_nationkey = nation.n_nationkey)
>  ->  Seq Scan
> on supplier  (cost=0.00..3090.00 rows=10 width=8)
>  ->  Hash 
> (cost=40.79..40.79 rows=15 width=36)
>-> 
> Hash Join  (cost=20.05..40.79 rows=15 width=36)
> 
> Hash Cond: (nation.n_regionkey = region.r_regionkey)
> 
> ->  Seq Scan on nation  (cost=0.00..17.70 rows=770 width=40)
> 
> ->  Hash  (cost=20.00..20.00 rows=4 width=4)
>   
> ->  Seq Scan on region  (cost=0.00..20.00 rows=4 width=4)
>   
>   
> Filter: ((r_name)::text = 'ASIA'::text)
>->  Hash 
> (cost=484302.37..484302.37 rows=2256542 width=8)
>  ->  Seq Scan on orders 
> (cost=0.00..484302.37 rows=2256542 width=8)
>Filter:
> ((o_orderdate >= '1996-01-01'::date) AND (o_orderdate <
> '1997-01-01'::date))
>  ->  Hash  (cost=51569.64..51569.64
> rows=1499864 width=8)
>->  Seq Scan on customer 
> (cost=0.00..51569.64 rows=1499864 width=8)
>
>
> Plan with FDW:
>
>  Sort  (cost=2337312.28..2337312.78 rows=200 width=48)
>Sort Key: (sum((lineitem_fdw.l_extendedprice * ('1'::double
> precision - lineitem_fdw.l_discount DESC
>->  GroupAggregate  (cost=2336881.54..2337304.64 rows=200 width=48)
>  Group Key: nation.n_name
>  ->  Sort  (cost=2336881.54..2336951.73 rows=28073 width=40)
>Sort Key: nation.n_name
>->  Hash Join  (cost=396050.65..2334807.39 rows=28073
> width=40)
>  Hash Cond: ((orders_fdw.o_custkey =
> customer_fdw.c_custkey) AND (nation.n_nationkey =
> customer_fdw.c_nationkey))
>  ->  Hash Join  (cost=335084.53..2247223.46
> rows=701672 width=52)
>Hash Cond: (lineitem_fdw.l_orderkey =
> orders_fdw.o_orderkey)
>->  Hash Join  (cost=2887.07..1786058.18
> rows=4607421 width=52)
>  Hash Cond: (lineitem_fdw.l_suppkey =
> supplier_fdw.s_suppkey)
> 

Re: [HACKERS] RENAME RULE doesn't work with partitioned tables

2017-04-11 Thread Robert Haas
On Tue, Apr 11, 2017 at 5:54 AM, Amit Langote
 wrote:
> Just noticed that RangeVarCallbackForRenameRule() was not updated to
> handle partitioned tables, causing the following bug:
>
> create table parted_table (a int) partition by list (a);
> create table part partition of parted_table for values in (1);
> create rule parted_table_insert as on insert to parted_table
>   do instead insert into part values (new.*);
> alter rule parted_table_insert on parted_table
>   rename to parted_table_insert_redirect;
> -- ERROR:  "parted_table" is not a table or view
>
> Attached fixes this and adds a test.  Added to open items list.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FDW and parallel execution

2017-04-11 Thread Konstantin Knizhnik



On 04.04.2017 13:29, Kyotaro HORIGUCHI wrote:

Hi,

At Sun, 02 Apr 2017 16:30:24 +0300, Konstantin Knizhnik  
wrote in <58e0fcf0.2070...@postgrespro.ru>

Hi hackers and personally Robet (you are the best expert in both
areas).
I want to ask one more question concerning parallel execution and FDW.
Below are two plans for the same query (TPC-H Q5): one for normal
tables, another for FDW to vertical representation of the same data.
FDW supports analyze function and is expected to produce the similar
statistic as for original tables.



The plans look very similar, but first one is parallel and second -
not.
My FDW provides implementation for IsForeignScanParallelSafe which
returns true.
I wonder what can prevent optimizer from using parallel plan in this
case?

Parallel execution requires partial paths. It's the work for
GetForeignPaths of your FDW.


Thank you very much for explanation.
But unfortunately I still do not completely understand what kind of 
queries allow parallel execution with FDW.


Section "FDW Routines for Parallel Execution" of FDW specification says:
A ForeignScan node can, optionally, support parallel execution. A 
parallel ForeignScan will be executed in multiple processes and should 
return each row only once across all cooperating processes. To do 
this, processes can coordinate through fixed size chunks of dynamic 
shared memory. This shared memory is not guaranteed to be mapped at 
the same address in every process, so pointers may not be used. The 
following callbacks are all optional in general, but required if 
parallel execution is to be supported.


I provided IsForeignScanParallelSafe, EstimateDSMForeignScan, 
InitializeDSMForeignScan, and InitializeWorkerForeignScan in my FDW.

IsForeignScanParallelSafe returns true.
Also, in my GetForeignPaths function I created a path with 
baserel->consider_parallel == true.

Is that enough, or should I do something else?
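
For illustration, here is a minimal sketch of what generating a partial path
in GetForeignPaths could look like. This is not code from any shipping FDW;
the function name and the cost/row numbers are made up, and the usual
optimizer headers are assumed. The partial path is what lets the planner
consider putting a Gather on top, though it still has to win on cost:

static void
myGetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid)
{
    ForeignPath *path;

    /* ordinary, non-partial path, as before */
    path = create_foreignscan_path(root, baserel, NULL /* default target */,
                                   baserel->rows, 10.0, 1000.0,
                                   NIL, NULL, NULL, NIL);
    add_path(baserel, (Path *) path);

    /* partial path: only if the rel is considered parallel-safe */
    if (baserel->consider_parallel)
    {
        path = create_foreignscan_path(root, baserel, NULL,
                                       baserel->rows / 2, 10.0, 500.0,
                                       NIL, NULL, NULL, NIL);
        path->path.parallel_aware = true;   /* we implement the DSM callbacks */
        path->path.parallel_safe = true;
        path->path.parallel_workers = 2;
        add_partial_path(baserel, (Path *) path);
    }
}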

But unfortunately I failed to find any query (sequential scan, grand
aggregation, aggregation with group by, joins...) for which a parallel
execution plan is used with this FDW.
Also there are no examples of using these functions in the Postgres
distribution, and I failed to find any such examples on the Internet.


Can somebody please clarify my situation with parallel execution and FDW 
and maybe point at some examples?

Thanks in advance.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-11 Thread Robert Haas
On Tue, Apr 11, 2017 at 12:15 PM, Robert Haas  wrote:
> On Mon, Apr 10, 2017 at 7:17 PM, Tomas Vondra
>  wrote:
>> At first I was like 'WTF? Why do we need a new GUC just becase of an
>> assert?' but you're actually not adding a new GUC parameter, you're adding a
>> constant which is then used as a max value for max for the two existing
>> parallel GUCs.
>>
>> I think this is fine.
>
> I think it is pretty odd-looking.  As written, it computes an unsigned
> -- and therefore necessarily non-negative -- value into a signed --
> and thus possibly neative -- value only to pass it back to abs() to
> make sure it's not negative:
>
> +   Assert(!parallel ||
> abs((int)(BackgroundWorkerData->parallel_register_count -
> +
>   BackgroundWorkerData->parallel_terminate_count)) <=
> +   MAX_PARALLEL_WORKER_LIMIT);
>
> I think we can just say
>
> Assert(!parallel || BackgroundWorkerData->parallel_register_count -
> BackgroundWorkerData->parallel_terminate_count <= MAX_PARALLEL_WORKER_LIMIT);

Actually, there's an even simpler way: stick it inside the if () block
where we return false if we're outside the limit.  Then we don't need
to test the "parallel" bool either, because it's already known to be
true.  Committed that way.
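
The shape described above is roughly the following (a sketch only, not
necessarily the exact committed hunk; locking is omitted):

if (parallel &&
    (BackgroundWorkerData->parallel_register_count -
     BackgroundWorkerData->parallel_terminate_count) >=
    max_parallel_workers)
{
    /*
     * The difference can legitimately exceed max_parallel_workers here,
     * but it can only exceed MAX_PARALLEL_WORKER_LIMIT if the counters
     * have gotten out of sync, which is what the Assert is meant to catch.
     */
    Assert(BackgroundWorkerData->parallel_register_count -
           BackgroundWorkerData->parallel_terminate_count <=
           MAX_PARALLEL_WORKER_LIMIT);
    return false;
}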

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] logical rep worker for table sync can start and stop very frequently unexpectedly

2017-04-11 Thread Fujii Masao
Hi,

I observed $subject when I ran the following steps.

1. start the publisher server
2. start the subscriber server
3. create table with primary key and publication for it
4. insert several records into the table
5. create table with primary key and subscription
6. wait for table sync to finish
7. drop the subscription and recreate the subscription

Then the launcher starts the worker, and the worker starts
another worker for table sync. But that worker for table sync
exits immediately because of primary key violation.
Then the worker tries to start new worker for table sync immediately
and it also exits immediately because of primary key violation.
This sequence repeats very fast...

Attached is the script that I used to reproduce the issue.

Regards,

-- 
Fujii Masao


test.sh
Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reversed sync check in pg_receivewal

2017-04-11 Thread Tom Lane
Magnus Hagander  writes:
> Something like the attached?

Not sure about

+ * All methods that have a failure path will set errno on failure.

Given that you've got a getlasterror method, I don't think that's really
the API invariant is it?  If it were, you'd just have the callers doing
strerror(errno) for themselves.  I think maybe a more accurate statement
would be

 * All methods that have a failure return indicator will set state
 * allowing the getlasterror() method to return a suitable message.
 * Commonly, errno is this state (or part of it); so callers must take
 * care not to clobber errno between a failed method call and use of
 * getlasterror() to retrieve the message.

Also:

* the arguments of open_for_write() could stand to be explained

* what's the return value of finish()?

* wouldn't it be better to declare getlasterror() as returning
  "const char *"?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Possible problem in Custom Scan API

2017-04-11 Thread Dmitry Ivanov

Hi hackers,

I'm struggling to understand one particular thing about Custom Scan API.

As you may know, there's a function called use_physical_tlist(), which aims 
to eliminate meaningless projections during query execution. Any scan node 
(e.g. CustomScan) aims to take advantage of physical targetlists... except 
for the IndexOnlyScan (for obvious reasons):


createplan.c, create_scan_plan():

if (use_physical_tlist(root, best_path, flags))
{
if (best_path->pathtype == T_IndexOnlyScan)
{
/* For index-only scan, the preferred tlist is the index's */
tlist = copyObject(((IndexPath *) 
best_path)->indexinfo->indextlist);
...
}
...
}

In theory, CustomScans should be able to use any Plan nodes (i.e. 
'custom_plans' list), but as far as I can understand, there's no way to 
override behavior of use_physical_tlist(), which means that we might see 
something like this:


ERROR:  variable not found in subplan target list

if we use child IndexOnlyScan and the index does not include some of the 
relation's columns.


Is there any existing workaround?


--
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] ERROR: badly formatted node string "RESTRICTINFO...

2017-04-11 Thread Tom Lane
Ashutosh Bapat  writes:
> On Tue, Apr 11, 2017 at 8:11 PM, Tom Lane  wrote:
>> OK.  I was trying to be noninvasive, but this does look more sensible.
>> I think this might read better if we inverted the test (so that
>> the outer-join-joinclause-must-be-pushable case would come first).

> Ok. In fact, thinking more about it, we should probably test
> JOIN_INNER explicitly instead of !IS_OUTER_JOIN() since SEMI_JOINS are
> not considered as outer joins and I am not sure how would we be
> treating the quals for those.

No, that's correct as-is --- or at least, if it's not correct, there
are a bunch of other places that are also not correct.

Thinking about this further, though, it seems like a more straightforward
solution to the original problem is to not store quals in a Plan's
fdw_private list in the first place.  Putting them there is at best
useless overhead that the finished plan will have to drag around;
and it's not terribly hard to imagine future changes that would make
having never-processed-by-setrefs.c qual trees in a plan be broken in
other ways.  Can't we use a field in the rel's PgFdwRelationInfo to
carry the remote_exprs forward from postgresGetForeignPlan to
postgresPlanDirectModify?
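
A rough sketch of that idea, with purely hypothetical names rather than the
actual postgres_fdw structures:

/* Per-relation planner state kept in rel->fdw_private (hypothetical). */
typedef struct MyFdwRelationInfo
{
    List   *final_remote_exprs; /* bare expressions saved by GetForeignPlan */
} MyFdwRelationInfo;

/* In GetForeignPlan: remember what was chosen for remote execution. */
static void
save_remote_exprs(RelOptInfo *foreignrel, List *remote_exprs)
{
    MyFdwRelationInfo *fpinfo = (MyFdwRelationInfo *) foreignrel->fdw_private;

    fpinfo->final_remote_exprs = remote_exprs;
}

/* In PlanDirectModify: fetch them again, without touching the Plan tree. */
static List *
fetch_remote_exprs(RelOptInfo *foreignrel)
{
    MyFdwRelationInfo *fpinfo = (MyFdwRelationInfo *) foreignrel->fdw_private;

    return fpinfo->final_remote_exprs;
}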

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SUBSCRIPTIONS and pg_upgrade

2017-04-11 Thread Robert Haas
On Tue, Apr 11, 2017 at 10:13 AM, Stephen Frost  wrote:
>> I totally agree, which is why I was rather surprised when you
>> vigorously objected to my attempts to replace the remainder of the
>> main tree's superuser checks that completely block execution of
>> certain SQL functions with privilege grants.  The parameters within
>> which you find explicit superuser checks tolerable are extremely murky
>> to me.
>
> Filesystem-level access seems reasonable to restrict to superusers.

I think in an ideal world we wouldn't have any hard-coded superuser()
checks at all, because all privileges would be able to be delegated in
a fine-grained manner.  But even if I were to agree with you on this
point, you've argued for a fairly inconsistent application of that
principle.  However, I do agree with your remarks later in the email
that it's best not to derail this thread to discuss that issue in
depth, so I'll shut up about this for now.

> I specifically made a point to not question what was or wasn't
> considered sensitive as it can certainly vary.  The point I was making
> is that if there's sensitive information there then we can exclude just
> that information but allow a pg_dump-using user to see that a
> subscription exists without it.

OK, apparently I misread your remarks.  I thought you were arguing
that the data wasn't sensitive, which seemed like an odd take.

I don't think it's the purpose of pg_dump to reveal what objects
exist, but rather to create a dump file that can be used to recreate
the state of the database.  To the extent that the user lacks
permissions necessary to dump the objects, they can't be dumped.
Maybe modifying subscriptions so that they can exist without a
connection and "half-dumping" them is better than not dumping them at
all, but my initial impression is to think that it's worse.  A user
may be more likely to notice something that's missing altogether than
something that exists but in a non-working state.  You mentioned
--no-role-passwords, but that's a nasty kludge in my book and I'm not
in favor of making something like that the default behavior.  Peter's
approach looks like less work, and helps us get beta out the door.

I can't claim to be an expert on all of this, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SUBSCRIPTIONS and pg_upgrade

2017-04-11 Thread Robert Haas
On Tue, Apr 11, 2017 at 9:48 AM, Robert Haas  wrote:
> On Mon, Apr 10, 2017 at 1:58 PM, Peter Eisentraut
>  wrote:
>> OK, we need to come to a conclusion here.  To summarize:
>>
>> Problem 1: pg_subscription.subconninfo can only be read by superuser.
>> So non-superusers cannot dump subscriptions.
>>
>> Precedent is pg_user_mapping.  In that case, we just omit the
>> user-mapping options if we're not a superuser.  Pretty dubious, but in
>> any case that won't work here, because you cannot create a subscription
>> without a CONNECTION clause.
>>
>> Proposal: Dump subscriptions if running as superuser.  If not, check if
>> there are subscriptions and warn about that.  Remove current pg_dump
>> --include-subscriptions option.
>
> +1. I don't totally love it, but I don't have a better idea.
>
>> Problem 2: Restoring a subscription immediately starts replication.
>> Maybe you want that or maybe you don't.  We could dump all subscriptions
>> in DISABLED state.  But then after restoring you have to go and manually
>> enable all subscriptions.  We don't have a command to turn all
>> subscriptions back on at once.  Maybe that is not a terrible issue,
>> since one wouldn't normally have many subscriptions.
>>
>> Proposal: Dump all subscriptions in DISABLED state.  Remove current
>> pg_dump --no-subscription-connect option.
>
> +1. I like this a lot.

Oops, forgot to copy the list.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] ERROR: badly formatted node string "RESTRICTINFO...

2017-04-11 Thread Ashutosh Bapat
On Tue, Apr 11, 2017 at 8:11 PM, Tom Lane  wrote:
> Ashutosh Bapat  writes:
>> On Tue, Apr 11, 2017 at 3:53 AM, Tom Lane  wrote:
>> +* there is no need for EPQ recheck at a join (and Vars or Aggrefs in
>> +* the qual might not be available locally anyway).
>
>> I am not sure whether EPQ checks make sense for an upper relation, esp. a
>> grouped relation. So mentioning Aggref can be confusing here. For joins, we
>> execute EPQ by executing the (so called) outer plan created from 
>> fdw_outerpath.
>> For this, we fetch whole-row references for the joining relations and build 
>> the
>> output row by executing the local outer plan attached to the ForeignScanPlan.
>> This whole-row references has values for all Vars, so even though Vars are 
>> not
>> available, corresponding column values are available.  So mentioning Vars is
>> also confusing here.
>
> Well, my first attempt at fixing this merged remote_conds and remote_exprs
> together, which in the previous version of the code resulted in always
> passing the remote conditions as fdw_recheck_quals too.  And what happened
> was that I got "variable not found in subplan target list" errors for Vars
> used inside Aggrefs in pushed-to-the-remote HAVING clauses.  Which is
> unsurprising -- it'd be impossible to return such a Var if the grouping is
> being done remotely.  So I think it's important for this comment to
> explain that we *can't* put upperrel quals into fdw_recheck_quals, not just
> that "there's no need to".

The comments in the committed versions are good.

> But pointing out that at a join, there's a
> different mechanism that's responsible for EPQ checks is good.  I'll
> reword this again.
>
>> We can club the code to separate other and join clauses, checking that all 
>> join
>> clauses are shippable and separating other clauses into local and remote
>> clauses in a single list traversal as done in the attached patch.
>
> OK.  I was trying to be noninvasive, but this does look more sensible.
> I think this might read better if we inverted the test (so that
> the outer-join-joinclause-must-be-pushable case would come first).

Ok. In fact, thinking more about it, we should probably test
JOIN_INNER explicitly instead of !IS_OUTER_JOIN() since SEMI_JOINS are
not considered as outer joins and I am not sure how would we be
treating the quals for those.

>
> If we're going for a more-than-minimal patch, I'm inclined to also
> move the first loop in postgresGetForeignPlan into the "else" branch,
> so that it doesn't get executed in the join/upperrel case.  I realize
> that it's going to iterate zero times in that case, but it's just
> confusing to have it there when we don't expect it to do anything
> and we would only throw away the results if it did do anything.
>

Committed version looks quite clear.

I noticed that you committed the patch, while I was writing this mail.
Sending it anyway.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-11 Thread Robert Haas
On Mon, Apr 10, 2017 at 7:17 PM, Tomas Vondra
 wrote:
> At first I was like 'WTF? Why do we need a new GUC just becase of an
> assert?' but you're actually not adding a new GUC parameter, you're adding a
> constant which is then used as a max value for max for the two existing
> parallel GUCs.
>
> I think this is fine.

I think it is pretty odd-looking.  As written, it computes an unsigned
-- and therefore necessarily non-negative -- value into a signed --
and thus possibly neative -- value only to pass it back to abs() to
make sure it's not negative:

+   Assert(!parallel ||
abs((int)(BackgroundWorkerData->parallel_register_count -
+
  BackgroundWorkerData->parallel_terminate_count)) <=
+   MAX_PARALLEL_WORKER_LIMIT);

I think we can just say

Assert(!parallel || BackgroundWorkerData->parallel_register_count -
BackgroundWorkerData->parallel_terminate_count <= MAX_PARALLEL_WORKER_LIMIT);

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] ERROR: badly formatted node string "RESTRICTINFO...

2017-04-11 Thread Tom Lane
Andreas Seltenreich  writes:
> I see the above ERROR logged a lot when testing master at eef8c0069e
> with a postgres_fdw around.  Below is a recipe to reproduce it on top of
> the regression DB.

I've pushed a fix that should get you past that problem, although
I suspect we still have some issues with treatment of parallel-unsafe
subplans.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Problem in Parallel Bitmap Heap Scan?

2017-04-11 Thread Robert Haas
On Sun, Apr 9, 2017 at 11:17 PM, Noah Misch  wrote:
> The above-described topic is currently a PostgreSQL 10 open item.

I have committed the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Tom Lane
Robert Haas  writes:
> On Tue, Apr 11, 2017 at 11:32 AM, Andrew Dunstan
>  wrote:
>> This buildfarm run as you can see takes 33m32s, and the Tap tests take a
>> combined 19m52s of that time.

> I don't think it's quite fair to complain about the TAP tests taking a
> long time to run as a general matter.  Many people here have long
> wanted a separate test-suite for long running tests that can be run to
> really check everything possible, and apparently, TAP tests are it.

> What I think would be more useful is to break down where the time is
> getting spent.  It may be that some of those tests are not adding
> value proportionate to their runtime.

The other thing that might be useful here is to push on parallelizing
buildfarm runs.  Admittedly that will do nothing for the oldest and
slowest buildfarm critters, but for reasonably modern hardware the
serialization of the tests really handicaps you.  We seem to have
fixed that for manual application of "make check-world", at least
if you know the right magic incantations to parallelize it; but
AFAIK the buildfarm script is entirely serial.

BTW, speaking of "value proportionate to runtime", the hash_index
regression script is now visibly a bottleneck in the core regression
tests.  Could we take another look at whether that needs to run
quite so long?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP tests take a long time

2017-04-11 Thread Robert Haas
On Tue, Apr 11, 2017 at 11:32 AM, Andrew Dunstan
 wrote:
> I've complained about this before. Below are some timings from buildfarm
> member nightjar as I test out the new client code.
>
> This buildfarm run as you can see takes 33m32s, and the Tap tests take a
> combined 19m52s of that time. That seems quite an inordinate amount of
> time, when checking out the code, configuring, building and running all
> the other checks in two locales in many cases consumes 13m40s. If people
> want to incorporate these into the check-world targets and then
> encourage developers to run check-world, there will be a massive fail if
> you can take a shower and make and drink a cup of coffee in the time it
> takes to run it.

I don't think it's quite fair to complain about the TAP tests taking a
long time to run as a general matter.  Many people here have long
wanted a separate test-suite for long running tests that can be run to
really check everything possible, and apparently, TAP tests are it.

What I think would be more useful is to break down where the time is
getting spent.  It may be that some of those tests are not adding
value proportionate to their runtime.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] error handling in RegisterBackgroundWorker

2017-04-11 Thread David G. Johnston
On Tue, Apr 11, 2017 at 8:33 AM, Tom Lane  wrote:

> Peter Eisentraut  writes:
> > On 4/10/17 23:22, Tom Lane wrote:
> >> Personally I'd err on the side of "starting up degraded is better than
> >> not starting at all".  Or maybe we should invent a GUC to let DBAs
> >> express their preference on that?
>
> > If we defaulted allow_degraded to yes, then users wouldn't find that
> > setting until they do start up degraded and want to fix things, in which
> > case they could just fix the settings that caused the degraded startup
> > in the first place.
>
> > If we defaulted to no, then I don't think any user would go in and
> > change it.  "Sure, I'll allow degraded startup.  That sounds useful."
>
> Well, they would change it when their server failed to start and they
> needed to start it rather than just rebuild from backups.  I'd be fine
> with defaulting it off.  I just don't want "can't make a loopback socket"
> to be equivalent to "you're screwed and you'll never see your data again".
>

​A potential middle-ground is to start, but then only allow superuser
connections.  At least then, if the configuration problem is sitting in
postgresql.auto.conf, the superuser can issue ALTER SYSTEM to fix it; and
can be reassured that worst case they can at least see their data.

If that was a soft setting they could also run a function to enable normal
access if they so choose.  In effect it's a "default to off" mode with an
easy way to force the system to become live - but without a GUC (so they
couldn't make the decision permanent...which seems like a feature)

David J.


Re: [HACKERS] Reversed sync check in pg_receivewal

2017-04-11 Thread Magnus Hagander
On Tue, Apr 11, 2017 at 3:53 PM, Tom Lane  wrote:

> Magnus Hagander  writes:
> > On Tue, Apr 11, 2017 at 3:19 PM, Tom Lane  wrote:
> >> I think the patch is correct, but if there's any documentation of the
> >> walmethod APIs that would allow one to assert which side of the API got
> >> this wrong, I sure don't see it.  Would it be unreasonable to insist
> >> on some documentation around that?
>
> > Would you say comments in the struct in walmethods.h is enough, or were
> you
> > thinking actual sgml docs when you commented that?
>
> This is just internal to pg_basebackup isn't it?  I think comments in
> walmethods.h would be plenty.
>

Local to pg_basebackup and pg_receivewal, yes.

Something like the attached?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 
diff --git a/src/bin/pg_basebackup/walmethods.h b/src/bin/pg_basebackup/walmethods.h
index 8d679da..a3316a6 100644
--- a/src/bin/pg_basebackup/walmethods.h
+++ b/src/bin/pg_basebackup/walmethods.h
@@ -19,18 +19,50 @@ typedef enum
 	CLOSE_NO_RENAME
 }	WalCloseMethod;
 
+/*
+ * A WalWriteMethod structure represents the different methods used
+ * to write the streaming WAL as it's received.
+ *
+ * All methods that have a failure path will set errno on failure.
+ */
 typedef struct WalWriteMethod WalWriteMethod;
 struct WalWriteMethod
 {
+	/* Open a target file. Returns Walfile, or NULL if open failed. */
 	Walfile(*open_for_write) (const char *pathname, const char *temp_suffix, size_t pad_to_size);
+
+	/*
+	 * Close an open Walfile, using one or more methods for handling
+	 * automatic unlinking etc. Returns 0 on success, other values
+	 * for error.
+	 */
 	int			(*close) (Walfile f, WalCloseMethod method);
+
+	/* Check if a file exists */
 	bool		(*existsfile) (const char *pathname);
+
+	/* Return the size of a file, or -1 on failure. */
 	ssize_t		(*get_file_size) (const char *pathname);
 
+	/*
+	 * Write count number of bytes to the file, and return the number
+	 * of bytes actually written or -1 for error.
+	 */
 	ssize_t		(*write) (Walfile f, const void *buf, size_t count);
+
+	/* Return the current position in a file or -1 on error */
 	off_t		(*get_current_pos) (Walfile f);
+
+	/*
+	 * fsync the contents of the specified file. Returns 0 on
+	 * success.
+	 */
 	int			(*sync) (Walfile f);
+
+	/* Clean up the Walmethod, closing any shared resources */
 	bool		(*finish) (void);
+
+	/* Return a text for the last error in this Walfile */
 	char	   *(*getlasterror) (void);
 };
 

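As an illustration of the error convention described in these comments, a
caller might look roughly like the following (hypothetical helper, not actual
pg_basebackup code); the point is to fetch the message before anything can
clobber errno:

static bool
write_wal_chunk(WalWriteMethod *wwmethod, Walfile f,
                const void *buf, size_t len)
{
    if (wwmethod->write(f, buf, len) != (ssize_t) len)
    {
        /* getlasterror() typically reads errno, so call it right away */
        fprintf(stderr, "could not write WAL data: %s\n",
                wwmethod->getlasterror());
        return false;
    }
    return true;
}
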
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Merge join for GiST

2017-04-11 Thread Alexander Korotkov
On Tue, Apr 11, 2017 at 5:46 PM, Jeff Davis  wrote:

> On Tue, Apr 11, 2017 at 2:17 AM, Alexander Korotkov
>  wrote:
> > FYI, I've implemented this algorithm for pgsphere.  See following branch.
> > https://github.com/akorotkov/pgsphere/tree/experimental
> > It's implemented as crossmatch() function which takes as arguments names
> of
> > two indexes over spoint and maximum distance (it checks not overlapping
> but
> > proximity of points).  This function returns setof pairs of TIDs.
> >
> > Later, Dmitry Ivanov made it a custom scan node.
> > https://github.com/akorotkov/pgsphere/tree/crossmatch_cnode
> >
> > You also can find some experimental evaluation here:
> > http://www.adass2016.inaf.it/images/presentations/10_Korotkov.pdf
>
> Do you have a sense of how this might compare with range merge join?
>

If you have GiST indexes over ranges for both sides of the join, then this
method could be used for a range join.  Hence, it could be compared with
range merge join.
However, the particular implementation in pgsphere uses hardcoded datatypes
and operations.
Thus, for range join we need either a generalized version of the GiST-based
join or a special implementation for ranges.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] error handling in RegisterBackgroundWorker

2017-04-11 Thread Tom Lane
Peter Eisentraut  writes:
> On 4/10/17 23:22, Tom Lane wrote:
>> Personally I'd err on the side of "starting up degraded is better than
>> not starting at all".  Or maybe we should invent a GUC to let DBAs
>> express their preference on that?

> If we defaulted allow_degraded to yes, then users wouldn't find that
> setting until they do start up degraded and want to fix things, in which
> case they could just fix the settings that caused the degraded startup
> in the first place.

> If we defaulted to no, then I don't think any user would go in and
> change it.  "Sure, I'll allow degraded startup.  That sounds useful."

Well, they would change it when their server failed to start and they
needed to start it rather than just rebuild from backups.  I'd be fine
with defaulting it off.  I just don't want "can't make a loopback socket"
to be equivalent to "you're screwed and you'll never see your data again".

> I think there is no clear agreement here, and no historically consistent
> behavior.  I'm prepared to let it go and cross it off the list of open
> items.  I believe we should keep thinking about it, but it's not
> something that has to hold up beta.

Agreed, this doesn't seem like a must-fix-for-beta consideration.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

