Re: Test-cases for exclusion constraints is missing in alter_table.sql file

2018-01-19 Thread Amit Kapila
On Wed, Jan 17, 2018 at 1:04 PM, Ashutosh Sharma  wrote:
> Hi All,
>
> While working on exclusion constraints for one of our internal
> project, I noticed that there is no test-case for exclusion
> constraints in alter_table.sql file.  However, for other constraints I
> could see lots of test-cases in alter_table.sql.  There are hardly 1-2
> test-cases for exclusion constraints using the ALTER TABLE command (1 in
> constraints.source and 1 for partitioned tables in alter_table.sql).
> Shouldn't we consider adding a few test-cases for exclusion constraints
> in alter_table.sql, considering that we have added lots of test-cases
> for other constraints in this file?  Thoughts?
>


Adding test cases is generally meaningful only if no existing test
already covers the specific code your new test wants to cover.  I think
that before writing/submitting tests, you should first verify whether
some tests already cover the code you want to target.  You can refer to
the code coverage report at https://coverage.postgresql.org/, or else
you can generate a fresh code coverage report yourself.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-19 Thread Amit Kapila
On Sat, Jan 20, 2018 at 7:03 AM, Peter Geoghegan  wrote:
> On Thu, Jan 18, 2018 at 5:53 PM, Peter Geoghegan  wrote:
>
> Attached patch details:
>
> * The patch synchronizes processes using the approach just described.
> Note that this allowed me to remove several #include statements within
> tuplesort.c.
>
> * The patch uses only a single condition variable for a single wait
> within nbtsort.c, for the leader. No barriers are used at all (and, as
> I said, tuplesort.c doesn't use condition variables anymore). Since
> things are now very simple, I can't imagine anyone still arguing for
> the use of barriers.
>
> Note that I'm still relying on nworkers_launched as the single source
> of truth on the number of participants that can be expected to
> eventually show up (even if they end up doing zero real work). This
> should be fine, because I think that it will end up being formally
> guaranteed to be reliable by the work currently underway from Robert
> and Amit. But even if I'm wrong about things going that way, and it
> turns out that the leader needs to decide how many putative launched
> workers don't "get lost" due to fork() failure (the approach which
> Amit apparently advocates), then there *still* isn't much that needs
> to change.
>
> Ultimately, the leader needs to have the exact number of workers that
> participated, because that's fundamental to the tuplesort approach to
> parallel sort.
>

I think I can see why this patch needs that.  Is it mainly for the
work you are doing in _bt_leader_heapscan where you are waiting for
all the workers to be finished?

> If necessary, the leader can just figure it out in
> whatever way it likes at one central point within nbtsort.c, before
> the leader calls its main spool's tuplesort_begin_index_btree() --
> that can happen fairly late in the process. Actually doing that (and
> not just using nworkers_launched) seems questionable to me, because it
> would be almost the first thing that the leader would do after
> starting parallel mode -- why not just have the parallel
> infrastructure do it for us, and for everyone else?
>

So far we don't have any such requirement, but if it is a must for this
patch, then I don't think it would be tough to do.  We would need to
write an API WaitForParallelWorkerToAttach() and call it for each
launched worker, or maybe WaitForParallelWorkersToAttach(), which would
wait for all workers to attach and report how many have successfully
attached.  It would have the functionality of
WaitForBackgroundWorkerStartup(), and additionally it would need to
check whether the worker has attached to the error queue.  We already
have a similar API (WaitForReplicationWorkerAttach) for logical
replication workers.  Note that it might have a slight impact on
performance, because with this you need to wait for the workers to
start up before doing any actual work, but I don't think it should be
noticeable for large operations, especially operations like parallel
CREATE INDEX.
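
For illustration, a rough sketch of what such a function might look
like, modeled on the existing routines just mentioned (the name,
placement, and details below are only indicative of the idea being
discussed, not committed code):

#include "postgres.h"

#include "access/parallel.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "postmaster/bgworker.h"
#include "storage/latch.h"
#include "storage/shm_mq.h"

/*
 * Sketch: wait until every launched worker has either attached to its
 * error queue or failed to start, and return how many attached.  A real
 * implementation would also need to handle workers that attach and exit
 * before we look at them, postmaster death, and so on.
 */
static int
WaitForParallelWorkersToAttachSketch(ParallelContext *pcxt)
{
    int         nattached = 0;
    int         i;

    for (i = 0; i < pcxt->nworkers_launched; i++)
    {
        for (;;)
        {
            pid_t       pid;
            BgwHandleStatus status;

            /* Did the postmaster manage to fork this worker at all? */
            status = GetBackgroundWorkerPid(pcxt->worker[i].bgwhandle, &pid);
            if (status == BGWH_STOPPED)
                break;          /* failed to start, or already gone */

            /* Started and attached to its error queue: count it. */
            if (status == BGWH_STARTED &&
                shm_mq_get_sender(shm_mq_get_queue(pcxt->worker[i].error_mqh)) != NULL)
            {
                nattached++;
                break;
            }

            /* Not there yet; sleep until something changes. */
            WaitLatch(MyLatch, WL_LATCH_SET, 0, WAIT_EVENT_BGWORKER_STARTUP);
            ResetLatch(MyLatch);
            CHECK_FOR_INTERRUPTS();
        }
    }

    return nattached;
}

The leader would presumably call something like this once, right after
LaunchParallelWorkers(), and could then trust the reported count when
deciding how many participants to wait for.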


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-19 Thread Tom Lane
Robert Haas  writes:
> Well, I'm a little stumped.  When I do it the way you did it, it fails
> with the same error you got:

> contrib_regression=# EXPLAIN (VERBOSE, COSTS OFF)
> SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
> AND ft1.c1 = ft5.c1 FOR UPDATE;
> ERROR:  outer pathkeys do not match mergeclauses

> But when I add the same command to postgres_fdw.sql and run it as part
> of the regression tests, it works.  I tried adding it at the end with
> \c beforehand so that there wouldn't be any temporary settings in
> effect.

I duplicated those results, and then added an ANALYZE, and then it fails:

***
*** 7548,7550 
--- 7595,7604 
  (4 rows)
  
  RESET enable_partition_wise_join;
+ \c -
+ analyze "S 1"."T 1";
+ -- multi-way join involving multiple merge joins
+ EXPLAIN (VERBOSE, COSTS OFF)
+ SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
+ AND ft1.c1 = ft5.c1 FOR UPDATE;
+ ERROR:  outer pathkeys do not match mergeclauses

So apparently the reason our manual tests replicated the failure is
that an auto-analyze happened in between.  Now that I realize that,
I find that the manual query doesn't fail if executed *immediately*
after the regression test run.  Wait a minute: on repeating it, it
does fail.

OTOH, putting the same ANALYZE before the test case, at the place where
it sits in the script, makes no difference.

So the contributing factors here are (1) there are a bunch of data
changes to "S 1"."T 1" further down in the script than where you
added the test case, and (2) you need an auto-analyze to have seen
those changes before the problematic plan gets picked.

It looks like Etsuro-san's proposed patch locks down the choice of
plan more tightly, which is probably a reasonable answer.

regards, tom lane



Re: MCV lists for highly skewed distributions

2018-01-19 Thread John Naylor
I wrote:
> FWIW, I suspect that a solution
> that doesn't take into account a metric like coefficient of variation
> will have the wrong behavior sometimes, whether for highly uniform or
> highly non-uniform distributions.

By this I meant the coefficient of variation of the class size in the
sample, as denoted by gamma in the Haas and Stokes paper on page 7.
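
For reference, writing n_1, ..., n_d for the number of times each of the
d distinct values observed in the sample appears, this is the usual
standard-deviation-to-mean ratio of the class sizes (up to the exact
normalization used in the paper):

    \gamma \;=\; \frac{\sqrt{\tfrac{1}{d}\sum_{j=1}^{d}\left(n_j - \bar{n}\right)^2}}{\bar{n}},
    \qquad
    \bar{n} \;=\; \frac{1}{d}\sum_{j=1}^{d} n_j

A perfectly uniform distribution drives gamma toward zero, while a highly
skewed one makes it large, which is why it could feed into a decision
about how aggressively to populate the MCV list.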

-John Naylor



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-19 Thread Peter Geoghegan
On Fri, Jan 19, 2018 at 8:44 PM, Amit Kapila  wrote:
> Right, but I think using parallel_leader_participation, you can do it
> reliably and probably write some regression tests which can complete
> in a predictable time.

Do what reliably? Guarantee that the leader will not participate as a
worker, but that workers will be used? If so, yes, you can get that.

The only issue is that you may not be able to launch parallel workers
due to hitting a limit like max_parallel_workers, in which case you'll
get a serial index build despite everything. Nothing we can do about
that, though.

-- 
Peter Geoghegan



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-19 Thread Amit Kapila
On Sat, Jan 20, 2018 at 8:33 AM, Peter Geoghegan  wrote:
> On Fri, Jan 19, 2018 at 6:52 PM, Amit Kapila  wrote:
>
>> BTW, is there any other way for "parallel create index" to force the
>> work to be done by workers?  I am insisting on having something that
>> can test the code path in workers because we have found quite a few
>> bugs using that idea.
>
> I agree that this is essential (more so than supporting
> parallel_leader_participation). You can use the parallel_workers table
> storage parameter for this. When the storage param has been set, we
> don't care about the amount of memory available to each worker. You
> can stress-test the implementation as needed. (The storage param does
> care about max_parallel_maintenance_workers, but you can set that as
> high as you like.)
>

Right, but I think using parallel_leader_participation, you can do it
reliably and probably write some regression tests which can complete
in a predictable time.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2018-01-19 Thread Masahiko Sawada
On Sun, Jan 7, 2018 at 11:26 PM, Masahiko Sawada  wrote:
> On Fri, Jan 5, 2018 at 1:39 AM, Robert Haas  wrote:
>> On Tue, Jan 2, 2018 at 1:09 AM, Mithun Cy  wrote:
>>> So in the case of N_RELEXTLOCK_ENTS = 1 we can see a regression as high as 25%?
>
> Thank you for the performance measurement!
>
>> So now the question is: what do these results mean for this patch?
>>
>> I think that the chances of someone simultaneously bulk-loading 16 or
>> more relations that all happen to hash to the same relation extension
>> lock bucket is pretty darn small.  Most people aren't going to be
>> running 16 bulk loads at the same time in the first place, and if they
>> are, then there's a good chance that at least some of those loads are
>> either actually to the same relation, or that many or all of the loads
>> are targeting the same filesystem and the bottleneck will occur at
>> that level, or that the loads are to relations which hash to different
>> buckets.  Now, if we want to reduce the chances of hash collisions, we
>> could boost the default value of N_RELEXTLOCK_ENTS to 2048 or 4096.
>>
>> However, if we take the position that no hash collision probability is
>> low enough and that we must eliminate all chance of false collisions,
>> except perhaps when the table is full, then we have to make this
>> locking mechanism a whole lot more complicated.  We can no longer
>> compute the location of the lock we need without first taking some
>> other kind of lock that protects the mapping from {db_oid, rel_oid} ->
>> {memory address of the relevant lock}.  We can no longer cache the
>> location where we found the lock last time so that we can retake it.
>> If we do that, we're adding extra cycles and extra atomics and extra
>> code that can harbor bugs to every relation extension to guard against
>> something which I'm not sure is really going to happen.  Something
>> that's 3-8% faster in a case that occurs all the time and as much as
>> 25% slower in a case that virtually never arises seems like it might
>> be a win overall.
>>
>> However, it's quite possible that I'm not seeing the whole picture
>> here.  Thoughts?
>>
>
> I agree that the chances of the case where throughput gets worse are
> pretty small, and that we get a performance improvement in the common
> cases.  Also, false collisions could cause us to mistakenly
> overestimate the number of blocks we need to add, so performance might
> get worse and we might extend a relation more than necessary, but I
> think the chances of that are small.  Considering the coming parallel
> operations (e.g. parallel loading, parallel index creation, etc.),
> multiple processes will be taking the relation extension lock on the
> same relation; with that in mind, the benefit of this patch, which
> improves the speed of acquiring/releasing the lock, should be
> worthwhile.
>
> In short, I personally think the current patch is simple and the
> results are not bad.  But if the community cannot accept these
> degradations, we have to deal with the problem.  For example, we could
> make the length of the relation extension lock array configurable by
> users; that way, users can reduce the possibility of collisions.  Or we
> could improve the relation extension lock manager to eliminate false
> collisions by changing it to an open-addressing hash table.  The code
> would get more complex, but false collisions would not happen unless
> the array is full.
>

On second thought, perhaps we should also do a performance measurement
with the patch that uses an HTAB instead of a fixed array.  The
performance with that patch will probably be equal to or slightly
better than current HEAD, and hopefully not worse.  In addition, if the
performance degradation from false collisions doesn't happen, or we can
avoid it by increasing a GUC parameter, I think that's better than the
current fixed-array approach.  Thoughts?
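
To make the false-collision scenario concrete, the fixed-array scheme
under discussion maps each {database, relation} pair to one of
N_RELEXTLOCK_ENTS slots, roughly as in this standalone sketch (the
struct and hash below are illustrative stand-ins, not the patch's
actual code):

#include <stdint.h>

typedef uint32_t Oid;

#define N_RELEXTLOCK_ENTS 1024      /* size of the fixed lock array */

typedef struct RelExtLockTag
{
    Oid         dbid;
    Oid         relid;
} RelExtLockTag;

/*
 * Map a {database, relation} pair to a slot in the fixed lock array.
 * (Stand-in hash; the real patch hashes the tag with PostgreSQL's
 * tag-hashing routines and keeps per-slot state in shared memory.)
 */
static inline unsigned int
RelExtLockSlot(RelExtLockTag tag)
{
    uint32_t    h = tag.dbid * 0x9E3779B1u ^ (tag.relid + 0x7F4A7C15u);

    return h % N_RELEXTLOCK_ENTS;
}

Two unrelated relations whose tags land in the same slot share a lock
entry, so a waiter can block on another relation's extension; with
N_RELEXTLOCK_ENTS = 1 every relation shares one slot, which is the worst
case measured above.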

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] parallel.c oblivion of worker-startup failures

2018-01-19 Thread Amit Kapila
On Thu, Jan 18, 2018 at 8:53 PM, Robert Haas  wrote:
> On Thu, Dec 21, 2017 at 9:13 AM, Amit Kapila  wrote:
>> I am not against using the way specific to parallel context layer as
>> described by you above.   However, I was trying to see if there is
>> some general purpose solution as the low-impact way is not very
>> straightforward.  I think you can go ahead with the way you have
>> described to fix the hole I was pointing to and I can review it or I
>> can also give it a try if you want to.
>
> See attached.
>

The patch doesn't apply cleanly on HEAD, but after rebasing it, I have
reviewed and tested it, and it seems to be working fine.  Apart from
this specific issue, I think we should consider making
nworkers_launched reliable (at the very least for the cases where it
matters).  You seem to be of the opinion that it can be a source of
subtle bugs, which I don't deny, but I think we are now starting to see
use cases for such a mechanism, as indicated by Peter G. in the
parallel CREATE INDEX thread.  Even if we find some way for parallel
CREATE INDEX not to rely on that assumption, I won't be surprised if
some future parallelism work has such a requirement.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-19 Thread Amit Kapila
On Sat, Jan 20, 2018 at 2:57 AM, Thomas Munro wrote:
> On Sat, Jan 20, 2018 at 6:32 AM, Robert Haas  wrote:
>> On Fri, Jan 19, 2018 at 12:16 PM, Peter Geoghegan  wrote:
>>> Clarity on what I should do about parallel_leader_participation in the
>>> next revision would be useful at this point. You seem to either want
>>> me to remove it from consideration entirely, or to remove the code
>>> that specifically disallows a "degenerate parallel CREATE INDEX". I
>>> need a final answer on that.
>>
>> Right.  I do think that we should do one of those things, and I lean
>> towards removing it entirely, but I'm not entirely sure.  Rather
>> than making an executive decision immediately, I'd like to wait a few
>> days to give others a chance to comment. I am hoping that we might get
>> some other opinions, especially from Thomas who implemented
>> parallel_leader_participation, or maybe Amit who has been reviewing
>> recently, or anyone else who is paying attention to this thread.
>
> Well, I see parallel_leader_participation as having these reasons to exist:
>
> 1.  Gather could in rare circumstances not run the plan in the leader.
> This can hide bugs.  It's good to be able to force that behaviour for
> testing.
>

Or the reverse is also possible, meaning the workers won't get a chance
to run the plan, in which case we can use parallel_leader_participation
= off to test worker behavior.  As I said before, I see only that as
the reason to keep parallel_leader_participation in this patch.  If we
decide to go that way, then I think we should remove the code that
specifically disallows a "degenerate parallel CREATE INDEX", as that
seems confusing.  If we go this way, then I think we should use the
wording suggested by Robert in one of his emails [1] to describe the
usage of parallel_leader_participation.

BTW, is there any other way for "parallel create index" to force the
work to be done by workers?  I am insisting on having something that
can test the code path in workers because we have found quite a few
bugs using that idea.

[1] - 
https://www.postgresql.org/message-id/CA%2BTgmoYN-YQU9JsGQcqFLovZ-C%2BXgp1_xhJQad%3DcunGG-_p5gg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions

2018-01-19 Thread Robert Haas
On Fri, Jan 19, 2018 at 5:19 PM, Tomas Vondra wrote:
> Regarding the HOT issue - I have to admit I don't quite see why A2
> wouldn't be reachable through the index, but that's likely due to my
> limited knowledge of the HOT internals.

The index entries only point to the root tuple in the HOT chain.  Any
subsequent entries can only be reached by following the CTID pointers
(that's why they are called "Heap Only Tuples").  After T1 aborts,
we're still OK because the CTID link isn't immediately cleared.  But
after T2 updates the tuple, it makes A1's CTID link point to A3,
leaving no remaining link to A2.

Although in most respects PostgreSQL treats commits and aborts
surprisingly symmetrically, CTID links are an exception.  When T2
comes to A1, it sees that A1's xmax is T1 and checks the status of T1.
If T1 is still in progress, it waits.  If T1 has committed, T2 must
either abort with a serialization error or update A2 instead under
EvalPlanQual semantics, depending on the isolation level.  If T1 has
aborted, T2 assumes that A1's CTID field is garbage nobody cares
about, adds A3 to the page, and makes A1 point to A3 instead of A2.
No record of the A1->A2 link is kept anywhere *precisely because* A2
can no longer be visible to anyone.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions

2018-01-19 Thread Tomas Vondra


On 01/19/2018 08:33 PM, Robert Haas wrote:
> On Fri, Jan 19, 2018 at 2:16 PM, Tomas Vondra wrote:
>> I think an important piece of this puzzle is that we only really care
>> about catalog changes made in a transaction that aborts after doing some
>> additional changes, with that catalog tuple in place. Because only then
>> we actually need that catalog tuple in order to interpret the changes.
>>
>> AFAICS that guarantees the catalog changes were not interrupted half-way
>> through, leaving some of the catalogs in inconsistent state.
> 
> Yeah, that may be true, and I alluded to it in the part you didn't 
> quote. However, it doesn't help with the second problem I mentioned, 
> which looks to me to be a fatal problem.
> 

Ah, OK - I didn't realize "using a command ID value that was reached
prior to the error" refers to this scenario.

Regarding the HOT issue - I have to admit I don't quite see why A2
wouldn't be reachable through the index, but that's likely due to my
limited knowledge of the HOT internals.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2018-01-19 Thread Tomas Vondra
On 01/19/2018 03:34 PM, Tomas Vondra wrote:
> Attached is v5, fixing a silly bug in part 0006, causing segfault when
> creating a subscription.
> 

Meh, there was a bug in the sgml docs ( vs. ),
causing another failure. Hopefully v6 will pass the CI build, it does
pass a build with the same parameters on my system.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-Introduce-logical_work_mem-to-limit-ReorderBuffer-v6.patch.gz
Description: application/gzip


0002-Issue-XLOG_XACT_ASSIGNMENT-with-wal_level-logical-v6.patch.gz
Description: application/gzip


0003-Issue-individual-invalidations-with-wal_level-log-v6.patch.gz
Description: application/gzip


0004-Extend-the-output-plugin-API-with-stream-methods-v6.patch.gz
Description: application/gzip


0005-Implement-streaming-mode-in-ReorderBuffer-v6.patch.gz
Description: application/gzip


0006-Add-support-for-streaming-to-built-in-replication-v6.patch.gz
Description: application/gzip


0007-Track-statistics-for-streaming-spilling-v6.patch.gz
Description: application/gzip


0008-BUGFIX-make-sure-subxact-is-marked-as-is_known_as-v6.patch.gz
Description: application/gzip


0009-BUGFIX-set-final_lsn-for-subxacts-before-cleanup-v6.patch.gz
Description: application/gzip


Re: non-bulk inserts and tuple routing

2018-01-19 Thread Robert Haas
On Fri, Jan 19, 2018 at 3:56 AM, Amit Langote wrote:
> I rebased the patches, since they started conflicting with a recently
> committed patch [1].

I think that my latest commit has managed to break this pretty thoroughly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PATCH: Configurable file mode mask

2018-01-19 Thread Alvaro Herrera
Peter Eisentraut wrote:
> On 1/19/18 14:07, David Steele wrote:
> > I have yet to add tests for pg_resetwal and pg_upgrade.  pg_resetwal
> > doesn't *have* any tests as far as I can tell and pg_upgrade has tests
> > in a shell script -- it's not clear how I would extend it or reuse the
> > Perl code for perm testing.
> > 
> > Does anyone have suggestions on tests for pg_resetwal and pg_upgrade?
> > Should I start from scratch?
> 
> A test suite for pg_resetwal would be nice.
> 
> TAP tests for pg_upgrade will create problems with the build farm.
> There was a recent thread about that.

Is this about this commit?

commit 58ffe141eb37c3f027acd25c1fc6b36513bf9380
Author: Peter Eisentraut 
AuthorDate: Fri Sep 22 16:34:46 2017 -0400
CommitDate: Fri Sep 22 16:46:56 2017 -0400

Revert "Add basic TAP test setup for pg_upgrade"

This reverts commit f41e56c76e39f02bef7ba002c9de03d62b76de4d.

The build farm client would run the pg_upgrade tests twice, once as part
of the existing pg_upgrade check run and once as part of picking up all
TAP tests by looking for "t" directories.  Since the pg_upgrade tests
are pretty slow, we will need a better solution or possibly a build farm
client change before we can proceed with this.

If the only problem is that buildfarm would run tests twice, then I
think we should just press forward with this regardless of that: it
seems a chicken-and-egg problem, because buildfarm cannot be upgraded to
use some new test method if the method doesn't exist yet.  As a
solution, let's just live with some run duplication for a period until
the machines are upgraded to a future buildfarm client code.

If there are other problems, let's see what they are so that we can fix
them.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Configurable file mode mask

2018-01-19 Thread Peter Eisentraut
On 1/19/18 14:07, David Steele wrote:
> I have yet to add tests for pg_resetwal and pg_upgrade.  pg_resetwal
> doesn't *have* any tests as far as I can tell and pg_upgrade has tests
> in a shell script -- it's not clear how I would extend it or reuse the
> Perl code for perm testing.
> 
> Does anyone have suggestions on tests for pg_resetwal and pg_upgrade?
> Should I start from scratch?

A test suite for pg_resetwal would be nice.

TAP tests for pg_upgrade will create problems with the build farm.
There was a recent thread about that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-19 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jan 19, 2018 at 2:54 PM, Tom Lane  wrote:
>> What do people think of just removing this DROP DATABASE restriction?
>> Arguably, superusers should know better than to drop template1 anyway.
>> Maybe we should replace it with a hard-wired check against dropping
>> template0 (matched by name) just to stave off the worst-case scenario.

> I think it's a little scary.  The tail might be wagging the dog at this point.

True, and having to make assumptions about which server version we're
restoring to is not very nice either.

After further reflection I've found a way to do this on the pg_dump
end that's not as ugly as I feared -- no uglier than most of the other
hacks there, anyway.  So nevermind ...

regards, tom lane



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-19 Thread Robert Haas
On Fri, Jan 19, 2018 at 2:54 PM, Tom Lane  wrote:
> Hmm ... so there's a small problem with this idea of dropping and
> recreating template1:
>
> pg_restore: connecting to database for restore
> pg_restore: dropping DATABASE template1
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 3024; 1262 1 DATABASE template1 postgres
> pg_restore: [archiver (db)] could not execute query: ERROR:  cannot drop a template database
> Command was: DROP DATABASE "template1";
>
> Now in principle we could hack around that by issuing "ALTER DATABASE
> ... IS_TEMPLATE false" first, but it turns out to be harder than you
> might think to wedge that into the pg_dump infrastructure.  (The
> natural way to do it, trying to add this into the dropCmd for the
> database TOC entry, fails because (a) DROP DATABASE then ends up as
> one part of an implicit transaction block, and (b) it confuses the heck
> out of pg_restore's --if-exists kluge.)
>
> You can actually exhibit this in current releases if you try "pg_dump
> --clean --create" on a user-created template database, so it's not
> solely the fault of this patch.
>
> What do people think of just removing this DROP DATABASE restriction?
> Arguably, superusers should know better than to drop template1 anyway.
> Maybe we should replace it with a hard-wired check against dropping
> template0 (matched by name) just to stave off the worst-case scenario.

I think it's a little scary.  The tail might be wagging the dog at this point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] UPDATE of partition key

2018-01-19 Thread Robert Haas
On Fri, Jan 19, 2018 at 4:37 AM, Amit Khandekar  wrote:
> Attached rebased patch.

Committed with a bunch of mostly-cosmetic revisions.  I removed the
macro you added, which has a multiple evaluation hazard, and just put
that logic back into the function.  I don't think it's likely to
matter for performance, and this way is safer.  I removed an inline
keyword from another static function as well; better to let the
compiler decide what to do.  I rearranged a few things to shorten some
long lines, too.  Aside from that I think all of the changes I made
were to comments and documentation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-19 Thread Tom Lane
Hmm ... so there's a small problem with this idea of dropping and
recreating template1:

pg_restore: connecting to database for restore
pg_restore: dropping DATABASE template1
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 3024; 1262 1 DATABASE template1 postgres
pg_restore: [archiver (db)] could not execute query: ERROR:  cannot drop a template database
Command was: DROP DATABASE "template1";

Now in principle we could hack around that by issuing "ALTER DATABASE
... IS_TEMPLATE false" first, but it turns out to be harder than you
might think to wedge that into the pg_dump infrastructure.  (The
natural way to do it, trying to add this into the dropCmd for the
database TOC entry, fails because (a) DROP DATABASE then ends up as
one part of an implicit transaction block, and (b) it confuses the heck
out of pg_restore's --if-exists kluge.)

You can actually exhibit this in current releases if you try "pg_dump
--clean --create" on a user-created template database, so it's not
solely the fault of this patch.

What do people think of just removing this DROP DATABASE restriction?
Arguably, superusers should know better than to drop template1 anyway.
Maybe we should replace it with a hard-wired check against dropping
template0 (matched by name) just to stave off the worst-case scenario.

regards, tom lane



Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions

2018-01-19 Thread Robert Haas
On Fri, Jan 19, 2018 at 2:16 PM, Tomas Vondra wrote:
> I think an important piece of this puzzle is that we only really care
> about catalog changes made in a transaction that aborts after doing some
> additional changes, with that catalog tuple in place. Because only then
> we actually need that catalog tuple in order to interpret the changes.
>
> AFAICS that guarantees the catalog changes were not interrupted half-way
> through, leaving some of the catalogs in inconsistent state.

Yeah, that may be true, and I alluded to it in the part you didn't
quote.  However, it doesn't help with the second problem I mentioned,
which looks to me to be a fatal problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions

2018-01-19 Thread Tomas Vondra

On 01/19/2018 06:54 PM, Robert Haas wrote:
> On Tue, Dec 26, 2017 at 9:21 AM, Nikhil Sontakke wrote:
>> The main issue here is that HeapTupleSatisfiesVacuum *assumes* that
>> rows belonging to an aborted transaction are not visible to anyone
>> else.
> 
> One problem here is that if a transaction aborts, it might have done
> so after inserting or updating a tuple in the heap and before inserting
> new index entries for the tuple, or after inserting only some of the
> necessary new index entries.  ... [snip] ...
>

I think an important piece of this puzzle is that we only really care
about catalog changes made in a transaction that aborts after doing some
additional changes, with that catalog tuple in place. Because only then
we actually need that catalog tuple in order to interpret the changes.

AFAICS that guarantees the catalog changes were not interrupted half-way
through, leaving some of the catalogs in inconsistent state.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Built-in connection pooling

2018-01-19 Thread Tomas Vondra


On 01/19/2018 07:35 PM, Claudio Freire wrote:
> On Fri, Jan 19, 2018 at 2:22 PM, Tomas Vondra wrote:
>> On 01/19/2018 06:13 PM, Claudio Freire wrote:
>>> On Fri, Jan 19, 2018 at 2:07 PM, Konstantin Knizhnik wrote:
>>>>> Well, I haven't said it has to be single-threaded like pgbouncer. I
>>>>> don't see why the bgworker could not use multiple threads internally
>>>>> (of course, it'd need to be not to mess the stuff that is not
>>>>> thread-safe).
>>>>
>>>> Certainly architecture with N multiple scheduling bgworkers and M
>>>> executors (backends) may be more flexible than solution when
>>>> scheduling is done in executor itself. But we will have to pay extra
>>>> cost for redirection.
>>>> I am not sure that finally it will allow to reach better performance.
>>>> More flexible solution in many cases doesn't mean more efficient
>>>> solution.
>>>
>>> I think you can take the best of both worlds.
>>>
>>> You can take your approach of passing around fds, and build a "load
>>> balancing protocol" in a bgworker.
>>>
>>> The postmaster sends the socket to the bgworker, the bgworker waits for
>>> a command as pgbouncer does, but instead of proxying everything, when
>>> commands arrive, it passes the socket to a backend to handle.
>>>
>>> That way, the bgworker can do what pgbouncer does, handle different
>>> pooling modes, match backends to databases, etc, but it doesn't have to
>>> proxy all data, it just delegates handling of a command to a backend,
>>> and forgets about that socket.
>>>
>>> Sounds like it could work.
>>
>> How could it do all that without actually processing all the data? For
>> example, how could it determine the statement/transaction boundaries?
>
> It only needs to determine statement/transaction start.
>
> After that, it hands off the connection to a backend, and the backend
> determines when to give it back.
>
> So instead of processing all the data, it only processes a tiny part of it.
>

How exactly would the backend "give back" the connection? The only way
for the backend and pgbouncer to communicate is by embedding information
in the data stream. Which means pgbouncer still has to parse it.

Furthermore, those are not the only bits of information pgbouncer may
need. For example, if pgbouncer gets improved to handle prepared
statements (which is likely) it'd need to handle PARSE/BIND/EXECUTE. And
it already needs to handle SET parameters. And so on.

In any case, this discussion is somewhat off topic in this thread, so
let's not hijack it.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] GnuTLS support

2018-01-19 Thread Peter Eisentraut
Comparing the existing {be,fe}-secure-openssl.c with the proposed
{be,fe}-secure-gnutls.c, and with half an eye on the previously proposed
Apple Secure Transport implementation, I have identified a few more
areas of refactoring that should be done in order to avoid excessive
copy-and-pasting in the new implementations:

0001-Add-installcheck-support-to-more-test-suites.patch

This will help with interoperability testing, because you can then
create an installation with mixed SSL implementations and run the test
suite against it.

0002-Split-out-documentation-of-SSL-parameters-into-their.patch

Prepares and cleans up the documentation a bit before the addition of
new things, as discussed elsewhere.

0003-Move-EDH-support-to-common-files.patch

To avoid copy-and-paste, and also because the EDH explanation doesn't
really belong in a file header comment.  Maybe the whole thing is known
well enough nowadays that we can just remove the explanation.

0004-Move-SSL-API-comments-to-header-files.patch
0005-Extract-common-bits-from-OpenSSL-implementation.patch

More copy-and-paste avoidance.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 5b5f568bf05b2424ceb522c0a6d2f94487fea3b7 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 19 Jan 2018 12:17:35 -0500
Subject: [PATCH 1/5] Add installcheck support to more test suites

---
 src/test/authentication/Makefile | 3 +++
 src/test/ldap/Makefile   | 3 +++
 src/test/recovery/Makefile   | 3 +++
 src/test/ssl/Makefile| 3 +++
 src/test/subscription/Makefile   | 3 +++
 5 files changed, 15 insertions(+)

diff --git a/src/test/authentication/Makefile b/src/test/authentication/Makefile
index a435b13057..218452ec76 100644
--- a/src/test/authentication/Makefile
+++ b/src/test/authentication/Makefile
@@ -16,5 +16,8 @@ include $(top_builddir)/src/Makefile.global
 check:
$(prove_check)
 
+installcheck:
+   $(prove_installcheck)
+
 clean distclean maintainer-clean:
rm -rf tmp_check
diff --git a/src/test/ldap/Makefile b/src/test/ldap/Makefile
index 50e3c17e95..fef5742b82 100644
--- a/src/test/ldap/Makefile
+++ b/src/test/ldap/Makefile
@@ -16,5 +16,8 @@ include $(top_builddir)/src/Makefile.global
 check:
$(prove_check)
 
+installcheck:
+   $(prove_installcheck)
+
 clean distclean maintainer-clean:
rm -rf tmp_check
diff --git a/src/test/recovery/Makefile b/src/test/recovery/Makefile
index aecf37d89a..daf79a0b1f 100644
--- a/src/test/recovery/Makefile
+++ b/src/test/recovery/Makefile
@@ -18,5 +18,8 @@ include $(top_builddir)/src/Makefile.global
 check:
$(prove_check)
 
+installcheck:
+   $(prove_installcheck)
+
 clean distclean maintainer-clean:
rm -rf tmp_check
diff --git a/src/test/ssl/Makefile b/src/test/ssl/Makefile
index 4886e901d0..4e9095529a 100644
--- a/src/test/ssl/Makefile
+++ b/src/test/ssl/Makefile
@@ -132,3 +132,6 @@ clean distclean maintainer-clean:
 
 check:
$(prove_check)
+
+installcheck:
+   $(prove_installcheck)
diff --git a/src/test/subscription/Makefile b/src/test/subscription/Makefile
index 25c48e470d..0f3d2098ad 100644
--- a/src/test/subscription/Makefile
+++ b/src/test/subscription/Makefile
@@ -18,5 +18,8 @@ EXTRA_INSTALL = contrib/hstore
 check:
$(prove_check)
 
+installcheck:
+   $(prove_installcheck)
+
 clean distclean maintainer-clean:
rm -rf tmp_check

base-commit: 4e54dd2e0a750352ce2a5c45d1cc9183e887eec3
-- 
2.15.1

From d9160fc3cfa0e9725c857469fc1f156452a77922 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 18 Jan 2018 19:12:05 -0500
Subject: [PATCH 2/5] Split out documentation of SSL parameters into their own
 section

Split the "Authentication and Security" section into two separate
sections "Authentication" and "SSL".  The latter part has gotten much
longer over time, and doesn't primarily have to do with authentication.
---
 doc/src/sgml/config.sgml   | 233 +
 src/backend/utils/misc/guc.c   |  38 +++
 src/include/utils/guc_tables.h |   3 +-
 3 files changed, 143 insertions(+), 131 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 37a61a13c8..907bf1471d 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
(remainder of patch attachment truncated)


Re: Built-in connection pooling

2018-01-19 Thread Claudio Freire
On Fri, Jan 19, 2018 at 2:22 PM, Tomas Vondra wrote:

>
>
> On 01/19/2018 06:13 PM, Claudio Freire wrote:
> >
> >
> > On Fri, Jan 19, 2018 at 2:07 PM, Konstantin Knizhnik wrote:
> >
> >
> >
> >
> > Well, I haven't said it has to be single-threaded like
> pgbouncer. I
> > don't see why the bgworker could not use multiple threads
> > internally (of
> > course, it'd need to be not to mess the stuff that is not
> > thread-safe).
> >
> >
> > Certainly architecture with N multiple scheduling bgworkers and M
> > executors (backends) may be more flexible
> > than solution when scheduling is done in executor itself. But we
> > will have to pay extra cost for redirection.
> > I am not sure that finally it will allow to reach better performance.
> > More flexible solution in many cases doesn't mean more efficient
> > solution.
> >
> >
> > I think you can take the best of both worlds.
> >
> > You can take your approach of passing around fds, and build a "load
> > balancing protocol" in a bgworker.
> >
> > The postmaster sends the socket to the bgworker, the bgworker waits for
> > a command as pgbouncer does, but instead of proxying everything, when
> > commands arrive, it passes the socket to a backend to handle.
> >
> > That way, the bgworker can do what pgbouncer does, handle different
> > pooling modes, match backends to databases, etc, but it doesn't have to
> > proxy all data, it just delegates handling of a command to a backend,
> > and forgets about that socket.
> >
> > Sounds like it could work.
> >
>
> How could it do all that without actually processing all the data? For
> example, how could it determine the statement/transaction boundaries?
>

It only needs to determine statement/transaction start.

After that, it hands off the connection to a backend, and the backend
determines when to give it back.

So instead of processing all the data, it only processes a tiny part of it.


Re: Built-in connection pooling

2018-01-19 Thread Tomas Vondra


On 01/19/2018 06:07 PM, Konstantin Knizhnik wrote:
> 
> ...
>
>>>> 3) Is there any sort of shrinking the pools? I mean, if the backend is
>>>> idle for certain period of time (or when we need backends for other
>>>> databases), does it get closed automatically?

>>> When client is disconnected, client session is closed. But backen is not
>>> terminated even if there are no more sessions at this backend.
>>> It  was done intentionally, to avoid permanent spawning of new processes
>>> when there is one or few clients which frequently connect/disconnect to
>>> the database.
>>>
>> Sure, but it means a short peak will exhaust the backends indefinitely.
>> That's acceptable for a PoC, but I think needs to be fixed eventually.
>>
> Sorry, I do not understand it.
> You specify size of backends pool which will server client session.
> Size of this pool is chosen to provide the best performance at the
> particular system and workload.
> So number of backends will never exceed this optimal value even in case
> of "short peak".
> From my point of view terminating backends when there are no active
> sessions is wrong idea in any case, it was not temporary decision just
> for PoC.
> 

That is probably true when there is just a single pool (for one
database/user). But when there are multiple such pools, it forces you to
keep the sum(pool_size) below max_connections. Which seems strange.

I do think the ability to evict backends after some timeout, or when
there is pressure in other pools (different user/database) is rather useful.

>>
>> Well, I haven't said it has to be single-threaded like pgbouncer. I
>> don't see why the bgworker could not use multiple threads internally (of
>> course, it'd need to be not to mess the stuff that is not thread-safe).
>>
> 
> Certainly architecture with N multiple scheduling bgworkers and M
> executors (backends) may be more flexible
> than solution when scheduling is done in executor itself. But we will
> have to pay extra cost for redirection.
>
> I am not sure that finally it will allow to reach better performance.
> More flexible solution in many cases doesn't mean more efficient solution.
> 

Sure, I wasn't really suggesting it's a clear win. I was responding to
your argument that pgbouncer in some cases reaches 100% CPU utilization
- that can be mitigated to a large extent by adding threads. Of course,
the cost for extra level of indirection is not zero.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Built-in connection pooling

2018-01-19 Thread Konstantin Knizhnik



On 19.01.2018 20:03, Claudio Freire wrote:



On Fri, Jan 19, 2018 at 1:53 PM, Konstantin Knizhnik wrote:




On 19.01.2018 19:28, Pavel Stehule wrote:



When I've been thinking about adding a built-in
connection pool, my
rough plan was mostly "bgworker doing something like
pgbouncer" (that
is, listening on a separate port and proxying everything
to regular
backends). Obviously, that has pros and cons, and
probably would not
work serve the threading use case well.


And we will get the same problem as with pgbouncer: one
process will not be able to handle all connections...
Certainly it is possible to start several such scheduling
bgworkers... But in any case it is more efficient to
multiplex session in backend themselves.


pgbouncer hold all time client connect. When we implement the
listeners, then all work can be done by worker processes not by
listeners.



Sorry, I do not understand your point.
In my case pgbench establish connection to the pgbouncer only 
once at the beginning of the test.
And pgbouncer spends all time in context switches (CPU usage is
100% and it is mostly in kernel space: top of profile are kernel
functions).
The same picture will be if instead of pgbouncer you will do such
scheduling in one bgworker.
For the modern systems are not able to perform more than several
hundreds of connection switches per second.
So with single multiplexing thread or process you can not get
speed more than 100k, while at powerful NUMA system it is possible
to achieve millions of TPS.
It is illustrated by the results I have sent in the previous mail:
by spawning 10 instances of pgbouncer I was able to receive 7
times bigger speed.


I'm sure pgbouncer can be improved. I've seen async code handle 
millions of packets per second (zmq), pgbouncer shouldn't be radically 
different.



With pgbouncer you will never be able to use prepared statements which 
slows down simple queries almost twice (unless my patch with 
autoprepared statements is committed).


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-19 Thread Peter Geoghegan
On Fri, Jan 19, 2018 at 4:52 AM, Robert Haas  wrote:
>> Other than that, looks good to me. It's a simple patch with a clear purpose.
>
> Committed.

Cool.

Clarity on what I should do about parallel_leader_participation in the
next revision would be useful at this point. You seem to either want
me to remove it from consideration entirely, or to remove the code
that specifically disallows a "degenerate parallel CREATE INDEX". I
need a final answer on that.

-- 
Peter Geoghegan



Re: Built-in connection pooling

2018-01-19 Thread Claudio Freire
On Fri, Jan 19, 2018 at 2:07 PM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

>
>
>
>> Well, I haven't said it has to be single-threaded like pgbouncer. I
>> don't see why the bgworker could not use multiple threads internally (of
>> course, it'd need to be not to mess the stuff that is not thread-safe).
>>
>>
> Certainly architecture with N multiple scheduling bgworkers and M
> executors (backends) may be more flexible
> than solution when scheduling is done in executor itself. But we will have
> to pay extra cost for redirection.
> I am not sure that finally it will allow to reach better performance.
> More flexible solution in many cases doesn't mean more efficient solution.


I think you can take the best of both worlds.

You can take your approach of passing around fds, and build a "load
balancing protocol" in a bgworker.

The postmaster sends the socket to the bgworker, the bgworker waits for a
command as pgbouncer does, but instead of proxying everything, when
commands arrive, it passes the socket to a backend to handle.

That way, the bgworker can do what pgbouncer does, handle different pooling
modes, match backends to databases, etc, but it doesn't have to proxy all
data, it just delegates handling of a command to a backend, and forgets
about that socket.

Sounds like it could work.
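
For readers unfamiliar with the mechanism, "passing around fds" between
cooperating processes is done on Unix with sendmsg() and SCM_RIGHTS
ancillary data.  A minimal, patch-independent sketch of the sending side
(error handling omitted):

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/*
 * Send an open file descriptor to another process over a Unix-domain
 * socket using SCM_RIGHTS ancillary data.  The receiver gets its own
 * descriptor for the same underlying connection, so it can take over
 * the client.
 */
static int
send_fd(int chan, int fd)
{
    struct msghdr msg;
    struct iovec iov;
    char        payload = 'F';  /* at least one byte of real data is required */
    union
    {
        char        buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;   /* ensures proper alignment of the buffer */
    }           u;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&u, 0, sizeof(u));

    iov.iov_base = &payload;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(chan, &msg, 0) < 0 ? -1 : 0;
}

The receiving side does the mirror-image recvmsg() and then owns a
descriptor for the same client connection.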


Re: Built-in connection pooling

2018-01-19 Thread Konstantin Knizhnik



On 19.01.2018 20:01, Pavel Stehule wrote:



2018-01-19 17:53 GMT+01:00 Konstantin Knizhnik:




On 19.01.2018 19:28, Pavel Stehule wrote:



When I've been thinking about adding a built-in
connection pool, my
rough plan was mostly "bgworker doing something like
pgbouncer" (that
is, listening on a separate port and proxying everything
to regular
backends). Obviously, that has pros and cons, and
probably would not
work serve the threading use case well.


And we will get the same problem as with pgbouncer: one
process will not be able to handle all connections...
Certainly it is possible to start several such scheduling
bgworkers... But in any case it is more efficient to
multiplex session in backend themselves.


pgbouncer hold all time client connect. When we implement the
listeners, then all work can be done by worker processes not by
listeners.



Sorry, I do not understand your point.
In my case pgbench establish connection to the pgbouncer only 
once at the beginning of the test.
And pgbouncer spends all time in context switches (CPU usage is
100% and it is mostly in kernel space: top of profile are kernel
functions).
The same picture will be if instead of pgbouncer you will do such
scheduling in one bgworker.
For the modern systems are not able to perform more than several
hundreds of connection switches per second.
So with single multiplexing thread or process you can not get
speed more than 100k, while at powerful NUMA system it is possible
to achieve millions of TPS.
It is illustrated by the results I have sent in the previous mail:
by spawning 10 instances of pgbouncer I was able to receive 7
times bigger speed.


pgbouncer is proxy sw. I don't think so native pooler should be proxy 
too. So the compare pgbouncer with hypothetical native pooler is not 
fair, because pgbouncer pass all communication


If we have separate scheduling bgworker(s), as Tomas proposed, then in
any case we will have to do some kind of redirection.  It can be done
more efficiently than with Unix sockets (as in the case of a locally
installed pgbouncer), but even if we use a shared-memory queue, the
performance will be comparable and limited by the number of context
switches.  It is possible to increase throughput by combining several
requests into one parcel, but that complicates the communication
protocol between clients, scheduling proxies and executors even more.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Built-in connection pooling

2018-01-19 Thread Claudio Freire
On Fri, Jan 19, 2018 at 2:06 PM, Tomas Vondra wrote:

>
>
> On 01/19/2018 06:03 PM, Claudio Freire wrote:
> >
> >
> > On Fri, Jan 19, 2018 at 1:53 PM, Konstantin Knizhnik wrote:
> >
> >
> >
> > On 19.01.2018 19:28, Pavel Stehule wrote:
> >>
> >>
> >> When I've been thinking about adding a built-in connection
> >> pool, my
> >> rough plan was mostly "bgworker doing something like
> >> pgbouncer" (that
> >> is, listening on a separate port and proxying everything
> >> to regular
> >> backends). Obviously, that has pros and cons, and probably
> >> would not
> >> work serve the threading use case well.
> >>
> >>
> >> And we will get the same problem as with pgbouncer: one
> >> process will not be able to handle all connections...
> >> Certainly it is possible to start several such scheduling
> >> bgworkers... But in any case it is more efficient to multiplex
> >> session in backend themselves.
> >>
> >>
> >> pgbouncer hold all time client connect. When we implement the
> >> listeners, then all work can be done by worker processes not by
> >> listeners.
> >>
> >
> > Sorry, I do not understand your point.
> > In my case pgbench establish connection to the pgbouncer only  once
> > at the beginning of the test.
> > And pgbouncer spends all time in context switches (CPU usage is 100%
> > and it is mostly in kernel space: top of profile are kernel
> functions).
> > The same picture will be if instead of pgbouncer you will do such
> > scheduling in one bgworker.
> > For the modern systems are not able to perform more than several
> > hundreds of connection switches per second.
> > So with single multiplexing thread or process you can not get speed
> > more than 100k, while at powerful NUMA system it is possible to
> > achieve millions of TPS.
> > It is illustrated by the results I have sent in the previous mail:
> > by spawning 10 instances of pgbouncer I was able to receive 7 times
> > bigger speed.
> >
> >
> > I'm sure pgbouncer can be improved. I've seen async code handle millions
> > of packets per second (zmq), pgbouncer shouldn't be radically different.
> >
>
> The trouble is pgbouncer is not handling individual packets. It needs to
> do additional processing to assemble the messages, understand the state
> of the connection (e.g. to do transaction pooling) etc. Or handle SSL.
>

I understand. But zmq also has to process framing very similar to the fe
protocol, so I'm still hopeful.


Re: Built-in connection pooling

2018-01-19 Thread Konstantin Knizhnik



On 19.01.2018 19:59, Tomas Vondra wrote:

>> The problem can be much more easily solved with a pthread version of
>> Postgres.  In that case reassigning a session to another executor
>> (thread) can be done much more easily, and there is no need to use the
>> unportable trick of passing a file descriptor to another process.
>> And in the future I am going to combine them.  The problem is that the
>> pthread version of Postgres is still in a very raw state.
>
> Yeah. Unfortunately, we're using processes now, and switching to threads
> will take time (assuming it happens at all).

I have to agree with you.

>>> 3) Is there any sort of shrinking the pools? I mean, if the backend is
>>> idle for a certain period of time (or when we need backends for other
>>> databases), does it get closed automatically?
>>
>> When a client disconnects, the client session is closed.  But the
>> backend is not terminated even if there are no more sessions at this
>> backend.  It was done intentionally, to avoid permanent spawning of new
>> processes when there are one or a few clients which frequently
>> connect/disconnect to the database.
>
> Sure, but it means a short peak will exhaust the backends indefinitely.
> That's acceptable for a PoC, but I think needs to be fixed eventually.

Sorry, I do not understand that.  You specify the size of the backend
pool which will serve client sessions.  The size of this pool is chosen
to provide the best performance on the particular system and workload,
so the number of backends will never exceed this optimal value even in
case of a "short peak".  From my point of view, terminating backends
when there are no active sessions is the wrong idea in any case; it was
not a temporary decision just for the PoC.

> Well, I haven't said it has to be single-threaded like pgbouncer. I
> don't see why the bgworker could not use multiple threads internally (of
> course, it'd need to be careful not to mess with the stuff that is not
> thread-safe).

Certainly an architecture with N scheduling bgworkers and M executors
(backends) may be more flexible than a solution where scheduling is done
in the executor itself.  But we will have to pay an extra cost for the
redirection.

I am not sure that it will ultimately allow us to reach better
performance.  A more flexible solution in many cases doesn't mean a more
efficient solution.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Built-in connection pooling

2018-01-19 Thread Claudio Freire
On Fri, Jan 19, 2018 at 1:53 PM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

>
>
> On 19.01.2018 19:28, Pavel Stehule wrote:
>
>
>
> When I've been thinking about adding a built-in connection pool, my
>>> rough plan was mostly "bgworker doing something like pgbouncer" (that
>>> is, listening on a separate port and proxying everything to regular
>>> backends). Obviously, that has pros and cons, and probably would not
>>> work serve the threading use case well.
>>>
>>
>> And we will get the same problem as with pgbouncer: one process will not
>> be able to handle all connections...
>> Certainly it is possible to start several such scheduling bgworkers...
>> But in any case it is more efficient to multiplex session in backend
>> themselves.
>>
>
> pgbouncer hold all time client connect. When we implement the listeners,
> then all work can be done by worker processes not by listeners.
>
>
> Sorry, I do not understand your point.
> In my case pgbench establish connection to the pgbouncer only  once at the
> beginning of the test.
> And pgbouncer spends all time in context switches (CPU usage is 100% and
> it is mostly in kernel space: top of profile are kernel functions).
> The same picture will be if instead of pgbouncer you will do such
> scheduling in one bgworker.
> For the modern systems are not able to perform more than several hundreds
> of connection switches per second.
> So with single multiplexing thread or process you can not get speed more
> than 100k, while at powerful NUMA system it is possible to achieve millions
> of TPS.
> It is illustrated by the results I have sent in the previous mail: by
> spawning 10 instances of pgbouncer I was able to receive 7 times bigger
> speed.
>

I'm sure pgbouncer can be improved. I've seen async code handle millions of
packets per second (zmq), pgbouncer shouldn't be radically different.


Re: Built-in connection pooling

2018-01-19 Thread Tomas Vondra


On 01/19/2018 05:53 PM, Konstantin Knizhnik wrote:
> 
> 
> On 19.01.2018 19:28, Pavel Stehule wrote:
>>
>>
>> When I've been thinking about adding a built-in connection
>> pool, my
>> rough plan was mostly "bgworker doing something like
>> pgbouncer" (that
>> is, listening on a separate port and proxying everything to
>> regular
>> backends). Obviously, that has pros and cons, and probably
>> would not
>> work serve the threading use case well.
>>
>>
>> And we will get the same problem as with pgbouncer: one process
>> will not be able to handle all connections...
>> Certainly it is possible to start several such scheduling
>> bgworkers... But in any case it is more efficient to multiplex
>> session in backend themselves.
>>
>>
>> pgbouncer hold all time client connect. When we implement the
>> listeners, then all work can be done by worker processes not by listeners.
>>
> 
> Sorry, I do not understand your point.
> In my case pgbench establish connection to the pgbouncer only  once at
> the beginning of the test.
> And pgbouncer spends all time in context switches (CPU usage is 100% and
> it is mostly in kernel space: top of profile are kernel functions).
> The same picture will be if instead of pgbouncer you will do such
> scheduling in one bgworker.
> For the modern systems are not able to perform more than several
> hundreds of connection switches per second.
> So with single multiplexing thread or process you can not get speed more
> than 100k, while at powerful NUMA system it is possible to achieve
> millions of TPS.
> It is illustrated by the results I have sent in the previous mail: by
> spawning 10 instances of pgbouncer I was able to receive 7 times bigger
> speed.
> 

AFAICS making pgbouncer multi-threaded would not be hugely complicated.
A simple solution would be a fixed number of worker threads, and client
connections randomly assigned to them.
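
For illustration only, a minimal sketch in C of that shape: a fixed set of 
pthread workers, with the dispatching thread handing each accepted client 
fd to a randomly chosen worker over a pipe. This is an assumed design, not 
pgbouncer code; error handling and the per-worker event loop are omitted.

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

#define N_WORKERS 8

static int  worker_pipes[N_WORKERS][2]; /* dispatcher writes client fds here */

static void *
worker_main(void *arg)
{
    int     idx = (int) (intptr_t) arg;
    int     client_fd;

    /* Each worker serves only the clients it has been handed. */
    while (read(worker_pipes[idx][0], &client_fd, sizeof(client_fd)) == sizeof(client_fd))
    {
        /* ... add client_fd to this worker's own poll/epoll set and serve it ... */
    }
    return NULL;
}

void
dispatch_loop(int listen_fd)
{
    pthread_t   threads[N_WORKERS];

    for (int i = 0; i < N_WORKERS; i++)
    {
        pipe(worker_pipes[i]);
        pthread_create(&threads[i], NULL, worker_main, (void *) (intptr_t) i);
    }

    for (;;)
    {
        int     client_fd = accept(listen_fd, NULL, NULL);

        if (client_fd < 0)
            continue;
        /* random assignment, as suggested; workers need no coordination */
        write(worker_pipes[rand() % N_WORKERS][1], &client_fd, sizeof(client_fd));
    }
}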

But this generally is not a common bottleneck in practical workloads (of
course, YMMV).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Built-in connection pooling

2018-01-19 Thread Konstantin Knizhnik



On 19.01.2018 19:28, Pavel Stehule wrote:



When I've been thinking about adding a built-in connection
pool, my
rough plan was mostly "bgworker doing something like
pgbouncer" (that
is, listening on a separate port and proxying everything to
regular
backends). Obviously, that has pros and cons, and probably
would not
work serve the threading use case well.


And we will get the same problem as with pgbouncer: one process
will not be able to handle all connections...
Certainly it is possible to start several such scheduling
bgworkers... But in any case it is more efficient to multiplex
session in backend themselves.


pgbouncer hold all time client connect. When we implement the 
listeners, then all work can be done by worker processes not by listeners.




Sorry, I do not understand your point.
In my case pgbench establishes its connections to pgbouncer only once, at 
the beginning of the test.
And pgbouncer spends all of its time in context switches (CPU usage is 100% 
and it is mostly in kernel space: the top entries in the profile are kernel 
functions).
The picture will be the same if, instead of pgbouncer, you do such 
scheduling in one bgworker.
Modern systems are not able to perform more than a few hundred thousand 
connection switches per second.
So with a single multiplexing thread or process you cannot get more than 
about 100k TPS, while on a powerful NUMA system it is possible to achieve 
millions of TPS.
It is illustrated by the results I sent in the previous mail: by spawning 
10 instances of pgbouncer I was able to get 7 times higher throughput.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Secondary index access optimizations

2018-01-19 Thread Konstantin Knizhnik



On 19.01.2018 16:14, Antonin Houska wrote:

Konstantin Knizhnik  wrote:


On 11.01.2018 12:34, Antonin Houska wrote:

Konstantin Knizhnik  wrote:
I haven't thought that much about details, so just one comment: you shouldn't
need the conversion to text and back to binary form. utils/adt/rangetypes.c
contains constructors that accept the binary values.


Attached please find new version of the patch which uses range type to
interval matching in operator_predicate_proof function.

I think that instead of checking the operator name, e.g.

strcmp(oprname, "<")

you should test the operator B-tree strategy: BTLessStrategyNumber,
BTLessEqualStrategyNumber, etc. The operator string alone does not tell enough
about the operator semantics.

The strategy can be found in the pg_amop catalog.
get_op_btree_interpretation() function may be useful, but there may be
something better in utils/cache/lsyscache.c.


Thank you very much.
Shame on me that I didn't notice such a solution myself - such checking of 
the B-tree strategy is done in the same predtest.c file!
Now the check for predicate clause compatibility is done in a much more 
elegant and general way.

Attached please find the new version of the patch.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bce3348..6a7e7fb 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -626,12 +626,12 @@ EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NULL;-- Nu
Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NULL))
 (3 rows)
 
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL;-- NullTest
- QUERY PLAN  
--
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL and c3 is not null;-- NullTest
+QUERY PLAN
+--
  Foreign Scan on public.ft1 t1
Output: c1, c2, c3, c4, c5, c6, c7, c8
-   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" IS NOT NULL))
+   Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE ((c3 IS NOT NULL))
 (3 rows)
 
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 1df1e3a..c421530 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -292,7 +292,7 @@ RESET enable_nestloop;
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE t1.c1 = 1; -- Var, OpExpr(b), Const
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE t1.c1 = 100 AND t1.c2 = 0; -- BoolExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NULL;-- NullTest
-EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL;-- NullTest
+EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 IS NOT NULL and c3 is not null;-- NullTest
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE round(abs(c1), 0) = 1; -- FuncExpr
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE c1 = -c1;  -- OpExpr(l)
 EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM ft1 t1 WHERE 1 = c1!;   -- OpExpr(r)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 44f6b03..c7bf118 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -345,6 +345,7 @@ set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		switch (rel->rtekind)
 		{
 			case RTE_RELATION:
+remove_restrictions_implied_by_constraints(root, rel, rte);
 if (rte->relkind == RELKIND_FOREIGN_TABLE)
 {
 	/* Foreign table */
@@ -1130,6 +1131,7 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 			set_dummy_rel_pathlist(childrel);
 			continue;
 		}
+		remove_restrictions_implied_by_constraints(root, childrel, childRTE);
 
 		/* CE failed, so finish copying/modifying join quals. */
 		childrel->joininfo = (List *)
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index f743871..f763a97 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -1466,6 +1466,51 @@ relation_excluded_by_constraints(PlannerInfo *root,
 	return false;
 }
 
+/*
+ * Remove from restrictions list items implied by table constraints
+ */
+void 

Re: Add default role 'pg_access_server_files'

2018-01-19 Thread Ryan Murphy
Ok great. Thanks Michael and Stephen for the explanations.


Re: Built-in connection pooling

2018-01-19 Thread Pavel Stehule
When I've been thinking about adding a built-in connection pool, my
>> rough plan was mostly "bgworker doing something like pgbouncer" (that
>> is, listening on a separate port and proxying everything to regular
>> backends). Obviously, that has pros and cons, and probably would not
>> work serve the threading use case well.
>>
>
> And we will get the same problem as with pgbouncer: one process will not
> be able to handle all connections...
> Certainly it is possible to start several such scheduling bgworkers... But
> in any case it is more efficient to multiplex session in backend themselves.
>

pgbouncer holds the client connection the whole time. When we implement the
listeners, then all the work can be done by worker processes, not by the
listeners.

Regards

Pavel


> But it would have some features that I find valuable - for example, it's
>> trivial to decide which connection requests may or may not be served
>> from a pool (by connection to the main port or pool port).
>>
>> That is not to say the bgworker approach is better than what you're
>> proposing, but I wonder if that would be possible with your approach.
>>
>>
>> regards
>>
>>
> --
> Konstantin Knizhnik
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>


Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-19 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jan 19, 2018 at 10:00 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> On Fri, Jan 19, 2018 at 9:45 AM, Tom Lane  wrote:
 Well, we could say that the properties of template1 and postgres
 are only restored if you use --clean.

>>> True.  Would that be a POLA violation, do you think?

>> It seems a bit non-orthogonal.  Also, while your point that people
>> expect "merge" behavior from pg_dump is certainly true, I'm not
>> convinced that anybody would be relying on that for pg_dumpall.

> [ assorted examples ]
> Still, it's worth thinking over these kinds of
> scenarios, I think.  People do a lot of ugly things in the real world
> that we as developers would never do, mostly to work around the
> problems we fail to foresee.

Unless someone has a better idea, I'll go with the semantics stated
above: DB-level properties of the two standard databases are only
transferred to pg_dumpall's target cluster if you authorize dropping
their old contents by saying --clean.  (pg_upgrade, of course, will
do exactly that.)

regards, tom lane



Re: Built-in connection pooling

2018-01-19 Thread Konstantin Knizhnik



On 19.01.2018 18:53, Tomas Vondra wrote:


On 01/19/2018 10:52 AM, Konstantin Knizhnik wrote:


On 18.01.2018 18:02, Tomas Vondra wrote:

Hi Konstantin,

On 01/18/2018 03:48 PM, Konstantin Knizhnik wrote:

On 17.01.2018 19:09, Konstantin Knizhnik wrote:

Hi hackers,

...

I haven't looked at the code yet, but after reading your message I have
a simple question - gow iss this going to work with SSL? If you're only
passing a file descriptor, that does not seem to be sufficient for the
backends to do crypto (that requires the SSL stuff from Port).

Maybe I'm missing something and it already works, though ...


regards


Ooops, I missed this aspect with SSL. Thank you.
New version of the patch which correctly maintain session context is
attached.
Now each session has its own allocator which should be  used instead
of TopMemoryAllocator. SSL connections work now.


OK. I've looked at the code, but I have a rather hard time understanding
it, because there are pretty much no comments explaining the intent of
the added code :-( I strongly suggest improving that, to help reviewers.

Sorry, sorry, sorry...
There are some comments and I will add more.



The questions I'm asking myself are mostly these:

1) When assigning a backend, we first try to get one from a pool, which
happens right at the beginning of BackendStartup. If we find a usable
backend, we send the info to the backend (pg_send_sock/pg_recv_sock).

But AFAICS this only only happens at connection time, right? But it your
initial message you say "Rescheduling is done at transaction level,"
which in my understanding means "transaction pooling". So, how does that
part work?


Here it is:

ChooseSession:
    DoingCommandRead = true;
    /* Select which client session is ready to send a new query */
    if (WaitEventSetWait(SessionPool, -1, &ready_client, 1,
                         PG_WAIT_CLIENT) != 1)
        ...
    if (ready_client.fd == SessionPoolSock)
    {
        /* Here we handle the case of attaching a new session */
        ...
    }
    else
    {
        /* ... and here we handle the case when there is a query
         * (a new transaction) from some client */
        elog(DEBUG2, "Switch to session %d in backend %d",
             ready_client.fd, MyProcPid);
        CurrentSession = (SessionContext *) ready_client.user_data;
        MyProcPort = CurrentSession->port;
    }



2) How does this deal with backends for different databases? I don't see
any checks that the requested database matches the backend database (not
any code switching the backend from one db to another - which would be
very tricky, I think).

As I wrote in the initial mail, this problem is not handled now.
It is expected that all clients are connected to the same database using 
the same user.

I only check and report an error if this assumption is violated.
Definitely it should be fixed, and it is one of the main challenges with 
this approach! I would like to receive some advice from the community about 
the best way of solving it.
The problem is that we get the information about the database/user in the 
ProcessStartupPackage function in the backend, when the session is already 
assigned to a particular backend.
We either have to somehow redirect the session to some other backend 
(somehow notify the postmaster that we are not able to handle it), or 
obtain the database/user name in the postmaster. But that means 
ProcessStartupPackage would have to be called in the postmaster, and the 
postmaster would have to read from the client's socket.

I am afraid that the postmaster could become a bottleneck in this case.

The problem could be solved much more easily with a pthread version of 
Postgres. In that case reassigning a session to another executor (thread) 
can be done much more easily,
and there is no need to use the unportable trick of passing a file 
descriptor to another process.
In the future I am going to combine the two. The problem is that the 
pthread version of Postgres is still in a very raw state.
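
(For context, the "unportable trick" is the Unix SCM_RIGHTS mechanism: an 
open socket can be passed to another process as ancillary data over a 
Unix-domain socket. The minimal sketch below only illustrates that 
mechanism; it is not the patch's actual pg_send_sock/pg_recv_sock code. 
Windows needs a different mechanism, such as WSADuplicateSocket, which is 
part of why it is unportable.)

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

int
send_fd(int chan, int fd_to_send)
{
    struct msghdr   msg = {0};
    struct cmsghdr *cmsg;
    char            buf[CMSG_SPACE(sizeof(int))];
    char            dummy = 'x';
    struct iovec    iov = {.iov_base = &dummy, .iov_len = 1};

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = buf;
    msg.msg_controllen = sizeof(buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(chan, &msg, 0) < 0 ? -1 : 0;
}

int
recv_fd(int chan)
{
    struct msghdr   msg = {0};
    struct cmsghdr *cmsg;
    char            buf[CMSG_SPACE(sizeof(int))];
    char            dummy;
    struct iovec    iov = {.iov_base = &dummy, .iov_len = 1};
    int             fd;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = buf;
    msg.msg_controllen = sizeof(buf);

    if (recvmsg(chan, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;      /* the received descriptor refers to the same socket */
}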



3) Is there any sort of shrinking the pools? I mean, if the backend is
idle for certain period of time (or when we need backends for other
databases), does it get closed automatically?


When a client disconnects, the client session is closed, but the backend is 
not terminated even if there are no more sessions attached to it.
This was done intentionally, to avoid permanently spawning new processes 
when there are one or a few clients which frequently connect to and 
disconnect from the database.



Furthermore, I'm rather confused about the meaning of session_pool_size.
I mean, that GUC determines the number of backends in the pool, it has
nothing to do with sessions per se, right? Which would mean it's a bit
misleading to name it "session_..." (particularly if the pooling happens
at transaction level, not session level - which is question #1).

Yeah, it is not the right name. It means the maximal number of backends 
which should be used to serve 

Re: Built-in connection pooling

2018-01-19 Thread Tomas Vondra


On 01/19/2018 10:52 AM, Konstantin Knizhnik wrote:
> 
> 
> On 18.01.2018 18:02, Tomas Vondra wrote:
>> Hi Konstantin,
>>
>> On 01/18/2018 03:48 PM, Konstantin Knizhnik wrote:
>>> On 17.01.2018 19:09, Konstantin Knizhnik wrote:
 Hi hackers,

 ...
>> I haven't looked at the code yet, but after reading your message I have
>> a simple question - gow iss this going to work with SSL? If you're only
>> passing a file descriptor, that does not seem to be sufficient for the
>> backends to do crypto (that requires the SSL stuff from Port).
>>
>> Maybe I'm missing something and it already works, though ...
>>
>>
>> regards
>>
> Ooops, I missed this aspect with SSL. Thank you.
> New version of the patch which correctly maintain session context is
> attached.
> Now each session has its own allocator which should be  used instead
> of TopMemoryAllocator. SSL connections work now.
> 

OK. I've looked at the code, but I have a rather hard time understanding
it, because there are pretty much no comments explaining the intent of
the added code :-( I strongly suggest improving that, to help reviewers.

The questions I'm asking myself are mostly these:

1) When assigning a backend, we first try to get one from a pool, which
happens right at the beginning of BackendStartup. If we find a usable
backend, we send the info to the backend (pg_send_sock/pg_recv_sock).

But AFAICS this only happens at connection time, right? But in your
initial message you say "Rescheduling is done at transaction level,"
which in my understanding means "transaction pooling". So, how does that
part work?

2) How does this deal with backends for different databases? I don't see
any checks that the requested database matches the backend database (not
any code switching the backend from one db to another - which would be
very tricky, I think).

3) Is there any sort of shrinking the pools? I mean, if the backend is
idle for certain period of time (or when we need backends for other
databases), does it get closed automatically?


Furthermore, I'm rather confused about the meaning of session_pool_size.
I mean, that GUC determines the number of backends in the pool, it has
nothing to do with sessions per se, right? Which would mean it's a bit
misleading to name it "session_..." (particularly if the pooling happens
at transaction level, not session level - which is question #1).


When I've been thinking about adding a built-in connection pool, my
rough plan was mostly "bgworker doing something like pgbouncer" (that
is, listening on a separate port and proxying everything to regular
backends). Obviously, that has pros and cons, and probably would not
work serve the threading use case well.

But it would have some features that I find valuable - for example, it's
trivial to decide which connection requests may or may not be served
from a pool (by connection to the main port or pool port).
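
(Purely to illustrate that last point, here is a sketch in C of a 
dispatcher that listens on two ports and tags each accepted connection as 
pooled or direct based on which socket it arrived on; the listen_on() and 
hand_off() helpers are invented names, not anything from the patch.)

#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>

extern int  listen_on(int port);                    /* invented helper */
extern void hand_off(int client_fd, bool pooled);   /* invented helper */

void
accept_loop(int main_port, int pool_port)
{
    struct pollfd   fds[2];

    fds[0].fd = listen_on(main_port);
    fds[1].fd = listen_on(pool_port);
    fds[0].events = fds[1].events = POLLIN;

    for (;;)
    {
        if (poll(fds, 2, -1) <= 0)
            continue;

        for (int i = 0; i < 2; i++)
        {
            if (fds[i].revents & POLLIN)
            {
                int     client = accept(fds[i].fd, NULL, NULL);

                /* i == 1 means the client connected to the pool port */
                if (client >= 0)
                    hand_off(client, i == 1);
            }
        }
    }
}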

That is not to say the bgworker approach is better than what you're
proposing, but I wonder if that would be possible with your approach.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-19 Thread Robert Haas
On Fri, Jan 19, 2018 at 10:40 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Jan 19, 2018 at 1:53 AM, Etsuro Fujita
>>  wrote:
>>> I noticed that this test case added by the patch is not appropriate:
>>> because it doesn't inject extra Sort nodes into EPQ recheck plans, so it
>>> works well without the fix.  I modified this to inject a Sort into the
>>> recheck plan of the very first foreign join.  Attached is a patch for that.
>
>> Mumble.  Tom provided me with that example and said it failed without
>> the patch.  I didn't check, I just believed him.  But I'm surprised if
>> he was wrong; Tom usually tries to avoid being wrong...
>
> Hm.  It did fail as advertised when I connected to the contrib_regression
> database (after installcheck) and entered the query by hand; I
> copied-and-pasted the result of that to show you.  It's possible that it
> would not have failed in the particular spot where you chose to insert it
> in the regression script, if for example there were nondefault planner GUC
> settings active at that spot.  Did you check that the script produced the
> expected failure against unpatched code?

No.  I guess I'll have to go debug this.  Argh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-19 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jan 19, 2018 at 1:53 AM, Etsuro Fujita
>  wrote:
>> I noticed that this test case added by the patch is not appropriate:
>> because it doesn't inject extra Sort nodes into EPQ recheck plans, so it
>> works well without the fix.  I modified this to inject a Sort into the
>> recheck plan of the very first foreign join.  Attached is a patch for that.

> Mumble.  Tom provided me with that example and said it failed without
> the patch.  I didn't check, I just believed him.  But I'm surprised if
> he was wrong; Tom usually tries to avoid being wrong...

Hm.  It did fail as advertised when I connected to the contrib_regression
database (after installcheck) and entered the query by hand; I
copied-and-pasted the result of that to show you.  It's possible that it
would not have failed in the particular spot where you chose to insert it
in the regression script, if for example there were nondefault planner GUC
settings active at that spot.  Did you check that the script produced the
expected failure against unpatched code?

regards, tom lane



Re: Fwd: [BUGS] pg_trgm word_similarity inconsistencies or bug

2018-01-19 Thread Liudmila Mantrova

Hello everyone,

I would like to contribute to documentation review of the patches 
discussed in thread 
https://www.postgresql.org/message-id/flat/cy4pr17mb13207ed8310f847cf117eed0d8...@cy4pr17mb1320.namprd17.prod.outlook.com 
(https://commitfest.postgresql.org/16/1403/). Unfortunately, I was not 
subscribed to pgsql-hackers before, so I cannot respond to this thread 
directly.


Please find attached new revisions of the original patches. I hope 
you'll find the changes useful!


--
Liudmila Mantrova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/pg_trgm/trgm_op.c b/contrib/pg_trgm/trgm_op.c
index f7e96ac..306d60b 100644
--- a/contrib/pg_trgm/trgm_op.c
+++ b/contrib/pg_trgm/trgm_op.c
@@ -456,7 +456,7 @@ iterate_word_similarity(int *trg2indexes,
 			lastpos[trgindex] = i;
 		}
 
-		/* Adjust lower bound if this trigram is present in required substring */
+		/* Adjust upper bound if this trigram is present in required substring */
 		if (found[trgindex])
 		{
 			int			prev_lower,
@@ -473,7 +473,7 @@ iterate_word_similarity(int *trg2indexes,
 
 			smlr_cur = CALCSML(count, ulen1, ulen2);
 
-			/* Also try to adjust upper bound for greater similarity */
+			/* Also try to adjust lower bound for greater similarity */
 			tmp_count = count;
 			tmp_ulen2 = ulen2;
 			prev_lower = lower;
diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
index 338ef30..005961c 100644
--- a/doc/src/sgml/pgtrgm.sgml
+++ b/doc/src/sgml/pgtrgm.sgml
@@ -99,12 +99,10 @@
   
   real
   
-   Returns a number that indicates how similar the first string
-   to the most similar word of the second string. The function searches in
-   the second string a most similar word not a most similar substring.  The
-   range of the result is zero (indicating that the two strings are
-   completely dissimilar) to one (indicating that the first string is
-   identical to one of the words of the second string).
+   Returns a number that indicates the greatest similarity between
+   the set of trigrams in the first string and any continuous extent
+   of an ordered set of trigrams in the second string.  For details, see
+   the explanation below.
   
  
  
@@ -131,6 +129,35 @@

   
 
+  
+   word_similarity(text, text) requires further
+   explanation. Consider the following example:
+
+
+# SELECT word_similarity('word', 'two words');
+ word_similarity
+-
+ 0.8
+(1 row)
+
+
+   In the first string, the set of trigrams is
+   {"  w"," wo","ord","wor","rd "}.
+   In the second string, the ordered set of trigrams is
+   {"  t"," tw","two","wo ","  w"," wo","wor","ord","rds","ds "}.
+   The most similar extent of an ordered set of trigrams in the second string
+   is {"  w"," wo","wor","ord"}, and the similarity is
+   0.8.
+  
+
+  
+   This function returns a value that can be approximately understood as the
+   greatest similarity between the first string and any substring of the second
+   string.  However, this function does not add paddings to the boundaries of
+   the extent.  Thus, a whole word match gets a higher score than a match with
+   a part of the word.
+  
+
   
pg_trgm Operators

@@ -156,10 +183,11 @@
   text % text
   boolean
   
-   Returns true if its first argument has the similar word in
-   the second argument and they have a similarity that is greater than the
-   current word similarity threshold set by
-   pg_trgm.word_similarity_threshold parameter.
+   Returns true if the similarity between the trigram
+   set in the first argument and a continuous extent of an ordered trigram
+   set in the second argument is greater than the current word similarity
+   threshold set by pg_trgm.word_similarity_threshold
+   parameter.
   
  
  
@@ -302,10 +330,11 @@ SELECT t, word_similarity('word', t) AS sml
   WHERE 'word' % t
   ORDER BY sml DESC, t;
 
-   This will return all values in the text column that have a word
-   which sufficiently similar to word, sorted from best
-   match to worst.  The index will be used to make this a fast operation
-   even over very large data sets.
+   This will return all values in the text column for which there is a
+   continuous extent in the corresponding ordered trigram set that is
+   sufficiently similar to the trigram set of word,
+   sorted from best match to worst.  The index will be used to make this
+   a fast operation even over very large data sets.
   
 
   
diff --git a/contrib/pg_trgm/Makefile b/contrib/pg_trgm/Makefile
index 212a890..dfecc2a 100644
--- a/contrib/pg_trgm/Makefile
+++ b/contrib/pg_trgm/Makefile
@@ -4,11 +4,12 @@ MODULE_big = pg_trgm
 OBJS = trgm_op.o trgm_gist.o trgm_gin.o trgm_regexp.o $(WIN32RES)
 
 EXTENSION = pg_trgm
-DATA = pg_trgm--1.3.sql pg_trgm--1.2--1.3.sql pg_trgm--1.1--1.2.sql \
+DATA = pg_trgm--1.3--1.4.sql \
+	pg_trgm--1.3.sql 

Re: [PATCH] make check with Apple's SIP enabled

2018-01-19 Thread Tom Lane
Jörg Westheide  writes:
> When doing a "make check" on Mac OS X with SIP (aka rootless mode) 
> enabled it will fail like this:
> ...

Yeah, well-known problem.  AFAIK, all PG developers who use Macs just
disable SIP immediately.

> The problem is that the psql links to libpq which it cannot find (at 
> least not the one from the postgres you're building). The usual 
> approach to set an environment variable pointing to the correct 
> location (DYLD_LIBRARY_PATH on Mac OS X/darwin, see above) does not 
> work since Apple's SIP prevents it from getting passed to psql

Yup.  This is incredibly brain-damaged and unnecessary on Apple's
part.  Several of us have filed bugs about it, and generally not
gotten much response.  Perhaps if people keep complaining, though,
eventually they'll see the light.

> What I do in the attached patch is changing the install name of libpq 
> in the psql binary of the temp install (used by "make check") to point 
> to libpq of the temp install so the (correct) lib is found.

Cute idea, but AFAICS it would at most fix "make check" and not any
tests that required other executables, or libraries besides libpq.
Plus there's the objection that then you're not really testing the
binary you intend to deploy.  I suppose if we got invasive enough,
we could extend this concept to every trouble spot, but I don't
much want to go there.

IIRC, the fact that SIP loses DYLD_LIBRARY_PATH is sort of accidental.
It's not that they've broken the variable altogether, it's that when
we invoke the shell to invoke psql, they clear out DYLD_LIBRARY_PATH
because of some weird decision that the shell is a security-critical
program.  (As if the ability to invoke a shell with your choice of input
didn't already give you the chance to do, far more easily, anything you
might do by subverting the shell internally.)  So it might be possible to
dodge this problem, at least for "make check" and other pg_regress-driven
cases, by changing pg_regress to avoid going through system(3) but
rather fork-and-exec'ing psql directly.
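
(To sketch that idea: the fragment below forks and execs psql directly with 
an explicitly constructed environment, so no shell is involved and 
DYLD_LIBRARY_PATH survives. It is only an illustration under those 
assumptions; run_psql_directly() is an invented name, and pg_regress would 
need more than this, e.g. redirection of stdin/stdout.)

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static int
run_psql_directly(const char *psql_path, const char *dbname, const char *libdir)
{
    pid_t   pid = fork();

    if (pid == 0)
    {
        char    envbuf[1024];
        char   *argv[] = {"psql", "-X", (char *) dbname, NULL};

        /* Build the child's environment ourselves; SIP never sees a shell. */
        snprintf(envbuf, sizeof(envbuf), "DYLD_LIBRARY_PATH=%s", libdir);
        char   *envp[] = {envbuf, NULL};

        execve(psql_path, argv, envp);
        _exit(127);             /* only reached if execve failed */
    }
    else if (pid > 0)
    {
        int     status;

        waitpid(pid, &status, 0);
        return status;
    }
    return -1;
}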

Nobody's pursued that idea, in part because we've generally found
that SIP breaks enough other stuff that you have to keep it turned
off anyway on development machines.  But if you're interested,
you could have a look at how messy that would be.

regards, tom lane



Re: Built-in connection pooling

2018-01-19 Thread Konstantin Knizhnik



On 18.01.2018 18:00, Claudio Freire wrote:



On Thu, Jan 18, 2018 at 11:48 AM, Konstantin Knizhnik  wrote:



Attached please find new version of the patch with few fixes.
And more results at NUMA system with 144 cores and 3Tb of RAM.

Read-only pgbench (-S):


#Connections\kTPS   Vanilla Postgres   Session pool size 256
1k                  1300               1505
10k                 633                1519
100k                -                  1425




Read-write contention test: access to small number of records with
1% of updates.

#Clients\TPS   Vanilla Postgres   Session pool size 256
100            557232             573319
200            520395             551670
300            511423             533773
400            468562             523091
500            442268             514056
600            401860             526704
700            363912             530317
800            325148             512238
900            301310             512844
1000           278829             554516


So, as you can see, there is no degrade of performance with increased 
number of connections in case of using session pooling.


TBH, the tests you should be running are comparisons with a similar 
pool size managed by pgbouncer, not just vanilla unlimited postgres.


Of course a limited pool size will beat thousands of concurrent 
queries by a large margin. The real question is whether a 
pthread-based approach beats the pgbouncer approach.




Below are the results with pgbouncer:

#Connections\kTPS   Vanilla    Built-in session   pgbouncer (transaction    10 pgbouncers
                    Postgres   pool size 256      pooling, pool size 256)   (pool size 20)
1k                  1300       1505               105                       751
10k                 633        1519               94                        664
100k                -          1425               -                         -


(-) here means that I failed to start that number of connections 
(because of "resource temporarily unavailable" and similar errors).


So a single pgbouncer is about 10 times slower than a direct connection to 
Postgres.
No surprise here: pgbouncer is single-threaded and its CPU usage is almost 
100%.
So we have to launch several instances of pgbouncer and somehow distribute 
the load between them.
On Linux it is possible to use SO_REUSEPORT 
(https://lwn.net/Articles/542629/) to perform load balancing between 
several pgbouncer instances.
But that requires editing the pgbouncer code: it doesn't support such a 
mode. So I started several instances of pgbouncer on different ports and 
explicitly distributed several pgbench instances between them.
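
(For reference, using SO_REUSEPORT would only require each pgbouncer 
instance to set the option on its listening socket before bind(); the 
kernel then spreads incoming connections across all listeners on the same 
port. A minimal sketch for Linux >= 3.9 follows; it is an illustration, not 
pgbouncer code.)

#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

int
listen_reuseport(int port)
{
    int                 fd = socket(AF_INET, SOCK_STREAM, 0);
    int                 one = 1;
    struct sockaddr_in  addr;

    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(fd, 128) < 0)
        return -1;
    return fd;      /* each instance does this; the kernel balances accepts */
}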


But even in this case performance is about half that of a direct connection 
and of built-in session pooling.
This is because of the lack of prepared statements, which I cannot use with 
pgbouncer in statement/transaction pooling mode.


Also please notice that with session pooling, performance is better than 
with vanilla Postgres.
This is because with session pooling we can open more connections without 
launching more backends.
It is especially noticeable on my local desktop with 4 cores: for normal 
Postgres the optimal number of connections is about 10, but with session 
pooling 100 connections gives an approximately 30% better result.


So, summarizing all of the above:

1. pgbouncer doesn't allow the use of prepared statements, and that causes 
up to a 2x performance penalty.
2. pgbouncer is single-threaded and cannot efficiently handle more than 
1k connections.
3. pgbouncer can never provide better performance than an application 
connected directly to Postgres with the optimal number of connections. In 
contrast, built-in session pooling can provide better performance than 
vanilla Postgres with the optimal number of connections.




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] postgres_fdw bug in 9.6

2018-01-19 Thread Robert Haas
On Fri, Jan 19, 2018 at 1:53 AM, Etsuro Fujita
 wrote:
> I noticed that this test case added by the patch is not appropriate:
>
> +-- multi-way join involving multiple merge joins
> +EXPLAIN (VERBOSE, COSTS OFF)
> +SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
> +AND ft1.c1 = ft5.c1 FOR UPDATE;
> +SELECT * FROM ft1, ft2, ft4, ft5 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = ft4.c1
> +AND ft1.c1 = ft5.c1 FOR UPDATE;
>
> because it doesn't inject extra Sort nodes into EPQ recheck plans, so it
> works well without the fix.  I modified this to inject a Sort into the
> recheck plan of the very first foreign join.  Attached is a patch for that.

Mumble.  Tom provided me with that example and said it failed without
the patch.  I didn't check, I just believed him.  But I'm surprised if
he was wrong; Tom usually tries to avoid being wrong...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] make check with Apple's SIP enabled

2018-01-19 Thread Robert Haas
On Fri, Jan 19, 2018 at 9:54 AM, Jörg Westheide
 wrote:
> When doing a "make check" on Mac OS X with SIP (aka rootless mode) enabled
> it will fail like this:

I don't know whether this fix is any good, but thanks for working on
the problem.  It's a very annoying problem, and I'd love to see it
solved somehow.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-19 Thread Robert Haas
On Fri, Jan 19, 2018 at 10:00 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Fri, Jan 19, 2018 at 9:45 AM, Tom Lane  wrote:
>>> Well, we could say that the properties of template1 and postgres
>>> are only restored if you use --clean.
>
>> True.  Would that be a POLA violation, do you think?
>
> It seems a bit non-orthogonal.  Also, while your point that people
> expect "merge" behavior from pg_dump is certainly true, I'm not
> convinced that anybody would be relying on that for pg_dumpall.

What I expect typically happens is that someone restores into an
existing cluster and then realizes their mistake and goes and removes
all of the extra stuff that got created.  This is a bit like when you
run tar -zxf expecting it to create a subdirectory, but it turns out
to extract in the current directory instead.  You then curse your fate
and remove all of the files it created.  It's annoying, but not that
bad.  If it nuked the contents of the current directory first,
though, you'd be much sadder.  If somebody changed tar to have that
behavior, people would probably *eventually* become aware of the risk
and adjust their behavior to avoid catastrophe, but it would probably
take 1-2 disasters per individual before they got used to the new way,
and that's a lot of disasters considering how many people use tar.
Or, maybe people wouldn't get used to it and they'd just go after
whoever made the change with pitchforks.  Anyway, that kind of thing
seems like a danger here.

The other possibility that comes to mind is that somebody might be
doing this is working around a failure of something like CREATE
LANGUAGE.  Say the name of the .so has changed, so CREATE LANGUAGE
will fail or do the wrong thing.  Instead of editing the dump, they
pre-install the language with the correct definition and then restore
over it, ignoring errors.  I guess that would only work with pg_dump,
not pg_dumpall, unless the database that had the problem was one of
the pre-created ones.  Still, it's worth thinking over these kinds of
scenarios, I think.  People do a lot of ugly things in the real world
that we as developers would never do, mostly to work around the
problems we fail to foresee.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-19 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jan 19, 2018 at 9:45 AM, Tom Lane  wrote:
>> Well, we could say that the properties of template1 and postgres
>> are only restored if you use --clean.

> True.  Would that be a POLA violation, do you think?

It seems a bit non-orthogonal.  Also, while your point that people
expect "merge" behavior from pg_dump is certainly true, I'm not
convinced that anybody would be relying on that for pg_dumpall.

regards, tom lane



[PATCH] make check with Apple's SIP enabled

2018-01-19 Thread Jörg Westheide

Hi!

When doing a "make check" on Mac OS X with SIP (aka rootless mode) 
enabled it will fail like this:


- >8 snip 8< -
rm -rf ./testtablespace
mkdir ./testtablespace
PATH="/Volumes/Home/arbeit/port25/postgresql-clean/tmp_install/usr/local/pgsql/bin:$PATH" 
DYLD_LIBRARY_PATH="//tmp_install/usr/local/pgsql/lib" 
../../../src/test/regress/pg_regress --temp-instance=./tmp_check 
--inputdir=. --bindir= --dlpath=. --max-concurrent-tests=20 
--schedule=./parallel_schedule

== creating temporary instance==
== initializing database system   ==
== starting postmaster==
sh: line 1: 81758 Trace/BPT trap: 5   "psql" -X postgres < 
/dev/null 2> /dev/null
sh: line 1: 81762 Trace/BPT trap: 5   "psql" -X postgres < 
/dev/null 2> /dev/null
sh: line 1: 81765 Trace/BPT trap: 5   "psql" -X postgres < 
/dev/null 2> /dev/null
sh: line 1: 81768 Trace/BPT trap: 5   "psql" -X postgres < 
/dev/null 2> /dev/null
sh: line 1: 81771 Trace/BPT trap: 5   "psql" -X postgres < 
/dev/null 2> /dev/null

^Cmake[1]: *** [check] Interrupt: 2
make: *** [check] Interrupt: 2
- >8 snap 8< -

The problem is that psql links to a libpq which it cannot find (at 
least not the one from the postgres you're building). The usual 
approach of setting an environment variable pointing to the correct 
location (DYLD_LIBRARY_PATH on Mac OS X/darwin, see above) does not 
work, since Apple's SIP prevents it from being passed to psql (see 
https://developer.apple.com/library/content/documentation/Security/Conceptual/System_Integrity_Protection_Guide/RuntimeProtections/RuntimeProtections.html 
)


What I do in the attached patch is change the install name of libpq 
in the psql binary of the temp install (used by "make check") to point 
to the libpq of the temp install, so the (correct) lib is found.
To avoid duplicating the information, I created a new file 
"Makefile.libdefs" into which I extracted the information needed to do 
the install name change.


This patch has only been tested on Mac OS X El Capitan (10.11.6) since 
I currently have no other OS available


Please let me know what you think of it

Jörg




SIP.v1.patch
Description: Binary data


Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-19 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jan 18, 2018 at 6:35 PM, Tom Lane  wrote:
>> If we did it like that, the rationale for an actual --set-db-properties
>> switch would vanish, at least so far as pg_dumpall is concerned -- we
>> could just make all that behavior an integral part of --create.  And
>> this wouldn't need to be conditional on getting ALTER DATABASE
>> CURRENT_DATABASE done.

> Unfortunately, I have a feeling that --set-db-properties might not be
> the only thing that would vanish.  I think users are accustomed by now
> to the idea that if you restore into an existing database, the
> existing contents are preserved and the new stuff from the dump is
> added (possibly with some errors and messiness).  With this design,
> the existing database contents will instead vanish, and that is
> probably going to make somebody unhappy.

Well, we could say that the properties of template1 and postgres
are only restored if you use --clean.

regards, tom lane



Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2018-01-19 Thread Robert Haas
On Fri, Jan 19, 2018 at 4:56 AM, Yoshimi Ichiyanagi
 wrote:
>>Was the only non-default configuration setting wal_sync_method?  i.e.
>>synchronous_commit=on?  No change to max_wal_size?
> No, I used the following parameter in postgresql.conf to prevent
> checkpoints from occurring while running the tests.

I think that you really need to include the checkpoints in the tests.
I would suggest setting max_wal_size and/or checkpoint_timeout so that
you reliably complete 2 checkpoints in a 30-minute test, and then do a
comparison on that basis.

> Do you know any good WAL I/O intensive benchmarks? DBT2?

pgbench is quite a WAL-intensive benchmark; it is much more
write-heavy than what most systems experience in real life, at least
in my experience.  Your comparison of DAX FS to DAX FS + PMDK is very
interesting, but in real life the bandwidth of DAX FS is already so
high -- and the latency so low -- that I think most real-world
workloads won't gain very much.  At least, that is my impression based
on internal testing EnterpriseDB did a few months back.  (Thanks to
Mithun and Kuntal for that work.)

That's not necessarily an argument against this patch, which by the
way I have not reviewed.  Even a 5% speedup on this kind of workload
is potentially worthwhile; everyone likes it when things go faster.
I'm just not convinced you can get very much more than that on a
realistic workload.  Of course, I might be wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions

2018-01-19 Thread Simon Riggs
On 26 December 2017 at 14:21, Nikhil Sontakke  wrote:

> With logical decoding, there might arise a case that such a row, if it
> belongs to a system catalog table or even a user catalog table
> (https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=66abc2608c7c00fcd449e00a9e23f13f02e65d04),
> then it might be being used for decoding at the very same moment that
> the abort came in. If the row is re-claimed immediately, then it's
> possible that the decoding happening alongside might face issues.

If we want to make this change, I'd like to see something that
explains exactly how the problem in logical decoding occurs, or
preferably a test case. I can't understand the problem by reading the
archives. I'm not suggesting this patch doesn't work, I'm thinking
about whether there are other ways.

Surely if you are decoding a transaction and a concurrent abort is
requested then decoding should be interrupted at the next sensible
point. Allowing the two actions to occur without interlock is an
issue, so I suggest we just don't do it, rather than allow it and fix
subsequent breakage, which is what I understand this patch to do.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2018-01-19 Thread Tomas Vondra
Attached is v5, fixing a silly bug in part 0006, causing segfault when
creating a subscription.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-Introduce-logical_work_mem-to-limit-ReorderBuffer-v5.patch.gz
Description: application/gzip


0002-Issue-XLOG_XACT_ASSIGNMENT-with-wal_level-logical-v5.patch.gz
Description: application/gzip


0003-Issue-individual-invalidations-with-wal_level-log-v5.patch.gz
Description: application/gzip


0004-Extend-the-output-plugin-API-with-stream-methods-v5.patch.gz
Description: application/gzip


0005-Implement-streaming-mode-in-ReorderBuffer-v5.patch.gz
Description: application/gzip


0006-Add-support-for-streaming-to-built-in-replication-v5.patch.gz
Description: application/gzip


0007-Track-statistics-for-streaming-spilling-v5.patch.gz
Description: application/gzip


0008-BUGFIX-make-sure-subxact-is-marked-as-is_known_as-v5.patch.gz
Description: application/gzip


0009-BUGFIX-set-final_lsn-for-subxacts-before-cleanup-v5.patch.gz
Description: application/gzip


Re: Add default role 'pg_access_server_files'

2018-01-19 Thread Stephen Frost
Michael, all,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Thu, Jan 18, 2018 at 02:04:45PM +, Ryan Murphy wrote:
> > I had not tried this before with my unpatched build of postgres.  (In
> > retrospect of course I should have).  I expected my superuser to be
> > able to perform this task, but it seems that for security reasons we
> > presently don't allow access to absolute path names (except in the
> > data dir and log dir) - not even for a superuser.  Is that correct?
> 
> Correct.

That's how it currently is, yes, though that doesn't really prevent a
superuser from accessing files outside of the data dir; they would just
have to use another mechanism than this to do so (but it's not hard).

> > In that case the security implications of this patch would need more
> > consideration. 
> > 
> > Stephen, looking at the patch, I see that in
> > src/backend/utils/adt/genfile.c you've made some changes to the
> > function convert_and_check_filename().  These changes, I believe,
> > loosen the security checks in ways other than just adding the
> > granularity of a new role which can be GRANTed to non superusers: 
> > 
> > +   /*
> > +* Members of the 'pg_access_server_files' role are allowed to 
> > access any
> > +* files on the server as the PG user, so no need to do any further 
> > checks
> > +* here.
> > +*/
> > +   if (is_member_of_role(GetUserId(), 
> > DEFAULT_ROLE_ACCESS_SERVER_FILES))
> > +   return filename;
> > +
> > +   /* User isn't a member of the default role, so check if it's 
> > allowable */
> > if (is_absolute_path(filename))
> > {
> 
> It seems to me that this is the intention behind the patch as the
> comment points out and as Stephen has stated in
> https://www.postgresql.org/message-id/20171231191939.gr2...@tamriel.snowman.net.

Yes, this change is intentional.  Note that superusers are members of
all roles explicitly (see the check in is_member_of_role()).

> > As you can see, you've added a short-circuit "return" statement for
> > anyone who has the DEFAULT_ROLE_ACCESS_SERVER_FILES.  Prior to this
> > patch, even a Superuser would be subject to the following
> > is_absolute_path() logic.  But with it, the return statement
> > short-circuits the is_absolute_path() check.
> 
> I agree that it is a strange concept to loosen the access while even
> superusers cannot do that. By concept superusers are assumed to be able
> to do anything on the server by the way.

As best as I can tell, the checks in these functions weren't because of
security concerns but simply because the original justification for them
was to be able to read files in the data directory and so they were
written specifically for that purpose.  There's no such check in
lo_import(), for example, and it is, as Michael says, assumed that
superusers are equivalent to having full access to the server as the
user the database is running as.

This patch still needs updating for Magnus' comments, of course, and
I'm still planning to make that happen, so Waiting on Author is the
right status currently.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] Refactor handling of database attributes between pg_dump and pg_dumpall

2018-01-19 Thread Robert Haas
On Thu, Jan 18, 2018 at 6:35 PM, Tom Lane  wrote:
> If we did it like that, the rationale for an actual --set-db-properties
> switch would vanish, at least so far as pg_dumpall is concerned -- we
> could just make all that behavior an integral part of --create.  And
> this wouldn't need to be conditional on getting ALTER DATABASE
> CURRENT_DATABASE done.

Unfortunately, I have a feeling that --set-db-properties might not be
the only thing that would vanish.  I think users are accustomed by now
to the idea that if you restore into an existing database, the
existing contents are preserved and the new stuff from the dump is
added (possibly with some errors and messiness).  With this design,
the existing database contents will instead vanish, and that is
probably going to make somebody unhappy.

I agree with you that making ALTER DATABASE accept CURRENT_DATABASE is
a good idea.  I don't have a great idea what to do about the SET
TABLESPACE problem.  It's always seemed to me to be sort of weird that
you have to have a database in order to create, drop, etc. another
database, or even get a list of databases that exist.  As a new user,
I remember being quite frustrated that connecting to a database that
didn't exist gave no hint of how to find out what databases did exist.
Eventually I discovered psql -l and all was well, but I had no idea
how it worked under the hood.  Even though I now understand the
architectural considerations that have gotten us to where we are, I
still think it would be more intuitive to users if there were a
command-shell unrelated to any database that would let you perform
operations on databases.  I realize, however, that this patch isn't
going to create such a thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Local indexes for partitioned table

2018-01-19 Thread Alvaro Herrera
Hi Jesper

Jesper Pedersen wrote:

> I get
> 
> pg_restore: creating INDEX "pg_catalog.pg_authid_oid_index"
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 5493; 1259 2677 INDEX
> pg_authid_oid_index jpedersen
> pg_restore: [archiver (db)] could not execute query: ERROR:  permission
> denied: "pg_authid" is a system catalog
> Command was:
> -- For binary upgrade, must preserve pg_class oids
> SELECT 
> pg_catalog.binary_upgrade_set_next_index_pg_class_oid('2677'::pg_catalog.oid);
> 
> CREATE UNIQUE INDEX "pg_authid_oid_index" ON "pg_authid" USING "btree"
> ("oid");
> 
> during check-world.

Wow, thanks for reporting.  That's freaking weird ...  Maybe I need to
use a more restrictive condition.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH][PROPOSAL] Refuse setting toast.* reloptions when TOAST table does not exist

2018-01-19 Thread Nikolay Shaplov
In a message of January 18, 2018 18:42:01, Tom Lane wrote:
> >> This patch raises error if user tries o set or change toast.* option for
> >> a
> >> table that does not have a TOST relation.
> > 
> > I think there is a problem with this idea, which is that the rules for
> > whether or not a given table has an associated TOAST table
> > occasionally change from one major release to the next.  Suppose that,
> > in release X, a particular table definition causes a TOAST table to be
> > created, but in release X+1, it does not.  If a table with that
> > definition has a toast.* option set, then upgrading from release X to
> > release X+1 via pg_dump and restore will fail.  That's bad.
> 
> Yeah --- and it doesn't even have to be a major version change; the
> same version on different hardware might make different choices too.
> Not to mention that there is discussion out there about making the
> toasting rules more configurable.
> 
> There might be a case for raising a warning in this situation,
> but I would disagree with making it a hard error in any case.
> All that's going to accomplish is to break people's scripts and
> get them mad at you.

This all sounds very reasonable, but it still does not solve the problem of 
losing toast option values when you set one for a table without a TOAST 
relation.

Maybe, if the rules for when a TOAST relation exists are not such a fixed 
thing (I thought they were almost as fixed as a rock), the solution would be 
to create a TOAST relation anyway if a toast option is set, and give a 
warning that setting this option may not be what the user really wants?
This way we would not lose the option values. And it would be 100% backward 
compatible.

The main misfeature here is that we would have an empty TOAST relation in 
this case. But first, I do not think it will really harm anybody, and 
second, we would WARN that it is not the best way to do things, so the DB 
user will be able to find a way around it.

What do you think about it?

-- 
Do code for fun.

signature.asc
Description: This is a digitally signed message part.


Re: [PATCH] session_replication_role = replica with TRUNCATE

2018-01-19 Thread Marco Nenciarini
Hi Peter,

On 18/01/18 17:30, Peter Eisentraut wrote:
> On 1/17/18 11:33, Petr Jelinek wrote:
>>> P.S: I'm struggling to understand why we have two possible values of
>>> session_replication_role settings that behave identically (origin and
>>> local). I'm unable to see any difference according to the code or the
>>> documentation, so I'm wondering if we should document that they are the
>>> same.
>>>
>> It's for use by 3rd party tools (afaik both londiste and slony use it)
>> to differentiate between replicated and not replicated changes.
> 
> I have committed some documentation updates and tests to cover this a
> bit better.
> 

Thanks, the documentation is a lot clearer now.

This superseded the documentation change that was in the patch, so I've
removed it from the v3 version.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f2a928b823..180ebd0717 100644
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***
*** 1340,1355  ExecuteTruncate(TruncateStmt *stmt)
}
  
/*
!* Check foreign key references.  In CASCADE mode, this should be
!* unnecessary since we just pulled in all the references; but as a
!* cross-check, do it anyway if in an Assert-enabled build.
 */
  #ifdef USE_ASSERT_CHECKING
-   heap_truncate_check_FKs(rels, false);
- #else
-   if (stmt->behavior == DROP_RESTRICT)
heap_truncate_check_FKs(rels, false);
  #endif
  
/*
 * If we are asked to restart sequences, find all the sequences, lock 
them
--- 1340,1363 
}
  
/*
!* Suppress foreign key references check if session replication role is
!* set to REPLICA.
 */
+   if (SessionReplicationRole != SESSION_REPLICATION_ROLE_REPLICA)
+   {
+ 
+   /*
+* Check foreign key references.  In CASCADE mode, this should 
be
+* unnecessary since we just pulled in all the references; but 
as a
+* cross-check, do it anyway if in an Assert-enabled build.
+*/
  #ifdef USE_ASSERT_CHECKING
heap_truncate_check_FKs(rels, false);
+ #else
+   if (stmt->behavior == DROP_RESTRICT)
+   heap_truncate_check_FKs(rels, false);
  #endif
+   }
  
/*
 * If we are asked to restart sequences, find all the sequences, lock 
them


signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Transaction control in procedures

2018-01-19 Thread Simon Riggs
On 16 January 2018 at 20:24, Andrew Dunstan
 wrote:

> Looks good. Marking ready for committer.

Few questions/points for the docs.

Docs say: "A new transaction is started automatically after a
transaction is ended using these commands"
Presumably this would have exactly the same isolation level and other
transaction characteristics?
(Is it somehow possible to vary that? OK if not, no problem.)

The error "cannot commit while a subtransaction is active"
is commented as intending to prevent COMMIT/ROLLBACK inside an EXCEPTION block.
That makes sense. It seems it will also prevent SAVEPOINTs, though
that seems not to be intended.
The two cases are dissimilar and it would be possible to block the
former but allow the latter.

It's not documented or tested that SET LOCAL would work or not work.
Does it work?

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Transaction control in procedures

2018-01-19 Thread Simon Riggs
On 6 December 2017 at 22:34, Merlin Moncure  wrote:
> On Wed, Dec 6, 2017 at 8:41 AM, Peter Eisentraut
>  wrote:
>> On 12/5/17 13:33, Robert Haas wrote:
>>> On Tue, Dec 5, 2017 at 1:25 PM, Peter Eisentraut
>>>  wrote:
 I think ROLLBACK in a cursor loop might not make sense, because the
 cursor query itself could have side effects, so a rollback would have to
 roll back the entire loop.  That might need more refined analysis before
 it could be allowed.
>>>
>>> COMMIT really has the same problem; if the cursor query has side
>>> effects, you can't commit those side effects piecemeal as the loop
>>> executed and have things behave sanely.
>>
>> The first COMMIT inside the loop would commit the cursor query.  This
>> isn't all that different from what you'd get now if you coded this
>> manually using holdable cursors or just plain client code.  Clearly, you
>> can create a mess if the loop body interacts with the loop expression,
>> but that's already the case.
>>
>> But if you coded something like this yourself now and ran a ROLLBACK
>> inside the loop, the holdable cursor would disappear (unless previously
>> committed), so you couldn't proceed with the loop.
>>
>> The SQL standard for persistent stored modules explicitly prohibits
>> COMMIT and ROLLBACK in cursor loop bodies.  But I think people will
>> eventually want it.
>
> They may want it, but silently promoting all cursors to held ones is
> not the way to give it to them, unless we narrow it down to the
> 'for-loop derived cursor' only.

I don't think we should do that automatically for all cursors, but it
seems clear that we would want that iff the loop contains COMMIT or
ROLLBACK.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Secondary index access optimizations

2018-01-19 Thread Antonin Houska
Konstantin Knizhnik  wrote:

> On 11.01.2018 12:34, Antonin Houska wrote:

> > Konstantin Knizhnik  wrote:

> > I haven't thought that much about details, so just one comment: you 
> > shouldn't
> > need the conversion to text and back to binary form. utils/adt/rangetypes.c
> > contains constructors that accept the binary values.
> >
> Attached please find new version of the patch which uses range type to
> interval matching in operator_predicate_proof function.

I think that instead of checking the operator name, e.g.

strcmp(oprname, "<")

you should test the operator B-tree strategy: BTLessStrategyNumber,
BTLessEqualStrategyNumber, etc. The operator string alone does not tell enough
about the operator semantics.

The strategy can be found in the pg_amop catalog.
get_op_btree_interpretation() function may be useful, but there may be
something better in utils/cache/lsyscache.c.
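
For what it's worth, the strategy number can also be inspected from SQL with
a throwaway query like this (not part of the patch, just an illustration):

SELECT opf.opfname, amop.amopstrategy
FROM pg_amop amop
JOIN pg_opfamily opf ON opf.oid = amop.amopfamily
JOIN pg_am am ON am.oid = amop.amopmethod
WHERE amop.amopopr = '<(integer,integer)'::regoperator
  AND am.amname = 'btree';
-- returns opfname = integer_ops, amopstrategy = 1 (BTLessStrategyNumber)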

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com



Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2018-01-19 Thread Robert Haas
On Thu, Jan 18, 2018 at 2:05 PM, Peter Geoghegan  wrote:
> Review of your patch:
>
> * SerializedReindexState could use some comments. At least a one liner
> stating its basic purpose.

Added a comment.

> * The "System index reindexing support" comment block could do with a
> passing acknowledgement of the fact that this is serialized for
> parallel workers.

Done.

> * Maybe the "Serialize reindex state" comment within
> InitializeParallelDSM() should instead say something like "Serialize
> indexes-pending-reindex state".

That would require corresponding changes in a bunch of other places,
possibly including the function names.  I think it's better to keep
the function names shorter and the comments matching the function
names, so I did not make this change.

> Other than that, looks good to me. It's a simple patch with a clear purpose.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: MCV lists for highly skewed distributions

2018-01-19 Thread John Naylor
>> [1] Occasionally it will store a much longer MCV list, because no value
>> was sampled exactly once, which triggers a different code path in which
>> all seen values are put in the MCV list and the histogram is NULL.  This
>> is not reliable, as whether the least-sampled value is present in the
>> sample once or twice is pretty brittle.
>
> And we need a better discussion of risk: before, we generated too few
> MCV entries. To what extent might we now generate too many? That
> would be a problem in increased planning time.

(My apologies, I just now found some time to test this further, but I
don't have additional results yet)

Simon,
Earlier, I referenced a thread [1] complaining that we currently
already have too many MCVs in the case of uniform distributions, with
worse consequences than planning time. Based on my (admittedly quick
and dirty) preliminary testing (see attachment from a couple weeks
ago), this patch exacerbates that problem, and I was hoping to find a
way to fix that.

> I have a slight reservation about whether 1.25x is still a sensible
> heuristic.

This was also discussed in [1], but no patch came out of it. I was
just now turning the formulas discussed there into code, but I'll
defer to someone with more expertise. FWIW, I suspect that a solution
that doesn't take into account a metric like coefficient of variation
will have the wrong behavior sometimes, whether for highly uniform or
highly non-uniform distributions.

[1] 
https://www.postgresql.org/message-id/flat/32261.1496611829%40sss.pgh.pa.us#32261.1496611...@sss.pgh.pa.us
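
For anyone wanting to reproduce the uniform-distribution case quickly,
something like this shows how many MCVs end up being stored (table and
column names are made up):

CREATE TABLE uniform_t AS
    SELECT (i % 1000) AS val FROM generate_series(1, 1000000) i;
ANALYZE uniform_t;
SELECT array_length(most_common_vals::text::text[], 1) AS n_mcvs
FROM pg_stats
WHERE tablename = 'uniform_t' AND attname = 'val';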

-John Naylor



Re: Package version in PG_VERSION and version()

2018-01-19 Thread Christoph Berg
Re: Peter Eisentraut 2018-01-17 

> On 1/17/18 10:14, Christoph Berg wrote:
> > The difference is that when parsing version() (which is all my variant
> > is changing), people already have to deal with extra stuff at the end
> > (gcc version), while that would be new for "psql --version".
> 
> For me, having the package identifier in "psql --version" specifically
> would actually be the biggest win in this, because that's where a lot of
> issues arise (e.g., wrong libedit/readline library, different
> compiled-in socket location).

I enabled it for 10 and 11 now. With the arguably very long 11devel
snapshot build version number, it now looks like this:

$ psql --version
psql (PostgreSQL) 11devel (Debian 
11~~devel~20180119.1047-1~329.git4e54dd2.pgdg+1)

$ /usr/lib/postgresql/10/bin/psql --version
psql (PostgreSQL) 10.1 (Debian 10.1-3.pgdg+1)

$ psql
psql (11devel (Debian 11~~devel~20180119.1047-1~329.git4e54dd2.pgdg+1), Server 
10.1 (Debian 10.1-3.pgdg+1))

I'll report back if I see anything exploding.

The CF item is closed, thanks!

Christoph
-- 
Senior Berater, Tel.: +49 2166 9901 187
credativ GmbH, HRB Mönchengladbach 12080, USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer
pgp fingerprint: 5C48 FE61 57F4 9179 5970  87C6 4C5A 6BAB 12D2 A7AE



Re: [PATCH] Logical decoding of TRUNCATE

2018-01-19 Thread Marco Nenciarini
Hi All,

On 18/01/18 17:48, Simon Riggs wrote:
> On 17 January 2018 at 17:07, Petr Jelinek  
> wrote:
> 
>> Things I am less convinced about:
>>
>> The patch will cascade truncation on downstream if cascade was specified
>> on the upstream, that can potentially be dangerous and we either should
>> not do it and only truncate the tables which were truncated upstream
>> (but without restricting because of FKs), leaving the data inconsistent
>> on downstream (like we do already with DELETE or UPDATE). Or maybe make
>> it into either a subscription or publication option so that the user can choose
>> the behaviour here as I am sure some people will want it to cascade (but
>> the default should still IMHO be to not cascade as that's safer).
> 
> I agree the default should be to NOT cascade.
> 
> If someone wants cascading as a publication option, that can be added later.
> 

I agree that not replicating the CASCADE option is the best choice,
according to the POLA principle.

>>> + /* logicalrep_rel_close call not needed, because ExecuteTruncateGuts
>>> +  * already closes the relations. Setting localrel to NULL in the map 
>>> entry
>>> +  * is still needed.
>>> +  */
>>> + rel->localrel = NULL;
>>
>> This is somewhat ugly. Perhaps the ExecuteTruncateGuts should track
>> which relations it opened and only close those and the rest should be
>> closed by caller? That should also remove the other ugly part which is
>> that the ExecuteTruncateGuts modifies the input list. What if caller
>> wanted to use those relations it sent as parameter later?
> 
> Agreed
> 

Attached a new version of the patch addressing these issues.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f2a928b823..180ebd0717 100644
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***
*** 1340,1355  ExecuteTruncate(TruncateStmt *stmt)
}
  
/*
!* Check foreign key references.  In CASCADE mode, this should be
!* unnecessary since we just pulled in all the references; but as a
!* cross-check, do it anyway if in an Assert-enabled build.
 */
  #ifdef USE_ASSERT_CHECKING
-   heap_truncate_check_FKs(rels, false);
- #else
-   if (stmt->behavior == DROP_RESTRICT)
heap_truncate_check_FKs(rels, false);
  #endif
  
/*
 * If we are asked to restart sequences, find all the sequences, lock 
them
--- 1340,1363 
}
  
/*
!* Suppress foreign key references check if session replication role is
!* set to REPLICA.
 */
+   if (SessionReplicationRole != SESSION_REPLICATION_ROLE_REPLICA)
+   {
+ 
+   /*
+* Check foreign key references.  In CASCADE mode, this should 
be
+* unnecessary since we just pulled in all the references; but 
as a
+* cross-check, do it anyway if in an Assert-enabled build.
+*/
  #ifdef USE_ASSERT_CHECKING
heap_truncate_check_FKs(rels, false);
+ #else
+   if (stmt->behavior == DROP_RESTRICT)
+   heap_truncate_check_FKs(rels, false);
  #endif
+   }
  
/*
 * If we are asked to restart sequences, find all the sequences, lock 
them
diff --git a/contrib/test_decoding/expected/ddl.out 
b/contrib/test_decoding/expected/ddl.out
index 1e22c1eefc..d1feea4909 100644
*** a/contrib/test_decoding/expected/ddl.out
--- b/contrib/test_decoding/expected/ddl.out
***
*** 543,548  UPDATE table_with_unique_not_null SET data = 3 WHERE data = 2;
--- 543,550 
  UPDATE table_with_unique_not_null SET id = -id;
  UPDATE table_with_unique_not_null SET id = -id;
  DELETE FROM table_with_unique_not_null WHERE data = 3;
+ TRUNCATE table_with_unique_not_null;
+ TRUNCATE table_with_unique_not_null, table_with_unique_not_null RESTART 
IDENTITY CASCADE;
  -- check toast support
  BEGIN;
  CREATE SEQUENCE toasttable_rand_seq START 79 INCREMENT 1499; -- portable 
"random"
***
*** 660,665  SELECT data FROM 
pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
--- 662,673 
   table public.table_with_unique_not_null: DELETE: id[integer]:4
   COMMIT
   BEGIN
+  table public.table_with_unique_not_null: TRUNCATE: (no-flags)
+  COMMIT
+  BEGIN
+  table public.table_with_unique_not_null: TRUNCATE: restart_seqs cascade
+  COMMIT
+  BEGIN
   table public.toasttable: INSERT: id[integer]:1 

Re: Bug in Physical Replication Slots (at least 9.5)?

2018-01-19 Thread Greg Stark
On 19 January 2017 at 09:37, Kyotaro HORIGUCHI
 wrote:
>
> Though I haven't looked closely at how a modification is split
> into WAL records. A tuple cannot be so long. As a simple test, I
> observed rechdr->xl_tot_len at the end of XLogRecordAssemble while
> inserting a roughly 400KB not-so-compressible string into a text
> column, but I saw a series of many records, each shorter than
> several thousand bytes.

I think the case to check is a commit record with many thousands of
subtransactions. I'm not sure you can fill a whole segment though.
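
A rough way to generate such a commit record, if someone wants to try (the
table is a throwaway; each exception block that does work becomes a committed
subtransaction, and filling a 16MB segment with 4-byte subxids would need on
the order of 4 million of them):

CREATE TABLE subxact_test (i int);

DO $$
BEGIN
    FOR i IN 1..100000 LOOP
        BEGIN
            INSERT INTO subxact_test VALUES (i);
        EXCEPTION WHEN OTHERS THEN
            NULL;
        END;
    END LOOP;
END $$;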


-- 
greg



Re: [HACKERS] Useless code in ExecInitModifyTable

2018-01-19 Thread Amit Langote
On 2018/01/19 18:50, Amit Khandekar wrote:
> FYI ...
> 
> For the pending update-partition-key patch, we would again require the
> rel variable for UPDATE. So in the rebased update-partition-key patch
> [1], 'rel' is assigned the root partitioned table. But this time, I
> have used the already opened node->rootResultRelInfo to get the root
> partitioned table, rather than opening it again. Details : [1] . Sorry
> for not noticing this potential conflict earlier. Comments welcome.
> 
> [1] : 
> https://www.postgresql.org/message-id/CAJ3gD9cpyM1L0vTrXzrggR%3Dt6MSZtuy_kge1kagMBi0TSKa_UQ%40mail.gmail.com

That's nice.  Actually, the rootResultRelInfo field was introduced [1]
after partitioned_rels [2], so the code that got removed with the
committed patch could have gone much earlier, that is, when
rootResultRelInfo was introduced.  But then again, as Fujita-san pointed
out, there wasn't much point then (or now) in finding the root table's
Relation pointer in some special place.  Now we do need to, though, for
the update tuple routing case, as you said.

Thanks,
Amit

[1]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=e180c8aa8ca

[2]
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d3cc37f1d801




Re: Rangejoin rebased

2018-01-19 Thread Simon Riggs
On 19 January 2018 at 08:25, Simon Riggs  wrote:
> On 17 January 2018 at 05:49, Jeff Davis  wrote:
>> On Wed, Jan 10, 2018 at 7:49 AM, Simon Riggs  wrote:
>>> Do we optimize for TIMESTAMP <@ RANGE as well?
>>
>> Not currently. It requires a little extra complexity because empty
>> ranges will match anything and need special handling.

err... that isn't correct. An empty range matches nothing, so can be
ignored in joins.

So probably best to explain some more, please.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Useless code in ExecInitModifyTable

2018-01-19 Thread Amit Khandekar
FYI ...

For the pending update-partition-key patch, we would again require the
rel variable for UPDATE. So in the rebased update-partition-key patch
[1], 'rel' is assigned the root partitioned table. But this time, I
have used the already opened node->rootResultRelInfo to get the root
partitioned table, rather than opening it again. Details : [1] . Sorry
for not noticing this potential conflict earlier. Comments welcome.

[1] : 
https://www.postgresql.org/message-id/CAJ3gD9cpyM1L0vTrXzrggR%3Dt6MSZtuy_kge1kagMBi0TSKa_UQ%40mail.gmail.com

Thanks
-Amit Khandekar

On 18 January 2018 at 12:48, Etsuro Fujita  wrote:
> (2018/01/18 4:46), Tom Lane wrote:
>>
>> Pushed.  I think the long delay on this is really my fault for having
>> raised an incorrect objection initially --- apologies for that.
>
>
> Thanks for committing!
>
> Best regards,
> Etsuro Fujita
>



Re: [HACKERS] [PATCH] Improve geometric types

2018-01-19 Thread Emre Hasegeli
> I'm fine with the current shape. Thanks. Maybe the same
> discussion applies to polygons. (cf. poly_overlap)

It indeed does.  I am incorporating.

>  line_closept_point asserts line_interpt_line returns true but it
>  is hazardous. It is safer if line_closept_point returns NaN if
>  line_interpt_line returns false.

Yes, it makes more sense.  I am changing it.

>> >  I understand that we don't need to consider NAN is equal to NAN.
>> >  circle_same, point_eq_point, line_eq no longer needs such
>> >  change?
>>
>> I thought it would be better to consider NaNs equal only on the
>> equality operators.  That is why only they are changed that way.
>
> I'm convinced by that.

Great.  I am documenting this decision better in the patches.



Re: [Sender Address Forgery]Re: pg_(total_)relation_size and partitioned tables

2018-01-19 Thread Amit Langote
Thanks for taking a look.

On 2018/01/19 14:39, Michael Paquier wrote:
> On Thu, Jan 18, 2018 at 06:54:18PM +0900, Amit Langote wrote:
>> I think having pg_partition_root() and pg_partition_parent() will give
>> users enough to get useful views as follows:
> 
> So... pg_partition_root() gives you access to the highest relation in
> the hierarchy, and pg_partition_parent() gives you access to the direct
> parent.

Right.

>> drop table p;
>> create table p (a int) partition by list (a);
>> create table p123 partition of p for values in (1, 2, 3) partition by list
> (a);
>> create table p12 partition of p1 for values in (1, 2) partition by list (a);
>> create table p12 partition of p123 for values in (1, 2) partition by list 
>> (a);
>> create table p1 partition of p12 for values in (1);
>> create table p2 partition of p12 for values in (2);
>> create table p3 partition of p123 for values in (3);
> 
> You need to reorder those queries, the creation of the first p12 would
> fail as p1 does not exist at this point.

Oops.  I had copy-pasted the above commands from psql's \s output and
ended up copying a command I didn't intend to.  Here it is again,
without the mistake I made in my last email:

drop table p;
create table p (a int) partition by list (a);
create table p123 partition of p for values in (1, 2, 3) partition by list
(a);
create table p12 partition of p123 for values in (1, 2) partition by list (a);
create table p1 partition of p12 for values in (1);
create table p2 partition of p12 for values in (2);
create table p3 partition of p123 for values in (3);

> Wouldn't also a
> pg_partition_tree() be useful? You could shape it as a function which
> returns all regclass partitions in the tree as unique entries. Combined
> with pg_partition_parent() it can be powerful as it returns NULL for the
> partition at the top of the tree. So I think that we could live without
> pg_partition_root(). At the end, let's design something which makes
> unnecessary the use of WITH RECURSIVE when looking at a full partition
> tree to ease the user's life.

Do you mean pg_partition_tree(regclass), that returns all partitions in
the partition tree whose root is passed as the parameter?

Perhaps something like the following (roughly implemented in the attached)?

select  pg_partition_root(p) as root_parent,
pg_partition_parent(p) as parent,
p as relname,
pg_total_relation_size(p) as size
from    pg_partition_tree_tables('p') p
order by 4;
 root_parent | parent | relname |  size
-++-+-
 p   || p   |   0
 p   | p  | p123|   0
 p   | p123   | p12 |   0
 p   | p123   | p3  | 3653632
 p   | p12| p1  | 3653632
 p   | p12| p2  | 3653632
(6 rows)
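
For comparison, a sketch of the WITH RECURSIVE query that such functions
would make unnecessary (using pg_inherits directly on the same example
tables):

WITH RECURSIVE tree AS (
    SELECT 'p'::regclass AS relid
    UNION ALL
    SELECT i.inhrelid::regclass
    FROM pg_inherits i
    JOIN tree t ON i.inhparent = t.relid
)
SELECT relid, pg_total_relation_size(relid) AS size
FROM tree
ORDER BY 2;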

> Documentation, as well as regression tests, would be welcome :)

OK, I will add those things in the next version.

Thanks,
Amit
From 50dfb02bd3ea833d8b18fc5d3d54e863fbc223e4 Mon Sep 17 00:00:00 2001
From: amit 
Date: Tue, 16 Jan 2018 19:02:13 +0900
Subject: [PATCH] Add assorted partition reporting functions

---
 src/backend/catalog/partition.c | 117 +++-
 src/backend/utils/cache/lsyscache.c |  22 +++
 src/include/catalog/partition.h |   1 +
 src/include/catalog/pg_proc.h   |  12 
 src/include/utils/lsyscache.h   |   1 +
 5 files changed, 152 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 8adc4ee977..ac92bbfa71 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -32,6 +32,7 @@
 #include "catalog/pg_type.h"
 #include "commands/tablecmds.h"
 #include "executor/executor.h"
+#include "funcapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
@@ -181,6 +182,7 @@ static int partition_bound_bsearch(PartitionKey key,
 static int get_partition_bound_num_indexes(PartitionBoundInfo b);
 static int get_greatest_modulus(PartitionBoundInfo b);
 static uint64 compute_hash_value(PartitionKey key, Datum *values, bool 
*isnull);
+static Oid get_partition_parent_internal(Oid relid, bool recurse_to_root);
 
 /* SQL-callable function for use in hash partition CHECK constraints */
 PG_FUNCTION_INFO_V1(satisfies_hash_partition);
@@ -1362,7 +1364,7 @@ check_default_allows_bound(Relation parent, Relation 
default_rel,
 /*
  * get_partition_parent
  *
- * Returns inheritance parent of a partition by scanning pg_inherits
+ * Returns inheritance parent of a partition.
  *
  * Note: Because this function assumes that the relation whose OID is passed
  * as an argument will have precisely one parent, it should only be called
@@ -1371,6 +1373,37 @@ check_default_allows_bound(Relation parent, Relation 
default_rel,
 Oid
 get_partition_parent(Oid relid)
 {
+   if (!get_rel_relispartition(relid))
+   return InvalidOid;
+
+ 

Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [Sender Address Forgery]Re: [HACKERS] path toward faster partition pruning

2018-01-19 Thread David Rowley
On 19 January 2018 at 16:00, Kyotaro HORIGUCHI
 wrote:
> And I'd like to ask David to check out his mail environment so
> that SPF record is available for his message.

Will investigate

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Function to move the position of a replication slot

2018-01-19 Thread Magnus Hagander
On Wed, Nov 29, 2017 at 7:48 AM, Michael Paquier 
wrote:

> On Tue, Sep 5, 2017 at 11:51 AM, Andres Freund  wrote:
> > On 2017-09-05 11:36:47 +0900, Michael Paquier wrote:
> >> On Thu, Aug 31, 2017 at 9:19 PM, Magnus Hagander 
> wrote:
> >> > PFA an updated and rebased patch.
> >> >
> >> > Rebased. Now named pg_advance_replication_slot. ERROR on logical
> slots.
> >> > Forward only.
> >> >
> >> > I think that, in the end, covered all the comments?
> >>
> >> +   if (backwards)
> >> +   ereport(WARNING,
> >> +   (errmsg("Not moving replication slot backwards!")));
> >> Shouldn't this be an ERROR, mentioning the current position of the slot?
> >>
> >> +ereport(ERROR,
> >> +(errmsg("Only physical replication slots can be
> advanced.")));
> >> ERRCODE_FEATURE_NOT_SUPPORTED, no?
> >
> > Neither of these seems to follow the message guidelines.
>
> True as well, and the patch did not get an update in two months to
> reflect that. So I am marking it as returned with feedback.
>

For the purpose of the archives: this patch has been superseded by Petr's
work in
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=9c7d06d60680c7f00d931233873dee81fdb311c6
which will be in PostgreSQL 11.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2018-01-19 Thread Michael Paquier
On Fri, Jan 19, 2018 at 10:54:53AM +0900, Kyotaro HORIGUCHI wrote:
> On the other hand if one logical record must be read from single
> source, we need any means to deterrent wal-recycling on the
> primary side. Conducting that within the primary side is rejected
> by consensus.

There is also the approach of extending the message protocol so that
the primary can retain the segments it needs to keep around...

> (There might be a smarter way to do that, though.) To
> do that from the standby side, we would need to delay write feedback or
> add this kind of special feedback. Either way it is
> necessary to introduce WAL-record awareness into the WAL shipping
> process, which seemingly adds large complexity.

Coming to logical slots, don't you need to be aware of the beginning of
the record on the primary to correctly perform decoding of tuples
through time travel? If the record spans segments, it seems
to me that it matters. Andres knows the answer to that for sure, so I
would expect to be counter-argued in the next 12 hours or so.
--
Michael




Re: Typo in slotfuncs.c

2018-01-19 Thread Masahiko Sawada
On Fri, Jan 19, 2018 at 3:39 PM, Simon Riggs  wrote:
> On 19 January 2018 at 05:38, Masahiko Sawada  wrote:
>> Hi,
>>
>> Attached patch for fixing a typo in slotfuncs.c
>>
>> s/possition/position/
>
> Fixed, thanks.
>

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] [PATCH] Improve geometric types

2018-01-19 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 18 Jan 2018 16:01:01 +0100, Emre Hasegeli  wrote in 

Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

2018-01-19 Thread Yuto Hayamizu
On Thu, Nov 9, 2017 at 12:33 PM, Ashutosh Bapat
 wrote:
> different set, within each set the order is same. FWIW, we can order
> all clauses in largest set once and use that order every time. Albeit
> we will have to remember the order somewhere OR make the separator
> routine retain the order in the larger set, which I guess is true
> about all separator functions.

For this patch, sorting of a qual list happens only once for each
range table entry, not for each path. So there is no need for caching
sorted qual lists as far as I know.


regards,
Yuto Hayamizu



Re: Rangejoin rebased

2018-01-19 Thread Simon Riggs
On 17 January 2018 at 05:49, Jeff Davis  wrote:
> On Wed, Jan 10, 2018 at 7:49 AM, Simon Riggs  wrote:
>> Do we optimize for TIMESTAMP <@ RANGE as well?
>
> Not currently. It requires a little extra complexity because empty
> ranges will match anything and need special handling.

TIMESTAMP <@ RANGE is arguably more important than RANGE && RANGE

Trying to cast timestamp to range to make that work is a bit hokey

If the problem is just empty ranges, it seems like we should do that here also.

I'd be happy with the optimization only working if ranges are provably
non-empty, e.g. CHECK (NOT isempty(col)).
Or perhaps we need non-empty range types, e.g. tsrangene.
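
To make the discussion concrete, the kind of query in question looks
something like this (table names are invented; the CHECK constraint is the
non-emptiness proof mentioned above):

CREATE TABLE meter_readings (ts timestamptz, value numeric);
CREATE TABLE billing_periods (period tstzrange CHECK (NOT isempty(period)));

SELECT b.period, sum(r.value)
FROM meter_readings r
JOIN billing_periods b ON r.ts <@ b.period
GROUP BY b.period;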

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: Configurable file mode mask

2018-01-19 Thread Michael Paquier
On Wed, Jan 10, 2018 at 03:19:46PM -0300, Alvaro Herrera wrote:
> David Steele wrote:
> > On 1/8/18 8:58 PM, Peter Eisentraut wrote:
> 
> > > Yeah, I didn't like this aspect when this patch was originally
> > > submitted.  We want to keep the code legible for future new
> > > contributors.  Having these generic-sounding but specific-in-purpose
> > > wrapper functions can be pretty confusing.  Let's use mkdir() when it's
> > > the appropriate function, and let's figure out a different name for
> > > "make a data directory subdirectory in a secure and robust way".
> 
> > How about MakeDirectoryDefaultPerm()?  That's what I'll go with if I
> > don't hear any other ideas.  The single call to MakeDirectoryPerm() will
> > be reverted to mkdir() and I'll remove the function.
> 
> I'd go with MakeDirectory, documenting exactly what it does and why, and
> be done with it.  If your new function satisfies future users, great; if
> not, it can be patched (or not) once we know exactly what these callers
> need.

After going through the thread, I would vote for making things simple by
just using MakeDirectory() and documenting precisely what it does. Anyway,
there is also the approach of using MakeDirectoryDefaultPerm(), so
I'll be fine with the decision you make, David. The patch is moved to
"waiting on author".
--
Michael




Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

2018-01-19 Thread Yuto Hayamizu
On Wed, Nov 8, 2017 at 6:48 AM, Tom Lane  wrote:
> Because (1) up to now there's been no need to consider the qual ordering
> till later, and (2) re-doing that sort for each path seemed unduly
> expensive.  If we're to try to estimate whether later quals will be
> reached, then sure the ordering becomes important.  I'm still concerned
> about (2) though.

This patch changes 'set_baserel_size_estimates', and it is called only once
for each range table entry, so sorting does not happen for each path and
its negative impact on optimizer performance is negligible.

While sorting quals does not cost that much, I noticed another performance
problem with this patch.
The patched 'set_baserel_size_estimates' is O(N^2) (N is the number of quals)
and would take a long time when a query has a large qual list.
After sorting quals, it loops over quals q_1, q_2, ..., q_N to estimate the
"weighted" cost of each qual q_k. In each iteration, in order to calculate
"the weight" of q_k, it calls clauselist_selectivity({q_1, q_2, ..., q_k}),
which loops k times inside the function.

I've checked the actual negative impact on optimization time with the
artificial query:
SELECT count(*) FROM pgbench_accounts WHERE (randomly generated N quals);

N=50: 0.5966 msec (original), 0.938 msec (patched)
N=100: 1.1992 msec (original), 1.5396 msec (patched)
N=200: 1.9167 msec (original),  4.6676 msec (patched)
N=400: 2.7583 msec (original), 12.0786 msec (patched)
N=800: 5.2919 msec (original), 38.0228 msec (patched)
N=1600: 9.3428 msec (original), 167.737 msec (patched)

From this result, the patched version is almost O(N^2).
100 quals in a query seems like an unusually large number, but I think there
may exist some real-world queries that have hundreds or thousands of quals,
so I feel this patch needs improvement to handle large N.
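
In case anyone wants to reproduce the measurements, one way to build such a
query in psql is the following (the exact qual generation used above may
differ; EXPLAIN alone reports the planning time):

SELECT 'EXPLAIN SELECT count(*) FROM pgbench_accounts WHERE '
       || string_agg(format('abalance <> %s', (random() * 100000)::int), ' AND ')
FROM generate_series(1, 200) \gexec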

My idea for improving this patch is to introduce a threshold N_limit:
for q_1 ... q_{N_limit}, do the same weighted cost estimation as in the
current version of this patch.
From q_{N_limit+1} onwards, stop calling clauselist_selectivity for
calculating the weight and reuse the result of
clauselist_selectivity({q_1, q_2, ..., q_{N_limit}}).
For example, if N_limit=100, the additional overhead is only sub-millisecond
per range table entry, and the cost estimation is still surely better than
the current postgres implementation.

Any thoughts?