Re: Index Skip Scan

2019-07-21 Thread David Rowley
On Wed, 17 Jul 2019 at 04:53, Jesper Pedersen
 wrote:
> Here is a patch more in that direction.

Thanks. I've just had a look over this and it's roughly what I have in mind.

Here are the comments I noted down during the review:

cost_index:

I know you've not finished here, but I think it'll need to adjust
tuples_fetched somehow to account for estimate_num_groups() on the
Path's unique keys. Any EquivalenceClass with ec_has_const = true does
not need to be part of the estimate, since there can be at most one
value for it.

For example, in a query such as:

SELECT x,y FROM t WHERE x = 1 GROUP BY x,y;

you only need to perform estimate_num_groups() on "y".

I'm really not quite sure what exactly will be required from
amcostestimate() here.  The cost of the skip scan is not the same as
the normal scan's, so either that API needs to be adjusted to allow
the caller to mention that we want skip scans estimated, or there
needs to be another callback.

build_index_paths:

I don't quite see where you're checking if the query's unique_keys
match what unique keys can be produced by the index.  This is done for
pathkeys with:

pathkeys_possibly_useful = (scantype != ST_BITMAPSCAN &&
                            !found_lower_saop_clause &&
                            has_useful_pathkeys(root, rel));
index_is_ordered = (index->sortopfamily != NULL);
if (index_is_ordered && pathkeys_possibly_useful)
{
    index_pathkeys = build_index_pathkeys(root, index,
                                          ForwardScanDirection);
    useful_pathkeys = truncate_useless_pathkeys(root, rel,
                                                index_pathkeys);
    orderbyclauses = NIL;
    orderbyclausecols = NIL;
}

Here has_useful_pathkeys() checks if the query requires some ordering.
For unique keys you'll want to do the same. You'll have set the
query's unique key requirements in standard_qp_callback().

I think basically build_index_paths() should be building index paths
with unique keys, for all indexes that can support the query's unique
keys. I'm just a bit uncertain if we need to create both a normal
index path and another path for the same index with unique keys.
Perhaps we can figure that out down the line somewhere. Maybe it's
best to build both path types for now, when possible, and we can
consider later whether we can skip the non-uniquekey paths.  Likely
that would require a big XXX comment to explain we need to review that
before the code makes it into core.

get_uniquekeys_for_index:

I think this needs to follow the lead of build_index_pathkeys more
closely.  Basically, ask the index what its pathkeys are.

standard_qp_callback:

build_group_uniquekeys & build_distinct_uniquekeys could likely be one
function that takes a list of SortGroupClause. You just either pass
the groupClause or distinctClause in.  Pretty much the UniqueKey
version of make_pathkeys_for_sortclauses().

> Some questions:
>
> 1) Do we really need the UniqueKey struct ?  If it only contains the
> EquivalenceClass field then we could just have a list of those instead.
> That would make the patch simpler.

I dunno about that. I understand it looks a bit pointless due to just
having one field, but perhaps we can worry about that later. If we
choose to ditch it and replace it with just an EquivalenceClass then
we can do that later.

> 2) Do we need both canon_uniquekeys and query_uniquekeys ?  Currently
> the patch only uses canon_uniquekeys because we attach the list
> directly on the Path node.

canon_uniquekeys should store at most one UniqueKey per
EquivalenceClass. The reason for this is for fast comparison. We can
compare memory addresses rather than checking individual fields are
equal. Now... yeah, it's true that there is only one field so far and
we could just check that the EquivalenceClass pointers are equal,
but I think maybe this is in the same boat as #1. Let's do it for now
so we stick as closely as possible to the pattern laid out by PathKeys,
and once it's all working and plugged into skip scans we can decide
if the code needs a simplification pass.

> I'll clean the patch up based on your feedback, and then start to rebase
> v21 on it.

Cool.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Fix typos and inconsistencies for HEAD (take 7)

2019-07-21 Thread Alexander Lakhin
Hello Tom,
22.07.2019 7:14, Tom Lane wrote:
>> Fixing both places sounds adapted to me.  An alternative we could use
>> here is just to say something like that:
>> The effective resolution is only 1/HZ, which can be configured with
>> kernel parameter (see man 7 time), and is 4 milliseconds by
>> default.
> Whatever we say here is going to be a lie on some platforms.
>
> Probably best just to say that the sleep resolution is platform-dependent
> and leave it at that.
I think we can say "on many systems" / "on most Unixen", and it would
not be a lie.
In my opinion, while a generic reference to platform-dependency is OK
for developers' documentation, it makes the following passage in
config.sgml vague (we don't give the user a hint of what "the effective
resolution" can be: several seconds/milliseconds/nanoseconds?):

"The default value is 200 milliseconds (200ms). Note
that on many systems, the effective resolution of sleep delays is 10
milliseconds; setting bgwriter_delay to a value that is not a multiple
of 10 might have the same results as setting it to the next higher
multiple of 10."
->
"The default value is 200 milliseconds (200ms). Note
that the effective resolution of sleep delays is platform-dependent;
setting bgwriter_delay to a value that is not a multiple of the
effective resolution might have the same results as setting it to the
next higher multiple."

Best regards,
Alexander


Re: POC: converting Lists into arrays

2019-07-21 Thread Tom Lane
David Rowley  writes:
> On Mon, 22 Jul 2019 at 16:37, Tom Lane  wrote:
>> Interesting.  I wonder if bms_next_member() could be made any quicker?

> I had a quick look earlier and the only thing I saw was maybe to do
> the first loop differently from subsequent ones.  The "w &= mask;"
> does nothing useful once we're past the first bitmapword that the loop
> touches.

Good thought, but it would only help when we're actually iterating to
later words, which happens just 1 out of 64 times in the fully-
populated-bitmap case.

Still, I think it might be worth pursuing to make the sparse-bitmap
case faster.

regards, tom lane




Re: Speed up transaction completion faster after many relations are accessed in a transaction

2019-07-21 Thread David Rowley
On Mon, 22 Jul 2019 at 16:36, Tsunakawa, Takayuki
 wrote:
>
> From: David Rowley [mailto:david.row...@2ndquadrant.com]
> > For the use case we've been measuring with partitioned tables and the
> > generic plan generation causing a sudden spike in the number of
> > obtained locks, then having plan_cache_mode = force_custom_plan will
> > cause the lock table not to become bloated. I'm not sure there's
> > anything interesting to measure there.
>
> I meant the difference between the following two cases, where the query only 
> touches one partition (e.g. SELECT ... WHERE pkey = value):
>
> * plan_cache_mode=force_custom_plan: LocalLockHash won't bloat.  The query 
> execution time is steady.
>
> * plan_cache_mode=auto: LocalLockHash bloats on the sixth execution due to 
> the creation of the generic plan.  The generic plan is not adopted because 
> its cost is high.  Later executions of the query will suffer from the bloat 
> until the 1006th execution when LocalLockHash is shrunk.

I measured this again in
https://www.postgresql.org/message-id/CAKJS1f_ycJ-6QTKC70pZRYdwsAwUo+t0_CV0eXC=j-b5a2m...@mail.gmail.com
where I posted the v6 patch.  It's the final results in the email.   I
didn't measure plan_cache_mode = force_custom_plan. There'd be no lock
table bloat in that case and the additional overhead would just be
from the hash_get_num_entries() && hash_get_max_bucket() calls, which
the first results show next to no overhead for.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: POC: converting Lists into arrays

2019-07-21 Thread David Rowley
On Mon, 22 Jul 2019 at 16:37, Tom Lane  wrote:
>
> David Rowley  writes:
> > So the bms_next_member() loop is slower when the bitmapset is fully
> > populated with all subplans, but way faster when there's just 1
> > member.
>
> Interesting.  I wonder if bms_next_member() could be made any quicker?

I had a quick look earlier and the only thing I saw was maybe to do
the first loop differently from subsequent ones.  The "w &= mask;"
does nothing useful once we're past the first bitmapword that the loop
touches.  Not sure what the code would look like exactly yet, or how
much it would help. I'll maybe experiment a bit later, but as separate
work from the other patch.

> Still, I agree that this is negligible compared to the actual work
> needed per live subplan, and the fact that the cost scales per live
> subplan is a good thing.  So no objection to this patch, but a mental
> note to take another look at bms_next_member() someday.

Thanks for having a look.  I'll have another look and will likely push
this a bit later on today if all is well.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: POC: converting Lists into arrays

2019-07-21 Thread Tom Lane
David Rowley  writes:
> On Mon, 22 Jul 2019 at 02:45, Tom Lane  wrote:
>> One small question is whether it loses if most of the subplans
>> are present in the bitmap.  I imagine that would be close enough
>> to break-even, but it might be worth trying to test to be sure.

> ...
> -- Test 2 (all matching subplans (8192 of them)) --

> Master version:

> Time: 14825.304 ms (00:14.825)
> Time: 14701.601 ms (00:14.702)
> Time: 14650.969 ms (00:14.651)

> Patched version:

> Time: 44551.811 ms (00:44.552)
> Time: 44357.915 ms (00:44.358)
> Time: 43454.958 ms (00:43.455)

> So the bms_next_member() loop is slower when the bitmapset is fully
> populated with all subplans, but way faster when there's just 1
> member.

Interesting.  I wonder if bms_next_member() could be made any quicker?
Still, I agree that this is negligible compared to the actual work
needed per live subplan, and the fact that the cost scales per live
subplan is a good thing.  So no objection to this patch, but a mental
note to take another look at bms_next_member() someday.

regards, tom lane




RE: Speed up transaction completion faster after many relations are accessed in a transaction

2019-07-21 Thread Tsunakawa, Takayuki
From: David Rowley [mailto:david.row...@2ndquadrant.com]
> For the use case we've been measuring with partitioned tables and the
> generic plan generation causing a sudden spike in the number of
> obtained locks, then having plan_cache_mode = force_custom_plan will
> cause the lock table not to become bloated. I'm not sure there's
> anything interesting to measure there.

I meant the difference between the following two cases, where the query only 
touches one partition (e.g. SELECT ... WHERE pkey = value):

* plan_cache_mode=force_custom_plan: LocalLockHash won't bloat.  The query 
execution time is steady.

* plan_cache_mode=auto: LocalLockHash bloats on the sixth execution due to the 
creation of the generic plan.  The generic plan is not adopted because its cost 
is high.  Later executions of the query will suffer from the bloat until the 
1006th execution when LocalLockHash is shrunk.


Depending on the number of transactions and what each transaction does, I 
thought the difference might or might not be noticeable.


Regards
Takayuki Tsunakawa




Re: Fix typos and inconsistencies for HEAD (take 7)

2019-07-21 Thread Alexander Lakhin
Hello Michael,
22.07.2019 4:05, Michael Paquier wrote:
>> Also, I found e-mail headers in optimizer/plan/README not relevant, so I
>> propose to remove them.
> Not sure about that part.
I agree that the proposed fix is not complete; it just raises the
need for a subsequent fix.
If you don't mind, I would return to such questionable and peripheral
items after finishing with the unicums en masse.

Best regards.
Alexander




Re: Fix typos and inconsistencies for HEAD (take 7)

2019-07-21 Thread Tom Lane
Michael Paquier  writes:
> On Sun, Jul 21, 2019 at 08:28:53AM +0300, Alexander Lakhin wrote:
>> And another finding is related to the sleep effective resolution. `man 7
>> time` says "Since kernel 2.6.13, the HZ value is a kernel configuration 
>> parameter  and  can  be 100, 250 (the default) ...", so the 10
>> milliseconds is not the most common effective resolution nowadays.
>> I propose the corresponding patch for pgsleep.c, but we have a similar
> >> statement in doc/.../config.sgml. I think it should be fixed too.

> Fixing both places sounds adapted to me.  An alternative we could use
> here is just to say something like that:
> The effective resolution is only 1/HZ, which can be configured with
> kernel parameter (see man 7 time), and is 4 milliseconds by
> default.

Whatever we say here is going to be a lie on some platforms.

Probably best just to say that the sleep resolution is platform-dependent
and leave it at that.

regards, tom lane




Re: Add parallelism and glibc dependent only options to reindexdb

2019-07-21 Thread Michael Paquier
On Fri, Jul 19, 2019 at 08:29:27AM +0200, Julien Rouhaud wrote:
> On Fri, Jul 19, 2019 at 2:35 AM Michael Paquier  wrote:
>> For the second patch, could you send a rebase with a fix for the
>> connection slot when processing the reindex commands?
> 
> Attached, I also hopefully removed all the now unneeded progname usage.

+Note that this mode is not compatible the -i / --index
+or the -s / --system options.
Nits: this is not a style consistent with the documentation.  When
referring to both the long and short options the formulation "-i or
--index" gets used.  Here we could just use the long option.  This
sentence is missing a "with".

   simple_string_list_append(, optarg);
+  tbl_count++;
   break;
The number of items in a simple list is not counted, and vacuumdb does
the same thing to count objects.  What do you think about extending
simple lists to track the number of items stored?

+$node->issues_sql_like([qw(reindexdb -j2)],
+   qr/statement: REINDEX TABLE public.test1/,
+   'Global and parallel reindex will issue per-table REINDEX');
Would it make sense to have some tests for schemas here?

One of my comments in [1] has not been answered.  What about
the decomposition of a list of schemas into a list of tables when
using the parallel mode?

[1]: https://www.postgresql.org/message-id/20190711040433.gg4...@paquier.xyz
--
Michael


signature.asc
Description: PGP signature


Re: Psql patch to show access methods info

2019-07-21 Thread Alvaro Herrera
On 2019-Jul-21, Alexander Korotkov wrote:

> I've one note.  Behavior of "\dA" and "\dA pattern" look
> counter-intuitive to me.  I would rather expect that "\dA pattern"
> would just filter results of "\dA", but it displays different
> information.  I suggest rename displaying access method properties
> from "\dA pattern" to different.

\dA+ maybe?  Then ...

> And leave "\dA pattern" just filter results of "\dA".

"\dA+ pattern" works intuitively, I think.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Speed up transaction completion faster after many relations are accessed in a transaction

2019-07-21 Thread David Rowley
On Mon, 22 Jul 2019 at 14:21, Tsunakawa, Takayuki
 wrote:
>
> From: David Rowley [mailto:david.row...@2ndquadrant.com]
> > I personally don't think that's true.  The only way you'll notice the
> > LockReleaseAll() overhead is to execute very fast queries with a
> > bloated lock table.  It's pretty hard to notice that a single 0.1ms
> > query is slow. You'll need to execute thousands of them before you'll
> > be able to measure it, and once you've done that, the lock shrink code
> > will have run and the query will be performing optimally again.
>
> Maybe so.  Will the difference be noticeable between plan_cache_mode=auto 
> (default) and plan_cache_mode=custom?

For the use case we've been measuring with partitioned tables and the
generic plan generation causing a sudden spike in the number of
obtained locks, then having plan_cache_mode = force_custom_plan will
cause the lock table not to become bloated. I'm not sure there's
anything interesting to measure there. The only additional code that
gets executed is the hash_get_num_entries() and possibly
hash_get_max_bucket. Maybe it's worth swapping the order of those
calls since most of the time the entry will be 0 and the max bucket
count threshold won't be exceeded.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[Patch] PQconnectPoll() is blocked if target_session_attrs is read-write

2019-07-21 Thread Matsumura, Ryo
Hi

# I rewrote my previous mail.

PQconnectPoll() is used as a method for asynchronous connection
establishment, both externally and internally.
If a caller has confirmed that the socket is ready for writing or
reading, as requested by the return value of the previous
PQconnectPoll() call, the next PQconnectPoll() must not block. But if
the caller specifies target_session_attrs = 'read-write',
PQconnectPoll() may block.

Detail:
If target_session_attrs is set to read-write, PQconnectPoll() calls
PQsendQuery("SHOW transaction_read_only") although the previous return
value was PGRES_POLLING_READING, not WRITING.
As a result, PQsendQuery() may block in pqsecure_raw_write().

I attach a patch.

Regards
Ryo Matsumura


libpq_state_change_bugfix.ver1.0.patch
Description: libpq_state_change_bugfix.ver1.0.patch


RE: Speed up transaction completion faster after many relations are accessed in a transaction

2019-07-21 Thread Tsunakawa, Takayuki
From: David Rowley [mailto:david.row...@2ndquadrant.com]
> I personally don't think that's true.  The only way you'll notice the
> LockReleaseAll() overhead is to execute very fast queries with a
> bloated lock table.  It's pretty hard to notice that a single 0.1ms
> query is slow. You'll need to execute thousands of them before you'll
> be able to measure it, and once you've done that, the lock shrink code
> will have run and the query will be performing optimally again.

Maybe so.  Will the difference be noticeable between plan_cache_mode=auto 
(default) and plan_cache_mode=custom?


> I voice my concerns with v5 and I wasn't really willing to push it
> with a known performance regression of 7% in a fairly valid case. v6
> does not suffer from that.

You're right.  We may have to accept the unpredictability this hidden 
behavior presents to users as a compromise for higher throughput.


> > Where else does the extra overhead come from?
> 
> hash_get_num_entries(LockMethodLocalHash) == 0 &&
> + hash_get_max_bucket(LockMethodLocalHash) >
> + LOCKMETHODLOCALHASH_SHRINK_THRESHOLD)
> 
> that's executed every time, not every 1000 times.

I see.  Thanks.


Regards
Takayuki Tsunakawa



Re: pg_receivewal documentation

2019-07-21 Thread Michael Paquier
On Fri, Jul 19, 2019 at 02:04:03PM -0400, Robert Haas wrote:
> You could just say something like:
> 
> Since pg_receivewal does not apply WAL, you should not allow it to
> become a synchronous standby when synchronous_commit = remote_apply.
> If it does, it will appear to be a standby which never catches up,
> which may cause commits to block.  To avoid this, you should either
> configure an appropriate value for synchronous_standby_names, or
> specify an application_name for pg_receivewal that does not match it,
> or change the value of synchronous_commit to something other than
> remote_apply.
> 
> I think that'd be a lot more useful than enumerating the total-failure
> scenarios.

+1.  Thanks for the suggestions!  Your wording looks good to me.
--
Michael


signature.asc
Description: PGP signature


Re: Speed up transaction completion faster after many relations are accessed in a transaction

2019-07-21 Thread David Rowley
On Mon, 22 Jul 2019 at 12:48, Tsunakawa, Takayuki
 wrote:
>
> From: David Rowley [mailto:david.row...@2ndquadrant.com]
> > I went back to the drawing board on this and I've added some code that 
> > counts
> > the number of times we've seen the table to be oversized and just shrinks
> > the table back down on the 1000th time.  6.93% / 1000 is not all that much.
>
> I'm afraid this kind of hidden behavior would appear mysterious to users.  
> They may wonder "Why is the same query fast at first in the session (5 or 6 
> times of execution), then gets slower for a while, and gets faster again?  Is 
> there something to tune?  Am I missing something wrong with my app (e.g. how 
> to use prepared statements)?"  So I prefer v5.

I personally don't think that's true.  The only way you'll notice the
LockReleaseAll() overhead is to execute very fast queries with a
bloated lock table.  It's pretty hard to notice that a single 0.1ms
query is slow. You'll need to execute thousands of them before you'll
be able to measure it, and once you've done that, the lock shrink code
will have run and the query will be performing optimally again.

I voiced my concerns about v5 and I wasn't really willing to push it
with a known performance regression of 7% in a fairly valid case. v6
does not suffer from that.

> > Of course, not all the extra overhead might be from rebuilding the table,
> > so here's a test with the updated patch.
>
> Where else does the extra overhead come from?

hash_get_num_entries(LockMethodLocalHash) == 0 &&
+ hash_get_max_bucket(LockMethodLocalHash) >
+ LOCKMETHODLOCALHASH_SHRINK_THRESHOLD)

that's executed every time, not every 1000 times.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: POC: converting Lists into arrays

2019-07-21 Thread David Rowley
On Mon, 22 Jul 2019 at 02:45, Tom Lane  wrote:
> One small question is whether it loses if most of the subplans
> are present in the bitmap.  I imagine that would be close enough
> to break-even, but it might be worth trying to test to be sure.
> (I'd think about breaking out just the loops in question and
> testing them stand-alone, or else putting in an outer loop to
> repeat them, since as you say the surrounding work probably
> dominates.)

My 2nd test was for when all subplans were present in the bitmap. It
did show a very slight slowdown for the case were all subplans were
present in the bitmapset. However, yeah, it seems like a good idea to
try it a million times to help show the true cost.

I did:

int x = 0;

/* Patched version */
for (j = 0; j < 100; j++)
{
    i = -1;
    while ((i = bms_next_member(validsubplans, i)) >= 0)
    {
        Plan       *initNode = (Plan *) list_nth(node->appendplans, i);

        x++;
    }
}

/* Master version */
for (j = 0; j < 100; j++)
{
    ListCell   *lc;

    i = 0;
    foreach(lc, node->appendplans)
    {
        Plan       *initNode;

        if (bms_is_member(i, validsubplans))
        {
            initNode = (Plan *) lfirst(lc);
            x++;
        }
        i++;
    }
}

elog(DEBUG1, "%d", x); /* stop the compiler optimizing away the loops */

I separately commented out each one of the outer loops away before
performing the test again.

plan_cache_mode = force_generic_plan

-- Test 1 (one matching subplan) --

prepare q1(int) as select * from ht where a = $1;
execute q1(1);

Master version:

Time: 14441.332 ms (00:14.441)
Time: 13829.744 ms (00:13.830)
Time: 13753.943 ms (00:13.754)

Patched version:

Time: 41.250 ms
Time: 40.976 ms
Time: 40.853 ms

-- Test 2 (all matching subplans (8192 of them)) --

prepare q2 as select * from ht;
execute q2;

Master version:

Time: 14825.304 ms (00:14.825)
Time: 14701.601 ms (00:14.702)
Time: 14650.969 ms (00:14.651)

Patched version:

Time: 44551.811 ms (00:44.552)
Time: 44357.915 ms (00:44.358)
Time: 43454.958 ms (00:43.455)

So the bms_next_member() loop is slower when the bitmapset is fully
populated with all subplans, but way faster when there's just 1
member.  In reality, the ExecInitNode() call drowns most of it out.
Plus a plan with more subnodes is going to take longer to execute, and
then to shut the plan down afterwards too.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Fix typos and inconsistencies for HEAD (take 7)

2019-07-21 Thread Michael Paquier
On Sun, Jul 21, 2019 at 08:28:53AM +0300, Alexander Lakhin wrote:
> Please consider fixing the next pack of typos and inconsistencies in the
> tree:

Thanks, all those things look fine.  I have noticed one mistake.

> 7.44. json_plperl -> jsonb_plperlu

The path was incorrect here.

> Also, I found e-mail headers in optimizer/plan/README not relevant, so I
> propose to remove them.

Not sure about that part.

> And another finding is related to the sleep effective resolution. `man 7
> time` says "Since kernel 2.6.13, the HZ value is a kernel configuration 
> parameter  and  can  be 100, 250 (the default) ...", so the 10
> milliseconds is not the most common effective resolution nowadays.
> I propose the corresponding patch for pgsleep.c, but we have a similar
statement in doc/.../config.sgml. I think it should be fixed too.

Fixing both places sounds adapted to me.  An alternative we could use
here is just to say something like that:
The effective resolution is only 1/HZ, which can be configured with
kernel parameter (see man 7 time), and is 4 milliseconds by
default.
--
Michael


signature.asc
Description: PGP signature


RE: Speed up transaction completion faster after many relations are accessed in a transaction

2019-07-21 Thread Tsunakawa, Takayuki
From: David Rowley [mailto:david.row...@2ndquadrant.com]
> I went back to the drawing board on this and I've added some code that counts
> the number of times we've seen the table to be oversized and just shrinks
> the table back down on the 1000th time.  6.93% / 1000 is not all that much.

I'm afraid this kind of hidden behavior would appear mysterious to users.  They 
may wonder "Why is the same query fast at first in the session (5 or 6 times of 
execution), then gets slower for a while, and gets faster again?  Is there 
something to tune?  Am I missing something wrong with my app (e.g. how to use 
prepared statements)?"  So I prefer v5.


> Of course, not all the extra overhead might be from rebuilding the table,
> so here's a test with the updated patch.

Where else does the extra overhead come from?


Regards
Takayuki Tsunakawa



Re: new function for tsquery creation

2019-07-21 Thread Where is Where
Hello everyone, I am wondering if
AROUND(N) or  is still possible? I found this thread below and the
original post
https://www.postgresql.org/message-id/fe93ff7e9ad79196486ada79e268%40postgrespro.ru
mentioned the proposed feature: 'New operator AROUND(N). It matches if the
distance between words(or maybe phrases) is less than or equal to N.'

Currently, in tsquery_phrase(query1 tsquery, query2 tsquery, distance
integer), the distance argument matches at a fixed distance. Is there a
way to search with a maximum distance instead, so the search returns
query1 followed by query2 up to a certain distance, like the AROUND(N)
or  mentioned in the thread?

Thank you!



On Mon, Jul 22, 2019 at 9:13 AM Dmitry Ivanov 
wrote:

> Hi everyone,
>
> I'd like to share some intermediate results. Here's what has changed:
>
>
> 1. OR operator is now case-insensitive. Moreover, trailing whitespace is
> no longer used to identify it:
>
> select websearch_to_tsquery('simple', 'abc or');
>   websearch_to_tsquery
> --
>   'abc' & 'or'
> (1 row)
>
> select websearch_to_tsquery('simple', 'abc or(def)');
>   websearch_to_tsquery
> --
>   'abc' | 'def'
> (1 row)
>
> select websearch_to_tsquery('simple', 'abc or!def');
>   websearch_to_tsquery
> --
>   'abc' | 'def'
> (1 row)
>
>
> 2. AROUND(N) has been dropped. I hope that  operator will allow us
> to implement it with a few lines of code.
>
> 3. websearch_to_tsquery() now tolerates various syntax errors, for
> instance:
>
> Misused operators:
>
> 'abc &'
> '| abc'
> '<- def'
>
> Missing parentheses:
>
> 'abc & (def <-> (cat or rat'
>
> Other sorts of nonsense:
>
> 'abc &--|| def'  =>  'abc' & !!'def'
> 'abc:def'  =>  'abc':D & 'ef'
>
> This, however, doesn't mean that the result will always be adequate (who
> would have thought?). Overall, current implementation follows the GIGO
> principle. In theory, this would allow us to use user-supplied websearch
> strings (but see gotchas), even if they don't make much sense. Better
> than nothing, right?
>
> 4. A small refactoring: I've replaced all WAIT* macros with an enum for
> better debugging (names look much nicer in GDB). Hope this is
> acceptable.
>
> 5. Finally, I've added a few more comments and tests. I haven't checked
> the code coverage, though.
>
>
> A few gotchas:
>
> I haven't touched gettoken_tsvector() yet. As a result, the following
> queries produce errors:
>
> select websearch_to_tsquery('simple', );
> ERROR:  syntax error in tsquery: "'"
>
> select websearch_to_tsquery('simple', '\');
> ERROR:  there is no escaped character: "\"
>
> Maybe there's more. The question is: should we fix those, or it's fine
> as it is? I don't have a strong opinion about this.
>
> --
> Dmitry Ivanov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company


Re: Performance issue in foreign-key-aware join estimation

2019-07-21 Thread David Rowley
On Mon, 22 Jul 2019 at 01:50, David Rowley  wrote:
>
> On Mon, 22 Jul 2019 at 00:44, Andreas Seltenreich  wrote:
> > sqlsmith triggers an assertion in this commit (3373c7155350):
> >
> > TRAP: FailedAssertion("!(rel->reloptkind == RELOPT_BASEREL)", File: 
> > "equivclass.c", Line: 764)
>
> Thanks for the report.
>
> It looks like this is caused by the join removal code removing the
> LEFT JOIN and leaving a dead rel in the eclasses ec_relids.  The fix
> is likely either to adjust the Assert to allow that or to add an if
> test so that we only bother calling bms_add_member for base rels.  I'm
> not quite sure yet.

I ended up adjusting the Assert to allow dead rels too.  I thought
about adding an if test so we only do the bms_add_member for base
rels, but I didn't really like the special case where eclass_indexes
wouldn't be correctly set for dead rels. I had thoughts that dead rels
were not common enough to go to additional trouble over.  That's
debatable of course.  I also thought about removing the Assert
completely, but it does help verify we don't get anything unexpected
in ec_relids.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: The flinfo->fn_extra question, from me this time.

2019-07-21 Thread Tom Lane
Chapman Flack  writes:
> Until now, I had assumed that SFRM_ValuePerCall mode might offer some
> benefits, such as the possibility of pipelining certain queries and not
> building up a whole tuplestore in advance.

> But looking in the code, I'm getting the impression that those
> benefits are only theoretical future ones, as ExecMakeTableFunctionResult
> implements SFRM_ValuePerCall mode by ... repeatedly calling the function
> to build up a whole tuplestore in advance.

Yes, that's the case for a SRF in FROM.  A SRF in the targetlist
actually does get the chance to pipeline, if it implements ValuePerCall.

The FROM case could be improved perhaps, if somebody wanted to put
time into it.  You'd still need to be prepared to build a tuplestore,
in case of rescan or backwards fetch; but in principle you could return
rows immediately while stashing them aside in a tuplestore.

regards, tom lane




Re: The flinfo->fn_extra question, from me this time.

2019-07-21 Thread Chapman Flack
On 06/15/19 21:46, Chapman Flack wrote:
> On 06/15/19 21:21, Tom Lane wrote:
>> Yup.  (Of course, you don't have to use the SRF_FIRSTCALL_INIT
>> infrastructure.)
> 
> That had crossed my mind ... but it seems there's around 80 or 100
> lines of good stuff there that'd be a shame to duplicate. If only

I suppose that's only if I want to continue using SFRM_ValuePerCall mode.

SFRM_Materialize mode could remove a good deal of complexity currently
in PL/Java around managing memory contexts, SPI_connect, etc. through
multiple calls ... and I'd also have fn_extra all to myself.

Until now, I had assumed that SFRM_ValuePerCall mode might offer some
benefits, such as the possibility of pipelining certain queries and not
building up a whole tuplestore in advance.

But looking in the code, I'm getting the impression that those
benefits are only theoretical future ones, as ExecMakeTableFunctionResult
implements SFRM_ValuePerCall mode by ... repeatedly calling the function
to build up a whole tuplestore in advance.

Am I right about that? Are there other sites from which a SRF might be
called that I haven't found, where ValuePerCall mode might actually
support some form of pipelining? Are there actual cases where allowedModes
might not contain SFRM_Materialize?

Or is the ValuePerCall variant currently there just to support possible
future such cases, none of which exist at the moment?

Regards,
-Chap




Re: POC: converting Lists into arrays

2019-07-21 Thread Tom Lane
I wrote:
>> * Rationalize places that are using combinations of list_copy and
>> list_concat, probably by inventing an additional list-concatenation
>> primitive that modifies neither input.

> I poked around to see what we have in this department.  There seem to
> be several identifiable use-cases:
> [ ... analysis ... ]

Here's a proposed patch based on that.  I added list_concat_copy()
and then simplified callers as appropriate.

It turns out there are a *lot* of places where list_concat() callers
are now leaking the second input list (where before they just leaked
that list's header).  So I've got mixed emotions about the choice not
to add a variant function that list_free's the second input.  On the
other hand, the leakage probably amounts to nothing significant in
all or nearly all of these places, and I'm concerned about the
readability/understandability loss of having an extra version of
list_concat.  Anybody have an opinion about that?

Other than that point, I think this is pretty much good to go.

regards, tom lane

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 548ae66..50d1f18 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1497,7 +1497,7 @@ deparseFromExprForRel(StringInfo buf, PlannerInfo *root, RelOptInfo *foreignrel,
 			if (fpinfo->jointype == JOIN_INNER)
 			{
 *ignore_conds = list_concat(*ignore_conds,
-			list_copy(fpinfo->joinclauses));
+			fpinfo->joinclauses);
 fpinfo->joinclauses = NIL;
 			}
 
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 033aeb2..b9f90e9 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2667,8 +2667,8 @@ estimate_path_cost_size(PlannerInfo *root,
 		 * baserestrictinfo plus any extra join_conds relevant to this
 		 * particular path.
 		 */
-		remote_conds = list_concat(list_copy(remote_param_join_conds),
-   fpinfo->remote_conds);
+		remote_conds = list_concat_copy(remote_param_join_conds,
+		fpinfo->remote_conds);
 
 		/*
 		 * Construct EXPLAIN query including the desired SELECT, FROM, and
@@ -5102,23 +5102,23 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
 	{
 		case JOIN_INNER:
 			fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
-			   list_copy(fpinfo_i->remote_conds));
+			   fpinfo_i->remote_conds);
 			fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
-			   list_copy(fpinfo_o->remote_conds));
+			   fpinfo_o->remote_conds);
 			break;
 
 		case JOIN_LEFT:
 			fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
-			  list_copy(fpinfo_i->remote_conds));
+			  fpinfo_i->remote_conds);
 			fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
-			   list_copy(fpinfo_o->remote_conds));
+			   fpinfo_o->remote_conds);
 			break;
 
 		case JOIN_RIGHT:
 			fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
-			  list_copy(fpinfo_o->remote_conds));
+			  fpinfo_o->remote_conds);
 			fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
-			   list_copy(fpinfo_i->remote_conds));
+			   fpinfo_i->remote_conds);
 			break;
 
 		case JOIN_FULL:
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index fd29927..e0af665 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -526,8 +526,8 @@ DefineIndex(Oid relationId,
 	 * is part of the key columns, and anything equal to and over is part of
 	 * the INCLUDE columns.
 	 */
-	allIndexParams = list_concat(list_copy(stmt->indexParams),
- list_copy(stmt->indexIncludingParams));
+	allIndexParams = list_concat_copy(stmt->indexParams,
+	  stmt->indexIncludingParams);
 	numberOfAttributes = list_length(allIndexParams);
 
 	if (numberOfAttributes <= 0)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 64a9e58..83337c2 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -719,8 +719,7 @@ init_sql_fcache(FmgrInfo *finfo, Oid collation, bool lazyEvalOK)
 		  fcache->pinfo,
 		  NULL);
 		queryTree_list = lappend(queryTree_list, queryTree_sublist);
-		flat_query_list = list_concat(flat_query_list,
-	  list_copy(queryTree_sublist));
+		flat_query_list = list_concat(flat_query_list, queryTree_sublist);
 	}
 
 	check_sql_fn_statements(flat_query_list);
diff --git a/src/backend/nodes/list.c b/src/backend/nodes/list.c
index 9163464..6bf13ae 100644
--- a/src/backend/nodes/list.c
+++ b/src/backend/nodes/list.c
@@ -501,12 +501,15 @@ lcons_oid(Oid datum, List *list)
 }
 
 /*
- * Concatenate list2 to the end of list1, and return list1. list1 is
- * destructively changed, list2 is not. (However, in the case of pointer
- * lists, list1 and list2 will point to the same structures.) 

Re: Bad canonicalization for dateranges with 'infinity' bounds

2019-07-21 Thread Stephen Frost
Greetings,

* Jeff Davis (pg...@j-davis.com) wrote:
> On Thu, 2019-07-18 at 17:36 -0400, Tom Lane wrote:
> > (The commit message doesn't seem to have made it to the pgsql-
> > committers
> > list either, but that's probably an independent issue.)
> 
> I was curious about that as well.

The whitelists we put in place expire after a certain period of time
(iirc, it's 1 year currently) and then your posts end up getting
moderated.

If you register that address as an alternate for you, you should be
able to post with it without needing to be on a whitelist.

Thanks,

Stephen


signature.asc
Description: PGP signature


Queries on QMF to POSTGRE

2019-07-21 Thread JVM .
HI Team,



I’m looking to convert QMF queries, QMF forms, and QMF procedures to
PostgreSQL.  Will it support all of them?

If yes, please help us with a sample example or any documentation.



Thanking you in anticipation.



Regards

Jotiram More


Re: Fix typos and inconsistencies for HEAD (take 7)

2019-07-21 Thread Tom Lane
Alexander Lakhin  writes:
> Also, I found e-mail headers in optimizer/plan/README not relevant, so I
> propose to remove them.

FWIW, I think they're highly relevant, because they put a date on
that text.  I've not gone through that README lately, but I wouldn't
be surprised if it's largely obsolete --- it hasn't been maintained
in any meaningful way since Vadim wrote it.  Without the headers, a
reader would have no warning of that.

What really ought to happen, likely, is for somebody to extract
whatever is still useful there into a new(?) section in the parent
directory's README.

regards, tom lane




Re: POC: converting Lists into arrays

2019-07-21 Thread Tom Lane
David Rowley  writes:
> On Wed, 17 Jul 2019 at 11:06, Tom Lane  wrote:
>> (Actually, I doubt that any of these changes will really move the
>> performance needle in the real world.  It's more a case of wanting
>> the code to present good examples not bad ones.)

> In spirit with the above, I'd quite like to fix a small bad example
> that I ended up with in nodeAppend.c and nodeMergeappend.c for
> run-time partition pruning.

I didn't test the patch, but just by eyeball it looks sane,
and I concur it should win if the bitmap is sparse.

One small question is whether it loses if most of the subplans
are present in the bitmap.  I imagine that would be close enough
to break-even, but it might be worth trying to test to be sure.
(I'd think about breaking out just the loops in question and
testing them stand-alone, or else putting in an outer loop to
repeat them, since as you say the surrounding work probably
dominates.)

regards, tom lane




Re: Performance issue in foreign-key-aware join estimation

2019-07-21 Thread David Rowley
On Mon, 22 Jul 2019 at 00:44, Andreas Seltenreich  wrote:
> sqlsmith triggers an assertion in this commit (3373c7155350):
>
> TRAP: FailedAssertion("!(rel->reloptkind == RELOPT_BASEREL)", File: "equivclass.c", Line: 764)

Thanks for the report.

It looks like this is caused by the join removal code removing the
LEFT JOIN and leaving a dead rel in the eclasses ec_relids.  The fix
is likely either to adjust the Assert to allow that or to add an if
test so that we only bother calling bms_add_member for base rels.  I'm
not quite sure yet.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Compiler warnings with MinGW

2019-07-21 Thread Michael Paquier
On Sat, Jul 20, 2019 at 06:19:34PM +0900, Michael Paquier wrote:
> Just wanted to double-check something.  We usually don't bother
> back-patching warning fixes like this one, right?

I have double-checked the thing, and applied it only on HEAD, as we
have had that warning for some time (since 9.1 actually, and 00cdd83
improved the original situation here).
--
Michael


signature.asc
Description: PGP signature


Re: POC: converting Lists into arrays

2019-07-21 Thread David Rowley
On Wed, 17 Jul 2019 at 11:06, Tom Lane  wrote:
> (Actually, I doubt that any of these changes will really move the
> performance needle in the real world.  It's more a case of wanting
> the code to present good examples not bad ones.)

In spirit with the above, I'd quite like to fix a small bad example
that I ended up with in nodeAppend.c and nodeMergeappend.c for
run-time partition pruning.

The code in question performs a loop over a list and checks
bms_is_member() on each element and only performs an action if the
member is present in the Bitmapset.

It would seem much more efficient just to perform a bms_next_member()
style loop and then fetch the list item with list_nth(); at least,
this is certainly the case when only a small number of the list items
are indexed by the Bitmapset.  With these two loops in particular,
when a large number of list items are in the set, the cost of the
surrounding work goes up greatly anyway, so it does not seem
unreasonable to optimise for the case when just a few match.

A quick test shows that it's hardly groundbreaking performance-wise,
but test 1 does seem measurable above the noise.

-- Setup
plan_cache_mode = force_generic_plan
max_locks_per_transaction = 256

create table ht (a int primary key, b int, c int) partition by hash (a);
select 'create table ht' || x::text || ' partition of ht for values
with (modulus 8192, remainder ' || (x)::text || ');' from
generate_series(0,8191) x;
\gexec

-- Test 1: Just one member in the Bitmapset.

test1.sql:
\set p 1
select * from ht where a = :p

Master:

$ pgbench -n -f test1.sql -T 60 -M prepared postgres
tps = 297.267191 (excluding connections establishing)
tps = 298.276797 (excluding connections establishing)
tps = 296.264459 (excluding connections establishing)
tps = 298.968037 (excluding connections establishing)
tps = 298.575684 (excluding connections establishing)

Patched:

$ pgbench -n -f test1.sql -T 60 -M prepared postgres
tps = 300.924254 (excluding connections establishing)
tps = 299.360196 (excluding connections establishing)
tps = 300.197024 (excluding connections establishing)
tps = 299.741215 (excluding connections establishing)
tps = 299.748088 (excluding connections establishing)

0.71% faster

-- Test 2: when all list items are found in the Bitmapset.

test2.sql:
select * from ht;

Master:

$ pgbench -n -f test2.sql -T 60 -M prepared postgres
tps = 12.526578 (excluding connections establishing)
tps = 12.528046 (excluding connections establishing)
tps = 12.491347 (excluding connections establishing)
tps = 12.538292 (excluding connections establishing)
tps = 12.528959 (excluding connections establishing)

Patched:

$ pgbench -n -f test2.sql -T 60 -M prepared postgres
tps = 12.503670 (excluding connections establishing)
tps = 12.516133 (excluding connections establishing)
tps = 12.404925 (excluding connections establishing)
tps = 12.514567 (excluding connections establishing)
tps = 12.541484 (excluding connections establishing)

0.21% slower

With that removed, the slowness of test 1 is almost entirely in
AcquireExecutorLocks() and ExecCheckRTPerms().  We'd be up close to
about 30k tps instead of 300 tps if there were some solution to those
problems.  In the meantime, I think it makes sense to remove the
inefficient loops and leave just the final two bottlenecks.

Patch attached.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


list_fixups_list_nth.patch
Description: Binary data


Re: Compile from source using latest Microsoft Windows SDK

2019-07-21 Thread Andrew Dunstan


On 7/19/19 9:10 PM, Michael Paquier wrote:
> On Fri, Jul 19, 2019 at 08:30:38AM -0400, Andrew Dunstan wrote:
>> My tests of the VS2017 stuff used this install mechanism on a fresh
>> Windows instance:
>>
>> choco install -y visualstudio2017-workload-vctools --package-parameters
>> "--includeOptional"
>>
>> This installed Windows Kits 8.1 and 10, among many other things.
> So you have bypassed the problem by installing the v8.1 SDK.  And if
> you don't do that and only include the v10 SDK, then you see the
> problem.  Functionally, it also means that with a VS2017 compilation
> the SDK version is forcibly downgraded, isn't that a bad idea anyway?



For VS2017, the 8.1 SDK is part of the optional package set (see
)
but for VS2019 it is not (see
)
so yes, we need to deal with the issue, but it's really only a major
issue for VS2019, ISTM. I guess we might need a test for what SDK is
available? That's going to be fun ...


cheers


andrew

-- 
Andrew Dunstan   https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services






Re: Psql patch to show access methods info

2019-07-21 Thread Alexander Korotkov
On Mon, Jul 15, 2019 at 10:05 PM Nikita Glukhov  wrote:
> On 01.07.2019 14:06, Thomas Munro wrote:
>
> > On Sun, Mar 31, 2019 at 2:13 PM  wrote:
> >> Thanks for review.
> > Hi Sergey,
> >
> > A new Commitfest is here and this doesn't apply -- could you please
> > post a rebase?
> >
> > Thanks,
>
> Attached 7th version of the patches rebased onto current master.

Thank you for posting this patch.  It looks good to me.

I have one note.  The behavior of "\dA" and "\dA pattern" looks
counter-intuitive to me.  I would rather expect "\dA pattern" to just
filter the results of "\dA", but instead it displays different
information.  I suggest renaming the command that displays access
method properties to something other than "\dA pattern", and leaving
"\dA pattern" to just filter the results of "\dA".

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Performance issue in foreign-key-aware join estimation

2019-07-21 Thread Andreas Seltenreich
David Rowley writes:

> On Thu, 18 Jul 2019 at 19:24, David Rowley wrote:
>> Unless there's some objection, I'll be looking into pushing both 0001
>> and 0002 in a single commit in the next few days.
>
> I've pushed this after doing a bit of final tweaking.

sqlsmith triggers an assertion in this commit (3373c7155350):

TRAP: FailedAssertion("!(rel->reloptkind == RELOPT_BASEREL)", File: "equivclass.c", Line: 764)

Here's a somewhat reduced testcase to be run on the regression db:

--8<---cut here---start->8---
select
max(date_mii(now()::date, 42)) over (partition by subq_1.c9 order by c3),
min(c3) over (partition by subq_1.c8 )
from
  (select 1 as c3 from public.partr_def2 as ref_0
left join public.num_exp_power_10_ln as sample_0
on (ref_0.a = sample_0.id ) ) as subq_0
right join (select 1 as c8, 1 as c9) as subq_1
on (true);
--8<---cut here---end--->8---

Backtrace below.

regards,
Andreas


(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x7face8834535 in __GI_abort () at abort.c:79
#2  0x562b8d2731a3 in ExceptionalCondition (
conditionName=conditionName@entry=0x562b8d418320 "!(rel->reloptkind == 
RELOPT_BASEREL)",
errorType=errorType@entry=0x562b8d2c601d "FailedAssertion", 
fileName=fileName@entry=0x562b8d418190 "equivclass.c",
lineNumber=lineNumber@entry=764) at assert.c:54
#3  0x562b8d067ddc in get_eclass_for_sort_expr 
(root=root@entry=0x562b8e9077c8, expr=expr@entry=0x7facdf610030,
nullable_relids=0x7facdf6216f8, nullable_relids@entry=0x7facdf615560, 
opfamilies=0x7facdf621348,
opcintype=opcintype@entry=2277, collation=collation@entry=0, sortref=1, 
rel=0x0, create_it=true) at equivclass.c:764
#4  0x562b8d07326e in make_pathkey_from_sortinfo (root=0x562b8e9077c8, 
expr=0x7facdf610030, nullable_relids=0x7facdf615560,
opfamily=397, opcintype=2277, collation=0, reverse_sort=false, 
nulls_first=false, sortref=1, rel=0x0, create_it=true)
at pathkeys.c:215
#5  0x562b8d07401f in make_pathkey_from_sortop (create_it=true, sortref=1, 
nulls_first=false, ordering_op=1072,
nullable_relids=0x7facdf615560, expr=0x7facdf610030, root=0x562b8e9077c8) 
at pathkeys.c:258
#6  make_pathkeys_for_sortclauses (root=root@entry=0x562b8e9077c8, 
sortclauses=sortclauses@entry=0x7facdf620f98,
tlist=tlist@entry=0x562b8e929768) at pathkeys.c:1086
#7  0x562b8d082533 in make_pathkeys_for_window 
(root=root@entry=0x562b8e9077c8, tlist=0x562b8e929768, wc=,
wc=) at planner.c:5642
#8  0x562b8d087c81 in create_one_window_path (wflists=, 
activeWindows=,
output_target=, input_target=, 
path=0x7facdf620ea8, window_rel=,
root=) at planner.c:4663
#9  create_window_paths (activeWindows=, wflists=0x7facdf613b80, 
output_target_parallel_safe=,
output_target=0x7facdf6205b8, input_target=0x7facdf6206f8, 
input_rel=, root=0x562b8e9077c8) at planner.c:4588
#10 grouping_planner (root=, inheritance_update=false, 
tuple_fraction=) at planner.c:2211
#11 0x562b8d089dfa in subquery_planner (glob=glob@entry=0x562b8e91bb20, 
parse=parse@entry=0x562b8e906740,
parent_root=parent_root@entry=0x0, hasRecursion=hasRecursion@entry=false, 
tuple_fraction=tuple_fraction@entry=0)
at planner.c:1013
#12 0x562b8d08afa7 in standard_planner (parse=0x562b8e906740, 
cursorOptions=256, boundParams=) at planner.c:406




Re: [PATCH] Incremental sort (was: PoC: Partial sort)

2019-07-21 Thread Tomas Vondra

On Sat, Jul 20, 2019 at 07:37:08PM -0400, James Coleman wrote:

..

Yes, you're right - an extra sort node would be necessary. But I don't
think that changes the idea behind that example. The question is whether
the extra sorts below the gather would be cheaper than doing a large sort
on top of it - but I don't see why wouldn't that be the case, and if we
only need a couple of rows from the beginning ...


Ah, I see. Right now we get:

Limit
  ->  Incremental Sort
Sort Key: a, (sum(expensive_function(c)))
Presorted Key: a
->  Finalize GroupAggregate
  Group Key: a, b
  ->  Gather Merge
Workers Planned: 2
->  Partial GroupAggregate
  Group Key: a, b
  ->  Parallel Index Scan using t_a_b on t

even with the parallel additions to create_ordered_paths() -- that
addition doesn't actually add any new parallel paths because
input_rel->partial_pathlist != NIL is false (I'm not sure why yet), so
if we want (if I understand correctly) something more like:



Well, in this particular case it's fairly simple, I think. We only call
the create_ordered_paths() from the grouping planner, on the upper
relation that represents the result of the GROUP BY. But that means it
has to see only the finalized result (after Finalize GroupAggregate). So
it can't see partial aggregation or any other partial path. So in this
case it seems guaranteed (partial_pathlist == NIL).

So maybe we should not be looking at GROUP BY queries, which probably
can't hit this particular code path at all - we need a different type of
upper relation. For example UNION ALL should hit this code, I think.

So maybe try

  select * from t union all select * from t order by a, b limit 10;

and that will hit this condition with partial_pathlist != NIL.

I don't know if such queries can benefit from incremental sort, though.
There are other upper relations too:

 typedef enum UpperRelationKind
 {
 UPPERREL_SETOP, /* result of UNION/INTERSECT/EXCEPT, if any */
 UPPERREL_PARTIAL_GROUP_AGG, /* result of partial grouping/aggregation, if
  * any */
 UPPERREL_GROUP_AGG, /* result of grouping/aggregation, if any */
 UPPERREL_WINDOW,/* result of window functions, if any */
 UPPERREL_DISTINCT,  /* result of "SELECT DISTINCT", if any */
 UPPERREL_ORDERED,   /* result of ORDER BY, if any */
 UPPERREL_FINAL  /* result of any remaining top-level actions */
 /* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
 } UpperRelationKind;



I'm still struggling to understand though how another incremental sort
below the gather-merge would actually be able to help us. For one I'm
not sure it would be less expensive,


That's a good question. I think in general we agree that if we can get
the gather merge to sort the data the way the operation above the
gather merge needs it (that being the first operation that can't run
in parallel mode), that's probably beneficial.

So this pattern seems reasonable:

   -> something
  -> non-parallel operation
 -> gather merge
-> incremental sort
   -> something

And it's likely faster, especially when the parts above this can
leverage the lower startup cost. Say, when there's an explicit LIMIT.

I think the question is where exactly do we add the incremental sort.
It's quite possible some of the places we modified are redundant, at
least for some queries. Not sure.


but more importantly I'm not sure how we could do that and maintain
correctness. Wouldn't a per-worker sum not actually be useful in
sorting since it has no predictable impact on the ordering of the total
sum?


Yes, you're right - that wouldn't be correct. The reason I've been
thinking about such examples is that distributed databases do such
things in some cases, and we might do that too with partitioning.

Consider a simple example

   create table t (a int, b int, c int) partition by hash (a);
   create table t0 partition of t for values with (modulus 4, remainder 0);
   create table t1 partition of t for values with (modulus 4, remainder 1);
   create table t2 partition of t for values with (modulus 4, remainder 2);
   create table t3 partition of t for values with (modulus 4, remainder 3);
   insert into t select mod(i,1000), i, i from generate_series(1,100) s(i);
   analyze t;
   select a, count(b) from t group by a order by a, count(b) limit 10;

In this case we might do a plan similar to what I proposed, assuming
each worker gets to execute on a different partition (because then we
know each worker will see distinct groups, thanks to the partitioning).

But AFAIK we don't do such optimizations yet, so it's probably just a
distraction.


Describing that got me thinking of similar cases where ordering of the
partial aggregate would (in theory) be a correct 

Re: Speed up transaction completion faster after many relations are accessed in a transaction

2019-07-21 Thread David Rowley
On Thu, 18 Jul 2019 at 14:53, David Rowley  wrote:
> Is anyone particularly concerned about the worst-case slowdown here
> being about 1.54%?  The best case, and arguably a more realistic case
> above showed a 34% speedup for the best case.

I took a bit more time to test the performance on this. I thought I
might have been a bit unfair on the patch by giving it completely
empty tables to look at. It just seems too unrealistic to have a large
number of completely empty partitions. I decided to come up with a
more realistic case where there are 140 partitions but we're
performing an index scan to find a record that can exist in only 1 of
the 140 partitions.

create table hp (a int, b int) partition by hash(a);
select 'create table hp'||x||' partition of hp for values with
(modulus 140, remainder ' || x || ');' from generate_series(0,139)x;
create index on hp (b);
insert into hp select x,x from generate_series(1, 14) x;
analyze hp;

select3.sql: select * from hp where b = 1

Master:

$ pgbench -n -f select3.sql -T 60 -M prepared postgres
tps = 2124.591367 (excluding connections establishing)
tps = 2158.754837 (excluding connections establishing)
tps = 2146.348465 (excluding connections establishing)
tps = 2148.995800 (excluding connections establishing)
tps = 2154.223600 (excluding connections establishing)

Patched:

$ pgbench -n -f select3.sql -T 60 -M prepared postgres
tps = 2002.480729 (excluding connections establishing)
tps = 1997.272944 (excluding connections establishing)
tps = 1992.527079 (excluding connections establishing)
tps = 1995.789367 (excluding connections establishing)
tps = 2001.195760 (excluding connections establishing)

so it turned out it's even slower, and not by a small amount either!
Patched is 6.93% slower on average with this case :-(

I went back to the drawing board on this and added some code that
counts the number of times we've seen the table oversized and shrinks
the table back down only on the 1000th time.  6.93% / 1000 is not all
that much. Of course, not all of the extra overhead might be from
rebuilding the table, so here's a test with the updated patch.

$ pgbench -n -f select3.sql -T 60 -M prepared postgres
tps = 2142.414323 (excluding connections establishing)
tps = 2139.666899 (excluding connections establishing)
tps = 2138.744789 (excluding connections establishing)
tps = 2138.300299 (excluding connections establishing)
tps = 2137.346278 (excluding connections establishing)

Just a 0.34% drop. Pretty hard to pick that out of the noise.

Testing the original case that shows the speedup:

create table ht (a int primary key, b int, c int) partition by hash (a);
select 'create table ht' || x::text || ' partition of ht for values
with (modulus 8192, remainder ' || (x)::text || ');' from
generate_series(0,8191) x;
\gexec

select.sql:
\set p 1
select * from ht where a = :p

Master:

$ pgbench -n -f select.sql -T 60 -M prepared postgres
tps = 10172.035036 (excluding connections establishing)
tps = 10192.780529 (excluding connections establishing)
tps = 10331.306003 (excluding connections establishing)

Patched:

$ pgbench -n -f select.sql -T 60 -M prepared postgres
tps = 15080.765549 (excluding connections establishing)
tps = 14994.404069 (excluding connections establishing)
tps = 14982.923480 (excluding connections establishing)

That seems fine, 46% faster.

v6 is attached.

I plan to push this in a few days unless someone objects.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


shrink_bloated_locallocktable_v6.patch
Description: Binary data