Re: [PoC] Reducing planning time when tables have many partitions

2024-05-15 Thread Yuya Watari
Hello,

Thank you for reviewing these patches.

On Thu, May 2, 2024 at 11:35 PM jian he  wrote:
> on v24-0001:
> +/*
> + * add_eq_member - build a new EquivalenceMember and add it to an EC
> + */
> +static EquivalenceMember *
> +add_eq_member(EquivalenceClass *ec, Expr *expr, Relids relids,
> +  JoinDomain *jdomain, Oid datatype)
> +{
> + EquivalenceMember *em = make_eq_member(ec, expr, relids, jdomain,
> +   NULL, datatype);
> +
> + ec->ec_members = lappend(ec->ec_members, em);
> + return em;
> +}
> +
> this part seems so weird to me.
> add_eq_member function was added very very long ago,
> why do we create a function with the same function name?
>
> also I didn't see deletion of original add_eq_member function
> (https://git.postgresql.org/cgit/postgresql.git/tree/src/backend/optimizer/path/equivclass.c#n516)
> in v24-0001.

Actually, this patch does not recreate the add_eq_member() function
but splits it into two functions: add_eq_member() and
make_eq_member().

The reason why planning takes so long time in the current
implementation is that EquivalenceClasses have a large number of child
EquivalenceMembers, and the linear search for them is time-consuming.
To solve this problem, the patch makes EquivalenceClasses have only
parent members. There are few parent members, so we can speed up the
search. In the patch, the child members are introduced when needed.

The add_eq_member() function originally created EquivalenceMembers and
added them to ec_members. In the patch, this function is split into
the following two functions.

1. make_eq_member
Creates a new (parent or child) EquivalenceMember and returns it
without adding it to ec_members.
2. add_eq_member
Creates a new parent (not child) EquivalenceMember and adds it to
ec_members. Internally calls make_eq_member.

When we create parent members, we simply call add_eq_member(). This is
the same as the current implementation. When we create child members,
we have to do something different. Look at the code below. The
add_child_rel_equivalences() function creates child members. The patch
creates child EquivalenceMembers by the make_eq_member() function and
stores them in RelOptInfo (child_rel->eclass_child_members) instead of
their parent EquivalenceClass->ec_members. When we need child
EquivalenceMembers, we get them via RelOptInfos.

=
void
add_child_rel_equivalences(PlannerInfo *root,
   AppendRelInfo *appinfo,
   RelOptInfo *parent_rel,
   RelOptInfo *child_rel)
{
...
i = -1;
while ((i = bms_next_member(parent_rel->eclass_indexes, i)) >= 0)
{
...
foreach(lc, cur_ec->ec_members)
{
...
if (bms_is_subset(cur_em->em_relids, top_parent_relids) &&
!bms_is_empty(cur_em->em_relids))
{
/* OK, generate transformed child version */
...
child_em = make_eq_member(cur_ec, child_expr, new_relids,
  cur_em->em_jdomain,
  cur_em, cur_em->em_datatype);
child_rel->eclass_child_members =
lappend(child_rel->eclass_child_members,
  child_em);
...
}
}
}
}
=

I didn't change the name of add_eq_member, but it might be better to
change it to something like add_parent_eq_member(). Alternatively,
creating a new function named add_child_eq_member() that adds child
members to RelOptInfo can be a solution. I will consider these changes
in the next version.

> Obviously, now I cannot compile it correctly.
> What am I missing?

Thank you for pointing this out. This is due to a conflict with a
recent commit [1]. This commit introduces a new function named
add_setop_child_rel_equivalences(), which is quoted below. This
function creates a new child EquivalenceMember by calling
add_eq_member(). We have to adjust this function to make my patch
work, but it is not so easy. I'm sorry it will take some time to solve
this conflict, but I will post a new version when it is fixed.

=
/*
 * add_setop_child_rel_equivalences
 *  Add equivalence members for each non-resjunk target in 'child_tlist'
 *  to the EquivalenceClass in the corresponding setop_pathkey's pk_eclass.
 *
 * 'root' is the PlannerInfo belonging to the top-level set operation.
 * 'child_rel' is the RelOptInfo of the child relation we're adding
 * EquivalenceMembers for.
 * 'child_tlist' is the target list for the setop child relation.  The target
 * list expressions are what we add as EquivalenceMembers.
 * 'setop_pathkeys' is a list of PathKeys which must contain an entry for each
 * non-resjunk target in 'child_tlist'.
 */
void
add_setop_child_rel_equivalences(PlannerInfo *root, RelOptInfo *child_rel,
 List *child_tlist, List *setop_pathkeys)
{
ListCell   *lc;
ListCell   

Re: [PoC] Reducing planning time when tables have many partitions

2024-05-02 Thread jian he
On Thu, May 2, 2024 at 3:57 PM Yuya Watari  wrote:
>

hi. sorry to bother you, maybe a dumb question.

trying to understand something under the hood.
currently I only applied
v24-0001-Speed-up-searches-for-child-EquivalenceMembers.patch.

on v24-0001:
+/*
+ * add_eq_member - build a new EquivalenceMember and add it to an EC
+ */
+static EquivalenceMember *
+add_eq_member(EquivalenceClass *ec, Expr *expr, Relids relids,
+  JoinDomain *jdomain, Oid datatype)
+{
+ EquivalenceMember *em = make_eq_member(ec, expr, relids, jdomain,
+   NULL, datatype);
+
+ ec->ec_members = lappend(ec->ec_members, em);
+ return em;
+}
+
this part seems so weird to me.
add_eq_member function was added very very long ago,
why do we create a function with the same function name?

also I didn't see deletion of original add_eq_member function
(https://git.postgresql.org/cgit/postgresql.git/tree/src/backend/optimizer/path/equivclass.c#n516)
in v24-0001.

Obviously, now I cannot compile it correctly.
What am I missing?




Re: [PoC] Reducing planning time when tables have many partitions

2024-05-02 Thread Yuya Watari
Hello Ashutosh,

Thank you for your email and for reviewing the patch. I sincerely
apologize for the delay in responding.

On Wed, Mar 6, 2024 at 11:16 PM Ashutosh Bapat
 wrote:
> here's summary of observations
> 1. The patch improves planning time significantly (3X to 20X) and the 
> improvement increases with the number of tables being joined.
> 2. In the assert enabled build the patch slows down (in comparison to HEAD) 
> planning with higher number of tables in the join. You may want to 
> investigate this. But this is still better than my earlier measurements.
> 3. The patch increased memory consumption by planner. But the numbers have 
> improved since my last measurement. Still it will be good to investigate what 
> causes this extra memory consumption.
> 4. Generally with the assert enabled build planner consumes more memory with 
> or without patch. From my previous experience this might be due to Bitmapset 
> objects created within Assert() calls.

Thank you for testing the patch and sharing the results. For comment
#1, these results show the effectiveness of the patch.

For comment #2, I agree that we should not slow down assert-enabled
builds. The patch adds a lot of assertions to avoid adding bugs, but
they might be too excessive. I will reconsider these assertions and
remove unnecessary ones.

For comments #3 and #4, while the patch improves time complexity, it
has some negative impacts on space complexity. The patch uses a
Bitmapset-based index to speed up searching for EquivalenceMembers and
RestrictInfos. Reducing this memory consumption is a little hard, but
this is a very important problem in committing this patch, so I will
investigate this further.

> Does v24-0002 have any relation/overlap with my patches to reduce memory 
> consumed by RestrictInfos? Those patches have code to avoid creating 
> duplicate RestrictInfos (including commuted RestrictInfos) from ECs. [2]

Thank you for sharing these patches. My patch may be related to your
patches. My patch speeds up slow linear searches over
EquivalenceMembers and RestrictInfos. It uses several approaches, one
of which is the Bitmapset-based index. Bitmapsets are space
inefficient, so if there are many EquivalenceMembers and
RestrictInfos, this index becomes large. This is true for highly
partitioned cases, where there are a lot of similar (or duplicate)
elements. Eliminating such duplicate elements may help my patch reduce
memory consumption. I will investigate this further.

Unfortunately, I've been busy due to work, so I may not be able to
respond soon. I really apologize for this. However, I will look into
the patches, including yours, and share further information if found.

Again, I apologize for my late response and appreciate your kind review.

-- 
Best regards,
Yuya Watari




Re: [PoC] Reducing planning time when tables have many partitions

2024-03-06 Thread Ashutosh Bapat
Hi Yuya

On Wed, Feb 28, 2024 at 4:48 PM Yuya Watari  wrote:

> Hello,
>
> On Tue, Feb 13, 2024 at 6:19 AM Alena Rybakina 
> wrote:
> >
> > Yes, it is working correctly now with the assertion check. I suppose
> > it's better to add this code with an additional comment and a
> > recommendation for other developers
> > to use it for checking in case of manipulations with the list of
> > equivalences.
>
> Thank you for your reply and advice. I have added this assertion so
> that other developers can use it in the future.
>
> I also merged recent changes and attached a new version, v24. Since
> this thread is getting long, I will summarize the patches.
>
>
>
I repeated my experiments in [1]. I ran 2, 3, 4, 5-way self-joins on a
partitioned table with 1000 partitions.

Planning time measurement
---
Without patch with an assert enabled build and enable_partitionwise_join =
false, those joins took 435.31 ms, 1629.16 ms, 4701.59 ms and 11976.69 ms
respectively.
Keeping other things the same, with the patch, they took 247.33 ms, 1318.57
ms, 6960.31 ms and 28463.24 ms respectively.
Those with enable_partitionwise_join = true are 488.73 ms, 2102.12 ms,
6906.02 ms and 21300.77 ms respectively without the patch.
And with the patch, 277.22 ms, 1542.48 ms, 7879.35 ms, and 31826.39 ms.

Without patch without assert enabled build and enable_partitionwise_join =
false, the joins take 298.43 ms, 1179.15 ms, 3518.84 ms and 9149.76 ms
respectively.
Keeping other things the same, with the patch, the joins take 65.70 ms,
131.29 ms, 247.67 ms and 477.74 ms respectively.
Those with enable_partitionwise_join = true are 348.48 ms, 1576.11 ms,
5417.98 and 17433.65 ms respectively without the patch.
And with the patch 95.15 ms, 333.99 ms, 1084.06 ms, and 3609.42 ms.

Memory usage measurement
---
Without patch, with an assert enabled build and enable_partitionwise_join =
false, memory used is 19 MB, 45 MB, 83 MB and 149 MB respectively.
Keeping other things the same, with the patch, memory used is 23 MB, 66 MB,
159 MB and 353 MB respectively.
That with enable_partitionwise_join = true is 40 MB, 151 MB, 464 MB and
1663 MB respectively.
And with the patch it is 44 MB, 172 MB, 540 MB and 1868 MB respectively.

Without patch without assert enabled build and enable_partitionwise_join =
false, memory used is 17 MB, 41 MB, 77 MB, and 140 MB resp.
Keeping other things the same with the patch memory used is 21 MB, 62 MB,
152 MB and 341 MB resp.
That with enable_partitionwise_join = true is 37 MB, 138 MB, 428 MB and
1495 MB resp.
And with the patch it is 42 MB, 160 MB, 496 MB and 1705 MB resp.

here's summary of observations
1. The patch improves planning time significantly (3X to 20X) and the
improvement increases with the number of tables being joined.
2. In the assert enabled build the patch slows down (in comparison to HEAD)
planning with higher number of tables in the join. You may want to
investigate this. But this is still better than my earlier measurements.
3. The patch increased memory consumption by planner. But the numbers have
improved since my last measurement. Still it will be good to investigate
what causes this extra memory consumption.
4. Generally with the assert enabled build planner consumes more memory
with or without patch. From my previous experience this might be due to
Bitmapset objects created within Assert() calls.

Does v24-0002 have any relation/overlap with my patches to reduce memory
consumed by RestrictInfos? Those patches have code to avoid creating
duplicate RestrictInfos (including commuted RestrictInfos) from ECs. [2]

[1]
https://www.postgresql.org/message-id/CAExHW5uVZ3E5RT9cXHaxQ_DEK7tasaMN=d6rphcao5gcxan...@mail.gmail.com
[2]
https://www.postgresql.org/message-id/CAExHW5tEvzM%3D%2BLpN%3DyhU%2BP33D%2B%3D7x6fhzwDwNRM971UJunRTkQ%40mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat


Re: [PoC] Reducing planning time when tables have many partitions

2024-02-12 Thread Alena Rybakina

Hi! Sorry my delayed reply too.

On 17.01.2024 12:33, Yuya Watari wrote:

Hello Alena,

Thank you for your quick response, and I'm sorry for my delayed reply.

On Sun, Dec 17, 2023 at 12:41 AM Alena Rybakina
 wrote:

I thought about this earlier and was worried that the index links of the 
equivalence classes might not be referenced correctly for outer joins,
so I decided to just overwrite them and reset the previous ones.

Thank you for pointing this out. I have investigated this problem and
found a potential bug place. The code quoted below modifies
RestrictInfo's clause_relids. Here, our indexes, namely
eclass_source_indexes and eclass_derive_indexes, are based on
clause_relids, so they should be adjusted after the modification.
However, my patch didn't do that, so it may have missed some
references. The same problem occurs in places other than the quoted
one.

=
/*
  * Walker function for replace_varno()
  */
static bool
replace_varno_walker(Node *node, ReplaceVarnoContext *ctx)
{
 ...
 else if (IsA(node, RestrictInfo))
 {
 RestrictInfo *rinfo = (RestrictInfo *) node;
 ...

 if (bms_is_member(ctx->from, rinfo->clause_relids))
 {
 replace_varno((Node *) rinfo->clause, ctx->from, ctx->to);
 replace_varno((Node *) rinfo->orclause, ctx->from, ctx->to);
 rinfo->clause_relids = replace_relid(rinfo->clause_relids,
ctx->from, ctx->to);
 ...
 }
 ...
 }
 ...
}
=

I have attached a new version of the patch, v23, to fix this problem.
v23-0006 adds a helper function called update_clause_relids(). This
function modifies RestrictInfo->clause_relids while adjusting its
related indexes. I have also attached a sanity check patch
(sanity-check.txt) to this email. This sanity check patch verifies
that there are no missing references between RestrictInfos and our
indexes. I don't intend to commit this patch, but it helps find
potential bugs. v23 passes this sanity check, but the v21 you
submitted before does not. This means that the adjustment by
update_clause_relids() is needed to prevent missing references after
modifying clause_relids. I'd appreciate your letting me know if v23
doesn't solve your concern.

One of the things I don't think is good about my approach is that it
adds some complexity to the code. In my approach, all modifications to
clause_relids must be done through the update_clause_relids()
function, but enforcing this rule is not so easy. In this sense, my
patch may need to be simplified more.


this is due to the fact that I explained before: we zeroed the values indicated 
by the indexes,
then this check is not correct either - since the zeroed value indicated by the 
index is correct.
That's why I removed this check.

Thank you for letting me know. I fixed this in v23-0005 to adjust the
indexes in update_eclasses(). With this change, the assertion check
will be correct.

Yes, it is working correctly now with the assertion check. I suppose 
it's better to add this code with an additional comment and a 
recommendation for other developers
to use it for checking in case of manipulations with the list of 
equivalences.


--
Regards,
Alena Rybakina
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [PoC] Reducing planning time when tables have many partitions

2023-12-16 Thread Alena Rybakina

Hi!
On 13.12.2023 09:21, Yuya Watari wrote:

Hello Alena, Andrei, and all,

I am sorry for my late response. I found that the current patches do
not apply to the master, so I have rebased those patches. I have
attached v22. For this later discussion, I separated the rebasing and
bug fixing that Alena did in v21 into separate commits, v22-0003 and
v22-0004. I will merge these commits after the discussion.

1. v22-0003 (solved_conflict_with_self_join_removal.txt)

Thank you!

Thank you for your rebase. Looking at your rebasing patch, I thought
we could do this more simply. Your patch deletes (more precisely, sets
to null) non-redundant members from the root->eq_sources list and
re-adds them to the same list. However, this approach seems a little
waste of memory. Instead, we can update
EquivalenceClass->ec_source_indexes directly. Then, we can reuse the
members in root->eq_sources and don't need to extend root->eq_sources.
I did this in v22-0003. What do you think of this approach?
I thought about this earlier and was worried that the index links of the 
equivalence classes might not be referenced correctly for outer joins,

so I decided to just overwrite them and reset the previous ones.

The main concern with this idea is that it does not fix
RangeTblEntry->eclass_source_indexes. The current code works fine even
if we don't fix the index because get_ec_source_indexes() always does
bms_intersect() for eclass_source_indexes and ec_source_indexes. If we
guaranteed this behavior of doing bms_intersect, then simply modifying
ec_source_indexes would be fine. Fortunately, such a guarantee is not
so difficult.

And your patch removes the following assertion code from the previous
patch. May I ask why you removed this code? I think this assertion is
helpful for sanity checks. Of course, I know that this kind of
assertion will slow down regression tests or assert-enabled builds.
So, we may have to discuss which assertions to keep and which to
discard.

=
-#ifdef USE_ASSERT_CHECKING
-   /* verify the results look sane */
-   i = -1;
-   while ((i = bms_next_member(rel_esis, i)) >= 0)
-   {
-   RestrictInfo *rinfo = list_nth_node(RestrictInfo, root->eq_sources,
-   i);
-
-   Assert(bms_overlap(relids, rinfo->clause_relids));
-   }
-#endif
=
this is due to the fact that I explained before: we zeroed the values 
indicated by the indexes,
then this check is not correct either - since the zeroed value indicated 
by the index is correct.

That's why I removed this check.

Finally, your patch changes the name of the following function. I
understand the need for this change, but it has nothing to do with our
patches, so we should not include it and discuss it in another thread.

=
-update_eclasses(EquivalenceClass *ec, int from, int to)
+update_eclass(PlannerInfo *root, EquivalenceClass *ec, int from, int to)
=

I agree.

2. v22-0004 (bug_related_to_atomic_function.txt)

Thank you for fixing the bug. As I wrote in the previous mail:

On Wed, Nov 22, 2023 at 2:32 PM Yuya Watari  wrote:

On Mon, Nov 20, 2023 at 1:45 PM Andrei Lepikhov
  wrote:

During the work on committing the SJE feature [1], Alexander Korotkov
pointed out the silver lining in this work [2]: he proposed that we
shouldn't remove RelOptInfo from simple_rel_array at all but replace it
with an 'Alias', which will refer the kept relation. It can simplify
further optimizations on removing redundant parts of the query.

Thank you for sharing this information. I think the idea suggested by
Alexander Korotkov is also helpful for our patch. As mentioned above,
the indexes are in RangeTblEntry in the current implementation.
However, I think RangeTblEntry is not the best place to store them. An
'alias' relids may help solve this and simplify fixing the above bug.
I will try this approach soon.

I think that the best way to solve this issue is to move these indexes
from RangeTblEntry to RelOptInfo. Since they are related to planning
time, they should be in RelOptInfo. The reason why I put these indexes
in RangeTblEntry is because some RelOptInfos can be null and we cannot
store the indexes. This problem is similar to an issue regarding
'varno 0' Vars. I hope an alias RelOptInfo would help solve this
issue. I have attached the current proof of concept I am considering
as poc-alias-reloptinfo.txt. To test this patch, please follow the
procedure below.

1. Apply all *.patch files,
2. Apply Alexander Korotkov's alias_relids.patch [1], and
3. Apply poc-alias-reloptinfo.txt, which is attached to this email.

My patch creates a dummy (or an alias) RelOptInfo to store indexes if
the corresponding RelOptInfo is null. The following is the core change
in my patch.

=
@@ -627,9 +627,19 @@ add_eq_source(PlannerInfo *root, EquivalenceClass
*ec, RestrictInfo *rinfo)
 i = -1;
 while ((i = bms_next_member(rinfo->clause_relids, i)) >= 0)
 {
-   RangeTblEntry *rte = root->simple_rte_array[i];
+   

Re: [PoC] Reducing planning time when tables have many partitions

2023-11-29 Thread Yuya Watari
Hello,

On Wed, Nov 22, 2023 at 2:32 PM Yuya Watari  wrote:
> Unfortunately, I've been busy due to work, so I won't be able to
> respond for several weeks. I'm really sorry for not being able to see
> the patches. As soon as I'm not busy, I will look at them, consider
> the above approach, and reply to this thread. If there is no
> objection, I will move this CF entry forward to next CF.

Since the end of this month is approaching, I moved this CF entry to
the next CF (January CF). I will reply to this thread in a few weeks.
Again, I appreciate your kind reviews and patches.

-- 
Best regards,
Yuya Watari




Re: [PoC] Reducing planning time when tables have many partitions

2023-11-21 Thread Yuya Watari
Hello Alena, Andrei, and all,

Thank you for reviewing this patch. I really apologize for not
updating this thread for a while.

On Sat, Nov 18, 2023 at 6:04 AM Alena Rybakina  wrote:
> Hi, all!
>
> While I was reviewing the patches, I noticed that they needed some rebasing, 
> and in one of the patches (Introduce-indexes-for-RestrictInfo.patch) there 
> was a conflict with the recently added self-join-removal feature [1]. So, I 
> rebased patches and resolved the conflicts. While I was doing this, I found a 
> problem that I also fixed:

Thank you very much for rebasing these patches and fixing the issue.
The bug seemed to be caused because these indexes were in
RangeTblEntry, and the handling of their serialization was not
correct. Thank you for fixing it.

On Mon, Nov 20, 2023 at 1:45 PM Andrei Lepikhov
 wrote:
> During the work on committing the SJE feature [1], Alexander Korotkov
> pointed out the silver lining in this work [2]: he proposed that we
> shouldn't remove RelOptInfo from simple_rel_array at all but replace it
> with an 'Alias', which will refer the kept relation. It can simplify
> further optimizations on removing redundant parts of the query.

Thank you for sharing this information. I think the idea suggested by
Alexander Korotkov is also helpful for our patch. As mentioned above,
the indexes are in RangeTblEntry in the current implementation.
However, I think RangeTblEntry is not the best place to store them. An
'alias' relids may help solve this and simplify fixing the above bug.
I will try this approach soon.

Unfortunately, I've been busy due to work, so I won't be able to
respond for several weeks. I'm really sorry for not being able to see
the patches. As soon as I'm not busy, I will look at them, consider
the above approach, and reply to this thread. If there is no
objection, I will move this CF entry forward to next CF.

Again, thank you very much for looking at this thread, and I'm sorry
for my late work.

-- 
Best regards,
Yuya Watari




Re: [PoC] Reducing planning time when tables have many partitions

2023-11-19 Thread Andrei Lepikhov

On 27/9/2023 14:28, Yuya Watari wrote:

Thank you for pointing it out. The ec_source_indexes and
ec_derive_indexes are just picked up from the previous patch, and I
have not changed their design. I think a similar approach to
EquivalenceMembers may be applied to RestrictInfos. I will experiment
with them and share a new patch.


During the work on committing the SJE feature [1], Alexander Korotkov 
pointed out the silver lining in this work [2]: he proposed that we 
shouldn't remove RelOptInfo from simple_rel_array at all but replace it 
with an 'Alias', which will refer the kept relation. It can simplify 
further optimizations on removing redundant parts of the query.


[1] 
https://www.postgresql.org/message-id/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru
[2] 
https://www.postgresql.org/message-id/capphfdsnabg8cak+nj8akig_+_tt07ecstkb1loa50f0ust...@mail.gmail.com


--
regards,
Andrei Lepikhov
Postgres Professional





Re: [PoC] Reducing planning time when tables have many partitions

2023-11-17 Thread Tom Lane
John Naylor  writes:
> On Sat, Nov 18, 2023 at 4:04 AM Alena Rybakina  
> wrote:
>> All diff files have already been added to 
>> v21-0002-Introduce-indexes-for-RestrictInfo patch.

> Unfortunately, the patch tester is too smart for its own good, and
> will try to apply .diff files as well.

Yeah --- see documentation here:

https://wiki.postgresql.org/wiki/Cfbot

That suggests using a .txt extension for anything you don't want to
be taken as part of the patch set.

regards, tom lane




Re: [PoC] Reducing planning time when tables have many partitions

2023-11-17 Thread John Naylor
On Sat, Nov 18, 2023 at 4:04 AM Alena Rybakina  wrote:
>
> All diff files have already been added to 
> v21-0002-Introduce-indexes-for-RestrictInfo patch.

Unfortunately, the patch tester is too smart for its own good, and
will try to apply .diff files as well. Since
bug_related_to_atomic_function.diff is first in the alphabet, it comes
first, which is the reason for the current CI failure.




Re: [PoC] Reducing planning time when tables have many partitions

2023-09-27 Thread Yuya Watari
Hello Ashutosh and Andrey,

On Wed, Sep 20, 2023 at 8:03 PM Ashutosh Bapat
 wrote:
> While working on RestrictInfo translations patch I was thinking on
> these lines. [1] uses hash table for storing translated RestrictInfo.
> An EC can have a hash table to store ec_member translations. The same
> patchset also has some changes in the code which generates
> RestrictInfo clauses from ECs. I think that code will be simplified by
> your approach.

Thank you for sharing this. I agree that we have to avoid adding
complexity to existing or future codes through my patch. As you say,
this approach mentioned in the last email is helpful to simplify the
code, so I will try it.

On Fri, Sep 22, 2023 at 12:49 PM Lepikhov Andrei
 wrote:
> It is okay if we talk about the self-join-removal feature specifically 
> because joins are removed before an inheritance expansion.
> But ec_source_indexes and ec_derive_indexes point to specific places in 
> eq_sources and eq_derives lists. If I removed an EquivalenceClass or a 
> restriction during an optimisation, I would arrange all indexes, too.
> Right now, I use a workaround here and remove the index link without removing 
> the element from the list. But I'm not sure how good this approach can be in 
> perspective.
> So, having eq_sources and eq_derives localised in EC could make such 
> optimisations a bit more simple.

Thank you for pointing it out. The ec_source_indexes and
ec_derive_indexes are just picked up from the previous patch, and I
have not changed their design. I think a similar approach to
EquivalenceMembers may be applied to RestrictInfos. I will experiment
with them and share a new patch.

-- 
Best regards,
Yuya Watari




Re: [PoC] Reducing planning time when tables have many partitions

2023-09-21 Thread Lepikhov Andrei
On Wed, Sep 20, 2023, at 5:04 PM, Yuya Watari wrote:
> On Tue, Sep 19, 2023 at 5:21 PM Andrey Lepikhov
>  wrote:
>> Working on self-join removal in the thread [1] nearby, I stuck into the
>> problem, which made an additional argument to work in this new direction
>> than a couple of previous ones.
>> With indexing positions in the list of equivalence members, we make some
>> optimizations like join elimination more complicated - it may need to
>> remove some clauses and equivalence class members.
>> For changing lists of derives or ec_members, we should go through all
>> the index lists and fix them, which is a non-trivial operation.
>
> Thank you for looking into this and pointing that out. I understand
> that this problem will occur somewhere like your patch [1] quoted
> below because we need to modify RelOptInfo->eclass_child_members in
> addition to ec_members. Is my understanding correct? (Of course, I
> know ec_[no]rel_members, but I doubt we need them.)

It is okay if we talk about the self-join-removal feature specifically because 
joins are removed before an inheritance expansion.
But ec_source_indexes and ec_derive_indexes point to specific places in 
eq_sources and eq_derives lists. If I removed an EquivalenceClass or a 
restriction during an optimisation, I would arrange all indexes, too.
Right now, I use a workaround here and remove the index link without removing 
the element from the list. But I'm not sure how good this approach can be in 
perspective.
So, having eq_sources and eq_derives localised in EC could make such 
optimisations a bit more simple.

-- 
Regards,
Andrei Lepikhov




Re: [PoC] Reducing planning time when tables have many partitions

2023-09-20 Thread Ashutosh Bapat
On Wed, Sep 20, 2023 at 3:35 PM Yuya Watari  wrote:

> I think we may be able to remove the eclass_child_members field by
> making child members on demand. v20 makes child members at
> add_[child_]join_rel_equivalences() and adds them into
> RelOptInfo->eclass_child_members. Instead of doing that, if we
> translate on demand when child members are requested,
> RelOptInfo->eclass_child_members is no longer necessary. After that,
> there is only ec_members, which consists of parent members, so
> removing clauses will still be simple. Do you think this idea will
> solve your problem? If so, I will experiment with this and share a new
> patch version. The main concern with this idea is that the same child
> member will be created many times, wasting time and memory. Some
> techniques like caching might solve this.
>

While working on RestrictInfo translations patch I was thinking on
these lines. [1] uses hash table for storing translated RestrictInfo.
An EC can have a hash table to store ec_member translations. The same
patchset also has some changes in the code which generates
RestrictInfo clauses from ECs. I think that code will be simplified by
your approach.

[1] 
https://www.postgresql.org/message-id/caexhw5u0yyyr2mwvlrvvy_qnld65kpc9u-bo0ox7bglkgba...@mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat




Re: [PoC] Reducing planning time when tables have many partitions

2023-09-20 Thread Yuya Watari
Hello Ashutosh and Andrey,

Thank you for your email, and I really apologize for my late response.

On Thu, Sep 7, 2023 at 3:43 PM Ashutosh Bapat
 wrote:
> It seems that  you are still investigating and fixing issues. But the
> CF entry is marked as "needs review". I think a better status is
> "WoA". Do you agree with that?

Yes, I am now investigating and fixing issues. I agree with you and
changed the entry's status to "Waiting on Author". Thank you for your
advice.

On Tue, Sep 19, 2023 at 5:21 PM Andrey Lepikhov
 wrote:
> Working on self-join removal in the thread [1] nearby, I stuck into the
> problem, which made an additional argument to work in this new direction
> than a couple of previous ones.
> With indexing positions in the list of equivalence members, we make some
> optimizations like join elimination more complicated - it may need to
> remove some clauses and equivalence class members.
> For changing lists of derives or ec_members, we should go through all
> the index lists and fix them, which is a non-trivial operation.

Thank you for looking into this and pointing that out. I understand
that this problem will occur somewhere like your patch [1] quoted
below because we need to modify RelOptInfo->eclass_child_members in
addition to ec_members. Is my understanding correct? (Of course, I
know ec_[no]rel_members, but I doubt we need them.)

=
+static void
+update_eclass(EquivalenceClass *ec, int from, int to)
+{
+   List   *new_members = NIL;
+   ListCell   *lc;
+
+   foreach(lc, ec->ec_members)
+   {
+   EquivalenceMember  *em = lfirst_node(EquivalenceMember, lc);
+   boolis_redundant = false;
+
...
+
+   if (!is_redundant)
+   new_members = lappend(new_members, em);
+   }
+
+   list_free(ec->ec_members);
+   ec->ec_members = new_members;
=

I think we may be able to remove the eclass_child_members field by
making child members on demand. v20 makes child members at
add_[child_]join_rel_equivalences() and adds them into
RelOptInfo->eclass_child_members. Instead of doing that, if we
translate on demand when child members are requested,
RelOptInfo->eclass_child_members is no longer necessary. After that,
there is only ec_members, which consists of parent members, so
removing clauses will still be simple. Do you think this idea will
solve your problem? If so, I will experiment with this and share a new
patch version. The main concern with this idea is that the same child
member will be created many times, wasting time and memory. Some
techniques like caching might solve this.

[1] 
https://www.postgresql.org/message-id/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru

-- 
Best regards,
Yuya Watari




Re: [PoC] Reducing planning time when tables have many partitions

2023-09-19 Thread Andrey Lepikhov

On 25/8/2023 14:39, Yuya Watari wrote:

Hello,

On Wed, Aug 9, 2023 at 8:54 PM David Rowley  wrote:

I think the best way to move this forward is to explore not putting
partitioned table partitions in EMs and instead see if we can
translate to top-level parent before lookups.  This might just be too
complex to translate the Exprs all the time and it may add overhead
unless we can quickly determine somehow that we don't need to attempt
to translate the Expr when the given Expr is already from the
top-level parent. If that can't be made to work, then maybe that shows
the current patch has merit.


Based on your suggestion, I have experimented with not putting child
EquivalenceMembers in an EquivalenceClass. I have attached a new
patch, v20, to this email. The following is a summary of v20.
Working on self-join removal in the thread [1] nearby, I stuck into the 
problem, which made an additional argument to work in this new direction 
than a couple of previous ones.
With indexing positions in the list of equivalence members, we make some 
optimizations like join elimination more complicated - it may need to 
remove some clauses and equivalence class members.
For changing lists of derives or ec_members, we should go through all 
the index lists and fix them, which is a non-trivial operation.


[1] 
https://www.postgresql.org/message-id/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru


--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PoC] Reducing planning time when tables have many partitions

2023-09-07 Thread Ashutosh Bapat
Hi Yuya,

On Fri, Aug 25, 2023 at 1:09 PM Yuya Watari  wrote:
>
> 3. Future works
>
> 3.1. Redundant memory allocation of Lists
>
> When we need child EquivalenceMembers in a loop over ec_members, v20
> adds them to the list. However, since we cannot modify the ec_members,
> v20 always copies it. In most cases, there are only one or two child
> members, so this behavior is a waste of memory and time and not a good
> idea. I didn't address this problem in v20 because doing so could add
> much complexity to the code, but it is one of the major future works.
>
> I suspect that the degradation of Queries A and B is due to this
> problem. The difference between 'make installcheck' and Queries A and
> B is whether there are partitioned tables. Most of the tests in 'make
> installcheck' do not have partitions, so find_relids_top_parents()
> could immediately determine the given Relids are already top-level and
> keep degradation very small. However, since Queries A and B have
> partitions, too frequent allocations of Lists may have caused the
> regression. I hope we can reduce the degradation by avoiding these
> memory allocations. I will continue to investigate and fix this
> problem.
>
> 3.2. em_relids and pull_varnos
>
> I'm sorry that v20 did not address your 1st concern regarding
> em_relids and pull_varnos. I will try to look into this.
>
> 3.3. Indexes for RestrictInfos
>
> Indexes for RestrictInfos are still in RangeTblEntry in v20-0002. I
> will also investigate this issue.
>
> 3.4. Correctness
>
> v20 has passed all regression tests in my environment, but I'm not so
> sure if v20 is correct.
>
> 4. Conclusion
>
> I wrote v20 based on a new idea. It may have a lot of problems, but it
> has advantages. At least it solves your 3rd concern. Since we iterate
> Lists instead of Bitmapsets, we don't have to introduce an iterator
> mechanism. My experiment showed that the 'make installcheck'
> degradation was very small. For the 2nd concern, v20 no longer adds
> child EquivalenceMembers to ec_members. I'm sorry if this is not what
> you intended, but it effectively worked. Again, v20 is a new proof of
> concept. I hope the v20-based approach will be a good alternative
> solution if we can overcome several problems, including what I
> mentioned above.

It seems that  you are still investigating and fixing issues. But the
CF entry is marked as "needs review". I think a better status is
"WoA". Do you agree with that?

-- 
Best Wishes,
Ashutosh Bapat




Re: [PoC] Reducing planning time when tables have many partitions

2023-08-10 Thread Yuya Watari
Hello David,

I really appreciate your quick reply.

On Wed, Aug 9, 2023 at 7:28 PM David Rowley  wrote:
> If 0004 is adding an em_index to mark the index into
> PlannerInfo->eq_members, can't you use that in
> setup_eclass_member[_strict]_iterator to loop to verify that the two
> methods yield the same result?
>
> i.e:
>
> + Bitmapset *matching_ems = NULL;
> + memcpy(_iter, iter, sizeof(EquivalenceMemberIterator));
> + memcpy(_iter, iter, sizeof(EquivalenceMemberIterator));
> +
> + idx_iter.use_index = true;
> + noidx_iter.use_index = false;
> +
> + while ((em = eclass_member_iterator_strict_next(_iter)) != NULL)
> + matching_ems = bms_add_member(matching_ems, em->em_index);
> +
> + Assert(bms_equal(matching_ems, iter->matching_ems));
>
> That should void the complaint that the Assert checking is too slow.
> You can also delete the list_ptr_cmp function too (also noticed a
> complaint about that).

Thanks for sharing your idea regarding this verification. It looks
good to solve the degradation problem in assert-enabled builds. I will
try it.

> For the 0003 patch.  Can you explain why you think these fields should
> be in RangeTblEntry rather than RelOptInfo? I can only guess you might
> have done this for memory usage so that we don't have to carry those
> fields for join rels?  I think RelOptInfo is the correct place to
> store fields that are only used in the planner.  If you put them in
> RangeTblEntry they'll end up in pg_rewrite and be stored for all
> views.  Seems very space inefficient and scary as it limits the scope
> for fixing bugs in back branches due to RangeTblEntries being
> serialized into the catalogues in various places.

This change was not made for performance reasons but to avoid null
reference exceptions. The details are explained in my email [1]. In
brief, the earlier patch did not work because simple_rel_array[i]
could be NULL for some 'i', and we referenced such a RelOptInfo. For
example, the following code snippet in add_eq_member() does not work.
I inserted "Assert(rel != NULL)" into this code, and then the
assertion failed. So, I moved the indexes to RangeTblEntry to address
this issue, but I don't know if this solution is good. We may have to
solve this in a different way.

=
@@ -572,9 +662,31 @@ add_eq_member(EquivalenceClass *ec, Expr *expr,
Relids relids,
+i = -1;
+while ((i = bms_next_member(expr_relids, i)) >= 0)
+{
+RelOptInfo *rel = root->simple_rel_array[i];
+
+rel->eclass_member_indexes =
bms_add_member(rel->eclass_member_indexes, em_index);
+}
=

On Wed, Aug 9, 2023 at 8:54 PM David Rowley  wrote:
> So, I have three concerns with this patch.

> I think the best way to move this forward is to explore not putting
> partitioned table partitions in EMs and instead see if we can
> translate to top-level parent before lookups.  This might just be too
> complex to translate the Exprs all the time and it may add overhead
> unless we can quickly determine somehow that we don't need to attempt
> to translate the Expr when the given Expr is already from the
> top-level parent. If that can't be made to work, then maybe that shows
> the current patch has merit.

I really appreciate your detailed advice. I am sorry that I will not
be able to respond for a week or two due to my vacation, but I will
explore and work on these issues. Thanks again for your kind reply.

[1] 
https://www.postgresql.org/message-id/CAJ2pMkYR_X-%3Dpq%2B39-W5kc0OG7q9u5YUwDBCHnkPur17DXnxuQ%40mail.gmail.com

-- 
Best regards,
Yuya Watari




Re: [PoC] Reducing planning time when tables have many partitions

2023-08-09 Thread David Rowley
On Wed, 9 Aug 2023 at 20:15, Yuya Watari  wrote:
> I agree with your opinion that my patch lacks some explanations, so I
> will consider adding more comments. However, I received the following
> message from David in March.
>
> On Thu, Mar 9, 2023 at 6:23 AM David Rowley  wrote:
> > For the main patch, I've been starting to wonder if it should work
> > completely differently.  Instead of adding members for partitioned and
> > inheritance children, we could just translate the Vars from child to
> > top-level parent and find the member that way. I wondered if this
> > method might be even faster as it would forego
> > add_child_rel_equivalences(). I think we'd still need em_is_child for
> > UNION ALL children.  So far, I've not looked into this in detail. I
> > was hoping to find an idea that would allow some means to have the
> > planner realise that a LIST partition which allows a single Datum
> > could skip pushing base quals which are constantly true. i.e:
> >
> > create table lp (a int) partition by list(a);
> > create table lp1 partition of lp for values in(1);
> > explain select * from lp where a = 1;
> >
> >  Seq Scan on lp1 lp  (cost=0.00..41.88 rows=13 width=4)
> >Filter: (a = 1)
>
> I am concerned that fixing the current patch will conflict with
> David's idea. Of course, I am now trying to experiment with the above
> idea, but I should avoid the conflict if he is working on this. David,
> what do you think about this? Is it OK to post a new patch to address
> the review comments? I am looking forward to your reply.

So, I have three concerns with this patch.

1) I really dislike the way eclass_member_iterator_next() has to check
bms_overlap() to filter out unwanted EMs.  This is required because of
how add_child_rel_equivalences() does not pass the "relids" parameter
in add_eq_member() as equivalent to pull_varnos(expr).  See this code
in master:

/*
* Transform em_relids to match.  Note we do *not* do
* pull_varnos(child_expr) here, as for example the
* transformation might have substituted a constant, but we
* don't want the child member to be marked as constant.
*/
new_relids = bms_difference(cur_em->em_relids,
top_parent_relids);
new_relids = bms_add_members(new_relids, child_relids);


I understand this is done to support Consts in UNION ALL parents, e.g
the following query prunes the n=2 UNION ALL branch

postgres=# explain select * from (select 1 AS n,* from pg_Class c1
union all select 2 AS n,* from pg_Class c2) where n=1;
   QUERY PLAN

 Seq Scan on pg_class c1  (cost=0.00..18.13 rows=413 width=277)
(1 row)

... but the following (existing) comment is just a lie:

Relids em_relids; /* all relids appearing in em_expr */

This means that there's some weirdness on which RelOptInfos we set
eclass_member_indexes.  Do we just set the EM in the RelOptInfos
mentioned in the em_expr, or should it be the ones in em_relids?

You can see the following code I wrote in the 0001 patch which tries
to work around this problem:

+ /*
+ * We must determine the exact set of relids in the expr for child
+ * EquivalenceMembers as what is given to us in 'relids' may differ from
+ * the relids mentioned in the expression.  See add_child_rel_equivalences
+ */
+ if (parent != NULL)
+ expr_relids = pull_varnos(root, (Node *) expr);
+ else
+ {
+ expr_relids = relids;
+ /* We expect the relids to match for non-child members */
+ Assert(bms_equal(pull_varnos(root, (Node *) expr), relids));
+ }

So, you can see we go with the relids from the em_expr rather than
what's mentioned in em_relids.  I believe this means we need the
following line:

+ /*
+ * Don't return members which have no common rels with with_relids
+ */
+ if (!bms_overlap(em->em_relids, iter->with_relids))
+ continue;

I don't quite recall if the em_expr can mention relids that are not in
em_relids or not or if em_expr's relids always is a subset of
em_relids.

I'm just concerned this adds complexity and the risk of mixing up the
meaning (even more than it is already in master). I'm not sure I'm
confident that all this is correct, and I wrote the 0001 patch.

Maybe this can be fixed by changing master so that em_relids always
matches pull_varnos(em_expr)? I'm unsure if there are any other
complexities other than having to ensure we don't set em_is_const for
child members.

2) The 2nd reason is what I hinted at that you quoted in the email I
sent you in March.  I think if it wasn't for UNION ALL and perhaps
table inheritance and we only needed child EMs for partitions of
partitioned tables, then I think we might be able to get away with
just translating Exprs child -> parent before looking up the EM and
likewise when asked to get join quals for child rels, we'd translate
the child relids to their top level parents, find the quals then
translate those back to child form again. EquivalenceClasses would
then only contain a few members and there likely wouldn't be a 

Re: [PoC] Reducing planning time when tables have many partitions

2023-08-09 Thread David Rowley
On Wed, 9 Aug 2023 at 22:28, David Rowley  wrote:
> i.e:
>
> + Bitmapset *matching_ems = NULL;
> + memcpy(_iter, iter, sizeof(EquivalenceMemberIterator));
> + memcpy(_iter, iter, sizeof(EquivalenceMemberIterator));
> +
> + idx_iter.use_index = true;
> + noidx_iter.use_index = false;
> +
> + while ((em = eclass_member_iterator_strict_next(_iter)) != NULL)
> + matching_ems = bms_add_member(matching_ems, em->em_index);
> +
> + Assert(bms_equal(matching_ems, iter->matching_ems));

Slight correction, you could just get rid of idx_iter completely. I
only added that copy since the Assert code needed to iterate and I
didn't want to change the position of the iterator that's actually
being used.  Since the updated code wouldn't be interesting over
"iter", you could just use "iter" directly like I have in the
Assert(bms_equals... code above.

David




Re: [PoC] Reducing planning time when tables have many partitions

2023-08-09 Thread David Rowley
On Wed, 5 Jul 2023 at 21:58, Yuya Watari  wrote:
>
> Hello,
>
> On Fri, Mar 10, 2023 at 5:38 PM Yuya Watari  wrote:
> > Thank you for pointing it out. I have attached the rebased version to
> > this email.
>
> Recent commits, such as a8c09daa8b [1], have caused conflicts and
> compilation errors in these patches. I have attached the fixed version
> to this email.
>
> The v19-0004 adds an 'em_index' field representing the index within
> root->eq_members of the EquivalenceMember. This field is needed to
> delete EquivalenceMembers when iterating them using the ec_members
> list instead of the ec_member_indexes.

If 0004 is adding an em_index to mark the index into
PlannerInfo->eq_members, can't you use that in
setup_eclass_member[_strict]_iterator to loop to verify that the two
methods yield the same result?

i.e:

+ Bitmapset *matching_ems = NULL;
+ memcpy(_iter, iter, sizeof(EquivalenceMemberIterator));
+ memcpy(_iter, iter, sizeof(EquivalenceMemberIterator));
+
+ idx_iter.use_index = true;
+ noidx_iter.use_index = false;
+
+ while ((em = eclass_member_iterator_strict_next(_iter)) != NULL)
+ matching_ems = bms_add_member(matching_ems, em->em_index);
+
+ Assert(bms_equal(matching_ems, iter->matching_ems));

That should void the complaint that the Assert checking is too slow.
You can also delete the list_ptr_cmp function too (also noticed a
complaint about that).

For the 0003 patch.  Can you explain why you think these fields should
be in RangeTblEntry rather than RelOptInfo? I can only guess you might
have done this for memory usage so that we don't have to carry those
fields for join rels?  I think RelOptInfo is the correct place to
store fields that are only used in the planner.  If you put them in
RangeTblEntry they'll end up in pg_rewrite and be stored for all
views.  Seems very space inefficient and scary as it limits the scope
for fixing bugs in back branches due to RangeTblEntries being
serialized into the catalogues in various places.

David




Re: [PoC] Reducing planning time when tables have many partitions

2023-08-09 Thread Yuya Watari
Hello Andrey, Ashutosh, and David,

Thank you for your reply and for reviewing the patch.

On Mon, Aug 7, 2023 at 5:51 PM Andrey Lepikhov
 wrote:
> One more thing: I think, you should add comments to
> add_child_rel_equivalences() and add_child_join_rel_equivalences()
> on replacing of:
>
> if (bms_is_subset(cur_em->em_relids, top_parent_relids) &&
> !bms_is_empty(cur_em->em_relids))
> and
> if (bms_overlap(cur_em->em_relids, top_parent_relids))
>
> with different logic. What was changed? It will be better to help future
> developers realize this part of the code more easily by adding some
> comments.

The following change in add_child_join_rel_equivalences():

-/* Does this member reference child's topmost parent rel? */
-if (bms_overlap(cur_em->em_relids, top_parent_relids))

is correct because EquivalenceMemberIterator guarantees that these two
Relids always overlap for the iterated results. The following code
does this iteration. As seen from the below code, the iteration
eliminates not overlapping Relids, so we do not need to check
bms_overlap() for the iterated results.

=
/*
 * eclass_member_iterator_next
 * Fetch the next EquivalenceMember from an EquivalenceMemberIterator
 * which was set up by setup_eclass_member_iterator().  Returns NULL when
 * there are no more matching EquivalenceMembers.
 */
EquivalenceMember *
eclass_member_iterator_next(EquivalenceMemberIterator *iter)
{
...
ListCell   *lc;

for_each_from(lc, iter->eclass->ec_members, iter->current_index + 1)
{
EquivalenceMember *em = lfirst_node(EquivalenceMember, lc);
...
/*
 * Don't return members which have no common rels with with_relids
 */
if (!bms_overlap(em->em_relids, iter->with_relids))
continue;

return em;
}
return NULL;
...
}
=

I agree with your opinion that my patch lacks some explanations, so I
will consider adding more comments. However, I received the following
message from David in March.

On Thu, Mar 9, 2023 at 6:23 AM David Rowley  wrote:
> For the main patch, I've been starting to wonder if it should work
> completely differently.  Instead of adding members for partitioned and
> inheritance children, we could just translate the Vars from child to
> top-level parent and find the member that way. I wondered if this
> method might be even faster as it would forego
> add_child_rel_equivalences(). I think we'd still need em_is_child for
> UNION ALL children.  So far, I've not looked into this in detail. I
> was hoping to find an idea that would allow some means to have the
> planner realise that a LIST partition which allows a single Datum
> could skip pushing base quals which are constantly true. i.e:
>
> create table lp (a int) partition by list(a);
> create table lp1 partition of lp for values in(1);
> explain select * from lp where a = 1;
>
>  Seq Scan on lp1 lp  (cost=0.00..41.88 rows=13 width=4)
>Filter: (a = 1)

I am concerned that fixing the current patch will conflict with
David's idea. Of course, I am now trying to experiment with the above
idea, but I should avoid the conflict if he is working on this. David,
what do you think about this? Is it OK to post a new patch to address
the review comments? I am looking forward to your reply.

-- 
Best regards,
Yuya Watari




Re: [PoC] Reducing planning time when tables have many partitions

2023-08-08 Thread Ashutosh Bapat
Hi Andrey,

On Tue, Aug 8, 2023 at 8:52 AM Andrey Lepikhov
 wrote:
> It is a positive thing to access some planner internals from the
> console, of course. My point is dedicated to the structuration of an
> EXPLAIN output and is caused by two reasons:
> 1. I use the EXPLAIN command daily to identify performance issues and
> the optimiser's weak points. According to the experience, when you have
> an 'explain analyze' containing more than 100 strings, you try removing
> unnecessary information to improve observability. It would be better to
> have the possibility to see an EXPLAIN with different levels of the
> output details. Flexibility here reduces a lot of manual work, sometimes.

I use the json output format to extract the interesting parts of
EXPLAIN output. See my SQL scripts attached upthread. That way I can
ignore new additions like this.

> 2. Writing extensions and having an explain analyze in the regression
> test, we must create masking functions just to make the test more
> stable. That additional work can be avoided with another option, like
> MEMUSAGE ON/OFF.

We already have a masking function in-place. See changes to
explain.out in my proposed patch at [1]

> > I will propose it as a separate patch in the next commitfest and will
> > seek opinions from other hackers.
> Cool, good news.

Done. Commitfest entry https://commitfest.postgresql.org/44/4492/

[1] 
https://www.postgresql.org/message-id/CAExHW5sZA=5lj_zppro-w09ck8z9p7eayaqq3ks9gdfhrxe...@mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat




Re: [PoC] Reducing planning time when tables have many partitions

2023-08-07 Thread Andrey Lepikhov

On 7/8/2023 19:15, Ashutosh Bapat wrote:



On Mon, Aug 7, 2023 at 2:21 PM Andrey Lepikhov 
mailto:a.lepik...@postgrespro.ru>> wrote:


 >> Do you think that the memory measurement patch I have shared in
those threads is useful in itself? If so, I will start another
proposal to address it.
 >
 > For me, who is developing the planner in this thread, the memory
 > measurement patch is useful. However, most users do not care about
 > memory usage, so there is room for consideration. For example, making
 > the metrics optional in EXPLAIN ANALYZE outputs might be better.
 >
+1. Any memory-related info in the output of EXPLAIN ANALYZE makes
tests
more complex because of architecture dependency.


As far as the tests go, the same is the case with planning time and 
execution time. They change even without changing the architecture. But 
we have tests which mask the actual values. Something similar will be 
done to the planning memory.
It is a positive thing to access some planner internals from the 
console, of course. My point is dedicated to the structuration of an 
EXPLAIN output and is caused by two reasons:
1. I use the EXPLAIN command daily to identify performance issues and 
the optimiser's weak points. According to the experience, when you have 
an 'explain analyze' containing more than 100 strings, you try removing 
unnecessary information to improve observability. It would be better to 
have the possibility to see an EXPLAIN with different levels of the 
output details. Flexibility here reduces a lot of manual work, sometimes.
2. Writing extensions and having an explain analyze in the regression 
test, we must create masking functions just to make the test more 
stable. That additional work can be avoided with another option, like 
MEMUSAGE ON/OFF.


So, in my opinion, it would be better to introduce this new output data 
guarded by additional option.




I will propose it as a separate patch in the next commitfest and will 
seek opinions from other hackers.

Cool, good news.

--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PoC] Reducing planning time when tables have many partitions

2023-08-07 Thread Ashutosh Bapat
On Mon, Aug 7, 2023 at 2:21 PM Andrey Lepikhov 
wrote:

> >> Do you think that the memory measurement patch I have shared in those
> threads is useful in itself? If so, I will start another proposal to
> address it.
> >
> > For me, who is developing the planner in this thread, the memory
> > measurement patch is useful. However, most users do not care about
> > memory usage, so there is room for consideration. For example, making
> > the metrics optional in EXPLAIN ANALYZE outputs might be better.
> >
> +1. Any memory-related info in the output of EXPLAIN ANALYZE makes tests
> more complex because of architecture dependency.
>
>
As far as the tests go, the same is the case with planning time and
execution time. They change even without changing the architecture. But we
have tests which mask the actual values. Something similar will be done to
the planning memory.

I will propose it as a separate patch in the next commitfest and will seek
opinions from other hackers.

-- 
Best Wishes,
Ashutosh Bapat


Re: [PoC] Reducing planning time when tables have many partitions

2023-08-07 Thread Andrey Lepikhov

On 7/8/2023 15:19, Yuya Watari wrote:

Hello,

Thank you for your reply.

On Thu, Aug 3, 2023 at 10:29 PM Ashutosh Bapat
 wrote:

If you think that the verification is important to catch bugs, you may want to 
encapsulate it with an #ifdef .. #endif such that the block within is not 
compiled by default. See OPTIMIZER_DEBUG for example.


In my opinion, verifying the iteration results is only necessary to
avoid introducing bugs while developing this patch. The verification
is too excessive for regular development of PostgreSQL. I agree that
we should avoid a significant degradation in assert enabled builds, so
I will consider removing it.
I should admit, these checks has helped me during backpatching this 
feature to pg v.13 (users crave speed up of query planning a lot). Maybe 
it is a sign of a lack of tests, but in-fact, it already has helped.


One more thing: I think, you should add comments to 
add_child_rel_equivalences() and add_child_join_rel_equivalences()

on replacing of:

if (bms_is_subset(cur_em->em_relids, top_parent_relids) &&
!bms_is_empty(cur_em->em_relids))
and
if (bms_overlap(cur_em->em_relids, top_parent_relids))

with different logic. What was changed? It will be better to help future 
developers realize this part of the code more easily by adding some 
comments.



Do you think that the memory measurement patch I have shared in those threads 
is useful in itself? If so, I will start another proposal to address it.


For me, who is developing the planner in this thread, the memory
measurement patch is useful. However, most users do not care about
memory usage, so there is room for consideration. For example, making
the metrics optional in EXPLAIN ANALYZE outputs might be better.

+1. Any memory-related info in the output of EXPLAIN ANALYZE makes tests 
more complex because of architecture dependency.


--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PoC] Reducing planning time when tables have many partitions

2023-08-07 Thread Yuya Watari
Hello,

Thank you for your reply.

On Thu, Aug 3, 2023 at 10:29 PM Ashutosh Bapat
 wrote:
> If you think that the verification is important to catch bugs, you may want 
> to encapsulate it with an #ifdef .. #endif such that the block within is not 
> compiled by default. See OPTIMIZER_DEBUG for example.

In my opinion, verifying the iteration results is only necessary to
avoid introducing bugs while developing this patch. The verification
is too excessive for regular development of PostgreSQL. I agree that
we should avoid a significant degradation in assert enabled builds, so
I will consider removing it.

> Do you think that the memory measurement patch I have shared in those threads 
> is useful in itself? If so, I will start another proposal to address it.

For me, who is developing the planner in this thread, the memory
measurement patch is useful. However, most users do not care about
memory usage, so there is room for consideration. For example, making
the metrics optional in EXPLAIN ANALYZE outputs might be better.

-- 
Best regards,
Yuya Watari




Re: [PoC] Reducing planning time when tables have many partitions

2023-08-03 Thread Ashutosh Bapat
On Wed, Aug 2, 2023 at 12:11 PM Yuya Watari  wrote:

> Hello,
>
> I really appreciate sharing very useful scripts and benchmarking results.
>
> On Fri, Jul 28, 2023 at 6:51 PM Ashutosh Bapat
>  wrote:
> > Given that most of the developers run assert enabled builds it would
> > be good to bring down the degradation there while keeping the
> > excellent speedup in non-assert builds.
>
> From my observation, this degradation in assert enabled build is
> caused by verifying the iteration results of EquivalenceMembers. My
> patch uses Bitmapset-based indexes to speed up the iteration. When
> assertions are enabled, we verify that the result of the iteration is
> the same with and without the indexes. This verification results in
> executing a similar loop three times, which causes the degradation. I
> measured planning time by using your script without this verification.
> The results are as follows:
>
> Master: 144.55 ms
> Patched (v19): 529.85 ms
> Patched (v19) without verification: 78.84 ms
> (*) All runs are with assertions.
>
> As seen from the above, verifying iteration results was the cause of
> the performance degradation. I agree that we should avoid such
> degradation because it negatively affects the development of
> PostgreSQL. Removing the verification when committing this patch is
> one possible option.
>

If you think that the verification is important to catch bugs, you may want
to encapsulate it with an #ifdef .. #endif such that the block within is
not compiled by default. See OPTIMIZER_DEBUG for example.


>
> > Queries on partitioned tables eat a lot of memory anyways, increasing
> > that further should be avoided.
> >
> > I have not studied the patches. But I think the memory increase has to
> > do with our Bitmapset structure. It's space inefficient when there are
> > thousands of partitions involved. See my comment at [2]
>
> Thank you for pointing this out. I have never considered the memory
> usage impact of this patch. As you say, the Bitmapset structure caused
> this increase. I will try to look into this further.
>
>
Do you think that the memory measurement patch I have shared in those
threads is useful in itself? If so, I will start another proposal to
address it.

-- 
Best Wishes,
Ashutosh Bapat


Re: [PoC] Reducing planning time when tables have many partitions

2023-08-03 Thread Yuya Watari
Hello,

On Wed, Aug 2, 2023 at 6:43 PM Andrey Lepikhov
 wrote:
> You introduced list_ptr_cmp as an extern function of a List, but use it
> the only under USE_ASSERT_CHECKING ifdef.
> Maybe you hide it under USE_ASSERT_CHECKING or remove all the stuff?

Thank you for your quick reply and for pointing that out. If we remove
the verification code when committing this patch, we should also
remove the list_ptr_cmp() function because nobody will use it. If we
don't remove the verification, whether to hide it by
USE_ASSERT_CHECKING is a difficult question. The list_ptr_cmp() can be
used for generic use and is helpful even without assertions, so not
hiding it is one option. However, I understand that it is not pretty
to have the function compiled even though it is not referenced from
anywhere when assertions are disabled. As you say, I think hiding it
by USE_ASSERT_CHECKING is also a possible solution.

-- 
Best regards,
Yuya Watari




Re: [PoC] Reducing planning time when tables have many partitions

2023-08-02 Thread Andrey Lepikhov

On 2/8/2023 13:40, Yuya Watari wrote:

As seen from the above, verifying iteration results was the cause of
the performance degradation. I agree that we should avoid such
degradation because it negatively affects the development of
PostgreSQL. Removing the verification when committing this patch is
one possible option.
You introduced list_ptr_cmp as an extern function of a List, but use it 
the only under USE_ASSERT_CHECKING ifdef.

Maybe you hide it under USE_ASSERT_CHECKING or remove all the stuff?

--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PoC] Reducing planning time when tables have many partitions

2023-08-02 Thread Yuya Watari
Hello,

I really appreciate sharing very useful scripts and benchmarking results.

On Fri, Jul 28, 2023 at 6:51 PM Ashutosh Bapat
 wrote:
> Given that most of the developers run assert enabled builds it would
> be good to bring down the degradation there while keeping the
> excellent speedup in non-assert builds.

>From my observation, this degradation in assert enabled build is
caused by verifying the iteration results of EquivalenceMembers. My
patch uses Bitmapset-based indexes to speed up the iteration. When
assertions are enabled, we verify that the result of the iteration is
the same with and without the indexes. This verification results in
executing a similar loop three times, which causes the degradation. I
measured planning time by using your script without this verification.
The results are as follows:

Master: 144.55 ms
Patched (v19): 529.85 ms
Patched (v19) without verification: 78.84 ms
(*) All runs are with assertions.

As seen from the above, verifying iteration results was the cause of
the performance degradation. I agree that we should avoid such
degradation because it negatively affects the development of
PostgreSQL. Removing the verification when committing this patch is
one possible option.

> Queries on partitioned tables eat a lot of memory anyways, increasing
> that further should be avoided.
>
> I have not studied the patches. But I think the memory increase has to
> do with our Bitmapset structure. It's space inefficient when there are
> thousands of partitions involved. See my comment at [2]

Thank you for pointing this out. I have never considered the memory
usage impact of this patch. As you say, the Bitmapset structure caused
this increase. I will try to look into this further.

-- 
Best regards,
Yuya Watari




Re: [PoC] Reducing planning time when tables have many partitions

2023-07-28 Thread Ashutosh Bapat
Hi Yuya, Andrey,

On Fri, Jul 28, 2023 at 9:58 AM Andrey Lepikhov
 wrote:

> >>
> > Discovering quality of partition pruning at the stage of execution
> > initialization and using your set of patches I have found some dubious
> > results with performance degradation. Look into the test case in
> > attachment.
> > Here is three queries. Execution times:
> > 1 - 8s; 2 - 30s; 3 - 131s (with your patch set).
> > 1 - 5s; 2 - 10s; 3 - 33s (current master).
> >
> > Maybe it is a false alarm, but on my laptop I see this degradation at
> > every launch.
> Sorry for this. It was definitely a false alarm. In this patch,
> assertion checking adds much overhead. After switching it off, I found
> out that this feature solves my problem with a quick pass through the
> members of an equivalence class. Planning time results for the queries
> from the previous letter:
> 1 - 0.4s, 2 - 1.3s, 3 - 1.3s; (with the patches applied)
> 1 - 5s; 2 - 8.7s; 3 - 22s; (current master).

I measured planning time using my scripts setup.sql and queries.sql
attached to [1] with and without assert build using your patch. The
timings are recorded in the attached spreadsheet. I have following
observations

1. The patchset improves the planning time of queries involving
partitioned tables by an integral factor. Both in case of
partitionwise join and without it. The speedup is 5x to 21x in my
experiment. That's huge.
2. There's slight degradation in planning time of queries involving
unpartitioned tables. But I have seen that much variance usually.
3. assert and debug enabled build shows degradation in planning time
in all the cases.
4. There is substantial memory increase in all the cases. It's
percentage wise predominant when the partitionwise join is not used.

Given that most of the developers run assert enabled builds it would
be good to bring down the degradation there while keeping the
excellent speedup in non-assert builds.
Queries on partitioned tables eat a lot of memory anyways, increasing
that further should be avoided.

I have not studied the patches. But I think the memory increase has to
do with our Bitmapset structure. It's space inefficient when there are
thousands of partitions involved. See my comment at [2]

[1] 
https://www.postgresql.org/message-id/caexhw5stmouobe55pmt83r8uxvfcph+pvo5dnpdrvcsbgxe...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAExHW5s4EqY43oB%3Dne6B2%3D-xLgrs9ZGeTr1NXwkGFt2j-OmaQQ%40mail.gmail.com

--
Best Wishes,
Ashutosh Bapat


planning time measurement.ods
Description: application/vnd.oasis.opendocument.spreadsheet


Re: [PoC] Reducing planning time when tables have many partitions

2023-07-28 Thread Yuya Watari
Hello,

On Fri, Jul 28, 2023 at 1:27 PM Andrey Lepikhov
 wrote:
> Sorry for this. It was definitely a false alarm. In this patch,
> assertion checking adds much overhead. After switching it off, I found
> out that this feature solves my problem with a quick pass through the
> members of an equivalence class. Planning time results for the queries
> from the previous letter:
> 1 - 0.4s, 2 - 1.3s, 3 - 1.3s; (with the patches applied)
> 1 - 5s; 2 - 8.7s; 3 - 22s; (current master).
>
> I have attached flamegraph that shows query 2 planning process after
> applying this set of patches. As you can see, overhead at the
> equivalence class routines has gone.

I really appreciate testing the patches and sharing your results. The
results are interesting because they show that our optimization
effectively reduces planning time for your workload containing
different queries than I have used in my benchmarks.

Thank you again for reviewing this.

-- 
Best regards,
Yuya Watari




Re: [PoC] Reducing planning time when tables have many partitions

2023-07-27 Thread Andrey Lepikhov

On 5/7/2023 16:57, Yuya Watari wrote:

Hello,

On Fri, Mar 10, 2023 at 5:38 PM Yuya Watari  wrote:

Thank you for pointing it out. I have attached the rebased version to
this email.


Recent commits, such as a8c09daa8b [1], have caused conflicts and
compilation errors in these patches. I have attached the fixed version
to this email.

The v19-0004 adds an 'em_index' field representing the index within
root->eq_members of the EquivalenceMember. This field is needed to
delete EquivalenceMembers when iterating them using the ec_members
list instead of the ec_member_indexes.

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a8c09daa8bb1d741bb8b3d31a12752448eb6fb7c

Discovering quality of partition pruning at the stage of execution 
initialization and using your set of patches I have found some dubious 
results with performance degradation. Look into the test case in attachment.

Here is three queries. Execution times:
1 - 8s; 2 - 30s; 3 - 131s (with your patch set).
1 - 5s; 2 - 10s; 3 - 33s (current master).

Maybe it is a false alarm, but on my laptop I see this degradation at 
every launch.


--
regards,
Andrey Lepikhov
Postgres Professional


parts-problem.sql
Description: application/sql


Re: [PoC] Reducing planning time when tables have many partitions

2023-03-08 Thread David Rowley
On Thu, 9 Mar 2023 at 01:34, Alvaro Herrera  wrote:
> David, do you intend to continue to be involved in reviewing this one?

Yes. I'm currently trying to make a few Bitmapset improvements which
include the change made in this thread's 0001 patch over on [1].

For the main patch, I've been starting to wonder if it should work
completely differently.  Instead of adding members for partitioned and
inheritance children, we could just translate the Vars from child to
top-level parent and find the member that way. I wondered if this
method might be even faster as it would forego
add_child_rel_equivalences(). I think we'd still need em_is_child for
UNION ALL children.  So far, I've not looked into this in detail. I
was hoping to find an idea that would allow some means to have the
planner realise that a LIST partition which allows a single Datum
could skip pushing base quals which are constantly true. i.e:

create table lp (a int) partition by list(a);
create table lp1 partition of lp for values in(1);
explain select * from lp where a = 1;

 Seq Scan on lp1 lp  (cost=0.00..41.88 rows=13 width=4)
   Filter: (a = 1)

David

[1] 
https://postgr.es/m/caaphdvq9eq0w_afugrb6ba28ieuqn4zm5uwqxy7+lmzjjc+...@mail.gmail.com




Re: [PoC] Reducing planning time when tables have many partitions

2023-03-08 Thread Alvaro Herrera
Hello Watari-san, this patch does not currently apply.  Could you please
rebase?

David, do you intend to continue to be involved in reviewing this one?

Thanks to both,

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"All rings of power are equal,
But some rings of power are more equal than others."
 (George Orwell's The Lord of the Rings)




Re: [PoC] Reducing planning time when tables have many partitions

2023-02-14 Thread Andrey Lepikhov

On 2/6/23 06:47, Yuya Watari wrote:

Of course, I'm not sure if my approach in v16-0003 is ideal, but it
may help solve your concern above. Since simple_rel_array[0] is no
longer necessary with my patch, I removed the setup_append_rel_entry()
function in v16-0004. However, to work the patch, I needed to change
some assertions in v16-0005. For more details, please see the commit
message of v16-0005. After these works, the attached patches passed
all regression tests in my environment.

Instead of my approach, imitating the following change to
get_eclass_indexes_for_relids() is also a possible solution. Ignoring
NULL RelOptInfos enables us to avoid the segfault, but we have to
adjust EquivalenceMemberIterator to match the result, and I'm not sure
if this idea is correct.
As I see, You moved the indexes from RelOptInfo to PlannerInfo. May be 
better to move them into RangeTblEntry instead?


--
Regards
Andrey Lepikhov
Postgres Professional





Re: [PoC] Reducing planning time when tables have many partitions

2022-12-06 Thread Thom Brown
On Mon, 5 Dec 2022 at 21:28, David Rowley  wrote:
>
> On Tue, 6 Dec 2022 at 04:45, Thom Brown  wrote:
> > Testing your patches with the same 1024 partitions, each with 64
> > sub-partitions, I get a planning time of 205.020 ms, which is now a
> > 1,377x speedup.  This has essentially reduced the planning time from a
> > catastrophe to a complete non-issue.  Huge win!
>
> Thanks for testing the v10 patches.
>
> I wouldn't have expected such additional gains from v10. I was mostly
> focused on trying to minimise any performance regression for simple
> queries that wouldn't benefit from indexing the EquivalenceMembers.
> Your query sounds like it does not fit into that category.  Perhaps it
> is down to the fact that v9-0002 or v9-0003 reverts a couple of the
> optimisations that is causing v9 to be slower than v10 for your query.
> It's hard to tell without more details of what you're running.

I celebrated prematurely as I neglected to wait for the 6th execution
of the prepared statement, which shows the real result.  With the v10
patches, it takes 5632.040 ms, a speedup of 50x.

Testing the v9 patches, the same query takes 3388.173 ms, a speedup of
83x.  And re-testing v8, I'm getting roughly the same times.  These
are all with a cold cache.

So the result isn't as dramatic as I had initially interpreted it to
have unfortunately.

-- 
Thom




Re: [PoC] Reducing planning time when tables have many partitions

2022-12-05 Thread David Rowley
On Tue, 6 Dec 2022 at 04:45, Thom Brown  wrote:
> Testing your patches with the same 1024 partitions, each with 64
> sub-partitions, I get a planning time of 205.020 ms, which is now a
> 1,377x speedup.  This has essentially reduced the planning time from a
> catastrophe to a complete non-issue.  Huge win!

Thanks for testing the v10 patches.

I wouldn't have expected such additional gains from v10. I was mostly
focused on trying to minimise any performance regression for simple
queries that wouldn't benefit from indexing the EquivalenceMembers.
Your query sounds like it does not fit into that category.  Perhaps it
is down to the fact that v9-0002 or v9-0003 reverts a couple of the
optimisations that is causing v9 to be slower than v10 for your query.
It's hard to tell without more details of what you're running.

Is this a schema and query you're able to share? Or perhaps mock up a
script of something similar enough to allow us to see why v9 and v10
are so different?

Additionally, it would be interesting to see if patching with v10-0002
alone helps the performance of your query at all. I didn't imagine
that change would give us anything easily measurable, but partition
pruning makes extensive use of Bitmapsets, so perhaps you've found
something. If you have then it might be worth considering v10-0002
independently of the EquivalenceMember indexing work.

David




Re: [PoC] Reducing planning time when tables have many partitions

2022-12-05 Thread Thom Brown
On Sun, 4 Dec 2022 at 00:35, David Rowley  wrote:
>
> On Tue, 29 Nov 2022 at 21:59, Yuya Watari  wrote:
> > Thank you for testing the patch with an actual query. This speedup is
> > very impressive. When I used an original query with 1024 partitions,
> > its planning time was about 200ms. Given that each partition is also
> > partitioned in your workload, I think the result of 1415ms is
> > reasonable.
>
> I was looking again at the v9-0001 patch and I think we can do a
> little better when building the Bitmapset of matching EMs.  For
> example, in the v9 patch, the code for get_ecmember_indexes_strict()
> is doing:
>
> + if (!with_children)
> + matching_ems = bms_copy(ec->ec_nonchild_indexes);
> + else
> + matching_ems = bms_copy(ec->ec_member_indexes);
> +
> + i = -1;
> + while ((i = bms_next_member(relids, i)) >= 0)
> + {
> + RelOptInfo *rel = root->simple_rel_array[i];
> +
> + matching_ems = bms_int_members(matching_ems, 
> rel->eclass_member_indexes);
> + }
>
> It seems reasonable that if there are a large number of partitions
> then ec_member_indexes will have a large number of Bitmapwords.  When
> we do bms_int_members() on that, we're going to probably end up with a
> bunch of trailing zero words in the set.  In the v10 patch, I've
> changed this to become:
>
> +inti = bms_next_member(relids, -1);
> +
> +if (i >= 0)
> +{
> +RelOptInfo *rel = root->simple_rel_array[i];
> +
> +/*
> + * bms_intersect to the first relation to try to keep the resulting
> + * Bitmapset as small as possible.  This saves having to make a
> + * complete bms_copy() of one of them.  One may contain significantly
> + * more words than the other.
> + */
> +if (!with_children)
> +matching_ems = bms_intersect(rel->eclass_member_indexes,
> + ec->ec_nonchild_indexes);
> +else
> +matching_ems = bms_intersect(rel->eclass_member_indexes,
> + ec->ec_member_indexes);
> +
> +while ((i = bms_next_member(relids, i)) >= 0)
> +{
> +rel = root->simple_rel_array[i];
> +matching_ems = bms_int_members(matching_ems,
> +   rel->eclass_member_indexes);
> +}
> +}
>
> so, effectively we first bms_intersect to the first member of relids
> before masking out the bits for the remaining ones.  This should mean
> we'll have a Bitmapset with fewer words in many complex planning
> problems. There's no longer the dilemma of having to decide if we
> should start with RelOptInfo's eclass_member_indexes or the
> EquivalenceClass's member indexes.  When using bms_int_member, we
> really want to start with the smallest of those so we get the smallest
> resulting set.  With bms_intersect(), it will always make a copy of
> the smallest set. v10 does that instead of bms_copy()ing the
> EquivalenceClass's member's Bitmapset.
>
> I also wondered how much we're losing to the fact that
> bms_int_members() zeros the trailing words and does not trim the
> Bitmapset down.
>
> The problem there is 2-fold;
> 1) we have to zero the trailing words on the left input. That'll
> pollute the CPU cache a bit as it may have to fetch a bunch of extra
> cache lines, and;
> 2) subsequent bms_int_members() done afterwards may have to mask out
> additional words. If we can make the shortest input really short, then
> subsequent bms_int_members() are going to be very fast.
>
> You might argue there that setting nwords to the shortest length may
> cause us to have to repalloc the Bitmapset if we need to later add
> more members again, but if you look at the repalloc() code, it's
> effectively a no-op when the allocated size >= the requested size, so
> repalloc() should be very fast in this case. So, worst case, there's
> an additional "no-op" repalloc() (which should be very fast) followed
> by maybe a bms_add_members() which has to zero the words instead of
> bms_int_members(). I changed this in the v10-0002 patch. I'm not sure
> if we should do this or not.
>
> I also changed v10-0001 so that we still store the EquivalenceClass's
> members list.  There were a few places where the code just wanted to
> get the first member and having to look at the Bitmapset index and
> fetch the first match from PlannerInfo seemed convoluted.  If the
> query is simple, it seems like it's not going to be very expensive to
> add a few EquivalenceMembers to this list. When planning more complex
> problems, there's probably enough other extra overhead that we're
> unlikely to notice the extra lappend()s.  This also allows v10-0003 to
> work, see below.
>
> In v10-0003, I experimented with the iterator concept that I mentioned
> earlier.  Since v10-0001 is now storing the EquivalenceMember list in
> EquivalenceClass again, it's now quite simple to have the iterator
> decide if it should be scanning the 

Re: [PoC] Reducing planning time when tables have many partitions

2022-11-29 Thread Yuya Watari
Dear Andrey and Thom,

Thank you for reviewing and testing the patch. I really apologize for
my late response.

On Tue, Nov 8, 2022 at 8:31 PM Andrey Lepikhov
 wrote:
> Looking into find_em_for_rel() changes I see that you replaced
> if (bms_is_subset(em->em_relids, rel->relids)
> with assertion statement.
> According of get_ecmember_indexes(), the em_relids field of returned
> equivalence members can contain relids, not mentioned in the relation.
> I don't understand, why it works now? For example, we can sort by t1.x,
> but have an expression t1.x=t1.y*t2.z. Or I've missed something? If it
> is not a mistake, maybe to add a comment why assertion here isn't failed?

As you pointed out, changing the bms_is_subset() condition to an
assertion is logically incorrect here. Thank you for telling me about
it. I fixed it and attached the modified patch to this email.

On Thu, Nov 17, 2022 at 9:05 PM Thom Brown  wrote:
> No issues with applying. Created 1024 partitions, each of which is
> partitioned into 64 partitions.
>
> I'm getting a generic planning time of 1415ms. Is that considered
> reasonable in this situation? Bear in mind that the planning time
> prior to this patch was 282311ms, so pretty much a 200x speedup.

Thank you for testing the patch with an actual query. This speedup is
very impressive. When I used an original query with 1024 partitions,
its planning time was about 200ms. Given that each partition is also
partitioned in your workload, I think the result of 1415ms is
reasonable.

-- 
Best regards,
Yuya Watari


v9-0001-Apply-eclass_member_speedup_v3.patch.patch
Description: Binary data


v9-0002-Revert-one-of-the-optimizations.patch
Description: Binary data


v9-0003-Use-conventional-algorithm-for-smaller-size-cases.patch
Description: Binary data


v9-0004-Fix-incorrect-assertion.patch
Description: Binary data


Re: [PoC] Reducing planning time when tables have many partitions

2022-11-17 Thread Thom Brown
On Thu, 17 Nov 2022 at 11:20, Thom Brown  wrote:
>
> On Thu, 17 Nov 2022 at 09:31, Alvaro Herrera  wrote:
> >
> > On 2022-Nov-16, Thom Brown wrote:
> >
> > > Once the issue Tom identified has been resolved, I'd like to test
> > > drive newer patches.
> >
> > What issue?  If you mean the one from the thread "Reducing
> > duplicativeness of EquivalenceClass-derived clauses", that patch is
> > already applied (commit a5fc46414deb), and Yuya Watari's v8 series
> > applies fine to current master.
>
> Ah, I see..  I'll test the v8 patches.

No issues with applying.  Created 1024 partitions, each of which is
partitioned into 64 partitions.

I'm getting a generic planning time of 1415ms.  Is that considered
reasonable in this situation?  Bear in mind that the planning time
prior to this patch was 282311ms, so pretty much a 200x speedup.

Thom




Re: [PoC] Reducing planning time when tables have many partitions

2022-11-17 Thread Thom Brown
On Thu, 17 Nov 2022 at 09:31, Alvaro Herrera  wrote:
>
> On 2022-Nov-16, Thom Brown wrote:
>
> > Once the issue Tom identified has been resolved, I'd like to test
> > drive newer patches.
>
> What issue?  If you mean the one from the thread "Reducing
> duplicativeness of EquivalenceClass-derived clauses", that patch is
> already applied (commit a5fc46414deb), and Yuya Watari's v8 series
> applies fine to current master.

Ah, I see..  I'll test the v8 patches.

Thanks

Thom




Re: [PoC] Reducing planning time when tables have many partitions

2022-11-17 Thread Alvaro Herrera
On 2022-Nov-16, Thom Brown wrote:

> Once the issue Tom identified has been resolved, I'd like to test
> drive newer patches.

What issue?  If you mean the one from the thread "Reducing
duplicativeness of EquivalenceClass-derived clauses", that patch is
already applied (commit a5fc46414deb), and Yuya Watari's v8 series
applies fine to current master.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)




Re: [PoC] Reducing planning time when tables have many partitions

2022-11-16 Thread Thom Brown
On Mon, 7 Nov 2022 at 06:33, Zhang Mingli  wrote:
>
> HI,
>
> Regards,
> Zhang Mingli
> On Nov 7, 2022, 14:26 +0800, Tom Lane , wrote:
>
> Andrey Lepikhov  writes:
>
> I'm still in review of your patch now. At most it seems ok, but are you
> really need both eq_sources and eq_derives lists now?
>
>
> Didn't we just have this conversation? eq_sources needs to be kept
> separate to support the "broken EC" logic. We don't want to be
> regurgitating derived clauses as well as originals in that path.
>
> Aha, we have that conversation in another thread(Reducing duplicativeness of 
> EquivalenceClass-derived clauses
> ) : https://www.postgresql.org/message-id/644164.1666877342%40sss.pgh.pa.us

Once the issue Tom identified has been resolved, I'd like to test
drive newer patches.

Thom




Re: [PoC] Reducing planning time when tables have many partitions

2022-11-08 Thread Andrey Lepikhov

On 2/11/2022 15:27, Yuya Watari wrote:

I noticed that the previous patch does not apply to the current HEAD.
I attached the rebased version to this email.

Looking into find_em_for_rel() changes I see that you replaced
if (bms_is_subset(em->em_relids, rel->relids)
with assertion statement.
According of get_ecmember_indexes(), the em_relids field of returned 
equivalence members can contain relids, not mentioned in the relation.
I don't understand, why it works now? For example, we can sort by t1.x, 
but have an expression t1.x=t1.y*t2.z. Or I've missed something? If it 
is not a mistake, maybe to add a comment why assertion here isn't failed?


--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PoC] Reducing planning time when tables have many partitions

2022-11-06 Thread Zhang Mingli
HI,

Regards,
Zhang Mingli
On Nov 7, 2022, 14:26 +0800, Tom Lane , wrote:
> Andrey Lepikhov  writes:
> > I'm still in review of your patch now. At most it seems ok, but are you
> > really need both eq_sources and eq_derives lists now?
>
> Didn't we just have this conversation? eq_sources needs to be kept
> separate to support the "broken EC" logic. We don't want to be
> regurgitating derived clauses as well as originals in that path.
>
Aha, we have that conversation in another thread(Reducing duplicativeness of 
EquivalenceClass-derived clauses
) : https://www.postgresql.org/message-id/644164.1666877342%40sss.pgh.pa.us


Re: [PoC] Reducing planning time when tables have many partitions

2022-11-06 Thread Tom Lane
Andrey Lepikhov  writes:
> I'm still in review of your patch now. At most it seems ok, but are you 
> really need both eq_sources and eq_derives lists now?

Didn't we just have this conversation?  eq_sources needs to be kept
separate to support the "broken EC" logic.  We don't want to be
regurgitating derived clauses as well as originals in that path.

regards, tom lane




Re: [PoC] Reducing planning time when tables have many partitions

2022-11-06 Thread Andrey Lepikhov

On 2/11/2022 15:27, Yuya Watari wrote:

Hello,

I noticed that the previous patch does not apply to the current HEAD.
I attached the rebased version to this email.

I'm still in review of your patch now. At most it seems ok, but are you 
really need both eq_sources and eq_derives lists now? As I see, 
everywhere access to these lists guides by eclass_source_indexes and 
eclass_derive_indexes correspondingly. Maybe to merge them?


--
regards,
Andrey Lepikhov
Postgres Professional





Re: [PoC] Reducing planning time when tables have many partitions

2022-11-02 Thread Yuya Watari
Hello,

I noticed that the previous patch does not apply to the current HEAD.
I attached the rebased version to this email.

-- 
Best regards,
Yuya Watari


v8-0001-Apply-eclass_member_speedup_v3.patch.patch
Description: Binary data


v8-0002-Revert-one-of-the-optimizations.patch
Description: Binary data


v8-0003-Use-conventional-algorithm-for-smaller-size-cases.patch
Description: Binary data


Re: [PoC] Reducing planning time when tables have many partitions

2022-10-23 Thread Yuya Watari
Hello,

On Wed, Sep 21, 2022 at 6:43 PM Yuya Watari  wrote:
> 1.1. Revert one of our optimizations (v5)
>
> As I mentioned in the comment in
> v[5|6|7]-0002-Revert-one-of-the-optimizations.patch, I reverted one of
> our optimizations. This code tries to find EquivalenceMembers that do
> not satisfy the bms_overlap condition. We encounter such members early
> in the loop, so the linear search is enough, and our optimization is
> too excessive here. As a result of experiments, I found this
> optimization was a bottleneck, so I reverted it.

In the previous mail, I proposed a revert of one excessive
optimization. In addition, I found a new bottleneck and attached a new
version of the patch solving it to this email.

The new bottleneck exists in the select_outer_pathkeys_for_merge()
function. At the end of this function, we count EquivalenceMembers
that satisfy the specific condition. To count them, we have used
Bitmapset operations. Through experiments, I concluded that this
optimization is effective for larger cases but leads to some
degradation for the smaller number of partitions. The new patch
switches two algorithms depending on the problem sizes.

1. Experimental result

1.1. Join Order Benchmark

As in the previous email, I used the Join Order Benchmark to evaluate
the patches' performance. The correspondence between each version and
patches is as follows.

v3: v8-0001-*.patch
v5 (v3 with revert): v8-0001-*.patch + v8-0002-*.patch
v8 (v5 with revert): v8-0001-*.patch + v8-0002-*.patch + v8-0003-*.patch

I show the speed-up of each method compared with the master branch in
Table 1. When the number of partitions is 1, performance degradation
is kept to 1.1% in v8, while they are 4.2% and 1.8% in v3 and v5. This
result indicates that a newly introduced revert is effective.

Table 1: Speedup of Join Order Benchmark (higher is better)
(n = the number of partitions)
--
   n | v3 | v5 (v3 with revert) | v8 (v5 with revert)
--
   2 |  95.8% |   98.2% |   98.9%
   4 |  97.2% |   99.7% |   99.3%
   8 | 101.4% |  102.5% |  103.4%
  16 | 108.7% |  111.4% |  110.2%
  32 | 127.1% |  127.6% |  128.8%
  64 | 169.5% |  172.1% |  172.4%
 128 | 330.1% |  335.2% |  332.3%
 256 | 815.1% |  826.4% |  821.8%
--

1.2. pgbench

The following table describes the result of pgbench. The v5 and v8
performed clearly better than the v3 patch. The difference between v5
and v8 is not so significant, but v8's performance is close to the
master branch.

Table 2: The result of pgbench (tps)
-
   n | Master |   v3 | v5 (v3 with revert) | v8 (v5 with revert)
-
   1 |   7550 | 7422 |7474 |7521
   2 |   7594 | 7381 |7536 |7529
   4 |   7518 | 7362 |7461 |7524
   8 |   7459 | 7340 |7424 |7460
-
 Avg |   7531 | 7377 |7474 |7509
-

2. Conclusion and future works

The revert in the v8-0003-*.patch is effective in preventing
performance degradation for the smaller number of partitions. However,
I don't think what I have done in the patch is the best or ideal
solution. As I mentioned in the comments in the patch, switching two
algorithms may be ugly because it introduces code duplication. We need
a wiser solution to this problem.

-- 
Best regards,
Yuya Watari


v8-0001-Apply-eclass_member_speedup_v3.patch.patch
Description: Binary data


v8-0002-Revert-one-of-the-optimizations.patch
Description: Binary data


v8-0003-Use-conventional-algorithm-for-smaller-size-cases.patch
Description: Binary data


Re: [PoC] Reducing planning time when tables have many partitions

2022-08-26 Thread Yuya Watari
Dear David,

On Fri, Aug 26, 2022 at 12:18 PM David Rowley  wrote:
> How about instead of doing this caching like this, why don't we code
> up some iterators that we can loop over to fetch the required EMs.

Thank you very much for your quick reply and for sharing your idea
with code. I also think introducing EquivalenceMemberIterator is one
good alternative solution. I will try to implement and test it.

Thank you again for helping me.

-- 
Best regards,
Yuya Watari




Re: [PoC] Reducing planning time when tables have many partitions

2022-08-25 Thread David Rowley
On Fri, 26 Aug 2022 at 12:40, Yuya Watari  wrote:
> This performance degradation is due to the heavy processing of the
> get_ec***_indexes***() functions. These functions are the core part of
> the optimization we are working on in this thread, but they are
> relatively heavy when the number of partitions is small.
>
> I noticed that these functions were called repeatedly with the same
> arguments. During planning Query B with one partition, the
> get_ec_source_indexes_strict() function was called 2087 times with
> exactly the same parameters. Such repeated calls occurred many times
> in a single query.

How about instead of doing this caching like this, why don't we code
up some iterators that we can loop over to fetch the required EMs.

I'll attempt to type out my thoughts here without actually trying to
see if this works:

typedef struct EquivalenceMemberIterator
{
   EquivalenceClass *ec;
   Relids relids;
   Bitmapset *em_matches;
   int   position; /* last found index of em_matches or -1 */
   bool use_index;
   bool with_children;
   bool with_norel_members;
} EquivalenceMemberIterator;

We'd then have functions like:

static void
get_ecmember_indexes_iterator(EquivalenceMemberIterator *it,
PlannerInfo *root, EquivalenceClass *ec, Relids relids, bool
with_children, bool with_norel_members)
{
it->ec = ec;
it->relids = relids;
it->position = -1;

it->use_index = (root->simple_rel_array_size > 32); /* or whatever
threshold is best */
it->with_children = with_children;
it->with_norel_members = with_norel_members;

if (it->use_index)
it->em_matches = get_ecmember_indexes(root, ec, relids,
with_children, with_norel_members);
   else
   it->em_matches = NULL;
}

static EquivalenceMember *
get_next_matching_member(PlannerInfo *root, EquivalenceMemberIterator *it)
{
   if (it->use_index)
   {
it->position = bms_next_member(it->ec_matches, it->position);
if (it->position >= 0)
 return list_nth(root->eq_members, it->position);
return NULL;
}
else
{
 int i = it->position;
 while ((i = bms_next_member(it->ec->ec_member_indexes, i) >= 0)
  {
/* filter out the EMs we don't want here "break" when
we find a match */
  }
  it->position = i;
  if (i >= 0)
 return list_nth(root->eq_members, i);
  return NULL;
}
}

Then the consuming code will do something like:

EquivalenceMemberIterator iterator;
get_ecmember_indexes_iterator(, root, ec, relids, true, false);

while ((cur_em = get_next_matching_member(root, )) != NULL)
{
 // do stuff
}

David




Re: [PoC] Reducing planning time when tables have many partitions

2022-08-16 Thread David Rowley
esOn Tue, 9 Aug 2022 at 19:10, David Rowley  wrote:
> I've not had a chance to look at the 0003 patch yet.

I've looked at the 0003 patch now.

The performance numbers look quite impressive, however, there were a
few things about the patch that I struggled to figure what they were
done the way you did them:

+ root->eq_not_children_indexes = bms_add_member(root->eq_not_children_indexes,

Why is that in PlannerInfo rather than in the EquivalenceClass?

  if (bms_equal(rel->relids, em->em_relids))
  {
  rel->eclass_member_indexes =
bms_add_member(rel->eclass_member_indexes, em_index);
  }

Why are you only adding the eclass_member_index to the RelOptInfo when
the em_relids contain a singleton relation?

I ended up going and fixing the patch to be more how I imagined it.

I've ended up with 3 Bitmapset fields in EquivalenceClass;
ec_member_indexes, ec_nonchild_indexes, ec_norel_indexes.  I also
trimmed the number of helper functions down for obtaining the minimal
set of matching EquivalenceMember indexes to just:

Bitmapset *
get_ecmember_indexes(PlannerInfo *root, EquivalenceClass *ec, Relids relids,
bool with_children, bool with_norel_members)

Bitmapset *
get_ecmember_indexes_strict(PlannerInfo *root, EquivalenceClass *ec,
Relids relids, bool with_children,
bool with_norel_members)

I'm not so much a fan of the bool parameters, but it seemed better
than having 8 different functions with each combination of the bool
paramters instead of 2.

The "strict" version of the function takes the intersection of
eclass_member_indexes for each rel mentioned in relids, whereas the
non-strict version does a union of those.  Each then intersect that
with all members in the 'ec', or just the non-child members when
'with_children' is false.  They both then optionally bms_add_members()
the ec_norel_members if with_norel_members is true.  I found it
difficult to figure out the best order to do the intersection. That
really depends on if the particular query has many EquivalenceClasses
with few EquivalenceMembers or few EquivalenceClasses with many
EquivalenceMembers. bms_int_members() always recycles the left input.
Ideally, that would always be the smallest Bitmapset. Maybe it's worth
inventing a new version of bms_int_members() that recycles the input
with the least nwords. That would give the subsequent
bms_next_member() calls an easier time. Right now they'll need to loop
over a bunch of 0 words at the end for many queries.

A few problems I ran into along the way:

1. generate_append_tlist() generates Vars with varno=0.  That causes
problems when we add Exprs from those in add_eq_member() as there is
no element at root->simple_rel_array[0] to add eclass_member_indexes
to.
2. The existing comment for EquivalenceMember.em_relids claims "all
relids appearing in em_expr", but that's just not true when it comes
to em_is_child members.

So far, I fixed #1 by adding a hack to setup_simple_rel_arrays() to do
"root->simple_rel_array[0] = makeNode(RelOptInfo);" I'm not suggesting
that's the correct fix. It might be possible to set the varnos to the
varnos from the first Append child instead.

The fact that #2 is not true adds quite a bit of complexity to the
patch and I think the patch might even misbehave as a result. It seems
there are cases where a child em_relids can contain additional relids
that are not present in the em_expr. For example, when a UNION ALL
child has a Const in the targetlist, as explained in a comment in
add_child_rel_equivalences(). However, there also seem to be cases
where the opposite is true.  I had to add the following code in
add_eq_member() to stop a regression test failing:

if (is_child)
expr_relids = bms_add_members(expr_relids, relids);

That's to make sure we add eclass_member_indexes to each RelOptInfo
mentioned in the em_expr.

After doing all that, I noticed that your benchmark was showing that
create_join_clause() was the new bottleneck. This was due to having to
loop so many times over the ec_sources to find an already built
RestrictInfo. I went off and added some new code to optimize the
lookup of those in a similar way by adding a new Bitmapset field in
RelOptInfo to index which ec_sources it mentioned, which meant having
to move ec_sources into PlannerInfo. I don't think this part of the
patch is quite right yet as the code I have relies on em_relids being
the same as the ones mentioned in the RestrictInfo. That seems not
true for em_is_child EMs, so I think we probably need to add a new
field to EquivalenceMember that truly is just pull_varnos from
em_expr, or else look into some way to make em_relids mean that (like
the comment claims).

Here are my results from running your benchmark on master (@f6c750d31)
with and without the attached patch.

npart master (ms) patched (ms) speedup
2   0.28 0.2995.92%
4   0.37 0.3896.75%
8   0.53 0.5694.43%
16 0.92 0.91 

Re: [PoC] Reducing planning time when tables have many partitions

2022-08-09 Thread David Rowley
On Mon, 8 Aug 2022 at 23:28, Yuya Watari  wrote:
> If you have already applied David's patch, please start the 'git am'
> command from 0002-Fix-bugs.patch. All regression tests passed with
> this patch on my environment.

Thanks for fixing those scope bugs.

In regards to the 0002 patch, you have;

+ * TODO: "bms_add_members(ec1->ec_member_indexes, ec2->ec_member_indexes)"
+ * did not work to combine two EquivalenceClasses. This is probably because
+ * the order of the EquivalenceMembers is different from the previous
+ * implementation, which added the ec2's EquivalenceMembers to the end of
+ * the list.

as far as I can see, the reason the code I that wrote caused the
following regression test failure;

- Index Cond: ((ff = '42'::bigint) AND (ff = '42'::bigint))
+ Index Cond: (ff = '42'::bigint)

was down to how generate_base_implied_equalities_const() marks the EC
as ec_broken = true without any regard to cleaning up the work it's
partially already complete.

Because the loop inside generate_base_implied_equalities_const() just
breaks as soon as we're unable to find a valid equality operator for
the two given types, with my version, since the EquivalenceMember's
order has effectively changed, we just discover the EC is broken
before we call process_implied_equality() ->
distribute_restrictinfo_to_rels(). In the code you've added, the
EquivalenceMembers are effectively still in the original order and the
process_implied_equality() -> distribute_restrictinfo_to_rels() gets
done before we discover the broken EC. The same qual is just added
again during generate_base_implied_equalities_broken(), which is why
the plan has a duplicate ff=42.

This is all just down to the order that the ECs are merged. If you'd
just swapped the order of the items in the query's WHERE clause to
become:

  where ec1.ff = 42::int8 and ss1.x = ec1.f1 and ec1.ff = ec1.f1;

then my version would keep the duplicate qual. For what you've changed
the code to, the planner would not have produced the duplicate ff=42
qual if you'd written the WHERE clause as follows:

  where ss1.x = ec1.f1 and ec1.ff = ec1.f1 and ec1.ff = 42::int8;

In short, I think the code I had for that was fine and it's just the
expected plan that you should be editing. If we wanted to this
behaviour to be consistent then the fix should be to make
generate_base_implied_equalities_const() better at only distributing
the quals down to the relations after it has discovered that the EC is
not broken, or at least cleaning up the partial work that it's done if
it discovers a broken EC. The former seems better to me, but I doubt
that it matters too much as broken ECs should be pretty rare and it
does not seem worth spending too much effort making this work better.

I've not had a chance to look at the 0003 patch yet.

David




Re: [PoC] Reducing planning time when tables have many partitions

2022-07-27 Thread David Rowley
On Wed, 27 Jul 2022 at 18:07, Yuya Watari  wrote:
> I agree that introducing a Bitmapset field may solve this problem. I
> will try this approach in addition to previous ones.

I've attached a very half-done patch that might help you get started
on this. There are still 2 failing regression tests which seem to be
due to plan changes. I didn't expend any effort looking into why these
plans changed.

The attached does not contain any actual optimizations to find the
minimal set of EMs to loop through by masking the Bitmapsets that I
mentioned in my post last night.  I just quickly put it together to
see if there's some hole in the idea. I don't think there is.

I've not really considered all of the places that we'll want to do the
bit twiddling to get the minimal set of EquivalenceMember. I did see
there's a couple more functions in postgres_fdw.c that could be
optimized.

One thing I've only partially thought about is what if you want to
also find EquivalenceMembers with a constant value. If there's a
Const, then you'll lose the bit for that when you mask the ec's
ec_member_indexes with the RelOptInfos.  If there are some places
where we need to keep those then I think we'll need to add another
field to EquivalenceClass to mark the index into PlannerInfo's
eq_members for the EquivalenceMember with the Const. That bit would
have to be bms_add_member()ed back into the Bitmapset of matching
EquivalenceMembers after masking out RelOptInfo's ec_member_indexes.

When adding the optimizations to find the minimal set of EM bits to
search through, you should likely add some functions similar to the
get_eclass_indexes_for_relids() and get_common_eclass_indexes()
functions to help you find the minimal set of bits.  You can also
probably get some other inspiration from [1], in general.

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3373c715535
diff --git a/contrib/postgres_fdw/postgres_fdw.c 
b/contrib/postgres_fdw/postgres_fdw.c
index 048db542d3..0af3943e03 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -7410,11 +7410,11 @@ conversion_error_callback(void *arg)
 EquivalenceMember *
 find_em_for_rel(PlannerInfo *root, EquivalenceClass *ec, RelOptInfo *rel)
 {
-   ListCell   *lc;
+   int i = -1;
 
-   foreach(lc, ec->ec_members)
+   while ((i = bms_next_member(ec->ec_member_indexes, i)) >= 0)
{
-   EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);
+   EquivalenceMember *em = (EquivalenceMember *) 
list_nth(root->eq_members, i);
 
/*
 * Note we require !bms_is_empty, else we'd accept constant
@@ -7453,7 +7453,7 @@ find_em_for_rel_target(PlannerInfo *root, 
EquivalenceClass *ec,
{
Expr   *expr = (Expr *) lfirst(lc1);
Index   sgref = get_pathtarget_sortgroupref(target, i);
-   ListCell   *lc2;
+   int i;
 
/* Ignore non-sort expressions */
if (sgref == 0 ||
@@ -7469,9 +7469,10 @@ find_em_for_rel_target(PlannerInfo *root, 
EquivalenceClass *ec,
expr = ((RelabelType *) expr)->arg;
 
/* Locate an EquivalenceClass member matching this expr, if any 
*/
-   foreach(lc2, ec->ec_members)
+   i = -1;
+   while ((i = bms_next_member(ec->ec_member_indexes, i)) >= 0)
{
-   EquivalenceMember *em = (EquivalenceMember *) 
lfirst(lc2);
+   EquivalenceMember *em = (EquivalenceMember *) 
list_nth(root->eq_members, i);
Expr   *em_expr;
 
/* Don't match constants */
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index a96f2ee8c6..2060b73686 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -430,7 +430,7 @@ _outEquivalenceClass(StringInfo str, const EquivalenceClass 
*node)
 
WRITE_NODE_FIELD(ec_opfamilies);
WRITE_OID_FIELD(ec_collation);
-   WRITE_NODE_FIELD(ec_members);
+   WRITE_BITMAPSET_FIELD(ec_member_indexes);
WRITE_NODE_FIELD(ec_sources);
WRITE_NODE_FIELD(ec_derives);
WRITE_BITMAPSET_FIELD(ec_relids);
diff --git a/src/backend/nodes/print.c b/src/backend/nodes/print.c
index a5c44adc6c..5953b79fe8 100644
--- a/src/backend/nodes/print.c
+++ b/src/backend/nodes/print.c
@@ -423,16 +423,17 @@ print_expr(const Node *expr, const List *rtable)
  *   pathkeys list of PathKeys
  */
 void
-print_pathkeys(const List *pathkeys, const List *rtable)
+print_pathkeys(const PlannerInfo *root, const List *pathkeys,
+  const List *rtable)
 {
-   const ListCell *i;
+   const ListCell *lc;
 
printf("(");
-   foreach(i, pathkeys)
+   foreach(lc, pathkeys)
{
-   PathKey*pathkey = (PathKey *) lfirst(i);
+   

Re: [PoC] Reducing planning time when tables have many partitions

2022-07-27 Thread Yuya Watari
Dear David,

Thank you for sharing your new idea.

I agree that introducing a Bitmapset field may solve this problem. I
will try this approach in addition to previous ones.

Thank you again for helping me.

-- 
Best regards,
Yuya Watari




Re: [PoC] Reducing planning time when tables have many partitions

2022-07-26 Thread David Rowley
On Mon, 4 Jul 2022 at 09:28, Tom Lane  wrote:
> For the bms_equal class of lookups, I wonder if we could get anywhere
> by adding an additional List field to every RelOptInfo that chains
> all EquivalenceMembers that match that RelOptInfo's relids.
> The trick here would be to figure out when to build those lists.
> The simple answer would be to do it lazily on-demand, but that
> would mean a separate scan of all the EquivalenceMembers for each
> RelOptInfo; I wonder if there's a way to do better?

How about, instead of EquivalenceClass having a List field named
ec_members, it has a Bitmapset field named ec_member_indexes and we
just keep a List of all EquivalenceMembers in PlannerInfo and mark
which ones are in the class by setting the bit in the class's
ec_member_indexes field.

That would be teamed up with a new eclass_member_indexes field in
RelOptInfo to store the index into PlannerInfo's List of
EquivalenceMembers that belong to the given RelOptInfo.

For searching:
If you want to get all EquivalenceMembers in an EquivalenceClass, you
bms_next_member loop over the EC's ec_member_indexes field.
If you want to get all EquivalenceMembers for a given RelOptInfo, you
bms_next_member loop over the RelOptInfo's eclass_member_indexes
field.
If you want to get all EquivalenceMembers for a given EquivalenceClass
and RelOptInfo you need to do some bms_intersect() calls for the rel's
eclass_member_indexes and EC's ec_member_indexes.

I'm unsure if we'd want to bms_union the RelOptInfo's
ec_member_indexes field for join rels.  Looking at
get_eclass_indexes_for_relids() we didn't do it that way for
eclass_indexes. Maybe that's because we're receiving RelIds in a few
places without a RelOptInfo.

Certainly, the CPU cache locality is not going to be as good as if we
had a List with all elements together, but for simple queries, there's
not going to be many EquivalenceClasses anyway, and for complex
queries, this should be a win.

David




Re: [PoC] Reducing planning time when tables have many partitions

2022-07-22 Thread Yuya Watari
Dear Andrey Lepikhov,

Thank you for replying and being a reviewer for this patch. I really
appreciate it.

> Are you still working on this patch?

Yes, I’m working on improving this patch. It is not easy to address
the problems that this patch has, but I’m hoping to send a new version
of it in a few weeks.

-- 
Best regards,
Yuya Watari




Re: [PoC] Reducing planning time when tables have many partitions

2022-07-21 Thread Andrey Lepikhov

On 7/5/22 13:57, Yuya Watari wrote:

On Mon, Jul 4, 2022 at 6:28 AM Tom Lane  wrote:

Perhaps the bms_is_subset class could be handled in a similar
way, ie do a one-time pass to make a List of all EquivalenceMembers
that use a RelOptInfo.


Thank you for giving your idea. I will try to polish up my algorithm
based on your suggestion.
This work has significant interest for highly partitioned 
configurations. Are you still working on this patch? According to the 
current state of the thread, changing the status to 'Waiting on author' 
may be better until the next version.

Feel free to reverse the status if you need more feedback.

--
Regards
Andrey Lepikhov
Postgres Professional




Re: [PoC] Reducing planning time when tables have many partitions

2022-07-05 Thread Yuya Watari
Dear Tom,

Thank you for replying to my email.

On Mon, Jul 4, 2022 at 6:28 AM Tom Lane  wrote:
> I'm not real thrilled with trying to throw hashtables at the problem,
> though.  As David noted, they'd be counterproductive for simple
> queries.

As you said, my approach that utilizes hash tables has some overheads,
leading to degradation in query planning time.

I tested the degradation by a brief experiment. In this experiment, I
used a simple query shown below.

===
SELECT students.name, gpas.gpa AS gpa, sum(scores.score) AS total_score
FROM students, scores, gpas
WHERE students.id = scores.student_id AND students.id = gpas.student_id
GROUP BY students.id, gpas.student_id;
===

Here, students, scores, and gpas tables have no partitions, i.e., they
are regular tables. Therefore, my techniques do not work for this
query and instead may lead to some regression. I repeatedly issued
this query 1 million times and measured their planning times.

The attached figure describes the distribution of the planning times.
The figure indicates that my patch has no severe negative impacts on
the planning performance. However, there seems to be a slight
degradation.

I show the mean and median of planning times below. With my patch, the
planning time became 0.002-0.004 milliseconds slower. We have to deal
with this problem, but reducing time complexity while keeping
degradation to zero is significantly challenging.

Planning time (ms)
 |  Mean | Median
--
 Master  | 0.682 |  0.674
 Patched | 0.686 |  0.676
--
 Degradation | 0.004 |  0.002

Of course, the attached result is just an example. Significant
regression might occur in other types of queries.

> For the bms_equal class of lookups, I wonder if we could get anywhere
> by adding an additional List field to every RelOptInfo that chains
> all EquivalenceMembers that match that RelOptInfo's relids.
> The trick here would be to figure out when to build those lists.
> The simple answer would be to do it lazily on-demand, but that
> would mean a separate scan of all the EquivalenceMembers for each
> RelOptInfo; I wonder if there's a way to do better?
>
> Perhaps the bms_is_subset class could be handled in a similar
> way, ie do a one-time pass to make a List of all EquivalenceMembers
> that use a RelOptInfo.

Thank you for giving your idea. I will try to polish up my algorithm
based on your suggestion.

-- 
Best regards,
Yuya Watari


Re: [PoC] Reducing planning time when tables have many partitions

2022-07-03 Thread Tom Lane
Yuya Watari  writes:
> On Thu, Mar 24, 2022 at 11:03 AM David Rowley  wrote:
>> I think a better way to solve this would be just to have a single hash
>> table over all EquivalenceClasses that allows fast lookups of
>> EquivalenceMember->em_expr.

> If the predicate were "em->em_expr == something", the hash table whose
> key is em_expr would be effective. However, the actual predicates are
> not of this type but the following.

> // Find EquivalenceMembers whose relids is equal to the given relids
> (1) bms_equal(em->em_relids, relids)

> // Find EquivalenceMembers whose relids is a subset of the given relids
> (2) bms_is_subset(em->em_relids, relids)

Yeah, that's a really interesting observation, and I agree that
David's suggestion doesn't address it.  Maybe after we fix this
problem, matching of em_expr would be the next thing to look at,
but your results say it isn't the first thing.

I'm not real thrilled with trying to throw hashtables at the problem,
though.  As David noted, they'd be counterproductive for simple
queries.  Sure, we could address that with duplicate code paths,
but that's a messy and hard-to-tune approach.  Also, I find the
idea of hashing on all subsets of relids to be outright scary.
"m is not so large in most cases" does not help when m *is* large.

For the bms_equal class of lookups, I wonder if we could get anywhere
by adding an additional List field to every RelOptInfo that chains
all EquivalenceMembers that match that RelOptInfo's relids.
The trick here would be to figure out when to build those lists.
The simple answer would be to do it lazily on-demand, but that
would mean a separate scan of all the EquivalenceMembers for each
RelOptInfo; I wonder if there's a way to do better?

Perhaps the bms_is_subset class could be handled in a similar
way, ie do a one-time pass to make a List of all EquivalenceMembers
that use a RelOptInfo.

regards, tom lane




Re: [PoC] Reducing planning time when tables have many partitions

2022-06-22 Thread Yuya Watari
Dear David,

Thank you for your comments on my patch. I really apologize for my
late response.

On Thu, Mar 24, 2022 at 11:03 AM David Rowley  wrote:
> I think a better way to solve this would be just to have a single hash
> table over all EquivalenceClasses that allows fast lookups of
> EquivalenceMember->em_expr.  I think there's no reason that a given
> Expr should appear in more than one non-merged EquivalenceClass. The
> EquivalenceClass a given Expr belongs to would need to be updated
> during the merge process.

Thank you for your idea. However, I think building a hash table whose
key is EquivalenceMember->em_expr does not work for this case.

What I am trying to optimize in this patch is the following code.

=
EquivalenceClass *ec = /* given */;

EquivalenceMember *em;
ListCell *lc;
foreach(lc, ec->ec_members)
{
em = (EquivalenceMember *) lfirst(lc);

/* predicate is bms_equal or bms_is_subset, etc */
if (!predicate(em))
continue;

/* The predicate satisfies */
do something...;
}
=

>From my observation, the predicates above will be false in most cases
and the subsequent processes are not executed. My optimization is
based on this notion and utilizes hash tables to eliminate calls of
predicates.

If the predicate were "em->em_expr == something", the hash table whose
key is em_expr would be effective. However, the actual predicates are
not of this type but the following.

// Find EquivalenceMembers whose relids is equal to the given relids
(1) bms_equal(em->em_relids, relids)

// Find EquivalenceMembers whose relids is a subset of the given relids
(2) bms_is_subset(em->em_relids, relids)

Since these predicates perform a match search for not em_expr but
em_relids, we need to build a hash table with em_relids as key. If so,
we can drastically reduce the planning time for the pattern (1).
Besides, by enumerating all subsets of relids, pattern (2) can be
optimized. The detailed algorithm is described in the first email.

I show an example of the pattern (1). The next code is in
src/backend/optimizer/path/equivclass.c. As can be seen from this
code, the foreach loop tries to find an EquivalenceMember whose
cur_em->em_relids is equal to rel->relids. If found, subsequent
processing will be performed.

== Before patched ==
List *
generate_implied_equalities_for_column(PlannerInfo *root,
   RelOptInfo *rel,
   ec_matches_callback_type callback,
   void *callback_arg,
   Relids prohibited_rels)
{
...

EquivalenceClass *cur_ec = (EquivalenceClass *)
list_nth(root->eq_classes, i);
EquivalenceMember *cur_em;
ListCell   *lc2;

cur_em = NULL;
foreach(lc2, cur_ec->ec_members)
{
cur_em = (EquivalenceMember *) lfirst(lc2);
if (bms_equal(cur_em->em_relids, rel->relids) &&
callback(root, rel, cur_ec, cur_em, callback_arg))
break;
cur_em = NULL;
}

if (!cur_em)
continue;

...
}
===

My patch modifies this code as follows. The em_foreach_relids_equals
is a newly defined macro that finds EquivalenceMember satisfying the
bms_equal. The macro looks up a hash table using rel->relids as a key.
This type of optimization cannot be achieved without using hash tables
whose key is em->em_relids.

== After patched ==
List *
generate_implied_equalities_for_column(PlannerInfo *root,
   RelOptInfo *rel,
   ec_matches_callback_type callback,
   void *callback_arg,
   Relids prohibited_rels)
{
...

EquivalenceClass *cur_ec = (EquivalenceClass *)
list_nth(root->eq_classes, i);
EquivalenceMember *cur_em;
EquivalenceMember *other_em;

cur_em = NULL;
em_foreach_relids_equals(cur_em, cur_ec, rel->relids)
{
Assert(bms_equal(cur_em->em_relids, rel->relids));
if (callback(root, rel, cur_ec, cur_em, callback_arg))
break;
cur_em = NULL;
}

if (!cur_em)
continue;

...
}
===

> We might not want to build the hash table for all queries.

I agree with you. Building a lot of hash tables will consume much
memory.  My idea for this problem is to let the hash table's key be a
pair of EquivalenceClass and Relids. However, this approach may lead
to increasing looking up time of the hash table.

==

I noticed that the previous patch does not work with the current HEAD.
I attached the modified one to this email.

Additionally, I added my patch to the current commit fest [1].
[1] https://commitfest.postgresql.org/38/3701/

-- 
Best regards,
Yuya Watari
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 

Re: [PoC] Reducing planning time when tables have many partitions

2022-03-23 Thread David Rowley
On Fri, 18 Mar 2022 at 23:32, Yuya Watari  wrote:
> I found a problem that planning takes too much time when the tables
> have many child partitions. According to my observation, the planning
> time increases in the order of O(n^2). Here, n is the number of child
> partitions. I attached the patch to solve this problem. Please be
> noted that this patch is a PoC.

> 3. How to Solve?

I think a better way to solve this would be just to have a single hash
table over all EquivalenceClasses that allows fast lookups of
EquivalenceMember->em_expr.  I think there's no reason that a given
Expr should appear in more than one non-merged EquivalenceClass. The
EquivalenceClass a given Expr belongs to would need to be updated
during the merge process.

For functions such as get_eclass_for_sort_expr() and
process_equivalence(), that would become a fairly fast hashtable
lookup instead of having nested loops to find if an EquivalenceMember
already exists for the given Expr. We might not want to build the hash
table for all queries. Maybe we could just do it if we get to
something like ~16 EquivalenceMember in total.

As of now, we don't have any means to hash Exprs, so all that
infrastructure would need to be built first.  Peter Eisentraut is
working on a patch [1] which is a step towards having this.

Here's a simple setup to show the pain of this problem:

create table lp (a int, b int) partition by list(a);
select 'create table lp'||x::text|| ' partition of lp for values
in('||x::text||');' from generate_Series(0,4095)x;
\gexec
explain analyze select * from lp where a=b order by a;

 Planning Time: 510.248 ms
 Execution Time: 264.659 ms

David

[1] https://commitfest.postgresql.org/37/3182/