Re: Removing unneeded self joins

2024-05-06 Thread Robert Haas
On Mon, May 6, 2024 at 3:27 PM Alexander Korotkov  wrote:
> I agree it was a hurry to put the plan into commit message.  I think
> Tom already gave valuable feedback [1] and probably we will get more.
> So, plan is to be decided.  One way or the other I'm not going to
> re-commit this without explicit Tom's consent.

Thanks. I hope we find a way to make it happen.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Removing unneeded self joins

2024-05-06 Thread Alexander Korotkov
On Mon, May 6, 2024 at 5:44 PM Robert Haas  wrote:
> I want to go on record right now as disagreeing with the plan proposed
> in the commit message for the revert commit, namely, committing this
> again early in the v18 cycle. I don't think Tom would have proposed
> reverting this feature unless he believed that it had more serious
> problems than could be easily fixed in a short period of time. I think
> that concern is well-founded, given the number of fixes that were
> committed. It seems likely that the patch needs significant rework and
> stabilization before it gets committed again, and I think it shouldn't
> be committed again without explicit agreement from Tom or one of the
> other committers who have significant experience with the query
> planner. That is not to say that I don't approve generally of the idea
> of committing things earlier in the release cycle: I certainly do. It
> gives us more time to shake out problems with patches before we ship.
> But it only makes sense if we collectively believe that the patch is
> mostly correct, and only needs fine-tuning, and I think there are good
> reasons to believe that we shouldn't have that level of confidence in
> this case.

I agree it was a hurry to put the plan into commit message.  I think
Tom already gave valuable feedback [1] and probably we will get more.
So, plan is to be decided.  One way or the other I'm not going to
re-commit this without explicit Tom's consent.

Links.
1. https://www.postgresql.org/message-id/3622801.1715010885%40sss.pgh.pa.us

--
Regards,
Alexander Korotkov
Supabase




Re: Removing unneeded self joins

2024-05-06 Thread Alexander Korotkov
On Mon, May 6, 2024 at 6:54 PM Tom Lane  wrote:
> Robert Haas  writes:
> > I want to go on record right now as disagreeing with the plan proposed
> > in the commit message for the revert commit, namely, committing this
> > again early in the v18 cycle. I don't think Tom would have proposed
> > reverting this feature unless he believed that it had more serious
> > problems than could be easily fixed in a short period of time. I think
> > that concern is well-founded, given the number of fixes that were
> > committed. It seems likely that the patch needs significant rework and
> > stabilization before it gets committed again, and I think it shouldn't
> > be committed again without explicit agreement from Tom or one of the
> > other committers who have significant experience with the query
> > planner.
>
> FWIW I accept some of the blame here, for not having paid any
> attention to the SJE work earlier.  I had other things on my mind
> for most of last year, and not enough bandwidth to help.
>
> The main thing I'd like to understand before we try this again is
> why SJE needed so much new query-tree-manipulation infrastructure.
> I would have expected it to be very similar to the left-join
> elimination we do already, and therefore to mostly just share the
> existing infrastructure.  (I also harbor suspicions that some of
> the new code existed just because someone didn't research what
> was already there --- for instance, the now-removed replace_varno
> sure looks like ChangeVarNodes should have been used instead.)

Thank you for pointing this.  This area certainly requires more investigation.

> Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
> result relation).  While I don't claim that that destroyed the entire
> use case for SJE, it certainly knocked its usefulness down by many
> notches, maybe even to the point where it's not worth putting in the
> effort needed to get it to re-committability.  So I think we need to
> look harder at finding a way around that.  Is the concern that
> RETURNING should return either old or new values depending on which
> RTE is mentioned?  If so, maybe the feature Dean has proposed to
> allow RETURNING to access old values [1] is a prerequisite to moving
> forward.  Alternatively, perhaps it'd be good enough to forbid SJE
> only when the non-target relation is actually mentioned in RETURNING.

Another problem is EPQ.  During EPQ, we use most recent tuples for the
target relation and snapshot-satisfying tuples for joined relations.
And that affects RETURNING as well.  If we need to return values for
joined relation, that wouldn't be old values, but values of
snapshot-satisfying tuple which might be even older.

Proper support of this looks like quite amount of work for me.
Committing SJE to v18 with this looks challenging.  AFICS, going this
way would require substantial help from you.

--
Regards,
Alexander Korotkov
Supabase




Re: Removing unneeded self joins

2024-05-06 Thread Bruce Momjian
On Mon, May  6, 2024 at 12:24:41PM -0400, Robert Haas wrote:
> Now that being said, I do also agree that the planner code is quite
> hard to understand, for various reasons. I don't think the structure
> of that code and the assumptions underlying it are as well-documented
> as they could be, and neither do I think that all of them are optimal.
> It has taken me a long time to learn as much as I know, and there is
> still quite a lot that I don't know. And I also agree that the planner
> does an unfortunate amount of in-place modification of existing
> structures without a lot of clarity about how it all works, and an
> unfortunate amount of data copying in some places, and even that the
> partition-wise join code isn't all that it could be. But I do not
> think that adds up to a conclusion that we should just be less
> ambitious with planner changes. Indeed, I would like to see us do
> more. There is certainly a lot of useful work that could be done. The
> trick is figuring out how to do it without breaking too many things,
> and that is not easy.

I agree with Robert.  While writting the Postgres 17 release notes, I am
excited to see the many optimizer improvements, and removing self-joins
from that list will be unfortunate.

I did write a blog entry in 2021 that suggested we could have
optimizer aggressiveness control to allow for more expensive
optimizations:

https://momjian.us/main/blogs/pgblog/2021.html#May_14_2021

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Removing unneeded self joins

2024-05-06 Thread Robert Haas
On Mon, May 6, 2024 at 12:01 PM Andrei Lepikhov
 wrote:
> Right now, it evolves extensively - new structures, variables,
> alternative copies of the same node trees with slightly changed
> properties ... This way allows us to quickly introduce some planning
> features (a lot of changes in planner logic since PG16 is evidence of
> that) and with still growing computing resources it allows postgres to
> fit RAM and proper planning time. But maybe we want to be more modest?
> The Ashutosh's work he has been doing this year shows how sometimes
> expensive the planner is. Perhaps we want machinery that will check the
> integrity of planning data except the setrefs, which fail to detect that
> occasionally?
> If an extensive approach is the only viable option, then it's clear that
> this and many other features are simply not suitable for Postgres
> Planner. It's disheartening that this patch didn't elicit such
> high-level feedback.

Well, as I said before, I think self-join elimination is a good
feature, and I believe that it belongs in PostgreSQL. However, I don't
believe that this implementation was done as well as it needed to be
done. A great deal of the work involved in a feature like this lies in
figuring out at what stage of processing certain kinds of
transformations ought to be done, and what cleanup is needed
afterward. It is difficult for anyone to get that completely right the
first time around; left join elimination also provoked a series of
after-the-fact bug fixes. However, I think those were fewer in number
and spread over a longer period of time.

Now that being said, I do also agree that the planner code is quite
hard to understand, for various reasons. I don't think the structure
of that code and the assumptions underlying it are as well-documented
as they could be, and neither do I think that all of them are optimal.
It has taken me a long time to learn as much as I know, and there is
still quite a lot that I don't know. And I also agree that the planner
does an unfortunate amount of in-place modification of existing
structures without a lot of clarity about how it all works, and an
unfortunate amount of data copying in some places, and even that the
partition-wise join code isn't all that it could be. But I do not
think that adds up to a conclusion that we should just be less
ambitious with planner changes. Indeed, I would like to see us do
more. There is certainly a lot of useful work that could be done. The
trick is figuring out how to do it without breaking too many things,
and that is not easy.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Removing unneeded self joins

2024-05-06 Thread Andrei Lepikhov

On 6/5/2024 21:44, Robert Haas wrote:

On Sat, May 4, 2024 at 10:46 PM Andrei Lepikhov
 wrote:

Having no objective negative feedback, we have no reason to change
anything in the design or any part of the code. It looks regrettable and
unusual.


To me, this sounds like you think it's someone else's job to tell you
what is wrong with the patch, or how to fix it, and if they don't,
then you should get to have the patch as part of PostgreSQL. But that
is not how we do things, nor should we. I agree that it sucks when you
need feedback and don't get it, and I've written about that elsewhere
and recently. But if you don't get feedback and as a result you can't
get the patch to an acceptable level, 

I'm really sorry that the level of my language caused a misunderstanding.
The main purpose of this work is to form a more or less certain view of 
the direction of the planner's development.
Right now, it evolves extensively - new structures, variables, 
alternative copies of the same node trees with slightly changed 
properties ... This way allows us to quickly introduce some planning 
features (a lot of changes in planner logic since PG16 is evidence of 
that) and with still growing computing resources it allows postgres to 
fit RAM and proper planning time. But maybe we want to be more modest? 
The Ashutosh's work he has been doing this year shows how sometimes 
expensive the planner is. Perhaps we want machinery that will check the 
integrity of planning data except the setrefs, which fail to detect that 
occasionally?
If an extensive approach is the only viable option, then it's clear that 
this and many other features are simply not suitable for Postgres 
Planner. It's disheartening that this patch didn't elicit such 
high-level feedback.


--
regards,
Andrei Lepikhov





Re: Removing unneeded self joins

2024-05-06 Thread Tom Lane
Robert Haas  writes:
> I want to go on record right now as disagreeing with the plan proposed
> in the commit message for the revert commit, namely, committing this
> again early in the v18 cycle. I don't think Tom would have proposed
> reverting this feature unless he believed that it had more serious
> problems than could be easily fixed in a short period of time. I think
> that concern is well-founded, given the number of fixes that were
> committed. It seems likely that the patch needs significant rework and
> stabilization before it gets committed again, and I think it shouldn't
> be committed again without explicit agreement from Tom or one of the
> other committers who have significant experience with the query
> planner.

FWIW I accept some of the blame here, for not having paid any
attention to the SJE work earlier.  I had other things on my mind
for most of last year, and not enough bandwidth to help.

The main thing I'd like to understand before we try this again is
why SJE needed so much new query-tree-manipulation infrastructure.
I would have expected it to be very similar to the left-join
elimination we do already, and therefore to mostly just share the
existing infrastructure.  (I also harbor suspicions that some of
the new code existed just because someone didn't research what
was already there --- for instance, the now-removed replace_varno
sure looks like ChangeVarNodes should have been used instead.)

Another thing that made me pretty sad was 8c441c082 (Forbid SJE with
result relation).  While I don't claim that that destroyed the entire
use case for SJE, it certainly knocked its usefulness down by many
notches, maybe even to the point where it's not worth putting in the
effort needed to get it to re-committability.  So I think we need to
look harder at finding a way around that.  Is the concern that
RETURNING should return either old or new values depending on which
RTE is mentioned?  If so, maybe the feature Dean has proposed to
allow RETURNING to access old values [1] is a prerequisite to moving
forward.  Alternatively, perhaps it'd be good enough to forbid SJE
only when the non-target relation is actually mentioned in RETURNING.

regards, tom lane

[1] 
https://www.postgresql.org/message-id/flat/CAEZATCWx0J0-v=Qjc6gXzR=KtsdvAE7Ow=D=mu50agoe+pv...@mail.gmail.com




Re: Removing unneeded self joins

2024-05-06 Thread Bruce Momjian
On Mon, May  6, 2024 at 10:44:33AM -0400, Robert Haas wrote:
> I want to go on record right now as disagreeing with the plan proposed
> in the commit message for the revert commit, namely, committing this
> again early in the v18 cycle. I don't think Tom would have proposed
> reverting this feature unless he believed that it had more serious
> problems than could be easily fixed in a short period of time. I think
> that concern is well-founded, given the number of fixes that were
> committed. It seems likely that the patch needs significant rework and
> stabilization before it gets committed again, and I think it shouldn't
> be committed again without explicit agreement from Tom or one of the
> other committers who have significant experience with the query
> planner. That is not to say that I don't approve generally of the idea
> of committing things earlier in the release cycle: I certainly do. It
> gives us more time to shake out problems with patches before we ship.
> But it only makes sense if we collectively believe that the patch is
> mostly correct, and only needs fine-tuning, and I think there are good
> reasons to believe that we shouldn't have that level of confidence in
> this case.

I think what Robert is saying is that it is an unacceptable plan to just
dump the code into PG 18 and clean it up in the following months --- it
needs more research before it is re-added to git.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Removing unneeded self joins

2024-05-06 Thread Robert Haas
On Sat, May 4, 2024 at 10:46 PM Andrei Lepikhov
 wrote:
> Having no objective negative feedback, we have no reason to change
> anything in the design or any part of the code. It looks regrettable and
> unusual.

To me, this sounds like you think it's someone else's job to tell you
what is wrong with the patch, or how to fix it, and if they don't,
then you should get to have the patch as part of PostgreSQL. But that
is not how we do things, nor should we. I agree that it sucks when you
need feedback and don't get it, and I've written about that elsewhere
and recently. But if you don't get feedback and as a result you can't
get the patch to an acceptable level, or if you do get feedback but
the patch fails to reach an acceptable level anyway, then the only
correct decision is for us to not ship that code. That obviously sucks
from the point of view of the patch author, and also of the committer,
but consider the alternative. Once patches get through an initial
release and become part of the product, the responsibility for fixing
problems is understood to slowly move from the original committer to
the community as a whole. In practice, that means that a lot of the
work of fixing things that are broken, after some initial period, ends
up falling on committers other than the person who did the initial
commit. Even one or two problematic commits can generate an enormous
amount of work for people who weren't involved in the original
development and may not even have agreed with the development
direction, and it is more than fair for those people to express a view
about whether they are willing to carry that burden or not. When they
aren't, I do think that's regrettable, but I don't think it's unusual.
Just in this release, we've removed at least two previously-released
features because they're in bad shape and nobody's willing to maintain
them (snapshot too old, AIX support).

> After designing the feature, fixing its bugs, and reviewing joint
> patches on the commitfest, the question more likely lies in the planner
> design. For example, I wonder if anyone here knows why exactly the
> optimiser makes a copy of the whole query subtree in some places.
> Another example is PlannerInfo. Can we really control all the
> consequences of introducing, let's say, a new JoinDomain entity?

Bluntly, if you can't control those consequences, then you aren't
allowed to make that change.

I know first-hand how difficult some of these problems are. Sometime
in the last year or three, I spent weeks getting rid of ONE global
variable (ThisTimeLineID). It took an absolutely inordinate amount of
time, and it became clear to me that I was never going to get rid of
enough global variables in that part of the code to be able to write a
patch for the feature I wanted without risk of unforeseen
consequences. So I gave up on the entire feature. Maybe I'll try again
at some point, or maybe somebody else will feel like cleaning up that
code and then I can try again with a cleaner base, but what I don't
get to do is write a buggy patch for the feature I want and commit it
anyway. I either figure out a way to do it that I believe is low-risk
and that the community judges to be acceptable, or I don't do it.

I want to go on record right now as disagreeing with the plan proposed
in the commit message for the revert commit, namely, committing this
again early in the v18 cycle. I don't think Tom would have proposed
reverting this feature unless he believed that it had more serious
problems than could be easily fixed in a short period of time. I think
that concern is well-founded, given the number of fixes that were
committed. It seems likely that the patch needs significant rework and
stabilization before it gets committed again, and I think it shouldn't
be committed again without explicit agreement from Tom or one of the
other committers who have significant experience with the query
planner. That is not to say that I don't approve generally of the idea
of committing things earlier in the release cycle: I certainly do. It
gives us more time to shake out problems with patches before we ship.
But it only makes sense if we collectively believe that the patch is
mostly correct, and only needs fine-tuning, and I think there are good
reasons to believe that we shouldn't have that level of confidence in
this case.

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: Removing unneeded self joins

2024-05-04 Thread Andrei Lepikhov

On 3/5/2024 20:55, Robert Haas wrote:

One of my most embarrassing gaffes in this area personally was
a448e49bcbe40fb72e1ed85af910dd216d45bad8. I don't know how I managed
to commit the original patch without realizing it was going to cause
an increase in the WAL size, but I can tell you that when I realized
it, my heart sank through the floor.

I discovered this feature and agree that it looks like a severe problem.
Unfortunately, in the case of the SJE patch, the committer and reviewers 
don't provide negative feedback. We see the only (I'm not sure I use the 
proper English phrase) 'negative feelings' from people who haven't 
reviewed or analysed it at all (at least, they didn't mention it).


Considering the situation, I suggest setting the default value of 
enable_self_join_removal to false in PG17 for added safety and then 
changing it to true in early PG18.


Having no objective negative feedback, we have no reason to change 
anything in the design or any part of the code. It looks regrettable and 
unusual.


After designing the feature, fixing its bugs, and reviewing joint 
patches on the commitfest, the question more likely lies in the planner 
design. For example, I wonder if anyone here knows why exactly the 
optimiser makes a copy of the whole query subtree in some places. 
Another example is PlannerInfo. Can we really control all the 
consequences of introducing, let's say, a new JoinDomain entity?


You also mentioned 2024.pgconf.dev. Considering the current migration 
policy in some countries, it would be better to work through the online 
presence as equivalent to offline. Without an online part of the 
conference, the only way to communicate and discuss is through this 
mailing list.


--
regards,
Andrei Lepikhov





Re: Removing unneeded self joins

2024-05-03 Thread Robert Haas
On Fri, May 3, 2024 at 4:57 AM Alexander Korotkov  wrote:
> I agree to revert it for v17, but I'm not exactly sure the issue is
> design (nevertheless design review is very welcome as any other type
> of review).  The experience of the bugs arising with the SJE doesn't
> show me a particular weak spot in the feature.  It looks more like
> this patch has to revise awfully a lot planner data structures to
> replace one relid with another.  And I don't see the design, which
> could avoid that.  Somewhere in the thread I have proposed a concept
> of "alias relids".  However, I suspect that could leave us with more
> lurking bugs instead of the bug-free code.

I agree that reverting it for v17 makes sense. In terms of moving
forward, whether a design review is exactly the right idea or not, I'm
not sure. However, I think that the need to replace relids in a lot of
places is something that a design review might potentially flag as a
problem. Maybe there is some other approach that can avoid the need
for this.

On the other hand, maybe there's not. But in that case, the question
becomes how the patch author(s), and committer, are going to make sure
that most of the issues get flushed out before the initial commit.
What we can't do is say - we know that we need to replace relids in a
bunch of places, so we'll change the ones we know about, and then rely
on testing to find any that we missed. There has to be some kind of
systematic plan that everyone can agree should find all of the
affected places, and then if a few slip through, that's fine, that's
how life goes.

I haven't followed the self-join elimination work very closely, and I
do quite like the idea of the feature. However, looking over all the
follow-up commits, it's pretty hard to escape the conclusion that
there were a lot of cases that weren't adequately considered in the
initial work (lateral, result relations, PHVs, etc.). And that is a
big problem -- it really creates a lot of issues for the project when
a major feature commit misses whole areas that it needs to have
considered, as plenty of previous history will show. When anybody
starts to realize that they've not just had a few goofs but have
missed some whole affected area entirely, it's time to start thinking
about a revert.

One of my most embarrassing gaffes in this area personally was
a448e49bcbe40fb72e1ed85af910dd216d45bad8. I don't know how I managed
to commit the original patch without realizing it was going to cause
an increase in the WAL size, but I can tell you that when I realized
it, my heart sank through the floor. I'd love to return to that work
if we can all ever agree on a way of addressing that problem, but in
the meantime, that patch is very dead. And ... if somebody had taken
the time to give me a really good design review of that patch, they
might well have noticed, and saved me the embarrassment of committing
something that had no shot of remaining in the tree. Unfortunately,
one of the downsides of being a committer is that you tend to get less
of that sort of review, because people assume you know what you're
doing. Which is fabulous, when you actually do know what you're doing,
and really sucks, when you don't. One of the things I'd like to see
discussed at 2024.pgconf.dev is how we can improve this aspect of how
we work together.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Removing unneeded self joins

2024-05-03 Thread Alexander Korotkov
Hi, Tom!

On Fri, May 3, 2024 at 2:19 AM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > I would appreciate your review of this patchset, and review from Andrei as 
> > well.
>
> I hate to say this ... but if we're still finding bugs this
> basic in SJE, isn't it time to give up on it for v17?
>
> I might feel better about it if there were any reason to think
> these were the last major bugs.  But you have already committed
> around twenty separate fixes for the original SJE patch, and
> now here you come with several more; so it doesn't seem like
> the defect rate has slowed materially.  There can be no doubt
> whatever that the original patch was far from commit-ready.

I think if we subtract from the SJE followup commits containing
improvements (extra comments, asserts) and fix for in-place Bitmapset
modification, which was there before, the number of fixes will be
closer to ten.  And the number of pending fixes will be two.  But I
totally get your concern that we're quite late in the release cycle
and new SJE-related issues continue to arise.  This leads to a
significant risk of raising many bugs for end users.

> I think we should revert SJE for v17 and do a thorough design
> review before trying again in v18.

I agree to revert it for v17, but I'm not exactly sure the issue is
design (nevertheless design review is very welcome as any other type
of review).  The experience of the bugs arising with the SJE doesn't
show me a particular weak spot in the feature.  It looks more like
this patch has to revise awfully a lot planner data structures to
replace one relid with another.  And I don't see the design, which
could avoid that.  Somewhere in the thread I have proposed a concept
of "alias relids".  However, I suspect that could leave us with more
lurking bugs instead of the bug-free code.

I suggest we should give this feature more review and testing, then
commit early v18.  That would leave us enough time to fix any other
issues before v18 release.

--
Regards,
Alexander Korotkov
Supabase




Re: Removing unneeded self joins

2024-05-02 Thread Andrei Lepikhov

On 5/3/24 06:19, Tom Lane wrote:

Alexander Korotkov  writes:

I would appreciate your review of this patchset, and review from Andrei as well.


I hate to say this ... but if we're still finding bugs this
basic in SJE, isn't it time to give up on it for v17?

I might feel better about it if there were any reason to think
these were the last major bugs.  But you have already committed
around twenty separate fixes for the original SJE patch, and
now here you come with several more; so it doesn't seem like
the defect rate has slowed materially.  There can be no doubt
whatever that the original patch was far from commit-ready.

I think we should revert SJE for v17 and do a thorough design
review before trying again in v18.

I need to say I don't see any evidence of bad design.
I think this feature follows the example of 2489d76 [1], 1349d27 [2], 
and partitionwise join features — we get some issues from time to time, 
but these strengths and frequencies are significantly reduced.
First and foremost, this feature is highly isolated: like the PWJ 
feature, you can just disable (not enable?) SJE and it guarantees you 
will avoid the problems.
Secondly, this feature reflects the design decisions the optimiser has 
made before. It raises some questions: Do we really control the 
consistency of our paths and the plan tree? Maybe we hide our 
misunderstanding of its logic by extensively copying expression trees, 
sometimes without fundamental necessity. Perhaps the optimiser needs 
some abstraction layers or reconstruction to reduce the quickly growing 
complexity.
A good example here is [1]. IMO, the new promising feature it has 
introduced isn't worth the complexity it added to the planner.
SJE, much like OR <-> ANY transformation, introduces a fresh perspective 
into the planner: if we encounter a complex, redundant query, it may be 
more beneficial to invest in simplifying the internal query 
representation rather than adding new optimisations that will grapple 
with this complexity.
Also, SJE raised questions I've never seen before, like: Could we 
control the consistency of the PlannerInfo by changing something in the 
logic?
Considering the current state, I don't see any concrete outcomes or 
evidence that a redesign of the feature will lead us to a new path. 
However, I believe the presence of SJE in the core could potentially 
trigger improvements in the planner. As a result, I vote to stay with 
the feature. But remember, as an author, I'm not entirely objective, so 
let's wait for alternative opinions.


[1] Make Vars be outer-join-aware
[2] Improve performance of ORDER BY / DISTINCT aggregates

--
regards,
Andrei Lepikhov
Postgres Professional





Re: Removing unneeded self joins

2024-05-02 Thread Tom Lane
Alexander Korotkov  writes:
> I would appreciate your review of this patchset, and review from Andrei as 
> well.

I hate to say this ... but if we're still finding bugs this
basic in SJE, isn't it time to give up on it for v17?

I might feel better about it if there were any reason to think
these were the last major bugs.  But you have already committed
around twenty separate fixes for the original SJE patch, and
now here you come with several more; so it doesn't seem like
the defect rate has slowed materially.  There can be no doubt
whatever that the original patch was far from commit-ready.

I think we should revert SJE for v17 and do a thorough design
review before trying again in v18.

regards, tom lane




Re: Removing unneeded self joins

2024-05-02 Thread Alexander Korotkov
Hi, Richard!

On Thu, May 2, 2024 at 4:14 PM Richard Guo  wrote:
> On Thu, May 2, 2024 at 6:08 PM Alexander Korotkov  
> wrote:
>> On Thu, May 2, 2024 at 12:45 PM Andrei Lepikhov
>>  wrote:
>> > One question for me is: Do we anticipate other lateral self-references
>> > except the TABLESAMPLE case? Looking into the extract_lateral_references
>> > implementation, I see the only RTE_SUBQUERY case to be afraid of. But we
>> > pull up subqueries before extracting lateral references. So, if we have
>> > a reference to a subquery, it means we will not flatten this subquery
>> > and don't execute SJE. Do we need more code, as you have written in the
>> > first patch?
>>
>> I think my first patch was crap anyway.  Your explanation seems
>> reasonable to me.  I'm not sure this requires any more code.  Probably
>> it would be enough to add more comments about this.
>
>
> The tablesample case is not the only factor that can cause a relation to
> have a lateral dependency on itself after self-join removal.  It can
> also happen with PHVs.  As an example, consider
>
> explain (costs off)
> select * from t t1
> left join lateral
>   (select t1.a as t1a, * from t t2) t2
> on true
> where t1.a = t2.a;
> server closed the connection unexpectedly
>
> This is because after self-join removal, a PlaceHolderInfo's ph_lateral
> might contain rels mentioned in ph_eval_at, which we should get rid of.
>
> For the tablesample case, I agree that we should not consider relations
> with TABLESAMPLE clauses as candidates to be removed.  Removing such a
> relation could potentially change the syntax of the query, as shown by
> Alexander's example.  It seems to me that we can just check that in
> remove_self_joins_recurse, while we're collecting the base relations
> that are considered to be candidates for removal.
>
> This leads to the attached patch.  This patch also includes some code
> refactoring for the surrounding code.

Great, thank you for your work on this!

I'd like to split this into separate patches for better granularity of
git history.  I also added 0001 patch, which makes first usage of the
SJE acronym in file to come with disambiguation.  Also, I've added
assert that ph_lateral and ph_eval_at didn't overlap before the
changes.  I think this should help from the potential situation when
the changes we do could mask another bug.

I would appreciate your review of this patchset, and review from Andrei as well.

--
Regards,
Alexander Korotkov
Supabase


v2-0002-Minor-refactoring-for-self-join-elimination-code.patch
Description: Binary data


v2-0003-Forbid-self-join-elimination-on-table-sampling-sc.patch
Description: Binary data


v2-0004-Fix-self-join-elimination-work-with-PlaceHolderIn.patch
Description: Binary data


v2-0001-Clarify-the-SJE-self-join-elimination-acronym.patch
Description: Binary data


Re: Removing unneeded self joins

2024-05-02 Thread Richard Guo
On Thu, May 2, 2024 at 6:08 PM Alexander Korotkov 
wrote:

> On Thu, May 2, 2024 at 12:45 PM Andrei Lepikhov
>  wrote:
> > One question for me is: Do we anticipate other lateral self-references
> > except the TABLESAMPLE case? Looking into the extract_lateral_references
> > implementation, I see the only RTE_SUBQUERY case to be afraid of. But we
> > pull up subqueries before extracting lateral references. So, if we have
> > a reference to a subquery, it means we will not flatten this subquery
> > and don't execute SJE. Do we need more code, as you have written in the
> > first patch?
>
> I think my first patch was crap anyway.  Your explanation seems
> reasonable to me.  I'm not sure this requires any more code.  Probably
> it would be enough to add more comments about this.


The tablesample case is not the only factor that can cause a relation to
have a lateral dependency on itself after self-join removal.  It can
also happen with PHVs.  As an example, consider

explain (costs off)
select * from t t1
left join lateral
  (select t1.a as t1a, * from t t2) t2
on true
where t1.a = t2.a;
server closed the connection unexpectedly

This is because after self-join removal, a PlaceHolderInfo's ph_lateral
might contain rels mentioned in ph_eval_at, which we should get rid of.

For the tablesample case, I agree that we should not consider relations
with TABLESAMPLE clauses as candidates to be removed.  Removing such a
relation could potentially change the syntax of the query, as shown by
Alexander's example.  It seems to me that we can just check that in
remove_self_joins_recurse, while we're collecting the base relations
that are considered to be candidates for removal.

This leads to the attached patch.  This patch also includes some code
refactoring for the surrounding code.

Thanks
Richard


v1-0001-Fix-bogus-lateral-dependency-after-self-join-removal.patch
Description: Binary data


Re: Removing unneeded self joins

2024-05-02 Thread Alexander Korotkov
On Thu, May 2, 2024 at 12:45 PM Andrei Lepikhov
 wrote:
>
> On 5/1/24 18:59, Alexander Korotkov wrote:
> > I think we probably could forbid SJE for the tables with TABLESAMPLE
> > altogether.  Please, check the attached patch.
> Your patch looks good to me. I added some comments and test case into
> the join.sql.

Thank you

> One question for me is: Do we anticipate other lateral self-references
> except the TABLESAMPLE case? Looking into the extract_lateral_references
> implementation, I see the only RTE_SUBQUERY case to be afraid of. But we
> pull up subqueries before extracting lateral references. So, if we have
> a reference to a subquery, it means we will not flatten this subquery
> and don't execute SJE. Do we need more code, as you have written in the
> first patch?

I think my first patch was crap anyway.  Your explanation seems
reasonable to me.  I'm not sure this requires any more code.  Probably
it would be enough to add more comments about this.

--
Regards,
Alexander Korotkov




Re: Removing unneeded self joins

2024-05-02 Thread Andrei Lepikhov

On 5/1/24 18:59, Alexander Korotkov wrote:

I think we probably could forbid SJE for the tables with TABLESAMPLE
altogether.  Please, check the attached patch.
Your patch looks good to me. I added some comments and test case into 
the join.sql.


One question for me is: Do we anticipate other lateral self-references 
except the TABLESAMPLE case? Looking into the extract_lateral_references 
implementation, I see the only RTE_SUBQUERY case to be afraid of. But we 
pull up subqueries before extracting lateral references. So, if we have 
a reference to a subquery, it means we will not flatten this subquery 
and don't execute SJE. Do we need more code, as you have written in the 
first patch?


--
regards,
Andrei Lepikhov
Postgres Professional
From dac8afd2095586921dfcf5564e7f2979e89b77de Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov" 
Date: Thu, 2 May 2024 16:17:52 +0700
Subject: [PATCH] Forbid self-join elimination on table sampling scans.

Having custom table sampling methods we can stuck into unpredictable issues
replacing join with scan operation. It may had sense to analyse possible
situations and enable SJE, but the real profit from this operation looks
too low.
---
 src/backend/optimizer/plan/analyzejoins.c |  8 
 src/backend/optimizer/plan/initsplan.c|  5 -
 src/test/regress/expected/join.out| 13 +
 src/test/regress/sql/join.sql |  5 +
 4 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 506fccd20c..bb89558dcd 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -2148,6 +2148,14 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
 			Assert(root->simple_rte_array[k]->relid ==
    root->simple_rte_array[r]->relid);
 
+			/*
+			 * To avoid corner cases with table sampling methods just forbid
+			 * SJE for table sampling entries.
+			 */
+			if (root->simple_rte_array[k]->tablesample ||
+root->simple_rte_array[r]->tablesample)
+continue;
+
 			/*
 			 * It is impossible to eliminate join of two relations if they
 			 * belong to different rules of order. Otherwise planner can't be
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index e2c68fe6f9..bf839bcaf6 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -415,7 +415,10 @@ extract_lateral_references(PlannerInfo *root, RelOptInfo *brel, Index rtindex)
 	if (!rte->lateral)
 		return;
 
-	/* Fetch the appropriate variables */
+	/* Fetch the appropriate variables.
+	 * Changes in this place may need changes in SJE logic, see
+	 * the remove_self_joins_one_group routine.
+	 */
 	if (rte->rtekind == RTE_RELATION)
 		vars = pull_vars_of_level((Node *) rte->tablesample, 0);
 	else if (rte->rtekind == RTE_SUBQUERY)
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 8b640c2fc2..63143fe55f 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6900,6 +6900,19 @@ where s1.x = 1;
Filter: (1 = 1)
 (9 rows)
 
+-- Check that SJE doesn't touch TABLESAMPLE joins
+explain (costs off)
+SELECT * FROM emp1 t1 NATURAL JOIN LATERAL
+  (SELECT * FROM emp1 t2 TABLESAMPLE SYSTEM (t1.code));
+ QUERY PLAN  
+-
+ Nested Loop
+   ->  Seq Scan on emp1 t1
+   ->  Sample Scan on emp1 t2
+ Sampling: system (t1.code)
+ Filter: ((t1.id = id) AND (t1.code = code))
+(5 rows)
+
 -- Check that PHVs do not impose any constraints on removing self joins
 explain (verbose, costs off)
 select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index c4c6c7b8ba..184fd0876b 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -2652,6 +2652,11 @@ select 1 from emp1 t1 left join
 on true
 where s1.x = 1;
 
+-- Check that SJE doesn't touch TABLESAMPLE joins
+explain (costs off)
+SELECT * FROM emp1 t1 NATURAL JOIN LATERAL
+  (SELECT * FROM emp1 t2 TABLESAMPLE SYSTEM (t1.code));
+
 -- Check that PHVs do not impose any constraints on removing self joins
 explain (verbose, costs off)
 select * from emp1 t1 join emp1 t2 on t1.id = t2.id left join
-- 
2.39.2



Re: Removing unneeded self joins

2024-05-01 Thread Alexander Korotkov
On Wed, May 1, 2024 at 2:00 PM Alexander Lakhin  wrote:
> 30.04.2024 13:20, Alexander Korotkov wrote:
> > On Tue, Apr 30, 2024 at 9:00 AM Alexander Lakhin  
> > wrote:
> >> I've discovered another failure, introduced by d3d55ce57.
> >> Please try the following:
> >> CREATE TABLE t (a int unique, b float);
> >> SELECT * FROM t NATURAL JOIN LATERAL
> >>(SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;
> > I think we should just forbid SJE in case when relations to be merged
> > have cross-references with lateral vars.  The draft patch for this is
> > attached.  I'd like to ask Alexander to test it, Richard and Andrei to
> > review it.  Thank you!
>
> Beside LATERAL vars, it seems that SJR doesn't play well with TABLESAMPLE
> in general. For instance:
> CREATE TABLE t (a int unique);
> INSERT INTO t SELECT * FROM generate_series (1,100);
>
> SELECT COUNT(*) FROM (SELECT * FROM t TABLESAMPLE BERNOULLI(1)) t1
>   NATURAL JOIN (SELECT * FROM t TABLESAMPLE BERNOULLI(100)) t2;
> returned 100, 100, 100 for me, though with enable_self_join_removal = off,
> I got 4, 0, 1...

Right, thank you for reporting this.

BTW, I found another case where my previous fix doesn't work.

SELECT * FROM t NATURAL JOIN LATERAL (SELECT * FROM t t2 TABLESAMPLE
SYSTEM (t.b) NATURAL JOIN LATERAL(SELECT * FROM t t3 TABLESAMPLE
SYSTEM (t2.b)) t3) t2;

I think we probably could forbid SJE for the tables with TABLESAMPLE
altogether.  Please, check the attached patch.

--
Regards,
Alexander Korotkov


sje_tablesample.patch
Description: Binary data


Re: Removing unneeded self joins

2024-05-01 Thread Alexander Lakhin

30.04.2024 13:20, Alexander Korotkov wrote:

On Tue, Apr 30, 2024 at 9:00 AM Alexander Lakhin  wrote:

I've discovered another failure, introduced by d3d55ce57.
Please try the following:
CREATE TABLE t (a int unique, b float);
SELECT * FROM t NATURAL JOIN LATERAL
   (SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;

I think we should just forbid SJE in case when relations to be merged
have cross-references with lateral vars.  The draft patch for this is
attached.  I'd like to ask Alexander to test it, Richard and Andrei to
review it.  Thank you!


Beside LATERAL vars, it seems that SJR doesn't play well with TABLESAMPLE
in general. For instance:
CREATE TABLE t (a int unique);
INSERT INTO t SELECT * FROM generate_series (1,100);

SELECT COUNT(*) FROM (SELECT * FROM t TABLESAMPLE BERNOULLI(1)) t1
 NATURAL JOIN (SELECT * FROM t TABLESAMPLE BERNOULLI(100)) t2;
returned 100, 100, 100 for me, though with enable_self_join_removal = off,
I got 4, 0, 1...

Best regards,
Alexander




Re: Removing unneeded self joins

2024-04-30 Thread Alexander Korotkov
On Tue, Apr 30, 2024 at 9:00 AM Alexander Lakhin  wrote:
> 23.10.2023 12:47, Alexander Korotkov wrote:
> > I think this patch makes substantial improvement to query planning.
> > It has received plenty of reviews.  The code is currently in quite
> > good shape.  I didn't manage to find the cases when this optimization
> > causes significant overhead to planning time.  Even if such cases will
> > be spotted there is a GUC option to disable this feature.  So, I'll
> > push this if there are no objections.
>
> I've discovered another failure, introduced by d3d55ce57.
> Please try the following:
> CREATE TABLE t (a int unique, b float);
> SELECT * FROM t NATURAL JOIN LATERAL
>   (SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;

I think we should just forbid SJE in case when relations to be merged
have cross-references with lateral vars.  The draft patch for this is
attached.  I'd like to ask Alexander to test it, Richard and Andrei to
review it.  Thank you!

--
Regards,
Alexander Korotkov


sje_skip_cross_lateral_vars.patch
Description: Binary data


Re: Removing unneeded self joins

2024-04-30 Thread Alexander Korotkov
Hi, Alexander!

On Tue, Apr 30, 2024 at 9:00 AM Alexander Lakhin  wrote:
> 23.10.2023 12:47, Alexander Korotkov wrote:
> > I think this patch makes substantial improvement to query planning.
> > It has received plenty of reviews.  The code is currently in quite
> > good shape.  I didn't manage to find the cases when this optimization
> > causes significant overhead to planning time.  Even if such cases will
> > be spotted there is a GUC option to disable this feature.  So, I'll
> > push this if there are no objections.
>
> I've discovered another failure, introduced by d3d55ce57.
> Please try the following:
> CREATE TABLE t (a int unique, b float);
> SELECT * FROM t NATURAL JOIN LATERAL
>   (SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;
>
> With asserts enabled, it triggers
> TRAP: failed Assert("!bms_is_member(rti, lateral_relids)"), File: 
> "initsplan.c", Line: 697, PID: 3074054
> ExceptionalCondition at assert.c:52:13
> create_lateral_join_info at initsplan.c:700:8
> query_planner at planmain.c:257:2
> grouping_planner at planner.c:1523:17
> subquery_planner at planner.c:1098:2
> standard_planner at planner.c:415:9
> planner at planner.c:282:12
> pg_plan_query at postgres.c:904:9
> pg_plan_queries at postgres.c:996:11
> exec_simple_query at postgres.c:1193:19
> PostgresMain at postgres.c:4684:27
>
> With no asserts, I get:
> ERROR:  failed to construct the join relation
>
> Please take a look at this.

I'm looking into this, thank you!

--
Regards,
Alexander Korotkov




Re: Removing unneeded self joins

2024-04-30 Thread Alexander Lakhin

Hello Alexander,

23.10.2023 12:47, Alexander Korotkov wrote:

I think this patch makes substantial improvement to query planning.
It has received plenty of reviews.  The code is currently in quite
good shape.  I didn't manage to find the cases when this optimization
causes significant overhead to planning time.  Even if such cases will
be spotted there is a GUC option to disable this feature.  So, I'll
push this if there are no objections.


I've discovered another failure, introduced by d3d55ce57.
Please try the following:
CREATE TABLE t (a int unique, b float);
SELECT * FROM t NATURAL JOIN LATERAL
 (SELECT * FROM t t2 TABLESAMPLE SYSTEM (t.b)) t2;

With asserts enabled, it triggers
TRAP: failed Assert("!bms_is_member(rti, lateral_relids)"), File: 
"initsplan.c", Line: 697, PID: 3074054
ExceptionalCondition at assert.c:52:13
create_lateral_join_info at initsplan.c:700:8
query_planner at planmain.c:257:2
grouping_planner at planner.c:1523:17
subquery_planner at planner.c:1098:2
standard_planner at planner.c:415:9
planner at planner.c:282:12
pg_plan_query at postgres.c:904:9
pg_plan_queries at postgres.c:996:11
exec_simple_query at postgres.c:1193:19
PostgresMain at postgres.c:4684:27

With no asserts, I get:
ERROR:  failed to construct the join relation

Please take a look at this.

Best regards,
Alexander




Re: Removing unneeded self joins

2024-02-24 Thread Noah Misch
Hello,

On Sat, Feb 24, 2024 at 01:02:01PM +0200, Alexander Korotkov wrote:
> On Sat, Feb 24, 2024 at 7:12 AM Noah Misch  wrote:
> > On Sat, Feb 24, 2024 at 12:36:59AM +0200, Alexander Korotkov wrote:
> > > On Thu, Feb 22, 2024 at 10:51 AM Andrei Lepikhov 
> > >  wrote:
> > > > On 21/2/2024 14:26, Richard Guo wrote:
> > > > > I think the right fix for these issues is to introduce a new element
> > > > > 'sublevels_up' in ReplaceVarnoContext, and enhance 
> > > > > replace_varno_walker
> > > > > to 1) recurse into subselects with sublevels_up increased, and 2)
> > > > > perform the replacement only when varlevelsup is equal to 
> > > > > sublevels_up.
> > > > This code looks good. No idea how we have lost it before.
> > >
> > > Thanks to Richard for the patch and to Andrei for review.  I also find
> > > code looking good.  Pushed with minor edits from me.
> >
> > I feel this, commit 466979e, misses a few of our project standards:
> >
> > - The patch makes many non-whitespace changes to existing test queries.  
> > This
> >   makes it hard to review the consequences of the non-test part of the 
> > patch.
> >   Did you minimize such edits?  Of course, not every such edit is avoidable.
> >
> > - The commit message doesn't convey whether this is refactoring or is a bug
> >   fix.  This makes it hard to write release notes, among other things.  From
> >   this mailing list thread, it gather it's a bug fix in 489072ab7a, hence
> >   v17-specific.  The commit message for 489072ab7a is also silent about that
> >   commit's status as refactoring or as a bug fix.
> >
> > - Normally, I could answer the previous question by reading the test case
> >   diffs.  However, in addition to the first point about non-whitespace
> >   changes, the first three join.sql patch hunks just change whitespace.
> >   Worse, since they move line breaks, "git diff -w" doesn't filter them out.
> >
> > To what extent are those community standards vs. points of individual
> > committer preference?  Please tell me where I'm wrong here.
> 
> I agree that commit 466979e is my individual committer failure.  I
> should have written a better, more clear commit message and separate
> tests refactoring from the bug fix.
> 
> I'm not so sure about 489072ab7a (except it provides a wrong fix).  It
> has a "Reported-by:" field meaning it's a problem reported by a
> particular person.  The "Discussion:" points directly to the reported
> test case.  And commit contains the relevant test case.  The commit
> message could be more wordy though.

Agreed, the first and third points don't apply to 489072ab7a.  Thanks to that,
one can deduce from its new test case query that it fixes a bug.  It sounds
like we agree about commit 466979e, so that's good.




Re: Removing unneeded self joins

2024-02-24 Thread Alexander Korotkov
Hi, Noah!

On Sat, Feb 24, 2024 at 7:12 AM Noah Misch  wrote:
> On Sat, Feb 24, 2024 at 12:36:59AM +0200, Alexander Korotkov wrote:
> > On Thu, Feb 22, 2024 at 10:51 AM Andrei Lepikhov 
> >  wrote:
> > > On 21/2/2024 14:26, Richard Guo wrote:
> > > > I think the right fix for these issues is to introduce a new element
> > > > 'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
> > > > to 1) recurse into subselects with sublevels_up increased, and 2)
> > > > perform the replacement only when varlevelsup is equal to sublevels_up.
> > > This code looks good. No idea how we have lost it before.
> >
> > Thanks to Richard for the patch and to Andrei for review.  I also find
> > code looking good.  Pushed with minor edits from me.
>
> I feel this, commit 466979e, misses a few of our project standards:
>
> - The patch makes many non-whitespace changes to existing test queries.  This
>   makes it hard to review the consequences of the non-test part of the patch.
>   Did you minimize such edits?  Of course, not every such edit is avoidable.
>
> - The commit message doesn't convey whether this is refactoring or is a bug
>   fix.  This makes it hard to write release notes, among other things.  From
>   this mailing list thread, it gather it's a bug fix in 489072ab7a, hence
>   v17-specific.  The commit message for 489072ab7a is also silent about that
>   commit's status as refactoring or as a bug fix.
>
> - Normally, I could answer the previous question by reading the test case
>   diffs.  However, in addition to the first point about non-whitespace
>   changes, the first three join.sql patch hunks just change whitespace.
>   Worse, since they move line breaks, "git diff -w" doesn't filter them out.
>
> To what extent are those community standards vs. points of individual
> committer preference?  Please tell me where I'm wrong here.

I agree that commit 466979e is my individual committer failure.  I
should have written a better, more clear commit message and separate
tests refactoring from the bug fix.

I'm not so sure about 489072ab7a (except it provides a wrong fix).  It
has a "Reported-by:" field meaning it's a problem reported by a
particular person.  The "Discussion:" points directly to the reported
test case.  And commit contains the relevant test case.  The commit
message could be more wordy though.

--
Regards,
Alexander Korotkov




Re: Removing unneeded self joins

2024-02-23 Thread Noah Misch
On Sat, Feb 24, 2024 at 12:36:59AM +0200, Alexander Korotkov wrote:
> On Thu, Feb 22, 2024 at 10:51 AM Andrei Lepikhov  
> wrote:
> > On 21/2/2024 14:26, Richard Guo wrote:
> > > I think the right fix for these issues is to introduce a new element
> > > 'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
> > > to 1) recurse into subselects with sublevels_up increased, and 2)
> > > perform the replacement only when varlevelsup is equal to sublevels_up.
> > This code looks good. No idea how we have lost it before.
> 
> Thanks to Richard for the patch and to Andrei for review.  I also find
> code looking good.  Pushed with minor edits from me.

I feel this, commit 466979e, misses a few of our project standards:

- The patch makes many non-whitespace changes to existing test queries.  This
  makes it hard to review the consequences of the non-test part of the patch.
  Did you minimize such edits?  Of course, not every such edit is avoidable.

- The commit message doesn't convey whether this is refactoring or is a bug
  fix.  This makes it hard to write release notes, among other things.  From
  this mailing list thread, it gather it's a bug fix in 489072ab7a, hence
  v17-specific.  The commit message for 489072ab7a is also silent about that
  commit's status as refactoring or as a bug fix.

- Normally, I could answer the previous question by reading the test case
  diffs.  However, in addition to the first point about non-whitespace
  changes, the first three join.sql patch hunks just change whitespace.
  Worse, since they move line breaks, "git diff -w" doesn't filter them out.

To what extent are those community standards vs. points of individual
committer preference?  Please tell me where I'm wrong here.




Re: Removing unneeded self joins

2024-02-23 Thread Alexander Korotkov
On Thu, Feb 22, 2024 at 10:51 AM Andrei Lepikhov
 wrote:
> On 21/2/2024 14:26, Richard Guo wrote:
> > I think the right fix for these issues is to introduce a new element
> > 'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
> > to 1) recurse into subselects with sublevels_up increased, and 2)
> > perform the replacement only when varlevelsup is equal to sublevels_up.
> This code looks good. No idea how we have lost it before.

Thanks to Richard for the patch and to Andrei for review.  I also find
code looking good.  Pushed with minor edits from me.

--
Regards,
Alexander Korotkov




Re: Removing unneeded self joins

2024-02-22 Thread Andrei Lepikhov

On 21/2/2024 14:26, Richard Guo wrote:

I think the right fix for these issues is to introduce a new element
'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
to 1) recurse into subselects with sublevels_up increased, and 2)
perform the replacement only when varlevelsup is equal to sublevels_up.

This code looks good. No idea how we have lost it before.


While writing the fix, I noticed some outdated comments.  Such as in
remove_rel_from_query, the first for loop updates otherrel's attr_needed
as well as lateral_vars, but the comment only mentions attr_needed.  So
this patch also fixes some outdated comments.

Thanks, looks good.


While writing the test cases, I found that the test cases for SJE are
quite messy.  Below are what I have noticed:

* There are several test cases using catalog tables like pg_class,
pg_stats, pg_index, etc. for testing join removal.  I don't see a reason
why we need to use catalog tables, and I think this just raises the risk
of instability.

I see only one unusual query with the pg_class involved.


* In many test cases, a mix of uppercase and lowercase keywords is used
in one query.  I think it'd better to maintain consistency by using
either all uppercase or all lowercase keywords in one query.

I see uppercase -> lowercase change:
select t1.*, t2.a as ax from sj t1 join sj t2
and lowercase -> uppercase in many other cases:
explain (costs off)
I guess it is a matter of taste, so give up for the committer decision. 
Technically, it's OK.


* In most situations, we verify the plan and the output of a query like:

explain (costs off)
select ...;
select ...;

The two select queries are supposed to be the same.  But in the SJE test
cases, I have noticed instances where the two select queries differ from
each other.

This patch also includes some cosmetic tweaks for SJE test cases.  It
does not change the test cases using catalog tables though.  I think
that should be a seperate patch.
I can't assess the necessity of changing these dozens of lines of code 
because I follow another commenting style, but technically, it's still OK.


--
regards,
Andrei Lepikhov
Postgres Professional





Re: Removing unneeded self joins

2024-02-21 Thread Andrei Lepikhov

On 21/2/2024 14:26, Richard Guo wrote:

This patch also includes some cosmetic tweaks for SJE test cases.  It
does not change the test cases using catalog tables though.  I think
that should be a seperate patch.
Thanks for this catch, it is really messy thing, keeping aside the 
question why we need two different subtrees for the same query.

I will look into your fix.

--
regards,
Andrei Lepikhov
Postgres Professional





Re: Removing unneeded self joins

2024-02-20 Thread Richard Guo
On Mon, Feb 19, 2024 at 1:24 PM Andrei Lepikhov 
wrote:

> On 18/2/2024 23:18, Alexander Korotkov wrote:
> > On Sun, Feb 18, 2024 at 5:04 PM Alexander Korotkov 
> wrote:
> >> On Sun, Feb 18, 2024 at 3:00 PM Alexander Lakhin 
> wrote:
> >>> Please look at the following query which fails with an error since
> >>> d3d55ce57:
> >>>
> >>> create table t (i int primary key);
> >>>
> >>> select t3.i from t t1
> >>>   join t t2 on t1.i = t2.i,
> >>>   lateral (select t1.i limit 1) t3;
> >>>
> >>> ERROR:  non-LATERAL parameter required by subquery
> >>
> >> Thank you for spotting.  I'm looking at this.
> >
> > Attached is a draft patch fixing this query.  Could you, please, recheck?
> I reviewed this patch. Why do you check only the target list? I guess
> these links can be everywhere. See the patch in the attachment with the
> elaborated test and slightly changed code.


I just noticed that this fix has been committed in 489072ab7a, but it's
just flat wrong.

* The fix walks the subquery and replaces all the Vars with a varno
equal to the relid of the removing rel, without checking the
varlevelsup.  That is to say, a Var that belongs to the subquery itself
might also be replaced, which is wrong.  For instance,

create table t (i int primary key);

explain (costs off)
select t3.i from t t1
  join t t2 on t1.i = t2.i
  join lateral (select * from (select t1.i offset 0) offset 0) t3 on true;
ERROR:  no relation entry for relid 2


* The fix only traverses one level within the subquery, so Vars that
appear in subqueries with multiple levels cannot be replaced.  For
instance,

explain (costs off)
select t4.i from t t1
  join t t2 on t1.i = t2.i
  join lateral (select t3.i from t t3, (select t1.i) offset 0) t4 on true;
ERROR:  non-LATERAL parameter required by subquery


I think the right fix for these issues is to introduce a new element
'sublevels_up' in ReplaceVarnoContext, and enhance replace_varno_walker
to 1) recurse into subselects with sublevels_up increased, and 2)
perform the replacement only when varlevelsup is equal to sublevels_up.

Attached is a patch for the fix.

While writing the fix, I noticed some outdated comments.  Such as in
remove_rel_from_query, the first for loop updates otherrel's attr_needed
as well as lateral_vars, but the comment only mentions attr_needed.  So
this patch also fixes some outdated comments.

While writing the test cases, I found that the test cases for SJE are
quite messy.  Below are what I have noticed:

* There are several test cases using catalog tables like pg_class,
pg_stats, pg_index, etc. for testing join removal.  I don't see a reason
why we need to use catalog tables, and I think this just raises the risk
of instability.

* In many test cases, a mix of uppercase and lowercase keywords is used
in one query.  I think it'd better to maintain consistency by using
either all uppercase or all lowercase keywords in one query.

* In most situations, we verify the plan and the output of a query like:

explain (costs off)
select ...;
select ...;

The two select queries are supposed to be the same.  But in the SJE test
cases, I have noticed instances where the two select queries differ from
each other.

This patch also includes some cosmetic tweaks for SJE test cases.  It
does not change the test cases using catalog tables though.  I think
that should be a seperate patch.

Thanks
Richard


v1-0001-Replace-lateral-references-to-removed-rels-in-subqueries.patch
Description: Binary data


Re: Removing unneeded self joins

2024-02-18 Thread Andrei Lepikhov

On 18/2/2024 23:18, Alexander Korotkov wrote:

On Sun, Feb 18, 2024 at 5:04 PM Alexander Korotkov  wrote:

On Sun, Feb 18, 2024 at 3:00 PM Alexander Lakhin  wrote:

09.01.2024 01:09, Alexander Korotkov wrote:

Fixed in 30b4955a46.


Please look at the following query which fails with an error since
d3d55ce57:

create table t (i int primary key);

select t3.i from t t1
  join t t2 on t1.i = t2.i,
  lateral (select t1.i limit 1) t3;

ERROR:  non-LATERAL parameter required by subquery


Thank you for spotting.  I'm looking at this.


Attached is a draft patch fixing this query.  Could you, please, recheck?
I reviewed this patch. Why do you check only the target list? I guess 
these links can be everywhere. See the patch in the attachment with the 
elaborated test and slightly changed code.


--
regards,
Andrei Lepikhov
Postgres Professional
From 7f94a3c96fd410522b87e570240cdb96b300dd31 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Mon, 19 Feb 2024 12:17:55 +0700
Subject: [PATCH] Replace relids in lateral subquery target list during SJE

---
 src/backend/optimizer/plan/analyzejoins.c | 29 ++-
 src/test/regress/expected/join.out| 44 +++
 src/test/regress/sql/join.sql | 12 +++
 3 files changed, 84 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c 
b/src/backend/optimizer/plan/analyzejoins.c
index e494acd51a..072298f66c 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -395,7 +395,34 @@ remove_rel_from_query(PlannerInfo *root, RelOptInfo *rel,
}
 
/* Update lateral references. */
-   replace_varno((Node *) otherrel->lateral_vars, relid, subst);
+   if (root->hasLateralRTEs)
+   {
+   RangeTblEntry *rte = root->simple_rte_array[rti];
+   ReplaceVarnoContext ctx = {.from = relid,.to = subst};
+
+   if (rte->lateral)
+   {
+   replace_varno((Node *) otherrel->lateral_vars, 
relid, subst);
+
+   /*
+* Although we pass root->parse through cleanup 
procedure,
+* but parse->rtable and rte contains refs to 
different copies
+* of the subquery.
+*/
+   if (otherrel->rtekind == RTE_SUBQUERY)
+   query_tree_walker(rte->subquery, 
replace_varno_walker, ,
+ 
QTW_EXAMINE_SORTGROUP);
+#ifdef USE_ASSERT_CHECKING
+   /* Just check possibly hidden non-replaced 
relids */
+   Assert(!bms_is_member(relid, pull_varnos(root, 
(Node *) rte->tablesample)));
+   Assert(!bms_is_member(relid, pull_varnos(root, 
(Node *) rte->functions)));
+   Assert(!bms_is_member(relid, pull_varnos(root, 
(Node *) rte->tablefunc)));
+   Assert(!bms_is_member(relid, pull_varnos(root, 
(Node *) rte->values_lists)));
+#endif
+   }
+   }
+
+
}
 
/*
diff --git a/src/test/regress/expected/join.out 
b/src/test/regress/expected/join.out
index 0c2cba8921..d560a4a6b9 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -6349,6 +6349,50 @@ on true;
->  Seq Scan on int8_tbl y
 (7 rows)
 
+-- Test processing target lists in lateral subqueries
+explain (verbose, costs off)
+SELECT t3.a FROM sj t1, sj t2,
+LATERAL (SELECT t1.a WHERE t1.a <> 1
+GROUP BY (t1.a) HAVING t1.a > 0 ORDER BY t1.a LIMIT 1) t3,
+LATERAL (SELECT t1.a,t3.a WHERE t1.a <> t3.a+t2.a
+GROUP BY (t3.a) HAVING t1.a > t3.a*t3.a+t2.a/t1.a LIMIT 2) t4,
+LATERAL (SELECT * FROM sj TABLESAMPLE bernoulli(t1.a/t2.a)
+REPEATABLE (t1.a+t2.a)) t5,
+LATERAL generate_series(1, t1.a + t2.a) AS t6
+WHERE t1.a = t2.a;
+  QUERY PLAN   

+---
+ Nested Loop
+   Output: (t2.a)
+   ->  Nested Loop
+ Output: t2.a, (t2.a)
+ ->  Nested Loop
+   Output: t2.a, (t2.a)
+   ->  Nested Loop
+ Output: t2.a, (t2.a)
+ ->  Seq Scan on public.sj t2
+   Output: t2.a, t2.b, t2.c
+   Filter: (t2.a IS NOT NULL)
+ ->  Limit
+   Output: (t2.a)
+   ->  Group
+ Output: t2.a
+   

Re: Removing unneeded self joins

2024-02-18 Thread Alexander Lakhin

18.02.2024 19:18, Alexander Korotkov wrote:

Attached is a draft patch fixing this query. Could you, please, recheck?


Yes, this patch fixes the behavior for that query (I've also tried several
similar queries). Though I don't know the code well enough to judge the
code change.

Best regards,
Alexander





Re: Removing unneeded self joins

2024-02-18 Thread Alexander Korotkov
On Sun, Feb 18, 2024 at 5:04 PM Alexander Korotkov  wrote:
> On Sun, Feb 18, 2024 at 3:00 PM Alexander Lakhin  wrote:
> > 09.01.2024 01:09, Alexander Korotkov wrote:
> >
> > Fixed in 30b4955a46.
> >
> >
> > Please look at the following query which fails with an error since
> > d3d55ce57:
> >
> > create table t (i int primary key);
> >
> > select t3.i from t t1
> >  join t t2 on t1.i = t2.i,
> >  lateral (select t1.i limit 1) t3;
> >
> > ERROR:  non-LATERAL parameter required by subquery
>
> Thank you for spotting.  I'm looking at this.

Attached is a draft patch fixing this query.  Could you, please, recheck?

--
Regards,
Alexander Korotkov


0001-Replace-relids-in-lateral-subquery-target-list-du-v1.patch
Description: Binary data


Re: Removing unneeded self joins

2024-02-18 Thread Alexander Korotkov
On Sun, Feb 18, 2024 at 3:00 PM Alexander Lakhin  wrote:
> 09.01.2024 01:09, Alexander Korotkov wrote:
>
> Fixed in 30b4955a46.
>
>
> Please look at the following query which fails with an error since
> d3d55ce57:
>
> create table t (i int primary key);
>
> select t3.i from t t1
>  join t t2 on t1.i = t2.i,
>  lateral (select t1.i limit 1) t3;
>
> ERROR:  non-LATERAL parameter required by subquery

Thank you for spotting.  I'm looking at this.

--
Regards,
Alexander Korotkov




Re: Removing unneeded self joins

2024-02-18 Thread Alexander Lakhin

Hi Alexander,

09.01.2024 01:09, Alexander Korotkov wrote:

Fixed in 30b4955a46.


Please look at the following query which fails with an error since
d3d55ce57:

create table t (i int primary key);

select t3.i from t t1
 join t t2 on t1.i = t2.i,
 lateral (select t1.i limit 1) t3;

ERROR:  non-LATERAL parameter required by subquery

Best regards,
Alexander

Re: Removing unneeded self joins

2024-01-09 Thread Alexander Korotkov
On Tue, Jan 9, 2024 at 8:08 AM Alexander Korotkov  wrote:
> On Tue, Jan 9, 2024 at 6:00 AM Alexander Lakhin  wrote:
> > 09.01.2024 01:09, Alexander Korotkov wrote:
> > > Fixed in 30b4955a46.
> >
> > Thank you for fixing that!
> >
> > I've found another anomaly coined with d3d55ce57. This query:
> > CREATE TABLE t(a int PRIMARY KEY, b int);
> > INSERT INTO t VALUES  (1, 1), (2, 1);
> >
> > WITH t1 AS (SELECT * FROM t)
> > UPDATE t SET b = t1.b + 1 FROM t1
> > WHERE t.a = t1.a RETURNING t.a, t1.b;
> >
> > gives "ERROR:  variable not found in subplan target lists" on d3d55ce57, but
> > starting from a7928a57b it gives an incorrect result:
> >   a | b
> > ---+---
> >   1 | 2
> >   2 | 2
> > (2 rows)
>
> I see.  It seems to be not safe to apply SJE to the modify table
> target relation because it could use a different snapshot for the
> RETURNING clause.  I think we should just forbid SJE to involve the
> modify table target relation.  I'm planning to fix this later today.

Fixed in 8c441c08279.

--
Regards,
Alexander Korotkov




Re: Removing unneeded self joins

2024-01-08 Thread Alexander Korotkov
On Tue, Jan 9, 2024 at 6:00 AM Alexander Lakhin  wrote:
> 09.01.2024 01:09, Alexander Korotkov wrote:
> > Fixed in 30b4955a46.
>
> Thank you for fixing that!
>
> I've found another anomaly coined with d3d55ce57. This query:
> CREATE TABLE t(a int PRIMARY KEY, b int);
> INSERT INTO t VALUES  (1, 1), (2, 1);
>
> WITH t1 AS (SELECT * FROM t)
> UPDATE t SET b = t1.b + 1 FROM t1
> WHERE t.a = t1.a RETURNING t.a, t1.b;
>
> gives "ERROR:  variable not found in subplan target lists" on d3d55ce57, but
> starting from a7928a57b it gives an incorrect result:
>   a | b
> ---+---
>   1 | 2
>   2 | 2
> (2 rows)

I see.  It seems to be not safe to apply SJE to the modify table
target relation because it could use a different snapshot for the
RETURNING clause.  I think we should just forbid SJE to involve the
modify table target relation.  I'm planning to fix this later today.

--
Regards,
Alexander Korotkov




Re: Removing unneeded self joins

2024-01-08 Thread Alexander Lakhin

09.01.2024 01:09, Alexander Korotkov wrote:

Fixed in 30b4955a46.


Thank you for fixing that!

I've found another anomaly coined with d3d55ce57. This query:
CREATE TABLE t(a int PRIMARY KEY, b int);
INSERT INTO t VALUES  (1, 1), (2, 1);

WITH t1 AS (SELECT * FROM t)
UPDATE t SET b = t1.b + 1 FROM t1
WHERE t.a = t1.a RETURNING t.a, t1.b;

gives "ERROR:  variable not found in subplan target lists" on d3d55ce57, but
starting from a7928a57b it gives an incorrect result:
 a | b
---+---
 1 | 2
 2 | 2
(2 rows)


Best regards,
Alexander




Re: Removing unneeded self joins

2024-01-08 Thread Alexander Korotkov
On Mon, Jan 8, 2024 at 10:20 PM Alexander Korotkov  wrote:
> On Mon, Jan 8, 2024 at 10:00 PM Alexander Lakhin  wrote:
> > Please look at the following query which produces an incorrect result since
> > d3d55ce57:
> > CREATE TABLE t(a int PRIMARY KEY, b int);
> > INSERT INTO t VALUES  (1, 1), (2, 1);
> > SELECT * FROM t WHERE EXISTS (SELECT * FROM t t2 WHERE t2.a = t.b AND t2.b 
> > > 0);
> >
> >   a | b
> > ---+---
> >   1 | 1
> > (1 row)
> >
> > I think that the expected result is:
> >   a | b
> > ---+---
> >   1 | 1
> >   2 | 1
> > (2 rows)
>
> Thank you for your report.  I'm looking at this now.

Fixed in 30b4955a46.

--
Regards,
Alexander Korotkov




Re: Removing unneeded self joins

2024-01-08 Thread Alexander Korotkov
On Mon, Jan 8, 2024 at 10:00 PM Alexander Lakhin  wrote:
> Please look at the following query which produces an incorrect result since
> d3d55ce57:
> CREATE TABLE t(a int PRIMARY KEY, b int);
> INSERT INTO t VALUES  (1, 1), (2, 1);
> SELECT * FROM t WHERE EXISTS (SELECT * FROM t t2 WHERE t2.a = t.b AND t2.b > 
> 0);
>
>   a | b
> ---+---
>   1 | 1
> (1 row)
>
> I think that the expected result is:
>   a | b
> ---+---
>   1 | 1
>   2 | 1
> (2 rows)

Thank you for your report.  I'm looking at this now.

--
Regards,
Alexander Korotkov




Re: Removing unneeded self joins

2024-01-08 Thread Alexander Lakhin

Hello Andrei and Alexander,

Please look at the following query which produces an incorrect result since
d3d55ce57:
CREATE TABLE t(a int PRIMARY KEY, b int);
INSERT INTO t VALUES  (1, 1), (2, 1);
SELECT * FROM t WHERE EXISTS (SELECT * FROM t t2 WHERE t2.a = t.b AND t2.b > 0);

 a | b
---+---
 1 | 1
(1 row)

I think that the expected result is:
 a | b
---+---
 1 | 1
 2 | 1
(2 rows)

Best regards,
Alexander




Re: Removing unneeded self joins

2023-12-29 Thread Alexander Lakhin

Hi Andrei,

29.12.2023 12:58, Andrei Lepikhov wrote:

Thanks for the report!
The problem is with the resultRelation field. We forget to replace the relid 
here.
Could you check your issue with the patch in the attachment? Does it resolve 
this case?



Yes, with the patch applied I see no error.
Thank you!

Best regards,
Alexander




Re: Removing unneeded self joins

2023-12-29 Thread Andrei Lepikhov

On 29/12/2023 12:00, Alexander Lakhin wrote:

Hi Alexander,

23.10.2023 14:29, Alexander Korotkov wrote:

Fixed all of the above. Thank you for catching this!


I've discovered that starting from d3d55ce57 the following query:
CREATE TABLE t(a int PRIMARY KEY);

WITH tt AS (SELECT * FROM t)
UPDATE t SET a = tt.a + 1 FROM tt
WHERE tt.a = t.a RETURNING t.a;

triggers an error "variable not found in subplan target lists".
(Commits 8a8ed916f and b5fb6736e don't fix this, unfortunately.)


Thanks for the report!
The problem is with the resultRelation field. We forget to replace the 
relid here.
Could you check your issue with the patch in the attachment? Does it 
resolve this case?


--
regards,
Andrei Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/plan/analyzejoins.c 
b/src/backend/optimizer/plan/analyzejoins.c
index 6c02fe8908..f79c673afd 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -1861,6 +1861,8 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark 
*kmark, PlanRowMark *rmark,
/* Replace varno in all the query structures */
query_tree_walker(root->parse, replace_varno_walker, ,
  QTW_EXAMINE_SORTGROUP);
+   if (root->parse->resultRelation == toRemove->relid)
+   root->parse->resultRelation = toKeep->relid;
 
/* Replace links in the planner info */
remove_rel_from_query(root, toRemove, toKeep->relid, NULL, NULL);
@@ -1870,6 +1872,9 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark 
*kmark, PlanRowMark *rmark,
  toRemove->relid, toKeep->relid);
replace_varno((Node *) root->processed_groupClause,
  toRemove->relid, toKeep->relid);
+   replace_relid(root->all_result_relids, toRemove->relid, toKeep->relid);
+   replace_relid(root->leaf_result_relids, toRemove->relid, toKeep->relid);
+
 
/*
 * There may be references to the rel in root->fkey_list, but if so,


Re: Removing unneeded self joins

2023-12-28 Thread Alexander Lakhin

Hi Alexander,

23.10.2023 14:29, Alexander Korotkov wrote:

Fixed all of the above. Thank you for catching this!


I've discovered that starting from d3d55ce57 the following query:
CREATE TABLE t(a int PRIMARY KEY);

WITH tt AS (SELECT * FROM t)
UPDATE t SET a = tt.a + 1 FROM tt
WHERE tt.a = t.a RETURNING t.a;

triggers an error "variable not found in subplan target lists".
(Commits 8a8ed916f and b5fb6736e don't fix this, unfortunately.)

Best regards,
Alexander




Re: Removing unneeded self joins

2023-10-23 Thread Alexander Korotkov
On Mon, Oct 23, 2023 at 2:00 PM Alexander Lakhin  wrote:
> 23.10.2023 12:47, Alexander Korotkov wrote:
> > I think this patch makes substantial improvement to query planning.
> > It has received plenty of reviews.  The code is currently in quite
> > good shape.  I didn't manage to find the cases when this optimization
> > causes significant overhead to planning time.  Even if such cases will
> > be spotted there is a GUC option to disable this feature.  So, I'll
> > push this if there are no objections.
>
> On a quick glance, I've noticed following typos/inconsistencies in the
> patch, which maybe worth fixing:
> s/cadidates/candidates/
> s/uniquiness/uniqueness/
> s/selfjoin/self-join/
> s/seperate/separate/
>
> Also, shouldn't the reference "see generate_implied_equalities" be
> "see generate_implied_equalities_for_column"?

Fixed all of the above.  Thank you for catching this!

--
Regards,
Alexander Korotkov


0001-Remove-useless-self-joins-v48.patch
Description: Binary data


Re: Removing unneeded self joins

2023-10-23 Thread Alexander Lakhin

Hi Alexander,

23.10.2023 12:47, Alexander Korotkov wrote:

I think this patch makes substantial improvement to query planning.
It has received plenty of reviews.  The code is currently in quite
good shape.  I didn't manage to find the cases when this optimization
causes significant overhead to planning time.  Even if such cases will
be spotted there is a GUC option to disable this feature.  So, I'll
push this if there are no objections.


On a quick glance, I've noticed following typos/inconsistencies in the
patch, which maybe worth fixing:
s/cadidates/candidates/
s/uniquiness/uniqueness/
s/selfjoin/self-join/
s/seperate/separate/

Also, shouldn't the reference "see generate_implied_equalities" be
"see generate_implied_equalities_for_column"?

Best regards,
Alexander




Re: Removing unneeded self joins

2023-10-23 Thread Alexander Korotkov
On Mon, Oct 23, 2023 at 6:43 AM Andrei Lepikhov
 wrote:
> On 22/10/2023 05:01, Alexander Korotkov wrote:
> > On Thu, Oct 19, 2023 at 6:16 AM Andrei Lepikhov
> >  wrote:
> >> On 19/10/2023 01:50, Alexander Korotkov wrote:
> >>> This query took 3778.432 ms with self-join removal disabled, and
> >>> 3756.009 ms with self-join removal enabled.  So, no measurable
> >>> overhead.  Similar to the higher number of joins.  Can you imagine
> >>> some extreme case when self-join removal could introduce significant
> >>> overhead in comparison with other optimizer parts?  If not, should we
> >>> remove self_join_search_limit GUC?
> >> Thanks,
> >> It was Zhihong Yu who worried about that case [1]. And my purpose was to
> >> show a method to avoid such a problem if it would be needed.
> >> I guess the main idea here is that we have a lot of self-joins, but only
> >> few of them (or no one) can be removed.
> >> I can't imagine a practical situation when we can be stuck in the
> >> problems here. So, I vote to remove this GUC.
> >
> > I've removed the self_join_search_limit.  Anyway there is
> > enable_self_join_removal if the self join removal algorithm causes any
> > problems.  I also did some grammar corrections for the comments.  I
> > think the patch is getting to the committable shape.  I noticed some
> > failures on commitfest.cputube.org.  I'd like to check how this
> > version will pass it.
>
> I have observed the final patch. A couple of minor changes can be made
> (see attachment).

Thank you, Andrei!  I've integrated your changes into the patch.

> Also, I see room for improvement, but it can be done later. For example,
> we limit the optimization to only ordinary tables in this patch. It can
> be extended at least with partitioned and foreign tables soon.

Yes, I think it's reasonable to postpone some improvements.  It's
important to get the basic feature in, make sure it's safe and stable.
Then we can make improvements incrementally.

I think this patch makes substantial improvement to query planning.
It has received plenty of reviews.  The code is currently in quite
good shape.  I didn't manage to find the cases when this optimization
causes significant overhead to planning time.  Even if such cases will
be spotted there is a GUC option to disable this feature.  So, I'll
push this if there are no objections.

--
Regards,
Alexander Korotkov


0001-Remove-useless-self-joins-v47.patch
Description: Binary data


Re: Removing unneeded self joins

2023-10-22 Thread Andrei Lepikhov

On 22/10/2023 05:01, Alexander Korotkov wrote:

On Thu, Oct 19, 2023 at 6:16 AM Andrei Lepikhov
 wrote:

On 19/10/2023 01:50, Alexander Korotkov wrote:

This query took 3778.432 ms with self-join removal disabled, and
3756.009 ms with self-join removal enabled.  So, no measurable
overhead.  Similar to the higher number of joins.  Can you imagine
some extreme case when self-join removal could introduce significant
overhead in comparison with other optimizer parts?  If not, should we
remove self_join_search_limit GUC?

Thanks,
It was Zhihong Yu who worried about that case [1]. And my purpose was to
show a method to avoid such a problem if it would be needed.
I guess the main idea here is that we have a lot of self-joins, but only
few of them (or no one) can be removed.
I can't imagine a practical situation when we can be stuck in the
problems here. So, I vote to remove this GUC.


I've removed the self_join_search_limit.  Anyway there is
enable_self_join_removal if the self join removal algorithm causes any
problems.  I also did some grammar corrections for the comments.  I
think the patch is getting to the committable shape.  I noticed some
failures on commitfest.cputube.org.  I'd like to check how this
version will pass it.


I have observed the final patch. A couple of minor changes can be made 
(see attachment).
Also, I see room for improvement, but it can be done later. For example, 
we limit the optimization to only ordinary tables in this patch. It can 
be extended at least with partitioned and foreign tables soon.


--
regards,
Andrey Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/plan/analyzejoins.c 
b/src/backend/optimizer/plan/analyzejoins.c
index 6b848aadad..b84197dadb 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -23,7 +23,6 @@
 #include "postgres.h"
 
 #include "catalog/pg_class.h"
-#include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/joininfo.h"
@@ -50,7 +49,7 @@ typedef struct UniqueRelInfo
 } UniqueRelInfo;
 
 /*
- * The context for replace_varno_walker() containing source and target relids2
+ * The context for replace_varno_walker() containing source and target relids.
  */
 typedef struct
 {


Re: Removing unneeded self joins

2023-10-21 Thread Alexander Korotkov
On Thu, Oct 19, 2023 at 6:16 AM Andrei Lepikhov
 wrote:
> On 19/10/2023 01:50, Alexander Korotkov wrote:
> > This query took 3778.432 ms with self-join removal disabled, and
> > 3756.009 ms with self-join removal enabled.  So, no measurable
> > overhead.  Similar to the higher number of joins.  Can you imagine
> > some extreme case when self-join removal could introduce significant
> > overhead in comparison with other optimizer parts?  If not, should we
> > remove self_join_search_limit GUC?
> Thanks,
> It was Zhihong Yu who worried about that case [1]. And my purpose was to
> show a method to avoid such a problem if it would be needed.
> I guess the main idea here is that we have a lot of self-joins, but only
> few of them (or no one) can be removed.
> I can't imagine a practical situation when we can be stuck in the
> problems here. So, I vote to remove this GUC.

I've removed the self_join_search_limit.  Anyway there is
enable_self_join_removal if the self join removal algorithm causes any
problems.  I also did some grammar corrections for the comments.  I
think the patch is getting to the committable shape.  I noticed some
failures on commitfest.cputube.org.  I'd like to check how this
version will pass it.

--
Regards,
Alexander Korotkov


0001-Remove-useless-self-joins-v46.patch
Description: Binary data


Re: Removing unneeded self joins

2023-10-18 Thread Andrei Lepikhov

On 19/10/2023 01:50, Alexander Korotkov wrote:

On Mon, Oct 16, 2023 at 11:28 AM Andrei Lepikhov
 wrote:

On 12/10/2023 18:32, Alexander Korotkov wrote:

On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
 wrote:

On 4/10/2023 14:34, Alexander Korotkov wrote:

   > Relid replacement machinery is the most contradictory code here. We used
   > a utilitarian approach and implemented a simplistic variant.

   > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
   > > they are not necessary.  [1] points that infrastructure from [2] might
   > > be useful.  The patchset from [2] seems committed mow.  However, I
   > > can't see it is directly helpful in this matter.  Could we just skip
   > > adding IS NOT NULL clause for the columns, that have
   > > pg_attribute.attnotnull set?
   > Thanks for the links, I will look into that case.

To be more precise, in the attachment, you can find a diff to the main
patch, which shows the volume of changes to achieve the desired behaviour.
Some explains in regression tests shifted. So, I've made additional tests:

DROP TABLE test CASCADE;
CREATE TABLE test (a int, b int not null);
CREATE UNIQUE INDEX abc ON test(b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
CREATE UNIQUE INDEX abc1 ON test(a,b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
DROP INDEX abc1;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);

We have almost the results we wanted to have. But in the last explain
you can see that nothing happened with the OR clause. We should use the
expression mutator instead of walker to handle such clauses. But It
doesn't process the RestrictInfo node ... I'm inclined to put a solution
of this issue off for a while.


OK.  I think it doesn't worth to eliminate IS NULL quals with this
complexity (at least at this stage of work).

I made improvements over the code.  Mostly new comments, grammar
corrections of existing comments and small refactoring.

Also, I found that the  suggestion from David Rowley [1] to qsort
array of relations to faster find duplicates is still unaddressed.
I've implemented it.  That helps to evade quadratic complexity with
large number of relations.

Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found.  It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).


I would like to propose one more minor improvement (see in attachment).
The idea here is that after removing a self-join and changing clauses we
should re-probe the set of relids with the same Oid, because we can find
more removable self-joins (see the demo test in join.sql).



Thank you, I've integrated this into the patch.  BTW, the patch
introduces two new GUC variables: enable_self_join_removal,
self_join_search_limit.  enable_self_join_removal variable turns
on/off optimization at all.  self_join_search_limit variable limits
its usage by the number of joins.  AFICS, self_join_search_limit is
intended to protect us from quadratic complexity self-join removal
has.  I tried to reproduce the extreme case.

SELECT count(*) FROM pgbench_accounts a0, pgbench_accounts a1, ...,
pgbench_accounts a100 WHERE a0.aid = 1 AND a1.aid = a0.aid + 1 AND ...
AND a100.aid = a99.aid + 1;

This query took 3778.432 ms with self-join removal disabled, and
3756.009 ms with self-join removal enabled.  So, no measurable
overhead.  Similar to the higher number of joins.  Can you imagine
some extreme case when self-join removal could introduce significant
overhead in comparison with other optimizer parts?  If not, should we
remove self_join_search_limit GUC?

Thanks,
It was Zhihong Yu who worried about that case [1]. And my purpose was to 
show a method to avoid such a problem if it would be needed.
I guess the main idea here is that we have a lot of self-joins, but only 
few of them (or no one) can be removed.
I can't imagine a practical situation when we can be stuck in the 
problems here. So, I vote to remove this GUC.



[1] 
https://www.postgresql.org/message-id/CALNJ-vTyL-LpvSOPZxpC63Et3LJLUAFZSfRqGEhT5Rj7_EEj7w%40mail.gmail.com


--
regards,
Andrey Lepikhov
Postgres Professional





Re: Removing unneeded self joins

2023-10-18 Thread Alexander Korotkov
On Mon, Oct 16, 2023 at 11:28 AM Andrei Lepikhov
 wrote:
> On 12/10/2023 18:32, Alexander Korotkov wrote:
> > On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
> >  wrote:
> >> On 4/10/2023 14:34, Alexander Korotkov wrote:
> >>>   > Relid replacement machinery is the most contradictory code here. We 
> >>> used
> >>>   > a utilitarian approach and implemented a simplistic variant.
> >>>
> >>>   > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
> >>>   > > they are not necessary.  [1] points that infrastructure from [2] 
> >>> might
> >>>   > > be useful.  The patchset from [2] seems committed mow.  However, I
> >>>   > > can't see it is directly helpful in this matter.  Could we just skip
> >>>   > > adding IS NOT NULL clause for the columns, that have
> >>>   > > pg_attribute.attnotnull set?
> >>>   > Thanks for the links, I will look into that case.
> >> To be more precise, in the attachment, you can find a diff to the main
> >> patch, which shows the volume of changes to achieve the desired behaviour.
> >> Some explains in regression tests shifted. So, I've made additional tests:
> >>
> >> DROP TABLE test CASCADE;
> >> CREATE TABLE test (a int, b int not null);
> >> CREATE UNIQUE INDEX abc ON test(b);
> >> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> >> WHERE t1.b=t2.b;
> >> CREATE UNIQUE INDEX abc1 ON test(a,b);
> >> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> >> WHERE t1.b=t2.b;
> >> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> >> WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
> >> DROP INDEX abc1;
> >> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> >> WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);
> >>
> >> We have almost the results we wanted to have. But in the last explain
> >> you can see that nothing happened with the OR clause. We should use the
> >> expression mutator instead of walker to handle such clauses. But It
> >> doesn't process the RestrictInfo node ... I'm inclined to put a solution
> >> of this issue off for a while.
> >
> > OK.  I think it doesn't worth to eliminate IS NULL quals with this
> > complexity (at least at this stage of work).
> >
> > I made improvements over the code.  Mostly new comments, grammar
> > corrections of existing comments and small refactoring.
> >
> > Also, I found that the  suggestion from David Rowley [1] to qsort
> > array of relations to faster find duplicates is still unaddressed.
> > I've implemented it.  That helps to evade quadratic complexity with
> > large number of relations.
> >
> > Also I've incorporated improvements from Alena Rybakina except one for
> > skipping SJ removal when no SJ quals is found.  It's not yet clear for
> > me if this check fix some cases. But at least optimization got skipped
> > in some useful cases (as you can see in regression tests).
>
> I would like to propose one more minor improvement (see in attachment).
> The idea here is that after removing a self-join and changing clauses we
> should re-probe the set of relids with the same Oid, because we can find
> more removable self-joins (see the demo test in join.sql).


Thank you, I've integrated this into the patch.  BTW, the patch
introduces two new GUC variables: enable_self_join_removal,
self_join_search_limit.  enable_self_join_removal variable turns
on/off optimization at all.  self_join_search_limit variable limits
its usage by the number of joins.  AFICS, self_join_search_limit is
intended to protect us from quadratic complexity self-join removal
has.  I tried to reproduce the extreme case.

SELECT count(*) FROM pgbench_accounts a0, pgbench_accounts a1, ...,
pgbench_accounts a100 WHERE a0.aid = 1 AND a1.aid = a0.aid + 1 AND ...
AND a100.aid = a99.aid + 1;

This query took 3778.432 ms with self-join removal disabled, and
3756.009 ms with self-join removal enabled.  So, no measurable
overhead.  Similar to the higher number of joins.  Can you imagine
some extreme case when self-join removal could introduce significant
overhead in comparison with other optimizer parts?  If not, should we
remove self_join_search_limit GUC?

--
Regards,
Alexander Korotkov


0001-Remove-useless-self-joins-v45.patch
Description: Binary data


Re: Removing unneeded self joins

2023-10-16 Thread Andrei Lepikhov

On 12/10/2023 18:32, Alexander Korotkov wrote:

On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
 wrote:

On 4/10/2023 14:34, Alexander Korotkov wrote:

  > Relid replacement machinery is the most contradictory code here. We used
  > a utilitarian approach and implemented a simplistic variant.

  > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
  > > they are not necessary.  [1] points that infrastructure from [2] might
  > > be useful.  The patchset from [2] seems committed mow.  However, I
  > > can't see it is directly helpful in this matter.  Could we just skip
  > > adding IS NOT NULL clause for the columns, that have
  > > pg_attribute.attnotnull set?
  > Thanks for the links, I will look into that case.

To be more precise, in the attachment, you can find a diff to the main
patch, which shows the volume of changes to achieve the desired behaviour.
Some explains in regression tests shifted. So, I've made additional tests:

DROP TABLE test CASCADE;
CREATE TABLE test (a int, b int not null);
CREATE UNIQUE INDEX abc ON test(b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
CREATE UNIQUE INDEX abc1 ON test(a,b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
DROP INDEX abc1;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);

We have almost the results we wanted to have. But in the last explain
you can see that nothing happened with the OR clause. We should use the
expression mutator instead of walker to handle such clauses. But It
doesn't process the RestrictInfo node ... I'm inclined to put a solution
of this issue off for a while.


OK.  I think it doesn't worth to eliminate IS NULL quals with this
complexity (at least at this stage of work).

I made improvements over the code.  Mostly new comments, grammar
corrections of existing comments and small refactoring.

Also, I found that the  suggestion from David Rowley [1] to qsort
array of relations to faster find duplicates is still unaddressed.
I've implemented it.  That helps to evade quadratic complexity with
large number of relations.

Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found.  It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).


I would like to propose one more minor improvement (see in attachment). 
The idea here is that after removing a self-join and changing clauses we 
should re-probe the set of relids with the same Oid, because we can find 
more removable self-joins (see the demo test in join.sql).


--
regards,
Andrey Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/plan/analyzejoins.c 
b/src/backend/optimizer/plan/analyzejoins.c
index 7b8dc7a2b7..f7ccda5231 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -2298,6 +2298,7 @@ remove_self_joins_recurse(PlannerInfo *root, List 
*joinlist, Relids toRemove)
{
/* Create group of relation indexes with the 
same oid */
Relids  group = NULL;
+   Relids  removed;
 
while (i < j)
{
@@ -2306,8 +2307,21 @@ remove_self_joins_recurse(PlannerInfo *root, List 
*joinlist, Relids toRemove)
}
 
relids = bms_del_members(relids, group);
-   toRemove = bms_add_members(toRemove,
-   
remove_self_joins_one_group(root, group));
+
+   /*
+* Try to remove self-joins from a group of 
identical entries.
+* Make next attempt iteratively - if something 
is deleted from
+* a group, changes in clauses and equivalence 
classes can give
+* us a chance to find more candidates.
+*/
+   do {
+   Assert(!bms_overlap(group, toRemove));
+   removed = 
remove_self_joins_one_group(root, group);
+   toRemove = bms_add_members(toRemove, 
removed);
+   group = bms_del_members(group, removed);
+   } while (!bms_is_empty(removed) &&
+bms_membership(group) == 
BMS_MULTIPLE);
+   bms_free(removed);
bms_free(group);
   

Re: Removing unneeded self joins

2023-10-13 Thread a.rybakina

On 13.10.2023 12:03, Andrei Lepikhov wrote:

On 13/10/2023 15:56, a.rybakina wrote:



Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found.  It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).


Agree. I wouldn't say I like it too. But also, I suggest skipping 
some unnecessary assertions proposed in that patch:
Assert(toKeep->relid != -1); - quite strange. Why -1? Why not all 
the negative numbers, at least?
Assert(is_opclause(orinfo->clause)); - above we skip clauses with 
rinfo->mergeopfamilies == NIL. Each mergejoinable clause is already 
checked as is_opclause.

All these changes (see in the attachment) are optional.

I don't mind about asserts, maybe I misunderstood something in the 
patch.


About skipping SJ removal when no SJ quals is found, I assume it is 
about it:


split_selfjoin_quals(root, restrictlist, ,
                               , inner->relid, 
outer->relid);


+            if (list_length(selfjoinquals) == 0)
+             {
+                 /*
+                  * XXX:
+                  * we would detect self-join without quals like 
'x==x' if we had
+                  * an foreign key constraint on some of other quals 
and this join
+                  * haven't any columns from the outer in the target 
list.

+                  * But it is still complex task.
+                  */
+                 continue;
+             }

as far as I remember, this is the place where it is checked that the 
SJ list is empty and it is logical, in my opinion, that no 
transformations should be performed if no elements are found for them.
You forget we have "Degenerate" case, as Alexander mentioned above. 
What if you have something like that:

SELECT ... FROM A a1, A a2 WHERE a1.id=1 AND a2.id=1;
In this case, uniqueness can be achieved by the baserestrictinfo 
"A.id=1", if we have an unique index on this column.



Yes, sorry, I missed it. thanks again for the explanation 




Re: Removing unneeded self joins

2023-10-13 Thread Andrei Lepikhov

On 13/10/2023 15:56, a.rybakina wrote:



Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found.  It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).


Agree. I wouldn't say I like it too. But also, I suggest skipping some 
unnecessary assertions proposed in that patch:
Assert(toKeep->relid != -1); - quite strange. Why -1? Why not all the 
negative numbers, at least?
Assert(is_opclause(orinfo->clause)); - above we skip clauses with 
rinfo->mergeopfamilies == NIL. Each mergejoinable clause is already 
checked as is_opclause.

All these changes (see in the attachment) are optional.


I don't mind about asserts, maybe I misunderstood something in the patch.

About skipping SJ removal when no SJ quals is found, I assume it is 
about it:


split_selfjoin_quals(root, restrictlist, ,
                               , inner->relid, 
outer->relid);


+            if (list_length(selfjoinquals) == 0)
+             {
+                 /*
+                  * XXX:
+                  * we would detect self-join without quals like 'x==x' 
if we had
+                  * an foreign key constraint on some of other quals 
and this join

+                  * haven't any columns from the outer in the target list.
+                  * But it is still complex task.
+                  */
+                 continue;
+             }

as far as I remember, this is the place where it is checked that the SJ 
list is empty and it is logical, in my opinion, that no transformations 
should be performed if no elements are found for them.
You forget we have "Degenerate" case, as Alexander mentioned above. What 
if you have something like that:

SELECT ... FROM A a1, A a2 WHERE a1.id=1 AND a2.id=1;
In this case, uniqueness can be achieved by the baserestrictinfo 
"A.id=1", if we have an unique index on this column.


--
regards,
Andrey Lepikhov
Postgres Professional





Re: Removing unneeded self joins

2023-10-13 Thread a.rybakina



Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found.  It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).


Agree. I wouldn't say I like it too. But also, I suggest skipping some 
unnecessary assertions proposed in that patch:
Assert(toKeep->relid != -1); - quite strange. Why -1? Why not all the 
negative numbers, at least?
Assert(is_opclause(orinfo->clause)); - above we skip clauses with 
rinfo->mergeopfamilies == NIL. Each mergejoinable clause is already 
checked as is_opclause.

All these changes (see in the attachment) are optional.


I don't mind about asserts, maybe I misunderstood something in the patch.

About skipping SJ removal when no SJ quals is found, I assume it is 
about it:


split_selfjoin_quals(root, restrictlist, ,
                              , inner->relid, 
outer->relid);


+            if (list_length(selfjoinquals) == 0)
+             {
+                 /*
+                  * XXX:
+                  * we would detect self-join without quals like 'x==x' 
if we had
+                  * an foreign key constraint on some of other quals 
and this join

+                  * haven't any columns from the outer in the target list.
+                  * But it is still complex task.
+                  */
+                 continue;
+             }

as far as I remember, this is the place where it is checked that the SJ 
list is empty and it is logical, in my opinion, that no transformations 
should be performed if no elements are found for them.


As for the cases where SJ did not work, maybe this is just right if 
there are no elements for processing these cases. I'll try to check or 
come up with tests for them. If I'm wrong, write.


On 11.10.2023 06:51, Andrei Lepikhov wrote:

On 11/10/2023 02:29, Alena Rybakina wrote:

I have reviewed your patch and I noticed a few things.


Thanks for your efforts,

I have looked at the latest version of the code, I assume that the 
error lies in the replace_varno_walker function, especially in the 
place where we check the node by type Var, and does not form any 
NullTest node.


It's not a bug, it's an optimization we discussed with Alexander above.

Secondly, I added some code in some places to catch erroneous cases 
and added a condition when we should not try to apply the 
self-join-removal transformation due to the absence of an empty 
self-join list after searching for it and in general if there are no 
joins in the query. Besides, I added a query for testing and wrote 
about it above. I have attached my diff file.

Ok, I will look at this
In addition, I found a comment for myself that was not clear to me. I 
would be glad if you could explain it to me.


You mentioned superior outer join in the comment, unfortunately, I 
didn't find anything about it in the PostgreSQL code, and this 
explanation remained unclear to me. Could you explain in more detail 
what you meant?
I meant here that only clauses pushed by 
reconsider_outer_join_clauses() can be found in the joininfo list, and 
they are not relevant, as you can understand.
Having written that, I realized that it was a false statement. ;) - 
joininfo can also contain results of previous SJE iterations, look:


CREATE TABLE test (oid int PRIMARY KEY);
CREATE UNIQUE INDEX ON test((oid*oid));
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c1.oid=c2.oid AND c1.oid*c2.oid=c3.oid*c3.oid;
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c1.oid=c3.oid AND c1.oid*c3.oid=c2.oid*c2.oid;
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c3.oid=c2.oid AND c3.oid*c2.oid=c1.oid*c1.oid; 


Ok, I understood. Thank you for explanation.


--
Regards,
Alena Rybakina


Re: Removing unneeded self joins

2023-10-12 Thread Andrei Lepikhov

On 12/10/2023 18:32, Alexander Korotkov wrote:

On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov

We have almost the results we wanted to have. But in the last explain
you can see that nothing happened with the OR clause. We should use the
expression mutator instead of walker to handle such clauses. But It
doesn't process the RestrictInfo node ... I'm inclined to put a solution
of this issue off for a while.


OK.  I think it doesn't worth to eliminate IS NULL quals with this
complexity (at least at this stage of work).


Yeah. I think It would be meaningful in the case of replacing also 
nested x IS NOT NULL with nothing. But it requires using a mutator 
instead of the walker and may be done more accurately next time.



I made improvements over the code.  Mostly new comments, grammar
corrections of existing comments and small refactoring.


Great!


Also, I found that the  suggestion from David Rowley [1] to qsort
array of relations to faster find duplicates is still unaddressed.
I've implemented it.  That helps to evade quadratic complexity with
large number of relations.


I see. The thread is too long so far, thanks for the catch.


Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found.  It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).


Agree. I wouldn't say I like it too. But also, I suggest skipping some 
unnecessary assertions proposed in that patch:
Assert(toKeep->relid != -1); - quite strange. Why -1? Why not all the 
negative numbers, at least?
Assert(is_opclause(orinfo->clause)); - above we skip clauses with 
rinfo->mergeopfamilies == NIL. Each mergejoinable clause is already 
checked as is_opclause.

All these changes (see in the attachment) are optional.

--
regards,
Andrey Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/plan/analyzejoins.c 
b/src/backend/optimizer/plan/analyzejoins.c
index f0746f35a3..7b8dc7a2b7 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -1710,8 +1710,6 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark 
*kmark, PlanRowMark *rmark,
List   *binfo_candidates = NIL;
ReplaceVarnoContext ctx = {.from = toRemove->relid,.to = toKeep->relid};
 
-   Assert(toKeep->relid != -1);
-
/*
 * Replace index of removing table with the keeping one. The technique 
of
 * removing/distributing restrictinfo is used here to attach just 
appeared
@@ -2017,8 +2015,6 @@ match_unique_clauses(PlannerInfo *root, RelOptInfo 
*outer, List *uclauses,
/* Don't consider clauses which aren't similar 
to 'F(X)=G(Y)' */
continue;
 
-   Assert(is_opclause(orinfo->clause));
-
oclause = bms_is_empty(orinfo->left_relids) ?
get_rightop(orinfo->clause) : 
get_leftop(orinfo->clause);
c2 = (bms_is_empty(orinfo->left_relids) ?
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 2ff4881fdf..96ebd6eed3 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -367,7 +367,6 @@ CatalogId
 CatalogIdMapEntry
 CatalogIndexState
 ChangeVarNodes_context
-ReplaceVarnoContext
 CheckPoint
 CheckPointStmt
 CheckpointStatsData
@@ -2341,6 +2340,7 @@ ReorderBufferUpdateProgressTxnCB
 ReorderTuple
 RepOriginId
 ReparameterizeForeignPathByChild_function
+ReplaceVarnoContext
 ReplaceVarsFromTargetList_context
 ReplaceVarsNoMatchOption
 ReplicaIdentityStmt
@@ -2474,6 +2474,7 @@ SeenRelsEntry
 SelectLimit
 SelectStmt
 Selectivity
+SelfJoinCandidate
 SemTPadded
 SemiAntiJoinFactors
 SeqScan


Re: Removing unneeded self joins

2023-10-12 Thread Alexander Korotkov
On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
 wrote:
> On 4/10/2023 14:34, Alexander Korotkov wrote:
> >  > Relid replacement machinery is the most contradictory code here. We used
> >  > a utilitarian approach and implemented a simplistic variant.
> >
> >  > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
> >  > > they are not necessary.  [1] points that infrastructure from [2] might
> >  > > be useful.  The patchset from [2] seems committed mow.  However, I
> >  > > can't see it is directly helpful in this matter.  Could we just skip
> >  > > adding IS NOT NULL clause for the columns, that have
> >  > > pg_attribute.attnotnull set?
> >  > Thanks for the links, I will look into that case.
> To be more precise, in the attachment, you can find a diff to the main
> patch, which shows the volume of changes to achieve the desired behaviour.
> Some explains in regression tests shifted. So, I've made additional tests:
>
> DROP TABLE test CASCADE;
> CREATE TABLE test (a int, b int not null);
> CREATE UNIQUE INDEX abc ON test(b);
> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> WHERE t1.b=t2.b;
> CREATE UNIQUE INDEX abc1 ON test(a,b);
> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> WHERE t1.b=t2.b;
> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
> DROP INDEX abc1;
> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);
>
> We have almost the results we wanted to have. But in the last explain
> you can see that nothing happened with the OR clause. We should use the
> expression mutator instead of walker to handle such clauses. But It
> doesn't process the RestrictInfo node ... I'm inclined to put a solution
> of this issue off for a while.

OK.  I think it doesn't worth to eliminate IS NULL quals with this
complexity (at least at this stage of work).

I made improvements over the code.  Mostly new comments, grammar
corrections of existing comments and small refactoring.

Also, I found that the  suggestion from David Rowley [1] to qsort
array of relations to faster find duplicates is still unaddressed.
I've implemented it.  That helps to evade quadratic complexity with
large number of relations.

Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found.  It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).

Links
1. 
https://www.postgresql.org/message-id/CAKJS1f8ySSsBfooH3bJK7OD3LBEbDb99d8J_FtqDd6w50p-eAQ%40mail.gmail.com
2. 
https://www.postgresql.org/message-id/96f66ae3-df10-4060-9844-4c9633062cd3%40yandex.ru

--
Regards,
Alexander Korotkov


0001-Remove-useless-self-joins-v44.patch
Description: Binary data


Re: Removing unneeded self joins

2023-10-10 Thread Andrei Lepikhov

On 11/10/2023 02:29, Alena Rybakina wrote:

I have reviewed your patch and I noticed a few things.


Thanks for your efforts,

I have looked at the latest version of the code, I assume that the error 
lies in the replace_varno_walker function, especially in the place where 
we check the node by type Var, and does not form any NullTest node.


It's not a bug, it's an optimization we discussed with Alexander above.

Secondly, I added some code in some places to catch erroneous cases and 
added a condition when we should not try to apply the self-join-removal 
transformation due to the absence of an empty self-join list after 
searching for it and in general if there are no joins in the query. 
Besides, I added a query for testing and wrote about it above. I have 
attached my diff file.

Ok, I will look at this
In addition, I found a comment for myself that was not clear to me. I 
would be glad if you could explain it to me.


You mentioned superior outer join in the comment, unfortunately, I 
didn't find anything about it in the PostgreSQL code, and this 
explanation remained unclear to me. Could you explain in more detail 
what you meant?
I meant here that only clauses pushed by reconsider_outer_join_clauses() 
can be found in the joininfo list, and they are not relevant, as you can 
understand.
Having written that, I realized that it was a false statement. ;) - 
joininfo can also contain results of previous SJE iterations, look:


CREATE TABLE test (oid int PRIMARY KEY);
CREATE UNIQUE INDEX ON test((oid*oid));
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c1.oid=c2.oid AND c1.oid*c2.oid=c3.oid*c3.oid;
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c1.oid=c3.oid AND c1.oid*c3.oid=c2.oid*c2.oid;
explain
SELECT count(*)
FROM test c1, test c2, test c3
WHERE c3.oid=c2.oid AND c3.oid*c2.oid=c1.oid*c1.oid;

Having executed this SQL code, you could see that in the last query, the 
SJE feature didn't delete one of the JOINs because of the reason I had 
written above.

It's not an one-minute fix - I will try to propose solution a bit later.

--
regards,
Andrey Lepikhov
Postgres Professional





Re: Removing unneeded self joins

2023-10-10 Thread Alena Rybakina

Hi!

I have reviewed your patch and I noticed a few things.

First of all, I think I found a bug in your latest patch version, and 
this query shows it:


EXPLAIN (COSTS OFF)
SELECT c.oid, e.oid FROM pg_class c FULL JOIN (
  SELECT e1.oid FROM pg_extension e1, pg_extension e2
  WHERE e1.oid=e2.oid) AS e
  ON c.oid=e.oid;

In the current version we get such a query plan:

   QUERY PLAN
-
 Hash Full Join
   Hash Cond: (c.oid = e2.oid)
   ->  Seq Scan on pg_class c
   ->  Hash
 ->  Seq Scan on pg_extension e2
(5 rows)

But I think it should be:

QUERY PLAN
-
Hash Full Join
Hash Cond: (c.oid = e2.oid)
-> Seq Scan on pg_class c
-> Hash
-> Seq Scan on pg_extension e2
*Filter: (oid IS NOT NULL)*
(6 rows)

I have looked at the latest version of the code, I assume that the error 
lies in the replace_varno_walker function, especially in the place where 
we check the node by type Var, and does not form any NullTest node.


if (OidIsValid(reloid) && get_attnotnull(reloid, attnum)) -- this 
condition works

    {
  rinfo->clause = (Expr *) makeBoolConst(true, false);
    }
    else
    {
  NullTest   *ntest = makeNode(NullTest);

  ntest->arg = leftOp;
  ntest->nulltesttype = IS_NOT_NULL;
  ntest->argisrow = false;
  ntest->location = -1;
  rinfo->clause = (Expr *) ntest;
    }


Secondly, I added some code in some places to catch erroneous cases and 
added a condition when we should not try to apply the self-join-removal 
transformation due to the absence of an empty self-join list after 
searching for it and in general if there are no joins in the query. 
Besides, I added a query for testing and wrote about it above. I have 
attached my diff file.



In addition, I found a comment for myself that was not clear to me. I 
would be glad if you could explain it to me.


You mentioned superior outer join in the comment, unfortunately, I 
didn't find anything about it in the PostgreSQL code, and this 
explanation remained unclear to me. Could you explain in more detail 
what you meant?


/*
 * At this stage joininfo lists of inner and outer can contain
 * only clauses, required for *a superior outer join* that can't
 * influence on this optimization. So, we can avoid to call the
 * build_joinrel_restrictlist() routine.
*/
 restrictlist = generate_join_implied_equalities(root, joinrelids,
  inner->relids,
  outer, NULL);

--

Regards,
Alena Rybakina
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 3e10083905c..5ba5ca693f1 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -1704,6 +1704,8 @@ remove_self_join_rel(PlannerInfo *root, PlanRowMark *kmark, PlanRowMark *rmark,
 	List			   *binfo_candidates = NIL;
 	ReplaceVarnoContext	ctx = {.from = toRemove->relid, .to = toKeep->relid};
 
+	Assert(toKeep->relid != -1);
+
 	/*
 	 * Replace index of removing table with the keeping one. The technique of
 	 * removing/distributing restrictinfo is used here to attach just appeared
@@ -2007,6 +2009,8 @@ match_unique_clauses(PlannerInfo *root, RelOptInfo *outer, List *uclauses,
 /* Don't consider clauses which aren't similar to 'F(X)=G(Y)' */
 continue;
 
+			Assert(is_opclause(orinfo->clause));
+
 			oclause = bms_is_empty(orinfo->left_relids) ?
 	get_rightop(orinfo->clause) : get_leftop(orinfo->clause);
 			c2 = (bms_is_empty(orinfo->left_relids) ?
@@ -2150,6 +2154,18 @@ remove_self_joins_one_group(PlannerInfo *root, Relids relids)
 			split_selfjoin_quals(root, restrictlist, ,
  , inner->relid, outer->relid);
 
+			if (list_length(selfjoinquals) == 0)
+ 			{
+ /*
+  * XXX:
+  * we would detect self-join without quals like 'x==x' if we had
+  * an foreign key constraint on some of other quals and this join
+  * haven't any columns from the outer in the target list.
+  * But it is still complex task.
+  */
+ continue;
+ 			}
+
 			/*
 			 * To enable SJE for the only degenerate case without any self join
 			 * clauses at all, add baserestrictinfo to this list.
@@ -2332,7 +2348,7 @@ remove_useless_self_joins(PlannerInfo *root, List *joinlist)
 	Relids	ToRemove = NULL;
 	int		relid = -1;
 
-	if (!enable_self_join_removal)
+	if ((list_length(joinlist) <=1 && !IsA(linitial(joinlist), List)) || !enable_self_join_removal)
 		return joinlist;
 
 	/*
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 10b23944feb..800410d6b18 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5807,13 +5807,11 @@ explain (costs off)
 select p.* from
   (parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
   where p.k = 1 and p.k = 2;
-   QUERY PLAN

Re: Removing unneeded self joins

2023-10-05 Thread Andrei Lepikhov

On 4/10/2023 14:34, Alexander Korotkov wrote:

 > Relid replacement machinery is the most contradictory code here. We used
 > a utilitarian approach and implemented a simplistic variant.

 > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
 > > they are not necessary.  [1] points that infrastructure from [2] might
 > > be useful.  The patchset from [2] seems committed mow.  However, I
 > > can't see it is directly helpful in this matter.  Could we just skip
 > > adding IS NOT NULL clause for the columns, that have
 > > pg_attribute.attnotnull set?
 > Thanks for the links, I will look into that case.
To be more precise, in the attachment, you can find a diff to the main 
patch, which shows the volume of changes to achieve the desired behaviour.

Some explains in regression tests shifted. So, I've made additional tests:

DROP TABLE test CASCADE;
CREATE TABLE test (a int, b int not null);
CREATE UNIQUE INDEX abc ON test(b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
CREATE UNIQUE INDEX abc1 ON test(a,b);
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
DROP INDEX abc1;
explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);

We have almost the results we wanted to have. But in the last explain 
you can see that nothing happened with the OR clause. We should use the 
expression mutator instead of walker to handle such clauses. But It 
doesn't process the RestrictInfo node ... I'm inclined to put a solution 
of this issue off for a while.


--
regards,
Andrey Lepikhov
Postgres Professional
diff --git a/src/backend/optimizer/plan/analyzejoins.c 
b/src/backend/optimizer/plan/analyzejoins.c
index a127239d30..c12aa15fc9 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -73,7 +73,7 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
   List 
*restrictlist,
   List 
**extra_clauses);
 static Bitmapset *replace_relid(Relids relids, int oldId, int newId);
-static void replace_varno(Node *node, int from, int to);
+static void replace_varno(PlannerInfo *root, Node *node, int from, int to);
 
 
 /*
@@ -388,7 +388,7 @@ remove_rel_from_query_common(PlannerInfo *root, RelOptInfo 
*rel,
}
 
/* Update lateral references. */
-   replace_varno((Node *) otherrel->lateral_vars, relid, subst);
+   replace_varno(root, (Node *) otherrel->lateral_vars, relid, 
subst);
}
 
/*
@@ -425,7 +425,7 @@ remove_rel_from_query_common(PlannerInfo *root, RelOptInfo 
*rel,
sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, 
ojrelid, subst);
sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, 
ojrelid, subst);
 
-   replace_varno((Node *) sjinf->semi_rhs_exprs, relid, subst);
+   replace_varno(root, (Node *) sjinf->semi_rhs_exprs, relid, 
subst);
}
 
/*
@@ -1399,6 +1399,7 @@ is_innerrel_unique_for(PlannerInfo *root,
 
 typedef struct ReplaceVarnoContext
 {
+   PlannerInfo *root;
int from;
int to;
 } ReplaceVarnoContext;
@@ -1420,6 +1421,11 @@ replace_varno_walker(Node *node, ReplaceVarnoContext 
*ctx)
}
return false;
}
+
+   /*
+* Expression walker doesn't know about RestrictInfo node. Do recursive 
pass
+* into the clauses manually.
+*/
if (IsA(node, RestrictInfo))
{
RestrictInfo   *rinfo = (RestrictInfo *) node;
@@ -1429,20 +1435,26 @@ replace_varno_walker(Node *node, ReplaceVarnoContext 
*ctx)
 
if (bms_is_member(ctx->from, rinfo->clause_relids))
{
-   replace_varno((Node *) rinfo->clause, ctx->from, 
ctx->to);
-   replace_varno((Node *) rinfo->orclause, ctx->from, 
ctx->to);
-   rinfo->clause_relids = 
replace_relid(rinfo->clause_relids, ctx->from, ctx->to);
-   rinfo->left_relids = replace_relid(rinfo->left_relids, 
ctx->from, ctx->to);
-   rinfo->right_relids = 
replace_relid(rinfo->right_relids, ctx->from, ctx->to);
+   replace_varno(ctx->root, (Node *) rinfo->clause, 
ctx->from, ctx->to);
+   replace_varno(ctx->root, (Node *) rinfo->orclause, 
ctx->from, ctx->to);
+   rinfo->clause_relids =
+   
replace_relid(rinfo->clause_relids, ctx->from, ctx->to);
+   rinfo->left_relids =
+   
replace_relid(rinfo->left_relids, ctx->from, ctx->to);
+

Re: Removing unneeded self joins

2023-10-05 Thread Andrei Lepikhov

On 4/10/2023 14:34, Alexander Korotkov wrote:

Hi!

On Wed, Oct 4, 2023 at 9:56 AM Andrei Lepikhov 
mailto:a.lepik...@postgrespro.ru>> wrote:

 > On 4/10/2023 07:12, Alexander Korotkov wrote:
 > > On Tue, Sep 12, 2023 at 4:58 PM Andrey Lepikhov
 > > mailto:a.lepik...@postgrespro.ru>> wrote:
 > >> On 5/7/2023 21:28, Andrey Lepikhov wrote:
 > >>> During the significant code revision in v.41 I lost some replacement
 > >>> operations. Here is the fix and extra tests to check this in the 
future.

 > >>> Also, Tom added the JoinDomain structure five months ago, and I added
 > >>> code to replace relids for that place too.
 > >>> One more thing, I found out that we didn't replace SJs, defined by
 > >>> baserestrictinfos if no one self-join clause have existed for the 
join.

 > >>> Now, it is fixed, and the test has been added.
 > >>> To understand changes readily, see the delta file in the attachment.
 > >> Here is new patch in attachment. Rebased on current master and some
 > >> minor gaffes are fixed.
 > >
 > > I went through the thread and I think the patch gets better shape.  A
 > > couple of notes from my side.
 > > 1) Why replace_relid() makes a copy of lids only on insert/replace of
 > > a member, but performs deletion in-place?
 >
 > Shortly speaking, it was done according to the 'Paranoid' strategy.
 > The main reason for copying before deletion was the case with the rinfo
 > required_relids and clause_relids. They both point to the same Bitmapset
 > in some cases. And we feared such things for other fields.
 > Right now, it may be redundant because we resolved the issue mentioned
 > above in replace_varno_walker.

OK, but my point is still that you should be paranoid in all the cases 
or none of the cases.  Right now (newId < 0) branch doesn't copy source 
relids, but bms_is_member(oldId, relids) does copy.  Also, I think 
whether we copy or not should be reflected in the function comment.


/*
  * Substitute newId by oldId in relids.
  */
static Bitmapset *
replace_relid(Relids relids, int oldId, int newId)
{
     if (oldId < 0)
         return relids;

     if (newId < 0)
         /* Delete relid without substitution. */
         return bms_del_member(relids, oldId);

     if (bms_is_member(oldId, relids))
         return bms_add_member(bms_del_member(bms_copy(relids), oldId), 
newId);


     return relids;
}


We tried to use replace_relid() for both cases of JOIN deletion: 
unneeded outer join and self-join, and the relid deletion is used only 
in the first case. Such an approach was used there for a long time, and 
we just didn't change it.

I am prone to removing the copy operation in the code of relid replacement.



 > Relid replacement machinery is the most contradictory code here. We used
 > a utilitarian approach and implemented a simplistic variant.

 > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
 > > they are not necessary.  [1] points that infrastructure from [2] might
 > > be useful.  The patchset from [2] seems committed mow.  However, I
 > > can't see it is directly helpful in this matter.  Could we just skip
 > > adding IS NOT NULL clause for the columns, that have
 > > pg_attribute.attnotnull set?
 > Thanks for the links, I will look into that case.

Thanks for the curious issue.
The new field Var::varnullingrels introduced in [2] doesn't make sense 
here, as I see: we operate with plain relations only, and I don't know 
how it can be applied to an arbitrary subtree contained OUTER JOINs.
The second option, the attnotnull flag, can be used in this code. We 
haven't implemented it because the process_equivalence routine doesn't 
check the attnotnull before creating NullTest.
In general, it is not a difficult operation - we just need to add a 
trivial get_attnotnull() routine to lssycache.c likewise 
get_attgenerated() and other functions.
But, replace_varno uses the walker to change the relid. The mentioned 
replacement, like

X=X --> X IS NOT NULL
can be applied on different levels of the expression, look:
A a1 JOIN A a2 ON (a1.id=a2.id) WHERE (a1.x AND (a1.y=a2.y))
Here, we can replace id=id and y=y. It may need some 'unwanted clauses' 
collection procedure and a second pass through the expression tree to 
remove them. It may add some unpredictable overhead.
We can replace such a clause with a trivial 'TRUE' clause, of course. 
But is it the feature you have requested?


--
regards,
Andrey Lepikhov
Postgres Professional





Re: Removing unneeded self joins

2023-10-04 Thread Alexander Korotkov
Hi!

On Wed, Oct 4, 2023 at 9:56 AM Andrei Lepikhov 
wrote:
> On 4/10/2023 07:12, Alexander Korotkov wrote:
> > On Tue, Sep 12, 2023 at 4:58 PM Andrey Lepikhov
> >  wrote:
> >> On 5/7/2023 21:28, Andrey Lepikhov wrote:
> >>> During the significant code revision in v.41 I lost some replacement
> >>> operations. Here is the fix and extra tests to check this in the
future.
> >>> Also, Tom added the JoinDomain structure five months ago, and I added
> >>> code to replace relids for that place too.
> >>> One more thing, I found out that we didn't replace SJs, defined by
> >>> baserestrictinfos if no one self-join clause have existed for the
join.
> >>> Now, it is fixed, and the test has been added.
> >>> To understand changes readily, see the delta file in the attachment.
> >> Here is new patch in attachment. Rebased on current master and some
> >> minor gaffes are fixed.
> >
> > I went through the thread and I think the patch gets better shape.  A
> > couple of notes from my side.
> > 1) Why replace_relid() makes a copy of lids only on insert/replace of
> > a member, but performs deletion in-place?
>
> Shortly speaking, it was done according to the 'Paranoid' strategy.
> The main reason for copying before deletion was the case with the rinfo
> required_relids and clause_relids. They both point to the same Bitmapset
> in some cases. And we feared such things for other fields.
> Right now, it may be redundant because we resolved the issue mentioned
> above in replace_varno_walker.

OK, but my point is still that you should be paranoid in all the cases or
none of the cases.  Right now (newId < 0) branch doesn't copy source
relids, but bms_is_member(oldId, relids) does copy.  Also, I think whether
we copy or not should be reflected in the function comment.

/*
 * Substitute newId by oldId in relids.
 */
static Bitmapset *
replace_relid(Relids relids, int oldId, int newId)
{
if (oldId < 0)
return relids;

if (newId < 0)
/* Delete relid without substitution. */
return bms_del_member(relids, oldId);

if (bms_is_member(oldId, relids))
return bms_add_member(bms_del_member(bms_copy(relids), oldId),
newId);

return relids;
}

> Relid replacement machinery is the most contradictory code here. We used
> a utilitarian approach and implemented a simplistic variant.

> > 2) It would be nice to skip the insertion of IS NOT NULL checks when
> > they are not necessary.  [1] points that infrastructure from [2] might
> > be useful.  The patchset from [2] seems committed mow.  However, I
> > can't see it is directly helpful in this matter.  Could we just skip
> > adding IS NOT NULL clause for the columns, that have
> > pg_attribute.attnotnull set?
> Thanks for the links, I will look into that case.

OK, thank you.

--
Regards,
Alexander Korotkov


Re: Removing unneeded self joins

2023-10-04 Thread Andrei Lepikhov

On 4/10/2023 07:12, Alexander Korotkov wrote:

Hi!

Thanks for the review!


I think this is a neat optimization.

On Tue, Sep 12, 2023 at 4:58 PM Andrey Lepikhov
 wrote:

On 5/7/2023 21:28, Andrey Lepikhov wrote:

During the significant code revision in v.41 I lost some replacement
operations. Here is the fix and extra tests to check this in the future.
Also, Tom added the JoinDomain structure five months ago, and I added
code to replace relids for that place too.
One more thing, I found out that we didn't replace SJs, defined by
baserestrictinfos if no one self-join clause have existed for the join.
Now, it is fixed, and the test has been added.
To understand changes readily, see the delta file in the attachment.

Here is new patch in attachment. Rebased on current master and some
minor gaffes are fixed.


I went through the thread and I think the patch gets better shape.  A
couple of notes from my side.
1) Why replace_relid() makes a copy of lids only on insert/replace of
a member, but performs deletion in-place?


Shortly speaking, it was done according to the 'Paranoid' strategy.
The main reason for copying before deletion was the case with the rinfo 
required_relids and clause_relids. They both point to the same Bitmapset 
in some cases. And we feared such things for other fields.
Right now, it may be redundant because we resolved the issue mentioned 
above in replace_varno_walker.


Relid replacement machinery is the most contradictory code here. We used 
a utilitarian approach and implemented a simplistic variant.



2) It would be nice to skip the insertion of IS NOT NULL checks when
they are not necessary.  [1] points that infrastructure from [2] might
be useful.  The patchset from [2] seems committed mow.  However, I
can't see it is directly helpful in this matter.  Could we just skip
adding IS NOT NULL clause for the columns, that have
pg_attribute.attnotnull set?

Thanks for the links, I will look into that case.


Links
1. https://www.postgresql.org/message-id/2375492.jE0xQCEvom%40aivenronan
2.  https://www.postgresql.org/message-id/flat/830269.1656693747%40sss.pgh.pa.us


--
regards,
Andrey Lepikhov
Postgres Professional





Re: Removing unneeded self joins

2023-10-03 Thread Alexander Korotkov
Hi!

I think this is a neat optimization.

On Tue, Sep 12, 2023 at 4:58 PM Andrey Lepikhov
 wrote:
> On 5/7/2023 21:28, Andrey Lepikhov wrote:
> > During the significant code revision in v.41 I lost some replacement
> > operations. Here is the fix and extra tests to check this in the future.
> > Also, Tom added the JoinDomain structure five months ago, and I added
> > code to replace relids for that place too.
> > One more thing, I found out that we didn't replace SJs, defined by
> > baserestrictinfos if no one self-join clause have existed for the join.
> > Now, it is fixed, and the test has been added.
> > To understand changes readily, see the delta file in the attachment.
> Here is new patch in attachment. Rebased on current master and some
> minor gaffes are fixed.

I went through the thread and I think the patch gets better shape.  A
couple of notes from my side.
1) Why replace_relid() makes a copy of lids only on insert/replace of
a member, but performs deletion in-place?
2) It would be nice to skip the insertion of IS NOT NULL checks when
they are not necessary.  [1] points that infrastructure from [2] might
be useful.  The patchset from [2] seems committed mow.  However, I
can't see it is directly helpful in this matter.  Could we just skip
adding IS NOT NULL clause for the columns, that have
pg_attribute.attnotnull set?

Links
1. https://www.postgresql.org/message-id/2375492.jE0xQCEvom%40aivenronan
2.  https://www.postgresql.org/message-id/flat/830269.1656693747%40sss.pgh.pa.us


--
Regards,
Alexander Korotkov




Re: Removing unneeded self joins

2023-09-12 Thread Andrey Lepikhov

On 5/7/2023 21:28, Andrey Lepikhov wrote:

Hi,

During the significant code revision in v.41 I lost some replacement 
operations. Here is the fix and extra tests to check this in the future.
Also, Tom added the JoinDomain structure five months ago, and I added 
code to replace relids for that place too.
One more thing, I found out that we didn't replace SJs, defined by 
baserestrictinfos if no one self-join clause have existed for the join. 
Now, it is fixed, and the test has been added.

To understand changes readily, see the delta file in the attachment.
Here is new patch in attachment. Rebased on current master and some 
minor gaffes are fixed.


--
regards,
Andrey Lepikhov
Postgres Professional
From 70bb5cf3d11b2797f1a9c7b04740435135229d29 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 12 Sep 2023 18:25:51 +0700
Subject: [PATCH] Remove self-joins.

Self Join Elimination (SJE) feature removes an inner join of a plain table to
itself in the query tree if is proved that the join can be replaced with
a scan without impact to the query result.
Self join and inner relation are replaced with the outer in query, equivalence
classes and planner info structures. Also, inner restrictlist moves to the
outer one with removing duplicated clauses. Thus, this optimization reduces
length of range table list (especially it make sense for partitioned relations),
reduces number of restriction clauses === selectivity estimations and
potentially can improve total planner prediction for the query.

The SJE proof based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. Each matched inner row must be (physically) the same row as the outer one.

In this patch we use the next approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Add to the list above baseretrictinfo of inner table.
3. Check innerrel_is_unique() for the qual list. If it returns false, skip this
pair of joining tables.
4. Check uniqueness, proved by the baserestrictinfo clauses. To prove 
possibility
of the self-join elimination inner and outer clauses must have exact match.

Relation replacement procedure is not trivial and it is partly combined with 
the one,
used to remove useless left joins.

Tests, covering this feature, added to the join.sql.
Some regression tests changed due to self-join removal logic.
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   38 +
 src/backend/optimizer/plan/analyzejoins.c | 1094 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc_tables.c   |   22 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |7 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  824 -
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/expected/updatable_views.out |   17 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  360 ++
 src/tools/pgindent/typedefs.list  |2 +
 14 files changed, 2375 insertions(+), 64 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6bc1b215db..43c07b0d3b 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5299,6 +5299,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration 
parameter
+  
+  
+  
+   
+   Enables or disables the query planner's optimization which analyses
+query tree and replaces self joins with semantically equivalent single
+scans. Take into consideration only plain tables.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index 6a93d767a5..508285d1ef 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3494,6 +3494,21 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
  List *restrictlist,
  List *exprlist, List 
*oprlist)
+{
+   return relation_has_unique_index_ext(root, rel, restrictlist,
+   
 exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+

Re: Removing unneeded self joins

2023-07-05 Thread Andrey Lepikhov

Hi,

During the significant code revision in v.41 I lost some replacement 
operations. Here is the fix and extra tests to check this in the future.
Also, Tom added the JoinDomain structure five months ago, and I added 
code to replace relids for that place too.
One more thing, I found out that we didn't replace SJs, defined by 
baserestrictinfos if no one self-join clause have existed for the join. 
Now, it is fixed, and the test has been added.

To understand changes readily, see the delta file in the attachment.

--
regards,
Andrey Lepikhov
Postgres Professional
From 8f5a432f6fbbcad1fd2937f33af09e9328690b6b Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Tue, 4 Jul 2023 16:07:50 +0700
Subject: [PATCH] Add lost arrangements of relids and varnos. Add the test to
 check it. Add one more cleaning procedure on JoinDomain relids which was
 introduced recently with commit 3bef56e. Fix the corner case when we haven't
 removed SJ if the selfjoinquals list was empty.

---
 src/backend/optimizer/plan/analyzejoins.c | 15 ++-
 src/test/regress/expected/join.out| 26 ---
 src/test/regress/expected/updatable_views.out | 17 +---
 src/test/regress/sql/join.sql |  9 +++
 4 files changed, 53 insertions(+), 14 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c 
b/src/backend/optimizer/plan/analyzejoins.c
index a93e4ce05c..15234b7a3b 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -424,6 +424,8 @@ remove_rel_from_query_common(PlannerInfo *root, RelOptInfo 
*rel,
sjinf->commute_above_r = replace_relid(sjinf->commute_above_r, 
ojrelid, subst);
sjinf->commute_below_l = replace_relid(sjinf->commute_below_l, 
ojrelid, subst);
sjinf->commute_below_r = replace_relid(sjinf->commute_below_r, 
ojrelid, subst);
+
+   replace_varno((Node *) sjinf->semi_rhs_exprs, relid, subst);
}
 
/*
@@ -465,6 +467,8 @@ remove_rel_from_query_common(PlannerInfo *root, RelOptInfo 
*rel,
/* ph_needed might or might not become empty */
phv->phrels = replace_relid(phv->phrels, relid, subst);
phv->phrels = replace_relid(phv->phrels, ojrelid, 
subst);
+   phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, 
relid, subst);
+   phinfo->ph_var->phrels = 
replace_relid(phinfo->ph_var->phrels, relid, subst);
Assert(!bms_is_empty(phv->phrels));
Assert(phv->phnullingrels == NULL); /* no need to 
adjust */
}
@@ -1545,6 +1549,7 @@ update_eclass(EquivalenceClass *ec, int from, int to)
}
 
em->em_relids = replace_relid(em->em_relids, from, to);
+   em->em_jdomain->jd_relids = 
replace_relid(em->em_jdomain->jd_relids, from, to);
 
/* We only process inner joins */
replace_varno((Node *) em->em_expr, from, to);
@@ -2101,7 +2106,7 @@ remove_self_joins_one_group(PlannerInfo *root, Relids 
relids)
 */
restrictlist = generate_join_implied_equalities(root, 
joinrelids,

inner->relids,
-   
outer, 0);
+   
outer, NULL);
 
/*
 * Process restrictlist to seperate the self join quals 
out of
@@ -2111,6 +2116,14 @@ remove_self_joins_one_group(PlannerInfo *root, Relids 
relids)
split_selfjoin_quals(root, restrictlist, ,
 
, inner->relid, outer->relid);
 
+   /*
+* To enable SJE for the only degenerate case without 
any self join
+* clauses at all, add baserestrictinfo to this list.
+* Degenerate case works only if both sides have the 
same clause. So
+* doesn't matter which side to add.
+*/
+   selfjoinquals = list_concat(selfjoinquals, 
outer->baserestrictinfo);
+
/*
 * Determine if the inner table can duplicate outer 
rows.  We must
 * bypass the unique rel cache here since we're 
possibly using a
diff --git a/src/test/regress/expected/join.out 
b/src/test/regress/expected/join.out
index b1f43f6ff8..027c356bcc 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5807,11 +5807,13 @@ explain (costs off)
 select 

Re: Removing unneeded self joins

2023-06-30 Thread Andrey Lepikhov

On 4/4/2023 02:30, Gregory Stark (as CFM) wrote:

On Mon, 6 Mar 2023 at 00:30, Michał Kłeczek  wrote:


Hi All,

I just wanted to ask about the status and plans for this patch.
I can see it being stuck at “Waiting for Author” status in several commit tests.


Sadly it seems to now be badly in need of a rebase. There are large
hunks failing in the guts of analyzejoins.c as well as minor failures
elsewhere and lots of offsets which need to be reviewed.

I think given the lack of activity it's out of time for this release
at this point. I'm moving it ahead to the next CF.

Hi,

Version 41 is heavily remade of the feature:

1. In previous versions, I tried to reuse remove_rel_from_query() for 
both left and self-join removal. But for now, I realized that it is a 
bit different procedures which treat different operations. In this 
patch, only common stages of the PlannerInfo fixing process are united 
in one function.
2. Transferring clauses from the removing table to keeping one is more 
transparent now and contains comments.
3. Equivalence classes update procedure was changed according to David's 
commit 3373c71. As I see, Tom has added remove_rel_from_eclass since the 
last v.40 version, and it looks pretty similar to the update_eclass 
routine in this patch.


It passes regression tests, but some questions are still open:
1. Should we look for duplicated or redundant clauses (the same for 
eclasses) during the clause transfer procedure? On the one side, we 
reduce the length of restrict lists that can impact planning or 
executing time. Additionally, we improve the accuracy of cardinality 
estimation. On the other side, it is one more place that can make 
planning time much longer in specific cases. It would have been better 
to avoid calling the equal() function here, but it's the only way to 
detect duplicated inequality expressions.
2. Could we reuse ChangeVarNodes instead of sje_walker(), merge 
remove_rel_from_restrictinfo with replace_varno?
3. Also, I still don't finish with the split_selfjoin_quals: some 
improvements could be made.


--
regards,
Andrey Lepikhov
Postgres Professional
From 4a342b9789f5be209318c13fb7ec336fcbd2aee5 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Mon, 15 May 2023 09:04:51 +0500
Subject: [PATCH] Remove self-joins.

A Self Join Elimination (SJE) feature removes inner join of plain table to 
itself
in a query tree if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use the next approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation is deduced from baserestrictinfo clause like
'x=const' on unique field(s), check what the inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some regression tests changed due to self-join removal logic.
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   38 +
 src/backend/optimizer/plan/analyzejoins.c | 1077 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc_tables.c   |   22 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |7 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  798 +++
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  351 +++
 src/tools/pgindent/typedefs.list  |2 +
 13 files changed, 2319 insertions(+), 51 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 6262cb7bb2..68215e1093 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5419,6 +5419,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration 
parameter
+  
+  
+  
+   
+   Enables or disables the query planner's optimization which analyses
+query tree and replaces self joins with semantically equivalent single
+scans. Take into consideration only plain tables.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index 0065c8992b..57bdc6811f 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3491,6 +3491,21 

Re: Removing unneeded self joins

2023-05-25 Thread Andrey Lepikhov

On 3/6/23 10:30, Michał Kłeczek wrote:
> Hi All,
>
> I just wanted to ask about the status and plans for this patch.
> I can see it being stuck at “Waiting for Author” status in several
> commit tests.
>
> I think this patch would be really beneficial for us as we heavily use
> views to structure out code.
> Each view is responsible for providing some calculated values and 
they > are joined in a query to retrieve the full set of information.

>
> Not sure how the process works and how I could help (I am absolutely
> not capable of helping with coding I am afraid - but could sponsor a
> (small :) ) bounty to speed things up).

Yes, I am still working on this feature. Because of significant changes 
in the optimizer code which Tom & Richard had been doing last months, I 
didn't touch it for a while. But now this work can be continued.


Current patch is rebased on current master. Because of the nullable_rels 
logic, introduced recently, ojrelids were highly spreaded across planner 
bitmapsets. So, JE logic was changed.


But now, I'm less happy with the code. It seems we need to refactor it:
1. According to reports of some performance engineers, the feature can 
cause overhead ~0.5% on trivial queries without joins at all. We should 
discover the patch and find the way for quick and cheap return, if the 
statement contains no one join or, maybe stronger, no one self join.
2. During join elimination we replace clauses like 'x=x' with 'x IS NOT 
NULL'. It is a weak point because we change clause semantic 
(mergejoinable to non-mergejoinable, in this example) and could forget 
consistently change some RestrictInfo fields.
3. In the previous versions we changed the remove_rel_from_query routine 
trying to use it in both 'Useless outer join' and 'Self join' 
elimination optimizations. Now, because of the 'ojrelid' field it looks 
too complicated. Do we need to split this routine again?


--
Regards
Andrey Lepikhov
Postgres Professional
From cb4340577dab0e8cf5531e9934f5734fda178490 Mon Sep 17 00:00:00 2001
From: Andrey Lepikhov 
Date: Mon, 15 May 2023 09:04:51 +0500
Subject: [PATCH] Remove self-joins.

A Self Join Elimination (SJE) feature removes inner join of plain table to itself
in a query tree if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clause.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use the next approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation is deduced from baserestrictinfo clause like
'x=const' on unique field(s), check what the inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some regression tests changed due to self-join removal logic.
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   38 +
 src/backend/optimizer/plan/analyzejoins.c | 1093 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc_tables.c   |   22 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |7 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  774 +++
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  340 +++
 src/tools/pgindent/typedefs.list  |2 +
 13 files changed, 2313 insertions(+), 38 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5da74b3c40..b47d11759e 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5437,6 +5437,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration parameter
+  
+  
+  
+   
+   Enables or disables the query planner's optimization which analyses
+query tree and replaces self joins with semantically equivalent single
+scans. Take into consideration only plain tables.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 1436dbc2f2..1b1aa9083c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3491,6 +3491,21 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 			  List *restrictlist,
 			  List *exprlist, List *oprlist)
+{
+	return 

Re: Removing unneeded self joins

2023-04-03 Thread Gregory Stark (as CFM)
On Mon, 6 Mar 2023 at 00:30, Michał Kłeczek  wrote:
>
> Hi All,
>
> I just wanted to ask about the status and plans for this patch.
> I can see it being stuck at “Waiting for Author” status in several commit 
> tests.

Sadly it seems to now be badly in need of a rebase. There are large
hunks failing in the guts of analyzejoins.c as well as minor failures
elsewhere and lots of offsets which need to be reviewed.

I think given the lack of activity it's out of time for this release
at this point. I'm moving it ahead to the next CF.



--
Gregory Stark
As Commitfest Manager




Re: Removing unneeded self joins

2023-03-05 Thread Michał Kłeczek
Hi All,

I just wanted to ask about the status and plans for this patch.
I can see it being stuck at “Waiting for Author” status in several commit tests.

I think this patch would be really beneficial for us as we heavily use views to 
structure out code.
Each view is responsible for providing some calculated values and they are 
joined in a query to retrieve the full set of information.

Not sure how the process works and how I could help (I am absolutely not 
capable of helping with coding I am afraid - but could sponsor a (small :) ) 
bounty to speed things up).

Thanks,
Michal

> On 16 Dec 2022, at 07:45, Andrey Lepikhov  wrote:
> 
> On 12/6/22 23:46, Andres Freund wrote:
>> This doesn't pass the main regression tests due to a plan difference.
>> https://cirrus-ci.com/task/5537518245380096
>> https://api.cirrus-ci.com/v1/artifact/task/5537518245380096/testrun/build/testrun/regress/regress/regression.diffs
>> diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join.out 
>> /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out
>> --- /tmp/cirrus-ci-build/src/test/regress/expected/join.out  2022-12-05 
>> 19:11:52.453920838 +
>> +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out  
>> 2022-12-05 19:15:21.864183651 +
>> @@ -5806,7 +5806,7 @@
>>   Nested Loop
>> Join Filter: (sj_t3.id = sj_t1.id)
>> ->  Nested Loop
>> - Join Filter: (sj_t3.id = sj_t2.id)
>> + Join Filter: (sj_t2.id = sj_t3.id)
>>   ->  Nested Loop Semi Join
>> ->  Nested Loop
>>   ->  HashAggregate
> This change in the test behaviour is induced by the a5fc4641
> "Avoid making commutatively-duplicate clauses in EquivalenceClasses."
> Nothing special, as I see. Attached patch fixes this.
> 
> -- 
> Regards
> Andrey Lepikhov
> Postgres Professional
> 





Re: Removing unneeded self joins

2022-12-15 Thread Andrey Lepikhov

On 12/6/22 23:46, Andres Freund wrote:

This doesn't pass the main regression tests due to a plan difference.
https://cirrus-ci.com/task/5537518245380096
https://api.cirrus-ci.com/v1/artifact/task/5537518245380096/testrun/build/testrun/regress/regress/regression.diffs

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join.out 
/tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/join.out 2022-12-05 
19:11:52.453920838 +
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out 
2022-12-05 19:15:21.864183651 +
@@ -5806,7 +5806,7 @@
   Nested Loop
 Join Filter: (sj_t3.id = sj_t1.id)
 ->  Nested Loop
- Join Filter: (sj_t3.id = sj_t2.id)
+ Join Filter: (sj_t2.id = sj_t3.id)
   ->  Nested Loop Semi Join
 ->  Nested Loop
   ->  HashAggregate

This change in the test behaviour is induced by the a5fc4641
"Avoid making commutatively-duplicate clauses in EquivalenceClasses."
Nothing special, as I see. Attached patch fixes this.

--
Regards
Andrey Lepikhov
Postgres Professional
From 3e546637561bf4c6d195bc7c95b1e05263e797e2 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 5 Oct 2022 16:58:34 +0500
Subject: [PATCH] Remove self-joins.

A Self Join Removal (SJR) feature removes inner join of plain table to itself
in a query plan if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation deduces from baserestrictinfo clause like
'x=const' on unique field(s), check that inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some regression tests changed due to self-join removal logic.

[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   38 +
 src/backend/optimizer/plan/analyzejoins.c | 1046 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc_tables.c   |   22 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |7 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  774 +++
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  340 +++
 src/tools/pgindent/typedefs.list  |2 +
 13 files changed, 2278 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8e4145979d..2f9948d5f8 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5311,6 +5311,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration parameter
+  
+  
+  
+   
+Enables or disables the query planner's optimization which analyses
+query tree and replaces self joins with semantically equivalent single
+scans. Take into consideration only plain tables.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 914bfd90bc..8d57c68b1f 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3494,6 +3494,21 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 			  List *restrictlist,
 			  List *exprlist, List *oprlist)
+{
+	return relation_has_unique_index_ext(root, rel, restrictlist,
+		 exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+			  List *restrictlist,
+			  List *exprlist, List *oprlist,
+			  List **extra_clauses)
 {
 	ListCell   *ic;
 
@@ -3549,6 +3564,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
+		List	   *exprs = NIL;
 
 		/*
 		 * If the index is not unique, or not immediately enforced, or if 

Re: Removing unneeded self joins

2022-12-06 Thread Andres Freund
Hi,

On 2022-10-05 17:25:18 +0500, Andrey Lepikhov wrote:
> New version, rebased onto current master.
> Nothing special, just rebase.

This doesn't pass the main regression tests due to a plan difference.
https://cirrus-ci.com/task/5537518245380096
https://api.cirrus-ci.com/v1/artifact/task/5537518245380096/testrun/build/testrun/regress/regress/regression.diffs

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/join.out 
/tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/join.out 2022-12-05 
19:11:52.453920838 +
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/join.out 
2022-12-05 19:15:21.864183651 +
@@ -5806,7 +5806,7 @@
  Nested Loop
Join Filter: (sj_t3.id = sj_t1.id)
->  Nested Loop
- Join Filter: (sj_t3.id = sj_t2.id)
+ Join Filter: (sj_t2.id = sj_t3.id)
  ->  Nested Loop Semi Join
->  Nested Loop
  ->  HashAggregate

Greetings,

Andres Freund




Re: Removing unneeded self joins

2022-10-05 Thread Andrey Lepikhov

New version, rebased onto current master.
Nothing special, just rebase.

--
regards,
Andrey Lepikhov
Postgres Professional
From 03aab7a2431032166c9ea5f52fbcccaf7168abec Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 5 Oct 2022 16:58:34 +0500
Subject: [PATCH] Remove self-joins.

A Self Join Removal (SJR) feature removes inner join of plain table to itself
in a query plan if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation deduces from baserestrictinfo clause like
'x=const' on unique field(s), check that inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some regression tests changed due to self-join removal logic.

[1] 
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   38 +
 src/backend/optimizer/plan/analyzejoins.c | 1046 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc_tables.c   |   22 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |7 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  774 +++
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  340 +++
 src/tools/pgindent/typedefs.list  |2 +
 13 files changed, 2278 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d750290f13..5ce2d4d2fa 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5290,6 +5290,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration 
parameter
+  
+  
+  
+   
+Enables or disables the query planner's optimization which analyses
+query tree and replaces self joins with semantically equivalent single
+scans. Take into consideration only plain tables.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index c31fcc917d..51f672a65c 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3513,6 +3513,21 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
  List *restrictlist,
  List *exprlist, List 
*oprlist)
+{
+   return relation_has_unique_index_ext(root, rel, restrictlist,
+   
 exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List 
*oprlist,
+ List **extra_clauses)
 {
ListCell   *ic;
 
@@ -3568,6 +3583,7 @@ relation_has_unique_index_for(PlannerInfo *root, 
RelOptInfo *rel,
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
+   List   *exprs = NIL;
 
/*
 * If the index is not unique, or not immediately enforced, or 
if it's
@@ -3616,6 +3632,24 @@ relation_has_unique_index_for(PlannerInfo *root, 
RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is unique */
+
+   if 
(bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
+   {
+   MemoryContext oldMemCtx =
+   

Re: Removing unneeded self joins

2022-08-28 Thread Andrey Lepikhov

On 8/29/22 04:39, Zhihong Yu wrote:



On Fri, Aug 26, 2022 at 3:02 PM Zhihong Yu > wrote:


Hi,
For v36-0001-Remove-self-joins.patch :

bq removes inner join of plane table to itself

plane table -> plain table

For relation_has_unique_index_ext(), it seems when extra_clauses
is NULL, there is no need to compute `exprs`.

Cheers

Done



For remove_self_joins_recurse():

+                   if (bms_num_members(relids) > join_collapse_limit)
+                       break;

The above line just comes out of the switch statement. This check should 
be done again between foreach and switch.

Otherwise the above check wouldn't achieve what you want.

Cheers

Thanks for highlighting the problem.
I guess, usage either of join_collapse_limit or from_collapse_limit 
isn't practical here.
That we really afraid here - many senseless search cycles of self-joins. 
And it may have higher limit than GUCs above. So I introduced a guc, 
called "self_join_search_limit" (so far undocumented) that is an 
explicit limit for a set of plain relations in FROM-list to search 
self-joins.


--
Regards
Andrey Lepikhov
Postgres ProfessionalFrom 6283d6e21214e34d3c1a6351fa9f6ac1aeb75ce8 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 26 Aug 2022 15:17:53 +0300
Subject: [PATCH] Remove self-joins.

A Self Join Removal (SJR) feature removes inner join of plain table to itself
in a query plan if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation deduces from baserestrictinfo clause like
'x=const' on unique field(s), check that inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some regression tests changed due to self-join removal logic.

[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   38 +
 src/backend/optimizer/plan/analyzejoins.c | 1046 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc.c  |   22 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |7 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  774 +++
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  340 +++
 src/tools/pgindent/typedefs.list  |2 +
 13 files changed, 2278 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a5cd4e44c7..2cfb62f97f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5297,6 +5297,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration parameter
+  
+  
+  
+   
+Enables or disables the query planner's optimization which analyses
+query tree and replaces self joins with semantically equivalent single
+scans. Take into consideration only plain tables.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 045ff2e487..c41e7256be 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3495,6 +3495,21 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 			  List *restrictlist,
 			  List *exprlist, List *oprlist)
+{
+	return relation_has_unique_index_ext(root, rel, restrictlist,
+		 exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+			  List *restrictlist,
+			  List *exprlist, List *oprlist,
+			  List **extra_clauses)
 {
 	ListCell   *ic;
 
@@ -3550,6 +3565,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
 	{
 		IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
 		int			c;
+		List	   *exprs = NIL;
 
 		/*
 		 * If the index is not unique, 

Re: Removing unneeded self joins

2022-08-28 Thread Zhihong Yu
On Fri, Aug 26, 2022 at 3:02 PM Zhihong Yu  wrote:

> Hi,
> For v36-0001-Remove-self-joins.patch :
>
> bq removes inner join of plane table to itself
>
> plane table -> plain table
>
> For relation_has_unique_index_ext(), it seems when extra_clauses is NULL,
> there is no need to compute `exprs`.
>
> Cheers
>

For remove_self_joins_recurse():

+   if (bms_num_members(relids) > join_collapse_limit)
+   break;

The above line just comes out of the switch statement. This check should be
done again between foreach and switch.
Otherwise the above check wouldn't achieve what you want.

Cheers


Re: Removing unneeded self joins

2022-08-26 Thread Zhihong Yu
Hi,
For v36-0001-Remove-self-joins.patch :

bq removes inner join of plane table to itself

plane table -> plain table

For relation_has_unique_index_ext(), it seems when extra_clauses is NULL,
there is no need to compute `exprs`.

Cheers

>


Re: Removing unneeded self joins

2022-08-26 Thread Andrey Lepikhov

On 30/6/2022 17:11, Andrey Lepikhov wrote:

On 19/5/2022 16:47, Ronan Dunklau wrote:

I'll take a look at that one.

New version of the patch, rebased on current master:
1. pgindent over the patch have passed.
2. number of changed files is reduced.
3. Some documentation and comments is added.

New version rebased on new master, minor changes and tests added.

--
regards,
Andrey Lepikhov
Postgres ProfessionalFrom 8d864515da68728ddee10d455f8bdb64d34277aa Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 26 Aug 2022 15:17:53 +0300
Subject: [PATCH] Remove self-joins.

A Self Join Removal (SJR) feature removes inner join of plane table to itself
in a query plan if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation deduces from baserestrictinfo clause like
'x=const' on unique field(s), check that inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some regression tests changed due to self-join removal logic.

[1] 
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   39 +
 src/backend/optimizer/plan/analyzejoins.c | 1045 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc.c  |   10 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |6 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  735 +++
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  329 +++
 src/tools/pgindent/typedefs.list  |2 +
 13 files changed, 2215 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a5cd4e44c7..2cfb62f97f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5297,6 +5297,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration 
parameter
+  
+  
+  
+   
+Enables or disables the query planner's optimization which analyses
+query tree and replaces self joins with semantically equivalent single
+scans. Take into consideration only plain tables.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index 045ff2e487..a9f8f89312 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3495,8 +3495,24 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
  List *restrictlist,
  List *exprlist, List 
*oprlist)
+{
+   return relation_has_unique_index_ext(root, rel, restrictlist,
+   
 exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List 
*oprlist,
+ List **extra_clauses)
 {
ListCell   *ic;
+   List   *exprs;
 
Assert(list_length(exprlist) == list_length(oprlist));
 
@@ -3551,6 +3567,8 @@ relation_has_unique_index_for(PlannerInfo *root, 
RelOptInfo *rel,
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
 
+   exprs = NIL;
+
/*
 * If the index is not unique, or not immediately enforced, or 
if it's
 * a partial index that doesn't match the query, it's useless 
here.
@@ -3598,6 +3616,23 @@ relation_has_unique_index_for(PlannerInfo *root, 
RelOptInfo *rel,
if (match_index_to_operand(rexpr, 

Re: Removing unneeded self joins

2022-07-04 Thread Zhihong Yu
On Mon, Jul 4, 2022 at 6:52 AM Ronan Dunklau  wrote:

> Le jeudi 30 juin 2022, 16:11:51 CEST Andrey Lepikhov a écrit :
> > On 19/5/2022 16:47, Ronan Dunklau wrote:
> > > I'll take a look at that one.
> >
> > New version of the patch, rebased on current master:
> > 1. pgindent over the patch have passed.
> > 2. number of changed files is reduced.
> > 3. Some documentation and comments is added.
>
> Hello Andrey,
>
> Thanks for the updates.
>
> The general approach seems sensible to me, so I'm going to focus on some
> details.
>
> In a very recent thread [1], Tom Lane is proposing to add infrastructure
> to make Var aware of their nullability by outer joins. I wonder if that
> would help with avoiding the need for adding is not null clauses when the
> column is known not null ?
> If we have a precedent for adding a BitmapSet to the Var itself, maybe the
> whole discussion regarding keeping track of nullability can be extended to
> the original column nullability ?
>
> Also, I saw it was mentioned earlier in the thread but how difficult would
> it be to process the transformed quals through the EquivalenceClass
> machinery and the qual simplification ?
> For example, if the target audience of this patch is ORM, or inlined
> views, it wouldn't surprise me to see queries of this kind in the wild,
> which could be avoided altogether:
>
> postgres=# explain (costs off) select * from sj s1 join sj s2 on s1.a =
> s2.a where s1.b = 2 and s2.b =3;
>  QUERY PLAN
> -
>  Seq Scan on sj s2
>Filter: ((a IS NOT NULL) AND (b = 3) AND (b = 2))
> (2 lignes)
>
>
> +   for (counter = 0; counter < list_length(*sources);)
> +   {
> +   ListCell   *cell = list_nth_cell(*sources, counter);
> +   RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
> +   int counter1;
> +
> 
> + ec->ec_members = list_delete_cell(ec->ec_members, cell);
>
>
> Why don't you use foreach() and foreach_delete_current macros for
> iterating and removing items in the lists, both in update_ec_members and
> update_ec_sources ?
>
>
> +   if ((bms_is_member(k, info->syn_lefthand) ^
> +bms_is_member(r,
> info->syn_lefthand)) ||
> +   (bms_is_member(k,
> info->syn_righthand) ^
> +bms_is_member(r,
> info->syn_righthand)))
>
> I think this is more compact and easier to follow than the previous
> version, but I'm not sure how common it is in postgres source code to use
> that kind of construction ?
>
> Some review about the comments:
>
>
> I see you keep using the terms "the keeping relation" and "the removing
> relation" in reference to the relation that is kept and the one that is
> removed.
> Aside from the grammar (the kept relation or the removed relation), maybe
> it would make it clearer to call them something else. In other parts of the
> code, you used "the remaining relation / the removed relation" which makes
> sense.
>
>  /*
>   * Remove the target relid from the planner's data structures, having
> - * determined that there is no need to include it in the query.
> + * determined that there is no need to include it in the query. Or replace
> + * with another relid.
> + * To reusability, this routine can work in two modes: delete relid from
> a plan
> + * or replace it. It is used in replace mode in a self-join removing
> process.
>
> This could be rephrased: ", optionally replacing it with another relid.
> The latter is used by the self-join removing process."
>
>
> [1]
> https://www.postgresql.org/message-id/flat/830269.1656693747%40sss.pgh.pa.us
>
> --
> Ronan Dunklau
>
>
> Hi,
bq. this is more compact and easier to follow than the previous version

A code comment can be added above the expression (involving XOR) to explain
the purpose of the expression.

Cheers


Re: Removing unneeded self joins

2022-07-04 Thread Ronan Dunklau
Le jeudi 30 juin 2022, 16:11:51 CEST Andrey Lepikhov a écrit :
> On 19/5/2022 16:47, Ronan Dunklau wrote:
> > I'll take a look at that one.
> 
> New version of the patch, rebased on current master:
> 1. pgindent over the patch have passed.
> 2. number of changed files is reduced.
> 3. Some documentation and comments is added.

Hello Andrey,

Thanks for the updates. 

The general approach seems sensible to me, so I'm going to focus on some 
details. 

In a very recent thread [1], Tom Lane is proposing to add infrastructure to 
make Var aware of their nullability by outer joins. I wonder if that would help 
with avoiding the need for adding is not null clauses when the column is known 
not null ?
If we have a precedent for adding a BitmapSet to the Var itself, maybe the 
whole discussion regarding keeping track of nullability can be extended to the 
original column nullability ?

Also, I saw it was mentioned earlier in the thread but how difficult would it 
be to process the transformed quals through the EquivalenceClass machinery and 
the qual simplification ? 
For example, if the target audience of this patch is ORM, or inlined views, it 
wouldn't surprise me to see queries of this kind in the wild, which could be 
avoided altogether:

postgres=# explain (costs off) select * from sj s1 join sj s2 on s1.a = s2.a 
where s1.b = 2 and s2.b =3;
 QUERY PLAN  
-
 Seq Scan on sj s2
   Filter: ((a IS NOT NULL) AND (b = 3) AND (b = 2))
(2 lignes)


+   for (counter = 0; counter < list_length(*sources);)
+   {
+   ListCell   *cell = list_nth_cell(*sources, counter);
+   RestrictInfo *rinfo = castNode(RestrictInfo, lfirst(cell));
+   int counter1;
+

+ ec->ec_members = list_delete_cell(ec->ec_members, cell);


Why don't you use foreach() and foreach_delete_current macros for iterating and 
removing items in the lists, both in update_ec_members and update_ec_sources ?


+   if ((bms_is_member(k, info->syn_lefthand) ^
+bms_is_member(r, info->syn_lefthand)) 
||
+   (bms_is_member(k, info->syn_righthand) ^
+bms_is_member(r, info->syn_righthand)))

I think this is more compact and easier to follow than the previous version, 
but I'm not sure how common it is in postgres source code to use that kind of 
construction ?

Some review about the comments:


I see you keep using the terms "the keeping relation" and "the removing 
relation" in reference to the relation that is kept and the one that is removed.
Aside from the grammar (the kept relation or the removed relation), maybe it 
would make it clearer to call them something else. In other parts of the code, 
you used "the remaining relation / the removed relation" which makes sense.

 /*
  * Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.

This could be rephrased: ", optionally replacing it with another relid. The 
latter is used by the self-join removing process."


[1] https://www.postgresql.org/message-id/flat/830269.1656693747%40sss.pgh.pa.us

-- 
Ronan Dunklau






Re: Removing unneeded self joins

2022-06-30 Thread Andrey Lepikhov

On 19/5/2022 16:47, Ronan Dunklau wrote:

I'll take a look at that one.

New version of the patch, rebased on current master:
1. pgindent over the patch have passed.
2. number of changed files is reduced.
3. Some documentation and comments is added.

--
regards,
Andrey Lepikhov
Postgres ProfessionalFrom 9ce71f1d0ffefa9d77edfa30fc189bc0425ebbbe Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.

A Self Join Removal (SJR) feature removes inner join of plane table to itself
in a query plan if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation deduces from baserestrictinfo clause like
'x=const' on unique field(s), check that inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some regression tests changed due to self-join removal logic.

[1] 
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 doc/src/sgml/config.sgml  |   16 +
 src/backend/optimizer/path/indxpath.c |   39 +
 src/backend/optimizer/plan/analyzejoins.c | 1045 -
 src/backend/optimizer/plan/planmain.c |5 +
 src/backend/utils/misc/guc.c  |   10 +
 src/include/optimizer/paths.h |3 +
 src/include/optimizer/planmain.h  |6 +
 src/test/regress/expected/equivclass.out  |   32 +
 src/test/regress/expected/join.out|  686 ++
 src/test/regress/expected/sysviews.out|3 +-
 src/test/regress/sql/equivclass.sql   |   16 +
 src/test/regress/sql/join.sql |  305 ++
 src/tools/pgindent/typedefs.list  |2 +
 13 files changed, 2142 insertions(+), 26 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 48478b1024..2226117c62 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5287,6 +5287,22 @@ ANY num_sync ( 
+  enable_self_join_removal (boolean)
+  
+   enable_self_join_removal configuration 
parameter
+  
+  
+  
+   
+Enables or disables the query planner's optimization which analyses
+query tree and replaces self joins with semantically equivalent single
+scans. Take into consideration only plain tables.
+The default is on.
+   
+  
+ 
+
  
   enable_seqscan (boolean)
   
diff --git a/src/backend/optimizer/path/indxpath.c 
b/src/backend/optimizer/path/indxpath.c
index 0ef70ad7f1..2eb05c79ce 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3498,8 +3498,24 @@ bool
 relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
  List *restrictlist,
  List *exprlist, List 
*oprlist)
+{
+   return relation_has_unique_index_ext(root, rel, restrictlist,
+   
 exprlist, oprlist, NULL);
+}
+
+/*
+ * relation_has_unique_index_ext
+ * if extra_clauses isn't NULL, return baserestrictinfo clauses which were
+ * used to derive uniqueness.
+ */
+bool
+relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
+ List *restrictlist,
+ List *exprlist, List 
*oprlist,
+ List **extra_clauses)
 {
ListCell   *ic;
+   List   *exprs;
 
Assert(list_length(exprlist) == list_length(oprlist));
 
@@ -3554,6 +3570,8 @@ relation_has_unique_index_for(PlannerInfo *root, 
RelOptInfo *rel,
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
int c;
 
+   exprs = NIL;
+
/*
 * If the index is not unique, or not immediately enforced, or 
if it's
 * a partial index that doesn't match the query, it's useless 
here.
@@ -3601,6 +3619,23 @@ relation_has_unique_index_for(PlannerInfo *root, 
RelOptInfo *rel,
if (match_index_to_operand(rexpr, c, ind))
{
matched = true; /* column is 

Re: Removing unneeded self joins

2022-05-19 Thread Ronan Dunklau
Le jeudi 19 mai 2022, 12:48:18 CEST Andrey Lepikhov a écrit :
> On 5/17/22 19:14, Ronan Dunklau wrote:
> > Le vendredi 13 mai 2022, 07:07:47 CEST Andrey Lepikhov a écrit :
> >> New version of the feature.
> >> Here a minor bug with RowMarks is fixed. A degenerated case is fixed,
> >> when uniqueness of an inner deduced not from join quals, but from a
> >> baserestrictinfo clauses 'x=const', where x - unique field.
> >> Code, dedicated to solve second issue is controversial, so i attached
> >> delta.txt for quick observation.
> >> Maybe we should return to previous version of code, when we didn't split
> >> restriction list into join quals and base quals?
> > 
> > Hello,
> > 
> > I tried to find problematic cases, which would make the planning time grow
> > unacceptably, and couldn't devise it.
> > 
> > The worst case scenario I could think of was as follows:
> >   - a query with many different self joins
> >   - an abundance of unique indexes on combinations of this table columns
> >   to
> > 
> > consider
> > 
> >   - additional predicates on the where clause on columns.
> 
> Looking into the patch I can imagine, that the most difficult case is
> when a set of relations with the same OID is huge, but only small part
> of them (or nothing) can be removed.
> Also, removing a clause from restrictinfo list or from equivalence class
> adds non-linear complexity. So, you can dig this way ).
> 
> > The base table I used for this was a table with 40 integers. 39 unique
> > indexes were defined on every combination of (c1, cX) with cX being
> > columns c2 to c40.
> > 
> > I turned geqo off, set from_collapse_limit and join_collapse_limit to
> > unreasonably high values (30), and tried to run queries of the form:
> > 
> > SELECT * FROM test_table t1
> > JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX
> > ...
> > JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX.
> > 
> > So no self join can be eliminated in that case.
> 
> I think, you should compare t1.cX with tX.cX to eliminate self-join.
> Cross-unique-index proof isn't supported now.

Yes, that's the point. I wanted to try to introduce as much complexity as I 
could, without actually performing any self join elimination. The idea was to 
try to come up with the worst case scenario.
> 
> > The performance was very similar with or without the GUC enabled. I tested
> > the same thing without the patch, since the test for uniqueness has been
> > slightly altered and incurs a new allocation, but it doesn't seem to
> > change.
> > 
> > One interesting side effect of this patch, is that removing any unneeded
> > self join cuts down the planification time very significantly, as we
> > lower the number of combinations to consider.
> 
> Even more - removing a join we improve cardinality estimation.
> 
> > As for the code:
> >   - Comments on relation_has_unique_index_ext and
> >   relation_has_unique_index_for> 
> > should be rewritten, as relation_has_unique_index_for is now just a
> > special
> > case of relation_has_unique_index_ext. By the way, the comment would
> > probably be better read as: "but if extra_clauses isn't NULL".
> > 
> >   - The whole thing about "extra_clauses", ie, baserestrictinfos which
> >   were
> > 
> > used to determine uniqueness, is not very clear. Most functions where the
> > new argument has been added have not seen an update in their comments,
> > and the name itself doesn't really convey the intented meaning: perhaps
> > required_non_join_clauses ?
> > 
> > The way this works should be explained a bit more thoroughly, for example
> > in remove_self_joins_one_group the purpose of uclauses should be
> > explained. The fact that degenerate_case returns true when we don't have
> > any additional base restrict info is also confusing, as well as the
> > degenerate_case name.
> Agree,
> but after this case thoughts wander in my head: should we make one step
> back to pre-[1] approach? It looks like we have quite similar changes,
> but without special function for a 'degenerate case' detection and
> restrictlist splitting.

I'll take a look at that one. 

> 
> > I'll update if I think of more interesting things to add.
> 
> Thank you for your efforts!
> 
> See in attachment next version which fixes mistakes detected by
> z...@yugabyte.com.
> 
> [1]
> https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW
> 5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com


-- 
Ronan Dunklau






Re: Removing unneeded self joins

2022-05-19 Thread Andrey Lepikhov

On 5/17/22 19:14, Ronan Dunklau wrote:

Le vendredi 13 mai 2022, 07:07:47 CEST Andrey Lepikhov a écrit :

New version of the feature.
Here a minor bug with RowMarks is fixed. A degenerated case is fixed,
when uniqueness of an inner deduced not from join quals, but from a
baserestrictinfo clauses 'x=const', where x - unique field.
Code, dedicated to solve second issue is controversial, so i attached
delta.txt for quick observation.
Maybe we should return to previous version of code, when we didn't split
restriction list into join quals and base quals?


Hello,

I tried to find problematic cases, which would make the planning time grow
unacceptably, and couldn't devise it.

The worst case scenario I could think of was as follows:

  - a query with many different self joins
  - an abundance of unique indexes on combinations of this table columns to
consider
  - additional predicates on the where clause on columns.
Looking into the patch I can imagine, that the most difficult case is 
when a set of relations with the same OID is huge, but only small part 
of them (or nothing) can be removed.
Also, removing a clause from restrictinfo list or from equivalence class 
adds non-linear complexity. So, you can dig this way ).


The base table I used for this was a table with 40 integers. 39 unique indexes
were defined on every combination of (c1, cX) with cX being columns c2 to c40.

I turned geqo off, set from_collapse_limit and join_collapse_limit to
unreasonably high values (30), and tried to run queries of the form:

SELECT * FROM test_table t1
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX
...
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX.

So no self join can be eliminated in that case.
I think, you should compare t1.cX with tX.cX to eliminate self-join. 
Cross-unique-index proof isn't supported now.

The performance was very similar with or without the GUC enabled. I tested the
same thing without the patch, since the test for uniqueness has been slightly
altered and incurs a new allocation, but it doesn't seem to change.

One interesting side effect of this patch, is that removing any unneeded self
join cuts down the planification time very significantly, as we lower the number
of combinations to consider.

Even more - removing a join we improve cardinality estimation.


As for the code:

  - Comments on relation_has_unique_index_ext and relation_has_unique_index_for
should be rewritten, as relation_has_unique_index_for is now just a special
case of relation_has_unique_index_ext. By the way, the comment would probably
be better read as: "but if extra_clauses isn't NULL".
  - The whole thing about "extra_clauses", ie, baserestrictinfos which were
used to determine uniqueness, is not very clear. Most functions where the new
argument has been added have not seen an update in their comments, and the
name itself doesn't really convey the intented meaning: perhaps
required_non_join_clauses ?

The way this works should be explained a bit more thoroughly, for example in
remove_self_joins_one_group the purpose of uclauses should be explained. The
fact that degenerate_case returns true when we don't have any additional base
restrict info is also confusing, as well as the degenerate_case name.

Agree,
but after this case thoughts wander in my head: should we make one step 
back to pre-[1] approach? It looks like we have quite similar changes, 
but without special function for a 'degenerate case' detection and 
restrictlist splitting.


I'll update if I think of more interesting things to add.

Thank you for your efforts!

See in attachment next version which fixes mistakes detected by 
z...@yugabyte.com.


[1] 
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com


--
Regards
Andrey Lepikhov
Postgres ProfessionalFrom 1774b4bcfe337b151b76c7f357dc6748755bf1d9 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.

A Self Join Removal (SJR) feature removes inner join of plane table to itself
in a query plan if can be proved that the join can be replaced with a scan.
The proof based on innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation.
3. If uniqueness of outer relation deduces from baserestrictinfo clause like
'x=const' on unique field(s), check that inner has the same clause with the
same constant(s).
4. Compare RowMarks of inner and outer relations. They must have the same
strength.

Some 

Re: Removing unneeded self joins

2022-05-17 Thread Ronan Dunklau
Le vendredi 13 mai 2022, 07:07:47 CEST Andrey Lepikhov a écrit :
> New version of the feature.
> Here a minor bug with RowMarks is fixed. A degenerated case is fixed,
> when uniqueness of an inner deduced not from join quals, but from a
> baserestrictinfo clauses 'x=const', where x - unique field.
> Code, dedicated to solve second issue is controversial, so i attached
> delta.txt for quick observation.
> Maybe we should return to previous version of code, when we didn't split
> restriction list into join quals and base quals?

Hello,

I tried to find problematic cases, which would make the planning time grow 
unacceptably, and couldn't devise it.

The worst case scenario I could think of was as follows:

 - a query with many different self joins
 - an abundance of unique indexes on combinations of this table columns to 
consider
 - additional predicates on the where clause on columns.

The base table I used for this was a table with 40 integers. 39 unique indexes 
were defined on every combination of (c1, cX) with cX being columns c2 to c40.

I turned geqo off, set from_collapse_limit and join_collapse_limit to 
unreasonably high values (30), and tried to run queries of the form:

SELECT * FROM test_table t1
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX
...
JOIN test_table tX ON t1.c1 = tX.c1 AND t1.c[X+2] = tX.cX.

So no self join can be eliminated in that case.
The performance was very similar with or without the GUC enabled. I tested the 
same thing without the patch, since the test for uniqueness has been slightly 
altered and incurs a new allocation, but it doesn't seem to change. 

One interesting side effect of this patch, is that removing any unneeded self 
join cuts down the planification time very significantly, as we lower the 
number 
of combinations to consider. 

As for the code: 

 - Comments on relation_has_unique_index_ext and relation_has_unique_index_for 
should be rewritten, as relation_has_unique_index_for is now just a special 
case of relation_has_unique_index_ext. By the way, the comment would probably 
be better read as: "but if extra_clauses isn't NULL".
 - The whole thing about "extra_clauses", ie, baserestrictinfos which were 
used to determine uniqueness, is not very clear. Most functions where the new 
argument has been added have not seen an update in their comments, and the 
name itself doesn't really convey the intented meaning: perhaps 
required_non_join_clauses ?

The way this works should be explained a bit more thoroughly, for example in 
remove_self_joins_one_group the purpose of uclauses should be explained. The 
fact that degenerate_case returns true when we don't have any additional base 
restrict info is also confusing, as well as the degenerate_case name.

I'll update if I think of more interesting things to add. 

-- 
Ronan Dunklau






Re: Removing unneeded self joins

2022-05-13 Thread Zhihong Yu
Hi,
w.r.t. v33-0001-Remove-self-joins.patch :

removes inner join of plane table -> removes inner join of plain table
in an query plan -> in a query plan

+ * Used as the relation_has_unique_index_for,

Since relation_has_unique_index_for() becomes a wrapper of
relation_has_unique_index_ext, the above sentence doesn't make much sense.
I think you can drop this part.

but if extra_clauses doesn't NULL -> If extra_clauses isn't NULL

+   is_req_equal =
+   (rinfo->required_relids == rinfo->clause_relids) ? true : false;

The above can be simplified to:
  is_req_equal = rinfo->required_relids == rinfo->clause_relids;

+   ListCell *otherCell;
otherCell should be initialized to NULL.

+   if (bms_is_member(k, info->syn_lefthand) &&
+   !bms_is_member(r, info->syn_lefthand))
+   jinfo_check = false;
+   else if (bms_is_member(k, info->syn_righthand) &&
+   !bms_is_member(r, info->syn_righthand))
+   jinfo_check = false;
+   else if (bms_is_member(r, info->syn_lefthand) &&
+   !bms_is_member(k, info->syn_lefthand))
+   jinfo_check = false;

I think the above code can be simplified:

If bms_is_member(k, info->syn_lefthand) ^ bms_is_member(r,
info->syn_lefthand) is true, jinfo_check is false.
If bms_is_member(k, info->syn_righthand) ^ bms_is_member(r,
info->syn_righthand) is true, jinfo_check is false.
Otherwise jinfo_check is true.

Cheers


Re: Removing unneeded self joins

2022-04-04 Thread Andrey V. Lepikhov

On 4/1/22 20:27, Greg Stark wrote:

Sigh. And now there's a patch conflict in a regression test expected
output: sysviews.out

Please rebase. Incidentally, make sure to check the expected output is
actually correct. It's easy to "fix" an expected output to
accidentally just memorialize an incorrect output.

Btw, it's the last week before feature freeze so time is of the essence.Thanks,

patch in attachment rebased on current master.
Sorry for late answer.

--
regards,
Andrey Lepikhov
Postgres ProfessionalFrom d5fab52bd7e7124d0e557f1eec075a9543c67d29 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.

Remove inner joins of a relation to itself if could prove that the join
can be replaced with a scan. We can prove the uniqueness
using the existing innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. So proved, that this join is self-join and can be replaced by
a scan.

Some regression tests changed due to self-join removal logic.

[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 src/backend/optimizer/plan/analyzejoins.c | 888 +-
 src/backend/optimizer/plan/planmain.c |   5 +
 src/backend/optimizer/util/joininfo.c |   3 +
 src/backend/optimizer/util/relnode.c  |  26 +-
 src/backend/utils/misc/guc.c  |  10 +
 src/include/optimizer/pathnode.h  |   4 +
 src/include/optimizer/planmain.h  |   2 +
 src/test/regress/expected/equivclass.out  |  32 +
 src/test/regress/expected/join.out| 426 +++
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/sql/equivclass.sql   |  16 +
 src/test/regress/sql/join.sql | 197 +
 12 files changed, 1583 insertions(+), 29 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 337f470d58..c5ac8e2bd4 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
  */
 #include "postgres.h"
 
+#include "catalog/pg_class.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
+bool enable_self_join_removal;
+
 /* local functions */
 static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
-  Relids joinrelids);
+  Relids joinrelids, int subst_relid);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
 static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
 static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
    RelOptInfo *innerrel,
    JoinType jointype,
    List *restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
 
 
 /*
@@ -86,7 +92,7 @@ restart:
 
 		remove_rel_from_query(root, innerrelid,
 			  bms_union(sjinfo->min_lefthand,
-		sjinfo->min_righthand));
+		sjinfo->min_righthand), 0);
 
 		/* We verify that exactly one reference gets removed from joinlist */
 		nremoved = 0;
@@ -300,7 +306,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 
 /*
  * Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
  *
  * We are not terribly thorough here.  We must make sure that the rel is
  * no longer treated as a baserel, and that attributes of other baserels
@@ -309,13 +318,16 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
  * lists, but only if they belong to the outer join identified by joinrelids.
  */
 static void
-remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
+remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids,
+	  int subst_relid)
 {
 	RelOptInfo *rel = 

Re: Removing unneeded self joins

2022-04-01 Thread Greg Stark
Sigh. And now there's a patch conflict in a regression test expected
output: sysviews.out

Please rebase. Incidentally, make sure to check the expected output is
actually correct. It's easy to "fix" an expected output to
accidentally just memorialize an incorrect output.

Btw, it's the last week before feature freeze so time is of the essence.




Re: Removing unneeded self joins

2022-03-23 Thread Andrey V. Lepikhov

On 3/22/22 05:58, Andres Freund wrote:

Hi,

On 2022-03-04 15:47:47 +0500, Andrey Lepikhov wrote:

Also, in new version of the patch I fixed one stupid bug: checking a
self-join candidate expression operator - we can remove only expressions
like F(arg1) = G(arg2).


This CF entry currently fails tests: 
https://cirrus-ci.com/task/4632127944785920?logs=test_world#L1938

Looks like you're missing an adjustment of postgresql.conf.sample

Marked as waiting-on-author.

Thanks, I fixed it.

--
regards,
Andrey Lepikhov
Postgres ProfessionalFrom 620dea31ce19965beefe545f08dcc5c8b319c434 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.

Remove inner joins of a relation to itself if could prove that the join
can be replaced with a scan. We can prove the uniqueness
using the existing innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. So proved, that this join is self-join and can be replaced by
a scan.

Some regression tests changed due to self-join removal logic.

[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 src/backend/optimizer/plan/analyzejoins.c | 888 +-
 src/backend/optimizer/plan/planmain.c |   5 +
 src/backend/optimizer/util/joininfo.c |   3 +
 src/backend/optimizer/util/relnode.c  |  26 +-
 src/backend/utils/misc/guc.c  |  10 +
 src/include/optimizer/pathnode.h  |   4 +
 src/include/optimizer/planmain.h  |   2 +
 src/test/regress/expected/equivclass.out  |  32 +
 src/test/regress/expected/join.out| 426 +++
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/sql/equivclass.sql   |  16 +
 src/test/regress/sql/join.sql | 197 +
 12 files changed, 1583 insertions(+), 29 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 337f470d58..c5ac8e2bd4 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
  */
 #include "postgres.h"
 
+#include "catalog/pg_class.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
+bool enable_self_join_removal;
+
 /* local functions */
 static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
-  Relids joinrelids);
+  Relids joinrelids, int subst_relid);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
 static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
 static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
    RelOptInfo *innerrel,
    JoinType jointype,
    List *restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
 
 
 /*
@@ -86,7 +92,7 @@ restart:
 
 		remove_rel_from_query(root, innerrelid,
 			  bms_union(sjinfo->min_lefthand,
-		sjinfo->min_righthand));
+		sjinfo->min_righthand), 0);
 
 		/* We verify that exactly one reference gets removed from joinlist */
 		nremoved = 0;
@@ -300,7 +306,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
 
 /*
  * Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to include it in the query. Or replace
+ * with another relid.
+ * To reusability, this routine can work in two modes: delete relid from a plan
+ * or replace it. It is used in replace mode in a self-join removing process.
  *
  * We are not terribly thorough here.  We must make sure that the rel is
  * no longer treated as a baserel, and that attributes of other baserels
@@ -309,13 +318,16 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
  * lists, but only if they belong to the outer join identified by joinrelids.
  */
 static void
-remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids)
+remove_rel_from_query(PlannerInfo *root, int relid, Relids joinrelids,
+	  int 

Re: Removing unneeded self joins

2022-03-21 Thread Andres Freund
Hi,

On 2022-03-04 15:47:47 +0500, Andrey Lepikhov wrote:
> Also, in new version of the patch I fixed one stupid bug: checking a
> self-join candidate expression operator - we can remove only expressions
> like F(arg1) = G(arg2).

This CF entry currently fails tests: 
https://cirrus-ci.com/task/4632127944785920?logs=test_world#L1938

Looks like you're missing an adjustment of postgresql.conf.sample

Marked as waiting-on-author.

Greetings,

Andres Freund




Re: Removing unneeded self joins

2022-03-04 Thread Andrey Lepikhov

On 1/3/2022 03:03, Greg Stark wrote:

On Thu, 1 Jul 2021 at 02:38, Ronan Dunklau  wrote:


Well in some cases they can't, when the query is not emitting redundant
predicates by itself but they are added by something else like a view or a RLS
policy.
Maybe it would be worth it to allow spending a bit more time planning for
those cases ?


Yeah, I'm generally in favour of doing more work in the optimizer to
save query authors work writing queries.

My question is whether it handles cases like:

select b.x,c.y
   from t
join t2 as b on (b.id = t.id)
   join t2 as c on (c.id = t.id)

That is, if you join against the same table twice on the same qual.
Does the EC mechanism turn this into a qual on b.id = c.id and then
turn this into a self-join that can be removed?
Yes, the self-join removal machinery uses EC mechanism as usual to get 
all join clauses. So, this case works (See demo in attachment).


Also, in new version of the patch I fixed one stupid bug: checking a 
self-join candidate expression operator - we can remove only expressions 
like F(arg1) = G(arg2).


--
regards,
Andrey Lepikhov
Postgres ProfessionalFrom 70398361a0a0d9c6c3c7ddd1fd305ac11138e7b1 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.

Remove inner joins of a relation to itself if could prove that the join
can be replaced with a scan. We can prove the uniqueness
using the existing innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. So proved, that this join is self-join and can be replaced by
a scan.

Some regression tests changed due to self-join removal logic.

[1] 
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 src/backend/optimizer/plan/analyzejoins.c | 888 +-
 src/backend/optimizer/plan/planmain.c |   5 +
 src/backend/optimizer/util/joininfo.c |   3 +
 src/backend/optimizer/util/relnode.c  |  26 +-
 src/backend/utils/misc/guc.c  |  10 +
 src/include/optimizer/pathnode.h  |   4 +
 src/include/optimizer/planmain.h  |   2 +
 src/test/regress/expected/equivclass.out  |  32 +
 src/test/regress/expected/join.out| 426 +++
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/sql/equivclass.sql   |  16 +
 src/test/regress/sql/join.sql | 197 +
 12 files changed, 1583 insertions(+), 29 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c 
b/src/backend/optimizer/plan/analyzejoins.c
index 337f470d58..c5ac8e2bd4 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
  */
 #include "postgres.h"
 
+#include "catalog/pg_class.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
+bool enable_self_join_removal;
+
 /* local functions */
 static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids 
joinrelids);
+ Relids 
joinrelids, int subst_relid);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int 
*nremoved);
 static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
 static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
   RelOptInfo 
*innerrel,
   JoinType 
jointype,
   List 
*restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
 
 
 /*
@@ -86,7 +92,7 @@ restart:
 
remove_rel_from_query(root, innerrelid,
  
bms_union(sjinfo->min_lefthand,
-   
sjinfo->min_righthand));
+   

Re: Removing unneeded self joins

2022-02-28 Thread Greg Stark
I don't think the benchmarking that's needed is to check whether
pruning unnecessary joins is helpful. Obviously it's going to be hard
to measure on simple queries and small tables. But the resulting plan
is unambiguously superior and in more complex cases could extra i/o.

The benchmarking people were looking for in the past was testing the
impact of the extra planning work in cases where it doesn't end up
being applied. I'm not sure what the worst case is, perhaps a many-way
self-join where the join clauses are not suitable for pruning?




Re: Removing unneeded self joins

2022-02-28 Thread Jaime Casanova
On Thu, Jul 15, 2021 at 05:49:11PM +0300, Andrey Lepikhov wrote:
> On 6/7/21 13:49, Hywel Carver wrote:
> > On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov
> > mailto:a.lepik...@postgrespro.ru>> wrote:
> > Looking through the email chain, a previous version of this patch added
> > ~0.6% to planning time in the worst case tested - does that meet the
> > "essentially free" requirement?
> I think these tests weren't full coverage of possible use cases. It will
> depend on a number of relations in the query. For the JOIN of partitioned
> tables, for example, the overhead could grow. But in the context of overall
> planning time this overhead will be small till the large number of
> relations.
> Also, we made this feature optional to solve possible problems.
> Rebased on 768ea9bcf9
> 

I made some tests in a machine with 16 cores and 32GB of RAM.
So we can see if this is an improvement.

This is what I found:

+---+--+---+---+---+---+---+
| test  |   mode   |  master   |  enabled  |   %   | disabled  
|   %   |
+---+--+---+---+---+---+---+
| pgbench read only | standard |  64418.13 |  63942.94 | -0.74 |  62231.38 
| -3.39 |
| pgbench read only | prepared | 108463.51 | 107002.13 | -1.35 | 100960.83 
| -6.92 |
| pgbench read only | extended |  55409.65 |  56427.63 |  1.84 |  55927.62 
|  0.93 |
+---+--+---+---+---+---+---+
| pgbench read/write| standard |   9374.91 |   9135.21 | -2.56 |   8840.68 
| -5.70 |
| pgbench read/write| prepared |  11849.86 |  11672.23 | -1.50 |  11393.39 
| -3.85 |
| pgbench read/write| extended |   7976.80 |   7947.07 | -0.37 |   7788.99 
| -2.35 |
+---+--+---+---+---+---+---+
| select non optimize 1 | standard | 80.97 | 81.29 |  0.40 | 81.30 
|  0.41 |
| select non optimize 1 | prepared | 81.29 | 81.28 | -0.01 | 80.89 
| -0.49 |
| select non optimize 1 | extended | 81.07 | 80.81 | -0.32 | 80.98 
| -0.11 |
+---+--+---+---+---+---+---+
| select optimized 1| standard | 15.84 | 13.90 |-12.25 | 15.80 
| -0.25 |
| select optimized 1| prepared | 15.24 | 13.82 | -9.32 | 15.55 
|  2.03 |
| select optimized 1| extended | 15.38 | 13.89 | -9.69 | 15.59 
|  1.37 |
+---+--+---+---+---+---+---+
| select optimized 2| standard |  10204.91 |  10818.39 |  6.01 |  10261.07 
|  0.55 |
| select optimized 2| prepared |  13284.06 |  15579.33 | 17.28 |  13116.22 
| -1.26 |
| select optimized 2| extended |  10143.43 |  10645.23 |  4.95 |  10142.77 
| -0.01 |
+---+--+---+---+---+---+---+
| select shoe   | standard |   5645.28 |   5661.71 |  0.29 |   6180.60 
|  9.48 |
| select shoe   | prepared |   9660.45 |   9602.37 | -0.60 |   9894.82 
|  2.43 |
| select shoe   | extended |   5666.47 |   5634.10 | -0.57 |   5757.26 
|  1.60 |
+---+--+---+---+---+---+---+

Obviously the pgbench runs are from the standard script. The numbers are
not clear for me, I can see improvementes with the patch only in one
case and, for some reason, if I disable the patch
(enable_self_join_removal='off') I still see a regression in normal
cases and curiosly an improvement in one case.

I'm attaching the queries. I used the users table that is down-thread
and loaded with ~200k rows using:

insert into users 
select seq, case when random() < 0.2 then null else random() * 1000 end, 
   random() * 1 
  from generate_series(1, 100) seq 
  on conflict (nullable_int) do nothing;

for master I just dumped the data from the table and loaded it. I'm also
attaching the queries I used.

After this tests, I'm not convinced this is actually providing something
performance-wise. At least not in its current state.

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL


select_nonoptimize1.sql
Description: application/sql


select_optimize1.sql
Description: application/sql


select_optimize2.sql
Description: application/sql


select_shoe.sql
Description: application/sql


Re: Removing unneeded self joins

2022-02-28 Thread Greg Stark
On Thu, 1 Jul 2021 at 02:38, Ronan Dunklau  wrote:
>
> Well in some cases they can't, when the query is not emitting redundant
> predicates by itself but they are added by something else like a view or a RLS
> policy.
> Maybe it would be worth it to allow spending a bit more time planning for
> those cases ?

Yeah, I'm generally in favour of doing more work in the optimizer to
save query authors work writing queries.

My question is whether it handles cases like:

select b.x,c.y
  from t
   join t2 as b on (b.id = t.id)
  join t2 as c on (c.id = t.id)

That is, if you join against the same table twice on the same qual.
Does the EC mechanism turn this into a qual on b.id = c.id and then
turn this into a self-join that can be removed?

That's the usual pattern I've seen this arise. Not so much that people
write self joins explicitly but that they add a join to check some
column but that is happening in some isolated piece of code that
doesn't know that that join is already in the query. You can easily
end up with a lot of joins against the same table this way.

It's not far different from the old chestnut

select (select x from t2 where id = t.id) as x,
   (select y from t2 where id = t.id) as y
  from t

which is actually pretty hard to avoid sometimes.

-- 
greg




Re: Removing unneeded self joins

2021-07-26 Thread Andrey V. Lepikhov

On 7/16/21 12:28 AM, Zhihong Yu wrote:



On Thu, Jul 15, 2021 at 8:25 AM Zhihong Yu > wrote:

bq. We can proof the uniqueness

proof -> prove

Fixed


1. Collect all mergejoinable join quals looks like a.x = b.x

  quals looks like -> quals which look like

For update_ec_sources(), the variable cc is not used.

Fixed

+   *otherjoinquals = rjoinquals;

Maybe rename rjoinquals as ojoinquals to align with the target variable 
name.

Agree, fixed


+   int k; /* Index of kept relation */
+   int r = -1; /* Index of removed relation */

Naming k as kept, r as removed would make the code more readable (remain 
starts with r but has opposite meaning).
I think it is correct now: k - index of inner (keeping) relation. r - of 
outer (removing) relation.


+               if (bms_is_member(r, info->syn_righthand) &&
+                   !bms_is_member(k, info->syn_righthand))
+                   jinfo_check = false;
+
+               if (!jinfo_check)
+                   break;

There are 4 if statements where jinfo_check is assigned false. Once 
jinfo_check is assigned, we can break out of the loop - instead of 
checking the remaining conditions.

Fixed


+           else if (!innerrel_is_unique(root, joinrelids, outer->relids,

nit: the 'else' is not needed since the if block above it goes to next 
iteration of the loop.

Fixed


+           /* See for row marks. */
+           foreach (lc, root->rowMarks)

It seems once imark and omark are set, we can come out of the loop.

Maybe you right. fixed.

--
regards,
Andrey Lepikhov
Postgres Professional
>From e8b4047aa71c808fa5799b2739b2ae0ab7b6d7e3 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.

Remove inner joins of a relation to itself if could prove that the join
can be replaced with a scan. We can prove the uniqueness
using the existing innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals which look like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. So proved, that this join is self-join and can be replaced by
a scan.

Some regression tests changed due to self-join removal logic.

[1] https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 src/backend/optimizer/plan/analyzejoins.c | 886 +-
 src/backend/optimizer/plan/planmain.c |   5 +
 src/backend/optimizer/util/joininfo.c |   3 +
 src/backend/optimizer/util/relnode.c  |  26 +-
 src/backend/utils/misc/guc.c  |  10 +
 src/include/optimizer/pathnode.h  |   4 +
 src/include/optimizer/planmain.h  |   2 +
 src/test/regress/expected/equivclass.out  |  32 +
 src/test/regress/expected/join.out| 399 ++
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/sql/equivclass.sql   |  16 +
 src/test/regress/sql/join.sql | 189 +
 12 files changed, 1546 insertions(+), 29 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 37eb64bcef..eb9d83b424 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
  */
 #include "postgres.h"
 
+#include "catalog/pg_class.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
+bool enable_self_join_removal;
+
 /* local functions */
 static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
-  Relids joinrelids);
+  Relids joinrelids, int subst_relid);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int *nremoved);
 static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
 static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
    RelOptInfo *innerrel,
    JoinType jointype,
    List *restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
 
 
 /*
@@ -86,7 +92,7 @@ restart:
 
 		remove_rel_from_query(root, innerrelid,
 			  bms_union(sjinfo->min_lefthand,
-		sjinfo->min_righthand));
+		sjinfo->min_righthand), 

Re: Removing unneeded self joins

2021-07-15 Thread Zhihong Yu
On Thu, Jul 15, 2021 at 8:25 AM Zhihong Yu  wrote:

>
>
> On Thu, Jul 15, 2021 at 7:49 AM Andrey Lepikhov 
> wrote:
>
>> On 6/7/21 13:49, Hywel Carver wrote:
>> > On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov
>> > mailto:a.lepik...@postgrespro.ru>> wrote:
>> > Looking through the email chain, a previous version of this patch added
>> > ~0.6% to planning time in the worst case tested - does that meet the
>> > "essentially free" requirement?
>> I think these tests weren't full coverage of possible use cases. It will
>> depend on a number of relations in the query. For the JOIN of
>> partitioned tables, for example, the overhead could grow. But in the
>> context of overall planning time this overhead will be small till the
>> large number of relations.
>> Also, we made this feature optional to solve possible problems.
>> Rebased on 768ea9bcf9
>>
>> --
>> regards,
>> Andrey Lepikhov
>> Postgres Professional
>>
> Hi,
>
> bq. We can proof the uniqueness
>
> proof -> prove
>
> 1. Collect all mergejoinable join quals looks like a.x = b.x
>
>  quals looks like -> quals which look like
>
> For update_ec_sources(), the variable cc is not used.
>
> Cheers
>
Hi,

+   *otherjoinquals = rjoinquals;

Maybe rename rjoinquals as ojoinquals to align with the target variable
name.

+   int k; /* Index of kept relation */
+   int r = -1; /* Index of removed relation */

Naming k as kept, r as removed would make the code more readable (remain
starts with r but has opposite meaning).

+   if (bms_is_member(r, info->syn_righthand) &&
+   !bms_is_member(k, info->syn_righthand))
+   jinfo_check = false;
+
+   if (!jinfo_check)
+   break;

There are 4 if statements where jinfo_check is assigned false. Once
jinfo_check is assigned, we can break out of the loop - instead of checking
the remaining conditions.

+   else if (!innerrel_is_unique(root, joinrelids, outer->relids,

nit: the 'else' is not needed since the if block above it goes to next
iteration of the loop.

+   /* See for row marks. */
+   foreach (lc, root->rowMarks)

It seems once imark and omark are set, we can come out of the loop.

Cheers


Re: Removing unneeded self joins

2021-07-15 Thread Zhihong Yu
On Thu, Jul 15, 2021 at 7:49 AM Andrey Lepikhov 
wrote:

> On 6/7/21 13:49, Hywel Carver wrote:
> > On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov
> > mailto:a.lepik...@postgrespro.ru>> wrote:
> > Looking through the email chain, a previous version of this patch added
> > ~0.6% to planning time in the worst case tested - does that meet the
> > "essentially free" requirement?
> I think these tests weren't full coverage of possible use cases. It will
> depend on a number of relations in the query. For the JOIN of
> partitioned tables, for example, the overhead could grow. But in the
> context of overall planning time this overhead will be small till the
> large number of relations.
> Also, we made this feature optional to solve possible problems.
> Rebased on 768ea9bcf9
>
> --
> regards,
> Andrey Lepikhov
> Postgres Professional
>
Hi,

bq. We can proof the uniqueness

proof -> prove

1. Collect all mergejoinable join quals looks like a.x = b.x

 quals looks like -> quals which look like

For update_ec_sources(), the variable cc is not used.

Cheers


Re: Removing unneeded self joins

2021-07-15 Thread Andrey Lepikhov

On 6/7/21 13:49, Hywel Carver wrote:
On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov 
mailto:a.lepik...@postgrespro.ru>> wrote:
Looking through the email chain, a previous version of this patch added 
~0.6% to planning time in the worst case tested - does that meet the 
"essentially free" requirement?
I think these tests weren't full coverage of possible use cases. It will 
depend on a number of relations in the query. For the JOIN of 
partitioned tables, for example, the overhead could grow. But in the 
context of overall planning time this overhead will be small till the 
large number of relations.

Also, we made this feature optional to solve possible problems.
Rebased on 768ea9bcf9

--
regards,
Andrey Lepikhov
Postgres Professional
From 6f9d11ec64b5b8e2304156deaea7842e0fd77c3e Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 15 Jul 2021 15:26:13 +0300
Subject: [PATCH] Remove self-joins.

Remove inner joins of a relation to itself if could prove that the join
can be replaced with a scan. We can proof the uniqueness
using the existing innerrel_is_unique machinery.

We can remove a self-join when for each outer row:
1. At most one inner row matches the join clauses.
2. If the join target list contains any inner vars, an inner row
must be (physically) the same row as the outer one.

In this patch we use Rowley's [1] approach to identify a self-join:
1. Collect all mergejoinable join quals looks like a.x = b.x
2. Check innerrel_is_unique() for the qual list from (1). If it
returns true, then outer row matches only the same row from the inner
relation. So proved, that this join is self-join and can be replaced by
a scan.

Some regression tests changed due to self-join removal logic.

[1] 
https://www.postgresql.org/message-id/raw/CAApHDvpggnFMC4yP-jUO7PKN%3DfXeErW5bOxisvJ0HvkHQEY%3DWw%40mail.gmail.com
---
 src/backend/optimizer/plan/analyzejoins.c | 890 +-
 src/backend/optimizer/plan/planmain.c |   5 +
 src/backend/optimizer/util/joininfo.c |   3 +
 src/backend/optimizer/util/relnode.c  |  26 +-
 src/backend/utils/misc/guc.c  |  10 +
 src/include/optimizer/pathnode.h  |   4 +
 src/include/optimizer/planmain.h  |   2 +
 src/test/regress/expected/equivclass.out  |  32 +
 src/test/regress/expected/join.out| 399 ++
 src/test/regress/expected/sysviews.out|   3 +-
 src/test/regress/sql/equivclass.sql   |  16 +
 src/test/regress/sql/join.sql | 189 +
 12 files changed, 1550 insertions(+), 29 deletions(-)

diff --git a/src/backend/optimizer/plan/analyzejoins.c 
b/src/backend/optimizer/plan/analyzejoins.c
index 37eb64bcef..a8e638f6e7 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -22,6 +22,7 @@
  */
 #include "postgres.h"
 
+#include "catalog/pg_class.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/joininfo.h"
@@ -32,10 +33,12 @@
 #include "optimizer/tlist.h"
 #include "utils/lsyscache.h"
 
+bool enable_self_join_removal;
+
 /* local functions */
 static bool join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo);
 static void remove_rel_from_query(PlannerInfo *root, int relid,
- Relids 
joinrelids);
+ Relids 
joinrelids, int subst_relid);
 static List *remove_rel_from_joinlist(List *joinlist, int relid, int 
*nremoved);
 static bool rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel);
 static bool rel_is_distinct_for(PlannerInfo *root, RelOptInfo *rel,
@@ -47,6 +50,9 @@ static bool is_innerrel_unique_for(PlannerInfo *root,
   RelOptInfo 
*innerrel,
   JoinType 
jointype,
   List 
*restrictlist);
+static void change_rinfo(RestrictInfo* rinfo, Index from, Index to);
+static Bitmapset* change_relid(Relids relids, Index oldId, Index newId);
+static void change_varno(Expr *expr, Index oldRelid, Index newRelid);
 
 
 /*
@@ -86,7 +92,7 @@ restart:
 
remove_rel_from_query(root, innerrelid,
  
bms_union(sjinfo->min_lefthand,
-   
sjinfo->min_righthand));
+   
sjinfo->min_righthand), 0);
 
/* We verify that exactly one reference gets removed from 
joinlist */
nremoved = 0;
@@ -300,7 +306,10 @@ join_is_removable(PlannerInfo *root, SpecialJoinInfo 
*sjinfo)
 
 /*
  * Remove the target relid from the planner's data structures, having
- * determined that there is no need to include it in the query.
+ * determined that there is no need to 

Re: Removing unneeded self joins

2021-07-15 Thread vignesh C
On Thu, May 27, 2021 at 12:21 PM Andrey V. Lepikhov
 wrote:
>
> On 5/8/21 2:00 AM, Hywel Carver wrote:
> > On Fri, May 7, 2021 at 8:23 AM Andrey Lepikhov
> > mailto:a.lepik...@postgrespro.ru>> wrote:
> > Here I didn't work on 'unnecessary IS NOT NULL filter'.
> >
> > I've tested the new patch, and it is giving the same improved behaviour
> > as the old patch.
> Thank you for this efforts!
>
> I cleaned the code of previous version, improved regression tests and
> rebased on current master.
>
> Also, I see that we could do additional optimizations for an
> EC-generated selfjoin clause (See equivclass.patch for necessary
> changes). Example:
> explain (costs off)
> select * from sj t1, sj t2 where t1.a = t1.b and t1.b = t2.b and t2.b =
> t2.a;
>   QUERY PLAN
> -
>   Seq Scan on sj t2
> Filter: ((a IS NOT NULL) AND (b = a) AND (a = b))
> (2 rows)
>
> But I'm not sure that this patch need to be a part of the self-join
> removal feature because of code complexity.

The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".

Regards,
Vignesh




Re: Removing unneeded self joins

2021-07-06 Thread Hywel Carver
On Mon, Jul 5, 2021 at 2:20 PM Andrey Lepikhov 
wrote:

> On 2/7/21 01:56, Hywel Carver wrote:
> > On Wed, Jun 30, 2021 at 12:21 PM Andrey Lepikhov
> > mailto:a.lepik...@postgrespro.ru>> wrote:
> > I think, here we could ask more general question: do we want to
> > remove a
> > 'IS NOT NULL' clause from the clause list if the rest of the list
> > implicitly implies it?
> >
> >
> > My suggestion was not to remove it, but to avoid adding it in the first
> > place. When your optimisation has found a join on a group of columns
> > under a uniqueness constraint, you would do something like this (forgive
> > the pseudo-code)
> >
> > foreach(column, join_clause) {
> >if(column.nullable) { // This condition is what I'm suggesting is
> added
> >  add_null_test(column, IS_NOT_NULL);
> >}
> > }
> >
> > But it may be that that's not possible or practical at this point in the
> > code.
> I think, such option will require to implement a new machinery to prove
> that arbitrary column couldn't produce NULL value.
>

Got it, and it makes sense to me that this would be out of scope for this
change.

I remember in the previous conversation about this, Tomas acknowledged that
while there are some silly queries that would benefit from this change,
there are also some well-written ones (e.g. properly denormalised table
structures, with decomposed views that need joining together in some
queries). So the optimization needs to be essentially free to run to
minimise impact on other queries.

Looking through the email chain, a previous version of this patch added
~0.6% to planning time in the worst case tested - does that meet the
"essentially free" requirement?


Re: Removing unneeded self joins

2021-07-05 Thread Andrey Lepikhov

On 2/7/21 01:56, Hywel Carver wrote:
On Wed, Jun 30, 2021 at 12:21 PM Andrey Lepikhov 
mailto:a.lepik...@postgrespro.ru>> wrote:

I think, here we could ask more general question: do we want to
remove a
'IS NOT NULL' clause from the clause list if the rest of the list
implicitly implies it?


My suggestion was not to remove it, but to avoid adding it in the first 
place. When your optimisation has found a join on a group of columns 
under a uniqueness constraint, you would do something like this (forgive 
the pseudo-code)


foreach(column, join_clause) {
   if(column.nullable) { // This condition is what I'm suggesting is added
     add_null_test(column, IS_NOT_NULL);
   }
}

But it may be that that's not possible or practical at this point in the 
code.
I think, such option will require to implement a new machinery to prove 
that arbitrary column couldn't produce NULL value.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Removing unneeded self joins

2021-07-01 Thread Hywel Carver
On Wed, Jun 30, 2021 at 12:21 PM Andrey Lepikhov 
wrote:

> On 12/3/21 12:05, Hywel Carver wrote:
> > I've built and tested this, and it seems to function correctly to me.
> One question I have is whether the added "IS NOT NULL" filters can be
> omitted when they're unnecessary. Some of the resulting plans included an
> "IS NOT NULL" filter on a non-nullable column. To be clear, this is still
> an improvement (to me) without that.

I think, here we could ask more general question: do we want to remove a
> 'IS NOT NULL' clause from the clause list if the rest of the list
> implicitly implies it?


My suggestion was not to remove it, but to avoid adding it in the first
place. When your optimisation has found a join on a group of columns under
a uniqueness constraint, you would do something like this (forgive the
pseudo-code)

foreach(column, join_clause) {
  if(column.nullable) { // This condition is what I'm suggesting is added
add_null_test(column, IS_NOT_NULL);
  }
}

But it may be that that's not possible or practical at this point in the
code.


  1   2   >