Re: POC, WIP: OR-clause support for indexes

Alexander Korotkov Tue, 12 Mar 2024 08:20:39 -0700

On Mon, Mar 11, 2024 at 2:43 PM Andrei Lepikhov
<a.lepik...@postgrespro.ru> wrote:
> On 11/3/2024 18:31, Alexander Korotkov wrote:
> >> I'm not convinced about this limit. The initial reason was to combine
> >> long lists of ORs into the array because such a transformation made at
> >> an early stage increases efficiency.
> >> I understand the necessity of this limit in the array decomposition
> >> routine but not in the creation one.
> >
> > The comment near MAX_SAOP_ARRAY_SIZE says that this limit is because
> > N^2 algorithms could be applied to arrays.  Are you sure that's not
> > true for our case?
> When you operate an array, indeed. But when we transform ORs to an
> array, not. Just check all the places in the optimizer and even the
> executor where we would pass along the list of ORs. This is why I think
> we should use this optimization even more intensively for huge numbers
> of ORs in an attempt to speed up the overall query.


Ok.

> >>> 3) Better save the original order of clauses by putting hash entries and
> >>> untransformable clauses to the same list.  A lot of differences in
> >>> regression tests output have gone.
> >> I agree that reducing the number of changes in regression tests looks
> >> better. But to achieve this, you introduced a hack that increases the
> >> complexity of the code. Is it worth it? Maybe it would be better to make
> >> one-time changes in tests instead of getting this burden on board. Or
> >> have you meant something more introducing the node type?
> >
> > For me the reason is not just a regression test.  The current code
> > keeps the original order of quals as much as possible.  The OR
> > transformation code reorders quals even in cases when it doesn't
> > eventually apply any optimization.  I don't think that's acceptable.
> > However, less hackery ways for this is welcome for sure.
> Why is it unacceptable? Can the user implement some order-dependent
> logic with clauses, and will it be correct?
> Otherwise, it is a matter of taste, and generally, this decision is up
> to you.

I think this is an important property that the user sees the quals in
the plan in the same order as they were in the query.  And if some
transformations are applied, then the order is saved as much as
possible.  I don't think we should sacrifice this property without
strong reasons.  A bit of code complexity is definitely not that
reason for me.

> >>> We don't make array values unique.  That might make query execution
> >>> performance somewhat worse, and also makes selectivity estimation
> >>> worse.  I suggest Andrei and/or Alena should implement making array
> >>> values unique.
> >> The fix Alena has made looks correct. But I urge you to think twice:
> >> The optimizer doesn't care about duplicates, so why do we do it?
> >> What's more, this optimization is intended to speed up queries with long
> >> OR lists. Using the list_append_unique() comparator on such lists could
> >> impact performance. I suggest sticking to the common rule and leaving
> >> the responsibility on the user's shoulders.
> >
> > I don't see why the optimizer doesn't care about duplicates for OR
> > lists.  As I showed before in an example, it successfully removes the
> > duplicate.  So, currently OR transformation clearly introduces a
> > regression in terms of selectivity estimation.  I think we should
> > evade that.
> I think you are right. It is probably a better place than any other to
> remove duplicates in an array. I just think we should sort and remove
> duplicates from entry->consts in one pass. Thus, this optimisation
> should be applied to sortable constants.

Ok.

> >> At least, we should do this optimization later, in one pass, with
> >> sorting elements before building the array. But what if we don't have a
> >> sort operator for the type?
> >
> > It was probably discussed before, but can we do our work later?  There
> > is a canonicalize_qual() which calls find_duplicate_ors().  This is
> > the place where currently duplicate OR clauses are removed.  Could our
> > OR-to-ANY transformation be just another call from
> > canonicalize_qual()?
> Hmm, we already tried to do it at that point. I vaguely recall some
> issues caused by this approach. Anyway, it should be done as quickly as
> possible to increase the effect of the optimization.

I think there were provided quite strong reasons why this shouldn't be
implemented at the parse analysis stage [1], [2], [3].  The
canonicalize_qual() looks quite appropriate place for that since it
does similar transformations.

Links.
1. 
https://www.postgresql.org/message-id/CA%2BTgmoZCgP6FrBQEusn4yaWm02XU8OPeoEMk91q7PRBgwaAkFw%40mail.gmail.com
2. 
https://www.postgresql.org/message-id/CAH2-Wzm2%3Dnf_JhiM3A2yetxRs8Nd2NuN3JqH%3Dfm_YWYd1oYoPg%40mail.gmail.com
3. 
https://www.postgresql.org/message-id/CA%2BTgmoaOiwMXBBTYknczepoZzKTp-Zgk5ss1%2BCuVQE-eFTqBmA%40mail.gmail.com

------
Regards,
Alexander Korotkov

Re: POC, WIP: OR-clause support for indexes

Reply via email to