Re: [Maria-developers] Review input for: MDEV-22534 Decorrelate IN subquery
Hi Sergei, Thanks again for your comments. I have added a comment on the jira ticket including the link to a patch that addresses your comments. I will take one more look at it tomorrow morning before sending it to you for review, but I thought maybe it makes sense to respond now (see below) given the timezone difference. > Hi Yuchen, > > Please find below review input for > > > commit ba9cf8efeb0b1e1c132d565adb79c98156783f15 > > Author: Yuchen Pei > > Date: Fri May 19 20:06:31 2023 +1000 > > > > MDEV-22534 Decorrelate IN subquery > > > > First, general questions: > > Q1: It seems there is some overlap between > Item_in_subselect::exists2in_processor() and > Item_exists_subselect::exists2in_processor() Did you consider factoring > out common code rather than copying it? (if yes, I'd like to know why > it couldn't be done). Done. I factored out the common code into three methods: - Item_exists_subselect::exists2in_prepare() - Item_exists_subselect::exists2in_create_or_update_in() - Item_exists_subselect::exists2in_and_is_not_nulls() In doing so I tried to not alter the logic of the exists2in transformation, while keeping the decorrelate-in transformation simple. For things done in exists2in but not in my decorrelate-in implementation, I add a substype() check. > Q2: Where is the coe that removes the correlated equalities from the > subquery's WHERE? (I assume Item_exists_subselect code does it?) Naively, the best place to remove the equalities would be in find_inner_outer_equalities(), when it iterates over the conds and one could simply call the remove() method on the iterator. But we can't just do that because after the call to find_inner_outer_equalities() we need to do more checks, as well as replacing the equalities with `IS NOT NULL`s in case of upper_not to ensure the 3vl value of the subquery remains the same (see my previous response about three value logic). > > diff --git a/sql/item_subselect.cc b/sql/item_subselect.cc > > index 9e6c205ca76..8d893794da3 100644 > > --- a/sql/item_subselect.cc > > +++ b/sql/item_subselect.cc > > @@ -3099,6 +3099,132 @@ static bool find_inner_outer_equalities(Item > > **conds, > >return TRUE; > > } > > > > +/* Checks whether item tree intersects with the free list */ > > +static bool intersects_free_list(Item *item, THD *thd) > > +{ > > + for (const Item *to_find= thd->free_list; to_find; to_find= > > to_find->next) > > +if (item->walk(::find_item_processor, 1, (void *) to_find)) > > + return true; > > + return false; > > +} > > + > > Please try moving this function into a separate file. First name that came to > my mind: opt_subq_decorrelate.cc Feel free to suggest better names. The idea > is that we should aim for better structure. > (This request is only valid if the factoring out suggested above doesn't > work out). Replied previously. BTW why should this function be in a separate file? Is it because it is not a method of any of the Item_subselect classes? > > +/* De-correlate where conditions in an IN subquery */ > > This needs a bigger comment describing what the rewrite does. Done and replied previously > > +bool Item_in_subselect::exists2in_processor(void *opt_arg) > > +{ > > + THD *thd= (THD *)opt_arg; > > + SELECT_LEX *first_select= unit->first_select(); > > + JOIN *join= first_select->join; > > + bool will_be_correlated; > > + Dynamic_array eqs(PSI_INSTRUMENT_MEM, 5, 5); > > + List outer; > > + Query_arena *arena= NULL, backup; > > + int res= FALSE; > > + DBUG_ENTER("Item_in_subselect::exists2in_processor"); > > + > > Please move comments out of the if statement and number the conditions. > > For examples, see best_access_path(), large if-statement starting with > "Don't test table scan if it can't be better." > or > "EXISTS-to-IN coversion and ORDER BY ... LIMIT clause" > in Item_exists_subselect::exists2in_processor(). Done and replied previously > > + if (!optimizer_flag(thd, OPTIMIZER_SWITCH_EXISTS_TO_IN) || > > + /* proceed only if I'm a toplevel IN or a toplevel NOT IN */ > > + !(is_top_level_item() || > > +(upper_not && upper_not->is_top_level_item())) || > > what is the reason behind this condition? > If we're able to handle top-level NOT IN, why can't we handle subquery in any > context? > > Can we really handle the NOT IN? (I suspect not, it will suffer from the > semantics of IN subqueries with NULLs.. > I can't find the name for this, it's described e.g. here: > https://petrunia.net/2006/11/16/inany-subqueries-null-woes/ ) Done and replied previously > > > + first_select->is_part_of_union() || /* skip if part of a union */ > > + first_select->group_list.elements || /* skip if with group by */ > > + first_select->with_sum_func || /* skip if aggregation */ > > + join->having || /* skip if with having */ > > + !join->conds || /* skip if no conds */ > > + /* skip if left
Re: [Maria-developers] Review input for: MDEV-22534 Decorrelate IN subquery
Hi Sergey, Thanks for your comments. Please find some quick responses to some of the comments while they are still fresh in my mind before the weekend comes, and more to follow next week. On Thu 2023-05-25 22:14:13 +0300, Sergey Petrunia wrote: > Hi Yuchen, > > Please find below review input for > >> commit ba9cf8efeb0b1e1c132d565adb79c98156783f15 >> Author: Yuchen Pei >> Date: Fri May 19 20:06:31 2023 +1000 >> >> MDEV-22534 Decorrelate IN subquery >> > > First, general questions: > > Q1: It seems there is some overlap between > Item_in_subselect::exists2in_processor() and > Item_exists_subselect::exists2in_processor() Did you consider factoring > out common code rather than copying it? (if yes, I'd like to know why > it couldn't be done). Indeed Item_in_subselect::exists2in_processor() is modeled after Item_exists_subselect::exists2in_processor(). The issue with the latter is that it is a large function that could have grown over time and I want to build Item_in_subselect::exists2in_processor() with only what is necessary. I will try to factor out common code. Also please take a look at the added test file in the patch - I tried to come up with tests for every line, including the skip condition involving upper_not mentioned below. > > Q2: Where is the coe that removes the correlated equalities from the > subquery's WHERE? (I assume Item_exists_subselect code does it?) Like the processor for `exists`, I replaces the correlated equalities in `where` with int item 1 rather than removing them. Is it better to remove them? > >> diff --git a/sql/item_subselect.cc b/sql/item_subselect.cc >> index 9e6c205ca76..8d893794da3 100644 >> --- a/sql/item_subselect.cc >> +++ b/sql/item_subselect.cc >> @@ -3099,6 +3099,132 @@ static bool find_inner_outer_equalities(Item **conds, >>return TRUE; >> } >> >> +/* Checks whether item tree intersects with the free list */ >> +static bool intersects_free_list(Item *item, THD *thd) >> +{ >> + for (const Item *to_find= thd->free_list; to_find; to_find= to_find->next) >> +if (item->walk(::find_item_processor, 1, (void *) to_find)) >> + return true; >> + return false; >> +} >> + > > Please try moving this function into a separate file. First name that came to > my mind: opt_subq_decorrelate.cc Feel free to suggest better names. The idea > is that we should aim for better structure. > (This request is only valid if the factoring out suggested above doesn't > work out). Sanja commented on a similar commit to MDEV-31269 that this way of finding intersection is too costly. What do you think? Before moving this function, either we need to agree this is an acceptable function or figure out an acceptable way of finding free items in the item tree from the equalities. Failing both, then I suggest (as I did in a comment[1] on MDEV-22534), we mark this function as out of scope for the review of MDEV-22534 and as part of MDEV-31269 which may be fixed by MDEV-30073 (see a comment[2] in MDEV-31269). [1] https://jira.mariadb.org/browse/MDEV-22534?focusedCommentId=259406=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-259406 [2] https://jira.mariadb.org/browse/MDEV-31269?focusedCommentId=259516=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-259516 > >> +/* De-correlate where conditions in an IN subquery */ > > This needs a bigger comment describing what the rewrite does. > >> +bool Item_in_subselect::exists2in_processor(void *opt_arg) >> +{ >> + THD *thd= (THD *)opt_arg; >> + SELECT_LEX *first_select= unit->first_select(); >> + JOIN *join= first_select->join; >> + bool will_be_correlated; >> + Dynamic_array eqs(PSI_INSTRUMENT_MEM, 5, 5); >> + List outer; >> + Query_arena *arena= NULL, backup; >> + int res= FALSE; >> + DBUG_ENTER("Item_in_subselect::exists2in_processor"); >> + > > Please move comments out of the if statement and number the conditions. > > For examples, see best_access_path(), large if-statement starting with > "Don't test table scan if it can't be better." > or > "EXISTS-to-IN coversion and ORDER BY ... LIMIT clause" > in Item_exists_subselect::exists2in_processor(). > > >> + if (!optimizer_flag(thd, OPTIMIZER_SWITCH_EXISTS_TO_IN) || >> + /* proceed only if I'm a toplevel IN or a toplevel NOT IN */ >> + !(is_top_level_item() || >> +(upper_not && upper_not->is_top_level_item())) || > > what is the reason behind this condition? Without these conditions, the following testcase in `subslect_decorrelate_in.test` will fail (produce wrong results IIRC): --8<---cut here---start->8--- --echo # skip transformation when neither toplevel IN nor toplevel NOT IN; testcase adapated from main.subslect4 CREATE TABLE t1 ( a INT, b INT ); INSERT INTO t1 VALUES ( 1, NULL ), ( 2, NULL ); CREATE TABLE t2 ( c INT, d INT ); INSERT INTO t2 VALUES ( NULL, 3 ), ( NULL, 4 ); # not top level, upper_not is not top level SELECT * FROM t1 WHERE a NOT
[Maria-developers] Review input for: MDEV-22534 Decorrelate IN subquery
Hi Yuchen, Please find below review input for > commit ba9cf8efeb0b1e1c132d565adb79c98156783f15 > Author: Yuchen Pei > Date: Fri May 19 20:06:31 2023 +1000 > > MDEV-22534 Decorrelate IN subquery > First, general questions: Q1: It seems there is some overlap between Item_in_subselect::exists2in_processor() and Item_exists_subselect::exists2in_processor() Did you consider factoring out common code rather than copying it? (if yes, I'd like to know why it couldn't be done). Q2: Where is the coe that removes the correlated equalities from the subquery's WHERE? (I assume Item_exists_subselect code does it?) > diff --git a/sql/item_subselect.cc b/sql/item_subselect.cc > index 9e6c205ca76..8d893794da3 100644 > --- a/sql/item_subselect.cc > +++ b/sql/item_subselect.cc > @@ -3099,6 +3099,132 @@ static bool find_inner_outer_equalities(Item **conds, >return TRUE; > } > > +/* Checks whether item tree intersects with the free list */ > +static bool intersects_free_list(Item *item, THD *thd) > +{ > + for (const Item *to_find= thd->free_list; to_find; to_find= to_find->next) > +if (item->walk(::find_item_processor, 1, (void *) to_find)) > + return true; > + return false; > +} > + Please try moving this function into a separate file. First name that came to my mind: opt_subq_decorrelate.cc Feel free to suggest better names. The idea is that we should aim for better structure. (This request is only valid if the factoring out suggested above doesn't work out). > +/* De-correlate where conditions in an IN subquery */ This needs a bigger comment describing what the rewrite does. > +bool Item_in_subselect::exists2in_processor(void *opt_arg) > +{ > + THD *thd= (THD *)opt_arg; > + SELECT_LEX *first_select= unit->first_select(); > + JOIN *join= first_select->join; > + bool will_be_correlated; > + Dynamic_array eqs(PSI_INSTRUMENT_MEM, 5, 5); > + List outer; > + Query_arena *arena= NULL, backup; > + int res= FALSE; > + DBUG_ENTER("Item_in_subselect::exists2in_processor"); > + Please move comments out of the if statement and number the conditions. For examples, see best_access_path(), large if-statement starting with "Don't test table scan if it can't be better." or "EXISTS-to-IN coversion and ORDER BY ... LIMIT clause" in Item_exists_subselect::exists2in_processor(). > + if (!optimizer_flag(thd, OPTIMIZER_SWITCH_EXISTS_TO_IN) || > + /* proceed only if I'm a toplevel IN or a toplevel NOT IN */ > + !(is_top_level_item() || > +(upper_not && upper_not->is_top_level_item())) || what is the reason behind this condition? If we're able to handle top-level NOT IN, why can't we handle subquery in any context? Can we really handle the NOT IN? (I suspect not, it will suffer from the semantics of IN subqueries with NULLs.. I can't find the name for this, it's described e.g. here: https://petrunia.net/2006/11/16/inany-subqueries-null-woes/ ) > + first_select->is_part_of_union() || /* skip if part of a union */ > + first_select->group_list.elements || /* skip if with group by */ > + first_select->with_sum_func || /* skip if aggregation */ > + join->having || /* skip if with having */ > + !join->conds || /* skip if no conds */ > + /* skip if left expr is a single nonscalar subselect */ > + (left_expr->type() == Item::SUBSELECT_ITEM && > + !left_expr->type_handler()->is_scalar_type())) The above seems to be some exotic SQL. Is it something like (select col1,lol2 from t1) IN (select col3,col4 from t2 where ..) and does MariaDB really support this? Please add a comment about this. > +DBUG_RETURN(FALSE); > + /* iterate over conditions, and check whether they can be moved out. */ > + if (find_inner_outer_equalities(>conds, eqs)) > +DBUG_RETURN(FALSE); > + /* If we are in the execution of a prepared statement, check for > + intersection with the free list to avoid segfault. Note that the > + check for prepared statement execution is necessary, otherwise it > + will likely always find intersection thus abort the > + transformation. > + > + fixme: this only works when HAVE_PSI_STATEMENT_INTERFACE is > + defined. It causes CI's like amd64-debian-10-debug-embedded to > + fail. Are there other ways to find out we are in the execution of a > + prepared statement? */ I'm looking at the code of activate_stmt_arena_if_needed(). It seems "stmt_arena->is_conventional()" is the check you're looking for. > + if (thd->m_statement_state.m_parent_prepared_stmt) > + { > +for (uint i= 0; i < (uint) eqs.elements(); i++) > +{ > + if (intersects_free_list(*eqs.at(i).eq_ref, thd)) > +DBUG_RETURN(FALSE); This check doesn't look good. As far as I understand, it is a temporary measure until related re-execution bug(s) are fixed by Igor (and/or Sanja)? > +} > + } > + /* Determines whether the result will be correlated */ > + { > +List unused;