Re: [Maria-developers] Fwd: [Commits] Rev 2901: BNLH algorithm always used a full table scan over the joined table in file:///home/igor/maria/maria-5.3-mwl128-hashrange/
Hello Igor, Here's the promised proof of deadcode: On Fri, Feb 11, 2011 at 08:49:37PM -0800, Igor Babaev wrote: === modified file 'sql/sql_select.cc' --- a/sql/sql_select.cc 2011-02-06 04:57:03 + +++ b/sql/sql_select.cc 2011-02-12 04:41:22 + @@ -8419,6 +8438,11 @@ sort_by_tab-type= JT_ALL; sort_by_tab-read_first_record= join_init_read_record; } +else if (sort_by_tab-type == JT_HASH_NEXT) +{ + sort_by_tab-type= JT_HASH; + sort_by_tab-read_first_record= join_init_read_record; +} } break; } Statement: the above else-if branch can never be taken, i.e. equality sort_by_tab-type == JT_HASH_NEXT (1) can never be true. Proof: First, let's see what JOIN_TABs can be in sort_by_tab. sort_by_tab is assigned in this statement: JOIN_TAB *sort_by_tab= join-group join-simple_group join-group_list ? join-join_tab+join-const_tables : join-get_sort_by_join_tab(); i.e. possible values are 1. join-join_tab+join-const_tables 2. join-get_sort_by_join_tab() get_sort_by_join_tab is defined in sql_select.h as: JOIN_TAB *get_sort_by_join_tab() { return (need_tmp || !sort_by_table || skip_sort_order || ((group || tmp_table_param.sum_func_count) !group_list)) ? NULL : join_tab+const_tables; } i.e. it can return: 1. NULL (which is not interesting) 2. join-join_tab + const_tables. this draws us to conclusion that sort_by_tab can be - NULL - join-join_tab + const_tables. so equality (1) can be rewritten as join-join_tab[const_tables].type == JT_HASH_NEXT (2) Now, let's explore the question, what join_tab elements can have type==JT_HASH_NEXT. The value JT_HASH_NEXT is assigned in only one place: sql_select.cc|8399| tab-type= tab-type == JT_ALL ? JT_NEXT : JT_HASH_NEXT; this line is inside the case block of code: break; case JT_ALL: case JT_HASH: tab-type= tab-type == JT_ALL ? JT_NEXT : JT_HASH_NEXT; which means that tab-type can become JT_HASH_NEXT only if it was JT_HASH before. Let's see when a join_tab can get type==JT_HASH. There's only one assignment that can assign type=JT_HASH. It is at sql_select.cc:8279 : if (tab-cache tab-cache-get_join_alg() == JOIN_CACHE::BNLH_JOIN_ALG) tab-type= JT_HASH; i.e. tab-type can become JT_HASH only if tab-cache!=NULL. tab-cache can be set to non-NULL value only inside check_join_cache_usage(). However, the first lines in check_join_cache_usage() have this code: uint i= tab - join-join_tab; join-return_tab= 0; if (cache_level == 0 || i == join-const_tables || !prev_tab) return 0; from which we can conclude that join-join_tab[join-const_tables].cache == NULL will always hold. Going back on our chain of reasoning, we get that join-join_tab[join-const_tables].cache == NULL at all times= join-join_tab[join-const_tables].type != JT_HASH at all times = join-join_tab[join-const_tables].type != JT_HASH_NEXT at all times = equality (2) is false always = original equality (1) is false always, q.e.d. BR Sergey -- Sergey Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Re: [Maria-developers] Fwd: [Commits] Rev 2901: BNLH algorithm always used a full table scan over the joined table in file:///home/igor/maria/maria-5.3-mwl128-hashrange/
On Mon, Feb 21, 2011 at 03:20:13PM -0800, Igor Babaev wrote: === modified file 'sql/sql_select.cc' --- a/sql/sql_select.cc2011-02-06 04:57:03 + +++ b/sql/sql_select.cc2011-02-12 04:41:22 + ... @@ -8419,6 +8438,11 @@ sort_by_tab-type= JT_ALL; sort_by_tab-read_first_record= join_init_read_record; } +else if (sort_by_tab-type == JT_HASH_NEXT) +{ + sort_by_tab-type= JT_HASH; + sort_by_tab-read_first_record= join_init_read_record; +} } break; } I have a question only to this part of the patch: I don't understand how the above code could work. The if statement above the one you've added checks if sort_by_tab was using [ordered] index scan to produce records in orderer, and if yes, switches it to a full scan (because use of join buffering will break ordering anyway). I suppose the part you've added does something similar, but I totally fail to understand what exactly it does. The above code says: if hash join is used do not use full index scan to look through the joined table, rather use full table scan in this case. I've put a DBUG_ASSERT(0) right after the sort_by_tab-type= JT_HASH; line and ran the testsuite. The assertion didn't fire. I did this because I suspect (although can't provide a sound proof right away) that the part inside the else if () { ...} is deadcode. Do you think you could easily prove me wrong by giving a testcase? If constructing testcase is complicated, let's meet online and discuss my suspicions about the deadcode. Now, some comments on the new EXPLAIN output: ISSUE1. Consider a query: MariaDB [j1] explain select * from t3, t4 where t3.col1=t4.col2; ++-+---+--+---+---+-++--+--+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra| ++-+---+--+---+---+-++--+--+ | 1 | SIMPLE | t3| ALL | NULL | NULL | NULL | NULL | 1000 | Using where | | 1 | SIMPLE | t4| hash_ALL | NULL | #hash#$hj | 5 | j1.t3.col1 | 1000 | Using where; Using join buffer (flat, BNLH join) | ++-+---+--+---+---+-++--+--+ 2 rows in set (0.00 sec) AFAIU #hash#$hj means the hash key on the hashtable that's used for hash join. I think the name is too convoluted. The hashtable has only one key, if we need to tell it from t4's indexes it would be sufficient to use #hashkey#, without the cryptic $hj. As for # characters: EXPLAIN already shows internal tables/indexes: If you look into #maria-call log you'll see that I followed strictly Monty's instructions: to use the prefix #hash# before the index name I used to use, i.e. used_index should be substituted for #hash#used_index. The only deviation I afforded myself was: I used $hj instead of hj_key. Ok. I still think that my objections are valid and current EXPLAIN output is not the best but we could discuss/address this outside the scope of this bugfix. skip What is the index that's used for scanning the table t1? #hash#cu1 shows that hash join optimizer used access on index cu1 in its analysis, but generally it doesn't mean that full index scan on table t1 was done on index cu1, doesn't it? I would very much prefer to see #hash#cu1:USED_INDEX there (or taking into account previous suggestions, hashkey:USED_INDEX). It will also be consistent with how hash_range and hash_index_merge use that column. Yes, the scan index is missing here. I will add. OK. ISSUE4 If key and key_len use hash_part:real_part notation, i.e. use semicolon, why use underscore in hash_ALL, hash_index, etc? Won't hash:ALL, hash:index look more consistent with other columns? again: I follow here Monty's recommendations. Nothing more, nor less. BR Sergey -- Sergey Petrunia, Software Developer Monty Program AB, http://askmonty.org Blog: http://s.petrunia.net/blog ___ Mailing list: https://launchpad.net/~maria-developers Post to : maria-developers@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-developers More help : https://help.launchpad.net/ListHelp
Re: [Maria-developers] Fwd: [Commits] Rev 2901: BNLH algorithm always used a full table scan over the joined table in file:///home/igor/maria/maria-5.3-mwl128-hashrange/
On Fri, Feb 11, 2011 at 08:49:37PM -0800, Igor Babaev wrote: Hi Sergey, Please review this patch that fixes a defect of the current implementation of the mwl #128 that never couples hash join with range/index_merge scans. The patch also adds prefix #hash# to the names of the used hash indexes. (we agreed upon this at the last optimizer meeting). Regards, Igor. Original Message Subject: [Commits] Rev 2901: BNLH algorithm always used a full table scan over the joined table in file:///home/igor/maria/maria-5.3-mwl128-hashrange/ Date: Fri, 11 Feb 2011 20:41:23 -0800 (PST) From: Igor Babaev i...@askmonty.org Reply-To: maria-developers@lists.launchpad.net To: comm...@mariadb.org At file:///home/igor/maria/maria-5.3-mwl128-hashrange/ revno: 2901 revision-id: i...@askmonty.org-20110212044122-w9n3jdk3d2ps0w3o parent: i...@askmonty.org-20110207231903-3rqbfs50d33lk3r9 committer: Igor Babaev i...@askmonty.org branch nick: maria-5.3-mwl128-hashrange timestamp: Fri 2011-02-11 20:41:22 -0800 message: BNLH algorithm always used a full table scan over the joined table even in the cases when there existed range/index-merge scans that were cheaper than the full table scan. This was a defect/bug of the implementation of mwl #128. Now hash join can work not only with full table scan of the joined table, but also with full index scan, range and index-merge scans. Accordingly, in the cases when hash join is used the column 'type' in the EXPLAINs can contain now 'hash_ALL', 'hash_index', 'hash_range' and 'hash_index_merge'. If hash join is coupled with a range/index_merge scan then the columns 'key' and 'key_len' contain info not only on the used hash index, but also on the indexes used for the scan. ... === modified file 'sql/sql_select.cc' --- a/sql/sql_select.cc 2011-02-06 04:57:03 + +++ b/sql/sql_select.cc 2011-02-12 04:41:22 + ... @@ -8419,6 +8438,11 @@ sort_by_tab-type= JT_ALL; sort_by_tab-read_first_record= join_init_read_record; } +else if (sort_by_tab-type == JT_HASH_NEXT) +{ + sort_by_tab-type= JT_HASH; + sort_by_tab-read_first_record= join_init_read_record; +} } break; } I have a question only to this part of the patch: I don't understand how the above code could work. The if statement above the one you've added checks if sort_by_tab was using [ordered] index scan to produce records in orderer, and if yes, switches it to a full scan (because use of join buffering will break ordering anyway). I suppose the part you've added does something similar, but I totally fail to understand what exactly it does. Now, some comments on the new EXPLAIN output: ISSUE1. Consider a query: MariaDB [j1] explain select * from t3, t4 where t3.col1=t4.col2; ++-+---+--+---+---+-++--+--+ | id | select_type | table | type | possible_keys | key | key_len | ref| rows | Extra| ++-+---+--+---+---+-++--+--+ | 1 | SIMPLE | t3| ALL | NULL | NULL | NULL| NULL | 1000 | Using where | | 1 | SIMPLE | t4| hash_ALL | NULL | #hash#$hj | 5 | j1.t3.col1 | 1000 | Using where; Using join buffer (flat, BNLH join) | ++-+---+--+---+---+-++--+--+ 2 rows in set (0.00 sec) AFAIU #hash#$hj means the hash key on the hashtable that's used for hash join. I think the name is too convoluted. The hashtable has only one key, if we need to tell it from t4's indexes it would be sufficient to use #hashkey#, without the cryptic $hj. As for # characters: EXPLAIN already shows internal tables/indexes: MariaDB [j1] explain select * from (select * from t1) X; ++-++--+---+--+-+--+--+---+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | ++-++--+---+--+-+--+--+---+ | 1 | PRIMARY | derived2 | ALL | NULL | NULL | NULL| NULL | 1000 | | | 2 | DERIVED | t1 | ALL | NULL | NULL | NULL| NULL | 1000 | | ++-++--+---+--+-+--+--+---+ # The following is from MWL#90 tree: MariaDB [test] explain select * from t10 where a in (select max(b) from t10 group by a);