Re: [Maria-developers] Regarding approach used for Early NULL filtering in the range optimizer(MDEV-15777)

2019-05-27 Thread Sergey Petrunia
On Fri, May 24, 2019 at 04:39:51PM -0700, Igor Babaev wrote:
> On 05/24/2019 01:41 PM, Sergey Petrunia wrote:
> > On Fri, May 24, 2019 at 10:08:04AM -0700, Igor Babaev wrote:
> > > On 05/24/2019 04:03 AM, Varun Gupta wrote:
> > > > Hi Igor,
> > > > After discussing with Sergey, we came up with these conclusions as to
> > > > why we used the approach of going through all the keyuses in the KEYUSE
> > > > array
> > > > 
> > > > Cases to consider:
> > > > we have an index on column a
> > > > 
> > > > 1) a OP const
> > > > where OP can be ( > > > This case is already handled by the range optimizer. We do not create a
> > > > NULL rejecting predicate for such conditions.
> > > > 
> > > > 2) eq_join conditions (tbl1.a = tbl2.col2)
> > > > This is the specific case we have tried to implement in MDEV-15777, we
> > > > create NULL rejecting predicates for the keyuses for each table and then
> > > > feed these predicates to range optimizer
> > > There is no logical explanation for checking null rejection of fields used
> > > in equalities with the help of array keyuse.
> > The logic here is as follows:
> > 
> > We already collect the set of KEYUSE::null_rejecting attributes. Its
> > meaning is very close to what the patch needs.
> The meaning of this argument is to be used a key value for ref access by an
> index i1.
> You need range access by a different index i2. There is no logical
> connection
> between these two indexes. The usage of i1 can be blocked by IGNORE clause
> and this case
> there won't be any keyuses for it. But the equality from which the
> null-rejection can be
> deduced is still there. You need range scan to reduce the number of records
> produced
> by the left operand of a join operation. And you should not care weather it
> should be the
> argument of any ref access.
> Here is the situation:
> The field of interest for range access is t1.a and you have two conditions
>t1.a=t2.a with index i1 on t2.a
>t1.b=t3.b with index i2 on t3.b
> The use prohibited to use i1. t1.a apparently remains null-rejecting. Yet it
> cannot be not detected
> as such by a look through array of keyuses.

I don't follow the logic. What MDEV-15777 does is to basically look at KEYUSE
object, which represent

  tbl.key_col = other_column

and feed this into the range optimizer:
  
  tbl.key_col IS NOT NULL

If the query CAN use a certain index:
- we will have a KEYUSE object for it
- we will feed the IS NOT NULL into the range optimizer.

if the query CANNOT use a certain index (because of IGNORE INDEX), then
- we won't have KEYUSE object for that index (and so won't construct an
  IS NOT NULL condition for it)
- range optimizer will not build ranges for it (so it doesn't really matter 
  if IS NOT NULL was constructed or not)

Can you provide a full example?

> > 
> > Because of this, the part of the patch for MDEV-15777 which analyzes the 
> > KEYUSE
> > array is very small: 122 lines including the comment (I counted
> > make_null_rejecting_conds() and add_cond()).
> > 
> > I think, the CPU overhead is small, too.
> > 
> > > > 3) a < func(b,c)
> > > > we do not handle this case, because:
> > > > 
> > > > a) It is harder to process. We would need to process the whole WHERE
> > > > clause to infer
> > > > non-nullability of the condition. Example:
> > > > 
> > > >(t1.f2+1 < t1.f1+t1.f3) OR ... OR t1.f1 is null
> > > > 
> > > > here the left part of the OR condition doesn't allow f1 to be NULL, but
> > > > the
> > > > right condition allows it. We would also need to take into account 
> > > > tables
> > > > inside outer joins. We would need to go through Item classes and add 
> > > > code
> > > > which say "Item_xxx() will compute to false when its argument Y is null"
> > > > (there is item->not_null_tables() currently, but not 
> > > > not_null_columns()).
> > > Have you actually looked at the implementations of the virtual function
> > > not_null_tables()?
> > > Do you need me to provide the implementation of the virtual function
> > > not_null_columns()?
> > It will require multiple implementations. At the moment we have:
> > 
> > nm mysqld --demangle | grep 'Item.*::not_null_tables'  | wc -l
> > 21
> > 
> > Another question: do you intend to collect not_null_columns() for all 
> > columns,
> > or just columns that are a part of some index?
> > 
> > What would a good data structure to store a set of tablename.column_name ?
> It will be just a bitmap created for each table that can be used for many
> other purposes.
> If you want to filter out those that are not part of any index it can be
> easily done
> (even once for any table share)

My point was that constructing such a bitmap is a lot heavier than the
not_null_table_map().

table.h declares a field bitmap class:

  typedef Bitmap Field_map;

but it's huge:

(gdb) p sizeof(Field_map)
  $52 = 544

and note that we will not be able to just put one Field_map into the TABLE
object:

If we think about how to compute not_null_columns() for Item_cond_and or

Re: [Maria-developers] Regarding approach used for Early NULL filtering in the range optimizer(MDEV-15777)

2019-05-24 Thread Sergey Petrunia
On Fri, May 24, 2019 at 10:08:04AM -0700, Igor Babaev wrote:
> On 05/24/2019 04:03 AM, Varun Gupta wrote:
> > Hi Igor,
> > After discussing with Sergey, we came up with these conclusions as to
> > why we used the approach of going through all the keyuses in the KEYUSE
> > array
> > 
> > Cases to consider:
> > we have an index on column a
> > 
> > 1) a OP const
> > where OP can be ( > This case is already handled by the range optimizer. We do not create a
> > NULL rejecting predicate for such conditions.
> > 
> > 2) eq_join conditions (tbl1.a = tbl2.col2)
> > This is the specific case we have tried to implement in MDEV-15777, we
> > create NULL rejecting predicates for the keyuses for each table and then
> > feed these predicates to range optimizer
> There is no logical explanation for checking null rejection of fields used
> in equalities with the help of array keyuse.

The logic here is as follows:

We already collect the set of KEYUSE::null_rejecting attributes. Its 
meaning is very close to what the patch needs.

Because of this, the part of the patch for MDEV-15777 which analyzes the KEYUSE
array is very small: 122 lines including the comment (I counted
make_null_rejecting_conds() and add_cond()).

I think, the CPU overhead is small, too.

> > 
> > 3) a < func(b,c)
> > we do not handle this case, because:
> > 
> > a) It is harder to process. We would need to process the whole WHERE
> > clause to infer
> > non-nullability of the condition. Example:
> > 
> >   (t1.f2+1 < t1.f1+t1.f3) OR ... OR t1.f1 is null
> > 
> > here the left part of the OR condition doesn't allow f1 to be NULL, but
> > the
> > right condition allows it. We would also need to take into account tables
> > inside outer joins. We would need to go through Item classes and add code
> > which say "Item_xxx() will compute to false when its argument Y is null"
> > (there is item->not_null_tables() currently, but not not_null_columns()).
> Have you actually looked at the implementations of the virtual function
> not_null_tables()?
> Do you need me to provide the implementation of the virtual function
> not_null_columns()?

It will require multiple implementations. At the moment we have:

nm mysqld --demangle | grep 'Item.*::not_null_tables'  | wc -l
21

Another question: do you intend to collect not_null_columns() for all columns,
or just columns that are a part of some index?

What would a good data structure to store a set of tablename.column_name ?

> > 
> > b) the conditions in form of arbitrary functions are not as frequently
> > used as
> > ref access conditions.
> > 
> > to sum up a) and b) - doable but will require a lot of effort.
> > 
> > Do you have any specific practically relevant example in mind that we
> > should
> > handle?

BR
 Sergei
-- 
Sergei Petrunia, Software Developer
MariaDB Corporation | Skype: sergefp | Blog: http://s.petrunia.net/blog


___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp