On Fri, May 24, 2019 at 04:39:51PM -0700, Igor Babaev wrote:
> On 05/24/2019 01:41 PM, Sergey Petrunia wrote:
> > On Fri, May 24, 2019 at 10:08:04AM -0700, Igor Babaev wrote:
> > > On 05/24/2019 04:03 AM, Varun Gupta wrote:
> > > > Hi Igor,
> > > > After discussing with Sergey, we came up with these conclusions as to
> > > > why we used the approach of going through all the keyuses in the KEYUSE
> > > > array
> > > >
> > > > Cases to consider:
> > > > we have an index on column a
> > > >
> > > > 1) a OP const
> > > > where OP can be ( > > > This case is already handled by the range optimizer. We do not create a
> > > > NULL rejecting predicate for such conditions.
> > > >
> > > > 2) eq_join conditions (tbl1.a = tbl2.col2)
> > > > This is the specific case we have tried to implement in MDEV-15777, we
> > > > create NULL rejecting predicates for the keyuses for each table and then
> > > > feed these predicates to range optimizer
> > > There is no logical explanation for checking null rejection of fields used
> > > in equalities with the help of array keyuse.
> > The logic here is as follows:
> >
> > We already collect the set of KEYUSE::null_rejecting attributes. Its
> > meaning is very close to what the patch needs.
> The meaning of this argument is to be used a key value for ref access by an
> index i1.
> You need range access by a different index i2. There is no logical
> connection
> between these two indexes. The usage of i1 can be blocked by IGNORE clause
> and this case
> there won't be any keyuses for it. But the equality from which the
> null-rejection can be
> deduced is still there. You need range scan to reduce the number of records
> produced
> by the left operand of a join operation. And you should not care weather it
> should be the
> argument of any ref access.
> Here is the situation:
> The field of interest for range access is t1.a and you have two conditions
>t1.a=t2.a with index i1 on t2.a
>t1.b=t3.b with index i2 on t3.b
> The use prohibited to use i1. t1.a apparently remains null-rejecting. Yet it
> cannot be not detected
> as such by a look through array of keyuses.
I don't follow the logic. What MDEV-15777 does is to basically look at KEYUSE
object, which represent
tbl.key_col = other_column
and feed this into the range optimizer:
tbl.key_col IS NOT NULL
If the query CAN use a certain index:
- we will have a KEYUSE object for it
- we will feed the IS NOT NULL into the range optimizer.
if the query CANNOT use a certain index (because of IGNORE INDEX), then
- we won't have KEYUSE object for that index (and so won't construct an
IS NOT NULL condition for it)
- range optimizer will not build ranges for it (so it doesn't really matter
if IS NOT NULL was constructed or not)
Can you provide a full example?
> >
> > Because of this, the part of the patch for MDEV-15777 which analyzes the
> > KEYUSE
> > array is very small: 122 lines including the comment (I counted
> > make_null_rejecting_conds() and add_cond()).
> >
> > I think, the CPU overhead is small, too.
> >
> > > > 3) a < func(b,c)
> > > > we do not handle this case, because:
> > > >
> > > > a) It is harder to process. We would need to process the whole WHERE
> > > > clause to infer
> > > > non-nullability of the condition. Example:
> > > >
> > > >(t1.f2+1 < t1.f1+t1.f3) OR ... OR t1.f1 is null
> > > >
> > > > here the left part of the OR condition doesn't allow f1 to be NULL, but
> > > > the
> > > > right condition allows it. We would also need to take into account
> > > > tables
> > > > inside outer joins. We would need to go through Item classes and add
> > > > code
> > > > which say "Item_xxx() will compute to false when its argument Y is null"
> > > > (there is item->not_null_tables() currently, but not
> > > > not_null_columns()).
> > > Have you actually looked at the implementations of the virtual function
> > > not_null_tables()?
> > > Do you need me to provide the implementation of the virtual function
> > > not_null_columns()?
> > It will require multiple implementations. At the moment we have:
> >
> > nm mysqld --demangle | grep 'Item.*::not_null_tables' | wc -l
> > 21
> >
> > Another question: do you intend to collect not_null_columns() for all
> > columns,
> > or just columns that are a part of some index?
> >
> > What would a good data structure to store a set of tablename.column_name ?
> It will be just a bitmap created for each table that can be used for many
> other purposes.
> If you want to filter out those that are not part of any index it can be
> easily done
> (even once for any table share)
My point was that constructing such a bitmap is a lot heavier than the
not_null_table_map().
table.h declares a field bitmap class:
typedef Bitmap Field_map;
but it's huge:
(gdb) p sizeof(Field_map)
$52 = 544
and note that we will not be able to just put one Field_map into the TABLE
object:
If we think about how to compute not_null_columns() for Item_cond_and or
Item