Hi Sergei, thanks for looking into this.

On Wed, Aug 28, 2019 at 7:54 PM Sergei Golubchik <s...@mariadb.org> wrote:

> Hi, Robert!
>
> On Aug 28, Robert Bindar wrote:
> > revision-id: 5ba0b11c5d1 (mariadb-10.4.7-35-g5ba0b11c5d1)
> > parent(s): 4a490d1a993
> > author: Robert Bindar <rob...@mariadb.org>
> > committer: Robert Bindar <rob...@mariadb.org>
> > timestamp: 2019-08-27 11:38:52 +0300
> > message:
> >
> > MDEV-20257 Fix crash in Grant_table_base::init_read_record
> >
> > The bug shows up when a 10.4+ server starts on a <10.4 datadir
> > with a crashed mysql.user table.
> > Grant_tables::open_and_lock tries to open the list of requested
> > tables (host, db, global_priv,..) and because global_priv doesn't
> > exist (pre-10.4 datadir), it falls back to opening mysql.user.
> > The first open_tables call for [host, db, global_priv,..] works
> > just fine. The second open_tables call (trying to open mysql.user
> > alone) sees the crashed user table and performs 3 steps:
> > - closes all the open tables
> > - attempts a table fix for mysql.user (succeeds)
> > - opens the tables initially passed as arguments
> > In an ideal world, the first step should only close the tables
> > passed as arguments (i.e. mysql.user), but for some reasons
> > close_tables_for_reopen makes a close_thread_tables call which
> > closes everything in THD::open_tables (nevertheless, this is
> > probably the intended behavior).
> > This side effect causes all the tables opened in the first
> > open_tables call to now be closed and Grant_table_base::init_read_record
> > tries to read from a released table.
> >
> > diff --git a/mysql-test/main/mdev20257.test
> b/mysql-test/main/mdev20257.test
> > new file mode 100644
> > index 00000000000..8506292d57c
> > --- /dev/null
> > +++ b/mysql-test/main/mdev20257.test
> > @@ -0,0 +1,51 @@
> > +--source include/not_embedded.inc
> > +--echo #
> > +--echo # MDEV 20257 Server crashes in Grant_table_base::init_read_record
> > +--echo # upon crash-upgrade
> > +--echo #
> > +
> > +let $MYSQLD_DATADIR= `select @@datadir`;
> > +
> > +rename table mysql.global_priv to mysql.global_priv_bak;
> > +rename table mysql.user to mysql.user_bak;
> > +rename table mysql.db to mysql.db_bak;
> > +rename table mysql.proxies_priv to mysql.proxies_priv_bak;
> > +rename table mysql.roles_mapping to mysql.roles_mapping_bak;
> > +
> > +--source include/shutdown_mysqld.inc
> > +
> > +# Bring in a crashed user table
> > +# Ideally we should've copied only the crashed mysql.user, but this
> > +# would make mysqld crash in some other place before hitting the
> > +# crash spot described in MDEV-20257 which is what we're trying to
> > +# test against.
>
> I don't understand that. Where does it crash then?
> What difference does it make that you replace all tables - are they all
> corrupted?
>
Only mysql.user is corrupted. Assuming we take out the fix in acl, if we
don't replace
the db table with the 10.2 version provided by Elena, the server crashes a
bit earlier
in lock_tables (called from Grant_tables::open_and_lock) for reasons I
don't yet understand,
my guess is it has something to do with the aria/myisam change between 10.2
and 10.4.
The host table needs to be copied because the reported crash happened when
acl_load tried
to read the host table and in 10.4 it is not there anymore.
And for proxies_priv and roles_mapping, it is enough if we just hide them
(otherwise the same
Aria crash happens).

> > +--copy_file std_data/mdev20257/user.frm $MYSQLD_DATADIR/mysql/user.frm
> > +--copy_file std_data/mdev20257/user.MYD $MYSQLD_DATADIR/mysql/user.MYD
> > +--copy_file std_data/mdev20257/user.MYI $MYSQLD_DATADIR/mysql/user.MYI
> > +
> > +--copy_file std_data/mdev20257/host.frm $MYSQLD_DATADIR/mysql/host.frm
> > +--copy_file std_data/mdev20257/host.MYD $MYSQLD_DATADIR/mysql/host.MYD
> > +--copy_file std_data/mdev20257/host.MYI $MYSQLD_DATADIR/mysql/host.MYI
> > +
> > +--copy_file std_data/mdev20257/db.frm $MYSQLD_DATADIR/mysql/db.frm
> > +--copy_file std_data/mdev20257/db.MYD $MYSQLD_DATADIR/mysql/db.MYD
> > +--copy_file std_data/mdev20257/db.MYI $MYSQLD_DATADIR/mysql/db.MYI
> > +--copy_file std_data/mdev20257/db.opt $MYSQLD_DATADIR/mysql/db.opt
> > +
> > +--source include/start_mysqld.inc
> > +
> > +call mtr.add_suppression("Table \'\..mysql\.user\' is marked as crashed
> and should be repaired");
> > +call mtr.add_suppression("Checking table:   \'\..mysql\.user\'");
> > +call mtr.add_suppression("mysql.user: 1 client is using or hasn't
> closed the table properly");
> > +call mtr.add_suppression("Missing system table mysql..*; please run
> mysql_upgrade to create it");
> > +
> > +# Cleanup
> > +drop table mysql.user;
> > +drop table mysql.db;
> > +drop table mysql.host;
> > +rename table mysql.global_priv_bak to mysql.global_priv;
> > +rename table mysql.user_bak to mysql.user;
> > +rename table mysql.db_bak to mysql.db;
> > +rename table mysql.proxies_priv_bak to mysql.proxies_priv;
> > +rename table mysql.roles_mapping_bak to mysql.roles_mapping;
> > +
> > diff --git a/mysql-test/std_data/mdev20257/db.opt
> b/mysql-test/std_data/mdev20257/db.opt
> > --- /dev/null
> > +++ b/mysql-test/std_data/mdev20257/db.opt
> > @@ -0,0 +1,2 @@
> > +default-character-set=latin1
> > +default-collation=latin1_swedish_ci
>
> why is that?
>
I agree we don't need db.opt at all.

>
> > diff --git a/sql/sql_acl.cc b/sql/sql_acl.cc
> > index 847d2bd777b..e028d084703 100644
> > --- a/sql/sql_acl.cc
> > +++ b/sql/sql_acl.cc
> > -      DBUG_RETURN(res);
> > +    first = build_open_list(tables, which_tables, lock_type, true);
> > +
> > +    uint counter;
> > +    if (int rv= really_open(thd, first, &counter))
> > +      DBUG_RETURN(rv);
> > +
> > +    TABLE *user_table= tables[USER_TABLE].table;
> > +    if ((which_tables & Table_user) && !user_table)
> > +    {
> > +      close_thread_tables(thd);
> > +      first = build_open_list(tables, which_tables, lock_type, false);
>
> Hmm... Here you always close/reopen all tables, just to account for the
> case when the user table is found corrupted.
>
> I'd argue that a corrupted user table happens rarely (compared to the
> non-corrupted), so it makes sense to optimize for the normal use case.
>
> That is, only open the user table as before. After it's opened you
> check if other tables are still opened - if they aren't you can close
> and reopen everything. This way you only close/reopen all tables if the
> user table was corrupted.
>
> Makes sense?
>
> It makes sense. Is there any other way to check if a table is opened
except for going
through thd->open_tables and see if the table is there? Touching
TABLE_LIST::table that
we have in open_and_lock might not be a good idea if the table instance is
already released
to table cache.

> > +      if (int rv= really_open(thd, first, &counter))
> > +        DBUG_RETURN(rv);
> > +
> > +      p_user_table= &m_user_table_tabular;
> > +      user_table= tables[USER_TABLE + 1].table;
> > +    }
>
> Regards,
> Sergei
> VP of MariaDB Server Engineering
> and secur...@mariadb.org
>
_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp

Reply via email to