Re: [Maria-developers] c80f867b0df: MDEV-23844 Atomic DROP TABLE

2021-05-16 Thread Michael Widenius
Hi!

On Wed, May 12, 2021 at 3:18 PM Sergei Golubchik  wrote:
>
> Hi, Michael!
>
> On May 12, Michael Widenius wrote:
> > revision-id: c80f867b0df (mariadb-10.6.0-78-gc80f867b0df)
> > parent(s): eebebe8ef75
> > author: Michael Widenius 
> > committer: Michael Widenius 
> > timestamp: 2021-05-04 21:09:11 +0300
> > message:
> >
> > MDEV-23844 Atomic DROP TABLE
> ...
> > diff --git a/sql/sql_view.cc b/sql/sql_view.cc
> > index e6d726b30d7..ff8aa55adef 100644
> > --- a/sql/sql_view.cc
> > +++ b/sql/sql_view.cc
> > @@ -1861,8 +1868,18 @@ bool mysql_drop_view(THD *thd, TABLE_LIST *views, 
> > enum_drop_mode drop_mode)
> >  not_exists_count++;
> >continue;
> >  }
> > +if (!first_table++)
>
> yuck. so your "first_table" actually means "not_first_view",
> because when it's FALSE, it's the first view.
>
> why not to rename it to "not_first_view" ?

I have renamed it to view_count.
I prefer to use a counter as it is easer to set and test at the same time.
Also, when debugging it also helps to see where one is.

> > +{
> > +  if (ddl_log_drop_view_init(thd, _log_state))
> > +DBUG_RETURN(TRUE);
>
> what about a crash here? the first entry is written, the second is not.

The init creates the bases for a set of views to drop.
We do init once and then we only need to do ddl_log_drop_view for the rest.

Most ddl_log operations has only one call.  Only those that can have
multiple operations,
like drop or rename has an init call.

> > +}
> > +if (ddl_log_drop_view(thd, _log_state, , >db,
> > +  >table_name))
> > +  DBUG_RETURN(TRUE);



> > diff --git a/sql/log.cc b/sql/log.cc
> > index 6e0a9706ec8..dba999054e4 100644
> > --- a/sql/log.cc
> > +++ b/sql/log.cc
> > @@ -10471,16 +10483,16 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const 
> > char *last_log_name,
> >bool last_gtid_standalone= false;
> >bool last_gtid_valid= false;
> >  #endif
> > +  DBUG_ENTER("TC_LOG_BINLOG::recover");
> >
> >if (! fdle->is_valid() ||
> > -  (do_xa && my_hash_init(key_memory_binlog_recover_exec, ,
> > - _charset_bin, TC_LOG_PAGE_SIZE/3, 0,
> > - sizeof(my_xid), 0, 0, MYF(0
> > +  (my_hash_init(key_memory_binlog_recover_exec, ,
> > +_charset_bin, TC_LOG_PAGE_SIZE/3, 0,
> > +sizeof(my_xid), 0, 0, MYF(0
> >  goto err1;
> >
> > -  if (do_xa)
> > -init_alloc_root(key_memory_binlog_recover_exec, _root,
> > -TC_LOG_PAGE_SIZE, TC_LOG_PAGE_SIZE, MYF(0));
> > +  init_alloc_root(key_memory_binlog_recover_exec, _root,
> > +  TC_LOG_PAGE_SIZE, TC_LOG_PAGE_SIZE, MYF(0));
>
> I don't think these two changes are correct. do_xa is TRUE on crash recovery
> and FALSE on gtid state recovery on 5.5->10.0 upgade or if gtid state
> file was deleted.

The problem is that do_xa is not always true on crash recovery.

For example if the following is not true in do_binlog_recovery:

if (ev->flags & LOG_EVENT_BINLOG_IN_USE_F)

It will execute binglog recovery later with do_xa_recovery == false trough
the following code inTC_LOG_BINLOG::open(const char *opt_name)

if (!binlog_state_recover_done)
{
  binlog_state_recover_done= true;
  if (do_binlog_recovery(opt_bin_logname, false))
DBUG_RETURN(1);
}

Here is a trace:
#0  MYSQL_BIN_LOG::recover (this=0x2783d80 ,
linfo=0x7fff8590, last_log_name=0x7fff8100
"./master-bin.01", first_log=0x7fff8410, fdle=0x35685a8,
do_xa=false) at /my/maria-10.6/sql/log.cc:10474
#1  0x00dc4c10 in MYSQL_BIN_LOG::do_binlog_recovery
(this=0x2783d80 , opt_name=0x319d098 "master-bin",
do_xa_recovery=false) at /my/maria-10.6/sql/log.cc:10779
#2  0x00db0c90 in MYSQL_BIN_LOG::open (this=0x2783d80
, log_name=0x319d098 "master-bin", new_name=0x0,
next_log_number=0, io_cache_type_arg=WRITE_CACHE,
max_size_arg=1073741824, null_created_arg=false, need_mutex=true) at
/my/maria-10.6/sql/log.cc:3617
#3  0x007c0d24 in init_server_components () at

We also have to be able to collect xa events if someone has deleted the
gtid state file for ddl_recovery to work.

> There is no point to initialize the xid hash or mem_root,
> as it's not a crash recovery and there can be no incomplete transactions
> or ddl statements.

> Yes, I've seen the commit comment:
>
> > - TC_LOG_BINLOG::recover() was changed to always collect Xid's from the
> >   binary log and always call ddl_log_close_binlogged_events(). This was
> >   needed to be able to collect DROP TABLE events with embedded Xid's
> >   (used by ddl log).
>
> but I cannot understand how it applies here.

The issue is that because of ddl log, ::recover must ALWAYS collect
binlog 'xid's to be able to
do ddl log recovery at all.

You can verify this by reversing the patch and do:
mtr atomic.create_view

This will fail with a different result that shows that ddl recovery
was reverting things wrongly
because it 

Re: [Maria-developers] c80f867b0df: MDEV-23844 Atomic DROP TABLE

2021-05-12 Thread Sergei Golubchik
Hi, Michael!

On May 12, Michael Widenius wrote:
> revision-id: c80f867b0df (mariadb-10.6.0-78-gc80f867b0df)
> parent(s): eebebe8ef75
> author: Michael Widenius 
> committer: Michael Widenius 
> timestamp: 2021-05-04 21:09:11 +0300
> message:
> 
> MDEV-23844 Atomic DROP TABLE
...
> diff --git a/sql/sql_view.cc b/sql/sql_view.cc
> index e6d726b30d7..ff8aa55adef 100644
> --- a/sql/sql_view.cc
> +++ b/sql/sql_view.cc
> @@ -1861,8 +1868,18 @@ bool mysql_drop_view(THD *thd, TABLE_LIST *views, 
> enum_drop_mode drop_mode)
>  not_exists_count++;
>continue;
>  }
> +if (!first_table++)

yuck. so your "first_table" actually means "not_first_view",
because when it's FALSE, it's the first view.

why not to rename it to "not_first_view" ?

> +{
> +  if (ddl_log_drop_view_init(thd, _log_state))
> +DBUG_RETURN(TRUE);

what about a crash here? the first entry is written, the second is not.

> +}
> +if (ddl_log_drop_view(thd, _log_state, , >db,
> +  >table_name))
> +  DBUG_RETURN(TRUE);
> +debug_crash_here("ddl_log_drop_before_delete_view");
>  if (unlikely(mysql_file_delete(key_file_frm, path, MYF(MY_WME
>delete_error= TRUE;
> +debug_crash_here("ddl_log_drop_after_delete_view");
>  
>  some_views_deleted= TRUE;
>  
> diff --git a/sql/log.cc b/sql/log.cc
> index 6e0a9706ec8..dba999054e4 100644
> --- a/sql/log.cc
> +++ b/sql/log.cc
> @@ -10471,16 +10483,16 @@ int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const 
> char *last_log_name,
>bool last_gtid_standalone= false;
>bool last_gtid_valid= false;
>  #endif
> +  DBUG_ENTER("TC_LOG_BINLOG::recover");
>  
>if (! fdle->is_valid() ||
> -  (do_xa && my_hash_init(key_memory_binlog_recover_exec, ,
> - _charset_bin, TC_LOG_PAGE_SIZE/3, 0,
> - sizeof(my_xid), 0, 0, MYF(0
> +  (my_hash_init(key_memory_binlog_recover_exec, ,
> +_charset_bin, TC_LOG_PAGE_SIZE/3, 0,
> +sizeof(my_xid), 0, 0, MYF(0
>  goto err1;
>  
> -  if (do_xa)
> -init_alloc_root(key_memory_binlog_recover_exec, _root,
> -TC_LOG_PAGE_SIZE, TC_LOG_PAGE_SIZE, MYF(0));
> +  init_alloc_root(key_memory_binlog_recover_exec, _root,
> +  TC_LOG_PAGE_SIZE, TC_LOG_PAGE_SIZE, MYF(0));

I don't think these two changes are correct. do_xa is TRUE on crash recovery
and FALSE on gtid state recovery on 5.5->10.0 upgade or if gtid state
file was deleted. There is no point to initialize the xid hash or mem_root,
as it's not a crash recovery and there can be no incomplete transactions
or ddl statements.

Yes, I've seen the commit comment:

> - TC_LOG_BINLOG::recover() was changed to always collect Xid's from the
>   binary log and always call ddl_log_close_binlogged_events(). This was
>   needed to be able to collect DROP TABLE events with embedded Xid's
>   (used by ddl log).

but I cannot understand how it applies here.

>fdle->flags&= ~LOG_EVENT_BINLOG_IN_USE_F; // abort on the first error
>  
> diff --git a/sql/sql_table.cc b/sql/sql_table.cc
> index 7a88ffdfd1b..0a83396ae14 100644
> --- a/sql/sql_table.cc
> +++ b/sql/sql_table.cc
> @@ -1363,6 +1373,17 @@ int mysql_rm_table_no_locks(THD *thd, TABLE_LIST 
> *tables, bool if_exists,
>  
>  thd->replication_flags= 0;
>  was_view= table_type == TABLE_TYPE_VIEW;
> +
> +if (!first_table++)

first_table -> not_first_table

> +{
> +  LEX_CSTRING comment= {comment_start, (size_t) comment_len};
> +  if (ddl_log_drop_table_init(thd, _log_state, ))
> +  {
> +error= 1;
> +goto err;
> +  }
> +}
> +
>  if ((table_type == TABLE_TYPE_UNKNOWN) || (was_view && !drop_view) ||
>  (table_type != TABLE_TYPE_SEQUENCE && drop_sequence))
>  {
> @@ -1565,6 +1611,7 @@ int mysql_rm_table_no_locks(THD *thd, TABLE_LIST 
> *tables, bool if_exists,
>  table_name.str, (uint)table_name.length);
>mysql_audit_drop_table(thd, table);
>  }
> +ddl_log_update_phase(_log_state, DDL_DROP_PHASE_COLLECT);

better call it DDL_DROP_PHASE_BINLOG. It would've saved me ~10 minutes
reading the code back and forth, as ""DDL_DROP_PHASE_BINLOG" immediately
explains what this phase is for.

>  
>  if (!dont_log_query &&
>  (!error || table_dropped || non_existing_table_error(error)))
> diff --git a/sql/ddl_log.h b/sql/ddl_log.h
> index 316e6708f22..b843404fc3c 100644
> --- a/sql/ddl_log.h
> +++ b/sql/ddl_log.h
> @@ -97,6 +101,14 @@ enum enum_ddl_log_rename_table_phase {
>DDL_RENAME_PHASE_TABLE,
>  };
>  
> +enum enum_ddl_log_drop_table_phase {
> +  DDL_DROP_PHASE_TABLE=0,
> +  DDL_DROP_PHASE_TRIGGER,

no phase for deleting data from stat tables?

> +  DDL_DROP_PHASE_COLLECT,
> +  DDL_DROP_PHASE_RESET,  /* Reset found list of dropped tables */

this one is unused
(also unused in further patches, I've checked)

> +  DDL_DROP_PHASE_END
> +};