Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-21 Thread Jan Lindström
Hi Sergei,

Your suggestion does not work. There are more than one problem

(1) wsrep_abort_transaction does not release MDL-lock
(2) innobase_kill_one_trx crashes at wsrep->abort_pre_commit() because
transaction registered inside wsrep has disappeared (this does not happen
if THD::LOCK_thd_data is locked)

==> I will use Seppo's KILL as TOI and remove all unrelated changes from
mdl.cc and wsrep_close_connections, they are not related to problems we
need to fix.
Let's take one problem at a time. TOI is a very powerful solution to the
problem we are trying to fix.

R: Jan

On Thu, Oct 21, 2021 at 7:52 AM Jan Lindström 
wrote:

> Hi Sergei,
>
> This does not seem to work. Consider following:
>
> CREATE TABLE t1 (id INT PRIMARY KEY) ENGINE=InnoDB;
> INSERT INTO t1 VALUES (1);
> connection node_2;
> SET AUTOCOMMIT=OFF;
> START TRANSACTION;
> INSERT INTO t1 VALUES (2);
> connection node_2a;
> ALTER TABLE t1 ADD COLUMN f2 INTEGER, LOCK=EXCLUSIVE;
>
> Problem seems to be the fact that the MDL-lock acquired by thread
> executing INSERT that thread executing ALTER wants to be released by
> killing holder is
> not released. We do kill query inside InnoDB.  This MDL-code is not
> familiar to me and I do not yet understand why MDL-lock is not released
>
> R: Jan
>
> On Thu, Oct 14, 2021 at 9:49 PM Sergei Golubchik  wrote:
>
>> Hi, Jan!
>>
>> Here's an idea of the fix:
>>
>> Let's always use the KILL mutex locking order, that is
>>
>>   victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
>>
>> For this we need to fix wsrep_abort_transaction(), which is called from
>> the
>> server, and wsrep_innobase_kill_one_trx(), which is called from BF
>> thread.
>>
>> wsrep_abort_transaction() can be fixed by not invoking
>> wsrep_innobase_kill_one_trx() and always using KILL code path (that is
>> wsrep_thd_awake) and forcing rollback after the kill.
>>
>> wsrep_innobase_kill_one_trx() can be fixed by not locking LOCK_thd_data
>> at all, just don't lock it. We know that the victim waits on a lock
>> inside InnoDB and we've locked trx mutex and lock_sys mutex. The victim
>> cannot go away, cannot modify its data, it cannot do anything. So,
>> LOCK_thd_data doesn't seem to be necessary at that point.
>>
>> I've attached a demo patch. It compiles, but I didn't try to run it,
>> it's only to show the idea, not a working fix (I already suspect I
>> removed too much from wsrep_abort_transaction()). Note it's the patch
>> for 10.2 at the commit 29bbcac0ee8^ - that is one commit before my fix.
>>
>> On Oct 12, Jan Lindström wrote:
>> > Hi Sergei,
>> >
>> > Update on wsrep_close_connections problem. My suggestion to fix this
>> issue
>> > is on
>> >
>> https://github.com/MariaDB/server/commit/99cbe03a44cc95e6f548550df51e7201ebea3b9d
>> >
>> > If you have a better solution, please advise.
>> >
>> > R: Jan
>>
>> Regards,
>> Sergei
>> VP of MariaDB Server Engineering
>> and secur...@mariadb.org
>>
>
___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-20 Thread Jan Lindström
Hi Sergei,

This does not seem to work. Consider following:

CREATE TABLE t1 (id INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);
connection node_2;
SET AUTOCOMMIT=OFF;
START TRANSACTION;
INSERT INTO t1 VALUES (2);
connection node_2a;
ALTER TABLE t1 ADD COLUMN f2 INTEGER, LOCK=EXCLUSIVE;

Problem seems to be the fact that the MDL-lock acquired by thread executing
INSERT that thread executing ALTER wants to be released by killing holder is
not released. We do kill query inside InnoDB.  This MDL-code is not
familiar to me and I do not yet understand why MDL-lock is not released

R: Jan

On Thu, Oct 14, 2021 at 9:49 PM Sergei Golubchik  wrote:

> Hi, Jan!
>
> Here's an idea of the fix:
>
> Let's always use the KILL mutex locking order, that is
>
>   victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
>
> For this we need to fix wsrep_abort_transaction(), which is called from the
> server, and wsrep_innobase_kill_one_trx(), which is called from BF
> thread.
>
> wsrep_abort_transaction() can be fixed by not invoking
> wsrep_innobase_kill_one_trx() and always using KILL code path (that is
> wsrep_thd_awake) and forcing rollback after the kill.
>
> wsrep_innobase_kill_one_trx() can be fixed by not locking LOCK_thd_data
> at all, just don't lock it. We know that the victim waits on a lock
> inside InnoDB and we've locked trx mutex and lock_sys mutex. The victim
> cannot go away, cannot modify its data, it cannot do anything. So,
> LOCK_thd_data doesn't seem to be necessary at that point.
>
> I've attached a demo patch. It compiles, but I didn't try to run it,
> it's only to show the idea, not a working fix (I already suspect I
> removed too much from wsrep_abort_transaction()). Note it's the patch
> for 10.2 at the commit 29bbcac0ee8^ - that is one commit before my fix.
>
> On Oct 12, Jan Lindström wrote:
> > Hi Sergei,
> >
> > Update on wsrep_close_connections problem. My suggestion to fix this
> issue
> > is on
> >
> https://github.com/MariaDB/server/commit/99cbe03a44cc95e6f548550df51e7201ebea3b9d
> >
> > If you have a better solution, please advise.
> >
> > R: Jan
>
> Regards,
> Sergei
> VP of MariaDB Server Engineering
> and secur...@mariadb.org
>
___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-19 Thread Jan Lindström
Hi Sergei,

I have implemented PlanE as agreed on
branch bb-10.2-MDEV-25114-planE-galera and mostly regression testing looks
promising. However,
I have problems with MDL-locks. For example test case
galera.galera_toi_lock_exclusive hangs and I have not yet found out why. I
will ask
help from Seppo.

R: Jan

On Fri, Oct 15, 2021 at 11:36 AM Sergei Golubchik  wrote:

> Hi, Jan!
>
> On Oct 15, Jan Lindström wrote:
> > Few questions:
> >
> > (1) Is this review for a full patch or just problems on
> > wsrep_abort_transaction ?
>
> a full patch
>
> > (2) In case at wsrep_abort_transaction we do not have a transaction idea
> is
> > that we do not anymore want to enter InnoDB i.e. innobase_kill_query,
> > that is the reason we set MUST_ABORT to wsrep_conflict_state so that no
> > more mutexes from InnoDB is acquired.
>
> right, sorry. I thought I've removed too much from
> wsrep_abort_transaction.
>
> > (3) In wsrep_close_connections code used LOCK_thread_lock, is that
> > enough to protect THDs on list from concurrent disconnect and delete?
> > (I added locking for THD::LOCK_thd_data but I was not sure is it
> > necessary)
>
> Isn't it a separate issue?
>
> Anyway, as far as I understand, LOCK_thread_count protects the list of
> threads, a THD cannot be added to or deleted from the list.
>
> but a THD can be disconnected or killed and it'll wait somewhere at the
> very end of the destructor to remove itself from the list.
> So, yes, you need to take LOCK_thd_data or LOCK_thd_kill too.
>
> > (4) Do we still keep manual KILL as TOI ?
>
> Not in my patch, no.
>
> > (5) What I do with thr_lock.c there we pass victim THD pointer to
> > wsrep_abort_thd and it is as you pointed out mostly unprotected on
> > that code, LOCK TABLE is TOI
>
> It's fine. Same logic as in wsrep_innobase_kill_one_trx().
> You kill a victim thd that owns thr locks. This THD cannot just
> disappear, it has to release those locks first. And you own the
> lock->mutex. So the THD isn't going anywhere until you return from
> thr_lock().
>
> > (6) Please note that fix will not be ready 19th of October as we again
> > change critical path, Ramesh will run QA with pquery
>
> sure
>
> > On Thu, Oct 14, 2021 at 9:49 PM Sergei Golubchik 
> wrote:
> > > Hi, Jan!
> > >
> > > Here's an idea of the fix:
> > >
> > > Let's always use the KILL mutex locking order, that is
> > >
> > >   victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
> > >
> > > For this we need to fix wsrep_abort_transaction(), which is called
> from the
> > > server, and wsrep_innobase_kill_one_trx(), which is called from BF
> > > thread.
> > >
> > > wsrep_abort_transaction() can be fixed by not invoking
> > > wsrep_innobase_kill_one_trx() and always using KILL code path (that is
> > > wsrep_thd_awake) and forcing rollback after the kill.
> > >
> > > wsrep_innobase_kill_one_trx() can be fixed by not locking LOCK_thd_data
> > > at all, just don't lock it. We know that the victim waits on a lock
> > > inside InnoDB and we've locked trx mutex and lock_sys mutex. The victim
> > > cannot go away, cannot modify its data, it cannot do anything. So,
> > > LOCK_thd_data doesn't seem to be necessary at that point.
> > >
> > > I've attached a demo patch. It compiles, but I didn't try to run it,
> > > it's only to show the idea, not a working fix (I already suspect I
> > > removed too much from wsrep_abort_transaction()). Note it's the patch
> > > for 10.2 at the commit 29bbcac0ee8^ - that is one commit before my fix.
> > >
> > > On Oct 12, Jan Lindström wrote:
> > > > Hi Sergei,
> > > >
> > > > Update on wsrep_close_connections problem. My suggestion to fix this
> > > issue
> > > > is on
> > > >
> > >
> https://github.com/MariaDB/server/commit/99cbe03a44cc95e6f548550df51e7201ebea3b9d
> > > >
> > > > If you have a better solution, please advise.
> > > >
> > > > R: Jan
> > >
> Regards,
> Sergei
> VP of MariaDB Server Engineering
> and secur...@mariadb.org
>
___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-15 Thread Sergei Golubchik
Hi, Jan!

On Oct 15, Jan Lindström wrote:
> Few questions:
> 
> (1) Is this review for a full patch or just problems on
> wsrep_abort_transaction ?

a full patch

> (2) In case at wsrep_abort_transaction we do not have a transaction idea is
> that we do not anymore want to enter InnoDB i.e. innobase_kill_query,
> that is the reason we set MUST_ABORT to wsrep_conflict_state so that no
> more mutexes from InnoDB is acquired.

right, sorry. I thought I've removed too much from
wsrep_abort_transaction.

> (3) In wsrep_close_connections code used LOCK_thread_lock, is that
> enough to protect THDs on list from concurrent disconnect and delete?
> (I added locking for THD::LOCK_thd_data but I was not sure is it
> necessary)

Isn't it a separate issue?

Anyway, as far as I understand, LOCK_thread_count protects the list of
threads, a THD cannot be added to or deleted from the list.

but a THD can be disconnected or killed and it'll wait somewhere at the
very end of the destructor to remove itself from the list.
So, yes, you need to take LOCK_thd_data or LOCK_thd_kill too.

> (4) Do we still keep manual KILL as TOI ?

Not in my patch, no.

> (5) What I do with thr_lock.c there we pass victim THD pointer to
> wsrep_abort_thd and it is as you pointed out mostly unprotected on
> that code, LOCK TABLE is TOI

It's fine. Same logic as in wsrep_innobase_kill_one_trx().
You kill a victim thd that owns thr locks. This THD cannot just
disappear, it has to release those locks first. And you own the
lock->mutex. So the THD isn't going anywhere until you return from
thr_lock().

> (6) Please note that fix will not be ready 19th of October as we again
> change critical path, Ramesh will run QA with pquery

sure

> On Thu, Oct 14, 2021 at 9:49 PM Sergei Golubchik  wrote:
> > Hi, Jan!
> >
> > Here's an idea of the fix:
> >
> > Let's always use the KILL mutex locking order, that is
> >
> >   victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
> >
> > For this we need to fix wsrep_abort_transaction(), which is called from the
> > server, and wsrep_innobase_kill_one_trx(), which is called from BF
> > thread.
> >
> > wsrep_abort_transaction() can be fixed by not invoking
> > wsrep_innobase_kill_one_trx() and always using KILL code path (that is
> > wsrep_thd_awake) and forcing rollback after the kill.
> >
> > wsrep_innobase_kill_one_trx() can be fixed by not locking LOCK_thd_data
> > at all, just don't lock it. We know that the victim waits on a lock
> > inside InnoDB and we've locked trx mutex and lock_sys mutex. The victim
> > cannot go away, cannot modify its data, it cannot do anything. So,
> > LOCK_thd_data doesn't seem to be necessary at that point.
> >
> > I've attached a demo patch. It compiles, but I didn't try to run it,
> > it's only to show the idea, not a working fix (I already suspect I
> > removed too much from wsrep_abort_transaction()). Note it's the patch
> > for 10.2 at the commit 29bbcac0ee8^ - that is one commit before my fix.
> >
> > On Oct 12, Jan Lindström wrote:
> > > Hi Sergei,
> > >
> > > Update on wsrep_close_connections problem. My suggestion to fix this
> > issue
> > > is on
> > >
> > https://github.com/MariaDB/server/commit/99cbe03a44cc95e6f548550df51e7201ebea3b9d
> > >
> > > If you have a better solution, please advise.
> > >
> > > R: Jan
> >
Regards,
Sergei
VP of MariaDB Server Engineering
and secur...@mariadb.org

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-14 Thread Jan Lindström
Hi,

Few questions:

(1) Is this review for a full patch or just problems on
wsrep_abort_transaction ?
(2) In case at wsrep_abort_transaction we do not have a transaction idea is
that we do not anymore want to enter InnoDB i.e. innobase_kill_query,
that is the reason we set MUST_ABORT to wsrep_conflict_state so that no
more mutexes from InnoDB is acquired.
(3) In wsrep_close_connections code used LOCK_thread_lock, is that enough
to protect THDs on list from concurrent disconnect and delete? (I added
locking
for THD::LOCK_thd_data but I was not sure is it necessary)
(4) Do we still keep manual KILL as TOI ?
(5) What I do with thr_lock.c there we pass victim THD pointer to
wsrep_abort_thd and it is as you pointed out mostly unprotected on that
code,
LOCK TABLE is TOI
(6) Please note that fix will not be ready 19th of October as we again
change critical path, Ramesh will run QA with pquery

On Thu, Oct 14, 2021 at 9:49 PM Sergei Golubchik  wrote:

> Hi, Jan!
>
> Here's an idea of the fix:
>
> Let's always use the KILL mutex locking order, that is
>
>   victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex
>
> For this we need to fix wsrep_abort_transaction(), which is called from the
> server, and wsrep_innobase_kill_one_trx(), which is called from BF
> thread.
>
> wsrep_abort_transaction() can be fixed by not invoking
> wsrep_innobase_kill_one_trx() and always using KILL code path (that is
> wsrep_thd_awake) and forcing rollback after the kill.
>
> wsrep_innobase_kill_one_trx() can be fixed by not locking LOCK_thd_data
> at all, just don't lock it. We know that the victim waits on a lock
> inside InnoDB and we've locked trx mutex and lock_sys mutex. The victim
> cannot go away, cannot modify its data, it cannot do anything. So,
> LOCK_thd_data doesn't seem to be necessary at that point.
>
> I've attached a demo patch. It compiles, but I didn't try to run it,
> it's only to show the idea, not a working fix (I already suspect I
> removed too much from wsrep_abort_transaction()). Note it's the patch
> for 10.2 at the commit 29bbcac0ee8^ - that is one commit before my fix.
>
> On Oct 12, Jan Lindström wrote:
> > Hi Sergei,
> >
> > Update on wsrep_close_connections problem. My suggestion to fix this
> issue
> > is on
> >
> https://github.com/MariaDB/server/commit/99cbe03a44cc95e6f548550df51e7201ebea3b9d
> >
> > If you have a better solution, please advise.
> >
> > R: Jan
>
> Regards,
> Sergei
> VP of MariaDB Server Engineering
> and secur...@mariadb.org
>
___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-14 Thread Sergei Golubchik
Hi, Jan!

Here's an idea of the fix:

Let's always use the KILL mutex locking order, that is

  victim_thread->LOCK_thd_data -> lock_sys->mutex -> victim_trx->mutex

For this we need to fix wsrep_abort_transaction(), which is called from the
server, and wsrep_innobase_kill_one_trx(), which is called from BF
thread.

wsrep_abort_transaction() can be fixed by not invoking
wsrep_innobase_kill_one_trx() and always using KILL code path (that is
wsrep_thd_awake) and forcing rollback after the kill.

wsrep_innobase_kill_one_trx() can be fixed by not locking LOCK_thd_data
at all, just don't lock it. We know that the victim waits on a lock
inside InnoDB and we've locked trx mutex and lock_sys mutex. The victim
cannot go away, cannot modify its data, it cannot do anything. So,
LOCK_thd_data doesn't seem to be necessary at that point.

I've attached a demo patch. It compiles, but I didn't try to run it,
it's only to show the idea, not a working fix (I already suspect I
removed too much from wsrep_abort_transaction()). Note it's the patch
for 10.2 at the commit 29bbcac0ee8^ - that is one commit before my fix.

On Oct 12, Jan Lindström wrote:
> Hi Sergei,
> 
> Update on wsrep_close_connections problem. My suggestion to fix this issue
> is on
> https://github.com/MariaDB/server/commit/99cbe03a44cc95e6f548550df51e7201ebea3b9d
> 
> If you have a better solution, please advise.
> 
> R: Jan
 
Regards,
Sergei
VP of MariaDB Server Engineering
and secur...@mariadb.org
diff --git a/include/mysql/service_wsrep.h b/include/mysql/service_wsrep.h
index 54b9eb9250c..3f397276498 100644
--- a/include/mysql/service_wsrep.h
+++ b/include/mysql/service_wsrep.h
@@ -91,7 +91,7 @@ extern struct wsrep_service_st {
   enum wsrep_trx_status   (*wsrep_run_wsrep_commit_func)(THD *thd, bool 
all);
   void(*wsrep_thd_LOCK_func)(THD *thd);
   void(*wsrep_thd_UNLOCK_func)(THD *thd);
-  void(*wsrep_thd_awake_func)(THD *thd, my_bool 
signal);
+  void(*wsrep_thd_awake_func)(THD *thd, my_bool 
signal, my_bool sync);
   enum wsrep_conflict_state   (*wsrep_thd_conflict_state_func)(MYSQL_THD, 
my_bool);
   const char *(*wsrep_thd_conflict_state_str_func)(THD *thd);
   enum wsrep_exec_mode(*wsrep_thd_exec_mode_func)(THD *thd);
@@ -139,7 +139,7 @@ extern struct wsrep_service_st {
 #define wsrep_run_wsrep_commit(T,A) 
wsrep_service->wsrep_run_wsrep_commit_func(T,A)
 #define wsrep_thd_LOCK(T) wsrep_service->wsrep_thd_LOCK_func(T)
 #define wsrep_thd_UNLOCK(T) wsrep_service->wsrep_thd_UNLOCK_func(T)
-#define wsrep_thd_awake(T,S) wsrep_service->wsrep_thd_awake_func(T,S)
+#define wsrep_thd_awake(T,S,L) wsrep_service->wsrep_thd_awake_func(T,S,L)
 #define wsrep_thd_conflict_state(T,S) 
wsrep_service->wsrep_thd_conflict_state_func(T,S)
 #define wsrep_thd_conflict_state_str(T) 
wsrep_service->wsrep_thd_conflict_state_str_func(T)
 #define wsrep_thd_exec_mode(T) wsrep_service->wsrep_thd_exec_mode_func(T)
@@ -220,7 +220,7 @@ void wsrep_lock_rollback();
 void wsrep_post_commit(THD* thd, bool all);
 void wsrep_thd_LOCK(THD *thd);
 void wsrep_thd_UNLOCK(THD *thd);
-void wsrep_thd_awake(THD *thd, my_bool signal);
+void wsrep_thd_awake(THD *thd, my_bool signal, my_bool lock);
 void wsrep_thd_set_conflict_state(THD *thd, enum wsrep_conflict_state state);
 bool wsrep_thd_ignore_table(THD *thd);
 void wsrep_unlock_rollback();
diff --git a/sql/sql_class.cc b/sql/sql_class.cc
index 92736eacee2..e248f644a0a 100644
--- a/sql/sql_class.cc
+++ b/sql/sql_class.cc
@@ -1693,7 +1693,9 @@ void THD::awake(killed_state state_to_set)
   DBUG_PRINT("enter", ("this: %p current_thd: %p  state: %d",
this, current_thd, (int) state_to_set));
   THD_CHECK_SENTRY(this);
-  mysql_mutex_assert_owner(_thd_data);
+
+  if (!WSREP_ON || !wsrep_thd_is_BF(current_thd, FALSE))
+mysql_mutex_assert_owner(_thd_data);
 
   print_aborted_warning(3, "KILLED");
 
diff --git a/sql/wsrep_dummy.cc b/sql/wsrep_dummy.cc
index 53941c06892..7973b325b68 100644
--- a/sql/wsrep_dummy.cc
+++ b/sql/wsrep_dummy.cc
@@ -83,7 +83,7 @@ void wsrep_thd_LOCK(THD *)
 void wsrep_thd_UNLOCK(THD *)
 { }
 
-void wsrep_thd_awake(THD *, my_bool)
+void wsrep_thd_awake(THD *, my_bool, my_bool)
 { }
 
 const char *wsrep_thd_conflict_state_str(THD *)
diff --git a/sql/wsrep_mysqld.cc b/sql/wsrep_mysqld.cc
index d392d1c2a61..ff503ddf82b 100644
--- a/sql/wsrep_mysqld.cc
+++ b/sql/wsrep_mysqld.cc
@@ -2758,13 +2758,15 @@ extern "C" void wsrep_thd_set_wsrep_last_query_id(THD 
*thd, query_id_t id)
 }
 
 
-extern "C" void wsrep_thd_awake(THD *thd, my_bool signal)
+extern "C" void wsrep_thd_awake(THD *thd, my_bool signal, my_bool sync)
 {
   if (signal)
   {
-mysql_mutex_lock(>LOCK_thd_data);
+if (sync)
+  mysql_mutex_lock(>LOCK_thd_data);
 thd->awake(KILL_QUERY);
-mysql_mutex_unlock(>LOCK_thd_data);
+if (sync)
+  mysql_mutex_unlock(>LOCK_thd_data);
   }
   else
   {
diff --git 

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-12 Thread Jan Lindström
Hi Sergei,

Update on wsrep_close_connections problem. My suggestion to fix this issue
is on
https://github.com/MariaDB/server/commit/99cbe03a44cc95e6f548550df51e7201ebea3b9d

If you have a better solution, please advise.

R: Jan


On Mon, Oct 11, 2021 at 12:52 PM Jan Lindström 
wrote:

> Hi Sergei,
>
> After QA runs done by Ramesh, we now know the latest fix candidate i.e.
> what is in bb-10.2-MDEV-25114-galera-v2 is incorrect. Problem is in
> wsrep_close_connections() as it holds LOCK_thread_count while it does
> abort_replicated that will call wsrep_abort_transaction and there we use
> find_thread_by_id that would also take LOCK_thread_count. As there is
> another code path here, the problem is not easily fixed. We can't just
> release LOCK_thread_count at wsrep_close_connections as we iterate the
> thread list.
>
> I must say I'm not sure what to do now.
>
> (gdb) bt
> #0  __pthread_kill (threadid=, signo=signo@entry=6) at
> ../sysdeps/unix/sysv/linux/pthread_kill.c:56
> #1  0x56333963a2e8 in my_write_core (sig=sig@entry=6) at
> /test/mtest/10.2_dbg/mysys/stacktrace.c:382
> #2  0x563338f2993d in handle_fatal_signal (sig=6) at
> /test/mtest/10.2_dbg/sql/signal_handler.cc:355
> #3  
> #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #5  0x14b799572859 in __GI_abort () at abort.c:79
> #6  0x56333963edc4 in safe_mutex_lock (mp=0x563339e47220
> , my_flags=my_flags@entry=0,
> file=file@entry=0x5633396d9050
> "/test/mtest/10.2_dbg/sql/sql_parse.cc", line=line@entry=8902)
> at /test/mtest/10.2_dbg/mysys/thr_mutex.c:264
> #7  0x563338d1aea7 in inline_mysql_mutex_lock (src_line=8902,
> src_file=0x5633396d9050 "/test/mtest/10.2_dbg/sql/sql_parse.cc",
> that=) at
> /test/mtest/10.2_dbg/include/mysql/psi/mysql_thread.h:688
> #8  find_thread_by_id (id=id@entry=48, query_id=query_id@entry=false) at
> /test/mtest/10.2_dbg/sql/sql_parse.cc:8902
> #9  0x563339134c39 in wsrep_abort_transaction (hton=,
> bf_thd=0x14b68d90, victim_thd=,
> signal=) at
> /test/mtest/10.2_dbg/storage/innobase/handler/ha_innodb.cc:19821
> #10 0x563338f38cbf in ha_abort_transaction 
> (bf_thd=bf_thd@entry=0x14b68d90,
> victim_thd=victim_thd@entry=0x14b68d90,
> signal=signal@entry=1 '\001') at
> /test/mtest/10.2_dbg/sql/handler.cc:6327
> #11 0x563338ebed6d in wsrep_abort_thd 
> (bf_thd_ptr=bf_thd_ptr@entry=0x14b68d90,
> victim_thd_ptr=victim_thd_ptr@entry=0x14b68d90,
> signal=signal@entry=1 '\001') at
> /test/mtest/10.2_dbg/sql/wsrep_thd.cc:832
> #12 0x563338eaa2bd in abort_replicated (thd=thd@entry=0x14b68d90)
> at /test/mtest/10.2_dbg/sql/wsrep_mysqld.cc:2269
> #13 0x563338eae097 in wsrep_close_client_connections
> (wait_to_end=wait_to_end@entry=1 '\001',
> except_caller_thd=except_caller_thd@entry=0x0) at
> /test/mtest/10.2_dbg/sql/wsrep_mysqld.cc:2437
> #14 0x563338eaedf6 in wsrep_stop_replication (thd=thd@entry=0x0) at
> /test/mtest/10.2_dbg/sql/wsrep_mysqld.cc:962
> #15 0x563338c543d8 in kill_server (sig_ptr=sig_ptr@entry=0x0) at
> /test/mtest/10.2_dbg/sql/mysqld.cc:2009
> #16 0x563338c558d5 in kill_server_thread (arg=) at
> /test/mtest/10.2_dbg/sql/mysqld.cc:2047
> #17 0x14b799a7a609 in start_thread (arg=) at
> pthread_create.c:477
> #18 0x14b79966f293 in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> R: Jan
>
> On Wed, Oct 6, 2021 at 5:03 PM Sergei Golubchik  wrote:
>
>> Hi, Jan!
>>
>> On Oct 06, Jan Lindström wrote:
>> > >
>> > > > > > +/* This is wrapper for wsrep_break_lock in thr_lock.c */
>> > > > > > +static int wsrep_thr_abort_thd(void *bf_thd_ptr, void
>> *victim_thd_ptr, my_bool signal)
>> > > > > > +{
>> > > > > > +  THD* victim_thd= (THD *) victim_thd_ptr;
>> > > > > > +  /* We need to lock THD::LOCK_thd_data to protect victim
>> > > > > > +  from concurrent usage or disconnect or delete. */
>> > > > >
>> > > > > How do you know victim_thd wasn't deleted before you locked
>> > > > > LOCK_thd_data below?
>> > > >
>> > > > I must say the thr_lock code is not familiar to me but there are
>> > > > mysql_mutex_lock() calls to lock->mutex. After code review  it is
>> not
>> > > > clear to me what that mutex is.
>> > >
>> > > where are mysql_mutex_lock() calls to lock->mutex?
>> >
>> > mysys/thr_lock.c there is function thr_lock() there is call to
>> > mysql_mutex_lock(>mutex); this is before wsrep_break_lock where we
>> > call wsrep_thd_abort
>>
>> this is for table locks. `lock` is `data->lock` where `data` is THR_LOCK
>> structure somewhere in the table share. It locks tables and handlers,
>> not threads. And InnoDB isn't using it at all anyway.
>>
>> >
>> > > > > >  if (victim_trx) {
>> > > > > > +wsrep_thd_UNLOCK(victim_thd);
>> > > > >
>> > > > > what keeps victim_trx from disappearing here?
>> > > >
>> > > > Nothing. Do you have suggestions ?
>> > >
>> > > A couple of thoughts:
>> > >
>> > > * Why do you unlock LOCK_thd_data here at all? I believed 

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-12 Thread Jan Lindström
Hi Sergei,


>
> > > trx_rw_is_active needs to be modified to do that, right?
> >
> > No this is current behaviour, I did not change anything on
> > trx_rw_is_active
>
> In xtradb trx_rw_is_active returns bool.
> I think xtradb is still the default innodb in 10.2.
>
> In innobase it returns, indeed, trx_t*, I didn't notice that at first,
> that's why I was confused.
>
>
xtradb is not anymore used at all, not even 10.2.

R: Jan
___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-11 Thread Sergei Golubchik
Hi, Jan!

Great, thanks!

On Oct 11, Jan Lindström wrote:
> Update on disconnect
> 
> >
> > >   // As trx is now referenced it can't go away
> >
> > Hmm. What happens if the thd that owns this transaction is killed or the
> > user disconnects? THD gets freed. What happens to the referenced trx?
> >
> 
> I created new mtr-tests (galera_disconnect_debug) to try disconnecting
> victim connections on several debug sync points on this code
> this place included and could not reproduce any crash. Thus, transaction
> can't be killed while we are here or
> disconnect will not free THD.
> 
> I created a new mtr-test (galera_to_error) also for the TOI error case and
> verified that after that you can't issue statements anymore
> including KILL QUERY.
> 
> R: Jan
Regards,
Sergei
VP of MariaDB Server Engineering
and secur...@mariadb.org

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-11 Thread Sergei Golubchik
Hi, Jan!

On Oct 10, Jan Lindström wrote:
> Hi Sergei,
> >
> > >   if (victim_trx) {
> > > const trx_id_t victim_trx_id= victim_trx->id;
> > > const longlong victim_thread= thd_get_thread_id(victim_thd);
> > > /* This is necessary as correct mutexing order is
> > > lock_sys -> trx -> THD::LOCK_thd_data and below
> > > function assumes we have lock_sys and trx locked
> > > and takes THD::LOCK_thd_data for THD state check. */
> > > wsrep_thd_UNLOCK(victim_thd);
> > > // GAP where thd or trx is not protected
> > > lock_mutex_enter();
> > > if (trx_t* victim= trx_rw_is_active(victim_trx_id, NULL, true)) {
> >
> > trx_rw_is_active needs to be modified to do that, right?
> 
> No this is current behaviour, I did not change anything on
> trx_rw_is_active

In xtradb trx_rw_is_active returns bool.
I think xtradb is still the default innodb in 10.2.

In innobase it returns, indeed, trx_t*, I didn't notice that at first,
that's why I was confused.

> > >   // As trx is now referenced it can't go away
> >
> > Hmm. What happens if the thd that owns this transaction is killed or
> > the user disconnects? THD gets freed. What happens to the referenced
> > trx?
> 
> In my understanding you can't just free THD before it is aborted or
> committed, right ?
> As we have lock_sys, no trx can commit or abort inside InnoDB, and
> after this function this trx can't be deleted.

okay, good point.

> > What I mean it, what if KILL would ignore WSREP_TO_ISOLATION_BEGIN
> > failure and will just proceed killing? Perhaps if
> > WSREP_TO_ISOLATION_BEGIN fails it means that there can be no bf aborts
> > anyway? Could you try to find it out?
> 
> User KILL can happen only after the node has moded to READY state so
> at startup you can't use it before the cluster is ready to serve.  We
> could just ignore the TOI error here, but what is the point? There are
> bigger problems in the cluster if TOI fails. TOI can fail only in this
> node as all other nodes in the cluster will ignore the KILL command
> (after parsing it).

Okay then

Regards,
Sergei
VP of MariaDB Server Engineering
and secur...@mariadb.org

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-11 Thread Jan Lindström
Hi Sergei,

After QA runs done by Ramesh, we now know the latest fix candidate i.e.
what is in bb-10.2-MDEV-25114-galera-v2 is incorrect. Problem is in
wsrep_close_connections() as it holds LOCK_thread_count while it does
abort_replicated that will call wsrep_abort_transaction and there we use
find_thread_by_id that would also take LOCK_thread_count. As there is
another code path here, the problem is not easily fixed. We can't just
release LOCK_thread_count at wsrep_close_connections as we iterate the
thread list.

I must say I'm not sure what to do now.

(gdb) bt
#0  __pthread_kill (threadid=, signo=signo@entry=6) at
../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x56333963a2e8 in my_write_core (sig=sig@entry=6) at
/test/mtest/10.2_dbg/mysys/stacktrace.c:382
#2  0x563338f2993d in handle_fatal_signal (sig=6) at
/test/mtest/10.2_dbg/sql/signal_handler.cc:355
#3  
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5  0x14b799572859 in __GI_abort () at abort.c:79
#6  0x56333963edc4 in safe_mutex_lock (mp=0x563339e47220
, my_flags=my_flags@entry=0,
file=file@entry=0x5633396d9050 "/test/mtest/10.2_dbg/sql/sql_parse.cc",
line=line@entry=8902)
at /test/mtest/10.2_dbg/mysys/thr_mutex.c:264
#7  0x563338d1aea7 in inline_mysql_mutex_lock (src_line=8902,
src_file=0x5633396d9050 "/test/mtest/10.2_dbg/sql/sql_parse.cc",
that=) at
/test/mtest/10.2_dbg/include/mysql/psi/mysql_thread.h:688
#8  find_thread_by_id (id=id@entry=48, query_id=query_id@entry=false) at
/test/mtest/10.2_dbg/sql/sql_parse.cc:8902
#9  0x563339134c39 in wsrep_abort_transaction (hton=,
bf_thd=0x14b68d90, victim_thd=,
signal=) at
/test/mtest/10.2_dbg/storage/innobase/handler/ha_innodb.cc:19821
#10 0x563338f38cbf in ha_abort_transaction
(bf_thd=bf_thd@entry=0x14b68d90,
victim_thd=victim_thd@entry=0x14b68d90,
signal=signal@entry=1 '\001') at
/test/mtest/10.2_dbg/sql/handler.cc:6327
#11 0x563338ebed6d in wsrep_abort_thd
(bf_thd_ptr=bf_thd_ptr@entry=0x14b68d90,
victim_thd_ptr=victim_thd_ptr@entry=0x14b68d90,
signal=signal@entry=1 '\001') at
/test/mtest/10.2_dbg/sql/wsrep_thd.cc:832
#12 0x563338eaa2bd in abort_replicated (thd=thd@entry=0x14b68d90)
at /test/mtest/10.2_dbg/sql/wsrep_mysqld.cc:2269
#13 0x563338eae097 in wsrep_close_client_connections
(wait_to_end=wait_to_end@entry=1 '\001',
except_caller_thd=except_caller_thd@entry=0x0) at
/test/mtest/10.2_dbg/sql/wsrep_mysqld.cc:2437
#14 0x563338eaedf6 in wsrep_stop_replication (thd=thd@entry=0x0) at
/test/mtest/10.2_dbg/sql/wsrep_mysqld.cc:962
#15 0x563338c543d8 in kill_server (sig_ptr=sig_ptr@entry=0x0) at
/test/mtest/10.2_dbg/sql/mysqld.cc:2009
#16 0x563338c558d5 in kill_server_thread (arg=) at
/test/mtest/10.2_dbg/sql/mysqld.cc:2047
#17 0x14b799a7a609 in start_thread (arg=) at
pthread_create.c:477
#18 0x14b79966f293 in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

R: Jan

On Wed, Oct 6, 2021 at 5:03 PM Sergei Golubchik  wrote:

> Hi, Jan!
>
> On Oct 06, Jan Lindström wrote:
> > >
> > > > > > +/* This is wrapper for wsrep_break_lock in thr_lock.c */
> > > > > > +static int wsrep_thr_abort_thd(void *bf_thd_ptr, void
> *victim_thd_ptr, my_bool signal)
> > > > > > +{
> > > > > > +  THD* victim_thd= (THD *) victim_thd_ptr;
> > > > > > +  /* We need to lock THD::LOCK_thd_data to protect victim
> > > > > > +  from concurrent usage or disconnect or delete. */
> > > > >
> > > > > How do you know victim_thd wasn't deleted before you locked
> > > > > LOCK_thd_data below?
> > > >
> > > > I must say the thr_lock code is not familiar to me but there are
> > > > mysql_mutex_lock() calls to lock->mutex. After code review  it is not
> > > > clear to me what that mutex is.
> > >
> > > where are mysql_mutex_lock() calls to lock->mutex?
> >
> > mysys/thr_lock.c there is function thr_lock() there is call to
> > mysql_mutex_lock(>mutex); this is before wsrep_break_lock where we
> > call wsrep_thd_abort
>
> this is for table locks. `lock` is `data->lock` where `data` is THR_LOCK
> structure somewhere in the table share. It locks tables and handlers,
> not threads. And InnoDB isn't using it at all anyway.
>
> >
> > > > > >  if (victim_trx) {
> > > > > > +wsrep_thd_UNLOCK(victim_thd);
> > > > >
> > > > > what keeps victim_trx from disappearing here?
> > > >
> > > > Nothing. Do you have suggestions ?
> > >
> > > A couple of thoughts:
> > >
> > > * Why do you unlock LOCK_thd_data here at all? I believed the whole
> > > point of using TOI was to make sure that even if you lock mutexes in
> > > a different order, it will not cause a deadlock.
> >
> > This is DDL-case when we have a MDL-conflict not user KILL, we need to
> > release it in my opinion because we need to take mutexes in
> > lock_sys -> trx -> THD::LOCK_thd_data order,
> > if I do not release I can see easily safe_mutex warnings
>
> I don't understand. First, lock_sys and trx mutexes are not covered 

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-11 Thread Jan Lindström
Update on disconnect

>
> >   // As trx is now referenced it can't go away
>
> Hmm. What happens if the thd that owns this transaction is killed or the
> user disconnects? THD gets freed. What happens to the referenced trx?
>

I created new mtr-tests (galera_disconnect_debug) to try disconnecting
victim connections on several debug sync points on this code
this place included and could not reproduce any crash. Thus, transaction
can't be killed while we are here or
disconnect will not free THD.

I created a new mtr-test (galera_to_error) also for the TOI error case and
verified that after that you can't issue statements anymore
including KILL QUERY.

R: Jan
___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-10 Thread Jan Lindström
Hi Sergei,

Update on what happens after TOI failure.


> What I mean it, what if KILL would ignore WSREP_TO_ISOLATION_BEGIN
> failure and will just proceed killing? Perhaps if
> WSREP_TO_ISOLATION_BEGIN fails it means that there can be no bf aborts
> anyway? Could you try to find it out?
>

After TOI error that node will not serve user anymore, you either get

ERROR 40001: WSREP replication failed. Check your wsrep connection state
and retry the query.
or
ERROR 08S01: WSREP has not yet prepared node for application use

User KILL will return the first one. bf_kill can't happen as you can't
issue commands to cause it.

R: Jan
___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-10 Thread Jan Lindström
Hi Sergei,


>
> >   if (victim_trx) {
> > const trx_id_t victim_trx_id= victim_trx->id;
> > const longlong victim_thread= thd_get_thread_id(victim_thd);
> > /* This is necessary as correct mutexing order is
> > lock_sys -> trx -> THD::LOCK_thd_data and below
> > function assumes we have lock_sys and trx locked
> > and takes THD::LOCK_thd_data for THD state check. */
> > wsrep_thd_UNLOCK(victim_thd);
> > // GAP where thd or trx is not protected
> > lock_mutex_enter();
> > if (trx_t* victim= trx_rw_is_active(victim_trx_id, NULL, true)) {
>
> trx_rw_is_active needs to be modified to do that, right?
>

No this is current behaviour, I did not change anything on trx_rw_is_active

>
> >   // As trx is now referenced it can't go away
>
> Hmm. What happens if the thd that owns this transaction is killed or the
> user disconnects? THD gets freed. What happens to the referenced trx?
>

In my understanding you can't just free THD before it is aborted or
committed, right ?
As we have lock_sys, no trx can commit or abort inside InnoDB, and after
this function this trx can't be deleted.


>
> >   trx_mutex_enter(victim);
> >   // In below we take THD::LOCK_thd_data
>
> "we take victim->mysql_thd->LOCK_thd_data", correct?
>

Yes


>
> What I mean it, what if KILL would ignore WSREP_TO_ISOLATION_BEGIN
> failure and will just proceed killing? Perhaps if
> WSREP_TO_ISOLATION_BEGIN fails it means that there can be no bf aborts
> anyway? Could you try to find it out?
>

User KILL can happen only after the node has moded to READY state so at
startup you can't use
it before the cluster is ready to serve.  We could just ignore the TOI
error here, but what is the point?
There are bigger problems in the cluster if TOI fails. TOI can fail only in
this node as all other nodes
in the cluster will ignore the KILL command (after parsing it).

R: Jan
___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-08 Thread Sergei Golubchik
Hi, Jan!

On Oct 06, Jan Lindström wrote:
> > > > >
> > > > > I must say the thr_lock code is not familiar to me but there
> > > > > are mysql_mutex_lock() calls to lock->mutex. After code review
> > > > > it is not clear to me what that mutex is.
> > > >
> > this is for table locks. `lock` is `data->lock` where `data` is
> > THR_LOCK structure somewhere in the table share. It locks tables and
> > handlers, not threads. And InnoDB isn't using it at all anyway.
> 
> I do not know how to change this one so that thd is protected, can I
> even do so as this code does not know THD internals...

No, that was my point. thr_lock is for *table locks*, you cannot use it
to protect a THD, they protect TABLE and TABLE_SHARE. Basically, it's
irrelevant for this MDEV.

> > > > > > >  if (victim_trx) {
> > > > > > > +wsrep_thd_UNLOCK(victim_thd);
> > > > > >
> > > > > > what keeps victim_trx from disappearing here?
> > > > >
> > > > > Nothing. Do you have suggestions ?
> > > >
> > > > A couple of thoughts:
> > > >
> > > > * Why do you unlock LOCK_thd_data here at all? I believed the whole
> > > > point of using TOI was to make sure that even if you lock mutexes in
> > > > a different order, it will not cause a deadlock.
> > >
> > > This is DDL-case when we have a MDL-conflict not user KILL, we need to
> > > release it in my opinion because we need to take mutexes in
> > > lock_sys -> trx -> THD::LOCK_thd_data order,
> > > if I do not release I can see easily safe_mutex warnings
> >
> > I don't understand. First, lock_sys and trx mutexes are not covered by
> > safe_mutex, so it cannot possibly warn about them.
> >
> > Second, the reason for taking mutexes always in the same order is to
> > avoid deadlocks. And yor TOI trick should archieve that.
> >
> static void wsrep_abort_transaction(
> handlerton* hton, THD *bf_thd, THD *victim_thd, my_bool signal)
> {
>   trx_t* victim_trx = thd_to_trx(victim_thd);
>   trx_t* bf_trx = (bf_thd) ? thd_to_trx(bf_thd) : NULL;
> 
>   if (victim_trx) {
> const trx_id_t victim_trx_id= victim_trx->id;
> const longlong victim_thread= thd_get_thread_id(victim_thd);
> /* This is necessary as correct mutexing order is
> lock_sys -> trx -> THD::LOCK_thd_data and below
> function assumes we have lock_sys and trx locked
> and takes THD::LOCK_thd_data for THD state check. */
> wsrep_thd_UNLOCK(victim_thd);
> // GAP where thd or trx is not protected
> lock_mutex_enter();
> if (trx_t* victim= trx_rw_is_active(victim_trx_id, NULL, true)) {

trx_rw_is_active needs to be modified to do that, right?

>   // As trx is now referenced it can't go away

Hmm. What happens if the thd that owns this transaction is killed or the
user disconnects? THD gets freed. What happens to the referenced trx?

>   trx_mutex_enter(victim);
>   // In below we take THD::LOCK_thd_data

"we take victim->mysql_thd->LOCK_thd_data", correct?

>   wsrep_innobase_kill_one_trx(bf_thd, bf_trx, victim, signal);
>   trx_mutex_exit(victim);
>   victim->release_reference();
>   wsrep_srv_conc_cancel_wait(victim);
> }
> lock_mutex_exit();
> DBUG_VOID_RETURN;
>   } else {
> wsrep_thd_LOCK(victim_thd);
> wsrep_thd_set_conflict_state(victim_thd, MUST_ABORT);
> wsrep_thd_awake(victim_thd, signal);
> wsrep_thd_UNLOCK(victim_thd);
>   }
>   DBUG_VOID_RETURN;
> }
> 
> > > > By the way, how long can WSREP_TO_ISOLATION_BEGIN() take?
> > > > Does it need to wait for everything else being replicated?
> > >
> > > As long as it takes to start it on all nodes in the cluster.
> >
> > So, KILL basically won't work when a cluster is starting.
> > Can bf aborts happen during a cluster startup?
> 
> Not sure.

What I mean it, what if KILL would ignore WSREP_TO_ISOLATION_BEGIN
failure and will just proceed killing? Perhaps if
WSREP_TO_ISOLATION_BEGIN fails it means that there can be no bf aborts
anyway? Could you try to find it out?

Regards,
Sergei

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-06 Thread Jan Lindström
Hi Sergei,

Answers to your questions below:

On Wed, Oct 6, 2021 at 5:03 PM Sergei Golubchik  wrote:

> Hi, Jan!
>
> On Oct 06, Jan Lindström wrote:
> > >
> > > > > > +/* This is wrapper for wsrep_break_lock in thr_lock.c */
> > > > > > +static int wsrep_thr_abort_thd(void *bf_thd_ptr, void
> *victim_thd_ptr, my_bool signal)
> > > > > > +{
> > > > > > +  THD* victim_thd= (THD *) victim_thd_ptr;
> > > > > > +  /* We need to lock THD::LOCK_thd_data to protect victim
> > > > > > +  from concurrent usage or disconnect or delete. */
> > > > >
> > > > > How do you know victim_thd wasn't deleted before you locked
> > > > > LOCK_thd_data below?
> > > >
> > > > I must say the thr_lock code is not familiar to me but there are
> > > > mysql_mutex_lock() calls to lock->mutex. After code review  it is not
> > > > clear to me what that mutex is.
> > >
> > > where are mysql_mutex_lock() calls to lock->mutex?
> >
> > mysys/thr_lock.c there is function thr_lock() there is call to
> > mysql_mutex_lock(>mutex); this is before wsrep_break_lock where we
> > call wsrep_thd_abort
>
> this is for table locks. `lock` is `data->lock` where `data` is THR_LOCK
> structure somewhere in the table share. It locks tables and handlers,
> not threads. And InnoDB isn't using it at all anyway.
>

I do not know how to change this one so that thd is protected, can I even
do so as this code does not know
THD internals...


>
> >
> > > > > >  if (victim_trx) {
> > > > > > +wsrep_thd_UNLOCK(victim_thd);
> > > > >
> > > > > what keeps victim_trx from disappearing here?
> > > >
> > > > Nothing. Do you have suggestions ?
> > >
> > > A couple of thoughts:
> > >
> > > * Why do you unlock LOCK_thd_data here at all? I believed the whole
> > > point of using TOI was to make sure that even if you lock mutexes in
> > > a different order, it will not cause a deadlock.
> >
> > This is DDL-case when we have a MDL-conflict not user KILL, we need to
> > release it in my opinion because we need to take mutexes in
> > lock_sys -> trx -> THD::LOCK_thd_data order,
> > if I do not release I can see easily safe_mutex warnings
>
> I don't understand. First, lock_sys and trx mutexes are not covered by
> safe_mutex, so it cannot possibly warn about them.
>
> Second, the reason for taking mutexes always in the same order is to
> avoid deadlocks. And yor TOI trick should archieve that.
>

 static
void
wsrep_abort_transaction(
/**/
handlerton* hton,
THD *bf_thd,
THD *victim_thd,
my_bool signal)
{
  DBUG_ENTER("wsrep_abort_transaction");

  trx_t* victim_trx = thd_to_trx(victim_thd);
  trx_t* bf_trx = (bf_thd) ? thd_to_trx(bf_thd) : NULL;

  WSREP_DEBUG("abort transaction: BF: %s victim: %s victim conf: %d",
wsrep_thd_query(bf_thd),
wsrep_thd_query(victim_thd),
wsrep_thd_conflict_state(victim_thd, FALSE));

  if (victim_trx) {
const trx_id_t victim_trx_id= victim_trx->id;
const longlong victim_thread= thd_get_thread_id(victim_thd);
WSREP_DEBUG("wsrep_abort_transaction: Victim thread %ld "
   "transaction " TRX_ID_FMT " trx_state %d",
   thd_get_thread_id(victim_thd),
victim_trx->id,
victim_trx->state);
/* This is necessary as correct mutexing order is
lock_sys -> trx -> THD::LOCK_thd_data and below
function assumes we have lock_sys and trx locked
and takes THD::LOCK_thd_data for THD state check. */
wsrep_thd_UNLOCK(victim_thd);
// GAP where thd or trx is not protected
DEBUG_SYNC(bf_thd, "wsrep_abort_victim_unlocked");
DBUG_EXECUTE_IF("wsrep_abort_replicated_sleep",
WSREP_DEBUG("wsrep_abort_transaction: sleeping "
   "for thread %ld ",
   thd_get_thread_id(victim_thd));
lock_mutex_enter();
if (trx_t* victim= trx_rw_is_active(victim_trx_id, NULL, true)) {
  // As trx is now referenced it can't go away
  trx_mutex_enter(victim);
  ut_ad(victim->mysql_thd ?
 thd_get_thread_id(victim->mysql_thd) == victim_thread :1);
  // In below we take THD::LOCK_thd_data
  wsrep_innobase_kill_one_trx(bf_thd, bf_trx, victim, signal);
  trx_mutex_exit(victim);
  victim->release_reference();
  wsrep_srv_conc_cancel_wait(victim);
}
lock_mutex_exit();
DBUG_VOID_RETURN;
  } else {
WSREP_DEBUG("victim does not have transaction");
wsrep_thd_LOCK(victim_thd);
wsrep_thd_set_conflict_state(victim_thd, MUST_ABORT);
wsrep_thd_awake(victim_thd, signal);
wsrep_thd_UNLOCK(victim_thd);
  }
  DBUG_VOID_RETURN;
}

>
> > > By the way, how long can WSREP_TO_ISOLATION_BEGIN() take?
> > > Does it need to wait for everything else being replicated?
> >
> > As long as it takes to start it on all nodes in the cluster.
>
> So, KILL basically won't work when a cluster is starting.
> Can bf aborts happen during a cluster startup?
>

Not sure.

R: Jan
___
Mailing list: https://launchpad.net/~maria-developers
Post to : 

Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-06 Thread Sergei Golubchik
Hi, Jan!

On Oct 06, Jan Lindström wrote:
> >
> > > > > +/* This is wrapper for wsrep_break_lock in thr_lock.c */
> > > > > +static int wsrep_thr_abort_thd(void *bf_thd_ptr, void 
> > > > > *victim_thd_ptr, my_bool signal)
> > > > > +{
> > > > > +  THD* victim_thd= (THD *) victim_thd_ptr;
> > > > > +  /* We need to lock THD::LOCK_thd_data to protect victim
> > > > > +  from concurrent usage or disconnect or delete. */
> > > >
> > > > How do you know victim_thd wasn't deleted before you locked
> > > > LOCK_thd_data below?
> > >
> > > I must say the thr_lock code is not familiar to me but there are
> > > mysql_mutex_lock() calls to lock->mutex. After code review  it is not
> > > clear to me what that mutex is.
> >
> > where are mysql_mutex_lock() calls to lock->mutex?
> 
> mysys/thr_lock.c there is function thr_lock() there is call to
> mysql_mutex_lock(>mutex); this is before wsrep_break_lock where we
> call wsrep_thd_abort

this is for table locks. `lock` is `data->lock` where `data` is THR_LOCK
structure somewhere in the table share. It locks tables and handlers,
not threads. And InnoDB isn't using it at all anyway.

> 
> > > > >  if (victim_trx) {
> > > > > +wsrep_thd_UNLOCK(victim_thd);
> > > >
> > > > what keeps victim_trx from disappearing here?
> > >
> > > Nothing. Do you have suggestions ?
> >
> > A couple of thoughts:
> >
> > * Why do you unlock LOCK_thd_data here at all? I believed the whole
> > point of using TOI was to make sure that even if you lock mutexes in
> > a different order, it will not cause a deadlock.
> 
> This is DDL-case when we have a MDL-conflict not user KILL, we need to
> release it in my opinion because we need to take mutexes in
> lock_sys -> trx -> THD::LOCK_thd_data order,
> if I do not release I can see easily safe_mutex warnings

I don't understand. First, lock_sys and trx mutexes are not covered by
safe_mutex, so it cannot possibly warn about them.

Second, the reason for taking mutexes always in the same order is to
avoid deadlocks. And yor TOI trick should archieve that.

> > * You cannot pass a THD safely over a gap where no locks keep it in
> > existence. You can pass an integer, thread id, and use
> > find_thread_by_id later.
> 
> I have a suggestion for this:
> 
>  if (victim_trx) {
> +
> +   const trx_id_t victim_trx_id= victim_trx->id;
> +   const longlong victim_thread= thd_get_thread_id(victim_thd);
> lock_mutex_enter();
> -   trx_mutex_enter(victim_trx);
> -   wsrep_innobase_kill_one_trx(bf_thd, bf_trx, victim_trx, 
> signal);
> +   if (trx_t* victim= trx_rw_is_active(victim_trx_id, NULL, 
> true)) // If this succeeds, trx can not go away
> +   {
> +   trx_mutex_enter(victim);
> +   ut_ad(victim->mysql_thd ?
> +   thd_get_thread_id(victim->mysql_thd) == 
> victim_thread :
> +   1);
> +   wsrep_innobase_kill_one_trx(bf_thd, bf_trx, victim, 
> signal);
> +   trx_mutex_exit(victim);
> +   victim->release_reference(); // Now trx can go away 
> after we have released lock_sys
> +   wsrep_srv_conc_cancel_wait(victim);
> +   }
> lock_mutex_exit();
> -   trx_mutex_exit(victim_trx);
> -   wsrep_srv_conc_cancel_wait(victim_trx);
> DBUG_VOID_RETURN;

I don't understand, sorry. It's not quite clear what this patch should
apply to. May be you can paste what the new code should look like, not a
patch?

In particular, where are wsrep_thd_UNLOCK and wsrep_thd_LOCK. Are they
present in this new variant at all?

> 
> > By the way, how long can WSREP_TO_ISOLATION_BEGIN() take?
> > Does it need to wait for everything else being replicated?
> 
> As long as it takes to start it on all nodes in the cluster.

So, KILL basically won't work when a cluster is starting.
Can bf aborts happen during a cluster startup?

Regards,
Sergei
VP of MariaDB Server Engineering
and secur...@mariadb.org

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-06 Thread Jan Lindström
Hi Sergei,

Answers below:

>
> > > > +/* This is wrapper for wsrep_break_lock in thr_lock.c */
> > > > +static int wsrep_thr_abort_thd(void *bf_thd_ptr, void
> *victim_thd_ptr, my_bool signal)
> > > > +{
> > > > +  THD* victim_thd= (THD *) victim_thd_ptr;
> > > > +  /* We need to lock THD::LOCK_thd_data to protect victim
> > > > +  from concurrent usage or disconnect or delete. */
> > >
> > > How do you know victim_thd wasn't deleted before you locked
> > > LOCK_thd_data below?
> >
> > I must say the thr_lock code is not familiar to me but there are
> > mysql_mutex_lock() calls to lock->mutex. After code review  it is not
> > clear to me what that mutex is.
>
> where are mysql_mutex_lock() calls to lock->mutex?
>

 mysys/thr_lock.c there is function thr_lock() there is call to
mysql_mutex_lock(>mutex); this is before wsrep_break_lock where we
call wsrep_thd_abort


> > > >  if (victim_trx) {
> > > > +wsrep_thd_UNLOCK(victim_thd);
> > >
> > > what keeps victim_trx from disappearing here?
> >
> > Nothing. Do you have suggestions ?
>
> A couple of thoughts:
>
> * Why do you unlock LOCK_thd_data here at all? I believed the whole
>   point of using TOI was to make sure that even if you lock mutexes in a
>   different order, it will not cause a deadlock.
>

This is DDL-case when we have a MDL-conflict not user KILL, we need to
release it in my opinion because
we need to take mutexes in lock_sys -> trx -> THD::LOCK_thd_data order, if
I do not release I can see easily
safe_mutex warnings


>
> * You cannot pass a THD safely over a gap where no locks keep it in
>   existence. You can pass an integer, thread id, and use
>   find_thread_by_id later.
>

I have a suggestion for this:

 if (victim_trx) {
+
+   const trx_id_t victim_trx_id= victim_trx->id;
+   const longlong victim_thread= thd_get_thread_id(victim_thd);
lock_mutex_enter();
-   trx_mutex_enter(victim_trx);
-   wsrep_innobase_kill_one_trx(bf_thd, bf_trx, victim_trx,
signal);
+   if (trx_t* victim= trx_rw_is_active(victim_trx_id, NULL,
true)) // If this succeeds, trx can not go away
+   {
+   trx_mutex_enter(victim);
+   ut_ad(victim->mysql_thd ?
+   thd_get_thread_id(victim->mysql_thd) ==
victim_thread :
+   1);
+   wsrep_innobase_kill_one_trx(bf_thd, bf_trx, victim,
signal);
+   trx_mutex_exit(victim);
+   victim->release_reference(); // Now trx can go away
after we have released lock_sys
+   wsrep_srv_conc_cancel_wait(victim);
+   }
lock_mutex_exit();
-   trx_mutex_exit(victim_trx);
-   wsrep_srv_conc_cancel_wait(victim_trx);
DBUG_VOID_RETURN;

>
> By the way, how long can WSREP_TO_ISOLATION_BEGIN() take?
> Does it need to wait for everything else being replicated?
>

As long as it takes to start it on all nodes in the cluster.

R: Jan
___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-06 Thread Sergei Golubchik
Hi, Jan!

On Oct 04, Jan Lindström wrote:
> Hi Sergei,
> > > +/* This is wrapper for wsrep_break_lock in thr_lock.c */
> > > +static int wsrep_thr_abort_thd(void *bf_thd_ptr, void *victim_thd_ptr, 
> > > my_bool signal)
> > > +{
> > > +  THD* victim_thd= (THD *) victim_thd_ptr;
> > > +  /* We need to lock THD::LOCK_thd_data to protect victim
> > > +  from concurrent usage or disconnect or delete. */
> >
> > How do you know victim_thd wasn't deleted before you locked
> > LOCK_thd_data below?
> 
> I must say the thr_lock code is not familiar to me but there are
> mysql_mutex_lock() calls to lock->mutex. After code review  it is not
> clear to me what that mutex is.

where are mysql_mutex_lock() calls to lock->mutex?

> > >  if (victim_trx) {
> > > +wsrep_thd_UNLOCK(victim_thd);
> >
> > what keeps victim_trx from disappearing here?
> 
> Nothing. Do you have suggestions ?

A couple of thoughts:

* Why do you unlock LOCK_thd_data here at all? I believed the whole
  point of using TOI was to make sure that even if you lock mutexes in a
  different order, it will not cause a deadlock.

* You cannot pass a THD safely over a gap where no locks keep it in
  existence. You can pass an integer, thread id, and use
  find_thread_by_id later.

By the way, how long can WSREP_TO_ISOLATION_BEGIN() take?
Does it need to wait for everything else being replicated?

Regards,
Sergei
VP of MariaDB Server Engineering
and secur...@mariadb.org

___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-03 Thread Jan Lindström
Hi Sergei,

On Fri, Oct 1, 2021 at 9:05 PM Sergei Golubchik  wrote:

> Hi, Seppo, Jan!
>
> Note, this is 10.2 patch below.
>
> >
> > MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)
>
> I think this should say
>
>   MDEV-23328 Server hang due to Galera lock conflict resolution
>
>
Sure.


>
> > +WSREP_TO_ISOLATION_BEGIN(WSREP_MYSQL_DB, NULL, NULL)
>
> when can it fail and jump to wsrep_error_label?
> under what conditions?
>

At least when the node disconnects at a suitable moment.


> > +/* This is wrapper for wsrep_break_lock in thr_lock.c */
> > +static int wsrep_thr_abort_thd(void *bf_thd_ptr, void *victim_thd_ptr,
> my_bool signal)
> > +{
> > +  THD* victim_thd= (THD *) victim_thd_ptr;
> > +  /* We need to lock THD::LOCK_thd_data to protect victim
> > +  from concurrent usage or disconnect or delete. */
>
> How do you know victim_thd wasn't deleted before you locked LOCK_thd_data
> below?
>

I must say the thr_lock code is not familiar to me but there are
mysql_mutex_lock() calls
to lock->mutex. After code review  it is not clear to me what that mutex is.

>
> >  if (victim_trx) {
> > +wsrep_thd_UNLOCK(victim_thd);
>
> what keeps victim_trx from disappearing here?
>

Nothing. Do you have suggestions ?

R: Jan
___
Mailing list: https://launchpad.net/~maria-developers
Post to : maria-developers@lists.launchpad.net
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp


Re: [Maria-developers] 4b164f176e6: MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

2021-10-01 Thread Sergei Golubchik
Hi, Seppo, Jan!

Note, this is 10.2 patch below.

> commit 4b164f176e6
> Author: Seppo Jaakola 
> Date:   Wed Sep 15 09:16:44 2021 +0300
> 
> MDEV-25114 Crash: WSREP: invalid state ROLLED_BACK (FATAL)

I think this should say

  MDEV-23328 Server hang due to Galera lock conflict resolution

it'd be easier to understand the commit comment and code changes
if it'll refer to the correct problem it is fixing.

The fix for MDEV-25114 was the previous commit, the one that

  This reverts commit 29bbcac0ee841faaa68eeb09c86ff825eabbe6b6

it should have MDEV-25114 in the commit comment.

> 
> This patch is the plan D variant for fixing potetial mutex locking

"potential"

> order exercised by BF aborting and KILL command execution.
> 
> In this approach, KILL command is replicated as TOI operation.
> This guarantees total isolation for the KILL command execution
> in the first node: there is no concurrent replication applying
> and no concurrent DDL executing. Therefore there is no risk of
> BF aborting to happen in parallel with KILL command execution
> either. Potential mutex deadlocks between the different mutex
> access paths with KILL command execution and BF aborting cannot
> therefore happen.
> 
> TOI replication is used, in this approach,  purely as means
> to provide isolated KILL command execution in the first node.
> KILL command should not (and must not) be applied in secondary
> nodes. In this patch, we make this sure by skipping KILL
> execution in secondary nodes, in applying phase, where we
> bail out if applier thread is trying to execute KILL command.
> This is effective, but skipping the applying of KILL command
> could happen much earlier as well.

I like the idea of using TOI apparatus to provide total isolation for KILL.

Actually replicating it to nodes is kind of crude, it'd be more elegant
not to replicate at all. But it'll do.

> 
> This patch also fixes mutex locking order and unprotected
> THD member accesses on bf aborting case. We try to hold
> THD::LOCK_thd_data during bf aborting. Only case where it
> is not possible is at wsrep_abort_transaction before
> call wsrep_innobase_kill_one_trx where we take InnoDB
> mutexes first and then THD::LOCK_thd_data.
> 
> This will also fix possible race condition during
> close_connection and while wsrep is disconnecting
> connections.
> 
> Added wsrep_bf_kill_debug test case
> 
> Reviewed-by: Jan Lindström 
> 
> diff --git a/sql/sql_class.cc b/sql/sql_class.cc
> index 5ada018e540..05ec9a1c369 100644
> --- a/sql/sql_class.cc
> +++ b/sql/sql_class.cc
> @@ -1804,11 +1804,11 @@ void THD::awake(killed_state state_to_set)
>the Vio might be disassociated concurrently.
>  */
>  
> -void THD::disconnect()
> +void THD::disconnect_mutexed()

should be disconnect_no_mutex, cf. awake_no_mutex, set_killed_no_mutex.

>  {
>Vio *vio= NULL;
>  
> -  mysql_mutex_lock(_thd_data);
> +  mysql_mutex_assert_owner(_thd_data);
>  
>set_killed(KILL_CONNECTION);
>  
> diff --git a/sql/sql_parse.cc b/sql/sql_parse.cc
> index 3e1f248b082..2bec9c6b6cd 100644
> --- a/sql/sql_parse.cc
> +++ b/sql/sql_parse.cc
> @@ -9069,6 +9069,18 @@ static
>  void sql_kill(THD *thd, longlong id, killed_state state, killed_type type)
>  {
>uint error;
> +#ifdef WITH_WSREP
> +  if (WSREP(thd))
> +  {
> +WSREP_DEBUG("sql_kill called");
> +if (thd->wsrep_applier)
> +{
> +  WSREP_DEBUG("KILL in applying, bailing out here");
> +  return;
> +}
> +WSREP_TO_ISOLATION_BEGIN(WSREP_MYSQL_DB, NULL, NULL)

when can it fail and jump to wsrep_error_label?
under what conditions?

> +  }
> +#endif /* WITH_WSREP */
>if (!(error= kill_one_thread(thd, id, state, type)))
>{
>  if (!thd->killed)
> diff --git a/sql/wsrep_mysqld.cc b/sql/wsrep_mysqld.cc
> index e60100e2e90..00a6dfe2f8a 100644
> --- a/sql/wsrep_mysqld.cc
> +++ b/sql/wsrep_mysqld.cc
> @@ -835,13 +835,25 @@ void wsrep_thr_init()
>DBUG_VOID_RETURN;
>  }
>  
> +/* This is wrapper for wsrep_break_lock in thr_lock.c */
> +static int wsrep_thr_abort_thd(void *bf_thd_ptr, void *victim_thd_ptr, 
> my_bool signal)
> +{
> +  THD* victim_thd= (THD *) victim_thd_ptr;
> +  /* We need to lock THD::LOCK_thd_data to protect victim
> +  from concurrent usage or disconnect or delete. */

How do you know victim_thd wasn't deleted before you locked LOCK_thd_data
below?

> +  mysql_mutex_lock(_thd->LOCK_thd_data);
> +  int res= wsrep_abort_thd(bf_thd_ptr, victim_thd_ptr, signal);
> +  mysql_mutex_unlock(_thd->LOCK_thd_data);
> +  return res;
> +}
>
> diff --git a/storage/innobase/handler/ha_innodb.cc 
> b/storage/innobase/handler/ha_innodb.cc
> index 1f666f19d8a..921b8282e45 100644
> --- a/storage/innobase/handler/ha_innodb.cc
> +++ b/storage/innobase/handler/ha_innodb.cc
> @@ -19739,27 +19739,65 @@ wsrep_abort_transaction(
>  {
>