Hi,
I recently switched from a home-made external scheduler to the web2py-2.9.5
embedded scheduler.
Tasks are created at a rate of ~1000 tasks/minute.
About 30% of queue_task() calls end up with a MariaDB deadlock involving
the dead_workers_name subquery. For example:
------------------------
LATEST DETECTED DEADLOCK
------------------------
140820 13:46:36
*** (1) TRANSACTION:
TRANSACTION 14EF916DD, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1248, 13 row lock(s), undo log
entries 1
MySQL thread id 385713, OS thread handle 0x4ba93940, query id 14414018
<somehost> <someip> opensvc updating
DELETE FROM scheduler_worker WHERE (((scheduler_worker.last_heartbeat <
'2014-08-20 13:46:27') AND (scheduler_worker.status = 'ACTIVE')) OR
((scheduler_worker.last_heartbeat < '2014-08-20 13:44:21') AND
(scheduler_worker.status <> 'ACTIVE')))
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 262442 page no 3 n bits 104 index `PRIMARY` of table
`opensvc`.`scheduler_worker` trx id 14EF916DD lock_mode X waiting
*** (2) TRANSACTION:
TRANSACTION 14EF916DE, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 1
MySQL thread id 385737, OS thread handle 0x4bc49940, query id
14414021 <somehost> <someip> opensvc updating
DELETE FROM scheduler_worker WHERE (((scheduler_worker.last_heartbeat <
'2014-08-20 13:46:27') AND (scheduler_worker.status = 'ACTIVE')) OR
((scheduler_worker.last_heartbeat < '2014-08-20 13:44:21') AND
(scheduler_worker.status <> 'ACTIVE')))
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 262442 page no 3 n bits 104 index `PRIMARY` of table
`opensvc`.`scheduler_worker` trx id 14EF916DE lock_mode X locks rec but not
gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 262442 page no 3 n bits 104 index `PRIMARY` of table
`opensvc`.`scheduler_worker` trx id 14EF916DE lock_mode X waiting
*** WE ROLL BACK TRANSACTION (2)
I chose to add a DB round-trip to avoid these deadlocks, using the
following patch:
--- /tmp/scheduler.py 2014-08-20 14:02:22.000000000 +0200
+++ ../../gluon/scheduler.py 2014-08-20 13:44:41.000000000 +0200
@@ -946,7 +946,7 @@
((sw.last_heartbeat < expiration) & (sw.status == ACTIVE)) |
((sw.last_heartbeat < departure) & (sw.status != ACTIVE))
)
- dead_workers_name = dead_workers._select(sw.worker_name)
+ dead_workers_name = [r.worker_name for r in dead_workers.select(sw.worker_name)]
db(
(st.assigned_worker_name.belongs(dead_workers_name)) &
(st.status == RUNNING)
... which effectively works around the deadlock issue.
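To make the idea behind the patch concrete outside of the DAL: the original code passes the SQL text produced by _select() to belongs(), so the worker lookup runs as a subquery inside the statement on the task table, whereas the patched code runs the lookup first and passes a plain list of names. Below is a minimal sketch of the two-step variant using the standard-library sqlite3 module instead of MariaDB and the DAL. The function name, the demo helper, the reduced table layout and the values written by the UPDATE are assumptions for illustration only, not the scheduler's actual code.

```python
import sqlite3


def make_demo_db():
    """Hypothetical, heavily reduced versions of the two scheduler tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE scheduler_worker (
            worker_name TEXT, last_heartbeat TEXT, status TEXT);
        CREATE TABLE scheduler_task (
            task_name TEXT, assigned_worker_name TEXT, status TEXT);
        INSERT INTO scheduler_worker VALUES
            ('w_dead', '2014-08-20 13:40:00', 'ACTIVE'),
            ('w_live', '2014-08-20 13:46:35', 'ACTIVE');
        INSERT INTO scheduler_task VALUES
            ('t1', 'w_dead', 'RUNNING'),
            ('t2', 'w_live', 'RUNNING');
    """)
    return conn


def requeue_tasks_of_dead_workers(conn, expiration, departure):
    """Two-step variant: fetch the dead worker names, then update the tasks."""
    cur = conn.cursor()
    # Step 1 (the extra round-trip): materialize the names in Python.
    cur.execute(
        "SELECT worker_name FROM scheduler_worker "
        "WHERE (last_heartbeat < ? AND status = 'ACTIVE') "
        "OR (last_heartbeat < ? AND status <> 'ACTIVE') "
        "ORDER BY worker_name",
        (expiration, departure),
    )
    names = [row[0] for row in cur.fetchall()]
    if not names:
        return names
    # Step 2: the statement on scheduler_task carries a literal IN list,
    # so it no longer reads scheduler_worker itself.
    placeholders = ",".join("?" * len(names))
    cur.execute(
        "UPDATE scheduler_task "
        "SET status = 'QUEUED', assigned_worker_name = '' "  # assumed values
        "WHERE status = 'RUNNING' "
        "AND assigned_worker_name IN (%s)" % placeholders,
        names,
    )
    conn.commit()
    return names
```

Calling requeue_tasks_of_dead_workers(make_demo_db(), '2014-08-20 13:46:27', '2014-08-20 13:44:21') touches only the task assigned to the worker whose heartbeat is older than the expiration time. The trade-off is the same as in the patch: one more query per call, and the list of names is a snapshot that may be slightly stale by the time the second statement runs.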
Has anyone else faced this issue, and can you comment on the sanity of
this patch?
Would you consider merging it, or an alternative patch, so that I don't
have to forward-port it?
I can easily test alternative patches, so let me know if I can help.
Best regards,
Christophe Varoqui
OpenSVC