Hi,

I recently switched from a home-made external scheduler to the web2py-2.9.5 
embedded scheduler.

Tasks are created at a rate of ~1000 per minute.
About 30% of queue_task() calls end in a MariaDB deadlock involving
the dead_workers_name subquery. For example:

------------------------
LATEST DETECTED DEADLOCK
------------------------
140820 13:46:36
*** (1) TRANSACTION:
TRANSACTION 14EF916DD, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1248, 13 row lock(s), undo log 
entries 1
MySQL thread id 385713, OS thread handle 0x4ba93940, query id 14414018 
<somehost> <someip> opensvc updating
DELETE FROM scheduler_worker WHERE (((scheduler_worker.last_heartbeat < 
'2014-08-20 13:46:27') AND (scheduler_worker.status = 'ACTIVE')) OR 
((scheduler_worker.last_heartbeat < '2014-08-20 13:44:21') AND 
(scheduler_worker.status <> 'ACTIVE')))
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 262442 page no 3 n bits 104 index `PRIMARY` of table 
`opensvc`.`scheduler_worker` trx id 14EF916DD lock_mode X waiting
*** (2) TRANSACTION:
TRANSACTION 14EF916DE, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 1
MySQL thread id 385737, OS thread handle 0x4bc49940, query id 
14414021 <somehost> <someip> opensvc updating
DELETE FROM scheduler_worker WHERE (((scheduler_worker.last_heartbeat < 
'2014-08-20 13:46:27') AND (scheduler_worker.status = 'ACTIVE')) OR 
((scheduler_worker.last_heartbeat < '2014-08-20 13:44:21') AND 
(scheduler_worker.status <> 'ACTIVE')))
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 262442 page no 3 n bits 104 index `PRIMARY` of table 
`opensvc`.`scheduler_worker` trx id 14EF916DE lock_mode X locks rec but not 
gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 262442 page no 3 n bits 104 index `PRIMARY` of table 
`opensvc`.`scheduler_worker` trx id 14EF916DE lock_mode X waiting
*** WE ROLL BACK TRANSACTION (2)


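To avoid these deadlocks, I chose to add a db round-trip: the patch below
fetches the dead worker names with a separate select() instead of passing
the subquery built by _select() to belongs():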

--- /tmp/scheduler.py   2014-08-20 14:02:22.000000000 +0200
+++ ../../gluon/scheduler.py    2014-08-20 13:44:41.000000000 +0200
@@ -946,7 +946,7 @@
                         ((sw.last_heartbeat < expiration) & (sw.status == ACTIVE)) |
                         ((sw.last_heartbeat < departure) & (sw.status != ACTIVE))
                     )
-                    dead_workers_name = dead_workers._select(sw.worker_name)
+                    dead_workers_name = [r.worker_name for r in dead_workers.select(sw.worker_name)]
                     db(
                         (st.assigned_worker_name.belongs(dead_workers_name)) &
                         (st.status == RUNNING)


... which effectively works around the deadlock issue.
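
To make the difference between the two forms concrete, here is a minimal
sketch using Python's sqlite3 module in place of MariaDB and the DAL. It is
not the scheduler's code: the table layout is reduced to the columns named
in the patch, the dead-worker condition is simplified to a single heartbeat
cutoff, and the function names, the sample rows and the exact UPDATE are
assumptions made for illustration. It only shows that both forms requeue
the same tasks; it does not reproduce InnoDB locking.

```python
import sqlite3

EXPIRATION = "2014-08-20 13:46:27"  # heartbeat cutoff, as in the log above


def make_db():
    """Build an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scheduler_worker "
        "(worker_name TEXT, last_heartbeat TEXT, status TEXT)"
    )
    conn.execute(
        "CREATE TABLE scheduler_task "
        "(id INTEGER PRIMARY KEY, assigned_worker_name TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO scheduler_worker VALUES (?, ?, ?)",
        [
            ("w1", "2014-08-20 13:40:00", "ACTIVE"),  # stale heartbeat
            ("w2", "2014-08-20 13:46:30", "ACTIVE"),  # still alive
        ],
    )
    conn.executemany(
        "INSERT INTO scheduler_task VALUES (?, ?, ?)",
        [(1, "w1", "RUNNING"), (2, "w2", "RUNNING")],
    )
    return conn


def requeue_with_subquery(conn, expiration):
    """Single statement: the worker SELECT is nested inside the UPDATE,
    which is the shape produced by passing _select() to belongs()."""
    conn.execute(
        "UPDATE scheduler_task SET status = 'QUEUED', assigned_worker_name = '' "
        "WHERE status = 'RUNNING' AND assigned_worker_name IN ("
        "SELECT worker_name FROM scheduler_worker WHERE last_heartbeat < ?)",
        (expiration,),
    )


def requeue_with_round_trip(conn, expiration):
    """Two statements: fetch the names first, then update with a literal list,
    which is the shape produced by the patched select() version."""
    rows = conn.execute(
        "SELECT worker_name FROM scheduler_worker WHERE last_heartbeat < ?",
        (expiration,),
    ).fetchall()
    names = [row[0] for row in rows]
    if names:  # an empty IN () list is not valid SQL, so skip the update
        placeholders = ",".join("?" for _ in names)
        conn.execute(
            "UPDATE scheduler_task SET status = 'QUEUED', assigned_worker_name = '' "
            "WHERE status = 'RUNNING' AND assigned_worker_name IN (%s)" % placeholders,
            names,
        )
    return names
```

With the round-trip form, the UPDATE on scheduler_task no longer reads
scheduler_worker itself, at the cost of one extra query and a small window
between the two statements in which the worker table can change.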

Has anyone else faced this issue, and can you comment on the sanity of
this patch?
Would you consider merging it, or an alternative patch, so I don't have to
forward-port it?

I can easily test alternative patches, so let me know if I can help.

Best regards,
Christophe Varoqui
OpenSVC
