Hi,
I get deadlocks after running more than one worker for a while; here is
the output for both occurrences:
10:29:07,903 [31505] Scheduler - ERROR - Error cleaning up
> 10:29:07,904 [31505] Scheduler - ERROR - (1213, u'Deadlock found when
> trying to get lock; try restarting transaction')
> 10:29:07,904 [31499] Scheduler - ERROR - Error retrieving status
> 10:29:07,904 [31499] Scheduler - ERROR - (1213, u'Deadlock found when
> trying to get lock; try restarting transaction')
>
Typical output on the database (MariaDB) console looks for example like this:
> ------------------------
> LATEST DETECTED DEADLOCK
> ------------------------
>
> ...
> LOCK WAIT 3 lock struct(s), heap size 1184, 2 row lock(s)
> ...
> UPDATE scheduler_worker SET status='ACTIVE',last_heartbeat='2016-10-27
> 10:10:22',worker_stats='{"status": "ACTIVE", "errors": 0, "workers": 0,
> "queue": 0, "empty_runs": 3, "sleep": 5, "distribution": null, "total":
> 152}' WHERE (scheduler_worker.worker_name =
> 'aradev2.ad-00.ent-01.adgroup#31499')
> ...
> DELETE FROM scheduler_worker WHERE (((scheduler_worker.last_heartbeat <
> '2016-10-27 10:10:07') AND (scheduler_worker.status = 'ACTIVE')) OR
> ((scheduler_worker.last_heartbeat < '2016-10-27 10:06:37') AND
> (scheduler_worker.status <> 'ACTIVE')))
>
It seems that two workers try to do the same thing at the same time on the
same scheduler_worker records.
I read the code of the send_heartbeat function and saw that it can indeed
happen that two or more workers run the same steps in parallel.
Anyway, with an extra db.commit() after the worker's status update inside
the send_heartbeat function in scheduler.py I could reduce the occurrences
of the 'Error cleaning up' case; the 'Error retrieving status' error
remains.
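As a further mitigation I am experimenting with simply retrying statements
that fail with error 1213, since MariaDB itself suggests "try restarting
transaction". A minimal, self-contained sketch of such a retry wrapper (the
OperationalError class and flaky_update are stand-ins for the real driver
exception and the real heartbeat call, not web2py code):

```python
import random
import time

DEADLOCK_ERRNO = 1213  # "Deadlock found when trying to get lock; try restarting transaction"

class OperationalError(Exception):
    """Stand-in for the DB driver's exception; args[0] carries the errno."""

def retry_on_deadlock(fn, retries=3, base_delay=0.05):
    """Run fn(); on error 1213 retry with a short, jittered backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except OperationalError as e:
            if e.args and e.args[0] == DEADLOCK_ERRNO and attempt < retries - 1:
                time.sleep(base_delay * (2 ** attempt) * random.random())
                continue
            raise

# Demo stand-in: deadlocks twice, then succeeds.
calls = {"n": 0}
def flaky_update():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OperationalError(1213, "Deadlock found when trying to get lock")
    return "ok"

print(retry_on_deadlock(flaky_update))  # prints "ok" after two retried deadlocks
```

In the real scheduler the wrapped fn would be the block that issues the
UPDATE or DELETE plus the following db.commit(), so a failed attempt is
rolled back before the retry.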
Here is the database output for this case:
------------------------
> LATEST DETECTED DEADLOCK
> ------------------------
> ...
> LOCK WAIT 3 lock struct(s), heap size 1184, 2 row lock(s)
> ...
> UPDATE scheduler_worker SET status='ACTIVE',last_heartbeat='2016-10-27
> 11:03:06',worker_stats='{"status": "ACTIVE", "errors": 0, "workers": 0,
> "queue": 0, "empty_runs": 4, "sleep": 5, "distribution": null, "total":
> 60}' WHERE (scheduler_worker.worker_name =
> 'aradev2.ad-00.ent-01.adgroup#14665')
> *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
> RECORD LOCKS space id 915 page no 3 n bits 80 index `PRIMARY` of table
> `arm`.`scheduler_worker` trx table locks 1 total table locks 3 trx id
> 14202929 lock_mode X locks rec but not gap waiting lock hold time 0 wait
> time before grant 0
> *** (2) TRANSACTION:
> TRANSACTION 14202927, ACTIVE 0 sec fetching rows
> mysql tables in use 1, locked 1
> 7 lock struct(s), heap size 1184, 47 row lock(s)
> MySQL thread id 60199, OS thread handle 0x7f73df089b00, query id 13827564
> aradev2.ad-00.ent-01.adgroup 10.220.43.234 armdba updating
> DELETE FROM scheduler_worker WHERE (((scheduler_worker.last_heartbeat <
> '2016-10-27 11:02:51') AND (scheduler_worker.status = 'ACTIVE')) OR
> ((scheduler_worker.last_heartbeat < '2016-10-27 10:59:21') AND
> (scheduler_worker.status <> 'ACTIVE')))
>
The first worker wants to update its status while the other worker tries to
delete obsolete workers in parallel.
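If it helps the discussion: one workaround I am considering is to let only
one worker run the cleanup DELETE at a time, guarded by a MariaDB advisory
lock (GET_LOCK/RELEASE_LOCK). A rough sketch, with a FakeDB standing in for
the DAL so it runs without a database; cleanup_once and the lock name are my
own, not web2py's API:

```python
class FakeDB:
    """Stand-in for web2py's DAL so the sketch runs without a database.
    A real db.executesql would send the SQL to MariaDB."""
    def __init__(self, lock_free=True):
        self.lock_free = lock_free
        self.sql_log = []

    def executesql(self, sql):
        self.sql_log.append(sql)
        if "GET_LOCK" in sql:
            # GET_LOCK returns 1 if acquired, 0 if another session holds it
            return [(1 if self.lock_free else 0,)]
        return [(1,)]

def cleanup_once(db, run_delete):
    """Run the obsolete-worker DELETE only if this process wins the
    advisory lock; other workers skip this round instead of colliding."""
    got = db.executesql("SELECT GET_LOCK('scheduler_worker_cleanup', 0)")[0][0]
    if got != 1:
        return False  # another worker is cleaning up right now
    try:
        run_delete()  # the DELETE FROM scheduler_worker ... from the log above
    finally:
        db.executesql("SELECT RELEASE_LOCK('scheduler_worker_cleanup')")
    return True
```

With the lock free the DELETE runs; with the lock held elsewhere the worker
just skips cleanup for that heartbeat, so the UPDATE and DELETE never race
on the same rows.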
Any idea what I should do?
Thx, Erwn
my environment
~~~~~~~~~~~~~~
web2py: 2.14.6
database: 10.1.14-MariaDB
---
You received this message because you are subscribed to the Google Groups
"web2py-users" group.