Hi,
I get deadlocks after running more than one worker for a while; here is
the output for both occurrences:
10:29:07,903 [31505] Scheduler - ERROR - Error cleaning up
> 10:29:07,904 [31505] Scheduler - ERROR - (1213, u'Deadlock found when
> trying to get lock; try restarting transaction')
> 10:29:07,904 [31499] Scheduler - ERROR - Error retrieving status
> 10:29:07,904 [31499] Scheduler - ERROR - (1213, u'Deadlock found when
> trying to get lock; try restarting transaction')
>
Typical output on the database (MariaDB) console looks for example like this:
> ------------------------
> LATEST DETECTED DEADLOCK
> ------------------------
>
> ...
> LOCK WAIT 3 lock struct(s), heap size 1184, 2 row lock(s)
> ...
> UPDATE scheduler_worker SET status='ACTIVE',last_heartbeat='2016-10-27
> 10:10:22',worker_stats='{"status": "ACTIVE", "errors": 0, "workers": 0,
> "queue": 0, "empty_runs": 3, "sleep": 5, "distribution": null, "total":
> 152}' WHERE (scheduler_worker.worker_name =
> 'aradev2.ad-00.ent-01.adgroup#31499')
> ...
> DELETE FROM scheduler_worker WHERE (((scheduler_worker.last_heartbeat <
> '2016-10-27 10:10:07') AND (scheduler_worker.status = 'ACTIVE')) OR
> ((scheduler_worker.last_heartbeat < '2016-10-27 10:06:37') AND
> (scheduler_worker.status <> 'ACTIVE')))
>
It seems that two workers try to do the same thing at the same time on the
same scheduler_worker records.
I read the code of the send_heartbeat function and saw that it can indeed
happen that two or more workers run the same steps in parallel.
Anyway, with an extra db.commit() after the worker's status update inside
the send_heartbeat function in scheduler.py I could reduce the occurrences
of the 'Error cleaning up' case; the 'Error retrieving status' error
remains.
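As a further mitigation I am experimenting with simply retrying statements
that fail with error 1213, since MariaDB itself suggests "try restarting
transaction". A minimal, self-contained sketch of such a retry wrapper (the
OperationalError class and flaky_update are stand-ins for the real driver
exception and the real heartbeat call, not web2py code):

```python
import random
import time

DEADLOCK_ERRNO = 1213  # "Deadlock found when trying to get lock; try restarting transaction"

class OperationalError(Exception):
    """Stand-in for the DB driver's exception; args[0] carries the errno."""

def retry_on_deadlock(fn, retries=3, base_delay=0.05):
    """Run fn(); on error 1213 retry with a short, jittered backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except OperationalError as e:
            if e.args and e.args[0] == DEADLOCK_ERRNO and attempt < retries - 1:
                time.sleep(base_delay * (2 ** attempt) * random.random())
                continue
            raise

# Demo stand-in: deadlocks twice, then succeeds.
calls = {"n": 0}
def flaky_update():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OperationalError(1213, "Deadlock found when trying to get lock")
    return "ok"

print(retry_on_deadlock(flaky_update))  # prints "ok" after two retried deadlocks
```

In the real scheduler the wrapped fn would be the block that issues the
UPDATE or DELETE plus the following db.commit(), so a failed attempt is
rolled back before the retry.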
Here is the database output for this case:
------------------------
> LATEST DETECTED DEADLOCK
> ------------------------
> ...
> LOCK WAIT 3 lock struct(s), heap size 1184, 2 row lock(s)
> ...
> UPDATE scheduler_worker SET status='ACTIVE',last_heartbeat='2016-10-27
> 11:03:06',worker_stats='{"status": "ACTIVE", "errors": 0, "workers": 0,
> "queue": 0, "empty_runs": 4, "sleep": 5, "distribution": null, "total":
> 60}' WHERE (scheduler_worker.worker_name =
> 'aradev2.ad-00.ent-01.adgroup#14665')
> *** (1) WAITING FOR THIS LOCK TO BE GRANTED:
> RECORD LOCKS space id 915 page no 3 n bits 80 index `PRIMARY` of table
> `arm`.`scheduler_worker` trx table locks 1 total table locks 3 trx id
> 14202929 lock_mode X locks rec but not gap waiting lock hold time 0 wait
> time before grant 0
> *** (2) TRANSACTION:
> TRANSACTION 14202927, ACTIVE 0 sec fetching rows
> mysql tables in use 1, locked 1
> 7 lock struct(s), heap size 1184, 47 row lock(s)
> MySQL thread id 60199, OS thread handle 0x7f73df089b00, query id 13827564
> aradev2.ad-00.ent-01.adgroup 10.220.43.234 armdba updating
> DELETE FROM scheduler_worker WHERE (((scheduler_worker.last_heartbeat <
> '2016-10-27 11:02:51') AND (scheduler_worker.status = 'ACTIVE')) OR
> ((scheduler_worker.last_heartbeat < '2016-10-27 10:59:21') AND
> (scheduler_worker.status <> 'ACTIVE')))
>
The first worker wants to update its status while the other worker tries to
delete obsolete workers in parallel.
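If it helps the discussion: one workaround I am considering is to let only
one worker run the cleanup DELETE at a time, guarded by a MariaDB advisory
lock (GET_LOCK/RELEASE_LOCK). A rough sketch, with a FakeDB standing in for
the DAL so it runs without a database; cleanup_once and the lock name are my
own, not web2py's API:

```python
class FakeDB:
    """Stand-in for web2py's DAL so the sketch runs without a database.
    A real db.executesql would send the SQL to MariaDB."""
    def __init__(self, lock_free=True):
        self.lock_free = lock_free
        self.sql_log = []

    def executesql(self, sql):
        self.sql_log.append(sql)
        if "GET_LOCK" in sql:
            # GET_LOCK returns 1 if acquired, 0 if another session holds it
            return [(1 if self.lock_free else 0,)]
        return [(1,)]

def cleanup_once(db, run_delete):
    """Run the obsolete-worker DELETE only if this process wins the
    advisory lock; other workers skip this round instead of colliding."""
    got = db.executesql("SELECT GET_LOCK('scheduler_worker_cleanup', 0)")[0][0]
    if got != 1:
        return False  # another worker is cleaning up right now
    try:
        run_delete()  # the DELETE FROM scheduler_worker ... from the log above
    finally:
        db.executesql("SELECT RELEASE_LOCK('scheduler_worker_cleanup')")
    return True
```

With the lock free the DELETE runs; with the lock held elsewhere the worker
just skips cleanup for that heartbeat, so the UPDATE and DELETE never race
on the same rows.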
Any idea what I should do?
Thx, Erwn
my environment
~~~~~~~~~~~~~~
web2py: 2.14.6
database: 10.1.14-MariaDB
---
You received this message because you are subscribed to the Google Groups
"web2py-users" group.