Folks,

We have been analyzing a number of random failures in Fuel tests and found 
several caused by the deadlock detector occasionally raising errors [1]. After 
our attempts to reproduce this behavior failed, we decided to run the same test 
suite on overloaded nodes. Those test runs let us catch the same behavior 
we had seen on CI slaves. After analyzing both the PostgreSQL logs and Nailgun’s 
code, we found no reason for those deadlocks to occur.

Given these facts, we came up with the idea that those random 
deadlocks occur when CI slaves are overloaded by other jobs and 
transactions start hitting the deadlock timeout. I therefore propose changing 
PostgreSQL’s deadlock_timeout from the default (1 second) to 3-5 seconds. That 
will slow down tests if they run on an overloaded CI slave, but it will help 
avoid random, false-positive deadlock warnings.
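
For illustration, here is a minimal sketch of how the change could be applied 
on a CI slave. The helper name deadlock_timeout_statements is hypothetical and 
not part of Fuel or Nailgun. It only builds the SQL, and it assumes PostgreSQL 
9.4 or newer (for ALTER SYSTEM) and a superuser connection that runs the 
statements outside a transaction block. Setting deadlock_timeout = 3s in 
postgresql.conf and reloading the server should have the same effect.

```python
def deadlock_timeout_statements(seconds):
    """Build SQL that sets deadlock_timeout server-wide (hypothetical helper).

    Assumes PostgreSQL 9.4+ and a superuser session; the caller is expected
    to execute the statements in order, outside a transaction block.
    """
    # Reject anything that is not a positive whole number of seconds.
    if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds < 1:
        raise ValueError("deadlock_timeout must be a positive whole number of seconds")
    return [
        # Persist the new value in postgresql.auto.conf.
        "ALTER SYSTEM SET deadlock_timeout = '{}s'".format(seconds),
        # Ask the server to re-read its configuration; no restart is needed.
        "SELECT pg_reload_conf()",
    ]


if __name__ == "__main__":
    # Print the statements for the lower bound of the proposed 3-5 s range.
    for statement in deadlock_timeout_statements(3):
        print(statement + ";")
```

The printed statements can be piped to psql, and SHOW deadlock_timeout can be 
used afterwards to confirm the value the server actually picked up.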


References:

1. https://bugs.launchpad.net/fuel/+bug/1556070


- romcheg
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
