Folks,

We have been analyzing a number of random failures in Fuel tests and found several that were caused by the deadlock detector occasionally raising errors [1]. After our attempts to reproduce this behavior failed, we decided to run the same test suite on overloaded nodes. Those test runs allowed us to catch the same behavior we had seen on CI slaves. After analyzing both the PostgreSQL logs and Nailgun’s code, we found no reason for those deadlocks to occur.

Given these facts, we suspect that the random deadlocks occur when CI slaves are overloaded by other jobs and transactions start hitting the deadlock timeout. I therefore propose changing PostgreSQL’s deadlock_timeout from its default value (1 second) to 3-5 seconds. That will slow down tests if they run on an overloaded CI slave, but it will help avoid random, false-positive deadlock warnings.

References:

1. https://bugs.launchpad.net/fuel/+bug/1556070

- romcheg

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
