Re: [openstack-dev] [networking-sfc] Intermittent database transaction issues, affecting the tempest gate
After some research, I found that this review fixes the tempest failures:
https://review.openstack.org/#/c/416503/1
(the newer patchset adds an unrelated fix for the functional tests gate)

Multiple local tempest runs and gate rechecks all turned green with this fix. That is the good news. The bad news is that I am still not sure of the root cause. The code that triggers the problem is:
https://github.com/openstack/networking-sfc/blob/f5b52d5304796e44431b3874117aa0be91ed13d8/networking_sfc/services/sfc/drivers/ovs/db.py#L292

_get_port_detail() is just a wrapper around CommonDbMixin._get_by_id() from neutron, so could the problem be triggered by two _model_query() calls in a row?

I hope someone can shed some light on this. Next time it may not be as easy a fix as removing an unused line.

On 22 December 2016 at 20:48, Mike Bayer wrote:
>
> On 12/20/2016 06:50 PM, Cathy Zhang wrote:
>>
>> Hi Bernard,
>>
>> Thanks for the email. I will take a look at this. Xiaodong has been
>> working on tempest test scripts.
>> I will work with Xiaodong on this issue.
>
> I've added a comment to the issue which refers to upstream SQLAlchemy issue
> https://bitbucket.org/zzzeek/sqlalchemy/issues/3803 as a potential
> contributor, though looking at the logs linked from the issue it appears
> that database deadlocks are also occurring which may also be a precursor
> here. There are many improvements in SQLAlchemy 1.1 such that the
> "rollback()" state should not be as susceptible to a corrupted database
> connection as seems to be the case here.
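To make the call chain in Bernard's question concrete, here is a hypothetical, stdlib-only Python sketch of its shape: a get_port_detail() wrapper around a get_by_id() helper that builds its SELECT through model_query(). These names only mirror neutron's CommonDbMixin._get_by_id() and _model_query(); the real methods work on SQLAlchemy sessions and models, and the port_detail table is made up. The sketch only illustrates that each wrapper call issues its own independent query; it does not reproduce the gate failure.

```python
import sqlite3


class NotFound(Exception):
    """Hypothetical 'no matching row' error (stand-in for an ORM NoResultFound)."""


def model_query(table):
    # Hypothetical base query builder. `table` must be a trusted constant,
    # because SQL identifiers cannot be passed as bound parameters.
    return f"SELECT id, host FROM {table}"


def get_by_id(conn, table, row_id):
    # Mimics an ORM `.one()`: exactly one matching row, otherwise an error.
    rows = conn.execute(model_query(table) + " WHERE id = ?", (row_id,)).fetchall()
    if len(rows) != 1:
        raise NotFound(f"{table} {row_id!r}: {len(rows)} matching rows")
    return rows[0]


def get_port_detail(conn, port_id):
    # Thin wrapper, analogous in shape (not in code) to _get_port_detail().
    return get_by_id(conn, "port_detail", port_id)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE port_detail (id TEXT PRIMARY KEY, host TEXT)")
conn.execute("INSERT INTO port_detail VALUES ('p1', 'node-a')")
conn.commit()

# Two lookups in a row on the same connection: two independent SELECTs.
first = get_port_detail(conn, "p1")
second = get_port_detail(conn, "p1")
print(first, second)  # ('p1', 'node-a') ('p1', 'node-a')
```

In SQLAlchemy, running two such reads back to back in one session is routine, so if the second one fails it may be worth checking whether an earlier error had already left the session in a failed state.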
Re: [openstack-dev] [networking-sfc] Intermittent database transaction issues, affecting the tempest gate
On 12/20/2016 06:50 PM, Cathy Zhang wrote:
> Hi Bernard,
>
> Thanks for the email. I will take a look at this. Xiaodong has been
> working on tempest test scripts.
> I will work with Xiaodong on this issue.

I've added a comment to the issue which refers to upstream SQLAlchemy issue https://bitbucket.org/zzzeek/sqlalchemy/issues/3803 as a potential contributor, though looking at the logs linked from the issue it appears that database deadlocks are also occurring which may also be a precursor here. There are many improvements in SQLAlchemy 1.1 such that the "rollback()" state should not be as susceptible to a corrupted database connection as seems to be the case here.
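Mike's point that deadlocks may be a precursor relates to a common mitigation: retry the whole unit of work when it deadlocks, rolling back before each new attempt so that no retry inherits a half-failed transaction. Below is a minimal plain-Python sketch, assuming a hypothetical DeadlockError and a caller-supplied rollback callback. In OpenStack code, oslo.db already translates deadlocks into oslo_db.exception.DBDeadlock and provides a production decorator for this pattern, oslo_db.api.wrap_db_retry, which also adds a delay between attempts.

```python
import functools


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver-level deadlock error."""


def retry_on_deadlock(max_retries=3, rollback=None):
    """Re-run the decorated unit of work if it fails with DeadlockError.

    `rollback` (optional) is called after every failed attempt so the next
    attempt starts from a clean transaction. After `max_retries` retries
    the last DeadlockError is re-raised.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except DeadlockError:
                    if rollback is not None:
                        rollback()
                    if attempt == max_retries:
                        raise
        return wrapper
    return decorator


# Usage: a hypothetical operation that deadlocks twice, then succeeds.
rollbacks = []
calls = {"n": 0}


@retry_on_deadlock(max_retries=3, rollback=lambda: rollbacks.append("rollback"))
def update_port_chain():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlockError("simulated deadlock")
    return "ok"


print(update_port_chain(), calls["n"], len(rollbacks))  # ok 3 2
```

Retrying only makes sense if the decorated function opens its own transaction each time; retrying inside an outer transaction that has already failed would hit the same broken state again.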
Re: [openstack-dev] [networking-sfc] Intermittent database transaction issues, affecting the tempest gate
Hi Bernard,

Thanks for the email. I will take a look at this. Xiaodong has been working on tempest test scripts. I will work with Xiaodong on this issue.

Cathy

-----Original Message-----
From: Bernard Cafarelli [mailto:bcafa...@redhat.com]
Sent: Tuesday, December 20, 2016 3:00 AM
To: OpenStack Development Mailing List
Subject: [openstack-dev] [networking-sfc] Intermittent database transaction issues, affecting the tempest gate
[openstack-dev] [networking-sfc] Intermittent database transaction issues, affecting the tempest gate
Hi everyone,

we have an open bug (thanks Igor for the report) on DB transaction issues:
https://bugs.launchpad.net/networking-sfc/+bug/1630503

The thing is, I am seeing quite a few tempest gate failures that follow the same pattern: at some point in the test suite, the service gets warnings/errors from the DB layer (reentrant call, closed transaction, nested rollback, …), and all following tests fail.

This affects both master and stable/newton branches (there are not many changes in the DB parts between these branches for now).

Some examples:
* https://review.openstack.org/#/c/400396/ failed with console log
  http://logs.openstack.org/96/400396/2/check/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/c27920b/console.html#_2016-12-16_12_44_47_564544
  and service log
  http://logs.openstack.org/96/400396/2/check/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/c27920b/logs/screen-q-svc.txt.gz?level=WARNING#_2016-12-16_12_44_32_301
* https://review.openstack.org/#/c/405391/ failed,
  http://logs.openstack.org/91/405391/2/check/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/7e2b1de/console.html.gz#_2016-12-16_13_05_17_384323
  and
  http://logs.openstack.org/91/405391/2/check/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/7e2b1de/logs/screen-q-svc.txt.gz?level=WARNING#_2016-12-16_13_04_11_840
* another on master branch: https://review.openstack.org/#/c/411194/ with
  http://logs.openstack.org/94/411194/1/gate/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/90633de/console.html.gz#_2016-12-15_22_36_15_216260
  and
  http://logs.openstack.org/94/411194/1/gate/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/90633de/logs/screen-q-svc.txt.gz?level=WARNING#_2016-12-15_22_35_53_310

I took a look at the errors, but only found old-and-apparently-fixed pymysql bugs, and suggestions like:
* http://docs.sqlalchemy.org/en/latest/faq/sessions.html#this-session-s-transaction-has-been-rolled-back-due-to-a-previous-exception-during-flush-or-similar
* https://review.openstack.org/#/c/230481/

Not really my forte, so if someone could take a look at these logs and fix the problem, it would be great! Especially with the upcoming multinode tempest gate.

Thanks,
--
Bernard Cafarelli

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
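The SQLAlchemy FAQ entry linked in the report comes down to one rule: once a flush has failed, the Session must be rolled back explicitly (session.rollback()) before it can be used again. As a library-neutral sketch, the following uses Python's stdlib sqlite3 module rather than SQLAlchemy or neutron's DB API; the transaction() helper and the port_detail table are hypothetical. It shows a failed unit of work being rolled back as a whole, after which the connection is usable again.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(conn):
    """Commit on success; on any error roll back, then re-raise."""
    try:
        yield conn
        conn.commit()
    except Exception:
        # Without this rollback the failed transaction would stay open on
        # the connection, and the next caller would inherit its state.
        conn.rollback()
        raise


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE port_detail (id TEXT PRIMARY KEY, host TEXT)")
    with transaction(conn):
        conn.execute("INSERT INTO port_detail VALUES ('p1', 'node-a')")

    # This unit of work fails half-way: 'p2' is inserted, then the
    # duplicate 'p1' violates the primary key.
    try:
        with transaction(conn):
            conn.execute("INSERT INTO port_detail VALUES ('p2', 'node-b')")
            conn.execute("INSERT INTO port_detail VALUES ('p1', 'node-c')")
    except sqlite3.IntegrityError:
        pass

    # The rollback discarded 'p2', and the connection works normally again.
    with transaction(conn):
        conn.execute("INSERT INTO port_detail VALUES ('p3', 'node-d')")
    rows = conn.execute("SELECT id FROM port_detail ORDER BY id").fetchall()
    conn.close()
    return [row[0] for row in rows]


print(demo())  # ['p1', 'p3']
```

sqlite3's own "with conn:" block gives the same commit-or-rollback behavior; the explicit helper just makes the rollback step visible.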