Re: [openstack-dev] [Fuel] [Nailgun] Deadlocks and random test failures

2015-12-30 Thread Alexander Kislitsky
Hi, guys.

Igor is absolutely right - there are no deadlocks. We only have warnings
from the detector, and they are caused by a difference between the actual
locking order in the code and the order the detector allows. It is
annoying, but the detection is only enabled in the development environment,
so it is not a high-priority bug.
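
To illustrate the distinction (this is a hypothetical sketch, not Nailgun's actual detector code, and the lock names are made up): a detector like this only checks that locks are acquired in one agreed-upon order, so an out-of-order acquisition produces a warning even when no two transactions are actually blocked on each other:

```python
import threading

# Assumed global lock order for illustration; not Nailgun's real order.
ALLOWED_ORDER = ["clusters", "nodes", "tasks"]

class LockOrderDetector:
    """Warn when locks are taken out of the allowed order.

    A violation is only a warning: it means the code *could* deadlock
    against a transaction locking in the opposite order, not that it did.
    """

    def __init__(self, allowed_order):
        self._rank = {name: i for i, name in enumerate(allowed_order)}
        self._held = threading.local()

    def acquire(self, name):
        held = getattr(self._held, "names", [])
        warnings = []
        if held and self._rank[name] < self._rank[held[-1]]:
            warnings.append("%r acquired after %r violates the allowed order"
                            % (name, held[-1]))
        self._held.names = held + [name]
        return warnings

    def release_all(self):
        self._held.names = []

detector = LockOrderDetector(ALLOWED_ORDER)
print(detector.acquire("nodes"))     # no warning: order respected so far
print(detector.acquire("clusters"))  # warning: "clusters" ranks before "nodes"
```

An actual DB deadlock, by contrast, is reported by the database itself, which is why the ShareLock exception in the logs, not these warnings, is the thing to look for.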

When a real DB deadlock occurs, we get a ShareLock exception in the logs,
and it is raised before the detector warning. So if a deadlock happens, the
deadlock exception is always in the logs.

I think the test failures are caused by another issue. As far as I can see,
we have a set of random test failures now, with bugs filed for them:

https://bugs.launchpad.net/fuel/+bug/1437232
https://bugs.launchpad.net/fuel/+bug/1502908
https://bugs.launchpad.net/fuel/+bug/1518268
https://bugs.launchpad.net/fuel/+bug/1521966

We should focus on fixing these bugs. Could you please help us find the
root cause of the UI test failures? Maybe we have another floating bug in
the tests.
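
Since a genuine deadlock shows up as a database error (the ShareLock exception mentioned above), the usual mitigation is to retry the transaction that the database aborted as the deadlock victim. A minimal sketch of that pattern, with a stand-in exception class since this is not tied to any real driver or to Nailgun's code:

```python
import time

class DeadlockDetected(Exception):
    """Stand-in for a DB driver's deadlock error (e.g. what psycopg2
    raises when PostgreSQL aborts a transaction with 'deadlock detected')."""

def retry_on_deadlock(txn, attempts=3, delay=0.0):
    """Re-run a transactional callable if the DB aborts it as a deadlock
    victim. Hypothetical helper for illustration only."""
    for attempt in range(attempts):
        try:
            return txn()
        except DeadlockDetected:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(delay)

# Simulate a transaction that is killed as a deadlock victim once,
# then succeeds on retry.
calls = {"count": 0}

def flaky_transaction():
    calls["count"] += 1
    if calls["count"] == 1:
        raise DeadlockDetected("ShareLock on transaction 12345")
    return "committed"

print(retry_on_deadlock(flaky_transaction))  # committed
```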

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Fuel] [Nailgun] Deadlocks and random test failures

2015-12-30 Thread Igor Kalnitsky
Hey Vitaly,

Are you sure the problem is a deadlock? I see the deadlock detector
traceback, but not an actual deadlock.

I'm not sure whether it could be the reason for the failures or not; it's
better to ask Alexander Kislitsky.

Thanks,
Igor

On Wed, Dec 30, 2015 at 2:57 PM, Vitaly Kramskikh wrote:
> Hi,
>
> We have a long-standing issue with deadlocks in Nailgun which used to be
> almost harmless and caused only rare test failures. But the failures have
> become more frequent, and today there is a ~20% probability that the tests
> will fail on working code, which is really annoying. Moreover, a few weeks
> ago it started to affect the UI functional tests: the cluster reset task
> may hang, so this issue may now affect real deployments.
>
> I think we need to do something with it ASAP.
>
> --
> Vitaly Kramskikh,
> Fuel UI Tech Lead,
> Mirantis, Inc.


Re: [openstack-dev] [Fuel] [Nailgun] Deadlocks and random test failures

2015-12-30 Thread Vitaly Kramskikh
Alexander,

Thanks for the response.

As for the task hanging bug, I removed "deadlock" from its title: there is
another exception, which I hadn't spotted due to the huge number of traces
from the deadlock detector.

As for the random test failure bugs, I totally agree we should focus on
them. Just look at this: https://review.openstack.org/#/c/262183/ - I ran
"recheck" 5 times and rebased on master once, and still no luck: sometimes
I get -1 from Jenkins, sometimes from Fuel CI. I think this is a Critical
issue.



-- 
Vitaly Kramskikh,
Fuel UI Tech Lead,
Mirantis, Inc.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev