Re: [openstack-dev] [Fuel][Nailgun] Random failures in unit tests

2016-03-23 Thread Mike Scherbakov
I finally got it to pass all the tests, including the performance ones:
https://review.openstack.org/#/c/294976/. I'd appreciate it if you guys could
review/land it sooner rather than later: the patch touches many tests, and it
would be beneficial for everyone to be based on the updated code.

Thanks,


Re: [openstack-dev] [Fuel][Nailgun] Random failures in unit tests

2016-03-21 Thread Mike Scherbakov
FakeUI, which is based on fake threads, is obviously needed for development
purposes.
Ideally we need to refactor our integration tests, so that we don't run the
whole pipeline in every test. To start, I suggest that we switch from
threads to synchronous runs of test cases (while keeping threads for
FakeUI).
Please take a look at this draft and comment:
https://review.openstack.org/#/c/294976/
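
To illustrate the idea (this is only a rough sketch, not the code from the
draft above), the same fake responder could be started as a thread for FakeUI
or run inline for tests. All names here (FakeResponder, start_fake_task, the
synchronous flag) are made up for the example:

```python
import threading


class FakeResponder(threading.Thread):
    """Hypothetical stand-in for a fake thread that emulates Astute."""

    def __init__(self, results, message):
        super().__init__()
        self.results = results
        self.message = message

    def run(self):
        # Pretend to process the RPC call and record the response.
        self.results.append({"task": self.message, "status": "ready"})


def start_fake_task(results, message, synchronous=False):
    """Run the responder inline for tests, or as a thread for FakeUI."""
    responder = FakeResponder(results, message)
    if synchronous:
        # Same code path, but executed in the caller's thread:
        # when this returns, the result is already recorded.
        responder.run()
    else:
        responder.start()
    return responder


results = []
start_fake_task(results, "deploy", synchronous=True)
# No join() or polling is needed before checking the outcome.
```

With synchronous=True the test and the responder share one thread, so there
is nothing to wait for; the threaded path stays available for FakeUI.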

Thanks,

-- 
Mike Scherbakov
#mihgen
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Fuel][Nailgun] Random failures in unit tests

2016-03-19 Thread Igor Kalnitsky
Hey Fuelers,

As you might know, we have recently encountered a lot of random test
failures on CI, and they are still there (though probably less often).
These failures are not actually random: they are caused by the
so-called fake threads.

Fake threads, actually, ain't fake at all. They are native OS threads
that are designed to emulate Astute behaviour (i.e. catch an RPC call
and respond with an appropriate message). Since they are native threads
and we use SQLAlchemy's scoped_session, fake threads use a separate
database session and hence a separate transaction. That leads to the
following issues:

* Races. We don't know when threads are switched, and therefore we
don't know what's committed and what's not. Some Nailgun tests send
something via RPC (caught by fake threads) and immediately check the
result. The issue is that we can't guarantee the fake thread has
already committed the result it produced. That could be avoided by
waiting for the 'ready' status of the created Nailgun task; however,
it's better to simply not use fake threads in that case and call the
appropriate Nailgun receiver's method directly in the test.

* Deadlocks. It's incredibly hard to ensure the same order of database
locks in test + business code on the one hand and fake thread code on
the other. That's why we can (and we do) encounter deadlocks on CI,
when a test case waits for a lock acquired by a fake thread, and the
fake thread waits for a lock acquired by the test case.
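
The race can be sketched outside of Nailgun. The snippet below is only an
illustration under loud assumptions: it uses the standard sqlite3 module with
two plain connections instead of SQLAlchemy sessions, and the tasks table and
the task_status helper are made up. One connection plays the test case, the
other plays the fake thread:

```python
import os
import sqlite3
import tempfile

# A file-based database, so that two connections share the same data.
db_path = os.path.join(tempfile.mkdtemp(), "tasks.db")

test_conn = sqlite3.connect(db_path)  # plays the test case
fake_conn = sqlite3.connect(db_path)  # plays the fake thread

test_conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT)")
test_conn.execute("INSERT INTO tasks (id, status) VALUES (1, 'running')")
test_conn.commit()


def task_status(conn):
    """Hypothetical helper: read the status of task 1 via this connection."""
    rows = conn.execute("SELECT status FROM tasks WHERE id = 1").fetchall()
    return rows[0][0]


# The "fake thread" updates the task inside its own transaction,
# but has not committed yet.
fake_conn.execute("UPDATE tasks SET status = 'ready' WHERE id = 1")
seen_before_commit = task_status(test_conn)

# Only after the commit does the other connection see the new status.
fake_conn.commit()
seen_after_commit = task_status(test_conn)

test_conn.close()
fake_conn.close()
```

A test that checks the status between the update and the commit sees the old
value, and with real threads nobody controls where that check lands.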

Fake threads have become a bottleneck for landing patches to master in
time, and we can't ignore it anymore. We have ~190 tests that use fake
threads, and fixing them all at once is a boring routine. So I kindly
ask Nailgun contributors to fix them as soon as we face them. Let's
file a bug on each failure in CI, and quickly prepare a separate patch
that removes fake threads from the failed test.

Thanks in advance,
Igor



Re: [openstack-dev] [Fuel][Nailgun] Random failures in unit tests

2016-03-19 Thread Igor Kalnitsky
Hey Vitaly,

Thanks for your feedback, it's an important point. However, I think
you didn't quite get the problem, so let me explain it again.

You see, Nailgun unit tests are failing due to races or deadlocks
caused by two transactions, the test transaction and the fake thread
transaction, and we must face it and fix it. This problem has nothing
to do with the problem you're encountering in UI tests. Besides,
removing fake threads from tests doesn't mean removing them from the
Nailgun code base.
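
To show what I mean by a deadlock between the two transactions, here is a
rough sketch. It is an analogy, not Nailgun code: plain threading.Lock objects
stand in for database row locks, and all names (lock_a, lock_b, worker,
outcome) are made up. The two sides take the same locks in opposite order:

```python
import threading

lock_a = threading.Lock()  # stands in for a row lock taken first by the test
lock_b = threading.Lock()  # stands in for a row lock taken first by the fake thread
sync = threading.Barrier(2)
outcome = {}


def worker(name, first, second):
    with first:
        # Wait until the other side also holds its first lock.
        sync.wait()
        # A real database would block here (or abort one transaction);
        # the timeout lets this sketch finish instead of hanging.
        got_second = second.acquire(timeout=0.2)
        outcome[name] = got_second
        # Keep holding the first lock until both sides have tried.
        sync.wait()
        if got_second:
            second.release()


test_case = threading.Thread(target=worker, args=("test", lock_a, lock_b))
fake_thread = threading.Thread(target=worker, args=("fake", lock_b, lock_a))
test_case.start()
fake_thread.start()
test_case.join()
fake_thread.join()
```

Neither side gets its second lock. If both took the locks in the same order,
one would simply wait for the other, but keeping that order identical across
test code, business code and fake thread code is the hard part.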

So your problem must be addressed, but it's kinda another story.

Thanks,
Igor



Re: [openstack-dev] [Fuel][Nailgun] Random failures in unit tests

2016-03-19 Thread Vitaly Kramskikh
Igor,

We have UI and CLI integration tests which use the fake mode of Nailgun, and
we can't avoid using fake threads for them. So I think we need to think about
how to fix fake threads instead. There is a critical bug which is the main
reason for the randomly failing UI tests. To fix it, we need to fix the fake
threads' behaviour.


-- 
Vitaly Kramskikh,
Fuel UI Tech Lead,
Mirantis, Inc.