Re: [openstack-dev] Lots of slow tests timing out jobs

2018-07-26 Thread Matt Riedemann

On 7/25/2018 1:46 AM, Ghanshyam Mann wrote:

Based on the average times (currently the 14-day averages), I have voted on the 
ethercalc for which tests to mark as slow. I used a criterion of >120 sec 
average time. Once more people have voted there, we can mark them slow.

[3] https://ethercalc.openstack.org/dorupfz6s9qt


I've made my votes for the compute-specific tests along with 
justification either way on each one.


--

Thanks,

Matt



Re: [openstack-dev] Lots of slow tests timing out jobs

2018-07-25 Thread Ghanshyam Mann



  On Wed, 25 Jul 2018 22:22:24 +0900, Matt Riedemann wrote:
 > On 7/25/2018 1:46 AM, Ghanshyam Mann wrote:
 > > Yeah, there are many tests taking too long. I do not know the reason
 > > this time, but the last time we audited slow tests it was mainly due
 > > to ssh failures.
 > > I have created a similar ethercalc [3] to collect the time-consuming
 > > tests along with rough figures for their average run time over the
 > > last 14 days from the health dashboard. Note that o-h does not show a
 > > calculated average, so these are rounded estimates rather than exact
 > > averages.
 > > 
 > > Maybe 14 days is too short a period to decide to mark them slow, but
 > > I think their average time over 3 months will be about the same.
 > > Should we consider a 3-month period for those?
 > > 
 > > Based on the average times (currently the 14-day averages), I have
 > > voted on the ethercalc for which tests to mark as slow. I used a
 > > criterion of >120 sec average time. Once more people have voted
 > > there, we can mark them slow.
 > > 
 > > [3] https://ethercalc.openstack.org/dorupfz6s9qt
 > 
 > Thanks for this. I haven't gone through all of the tests in there yet, 
 > but noticed (yesterday) a couple of them were personality file compute 
 > API tests, which I thought was strange. Do we have any idea where the 
 > time is being spent there? I assume it must be something with ssh 
 > validation to try and read injected files off the guest. I need to dig 
 > into this one a bit more because by default, file injection is disabled 
 > in the libvirt driver so I'm not even sure how these are running (or 
 > really doing anything useful). 

That is set to True explicitly in the tempest-full job [1], and then devstack 
sets it to True for nova.

 > Given we have deprecated personality 
 > files in the compute API [1] I would definitely mark those as slow tests 
 > so we can still run them but don't care about them as much.

Makes sense, +1.


[1] http://git.openstack.org/cgit/openstack/tempest/tree/.zuul.yaml#n56

-gmann
 > 
 > [1] 
 > https://docs.openstack.org/nova/latest/reference/api-microversion-history.html#id52
 > 
 > -- 
 > 
 > Thanks,
 > 
 > Matt
 > 





Re: [openstack-dev] Lots of slow tests timing out jobs

2018-07-25 Thread Matt Riedemann

On 7/25/2018 1:46 AM, Ghanshyam Mann wrote:

Yeah, there are many tests taking too long. I do not know the reason this time, 
but the last time we audited slow tests it was mainly due to ssh failures.
I have created a similar ethercalc [3] to collect the time-consuming tests along 
with rough figures for their average run time over the last 14 days from the 
health dashboard. Note that o-h does not show a calculated average, so these are 
rounded estimates rather than exact averages.

Maybe 14 days is too short a period to decide to mark them slow, but I think 
their average time over 3 months will be about the same. Should we consider a 
3-month period for those?

Based on the average times (currently the 14-day averages), I have voted on the 
ethercalc for which tests to mark as slow. I used a criterion of >120 sec 
average time. Once more people have voted there, we can mark them slow.

[3] https://ethercalc.openstack.org/dorupfz6s9qt


Thanks for this. I haven't gone through all of the tests in there yet, 
but noticed (yesterday) a couple of them were personality file compute 
API tests, which I thought was strange. Do we have any idea where the 
time is being spent there? I assume it must be something with ssh 
validation to try and read injected files off the guest. I need to dig 
into this one a bit more because by default, file injection is disabled 
in the libvirt driver so I'm not even sure how these are running (or 
really doing anything useful). Given we have deprecated personality 
files in the compute API [1] I would definitely mark those as slow tests 
so we can still run them but don't care about them as much.


[1] 
https://docs.openstack.org/nova/latest/reference/api-microversion-history.html#id52
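
For anyone not familiar with these tests: "personality" files are base64-encoded 
file contents passed in the server create (or rebuild) request so that nova 
injects them into the guest. A rough, purely illustrative sketch of such a 
request, with made-up endpoint, token and UUIDs, pinned to an old microversion 
so the field is still accepted:

# Illustration only: a server-create body using the deprecated
# "personality" field.  Endpoint, token and UUIDs are placeholders.
import base64

import requests

NOVA_ENDPOINT = "http://controller:8774/v2.1"  # hypothetical
TOKEN = "gAAAA..."                             # hypothetical keystone token

body = {
    "server": {
        "name": "personality-demo",
        "imageRef": "IMAGE_UUID",
        "flavorRef": "FLAVOR_UUID",
        "personality": [{
            "path": "/etc/motd",
            "contents": base64.b64encode(b"injected via the API\n").decode(),
        }],
    }
}

# Pin an old microversion so the personality field is still accepted
# (it was removed as part of the deprecation referenced above).
resp = requests.post(
    NOVA_ENDPOINT + "/servers",
    json=body,
    headers={"X-Auth-Token": TOKEN,
             "X-OpenStack-Nova-API-Version": "2.1"})
resp.raise_for_status()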


--

Thanks,

Matt



Re: [openstack-dev] Lots of slow tests timing out jobs

2018-07-25 Thread Ghanshyam Mann
  On Wed, 25 Jul 2018 05:15:53 +0900, Matt Riedemann wrote:
 > While going through our uncategorized gate failures [1] I found that we 
 > have a lot of jobs failing (161 in 7 days) due to the tempest run timing 
 > out [2]. I originally thought it was just the networking scenario tests, 
 > but I was able to identify a handful of API tests that are also taking 
 > nearly 3 minutes each, which seems like they should be moved to scenario 
 > tests and/or marked slow so they can be run in a dedicated tempest-slow job.
 > 
 > I'm not sure how to get the history on the longest-running tests on 
 > average to determine where to start drilling down on the worst 
 > offenders, but it seems like an audit is in order.

Yeah, there are many tests taking too long. I do not know the reason this time, 
but the last time we audited slow tests it was mainly due to ssh failures.
I have created a similar ethercalc [3] to collect the time-consuming tests along 
with rough figures for their average run time over the last 14 days from the 
health dashboard. Note that o-h does not show a calculated average, so these are 
rounded estimates rather than exact averages.

Maybe 14 days is too short a period to decide to mark them slow, but I think 
their average time over 3 months will be about the same. Should we consider a 
3-month period for those?

Based on the average times (currently the 14-day averages), I have voted on the 
ethercalc for which tests to mark as slow. I used a criterion of >120 sec 
average time. Once more people have voted there, we can mark them slow.

[3] https://ethercalc.openstack.org/dorupfz6s9qt
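
For what it's worth, this is roughly the check I did by hand; a minimal sketch, 
assuming the per-run durations have been exported into a simple two-column CSV 
(test id, seconds) with no header row, since openstack-health does not give the 
average directly:

# Hypothetical helper, not something that exists in openstack-health:
# flag tests whose average duration exceeds a threshold, worst first.
# Assumes a CSV with one "test_id,duration_seconds" row per run and no
# header row.
import csv
from collections import defaultdict

THRESHOLD_SECONDS = 120.0  # the >120 sec criterion used on the ethercalc


def slow_candidates(csv_path, threshold=THRESHOLD_SECONDS):
    durations = defaultdict(list)
    with open(csv_path, newline="") as f:
        for test_id, seconds in csv.reader(f):
            durations[test_id].append(float(seconds))
    averages = ((t, sum(d) / len(d)) for t, d in durations.items())
    return sorted((item for item in averages if item[1] > threshold),
                  key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for test_id, avg in slow_candidates("test_durations_14d.csv"):
        print("%7.1fs  %s" % (avg, test_id))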

-gmann

 > 
 > [1] http://status.openstack.org/elastic-recheck/data/integrated_gate.html
 > [2] https://bugs.launchpad.net/tempest/+bug/1783405
 > 
 > -- 
 > 
 > Thanks,
 > 
 > Matt
 > 





[openstack-dev] Lots of slow tests timing out jobs

2018-07-24 Thread Matt Riedemann
While going through our uncategorized gate failures [1] I found that we 
have a lot of jobs failing (161 in 7 days) due to the tempest run timing 
out [2]. I originally thought it was just the networking scenario tests, 
but I was able to identify a handful of API tests that are also taking 
nearly 3 minutes each, which seems like they should be moved to scenario 
tests and/or marked slow so they can be run in a dedicated tempest-slow job.
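
For reference, marking a test slow is just a matter of tagging it with the slow 
attribute so the regular full run excludes it and the dedicated slow job picks 
it up; a minimal sketch, where the class, test body and idempotent id are 
placeholders:

# Sketch of how a Tempest test gets tagged as slow; only the decorator
# usage reflects real Tempest conventions, the class and test body are
# placeholders.
from tempest.lib import decorators
from tempest.scenario import manager


class TestSomethingExpensive(manager.ScenarioTest):

    @decorators.attr(type='slow')
    @decorators.idempotent_id('00000000-0000-0000-0000-000000000000')
    def test_long_running_scenario(self):
        # Boot a server, wait for SSH, poke at it, etc.  Anything that
        # routinely takes minutes belongs behind the slow attribute.
        pass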


I'm not sure how to get the history on the longest-running tests on 
average to determine where to start drilling down on the worst 
offenders, but it seems like an audit is in order.


[1] http://status.openstack.org/elastic-recheck/data/integrated_gate.html
[2] https://bugs.launchpad.net/tempest/+bug/1783405

--

Thanks,

Matt
