Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-22 Thread Assaf Muller
On Tue, Mar 22, 2016 at 9:31 AM, Kevin Benton  wrote:
> Thanks for doing this. I dug into the test_volume_boot_pattern test to see
> what was going on.
>
> On the first boot, Nova called Neutron to create a port at 23:29:44 and it
> took 441ms to return the port to Nova.[1]
> Nova then plugged the interface for that port into OVS a little over 6
> seconds later at 23:29:50.[2]
> The Neutron agent attempted to process this on the iteration at 23:29:52
> [3]; however, it didn't get the ofport populated from the OVSDB monitor... a
> bug![4] The Neutron agent did catch it on the next iteration two seconds
> later on a retry and notified the Neutron server at 23:29:54.[5]

Good work as usual, Kevin; I just approved the fix for this bug.

> The Neutron server processed the port ACTIVE change in just under 80ms[6],
> but it did not dispatch the notification to Nova until 2 seconds later at
> 23:29:56 [7] due to the Nova notification batching mechanism[8].
>
> Total time between port create and boot is about 12 seconds. 6 in Nova and 6
> in Neutron.
>
> For the Neutron side, the bug fix should eliminate 2 seconds. We could
> probably make the Nova notifier batching mechanism a little more aggressive
> so it only batches up calls in a very short interval rather than making
> 2-second buckets at all times. The remaining 2 seconds is just the agent
> processing loop interval, which can be tuned with a config option, but it
> should be okay if that's the only bottleneck.
>
> For Nova, we need to improve that 6 seconds between creating the Neutron
> port and plugging it into the vswitch. I can see it makes some other calls
> to Neutron in this time to list security groups and floating IPs. Maybe
> this can be done asynchronously, because I don't think they should block
> the initial VM boot phase that plugs in the VIF.
>
> Completely unrelated to the boot process, the entire tempest run spent ~412
> seconds building and destroying Neutron resources in setup and teardown.[9]
> However, considering the number of tests executed, this seems reasonable so
> I'm not sure we need to work on optimizing that yet.
>
>
> Cheers,
> Kevin Benton
>
>
> 1.
> http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-q-svc.txt.gz#_2016-03-21_23_29_45_341
> 2.
> http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-n-cpu.txt.gz#_2016-03-21_23_29_50_629
> 3.
> http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-q-agt.txt.gz#_2016-03-21_23_29_52_216
> 4. https://bugs.launchpad.net/neutron/+bug/1560464
> 5.
> http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-q-agt.txt.gz#_2016-03-21_23_29_54_738
> 6.
> http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-q-svc.txt.gz#_2016-03-21_23_29_54_813
> 7.
> http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-q-svc.txt.gz#_2016-03-21_23_29_56_782
> 8.
> http://git.openstack.org/cgit/openstack/neutron/tree/neutron/notifiers/nova.py
> 9. egrep -R 'tearDown|setUp' tempest.txt.gz | grep 9696 | awk '{print
> $(NF)}' | ./fsum
>
> On Mon, Mar 21, 2016 at 5:09 PM, Clark Boylan  wrote:
>>
>> On Mon, Mar 21, 2016, at 01:23 PM, Sean Dague wrote:
>> > On 03/21/2016 04:09 PM, Clark Boylan wrote:
>> > > On Mon, Mar 21, 2016, at 11:49 AM, Clark Boylan wrote:
>> > >> On Mon, Mar 21, 2016, at 11:08 AM, Clark Boylan wrote:
>> > >>> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote:
>> >  Do you have a better insight into job runtimes vs jobs in other
>> >  projects?
>> >  Most of the time in the job runtime is actually spent setting the
>> >  infrastructure up, and I am not sure we can do anything about it,
>> >  unless we take this up with Infra.
>> > >>>
>> > >>> I haven't done a comparison yet but let's break down the runtime of
>> > >>> a recent successful neutron full run against neutron master [0].
>> > >>
>> > >> And now for some comparative data from the gate-tempest-dsvm-full job
>> > >> [0]. This job also ran against a master change that merged and ran in
>> > >> the same cloud and region as the neutron job.
>> > >>
>> > > snip
>> > >> Generally each step of this job was quicker. There were big
>> > >> differences in devstack and tempest run time though. Is devstack much
>> > >> slower to set up neutron when compared to nova net? For tempest it
>> > >> looks like we run ~1510 tests against neutron and only ~1269 against
>> > >> nova net. This may account for the large difference there. I also
>> > >> recall that we run ipv6 tempest tests against neutron deployments that
>> > >> were inefficient and booted 2 qemu VMs per test (not sure if that is
>> > >> still the case, but it illustrates that the tests themselves may not
>> > >> be very quick in the neutron case).

Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-22 Thread Kevin Benton
Thanks for doing this. I dug into the test_volume_boot_pattern test to see
what was going on.

On the first boot, Nova called Neutron to create a port at 23:29:44 and it
took 441ms to return the port to Nova.[1]
Nova then plugged the interface for that port into OVS a little over 6
seconds later at 23:29:50.[2]
The Neutron agent attempted to process this on the iteration at 23:29:52
[3]; however, it didn't get the ofport populated from the OVSDB monitor...
a bug![4] The Neutron agent did catch it on the next iteration two seconds
later on a retry and notified the Neutron server at 23:29:54.[5]
The Neutron server processed the port ACTIVE change in just under 80ms[6],
but it did not dispatch the notification to Nova until 2 seconds later at
23:29:56 [7] due to the Nova notification batching mechanism[8].

Total time between port create and boot is about 12 seconds. 6 in Nova and
6 in Neutron.
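
To make the arithmetic explicit, the cited timestamps line up as follows
(all taken from the logs referenced below):

  23:29:44  Nova asks Neutron to create the port (441ms round trip) [1]
  23:29:50  Nova plugs the VIF into OVS, ~6 seconds later [2]
  23:29:52  agent iteration sees the port but misses the ofport (the bug) [3][4]
  23:29:54  agent retry succeeds and the Neutron server is notified [5][6]
  23:29:56  server dispatches the ACTIVE notification to Nova [7]

That is ~6 seconds on the Nova side (create to plug) and ~6 seconds on the
Neutron side (2s detection + 2s bug retry + 2s notification batching).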

For the Neutron side, the bug fix should eliminate 2 seconds. We could
probably make the Nova notifier batching mechanism a little more aggressive
so it only batches up calls in a very short interval rather than making
2-second buckets at all times. The remaining 2 seconds is just the agent
processing loop interval, which can be tuned with a config option, but it
should be okay if that's the only bottleneck.
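
To illustrate the mechanism being described, here is a minimal, hypothetical
sketch of fixed-interval batching in Python. It is not the actual
neutron.notifiers.nova code linked in [8], just the shape of it; all names
are made up:

    import threading

    class BatchNotifier(object):
        """Queue events and flush them to a callback every `interval` seconds."""

        def __init__(self, interval, callback):
            self.interval = interval
            self.callback = callback
            self.pending = []
            self.lock = threading.Lock()

        def queue_event(self, event):
            with self.lock:
                self.pending.append(event)
                starts_bucket = len(self.pending) == 1
            if starts_bucket:
                # The first event into an empty bucket starts a flush timer,
                # so any event can wait up to `interval` seconds before it is
                # sent; interval=2 gives the 2-second buckets described above.
                threading.Timer(self.interval, self._flush).start()

        def _flush(self):
            with self.lock:
                batch, self.pending = self.pending, []
            if batch:
                self.callback(batch)

Making this "more aggressive" would mean a much smaller interval, or sending
the first event immediately and batching only the follow-ups. The remaining
agent-side 2 seconds maps to the OVS agent's polling_interval option, which
defaults to 2 seconds.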

For Nova, we need to improve that 6 seconds between creating the Neutron
port and plugging it into the vswitch. I can see it makes some other calls
to Neutron in this time to list security groups and floating IPs. Maybe
this can be done asynchronously, because I don't think they should block
the initial VM boot phase that plugs in the VIF.
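
As a sketch of the asynchronous idea, with hypothetical helper names and a
thread pool standing in for Nova's actual eventlet-based concurrency:

    from concurrent.futures import ThreadPoolExecutor

    def gather_network_info(neutron_client, port_id):
        # Issue both Neutron listings in parallel rather than serially so
        # that neither blocks the boot phase that plugs in the VIF.
        with ThreadPoolExecutor(max_workers=2) as pool:
            sgs = pool.submit(neutron_client.list_security_groups)
            fips = pool.submit(neutron_client.list_floating_ips, port_id=port_id)
            return sgs.result(), fips.result()

Whether those results are needed before the VIF plug at all is the real
question; if not, they could be deferred entirely rather than parallelized.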

Completely unrelated to the boot process, the entire tempest run spent ~412
seconds building and destroying Neutron resources in setup and teardown.[9]
However, considering the number of tests executed, this seems reasonable so
I'm not sure we need to work on optimizing that yet.


Cheers,
Kevin Benton


1.
http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-q-svc.txt.gz#_2016-03-21_23_29_45_341
2.
http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-n-cpu.txt.gz#_2016-03-21_23_29_50_629
3.
http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-q-agt.txt.gz#_2016-03-21_23_29_52_216
4. https://bugs.launchpad.net/neutron/+bug/1560464
5.
http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-q-agt.txt.gz#_2016-03-21_23_29_54_738
6.
http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-q-svc.txt.gz#_2016-03-21_23_29_54_813
7.
http://logs.openstack.org/87/295487/1/check/gate-tempest-dsvm-neutron-full/5022853/logs/screen-q-svc.txt.gz#_2016-03-21_23_29_56_782
8.
http://git.openstack.org/cgit/openstack/neutron/tree/neutron/notifiers/nova.py
9. egrep -R 'tearDown|setUp' tempest.txt.gz | grep 9696 | awk '{print
$(NF)}' | ./fsum
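
./fsum above is a local helper, not a standard tool. Assuming it simply
totals the column of numbers it is fed, a minimal Python equivalent would
be:

    #!/usr/bin/env python
    # fsum: sum the float values read from stdin, one per line.
    import sys

    print(sum(float(line) for line in sys.stdin if line.strip()))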

On Mon, Mar 21, 2016 at 5:09 PM, Clark Boylan  wrote:

> On Mon, Mar 21, 2016, at 01:23 PM, Sean Dague wrote:
> > On 03/21/2016 04:09 PM, Clark Boylan wrote:
> > > On Mon, Mar 21, 2016, at 11:49 AM, Clark Boylan wrote:
> > >> On Mon, Mar 21, 2016, at 11:08 AM, Clark Boylan wrote:
> > >>> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote:
> >  Do you have a better insight into job runtimes vs jobs in other
> >  projects?
> >  Most of the time in the job runtime is actually spent setting the
> >  infrastructure up, and I am not sure we can do anything about it,
> >  unless we take this up with Infra.
> > >>>
> > >>> I haven't done a comparison yet but let's break down the runtime of a
> > >>> recent successful neutron full run against neutron master [0].
> > >>
> > >> And now for some comparative data from the gate-tempest-dsvm-full job
> > >> [0]. This job also ran against a master change that merged and ran in
> > >> the same cloud and region as the neutron job.
> > >>
> > > snip
> > >> Generally each step of this job was quicker. There were big
> > >> differences in devstack and tempest run time though. Is devstack much
> > >> slower to set up neutron when compared to nova net? For tempest it
> > >> looks like we run ~1510 tests against neutron and only ~1269 against
> > >> nova net. This may account for the large difference there. I also
> > >> recall that we run ipv6 tempest tests against neutron deployments that
> > >> were inefficient and booted 2 qemu VMs per test (not sure if that is
> > >> still the case, but it illustrates that the tests themselves may not
> > >> be very quick in the neutron case).
> > >
> > > Looking at the tempest slowest tests output for each of these jobs
> > > (neutron and nova net), some tests line up really well across jobs and
> > > others do not. In order to get a better handle on the runtime for
> > > individual tests I have pushed https://review.openstack.org/295487,
> > > which will run tempest serially, reducing the competition for resources
> > > between tests.

Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-21 Thread Anita Kuno
On 03/21/2016 09:48 PM, Clark Boylan wrote:
> On Mon, Mar 21, 2016, at 06:37 PM, Assaf Muller wrote:
>> On Mon, Mar 21, 2016 at 9:26 PM, Clark Boylan 
>> wrote:
>>> On Mon, Mar 21, 2016, at 06:15 PM, Assaf Muller wrote:
 On Mon, Mar 21, 2016 at 8:09 PM, Clark Boylan 
 wrote:

 If what we want is to cut down execution time, I'd suggest we stop
 running Cinder tests on Neutron patches (call it an experiment) and
 see how long it takes for a regression to slip in. Being an
 optimist, I would guess: never!
>>>
>>> Experience has shown about a week, and that it's not an if but a when.
>>
>> I'm really curious how a Neutron patch can screw up Cinder (and the
>> regression be missed by the Neutron and Nova tests that interact with
>> Neutron). I guess I wasn't around when this was happening. If anyone
>> could shed historic light on this, I'd appreciate it.
> 
> Not Neutron screwing up Cinder, just general time to regression when the
> gate stops testing something. We saw it when we stopped testing postgres,
> for example.
> 
 If we're running these tests on Neutron patches solely as a data point
 for performance testing, Tempest is obviously not the tool for the job
 and doesn't provide any added value we can't get from Rally and
 profilers, for example. If there's otherwise value in running Cinder
 (and other tests that don't exercise the Neutron API), I'd love to
 know what it is :) I cannot remember any legit Cinder failure on
 Neutron patches.
>>>
>>> I think that is completely the wrong approach to take here. We have
>>> caught a problem in Neutron; your goal should be to fix it, not to stop
>>> testing it.
>>
>> You misunderstood my intentions. I'm not saying we should plant our
>> heads in the sand and sing until the problem goes away, but I am saying
>> that if we're interested in uncovering performance issues with
>> Neutron's control plane, then there are more effective ways to do so. If
>> you're interested and have the energy, profiling the neutron-server
>> process while running Rally tests is a much better use of time.
>> Comparing nova-network and Neutron is just not a useful data point.
> 
> The question was why Neutron CI is so slow. Upon investigation I found
> that jobs using nova-net are ~20 minutes faster in one cloud than those
> using neutron. I am not attempting to do performance testing on Neutron;
> I am attempting to narrow down where this lost 20 minutes can be found.
> In this case it is a very useful data point. We know we can run these
> tests faster because we have that data. Therefore the assumption is that
> neutron can (and honestly it should) run just as quickly.
> 
> We need these tests for integration testing (at least that's the
> assertion implied by them living in tempest). We also want the jobs to run
> faster (the topic of this thread). Using the data available to us we
> find that the biggest cost in these jobs is the tempest testing itself.
> The best way to make the jobs run quicker is to address the tests
> themselves. Looking at the relative performance of the two solutions
> available to us we find that there is room for improvement in the
> Neutron testing. That's all I am trying to point out. This has nothing to
> do with proper performance testing or running Rally and everything to do
> with making the integration tests quicker.
> 
>>> The fact that neutron is much slower in these test cases is an
>>> indication that these tests DO exercise the neutron api, that you do
>>> want to cover these code paths, and that you need to address them, not
>>> stop testing them.
>>>
>>> We are not running these tests on neutron solely for performance
>>> testing. In fact, to get reasonable performance testing out of it I had
>>> to jump through a few hoops: make tempest run serially, then recheck
>>> until the jobs ran in the same cloud more than once. Performance testing
>>> has never been the goal of these tests. These tests exist to make sure
>>> that OpenStack works. Boot from volume is an important piece of this and
>>> we are making sure that OpenStack (this means glance, nova, neutron,
>>> cinder) continues to work for this use case.

I would like to thank Clark, who could have chosen many different tasks
fighting for his attention today, yet chose to focus on getting data for
neutron tests in order to help Rossella and Ihar in their stated goal.

Thank you, Clark,
Anita.


Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-21 Thread Clark Boylan
On Mon, Mar 21, 2016, at 06:37 PM, Assaf Muller wrote:
> On Mon, Mar 21, 2016 at 9:26 PM, Clark Boylan 
> wrote:
> > On Mon, Mar 21, 2016, at 06:15 PM, Assaf Muller wrote:
> >> On Mon, Mar 21, 2016 at 8:09 PM, Clark Boylan 
> >> wrote:
> >>
> >> If what we want is to cut down execution time, I'd suggest we stop
> >> running Cinder tests on Neutron patches (call it an experiment) and
> >> see how long it takes for a regression to slip in. Being an
> >> optimist, I would guess: never!
> >
> > Experience has shown about a week, and that it's not an if but a when.
> 
> I'm really curious how a Neutron patch can screw up Cinder (and the
> regression be missed by the Neutron and Nova tests that interact with
> Neutron). I guess I wasn't around when this was happening. If anyone
> could shed historic light on this, I'd appreciate it.

Not Neutron screwing up Cinder, just general time to regression when the
gate stops testing something. We saw it when we stopped testing postgres,
for example.

> >> If we're running these tests on Neutron patches solely as a data point
> >> for performance testing, Tempest is obviously not the tool for the job
> >> and doesn't provide any added value we can't get from Rally and
> >> profilers, for example. If there's otherwise value in running Cinder
> >> (and other tests that don't exercise the Neutron API), I'd love to
> >> know what it is :) I cannot remember any legit Cinder failure on
> >> Neutron patches.
> >
> > I think that is completely the wrong approach to take here. We have
> > caught a problem in Neutron; your goal should be to fix it, not to stop
> > testing it.
> 
> You misunderstood my intentions. I'm not saying we should plant our
> heads in the sand and sing until the problem goes away, but I am saying
> that if we're interested in uncovering performance issues with
> Neutron's control plane, then there are more effective ways to do so. If
> you're interested and have the energy, profiling the neutron-server
> process while running Rally tests is a much better use of time.
> Comparing nova-network and Neutron is just not a useful data point.

The question was why Neutron CI is so slow. Upon investigation I found
that jobs using nova-net are ~20 minutes faster in one cloud than those
using neutron. I am not attempting to do performance testing on Neutron;
I am attempting to narrow down where this lost 20 minutes can be found.
In this case it is a very useful data point. We know we can run these
tests faster because we have that data. Therefore the assumption is that
neutron can (and honestly it should) run just as quickly.

We need these tests for integration testing (at least that's the
assertion implied by them living in tempest). We also want the jobs to run
faster (the topic of this thread). Using the data available to us we
find that the biggest cost in these jobs is the tempest testing itself.
The best way to make the jobs run quicker is to address the tests
themselves. Looking at the relative performance of the two solutions
available to us we find that there is room for improvement in the
Neutron testing. That's all I am trying to point out. This has nothing to
do with proper performance testing or running Rally and everything to do
with making the integration tests quicker.

> > The fact that neutron is much slower in these test cases is an
> > indication that these tests DO exercise the neutron api, that you do
> > want to cover these code paths, and that you need to address them, not
> > stop testing them.
> >
> > We are not running these tests on neutron solely for performance
> > testing. In fact, to get reasonable performance testing out of it I had
> > to jump through a few hoops: make tempest run serially, then recheck
> > until the jobs ran in the same cloud more than once. Performance testing
> > has never been the goal of these tests. These tests exist to make sure
> > that OpenStack works. Boot from volume is an important piece of this and
> > we are making sure that OpenStack (this means glance, nova, neutron,
> > cinder) continues to work for this use case.



Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-21 Thread Assaf Muller
On Mon, Mar 21, 2016 at 9:26 PM, Clark Boylan  wrote:
> On Mon, Mar 21, 2016, at 06:15 PM, Assaf Muller wrote:
>> On Mon, Mar 21, 2016 at 8:09 PM, Clark Boylan 
>> wrote:
>> > On Mon, Mar 21, 2016, at 01:23 PM, Sean Dague wrote:
>> >> On 03/21/2016 04:09 PM, Clark Boylan wrote:
>> >> > On Mon, Mar 21, 2016, at 11:49 AM, Clark Boylan wrote:
>> >> >> On Mon, Mar 21, 2016, at 11:08 AM, Clark Boylan wrote:
>> >> >>> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote:
>> >>  Do you have a better insight into job runtimes vs jobs in other
>> >>  projects?
>> >>  Most of the time in the job runtime is actually spent setting the
>> >>  infrastructure up, and I am not sure we can do anything about it,
>> >>  unless we take this up with Infra.
>> >> >>>
>> >> >>> I haven't done a comparison yet but let's break down the runtime of a
>> >> >>> recent successful neutron full run against neutron master [0].
>> >> >>
>> >> >> And now for some comparative data from the gate-tempest-dsvm-full job
>> >> >> [0]. This job also ran against a master change that merged and ran in
>> >> >> the same cloud and region as the neutron job.
>> >> >>
>> >> > snip
>> >> >> Generally each step of this job was quicker. There were big
>> >> >> differences in devstack and tempest run time though. Is devstack much
>> >> >> slower to set up neutron when compared to nova net? For tempest it
>> >> >> looks like we run ~1510 tests against neutron and only ~1269 against
>> >> >> nova net. This may account for the large difference there. I also
>> >> >> recall that we run ipv6 tempest tests against neutron deployments
>> >> >> that were inefficient and booted 2 qemu VMs per test (not sure if
>> >> >> that is still the case, but it illustrates that the tests themselves
>> >> >> may not be very quick in the neutron case).
>> >> >
>> >> > Looking at the tempest slowest tests output for each of these jobs
>> >> > (neutron and nova net) some tests line up really well across jobs and
>> >> > others do not. In order to get a better handle on the runtime for
>> >> > individual tests I have pushed https://review.openstack.org/295487 which
>> >> > will run tempest serially, reducing the competition for resources between
>> >> > tests.
>> >> >
>> >> > Hopefully the subunit logs generated by this change can provide more
>> >> > insight into where we are losing time during the tempest test runs.
>> >
>> > The results are in: we have gate-tempest-dsvm-full [0] and
>> > gate-tempest-dsvm-neutron-full [1] job results where tempest ran
>> > serially to reduce resource contention and provide accurate-ish per-test
>> > timing data. Both of these jobs ran on the same cloud, so they should
>> > have comparable performance from the underlying VMs.
>> >
>> > gate-tempest-dsvm-full
>> > Time spent in job before tempest: 700 seconds
>> > Time spent running tempest: 2428 seconds
>> > Tempest tests run: 1269 (113 skipped)
>> >
>> > gate-tempest-dsvm-neutron-full
>> > Time spent in job before tempest: 789 seconds
>> > Time spent running tempest: 4407 seconds
>> > Tempest tests run: 1510 (76 skipped)
>> >
>> > All times above are wall time as recorded by Jenkins.
>> >
>> > We can also compare the 10 slowest tests in the non-neutron job against
>> > their runtimes in the neutron job. (Note: this isn't a list of the top 10
>> > slowest tests in the neutron job, because that job runs extra tests.)
>> >
>> > nova net job
>> > tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern
>> >   85.232
>> > tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern
>> > 83.319
>> > tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_volume_backed_instance
>> >  50.338
>> > tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern
>> > 43.494
>> > tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario
>> > 40.225
>> > tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
>> >39.653
>> > tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV1Test.test_volume_backup_create_get_detailed_list_restore_delete
>> > 37.720
>> > tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV2Test.test_volume_backup_create_get_detailed_list_restore_delete
>> > 36.355
>> > tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm_from_stopped
>> >27.375
>> > tempest.scenario.test_encrypted_cinder_volumes.TestEncryptedCinderVolumes.test_encrypted_cinder_volumes_luks
>> > 27.025
>> >
>> > neutron job
>> > 

Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-21 Thread Clark Boylan
On Mon, Mar 21, 2016, at 06:15 PM, Assaf Muller wrote:
> On Mon, Mar 21, 2016 at 8:09 PM, Clark Boylan 
> wrote:
> > On Mon, Mar 21, 2016, at 01:23 PM, Sean Dague wrote:
> >> On 03/21/2016 04:09 PM, Clark Boylan wrote:
> >> > On Mon, Mar 21, 2016, at 11:49 AM, Clark Boylan wrote:
> >> >> On Mon, Mar 21, 2016, at 11:08 AM, Clark Boylan wrote:
> >> >>> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote:
> >>  Do you have a better insight into job runtimes vs jobs in other
> >>  projects?
> >>  Most of the time in the job runtime is actually spent setting the
> >>  infrastructure up, and I am not sure we can do anything about it,
> >>  unless we take this up with Infra.
> >> >>>
> >> >>> I haven't done a comparison yet but let's break down the runtime of a
> >> >>> recent successful neutron full run against neutron master [0].
> >> >>
> >> >> And now for some comparative data from the gate-tempest-dsvm-full job
> >> >> [0]. This job also ran against a master change that merged and ran in
> >> >> the same cloud and region as the neutron job.
> >> >>
> >> > snip
> >> >> Generally each step of this job was quicker. There were big differences
> >> >> in devstack and tempest run time though. Is devstack much slower to
> >> >> set up neutron when compared to nova net? For tempest it looks like we
> >> >> run ~1510 tests against neutron and only ~1269 against nova net. This
> >> >> may account for the large difference there. I also recall that we run
> >> >> ipv6 tempest tests against neutron deployments that were inefficient and
> >> >> booted 2 qemu VMs per test (not sure if that is still the case, but it
> >> >> illustrates that the tests themselves may not be very quick in the
> >> >> neutron case).
> >> >
> >> > Looking at the tempest slowest tests output for each of these jobs
> >> > (neutron and nova net) some tests line up really well across jobs and
> >> > others do not. In order to get a better handle on the runtime for
> >> > individual tests I have pushed https://review.openstack.org/295487 which
> >> > will run tempest serially, reducing the competition for resources between
> >> > tests.
> >> >
> >> > Hopefully the subunit logs generated by this change can provide more
> >> > insight into where we are losing time during the tempest test runs.
> >
> > The results are in: we have gate-tempest-dsvm-full [0] and
> > gate-tempest-dsvm-neutron-full [1] job results where tempest ran
> > serially to reduce resource contention and provide accurate-ish per-test
> > timing data. Both of these jobs ran on the same cloud, so they should
> > have comparable performance from the underlying VMs.
> >
> > gate-tempest-dsvm-full
> > Time spent in job before tempest: 700 seconds
> > Time spent running tempest: 2428 seconds
> > Tempest tests run: 1269 (113 skipped)
> >
> > gate-tempest-dsvm-neutron-full
> > Time spent in job before tempest: 789 seconds
> > Time spent running tempest: 4407 seconds
> > Tempest tests run: 1510 (76 skipped)
> >
> > All times above are wall time as recorded by Jenkins.
> >
> > We can also compare the 10 slowest tests in the non-neutron job against
> > their runtimes in the neutron job. (Note: this isn't a list of the top 10
> > slowest tests in the neutron job, because that job runs extra tests.)
> >
> > nova net job
> > tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern
> >   85.232
> > tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern
> > 83.319
> > tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_volume_backed_instance
> >  50.338
> > tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern
> > 43.494
> > tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario
> > 40.225
> > tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
> >39.653
> > tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV1Test.test_volume_backup_create_get_detailed_list_restore_delete
> > 37.720
> > tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV2Test.test_volume_backup_create_get_detailed_list_restore_delete
> > 36.355
> > tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm_from_stopped
> >27.375
> > tempest.scenario.test_encrypted_cinder_volumes.TestEncryptedCinderVolumes.test_encrypted_cinder_volumes_luks
> > 27.025
> >
> > neutron job
> > tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern
> >  110.345
> > tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern
> >   

Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-21 Thread Assaf Muller
On Mon, Mar 21, 2016 at 8:09 PM, Clark Boylan  wrote:
> On Mon, Mar 21, 2016, at 01:23 PM, Sean Dague wrote:
>> On 03/21/2016 04:09 PM, Clark Boylan wrote:
>> > On Mon, Mar 21, 2016, at 11:49 AM, Clark Boylan wrote:
>> >> On Mon, Mar 21, 2016, at 11:08 AM, Clark Boylan wrote:
>> >>> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote:
>>  Do you have a better insight into job runtimes vs jobs in other
>>  projects?
>>  Most of the time in the job runtime is actually spent setting the
>>  infrastructure up, and I am not sure we can do anything about it,
>>  unless we take this up with Infra.
>> >>>
>> >>> I haven't done a comparison yet but let's break down the runtime of a
>> >>> recent successful neutron full run against neutron master [0].
>> >>
>> >> And now for some comparative data from the gate-tempest-dsvm-full job
>> >> [0]. This job also ran against a master change that merged and ran in
>> >> the same cloud and region as the neutron job.
>> >>
>> > snip
>> >> Generally each step of this job was quicker. There were big differences
>> >> in devstack and tempest run time though. Is devstack much slower to
>> >> set up neutron when compared to nova net? For tempest it looks like we
>> >> run ~1510 tests against neutron and only ~1269 against nova net. This
>> >> may account for the large difference there. I also recall that we run
>> >> ipv6 tempest tests against neutron deployments that were inefficient and
>> >> booted 2 qemu VMs per test (not sure if that is still the case, but it
>> >> illustrates that the tests themselves may not be very quick in the
>> >> neutron case).
>> >
>> > Looking at the tempest slowest tests output for each of these jobs
>> > (neutron and nova net) some tests line up really well across jobs and
>> > others do not. In order to get a better handle on the runtime for
>> > individual tests I have pushed https://review.openstack.org/295487 which
>> > will run tempest serially, reducing the competition for resources between
>> > tests.
>> >
>> > Hopefully the subunit logs generated by this change can provide more
>> > insight into where we are losing time during the tempest test runs.
>
> The results are in: we have gate-tempest-dsvm-full [0] and
> gate-tempest-dsvm-neutron-full [1] job results where tempest ran
> serially to reduce resource contention and provide accurate-ish per-test
> timing data. Both of these jobs ran on the same cloud, so they should
> have comparable performance from the underlying VMs.
>
> gate-tempest-dsvm-full
> Time spent in job before tempest: 700 seconds
> Time spent running tempest: 2428 seconds
> Tempest tests run: 1269 (113 skipped)
>
> gate-tempest-dsvm-neutron-full
> Time spent in job before tempest: 789 seconds
> Time spent running tempest: 4407 seconds
> Tempest tests run: 1510 (76 skipped)
>
> All times above are wall time as recorded by Jenkins.
>
> We can also compare the 10 slowest tests in the non-neutron job against
> their runtimes in the neutron job. (Note: this isn't a list of the top 10
> slowest tests in the neutron job, because that job runs extra tests.)
>
> nova net job
> tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern
>   85.232
> tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern
> 83.319
> tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_volume_backed_instance
>  50.338
> tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern
> 43.494
> tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario
> 40.225
> tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
>39.653
> tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV1Test.test_volume_backup_create_get_detailed_list_restore_delete
> 37.720
> tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV2Test.test_volume_backup_create_get_detailed_list_restore_delete
> 36.355
> tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm_from_stopped
>27.375
> tempest.scenario.test_encrypted_cinder_volumes.TestEncryptedCinderVolumes.test_encrypted_cinder_volumes_luks
> 27.025
>
> neutron job
> tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern
>  110.345
> tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern
>108.170
> tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_volume_backed_instance
>  63.852
> tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
>  

Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-21 Thread Clark Boylan
On Mon, Mar 21, 2016, at 01:23 PM, Sean Dague wrote:
> On 03/21/2016 04:09 PM, Clark Boylan wrote:
> > On Mon, Mar 21, 2016, at 11:49 AM, Clark Boylan wrote:
> >> On Mon, Mar 21, 2016, at 11:08 AM, Clark Boylan wrote:
> >>> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote: 
 Do you have a better insight into job runtimes vs jobs in other
 projects?
 Most of the time in the job runtime is actually spent setting the
 infrastructure up, and I am not sure we can do anything about it,
 unless we take this up with Infra.
> >>>
> >>> I haven't done a comparison yet but let's break down the runtime of a
> >>> recent successful neutron full run against neutron master [0].
> >>
> >> And now for some comparative data from the gate-tempest-dsvm-full job
> >> [0]. This job also ran against a master change that merged and ran in
> >> the same cloud and region as the neutron job.
> >>
> > snip
> >> Generally each step of this job was quicker. There were big differences
> >> in devstack and tempest run time though. Is devstack much slower to
> >> set up neutron when compared to nova net? For tempest it looks like we
> >> run ~1510 tests against neutron and only ~1269 against nova net. This
> >> may account for the large difference there. I also recall that we run
> >> ipv6 tempest tests against neutron deployments that were inefficient and
> >> booted 2 qemu VMs per test (not sure if that is still the case, but it
> >> illustrates that the tests themselves may not be very quick in the
> >> neutron case).
> > 
> > Looking at the tempest slowest tests output for each of these jobs
> > (neutron and nova net) some tests line up really well across jobs and
> > others do not. In order to get a better handle on the runtime for
> > individual tests I have pushed https://review.openstack.org/295487 which
> > will run tempest serially, reducing the competition for resources between
> > tests.
> > 
> > Hopefully the subunit logs generated by this change can provide more
> > insight into where we are losing time during the tempest test runs.

The results are in: we have gate-tempest-dsvm-full [0] and
gate-tempest-dsvm-neutron-full [1] job results where tempest ran
serially to reduce resource contention and provide accurate-ish per-test
timing data. Both of these jobs ran on the same cloud, so they should
have comparable performance from the underlying VMs.

gate-tempest-dsvm-full
Time spent in job before tempest: 700 seconds
Time spent running tempest: 2428 seconds
Tempest tests run: 1269 (113 skipped)

gate-tempest-dsvm-neutron-full
Time spent in job before tempest: 789 seconds
Time spent running tempest: 4407 seconds
Tempest tests run: 1510 (76 skipped)

All times above are wall time as recorded by Jenkins.
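
Dividing those numbers out: the nova-net job averaged 2428/1269 ~= 1.9
seconds per test, while the neutron job averaged 4407/1510 ~= 2.9 seconds
per test. Even if the neutron job paid for its extra ~241 tests at the
nova-net rate, 1510 tests would only take about 2890 seconds, so roughly
1500 of the extra seconds come from the tests running slower rather than
from there simply being more of them.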

We can also compare the 10 slowest tests in the non-neutron job against
their runtimes in the neutron job. (Note: this isn't a list of the top 10
slowest tests in the neutron job, because that job runs extra tests.)

nova net job
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern
  85.232
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern
83.319
tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_volume_backed_instance
 50.338
tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern
43.494
tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario
40.225
tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
   39.653
tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV1Test.test_volume_backup_create_get_detailed_list_restore_delete
37.720
tempest.api.volume.admin.test_volumes_backup.VolumesBackupsV2Test.test_volume_backup_create_get_detailed_list_restore_delete
36.355
tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm_from_stopped
   27.375
tempest.scenario.test_encrypted_cinder_volumes.TestEncryptedCinderVolumes.test_encrypted_cinder_volumes_luks
27.025

neutron job
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern
 110.345
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern
   108.170
tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_volume_backed_instance
 63.852
tempest.scenario.test_shelve_instance.TestShelveInstance.test_shelve_instance
   59.931
tempest.scenario.test_snapshot_pattern.TestSnapshotPattern.test_snapshot_pattern
57.835

Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-21 Thread Sean Dague
On 03/21/2016 04:09 PM, Clark Boylan wrote:
> On Mon, Mar 21, 2016, at 11:49 AM, Clark Boylan wrote:
>> On Mon, Mar 21, 2016, at 11:08 AM, Clark Boylan wrote:
>>> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote: 
 Do you have a better insight into job runtimes vs jobs in other
 projects?
 Most of the time in the job runtime is actually spent setting the
 infrastructure up, and I am not sure we can do anything about it,
 unless we take this up with Infra.
>>>
>>> I haven't done a comparison yet but let's break down the runtime of a
>>> recent successful neutron full run against neutron master [0].
>>
>> And now for some comparative data from the gate-tempest-dsvm-full job
>> [0]. This job also ran against a master change that merged and ran in
>> the same cloud and region as the neutron job.
>>
> snip
>> Generally each step of this job was quicker. There were big differences
>> in devstack and tempest run time though. Is devstack much slower to
>> set up neutron when compared to nova net? For tempest it looks like we
>> run ~1510 tests against neutron and only ~1269 against nova net. This
>> may account for the large difference there. I also recall that we run
>> ipv6 tempest tests against neutron deployments that were inefficient and
>> booted 2 qemu VMs per test (not sure if that is still the case, but it
>> illustrates that the tests themselves may not be very quick in the
>> neutron case).
> 
> Looking at the tempest slowest tests output for each of these jobs
> (neutron and nova net) some tests line up really well across jobs and
> others do not. In order to get a better handle on the runtime for
> individual tests I have pushed https://review.openstack.org/295487 which
> will run tempest serially, reducing the competition for resources between
> tests.
> 
> Hopefully the subunit logs generated by this change can provide more
> insight into where we are losing time during the tempest test runs.

Subunit logs aren't the full story here. Activity in addCleanup doesn't
get added to the subunit time accounting for the test, which causes some
interesting issues when waiting for resources to delete. I would be
especially cautious of that on some of these.
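
A minimal illustration of that caveat, with the slow cleanup simulated
rather than taken from a real tempest test:

    import time

    import testtools

    def delete_and_wait(resource):
        time.sleep(2)  # stand-in for polling until a volume/server is gone

    class BootTest(testtools.TestCase):
        def test_boot(self):
            resource = {'id': 'vol-1'}  # stand-in for a created resource
            # Cleanups run after the test method returns, so this 2-second
            # wait is not attributed to the test in the subunit timing even
            # though the job still pays for it in wall time.
            self.addCleanup(delete_and_wait, resource)
            self.assertEqual('vol-1', resource['id'])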

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-21 Thread Clark Boylan
On Mon, Mar 21, 2016, at 11:49 AM, Clark Boylan wrote:
> On Mon, Mar 21, 2016, at 11:08 AM, Clark Boylan wrote:
> > On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote: 
> > > Do you have a better insight into job runtimes vs jobs in other
> > > projects?
> > > Most of the time in the job runtime is actually spent setting the
> > > infrastructure up, and I am not sure we can do anything about it,
> > > unless we take this up with Infra.
> > 
> > I haven't done a comparison yet but let's break down the runtime of a
> > recent successful neutron full run against neutron master [0].
> 
> And now for some comparative data from the gate-tempest-dsvm-full job
> [0]. This job also ran against a master change that merged and ran in
> the same cloud and region as the neutron job.
> 
snip
> Generally each step of this job was quicker. There were big differences
> in devstack and tempest run time though. Is devstack much slower to
> set up neutron when compared to nova net? For tempest it looks like we
> run ~1510 tests against neutron and only ~1269 against nova net. This
> may account for the large difference there. I also recall that we run
> ipv6 tempest tests against neutron deployments that were inefficient and
> booted 2 qemu VMs per test (not sure if that is still the case, but it
> illustrates that the tests themselves may not be very quick in the
> neutron case).

Looking at the tempest slowest tests output for each of these jobs
(neutron and nova net), some tests line up really well across jobs and
others do not. In order to get a better handle on the runtime for
individual tests I have pushed https://review.openstack.org/295487, which
will run tempest serially, reducing the competition for resources between
tests.

Hopefully the subunit logs generated by this change can provide more
insight into where we are losing time during the tempest test runs.

Clark



Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-21 Thread Clark Boylan
On Mon, Mar 21, 2016, at 11:08 AM, Clark Boylan wrote:
> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote: 
> > Do you have a better insight into job runtimes vs jobs in other
> > projects?
> > Most of the time in the job runtime is actually spent setting the
> > infrastructure up, and I am not sure we can do anything about it,
> > unless we take this up with Infra.
> 
> I haven't done a comparison yet but let's break down the runtime of a
> recent successful neutron full run against neutron master [0].

And now for some comparative data from the gate-tempest-dsvm-full job
[0]. This job also ran against a master change that merged and ran in
the same cloud and region as the neutron job.

Basic host setup takes 63 seconds. Start of job to 2016-03-21
16:46:41.058 [1]
Workspace setup takes 380 seconds. 2016-03-21 16:46:41.058 [1] to
2016-03-21 16:53:01.754 [2]
Devstack takes 890 seconds. 2016-03-21 16:53:19.235 [3] to 2016-03-21
17:08:10.082 [4]
Loading old tempest subunit streams takes 63 seconds. 2016-03-21
17:08:10.111 [5] to 2016-03-21 17:09:13.454 [6]
Tempest takes 1347 seconds. 2016-03-21 17:09:13.587 [7] to 2016-03-21
17:31:40.885 [8]
Then we spend the rest of the test time (52 seconds) cleaning up.
2016-03-21 17:31:40.885 [8] to end of job.

[0]
http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/
[1]
http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/console.html#_2016-03-21_16_46_41_058
[2]
http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/console.html#_2016-03-21_16_53_01_754
[3]
http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/console.html#_2016-03-21_16_53_19_235
[4]
http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/console.html#_2016-03-21_17_08_10_082
[5]
http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/console.html#_2016-03-21_17_08_10_111
[6]
http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/console.html#_2016-03-21_17_09_13_454
[7]
http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/console.html#_2016-03-21_17_09_13_587
[8]
http://logs.openstack.org/48/294548/1/gate/gate-tempest-dsvm-full/d94901e/console.html#_2016-03-21_17_31_40_885

Generally each step of this job was quicker. There were big differences
in devstack and tempest run time though. Is devstack much slower to
set up neutron when compared to nova net? For tempest it looks like we
run ~1510 tests against neutron and only ~1269 against nova net. This
may account for the large difference there. I also recall that we run
ipv6 tempest tests against neutron deployments that were inefficient and
booted 2 qemu VMs per test (not sure if that is still the case, but it
illustrates that the tests themselves may not be very quick in the
neutron case).

Of course we may also be seeing differences in cloud VMs (though I tried
to control for that by looking at jobs that ran in the same region). It is
hard to say without more data. In any case this hopefully serves as a good
starting point for others to dig into the ~20 minute discrepancy between
nova net + tempest and neutron + tempest.
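
For anyone repeating this kind of breakdown, the step durations are just
differences between console log timestamps; a small sketch using the format
from the console links above:

    from datetime import datetime

    FMT = '%Y-%m-%d %H:%M:%S.%f'

    def duration(start, end):
        """Seconds elapsed between two console.html-style timestamps."""
        return (datetime.strptime(end, FMT)
                - datetime.strptime(start, FMT)).total_seconds()

    # The devstack step of the nova-net job quoted above:
    print(duration('2016-03-21 16:53:19.235', '2016-03-21 17:08:10.082'))
    # -> 890.847, matching the "Devstack takes 890 seconds" figure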



Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-21 Thread Armando M.
On 21 March 2016 at 11:08, Clark Boylan  wrote:

> On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote:
> > Do you have a better insight into job runtimes vs jobs in other
> > projects?
> > Most of the time in the job runtime is actually spent setting the
> > infrastructure up, and I am not sure we can do anything about it,
> > unless we take this up with Infra.
>
> I haven't done a comparison yet but let's break down the runtime of a
> recent successful neutron full run against neutron master [0].
>
> Basic host setup takes 65 seconds. Start of job to 2016-03-17
> 22:14:27.397 [1]
> Workspace setup takes 520 seconds. 2016-03-17 22:14:27.397 [1] to
> 2016-03-17 22:23:07.429 [2]
> Devstack takes 1205 seconds. 2016-03-17 22:23:18.760 [3] to 2016-03-17
> 22:43:23.339 [4]
> Loading old tempest subunit streams takes 155 seconds. 2016-03-17
> 22:43:23.340 [5] to 2016-03-17 22:45:58.061 [6]
> Tempest takes 1982 seconds. 2016-03-17 22:45:58.201 [7] to 2016-03-17
> 23:19:00.117 [8]
> Then we spend the rest of the test time (76 seconds) cleaning up.
> 2016-03-17 23:19:00.117 [8] to end of job.
>
> Note that I haven't accounted for all of the time used and instead
> focused on the major steps that use the most time. Also, it is Monday
> morning and some of my math may be off.


> [0]
>
> http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/
> [1]
>
> http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_14_27_397
> [2]
>
> http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_23_07_429
> [3]
>
> http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_23_18_760
> [4]
>
> http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_43_23_339
> [5]
>
> http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_43_23_340
> [6]
>
> http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_45_58_061
> [7]
>
> http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_45_58_201
> [8]
>
> http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_23_19_00_117
>
> One big takeaway from this is that the vast majority of the time is
> spent in devstack and tempest, not in the infrastructure setup. You
> should be able to dig into both the devstack setup and tempest test
> runtimes and hopefully speed things up.
>
> Hopefully this gives you enough information to get started digging into
> this.
>

Clark: thanks for this insightful response.

I should clarify my comment about infrastructure setup (it is Monday for me
too :)): what I meant was that there is a good portion of time spent getting
to a point where tests can be run. That includes node setup as well as
stacking. That is obviously less than 50%, but even >30% feels like a
substantial overhead. I am not sure what we can do about it, but looping
you into this discussion seemed like the least this thread should do.

That said, there are many tempest tests that take over 30 seconds to
complete and do not even touch Neutron. For those that do, we should
clearly identify where the slowness comes from, and I think that's where,
as a Neutron team, our focus should be.

IMO, before we go on and talk about evicting jobs, I think we should take a
closer look (i.e. profile) at where time is spent so that we can make each
test run leaner.
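
For the profiling part, a generic cProfile wrapper is probably the
lowest-effort starting point. This is a hypothetical sketch, not an existing
neutron hook; the idea would be to wrap a hot entry point in neutron-server
and let a tempest or Rally run drive it:

    import cProfile

    def profiled(func, stats_path='/tmp/neutron-server.prof'):
        """Wrap a callable and dump cumulative profile stats on each call."""
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            try:
                return profiler.runcall(func, *args, **kwargs)
            finally:
                profiler.dump_stats(stats_path)
        return wrapper

    # Inspect afterwards, e.g.:
    #   import pstats
    #   pstats.Stats('/tmp/neutron-server.prof').sort_stats('cumulative').print_stats(20)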

[1]
http://status.openstack.org//openstack-health/#/job/gate-tempest-dsvm-neutron-full?groupKey=project=hour=2016-03-21T18:14:19.534Z

>
>
> Clark
>
>


Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-21 Thread Clark Boylan
On Mon, Mar 21, 2016, at 09:32 AM, Armando M. wrote: 
> Do you have a better insight into job runtimes vs jobs in other
> projects?
> Most of the time in the job runtime is actually spent setting the
> infrastructure up, and I am not sure we can do anything about it,
> unless we take this up with Infra.

I haven't done a comparison yet but let's break down the runtime of a
recent successful neutron full run against neutron master [0].

Basic host setup takes 65 seconds. Start of job to 2016-03-17
22:14:27.397 [1]
Workspace setup takes 520 seconds. 2016-03-17 22:14:27.397 [1] to
2016-03-17 22:23:07.429 [2]
Devstack takes 1205 seconds. 2016-03-17 22:23:18.760 [3] to 2016-03-17
22:43:23.339 [4]
Loading old tempest subunit streams takes 155 seconds. 2016-03-17
22:43:23.340 [5] to 2016-03-17 22:45:58.061 [6]
Tempest takes 1982 seconds. 2016-03-17 22:45:58.201 [7] to 2016-03-17
23:19:00.117 [8]
Then we spend the rest of the test time (76 seconds) cleaning up.
2016-03-17 23:19:00.117 [8] to end of job.

Note that I haven't accounted for all of the time used and instead
focused on the major steps that use the most time. Also, it is Monday
morning and some of my math may be off.

[0]
http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/
[1]
http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_14_27_397
[2]
http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_23_07_429
[3]
http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_23_18_760
[4]
http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_43_23_339
[5]
http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_43_23_340
[6]
http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_45_58_061
[7]
http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_22_45_58_201
[8]
http://logs.openstack.org/18/294018/3/check/gate-tempest-dsvm-neutron-full/1cce1e8/console.html.gz#_2016-03-17_23_19_00_117

One big takeaway from this is that the vast majority of the time is
spent in devstack and tempest, not in the infrastructure setup. You
should be able to dig into both the devstack setup and tempest test
runtimes and hopefully speed things up.

Hopefully this gives you enough information to get started digging into
this.

Clark



Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-21 Thread Armando M.
On 21 March 2016 at 04:15, Rossella Sblendido  wrote:

> Hello all,
>
> the tests that we run on the gate for Neutron take pretty long (longer
> than one hour). I think we can improve that and make better use of the
> resources.

> Here are some ideas that came up when Ihar and I discussed this topic
> during the sprint in Brno:
>
> 1) We have a few jobs that are non-voting. I think it's OK to have
> non-voting jobs for a limited amount of time, while we try to make them
> stable, but this shouldn't be too long; otherwise we waste time running
> those tests without even using the results. If a job is still non-voting
> after 3 months (or 4 or 6, we can find a good time interval) the job
> should be removed. My hope is that this threat will make us find some
> time to actually fix the job and make it vote :)
>
> 2) multi-node jobs run for every patch set. Is that really what we want?
> They take pretty long. We could move them to a periodic job. I know we
> can easily forget about periodic jobs; to avoid that, we could run them
> in the gate queue too. If a patch can't merge because of a failure we
> will fix the issue. To trigger them for a specific patch that might
> affect multi-node we can run the experimental jobs.
>
> Thoughts?
>

Thanks for raising the topic. That said, I am not sure I see how what you
propose is going to make things better. Jobs, either non-voting or multinode,
run in parallel, so reducing the number of jobs won't reduce the time to
feedback, though it would improve resource usage. We are already pretty
conscious of that, and compared to other projects we already run a limited
number of jobs, but we can do better, of course.

Do you have a better insight into job runtimes vs jobs in other projects?
Most of the time in the job runtime is actually spent setting the
infrastructure up, and I am not sure we can do anything about it, unless we
take this up with Infra.


>
> Rossella
>


Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-21 Thread Armando M.
On 21 March 2016 at 04:32, Sean M. Collins  wrote:

> Rossella Sblendido wrote:
> > 2) multi-node jobs run for every patch set. Is that really what we want?
> > They take pretty long. We could move them to a periodic job.
>
> I would rather remove all the single-node jobs. Nova has been moving to
> multinode jobs for their gate (if I recall correctly my
> conversation with Dan Smith) and we should be moving in this direction
> too. We should test Neutron the way it is deployed in production.
>
>
This was not true last time I checked. Switching to multinode jobs for the
gate means that all projects in the integrated gate will have to use the
multinode configuration.


> Also, who is really monitoring the periodic jobs? Truthfully? I know
> there are some IPv6 jobs that are periodic and I'll be the first to
> admit that I am not following them *at all*.
>
> So, my thinking is, unless it's running at the gate and inflicting pain
> on people, it's not going to be treated as a priority. Look at Linux
> Bridge - serious race conditions that existed for years only
> got fixed once I inflicted pain on all the Neutron devs by making it
> voting and running on every patchset (sorry, not sorry).
>
> --
> Sean M. Collins
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-21 Thread Doug Wiegley

> On Mar 21, 2016, at 5:40 AM, Ihar Hrachyshka  wrote:
> 
> Sean M. Collins  wrote:
> 
>> Rossella Sblendido wrote:
>>> 2) multi-node jobs run for every patch set. Is that really what we want?
>>> They take pretty long. We could move them to a periodic job.
>> 
>> I would rather remove all the single-node jobs. Nova has been moving to
>> multinode jobs for their gate (if I recall my conversation with Dan
>> Smith correctly) and we should be moving in this direction
>> too. We should test Neutron the way it is deployed in production.
>> 
>> Also, who is really monitoring the periodic jobs? Truthfully? I know
>> there are some IPv6 jobs that are periodic and I'll be the first to
>> admit that I am not following them *at all*.
> 
> Well, stable maintainers track their periodic job failures. :) Email 
> notifications when something starts failing help.
> 
>> 
>> So, my thinking is, unless it's running at the gate and inflicting pain
>> on people, it's not going to be treated as a priority. Look at Linux
>> Bridge - serious race conditions that existed for years only
>> got fixed once I inflicted pain on all the Neutron devs by making it
>> voting and running on every patchset (sorry, not sorry).
> 
> I think there is still common ground between you and Rossella’s stances: the
> fact that we want to inflict gating pain does not mean that we want to
> execute every single job on each PS uploaded to gerrit. For some advanced and
> non-obvious checks [like partial grenade] the validation could probably be
> postponed till the patch hits the gate.
> 
> Yes, sometimes it will mean the gate being reset due to a bad patch. This can
> be avoided in most cases if the reviewers and the author of a patch that
> potentially touches a specific scenario execute the jobs before hitting the
> gate with the patch [for example, if the job is in the experimental set, it’s
> a matter of ‘check experimental’ before pressing W+1].

We have been pretty consciously moving neutron jobs to cause pain to *neutron*
and not everyone else, which is the opposite of a “gate only” plan. Aside from
that being against infra policy, I think I’m reading between the lines that
folks want faster iteration between patchsets. I note that the standard
-full job is up to 55-65 minutes, from its old time of 40-45. Have we
characterized why that’s so much slower now? Perhaps addressing that will
bring down the turn-around for everyone.
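
One low-tech way to start characterizing it: pull the console or devstack
log of a slow run and look for the largest gaps between consecutive
timestamped lines. A sketch, assuming the usual "YYYY-MM-DD HH:MM:SS.mmm"
prefix on log lines:

  #!/usr/bin/env python
  # Sketch: print the 20 biggest time gaps between consecutive
  # timestamped lines in a job log, as a first pass at finding the
  # slow spots. Usage: python gaps.py screen-q-svc.txt
  from __future__ import print_function
  import re
  import sys
  from datetime import datetime

  TS = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})')

  prev_time, prev_line, gaps = None, None, []
  for line in open(sys.argv[1]):
      m = TS.match(line)
      if not m:
          continue  # skip lines without a leading timestamp
      t = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S.%f')
      if prev_time is not None:
          gaps.append(((t - prev_time).total_seconds(), prev_line))
      prev_time, prev_line = t, line.rstrip()

  for seconds, text in sorted(gaps, reverse=True)[:20]:
      print('%8.1fs after: %s' % (seconds, text[:120]))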

Thanks,
doug


> 
> Ihar
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-21 Thread Ihar Hrachyshka

Sean M. Collins  wrote:


> Rossella Sblendido wrote:
>> 2) multi-node jobs run for every patch set. Is that really what we want?
>> They take pretty long. We could move them to a periodic job.
>
> I would rather remove all the single-node jobs. Nova has been moving to
> multinode jobs for their gate (if I recall my conversation with Dan Smith
> correctly) and we should be moving in this direction too. We should test
> Neutron the way it is deployed in production.
>
> Also, who is really monitoring the periodic jobs? Truthfully? I know
> there are some IPv6 jobs that are periodic and I'll be the first to
> admit that I am not following them *at all*.

Well, stable maintainers track their periodic job failures. :) Email
notifications when something starts failing help.

> So, my thinking is, unless it's running at the gate and inflicting pain
> on people, it's not going to be treated as a priority. Look at Linux
> Bridge - serious race conditions that existed for years only
> got fixed once I inflicted pain on all the Neutron devs by making it
> voting and running on every patchset (sorry, not sorry).

I think there is still common ground between you and Rossella’s stances:
the fact that we want to inflict gating pain does not mean that we want to
execute every single job on each PS uploaded to gerrit. For some advanced
and non-obvious checks [like partial grenade] the validation could probably
be postponed till the patch hits the gate.

Yes, sometimes it will mean the gate being reset due to a bad patch. This
can be avoided in most cases if the reviewers and the author of a patch
that potentially touches a specific scenario execute the jobs before
hitting the gate with the patch [for example, if the job is in the
experimental set, it’s a matter of ‘check experimental’ before pressing
W+1].
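
For completeness: ‘check experimental’ is just a Gerrit review comment, so
it can even be scripted. A sketch using Gerrit's set-review REST endpoint;
the change number is illustrative, and the digest-auth credentials (the
HTTP password generated in your Gerrit settings page) are an assumption:

  #!/usr/bin/env python
  # Sketch: trigger the experimental pipeline by posting the
  # "check experimental" comment through Gerrit's REST API.
  from __future__ import print_function
  import requests
  from requests.auth import HTTPDigestAuth

  GERRIT = 'https://review.openstack.org'
  CHANGE = '123456'  # illustrative change number
  # assumption: HTTP credentials generated in Gerrit settings
  auth = HTTPDigestAuth('username', 'http-password')

  resp = requests.post(
      '%s/a/changes/%s/revisions/current/review' % (GERRIT, CHANGE),
      json={'message': 'check experimental'}, auth=auth)
  print(resp.status_code)  # 200 means the comment was posted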


Ihar

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-21 Thread Sean M. Collins
Rossella Sblendido wrote:
> 2) multi-node jobs run for every patch set. Is that really what we want?
> They take pretty long. We could move them to a periodic job. 

I would rather remove all the single-node jobs. Nova has been moving to
multinode jobs for their gate (if I recall my conversation with Dan
Smith correctly) and we should be moving in this direction
too. We should test Neutron the way it is deployed in production.

Also, who is really monitoring the periodic jobs? Truthfully? I know
there are some IPv6 jobs that are periodic and I'll be the first to
admit that I am not following them *at all*.

So, my thinking is, unless it's running at the gate and inflicting pain
on people, it's not going to be treated as a priority. Look at Linux
Bridge - serious race conditions that existed for years only
got fixed once I inflicted pain on all the Neutron devs by making it
voting and running on every patchset (sorry, not sorry).

-- 
Sean M. Collins

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [neutron] CI jobs take pretty long, can we improve that?

2016-03-21 Thread Rossella Sblendido
Hello all,

the tests that we run on the gate for Neutron take pretty long (longer
than one hour). I think we can improve that and make better use of the
resources.
Here are some ideas that came up when Ihar and I discussed this topic
during the sprint in Brno:

1) We have a few jobs that are non-voting. I think it's OK to have
non-voting jobs for a limited amount of time, while we try to make them
stable, but this shouldn't take too long, otherwise we waste time running
those tests without even using the results. If a job is still non-voting
after 3 months (or 4 or 6, we can find a good time interval) the job
should be removed. My hope is that this threat will make us find some
time to actually fix the job and make it vote :)

2) multi-node jobs run for every patch set. Is that really what we want?
They take pretty long. We could move them to a periodic job. I know we
can easily forget about periodic jobs; to avoid that, we could run them
in the gate queue too. If a patch can't merge because of a failure we
will fix the issue. To trigger them for a specific patch that might
affect multi-node we can run the experimental jobs.
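
For concreteness, here is roughly what both ideas would look like in a Zuul
layout.yaml (a sketch only, in the openstack-infra/project-config style;
the job names are illustrative, not the real neutron layout):

  jobs:
    # idea 1: a job kept non-voting while it stabilizes; the proposal
    # is to remove the flag (or the job) after a fixed grace period
    - name: gate-tempest-dsvm-neutron-dvr
      voting: false

  projects:
    - name: openstack/neutron
      check:
        - gate-tempest-dsvm-neutron-full
        - gate-tempest-dsvm-neutron-dvr
        # idea 2: multinode job dropped from check...
      gate:
        - gate-tempest-dsvm-neutron-full
        - gate-tempest-dsvm-neutron-multinode-full  # ...kept in gate
      periodic:
        - gate-tempest-dsvm-neutron-multinode-full  # ...and run nightly
      experimental:
        - gate-tempest-dsvm-neutron-multinode-full  # 'check experimental'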

Thoughts?

Rossella

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev