Re: [openstack-dev] [nova] CI for reliable live-migration

2015-09-09 Thread Timofei Durakov
Hello,
Update on the gate-tempest-dsvm-multinode-full job.
Here are the top 12 failing tests for the past week:
tempest.api.compute.servers.test_disk_config.ServerDiskConfigTestJSON.test_resize_server_from_manual_to_auto: 14
tempest.api.compute.servers.test_disk_config.ServerDiskConfigTestJSON.test_resize_server_from_auto_to_manual: 14
tempest.scenario.test_server_advanced_ops.TestServerAdvancedOps.test_resize_server_confirm: 12
tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert: 12
tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm: 12
tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_live_block_migration_paused: 12
tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_server_while_in_verify_resize_state: 12
tempest.api.compute.admin.test_migrations.MigrationsAdminTest.test_list_migrations_in_flavor_resize_situation: 12
tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm_from_stopped: 12
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern: 10
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern: 10
tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_live_block_migration: 10


Full list of failing tests: http://xsnippet.org/360947/
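
To see where the failures cluster, the top-12 list above can be grouped by the kind of operation each test exercises. This is only a rough sketch: the counts are copied from the list (presumably failures during the week), while the category names and the keyword matching are my own assumptions, not anything tempest defines.

```python
from collections import Counter

# Counts copied from the list above (test name -> number listed next to it).
COUNTS = {
    "tempest.api.compute.servers.test_disk_config.ServerDiskConfigTestJSON.test_resize_server_from_manual_to_auto": 14,
    "tempest.api.compute.servers.test_disk_config.ServerDiskConfigTestJSON.test_resize_server_from_auto_to_manual": 14,
    "tempest.scenario.test_server_advanced_ops.TestServerAdvancedOps.test_resize_server_confirm": 12,
    "tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert": 12,
    "tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm": 12,
    "tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_live_block_migration_paused": 12,
    "tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_server_while_in_verify_resize_state": 12,
    "tempest.api.compute.admin.test_migrations.MigrationsAdminTest.test_list_migrations_in_flavor_resize_situation": 12,
    "tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_confirm_from_stopped": 12,
    "tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern": 10,
    "tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern": 10,
    "tempest.api.compute.admin.test_live_migration.LiveBlockMigrationTestJSON.test_live_block_migration": 10,
}


def category(test_name: str) -> str:
    """Bucket a test by keywords in its method name (hypothetical grouping)."""
    method = test_name.rsplit(".", 1)[-1]
    if "live_block_migration" in method:
        return "live-migration"
    if "resize" in method:
        return "resize"
    if "volume_boot" in method:
        return "volume-boot"
    return "other"


# Sum the listed counts per category.
by_category = Counter()
for name, count in COUNTS.items():
    by_category[category(name)] += count

for name, total in by_category.most_common():
    print(f"{name}: {total}")
```

Grouped this way, most of the listed counts come from resize-related tests rather than from the live-migration tests themselves.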



Re: [openstack-dev] [nova] CI for reliable live-migration

2015-08-27 Thread Kraminsky, Arkadiy
Hello,

I'm a new developer on the OpenStack project and am in the process of creating
live migration CI for HP's 3PAR and LeftHand backends. I noticed you guys are
looking for someone to pick up Joe Gordon's change for volume-backed live
migration tests, and we could sure use something like this. I can look into
the change and see what I can do. :)

Thanks,

Arkadiy Kraminsky


Re: [openstack-dev] [nova] CI for reliable live-migration

2015-08-26 Thread Joe Gordon
On Wed, Aug 26, 2015 at 8:18 AM, Matt Riedemann mrie...@linux.vnet.ibm.com
wrote:



 On 8/26/2015 3:21 AM, Timofei Durakov wrote:

 Hello,

 Here is the situation: nova has a live-migration feature but doesn't have
 a CI job to cover it with functional tests, only
 gate-tempest-dsvm-multinode-full (non-voting, btw), which covers
 block-migration only.
 The problem is that live-migration can behave differently depending
 on how the instance was booted (volume-backed/ephemeral) and how the
 environment is configured (a shared instance directory (NFS, for example),
 or RBD used to store the ephemeral disk), or, for example, the user has
 neither and is going to use the --block-migrate flag. To claim that we have
 reliable live-migration in nova, we should check it at least on envs with
 rbd or nfs, as these are more popular than envs without shared storage at all.
 Here are the steps for that:

  1. make gate-tempest-dsvm-multinode-full voting, as it looks OK for
 block-migration testing purposes;


When we are ready to make multinode voting we should remove the equivalent
single node job.



 If it's been stable for a while then I'd be OK with making it voting on
 nova changes. I agree it's important to have at least *something* that
 gates on multi-node testing for nova, since we seem to break this a few
 times per release.


Last I checked it isn't as stable as single node yet:
http://jogo.github.io/gate/multinode [0].  The data going into graphite is
a bit noisy so this may be a red herring, but at the very least it needs to
be investigated. When I was last looking into this there were at least two
known bugs:

https://bugs.launchpad.net/nova/+bug/1445569
https://bugs.launchpad.net/nova/+bug/1462305


[0]
http://graphite.openstack.org/graph/?from=-36hours&height=500&until=now&width=800&bgcolor=ff&fgcolor=00&yMax=100&yMin=0&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.check.job.gate-tempest-dsvm-full.FAILURE,sum(stats.zuul.pipeline.check.job.gate-tempest-dsvm-full.{SUCCESS,FAILURE})),%275hours%27),%20%27gate-tempest-dsvm-full%27),%27orange%27)&target=color(alias(movingAverage(asPercent(stats.zuul.pipeline.check.job.gate-tempest-dsvm-multinode-full.FAILURE,sum(stats.zuul.pipeline.check.job.gate-tempest-dsvm-multinode-full.{SUCCESS,FAILURE})),%275hours%27),%20%27gate-tempest-dsvm-multinode-full%27),%27brown%27)&title=Check%20Failure%20Rates%20(36%20hours)&_t=0.48646087432280183
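
For anyone who wants to reproduce the comparison offline, the graph in [0] plots FAILURE as a percentage of SUCCESS plus FAILURE for each job, smoothed with a moving average. Below is a minimal sketch of that calculation. The function names and the sample numbers are made up for illustration, and the window here counts data points, whereas graphite's '5hours' window is time-based, so treat it as an approximation rather than what graphite actually does.

```python
from collections import deque
from typing import Iterable, Iterator, List, Optional


def failure_percent(success: int, failure: int) -> Optional[float]:
    """FAILURE as a percentage of SUCCESS + FAILURE; None when there were no runs."""
    total = success + failure
    if total == 0:
        return None
    return 100.0 * failure / total


def moving_average(values: Iterable[Optional[float]], window: int) -> Iterator[Optional[float]]:
    """Yield the mean of the last `window` non-empty points for each input point."""
    recent = deque(maxlen=window)
    for value in values:
        if value is not None:
            recent.append(value)
        yield sum(recent) / len(recent) if recent else None


# Hypothetical per-interval (success, failure) counts for one job.
samples = [(9, 1), (8, 2), (0, 0), (5, 5)]
percents: List[Optional[float]] = [failure_percent(s, f) for s, f in samples]
smoothed = list(moving_average(percents, window=2))
print(smoothed)
```

Running the same calculation on the counters for both jobs gives two series that can be compared directly, which is all the graph does.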



  2. contribute to tempest to cover live-migration of volume-backed instances;


 jogo has had a patch up for this for a while:

 https://review.openstack.org/#/c/165233/

 Since he's not full time on OpenStack anymore, I assume some help there in
 picking up the change would be appreciated.


yes please



  3. make another job with rbd for storing ephemerals, which also requires
 changing the tempest config;


 We already have a voting ceph job for nova - can we turn that into a
 multi-node testing job and run live migration with shared storage using
 that?


  4. make a job with nfs for ephemerals.


 Can't we use a multi-node ceph job (#3) for this?


 These steps should help us improve the current situation with
 live-migration.

 --
 Timofey.





 --

 Thanks,

 Matt Riedemann



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] CI for reliable live-migration

2015-08-26 Thread Timofei Durakov
Update:

1. The job fails from time to time. I'm collecting statistics to understand
whether these are valid failures or races, etc.
2. This sounds good:

 jogo has had a patch up for this for a while:
 https://review.openstack.org/#/c/165233/

3. This requires more research:

 We already have a voting ceph job for nova - can we turn that into a
 multi-node testing job and run live migration with shared storage using
 that?

4. I think not: there is a branch in the execution flow that can be checked
only when we have a shared instance path.

 Can't we use a multi-node ceph job (#3) for this?




[openstack-dev] [nova] CI for reliable live-migration

2015-08-26 Thread Timofei Durakov
Hello,

Here is the situation: nova has a live-migration feature but doesn't have a CI
job to cover it with functional tests, only
gate-tempest-dsvm-multinode-full (non-voting, btw), which covers
block-migration only.
The problem is that live-migration can behave differently depending on
how the instance was booted (volume-backed/ephemeral) and how the environment
is configured (a shared instance directory (NFS, for example), or RBD used to
store the ephemeral disk), or, for example, the user has neither and is going
to use the --block-migrate flag. To claim that we have reliable live-migration
in nova, we should check it at least on envs with rbd or nfs, as these are
more popular than envs without shared storage at all.
Here are the steps for that:

   1. make gate-tempest-dsvm-multinode-full voting, as it looks OK for
   block-migration testing purposes;
   2. contribute to tempest to cover live-migration of volume-backed instances;
   3. make another job with rbd for storing ephemerals, which also requires
   changing the tempest config;
   4. make a job with nfs for ephemerals.

These steps should help us improve the current situation with
live-migration.
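
The four steps above amount to a small coverage matrix: each one targets a storage case that live-migration has to handle. Here is a rough sketch of that matrix. The `Case` names and the `uncovered` helper are hypothetical, and the mapping only restates what this message says (the existing multinode job covers block-migration only); it is not nova's or tempest's actual logic.

```python
from enum import Enum


class Case(Enum):
    """Storage cases named in the message (labels are illustrative)."""
    BLOCK_MIGRATION = "no shared storage, --block-migrate"
    VOLUME_BACKED = "volume-backed instance"
    RBD_EPHEMERAL = "ephemeral disk stored in RBD"
    NFS_SHARED_DIR = "shared instance directory on NFS"


# Step number from the list above that targets each case.
STEP_FOR_CASE = {
    Case.BLOCK_MIGRATION: 1,
    Case.VOLUME_BACKED: 2,
    Case.RBD_EPHEMERAL: 3,
    Case.NFS_SHARED_DIR: 4,
}

# Per the message, the existing multinode job covers block-migration only.
COVERED_TODAY = {Case.BLOCK_MIGRATION}


def uncovered(covered):
    """Return the cases with no CI coverage, ordered by proposed step."""
    missing = [case for case in Case if case not in covered]
    return sorted(missing, key=STEP_FOR_CASE.__getitem__)


for case in uncovered(COVERED_TODAY):
    print(f"step {STEP_FOR_CASE[case]}: {case.value}")
```

With only the current job in place, steps 2 through 4 remain uncovered, which is the gap the proposal is trying to close.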

--
Timofey.


Re: [openstack-dev] [nova] CI for reliable live-migration

2015-08-26 Thread Matt Riedemann



On 8/26/2015 3:21 AM, Timofei Durakov wrote:

Hello,

Here is the situation: nova has a live-migration feature but doesn't have
a CI job to cover it with functional tests, only
gate-tempest-dsvm-multinode-full (non-voting, btw), which covers
block-migration only.
The problem is that live-migration can behave differently depending
on how the instance was booted (volume-backed/ephemeral) and how the
environment is configured (a shared instance directory (NFS, for example),
or RBD used to store the ephemeral disk), or, for example, the user has
neither and is going to use the --block-migrate flag. To claim that we have
reliable live-migration in nova, we should check it at least on envs with
rbd or nfs, as these are more popular than envs without shared storage at all.
Here are the steps for that:

 1. make gate-tempest-dsvm-multinode-full voting, as it looks OK for
block-migration testing purposes;


If it's been stable for a while then I'd be OK with making it voting on
nova changes. I agree it's important to have at least *something* that
gates on multi-node testing for nova, since we seem to break this a few
times per release.



 2. contribute to tempest to cover live-migration of volume-backed instances;


jogo has had a patch up for this for a while:

https://review.openstack.org/#/c/165233/

Since he's not full time on OpenStack anymore, I assume some help there
in picking up the change would be appreciated.



 3. make another job with rbd for storing ephemerals, which also requires
changing the tempest config;


We already have a voting ceph job for nova - can we turn that into a 
multi-node testing job and run live migration with shared storage using 
that?



 4. make a job with nfs for ephemerals.


Can't we use a multi-node ceph job (#3) for this?



These steps should help us improve the current situation with
live-migration.

--
Timofey.






--

Thanks,

Matt Riedemann

