Re: [openstack-dev] [nova][powervm] my notes from the meeting on powervm CI
I just opened this bug; it's going to be one of the blockers for us to get PowerVM CI going in Icehouse: https://bugs.launchpad.net/nova/+bug/1241619

Thanks,

Matt Riedemann

Matt Riedemann wrote on 10/11/2013 10:59 AM:

> Matthew Treinish mtrein...@kortar.org wrote on 10/10/2013 10:31:29 PM:
>
> > Can you share the list of things you've got working with neutron so we
> > can up the number of gating tests?
>
> Here is the nose.cfg we run with:
>
> Some of the tests are excluded because of performance issues that still
> need to be worked out (like test_list_image_filters - it works, but it
> sometimes takes over 20 minutes). Some of the tests are excluded because
> of limitations with DB2, e.g. test_list_servers_filtered_by_name_wildcard.
> Some of them are probably old excludes for bugs that are now fixed; we
> have to go back through what's excluded every once in a while to figure
> out what's still broken and clean things up.
>
> Here is the tempest.cfg we use on ppc64:
>
> And here are the xunit results from our latest run:
>
> Note that we have known issues with some cinder and neutron failures in
> there.
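Purely as an illustration of what that kind of exclusion setup can look like - this is a hypothetical sketch, not the actual nose.cfg attachment; only the two test names come from the mail above:

    #!/usr/bin/env python
    # Hypothetical sketch of driving nose with an exclusion list; the real
    # nose.cfg attachment was not reproduced in the archive.
    import subprocess

    EXCLUDES = [
        'test_list_image_filters',                      # works, but can take 20+ minutes
        'test_list_servers_filtered_by_name_wildcard',  # DB2 limitation
    ]

    cmd = ['nosetests', '-v', 'tempest']
    for pattern in EXCLUDES:
        cmd.append('--exclude=%s' % pattern)  # nose treats these as regexes
    subprocess.check_call(cmd)

Keeping the excludes in one annotated list also makes the periodic "is this still broken?" sweep a simple review of the comments against the bug tracker.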
Re: [openstack-dev] [nova][powervm] my notes from the meeting on powervm CI
And this guy: https://bugs.launchpad.net/nova/+bug/1241628

Thanks,

Matt Riedemann

Matt Riedemann wrote on 10/18/2013 09:25 AM:

> I just opened this bug; it's going to be one of the blockers for us to get
> PowerVM CI going in Icehouse: https://bugs.launchpad.net/nova/+bug/1241619
[openstack-dev] [nova][powervm] my notes from the meeting on powervm CI
Based on the discussion with Russell and Dan Smith in the nova meeting today, here are some of my notes from the meeting that can continue the discussion. These are all pretty rough at the moment, so please bear with me; this is more to just get the ball rolling on ideas.

Notes on powervm CI:

1. What OS to run on? Fedora 19, RHEL 6.4? Either of those is probably fine; we use RHEL 6.4 right now internally.

2. Deployment - RDO? SmokeStack? Devstack? SmokeStack is preferable since it packages rpms, which is what we're using internally.

3. Backing database - MySQL or DB2 10.5? Prefer DB2 since that's what we want to support in Icehouse and it's what we use internally. But there are differences in how long it takes to create a database with DB2 versus MySQL, so when you multiply that by 7 databases (keystone, cinder, glance, nova, heat, neutron, ceilometer) it's going to add up unless we can figure out a better way to do it (single database with multiple schemas?). Internally we use a pre-created image with the DB2 databases already created and just run the migrate scripts against them, so we don't have to wait for the create times on every run (see the sketch after this message) - would that fly in the community?

4. What is the max amount of time for us to report test results? Dan didn't seem to think 48 hours would fly. :)

5. What are the minimum tests that need to run (excluding APIs that the powervm driver doesn't currently support)? Smoke/gate/negative/whitebox/scenario/cli? Right now we have 1152 tempest tests running; those are only within api/scenario/cli, and we don't run everything.

6. Network service? We're running with openvswitch 1.10 today, so we probably want to continue with that if possible.

7. Cinder backend? We're running with the storwize driver, but what do we do about the remote v7000?

Again, just getting some thoughts out there to help us figure out our goals for this, especially around 4 and 5.

Thanks,

Matt Riedemann
Advisory Software Engineer, Cloud Solutions and OpenStack Development
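To make the pre-created-database idea in item 3 concrete, here is a minimal sketch assuming the standard *-manage migration entry points; the exact commands and config paths are assumptions, not the actual CI scripts:

    #!/usr/bin/env python
    # Sketch: instead of creating all 7 databases on every run (slow on DB2),
    # boot an image where the empty databases already exist and only run each
    # project's schema migrations, timing them to see where the minutes go.
    import subprocess
    import time

    # Assumed per-project migration commands; adjust to what the deployment
    # tooling (SmokeStack, devstack, ...) actually invokes.
    MIGRATE_CMDS = {
        'keystone':   ['keystone-manage', 'db_sync'],
        'glance':     ['glance-manage', 'db_sync'],
        'nova':       ['nova-manage', 'db', 'sync'],
        'cinder':     ['cinder-manage', 'db', 'sync'],
        'heat':       ['heat-manage', 'db_sync'],
        'neutron':    ['neutron-db-manage', '--config-file',
                       '/etc/neutron/neutron.conf', 'upgrade', 'head'],
        'ceilometer': ['ceilometer-dbsync'],
    }

    for service, cmd in sorted(MIGRATE_CMDS.items()):
        start = time.time()
        subprocess.check_call(cmd)
        print('%s migrations took %.1fs' % (service, time.time() - start))

The per-service timing output is the point of the sketch: it would show whether DB2 create time or migration time dominates the setup cost being discussed below.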
Re: [openstack-dev] [nova][powervm] my notes from the meeting on powervm CI
Dan Smith d...@danplanet.com wrote on 10/10/2013 08:26:14 PM:

> > 4. What is the max amount of time for us to report test results? Dan
> > didn't seem to think 48 hours would fly. :)
>
> Honestly, I think that 12 hours during peak times is the upper limit of
> what could be considered useful. If it's longer than that, many patches
> could go into the tree without a vote, which defeats the point.

Yeah, I was just joking about the 48 hour thing. 12 hours seems excessive, but I guess that has happened when things are super backed up with gate issues and rechecks. Right now things take about 4 hours, with Tempest being around 1.5 hours of that. The rest of the time is setup and install, which includes heat and ceilometer. So I guess that raises another question: if we're really setting this up right now because of nova, do we need to have heat and ceilometer installed and configured in the initial delivery of this if we're not going to run tempest tests against them (we don't right now)?

I think some aspect of the slow setup time is related to DB2 and how the migrations perform, but the overall time is not considerably different from when we were running this with MySQL, so I'm reluctant to blame it all on DB2. I think some of our topology could have something to do with it too, since the IVM hypervisor is running on a separate system and we are gated on how it's performing at any given time. I think that will be our biggest challenge for the scale issues with community CI.

> > 5. What are the minimum tests that need to run (excluding APIs that the
> > powervm driver doesn't currently support)? Smoke/gate/negative/whitebox/
> > scenario/cli? Right now we have 1152 tempest tests running; those are
> > only within api/scenario/cli and we don't run everything.
>
> I think that a full run of tempest should be required. That said, if there
> are things that the driver legitimately doesn't support, it makes sense to
> exclude those from the tempest run, otherwise it's not useful.
>
> I think you should publish the tempest config (or config script, or patch,
> or whatever) that you're using so that we can see what it means in terms
> of the coverage you're providing.

Just to clarify, do you mean publish what we are using now or publish once it's all working? I can certainly attach our nose.cfg and latest x-unit results xml file.

> > 6. Network service? We're running with openvswitch 1.10 today, so we
> > probably want to continue with that if possible.
>
> Hmm, so that means neutron? AFAIK, not much of tempest runs with
> Nova/Neutron. I kinda think that since nova-network is our default right
> now (for better or worse) that the run should include that mode,
> especially if using neutron excludes a large portion of the tests.
>
> I think you said you're actually running a bunch of tempest right now,
> which conflicts with my understanding of neutron workiness. Can you
> clarify?

Correct, we're running with neutron using the ovs plugin. We basically have the same issues that the neutron gate jobs have, which are related to concurrency and tenant isolation (we're doing the same as devstack with neutron in that we don't run tempest with tenant isolation). We are running most of the nova and most of the neutron API tests, though (we don't have all of the neutron-dependent scenario tests working, probably more due to incompetence in setting up neutron than anything else).

> > 7. Cinder backend? We're running with the storwize driver, but what do
> > we do about the remote v7000?
>
> Is there any reason not to just run with a local LVM setup like we do in
> the real gate? I mean, additional coverage for the v7000 driver is great,
> but if it breaks and causes you to not have any coverage at all, that
> seems, like, bad to me :)

Yeah, I think we'd just run with a local LVM setup; that's what we do for x86_64 and s390x tempest runs. For whatever reason we thought we'd do storwize for our ppc64 runs, probably just to have a matrix of coverage.

> > Again, just getting some thoughts out there to help us figure out our
> > goals for this, especially around 4 and 5.
>
> Yeah, thanks for starting this discussion!
>
> --Dan
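Since running without tenant isolation comes up several times in this thread, here is what flipping it off looks like in tempest's config; a minimal sketch, assuming the Havana-era option name and section (allow_tenant_isolation under [compute]) and a devstack-style path - verify both against the tempest tree actually in use:

    # Sketch: disable tenant isolation for a neutron-backed tempest run,
    # mirroring what devstack+neutron did at the time.
    import ConfigParser  # python 2, matching the RHEL 6.4 / python 2.6 nodes

    TEMPEST_CONF = '/opt/stack/tempest/etc/tempest.conf'  # assumed path

    conf = ConfigParser.SafeConfigParser()
    conf.read(TEMPEST_CONF)
    if not conf.has_section('compute'):
        conf.add_section('compute')
    # Each test class then shares one set of pre-created tenants/users
    # instead of creating isolated ones per class.
    conf.set('compute', 'allow_tenant_isolation', 'false')
    with open(TEMPEST_CONF, 'w') as f:
        conf.write(f)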
Re: [openstack-dev] [nova][powervm] my notes from the meeting on powervm CI
On Thu, Oct 10, 2013 at 7:28 PM, Matt Riedemann mrie...@us.ibm.com wrote:

> Yeah, I was just joking about the 48 hour thing. 12 hours seems excessive,
> but I guess that has happened when things are super backed up with gate
> issues and rechecks. Right now things take about 4 hours, with Tempest
> being around 1.5 hours of that. The rest of the time is setup and install,
> which includes heat and ceilometer.

In general the faster the better, and if things get slow enough that we have to wait for powervm CI to report back, I think it's reasonable to go ahead and approve things without hearing back. In reality, if you can report back in under 12 hours, this will rarely happen (I think).

> > I think that a full run of tempest should be required. That said, if
> > there are things that the driver legitimately doesn't support, it makes
> > sense to exclude those from the tempest run, otherwise it's not useful.

++

> Just to clarify, do you mean publish what we are using now or publish once
> it's all working? I can certainly attach our nose.cfg and latest x-unit
> results xml file.

We should publish all logs, similar to what we do for upstream ( http://logs.openstack.org/96/48196/8/gate/gate-tempest-devstack-vm-full/70ae562/ ).
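On the publish-all-logs point, the upstream layout is essentially one directory per run containing every service log plus every config file. A stdlib-only sketch of collecting that before handing it to whatever publishes it (every path and the run-naming scheme here are illustrative assumptions, not the real CI layout):

    # Sketch: gather service logs and configs into a per-run directory that
    # mirrors the logs.openstack.org layout, ready for rsync/scp/a web server.
    import os
    import shutil

    RUN_ID = 'change-12345-patchset-1'  # hypothetical naming scheme
    DEST = os.path.join('/srv/ci-logs', RUN_ID)

    SOURCES = [
        '/var/log/nova',                        # n-api, n-cpu, n-sch, ...
        '/var/log/neutron',
        '/etc/nova/nova.conf',                  # configs matter for debugging
        '/opt/stack/tempest/etc/tempest.conf',
    ]

    os.makedirs(DEST)
    for src in SOURCES:
        if os.path.isdir(src):
            shutil.copytree(src, os.path.join(DEST, os.path.basename(src)))
        elif os.path.exists(src):
            shutil.copy(src, DEST)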
Re: [openstack-dev] [nova][powervm] my notes from the meeting on powervm CI
On Thu, Oct 10, 2013 at 04:55:51PM -0500, Matt Riedemann wrote:

> 1. What OS to run on? Fedora 19, RHEL 6.4? Either of those is probably
> fine; we use RHEL 6.4 right now internally.

I'd say use Fedora 19 over RHEL 6.4; that way you can use python 2.7 and run tempest with testr instead of nose. While you won't be able to run things in parallel if you're using neutron right now, moving forward that should hopefully be fixed soon. Running in parallel may help with the execution time a bit.

-Matt Treinish
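As a rough sketch of the nose-vs-testr split being suggested here (the testr and nosetests invocations are the standard CLIs; the concurrency value and checkout path are placeholders):

    # Sketch: choose the tempest runner based on the available python -
    # testr's parallel runner needs python 2.7, while python 2.6 on
    # RHEL 6.4 stays on serial nose.
    import subprocess
    import sys

    TEMPEST_DIR = '/opt/stack/tempest'  # assumed checkout location

    if sys.version_info >= (2, 7):
        # With neutron this had to stay effectively serial at the time,
        # because of the tenant isolation issues discussed in this thread.
        cmd = ['testr', 'run', '--parallel', '--concurrency=4']
    else:
        cmd = ['nosetests', '-c', 'nose.cfg', 'tempest']

    subprocess.check_call(cmd, cwd=TEMPEST_DIR)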
Re: [openstack-dev] [nova][powervm] my notes from the meeting on powervm CI
On Thu, Oct 10, 2013 at 07:39:37PM -0700, Joe Gordon wrote:

> On Thu, Oct 10, 2013 at 7:28 PM, Matt Riedemann mrie...@us.ibm.com wrote:
>
> > 5. What are the minimum tests that need to run (excluding APIs that the
> > powervm driver doesn't currently support)? Smoke/gate/negative/whitebox/
> > scenario/cli? Right now we have 1152 tempest tests running; those are
> > only within api/scenario/cli and we don't run everything.

Well, that's almost a full run right now; the full tempest jobs have 1290 tests, of which we skip 65 because of bugs or configuration (we don't run neutron api tests without neutron). That number is actually pretty high since you are running with neutron. Right now the neutron gating jobs only have 221 tests and skip 8 of those. Can you share the list of things you've got working with neutron so we can up the number of gating tests?

> We should publish all logs, similar to what we do for upstream (
> http://logs.openstack.org/96/48196/8/gate/gate-tempest-devstack-vm-full/70ae562/ ).

Yes, and part of that is the devstack logs, which show all the configuration steps for getting an environment up and running. This is sometimes very useful for debugging. So this is probably information that you'll want to replicate in whatever the logging output for the powervm jobs ends up being.

> > Correct, we're running with neutron using the ovs plugin. We basically
> > have the same issues that the neutron gate jobs have, which are related
> > to concurrency and tenant isolation (we're doing the same as devstack
> > with neutron in that we don't run tempest with tenant isolation). We are
> > running most of the nova and most of the neutron API tests, though.

I also agree with Dan here: in the short term you should probably at least have a run with nova-network since it's the default. It'll also let you run