Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes
On 12/3/18 10:34 AM, Bogdan Dobrelya wrote: Hi Kevin. Puppet not only creates config files but also executes service-dependent steps, like db sync, so neither '[base] -> [puppet]' nor '[base] -> [service]' would be enough on its own. That requires some service-specific code to be included into *config* images as well. PS. There is a related spec [0] created by Dan, please take a look and propose your feedback [0] https://review.openstack.org/620062 I'm terribly sorry, here is the corrected link [0] to that spec. [0] https://review.openstack.org/620909 On 11/30/18 6:48 PM, Fox, Kevin M wrote: Still confused by: [base] -> [service] -> [+ puppet] not: [base] -> [puppet] and [base] -> [service] ? Thanks, Kevin ____ From: Bogdan Dobrelya [bdobr...@redhat.com] Sent: Friday, November 30, 2018 5:31 AM To: Dan Prince; openstack-dev@lists.openstack.org; openstack-disc...@lists.openstack.org Subject: Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes On 11/30/18 1:52 PM, Dan Prince wrote: On Fri, 2018-11-30 at 10:31 +0100, Bogdan Dobrelya wrote: On 11/29/18 6:42 PM, Jiří Stránský wrote: On 28. 11. 18 18:29, Bogdan Dobrelya wrote: On 11/28/18 6:02 PM, Jiří Stránský wrote: Reiterating again on previous points: -I'd be fine removing systemd. But let's do it properly and not via 'rpm -ev --nodeps'. -Puppet and Ruby *are* required for configuration. We can certainly put them in a separate container outside of the runtime service containers but doing so would actually cost you much more space/bandwidth for each service container. As both of these have to get downloaded to each node anyway in order to generate config files with our current mechanisms I'm not sure this buys you anything. +1. I was actually under the impression that we concluded yesterday on IRC that this is the only thing that makes sense to seriously consider. 
But even then it's not a win-win -- we'd gain some security by leaner production images, but pay for it with space+bandwidth by duplicating image content (IOW we can help achieve one of the goals we had in mind by worsening the situation w/r/t the other goal we had in mind.) Personally i'm not sold yet but it's something that i'd consider if we got measurements of how much more space/bandwidth usage this would consume, and if we got some further details/examples about how serious the security concerns are if we leave config mgmt tools in runtime images. IIRC the other options (that were brought forward so far) were already dismissed in yesterday's IRC discussion and on the reviews. Bin/lib bind mounting being too hacky and fragile, and nsenter not really solving the problem (because it allows us to switch to having different bins/libs available, but it does not allow merging the availability of bins/libs from two containers into a single context). We are going in circles here I think +1. I think too much of the discussion focuses on "why it's bad to have config tools in runtime images", but IMO we all sorta agree that it would be better not to have them there, if it came at no cost. I think to move forward, it would be interesting to know: if we do this (i'll borrow Dan's drawing): |base container| --> |service container| --> |service container w/ Puppet installed| How much more space and bandwidth would this consume per node (e.g. separately per controller, per compute). This could help with decision making. As I've already evaluated in the related bug, that is: puppet-* modules and manifests ~ 16MB puppet with dependencies ~61MB dependencies of the seemingly largest dependency, systemd ~190MB that would be an extra layer size for each of the container images to be downloaded/fetched into registries. Thanks, i tried to do the math of the reduction vs. inflation in sizes as follows. I think the crucial point here is the layering. 
If we do this image layering: |base| --> |+ service| --> |+ Puppet| we'd drop ~267 MB from the base image, but we'd be installing that at the topmost level, per-component, right? Given we detach systemd from puppet, cronie et al, that would be 267-190 MB, so the math below would look much better. Would it be worth writing a spec that summarizes what action items are being taken to optimize our base image with regards to systemd? Perhaps it would be. But honestly, I see nothing big enough to require a full-blown spec. Just changing RPM deps and layers for container images. I'm tracking systemd changes here [0],[1],[2], btw (if accepted, it should be working as of fedora28 (or 29) I hope) [0] https://review.rdoproject.org/r/#/q/topic:base-container-reduction [1] https://bugzilla.redhat.com/show_bug.cgi?id=1654659 [2] https://bugzilla.redhat.com/show_bug.cgi?id=1654672 It seems like the general consensus is that cleaning up some of the RPM dependencies so that we don't install systemd is the biggest win.
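For those following along, the layer arithmetic traded back and forth in this sub-thread can be written down explicitly. A quick sketch using only the estimates quoted above (all figures are this thread's rough numbers, not measurements of any particular image):

```python
# Rough layer math from this thread (estimates, not measurements).
PUPPET_MODULES_MB = 16   # puppet-* modules and manifests
PUPPET_DEPS_MB = 61      # puppet plus its dependencies
SYSTEMD_MB = 190         # the largest transitive dependency, systemd

# What every image currently inherits from the base layer:
base_drop_mb = PUPPET_MODULES_MB + PUPPET_DEPS_MB + SYSTEMD_MB

# If systemd is first detached at the RPM level (the "267-190MB" point),
# only the remainder must be re-added as a per-service "+ Puppet" layer:
per_service_layer_mb = base_drop_mb - SYSTEMD_MB

print(base_drop_mb)           # ~267 MB dropped from the base image
print(per_service_layer_mb)   # ~77 MB re-added per leaf image
```

So the systemd decoupling is what turns the per-service penalty from ~267 MB into ~77 MB, which is why it keeps coming up as the prerequisite for the whole layering idea.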
What confuses me is why there are still patches posted to move Puppet out of the base layer when we agree moving it out of the base layer would actually cause our resulting container image set to be larger in size. Dan 
In my basic deployment, undercloud seems to have 17 "components" (49 containers), overcloud controller 15 components (48 containers), and overcloud compute 4 components (7 containers). Accounting for overlaps, the total number of "components" used seems to be 19. (By "components" here i mean whatever uses a different ConfigImage than other services. I just eyeballed it but i think i'm not too far off the correct number.) So we'd subtract 267 MB from the base image and add that to 19 leaf images used in this deployment. That means a difference of +4.8 GB to the current image sizes. My /var/lib/registry dir on undercloud with all the images currently has 5.1 GB. We'd almost double that to 9.9 GB. Going from 5.1 to 9.9 GB seems like a lot of extra traffic for the CDNs (both external and e.g. internal within OpenStack Infra CI clouds). 
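As a sanity check, the deployment-wide estimate above can be reproduced mechanically. The component counts are the eyeballed per-role figures from this message; using the full ~267 MB layer, the totals land slightly above the quoted +4.8 GB / +3.7 GB / +800 MB, which presumably assumed a somewhat smaller effective layer size:

```python
# Sanity check of the deployment-wide inflation estimate.
# Counts are the eyeballed per-role component numbers from this thread.
LAYER_MB = 267  # extra per-leaf-image layer if Puppet leaves the base
components_per_role = {"undercloud": 17, "controller": 15, "compute": 4}
distinct_components = 19  # across all roles, overlaps removed

# Growth of the local registry (one copy per distinct leaf image):
registry_extra_gib = distinct_components * LAYER_MB / 1024
print(f"registry grows by ~{registry_extra_gib:.1f} GiB")

# Pull traffic from the registry, per node of each role:
for role, n in components_per_role.items():
    print(f"{role}: +{n * LAYER_MB / 1024:.1f} GiB per node")
```

Either way, the order of magnitude is the same: the registry roughly doubles and every node pulls gigabytes more.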
And for internal traffic between the local registry and overcloud nodes, it gives +3.7 GB per controller and +800 MB per compute. That may not be so critical but still feels like a considerable downside. Another gut feeling is that this way of image layering would take a longer time to build and to run the modify-image Ansible role which we use in CI, so that could endanger how our CI jobs fit into the time limit. We could also probably measure this but i'm not sure if it's worth spending the time. All in all i'd argue we should be looking at different options still. Given that we should decouple systemd from all/some of the dependencies (an example topic for RDO [0]), that could save ~190 MB. But it seems we cannot break the love between puppet and systemd, as the former heavily relies on the latter, and changing packaging like that would highly likely affect baremetal deployments with puppet and systemd co-operating. Ack :/ Long story short, we cannot shoot both rabbits with a single shot, not with puppet :) [0] https://review.rdoproject.org/r/#/q/topic:base-container-reduction
On 11/28/18 8:55 PM, Doug Hellmann wrote: I thought the preferred solution for more complex settings was config maps. Did that approach not work out? Regardless, now that the driver work is done if someone wants to take another stab at etcd integration it'll be more straightforward today. Doug While sharing configs is a feasible option to consider for large-scale configuration management, Etcd only provides strong consistency, which is also known as "Unavailable" [0]. For edge scenarios, to configure 40,000 remote computes over WAN connections, we'd instead want weaker consistency models, like "Sticky Available" [0]. That would allow services to fetch their configuration either from a central "uplink" or locally, when the former is not accessible from remote edge sites. Etcd cannot provide 40,000 local endpoints to fit that case I'm afraid, even if those would be read-only replicas. That is also something I'm highlighting in the paper [1] drafted for ICFC-2019. But had we such a sticky available key-value storage solution, we would indeed have solved the problem of multiple configuration management system executions for thousands of nodes as James describes it. [0] https://jepsen.io/consistency [1] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.pdf On 11/28/18 11:22 PM, Dan Prince wrote: On Wed, 2018-11-28 at 13:28 -0500, James Slagle wrote: On Wed, Nov 28, 2018 at 12:31 PM Bogdan Dobrelya wrote: Long story short, we cannot shoot both rabbits with a single shot, not with puppet :) Maybe we could with ansible replacing puppet fully... So splitting config and runtime images is the only choice yet to address the raised security concerns. And let's forget about edge cases for now. Tossing around a pair of extra bytes over 40,000 WAN-distributed computes ain't gonna be our biggest problem for sure. I think it's this last point that is the crux of this discussion. 
We can agree to disagree about the merits of this proposal and whether it's a pre-optimization or micro-optimization, which I admit are somewhat subjective terms. Ultimately, the unresolved "why do we need to do this" seems to be the reason the conversation keeps going in circles a bit. I'm all for reducing container image size, but the reality is that this proposal doesn't necessarily help us with the Edge use cases we are talking about trying to solve. Why would we even run the exact same puppet binary + manifest individually 40,000 times so that we can produce the exact same set of configuration files that differ only by things such as IP addresses, hostnames, and passwords? Maybe we should instead be thinking about how we can do that *1* time centrally, and produce a configuration that can be reused across 40,000 nodes with little effort. The opportunity for a significant impact in terms of how we can scale TripleO is much larger if we consider approaching these problems with a wider net of what we could do. There's opportunity for a lot of better reuse in TripleO, configuration is just one area. The plan and Heat stack (within the ResourceGroup) are some other areas. We run Puppet for configuration because that is what we did on baremetal and we didn't break backwards compatibility for our configuration options for upgrades. Our Puppet model relies on being executed on each local host in order to splice in the correct IP address and hostname. It executes in a distributed fashion, and works fairly well considering the history of the project. It is robust, guarantees no duplicate configs are being set, and is backwards compatible with all the options TripleO supported on baremetal. Puppet is arguably better for configuration than Ansible (which is what I hear people most often suggest we replace it with). It suits our needs fine, but it is perhaps a bit overkill considering we are only generating config files. 
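The "do it *1* time centrally" idea above amounts to rendering one template centrally and substituting only the per-node values (IP, hostname, passwords) for each of the 40,000 nodes. A minimal, purely illustrative sketch; the config keys here are made up, not TripleO's actual options:

```python
from string import Template

# One template rendered/validated centrally; only per-node values differ.
NODE_CONF = Template("""[DEFAULT]
my_ip = $ip
host = $hostname
transport_url = rabbit://svc:$password@$ip:5672/
""")

def render_for_node(ip: str, hostname: str, password: str) -> str:
    """Splice this node's values into the shared template."""
    return NODE_CONF.substitute(ip=ip, hostname=hostname, password=password)

print(render_for_node("10.0.0.5", "edge-compute-0001", "s3cr3t"))
```

The point is that the expensive part (deciding what the config should look like) happens once, while the per-node step is a trivial substitution that needs no Puppet or Ruby on the node at all.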
I think the answer here is moving to something like Etcd. Perhaps [Not Etcd I think, see my comment above. But you're absolutely right, Dan.] skipping over Ansible entirely as a config management tool (it is arguably less capable than Puppet in this category anyway). Or we could use Ansible for "legacy" services only, switch to Etcd for a majority of the OpenStack services, and drop Puppet entirely (my favorite option). Consolidating our technology stack would be wise. We've already put some work and analysis into the Etcd effort. Just need to push on it some more. Looking at the previous Kubernetes prototypes for TripleO would be the place to start. Config management migration is going to be tedious. It's technical debt that needs to be handled at some point anyway. I think it is a general TripleO improvement that could benefit all clouds, not just Edge. Dan At the same time, if some folks want to work on smaller optimizations (such as container image size), with an approa
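For reference, the "Sticky Available" behavior Bogdan describes earlier in the thread (prefer the central uplink, keep serving the last known config when the WAN link to the edge site is down) reduces to a fetch-with-local-fallback pattern. A sketch with hypothetical helper names, not any existing TripleO API:

```python
import json
from pathlib import Path

def fetch_config(fetch_uplink, cache_path: Path) -> dict:
    """Sticky-available config fetch: prefer the central "uplink",
    fall back to the last locally cached copy when the WAN is down.
    fetch_uplink is any callable returning the config as a dict
    (e.g. a wrapper around an HTTP call to the central store); it is
    assumed to raise OSError when the uplink is unreachable."""
    try:
        cfg = fetch_uplink()
        # Refresh the sticky local copy on every successful fetch.
        cache_path.write_text(json.dumps(cfg))
        return cfg
    except OSError:
        # Uplink unreachable: serve the stale-but-available local copy.
        return json.loads(cache_path.read_text())
```

A node that has fetched at least once keeps working through WAN outages; it degrades to a stale config rather than failing outright, which is exactly the availability-over-consistency trade that Etcd's strongly consistent model cannot make.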
On 11/28/18 2:58 PM, Dan Prince wrote: On Wed, 2018-11-28 at 12:45 +0100, Bogdan Dobrelya wrote: To follow up and explain the patches for code review: The "header" patch https://review.openstack.org/620310 -> (requires) https://review.rdoproject.org/r/#/c/17534/, and also https://review.openstack.org/620061 -> (which in turn requires) https://review.openstack.org/619744 -> (Kolla change, the 1st to go) https://review.openstack.org/619736 This email was cross-posted to multiple lists and I think we may have lost some of the context in the process as the subject was changed. Most of the suggestions and patches are about making our base container(s) smaller in size. And the means by which the patches do that is to share binaries/applications across containers with custom mounts/volumes. I've -2'd most of them. What concerns me however is that some of the TripleO cores seemed open to this idea yesterday on IRC. Perhaps I've misread things, but what you appear to be doing here is quite drastic. I think we need to consider any of this carefully before proceeding with any of it. Please also read the commit messages, I tried to explain all "Whys" very carefully. Just to sum it up here as well: The current self-contained (config and runtime bits) architecture of containers badly affects: * the size of the base layer and all container images, as an additional 300MB (adds an extra 30% of size). You are accomplishing this by removing Puppet from the base container, but you are also creating another container in the process. This would still be required on all nodes as Puppet is our config tool. So you would still be downloading some of this data anyways. Understood your reasons for doing this are that it avoids rebuilding all containers when there is a change to any of these packages in the base container. What you are missing however is how often it is the case that Puppet is updated but something else in the base container isn't? 
For CI jobs updating all containers, it's quite common to have changes in openstack/tripleo puppet modules to pull in. IIUC, that automatically picks up any updates for all of its dependencies and for the dependencies of dependencies, and all that multiplied by a hundred total containers to get updated. That is a *pain* we're used to these days, with CI jobs quite often timing out... Ofc, the main cause is delayed promotions though. For real deployments, I have no data for the cadence of minor updates in puppet and tripleo & openstack modules for it, let's ask operators (as we happen to be on the merged openstack-discuss list)? For its dependencies though, like systemd and ruby, I'm pretty sure it's quite common to have CVEs fixed there. So I expect that delivering "in the field" security fixes for those might bring some unwanted hassle for long-term maintenance of LTS releases. As Tengu noted on IRC: "well, between systemd, puppet and ruby, there are many security concerns, almost every month... and also, what's the point keeping them in runtime containers when they are useless?" I would wager that it is more rare than you'd think. Perhaps looking at the history of an OpenStack distribution would be a valid way to assess this more critically. Without this data to back up the numbers, I'm afraid what you are doing here falls into "pre-optimization" territory for me and I don't think the means used in the patches warrant the benefits you mention here. * Edge cases, where we have container images to be distributed, at least once to hit local registries, over high-latency and limited-bandwidth, highly unreliable WAN connections. * numbers of packages to update in CI for all containers for all services (CI jobs do not rebuild containers so each container gets updated for those 300MB of extra size). It would seem to me there are other ways to solve the CI container update problems. Rebuilding the base layer more often would solve this, right? 
If we always build our service containers off of a recent base layer, there should be no updates to the system/puppet packages there in our CI pipelines.

* security and the surface of attacks, by introducing systemd et al as additional subjects for CVE fixes to maintain for all containers.

We aren't actually using systemd within our containers. I think those packages are getting pulled in by an RPM dependency elsewhere. So rather than using 'rpm -ev --nodeps' to remove it, we could create a sub-package for containers in those cases and install that instead. In short, rather than hacking around this to remove them, why not pursue a proper packaging fix? In general I am a fan of getting things out of the base container we don't need... so yeah, let's do this. But let's do it properly.

* services uptime, by additional restarts of services related to security maintenance of components irrelevant to OpenStack, sitting as dead weight in container images forever.
Re: [openstack-dev] [TripleO][Edge][Kolla] Reduce base layer of containers for security and size of images (maintenance) sakes
Added the Kolla tag, as all together we might want to do something about systemd being included in containers via *multiple* package dependencies, like [0]. Ideally, that would mean properly packaging all/some (like those named in [1]) of the places having it as a dependency, to stop doing that, since as of now it's Containers Time?.. As temporary security band-aiding I was thinking of removing systemd via footers [1] as an extra layer added on top, but I'm not sure that buys anything good long-term.

[0] https://pastebin.com/RSaRsYgZ
[1] https://review.openstack.org/#/c/620310/2/container-images/tripleo_kolla_template_overrides.j2@680

On 11/28/18 12:45 PM, Bogdan Dobrelya wrote: To follow up and explain the patches for code review: The "header" patch https://review.openstack.org/620310 -> (requires) https://review.rdoproject.org/r/#/c/17534/, and also https://review.openstack.org/620061 -> (which in turn requires) https://review.openstack.org/619744 -> (Kolla change, the 1st to go) https://review.openstack.org/619736

Please also read the commit messages; I tried to explain all the "Whys" very carefully. Just to sum it up here as well, the current self-contained (config and runtime bits) architecture of containers badly affects:

* the size of the base layer and of all container images, as an additional 300 MB (an extra 30% of size).

* Edge cases, where we have container images to be distributed, at least once to hit local registries, over high-latency, limited-bandwidth, highly unreliable WAN connections.

* the number of packages to update in CI for all containers for all services (CI jobs do not rebuild containers, so each container gets updated for those 300 MB of extra size).

* security and the surface of attacks, by introducing systemd et al as additional subjects for CVE fixes to maintain for all containers.
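To ground the sub-packaging discussion, one could start by enumerating the reverse-dependencies that drag systemd into the images. A dry-run sketch (the `run` wrapper only prints the command rather than executing it, and a dnf-based image is assumed):

```shell
# Dry-run wrapper: print the command this sketch would execute.
run() { echo "$*"; }

# List installed packages that require systemd: the candidate list for
# the proper sub-packaging fix discussed above.
run dnf repoquery --installed --whatrequires systemd
```

The printed command is what one would run inside a container image to produce a concrete list like the one in [0] above.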
* services uptime, by additional restarts of services related to security maintenance of components irrelevant to OpenStack, sitting as dead weight in container images forever.

On 11/27/18 4:08 PM, Bogdan Dobrelya wrote: Changing the topic to follow the subject. [tl;dr] it's time to rearchitect container images to stop including config-time-only (puppet et al) bits, which are not needed at runtime and pose security issues, like CVEs, to maintain daily.

Background: 1) For the Distributed Compute Node edge case, there are potentially tens of thousands of single-compute-node remote edge sites connected over WAN to a single control plane, with high latency, around 100 ms or so, and limited bandwidth. 2) For a generic security case, 3) TripleO CI updates all Challenge:

Here is a related bug [0] and implementation [1] for that. PTAL folks! [0] https://bugs.launchpad.net/tripleo/+bug/1804822 [1] https://review.openstack.org/#/q/topic:base-container-reduction

Let's also think of removing puppet-tripleo from the base container. It really brings the world in (and yum updates in CI!) for each job and each container! So if we did so, we should then either install puppet-tripleo and co on the host and bind-mount it for the docker-puppet deployment task steps (bad idea IMO), OR use the magical --volumes-from option to mount volumes from some "puppet-config" sidecar container inside each of the containers being launched by the docker-puppet tooling.
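The sidecar idea above could look roughly like this. This is a dry-run illustration (the `run` wrapper only prints the commands instead of executing them), and the image names, volume path, and manifest path are all hypothetical:

```shell
# Dry-run wrapper: print the docker commands instead of executing them.
run() { echo "+ $*"; }

# 1) Create (not start) a "puppet-config" sidecar whose only job is to
#    expose the puppet modules as a volume. Image name is hypothetical.
run docker create --name puppet-config \
    -v /usr/share/openstack-puppet/modules \
    tripleo/puppet-config

# 2) Config-time containers borrow those bits via --volumes-from, so the
#    runtime service images never carry puppet/ruby themselves.
#    The service image and manifest path are hypothetical too.
run docker run --rm --volumes-from puppet-config \
    tripleo/nova-api puppet apply /etc/puppet/manifests/config.pp
```

The trade-off debated in this thread still applies: the sidecar image has to be pulled to every node, so the win only materializes if the runtime images stop rebuilding on every puppet/ruby update.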
On Wed, Oct 31, 2018 at 11:16 AM Harald Jensås wrote: We add this to all images: https://github.com/openstack/tripleo-common/blob/d35af75b0d8c4683a677660646e535cf972c98ef/container-images/tripleo_kolla_template_overrides.j2#L35

/bin/sh -c yum -y install iproute iscsi-initiator-utils lvm2 python socat sudo which openstack-tripleo-common-container-base rsync cronie crudini openstack-selinux ansible python-shade puppet-tripleo python2-kubernetes && yum clean all && rm -rf /var/cache/yum

276 MB. Is the additional 276 MB reasonable here? openstack-selinux <- this package runs relabeling; does that kind of touching of the filesystem impact the size due to docker layers? Also: python2-kubernetes is a fairly large package (18007990 bytes); do we use that in every image? I don't see any tripleo-related repos importing from it when searching on Hound. The original commit message [1] adding it states it is for future convenience. On my undercloud we have 101 images; if we are downloading an extra 18 MB per image, that's almost 1.8 GB for a package we don't use? (I hope it's not like this? With docker layers, we only download that 276 MB transaction once? Or?) [1] https://review.openstack.org/527927

-- Best regards, Bogdan Dobrelya, Irc #bogdando -- Best regards, Bogdan Dobrelya, Irc #bogdando __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
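Harald's back-of-envelope math can be sanity-checked quickly. A small shell sketch using only the figures quoted in this thread (worst case, i.e. assuming no docker layer sharing between images):

```shell
# Figures quoted in the thread (assumptions, not fresh measurements):
# 18 MB for python2-kubernetes, 101 images on the undercloud,
# 300 MB of extra base-layer content.
pkg_mb=18
images=101
extra_mb=300

echo "python2-kubernetes, worst case: $(( pkg_mb * images )) MB"   # 1818 MB, ~1.8 GB
echo "base-layer extra, worst case:   $(( extra_mb * images )) MB" # 30300 MB
```

In practice, docker layers shared between images built FROM the same base are downloaded once per node, so the worst case only applies when the layer in question differs between images, which is exactly the crux of the closing question above.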
Re: [openstack-dev] [tripleo] Proposing Enrique Llorente Pastora as a core reviewer for TripleO
+1

On 11/15/18 4:50 PM, Sagi Shnaidman wrote: Hi, I'd like to propose Quique (@quiquell) as a core reviewer for TripleO. Quique is actively involved in improvements and development of TripleO and TripleO CI. He also helps in other projects, including but not limited to Infrastructure. He shows a very good understanding of how TripleO and CI work, and I'd like to suggest him as a core reviewer of TripleO for CI-related code. Please vote! My +1 is here :) Thanks -- Best regards, Sagi Shnaidman

-- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [Edge-computing] [tripleo][FEMDC] IEEE Fog Computing: Call for Contributions - Deadline Approaching
Hello. The final version of the position paper "Edge Clouds Multiple Control Planes Data Replication Challenges" [0],[1] has been drafted and uploaded to EDAS. The deadline expires today and I'm afraid there is no time left for further amendments. Thank you all for the reviews and inputs, and those edge sessions at the summit in Berlin were really eye-opening! PS. I wish I could have kept working on that draft while attending the summit, but that was not the case :)

[0] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.pdf
[1] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.tex

On 11/8/18 6:58 PM, Bogdan Dobrelya wrote: Hi folks. The deadline for papers seems to have been extended till Nov 17, which is great news! I finished drafting the position paper [0],[1]. Please proofread and review. There are also open questions placed there, and it would be really nice to have a co-author for any of the items remaining... ...

-- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [Edge-computing] [tripleo][FEMDC] IEEE Fog Computing: Call for Contributions - Deadline Approaching
Hi folks. The deadline for papers seems to have been extended till Nov 17, which is great news! I finished drafting the position paper [0],[1]. Please proofread and review. There are also open questions placed there, and it would be really nice to have a co-author for any of the items remaining... I'm also looking for some help with... **uploading the PDF** to the EDAS system! :) It throws at me: pdf notstampable. The PDF file is not compliant with PDF standards and cannot be stamped (see FAQ)... And the FAQ says: "First, try using the most current version of dvipdf for LaTeX or the most current version of Word. You can also distill the file by using Acrobat (Pro, not Acrobat Reader): * Open the PDF file in Acrobat Pro; * Go to the File Menu > Save As or File > Export To... (in Adobe DC Pro) or File > Save As Other... > More Options > Postscript (in Adobe Pro version 11) * Give the file a new name (do not overwrite the original file); * Under "Save As Type", choose "PostScript File (*.ps)" * Open Distiller and browse for this file or go to the directory where the file exists and double click on the file - this will open and run Distiller and regenerate the PDF file. If you do not have Acrobat Pro, you can also try to save the PostScript version via Apple Preview, using the "Print..." menu and the "PDF v" selector in the lower left hand corner to pick "Save as PostScript...". Unfortunately, Apple Preview saves PDF files as version 1.3, which is not acceptable to PDF Xpress, but tools such as docupub appear to produce compliant PDF." I have yet to try those MS Word/Acrobat Pro and Distiller dances (I think I should try that as well... 
), but neither docupub [2] nor dvipdf(m) for LaTeX helped to produce a PDF that EDAS can eat :-(

[0] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.pdf
[1] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.tex
[2] https://docupub.com/pdfconvert/

Folks, I have drafted a few more sections [0] for your proofreading and kind review, please. Also left some notes on TBD things, either for the potential co-authors' attention or for myself :) [0] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.pdf

-- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [Edge-computing] [tripleo][FEMDC] IEEE Fog Computing: Call for Contributions - Deadline Approaching
Folks, I have drafted a few more sections [0] for your proofreading and kind review, please. Also left some notes on TBD things, either for the potential co-authors' attention or for myself :) [0] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.pdf

On 11/5/18 6:50 PM, Bogdan Dobrelya wrote: Update: I have not yet found co-authors, so I'll keep drafting that position paper [0],[1]. Just did some baby steps so far. I'm open for feedback and contributions! PS. The deadline is Nov 9 03:00 UTC, but it *may* be extended, if the event chairs decide to do so. Fingers crossed. [0] https://github.com/bogdando/papers-ieee#in-the-current-development-looking-for-co-authors [1] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.pdf

On 11/5/18 3:06 PM, Bogdan Dobrelya wrote: Thank you for the reply, Flavia: Hi Bogdan, sorry for the late reply - yesterday was a holiday here in Brazil! I am afraid I will not be able to engage in this collaboration on such short notice... we should have started this initiative a little earlier... That's understandable. I hoped, though, that a position paper is something we (all who read this, not just you and me) could achieve in a couple of days, without a lot of research involved. It's a position paper, which is not expected to contain formal proofs or implementation details. The vision for tooling is the hardest part though, and indeed requires some time.
So let me [tl;dr] the intended outcome of that position paper:

* position: given Always Available autonomy support as a starting point, define invariants for both the operational and the data storage consistency requirements of the control/management plane (I've already drafted some in [0]).

* vision: show that in the end the data synchronization and conflict resolution solution boils down to having a causally consistent KVS (either causal+ or causal-RT, or lazy-replication based, or anything like that), and cannot be achieved with *only* a transactional distributed database, like a Galera cluster. How to show that is an open question; we could refer to the existing papers (COPS, causal-RT, lazy replication et al) and claim they fit the defined invariants nicely, while a transactional DB cannot fit them by design (its consensus protocols require majorities/quorums to operate, which conflicts with being always available for data put/write operations). We probably may omit proving that obvious thing formally? At least for the position paper...

* opportunity: basically designing and implementing such a causally consistent KVS solution (see the COPS library as an example) for OpenStack, and ideally unifying it for PaaS operators (OpenShift/Kubernetes) and tenants willing to host their containerized workloads on a PaaS distributed over a Fog Cloud of Edge clouds, letting them leverage its data synchronization and conflict resolution as-a-service. Like Amazon DynamoDB, for example, except fitting the edge cases of another cloud stack :)

[0] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/challenges.md

As for working collaboratively with LaTeX, I would recommend using Overleaf - it is not that difficult and has lots of editing resources, such as markdown and track changes, for instance. Thanks and good luck! Flavia

On 11/2/18 5:32 PM, Bogdan Dobrelya wrote: Hello folks. Here is an update for today.
I created a draft [0], and spent some time setting up a LaTeX build with live-updating of the compiled PDF... The latter is only informational; if someone wants to contribute, please follow the instructions listed at the link (hint: you don't need any LaTeX experience, basic markdown knowledge should be enough!) [0] https://github.com/bogdando/papers-ieee/#in-the-current-development-looking-for-co-authors

On 10/31/18 6:54 PM, Ildiko Vancsa wrote: Hi, Thank you for sharing your proposal. I think this is a very interesting topic, with a list of possible solutions, some of which this group is also discussing. It would also be great to learn more about the IEEE activities and gain experience with the process in this group on the way forward. I personally do not have experience with IEEE conferences, but I’m happy to help with the paper if I can. Thanks, Ildikó

(added from the parallel thread) On 2018. Oct 31., at 19:11, Mike Bayer wrote: On Wed, Oct 31, 2018 at 10:57 AM Bogdan Dobrelya wrote: (cross-posting openstack-dev) Hello. [tl;dr] I'm looking for co-author(s) to come up with "Edge clouds data consistency requirements and challenges", a position paper [0] (paper submission deadline is Nov 8). The problem scope is synchronizing control plane and/or deployment-specific data (not necessarily limited to OpenStack) across remote Edges and a central Edge and management site(s). Including
[tl;dr] I'm looking for co-author(s) to come up with "Edge clouds data consistency requirements and challenges", a position paper [0] (paper submission deadline is Nov 8). The problem scope is synchronizing control plane and/or deployment-specific data (not necessarily limited to OpenStack) across remote Edges and a central Edge and management site(s). Including the same aspects for overclouds and undercloud(s), in terms of TripleO; and other deployment tools of your choice. Another problem is to not end up with different solutions for Edge deployments management and for the control planes of edges. And for tenants as well, if we think of tenants also doing Edge deployments based on Edge Data Replication as a Service, say for Kubernetes/OpenShift on top of OpenStack. So the paper should name the outstanding problems, define data consistency requirements, and pose possible solutions for synchronization and conflict resolution, having maximum-autonomy cases supported for isolated sites, with the capability to eventually catch up the distributed state. Like a global database [1], or something different perhaps (see the causal-real-time consistency model [2],[3]), or even using git. And probably more than that?.. (looking for ideas) I can offer detail on whatever aspects of the "shared
Re: [openstack-dev] [tripleo] Zuul Queue backlogs and resource usage
Let's also think of removing puppet-tripleo from the base container. It really brings the world in (and yum updates in CI!) for each job and each container! So if we did so, we should then either install puppet-tripleo and co on the host and bind-mount it for the docker-puppet deployment task steps (bad idea IMO), OR use the magical --volumes-from option to mount volumes from some "puppet-config" sidecar container inside each of the containers being launched by the docker-puppet tooling. On 10/31/18 6:35 PM, Alex Schultz wrote: So this is a single layer that is updated once and shared by all the containers that inherit from it. I did notice the same thing and proposed a change in the layering of these packages last night: https://review.openstack.org/#/c/614371/ In general this does raise a point about dependencies of services and what the actual impact of adding new ones to projects is, especially in the container world, where a dependency might be duplicated N times depending on the number of services deployed. With the move to containers, much of the sharedness that being on a single host provided has been lost, at a cost of increased bandwidth, memory, and storage usage. Thanks, -Alex -- Best regards, Bogdan Dobrelya, Irc #bogdando __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Edge-computing] [tripleo][FEMDC] IEEE Fog Computing: Call for Contributions - Deadline Approaching
Hello folks. Here is an update for today. I created a draft [0], and spent some time with building LaTeX with live-updating for the compiled PDF... The latter is only informational; if someone wants to contribute, please follow the instructions listed by the link (hint: you don't need any LaTeX experience, only basic markdown knowledge should be enough!) [0] https://github.com/bogdando/papers-ieee/#in-the-current-development-looking-for-co-authors On 10/31/18 6:54 PM, Ildiko Vancsa wrote: Hi, Thank you for sharing your proposal. I think this is a very interesting topic with a list of possible solutions, some of which this group is also discussing. It would also be great to learn more about the IEEE activities and gain experience with the process in this group on the way forward. I personally do not have experience with IEEE conferences, but I’m happy to help with the paper if I can. Thanks, Ildikó (added from the parallel thread) On 2018. Oct 31., at 19:11, Mike Bayer wrote: On Wed, Oct 31, 2018 at 10:57 AM Bogdan Dobrelya wrote: (cross-posting openstack-dev) Hello. [tl;dr] I'm looking for co-author(s) to come up with "Edge clouds data consistency requirements and challenges", a position paper [0] (paper submission deadline is Nov 8). The problem scope is synchronizing control plane and/or deployment-specific data (not necessarily limited to OpenStack) across remote Edges and central Edge and management site(s), including the same aspects for overclouds and undercloud(s) in terms of TripleO, and other deployment tools of your choice. Another problem is to not end up with different solutions for managing Edge deployments and the control planes of edges. And for tenants as well, if we think of tenants also doing Edge deployments based on Edge Data Replication as a Service, say for Kubernetes/OpenShift on top of OpenStack.
So the paper should name the outstanding problems, define data consistency requirements, and pose possible solutions for synchronization and conflict resolution, with maximum-autonomy cases supported for isolated sites and a capability to eventually catch up on distributed state. Like a global database [1], or perhaps something different (see the causal-real-time consistency model [2],[3]), or even using git. And probably more than that?.. (looking for ideas) I can offer detail on whatever aspects of the "shared / global database" idea. The way we're doing it with Galera for now is all about something simple and modestly effective for the moment, but it doesn't have any of the hallmarks of a long-term, canonical solution, because Galera is not well suited towards being present on many (dozens of) endpoints. The concept that the StarlingX folks were talking about, that of independent databases synchronized using some kind of middleware, is potentially more scalable; however, I think the best approach would be API-level replication, that is, you have a bunch of Keystone services and there is a process that is regularly accessing the APIs of these Keystone services and cross-publishing state amongst all of them. Clearly the big challenge with that is how to resolve conflicts; I think the answer would lie in the fact that the data being replicated would be of limited scope and potentially consist of mostly or fully non-overlapping records. That is, I think "global database" is a cheap way to get what would be more effective as asynchronous state synchronization between identity services. Recently we’ve been also exploring federation with an IdP (Identity Provider) master: https://wiki.openstack.org/wiki/Keystone_edge_architectures#Identity_Provider_.28IdP.29_Master_with_shadow_users One of the pros is that it removes the need for synchronization and potentially increases scalability.
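The API-level cross-publishing approach described above can be illustrated in a few lines. In this sketch each "site" is just a dict standing in for a Keystone-like service; records are keyed by their originating site, so merges are mostly non-overlapping, and a genuine overlap falls back to last-writer-wins on a timestamp. Everything here (names, record shapes, the merge policy) is hypothetical, not a real Keystone mechanism.

```python
# Illustrative sketch of API-level state replication between sites.
# Each "site" is a plain dict of records keyed by (originating_site, record_id),
# so most merges are non-overlapping; overlaps resolve by newest timestamp.

def cross_publish(sites):
    """Merge every site's records into every other site."""
    merged = {}
    for site in sites.values():
        for key, record in site.items():
            # Keep the newest copy of a record (last-writer-wins on overlap).
            if key not in merged or record["updated_at"] > merged[key]["updated_at"]:
                merged[key] = record
    for site in sites.values():
        site.clear()
        site.update(merged)

sites = {
    "central": {("central", "role-admin"): {"updated_at": 1, "name": "admin"}},
    "edge1":   {("edge1", "user-alice"):  {"updated_at": 2, "name": "alice"}},
}
cross_publish(sites)
# Both sites now hold both records; the keys were disjoint, so no conflict arose.
assert ("edge1", "user-alice") in sites["central"]
assert ("central", "role-admin") in sites["edge1"]
```

The interesting part, as noted in the thread, is that keying records by originating site makes the common case conflict-free; only records written at two sites under the same key ever need the timestamp tiebreak.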
Thanks, Ildikó -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [Edge-computing][tripleo][FEMDC] IEEE Fog Computing: Call for Contributions - Deadline Approaching
I forgot to mention that the submission registration and abstract have to be submitted today. I've created it as #1570506394, and the paper itself can be uploaded until Nov 8 (or Nov 9 perhaps, as the registration system shows to me). I'm not sure that paper number is searchable publicly, so here is the paper name and abstract for your kind review please: name: "Edge clouds control plane and management data consistency challenges" abstract: "Fog computing is an emerging Cloud of (Edge) Clouds technology. Synchronizing its control plane and deployment data is a major challenge. Autonomy requirements expect even the most distant edge sites to always remain manageable and available for monitoring and alerting, scaling up/down, upgrading and applying security fixes. Whenever temporarily disconnected sites are managed locally or centrally, some changes and data need to be eventually synchronized back to the central site(s), with merge conflicts resolved for the central data hub(s), while some data needs to be pushed from the central site(s) to the Edge, which might require resolving data collisions at the remote sites as well. In this paper, we position the outstanding data synchronization problems for the OpenStack cloud platform on its way to becoming a number one solution for fog computing. We outline the data consistency requirements and design approaches to meet the AA (Always Available) autonomy expectations. Finally, the paper brings the vision of unified tooling, which solves the data synchronization problems the same way for infrastructure owners, IaaS cloud operators, and tenants running workloads for a PaaS like OpenShift or Kubernetes deployed on top of OpenStack. The secondary goal of this work is to help cloud architects and developers federate stateful cloud components over reliable distributed data backends with known failure modes." Thank you for your time, if you're still reading this. On 10/31/18 3:57 PM, Bogdan Dobrelya wrote: (cross-posting openstack-dev) Hello.
[tl;dr] I'm looking for co-author(s) to come up with "Edge clouds data consistency requirements and challenges", a position paper [0] (paper submission deadline is Nov 8). The problem scope is synchronizing control plane and/or deployment-specific data (not necessarily limited to OpenStack) across remote Edges and central Edge and management site(s), including the same aspects for overclouds and undercloud(s) in terms of TripleO, and other deployment tools of your choice. Another problem is to not end up with different solutions for managing Edge deployments and the control planes of edges. And for tenants as well, if we think of tenants also doing Edge deployments based on Edge Data Replication as a Service, say for Kubernetes/OpenShift on top of OpenStack. So the paper should name the outstanding problems, define data consistency requirements, and pose possible solutions for synchronization and conflict resolution, with maximum-autonomy cases supported for isolated sites and a capability to eventually catch up on distributed state. Like a global database [1], or perhaps something different (see the causal-real-time consistency model [2],[3]), or even using git. And probably more than that?.. (looking for ideas) See also the "check" list in-line, which I think also meets the data consistency topics well - it would always be nice to have some theoretical foundations at hand when repairing, by hand, some fully broken global database spread across 1000 edges :) PS. I must admit I don't yet have any experience with those IEEE et al. academic things and am looking for someone who has, to team up and co-author that position paper. That's a start; then we can think of presenting it and expanding it into work items for the OpenStack Edge WG and future development plans.
[0] http://conferences.computer.org/ICFC/2019/Paper_Submission.html
[1] https://review.openstack.org/600555
[2] https://jepsen.io/consistency
[3] http://www.cs.cornell.edu/lorenzo/papers/cac-tr.pdf
On 10/22/18 3:44 PM, Flavia Delicato wrote:
= IEEE International Conference on Fog Computing (ICFC 2019)
June 24-26, 2019, Prague, Czech Republic
http://conferences.computer.org/ICFC/2019/
Co-located with the IEEE International Conference on Cloud Engineering (IC2E 2019)
== Important Dates
Paper registration and abstract: Nov 1st, 2018
Full paper submission due: Nov 8th, 2018
Notification of paper acceptance: Jan 20th, 2019
Workshop and tutorial proposals due: Nov 11, 2018
Notification of proposal acceptance: Nov 18, 2018
== Call for Contributions
Fog computing is the extension of cloud computing into its edge and the physical world to meet the data volume and decision velocity requirements in many
Re: [openstack-dev] [Edge-computing][tripleo][FEMDC] IEEE Fog Computing: Call for Contributions - Deadline Approaching
National University of Singapore Neeraj Suri, TU Darmstadt Albert Zomaya, The University of Sydney Program Committee -- Tarek Abdelzaher, UIUC Anne Benoit, ENS Lyon David Bermbach, TU Berlin Bharat Bhargava, Purdue University Olivier Brun, LAAS/CNRS Laboratory Jiannong Cao, Hong Kong Polytech Flavia C. Delicato, UFRJ, Brazil Xiaotie Deng, Peking University, China Schahram Dustdar, TU Wien, Germany Maria Gorlatova, Duke University Dharanipragada Janakiram, IIT Madras Wenjing Luo, Virginia Tech Pedro José Marrón, Universität Duisburg-Essen Geyong Min, University of Exeter Suman Nath, Microsoft Research Vincenzo Piuri, Universita Degli Studi Di Milano Yong Meng Teo, National University of Singapore Guoliang Xing, Chinese University of Hong Kong Yuanyuan Yang, SUNY Stony Brook Xiaoyun Zhu, Cloudera -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [TripleO] easily identifying how services are configured
On 10/19/18 8:04 PM, Alex Schultz wrote: On Fri, Oct 19, 2018 at 10:53 AM James Slagle wrote: On Wed, Oct 17, 2018 at 11:14 AM Alex Schultz wrote: > Additionally I took a stab at combining the puppet/docker service > definitions for the aodh services in a similar structure to start > reducing the overhead we've had from maintaining the docker/puppet > implementations separately. You can see the patch > https://review.openstack.org/#/c/611188/ for an additional example of > this. That patch takes the approach of removing baremetal support. Is that what we agreed to do? Since it's deprecated since Queens[0], yes? I think it is time to stop continuing this method of installation. Given that I'm not even sure the upgrade process even works anymore with baremetal, I don't think there's a reason to keep it, as it directly impacts the time it takes to perform deployments and also contributes to increased complexity all around. [0] http://lists.openstack.org/pipermail/openstack-dev/2017-September/122248.html My point and concern remains as before: unless we fully dropped the docker support for Queens (and the downstream LTS released for it), we should not modify the t-h-t directory tree, due to the associated complexity of maintaining backports. I'm not specifically opposed, as I'm pretty sure the baremetal implementations are no longer tested anywhere, but I know that Dan had some concerns about that last time around. The alternative we discussed was using jinja2 to include common data/tasks in both the puppet/docker/ansible implementations. That would also result in reducing the number of Heat resources in these stacks and hopefully reduce the amount of time it takes to create/update the ServiceChain stacks. I'd rather we officially get rid of one of the two methods and converge on a single method, without increasing the complexity via jinja to continue supporting both.
If there's an improvement to be had after we've converged on a single structure for including the base bits, maybe we could do that then? Thanks, -Alex -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] Proposing Bob Fournier as core reviewer
+1 On 10/19/18 3:44 PM, Alex Schultz wrote: +1 On Fri, Oct 19, 2018 at 6:29 AM Emilien Macchi wrote: On Fri, Oct 19, 2018 at 8:24 AM Juan Antonio Osorio Robles wrote: I would like to propose Bob Fournier (bfournie) as a core reviewer in TripleO. His patches and reviews have spanned quite a wide range in our project, his reviews show great insight and quality, and I think he would be a great addition to the core team. What do you folks think? Big +1, Bob is a solid contributor/reviewer. His area of knowledge has been critical in all aspects of Hardware Provisioning integration but also in other TripleO bits. -- Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [Openstack-sigs] [FEMDC] [Edge] [tripleo] On the use of terms "Edge" and "Far Edge"
On 10/18/18 4:33 PM, arkady.kanev...@dell.com wrote: Love the idea to have clearer terminology. I suggest we let telco folks suggest the terminology to use. This is not a 3-level hierarchy but much more: there are several layers of aggregation from local to metro, to regional, to DC, and potentially multiple layers in each. -Original Message- From: Dmitry Tantsur Sent: Thursday, October 18, 2018 9:23 AM To: OpenStack Development Mailing List (not for usage questions); openstack-s...@lists.openstack.org Subject: [Openstack-sigs] [FEMDC] [Edge] [tripleo] On the use of terms "Edge" and "Far Edge" [EXTERNAL EMAIL] Please report any suspicious attachments, links, or requests for sensitive information. Hi all, Sorry for chiming in really late on this topic, but I think $subj is worth discussing before we settle harder on the potentially confusing terminology. I think the difference between "Edge" and "Far Edge" is too vague to use these terms in practice. Think about the "edge" metaphor itself: something rarely has several layers of edges. A knife has an edge; there are no far edges. I can imagine zooming in and seeing more edges at the edge, and then it's quite cool indeed, but is it really a useful metaphor for those who have never used a strong microscope? :) I think in the trivial sense "Far Edge" is a tautology and should be avoided. As weak proof of my words, I already see a lot of smart people confusing these two and actually using Central/Edge where they mean Edge/Far Edge. I suggest we adopt a different terminology, even if it is less consistent with the typical marketing terms around the "Edge" movement. Now, I don't have really great suggestions. Something that came up in TripleO discussions [1] is Core/Hub/Edge, which I think reflects the idea better. I'd be very interested to hear your ideas.
Similarly to how NUMA distance equals the shortest path between NUMA nodes, we could think of edges as facets, and of Edge distance as the shortest path between edge sites, counting from the central Edge (distance 0), or from central Edges, if we have those decentralized and there is no single central Edge? Dmitry [1] https://etherpad.openstack.org/p/tripleo-edge-mvp ___ openstack-sigs mailing list openstack-s...@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-sigs -- Best regards, Bogdan Dobrelya, Irc #bogdando
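The NUMA-distance analogy maps naturally onto a shortest-path computation over the site topology, with the central site(s) at distance 0; decentralized deployments simply start the search from several central sites at once. A small illustrative Python sketch (the topology and site names are invented for this example):

```python
from collections import deque

# Edge "distance" as shortest path (in hops) from the central site(s),
# in the spirit of NUMA distance. Topology is an adjacency list.
def edge_distances(topology, centrals):
    """BFS from the central site(s); returns hop distance per reachable site."""
    dist = {c: 0 for c in centrals}
    queue = deque(centrals)
    while queue:
        site = queue.popleft()
        for neighbor in topology.get(site, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[site] + 1
                queue.append(neighbor)
    return dist

# Hypothetical Core/Hub/Edge topology:
topology = {
    "central": ["hub1", "hub2"],
    "hub1": ["central", "edge-a", "edge-b"],
    "hub2": ["central", "edge-c"],
}
d = edge_distances(topology, ["central"])
assert d == {"central": 0, "hub1": 1, "hub2": 1,
             "edge-a": 2, "edge-b": 2, "edge-c": 2}
```

Passing several sites in `centrals` covers the decentralized case: each central Edge gets distance 0 and the rest are measured from the nearest one.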
Re: [openstack-dev] [tripleo][ui][tempest][oooq] Refreshing plugins from git
On 10/18/18 2:17 AM, Honza Pokorny wrote: Hello folks, I'm working on the automated ui testing blueprint[1], and I think we need to change the way we ship our tempest tests. Here is where things stand at the moment:
* We have a kolla image for tempest
* This image contains the tempest rpm, and the openstack-tempest-all rpm
* The openstack-tempest-all package in turn contains all of the openstack tempest plugins
* Each of the plugins is shipped as an rpm
So, in order for a new test in tempest-tripleo-ui to appear in CI, we have to go through at least the following steps:
* New tempest-tripleo-ui rpm
* New openstack-tempest-all rpm
* New tempest kolla image
This could easily take a week, if not more. What I would like to build is something like the following:
* Add an option to the tempest-setup.sh script in tripleo-quickstart to refresh all tempest plugins from git before running any tests
* Optionally specify a zuul change for any of the plugins being refreshed
* Hook up the test job to patches in tripleo-ui (which tests in tempest-tripleo-ui are testing) so that I can run a fix and its test in a single CI job
This would allow the tripleo-ui team to develop code and tests at the same time, and prevent breakage before a patch is even merged. Here are a few questions:
* Do you think this is a good idea?
This reminds me of the update_containers case, relaxed to the next level: updating from sources instead of RPMs. Given that we already have that update_containers thing, the idea seems acceptable for CI use only, although I'd prefer to see the packages and the tempest container (and all that update_containers affects) rebuilt in the same CI job run instead. Though I'm not sure about having different paths for a "new test in tempest-tripleo-ui" getting into the container: executed in CI vs. executed via the TripleO UI? I think the path it takes should always be the same. But please excuse me if I got the case wrong.
[0] https://goo.gl/5bCWRX * Could we accomplish this by some other, simple mechanism? Any helpful suggestions, corrections, and feedback are much appreciated. Thanks Honza Pokorny [1]: https://blueprints.launchpad.net/tripleo/+spec/automated-ui-testing -- Best regards, Bogdan Dobrelya, Irc #bogdando
[openstack-dev] [puppet][tripleo][all] Zuul job backlog
Wesley Hayutin writes: [snip] The TripleO project has created a single node container based composable OpenStack deployment [2]. It is the project's intention to replace most of the TripleO upstream jobs with the Standalone deployment. We would like to reduce our multi-node usage to a total of two or three multinode jobs to handle a basic overcloud deployment, updates and upgrades[a]. Currently in master we are relying on multiple multi-node scenario jobs to test many of the OpenStack services in a single job. Our intention is to move these multinode scenario jobs to single node job(s) that test a smaller subset of services. The goal would be to target the specific areas of the TripleO code base that affect these services and only run those jobs there. This would replace the existing 2-3 hour two node job(s) with single node job(s) for specific services that complete in about half the time. This unfortunately reduces the overall coverage upstream, but still allows us a basic smoke test of the supported OpenStack services and their deployment upstream. Ideally, projects other than TripleO would make use of the Standalone deployment to test their particular service with containers, upgrades, or for various other reasons. Additional projects using this deployment would help ensure bugs are found and resolved quickly, providing additional resilience to the upstream gate jobs. The TripleO team will begin review to scope out and create estimates for the above work starting on October 18, 2018. One should expect to see updates on our progress posted to the list. Below are some details on the proposed changes. Thank you all for your time and patience!
Performance improvements:
* Standalone jobs use half the nodes of multinode jobs
* The standalone job has an average run time of 60-80 minutes, about half the run time of our multinode jobs
Base TripleO Job Definitions (Stein onwards):
Multi-node jobs
* containers-multinode
* containers-multinode-updates
* containers-multinode-upgrades
Single node jobs
* undercloud
* undercloud-upgrade
* standalone
Jobs to be removed (Stein onwards):
Multi-node jobs[b]
* scenario001-multinode
* scenario002-multinode
* scenario003-multinode
* scenario004-multinode
* scenario006-multinode
* scenario007-multinode
* scenario008-multinode
* scenario009-multinode
* scenario010-multinode
* scenario011-multinode
Jobs that may need to be created to cover additional services[4] (Stein onwards):
Single node jobs[c]
* standalone-barbican
* standalone-ceph[d]
* standalone-designate
* standalone-manila
* standalone-octavia
* standalone-openshift
* standalone-sahara
* standalone-telemetry
[1] https://gist.github.com/notmyname/8bf3dbcb7195250eb76f2a1a8996fb00
[2] https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/standalone.html
[3] http://lists.openstack.org/pipermail/openstack-dev/2018-September/134867.html
[4] https://github.com/openstack/tripleo-heat-templates/blob/master/README.rst#service-testing-matrix
I wanted to follow up on that original thread [0] wrt running a default standalone tripleo deployment integration job for openstack-puppet modules, to see if it breaks tripleo. There is a topic [1] to review, please. The issue (IMO) is that the default standalone setup deploys a fixed set of openstack services, some of which are disabled [2] and some of which go by default [3], which may provide either excessive or insufficient coverage (like Ironic) for some of the puppet openstack modules.
My take is that it perhaps only makes sense to deploy that standalone setup for puppet-openstack-integration (and tripleo itself obviously, as that involves a majority of the openstack-puppet modules), but not for each particular puppet-foo module. Why waste CI resources on that default job cloned for all the modules, only to see, for example, the puppet-keystone (and all other modules') standalone jobs failing because of an unrelated puppet-nova libvirt issue [4]? That's pointless and inefficient. And to cover Ironic deployments, we'd have to keep the undercloud job as a separate one, although that probably is acceptable as a first iteration... But ideally I'd like to see that standalone job composable and adapted to only test a deployment of the wanted components for the puppet-foo modules under check/gate. It perhaps also makes sense to disable tempest for the standalone job(s), as it is already covered by neighbouring jobs. [0] https://goo.gl/UFNtcC [1] https://goo.gl/dPkgCH [2] https://goo.gl/eZ1wuC [3] https://goo.gl/H8ZnAJ [4] https://review.openstack.org/609289 -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [TripleO] Plan management refactoring for Life cycle
On 9/11/18 4:43 AM, James Slagle wrote: On Mon, Sep 10, 2018 at 10:12 AM Jiri Tomasek wrote: Hi Mathieu, Thanks for bringing up the topic. There are several efforts currently in progress which should lead to solving the problems you're describing. We are working on introducing CLI commands which would perform the deployment configuration operations on the deployment plan in Swift. This is a main step to finally reach CLI and GUI compatibility/interoperability. The CLI will perform actions to configure the deployment (roles, networks, environments selection, parameter setting, etc.) by calling Mistral workflows which store the information in the deployment plan in Swift. The result is that all the information which defines the deployment is stored in a central place - the deployment plan in Swift - and the deploy command is turned into a simple 'openstack overcloud deploy'. The deployment plan then has plan-environment.yaml, which holds the list of environments used and customized parameter values, roles-data.yaml, which carries the roles definition, and network-data.yaml, which carries the networks definition. The information stored in these files (and the deployment plan in general) can then be treated as the source of information about the deployment. The deployment can then be easily exported and reliably replicated. Here is the document which we put together to identify missing pieces between the GUI, CLI, and Mistral TripleO API. We'll use this to discuss the topic at the PTG this week and define the work needed to achieve complete interoperability. [1] Also there is a pending patch from Steven Hardy which aims to remove CLI-specific environments merging, which should fix the problem with tracking the environments used with a CLI deployment. [2] [1] https://gist.github.com/jtomasek/8c2ae6118be0823784cdafebd9c0edac (Apologies for the inconvenient format, I'll try to update this to a better/editable format.
Original doc: https://docs.google.com/spreadsheets/d/1ERfx2rnPq6VjkJ62JlA_E6jFuHt9vVl3j95dg6-mZBM/edit?usp=sharing) [2] https://review.openstack.org/#/c/448209/ Related to this work, I'd like to see us store the plan in git instead of swift. I think this would reduce some of the complexity around plan management and move us closer to a simpler undercloud architecture. It would be nice to see each change to the plan represented as a new git commit, so we can even see the changes to the plan as roles, networks, services, etc., are selected. I also think git would provide a familiar experience for both developers and operators who are already accustomed to devops-type workflows. I think we could make these changes without impacting the API too much or, hopefully, at all. +42! See also the related RFE (drafted only) [0] [0] https://bugs.launchpad.net/tripleo/+bug/1782139 -- Best regards, Bogdan Dobrelya, Irc #bogdando
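The plan-in-git idea could look roughly like this: every change to the plan files (plan-environment.yaml, roles-data.yaml, network-data.yaml) becomes its own commit, giving history and diffs for free. This is only a sketch of the concept under stated assumptions, not TripleO code; the helper names and the yaml contents are made up, and it assumes a `git` binary is available on the host.

```python
import pathlib
import subprocess
import tempfile

def git(repo, *args):
    """Run a git command in the given repo, with a fixed committer identity."""
    return subprocess.run(
        ["git", "-C", str(repo), "-c", "user.name=tripleo",
         "-c", "user.email=tripleo@example.com", *args],
        check=True, capture_output=True, text=True).stdout

def commit_plan_change(repo, filename, content, message):
    """Write one plan file and record the change as a dedicated commit."""
    (repo / filename).write_text(content)
    git(repo, "add", filename)
    git(repo, "commit", "-m", message)

repo = pathlib.Path(tempfile.mkdtemp())
git(repo, "init")
commit_plan_change(repo, "roles-data.yaml", "- name: Controller\n", "Select roles")
commit_plan_change(repo, "network-data.yaml", "- name: External\n", "Define networks")

# Each plan change is now a separate commit in the history:
log = git(repo, "log", "--oneline")
assert len(log.strip().splitlines()) == 2
```

With something like this, "see the changes to the plan as roles, networks, services are selected" falls out of `git log` and `git diff` rather than needing dedicated plan-management machinery.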
Re: [openstack-dev] [tripleo] quickstart for humans
On 8/31/18 6:03 PM, Raoul Scarazzini wrote: On 8/31/18 12:07 PM, Jiří Stránský wrote: [...] * The "for humans" definition differs significantly based on who you ask. E.g. my intention with [2] was to readily expose *more* knobs and tweaks and be more transparent about the underlying workings of Ansible, because I felt like quickstart.sh hides too much from me. In my opinion [2] is sufficiently "for humans", yet it does pretty much the opposite of what you're looking for. Hey Jiri, I think that "for humans" simply means that you launch the command with just one parameter (i.e. the virthost), and then you have something. yes, this ^^ I'd also add one more thing: if you later remove that something, while having the virthost as your localhost and the non-root user as your currently logged-in user, you remain operational :) Teardown is quite destructive, which is fine for CI but might not be applicable for devboxes running on a laptop. I have a few changes [0] in work addressing that case. [0] https://review.openstack.org/#/q/topic:localcon+(status:open+OR+status:merged) And because of this I think it is just a matter of concentrating the efforts on turning quickstart.sh back to its original scope: making you launch it with just one parameter and have an available environment after a while (OK, sometimes more than a while). Since part of the recent discussions was around the hypothesis of removing it, maybe we can think about making it useful again. Mostly because it is right that the needs of everyone are different, but on the other side, with a solid starting point (the default) you can think about customizing depending on your needs. I'm for recycling what we have; the planet (and I) will enjoy it! My 0,002 cents. -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [TripleO] podman: varlink interface for nice API calls
in similar ways. Ultimately whether you run 'docker run' or 'podman run' you end up with the same thing as far as the existing TripleO architecture goes. Dan You have a tough job. I wish you all the luck in the world in making these decisions and hope politics and internal corporate management decisions play as little a role in them as possible. Best, -jay
Re: [openstack-dev] [tripleo][Edge][FEMDC] Edge clouds and controlplane updates
On 8/13/18 9:47 PM, Giulio Fidente wrote: Hello, I'd like to get some feedback regarding the remaining work for the split controlplane spec implementation [1] Specifically, while for some services like nova-compute it is not necessary to update the controlplane nodes after an edge cloud is deployed, for other services, like cinder (or glance, probably others), it is necessary to update the config files on the controlplane when a new edge cloud is deployed. In fact, for services like cinder or glance, which are hosted in the controlplane, we need to pull data from the edge clouds (for example the newly deployed ceph cluster keyrings and fsid) to configure cinder (or glance) with a new backend. It looks like this demands some architectural changes to solve the following two: - how do we trigger/drive updates of the controlplane nodes after the edge cloud is deployed? Note, there is also a strict(?) requirement of local management capabilities for edge clouds temporarily disconnected from the central controlplane. That complicates the update triggering even more. We'll need at least a notification-and-triggering system to perform the required state synchronizations, including conflict resolution. If that's the case, the architecture changes for the TripleO deployment framework are inevitable AFAICT. - how do we scale the controlplane parameters to accommodate N backends of the same type? A very rough approach to the latter could be to use jinja to scale up the CephClient service so that we can have multiple copies of it in the controlplane. Each instance of CephClient should provide the ceph config file and keyring necessary for each cinder (or glance) backend. Also note that Ceph is only a particular example; we'd need a similar workflow for any backend type. The etherpad for the PTG session [2] touches this, but it'd be good to start this conversation before then. 1. https://specs.openstack.org/openstack/tripleo-specs/specs/rocky/split-controlplane.html 2.
https://etherpad.openstack.org/p/tripleo-ptg-queens-split-controlplane
Re: [openstack-dev] [tripleo] Patches to speed up plan operations
On 8/2/18 1:34 AM, Ian Main wrote: Hey folks! So I've been working on some patches to speed up plan operations in TripleO. This was originally driven by the UI needing to be able to perform a 'plan upload' in something less than several minutes. :) https://review.openstack.org/#/c/581153/ https://review.openstack.org/#/c/581141/ I have a functioning set of patches, and it actually cuts over 2 minutes off the overcloud deployment time.

Without patch:
+ openstack overcloud plan create --templates /home/stack/tripleo-heat-templates/ overcloud
Creating Swift container to store the plan
Creating plan from template files in: /home/stack/tripleo-heat-templates/
Plan created.
real 3m3.415s

With patch:
+ openstack overcloud plan create --templates /home/stack/tripleo-heat-templates/ overcloud
Creating Swift container to store the plan
Creating plan from template files in: /home/stack/tripleo-heat-templates/
Plan created.
real 0m44.694s

This is on VMs. On real hardware it now takes something like 15-20 seconds to do the plan upload, which is much more manageable from the UI standpoint. Some things about what this patch does: - It makes use of process-templates.py (written for the undercloud) to process the jinjafied templates. This reduces duplication with the existing version in the code base and is very fast as it's all done on local disk. Just wanted to say a Special Big Thank You for doing that code consolidation work! - It stores the bulk of the templates as a tarball in swift. Any individual files in swift take precedence over the contents of the tarball, so it should be backwards compatible. This is a great speed-up as we're not accessing a lot of individual files in swift. There's still some work to do: cleaning up and fixing the unit tests, testing upgrades, etc. I just wanted to get some feedback on the general idea and hopefully some reviews and/or help - especially with the unit test stuff. Thanks everyone!
Ian
Re: [openstack-dev] [tripleo] Proposing Lukas Bezdicka core on TripleO
+1 On 8/1/18 1:31 PM, Giulio Fidente wrote: Hi, I would like to propose Lukas Bezdicka core on TripleO. Lukas did a lot of work in our tripleoclient, tripleo-common and tripleo-heat-templates repos to make FFU possible. FFU, which is meant to permit upgrades from Newton to Queens, requires in-depth understanding of many TripleO components (for example Heat, Mistral and the TripleO client) but also of specific TripleO features which were added during the course of the three releases (for example config-download and upgrade tasks). I believe his FFU work to have been very challenging. Given his broad understanding, more recently Lukas started helping with reviews in other areas. I am so sure he'll be a great addition to our group that I am not even looking for comments, just votes :D
Re: [openstack-dev] [tripleo] [tripleo-validations] using top-level fact vars will be deprecated in future Ansible versions
On 7/23/18 9:33 PM, Emilien Macchi wrote: But it seems like, starting with Ansible 2.5 (what we already have in Rocky and beyond), we should encourage the usage of the ansible_facts dictionary. Example: var=hostvars[inventory_hostname].ansible_facts.hostname instead of: var=ansible_hostname If that means rewriting all the ansible_foo references everywhere, the scope of changes would be huge. Those are used literally everywhere. Here is only a search for tripleo-quickstart [0] [0] http://codesearch.openstack.org/?q=%5B%5C.%27%22%5Dansible_%5CS%2B%5B%5E%3A%5D=nope=roles=tripleo-quickstart
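For illustration, here is a minimal playbook fragment contrasting the two styles discussed above (the task names are made up; the variable forms are from the thread plus the short `ansible_facts.<fact>` form Ansible also accepts for the current host):

```yaml
# Deprecated style: top-level injected fact variables.
- name: Print hostname (old style)
  debug:
    msg: "{{ ansible_hostname }}"

# Style encouraged since Ansible 2.5: read from the ansible_facts dict.
- name: Print hostname (new style)
  debug:
    msg: "{{ hostvars[inventory_hostname].ansible_facts.hostname }}"

# Shorter equivalent when addressing the current host:
- name: Print hostname (short form)
  debug:
    msg: "{{ ansible_facts.hostname }}"
```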
Re: [openstack-dev] [tripleo] How to integrate a Heat plugin in a containerized deployment?
On 7/23/18 12:50 PM, Ricardo Noriega De Soto wrote: Hello guys, I need to deploy the following Neutron BGPVPN heat plugin. https://docs.openstack.org/networking-bgpvpn/ocata/heat.html This will allow users to create Heat templates with BGPVPN resources. Right now, the BGPVPN service plugin is only available in the neutron-server-opendaylight Kolla image: https://github.com/openstack/kolla/blob/master/docker/neutron/neutron-server-opendaylight/Dockerfile.j2#L13 It would make sense to add the python-networking-bgpvpn-heat package right there. Is that correct? You can override that via neutron_server_opendaylight_packages_append in tripleo-common, like [0]. [0] http://git.openstack.org/cgit/openstack/tripleo-common/tree/container-images/tripleo_kolla_template_overrides.j2#n76 Heat exposes a parameter to configure plugins (HeatEnginePluginDirs), which corresponds to the plugin_dirs parameter in heat.conf. What is the issue here? Heat will try to search for any available plugin in the path determined by HeatEnginePluginDirs; however, the heat plugin is located in a separate container (neutron_api). How should we tackle this? I see no other example of this type of integration. Here is the most recent example [1] of inter-container state sharing for the Ironic containers. I think something similar should be done for the docker/services/heat* yaml files. [1] https://review.openstack.org/#/c/584265/ AFAIK, /usr/lib/python2.7/site-packages is not exposed to the host as a mounted volume, so how is heat supposed to find the bgpvpn heat plugin? Thanks for your advice.
Cheers -- Ricardo Noriega Senior Software Engineer - NFV Partner Engineer | Office of Technology | Red Hat irc: rnoriega @freenode
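If the package does end up in the image as suggested in the thread, wiring the plugin directory into heat.conf via the HeatEnginePluginDirs parameter could look roughly like this environment fragment (the exact site-packages path inside the image is an assumption and must match where python-networking-bgpvpn-heat actually lands):

```yaml
# Hypothetical TripleO environment file. HeatEnginePluginDirs feeds
# heat.conf's plugin_dirs; the path below is illustrative only.
parameter_defaults:
  HeatEnginePluginDirs:
    - /usr/lib/python2.7/site-packages/networking_bgpvpn_heat
```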
Re: [openstack-dev] [tripleo] New "validation" subcommand for "openstack undercloud"
On 7/16/18 6:32 PM, Dan Prince wrote: On Mon, Jul 16, 2018 at 11:27 AM Cédric Jeanneret wrote: Dear Stackers, In order to let operators properly validate their undercloud node, I propose to create a new subcommand in the "openstack undercloud" "tree": `openstack undercloud validate' This should only run the different validations we have in undercloud_preflight.py¹ That way, an operator will be able to ensure all is valid before starting "for real" any other command like "install" or "upgrade". Of course, this "validate" step is embedded in "install" and "upgrade" already, but having the capability to just validate without any further action is something that can be interesting, for example: - ensure the current undercloud hardware/vm is sufficient for an update - ensure the allocated VM for the undercloud is sufficient for a deploy - and so on There are probably other possibilities, if we extend the "validation" scope outside the "undercloud" (like tripleo, allinone, even overcloud). What do you think? Any pros/cons/thoughts? I think this command could be very useful. I'm assuming the underlying implementation would call a 'heat stack-validate' using an ephemeral heat-all instance. If so, the way we implement it for the undercloud vs. the I think that should be just ansible commands triggered natively via tripleoclient. Why would we validate with heat, deploying throwaway one-time ephemeral stacks (for undercloud/standalone) each time a user runs that heat installer? We had to introduce the virtual stack state tracking system [0] for puppet manifest compatibility's sake only (the manifests sometimes rely on CREATE vs UPDATE states), which added more "ephemeral complexity" to the deployment framework. I'm not following why we would validate ephemeral stacks or use them as an additional moving part. [0] https://review.openstack.org/#/q/topic:bug/1778505+(status:open+OR+status:merged) 'standalone' use case would likely be a bit different.
We can probably subclass the implementations to share common code across the efforts though. For the undercloud you are likely to have a few extra 'local only' validations. Perhaps extra checks for things on the client side. For the all-in-one I had envisioned using the output from the 'heat stack-validate' to create a sample config file for a custom set of services. Similar to how tools like Packstack generate a config file, for example. Dan Cheers, C. ¹ http://git.openstack.org/cgit/openstack/python-tripleoclient/tree/tripleoclient/v1/undercloud_preflight.py -- Cédric Jeanneret Software Engineer DFG:DF
Re: [openstack-dev] [tripleo] Proposing Jose Luis Franco for TripleO core reviewer on Upgrade bits
On 7/20/18 11:07 AM, Carlos Camacho Gonzalez wrote: Hi!!! I'd like to propose Jose Luis Franco [1][2] for core reviewer on all the TripleO upgrades bits. He shows constant and active involvement in improving and fixing our updates/upgrades workflows, and he also helps develop/improve/fix our upstream support for testing the updates/upgrades. Please vote -1/+1, and consider this my +1 vote :) +1! [1]: https://review.openstack.org/#/q/owner:jfrancoa%2540redhat.com [2]: http://stackalytics.com/?release=all=commits_id=jfrancoa Cheers, Carlos.
Re: [openstack-dev] [tripleo] prototype with standalone mode and remote edge compute nodes
On 7/20/18 2:13 AM, Ben Nemec wrote: On 07/19/2018 03:37 PM, Emilien Macchi wrote: Today I played a little bit with Standalone deployment [1] to deploy a single OpenStack cloud without the need of an undercloud and overcloud. The use-case I am testing is the following: "As an operator, I want to deploy a single node OpenStack, that I can extend with remote compute nodes on the edge when needed." We still have a bunch of things to figure out so it works out of the box, but so far I was able to build something that worked, and I found it useful to share early to gather some feedback: https://gitlab.com/emacchi/tripleo-standalone-edge Keep in mind this is a proof of concept, based on upstream documentation and re-using 100% what is in TripleO today. The only thing I'm doing is to change the environment and the roles for the remote compute node. I plan to work on cleaning up the manual steps that I had to do to make it work, like hardcoding some hiera parameters and figuring out how to override ServiceNetmap. Anyway, feel free to test / ask questions / provide feedback. What is the benefit of doing this over just using deployed server to install a remote server from the central management system? You need to have connectivity back to the central location anyway. Won't this become unwieldy with a large number of edge nodes? I thought we told people not to use Packstack for multi-node deployments for exactly that reason. I guess my concern is that eliminating the undercloud makes sense for single-node PoC's and development work, but for what sounds like a production workload I feel like you're cutting off your nose to spite your face. In the interest of saving one VM's worth of resources, now all of your day 2 operations have no built-in orchestration. Every time you want to change a configuration it's "copy new script to system, ssh to system, run script, repeat for all systems." So maybe this is a backdoor way to make Ansible our API?
;-) Ansible may orchestrate that for day 2. Deploying Heat stacks is already made ephemeral for standalone/underclouds, so the only thing you'll need for day 2 is really just ansible. Hence, the need for an undercloud shrinks to having an ansible control node, like your laptop, controlling all clouds via an inventory.
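The "laptop as control node" idea above could be as simple as a static inventory grouping the standalone clouds; a minimal sketch (all hostnames here are made up):

```yaml
# Hypothetical ansible inventory (YAML format) for managing a central
# standalone cloud and its edge sites from a single control node.
all:
  children:
    standalone_clouds:
      hosts:
        central.example.com:
        edge-site-1.example.com:
        edge-site-2.example.com:
```

Day 2 operations then become ordinary ansible runs against that inventory, with no persistent undercloud services required.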
Re: [openstack-dev] [tripleo] Stein blueprint - Plan to remove Keepalived support (replaced by Pacemaker)
I'm all for it! Another benefit is better coverage for the standalone CI job(s), when it will (hopefully) become a mandatory dependency for overcloud multinode jobs. On 7/16/18 12:49 PM, Sergii Golovatiuk wrote: Hi, On Fri, Jul 13, 2018 at 9:11 PM, Juan Antonio Osorio wrote: Sounds good to me. Even if pacemaker is heavier, less options and consistency is better. Greetings from Mexico :D Greetings from Poznań :D On Fri, 13 Jul 2018, 13:33 Emilien Macchi, wrote: Greetings, We have been supporting both Keepalived and Pacemaker to handle VIP management. This is a really good initiative which supports the main idea of 'simplicity'. Keepalived is actually the tool used by the undercloud when SSL is enabled (for SSL termination), while Pacemaker is used on the overcloud to handle VIPs but also services HA. I see some benefits in removing support for keepalived and deploying Pacemaker by default: - pacemaker can be deployed on one node (we actually do it in CI), so it can be deployed on the undercloud to handle VIPs and manage HA as well. Additionally, undercloud services may be made HA across 3 nodes if/when it's really required. - it'll allow extending undercloud & standalone use cases to support multinode one day, with HA and SSL, like we already have on the overcloud. - it removes the complexity of managing two tools, so we'll potentially remove code in TripleO. ++ - of course, pacemaker features from the overcloud would be usable in standalone environments, but also on the undercloud. The same OCF scripts will be used for undercloud and overcloud. There is probably some downside; the first one is that I think Keepalived is much more lightweight than Pacemaker, so we probably need to run some benchmarks here and make sure we don't make the undercloud heavier than it is now. From another perspective, operators need to learn/support two tools.
I went ahead and created this blueprint for Stein: https://blueprints.launchpad.net/tripleo/+spec/undercloud-pacemaker-default I also plan to prototype some basic code soon and provide an upgrade path if we accept this blueprint. I would like to participate in this initiative as I found it very valuable. This is something I would like to discuss here and at the PTG, feel free to bring questions/concerns, Thanks! -- Emilien Macchi
Re: [openstack-dev] [kolla][nova][tripleo] Safe guest shutdowns with kolla?
[Added tripleo] It would be nice to have this situation verified/improved for containerized libvirt on compute nodes deployed with TripleO as well. On 7/12/18 11:02 PM, Clint Byrum wrote: Greetings! We've been deploying with Kolla on CentOS 7 now for a while, and we've recently noticed a rather troubling behavior when we shut down hypervisors. Somewhere between systemd and libvirt's systemd-machined integration, we see that guests get killed aggressively by SIGTERM'ing all of the qemu-kvm processes. This seems to happen because they are scoped into machine.slice, but systemd-machined is killed, which drops those scopes and thus results in killing off the machines. So far we have observed something similar [0], but between systemd and containers managed by the docker daemon (dockerd). [0] https://bugs.launchpad.net/tripleo/+bug/1778913 In the past, we've used the libvirt-guests service when our libvirt was running outside of containers. This worked splendidly, as we could have it wait 5 minutes for VMs to attempt a graceful shutdown, avoiding interrupting any running processes. But this service isn't available on the host OS, as it won't be able to talk to libvirt inside the container. The solution I've come up with for now is this:

[Unit]
Description=Manage libvirt guests in kolla safely
After=docker.service systemd-machined.service
Requires=docker.service

[Install]
WantedBy=sysinit.target

[Service]
Type=oneshot
RemainAfterExit=yes
TimeoutStopSec=400
ExecStart=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh start
ExecStart=/usr/bin/docker start nova_compute
ExecStop=/usr/bin/docker stop nova_compute
ExecStop=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh shutdown

This doesn't seem to work, though I'm still trying to work out the ordering and such. It should ensure that before we stop systemd-machined and destroy all of its scopes (thus killing all the VMs), we run the libvirt-guests.sh script to try and shut them down.
The TimeoutStopSec=400 is because the script itself waits 300 seconds for any VM that refuses to shut down cleanly, so this gives it a chance to wait for at least one of those. This is an imperfect solution but it allows us to move forward after having made a reasonable attempt at clean shutdowns. Anyway, just wondering if anybody else using kolla-ansible or kolla containers in general has run into this problem, and whether or not there are better/known solutions. As I noted above, I think the issue may be valid for TripleO as well. Thanks!
Re: [openstack-dev] [tripleo] Rocky blueprints
On 7/11/18 7:39 PM, Alex Schultz wrote: Hello everyone, As milestone 3 is quickly approaching, it's time to review the open blueprints[0] and their status. It appears that we have made good progress on implementing significant functionality this cycle but we still have some open items. Below is the list of blueprints that are still open, so we'll want to see if they will make M3; if not, we'd like to move them out to Stein, and they won't make Rocky without an FFE. Currently not marked implemented but without any open patches (likely implemented): - https://blueprints.launchpad.net/tripleo/+spec/major-upgrade-workflow - https://blueprints.launchpad.net/tripleo/+spec/tripleo-predictable-ctlplane-ips Currently open with pending patches (may need FFE): - https://blueprints.launchpad.net/tripleo/+spec/config-download-ui - https://blueprints.launchpad.net/tripleo/+spec/container-prepare-workflow - https://blueprints.launchpad.net/tripleo/+spec/containerized-undercloud This needs an FFE please. The remaining work [0] is mostly cosmetic (defaults switching), though it's somewhat blocked on CI infrastructure readiness [1] for containerized undercloud and overcloud deployments. The situation has been drastically improved by the recent changes though, like longer container image caching, enabling ansible pipelining, using shared local container registries for undercloud and overcloud deployments, and maybe more I'm missing. There is also ongoing work to mitigate the CI walltime [2].
[0] http://lists.openstack.org/pipermail/openstack-dev/2018-July/132126.html [1] https://trello.com/c/1yDVHmqm/115-switch-remaining-ci-jobs [2] https://trello.com/c/PpNtarue/126-ci-break-the-openstack-infra-3h-timeout-wall - https://blueprints.launchpad.net/tripleo/+spec/bluestore - https://blueprints.launchpad.net/tripleo/+spec/gui-node-discovery-by-range - https://blueprints.launchpad.net/tripleo/+spec/multiarch-support - https://blueprints.launchpad.net/tripleo/+spec/tripleo-routed-networks-templates - https://blueprints.launchpad.net/tripleo/+spec/sriov-vfs-as-network-interface - https://blueprints.launchpad.net/tripleo/+spec/custom-validations Currently open without work (should be moved to Stein): - https://blueprints.launchpad.net/tripleo/+spec/automated-ui-testing - https://blueprints.launchpad.net/tripleo/+spec/plan-from-git-in-gui - https://blueprints.launchpad.net/tripleo/+spec/tripleo-ui-react-walkthrough - https://blueprints.launchpad.net/tripleo/+spec/wrapping-workflow-for-node-operations - https://blueprints.launchpad.net/tripleo/+spec/ironic-overcloud-ci Please take some time to review this list and update it. If you think you are close to finishing out the feature and would like to request an FFE please start getting that together with appropriate details and justifications for the FFE. Thanks, -Alex [0] https://blueprints.launchpad.net/tripleo/rocky
Re: [openstack-dev] [TripleO] easily identifying how services are configured
On 7/6/18 7:02 PM, Ben Nemec wrote: On 07/05/2018 01:23 PM, Dan Prince wrote: On Thu, 2018-07-05 at 14:13 -0400, James Slagle wrote: I would almost rather see us organize the directories by service name/project instead of implementation. Instead of: puppet/services/nova-api.yaml puppet/services/nova-conductor.yaml docker/services/nova-api.yaml docker/services/nova-conductor.yaml We'd have: services/nova/nova-api-puppet.yaml services/nova/nova-conductor-puppet.yaml services/nova/nova-api-docker.yaml services/nova/nova-conductor-docker.yaml (or perhaps even another level of directories to indicate puppet/docker/ansible?) I'd be open to this but doing changes on this scale is a much larger developer and user impact than what I was thinking we would be willing to entertain for the issue that caused me to bring this up (i.e. how to identify services which get configured by Ansible). Its also worth noting that many projects keep these sorts of things in different repos too. Like Kolla fully separates kolla-ansible and kolla-kubernetes as they are quite divergent. We have been able to preserve some of our common service architectures but as things move towards kubernetes we may which to change things structurally a bit too. True, but the current directory layout was from back when we intended to support multiple deployment tools in parallel (originally tripleo-image-elements and puppet). Since I think it has become clear that it's impractical to maintain two different technologies to do essentially the same thing I'm not sure there's a need for it now. It's also worth noting that kolla-kubernetes basically died because there wasn't enough people to maintain both deployment methods, so we're not the only ones who have found that to be true. 
If/when we move to kubernetes I would anticipate it going like the initial containers work did - development for a couple of cycles, then a switch to the new thing and deprecation of the old thing, then removal of support for the old thing. That being said, because the service yamls are essentially an API for TripleO (they're referenced in user resource registries), I'm not sure it's worth the churn to move everything either. I think that's going to be an issue either way though, it's just a question of the scope. _Something_ is going to move around no matter how we reorganize so it's a problem that needs to be addressed anyway. [tl;dr] I can foresee reorganizing that API becoming a nightmare for maintainers doing backports for queens (and the LTS downstream release based on it). Now imagine kubernetes support comes within the next few years, before we can let the old API just go... I have an example [0] to share of all the pain brought by a simple move of 'API defaults' from environments/services-docker to environments/services plus environments/services-baremetal. Each time a file changed contents at its old location, like here [1], I had to run a lot of sanity checks to rebase it properly. Like checking that the updated paths in resource registries are still valid or had been moved as well, then picking the source of truth for diverged old vs. changed locations - all that to lose nothing important in the process. So please let's *not* change services' paths/namespaces in the t-h-t "API" without a real need, until there are no alternatives left.
[0] https://review.openstack.org/#/q/topic:containers-default-stable/queens [1] https://review.openstack.org/#/c/567810 -Ben __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] tripleo gate is blocked - please read
On 6/14/18 3:50 AM, Emilien Macchi wrote: TL;DR: gate queue was 25h+, we put all patches from gate on standby, do not restore/recheck until further announcement. We recently enabled the containerized undercloud for multinode jobs and we believe this was a bit premature, as the container download process wasn't optimized yet, so it pulls the same containers from the mirrors multiple times. It caused the job runtime to increase and probably caused the docker.io mirrors hosted by OpenStack Infra to be a bit slower, having to provide the same containers multiple times. The time taken to prepare containers on the undercloud and then for the overcloud caused the jobs to randomly time out, and therefore the gate to fail a high amount of the time, so we decided to remove all jobs from the gate by abandoning the patches temporarily (I have them in my browser and will restore when things are stable again, please do not touch anything). Steve Baker has been working on a series of patches that optimize the way we prepare the containers, but basically the workflow will be: - pull containers needed for the undercloud into a local registry, using infra mirror if available - deploy the containerized undercloud - pull containers needed for the overcloud minus the ones already pulled for the undercloud, using infra mirror if available - update containers on the overcloud - deploy the containerized overcloud Let me also note that it may be time to introduce jobs dependencies [0]. Dependencies might somewhat alleviate registries/mirrors DoS issues, like the one we have currently, by running jobs in batches and not firing them all off at once. We still have options to think of. The undercloud deployment takes longer than standalone, but provides better coverage and therefore better extrapolates (and cuts off) future overcloud failures for the dependent jobs. Standalone is less stable as of yet, though.
The containers update check may also be an option for step 1 or step 2, before the remaining multinode jobs execute. Skipping the dependent jobs, in turn, reduces the DoS effects on registries and mirrors. [0] https://review.openstack.org/#/q/status:open+project:openstack-infra/tripleo-ci+topic:ci_pipelines With that process, we hope to reduce the runtime of the deployment and therefore reduce the timeouts in the gate. To enable it, we need to land, in that order: https://review.openstack.org/#/c/571613/, https://review.openstack.org/#/c/574485/, https://review.openstack.org/#/c/571631/ and https://review.openstack.org/#/c/568403. In the meantime, we are disabling the containerized undercloud recently enabled on all scenarios: https://review.openstack.org/#/c/575264/ as mitigation, with the hope of stabilizing things until Steve's patches land. Hopefully, we can merge Steve's work tonight/tomorrow and re-enable the containerized undercloud on scenarios after checking that we don't have timeouts and have reasonable deployment runtimes. That's the plan we came up with, if you have any question / feedback please share it. -- Emilien, Steve and Wes -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] Proposing Alan Bishop tripleo core on storage bits
On 6/13/18 6:50 PM, Emilien Macchi wrote: Alan Bishop has been highly involved in the Storage backends integration in TripleO and Puppet modules, always here to update with new features, fix (nasty and untestable third-party backend) bugs and manage all the backports for stable releases: https://review.openstack.org/#/q/owner:%22Alan+Bishop+%253Cabishop%2540redhat.com%253E%22 He's also very knowledgeable about how TripleO works and how containers are integrated. I would like to propose him as core on TripleO projects for patches related to storage things (Cinder, Glance, Swift, Manila, and backends). Please vote -1/+1, +1 Thanks! -- Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
The proposed undercloud installation jobs dependency [0] worked, see the jobs' start times at [1] and [2], which confirm that. The resulting delay for the full pipeline is ~80 minutes, as expected. So PTAL folks, I propose to try it out in real gating and see how the tripleo zuul queue gets relieved. The remaining patch [3] adding a dependency on tox/linting didn't work, I'll need some help please to figure out why. Thank you Tristan and James and y'all folks for helping! [0] https://review.openstack.org/#/c/568536/ [1] http://logs.openstack.org/36/568536/6/check/tripleo-ci-centos-7-undercloud-containers/cfebec0/ara-report/ [2] http://logs.openstack.org/36/568536/6/check/tripleo-ci-centos-7-containers-multinode/1a211bb/ara-report/ [3] https://review.openstack.org/#/c/568543/ Perhaps this has something to do with jobs evaluation order, it may be worth trying to add the dependencies list in the project-templates, like it is done here for example: http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n9799 It's also easier to read dependencies from the pipelines definition imo. -Tristan -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/28/18 11:43 AM, Bogdan Dobrelya wrote: On 5/25/18 6:40 PM, Tristan Cacqueray wrote: Hello Bogdan, Perhaps this has something to do with jobs evaluation order, it may be worth trying to add the dependencies list in the project-templates, like it is done here for example: http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n9799 It's also easier to read dependencies from the pipelines definition imo. Thank you! It seems in most places tripleo uses pre-defined templates, see [0]. And templates can not import dependencies [1] :( Here is a zuul story for that [2] [2] https://storyboard.openstack.org/#!/story/2002113 [0] http://codesearch.openstack.org/?q=-%20project%3A=nope==tripleo-ci,tripleo-common,tripleo-common-tempest-plugin,tripleo-docs,tripleo-ha-utils,tripleo-heat-templates,tripleo-image-elements,tripleo-ipsec,tripleo-puppet-elements,tripleo-quickstart,tripleo-quickstart-extras,tripleo-repos,tripleo-specs,tripleo-ui,tripleo-upgrade,tripleo-validations [1] https://review.openstack.org/#/c/568536/4 -Tristan On May 25, 2018 12:45 pm, Bogdan Dobrelya wrote: Job dependencies seem ignored by zuul, see jobs [0],[1],[2] started simultaneously, while I expected them to run one by one. According to the patch 568536 [3], [1] is a dependency for [2] and [3]. The same can be observed for the remaining patches in the topic [4]. Is that a bug, or did I misunderstand what zuul job dependencies actually do?
[0] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-undercloud-containers/731183a/ara-report/ [1] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-3nodes-multinode/a1353ed/ara-report/ [2] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-containers-multinode/9777136/ara-report/ [3] https://review.openstack.org/#/c/568536/ [4] https://review.openstack.org/#/q/topic:ci_pipelines+(status:open+OR+status:merged) -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/25/18 6:40 PM, Tristan Cacqueray wrote: Hello Bogdan, Perhaps this has something to do with jobs evaluation order, it may be worth trying to add the dependencies list in the project-templates, like it is done here for example: http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n9799 It's also easier to read dependencies from the pipelines definition imo. Thank you! It seems in most places tripleo uses pre-defined templates, see [0]. And templates can not import dependencies [1] :( [0] http://codesearch.openstack.org/?q=-%20project%3A=nope==tripleo-ci,tripleo-common,tripleo-common-tempest-plugin,tripleo-docs,tripleo-ha-utils,tripleo-heat-templates,tripleo-image-elements,tripleo-ipsec,tripleo-puppet-elements,tripleo-quickstart,tripleo-quickstart-extras,tripleo-repos,tripleo-specs,tripleo-ui,tripleo-upgrade,tripleo-validations [1] https://review.openstack.org/#/c/568536/4 -Tristan On May 25, 2018 12:45 pm, Bogdan Dobrelya wrote: Job dependencies seem ignored by zuul, see jobs [0],[1],[2] started simultaneously, while I expected them to run one by one. According to the patch 568536 [3], [1] is a dependency for [2] and [3]. The same can be observed for the remaining patches in the topic [4]. Is that a bug, or did I misunderstand what zuul job dependencies actually do?
[0] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-undercloud-containers/731183a/ara-report/ [1] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-3nodes-multinode/a1353ed/ara-report/ [2] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-containers-multinode/9777136/ara-report/ [3] https://review.openstack.org/#/c/568536/ [4] https://review.openstack.org/#/q/topic:ci_pipelines+(status:open+OR+status:merged) -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
Job dependencies seem ignored by zuul, see jobs [0],[1],[2] started simultaneously, while I expected them to run one by one. According to the patch 568536 [3], [1] is a dependency for [2] and [3]. The same can be observed for the remaining patches in the topic [4]. Is that a bug, or did I misunderstand what zuul job dependencies actually do? [0] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-undercloud-containers/731183a/ara-report/ [1] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-3nodes-multinode/a1353ed/ara-report/ [2] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-containers-multinode/9777136/ara-report/ [3] https://review.openstack.org/#/c/568536/ [4] https://review.openstack.org/#/q/topic:ci_pipelines+(status:open+OR+status:merged) On 5/15/18 11:39 AM, Bogdan Dobrelya wrote: Added a few more patches [0], [1] by the discussion results. PTAL folks. Wrt the remaining ones in the topic, I'd propose to give it a try and revert it if it proves to do more harm than good. Thank you for feedback! The next step could be reusing artifacts, like DLRN repos and containers built for patches and hosted undercloud, in the consequent pipelined jobs. But I'm not sure how to even approach that. [0] https://review.openstack.org/#/c/568536/ [1] https://review.openstack.org/#/c/568543/ On 5/15/18 10:54 AM, Bogdan Dobrelya wrote: On 5/14/18 10:06 PM, Alex Schultz wrote: On Mon, May 14, 2018 at 10:15 AM, Bogdan Dobrelya <bdobr...@redhat.com> wrote: An update for your review please folks Bogdan Dobrelya writes: Hello. As Zuul documentation [0] explains, the names "check", "gate", and "post" may be altered for more advanced pipelines. Is it doable to introduce, for particular openstack projects, multiple check stages/steps as check-1, check-2 and so on? And is it possible to make the consequent steps reuse environments that the previous steps finished with?
Narrowing down to tripleo CI scope, the problem I'd want us to solve with this "virtual RFE", using such multi-staged check pipelines, is reducing (ideally, de-duplicating) some of the common steps for existing CI jobs. What you're describing sounds more like a job graph within a pipeline. See: https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies for how to configure a job to run only after another job has completed. There is also a facility to pass data between such jobs. ... (skipped) ... Creating a job graph to have one job use the results of the previous job can make sense in a lot of cases. It doesn't always save *time* however. It's worth noting that in OpenStack's Zuul, we have made an explicit choice not to have long-running integration jobs depend on shorter pep8 or tox jobs, and that's because we value developer time more than CPU time. We would rather run all of the tests and return all of the results so a developer can fix all of the errors as quickly as possible, rather than forcing an iterative workflow where they have to fix all the whitespace issues before the CI system will tell them which actual tests broke. -Jim I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for undercloud deployments vs upgrades testing (and some more). Given that those undercloud jobs have not-so-high fail rates though, I think Emilien is right in his comments and those would buy us nothing. On the other hand, what do you think folks of making the tripleo-ci-centos-7-3nodes-multinode depend on tripleo-ci-centos-7-containers-multinode [2]? The former seems quite failure-prone and long running, and is non-voting. It deploys (see featureset configs [3]) 3 nodes in HA fashion. And it almost never passes when containers-multinode fails - see the CI stats page [4]. I've found only 2 cases there of the opposite situation, when containers-multinode fails but 3nodes-multinode passes.
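For reference, such a dependency is a small Zuul v3 config change. A rough sketch of what the proposal amounts to, assuming the jobs are listed in the project's own check pipeline stanza (the actual tripleo-ci layout may differ, so this is illustrative, not a copy of the real config):

```yaml
# Illustrative sketch: run the long, failure-prone 3nodes job only
# after the containers-multinode job has succeeded, so a broken
# multinode deployment no longer burns node time on a job that is
# highly likely doomed to fail anyway.
- project:
    check:
      jobs:
        - tripleo-ci-centos-7-containers-multinode
        - tripleo-ci-centos-7-3nodes-multinode:
            dependencies:
              - tripleo-ci-centos-7-containers-multinode
```

With this shape, the 3nodes job is simply skipped whenever its dependency fails, which is exactly the node-time saving discussed above.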
So cutting off those future failures via the added dependency *would* buy us something and allow other jobs to wait less to commence, at the reasonable price of a somewhat extended main zuul pipeline time. I think it makes sense, and that extended CI time will not exceed the RDO CI execution times so much as to become a problem. WDYT? I'm not sure it makes sense to add a dependency on other deployment tests. It's going to add additional time to the CI run because the upgrade won't start until well over an hour after the rest of the The things are not so simple. There is also a significant wait-in-queue delay before jobs start. And that probably takes even longer than the time to execute the jobs. And that delay is a function of available HW resources and zuul queue length. And the proposed change affects those parameters as well, assuming jobs with failed dependencies won't run at all. So we could expect longer execution times compensated with shorter wait times! I'm not sure how to
Re: [openstack-dev] [tripleo][ci][infra] Quickstart Branching
ards patch you listed, we would have to backport this change to *every* > branch, and it wouldn't really help to avoid the issue. The source of > the problem is not the branchless repo here. > No we shouldn't be backporting every change. The logic in oooq-extras should be version specific and if we're changing an interface in tripleo in a breaking fashion we're doing it wrong in tripleo. If we're backporting things to work around tripleo issues, we're doing it wrong in quickstart. > Regarding catching such issues and Bogdan's point, that's right, we added a > few jobs to catch such issues in the future and prevent breakages, and a few > running jobs is a reasonable price to keep configuration working in all > branches. Comparing to the maintenance nightmare with branches of CI code, it's > really a *zero* price. > Nothing is free. If there's a high maintenance cost, we haven't properly identified the optimal way to separate functionality between tripleo/quickstart. I have repeatedly said that the provisioning parts of quickstart should be separate because those aren't tied to a tripleo version and this along with the scenario configs should be the only unbranched repo we have. Any roles related to how to configure/work with tripleo should be branched and tied to a stable branch of tripleo. This would actually be beneficial for tripleo as well because then we can see when we are introducing backwards incompatible changes. Thanks, -Alex > Thanks > > > On Wed, May 23, 2018 at 3:43 PM, Sergii Golovatiuk <sgolo...@redhat.com> > wrote: >> >> Hi, >> >> Looking at [1], I am thinking about the price we paid for not >> branching tripleo-quickstart. Can we discuss the options to prevent >> the issues such as [1]? Thank you in advance.
>> >> [1] https://review.openstack.org/#/c/569830/4 >> >> -- >> Best Regards, >> Sergii Golovatiuk > -- > Best regards > Sagi Shnaidman -- Best regards Sagi Shnaidman -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo][ci][infra] Quickstart Branching
On 5/23/18 2:43 PM, Sergii Golovatiuk wrote: Hi, Looking at [1], I am thinking about the price we paid for not branching tripleo-quickstart. Can we discuss the options to prevent issues such as [1]? Thank you in advance. [1] https://review.openstack.org/#/c/569830/4 That was only half of the full price, actually; see also the additional multinode containers check/gate jobs [0],[1], from now on executed against the master branches of all tripleo repos (IIUC), for releases -2 and -1 from master. [0] https://review.openstack.org/#/c/569932/ [1] https://review.openstack.org/#/c/569854/ -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] [barbican] [tc] key store in base services
On 5/17/18 9:58 AM, Thierry Carrez wrote: Jeremy Stanley wrote: [...] As a community, we're likely to continue to make imbalanced trade-offs against relevant security features if we don't move forward and declare that some sort of standardized key storage solution is a fundamental component on which OpenStack services can rely. Being able to just assume that you can encrypt volumes in Swift, even as a means to further secure a TripleO undercloud, would be a step in the right direction for security-minded deployments. Unfortunately, I'm unable to find any follow-up summary on the mailing list from the aforementioned session, but the recollection from those who were present (I had a schedule conflict at that time) was that a Castellan-compatible key store would at least be a candidate for inclusion in our base services list: https://governance.openstack.org/tc/reference/base-services.html Yes, last time this was discussed, there was lazy consensus that adding "a Castellan-compatible secret store" would be a good addition to the base services list if we wanted to avoid proliferation of half-baked keystore implementations in various components. The two blockers were: 1/ castellan had to be made less Barbican-specific, offer at least one other secrets store (Vault), and move under Oslo (done) Back to the subject of tripleo underclouds running Barbican: using Vault as a backend may be a good option, given that openshift supports [0] it as well for storing k8s secrets, that kubespray does [1] for vanilla k8s deployments, and that we have an openshift/k8s-based control plane for openstack on the integration roadmap. So we'll very likely end up running Barbican/Vault on the undercloud anyway. [0] https://blog.openshift.com/managing-secrets-openshift-vault-integration/ [1] https://github.com/kubernetes-incubator/kubespray/blob/master/docs/vault.md 2/ some projects (was it Designate ? Octavia ?)
were relying on advanced functions of Barbican not generally found in other secrets stores, like certificate generation, and so would prefer to depend on Barbican itself, which confuses the messaging around the base service addition a bit ("any Castellan-supported secret store as long as it's Barbican") -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/16/18 2:17 PM, Jeremy Stanley wrote: On 2018-05-16 11:31:30 +0200 (+0200), Bogdan Dobrelya wrote: [...] I'm pretty sure though that with broader containers adoption, openstack infra will catch up eventually, so we all could benefit in our upstream CI jobs from affinity-based and co-located data available around for consequent build steps. I still don't see what it has to do with containers. We've known My understanding, I may be totally wrong, is that unlike packages and repos (do not count OSTree [0]), containers use layers and can be exported into tarballs with built-in de-duplication. This makes the idea of tossing those tarballs around much more attractive than doing something similar with package repositories. Of course container images can be pre-built into nodepool images, just like packages, so CI users can rebuild on top with fewer changes brought into new layers, which is another nice-to-have option by the way. [0] https://rpm-ostree.readthedocs.io/en/latest/ these were potentially useful features long before container-oriented projects came into the picture. We simply focused on implementing other, even more generally-applicable features first. Right, I think this only confirms that it *does* have something to do with containers, and priorities for containerized use cases will follow containers adoption trends - if everyone one day suddenly asks for nodepool images containing the latest kolla containers injected, for example. -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/15/18 10:31 PM, Wesley Hayutin wrote: On Tue, May 15, 2018 at 1:29 PM James E. Blair <cor...@inaugust.com> wrote: Jeremy Stanley <fu...@yuggoth.org> writes: > On 2018-05-15 09:40:28 -0700 (-0700), James E. Blair wrote: > [...] >> We're also talking about making a new kind of job which can continue to >> run after it's "finished" so that you could use it to do something like >> host a container registry that's used by other jobs running on the >> change. We don't have that feature yet, but if we did, would you prefer >> to use that instead of the intermediate swift storage? > > If the subsequent jobs depending on that one get nodes allocated > from the same provider, that could solve a lot of the potential > network performance risks as well. That's... tricky. We're *also* looking at affinity for buildsets, and I'm optimistic we'll end up with something there eventually, but that's likely to be a more substantive change and probably won't happen as soon. I do agree it will be nice, especially for use cases like this. -Jim There is a lot here to unpack and discuss, but I really like the ideas I'm seeing. Nice work Bogdan! I've added it to the tripleo meeting agenda for next week so we can continue socializing the idea and get feedback. Thanks! https://etherpad.openstack.org/p/tripleo-meeting-items Thank you for the feedback, folks. There are a lot of technical caveats, right. I'm pretty sure though that with broader containers adoption, openstack infra will catch up eventually, so we all could benefit in our upstream CI jobs from affinity-based and co-located data available around for consequent build steps.
-- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/15/18 5:08 PM, Sagi Shnaidman wrote: Bogdan, I think before final decisions we need to know exactly what price we'd need to pay. Without exact numbers it will be difficult to discuss. If we need to wait 80 mins for the undercloud-containers job to finish before starting all other jobs, it will be about 4.5 hours to wait for a result (+ 4.5 hours in gate), which is too big a price imho and isn't worth the effort. What are the exact numbers we are talking about? I fully agree, but can't get those numbers, sorry! As I noted above, those are definitely sitting in openstack-infra's elastic search DB, they just need to get extracted with some assistance from folks who know more about that! Thanks On Tue, May 15, 2018 at 3:07 PM, Bogdan Dobrelya <bdobr...@redhat.com> wrote: Let me clarify the problem I want to solve with pipelines. It is getting *hard* to develop things and move patches to the Happy End (merged): - Patches wait too long for CI jobs to start. It should be minutes and not hours of waiting. - If a patch fails a job w/o a good reason, the consequent recheck operation repeats all that waiting over again. How may pipelines help solve it? Pipelines only alleviate, not solve, the problem of waiting. We only want to build pipelines for the main zuul check process, omitting gating and RDO CI (for now). There are two cases to consider: - A patch succeeds all checks - A patch fails a check with dependencies The latter case benefits us the most, when pipelines are designed like it is proposed here, so that any jobs expected to fail when a dependency fails will be omitted from execution. This saves a lot of HW resources and zuul queue slots, making them available for other patches and allowing those to have CI jobs started faster (less waiting!). When we have "recheck storms", like because of some known intermittent side issue, that outcome is multiplied by the recheck storm um...
level, and delivers even better and absolutely amazing results :) The zuul queue will not be growing insanely, getting overwhelmed by multiple clones of rechecked jobs highly likely doomed to fail, and blocking other patches that might have a chance to pass checks, being unaffected by that intermittent issue. And for the first case, when a patch succeeds, it takes some extended time, and that is the price to pay. How much time it takes to finish in a pipeline fully depends on the implementation. The effectiveness could only be measured with numbers extracted from elastic search data, like average time to wait for a job to start, success vs fail execution time percentiles for a job, average amount of rechecks, recheck storms history et al. I don't have that data and don't know how to get it. Any help with that is very appreciated and could really help to move the proposed patches forward or decline them. And we could then compare "before" and "after" as well. I hope that explains the problem scope and the methodology to address it. On 5/14/18 6:15 PM, Bogdan Dobrelya wrote: An update for your review please folks Bogdan Dobrelya <bdobr...@redhat.com> writes: Hello. As Zuul documentation [0] explains, the names "check", "gate", and "post" may be altered for more advanced pipelines. Is it doable to introduce, for particular openstack projects, multiple check stages/steps as check-1, check-2 and so on? And is it possible to make the consequent steps reuse environments that the previous steps finished with? Narrowing down to tripleo CI scope, the problem I'd want us to solve with this "virtual RFE", using such multi-staged check pipelines, is reducing (ideally, de-duplicating) some of the common steps for existing CI jobs. What you're describing sounds more like a job graph within a pipeline.
See: https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies for how to configure a job to run only after another job has completed. There is also a facility to pass data between such jobs. ... (skipped) ... Creating a job graph to have one job use the results of the previous job can make sense in a lot of cases. It doesn't always save *time* however. It's worth noting that
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/15/18 4:30 PM, James E. Blair wrote: Bogdan Dobrelya <bdobr...@redhat.com> writes: Added a few more patches [0], [1] per the discussion results. PTAL folks. Wrt the remaining patches in the topic, I'd propose to give them a try and revert them if they prove to do more harm than good. Thank you for the feedback! The next step could be reusing artifacts, like DLRN repos and containers built for patches and hosted undercloud, in the subsequent pipelined jobs. But I'm not sure how to even approach that. [0] https://review.openstack.org/#/c/568536/ [1] https://review.openstack.org/#/c/568543/ In order to use an artifact in a dependent job, you need to store it somewhere and retrieve it. In the parent job, I'd recommend storing the artifact on the log server (in an "artifacts/" directory) next to the job's logs. The log server is essentially a time-limited artifact repository keyed on the zuul build UUID. Pass the URL to the child job using the zuul_return Ansible module. Have the child job fetch it from the log server using the URL it gets. However, don't do that if the artifacts are very large -- more than a few MB -- we'll end up running out of space quickly. In that case, please volunteer some time to help the infra team set up a swift container to store these artifacts. We don't need to *run* swift -- we have clouds with swift already. We just need some help setting up accounts, secrets, and Ansible roles to use it from Zuul. Thank you, that's a good proposal! 
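A rough sketch of the hand-off Jim describes, as it could look in job playbooks. The variable name `all_tar_url` and the log-server URL are illustrative only, not from this thread; the relevant mechanism is that data returned via `zuul_return` becomes available as Ansible variables in dependent jobs:

```yaml
# Parent job: a post-run playbook publishes where the artifact was stored.
- hosts: localhost
  tasks:
    - name: Pass the artifact URL to dependent (child) jobs
      zuul_return:
        data:
          all_tar_url: "http://logs.openstack.org/{{ zuul.build }}/artifacts/all.tar.gz"

# Child job: the run playbook fetches the artifact via the inherited variable.
- hosts: all
  tasks:
    - name: Fetch the parent job's artifact from the log server
      get_url:
        url: "{{ all_tar_url }}"
        dest: /tmp/all.tar.gz
```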
So when we have done that upstream infra swift setup for tripleo, the 1st step in the job dependency graph may be using quickstart to do something like: * check out testing depends-on things, * build repos and all tripleo docker images from these repos, * upload into a swift container, with an automatic expiration set, the de-duplicated and compressed tarball created with something like: # docker save $(docker images -q) | gzip -1 > all.tar.gz (I expect it will be something like a 2G file) * something similar for DLRN repos probably, I'm not an expert for this part. Then those stored artifacts would be picked up by the next step in the graph, deploying undercloud and overcloud in a single step, like: * fetch the swift containers with repos and container images * docker load -i all.tar.gz * populate images into a local registry, as usual * something similar for the repos. Includes an offline yum update (we already have a compressed repo, right? profit!) * deploy UC * deploy OC, if a job wants it And if OC deployment is brought into a separate step, we do not need local registries, just 'docker load -i all.tar.gz' issued for overcloud nodes should replace image prep workflows and registries, AFAICT. Not sure about the repos for that case. I wish to assist with the upstream infra swift setup for tripleo, and that plan, just need a blessing and more hands from tripleo CI squad ;) -Jim __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Best regards, Bogdan Dobrelya, Irc #bogdando
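A minimal, self-contained sketch of that save/compress/load round trip. Since no docker daemon is assumed here, a tar stream of a scratch directory stands in for `docker save` output; on real nodes the stand-in lines become `docker save ... | gzip -1` and `docker load -i` (docker load accepts a gzip-compressed tarball directly):

```shell
set -eu
workdir=$(mktemp -d)
mkdir -p "$workdir/layers"
echo "pretend-image-layer" > "$workdir/layers/layer0"

# Stand-in for: docker save $(docker images -q) | gzip -1 > all.tar.gz
tar -C "$workdir" -cf - layers | gzip -1 > "$workdir/all.tar.gz"

# Stand-in for: docker load -i all.tar.gz on a consumer node
mkdir "$workdir/restored"
gunzip -c "$workdir/all.tar.gz" | tar -C "$workdir/restored" -xf -

cat "$workdir/restored/layers/layer0"   # prints: pretend-image-layer
```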
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/15/18 2:30 PM, Jeremy Stanley wrote: On 2018-05-15 14:07:56 +0200 (+0200), Bogdan Dobrelya wrote: [...] How may pipelines help solve it? Pipelines only alleviate, not solve, the problem of waiting. We only want to build pipelines for the main zuul check process, omitting gating and RDO CI (for now). There are two cases to consider: - A patch succeeds all checks - A patch fails a check with dependencies The latter case benefits us the most, when pipelines are designed as proposed here, so that any jobs expected to fail when a dependency fails will be omitted from execution. [...] Your choice of terminology is making it hard to follow this proposal. You seem to mean something other than https://zuul-ci.org/docs/zuul/user/config.html#pipeline when you use the term "pipeline" (which gets confusing very quickly for anyone familiar with Zuul configuration concepts). Indeed, sorry for that confusion. I mean pipelines as jobs executed in batches, ordered via defined dependencies, like gitlab pipelines [0]. And those batches can also be thought of as steps, or whatever we call that. [0] https://docs.gitlab.com/ee/ci/pipelines.html -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
Let me clarify the problem I want to solve with pipelines. It is getting *hard* to develop things and move patches to the Happy End (merged): - Patches wait too long for CI jobs to start. It should be minutes and not hours of waiting. - If a patch fails a job w/o a good reason, the consequent recheck operation repeats the waiting all over again. How may pipelines help solve it? Pipelines only alleviate, not solve, the problem of waiting. We only want to build pipelines for the main zuul check process, omitting gating and RDO CI (for now). There are two cases to consider: - A patch succeeds all checks - A patch fails a check with dependencies The latter case benefits us the most, when pipelines are designed as proposed here, so that any jobs expected to fail when a dependency fails will be omitted from execution. This saves a lot of HW resources and zuul queue slots, making them available for other patches and allowing those to have CI jobs started faster (less waiting!). When we have "recheck storms", like because of some known intermittent side issue, that outcome is multiplied by the recheck storm um... level, and delivers even better and absolutely amazing results :) The Zuul queue will not grow insanely, getting overwhelmed by multiple clones of the rechecked jobs highly likely doomed to fail, and blocking other patches which might have chances to pass checks as non-affected by that intermittent issue. And for the first case, when a patch succeeds, it takes some extended time, and that is the price to pay. How much time it takes to finish in a pipeline fully depends on the implementation. The effectiveness could only be measured with numbers extracted from elastic search data, like average time to wait for a job to start, success vs fail execution time percentiles for a job, average number of rechecks, recheck storms history and so on. I don't have that data and don't know how to get it. 
Any help with that is very appreciated and could really help to move the proposed patches forward or decline it. And we could then compare "before" and "after" as well. I hope that explains the problem scope and the methodology to address that. On 5/14/18 6:15 PM, Bogdan Dobrelya wrote: An update for your review please folks Bogdan Dobrelya writes: Hello. As Zuul documentation [0] explains, the names "check", "gate", and "post" may be altered for more advanced pipelines. Is it doable to introduce, for particular openstack projects, multiple check stages/steps as check-1, check-2 and so on? And is it possible to make the consequent steps reusing environments from the previous steps finished with? Narrowing down to tripleo CI scope, the problem I'd want we to solve with this "virtual RFE", and using such multi-staged check pipelines, is reducing (ideally, de-duplicating) some of the common steps for existing CI jobs. What you're describing sounds more like a job graph within a pipeline. See: https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies for how to configure a job to run only after another job has completed. There is also a facility to pass data between such jobs. ... (skipped) ... Creating a job graph to have one job use the results of the previous job can make sense in a lot of cases. It doesn't always save *time* however. It's worth noting that in OpenStack's Zuul, we have made an explicit choice not to have long-running integration jobs depend on shorter pep8 or tox jobs, and that's because we value developer time more than CPU time. We would rather run all of the tests and return all of the results so a developer can fix all of the errors as quickly as possible, rather than forcing an iterative workflow where they have to fix all the whitespace issues before the CI system will tell them which actual tests broke. 
-Jim I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for undercloud deployments vs upgrades testing (and some more). Given that those undercloud jobs do not have such high fail rates though, I think Emilien is right in his comments and those would buy us nothing. On the other hand, what do you folks think of making the tripleo-ci-centos-7-3nodes-multinode depend on tripleo-ci-centos-7-containers-multinode [2]? The former seems quite failure-prone and long-running, and is non-voting. It deploys 3 nodes in HA fashion (see featureset configs [3]*). And it almost never passes when the containers-multinode fails - see the CI stats page [4]. I've found only 2 cases there of the opposite situation, when containers-multinode fails, but 3nodes-multinode passes. So cutting off those future failures via the added dependency *would* buy us something and allow other jobs to wait less to commence, at the reasonable price of a somewhat extended run time of the main zuul pipeline. I think it makes sense and that extended CI time will not overhead the RDO CI execution times so much as to become a problem. WDYT?
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
Added a few more patches [0], [1] by the discussion results. PTAL folks. Wrt remaining in the topic, I'd propose to give it a try and revert it, if it proved to be worse than better. Thank you for feedback! The next step could be reusing artifacts, like DLRN repos and containers built for patches and hosted undercloud, in the consequent pipelined jobs. But I'm not sure how to even approach that. [0] https://review.openstack.org/#/c/568536/ [1] https://review.openstack.org/#/c/568543/ On 5/15/18 10:54 AM, Bogdan Dobrelya wrote: On 5/14/18 10:06 PM, Alex Schultz wrote: On Mon, May 14, 2018 at 10:15 AM, Bogdan Dobrelya <bdobr...@redhat.com> wrote: An update for your review please folks Bogdan Dobrelya writes: Hello. As Zuul documentation [0] explains, the names "check", "gate", and "post" may be altered for more advanced pipelines. Is it doable to introduce, for particular openstack projects, multiple check stages/steps as check-1, check-2 and so on? And is it possible to make the consequent steps reusing environments from the previous steps finished with? Narrowing down to tripleo CI scope, the problem I'd want we to solve with this "virtual RFE", and using such multi-staged check pipelines, is reducing (ideally, de-duplicating) some of the common steps for existing CI jobs. What you're describing sounds more like a job graph within a pipeline. See: https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies for how to configure a job to run only after another job has completed. There is also a facility to pass data between such jobs. ... (skipped) ... Creating a job graph to have one job use the results of the previous job can make sense in a lot of cases. It doesn't always save *time* however. It's worth noting that in OpenStack's Zuul, we have made an explicit choice not to have long-running integration jobs depend on shorter pep8 or tox jobs, and that's because we value developer time more than CPU time. 
We would rather run all of the tests and return all of the results so a developer can fix all of the errors as quickly as possible, rather than forcing an iterative workflow where they have to fix all the whitespace issues before the CI system will tell them which actual tests broke. -Jim I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for undercloud deployments vs upgrades testing (and some more). Given that those undercloud jobs have not so high fail rates though, I think Emilien is right in his comments and those would buy us nothing. From the other side, what do you think folks of making the tripleo-ci-centos-7-3nodes-multinode depend on tripleo-ci-centos-7-containers-multinode [2]? The former seems quite faily and long running, and is non-voting. It deploys (see featuresets configs [3]*) a 3 nodes in HA fashion. And it seems almost never passing, when the containers-multinode fails - see the CI stats page [4]. I've found only a 2 cases there for the otherwise situation, when containers-multinode fails, but 3nodes-multinode passes. So cutting off those future failures via the dependency added, *would* buy us something and allow other jobs to wait less to commence, by a reasonable price of somewhat extended time of the main zuul pipeline. I think it makes sense and that extended CI time will not overhead the RDO CI execution times so much to become a problem. WDYT? I'm not sure it makes sense to add a dependency on other deployment tests. It's going to add additional time to the CI run because the upgrade won't start until well over an hour after the rest of the The things are not so simple. There is also a significant time-to-wait-in-queue jobs start delay. And it takes probably even longer than the time to execute jobs. And that delay is a function of available HW resources and zuul queue length. And the proposed change affects those parameters as well, assuming jobs with failed dependencies won't run at all. 
So we could expect longer execution times compensated with shorter wait times! I'm not sure how to estimate that tho. You folks have all numbers and knowledge, let's use that please. jobs. The only thing I could think of where this makes more sense is to delay the deployment tests until the pep8/unit tests pass. e.g. let's not burn resources when the code is bad. There might be arguments about lack of information from a deployment when developing things but I would argue that the patch should be vetted properly first in a local environment before taking CI resources. I support this idea as well, though I'm sceptical about having that blessed in the end :) I'll add a patch though. Thanks, -Alex [0] https://review.openstack.org/#/c/568275/ [1] https://review.openstack.org/#/c/568278/ [2] https://review.openstack.org/#/c/568326/ [3] https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html [4] http://tripleo.
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/14/18 10:06 PM, Alex Schultz wrote: On Mon, May 14, 2018 at 10:15 AM, Bogdan Dobrelya <bdobr...@redhat.com> wrote: An update for your review please folks Bogdan Dobrelya writes: Hello. As Zuul documentation [0] explains, the names "check", "gate", and "post" may be altered for more advanced pipelines. Is it doable to introduce, for particular openstack projects, multiple check stages/steps as check-1, check-2 and so on? And is it possible to make the consequent steps reusing environments from the previous steps finished with? Narrowing down to tripleo CI scope, the problem I'd want we to solve with this "virtual RFE", and using such multi-staged check pipelines, is reducing (ideally, de-duplicating) some of the common steps for existing CI jobs. What you're describing sounds more like a job graph within a pipeline. See: https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies for how to configure a job to run only after another job has completed. There is also a facility to pass data between such jobs. ... (skipped) ... Creating a job graph to have one job use the results of the previous job can make sense in a lot of cases. It doesn't always save *time* however. It's worth noting that in OpenStack's Zuul, we have made an explicit choice not to have long-running integration jobs depend on shorter pep8 or tox jobs, and that's because we value developer time more than CPU time. We would rather run all of the tests and return all of the results so a developer can fix all of the errors as quickly as possible, rather than forcing an iterative workflow where they have to fix all the whitespace issues before the CI system will tell them which actual tests broke. -Jim I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for undercloud deployments vs upgrades testing (and some more). Given that those undercloud jobs have not so high fail rates though, I think Emilien is right in his comments and those would buy us nothing. 
On the other hand, what do you folks think of making the tripleo-ci-centos-7-3nodes-multinode depend on tripleo-ci-centos-7-containers-multinode [2]? The former seems quite failure-prone and long-running, and is non-voting. It deploys 3 nodes in HA fashion (see featureset configs [3]*). And it almost never passes when the containers-multinode fails - see the CI stats page [4]. I've found only 2 cases there of the opposite situation, when containers-multinode fails, but 3nodes-multinode passes. So cutting off those future failures via the added dependency *would* buy us something and allow other jobs to wait less to commence, at the reasonable price of a somewhat extended run time of the main zuul pipeline. I think it makes sense and that extended CI time will not overhead the RDO CI execution times so much as to become a problem. WDYT? I'm not sure it makes sense to add a dependency on other deployment tests. It's going to add additional time to the CI run because the upgrade won't start until well over an hour after the rest of the [...] The things are not so simple. There is also a significant time-to-wait-in-queue delay before jobs start. And it takes probably even longer than the time to execute jobs. And that delay is a function of available HW resources and zuul queue length. And the proposed change affects those parameters as well, assuming jobs with failed dependencies won't run at all. 
I support this idea as well, though I'm sceptical about having that blessed in the end :) I'll add a patch though. Thanks, -Alex [0] https://review.openstack.org/#/c/568275/ [1] https://review.openstack.org/#/c/568278/ [2] https://review.openstack.org/#/c/568326/ [3] https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html [4] http://tripleo.org/cistatus.html * ignore the column 1, it's obsolete, all CI jobs now using configs download AFAICT... -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/14/18 9:15 PM, Sagi Shnaidman wrote: Hi, Bogdan I like the idea with the undercloud job. Actually if undercloud fails, I'd stop all other jobs, because it doesn't make sense to run them. Seeing the same failure in 10 jobs doesn't add too much. So maybe adding the undercloud job as a dependency for all multinode jobs would be a great idea. I like that idea, I'll add another patch in the topic then. I think it's also worth checking how long it will delay jobs. Will all jobs wait until the undercloud job is running? Or will they be aborted when the undercloud job is failing? That is a good question for openstack-infra folks developing zuul :) But, we could just try it and see how it works, happily, zuul v3 allows doing that just in the scope of proposed patches! My expectation is that all jobs get delayed (and I mean the main zuul pipeline execution time here) by the average time of the undercloud deploy job of ~80 min, which hopefully should not be a big deal given that there is a separate RDO CI pipeline running in parallel, which normally *highly likely* extends the overall time anyway :) And given the high chances of additional 'recheck rdo' runs we can observe these days for patches on review. I wish we could introduce inter-pipeline dependencies (zuul CI <-> RDO CI) for those as well... However I'm very sceptical about multinode containers and scenarios jobs, they could fail for very different reasons, like race conditions in the product or infra issues. Skipping some of them will lead to more rechecks from devs trying to discover all problems in a row, which will delay the development process significantly. right, I roughly estimated the delay for the main zuul pipeline execution time for jobs might be ~2.5h, which is not good. We could live with that were it only ~1h, like it takes for the undercloud containers job dependency example. 
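For the record, in Zuul v3 a job whose dependency fails is simply not started (reported as skipped) rather than started and aborted. Sagi's suggestion would translate to a project-pipeline snippet along these lines; this is a sketch only, with job names as used in the thread, and the real layout files may structure templates and variants differently:

```yaml
- project:
    check:
      jobs:
        - tripleo-ci-centos-7-undercloud-containers
        - tripleo-ci-centos-7-containers-multinode:
            dependencies:
              - tripleo-ci-centos-7-undercloud-containers
        - tripleo-ci-centos-7-3nodes-multinode:
            dependencies:
              - tripleo-ci-centos-7-undercloud-containers
```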
Thanks On Mon, May 14, 2018 at 7:15 PM, Bogdan Dobrelya <bdobr...@redhat.com> wrote: An update for your review please folks Bogdan Dobrelya <bdobr...@redhat.com> writes: Hello. As Zuul documentation [0] explains, the names "check", "gate", and "post" may be altered for more advanced pipelines. Is it doable to introduce, for particular openstack projects, multiple check stages/steps as check-1, check-2 and so on? And is it possible to make the consequent steps reuse environments that the previous steps finished with? Narrowing down to tripleo CI scope, the problem I'd want us to solve with this "virtual RFE", and using such multi-staged check pipelines, is reducing (ideally, de-duplicating) some of the common steps for existing CI jobs. What you're describing sounds more like a job graph within a pipeline. See: https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies for how to configure a job to run only after another job has completed. There is also a facility to pass data between such jobs. ... (skipped) ... Creating a job graph to have one job use the results of the previous job can make sense in a lot of cases. It doesn't always save *time* however. It's worth noting that in OpenStack's Zuul, we have made an explicit choice not to have long-running integration jobs depend on shorter pep8 or tox jobs, and that's because we value developer time more than CPU time. We would rather run all of the tests and return all of the results so a developer can fix all of the errors as quickly as possible, rather than forcing an iterative workflow where they have to fix all the whitespace issues before the CI system will tell them which actual tests broke. -Jim I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for undercloud deployments vs upgrades testing (and some more). 
Given that those undercloud jobs do not have such high fail rates though, I think Emilien is right in his comments and those would buy us nothing. On the other hand, what do you folks think of making the tripleo-ci-centos-7-3nodes-multinode depend on tripleo-ci-centos-7-containers-multinode [2]? The former seems quite failure-prone and long-running, and is non-voting. It deploys 3 nodes in HA fashion (see featureset configs [3]*). And it almost never passes when the containers-multinode fails - see the CI stats page [4]. I've found only 2 cases there of the opposite situation, when containers-multinode fails, but 3nodes-multinode passes.
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
An update for your review please folks Bogdan Dobrelya writes: Hello. As Zuul documentation [0] explains, the names "check", "gate", and "post" may be altered for more advanced pipelines. Is it doable to introduce, for particular openstack projects, multiple check stages/steps as check-1, check-2 and so on? And is it possible to make the consequent steps reusing environments from the previous steps finished with? Narrowing down to tripleo CI scope, the problem I'd want we to solve with this "virtual RFE", and using such multi-staged check pipelines, is reducing (ideally, de-duplicating) some of the common steps for existing CI jobs. What you're describing sounds more like a job graph within a pipeline. See: https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies for how to configure a job to run only after another job has completed. There is also a facility to pass data between such jobs. ... (skipped) ... Creating a job graph to have one job use the results of the previous job can make sense in a lot of cases. It doesn't always save *time* however. It's worth noting that in OpenStack's Zuul, we have made an explicit choice not to have long-running integration jobs depend on shorter pep8 or tox jobs, and that's because we value developer time more than CPU time. We would rather run all of the tests and return all of the results so a developer can fix all of the errors as quickly as possible, rather than forcing an iterative workflow where they have to fix all the whitespace issues before the CI system will tell them which actual tests broke. -Jim I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for undercloud deployments vs upgrades testing (and some more). Given that those undercloud jobs have not so high fail rates though, I think Emilien is right in his comments and those would buy us nothing. 
On the other hand, what do you folks think of making the tripleo-ci-centos-7-3nodes-multinode depend on tripleo-ci-centos-7-containers-multinode [2]? The former seems quite failure-prone and long-running, and is non-voting. It deploys 3 nodes in HA fashion (see featureset configs [3]*). And it almost never passes when the containers-multinode fails - see the CI stats page [4]. I've found only 2 cases there of the opposite situation, when containers-multinode fails, but 3nodes-multinode passes. So cutting off those future failures via the added dependency *would* buy us something and allow other jobs to wait less to commence, at the reasonable price of a somewhat extended run time of the main zuul pipeline. I think it makes sense and that extended CI time will not overhead the RDO CI execution times so much as to become a problem. WDYT? [0] https://review.openstack.org/#/c/568275/ [1] https://review.openstack.org/#/c/568278/ [2] https://review.openstack.org/#/c/568326/ [3] https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html [4] http://tripleo.org/cistatus.html * ignore the column 1, it's obsolete, all CI jobs now using configs download AFAICT... -- Best regards, Bogdan Dobrelya, Irc #bogdando
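In Zuul v3 layout terms, the proposed ordering would be a one-line `dependencies` attribute on the project's check jobs. A sketch only -- the actual layout files name templates and job variants differently:

```yaml
- project:
    check:
      jobs:
        - tripleo-ci-centos-7-containers-multinode
        - tripleo-ci-centos-7-3nodes-multinode:
            dependencies:
              - tripleo-ci-centos-7-containers-multinode
```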
Re: [openstack-dev] [tripleo] CI Squads’ Sprint 12 Summary: libvirt-reproducer, python-tempestconf
not gated tripleo-upgrade repo If you have any questions and/or suggestions, please contact us in #oooq or #tripleo Thanks, Matt tq: https://github.com/openstack/tripleo-quickstart tqe: https://github.com/openstack/tripleo-quickstart-extras [1] https://specs.openstack.org/openstack/tripleo-specs/specs/policy/ci-team-structure.html [2] {{tq}}/roles/libvirt/setup/overcloud/tasks/libvirt_nodepool.yml [3] {{tqe}}/roles/create-reproducer-script/templates/reproducer-quickstart.sh.j2#L50 [4] {{tqe}}/roles/snapshot-libvirt [5] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/html-single/openstack_integration_test_suite_guide [6] https://blogs.rdoproject.org/2018/05/running-tempest-tests-against-a-tripleo-undercloud [7] https://blogs.rdoproject.org/2018/05/consuming-kolla-tempest-container-image-for-running-tempest-tests [8] https://github.com/redhat-cip/ansible-role-openstack-certification [9] https://review.rdoproject.org/etherpad/p/ruckrover-sprint12 [10] https://etherpad.openstack.org/p/rover-030518 __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Best regards, Bogdan Dobrelya, Irc #bogdando __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] Fwd: Re: [tripleo][kolla] roadmap on containers workflow
Added the kolla tag in the hope to get some feedback wrt the question Forwarded Message Subject: Re: [openstack-dev] [tripleo] roadmap on containers workflow Date: Mon, 23 Apr 2018 10:08:36 +0200 From: Bogdan Dobrelya <bdobr...@redhat.com> Organization: Red Hat To: openstack-dev@lists.openstack.org On 4/20/18 8:56 PM, Emilien Macchi wrote: So the role has proven to be useful and we're now sure that we need it to deploy a container registry before any container in the deployment, which means we can't use the puppet code anymore for this service. I propose that we move the role to OpenStack: https://review.openstack.org/#/c/563197/ https://review.openstack.org/#/c/563198/ https://review.openstack.org/#/c/563200/ So we can properly peer review and gate the new role. In the meantime, we continue to work on the new workflow. Thanks, On Sun, Apr 15, 2018 at 7:24 PM, Emilien Macchi <emil...@redhat.com <mailto:emil...@redhat.com>> wrote: On Fri, Apr 13, 2018 at 5:58 PM, Emilien Macchi <emil...@redhat.com <mailto:emil...@redhat.com>> wrote: A bit of progress today, I prototyped an Ansible role for that purpose: https://github.com/EmilienM/ansible-role-container-registry <https://github.com/EmilienM/ansible-role-container-registry> The next step is, I'm going to investigate if we can deploy Docker and Docker Distribution (to deploy the registry v2) via the existing composable services in THT by using external_deploy_tasks maybe (or another interface). The idea is really to have the registry ready before actually deploying the undercloud containers, so we can modify them in the middle (run container-check in our case). This patch: https://review.openstack.org/#/c/561377 <https://review.openstack.org/#/c/561377> is deploying Docker and Docker Registry v2 *before* containers deployment in the docker_steps. It's using the external_deploy_tasks interface that runs right after the host_prep_tasks, so still before starting containers. 
It's using the Ansible role that was prototyped on Friday, please take a look and raise any concerns. I have only one question: could we reuse something here that has already been solved in projects like Kolla? Otherwise it's LGTM. Now I would like to investigate how we can run container workflows between the deployment and docker and containers deployments. -- Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] Ironic Inspector in the overcloud
On 4/18/18 12:07 PM, Derek Higgins wrote: Hi All, I've been testing the Ironic Inspector containerised service in the overcloud. The service essentially works, but there are a couple of hurdles to tackle to set it up; the first of these is how to get the IPA kernel and ramdisk where they need to be. These need to be present in the ironic_pxe_http container to be served out over http; what's the best way to get them there? On the undercloud this is done by copying the files across the filesystem[1][2] to /httpboot when we run "openstack overcloud image upload", but on the overcloud an alternative is required. Could the files be pulled into the container during setup? I'd prefer to keep bind-mounting the IPA kernel and ramdisk into a container via the /var/lib/ironic/httpboot host path. So the question then becomes how to deliver them by that path for overcloud nodes? thanks, Derek 1 - https://github.com/openstack/python-tripleoclient/blob/3cf44eb/tripleoclient/v1/overcloud_image.py#L421-L433 2 - https://github.com/openstack/python-tripleoclient/blob/3cf44eb/tripleoclient/v1/overcloud_image.py#L181 -- Best regards, Bogdan Dobrelya, Irc #bogdando
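The bind-mount approach Bogdan prefers could look roughly like the fragment below in the service's docker_config section. This is an illustrative sketch only; the image parameter name and step number follow tripleo-heat-templates conventions but are assumptions, not the actual template.

```yaml
docker_config:
  step_4:
    ironic_pxe_http:
      image: {get_param: DockerIronicPxeImage}
      net: host
      restart: always
      volumes:
        # IPA kernel and ramdisk are delivered to this host path
        # (e.g. by a deploy task), then served over http from the
        # container via a read-only bind mount
        - /var/lib/ironic/httpboot:/var/lib/ironic/httpboot:ro
```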
Re: [openstack-dev] [tripleo] The Weekly Owl - 17th Edition
tripleo-security-squad ++ | Owl fact | ++ Did you know owls were watching you while working on TripleO? Check this out: https://www.reddit.com/r/pics/comments/8cz8v0/owls_born_outside_of_office_window_wont_stop/ (Thanks Wes for the link) Thanks all for reading and stay tuned! -- Your fellow reporter, Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] roadmap on containers workflow
On 4/12/18 12:38 AM, Steve Baker wrote: On 11/04/18 12:50, Emilien Macchi wrote: Greetings, Steve Baker and I had a quick chat today about the work that is being done around the containers workflow in the Rocky cycle. If you're not familiar with the topic, I suggest first reading the blueprint to understand the context here: https://blueprints.launchpad.net/tripleo/+spec/container-prepare-workflow One of the great outcomes of this blueprint is that in Rocky, the operator won't have to run all the "openstack overcloud container" commands to prepare the container registry and upload the containers. Instead, it'll be driven mostly by Heat and Mistral. But today our discussion extended to two use cases that we're going to explore and find out how we can address them: 1) I'm a developer and want to deploy a containerized undercloud with customized containers (more or less related to the all-in-one discussions on another thread [1]). 2) I'm submitting a patch in tripleo-common (let's say a workflow) and need my patch to be tested when the undercloud is containerized (see [2] for an excellent example). I'm fairly sure the only use cases for this will be developer or CI based. I think we need to strongly encourage image modifications for production deployments to go through some kind of image building pipeline. See Next Steps below for the implications of this. Both cases would require additional things: - The container registry needs to be deployed *before* actually installing the undercloud. - We need a tool to update containers from this registry *before* deploying them. We already have this tool in place in our CI for the overcloud (see [3] and [4]). Now we need a similar thing for the undercloud. One problem I see is that we use roles and environment files to filter the images to be pulled/modified/uploaded. Now we would need to assemble a list of undercloud *and* overcloud environments, and build some kind of aggregate role data for both.
This would need to happen before the undercloud is even deployed, which is quite a different order from what quickstart does currently. Either that, or we do no image filtering and just process every image regardless of whether it will be used. Next steps: - Agree that we need to deploy the container-registry before the undercloud. - If agreed, we'll create a new Ansible role called ansible-role-container-registry that for now will deploy exactly what we have in TripleO, without extra features. +1 - Drive the playbook runtime from tripleoclient to bootstrap the container registry (which of course could be disabled in undercloud.conf). tripleoclient could switch to using this role instead of puppet-tripleo to install the registry; however, since the only use cases we have are dev/CI driven, I wonder if quickstart/infrared can just invoke the role when required, before tripleoclient is involved. - Create another Ansible role that would re-use the container-check tool; the idea is to provide a role to modify containers when needed, and we could also control it from tripleoclient. The role would be using the ContainerImagePrepare parameter, which Steve is working on right now. Since the use cases are all upstream CI/dev, I do wonder if we should just have a dedicated container-check <https://github.com/imain/container-check> role inside tripleo-quickstart-extras which can continue to use the script [3] or whatever. Keeping the logic in quickstart will remove the temptation to use it instead of a proper image build pipeline for production deployments. +1 to put it in quickstart-extras to "hide" it from the production use cases. Alternatively, it could still be a standalone role which quickstart invokes, just to accommodate development workflows which don't use quickstart. Feedback is welcome, thanks.
[1] All-In-One thread: http://lists.openstack.org/pipermail/openstack-dev/2018-March/128900.html [2] Bug report when undercloud is containerized: https://bugs.launchpad.net/tripleo/+bug/1762422 [3] Tool to update containers if needed: https://github.com/imain/container-check [4] Container-check running in TripleO CI: https://review.openstack.org/#/c/558885/ and https://review.openstack.org/#/c/529399/ -- Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
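For context, the ContainerImagePrepare parameter that Steve mentions could take roughly the following shape; treat this as an approximate sketch (the namespace, tag, and filter values are examples) rather than the exact interface that merged.

```yaml
parameter_defaults:
  ContainerImagePrepare:
    - push_destination: true        # push to the local (undercloud) registry
      set:
        namespace: docker.io/tripleomaster
        name_prefix: centos-binary-
        tag: current-tripleo
      # entries can be filtered, which speaks to the role/environment
      # based image filtering problem discussed above
      excludes:
        - ceph
```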
Re: [openstack-dev] [tripleo] roadmap on containers workflow
On 4/12/18 12:38 AM, Steve Baker wrote: On 11/04/18 12:50, Emilien Macchi wrote: Greetings, Steve Baker and I had a quick chat today about the work that is being done around the containers workflow in the Rocky cycle. If you're not familiar with the topic, I suggest first reading the blueprint to understand the context here: https://blueprints.launchpad.net/tripleo/+spec/container-prepare-workflow One of the great outcomes of this blueprint is that in Rocky, the operator won't have to run all the "openstack overcloud container" commands to prepare the container registry and upload the containers. Instead, it'll be driven mostly by Heat and Mistral. But today our discussion extended to two use cases that we're going to explore and find out how we can address them: 1) I'm a developer and want to deploy a containerized undercloud with customized containers (more or less related to the all-in-one discussions on another thread [1]). 2) I'm submitting a patch in tripleo-common (let's say a workflow) and need my patch to be tested when the undercloud is containerized (see [2] for an excellent example). I'm fairly sure the only use cases for this will be developer or CI based. I think we need to strongly encourage image modifications for production deployments to go through some kind of image building pipeline. See Next Steps below for the implications of this. Both cases would require additional things: - The container registry needs to be deployed *before* actually installing the undercloud. - We need a tool to update containers from this registry *before* deploying them. We already have this tool in place in our CI for the overcloud (see [3] and [4]). Now we need a similar thing for the undercloud. One problem I see is that we use roles and environment files to filter the images to be pulled/modified/uploaded. Now we would need to assemble a list of undercloud *and* overcloud environments, and build some kind of aggregate role data for both.
This would need to happen before the undercloud is even deployed, which is quite a different order from what quickstart does currently. Either that, or we do no image filtering and just process every image regardless of whether it will be used. Next steps: - Agree that we need to deploy the container-registry before the undercloud. - If agreed, we'll create a new Ansible role called ansible-role-container-registry that for now will deploy exactly what we have in TripleO, without extra features. - Drive the playbook runtime from tripleoclient to bootstrap the container registry (which of course could be disabled in undercloud.conf). tripleoclient could switch to using this role instead of puppet-tripleo to install the registry; however, since the only use cases we have are dev/CI driven, I wonder if quickstart/infrared can just invoke the role when required, before tripleoclient is involved. Please let's do that in tripleoclient and only have quickstart and other tools invoke the client commands. We should stay close to what users would do, which is only issuing client commands. - Create another Ansible role that would re-use the container-check tool; the idea is to provide a role to modify containers when needed, and we could also control it from tripleoclient. The role would be using the ContainerImagePrepare parameter, which Steve is working on right now. Since the use cases are all upstream CI/dev, I do wonder if we should just have a dedicated container-check <https://github.com/imain/container-check> role inside tripleo-quickstart-extras which can continue to use the script [3] or whatever. Keeping the logic in quickstart will remove the temptation to use it instead of a proper image build pipeline for production deployments. Alternatively, it could still be a standalone role which quickstart invokes, just to accommodate development workflows which don't use quickstart. Feedback is welcome, thanks.
[1] All-In-One thread: http://lists.openstack.org/pipermail/openstack-dev/2018-March/128900.html [2] Bug report when undercloud is containerized: https://bugs.launchpad.net/tripleo/+bug/1762422 [3] Tool to update containers if needed: https://github.com/imain/container-check [4] Container-check running in TripleO CI: https://review.openstack.org/#/c/558885/ and https://review.openstack.org/#/c/529399/ -- Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] roadmap on containers workflow
On 4/12/18 10:08 AM, Sergii Golovatiuk wrote: Hi, Thank you very much for bringing up this topic. On Wed, Apr 11, 2018 at 2:50 AM, Emilien Macchi <emil...@redhat.com> wrote: Greetings, Steve Baker and I had a quick chat today about the work that is being done around the containers workflow in the Rocky cycle. If you're not familiar with the topic, I suggest first reading the blueprint to understand the context here: https://blueprints.launchpad.net/tripleo/+spec/container-prepare-workflow One of the great outcomes of this blueprint is that in Rocky, the operator won't have to run all the "openstack overcloud container" commands to prepare the container registry and upload the containers. Instead, it'll be driven mostly by Heat and Mistral. I am trying to think as an operator, and it's very similar to 'openstack container', which is Swift. So it might be confusing, I guess. But today our discussion extended to two use cases that we're going to explore and find out how we can address them: 1) I'm a developer and want to deploy a containerized undercloud with customized containers (more or less related to the all-in-one discussions on another thread [1]). 2) I'm submitting a patch in tripleo-common (let's say a workflow) and need my patch to be tested when the undercloud is containerized (see [2] for an excellent example). That's a very nice initiative. Both cases would require additional things: - The container registry needs to be deployed *before* actually installing the undercloud. - We need a tool to update containers from this registry *before* deploying them. We already have this tool in place in our CI for the overcloud (see [3] and [4]). Now we need a similar thing for the undercloud. I would use an external registry in this case. Quay.io might be a good choice for rock-solid simplicity. It might not be good for CI, as it requires very strong connectivity, but it should be sufficient for developers.
Next steps: - Agree that we need to deploy the container-registry before the undercloud. - If agreed, we'll create a new Ansible role called ansible-role-container-registry that for now will deploy exactly what we have in TripleO, without extra features. Deploy our own registry as part of the undercloud deployment, or use an external one. For instance, for production use I would like to have a cluster of 3-5 registries with HAProxy in front, to speed up 1k-node deployments. Note that this implies an HA undercloud as well. Although, given that an HA undercloud is goodness indeed, I would *not* invest time into a reliable container registry deployment architecture for the undercloud, as we'll have it for free once a kubernetes/openshift control plane for openstack becomes adopted. There, the notions of build pipelines, reliable container registries et al. are very strong. - Drive the playbook runtime from tripleoclient to bootstrap the container registry (which of course could be disabled in undercloud.conf). - Create another Ansible role that would re-use the container-check tool; the idea is to provide a role to modify containers when needed, and we could also control it from tripleoclient. The role would be using the ContainerImagePrepare parameter, which Steve is working on right now. Feedback is welcome, thanks.
[1] All-In-One thread: http://lists.openstack.org/pipermail/openstack-dev/2018-March/128900.html [2] Bug report when undercloud is containerized: https://bugs.launchpad.net/tripleo/+bug/1762422 [3] Tool to update containers if needed: https://github.com/imain/container-check [4] Container-check running in TripleO CI: https://review.openstack.org/#/c/558885/ and https://review.openstack.org/#/c/529399/ -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] PTG session about All-In-One installer: recap & roadmap
>> gather ideas, use cases, needs, before we go design a prototype in Rocky. > > I would like to offer help with initial testing once there is something in the repos, so count me in! > > Regards, > Javier > >> Thanks everyone who'll be involved, >> -- >> Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [TripleO][CI][QA][HA][Eris][LCOO] Validating HA on upstream
On 3/8/18 6:44 PM, Raoul Scarazzini wrote: On 08/03/2018 17:03, Adam Spiers wrote: [...] Yes, agreed again, this is a strong case for collaboration between the self-healing and QA SIGs. In Dublin we also discussed the idea of the self-healing and API SIGs collaborating on the related topic of health check APIs. Guys, thanks a ton for your involvement in the topic. I am +1 to any kind of meeting we can have to discuss this (like it was proposed by Please count me in as well. I can't stop dreaming of Jepsen's Nemesis [0] hammering openstack to make it stronger :D Jokes aside, let's do our best to consolidate on frameworks and tools, ditching the NIH syndrome! [0] https://github.com/jepsen-io/jepsen/blob/master/jepsen/src/jepsen/nemesis.clj Adam), so I'll offer my bluejeans channel for whatever kind of meeting we want to organize. About the best practices part Georg was mentioning, I'm 100% in agreement: the testing methodologies are the first thing we need to care about, starting from what we want to achieve. That said, I'll keep studying Yardstick. Hope to hear from you soon, and thanks again! -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] upgrading to a containerized undercloud
On 3/4/18 10:29 PM, Emilien Macchi wrote: The use case that I'm working on right now is the following: as an operator, I would like to upgrade my non-containerized undercloud running on Queens to a containerized undercloud running on Rocky. Also, I would like to maintain the exact same command to upgrade my undercloud, which is: openstack undercloud upgrade (with --use-heat to containerize it). The work has been tracked here: https://trello.com/c/nFbky9Uk/5-upgrade-support-from-instack-undercloud But here's an update and some open discussion before we continue to make progress. ## Workflow This is what I've found the easiest to implement and maintain: 1) Update python-tripleoclient-* and tripleo-heat-templates. Just a note that those need to be installed first, though you have this covered: https://review.openstack.org/#/c/549624/7/tripleoclient/v1/undercloud.py@100 2) Run openstack overcloud container prepare. 3) Run openstack undercloud upgrade --use-heat, which underneath will: stop non-containerized services, upgrade all packages and dependencies, and deploy a containerized undercloud. As we have discussed, and as you noted for https://review.openstack.org/#/c/549609/, we're better off changing the workflow to run the upgrade_tasks so we avoid code duplication. Note: the data isn't touched, so when the upgrade is done, the undercloud is just upgraded to Rocky, and containerized. ## Blockers encountered 1) Passwords were re-generated during the containerization; will be fixed by: https://review.openstack.org/#/c/549600/ 2) The Neutron DB name was different in instack-undercloud; the DB will be renamed by https://review.openstack.org/#/c/549609/ 3) Upgrade logic will live in tripleoclient: https://review.openstack.org/#/c/549624/ (note that it's small) ## Testing I'm using https://review.openstack.org/#/c/549611/ for testing, but I'm also deploying in my local environment. I've been upgrading Pike to Queens successfully when applying my patches.
## Roadmap I would like us to solve the containerized undercloud upgrade case by rocky-m1, and to have by the end of m1 a CI job that actually tests the operator workflow. I'll need some feedback and reviews on the proposal. Thanks in advance, -- Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
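For readers unfamiliar with the upgrade_tasks interface referenced above: each composable service in tripleo-heat-templates can expose a list of Ansible tasks that handle exactly this kind of "stop the non-containerized service" step. A minimal, purely illustrative sketch (the service name and step tag are examples, not the actual patches):

```yaml
outputs:
  role_data:
    value:
      # ... regular role_data keys (service_name, config, etc.) ...
      upgrade_tasks:
        # stop and disable the old non-containerized service so the
        # containerized one deployed later does not conflict with it
        - name: Stop and disable neutron-server before containerization
          tags: step2
          service:
            name: neutron-server
            state: stopped
            enabled: no
```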
[openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
Hello. As the Zuul documentation [0] explains, the names "check", "gate", and "post" may be altered for more advanced pipelines. Is it doable to introduce, for particular openstack projects, multiple check stages/steps, such as check-1, check-2 and so on? And is it possible to have the subsequent steps reuse the environments that the previous steps finished with? Narrowing down to the tripleo CI scope, the problem I'd want us to solve with this "virtual RFE", using such multi-staged check pipelines, is reducing (ideally, de-duplicating) some of the common steps of existing CI jobs. For example, we may want to omit running all of the OVB or multinode (non-upgrade) jobs deploying overclouds when the *undercloud* fails to install. This case makes even more sense when the undercloud is deployed from the same heat templates (aka Containerized Undercloud) and uses the same packages and container images as the overcloud does! Or, maybe, just stop the world when tox failed at step 1 and do nothing more, as it makes very little sense to run anything else (IMO) if the patch can never be gated with a failed tox check anyway... What I propose here is to think, discuss, and come up with an RFE, either for tripleo, or zuul, or both, covering the following scenarios (examples are tripleo/RDO CI specific, though you can think of other use cases, of course!): case A. No deduplication, a simple multi-staged check pipeline: * check-1: syntax only, lint/tox * check-2: undercloud install with heat and containers * check-3: undercloud install with heat and containers, build overcloud images (if not a multinode job type), deploy overcloud... (repeats OVB jobs as is, basically) case B.
Full de-duplication scenario (subsequent steps re-use the previous steps' results, building "on top"): * check-1: syntax only, lint/tox * check-2: undercloud install, reuses probably nothing from step 1 * check-3: build overcloud images, if not a multinode job type, extends stage 2 * check-4: deploy overcloud, extends stages 2/3 * check-5: upgrade undercloud, extends stage 2 * check-6: upgrade overcloud, extends stage 4 (looking into the future...) * check-7: deploy openshift/k8s on openstack and run e2e/conformance et al, extends either stage 4 or 6 I believe even the simplest 'case A' would reduce the zuul queues for tripleo CI dramatically. What do you think, folks? See also the PTG tripleo CI notes [1]. [0] https://docs.openstack.org/infra/zuul/user/concepts.html [1] https://etherpad.openstack.org/p/tripleo-ptg-ci -- Best regards, Bogdan Dobrelya, Irc #bogdando
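Part of 'case A' is already expressible in Zuul v3 without new pipeline names, via job dependencies within the check pipeline: later jobs only start once their dependencies succeed (though they do not reuse the earlier jobs' environments, so 'case B' would still need new Zuul features). A hedged sketch with made-up job names:

```yaml
# .zuul.yaml sketch; the job names are illustrative only
- project:
    check:
      jobs:
        - tripleo-tox-linters
        - tripleo-ci-undercloud-containers:
            dependencies:
              - tripleo-tox-linters
        - tripleo-ci-ovb-overcloud:
            # skipped entirely if the undercloud job fails,
            # saving OVB node time on broken patches
            dependencies:
              - tripleo-ci-undercloud-containers
```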
Re: [openstack-dev] [yaql] [tripleo] Backward incompatible change in YAQL minor version
On 2/19/18 3:18 PM, Sofer Athlan-Guyot wrote: Hi, Emilien Macchi <emil...@redhat.com> writes: Upgrading YAQL from 1.1.0 to 1.1.3 breaks advanced queries with groupBy aggregation. The commit that broke it is https://github.com/openstack/yaql/commit/3fb91784018de335440b01b3b069fe45dc53e025 It broke TripleO: https://bugs.launchpad.net/tripleo/+bug/1750032 But Alex and I figured out (after a strong headache) that we needed to update the query like this: https://review.openstack.org/545498 This is great, but we still have a pending issue: mixed upgrade jobs are failing from Pike on. Those are very experimental jobs [1][2], but the error is present. The problem is that in mixed versions we have the 1.1.3 yaql version (master undercloud) but not the fix in the templates, which are either N-1 or N-3. And if we get the fix into the previous versions, the deployment wouldn't work anymore, as we would have the new syntax but not yaql 1.1.3. With the YAQL fixes for t-h-t backported to Pike, would backporting yaql 1.1.3 to the Pike repos as well be the full fix? Or am I missing something? It's not only CI which is affected; any kind of mixed-version operation would fail as well. [1] P->Q: http://logs.openstack.org/62/545762/3/experimental/tripleo-ci-centos-7-scenario001-multinode-oc-upgrade/afc98a5/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2018-02-19_09_48_54 [2] Fast Forward Upgrade: http://logs.openstack.org/86/525686/55/experimental/tripleo-ci-centos-7-scenario001-multinode-ffu-upgrade/5412555/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz It would be great to avoid this kind of change within minor versions, please please. Happy weekend, PS: I'm adding YAQL to my linkedin profile right now.
-- Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [TripleO][CI][QA] Validating HA on upstream
On 2/16/18 2:59 PM, Raoul Scarazzini wrote: On 16/02/2018 10:24, Bogdan Dobrelya wrote: [...] +1, this looks like a perfect fit. Would it be possible to install that tripleo-ha-utils/tripleo-quickstart-utils with ansible-galaxy, alongside the quickstart, then apply the destructive-testing playbooks with either the quickstart's static inventory [0] (from your admin/control node) or maybe via a dynamic inventory [1] (from the undercloud managing the overcloud under test via config-download and/or external ansible deployment mechanisms)? [0] https://git.openstack.org/cgit/openstack/tripleo-quickstart/tree/roles/tripleo-inventory [1] https://git.openstack.org/cgit/openstack/tripleo-validations/tree/scripts/tripleo-ansible-inventory Hi Bogdan, thanks for your answer. On the inventory side of things, these playbooks work with any kind of inventory; we're using them at the moment with both manual and quickstart-generated environments, and even infrared ones. We're able to run them at the same time the environment gets deployed, or later, as a day-two action. What is not clear to me is the ansible-galaxy part you're mentioning: today we rely on the github.com/redhat-openstack git repo, so we clone it and then launch the playbooks via the ansible-playbook command. How do you see ansible-galaxy fitting into the picture? Git clone just works as well... Though, I was thinking of some minimal integration via *playbooks* (not roles) in quickstart/tripleo-validations and *external* roles. So the in-repo playbooks would be referencing those external destructive-testing roles.
While the roles are installed with galaxy, like: $ ansible-galaxy install git+https://$repo_name,master -p $external_roles_path or probably adding the $repo_name and $release (master or a tag) into some galaxy-requirements.yaml file and installing from it: $ ansible-galaxy install --force -r quickstart-extras/playbooks/external/galaxy-requirements.yaml -p $external_roles_path Then invoked for quickstart-extras/tripleo-validations like: $ ansible-playbook -i inventory quickstart-extras/playbooks/external/destructive-tests.yaml Thanks! -- Best regards, Bogdan Dobrelya, Irc #bogdando
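For illustration, the galaxy-requirements flow described above could look like the sketch below. The file name, repo pinning and roles path are assumptions for the example, not the actual quickstart-extras layout:

```shell
# Hypothetical requirements file pinning the external destructive-testing
# role repo to a release (repo URL and role name are placeholders):
cat > galaxy-requirements.yaml <<'EOF'
- src: https://github.com/redhat-openstack/tripleo-quickstart-utils
  version: master
  name: tripleo-ha-utils
EOF

# Install the external roles into a dedicated path, then run the in-repo
# playbook referencing them. Guarded with '|| true' since this sketch may
# run on a host without ansible or a deployment inventory:
if command -v ansible-galaxy >/dev/null; then
  ansible-galaxy install --force -r galaxy-requirements.yaml -p ./external-roles || true
  ANSIBLE_ROLES_PATH=./external-roles \
    ansible-playbook -i inventory playbooks/external/destructive-tests.yaml || true
fi
```

The point of the requirements file is that bumping the pinned `version` in one place updates every consuming playbook, rather than re-cloning the repo by hand.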
Re: [openstack-dev] [TripleO][CI] Validating HA on upstream
On 2/15/18 8:22 PM, Raoul Scarazzini wrote: TL;DR: we would like to change the way HA is tested upstream to avoid being hit by avoidable bugs that the CI process should discover. Long version: Today HA testing upstream consists only of verifying that a three-controller setup comes up correctly and can spawn an instance. That's something, but it's far from being enough since we continuously see "day two" bugs. We started covering this more than a year ago in internal CI and today also on rdocloud using a project named tripleo-quickstart-utils [1]. Apart from its name, the project is not limited to tripleo-quickstart; it covers three principal roles: 1 - stonith-config: a playbook that can be used to automate the creation of fencing devices in the overcloud; 2 - instance-ha: a playbook that automates the seventeen manual steps needed to configure instance HA in the overcloud, tests them via rally and verifies that instance HA works; 3 - validate-ha: a playbook that runs a series of disruptive actions in the overcloud and verifies it always behaves correctly by deploying a heat-template that involves all the overcloud components; To make this usable upstream, we need to understand where to put this code. Here are some choices: 1 - tripleo-validations: the most logical place to put this, at least looking at the name, would be tripleo-validations. I've talked with some of the folks working on it, and it came out that the meaning of the tripleo-validations project is not doing disruptive tests. Integrating this stuff would be out of scope. 2 - tripleo-quickstart-extras: apart from the fact that this is not something meant just for quickstart (the project supports infrared and "plain" environments as well), even if we initially started there, in the end it came out that nobody was looking at the patches since nobody was able to verify them. The result was a series of reviews stuck forever. So moving back to extras would be a step backward.
3 - Dedicated project (tripleo-ha-utils or just tripleo-utils): like for tripleo-upgrades or tripleo-validations, it would be perfect to have all this grouped and usable as a standalone thing. Any integration is possible inside the playbook for whatever kind of test. Today we're using the bash framework to interact with the cluster, rally to test instance-ha and Ansible itself to simulate full power outage scenarios. There's been a lot of talk about this during the last PTG [2], and unfortunately, I'll not be part of the next one, but I would like to see things moving on this side. Everything I wrote is of course up for discussion, that's precisely the meaning of this mail. Thanks to all who'll give advice, suggestions, and thoughts about all this stuff. [1] https://github.com/redhat-openstack/tripleo-quickstart-utils [2] https://etherpad.openstack.org/p/qa-queens-ptg-destructive-testing +1 this looks like a perfect fit. Would it be possible to install that tripleo-ha-utils/tripleo-quickstart-utils with ansible-galaxy, alongside the quickstart, then apply destructive-testing playbooks with either the quickstart's static inventory [0] (from your admin/control node) or maybe via dynamic inventory [1] (from undercloud managing the overcloud under test via config-download and/or external ansible deployment mechanisms)? [0] https://git.openstack.org/cgit/openstack/tripleo-quickstart/tree/roles/tripleo-inventory [1] https://git.openstack.org/cgit/openstack/tripleo-validations/tree/scripts/tripleo-ansible-inventory -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] [neutron] Current containerized neutron agents introduce a significant regression in the dataplane
concept [0], with healthchecks and shared namespaces and logical coupling of sidecars, that is, the agents and helper daemons running in namespaces. I hope it does. [0] https://kubernetes.io/docs/concepts/workloads/pods/pod/ -Brian -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] Updates on the TripleO on Kubernetes work
On 12/1/17 5:11 PM, Jiří Stránský wrote: On 21.11.2017 12:01, Jiří Stránský wrote: Kubernetes on the overcloud === The work on this front started with 2 patches [0][1] that some of you might have seen, and then evolved into using the config download mechanism to execute these tasks as part of the undercloud tasks [2][3] (Thanks a bunch, Jiri, for your work here). Note that [0] needs to be refactored to use the same mechanism used in [2]. For those interested in trying the work we've done on deploying vanilla Kubernetes, I put together a post showing how to deploy it with OOOQ, and also briefly explaining the new external_deploy_tasks in service templates: https://www.jistr.com/blog/2017-11-21-kubernetes-in-tripleo/ And we've had a first Kubespray deployment success in CI; the job is ready to move from experimental to non-voting check [1]. The job doesn't yet deploy any pods on that Kubernetes cluster, but it's a step ;) Well done. Note that, deployed with the netchecker app [0], it puts some pods on that cluster, and runs free connectivity (DNS) checks as a bonus. Works even better multinode, as it checks an N-to-N connectivity mesh, IIRC.
[0] https://github.com/kubernetes-incubator/kubespray/blob/master/docs/netcheck.md [1] https://review.openstack.org/#/c/524547/ There are quite a few things to improve here: - How to configure/manage the loadbalancer/vips on the overcloud - Kubespray is currently being cloned and we need to build a package for it - More CI is likely needed for this work [0] https://review.openstack.org/494470 [1] https://review.openstack.org/471759 [2] https://review.openstack.org/#/c/511272/ [3] https://review.openstack.org/#/c/514730/ -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] containerized undercloud update
On 11/30/17 10:36 PM, Wesley Hayutin wrote: Greetings, Just wanted to share some progress with the containerized undercloud work. Ian pushed some of the patches along and we now have a successful undercloud install with containers. The initial undercloud install works [1] The idempotency check failed where we reinstall the undercloud [2] Question: Do we expect the reinstallation to work at this point? Should the check be turned off? Yeah, and kudos! Well done! I'm happy to see undercloud containers working better and getting adopted for CI/devs. I've been deploying my dev envs this way, with deployed-servers for overcloud nodes, like external deployments with config-download. Feel free to invite me for some brainstorming as well :) I will try it w/o the idempotency check, I suspect I will run into errors in a full run with an overcloud deployment. I ran into issues weeks ago. I suspect if we do hit something it will be CI related as Dan Prince has been deploying the overcloud for a while now. Dan, I may need to review your latest doit.sh scripts to check for diffs in the CI. Thanks [1] http://logs.openstack.org/18/518118/6/check/tripleo-ci-centos-7-undercloud-oooq/73115d6/logs/undercloud/home/zuul/undercloud_install.log.txt.gz [2] http://logs.openstack.org/18/518118/6/check/tripleo-ci-centos-7-undercloud-oooq/73115d6/logs/undercloud/home/zuul/undercloud_reinstall.log.txt.gz#_2017-11-30_19_51_26 -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] containerized undercloud update
On 11/30/17 10:36 PM, Wesley Hayutin wrote: Greetings, Just wanted to share some progress with the containerized undercloud work. Ian pushed some of the patches along and we now have a successful undercloud install with containers. The initial undercloud install works [1] The idempotency check failed where we reinstall the undercloud [2] Question: Do we expect the reinstallation to work at this point? Should the check be turned off? Yeah, there is a bug for that [0]. Not critical to fix, though nice to have for developers. I'm used to deploying with undercloud containers, and it's a pain to do a full teardown and reinstall for each change being tested. By the way, somewhat related, I have a PoC for undercloud containers all-in-one [1], by quickstart off-road. And a few 'enabler' bug-fixes [2],[3],[4], JFYI and review please. I think an all-in-one uc may be useful for either CI or dev cases. Like for those who want to deploy *things* on top of openstack, yet suffer from healing devstack and are searching for alternatives, like packstack et al. So they may want to switch to suffering from healing tripleo (undercloud containers) instead. [0] https://bugs.launchpad.net/tripleo/+bug/1698349 [1] https://github.com/bogdando/oooq-warp/blob/master/rdocloud-guide.md [2] https://review.openstack.org/#/c/524114/ [3] https://review.openstack.org/#/c/524133/ [4] https://review.openstack.org/#/c/524187 I will try it w/o the idempotency check, I suspect I will run into errors in a full run with an overcloud deployment. I ran into issues weeks ago. I suspect if we do hit something it will be CI related as Dan Prince has been deploying the overcloud for a while now. Dan, I may need to review your latest doit.sh scripts to check for diffs in the CI.
Thanks [1] http://logs.openstack.org/18/518118/6/check/tripleo-ci-centos-7-undercloud-oooq/73115d6/logs/undercloud/home/zuul/undercloud_install.log.txt.gz [2] http://logs.openstack.org/18/518118/6/check/tripleo-ci-centos-7-undercloud-oooq/73115d6/logs/undercloud/home/zuul/undercloud_reinstall.log.txt.gz#_2017-11-30_19_51_26 -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] rename ovb jobs?
On 11/30/17 8:11 PM, Emilien Macchi wrote: A few months ago, we renamed ovb-updates to be tripleo-ci-centos-7-ovb-1ctlr_1comp_1ceph-featureset024. The name is much longer but it better describes what the job is doing. We know it's a job with one controller, one compute and one storage node, deploying the quickstart featureset n°24. For consistency, I propose that we rename all OVB jobs this way. For example, tripleo-ci-centos-7-ovb-ha-oooq would become tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001 etc. Any thoughts / feedback before we proceed? Before someone asks, I'm not in favor of renaming the multinode scenarios now, because they have become quite familiar, and it would confuse people to rename the jobs. Thanks, I'd like to see featuresets clarified in names as well, just to convey the main message w/o going into the test matrix details, like tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-ovn/ceph/k8s/tempest or whatever it is. -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [TripleO] IPSEC integration
On 11/16/17 8:01 AM, Juan Antonio Osorio wrote: Hello folks! A few months ago Dan Sneddon and I worked on an ansible role that would enable IPSEC for the overcloud [1]. Currently, one would run it as an extra step after the overcloud deployment. But, I would like to start integrating it into TripleO itself, making it another option, probably as a composable service. For this, I'm planning to move the tripleo-ipsec ansible role repository under the TripleO umbrella. Would that be fine with everyone? Or should I add this ansible role as part of another repository? After that's available and packaged in RDO, I'll then look into the actual TripleO composable service. Any input and contributions are welcome! [1] https://github.com/JAORMX/tripleo-ipsec -- Juan Antonio Osorio R. e-mail: jaosor...@gmail.com This looks very similar to the Kubespray [0] integration case. I hope that external deployment bits can be added without a hard requirement of being under the umbrella and packaged in RDO. I've tried to follow the guide [1] for adding RDO packages and the package review [2] and didn't succeed. There should be a simpler solution to host a package somewhere outside of RDO, and be able to add it for an external deployment managed by tripleo. My 2c. [0] https://github.com/kubernetes-incubator/kubespray [1] https://www.rdoproject.org/documentation/add-packages/ [2] https://bugzilla.redhat.com/show_bug.cgi?id=1482524 -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] Migrating TripleO CI in-tree tomorrow - please README
On 11/16/17 7:20 PM, Emilien Macchi wrote: TL;DR: don't approve or recheck any tripleo patch from now, until further notice on this thread. Some good progress has been made on migrating legacy tripleo CI jobs to be in-tree: https://review.openstack.org/#/q/topic:tripleo/migrate-to-zuulv3 The next steps: - Let the current gate finish running its jobs. - Stop approving patches from now, and wait for the gate to be done and cleared - Alex and I will approve the migration patches tomorrow and we hope to have them in the gate by Friday afternoon (US time) when the gate isn't busy anymore. We'll also have to backport them all. - When these patches are merged (it might take the weekend to land, depending on how busy the gate is), we'll run duplicated jobs until https://review.openstack.org/514778 is merged. I'll try to ping someone from Infra over the week-end if we can land it, that would be great. - Once https://review.openstack.org/514778 is merged, people are free to recheck or approve any patches. We hope it should happen over the weekend. - I'll continue to migrate all other tripleo projects to have the in-tree layout. On the list: t-p-e, t-i-e, paunch, os-*-config, tripleo-validations. Thanks for your help, Thank you for working on this Emilien! That's well done, and provides a good example for future use in other projects as well. -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] Upstream LTS Releases
Thank you Mathieu for the insights! To add details to what happened: * Upgrade was never made a #1 priority. It was a one man show for far too long. (myself) I suppose that confirms that upgrades are very nice to have in production deployments, eventually, maybe... (please read below to continue) * I also happen to manage and work on other priorities. * A lot of work was made to prepare for multiple versions support in our deployment tools. (we use Puppet) * A lot of work in the packaging area to speed up packaging. (we are still using deb packages but with virtualenv to stay Puppet compatible) * We need to forward-port private patches which upstream won't accept and/or are private business logic. ... yet long-term maintenance and landing fixes are the ops' *reality* and pain #1. And upgrades are only pain #2. LTS cannot directly help with #2, but only indirectly, if the vendors' downstream teams could better cooperate on #1 and have more time and resources to dedicate to #2, the upgrade stories for shipped products and distros. Let's please not lower the real value of LTS branches and not substitute #1 with #2. This topic is not about bureaucracy and policies, it is about how the community could help vendors cooperate on maintaining commodity things, with as little bureaucracy as possible, to ease the operators' pains in the end. * Our developer teams didn't have enough free cycles to work right away on the upgrade. (this means delays) * We need to test compatibility with 3rd party systems, which takes some time. (and make them compatible) This perhaps confirms why it is vital to run only 3rd party CI jobs for LTS branches? * We need to update systems over which we don't have full control. This means serious delays when it comes to deployment. * We need to test features/stability for some time in our dev environment. * We need to test features/stability for some time in our staging/pre-prod environment.
* We need to announce and inform our users at least 2 weeks in advance before performing an upgrade. * We choose to upgrade one service at a time (in all regions) to avoid a huge big-bang upgrade. (this means more maintenance windows to plan, and you can't stack them too much) * We need to swiftly respond to bugs discovered by our users. This means a change of priorities and delays in other service upgrades. * We will soon need to upgrade operating systems to support the latest OpenStack versions. (this means we have to stop OpenStack upgrades until all nodes are upgraded) It seems that the answer to the question "Why are upgrades so painful and take so much time for ops?" is "because upgrades are not the priority. Long Term Support and maintenance are". -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] Upstream LTS Releases
The concept, in general, is to create a new set of cores from these groups, and use 3rd party CI to validate patches. There are lots of details to be worked out yet, but our amazing UC (User Committee) will begin working out the details. What is most worrying is the exact "take over" process. Does it mean that the teams will give away the +2 power to a different team? Or will our (small) stable teams still be responsible for landing changes? If so, will they have to learn how to debug 3rd party CI jobs? Generally, I'm scared of both overloading the teams and losing control over quality at the same time :) Probably the final proposal will clarify it.. The quality of backported fixes is expected to be a direct (and only?) interest of those new teams of new cores, coming from users and operators and vendors. The more parties that establish their 3rd party checking jobs, the better the proposed changes get communicated, which directly affects the quality in the end. I also suppose contributors from the ops world will likely only be striving to see things getting fixed, not new features adopted by the legacy deployments they're used to maintaining. So in theory this works, and as a mainstream developer and maintainer, you need not fear losing control over LTS code :) Another question is how to avoid blocking everyone on each other, and not push contributors away when things go awry, jobs are failing and merging is blocked for a long time, or there is no consensus reached in a code review. I propose the LTS policy to enforce CI jobs be non-voting, as a first step on that way, and giving every LTS team member core rights, maybe? Not sure if that works though. -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] When to use parameters vs parameter_defaults
On 11/10/17 10:20 AM, Giulio Fidente wrote: On 11/26/2015 03:17 PM, Jiří Stránský wrote: On 26.11.2015 14:12, Jiří Stránský wrote: [...] It seems TripleO is hitting similar composability and sanity limits with the top-down approach, and the number of parameters which can only be fed via parameter_defaults is increasing. (The disadvantage of parameter_defaults is that, unlike hiera, we currently have no clear namespacing rules, which means a higher chance of conflict. Perhaps the unit tests suggested in another subthread would be a good start, maybe we could even think about how to do proper namespacing.) Does what I described seem somewhat accurate? Should we maybe buy into the concept of "composable templates, externally fed hierarchy-transcending parameters" for the long term? I now realized I might have used too generic or Puppetish terms in the explanation, perhaps drowning the gist of the message a bit :) What I'm suggesting is: let's consider going with parameter_defaults wherever we can, for the sake of composability, and figure out what is the best way to prevent parameter name collisions. +1 I like very much the idea of parameter_defaults + stricter namespacing rules. Specifically regarding namespaces, puppet was great but ansible doesn't seem to be as good (at least to me); in fact I think we have chances for conflicts in both THT and the ansible playbooks. Tripleo docs should have this explained, like in the ansible docs [1] [1] http://docs.ansible.com/ansible/latest/playbooks_variables.html#variable-precedence-where-should-i-put-a-variable -- Best regards, Bogdan Dobrelya, Irc #bogdando
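To make the collision concern concrete with a small, made-up example: nothing namespaces the keys under parameter_defaults, so two services exposing an identically named parameter would silently override each other in a merged environment, while service-prefixed names cannot clash:

```shell
# Hypothetical Heat environment file; the parameter names are only
# illustrations of the prefixing convention, not a recommendation of values.
cat > workers-env.yaml <<'EOF'
parameter_defaults:
  # A bare key like 'Workers' could be claimed by two services at once;
  # prefixing with the service name keeps the flat namespace conflict-free.
  NeutronWorkers: 4
  NovaWorkers: 8
EOF
```

A unit test like the one suggested in the subthread could then be as simple as asserting that no two services register the same parameter_defaults key.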
Re: [openstack-dev] [TripleO] Next steps for pre-deployment workflows (e.g derive parameters)
On 11/8/17 6:09 AM, Steven Hardy wrote: Hi all, Today I had a productive hallway discussion with jtomasek and stevebaker re $subject, so I wanted to elaborate here for the benefit of those folks not present. Hopefully we can get feedback on the ideas and see if it makes sense to continue and work on some patches: The problem under discussion is how do we run pre-deployment workflows (such as those integrated recently to calculate derived parameters, and in future perhaps also those which download container images etc), and in particular how do we make these discoverable via the UI (including any input parameters). The idea we came up with has two parts: 1. Add a new optional section to roles_data for services that require pre-deploy workflows E.g something like this: pre_deploy_workflows: - derive_params: workflow_name: tripleo.derive_params_formulas.v1.dpdk_derive_params inputs: ... This would allow us to associate a specific mistral workflow with a given service template, and also work around the fact that currently mistral inputs don't have any schema (only key/value input) as we could encode the required type and any constraints in the inputs block (clearly this could be removed in future should typed parameters become available in mistral). 2. Add a new workflow that calculates the enabled services and returns all pre_deploy_workflows This would take all enabled environments, then use heat to validate the configuration and return the merged resource registry (which will require https://review.openstack.org/#/c/509760/), then we would iterate over all enabled services in the registry and extract a given roles_data key (e.g pre_deploy_workflows) The result of the workflow would be a list of all pre_deploy_workflows for all enabled services, which the UI could then use to run the workflows as part of the pre-deploy process. If this makes sense I can go ahead and push some patches so we can iterate on the implementation? 
I apologise for a generic/non-techy comment: it would be nice to keep required workflows near the services' definition templates, to keep things as self-contained as possible. IIUC, that's covered by #1. For future steps, I'd like to see all of the "bulk processing" sit in those templates as well. Thanks, Steve -- Best regards, Bogdan Dobrelya, Irc #bogdando
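For illustration, the roles_data addition proposed in #1 might look like the fragment below. The workflow name is taken from the mail; the inputs block and its type/constraint encoding are assumptions, since that schema was still under discussion:

```shell
# Illustrative per-service fragment for the proposed 'pre_deploy_workflows'
# key (written to a file here just to keep the sketch self-contained):
cat > pre-deploy-workflows-snippet.yaml <<'EOF'
pre_deploy_workflows:
  - derive_params:
      workflow_name: tripleo.derive_params_formulas.v1.dpdk_derive_params
      inputs:
        # Type and constraints are encoded per input until Mistral grows
        # typed parameters; the input name below is hypothetical.
        num_phy_cores_per_numa_node_for_pmd:
          type: number
          constraints:
            - range: {min: 1}
EOF
```

The workflow from #2 would then merge the resource registry, collect every such `pre_deploy_workflows` entry for the enabled services, and hand the list to the UI.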
Re: [openstack-dev] [tripleo] undercloud containers with SELinux Enforcing
So the rule of thumb I propose is "if a container bind-mounts /run (/var/run), make it privileged to not mess with SELinux enforcing". I've yet to find better alternatives to allow containers to access the host sockets. Additionally, the patch allows developers of t-h-t docker/services not to guess and repeat :z flags for generic /var/lib/, /etc/puppet/, /usr/share/openstack-puppet/modules and /var/log/containers/ paths for services, as the wanted context for those will be configured in the deploy steps tasks [0] and the docker-puppet.py tool [1]. That follows DRY best. I hope that works. [0] https://review.openstack.org/#/c/513669/11/common/deploy-steps.j2 [1] https://review.openstack.org/#/c/513669/12/docker/docker-puppet.py@277 On 11/6/17 2:49 PM, Bogdan Dobrelya wrote: Hi. I've made some progress with the containerized undercloud deployment guide and SELinux enforcing (the bug [0] and the topic [1]). Although I'm now completely stuck [2] with fixing t-h-t's docker/services to nail the selinux thing fully, including the containerized *overclouds* part. The main issue is to make some of the bind-mounted host-path volumes, like /run:/run and /dev:/dev, selinux friendly. Any help is appreciated! Hello folks. I need your feedback please on SELinux fixes [0] (or rather workarounds) for the containerized undercloud feature, which is experimental in Pike. [TL;DR] The problem I'm trying to solve is primarily allowing TripleO users to follow the guide [1] w/o telling them "please disable SELinux". Especially, given the note "The undercloud is intended to work correctly with SELinux enforcing, and cannot be installed to a system with SELinux disabled". I understand that putting "chcon -Rt svirt_sandbox_file_t -l s0" (see [2]) on all of the host paths bind-mounted into containers is not secure, and from the SELinux perspective allows everything to all containers. That could be a first step for docker volumes working w/o shutting down SELinux on *hosts* though.
I plan to use the same approach for the t-h-t docker/services host-prep tasks as well. Why not use docker's :z :Z directly? IIUC, they can't be combined with other mount flags, e.g. :ro:z won't work. I look forward to better solutions and ideas! [0] https://review.openstack.org/#/q/topic:bug/1682179 [1] https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/undercloud.html [2] https://www.projectatomic.io/blog/2015/06/using-volumes-with-docker-can-cause-problems-with-selinux/ [0] https://bugs.launchpad.net/tripleo/+bug/1682179 [1] https://review.openstack.org/#/q/topic:bug/1682179 [2] https://review.openstack.org/#/c/517383/ -- Best regards, Bogdan Dobrelya, Irc #bogdando
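As a reference sketch of the two labeling approaches discussed (the path is an example; the commands need root on an SELinux-enabled host with Docker, so they are guarded to be harmless elsewhere). One nuance worth noting: Docker's volume options are comma-separated, so combining read-only with a shared relabel is written :ro,z, whereas the colon-chained :ro:z form from the mail is indeed rejected:

```shell
# Approach taken by the patch: pre-label a host path for container access.
# Permissive: any container may then use the path (svirt_sandbox_file_t, s0).
if command -v chcon >/dev/null && [ -d /var/log/containers ]; then
  chcon -Rt svirt_sandbox_file_t -l s0 /var/log/containers || true
fi

# Docker-side relabeling on mount: options combine with commas, so a
# read-only shared-label bind mount is ':ro,z', not ':ro:z':
if command -v docker >/dev/null; then
  docker run --rm -v /var/log/containers:/var/log/containers:ro,z \
      centos:7 ls /var/log/containers || true
fi
```

The trade-off is the one the mail describes: pre-labeling with chcon is a one-time, coarse-grained grant to all containers, while per-mount :z/:Z relabels lazily but has to be repeated on every volume definition.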
Re: [openstack-dev] [tripleo] undercloud containers with SELinux Enforcing
Hi. I've made some progress with the containerized undercloud deployment guide and SELinux enforcing (the bug [0] and the topic [1]). Although I'm now completely stuck [2] with fixing t-h-t's docker/services to nail the selinux thing fully, including the containerized *overclouds* part. The main issue is to make some of the bind-mounted host-path volumes, like /run:/run and /dev:/dev, selinux friendly. Any help is appreciated! Hello folks. I need your feedback please on SELinux fixes [0] (or rather workarounds) for the containerized undercloud feature, which is experimental in Pike. [TL;DR] The problem I'm trying to solve is primarily allowing TripleO users to follow the guide [1] w/o telling them "please disable SELinux". Especially, given the note "The undercloud is intended to work correctly with SELinux enforcing, and cannot be installed to a system with SELinux disabled". I understand that putting "chcon -Rt svirt_sandbox_file_t -l s0" (see [2]) on all of the host paths bind-mounted into containers is not secure, and from the SELinux perspective allows everything to all containers. That could be a first step for docker volumes working w/o shutting down SELinux on *hosts* though. I plan to use the same approach for the t-h-t docker/services host-prep tasks as well. Why not use docker's :z :Z directly? IIUC, they can't be combined with other mount flags, e.g. :ro:z won't work. I look forward to better solutions and ideas!
[0] https://review.openstack.org/#/q/topic:bug/1682179 [1] https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/undercloud.html [2] https://www.projectatomic.io/blog/2015/06/using-volumes-with-docker-can-cause-problems-with-selinux/ [0] https://bugs.launchpad.net/tripleo/+bug/1682179 [1] https://review.openstack.org/#/q/topic:bug/1682179 [2] https://review.openstack.org/#/c/517383/ -- Best regards, Bogdan Dobrelya, Irc #bogdando __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] containerized undercloud in Queens
On 11/6/17 1:01 AM, Emilien Macchi wrote: On Mon, Oct 2, 2017 at 5:02 AM, Dan Prince <dpri...@redhat.com> wrote: [...]

- CI resources: better use of CI resources. At the PTG we received feedback from the OpenStack infrastructure team that our upstream CI resource usage is quite high at times (even as high as 50% of the total). Because of the shared framework and single-node capabilities we can re-architect much of our upstream CI matrix around single-node jobs. We no longer require multinode jobs to be able to test many of the services in tripleo-heat-templates... we can just use a single cloud VM instead. We'll still want multinode undercloud -> overcloud jobs for testing things like HA and baremetal provisioning. But we can cover a large set of the services (in particular many of the new scenario jobs we added in Pike) with single-node CI test runs in much less time.

After the last (terrible) weeks in CI, it's pretty clear we need to find a solution to reduce and optimize our testing. I'm now really convinced we should switch our current scenario jobs to NOT deploy the overcloud, and deploy just an undercloud with composable services & run tempest.

+1 And we should start using the quickstart-extras undercloud-deploy role for that.

Benefits:
- deploy 1 node instead of 2, so we save nodepool resources
- faster (no overcloud)
- reduced gate queue time, faster development process, faster CI

Challenges:
- keep overcloud testing, with OVB
- reduce OVB to the strict minimum: Ironic, Nova, Mistral and basic containerized services on the overcloud

I really want to get consensus on these points, please raise your voice now before we engage some work on that front. [...]

-- Best regards, Bogdan Dobrelya, Irc #bogdando
[openstack-dev] [tripleo][containers] Please always do mirrored hiera changes for (non)containerized cases
Hi folks. When changing hiera defaults in the instack repo or elsewhere, please do "mirrored" change requests to make sure things stay consistent for containerized overclouds, undercloud containers, and those "golden images" that instack produces in the end, IIUC. Please take a look at a perfect example with all mirrors in place [0], and a few more examples of (probably) missing mirrored changes [1], [2].

PS. I wonder how we could automate the check for "missing mirrors" between hiera data living in instack and other places?

[0] https://review.openstack.org/#/q/topic:bug/1729293
[1] https://review.openstack.org/#/c/515123/
[2] https://review.openstack.org/#/c/516990/

-- Best regards, Bogdan Dobrelya, Irc #bogdando
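The PS above asks how a "missing mirrors" check could be automated. A minimal sketch of the idea, assuming flat "key: value" hiera documents (the zaqar keys below are made-up examples; a real check would use a proper YAML parser and walk the actual instack and t-h-t hiera locations):

```python
# Naive "missing mirrors" check: compare top-level keys of two flat
# hiera YAML documents, stdlib only. A real implementation would use a
# YAML parser and the real file paths on both sides.

def hiera_keys(text):
    """Collect top-level 'key: value' keys from a flat hiera document."""
    keys = set()
    for line in text.splitlines():
        if not line or line[0] in ('#', ' ', '-'):
            continue
        if ': ' in line:
            # split on ': ' (not ':') so 'foo::bar: 1' yields 'foo::bar'
            keys.add(line.split(': ', 1)[0])
    return keys

def missing_mirrors(instack_hiera, other_hiera):
    """Keys set on the instack side but absent from the mirrored side."""
    return sorted(hiera_keys(instack_hiera) - hiera_keys(other_hiera))

# Hypothetical example data, not real instack/t-h-t contents:
instack = "zaqar::max_messages_post_size: 1048576\nzaqar::notification_enabled: true\n"
mirrored = "zaqar::max_messages_post_size: 1048576\n"
print(missing_mirrors(instack, mirrored))
```

A CI job running something like this over both repos could flag unmirrored hiera changes before they merge.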
Re: [openstack-dev] Logging format: let's discuss a bit about default format, format configuration and so on
Hi. A few comments inline.

Dear Stackers,

While working on my locally deployed OpenStack (Pike using TripleO), I was struggling a bit with the logging part. Currently, all logs are pushed to per-service files, in the standard "one line per entry" format, as plain flat text. That's nice, but if one wants to push and index those logs in an ELK stack, the current default format isn't really good.

After some discussions about oslo.log, it appears it provides a nice JSONFormatter handler¹ one might want to use for each (Python) service using the oslo central library. A JSON format is really cool, as it's easy to parse for machines, and an entry can span multiple lines without any issue - this is especially important for stack traces, whose output is multi-line with no real common delimiter that would let us re-format it and feed it to any log parser (logstash, fluentd, …).

After some more talks, oslo.log will not provide a unified interface to output all received logs as JSON - this makes sense, as that would mean "rewriting almost all of the Python logging management interface"², and that's pretty useless, since (all?) services have their own "logging.conf" file.

That said… to the main purpose of this mail:

- Default format for logs

A first question would be "are we all OK with the default output format"? I'm pretty sure "humans" are happy with it, as it's really convenient to read and grep. But on a "standard" OpenStack deploy, I'm pretty sure one does not have only one controller, one ceph node and one compute. Hence comes log centralization, and with that, log indexing and processing. For that, one might argue "I'm using plain files on my logger, and grep -r in them". That's one way to do things, and for that, plain flat logs are great. But… I'm pretty sure I'm not the only one wanting to use some kind of ELK cluster for that kind of purpose. So the right question is: what about switching the default log format to JSON?
On my part, I don't see "cons", only "pros", but my judgment is of course biased, as I'm "alone in my corner". But what about you, Community?

- Provide a way to configure the output format/handler

While poking around in the puppet module code, I didn't find any way to set the output handler for the logs. For example, in puppet-nova³ we can set a lot of things, but not the handler for the output. It would be really cool if each puppet module gained the capability to set the handler, so that one can just push some stuff in hiera and voilà, we have JSON logs. Doing so would allow people to choose between the default (current) output and something more "computable".

This is being implemented (for Queens) in the scope of this blueprint [0]. By default, containers will keep the current logging behaviour, which is writing logs into bind-mounted host-path directories and, for those services that can only syslog, sending the logs to the host journald via syslog (bind-mounted /dev/log). And if the stdout/stderr feature is enabled, side-car containers will make sure everything is captured in journald and tagged properly, with multiline entries handled as well. And journald can dump messages as JSON, too. I hope that works! Please send your comments to the aforementioned bp's spec.

[0] https://blueprints.launchpad.net/tripleo/+spec/logging-stdout-rsyslog

Of course, either proposal will require a nice code change in all puppet modules (add a new parameter for the foo::logging class, use that new param in the configuration file, and so on), but at least people will actually be able to choose. So, before opening an issue on each Launchpad project (that would be… long), I'd rather open the discussion here and, eventually, come to some nice, acceptable and accepted solution that would make the OpenStack Community happy :).

Any thoughts? Thank you for your attention, feedback and wonderful support for that monster project :).

Cheers, C.
¹ https://github.com/openstack/oslo.log/blob/master/oslo_log/formatters.py#L166-L235
² http://eavesdrop.openstack.org/irclogs/%23openstack-oslo/%23openstack-oslo.2017-11-01.log.html#t2017-11-01T13:23:14
³ https://github.com/openstack/puppet-nova/blob/master/manifests/logging.pp

-- Cédric Jeanneret Senior Linux System Administrator Infrastructure Solutions Camptocamp SA PSE-A / EPFL, 1015 Lausanne Phone: +41 21 619 10 32 Office: +41 21 619 10 02 Email: cedric.jeanne...@camptocamp.com

-- Best regards, Bogdan Dobrelya, Irc #bogdando
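For illustration, the one-JSON-object-per-record idea that keeps multi-line tracebacks machine-parseable can be sketched with the stdlib alone. This is not oslo.log's actual JSONFormatter (see ¹ above for that), just a minimal stand-in showing the principle:

```python
import io
import json
import logging
import traceback

class JsonFormatter(logging.Formatter):
    """Minimal stand-in for the oslo.log JSONFormatter idea: emit each
    record as a single JSON object, so a multi-line traceback needs no
    special delimiter for logstash/fluentd to pick it up whole."""

    def format(self, record):
        entry = {
            'name': record.name,
            'levelname': record.levelname,
            'message': record.getMessage(),
        }
        if record.exc_info:
            # the whole traceback travels inside one JSON entry
            entry['traceback'] = traceback.format_exception(*record.exc_info)
        return json.dumps(entry)

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
log = logging.getLogger('demo')
log.addHandler(handler)
log.propagate = False
log.setLevel(logging.INFO)

try:
    raise ValueError('boom')
except ValueError:
    log.exception('query failed')

# each line of output is one machine-parseable JSON document
entry = json.loads(stream.getvalue())
print(entry['levelname'], entry['message'])
```

With oslo.log itself, the equivalent effect is selecting the JSONFormatter in each service's logging.conf, which is exactly the per-module knob the thread discusses.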
Re: [openstack-dev] [TripleO][containers]ironic-inspector
On 11/1/17 7:04 PM, milanisko k wrote: Folks, I've got a dilemma right now about how to proceed with containerising ironic-inspector:

Fat Container: put ironic-inspector and dnsmasq into a single container, i.e. consider a container as a complete inspection service shipping unit; use supervisord inside to fork and monitor both services.

Pros:
* decoupling: inspector's dnsmasq isn't used by any other service, which makes our life simpler as we won't need to reset the dnsmasq configuration in case inspector dies (to avoid exposing an unfiltered DHCP service)
* we can use the dnsmasq filter driver (an on-line configuration file updating facility) to limit access to dnsmasq instead of iptables, in a self-contained "package" that is configured to work together as a single unit
* we don't have to worry about always scheduling the dnsmasq and inspector containers on a single node (both services are bundled)
* we get a *Spine-Leaf deployment capable & containerised undercloud*
* an *HA capable inspector* service to be reused in the overcloud
* an integrated solution, tested to work by upstream CI in inspector (compatibility, versioning, configuration, ...)

Cons:
* inflexibility: the container has to be rebuilt to be used with a different DHCP service (filter driver)
* supervisord dependency and the need to refactor the current inspector container

Flat Container: put inspector and dnsmasq into separate containers and use the (current) iptables driver to protect dnsmasq. IIRC this is the current approach.

Pros:
* containerised undercloud

Cons:
* no decoupling of dnsmasq
* no spine-leaf (iptables)
* containers have to be scheduled together on a single node
* no HA (iptables driver)
* the container won't be cool for the overcloud as it won't be HA

Flat container with dnsmasq filter driver: same as above, but iptables isn't used anymore.
Since it's not the current approach, we'd have to reshape the dnsmasq and inspector containers to expose each other's configuration so that inspector can write the dnsmasq configuration on the fly (does inotify work in the mounted-directories case???).

Pros:
* containerised undercloud
* Spine-Leaf

Cons:
* no (easy) HA (dnsmasq would be exposed in case inspector died)

Could it be managed by pacemaker bundles then?

* no decoupling of dnsmasq (shared between multiple services)

A dedicated side-car container can be used, just like the logging bp [0] implements it. So nothing would be shared then.

[0] https://blueprints.launchpad.net/tripleo/+spec/logging-stdout-rsyslog

* containers to be reshaped to expose the configuration

Seems like this is inevitable anyway.

* overcloud-uncool container (lack of HA)

Could it be managed by pacemaker bundles then?

No Container: we ship inspector as a service and configure dnsmasq to be shut down in case inspector dies (to prevent exposing an unfiltered DHCP service). We use the dnsmasq (configuration) filter driver to have a Spine-Leaf deployment capable undercloud.

Pros:
* Spine-Leaf

Cons:
* no HA inspector (shared dnsmasq?)
* no containers
* no reusable container for the overcloud
* we'd have to update the undercloud systemd to shut down dnsmasq in case inspector dies, if we want to use the dnsmasq filter driver
* no decoupling

The Question: what is your take on it?

Cheers, milan

-- Best regards, Bogdan Dobrelya, Irc #bogdando
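For the "Fat Container" option above, the supervisord piece could look roughly like this. A sketch only: the paths and config-file names are hypothetical, and the "stop dnsmasq when inspector dies" behaviour would still need a supervisord event listener, which is not shown here:

```
[supervisord]
nodaemon=true

[program:ironic-inspector]
command=/usr/bin/ironic-inspector --config-file /etc/ironic-inspector/inspector.conf
priority=10
autorestart=true

[program:dnsmasq]
; dnsmasq must stay in the foreground for supervisord to monitor it
command=/usr/sbin/dnsmasq --keep-in-foreground --conf-file=/etc/ironic-inspector/dnsmasq.conf
priority=20
autorestart=true
```

Whether such an integrated unit is worth the rebuild-inflexibility trade-off is exactly the dilemma laid out above.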
Re: [openstack-dev] [tripleo] Blocking gate - do not recheck / rebase / approve any patch now (please)
Thank you for working on this! I know it is needed to unblock development of TripleO. I have, though, a few comments inline.

On 10/26/17 6:14 AM, Emilien Macchi wrote: On Wed, Oct 25, 2017 at 1:59 PM, Emilien Macchi <emil...@redhat.com> wrote: Quick update before being afk for some hours:

- Still trying to land https://review.openstack.org/#/c/513701 (thanks Paul for promoting it in gate). Landed.
- Disabling voting on scenario001 and scenario004 container jobs: https://review.openstack.org/#/c/515188/ Done, please be very careful while these jobs are not voting. If any doubt, please ping me or fultonj or gfidente on #tripleo.
- overcloudrc/keystone v2 workaround: https://review.openstack.org/#/c/515161/ (d0ugal will work on a proper fix for https://bugs.launchpad.net/tripleo/+bug/1727454) Merged - Dougal will work on the real fix this week, but it's not urgent anymore.
- Fixing zaqar/notification issues on https://review.openstack.org/#/c/515123 - we hope that helps reduce some failures in gate. In gate right now and hopefully merged in less than 2 hours. Otherwise, please keep rechecking it. According to Thomas Hervé, it will reduce the chance of timeouts.
- puppet-tripleo gate broken on stable branches (syntax jobs not running properly) - jeblair is looking at it now. jeblair will provide a fix hopefully this week but this is not critical at this time. Thanks Jim for your help.

Once again, we'll need to retrospect and see why we reached that terrible state, but let's focus on bringing our CI back into good shape. Thanks a ton to everyone who is involved. I'm now restoring all patches that I killed from the gate. You can now recheck / rebase / approve what you want, but please save our CI resources and do it with moderation. We are not done yet. I won't call victory, but we've merged almost all our blockers; one is missing but currently in gate: https://review.openstack.org/515123 - needs babysitting until merged.
I have to warn TripleO folks about instack-only changes these days. Please make sure each instack-only change, like hiera overrides, has follow-up patches for the containerized cases as well, which do not use instack. Otherwise, we put the whole containers effort at high risk of reintroducing regressions that were already fixed for the non-container case. That is dangerous, given that we disable voting for it from time to time.

For this particular case, please add it in a separate review in puppet/services/zaqar*. Thanks @bandini for confirming that on IRC.

Now let's see how RDO promotion works. We're close :-) Thanks everyone.

On Wed, Oct 25, 2017 at 7:25 AM, Emilien Macchi <emil...@redhat.com> wrote: Status:
- The Heat Convergence switch *might* be a reason why overclouds time out so much. Thomas proposed to disable it: https://review.openstack.org/515077
- Every time a patch fails in the tripleo gate queue, it resets the gate. I proposed to remove this common queue: https://review.openstack.org/515070
- I cleared the patches in the check and gate queues to make sure the 2 blockers are tested and can be merged in priority. I'll keep an eye on it today. Any help is very welcome.

On Wed, Oct 25, 2017 at 5:58 AM, Emilien Macchi <emil...@redhat.com> wrote: We have been working very hard to get a package/container promotion (for 44 days now) and our blocker is https://review.openstack.org/#/c/513701/. Because the gate queue is huge, we decided to block the gate and kill all the jobs running there until we can get https://review.openstack.org/#/c/513701/ and its backport https://review.openstack.org/#/c/514584 merged (both are blocking the whole production chain). We hope to promote after these 2 patches, unless there is something else; in that case we would iterate to the next problem. We hope you understand and support us during this effort. So please do not recheck, rebase or approve any patch until further notice.
Thank you, -- Emilien Macchi

-- Best regards, Bogdan Dobrelya, Irc #bogdando
[openstack-dev] [tripleo] undercloud containers with SELinux Enforcing
Hello folks. I need your feedback please on SELinux fixes [0] (or rather workarounds) for the containerized undercloud feature, which is experimental in Pike.

[TL;DR] The problem I'm trying to solve is primarily allowing TripleO users to follow the guide [1] w/o telling them "please disable SELinux". Especially given the note "The undercloud is intended to work correctly with SELinux enforcing, and cannot be installed to a system with SELinux disabled".

I understand that putting "chcon -Rt svirt_sandbox_file_t -l s0" (see [2]) on all of the host paths bind-mounted into containers is not secure, and from the SELinux perspective allows everything to all containers. It could be a first step towards docker volumes working w/o shutting down SELinux on *hosts*, though. I plan to use the same approach for the t-h-t docker/services host-prep tasks as well.

Why not use docker's :z/:Z directly? IIUC, they cannot be combined with other mount flags, e.g. :ro:z won't work. I look forward to better solutions and ideas!

[0] https://review.openstack.org/#/q/topic:bug/1682179
[1] https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/undercloud.html
[2] https://www.projectatomic.io/blog/2015/06/using-volumes-with-docker-can-cause-problems-with-selinux/

-- Best regards, Bogdan Dobrelya, Irc #bogdando
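To make the chcon approach concrete, an illustrative transcript only - the paths and image name are made-up examples, not the actual host-prep tasks from [0]:

```
# Label a host path so that *any* container may use it; the shared
# level s0 is what makes this SELinux-permissive across containers:
chcon -Rt svirt_sandbox_file_t -l s0 /var/lib/my-config-data

# A read-only bind mount of that path then works under Enforcing,
# with no :z/:Z relabeling flag needed on the mount itself:
docker run -v /var/lib/my-config-data:/etc/myservice:ro myservice-image
```

Note that :z itself applies the same shared labeling, while :Z assigns a private per-container category; newer Docker releases appear to accept comma-combined options such as :ro,z, though IIUC that did not apply to the versions targeted here.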
Re: [openstack-dev] [tc] [stable] [tripleo] [kolla] [ansible] [puppet] Proposing changes in stable policy for installers
On 10/17/17 12:50 AM, Michał Jastrzębski wrote:
> So my 0.02$
>
> The problem with handling Newton goes beyond deployment tools. Yes, it's popular to use, but if our dependencies (the openstack services themselves) are unmaintained, so should we be. If we say "we support Newton" in deployment tools, we make a kind of promise we can't keep. If, for example, there is a CVE in Nova that affects Newton, there is nothing we can do about it and our "support" is meaningless.
>
> Not having an LTS kind of model has been an issue for OpenStack operators forever, but that's not a problem we can solve in deployment tools (although we are often asked for that, because our communities are largely operators and we're arguably the projects closest to operators).
>
> I, for one, think we should keep the current stable policy, not make an exception for deployment tools, and address this issue across the board. What Emilien is describing is a real issue that hurts operators.

I agree we should keep the current stable policy and never backport features. It adds too much maintenance overhead, high risk of breaking changes, and requires a lot of additional testing. So deployment tools, if they want features backported, should not hold that stable policy tag.

> On 16 October 2017 at 15:38, Emilien Macchi <emil...@redhat.com> wrote:
>> On Mon, Oct 16, 2017 at 4:27 AM, Thierry Carrez <thie...@openstack.org> wrote:
>>> Emilien Macchi wrote:
>>>> [...]
>>>> ## Proposal
>>>>
>>>> Proposal 1: create a new policy that fits for projects like installers.
>>>> I kicked off something here: https://review.openstack.org/#/c/511968/ (open for feedback).
>>>> Content can be read here: http://docs-draft.openstack.org/68/511968/1/check/gate-project-team-guide-docs-ubuntu-xenial/1a5b40e//doc/build/html/stable-branches.html#support-phases
>>>> Tag created here: https://review.openstack.org/#/c/511969/ (same, please review).
>>>> The idea is really to not touch the current stable policy and create a new one, more "relax", that suits well for projects like installers.
>>>>
>>>> Proposal 2: change the current policy and be more relax for projects like installers.
>>>> I haven't worked on this proposal while it was something I was considering doing first, because I realized it could bring confusion in which projects actually follow the real stable policy and the ones who have exceptions. That's why I thought having a dedicated tag would help to separate them.
>>>>
>>>> Proposal 3: no change anywhere; projects like installers can't claim the stability etiquette (not my best option in my opinion).
>>>>
>>>> Anyway, feedback is welcome, I'm now listening. If you work on Kolla, TripleO, OpenStack-Ansible, PuppetOpenStack (or any project who has this need), please get involved in the review process.
>>>
>>> My preference goes to proposal 1, however rather than call it "relaxed" I would make it specific to deployment/lifecycle or cycle-trailing projects.
>>>
>>> Ideally this policy could get adopted by any such project. The discussion started on the review and it's going well, so let's see where it goes :)
>>
>> Thierry, when I read your comment on Gerrit I understand you prefer to amend the existing policy and just make a note for installers (which is, I think, option #2 that I proposed). Can you please confirm that?
>> So far I see option #1 has large consensus here; I'll wait for Thierry's answer to continue to work on it.
>>
>> Thanks for the feedback so far!
>> --
>> Emilien Macchi

-- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [TripleO] containerized undercloud in Queens
On 10/3/17 10:46 PM, Dan Prince wrote:
> This reduces our complexity greatly I think in that once it is completed will allow us to eliminate two projects (instack and instack-undercloud) and the maintenance thereof. Furthermore, as this dovetails nicely with the Ansible
>
> IMHO doit.sh is not acceptable as an undercloud installer and this is what I've been trying to point out as the actual impact to the end user who has to use this thing.
>
> doit.sh is an example of where the effort is today. It is essentially the same stuff we document online here: http://tripleo.org/install/containers_deployment/undercloud.html.
>
> Similar to quickstart it is just something meant to help you set up a dev environment.

Please note that quickstart can "doit.sh" [0] as well, and even more :) Though the undercloud_deploy role maybe still needs to be supported in quickstart.sh, and its documentation [1] should explain the case better. The undercloud_install_script variable and the script template itself fully address the flexibility needed by developers and operators. You can git clone / pip install things, or not. It follows the standard quickstart way, which is jinja-templating bash scripts in order to provide an operator-friendly way, independent from Ansible et al, to reproduce the scripted steps. And it helps to auto-generate documentation as well.

[0] https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/tree/roles/undercloud-deploy/README.md
[1] https://docs.openstack.org/tripleo-quickstart/latest/

-- Best regards, Bogdan Dobrelya, Irc #bogdando
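As a sketch of what that flexibility looks like from the operator side - the value and path below are made-up examples; only the undercloud_install_script knob itself comes from the role's README [0]:

```
# quickstart extra-vars sketch: point the undercloud-deploy role at a
# custom jinja-templated install script (path is a hypothetical example)
undercloud_install_script: "{{ playbook_dir }}/custom-undercloud-install.sh.j2"
```

The rendered script is then plain bash, so an operator can inspect it or re-run it entirely outside of Ansible.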
Re: [openstack-dev] [tripleo] Install Kubernetes in the overcloud using TripleO
On 08.06.2017 18:36, Flavio Percoco wrote:
> Hey y'all,
>
> Just wanted to give an update on the work around tripleo+kubernetes. This is still far in the future, but as we move tripleo to containers using docker-cmd, we're also working on the final goal, which is to have it run these containers on kubernetes.
>
> One of the first steps is to have TripleO install Kubernetes on the overcloud nodes, and I've moved forward with this work:
>
> https://review.openstack.org/#/c/471759/
>
> The patch depends on the `ceph-ansible` work and it uses the mistral-ansible action to deploy kubernetes by leveraging kargo. As it is, the patch doesn't quite work, as it requires some files to be in some places (ssh keys) and a couple of other things. None of these "things" are blockers, as in they can be solved by just sending some patches here and there.
>
> I thought I'd send this out as an update and to request some early feedback on the direction of this patch. The patch, of course, works in my local environment ;)

Note that Kubespray (formerly Kargo) now supports the kubeadm tool natively [0]. This speeds up cluster bootstrapping from an average of 25-30 minutes to around 9. I believe this makes Kubespray a viable option for upstream development of OpenStack overclouds managed by K8s, especially bearing in mind the deployment-time effort and all the hard work done by the tripleo and infra teams to shorten CI job times.

By the way, here is a package review [1] for adding a kubespray-ansible library - just the ansible roles and playbooks - to RDO. I'd appreciate some help with moving this forward, like choosing another place to host the package; it got stuck a little bit.
[0] https://github.com/kubernetes-incubator/kubespray/issues/553
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1482524

> Flavio

-- Best regards, Bogdan Dobrelya, Irc #bogdando