Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes
On 12/3/18 10:34 AM, Bogdan Dobrelya wrote: Hi Kevin. Puppet not only creates config files but also executes service-dependent steps, like db sync, so neither '[base] -> [puppet]' nor '[base] -> [service]' would be enough on its own. That requires some service-specific code to be included into *config* images as well. PS. There is a related spec [0] created by Dan, please take a look and propose your feedback [0] https://review.openstack.org/620062 I'm terribly sorry, here is the corrected link [0] to that spec. [0] https://review.openstack.org/620909 On 11/30/18 6:48 PM, Fox, Kevin M wrote: Still confused by: [base] -> [service] -> [+ puppet] not: [base] -> [puppet] and [base] -> [service] ? Thanks, Kevin ____ From: Bogdan Dobrelya [bdobr...@redhat.com] Sent: Friday, November 30, 2018 5:31 AM To: Dan Prince; openstack-dev@lists.openstack.org; openstack-disc...@lists.openstack.org Subject: Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes On 11/30/18 1:52 PM, Dan Prince wrote: On Fri, 2018-11-30 at 10:31 +0100, Bogdan Dobrelya wrote: On 11/29/18 6:42 PM, Jiří Stránský wrote: On 28. 11. 18 18:29, Bogdan Dobrelya wrote: On 11/28/18 6:02 PM, Jiří Stránský wrote: Reiterating again on previous points: -I'd be fine removing systemd. But let's do it properly and not via 'rpm -ev --nodeps'. -Puppet and Ruby *are* required for configuration. We can certainly put them in a separate container outside of the runtime service containers but doing so would actually cost you much more space/bandwidth for each service container. As both of these have to get downloaded to each node anyway in order to generate config files with our current mechanisms I'm not sure this buys you anything. +1. I was actually under the impression that we concluded yesterday on IRC that this is the only thing that makes sense to seriously consider. 
But even then it's not a win-win -- we'd gain some security by leaner production images, but pay for it with space+bandwidth by duplicating image content (IOW we can help achieve one of the goals we had in mind by worsening the situation w/r/t the other goal we had in mind.) Personally i'm not sold yet but it's something that i'd consider if we got measurements of how much more space/bandwidth usage this would consume, and if we got some further details/examples about how serious the security concerns are if we leave config mgmt tools in runtime images. IIRC the other options (that were brought forward so far) were already dismissed in yesterday's IRC discussion and on the reviews. Bin/lib bind mounting being too hacky and fragile, and nsenter not really solving the problem (because it allows us to switch to having different bins/libs available, but it does not allow merging the availability of bins/libs from two containers into a single context). We are going in circles here I think +1. I think too much of the discussion focuses on "why it's bad to have config tools in runtime images", but IMO we all sorta agree that it would be better not to have them there, if it came at no cost. I think to move forward, it would be interesting to know: if we do this (i'll borrow Dan's drawing): |base container| --> |service container| --> |service container w/ Puppet installed| How much more space and bandwidth would this consume per node (e.g. separately per controller, per compute). This could help with decision making. As I've already evaluated in the related bug, that is: puppet-* modules and manifests ~ 16MB puppet with dependencies ~61MB dependencies of the seemingly largest dependency, systemd ~190MB that would be an extra layer size for each of the container images to be downloaded/fetched into registries. Thanks, i tried to do the math of the reduction vs. inflation in sizes as follows. I think the crucial point here is the layering. 
If we do this image layering: |base| --> |+ service| --> |+ Puppet| we'd drop ~267 MB from the base image, but we'd be installing that at the topmost level, per-component, right? Given we detach systemd from puppet, cronie et al, that would be 267-190 MB, so the math below would look much better. Would it be worth writing a spec that summarizes what action items are being taken to optimize our base image with regards to systemd? Perhaps it would be. But honestly, I see nothing big enough to require a full-blown spec. Just changing RPM deps and layers for container images. I'm tracking systemd changes here [0],[1],[2], btw (if accepted, it should be working as of fedora28 (or 29) I hope) [0] https://review.rdoproject.org/r/#/q/topic:base-container-reduction [1] https://bugzilla.redhat.com/show_bug.cgi?id=1654659 [2] https://bugzilla.redhat.com/show_bug.cgi?id=1654672 It seems like the general consensus is that cleaning up some of the RPM dependencies so that we don't install systemd is the biggest win.
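For those following along, the layer arithmetic traded back and forth in this sub-thread can be written down explicitly. A quick sketch using only the estimates quoted above (all figures are this thread's rough numbers, not measurements of any particular image):

```python
# Rough layer math from this thread (estimates, not measurements).
PUPPET_MODULES_MB = 16   # puppet-* modules and manifests
PUPPET_DEPS_MB = 61      # puppet plus its dependencies
SYSTEMD_MB = 190         # the largest transitive dependency, systemd

# What every image currently inherits from the base layer:
base_drop_mb = PUPPET_MODULES_MB + PUPPET_DEPS_MB + SYSTEMD_MB

# If systemd is first detached at the RPM level (the "267-190MB" point),
# only the remainder must be re-added as a per-service "+ Puppet" layer:
per_service_layer_mb = base_drop_mb - SYSTEMD_MB

print(base_drop_mb)           # ~267 MB dropped from the base image
print(per_service_layer_mb)   # ~77 MB re-added per leaf image
```

So the systemd decoupling is what turns the per-service penalty from ~267 MB into ~77 MB, which is why it keeps coming up as the prerequisite for the whole layering idea.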
What confuses me is why there are still patches posted to move Puppet out of the base layer when we agree moving it out of the base layer would actually cause our resulting container image set to be larger in size. Dan 
In my basic deployment, undercloud seems to have 17 "components" (49 containers), overcloud controller 15 components (48 containers), and overcloud compute 4 components (7 containers). Accounting for overlaps, the total number of "components" used seems to be 19. (By "components" here i mean whatever uses a different ConfigImage than other services. I just eyeballed it but i think i'm not too far off the correct number.) So we'd subtract 267 MB from the base image and add that to 19 leaf images used in this deployment. That means a difference of +4.8 GB to the current image sizes. My /var/lib/registry dir on undercloud with all the images currently has 5.1 GB. We'd almost double that to 9.9 GB. Going from 5.1 to 9.9 GB seems like a lot of extra traffic for the CDNs (both external and e.g. internal within OpenStack Infra CI clouds). 
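As a sanity check, the deployment-wide estimate above can be reproduced mechanically. The component counts are the eyeballed per-role figures from this message; using the full ~267 MB layer, the totals land slightly above the quoted +4.8 GB / +3.7 GB / +800 MB, which presumably assumed a somewhat smaller effective layer size:

```python
# Sanity check of the deployment-wide inflation estimate.
# Counts are the eyeballed per-role component numbers from this thread.
LAYER_MB = 267  # extra per-leaf-image layer if Puppet leaves the base
components_per_role = {"undercloud": 17, "controller": 15, "compute": 4}
distinct_components = 19  # across all roles, overlaps removed

# Growth of the local registry (one copy per distinct leaf image):
registry_extra_gib = distinct_components * LAYER_MB / 1024
print(f"registry grows by ~{registry_extra_gib:.1f} GiB")

# Pull traffic from the registry, per node of each role:
for role, n in components_per_role.items():
    print(f"{role}: +{n * LAYER_MB / 1024:.1f} GiB per node")
```

Either way, the order of magnitude is the same: the registry roughly doubles and every node pulls gigabytes more.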
And for internal traffic between the local registry and overcloud nodes, it gives +3.7 GB per controller and +800 MB per compute. That may not be so critical but still feels like a considerable downside. Another gut feeling is that this way of image layering would take a longer time to build and to run the modify-image Ansible role which we use in CI, so that could endanger how our CI jobs fit into the time limit. We could also probably measure this but i'm not sure if it's worth spending the time. All in all i'd argue we should be looking at different options still. Given that we should decouple systemd from all/some of the dependencies (an example topic for RDO [0]), that could save ~190 MB. But it seems we cannot break the love between puppet and systemd, as the former heavily relies on the latter, and changing packaging like that would highly likely affect baremetal deployments with puppet and systemd co-operating. Ack :/ Long story short, we cannot shoot both rabbits with a single shot, not with puppet :) [0] https://review.rdoproject.org/r/#/q/topic:base-container-reduction
On 11/28/18 8:55 PM, Doug Hellmann wrote: I thought the preferred solution for more complex settings was config maps. Did that approach not work out? Regardless, now that the driver work is done if someone wants to take another stab at etcd integration it'll be more straightforward today. Doug While sharing configs is a feasible option to consider for large-scale configuration management, Etcd only provides strong consistency, which is also known as "Unavailable" [0]. For edge scenarios, to configure 40,000 remote computes over WAN connections, we'd instead want weaker consistency models, like "Sticky Available" [0]. That would allow services to fetch their configuration either from a central "uplink" or locally, when the former is not accessible from remote edge sites. Etcd cannot provide 40,000 local endpoints to fit that case I'm afraid, even if those would be read-only replicas. That is also something I'm highlighting in the paper [1] drafted for ICFC-2019. But had we such a sticky available key-value storage solution, we would indeed have solved the problem of multiple configuration management system executions for thousands of nodes as James describes it. [0] https://jepsen.io/consistency [1] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.pdf On 11/28/18 11:22 PM, Dan Prince wrote: On Wed, 2018-11-28 at 13:28 -0500, James Slagle wrote: On Wed, Nov 28, 2018 at 12:31 PM Bogdan Dobrelya wrote: Long story short, we cannot shoot both rabbits with a single shot, not with puppet :) Maybe we could with ansible replacing puppet fully... So splitting config and runtime images is the only choice yet to address the raised security concerns. And let's forget about edge cases for now. Tossing around a pair of extra bytes over 40,000 WAN-distributed computes ain't gonna be our biggest problem for sure. I think it's this last point that is the crux of this discussion. 
We can agree to disagree about the merits of this proposal and whether it's a pre-optimization or micro-optimization, which I admit are somewhat subjective terms. Ultimately, the unresolved "why do we need to do this" seems to be the reason the conversation keeps going in circles a bit. I'm all for reducing container image size, but the reality is that this proposal doesn't necessarily help us with the Edge use cases we are talking about trying to solve. Why would we even run the exact same puppet binary + manifest individually 40,000 times so that we can produce the exact same set of configuration files that differ only by things such as IP addresses, hostnames, and passwords? Maybe we should instead be thinking about how we can do that *1* time centrally, and produce a configuration that can be reused across 40,000 nodes with little effort. The opportunity for a significant impact in terms of how we can scale TripleO is much larger if we consider approaching these problems with a wider net of what we could do. There's opportunity for a lot of better reuse in TripleO, configuration is just one area. The plan and Heat stack (within the ResourceGroup) are some other areas. We run Puppet for configuration because that is what we did on baremetal and we didn't break backwards compatibility for our configuration options for upgrades. Our Puppet model relies on being executed on each local host in order to splice in the correct IP address and hostname. It executes in a distributed fashion, and works fairly well considering the history of the project. It is robust, guarantees no duplicate configs are being set, and is backwards compatible with all the options TripleO supported on baremetal. Puppet is arguably better for configuration than Ansible (which is what I hear people most often suggest we replace it with). It suits our needs fine, but it is perhaps a bit overkill considering we are only generating config files. 
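The "do it *1* time centrally" idea above amounts to rendering one template centrally and substituting only the per-node values (IP, hostname, passwords) for each of the 40,000 nodes. A minimal, purely illustrative sketch; the config keys here are made up, not TripleO's actual options:

```python
from string import Template

# One template rendered/validated centrally; only per-node values differ.
NODE_CONF = Template("""[DEFAULT]
my_ip = $ip
host = $hostname
transport_url = rabbit://svc:$password@$ip:5672/
""")

def render_for_node(ip: str, hostname: str, password: str) -> str:
    """Splice this node's values into the shared template."""
    return NODE_CONF.substitute(ip=ip, hostname=hostname, password=password)

print(render_for_node("10.0.0.5", "edge-compute-0001", "s3cr3t"))
```

The point is that the expensive part (deciding what the config should look like) happens once, while the per-node step is a trivial substitution that needs no Puppet or Ruby on the node at all.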
I think the answer here is moving to something like Etcd. Perhaps [Not Etcd I think, see my comment above. But you're absolutely right, Dan.] skipping over Ansible entirely as a config management tool (it is arguably less capable than Puppet in this category anyway). Or we could use Ansible for "legacy" services only, switch to Etcd for a majority of the OpenStack services, and drop Puppet entirely (my favorite option). Consolidating our technology stack would be wise. We've already put some work and analysis into the Etcd effort. Just need to push on it some more. Looking at the previous Kubernetes prototypes for TripleO would be the place to start. Config management migration is going to be tedious. It's technical debt that needs to be handled at some point anyway. I think it is a general TripleO improvement that could benefit all clouds, not just Edge. Dan At the same time, if some folks want to work on smaller optimizations (such as container image size), with an approa
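For reference, the "Sticky Available" behavior Bogdan describes earlier in the thread (prefer the central uplink, keep serving the last known config when the WAN link to the edge site is down) reduces to a fetch-with-local-fallback pattern. A sketch with hypothetical helper names, not any existing TripleO API:

```python
import json
from pathlib import Path

def fetch_config(fetch_uplink, cache_path: Path) -> dict:
    """Sticky-available config fetch: prefer the central "uplink",
    fall back to the last locally cached copy when the WAN is down.
    fetch_uplink is any callable returning the config as a dict
    (e.g. a wrapper around an HTTP call to the central store); it is
    assumed to raise OSError when the uplink is unreachable."""
    try:
        cfg = fetch_uplink()
        # Refresh the sticky local copy on every successful fetch.
        cache_path.write_text(json.dumps(cfg))
        return cfg
    except OSError:
        # Uplink unreachable: serve the stale-but-available local copy.
        return json.loads(cache_path.read_text())
```

A node that has fetched at least once keeps working through WAN outages; it degrades to a stale config rather than failing outright, which is exactly the availability-over-consistency trade that Etcd's strongly consistent model cannot make.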
On 11/28/18 2:58 PM, Dan Prince wrote: On Wed, 2018-11-28 at 12:45 +0100, Bogdan Dobrelya wrote: To follow up and explain the patches for code review: The "header" patch https://review.openstack.org/620310 -> (requires) https://review.rdoproject.org/r/#/c/17534/, and also https://review.openstack.org/620061 -> (which in turn requires) https://review.openstack.org/619744 -> (Kolla change, the 1st to go) https://review.openstack.org/619736 This email was cross-posted to multiple lists and I think we may have lost some of the context in the process as the subject was changed. Most of the suggestions and patches are about making our base container(s) smaller in size. And the means by which the patches do that is to share binaries/applications across containers with custom mounts/volumes. I've -2'd most of them. What concerns me however is that some of the TripleO cores seemed open to this idea yesterday on IRC. Perhaps I've misread things, but what you appear to be doing here is quite drastic. I think we need to consider any of this carefully before proceeding with any of it. Please also read the commit messages, I tried to explain all "Whys" very carefully. Just to sum it up here as well: The current self-contained (config and runtime bits) architecture of containers badly affects: * the size of the base layer and all container images, as an additional 300MB (adds an extra 30% of size). You are accomplishing this by removing Puppet from the base container, but you are also creating another container in the process. This would still be required on all nodes as Puppet is our config tool. So you would still be downloading some of this data anyways. Understood your reasons for doing this are that it avoids rebuilding all containers when there is a change to any of these packages in the base container. What you are missing however is how often it is the case that Puppet is updated but something else in the base container isn't? 
For CI jobs updating all containers, it's quite common to have changes in openstack/tripleo puppet modules to pull in. IIUC, that automatically picks up any updates for all of its dependencies and for the dependencies of dependencies, and all that multiplied by a hundred total containers to get updated. That is a *pain* we're used to these days, with CI jobs quite often timing out... Ofc, the main cause is delayed promotions though. For real deployments, I have no data for the cadence of minor updates in puppet and tripleo & openstack modules for it, let's ask operators (as we happen to be on the merged openstack-discuss list)? For its dependencies though, like systemd and ruby, I'm pretty sure it's quite common to have CVEs fixed there. So I expect that delivering "in the field" security fixes for those might bring some unwanted hassle for long-term maintenance of LTS releases. As Tengu noted on IRC: "well, between systemd, puppet and ruby, there are many security concerns, almost every month... and also, what's the point keeping them in runtime containers when they are useless?" I would wager that it is more rare than you'd think. Perhaps looking at the history of an OpenStack distribution would be a valid way to assess this more critically. Without this data to back up the numbers, I'm afraid what you are doing here falls into "pre-optimization" territory for me and I don't think the means used in the patches warrant the benefits you mention here. * Edge cases, where we have container images to be distributed, at least once to hit local registries, over high-latency and limited-bandwidth, highly unreliable WAN connections. * numbers of packages to update in CI for all containers for all services (CI jobs do not rebuild containers so each container gets updated for those 300MB of extra size). It would seem to me there are other ways to solve the CI container update problems. Rebuilding the base layer more often would solve this, right? 
If we always build our service containers off of a recent base layer, there should be no updates to the system/puppet packages there in our CI pipelines.

* security and the surface of attacks, by introducing systemd et al as additional subjects for CVE fixes to maintain for all containers.

We aren't actually using systemd within our containers. I think those packages are getting pulled in by an RPM dependency elsewhere. So rather than using 'rpm -ev --nodeps' to remove it, we could create a sub-package for containers in those cases and install that instead. In short, rather than hacking around this to remove them, why not pursue a proper packaging fix? In general I am a fan of getting things out of the base container we don't need... so yeah, let's do this. But let's do it properly.

* services uptime, by additional restarts of services related to security maintenance of components irrelevant to OpenStack, sitting as dead weight in container images forever.
Re: [openstack-dev] [TripleO][Edge][Kolla] Reduce base layer of containers for security and size of images (maintenance) sakes
Added the Kolla tag, as all together we might want to do something about systemd being included in containers via *multiple* package dependencies, like [0]. Ideally, that would mean properly packaging all/some (like those named in [1]) of the places having it as a dependency, to stop doing that, since as of now it's Containers Time?.. As temporary security band-aiding I was thinking of removing systemd via footers [1] as an extra layer added on top, but I'm not sure that buys anything good long-term.

[0] https://pastebin.com/RSaRsYgZ
[1] https://review.openstack.org/#/c/620310/2/container-images/tripleo_kolla_template_overrides.j2@680

On 11/28/18 12:45 PM, Bogdan Dobrelya wrote: To follow up and explain the patches for code review: The "header" patch https://review.openstack.org/620310 -> (requires) https://review.rdoproject.org/r/#/c/17534/, and also https://review.openstack.org/620061 -> (which in turn requires) https://review.openstack.org/619744 -> (Kolla change, the 1st to go) https://review.openstack.org/619736

Please also read the commit messages; I tried to explain all the "Whys" very carefully. Just to sum it up here as well, the current self-contained (config and runtime bits) architecture of containers badly affects:

* the size of the base layer and of all container images, as an additional 300 MB (an extra 30% of size).

* Edge cases, where we have container images to be distributed, at least once to hit local registries, over high-latency, limited-bandwidth, highly unreliable WAN connections.

* the number of packages to update in CI for all containers for all services (CI jobs do not rebuild containers, so each container gets updated for those 300 MB of extra size).

* security and the surface of attacks, by introducing systemd et al as additional subjects for CVE fixes to maintain for all containers.
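To ground the sub-packaging discussion, one could start by enumerating the reverse-dependencies that drag systemd into the images. A dry-run sketch (the `run` wrapper only prints the command rather than executing it, and a dnf-based image is assumed):

```shell
# Dry-run wrapper: print the command this sketch would execute.
run() { echo "$*"; }

# List installed packages that require systemd: the candidate list for
# the proper sub-packaging fix discussed above.
run dnf repoquery --installed --whatrequires systemd
```

The printed command is what one would run inside a container image to produce a concrete list like the one in [0] above.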
* services uptime, by additional restarts of services related to security maintenance of components irrelevant to OpenStack, sitting as dead weight in container images forever.

On 11/27/18 4:08 PM, Bogdan Dobrelya wrote: Changing the topic to follow the subject. [tl;dr] it's time to rearchitect container images to stop including config-time-only (puppet et al) bits, which are not needed at runtime and pose security issues, like CVEs, to maintain daily.

Background: 1) For the Distributed Compute Node edge case, there are potentially tens of thousands of single-compute-node remote edge sites connected over WAN to a single control plane, with high latency, around 100 ms or so, and limited bandwidth. 2) For a generic security case, 3) TripleO CI updates all Challenge:

Here is a related bug [0] and implementation [1] for that. PTAL folks! [0] https://bugs.launchpad.net/tripleo/+bug/1804822 [1] https://review.openstack.org/#/q/topic:base-container-reduction

Let's also think of removing puppet-tripleo from the base container. It really brings the world in (and yum updates in CI!) for each job and each container! So if we did so, we should then either install puppet-tripleo and co on the host and bind-mount it for the docker-puppet deployment task steps (bad idea IMO), OR use the magical --volumes-from option to mount volumes from some "puppet-config" sidecar container inside each of the containers being launched by the docker-puppet tooling.
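The sidecar idea above could look roughly like this. This is a dry-run illustration (the `run` wrapper only prints the commands instead of executing them), and the image names, volume path, and manifest path are all hypothetical:

```shell
# Dry-run wrapper: print the docker commands instead of executing them.
run() { echo "+ $*"; }

# 1) Create (not start) a "puppet-config" sidecar whose only job is to
#    expose the puppet modules as a volume. Image name is hypothetical.
run docker create --name puppet-config \
    -v /usr/share/openstack-puppet/modules \
    tripleo/puppet-config

# 2) Config-time containers borrow those bits via --volumes-from, so the
#    runtime service images never carry puppet/ruby themselves.
#    The service image and manifest path are hypothetical too.
run docker run --rm --volumes-from puppet-config \
    tripleo/nova-api puppet apply /etc/puppet/manifests/config.pp
```

The trade-off debated in this thread still applies: the sidecar image has to be pulled to every node, so the win only materializes if the runtime images stop rebuilding on every puppet/ruby update.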
On Wed, Oct 31, 2018 at 11:16 AM Harald Jensås wrote: We add this to all images: https://github.com/openstack/tripleo-common/blob/d35af75b0d8c4683a677660646e535cf972c98ef/container-images/tripleo_kolla_template_overrides.j2#L35

/bin/sh -c yum -y install iproute iscsi-initiator-utils lvm2 python socat sudo which openstack-tripleo-common-container-base rsync cronie crudini openstack-selinux ansible python-shade puppet-tripleo python2-kubernetes && yum clean all && rm -rf /var/cache/yum

276 MB. Is the additional 276 MB reasonable here? openstack-selinux <- this package runs relabeling; does that kind of touching of the filesystem impact the size due to docker layers? Also: python2-kubernetes is a fairly large package (18007990 bytes); do we use that in every image? I don't see any tripleo-related repos importing from it when searching on Hound. The original commit message [1] adding it states it is for future convenience. On my undercloud we have 101 images; if we are downloading an extra 18 MB per image, that's almost 1.8 GB for a package we don't use? (I hope it's not like this? With docker layers, we only download that 276 MB transaction once? Or?) [1] https://review.openstack.org/527927

-- Best regards, Bogdan Dobrelya, Irc #bogdando -- Best regards, Bogdan Dobrelya, Irc #bogdando __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
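Harald's back-of-envelope math can be sanity-checked quickly. A small shell sketch using only the figures quoted in this thread (worst case, i.e. assuming no docker layer sharing between images):

```shell
# Figures quoted in the thread (assumptions, not fresh measurements):
# 18 MB for python2-kubernetes, 101 images on the undercloud,
# 300 MB of extra base-layer content.
pkg_mb=18
images=101
extra_mb=300

echo "python2-kubernetes, worst case: $(( pkg_mb * images )) MB"   # 1818 MB, ~1.8 GB
echo "base-layer extra, worst case:   $(( extra_mb * images )) MB" # 30300 MB
```

In practice, docker layers shared between images built FROM the same base are downloaded once per node, so the worst case only applies when the layer in question differs between images, which is exactly the crux of the closing question above.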
Re: [openstack-dev] [tripleo] Proposing Enrique Llorente Pastora as a core reviewer for TripleO
+1

On 11/15/18 4:50 PM, Sagi Shnaidman wrote: Hi, I'd like to propose Quique (@quiquell) as a core reviewer for TripleO. Quique is actively involved in improvements and development of TripleO and TripleO CI. He also helps in other projects, including but not limited to Infrastructure. He shows a very good understanding of how TripleO and CI work, and I'd like to suggest him as a core reviewer of TripleO for CI-related code. Please vote! My +1 is here :) Thanks -- Best regards, Sagi Shnaidman

-- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [Edge-computing] [tripleo][FEMDC] IEEE Fog Computing: Call for Contributions - Deadline Approaching
Hello. The final version of the position paper "Edge Clouds Multiple Control Planes Data Replication Challenges" [0],[1] has been drafted and uploaded to EDAS. The deadline expires today and I'm afraid there is no time left for further amendments. Thank you all for the reviews and inputs, and those edge sessions at the summit in Berlin were really eye-opening! PS. I wish I could have kept working on that draft while attending the summit, but that was not the case :)

[0] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.pdf
[1] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.tex

On 11/8/18 6:58 PM, Bogdan Dobrelya wrote: Hi folks. The deadline for papers seems to have been extended till Nov 17, which is great news! I finished drafting the position paper [0],[1]. Please proofread and review. There are also open questions placed there, and it would be really nice to have a co-author for any of the items remaining... ...

-- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [Edge-computing] [tripleo][FEMDC] IEEE Fog Computing: Call for Contributions - Deadline Approaching
Hi folks. The deadline for papers seems to have been extended till Nov 17, which is great news! I finished drafting the position paper [0],[1]. Please proofread and review. There are also open questions placed there, and it would be really nice to have a co-author for any of the items remaining... I'm also looking for some help with... **uploading the PDF** to the EDAS system! :) It throws at me: pdf notstampable. The PDF file is not compliant with PDF standards and cannot be stamped (see FAQ)... And the FAQ says: "First, try using the most current version of dvipdf for LaTeX or the most current version of Word. You can also distill the file by using Acrobat (Pro, not Acrobat Reader): * Open the PDF file in Acrobat Pro; * Go to the File Menu > Save As or File > Export To... (in Adobe DC Pro) or File > Save As Other... > More Options > Postscript (in Adobe Pro version 11) * Give the file a new name (do not overwrite the original file); * Under "Save As Type", choose "PostScript File (*.ps)" * Open Distiller and browse for this file or go to the directory where the file exists and double click on the file - this will open and run Distiller and regenerate the PDF file. If you do not have Acrobat Pro, you can also try to save the PostScript version via Apple Preview, using the "Print..." menu and the "PDF v" selector in the lower left hand corner to pick "Save as PostScript...". Unfortunately, Apple Preview saves PDF files as version 1.3, which is not acceptable to PDF Xpress, but tools such as docupub appear to produce compliant PDF." I have yet to try those MS Word/Acrobat Pro and Distiller dances (I think I should try that as well... 
), but neither docupub [2] nor dvipdf(m) for LaTeX helped to produce a PDF that EDAS can eat :-(

[0] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.pdf
[1] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.tex
[2] https://docupub.com/pdfconvert/

Folks, I have drafted a few more sections [0] for your proofreading and kind review, please. Also left some notes on TBD things, either for the potential co-authors' attention or for myself :) [0] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.pdf

-- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [Edge-computing] [tripleo][FEMDC] IEEE Fog Computing: Call for Contributions - Deadline Approaching
Folks, I have drafted a few more sections [0] for your proofreading and kind review, please. Also left some notes on TBD things, either for the potential co-authors' attention or for myself :) [0] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.pdf

On 11/5/18 6:50 PM, Bogdan Dobrelya wrote: Update: I have not yet found co-authors, so I'll keep drafting that position paper [0],[1]. Just did some baby steps so far. I'm open for feedback and contributions! PS. The deadline is Nov 9 03:00 UTC, but it *may* be extended, if the event chairs decide to do so. Fingers crossed. [0] https://github.com/bogdando/papers-ieee#in-the-current-development-looking-for-co-authors [1] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.pdf

On 11/5/18 3:06 PM, Bogdan Dobrelya wrote: Thank you for the reply, Flavia: Hi Bogdan, sorry for the late reply - yesterday was a holiday here in Brazil! I am afraid I will not be able to engage in this collaboration on such short notice... we should have started this initiative a little earlier... That's understandable. I hoped, though, that a position paper is something we (all who read this, not just you and me) could achieve in a couple of days, without a lot of research involved. It's a position paper, which is not expected to contain formal proofs or implementation details. The vision for tooling is the hardest part though, and indeed requires some time.
So let me [tl;dr] the intended outcome of that position paper:

* position: given Always Available autonomy support as a starting point, define invariants for both the operational and the data storage consistency requirements of the control/management plane (I've already drafted some in [0]).

* vision: show that in the end the data synchronization and conflict resolution solution boils down to having a causally consistent KVS (either causal+ or causal-RT, or lazy-replication based, or anything like that), and cannot be achieved with *only* a transactional distributed database, like a Galera cluster. How to show that is an open question; we could refer to the existing papers (COPS, causal-RT, lazy replication et al) and claim they fit the defined invariants nicely, while a transactional DB cannot fit them by design (its consensus protocols require majorities/quorums to operate, which conflicts with being always available for data put/write operations). We probably may omit proving that obvious thing formally? At least for the position paper...

* opportunity: basically designing and implementing such a causally consistent KVS solution (see the COPS library as an example) for OpenStack, and ideally unifying it for PaaS operators (OpenShift/Kubernetes) and tenants willing to host their containerized workloads on a PaaS distributed over a Fog Cloud of Edge clouds, letting them leverage its data synchronization and conflict resolution as-a-service. Like Amazon DynamoDB, for example, except fitting the edge cases of another cloud stack :)

[0] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/challenges.md

As for working collaboratively with LaTeX, I would recommend using Overleaf - it is not that difficult and has lots of editing resources, such as markdown and track changes, for instance. Thanks and good luck! Flavia

On 11/2/18 5:32 PM, Bogdan Dobrelya wrote: Hello folks. Here is an update for today.
I created a draft [0], and spent some time setting up a LaTeX build with live-updating of the compiled PDF... The latter is only informational; if someone wants to contribute, please follow the instructions listed at the link (hint: you don't need any LaTeX experience, basic markdown knowledge should be enough!) [0] https://github.com/bogdando/papers-ieee/#in-the-current-development-looking-for-co-authors

On 10/31/18 6:54 PM, Ildiko Vancsa wrote: Hi, Thank you for sharing your proposal. I think this is a very interesting topic, with a list of possible solutions, some of which this group is also discussing. It would also be great to learn more about the IEEE activities and gain experience with the process in this group on the way forward. I personally do not have experience with IEEE conferences, but I’m happy to help with the paper if I can. Thanks, Ildikó

(added from the parallel thread) On 2018. Oct 31., at 19:11, Mike Bayer wrote: On Wed, Oct 31, 2018 at 10:57 AM Bogdan Dobrelya wrote: (cross-posting openstack-dev) Hello. [tl;dr] I'm looking for co-author(s) to come up with "Edge clouds data consistency requirements and challenges", a position paper [0] (paper submission deadline is Nov 8). The problem scope is synchronizing control plane and/or deployment-specific data (not necessarily limited to OpenStack) across remote Edges and a central Edge and management site(s). Including
[tl;dr] I'm looking for co-author(s) to come up with "Edge clouds data consistency requirements and challenges", a position paper [0] (paper submission deadline is Nov 8). The problem scope is synchronizing control plane and/or deployment-specific data (not necessarily limited to OpenStack) across remote Edges and a central Edge and management site(s). Including the same aspects for overclouds and undercloud(s), in terms of TripleO; and other deployment tools of your choice. Another problem is to not end up with different solutions for Edge deployments management and for the control planes of edges. And for tenants as well, if we think of tenants also doing Edge deployments based on Edge Data Replication as a Service, say for Kubernetes/OpenShift on top of OpenStack. So the paper should name the outstanding problems, define data consistency requirements, and pose possible solutions for synchronization and conflict resolution, having maximum-autonomy cases supported for isolated sites, with the capability to eventually catch up the distributed state. Like a global database [1], or something different perhaps (see the causal-real-time consistency model [2],[3]), or even using git. And probably more than that?.. (looking for ideas) I can offer detail on whatever aspects of the "shared
Re: [openstack-dev] [tripleo] Zuul Queue backlogs and resource usage
Let's also think of removing puppet-tripleo from the base container. It really brings the world in (and yum updates in CI!) for each job and each container! So if we did so, we should then either install puppet-tripleo and co on the host and bind-mount it for the docker-puppet deployment task steps (bad idea IMO), OR use the magical --volumes-from option to mount volumes from some "puppet-config" sidecar container inside each of the containers being launched by the docker-puppet tooling. On 10/31/18 6:35 PM, Alex Schultz wrote: So this is a single layer that is updated once and shared by all the containers that inherit from it. I did notice the same thing and proposed a change in the layering of these packages last night: https://review.openstack.org/#/c/614371/ In general this does raise a point about dependencies of services and what the actual impact of adding new ones to projects is, especially in the container world, where a dependency might be duplicated N times depending on the number of services deployed. With the move to containers, much of the sharedness that being on a single host provided has been lost, at a cost of increased bandwidth, memory, and storage usage. Thanks, -Alex -- Best regards, Bogdan Dobrelya, Irc #bogdando __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Edge-computing] [tripleo][FEMDC] IEEE Fog Computing: Call for Contributions - Deadline Approaching
Hello folks. Here is an update for today. I created a draft [0], and spent some time with building LaTeX with live-updating for the compiled PDF... The latter is only informational; if someone wants to contribute, please follow the instructions listed by the link (hint: you don't need any LaTeX experience, only basic markdown knowledge should be enough!) [0] https://github.com/bogdando/papers-ieee/#in-the-current-development-looking-for-co-authors On 10/31/18 6:54 PM, Ildiko Vancsa wrote: Hi, Thank you for sharing your proposal. I think this is a very interesting topic with a list of possible solutions, some of which this group is also discussing. It would also be great to learn more about the IEEE activities and gain experience with the process in this group on the way forward. I personally do not have experience with IEEE conferences, but I’m happy to help with the paper if I can. Thanks, Ildikó (added from the parallel thread) On 2018. Oct 31., at 19:11, Mike Bayer wrote: On Wed, Oct 31, 2018 at 10:57 AM Bogdan Dobrelya wrote: (cross-posting openstack-dev) Hello. [tl;dr] I'm looking for co-author(s) to come up with "Edge clouds data consistency requirements and challenges", a position paper [0] (paper submission deadline is Nov 8). The problem scope is synchronizing control plane and/or deployment-specific data (not necessarily limited to OpenStack) across remote Edges and central Edge and management site(s), including the same aspects for overclouds and undercloud(s) in terms of TripleO, and other deployment tools of your choice. Another problem is to not end up with different solutions for managing Edge deployments and the control planes of edges. And for tenants as well, if we think of tenants also doing Edge deployments based on Edge Data Replication as a Service, say for Kubernetes/OpenShift on top of OpenStack.
So the paper should name the outstanding problems, define data consistency requirements, and pose possible solutions for synchronization and conflict resolution, with maximum-autonomy cases supported for isolated sites and a capability to eventually catch up on distributed state. Like a global database [1], or perhaps something different (see the causal-real-time consistency model [2],[3]), or even using git. And probably more than that?.. (looking for ideas) I can offer detail on whatever aspects of the "shared / global database" idea. The way we're doing it with Galera for now is all about something simple and modestly effective for the moment, but it doesn't have any of the hallmarks of a long-term, canonical solution, because Galera is not well suited towards being present on many (dozens of) endpoints. The concept that the StarlingX folks were talking about, that of independent databases synchronized using some kind of middleware, is potentially more scalable; however, I think the best approach would be API-level replication, that is, you have a bunch of Keystone services and there is a process that is regularly accessing the APIs of these Keystone services and cross-publishing state amongst all of them. Clearly the big challenge with that is how to resolve conflicts; I think the answer would lie in the fact that the data being replicated would be of limited scope and potentially consist of mostly or fully non-overlapping records. That is, I think "global database" is a cheap way to get what would be more effective as asynchronous state synchronization between identity services. Recently we’ve been also exploring federation with an IdP (Identity Provider) master: https://wiki.openstack.org/wiki/Keystone_edge_architectures#Identity_Provider_.28IdP.29_Master_with_shadow_users One of the pros is that it removes the need for synchronization and potentially increases scalability.
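The API-level cross-publishing approach described above can be illustrated in a few lines. In this sketch each "site" is just a dict standing in for a Keystone-like service; records are keyed by their originating site, so merges are mostly non-overlapping, and a genuine overlap falls back to last-writer-wins on a timestamp. Everything here (names, record shapes, the merge policy) is hypothetical, not a real Keystone mechanism.

```python
# Illustrative sketch of API-level state replication between sites.
# Each "site" is a plain dict of records keyed by (originating_site, record_id),
# so most merges are non-overlapping; overlaps resolve by newest timestamp.

def cross_publish(sites):
    """Merge every site's records into every other site."""
    merged = {}
    for site in sites.values():
        for key, record in site.items():
            # Keep the newest copy of a record (last-writer-wins on overlap).
            if key not in merged or record["updated_at"] > merged[key]["updated_at"]:
                merged[key] = record
    for site in sites.values():
        site.clear()
        site.update(merged)

sites = {
    "central": {("central", "role-admin"): {"updated_at": 1, "name": "admin"}},
    "edge1":   {("edge1", "user-alice"):  {"updated_at": 2, "name": "alice"}},
}
cross_publish(sites)
# Both sites now hold both records; the keys were disjoint, so no conflict arose.
assert ("edge1", "user-alice") in sites["central"]
assert ("central", "role-admin") in sites["edge1"]
```

The interesting part, as noted in the thread, is that keying records by originating site makes the common case conflict-free; only records written at two sites under the same key ever need the timestamp tiebreak.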
Thanks, Ildikó -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [Edge-computing][tripleo][FEMDC] IEEE Fog Computing: Call for Contributions - Deadline Approaching
I forgot to mention that the submission registration and abstract have to be submitted today. I've created it as #1570506394, and the paper itself can be uploaded until Nov 8 (or Nov 9 perhaps, as the registration system shows to me). I'm not sure that paper number is searchable publicly, so here is the paper name and abstract for your kind review please: name: "Edge clouds control plane and management data consistency challenges" abstract: "Fog computing is an emerging Cloud of (Edge) Clouds technology. Synchronizing its control plane and deployment data is a major challenge. Autonomy requirements expect even the most distant edge sites to always remain manageable and available for monitoring and alerting, scaling up/down, upgrading and applying security fixes. Whenever temporarily disconnected sites are managed locally or centrally, some changes and data need to be eventually synchronized back to the central site(s), with merge conflicts resolved for the central data hub(s), while some data needs to be pushed from the central site(s) to the Edge, which might require resolving data collisions at the remote sites as well. In this paper, we position the outstanding data synchronization problems for the OpenStack cloud platform on its way to becoming a number one solution for fog computing. We outline the data consistency requirements and design approaches to meet the AA (Always Available) autonomy expectations. Finally, the paper brings the vision of unified tooling, which solves the data synchronization problems the same way for infrastructure owners, IaaS cloud operators, and tenants running workloads for a PaaS like OpenShift or Kubernetes deployed on top of OpenStack. The secondary goal of this work is to help cloud architects and developers federate stateful cloud components over reliable distributed data backends with known failure modes." Thank you for your time, if you're still reading this. On 10/31/18 3:57 PM, Bogdan Dobrelya wrote: (cross-posting openstack-dev) Hello.
[tl;dr] I'm looking for co-author(s) to come up with "Edge clouds data consistency requirements and challenges", a position paper [0] (paper submission deadline is Nov 8). The problem scope is synchronizing control plane and/or deployment-specific data (not necessarily limited to OpenStack) across remote Edges and central Edge and management site(s), including the same aspects for overclouds and undercloud(s) in terms of TripleO, and other deployment tools of your choice. Another problem is to not end up with different solutions for managing Edge deployments and the control planes of edges. And for tenants as well, if we think of tenants also doing Edge deployments based on Edge Data Replication as a Service, say for Kubernetes/OpenShift on top of OpenStack. So the paper should name the outstanding problems, define data consistency requirements, and pose possible solutions for synchronization and conflict resolution, with maximum-autonomy cases supported for isolated sites and a capability to eventually catch up on distributed state. Like a global database [1], or perhaps something different (see the causal-real-time consistency model [2],[3]), or even using git. And probably more than that?.. (looking for ideas) See also the "check" list in-line, which I think also meets the data consistency topics well - it would always be nice to have some theoretical foundations at hand when repairing, by hand, some fully broken global database spread across 1000 edges :) PS. I must admit I don't yet have any experience with those IEEE et al. academic things and am looking for someone who has, to team up and co-author that position paper. That's a start; then we can think of presenting it and expanding it into work items for the OpenStack Edge WG and future development plans.
[0] http://conferences.computer.org/ICFC/2019/Paper_Submission.html
[1] https://review.openstack.org/600555
[2] https://jepsen.io/consistency
[3] http://www.cs.cornell.edu/lorenzo/papers/cac-tr.pdf
On 10/22/18 3:44 PM, Flavia Delicato wrote:
= IEEE International Conference on Fog Computing (ICFC 2019)
June 24-26, 2019, Prague, Czech Republic
http://conferences.computer.org/ICFC/2019/
Co-located with the IEEE International Conference on Cloud Engineering (IC2E 2019)
== Important Dates
Paper registration and abstract: Nov 1st, 2018
Full paper submission due: Nov 8th, 2018
Notification of paper acceptance: Jan 20th, 2019
Workshop and tutorial proposals due: Nov 11, 2018
Notification of proposal acceptance: Nov 18, 2018
== Call for Contributions
Fog computing is the extension of cloud computing into its edge and the physical world to meet the data volume and decision velocity requirements in many
Re: [openstack-dev] [Edge-computing][tripleo][FEMDC] IEEE Fog Computing: Call for Contributions - Deadline Approaching
National University of Singapore Neeraj Suri, TU Darmstadt Albert Zomaya, The University of Sydney Program Committee -- Tarek Abdelzaher, UIUC Anne Benoit, ENS Lyon David Bermbach, TU Berlin Bharat Bhargava, Purdue University Olivier Brun, LAAS/CNRS Laboratory Jiannong Cao, Hong Kong Polytech Flavia C. Delicato, UFRJ, Brazil Xiaotie Deng, Peking University, China Schahram Dustdar, TU Wien, Germany Maria Gorlatova, Duke University Dharanipragada Janakiram, IIT Madras Wenjing Luo, Virginia Tech Pedro José Marrón, Universität Duisburg-Essen Geyong Min, University of Exeter Suman Nath, Microsoft Research Vincenzo Piuri, Universita Degli Studi Di Milano Yong Meng Teo, National University of Singapore Guoliang Xing, Chinese University of Hong Kong Yuanyuan Yang, SUNY Stony Brook Xiaoyun Zhu, Cloudera -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [TripleO] easily identifying how services are configured
On 10/19/18 8:04 PM, Alex Schultz wrote: On Fri, Oct 19, 2018 at 10:53 AM James Slagle wrote: On Wed, Oct 17, 2018 at 11:14 AM Alex Schultz wrote: > Additionally I took a stab at combining the puppet/docker service > definitions for the aodh services in a similar structure to start > reducing the overhead we've had from maintaining the docker/puppet > implementations separately. You can see the patch > https://review.openstack.org/#/c/611188/ for an additional example of > this. That patch takes the approach of removing baremetal support. Is that what we agreed to do? Since it's deprecated since Queens[0], yes? I think it is time to stop continuing this method of installation. Given that I'm not even sure the upgrade process even works anymore with baremetal, I don't think there's a reason to keep it, as it directly impacts the time it takes to perform deployments and also contributes to increased complexity all around. [0] http://lists.openstack.org/pipermail/openstack-dev/2017-September/122248.html My point and concern remains as before: unless we fully dropped the docker support for Queens (and the downstream LTS released for it), we should not modify the t-h-t directory tree, due to the associated complexity of maintaining backports. I'm not specifically opposed, as I'm pretty sure the baremetal implementations are no longer tested anywhere, but I know that Dan had some concerns about that last time around. The alternative we discussed was using jinja2 to include common data/tasks in both the puppet/docker/ansible implementations. That would also result in reducing the number of Heat resources in these stacks and hopefully reduce the amount of time it takes to create/update the ServiceChain stacks. I'd rather we officially get rid of one of the two methods and converge on a single method, without increasing the complexity via jinja to continue supporting both.
If there's an improvement to be had after we've converged on a single structure for including the base bits, maybe we could do that then? Thanks, -Alex -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] Proposing Bob Fournier as core reviewer
+1 On 10/19/18 3:44 PM, Alex Schultz wrote: +1 On Fri, Oct 19, 2018 at 6:29 AM Emilien Macchi wrote: On Fri, Oct 19, 2018 at 8:24 AM Juan Antonio Osorio Robles wrote: I would like to propose Bob Fournier (bfournie) as a core reviewer in TripleO. His patches and reviews have spanned quite a wide range in our project, his reviews show great insight and quality, and I think he would be a great addition to the core team. What do you folks think? Big +1, Bob is a solid contributor/reviewer. His area of knowledge has been critical in all aspects of Hardware Provisioning integration but also in other TripleO bits. -- Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [Openstack-sigs] [FEMDC] [Edge] [tripleo] On the use of terms "Edge" and "Far Edge"
On 10/18/18 4:33 PM, arkady.kanev...@dell.com wrote: Love the idea to have clearer terminology. I suggest we let telco folks suggest the terminology to use. This is not a 3-level hierarchy but much more: there are several layers of aggregation from local to metro, to regional, to DC, and potentially multiple layers in each. -Original Message- From: Dmitry Tantsur Sent: Thursday, October 18, 2018 9:23 AM To: OpenStack Development Mailing List (not for usage questions); openstack-s...@lists.openstack.org Subject: [Openstack-sigs] [FEMDC] [Edge] [tripleo] On the use of terms "Edge" and "Far Edge" [EXTERNAL EMAIL] Please report any suspicious attachments, links, or requests for sensitive information. Hi all, Sorry for chiming in really late on this topic, but I think $subj is worth discussing before we settle harder on the potentially confusing terminology. I think the difference between "Edge" and "Far Edge" is too vague to use these terms in practice. Think about the "edge" metaphor itself: something rarely has several layers of edges. A knife has an edge; there are no far edges. I can imagine zooming in and seeing more edges at the edge, and then it's quite cool indeed, but is it really a useful metaphor for those who have never used a strong microscope? :) I think in the trivial sense "Far Edge" is a tautology and should be avoided. As weak proof of my words, I already see a lot of smart people confusing these two and actually using Central/Edge where they mean Edge/Far Edge. I suggest we adopt a different terminology, even if it is less consistent with the typical marketing terms around the "Edge" movement. Now, I don't have really great suggestions. Something that came up in TripleO discussions [1] is Core/Hub/Edge, which I think reflects the idea better. I'd be very interested to hear your ideas.
Similarly to how NUMA distance equals the shortest path between NUMA nodes, we could think of edges as facets, and of Edge distance as the shortest path between edge sites, counting from the central Edge (distance 0), or from central Edges, if we have those decentralized and there is no single central Edge? Dmitry [1] https://etherpad.openstack.org/p/tripleo-edge-mvp ___ openstack-sigs mailing list openstack-s...@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-sigs -- Best regards, Bogdan Dobrelya, Irc #bogdando
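The NUMA-distance analogy maps naturally onto a shortest-path computation over the site topology, with the central site(s) at distance 0; decentralized deployments simply start the search from several central sites at once. A small illustrative Python sketch (the topology and site names are invented for this example):

```python
from collections import deque

# Edge "distance" as shortest path (in hops) from the central site(s),
# in the spirit of NUMA distance. Topology is an adjacency list.
def edge_distances(topology, centrals):
    """BFS from the central site(s); returns hop distance per reachable site."""
    dist = {c: 0 for c in centrals}
    queue = deque(centrals)
    while queue:
        site = queue.popleft()
        for neighbor in topology.get(site, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[site] + 1
                queue.append(neighbor)
    return dist

# Hypothetical Core/Hub/Edge topology:
topology = {
    "central": ["hub1", "hub2"],
    "hub1": ["central", "edge-a", "edge-b"],
    "hub2": ["central", "edge-c"],
}
d = edge_distances(topology, ["central"])
assert d == {"central": 0, "hub1": 1, "hub2": 1,
             "edge-a": 2, "edge-b": 2, "edge-c": 2}
```

Passing several sites in `centrals` covers the decentralized case: each central Edge gets distance 0 and the rest are measured from the nearest one.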
Re: [openstack-dev] [tripleo][ui][tempest][oooq] Refreshing plugins from git
On 10/18/18 2:17 AM, Honza Pokorny wrote: Hello folks, I'm working on the automated ui testing blueprint[1], and I think we need to change the way we ship our tempest tests. Here is where things stand at the moment:
* We have a kolla image for tempest
* This image contains the tempest rpm, and the openstack-tempest-all rpm
* The openstack-tempest-all package in turn contains all of the openstack tempest plugins
* Each of the plugins is shipped as an rpm
So, in order for a new test in tempest-tripleo-ui to appear in CI, we have to go through at least the following steps:
* New tempest-tripleo-ui rpm
* New openstack-tempest-all rpm
* New tempest kolla image
This could easily take a week, if not more. What I would like to build is something like the following:
* Add an option to the tempest-setup.sh script in tripleo-quickstart to refresh all tempest plugins from git before running any tests
* Optionally specify a zuul change for any of the plugins being refreshed
* Hook up the test job to patches in tripleo-ui (which tests in tempest-tripleo-ui are testing) so that I can run a fix and its test in a single CI job
This would allow the tripleo-ui team to develop code and tests at the same time, and prevent breakage before a patch is even merged. Here are a few questions:
* Do you think this is a good idea?
This reminds me of the update_containers case, relaxed to the next level: updating from sources instead of RPMs. Given that we already have that update_containers thing, the idea seems acceptable for CI use only, although I'd prefer to see the packages and the tempest container (and all that update_containers affects) rebuilt in the same CI job run instead. Though I'm not sure about having different paths for a "new test in tempest-tripleo-ui" getting into the container: executed in CI vs. executed via the TripleO UI? I think the path it takes should always be the same. But please excuse me if I got the case wrong.
[0] https://goo.gl/5bCWRX * Could we accomplish this by some other, simple mechanism? Any helpful suggestions, corrections, and feedback are much appreciated. Thanks Honza Pokorny [1]: https://blueprints.launchpad.net/tripleo/+spec/automated-ui-testing -- Best regards, Bogdan Dobrelya, Irc #bogdando
[openstack-dev] [puppet][tripleo][all] Zuul job backlog
Wesley Hayutin writes: [snip] The TripleO project has created a single node container based composable OpenStack deployment [2]. It is the project's intention to replace most of the TripleO upstream jobs with the Standalone deployment. We would like to reduce our multi-node usage to a total of two or three multinode jobs to handle a basic overcloud deployment, updates and upgrades[a]. Currently in master we are relying on multiple multi-node scenario jobs to test many of the OpenStack services in a single job. Our intention is to move these multinode scenario jobs to single node job(s) that test a smaller subset of services. The goal would be to target the specific areas of the TripleO code base that affect these services and only run those jobs there. This would replace the existing 2-3 hour two node job(s) with single node job(s) for specific services that complete in about half the time. This unfortunately reduces the overall coverage upstream, but still allows us a basic smoke test of the supported OpenStack services and their deployment upstream. Ideally, projects other than TripleO would make use of the Standalone deployment to test their particular service with containers, upgrades, or for various other reasons. Additional projects using this deployment would help ensure bugs are found and resolved quickly, providing additional resilience to the upstream gate jobs. The TripleO team will begin review to scope out and create estimates for the above work starting on October 18, 2018. One should expect to see updates on our progress posted to the list. Below are some details on the proposed changes. Thank you all for your time and patience!
Performance improvements:
* Standalone jobs use half the nodes of multinode jobs
* The standalone job has an average run time of 60-80 minutes, about half the run time of our multinode jobs
Base TripleO Job Definitions (Stein onwards):
Multi-node jobs
* containers-multinode
* containers-multinode-updates
* containers-multinode-upgrades
Single node jobs
* undercloud
* undercloud-upgrade
* standalone
Jobs to be removed (Stein onwards):
Multi-node jobs[b]
* scenario001-multinode
* scenario002-multinode
* scenario003-multinode
* scenario004-multinode
* scenario006-multinode
* scenario007-multinode
* scenario008-multinode
* scenario009-multinode
* scenario010-multinode
* scenario011-multinode
Jobs that may need to be created to cover additional services[4] (Stein onwards):
Single node jobs[c]
* standalone-barbican
* standalone-ceph[d]
* standalone-designate
* standalone-manila
* standalone-octavia
* standalone-openshift
* standalone-sahara
* standalone-telemetry
[1] https://gist.github.com/notmyname/8bf3dbcb7195250eb76f2a1a8996fb00
[2] https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/standalone.html
[3] http://lists.openstack.org/pipermail/openstack-dev/2018-September/134867.html
[4] https://github.com/openstack/tripleo-heat-templates/blob/master/README.rst#service-testing-matrix
I wanted to follow up on that original thread [0] wrt running a default standalone tripleo deployment integration job for openstack-puppet modules, to see if it breaks tripleo. There is a topic [1] to review, please. The issue (IMO) is that the default standalone setup deploys a fixed set of openstack services, some of which are disabled [2] and some of which go by default [3], which may provide either excessive or insufficient coverage (like Ironic) for some of the puppet openstack modules.
My take is that it perhaps only makes sense to deploy that standalone setup for puppet-openstack-integration (and tripleo itself obviously, as that involves a majority of the openstack-puppet modules), but not for each particular puppet-foo module. Why waste CI resources on that default job cloned for all the modules, only to see, for example, the puppet-keystone (and all other modules') standalone jobs failing because of an unrelated puppet-nova libvirt issue [4]? That's pointless and inefficient. And to cover Ironic deployments, we'd have to keep the undercloud job as a separate one, although that probably is acceptable as a first iteration... But ideally I'd like to see that standalone job composable and adapted to only test a deployment of the wanted components for the puppet-foo modules under check/gate. It perhaps also makes sense to disable tempest for the standalone job(s), as it is already covered by neighbouring jobs. [0] https://goo.gl/UFNtcC [1] https://goo.gl/dPkgCH [2] https://goo.gl/eZ1wuC [3] https://goo.gl/H8ZnAJ [4] https://review.openstack.org/609289 -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [TripleO] Plan management refactoring for Life cycle
On 9/11/18 4:43 AM, James Slagle wrote: On Mon, Sep 10, 2018 at 10:12 AM Jiri Tomasek wrote: Hi Mathieu, Thanks for bringing up the topic. There are several efforts currently in progress which should lead to solving the problems you're describing. We are working on introducing CLI commands which would perform the deployment configuration operations on the deployment plan in Swift. This is a main step to finally reach CLI and GUI compatibility/interoperability. The CLI will perform actions to configure the deployment (roles, networks, environments selection, parameter setting, etc.) by calling Mistral workflows which store the information in the deployment plan in Swift. The result is that all the information which defines the deployment is stored in a central place - the deployment plan in Swift - and the deploy command is turned into a simple 'openstack overcloud deploy'. The deployment plan then has plan-environment.yaml, which holds the list of environments used and customized parameter values, roles-data.yaml, which carries the roles definition, and network-data.yaml, which carries the networks definition. The information stored in these files (and the deployment plan in general) can then be treated as the source of information about the deployment. The deployment can then be easily exported and reliably replicated. Here is the document which we put together to identify missing pieces between the GUI, CLI, and Mistral TripleO API. We'll use this to discuss the topic at the PTG this week and define the work needed to achieve complete interoperability. [1] Also there is a pending patch from Steven Hardy which aims to remove CLI-specific environments merging, which should fix the problem with tracking the environments used with a CLI deployment. [2] [1] https://gist.github.com/jtomasek/8c2ae6118be0823784cdafebd9c0edac (Apologies for the inconvenient format, I'll try to update this to a better/editable format.
Original doc: https://docs.google.com/spreadsheets/d/1ERfx2rnPq6VjkJ62JlA_E6jFuHt9vVl3j95dg6-mZBM/edit?usp=sharing) [2] https://review.openstack.org/#/c/448209/ Related to this work, I'd like to see us store the plan in git instead of swift. I think this would reduce some of the complexity around plan management and move us closer to a simpler undercloud architecture. It would be nice to see each change to the plan represented as a new git commit, so we can even see the changes to the plan as roles, networks, services, etc., are selected. I also think git would provide a familiar experience for both developers and operators who are already accustomed to devops-type workflows. I think we could make these changes without impacting the API too much or, hopefully, at all. +42! See also the related RFE (drafted only) [0] [0] https://bugs.launchpad.net/tripleo/+bug/1782139 -- Best regards, Bogdan Dobrelya, Irc #bogdando
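The plan-in-git idea could look roughly like this: every change to the plan files (plan-environment.yaml, roles-data.yaml, network-data.yaml) becomes its own commit, giving history and diffs for free. This is only a sketch of the concept under stated assumptions, not TripleO code; the helper names and the yaml contents are made up, and it assumes a `git` binary is available on the host.

```python
import pathlib
import subprocess
import tempfile

def git(repo, *args):
    """Run a git command in the given repo, with a fixed committer identity."""
    return subprocess.run(
        ["git", "-C", str(repo), "-c", "user.name=tripleo",
         "-c", "user.email=tripleo@example.com", *args],
        check=True, capture_output=True, text=True).stdout

def commit_plan_change(repo, filename, content, message):
    """Write one plan file and record the change as a dedicated commit."""
    (repo / filename).write_text(content)
    git(repo, "add", filename)
    git(repo, "commit", "-m", message)

repo = pathlib.Path(tempfile.mkdtemp())
git(repo, "init")
commit_plan_change(repo, "roles-data.yaml", "- name: Controller\n", "Select roles")
commit_plan_change(repo, "network-data.yaml", "- name: External\n", "Define networks")

# Each plan change is now a separate commit in the history:
log = git(repo, "log", "--oneline")
assert len(log.strip().splitlines()) == 2
```

With something like this, "see the changes to the plan as roles, networks, services are selected" falls out of `git log` and `git diff` rather than needing dedicated plan-management machinery.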
Re: [openstack-dev] [tripleo] quickstart for humans
On 8/31/18 6:03 PM, Raoul Scarazzini wrote: On 8/31/18 12:07 PM, Jiří Stránský wrote: [...] * The "for humans" definition differs significantly based on who you ask. E.g. my intention with [2] was to readily expose *more* knobs and tweaks and be more transparent about the underlying workings of Ansible, because I felt like quickstart.sh hides too much from me. In my opinion [2] is sufficiently "for humans", yet it does pretty much the opposite of what you're looking for. Hey Jiri, I think that "for humans" simply means that you launch the command with just one parameter (i.e. the virthost), and then you have something. yes, this ^^ I'd also add one more thing: if you later remove that something, while having the virthost as your localhost and the non-root user as your currently logged-in user, you remain operational :) Teardown is quite destructive, which is fine for CI but might not be applicable for devboxes running on a laptop. I have a few changes [0] in work addressing that case. [0] https://review.openstack.org/#/q/topic:localcon+(status:open+OR+status:merged) And because of this I think it is just a matter of concentrating the efforts on turning quickstart.sh back to its original scope: making you launch it with just one parameter and have an available environment after a while (OK, sometimes more than a while). Since part of the recent discussions was around the hypothesis of removing it, maybe we can think about making it useful again. Mostly because it is right that the needs of everyone are different, but on the other side, with a solid starting point (the default) you can think about customizing depending on your needs. I'm for recycling what we have; the planet (and I) will enjoy it! My 0,002 cents. -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [TripleO] podman: varlink interface for nice API calls
in similar ways. Ultimately whether you run 'docker run' or 'podman run' you end up with the same thing as far as the existing TripleO architecture goes. Dan You have a tough job. I wish you all the luck in the world in making these decisions and hope politics and internal corporate management decisions play as little a role in them as possible. Best, -jay
Re: [openstack-dev] [tripleo][Edge][FEMDC] Edge clouds and controlplane updates
On 8/13/18 9:47 PM, Giulio Fidente wrote: Hello, I'd like to get some feedback regarding the remaining work for the split controlplane spec implementation [1] Specifically, while for some services like nova-compute it is not necessary to update the controlplane nodes after an edge cloud is deployed, for other services, like cinder (or glance, probably others), it is necessary to update the config files on the controlplane when a new edge cloud is deployed. In fact, for services like cinder or glance, which are hosted in the controlplane, we need to pull data from the edge clouds (for example the newly deployed ceph cluster keyrings and fsid) to configure cinder (or glance) with a new backend. It looks like this demands some architectural changes to solve the following two: - how do we trigger/drive updates of the controlplane nodes after the edge cloud is deployed? Note, there is also a strict(?) requirement of local management capabilities for edge clouds temporarily disconnected from the central controlplane. That complicates the update triggering even more. We'll need at least a notification-and-triggering system to perform the required state synchronizations, including conflict resolution. If that's the case, the architecture changes for the TripleO deployment framework are inevitable AFAICT. - how do we scale the controlplane parameters to accommodate N backends of the same type? A very rough approach to the latter could be to use jinja to scale up the CephClient service so that we can have multiple copies of it in the controlplane. Each instance of CephClient should provide the ceph config file and keyring necessary for each cinder (or glance) backend. Also note that Ceph is only a particular example; we'd need a similar workflow for any backend type. The etherpad for the PTG session [2] touches this, but it'd be good to start this conversation before then. 1. https://specs.openstack.org/openstack/tripleo-specs/specs/rocky/split-controlplane.html 2.
https://etherpad.openstack.org/p/tripleo-ptg-queens-split-controlplane
Re: [openstack-dev] [tripleo] Patches to speed up plan operations
On 8/2/18 1:34 AM, Ian Main wrote: Hey folks! So I've been working on some patches to speed up plan operations in TripleO. This was originally driven by the UI needing to be able to perform a 'plan upload' in something less than several minutes. :) https://review.openstack.org/#/c/581153/ https://review.openstack.org/#/c/581141/ I have a functioning set of patches, and it actually cuts over 2 minutes off the overcloud deployment time.

Without patch:
+ openstack overcloud plan create --templates /home/stack/tripleo-heat-templates/ overcloud
Creating Swift container to store the plan
Creating plan from template files in: /home/stack/tripleo-heat-templates/
Plan created.
real 3m3.415s

With patch:
+ openstack overcloud plan create --templates /home/stack/tripleo-heat-templates/ overcloud
Creating Swift container to store the plan
Creating plan from template files in: /home/stack/tripleo-heat-templates/
Plan created.
real 0m44.694s

This is on VMs. On real hardware it now takes something like 15-20 seconds to do the plan upload, which is much more manageable from the UI standpoint. Some things about what this patch does: - It makes use of process-templates.py (written for the undercloud) to process the jinjafied templates. This reduces duplication with the existing version in the code base and is very fast as it's all done on local disk. Just wanted to say a Special Big Thank You for doing that code consolidation work! - It stores the bulk of the templates as a tarball in swift. Any individual files in swift take precedence over the contents of the tarball, so it should be backwards compatible. This is a great speed-up as we're not accessing a lot of individual files in swift. There's still some work to do: cleaning up and fixing the unit tests, testing upgrades, etc. I just wanted to get some feedback on the general idea and hopefully some reviews and/or help - especially with the unit test stuff. Thanks everyone!
Ian
Re: [openstack-dev] [tripleo] Proposing Lukas Bezdicka core on TripleO
+1 On 8/1/18 1:31 PM, Giulio Fidente wrote: Hi, I would like to propose Lukas Bezdicka core on TripleO. Lukas did a lot of work in our tripleoclient, tripleo-common and tripleo-heat-templates repos to make FFU possible. FFU, which is meant to permit upgrades from Newton to Queens, requires in-depth understanding of many TripleO components (for example Heat, Mistral and the TripleO client) but also of specific TripleO features which were added during the course of the three releases (for example config-download and upgrade tasks). I believe his FFU work to have been very challenging. Given his broad understanding, more recently Lukas started helping with reviews in other areas. I am so sure he'll be a great addition to our group that I am not even looking for comments, just votes :D
Re: [openstack-dev] [tripleo] [tripleo-validations] using top-level fact vars will be deprecated in future Ansible versions
On 7/23/18 9:33 PM, Emilien Macchi wrote: But it seems like, starting with Ansible 2.5 (what we already have in Rocky and beyond), we should encourage the usage of the ansible_facts dictionary. Example: var=hostvars[inventory_hostname].ansible_facts.hostname instead of: var=ansible_hostname If that means rewriting all the ansible_foo references everywhere, the scope of changes would be huge. Those are used literally everywhere. Here is only a search for tripleo-quickstart [0] [0] http://codesearch.openstack.org/?q=%5B%5C.%27%22%5Dansible_%5CS%2B%5B%5E%3A%5D=nope=roles=tripleo-quickstart
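For illustration, here is a minimal playbook fragment contrasting the two styles discussed above (the task names are made up; the variable forms are from the thread plus the short `ansible_facts.<fact>` form Ansible also accepts for the current host):

```yaml
# Deprecated style: top-level injected fact variables.
- name: Print hostname (old style)
  debug:
    msg: "{{ ansible_hostname }}"

# Style encouraged since Ansible 2.5: read from the ansible_facts dict.
- name: Print hostname (new style)
  debug:
    msg: "{{ hostvars[inventory_hostname].ansible_facts.hostname }}"

# Shorter equivalent when addressing the current host:
- name: Print hostname (short form)
  debug:
    msg: "{{ ansible_facts.hostname }}"
```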
Re: [openstack-dev] [tripleo] How to integrate a Heat plugin in a containerized deployment?
On 7/23/18 12:50 PM, Ricardo Noriega De Soto wrote: Hello guys, I need to deploy the following Neutron BGPVPN heat plugin. https://docs.openstack.org/networking-bgpvpn/ocata/heat.html This will allow users to create Heat templates with BGPVPN resources. Right now, the BGPVPN service plugin is only available in the neutron-server-opendaylight Kolla image: https://github.com/openstack/kolla/blob/master/docker/neutron/neutron-server-opendaylight/Dockerfile.j2#L13 It would make sense to add the python-networking-bgpvpn-heat package right there. Is that correct? You can override that via neutron_server_opendaylight_packages_append in tripleo-common, like [0]. [0] http://git.openstack.org/cgit/openstack/tripleo-common/tree/container-images/tripleo_kolla_template_overrides.j2#n76 Heat exposes a parameter to configure plugins (HeatEnginePluginDirs), which corresponds to the plugin_dirs parameter in heat.conf. What is the issue here? Heat will try to search for any available plugin in the path determined by HeatEnginePluginDirs; however, the heat plugin is located in a separate container (neutron_api). How should we tackle this? I see no other example of this type of integration. Here is the most recent example [1] of inter-container state sharing for the Ironic containers. I think something similar should be done for the docker/services/heat* yaml files. [1] https://review.openstack.org/#/c/584265/ AFAIK, /usr/lib/python2.7/site-packages is not exposed to the host as a mounted volume, so how is heat supposed to find the bgpvpn heat plugin? Thanks for your advice.
Cheers -- Ricardo Noriega Senior Software Engineer - NFV Partner Engineer | Office of Technology | Red Hat irc: rnoriega @freenode
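If the package does end up in the image as suggested in the thread, wiring the plugin directory into heat.conf via the HeatEnginePluginDirs parameter could look roughly like this environment fragment (the exact site-packages path inside the image is an assumption and must match where python-networking-bgpvpn-heat actually lands):

```yaml
# Hypothetical TripleO environment file. HeatEnginePluginDirs feeds
# heat.conf's plugin_dirs; the path below is illustrative only.
parameter_defaults:
  HeatEnginePluginDirs:
    - /usr/lib/python2.7/site-packages/networking_bgpvpn_heat
```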
Re: [openstack-dev] [tripleo] New "validation" subcommand for "openstack undercloud"
On 7/16/18 6:32 PM, Dan Prince wrote: On Mon, Jul 16, 2018 at 11:27 AM Cédric Jeanneret wrote: Dear Stackers, In order to let operators properly validate their undercloud node, I propose to create a new subcommand in the "openstack undercloud" "tree": `openstack undercloud validate' This should only run the different validations we have in undercloud_preflight.py¹ That way, an operator will be able to ensure all is valid before starting "for real" any other command like "install" or "upgrade". Of course, this "validate" step is embedded in "install" and "upgrade" already, but having the capability to just validate without any further action is something that can be interesting, for example: - ensure the current undercloud hardware/vm is sufficient for an update - ensure the allocated VM for the undercloud is sufficient for a deploy - and so on There are probably other possibilities, if we extend the "validation" scope outside the "undercloud" (like tripleo, allinone, even overcloud). What do you think? Any pros/cons/thoughts? I think this command could be very useful. I'm assuming the underlying implementation would call a 'heat stack-validate' using an ephemeral heat-all instance. If so, the way we implement it for the undercloud vs. the I think that should be just ansible commands triggered natively via tripleoclient. Why would we validate with heat, deploying throwaway one-time ephemeral stacks (for undercloud/standalone) each time a user runs that heat installer? We had to introduce the virtual stack state tracking system [0] for puppet manifest compatibility's sake only (the manifests sometimes rely on CREATE vs UPDATE states), which added more "ephemeral complexity" to the deployment framework. I'm not following why we would validate ephemeral stacks or use them as an additional moving part. [0] https://review.openstack.org/#/q/topic:bug/1778505+(status:open+OR+status:merged) 'standalone' use case would likely be a bit different.
We can probably subclass the implementations to share common code across the efforts though. For the undercloud you are likely to have a few extra 'local only' validations. Perhaps extra checks for things on the client side. For the all-in-one I had envisioned using the output from the 'heat stack-validate' to create a sample config file for a custom set of services. Similar to how tools like Packstack generate a config file, for example. Dan Cheers, C. ¹ http://git.openstack.org/cgit/openstack/python-tripleoclient/tree/tripleoclient/v1/undercloud_preflight.py -- Cédric Jeanneret Software Engineer DFG:DF
Re: [openstack-dev] [tripleo] Proposing Jose Luis Franco for TripleO core reviewer on Upgrade bits
On 7/20/18 11:07 AM, Carlos Camacho Gonzalez wrote: Hi!!! I'd like to propose Jose Luis Franco [1][2] for core reviewer on all the TripleO upgrades bits. He shows constant and active involvement in improving and fixing our updates/upgrades workflows, and he also helps develop/improve/fix our upstream support for testing the updates/upgrades. Please vote -1/+1, and consider this my +1 vote :) +1! [1]: https://review.openstack.org/#/q/owner:jfrancoa%2540redhat.com [2]: http://stackalytics.com/?release=all=commits_id=jfrancoa Cheers, Carlos.
Re: [openstack-dev] [tripleo] prototype with standalone mode and remote edge compute nodes
On 7/20/18 2:13 AM, Ben Nemec wrote: On 07/19/2018 03:37 PM, Emilien Macchi wrote: Today I played a little bit with Standalone deployment [1] to deploy a single OpenStack cloud without the need of an undercloud and overcloud. The use-case I am testing is the following: "As an operator, I want to deploy a single node OpenStack, that I can extend with remote compute nodes on the edge when needed." We still have a bunch of things to figure out so it works out of the box, but so far I was able to build something that worked, and I found it useful to share early to gather some feedback: https://gitlab.com/emacchi/tripleo-standalone-edge Keep in mind this is a proof of concept, based on upstream documentation and re-using 100% what is in TripleO today. The only thing I'm doing is to change the environment and the roles for the remote compute node. I plan to work on cleaning up the manual steps that I had to do to make it work, like hardcoding some hiera parameters and figuring out how to override ServiceNetmap. Anyway, feel free to test / ask questions / provide feedback. What is the benefit of doing this over just using deployed server to install a remote server from the central management system? You need to have connectivity back to the central location anyway. Won't this become unwieldy with a large number of edge nodes? I thought we told people not to use Packstack for multi-node deployments for exactly that reason. I guess my concern is that eliminating the undercloud makes sense for single-node PoC's and development work, but for what sounds like a production workload I feel like you're cutting off your nose to spite your face. In the interest of saving one VM's worth of resources, now all of your day 2 operations have no built-in orchestration. Every time you want to change a configuration it's "copy new script to system, ssh to system, run script, repeat for all systems." So maybe this is a backdoor way to make Ansible our API?
;-) Ansible may orchestrate that for day 2. Deploying Heat stacks is already made ephemeral for standalone/underclouds, so the only thing you'll need for day 2 is really just ansible. Hence, the need for an undercloud shrinks to having an ansible control node, like your laptop, controlling all clouds via an inventory.
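The "laptop as control node" idea above could be as simple as a static inventory grouping the standalone clouds; a minimal sketch (all hostnames here are made up):

```yaml
# Hypothetical ansible inventory (YAML format) for managing a central
# standalone cloud and its edge sites from a single control node.
all:
  children:
    standalone_clouds:
      hosts:
        central.example.com:
        edge-site-1.example.com:
        edge-site-2.example.com:
```

Day 2 operations then become ordinary ansible runs against that inventory, with no persistent undercloud services required.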
Re: [openstack-dev] [tripleo] Stein blueprint - Plan to remove Keepalived support (replaced by Pacemaker)
I'm all for it! Another benefit is better coverage for the standalone CI job(s), when it will (hopefully) become a mandatory dependency for overcloud multinode jobs. On 7/16/18 12:49 PM, Sergii Golovatiuk wrote: Hi, On Fri, Jul 13, 2018 at 9:11 PM, Juan Antonio Osorio wrote: Sounds good to me. Even if pacemaker is heavier, less options and consistency is better. Greetings from Mexico :D Greetings from Poznań :D On Fri, 13 Jul 2018, 13:33 Emilien Macchi, wrote: Greetings, We have been supporting both Keepalived and Pacemaker to handle VIP management. This is a really good initiative which supports the main idea of 'simplicity'. Keepalived is actually the tool used by the undercloud when SSL is enabled (for SSL termination), while Pacemaker is used on the overcloud to handle VIPs but also services HA. I see some benefits in removing support for keepalived and deploying Pacemaker by default: - pacemaker can be deployed on one node (we actually do it in CI), so it can be deployed on the undercloud to handle VIPs and manage HA as well. Additionally, undercloud services may be made HA across 3 nodes if/when it's really required. - it'll allow extending undercloud & standalone use cases to support multinode one day, with HA and SSL, like we already have on the overcloud. - it removes the complexity of managing two tools, so we'll potentially remove code in TripleO. ++ - of course, pacemaker features from the overcloud would be usable in standalone environments, but also on the undercloud. The same OCF scripts will be used for undercloud and overcloud. There is probably some downside; the first one is that I think Keepalived is much more lightweight than Pacemaker, so we probably need to run some benchmarks here and make sure we don't make the undercloud heavier than it is now. From another perspective, operators need to learn/support two tools.
I went ahead and created this blueprint for Stein: https://blueprints.launchpad.net/tripleo/+spec/undercloud-pacemaker-default I also plan to prototype some basic code soon and provide an upgrade path if we accept this blueprint. I would like to participate in this initiative as I found it very valuable. This is something I would like to discuss here and at the PTG, feel free to bring questions/concerns, Thanks! -- Emilien Macchi
Re: [openstack-dev] [kolla][nova][tripleo] Safe guest shutdowns with kolla?
[Added tripleo] It would be nice to have this situation verified/improved for containerized libvirt on compute nodes deployed with TripleO as well. On 7/12/18 11:02 PM, Clint Byrum wrote: Greetings! We've been deploying with Kolla on CentOS 7 now for a while, and we've recently noticed a rather troubling behavior when we shut down hypervisors. Somewhere between systemd and libvirt's systemd-machined integration, we see that guests get killed aggressively by SIGTERM'ing all of the qemu-kvm processes. This seems to happen because they are scoped into machine.slice, but systemd-machined is killed, which drops those scopes and thus results in killing off the machines. So far we have observed something similar [0], but between systemd and containers managed by the docker daemon (dockerd). [0] https://bugs.launchpad.net/tripleo/+bug/1778913 In the past, we've used the libvirt-guests service when our libvirt was running outside of containers. This worked splendidly, as we could have it wait 5 minutes for VMs to attempt a graceful shutdown, avoiding interrupting any running processes. But this service isn't available on the host OS, as it won't be able to talk to libvirt inside the container. The solution I've come up with for now is this:

[Unit]
Description=Manage libvirt guests in kolla safely
After=docker.service systemd-machined.service
Requires=docker.service

[Install]
WantedBy=sysinit.target

[Service]
Type=oneshot
RemainAfterExit=yes
TimeoutStopSec=400
ExecStart=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh start
ExecStart=/usr/bin/docker start nova_compute
ExecStop=/usr/bin/docker stop nova_compute
ExecStop=/usr/bin/docker exec nova_libvirt /usr/libexec/libvirt-guests.sh shutdown

This doesn't seem to work, though I'm still trying to work out the ordering and such. It should ensure that before we stop systemd-machined and destroy all of its scopes (thus killing all the VMs), we run the libvirt-guests.sh script to try and shut them down.
The TimeoutStopSec=400 is because the script itself waits 300 seconds for any VM that refuses to shut down cleanly, so this gives it a chance to wait for at least one of those. This is an imperfect solution but it allows us to move forward after having made a reasonable attempt at clean shutdowns. Anyway, just wondering if anybody else using kolla-ansible or kolla containers in general has run into this problem, and whether or not there are better/known solutions. As I noted above, I think the issue may be valid for TripleO as well. Thanks!
Re: [openstack-dev] [tripleo] Rocky blueprints
On 7/11/18 7:39 PM, Alex Schultz wrote: Hello everyone, As milestone 3 is quickly approaching, it's time to review the open blueprints[0] and their status. It appears that we have made good progress on implementing significant functionality this cycle but we still have some open items. Below is the list of blueprints that are still open, so we'll want to see if they will make M3; if not, we'd like to move them out to Stein, and they won't make Rocky without an FFE. Currently not marked implemented but without any open patches (likely implemented): - https://blueprints.launchpad.net/tripleo/+spec/major-upgrade-workflow - https://blueprints.launchpad.net/tripleo/+spec/tripleo-predictable-ctlplane-ips Currently open with pending patches (may need FFE): - https://blueprints.launchpad.net/tripleo/+spec/config-download-ui - https://blueprints.launchpad.net/tripleo/+spec/container-prepare-workflow - https://blueprints.launchpad.net/tripleo/+spec/containerized-undercloud This needs an FFE please. The remaining work [0] is mostly cosmetic (defaults switching), though it's somewhat blocked on CI infrastructure readiness [1] for containerized undercloud and overcloud deployments. The situation has been drastically improved by the recent changes though, like longer container image caching, enabling ansible pipelining, using shared local container registries for undercloud and overcloud deployments, and maybe more I'm missing. There is also ongoing work to mitigate the CI walltime [2].
[0] http://lists.openstack.org/pipermail/openstack-dev/2018-July/132126.html [1] https://trello.com/c/1yDVHmqm/115-switch-remaining-ci-jobs [2] https://trello.com/c/PpNtarue/126-ci-break-the-openstack-infra-3h-timeout-wall - https://blueprints.launchpad.net/tripleo/+spec/bluestore - https://blueprints.launchpad.net/tripleo/+spec/gui-node-discovery-by-range - https://blueprints.launchpad.net/tripleo/+spec/multiarch-support - https://blueprints.launchpad.net/tripleo/+spec/tripleo-routed-networks-templates - https://blueprints.launchpad.net/tripleo/+spec/sriov-vfs-as-network-interface - https://blueprints.launchpad.net/tripleo/+spec/custom-validations Currently open without work (should be moved to Stein): - https://blueprints.launchpad.net/tripleo/+spec/automated-ui-testing - https://blueprints.launchpad.net/tripleo/+spec/plan-from-git-in-gui - https://blueprints.launchpad.net/tripleo/+spec/tripleo-ui-react-walkthrough - https://blueprints.launchpad.net/tripleo/+spec/wrapping-workflow-for-node-operations - https://blueprints.launchpad.net/tripleo/+spec/ironic-overcloud-ci Please take some time to review this list and update it. If you think you are close to finishing out the feature and would like to request an FFE please start getting that together with appropriate details and justifications for the FFE. Thanks, -Alex [0] https://blueprints.launchpad.net/tripleo/rocky
Re: [openstack-dev] [TripleO] easily identifying how services are configured
On 7/6/18 7:02 PM, Ben Nemec wrote: On 07/05/2018 01:23 PM, Dan Prince wrote: On Thu, 2018-07-05 at 14:13 -0400, James Slagle wrote: I would almost rather see us organize the directories by service name/project instead of implementation. Instead of: puppet/services/nova-api.yaml puppet/services/nova-conductor.yaml docker/services/nova-api.yaml docker/services/nova-conductor.yaml We'd have: services/nova/nova-api-puppet.yaml services/nova/nova-conductor-puppet.yaml services/nova/nova-api-docker.yaml services/nova/nova-conductor-docker.yaml (or perhaps even another level of directories to indicate puppet/docker/ansible?) I'd be open to this but doing changes on this scale is a much larger developer and user impact than what I was thinking we would be willing to entertain for the issue that caused me to bring this up (i.e. how to identify services which get configured by Ansible). Its also worth noting that many projects keep these sorts of things in different repos too. Like Kolla fully separates kolla-ansible and kolla-kubernetes as they are quite divergent. We have been able to preserve some of our common service architectures but as things move towards kubernetes we may which to change things structurally a bit too. True, but the current directory layout was from back when we intended to support multiple deployment tools in parallel (originally tripleo-image-elements and puppet). Since I think it has become clear that it's impractical to maintain two different technologies to do essentially the same thing I'm not sure there's a need for it now. It's also worth noting that kolla-kubernetes basically died because there wasn't enough people to maintain both deployment methods, so we're not the only ones who have found that to be true. 
If/when we move to kubernetes I would anticipate it going like the initial containers work did - development for a couple of cycles, then a switch to the new thing and deprecation of the old thing, then removal of support for the old thing. That being said, because the service yamls are essentially an API for TripleO (they're referenced in user resource registries), I'm not sure it's worth the churn to move everything either. I think that's going to be an issue either way though, it's just a question of the scope. _Something_ is going to move around no matter how we reorganize so it's a problem that needs to be addressed anyway. [tl;dr] I can foresee reorganizing that API becoming a nightmare for maintainers doing backports for queens (and the LTS downstream release based on it). Now imagine kubernetes support comes within the next few years, before we can let the old API just go... I have an example [0] to share of all the pain brought by a simple move of 'API defaults' from environments/services-docker to environments/services plus environments/services-baremetal. Each time a file changed contents at its old location, like here [1], I had to run a lot of sanity checks to rebase it properly. Like checking that the updated paths in resource registries are still valid or had been moved as well, then picking the source of truth for diverged old vs. changed locations - all that to lose nothing important in the process. So please let's *not* change services' paths/namespaces in the t-h-t "API" without a real need, until there are no alternatives left.
[0] https://review.openstack.org/#/q/topic:containers-default-stable/queens [1] https://review.openstack.org/#/c/567810 -Ben __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] tripleo gate is blocked - please read
On 6/14/18 3:50 AM, Emilien Macchi wrote: TL;DR: gate queue was 25h+, we put all patches from gate on standby, do not restore/recheck until further announcement. We recently enabled the containerized undercloud for multinode jobs and we believe this was a bit premature, as the container download process wasn't optimized yet, so it pulls the same containers from the mirrors multiple times. It caused the job runtime to increase and probably caused the docker.io mirrors hosted by OpenStack Infra to be a bit slower, having to provide the same containers multiple times. The time taken to prepare containers on the undercloud and then for the overcloud caused the jobs to randomly time out, and therefore the gate to fail a high amount of the time, so we decided to remove all jobs from the gate by abandoning the patches temporarily (I have them in my browser and will restore when things are stable again, please do not touch anything). Steve Baker has been working on a series of patches that optimize the way we prepare the containers, but basically the workflow will be: - pull containers needed for the undercloud into a local registry, using infra mirror if available - deploy the containerized undercloud - pull containers needed for the overcloud minus the ones already pulled for the undercloud, using infra mirror if available - update containers on the overcloud - deploy the containerized overcloud Let me also note that it may be time to introduce jobs dependencies [0]. Dependencies might somewhat alleviate registries/mirrors DoS issues, like the one we have currently, by running jobs in batches and not firing them all off at once. We still have options to think of. The undercloud deployment takes longer than standalone, but provides better coverage and therefore better extrapolates (and cuts off) future overcloud failures for the dependent jobs. Standalone is less stable as of yet, though.
The containers update check may also be an option for step 1 or step 2, before the remaining multinode jobs execute. Skipping the dependent jobs, in turn, reduces the DoS effects on registries and mirrors. [0] https://review.openstack.org/#/q/status:open+project:openstack-infra/tripleo-ci+topic:ci_pipelines With that process, we hope to reduce the runtime of the deployment and therefore reduce the timeouts in the gate. To enable it, we need to land, in that order: https://review.openstack.org/#/c/571613/, https://review.openstack.org/#/c/574485/, https://review.openstack.org/#/c/571631/ and https://review.openstack.org/#/c/568403. In the meantime, we are disabling the containerized undercloud recently enabled on all scenarios: https://review.openstack.org/#/c/575264/ as mitigation, with the hope of stabilizing things until Steve's patches land. Hopefully, we can merge Steve's work tonight/tomorrow and re-enable the containerized undercloud on scenarios after checking that we don't have timeouts and have reasonable deployment runtimes. That's the plan we came up with, if you have any question / feedback please share it. -- Emilien, Steve and Wes -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] Proposing Alan Bishop tripleo core on storage bits
On 6/13/18 6:50 PM, Emilien Macchi wrote: Alan Bishop has been highly involved in the Storage backends integration in TripleO and Puppet modules, always here to update with new features, fix (nasty and untestable third-party backend) bugs and manage all the backports for stable releases: https://review.openstack.org/#/q/owner:%22Alan+Bishop+%253Cabishop%2540redhat.com%253E%22 He's also very knowledgeable about how TripleO works and how containers are integrated. I would like to propose him as core on TripleO projects for patches related to storage things (Cinder, Glance, Swift, Manila, and backends). Please vote -1/+1, +1 Thanks! -- Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
The proposed undercloud installation jobs dependency [0] worked, see the jobs' start times at [1] and [2], which confirm that. The resulting delay for the full pipeline is ~80 minutes, as expected. So PTAL folks, I propose to try it out in real gating and see how the tripleo zuul queue gets relieved. The remaining patch [3] adding a dependency on tox/linting didn't work, I'll need some help please to figure out why. Thank you Tristan and James and y'all folks for helping! [0] https://review.openstack.org/#/c/568536/ [1] http://logs.openstack.org/36/568536/6/check/tripleo-ci-centos-7-undercloud-containers/cfebec0/ara-report/ [2] http://logs.openstack.org/36/568536/6/check/tripleo-ci-centos-7-containers-multinode/1a211bb/ara-report/ [3] https://review.openstack.org/#/c/568543/ Perhaps this has something to do with jobs evaluation order, it may be worth trying to add the dependencies list in the project-templates, like it is done here for example: http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n9799 It's also easier to read dependencies from the pipelines definition imo. -Tristan -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/28/18 11:43 AM, Bogdan Dobrelya wrote: On 5/25/18 6:40 PM, Tristan Cacqueray wrote: Hello Bogdan, Perhaps this has something to do with jobs evaluation order, it may be worth trying to add the dependencies list in the project-templates, like it is done here for example: http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n9799 It's also easier to read dependencies from the pipelines definition imo. Thank you! It seems in most places tripleo uses pre-defined templates, see [0]. And templates can not import dependencies [1] :( Here is a zuul story for that [2] [2] https://storyboard.openstack.org/#!/story/2002113 [0] http://codesearch.openstack.org/?q=-%20project%3A=nope==tripleo-ci,tripleo-common,tripleo-common-tempest-plugin,tripleo-docs,tripleo-ha-utils,tripleo-heat-templates,tripleo-image-elements,tripleo-ipsec,tripleo-puppet-elements,tripleo-quickstart,tripleo-quickstart-extras,tripleo-repos,tripleo-specs,tripleo-ui,tripleo-upgrade,tripleo-validations [1] https://review.openstack.org/#/c/568536/4 -Tristan On May 25, 2018 12:45 pm, Bogdan Dobrelya wrote: Job dependencies seem ignored by zuul, see jobs [0],[1],[2] started simultaneously, while I expected them to run one by one. According to the patch 568536 [3], [1] is a dependency for [2] and [3]. The same can be observed for the remaining patches in the topic [4]. Is that a bug, or did I misunderstand what zuul job dependencies actually do?
[0] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-undercloud-containers/731183a/ara-report/ [1] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-3nodes-multinode/a1353ed/ara-report/ [2] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-containers-multinode/9777136/ara-report/ [3] https://review.openstack.org/#/c/568536/ [4] https://review.openstack.org/#/q/topic:ci_pipelines+(status:open+OR+status:merged) -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/25/18 6:40 PM, Tristan Cacqueray wrote: Hello Bogdan, Perhaps this has something to do with jobs evaluation order, it may be worth trying to add the dependencies list in the project-templates, like it is done here for example: http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul.d/projects.yaml#n9799 It's also easier to read dependencies from the pipelines definition imo. Thank you! It seems in most places tripleo uses pre-defined templates, see [0]. And templates can not import dependencies [1] :( [0] http://codesearch.openstack.org/?q=-%20project%3A=nope==tripleo-ci,tripleo-common,tripleo-common-tempest-plugin,tripleo-docs,tripleo-ha-utils,tripleo-heat-templates,tripleo-image-elements,tripleo-ipsec,tripleo-puppet-elements,tripleo-quickstart,tripleo-quickstart-extras,tripleo-repos,tripleo-specs,tripleo-ui,tripleo-upgrade,tripleo-validations [1] https://review.openstack.org/#/c/568536/4 -Tristan On May 25, 2018 12:45 pm, Bogdan Dobrelya wrote: Job dependencies seem ignored by zuul, see jobs [0],[1],[2] started simultaneously, while I expected them to run one by one. According to the patch 568536 [3], [1] is a dependency for [2] and [3]. The same can be observed for the remaining patches in the topic [4]. Is that a bug, or did I misunderstand what zuul job dependencies actually do?
[0] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-undercloud-containers/731183a/ara-report/ [1] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-3nodes-multinode/a1353ed/ara-report/ [2] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-containers-multinode/9777136/ara-report/ [3] https://review.openstack.org/#/c/568536/ [4] https://review.openstack.org/#/q/topic:ci_pipelines+(status:open+OR+status:merged) -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
Job dependencies seem ignored by zuul, see jobs [0],[1],[2] started simultaneously, while I expected them to run one by one. According to the patch 568536 [3], [1] is a dependency for [2] and [3]. The same can be observed for the remaining patches in the topic [4]. Is that a bug, or did I misunderstand what zuul job dependencies actually do? [0] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-undercloud-containers/731183a/ara-report/ [1] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-3nodes-multinode/a1353ed/ara-report/ [2] http://logs.openstack.org/36/568536/2/check/tripleo-ci-centos-7-containers-multinode/9777136/ara-report/ [3] https://review.openstack.org/#/c/568536/ [4] https://review.openstack.org/#/q/topic:ci_pipelines+(status:open+OR+status:merged) On 5/15/18 11:39 AM, Bogdan Dobrelya wrote: Added a few more patches [0], [1] by the discussion results. PTAL folks. Wrt the remaining ones in the topic, I'd propose to give it a try and revert it if it proves to do more harm than good. Thank you for feedback! The next step could be reusing artifacts, like DLRN repos and containers built for patches and hosted undercloud, in the consequent pipelined jobs. But I'm not sure how to even approach that. [0] https://review.openstack.org/#/c/568536/ [1] https://review.openstack.org/#/c/568543/ On 5/15/18 10:54 AM, Bogdan Dobrelya wrote: On 5/14/18 10:06 PM, Alex Schultz wrote: On Mon, May 14, 2018 at 10:15 AM, Bogdan Dobrelya <bdobr...@redhat.com> wrote: An update for your review please folks Bogdan Dobrelya writes: Hello. As Zuul documentation [0] explains, the names "check", "gate", and "post" may be altered for more advanced pipelines. Is it doable to introduce, for particular openstack projects, multiple check stages/steps as check-1, check-2 and so on? And is it possible to make the consequent steps reuse environments that the previous steps finished with?
Narrowing down to tripleo CI scope, the problem I'd want us to solve with this "virtual RFE", using such multi-staged check pipelines, is reducing (ideally, de-duplicating) some of the common steps for existing CI jobs. What you're describing sounds more like a job graph within a pipeline. See: https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies for how to configure a job to run only after another job has completed. There is also a facility to pass data between such jobs. ... (skipped) ... Creating a job graph to have one job use the results of the previous job can make sense in a lot of cases. It doesn't always save *time* however. It's worth noting that in OpenStack's Zuul, we have made an explicit choice not to have long-running integration jobs depend on shorter pep8 or tox jobs, and that's because we value developer time more than CPU time. We would rather run all of the tests and return all of the results so a developer can fix all of the errors as quickly as possible, rather than forcing an iterative workflow where they have to fix all the whitespace issues before the CI system will tell them which actual tests broke. -Jim I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for undercloud deployments vs upgrades testing (and some more). Given that those undercloud jobs have not-so-high fail rates though, I think Emilien is right in his comments and those would buy us nothing. On the other hand, what do you think folks of making the tripleo-ci-centos-7-3nodes-multinode depend on tripleo-ci-centos-7-containers-multinode [2]? The former seems quite failure-prone and long running, and is non-voting. It deploys (see featureset configs [3]) 3 nodes in HA fashion. And it almost never passes when containers-multinode fails - see the CI stats page [4]. I've found only 2 cases there of the opposite situation, when containers-multinode fails but 3nodes-multinode passes.
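For reference, such a dependency is a small Zuul v3 config change. A rough sketch of what the proposal amounts to, assuming the jobs are listed in the project's own check pipeline stanza (the actual tripleo-ci layout may differ, so this is illustrative, not a copy of the real config):

```yaml
# Illustrative sketch: run the long, failure-prone 3nodes job only
# after the containers-multinode job has succeeded, so a broken
# multinode deployment no longer burns node time on a job that is
# highly likely doomed to fail anyway.
- project:
    check:
      jobs:
        - tripleo-ci-centos-7-containers-multinode
        - tripleo-ci-centos-7-3nodes-multinode:
            dependencies:
              - tripleo-ci-centos-7-containers-multinode
```

With this shape, the 3nodes job is simply skipped whenever its dependency fails, which is exactly the node-time saving discussed above.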
So cutting off those future failures via the added dependency *would* buy us something and allow other jobs to wait less to commence, at the reasonable price of a somewhat extended main zuul pipeline time. I think it makes sense, and that extended CI time will not exceed the RDO CI execution times so much as to become a problem. WDYT? I'm not sure it makes sense to add a dependency on other deployment tests. It's going to add additional time to the CI run because the upgrade won't start until well over an hour after the rest of the The things are not so simple. There is also a significant wait-in-queue delay before jobs start. And that probably takes even longer than the time to execute the jobs. And that delay is a function of available HW resources and zuul queue length. And the proposed change affects those parameters as well, assuming jobs with failed dependencies won't run at all. So we could expect longer execution times compensated with shorter wait times! I'm not sure how to
Re: [openstack-dev] [tripleo][ci][infra] Quickstart Branching
ards patch you listed, we would have to backport this change to *every* > branch, and it wouldn't really help to avoid the issue. The source of > the problem is not the branchless repo here. > No we shouldn't be backporting every change. The logic in oooq-extras should be version specific and if we're changing an interface in tripleo in a breaking fashion we're doing it wrong in tripleo. If we're backporting things to work around tripleo issues, we're doing it wrong in quickstart. > Regarding catching such issues and Bogdan's point, that's right, we added a > few jobs to catch such issues in the future and prevent breakages, and a few > running jobs is a reasonable price to keep configuration working in all > branches. Comparing to the maintenance nightmare with branches of CI code, it's > really a *zero* price. > Nothing is free. If there's a high maintenance cost, we haven't properly identified the optimal way to separate functionality between tripleo/quickstart. I have repeatedly said that the provisioning parts of quickstart should be separate because those aren't tied to a tripleo version and this along with the scenario configs should be the only unbranched repo we have. Any roles related to how to configure/work with tripleo should be branched and tied to a stable branch of tripleo. This would actually be beneficial for tripleo as well because then we can see when we are introducing backwards incompatible changes. Thanks, -Alex > Thanks > > > On Wed, May 23, 2018 at 3:43 PM, Sergii Golovatiuk <sgolo...@redhat.com> > wrote: >> >> Hi, >> >> Looking at [1], I am thinking about the price we paid for not >> branching tripleo-quickstart. Can we discuss the options to prevent >> the issues such as [1]? Thank you in advance.
>> >> [1] https://review.openstack.org/#/c/569830/4 >> >> -- >> Best Regards, >> Sergii Golovatiuk > -- > Best regards > Sagi Shnaidman -- Best regards Sagi Shnaidman -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo][ci][infra] Quickstart Branching
On 5/23/18 2:43 PM, Sergii Golovatiuk wrote: Hi, Looking at [1], I am thinking about the price we paid for not branching tripleo-quickstart. Can we discuss the options to prevent issues such as [1]? Thank you in advance. [1] https://review.openstack.org/#/c/569830/4 That was only half of the full price, actually; see also the additional multinode containers check/gate jobs [0],[1], from now on executed against the master branches of all tripleo repos (IIUC), for releases -2 and -1 from master. [0] https://review.openstack.org/#/c/569932/ [1] https://review.openstack.org/#/c/569854/ -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] [barbican] [tc] key store in base services
On 5/17/18 9:58 AM, Thierry Carrez wrote: Jeremy Stanley wrote: [...] As a community, we're likely to continue to make imbalanced trade-offs against relevant security features if we don't move forward and declare that some sort of standardized key storage solution is a fundamental component on which OpenStack services can rely. Being able to just assume that you can encrypt volumes in Swift, even as a means to further secure a TripleO undercloud, would be a step in the right direction for security-minded deployments. Unfortunately, I'm unable to find any follow-up summary on the mailing list from the aforementioned session, but the recollection from those who were present (I had a schedule conflict at that time) was that a Castellan-compatible key store would at least be a candidate for inclusion in our base services list: https://governance.openstack.org/tc/reference/base-services.html Yes, last time this was discussed, there was lazy consensus that adding "a Castellan-compatible secret store" would be a good addition to the base services list if we wanted to avoid proliferation of half-baked keystore implementations in various components. The two blockers were: 1/ castellan had to be made less Barbican-specific, offer at least one other secrets store (Vault), and move under Oslo (done) Back to the subject of tripleo underclouds running Barbican: using Vault as a backend may be a good option, given that openshift supports [0] it as well for storing k8s secrets, that kubespray does [1] for vanilla k8s deployments, and that we have an openshift/k8s-based control plane for openstack on the integration roadmap. So we'll very likely end up running Barbican/Vault on the undercloud anyway. [0] https://blog.openshift.com/managing-secrets-openshift-vault-integration/ [1] https://github.com/kubernetes-incubator/kubespray/blob/master/docs/vault.md 2/ some projects (was it Designate ? Octavia ?)
were relying on advanced functions of Barbican not generally found in other secrets stores, like certificate generation, and so would prefer to depend on Barbican itself, which confuses the messaging around the base service addition a bit ("any Castellan-supported secret store as long as it's Barbican") -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/16/18 2:17 PM, Jeremy Stanley wrote: On 2018-05-16 11:31:30 +0200 (+0200), Bogdan Dobrelya wrote: [...] I'm pretty sure though that with broader containers adoption, openstack infra will catch up eventually, so we all could benefit in our upstream CI jobs from affinity-based and co-located data available around for consequent build steps. I still don't see what it has to do with containers. We've known My understanding, I may be totally wrong, is that unlike packages and repos (do not count OSTree [0]), containers use layers and can be exported into tarballs with built-in de-duplication. This makes the idea of tossing those tarballs around much more attractive than doing something similar with package repositories. Of course container images can be pre-built into nodepool images, just like packages, so CI users can rebuild on top with fewer changes brought into new layers, which is another nice-to-have option by the way. [0] https://rpm-ostree.readthedocs.io/en/latest/ these were potentially useful features long before container-oriented projects came into the picture. We simply focused on implementing other, even more generally-applicable features first. Right, I think this only confirms that it *does* have something to do with containers, and priorities for containerized use cases will follow containers adoption trends - if everyone one day suddenly asks for nodepool images containing the latest kolla containers injected, for example. -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/15/18 10:31 PM, Wesley Hayutin wrote: On Tue, May 15, 2018 at 1:29 PM James E. Blair <cor...@inaugust.com> wrote: Jeremy Stanley <fu...@yuggoth.org> writes: > On 2018-05-15 09:40:28 -0700 (-0700), James E. Blair wrote: > [...] >> We're also talking about making a new kind of job which can continue to >> run after it's "finished" so that you could use it to do something like >> host a container registry that's used by other jobs running on the >> change. We don't have that feature yet, but if we did, would you prefer >> to use that instead of the intermediate swift storage? > > If the subsequent jobs depending on that one get nodes allocated > from the same provider, that could solve a lot of the potential > network performance risks as well. That's... tricky. We're *also* looking at affinity for buildsets, and I'm optimistic we'll end up with something there eventually, but that's likely to be a more substantive change and probably won't happen as soon. I do agree it will be nice, especially for use cases like this. -Jim There is a lot here to unpack and discuss, but I really like the ideas I'm seeing. Nice work Bogdan! I've added it to the tripleo meeting agenda for next week so we can continue socializing the idea and get feedback. Thanks! https://etherpad.openstack.org/p/tripleo-meeting-items Thank you for the feedback, folks. There are a lot of technical caveats, right. I'm pretty sure though that with broader containers adoption, openstack infra will catch up eventually, so we all could benefit in our upstream CI jobs from affinity-based and co-located data available around for consequent build steps.
-- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/15/18 5:08 PM, Sagi Shnaidman wrote: Bogdan, I think before final decisions we need to know exactly what price we'd need to pay. Without exact numbers it will be difficult to discuss. If we need to wait 80 mins for the undercloud-containers job to finish before starting all other jobs, it will be about 4.5 hours to wait for a result (+ 4.5 hours in gate), which is too big a price imho and isn't worth the effort. What are the exact numbers we are talking about? I fully agree, but can't get those numbers, sorry! As I noted above, those are definitely sitting in openstack-infra's elastic search DB, they just need to get extracted with some assistance from folks who know more about that! Thanks On Tue, May 15, 2018 at 3:07 PM, Bogdan Dobrelya <bdobr...@redhat.com> wrote: Let me clarify the problem I want to solve with pipelines. It is getting *hard* to develop things and move patches to the Happy End (merged): - Patches wait too long for CI jobs to start. It should be minutes and not hours of waiting. - If a patch fails a job w/o a good reason, the consequent recheck operation repeats all that waiting over again. How may pipelines help solve it? Pipelines only alleviate, not solve, the problem of waiting. We only want to build pipelines for the main zuul check process, omitting gating and RDO CI (for now). There are two cases to consider: - A patch succeeds all checks - A patch fails a check with dependencies The latter case benefits us the most, when pipelines are designed like it is proposed here, so that any jobs expected to fail when a dependency fails will be omitted from execution. This saves a lot of HW resources and zuul queue slots, making them available for other patches and allowing those to have CI jobs started faster (less waiting!). When we have "recheck storms", like because of some known intermittent side issue, that outcome is multiplied by the recheck storm um...
level, and delivers even better and absolutely amazing results :) The zuul queue will not be growing insanely, getting overwhelmed by multiple clones of rechecked jobs highly likely doomed to fail, and blocking other patches that might have a chance to pass checks, being unaffected by that intermittent issue. And for the first case, when a patch succeeds, it takes some extended time, and that is the price to pay. How much time it takes to finish in a pipeline fully depends on the implementation. The effectiveness could only be measured with numbers extracted from elastic search data, like average time to wait for a job to start, success vs fail execution time percentiles for a job, average amount of rechecks, recheck storms history et al. I don't have that data and don't know how to get it. Any help with that is very appreciated and could really help to move the proposed patches forward or decline them. And we could then compare "before" and "after" as well. I hope that explains the problem scope and the methodology to address it. On 5/14/18 6:15 PM, Bogdan Dobrelya wrote: An update for your review please folks Bogdan Dobrelya <bdobr...@redhat.com> writes: Hello. As Zuul documentation [0] explains, the names "check", "gate", and "post" may be altered for more advanced pipelines. Is it doable to introduce, for particular openstack projects, multiple check stages/steps as check-1, check-2 and so on? And is it possible to make the consequent steps reuse environments that the previous steps finished with? Narrowing down to tripleo CI scope, the problem I'd want us to solve with this "virtual RFE", using such multi-staged check pipelines, is reducing (ideally, de-duplicating) some of the common steps for existing CI jobs. What you're describing sounds more like a job graph within a pipeline.
See: https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies for how to configure a job to run only after another job has completed. There is also a facility to pass data between such jobs. ... (skipped) ... Creating a job graph to have one job use the results of the previous job can make sense in a lot of cases. It doesn't always save *time* however. It's worth noting that
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/15/18 4:30 PM, James E. Blair wrote: Bogdan Dobrelya <bdobr...@redhat.com> writes: Added a few more patches [0], [1] per the discussion results. PTAL folks. Wrt the remaining patches in the topic, I'd propose to give them a try and revert them if they prove to do more harm than good. Thank you for the feedback! The next step could be reusing artifacts, like DLRN repos and containers built for patches and hosted undercloud, in the subsequent pipelined jobs. But I'm not sure how to even approach that. [0] https://review.openstack.org/#/c/568536/ [1] https://review.openstack.org/#/c/568543/ In order to use an artifact in a dependent job, you need to store it somewhere and retrieve it. In the parent job, I'd recommend storing the artifact on the log server (in an "artifacts/" directory) next to the job's logs. The log server is essentially a time-limited artifact repository keyed on the zuul build UUID. Pass the URL to the child job using the zuul_return Ansible module. Have the child job fetch it from the log server using the URL it gets. However, don't do that if the artifacts are very large -- more than a few MB -- we'll end up running out of space quickly. In that case, please volunteer some time to help the infra team set up a swift container to store these artifacts. We don't need to *run* swift -- we have clouds with swift already. We just need some help setting up accounts, secrets, and Ansible roles to use it from Zuul. Thank you, that's a good proposal! 
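A rough sketch of the hand-off Jim describes, as it could look in job playbooks. The variable name `all_tar_url` and the log-server URL are illustrative only, not from this thread; the relevant mechanism is that data returned via `zuul_return` becomes available as Ansible variables in dependent jobs:

```yaml
# Parent job: a post-run playbook publishes where the artifact was stored.
- hosts: localhost
  tasks:
    - name: Pass the artifact URL to dependent (child) jobs
      zuul_return:
        data:
          all_tar_url: "http://logs.openstack.org/{{ zuul.build }}/artifacts/all.tar.gz"

# Child job: the run playbook fetches the artifact via the inherited variable.
- hosts: all
  tasks:
    - name: Fetch the parent job's artifact from the log server
      get_url:
        url: "{{ all_tar_url }}"
        dest: /tmp/all.tar.gz
```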
So when we have done that upstream infra swift setup for tripleo, the 1st step in the job dependency graph may be using quickstart to do something like: * check out testing depends-on things, * build repos and all tripleo docker images from these repos, * upload into a swift container, with an automatic expiration set, the de-duplicated and compressed tarball created with something like: # docker save $(docker images -q) | gzip -1 > all.tar.gz (I expect it will be something like a 2G file) * something similar for DLRN repos probably, I'm not an expert for this part. Then those stored artifacts would be picked up by the next step in the graph, deploying undercloud and overcloud in a single step, like: * fetch the swift containers with repos and container images * docker load -i all.tar.gz * populate images into a local registry, as usual * something similar for the repos. Includes an offline yum update (we already have a compressed repo, right? profit!) * deploy UC * deploy OC, if a job wants it And if OC deployment is brought into a separate step, we do not need local registries, just 'docker load -i all.tar.gz' issued for overcloud nodes should replace image prep workflows and registries, AFAICT. Not sure about the repos for that case. I wish to assist with the upstream infra swift setup for tripleo, and that plan, just need a blessing and more hands from tripleo CI squad ;) -Jim __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Best regards, Bogdan Dobrelya, Irc #bogdando
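A minimal, self-contained sketch of that save/compress/load round trip. Since no docker daemon is assumed here, a tar stream of a scratch directory stands in for `docker save` output; on real nodes the stand-in lines become `docker save ... | gzip -1` and `docker load -i` (docker load accepts a gzip-compressed tarball directly):

```shell
set -eu
workdir=$(mktemp -d)
mkdir -p "$workdir/layers"
echo "pretend-image-layer" > "$workdir/layers/layer0"

# Stand-in for: docker save $(docker images -q) | gzip -1 > all.tar.gz
tar -C "$workdir" -cf - layers | gzip -1 > "$workdir/all.tar.gz"

# Stand-in for: docker load -i all.tar.gz on a consumer node
mkdir "$workdir/restored"
gunzip -c "$workdir/all.tar.gz" | tar -C "$workdir/restored" -xf -

cat "$workdir/restored/layers/layer0"   # prints: pretend-image-layer
```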
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/15/18 2:30 PM, Jeremy Stanley wrote: On 2018-05-15 14:07:56 +0200 (+0200), Bogdan Dobrelya wrote: [...] How may pipelines help solve it? Pipelines only alleviate, not solve, the problem of waiting. We only want to build pipelines for the main zuul check process, omitting gating and RDO CI (for now). There are two cases to consider: - A patch succeeds all checks - A patch fails a check with dependencies The latter case benefits us the most, when pipelines are designed as proposed here, so that any jobs expected to fail when a dependency fails will be omitted from execution. [...] Your choice of terminology is making it hard to follow this proposal. You seem to mean something other than https://zuul-ci.org/docs/zuul/user/config.html#pipeline when you use the term "pipeline" (which gets confusing very quickly for anyone familiar with Zuul configuration concepts). Indeed, sorry for that confusion. I mean pipelines as jobs executed in batches, ordered via defined dependencies, like gitlab pipelines [0]. And those batches can also be thought of as steps, or whatever we call that. [0] https://docs.gitlab.com/ee/ci/pipelines.html -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
Let me clarify the problem I want to solve with pipelines. It is getting *hard* to develop things and move patches to the Happy End (merged): - Patches wait too long for CI jobs to start. It should be minutes and not hours of waiting. - If a patch fails a job w/o a good reason, the consequent recheck operation repeats the waiting all over again. How may pipelines help solve it? Pipelines only alleviate, not solve, the problem of waiting. We only want to build pipelines for the main zuul check process, omitting gating and RDO CI (for now). There are two cases to consider: - A patch succeeds all checks - A patch fails a check with dependencies The latter case benefits us the most, when pipelines are designed as proposed here, so that any jobs expected to fail when a dependency fails will be omitted from execution. This saves a lot of HW resources and zuul queue slots, making them available for other patches and allowing those to have CI jobs started faster (less waiting!). When we have "recheck storms", like because of some known intermittent side issue, that outcome is multiplied by the recheck storm um... level, and delivers even better and absolutely amazing results :) The Zuul queue will not grow insanely, getting overwhelmed by multiple clones of the rechecked jobs highly likely doomed to fail, and blocking other patches which might have chances to pass checks as non-affected by that intermittent issue. And for the first case, when a patch succeeds, it takes some extended time, and that is the price to pay. How much time it takes to finish in a pipeline fully depends on the implementation. The effectiveness could only be measured with numbers extracted from elastic search data, like average time to wait for a job to start, success vs fail execution time percentiles for a job, average number of rechecks, recheck storms history and so on. I don't have that data and don't know how to get it. 
Any help with that is very appreciated and could really help to move the proposed patches forward or decline it. And we could then compare "before" and "after" as well. I hope that explains the problem scope and the methodology to address that. On 5/14/18 6:15 PM, Bogdan Dobrelya wrote: An update for your review please folks Bogdan Dobrelya writes: Hello. As Zuul documentation [0] explains, the names "check", "gate", and "post" may be altered for more advanced pipelines. Is it doable to introduce, for particular openstack projects, multiple check stages/steps as check-1, check-2 and so on? And is it possible to make the consequent steps reusing environments from the previous steps finished with? Narrowing down to tripleo CI scope, the problem I'd want we to solve with this "virtual RFE", and using such multi-staged check pipelines, is reducing (ideally, de-duplicating) some of the common steps for existing CI jobs. What you're describing sounds more like a job graph within a pipeline. See: https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies for how to configure a job to run only after another job has completed. There is also a facility to pass data between such jobs. ... (skipped) ... Creating a job graph to have one job use the results of the previous job can make sense in a lot of cases. It doesn't always save *time* however. It's worth noting that in OpenStack's Zuul, we have made an explicit choice not to have long-running integration jobs depend on shorter pep8 or tox jobs, and that's because we value developer time more than CPU time. We would rather run all of the tests and return all of the results so a developer can fix all of the errors as quickly as possible, rather than forcing an iterative workflow where they have to fix all the whitespace issues before the CI system will tell them which actual tests broke. 
-Jim I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for undercloud deployments vs upgrades testing (and some more). Given that those undercloud jobs do not have such high fail rates though, I think Emilien is right in his comments and those would buy us nothing. On the other hand, what do you folks think of making the tripleo-ci-centos-7-3nodes-multinode depend on tripleo-ci-centos-7-containers-multinode [2]? The former seems quite failure-prone and long-running, and is non-voting. It deploys 3 nodes in HA fashion (see featureset configs [3]*). And it almost never passes when the containers-multinode fails - see the CI stats page [4]. I've found only 2 cases there of the opposite situation, when containers-multinode fails, but 3nodes-multinode passes. So cutting off those future failures via the added dependency *would* buy us something and allow other jobs to wait less to commence, at the reasonable price of a somewhat extended run time of the main zuul pipeline. I think it makes sense and that extended CI time will not overhead the RDO CI execution times so much as to become a problem. WDYT?
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
Added a few more patches [0], [1] by the discussion results. PTAL folks. Wrt remaining in the topic, I'd propose to give it a try and revert it, if it proved to be worse than better. Thank you for feedback! The next step could be reusing artifacts, like DLRN repos and containers built for patches and hosted undercloud, in the consequent pipelined jobs. But I'm not sure how to even approach that. [0] https://review.openstack.org/#/c/568536/ [1] https://review.openstack.org/#/c/568543/ On 5/15/18 10:54 AM, Bogdan Dobrelya wrote: On 5/14/18 10:06 PM, Alex Schultz wrote: On Mon, May 14, 2018 at 10:15 AM, Bogdan Dobrelya <bdobr...@redhat.com> wrote: An update for your review please folks Bogdan Dobrelya writes: Hello. As Zuul documentation [0] explains, the names "check", "gate", and "post" may be altered for more advanced pipelines. Is it doable to introduce, for particular openstack projects, multiple check stages/steps as check-1, check-2 and so on? And is it possible to make the consequent steps reusing environments from the previous steps finished with? Narrowing down to tripleo CI scope, the problem I'd want we to solve with this "virtual RFE", and using such multi-staged check pipelines, is reducing (ideally, de-duplicating) some of the common steps for existing CI jobs. What you're describing sounds more like a job graph within a pipeline. See: https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies for how to configure a job to run only after another job has completed. There is also a facility to pass data between such jobs. ... (skipped) ... Creating a job graph to have one job use the results of the previous job can make sense in a lot of cases. It doesn't always save *time* however. It's worth noting that in OpenStack's Zuul, we have made an explicit choice not to have long-running integration jobs depend on shorter pep8 or tox jobs, and that's because we value developer time more than CPU time. 
We would rather run all of the tests and return all of the results so a developer can fix all of the errors as quickly as possible, rather than forcing an iterative workflow where they have to fix all the whitespace issues before the CI system will tell them which actual tests broke. -Jim I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for undercloud deployments vs upgrades testing (and some more). Given that those undercloud jobs have not so high fail rates though, I think Emilien is right in his comments and those would buy us nothing. From the other side, what do you think folks of making the tripleo-ci-centos-7-3nodes-multinode depend on tripleo-ci-centos-7-containers-multinode [2]? The former seems quite faily and long running, and is non-voting. It deploys (see featuresets configs [3]*) a 3 nodes in HA fashion. And it seems almost never passing, when the containers-multinode fails - see the CI stats page [4]. I've found only a 2 cases there for the otherwise situation, when containers-multinode fails, but 3nodes-multinode passes. So cutting off those future failures via the dependency added, *would* buy us something and allow other jobs to wait less to commence, by a reasonable price of somewhat extended time of the main zuul pipeline. I think it makes sense and that extended CI time will not overhead the RDO CI execution times so much to become a problem. WDYT? I'm not sure it makes sense to add a dependency on other deployment tests. It's going to add additional time to the CI run because the upgrade won't start until well over an hour after the rest of the The things are not so simple. There is also a significant time-to-wait-in-queue jobs start delay. And it takes probably even longer than the time to execute jobs. And that delay is a function of available HW resources and zuul queue length. And the proposed change affects those parameters as well, assuming jobs with failed dependencies won't run at all. 
So we could expect longer execution times compensated with shorter wait times! I'm not sure how to estimate that tho. You folks have all numbers and knowledge, let's use that please. jobs. The only thing I could think of where this makes more sense is to delay the deployment tests until the pep8/unit tests pass. e.g. let's not burn resources when the code is bad. There might be arguments about lack of information from a deployment when developing things but I would argue that the patch should be vetted properly first in a local environment before taking CI resources. I support this idea as well, though I'm sceptical about having that blessed in the end :) I'll add a patch though. Thanks, -Alex [0] https://review.openstack.org/#/c/568275/ [1] https://review.openstack.org/#/c/568278/ [2] https://review.openstack.org/#/c/568326/ [3] https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html [4] http://tripleo.
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/14/18 10:06 PM, Alex Schultz wrote: On Mon, May 14, 2018 at 10:15 AM, Bogdan Dobrelya <bdobr...@redhat.com> wrote: An update for your review please folks Bogdan Dobrelya writes: Hello. As Zuul documentation [0] explains, the names "check", "gate", and "post" may be altered for more advanced pipelines. Is it doable to introduce, for particular openstack projects, multiple check stages/steps as check-1, check-2 and so on? And is it possible to make the consequent steps reusing environments from the previous steps finished with? Narrowing down to tripleo CI scope, the problem I'd want we to solve with this "virtual RFE", and using such multi-staged check pipelines, is reducing (ideally, de-duplicating) some of the common steps for existing CI jobs. What you're describing sounds more like a job graph within a pipeline. See: https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies for how to configure a job to run only after another job has completed. There is also a facility to pass data between such jobs. ... (skipped) ... Creating a job graph to have one job use the results of the previous job can make sense in a lot of cases. It doesn't always save *time* however. It's worth noting that in OpenStack's Zuul, we have made an explicit choice not to have long-running integration jobs depend on shorter pep8 or tox jobs, and that's because we value developer time more than CPU time. We would rather run all of the tests and return all of the results so a developer can fix all of the errors as quickly as possible, rather than forcing an iterative workflow where they have to fix all the whitespace issues before the CI system will tell them which actual tests broke. -Jim I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for undercloud deployments vs upgrades testing (and some more). Given that those undercloud jobs have not so high fail rates though, I think Emilien is right in his comments and those would buy us nothing. 
On the other hand, what do you folks think of making the tripleo-ci-centos-7-3nodes-multinode depend on tripleo-ci-centos-7-containers-multinode [2]? The former seems quite failure-prone and long-running, and is non-voting. It deploys 3 nodes in HA fashion (see featureset configs [3]*). And it almost never passes when the containers-multinode fails - see the CI stats page [4]. I've found only 2 cases there of the opposite situation, when containers-multinode fails, but 3nodes-multinode passes. So cutting off those future failures via the added dependency *would* buy us something and allow other jobs to wait less to commence, at the reasonable price of a somewhat extended run time of the main zuul pipeline. I think it makes sense and that extended CI time will not overhead the RDO CI execution times so much as to become a problem. WDYT? I'm not sure it makes sense to add a dependency on other deployment tests. It's going to add additional time to the CI run because the upgrade won't start until well over an hour after the rest of the [...] The things are not so simple. There is also a significant time-to-wait-in-queue delay before jobs start. And it takes probably even longer than the time to execute jobs. And that delay is a function of available HW resources and zuul queue length. And the proposed change affects those parameters as well, assuming jobs with failed dependencies won't run at all. 
I support this idea as well, though I'm sceptical about having that blessed in the end :) I'll add a patch though. Thanks, -Alex [0] https://review.openstack.org/#/c/568275/ [1] https://review.openstack.org/#/c/568278/ [2] https://review.openstack.org/#/c/568326/ [3] https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html [4] http://tripleo.org/cistatus.html * ignore the column 1, it's obsolete, all CI jobs now using configs download AFAICT... -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
On 5/14/18 9:15 PM, Sagi Shnaidman wrote: Hi, Bogdan I like the idea with the undercloud job. Actually if undercloud fails, I'd stop all other jobs, because it doesn't make sense to run them. Seeing the same failure in 10 jobs doesn't add too much. So maybe adding the undercloud job as a dependency for all multinode jobs would be a great idea. I like that idea, I'll add another patch in the topic then. I think it's also worth checking how long it will delay jobs. Will all jobs wait until the undercloud job is running? Or will they be aborted when the undercloud job is failing? That is a good question for openstack-infra folks developing zuul :) But, we could just try it and see how it works, happily, zuul v3 allows doing that just in the scope of proposed patches! My expectation is that all jobs get delayed (and I mean the main zuul pipeline execution time here) by the average time of the undercloud deploy job of ~80 min, which hopefully should not be a big deal given that there is a separate RDO CI pipeline running in parallel, which normally *highly likely* extends the overall time anyway :) And given the high chances of additional 'recheck rdo' runs we can observe these days for patches on review. I wish we could introduce inter-pipeline dependencies (zuul CI <-> RDO CI) for those as well... However I'm very sceptical about multinode containers and scenarios jobs, they could fail for very different reasons, like race conditions in the product or infra issues. Skipping some of them will lead to more rechecks from devs trying to discover all problems in a row, which will delay the development process significantly. right, I roughly estimated the delay for the main zuul pipeline execution time for jobs might be ~2.5h, which is not good. We could live with that were it only ~1h, like it takes for the undercloud containers job dependency example. 
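For the record, in Zuul v3 a job whose dependency fails is simply not started (reported as skipped) rather than started and aborted. Sagi's suggestion would translate to a project-pipeline snippet along these lines; this is a sketch only, with job names as used in the thread, and the real layout files may structure templates and variants differently:

```yaml
- project:
    check:
      jobs:
        - tripleo-ci-centos-7-undercloud-containers
        - tripleo-ci-centos-7-containers-multinode:
            dependencies:
              - tripleo-ci-centos-7-undercloud-containers
        - tripleo-ci-centos-7-3nodes-multinode:
            dependencies:
              - tripleo-ci-centos-7-undercloud-containers
```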
Thanks On Mon, May 14, 2018 at 7:15 PM, Bogdan Dobrelya <bdobr...@redhat.com> wrote: An update for your review please folks Bogdan Dobrelya <bdobr...@redhat.com> writes: Hello. As Zuul documentation [0] explains, the names "check", "gate", and "post" may be altered for more advanced pipelines. Is it doable to introduce, for particular openstack projects, multiple check stages/steps as check-1, check-2 and so on? And is it possible to make the consequent steps reuse environments that the previous steps finished with? Narrowing down to tripleo CI scope, the problem I'd want us to solve with this "virtual RFE", and using such multi-staged check pipelines, is reducing (ideally, de-duplicating) some of the common steps for existing CI jobs. What you're describing sounds more like a job graph within a pipeline. See: https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies for how to configure a job to run only after another job has completed. There is also a facility to pass data between such jobs. ... (skipped) ... Creating a job graph to have one job use the results of the previous job can make sense in a lot of cases. It doesn't always save *time* however. It's worth noting that in OpenStack's Zuul, we have made an explicit choice not to have long-running integration jobs depend on shorter pep8 or tox jobs, and that's because we value developer time more than CPU time. We would rather run all of the tests and return all of the results so a developer can fix all of the errors as quickly as possible, rather than forcing an iterative workflow where they have to fix all the whitespace issues before the CI system will tell them which actual tests broke. -Jim I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for undercloud deployments vs upgrades testing (and some more). 
Given that those undercloud jobs do not have such high fail rates though, I think Emilien is right in his comments and those would buy us nothing. On the other hand, what do you folks think of making the tripleo-ci-centos-7-3nodes-multinode depend on tripleo-ci-centos-7-containers-multinode [2]? The former seems quite failure-prone and long-running, and is non-voting. It deploys 3 nodes in HA fashion (see featureset configs [3]*). And it almost never passes when the containers-multinode fails - see the CI stats page [4]. I've found only 2 cases there of the opposite situation, when containers-multinode fails, but 3nodes-multinode passes.
Re: [openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
An update for your review please folks Bogdan Dobrelya writes: Hello. As Zuul documentation [0] explains, the names "check", "gate", and "post" may be altered for more advanced pipelines. Is it doable to introduce, for particular openstack projects, multiple check stages/steps as check-1, check-2 and so on? And is it possible to make the consequent steps reusing environments from the previous steps finished with? Narrowing down to tripleo CI scope, the problem I'd want we to solve with this "virtual RFE", and using such multi-staged check pipelines, is reducing (ideally, de-duplicating) some of the common steps for existing CI jobs. What you're describing sounds more like a job graph within a pipeline. See: https://docs.openstack.org/infra/zuul/user/config.html#attr-job.dependencies for how to configure a job to run only after another job has completed. There is also a facility to pass data between such jobs. ... (skipped) ... Creating a job graph to have one job use the results of the previous job can make sense in a lot of cases. It doesn't always save *time* however. It's worth noting that in OpenStack's Zuul, we have made an explicit choice not to have long-running integration jobs depend on shorter pep8 or tox jobs, and that's because we value developer time more than CPU time. We would rather run all of the tests and return all of the results so a developer can fix all of the errors as quickly as possible, rather than forcing an iterative workflow where they have to fix all the whitespace issues before the CI system will tell them which actual tests broke. -Jim I proposed a few zuul dependencies [0], [1] to tripleo CI pipelines for undercloud deployments vs upgrades testing (and some more). Given that those undercloud jobs have not so high fail rates though, I think Emilien is right in his comments and those would buy us nothing. 
On the other hand, what do you folks think of making the tripleo-ci-centos-7-3nodes-multinode depend on tripleo-ci-centos-7-containers-multinode [2]? The former seems quite failure-prone and long-running, and is non-voting. It deploys 3 nodes in HA fashion (see featureset configs [3]*). And it almost never passes when the containers-multinode fails - see the CI stats page [4]. I've found only 2 cases there of the opposite situation, when containers-multinode fails, but 3nodes-multinode passes. So cutting off those future failures via the added dependency *would* buy us something and allow other jobs to wait less to commence, at the reasonable price of a somewhat extended run time of the main zuul pipeline. I think it makes sense and that extended CI time will not overhead the RDO CI execution times so much as to become a problem. WDYT? [0] https://review.openstack.org/#/c/568275/ [1] https://review.openstack.org/#/c/568278/ [2] https://review.openstack.org/#/c/568326/ [3] https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html [4] http://tripleo.org/cistatus.html * ignore the column 1, it's obsolete, all CI jobs now using configs download AFAICT... -- Best regards, Bogdan Dobrelya, Irc #bogdando
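In Zuul v3 layout terms, the proposed ordering would be a one-line `dependencies` attribute on the project's check jobs. A sketch only -- the actual layout files name templates and job variants differently:

```yaml
- project:
    check:
      jobs:
        - tripleo-ci-centos-7-containers-multinode
        - tripleo-ci-centos-7-3nodes-multinode:
            dependencies:
              - tripleo-ci-centos-7-containers-multinode
```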
Re: [openstack-dev] [tripleo] CI Squads’ Sprint 12 Summary: libvirt-reproducer, python-tempestconf
not gated tripleo-upgrade repo If you have any questions and/or suggestions, please contact us in #oooq or #tripleo Thanks, Matt tq: https://github.com/openstack/tripleo-quickstart tqe: https://github.com/openstack/tripleo-quickstart-extras [1] https://specs.openstack.org/openstack/tripleo-specs/specs/policy/ci-team-structure.html [2] {{tq}}/roles/libvirt/setup/overcloud/tasks/libvirt_nodepool.yml [3] {{tqe}}/roles/create-reproducer-script/templates/reproducer-quickstart.sh.j2#L50 [4] {{tqe}}/roles/snapshot-libvirt [5] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/12/html-single/openstack_integration_test_suite_guide [6] https://blogs.rdoproject.org/2018/05/running-tempest-tests-against-a-tripleo-undercloud [7] https://blogs.rdoproject.org/2018/05/consuming-kolla-tempest-container-image-for-running-tempest-tests [8] https://github.com/redhat-cip/ansible-role-openstack-certification [9] https://review.rdoproject.org/etherpad/p/ruckrover-sprint12 [10] https://etherpad.openstack.org/p/rover-030518 __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -- Best regards, Bogdan Dobrelya, Irc #bogdando __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] Fwd: Re: [tripleo][kolla] roadmap on containers workflow
Added the kolla tag in the hope to get some feedback wrt the question Forwarded Message Subject: Re: [openstack-dev] [tripleo] roadmap on containers workflow Date: Mon, 23 Apr 2018 10:08:36 +0200 From: Bogdan Dobrelya <bdobr...@redhat.com> Organization: Red Hat To: openstack-dev@lists.openstack.org On 4/20/18 8:56 PM, Emilien Macchi wrote: So the role has proven to be useful and we're now sure that we need it to deploy a container registry before any container in the deployment, which means we can't use the puppet code anymore for this service. I propose that we move the role to OpenStack: https://review.openstack.org/#/c/563197/ https://review.openstack.org/#/c/563198/ https://review.openstack.org/#/c/563200/ So we can properly peer review and gate the new role. In the meantime, we continue to work on the new workflow. Thanks, On Sun, Apr 15, 2018 at 7:24 PM, Emilien Macchi <emil...@redhat.com <mailto:emil...@redhat.com>> wrote: On Fri, Apr 13, 2018 at 5:58 PM, Emilien Macchi <emil...@redhat.com <mailto:emil...@redhat.com>> wrote: A bit of progress today, I prototyped an Ansible role for that purpose: https://github.com/EmilienM/ansible-role-container-registry <https://github.com/EmilienM/ansible-role-container-registry> The next step is, I'm going to investigate if we can deploy Docker and Docker Distribution (to deploy the registry v2) via the existing composable services in THT by using external_deploy_tasks maybe (or another interface). The idea is really to have the registry ready before actually deploying the undercloud containers, so we can modify them in the middle (run container-check in our case). This patch: https://review.openstack.org/#/c/561377 <https://review.openstack.org/#/c/561377> is deploying Docker and Docker Registry v2 *before* containers deployment in the docker_steps. It's using the external_deploy_tasks interface that runs right after the host_prep_tasks, so still before starting containers. 
It's using the Ansible role that was prototyped on Friday, please take a look and raise any concerns. I have only one question: could we reuse something here that has already been solved in projects like Kolla? Otherwise it's LGTM. Now I would like to investigate how we can run container workflows between the deployment and docker and containers deployments. -- Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] Ironic Inspector in the overcloud
On 4/18/18 12:07 PM, Derek Higgins wrote: Hi All, I've been testing the Ironic Inspector containerised service in the overcloud. The service essentially works, but there are a couple of hurdles to tackle to set it up; the first of these is how to get the IPA kernel and ramdisk where they need to be. These need to be present in the ironic_pxe_http container to be served out over http; what's the best way to get them there? On the undercloud this is done by copying the files across the filesystem[1][2] to /httpboot when we run "openstack overcloud image upload", but on the overcloud an alternative is required. Could the files be pulled into the container during setup? I'd prefer to keep bind-mounting the IPA kernel and ramdisk into a container via the /var/lib/ironic/httpboot host path. So the question then becomes how to deliver them by that path for overcloud nodes? thanks, Derek 1 - https://github.com/openstack/python-tripleoclient/blob/3cf44eb/tripleoclient/v1/overcloud_image.py#L421-L433 2 - https://github.com/openstack/python-tripleoclient/blob/3cf44eb/tripleoclient/v1/overcloud_image.py#L181 -- Best regards, Bogdan Dobrelya, Irc #bogdando
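The bind-mount approach Bogdan prefers could look roughly like the fragment below in the service's docker_config section. This is an illustrative sketch only; the image parameter name and step number follow tripleo-heat-templates conventions but are assumptions, not the actual template.

```yaml
docker_config:
  step_4:
    ironic_pxe_http:
      image: {get_param: DockerIronicPxeImage}
      net: host
      restart: always
      volumes:
        # IPA kernel and ramdisk are delivered to this host path
        # (e.g. by a deploy task), then served over http from the
        # container via a read-only bind mount
        - /var/lib/ironic/httpboot:/var/lib/ironic/httpboot:ro
```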
Re: [openstack-dev] [tripleo] The Weekly Owl - 17th Edition
tripleo-security-squad ++ | Owl fact | ++ Did you know owls were watching you while working on TripleO? Check this out: https://www.reddit.com/r/pics/comments/8cz8v0/owls_born_outside_of_office_window_wont_stop/ (Thanks Wes for the link) Thanks all for reading and stay tuned! -- Your fellow reporter, Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] roadmap on containers workflow
On 4/12/18 12:38 AM, Steve Baker wrote: On 11/04/18 12:50, Emilien Macchi wrote: Greetings, Steve Baker and I had a quick chat today about the work that is being done around the containers workflow in the Rocky cycle. If you're not familiar with the topic, I suggest first reading the blueprint to understand the context here: https://blueprints.launchpad.net/tripleo/+spec/container-prepare-workflow One of the great outcomes of this blueprint is that in Rocky, the operator won't have to run all the "openstack overcloud container" commands to prepare the container registry and upload the containers. Instead, it'll be driven mostly by Heat and Mistral. But today our discussion extended to two use cases that we're going to explore and find out how we can address them: 1) I'm a developer and want to deploy a containerized undercloud with customized containers (more or less related to the all-in-one discussions on another thread [1]). 2) I'm submitting a patch in tripleo-common (let's say a workflow) and need my patch to be tested when the undercloud is containerized (see [2] for an excellent example). I'm fairly sure the only use cases for this will be developer or CI based. I think we need to strongly encourage image modifications for production deployments to go through some kind of image building pipeline. See Next Steps below for the implications of this. Both cases would require additional things: - The container registry needs to be deployed *before* actually installing the undercloud. - We need a tool to update containers from this registry *before* deploying them. We already have this tool in place in our CI for the overcloud (see [3] and [4]). Now we need a similar thing for the undercloud. One problem I see is that we use roles and environment files to filter the images to be pulled/modified/uploaded. Now we would need to assemble a list of undercloud *and* overcloud environments, and build some kind of aggregate role data for both.
This would need to happen before the undercloud is even deployed, which is quite a different order from what quickstart does currently. Either that, or we do no image filtering and just process every image regardless of whether it will be used. Next steps: - Agree that we need to deploy the container-registry before the undercloud. - If agreed, we'll create a new Ansible role called ansible-role-container-registry that for now will deploy exactly what we have in TripleO, without extra features. +1 - Drive the playbook runtime from tripleoclient to bootstrap the container registry (which of course could be disabled in undercloud.conf). tripleoclient could switch to using this role instead of puppet-tripleo to install the registry; however, since the only use cases we have are dev/CI driven, I wonder if quickstart/infrared can just invoke the role when required, before tripleoclient is involved. - Create another Ansible role that would re-use the container-check tool; the idea is to provide a role to modify containers when needed, and we could also control it from tripleoclient. The role would be using the ContainerImagePrepare parameter, which Steve is working on right now. Since the use cases are all upstream CI/dev, I do wonder if we should just have a dedicated container-check <https://github.com/imain/container-check> role inside tripleo-quickstart-extras which can continue to use the script [3] or whatever. Keeping the logic in quickstart will remove the temptation to use it instead of a proper image build pipeline for production deployments. +1 to put it in quickstart-extras to "hide" it from the production use cases. Alternatively, it could still be a standalone role which quickstart invokes, just to accommodate development workflows which don't use quickstart. Feedback is welcome, thanks.
[1] All-In-One thread: http://lists.openstack.org/pipermail/openstack-dev/2018-March/128900.html [2] Bug report when undercloud is containerized: https://bugs.launchpad.net/tripleo/+bug/1762422 [3] Tool to update containers if needed: https://github.com/imain/container-check [4] Container-check running in TripleO CI: https://review.openstack.org/#/c/558885/ and https://review.openstack.org/#/c/529399/ -- Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
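For context, the ContainerImagePrepare parameter that Steve mentions could take roughly the following shape; treat this as an approximate sketch (the namespace, tag, and filter values are examples) rather than the exact interface that merged.

```yaml
parameter_defaults:
  ContainerImagePrepare:
    - push_destination: true        # push to the local (undercloud) registry
      set:
        namespace: docker.io/tripleomaster
        name_prefix: centos-binary-
        tag: current-tripleo
      # entries can be filtered, which speaks to the role/environment
      # based image filtering problem discussed above
      excludes:
        - ceph
```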
Re: [openstack-dev] [tripleo] roadmap on containers workflow
On 4/12/18 12:38 AM, Steve Baker wrote: On 11/04/18 12:50, Emilien Macchi wrote: Greetings, Steve Baker and I had a quick chat today about the work that is being done around the containers workflow in the Rocky cycle. If you're not familiar with the topic, I suggest first reading the blueprint to understand the context here: https://blueprints.launchpad.net/tripleo/+spec/container-prepare-workflow One of the great outcomes of this blueprint is that in Rocky, the operator won't have to run all the "openstack overcloud container" commands to prepare the container registry and upload the containers. Instead, it'll be driven mostly by Heat and Mistral. But today our discussion extended to two use cases that we're going to explore and find out how we can address them: 1) I'm a developer and want to deploy a containerized undercloud with customized containers (more or less related to the all-in-one discussions on another thread [1]). 2) I'm submitting a patch in tripleo-common (let's say a workflow) and need my patch to be tested when the undercloud is containerized (see [2] for an excellent example). I'm fairly sure the only use cases for this will be developer or CI based. I think we need to strongly encourage image modifications for production deployments to go through some kind of image building pipeline. See Next Steps below for the implications of this. Both cases would require additional things: - The container registry needs to be deployed *before* actually installing the undercloud. - We need a tool to update containers from this registry *before* deploying them. We already have this tool in place in our CI for the overcloud (see [3] and [4]). Now we need a similar thing for the undercloud. One problem I see is that we use roles and environment files to filter the images to be pulled/modified/uploaded. Now we would need to assemble a list of undercloud *and* overcloud environments, and build some kind of aggregate role data for both.
This would need to happen before the undercloud is even deployed, which is quite a different order from what quickstart does currently. Either that, or we do no image filtering and just process every image regardless of whether it will be used. Next steps: - Agree that we need to deploy the container-registry before the undercloud. - If agreed, we'll create a new Ansible role called ansible-role-container-registry that for now will deploy exactly what we have in TripleO, without extra features. - Drive the playbook runtime from tripleoclient to bootstrap the container registry (which of course could be disabled in undercloud.conf). tripleoclient could switch to using this role instead of puppet-tripleo to install the registry; however, since the only use cases we have are dev/CI driven, I wonder if quickstart/infrared can just invoke the role when required, before tripleoclient is involved. Please let's do that in tripleoclient and only have quickstart and other tools invoke the client commands. We should stay close to what users would do, which is only issuing client commands. - Create another Ansible role that would re-use the container-check tool; the idea is to provide a role to modify containers when needed, and we could also control it from tripleoclient. The role would be using the ContainerImagePrepare parameter, which Steve is working on right now. Since the use cases are all upstream CI/dev, I do wonder if we should just have a dedicated container-check <https://github.com/imain/container-check> role inside tripleo-quickstart-extras which can continue to use the script [3] or whatever. Keeping the logic in quickstart will remove the temptation to use it instead of a proper image build pipeline for production deployments. Alternatively, it could still be a standalone role which quickstart invokes, just to accommodate development workflows which don't use quickstart. Feedback is welcome, thanks.
[1] All-In-One thread: http://lists.openstack.org/pipermail/openstack-dev/2018-March/128900.html [2] Bug report when undercloud is containerized: https://bugs.launchpad.net/tripleo/+bug/1762422 [3] Tool to update containers if needed: https://github.com/imain/container-check [4] Container-check running in TripleO CI: https://review.openstack.org/#/c/558885/ and https://review.openstack.org/#/c/529399/ -- Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] roadmap on containers workflow
On 4/12/18 10:08 AM, Sergii Golovatiuk wrote: Hi, Thank you very much for bringing up this topic. On Wed, Apr 11, 2018 at 2:50 AM, Emilien Macchi <emil...@redhat.com> wrote: Greetings, Steve Baker and I had a quick chat today about the work that is being done around the containers workflow in the Rocky cycle. If you're not familiar with the topic, I suggest first reading the blueprint to understand the context here: https://blueprints.launchpad.net/tripleo/+spec/container-prepare-workflow One of the great outcomes of this blueprint is that in Rocky, the operator won't have to run all the "openstack overcloud container" commands to prepare the container registry and upload the containers. Instead, it'll be driven mostly by Heat and Mistral. I am trying to think as an operator, and it's very similar to 'openstack container', which is Swift. So it might be confusing, I guess. But today our discussion extended to two use cases that we're going to explore and find out how we can address them: 1) I'm a developer and want to deploy a containerized undercloud with customized containers (more or less related to the all-in-one discussions on another thread [1]). 2) I'm submitting a patch in tripleo-common (let's say a workflow) and need my patch to be tested when the undercloud is containerized (see [2] for an excellent example). That's a very nice initiative. Both cases would require additional things: - The container registry needs to be deployed *before* actually installing the undercloud. - We need a tool to update containers from this registry *before* deploying them. We already have this tool in place in our CI for the overcloud (see [3] and [4]). Now we need a similar thing for the undercloud. I would use an external registry in this case. Quay.io might be a good choice for rock-solid simplicity. It might not be good for CI, as it requires very strong connectivity, but it should be sufficient for developers.
Next steps: - Agree that we need to deploy the container-registry before the undercloud. - If agreed, we'll create a new Ansible role called ansible-role-container-registry that for now will deploy exactly what we have in TripleO, without extra features. Deploy our own registry as part of the undercloud deployment, or use an external one. For instance, for production use I would like to have a cluster of 3-5 registries with HAProxy in front, to speed up 1k-node deployments. Note that this implies an HA undercloud as well. Although, given that an HA undercloud is goodness indeed, I would *not* invest time into a reliable container registry deployment architecture for the undercloud, as we'll have it for free once a kubernetes/openshift control plane for openstack becomes adopted. There, the notions of build pipelines, reliable container registries et al. are very strong. - Drive the playbook runtime from tripleoclient to bootstrap the container registry (which of course could be disabled in undercloud.conf). - Create another Ansible role that would re-use the container-check tool; the idea is to provide a role to modify containers when needed, and we could also control it from tripleoclient. The role would be using the ContainerImagePrepare parameter, which Steve is working on right now. Feedback is welcome, thanks.
[1] All-In-One thread: http://lists.openstack.org/pipermail/openstack-dev/2018-March/128900.html [2] Bug report when undercloud is containerized: https://bugs.launchpad.net/tripleo/+bug/1762422 [3] Tool to update containers if needed: https://github.com/imain/container-check [4] Container-check running in TripleO CI: https://review.openstack.org/#/c/558885/ and https://review.openstack.org/#/c/529399/ -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] PTG session about All-In-One installer: recap & roadmap
>> gather ideas, use cases, needs, before we go design a prototype in Rocky. > > I would like to offer help with initial testing once there is something in the repos, so count me in! > > Regards, > Javier > >> Thanks everyone who'll be involved, >> -- >> Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [TripleO][CI][QA][HA][Eris][LCOO] Validating HA on upstream
On 3/8/18 6:44 PM, Raoul Scarazzini wrote: On 08/03/2018 17:03, Adam Spiers wrote: [...] Yes, agreed again, this is a strong case for collaboration between the self-healing and QA SIGs. In Dublin we also discussed the idea of the self-healing and API SIGs collaborating on the related topic of health check APIs. Guys, thanks a ton for your involvement in the topic. I am +1 to any kind of meeting we can have to discuss this (like it was proposed by Please count me in as well. I can't stop dreaming of Jepsen's Nemesis [0] hammering openstack to make it stronger :D Jokes aside, let's do our best to consolidate on frameworks and tools, ditching the NIH syndrome! [0] https://github.com/jepsen-io/jepsen/blob/master/jepsen/src/jepsen/nemesis.clj Adam), so I'll offer my bluejeans channel for whatever kind of meeting we want to organize. About the best practices part Georg was mentioning, I'm 100% in agreement: the testing methodologies are the first thing we need to care about, starting from what we want to achieve. That said, I'll keep studying Yardstick. Hope to hear from you soon, and thanks again! -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] upgrading to a containerized undercloud
On 3/4/18 10:29 PM, Emilien Macchi wrote: The use case that I'm working on right now is the following: as an operator, I would like to upgrade my non-containerized undercloud running on Queens to a containerized undercloud running on Rocky. Also, I would like to maintain the exact same command to upgrade my undercloud, which is: openstack undercloud upgrade (with --use-heat to containerize it). The work has been tracked here: https://trello.com/c/nFbky9Uk/5-upgrade-support-from-instack-undercloud But here's an update and some open discussion before we continue to make progress. ## Workflow This is what I've found the easiest to implement and maintain: 1) Update python-tripleoclient-* and tripleo-heat-templates. Just a note that those need to be installed first, though you have this covered: https://review.openstack.org/#/c/549624/7/tripleoclient/v1/undercloud.py@100 2) Run openstack overcloud container prepare. 3) Run openstack undercloud upgrade --use-heat, which underneath will: stop non-containerized services, upgrade all packages and dependencies, and deploy a containerized undercloud. As we have discussed, and as you noted for https://review.openstack.org/#/c/549609/, we're better off changing the workflow to run the upgrade_tasks so we avoid code duplication. Note: the data isn't touched, so when the upgrade is done, the undercloud is just upgraded to Rocky, and containerized. ## Blockers encountered 1) Passwords were re-generated during the containerization; will be fixed by: https://review.openstack.org/#/c/549600/ 2) The Neutron DB name was different in instack-undercloud; the DB will be renamed by https://review.openstack.org/#/c/549609/ 3) Upgrade logic will live in tripleoclient: https://review.openstack.org/#/c/549624/ (note that it's small) ## Testing I'm using https://review.openstack.org/#/c/549611/ for testing, but I'm also deploying in my local environment. I've been upgrading Pike to Queens successfully when applying my patches.
## Roadmap I would like us to solve the containerized undercloud upgrade case by rocky-m1, and to have by the end of m1 a CI job that actually tests the operator workflow. I'll need some feedback and reviews on the proposal. Thanks in advance, -- Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
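For readers unfamiliar with the upgrade_tasks interface referenced above: each composable service in tripleo-heat-templates can expose a list of Ansible tasks that handle exactly this kind of "stop the non-containerized service" step. A minimal, purely illustrative sketch (the service name and step tag are examples, not the actual patches):

```yaml
outputs:
  role_data:
    value:
      # ... regular role_data keys (service_name, config, etc.) ...
      upgrade_tasks:
        # stop and disable the old non-containerized service so the
        # containerized one deployed later does not conflict with it
        - name: Stop and disable neutron-server before containerization
          tags: step2
          service:
            name: neutron-server
            state: stopped
            enabled: no
```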
[openstack-dev] [ci][infra][tripleo] Multi-staged check pipelines for Zuul v3 proposal
Hello. As the Zuul documentation [0] explains, the names "check", "gate", and "post" may be altered for more advanced pipelines. Is it doable to introduce, for particular openstack projects, multiple check stages/steps, such as check-1, check-2 and so on? And is it possible to have the subsequent steps reuse the environments that the previous steps finished with? Narrowing down to the tripleo CI scope, the problem I'd want us to solve with this "virtual RFE", using such multi-staged check pipelines, is reducing (ideally, de-duplicating) some of the common steps of existing CI jobs. For example, we may want to omit running all of the OVB or multinode (non-upgrade) jobs deploying overclouds when the *undercloud* fails to install. This case makes even more sense when the undercloud is deployed from the same heat templates (aka Containerized Undercloud) and uses the same packages and container images as the overcloud does! Or, maybe, just stop the world when tox failed at step 1 and do nothing more, as it makes very little sense to run anything else (IMO) if the patch can never be gated with a failed tox check anyway... What I propose here is to think, discuss, and come up with an RFE, either for tripleo, or zuul, or both, covering the following scenarios (examples are tripleo/RDO CI specific, though you can think of other use cases, of course!): case A. No deduplication, a simple multi-staged check pipeline: * check-1: syntax only, lint/tox * check-2: undercloud install with heat and containers * check-3: undercloud install with heat and containers, build overcloud images (if not a multinode job type), deploy overcloud... (repeats OVB jobs as is, basically) case B.
Full de-duplication scenario (subsequent steps re-use the previous steps' results, building "on top"): * check-1: syntax only, lint/tox * check-2: undercloud install, reuses probably nothing from step 1 * check-3: build overcloud images, if not a multinode job type, extends stage 2 * check-4: deploy overcloud, extends stages 2/3 * check-5: upgrade undercloud, extends stage 2 * check-6: upgrade overcloud, extends stage 4 (looking into the future...) * check-7: deploy openshift/k8s on openstack and run e2e/conformance et al, extends either stage 4 or 6 I believe even the simplest 'case A' would reduce the zuul queues for tripleo CI dramatically. What do you think, folks? See also the PTG tripleo CI notes [1]. [0] https://docs.openstack.org/infra/zuul/user/concepts.html [1] https://etherpad.openstack.org/p/tripleo-ptg-ci -- Best regards, Bogdan Dobrelya, Irc #bogdando
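Part of 'case A' is already expressible in Zuul v3 without new pipeline names, via job dependencies within the check pipeline: later jobs only start once their dependencies succeed (though they do not reuse the earlier jobs' environments, so 'case B' would still need new Zuul features). A hedged sketch with made-up job names:

```yaml
# .zuul.yaml sketch; the job names are illustrative only
- project:
    check:
      jobs:
        - tripleo-tox-linters
        - tripleo-ci-undercloud-containers:
            dependencies:
              - tripleo-tox-linters
        - tripleo-ci-ovb-overcloud:
            # skipped entirely if the undercloud job fails,
            # saving OVB node time on broken patches
            dependencies:
              - tripleo-ci-undercloud-containers
```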
Re: [openstack-dev] [yaql] [tripleo] Backward incompatible change in YAQL minor version
On 2/19/18 3:18 PM, Sofer Athlan-Guyot wrote: Hi, Emilien Macchi <emil...@redhat.com> writes: Upgrading YAQL from 1.1.0 to 1.1.3 breaks advanced queries with groupBy aggregation. The commit that broke it is https://github.com/openstack/yaql/commit/3fb91784018de335440b01b3b069fe45dc53e025 It broke TripleO: https://bugs.launchpad.net/tripleo/+bug/1750032 But Alex and I figured out (after a strong headache) that we needed to update the query like this: https://review.openstack.org/545498 This is great, but we still have a pending issue: mixed upgrade jobs are failing from Pike on. Those are very experimental jobs [1][2], but the error is present. The problem is that in mixed versions we have the 1.1.3 yaql version (master undercloud) but not the fix in the templates, which are either N-1 or N-3. And if we get the fix into the previous versions, the deployment wouldn't work anymore, as we would have the new syntax but not yaql 1.1.3. With the YAQL fixes for t-h-t backported to Pike, would backporting yaql 1.1.3 to the Pike repos as well be the full fix? Or am I missing something? It's not only CI which is affected; any kind of mixed-version operation would fail as well. [1] P->Q: http://logs.openstack.org/62/545762/3/experimental/tripleo-ci-centos-7-scenario001-multinode-oc-upgrade/afc98a5/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2018-02-19_09_48_54 [2] Fast Forward Upgrade: http://logs.openstack.org/86/525686/55/experimental/tripleo-ci-centos-7-scenario001-multinode-ffu-upgrade/5412555/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz It would be great to avoid this kind of change within minor versions, please please. Happy weekend, PS: I'm adding YAQL to my linkedin profile right now.
-- Emilien Macchi -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [TripleO][CI][QA] Validating HA on upstream
On 2/16/18 2:59 PM, Raoul Scarazzini wrote: On 16/02/2018 10:24, Bogdan Dobrelya wrote: [...] +1, this looks like a perfect fit. Would it be possible to install that tripleo-ha-utils/tripleo-quickstart-utils with ansible-galaxy, alongside the quickstart, then apply the destructive-testing playbooks with either the quickstart's static inventory [0] (from your admin/control node) or maybe via a dynamic inventory [1] (from the undercloud managing the overcloud under test via config-download and/or external ansible deployment mechanisms)? [0] https://git.openstack.org/cgit/openstack/tripleo-quickstart/tree/roles/tripleo-inventory [1] https://git.openstack.org/cgit/openstack/tripleo-validations/tree/scripts/tripleo-ansible-inventory Hi Bogdan, thanks for your answer. On the inventory side of things, these playbooks work with any kind of inventory; we're using them at the moment with both manual and quickstart-generated environments, and even infrared ones. We're able to run them at the same time the environment gets deployed, or later, as a day-two action. What is not clear to me is the ansible-galaxy part you're mentioning: today we rely on the github.com/redhat-openstack git repo, so we clone it and then launch the playbooks via the ansible-playbook command. How do you see ansible-galaxy fitting into the picture? Git clone just works as well... Though, I was thinking of some minimal integration via *playbooks* (not roles) in quickstart/tripleo-validations and *external* roles. So the in-repo playbooks would be referencing those external destructive-testing roles.
While the roles are installed with galaxy, like: $ ansible-galaxy install git+https://$repo_name,master -p $external_roles_path or probably adding the $repo_name and $release (master or a tag) into some galaxy-requirements.yaml file and installing from it: $ ansible-galaxy install --force -r quickstart-extras/playbooks/external/galaxy-requirements.yaml -p $external_roles_path Then invoked for quickstart-extras/tripleo-validations like: $ ansible-playbook -i inventory quickstart-extras/playbooks/external/destructive-tests.yaml Thanks! -- Best regards, Bogdan Dobrelya, Irc #bogdando
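For illustration, the galaxy-requirements flow described above could look like the sketch below. The file name, repo pinning and roles path are assumptions for the example, not the actual quickstart-extras layout:

```shell
# Hypothetical requirements file pinning the external destructive-testing
# role repo to a release (repo URL and role name are placeholders):
cat > galaxy-requirements.yaml <<'EOF'
- src: https://github.com/redhat-openstack/tripleo-quickstart-utils
  version: master
  name: tripleo-ha-utils
EOF

# Install the external roles into a dedicated path, then run the in-repo
# playbook referencing them. Guarded with '|| true' since this sketch may
# run on a host without ansible or a deployment inventory:
if command -v ansible-galaxy >/dev/null; then
  ansible-galaxy install --force -r galaxy-requirements.yaml -p ./external-roles || true
  ANSIBLE_ROLES_PATH=./external-roles \
    ansible-playbook -i inventory playbooks/external/destructive-tests.yaml || true
fi
```

The point of the requirements file is that bumping the pinned `version` in one place updates every consuming playbook, rather than re-cloning the repo by hand.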
Re: [openstack-dev] [TripleO][CI] Validating HA on upstream
On 2/15/18 8:22 PM, Raoul Scarazzini wrote: TL;DR: we would like to change the way HA is tested upstream to avoid being hit by avoidable bugs that the CI process should discover. Long version: Today HA testing upstream consists only of verifying that a three-controller setup comes up correctly and can spawn an instance. That's something, but it's far from being enough since we continuously see "day two" bugs. We started covering this more than a year ago in internal CI and today also on rdocloud using a project named tripleo-quickstart-utils [1]. Apart from its name, the project is not limited to tripleo-quickstart; it covers three principal roles: 1 - stonith-config: a playbook that can be used to automate the creation of fencing devices in the overcloud; 2 - instance-ha: a playbook that automates the seventeen manual steps needed to configure instance HA in the overcloud, tests them via rally and verifies that instance HA works; 3 - validate-ha: a playbook that runs a series of disruptive actions in the overcloud and verifies it always behaves correctly by deploying a heat-template that involves all the overcloud components; To make this usable upstream, we need to understand where to put this code. Here are some choices: 1 - tripleo-validations: the most logical place to put this, at least looking at the name, would be tripleo-validations. I've talked with some of the folks working on it, and it came out that the meaning of the tripleo-validations project is not doing disruptive tests. Integrating this stuff would be out of scope. 2 - tripleo-quickstart-extras: apart from the fact that this is not something meant just for quickstart (the project supports infrared and "plain" environments as well), even if we initially started there, in the end it came out that nobody was looking at the patches since nobody was able to verify them. The result was a series of reviews stuck forever. So moving back to extras would be a step backward.
3 - Dedicated project (tripleo-ha-utils or just tripleo-utils): like for tripleo-upgrades or tripleo-validations, it would be perfect to have all this grouped and usable as a standalone thing. Any integration is possible inside the playbook for whatever kind of test. Today we're using the bash framework to interact with the cluster, rally to test instance-ha and Ansible itself to simulate full power outage scenarios. There's been a lot of talk about this during the last PTG [2], and unfortunately, I'll not be part of the next one, but I would like to see things moving on this side. Everything I wrote is of course up for discussion, that's precisely the meaning of this mail. Thanks to all who'll give advice, suggestions, and thoughts about all this stuff. [1] https://github.com/redhat-openstack/tripleo-quickstart-utils [2] https://etherpad.openstack.org/p/qa-queens-ptg-destructive-testing +1 this looks like a perfect fit. Would it be possible to install that tripleo-ha-utils/tripleo-quickstart-utils with ansible-galaxy, alongside the quickstart, then apply destructive-testing playbooks with either the quickstart's static inventory [0] (from your admin/control node) or maybe via dynamic inventory [1] (from undercloud managing the overcloud under test via config-download and/or external ansible deployment mechanisms)? [0] https://git.openstack.org/cgit/openstack/tripleo-quickstart/tree/roles/tripleo-inventory [1] https://git.openstack.org/cgit/openstack/tripleo-validations/tree/scripts/tripleo-ansible-inventory -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] [neutron] Current containerized neutron agents introduce a significant regression in the dataplane
concept [0], with healthchecks and shared namespaces and logical coupling of sidecars, that is, the agents and helper daemons running in namespaces. I hope it does. [0] https://kubernetes.io/docs/concepts/workloads/pods/pod/ -Brian -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] Updates on the TripleO on Kubernetes work
On 12/1/17 5:11 PM, Jiří Stránský wrote: On 21.11.2017 12:01, Jiří Stránský wrote: Kubernetes on the overcloud === The work on this front started with 2 patches [0][1] that some of you might have seen, and then evolved into using the config download mechanism to execute these tasks as part of the undercloud tasks [2][3] (Thanks a bunch, Jiri, for your work here). Note that [0] needs to be refactored to use the same mechanism used in [2]. For those interested in trying the work we've done on deploying vanilla Kubernetes, I put together a post showing how to deploy it with OOOQ, and also briefly explaining the new external_deploy_tasks in service templates: https://www.jistr.com/blog/2017-11-21-kubernetes-in-tripleo/ And we've had a first Kubespray deployment success in CI; the job is ready to move from experimental to non-voting check [1]. The job doesn't yet deploy any pods on that Kubernetes cluster, but it's a step ;) Well done. Note that, deployed with the netchecker app [0], it puts some pods on that cluster, and runs free connectivity (DNS) checks as a bonus. Works even better multinode, as it checks an N-to-N connectivity mesh, IIRC.
[0] https://github.com/kubernetes-incubator/kubespray/blob/master/docs/netcheck.md [1] https://review.openstack.org/#/c/524547/ There are quite a few things to improve here: - How to configure/manage the loadbalancer/vips on the overcloud - Kubespray is currently being cloned and we need to build a package for it - More CI is likely needed for this work [0] https://review.openstack.org/494470 [1] https://review.openstack.org/471759 [2] https://review.openstack.org/#/c/511272/ [3] https://review.openstack.org/#/c/514730/ -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] containerized undercloud update
On 11/30/17 10:36 PM, Wesley Hayutin wrote: Greetings, Just wanted to share some progress with the containerized undercloud work. Ian pushed some of the patches along and we now have a successful undercloud install with containers. The initial undercloud install works [1] The idempotency check failed where we reinstall the undercloud [2] Question: Do we expect the reinstallation to work at this point? Should the check be turned off? Yeah, and kudos! Well done! I'm happy to see undercloud containers working better and getting adopted for CI/devs. I've been deploying my dev envs this way, with deployed-servers for overcloud nodes, like external deployments with config-download. Feel free to invite me for some brainstorming as well :) I will try it w/o the idempotency check, I suspect I will run into errors in a full run with an overcloud deployment. I ran into issues weeks ago. I suspect if we do hit something it will be CI related as Dan Prince has been deploying the overcloud for a while now. Dan, I may need to review your latest doit.sh scripts to check for diffs in the CI. Thanks [1] http://logs.openstack.org/18/518118/6/check/tripleo-ci-centos-7-undercloud-oooq/73115d6/logs/undercloud/home/zuul/undercloud_install.log.txt.gz [2] http://logs.openstack.org/18/518118/6/check/tripleo-ci-centos-7-undercloud-oooq/73115d6/logs/undercloud/home/zuul/undercloud_reinstall.log.txt.gz#_2017-11-30_19_51_26 -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] containerized undercloud update
On 11/30/17 10:36 PM, Wesley Hayutin wrote: Greetings, Just wanted to share some progress with the containerized undercloud work. Ian pushed some of the patches along and we now have a successful undercloud install with containers. The initial undercloud install works [1] The idempotency check failed where we reinstall the undercloud [2] Question: Do we expect the reinstallation to work at this point? Should the check be turned off? Yeah, there is a bug for that [0]. Not critical to fix, though nice to have for developers. I'm used to deploying with undercloud containers, and it's a pain to do a full teardown and reinstall for each change being tested. By the way, somewhat related, I have a PoC for undercloud containers all-in-one [1], by quickstart off-road. And a few 'enabler' bug-fixes [2],[3],[4], JFYI and review please. I think an all-in-one uc may be useful for either CI or dev cases. Like for those who want to deploy *things* on top of openstack, yet suffer from healing devstack and are searching for alternatives, like packstack et al. So they may want to switch to suffering from healing tripleo (undercloud containers) instead. [0] https://bugs.launchpad.net/tripleo/+bug/1698349 [1] https://github.com/bogdando/oooq-warp/blob/master/rdocloud-guide.md [2] https://review.openstack.org/#/c/524114/ [3] https://review.openstack.org/#/c/524133/ [4] https://review.openstack.org/#/c/524187 I will try it w/o the idempotency check, I suspect I will run into errors in a full run with an overcloud deployment. I ran into issues weeks ago. I suspect if we do hit something it will be CI related as Dan Prince has been deploying the overcloud for a while now. Dan, I may need to review your latest doit.sh scripts to check for diffs in the CI.
Thanks [1] http://logs.openstack.org/18/518118/6/check/tripleo-ci-centos-7-undercloud-oooq/73115d6/logs/undercloud/home/zuul/undercloud_install.log.txt.gz [2] http://logs.openstack.org/18/518118/6/check/tripleo-ci-centos-7-undercloud-oooq/73115d6/logs/undercloud/home/zuul/undercloud_reinstall.log.txt.gz#_2017-11-30_19_51_26 -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] rename ovb jobs?
On 11/30/17 8:11 PM, Emilien Macchi wrote: A few months ago, we renamed ovb-updates to be tripleo-ci-centos-7-ovb-1ctlr_1comp_1ceph-featureset024. The name is much longer but it better describes what the job is doing. We know it's a job with one controller, one compute and one storage node, deploying the quickstart featureset n°24. For consistency, I propose that we rename all OVB jobs this way. For example, tripleo-ci-centos-7-ovb-ha-oooq would become tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001 etc. Any thoughts / feedback before we proceed? Before someone asks, I'm not in favor of renaming the multinode scenarios now, because they have become quite familiar, and it would confuse people to rename the jobs. Thanks, I'd like to see featuresets clarified in names as well, just to convey the main message w/o going into the test matrix details, like tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-ovn/ceph/k8s/tempest or whatever it is. -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [TripleO] IPSEC integration
On 11/16/17 8:01 AM, Juan Antonio Osorio wrote: Hello folks! A few months ago Dan Sneddon and I worked on an ansible role that would enable IPSEC for the overcloud [1]. Currently, one would run it as an extra step after the overcloud deployment. But, I would like to start integrating it into TripleO itself, making it another option, probably as a composable service. For this, I'm planning to move the tripleo-ipsec ansible role repository under the TripleO umbrella. Would that be fine with everyone? Or should I add this ansible role as part of another repository? After that's available and packaged in RDO, I'll then look into the actual TripleO composable service. Any input and contributions are welcome! [1] https://github.com/JAORMX/tripleo-ipsec -- Juan Antonio Osorio R. e-mail: jaosor...@gmail.com This looks very similar to the Kubespray [0] integration case. I hope that external deployment bits can be added without a hard requirement of being under the umbrella and packaged in RDO. I've tried to follow the guide [1] for adding RDO packages and the package review [2] and didn't succeed. There should be a simpler solution to host a package somewhere outside of RDO, and be able to add it for an external deployment managed by tripleo. My 2c. [0] https://github.com/kubernetes-incubator/kubespray [1] https://www.rdoproject.org/documentation/add-packages/ [2] https://bugzilla.redhat.com/show_bug.cgi?id=1482524 -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] Migrating TripleO CI in-tree tomorrow - please README
On 11/16/17 7:20 PM, Emilien Macchi wrote: TL;DR: don't approve or recheck any tripleo patch from now, until further notice on this thread. Some good progress has been made on migrating legacy tripleo CI jobs to be in-tree: https://review.openstack.org/#/q/topic:tripleo/migrate-to-zuulv3 The next steps: - Let the current gate finish running its jobs. - Stop approving patches from now, and wait for the gate to be done and cleared - Alex and I will approve the migration patches tomorrow and we hope to have them in the gate by Friday afternoon (US time) when the gate isn't busy anymore. We'll also have to backport them all. - When these patches are merged (it might take the weekend to land, depending on how busy the gate is), we'll run duplicated jobs until https://review.openstack.org/514778 is merged. I'll try to ping someone from Infra over the week-end if we can land it, that would be great. - Once https://review.openstack.org/514778 is merged, people are free to recheck or approve any patches. We hope it should happen over the weekend. - I'll continue to migrate all other tripleo projects to have the in-tree layout. On the list: t-p-e, t-i-e, paunch, os-*-config, tripleo-validations. Thanks for your help, Thank you for working on this Emilien! That's well done, and provides a good example for future use in other projects as well. -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] Upstream LTS Releases
Thank you Mathieu for the insights! To add details to what happened: * Upgrade was never made a #1 priority. It was a one man show for far too long. (myself) I suppose that confirms that upgrades are very nice to have in production deployments, eventually, maybe... (please read below to continue) * I also happen to manage and work on other priorities. * A lot of work was made to prepare for multiple versions support in our deployment tools. (we use Puppet) * A lot of work in the packaging area to speed up packaging. (we are still using deb packages but with virtualenv to stay Puppet compatible) * We need to forward-port private patches which upstream won't accept and/or are private business logic. ... yet long-term maintenance and landing fixes are the ops' *reality* and pain #1. And upgrades are only pain #2. LTS cannot directly help with #2, but only indirectly, if the vendors' downstream teams could better cooperate on #1 and have more time and resources to dedicate to #2, the upgrade stories for shipped products and distros. Let's please not lower the real value of LTS branches and not substitute #1 with #2. This topic is not about bureaucracy and policies, it is about how the community could help vendors cooperate on maintaining commodity things, with as little bureaucracy as possible, to ease the operators' pains in the end. * Our developer teams didn't have enough free cycles to work right away on the upgrade. (this means delays) * We need to test compatibility with 3rd party systems, which takes some time. (and make them compatible) This perhaps confirms why it is vital to run only 3rd party CI jobs for LTS branches? * We need to update systems over which we don't have full control. This means serious delays when it comes to deployment. * We need to test features/stability for some time in our dev environment. * We need to test features/stability for some time in our staging/pre-prod environment.
* We need to announce and inform our users at least 2 weeks in advance before performing an upgrade. * We choose to upgrade one service at a time (in all regions) to avoid a huge big-bang upgrade. (this means more maintenance windows to plan, and you can't stack them too much) * We need to swiftly respond to bugs discovered by our users. This means a change of priorities and delays in other service upgrades. * We will soon need to upgrade operating systems to support the latest OpenStack versions. (this means we have to stop OpenStack upgrades until all nodes are upgraded) It seems that the answer to the question "Why are upgrades so painful and take so much time for ops?" is "because upgrades are not the priority. Long Term Support and maintenance are". -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] Upstream LTS Releases
The concept, in general, is to create a new set of cores from these groups, and use 3rd party CI to validate patches. There are lots of details to be worked out yet, but our amazing UC (User Committee) will begin working out the details. What is most worrying is the exact "take over" process. Does it mean that the teams will give away the +2 power to a different team? Or will our (small) stable teams still be responsible for landing changes? If so, will they have to learn how to debug 3rd party CI jobs? Generally, I'm scared of both overloading the teams and losing control over quality at the same time :) Probably the final proposal will clarify it.. The quality of backported fixes is expected to be a direct (and only?) interest of those new teams of new cores, coming from users and operators and vendors. The more parties that establish their 3rd party checking jobs, the better the proposed changes get communicated, which directly affects the quality in the end. I also suppose contributors from the ops world will likely only be striving to see things getting fixed, not new features adopted by the legacy deployments they're used to maintaining. So in theory this works, and as a mainstream developer and maintainer, you need not fear losing control over LTS code :) Another question is how to avoid blocking everyone on each other, and not push contributors away when things go awry, jobs are failing and merging is blocked for a long time, or there is no consensus reached in a code review. I propose the LTS policy to enforce CI jobs be non-voting, as a first step on that way, and giving every LTS team member core rights, maybe? Not sure if that works though. -- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [tripleo] When to use parameters vs parameter_defaults
On 11/10/17 10:20 AM, Giulio Fidente wrote: On 11/26/2015 03:17 PM, Jiří Stránský wrote: On 26.11.2015 14:12, Jiří Stránský wrote: [...] It seems TripleO is hitting similar composability and sanity limits with the top-down approach, and the number of parameters which can only be fed via parameter_defaults is increasing. (The disadvantage of parameter_defaults is that, unlike hiera, we currently have no clear namespacing rules, which means a higher chance of conflict. Perhaps the unit tests suggested in another subthread would be a good start, maybe we could even think about how to do proper namespacing.) Does what I described seem somewhat accurate? Should we maybe buy into the concept of "composable templates, externally fed hierarchy-transcending parameters" for the long term? I now realized I might have used too generic or Puppetish terms in the explanation, perhaps drowning the gist of the message a bit :) What I'm suggesting is: let's consider going with parameter_defaults wherever we can, for the sake of composability, and figure out what is the best way to prevent parameter name collisions. +1 I like very much the idea of parameter_defaults + stricter namespacing rules. Specifically regarding namespaces, puppet was great but ansible doesn't seem to be as good (at least to me); in fact I think we have chances for conflicts in both THT and the ansible playbooks. Tripleo docs should have this explained, like in the ansible docs [1] [1] http://docs.ansible.com/ansible/latest/playbooks_variables.html#variable-precedence-where-should-i-put-a-variable -- Best regards, Bogdan Dobrelya, Irc #bogdando
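To make the collision concern concrete with a small, made-up example: nothing namespaces the keys under parameter_defaults, so two services exposing an identically named parameter would silently override each other in a merged environment, while service-prefixed names cannot clash:

```shell
# Hypothetical Heat environment file; the parameter names are only
# illustrations of the prefixing convention, not a recommendation of values.
cat > workers-env.yaml <<'EOF'
parameter_defaults:
  # A bare key like 'Workers' could be claimed by two services at once;
  # prefixing with the service name keeps the flat namespace conflict-free.
  NeutronWorkers: 4
  NovaWorkers: 8
EOF
```

A unit test like the one suggested in the subthread could then be as simple as asserting that no two services register the same parameter_defaults key.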
Re: [openstack-dev] [TripleO] Next steps for pre-deployment workflows (e.g derive parameters)
On 11/8/17 6:09 AM, Steven Hardy wrote: Hi all, Today I had a productive hallway discussion with jtomasek and stevebaker re $subject, so I wanted to elaborate here for the benefit of those folks not present. Hopefully we can get feedback on the ideas and see if it makes sense to continue and work on some patches: The problem under discussion is how do we run pre-deployment workflows (such as those integrated recently to calculate derived parameters, and in future perhaps also those which download container images etc), and in particular how do we make these discoverable via the UI (including any input parameters). The idea we came up with has two parts: 1. Add a new optional section to roles_data for services that require pre-deploy workflows E.g something like this: pre_deploy_workflows: - derive_params: workflow_name: tripleo.derive_params_formulas.v1.dpdk_derive_params inputs: ... This would allow us to associate a specific mistral workflow with a given service template, and also work around the fact that currently mistral inputs don't have any schema (only key/value input) as we could encode the required type and any constraints in the inputs block (clearly this could be removed in future should typed parameters become available in mistral). 2. Add a new workflow that calculates the enabled services and returns all pre_deploy_workflows This would take all enabled environments, then use heat to validate the configuration and return the merged resource registry (which will require https://review.openstack.org/#/c/509760/), then we would iterate over all enabled services in the registry and extract a given roles_data key (e.g pre_deploy_workflows) The result of the workflow would be a list of all pre_deploy_workflows for all enabled services, which the UI could then use to run the workflows as part of the pre-deploy process. If this makes sense I can go ahead and push some patches so we can iterate on the implementation? 
I apologise for a generic/non-techy comment: it would be nice to keep required workflows near the services' definition templates, to keep things as self-contained as possible. IIUC, that's covered by #1. For future steps, I'd like to see all of the "bulk processing" sit in those templates as well. Thanks, Steve -- Best regards, Bogdan Dobrelya, Irc #bogdando
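For illustration, the roles_data addition proposed in #1 might look like the fragment below. The workflow name is taken from the mail; the inputs block and its type/constraint encoding are assumptions, since that schema was still under discussion:

```shell
# Illustrative per-service fragment for the proposed 'pre_deploy_workflows'
# key (written to a file here just to keep the sketch self-contained):
cat > pre-deploy-workflows-snippet.yaml <<'EOF'
pre_deploy_workflows:
  - derive_params:
      workflow_name: tripleo.derive_params_formulas.v1.dpdk_derive_params
      inputs:
        # Type and constraints are encoded per input until Mistral grows
        # typed parameters; the input name below is hypothetical.
        num_phy_cores_per_numa_node_for_pmd:
          type: number
          constraints:
            - range: {min: 1}
EOF
```

The workflow from #2 would then merge the resource registry, collect every such `pre_deploy_workflows` entry for the enabled services, and hand the list to the UI.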
Re: [openstack-dev] [tripleo] undercloud containers with SELinux Enforcing
So the rule of thumb I propose is "if a container bind-mounts /run (/var/run), make it privileged to not mess with SELinux enforcing". I've yet to find better alternatives to allow containers to access the host sockets. Additionally, the patch allows developers of t-h-t docker/services not to guess and repeat :z flags for generic /var/lib/, /etc/puppet/, /usr/share/openstack-puppet/modules and /var/log/containers/ paths for services, as the wanted context for those will be configured in the deploy steps tasks [0] and the docker-puppet.py tool [1]. That follows DRY best. I hope that works. [0] https://review.openstack.org/#/c/513669/11/common/deploy-steps.j2 [1] https://review.openstack.org/#/c/513669/12/docker/docker-puppet.py@277 On 11/6/17 2:49 PM, Bogdan Dobrelya wrote: Hi. I've made some progress with the containerized undercloud deployment guide and SELinux enforcing (the bug [0] and the topic [1]). Although I'm now completely stuck [2] with fixing t-h-t's docker/services to nail the selinux thing fully, including the containerized *overclouds* part. The main issue is to make some of the bind-mounted host-path volumes, like /run:/run and /dev:/dev, selinux friendly. Any help is appreciated! Hello folks. I need your feedback please on SELinux fixes [0] (or rather workarounds) for the containerized undercloud feature, which is experimental in Pike. [TL;DR] The problem I'm trying to solve is primarily allowing TripleO users to follow the guide [1] w/o telling them "please disable SELinux". Especially, given the note "The undercloud is intended to work correctly with SELinux enforcing, and cannot be installed to a system with SELinux disabled". I understand that putting "chcon -Rt svirt_sandbox_file_t -l s0" (see [2]) on all of the host paths bind-mounted into containers is not secure, and from the SELinux perspective allows everything to all containers. That could be a first step for docker volumes working w/o shutting down SELinux on *hosts* though.
I plan to use the same approach for the t-h-t docker/services host-prep tasks as well. Why not use docker's :z :Z directly? IIUC, they can't be combined with other mount flags, e.g. :ro:z won't work. I look forward to better solutions and ideas! [0] https://review.openstack.org/#/q/topic:bug/1682179 [1] https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/undercloud.html [2] https://www.projectatomic.io/blog/2015/06/using-volumes-with-docker-can-cause-problems-with-selinux/ [0] https://bugs.launchpad.net/tripleo/+bug/1682179 [1] https://review.openstack.org/#/q/topic:bug/1682179 [2] https://review.openstack.org/#/c/517383/ -- Best regards, Bogdan Dobrelya, Irc #bogdando
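As a reference sketch of the two labeling approaches discussed (the path is an example; the commands need root on an SELinux-enabled host with Docker, so they are guarded to be harmless elsewhere). One nuance worth noting: Docker's volume options are comma-separated, so combining read-only with a shared relabel is written :ro,z, whereas the colon-chained :ro:z form from the mail is indeed rejected:

```shell
# Approach taken by the patch: pre-label a host path for container access.
# Permissive: any container may then use the path (svirt_sandbox_file_t, s0).
if command -v chcon >/dev/null && [ -d /var/log/containers ]; then
  chcon -Rt svirt_sandbox_file_t -l s0 /var/log/containers || true
fi

# Docker-side relabeling on mount: options combine with commas, so a
# read-only shared-label bind mount is ':ro,z', not ':ro:z':
if command -v docker >/dev/null; then
  docker run --rm -v /var/log/containers:/var/log/containers:ro,z \
      centos:7 ls /var/log/containers || true
fi
```

The trade-off is the one the mail describes: pre-labeling with chcon is a one-time, coarse-grained grant to all containers, while per-mount :z/:Z relabels lazily but has to be repeated on every volume definition.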
Re: [openstack-dev] [tripleo] undercloud containers with SELinux Enforcing
Hi. I've made some progress with the containerized undercloud deployment guide and SELinux enforcing (the bug [0] and the topic [1]). Although I'm now completely stuck [2] with fixing t-h-t's docker/services to nail the selinux thing fully, including the containerized *overclouds* part. The main issue is to make some of the bind-mounted host-path volumes, like /run:/run and /dev:/dev, selinux friendly. Any help is appreciated! Hello folks. I need your feedback please on SELinux fixes [0] (or rather workarounds) for the containerized undercloud feature, which is experimental in Pike. [TL;DR] The problem I'm trying to solve is primarily allowing TripleO users to follow the guide [1] w/o telling them "please disable SELinux". Especially, given the note "The undercloud is intended to work correctly with SELinux enforcing, and cannot be installed to a system with SELinux disabled". I understand that putting "chcon -Rt svirt_sandbox_file_t -l s0" (see [2]) on all of the host paths bind-mounted into containers is not secure, and from the SELinux perspective allows everything to all containers. That could be a first step for docker volumes working w/o shutting down SELinux on *hosts* though. I plan to use the same approach for the t-h-t docker/services host-prep tasks as well. Why not use docker's :z :Z directly? IIUC, they can't be combined with other mount flags, e.g. :ro:z won't work. I look forward to better solutions and ideas!
[0] https://review.openstack.org/#/q/topic:bug/1682179 [1] https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/undercloud.html [2] https://www.projectatomic.io/blog/2015/06/using-volumes-with-docker-can-cause-problems-with-selinux/ [0] https://bugs.launchpad.net/tripleo/+bug/1682179 [1] https://review.openstack.org/#/q/topic:bug/1682179 [2] https://review.openstack.org/#/c/517383/ -- Best regards, Bogdan Dobrelya, Irc #bogdando __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [TripleO] containerized undercloud in Queens
On 11/6/17 1:01 AM, Emilien Macchi wrote: On Mon, Oct 2, 2017 at 5:02 AM, Dan Prince <dpri...@redhat.com> wrote: [...]

- CI resources: better use of CI resources. At the PTG we received feedback from the OpenStack infrastructure team that our upstream CI resource usage is quite high at times (even as high as 50% of the total). Because of the shared framework and single-node capabilities we can re-architect much of our upstream CI matrix around single-node jobs. We no longer require multinode jobs to be able to test many of the services in tripleo-heat-templates... we can just use a single cloud VM instead. We'll still want multinode undercloud -> overcloud jobs for testing things like HA and baremetal provisioning. But we can cover a large set of the services (in particular many of the new scenario jobs we added in Pike) with single-node CI test runs in much less time.

After the last (terrible) weeks in CI, it's pretty clear we need to find a solution to reduce and optimize our testing. I'm now really convinced we should switch our current scenario jobs to NOT deploy the overcloud, and deploy just an undercloud with composable services & run tempest.

+1 And we should start using the quickstart-extras undercloud-deploy role for that.

Benefits:
- deploy 1 node instead of 2, so we save nodepool resources
- faster (no overcloud)
- reduced gate queue time, faster development process, faster CI

Challenges:
- keep overcloud testing, with OVB
- reduce OVB to the strict minimum: Ironic, Nova, Mistral and basic containerized services on the overcloud

I really want to get consensus on these points, please raise your voice now before we engage some work on that front. [...]

-- Best regards, Bogdan Dobrelya, Irc #bogdando
[openstack-dev] [tripleo][containers] Please always do mirrored hiera changes for (non)containerized cases
Hi folks. When changing hiera defaults in the instack repo or elsewhere, please do "mirrored" change requests to make sure things stay consistent for containerized overclouds, undercloud containers, and those "golden images" that instack produces in the end, IIUC. Please take a look at a perfect example with all mirrors in place [0], and a few more examples of (probably) missing mirrored changes [1], [2].

PS. I wonder how we could automate the check for "missing mirrors" between hiera data living in instack and other places?

[0] https://review.openstack.org/#/q/topic:bug/1729293
[1] https://review.openstack.org/#/c/515123/
[2] https://review.openstack.org/#/c/516990/

-- Best regards, Bogdan Dobrelya, Irc #bogdando
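The PS above asks how a "missing mirrors" check could be automated. A minimal sketch of the idea, assuming flat "key: value" hiera documents (the zaqar keys below are made-up examples; a real check would use a proper YAML parser and walk the actual instack and t-h-t hiera locations):

```python
# Naive "missing mirrors" check: compare top-level keys of two flat
# hiera YAML documents, stdlib only. A real implementation would use a
# YAML parser and the real file paths on both sides.

def hiera_keys(text):
    """Collect top-level 'key: value' keys from a flat hiera document."""
    keys = set()
    for line in text.splitlines():
        if not line or line[0] in ('#', ' ', '-'):
            continue
        if ': ' in line:
            # split on ': ' (not ':') so 'foo::bar: 1' yields 'foo::bar'
            keys.add(line.split(': ', 1)[0])
    return keys

def missing_mirrors(instack_hiera, other_hiera):
    """Keys set on the instack side but absent from the mirrored side."""
    return sorted(hiera_keys(instack_hiera) - hiera_keys(other_hiera))

# Hypothetical example data, not real instack/t-h-t contents:
instack = "zaqar::max_messages_post_size: 1048576\nzaqar::notification_enabled: true\n"
mirrored = "zaqar::max_messages_post_size: 1048576\n"
print(missing_mirrors(instack, mirrored))
```

A CI job running something like this over both repos could flag unmirrored hiera changes before they merge.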
Re: [openstack-dev] Logging format: let's discuss a bit about default format, format configuration and so on
Hi. A few comments inline.

Dear Stackers,

While working on my locally deployed OpenStack (Pike using TripleO), I was struggling a bit with the logging part. Currently, all logs are pushed to per-service files, in the standard "one line per entry" format, as plain flat text. That's nice, but if one wants to push and index those logs in an ELK stack, the current default format isn't really good.

After some discussions about oslo.log, it appears it provides a nice JSONFormatter handler¹ one might want to use for each (Python) service using the oslo central library. A JSON format is really cool, as it's easy to parse for machines, and an entry can span multiple lines without any issue - this is especially important for stack traces, whose output is multi-line with no real common delimiter that would let us re-format it and feed it to any log parser (logstash, fluentd, …).

After some more talks, oslo.log will not provide a unified interface to output all received logs as JSON - this makes sense, as that would mean "rewriting almost all of the Python logging management interface"², and that's pretty useless, since (all?) services have their own "logging.conf" file.

That said… to the main purpose of this mail:

- Default format for logs

A first question would be "are we all OK with the default output format"? I'm pretty sure "humans" are happy with it, as it's really convenient to read and grep. But on a "standard" OpenStack deploy, I'm pretty sure one does not have only one controller, one ceph node and one compute. Hence comes log centralization, and with that, log indexing and processing. For that, one might argue "I'm using plain files on my logger, and grep -r in them". That's one way to do things, and for that, plain flat logs are great. But… I'm pretty sure I'm not the only one wanting to use some kind of ELK cluster for that kind of purpose. So the right question is: what about switching the default log format to JSON?
On my part, I don't see "cons", only "pros", but my judgment is of course biased, as I'm "alone in my corner". But what about you, Community?

- Provide a way to configure the output format/handler

While poking around in the puppet module code, I didn't find any way to set the output handler for the logs. For example, in puppet-nova³ we can set a lot of things, but not the handler for the output. It would be really cool if each puppet module gained the capability to set the handler, so that one can just push some stuff in hiera and voilà, we have JSON logs. Doing so would allow people to choose between the default (current) output and something more "computable".

This is being implemented (for Queens) in the scope of this blueprint [0]. By default, containers will keep the current logging behaviour, which is writing logs into bind-mounted host-path directories and, for those services that can only syslog, sending the logs to the host journald via syslog (bind-mounted /dev/log). And if the stdout/stderr feature is enabled, side-car containers will make sure everything is captured in journald and tagged properly, with multiline entries handled as well. And journald can dump messages as JSON, too. I hope that works! Please send your comments to the aforementioned bp's spec.

[0] https://blueprints.launchpad.net/tripleo/+spec/logging-stdout-rsyslog

Of course, either proposal will require a nice code change in all puppet modules (add a new parameter for the foo::logging class, use that new param in the configuration file, and so on), but at least people will actually be able to choose. So, before opening an issue on each Launchpad project (that would be… long), I'd rather open the discussion here and, eventually, come to some nice, acceptable and accepted solution that would make the OpenStack Community happy :).

Any thoughts? Thank you for your attention, feedback and wonderful support for that monster project :).

Cheers, C.
¹ https://github.com/openstack/oslo.log/blob/master/oslo_log/formatters.py#L166-L235
² http://eavesdrop.openstack.org/irclogs/%23openstack-oslo/%23openstack-oslo.2017-11-01.log.html#t2017-11-01T13:23:14
³ https://github.com/openstack/puppet-nova/blob/master/manifests/logging.pp

-- Cédric Jeanneret Senior Linux System Administrator Infrastructure Solutions Camptocamp SA PSE-A / EPFL, 1015 Lausanne Phone: +41 21 619 10 32 Office: +41 21 619 10 02 Email: cedric.jeanne...@camptocamp.com

-- Best regards, Bogdan Dobrelya, Irc #bogdando
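For illustration, the one-JSON-object-per-record idea that keeps multi-line tracebacks machine-parseable can be sketched with the stdlib alone. This is not oslo.log's actual JSONFormatter (see ¹ above for that), just a minimal stand-in showing the principle:

```python
import io
import json
import logging
import traceback

class JsonFormatter(logging.Formatter):
    """Minimal stand-in for the oslo.log JSONFormatter idea: emit each
    record as a single JSON object, so a multi-line traceback needs no
    special delimiter for logstash/fluentd to pick it up whole."""

    def format(self, record):
        entry = {
            'name': record.name,
            'levelname': record.levelname,
            'message': record.getMessage(),
        }
        if record.exc_info:
            # the whole traceback travels inside one JSON entry
            entry['traceback'] = traceback.format_exception(*record.exc_info)
        return json.dumps(entry)

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
log = logging.getLogger('demo')
log.addHandler(handler)
log.propagate = False
log.setLevel(logging.INFO)

try:
    raise ValueError('boom')
except ValueError:
    log.exception('query failed')

# each line of output is one machine-parseable JSON document
entry = json.loads(stream.getvalue())
print(entry['levelname'], entry['message'])
```

With oslo.log itself, the equivalent effect is selecting the JSONFormatter in each service's logging.conf, which is exactly the per-module knob the thread discusses.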
Re: [openstack-dev] [TripleO][containers]ironic-inspector
On 11/1/17 7:04 PM, milanisko k wrote: Folks, I've got a dilemma right now about how to proceed with containerising ironic-inspector:

Fat Container: put ironic-inspector and dnsmasq into a single container, i.e. consider a container as a complete inspection service shipping unit; use supervisord inside to fork and monitor both services.

Pros:
* decoupling: inspector's dnsmasq isn't used by any other service, which makes our life simpler as we won't need to reset the dnsmasq configuration in case inspector dies (to avoid exposing an unfiltered DHCP service)
* we can use the dnsmasq filter driver (an on-line configuration file updating facility) to limit access to dnsmasq instead of iptables, in a self-contained "package" that is configured to work together as a single unit
* we don't have to worry about always scheduling the dnsmasq and inspector containers on a single node (both services are bundled)
* we get a *Spine-Leaf deployment capable & containerised undercloud*
* an *HA capable inspector* service to be reused in the overcloud
* an integrated solution, tested to work by upstream CI in inspector (compatibility, versioning, configuration, ...)

Cons:
* inflexibility: the container has to be rebuilt to be used with a different DHCP service (filter driver)
* supervisord dependency and the need to refactor the current inspector container

Flat Container: put inspector and dnsmasq into separate containers and use the (current) iptables driver to protect dnsmasq. IIRC this is the current approach.

Pros:
* containerised undercloud

Cons:
* no decoupling of dnsmasq
* no spine-leaf (iptables)
* containers have to be scheduled together on a single node
* no HA (iptables driver)
* the container won't be cool for the overcloud as it won't be HA

Flat container with dnsmasq filter driver: same as above, but iptables isn't used anymore.
Since it's not the current approach, we'd have to reshape the dnsmasq and inspector containers to expose each other's configuration so that inspector can write the dnsmasq configuration on the fly (does inotify work in the mounted-directories case???).

Pros:
* containerised undercloud
* Spine-Leaf

Cons:
* no (easy) HA (dnsmasq would be exposed in case inspector died)

Could it be managed by pacemaker bundles then?

* no decoupling of dnsmasq (shared between multiple services)

A dedicated side-car container can be used, just like the logging bp [0] implements it. So nothing would be shared then.

[0] https://blueprints.launchpad.net/tripleo/+spec/logging-stdout-rsyslog

* containers to be reshaped to expose the configuration

Seems like this is inevitable anyway.

* overcloud-uncool container (lack of HA)

Could it be managed by pacemaker bundles then?

No Container: we ship inspector as a service and configure dnsmasq to be shut down in case inspector dies (to prevent exposing an unfiltered DHCP service). We use the dnsmasq (configuration) filter driver to have a Spine-Leaf deployment capable undercloud.

Pros:
* Spine-Leaf

Cons:
* no HA inspector (shared dnsmasq?)
* no containers
* no reusable container for the overcloud
* we'd have to update the undercloud systemd to shut down dnsmasq in case inspector dies, if we want to use the dnsmasq filter driver
* no decoupling

The Question: what is your take on it?

Cheers, milan

-- Best regards, Bogdan Dobrelya, Irc #bogdando
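For the "Fat Container" option above, the supervisord piece could look roughly like this. A sketch only: the paths and config-file names are hypothetical, and the "stop dnsmasq when inspector dies" behaviour would still need a supervisord event listener, which is not shown here:

```
[supervisord]
nodaemon=true

[program:ironic-inspector]
command=/usr/bin/ironic-inspector --config-file /etc/ironic-inspector/inspector.conf
priority=10
autorestart=true

[program:dnsmasq]
; dnsmasq must stay in the foreground for supervisord to monitor it
command=/usr/sbin/dnsmasq --keep-in-foreground --conf-file=/etc/ironic-inspector/dnsmasq.conf
priority=20
autorestart=true
```

Whether such an integrated unit is worth the rebuild-inflexibility trade-off is exactly the dilemma laid out above.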
Re: [openstack-dev] [tripleo] Blocking gate - do not recheck / rebase / approve any patch now (please)
Thank you for working on this! I know it is needed to unblock development of TripleO. I have, though, a few comments inline.

On 10/26/17 6:14 AM, Emilien Macchi wrote: On Wed, Oct 25, 2017 at 1:59 PM, Emilien Macchi <emil...@redhat.com> wrote: Quick update before being afk for some hours:

- Still trying to land https://review.openstack.org/#/c/513701 (thanks Paul for promoting it in gate). Landed.
- Disabling voting on scenario001 and scenario004 container jobs: https://review.openstack.org/#/c/515188/ Done, please be very careful while these jobs are not voting. If any doubt, please ping me or fultonj or gfidente on #tripleo.
- overcloudrc/keystone v2 workaround: https://review.openstack.org/#/c/515161/ (d0ugal will work on a proper fix for https://bugs.launchpad.net/tripleo/+bug/1727454) Merged - Dougal will work on the real fix this week, but it's not urgent anymore.
- Fixing zaqar/notification issues on https://review.openstack.org/#/c/515123 - we hope that helps reduce some failures in gate. In gate right now and hopefully merged in less than 2 hours. Otherwise, please keep rechecking it. According to Thomas Hervé, it will reduce the chance of timeouts.
- puppet-tripleo gate broken on stable branches (syntax jobs not running properly) - jeblair is looking at it now. jeblair will provide a fix hopefully this week but this is not critical at this time. Thanks Jim for your help.

Once again, we'll need to retrospect and see why we reached that terrible state, but let's focus on bringing our CI back into good shape. Thanks a ton to everyone who is involved. I'm now restoring all patches that I killed from the gate. You can now recheck / rebase / approve what you want, but please save our CI resources and do it with moderation. We are not done yet. I won't call victory, but we've merged almost all our blockers; one is missing but currently in gate: https://review.openstack.org/515123 - needs babysitting until merged.
I have to warn TripleO folks about instack-only changes these days. Please make sure each instack-only change, like hiera overrides, has follow-up patches for the containerized cases as well, which do not use instack. Otherwise, we put the whole containers effort at high risk of reintroducing regressions that were already fixed for the non-container case. That is dangerous, given that we disable voting for it from time to time.

For this particular case, please add it in a separate review in puppet/services/zaqar*. Thanks @bandini for confirming that on IRC.

Now let's see how RDO promotion works. We're close :-) Thanks everyone.

On Wed, Oct 25, 2017 at 7:25 AM, Emilien Macchi <emil...@redhat.com> wrote: Status:
- The Heat Convergence switch *might* be a reason why overclouds time out so much. Thomas proposed to disable it: https://review.openstack.org/515077
- Every time a patch fails in the tripleo gate queue, it resets the gate. I proposed to remove this common queue: https://review.openstack.org/515070
- I cleared the patches in the check and gate queues to make sure the 2 blockers are tested and can be merged in priority. I'll keep an eye on it today. Any help is very welcome.

On Wed, Oct 25, 2017 at 5:58 AM, Emilien Macchi <emil...@redhat.com> wrote: We have been working very hard to get a package/container promotion (for 44 days now) and our blocker is https://review.openstack.org/#/c/513701/. Because the gate queue is huge, we decided to block the gate and kill all the jobs running there until we can get https://review.openstack.org/#/c/513701/ and its backport https://review.openstack.org/#/c/514584 merged (both are blocking the whole production chain). We hope to promote after these 2 patches, unless there is something else; in that case we would iterate to the next problem. We hope you understand and support us during this effort. So please do not recheck, rebase or approve any patch until further notice.
Thank you, -- Emilien Macchi

-- Best regards, Bogdan Dobrelya, Irc #bogdando
[openstack-dev] [tripleo] undercloud containers with SELinux Enforcing
Hello folks. I need your feedback please on SELinux fixes [0] (or rather workarounds) for the containerized undercloud feature, which is experimental in Pike.

[TL;DR] The problem I'm trying to solve is primarily allowing TripleO users to follow the guide [1] w/o telling them "please disable SELinux". Especially given the note "The undercloud is intended to work correctly with SELinux enforcing, and cannot be installed to a system with SELinux disabled".

I understand that putting "chcon -Rt svirt_sandbox_file_t -l s0" (see [2]) on all of the host paths bind-mounted into containers is not secure, and from the SELinux perspective allows everything to all containers. It could be a first step towards docker volumes working w/o shutting down SELinux on *hosts*, though. I plan to use the same approach for the t-h-t docker/services host-prep tasks as well.

Why not use docker's :z/:Z directly? IIUC, they cannot be combined with other mount flags, e.g. :ro:z won't work. I look forward to better solutions and ideas!

[0] https://review.openstack.org/#/q/topic:bug/1682179
[1] https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/undercloud.html
[2] https://www.projectatomic.io/blog/2015/06/using-volumes-with-docker-can-cause-problems-with-selinux/

-- Best regards, Bogdan Dobrelya, Irc #bogdando
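To make the chcon approach concrete, an illustrative transcript only - the paths and image name are made-up examples, not the actual host-prep tasks from [0]:

```
# Label a host path so that *any* container may use it; the shared
# level s0 is what makes this SELinux-permissive across containers:
chcon -Rt svirt_sandbox_file_t -l s0 /var/lib/my-config-data

# A read-only bind mount of that path then works under Enforcing,
# with no :z/:Z relabeling flag needed on the mount itself:
docker run -v /var/lib/my-config-data:/etc/myservice:ro myservice-image
```

Note that :z itself applies the same shared labeling, while :Z assigns a private per-container category; newer Docker releases appear to accept comma-combined options such as :ro,z, though IIUC that did not apply to the versions targeted here.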
Re: [openstack-dev] [tc] [stable] [tripleo] [kolla] [ansible] [puppet] Proposing changes in stable policy for installers
On 10/17/17 12:50 AM, Michał Jastrzębski wrote:
> So my 0.02$
>
> The problem with handling Newton goes beyond deployment tools. Yes, it's popular to use, but if our dependencies (the openstack services themselves) are unmaintained, so should we be. If we say "we support Newton" in deployment tools, we make a kind of promise we can't keep. If, for example, there is a CVE in Nova that affects Newton, there is nothing we can do about it and our "support" is meaningless.
>
> Not having an LTS kind of model has been an issue for OpenStack operators forever, but that's not a problem we can solve in deployment tools (although we are often asked for that, because our communities are largely operators and we're arguably the projects closest to operators).
>
> I, for one, think we should keep the current stable policy, not make an exception for deployment tools, and address this issue across the board. What Emilien is describing is a real issue that hurts operators.

I agree we should keep the current stable policy and never backport features. It adds too much maintenance overhead, high risk of breaking changes, and requires a lot of additional testing. So deployment tools, if they want features backported, should not hold that stable policy tag.

> On 16 October 2017 at 15:38, Emilien Macchi <emil...@redhat.com> wrote:
>> On Mon, Oct 16, 2017 at 4:27 AM, Thierry Carrez <thie...@openstack.org> wrote:
>>> Emilien Macchi wrote:
>>>> [...]
>>>> ## Proposal
>>>>
>>>> Proposal 1: create a new policy that fits for projects like installers.
>>>> I kicked off something here: https://review.openstack.org/#/c/511968/ (open for feedback).
>>>> Content can be read here: http://docs-draft.openstack.org/68/511968/1/check/gate-project-team-guide-docs-ubuntu-xenial/1a5b40e//doc/build/html/stable-branches.html#support-phases
>>>> Tag created here: https://review.openstack.org/#/c/511969/ (same, please review).
>>>> The idea is really to not touch the current stable policy and create a new one, more "relax", that suits well for projects like installers.
>>>>
>>>> Proposal 2: change the current policy and be more relax for projects like installers.
>>>> I haven't worked on this proposal while it was something I was considering doing first, because I realized it could bring confusion in which projects actually follow the real stable policy and the ones who have exceptions. That's why I thought having a dedicated tag would help to separate them.
>>>>
>>>> Proposal 3: no change anywhere; projects like installers can't claim the stability etiquette (not my best option in my opinion).
>>>>
>>>> Anyway, feedback is welcome, I'm now listening. If you work on Kolla, TripleO, OpenStack-Ansible, PuppetOpenStack (or any project who has this need), please get involved in the review process.
>>>
>>> My preference goes to proposal 1, however rather than call it "relaxed" I would make it specific to deployment/lifecycle or cycle-trailing projects.
>>>
>>> Ideally this policy could get adopted by any such project. The discussion started on the review and it's going well, so let's see where it goes :)
>>
>> Thierry, when I read your comment on Gerrit I understand you prefer to amend the existing policy and just make a note for installers (which is, I think, option #2 that I proposed). Can you please confirm that?
>> So far I see option #1 has large consensus here; I'll wait for Thierry's answer to continue to work on it.
>>
>> Thanks for the feedback so far!
>> --
>> Emilien Macchi

-- Best regards, Bogdan Dobrelya, Irc #bogdando
Re: [openstack-dev] [TripleO] containerized undercloud in Queens
On 10/3/17 10:46 PM, Dan Prince wrote:
> This reduces our complexity greatly I think in that once it is completed will allow us to eliminate two projects (instack and instack-undercloud) and the maintenance thereof. Furthermore, as this dovetails nicely with the Ansible
>
> IMHO doit.sh is not acceptable as an undercloud installer and this is what I've been trying to point out as the actual impact to the end user who has to use this thing.
>
> doit.sh is an example of where the effort is today. It is essentially the same stuff we document online here: http://tripleo.org/install/containers_deployment/undercloud.html.
>
> Similar to quickstart it is just something meant to help you set up a dev environment.

Please note that quickstart can "doit.sh" [0] as well, and even more :) Though the undercloud_deploy role maybe still needs to be supported in quickstart.sh, and its documentation [1] should explain the case better. The undercloud_install_script variable and the script template itself fully address the flexibility needed by developers and operators. You can git clone / pip install things, or not. It follows the standard quickstart way, which is jinja-templating bash scripts in order to provide an operator-friendly way, independent from Ansible et al, to reproduce the scripted steps. And it helps to auto-generate documentation as well.

[0] https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/tree/roles/undercloud-deploy/README.md
[1] https://docs.openstack.org/tripleo-quickstart/latest/

-- Best regards, Bogdan Dobrelya, Irc #bogdando
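As a sketch of what that flexibility looks like from the operator side - the value and path below are made-up examples; only the undercloud_install_script knob itself comes from the role's README [0]:

```
# quickstart extra-vars sketch: point the undercloud-deploy role at a
# custom jinja-templated install script (path is a hypothetical example)
undercloud_install_script: "{{ playbook_dir }}/custom-undercloud-install.sh.j2"
```

The rendered script is then plain bash, so an operator can inspect it or re-run it entirely outside of Ansible.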
Re: [openstack-dev] [tripleo] Install Kubernetes in the overcloud using TripleO
On 08.06.2017 18:36, Flavio Percoco wrote:
> Hey y'all,
>
> Just wanted to give an update on the work around tripleo+kubernetes. This is still far in the future, but as we move tripleo to containers using docker-cmd, we're also working on the final goal, which is to have it run these containers on kubernetes.
>
> One of the first steps is to have TripleO install Kubernetes on the overcloud nodes, and I've moved forward with this work:
>
> https://review.openstack.org/#/c/471759/
>
> The patch depends on the `ceph-ansible` work and it uses the mistral-ansible action to deploy kubernetes by leveraging kargo. As it is, the patch doesn't quite work, as it requires some files to be in some places (ssh keys) and a couple of other things. None of these "things" are blockers, as in they can be solved by just sending some patches here and there.
>
> I thought I'd send this out as an update and to request some early feedback on the direction of this patch. The patch, of course, works in my local environment ;)

Note that Kubespray (formerly Kargo) now supports the kubeadm tool natively [0]. This speeds up cluster bootstrapping from an average of 25-30 minutes to around 9. I believe this makes Kubespray a viable option for upstream development of OpenStack overclouds managed by K8s, especially bearing in mind the deployment-time effort and all the hard work done by the tripleo and infra teams to shorten CI job times.

By the way, here is a package review [1] for adding a kubespray-ansible library - just the ansible roles and playbooks - to RDO. I'd appreciate some help with moving this forward, like choosing another place to host the package; it got stuck a little bit.
[0] https://github.com/kubernetes-incubator/kubespray/issues/553
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1482524

> Flavio

-- Best regards, Bogdan Dobrelya, Irc #bogdando