Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-12-03 Thread Bogdan Dobrelya

On 12/3/18 10:34 AM, Bogdan Dobrelya wrote:

Hi Kevin.
Puppet not only creates config files but also executes service-dependent 
steps, like db sync, so neither '[base] -> [puppet]' nor 
'[base] -> [service]' would be enough on its own. That requires some 
service-specific code to be included in the *config* images as well.


PS. There is a related spec [0] created by Dan, please take a look and 
provide your feedback


[0] https://review.openstack.org/620062


I'm terribly sorry, here is the corrected link [0] to that spec.

[0] https://review.openstack.org/620909



On 11/30/18 6:48 PM, Fox, Kevin M wrote:

Still confused by:
[base] -> [service] -> [+ puppet]
not:
[base] -> [puppet]
and
[base] -> [service]
?

Thanks,
Kevin

From: Bogdan Dobrelya [bdobr...@redhat.com]
Sent: Friday, November 30, 2018 5:31 AM
To: Dan Prince; openstack-dev@lists.openstack.org; 
openstack-disc...@lists.openstack.org
Subject: Re: [openstack-dev] [TripleO][Edge] Reduce base layer of 
containers for security and size of images (maintenance) sakes


On 11/30/18 1:52 PM, Dan Prince wrote:

On Fri, 2018-11-30 at 10:31 +0100, Bogdan Dobrelya wrote:

On 11/29/18 6:42 PM, Jiří Stránský wrote:

On 28. 11. 18 18:29, Bogdan Dobrelya wrote:

On 11/28/18 6:02 PM, Jiří Stránský wrote:




Reiterating again on previous points:

-I'd be fine removing systemd. But let's do it properly and
not via 'rpm
-ev --nodeps'.
-Puppet and Ruby *are* required for configuration. We can
certainly put
them in a separate container outside of the runtime service
containers
but doing so would actually cost you much more
space/bandwidth for each
service container. As both of these have to get downloaded to
each node
anyway in order to generate config files with our current
mechanisms
I'm not sure this buys you anything.


+1. I was actually under the impression that we concluded
yesterday on
IRC that this is the only thing that makes sense to seriously
consider.
But even then it's not a win-win -- we'd gain some security by
leaner
production images, but pay for it with space+bandwidth by
duplicating
image content (IOW we can help achieve one of the goals we had
in mind
by worsening the situation w/r/t the other goal we had in
mind.)

Personally i'm not sold yet but it's something that i'd
consider if we
got measurements of how much more space/bandwidth usage this
would
consume, and if we got some further details/examples about how
serious
are the security concerns if we leave config mgmt tools in
runtime
images.

IIRC the other options (that were brought forward so far) were
already
dismissed in yesterday's IRC discussion and on the reviews.
Bin/lib bind
mounting being too hacky and fragile, and nsenter not really
solving the
problem (because it allows us to switch to having different
bins/libs
available, but it does not allow merging the availability of
bins/libs
from two containers into a single context).


We are going in circles here I think


+1. I think too much of the discussion focuses on "why it's bad
to have
config tools in runtime images", but IMO we all sorta agree
that it
would be better not to have them there, if it came at no cost.

I think to move forward, it would be interesting to know: if we
do this
(i'll borrow Dan's drawing):


|base container| --> |service container| --> |service container w/ Puppet installed|

How much more space and bandwidth would this consume per node
(e.g.
separately per controller, per compute). This could help with
decision
making.


As I've already evaluated in the related bug, that is:

puppet-* modules and manifests ~ 16MB
puppet with dependencies ~61MB
dependencies of the seemingly largest dependency, systemd
~190MB

that would be an extra layer size for each of the container
images to be
downloaded/fetched into registries.


Thanks, i tried to do the math of the reduction vs. inflation in
sizes
as follows. I think the crucial point here is the layering. If we
do
this image layering:


|base| --> |+ service| --> |+ Puppet|


we'd drop ~267 MB from base image, but we'd be installing that to
the
topmost level, per-component, right?


Given we detach systemd from puppet, cronie et al., that would be 
267-190 MB, so the math below would look much better


Would it be worth writing a spec that summarizes what action items are 
being taken to optimize our base image with regard to systemd?


Perhaps it would be. But honestly, I see nothing big enough to require a 
full-blown spec: it is just a matter of changing RPM deps and layers for 
container images. I'm tracking the systemd changes here [0],[1],[2], btw 
(if accepted, it should be working as of Fedora 28 (or 29), I hope).

[0] https://review.rdoproject.org/r/#/q/topic:base-container-reduction
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1654659
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1654672




It seems like the general consensus is that cleaning up some of the RPM
dependencies so that we don't install systemd is the biggest win.

Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-12-03 Thread Bogdan Dobrelya

Hi Kevin.
Puppet not only creates config files but also executes service-dependent 
steps, like db sync, so neither '[base] -> [puppet]' nor 
'[base] -> [service]' would be enough on its own. That requires some 
service-specific code to be included in the *config* images as well.


PS. There is a related spec [0] created by Dan, please take a look and 
provide your feedback


[0] https://review.openstack.org/620062

On 11/30/18 6:48 PM, Fox, Kevin M wrote:

Still confused by:
[base] -> [service] -> [+ puppet]
not:
[base] -> [puppet]
and
[base] -> [service]
?

Thanks,
Kevin

From: Bogdan Dobrelya [bdobr...@redhat.com]
Sent: Friday, November 30, 2018 5:31 AM
To: Dan Prince; openstack-dev@lists.openstack.org; 
openstack-disc...@lists.openstack.org
Subject: Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers 
for security and size of images (maintenance) sakes

On 11/30/18 1:52 PM, Dan Prince wrote:

On Fri, 2018-11-30 at 10:31 +0100, Bogdan Dobrelya wrote:

On 11/29/18 6:42 PM, Jiří Stránský wrote:

On 28. 11. 18 18:29, Bogdan Dobrelya wrote:

On 11/28/18 6:02 PM, Jiří Stránský wrote:




Reiterating again on previous points:

-I'd be fine removing systemd. But let's do it properly and
not via 'rpm
-ev --nodeps'.
-Puppet and Ruby *are* required for configuration. We can
certainly put
them in a separate container outside of the runtime service
containers
but doing so would actually cost you much more
space/bandwidth for each
service container. As both of these have to get downloaded to
each node
anyway in order to generate config files with our current
mechanisms
I'm not sure this buys you anything.


+1. I was actually under the impression that we concluded
yesterday on
IRC that this is the only thing that makes sense to seriously
consider.
But even then it's not a win-win -- we'd gain some security by
leaner
production images, but pay for it with space+bandwidth by
duplicating
image content (IOW we can help achieve one of the goals we had
in mind
by worsening the situation w/r/t the other goal we had in
mind.)

Personally i'm not sold yet but it's something that i'd
consider if we
got measurements of how much more space/bandwidth usage this
would
consume, and if we got some further details/examples about how
serious
are the security concerns if we leave config mgmt tools in
runtime
images.

IIRC the other options (that were brought forward so far) were
already
dismissed in yesterday's IRC discussion and on the reviews.
Bin/lib bind
mounting being too hacky and fragile, and nsenter not really
solving the
problem (because it allows us to switch to having different
bins/libs
available, but it does not allow merging the availability of
bins/libs
from two containers into a single context).


We are going in circles here I think


+1. I think too much of the discussion focuses on "why it's bad
to have
config tools in runtime images", but IMO we all sorta agree
that it
would be better not to have them there, if it came at no cost.

I think to move forward, it would be interesting to know: if we
do this
(i'll borrow Dan's drawing):


|base container| --> |service container| --> |service container w/ Puppet installed|

How much more space and bandwidth would this consume per node
(e.g.
separately per controller, per compute). This could help with
decision
making.


As I've already evaluated in the related bug, that is:

puppet-* modules and manifests ~ 16MB
puppet with dependencies ~61MB
dependencies of the seemingly largest dependency, systemd
~190MB

that would be an extra layer size for each of the container
images to be
downloaded/fetched into registries.


Thanks, i tried to do the math of the reduction vs. inflation in
sizes
as follows. I think the crucial point here is the layering. If we
do
this image layering:


|base| --> |+ service| --> |+ Puppet|


we'd drop ~267 MB from base image, but we'd be installing that to
the
topmost level, per-component, right?


Given we detach systemd from puppet, cronie et al., that would be 
267-190 MB, so the math below would look much better


Would it be worth writing a spec that summarizes what action items are 
being taken to optimize our base image with regard to systemd?


Perhaps it would be. But honestly, I see nothing big enough to require a 
full-blown spec: it is just a matter of changing RPM deps and layers for 
container images. I'm tracking the systemd changes here [0],[1],[2], btw 
(if accepted, it should be working as of Fedora 28 (or 29), I hope).

[0] https://review.rdoproject.org/r/#/q/topic:base-container-reduction
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1654659
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1654672




It seems like the general consensus is that cleaning up some of the RPM
dependencies so that we don't install systemd is the biggest win.

What confuses me is why there are still patches posted to move Puppet
out of the base layer when we agree moving it out of the base layer
would actually cause our resulting container image set to be larger in size.

Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-30 Thread Fox, Kevin M
Still confused by:
[base] -> [service] -> [+ puppet]
not:
[base] -> [puppet]
and
[base] -> [service]
?

Thanks,
Kevin

From: Bogdan Dobrelya [bdobr...@redhat.com]
Sent: Friday, November 30, 2018 5:31 AM
To: Dan Prince; openstack-dev@lists.openstack.org; 
openstack-disc...@lists.openstack.org
Subject: Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers 
for security and size of images (maintenance) sakes

On 11/30/18 1:52 PM, Dan Prince wrote:
> On Fri, 2018-11-30 at 10:31 +0100, Bogdan Dobrelya wrote:
>> On 11/29/18 6:42 PM, Jiří Stránský wrote:
>>> On 28. 11. 18 18:29, Bogdan Dobrelya wrote:
>>>> On 11/28/18 6:02 PM, Jiří Stránský wrote:
>>>>> 
>>>>>
>>>>>> Reiterating again on previous points:
>>>>>>
>>>>>> -I'd be fine removing systemd. But let's do it properly and
>>>>>> not via 'rpm
>>>>>> -ev --nodeps'.
>>>>>> -Puppet and Ruby *are* required for configuration. We can
>>>>>> certainly put
>>>>>> them in a separate container outside of the runtime service
>>>>>> containers
>>>>>> but doing so would actually cost you much more
>>>>>> space/bandwidth for each
>>>>>> service container. As both of these have to get downloaded to
>>>>>> each node
>>>>>> anyway in order to generate config files with our current
>>>>>> mechanisms
>>>>>> I'm not sure this buys you anything.
>>>>>
>>>>> +1. I was actually under the impression that we concluded
>>>>> yesterday on
>>>>> IRC that this is the only thing that makes sense to seriously
>>>>> consider.
>>>>> But even then it's not a win-win -- we'd gain some security by
>>>>> leaner
>>>>> production images, but pay for it with space+bandwidth by
>>>>> duplicating
>>>>> image content (IOW we can help achieve one of the goals we had
>>>>> in mind
>>>>> by worsening the situation w/r/t the other goal we had in
>>>>> mind.)
>>>>>
>>>>> Personally i'm not sold yet but it's something that i'd
>>>>> consider if we
>>>>> got measurements of how much more space/bandwidth usage this
>>>>> would
>>>>> consume, and if we got some further details/examples about how
>>>>> serious
>>>>> are the security concerns if we leave config mgmt tools in
>>>>> runtime
>>>>> images.
>>>>>
>>>>> IIRC the other options (that were brought forward so far) were
>>>>> already
>>>>> dismissed in yesterday's IRC discussion and on the reviews.
>>>>> Bin/lib bind
>>>>> mounting being too hacky and fragile, and nsenter not really
>>>>> solving the
>>>>> problem (because it allows us to switch to having different
>>>>> bins/libs
>>>>> available, but it does not allow merging the availability of
>>>>> bins/libs
>>>>> from two containers into a single context).
>>>>>
>>>>>> We are going in circles here I think
>>>>>
>>>>> +1. I think too much of the discussion focuses on "why it's bad
>>>>> to have
>>>>> config tools in runtime images", but IMO we all sorta agree
>>>>> that it
>>>>> would be better not to have them there, if it came at no cost.
>>>>>
>>>>> I think to move forward, it would be interesting to know: if we
>>>>> do this
>>>>> (i'll borrow Dan's drawing):
>>>>>
>>>>>> |base container| --> |service container| --> |service container w/ Puppet installed|
>>>>>
>>>>> How much more space and bandwidth would this consume per node
>>>>> (e.g.
>>>>> separately per controller, per compute). This could help with
>>>>> decision
>>>>> making.
>>>>
>>>> As I've already evaluated in the related bug, that is:
>>>>
>>>> puppet-* modules and manifests ~ 16MB
>>>> puppet with dependencies ~61MB
>>>> dependencies of the seemingly largest dependency, systemd
>>>> ~190MB
>>>>
>>>> that would be an extra layer size for each of the container images to be
>>>> downloaded/fetched into registries.

Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-30 Thread Bogdan Dobrelya

On 11/30/18 1:52 PM, Dan Prince wrote:

On Fri, 2018-11-30 at 10:31 +0100, Bogdan Dobrelya wrote:

On 11/29/18 6:42 PM, Jiří Stránský wrote:

On 28. 11. 18 18:29, Bogdan Dobrelya wrote:

On 11/28/18 6:02 PM, Jiří Stránský wrote:




Reiterating again on previous points:

-I'd be fine removing systemd. But let's do it properly and
not via 'rpm
-ev --nodeps'.
-Puppet and Ruby *are* required for configuration. We can
certainly put
them in a separate container outside of the runtime service
containers
but doing so would actually cost you much more
space/bandwidth for each
service container. As both of these have to get downloaded to
each node
anyway in order to generate config files with our current
mechanisms
I'm not sure this buys you anything.


+1. I was actually under the impression that we concluded
yesterday on
IRC that this is the only thing that makes sense to seriously
consider.
But even then it's not a win-win -- we'd gain some security by
leaner
production images, but pay for it with space+bandwidth by
duplicating
image content (IOW we can help achieve one of the goals we had
in mind
by worsening the situation w/r/t the other goal we had in
mind.)

Personally i'm not sold yet but it's something that i'd
consider if we
got measurements of how much more space/bandwidth usage this
would
consume, and if we got some further details/examples about how
serious
are the security concerns if we leave config mgmt tools in
runtime
images.

IIRC the other options (that were brought forward so far) were
already
dismissed in yesterday's IRC discussion and on the reviews.
Bin/lib bind
mounting being too hacky and fragile, and nsenter not really
solving the
problem (because it allows us to switch to having different
bins/libs
available, but it does not allow merging the availability of
bins/libs
from two containers into a single context).


We are going in circles here I think


+1. I think too much of the discussion focuses on "why it's bad
to have
config tools in runtime images", but IMO we all sorta agree
that it
would be better not to have them there, if it came at no cost.

I think to move forward, it would be interesting to know: if we
do this
(i'll borrow Dan's drawing):


|base container| --> |service container| --> |service container w/ Puppet installed|

How much more space and bandwidth would this consume per node
(e.g.
separately per controller, per compute). This could help with
decision
making.


As I've already evaluated in the related bug, that is:

puppet-* modules and manifests ~ 16MB
puppet with dependencies ~61MB
dependencies of the seemingly largest dependency, systemd
~190MB

that would be an extra layer size for each of the container
images to be
downloaded/fetched into registries.


Thanks, i tried to do the math of the reduction vs. inflation in
sizes
as follows. I think the crucial point here is the layering. If we
do
this image layering:


|base| --> |+ service| --> |+ Puppet|


we'd drop ~267 MB from base image, but we'd be installing that to
the
topmost level, per-component, right?


Given we detach systemd from puppet, cronie et al., that would be 
267-190 MB, so the math below would look much better


Would it be worth writing a spec that summarizes what action items are 
being taken to optimize our base image with regard to systemd?


Perhaps it would be. But honestly, I see nothing big enough to require a 
full-blown spec: it is just a matter of changing RPM deps and layers for 
container images. I'm tracking the systemd changes here [0],[1],[2], btw 
(if accepted, it should be working as of Fedora 28 (or 29), I hope).


[0] https://review.rdoproject.org/r/#/q/topic:base-container-reduction
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1654659
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1654672




It seems like the general consensus is that cleaning up some of the RPM
dependencies so that we don't install systemd is the biggest win.

What confuses me is why there are still patches posted to move Puppet
out of the base layer when we agree moving it out of the base layer
would actually cause our resulting container image set to be larger in size.

Dan





In my basic deployment, undercloud seems to have 17 "components"
(49
containers), overcloud controller 15 components (48 containers),
and
overcloud compute 4 components (7 containers). Accounting for
overlaps,
the total number of "components" used seems to be 19. (By
"components"
here i mean whatever uses a different ConfigImage than other
services. I
just eyeballed it but i think i'm not too far off the correct
number.)

So we'd subtract 267 MB from base image and add that to 19 leaf
images
used in this deployment. That means difference of +4.8 GB to the
current
image sizes. My /var/lib/registry dir on undercloud with all the
images
currently has 5.1 GB. We'd almost double that to 9.9 GB.

Going from 5.1 to 9.9 GB seems like a lot of extra traffic for the
CDNs
(both external and e.g. internal within OpenStack Infra CI clouds).

And for internal traffic between local registry and overcloud nodes, it
gives +3.7 GB per controller and +800 MB per compute.

Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-30 Thread Dan Prince
On Fri, 2018-11-30 at 10:31 +0100, Bogdan Dobrelya wrote:
> On 11/29/18 6:42 PM, Jiří Stránský wrote:
> > On 28. 11. 18 18:29, Bogdan Dobrelya wrote:
> > > On 11/28/18 6:02 PM, Jiří Stránský wrote:
> > > > 
> > > > 
> > > > > Reiterating again on previous points:
> > > > > 
> > > > > -I'd be fine removing systemd. But let's do it properly and
> > > > > not via 'rpm
> > > > > -ev --nodeps'.
> > > > > -Puppet and Ruby *are* required for configuration. We can
> > > > > certainly put
> > > > > them in a separate container outside of the runtime service
> > > > > containers
> > > > > but doing so would actually cost you much more
> > > > > space/bandwidth for each
> > > > > service container. As both of these have to get downloaded to
> > > > > each node
> > > > > anyway in order to generate config files with our current
> > > > > mechanisms
> > > > > I'm not sure this buys you anything.
> > > > 
> > > > +1. I was actually under the impression that we concluded
> > > > yesterday on
> > > > IRC that this is the only thing that makes sense to seriously
> > > > consider.
> > > > But even then it's not a win-win -- we'd gain some security by
> > > > leaner
> > > > production images, but pay for it with space+bandwidth by
> > > > duplicating
> > > > image content (IOW we can help achieve one of the goals we had
> > > > in mind
> > > > by worsening the situation w/r/t the other goal we had in
> > > > mind.)
> > > > 
> > > > Personally i'm not sold yet but it's something that i'd
> > > > consider if we
> > > > got measurements of how much more space/bandwidth usage this
> > > > would
> > > > consume, and if we got some further details/examples about how
> > > > serious
> > > > are the security concerns if we leave config mgmt tools in
> > > > runtime 
> > > > images.
> > > > 
> > > > IIRC the other options (that were brought forward so far) were
> > > > already
> > > > dismissed in yesterday's IRC discussion and on the reviews.
> > > > Bin/lib bind
> > > > mounting being too hacky and fragile, and nsenter not really
> > > > solving the
> > > > problem (because it allows us to switch to having different
> > > > bins/libs
> > > > available, but it does not allow merging the availability of
> > > > bins/libs
> > > > from two containers into a single context).
> > > > 
> > > > > We are going in circles here I think
> > > > 
> > > > +1. I think too much of the discussion focuses on "why it's bad
> > > > to have
> > > > config tools in runtime images", but IMO we all sorta agree
> > > > that it
> > > > would be better not to have them there, if it came at no cost.
> > > > 
> > > > I think to move forward, it would be interesting to know: if we
> > > > do this
> > > > (i'll borrow Dan's drawing):
> > > > 
> > > > > |base container| --> |service container| --> |service container w/ Puppet installed|
> > > > 
> > > > How much more space and bandwidth would this consume per node
> > > > (e.g.
> > > > separately per controller, per compute). This could help with
> > > > decision
> > > > making.
> > > 
> > > As I've already evaluated in the related bug, that is:
> > > 
> > > puppet-* modules and manifests ~ 16MB
> > > puppet with dependencies ~61MB
> > > dependencies of the seemingly largest dependency, systemd
> > > ~190MB
> > > 
> > > that would be an extra layer size for each of the container
> > > images to be
> > > downloaded/fetched into registries.
> > 
> > Thanks, i tried to do the math of the reduction vs. inflation in
> > sizes 
> > as follows. I think the crucial point here is the layering. If we
> > do 
> > this image layering:
> > 
> > > |base| --> |+ service| --> |+ Puppet|
> > 
> > we'd drop ~267 MB from base image, but we'd be installing that to
> > the 
> > topmost level, per-component, right?
> 
> Given we detach systemd from puppet, cronie et al., that would be 
> 267-190 MB, so the math below would look much better

Would it be worth writing a spec that summarizes what action items are 
being taken to optimize our base image with regard to systemd?

It seems like the general consensus is that cleaning up some of the RPM
dependencies so that we don't install systemd is the biggest win.

What confuses me is why there are still patches posted to move Puppet
out of the base layer when we agree moving it out of the base layer
would actually cause our resulting container image set to be larger in size.

Dan


> 
> > In my basic deployment, undercloud seems to have 17 "components"
> > (49 
> > containers), overcloud controller 15 components (48 containers),
> > and 
> > overcloud compute 4 components (7 containers). Accounting for
> > overlaps, 
> > the total number of "components" used seems to be 19. (By
> > "components" 
> > here i mean whatever uses a different ConfigImage than other
> > services. I 
> > just eyeballed it but i think i'm not too far off the correct
> > number.)
> > 
> > So we'd subtract 267 MB from base image and add that to 19 leaf
> > images 
> > used in this deployment.

Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-30 Thread Bogdan Dobrelya

On 11/29/18 6:42 PM, Jiří Stránský wrote:

On 28. 11. 18 18:29, Bogdan Dobrelya wrote:

On 11/28/18 6:02 PM, Jiří Stránský wrote:





Reiterating again on previous points:

-I'd be fine removing systemd. But let's do it properly and not via 'rpm
-ev --nodeps'.
-Puppet and Ruby *are* required for configuration. We can certainly put
them in a separate container outside of the runtime service containers
but doing so would actually cost you much more space/bandwidth for each
service container. As both of these have to get downloaded to each node
anyway in order to generate config files with our current mechanisms
I'm not sure this buys you anything.


+1. I was actually under the impression that we concluded yesterday on
IRC that this is the only thing that makes sense to seriously consider.
But even then it's not a win-win -- we'd gain some security by leaner
production images, but pay for it with space+bandwidth by duplicating
image content (IOW we can help achieve one of the goals we had in mind
by worsening the situation w/r/t the other goal we had in mind.)

Personally i'm not sold yet but it's something that i'd consider if we
got measurements of how much more space/bandwidth usage this would
consume, and if we got some further details/examples about how serious
are the security concerns if we leave config mgmt tools in runtime 
images.


IIRC the other options (that were brought forward so far) were already
dismissed in yesterday's IRC discussion and on the reviews. Bin/lib bind
mounting being too hacky and fragile, and nsenter not really solving the
problem (because it allows us to switch to having different bins/libs
available, but it does not allow merging the availability of bins/libs
from two containers into a single context).



We are going in circles here I think


+1. I think too much of the discussion focuses on "why it's bad to have
config tools in runtime images", but IMO we all sorta agree that it
would be better not to have them there, if it came at no cost.

I think to move forward, it would be interesting to know: if we do this
(i'll borrow Dan's drawing):

|base container| --> |service container| --> |service container w/
Puppet installed|

How much more space and bandwidth would this consume per node (e.g.
separately per controller, per compute). This could help with decision
making.


As I've already evaluated in the related bug, that is:

puppet-* modules and manifests ~ 16MB
puppet with dependencies ~61MB
dependencies of the seemingly largest dependency, systemd ~190MB

that would be an extra layer size for each of the container images to be
downloaded/fetched into registries.


Thanks, i tried to do the math of the reduction vs. inflation in sizes 
as follows. I think the crucial point here is the layering. If we do 
this image layering:


|base| --> |+ service| --> |+ Puppet|

we'd drop ~267 MB from base image, but we'd be installing that to the 
topmost level, per-component, right?


Given we detach systemd from puppet, cronie et al., that would be 
267-190 MB, so the math below would look much better




In my basic deployment, undercloud seems to have 17 "components" (49 
containers), overcloud controller 15 components (48 containers), and 
overcloud compute 4 components (7 containers). Accounting for overlaps, 
the total number of "components" used seems to be 19. (By "components" 
here i mean whatever uses a different ConfigImage than other services. I 
just eyeballed it but i think i'm not too far off the correct number.)


So we'd subtract 267 MB from base image and add that to 19 leaf images 
used in this deployment. That means difference of +4.8 GB to the current 
image sizes. My /var/lib/registry dir on undercloud with all the images 
currently has 5.1 GB. We'd almost double that to 9.9 GB.


Going from 5.1 to 9.9 GB seems like a lot of extra traffic for the CDNs 
(both external and e.g. internal within OpenStack Infra CI clouds).


And for internal traffic between local registry and overcloud nodes, it 
gives +3.7 GB per controller and +800 MB per compute. That may not be so 
critical but still feels like a considerable downside.


Another gut feeling is that this way of image layering would take longer 
time to build and to run the modify-image Ansible role which we use in 
CI, so that could endanger how our CI jobs fit into the time limit. We 
could also probably measure this but i'm not sure if it's worth spending 
the time.


All in all i'd argue we should be looking at different options still.



Given that we should decouple systemd from all/some of the dependencies
(an example topic for RDO [0]), that could save ~190 MB. But it seems we
cannot break the love of puppet and systemd, as the former heavily relies
on the latter, and changing packaging like that would very likely affect
baremetal deployments where puppet and systemd co-operate.


Ack :/



Long story short, we cannot shoot both rabbits with a single shot, not
with puppet :) Maybe we could with ansible replacing puppet fully...

Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-29 Thread Jiří Stránský

On 29. 11. 18 20:20, Fox, Kevin M wrote:

If the base layers are shared, you won't pay extra for the separate puppet 
container


Yes, and that's the state we're in right now.


unless you have another container also installing ruby in an upper layer.


Not just Ruby but also Puppet and Systemd. I think that's what the 
proposal we're discussing here suggests -- removing this content from 
the base layer (so that we can get service runtime images without this 
content present) and putting this content *on top* of individual service 
images. Unless i'm missing some trick to start sharing *top* layers 
rather than *base* layers, i think that effectively disables the space 
sharing for the Ruby+Puppet+Systemd content.



With OpenStack, that's unlikely.

the apparent size of a container is not equal to its actual size.


Yes. :)

Thanks

Jirka



Thanks,
Kevin

From: Jiří Stránský [ji...@redhat.com]
Sent: Thursday, November 29, 2018 9:42 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers 
for security and size of images (maintenance) sakes

On 28. 11. 18 18:29, Bogdan Dobrelya wrote:

On 11/28/18 6:02 PM, Jiří Stránský wrote:





Reiterating again on previous points:

-I'd be fine removing systemd. But let's do it properly and not via 'rpm
-ev --nodeps'.
-Puppet and Ruby *are* required for configuration. We can certainly put
them in a separate container outside of the runtime service containers
but doing so would actually cost you much more space/bandwidth for each
service container. As both of these have to get downloaded to each node
anyway in order to generate config files with our current mechanisms
I'm not sure this buys you anything.


+1. I was actually under the impression that we concluded yesterday on
IRC that this is the only thing that makes sense to seriously consider.
But even then it's not a win-win -- we'd gain some security by leaner
production images, but pay for it with space+bandwidth by duplicating
image content (IOW we can help achieve one of the goals we had in mind
by worsening the situation w/r/t the other goal we had in mind.)

Personally i'm not sold yet but it's something that i'd consider if we
got measurements of how much more space/bandwidth usage this would
consume, and if we got some further details/examples about how serious
are the security concerns if we leave config mgmt tools in runtime images.

IIRC the other options (that were brought forward so far) were already
dismissed in yesterday's IRC discussion and on the reviews. Bin/lib bind
mounting being too hacky and fragile, and nsenter not really solving the
problem (because it allows us to switch to having different bins/libs
available, but it does not allow merging the availability of bins/libs
from two containers into a single context).



We are going in circles here I think


+1. I think too much of the discussion focuses on "why it's bad to have
config tools in runtime images", but IMO we all sorta agree that it
would be better not to have them there, if it came at no cost.

I think to move forward, it would be interesting to know: if we do this
(i'll borrow Dan's drawing):

|base container| --> |service container| --> |service container w/
Puppet installed|

How much more space and bandwidth would this consume per node (e.g.
separately per controller, per compute). This could help with decision
making.


As I've already evaluated in the related bug, that is:

puppet-* modules and manifests ~ 16MB
puppet with dependencies ~61MB
dependencies of the seemingly largest dependency, systemd ~190MB

that would be an extra layer size for each of the container images to be
downloaded/fetched into registries.


Thanks, i tried to do the math of the reduction vs. inflation in sizes
as follows. I think the crucial point here is the layering. If we do
this image layering:

|base| --> |+ service| --> |+ Puppet|

we'd drop ~267 MB from base image, but we'd be installing that to the
topmost level, per-component, right?

In my basic deployment, undercloud seems to have 17 "components" (49
containers), overcloud controller 15 components (48 containers), and
overcloud compute 4 components (7 containers). Accounting for overlaps,
the total number of "components" used seems to be 19. (By "components"
here i mean whatever uses a different ConfigImage than other services. I
just eyeballed it but i think i'm not too far off the correct number.)

So we'd subtract 267 MB from base image and add that to 19 leaf images
used in this deployment. That means difference of +4.8 GB to the current
image sizes. My /var/lib/registry dir on undercloud with all the images
currently has 5.1 GB. We'd almost double that to 9.9 GB.

Going from 5.1 to 9.9 GB seems like a lot of extra traffic for the CDNs
(both external and e.g. internal within OpenStack Infra CI clouds).

And for internal traffic between local registry and overcloud nodes, it
gives +3.7 GB per controller and +800 MB per compute.

Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-29 Thread Fox, Kevin M
Oh, rereading the conversation again, the concern is having shared deps move up 
layers? So more systemd related than ruby?

The conversation about --nodeps makes it sound like it's not actually used, just 
an artifact of how the RPMs are built... What about creating a dummy package 
that provides(systemd)? That avoids using --nodeps.

Thanks,
Kevin

From: Fox, Kevin M [kevin@pnnl.gov]
Sent: Thursday, November 29, 2018 11:20 AM
To: Former OpenStack Development Mailing List, use openstack-discuss now
Subject: Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers 
for security and size of images (maintenance) sakes

If the base layers are shared, you won't pay extra for the separate puppet 
container unless you have another container also installing ruby in an upper 
layer. With OpenStack, that's unlikely.

the apparent size of a container is not equal to its actual size.

Thanks,
Kevin

From: Jiří Stránský [ji...@redhat.com]
Sent: Thursday, November 29, 2018 9:42 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers 
for security and size of images (maintenance) sakes

On 28. 11. 18 18:29, Bogdan Dobrelya wrote:
> On 11/28/18 6:02 PM, Jiří Stránský wrote:
>> 
>>
>>>
>>> Reiterating again on previous points:
>>>
>>> -I'd be fine removing systemd. But let's do it properly and not via 'rpm
>>> -ev --nodeps'.
>>> -Puppet and Ruby *are* required for configuration. We can certainly put
>>> them in a separate container outside of the runtime service containers
>>> but doing so would actually cost you much more space/bandwidth for each
>>> service container. As both of these have to get downloaded to each node
>>> anyway in order to generate config files with our current mechanisms
>>> I'm not sure this buys you anything.
>>
>> +1. I was actually under the impression that we concluded yesterday on
>> IRC that this is the only thing that makes sense to seriously consider.
>> But even then it's not a win-win -- we'd gain some security by leaner
>> production images, but pay for it with space+bandwidth by duplicating
>> image content (IOW we can help achieve one of the goals we had in mind
>> by worsening the situation w/r/t the other goal we had in mind.)
>>
>> Personally i'm not sold yet but it's something that i'd consider if we
>> got measurements of how much more space/bandwidth usage this would
>> consume, and if we got some further details/examples about how serious
>> are the security concerns if we leave config mgmt tools in runtime images.
>>
>> IIRC the other options (that were brought forward so far) were already
>> dismissed in yesterday's IRC discussion and on the reviews. Bin/lib bind
>> mounting being too hacky and fragile, and nsenter not really solving the
>> problem (because it allows us to switch to having different bins/libs
>> available, but it does not allow merging the availability of bins/libs
>> from two containers into a single context).
>>
>>>
>>> We are going in circles here I think
>>
>> +1. I think too much of the discussion focuses on "why it's bad to have
>> config tools in runtime images", but IMO we all sorta agree that it
>> would be better not to have them there, if it came at no cost.
>>
>> I think to move forward, it would be interesting to know: if we do this
>> (i'll borrow Dan's drawing):
>>
>> |base container| --> |service container| --> |service container w/
>> Puppet installed|
>>
>> How much more space and bandwidth would this consume per node (e.g.
>> separately per controller, per compute). This could help with decision
>> making.
>
> As I've already evaluated in the related bug, that is:
>
> puppet-* modules and manifests ~ 16MB
> puppet with dependencies ~61MB
> dependencies of the seemingly largest dependency, systemd ~190MB
>
> that would be an extra layer size for each of the container images to be
> downloaded/fetched into registries.

Thanks, i tried to do the math of the reduction vs. inflation in sizes
as follows. I think the crucial point here is the layering. If we do
this image layering:

|base| --> |+ service| --> |+ Puppet|

we'd drop ~267 MB from base image, but we'd be installing that to the
topmost level, per-component, right?

In my basic deployment, undercloud seems to have 17 "components" (49
containers), overcloud controller 15 components (48 containers), and
overcloud compute 4 components (7 containers). Accounting for overlaps,
the total number of "components" used seems to be 19. (By "components"
here i mean whatever uses a different ConfigImage than other services. I
just eyeballed it but i think i'm not too far off the correct number.)

Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-29 Thread Fox, Kevin M
If the base layers are shared, you won't pay extra for the separate puppet 
container unless you have another container also installing ruby in an upper 
layer. With OpenStack, that's unlikely.

the apparent size of a container is not equal to its actual size.
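To illustrate that point, here is a tiny model (a sketch only, with made-up
image names and layer sizes) of how a registry counts each shared layer once,
so a separate puppet container that shares its layers with other images adds
very little actual size:

# Toy model: each image is a list of (layer, size_MB) tuples. A registry
# stores each distinct layer only once, so shared base layers are not
# paid for again by every image that uses them.
images = {
    "keystone":    [("base", 500), ("puppet+ruby", 267), ("keystone-bits", 80)],
    "nova-api":    [("base", 500), ("puppet+ruby", 267), ("nova-bits", 120)],
    "puppet-only": [("base", 500), ("puppet+ruby", 267)],
}

# Apparent size: the per-image sizes a client would report, summed up.
apparent = sum(size for layers in images.values() for _, size in layers)

# Actual size: each unique layer counted once, as stored in the registry.
unique_layers = {layer for layers in images.values() for layer in layers}
actual = sum(size for _, size in unique_layers)

print(f"apparent total: {apparent} MB")  # 2501 MB
print(f"actual total:   {actual} MB")    # 967 MB -- shared layers counted once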

Thanks,
Kevin

From: Jiří Stránský [ji...@redhat.com]
Sent: Thursday, November 29, 2018 9:42 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers 
for security and size of images (maintenance) sakes

On 28. 11. 18 18:29, Bogdan Dobrelya wrote:
> On 11/28/18 6:02 PM, Jiří Stránský wrote:
>> 
>>
>>>
>>> Reiterating again on previous points:
>>>
>>> -I'd be fine removing systemd. But let's do it properly and not via 'rpm
>>> -ev --nodeps'.
>>> -Puppet and Ruby *are* required for configuration. We can certainly put
>>> them in a separate container outside of the runtime service containers
>>> but doing so would actually cost you much more space/bandwidth for each
>>> service container. As both of these have to get downloaded to each node
>>> anyway in order to generate config files with our current mechanisms
>>> I'm not sure this buys you anything.
>>
>> +1. I was actually under the impression that we concluded yesterday on
>> IRC that this is the only thing that makes sense to seriously consider.
>> But even then it's not a win-win -- we'd gain some security by leaner
>> production images, but pay for it with space+bandwidth by duplicating
>> image content (IOW we can help achieve one of the goals we had in mind
>> by worsening the situation w/r/t the other goal we had in mind.)
>>
>> Personally i'm not sold yet but it's something that i'd consider if we
>> got measurements of how much more space/bandwidth usage this would
>> consume, and if we got some further details/examples about how serious
>> are the security concerns if we leave config mgmt tools in runtime images.
>>
>> IIRC the other options (that were brought forward so far) were already
>> dismissed in yesterday's IRC discussion and on the reviews. Bin/lib bind
>> mounting being too hacky and fragile, and nsenter not really solving the
>> problem (because it allows us to switch to having different bins/libs
>> available, but it does not allow merging the availability of bins/libs
>> from two containers into a single context).
>>
>>>
>>> We are going in circles here I think
>>
>> +1. I think too much of the discussion focuses on "why it's bad to have
>> config tools in runtime images", but IMO we all sorta agree that it
>> would be better not to have them there, if it came at no cost.
>>
>> I think to move forward, it would be interesting to know: if we do this
>> (i'll borrow Dan's drawing):
>>
>> |base container| --> |service container| --> |service container w/
>> Puppet installed|
>>
>> How much more space and bandwidth would this consume per node (e.g.
>> separately per controller, per compute). This could help with decision
>> making.
>
> As I've already evaluated in the related bug, that is:
>
> puppet-* modules and manifests ~ 16MB
> puppet with dependencies ~61MB
> dependencies of the seemingly largest dependency, systemd ~190MB
>
> that would be an extra layer size for each of the container images to be
> downloaded/fetched into registries.

Thanks, i tried to do the math of the reduction vs. inflation in sizes
as follows. I think the crucial point here is the layering. If we do
this image layering:

|base| --> |+ service| --> |+ Puppet|

we'd drop ~267 MB from base image, but we'd be installing that to the
topmost level, per-component, right?

In my basic deployment, undercloud seems to have 17 "components" (49
containers), overcloud controller 15 components (48 containers), and
overcloud compute 4 components (7 containers). Accounting for overlaps,
the total number of "components" used seems to be 19. (By "components"
here i mean whatever uses a different ConfigImage than other services. I
just eyeballed it but i think i'm not too far off the correct number.)

So we'd subtract 267 MB from base image and add that to 19 leaf images
used in this deployment. That means difference of +4.8 GB to the current
image sizes. My /var/lib/registry dir on undercloud with all the images
currently has 5.1 GB. We'd almost double that to 9.9 GB.

Going from 5.1 to 9.9 GB seems like a lot of extra traffic for the CDNs
(both external and e.g. internal within OpenStack Infra CI clouds).

And for internal traffic between local registry and overcloud nodes, it
gives +3.7 GB per controller and +800 MB per compute.

Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-29 Thread Jiří Stránský

On 28. 11. 18 18:29, Bogdan Dobrelya wrote:

On 11/28/18 6:02 PM, Jiří Stránský wrote:





Reiterating again on previous points:

-I'd be fine removing systemd. But let's do it properly and not via 'rpm
-ev --nodeps'.
-Puppet and Ruby *are* required for configuration. We can certainly put
them in a separate container outside of the runtime service containers
but doing so would actually cost you much more space/bandwidth for each
service container. As both of these have to get downloaded to each node
anyway in order to generate config files with our current mechanisms
I'm not sure this buys you anything.


+1. I was actually under the impression that we concluded yesterday on
IRC that this is the only thing that makes sense to seriously consider.
But even then it's not a win-win -- we'd gain some security by leaner
production images, but pay for it with space+bandwidth by duplicating
image content (IOW we can help achieve one of the goals we had in mind
by worsening the situation w/r/t the other goal we had in mind.)

Personally i'm not sold yet but it's something that i'd consider if we
got measurements of how much more space/bandwidth usage this would
consume, and if we got some further details/examples about how serious
are the security concerns if we leave config mgmt tools in runtime images.

IIRC the other options (that were brought forward so far) were already
dismissed in yesterday's IRC discussion and on the reviews. Bin/lib bind
mounting being too hacky and fragile, and nsenter not really solving the
problem (because it allows us to switch to having different bins/libs
available, but it does not allow merging the availability of bins/libs
from two containers into a single context).



We are going in circles here I think


+1. I think too much of the discussion focuses on "why it's bad to have
config tools in runtime images", but IMO we all sorta agree that it
would be better not to have them there, if it came at no cost.

I think to move forward, it would be interesting to know: if we do this
(i'll borrow Dan's drawing):

|base container| --> |service container| --> |service container w/
Puppet installed|

How much more space and bandwidth would this consume per node (e.g.
separately per controller, per compute). This could help with decision
making.


As I've already evaluated in the related bug, that is:

puppet-* modules and manifests ~ 16MB
puppet with dependencies ~61MB
dependencies of the seemingly largest dependency, systemd ~190MB

that would be an extra layer size for each of the container images to be
downloaded/fetched into registries.


Thanks, i tried to do the math of the reduction vs. inflation in sizes 
as follows. I think the crucial point here is the layering. If we do 
this image layering:


|base| --> |+ service| --> |+ Puppet|

we'd drop ~267 MB from base image, but we'd be installing that to the 
topmost level, per-component, right?


In my basic deployment, undercloud seems to have 17 "components" (49 
containers), overcloud controller 15 components (48 containers), and 
overcloud compute 4 components (7 containers). Accounting for overlaps, 
the total number of "components" used seems to be 19. (By "components" 
here i mean whatever uses a different ConfigImage than other services. I 
just eyeballed it but i think i'm not too far off the correct number.)


So we'd subtract 267 MB from base image and add that to 19 leaf images 
used in this deployment. That means difference of +4.8 GB to the current 
image sizes. My /var/lib/registry dir on undercloud with all the images 
currently has 5.1 GB. We'd almost double that to 9.9 GB.
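For reference, the arithmetic behind that estimate can be written down as a
small sketch (using only the figures above: ~267 MB of moved content, 19 leaf
images, 5.1 GB current registry; 1000-based GB units are assumed):

# Estimate of registry growth if the ~267 MB of Puppet/Ruby/systemd content
# moves from the shared base layer into every per-component leaf image.
MB, GB = 1, 1000

moved_content = 267 * MB   # content removed from the base image
leaf_images = 19           # distinct "components" in the deployment
registry_now = 5.1 * GB    # current /var/lib/registry size on the undercloud

added = leaf_images * moved_content  # duplicated into every leaf image
saved = moved_content                # stored once less in the shared base
delta = added - saved                # net registry growth

print(f"net growth: +{delta / GB:.1f} GB")                      # +4.8 GB
print(f"registry after: {(registry_now + delta) / GB:.1f} GB")  # 9.9 GB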


Going from 5.1 to 9.9 GB seems like a lot of extra traffic for the CDNs 
(both external and e.g. internal within OpenStack Infra CI clouds).


And for internal traffic between local registry and overcloud nodes, it 
gives +3.7 GB per controller and +800 MB per compute. That may not be so 
critical but still feels like a considerable downside.


Another gut feeling is that this way of image layering would take longer 
time to build and to run the modify-image Ansible role which we use in 
CI, so that could endanger how our CI jobs fit into the time limit. We 
could also probably measure this but i'm not sure if it's worth spending 
the time.


All in all i'd argue we should be looking at different options still.



Given that we should decouple systemd from all/some of the dependencies
(an example topic for RDO [0]), that could save ~190 MB. But it seems we
cannot break the love of puppet and systemd, as the former heavily relies
on the latter, and changing packaging like that would very likely affect
baremetal deployments where puppet and systemd co-operate.


Ack :/



Long story short, we cannot shoot both rabbits with a single shot, not
with puppet :) Maybe we could with ansible replacing puppet fully...
So splitting config and runtime images is the only choice yet to address
the raised security concerns. And let's forget about edge cases for now.

Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-29 Thread Bogdan Dobrelya

On 11/28/18 8:55 PM, Doug Hellmann wrote:

I thought the preferred solution for more complex settings was config maps. Did 
that approach not work out?

Regardless, now that the driver work is done if someone wants to take another 
stab at etcd integration it’ll be more straightforward today.

Doug



While sharing configs is a feasible option to consider for large-scale 
configuration management, Etcd only provides strong consistency, which 
is also known as "Unavailable" [0]. For edge scenarios, to configure 
40,000 remote computes over WAN connections, we'd rather want weaker 
consistency models, like "Sticky Available" [0]. That would allow 
services to fetch their configuration either from a central "uplink" or 
locally, when the former is not accessible from remote edge sites. 
Etcd cannot provide 40,000 local endpoints to fit that case, I'm afraid, 
even if those were read-only replicas. That is also something I'm 
highlighting in the paper [1] drafted for ICFC-2019.


But had we such a sticky-available key-value storage solution, we would 
indeed have solved the problem of running the configuration management 
system across thousands of nodes, as James describes it.


[0] https://jepsen.io/consistency
[1] 
https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position_paper_1570506394.pdf
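As a rough illustration of the "Sticky Available" behaviour described above,
the sketch below is purely hypothetical (the central endpoint URL and local
cache path are made up, and this is not an existing TripleO mechanism): a
service prefers the central uplink but falls back to its last locally cached
configuration when the WAN link is down.

import json
import urllib.request
from pathlib import Path

# Hypothetical central config endpoint and local cache location (made up).
UPLINK_URL = "https://central.example.org/config/edge-compute-0042.json"
CACHE_PATH = Path("/var/lib/edge-config/edge-compute-0042.json")

def fetch_config(timeout: float = 5.0) -> dict:
    """Prefer the central uplink; fall back to the local copy ("sticky available")."""
    try:
        with urllib.request.urlopen(UPLINK_URL, timeout=timeout) as resp:
            config = json.load(resp)
        CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
        CACHE_PATH.write_text(json.dumps(config))  # refresh the local copy
        return config
    except OSError:
        # Uplink unreachable over the WAN: keep serving the last known config.
        return json.loads(CACHE_PATH.read_text())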


On 11/28/18 11:22 PM, Dan Prince wrote:

On Wed, 2018-11-28 at 13:28 -0500, James Slagle wrote:

On Wed, Nov 28, 2018 at 12:31 PM Bogdan Dobrelya wrote:
Long story short, we cannot shoot both rabbits with a single shot, not
with puppet :) Maybe we could with ansible replacing puppet fully...
So splitting config and runtime images is the only choice yet to address
the raised security concerns. And let's forget about edge cases for now.
Tossing around a pair of extra bytes over 40,000 WAN-distributed
computes ain't gonna be our biggest problem for sure.


I think it's this last point that is the crux of this discussion. We
can agree to disagree about the merits of this proposal and whether
it's a pre-optimization or micro-optimization, which I admit are
somewhat subjective terms. Ultimately, it seems to be the "why do we
need to do this" that is the reason why the conversation seems to be
going in circles a bit.

I'm all for reducing container image size, but the reality is that
this proposal doesn't necessarily help us with the Edge use cases we
are talking about trying to solve.

Why would we even run the exact same puppet binary + manifest
individually 40,000 times so that we can produce the exact same set
of
configuration files that differ only by things such as IP address,
hostnames, and passwords? Maybe we should instead be thinking about
how we can do that *1* time centrally, and produce a configuration
that can be reused across 40,000 nodes with little effort. The
opportunity for a significant impact in terms of how we can scale
TripleO is much larger if we consider approaching these problems with
a wider net of what we could do. There's opportunity for a lot of
better reuse in TripleO, configuration is just one area. The plan and
Heat stack (within the ResourceGroup) are some other areas.


We run Puppet for configuration because that is what we did on
baremetal and we didn't break backwards compatability for our
configuration options for upgrades. Our Puppet model relies on being
executed on each local host in order to splice in the correct IP
address and hostname. It executes in a distributed fashion, and works
fairly well considering the history of the project. It is robust,
guarantees no duplicate configs are being set, and is backwards
compatible with all the options TripleO supported on baremetal. Puppet
is arguably better for configuration than Ansible (which is what I hear
people most often suggest we replace it with). It suits our needs fine,
but it is perhaps a bit overkill considering we are only generating
config files.

I think the answer here is moving to something like Etcd. Perhaps


Not Etcd I think, see my comment above. But you're absolutely right Dan.


skipping over Ansible entirely as a config management tool (it is
arguably less capable than Puppet in this category anyway). Or we could
use Ansible for "legacy" services only, switch to Etcd for a majority
of the OpenStack services, and drop Puppet entirely (my favorite
option). Consolidating our technology stack would be wise.

We've already put some work and analysis into the Etcd effort. Just
need to push on it some more. Looking at the previous Kubernetes
prototypes for TripleO would be the place to start.

Config management migration is going to be tedious. It's technical debt
that needs to be handled at some point anyway. I think it is a general
TripleO improvement that could benefit all clouds, not just Edge.

Dan



At the same time, if some folks want to work on smaller optimizations
(such as container image size), with an approach that can be agreed
upon, then they should do so.

Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-28 Thread Dan Prince
On Wed, 2018-11-28 at 13:28 -0500, James Slagle wrote:
> On Wed, Nov 28, 2018 at 12:31 PM Bogdan Dobrelya wrote:
> > Long story short, we cannot shoot both rabbits with a single shot,
> > not
> > with puppet :) Maybe we could with ansible replacing puppet
> > fully...
> > So splitting config and runtime images is the only choice yet to
> > address
> > the raised security concerns. And let's forget about edge cases for
> > now.
> > Tossing around a pair of extra bytes over 40,000 WAN-distributed
> > computes ain't gonna be our biggest problem for sure.
> 
> I think it's this last point that is the crux of this discussion. We
> can agree to disagree about the merits of this proposal and whether
> it's a pre-optimization or micro-optimization, which I admit are
> somewhat subjective terms. Ultimately, it seems to be the "why do we
> need to do this" that is the reason why the conversation seems to be
> going in circles a bit.
> 
> I'm all for reducing container image size, but the reality is that
> this proposal doesn't necessarily help us with the Edge use cases we
> are talking about trying to solve.
> 
> Why would we even run the exact same puppet binary + manifest
> individually 40,000 times so that we can produce the exact same set
> of
> configuration files that differ only by things such as IP address,
> hostnames, and passwords? Maybe we should instead be thinking about
> how we can do that *1* time centrally, and produce a configuration
> that can be reused across 40,000 nodes with little effort. The
> opportunity for a significant impact in terms of how we can scale
> TripleO is much larger if we consider approaching these problems with
> a wider net of what we could do. There's opportunity for a lot of
> better reuse in TripleO, configuration is just one area. The plan and
> Heat stack (within the ResourceGroup) are some other areas.

We run Puppet for configuration because that is what we did on
baremetal and we didn't break backwards compatibility for our
configuration options for upgrades. Our Puppet model relies on being
executed on each local host in order to splice in the correct IP
address and hostname. It executes in a distributed fashion, and works
fairly well considering the history of the project. It is robust,
guarantees no duplicate configs are being set, and is backwards
compatible with all the options TripleO supported on baremetal. Puppet
is arguably better for configuration than Ansible (which is what I hear
people most often suggest we replace it with). It suits our needs fine,
but it is perhaps a bit overkill considering we are only generating
config files.

I think the answer here is moving to something like Etcd. Perhaps
skipping over Ansible entirely as a config management tool (it is
arguably less capable than Puppet in this category anyway). Or we could
use Ansible for "legacy" services only, switch to Etcd for a majority
of the OpenStack services, and drop Puppet entirely (my favorite
option). Consolidating our technology stack would be wise.

We've already put some work and analysis into the Etcd effort. Just
need to push on it some more. Looking at the previous Kubernetes
prototypes for TripleO would be the place to start.
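
A minimal sketch of what the node-side part could look like, assuming etcd v3
with etcdctl available (the key path and file names are purely illustrative,
not an agreed TripleO layout): the config gets rendered once centrally,
published to etcd, and each service container fetches it at start time
instead of running Puppet locally.

  # central side: publish a rendered config fragment once
  etcdctl put /tripleo/config/nova/nova.conf "$(cat nova.conf.rendered)"
  # node side, e.g. in a container entrypoint: fetch it instead of running puppet
  etcdctl get --print-value-only /tripleo/config/nova/nova.conf > /etc/nova/nova.conf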

Config management migration is going to be tedious. It's technical debt
that needs to be handled at some point anyway. I think it is a general
TripleO improvement that could benefit all clouds, not just Edge.

Dan

> 
> At the same time, if some folks want to work on smaller optimizations
> (such as container image size), with an approach that can be agreed
> upon, then they should do so. We just ought to be careful about how
> we
> justify those changes so that we can carefully weigh the effort vs
> the
> payoff. In this specific case, I don't personally see this proposal
> helping us with Edge use cases in a meaningful way given the scope of
> the changes. That's not to say there aren't other use cases that
> could
> justify it though (such as the security points brought up earlier).
> 



Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-28 Thread James Slagle
On Wed, Nov 28, 2018 at 12:31 PM Bogdan Dobrelya  wrote:
> Long story short, we cannot shoot both rabbits with a single shot, not
> with puppet :) Maybe we could with ansible replacing puppet fully...
> So splitting config and runtime images is the only choice yet to address
> the raised security concerns. And let's forget about edge cases for now.
> Tossing around a pair of extra bytes over 40,000 WAN-distributed
> computes ain't gonna be our biggest problem for sure.

I think it's this last point that is the crux of this discussion. We
can agree to disagree about the merits of this proposal and whether
it's a pre-optimization or micro-optimization, which I admit are
somewhat subjective terms. Ultimately, it seems to be about the "why"
do we need to do this as to the reason why the conversation seems to
be going in circles a bit.

I'm all for reducing container image size, but the reality is that
this proposal doesn't necessarily help us with the Edge use cases we
are talking about trying to solve.

Why would we even run the exact same puppet binary + manifest
individually 40,000 times so that we can produce the exact same set of
configuration files that differ only by things such as IP address,
hostnames, and passwords? Maybe we should instead be thinking about
how we can do that *1* time centrally, and produce a configuration
that can be reused across 40,000 nodes with little effort. The
opportunity for a significant impact in terms of how we can scale
TripleO is much larger if we consider approaching these problems with
a wider net of what we could do. There's opportunity for a lot of
better reuse in TripleO, configuration is just one area. The plan and
Heat stack (within the ResourceGroup) are some other areas.
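
As a minimal sketch of that "render once, reuse everywhere" idea (the file
names and variables are hypothetical, not an existing TripleO mechanism):
render the common configuration a single time with only the node-specific
values left as placeholders, then have each node substitute its handful of
local values rather than re-running the full config tool.

  # central: render nova.conf once, leaving per-node values as placeholders,
  # e.g.  my_ip = ${MY_IP}  and  host = ${MY_HOSTNAME}  in nova.conf.tmpl
  # each node: fill in only its own values (envsubst ships with gettext)
  export MY_IP=192.0.2.10 MY_HOSTNAME=edge-compute-0001
  envsubst < nova.conf.tmpl > /etc/nova/nova.conf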

At the same time, if some folks want to work on smaller optimizations
(such as container image size), with an approach that can be agreed
upon, then they should do so. We just ought to be careful about how we
justify those changes so that we can carefully weigh the effort vs the
payoff. In this specific case, I don't personally see this proposal
helping us with Edge use cases in a meaningful way given the scope of
the changes. That's not to say there aren't other use cases that could
justify it though (such as the security points brought up earlier).

-- 
-- James Slagle
--


Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-28 Thread Bogdan Dobrelya

On 11/28/18 6:02 PM, Jiří Stránský wrote:





Reiterating again on previous points:

-I'd be fine removing systemd. But let's do it properly and not via 'rpm
-ev --nodeps'.
-Puppet and Ruby *are* required for configuration. We can certainly put
them in a separate container outside of the runtime service containers
but doing so would actually cost you much more space/bandwidth for each
service container. As both of these have to get downloaded to each node
anyway in order to generate config files with our current mechanisms
I'm not sure this buys you anything.


+1. I was actually under the impression that we concluded yesterday on 
IRC that this is the only thing that makes sense to seriously consider. 
But even then it's not a win-win -- we'd gain some security by leaner 
production images, but pay for it with space+bandwidth by duplicating 
image content (IOW we can help achieve one of the goals we had in mind 
by worsening the situation w/r/t the other goal we had in mind.)


Personally i'm not sold yet but it's something that i'd consider if we 
got measurements of how much more space/bandwidth usage this would 
consume, and if we got some further details/examples about how serious 
are the security concerns if we leave config mgmt tools in runtime images.


IIRC the other options (that were brought forward so far) were already 
dismissed in yesterday's IRC discussion and on the reviews. Bin/lib bind 
mounting being too hacky and fragile, and nsenter not really solving the 
problem (because it allows us to switch to having different bins/libs 
available, but it does not allow merging the availability of bins/libs 
from two containers into a single context).




We are going in circles here I think


+1. I think too much of the discussion focuses on "why it's bad to have 
config tools in runtime images", but IMO we all sorta agree that it 
would be better not to have them there, if it came at no cost.


I think to move forward, it would be interesting to know: if we do this 
(i'll borrow Dan's drawing):


|base container| --> |service container| --> |service container w/
Puppet installed|

How much more space and bandwidth would this consume per node (e.g. 
separately per controller, per compute). This could help with decision 
making.


As I've already evaluated in the related bug, that is:

puppet-* modules and manifests ~ 16MB
puppet with dependencies ~61MB
dependencies of the seemingly largest dependency, systemd ~190MB

that would be an extra layer size for each of the container images to be 
downloaded/fetched into registries.
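
For anyone who wants to reproduce or refine these numbers, a rough way to do
it (a sketch only; the image name is just an example) is to look at per-layer
sizes and compare total image sizes between candidate builds:

  # per-layer breakdown of one service image
  docker history --format 'table {{.Size}}\t{{.CreatedBy}}' \
      docker.io/tripleomaster/centos-binary-nova-compute:current-tripleo
  # compare total sizes of two candidate builds of the same service
  docker images --format '{{.Repository}}:{{.Tag}}  {{.Size}}' | grep nova-compute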


Given that we should decouple systemd from all/some of the dependencies 
(an example topic for RDO [0]), that could save 190MB. But it seems we 
cannot break the love between puppet and systemd, as the former heavily 
relies on the latter, and changing packaging like that would highly likely 
affect baremetal deployments with puppet and systemd co-operating.


Long story short, we cannot shoot both rabbits with a single shot, not 
with puppet :) Maybe we could with ansible replacing puppet fully...
So splitting config and runtime images is the only choice yet to address 
the raised security concerns. And let's forget about edge cases for now.
Tossing around a pair of extra bytes over 40,000 WAN-distributed 
computes ain't gonna be our biggest problem for sure.


[0] https://review.rdoproject.org/r/#/q/topic:base-container-reduction





Dan



Thanks

Jirka




--
Best regards,
Bogdan Dobrelya,
Irc #bogdando


Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-28 Thread Jiří Stránský





Reiterating again on previous points:

-I'd be fine removing systemd. But let's do it properly and not via 'rpm
-ev --nodeps'.
-Puppet and Ruby *are* required for configuration. We can certainly put
them in a separate container outside of the runtime service containers
but doing so would actually cost you much more space/bandwidth for each
service container. As both of these have to get downloaded to each node
anyway in order to generate config files with our current mechanisms
I'm not sure this buys you anything.


+1. I was actually under the impression that we concluded yesterday on 
IRC that this is the only thing that makes sense to seriously consider. 
But even then it's not a win-win -- we'd gain some security by leaner 
production images, but pay for it with space+bandwidth by duplicating 
image content (IOW we can help achieve one of the goals we had in mind 
by worsening the situation w/r/t the other goal we had in mind.)


Personally i'm not sold yet but it's something that i'd consider if we 
got measurements of how much more space/bandwidth usage this would 
consume, and if we got some further details/examples about how serious 
are the security concerns if we leave config mgmt tools in runtime images.


IIRC the other options (that were brought forward so far) were already 
dismissed in yesterday's IRC discussion and on the reviews. Bin/lib bind 
mounting being too hacky and fragile, and nsenter not really solving the 
problem (because it allows us to switch to having different bins/libs 
available, but it does not allow merging the availability of bins/libs 
from two containers into a single context).




We are going in circles here I think


+1. I think too much of the discussion focuses on "why it's bad to have 
config tools in runtime images", but IMO we all sorta agree that it 
would be better not to have them there, if it came at no cost.


I think to move forward, it would be interesting to know: if we do this 
(i'll borrow Dan's drawing):


|base container| --> |service container| --> |service container w/
Puppet installed|

How much more space and bandwidth would this consume per node (e.g. 
separately per controller, per compute). This could help with decision 
making.




Dan



Thanks

Jirka


Re: [openstack-dev] [tripleo] Workflows Squad changes

2018-11-28 Thread Ryan Brady
On Wed, Nov 28, 2018 at 5:13 AM Jiri Tomasek  wrote:

> Hi all,
>
> Recently, the workflows squad has been reorganized and people from the
> squad are joining different squads. I would like to discuss how we are
> going to adjust to this situation to make sure that tripleo-common
> development is not going to be blocked in terms of feature work and reviews.
>
> With this change, most of the tripleo-common maintenance work goes
> naturally to UI & Validations squad as CLI and GUI are the consumers of the
> API provided by tripleo-common. Adriano Petrich from workflows squad has
> joined UI squad to take on this work.
>
> As a possible solution, I would like to propose Adriano as a core reviewer
> to tripleo-common and adding tripleo-ui cores right to +2 tripleo-common
> patches.
>

> It would be great to hear opinions especially former members of Workflows
> squad and regular contributors to tripleo-common on these changes and in
> general on how to establish regular reviews and maintenance to ensure that
> tripleo-common codebase is moved towards converging the CLI and GUI
> deployment workflow.
>

Well, I'm not really going that far and plan to continue working in
tripleo-common for the time being.  If that isn't sustainable during the
next cycle, I'll make sure to shout.


> Thanks



-- 
Ryan Brady
Senior Software Engineer
Red Hat Inc
rbr...@redhat.com | T: (919)-890-8925 | IM: rbrady


Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-28 Thread Fox, Kevin M
Ok, so you have the workflow in place, but it sounds like the containers are 
not laid out to best use that workflow. Puppet is in the base layer. That means 
whenever puppet gets updated, all the other containers must be updated too. And other 
such update coupling issues.

I'm with you, that binaries should not be copied from one container to another 
though.

Thanks,
Kevin

From: Dan Prince [dpri...@redhat.com]
Sent: Wednesday, November 28, 2018 5:31 AM
To: Former OpenStack Development Mailing List, use openstack-discuss now; 
openstack-disc...@lists.openstack.org
Subject: Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers 
for security and size of images (maintenance) sakes

On Wed, 2018-11-28 at 00:31 +, Fox, Kevin M wrote:
> The pod concept allows you to have one tool per container do one
> thing and do it well.
>
> You can have a container for generating config, and another container
> for consuming it.
>
> In a Kubernetes pod, if you still wanted to do puppet,
> you could have a pod that:
> 1. had an init container that ran puppet and dumped the resulting
> config to an emptyDir volume.
> 2. had your main container pull its config from the emptyDir volume.

We have basically implemented the same workflow in TripleO today. First
we execute Puppet in an "init container" (really just an ephemeral
container that generates the config files and then goes away). Then we
bind mount those configs into the service container.

One improvement we could make (which we aren't doing yet) is to use a
data container/volume to store the config files instead of using the
host. Sharing *data* within a 'pod' (set of containers, etc.) is
certainly a valid use of container volumes.
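
A minimal sketch of that flow (image names, paths and the puppet invocation
are illustrative only, not the actual docker-puppet.py implementation):

  # ephemeral "init" container: run puppet once and write configs to the host
  docker run --rm -v /var/lib/config-data/nova:/etc/nova \
      example/nova-config puppet apply /etc/puppet/manifests/nova.pp
  # service container: bind mount the generated configs read-only
  docker run -d --name nova_compute \
      -v /var/lib/config-data/nova:/etc/nova:ro example/nova-compute
  # the improvement mentioned above: a named volume instead of the host path
  docker volume create nova_config    # then mount -v nova_config:/etc/nova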

None of this is what we are really talking about in this thread though.
Most of the suggestions and patches are about making our base
container(s) smaller in size. And the means by which the patches do
that is to share binaries/applications across containers with custom
mounts/volumes. I don't think it is a good idea at all as it violates
encapsulation of the containers in general, regardless of whether we
use pods or not.

Dan


>
> Then each container would have no dependency on each other.
>
> In full blown Kubernetes cluster you might have puppet generate a
> configmap though and ship it to your main container directly. Thats
> another matter though. I think the example pod example above is still
> usable without k8s?
>
> Thanks,
> Kevin
> 
> From: Dan Prince [dpri...@redhat.com]
> Sent: Tuesday, November 27, 2018 10:10 AM
> To: OpenStack Development Mailing List (not for usage questions);
> openstack-disc...@lists.openstack.org
> Subject: Re: [openstack-dev] [TripleO][Edge] Reduce base layer of
> containers for security and size of images (maintenance) sakes
>
> On Tue, 2018-11-27 at 16:24 +0100, Bogdan Dobrelya wrote:
> > Changing the topic to follow the subject.
> >
> > [tl;dr] it's time to rearchitect container images to stop
> > incluiding
> > config-time only (puppet et al) bits, which are not needed runtime
> > and
> > pose security issues, like CVEs, to maintain daily.
>
> I think your assertion that we need to rearchitect the config images
> to
> container the puppet bits is incorrect here.
>
> After reviewing the patches you linked to below it appears that you
> are
> proposing we use --volumes-from to bind mount application binaries
> from
> one container into another. I don't believe this is a good pattern
> for
> containers. On baremetal if we followed the same pattern it would be
> like using an /nfs share to obtain access to binaries across the
> network to optimize local storage. Now... some people do this (like
> maybe high performance computing would launch an MPI job like this)
> but
> I don't think we should consider it best practice for our containers
> in
> TripleO.
>
> Each container should container its own binaries and libraries as
> much
> as possible. And while I do think we should be using --volumes-from
> more often in TripleO it would be for sharing *data* between
> containers, not binaries.
>
>
> > Background:
> > 1) For the Distributed Compute Node edge case, there is potentially
> > tens
> > of thousands of a single-compute-node remote edge sites connected
> > over
> > WAN to a single control plane, which is having high latency, like a
> > 100ms or so, and limited bandwith. Reducing the base layer size
> > becomes
> > a decent goal there. See the security background below.
>
> The reason we put Puppet into the base layer was in fact to prevent
> it
> from being downloaded multiple times. If we were to re-architect the
> image

Re: [openstack-dev] [tripleo] Workflows Squad changes

2018-11-28 Thread Emilien Macchi
On Wed, Nov 28, 2018 at 5:13 AM Jiri Tomasek  wrote:
[...]

> As a possible solution, I would like to propose Adriano as a core reviewer
> to tripleo-common and adding tripleo-ui cores right to +2 tripleo-common
> patches.
>
[...]

Not a member of the squad but +2 to the idea

Thanks for proposing,
-- 
Emilien Macchi

Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-28 Thread Sergii Golovatiuk
Hi,
On Tue, Nov 27, 2018 at 7:13 PM Dan Prince  wrote:

> On Tue, 2018-11-27 at 16:24 +0100, Bogdan Dobrelya wrote:
> > Changing the topic to follow the subject.
> >
> > [tl;dr] it's time to rearchitect container images to stop incluiding
> > config-time only (puppet et al) bits, which are not needed runtime
> > and
> > pose security issues, like CVEs, to maintain daily.
>
> I think your assertion that we need to rearchitect the config images to
> container the puppet bits is incorrect here.
>
> After reviewing the patches you linked to below it appears that you are
> proposing we use --volumes-from to bind mount application binaries from
> one container into another. I don't believe this is a good pattern for
> containers. On baremetal if we followed the same pattern it would be
> like using an /nfs share to obtain access to binaries across the
> network to optimize local storage. Now... some people do this (like
> maybe high performance computing would launch an MPI job like this) but
> I don't think we should consider it best practice for our containers in
> TripleO.
>
> Each container should container its own binaries and libraries as much
> as possible. And while I do think we should be using --volumes-from
> more often in TripleO it would be for sharing *data* between
> containers, not binaries.
>
>
> >
> > Background:
> > 1) For the Distributed Compute Node edge case, there is potentially
> > tens
> > of thousands of a single-compute-node remote edge sites connected
> > over
> > WAN to a single control plane, which is having high latency, like a
> > 100ms or so, and limited bandwith. Reducing the base layer size
> > becomes
> > a decent goal there. See the security background below.
>
> The reason we put Puppet into the base layer was in fact to prevent it
> from being downloaded multiple times. If we were to re-architect the
> image layers such that the child layers all contained their own copies
> of Puppet for example there would actually be a net increase in
> bandwidth and disk usage. So I would argue we are already addressing
> the goal of optimizing network and disk space.
>
> Moving it out of the base layer so that you can patch it more often
> without disrupting other services is a valid concern. But addressing
> this concern while also preserving our definiation of a container (see
> above, a container should contain all of its binaries) is going to cost
> you something, namely disk and network space because Puppet would need
> to be duplicated in each child container.
>
> As Puppet is used to configure a majority of the services in TripleO
> having it in the base container makes most sense. And yes, if there are
> security patches for Puppet/Ruby those might result in a bunch of
> containers getting pushed. But let Docker layers take care of this I
> think... Don't try to solve things by constructing your own custom
> mounts and volumes to work around the issue.
>
>
> > 2) For a generic security (Day 2, maintenance) case, when
> > puppet/ruby/systemd/name-it gets a CVE fixed, the base layer has to
> > be
> > updated and all layers on top - to be rebuild, and all of those
> > layers,
> > to be re-fetched for cloud hosts and all containers to be
> > restarted...
> > And all of that because of some fixes that have nothing to OpenStack.
> > By
> > the remote edge sites as well, remember of "tens of thousands", high
> > latency and limited bandwith?..
> > 3) TripleO CI updates (including puppet*) packages in containers, not
> > in
> > a common base layer of those. So each a CI job has to update puppet*
> > and
> > its dependencies - ruby/systemd as well. Reducing numbers of packages
> > to
> > update for each container makes sense for CI as well.
> >
> > Implementation related:
> >
> > WIP patches [0],[1] for early review, uses a config "pod" approach,
> > does
> > not require to maintain a two sets of config vs runtime images.
> > Future
> > work: a) cronie requires systemd, we'd want to fix that also off the
> > base layer. b) rework to podman pods for docker-puppet.py instead of
> > --volumes-from a side car container (can't be backported for Queens
> > then, which is still nice to have a support for the Edge DCN case,
> > at
> > least downstream only perhaps).
> >
> > Some questions raised on IRC:
> >
> > Q: is having a service be able to configure itself really need to
> > involve a separate pod?
> > A: Highly likely yes, removing not-runtime things is a good idea and
> > pods is an established PaaS paradigm already. That will require some
> > changes in the architecture though (see the topic with WIP patches).
>
> I'm a little confused on this one. Are you suggesting that we have 2
> containers for each service? One with Puppet and one without?
>
> That is certainly possible, but to pull it off would likely require you
> to have things built like this:
>
>  |base container| --> |service container| --> |service container w/
> Puppet installed|
>
> The end result would be Puppet being 

Re: [openstack-dev] [tripleo] Let's improve upstream docs

2018-11-28 Thread Natal Ngétal
On Wed, Nov 28, 2018 at 4:19 PM Marios Andreou  wrote:
> great you are very welcome !
Thanks.

> not really, I mean "anything goes" as long as it's an improvement ( and the 
> usual review process will determine if it is or not :)  ). Could be as small 
> as typos or broken links/images, through to reorganising sections or even 
> bigger contributions like  completely new sections if you can and want. Take 
> a look at the existing patches that are on the bug for ideas
I see. I have made a first patch and I'm going to find what I can do
and continue to make code review.


Re: [openstack-dev] [tripleo] Let's improve upstream docs

2018-11-28 Thread Marios Andreou
On Wed, Nov 28, 2018 at 4:33 PM Natal Ngétal  wrote:

> On Tue, Nov 27, 2018 at 4:50 PM Marios Andreou  wrote:
> > as just mentioned in the tripleo weekly irc meeting [1] some of us are
> trying to make small weekly improvements to the tripleo docs  [2]. We are
> using this bug [3] for tracking and this effort is a result of some
> feedback during the recent Berlin summit.
> It's a good idea. The documentation of a project it's very important.
>
> > The general idea is 1 per week (or more if you can and want) -
> improvement/removal of stale content/identifying missing sections, or
> anything else you might care to propose. Please join us if you can, just
> add "Related-Bug: #1804642"  to your commit message
> I'm going to try to help you on this ticket. I started to make code
>

great you are very welcome !


> review. I would to know if you have more details about that. I mean,
> do you have examples of part able improved or something like that.
>
>
not really, I mean "anything goes" as long as it's an improvement ( and the
usual review process will determine if it is or not :)  ). Could be as
small as typos or broken links/images, through to reorganising sections or
even bigger contributions like  completely new sections if you can and
want. Take a look at the existing patches that are on the bug for ideas

thanks




Re: [openstack-dev] [tripleo] Let's improve upstream docs

2018-11-28 Thread Natal Ngétal
On Tue, Nov 27, 2018 at 4:50 PM Marios Andreou  wrote:
> as just mentioned in the tripleo weekly irc meeting [1] some of us are trying 
> to make small weekly improvements to the tripleo docs  [2]. We are using this 
> bug [3] for tracking and this effort is a result of some feedback during the 
> recent Berlin summit.
It's a good idea. The documentation of a project is very important.

> The general idea is 1 per week (or more if you can and want) - 
> improvement/removal of stale content/identifying missing sections, or 
> anything else you might care to propose. Please join us if you can, just add 
> "Related-Bug: #1804642"  to your commit message
I'm going to try to help you on this ticket. I started to make code
review. I would like to know if you have more details about that. I mean,
do you have examples of parts that could be improved or something like that.


Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-28 Thread Dan Prince
On Wed, 2018-11-28 at 15:12 +0100, Bogdan Dobrelya wrote:
> On 11/28/18 2:58 PM, Dan Prince wrote:
> > On Wed, 2018-11-28 at 12:45 +0100, Bogdan Dobrelya wrote:
> > > To follow up and explain the patches for code review:
> > > 
> > > The "header" patch https://review.openstack.org/620310 ->
> > > (requires)
> > > https://review.rdoproject.org/r/#/c/17534/, and also
> > > https://review.openstack.org/620061 -> (which in turn requires)
> > > https://review.openstack.org/619744 -> (Kolla change, the 1st to
> > > go)
> > > https://review.openstack.org/619736
> > 
> > This email was cross-posted to multiple lists and I think we may
> > have
> > lost some of the context in the process as the subject was changed.
> > 
> > Most of the suggestions and patches are about making our base
> > container(s) smaller in size. And the means by which the patches do
> > that is to share binaries/applications across containers with
> > custom
> > mounts/volumes. I've -2'd most of them. What concerns me however is
> > that some of the TripleO cores seemed open to this idea yesterday
> > on
> > IRC. Perhaps I've misread things but what you appear to be doing
> > here
> > is quite drastic I think we need to consider any of this carefully
> > before proceeding with any of it.
> > 
> > 
> > > Please also read the commit messages, I tried to explain all
> > > "Whys"
> > > very
> > > carefully. Just to sum up it here as well:
> > > 
> > > The current self-containing (config and runtime bits)
> > > architecture
> > > of
> > > containers badly affects:
> > > 
> > > * the size of the base layer and all containers images as an
> > > additional 300MB (adds an extra 30% of size).
> > 
> > You are accomplishing this by removing Puppet from the base
> > container,
> > but you are also creating another container in the process. This
> > would
> > still be required on all nodes as Puppet is our config tool. So you
> > would still be downloading some of this data anyways. Understood
> > your
> > reasons for doing this are that it avoids rebuilding all containers
> > when there is a change to any of these packages in the base
> > container.
> > What you are missing however is how often is it the case that
> > Puppet is
> > updated that something else in the base container isn't?
> 
> For CI jobs updating all containers, its quite an often to have
> changes 
> in openstack/tripleo puppet modules to pull in. IIUC, that
> automatically 
> picks up any updates for all of its dependencies and for the 
> dependencies of dependencies, and all that multiplied by a hundred
> of 
> total containers to get it updated. That is a *pain* we're used to
> have 
> these day for quite often timing out CI jobs... Ofc, the main cause
> is 
> delayed promotions though.

Regarding CI I made a separate suggestion on that below in that
rebuilding the base layer more often could be a good solution here. I
don't think the puppet-tripleo package is that large however so we
could just live with it.

> 
> For real deployments, I have no data for the cadence of minor updates
> in 
> puppet and tripleo & openstack modules for it, let's ask operators
> (as 
> we're happened to be in the merged openstack-discuss list)? For its 
> dependencies though, like systemd and ruby, I'm pretty sure it's
> quite 
> often to have CVEs fixed there. So I expect what "in the fields" 
> security fixes delivering for those might bring some unwanted hassle
> for 
> long-term maintenance of LTS releases. As Tengu noted on IRC:
> "well, between systemd, puppet and ruby, there are many security 
> concernes, almost every month... and also, what's the point keeping
> them 
> in runtime containers when they are useless?"

Reiterating again on previous points:

-I'd be fine removing systemd. But let's do it properly and not via 'rpm
-ev --nodeps'.
-Puppet and Ruby *are* required for configuration. We can certainly put
them in a separate container outside of the runtime service containers
but doing so would actually cost you much more space/bandwidth for each
service container. As both of these have to get downloaded to each node
anyway in order to generate config files with our current mechanisms
I'm not sure this buys you anything.

We are going in circles here I think

Dan

> 
> > I would wager that it is more rare than you'd think. Perhaps
> > looking at
> > the history of an OpenStack distribution would be a valid way to
> > assess
> > this more critically. Without this data to backup the numbers I'm
> > afraid what you are doing here falls into "pre-optimization"
> > territory
> > for me and I don't think the means used in the patches warrent the
> > benefits you mention here.
> > 
> > 
> > > * Edge cases, where we have containers images to be distributed,
> > > at
> > > least once to hit local registries, over high-latency and
> > > limited
> > > bandwith, highly unreliable WAN connections.
> > > * numbers of packages to update in CI for all containers for all
> > > services 

Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-28 Thread Bogdan Dobrelya

On 11/28/18 2:58 PM, Dan Prince wrote:

On Wed, 2018-11-28 at 12:45 +0100, Bogdan Dobrelya wrote:

To follow up and explain the patches for code review:

The "header" patch https://review.openstack.org/620310 -> (requires)
https://review.rdoproject.org/r/#/c/17534/, and also
https://review.openstack.org/620061 -> (which in turn requires)
https://review.openstack.org/619744 -> (Kolla change, the 1st to go)
https://review.openstack.org/619736


This email was cross-posted to multiple lists and I think we may have
lost some of the context in the process as the subject was changed.

Most of the suggestions and patches are about making our base
container(s) smaller in size. And the means by which the patches do
that is to share binaries/applications across containers with custom
mounts/volumes. I've -2'd most of them. What concerns me however is
that some of the TripleO cores seemed open to this idea yesterday on
IRC. Perhaps I've misread things but what you appear to be doing here
is quite drastic. I think we need to consider any of this carefully
before proceeding with any of it.




Please also read the commit messages, I tried to explain all "Whys"
very
carefully. Just to sum up it here as well:

The current self-containing (config and runtime bits) architecture
of
containers badly affects:

* the size of the base layer and all containers images as an
additional 300MB (adds an extra 30% of size).


You are accomplishing this by removing Puppet from the base container,
but you are also creating another container in the process. This would
still be required on all nodes as Puppet is our config tool. So you
would still be downloading some of this data anyways. Understood your
reasons for doing this are that it avoids rebuilding all containers
when there is a change to any of these packages in the base container.
What you are missing however is how often is it the case that Puppet is
updated that something else in the base container isn't?


For CI jobs updating all containers, it's quite common to have changes 
in openstack/tripleo puppet modules to pull in. IIUC, that automatically 
picks up any updates for all of its dependencies and for the 
dependencies of dependencies, and all that multiplied by a hundred of 
total containers to get it updated. That is a *pain* we're used to having 
these days, with CI jobs quite often timing out... Ofc, the main cause is 
delayed promotions though.


For real deployments, I have no data for the cadence of minor updates in 
puppet and tripleo & openstack modules for it, let's ask operators (as 
we happen to be in the merged openstack-discuss list)? For its 
dependencies though, like systemd and ruby, I'm pretty sure it's quite 
often to have CVEs fixed there. So I expect that delivering "in the field" 
security fixes for those might bring some unwanted hassle for 
long-term maintenance of LTS releases. As Tengu noted on IRC:
"well, between systemd, puppet and ruby, there are many security 
concerns, almost every month... and also, what's the point keeping them 
in runtime containers when they are useless?"




I would wager that it is more rare than you'd think. Perhaps looking at
the history of an OpenStack distribution would be a valid way to assess
this more critically. Without this data to backup the numbers I'm
afraid what you are doing here falls into "pre-optimization" territory
for me and I don't think the means used in the patches warrant the
benefits you mention here.



* Edge cases, where we have containers images to be distributed, at
least once to hit local registries, over high-latency and limited
bandwidth, highly unreliable WAN connections.
* numbers of packages to update in CI for all containers for all
services (CI jobs do not rebuild containers so each container gets
updated for those 300MB of extra size).


It would seem to me there are other ways to solve the CI containers
update problems. Rebuilding the base layer more often would solve this
right? If we always build our service containers off of a base layer
that is recent there should be no updates to the system/puppet packages
there in our CI pipelines.


* security and the surface of attacks, by introducing systemd et al
as
additional subjects for CVE fixes to maintain for all containers.


We aren't actually using systemd within our containers. I think those
packages are getting pulled in by an RPM dependency elsewhere. So
rather than using 'rpm -ev --nodeps' to remove it we could create a
sub-package for containers in those cases and install it instead. In
short rather than hack this to remove them why not pursue a proper
packaging fix?

In general I am a fan of getting things out of the base container we
don't need... so yeah let's do this. But let's do it properly.


* services uptime, by additional restarts of services related to
security maintenance of components irrelevant to OpenStack, sitting
as dead weight in container images forever.


Like I said 

Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-28 Thread Dan Prince
On Wed, 2018-11-28 at 12:45 +0100, Bogdan Dobrelya wrote:
> To follow up and explain the patches for code review:
> 
> The "header" patch https://review.openstack.org/620310 -> (requires) 
> https://review.rdoproject.org/r/#/c/17534/, and also 
> https://review.openstack.org/620061 -> (which in turn requires)
> https://review.openstack.org/619744 -> (Kolla change, the 1st to go)
> https://review.openstack.org/619736

This email was cross-posted to multiple lists and I think we may have
lost some of the context in the process as the subject was changed.

Most of the suggestions and patches are about making our base
container(s) smaller in size. And the means by which the patches do
that is to share binaries/applications across containers with custom
mounts/volumes. I've -2'd most of them. What concerns me however is
that some of the TripleO cores seemed open to this idea yesterday on
IRC. Perhaps I've misread things but what you appear to be doing here
is quite drastic. I think we need to consider any of this carefully
before proceeding with any of it.


> 
> Please also read the commit messages, I tried to explain all "Whys"
> very 
> carefully. Just to sum up it here as well:
> 
> The current self-containing (config and runtime bits) architecture
> of 
> containers badly affects:
> 
> * the size of the base layer and all containers images as an
>additional 300MB (adds an extra 30% of size).

You are accomplishing this by removing Puppet from the base container,
but you are also creating another container in the process. This would
still be required on all nodes as Puppet is our config tool. So you
would still be downloading some of this data anyways. Understood your
reasons for doing this are that it avoids rebuilding all containers
when there is a change to any of these packages in the base container.
What you are missing however is how often is it the case that Puppet is
updated that something else in the base container isn't?

I would wager that it is more rare than you'd think. Perhaps looking at
the history of an OpenStack distribution would be a valid way to assess
this more critically. Without this data to backup the numbers I'm
afraid what you are doing here falls into "pre-optimization" territory
for me and I don't think the means used in the patches warrant the
benefits you mention here.


> * Edge cases, where we have containers images to be distributed, at
>least once to hit local registries, over high-latency and limited
>bandwith, highly unreliable WAN connections.
> * numbers of packages to update in CI for all containers for all
>services (CI jobs do not rebuild containers so each container gets
>updated for those 300MB of extra size).

It would seem to me there are other ways to solve the CI containers
update problems. Rebuilding the base layer more often would solve this
right? If we always build our service containers off of a base layer
that is recent there should be no updates to the system/puppet packages
there in our CI pipelines.

> * security and the surface of attacks, by introducing systemd et al
> as
>additional subjects for CVE fixes to maintain for all containers.

We aren't actually using systemd within our containers. I think those
packages are getting pulled in by an RPM dependency elsewhere. So
rather than using 'rpm -ev --nodeps' to remove it we could create a
sub-package for containers in those cases and install it instead. In
short rather than hack this to remove them why not pursue a proper
packaging fix?

In general I am a fan of getting things out of the base container we
don't need... so yeah let's do this. But let's do it properly.

> * services uptime, by additional restarts of services related to
>security maintanence of irrelevant to openstack components sitting
>as a dead weight in containers images for ever.

Like I said above how often is it that these packages actually change
where something else in the base container doesn't? Perhaps we should
get more data here before blindly implementing a solution we aren't
sure really helps out in the real world.

> 
> On 11/27/18 4:08 PM, Bogdan Dobrelya wrote:
> > Changing the topic to follow the subject.
> > 
> > [tl;dr] it's time to rearchitect container images to stop
> > incluiding 
> > config-time only (puppet et al) bits, which are not needed runtime
> > and 
> > pose security issues, like CVEs, to maintain daily.
> > 
> > Background: 1) For the Distributed Compute Node edge case, there
> > is 
> > potentially tens of thousands of a single-compute-node remote edge
> > sites 
> > connected over WAN to a single control plane, which is having high 
> > latency, like a 100ms or so, and limited bandwith.
> > 2) For a generic security case,
> > 3) TripleO CI updates all
> > 
> > Challenge:
> > 
> > > Here is a related bug [1] and implementation [1] for that. PTAL
> > > folks!
> > > 
> > > [0] https://bugs.launchpad.net/tripleo/+bug/1804822
> > > [1] 
> > > 

Re: [openstack-dev] [TripleO][Edge][Kolla] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-28 Thread Bogdan Dobrelya
Added the Kolla tag, as together we might want to do something about 
systemd being included in containers via *multiple* package dependencies, like 
[0]. Ideally, that might mean properly packaging all/some (like those 
names listed in [1]) of the places having it as a dependency, to stop 
doing that now that it's Containers Time?.. As a temporary security 
band-aid I was thinking of removing systemd via footers [1] as an 
extra layer added on top, but I'm not sure that buys anything good long-term.


[0] https://pastebin.com/RSaRsYgZ
[1] 
https://review.openstack.org/#/c/620310/2/container-images/tripleo_kolla_template_overrides.j2@680
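
To see where systemd actually comes from in a given image, something along
these lines can be run inside any of the service containers (a sketch; cronie
is just the example already mentioned in this thread):

  # which installed packages declare a dependency on systemd
  rpm -q --whatrequires systemd
  # and the reverse view for one suspect package
  rpm -q --requires cronie | grep -i systemd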


On 11/28/18 12:45 PM, Bogdan Dobrelya wrote:

To follow up and explain the patches for code review:

The "header" patch https://review.openstack.org/620310 -> (requires) 
https://review.rdoproject.org/r/#/c/17534/, and also 
https://review.openstack.org/620061 -> (which in turn requires)

https://review.openstack.org/619744 -> (Kolla change, the 1st to go)
https://review.openstack.org/619736

Please also read the commit messages, I tried to explain all "Whys" very 
carefully. Just to sum up it here as well:


The current self-containing (config and runtime bits) architecture of 
containers badly affects:


* the size of the base layer and all containers images as an
   additional 300MB (adds an extra 30% of size).
* Edge cases, where we have containers images to be distributed, at
   least once to hit local registries, over high-latency and limited
   bandwidth, highly unreliable WAN connections.
* numbers of packages to update in CI for all containers for all
   services (CI jobs do not rebuild containers so each container gets
   updated for those 300MB of extra size).
* security and the surface of attacks, by introducing systemd et al as
   additional subjects for CVE fixes to maintain for all containers.
* services uptime, by additional restarts of services related to
   security maintenance of components irrelevant to OpenStack, sitting
   as dead weight in container images forever.

On 11/27/18 4:08 PM, Bogdan Dobrelya wrote:

Changing the topic to follow the subject.

[tl;dr] it's time to rearchitect container images to stop including 
config-time only (puppet et al) bits, which are not needed at runtime and 
pose security issues, like CVEs, to maintain daily.


Background: 1) For the Distributed Compute Node edge case, there are 
potentially tens of thousands of single-compute-node remote edge 
sites connected over WAN to a single control plane, which has 
high latency, like 100ms or so, and limited bandwidth.

2) For a generic security case,
3) TripleO CI updates all

Challenge:


Here is a related bug [1] and implementation [1] for that. PTAL folks!

[0] https://bugs.launchpad.net/tripleo/+bug/1804822
[1] https://review.openstack.org/#/q/topic:base-container-reduction


Let's also think of removing puppet-tripleo from the base container.
It really brings the world in (and yum updates in CI!) for each job and 
each container!
So if we did so, we should then either install puppet-tripleo and co 
on the host and bind-mount it for the docker-puppet deployment task 
steps (bad idea IMO), OR use the magical --volumes-from 
 option to mount volumes from some 
"puppet-config" sidecar container inside each of the containers 
being launched by docker-puppet tooling.


On Wed, Oct 31, 2018 at 11:16 AM Harald Jensås wrote:

We add this to all images:

https://github.com/openstack/tripleo-common/blob/d35af75b0d8c4683a677660646e535cf972c98ef/container-images/tripleo_kolla_template_overrides.j2#L35 



/bin/sh -c yum -y install iproute iscsi-initiator-utils lvm2 python
socat sudo which openstack-tripleo-common-container-base rsync cronie
crudini openstack-selinux ansible python-shade puppet-tripleo python2-
kubernetes && yum clean all && rm -rf /var/cache/yum   (layer size: 276 MB)

Is the additional 276 MB reasonable here?
openstack-selinux <- This package runs relabeling; does that kind of
touching the filesystem impact the size due to docker layers?

Also: python2-kubernetes is a fairly large package (18007990) do we use
that in every image? I don't see any tripleo related repos importing
from that when searching on Hound? The original commit message[1]
adding it states it is for future convenience.

On my undercloud we have 101 images; if we are downloading every 18 MB
per image that's almost 1.8 GB for a package we don't use? (I hope it's
not like this? With docker layers, we only download that 276 MB
transaction once? Or?)


[1] https://review.openstack.org/527927




--
Best regards,
Bogdan Dobrelya,
Irc #bogdando









--
Best regards,
Bogdan Dobrelya,
Irc #bogdando


Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-28 Thread Dan Prince
On Wed, 2018-11-28 at 00:31 +, Fox, Kevin M wrote:
> The pod concept allows you to have one tool per container do one
> thing and do it well.
> 
> You can have a container for generating config, and another container
> for consuming it.
> 
> In a Kubernetes pod, if you still wanted to do puppet,
> you could have a pod that:
> 1. had an init container that ran puppet and dumped the resulting
> config to an emptyDir volume.
> 2. had your main container pull its config from the emptyDir volume.

We have basically implemented the same workflow in TripleO today. First
we execute Puppet in an "init container" (really just an ephemeral
container that generates the config files and then goes away). Then we
bind mount those configs into the service container.

One improvement we could make (which we aren't doing yet) is to use a
data container/volume to store the config files instead of using the
host. Sharing *data* within a 'pod' (set of containers, etc.) is
certainly a valid use of container volumes.

None of this is what we are really talking about in this thread though.
Most of the suggestions and patches are about making our base
container(s) smaller in size. And the means by which the patches do
that is to share binaries/applications across containers with custom
mounts/volumes. I don't think it is a good idea at all as it violates
encapsulation of the containers in general, regardless of whether we
use pods or not.

Dan


> 
> Then each container would have no dependency on each other.
> 
> In full blown Kubernetes cluster you might have puppet generate a
> configmap though and ship it to your main container directly. Thats
> another matter though. I think the example pod example above is still
> usable without k8s?
> 
> Thanks,
> Kevin
> 
> From: Dan Prince [dpri...@redhat.com]
> Sent: Tuesday, November 27, 2018 10:10 AM
> To: OpenStack Development Mailing List (not for usage questions); 
> openstack-disc...@lists.openstack.org
> Subject: Re: [openstack-dev] [TripleO][Edge] Reduce base layer of
> containers for security and size of images (maintenance) sakes
> 
> On Tue, 2018-11-27 at 16:24 +0100, Bogdan Dobrelya wrote:
> > Changing the topic to follow the subject.
> > 
> > [tl;dr] it's time to rearchitect container images to stop
> > incluiding
> > config-time only (puppet et al) bits, which are not needed runtime
> > and
> > pose security issues, like CVEs, to maintain daily.
> 
> I think your assertion that we need to rearchitect the config images
> to
> container the puppet bits is incorrect here.
> 
> After reviewing the patches you linked to below it appears that you
> are
> proposing we use --volumes-from to bind mount application binaries
> from
> one container into another. I don't believe this is a good pattern
> for
> containers. On baremetal if we followed the same pattern it would be
> like using an /nfs share to obtain access to binaries across the
> network to optimize local storage. Now... some people do this (like
> maybe high performance computing would launch an MPI job like this)
> but
> I don't think we should consider it best practice for our containers
> in
> TripleO.
> 
> Each container should container its own binaries and libraries as
> much
> as possible. And while I do think we should be using --volumes-from
> more often in TripleO it would be for sharing *data* between
> containers, not binaries.
> 
> 
> > Background:
> > 1) For the Distributed Compute Node edge case, there is potentially
> > tens
> > of thousands of a single-compute-node remote edge sites connected
> > over
> > WAN to a single control plane, which is having high latency, like a
> > 100ms or so, and limited bandwith. Reducing the base layer size
> > becomes
> > a decent goal there. See the security background below.
> 
> The reason we put Puppet into the base layer was in fact to prevent
> it
> from being downloaded multiple times. If we were to re-architect the
> image layers such that the child layers all contained their own
> copies
> of Puppet for example there would actually be a net increase in
> bandwidth and disk usage. So I would argue we are already addressing
> the goal of optimizing network and disk space.
> 
> Moving it out of the base layer so that you can patch it more often
> without disrupting other services is a valid concern. But addressing
> this concern while also preserving our definiation of a container
> (see
> above, a container should contain all of its binaries) is going to
> cost
> you something, namely disk and network space because Puppet would
> need
> to be duplicated in each child container.
> 
> A

Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-28 Thread Bogdan Dobrelya

To follow up and explain the patches for code review:

The "header" patch https://review.openstack.org/620310 -> (requires) 
https://review.rdoproject.org/r/#/c/17534/, and also 
https://review.openstack.org/620061 -> (which in turn requires)

https://review.openstack.org/619744 -> (Kolla change, the 1st to go)
https://review.openstack.org/619736

Please also read the commit messages, I tried to explain all "Whys" very 
carefully. Just to sum up it here as well:


The current self-containing (config and runtime bits) architecture of 
containers badly affects:


* the size of the base layer and all containers images as an
  additional 300MB (adds an extra 30% of size).
* Edge cases, where we have containers images to be distributed, at
  least once to hit local registries, over high-latency and limited
  bandwidth, highly unreliable WAN connections.
* numbers of packages to update in CI for all containers for all
  services (CI jobs do not rebuild containers so each container gets
  updated for those 300MB of extra size).
* security and the surface of attacks, by introducing systemd et al as
  additional subjects for CVE fixes to maintain for all containers.
* services uptime, by additional restarts of services related to
  security maintenance of components irrelevant to OpenStack, sitting
  as dead weight in container images forever.

On 11/27/18 4:08 PM, Bogdan Dobrelya wrote:

Changing the topic to follow the subject.

[tl;dr] it's time to rearchitect container images to stop including 
config-time only (puppet et al) bits, which are not needed at runtime and 
pose security issues, like CVEs, to maintain daily.


Background: 1) For the Distributed Compute Node edge case, there are 
potentially tens of thousands of single-compute-node remote edge sites 
connected over WAN to a single control plane, which has high 
latency, like 100ms or so, and limited bandwidth.

2) For a generic security case,
3) TripleO CI updates all

Challenge:


Here is a related bug [1] and implementation [1] for that. PTAL folks!

[0] https://bugs.launchpad.net/tripleo/+bug/1804822
[1] https://review.openstack.org/#/q/topic:base-container-reduction


Let's also think of removing puppet-tripleo from the base container.
It really brings the world in (and yum updates in CI!) for each job and 
each container!
So if we did so, we should then either install puppet-tripleo and co 
on the host and bind-mount it for the docker-puppet deployment task 
steps (bad idea IMO), OR use the magical --volumes-from 
 option to mount volumes from some 
"puppet-config" sidecar container inside each of the containers being 
launched by docker-puppet tooling.
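
For concreteness, that sidecar idea is roughly the following shape (a sketch
only; the image names and the volume path are illustrative, and a real
implementation would have to expose every path the config tooling needs):

  # a "puppet-config" container whose only job is to expose the puppet bits
  # as a volume
  docker create --name puppet-config -v /usr/share/puppet \
      example/puppet-sidecar true
  # config-time containers borrow those bits via --volumes-from
  docker run --rm --volumes-from puppet-config:ro \
      example/nova-base puppet apply /etc/puppet/manifests/nova.pp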


On Wed, Oct 31, 2018 at 11:16 AM Harald Jensås  
wrote:

We add this to all images:

https://github.com/openstack/tripleo-common/blob/d35af75b0d8c4683a677660646e535cf972c98ef/container-images/tripleo_kolla_template_overrides.j2#L35 



/bin/sh -c yum -y install iproute iscsi-initiator-utils lvm2 python
socat sudo which openstack-tripleo-common-container-base rsync cronie
crudini openstack-selinux ansible python-shade puppet-tripleo python2-
kubernetes && yum clean all && rm -rf /var/cache/yum   (layer size: 276 MB)

Is the additional 276 MB reasonable here?
openstack-selinux <- This package runs relabeling; does that kind of
touching the filesystem impact the size due to docker layers?

Also: python2-kubernetes is a fairly large package (18007990) do we use
that in every image? I don't see any tripleo related repos importing
from that when searching on Hound? The original commit message[1]
adding it states it is for future convenience.

On my undercloud we have 101 images; if we are downloading every 18 MB
per image that's almost 1.8 GB for a package we don't use? (I hope it's
not like this? With docker layers, we only download that 276 MB
transaction once? Or?)


[1] https://review.openstack.org/527927
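
One way to answer the layer question empirically (a sketch; the image names
are placeholders): compare the layer digests of two service images, since
identical digests are stored and pulled only once.

  docker inspect --format '{{json .RootFS.Layers}}' example/nova-compute
  docker inspect --format '{{json .RootFS.Layers}}' example/neutron-l3-agent
  # identical leading entries mean the big base layer is shared, not re-downloaded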




--
Best regards,
Bogdan Dobrelya,
Irc #bogdando






--
Best regards,
Bogdan Dobrelya,
Irc #bogdando


[openstack-dev] [tripleo] Workflows Squad changes

2018-11-28 Thread Jiri Tomasek
Hi all,

Recently, the workflows squad has been reorganized and people from the
squad are joining different squads. I would like to discuss how we are
going to adjust to this situation to make sure that tripleo-common
development is not going to be blocked in terms of feature work and reviews.

With this change, most of the tripleo-common maintenance work goes
naturally to UI & Validations squad as CLI and GUI are the consumers of the
API provided by tripleo-common. Adriano Petrich from workflows squad has
joined UI squad to take on this work.

As a possible solution, I would like to propose Adriano as a core reviewer
for tripleo-common and to give tripleo-ui cores the right to +2 tripleo-common
patches.

It would be great to hear opinions, especially from former members of the
Workflows squad and regular contributors to tripleo-common, on these changes
and in general on how to establish regular reviews and maintenance to ensure
that the tripleo-common codebase moves towards converging the CLI and GUI
deployment workflows.

Thanks
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-27 Thread Fox, Kevin M
The pod concept allows you to have one tool per container do one thing and do 
it well.

You can have a container for generating config, and another container for 
consuming it.

In a Kubernetes pod, if you still wanted to do puppet,
you could have a pod that:
1. had an init container that ran puppet and dumped the resulting config to an 
emptyDir volume.
2. had your main container pull its config from the emptyDir volume.

Then the containers would have no dependency on each other.

In a full-blown Kubernetes cluster you might have puppet generate a configmap 
though and ship it to your main container directly. That's another matter 
though. I think the pod example above is still usable without k8s?
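
A minimal sketch of that pod, just to make the idea concrete (all names, 
images and paths here are hypothetical):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nova-api
spec:
  volumes:
    - name: config
      emptyDir: {}
  initContainers:
    - name: generate-config
      # config image carries puppet/ruby, runs once and exits
      image: example/nova-api-config
      command: ["puppet", "apply", "/etc/puppet/manifests/nova.pp"]
      volumeMounts:
        - name: config
          mountPath: /etc/nova
  containers:
    - name: nova-api
      # lean runtime image with no config tooling inside
      image: example/nova-api-runtime
      volumeMounts:
        - name: config
          mountPath: /etc/nova
EOF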

Thanks,
Kevin

From: Dan Prince [dpri...@redhat.com]
Sent: Tuesday, November 27, 2018 10:10 AM
To: OpenStack Development Mailing List (not for usage questions); 
openstack-disc...@lists.openstack.org
Subject: Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers 
for security and size of images (maintenance) sakes

On Tue, 2018-11-27 at 16:24 +0100, Bogdan Dobrelya wrote:
> Changing the topic to follow the subject.
>
> [tl;dr] it's time to rearchitect container images to stop including
> config-time-only (puppet et al) bits, which are not needed at runtime
> and
> pose security issues, like CVEs, to maintain daily.

I think your assertion that we need to rearchitect the config images to
contain the puppet bits is incorrect here.

After reviewing the patches you linked to below it appears that you are
proposing we use --volumes-from to bind mount application binaries from
one container into another. I don't believe this is a good pattern for
containers. On baremetal if we followed the same pattern it would be
like using an /nfs share to obtain access to binaries across the
network to optimize local storage. Now... some people do this (like
maybe high performance computing would launch an MPI job like this) but
I don't think we should consider it best practice for our containers in
TripleO.

Each container should contain its own binaries and libraries as much
as possible. And while I do think we should be using --volumes-from
more often in TripleO it would be for sharing *data* between
containers, not binaries.


>
> Background:
> 1) For the Distributed Compute Node edge case, there are potentially
> tens
> of thousands of single-compute-node remote edge sites connected
> over
> WAN to a single control plane, with high latency, like
> 100ms or so, and limited bandwidth. Reducing the base layer size
> becomes
> a decent goal there. See the security background below.

The reason we put Puppet into the base layer was in fact to prevent it
from being downloaded multiple times. If we were to re-architect the
image layers such that the child layers all contained their own copies
of Puppet for example there would actually be a net increase in
bandwidth and disk usage. So I would argue we are already addressing
the goal of optimizing network and disk space.

Moving it out of the base layer so that you can patch it more often
without disrupting other services is a valid concern. But addressing
this concern while also preserving our definition of a container (see
above, a container should contain all of its binaries) is going to cost
you something, namely disk and network space because Puppet would need
to be duplicated in each child container.

As Puppet is used to configure a majority of the services in TripleO
having it in the base container makes most sense. And yes, if there are
security patches for Puppet/Ruby those might result in a bunch of
containers getting pushed. But let Docker layers take care of this I
think... Don't try to solve things by constructing your own custom
mounts and volumes to work around the issue.


> 2) For a generic security (Day 2, maintenance) case, when
> puppet/ruby/systemd/name-it gets a CVE fixed, the base layer has to
> be
> updated and all layers on top - to be rebuilt, and all of those
> layers,
> to be re-fetched for cloud hosts and all containers to be
> restarted...
> And all of that because of some fixes that have nothing to do with OpenStack.
> By
> the remote edge sites as well - remember "tens of thousands", high
> latency and limited bandwidth?..
> 3) TripleO CI updates (including puppet*) packages in containers, not
> in
> a common base layer of those. So each CI job has to update puppet*
> and
> its dependencies - ruby/systemd as well. Reducing the number of packages
> to
> update for each container makes sense for CI as well.
>
> Implementation related:
>
> WIP patches [0],[1] for early review use a config "pod" approach and
> do
> not require maintaining two sets of config vs runtime images.
> Future
> work: a) cronie requires systemd, we'd want to fix t

Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-27 Thread Dan Prince
On Tue, 2018-11-27 at 16:24 +0100, Bogdan Dobrelya wrote:
> Changing the topic to follow the subject.
> 
> [tl;dr] it's time to rearchitect container images to stop including 
> config-time-only (puppet et al) bits, which are not needed at runtime
> and 
> pose security issues, like CVEs, to maintain daily.

I think your assertion that we need to rearchitect the config images to
contain the puppet bits is incorrect here.

After reviewing the patches you linked to below it appears that you are
proposing we use --volumes-from to bind mount application binaries from
one container into another. I don't believe this is a good pattern for
containers. On baremetal if we followed the same pattern it would be
like using an /nfs share to obtain access to binaries across the
network to optimize local storage. Now... some people do this (like
maybe high performance computing would launch an MPI job like this) but
I don't think we should consider it best practice for our containers in
TripleO.

Each container should contain its own binaries and libraries as much
as possible. And while I do think we should be using --volumes-from
more often in TripleO it would be for sharing *data* between
containers, not binaries.


> 
> Background:
> 1) For the Distributed Compute Node edge case, there are potentially
> tens 
> of thousands of single-compute-node remote edge sites connected
> over 
> WAN to a single control plane, with high latency, like 
> 100ms or so, and limited bandwidth. Reducing the base layer size
> becomes 
> a decent goal there. See the security background below.

The reason we put Puppet into the base layer was in fact to prevent it
from being downloaded multiple times. If we were to re-architect the
image layers such that the child layers all contained their own copies
of Puppet for example there would actually be a net increase in
bandwidth and disk usage. So I would argue we are already addressing
the goal of optimizing network and disk space.

Moving it out of the base layer so that you can patch it more often
without disrupting other services is a valid concern. But addressing
this concern while also preserving our definition of a container (see
above, a container should contain all of its binaries) is going to cost
you something, namely disk and network space because Puppet would need
to be duplicated in each child container.
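
For anyone who wants to put a number on that trade-off, a quick and rough way 
to measure what would get duplicated per child image is to ask an existing 
image how big the config-time bits actually are (package names and paths below 
are the usual RDO/CentOS ones, adjust as needed):

docker run --rm <any-service-image> \
    rpm -q --queryformat '%{NAME} %{SIZE}\n' puppet ruby ruby-libs facter hiera
docker run --rm <any-service-image> \
    du -sh /usr/share/openstack-puppet /usr/share/ruby
# multiply the total by the number of service images (~100 on an undercloud) to
# estimate the extra disk/bandwidth if every child layer carried its own copy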

As Puppet is used to configure a majority of the services in TripleO
having it in the base container makes most sense. And yes, if there are
security patches for Puppet/Ruby those might result in a bunch of
containers getting pushed. But let Docker layers take care of this I
think... Don't try to solve things by constructing your own custom
mounts and volumes to work around the issue.


> 2) For a generic security (Day 2, maintenance) case, when 
> puppet/ruby/systemd/name-it gets a CVE fixed, the base layer has to
> be 
> updated and all layers on top - to be rebuilt, and all of those
> layers, 
> to be re-fetched for cloud hosts and all containers to be
> restarted... 
> And all of that because of some fixes that have nothing to do with OpenStack.
> By 
> the remote edge sites as well - remember "tens of thousands", high 
> latency and limited bandwidth?..
> 3) TripleO CI updates (including puppet*) packages in containers, not
> in 
> a common base layer of those. So each CI job has to update puppet*
> and 
> its dependencies - ruby/systemd as well. Reducing the number of packages
> to 
> update for each container makes sense for CI as well.
> 
> Implementation related:
> 
> WIP patches [0],[1] for early review use a config "pod" approach and
> do 
> not require maintaining two sets of config vs runtime images.
> Future 
> work: a) cronie requires systemd, we'd want to fix that also off the 
> base layer. b) rework to podman pods for docker-puppet.py instead of 
> --volumes-from a side car container (can't be backported for Queens 
> then, which is still nice to have a support for the Edge DCN case,
> at 
> least downstream only perhaps).
> 
> Some questions raised on IRC:
> 
> Q: is having a service be able to configure itself really need to 
> involve a separate pod?
> A: Highly likely yes, removing not-runtime things is a good idea and 
> pods is an established PaaS paradigm already. That will require some 
> changes in the architecture though (see the topic with WIP patches).

I'm a little confused on this one. Are you suggesting that we have 2
containers for each service? One with Puppet and one without?

That is certainly possible, but to pull it off would likely require you
to have things built like this:

 |base container| --> |service container| --> |service container w/
Puppet installed|

The end result would be Puppet being duplicated in a layer for each
service's "config image". Very inefficient.

Again, I'm answering this assuming we aren't violating our container
constraints and best practices where each container has the binaries
it needs to do its 

[openstack-dev] [tripleo] Let's improve upstream docs

2018-11-27 Thread Marios Andreou
Hi folks,

as just mentioned in the tripleo weekly irc meeting [1] some of us are
trying to make small weekly improvements to the tripleo docs  [2]. We are
using this bug [3] for tracking and this effort is a result of some
feedback during the recent Berlin summit.

The general idea is 1 per week (or more if you can and want) -
improvement/removal of stale content/identifying missing sections, or
anything else you might care to propose. Please join us if you can, just
add "Related-Bug: #1804642"  to your commit message

thanks

[1] https://wiki.openstack.org/wiki/Meetings/TripleO
[2] https://docs.openstack.org/tripleo-docs/latest/
[3] https://bugs.launchpad.net/tripleo/+bug/1804642
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

2018-11-27 Thread Bogdan Dobrelya

Changing the topic to follow the subject.

[tl;dr] it's time to rearchitect container images to stop including 
config-time-only (puppet et al) bits, which are not needed at runtime and 
pose security issues, like CVEs, to maintain daily.


Background:
1) For the Distributed Compute Node edge case, there are potentially tens 
of thousands of single-compute-node remote edge sites connected over 
WAN to a single control plane, with high latency, like 
100ms or so, and limited bandwidth. Reducing the base layer size becomes 
a decent goal there. See the security background below.
2) For a generic security (Day 2, maintenance) case, when 
puppet/ruby/systemd/name-it gets a CVE fixed, the base layer has to be 
updated, all layers on top have to be rebuilt, and all of those layers 
have to be re-fetched by cloud hosts and all containers restarted... 
And all of that because of some fixes that have nothing to do with OpenStack. 
By the remote edge sites as well - remember "tens of thousands", high 
latency and limited bandwidth?..
3) TripleO CI updates (including puppet*) packages in containers, not in 
a common base layer of those. So each CI job has to update puppet* and 
its dependencies - ruby/systemd as well. Reducing the number of packages to 
update for each container makes sense for CI as well.


Implementation related:

WIP patches [0],[1] for early review use a config "pod" approach and do 
not require maintaining two sets of config vs runtime images. Future 
work: a) cronie requires systemd, we'd want to move that off the 
base layer as well. b) rework docker-puppet.py to use podman pods instead of 
--volumes-from a sidecar container (that can't be backported to Queens 
then, though it would still be nice to have support for the Edge DCN case, at 
least downstream only perhaps).
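
To make (b) a bit more concrete, a very rough sketch of the podman pod variant 
(image names and the puppet invocation are placeholders, not the actual WIP 
code):

podman pod create --name nova-api-pod
# config container: carries puppet, writes config to the host dir and exits
podman run --rm --pod nova-api-pod \
    -v /var/lib/config-data/nova:/var/lib/config-data/nova \
    example/nova-api-config puppet apply ...
# runtime container: lean image that only consumes the generated config
podman run -d --pod nova-api-pod \
    -v /var/lib/config-data/nova:/etc/nova:ro \
    example/nova-api-runtime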


Some questions raised on IRC:

Q: does having a service be able to configure itself really need to 
involve a separate pod?
A: Highly likely yes; removing non-runtime things is a good idea and 
pods are an established PaaS paradigm already. That will require some 
changes in the architecture though (see the topic with WIP patches).


Q: that's (fetching a config container) actually more data than what's 
about to be downloaded otherwise
A: It's not, if thinking of Day 2, when we have to re-fetch the base layer 
and top layers whenever some CVEs unrelated to OpenStack get fixed there 
for ruby/puppet/systemd. Avoiding the need to restart service containers 
because of those minor updates being pushed is also a nice thing.


Q: the best solution here would be using packages on the host, 
generating the config files on the host, and then having an all-in-one 
container for all the services which lets them run in an isolated manner.
A: I think for Edge cases that's a no-go, as we might want to consider 
tiny low-footprint OS distros like the formerly-known Container Linux or 
Atomic. Also, an all-in-one container looks like an anti-pattern from 
the world of VMs.


[0] https://review.openstack.org/#/q/topic:base-container-reduction
[1] https://review.rdoproject.org/r/#/q/topic:base-container-reduction


Here is a related bug [0] and implementation [1] for that. PTAL folks!

[0] https://bugs.launchpad.net/tripleo/+bug/1804822
[1] https://review.openstack.org/#/q/topic:base-container-reduction


Let's also think of removing puppet-tripleo from the base container.
It really brings the world in (and yum updates in CI!) for each job and each 
container!
So if we did so, we should then either install puppet-tripleo and co on 
the host and bind-mount it for the docker-puppet deployment task steps 
(bad idea IMO), OR use the magical --volumes-from 
option to mount volumes from some "puppet-config" sidecar container 
inside each of the containers being launched by docker-puppet tooling.


On Wed, Oct 31, 2018 at 11:16 AM Harald Jensås  
wrote:

We add this to all images:

https://github.com/openstack/tripleo-common/blob/d35af75b0d8c4683a677660646e535cf972c98ef/container-images/tripleo_kolla_template_overrides.j2#L35

/bin/sh -c yum -y install iproute iscsi-initiator-utils lvm2 python
socat sudo which openstack-tripleo-common-container-base rsync cronie
crudini openstack-selinux ansible python-shade puppet-tripleo python2-
kubernetes && yum clean all && rm -rf /var/cache/yum 276 MB 


Is the additional 276 MB reasonable here?
openstack-selinux <- This package runs relabeling; does that kind of
touching of the filesystem impact the size due to docker layers?

Also: python2-kubernetes is a fairly large package (18007990) do we use
that in every image? I don't see any tripleo related repos importing
from that when searching on Hound? The original commit message[1]
adding it states it is for future convenience.

On my undercloud we have 101 images, if we are downloading an extra 18 MB
per image that's almost 1.8 GB for a package we don't use? (I hope it's
not like this? With docker layers, we only download that 276 MB
transaction once? Or?)


[1] https://review.openstack.org/527927




--

Re: [openstack-dev] [tripleo] Zuul Queue backlogs and resource usage

2018-11-26 Thread Bogdan Dobrelya

Here is a related bug [0] and implementation [1] for that. PTAL folks!

[0] https://bugs.launchpad.net/tripleo/+bug/1804822
[1] https://review.openstack.org/#/q/topic:base-container-reduction


Let's also think of removing puppet-tripleo from the base container.
It really brings the world in (and yum updates in CI!) for each job and each 
container!
So if we did so, we should then either install puppet-tripleo and co on 
the host and bind-mount it for the docker-puppet deployment task steps 
(bad idea IMO), OR use the magical --volumes-from 
option to mount volumes from some "puppet-config" sidecar container 
inside each of the containers being launched by docker-puppet tooling.


On Wed, Oct 31, 2018 at 11:16 AM Harald Jensås  
wrote:

We add this to all images:

https://github.com/openstack/tripleo-common/blob/d35af75b0d8c4683a677660646e535cf972c98ef/container-images/tripleo_kolla_template_overrides.j2#L35

/bin/sh -c yum -y install iproute iscsi-initiator-utils lvm2 python
socat sudo which openstack-tripleo-common-container-base rsync cronie
crudini openstack-selinux ansible python-shade puppet-tripleo python2-
kubernetes && yum clean all && rm -rf /var/cache/yum 276 MB 


Is the additional 276 MB reasonable here?
openstack-selinux <- This package run relabling, does that kind of
touching the filesystem impact the size due to docker layers?

Also: python2-kubernetes is a fairly large package (18007990) do we use
that in every image? I don't see any tripleo related repos importing
from that when searching on Hound? The original commit message[1]
adding it states it is for future convenience.

On my undercloud we have 101 images, if we are downloading every 18 MB
per image thats almost 1.8 GB for a package we don't use? (I hope it's
not like this? With docker layers, we only download that 276 MB
transaction once? Or?)


[1] https://review.openstack.org/527927




--
Best regards,
Bogdan Dobrelya,
Irc #bogdando

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] Feedback about our project at Summit

2018-11-20 Thread Emilien Macchi
Hi folks,

I wasn't at the Summit but I was interested in the feedback people gave
about TripleO, so I've discussed it with a few people who made the trip. I
would like to see what actions we can take in the short and long term to
address it.
I collected some thoughts here:
https://etherpad.openstack.org/p/BER-tripleo-feedback
Which is based on
https://etherpad.openstack.org/p/BER-deployment-tools-feedback initially.

Feel free to add more feedback if missing, and also comment on what was
written. If you're aware of some WIP that addresses the feedback, adjust some
wording if needed or just put some links if something exists and is
documented already.
I believe it is critical for us to listen to this feedback and include some of
it into our short term roadmap, so we can reduce some of the frustration that we
can hear sometimes.

Thanks for your help,
-- 
Emilien Macchi
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Enrique Llorente Pastora as a core reviewer for TripleO

2018-11-19 Thread Steven Hardy
On Thu, Nov 15, 2018 at 3:54 PM Sagi Shnaidman  wrote:
>
> Hi,
> I'd like to propose Quique (@quiquell) as a core reviewer for TripleO. Quique 
> is actively involved in improvements and development of TripleO and TripleO 
> CI. He also helps in other projects including but not limited to 
> Infrastructure.
> He shows a very good understanding how TripleO and CI works and I'd like 
> suggest him as core reviewer of TripleO in CI related code.
>
> Please vote!
> My +1 is here :)

+1!

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Enrique Llorente Pastora as a core reviewer for TripleO

2018-11-19 Thread Gabriele Cerami
On 15 Nov, Sagi Shnaidman wrote:
> Hi,
> I'd like to propose Quique (@quiquell) as a core reviewer for TripleO.
> Quique is actively involved in improvements and development of TripleO and
> TripleO CI. He also helps in other projects including but not limited to
> Infrastructure.

It'll be grand.
+1

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Enrique Llorente Pastora as a core reviewer for TripleO

2018-11-19 Thread Juan Antonio Osorio Robles
+1 on making him tripleo-ci core.


Great work!

On 11/15/18 5:50 PM, Sagi Shnaidman wrote:
> Hi,
> I'd like to propose Quique (@quiquell) as a core reviewer for TripleO.
> Quique is actively involved in improvements and development of TripleO
> and TripleO CI. He also helps in other projects including but not
> limited to Infrastructure.
> He shows a very good understanding how TripleO and CI works and I'd
> like suggest him as core reviewer of TripleO in CI related code.
>
> Please vote!
> My +1 is here :)
>
> Thanks
> -- 
> Best regards
> Sagi Shnaidman
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Enrique Llorente Pastora as a core reviewer for TripleO

2018-11-16 Thread Bogdan Dobrelya

+1

On 11/15/18 4:50 PM, Sagi Shnaidman wrote:

Hi,
I'd like to propose Quique (@quiquell) as a core reviewer for TripleO. 
Quique is actively involved in improvements and development of TripleO 
and TripleO CI. He also helps in other projects including but not 
limited to Infrastructure.
He shows a very good understanding how TripleO and CI works and I'd like 
suggest him as core reviewer of TripleO in CI related code.


Please vote!
My +1 is here :)

Thanks
--
Best regards
Sagi Shnaidman

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




--
Best regards,
Bogdan Dobrelya,
Irc #bogdando

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Enrique Llorente Pastora as a core reviewer for TripleO

2018-11-16 Thread Yatin Karel
+1
On Thu, Nov 15, 2018 at 9:24 PM Sagi Shnaidman  wrote:
>
> Hi,
> I'd like to propose Quique (@quiquell) as a core reviewer for TripleO. Quique 
> is actively involved in improvements and development of TripleO and TripleO 
> CI. He also helps in other projects including but not limited to 
> Infrastructure.
> He shows a very good understanding how TripleO and CI works and I'd like 
> suggest him as core reviewer of TripleO in CI related code.
>
> Please vote!
> My +1 is here :)
>
> Thanks
> --
> Best regards
> Sagi Shnaidman
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Enrique Llorente Pastora as a core reviewer for TripleO

2018-11-16 Thread Felix Enrique Llorente Pastora
Thanks guys!

On Fri, Nov 16, 2018 at 8:26 AM Martin André  wrote:

> On Thu, Nov 15, 2018 at 5:00 PM Wesley Hayutin 
> wrote:
> >
> >
> >
> > On Thu, Nov 15, 2018 at 8:52 AM Sagi Shnaidman 
> wrote:
> >>
> >> Hi,
> >> I'd like to propose Quique (@quiquell) as a core reviewer for TripleO.
> Quique is actively involved in improvements and development of TripleO and
> TripleO CI. He also helps in other projects including but not limited to
> Infrastructure.
> >> He shows a very good understanding how TripleO and CI works and I'd
> like suggest him as core reviewer of TripleO in CI related code.
> >>
> >> Please vote!
> >> My +1 is here :)
> >
> >
> > +1 for tripleo-ci core, I don't think we're proposing tripleo core atm.
> > Thanks for proposing and sending this Sagi!
>
> +1
>
> Martin
>
> >>
> >>
> >> Thanks
> >> --
> >> Best regards
> >> Sagi Shnaidman
> >>
> __
> >> OpenStack Development Mailing List (not for usage questions)
> >> Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> >
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


-- 
Quique Llorente

Openstack TripleO CI
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Enrique Llorente Pastora as a core reviewer for TripleO

2018-11-15 Thread Martin André
On Thu, Nov 15, 2018 at 5:00 PM Wesley Hayutin  wrote:
>
>
>
> On Thu, Nov 15, 2018 at 8:52 AM Sagi Shnaidman  wrote:
>>
>> Hi,
>> I'd like to propose Quique (@quiquell) as a core reviewer for TripleO. 
>> Quique is actively involved in improvements and development of TripleO and 
>> TripleO CI. He also helps in other projects including but not limited to 
>> Infrastructure.
>> He shows a very good understanding how TripleO and CI works and I'd like 
>> suggest him as core reviewer of TripleO in CI related code.
>>
>> Please vote!
>> My +1 is here :)
>
>
> +1 for tripleo-ci core, I don't think we're proposing tripleo core atm.
> Thanks for proposing and sending this Sagi!

+1

Martin

>>
>>
>> Thanks
>> --
>> Best regards
>> Sagi Shnaidman
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Enrique Llorente Pastora as a core reviewer for TripleO

2018-11-15 Thread Marios Andreou
On Thu, Nov 15, 2018 at 5:51 PM Sagi Shnaidman  wrote:

> Hi,
> I'd like to propose Quique (@quiquell) as a core reviewer for TripleO.
> Quique is actively involved in improvements and development of TripleO and
> TripleO CI. He also helps in other projects including but not limited to
> Infrastructure.
> He shows a very good understanding how TripleO and CI works and I'd like
> suggest him as core reviewer of TripleO in CI related code.
>
> Please vote!
> My +1 is here :)
>
> +1++




> Thanks
> --
> Best regards
> Sagi Shnaidman
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Enrique Llorente Pastora as a core reviewer for TripleO

2018-11-15 Thread Cédric Jeanneret
Of course +1 :).

On 11/15/18 5:37 PM, Emilien Macchi wrote:
> +1 to have him part of TripleO CI core team, thanks for your dedication
> and hard work. I'm glad to see you're learning fast. Keep your
> motivation and thanks again!
> 
> On Thu, Nov 15, 2018 at 11:33 AM Alex Schultz  > wrote:
> 
> +1
> On Thu, Nov 15, 2018 at 8:51 AM Sagi Shnaidman  > wrote:
> >
> > Hi,
> > I'd like to propose Quique (@quiquell) as a core reviewer for
> TripleO. Quique is actively involved in improvements and development
> of TripleO and TripleO CI. He also helps in other projects including
> but not limited to Infrastructure.
> > He shows a very good understanding how TripleO and CI works and
> I'd like suggest him as core reviewer of TripleO in CI related code.
> >
> > Please vote!
> > My +1 is here :)
> >
> > Thanks
> > --
> > Best regards
> > Sagi Shnaidman
> >
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> 
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> 
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> 
> -- 
> Emilien Macchi
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

-- 
Cédric Jeanneret
Software Engineer
DFG:DF



signature.asc
Description: OpenPGP digital signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Enrique Llorente Pastora as a core reviewer for TripleO

2018-11-15 Thread Emilien Macchi
+1 to have him part of TripleO CI core team, thanks for your dedication and
hard work. I'm glad to see you're learning fast. Keep your motivation and
thanks again!

On Thu, Nov 15, 2018 at 11:33 AM Alex Schultz  wrote:

> +1
> On Thu, Nov 15, 2018 at 8:51 AM Sagi Shnaidman 
> wrote:
> >
> > Hi,
> > I'd like to propose Quique (@quiquell) as a core reviewer for TripleO.
> Quique is actively involved in improvements and development of TripleO and
> TripleO CI. He also helps in other projects including but not limited to
> Infrastructure.
> > He shows a very good understanding how TripleO and CI works and I'd like
> suggest him as core reviewer of TripleO in CI related code.
> >
> > Please vote!
> > My +1 is here :)
> >
> > Thanks
> > --
> > Best regards
> > Sagi Shnaidman
> >
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


-- 
Emilien Macchi
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Enrique Llorente Pastora as a core reviewer for TripleO

2018-11-15 Thread Alex Schultz
+1
On Thu, Nov 15, 2018 at 8:51 AM Sagi Shnaidman  wrote:
>
> Hi,
> I'd like to propose Quique (@quiquell) as a core reviewer for TripleO. Quique 
> is actively involved in improvements and development of TripleO and TripleO 
> CI. He also helps in other projects including but not limited to 
> Infrastructure.
> He shows a very good understanding how TripleO and CI works and I'd like 
> suggest him as core reviewer of TripleO in CI related code.
>
> Please vote!
> My +1 is here :)
>
> Thanks
> --
> Best regards
> Sagi Shnaidman
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Enrique Llorente Pastora as a core reviewer for TripleO

2018-11-15 Thread Jiří Stránský

On 15. 11. 18 16:54, Wesley Hayutin wrote:

On Thu, Nov 15, 2018 at 8:52 AM Sagi Shnaidman  wrote:


Hi,
I'd like to propose Quique (@quiquell) as a core reviewer for TripleO.
Quique is actively involved in improvements and development of TripleO and
TripleO CI. He also helps in other projects including but not limited to
Infrastructure.
He shows a very good understanding how TripleO and CI works and I'd like
suggest him as core reviewer of TripleO in CI related code.

Please vote!
My +1 is here :)



+1 for tripleo-ci core, I don't think we're proposing tripleo core atm.
Thanks for proposing and sending this Sagi!


+1!






Thanks
--
Best regards
Sagi Shnaidman
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Enrique Llorente Pastora as a core reviewer for TripleO

2018-11-15 Thread Wesley Hayutin
On Thu, Nov 15, 2018 at 8:52 AM Sagi Shnaidman  wrote:

> Hi,
> I'd like to propose Quique (@quiquell) as a core reviewer for TripleO.
> Quique is actively involved in improvements and development of TripleO and
> TripleO CI. He also helps in other projects including but not limited to
> Infrastructure.
> He shows a very good understanding how TripleO and CI works and I'd like
> suggest him as core reviewer of TripleO in CI related code.
>
> Please vote!
> My +1 is here :)
>

+1 for tripleo-ci core, I don't think we're proposing tripleo core atm.
Thanks for proposing and sending this Sagi!


>
> Thanks
> --
> Best regards
> Sagi Shnaidman
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] Proposing Enrique Llorente Pastora as a core reviewer for TripleO

2018-11-15 Thread Sagi Shnaidman
Hi,
I'd like to propose Quique (@quiquell) as a core reviewer for TripleO.
Quique is actively involved in improvements and development of TripleO and
TripleO CI. He also helps in other projects including but not limited to
Infrastructure.
He shows a very good understanding of how TripleO and CI work and I'd like
to suggest him as a core reviewer of TripleO for CI-related code.

Please vote!
My +1 is here :)

Thanks
-- 
Best regards
Sagi Shnaidman
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] Proposal: Remove newton/ocata CI jobs

2018-11-14 Thread Rafael Folco
Greetings,

The non-containerized multinode scenario jobs were active up to Ocata
release and are no longer supported. I'm proposing a cleanup [1] on these
old jobs so I've added this topic to the next tripleo meeting agenda [2] to
discuss with the tripleo team.
Since this may affect multiple projects, these jobs need to be deleted from
their respective zuul config before the cleanup on tripleo-ci.

Thanks,
--Folco

[1] https://review.openstack.org/#/c/617999/
[2] https://etherpad.openstack.org/p/tripleo-meeting-items
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] no recheck / no workflow until gate is stable

2018-11-13 Thread Emilien Macchi
We have serious issues with the gate at this time; we believe it is a mix
of mirror errors (infra) and tempest timeouts (see
https://review.openstack.org/617845).

Until the situation is resolved, do not recheck or approve any patch for
now.
Thanks for your understanding,
-- 
Emilien Macchi
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] puppet5 has broken the master gate

2018-11-12 Thread Chandan kumar
Hello Alex,

On Tue, Nov 13, 2018 at 9:53 AM Alex Schultz  wrote:
>
> Just a heads up but we recently updated to puppet5 in the master
> dependencies. It appears that this has completely hosed the master
> scenarios and containers-multinode jobs.  Please do not recheck/approve
> anything until we get this resolved.
>
> See https://bugs.launchpad.net/tripleo/+bug/1803024
>
> I have a possible fix (https://review.openstack.org/#/c/617441/) but
> it's probably a better idea to roll back the puppet package if
> possible.
>

In RDO, we have merged Revert "Stein: push puppet 5.5.6" ->
https://review.rdoproject.org/r/#/c/17333/1

Thanks for the heads up!

Thanks,

Chandan Kumar

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo][openstack-ansible] Updates on collaboration on os_tempest role

2018-11-12 Thread Chandan kumar
Hello,

At the start of the Denver 2018 PTG [1], we started collaborating
towards using the
openstack-ansible-os_tempest role [2] as a unified tempest role in TripleO and
the openstack-ansible project within the OpenStack community.

It will help us to improve the testing strategies between the two projects,
which can be
further expanded to other OpenStack deployment tools.

We will be sharing bi-weekly updates through mailing lists.
We are tracking/planning all the work here:
Proposal doc: https://etherpad.openstack.org/p/ansible-tempest-role
Work item collaboration doc:
https://etherpad.openstack.org/p/openstack-ansible-tempest

Here is the update till now:
openstack-ansible-os_tempest project:

* Enable stackviz support - https://review.openstack.org/603100
* Added support for installing tempest from distro -
https://review.openstack.org/591424
* Fixed missing ; from if statement in tempest_run -
https://review.openstack.org/614521
* Added task to list tempest plugins - https://review.openstack.org/615837
* Remove apt_package_pinning dependency from os_tempest role -
https://review.openstack.org/609992
* Enable python-tempestconf support - https://review.openstack.org/612968

Support added to openstack/rpm-packaging project (will be consumed in
os_tempest role):
* Added spec file for stackviz - https://review.openstack.org/609337
* Add initial spec for python-tempestconf - https://review.openstack.org/598143

Upcoming improvements:
* Finish the integration of python-tempestconf in os_tempest role.

Have queries, Feel free to ping us on #tripleo or #openstack-ansible channel.

Links:
[1.] http://lists.openstack.org/pipermail/openstack-dev/2018-August/133119.html
[2.] http://git.openstack.org/cgit/openstack/openstack-ansible-os_tempest

Thanks,

Chandan Kumar

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] puppet5 has broken the master gate

2018-11-12 Thread Alex Schultz
Just a heads up but we recently updated to puppet5 in the master
dependencies. It appears that this has completely hosed the master
scenarios and containers-multinode jobs.  Please do not recheck/approve
anything until we get this resolved.

See https://bugs.launchpad.net/tripleo/+bug/1803024

I have a possible fix (https://review.openstack.org/#/c/617441/) but
it's probably a better idea to roll back the puppet package if
possible.

Thanks,
-Alex

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] using molecule to test ansible playbooks and roles

2018-11-11 Thread Paul Belanger
On Sun, Nov 11, 2018 at 11:29:43AM +, Sorin Sbarnea wrote:
> I recently came across molecule   a 
> project originated at Cisco which recently became an official Ansible 
> project, at the same time as ansible-lint. Both projects were transferred 
> from their former locations to Ansible github organization -- I guess as a 
> confirmation that they are now officially supported by the core team. I used 
> ansible-lint for years and it did save me a lot of time, molecule is still 
> new to me.
> 
> Few weeks back I started to play with molecule as at least on paper it was 
> supposed to resolve the problem of testing roles on multiple platforms and 
> usage scenarios and while the work done for enabling tripleo-quickstart to 
> support fedora-28 (py3). I was trying to find a faster way to test these 
> changes faster and locally --- and avoid increasing the load on CI before I 
> get the confirmation that code works locally.
> 
> The results of my testing that started about two weeks ago are very positive 
> and can be seen on:
> https://review.openstack.org/#/c/613672/ 
> 
> You can find there a job named openstack-tox-molecule which runs in 
> ~15minutes but this is only because on CI docker caching does not work as 
> well as locally, locally it re-runs in ~2-3minutes.
> 
> I would like to hear your thoughts on this and if you also have some time to 
> checkout that change and run it yourself it would be wonderful.
> 
> Once you download the change you only have to run "tox -e molecule", (or 
> "make" which also clones sister extras repo if needed)
> 
> Feel free to send questions to the change itself, on #oooq or by email.
> 
I've been doing this for a while with ansible-role-nodepool[1], same
idea: you run tox -emolecule and the role will use the docker backend to
validate. I also run it in the gate (with the docker backend), however this
is only to validate that end users will not be broken locally if they
run tox -emolecule. There is a downside with docker, no systemd
integration, which is fine for me as I have other tests that are able to
provide coverage.

With zuul, it really isn't needed to run nested docker for linters and
smoke testing, as it mostly creates unneeded overhead.  However, if you
do want to standardize on molecule, I recommend you don't use the docker
backend but use the delegated driver and reuse the inventory provided by zuul.
Then you still use molecule but get the bonus of using the VMs presented
by zuul / nodepool.
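
For reference, something along these lines (flags from memory, so treat it as
a sketch): molecule can generate a scenario that uses the delegated driver,
and that scenario then just runs against whatever hosts the inventory already
provides - in the gate that would be the nodes zuul/nodepool hand you.

# run inside the role checkout; scenario and role names are only examples
molecule init scenario --scenario-name delegated --role-name my-role --driver-name delegated
molecule test --scenario-name delegated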

- Paul

[1] http://git.openstack.org/cgit/openstack/ansible-role-nodepool

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] using molecule to test ansible playbooks and roles

2018-11-11 Thread Sorin Sbarnea
I recently came across molecule, a 
project originated at Cisco which recently became an official Ansible 
project, at the same time as ansible-lint. Both projects were transferred from 
their former locations to the Ansible github organization -- I guess as a 
confirmation that they are now officially supported by the core team. I used 
ansible-lint for years and it did save me a lot of time; molecule is still new 
to me.

A few weeks back I started to play with molecule as, at least on paper, it was 
supposed to resolve the problem of testing roles on multiple platforms and 
usage scenarios. While working on enabling tripleo-quickstart to 
support fedora-28 (py3), I was trying to find a way to test these 
changes faster and locally --- and avoid increasing the load on CI before I get 
the confirmation that code works locally.

The results of my testing that started about two weeks ago are very positive 
and can be seen on:
https://review.openstack.org/#/c/613672/ 

You can find there a job named openstack-tox-molecule which runs in ~15 minutes, 
but this is only because on CI docker caching does not work as well as locally; 
locally it re-runs in ~2-3 minutes.

I would like to hear your thoughts on this, and if you also have some time to 
check out that change and run it yourself, it would be wonderful.

Once you download the change you only have to run "tox -e molecule", (or "make" 
which also clones sister extras repo if needed)

Feel free to send questions to the change itself, on #oooq or by email.

Cheers
Sorin Sbarnea__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] TripleO CI Summary: Sprint 21

2018-11-09 Thread Rafael Folco
Greetings,

The TripleO CI team has just completed Sprint 21 (Oct-18 thru Nov-07).
The following is a summary of completed work during this sprint cycle:

   - Created an initial base Standalone job for Fedora 28.

   - Added initial support for installing Tempest rpm in
     openstack-ansible_os-tempest.

   - Started running project specific tempest tests against puppet-projects
     in tripleo-standalone gates.

   - Added initial support to python-tempestconf on
     openstack-ansible_os-tempest.

   - Prepared grounds to make all required variables for the zuulv3 workflow
     available for the reproducer.


The sprint task board for the CI team has moved from Trello to Taiga [1]. The
Ruck and Rover notes for this sprint have been tracked in the etherpad [2].

The planned work for the next sprint focuses on iterating on the upstream
standalone job for Fedora 28 to bring it to completion. This includes
moving the multinode scenarios to the standalone jobs. The team continues
to work on the reproducer, enabling Tempest coverage in puppet-* projects,
and preparing a CI environment for OVB.

The Ruck and Rover for this sprint are Gabriele Cerami (panda) and Chandan
Kumar (chkumar). Please direct questions or queries to them regarding CI
status or issues in #tripleo, ideally to whomever has the ‘|ruck’ suffix on
their nick.

Thanks,

Folco

[1] https://tree.taiga.io/project/tripleo-ci-board/taskboard/sprint-21-175

[2] https://review.rdoproject.org/etherpad/p/ruckrover-sprint21
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] No meeting next week for the security squad

2018-11-09 Thread Juan Antonio Osorio Robles
There will be no meeting for the security squad next Tuesday 13th of November 
since there's the
OpenStack summit.


Best Regards


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] No weekly meeting next week

2018-11-09 Thread Juan Antonio Osorio Robles
There will be no meeting next Tuesday 13th of November since there's the
OpenStack summit.


Best Regards


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] CI is broken

2018-11-07 Thread Emilien Macchi
No alert anymore, gate is green.
recheck if needed.

On Wed, Nov 7, 2018 at 2:22 PM Emilien Macchi  wrote:

> I updated the bugs, and so far we have one alert left:
> https://bugs.launchpad.net/tripleo/+bug/1801969
>
> The patch is in gate, be patient and then we'll be able to +A/recheck
> stuff again.
>
> On Wed, Nov 7, 2018 at 7:30 AM Juan Antonio Osorio Robles <
> jaosor...@redhat.com> wrote:
>
>> Hello folks,
>>
>>
>> Please do not attempt to merge or recheck patches until we get this
>> sorted out.
>>
>> We are dealing with several issues that have broken all jobs.
>>
>> https://bugs.launchpad.net/tripleo/+bug/1801769
>> https://bugs.launchpad.net/tripleo/+bug/1801969
>> https://bugs.launchpad.net/tripleo/+bug/1802083
>> https://bugs.launchpad.net/tripleo/+bug/1802085
>>
>> Best Regards!
>>
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
> --
> Emilien Macchi
>


-- 
Emilien Macchi
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] Edge squad meeting this week and next week

2018-11-07 Thread James Slagle
I won't be around to run the Edge squad meeting this week and next
week. If someone else wants to pick it up, that would be great.
Otherwise, consider it cancelled :). Thanks!

https://etherpad.openstack.org/p/tripleo-edge-squad-status

-- 
-- James Slagle
--

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] CI is broken

2018-11-07 Thread Emilien Macchi
I updated the bugs, and so far we have one alert left:
https://bugs.launchpad.net/tripleo/+bug/1801969

The patch is in gate, be patient and then we'll be able to +A/recheck stuff
again.

On Wed, Nov 7, 2018 at 7:30 AM Juan Antonio Osorio Robles <
jaosor...@redhat.com> wrote:

> Hello folks,
>
>
> Please do not attempt to merge or recheck patches until we get this
> sorted out.
>
> We are dealing with several issues that have broken all jobs.
>
> https://bugs.launchpad.net/tripleo/+bug/1801769
> https://bugs.launchpad.net/tripleo/+bug/1801969
> https://bugs.launchpad.net/tripleo/+bug/1802083
> https://bugs.launchpad.net/tripleo/+bug/1802085
>
> Best Regards!
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


-- 
Emilien Macchi
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] CI is broken

2018-11-07 Thread Juan Antonio Osorio Robles
Hello folks,


Please do not attempt to merge or recheck patches until we get this
sorted out.

We are dealing with several issues that have broken all jobs.

https://bugs.launchpad.net/tripleo/+bug/1801769
https://bugs.launchpad.net/tripleo/+bug/1801969
https://bugs.launchpad.net/tripleo/+bug/1802083
https://bugs.launchpad.net/tripleo/+bug/1802085

Best Regards!


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] shutting down 3rd party TripleO CI for measurements

2018-11-06 Thread Sagi Shnaidman
We measured results and would like to shut down check jobs in RDO cloud CI
today. Please let us know if you have objections.

Thanks

On Thu, Nov 1, 2018 at 12:14 AM Wesley Hayutin  wrote:

> Greetings,
>
> The TripleO-CI team would like to consider shutting down all the third
> party check jobs running against TripleO projects in order to measure
> results with and without load on the cloud for some amount of time.  I
> suspect we would want to shut things down for roughly 24-48 hours.
>
> If there are any strong objects please let us know.
> Thank you
> --
>
> Wes Hayutin
>
> Associate MANAGER
>
> Red Hat
>
> 
>
> whayu...@redhat.com | IRC: weshay
> 
>
> View my calendar and check my availability for meetings HERE
> 
>


-- 
Best regards
Sagi Shnaidman
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Ansible getting bumped up from 2.4 -> 2.6.6

2018-11-06 Thread Giulio Fidente
On 11/5/18 11:23 PM, Wesley Hayutin wrote:
> Greetings,
> 
> Please be aware of the following patch [1].  This updates ansible in
> queens, rocky, and stein.
>  This was just pointed out to me, and I didn't see it coming so I
> thought I'd email the group.
> 
> That is all, thanks
> 
> [1] https://review.rdoproject.org/r/#/c/14960
thanks Wes for bringing this up

note that we're trying to update ansible to 2.6 because 2.4 is
unsupported and 2.5 is only receiving security fixes already

with the upcoming updates for ceph-ansible in ceph luminous, support for
older ansible releases will be dropped
-- 
Giulio Fidente
GPG KEY: 08D733BA

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] Ansible getting bumped up from 2.4 -> 2.6.6

2018-11-05 Thread Wesley Hayutin
Greetings,

Please be aware of the following patch [1].  This updates ansible in
queens, rocky, and stein.
 This was just pointed out to me, and I didn't see it coming so I thought
I'd email the group.

That is all, thanks

[1] https://review.rdoproject.org/r/#/c/14960
-- 
Wes Hayutin
Associate Manager, Red Hat
whayu...@redhat.com | T: +19194232509 | IRC: weshay
View my calendar and check my availability for meetings HERE

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] PSA lets use deploy_steps_tasks

2018-11-05 Thread Dan Prince
On Mon, Nov 5, 2018 at 4:06 AM Cédric Jeanneret  wrote:
>
> On 11/2/18 2:39 PM, Dan Prince wrote:
> > I pushed a patch[1] to update our containerized deployment
> > architecture docs yesterday. There are 2 new fairly useful sections we
> > can leverage with TripleO's stepwise deployment. They appear to be
> > used somewhat sparingly so I wanted to get the word out.
>
> Good thing, it's important to highlight this feature and explain how it
> works, big thumb up Dan!
>
> >
> > The first is 'deploy_steps_tasks' which gives you a means to run
> > Ansible snippets on each node/role in a stepwise fashion during
> > deployment. Previously it was only possible to execute puppet or
> > docker commands whereas now that we have deploy_steps_tasks we can
> > execute ad-hoc ansible in the same manner.
>
> I'm wondering if such a thing could be used for the "inflight
> validations" - i.e. a step to validate a service/container is working as
> expected once it's deployed, in order to get early failure.
> For instance, we deploy a rabbitmq container, and right after it's
> deployed, we'd like to ensure it's actually running and works as
> expected before going forward in the deploy.
>
> Care to have a look at that spec[1] and see if, instead of adding a new
> "validation_tasks" entry, we could "just" use the "deploy_steps_tasks"
> with the right step number? That would be really, really cool, and will
> probably avoid a lot of code in the end :).

It could work fine I think. As deploy_steps_tasks runs before the
"common container/baremetal" actions, special care would need to be
taken so that validations for a container's startup occur at the
beginning of the next step. So a container started at step 2 would be
validated early in step 3. This may also require us to have a "post"
deploy_steps_tasks iteration so that we can validate late-starting
containers.

If we use the more generic deploy_steps_tasks section we'd probably
rely on conventions to always add Ansible tags onto the validation
tasks. These could be useful for those wanting to selectively execute
them externally (not sure if that was part of your spec but I could
see someone wanting this).
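
For illustration, a rough sketch of what that convention could look like
in a service template's role_data output (the step number, tag and
container name below are hypothetical, not an agreed interface):

  deploy_steps_tasks:
    # container started at step 2, so validate it early in step 3
    - name: validate the rabbitmq container is up and responding
      when: step|int == 3
      tags:
        - validation
      command: docker exec rabbitmq rabbitmqctl status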

Dan

>
> Thank you!
>
> C.
>
> [1] https://review.openstack.org/#/c/602007/
>
> >
> > The second is 'external_deploy_tasks' which allows you to run
> > Ansible snippets on the Undercloud during stepwise deployment. This is
> > probably most useful for driving an external installer but might also
> > help with some complex tasks that need to originate from a single
> > Ansible client.
> >
> > The only downside I see to these approaches is that both appear to be
> > implemented with Ansible's default linear strategy. I saw shardy's
> > comment here [2] that the :free strategy does not yet apparently work
> > with the any_errors_fatal option. Perhaps we can reach out to someone
> > in the Ansible community in this regard to improve running these
> > things in parallel like TripleO used to work with Heat agents.
> >
> > This is also how host_prep_tasks is implemented which BTW we should
> > now get rid of as a duplicate architectural step since we have
> > deploy_steps_tasks anyway.
> >
> > [1] https://review.openstack.org/#/c/614822/
> > [2] 
> > http://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/common/deploy-steps.j2#n554
> >
> > __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
> --
> Cédric Jeanneret
> Software Engineer
> DFG:DF
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Zuul Queue backlogs and resource usage

2018-11-05 Thread Alex Schultz
On Mon, Nov 5, 2018 at 3:47 AM Bogdan Dobrelya  wrote:
>
> Let's also think of removing puppet-tripleo from the base container.
> It really brings the world in (and yum updates in CI!) for each job and each
> container!
> So if we did so, we should then either install puppet-tripleo and co on
> the host and bind-mount it for the docker-puppet deployment task steps
> (bad idea IMO), OR use the magical --volumes-from 
> option to mount volumes from some "puppet-config" sidecar container
> inside each of the containers being launched by docker-puppet tooling.
>

This does bring up an interesting point, as we also include this in
overcloud-full. I know Dan had a patch to stop using the
puppet-tripleo from the host[0], which is the opposite of this.  While
these yum updates happen a bunch in CI, they aren't super large
updates. But yes, I think we need to figure out the correct way forward
with these packages.

Thanks,
-Alex

[0] https://review.openstack.org/#/c/550848/


> On 10/31/18 6:35 PM, Alex Schultz wrote:
> >
> > So this is a single layer that is updated once and shared by all the
> > containers that inherit from it. I did notice the same thing and have
> > proposed a change in the layering of these packages last night.
> >
> > https://review.openstack.org/#/c/614371/
> >
> > In general this does raise a point about dependencies of services and
> > what the actual impact of adding new ones to projects is. Especially
> > in the container world where this might be duplicated N times
> > depending on the number of services deployed.  With the move to
> > containers, much of the sharedness that being on a single host
> > provided has been lost at a cost of increased bandwidth, memory, and
> > storage usage.
> >
> > Thanks,
> > -Alex
> >
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Zuul Queue backlogs and resource usage

2018-11-05 Thread Cédric Jeanneret


On 11/5/18 11:47 AM, Bogdan Dobrelya wrote:
> Let's also think of removing puppet-tripleo from the base container.
> It really brings the world in (and yum updates in CI!) for each job and each
> container!
> So if we did so, we should then either install puppet-tripleo and co on
> the host and bind-mount it for the docker-puppet deployment task steps
> (bad idea IMO), OR use the magical --volumes-from 
> option to mount volumes from some "puppet-config" sidecar container
> inside each of the containers being launched by docker-puppet tooling.

And, in addition, I'd rather see the "podman" thingy as a bind-mount,
especially since we MUST get the same version in all the calls.

> 
> On 10/31/18 6:35 PM, Alex Schultz wrote:
>>
>> So this is a single layer that is updated once and shared by all the
>> containers that inherit from it. I did notice the same thing and have
>> proposed a change in the layering of these packages last night.
>>
>> https://review.openstack.org/#/c/614371/
>>
>> In general this does raise a point about dependencies of services and
>> what the actual impact of adding new ones to projects is. Especially
>> in the container world where this might be duplicated N times
>> depending on the number of services deployed.  With the move to
>> containers, much of the sharedness that being on a single host
>> provided has been lost at a cost of increased bandwidth, memory, and
>> storage usage.
>>
>> Thanks,
>> -Alex
>>
> 

-- 
Cédric Jeanneret
Software Engineer
DFG:DF



signature.asc
Description: OpenPGP digital signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Zuul Queue backlogs and resource usage

2018-11-05 Thread Bogdan Dobrelya

Let's also think of removing puppet-tripleo from the base container.
It really brings the world in (and yum updates in CI!) for each job and each 
container!
So if we did so, we should then either install puppet-tripleo and co on 
the host and bind-mount it for the docker-puppet deployment task steps 
(bad idea IMO), OR use the magical --volumes-from  
option to mount volumes from some "puppet-config" sidecar container 
inside each of the containers being launched by docker-puppet tooling.


On 10/31/18 6:35 PM, Alex Schultz wrote:


So this is a single layer that is updated once and shared by all the
containers that inherit from it. I did notice the same thing and have
proposed a change in the layering of these packages last night.

https://review.openstack.org/#/c/614371/

In general this does raise a point about dependencies of services and
what the actual impact of adding new ones to projects is. Especially
in the container world where this might be duplicated N times
depending on the number of services deployed.  With the move to
containers, much of the sharedness that being on a single host
provided has been lost at a cost of increased bandwidth, memory, and
storage usage.

Thanks,
-Alex



--
Best regards,
Bogdan Dobrelya,
Irc #bogdando

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] PSA lets use deploy_steps_tasks

2018-11-05 Thread Cédric Jeanneret
On 11/2/18 2:39 PM, Dan Prince wrote:
> I pushed a patch[1] to update our containerized deployment
> architecture docs yesterday. There are 2 new fairly useful sections we
> can leverage with TripleO's stepwise deployment. They appear to be
> used somewhat sparingly so I wanted to get the word out.

Good thing, it's important to highlight this feature and explain how it
works, big thumb up Dan!

> 
> The first is 'deploy_steps_tasks' which gives you a means to run
> Ansible snippets on each node/role in a stepwise fashion during
> deployment. Previously it was only possible to execute puppet or
> docker commands whereas now that we have deploy_steps_tasks we can
> execute ad-hoc ansible in the same manner.

I'm wondering if such a thing could be used for the "inflight
validations" - i.e. a step to validate a service/container is working as
expected once it's deployed, in order to get early failure.
For instance, we deploy a rabbitmq container, and right after it's
deployed, we'd like to ensure it's actually running and works as
expected before going forward in the deploy.

Care to have a look at that spec[1] and see if, instead of adding a new
"validation_tasks" entry, we could "just" use the "deploy_steps_tasks"
with the right step number? That would be really, really cool, and will
probably avoid a lot of code in the end :).

Thank you!

C.

[1] https://review.openstack.org/#/c/602007/

> 
> The second is 'external_deploy_tasks' which allows you to run
> Ansible snippets on the Undercloud during stepwise deployment. This is
> probably most useful for driving an external installer but might also
> help with some complex tasks that need to originate from a single
> Ansible client.
> 
> The only downside I see to these approaches is that both appear to be
> implemented with Ansible's default linear strategy. I saw shardy's
> comment here [2] that the :free strategy does not yet apparently work
> with the any_errors_fatal option. Perhaps we can reach out to someone
> in the Ansible community in this regard to improve running these
> things in parallel like TripleO used to work with Heat agents.
> 
> This is also how host_prep_tasks is implemented which BTW we should
> now get rid of as a duplicate architectural step since we have
> deploy_steps_tasks anyway.
> 
> [1] https://review.openstack.org/#/c/614822/
> [2] 
> http://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/common/deploy-steps.j2#n554
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

-- 
Cédric Jeanneret
Software Engineer
DFG:DF



signature.asc
Description: OpenPGP digital signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] PSA lets use deploy_steps_tasks

2018-11-02 Thread James Slagle
On Fri, Nov 2, 2018 at 9:39 AM Dan Prince  wrote:
>
> I pushed a patch[1] to update our containerized deployment
> architecture docs yesterday. There are 2 new fairly useful sections we
> can leverage with TripleO's stepwise deployment. They appear to be
> used somewhat sparingly so I wanted to get the word out.
>
> The first is 'deploy_steps_tasks' which gives you a means to run
> Ansible snippets on each node/role in a stepwise fashion during
> deployment. Previously it was only possible to execute puppet or
> docker commands whereas now that we have deploy_steps_tasks we can
> execute ad-hoc ansible in the same manner.
>
> The second is 'external_deploy_tasks' which allows you to run
> Ansible snippets on the Undercloud during stepwise deployment. This is
> probably most useful for driving an external installer but might also
> help with some complex tasks that need to originate from a single
> Ansible client.

+1


> The only downside I see to these approaches is that both appear to be
> implemented with Ansible's default linear strategy. I saw shardy's
> comment here [2] that the :free strategy does not yet apparently work
> with the any_errors_fatal option. Perhaps we can reach out to someone
> in the Ansible community in this regard to improve running these
> things in parallel like TripleO used to work with Heat agents.

It's effectively parallel across one role at a time at the moment, up
to the number of configured forks (default: 25). The reason it won't
parallelize across roles is that a different task file is used
with import_tasks for each role. Ansible won't run that in parallel
since the task list is different.

I was able to make this parallel across roles for the pre and post
deployments by making the task file the same for each role, and
controlling the difference with group and host vars:
https://review.openstack.org/#/c/574474/
From Ansible's perspective, the task list is now the same for each
host, although different things will be done depending on the value of
vars for each host.
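
As a rough illustration of that pattern (file and variable names below
are made up, not what the actual patch uses), the shared task list just
defers to a group var:

  # deploy-tasks-common.yaml -- identical for every host, so Ansible sees
  # one task list and can run all roles in parallel
  - name: Include the role-specific tasks for this host
    include_tasks: "{{ role_deploy_tasks_file }}"

  # group_vars/Controller (hypothetical)
  # role_deploy_tasks_file: Controller-deploy-tasks.yaml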

It's possible a similar approach could be done with the other
interfaces you point out here.

In addition to the any_errors_fatal issue when using strategy:free,
you'd also lose the grouping of the task output per role after
each task finishes. This is mostly cosmetic, but using free does
create a lot noisier output IMO.

-- 
-- James Slagle
--

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] PSA lets use deploy_steps_tasks

2018-11-02 Thread Juan Antonio Osorio Robles
Thanks! We have been slow to update our docs. I did put up a blog post
about these sections of the templates [1], in case folks find that useful.


[1] http://jaormx.github.io/2018/dissecting-tripleo-service-templates-p2/

On 11/2/18 3:39 PM, Dan Prince wrote:
> I pushed a patch[1] to update our containerized deployment
> architecture docs yesterday. There are 2 new fairly useful sections we
> can leverage with TripleO's stepwise deployment. They appear to be
> used somewhat sparingly so I wanted to get the word out.
>
> The first is 'deploy_steps_tasks' which gives you a means to run
> Ansible snippets on each node/role in a stepwise fashion during
> deployment. Previously it was only possible to execute puppet or
> docker commands whereas now that we have deploy_steps_tasks we can
> execute ad-hoc ansible in the same manner.
>
> The second is 'external_deploy_tasks' which allows you to run
> Ansible snippets on the Undercloud during stepwise deployment. This is
> probably most useful for driving an external installer but might also
> help with some complex tasks that need to originate from a single
> Ansible client.
>
> The only downside I see to these approaches is that both appear to be
> implemented with Ansible's default linear strategy. I saw shardy's
> comment here [2] that the :free strategy does not yet apparently work
> with the any_errors_fatal option. Perhaps we can reach out to someone
> in the Ansible community in this regard to improve running these
> things in parallel like TripleO used to work with Heat agents.
>
> This is also how host_prep_tasks is implemented which BTW we should
> now get rid of as a duplicate architectural step since we have
> deploy_steps_tasks anyway.
>
> [1] https://review.openstack.org/#/c/614822/
> [2] 
> http://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/common/deploy-steps.j2#n554
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] PSA lets use deploy_steps_tasks

2018-11-02 Thread Dan Prince
I pushed a patch[1] to update our containerized deployment
architecture docs yesterday. There are 2 new fairly useful sections we
can leverage with TripleO's stepwise deployment. They appear to be
used somewhat sparingly so I wanted to get the word out.

The first is 'deploy_steps_tasks' which gives you a means to run
Ansible snippets on each node/role in a stepwise fashion during
deployment. Previously it was only possible to execute puppet or
docker commands whereas now that we have deploy_steps_tasks we can
execute ad-hoc ansible in the same manner.

The second is 'external_deploy_tasks' which allows you to run
Ansible snippets on the Undercloud during stepwise deployment. This is
probably most useful for driving an external installer but might also
help with some complex tasks that need to originate from a single
Ansible client.
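
For anyone who hasn't used these yet, here is a minimal sketch of how the
two sections can appear in a service template's role_data output (the
module arguments, paths and step numbers below are only examples):

  deploy_steps_tasks:
    # runs on every node of the role, once per step; gate on the step number
    - name: make sure a sysctl knob is set on nodes running this service
      when: step|int == 1
      sysctl:
        name: net.ipv4.ip_forward
        value: "1"
        state: present
  external_deploy_tasks:
    # runs on the undercloud during the same stepwise loop
    - name: drive an external installer from the undercloud
      when: step|int == 3
      command: /usr/local/bin/run-my-external-installer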

The only downside I see to these approaches is that both appear to be
implemented with Ansible's default linear strategy. I saw shardy's
comment here [2] that the :free strategy does not yet apparently work
with the any_errors_fatal option. Perhaps we can reach out to someone
in the Ansible community in this regard to improve running these
things in parallel like TripleO used to work with Heat agents.

This is also how host_prep_tasks is implemented which BTW we should
now get rid of as a duplicate architectural step since we have
deploy_steps_tasks anyway.

[1] https://review.openstack.org/#/c/614822/
[2] 
http://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/common/deploy-steps.j2#n554

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Bob Fournier as core reviewer

2018-11-01 Thread Bob Fournier
On Thu, Nov 1, 2018 at 4:26 PM Emilien Macchi  wrote:

> done, you're now core in TripleO; Thanks Bob for your hard work!
>

Thank you Emilien, Juan, and everyone else!


>
> On Mon, Oct 22, 2018 at 4:55 PM Jason E. Rist  wrote:
>
>> On 10/19/2018 06:23 AM, Juan Antonio Osorio Robles wrote:
>> > Hello!
>> >
>> >
>> > I would like to propose Bob Fournier (bfournie) as a core reviewer in
>> > TripleO. His patches and reviews have spanned quite a wide range in our
>> > project, his reviews show great insight and quality and I think he would
>> > be an addition to the core team.
>> >
>> > What do you folks think?
>> >
>> >
>> > Best Regards
>> >
>> >
>> >
>> >
>> __
>> > OpenStack Development Mailing List (not for usage questions)
>> > Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> >
>> yup.
>>
>> --
>> Jason E. Rist
>> Senior Software Engineer
>> OpenStack User Interfaces
>> Red Hat, Inc.
>> Freenode: jrist
>> github/twitter: knowncitizen
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
> --
> Emilien Macchi
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Proposing Bob Fournier as core reviewer

2018-11-01 Thread Emilien Macchi
done, you're now core in TripleO; Thanks Bob for your hard work!

On Mon, Oct 22, 2018 at 4:55 PM Jason E. Rist  wrote:

> On 10/19/2018 06:23 AM, Juan Antonio Osorio Robles wrote:
> > Hello!
> >
> >
> > I would like to propose Bob Fournier (bfournie) as a core reviewer in
> > TripleO. His patches and reviews have spanned quite a wide range in our
> > project, his reviews show great insight and quality and I think he would
> > be an addition to the core team.
> >
> > What do you folks think?
> >
> >
> > Best Regards
> >
> >
> >
> >
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> yup.
>
> --
> Jason E. Rist
> Senior Software Engineer
> OpenStack User Interfaces
> Red Hat, Inc.
> Freenode: jrist
> github/twitter: knowncitizen
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


-- 
Emilien Macchi
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] gate issues please do not approve/recheck

2018-11-01 Thread Wesley Hayutin
Thanks Alex!

On Thu, Nov 1, 2018 at 10:27 AM Alex Schultz  wrote:

> Ok, since the podman revert patch has been successfully merged and
> we've landed most of the non-voting scenario patches, it should be OK
> to restore/recheck.  It would be a good idea to prioritize things to
> land and, if it's not critical, let's hold off on approving until we're
> sure the gate is much better.
>
> Thanks,
> -Alex
>
> On Wed, Oct 31, 2018 at 9:39 AM Alex Schultz  wrote:
> >
> > Hey folks,
> >
> > So we have identified an issue that has been causing a bunch of
> > failures and proposed a revert of our podman testing[0].  We have
> > cleared the gate and are asking that you not approve or recheck any
> > patches at this time.  We will let you know when it is safe to start
> > approving things.
> >
> > Thanks,
> > -Alex
> >
> > [0] https://review.openstack.org/#/c/614537/
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
-- 
Wes Hayutin
Associate Manager, Red Hat
whayu...@redhat.com | T: +19194232509 | IRC: weshay
View my calendar and check my availability for meetings HERE

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Zuul Queue backlogs and resource usage

2018-11-01 Thread Ben Nemec



On 10/30/18 4:16 PM, Clark Boylan wrote:

On Tue, Oct 30, 2018, at 1:01 PM, Ben Nemec wrote:



On 10/30/18 1:25 PM, Clark Boylan wrote:

On Tue, Oct 30, 2018, at 10:42 AM, Alex Schultz wrote:

On Tue, Oct 30, 2018 at 11:36 AM Ben Nemec  wrote:


Tagging with tripleo since my suggestion below is specific to that project.

On 10/30/18 11:03 AM, Clark Boylan wrote:

Hello everyone,

A little while back I sent email explaining how the gate queues work and how 
fixing bugs helps us test and merge more code. All of this is still true 
and we should keep pushing to improve our testing to avoid gate resets.

Last week we migrated Zuul and Nodepool to a new Zookeeper cluster. In the 
process of doing this we had to restart Zuul which brought in a new logging 
feature that exposes node resource usage by jobs. Using this data I've been 
able to generate some report information on where our node demand is going. 
This change [0] produces this report [1].

As with optimizing software we want to identify which changes will have the 
biggest impact and to be able to measure whether or not changes have had an 
impact once we have made them. Hopefully this information is a start at doing 
that. Currently we can only look back to the point Zuul was restarted, but we 
have a thirty day log rotation for this service and should be able to look at a 
month's worth of data going forward.

Looking at the data you might notice that Tripleo is using many more node 
resources than our other projects. They are aware of this and have a plan [2] 
to reduce their resource consumption. We'll likely be using this report 
generator to check progress of this plan over time.


I know at one point we had discussed reducing the concurrency of the
tripleo gate to help with this. Since tripleo is still using >50% of the
resources it seems like maybe we should revisit that, at least for the
short-term until the more major changes can be made? Looking through the
merge history for tripleo projects I don't see a lot of cases (any, in
fact) where more than a dozen patches made it through anyway*, so I
suspect it wouldn't have a significant impact on gate throughput, but it
would free up quite a few nodes for other uses.



It's the failures in gate and resets.  At this point I think it would
be a good idea to turn down the concurrency of the tripleo queue in
the gate if possible. As of late it's been timeouts but we've been
unable to track down why it's timing out specifically.  I personally
have a feeling it's the container download times since we do not have
a local registry available and are only able to leverage the mirrors
for some levels of caching. Unfortunately we don't get the best
information about this out of docker (or the mirrors) and it's really
hard to determine what exactly makes things run a bit slower.


We actually tried this not too long ago 
https://git.openstack.org/cgit/openstack-infra/project-config/commit/?id=22d98f7aab0fb23849f715a8796384cffa84600b
 but decided to revert it because it didn't decrease the check queue backlog 
significantly. We were still running at several hours behind most of the time.


I'm surprised to hear that. Counting the tripleo jobs in the gate at
positions 11-20 right now, I see around 84 nodes tied up in long-running
jobs and another 32 for shorter unit test jobs. The latter probably
don't have much impact, but the former is a non-trivial amount. It may
not erase the entire 2300+ job queue that we have right now, but it
seems like it should help.



If we want to set up better monitoring and measuring and try it again we can do 
that. But we probably want to measure queue sizes with and without the change 
like that to better understand if it helps.


This seems like good information to start capturing, otherwise we are
kind of just guessing. Is there something in infra already that we could
use or would it need to be new tooling?


Digging around in graphite we currently track mean in pipelines. This is 
probably a reasonable metric to use for this specific case.

Looking at the check queue [3] shows the mean time enqueued in check during the rough 
period window floor was 10 and [4] shows it since then. The 26th and 27th are bigger 
peaks than previously seen (possibly due to losing inap temporarily) but otherwise a 
queue backlog of ~200 minutes was "normal" in both time periods.

[3] 
http://graphite.openstack.org/render/?from=20181015=20181019=scale(stats.timers.zuul.tenant.openstack.pipeline.check.resident_time.mean,%200.166)
[4] 
http://graphite.openstack.org/render/?from=20181019=20181030=scale(stats.timers.zuul.tenant.openstack.pipeline.check.resident_time.mean,%200.166)

You should be able to change check to eg gate or other queue names and poke 
around more if you like. Note the scale factor scales from milliseconds to 
minutes.

Clark



Cool, thanks. Seems like things have been better for the past couple of 
days, but I'll keep this in my back pocket for 

Re: [openstack-dev] [tripleo] gate issues please do not approve/recheck

2018-11-01 Thread Alex Schultz
Ok, since the podman revert patch has been successfully merged and
we've landed most of the non-voting scenario patches, it should be OK
to restore/recheck.  It would be a good idea to prioritize things to
land and, if it's not critical, let's hold off on approving until we're
sure the gate is much better.

Thanks,
-Alex

On Wed, Oct 31, 2018 at 9:39 AM Alex Schultz  wrote:
>
> Hey folks,
>
> So we have identified an issue that has been causing a bunch of
> failures and proposed a revert of our podman testing[0].  We have
> cleared the gate and are asking that you not approve or recheck any
> patches at this time.  We will let you know when it is safe to start
> approving things.
>
> Thanks,
> -Alex
>
> [0] https://review.openstack.org/#/c/614537/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] reducing our upstream CI footprint

2018-11-01 Thread Derek Higgins
On Wed, 31 Oct 2018 at 17:22, Alex Schultz  wrote:
>
> Hey everyone,
>
> Based on previous emails around this[0][1], I have proposed a possible
> reduction in our usage by switching the scenario001--011 jobs to
> non-voting and removing them from the gate[2]. This will reduce the
> likelihood of causing gate resets and hopefully allow us to land
> corrective patches sooner.  In terms of risks, there is a risk that we
> might introduce breaking changes in the scenarios because they are
> officially non-voting, and we will still be gating promotions on these
> scenarios.  This means that if they are broken, they will need the
> same attention and care to fix them so we should be vigilant when the
> jobs are failing.
>
> The hope is that we can switch these scenarios out with voting
> standalone versions in the next few weeks, but until then I think we
> should proceed by removing them from the gate.  I know this is less
> than ideal but as most failures with these jobs in the gate are either
> timeouts or unrelated to the changes (or gate queue), they are more of
> hindrance than a help at this point.
>
> Thanks,
> -Alex

While on the topic of reducing the CI footprint

something worth considering when pushing up a string of patches would
be to remove a bunch of the check jobs at the start of the patch set.

e.g. If I'm working on t-h-t and have a series of 10 patches, while
looking for feedback I could remove most of the jobs from
zuul.d/layout.yaml in patch 1 so all 10 patches don't run the entire
suite of CI jobs. Once it becomes clear that the patch set is nearly
ready to merge, I change patch 1 to leave zuul.d/layout.yaml as it is.

I'm not suggesting everybody does this but anybody who tends to push
up multiple patch sets together could consider it to not tie up
resources for hours.
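
To make that concrete, a sketch of what such a temporary "patch 1" change
to zuul.d/layout.yaml might look like (the job names are only placeholders,
and the trimming would of course be dropped before merging):

  # WIP only: keep a couple of fast jobs while the series gathers feedback
  - project:
      check:
        jobs:
          - openstack-tox-pep8
          - openstack-tox-py27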

>
> [0] 
> http://lists.openstack.org/pipermail/openstack-dev/2018-October/136141.html
> [1] 
> http://lists.openstack.org/pipermail/openstack-dev/2018-October/135396.html
> [2] 
> https://review.openstack.org/#/q/topic:reduce-tripleo-usage+(status:open+OR+status:merged)
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] reducing our upstream CI footprint

2018-10-31 Thread Ben Nemec



On 10/31/18 4:59 PM, Harald Jensås wrote:

On Wed, 2018-10-31 at 11:39 -0600, Wesley Hayutin wrote:



On Wed, Oct 31, 2018 at 11:21 AM Alex Schultz 
wrote:

Hey everyone,

Based on previous emails around this[0][1], I have proposed a
possible
reduction in our usage by switching the scenario001--011 jobs to
non-voting and removing them from the gate[2]. This will reduce the
likelihood of causing gate resets and hopefully allow us to land
corrective patches sooner.  In terms of risks, there is a risk that
we
might introduce breaking changes in the scenarios because they are
officially non-voting, and we will still be gating promotions on
these
scenarios.  This means that if they are broken, they will need the
same attention and care to fix them so we should be vigilant when
the
jobs are failing.

The hope is that we can switch these scenarios out with voting
standalone versions in the next few weeks, but until then I think
we
should proceed by removing them from the gate.  I know this is less
than ideal but as most failures with these jobs in the gate are
either
timeouts or unrelated to the changes (or gate queue), they are more
of
hindrance than a help at this point.

Thanks,
-Alex


I think I also have to agree.
Having to deploy with containers, update containers and run with two
nodes is no longer a very viable option upstream.  It's not
impossible but it should be the exception and not the rule for all
our jobs.


afaict in my local environment, the container prep stuff takes ages
when adding the playbooks to update them with yum. We will still have
to do this for every standalone job right?



Also, I enabled profiling for ansible tasks on the undercloud and
noticed that the UndercloudPostDeploy was high on the list, actually
the longest running task when re-running the undercloud install ...

Moving from shell script using the openstack cli to python reduced the time
for this task dramatically in my environment, see:
https://review.openstack.org/614540. 6 and a half minutes reduced to 40
seconds ...


Everything old is new again: 
https://github.com/openstack/instack-undercloud/commit/0eb1b59926c7dc46e321c56db29af95b3d755f34#diff-5602f1b710e86ca1eb7334cb0632f9ee


:-)




How much time would we save in the gates if we converted some of the
shell scripting to python? Or, if we want to stay in shell script, we can
use the interactive shell or the client-as-a-service[2].

Interactive shell:
time openstack <<-EOC
server list
workflow list
workflow execution list
EOC

real    0m2.852s

time (openstack server list; \
   openstack workflow list; \
   openstack workflow execution list)

real    0m7.119s

The difference is significant.

We could cache a token[1] and specify the endpoint on each command,
but doing so is still far less effective than using the interactive shell.


There is an old thread[2] on the mailing list, which contains a
server/client solution. If we ran this service on CI nodes and dropped in
the replacement openstack command at /usr/local/bin/openstack we would
use ~1/5 of the time for each command.

(undercloud) [stack@leafs ~]$ time (/usr/bin/openstack network list -f
value -c ID; /usr/bin/openstack network segment list -f value -c ID;
/usr/bin/openstack subnet list -f value -c ID)


real    0m6.443s
user    0m2.171s
sys     0m0.366s

(undercloud) [stack@leafs ~]$ time (/usr/local/bin/openstack network
list -f value -c ID; /usr/local/bin/openstack network segment list -f
value -c ID; /usr/local/bin/openstack subnet list -f value -c ID)

real    0m1.698s
user    0m0.042s
sys     0m0.018s



I realize this is a kind of hacky approach, but it does seem to work and
it should be fairly quick to get in there. (With the undercloud post
script I see 6 minutes returned; what can we get back in CI, 10-15 minutes?)
Then we could look at moving these scripts to python or use the ansible
openstack modules, which hopefully don't share the same loading issues
as the python clients.


I'm personally a fan of using Python as then it is unit-testable, but 
I'm not sure how that works with the tht-based code so maybe it's not a 
factor.






[1] https://wiki.openstack.org/wiki/OpenStackClient/Authentication
[2]
http://lists.openstack.org/pipermail/openstack-dev/2016-April/092546.html



Thanks Alex

  

[0]
http://lists.openstack.org/pipermail/openstack-dev/2018-October/136141.html
[1]
http://lists.openstack.org/pipermail/openstack-dev/2018-October/135396.html
[2]
https://review.openstack.org/#/q/topic:reduce-tripleo-usage+(status:open+OR+status:merged
)

___
___
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsu
bscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


--
Wes Hayutin
Associate Manager, Red Hat
whayu...@redhat.com | T: +19194232509 | IRC: weshay
View my calendar and check my availability for meetings HERE

[openstack-dev] [tripleo] shutting down 3rd party TripleO CI for measurements

2018-10-31 Thread Wesley Hayutin
Greetings,

The TripleO-CI team would like to consider shutting down all the third
party check jobs running against TripleO projects in order to measure
results with and without load on the cloud for some amount of time.  I
suspect we would want to shut things down for roughly 24-48 hours.

If there are any strong objections please let us know.
Thank you
-- 
Wes Hayutin
Associate Manager, Red Hat
whayu...@redhat.com | T: +19194232509 | IRC: weshay
View my calendar and check my availability for meetings HERE

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] reducing our upstream CI footprint

2018-10-31 Thread Harald Jensås
On Wed, 2018-10-31 at 11:39 -0600, Wesley Hayutin wrote:
> 
> 
> On Wed, Oct 31, 2018 at 11:21 AM Alex Schultz 
> wrote:
> > Hey everyone,
> > 
> > Based on previous emails around this[0][1], I have proposed a
> > possible
> > reduction in our usage by switching the scenario001--011 jobs to
> > non-voting and removing them from the gate[2]. This will reduce the
> > likelihood of causing gate resets and hopefully allow us to land
> > corrective patches sooner.  In terms of risks, there is a risk that
> > we
> > might introduce breaking changes in the scenarios because they are
> > officially non-voting, and we will still be gating promotions on
> > these
> > scenarios.  This means that if they are broken, they will need the
> > same attention and care to fix them so we should be vigilant when
> > the
> > jobs are failing.
> > 
> > The hope is that we can switch these scenarios out with voting
> > standalone versions in the next few weeks, but until then I think
> > we
> > should proceed by removing them from the gate.  I know this is less
> > than ideal but as most failures with these jobs in the gate are
> > either
> > timeouts or unrelated to the changes (or gate queue), they are more
> > of
> > hindrance than a help at this point.
> > 
> > Thanks,
> > -Alex
> 
> I think I also have to agree.
> Having to deploy with containers, update containers and run with two
> nodes is no longer a very viable option upstream.  It's not
> impossible but it should be the exception and not the rule for all
> our jobs.
> 
afaict in my local environment, the container prep stuff takes ages
when adding the playbooks to update them with yum. We will still have
to do this for every standalone job right?



Also, I enabled profiling for ansible tasks on the undercloud and
noticed that the UndercloudPostDeploy was high on the list, actually
the longest running task when re-running the undercloud install ...

Moving from shell script using the openstack cli to python reduced the time
for this task dramatically in my environment, see: 
https://review.openstack.org/614540. 6 and a half minutes reduced to 40
seconds ...


How much time would we save in the gates if we converted some of the
shell scripting to python? Or, if we want to stay in shell script, we can
use the interactive shell or the client-as-a-service[2].

Interactive shell:
time openstack <<-EOC
server list
workflow list
workflow execution list
EOC

real    0m2.852s

time (openstack server list; \
  openstack workflow list; \
  openstack workflow execution list)

real    0m7.119s

The difference is significant.

We could cache a token[1] and specify the endpoint on each command,
but doing so is still far less effective than using the interactive shell.


There is an old thread[2] on the mailing list, which contains a
server/client solution. If we ran this service on CI nodes and dropped in
the replacement openstack command at /usr/local/bin/openstack we would
use ~1/5 of the time for each command.

(undercloud) [stack@leafs ~]$ time (/usr/bin/openstack network list -f
value -c ID; /usr/bin/openstack network segment list -f value -c ID;
/usr/bin/openstack subnet list -f value -c ID)


real    0m6.443s
user    0m2.171s
sys     0m0.366s

(undercloud) [stack@leafs ~]$ time (/usr/local/bin/openstack network
list -f value -c ID; /usr/local/bin/openstack network segment list -f
value -c ID; /usr/local/bin/openstack subnet list -f value -c ID)

real    0m1.698s
user    0m0.042s
sys     0m0.018s



I realize this is a kind of hacky approach, but it does seem to work and
it should be fairly quick to get in there. (With the undercloud post
script I see 6 minutes returned; what can we get back in CI, 10-15 minutes?)
Then we could look at moving these scripts to python or use the ansible
openstack modules, which hopefully don't share the same loading issues
as the python clients.
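
As a rough sketch of that last idea (the module and cloud names below are
just examples, and whether it actually avoids the client loading cost would
need to be measured), an Ansible play using the os_* modules could look like:

  # requires shade/openstacksdk on the host running the play
  - hosts: localhost
    gather_facts: false
    tasks:
      - name: list networks via the openstack modules instead of the CLI
        os_networks_facts:
          cloud: undercloud
      - name: list subnets the same way
        os_subnets_facts:
          cloud: undercloud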



[1] https://wiki.openstack.org/wiki/OpenStackClient/Authentication
[2] 
http://lists.openstack.org/pipermail/openstack-dev/2016-April/092546.html


> Thanks Alex
> 
>  
> > [0] 
> > http://lists.openstack.org/pipermail/openstack-dev/2018-October/136141.html
> > [1] 
> > http://lists.openstack.org/pipermail/openstack-dev/2018-October/135396.html
> > [2] 
> > https://review.openstack.org/#/q/topic:reduce-tripleo-usage+(status:open+OR+status:merged
> > )
> > 
> > ___
> > ___
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsu
> > bscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> -- 
> Wes Hayutin
> Associate Manager, Red Hat
> whayu...@redhat.com | T: +19194232509 | IRC: weshay
> View my calendar and check my availability for meetings HERE
> _
> _
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: 

Re: [openstack-dev] [tripleo] Zuul Queue backlogs and resource usage

2018-10-31 Thread Harald Jensås
On Wed, 2018-10-31 at 11:35 -0600, Alex Schultz wrote:
> On Wed, Oct 31, 2018 at 11:16 AM Harald Jensås 
> wrote:
> > 
> > On Tue, 2018-10-30 at 15:00 -0600, Alex Schultz wrote:
> > > On Tue, Oct 30, 2018 at 12:25 PM Clark Boylan <
> > > cboy...@sapwetik.org>
> > > wrote:
> > > > 
> > > > On Tue, Oct 30, 2018, at 10:42 AM, Alex Schultz wrote:
> > > > > On Tue, Oct 30, 2018 at 11:36 AM Ben Nemec <
> > > > > openst...@nemebean.com> wrote:
> > > > > > 
> > > > > > Tagging with tripleo since my suggestion below is specific
> > > > > > to
> > > > > > that project.
> > > > > > 
> > > > > > On 10/30/18 11:03 AM, Clark Boylan wrote:
> > > > > > > Hello everyone,
> > > > > > > 
> > > > > > > A little while back I sent email explaining how the gate
> > > > > > > queues work and how fixing bugs helps us test and merge
> > > > > > > more
> > > > > > > code. All of this is still true and we should keep
> > > > > > > pushing to improve our testing to avoid gate resets.
> > > > > > > 
> > > > > > > Last week we migrated Zuul and Nodepool to a new
> > > > > > > Zookeeper
> > > > > > > cluster. In the process of doing this we had to restart
> > > > > > > Zuul
> > > > > > > which brought in a new logging feature that exposes node
> > > > > > > resource usage by jobs. Using this data I've been able to
> > > > > > > generate some report information on where our node demand
> > > > > > > is
> > > > > > > going. This change [0] produces this report [1].
> > > > > > > 
> > > > > > > As with optimizing software we want to identify which
> > > > > > > changes
> > > > > > > will have the biggest impact and to be able to measure
> > > > > > > whether or not changes have had an impact once we have
> > > > > > > made
> > > > > > > them. Hopefully this information is a start at doing
> > > > > > > that.
> > > > > > > Currently we can only look back to the point Zuul was
> > > > > > > restarted, but we have a thirty day log rotation for this
> > > > > > > service and should be able to look at a month's worth of
> > > > > > > data
> > > > > > > going forward.
> > > > > > > 
> > > > > > > Looking at the data you might notice that Tripleo is
> > > > > > > using
> > > > > > > many more node resources than our other projects. They
> > > > > > > are
> > > > > > > aware of this and have a plan [2] to reduce their
> > > > > > > resource
> > > > > > > consumption. We'll likely be using this report generator
> > > > > > > to
> > > > > > > check progress of this plan over time.
> > > > > > 
> > > > > > I know at one point we had discussed reducing the
> > > > > > concurrency
> > > > > > of the
> > > > > > tripleo gate to help with this. Since tripleo is still
> > > > > > using >50% of the
> > > > > > resources it seems like maybe we should revisit that, at
> > > > > > least
> > > > > > for the
> > > > > > short-term until the more major changes can be made?
> > > > > > Looking
> > > > > > through the
> > > > > > merge history for tripleo projects I don't see a lot of
> > > > > > cases
> > > > > > (any, in
> > > > > > fact) where more than a dozen patches made it through
> > > > > > anyway*,
> > > > > > so I
> > > > > > suspect it wouldn't have a significant impact on gate
> > > > > > throughput, but it
> > > > > > would free up quite a few nodes for other uses.
> > > > > > 
> > > > > 
> > > > > It's the failures in gate and resets.  At this point I think
> > > > > it
> > > > > would
> > > > > be a good idea to turn down the concurrency of the tripleo
> > > > > queue
> > > > > in
> > > > > the gate if possible. As of late it's been timeouts but we've
> > > > > been
> > > > > unable to track down why it's timing out specifically.  I
> > > > > personally
> > > > > have a feeling it's the container download times since we do
> > > > > not
> > > > > have
> > > > > a local registry available and are only able to leverage the
> > > > > mirrors
> > > > > for some levels of caching. Unfortunately we don't get the
> > > > > best
> > > > > information about this out of docker (or the mirrors) and
> > > > > it's
> > > > > really
> > > > > hard to determine what exactly makes things run a bit slower.
> > > > 
> > > > We actually tried this not too long ago
> > > > 
> > > > https://git.openstack.org/cgit/openstack-infra/project-config/commit/?id=22d98f7aab0fb23849f715a8796384cffa84600b
> > > >  but decided to revert it because it didn't decrease the check
> > > > queue backlog significantly. We were still running at several
> > > > hours
> > > > behind most of the time.
> > > > 
> > > > If we want to set up better monitoring and measuring and try it
> > > > again we can do that. But we probably want to measure queue
> > > > sizes
> > > > with and without the change like that to better understand if
> > > > it
> > > > helps.
> > > > 
> > > > As for container image download times can we quantify that via
> > > > docker logs? Basically sum up the amount of time spent by a job
> > > > downloading images so that we can see what the impact is 

Re: [openstack-dev] [tripleo] reducing our upstream CI footprint

2018-10-31 Thread Doug Hellmann
Alex Schultz  writes:

> Hey everyone,
>
> Based on previous emails around this[0][1], I have proposed a possible
> reduction in our usage by switching the scenario001--011 jobs to
> non-voting and removing them from the gate[2]. This will reduce the
> likelihood of causing gate resets and hopefully allow us to land
> corrective patches sooner.  In terms of risks, there is a risk that we
> might introduce breaking changes in the scenarios because they are
> officially non-voting, and we will still be gating promotions on these
> scenarios.  This means that if they are broken, they will need the
> same attention and care to fix them so we should be vigilant when the
> jobs are failing.
>
> The hope is that we can switch these scenarios out with voting
> standalone versions in the next few weeks, but until then I think we
> should proceed by removing them from the gate.  I know this is less
> than ideal but as most failures with these jobs in the gate are either
> timeouts or unrelated to the changes (or gate queue), they are more of
> hindrance than a help at this point.
>
> Thanks,
> -Alex
>
> [0] 
> http://lists.openstack.org/pipermail/openstack-dev/2018-October/136141.html
> [1] 
> http://lists.openstack.org/pipermail/openstack-dev/2018-October/135396.html
> [2] 
> https://review.openstack.org/#/q/topic:reduce-tripleo-usage+(status:open+OR+status:merged)
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

This makes a lot of sense as a temporary measure. Thanks for continuing
to drive these changes!

Doug


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] reducing our upstream CI footprint

2018-10-31 Thread Wesley Hayutin
On Wed, Oct 31, 2018 at 11:21 AM Alex Schultz  wrote:

> Hey everyone,
>
> Based on previous emails around this[0][1], I have proposed a possible
> reduction in our usage by switching the scenario001--011 jobs to
> non-voting and removing them from the gate[2]. This will reduce the
> likelihood of causing gate resets and hopefully allow us to land
> corrective patches sooner.  In terms of risks, there is a risk that we
> might introduce breaking changes in the scenarios because they are
> officially non-voting, and we will still be gating promotions on these
> scenarios.  This means that if they are broken, they will need the
> same attention and care to fix them so we should be vigilant when the
> jobs are failing.
>
> The hope is that we can switch these scenarios out with voting
> standalone versions in the next few weeks, but until then I think we
> should proceed by removing them from the gate.  I know this is less
> than ideal but as most failures with these jobs in the gate are either
> timeouts or unrelated to the changes (or gate queue), they are more of
> hindrance than a help at this point.
>
> Thanks,
> -Alex
>

I think I also have to agree.
Having to deploy with containers, update containers and run with two nodes
is no longer a very viable option upstream.  It's not impossible but it
should be the exception and not the rule for all our jobs.

Thanks Alex



>
> [0]
> http://lists.openstack.org/pipermail/openstack-dev/2018-October/136141.html
> [1]
> http://lists.openstack.org/pipermail/openstack-dev/2018-October/135396.html
> [2]
> https://review.openstack.org/#/q/topic:reduce-tripleo-usage+(status:open+OR+status:merged)
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
-- 
Wes Hayutin
Associate Manager, Red Hat
whayu...@redhat.com | T: +19194232509 | IRC: weshay
View my calendar and check my availability for meetings HERE

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Zuul Queue backlogs and resource usage

2018-10-31 Thread Alex Schultz
On Wed, Oct 31, 2018 at 11:16 AM Harald Jensås  wrote:
>
> On Tue, 2018-10-30 at 15:00 -0600, Alex Schultz wrote:
> > On Tue, Oct 30, 2018 at 12:25 PM Clark Boylan 
> > wrote:
> > >
> > > On Tue, Oct 30, 2018, at 10:42 AM, Alex Schultz wrote:
> > > > On Tue, Oct 30, 2018 at 11:36 AM Ben Nemec <
> > > > openst...@nemebean.com> wrote:
> > > > >
> > > > > Tagging with tripleo since my suggestion below is specific to
> > > > > that project.
> > > > >
> > > > > On 10/30/18 11:03 AM, Clark Boylan wrote:
> > > > > > Hello everyone,
> > > > > >
> > > > > > A little while back I sent email explaining how the gate
> > > > > > queues work and how fixing bugs helps us test and merge more
> > > > > > code. All of this is still true and we should keep
> > > > > > pushing to improve our testing to avoid gate resets.
> > > > > >
> > > > > > Last week we migrated Zuul and Nodepool to a new Zookeeper
> > > > > > cluster. In the process of doing this we had to restart Zuul
> > > > > > which brought in a new logging feature that exposes node
> > > > > > resource usage by jobs. Using this data I've been able to
> > > > > > generate some report information on where our node demand is
> > > > > > going. This change [0] produces this report [1].
> > > > > >
> > > > > > As with optimizing software we want to identify which changes
> > > > > > will have the biggest impact and to be able to measure
> > > > > > whether or not changes have had an impact once we have made
> > > > > > them. Hopefully this information is a start at doing that.
> > > > > > Currently we can only look back to the point Zuul was
> > > > > > restarted, but we have a thirty day log rotation for this
> > > > > > service and should be able to look at a month's worth of data
> > > > > > going forward.
> > > > > >
> > > > > > Looking at the data you might notice that Tripleo is using
> > > > > > many more node resources than our other projects. They are
> > > > > > aware of this and have a plan [2] to reduce their resource
> > > > > > consumption. We'll likely be using this report generator to
> > > > > > check progress of this plan over time.
> > > > >
> > > > > I know at one point we had discussed reducing the concurrency
> > > > > of the
> > > > > tripleo gate to help with this. Since tripleo is still using
> > > > > >50% of the
> > > > > resources it seems like maybe we should revisit that, at least
> > > > > for the
> > > > > short-term until the more major changes can be made? Looking
> > > > > through the
> > > > > merge history for tripleo projects I don't see a lot of cases
> > > > > (any, in
> > > > > fact) where more than a dozen patches made it through anyway*,
> > > > > so I
> > > > > suspect it wouldn't have a significant impact on gate
> > > > > throughput, but it
> > > > > would free up quite a few nodes for other uses.
> > > > >
> > > >
> > > > It's the failures in gate and resets.  At this point I think it
> > > > would
> > > > be a good idea to turn down the concurrency of the tripleo queue
> > > > in
> > > > the gate if possible. As of late it's been timeouts but we've
> > > > been
> > > > unable to track down why it's timing out specifically.  I
> > > > personally
> > > > have a feeling it's the container download times since we do not
> > > > have
> > > > a local registry available and are only able to leverage the
> > > > mirrors
> > > > for some levels of caching. Unfortunately we don't get the best
> > > > information about this out of docker (or the mirrors) and it's
> > > > really
> > > > hard to determine what exactly makes things run a bit slower.
> > >
> > > We actually tried this not too long ago
> > > https://git.openstack.org/cgit/openstack-infra/project-config/commit/?id=22d98f7aab0fb23849f715a8796384cffa84600b
> > >  but decided to revert it because it didn't decrease the check
> > > queue backlog significantly. We were still running at several hours
> > > behind most of the time.
> > >
> > > If we want to set up better monitoring and measuring and try it
> > > again we can do that. But we probably want to measure queue sizes
> > > with and without the change like that to better understand if it
> > > helps.
> > >
> > > As for container image download times can we quantify that via
> > > docker logs? Basically sum up the amount of time spent by a job
> > > downloading images so that we can see what the impact is but also
> > > measure if changes improve that? As for other ideas improving
> > > things seems like many of the images that tripleo use are quite
> > > large. I recall seeing a > 600MB image just for rsyslog. Wouldn't
> > > it be advantageous for both the gate and tripleo in the real world
> > > to trim the size of those images (which should improve download
> > > times). In any case quantifying the size of the downloads and
> > > trimming those if possible is likely also worthwhile.
> > >
> >
> > So it's not that simple as we don't just download all the images in a
> > distinct task and there isn't any 

[openstack-dev] [tripleo] reducing our upstream CI footprint

2018-10-31 Thread Alex Schultz
Hey everyone,

Based on previous emails around this[0][1], I have proposed a possible
reduction in our usage by switching the scenario001-011 jobs to
non-voting and removing them from the gate[2]. This will reduce the
likelihood of causing gate resets and hopefully allow us to land
corrective patches sooner.  In terms of risks, we might introduce
breaking changes in the scenarios once they are officially non-voting,
and we will still be gating promotions on these scenarios.  This means
that if they break, they will need the same attention and care to fix,
so we should be vigilant when the jobs are failing.

The hope is that we can switch these scenarios out for voting
standalone versions in the next few weeks, but until then I think we
should proceed by removing them from the gate.  I know this is less
than ideal, but as most failures with these jobs in the gate are either
timeouts or unrelated to the changes (or gate queue), they are more of
a hindrance than a help at this point.

Thanks,
-Alex

[0] http://lists.openstack.org/pipermail/openstack-dev/2018-October/136141.html
[1] http://lists.openstack.org/pipermail/openstack-dev/2018-October/135396.html
[2] 
https://review.openstack.org/#/q/topic:reduce-tripleo-usage+(status:open+OR+status:merged)

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Zuul Queue backlogs and resource usage

2018-10-31 Thread Harald Jensås
On Tue, 2018-10-30 at 15:00 -0600, Alex Schultz wrote:
> On Tue, Oct 30, 2018 at 12:25 PM Clark Boylan 
> wrote:
> > 
> > On Tue, Oct 30, 2018, at 10:42 AM, Alex Schultz wrote:
> > > On Tue, Oct 30, 2018 at 11:36 AM Ben Nemec <
> > > openst...@nemebean.com> wrote:
> > > > 
> > > > Tagging with tripleo since my suggestion below is specific to
> > > > that project.
> > > > 
> > > > On 10/30/18 11:03 AM, Clark Boylan wrote:
> > > > > Hello everyone,
> > > > > 
> > > > > A little while back I sent email explaining how the gate
> > > > > queues work and how fixing bugs helps us test and merge more
> > > > > code. All of this still is still true and we should keep
> > > > > pushing to improve our testing to avoid gate resets.
> > > > > 
> > > > > Last week we migrated Zuul and Nodepool to a new Zookeeper
> > > > > cluster. In the process of doing this we had to restart Zuul
> > > > > which brought in a new logging feature that exposes node
> > > > > resource usage by jobs. Using this data I've been able to
> > > > > generate some report information on where our node demand is
> > > > > going. This change [0] produces this report [1].
> > > > > 
> > > > > As with optimizing software we want to identify which changes
> > > > > will have the biggest impact and to be able to measure
> > > > > whether or not changes have had an impact once we have made
> > > > > them. Hopefully this information is a start at doing that.
> > > > > Currently we can only look back to the point Zuul was
> > > > > restarted, but we have a thirty day log rotation for this
> > > > > service and should be able to look at a month's worth of data
> > > > > going forward.
> > > > > 
> > > > > Looking at the data you might notice that Tripleo is using
> > > > > many more node resources than our other projects. They are
> > > > > aware of this and have a plan [2] to reduce their resource
> > > > > consumption. We'll likely be using this report generator to
> > > > > check progress of this plan over time.
> > > > 
> > > > I know at one point we had discussed reducing the concurrency
> > > > of the
> > > > tripleo gate to help with this. Since tripleo is still using
> > > > >50% of the
> > > > resources it seems like maybe we should revisit that, at least
> > > > for the
> > > > short-term until the more major changes can be made? Looking
> > > > through the
> > > > merge history for tripleo projects I don't see a lot of cases
> > > > (any, in
> > > > fact) where more than a dozen patches made it through anyway*,
> > > > so I
> > > > suspect it wouldn't have a significant impact on gate
> > > > throughput, but it
> > > > would free up quite a few nodes for other uses.
> > > > 
> > > 
> > > It's the failures in gate and resets.  At this point I think it
> > > would
> > > be a good idea to turn down the concurrency of the tripleo queue
> > > in
> > > the gate if possible. As of late it's been timeouts but we've
> > > been
> > > unable to track down why it's timing out specifically.  I
> > > personally
> > > have a feeling it's the container download times since we do not
> > > have
> > > a local registry available and are only able to leverage the
> > > mirrors
> > > for some levels of caching. Unfortunately we don't get the best
> > > information about this out of docker (or the mirrors) and it's
> > > really
> > > hard to determine what exactly makes things run a bit slower.
> > 
> > We actually tried this not too long ago 
> > https://git.openstack.org/cgit/openstack-infra/project-config/commit/?id=22d98f7aab0fb23849f715a8796384cffa84600b
> >  but decided to revert it because it didn't decrease the check
> > queue backlog significantly. We were still running at several hours
> > behind most of the time.
> > 
> > If we want to set up better monitoring and measuring and try it
> > again we can do that. But we probably want to measure queue sizes
> > with and without the change like that to better understand if it
> > helps.
> > 
> > As for container image download times can we quantify that via
> > docker logs? Basically sum up the amount of time spent by a job
> > downloading images so that we can see what the impact is but also
> > measure if changes improve that? As for other ideas improving
> > things seems like many of the images that tripleo use are quite
> > large. I recall seeing a > 600MB image just for rsyslog. Wouldn't
> > it be advantageous for both the gate and tripleo in the real world
> > to trim the size of those images (which should improve download
> > times). In any case quantifying the size of the downloads and
> > trimming those if possible is likely also worthwhile.
> > 
> 
> So it's not that simple as we don't just download all the images in a
> distinct task and there isn't any information provided around
> size/speed AFAIK.  Additionally we aren't doing anything special with
> the images (it's mostly kolla built containers with a handful of
> tweaks) so that's just the size of the containers.  I am currently
> working on 

Re: [openstack-dev] [tripleo] request for feedback/review on docker2podman upgrade

2018-10-31 Thread Sofer Athlan-Guyot
Emilien Macchi  writes:

> A bit of an update here:
>
> - We merged the patch in openstack/paunch that stop the Docker container if
> we try to start a Podman container.
> - We switched the undercloud upgrade job to test upgrades from Docker to
> Podman (for now containers are stopped in Docker and then started in
> Podman).
> - We are now looking how and where to remove the Docker containers once the
> upgrade finished. For that work, I started with the Undercloud and patched
> tripleoclient to run the post_upgrade_tasks which to me is a good candidate
> to run docker rm.

+1

>
> Please look:
> - tripleoclient / run post_upgrade_tasks when upgrading
> standalone/undercloud: https://review.openstack.org/614349
> - THT: prototype on how we would remove the Docker containers:
> https://review.openstack.org/611092
>

reviewed.

> Note: for now we assume that Docker is still available on the host after
> the upgrade as we are testing things under centos7. I'm aware that this
> assumption can change in the future but we'll probably re-iterate.
>
> What I need from the upgrade team is feedback on this workflow, and see if
> we can re-use these bits originally tested on Undercloud / Standalone, for
> the Overcloud as well.
>

So that workflow won't break anything for the overcloud.  For an
in-place upgrade we need that cleanup anyway, given how paunch detects
the need for an upgrade. For other upgrade scenarios this won't do
anything bad.

So +1 for me.

> Thanks for the feedback,
>
>
> On Fri, Oct 19, 2018 at 8:00 AM Emilien Macchi  wrote:
>
>> On Fri, Oct 19, 2018 at 4:24 AM Giulio Fidente 
>> wrote:
>>
>>> 1) create the podman systemd unit
>>> 2) delete the docker container
>>>
>>
>> We finally went with "stop the docker container"
>>
>> 3) start the podman container
>>>
>>
>> and 4) delete the docker container later in THT upgrade_tasks.
>>
>> And yes +1 to do the same in ceph-ansible if possible.
>> --
>> Emilien Macchi
>>
>
>
> -- 
> Emilien Macchi
-- 
Sofer Athlan-Guyot

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [tripleo] gate issues please do not approve/recheck

2018-10-31 Thread Alex Schultz
Hey folks,

So we have identified an issue that has been causing a bunch of
failures and proposed a revert of our podman testing[0].  We have
cleared the gate and are asking that you not approve or recheck any
patches at this time.  We will let you know when it is safe to start
approving things.

Thanks,
-Alex

[0] https://review.openstack.org/#/c/614537/

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][openstack-ansible][nova][placement] Owners needed for placement extraction upgrade deployment tooling

2018-10-31 Thread Chris Dent

On Wed, 31 Oct 2018, Eduardo Gonzalez wrote:


- Run db syncs as there is no command for that yet in the master branch
- Apply upgrade process for db changes


The placement-side pieces for this are nearly ready, see the stack
beginning at https://review.openstack.org/#/c/611441/

--
Chris Dent   ٩◔̯◔۶   https://anticdent.org/
freenode: cdent tw: @anticdent
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][openstack-ansible][nova][placement] Owners needed for placement extraction upgrade deployment tooling

2018-10-31 Thread Eduardo Gonzalez
Hi, from the kolla side I've started the work.

In kolla images [0], for now only placement is separated into an independent
image; only the source install is covered, while the binary install still
uses nova-placement packages until a binary package exists for the Debian
and CentOS families.

In kolla-ansible [1] the placement service has been moved into a separate
role applied just before nova.

Things missing for now:
- Binary packages from distributions
- Run db syncs, as there is no command for that yet in the master branch
- Apply the upgrade process for db changes


[0] https://review.openstack.org/#/c/613589/
[1] https://review.openstack.org/#/c/613629/

Regards

On Wed, Oct 31, 2018 at 10:19, Lee Yarwood () wrote:

> On 30-10-18 14:29:12, Emilien Macchi wrote:
> > On the TripleO side, it sounds like Lee Yarwood is taking the lead with a
> > first commit in puppet-placement:
> > https://review.openstack.org/#/c/604182/
> >
> > Lee, can you confirm that you and your team are working on it for Stein
> > cycle?
>
> ACK, just getting back online after being out for three weeks but still
> planning on getting everything in place by the original M2 goal we
> agreed to at PTG. I'll try to post more details by the end of the week.
>
> Cheers,
>
> Lee
>
> > On Thu, Oct 25, 2018 at 1:34 PM Matt Riedemann 
> wrote:
> >
> > > Hello OSA/TripleO people,
> > >
> > > A plan/checklist was put in place at the Stein PTG for extracting
> > > placement from nova [1]. The first item in that list is done in grenade
> > > [2], which is the devstack-based upgrade project in the integrated
> gate.
> > > That should serve as a template for the necessary upgrade steps in
> > > deployment projects. The related devstack change for extracted
> placement
> > > on the master branch (Stein) is [3]. Note that change has some
> > > dependencies.
> > >
> > > The second point in the plan from the PTG was getting extracted
> > > placement upgrade tooling support in a deployment project, notably
> > > TripleO (and/or OpenStackAnsible).
> > >
> > > Given the grenade change is done and passing tests, TripleO/OSA should
> > > be able to start coding up and testing an upgrade step when going from
> > > Rocky to Stein. My question is who can we name as an owner in either
> > > project to start this work? Because we really need to be starting this
> > > as soon as possible to flush out any issues before they are too late to
> > > correct in Stein.
> > >
> > > So if we have volunteers or better yet potential patches that I'm just
> > > not aware of, please speak up here so we know who to contact about
> > > status updates and if there are any questions with the upgrade.
> > >
> > > [1]
> http://lists.openstack.org/pipermail/openstack-dev/2018-September/134541.html
> > > [2] https://review.openstack.org/#/c/604454/
> > > [3] https://review.openstack.org/#/c/600162/
>
> --
> Lee Yarwood A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672
> 2D76
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][openstack-ansible][nova][placement] Owners needed for placement extraction upgrade deployment tooling

2018-10-31 Thread Lee Yarwood
On 30-10-18 14:29:12, Emilien Macchi wrote:
> On the TripleO side, it sounds like Lee Yarwood is taking the lead with a
> first commit in puppet-placement:
> https://review.openstack.org/#/c/604182/
> 
> Lee, can you confirm that you and your team are working on it for Stein
> cycle?

ACK, just getting back online after being out for three weeks but still
planning on getting everything in place by the original M2 goal we
agreed to at PTG. I'll try to post more details by the end of the week.

Cheers,

Lee
 
> On Thu, Oct 25, 2018 at 1:34 PM Matt Riedemann  wrote:
> 
> > Hello OSA/TripleO people,
> >
> > A plan/checklist was put in place at the Stein PTG for extracting
> > placement from nova [1]. The first item in that list is done in grenade
> > [2], which is the devstack-based upgrade project in the integrated gate.
> > That should serve as a template for the necessary upgrade steps in
> > deployment projects. The related devstack change for extracted placement
> > on the master branch (Stein) is [3]. Note that change has some
> > dependencies.
> >
> > The second point in the plan from the PTG was getting extracted
> > placement upgrade tooling support in a deployment project, notably
> > TripleO (and/or OpenStackAnsible).
> >
> > Given the grenade change is done and passing tests, TripleO/OSA should
> > be able to start coding up and testing an upgrade step when going from
> > Rocky to Stein. My question is who can we name as an owner in either
> > project to start this work? Because we really need to be starting this
> > as soon as possible to flush out any issues before they are too late to
> > correct in Stein.
> >
> > So if we have volunteers or better yet potential patches that I'm just
> > not aware of, please speak up here so we know who to contact about
> > status updates and if there are any questions with the upgrade.
> >
> > [1] 
> > http://lists.openstack.org/pipermail/openstack-dev/2018-September/134541.html
> > [2] https://review.openstack.org/#/c/604454/
> > [3] https://review.openstack.org/#/c/600162/

-- 
Lee Yarwood A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672 2D76


signature.asc
Description: PGP signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Zuul Queue backlogs and resource usage

2018-10-30 Thread Clark Boylan
On Tue, Oct 30, 2018, at 1:01 PM, Ben Nemec wrote:
> 
> 
> On 10/30/18 1:25 PM, Clark Boylan wrote:
> > On Tue, Oct 30, 2018, at 10:42 AM, Alex Schultz wrote:
> >> On Tue, Oct 30, 2018 at 11:36 AM Ben Nemec  wrote:
> >>>
> >>> Tagging with tripleo since my suggestion below is specific to that 
> >>> project.
> >>>
> >>> On 10/30/18 11:03 AM, Clark Boylan wrote:
>  Hello everyone,
> 
>  A little while back I sent email explaining how the gate queues work and 
>  how fixing bugs helps us test and merge more code. All of this still is 
>  still true and we should keep pushing to improve our testing to avoid 
>  gate resets.
> 
>  Last week we migrated Zuul and Nodepool to a new Zookeeper cluster. In 
>  the process of doing this we had to restart Zuul which brought in a new 
>  logging feature that exposes node resource usage by jobs. Using this 
>  data I've been able to generate some report information on where our 
>  node demand is going. This change [0] produces this report [1].
> 
>  As with optimizing software we want to identify which changes will have 
>  the biggest impact and to be able to measure whether or not changes have 
>  had an impact once we have made them. Hopefully this information is a 
>  start at doing that. Currently we can only look back to the point Zuul 
>  was restarted, but we have a thirty day log rotation for this service 
>  and should be able to look at a month's worth of data going forward.
> 
>  Looking at the data you might notice that Tripleo is using many more 
>  node resources than our other projects. They are aware of this and have 
>  a plan [2] to reduce their resource consumption. We'll likely be using 
>  this report generator to check progress of this plan over time.
> >>>
> >>> I know at one point we had discussed reducing the concurrency of the
> >>> tripleo gate to help with this. Since tripleo is still using >50% of the
> >>> resources it seems like maybe we should revisit that, at least for the
> >>> short-term until the more major changes can be made? Looking through the
> >>> merge history for tripleo projects I don't see a lot of cases (any, in
> >>> fact) where more than a dozen patches made it through anyway*, so I
> >>> suspect it wouldn't have a significant impact on gate throughput, but it
> >>> would free up quite a few nodes for other uses.
> >>>
> >>
> >> It's the failures in gate and resets.  At this point I think it would
> >> be a good idea to turn down the concurrency of the tripleo queue in
> >> the gate if possible. As of late it's been timeouts but we've been
> >> unable to track down why it's timing out specifically.  I personally
> >> have a feeling it's the container download times since we do not have
> >> a local registry available and are only able to leverage the mirrors
> >> for some levels of caching. Unfortunately we don't get the best
> >> information about this out of docker (or the mirrors) and it's really
> >> hard to determine what exactly makes things run a bit slower.
> > 
> > We actually tried this not too long ago 
> > https://git.openstack.org/cgit/openstack-infra/project-config/commit/?id=22d98f7aab0fb23849f715a8796384cffa84600b
> >  but decided to revert it because it didn't decrease the check queue 
> > backlog significantly. We were still running at several hours behind most 
> > of the time.
> 
> I'm surprised to hear that. Counting the tripleo jobs in the gate at 
> positions 11-20 right now, I see around 84 nodes tied up in long-running 
> jobs and another 32 for shorter unit test jobs. The latter probably 
> don't have much impact, but the former is a non-trivial amount. It may 
> not erase the entire 2300+ job queue that we have right now, but it 
> seems like it should help.
> 
> > 
> > If we want to set up better monitoring and measuring and try it again we 
> > can do that. But we probably want to measure queue sizes with and without 
> > the change like that to better understand if it helps.
> 
> This seems like good information to start capturing, otherwise we are 
> kind of just guessing. Is there something in infra already that we could 
> use or would it need to be new tooling?

Digging around in graphite, we currently track the mean resident time per
pipeline. This is probably a reasonable metric to use for this specific case.

Looking at the check queue, [3] shows the mean time enqueued in check during
the rough period when the window floor was 10, and [4] shows it since then.
The 26th and 27th are bigger peaks than previously seen (possibly due to
losing inap temporarily) but otherwise a queue backlog of ~200 minutes was
"normal" in both time periods.

[3] 
http://graphite.openstack.org/render/?from=20181015&until=20181019&target=scale(stats.timers.zuul.tenant.openstack.pipeline.check.resident_time.mean,%200.166)
[4] 
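
If anyone wants to pull these numbers themselves rather than eyeballing the
graphs, the render API also returns JSON. A rough sketch only (the date
ranges are arbitrary and the 0.166 scale factor is just copied from the
target above, so treat both as assumptions):

    import json
    import urllib.request
    from urllib.parse import urlencode

    GRAPHITE = "http://graphite.openstack.org/render/"
    TARGET = ("scale(stats.timers.zuul.tenant.openstack.pipeline."
              "check.resident_time.mean,0.166)")

    def mean_backlog(from_date, until_date):
        # format=json returns a list of series, each with
        # "datapoints": [[value, timestamp], ...]
        query = urlencode({"from": from_date, "until": until_date,
                           "target": TARGET, "format": "json"})
        with urllib.request.urlopen(GRAPHITE + "?" + query) as resp:
            series = json.load(resp)
        values = [v for v, _ts in series[0]["datapoints"] if v is not None]
        return sum(values) / len(values) if values else None

    # compare the period with the lower gate window against the period after
    print(mean_backlog("20181015", "20181019"))
    print(mean_backlog("20181020", "20181030"))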

Re: [openstack-dev] [tripleo] Zuul Queue backlogs and resource usage

2018-10-30 Thread Alex Schultz
On Tue, Oct 30, 2018 at 12:25 PM Clark Boylan  wrote:
>
> On Tue, Oct 30, 2018, at 10:42 AM, Alex Schultz wrote:
> > On Tue, Oct 30, 2018 at 11:36 AM Ben Nemec  wrote:
> > >
> > > Tagging with tripleo since my suggestion below is specific to that 
> > > project.
> > >
> > > On 10/30/18 11:03 AM, Clark Boylan wrote:
> > > > Hello everyone,
> > > >
> > > > A little while back I sent email explaining how the gate queues work 
> > > > and how fixing bugs helps us test and merge more code. All of this 
> > > > still is still true and we should keep pushing to improve our testing 
> > > > to avoid gate resets.
> > > >
> > > > Last week we migrated Zuul and Nodepool to a new Zookeeper cluster. In 
> > > > the process of doing this we had to restart Zuul which brought in a new 
> > > > logging feature that exposes node resource usage by jobs. Using this 
> > > > data I've been able to generate some report information on where our 
> > > > node demand is going. This change [0] produces this report [1].
> > > >
> > > > As with optimizing software we want to identify which changes will have 
> > > > the biggest impact and to be able to measure whether or not changes 
> > > > have had an impact once we have made them. Hopefully this information 
> > > > is a start at doing that. Currently we can only look back to the point 
> > > > Zuul was restarted, but we have a thirty day log rotation for this 
> > > > service and should be able to look at a month's worth of data going 
> > > > forward.
> > > >
> > > > Looking at the data you might notice that Tripleo is using many more 
> > > > node resources than our other projects. They are aware of this and have 
> > > > a plan [2] to reduce their resource consumption. We'll likely be using 
> > > > this report generator to check progress of this plan over time.
> > >
> > > I know at one point we had discussed reducing the concurrency of the
> > > tripleo gate to help with this. Since tripleo is still using >50% of the
> > > resources it seems like maybe we should revisit that, at least for the
> > > short-term until the more major changes can be made? Looking through the
> > > merge history for tripleo projects I don't see a lot of cases (any, in
> > > fact) where more than a dozen patches made it through anyway*, so I
> > > suspect it wouldn't have a significant impact on gate throughput, but it
> > > would free up quite a few nodes for other uses.
> > >
> >
> > It's the failures in gate and resets.  At this point I think it would
> > be a good idea to turn down the concurrency of the tripleo queue in
> > the gate if possible. As of late it's been timeouts but we've been
> > unable to track down why it's timing out specifically.  I personally
> > have a feeling it's the container download times since we do not have
> > a local registry available and are only able to leverage the mirrors
> > for some levels of caching. Unfortunately we don't get the best
> > information about this out of docker (or the mirrors) and it's really
> > hard to determine what exactly makes things run a bit slower.
>
> We actually tried this not too long ago 
> https://git.openstack.org/cgit/openstack-infra/project-config/commit/?id=22d98f7aab0fb23849f715a8796384cffa84600b
>  but decided to revert it because it didn't decrease the check queue backlog 
> significantly. We were still running at several hours behind most of the time.
>
> If we want to set up better monitoring and measuring and try it again we can 
> do that. But we probably want to measure queue sizes with and without the 
> change like that to better understand if it helps.
>
> As for container image download times can we quantify that via docker logs? 
> Basically sum up the amount of time spent by a job downloading images so that 
> we can see what the impact is but also measure if changes improve that? As 
> for other ideas improving things seems like many of the images that tripleo 
> use are quite large. I recall seeing a > 600MB image just for rsyslog. 
> Wouldn't it be advantageous for both the gate and tripleo in the real world 
> to trim the size of those images (which should improve download times). In 
> any case quantifying the size of the downloads and trimming those if possible 
> is likely also worthwhile.
>

So it's not that simple, as we don't just download all the images in a
distinct task, and there isn't any information provided around
size/speed AFAIK.  Additionally we aren't doing anything special with
the images (it's mostly kolla-built containers with a handful of
tweaks), so that's just the size of the containers.  I am currently
working on reducing any tripleo-specific dependencies (i.e. removal of
instack-undercloud, etc.) in hopes that we'll shave off some of the
dependencies, but it seems that there's a larger (bloat) issue around
containers in general.  I have no idea why the rsyslog container would
be 600MB, but yeah, that does seem excessive.
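
One low-tech way to get at least a ballpark figure would be to wrap the pulls
with timers in the job itself, rather than trying to recover timings from
docker or the mirrors afterwards. A rough sketch (the image list below is a
placeholder, not what the jobs actually pull; a real job would read the
prepared image list):

    import subprocess
    import time

    # Placeholder list; a real job would read the prepared container image list.
    IMAGES = [
        "docker.io/tripleomaster/centos-binary-nova-api:current-tripleo",
        "docker.io/tripleomaster/centos-binary-rsyslog-base:current-tripleo",
    ]

    total = 0.0
    for image in IMAGES:
        start = time.monotonic()
        # time how long each pull takes end-to-end, including registry latency
        subprocess.run(["docker", "pull", image], check=True)
        elapsed = time.monotonic() - start
        total += elapsed
        print("%-70s %6.1fs" % (image, elapsed))
    print("total time spent pulling images: %.1fs" % total)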

> Clark
>
> 

Re: [openstack-dev] [tripleo] request for feedback/review on docker2podman upgrade

2018-10-30 Thread Emilien Macchi
A bit of an update here:

- We merged the patch in openstack/paunch that stops the Docker container if
we try to start a Podman container.
- We switched the undercloud upgrade job to test upgrades from Docker to
Podman (for now containers are stopped in Docker and then started in
Podman).
- We are now looking at how and where to remove the Docker containers once
the upgrade is finished. For that work, I started with the Undercloud and
patched tripleoclient to run the post_upgrade_tasks, which to me is a good
place to run docker rm.
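
To make the intended sequence concrete, here is a rough sketch of what we are
converging on per container (illustrative only; in the real flow paunch and
the systemd units drive this, and the container name/arguments below are made
up):

    import subprocess

    def switch_to_podman(name, image, run_args):
        # stop (but do not remove) the Docker container so Podman can take over
        subprocess.run(["docker", "stop", name], check=False)
        # start the same container under Podman; in reality paunch generates
        # this and a systemd unit manages the container lifecycle
        subprocess.run(["podman", "run", "--detach", "--name", name]
                       + run_args + [image], check=True)

    def post_upgrade_cleanup(name):
        # only once the upgrade has finished, remove the old Docker container
        # (this is what post_upgrade_tasks / upgrade_tasks would do)
        subprocess.run(["docker", "rm", name], check=False)

    # made-up example container
    switch_to_podman("nova_api",
                     "tripleomaster/centos-binary-nova-api:current-tripleo",
                     ["--net", "host"])
    post_upgrade_cleanup("nova_api")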

Please look:
- tripleoclient / run post_upgrade_tasks when upgrading
standalone/undercloud: https://review.openstack.org/614349
- THT: prototype on how we would remove the Docker containers:
https://review.openstack.org/611092

Note: for now we assume that Docker is still available on the host after
the upgrade as we are testing things under centos7. I'm aware that this
assumption can change in the future but we'll probably re-iterate.

What I need from the upgrade team is feedback on this workflow, and to see
whether we can re-use these bits, originally tested on the Undercloud /
Standalone, for the Overcloud as well.

Thanks for the feedback,


On Fri, Oct 19, 2018 at 8:00 AM Emilien Macchi  wrote:

> On Fri, Oct 19, 2018 at 4:24 AM Giulio Fidente 
> wrote:
>
>> 1) create the podman systemd unit
>> 2) delete the docker container
>>
>
> We finally went with "stop the docker container"
>
> 3) start the podman container
>>
>
> and 4) delete the docker container later in THT upgrade_tasks.
>
> And yes +1 to do the same in ceph-ansible if possible.
> --
> Emilien Macchi
>


-- 
Emilien Macchi
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Zuul Queue backlogs and resource usage

2018-10-30 Thread Ben Nemec



On 10/30/18 1:25 PM, Clark Boylan wrote:

On Tue, Oct 30, 2018, at 10:42 AM, Alex Schultz wrote:

On Tue, Oct 30, 2018 at 11:36 AM Ben Nemec  wrote:


Tagging with tripleo since my suggestion below is specific to that project.

On 10/30/18 11:03 AM, Clark Boylan wrote:

Hello everyone,

A little while back I sent email explaining how the gate queues work and how 
fixing bugs helps us test and merge more code. All of this still is still true 
and we should keep pushing to improve our testing to avoid gate resets.

Last week we migrated Zuul and Nodepool to a new Zookeeper cluster. In the 
process of doing this we had to restart Zuul which brought in a new logging 
feature that exposes node resource usage by jobs. Using this data I've been 
able to generate some report information on where our node demand is going. 
This change [0] produces this report [1].

As with optimizing software we want to identify which changes will have the 
biggest impact and to be able to measure whether or not changes have had an 
impact once we have made them. Hopefully this information is a start at doing 
that. Currently we can only look back to the point Zuul was restarted, but we 
have a thirty day log rotation for this service and should be able to look at a 
month's worth of data going forward.

Looking at the data you might notice that Tripleo is using many more node 
resources than our other projects. They are aware of this and have a plan [2] 
to reduce their resource consumption. We'll likely be using this report 
generator to check progress of this plan over time.


I know at one point we had discussed reducing the concurrency of the
tripleo gate to help with this. Since tripleo is still using >50% of the
resources it seems like maybe we should revisit that, at least for the
short-term until the more major changes can be made? Looking through the
merge history for tripleo projects I don't see a lot of cases (any, in
fact) where more than a dozen patches made it through anyway*, so I
suspect it wouldn't have a significant impact on gate throughput, but it
would free up quite a few nodes for other uses.



It's the failures in gate and resets.  At this point I think it would
be a good idea to turn down the concurrency of the tripleo queue in
the gate if possible. As of late it's been timeouts but we've been
unable to track down why it's timing out specifically.  I personally
have a feeling it's the container download times since we do not have
a local registry available and are only able to leverage the mirrors
for some levels of caching. Unfortunately we don't get the best
information about this out of docker (or the mirrors) and it's really
hard to determine what exactly makes things run a bit slower.


We actually tried this not too long ago 
https://git.openstack.org/cgit/openstack-infra/project-config/commit/?id=22d98f7aab0fb23849f715a8796384cffa84600b
 but decided to revert it because it didn't decrease the check queue backlog 
significantly. We were still running at several hours behind most of the time.


I'm surprised to hear that. Counting the tripleo jobs in the gate at 
positions 11-20 right now, I see around 84 nodes tied up in long-running 
jobs and another 32 for shorter unit test jobs. The latter probably 
don't have much impact, but the former is a non-trivial amount. It may 
not erase the entire 2300+ job queue that we have right now, but it 
seems like it should help.




If we want to set up better monitoring and measuring and try it again we can do 
that. But we probably want to measure queue sizes with and without the change 
like that to better understand if it helps.


This seems like good information to start capturing, otherwise we are 
kind of just guessing. Is there something in infra already that we could 
use or would it need to be new tooling?




As for container image download times can we quantify that via docker logs? 
Basically sum up the amount of time spent by a job downloading images so that we 
can see what the impact is but also measure if changes improve that? As for other 
ideas improving things seems like many of the images that tripleo use are quite 
large. I recall seeing a > 600MB image just for rsyslog. Wouldn't it be 
advantageous for both the gate and tripleo in the real world to trim the size of 
those images (which should improve download times). In any case quantifying the 
size of the downloads and trimming those if possible is likely also worthwhile.

Clark

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

Re: [openstack-dev] [tripleo][openstack-ansible][nova][placement] Owners needed for placement extraction upgrade deployment tooling

2018-10-30 Thread Chris Dent

On Tue, 30 Oct 2018, Mohammed Naser wrote:


We spoke about this today in the OpenStack Ansible meeting, we've come
up with the following steps:


Great! Thank you, Guilherme, and Lee very much.


1) Create a role for placement which will be called `os_placement`
located in `openstack/openstack-ansible-os_placement`
2) Integrate that role with the OSA master and stop using the built-in
placement service
3) Update the playbooks to handle upgrades and verify using our
periodic upgrade jobs


Makes sense.


The difficult task really comes in the upgrade jobs, I really hope
that we can get some help on this as this probably puts a bit of a
load already on Guilherme, so anyone up to look into that part when
the first 2 are completed? :)


The upgrade-nova script in https://review.openstack.org/#/c/604454/
has been written to make it pretty clear what each of the steps
means. With luck those steps can translate to both the ansible and
tripleo environments.
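
For anyone who has not read it yet, the overall shape is roughly: stand up
the new placement service, copy the placement data out of the nova_api
database, and reconcile the schema. A heavily simplified sketch (the table
list, database names and ordering are my assumptions here, not a substitute
for the actual script):

    import subprocess

    # Assumed table list; the authoritative set lives in the grenade change.
    PLACEMENT_TABLES = [
        "resource_providers", "resource_provider_aggregates", "resource_classes",
        "inventories", "allocations", "consumers", "projects", "users",
        "placement_aggregates", "traits", "resource_provider_traits",
    ]

    def migrate_placement_data():
        # 1) dump only the placement-related tables from the old nova_api db
        dump = subprocess.run(["mysqldump", "nova_api"] + PLACEMENT_TABLES,
                              check=True, capture_output=True)
        # 2) load them into the new, dedicated placement database
        subprocess.run(["mysql", "placement"], input=dump.stdout, check=True)
        # 3) let the new service's own tooling reconcile schema/versioning;
        #    the real script is more careful about ordering and stamping
        subprocess.run(["placement-manage", "db", "sync"], check=True)

    migrate_placement_data()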

Please feel free to add me to any of the reviews and come calling in
#openstack-placement with questions if there are any.

--
Chris Dent   ٩◔̯◔۶   https://anticdent.org/
freenode: cdent tw: @anticdent
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][openstack-ansible][nova][placement] Owners needed for placement extraction upgrade deployment tooling

2018-10-30 Thread Emilien Macchi
On the TripleO side, it sounds like Lee Yarwood is taking the lead with a
first commit in puppet-placement:
https://review.openstack.org/#/c/604182/

Lee, can you confirm that you and your team are working on it for Stein
cycle?

On Thu, Oct 25, 2018 at 1:34 PM Matt Riedemann  wrote:

> Hello OSA/TripleO people,
>
> A plan/checklist was put in place at the Stein PTG for extracting
> placement from nova [1]. The first item in that list is done in grenade
> [2], which is the devstack-based upgrade project in the integrated gate.
> That should serve as a template for the necessary upgrade steps in
> deployment projects. The related devstack change for extracted placement
> on the master branch (Stein) is [3]. Note that change has some
> dependencies.
>
> The second point in the plan from the PTG was getting extracted
> placement upgrade tooling support in a deployment project, notably
> TripleO (and/or OpenStackAnsible).
>
> Given the grenade change is done and passing tests, TripleO/OSA should
> be able to start coding up and testing an upgrade step when going from
> Rocky to Stein. My question is who can we name as an owner in either
> project to start this work? Because we really need to be starting this
> as soon as possible to flush out any issues before they are too late to
> correct in Stein.
>
> So if we have volunteers or better yet potential patches that I'm just
> not aware of, please speak up here so we know who to contact about
> status updates and if there are any questions with the upgrade.
>
> [1]
>
> http://lists.openstack.org/pipermail/openstack-dev/2018-September/134541.html
> [2] https://review.openstack.org/#/c/604454/
> [3] https://review.openstack.org/#/c/600162/
>
> --
>
> Thanks,
>
> Matt
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


-- 
Emilien Macchi
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

