Re: [openstack-dev] [tripleo] glance backend: replace swift by file in CI

2016-06-29 Thread Steven Hardy
On Wed, Jun 29, 2016 at 02:59:45PM +0200, Dmitry Tantsur wrote:
> On 06/28/2016 01:37 PM, Erno Kuvaja wrote:
> > TL;DR
> > 
> > It makes absolute sense to run the file backend on a single-node undercloud in CI.
> > 
> > A few more comments inline.
> > 
> > On Mon, Jun 27, 2016 at 8:49 PM, Emilien Macchi  wrote:
> > > On Mon, Jun 27, 2016 at 3:46 PM, Clay Gerrard  
> > > wrote:
> > > > There's probably some minimal gain in cross-compatibility testing from
> > > > sticking with the status quo.  The Swift API is old and stable, but I
> > > > believe there was some bug in recent history where some return value in
> > > > swiftclient changed from an iterable to a generator or something and some
> > > > aggressive non-duck type checking broke something somewhere.
> > > > 
> > > > I find that bug report sorta interesting; the reported memory pressure
> > > > there doesn't make sense.  Maybe there's some non-essential
> > > > middleware configured on that proxy that's causing the workers to
> > > > bloat up like that?
> > > 
> > > Swift proxy pipeline:
> > > pipeline = catch_errors healthcheck cache ratelimit bulk tempurl
> > > formpost authtoken keystone staticweb proxy-logging proxy-server
> > 
> > Some things I don't think we benefit from having there, if we still want
> > to experiment with swift in the undercloud:
> 
> I hope we're not removing it completely...

No, definitely not - we require Swift for several things other than backing
glance, including:

- Storing introspection data from ironic-inspector
- Signals/Metadata-polling for Heat using the tempurl transport
- Mistral deployment workflows, where plans are pushed into swift

> > staticweb - do we need containers presented as web pages?
> > tempurl - I'd assume we can expect the user to have access to the needed
> > objects with their own credentials.
> 
> Please leave it there, we need it to support the agent_* family of ironic
> drivers.

Yes, we need tempurl for heat metadata/signals and also for uploading artefacts
such as puppet modules to the nodes, so we definitely need it.
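Why tempurl matters for nodes pulling artefacts can be illustrated with a minimal Python sketch of how a Swift temporary URL is signed. It assumes the HMAC-SHA1 form accepted by Swift's tempurl middleware, where the account's temp-URL key signs `"{method}\n{expires}\n{path}"`; the key and object path used below are hypothetical placeholders, not values from the undercloud.

```python
import hmac
import time
from hashlib import sha1
from typing import Optional


def make_temp_url(key: str, method: str, path: str, ttl_seconds: int,
                  now: Optional[int] = None) -> str:
    """Sign a Swift object path so it can be fetched without a token.

    `path` is the full object path, e.g. /v1/AUTH_<account>/<container>/<object>.
    `now` can be pinned for reproducible output; by default the current time is used.
    """
    issued_at = int(time.time()) if now is None else now
    expires = issued_at + ttl_seconds
    # tempurl signs exactly these three fields, newline-separated.
    hmac_body = f"{method}\n{expires}\n{path}"
    sig = hmac.new(key.encode(), hmac_body.encode(), sha1).hexdigest()
    return f"{path}?temp_url_sig={sig}&temp_url_expires={expires}"


if __name__ == "__main__":
    # Hypothetical key and object; prepend the proxy endpoint to get a full URL.
    print(make_temp_url("not-a-real-key", "GET",
                        "/v1/AUTH_undercloud/overcloud/puppet-modules.tar.gz", 3600))
```

The proxy recomputes the same HMAC with its stored key and rejects the request if the signature does not match or `temp_url_expires` is in the past, which is what lets a node fetch a single object without holding Keystone credentials.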

Steve

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] glance backend: replace swift by file in CI

2016-06-29 Thread Dmitry Tantsur

On 06/28/2016 01:37 PM, Erno Kuvaja wrote:
> TL;DR
> 
> It makes absolute sense to run the file backend on a single-node undercloud in CI.
> 
> A few more comments inline.
> 
> On Mon, Jun 27, 2016 at 8:49 PM, Emilien Macchi  wrote:
> > On Mon, Jun 27, 2016 at 3:46 PM, Clay Gerrard  wrote:
> > > There's probably some minimal gain in cross-compatibility testing from
> > > sticking with the status quo.  The Swift API is old and stable, but I
> > > believe there was some bug in recent history where some return value in
> > > swiftclient changed from an iterable to a generator or something and some
> > > aggressive non-duck type checking broke something somewhere.
> > > 
> > > I find that bug report sorta interesting; the reported memory pressure
> > > there doesn't make sense.  Maybe there's some non-essential
> > > middleware configured on that proxy that's causing the workers to
> > > bloat up like that?
> > 
> > Swift proxy pipeline:
> > pipeline = catch_errors healthcheck cache ratelimit bulk tempurl
> > formpost authtoken keystone staticweb proxy-logging proxy-server
> 
> Some things I don't think we benefit from having there, if we still want
> to experiment with swift in the undercloud:

I hope we're not removing it completely...

> staticweb - do we need containers presented as web pages?
> tempurl - I'd assume we can expect the user to have access to the needed
> objects with their own credentials.

Please leave it there, we need it to support the agent_* family of ironic
drivers.



Re: [openstack-dev] [tripleo] glance backend: replace swift by file in CI

2016-06-28 Thread Erno Kuvaja
TL;DR

It makes absolute sense to run the file backend on a single-node undercloud in CI.

A few more comments inline.

On Mon, Jun 27, 2016 at 8:49 PM, Emilien Macchi  wrote:
> On Mon, Jun 27, 2016 at 3:46 PM, Clay Gerrard  wrote:
>> There's probably some minimal gain in cross-compatibility testing from
>> sticking with the status quo.  The Swift API is old and stable, but I
>> believe there was some bug in recent history where some return value in
>> swiftclient changed from an iterable to a generator or something and some
>> aggressive non-duck type checking broke something somewhere.
>>
>> I find that bug report sorta interesting; the reported memory pressure
>> there doesn't make sense.  Maybe there's some non-essential
>> middleware configured on that proxy that's causing the workers to
>> bloat up like that?
>
> Swift proxy pipeline:
> pipeline = catch_errors healthcheck cache ratelimit bulk tempurl
> formpost authtoken keystone staticweb proxy-logging proxy-server

Some things I don't think we benefit from having there, if we still want
to experiment with swift in the undercloud:
staticweb - do we need containers presented as web pages?
tempurl - I'd assume we can expect the user to have access to the needed
objects with their own credentials.
formpost - likely we do not need HTTP forms instead of PUT calls either.
ratelimit - have we ever had a single case where something went crazy
and ratelimit saved the tests from failing?
healthcheck - not likely used, but also really lightweight so
shouldn't make any difference

cache - Memcache is likely the thing that kills us.
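A rough Python sketch of the trimming suggested above, for illustration only: it takes the quoted pipeline string and removes a chosen set of middleware, but refuses to drop anything marked as required. Treating `tempurl` and the final `proxy-server` app as required is an assumption drawn from the other replies in this thread, and `trim_pipeline` is a hypothetical helper; whether `cache` can also go would need checking against whichever remaining middleware expect memcache.

```python
from typing import AbstractSet

# Entries this sketch never removes: tempurl (needed per the thread) and the app itself.
REQUIRED = frozenset({"tempurl", "proxy-server"})

PIPELINE = ("catch_errors healthcheck cache ratelimit bulk tempurl "
            "formpost authtoken keystone staticweb proxy-logging proxy-server")


def trim_pipeline(pipeline: str, drop: AbstractSet[str],
                  required: AbstractSet[str] = REQUIRED) -> str:
    """Return `pipeline` without the names in `drop`, preserving order."""
    blocked = set(drop) & set(required)
    if blocked:
        raise ValueError(f"refusing to drop required entries: {sorted(blocked)}")
    return " ".join(name for name in pipeline.split() if name not in drop)


if __name__ == "__main__":
    print(trim_pipeline(PIPELINE, {"staticweb", "formpost", "ratelimit"}))
```

With those three removed, the sketch prints `catch_errors healthcheck cache bulk tempurl authtoken keystone proxy-logging proxy-server`; asking it to drop `tempurl` raises `ValueError` instead.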

>
> Thanks for your help,
>
>> -clayg
>>
>> On Mon, Jun 27, 2016 at 12:30 PM, Emilien Macchi  wrote:
>>>
>>> Hi,
>>>
>>> Today we're re-investigating a CI failure that we had multiple times [1]:
>>> Swift memory usage grows until it is OOM-killed.
>>>
>>> The scope of this thread is our CI, not production
>>> environments.
>>> Indeed, our CI runs on limited resources, whereas production
>>> environments should not hit this problem.
>>>
>>> After some investigation on #tripleo, we found that this scenario has
>>> recently been happening almost every time:
>>>
>>> * undercloud is deployed, glance and swift are running. Glance is
>>> configured with Swift backend to store images.
>>> * tripleo CI uploads the overcloud image into Glance; the image is successfully
>>> uploaded.
>>> * when the overcloud starts deploying, some nodes randomly fail to deploy
>>> because the undercloud OOM-kills swift-proxy-server while it is still
>>> sending the overcloud image requested by the Glance API. Swift fails,
>>> Glance fails, and the overcloud deployment fails with "No valid hosts
>>> found".
>>>
>>> It's likely due to performance issues in our CI, and there is nothing
>>> we can do except add more resources or reduce the number of
>>> environments, which we won't do at this time given our recent
>>> improvements in our CI (more RAM, SSDs, etc.).

So streamlining and optimizing swift for small environments has
already been tried?

Another thing comes to mind based on the recent discussions.
What is the core count on our CI undercloud (uc) node? Are all the services
deployed there with their default worker values? It might be sensible
(even for production use) to limit the number of workers our services
spawn in the all-in-one (aio) undercloud, as that tends to have a huge impact on memory
consumption.
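One way to act on the worker-count idea, sketched in Python: cap the per-service worker count instead of letting it follow the core count, and render it as a `workers` option in `[DEFAULT]`, assuming that option (as found in glance-api.conf and Swift's proxy-server.conf) is what the service reads. The cap of 2 and both helper names are hypothetical, not values anyone in the thread has tested.

```python
import configparser
import io
import os
from typing import Optional


def capped_workers(cpu_count: Optional[int], cap: int = 2) -> int:
    """One worker per core, but never more than `cap` and never fewer than 1."""
    # os.cpu_count() may return None, so fall back to a single worker.
    return max(1, min(cpu_count or 1, cap))


def render_workers(workers: int) -> str:
    """Render a [DEFAULT] section setting `workers` (hypothetical helper)."""
    cfg = configparser.ConfigParser()
    cfg["DEFAULT"] = {"workers": str(workers)}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_workers(capped_workers(os.cpu_count())))
```

Capping rather than fixing the value keeps small nodes at one or two workers while never exceeding the core count.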

- Erno "jokke_" Kuvaja
>>>
>>> As a first iteration, I propose [2] that we stop using Swift as a
>>> backend for Glance. Since our undercloud is currently single-node, I
>>> see zero value in using Swift to store the overcloud image.
>>> If there is value, then we can add an option to choose whether or not
>>> to use it (and set it to False in our CI to use the file backend, which
>>> won't lead to OOM).
>>>
>>> Note: on the overcloud, we currently support file, swift and rbd
>>> backends, which you can easily select during your deployment.
>>>
>>> [1] https://bugs.launchpad.net/tripleo/+bug/1595916
>>> [2] https://review.openstack.org/#/c/334555/

Re: [openstack-dev] [tripleo] glance backend: replace swift by file in CI

2016-06-27 Thread Emilien Macchi
On Mon, Jun 27, 2016 at 3:46 PM, Clay Gerrard  wrote:
> There's probably some minimal gain in cross-compatibility testing from
> sticking with the status quo.  The Swift API is old and stable, but I
> believe there was some bug in recent history where some return value in
> swiftclient changed from an iterable to a generator or something and some
> aggressive non-duck type checking broke something somewhere.
>
> I find that bug report sorta interesting; the reported memory pressure
> there doesn't make sense.  Maybe there's some non-essential
> middleware configured on that proxy that's causing the workers to
> bloat up like that?

Swift proxy pipeline:
pipeline = catch_errors healthcheck cache ratelimit bulk tempurl
formpost authtoken keystone staticweb proxy-logging proxy-server
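For context on how that line is read: in Swift's paste-deploy style proxy-server.conf, the `pipeline` option under `[pipeline:main]` lists filters in request order and ends with the application. A small Python sketch, assuming the pipeline sits on a single line in the real file (it is wrapped above only by the mail client), that parses it and separates the middleware from the final app:

```python
import configparser
from typing import List, Tuple

PROXY_CONF = """\
[pipeline:main]
pipeline = catch_errors healthcheck cache ratelimit bulk tempurl formpost authtoken keystone staticweb proxy-logging proxy-server
"""


def split_pipeline(conf_text: str) -> Tuple[List[str], str]:
    """Return (middleware in request order, final app) from [pipeline:main]."""
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(conf_text)
    names = cfg["pipeline:main"]["pipeline"].split()
    return names[:-1], names[-1]


if __name__ == "__main__":
    middleware, app = split_pipeline(PROXY_CONF)
    print(f"{len(middleware)} middleware before the {app} app:", ", ".join(middleware))
```

Run on the pipeline above, this reports eleven middleware in front of `proxy-server`. Order matters here: `tempurl` sits before `authtoken`/`keystone` so that signed requests can be authorized without a token.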

Thanks for your help,

> -clayg
>
> On Mon, Jun 27, 2016 at 12:30 PM, Emilien Macchi  wrote:
>>
>> Hi,
>>
>> Today we're re-investigating a CI failure that we had multiple times [1]:
>> Swift memory usage grows until it is OOM-killed.
>>
>> The scope of this thread is our CI, not production
>> environments.
>> Indeed, our CI runs on limited resources, whereas production
>> environments should not hit this problem.
>>
>> After some investigation on #tripleo, we found that this scenario has
>> recently been happening almost every time:
>>
>> * undercloud is deployed, glance and swift are running. Glance is
>> configured with Swift backend to store images.
>> * tripleo CI uploads the overcloud image into Glance; the image is successfully
>> uploaded.
>> * when the overcloud starts deploying, some nodes randomly fail to deploy
>> because the undercloud OOM-kills swift-proxy-server while it is still
>> sending the overcloud image requested by the Glance API. Swift fails,
>> Glance fails, and the overcloud deployment fails with "No valid hosts
>> found".
>>
>> It's likely due to performance issues in our CI, and there is nothing
>> we can do except add more resources or reduce the number of
>> environments, which we won't do at this time given our recent
>> improvements in our CI (more RAM, SSDs, etc.).
>>
>> As a first iteration, I propose [2] that we stop using Swift as a
>> backend for Glance. Since our undercloud is currently single-node, I
>> see zero value in using Swift to store the overcloud image.
>> If there is value, then we can add an option to choose whether or not
>> to use it (and set it to False in our CI to use the file backend, which
>> won't lead to OOM).
>>
>> Note: on the overcloud, we currently support file, swift and rbd
>> backends, which you can easily select during your deployment.
>>
>> [1] https://bugs.launchpad.net/tripleo/+bug/1595916
>> [2] https://review.openstack.org/#/c/334555/



-- 
Emilien Macchi



Re: [openstack-dev] [tripleo] glance backend: replace swift by file in CI

2016-06-27 Thread Clay Gerrard
There's probably some minimal gain in cross-compatibility testing from
sticking with the status quo.  The Swift API is old and stable, but I
believe there was some bug in recent history where some return value in
swiftclient changed from an iterable to a generator or something and some
aggressive non-duck type checking broke something somewhere.
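To illustrate the kind of breakage described above (this is not the actual swiftclient code; every function name below is hypothetical): a caller that checks `isinstance(result, list)` breaks as soon as a library starts returning a generator, while a caller that only iterates keeps working.

```python
from typing import Iterable, Iterator, List


def listing_v1() -> List[str]:
    """Hypothetical older API: returns a list."""
    return ["overcloud-full.qcow2", "overcloud-full.initrd"]


def listing_v2() -> Iterator[str]:
    """Hypothetical newer API: same items, but lazily via a generator."""
    yield from ["overcloud-full.qcow2", "overcloud-full.initrd"]


def count_strict(result) -> int:
    # Aggressive, non-duck-typed check: rejects anything that isn't a list.
    if not isinstance(result, list):
        raise TypeError(f"expected list, got {type(result).__name__}")
    return len(result)


def count_duck(result: Iterable[str]) -> int:
    # Duck-typed: works for lists, tuples, generators, any iterable.
    return sum(1 for _ in result)
```

`count_strict(listing_v2())` raises `TypeError: expected list, got generator`, whereas `count_duck` returns 2 for either version.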

I find that bug report sorta interesting; the reported memory pressure
there doesn't make sense.  Maybe there's some non-essential
middleware configured on that proxy that's causing the workers to
bloat up like that?

-clayg

On Mon, Jun 27, 2016 at 12:30 PM, Emilien Macchi  wrote:

> Hi,
>
> Today we're re-investigating a CI failure that we had multiple times [1]:
> Swift memory usage grows until it is OOM-killed.
>
> The scope of this thread is our CI, not production
> environments.
> Indeed, our CI runs on limited resources, whereas production
> environments should not hit this problem.
>
> After some investigation on #tripleo, we found that this scenario has
> recently been happening almost every time:
>
> * undercloud is deployed, glance and swift are running. Glance is
> configured with Swift backend to store images.
> * tripleo CI uploads the overcloud image into Glance; the image is successfully
> uploaded.
> * when the overcloud starts deploying, some nodes randomly fail to deploy
> because the undercloud OOM-kills swift-proxy-server while it is still
> sending the overcloud image requested by the Glance API. Swift fails,
> Glance fails, and the overcloud deployment fails with "No valid hosts
> found".
>
> It's likely due to performance issues in our CI, and there is nothing
> we can do except add more resources or reduce the number of
> environments, which we won't do at this time given our recent
> improvements in our CI (more RAM, SSDs, etc.).
>
> As a first iteration, I propose [2] that we stop using Swift as a
> backend for Glance. Since our undercloud is currently single-node, I
> see zero value in using Swift to store the overcloud image.
> If there is value, then we can add an option to choose whether or not
> to use it (and set it to False in our CI to use the file backend, which
> won't lead to OOM).
>
> Note: on the overcloud, we currently support file, swift and rbd
> backends, which you can easily select during your deployment.
>
> [1] https://bugs.launchpad.net/tripleo/+bug/1595916
> [2] https://review.openstack.org/#/c/334555/


[openstack-dev] [tripleo] glance backend: replace swift by file in CI

2016-06-27 Thread Emilien Macchi
Hi,

Today we're re-investigating a CI failure that we had multiple times [1]:
Swift memory usage grows until it is OOM-killed.

The scope of this thread is our CI, not production environments.
Indeed, our CI runs on limited resources, whereas production
environments should not hit this problem.

After some investigation on #tripleo, we found that this scenario has
recently been happening almost every time:

* undercloud is deployed, glance and swift are running. Glance is
configured with Swift backend to store images.
* tripleo CI uploads the overcloud image into Glance; the image is successfully uploaded.
* when the overcloud starts deploying, some nodes randomly fail to deploy
because the undercloud OOM-kills swift-proxy-server while it is still
sending the overcloud image requested by the Glance API. Swift fails,
Glance fails, and the overcloud deployment fails with "No valid hosts
found".

It's likely due to performance issues in our CI, and there is nothing
we can do except add more resources or reduce the number of
environments, which we won't do at this time given our recent
improvements in our CI (more RAM, SSDs, etc.).

As a first iteration, I propose [2] that we stop using Swift as a
backend for Glance. Since our undercloud is currently single-node, I
see zero value in using Swift to store the overcloud image.
If there is value, then we can add an option to choose whether or not
to use it (and set it to False in our CI to use the file backend, which
won't lead to OOM).
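A minimal sketch of what such a toggle could amount to on the Glance side, assuming glance_store's `stores`, `default_store` and `filesystem_store_datadir` options in the `[glance_store]` section of glance-api.conf. The `use_swift` flag is hypothetical and only stands in for the proposed option; the actual change is the one under review in [2]. A real Swift configuration would also need its own credential and container options, which this sketch leaves out.

```python
import configparser
import io


def glance_store_conf(use_swift: bool,
                      datadir: str = "/var/lib/glance/images/") -> str:
    """Render a [glance_store] section for either backend (sketch)."""
    backend = "swift" if use_swift else "file"
    section = {"stores": backend, "default_store": backend}
    if backend == "file":
        # Images land on local disk instead of going through swift-proxy-server.
        section["filesystem_store_datadir"] = datadir
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["glance_store"] = section
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # CI would set the hypothetical flag to False to get the file backend.
    print(glance_store_conf(use_swift=False))
```

With the file backend, serving the overcloud image becomes a local read by glance-api, so swift-proxy-server is no longer in that data path.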

Note: on the overcloud, we currently support file, swift and rbd
backends, which you can easily select during your deployment.

[1] https://bugs.launchpad.net/tripleo/+bug/1595916
[2] https://review.openstack.org/#/c/334555/
-- 
Emilien Macchi
