Re: [openstack-dev] [tripleo] EOL process for newton branches

2018-07-18 Thread Tony Breeds
On Wed, Jul 18, 2018 at 08:08:16PM -0400, Emilien Macchi wrote:
> Option 2, EOL everything.
> Thanks a lot for your help on this one, Tony.

No problem.

I've created: 
 https://review.openstack.org/583856
to tag final releases for tripleo deliverables and then mark them as
EOL.

Once that merges, we can arrange for someone with appropriate
permissions to run:

# EOL repos belonging to tripleo
eol_branch.sh -- stable/newton newton-eol \
 openstack/instack openstack/instack-undercloud \
 openstack/os-apply-config openstack/os-collect-config \
 openstack/os-net-config openstack/os-refresh-config \
 openstack/puppet-tripleo openstack/python-tripleoclient \
 openstack/tripleo-common openstack/tripleo-heat-templates \
 openstack/tripleo-image-elements \
 openstack/tripleo-puppet-elements openstack/tripleo-ui \
 openstack/tripleo-validations

Yours Tony.




Re: [openstack-dev] [nova][cinder][neutron][qa] Should we add a tempest-slow job?

2018-07-18 Thread Ghanshyam Mann
 > On Sun, May 13, 2018 at 1:20 PM, Ghanshyam Mann wrote:
 > > On Fri, May 11, 2018 at 10:45 PM, Matt Riedemann wrote:
 > >> The tempest-full job used to run API and scenario tests concurrently, and 
 > >> if 
 > >> you go back far enough I think it also ran slow tests. 
 > >> 
 > >> Sometime in the last year or so, the full job was changed to run the 
 > >> scenario tests in serial and exclude the slow tests altogether. So the 
 > >> API 
 > >> tests run concurrently first, and then the scenario tests run in serial. 
 > >> During that change, some other tests were identified as 'slow' and marked 
 > >> as 
 > >> such, meaning they don't get run in the normal tempest-full job. 
 > >> 
 > >> There are some valuable scenario tests marked as slow, however, like the 
 > >> only encrypted volume testing we have in tempest is marked slow so it 
 > >> doesn't get run on every change for at least nova. 
 > > 
 > > Yes, basically slow tests were selected based on 
 > > https://ethercalc.openstack.org/nu56u2wrfb2b and there were frequent 
 > > gate failures for heavy tests, mainly from ssh checks, so we tried to 
 > > mark more tests as slow. 
 > > I agree that some of them are not really slow, at least in today's situation. 
 > > 
 > >> 
 > >> There is only one job that can be run against nova changes which runs the 
 > >> slow tests but it's in the experimental queue so people forget to run it. 
 > > 
 > > The Tempest job 
 > > "legacy-tempest-dsvm-neutron-scenario-multinode-lvm-multibackend" 
 > > runs those slow tests, including migration and LVM multibackend tests. 
 > > This job runs in the tempest check pipeline and in experimental (as you 
 > > mentioned) on nova and cinder [3]. We marked it as non-voting to check its 
 > > stability and now it is good to go as voting on tempest. 
 > > 
 > >> 
 > >> As a test, I've proposed a nova-slow job [1] which only runs the slow 
 > >> tests 
 > >> and only the compute API and scenario tests. Since there are currently no 
 > >> compute API tests marked as slow, it's really just running slow scenario 
 > >> tests. Results show it runs 37 tests in about 37 minutes [2]. The overall 
 > >> job runtime was 1 hour and 9 minutes, which is on average less than the 
 > >> tempest-full job. The nova-slow job is also running scenarios that nova 
 > >> patches don't actually care about, like the neutron IPv6 scenario tests. 
 > >> 
 > >> My question is, should we make this a generic tempest-slow job which can 
 > >> be 
 > >> run either in the integrated-gate or at least in nova/neutron/cinder 
 > >> consistently (I'm not sure if there are slow tests for just keystone or 
 > >> glance)? I don't know if the other projects already have something like 
 > >> this 
 > >> that they gate on. If so, a nova-specific job for nova changes is fine 
 > >> for 
 > >> me. 
 > > 
 > > +1 on the idea. As of now the slow-marked tests are nova, cinder and 
 > > neutron scenario tests plus only 2 Swift API tests [4]. I agree that 
 > > making a generic job in tempest is better for maintainability. We can 
 > > reuse the existing job for that with the modifications below: 
 > > -  Migrate the 
 > > "legacy-tempest-dsvm-neutron-scenario-multinode-lvm-multibackend" job 
 > > to zuul v3 in the tempest repo. 
 > > -  See if we can move the migration tests out of it and use the 
 > > "nova-live-migration" job (in the tempest check pipeline), which is much 
 > > better at live migration env setup and is controlled by nova. 
 > > -  Then it can be named something like 
 > > "tempest-scenario-multinode-lvm-multibackend". 
 > > -  Run this job in the nova, cinder and neutron check pipelines instead of 
 > > experimental. 
 >  
 > Like this - 
 > https://review.openstack.org/#/q/status:open+project:openstack/tempest+branch:master+topic:scenario-tests-job
 >  
 >  
 > That makes the scenario job generic, running all scenario tests 
 > including slow tests with concurrency 2. I made a few cleanups and moved 
 > the live migration tests out of it; those are run by the 
 > 'nova-live-migration' job. The last patch makes this job voting on the 
 > tempest side. 
 >  
 > If it looks good, we can use this to run in the project-side pipelines as voting. 

Update on this thread:
The old scenario job 
"legacy-tempest-dsvm-neutron-scenario-multinode-lvm-multibackend" has been 
migrated to Tempest as a new job named "tempest-scenario-all" [1].

Changes from the old job to the new job:
- The new job runs all the scenario tests, including slow tests (see the 
sketch below), with LVM multibackend (same as the old job).
- The live migration API tests were moved out of it; they run in the separate 
nova job "nova-live-migration".
- The new job runs as voting in the Tempest check and gate pipelines.
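
For reference, "slow" here means tests tagged with Tempest's slow attribute. A
minimal, illustrative sketch of how such a test is marked (the class and test
names below are made up):

```python
from tempest.lib import decorators
from tempest.scenario import manager


class ExampleScenarioTest(manager.ScenarioTest):
    """Made-up scenario test showing how the slow attribute is applied."""

    @decorators.attr(type='slow')
    @decorators.idempotent_id('11111111-2222-3333-4444-555555555555')
    def test_example_slow_scenario(self):
        # Tests tagged type='slow' are excluded from tempest-full and are
        # only picked up by jobs that explicitly select slow tests, such
        # as the tempest-scenario-all job described above.
        pass
```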

This is also ready for cross-project use. I have pushed patches to nova, 
neutron and cinder to use this new job [3] and to remove 
"legacy-tempest-dsvm-neutron-scenario-multinode-lvm-multibackend" from 
project-config [4]. 

Let me know your feedback on the proposed patches. 

[1] 

[openstack-dev] [octavia] Make amphora-agent support http rest api

2018-07-18 Thread Jeff Yang
 In some private cloud environments, the possibility of a VM being attacked
is very small and all personnel are trusted. In that case, the administrator
wants to reduce the complexity of Octavia deployment, operation and
maintenance. We could let the amphora-agent provide an HTTP API so that the
administrator can ignore certificate management.
https://storyboard.openstack.org/#!/story/2003027


Re: [openstack-dev] [nova] Bug 1781710 killing the check queue

2018-07-18 Thread Matt Riedemann

On 7/18/2018 5:58 PM, w...@seanmooney.info wrote:

General update:
I spent some time this afternoon debugging Matt's regression test
https://review.openstack.org/#/c/583339
and it now works as intended with the addition of disabling the late
check on the compute node in the regression test to mimic devstack.


Sean, thank you again for figuring out the issue in the regression test, 
that helps a ton in asserting the fix (and it also showed I was missing 
a couple of things in the fix when I rebased on top of the test).




Matt has rebased https://review.openstack.org/#/c/583347 on top of
the regression test and it's currently in the CI queue;
hopefully that will pass soon.

While the change is less than ideal, it is backportable downstream if
needed, whereas the wider change would not easily be, so that is a
plus in the short term.


We don't have to backport this fix; it was a regression introduced in 
Rocky, so that's a good thing. But I agree we can do more cleanups in 
Stein if we want to change how we handle RequestSpec.num_instances so 
it's not persisted and set per operation (or just not used at all in 
scheduling since we don't really need it anymore).


--

Thanks,

Matt



Re: [openstack-dev] [tripleo] EOL process for newton branches

2018-07-18 Thread Emilien Macchi
Option 2, EOL everything.
Thanks a lot for your help on this one, Tony.
---
Emilien Macchi

On Wed, Jul 18, 2018, 7:47 PM Tony Breeds wrote:

>
> Hi All,
> As of I3671f10d5a2fef0e91510a40835de962637f16e5 we have meta-data in
> openstack/releases that tells us that the following repos are at
> newton-eol:
>  - openstack/instack-undercloud
>  - openstack/os-net-config
>  - openstack/puppet-tripleo
>  - openstack/tripleo-common
>  - openstack/tripleo-heat-templates
>
> I was setting up the request to create the tags and delete those
> branches but I noticed that the following repos have newton branches and
> are not in the list above:
>
>  - openstack/instack
>  - openstack/os-apply-config
>  - openstack/os-collect-config
>  - openstack/os-refresh-config
>  - openstack/python-tripleoclient
>  - openstack/tripleo-image-elements
>  - openstack/tripleo-puppet-elements
>  - openstack/tripleo-ui
>  - openstack/tripleo-validations
>
> So I guess there are a couple of options here:
>
> 1) Just EOL the 5 repos that openstack/releases knows are at EOL
> 2) EOL the repos from both lists and update openstack/releases to flag
>    them as such
>
> I feel like option 2 is the correct option, but perhaps there is a reason
> those repos were not tagged and released.
>
>
> Yours Tony.


[openstack-dev] [tripleo] EOL process for newton branches

2018-07-18 Thread Tony Breeds

Hi All,
As of I3671f10d5a2fef0e91510a40835de962637f16e5 we have meta-data in
openstack/releases that tells us that the following repos are at
newton-eol:
 - openstack/instack-undercloud
 - openstack/os-net-config
 - openstack/puppet-tripleo
 - openstack/tripleo-common
 - openstack/tripleo-heat-templates

I was setting up the request to create the tags and delete those
branches but I noticed that the following repos have newton branches and
are not in the list above:

 - openstack/instack
 - openstack/os-apply-config
 - openstack/os-collect-config
 - openstack/os-refresh-config
 - openstack/python-tripleoclient
 - openstack/tripleo-image-elements
 - openstack/tripleo-puppet-elements
 - openstack/tripleo-ui
 - openstack/tripleo-validations

So I guess there are a couple of options here:

1) Just EOL the 5 repos that openstack/releases knows are at EOL
2) EOL the repos from both lists and update openstack/releases to flag
   them as such

I feel like option 2 is the correct option, but perhaps there is a reason
those repos were not tagged and released.


Yours Tony.




Re: [openstack-dev] [nova] Bug 1781710 killing the check queue

2018-07-18 Thread Chris Friesen

On 07/18/2018 03:43 PM, melanie witt wrote:

On Wed, 18 Jul 2018 15:14:55 -0500, Matt Riedemann wrote:

On 7/18/2018 1:13 PM, melanie witt wrote:

Can we get rid of multi-create?  It keeps causing complications, and
it already
has weird behaviour if you ask for min_count=X and max_count=Y and only X
instances can be scheduled.  (Currently it fails with NoValidHost, but
it should
arguably start up X instances.)

We've discussed that before but I think users do use it and appreciate
the ability to boot instances in batches (one request). The behavior you
describe could be changed with a microversion, though I'm not sure if
that would mean we have to preserve old behavior with the previous
microversion.

Correct, we can't just remove it since that's a backward incompatible
microversion change. Plus, NFV people*love*  it.


Sorry, I think I might have caused confusion with my question about a
microversion. I was saying that to change the min_count=X and max_count=Y
behavior of raising NoValidHost if X can be satisfied but Y can't, I thought we
could change that in a microversion. And I wasn't sure if that would also mean
we would have to keep the old behavior for previous microversions (and thus
maintain both behaviors).


I understood you. :)

For the case where we could satisfy min_count but not max_count I think we 
*would* need to keep the existing kill-them-all behaviour for existing 
microversions since that's definitely an end-user-visible behaviour.


Chris



Re: [openstack-dev] [nova] Bug 1781710 killing the check queue

2018-07-18 Thread work
On Wed, 2018-07-18 at 15:14 -0500, Matt Riedemann wrote:
> On 7/18/2018 1:13 PM, melanie witt wrote:
> > > 
> > > Can we get rid of multi-create?  It keeps causing complications,
> > > and 
> > > it already
> > > has weird behaviour if you ask for min_count=X and max_count=Y
> > > and only X
> > > instances can be scheduled.  (Currently it fails with
> > > NoValidHost, but 
> > > it should
> > > arguably start up X instances.)
> > 
> > We've discussed that before but I think users do use it and
> > appreciate 
> > the ability to boot instances in batches (one request). The
> > behavior you 
> > describe could be changed with a microversion, though I'm not sure
> > if 
> > that would mean we have to preserve old behavior with the previous 
> > microversion.
> 
> Correct, we can't just remove it since that's a backward
> incompatible 
> microversion change. Plus, NFV people *love* it.

Do they? A lot of NFV folks use Heat, OSM or ONAP to drive their
deployments. I'm not sure if any of those actually use the multi-create
support, but yes, people probably do use it.
> 
> > 
> > > > After talking with Sean Mooney, we have another fix which is 
> > > > self-contained to
> > > > the scheduler [5] so we wouldn't need to make any changes to
> > > > the 
> > > > RequestSpec
> > > > handling in conductor. It's admittedly a bit hairy, so I'm
> > > > asking for 
> > > > some eyes
> > > > on it since either way we go, we should get going soon before
> > > > we hit 
> > > > the FF and
> > > > RC1 rush which *always* kills the gate.
> > > 
> > > One of your options mentioned using RequestSpec.num_instances to 
> > > decide if it's
> > > in a multi-create.  Is there any reason to persist 
> > > RequestSpec.num_instances?
> > > It seems like it's only applicable to the initial request, since
> > > after 
> > > that each
> > > instance is managed individually. 
> 
> Yes, I agree RequestSpec.num_instances is something we shouldn't
> persist 
> since it's only applicable to the initial server create (you can't 
> multi-migrate a group of instances, for example - but I'm sure
> people 
> have asked for that at some point), and it should be set per call to
> the 
> scheduler, but that's a wider-ranging change since it would touch 
> several parts of conductor, plus the request spec, plus the 
> ServerGroupAntiAffinitySchedulerFilter.

I might be a little biased, but I think the localised change in the
scheduler makes sense for now and we should clean this up in Stein.

General update:
I spent some time this afternoon debugging Matt's regression test
https://review.openstack.org/#/c/583339
and it now works as intended with the addition of disabling the late
check on the compute node in the regression test to mimic devstack.

Matt has rebased https://review.openstack.org/#/c/583347 on top of
the regression test and it's currently in the CI queue;
hopefully that will pass soon.

While the change is less than ideal, it is backportable downstream if
needed, whereas the wider change would not easily be, so that is a
plus in the short term.

> Honestly I'm OK with doing either, and I don't think they are
> mutually 
> exclusive things, so we could make num_instances a per-request thing
> in 
> the future for sanity reasons.



Re: [openstack-dev] [nova] Bug 1781710 killing the check queue

2018-07-18 Thread melanie witt

On Wed, 18 Jul 2018 15:14:55 -0500, Matt Riedemann wrote:

On 7/18/2018 1:13 PM, melanie witt wrote:

Can we get rid of multi-create?  It keeps causing complications, and
it already
has weird behaviour if you ask for min_count=X and max_count=Y and only X
instances can be scheduled.  (Currently it fails with NoValidHost, but
it should
arguably start up X instances.)

We've discussed that before but I think users do use it and appreciate
the ability to boot instances in batches (one request). The behavior you
describe could be changed with a microversion, though I'm not sure if
that would mean we have to preserve old behavior with the previous
microversion.

Correct, we can't just remove it since that's a backward incompatible
microversion change. Plus, NFV people*love*  it.


Sorry, I think I might have caused confusion with my question about a 
microversion. I was saying that to change the min_count=X and 
max_count=Y behavior of raising NoValidHost if X can be satisfied but Y 
can't, I thought we could change that in a microversion. And I wasn't 
sure if that would also mean we would have to keep the old behavior for 
previous microversions (and thus maintain both behaviors).


-melanie







Re: [openstack-dev] [tripleo] Stein blueprint - Plan to remove Keepalived support (replaced by Pacemaker)

2018-07-18 Thread Michele Baldessari
On Wed, Jul 18, 2018 at 11:07:04AM -0400, Dan Prince wrote:
> On Tue, 2018-07-17 at 22:00 +0200, Michele Baldessari wrote:
> > Hi Jarda,
> > 
> > thanks for these perspectives, this is very valuable!
> > 
> > On Tue, Jul 17, 2018 at 06:01:21PM +0200, Jaromir Coufal wrote:
> > > Not rooting for any approach here, just want to add a bit of
> > > factors which might play a role when deciding which way to go:
> > > 
> > > A) Performance matters, we should be improving simplicity and speed
> > > of
> > > deployments rather than making it heavier. If the deployment time
> > > and
> > > resource consumption is not significantly higher, I think it
> > > doesn’t
> > > cause an issue. But if there is a significant difference between
> > > PCMK
> > > and keepalived architecture, we would need to review that.
> > 
> > +1 Should the pcmk take substantially more time then I agree, not
> > worth
> > defaulting to it. Worth also exploring how we could tweak things
> > to make the setup of the cluster a bit faster (on a single node we
> > can
> > lower certain wait operations) but full agreement on this point.
> > 
> > > B) Containerization of PCMK plans - eventually we would like to run
> > > the whole undercloud/overcloud on minimal OS in containers to keep
> > > improving the operations on the nodes (updates/upgrades/etc). If
> > > because PCMK we would be forever stuck on BM, it would be a bit of
> > > pita. As Michele said, maybe we can re-visit this.
> > 
> > So I briefly discussed this in our team, and while it could be
> > re-explored, we need to be very careful about the tradeoffs.
> > This would be another layer which would bring quite a bit of
> > complexity
> > (pcs commands would have to be run inside a container, speed
> > tradeoffs,
> > more limited possibilities when it comes to upgrading/updating, etc.)
> > 
> > > C) Unification of undercloud/overcloud is important for us, so +1
> > > to
> > > whichever method is being used in both. But what I know, HA folks
> > > went
> > > to keepalived since it is simpler so would be good to keep in sync
> > > (and good we have their presence here actually) :)
> > 
> > Right so to be honest, the choice of keepalived on the undercloud for
> > VIP predates me and I was not directly involved, so I lack the exact
> > background for that choice (and I could not quickly reconstruct it
> > from git
> > history). But I think it is/was a reasonable choice for what it needs
> > doing, although I probably would have picked just configuring the
> > extra
> > VIPs on the interfaces and have one service less to care about.
> > +1 in general on the unification, with the caveats that have been
> > discussed so far.
> 
> I think it was more of that we wanted to use HAProxy for SSL
> termination and keepalived is a simple enough way to set this up.
> Instack-Undercloud has used HAProxy/keepalived for years in this
> manner.
> 
> I think this came up recently because downstream we did not have a
> keepalived container. So it got a bit of spotlight on it as to why we
> were using it. We do have a keepalived RPM and its worked as it has for
> years already so as far as single node/undercloud setups go I think it
> would continue to work fine. Kolla has had and supports the keepalived
> container for awhile now as well.
> 
> ---
> 
> Comments on this thread seem to cover 2 main themes to me.
> Simplification and the desire to use the same architecture as the
> Overcloud (Pacemaker). And there is some competition between them.
> 
> For simplification: If we can eliminate keepalived and still use
> HAProxy (thus keeping the SSL termination features working) then I
> think that would be worth trying. Specifically can we eliminate
> Keepalived without swapping in Pacemaker? Michele: if you have ideas
> here lets try them!

I don't think it makes a lot of sense to just move to native IPs on
interfaces just to remove keepalived. At least I don't see a good
trade-off. If it has worked so far, I'd say let's just keep it
(unless there are compelling arguments to remove it, of course)

> With regards to Pacemaker I think we need to make an exception. It
> seems way too heavy for single node setups and increases the complexity
> there for very little benefit. 
> To me the shared architecture for
> TripleO is the tools we use to setup services. By using t-h-t to drive
> our setup of the Undercloud and All-In-One installers we are already
> gaining a lot of benefit here. Pacemaker is weird as it is kind of
> augments the architecture a bit (HA architecture). But Pacemaker is
> also a service that gets configured by TripleO. So it kind of falls
> into both categories. Pacemaker gives us features we need in the
> Overcloud at the cost of some extra complexity. And in addition to all
> this we are still running the Pacemaker processes themselves on
> baremetal. All this just to say we are running the same "architecture"
> on both the Undercloud and Overcloud? I'm not a fan.

Fully agreed on the extra 

Re: [openstack-dev] [nova] Bug 1781710 killing the check queue

2018-07-18 Thread Matt Riedemann

On 7/18/2018 1:13 PM, melanie witt wrote:


Can we get rid of multi-create?  It keeps causing complications, and 
it already

has weird behaviour if you ask for min_count=X and max_count=Y and only X
instances can be scheduled.  (Currently it fails with NoValidHost, but 
it should

arguably start up X instances.)


We've discussed that before but I think users do use it and appreciate 
the ability to boot instances in batches (one request). The behavior you 
describe could be changed with a microversion, though I'm not sure if 
that would mean we have to preserve old behavior with the previous 
microversion.


Correct, we can't just remove it since that's a backward incompatible 
microversion change. Plus, NFV people *love* it.




After talking with Sean Mooney, we have another fix which is 
self-contained to
the scheduler [5] so we wouldn't need to make any changes to the 
RequestSpec
handling in conductor. It's admittedly a bit hairy, so I'm asking for 
some eyes
on it since either way we go, we should get going soon before we hit 
the FF and

RC1 rush which *always* kills the gate.


One of your options mentioned using RequestSpec.num_instances to 
decide if it's
in a multi-create.  Is there any reason to persist 
RequestSpec.num_instances?
It seems like it's only applicable to the initial request, since after 
that each
instance is managed individually. 


Yes, I agree RequestSpec.num_instances is something we shouldn't persist 
since it's only applicable to the initial server create (you can't 
multi-migrate a group of instances, for example - but I'm sure people 
have asked for that at some point), and it should be set per call to the 
scheduler, but that's a wider-ranging change since it would touch 
several parts of conductor, plus the request spec, plus the 
ServerGroupAntiAffinitySchedulerFilter.


Honestly I'm OK with doing either, and I don't think they are mutually 
exclusive things, so we could make num_instances a per-request thing in 
the future for sanity reasons.


--

Thanks,

Matt



Re: [openstack-dev] [tripleo] Stein blueprint - Plan to remove Keepalived support (replaced by Pacemaker)

2018-07-18 Thread Emilien Macchi
Thanks everyone for this useful feedback (I guess it helps a lot to discuss
before the PTG, so we don't even need to spend too much time on this topic).

1) Everyone agrees that undercloud HA isn't something we target now,
therefore we won't switch to Pacemaker by default.
2) Pacemaker would still be a good option for multinode/HA standalone
deployments, like we do for the overcloud.
3) Investigate how we could replace keepalived with something that would
handle the VIPs used by HAProxy.

I've abandoned the patches that tested Pacemaker on the undercloud, and
also the patch in tripleoclient for the enable_pacemaker parameter; I don't
think we need it for now. There is another way to enable Pacemaker for
Standalone. I also closed the blueprint:
https://blueprints.launchpad.net/tripleo/+spec/undercloud-pacemaker-default
and created a new one:
https://blueprints.launchpad.net/tripleo/+spec/replace-keepalived-undercloud
Please take a look and let me know what you think.

It fits well with the Simplicity theme for Stein, and it'll help to remove
services that we don't need anymore.

If you have any feedback on this summary, please go ahead and comment.
Thanks,

On Wed, Jul 18, 2018 at 11:07 AM Dan Prince wrote:

> On Tue, 2018-07-17 at 22:00 +0200, Michele Baldessari wrote:
> > Hi Jarda,
> >
> > thanks for these perspectives, this is very valuable!
> >
> > On Tue, Jul 17, 2018 at 06:01:21PM +0200, Jaromir Coufal wrote:
> > > Not rooting for any approach here, just want to add a bit of
> > > factors which might play a role when deciding which way to go:
> > >
> > > A) Performance matters, we should be improving simplicity and speed
> > > of
> > > deployments rather than making it heavier. If the deployment time
> > > and
> > > resource consumption is not significantly higher, I think it
> > > doesn’t
> > > cause an issue. But if there is a significant difference between
> > > PCMK
> > > and keepalived architecture, we would need to review that.
> >
> > +1 Should the pcmk take substantially more time then I agree, not
> > worth
> > defaulting to it. Worth also exploring how we could tweak things
> > to make the setup of the cluster a bit faster (on a single node we
> > can
> > lower certain wait operations) but full agreement on this point.
> >
> > > B) Containerization of PCMK plans - eventually we would like to run
> > > the whole undercloud/overcloud on minimal OS in containers to keep
> > > improving the operations on the nodes (updates/upgrades/etc). If
> > > because PCMK we would be forever stuck on BM, it would be a bit of
> > > pita. As Michele said, maybe we can re-visit this.
> >
> > So I briefly discussed this in our team, and while it could be
> > re-explored, we need to be very careful about the tradeoffs.
> > This would be another layer which would bring quite a bit of
> > complexity
> > (pcs commands would have to be run inside a container, speed
> > tradeoffs,
> > more limited possibilities when it comes to upgrading/updating, etc.)
> >
> > > C) Unification of undercloud/overcloud is important for us, so +1
> > > to
> > > whichever method is being used in both. But what I know, HA folks
> > > went
> > > to keepalived since it is simpler so would be good to keep in sync
> > > (and good we have their presence here actually) :)
> >
> > Right so to be honest, the choice of keepalived on the undercloud for
> > VIP predates me and I was not directly involved, so I lack the exact
> > background for that choice (and I could not quickly reconstruct it
> > from git
> > history). But I think it is/was a reasonable choice for what it needs
> > doing, although I probably would have picked just configuring the
> > extra
> > VIPs on the interfaces and have one service less to care about.
> > +1 in general on the unification, with the caveats that have been
> > discussed so far.
>
> I think it was more of that we wanted to use HAProxy for SSL
> termination and keepalived is a simple enough way to set this up.
> Instack-Undercloud has used HAProxy/keepalived for years in this
> manner.
>
> I think this came up recently because downstream we did not have a
> keepalived container. So it got a bit of spotlight on it as to why we
> were using it. We do have a keepalived RPM and its worked as it has for
> years already so as far as single node/undercloud setups go I think it
> would continue to work fine. Kolla has had and supports the keepalived
> container for awhile now as well.
>
> ---
>
> Comments on this thread seem to cover 2 main themes to me.
> Simplification and the desire to use the same architecture as the
> Overcloud (Pacemaker). And there is some competition between them.
>
> For simplification: If we can eliminate keepalived and still use
> HAProxy (thus keeping the SSL termination features working) then I
> think that would be worth trying. Specifically can we eliminate
> Keepalived without swapping in Pacemaker? Michele: if you have ideas
> here lets try them!
>
> With regards to Pacemaker I think we need to make 

Re: [openstack-dev] [keystone] Keystone Team Update - Week of 9 July 2018

2018-07-18 Thread Lance Bragstad


On 07/13/2018 01:33 PM, Colleen Murphy wrote:
> # Keystone Team Update - Week of 9 July 2018
>
> ## News
>
> ### New Core Reviewer
>
> We added a new core reviewer[1]: thanks to XiYuan for stepping up to take 
> this responsibility and for all your hard work on keystone!
>
> [1] http://lists.openstack.org/pipermail/openstack-dev/2018-July/132123.html
>
> ### Release Status
>
> This week is our scheduled feature freeze week, but we did not have quite the 
> tumult of activity we had at feature freeze last cycle. We're pushing the 
> auth receipts work until after the token model refactor is finished[2], to 
> avoid the receipts model having to carry extra technical debt. The 
> fine-grained access control feature for application credentials is also going 
> to need to be pushed to next cycle when more of us can dedicate time to 
> helping with it it[3]. The base work for default roles was completed[4] but 
> the auditing of the keystone API hasn't been completed yet and is partly 
> dependent on the flask work, so it is going to continue on into next 
> cycle[5]. The hierarchical limits work is pretty solid but we're (likely) 
> going to let it slide into next week so that some of the interface details 
> can be worked out[6].
>   
> [2] 
> http://eavesdrop.openstack.org/irclogs/%23openstack-keystone/%23openstack-keystone.2018-07-10.log.html#t2018-07-10T01:39:27
> [3] 
> http://eavesdrop.openstack.org/irclogs/%23openstack-keystone/%23openstack-keystone.2018-07-13.log.html#t2018-07-13T14:19:08
> [4] https://review.openstack.org/572243
> [5] 
> http://eavesdrop.openstack.org/irclogs/%23openstack-keystone/%23openstack-keystone.2018-07-13.log.html#t2018-07-13T14:02:03
> [6] https://review.openstack.org/557696
>
> ### PTG Planning
>
> We're starting to prepare topics for the next PTG in Denver[7] so please add 
> topics to the planning etherpad[8].
>
> [7] http://lists.openstack.org/pipermail/openstack-dev/2018-July/132144.html
> [8] https://etherpad.openstack.org/p/keystone-stein-ptg
>
> ## Recently Merged Changes
>
> Search query: https://bit.ly/2IACk3F
>
> We merged 20 changes this week, including several of the flask conversion 
> patches.
>
> ## Changes that need Attention
>
> Search query: https://bit.ly/2wv7QLK
>
> There are 62 changes that are passing CI, not in merge conflict, have no 
> negative reviews and aren't proposed by bots. The major efforts to focus on 
> are the token model refactor[9], the flaskification work[10], and the 
> hierarchical project limits work[11].
>
> [9] https://review.openstack.org/#/q/is:open+topic:bug/1778945
> [10] https://review.openstack.org/#/q/is:open+topic:bug/1776504
> [11] https://review.openstack.org/#/q/is:open+topic:bp/strict-two-level-model
>
> ## Bugs
>
> This week we opened 3 new bugs and closed 4.
>
> Bugs opened (3) 
> Bug #1780532 (keystone:Undecided) opened by zheng yan 
> https://bugs.launchpad.net/keystone/+bug/1780532 
> Bug #1780896 (keystone:Undecided) opened by wangxiyuan 
> https://bugs.launchpad.net/keystone/+bug/1780896 
> Bug #1781536 (keystone:Undecided) opened by Pawan Gupta 
> https://bugs.launchpad.net/keystone/+bug/1781536 
>
> Bugs closed (0) 
>
> Bugs fixed (4) 
> Bug #1765193 (keystone:Medium) fixed by wangxiyuan 
> https://bugs.launchpad.net/keystone/+bug/1765193 
> Bug #1780159 (keystone:Medium) fixed by Sami Makki 
> https://bugs.launchpad.net/keystone/+bug/1780159 
> Bug #1780896 (keystone:Undecided) fixed by wangxiyuan 
> https://bugs.launchpad.net/keystone/+bug/1780896 
> Bug #1779172 (oslo.policy:Undecided) fixed by Lance Bragstad 
> https://bugs.launchpad.net/oslo.policy/+bug/1779172
>
> ## Milestone Outlook
>
> https://releases.openstack.org/rocky/schedule.html
>
> This week is our scheduled feature freeze. We are likely going to make an 
> extension for the hierarchical project limits work, pending discussion on the 
> mailing list.
>
> Next week is the non-client final release date[12], so work happening in 
> keystoneauth, keystonemiddleware, and our oslo libraries needs to be finished 
> and reviewed prior to next Thursday so a release can be requested in time.
I've starred some reviews that I think we should land before Thursday if
possible [0]. Eyes there would be appreciated. Morgan also reported a
bug that he is working on fixing in keystonemiddleware that we should
try to include as well [1]. I'll add the patch to the query as soon as a
review is proposed to Gerrit.

[0]
https://review.openstack.org/#/q/starredby:lbragstad%2540gmail.com+status:open
[1] https://bugs.launchpad.net/keystonemiddleware/+bug/1782404
>
> [12] https://review.openstack.org/572243
>
> ## Help with this newsletter
>
> Help contribute to this newsletter by editing the etherpad: 
> https://etherpad.openstack.org/p/keystone-team-newsletter
> Dashboard generated using gerrit-dash-creator and 
> https://gist.github.com/lbragstad/9b0477289177743d1ebfc276d1697b67
>

Re: [openstack-dev] [nova] Bug 1781710 killing the check queue

2018-07-18 Thread melanie witt

On Wed, 18 Jul 2018 12:05:13 -0600, Chris Friesen wrote:

On 07/18/2018 10:14 AM, Matt Riedemann wrote:

As can be seen from logstash [1] this bug is hurting us pretty bad in the check
queue.

I thought I originally had this fixed with [2] but that turned out to only be
part of the issue.

I think I've identified the problem but I have failed to write a recreate
regression test [3] because (I think) it's due to random ordering of which
request spec we select to send to the scheduler during a multi-create request
(and I tried making that predictable by sorting the instances by uuid in both
conductor and the scheduler but that didn't make a difference in my test).


Can we get rid of multi-create?  It keeps causing complications, and it already
has weird behaviour if you ask for min_count=X and max_count=Y and only X
instances can be scheduled.  (Currently it fails with NoValidHost, but it should
arguably start up X instances.)


We've discussed that before but I think users do use it and appreciate 
the ability to boot instances in batches (one request). The behavior you 
describe could be changed with a microversion, though I'm not sure if 
that would mean we have to preserve old behavior with the previous 
microversion.



After talking with Sean Mooney, we have another fix which is self-contained to
the scheduler [5] so we wouldn't need to make any changes to the RequestSpec
handling in conductor. It's admittedly a bit hairy, so I'm asking for some eyes
on it since either way we go, we should get going soon before we hit the FF and
RC1 rush which *always* kills the gate.


One of your options mentioned using RequestSpec.num_instances to decide if it's
in a multi-create.  Is there any reason to persist RequestSpec.num_instances?
It seems like it's only applicable to the initial request, since after that each
instance is managed individually.

Chris



Re: [openstack-dev] OpenStack lagging behind 2 major python versions: we need a Python 3.7 gate

2018-07-18 Thread Clark Boylan
On Thu, Jul 12, 2018, at 1:38 PM, Thomas Goirand wrote:
> Hi everyone!
> 
> It's yet another of these emails where I'm going to complain out of
> frustration because of OpenStack having bugs when running with the
> newest stuff... Sorry in advance ! :)
> 
> tl;dr: It's urgent, we need Python 3.7 uwsgi + SSL gate jobs.
> 
> Longer version:
> 
> When Python 3.6 reached Debian, I already forwarded a few patches. It
> went quite ok, but still... When switching services to Python 3 for
> Newton, I discover that many services still had issues with uwsgi /
> mod_wsgi, and I spent a large amount of time trying to figure out ways
> to fix the situation. Some patches are still not yet merged, even though
> it was a community goal to have this support for Newton:
> 
> Neutron:
> https://review.openstack.org/#/c/555608/
> https://review.openstack.org/#/c/580049/
> 
> Neutron FWaaS:
> https://review.openstack.org/#/c/580327/
> https://review.openstack.org/#/c/579433/
> 
> Horizon tempest plugin:
> https://review.openstack.org/#/c/575714/
> 
> Oslotest (clearly, the -1 is from someone considering only Devstack /
> venv, not understanding the packaging environment):
> https://review.openstack.org/#/c/571962/
> 
> Designate:
> As much as I know, it still doesn't support uwsgi / mod_wsgi (please let
> me know if this changed recently).
> 
> There may be more, I didn't have much time investigating some projects
> which are less important to me.
> 
> Now, both Debian and Ubuntu have Python 3.7. Every package which I
> upload in Sid need to support that. Yet, OpenStack's CI is still lagging
> with Python 3.5. And there's lots of things currently broken. We've
> fixed most "async" stuff, though we are failing to rebuild
> oslo.messaging (from Queens) with Python 3.7: unit tests are just
> hanging doing nothing.
> 
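
For context, the "async" fixes mentioned above refer to Python 3.7 promoting
async and await to reserved keywords; a minimal, made-up illustration of that
kind of breakage and the usual fix:

```python
# Illustrative only: on Python <= 3.6 this was legal, on 3.7+ it is a
# SyntaxError because "async" is now a reserved keyword:
#
#     def cast(topic, msg, async=False):
#         ...
#
# The usual fix is simply renaming the parameter:

def cast(topic, msg, asynchronous=False):
    """Made-up RPC-style helper, used only to illustrate the rename."""
    return (topic, msg, asynchronous)


print(cast('compute', {'method': 'ping'}))
```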
> I'm very happy to do small contributions to each and every component
> here and there whenever it's possible, but this time, it's becoming a
> little bit frustrating. I sometimes even got replies like "hum ...
> OpenStack only supports Python 3.5" a few times. That's not really
> acceptable, unfortunately.
> 
> So moving forward, what I think needs to happen is:
> 
> - Get each and every project to actually gate using uwsgi for the API,
> using both Python 3 and SSL (any other test environment is *NOT* a real
> production environment).
> 
> - The gating has to happen with whatever is the latest Python 3 version
> available. Best would even be if we could have that *BEFORE* it reaches
> distributions like Debian and Ubuntu. I'm aware that there's been some
> attempts in the OpenStack infra to have Debian Sid (which is probably
> the distribution getting the updates the faster). This effort needs to
> be restarted, and some (non-voting ?) gate jobs needs to be setup using
> whatever the latest thing is. If it cannot happen with Sid, then I don't
> know, choose another platform, and do the Python 3-latest gating...

When you asked about this last month I suggested Tumbleweed as an option. You 
get rolling release packages that are almost always up to date. I'd still 
suggest that now as a place to start.

http://lists.openstack.org/pipermail/openstack-dev/2018-June/131302.html

> 
> The current situation with the gate still doing Python 3.5 only jobs is
> just not sustainable anymore. Moving forward, Python 2.7 will die. When
> this happens, moving faster with Python 3 versions will be mandatory for
> everyone, not only for fools like me who made the switch early.
> 
>  :)
> 
> Cheers,
> 
> Thomas Goirand (zigo)
> 
> P.S: A big thanks to everyone who were helpful for making the switch to
> Python 3 in Debian, especially Annp and the rest of the Neutron team.




Re: [openstack-dev] [nova] Bug 1781710 killing the check queue

2018-07-18 Thread Chris Friesen

On 07/18/2018 10:14 AM, Matt Riedemann wrote:

As can be seen from logstash [1] this bug is hurting us pretty bad in the check
queue.

I thought I originally had this fixed with [2] but that turned out to only be
part of the issue.

I think I've identified the problem but I have failed to write a recreate
regression test [3] because (I think) it's due to random ordering of which
request spec we select to send to the scheduler during a multi-create request
(and I tried making that predictable by sorting the instances by uuid in both
conductor and the scheduler but that didn't make a difference in my test).


Can we get rid of multi-create?  It keeps causing complications, and it already 
has weird behaviour if you ask for min_count=X and max_count=Y and only X 
instances can be scheduled.  (Currently it fails with NoValidHost, but it should 
arguably start up X instances.)
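
For context, multi-create is a single server-create request carrying min_count
and max_count. A minimal sketch of issuing one with python-novaclient, using
made-up credentials and IDs:

```python
from keystoneauth1 import loading, session
from novaclient import client

# Made-up credentials/endpoint; adjust for a real cloud.
loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(
    auth_url='http://controller:5000/v3',
    username='demo', password='secret', project_name='demo',
    user_domain_name='Default', project_domain_name='Default')
nova = client.Client('2.1', session=session.Session(auth=auth))

# One multi-create request: boot between 2 and 5 instances.  With the
# current behaviour described above, if max_count cannot be scheduled the
# whole request fails with NoValidHost, even when min_count could have
# been satisfied.
servers = nova.servers.create(
    name='batch', image='IMAGE_UUID', flavor='FLAVOR_ID',
    min_count=2, max_count=5)
```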



After talking with Sean Mooney, we have another fix which is self-contained to
the scheduler [5] so we wouldn't need to make any changes to the RequestSpec
handling in conductor. It's admittedly a bit hairy, so I'm asking for some eyes
on it since either way we go, we should get going soon before we hit the FF and
RC1 rush which *always* kills the gate.


One of your options mentioned using RequestSpec.num_instances to decide if it's 
in a multi-create.  Is there any reason to persist RequestSpec.num_instances? 
It seems like it's only applicable to the initial request, since after that each 
instance is managed individually.


Chris



Re: [openstack-dev] OpenStack lagging behind 2 major python versions: we need a Python 3.7 gate

2018-07-18 Thread Jay Pipes

On 07/18/2018 12:42 AM, Ian Wienand wrote:

The ideal is that a (say) Neutron dev gets a clear traceback from a
standard Python error in their change and happily fixes it.  The
reality is probably more like this developer gets a tempest
failure due to nova failing to boot a cirros image, stemming from a
detached volume due to a qemu bug that manifests due to a libvirt
update (I'm exaggerating, I know :).


Not really exaggerating. :)

-jay



[openstack-dev] [nova] Bug 1781710 killing the check queue

2018-07-18 Thread Matt Riedemann
As can be seen from logstash [1] this bug is hurting us pretty bad in 
the check queue.


I thought I originally had this fixed with [2] but that turned out to 
only be part of the issue.


I think I've identified the problem but I have failed to write a 
recreate regression test [3] because (I think) it's due to random 
ordering of which request spec we select to send to the scheduler during 
a multi-create request (and I tried making that predictable by sorting 
the instances by uuid in both conductor and the scheduler but that 
didn't make a difference in my test).


I started with one fix yesterday [4] but that would regress an earlier 
fix for resizing servers to the same host which are in an anti-affinity 
group. If we went that route, it will involve changes to how we handle 
RequestSpec.num_instances (either not persist it, or reset it during 
move operations).


After talking with Sean Mooney, we have another fix which is 
self-contained to the scheduler [5] so we wouldn't need to make any 
changes to the RequestSpec handling in conductor. It's admittedly a bit 
hairy, so I'm asking for some eyes on it since either way we go, we 
should get going soon before we hit the FF and RC1 rush which *always* 
kills the gate.


[1] http://status.openstack.org/elastic-recheck/index.html#1781710
[2] https://review.openstack.org/#/c/582976/
[3] https://review.openstack.org/#/c/583339
[4] https://review.openstack.org/#/c/583351
[5] https://review.openstack.org/#/c/583347

--

Thanks,

Matt



Re: [openstack-dev] [cinder][nova] Proper behavior for os-force_detach

2018-07-18 Thread Walter Boring
The whole purpose of this test is to simulate the case where Nova doesn't
know where the VM is anymore,
or the VM may simply not exist, but we need to clean up the Cinder side of
things. That being said, with the new
attach API, the connector is saved in the Cinder database for each
volume attachment.

Walt
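
A purely illustrative sketch of the behaviour being agreed on in this thread,
not any real driver (the attachment objects and the _remove_export() helper are
hypothetical placeholders): when a force-detach arrives with connector=None,
the driver tears down every connection it still holds for the volume.

```python
class ExampleVolumeDriver(object):
    """Toy stand-in for a Cinder backend driver, for illustration only."""

    def terminate_connection(self, volume, connector, **kwargs):
        if connector is None:
            # Force-detach without a connector: clean up every connection
            # the backend still holds for this volume (covers multi-attach
            # and leftover live-migration connections).
            for attachment in getattr(volume, 'volume_attachment', []):
                self._remove_export(volume, attachment.connector)
            return
        # Normal detach: only tear down the export for the given host.
        self._remove_export(volume, connector)

    def _remove_export(self, volume, connector):
        # Backend-specific teardown would go here (unmap LUN, remove
        # initiator ACLs, ...).
        pass
```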

On Wed, Jul 18, 2018 at 5:02 AM, Gorka Eguileor wrote:

> On 17/07, Sean McGinnis wrote:
> > On Tue, Jul 17, 2018 at 04:06:29PM -0300, Erlon Cruz wrote:
> > > Hi Cinder and Nova folks,
> > >
> > > Working on some tests for our drivers, I stumbled upon this tempest
> test
> > > 'force_detach_volume'
> > > that is calling Cinder API passing a 'None' connector. At the time
> this was
> > > added several CIs
> > > went down, and people started discussing whether this
> (accepting/sending a
> > > None connector)
> > > would be the proper behavior for what is expected to a driver to
> do[1]. So,
> > > some of CIs started
> > > just skipping that test[2][3][4] and others implemented fixes that
> made the
> > > driver to disconnected
> > > the volume from all hosts if a None connector was received[5][6][7].
> >
> > Right, it was determined the correct behavior for this was to disconnect
> the
> > volume from all hosts. The CIs that are skipping this test should stop
> doing so
> > (once their drivers are fixed of course).
> >
> > >
> > > While implementing this fix seems to be straightforward, I feel that
> just
> > > removing the volume
> > > from all hosts is not the correct thing to do mainly considering that
> we
> > > can have multi-attach.
> > >
> >
> > I don't think multiattach makes a difference here. Someone is forcibly
> > detaching the volume and not specifying an individual connection. So
> based on
> > that, Cinder should be removing any connections, whether that is to one
> or
> > several hosts.
> >
>
> Hi,
>
> I agree with Sean, drivers should remove all connections for the volume.
>
> Even without multiattach there are cases where you'll have multiple
> connections for the same volume, like in a Live Migration.
>
> It's also very useful when Nova and Cinder get out of sync and your
> volume has leftover connections. In this case if you try to delete the
> volume you get a "volume in use" error from some drivers.
>
> Cheers,
> Gorka.
>
>
> > > So, my questions are: What is the best way to fix this problem? Should
> > > Cinder API continue to
> > > accept detachments with None connectors? If so, what would be the
> effects
> > > on other Nova
> > > attachments for the same volume? Is there any side effect if the
> volume is
> > > not multi-attached?
> > >
> > > Additionally to this thread here, I should bring this topic to
> tomorrow's
> > > Cinder's meeting,
> > > so please join if you have something to share.
> > >
> >
> > +1 - good plan.
> >
> >
> > 


Re: [openstack-dev] [tripleo] Stein blueprint - Plan to remove Keepalived support (replaced by Pacemaker)

2018-07-18 Thread Dan Prince
On Tue, 2018-07-17 at 22:00 +0200, Michele Baldessari wrote:
> Hi Jarda,
> 
> thanks for these perspectives, this is very valuable!
> 
> On Tue, Jul 17, 2018 at 06:01:21PM +0200, Jaromir Coufal wrote:
> > Not rooting for any approach here, just want to add a bit of
> > factors which might play a role when deciding which way to go:
> > 
> > A) Performance matters, we should be improving simplicity and speed
> > of
> > deployments rather than making it heavier. If the deployment time
> > and
> > resource consumption is not significantly higher, I think it
> > doesn’t
> > cause an issue. But if there is a significant difference between
> > PCMK
> > and keepalived architecture, we would need to review that.
> 
> +1 Should the pcmk take substantially more time then I agree, not
> worth
> defaulting to it. Worth also exploring how we could tweak things
> to make the setup of the cluster a bit faster (on a single node we
> can
> lower certain wait operations) but full agreement on this point.
> 
> > B) Containerization of PCMK plans - eventually we would like to run
> > the whole undercloud/overcloud on minimal OS in containers to keep
> > improving the operations on the nodes (updates/upgrades/etc). If
> > because PCMK we would be forever stuck on BM, it would be a bit of
> > pita. As Michele said, maybe we can re-visit this.
> 
> So I briefly discussed this in our team, and while it could be
> re-explored, we need to be very careful about the tradeoffs.
> This would be another layer which would bring quite a bit of
> complexity
> (pcs commands would have to be run inside a container, speed
> tradeoffs,
> more limited possibilities when it comes to upgrading/updating, etc.)
> 
> > C) Unification of undercloud/overcloud is important for us, so +1
> > to
> > whichever method is being used in both. But what I know, HA folks
> > went
> > to keepalived since it is simpler so would be good to keep in sync
> > (and good we have their presence here actually) :)
> 
> Right so to be honest, the choice of keepalived on the undercloud for
> VIP predates me and I was not directly involved, so I lack the exact
> background for that choice (and I could not quickly reconstruct it
> from git
> history). But I think it is/was a reasonable choice for what it needs
> doing, although I probably would have picked just configuring the
> extra
> VIPs on the interfaces and have one service less to care about.
> +1 in general on the unification, with the caveats that have been
> discussed so far.

I think it was more of that we wanted to use HAProxy for SSL
termination and keepalived is a simple enough way to set this up.
Instack-Undercloud has used HAProxy/keepalived for years in this
manner.

I think this came up recently because downstream we did not have a
keepalived container, so it got a bit of a spotlight as to why we
were using it. We do have a keepalived RPM and it's worked as it has for
years already, so as far as single node/undercloud setups go I think it
would continue to work fine. Kolla has had and supports the keepalived
container for a while now as well.

---

Comments on this thread seem to cover 2 main themes to me.
Simplification and the desire to use the same architecture as the
Overcloud (Pacemaker). And there is some competition between them.

For simplification: If we can eliminate keepalived and still use
HAProxy (thus keeping the SSL termination features working) then I
think that would be worth trying. Specifically, can we eliminate
Keepalived without swapping in Pacemaker? Michele: if you have ideas
here, let's try them!

With regards to Pacemaker I think we need to make an exception. It
seems way too heavy for single node setups and increases the complexity
there for very little benefit. To me the shared architecture for
TripleO is the tools we use to set up services. By using t-h-t to drive
our setup of the Undercloud and All-In-One installers we are already
gaining a lot of benefit here. Pacemaker is weird in that it kind of
augments the architecture a bit (the HA architecture). But Pacemaker is
also a service that gets configured by TripleO. So it kind of falls
into both categories. Pacemaker gives us features we need in the
Overcloud at the cost of some extra complexity. And in addition to all
this we are still running the Pacemaker processes themselves on
baremetal. All this just to say we are running the same "architecture"
on both the Undercloud and Overcloud? I'm not a fan.

Dan



> 
> > D) Undercloud HA is a nice have which I think we want to get to one
> > day, but it is not in as big demand as for example edge
> > deployments,
> > BM provisioning with pure OS, or multiple envs managed by single
> > undercloud. So even though undercloud HA is important, it won’t
> > bring
> > operators as many benefits as the previously mentioned
> > improvements.
> > Let’s keep it in mind when we are considering the amount of work
> > needed for it.
> 
> +100
> 
> > E) One of the use-cases we want to take into account is 

[openstack-dev] [publiccloud-wg] Meeting tomorrow for Public Cloud WG

2018-07-18 Thread Tobias Rydberg

Hi folks,

Time for a new meeting of the Public Cloud WG. A draft agenda can be 
found at https://etherpad.openstack.org/p/publiccloud-wg; feel free to 
add items to that list.


See you all tomorrow at IRC 1400 UTC in #openstack-publiccloud

Cheers,
Tobias

--
Tobias Rydberg
Senior Developer
Twitter & IRC: tobberydberg

www.citynetwork.eu | www.citycloud.com

INNOVATION THROUGH OPEN IT INFRASTRUCTURE
ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED




[openstack-dev] [nova]API update week 12-18

2018-07-18 Thread Ghanshyam Mann
Hi All, 

Please find the Nova API highlights of this week. 

Weekly Office Hour: 
=== 

What we discussed this week: 
- Discussion on priority BPs and the remaining reviews on those. 
- Picked up 3 in-progress bugs' patches and reviewed them. 

Planned Features : 
== 
Below are the API-related features for the Rocky cycle. The Nova API subteam will 
start reviewing those to give regular feedback. If anything is missing 
there, feel free to add it to the etherpad: 
https://etherpad.openstack.org/p/rocky-nova-priorities-tracking 

1. Servers Ips non-unique network names : 
- 
https://blueprints.launchpad.net/nova/+spec/servers-ips-non-unique-network-names
 
- Spec Merged 
- 
https://review.openstack.org/#/q/topic:bp/servers-ips-non-unique-network-names+(status:open+OR+status:merged)
 
- Weekly Progress: I sent mail to the author but no response yet. I will push 
the code update early next week. 

2. Abort live migration in queued state: 
- 
https://blueprints.launchpad.net/nova/+spec/abort-live-migration-in-queued-status
 
- 
https://review.openstack.org/#/q/topic:bp/abort-live-migration-in-queued-status+(status:open+OR+status:merged)
 
- Weekly Progress: The API patch is in the gate to merge. The novaclient patch 
is still needed to mark this complete (Kevin mentioned he is working on that). 
A rough API sketch for this one follows the feature list below. 

3. Complex anti-affinity policies: 
- https://blueprints.launchpad.net/nova/+spec/complex-anti-affinity-policies 
- 
https://review.openstack.org/#/q/topic:bp/complex-anti-affinity-policies+(status:open+OR+status:merged)
 
- Weekly Progress: The API patch is merged. The novaclient patch and 1 
follow-up patch are remaining. This one is also covered in the sketch after 
the list. 

4. Volume multiattach enhancements: 
- https://blueprints.launchpad.net/nova/+spec/volume-multiattach-enhancements 
- 
https://review.openstack.org/#/q/topic:bp/volume-multiattach-enhancements+(status:open+OR+status:merged)
 
- Weekly Progress: No progress. 

5. API Extensions merge work 
- https://blueprints.launchpad.net/nova/+spec/api-extensions-merge-rocky 
- 
https://review.openstack.org/#/q/project:openstack/nova+branch:master+topic:bp/api-extensions-merge-rocky
 
- Weekly Progress: I pushed patches for part 2 (server_create merge). I will 
push the last part (part 3) by early next week at the latest. 

6. Handling a down cell 
- https://blueprints.launchpad.net/nova/+spec/handling-down-cell 
- 
https://review.openstack.org/#/q/topic:bp/handling-down-cell+(status:open+OR+status:merged)
- Weekly Progress: Code is up and Matt has reviewed a few patches. The API 
subteam will target this BP next, as the other BP work is almost merged. 
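
For anyone who wants to poke at the queued live-migration abort (item 2) and
the new anti-affinity rules (item 3) from the API side, here is a rough sketch
using python-requests. The endpoint, token and the exact microversion values
(2.64 for server group rules, 2.65 for aborting a queued live migration) are
my assumptions here - please double check against the api-ref before relying
on it.

import requests

# Illustrative only: endpoint URL and token are placeholders, not taken
# from any real deployment.
NOVA = "http://controller:8774/v2.1"
HEADERS = {
    "X-Auth-Token": "REPLACE_WITH_TOKEN",
    "Content-Type": "application/json",
}

# Complex anti-affinity policies: with the newer microversion a server
# group takes a single "policy" plus a "rules" dict instead of the old
# "policies" list.
resp = requests.post(
    NOVA + "/os-server-groups",
    headers={**HEADERS, "OpenStack-API-Version": "compute 2.64"},
    json={"server_group": {
        "name": "demo-group",
        "policy": "anti-affinity",
        "rules": {"max_server_per_host": 2},
    }},
)
print(resp.status_code, resp.json())

# Abort live migration in queued state: same abort call as before, the
# newer microversion just allows it while the migration is still queued.
server_id = "SERVER_UUID"
migration_id = "MIGRATION_ID"
resp = requests.delete(
    NOVA + "/servers/" + server_id + "/migrations/" + migration_id,
    headers={**HEADERS, "OpenStack-API-Version": "compute 2.65"},
)
print(resp.status_code)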

Bugs: 
 
Reviewed the in-progress bugs' patches. 

This week Bug Progress: 
Critical: 0->0 
High importance: 3->3 
By Status: 
New: 0->0 
Confirmed/Triage: 31-> 29 
In-progress: 36->36 
Incomplete: 4->4 
= 
Total: 70->69

NOTE: There might be some bugs which are not tagged as 'api' or 'api-ref'; 
those are not in the above list. Please tag such bugs so that we can keep an 
eye on them. 

Ref: https://etherpad.openstack.org/p/nova-api-weekly-bug-report 

-gmann 





__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [cinder][nova] Proper behavior for os-force_detach

2018-07-18 Thread Gorka Eguileor
On 17/07, Sean McGinnis wrote:
> On Tue, Jul 17, 2018 at 04:06:29PM -0300, Erlon Cruz wrote:
> > Hi Cinder and Nova folks,
> >
> > Working on some tests for our drivers, I stumbled upon this tempest test
> > 'force_detach_volume'
> > that is calling Cinder API passing a 'None' connector. At the time this was
> > added several CIs
> > went down, and people started discussing whether this (accepting/sending a
> > None connector)
> > would be the proper behavior for what a driver is expected to do[1]. So,
> > some of the CIs started
> > just skipping that test[2][3][4] and others implemented fixes that made the
> > driver disconnect
> > the volume from all hosts if a None connector was received[5][6][7].
>
> Right, it was determined the correct behavior for this was to disconnect the
> volume from all hosts. The CIs that are skipping this test should stop doing 
> so
> (once their drivers are fixed of course).
>
> >
> > While implementing this fix seems to be straightforward, I feel that just
> > removing the volume
> > from all hosts is not the correct thing to do mainly considering that we
> > can have multi-attach.
> >
>
> I don't think multiattach makes a difference here. Someone is forcibly
> detaching the volume and not specifying an individual connection. So based on
> that, Cinder should be removing any connections, whether that is to one or
> several hosts.
>

Hi,

I agree with Sean, drivers should remove all connections for the volume.

Even without multiattach there are cases where you'll have multiple
connections for the same volume, like in a Live Migration.

It's also very useful when Nova and Cinder get out of sync and your
volume has leftover connections. In this case if you try to delete the
volume you get a "volume in use" error from some drivers.
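
To make that concrete, here is a minimal sketch of the kind of
terminate_connection handling we mean. It is not taken from any real driver;
the _attached_hosts() and _remove_export() helpers are placeholders for
whatever a given backend actually uses to list and tear down its exports.

class ExampleDriver(object):
    """Illustrative sketch only, not a real Cinder driver."""

    def terminate_connection(self, volume, connector, **kwargs):
        if connector is None:
            # Force-detach without a connector: tear down every
            # connection the backend still has for this volume
            # (live-migration leftovers, multiattach secondaries,
            # stale Nova/Cinder state, ...).
            for host in self._attached_hosts(volume):
                self._remove_export(volume, host)
            return

        # Normal detach: only drop the export for the host described
        # by the connector.
        self._remove_export(volume, connector['host'])

    def _attached_hosts(self, volume):
        raise NotImplementedError  # backend specific

    def _remove_export(self, volume, host):
        raise NotImplementedError  # backend specific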

Cheers,
Gorka.


> > So, my questions are: What is the best way to fix this problem? Should
> > Cinder API continue to
> > accept detachments with None connectors? If so, what would be the effects
> > on other Nova
> > attachments for the same volume? Is there any side effect if the volume is
> > not multi-attached?
> >
> > Additionally to this thread here, I should bring this topic to tomorrow's
> > Cinder's meeting,
> > so please join if you have something to share.
> >
>
> +1 - good plan.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [edge][glance]: Image handling in edge environment

2018-07-18 Thread Csatari, Gergely (Nokia - HU/Budapest)
Hi,

We had a great Forum session about image handling in edge environments in 
Vancouver [1]. As one outcome of the session I've created a wiki page with 
the mentioned architecture options [2]. During the Edge Working Group [3] 
discussions we identified some questions (some of them are in the wiki, some 
of them are in mails [4]), and I would also like to get some feedback on the 
analysis in the wiki from people who know Glance.

I think the best would be to have some kind of meeting and I see two options to 
organize this:

  *   Organize a dedicated meeting for this
  *   Add this topic as an agenda point to the Glance weekly meeting

Please share your preference and/or opinion.

Thanks,
Gerg0

[1]: https://etherpad.openstack.org/p/yvr-edge-cloud-images
[2]: https://wiki.openstack.org/wiki/Image_handling_in_edge_environment
[3]: https://wiki.openstack.org/wiki/Edge_Computing_Group
[4]: http://lists.openstack.org/pipermail/edge-computing/2018-June/000239.html

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] New "validation" subcommand for "openstack undercloud"

2018-07-18 Thread Cédric Jeanneret
Dear Stackers,

Seeing the answers on and off-list, we're moving forward!

So, here are the first steps:

A blueprint has been created:
https://blueprints.launchpad.net/tripleo/+spec/validation-framework

I've started a draft of the spec, based on the feedback and discussions
I have had so far:
https://review.openstack.org/#/c/583475/

Please feel free to comment on the spec and add your thoughts - this is
a really great opportunity to get a proper validation framework directly
in tripleoclient.
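
To make it a bit more concrete, here is a rough sketch of what the subcommand
could look like in python-tripleoclient. The class name and the exact
preflight entry point are purely illustrative assumptions; the real wiring
will be settled in the spec.

from osc_lib.command import command

from tripleoclient.v1 import undercloud_preflight


class ValidateUndercloud(command.Command):
    """Run the undercloud preflight validations, then stop."""

    def take_action(self, parsed_args):
        # Reuse the same checks that "install" and "upgrade" already
        # run, but exit after reporting instead of continuing with a
        # deployment. NOTE: the check() entry point is an assumption.
        undercloud_preflight.check()

Plugging it into the "openstack undercloud" tree would then mostly be a matter
of adding an entry point alongside the existing install/upgrade commands.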

Thank you for your feedback and attention.

Cheers,

C.


On 07/16/2018 05:27 PM, Cédric Jeanneret wrote:
> Dear Stackers,
> 
> In order to let operators properly validate their undercloud node, I
> propose to create a new subcommand in the "openstack undercloud" "tree":
> `openstack undercloud validate'
> 
> This should only run the different validations we have in the
> undercloud_preflight.py¹
> That way, an operator will be able to ensure all is valid before
> starting "for real" any other command like "install" or "upgrade".
> 
> Of course, this "validate" step is embedded in the "install" and
> "upgrade" already, but having the capability to just validate without
> any further action is something that can be interesting, for example:
> 
> - ensure the current undercloud hardware/vm is sufficient for an update
> - ensure the allocated VM for the undercloud is sufficient for a deploy
> - and so on
> 
> There are probably other possibilities, if we extend the "validation"
> scope outside the "undercloud" (like, tripleo, allinone, even overcloud).
> 
> What do you think? Any pros/cons/thoughts?
> 
> Cheers,
> 
> C.
> 
> 
> 
> ¹
> http://git.openstack.org/cgit/openstack/python-tripleoclient/tree/tripleoclient/v1/undercloud_preflight.py
> 

-- 
Cédric Jeanneret
Software Engineer
DFG:DF





[openstack-dev] [watcher] weekly meeting

2018-07-18 Thread Чадин Александр Сергеевич
Watcher team,

It's just a reminder that we will have a meeting today at 08:00 UTC
in the #openstack-meeting-alt channel.

Best Regards,

Alex

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev