Re: [openstack-dev] [TaskFlow] TaskFlow persistence: Job failure retry

2016-06-05 Thread Joshua Harlow
Cool. Well, feel free to find the taskflow folks (and others) in either
#openstack-oslo or #openstack-state-management if you have any questions.


-Josh

pnkk wrote:

I am working on an NFV orchestrator based on MANO

Regards,
Kanthi

On Thu, Jun 2, 2016 at 3:00 AM, Joshua Harlow wrote:

Interesting way to combine taskflow + celery.

I didn't expect it to be used like this, but the more power to you!

Taskflow itself has some similar capabilities via
http://docs.openstack.org/developer/taskflow/workers.html#design but
anyway what u've done is pretty neat as well.

I am assuming this isn't an openstack project (due to usage of
celery), any details on what's being worked on (am curious here)?

pnkk wrote:

Thanks for the nice documentation.

To my knowledge celery is widely used for distributed task processing.
This fits our requirement perfectly: we want to return an immediate
response to the user from our API server and run the long-running task in
the background. Celery also gives flexibility with the worker types
(process (which can overcome GIL problems too)/eventlet...) and it also
provides nice message brokers (rabbitmq, redis...).

We used both celery and taskflow for our core processing to leverage the
benefits of both. Taskflow provides nice primitives (execute, revert,
pre/post stuff) which take the load off the application.

As far as the actual issue is concerned, I found one way to solve it by
using the celery "retry" option. This, along with late_acks, makes the
application highly fault tolerant.

http://docs.celeryproject.org/en/latest/faq.html#faq-acks-late-vs-retry
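
A minimal sketch of what this combination could look like (the task, flow,
broker URL, and retry settings below are illustrative assumptions, not the
application's actual code): a Celery task with acks_late enabled that builds
and runs a TaskFlow flow and retries on I/O errors.

    from celery import Celery
    from taskflow import engines
    from taskflow.patterns import linear_flow
    from taskflow import task

    app = Celery('jobs', broker='amqp://guest@localhost//')

    class ProvisionTask(task.Task):
        def execute(self):
            # the real provisioning work would happen here
            return 'done'

    @app.task(bind=True, acks_late=True, max_retries=3, default_retry_delay=30)
    def run_job(self, job_id):
        flow = linear_flow.Flow('provision-%s' % job_id).add(ProvisionTask())
        try:
            engines.run(flow)
        except IOError as exc:
            # acks_late keeps the message on the queue until the task finishes,
            # and retry() re-executes the task after a transient failure.
            raise self.retry(exc=exc)

With this combination a crashed worker never acks the message, so rabbitmq
redelivers it to another worker, and transient I/O errors are retried rather
than leaving the job marked as failed.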

Regards,
Kanthi


On Sat, May 28, 2016 at 1:51 AM, Joshua Harlow wrote:

 Seems like u could just use
 http://docs.openstack.org/developer/taskflow/jobs.html (it appears
 that you may not be?); the job itself would, when failed, then be
 worked on by a different job consumer.
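
 For reference, a rough sketch of the jobboard pattern described above
 (the ZooKeeper configuration, board name, and job name are assumptions
 for illustration; see the jobs.html link for the real details):

    from taskflow.jobs import backends as job_backends
    from taskflow.persistence import backends as persistence_backends

    persistence = persistence_backends.fetch({
        'connection': 'zookeeper',
        'hosts': '127.0.0.1:2181',
        'path': '/taskflow',
    })
    board = job_backends.fetch('my-board',
                               {'board': 'zookeeper', 'hosts': '127.0.0.1:2181'},
                               persistence=persistence)
    board.connect()

    # Producer side: post a job (normally with a logbook describing the flow).
    board.post('provision-job-1')

    # Consumer side: claim, do the work, then consume.  If the claiming
    # consumer dies before consuming, its claim is released and another
    # consumer can pick the job up, which is the failover described above.
    for job in board.iterjobs(only_unclaimed=True):
        board.claim(job, 'worker-1')
        try:
            # ... load and run the flow(s) referenced by the job ...
            board.consume(job, 'worker-1')
        except Exception:
            board.abandon(job, 'worker-1')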

 Have u looked at those? It almost appears that u are using celery as
 a job distribution system (similar to the jobs.html link mentioned
 above)? Is that somewhat correct (I haven't seen anyone try this,
 wondering how u are using it and the choices that directed u to
 that, aka, am curious)?

 -Josh

 pnkk wrote:

 To be specific, we hit this issue when the node running our
 service is rebooted.
 Our solution is designed in a way that each and every job is a
 celery task, and inside the celery task we create a taskflow flow.

 We enabled late_acks in celery (which uses rabbitmq as the message
 broker), so if our service/node goes down, another healthy service
 can pick up the job and complete it.
 This works fine, but we just hit this rare case where the node was
 rebooted just when taskflow was updating something in the database.

 In this case, it raises an exception and the job is marked failed.
 Since it is complete (with failure), the message is removed from
 rabbitmq and another worker is not able to process it.
 Can taskflow handle such I/O errors gracefully or should the
 application try to catch this exception? If the application has to
 handle it, what would happen to that particular database transaction
 which failed just when the node was rebooted? Who will retry this
 transaction?

 Thanks,
 Kanthi

 On Fri, May 27, 2016 at 5:39 PM, pnkk wrote:

Re: [openstack-dev] [ovs-discuss] [OVN] [networking-ovn] [networking-sfc] SFC and OVN

2016-06-05 Thread John McDowall
Juno and team,

I have written and compiled (but not tested) the ovs/ovn interface to 
networking-ovn, and similarly I have written but not tested the IDL interfaces 
on the networking-ovn side. I will put it all together tomorrow and start 
debugging end to end. I know I am going to find a lot of issues as it is a 
major rewrite from my original interface to networking-sfc; it is the right 
path (IMHO), just a little more work than I expected.

I have merged my repos with the upstream masters and I will keep them synced, 
so if you want to take a look and start thinking about where you can help, it 
would be really appreciated.

Regards

John

From: Na Zhu
Date: Saturday, June 4, 2016 at 6:30 AM
To: John McDowall
Cc: "disc...@openvswitch.org", OpenStack Development Mailing List, Ryan Moats, Srilatha Tangirala
Subject: Re: [ovs-discuss] [OVN] [networking-ovn] [networking-sfc] SFC and OVN

Hi John,

OK, please keep me posted once you are done. Thanks very much.




Regards,
Juno Zhu
IBM China Development Labs (CDL) Cloud IaaS Lab
Email: na...@cn.ibm.com
5F, Building 10, 399 Keyuan Road, Zhangjiang Hi-Tech Park, Pudong New District, 
Shanghai, China (201203)



From: John McDowall
To: Na Zhu/China/IBM@IBMCN
Cc: "disc...@openvswitch.org", "OpenStack Development Mailing List", Ryan Moats, Srilatha Tangirala
Date: 2016/06/03 13:15
Subject: Re: [ovs-discuss] [OVN] [networking-ovn] [networking-sfc] SFC and OVN




Juno

Whatever gets it done faster. Let me get the three repos aligned. I need to get 
the ovs/ovn work done so networking-ovn can call it, and networking-sfc can 
call networking-ovn.

Hopefully I will have it done tomorrow or over the weekend - let's touch base 
Monday or Sunday night.

Regards

John

Sent from my iPhone

On Jun 2, 2016, at 6:30 PM, Na Zhu wrote:

Hi John,

I agree with submitting WIP patches to the community. Because you have already 
done a lot of work on networking-sfc and networking-ovn, it is better that you 
submit the initial patches for networking-sfc and networking-ovn, and then 
Srilatha and I take over the patches. Do you have time to do it? If not, 
Srilatha and I can help do it and you will always be the co-author.




Regards,
Juno Zhu
IBM China Development Labs (CDL) Cloud IaaS Lab
Email: na...@cn.ibm.com
5F, Building 10, 399 Keyuan Road, Zhangjiang Hi-Tech Park, Pudong New District, 
Shanghai, China (201203)



From: John McDowall
To: Na Zhu/China/IBM@IBMCN
Cc: "disc...@openvswitch.org", "OpenStack Development Mailing List", Ryan Moats, Srilatha Tangirala
Date: 2016/06/03 00:08
Subject: Re: [ovs-discuss] [OVN] [networking-ovn] [networking-sfc] SFC and OVN




Juno,

Sure, makes sense. I will have ovs/ovn in rough shape by the end of the week 
(hopefully), which will allow you to call the interfaces from networking-ovn. 
Ryan has asked that we submit WIP patches etc., so hopefully that will 
kickstart the review process.
Also, hopefully some of the networking-sfc team will be able to help; I will 
let them speak for themselves.

Regards

John

From: Na Zhu
Date: Wednesday, June 1, 2016 at 7:02 PM
To: John McDowall
Cc: "disc...@openvswitch.org", OpenStack Development Mailing List, Ryan Moats, Srilatha Tangirala
Subject: Re: [ovs-discuss] [OVN] [networking-ovn] [networking-sfc] SFC and OVN

Hi John,

Thanks for your reply.

Seems you have covered 

Re: [openstack-dev] [ironic] Trello board

2016-06-05 Thread Michael Davies
On Sat, Jun 4, 2016 at 1:09 AM, Jim Rollenhagen 
wrote:

> Some other cores and I have had trouble tracking our priorities
> using Launchpad and friends, so we put together a Trello board to help
> us track them. This should also help us focus on what to review or work
> on.
>
> https://trello.com/b/ROTxmGIc/ironic-newton-priorities



Thanks Jim for sharing that link.  Anything to help keep things organised :)
-- 
Michael Davies   mich...@the-davies.net
Rackspace Australia
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tricircle] reviewed by multiple eyes

2016-06-05 Thread joehuang
Hello,

There are several articles that will be helpful to both reviewers and code 
contributors: [1], [2], [3]

[1]:  http://docs.openstack.org/infra/manual/developers.html#code-review
[2]:  http://docs.openstack.org/infra/manual/developers.html#peer-review
[3]:  https://krotscheck.net/2015/07/13/code-review-in-openstack.html

I think this point is quite important for reviewers: be explicit. If you ask for 
a change, specify where that change needs to be made, how it needs to be made, 
and why it needs to be made. Provide a code example if possible. 

Best Regards
Chaoyi Huang ( Joe Huang )


-Original Message-
From: Shinobu Kinjo [mailto:shinobu...@gmail.com] 
Sent: Friday, June 03, 2016 7:28 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: [openstack-dev] [tricircle] reviewed by multiple eyes

Hi Team,

There are some patch sets that have been reviewed only by myself.
From my point of view, any patch set needs to be reviewed by multiple eyes.

This is because no one is perfect, and there may be something missing.
Please take a look if you get a notification to review.

Cheers,
Shinobu

-- 
Email:
shin...@linux.com
shin...@redhat.com

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Openstack] [searchlight] Routing for parent/child plugins

2016-06-05 Thread McLellan, Steven
Hi,

The switch to Elasticsearch 2.x for the gate functional tests revealed a couple 
of issues that need resolving fairly quickly [1] [2] [3]. Of them, [2] raises a 
problem regarding routing for documents belonging to child plugins.

Once we define a parent/child mapping, index/update/delete operations on child 
documents need to be accompanied by a routing hint since parent & child 
documents must be located on the same shard. Elasticsearch 1.x helpfully 
broadcasts deletes to all shards if no routing is given; Elasticsearch 2.x does 
not. We currently route on the parent id, which is fine in most cases but 
presents a problem for deletes. A neutron port delete notification contains 
only the port_id, not the network_id to which it belongs. To get the network_id 
we'd need to first query Elasticsearch, and again, without the network_id we 
can't issue a GET for the port but instead would need to run a search. This 
then raises some consistency problems.

My suggestion is that we route by tenant id instead unless a plugin 
specifically overrides it. This has potentially some advantages in that some 
searches will only run against information belonging to one tenant (although 
unfortunately glance and neutron both have the concept of public and shared 
resources), but more importantly we always have the project id for every 
notification. The downside is the potential for hot spotting; one big user may 
create load on one shard. A compromise would be to route by tenant id but only 
for child/parent plugins that are affected by this (currently, neutron). A 
further compromise would be a configuration level setting to route all 
resources by project id.

I intend to implement the second option (routing by project id where necessary) 
for now, but any thoughts welcome.
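
For illustration, a rough sketch with the elasticsearch Python client of what
routing child (port) documents by project id could look like; the index name,
document type, and field values below are placeholders rather than
Searchlight's actual mappings:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(['http://localhost:9200'])

    port_doc = {'id': 'port-1', 'name': 'my-port', 'tenant_id': 'project-a'}

    # Index the child document: 'parent' ties it to its network document,
    # while 'routing' pins it to the shard derived from the project id.
    es.index(index='searchlight', doc_type='OS::Neutron::Port',
             id=port_doc['id'], body=port_doc,
             parent='network-1', routing=port_doc['tenant_id'])

    # A port delete notification carries only the port id and project id,
    # which is still enough to compute the same routing value under ES 2.x.
    es.delete(index='searchlight', doc_type='OS::Neutron::Port',
              id='port-1', routing='project-a')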

Steve

[1] https://bugs.launchpad.net/searchlight/+bug/1588933 - patch at 
https://review.openstack.org/325487 to fix it and disable tests affected by [2] 
and [3]
[2] https://bugs.launchpad.net/searchlight/+bug/1588540 - routing missing for 
some calls on child plugins, particularly delete
[3] https://bugs.launchpad.net/searchlight/+bug/1589319 - some reindex 
functional tests fail

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [stable][all] Tagging kilo-eol for "the world"

2016-06-05 Thread Tony Breeds
On Fri, Jun 03, 2016 at 03:05:36PM +0200, Alan Pevec wrote:
> > openstack/packstack  BigTent
> 
> Just to clarify, Packstack has not formally applied to BigTent yet, it
> has only been automatically migrated from stackforge to openstack
> namespace.

Okay thanks.  'BigTent' is probably the wrong term there; for the purposes of
this it just means not associated with a project from
governance:reference/projects.yaml

> But yes, please keep its kilo branch for now until we properly wrap it up.

Cool.

Yours Tony.


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [kolla] Cinder Problem w/ AIO install

2016-06-05 Thread Arash Kaffamanesh
Hi,

I'm trying to install OpenStack via kolla on a single bare metal machine
following this page:

http://docs.openstack.org/developer/kolla/quickstart.html

After the install I can log into horizon, but obviously since cinder is
missing, I can't launch any instance (block device is missing).

So I tried to enable cinder and set the following in globals.yml
(I created a loopback device following this guide:
http://docs.openstack.org/developer/kolla/cinder-guide.html):

enable_cinder: "yes"

cinder_iscsi_ip_address: "x.x.x.x"

cinder_volume_group: "cinder-volumes"

After running kolla-ansible deploy, I'm getting the exception below
from the task copying over cinder.conf.
Any ideas?

Thanks for any hints in advance!
-Arash


###

TASK [cinder : Copying over cinder.conf]
***

An exception occurred during task execution. To see the full traceback, use
-vvv. The error was: [line 57]: '[]\n'

fatal: [localhost]: FAILED! => {"failed": true, "stdout": ""}


PLAY RECAP
*

localhost  : ok=184  changed=1unreachable=0failed=1



Command failed ansible-playbook -i
/usr/share/kolla/ansible/inventory/all-in-one -e @/etc/kolla/globals.yml -e
@/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla  -e action=deploy
/usr/share/kolla/ansible/site.yml

PLAY RECAP
*

localhost  : ok=184  changed=1unreachable=0failed=1


###
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TaskFlow] TaskFlow persistence: Job failure retry

2016-06-05 Thread pnkk
I am working on an NFV orchestrator based on MANO

Regards,
Kanthi

On Thu, Jun 2, 2016 at 3:00 AM, Joshua Harlow  wrote:

> Interesting way to combine taskflow + celery.
>
> I didn't expect it to be used like this, but the more power to you!
>
> Taskflow itself has some similar capabilities via
> http://docs.openstack.org/developer/taskflow/workers.html#design but
> anyway what u've done is pretty neat as well.
>
> I am assuming this isn't an openstack project (due to usage of celery),
> any details on what's being worked on (am curious here)?
>
> pnkk wrote:
>
>> Thanks for the nice documentation.
>>
>> To my knowledge celery is widely used for distributed task processing.
>> This fits our requirement perfectly: we want to return an immediate
>> response to the user from our API server and run the long-running task in
>> the background. Celery also gives flexibility with the worker types
>> (process (which can overcome GIL problems too)/eventlet...) and it also
>> provides nice message brokers (rabbitmq, redis...).
>>
>> We used both celery and taskflow for our core processing to leverage the
>> benefits of both. Taskflow provides nice primitives (execute, revert,
>> pre/post stuff) which take the load off the application.
>>
>> As far as the actual issue is concerned, I found one way to solve it by
>> using the celery "retry" option. This, along with late_acks, makes the
>> application highly fault tolerant.
>>
>> http://docs.celeryproject.org/en/latest/faq.html#faq-acks-late-vs-retry
>>
>> Regards,
>> Kanthi
>>
>>
>> On Sat, May 28, 2016 at 1:51 AM, Joshua Harlow wrote:
>>
>> Seems like u could just use
>> http://docs.openstack.org/developer/taskflow/jobs.html (it appears
>> that you may not be?); the job itself would, when failed, then be
>> worked on by a different job consumer.
>>
>> Have u looked at those? It almost appears that u are using celery as
>> a job distribution system (similar to the jobs.html link mentioned
>> above)? Is that somewhat correct (I haven't seen anyone try this,
>> wondering how u are using it and the choices that directed u to
>> that, aka, am curious)?
>>
>> -Josh
>>
>> pnkk wrote:
>>
>> To be specific, we hit this issue when the node running our
>> service is rebooted.
>> Our solution is designed in a way that each and every job is a
>> celery task, and inside the celery task we create a taskflow flow.
>>
>> We enabled late_acks in celery (which uses rabbitmq as the message
>> broker), so if our service/node goes down, another healthy service
>> can pick up the job and complete it.
>> This works fine, but we just hit this rare case where the node was
>> rebooted just when taskflow was updating something in the database.
>>
>> In this case, it raises an exception and the job is marked failed.
>> Since it is complete (with failure), the message is removed from
>> rabbitmq and another worker is not able to process it.
>> Can taskflow handle such I/O errors gracefully or should the
>> application try to catch this exception? If the application has to
>> handle it, what would happen to that particular database transaction
>> which failed just when the node was rebooted? Who will retry this
>> transaction?
>>
>> Thanks,
>> Kanthi
>>
>> On Fri, May 27, 2016 at 5:39 PM, pnkk wrote:
>>
>>  Hi,
>>
>>  When the taskflow engine was executing a job, the execution failed
>>  due to an IO error (traceback pasted below).
>>
>>  2016-05-25 19:45:21.717 7119 ERROR
>>  taskflow.engines.action_engine.engine 127.0.1.1 [-]  Engine
>>  execution has failed, something bad must of happened (last 10
>>  machine transitions were [('SCHEDULING', 'WAITING'),
>> ('WAITING',
>> 'ANALYZING'), ('ANALYZING', 'SCHEDULING'), ('SCHEDULING',
>> 'WAITING'), ('WAITING', 'ANALYZING'), ('ANALYZING', 'SCHEDULING'),
>>  ('SCHEDULING', 'WAITING'), ('WAITING', 'ANALYZING'),
>> ('ANALYZING',
>> 'GAME_OVER'), ('GAME_OVER', 'FAILURE')])
>>  2016-05-25 19:45:21.717 7119 TRACE
>>  taskflow.engines.action_engine.engine Traceback (most
>> recent call last):
>>  2016-05-25 19:45:21.717 7119 TRACE
>>  taskflow.engines.action_engine.engine   File
>>
>> "/opt/nso/nso-1.1223-default/nfvo-0.8.0.dev1438/.venv/local/lib/python2.7/site-packages/taskflow/engines/action_engine/engine.py",
>>  line 269, in run_iter
>>  2016-05-25 19:45:21.717 7119 TRACE
>>  

[openstack-dev] [Meghdwar] Weekly IRC meeting proposed on Wednesday 7AM-8AM PST (same as UTC 2.00PM-3.00PM)

2016-06-05 Thread prakash RAMCHANDRAN


Hi openstack-dev folks,

I am planning to arrange weekly IRC meetings on Wednesdays, 7-8 AM PST (same as 
UTC 2.00PM-3.00PM), on #openstack-meghdwar (use http://webchat.freenode.net/).

Agenda for the 1st meeting on June 8, 2016, 7AM-8AM PST (same as UTC 2.00PM-3.00PM):

1. Brief description of the proposed meghdwar project with background on Cloudlet, 
and should we enter OpenStack as a library-only effort?
Refer: https://launchpad.net/cloudlet & https://launchpad.net/meghdwar
Code: https://github.com/OpenEdgeComputing/elijah-openstack

2. Should we abandon the OpenStack meghdwar efforts until we have a github 
version in public ready for Cloudlet on Mitaka, and start afresh?
https://review.openstack.org/#/c/319466/

3. Will we need changes to the provisioning of Cloudlet for Ubuntu 16.04 LTS 
with Mitaka (what are our options)? Code review (any volunteers for code review?)
https://github.com/cmusatyalab/elijah-provisioning

4. Will we need changes to update Devstack for a Mitaka install before using 
Cloudlet (what are our options; discussion with a code walk)?
https://bitbucket.org/krha/elijah-openstack/overview

5. Please add yourself to the above, and if you think we should pull the code 
from the past, edit COMMENTS and give me a minus 1 to hold until we can compile 
reasonable working code to pull, deciding based on 1 and 2 on June 8th. So a 
minus 1 review is most welcome for the change.
https://review.openstack.org/#/c/319466/10/gerrit/projects.yaml

I hope there will be many inputs before the IRC meeting on #openstack-meghdwar 
this Wednesday, and there is another forum if you would like to ask questions: 
http://forum.openedgecomputing.org/c/FAQ/getting-started

Scoping, specs, and resources needed we can discuss during the IRC call or by 
email. This openstack-dev announcement for the Wednesday IRC meeting should 
jumpstart a long-term association with edge cloud efforts.

Thanks,
Prakash
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [octavia] enabling new topologies

2016-06-05 Thread Sergey Guenender

> Hi Sergey,  Welcome to working on Octavia!

Thanks, glad to be joining! :^)
Please read a further explanation of my proposal down below.

> I'm not sure I fully understand your proposals, but I can give my
> thoughts/opinion on the challenge for Active/Active.
>
> In general I agree with Stephen.
>
> The intention of using TaskFlow is to facilitate code reuse across
> similar but different code flows.
>
> For an Active/Active provisioning request I envision it as a new flow
> that is loaded as opposed to the current standalone and Active/Standby
> flow.  I would expect it would include many existing tasks (example,
> plug_network) that may be required for the requested action.  This new
> flow will likely include a number of concurrent sub-flows using these
> existing tasks.
>
> I do expect that the "distributor" will need to be a new "element".
> Because the various stakeholders are considering implementing this
> function in different ways, we agreed that an API and driver would be
> developed for interactions with the distributor.  This should also
> take into account that there may be some deployments where
> distributors are not shared.

I too expect a new model element to represent the distributor, including its 
own API.


The virtual distributor does seem to share some behavior with an amphora.

For instance, consider the "create load balancer" flow:
 * get_create_load_balancer_flow gets-or-creates a few nova instances, 
waits till they boot and marks them in the DB
 * get_new_LB_networking_subflow allocates and plugs the VIP on both the 
Neutron and amphorae sides; security group handling included
 * when needed, get_vrrp_subflow creates a VRRP group on the LB and 
configures/starts it on the amphorae
 * amphorae get connected to the members' networks
 * if listeners are defined on the LB
   * haproxy services get configured and started, including the "peers" 
configuration
   * VIP network connections get the Neutron security groups blessing
All parts of this flow seem to apply to the active-active topology too.

My intent is to try and reuse most of this rather involved flow by 
treating distributors as both a subset of "front-facing" amphorae and 
the "vrrp running" amphorae, while the original amphorae would be 
treated as both "back-facing" (for haproxy configuration, members' 
networks plugging, etc.) and "front-facing" (for VIP network 
plugging/processing).


If this leads to changing a lot of existing code or changing it 
non-trivially, I'll drop this idea, as my hope is to have less code 
review, not more.


> I still need to review the latest version of the Act/Act spec to
> understand where that was left after my first round of comments and
> our mid-cycle discussions.
>
> Michael

Thanks,
-Sergey.


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [octavia] enabling new topologies

2016-06-05 Thread Sergey Guenender


Hi Stephen, please find my reply next to your points below.

Thank you,
-Sergey.


On 01/06/2016 20:23, Stephen Balukoff wrote:
> Hey Sergey--
>
> Apologies for the delay in my response. I'm still wrapping my head
> around your option 2 suggestion and the implications it might have for
> the code base moving forward. I think, though, that I'm against your
> option 2 proposal and in favor of option 1 (which, yes, is more work
> initially) for the following reasons:
>
> A. We have a precedent in the code tree with how the stand-alone and
> active-standby topologies are currently being handled. Yes, this does
> entail various conditionals and branches in tasks and flows-- which is
> not really that ideal, as it means the controller worker needs to have
> more specific information on how topologies work than I think any of us
> would like, and this adds some rigidity to the implementation (meaning
> 3rd party vendors may have more trouble interfacing at that level)...
> but it's actually "not that bad" in many ways, especially given we don't
> anticipate supporting a large or variable number of topologies.
> (stand-alone, active-standby, active-active... and then what? We've been
> doing this for a number of years and nobody has mentioned any radically
> new topologies they would like in their load balancing. Things like
> auto-scale are just a specific case of active-active).

Just as you say, two topologies are being handled as of now by only one 
set of flows. Option two goes along the same lines: instead of adding 
new flows for active-active, it suggests that minor adjustments to 
existing flows can also satisfy active-active.


> B. If anything Option 2 builds more less-obvious rigidity into the
> implementation than option 1. For example, it makes the assumption that
> the distributor is necessarily an amphora or service VM, whereas we have
> already heard that some will implement the distributor as a pure network
> routing function that isn't going to be managed the same way other
> amphorae are.

This is a good point. Looking at the code, I see there are comments 
mentioning the intent to share an amphora between several load balancers. 
Although probably not straightforward to implement, it might be a good 
idea one day, but the fact is that an amphora has not been shared 
between load balancers for a few years.


Personally, when developing something complex, I believe in taking baby 
steps. If the virtual, non-shared distributor (which is promised by the 
AA blueprint anyway) is the smallest step towards a working 
active-active, then I guess it should be considered first.


Unless, of course, it precludes implementing the following, more complex 
topologies.


My belief is it doesn't have to. The proposed change alone (splitting 
amphorae into sub-clusters to be used by the many for-loops) doesn't 
force any special direction on its own. Any future topology may leave 
its "front-facing amphorae" set equal to its "back-facing amphorae" set, 
which brings it back to the current style of for-loop handling.
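
To make the idea concrete, here is a hypothetical sketch (the helper names,
role values, and attributes are invented for illustration and are not actual
Octavia code) of how such sub-clusters could be selected, with existing
topologies simply using the full amphora list for both subsets:

    # Hypothetical helpers, not actual Octavia code.
    ACTIVE_ACTIVE = 'ACTIVE_ACTIVE'

    def front_facing_amphorae(loadbalancer):
        """Instances that plug and serve the VIP network (distributors in AA)."""
        if loadbalancer.topology == ACTIVE_ACTIVE:
            return [a for a in loadbalancer.amphorae if a.role == 'DISTRIBUTOR']
        return loadbalancer.amphorae

    def back_facing_amphorae(loadbalancer):
        """Instances that run haproxy and plug the member networks."""
        if loadbalancer.topology == ACTIVE_ACTIVE:
            return [a for a in loadbalancer.amphorae if a.role != 'DISTRIBUTOR']
        return loadbalancer.amphorae

    # An existing per-amphora for-loop would then iterate over the relevant
    # subset; for standalone/active-standby both subsets are the full list,
    # which keeps today's behaviour unchanged.
    def plug_vip_on_front(loadbalancer, network_driver):
        for amp in front_facing_amphorae(loadbalancer):
            network_driver.plug_vip(loadbalancer, amp)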


> C. Option 2 seems like it's going to have a lot more permutations that
> would need testing to ensure that code changes don't break existing /
> potentially supported functionality. Option 1 keeps the distributor and
> amphorae management code separate, which means tests should be more
> straight-forward, and any breaking changes which slip through
> potentially break less stuff. Make sense?

It certainly does.

My intent is that the simplest active-active implementation promised by 
the blueprint can be achieved with only minor changes to existing code. 
If the required changes are not in fact small, or if this simplistic 
approach in some way impedes future work, we can drop this option.



> Stephen
>
>
> On Sun, May 29, 2016 at 7:12 AM, Sergey Guenender wrote:
>
> I'm working with the IBM team implementing the Active-Active N+1
> topology [1].
>
> I've been commissioned with the task to help integrate the code
> supporting the new topology while a) making as few code changes and
> b) reusing as much code as possible.
>
> To make sure the changes to existing code are future-proof, I'd like
> to implement them outside AA N+1, submit them on their own and let
> the AA N+1 base itself on top of it.
>
> --TL;DR--
>
> what follows is a description of the challenges I'm facing and the
> way I propose to solve them. Please skip down to the end of the
> email to see the actual questions.
>
> --The details--
>
> I've been studying the code for a few weeks now to see where the
> best places for minimal changes might be.
>
> Currently I see two options:
>
> 1. introduce a new kind of entity (the distributor) and make
> sure it's being handled on any of the 6 levels of controller worker
> code (endpoint, controller worker, *_flows, *_tasks, *_driver)
>
> 2. 

Re: [openstack-dev] Reply: [probably forged email] Re: [Kolla] About kolla-ansible reconfigure

2016-06-05 Thread Steven Dake (stdake)
Hu,

Thinking more about my proposed workarounds, I don't think they will work 
because services are registered with keystone using the kolla external fqdn and 
internal fqdn.  If you didn't specify those originally (and instead are using 
IP addresses, which the fqdns default to if not specified), kolla won't change 
endpoint registrations magically.

A solution for changing VIPs must involve some kind of keystone reregistration.

For each config value in globals.yml, this kind of individual analysis must be 
carried out and implemented throughout the playbooks.  As you can see by this 
single case, the implementation requires lots of experimentation on each 
variable to permit changes.  I'm not certain in what development cycle that 
work will begin or end.  It definitely won't be backported as it's a feature.

Regards
-steve


From: Steven Dake
Reply-To: "OpenStack Development Mailing List (not for usage questions)"
Date: Saturday, June 4, 2016 at 1:01 AM
To: "OpenStack Development Mailing List (not for usage questions)"
Subject: Re: [openstack-dev] Reply: [probably forged email] Re: [Kolla] About 
kolla-ansible reconfigure

Hu,

Comments inline.

From: "hu.zhiji...@zte.com.cn" 
>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
>
Date: Saturday, June 4, 2016 at 12:11 AM
To: "OpenStack Development Mailing List (not for usage questions)" 
>
Subject: [openstack-dev] 答复: [probably forge email可能是仿冒邮件]Re: [Kolla] About 
kolla-ansible reconfigure

Hi Steven,


Thanks for the information. Some further questions:

> Reconfigure was not designed to handle changes to globals.yml.  I think it's a 
> good goal that it should be able to do so, but it does not today.

So what is the preferred method to change kolla_internal_vip_address and make it 
effective?

I can't think of any known way to do this.  Internal VIPs are typically 
permanent allocations.  Reconfigure IIRC does not copy configuration files if 
there are no changes to the relevant /etc/kolla/config/* file (such as 
nova.conf).  Since you can't get new config files generated on your deployed 
targets, the containers won't pick them up.  If they did pick them up, they 
wouldn't actually restart because they are designed to only restart on a 
reconfigure operation that reconfigures something (i.e. there is new config 
content).

One option that comes to mind is to log in to every host in your cluster, 
sed-replace the original internal VIP address in every file in /etc/kolla/* 
with the new one, then docker stop every container on every node, then docker 
start every container on every node.  I know, not optimal, and it only works 
with COPY_ALWAYS.  This could be automated in an ansible playbook with 
relative ease.

Another way that may work (less confidence here) is to stop every container on 
every node and run kolla-ansible deploy.  There is no stop operation in 
kolla-ansible, but you can look at the cleanup code here to craft your own:

https://github.com/openstack/kolla/blob/master/ansible/cleanup.yml
https://github.com/openstack/kolla/blob/master/tools/kolla-ansible#L45
https://github.com/openstack/kolla/blob/master/ansible/roles/cleanup/tasks/main.yml#L1-L4

Make certain to leave out line 6, as that removes named volumes (you would 
lose your persistent data).  You only need lines 1-4 (of main.yml).

Please file a bug.

Maybe someone else has a more elegant solution.


> Reconfigure was designed to handle changes to /etc/kolla/config/* (where 
> custom config for services lives).  Reconfigure in its current incarnation in 
> all our branches and master is really a misnomer; it should be 
> service-reconfigure, but that is wordy, and we plan to make globals.yml 
> reconfigurable if feasible, but probably not anytime soon.

There is no /etc/kolla/config/* located in my env before or after successful 
deployment. Is that right?

That is right.  To provide custom configuration for nova for example, you could 
add a /etc/kolla/config/nova.conf file and put:

[libvirt]
virt_type=qemu

(documented here: 
http://docs.openstack.org/developer/kolla/deployment-philosophy.html)

Run reconfigure and from that point forward all of your machines would use QEMU 
software emulation instead of KVM hardware virt.  The use case the reconfigure 
action was designed to handle was reconfiguring /etc/kolla/config files (e.g. 
merging custom config with defaults while overriding when that is called for).

Handling a reconfiguration of globals.yml and passwords.yml is much more 
complex.  I'd like to see us get