Hi all,

Here’s a quick summary of the Congress activities in Austin.  Everyone
should feel free to chime in with corrections and things I missed.

1. Talks

Masahito gave a talk on applying Congress for fault recovery in the context
of NFV.

https://www.openstack.org/summit/austin-2016/summit-schedule/events/7199

Fabio gave a talk on applying Congress + Monasca to enforce
application-level SLAs.

https://www.openstack.org/summit/austin-2016/summit-schedule/events/7363

2. Integrations

We had discussions, both within the Congress Integrations fishbowl session,
and outside of that session on potential integrations with other OpenStack
projects.  Here's a quick overview.

- Monasca (fabiog). The proposed integration: Monasca pushes alarm data to
Congress via the push driver, so that Congress knows about the alarms
Monasca has configured.  Multiple alarms can be represented as rows of a
single table.  Eventually we talked about having Congress analyze the
policy to configure the alarms that Monasca uses, completing the loop.
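
As a rough illustration of the first half of that loop, a Congress rule
could match against rows that the push driver contributes to an alarms
table.  The table and column names below are assumptions for illustration
only, not the actual driver schema:

```
overloaded(host) :-
    monasca:alarms(name="high_cpu", hostname=host, state="ALARM")
```

Keeping all alarms in one table means a new alarm definition in Monasca
shows up as a new row, with no schema change needed on the Congress side.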

- Watcher (acabot). Watcher aims to optimize the placement of VMs by
pulling data from Ceilometer/Monasca and Nova (including
affinity/anti-affinity info), computing the migrations needed for
whichever strategy is configured, and migrating the VMs.  The Watcher team
wants to use Congress as a source of policies to take into account when
computing those migrations.

- Nova scheduler.  There’s interest in policy-enabling the Nova scheduler,
and then integrating that with Congress in the context of delegation, both
to give Congress the ability to pull in the scheduling policy and to push
the scheduling policy.

- Mistral.  The use case for this integration is helping people build an
HA solution for VMs: Congress monitors the VMs, identifies when they have
failed, and kicks off a Mistral workflow to resurrect them.
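
Sketched in Congress Datalog, the monitoring and triggering halves might
look something like the following.  The nova column names and the mistral
execution call are hypothetical, for illustration only; the actual driver
schemas and action names would need to be confirmed:

```
vm_down(vm_id) :-
    nova:servers(id=vm_id, status="ERROR")

execute[mistral:executions.create("resurrect_vm", vm_id)] :-
    vm_down(vm_id)
```

The first rule derives a table of failed VMs; the second uses Congress's
action-execution mechanism to launch a workflow for each such VM.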

- Vitrage.  Vitrage does root-cause analysis.  It provides a graph-based
model of the structure of the datacenter (switches attached to
hypervisors, servers attached to hypervisors, etc.) and a templating
language for defining how to create new alarms from existing alarms.  The
action item we left with is that the Vitrage team will initiate a mailing
list thread where we discuss which Vitrage data might be valuable for
Congress policies.

3. Working sessions

- The new distributed architecture is nearing completion.  There seems to
be one blocker for having the basic functionality ready to test: at boot,
Congress doesn’t properly spin up datasources that have already been
configured in the database.  As an experiment to see how close we were to
completion, we started up the Congress server with just the API and policy
engine and saw the basics actually working!  When we added the datasources,
we found a bug where the API assumed datasources could be referenced by
UUID, when in fact they can only be referenced by name on the message bus.
So while there’s still quite a bit to do, we’re getting close to having
all the basics working.

- We made progress on the high-availability and high-throughput design.
This is still very much open to design and discussion, so continuing the
design on the mailing list would be great.  Here are the highlights.

   o  Policy engine: split into (i) active-active for queries to deal with
high-throughput (ii) active-passive for action-execution (requiring
leader-election, etc.).  Policy CRUD modifies DB; undecided whether API
also informs all policy-engines, or whether they all sync from the DB.

   o  Pull datasources: no obvious need for replication, since they restart
really fast and will just re-pull the latest data anyhow

   o  Push datasources: need HA to ensure the pusher can always push, e.g.
the pusher drops the message onto oslo-messaging.  Still up for debate is
whether we also need HA for storing the data, since there is currently no
way to ask for it again after a restart; one suggestion is to require that
every datasource allow us to ask for its full state.  HT does not call for
replication, since keeping the state in sync across several instances
would cost more than running a single instance.

   o  API (didn’t really discuss this, so here’s my take).  No obvious need
for replication for HT, since if the API is a bottleneck, the backend will
be an even bigger bottleneck.  For HA, could do active-active since the API
is just a front-end to the message bus + database, though we would need to
look into locking now that there is no GIL.

It was great seeing everyone in Austin!
Tim
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev