Re: [openstack-dev] [TripleO][heat] a small experiment with Ansible in TripleO

2014-08-11 Thread Clint Byrum
Excerpts from Steve Baker's message of 2014-08-10 15:33:26 -0700:
 On 02/08/14 04:07, Allison Randal wrote:
  A few of us have been independently experimenting with Ansible as a
  backend for TripleO, and have just decided to try experimenting
  together. I've chatted with Robert, and he says that TripleO was always
  intended to have pluggable backends (CM layer), and just never had
  anyone interested in working on them. (I see it now, even in the early
  docs and talks, I guess I just couldn't see the forest for the trees.)
  So, the work is in line with the overall goals of the TripleO project.
 
  We're starting with a tiny scope, focused only on updating a running
  TripleO deployment, so our first work is in:
 
  - Create an Ansible Dynamic Inventory plugin to extract metadata from Heat
  - Improve/extend the Ansible nova_compute Cloud Module (or create a new
  one), for Nova rebuild
  - Develop a minimal handoff from Heat to Ansible, particularly focused
  on the interactions between os-collect-config and Ansible
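For concreteness, an Ansible dynamic inventory plugin is just an executable that Ansible calls with --list or --host and that prints JSON. A minimal sketch of what a Heat-backed one might look like follows; the auth environment variables and the idea of a stack output named "hosts" are illustrative assumptions, not the actual plugin's design.

    #!/usr/bin/env python
    # Sketch only: assumes python-heatclient's v1 client and a stack
    # output named "hosts" mapping group names to lists of addresses.
    import argparse
    import json
    import os

    from heatclient.v1 import client as heat_client


    def get_inventory(stack_name):
        heat = heat_client.Client(
            endpoint=os.environ['HEAT_URL'],      # illustrative env vars
            token=os.environ['OS_AUTH_TOKEN'])
        stack = heat.stacks.get(stack_name)
        outputs = dict((o['output_key'], o['output_value'])
                       for o in stack.outputs)
        # e.g. {"controller": ["10.0.0.5"], "compute": ["10.0.0.6"]}
        groups = outputs.get('hosts', {})
        return dict((name, {'hosts': addrs})
                    for name, addrs in groups.items())


    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument('--list', action='store_true')
        parser.add_argument('--host')
        args = parser.parse_args()
        if args.list:
            print(json.dumps(get_inventory(os.environ['HEAT_STACK_NAME'])))
        elif args.host:
            # Per-host vars could come from Heat resource metadata; empty here.
            print(json.dumps({}))


    if __name__ == '__main__':
        main()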
 
  We're merging our work in this repo, until we figure out where it should
  live:
 
  https://github.com/allisonrandal/tripleo-ansible
 
  We've set ourselves one week as the first sanity-check to see whether
  this idea is going anywhere, and we may scrap it all at that point. But,
  it seems best to be totally transparent about the idea from the start,
  so no-one is surprised later.
 
 Having pluggable backends for configuration seems like a good idea, and
 Ansible is a great choice for the first alternative backend.
 

TripleO is intended to be loosely coupled across many of its components,
not just in-instance configuration.

 However what this repo seems to be doing at the moment is bypassing heat
 to do a stack update, and I can only assume there is an eventual goal to
 not use heat at all for stack orchestration too.


 Granted, until blueprint update-failure-recovery lands [1], doing a
 stack-update is about as much fun as Russian roulette. But this effort
 is tactical rather than strategic, especially given TripleO's mission
 statement.
 

We intend to stay modular. Ansible won't replace Heat from end to end.

Right now we're stuck with an update that just doesn't work. It isn't
just about update-failure-recovery, which is coming along nicely, but
it is also about the lack of signals to control rebuild, poor support
for addressing machines as groups, and unacceptable performance in
large stacks.

We remain committed to driving these capabilities into Heat, which will
allow us to address them the way a large-scale operation will need to.

But until we can land those things in Heat, we need something more
flexible like Ansible to go around Heat and do things in the exact
order we need them done. Ansible doesn't have a REST API, which is a
non-starter for modern automation, but the need to control workflow is
greater than the need to have a REST API at this point.

 If I were to use Ansible for TripleO configuration I would start with
 something like the following:
 * Install an ansible software-config hook onto the image to be triggered
 by os-refresh-config[2][3]
 * Incrementally replace StructuredConfig resources in
 tripleo-heat-templates with SoftwareConfig resources that include the
 ansible playbooks via get_file
 * The above can start in a fork of tripleo-heat-templates, but can
 eventually be structured using resource providers so that the deployer
 chooses what configuration backend to use by selecting the environment
 file that contains the appropriate config resources
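For reference, a software-config hook along those lines is just a small script that the deployment machinery invokes with a JSON blob and that reports results back. A rough sketch of the shape such an "ansible" hook might take, with the caveat that the stdin/stdout convention and key names shown here are illustrative assumptions rather than the actual heat-config contract:

    #!/usr/bin/env python
    # Rough sketch of an "ansible" software-config hook. The stdin/stdout
    # JSON convention and key names below are illustrative assumptions.
    import json
    import subprocess
    import sys
    import tempfile


    def main():
        config = json.load(sys.stdin)        # deployment pushed down by Heat
        playbook = config.get('config', '')  # playbook text included via get_file

        with tempfile.NamedTemporaryFile(suffix='.yml', delete=False) as f:
            f.write(playbook.encode('utf-8'))
            playbook_path = f.name

        proc = subprocess.Popen(
            ['ansible-playbook', '-i', 'localhost,', '-c', 'local',
             playbook_path],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = proc.communicate()

        # Report results so Heat can mark the deployment complete or failed.
        json.dump({'deploy_stdout': out,
                   'deploy_stderr': err,
                   'deploy_status_code': proc.returncode}, sys.stdout)


    if __name__ == '__main__':
        main()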
 
 Now you have a cloud orchestrated by heat and configured by Ansible. If
 it is still deemed necessary to do an out-of-band update to the stack
 then you're in a much better position to do an ansible push, since you
 can use the same playbook files that heat used to bring up the stack.
 

That would be a good plan if we wanted to fix issues with os-*-config,
but the reality is the opposite: we are working around Heat
orchestration issues with Ansible.



Re: [openstack-dev] [TripleO][heat] a small experiment with Ansible in TripleO

2014-08-11 Thread Clint Byrum
Excerpts from Zane Bitter's message of 2014-08-11 08:16:56 -0700:
 On 11/08/14 10:46, Clint Byrum wrote:
  Right now we're stuck with an update that just doesn't work. It isn't
  just about update-failure-recovery, which is coming along nicely, but
  it is also about the lack of signals to control rebuild, poor support
  for addressing machines as groups, and unacceptable performance in
  large stacks.
 
 Are there blueprints/bugs filed for all of these issues?
 

Convergence addresses the poor performance for large stacks in general.
We also have this:

https://bugs.launchpad.net/heat/+bug/1306743

Which shows how slow metadata access can get. I have worked on patches
but haven't been able to complete them. We made big strides but we are
at a point where 40 nodes polling Heat every 30s is too much for one CPU
to handle. When we scaled Heat out onto more CPUs on one box by forking
we ran into eventlet issues. We also ran into issues because even with
many processes we can only use one to resolve templates for a single
stack during update, which was also excessively slow.

We haven't been able to come back around to those yet, but you can see
where this has turned into a bit of a rat hole of optimization.

action-aware-sw-config is sort of what we want for rebuild. We
collaborated with the trove devs on how to also address it for resize
a while back but I have lost track of that work as it has taken a back
seat to more pressing issues.

Addressing groups is a general problem that I've had a hard time
articulating in the past. Tomas Sedovic has done a good job with this
TripleO spec, but I don't know that we've asked for an explicit change
in a bug or spec in Heat just yet:

https://review.openstack.org/#/c/97939/

There are a number of other issues noted in that spec which are already
addressed in Heat, but require refactoring in TripleO's templates and
tools, and that work continues.

The point remains: we need something that works now, and doing an
alternate implementation for updates is actually faster than addressing
all of these issues.



Re: [openstack-dev] [TripleO][heat] a small experiment with Ansible in TripleO

2014-08-11 Thread Clint Byrum
Excerpts from Steven Hardy's message of 2014-08-11 11:40:07 -0700:
 On Mon, Aug 11, 2014 at 11:20:50AM -0700, Clint Byrum wrote:
  Excerpts from Zane Bitter's message of 2014-08-11 08:16:56 -0700:
   On 11/08/14 10:46, Clint Byrum wrote:
Right now we're stuck with an update that just doesn't work. It isn't
just about update-failure-recovery, which is coming along nicely, but
it is also about the lack of signals to control rebuild, poor support
for addressing machines as groups, and unacceptable performance in
large stacks.
   
   Are there blueprints/bugs filed for all of these issues?
   
  
  Convergence addresses the poor performance for large stacks in general.
  We also have this:
  
  https://bugs.launchpad.net/heat/+bug/1306743
  
  Which shows how slow metadata access can get. I have worked on patches
  but haven't been able to complete them. We made big strides but we are
  at a point where 40 nodes polling Heat every 30s is too much for one CPU
  to handle. When we scaled Heat out onto more CPUs on one box by forking
  we ran into eventlet issues. We also ran into issues because even with
  many processes we can only use one to resolve templates for a single
  stack during update, which was also excessively slow.
 
 Related to this, and a discussion we had recently at the TripleO meetup is
 this spec I raised today:
 
 https://review.openstack.org/#/c/113296/
 
 It's following up on the idea that we could potentially address (or at
 least mitigate, pending the fully convergence-ified heat) some of these
 scalability concerns, if TripleO moves from the one-giant-template model
 to a more modular nested-stack/provider model (e.g. what Tomas has been
 working on)
 
 I've not got into enough detail on that yet to be sure if it's achievable
 for Juno, but it seems initially to be complex-but-doable.
 
 I'd welcome feedback on that idea and how it may fit in with the more
 granular convergence-engine model.
 
 Can you link to the eventlet/forking issues bug please?  I thought since
 bug #1321303 was fixed that multiple engines and multiple workers should
 work OK, and obviously that being true is a precondition to expending
 significant effort on the nested stack decoupling plan above.
 

That was the issue. So we fixed that bug, but we never un-reverted
the patch that forks enough engines to use up all the CPUs on a box
by default. That would likely help a lot with metadata access speed
(we could do it manually in TripleO, but we tend to push defaults. :)



Re: [openstack-dev] [TripleO][heat] a small experiment with Ansible in TripleO

2014-08-11 Thread Clint Byrum
Excerpts from Zane Bitter's message of 2014-08-11 13:35:44 -0700:
 On 11/08/14 14:49, Clint Byrum wrote:
  Excerpts from Steven Hardy's message of 2014-08-11 11:40:07 -0700:
  On Mon, Aug 11, 2014 at 11:20:50AM -0700, Clint Byrum wrote:
  Excerpts from Zane Bitter's message of 2014-08-11 08:16:56 -0700:
  On 11/08/14 10:46, Clint Byrum wrote:
  Right now we're stuck with an update that just doesn't work. It isn't
  just about update-failure-recovery, which is coming along nicely, but
  it is also about the lack of signals to control rebuild, poor support
  for addressing machines as groups, and unacceptable performance in
  large stacks.
 
  Are there blueprints/bugs filed for all of these issues?
 
 
  Convergence addresses the poor performance for large stacks in general.
  We also have this:
 
  https://bugs.launchpad.net/heat/+bug/1306743
 
  Which shows how slow metadata access can get. I have worked on patches
  but haven't been able to complete them. We made big strides but we are
  at a point where 40 nodes polling Heat every 30s is too much for one CPU
 
 This sounds like the same figure I heard at the design summit; did the 
 DB call optimisation work that Steve Baker did immediately after that 
 not have any effect?
 

Steve's work got us to 40. From 7.

  to handle. When we scaled Heat out onto more CPUs on one box by forking
  we ran into eventlet issues. We also ran into issues because even with
  many processes we can only use one to resolve templates for a single
  stack during update, which was also excessively slow.
 
  Related to this, and a discussion we had recently at the TripleO meetup is
  this spec I raised today:
 
  https://review.openstack.org/#/c/113296/
 
  It's following up on the idea that we could potentially address (or at
  least mitigate, pending the fully convergence-ified heat) some of these
  scalability concerns, if TripleO moves from the one-giant-template model
  to a more modular nested-stack/provider model (e.g. what Tomas has been
  working on)
 
  I've not got into enough detail on that yet to be sure if it's achievable
  for Juno, but it seems initially to be complex-but-doable.
 
  I'd welcome feedback on that idea and how it may fit in with the more
  granular convergence-engine model.
 
  Can you link to the eventlet/forking issues bug please?  I thought since
  bug #1321303 was fixed that multiple engines and multiple workers should
  work OK, and obviously that being true is a precondition to expending
  significant effort on the nested stack decoupling plan above.
 
 
  That was the issue. So we fixed that bug, but we never un-reverted
  the patch that forks enough engines to use up all the CPUs on a box
  by default. That would likely help a lot with metadata access speed
  (we could do it manually in TripleO, but we tend to push defaults. :)
 
 Right, and we decided we wouldn't because it's wrong to do that to 
 people by default. In some cases the optimal running configuration for 
 TripleO will differ from the friendliest out-of-the-box configuration 
 for Heat users in general, and in those cases - of which this is one - 
 TripleO will need to specify the configuration.
 

Whether or not the default should be to fork 1 process per CPU is a
debate for another time. The point is, we can safely use the forking in
Heat now to perhaps improve performance of metadata polling.

Chasing that, and other optimizations, has not led us to a place where
we can get to, say, 100 real nodes _today_. We're chasing another way to
get to the scale and capability we need _today_, in much the same way
we did with merge.py. We'll find the way to get it done more elegantly
as time permits.



Re: [openstack-dev] [all] The future of the integrated release

2014-08-13 Thread Clint Byrum
Excerpts from Thierry Carrez's message of 2014-08-13 02:54:58 -0700:
 Rochelle.RochelleGrober wrote:
  [...]
  So, with all that prologue, here is what I propose (and please consider 
  proposing your improvements/changes to it).  I would like to see for Kilo:
  
  - IRC meetings and mailing list meetings beginning with Juno release and 
  continuing through the summit that focus on core project needs (what 
  Thierry calls strategic) that as a set would be considered the primary 
  focus of the Kilo release for each project.  This could include high 
  priority bugs, refactoring projects, small improvement projects, high 
  interest extensions and new features, specs that didn't make it into Juno, 
  etc.
  - Develop the list and prioritize it into Needs and Wants. Consider 
  these the feeder projects for the two runways if you like.  
  - Discuss the lists.  Maybe have a community vote? The vote will freeze 
  the list, but as in most development project freezes, it can be a soft 
  freeze that the core, or drivers or TC can amend (or throw out for that 
  matter).
  [...]
 
 One thing we've been unable to do so far is to set release goals at
 the beginning of a release cycle and stick to those. It used to be
 because we were so fast moving that new awesome stuff was proposed
  mid-cycle and ended up being a key feature (sometimes THE key feature)
  for the project. Now it's because there is so much proposed that no one knows
 what will actually get completed.
 
 So while I agree that what you propose is the ultimate solution (and the
 workflow I've pushed PTLs to follow every single OpenStack release so
 far), we have struggled to have the visibility, long-term thinking and
 discipline to stick to it in the past. If you look at the post-summit
 plans and compare to what we end up in a release, you'll see quite a lot
 of differences :)
 

I think that shows agility, and isn't actually a problem. 6 months
is quite a long time in the future for some business models. Strategic
improvements for the project should be able to stick to a 6 month
schedule, but companies will likely be tactical about where their
developer resources are directed for feature work.

The fact that those resources land code upstream is one of the greatest
strengths of OpenStack. Any potential impact on how that happens should
be carefully considered when making any changes to process and
governance.



Re: [openstack-dev] [TripleO] fix poor tarball support in source-repositories

2014-08-15 Thread Clint Byrum
Excerpts from Brownell, Jonathan C (Corvallis)'s message of 2014-08-15 08:11:18 -0700:
 The current DIB element support for downloading tarballs via 
 source-repository allows an entry in the following form:
 
 name tar targetdir url
 
 Today, this feature is used only by the mysql DIB element. You can 
 see how it's used here:
 https://github.com/openstack/tripleo-image-elements/blob/master/elements/mysql/source-repository-mysql
 
 However, the underlying diskimage-builder implementation of tarball handling 
 is rather odd and inflexible. After downloading the file (or retrieving from 
 cache) and unpacking into a tmp directory, it performs:
 
 mv $tmp/*/* $targetdir
 
 This does work as long as the tarball follows a structure where all its 
 files/directories are contained within a single directory, but it fails if 
 the tarball contains no subdirectories. (Even worse is when it contains some 
 files and some subdirectories, in which case the files are lost and the 
 contents of all subdirs get lumped together in the output folder.)
 
 Since this tarball support is only used today by the mysql DIB element, I 
 would love to fix this in both diskimage-builder and tripleo-image-element by 
 changing to simply:
 
 mv $tmp/* $targetdir
 
 And then manually tweaking the directory structure of $targetdir from a new 
 install.d script in the mysql element to restore the desired layout.
 
 However, it's important to note that this will break backwards compatibility 
 if tarball support is used in its current fashion by users with private DIB 
 elements.
 
 Personally, I consider the current behavior so egregious that it really needs 
 to be fixed across the board rather than preserving backwards compatibility.
 
 Do others agree? If not, do you have suggestions as to how to improve this 
 mechanism cleanly without sacrificing backwards compatibility?
 

How about we add a glob field to the entry, like this:

mysql tar /usr/local/mysql http://someplace/mysql.tar.gz mysql-5.*

That would result in

mv $tmp/mysql-5.*/* $targetdir

Then we would warn that defaulting the glob to '*' is deprecated and will
change in a later release.

Users who want your proposed behavior would use '.' until the default
changes. That would result in

mv $tmp/./* $targetdir
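To make the intended semantics concrete, here is the same logic sketched in Python (the real source-repositories element is shell, so this is only an illustration of the proposed behavior, not a patch):

    import glob
    import os
    import shutil
    import warnings


    def move_unpacked_tarball(tmp, targetdir, subdir_glob=None):
        # Sketch of the proposed semantics: an optional glob selects which
        # unpacked subdirectories to move. '*' keeps today's behavior,
        # '.' moves the tarball contents as-is.
        if subdir_glob is None:
            warnings.warn("no glob given; defaulting to '*' is deprecated "
                          "and the default will change in a later release")
            subdir_glob = '*'
        for entry in glob.glob(os.path.join(tmp, subdir_glob, '*')):
            shutil.move(entry, targetdir)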



Re: [openstack-dev] [TripleO] fix poor tarball support in source-repositories

2014-08-16 Thread Clint Byrum
Excerpts from Jyoti Ranjan's message of 2014-08-16 00:57:52 -0700:
 We will have to be a little bit cautious in using globs because of their
 inherent behavior. For example, files starting with '.' will not get
 matched.
 

That is a separate bug, but I think the answer to that is to use rsync
instead of mv and globs. So this:

mv $tmp/./* $destdir

becomes this:

rsync -a --remove-source-files $tmp/. $destdir



Re: [openstack-dev] Time to Samba! :-)

2014-08-16 Thread Clint Byrum
Excerpts from Martinx - ジェームズ's message of 2014-08-16 12:03:20 -0700:
 Hey Stackers,
 
  I'm wondering here... Samba4 is pretty solid (the upcoming 4.2 rocks). I'm
 using it on a daily basis as an AD DC, for both Windows and
 Linux instances! With replication, file system ACLs (CIFS), built-in LDAP,
 dynamic DNS with Bind9 as a backend (no NetBIOS), etc... Pretty cool!
 
  In OpenStack ecosystem, there are awesome solutions like Trove, Solum,
 Designate and etc... Amazing times BTW! So, why not try to integrate
 Samba4, working as an AD DC, within OpenStack itself?!
 

But, if we did that, what would be left for us to reinvent in our own
slightly different way?



Re: [openstack-dev] [all] The future of the integrated release

2014-08-17 Thread Clint Byrum
Here's why folk are questioning Ceilometer:

Nova is a set of tools to abstract virtualization implementations.
Neutron is a set of tools to abstract SDN/NFV implementations.
Cinder is a set of tools to abstract block-device implementations.
Trove is a set of tools to simplify consumption of existing databases.
Sahara is a set of tools to simplify Hadoop consumption.
Swift is a feature-complete implementation of object storage, none of
which existed when it was started.
Keystone supports all of the above, unifying their auth.
Horizon supports all of the above, unifying their GUI.

Ceilometer is a complete implementation of data collection and alerting.
There is no shortage of implementations that exist already.

I'm also core on two projects that are getting some push back these
days:

Heat is a complete implementation of orchestration. There are at least a
few of these already in existence, though not as many as there are data
collection and alerting systems.

TripleO is an attempt to deploy OpenStack using tools that OpenStack
provides. There are already quite a few other tools that _can_ deploy
OpenStack, so it stands to reason that people will question why we
don't just use those. It is my hope we'll push more into the 'unifying
the implementations' space and withdraw a bit from the 'implementing
stuff' space.

So, you see, people are happy to unify around a single abstraction, but
not so much around a brand new implementation of things that already
exist.

Excerpts from Nadya Privalova's message of 2014-08-17 11:11:34 -0700:
 Hello all,
 
 As a Ceilometer's core, I'd like to add my 0.02$.
 
 During previous discussions it was mentioned several projects which were
 started or continue to be developed after Ceilometer became integrated. The
 main question I'm thinking of is why it was impossible to contribute into
 existing integrated project? Is it because of Ceilometer's architecture,
 the team or there are some other (maybe political) reasons? I think it's a
 very sad situation when we have 3-4 Ceilometer-like projects from different
 companies instead of only one that satisfies everybody. (We don't see
 it in other projects. Though maybe there are several Novas or Neutrons on
 StackForge and I don't know about it...)
 Of course, sometimes it's much easier to start the project from scratch.
 But there should be strong reasons for doing this if we are talking about
 integrated project.
 IMHO the idea, the role is the most important thing when we are talking
 about integrated project. And if Ceilometer's role is really needed (and I
 think it is) then we should improve existing implementation, merge all
 needs into the one project and the result will be still Ceilometer.
 
 Thanks,
 Nadya
 
 On Fri, Aug 15, 2014 at 12:41 AM, Joe Gordon joe.gord...@gmail.com wrote:
 
 
 
 
  On Wed, Aug 13, 2014 at 12:24 PM, Doug Hellmann d...@doughellmann.com
  wrote:
 
 
  On Aug 13, 2014, at 3:05 PM, Eoghan Glynn egl...@redhat.com wrote:
 
  
   At the end of the day, that's probably going to mean saying No to more
   things. Everytime I turn around everyone wants the TC to say No to
   things, just not to their particular thing. :) Which is human nature.
   But I think if we don't start saying No to more things we're going to
   end up with a pile of mud that no one is happy with.
  
   That we're being so abstract about all of this is frustrating. I get
   that no-one wants to start a flamewar, but can someone be concrete
  about
   what they feel we should say 'no' to but are likely to say 'yes' to?
  
  
   I'll bite, but please note this is a strawman.
  
   No:
   * Accepting any more projects into incubation until we are comfortable
  with
   the state of things again
   * Marconi
   * Ceilometer
  
   Well -1 to that, obviously, from me.
  
   Ceilometer is on track to fully execute on the gap analysis coverage
   plan agreed with the TC at the outset of this cycle, and has an active
   plan in progress to address architectural debt.
 
  Yes, there seems to be an attitude among several people in the community
  that the Ceilometer team denies that there are issues and refuses to work
  on them. Neither of those things is the case from our perspective.
 
 
  Totally agree.
 
 
 
  Can you be more specific about the shortcomings you see in the project
  that aren’t being addressed?
 
 
 
  Once again, this is just a strawman.
 
  I'm just not sure OpenStack has 'blessed' the best solution out there.
 
 
  https://wiki.openstack.org/wiki/Ceilometer/Graduation#Why_we_think_we.27re_ready
 
  
 
 - Successfully passed the challenge of being adopted by 3 related
 projects which have agreed to join or use ceilometer:
- Synaps
- Healthnmon
- StackTach

  https://wiki.openstack.org/w/index.php?title=StackTach&action=edit&redlink=1

 
 
  Stacktach seems to still be under active development (
  http://git.openstack.org/cgit/stackforge/stacktach/log/), is used by
  

Re: [openstack-dev] [all] The future of the integrated release

2014-08-20 Thread Clint Byrum
Excerpts from Robert Collins's message of 2014-08-18 23:41:20 -0700:
 On 18 August 2014 09:32, Clint Byrum cl...@fewbar.com wrote:
 
 I can see your perspective but I don't think it's internally consistent...
 
  Here's why folk are questioning Ceilometer:
 
  Nova is a set of tools to abstract virtualization implementations.
 
 With a big chunk of local things - local image storage (now in
 glance), scheduling, rebalancing, ACLs and quotas. Other
 implementations that abstract over VMs at various layers already
 existed when Nova started - some bad (some very bad!) and others
 actually quite ok.
 

The fact that we have local implementations of domain specific things is
irrelevant to the difference I'm trying to point out. Glance needs to
work with the same authentication semantics and share a common access
catalog to work well with Nova. It's unlikely there's a generic image
catalog that would ever fit this bill. In many ways glance is just an
abstraction of file storage backends and a database to track a certain
domain of files (images, and soon, templates and other such things).

The point of mentioning Nova is that we didn't write libvirt or Xen; we
wrote an abstraction so that users could consume them via a REST API
that shares useful automated backends like glance.

  Neutron is a set of tools to abstract SDN/NFV implementations.
 
 And implements a DHCP service, DNS service, overlay networking: it's
 much more than an abstraction-over-other-implementations.
 

Native DHCP and overlay? Last I checked Neutron used dnsmasq and
openvswitch, but it has been a few months, and I know that is an eon in
OpenStack time.

  Cinder is a set of tools to abstract block-device implementations.
  Trove is a set of tools to simplify consumption of existing databases.
  Sahara is a set of tools to simplify Hadoop consumption.
  Swift is a feature-complete implementation of object storage, none of
  which existed when it was started.
 
 Swift was started in 2009; Eucalyptus goes back to 2007, with Walrus
 part of that - I haven't checked precise dates, but I'm pretty sure
 that it existed and was usable by the start of 2009. There may well be
 other object storage implementations too - I simply haven't checked.
 

Indeed, and MogileFS was sort of like Swift but not HTTP based. Perhaps
Walrus was evaluated and inadequate for the CloudFiles product
requirements? I don't know. But there weren't de-facto object stores
at the time because object stores were just becoming popular.

  Keystone supports all of the above, unifying their auth.
 
 And implementing an IdP (which I know they want to stop doing ;)). And
 in fact lots of OpenStack projects, for various reasons, support *not*
 using Keystone (something that bugs me, but that's a different
 discussion).
 

My point was that it is justified to have a whole implementation and not
just an abstraction because it is meant to enable the ecosystem, not _be_
the ecosystem. I actually think Keystone is problematic too, and I often
wonder why we haven't just done OAuth, but I'm not trying to throw every
project under the bus. I'm trying to state that we accept Keystone because
it has grown organically to support the needs of all the other pieces.

  Horizon supports all of the above, unifying their GUI.
 
  Ceilometer is a complete implementation of data collection and alerting.
  There is no shortage of implementations that exist already.
 
  I'm also core on two projects that are getting some push back these
  days:
 
  Heat is a complete implementation of orchestration. There are at least a
  few of these already in existence, though not as many as there are data
  collection and alerting systems.
 
  TripleO is an attempt to deploy OpenStack using tools that OpenStack
  provides. There are already quite a few other tools that _can_ deploy
  OpenStack, so it stands to reason that people will question why we
  don't just use those. It is my hope we'll push more into the unifying
  the implementations space and withdraw a bit from the implementing
  stuff space.
 
  So, you see, people are happy to unify around a single abstraction, but
  not so much around a brand new implementation of things that already
  exist.
 
 If the other examples we had were a lot purer, this explanation would
 make sense. I think there's more to it than that though :).
 

If purity is required to show a difference, then I don't think I know
how to demonstrate what I think is obvious to most of us: Ceilometer
is an end-to-end implementation of things that exist in many
battle-tested implementations. I struggle to think of another component of
OpenStack that has this distinction.

 What exactly, I don't know, but its just too easy an answer, and one
 that doesn't stand up to non-trivial examination :(.
 
 I'd like to see more unification of implementations in TripleO - but I
 still believe our basic principle of using OpenStack technologies that
 already exist in preference to third party ones is still

Re: [openstack-dev] [all] The future of the integrated release

2014-08-20 Thread Clint Byrum
Excerpts from Jay Pipes's message of 2014-08-20 14:53:22 -0700:
 On 08/20/2014 05:06 PM, Chris Friesen wrote:
  On 08/20/2014 07:21 AM, Jay Pipes wrote:
  Hi Thierry, thanks for the reply. Comments inline. :)
 
  On 08/20/2014 06:32 AM, Thierry Carrez wrote:
  If we want to follow your model, we probably would have to dissolve
  programs as they stand right now, and have blessed categories on one
  side, and teams on the other (with projects from some teams being
  blessed as the current solution).
 
  Why do we have to have blessed categories at all? I'd like to think of
  a day when the TC isn't picking winners or losers at all. Level the
  playing field and let the quality of the projects themselves determine
  the winner in the space. Stop the incubation and graduation madness and
  change the role of the TC to instead play an advisory role to upcoming
  (and existing!) projects on the best ways to integrate with other
  OpenStack projects, if integration is something that is natural for the
  project to work towards.
 
  It seems to me that at some point you need to have a recommended way of
  doing things, otherwise it's going to be *really hard* for someone to
  bring up an OpenStack installation.
 
 Why can't there be multiple recommended ways of setting up an OpenStack 
 installation? Matter of fact, in reality, there already are multiple 
 recommended ways of setting up an OpenStack installation, aren't there?
 
 There's multiple distributions of OpenStack, multiple ways of doing 
 bare-metal deployment, multiple ways of deploying different message 
 queues and DBs, multiple ways of establishing networking, multiple open 
 and proprietary monitoring systems to choose from, etc. And I don't 
 really see anything wrong with that.
 

This is an argument for loosely coupling things, rather than tightly
integrating things. You will almost always win my vote with that sort of
movement, and you have here. +1.

  We already run into issues with something as basic as competing SQL
  databases.
 
 If the TC suddenly said Only MySQL will be supported, that would not 
 mean that the greater OpenStack community would be served better. It 
 would just unnecessarily take options away from deployers.
 

This is really where supported becomes the mutex binding us all. The
more supported options, the larger the matrix, the more complex a
user's decision process becomes.

   If every component has several competing implementations and
  none of them are official how many more interaction issues are going
  to trip us up?
 
 IMO, OpenStack should be about choice. Choice of hypervisor, choice of 
 DB and MQ infrastructure, choice of operating systems, choice of storage 
 vendors, choice of networking vendors.
 

Err, uh. I think OpenStack should be about users. If having 400 choices
means users are just confused, then OpenStack becomes nothing and
everything all at once. Choices should become part of the whole not when 1%
of the market wants a choice, but when 20%+ of the market _requires_
a choice.

What we shouldn't do is harm that 1%'s ability to be successful. We should
foster it and help it grow, but we don't just pull it into the program and
say "You're ALSO in OpenStack now!" and we also don't want to force those
users to make a hard choice because the better solution is not blessed.

 If there are multiple actively-developed projects that address the same 
 problem space, I think it serves our OpenStack users best to let the 
 projects work things out themselves and let the cream rise to the top. 
 If the cream ends up being one of those projects, so be it. If the cream 
 ends up being a mix of both projects, so be it. The production community 
 will end up determining what that cream should be based on what it 
 deploys into its clouds and what input it supplies to the teams working 
 on competing implementations.
 

I'm really not a fan of making it a competitive market. If a space has a
diverse set of problems, we can expect it will have a diverse set of
solutions that overlap. But that doesn't mean they both need to drive
toward making that overlap all-encompassing. Sometimes that happens and
it is good, and sometimes that happens and it causes horrible bloat.

 And who knows... what works or is recommended by one deployer may not be 
 what is best for another type of deployer and I believe we (the 
 TC/governance) do a disservice to our user community by picking a winner 
 in a space too early (or continuing to pick a winner in a clearly 
 unsettled space).
 

Right, I think our current situation crowds out diversity, when what we
want to do is enable it, without confusing the users.



Re: [openstack-dev] [all] The future of the integrated release

2014-08-21 Thread Clint Byrum
Excerpts from Duncan Thomas's message of 2014-08-21 09:21:06 -0700:
 On 21 August 2014 14:27, Jay Pipes jaypi...@gmail.com wrote:
 
  Specifically for Triple-O, by making the Deployment program == Triple-O, the
  TC has picked the disk-image-based deployment of an undercloud design as The
  OpenStack Way of Deployment. And as I've said previously in this thread, I
  believe that the deployment space is similarly unsettled, and that it would
  be more appropriate to let the Chef cookbooks and Puppet modules currently
  sitting in the stackforge/ code namespace live in the openstack/ code
  namespace.
 
 Totally agree with Jay here, I know people who gave up on trying to
 get any official project around deployment because they were told they
 had to do it under the TripleO umbrella
 

This was why the _program_ versus _project_ distinction was made. But
I think we ended up being 1:1 anyway.

Perhaps the deployment program's mission statement is too narrow, and
we should iterate on that. That others took their ball and went home,
instead of asking for a review of that ruling, is a bit disconcerting.

That probably strikes to the heart of the current crisis. If we were
being reasonable, alternatives to an official OpenStack program's mission
statement would be debated and considered thoughtfully. I know I made the
mistake early on of pushing the narrow _TripleO_ vision into what should
have been a much broader Deployment program. I'm not entirely sure why
that seemed OK to me at the time, or why it was allowed to continue, but
I think it may be a good exercise to review those events and try to come
up with a few theories or even conclusions as to what we could do better.



Re: [openstack-dev] [all] The future of the integrated release

2014-08-21 Thread Clint Byrum
Excerpts from David Kranz's message of 2014-08-21 12:45:05 -0700:
 On 08/21/2014 02:39 PM, gordon chung wrote:
   The point I've been making is
   that by the TC continuing to bless only the Ceilometer project as the
   OpenStack Way of Metering, I think we do a disservice to our users by
   picking a winner in a space that is clearly still unsettled.
 
  can we avoid using the word 'blessed' -- it's extremely vague and 
  seems controversial. from what i know, no one is being told project 
  x's services are the be all end all and based on experience, companies 
  (should) know this. i've worked with other alternatives even though i 
  contribute to ceilometer.
   Totally agree with Jay here, I know people who gave up on trying to
   get any official project around deployment because they were told they
   had to do it under the TripleO umbrella
  from the pov of a project that seems to be brought up constantly and 
  maybe it's my naivety, i don't really understand the fascination with 
  branding and the stigma people have placed on 
  non-'openstack'/stackforge projects. it can't be a legal thing because 
  i've gone through that potential mess. also, it's just as easy to 
  contribute to 'non-openstack' projects as 'openstack' projects (even 
  easier if we're honest).
 Yes, we should be honest. The "even easier" part is what Sandy cited as 
 the primary motivation for pursuing stacktach instead of ceilometer.
 
 I think we need to consider the difference between why OpenStack wants 
 to bless a project, and why a project might want to be blessed by 
 OpenStack. Many folks believe that for OpenStack to be successful it 
 needs to present itself as a stack that can be tested and deployed, not 
 a sack of parts that only the most extremely clever people can manage to 
 assemble into an actual cloud. In order to have such a stack, some code 
 (or, alternatively, dare I say API...) needs to be blessed. Reasonable 
 debates will continue about which pieces are essential to this stack, 
 and which should be left to deployers, but metering was seen as such a 
 component and therefore something needed to be blessed. The hope was 
 that every one would jump on that and make it great but it seems that 
 didn't quite happen (at least yet).
 
 Though Open Source has many advantages over proprietary development, the 
 ability to choose a direction and marshal resources for efficient 
 delivery is the biggest advantage of proprietary development like what 
 AWS does. The TC process of blessing is, IMO, an attempt to compensate 
 for that in an OpenSource project. Of course if the wrong code is 
 blessed, the negative  impact can be significant. Blessing APIs would be 

Hm, I wonder if the only difference there is when AWS blesses the wrong
thing, they evaluate the business impact, and respond by going in a
different direction, all behind closed doors. The shame is limited to
that inner circle.

Here, with full transparency, calling something the wrong thing is
pretty much public humiliation for the team involved.

So it stands to reason that we shouldn't call something the right
thing if we aren't comfortable with the potential public shaming.



Re: [openstack-dev] [Glance][Heat] Murano split discussion

2014-08-21 Thread Clint Byrum
Excerpts from Georgy Okrokvertskhov's message of 2014-08-20 13:14:28 -0700:
 During the last Atlanta summit there were a couple of discussions about
 Application Catalog and Application space projects in OpenStack. These
 cross-project discussions occurred as a result of the Murano incubation
 request [1] during the Icehouse cycle. At the TC meeting devoted to Murano
 incubation there was an idea about splitting Murano into parts which might
 belong to different programs [2].
 
 
 Today, I would like to initiate a discussion about potential splitting of
 Murano between two or three programs.
 
 
 *App Catalog API to Catalog Program*
 
 The Application Catalog part can belong to the Catalog program; the package
 repository will move to the artifacts repository, where the Murano team
 already participates. The API part of the App Catalog will add a thin layer
 of API methods specific to Murano applications and can potentially be
 implemented as a plugin to the artifacts repository. This API layer will
 also expose third-party system APIs, such as the CloudFoundry ServiceBroker
 API, which is used by the CloudFoundry marketplace feature to provide an
 integration layer between OpenStack application packages and third-party
 PaaS tools.
 
 

I thought this was basically already agreed upon, and that Glance was
just growing the ability to store more than just images.

 
 *Murano Engine to Orchestration Program*
 
 The Murano engine orchestrates Heat template generation. Complementary to
 Heat's declarative approach, the Murano engine uses an imperative approach,
 so that it is possible to control the whole flow of template generation. The
 engine uses Heat updates to modify Heat templates to reflect changes in
 application layout. The Murano engine has a concept of actions - special
 flows which can be called at any time after application deployment to change
 application parameters or update stacks. The engine is complementary to the
 Heat engine and adds the following:
 
 
- orchestrate multiple Heat stacks - DR deployments, HA setups, multiple
datacenter deployments

These sound like features already requested directly in Heat.

- Initiate and control stack updates on application-specific events

Sounds like workflow. :)

- Error handling and self-healing - being imperative, Murano allows you
to handle issues and implement additional logic around error handling and
self-healing.

Also sounds like workflow.

 


I think we need to re-think what a program is before we consider this.

I really don't know much about Murano. I have no interest in it at
all, and nobody has come to me saying "If we only had Murano in our
orchestration toolbox, we'd solve xxx." But making them part of the
Orchestration program would imply that we'll do design sessions together,
that we'll share the same mission statement, and that we'll have just
one PTL. I fail to see why they're not another, higher level program
that builds on top of the other services.

 
 
 *Murano UI to Dashboard Program*
 
 The Application Catalog requires a UI focused on user experience. Currently
 there is a Horizon plugin for the Murano App Catalog which adds an
 Application Catalog page to browse, search and filter applications. It also
 adds dynamic UI functionality to render Horizon forms without writing any
 actual code.
 
 

I feel like putting all the UI plugins in Horizon is the same mistake
as putting all of the functional tests in Tempest. It doesn't have the
effect of breaking the gate, but it probably is a lot of burden on a
single team.



Re: [openstack-dev] [Glance][Heat] Murano split discussion

2014-08-22 Thread Clint Byrum
Excerpts from Angus Salkeld's message of 2014-08-21 20:14:12 -0700:
 On Fri, Aug 22, 2014 at 12:34 PM, Clint Byrum cl...@fewbar.com wrote:
 
  Excerpts from Georgy Okrokvertskhov's message of 2014-08-20 13:14:28 -0700:
   During last Atlanta summit there were couple discussions about
  Application
   Catalog and Application space projects in OpenStack. These cross-project
   discussions occurred as a result of Murano incubation request [1] during
   Icehouse cycle.  On the TC meeting devoted to Murano incubation there was
   an idea about splitting the Murano into parts which might belong to
   different programs[2].
  
  
   Today, I would like to initiate a discussion about potential splitting of
   Murano between two or three programs.
  
  
   *App Catalog API to Catalog Program*
  
   Application Catalog part can belong to Catalog program, the package
   repository will move to artifacts repository part where Murano team
  already
   participates. API part of App Catalog will add a thin layer of API
  methods
   specific to Murano applications and potentially can be implemented as a
   plugin to artifacts repository. Also this API layer will expose other 3rd
   party systems API like CloudFoundry ServiceBroker API which is used by
   CloudFoundry marketplace feature to provide an integration layer between
   OpenStack Application packages and 3rd party PaaS tools.
  
  
 
  I thought this was basically already agreed upon, and that Glance was
  just growing the ability to store more than just images.
 
  
   *Murano Engine to Orchestration Program*
  
   Murano engine orchestrates the Heat template generation. Complementary
  to a
   Heat declarative approach, Murano engine uses imperative approach so that
   it is possible to control the whole flow of the template generation. The
   engine uses Heat updates to update Heat templates to reflect changes in
   applications layout. Murano engine has a concept of actions - special
  flows
   which can be called at any time after application deployment to change
   application parameters or update stacks. The engine is actually
   complementary to Heat engine and adds the following:
  
  
  - orchestrate multiple Heat stacks - DR deployments, HA setups,
  multiple
  datacenters deployment
 
  These sound like features already requested directly in Heat.
 
  - Initiate and controls stack updates on application specific events
 
  Sounds like workflow. :)
 
  - Error handling and self-healing - being imperative Murano allows you
  to handle issues and implement additional logic around error handling
  and
  self-healing.
 
  Also sounds like workflow.
 
  
 
 
  I think we need to re-think what a program is before we consider this.
 
  I really don't know much about Murano. I have no interest in it at
 
 
 get off my lawn;)
 

And turn down that music!

Sorry for the fist shaking, but I want to highlight that I'm happy to
consider it, just not with programs working the way they do now.

 http://stackalytics.com/?project_type=all&module=murano-group
 
 HP seems to be involved, you should check it out.
 

HP is involved in a lot of OpenStack things. It's a bit hard for me to
keep my eyes on everything we do. Good to know that others have been able
to take some time and buy into it a bit. +1 for distributing the load. :)

  all, and nobody has come to me saying If we only had Murano in our
  orchestration toolbox, we'd solve xxx. But making them part of the
 
 
 I thought you were saying that OpsWorks was neat the other day?
 Murano, from what I understand, was partly inspired by OpsWorks; yes,
 it's a layer up, but still really the same field.


I was saying that OpsWorks is reportedly popular, yes. I did not make
the connection at all from OpsWorks to Murano, and nobody had pointed
that out to me until now.

  Orchestration program would imply that we'll do design sessions together,
  that we'll share the same mission statement, and that we'll have just
 
 
 This is exactly what I hope will happen.
 

Which sessions from last summit would we want to give up to make room
for the Murano-only focused sessions? How much time in our IRC meeting
should we give to Murano-only concerns?

Forgive me for being harsh. We have a cloud to deploy using Heat,
and it is taking far too long to get Heat to do that in an acceptable
manner already. Adding load to our PTL and increasing the burden on our
communication channels doesn't really seem like something that will
increase our velocity. I could be dead wrong though, Murano could be
exactly what we need. I just don't see it, and I'm sorry to be so direct
about saying that.



Re: [openstack-dev] [all] The future of the integrated release

2014-08-22 Thread Clint Byrum
Excerpts from Michael Chapman's message of 2014-08-21 23:30:44 -0700:
 On Fri, Aug 22, 2014 at 2:57 AM, Jay Pipes jaypi...@gmail.com wrote:
 
  On 08/19/2014 11:28 PM, Robert Collins wrote:
 
  On 20 August 2014 02:37, Jay Pipes jaypi...@gmail.com wrote:
  ...
 
   I'd like to see more unification of implementations in TripleO - but I
  still believe our basic principle of using OpenStack technologies that
  already exist in preference to third party ones is still sound, and
  offers substantial dogfood and virtuous circle benefits.
 
 
 
  No doubt Triple-O serves a valuable dogfood and virtuous cycle purpose.
  However, I would move that the Deployment Program should welcome the many
  projects currently in the stackforge/ code namespace that do deployment
  of
  OpenStack using traditional configuration management tools like Chef,
  Puppet, and Ansible. It cannot be argued that these configuration
  management
  systems are the de-facto way that OpenStack is deployed outside of HP,
  and
  they belong in the Deployment Program, IMO.
 
 
  I think you mean it 'can be argued'... ;).
 
 
  No, I definitely mean cannot be argued :) HP is the only company I know
  of that is deploying OpenStack using Triple-O. The vast majority of
  deployers I know of are deploying OpenStack using configuration management
  platforms and various systems or glue code for baremetal provisioning.
 
  Note that I am not saying that Triple-O is bad in any way! I'm only saying
  that it does not represent the way that the majority of real-world
  deployments are done.
 
 
   And I'd be happy if folk in
 
  those communities want to join in the deployment program and have code
  repositories in openstack/. To date, none have asked.
 
 
  My point in this thread has been and continues to be that by having the TC
  bless a certain project as The OpenStack Way of X, that we implicitly are
   saying to other valid alternatives "Sorry, no need to apply here."
 
 
   As a TC member, I would welcome someone from the Chef community proposing
  the Chef cookbooks for inclusion in the Deployment program, to live under
  the openstack/ code namespace. Same for the Puppet modules.
 
 
  While you may personally welcome the Chef community to propose joining the
  deployment Program and living under the openstack/ code namespace, I'm just
  saying that the impression our governance model and policies create is one
  of exclusion, not inclusion. Hope that clarifies better what I've been
  getting at.
 
 
 
 (As one of the core reviewers for the Puppet modules)
 
 Without a standardised package build process it's quite difficult to test
 trunk Puppet modules vs trunk official projects. This means we cut release
 branches some time after the projects themselves to give people a chance to
 test. Until this changes and the modules can be released with the same
 cadence as the integrated release I believe they should remain on
 Stackforge.
 

Seems like the distros that build the packages are all doing lots of
daily-build type stuff that could somehow be leveraged to get over that.



[openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?

2014-08-22 Thread Clint Byrum
It has been brought to my attention that Ironic uses the biggest hammer
in the IPMI toolbox to control chassis power:

https://git.openstack.org/cgit/openstack/ironic/tree/ironic/drivers/modules/ipminative.py#n142

Which is

ret = ipmicmd.set_power('off', wait)

This is the most abrupt form, where the system power should be flipped
off at a hardware level. The equivalent of a short press on the power
button would be 'shutdown' instead of 'off'.

I also understand that this has been brought up before, and that the
answer given was "SSH in and shut it down yourself". I can respect that
position, but I have run into a bit of a pickle using it. Observe:

- ssh box.ip poweroff
- poll ironic until power state is off.
  - This is a race. Ironic is asserting the power. As soon as it sees
that the power is off, it will turn it back on.

- ssh box.ip halt
  - NO way to know that this has worked. Once SSH is off and the network
stack is gone, I cannot actually verify that the disks were
unmounted properly, which is the primary area of concern that I
have.

This is particularly important if I'm issuing a rebuild + preserve
ephemeral, as it is likely I will have lots of I/O going on, and I want
to make sure that it is all quiesced before I reboot to replace the
software.

Perhaps I missed something. If so, please do educate me on how I can
achieve this without hacking around it. Currently my workaround is to
manually unmount the state partition, which is something system shutdown
is supposed to do and may become problematic if system processes are
holding it open.

It seems to me that Ironic should at least try the graceful shutdown
first. There can be a timeout, but it would need to be something a user
can disable, so that if graceful shutdown never works we never just cut
power to the box. Even a journaled filesystem will take quite a while to
do a full fsck.
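A rough sketch of that graceful-then-forced sequence, assuming pyghmi's Command accepts the 'shutdown' soft power state; the constructor arguments and timeout value here are illustrative, not a proposed Ironic patch:

    import time

    from pyghmi.ipmi import command as ipmi_command


    def power_off_gracefully(bmc, userid, password, timeout=300):
        # Ask the box to shut down cleanly, poll for a while, and only
        # fall back to yanking power if it never gets there. A timeout of
        # None could mean "never force", per the suggestion above.
        ipmicmd = ipmi_command.Command(bmc=bmc, userid=userid,
                                       password=password)
        ipmicmd.set_power('shutdown')        # soft, ACPI-style request
        deadline = time.time() + timeout if timeout else None
        while ipmicmd.get_power().get('powerstate') != 'off':
            if deadline and time.time() > deadline:
                # Graceful shutdown never finished; arguably an ERROR
                # state, but the hard fallback would be:
                ipmicmd.set_power('off', wait=True)
                return False
            time.sleep(5)
        return True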

The inability to shut down gracefully in a reasonable amount of time
is an error state really, and I need to go to the box and inspect it,
which is precisely the reason we have ERROR states.

Thanks for your time. :)



Re: [openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?

2014-08-22 Thread Clint Byrum
Excerpts from Jay Pipes's message of 2014-08-22 11:16:05 -0700:
 On 08/22/2014 01:48 PM, Clint Byrum wrote:
  It has been brought to my attention that Ironic uses the biggest hammer
  in the IPMI toolbox to control chassis power:
 
  https://git.openstack.org/cgit/openstack/ironic/tree/ironic/drivers/modules/ipminative.py#n142
 
  Which is
 
   ret = ipmicmd.set_power('off', wait)
 
  This is the most abrupt form, where the system power should be flipped
  off at a hardware level. The short press on the power button would be
  'shutdown' instead of 'off'.
 
  I also understand that this has been brought up before, and that the
  answer given was SSH in and shut it down yourself. I can respect that
  position, but I have run into a bit of a pickle using it. Observe:
 
  - ssh box.ip poweroff
  - poll ironic until power state is off.
 - This is a race. Ironic is asserting the power. As soon as it sees
   that the power is off, it will turn it back on.
 
  - ssh box.ip halt
 - NO way to know that this has worked. Once SSH is off and the network
   stack is gone, I cannot actually verify that the disks were
   unmounted properly, which is the primary area of concern that I
   have.
 
  This is particularly important if I'm issuing a rebuild + preserve
  ephemeral, as it is likely I will have lots of I/O going on, and I want
  to make sure that it is all quiesced before I reboot to replace the
    software.
 
  Perhaps I missed something. If so, please do educate me on how I can
  achieve this without hacking around it. Currently my workaround is to
  manually unmount the state partition, which is something system shutdown
  is supposed to do and may become problematic if system processes are
  holding it open.
 
  It seems to me that Ironic should at least try to use the graceful
  shutdown. There can be a timeout, but it would need to be something a user
  can disable so if graceful never works we never just dump the power on the
  box. Even a journaled filesystem will take quite a bit to do a full fsck.
 
  The inability to gracefully shutdown in a reasonable amount of time
  is an error state really, and I need to go to the box and inspect it,
  which is precisely the reason we have ERROR states.
 
 What about placing a runlevel script in /etc/init.d/ and symlinking it 
 to run on shutdown -- i.e. /etc/rc0.d/? You could run fsync or unmount 
 the state partition in that script which would ensure disk state was 
 quiesced, no?

That's already what OSes do in their rc0.d.

My point is that I don't have any way to know that process happened
unless the box turns itself off after it succeeds.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Keystone][Marconi][Heat] Creating accounts in Keystone

2014-08-23 Thread Clint Byrum
I don't know how Zaqar does its magic, but I'd love to see simple signed
URLs rather than users/passwords. This would work for Heat as well. That
way we only have to pass in a single predictably formatted string.
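
To be concrete, by signed URLs I mean something in the spirit of Swift's
tempurl: a single string that grants one method on one path until an
expiry time. A rough sketch of generating one, modeled on the Swift
tempurl scheme (not a proposal for the exact format):

    import hmac
    import time
    from hashlib import sha1


    def make_signed_url(key, method, path, lifetime=3600):
        # HMAC over method, expiry and path, Swift-tempurl style.
        expires = int(time.time() + lifetime)
        sig = hmac.new(key, '%s\n%s\n%s' % (method, expires, path),
                       sha1).hexdigest()
        return '%s?temp_url_sig=%s&temp_url_expires=%s' % (path, sig, expires)

    # e.g. make_signed_url('secret', 'GET', '/v1/queues/foo/messages')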

Excerpts from Zane Bitter's message of 2014-08-22 14:35:38 -0700:
 Here's an interesting fact about Zaqar (the project formerly known as 
 Marconi) that I hadn't thought about before this week: it's probably the 
 first OpenStack project where a major part of the API primarily faces 
 software running in the cloud rather than facing the user.
 
 That is to say, nobody is going to be sending themselves messages on 
 their laptop, from their laptop, via a cloud. At least one end of any 
 given queue is likely to be on a VM in the cloud.
 
 That makes me wonder: how does Zaqar authenticate users who are sending 
 and receiving messages (as opposed to setting up the queues in the first 
 place)? Presumably using Keystone, in which case it will run into a 
 problem we've been struggling with in Heat since the very early days.
 
 Keystone is generally a front end for an identity store with a 1:1 
 correspondence between users and actual natural persons. Only the 
 operator can add or remove accounts. This breaks down as soon as you 
 need to authenticate automated services running in the cloud - in 
 particular, you never ever want to store the credentials belonging to an 
 actual natural person in a server in the cloud.
 
 Heat has managed to work around this to some extent (for those running 
 the Keystone v3 API) by creating users in a separate domain and more or 
 less doing our own authorisation for them. However, this requires action 
 on the part of the operator, and isn't an option for the end user. I 
 guess Zaqar could do something similar and pass out sets of credentials 
 good only for reading and writing to queues (respectively), but it seems 
 like it would be better if the user could create the keystone accounts 
 and set their own access control rules on the queues.
 
 On AWS the very first thing a user does is create a bunch of IAM 
 accounts so that they virtually never have to use the credentials 
 associated with their natural person ever again. There are both user 
 accounts and service accounts - the latter IIUC have 
 automatically-rotating keys. Is there anything like this planned in 
 Keystone? Zaqar is likely only the first (I guess second, if you count 
 Heat) of many services that will need it.
 
 I have this irrational fear that somebody is going to tell me that this 
 issue is the reason for the hierarchical-multitenancy idea - fear 
 because that both sounds like it requires intrusive changes in every 
 OpenStack project and fails to solve the problem. I hope somebody will 
 disabuse me of that notion in 3... 2... 1...
 
 cheers,
 Zane.
 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] [ptls] The Czar system, or how to scale PTLs

2014-08-23 Thread Clint Byrum
Excerpts from Dolph Mathews's message of 2014-08-22 09:45:37 -0700:
 On Fri, Aug 22, 2014 at 11:32 AM, Zane Bitter zbit...@redhat.com wrote:
 
  On 22/08/14 11:19, Thierry Carrez wrote:
 
  Zane Bitter wrote:
 
  On 22/08/14 08:33, Thierry Carrez wrote:
 
  We also
  still need someone to have the final say in case of deadlocked issues.
 
 
  -1 we really don't.
 
 
  I know we disagree on that :)
 
 
  No problem, you and I work in different programs so we can both get our
  way ;)
 
 
   People say we don't have that many deadlocks in OpenStack for which the
  PTL ultimate power is needed, so we could get rid of them. I'd argue
  that the main reason we don't have that many deadlocks in OpenStack is
  precisely *because* we have a system to break them if they arise.
 
 
  s/that many/any/ IME and I think that threatening to break a deadlock by
  fiat is just as bad as actually doing it. And by 'bad' I mean
  community-poisoningly, trust-destroyingly bad.
 
 
  I guess I've been active in too many dysfunctional free and open source
  software projects -- I put a very high value on the ability to make a
  final decision. Not being able to make a decision is about as
  community-poisoning, and also results in inability to make any
  significant change or decision.
 
 
  I'm all for getting a final decision, but a 'final' decision that has been
  imposed from outside rather than internalised by the participants is...
  rarely final.
 
 
 The expectation of a PTL isn't to stomp around and make final decisions,
 it's to step in when necessary and help both sides find the best solution.
 To moderate.
 

Have we had many instances where a project's community divided into
two camps and dug in to the point where they actually needed active
moderation? And in those cases, was the PTL not already on one side of
said argument? I'd prefer specific examples here.

 
  I have yet to see a deadlock in Heat that wasn't resolved by better
  communication.
 
 
 Moderation == bettering communication. I'm under the impression that you
 and Thierry are agreeing here, just from opposite ends of the same spectrum.
 

I agree as well. The PTL is a servant of the community, as any good leader
is. If the PTL feels they have to drop the hammer, or if an impasse is
reached where they are asked to, it is because they have failed to get
everyone communicating effectively, not because that's their job.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat] heat.conf.sample is not up to date

2014-08-24 Thread Clint Byrum
I'm guessing this is due to the new tox feature that randomizes Python's
hash seed.
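
A quick way to see the effect (illustrative only; whether any two seeds
reorder a given set depends on the seeds and the build, but the point is
that the order is no longer stable between runs):

    import os
    import subprocess

    snippet = "print(list({'qpid_hostname', 'rabbit_host', 'password'}))"
    for seed in ('1', '2'):
        # Same code, different hash seed: set/dict iteration order can
        # change, so anything emitting options in hash order (like the
        # sample config generator) reorders between runs.
        env = dict(os.environ, PYTHONHASHSEED=seed)
        out = subprocess.check_output(['python', '-c', snippet], env=env)
        print('PYTHONHASHSEED=%s -> %s' % (seed, out.strip()))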

Excerpts from Mike Spreitzer's message of 2014-08-24 00:10:42 -0700:
 What is going on with this?  If I do a fresh clone of heat and run `tox 
 -epep8` then I get that complaint.  If I then run the recommended command 
 to fix it, and then `tox -epep8` again, I get the same complaint again --- 
 and with different differences exhibited!  The email below carries a 
 typescript showing this.
 
 What I really need to know is what to do when committing a change that 
 really does require a change in the sample configuration file.  Of course 
 I tried running generate_sample.sh, but `tox -epep8` still complains. What 
 is the right procedure to get a correct sample committed?  BTW, I am doing 
 the following admittedly risky thing: I run DevStack, and make my changes 
 in /opt/stack/heat/.
 
 Thanks,
 Mike
 
 - Forwarded by Mike Spreitzer/Watson/IBM on 08/24/2014 03:03 AM -
 
 From:   ubuntu@mjs-dstk-821a (Ubuntu)
 To: Mike Spreitzer/Watson/IBM@IBMUS, 
 Date:   08/24/2014 02:55 AM
 Subject:fresh flake fail
 
 
 
 ubuntu@mjs-dstk-821a:~/code$ git clone 
 git://git.openstack.org/openstack/heat.git
 Cloning into 'heat'...
 remote: Counting objects: 49690, done.
 remote: Compressing objects: 100% (19765/19765), done.
 remote: Total 49690 (delta 36660), reused 39014 (delta 26526)
 Receiving objects: 100% (49690/49690), 7.92 MiB | 7.29 MiB/s, done.
 Resolving deltas: 100% (36660/36660), done.
 Checking connectivity... done.
 ubuntu@mjs-dstk-821a:~/code$ cd heat
 ubuntu@mjs-dstk-821a:~/code/heat$ tox -epep8
 pep8 create: /home/ubuntu/code/heat/.tox/pep8
 pep8 installdeps: -r/home/ubuntu/code/heat/requirements.txt, 
 -r/home/ubuntu/code/heat/test-requirements.txt
 pep8 develop-inst: /home/ubuntu/code/heat
 pep8 runtests: PYTHONHASHSEED='0'
 pep8 runtests: commands[0] | flake8 heat bin/heat-api bin/heat-api-cfn 
 bin/heat-api-cloudwatch bin/heat-engine bin/heat-manage contrib
 pep8 runtests: commands[1] | 
 /home/ubuntu/code/heat/tools/config/check_uptodate.sh
 --- /tmp/heat.ep2CBe/heat.conf.sample2014-08-24 06:52:54.16484 +
 +++ etc/heat/heat.conf.sample2014-08-24 06:48:13.66484 +
 @@ -164,7 +164,7 @@
  
 #allowed_rpc_exception_modules=oslo.messaging.exceptions,nova.exception,cinder.exception,exceptions
  
  # Qpid broker hostname. (string value)
 -#qpid_hostname=heat
 +#qpid_hostname=localhost
  
  # Qpid broker port. (integer value)
  #qpid_port=5672
 @@ -221,7 +221,7 @@
  
  # The RabbitMQ broker address where a single node is used.
  # (string value)
 -#rabbit_host=heat
 +#rabbit_host=localhost
  
  # The RabbitMQ broker port where a single node is used.
  # (integer value)
 check_uptodate.sh: heat.conf.sample is not up to date.
 check_uptodate.sh: Please run 
 /home/ubuntu/code/heat/tools/config/generate_sample.sh.
 ERROR: InvocationError: 
 '/home/ubuntu/code/heat/tools/config/check_uptodate.sh'
 pep8 runtests: commands[2] | 
 /home/ubuntu/code/heat/tools/requirements_style_check.sh requirements.txt 
 test-requirements.txt
 pep8 runtests: commands[3] | bash -c find heat -type f -regex '.*\.pot?' 
 -print0|xargs -0 -n 1 msgfmt --check-format -o /dev/null
 ___ summary 
 
 ERROR:   pep8: commands failed
 ubuntu@mjs-dstk-821a:~/code/heat$ 
 ubuntu@mjs-dstk-821a:~/code/heat$ 
 ubuntu@mjs-dstk-821a:~/code/heat$ tools/config/generate_sample.sh
 ubuntu@mjs-dstk-821a:~/code/heat$ 
 ubuntu@mjs-dstk-821a:~/code/heat$ 
 ubuntu@mjs-dstk-821a:~/code/heat$ 
 ubuntu@mjs-dstk-821a:~/code/heat$ tox -epep8
 pep8 develop-inst-noop: /home/ubuntu/code/heat
 pep8 runtests: PYTHONHASHSEED='0'
 pep8 runtests: commands[0] | flake8 heat bin/heat-api bin/heat-api-cfn 
 bin/heat-api-cloudwatch bin/heat-engine bin/heat-manage contrib
 pep8 runtests: commands[1] | 
 /home/ubuntu/code/heat/tools/config/check_uptodate.sh
 --- /tmp/heat.DqIhK5/heat.conf.sample2014-08-24 06:54:34.62884 +
 +++ etc/heat/heat.conf.sample2014-08-24 06:53:51.54084 +
 @@ -159,10 +159,6 @@
  # Size of RPC connection pool. (integer value)
  #rpc_conn_pool_size=30
  
 -# Modules of exceptions that are permitted to be recreated
 -# upon receiving exception data from an rpc call. (list value)
 -#allowed_rpc_exception_modules=oslo.messaging.exceptions,nova.exception,cinder.exception,exceptions
 -
  # Qpid broker hostname. (string value)
  #qpid_hostname=heat
  
 @@ -301,15 +297,6 @@
  # Heartbeat time-to-live. (integer value)
  #matchmaker_heartbeat_ttl=600
  
 -# Host to locate redis. (string value)
 -#host=127.0.0.1
 -
 -# Use this port to connect to redis host. (integer value)
 -#port=6379
 -
 -# Password for Redis server (optional). (string value)
 -#password=None
 -
  # Size of RPC greenthread pool. (integer value)
  #rpc_thread_pool_size=64
  
 @@ -1229,6 +1216,22 @@
  #hash_algorithms=md5
  
  
 +[matchmaker_redis]
 +
 +#
 +# Options defined in oslo.messaging
 +#
 +
 +# 

Re: [openstack-dev] [qa][all][Heat] Packaging of functional tests

2014-08-26 Thread Clint Byrum
Excerpts from Steve Baker's message of 2014-08-26 14:25:46 -0700:
 On 27/08/14 03:18, David Kranz wrote:
  On 08/26/2014 10:14 AM, Zane Bitter wrote:
  Steve Baker has started the process of moving Heat tests out of the
  Tempest repository and into the Heat repository, and we're looking
  for some guidance on how they should be packaged in a consistent way.
  Apparently there are a few projects already packaging functional
  tests in the package projectname.tests.functional (alongside
  projectname.tests.unit for the unit tests).
 
  That strikes me as odd in our context, because while the unit tests
  run against the code in the package in which they are embedded, the
  functional tests run against some entirely different code - whatever
  OpenStack cloud you give it the auth URL and credentials for. So
  these tests run from the outside, just like their ancestors in
  Tempest do.
 
  There's all kinds of potential confusion here for users and
  packagers. None of it is fatal and all of it can be worked around,
  but if we refrain from doing the thing that makes zero conceptual
  sense then there will be no problem to work around :)
  Thanks, Zane. The point of moving functional tests to projects is to
  be able to run more of them
  in gate jobs for those projects, and allow tempest to survive being
  stretched-to-breaking horizontally as we scale to more projects. At
  the same time, there are benefits to the
  tempest-as-all-in-one-functional-and-integration-suite that we should
  try not to lose:
 
  1. Strong integration testing without thinking too hard about the
  actual dependencies
  2. Protection from mistaken or unwise api changes (tempest two-step
  required)
  3. Exportability as a complete blackbox functional test suite that can
  be used by Rally, RefStack, deployment validation, etc.
 
  I think (1) may be the most challenging because tests that are moved
  out of tempest might be testing some integration that is not being
  covered
  by a scenario. We will need to make sure that tempest actually has a
  complete enough set of tests to validate integration. Even if this is
  all implemented in a way where tempest can see in-project tests as
  plugins, there will still not be time to run them all as part of
  tempest on every commit to every project, so a selection will have to
  be made.
 
  (2) is quite difficult. In Atlanta we talked about taking a copy of
  functional tests into tempest for stable apis. I don't know how
  workable that is but don't see any other real options except vigilance
  in reviews of patches that change functional tests.
 
  (3) is what Zane was addressing. The in-project functional tests need
  to be written in a way that they can, at least in some configuration,
  run against a real cloud.
 
 
 
  I suspect from reading the previous thread about In-tree functional
  test vision that we may actually be dealing with three categories of
  test here rather than two:
 
  * Unit tests that run against the package they are embedded in
  * Functional tests that run against the package they are embedded in
  * Integration tests that run against a specified cloud
 
  i.e. the tests we are now trying to add to Heat might be
  qualitatively different from the projectname.tests.functional
  suites that already exist in a few projects. Perhaps someone from
  Neutron and/or Swift can confirm?
  That seems right, except that I would call the third functional
  tests and not integration tests, because the purpose is not really
  integration but deep testing of a particular service. Tempest would
  continue to focus on integration testing. Is there some controversy
  about that?
  The second category could include whitebox tests.
 
  I don't know about swift, but in neutron the intent was to have these
  tests be configurable to run against a real cloud, or not. Maru Newby
  would have details.
 
  I'd like to propose that tests of the third type get their own
  top-level package with a name of the form
  projectname-integrationtests (second choice: projectname-tempest
  on the principle that they're essentially plugins for Tempest). How
  would people feel about standardising that across OpenStack?
  +1 But I would not call it integrationtests for the reason given above.
 
 Because all heat does is interact with other services, what we call
 functional tests are actually integration tests. Sure, we could mock at
 the REST API level, but integration coverage is what we need most. This

I'd call that faking, not mocking, but both could apply.

 lets us verify things like:
 - how heat handles races in other services leading to resources going
 into ERROR

A fake that predictably fails (and thus tests failure handling) will
result in better coverage than a real service that only fails when that
real service is broken. What's frustrating is that _both_ are needed to
catch bugs.
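
To illustrate what I mean by a fake that predictably fails (made-up
example, not real Heat test code):

    class FlakyServersFake(object):
        """Stand-in for a compute client that fails on a chosen call.

        The failure is deterministic, so the ERROR-handling path gets
        exercised on every run instead of only when a real service
        happens to misbehave.
        """

        def __init__(self, fail_on_call=2):
            self.calls = 0
            self.fail_on_call = fail_on_call

        def create(self, **kwargs):
            self.calls += 1
            if self.calls == self.fail_on_call:
                raise Exception('simulated race: server went to ERROR')
            return {'id': 'fake-%d' % self.calls, 'status': 'ACTIVE'}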

 - connectivity and interaction between heat and agents on orchestrated
 servers
 

That is definitely 

Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-27 Thread Clint Byrum
Excerpts from Zane Bitter's message of 2014-08-27 08:41:29 -0700:
 On 27/08/14 11:04, Steven Hardy wrote:
  On Wed, Aug 27, 2014 at 07:54:41PM +0530, Jyoti Ranjan wrote:
  I am a little bit skeptical about using Swift for this use case because
  of its eventual consistency issue. I am not sure a Swift cluster is good
  to be used for this kind of problem. Please note that a Swift cluster may
  give you old data at some point in time.
 
  This is probably not a major problem, but it's certainly worth considering.
 
  My assumption is that the latency of making the replicas consistent will be
  small relative to the timeout for things like SoftwareDeployments, so all
  we need is to ensure that instances  eventually get the new data, act on
 
 That part is fine, but if they get the new data and then later get the 
 old data back again... that would not be so good.
 

Agreed, and I had not considered that this can happen.

There is a not-so-simple answer though:

* Heat inserts this as initial metadata:

{metadata: {}, update-url: xx, version: 0}

* Polling goes to update-url and ignores metadata with version <= 0

* Polling finds new metadata in same format, and continues the loop
without talking to Heat
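
The agent side of that loop would be something like this (sketch only;
the document format follows the example above, and apply_metadata() is
an invented placeholder for whatever acts on the new metadata):

    import time

    import requests


    def poll_metadata(update_url, last_version=0, interval=30):
        while True:
            doc = requests.get(update_url).json()
            # Ignore anything at or below the version already acted on,
            # including the initial {"metadata": {}, "version": 0} document.
            if doc['version'] > last_version:
                apply_metadata(doc['metadata'])  # hypothetical handler
                last_version = doc['version']
                update_url = doc['update-url']   # follow to the next location
            time.sleep(interval)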

However, this makes me rethink why we are having performance problems.
MOST of the performance problems have two root causes:

* We parse the entire stack to show metadata, because we have to see if
  there are custom access controls defined in any of the resources used.
  I actually worked on a patch set to deprecate this part of the resource
  plugin API because it is impossible to scale this way.
* We rely on the engine to respond because of the parsing issue.

If however we could just push metadata into the db fully resolved
whenever things in the stack change, and cache the response in the API
using Last-Modified/Etag headers, I think we'd be less inclined to care
so much about swift for polling. However we are still left with the many
thousands of keystone users being created vs. thousands of swift tempurls.
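
On the agent side, the caching part is just standard conditional GETs,
e.g. (rough sketch):

    import requests


    def fetch_if_changed(url, etag=None):
        headers = {'If-None-Match': etag} if etag else {}
        resp = requests.get(url, headers=headers)
        if resp.status_code == 304:
            return None, etag  # unchanged; the engine never gets involved
        return resp.json(), resp.headers.get('ETag')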

That would also set us up nicely for very easy integration with Zaqar,
as metadata changes would flow naturally into the message queue for the
server through the same mechanism as they flow into the database.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-27 Thread Clint Byrum
Excerpts from Steven Hardy's message of 2014-08-27 10:08:36 -0700:
 On Wed, Aug 27, 2014 at 09:40:31AM -0700, Clint Byrum wrote:
  Excerpts from Zane Bitter's message of 2014-08-27 08:41:29 -0700:
   On 27/08/14 11:04, Steven Hardy wrote:
On Wed, Aug 27, 2014 at 07:54:41PM +0530, Jyoti Ranjan wrote:
I am a little bit skeptical about using Swift for this use case because
of its eventual consistency issue. I am not sure a Swift cluster is good
to be used for this kind of problem. Please note that a Swift cluster may
give you old data at some point in time.
   
This is probably not a major problem, but it's certainly worth 
considering.
   
My assumption is that the latency of making the replicas consistent will
be small relative to the timeout for things like SoftwareDeployments, so
all we need is to ensure that instances eventually get the new data, act on
   
   That part is fine, but if they get the new data and then later get the 
   old data back again... that would not be so good.
   
  
  Agreed, and I had not considered that this can happen.
  
  There is a not-so-simple answer though:
  
  * Heat inserts this as initial metadata:
  
  {metadata: {}, update-url: xx, version: 0}
  
  * Polling goes to update-url and ignores metadata with version <= 0
  
  * Polling finds new metadata in same format, and continues the loop
  without talking to Heat
  
  However, this makes me rethink why we are having performance problems.
  MOST of the performance problems have two root causes:
  
  * We parse the entire stack to show metadata, because we have to see if
there are custom access controls defined in any of the resources used.
I actually worked on a patch set to deprecate this part of the resource
plugin API because it is impossible to scale this way.
  * We rely on the engine to respond because of the parsing issue.
  
  If however we could just push metadata into the db fully resolved
  whenever things in the stack change, and cache the response in the API
  using Last-Modified/Etag headers, I think we'd be less inclined to care
  so much about swift for polling. However we are still left with the many
  thousands of keystone users being created vs. thousands of swift tempurls.
 
 There's probably a few relatively simple optimisations we can do if the
 keystone user thing becomes the bottleneck:
 - Make the user an attribute of the stack and only create one per
   stack/tree-of-stacks
 - Make the user an attribute of each server resource (probably more secure
   but less optimal if your optimal is less keystone users).
 
 I don't think the many keystone users thing is actually a problem right now
 though, or is it?

1000 servers means 1000 keystone users to manage, and all of the tokens
and backend churn that implies.

It's not a problem, but it is quite a bit heavier than tempurls.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Design Summit reloaded

2014-08-27 Thread Clint Byrum
Excerpts from Thierry Carrez's message of 2014-08-27 05:51:55 -0700:
 Hi everyone,
 
 I've been thinking about what changes we can bring to the Design Summit
 format to make it more productive. I've heard the feedback from the
 mid-cycle meetups and would like to apply some of those ideas for Paris,
 within the constraints we have (already booked space and time). Here is
 something we could do:
 
 Day 1. Cross-project sessions / incubated projects / other projects
 
 I think that worked well last time. 3 parallel rooms where we can
 address top cross-project questions, discuss the results of the various
 experiments we conducted during juno. Don't hesitate to schedule 2 slots
 for discussions, so that we have time to come to the bottom of those
 issues. Incubated projects (and maybe other projects, if space allows)
 occupy the remaining space on day 1, and could occupy pods on the
 other days.
 

I like it. The only thing I would add is that it would be quite useful if
the use of pods were at least partially organized by an unconference-style
interest list. What I mean is: on day 1, have people suggest and vote on
topics to discuss at the pods, and from then on the pods can host those
topics. This is for the other things that aren't well defined until the
summit and don't have their own rooms for days 2 and 3.

This is driven by the fact that the pods in Atlanta were almost always
busy doing something other than whatever the track that owned them
wanted. A few projects' pods grew to 30-40 people a few times, eating up
all the chairs for the surrounding pods. TripleO often sat at the Heat
pod because of this, for instance.

I don't think they should be fully scheduled. They're also just great
places to gather and have a good discussion, but it would be useful to
plan for topic flexibility and help coalesce interested parties, rather
than have them be silos that get taken over randomly. Especially since
there is a temptation to push the other topics to them already.

 Day 2 and Day 3. Scheduled sessions for various programs
 
 That's our traditional scheduled space. We'll have a 33% less slots
 available. So, rather than trying to cover all the scope, the idea would
 be to focus those sessions on specific issues which really require
 face-to-face discussion (which can't be solved on the ML or using spec
 discussion) *or* require a lot of user feedback. That way, appearing in
 the general schedule is very helpful. This will require us to be a lot
 stricter on what we accept there and what we don't -- we won't have
 space for courtesy sessions anymore, and traditional/unnecessary
 sessions (like my traditional release schedule one) should just move
 to the mailing-list.
 
 Day 4. Contributors meetups
 
 On the last day, we could try to split the space so that we can conduct
 parallel midcycle-meetup-like contributors gatherings, with no time
 boundaries and an open agenda. Large projects could get a full day,
 smaller projects would get half a day (but could continue the discussion
 in a local bar). Ideally that meetup would end with some alignment on
 release goals, but the idea is to make the best of that time together to
 solve the issues you have. Friday would finish with the design summit
 feedback session, for those who are still around.
 

Love this. If we could also fully enclose these meetups and the session
rooms in dry-erase boards, that would be ideal.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Design Summit reloaded

2014-08-27 Thread Clint Byrum
Excerpts from Sean Dague's message of 2014-08-27 06:26:38 -0700:
 On 08/27/2014 08:51 AM, Thierry Carrez wrote:
  Hi everyone,
  
  I've been thinking about what changes we can bring to the Design Summit
  format to make it more productive. I've heard the feedback from the
  mid-cycle meetups and would like to apply some of those ideas for Paris,
  within the constraints we have (already booked space and time). Here is
  something we could do:
  
  Day 1. Cross-project sessions / incubated projects / other projects
  
  I think that worked well last time. 3 parallel rooms where we can
  address top cross-project questions, discuss the results of the various
  experiments we conducted during juno. Don't hesitate to schedule 2 slots
  for discussions, so that we have time to come to the bottom of those
  issues. Incubated projects (and maybe other projects, if space allows)
  occupy the remaining space on day 1, and could occupy pods on the
  other days.
  
  Day 2 and Day 3. Scheduled sessions for various programs
  
  That's our traditional scheduled space. We'll have a 33% less slots
  available. So, rather than trying to cover all the scope, the idea would
  be to focus those sessions on specific issues which really require
  face-to-face discussion (which can't be solved on the ML or using spec
  discussion) *or* require a lot of user feedback. That way, appearing in
  the general schedule is very helpful. This will require us to be a lot
  stricter on what we accept there and what we don't -- we won't have
  space for courtesy sessions anymore, and traditional/unnecessary
  sessions (like my traditional release schedule one) should just move
  to the mailing-list.
  
  Day 4. Contributors meetups
  
  On the last day, we could try to split the space so that we can conduct
  parallel midcycle-meetup-like contributors gatherings, with no time
  boundaries and an open agenda. Large projects could get a full day,
  smaller projects would get half a day (but could continue the discussion
  in a local bar). Ideally that meetup would end with some alignment on
  release goals, but the idea is to make the best of that time together to
  solve the issues you have. Friday would finish with the design summit
  feedback session, for those who are still around.
  
  
  I think this proposal makes the best use of our setup: discuss clear
  cross-project issues, address key specific topics which need
  face-to-face time and broader attendance, then try to replicate the
  success of midcycle meetup-like open unscheduled time to discuss
  whatever is hot at this point.
  
  There are still details to work out (is it possible split the space,
  should we use the usual design summit CFP website to organize the
  scheduled time...), but I would first like to have your feedback on
  this format. Also if you have alternative proposals that would make a
  better use of our 4 days, let me know.
 
 I definitely like this approach. I think it will be really interesting
 to collect feedback from people about the value they got from days 2 & 3
 vs. Day 4.
 
 I also wonder if we should lose a slot from days 1 - 3 and expand the
 hallway time. Hallway track is always pretty interesting, and honestly
 a lot of interesting ideas spring up. The 10 minute transitions often
 seem to feel like you are rushing between places too quickly some times.

Yes please. I'd also be fine with just giving back 5 minutes from each
session to facilitate this.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all] Design Summit reloaded

2014-08-27 Thread Clint Byrum
Excerpts from Anita Kuno's message of 2014-08-27 13:48:25 -0700:
 On 08/27/2014 02:46 PM, John Griffith wrote:
  On Wed, Aug 27, 2014 at 9:25 AM, Flavio Percoco fla...@redhat.com wrote:
  
  On 08/27/2014 03:26 PM, Sean Dague wrote:
  On 08/27/2014 08:51 AM, Thierry Carrez wrote:
  Hi everyone,
 
  I've been thinking about what changes we can bring to the Design Summit
  format to make it more productive. I've heard the feedback from the
  mid-cycle meetups and would like to apply some of those ideas for Paris,
  within the constraints we have (already booked space and time). Here is
  something we could do:
 
  Day 1. Cross-project sessions / incubated projects / other projects
 
  I think that worked well last time. 3 parallel rooms where we can
  address top cross-project questions, discuss the results of the various
  experiments we conducted during juno. Don't hesitate to schedule 2 slots
  for discussions, so that we have time to come to the bottom of those
  issues. Incubated projects (and maybe other projects, if space allows)
  occupy the remaining space on day 1, and could occupy pods on the
  other days.
 
  Day 2 and Day 3. Scheduled sessions for various programs
 
  That's our traditional scheduled space. We'll have a 33% less slots
  available. So, rather than trying to cover all the scope, the idea would
  be to focus those sessions on specific issues which really require
  face-to-face discussion (which can't be solved on the ML or using spec
  discussion) *or* require a lot of user feedback. That way, appearing in
  the general schedule is very helpful. This will require us to be a lot
  stricter on what we accept there and what we don't -- we won't have
  space for courtesy sessions anymore, and traditional/unnecessary
  sessions (like my traditional release schedule one) should just move
  to the mailing-list.
 
  Day 4. Contributors meetups
 
  On the last day, we could try to split the space so that we can conduct
  parallel midcycle-meetup-like contributors gatherings, with no time
  boundaries and an open agenda. Large projects could get a full day,
  smaller projects would get half a day (but could continue the discussion
  in a local bar). Ideally that meetup would end with some alignment on
  release goals, but the idea is to make the best of that time together to
  solve the issues you have. Friday would finish with the design summit
  feedback session, for those who are still around.
 
 
  I think this proposal makes the best use of our setup: discuss clear
  cross-project issues, address key specific topics which need
  face-to-face time and broader attendance, then try to replicate the
  success of midcycle meetup-like open unscheduled time to discuss
  whatever is hot at this point.
 
  There are still details to work out (is it possible split the space,
  should we use the usual design summit CFP website to organize the
  scheduled time...), but I would first like to have your feedback on
  this format. Also if you have alternative proposals that would make a
  better use of our 4 days, let me know.
 
  I definitely like this approach. I think it will be really interesting
  to collect feedback from people about the value they got from days 2 & 3
  vs. Day 4.
 
  I also wonder if we should lose a slot from days 1 - 3 and expand the
  hallway time. Hallway track is always pretty interesting, and honestly
  a lot of interesting ideas spring up. The 10 minute transitions often
  seem to feel like you are rushing between places too quickly some times.
 
  +1
 
  Last summit, it was basically impossible to do any hallway talking and
  even meet some folks face-2-face.
 
  Other than that, I think the proposal is great and makes sense to me.
 
  Flavio
 
  --
  @flaper87
  Flavio Percoco
 
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 
  ​Sounds like a great idea to me:
  +1​
  
  
  
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
  
 I think this is a great direction.
 
 Here is my dilemma and it might just affect me. I attended 3 mid-cycles
 this release: one of Neutron's (there were 2), QA/Infra and Cinder. The
 Neutron and Cinder ones were mostly in pursuit of figuring out third
 party and exchanging information surrounding that (which I feel was
 successful). The QA/Infra one was, well even though I feel like I have
 been awol, I still consider this my home.
 
 From my perspective and check with Neutron and Cinder to see if they
 agree, but having at least one person from qa/infra at a mid-cycle helps
 in small ways. At both I worked with folks to help them make more
 efficient use of their review time by exploring gerrit queries (there
 were people who didn't know this magic, nor did they think to ask 

Re: [openstack-dev] [Keystone][Marconi][Heat] Creating accounts in Keystone

2014-08-27 Thread Clint Byrum
Excerpts from Adam Young's message of 2014-08-24 20:17:34 -0700:
 On 08/23/2014 02:01 AM, Clint Byrum wrote:
  I don't know how Zaqar does its magic, but I'd love to see simple signed
  URLs rather than users/passwords. This would work for Heat as well. That
  way we only have to pass in a single predictably formatted string.
 
  Excerpts from Zane Bitter's message of 2014-08-22 14:35:38 -0700:
  Here's an interesting fact about Zaqar (the project formerly known as
  Marconi) that I hadn't thought about before this week: it's probably the
  first OpenStack project where a major part of the API primarily faces
 
 
 
 Nah, this is the direction we are headed.  Service users (out of LDAP!)  are 
 going to be the norm with a recent feature added to Keystone:
 
 
 http://adam.younglogic.com/2014/08/getting-service-users-out-of-ldap/
 

This complicates the case by requiring me to get tokens and present
them, to cache them, etc. I just want to fetch and/or send messages.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Spam] Re: [Openstack][TripleO] [Ironic] What if undercloud machines down, can we reboot overcloud machines?

2014-08-28 Thread Clint Byrum
Excerpts from Jyoti Ranjan's message of 2014-08-27 21:20:19 -0700:
 I do agree, but it creates an extra requirement for the undercloud if high
 availability is an important criterion. Because of this, the undercloud has
 to be there 24x7, 365 days a year, and to make it available we need HA for
 it as well. So you indirectly mean that the undercloud should also be
 designed with high availability in mind.

I'm worried that you may be overstating the needs of a typical cloud.

The undercloud needs to be able to reach a state of availability when
you need to boot boxes. Even if you are doing CD and _constantly_
rebooting boxes, you can take your undercloud down for an hour, as long
as it can be brought back up for emergencies.

However, Ironic has already been designed this way. I believe that
Ironic has a nice dynamic hash ring of server ownership, and if you
mark a conductor down, the other conductors will assume ownership of
the machines that it was holding. So the path to making this HA is
basically to add one more undercloud server.
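
Roughly the behavior I'm describing, as a toy sketch (this is not
Ironic's actual implementation, just the general consistent-hashing
idea):

    import bisect
    import hashlib


    class HashRing(object):
        """Toy consistent-hash ring mapping nodes to conductors."""

        def __init__(self, conductors, replicas=16):
            # Each conductor gets several positions on the ring; removing
            # one only reassigns the nodes it owned to the survivors.
            self.ring = sorted(
                (self._hash('%s-%d' % (c, i)), c)
                for c in conductors for i in range(replicas))
            self.keys = [k for k, _ in self.ring]

        @staticmethod
        def _hash(key):
            return int(hashlib.md5(key.encode('utf8')).hexdigest(), 16)

        def conductor_for(self, node_uuid):
            i = bisect.bisect(self.keys, self._hash(node_uuid)) % len(self.ring)
            return self.ring[i][1]

    # Rebuilding the ring without a downed conductor shifts only that
    # conductor's nodes; everything else keeps its existing owner.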

Ironic experts, please tell me this is true, and not just something I
inserted into my own distorted version of reality to help me sleep at
night.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

2014-09-04 Thread Clint Byrum
Excerpts from Flavio Percoco's message of 2014-09-04 00:08:47 -0700:
 Greetings,
 
 Last Tuesday the TC held the first graduation review for Zaqar. During
 the meeting some concerns arose. I've listed those concerns below with
 some comments hoping that it will help starting a discussion before the
 next meeting. In addition, I've added some comments about the project
 stability at the bottom and an etherpad link pointing to a list of use
 cases for Zaqar.
 

Hi Flavio. This was an interesting read. As somebody whose attention has
recently been drawn to Zaqar, I am quite interested in seeing it
graduate.

 # Concerns
 
 - Concern on operational burden of requiring NoSQL deploy expertise to
 the mix of openstack operational skills
 
 For those of you not familiar with Zaqar, it currently supports 2 nosql
 drivers - MongoDB and Redis - and those are the only 2 drivers it
 supports for now. This will require operators willing to use Zaqar to
 maintain a new (?) NoSQL technology in their system. Before expressing
 our thoughts on this matter, let me say that:
 
 1. By removing the SQLAlchemy driver, we basically removed the chance
 for operators to use an already deployed OpenStack-technology
 2. Zaqar won't be backed by any AMQP based messaging technology for
 now. Here's[0] a summary of the research the team (mostly done by
 Victoria) did during Juno
 3. We (OpenStack) used to require Redis for the zmq matchmaker
 4. We (OpenStack) also use memcached for caching and as the oslo
 caching lib becomes available - or a wrapper on top of dogpile.cache -
 Redis may be used in place of memcached in more and more deployments.
 5. Ceilometer's recommended storage driver is still MongoDB, although
 Ceilometer has now support for sqlalchemy. (Please correct me if I'm wrong).
 
 That being said, it's obvious we already, to some extent, promote some
 NoSQL technologies. However, for the sake of the discussion, lets assume
 we don't.
 
 I truly believe, with my OpenStack (not Zaqar's) hat on, that we can't
 keep avoiding these technologies. NoSQL technologies have been around
 for years and we should be prepared - including OpenStack operators - to
 support these technologies. Not every tool is good for all tasks - one
 of the reasons we removed the sqlalchemy driver in the first place -
 therefore it's impossible to keep an homogeneous environment for all
 services.
 

I wholeheartedly agree that non-traditional storage technologies that
are becoming mainstream are good candidates for use cases where SQL-based
storage gets in the way. I wish there wasn't so much FUD
(warranted or not) about MongoDB, but that is the reality we live in.

 With this, I'm not suggesting to ignore the risks and the extra burden
 this adds but, instead of attempting to avoid it completely by not
 evolving the stack of services we provide, we should probably work on
 defining a reasonable subset of NoSQL services we are OK with
 supporting. This will help making the burden smaller and it'll give
 operators the option to choose.
 
 [0] http://blog.flaper87.com/post/marconi-amqp-see-you-later/
 
 
 - Concern on should we really reinvent a queue system rather than
 piggyback on one
 
 As mentioned in the meeting on Tuesday, Zaqar is not reinventing message
 brokers. Zaqar provides a service akin to SQS from AWS with an OpenStack
 flavor on top. [0]
 

I think Zaqar is more like SMTP and IMAP than AMQP. You're not really
trying to connect two processes in real time. You're trying to do fully
asynchronous messaging with fully randomized access to any message.

Perhaps somebody should explore whether the approaches taken by large
scale IMAP providers could be applied to Zaqar.

Anyway, I can't imagine writing a system to intentionally use the
semantics of IMAP and SMTP. I'd be very interested in seeing actual use
cases for it, apologies if those have been posted before.

 Some things that differentiate Zaqar from SQS is it's capability for
 supporting different protocols without sacrificing multi-tenantcy and
 other intrinsic features it provides. Some protocols you may consider
 for Zaqar are: STOMP, MQTT.
 
 As far as the backend goes, Zaqar is not re-inventing it either. It sits
 on top of existing storage technologies that have proven to be fast and
 reliable for this task. The choice of using NoSQL technologies has a lot
 to do with this particular thing and the fact that Zaqar needs a storage
 capable of scaling, replicating and good support for failover.
 

What's odd to me is that other systems like Cassandra and Riak are not
being discussed. There are well documented large scale message storage
systems on both, and neither is encumbered by the same licensing FUD
as MongoDB.

Anyway, again, if we look at this as a place to store and retrieve
messages, and not as a queue, then talking about databases, instead of
message brokers, makes a lot more sense.

 
 - concern on the maturity of the non-AGPL NoSQL backend (Redis)
 

Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

2014-09-04 Thread Clint Byrum
Excerpts from Flavio Percoco's message of 2014-09-04 06:01:45 -0700:
 On 09/04/2014 02:14 PM, Sean Dague wrote:
  On 09/04/2014 03:08 AM, Flavio Percoco wrote:
  Greetings,
 
  Last Tuesday the TC held the first graduation review for Zaqar. During
  the meeting some concerns arose. I've listed those concerns below with
  some comments hoping that it will help starting a discussion before the
  next meeting. In addition, I've added some comments about the project
  stability at the bottom and an etherpad link pointing to a list of use
  cases for Zaqar.
 
  # Concerns
 
  - Concern on operational burden of requiring NoSQL deploy expertise to
  the mix of openstack operational skills
 
  For those of you not familiar with Zaqar, it currently supports 2 nosql
  drivers - MongoDB and Redis - and those are the only 2 drivers it
  supports for now. This will require operators willing to use Zaqar to
  maintain a new (?) NoSQL technology in their system. Before expressing
  our thoughts on this matter, let me say that:
 
  1. By removing the SQLAlchemy driver, we basically removed the chance
  for operators to use an already deployed OpenStack-technology
  2. Zaqar won't be backed by any AMQP based messaging technology for
  now. Here's[0] a summary of the research the team (mostly done by
  Victoria) did during Juno
  3. We (OpenStack) used to require Redis for the zmq matchmaker
  4. We (OpenStack) also use memcached for caching and as the oslo
  caching lib becomes available - or a wrapper on top of dogpile.cache -
  Redis may be used in place of memcached in more and more deployments.
  5. Ceilometer's recommended storage driver is still MongoDB, although
  Ceilometer has now support for sqlalchemy. (Please correct me if I'm 
  wrong).
 
  That being said, it's obvious we already, to some extent, promote some
  NoSQL technologies. However, for the sake of the discussion, lets assume
  we don't.
 
  I truly believe, with my OpenStack (not Zaqar's) hat on, that we can't
  keep avoiding these technologies. NoSQL technologies have been around
  for years and we should be prepared - including OpenStack operators - to
  support these technologies. Not every tool is good for all tasks - one
  of the reasons we removed the sqlalchemy driver in the first place -
  therefore it's impossible to keep an homogeneous environment for all
  services.
 
  With this, I'm not suggesting to ignore the risks and the extra burden
  this adds but, instead of attempting to avoid it completely by not
  evolving the stack of services we provide, we should probably work on
  defining a reasonable subset of NoSQL services we are OK with
  supporting. This will help making the burden smaller and it'll give
  operators the option to choose.
 
  [0] http://blog.flaper87.com/post/marconi-amqp-see-you-later/
  
  I've been one of the consistent voices concerned about a hard
  requirement on adding NoSQL into the mix. So I'll explain that thinking
  a bit more.
  
  I feel like when the TC makes an integration decision previously this
  has been about evaluating the project applying for integration, and if
  they met some specific criteria they were told about some time in the
  past. I think that's the wrong approach. It's a locally optimized
  approach that fails to ask the more interesting question.
  
  Is OpenStack better as a whole if this is a mandatory component of
  OpenStack? Better being defined as technically better (more features,
  less janky code work arounds, less unexpected behavior from the stack).
  Better from the sense of easier or harder to run an actual cloud by our
  Operators (taking into account what kinds of moving parts they are now
  expected to manage). Better from the sense of a better user experience
  in interacting with OpenStack as whole. Better from a sense that the
  OpenStack release will experience less bugs, less unexpected cross
  project interactions, an a greater overall feel of consistency so that
  the OpenStack API feels like one thing.
  
  https://dague.net/2014/08/26/openstack-as-layers/
  
  One of the interesting qualities of Layers 1 & 2 is they all follow an
  AMQP + RDBMS pattern (excepting swift). You can have a very effective
  IaaS out of that stack. They are the things that you can provide pretty
  solid integration testing on (and if you look at where everything stood
  before the new TC mandates on testing / upgrade that was basically what
  was getting integration tested). (Also note, I'll accept Barbican is
  probably in the wrong layer, and should be a Layer 2 service.)
  
  While large shops can afford to have a dedicated team to figure out how
  to make mongo or redis HA, provide monitoring, have a DR plan for when a
  huricane requires them to flip datacenters, that basically means
  OpenStack heads further down the path of only for the big folks. I
  don't want OpenStack to be only for the big folks, I want OpenStack to
  be for all sized folks. I really do 

Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

2014-09-04 Thread Clint Byrum
Excerpts from Flavio Percoco's message of 2014-09-04 02:11:15 -0700:
 Hey Clint,
 
 Thanks for reading, some comments in-line:
 
 On 09/04/2014 10:30 AM, Clint Byrum wrote:
  Excerpts from Flavio Percoco's message of 2014-09-04 00:08:47 -0700:
 
 [snip]
 
  - Concern on should we really reinvent a queue system rather than
  piggyback on one
 
  As mentioned in the meeting on Tuesday, Zaqar is not reinventing message
  brokers. Zaqar provides a service akin to SQS from AWS with an OpenStack
  flavor on top. [0]
 
  
  I think Zaqar is more like SMTP and IMAP than AMQP. You're not really
  trying to connect two processes in real time. You're trying to do fully
  asynchronous messaging with fully randomized access to any message.
  
  Perhaps somebody should explore whether the approaches taken by large
  scale IMAP providers could be applied to Zaqar.
  
  Anyway, I can't imagine writing a system to intentionally use the
  semantics of IMAP and SMTP. I'd be very interested in seeing actual use
  cases for it, apologies if those have been posted before.
  
  Some things that differentiate Zaqar from SQS is it's capability for
  supporting different protocols without sacrificing multi-tenantcy and
  other intrinsic features it provides. Some protocols you may consider
  for Zaqar are: STOMP, MQTT.
 
  As far as the backend goes, Zaqar is not re-inventing it either. It sits
  on top of existing storage technologies that have proven to be fast and
  reliable for this task. The choice of using NoSQL technologies has a lot
  to do with this particular thing and the fact that Zaqar needs a storage
  capable of scaling, replicating and good support for failover.
 
  
  What's odd to me is that other systems like Cassandra and Riak are not
  being discussed. There are well documented large scale message storage
  systems on both, and neither is encumbered by the same licensing FUD
  as MongoDB.
 
 FWIW, they both have been discussed. As far as Cassandra goes, we raised
 the red flag after reading this post[0]. The post itself may be
 obsolete already but I don't think I have enough knowledge about
 Cassandra to actually figure this out. Some folks have come to us asking
 for a Cassandra driver and they were interested in contributing/working
 on one. I really hope that will happen someday, although it'll certainly
 happen as an external driver. Riak, on the other hand, was certainly a
 good candidate. What made us go with MongoDB and Redis is they're both
 good for the job, they are both likely already deployed in OpenStack
 clouds and we have enough knowledge to provide support and maintenance
 for both drivers.

It seems like Cassandra is good for when you're going to be writing all
the time but only reading once. I would agree that this makes it less
attractive for a generalized messaging platform, since you won't know
how users will consume the messages, and if they are constantly
reading then you'll have terrible performance.

  # Use Cases
 
  In addition to the aforementioned concerns and comments, I also would
  like to share an etherpad that contains some use cases that other
  integrated projects have for Zaqar[0]. The list is not exhaustive and
  it'll contain more information before the next meeting.
 
  [0] https://etherpad.openstack.org/p/zaqar-integrated-projects-use-cases
 
  
  Just taking a look, there are two basic applications needed:
  
  1) An inbox. Horizon wants to know when snapshots are done. Heat wants
  to know what happened during a stack action. Etc.
  
  2) A user-focused message queue. Heat wants to push data to agents.
  Swift wants to synchronize processes when things happen.
  
  To me, #1 is Zaqar as it is today. #2 is the one that I worry may not
  be served best by bending #1 onto it.
 
 Push semantics are being developed. We've had enough discussions that
 have helped preparing the ground for it. However, I believe both use
 cases could be covered by Zaqar as-is.
 
 Could you elaborate a bit more on #2? Especially on why you think Zaqar
 as is can't serve this specific case?

The difference between 1 and 2 is that 2 is a true queueing problem. The
message should go away when it has been consumed, and the volume may be
rather high. With 1, you have a storage problem, and a database makes a
lot more sense. If users can stick to type 2 problems, they'll be able
to stay much more lightweight because they won't need a large data store
that supports random access.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] memory usage in devstack-gate (the oom-killer strikes again)

2014-09-08 Thread Clint Byrum
Excerpts from Joe Gordon's message of 2014-09-08 15:24:29 -0700:
 Hi All,
 
 We have recently started seeing assorted memory issues in the gate
 including the oom-killer [0] and libvirt throwing memory errors [1].
 Luckily we run ps and dstat on every devstack run so we have some insight
 into why we are running out of memory. Based on the output from job taken
 at random [2][3] a typical run consists of:
 
 * 68 openstack api processes alone
 * the following services are running 8 processes (number of CPUs on test
 nodes)
   * nova-api (we actually run 24 of these, 8 compute, 8 EC2, 8 metadata)
   * nova-conductor
   * cinder-api
   * glance-api
   * trove-api
   * glance-registry
   * trove-conductor
 * together nova-api, nova-conductor, cinder-api alone take over 45 %MEM
 (note: some of that is memory usage is counted multiple times as RSS
 includes shared libraries)
 * based on dstat numbers, it looks like we don't use that much memory
 before tempest runs, and after tempest runs we use a lot of memory.
 
 Based on this information I have two categories of questions:
 
 1) Should we explicitly set the number of workers that services use in
 devstack? Why have so many workers in a small all-in-one environment? What
 is the right balance here?

I'm kind of wondering why we aren't pushing everything to go the same
direction keystone did with Apache. I may be crazy, but Apache gives us
all kinds of tools for tuning process forking that we'd otherwise have to
reinvent in our own daemon bits (like MaxRequestsPerChild to prevent
leaky or slow GC from eating all our memory over time).
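
i.e. something along these lines (illustrative snippet only; the mount
point, paths and numbers are made up):

    # Recycle Apache children so leaks or slow GC can't accumulate forever.
    MaxRequestsPerChild 1000

    # mod_wsgi's equivalent for daemon processes, plus one place to tune
    # the process/thread count per service.
    WSGIDaemonProcess nova-api processes=4 threads=2 maximum-requests=1000
    WSGIScriptAlias /compute /var/www/nova/nova-api.wsgi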

Meanwhile, the idea behind running API processes with ncpu workers is
that we don't want to block an API request if there is a CPU available
to it. Of
course if we have enough cinder, nova, keystone, trove, etc. requests
all at one time that we do need to block, we defer to the CPU scheduler
of the box to do it, rather than queue things up at the event level.
This can lead to quite ugly CPU starvation issues, and that is a lot
easier to tune for if you have one tuning knob for apache + mod_wsgi
instead of nservices.

In production systems I'd hope that memory would be quite a bit more
available than on the bazillions of cloud instances that run tests. So,
while process-per-cpu-per-service is a large percentage of 8G, it is
a very small percentage of 24G+, which is a pretty normal amount of
memory to have on an all-in-one type of server that one might choose
as a baremetal controller. For VMs that are handling production loads,
it's a pretty easy trade-off to give them a little more RAM so they can
take advantage of all the CPUs as needed.

All this to say, since devstack is always expected to be run in a dev
context, and not production, I think it would make sense to dial it
back to 4 from ncpu.

 
 2) Should we be worried that some OpenStack services such as nova-api,
 nova-conductor and cinder-api take up so much memory? Does their memory
 usage keep growing over time, does anyone have any numbers to answer this?
 Why do these processes take up so much memory?

Yes I do think we should be worried that they grow quite a bit. I've
experienced this problem a few times in a few scripting languages, and
almost every time it turned out to be too much data being read from
the database or MQ. Moving to tighter messages, and tighter database
interaction, nearly always results in less wasted RAM.

I like the other suggestion to start graphing this. Since we have all
that dstat data, I wonder if we can just process that directly into
graphite.
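
For instance, dstat's CSV output could be replayed straight at graphite's
plaintext port with something as dumb as this (sketch only; it assumes the
run included -T/--epoch so the first column is a unix timestamp, and the
carbon host and metric prefix are made up):

    import csv
    import socket


    def _is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False


    def replay_dstat_csv(path, carbon_host='graphite', carbon_port=2003,
                         prefix='devstack.dstat'):
        with open(path) as f:
            rows = [r for r in csv.reader(f) if r]
        # Skip dstat's banner lines; the row just above the first numeric
        # row holds the column names.
        first = next(i for i, r in enumerate(rows) if _is_number(r[0]))
        names = [n.replace(' ', '_') or ('col%d' % i)
                 for i, n in enumerate(rows[first - 1])]
        sock = socket.create_connection((carbon_host, carbon_port))
        for row in rows[first:]:
            ts = row[0]
            for name, value in zip(names[1:], row[1:]):
                # graphite plaintext protocol: "metric value timestamp\n"
                sock.sendall(('%s.%s %s %s\n'
                              % (prefix, name, value, ts)).encode('utf8'))
        sock.close()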

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

2014-09-09 Thread Clint Byrum
Excerpts from Samuel Merritt's message of 2014-09-09 16:12:09 -0700:
 On 9/9/14, 12:03 PM, Monty Taylor wrote:
  On 09/04/2014 01:30 AM, Clint Byrum wrote:
  Excerpts from Flavio Percoco's message of 2014-09-04 00:08:47 -0700:
  Greetings,
 
  Last Tuesday the TC held the first graduation review for Zaqar. During
  the meeting some concerns arose. I've listed those concerns below with
  some comments hoping that it will help starting a discussion before the
  next meeting. In addition, I've added some comments about the project
  stability at the bottom and an etherpad link pointing to a list of use
  cases for Zaqar.
 
 
  Hi Flavio. This was an interesting read. As somebody whose attention has
  recently been drawn to Zaqar, I am quite interested in seeing it
  graduate.
 
  # Concerns
 
  - Concern on operational burden of requiring NoSQL deploy expertise to
  the mix of openstack operational skills
 
  For those of you not familiar with Zaqar, it currently supports 2 nosql
  drivers - MongoDB and Redis - and those are the only 2 drivers it
  supports for now. This will require operators willing to use Zaqar to
  maintain a new (?) NoSQL technology in their system. Before expressing
  our thoughts on this matter, let me say that:
 
   1. By removing the SQLAlchemy driver, we basically removed the
  chance
  for operators to use an already deployed OpenStack-technology
   2. Zaqar won't be backed by any AMQP based messaging technology for
  now. Here's[0] a summary of the research the team (mostly done by
  Victoria) did during Juno
   3. We (OpenStack) used to require Redis for the zmq matchmaker
   4. We (OpenStack) also use memcached for caching and as the oslo
  caching lib becomes available - or a wrapper on top of dogpile.cache -
  Redis may be used in place of memcached in more and more deployments.
   5. Ceilometer's recommended storage driver is still MongoDB,
  although
  Ceilometer has now support for sqlalchemy. (Please correct me if I'm
  wrong).
 
  That being said, it's obvious we already, to some extent, promote some
  NoSQL technologies. However, for the sake of the discussion, lets assume
  we don't.
 
  I truly believe, with my OpenStack (not Zaqar's) hat on, that we can't
  keep avoiding these technologies. NoSQL technologies have been around
  for years and we should be prepared - including OpenStack operators - to
  support these technologies. Not every tool is good for all tasks - one
  of the reasons we removed the sqlalchemy driver in the first place -
  therefore it's impossible to keep an homogeneous environment for all
  services.
 
 
  I whole heartedly agree that non traditional storage technologies that
  are becoming mainstream are good candidates for use cases where SQL
  based storage gets in the way. I wish there wasn't so much FUD
  (warranted or not) about MongoDB, but that is the reality we live in.
 
  With this, I'm not suggesting to ignore the risks and the extra burden
  this adds but, instead of attempting to avoid it completely by not
  evolving the stack of services we provide, we should probably work on
  defining a reasonable subset of NoSQL services we are OK with
  supporting. This will help making the burden smaller and it'll give
  operators the option to choose.
 
  [0] http://blog.flaper87.com/post/marconi-amqp-see-you-later/
 
 
  - Concern on should we really reinvent a queue system rather than
  piggyback on one
 
  As mentioned in the meeting on Tuesday, Zaqar is not reinventing message
  brokers. Zaqar provides a service akin to SQS from AWS with an OpenStack
  flavor on top. [0]
 
 
  I think Zaqar is more like SMTP and IMAP than AMQP. You're not really
  trying to connect two processes in real time. You're trying to do fully
  asynchronous messaging with fully randomized access to any message.
 
  Perhaps somebody should explore whether the approaches taken by large
  scale IMAP providers could be applied to Zaqar.
 
  Anyway, I can't imagine writing a system to intentionally use the
  semantics of IMAP and SMTP. I'd be very interested in seeing actual use
  cases for it, apologies if those have been posted before.
 
  It seems like you're EITHER describing something called XMPP that has at
  least one open source scalable backend called ejabberd. OR, you've
  actually hit the nail on the head with bringing up SMTP and IMAP but for
  some reason that feels strange.
 
  SMTP and IMAP already implement every feature you've described, as well
  as retries/failover/HA and a fully end to end secure transport (if
  installed properly) If you don't actually set them up to run as a public
  messaging interface but just as a cloud-local exchange, then you could
  get by with very low overhead for a massive throughput - it can very
  easily be run on a single machine for Sean's simplicity, and could just
  as easily be scaled out using well known techniques for public cloud
  sized deployments?
 
  So why not use existing daemons

Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

2014-09-09 Thread Clint Byrum
Excerpts from Devananda van der Veen's message of 2014-09-09 16:47:27 -0700:
 On Tue, Sep 9, 2014 at 4:12 PM, Samuel Merritt s...@swiftstack.com wrote:
  On 9/9/14, 12:03 PM, Monty Taylor wrote:
 [snip]
  So which is it? Because it sounds like to me it's a thing that actually
  does NOT need to diverge in technology in any way, but that I've been
  told that it needs to diverge because it's delivering a different set of
  features - and I'm pretty sure if it _is_ the thing that needs to
  diverge in technology because of its feature set, then it's a thing I
  don't think we should be implementing in python in OpenStack because it
  already exists and it's called AMQP.
 
 
  Whether Zaqar is more like AMQP or more like email is a really strange
  metric to use for considering its inclusion.
 
 
 I don't find this strange at all -- I had been judging the technical
 merits of Zaqar (ex-Marconi) for the last ~18 months based on the
 understanding that it aimed to provide Queueing-as-a-Service, and
 found its delivery of that to be lacking on technical grounds. The
 implementation did not meet my view of what a queue service should
 provide; it is based on some serious antipatterns (storing a queue in
 an RDBMS is probably the most obvious); and in fact, it isn't even
 queue-like in the access patterns enabled by the REST API (random
 access to a set != a queue). That was the basis for a large part of my
 objections to the project over time, and a source of frustration for
 me as the developers justified many of their positions rather than
 accepted feedback and changed course during the incubation period. The
 reason for this seems clear now...
 
 As was pointed out in the TC meeting today, Zaqar is (was?) actually
 aiming to provide Messaging-as-a-Service -- not queueing as a service!
 This is another way of saying it's more like email and less like
 AMQP, which means my but-its-not-a-queue objection to the project's
 graduation is irrelevant, and I need to rethink about all my previous
 assessments of the project.

Well said.



Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

2014-09-10 Thread Clint Byrum
Excerpts from Samuel Merritt's message of 2014-09-09 19:04:58 -0700:
 On 9/9/14, 4:47 PM, Devananda van der Veen wrote:
  On Tue, Sep 9, 2014 at 4:12 PM, Samuel Merritt s...@swiftstack.com wrote:
  On 9/9/14, 12:03 PM, Monty Taylor wrote:
  [snip]
  So which is it? Because it sounds like to me it's a thing that actually
  does NOT need to diverge in technology in any way, but that I've been
  told that it needs to diverge because it's delivering a different set of
  features - and I'm pretty sure if it _is_ the thing that needs to
  diverge in technology because of its feature set, then it's a thing I
  don't think we should be implementing in python in OpenStack because it
  already exists and it's called AMQP.
 
 
  Whether Zaqar is more like AMQP or more like email is a really strange
  metric to use for considering its inclusion.
 
 
  I don't find this strange at all -- I had been judging the technical
  merits of Zaqar (ex-Marconi) for the last ~18 months based on the
  understanding that it aimed to provide Queueing-as-a-Service, and
  found its delivery of that to be lacking on technical grounds. The
  implementation did not meet my view of what a queue service should
  provide; it is based on some serious antipatterns (storing a queue in
  an RDBMS is probably the most obvious); and in fact, it isn't even
  queue-like in the access patterns enabled by the REST API (random
  access to a set != a queue). That was the basis for a large part of my
  objections to the project over time, and a source of frustration for
  me as the developers justified many of their positions rather than
  accepted feedback and changed course during the incubation period. The
  reason for this seems clear now...
 
  As was pointed out in the TC meeting today, Zaqar is (was?) actually
  aiming to provide Messaging-as-a-Service -- not queueing as a service!
  This is another way of saying it's more like email and less like
  AMQP, which means my but-its-not-a-queue objection to the project's
  graduation is irrelevant, and I need to rethink about all my previous
  assessments of the project.
 
  The questions now before us are:
  - should OpenStack include, in the integrated release, a
  messaging-as-a-service component?
 
 I certainly think so. I've worked on a few reasonable-scale web 
 applications, and they all followed the same pattern: HTTP app servers 
 serving requests quickly, background workers for long-running tasks, and 
 some sort of durable message-broker/queue-server thing for conveying 
 work from the first to the second.
 
 A quick straw poll of my nearby coworkers shows that every non-trivial 
 web application that they've worked on in the last decade follows the 
 same pattern.
 
 While not *every* application needs such a thing, web apps are quite 
 common these days, and Zaqar satisfies one of their big requirements. 
 Not only that, it does so in a way that requires much less babysitting 
 than run-your-own-broker does.
 

I think you missed the distinction.

What you describe is _message queueing_. Not messaging. The difference
being the durability and addressability of each message.

As Devananda pointed out, a queue doesn't allow addressing the items in
the queue directly. You can generally only send, receive, ACK, or NACK.
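
To put that distinction in code terms, the two surfaces look roughly like
this (purely illustrative stubs, not anyone's actual API):

    class Queue(object):
        """Queue semantics: opaque, ordered, consume-and-forget."""

        def send(self, message):
            """Enqueue an opaque message."""

        def receive(self):
            """Return the next message and a delivery tag; there is no way
            to ask for a particular message."""

        def ack(self, tag):
            """Acknowledge (and discard) a delivered message."""

        def nack(self, tag):
            """Return a delivered message to the queue."""


    class Inbox(object):
        """Messaging semantics: durable, individually addressable messages."""

        def post(self, message):
            """Store a message and return its id."""

        def get(self, message_id):
            """Random access to any stored message by id."""

        def list(self, marker=None, limit=10):
            """Page through stored messages."""

        def delete(self, message_id):
            """Remove a specific message."""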



Re: [openstack-dev] [Heat] convergence flow diagrams

2014-09-10 Thread Clint Byrum
Excerpts from Angus Salkeld's message of 2014-09-08 17:15:04 -0700:
 On Mon, Sep 8, 2014 at 11:22 PM, Tyagi, Ishant ishant.ty...@hp.com wrote:
 
   Hi All,
 
 
 
  As per the heat mid cycle meetup whiteboard, we have created the
  flowchart and sequence diagram for the convergence . Can you please review
  these diagrams and provide your feedback?
 
 
 
  https://www.dropbox.com/sh/i8qbjtgfdxn4zx4/AAC6J-Nps8J12TzfuCut49ioa?dl=0
 
 
 Great! Good to see something.
 
 
 I was expecting something like:
 engine ~= like nova-conductor (it's the only process that talks to the db -
 make upgrading easier)

This complicates things immensely. The engine can just be the workers
too, we're just not going to do the observing and converging in the same
greenthread.

 observer - purely gets the actual state/properties and writes then to the
 db (via engine)

If you look closely at the diagrams, that's what it does.

 worker - has a job queue and grinds away at running those (resource
 actions)
 

The convergence worker is just another set of RPC API calls that split
out work into isolated chunks.

 Then engine then triggers on differences on goal vs. actual state and
 create a job and sends it to the job queue.

Remember, we're not targeting continuous convergence yet. Just
convergence when we ask for things.

 - so, on create it sees there is no actual state so it sends a create job
 for the first resource to the worker queue

The diagram shows that, but confusingly says "is difference = 1". In
the original whiteboard this was 'if diff = DNE', where DNE stands for
Does Not Exist.

 - when the observer writes the new state for that resource it triggers the
 next resource create in the dependency tree.

Not the next resource create, but the next resource convergence. And not
just one either. I think one of the graphs was forgotten, it goes like
this:

https://www.dropbox.com/s/1h2ee151iriv4i1/resolve_graph.svg?dl=0

That is what we called 'return happy' because we were at hour 9 or so of
talking and we got a bit punchy. I've renamed it 'resolve_graph'.

 - like any system that relies on notifications we need timeouts and each
 stack needs a periodic notification to make sure


This is, again, the continuous observer model.

https://review.openstack.org/#/c/100012/

   that progress is being made or notify the user that no progress is being
 made.
 
 One question about the observer (in either my setup or the one in the
 diagram).
 - If we are relying on rpc notifications all the observer processes will
 receive a copy of the same notification

Please read that spec. We talk about a filter.



Re: [openstack-dev] [Zaqar] Zaqar graduation (round 2) [was: Comments on the concerns arose during the TC meeting]

2014-09-10 Thread Clint Byrum
Excerpts from Gordon Sim's message of 2014-09-10 06:18:52 -0700:
 On 09/10/2014 09:58 AM, Flavio Percoco wrote:
  To clarify the doubts of what Zaqar is or it's not, let me quote what's
  written in the project's overview section[0]:
 
  Zaqar is a multi-tenant cloud messaging service for web developers.
 
 How are different tenants isolated from each other? Can different 
 tenants access the same queue? If so, what does Zaqar do to prevent one 
 tenant from negatively affecting the other? If not, how is communication 
 with other tenants achieved?
 
 Most messaging systems allow authorisation to be used to restrict what a 
 particular user can access and quotas to restrict their resource 
 consumption. What does Zaqar do differently?
 
  It
  combines the ideas pioneered by Amazon's SQS product with additional
  semantics to support event broadcasting.
 
  The service features a fully RESTful API, which developers can use to
  send messages between various components of their SaaS and mobile
  applications, by using a variety of communication patterns. Underlying
  this API is an efficient messaging engine designed with scalability and
  security in mind.
 
  Other OpenStack components can integrate with Zaqar to surface events
  to end users and to communicate with guest agents that run in the
  over-cloud layer.
 
 I may be misunderstanding the last sentence, but I think *direct* 
 integration of other OpenStack services with Zaqar would be a bad idea.
 
 Wouldn't this be better done through olso.messaging's notifications in 
 some way? and/or through some standard protocol (and there's more than 
 one to choose from)?
 

It's not direct, nobody is suggesting that.

What people are suggesting is that a user would be able to tell Nova
to put any messages that it would want to deliver into a _user_ focused
queue/inbox.

This has nothing to do with oslo.messaging. Users don't want many options
for backends. They want a simple message passing interface so they don't
have to babysit one and choose one.

Certainly the undercloud Zaqar API could be based on the existing
oslo.messaging notifications. A simple daemon that sits between the oslo
notifications firehose and Zaqar's user queues would be quite efficient.
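
A rough sketch of such a daemon, just to make its shape concrete (the
Zaqar endpoint, queue naming convention and auth handling below are
assumptions loosely based on the v1 HTTP API; transport settings come
from the usual oslo.config files):

    import json
    import uuid

    import requests
    from oslo.config import cfg
    from oslo import messaging

    ZAQAR = 'http://zaqar.example.org:8888/v1'  # assumed endpoint
    HEADERS = {'Content-Type': 'application/json',
               'Client-ID': str(uuid.uuid4())}  # Zaqar v1 wants a Client-ID


    class Forwarder(object):
        """Copy notifications into a per-tenant Zaqar queue."""

        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            tenant = (payload or {}).get('tenant_id') or (ctxt or {}).get('project_id')
            if not tenant:
                return
            queue = 'events-%s' % tenant  # naming convention is an assumption
            headers = dict(HEADERS, **{'X-Project-Id': tenant})
            # v1 queues are not lazy, so make sure the queue exists first.
            requests.put('%s/queues/%s' % (ZAQAR, queue), headers=headers)
            body = [{'ttl': 3600,
                     'body': {'event_type': event_type,
                              'publisher_id': publisher_id,
                              'payload': payload}}]
            requests.post('%s/queues/%s/messages' % (ZAQAR, queue),
                          data=json.dumps(body), headers=headers)


    def main():
        # Transport settings (rabbit credentials and so on) come from the
        # usual oslo.config files.
        transport = messaging.get_transport(cfg.CONF)
        targets = [messaging.Target(topic='notifications')]
        listener = messaging.get_notification_listener(
            transport, targets, [Forwarder()], executor='blocking')
        listener.start()
        listener.wait()


    if __name__ == '__main__':
        main()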

However, putting the whole burden of talking directly to a notification
bus on the users is unnecessarily complex... especially if they use Java
and have no idea what oslo is.

 Communicating through a specific, fixed messaging system, with its own 
 unique protocol is actually a step backwards in my opinion, especially 
 for things that you want to keep as loosely coupled as possible. This is 
 exactly why various standard protocols emerged.
 

You're thinking like an operator. Think like an application developer.
They're asking you "how do I subscribe to notifications about _just my
instances_ from Nova?", not "how do I pump 40,000 messages per second
through a message bus that I fully control?"



Re: [openstack-dev] [Zaqar] Zaqar graduation (round 2) [was: Comments on the concerns arose during the TC meeting]

2014-09-11 Thread Clint Byrum
Excerpts from Flavio Percoco's message of 2014-09-11 04:14:30 -0700:
 On 09/10/2014 03:45 PM, Gordon Sim wrote:
  On 09/10/2014 01:51 PM, Thierry Carrez wrote:
  I think we do need, as Samuel puts it, some sort of durable
  message-broker/queue-server thing. It's a basic application building
  block. Some claim it's THE basic application building block, more useful
  than database provisioning. It's definitely a layer above pure IaaS, so
  if we end up splitting OpenStack into layers this clearly won't be in
  the inner one. But I think IaaS+ basic application building blocks
  belong in OpenStack one way or another. That's the reason I supported
  Designate (everyone needs DNS) and Trove (everyone needs DBs).
 
  With that said, I think yesterday there was a concern that Zaqar might
  not fill the some sort of durable message-broker/queue-server thing
  role well. The argument goes something like: if it was a queue-server
  then it should actually be built on top of Rabbit; if it was a
  message-broker it should be built on top of postfix/dovecot; the current
  architecture is only justified because it's something in between, so
  it's broken.
  
  What is the distinction between a message broker and a queue server? To
  me those terms both imply something broadly similar (message broker
  perhaps being a little bit more generic). I could see Zaqar perhaps as
  somewhere between messaging and data-storage.
 
 I agree with Gordon here. I really don't know how to say this without
 creating more confusion. Zaqar is a messaging service. Messages are the
 most important entity in Zaqar. This, however, does not forbid anyone to
 use Zaqar as a queue. It has the required semantics, it guarantees FIFO
 and other queuing specific patterns. This doesn't mean Zaqar is trying
 to do something outside its scope, it comes for free.
 

It comes with a huge cost actually, so saying it comes for free is a
misrepresentation. It is a side effect of developing a superset of
queueing. But that superset is only useful to a small number of your
stated use cases. Many of your use cases (including the one I've been
involved with, Heat pushing metadata to servers) are entirely served by
the much simpler, much lighter weight, pure queueing service.

 Is Zaqar being optimized as a *queuing* service? I'd say no. Our goal is
 to optimize Zaqar for delivering messages and supporting different
 messaging patterns.
 

Awesome! Just please don't expect people to get excited about it for
the lighter weight queueing workloads that you've claimed as use cases.

I totally see Horizon using it to keep events for users. I see Heat
using it for stack events as well. I would bet that Trove would benefit
from being able to communicate messages to users.

But I think in between Zaqar and the backends will likely be a lighter
weight queue-only service that the users can just subscribe to when they
don't want an inbox. And I think that lighter weight queue service is
far more important for OpenStack than the full blown random access
inbox.

I think the reason such a thing has not appeared is because we were all
sort of running into "but Zaqar is already incubated". Now that we've
fleshed out the difference, I think those of us that need a lightweight
multi-tenant queue service should add it to OpenStack.  Separately. I hope
that doesn't offend you and the rest of the excellent Zaqar developers. It
is just a different thing.

 Should we remove all the semantics that allow people to use Zaqar as a
 queue service? I don't think so either. Again, the semantics are there
 because Zaqar is using them to do its job. Whether other folks may/may
 not use Zaqar as a queue service is out of our control.
 
 This doesn't mean the project is broken.
 

No, definitely not broken. It just isn't actually necessary for many of
the stated use cases.



Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

2014-09-11 Thread Clint Byrum
Excerpts from Zane Bitter's message of 2014-09-11 15:21:26 -0700:
 On 09/09/14 19:56, Clint Byrum wrote:
  Excerpts from Samuel Merritt's message of 2014-09-09 16:12:09 -0700:
  On 9/9/14, 12:03 PM, Monty Taylor wrote:
  On 09/04/2014 01:30 AM, Clint Byrum wrote:
  Excerpts from Flavio Percoco's message of 2014-09-04 00:08:47 -0700:
  Greetings,
 
  Last Tuesday the TC held the first graduation review for Zaqar. During
  the meeting some concerns arose. I've listed those concerns below with
  some comments hoping that it will help starting a discussion before the
  next meeting. In addition, I've added some comments about the project
  stability at the bottom and an etherpad link pointing to a list of use
  cases for Zaqar.
 
 
  Hi Flavio. This was an interesting read. As somebody whose attention has
  recently been drawn to Zaqar, I am quite interested in seeing it
  graduate.
 
  # Concerns
 
  - Concern on operational burden of requiring NoSQL deploy expertise to
  the mix of openstack operational skills
 
  For those of you not familiar with Zaqar, it currently supports 2 nosql
  drivers - MongoDB and Redis - and those are the only 2 drivers it
  supports for now. This will require operators willing to use Zaqar to
  maintain a new (?) NoSQL technology in their system. Before expressing
  our thoughts on this matter, let me say that:
 
1. By removing the SQLAlchemy driver, we basically removed the
  chance
  for operators to use an already deployed OpenStack-technology
2. Zaqar won't be backed by any AMQP based messaging technology 
  for
  now. Here's[0] a summary of the research the team (mostly done by
  Victoria) did during Juno
3. We (OpenStack) used to require Redis for the zmq matchmaker
4. We (OpenStack) also use memcached for caching and as the oslo
  caching lib becomes available - or a wrapper on top of dogpile.cache -
  Redis may be used in place of memcached in more and more deployments.
5. Ceilometer's recommended storage driver is still MongoDB,
  although
  Ceilometer has now support for sqlalchemy. (Please correct me if I'm
  wrong).
 
  That being said, it's obvious we already, to some extent, promote some
  NoSQL technologies. However, for the sake of the discussion, lets assume
  we don't.
 
  I truly believe, with my OpenStack (not Zaqar's) hat on, that we can't
  keep avoiding these technologies. NoSQL technologies have been around
  for years and we should be prepared - including OpenStack operators - to
  support these technologies. Not every tool is good for all tasks - one
  of the reasons we removed the sqlalchemy driver in the first place -
  therefore it's impossible to keep an homogeneous environment for all
  services.
 
 
  I whole heartedly agree that non traditional storage technologies that
  are becoming mainstream are good candidates for use cases where SQL
  based storage gets in the way. I wish there wasn't so much FUD
  (warranted or not) about MongoDB, but that is the reality we live in.
 
  With this, I'm not suggesting to ignore the risks and the extra burden
  this adds but, instead of attempting to avoid it completely by not
  evolving the stack of services we provide, we should probably work on
  defining a reasonable subset of NoSQL services we are OK with
  supporting. This will help making the burden smaller and it'll give
  operators the option to choose.
 
  [0] http://blog.flaper87.com/post/marconi-amqp-see-you-later/
 
 
  - Concern on should we really reinvent a queue system rather than
  piggyback on one
 
  As mentioned in the meeting on Tuesday, Zaqar is not reinventing message
  brokers. Zaqar provides a service akin to SQS from AWS with an OpenStack
  flavor on top. [0]
 
 
  I think Zaqar is more like SMTP and IMAP than AMQP. You're not really
  trying to connect two processes in real time. You're trying to do fully
  asynchronous messaging with fully randomized access to any message.
 
  Perhaps somebody should explore whether the approaches taken by large
  scale IMAP providers could be applied to Zaqar.
 
  Anyway, I can't imagine writing a system to intentionally use the
  semantics of IMAP and SMTP. I'd be very interested in seeing actual use
  cases for it, apologies if those have been posted before.
 
  It seems like you're EITHER describing something called XMPP that has at
  least one open source scalable backend called ejabberd. OR, you've
  actually hit the nail on the head with bringing up SMTP and IMAP but for
  some reason that feels strange.
 
  SMTP and IMAP already implement every feature you've described, as well
  as retries/failover/HA and a fully end to end secure transport (if
  installed properly) If you don't actually set them up to run as a public
  messaging interface but just as a cloud-local exchange, then you could
  get by with very low overhead for a massive throughput - it can very
  easily be run on a single machine for Sean's simplicity, and could just
  as easily

Re: [openstack-dev] [Zaqar] Zaqar graduation (round 2) [was: Comments on the concerns arose during the TC meeting]

2014-09-12 Thread Clint Byrum
Excerpts from Mark McLoughlin's message of 2014-09-12 03:27:42 -0700:
 On Wed, 2014-09-10 at 14:51 +0200, Thierry Carrez wrote:
  Flavio Percoco wrote:
   [...]
   Based on the feedback from the meeting[3], the current main concern is:
   
   - Do we need a messaging service with a feature-set akin to SQS+SNS?
   [...]
  
  I think we do need, as Samuel puts it, some sort of durable
  message-broker/queue-server thing. It's a basic application building
  block. Some claim it's THE basic application building block, more useful
  than database provisioning. It's definitely a layer above pure IaaS, so
  if we end up splitting OpenStack into layers this clearly won't be in
  the inner one. But I think IaaS+ basic application building blocks
  belong in OpenStack one way or another. That's the reason I supported
  Designate (everyone needs DNS) and Trove (everyone needs DBs).
  
  With that said, I think yesterday there was a concern that Zaqar might
  not fill the some sort of durable message-broker/queue-server thing
  role well. The argument goes something like: if it was a queue-server
  then it should actually be built on top of Rabbit; if it was a
  message-broker it should be built on top of postfix/dovecot; the current
  architecture is only justified because it's something in between, so
  it's broken.
  
  I guess I don't mind that much zaqar being something in between:
  unless I misunderstood, exposing extra primitives doesn't prevent the
  queue-server use case from being filled. Even considering the
  message-broker case, I'm also not convinced building it on top of
  postfix/dovecot would be a net win compared to building it on top of
  Redis, to be honest.
 
 AFAICT, this part of the debate boils down to the following argument:
 
   If Zaqar implemented messaging-as-a-service with only queuing 
    semantics (and no random access semantics), its design would 
   naturally be dramatically different and simply implement a 
   multi-tenant REST API in front of AMQP queues like this:
 
 https://www.dropbox.com/s/yonloa9ytlf8fdh/ZaqarQueueOnly.png?dl=0
 
   and that this architecture would allow for dramatically improved 
   throughput for end-users while not making the cost of providing the 
   service prohibitive to operators.
 
 You can't dismiss that argument out-of-hand, but I wonder (a) whether
 the claimed performance improvement is going to make a dramatic
 difference to the SQS-like use case and (b) whether backing this thing
 with an RDBMS and multiple highly available, durable AMQP broker
 clusters is going to be too much of a burden on operators for whatever
 performance improvements it does gain.

Having had experience taking queue-only data out of RDBMSes and even SMTP
solutions and putting it into actual queues, I can say that the result was
generally quite a bit more reliable and cheaper to maintain.

However, as I've been thinking about this more, I am concerned about the
complexity of trying to use a stateless protocol like HTTP for reliable
delivery, given that these queues all use a session model that relies
on connection persistence. That may very well invalidate my hypothesis.
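
To illustrate the concern: over stateless HTTP, reliable delivery ends up
as claim-and-delete with TTLs rather than connection-scoped acks, roughly
like this (going by Zaqar's v1 claims API; the endpoint and auth details
are assumptions):

    import json
    import uuid

    import requests

    ZAQAR = 'http://zaqar.example.org:8888/v1'  # assumed endpoint
    HEADERS = {'Content-Type': 'application/json',
               'Client-ID': str(uuid.uuid4())}


    def consume(queue):
        """Claim a batch of messages, process them, then delete each one.

        If the consumer dies before deleting, the claim's TTL expires and the
        messages become claimable again; that is the stateless stand-in for an
        unacknowledged message being redelivered when a connection drops.
        """
        resp = requests.post('%s/queues/%s/claims' % (ZAQAR, queue),
                             data=json.dumps({'ttl': 300, 'grace': 60}),
                             headers=HEADERS)
        if resp.status_code != 201:
            return  # nothing to claim right now
        for msg in resp.json():
            print(msg['body'])
            # Deleting the claimed message is the moral equivalent of an ACK.
            requests.delete(ZAQAR + msg['href'], headers=HEADERS)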

 
 But the troubling part of this debate is where we repeatedly batter the
 Zaqar team with hypotheses like these and appear to only barely
 entertain their carefully considered justification for their design
 decisions like:
 
   
 https://wiki.openstack.org/wiki/Frequently_asked_questions_%28Zaqar%29#Is_Zaqar_a_provisioning_service_or_a_data_API.3F
   
 https://wiki.openstack.org/wiki/Frequently_asked_questions_%28Zaqar%29#What_messaging_patterns_does_Zaqar_support.3F
 
 I would like to see an SQS-like API provided by OpenStack, I accept the
 reasons for Zaqar's design decisions to date, I respect that those
 decisions were made carefully by highly competent members of our
 community and I expect Zaqar to evolve (like all projects) in the years
 ahead based on more real-world feedback, new hypotheses or ideas, and
 lessons learned from trying things out.

I have read those and I truly believe that the Zaqar team, who I believe
are already a valuable part of the OpenStack family, are doing good work.
Seriously, I believe it is valuable as is and I trust them to do what
they have stated they will do.

Let me explain my position again. Heat is in dire need of an efficient way
to communicate with instances. It has no need for a full messaging
stack... just a way for users to have things pushed from Heat
to their instances efficiently.

So, to reiterate why I keep going on about this: If a messaging service
is to become an integrated part of OpenStack's release, we should think
carefully about the ramifications for operators _and_ users of not
having a lightweight queue-only option, when that seems to fit _most_
of the use cases.



Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

2014-09-12 Thread Clint Byrum
Excerpts from Flavio Percoco's message of 2014-09-12 00:22:35 -0700:
 On 09/12/2014 03:29 AM, Clint Byrum wrote:
  Excerpts from Zane Bitter's message of 2014-09-11 15:21:26 -0700:
  On 09/09/14 19:56, Clint Byrum wrote:
  Excerpts from Samuel Merritt's message of 2014-09-09 16:12:09 -0700:
  On 9/9/14, 12:03 PM, Monty Taylor wrote:
  On 09/04/2014 01:30 AM, Clint Byrum wrote:
  Excerpts from Flavio Percoco's message of 2014-09-04 00:08:47 -0700:
  Greetings,
 
  Last Tuesday the TC held the first graduation review for Zaqar. During
  the meeting some concerns arose. I've listed those concerns below with
  some comments hoping that it will help starting a discussion before 
  the
  next meeting. In addition, I've added some comments about the project
  stability at the bottom and an etherpad link pointing to a list of use
  cases for Zaqar.
 
 
  Hi Flavio. This was an interesting read. As somebody whose attention 
  has
  recently been drawn to Zaqar, I am quite interested in seeing it
  graduate.
 
  # Concerns
 
  - Concern on operational burden of requiring NoSQL deploy expertise to
  the mix of openstack operational skills
 
  For those of you not familiar with Zaqar, it currently supports 2 
  nosql
  drivers - MongoDB and Redis - and those are the only 2 drivers it
  supports for now. This will require operators willing to use Zaqar to
  maintain a new (?) NoSQL technology in their system. Before expressing
  our thoughts on this matter, let me say that:
 
1. By removing the SQLAlchemy driver, we basically removed the
  chance
  for operators to use an already deployed OpenStack-technology
2. Zaqar won't be backed by any AMQP based messaging technology 
  for
  now. Here's[0] a summary of the research the team (mostly done by
  Victoria) did during Juno
3. We (OpenStack) used to require Redis for the zmq matchmaker
4. We (OpenStack) also use memcached for caching and as the oslo
  caching lib becomes available - or a wrapper on top of dogpile.cache -
  Redis may be used in place of memcached in more and more deployments.
5. Ceilometer's recommended storage driver is still MongoDB,
  although
  Ceilometer has now support for sqlalchemy. (Please correct me if I'm
  wrong).
 
  That being said, it's obvious we already, to some extent, promote some
  NoSQL technologies. However, for the sake of the discussion, lets 
  assume
  we don't.
 
  I truly believe, with my OpenStack (not Zaqar's) hat on, that we can't
  keep avoiding these technologies. NoSQL technologies have been around
  for years and we should be prepared - including OpenStack operators - 
  to
  support these technologies. Not every tool is good for all tasks - one
  of the reasons we removed the sqlalchemy driver in the first place -
  therefore it's impossible to keep an homogeneous environment for all
  services.
 
 
  I whole heartedly agree that non traditional storage technologies that
  are becoming mainstream are good candidates for use cases where SQL
  based storage gets in the way. I wish there wasn't so much FUD
  (warranted or not) about MongoDB, but that is the reality we live in.
 
  With this, I'm not suggesting to ignore the risks and the extra burden
  this adds but, instead of attempting to avoid it completely by not
  evolving the stack of services we provide, we should probably work on
  defining a reasonable subset of NoSQL services we are OK with
  supporting. This will help making the burden smaller and it'll give
  operators the option to choose.
 
  [0] http://blog.flaper87.com/post/marconi-amqp-see-you-later/
 
 
  - Concern on should we really reinvent a queue system rather than
  piggyback on one
 
  As mentioned in the meeting on Tuesday, Zaqar is not reinventing 
  message
  brokers. Zaqar provides a service akin to SQS from AWS with an 
  OpenStack
  flavor on top. [0]
 
 
  I think Zaqar is more like SMTP and IMAP than AMQP. You're not really
  trying to connect two processes in real time. You're trying to do fully
  asynchronous messaging with fully randomized access to any message.
 
  Perhaps somebody should explore whether the approaches taken by large
  scale IMAP providers could be applied to Zaqar.
 
  Anyway, I can't imagine writing a system to intentionally use the
  semantics of IMAP and SMTP. I'd be very interested in seeing actual use
  cases for it, apologies if those have been posted before.
 
  It seems like you're EITHER describing something called XMPP that has at
  least one open source scalable backend called ejabberd. OR, you've
  actually hit the nail on the head with bringing up SMTP and IMAP but for
  some reason that feels strange.
 
  SMTP and IMAP already implement every feature you've described, as well
  as retries/failover/HA and a fully end to end secure transport (if
  installed properly) If you don't actually set them up to run as a public
  messaging interface but just as a cloud-local exchange, then you could
  get by with very low

Re: [openstack-dev] [Heat] Defining what is a SupportStatus version

2014-09-14 Thread Clint Byrum
Excerpts from Gauvain Pocentek's message of 2014-09-04 22:29:05 -0700:
 Hi,
 
 A bit of background: I'm working on the publication of the HOT 
 resources reference on docs.openstack.org. This book is mostly 
 autogenerated from the heat source code, using the sphinx XML output. To 
 avoid publishing several references (one per released version, as is 
 done for the OpenStack config-reference), I'd like to add information 
 about the support status of each resource (when they appeared, when 
 they've been deprecated, and so on).
 
 So the plan is to use the SupportStatus class and its `version` 
 attribute (see https://review.openstack.org/#/c/116443/ ). And the 
 question is, what information should the version attribute hold? 
 Possibilities include the release code name (Icehouse, Juno), or the 
 release version (2014.1, 2014.2). But this wouldn't be useful for users 
 of clouds continuously deployed.
 
  From my documenter point of view, using the code name seems the right 
 option, because it fits with the rest of the documentation.
 
 What do you think would be the best choice from the heat devs POV?

What we ship in-tree is the standard library for Heat. I think Heat
should not tie things to the release of OpenStack, but only to itself.

The idea is to simply version the standard library of resources separately
even from the language. Added resources and properties would be minor
bumps, deprecating or removing anything would be a major bump. Users then
just need an API call that allows querying the standard library version.
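
For example, a resource plugin might then carry something like this (a
sketch assuming the SupportStatus version attribute from the review above,
with invented version numbers):

    from heat.engine import properties
    from heat.engine import resource
    from heat.engine import support


    class ExampleResource(resource.Resource):
        """Hypothetical resource tagged with standard-library versions."""

        # Introduced in (made up) standard library version 1.2.0.
        support_status = support.SupportStatus(version='1.2.0')

        properties_schema = {
            'foo': properties.Schema(
                properties.Schema.STRING,
                'An example property.',
                # Added later, so it gets its own minor-version tag.
                support_status=support.SupportStatus(version='1.3.0'),
            ),
        }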

With this scheme, we can provide a gate test that prevents breaking the
rules, and automatically generate the docs still. Doing this would sync
better with continuous deployers who will be running Juno well before
there is a 2014.2.

Anyway, Heat largely exists to support portability of apps between
OpenStack clouds. Many many OpenStack clouds don't run one release,
and we don't require them to do so. So tying to the release is, IMO,
a poor choice. We do the same thing with HOT's internals, so why not also
do the standard library this way?



Re: [openstack-dev] [tripleo][heat][ironic] Heat Ironic resources and ready state orchestration

2014-09-15 Thread Clint Byrum
Excerpts from James Slagle's message of 2014-09-15 08:15:21 -0700:
 On Mon, Sep 15, 2014 at 7:44 AM, Steven Hardy sha...@redhat.com wrote:
  All,
 
  Starting this thread as a follow-up to a strongly negative reaction by the
  Ironic PTL to my patches[1] adding initial Heat-Ironic integration, and
  subsequent very detailed justification and discussion of why they may be
  useful in this spec[2].
 
  Back in Atlanta, I had some discussions with folks interesting in making
  ready state[3] preparation of bare-metal resources possible when
  deploying bare-metal nodes via TripleO/Heat/Ironic.
 
 After a cursory reading of the references, it seems there's a couple of 
 issues:
 - are the features to move hardware to a ready-state even going to
 be in Ironic proper, whether that means in ironic at all or just in
 contrib.
 - assuming some of the features are there, should Heat have any Ironic
 resources given that Ironic's API is admin-only.
 
 
  The initial assumption is that there is some discovery step (either
  automatic or static generation of a manifest of nodes), that can be input
  to either Ironic or Heat.
 
 I think it makes a lot of sense to use Heat to do the bulk
 registration of nodes via Ironic. I understand the argument that the
 Ironic API should be admin-only a little bit for the non-TripleO
 case, but for TripleO, we only have admins interfacing with the
 Undercloud. The user of a TripleO undercloud is the deployer/operator
 and in some scenarios this may not be the undercloud admin. So,
 talking about TripleO, I don't really buy that the Ironic API is
 admin-only.
 
 Therefore, why not have some declarative Heat resources for things
 like Ironic nodes, that the deployer can make use of in a Heat
 template to do bulk node registration?
 
 The alternative listed in the spec:
 
 Don’t implement the resources and rely on scripts which directly
 interact with the Ironic API, prior to any orchestration via Heat.
 
 would just be a bit silly IMO. That goes against one of the main
 drivers of TripleO, which is to use OpenStack wherever possible. Why
 go off and write some other thing that is going to parse a
 json/yaml/csv of nodes and orchestrate a bunch of Ironic api calls?
 Why would it be ok for that other thing to use Ironic's admin-only
 API yet claim it's not ok for Heat on the undercloud to do so?
 

An alternative that is missed, is to just define a bulk loading format
for hardware, or adopt an existing one (I find it hard to believe there
isn't already an open format for this), and make use of it in Ironic.

The analogy I'd use is shipping dry goods in a refrigerated truck.
It's heavier, has a bit less capacity, and carries unnecessary features. If all
you have is the refrigerated truck, ok. But we're talking about _building_
a special dry-goods add-on to our refrigerated truck (Heat) to avoid
building the same thing into the regular trucks we already have (Ironic).
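
To make the dry-goods version concrete, the bulk-load step could be little
more than a manifest file and a few lines of python-ironicclient (the
manifest layout and field names below are invented for illustration):

    import json

    from ironicclient import client


    def load_nodes(manifest_path, **auth):
        """Register every node described in a simple JSON manifest.

        auth is the usual keystone plumbing: os_username, os_password,
        os_tenant_name, os_auth_url.
        """
        ironic = client.get_client(1, **auth)
        with open(manifest_path) as f:
            nodes = json.load(f)
        for node in nodes:
            created = ironic.node.create(
                driver=node['driver'],                  # e.g. 'pxe_ipmitool'
                driver_info=node['driver_info'],        # ipmi address/user/password
                properties=node.get('properties', {}))  # cpus, memory_mb, local_gb
            for mac in node.get('macs', []):
                ironic.port.create(node_uuid=created.uuid, address=mac)

    # A manifest (nodes.json) would look something like:
    # [{"driver": "pxe_ipmitool",
    #   "driver_info": {"ipmi_address": "10.0.0.5",
    #                   "ipmi_username": "admin",
    #                   "ipmi_password": "secret"},
    #   "properties": {"cpus": "8", "memory_mb": "16384", "local_gb": "500"},
    #   "macs": ["52:54:00:aa:bb:cc"]}]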

  Following discovery, but before an undercloud deploying OpenStack onto the
  nodes, there are a few steps which may be desired, to get the hardware into
  a state where it's ready and fully optimized for the subsequent deployment:
 
  - Updating and aligning firmware to meet requirements of qualification or
site policy
  - Optimization of BIOS configuration to match workloads the node is
expected to run
  - Management of machine-local storage, e.g configuring local RAID for
optimal resilience or performance.
 
  Interfaces to Ironic are landing (or have landed)[4][5][6] which make many
  of these steps possible, but there's no easy way to either encapsulate the
  (currently mostly vendor specific) data associated with each step, or to
  coordinate sequencing of the steps.
 
  What is required is some tool to take a text definition of the required
  configuration, turn it into a correctly sequenced series of API calls to
  Ironic, expose any data associated with those API calls, and declare
  success or failure on completion.  This is what Heat does.
 
  So the idea is to create some basic (contrib, disabled by default) Ironic
  heat resources, then explore the idea of orchestrating ready-state
  configuration via Heat.
 
  Given that Devananda and I have been banging heads over this for some time
  now, I'd like to get broader feedback of the idea, my interpretation of
  ready state applied to the tripleo undercloud, and any alternative
  implementation ideas.
 
 My opinion is that if the features are in Ironic, they should be
 exposed via Heat resources for orchestration. If the TripleO case is
 too much of a one-off (which I don't really think it is), then sure,
 keep it all in contrib so that no one gets confused about why the
 resources are there.
 

And I think if this is a common thing that Ironic users need to do,
then Ironic should do it, not Heat.


Re: [openstack-dev] [Zaqar] Zaqar graduation (round 2) [was: Comments on the concerns arose during the TC meeting]

2014-09-15 Thread Clint Byrum
Excerpts from Flavio Percoco's message of 2014-09-15 00:57:05 -0700:
 On 09/12/2014 07:13 PM, Clint Byrum wrote:
  Excerpts from Thierry Carrez's message of 2014-09-12 02:16:42 -0700:
  Clint Byrum wrote:
  Excerpts from Flavio Percoco's message of 2014-09-11 04:14:30 -0700:
  Is Zaqar being optimized as a *queuing* service? I'd say no. Our goal is
  to optimize Zaqar for delivering messages and supporting different
  messaging patterns.
 
  Awesome! Just please don't expect people to get excited about it for
  the lighter weight queueing workloads that you've claimed as use cases.
 
  I totally see Horizon using it to keep events for users. I see Heat
  using it for stack events as well. I would bet that Trove would benefit
  from being able to communicate messages to users.
 
  But I think in between Zaqar and the backends will likely be a lighter
  weight queue-only service that the users can just subscribe to when they
  don't want an inbox. And I think that lighter weight queue service is
  far more important for OpenStack than the full blown random access
  inbox.
 
  I think the reason such a thing has not appeared is because we were all
  sort of running into but Zaqar is already incubated. Now that we've
  fleshed out the difference, I think those of us that need a lightweight
  multi-tenant queue service should add it to OpenStack.  Separately. I hope
  that doesn't offend you and the rest of the excellent Zaqar developers. It
  is just a different thing.
 
  Should we remove all the semantics that allow people to use Zaqar as a
  queue service? I don't think so either. Again, the semantics are there
  because Zaqar is using them to do its job. Whether other folks may/may
  not use Zaqar as a queue service is out of our control.
 
  This doesn't mean the project is broken.
 
  No, definitely not broken. It just isn't actually necessary for many of
  the stated use cases.
 
  Clint,
 
  If I read you correctly, you're basically saying that Zaqar is overkill
  for a lot of people who only want a multi-tenant queue service. It's
  doing A+B. Why does that prevent people who only need A from using it ?
 
  Is it that it's actually not doing A well, from a user perspective ?
  Like the performance sucks, or it's missing a key primitive ?
 
  Is it that it's unnecessarily complex to deploy, from a deployer
  perspective, and that something only doing A would be simpler, while
  covering most of the use cases?
 
  Is it something else ?
 
  I want to make sure I understand your objection. In the user
  perspective it might make sense to pursue both options as separate
  projects. In the deployer perspective case, having a project doing A+B
  and a project doing A doesn't solve anything. So this affects the
  decision we have to take next Tuesday...
  
  I believe that Zaqar does two things, inbox semantics, and queue
  semantics. I believe the queueing is a side-effect of needing some kind
  of queue to enable users to store and subscribe to messages in the
  inbox.
  
  What I'd rather see is an API for queueing, and an API for inboxes
  which integrates well with the queueing API. For instance, if a user
  says give me an inbox I think Zaqar should return a queue handle for
  sending into the inbox the same way Nova gives you a Neutron port if
  you don't give it one. You might also ask for a queue to receive push
  messages from the inbox. Point being, the queues are not the inbox,
  and the inbox is not the queues.
  
  However, if I just want a queue, just give me a queue. Don't store my
  messages in a randomly addressable space, and don't saddle the deployer
  with the burden of such storage. Put the queue API in front of a scalable
  message queue and give me a nice simple HTTP API. Users would likely be
  thrilled. Heat, Nova, Ceilometer, probably Trove and Sahara, could all
  make use of just this. Only Horizon seems to need a place to keep the
  messages around while users inspect them.
  
  Whether that is two projects, or one, separation between the two API's,
  and thus two very different types of backends, is something I think
  will lead to more deployers wanting to deploy both, so that they can
  bill usage appropriately and so that their users can choose wisely.
 
 This is one of the use-cases we designed flavors for. One of the main
 ideas behind flavors is giving the user the choice of where they want
 their messages to be stored. This certainly requires the deployer to
 have installed stores that are good for each job. For example, based on
 the current existing drivers, a deployer could have configured a
 high-throughput flavor on top of a redis node that has been configured
 to perform for this job. Alongside to this flavor, the deployer could've
 configured a flavor that features durability on top of mongodb or redis.
 
 When the user creates the queue/bucket/inbox/whatever they want to put
 their messages into, they'll be able to choose where those messages
 should be stored

Re: [openstack-dev] [tripleo][heat][ironic] Heat Ironic resources and ready state orchestration

2014-09-15 Thread Clint Byrum
Excerpts from Steven Hardy's message of 2014-09-15 10:10:05 -0700:
 On Mon, Sep 15, 2014 at 09:50:24AM -0700, Clint Byrum wrote:
  Excerpts from Steven Hardy's message of 2014-09-15 04:44:24 -0700:
   All,
   
   Starting this thread as a follow-up to a strongly negative reaction by the
   Ironic PTL to my patches[1] adding initial Heat-Ironic integration, and
   subsequent very detailed justification and discussion of why they may be
   useful in this spec[2].
   
   Back in Atlanta, I had some discussions with folks interesting in making
   ready state[3] preparation of bare-metal resources possible when
   deploying bare-metal nodes via TripleO/Heat/Ironic.
   
   The initial assumption is that there is some discovery step (either
   automatic or static generation of a manifest of nodes), that can be input
   to either Ironic or Heat.
   
   Following discovery, but before an undercloud deploying OpenStack onto the
   nodes, there are a few steps which may be desired, to get the hardware 
   into
   a state where it's ready and fully optimized for the subsequent 
   deployment:
   
   - Updating and aligning firmware to meet requirements of qualification or
 site policy
   - Optimization of BIOS configuration to match workloads the node is
 expected to run
   - Management of machine-local storage, e.g configuring local RAID for
 optimal resilience or performance.
   
   Interfaces to Ironic are landing (or have landed)[4][5][6] which make many
   of these steps possible, but there's no easy way to either encapsulate the
   (currently mostly vendor specific) data associated with each step, or to
   coordinate sequencing of the steps.
   
  
  First, Ironic is hidden under Nova as far as TripleO is concerned. So
  mucking with the servers underneath Nova during deployment is a difficult
  proposition. Would I look up the Ironic node ID of the nova server,
  and then optimize it for the workload after the workload arrived? Why
  wouldn't I just do that optimization before the deployment?
 
 That's exactly what I'm proposing - a series of preparatory steps performed
 before the node is visible to nova, before the deployment.
 

Ok good, so I didn't misunderstand. I'm having trouble seeing where Heat
is a good fit there.

 The whole point is that Ironic is hidden under nova, and provides no way to
 perform these pre-deploy steps via interaction with nova.
 
  
   What is required is some tool to take a text definition of the required
   configuration, turn it into a correctly sequenced series of API calls to
   Ironic, expose any data associated with those API calls, and declare
   success or failure on completion.  This is what Heat does.
   
  
  I'd rather see Ironic define or adopt a narrow scope document format
  that it can consume for bulk loading. Heat is extremely generic, and thus
  carries a ton of complexity for what is probably doable with a CSV file.
 
 Perhaps you can read the spec - it's not really about the bulk-load part,
 it's about orchestrating the steps to prepare the node, after it's
 registered with Ironic, but before it's ready to have the stuff deployed to
 it.
 

Sounds like workflow to me. :-P

 What tool do you think will just do that optimization before the
 deployment? (snark not intended, I genuinely want to know, is it scripts
 in TripleO, some sysadmin pre-deploy steps, magic in Ironic?)


If it can all be done by calls to the ironic client with the node ID and
parameters from the user, I'd suggest that this is a simple workflow
and can be done in the step prior to 'heat stack-create'. I don't see
any reason to keep a bunch of records around in Heat to describe what
happened, identically, for Ironic nodes. It is an ephemeral step in the
evolution of the system, not something we need to edit on a regular basis.

My new bar for whether something is a good fit for Heat is what happens
to my workload when I update it. If I go into my Ironic pre-registration
stack and change things around, the likely case is that my box reboots
to re-apply BIOS updates with the new parameters. And there is a missing
dependency expression when using the orchestration tool to do the
workflow job. It may actually be necessary to always do these things to
the hardware in a certain sequence. But editing the Heat template and
updating has no way to express that.

To contrast this with developing it in a workflow control language
(like bash), it is imperative so I am consciously deciding to re-apply
those things by running it. If I only want to do one step, I just do
the one step.
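
Concretely, I'm picturing something along these lines, run once before
'heat stack-create' (the capability values are placeholders, and the
firmware pass is deliberately left as a vendor-specific stub):

    from ironicclient import client


    def ready_state(node_uuids, **auth):
        """Imperative 'ready state' pass; each step is explicit and can be
        re-run on its own, which is the point being made above."""
        ironic = client.get_client(1, **auth)
        for node_uuid in node_uuids:
            # Step 1: record the BIOS/RAID intent as node capabilities
            # (the values are illustrative only).
            ironic.node.update(node_uuid, [{
                'op': 'replace',
                'path': '/properties/capabilities',
                'value': 'boot_mode:bios,raid_level:1',
            }])
            # Step 2: a vendor-specific firmware/BIOS/RAID pass would go
            # here, via whatever interface the driver exposes; deliberately
            # left as a comment because it is vendor dependent.

    # After this completes, 'heat stack-create' proceeds as usual, with no
    # Ironic-specific resources in the template.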

Basically, the imperative model is rigid, sharp, and pointy, but the
declarative model is soft and malleable, and full of unexpected sharp
pointy things. I think users are more comfortable with knowing where
the sharp and pointy things are, than stumbling on them.

   So the idea is to create some basic (contrib, disabled by default) Ironic
   heat resources, then explore the idea of orchestrating ready-state

Re: [openstack-dev] [Zaqar] Zaqar graduation (round 2) [was: Comments on the concerns arose during the TC meeting]

2014-09-15 Thread Clint Byrum
Excerpts from Zane Bitter's message of 2014-09-15 12:05:09 -0700:
 On 15/09/14 13:28, Clint Byrum wrote:
  Excerpts from Flavio Percoco's message of 2014-09-15 00:57:05 -0700:
  On 09/12/2014 07:13 PM, Clint Byrum wrote:
  Excerpts from Thierry Carrez's message of 2014-09-12 02:16:42 -0700:
  Clint Byrum wrote:
  Excerpts from Flavio Percoco's message of 2014-09-11 04:14:30 -0700:
  Is Zaqar being optimized as a *queuing* service? I'd say no. Our goal 
  is
  to optimize Zaqar for delivering messages and supporting different
  messaging patterns.
 
  Awesome! Just please don't expect people to get excited about it for
  the lighter weight queueing workloads that you've claimed as use cases.
 
  I totally see Horizon using it to keep events for users. I see Heat
  using it for stack events as well. I would bet that Trove would benefit
  from being able to communicate messages to users.
 
  But I think in between Zaqar and the backends will likely be a lighter
  weight queue-only service that the users can just subscribe to when they
  don't want an inbox. And I think that lighter weight queue service is
  far more important for OpenStack than the full blown random access
  inbox.
 
  I think the reason such a thing has not appeared is because we were all
  sort of running into but Zaqar is already incubated. Now that we've
  fleshed out the difference, I think those of us that need a lightweight
  multi-tenant queue service should add it to OpenStack.  Separately. I 
  hope
  that doesn't offend you and the rest of the excellent Zaqar developers. 
  It
  is just a different thing.
 
  Should we remove all the semantics that allow people to use Zaqar as a
  queue service? I don't think so either. Again, the semantics are there
  because Zaqar is using them to do its job. Whether other folks may/may
  not use Zaqar as a queue service is out of our control.
 
  This doesn't mean the project is broken.
 
  No, definitely not broken. It just isn't actually necessary for many of
  the stated use cases.
 
  Clint,
 
  If I read you correctly, you're basically saying that Zaqar is overkill
  for a lot of people who only want a multi-tenant queue service. It's
  doing A+B. Why does that prevent people who only need A from using it ?
 
  Is it that it's actually not doing A well, from a user perspective ?
  Like the performance sucks, or it's missing a key primitive ?
 
  Is it that it's unnecessarily complex to deploy, from a deployer
  perspective, and that something only doing A would be simpler, while
  covering most of the use cases?
 
  Is it something else ?
 
  I want to make sure I understand your objection. In the user
  perspective it might make sense to pursue both options as separate
  projects. In the deployer perspective case, having a project doing A+B
  and a project doing A doesn't solve anything. So this affects the
  decision we have to take next Tuesday...
 
  I believe that Zaqar does two things, inbox semantics, and queue
  semantics. I believe the queueing is a side-effect of needing some kind
  of queue to enable users to store and subscribe to messages in the
  inbox.
 
  What I'd rather see is an API for queueing, and an API for inboxes
  which integrates well with the queueing API. For instance, if a user
  says give me an inbox I think Zaqar should return a queue handle for
  sending into the inbox the same way Nova gives you a Neutron port if
  you don't give it one. You might also ask for a queue to receive push
  messages from the inbox. Point being, the queues are not the inbox,
  and the inbox is not the queues.
 
  However, if I just want a queue, just give me a queue. Don't store my
  messages in a randomly addressable space, and don't saddle the deployer
  with the burden of such storage. Put the queue API in front of a scalable
  message queue and give me a nice simple HTTP API. Users would likely be
  thrilled. Heat, Nova, Ceilometer, probably Trove and Sahara, could all
  make use of just this. Only Horizon seems to need a place to keep the
  messages around while users inspect them.
 
  Whether that is two projects, or one, separation between the two API's,
  and thus two very different types of backends, is something I think
  will lead to more deployers wanting to deploy both, so that they can
  bill usage appropriately and so that their users can choose wisely.
 
  This is one of the use-cases we designed flavors for. One of the main
  ideas behind flavors is giving the user the choice of where they want
  their messages to be stored. This certainly requires the deployer to
  have installed stores that are good for each job. For example, based on
  the current existing drivers, a deployer could have configured a
  high-throughput flavor on top of a redis node that has been configured
  to perform for this job. Alongside this flavor, the deployer could've
  configured a flavor that features durability on top of mongodb or redis.
 
  When the user creates the queue/bucket

Re: [openstack-dev] [Heat] Defining what is a SupportStatus version

2014-09-15 Thread Clint Byrum
Excerpts from Zane Bitter's message of 2014-09-15 09:31:33 -0700:
 On 14/09/14 11:09, Clint Byrum wrote:
  Excerpts from Gauvain Pocentek's message of 2014-09-04 22:29:05 -0700:
  Hi,
 
  A bit of background: I'm working on the publication of the HOT
  resources reference on docs.openstack.org. This book is mostly
  autogenerated from the heat source code, using the sphinx XML output. To
  avoid publishing several references (one per released version, as is
  done for the OpenStack config-reference), I'd like to add information
  about the support status of each resource (when they appeared, when
  they've been deprecated, and so on).
 
  So the plan is to use the SupportStatus class and its `version`
  attribute (see https://review.openstack.org/#/c/116443/ ). And the
  question is, what information should the version attribute hold?
  Possibilities include the release code name (Icehouse, Juno), or the
  release version (2014.1, 2014.2). But this wouldn't be useful for users
  of clouds continuously deployed.
 
  From my documenter point of view, using the code name seems the right
  option, because it fits with the rest of the documentation.
 
  What do you think would be the best choice from the heat devs POV?
 
  What we ship in-tree is the standard library for Heat. I think Heat
  should not tie things to the release of OpenStack, but only to itself.
 
 "Standard Library" implies that everyone has it available, but in 
 reality operators can (and will, and do) deploy any combination of 
 resource types that they want.
 

Mmk, I guess I was being too optimistic about how homogeneous OpenStack
clouds might be.

  The idea is to simply version the standard library of resources separately
  even from the language. Added resources and properties would be minor
  bumps, deprecating or removing anything would be a major bump. Users then
  just need an API call that allows querying the standard library version.
 
 We already have API calls to actually inspect resource types. I don't 
 think a semantic version number is helpful here, since the different 
 existing combinations of resources types are not expressible linearly.
 
 There's no really good answer here, but the only real answer is making 
 sure it's easy for people to generate the docs themselves for their 
 actual deployment.
 

That's an interesting idea. By any chance do we have something that
publishes the docs directly from source tree into swift? Might make it
easier if we could just do that as part of code pushes for those who run
clouds from source.

  With this scheme, we can provide a gate test that prevents breaking the
  rules, and automatically generate the docs still. Doing this would sync
  better with continuous deployers who will be running Juno well before
  there is a 2014.2.
 
 Maybe continuous deployers should continuously deploy their own docs? 
 For any given cloud the only thing that matters is what it supports 
 right now.


That's an interesting idea, but I think what the user wants is to see how
this cloud is different from other clouds.

  Anyway, Heat largely exists to support portability of apps between
  OpenStack clouds. Many many OpenStack clouds don't run one release,
  and we don't require them to do so. So tying to the release is, IMO,
  a poor choice.
 
 The original question was about docs.openstack.org, and in that context 
 I think tying it to the release version is a good choice, because 
 that's... how OpenStack is released. Individual clouds, however, really 
 need to deploy their own docs that document what they actually support.
 

Yeah I hadn't thought of that before. I like the idea but I wonder how
practical it is for CD private clouds.

 The flip side of this, of course, is that whatever we use for the 
 version strings on docs.openstack.org will all make its way into all the 
 other documentation that gets built, and I do understand your point in 
 that context. But versioning the standard library of plugins as if it 
 were a monolithic, always-available thing seems wrong to me.


Yeah I think it is too optimistic in retrospect.

  We do the same thing with HOT's internals, so why not also
  do the standard library this way?
 
 The current process for HOT is for every OpenStack development cycle 
 (Juno is the first to use this) to give it a 'version' string that is 
 the expected date of the next release (in the future), and continuous 
 deployers who use the new one before that date are on their own (i.e. 
 it's not considered stable). So not really comparable.
 

I think there's a difference between a CD operator making it available,
and saying they support it. Just like a new API version in OpenStack, it
may be there, but they may communicate to users it is alpha until after
it gets released upstream. I think that is the same for this, and so I
think that using the version number is probably fine.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [glance][all] Help with interpreting the log level guidelines

2014-09-15 Thread Clint Byrum
Excerpts from Sean Dague's message of 2014-09-15 16:02:04 -0700:
 On 09/15/2014 07:00 PM, Mark Washenberger wrote:
  Hi there logging experts,
  
  We've recently had a little disagreement in the glance team about the
  appropriate log levels for http requests that end up failing due to user
  errors. An example would be a request to get an image that does not
  exist, which results in a 404 Not Found response.
  
  On one hand, this event is an error, so DEBUG or INFO seem a little too
  low. On the other hand, this error doesn't generally require any kind of
  operator investigation or indicate any actual failure of the service, so
  perhaps it is excessive to log it at WARN or ERROR.
  
  Please provide feedback to help us resolve this dispute if you feel you can!
 
 My feeling is this is an INFO level. There is really nothing the admin
 should care about here.

Agree with Sean. INFO is useful for investigations. WARN and ERROR are
cause for alarm.
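
As a rough illustration of that split, assuming Python's standard logging
(the repository call and exception here are made up, not glance's actual
code):

    import logging

    LOG = logging.getLogger(__name__)

    class NotFound(Exception):
        """Stand-in for whatever maps to a 404 in the API layer."""

    def get_image(image_repo, image_id):
        image = image_repo.get(image_id)    # hypothetical repository call
        if image is None:
            # A user asking for something that doesn't exist is not an
            # operator problem: record it at INFO and return the 404.
            LOG.info("Image %s not found, returning 404", image_id)
            raise NotFound(image_id)
        # Anything unexpected below this point is what WARN/ERROR are for.
        return image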

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Please do *NOT* use vendorized versions of anything (here: glanceclient using requests.packages.urllib3)

2014-09-17 Thread Clint Byrum
This is where Debian's one urllib3 to rule them all model fails in
a modern fast paced world. Debian is arguably doing the right thing by
pushing everyone to use one API, and one library, so that when that one
library is found to be vulnerable to security problems, one update covers
everyone. Also, this is an HTTP/HTTPS library.. so nobody can make the
argument that security isn't paramount in this context.

But we all know that the app store model has started to bleed down into
backend applications, and now you just ship the virtualenv or docker
container that has your app as you tested it, and if that means you're
20 versions behind on urllib3, that's your problem, not the OS vendor's.

I think it is _completely_ irresponsible of requests, a library, to
embed another library. But I don't know if we can avoid making use of
it if we are going to be exposed to objects that are attached to it.

Anyway, Thomas, if you're going to send the mob with pitchforks and
torches somewhere, I'd say send them to wherever requests makes its
home. OpenStack is just buying their mutated product.

Excerpts from Donald Stufft's message of 2014-09-17 08:22:48 -0700:
 Looking at the code on my phone it looks completely correct to use the 
 vendored copy here and it wouldn't actually work otherwise. 
 
  On Sep 17, 2014, at 11:17 AM, Donald Stufft don...@stufft.io wrote:
  
  I don't know the specific situation but it's appropriate to do this if 
  you're using requests and wish to interact with the urllib3 that requests 
  is using.
  
  On Sep 17, 2014, at 11:15 AM, Thomas Goirand z...@debian.org wrote:
  
  Hi,
  
  I'm horrified by what I just found. I have just found out this in
  glanceclient:
  
  File "bla/tests/test_ssl.py", line 19, in <module>
    from requests.packages.urllib3 import poolmanager
  ImportError: No module named packages.urllib3
  
  Please *DO NOT* do this. Instead, please use urllib3 from ... urllib3.
  Not from requests. The fact that requests is embedding its own version
  of urllib3 is a heresy. In Debian, the embedded version of urllib3 is
  removed from requests.
  
  In Debian, we spend a lot of time un-vendorizing stuff, because
  that's a security nightmare. I don't want to have to patch all of
  OpenStack to do it there as well.
  
  And no, there's no good excuse here...
  
  Thomas Goirand (zigo)
  
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
  
  ___
  OpenStack-dev mailing list
  OpenStack-dev@lists.openstack.org
  http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Please do *NOT* use vendorized versions of anything (here: glanceclient using requests.packages.urllib3)

2014-09-17 Thread Clint Byrum
Excerpts from Davanum Srinivas's message of 2014-09-17 10:15:29 -0700:
 I was trying to request-ify oslo.vmware and ran into this as well:
 https://review.openstack.org/#/c/121956/
 
 And we don't seem to have urllib3 in global-requirements either.
 Should we do that first?

Honestly, after reading this:

https://github.com/kennethreitz/requests/pull/1812

I think we might want to consider requests a poor option. Its author
clearly doesn't understand the role a _library_ plays in software
development and considers requests an application, not a library.

For instance, why is requests exposing internal implementation details
at all?  It should be wrapping any exceptions or objects to avoid
forcing users to make this choice at all.
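
As a hedged sketch of the kind of wrapping I mean -- re-raising the
vendored library's errors under the wrapper's own names so callers never
import requests.packages.urllib3 themselves (the wrapper names below are
invented):

    import requests
    from requests.packages.urllib3 import exceptions as urllib3_exceptions


    class TransportError(Exception):
        """What callers of this module catch, instead of urllib3 internals."""


    def fetch(url):
        try:
            return requests.get(url, timeout=10)
        except (requests.exceptions.RequestException,
                urllib3_exceptions.HTTPError) as exc:
            # Callers never have to know which library actually raised.
            raise TransportError(str(exc))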

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat][nova] VM restarting on host failure in convergence

2014-09-17 Thread Clint Byrum
Excerpts from Jastrzebski, Michal's message of 2014-09-17 06:03:06 -0700:
 All,
 
 Currently OpenStack does not have a built-in HA mechanism for tenant
 instances which could restore virtual machines in case of a host
 failure. Openstack assumes every app is designed for failure and can
 handle instance failure and will self-remediate, but that is rarely
 the case for the very large Enterprise application ecosystem.
 Many existing enterprise applications are stateful, and assume that
 the physical infrastructure is always on.
 

There is a fundamental debate that OpenStack's vendors need to work out
here. Existing applications are well served by existing virtualization
platforms. Turning OpenStack into a work-alike to oVirt is not the end
goal here. It's a happy accident that traditional apps can sometimes be
bent onto the cloud without much modification.

The thing that clouds do is they give development teams a _limited_
infrastructure that lets IT do what they're good at (keep the
infrastructure up) and lets development teams do what they're good at (run
their app). By putting HA into the _app_, and not the _infrastructure_,
the dev teams get agility and scalability. No more waiting weeks for
allocating specialized servers with hardware fencing setups and fibre
channel controllers to house a shared disk system so the super reliable
virtualization can hide HA from the user.

Spin up vms. Spin up volumes.  Run some replication between regions,
and be resilient.

So, as long as it is understood that whatever is being proposed should
be an application centric feature, and not an infrastructure centric
feature, this argument remains interesting in the cloud context.
Otherwise, it is just an invitation for OpenStack to open up direct
competition with behemoths like vCenter.

 Even the OpenStack controller services themselves do not gracefully
 handle failure.
 

Which ones?

 When these applications were virtualized, they were virtualized on
 platforms that enabled very high SLAs for each virtual machine,
 allowing the application to not be rewritten as the IT team moved them
 from physical to virtual. Now while these apps cannot benefit from
 methods like automatic scaleout, the application owners will greatly
 benefit from the self-service capabilities they will receive as they
 utilize the OpenStack control plane.
 

These apps were virtualized for IT's benefit. But the application authors
and users are now stuck in high-cost virtualization. The cloud is best
utilized when IT can control that cost and shift the burden of uptime
to the users by offering them more overall capacity and flexibility with
the caveat that the individual resources will not be as reliable.

So what I'm most interested in is helping authors change their apps to
be resilient on their own, not in putting more burden on IT.

 I'd like to suggest to expand heat convergence mechanism to enable
 self-remediation of virtual machines and other heat resources.
 

Convergence is still nascent. I don't know if I'd pile on to what might
take another 12 - 18 months to get done anyway. We're just now figuring
out how to get started where we thought we might already be 1/3 of the
way through. Just something to consider.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Adding hp1 back running tripleo CI

2014-09-17 Thread Clint Byrum
Excerpts from Derek Higgins's message of 2014-09-17 06:53:25 -0700:
 On 15/09/14 22:37, Gregory Haynes wrote:
  This is a total shot in the dark, but a couple of us ran into issues
  with the Ubuntu Trusty kernel (I know I hit it on HP hardware) that was
  causing severely degraded performance for TripleO. This fixed with a
  recently released kernel in Trusty... maybe you could be running into
  this?
 
 thanks Greg,
 
 To try this out, I've redeployed the new testenv image and ran 35
 overcloud jobs on it(32 passed), the average time for these was 130
 minutes so unfortunately no major difference.
 
 The old kernel was
 3.13.0-33-generic #58-Ubuntu SMP Tue Jul 29 16:45:05 UTC 2014 x86_64

This kernel definitely had the kvm bugs Greg and I experienced in the
past

 the new one is
 3.13.0-35-generic #62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014 x86_64
 

Darn. This one does not. Is it possible the hardware is just less
powerful?

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Please do *NOT* use vendorized versions of anything (here: glanceclient using requests.packages.urllib3)

2014-09-18 Thread Clint Byrum
Excerpts from Donald Stufft's message of 2014-09-18 04:58:06 -0700:
 
  On Sep 18, 2014, at 7:54 AM, Thomas Goirand z...@debian.org wrote:
  
  
  Linux distributions are not the end be all of distribution models and
  they don’t get to dictate to upstream.
  
  Well, distributions is where the final user is, and where software gets
  consumed. Our priority should be the end users.
 
 
 Distributions are not the only place that people get their software from,
 unless you think that the ~3 million downloads requests has received
 on PyPI in the last 30 days are distributions downloading requests to
 package in their OSs.
 

Do pypi users not also need to be able to detect and fix any versions
of libraries they might have? If one has some virtualenvs with various
libraries and apps installed and no --system-site-packages, one would
probably still want to run 'pip freeze' in all of them and find out what
libraries are there and need to be fixed.

Anyway, generally security updates require a comprehensive strategy.
One common comprehensive strategy is version assertion.

Vendoring complicates that immensely.
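
To illustrate the kind of version assertion I mean, a rough sketch (the
virtualenv paths and minimum version are invented); note that a copy of
urllib3 vendored inside requests never shows up in a check like this:

    import subprocess

    VENVS = ["/srv/app1/venv", "/srv/app2/venv"]   # hypothetical virtualenvs
    FLOOR = {"urllib3": (1, 9)}                    # hypothetical known-good minimum

    def frozen(venv):
        """Return {package: (major, minor)} for everything pip knows about."""
        out = subprocess.check_output([venv + "/bin/pip", "freeze"])
        pins = {}
        for line in out.decode("utf-8").splitlines():
            if "==" not in line:
                continue
            name, version = line.split("==", 1)
            try:
                pins[name.lower()] = tuple(int(x) for x in version.split(".")[:2])
            except ValueError:
                continue   # pre-releases etc.; a real audit would parse properly
        return pins

    for venv in VENVS:
        pins = frozen(venv)
        for name, minimum in FLOOR.items():
            if name in pins and pins[name] < minimum:
                print("%s in %s needs updating" % (name, venv))
            # A vendored copy hidden inside requests is invisible to this check.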

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

2014-09-18 Thread Clint Byrum
Great job highlighting what our friends over at Amazon are doing.

It's clear from these snippets, and a few other pieces of documentation
for SQS I've read, that the Amazon team approached SQS from a _massive_
scaling perspective. I think what may be forcing a lot of this frustration
with Zaqar is that it was designed with a much smaller scale in mind.

I think as long as that is the case, the design will remain in question.
I'd be comfortable saying that the use cases I've been thinking about
are entirely fine with the limitations SQS has.

Excerpts from Joe Gordon's message of 2014-09-17 13:36:18 -0700:
 Hi All,
 
 My understanding of Zaqar is that it's like SQS. SQS uses distributed
 queues, which have a few unusual properties [0]:
 Message Order
 
 Amazon SQS makes a best effort to preserve order in messages, but due to
 the distributed nature of the queue, we cannot guarantee you will receive
 messages in the exact order you sent them. If your system requires that
 order be preserved, we recommend you place sequencing information in each
 message so you can reorder the messages upon receipt.
 At-Least-Once Delivery
 
 Amazon SQS stores copies of your messages on multiple servers for
 redundancy and high availability. On rare occasions, one of the servers
 storing a copy of a message might be unavailable when you receive or delete
 the message. If that occurs, the copy of the message will not be deleted on
 that unavailable server, and you might get that message copy again when you
 receive messages. Because of this, you must design your application to be
 idempotent (i.e., it must not be adversely affected if it processes the
 same message more than once).
 Message Sample
 
 The behavior of retrieving messages from the queue depends whether you are
 using short (standard) polling, the default behavior, or long polling. For
 more information about long polling, see Amazon SQS Long Polling
 http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-long-polling.html
 .
 
 With short polling, when you retrieve messages from the queue, Amazon SQS
 samples a subset of the servers (based on a weighted random distribution)
 and returns messages from just those servers. This means that a particular
 receive request might not return all your messages. Or, if you have a small
 number of messages in your queue (less than 1000), it means a particular
 request might not return any of your messages, whereas a subsequent request
 will. If you keep retrieving from your queues, Amazon SQS will sample all
 of the servers, and you will receive all of your messages.
 
 The following figure shows short polling behavior of messages being
 returned after one of your system components makes a receive request.
 Amazon SQS samples several of the servers (in gray) and returns the
 messages from those servers (Message A, C, D, and B). Message E is not
 returned to this particular request, but it would be returned to a
 subsequent request.
 
 
 
 Presumably SQS has these properties because it makes the system scalable,
 if so does Zaqar have the same properties (not just making these same
 guarantees in the API, but actually having these properties in the
 backends)? And if not, why? I looked on the wiki [1] for information on
 this, but couldn't find anything.
 
 
 
 
 
 [0]
 http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/DistributedQueues.html
 [1] https://wiki.openstack.org/wiki/Zaqar

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Please do *NOT* use vendorized versions of anything (here: glanceclient using requests.packages.urllib3)

2014-09-18 Thread Clint Byrum
Excerpts from Donald Stufft's message of 2014-09-18 07:30:27 -0700:
 
  On Sep 18, 2014, at 10:18 AM, Clint Byrum cl...@fewbar.com wrote:
  
  Excerpts from Donald Stufft's message of 2014-09-18 04:58:06 -0700:
  
  On Sep 18, 2014, at 7:54 AM, Thomas Goirand z...@debian.org wrote:
  
  
  Linux distributions are not the end be all of distribution models and
  they don’t get to dictate to upstream.
  
  Well, distributions is where the final user is, and where software gets
  consumed. Our priority should be the end users.
  
  
  Distributions are not the only place that people get their software from,
  unless you think that the ~3 million downloads requests has received
  on PyPI in the last 30 days are distributions downloading requests to
  package in their OSs.
  
  
  Do pypi users not also need to be able to detect and fix any versions
  of libraries they might have? If one has some virtualenvs with various
  libraries and apps installed and no --system-site-packages, one would
  probably still want to run 'pip freeze' in all of them and find out what
  libraries are there and need to be fixed.
  
  Anyway, generally security updates require a comprehensive strategy.
  One common comprehensive strategy is version assertion.
  
  Vendoring complicates that immensely.
 
 It doesn’t really matter. PyPI doesn’t dictate to projects who host there what
 that project is allowed to do except in some very broad circumstances. Whether
 or not requests *should* do this doesn't really have any bearing on what
 Openstack should do to cope with it. The facts are that requests does it, and
 that people pulling things from PyPI is an actual platform that needs thought
 about.
 
 This leaves Openstack with a few reasonable/sane options:
 
 1) Decide that vendoring in requests is unacceptable to what Openstack as a
project is willing to support, and cease the use of requests.
 2) Decide that what requests offers is good enough that it outweighs the fact
that it vendors urllib3 and continue using it.
 

There's also 3) fork requests, which is the democratic way to vote out
an upstream that isn't supporting the needs of the masses.

I don't think we're anywhere near there, but I wanted to make it clear
there _is_ a more extreme option.

 If the 2nd option is chosen, then doing anything but supporting the fact that
 requests vendors urllib3 within the code that openstack writes is hurting the
 users who fetch these projects from PyPI because you don't agree with one of
 the choices that requests makes. By all means do conditional imports to lessen
 the impact that the choice requests has made (and the one that Openstack has
 made to use requests) on downstream distributors, but unconditionally 
 importing
 from the top level urllib3 for use within requests is flat out wrong.
 
 Obviously neither of these options excludes the choice to lean on requests to
 reverse this decision as well. However that is best done elsewhere as the
 person making that decision isn't a member of these mailing lists as far as
 I am aware.
 

To be clear, I think we should keep using requests. But we should lend
our influence upstream and explain that our users are required to deal
with this in a way that perhaps hasn't been considered or given the
appropriate priority.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Please do *NOT* use vendorized versions of anything (here: glanceclient using requests.packages.urllib3)

2014-09-18 Thread Clint Byrum
Excerpts from Ian Cordasco's message of 2014-09-18 07:35:10 -0700:
 On 9/18/14, 9:18 AM, Clint Byrum cl...@fewbar.com wrote:
 
 Excerpts from Donald Stufft's message of 2014-09-18 04:58:06 -0700:
  
   On Sep 18, 2014, at 7:54 AM, Thomas Goirand z...@debian.org wrote:
   
   
   Linux distributions are not the end be all of distribution models and
   they don’t get to dictate to upstream.
   
   Well, distributions is where the final user is, and where software
 gets
   consumed. Our priority should be the end users.
  
  
  Distributions are not the only place that people get their software
 from,
  unless you think that the ~3 million downloads requests has received
  on PyPI in the last 30 days are distributions downloading requests to
  package in their OSs.
  
 
 Do pypi users not also need to be able to detect and fix any versions
 of libraries they might have? If one has some virtualenvs with various
 libraries and apps installed and no --system-site-packages, one would
 probably still want to run 'pip freeze' in all of them and find out what
 libraries are there and need to be fixed.
 
 Anyway, generally security updates require a comprehensive strategy.
 One common comprehensive strategy is version assertion.
 
 Vendoring complicates that immensely.
 
 Except that even OpenStack doesn’t pin requests because of how
 extraordinarily stable our API is. While you can argue that Kenneth has
 non-standard opinions about his library, Cory and I take backwards
 compatibility and stability very seriously. This means anyone can upgrade
 to a newer version of requests without worrying that it will be backwards
 incompatible. 
 

All of your hard work is very much appreciated. I don't understand what
your assertion means though. We don't pin things. However, our users end
up pinning when they install via pip, and our distros end up pinning
when they deliver a version. Without any indication that urllib3 is in
the system, they will fail at any cursory version audit that looks for it.

I'm not saying either way is right or wrong either.. I'm suggesting
that this is a valid, proven method for large scale risk assessment,
and it is complicated quite a bit by vendored libraries.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Expand resource name allowed characters

2014-09-18 Thread Clint Byrum
Excerpts from Christopher Yeoh's message of 2014-09-18 16:57:12 -0700:
 On Thu, 18 Sep 2014 12:12:28 -0400
 Sean Dague s...@dague.net wrote:
   When we can return the json-schema to user in the future, can we say
   that means API accepting utf8 or utf8mb4 is discoverable? If it is
   discoverable, then we needn't limit anything in our python code.
  
  Honestly, we should accept utf8 (no weird mysqlism not quite utf8). We
  should make the default scheme for our dbs support that on names (but
  only for the name columns). The failure of a backend to do utf8 for
  real should return an error to the user. Let's not make this more
  complicated than it needs to be.
 
 I agree that discoverability for this is not the way to go - I think it's
 too complicated for end users. I don't know enough about mysql to know
 if utf8mb4 is going to be a performance issue, but if it's not then we
 should just support utf-8 properly.
 
 We can catch the db errors. However, whilst converting db errors that
 cause 500s is fairly straightforward, when an error occurs that deep in
 Nova it also means a lot of potential unwinding work in the db and
 compute layers, which is complicated and error prone. So I'd prefer to
 avoid the situation with input validation in the first place.

Just to add a reference into the discussion:

http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html

It does have the same limitation of making fixed-width keys and CHAR()
columns. It goes from 3 bytes per CHAR position to 4, so it should not
be a database-wide default, but something we use sparingly.

Note that the right answer for things that are not utf-8 (like UUIDs)
is not to set a charset of latin1, but to use BINARY/VARBINARY. Last
time I tried I had a difficult time coercing SQLAlchemy to model the
difference.. but maybe I just didn't look in the right part of the manual.
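
Roughly what I was trying to express, as an unverified sketch using
SQLAlchemy's MySQL dialect types -- utf8mb4 only on the human-readable
column, VARBINARY for UUIDs (table and column names are illustrative):

    from sqlalchemy import Column, Integer, MetaData, Table
    from sqlalchemy.dialects.mysql import VARBINARY, VARCHAR

    metadata = MetaData()

    instances = Table(
        'instances', metadata,
        Column('id', Integer, primary_key=True),
        # UUIDs are plain ASCII; storing them as bytes avoids paying
        # 4 bytes per character for data that has no character set at all.
        Column('uuid', VARBINARY(36), nullable=False),
        # Only the user-visible column needs real (4-byte) UTF-8.
        Column('display_name', VARCHAR(255, charset='utf8mb4')),
        mysql_charset='utf8',   # keep the narrower default for the rest
    )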

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

2014-09-19 Thread Clint Byrum
Excerpts from Eoghan Glynn's message of 2014-09-19 04:23:55 -0700:
 
  Hi All,
  
  My understanding of Zaqar is that it's like SQS. SQS uses distributed 
  queues,
  which have a few unusual properties [0]:
  Message Order
  
  
  Amazon SQS makes a best effort to preserve order in messages, but due to the
  distributed nature of the queue, we cannot guarantee you will receive
  messages in the exact order you sent them. If your system requires that
  order be preserved, we recommend you place sequencing information in each
  message so you can reorder the messages upon receipt.
  At-Least-Once Delivery
  
  
  Amazon SQS stores copies of your messages on multiple servers for redundancy
  and high availability. On rare occasions, one of the servers storing a copy
  of a message might be unavailable when you receive or delete the message. If
  that occurs, the copy of the message will not be deleted on that unavailable
  server, and you might get that message copy again when you receive messages.
  Because of this, you must design your application to be idempotent (i.e., it
  must not be adversely affected if it processes the same message more than
  once).
  Message Sample
  
  
  The behavior of retrieving messages from the queue depends whether you are
  using short (standard) polling, the default behavior, or long polling. For
  more information about long polling, see Amazon SQS Long Polling .
  
  With short polling, when you retrieve messages from the queue, Amazon SQS
  samples a subset of the servers (based on a weighted random distribution)
  and returns messages from just those servers. This means that a particular
  receive request might not return all your messages. Or, if you have a small
  number of messages in your queue (less than 1000), it means a particular
  request might not return any of your messages, whereas a subsequent request
  will. If you keep retrieving from your queues, Amazon SQS will sample all of
  the servers, and you will receive all of your messages.
  
  The following figure shows short polling behavior of messages being returned
  after one of your system components makes a receive request. Amazon SQS
  samples several of the servers (in gray) and returns the messages from those
  servers (Message A, C, D, and B). Message E is not returned to this
  particular request, but it would be returned to a subsequent request.
  
  
  
  
  
  
  
  Presumably SQS has these properties because it makes the system scalable, if
  so does Zaqar have the same properties (not just making these same
  guarantees in the API, but actually having these properties in the
  backends)? And if not, why? I looked on the wiki [1] for information on
  this, but couldn't find anything.
 
 The premise of this thread is flawed I think.
 
 It seems to be predicated on a direct quote from the public
 documentation of a closed-source system justifying some
 assumptions about the internal architecture and design goals
 of that closed-source system.
 
 It then proceeds to hold zaqar to account for not making
 the same choices as that closed-source system.
 

I don't think we want Zaqar to make the same choices. OpenStack's
constraints are different from AWS's.

I want to highlight that our expectations are for the API to support
deploying at scale. SQS _clearly_ started with a point of extreme scale
for the deployer, and thus is a good example of an API that is limited
enough to scale like that.

What has always been the concern is that Zaqar would make it extremely
complicated and/or costly to get to that level.

 This puts the zaqar folks in a no-win situation, as it's hard
 to refute such arguments when they have no visibility over
 the innards of that closed-source system.
 

Nobody expects to know the insides. But the outsides, the parts that
are public, are brilliant because they are _limited_, and yet they still
support many many use cases.

 Sure, the assumption may well be correct that the designers
 of SQS made the choice to expose applications to out-of-order
 messages as this was the only practical way of acheiving their
 scalability goals.
 
 But since the code isn't on github and the design discussions
 aren't publicly archived, we have no way of validating that.
 

We don't need to see the code. Not requiring ordering makes the whole
problem easier to reason about. You don't need explicit pools anymore.
Just throw messages wherever, and make sure that everywhere gets
polled at a reasonable frequency. This is the kind of thing
operations loves. No global state means no split brain to avoid, no
synchronization. Does it solve all problems? No. But it solves a single
one, REALLY well.
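
A toy in-memory sketch of what I mean (the Pool class here is a stand-in,
not anything that exists in Zaqar):

    import random
    from collections import deque

    class Pool(object):
        """Stand-in for an independent, unordered message store."""
        def __init__(self):
            self._messages = deque()
        def push(self, message):
            self._messages.append(message)
        def pop_batch(self, limit):
            for _ in range(min(limit, len(self._messages))):
                yield self._messages.popleft()

    pools = [Pool(), Pool(), Pool()]

    def publish(message):
        # No ordering guarantee, so any pool will do -- no global state needed.
        random.choice(pools).push(message)

    def consume():
        # Poll every pool at some reasonable frequency; the union is "the queue".
        for pool in pools:
            for message in pool.pop_batch(limit=10):
                yield message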

Frankly I don't understand why there would be this argument to hold on
to so many use cases and so much API surface area. Zaqar's life gets
easier without ordering guarantees or message browsing. And it still
retains _many_ of its potential users.

___

Re: [openstack-dev] [Heat][Zaqar] Integration plan moving forward

2014-09-19 Thread Clint Byrum
Excerpts from Flavio Percoco's message of 2014-09-18 02:07:09 -0700:
 Greetings,
 
 If I recall correctly, Heat was planning to adopt Zaqar regardless of
 the result of the graduation attempt (please correct me if I'm wrong).
 Based on this assumption, I'd like to start working on a plan forward to
 make this integration happen.
 
 So far, these are the use cases I've collected from past discussions:
 
 * Notify  heat user before an action is taken, and after - Heat may want
 to wait  for a response before proceeding - notifications not
 necessarily needed  and signed read-only queues might help, but not
 necessary
 * For integrating with user's tools
 * Monitoring
 * Control surface
 * Config management tools
 * Does not require notifications and/or read-only/signed queue endpoints
 *[These may be helpful, but were not brought up in the discussion]

This is perhaps the most important need. It would be fully satisfied by
out-of-order messages as long as we have guaranteed at-least-once
delivery.

[for the rest, I've reduced the indent, as I don't think they were meant
to be underneath the one above]

 * Subscribe to an aggregate feed of interesting events from other
   open-stack components (such as Nova)
 * Heat is often deployed in a different place than other
   components and doesn't have access to the AMQP bus
 * Large  deployments consist of multiple AMQP brokers, and there
   doesn't seem to  be a nice way to aggregate all those events [need to
   confirm]

I've also heard tell that Ceilometer wants to be a sieve for these. I've
no idea why that makes sense, but I have heard it said.

 * Push metadata updates to os-collect-config agent running in
   servers, instead of having them poll Heat

This one is fine with an out of order durable queue.

 
 
 Few questions that I think we should start from:
 
 - Does the above list cover Heat's needs?
 - Which of the use cases listed above should be addressed first?
 - Can we split the above into milestones w/ due dates?
 
 
 Thanks,
 Flavio
 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Heat][Zaqar] Integration plan moving forward

2014-09-19 Thread Clint Byrum
Excerpts from Flavio Percoco's message of 2014-09-19 02:37:08 -0700:
 On 09/18/2014 11:51 AM, Angus Salkeld wrote:
  
  On 18/09/2014 7:11 PM, Flavio Percoco fla...@redhat.com
  mailto:fla...@redhat.com wrote:
 
  Greetings,
 
  If I recall correctly, Heat was planning to adopt Zaqar regardless of
  the result of the graduation attempt (please correct me if I'm wrong).
  Based on this assumption, I'd like to start working on a plan forward to
  make this integration happen.
 
  So far, these are the use cases I've collected from past discussions:
 
  * Notify  heat user before an action is taken, and after - Heat may want
  to wait  for a response before proceeding - notifications not
  necessarily needed  and signed read-only queues might help, but not
  necessary
  * For integrating with user's tools
  * Monitoring
  * Control surface
  * Config management tools
  * Does not require notifications and/or read-only/signed queue
  endpoints
  *[These may be helpful, but were not brought up in the discussion]
  * Subscribe to an aggregate feed of interesting events from other
  open-stack components (such as Nova)
  * Heat is often deployed in a different place than other
  components and doesn't have access to the AMQP bus
  * Large  deployments consist of multiple AMQP brokers, and there
  doesn't seem to  be a nice way to aggregate all those events [need to
  confirm]
  * Push metadata updates to os-collect-config agent running in
  servers, instead of having them poll Heat
 
 
  Few questions that I think we should start from:
 
  - Does the above list cover Heat's needs?
  - Which of the use cases listed above should be addressed first?
  
  IMHO it would be great to simply replace the event store we have
  currently, so that the user can get a stream of progress messages during
  the deployment.
 
 Could you point me to the right piece of code and/or documentation so I
 can understand better what it does and where do you want it to go?

https://git.openstack.org/cgit/openstack/heat/tree/heat/engine/event.py

We currently use db_api to store these in the database, which is costly.

Would be much better if we could just shove them into a message queue for
the user. It is problematic though, as we have event-list and event-show
in the Heat API which basically work the same as the things we've been
wanting removed from Zaqar's API: access by ID and pagination. ;)

I think ideally we'd deprecate those or populate them with nothing if
the user has opted to use messaging instead.
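
For reference, "shoving them into a queue" is not much more than this -- a
sketch from memory of the v1 message-post shape, with a made-up endpoint,
so treat it as illustrative rather than exact:

    import json
    import uuid

    import requests

    ZAQAR = "http://zaqar.example.com:8888"     # made-up endpoint
    CLIENT_ID = str(uuid.uuid4())

    def publish_event(queue_name, event, token):
        # Roughly the v1 shape: a list of messages, each with a ttl and a body.
        payload = [{"ttl": 3600, "body": event}]
        resp = requests.post(
            "%s/v1/queues/%s/messages" % (ZAQAR, queue_name),
            data=json.dumps(payload),
            headers={"Client-ID": CLIENT_ID,
                     "X-Auth-Token": token,
                     "Content-Type": "application/json"})
        resp.raise_for_status()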

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Please do *NOT* use vendorized versions of anything (here: glanceclient using requests.packages.urllib3)

2014-09-19 Thread Clint Byrum
Excerpts from Ian Cordasco's message of 2014-09-18 10:33:04 -0700:
 
 On 9/18/14, 11:29 AM, Clint Byrum cl...@fewbar.com wrote:
 
 Excerpts from Donald Stufft's message of 2014-09-18 07:30:27 -0700:
  
   On Sep 18, 2014, at 10:18 AM, Clint Byrum cl...@fewbar.com wrote:
   
   Excerpts from Donald Stufft's message of 2014-09-18 04:58:06 -0700:
   
   On Sep 18, 2014, at 7:54 AM, Thomas Goirand z...@debian.org wrote:
   
   
   Linux distributions are not the end be all of distribution models
 and
   they don’t get to dictate to upstream.
   
   Well, distributions is where the final user is, and where software
 gets
   consumed. Our priority should be the end users.
   
   
   Distributions are not the only place that people get their software
 from,
   unless you think that the ~3 million downloads requests has received
   on PyPI in the last 30 days are distributions downloading requests to
   package in their OSs.
   
   
   Do pypi users not also need to be able to detect and fix any versions
   of libraries they might have? If one has some virtualenvs with various
   libraries and apps installed and no --system-site-packages, one would
   probably still want to run 'pip freeze' in all of them and find out
 what
   libraries are there and need to be fixed.
   
   Anyway, generally security updates require a comprehensive strategy.
   One common comprehensive strategy is version assertion.
   
   Vendoring complicates that immensely.
  
  It doesn’t really matter. PyPI doesn’t dictate to projects who host
 there what
  that project is allowed to do except in some very broad circumstances.
 Whether
  or not requests *should* do this doesn't really have any bearing on what
  Openstack should do to cope with it. The facts are that requests does
 it, and
  that people pulling things from PyPI is an actual platform that needs
 thought
  about.
  
  This leaves Openstack with a few reasonable/sane options:
  
  1) Decide that vendoring in requests is unacceptable to what Openstack
 as a
 project is willing to support, and cease the use of requests.
  2) Decide that what requests offers is good enough that it outweighs
 the fact
 that it vendors urllib3 and continue using it.
  
 
 There's also 3) fork requests, which is the democratic way to vote out
 an upstream that isn't supporting the needs of the masses.
 
 Given requests’ download count, I have to doubt that OpenStack users
 constitute the masses in this case.
 

This wasn't the masses from the requests standpoint, but from the
OpenStack standpoint. Consider the case of a small island territory
of a much larger nation. At some point most of them have claimed their
independence from the larger nation unless the larger nation is willing
to step up and make them a full member with a real vote. This allows
them to act in their best interest. So even if it means a much more
difficult road, it is the road most advantageous to them.

Also upon reflection, it's a bit interesting that forking requests is
being dismissed so quickly, when in essence, requests maintains a fork
of urllib3 in tree (albeit, one that is just a fork from the _releases_,
not from the whole project).

 I don't think we're anywhere near there, but I wanted to make it clear
 there _is_ a more extreme option.
 
  If the 2nd option is chosen, then doing anything but supporting the
 fact that
  requests vendors urllib3 within the code that openstack writes is
 hurting the
  users who fetch these projects from PyPI because you don't agree with
 one of
  the choices that requests makes. By all means do conditional imports to
 lessen
  the impact that the choice requests has made (and the one that
 Openstack has
  made to use requests) on downstream distributors, but unconditionally
 importing
  from the top level urllib3 for use within requests is flat out wrong.
  
  Obviously neither of these options excludes the choice to lean on
 requests to
  reverse this decision as well. However that is best done elsewhere as
 the
  person making that decision isn't a member of these mailing lists as
 far as
  I am aware.
  
 
 To be clear, I think we should keep using requests. But we should lend
 our influence upstream and explain that our users are required to deal
 with this in a way that perhaps hasn't been considered or given the
 appropriate priority.
 
 It’s been considered several times. There have been multiple issues.
 There’s more than just the one you linked to. The decision is highly
 unlikely to change whether it’s coming from a group of people in OpenStack
 or another distribution package maintainer.
 

Indeed, hence my thinking that forking requests might be in order. Even
if that fork is just a long lived fork that stays mostly in sync, but
without urllib3 vendored. I think that has actually already happened in
the distros... so I wonder how painful it would be to do the same thing
on pypi, and let the distros just consume that.

Anyway, I'm not going to take that challenge

Re: [openstack-dev] Oslo messaging vs zaqar

2014-09-22 Thread Clint Byrum
Geoff, do you expect all of our users to write all of their messaging
code in Python?

oslo.messaging is a _python_ library.

Zaqar is a service with a REST API -- accessible to any application.

As Zane's sarcastic reply implied, these are as related as sharks are
to tornados. Could they be combined? Yes [1]. But the only result would be
dead people and sharks strewn about the landscape.

[1] http://www.imdb.com/title/tt2724064/

Excerpts from Geoff O'Callaghan's message of 2014-09-20 01:17:45 -0700:
 Hi all,
 I'm just trying to understand the messaging strategy in openstack. It
 seems we have at least 2 messaging layers.
 
 Oslo.messaging and zaqar,  Can someone explain to me why there are two?
 To quote from the zaqar faq :
 -
 How does Zaqar compare to oslo.messaging?
 
  oslo.messaging is an RPC library used throughout OpenStack to manage
 distributed commands by sending messages through different messaging
 layers. Oslo Messaging was originally developed as an abstraction over
 AMQP, but has since added support for ZeroMQ.
 
 As opposed to Oslo Messaging, Zaqar is a messaging service for the over and
 under cloud. As a service, it is meant to be consumed by using libraries
 for different languages. Zaqar currently supports 1 protocol (HTTP) and
 sits on top of other existing technologies (MongoDB as of version 1.0).
 
 It seems to my casual view that we could have one and scale that and use it
 for SQS style messages, internal messaging (which could include logging)
 all managed by message schemas and QoS.  This would give a very robust and
 flexible system for endpoints to consume.
 
 Is there a plan to consolidate?
 
 Rgds
 Geoff

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

2014-09-22 Thread Clint Byrum
Excerpts from Joe Gordon's message of 2014-09-22 19:04:03 -0700:
 On Mon, Sep 22, 2014 at 5:47 PM, Zane Bitter zbit...@redhat.com wrote:
 
  On 22/09/14 17:06, Joe Gordon wrote:
 
  On Mon, Sep 22, 2014 at 9:58 AM, Zane Bitter zbit...@redhat.com wrote:
 
   On 22/09/14 10:11, Gordon Sim wrote:
 
   On 09/19/2014 09:13 PM, Zane Bitter wrote:
 
   SQS offers very, very limited guarantees, and it's clear that the
  reason
  for that is to make it massively, massively scalable in the way that
  e.g. S3 is scalable while also remaining comparably durable (S3 is
  supposedly designed for 11 nines, BTW).
 
  Zaqar, meanwhile, seems to be promising the world in terms of
  guarantees. (And then taking it away in the fine print, where it says
  that the operator can disregard many of them, potentially without the
  user's knowledge.)
 
  On the other hand, IIUC Zaqar does in fact have a sharding feature
  (Pools) which is its answer to the massive scaling question.
 
 
  There are different dimensions to the scaling problem.
 
 
  Many thanks for this analysis, Gordon. This is really helpful stuff.
 
  As I understand it, pools don't help scaling a given queue since all the
  messages for that queue must be in the same pool. At present traffic
  through different Zaqar queues are essentially entirely orthogonal
  streams. Pooling can help scale the number of such orthogonal streams,
  but to be honest, that's the easier part of the problem.
 
 
  But I think it's also the important part of the problem. When I talk
  about
  scaling, I mean 1 million clients sending 10 messages per second each,
  not
  10 clients sending 1 million messages per second each.
 
  When a user gets to the point that individual queues have massive
  throughput, it's unlikely that a one-size-fits-all cloud offering like
  Zaqar or SQS is _ever_ going to meet their needs. Those users will want
  to
  spin up and configure their own messaging systems on Nova servers, and at
  that kind of size they'll be able to afford to. (In fact, they may not be
  able to afford _not_ to, assuming per-message-based pricing.)
 
 
  Running a message queue that has a high guarantee of not losing a message
  is hard and SQS promises exactly that, it *will* deliver your message. If
  a
  use case can handle occasionally dropping messages then running your own
  MQ
  makes more sense.
 
  SQS is designed to handle massive queues as well, while I haven't found
  any
  examples of queues that have 1 million messages/second being sent or
  received, 30k to 100k messages/second is not unheard of [0][1][2].
 
  [0] https://www.youtube.com/watch?v=zwLC5xmCZUs#t=22m53s
  [1] http://java.dzone.com/articles/benchmarking-sqs
  [2]
  http://www.slideshare.net/AmazonWebServices/massive-
  message-processing-with-amazon-sqs-and-amazon-
  dynamodb-arc301-aws-reinvent-2013-28431182
 
 
  Thanks for digging those up, that's really helpful input. I think number
  [1] kind of summed up part of what I'm arguing here though:
 
  But once your requirements get above 35k messages per second, chances are
  you need custom solutions anyway; not to mention that while SQS is cheap,
  it may become expensive with such loads.
 
 
 If you don't require the reliability guarantees that SQS provides then
 perhaps. But I would be surprised to hear that a user can set up something
 with this level of uptime for less:
 
 Amazon SQS runs within Amazon’s high-availability data centers, so queues
 will be available whenever applications need them. To prevent messages from
 being lost or becoming unavailable, all messages are stored redundantly
 across multiple servers and data centers. [1]
 

This is pretty easily doable with gearman or even just using Redis
directly. But it is still ops for end users. The AWS users I've talked to
who use SQS do so because they like that they can use RDS, SQS, and ELB,
and have only one type of thing to operate: their app.
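
For instance, the Redis flavour of "just a queue" is only a few lines with
redis-py (the host name is made up) -- though of course someone still has
to run the Redis, which is exactly the ops burden those users are paying
SQS to avoid:

    import json

    import redis

    conn = redis.StrictRedis(host="queue.example.com", port=6379)

    def send(queue, message):
        # Producer side: push and move on.
        conn.lpush(queue, json.dumps(message))

    def receive(queue, timeout=5):
        # Consumer side: block briefly waiting for work.
        item = conn.brpop(queue, timeout=timeout)
        if item is None:
            return None
        _key, raw = item
        return json.loads(raw)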

 
 
 There is also the possibility of using the sharding capabilities of the
 
  underlying storage. But the pattern of use will determine how effective
  that can be.
 
  So for example, on the ordering question, if order is defined by a
  single sequence number held in the database and atomically incremented
  for every message published, that is not likely to be something where
  the databases sharding is going to help in scaling the number of
  concurrent publications.
 
  Though sharding would allow scaling the total number messages on the
  queue (by distributing them over multiple shards), the total ordering of
  those messages reduces its effectiveness in scaling the number of
  concurrent getters (e.g. the concurrent subscribers in pub-sub) since
  they will all be getting the messages in exactly the same order.
 
  Strict ordering impacts the competing consumers case also (and is in my
  opinion of limited value as a guarantee anyway). At any given time, the
  head of the queue is in one shard, and all concurrent claim 

Re: [openstack-dev] [Heat] Question regarding Stack updates and templates

2014-09-22 Thread Clint Byrum
Excerpts from Angus Salkeld's message of 2014-09-22 20:31:46 -0700:
 On Tue, Sep 23, 2014 at 1:04 AM, Anant Patil anant.pa...@hp.com wrote:
 
  Hi,
 
  In convergence, we discuss having concurrent updates to a stack. I
  wanted to know if it is safe to assume that an update will be a
  superset of its previous updates. Understanding this is critical to
  arrive at implementation of concurrent stack operations.
 
  Assuming that an admin will have VCS setup and will issue requests by
  checking-out the template and modifying it, I could see that the updates
  will be incremental and not discrete. Is this assumption correct? When
  an update is issued before a previous update is complete, would the
  template for that be based on the template of previously issued
  incomplete update or the last completed one?
 
 
 I don't think you can assume anything about the update. What if the user
 just posts a
 totally different template? That is still a valid update. Or they post an
 empty template
 to delete the resources.

Agreed. The new template simply replaces the graph with a new version.
If that graph happens to change everything, then the old things will now
be in a "should not exist" desired state and the new template convergence
should remove them when it gets to the garbage collection at the end.
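
Conceptually it is little more than a set difference (resource names made
up):

    # Old and new desired-state graphs, reduced to their resource names.
    old_graph = {"server_a", "volume_a", "net_a"}
    new_graph = {"server_b", "net_a"}

    converge_these = new_graph                      # create/update these first
    garbage_collect_these = old_graph - new_graph   # "should not exist" -- removed last

    print(sorted(garbage_collect_these))            # ['server_a', 'volume_a']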

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Heat] Convergence: Backing up template instead of stack

2014-09-22 Thread Clint Byrum
Excerpts from Angus Salkeld's message of 2014-09-22 20:15:43 -0700:
 On Tue, Sep 23, 2014 at 1:09 AM, Anant Patil anant.pa...@hp.com wrote:
 
  Hi,
 
  One of the steps in the direction of convergence is to enable Heat
  engine to handle concurrent stack operations. The main convergence spec
  talks about it. Resource versioning would be needed to handle concurrent
  stack operations.
 
  As of now, while updating a stack, a backup stack is created with a new
  ID and only one update runs at a time. If we keep the raw_template
  linked to it's previous completed template, i.e. have a back up of
  template instead of stack, we avoid having backup of stack.
 
  Since there won't be a backup stack and only one stack_id to be dealt
  with, resources and their versions can be queried for a stack with that
  single ID. The idea is to identify resources for a stack by using stack
  id and version. Please let me know your thoughts.
 
 
 Hi Anant,
 
 This seems more complex than it needs to be.
 
 I could be wrong, but I thought the aim was to simply update the goal state.
 The backup stack is just the last working stack. So if you update and there
 is already an update you don't need to touch the backup stack.
 
 Anyone else that was at the meetup want to fill us in?
 

The backup stack is a device used to collect items to operate on after
the current action is complete. It is entirely an implementation detail.

Resources that can be updated in place will have their resource record
superseded, but retain their physical resource ID.

This is one area where the resource plugin API is particularly sticky,
as resources are allowed to raise the "replace me" exception if in-place
updates fail. That is OK though; at that point we will just comply by
creating a replacement resource as if we never tried the in-place update.

In order to facilitate this, we must expand the resource data model to
include a version. Replacement resources will be marked as current and
to-be-removed resources marked for deletion. We can also keep all current
- 1 resources around to facilitate rollback until the stack reaches a
complete state again. Once that is done, we can remove the backup stack.
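
As a rough sketch of that expanded data model (column names invented, not
a spec):

    from sqlalchemy import Column, Integer, String
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Resource(Base):
        __tablename__ = 'resource'

        id = Column(Integer, primary_key=True)
        stack_id = Column(String(36), nullable=False)
        name = Column(String(255), nullable=False)
        # Bumped when a resource is replaced; in-place updates supersede the
        # record but keep the same physical_resource_id.
        version = Column(Integer, nullable=False, default=0)
        physical_resource_id = Column(String(255))
        # e.g. 'current', 'rollback-candidate' (the current - 1 version kept
        # around until the stack is complete again), or 'pending-delete'.
        state = Column(String(32), nullable=False, default='current')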

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Oslo messaging vs zaqar

2014-09-22 Thread Clint Byrum
Excerpts from Geoff O'Callaghan's message of 2014-09-22 17:30:47 -0700:
 On 23/09/2014 1:59 AM, Clint Byrum cl...@fewbar.com wrote:
 
  Geoff, do you expect all of our users to write all of their messaging
  code in Python?
 
  oslo.messaging is a _python_ library.
 
  Zaqar is a service with a REST API -- accessible to any application.
 
  No I do not. I am suggesting that a well-designed, scalable and robust
  messaging layer can meet the requirements of both, as well as a number of
  other openstack services. How the messaging layer is consumed isn't the
  issue.
 
 Below is what I originally posted.
 
   
   It seems to my casual view that we could have one and scale that and
 use it
   for SQS style messages, internal messaging (which could include logging)
   all managed by message schemas and QoS.  This would give a very robust
 and
   flexible system for endpoints to consume.
  
   Is there a plan to consolidate?
 

Sorry for the snark, Geoff. I was very confused by the text above, and
I still am. I am confused because consolidation requires commonalities,
of which to my mind, there are almost none other than the relationship
to the very abstract term messaging.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

2014-09-23 Thread Clint Byrum
Excerpts from Joe Gordon's message of 2014-09-23 14:59:33 -0700:
 On Tue, Sep 23, 2014 at 9:13 AM, Zane Bitter zbit...@redhat.com wrote:
 
  On 22/09/14 22:04, Joe Gordon wrote:
 
  To me this is less about valid or invalid choices. The Zaqar team is
  comparing Zaqar to SQS, but after digging into the two of them, zaqar
  barely looks like SQS. Zaqar doesn't guarantee what IMHO is the most
  important part of SQS: that the message will be delivered and will never
  be lost by SQS.
 
 
  I agree that this is the most important feature. Happily, Flavio has
  clarified this in his other thread[1]:
 
   *Zaqar's vision is to provide a cross-cloud interoperable,
fully-reliable messaging service at scale that is both, easy and not
invasive, for deployers and users.*
 
...
 
Zaqar aims to be a fully-reliable service, therefore messages should
never be lost under any circumstances except for when the message's
expiration time (ttl) is reached
 
  So Zaqar _will_ guarantee reliable delivery.
 
   Zaqar doesn't have the same scaling properties as SQS.
 
 
  This is true. (That's not to say it won't scale, but it doesn't scale in
  exactly the same way that SQS does because it has a different architecture.)
 
  It appears that the main reason for this is the ordering guarantee, which
  was introduced in response to feedback from users. So this is clearly a
  different design choice: SQS chose reliability plus effectively infinite
  scalability, while Zaqar chose reliability plus FIFO. It's not feasible to
  satisfy all three simultaneously, so the options are:
 
  1) Implement two separate modes and allow the user to decide
  2) Continue to choose FIFO over infinite scalability
  3) Drop FIFO and choose infinite scalability instead
 
  This is one of the key points on which we need to get buy-in from the
  community on selecting one of these as the long-term strategy.
 
   Zaqar is aiming for low latency per message, SQS doesn't appear to be.
 
 
  I've seen no evidence that Zaqar is actually aiming for that. There are
  waaay lower-latency ways to implement messaging if you don't care about
  durability (you wouldn't do store-and-forward, for a start). If you see a
  lot of talk about low latency, it's probably because for a long time people
  insisted on comparing Zaqar to RabbitMQ instead of SQS.
 
 
 I thought this was why Zaqar uses Falcon and not Pecan/WSME?
 
 For an application like Marconi where throughput and latency is of
 paramount importance, I recommend Falcon over Pecan.
 https://wiki.openstack.org/wiki/Zaqar/pecan-evaluation#Recommendation
 
 Yes, that statement mentions throughput, but it does mention latency
 as well.
 

I definitely see where that may have subtly suggested the wrong
thing, if indeed latency isn't a top concern.

I think what it probably should say is something like this:

For an application like Marconi where there will be many repetitive,
small requests, a lighter weight solution such as Falcon is preferred
over Pecan.

As in, we care about the cost of all those requests, not so much about
the latency.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][tripleo] New Project - Kolla: Deploy and Manage OpenStack using Kubernetes and Docker

2014-09-24 Thread Clint Byrum
Excerpts from Jay Pipes's message of 2014-09-23 21:38:37 -0700:
 On 09/23/2014 10:29 PM, Steven Dake wrote:
  There is a deployment program - tripleo is just one implementation.
 
 Nope, that is not correct. Like it or not (I personally don't), Triple-O 
 is *the* Deployment Program for OpenStack:
 
 http://git.openstack.org/cgit/openstack/governance/tree/reference/programs.yaml#n284
 
 Saying Triple-O is just one implementation of a deployment program is 
 like saying Heat is just one implementation of an orchestration program. 
 It isn't. It's *the* implementation of an orchestration program that has 
 been blessed by the TC:
 
 http://git.openstack.org/cgit/openstack/governance/tree/reference/programs.yaml#n112
 

That was written before we learned everything we've learned in the last
12 months. I think it is unfair to simply point to this and imply that
bending or even changing it is not open for discussion.

   We
  went through this with Heat and various projects that want to extend
  heat (e.g. Murano) and one big mistake I think Murano folks made was not
  figuring out where their code would go prior to writing it.  I'm only
  making a statement as to where I think it should belong.
 
 Sorry, I have to call you to task on this.
 
 You think it was a mistake for the Murano folks to not figure out where 
 the code would go prior to writing it? For the record, Murano existed 
 nearly 2 years ago, as a response to various customer requests. Having 
 the ability to properly deploy Windows applications like SQL Server and 
 Active Directory into an OpenStack cloud was more important to the 
 Murano developers than trying to predict what the whims of the OpenStack 
 developer and governance model would be months or years down the road.
 
 Tell me, did any of Heat's code exist prior to deciding to propose it 
 for incubation? Saying that Murano developers should have thought about 
 where their code would live is holding them to a higher standard than 
 any of the other developer communities. Did folks working on 
 disk-image-builder pre-validate with the TC or the mailing list that the 
 dib code would live in the triple-o program? No, of course not. It was 
 developed naturally and then placed into the program that fit it best.
 
 Murano was developed naturally in exactly the same way, and the Murano 
 developers have been nothing but accommodating to every request made of 
 them by the TC (and those requests have been entirely different over the 
 last 18 months, ranging from 'split it out' to 'just propose another 
 program') and by the PTLs for projects that requested they split various 
 parts of Murano out into existing programs.
 
 The Murano developers have done no power grab, have deliberately tried 
 to be as community-focused and amenable to all requests as possible, and 
 yet they are treated with disdain by a number of folks in the core Heat 
 developer community, including yourself, Clint and Zane. And honestly, I 
 don't get it... all Murano is doing is generating Heat templates and 
 trying to fill in some pieces that Heat isn't interested in doing. I 
 don't see why there is so much animosity towards a project that has, to 
 my knowledge, acted in precisely the ways that we've asked projects to 
 act in the OpenStack community: with openness, transparency, and 
 community good will.

Disdain is hardly the right word. Disdain implies we don't have any
respect at all for Murano. I cannot speak for others, but I do have
respect. I'm just not interested in Murano.

FWIW, I think what Steven Dake is saying is that he does not want to
end up in the same position Murano is in. I think that is unlikely,
as we're seeing many projects hitting the same wall, which is the cause
for discussing changing how we include or exclude projects.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] PTL Candidacy

2014-09-24 Thread Clint Byrum
I am writing to announce my candidacy for OpenStack Deployment PTL.

Those of you involved with the deployment program may be surprised to
see my name here. I've been quiet lately, distracted by an experiment
which was announced by Allison Randal a few months back. [1]

The experiment has been going well. We've had to narrow our focus from
the broader OpenStack project and just push hard to get HP's Helion
Product ready for release, but we're ready to bring everything back out
into the open and add it to the options for the deployment program. Most
recently our 'tripleo-ansible' repository has been added to stackforge [2],
and I hope we can work out a way where it lands in the official deployment
namespace once we have broader interest.

Those facts may cause some readers to panic, and others to rejoice,
but I would ask you to keep reading, even if you think the facts above
might disqualify me from your ballot.

My intention is to serve as PTL for OpenStack Deployment. I want to
emphasize the word serve. I believe that a PTL's first job is to serve
the mission of the program.

I have watched Robert serve closely, and I think I understand the wide
reach the program already has. We make use of Ironic, Nova, Glance,
Neutron, and Heat, and we need to interface directly with those projects
to be successful, regardless of any other tools in use.

However, I don't think the way to scale this project is to buckle down and
try to be a hero-PTL. We need to make the program's mission more appealing
to a greater number of OpenStack operators that want to deploy and manage
OpenStack. This will widen our focus, which may slow some things down,
but we can collaborate, and find common ground on many issues while still
pushing forward on the fronts that are important to each organization.

My recent experience with Ansible has convinced me that Ansible is not
_the_ answer, but that Ansible is _an_ answer which serves the needs
of some OpenStack users. Heat serves other needs, where Puppet, Chef,
Salt, and SSH in a for loop serve yet more diverse needs.

So, with that in mind, I want to succinctly state my priorities for
the role:

 * Serve the operators. Our feedback from operators has been extremely
   mixed. We need to do a better job of turning operators into OpenStack
   Deployment users and contributors.

 * Improve diversity. I have been as guilty as anyone else in the past
   of slamming the door on those who wanted to join our effort but with
   a different use case. This was a mistake. Looking forward, the door
   needs to stay open, and be widened. Without that, we won't be able
   to welcome more operators.

 * March toward a presence in the gate. I know that the gate is
   a hot term and up for debate right now. However, there will always
   be a gate of some kind for the projects in the integrated release,
   and I'd like to see a more production-like test in that gate. From
   the beginning, TripleO has been focused on supporting continuous
   deployment models, so it would make a lot of sense to have TripleO
   doing integration testing of the integrated release. If there is
   a continued stripping down of the gate, then TripleO would still
   certainly be a valuable CI job for the integrated release. We've had
   TripleO break numerous times because we run with a focus on production-
   ready settings and multiple nodes, which exposes new facets of the
   code that go untouched in the single-node simple-and-fast focused
   devstack tests.
   
   Of course, our CI has not exactly been rock solid, for various
   reasons. We need to make it a priority to get CI handled for at least
   the primary tooling, and at the same time welcome and support efforts
   to make use of our infrastructure for alternative tooling. This isn't
   something I necessarily think will happen in the next 6 months, but
   I think one role that a PTL can be asked to serve is as shepherd of
   long term efforts, and this is definitely one of those.

So, I thank you for taking the time to read this, and hope that whatever
happens we can build a better deployment program this cycle.

-Clint Byrum

[1] http://lists.openstack.org/pipermail/openstack-dev/2014-August/042589.html
[2] https://git.openstack.org/cgit/stackforge/tripleo-ansible

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Zaqar] The horse is dead. Long live the horse.

2014-09-24 Thread Clint Byrum
Sorry for the vague subject[1]. I just wanted to commend Flavio Percoco
and the Zaqar team for maintaining poise and being excellent citizens
of OpenStack whilst being questioned intensely by the likes of me,
and others.

I feel that this questioning has been useful, and will allow us to reason
about Zaqar in the future. So, I recommend that we stop questioning,
and start coding.

If you feel that a lighter weight system with different guarantees will
serve the users of OpenStack better than Zaqar, then own up and write it.

Meanwhile, I suggest we spend our communication bandwidth and effort
on reasoning about the bigger problem that Zaqar exposes, and which I
think Monty has highlighted in his recent thread about the big tent.

Anyway, thanks for listening.

-Clint

[1] The subject is a reference to beating a dead horse. Zaqar is not a
horse, and is not dead. The first person to get angry about my declaration
of Zaqar's death should be asked to wear a ridiculous sombrero to the
next summit.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Thoughts on OpenStack Layers and a Big Tent model

2014-09-24 Thread Clint Byrum
Excerpts from Robert Collins's message of 2014-09-23 21:14:47 -0700:
 No one helped me edit this :)
 
 http://rbtcollins.wordpress.com/2014/09/24/what-poles-for-the-tent/
 
 I hope I haven't zoned out and just channelled someone else here ;)
 

This sounds like APIs are what matter. You did spend some time
working with Simon Wardley, didn't you? ;)

I think it's a sound argument, but I'd like to banish the term reference
implementation from any discussions around what OpenStack, as a project,
delivers. It has too many negative feelings wrapped up in it.

I also want to call attention to how what you describe feels an awful
lot like POSIX to me. Basically offering guarantees of API compatibility,
but then letting vendors run wild around and behind it.

I'm not sure if that is a good thing, or a bad thing. I do, however,
think if we can avoid a massive vendor battle that involves multiple
vendors pushing multiple implementations, we will save our companies a
lot of money, and our users will get what they need sooner.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [olso] [cinder] upgrade issues in lock_path in cinder after oslo utils sync

2013-12-09 Thread Clint Byrum
Excerpts from Sean Dague's message of 2013-12-09 08:17:45 -0800:
 On 12/06/2013 05:40 PM, Ben Nemec wrote:
  On 2013-12-06 16:30, Clint Byrum wrote:
  Excerpts from Ben Nemec's message of 2013-12-06 13:38:16 -0800:
 
 
  On 2013-12-06 15:14, Yuriy Taraday wrote:
 
   Hello, Sean.
  
   I get the issue with the upgrade path. Users don't want to update
  config unless they are forced to do so.
   But introducing code that weakens security and letting it stay is an
  unconditionally bad idea.
   It looks like we have to weigh two evils: having trouble upgrading
  and lessening security. That's obvious.
  
   Here are my thoughts on what we can do with it:
   1. I think we should definitely force the user to do appropriate
  configuration to let us use secure ways to do locking.
   2. We can wait one release to do so, e.g. issue a deprecation
  warning now and force the user to do it the right way later.
   3. If we are going to do 2, we should do it in the service that is
  affected, not in the library, because the library shouldn't track releases
  of an application that uses it. It should do its thing and do it
  right (securely).
  
   So I would suggest dealing with it in Cinder by importing the
  'lock_path' option after parsing configs, issuing a deprecation
  warning, and setting it to tempfile.gettempdir() if it is still None.
 
  This is what Sean's change is doing, but setting lock_path to
  tempfile.gettempdir() is the security concern.
 
  Yuriy's suggestion is that we should let Cinder override the config
  variable's default with something insecure. Basically only deprecate
  it in Cinder's world, not oslo's. That makes more sense from a library
  standpoint as it keeps the library's expected interface stable.
  
  Ah, I see the distinction now.  If we get this split off into
  oslo.lockutils (which I believe is the plan), that's probably what we'd
  have to do.
  
 
  Since there seems to be plenty of resistance to using /tmp by default,
  here is my proposal:
 
  1) We make Sean's change to open files in append mode. I think we can
  all agree this is a good thing regardless of any config changes.
 
  2) Leave lockutils broken in Icehouse if lock_path is not set, as I
  believe Mark suggested earlier. Log an error if we find that
  configuration. Users will be no worse off than they are today, and if
  they're paying attention they can get the fixed lockutils behavior
  immediately.
 
  Broken how? Broken in that it raises an exception, or broken in that it
  carries a security risk?
  
  Broken as in external locks don't actually lock.  If we fall back to
  using a local semaphore it might actually be a little better because
  then at least the locks work within a single process, whereas before
  there was no locking whatsoever.
 
 Right, so broken as in it doesn't actually lock, potentially completely
 scrambles the user's data, breaking them forever.
 

Things I'd like to see OpenStack do in the short term, ranked in ascending
order of importance:

4) Upgrade smoothly.
3) Scale.
2) Help users manage external risks.
1) Not do what Sean described above.

I mean, how can we even suggest silently destroying integrity?

I suggest merging Sean's patch and putting a warning in the release
notes that running without setting this will be deprecated in the next
release. If that is what this is preventing this debate should not have
happened, and I sincerely apologize for having delayed it. I believe my
mistake was assuming this was something far more trivial than without
this patch we destroy users' data.

I thought we were just talking about making upgrades work. :-P

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][Schduler] Volunteers wanted for a modest proposal for an external scheduler in our lifetime

2013-12-09 Thread Clint Byrum
Excerpts from Herman Narkaytis's message of 2013-12-09 08:18:17 -0800:
 Hi All,
   Over the last couple of months the Mirantis team has been working on a new
 scalable scheduler architecture. The main concept was proposed by Boris
 Pavlovic in the following blueprint
 https://blueprints.launchpad.net/nova/+spec/no-db-scheduler and Alexey
 Ovchinnikov prepared a bunch of patches
 https://review.openstack.org/#/c/45867/9
   This patch set was intensively reviewed by the community and there was a
 call for some kind of documentation that describes the overall architecture
 and details of the implementation. Here is an etherpad document
 https://etherpad.openstack.org/p/scheduler-design-proposal (a copy in
 google doc
 https://docs.google.com/a/mirantis.com/document/d/1irmDDYWWKWAGWECX8bozu8AAmzgQxMCAAdjhk53L9aM/edit
 ).
   Comments and criticism are highly welcome.
 

Looks great. I think I would post this message as new rather than a
reply. It it isn't really related to the original thread. Many people
who are interested in scheduler improvements may have already killed this
thread in their mail reader and thus may miss your excellent message. :)

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][TripleO] Nested resources

2013-12-09 Thread Clint Byrum
Excerpts from Fox, Kevin M's message of 2013-12-09 09:34:06 -0800:
 I'm thinking more generic:
 
 The cloud provider will provide one or more suballocating images. The one 
 TripleO uses to take a bare metal node and make VMs available would be the 
 obvious one to make available initially. I think that one should not have a 
 security concern since it is already being used in that way safely.

I like where you're going with this, in that the cloud should eventually
become self-aware enough to be able to provision the baremetal resources
it has and spin nova up on them. I do think that is quite far out. Right
now, we have two novas: an undercloud nova which owns all the baremetal,
and an overcloud nova which owns all the VMs. This is definitely nested,
but there is a hard line between the two.

For many people, that hard line is a feature. For others, it is a bug.  :)

 
 I think a docker based one shouldn't have the safety concern either, since I 
 think docker containerizes network resources too? I could be wrong though.
 

The baremetal-to-tenant issues have little to do with networking. They
are firmware problems. Root just has too much power on baremetal.
Somebody should make some hardware which defends against that. For now
the best thing is virtualization extensions.

Docker isn't really going to fix that. The containerization that is
available is good, but does not do nearly as much as true virtualization
does to isolate the user from the hardware. There's still a single
kernel there, and thus, if you can trick that kernel, you can own the
whole box. I've heard it described as a little better than chroot.
AFAIK, the people using containers for multi-tenant are doing so by
leveraging kernel security modules heavily.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Unified Guest Agent proposal

2013-12-09 Thread Clint Byrum
Excerpts from Steven Dake's message of 2013-12-09 09:41:06 -0800:
 On 12/09/2013 09:41 AM, David Boucha wrote:
  On Sat, Dec 7, 2013 at 11:09 PM, Monty Taylor mord...@inaugust.com 
  mailto:mord...@inaugust.com wrote:
 
 
 
  On 12/08/2013 07:36 AM, Robert Collins wrote:
   On 8 December 2013 17:23, Monty Taylor mord...@inaugust.com
  mailto:mord...@inaugust.com wrote:
  
  
   I suggested salt because we could very easily make trove and savanna
   into salt masters (if we wanted to) just by having them import the salt
   library and run an api call. When they spin up nodes using heat, we
   could easily have that do the cert exchange - and the admins of the
   site need not know _anything_ about salt, puppet or chef - only about
   trove or savanna.
  
   Are salt masters multi-master / HA safe?
  
   E.g. if I've deployed 5 savanna API servers to handle load, and they
   all do this 'just import', does that work?
  
   If not, and we have to have one special one, what happens when it
   fails / is redeployed?
 
  Yes. You can have multiple salt masters.
 
   Can salt minions affect each other? Could one pretend to be a
  master,
   or snoop requests/responses to another minion?
 
  Yes and no. By default no - and this is protected by key
  encryption and
  whatnot. They can affect each other if you choose to explicitly grant
  them the ability to. That is - you can give a minion an acl to
  allow it
  inject specific command requests back up into the master. We use
  this in
  the infra systems to let a jenkins slave send a signal to our salt
  system to trigger a puppet run. That's all that slave can do though -
  send the signal that the puppet run needs to happen.
 
   However - I don't think we'd really want to use that in this case,
   so I think the answer you're looking for is no.
 
   Is salt limited: is it possible to assert that we *cannot* run
   arbitrary code over salt?
 
  In as much as it is possible to assert that about any piece of software
  (bugs, of course, blah blah). But the messages that salt sends to a
  minion are "run this thing that you have a local definition for" rather
  than "here, have some python and run it".
 
  Monty
 
 
 
  Salt was originally designed to be a unified agent for a system like 
  openstack. In fact, many people use it for this purpose right now.
 
  I discussed this with our team management and this is something 
  SaltStack wants to support.
 
  Are there any specifics things that the salt minion lacks right now to 
  support this use case?
 
 
 David,
 
 If I am correct in my parsing of the salt nomenclature, Salt provides a 
 Master (e.g. a server) and minions (e.g. agents that connect to the salt 
 server).  The salt server tells the minions what to do.
 
 This is not desirable for a unified agent (at least in the case of Heat).
 
 The bar is very very very high for introducing new *mandatory* *server* 
 dependencies into OpenStack.  Requiring a salt master (or a puppet 
 master, etc) in my view is a non-starter for a unified guest agent 
 proposal.  Now if a heat user wants to use puppet, and can provide a 
 puppet master in their cloud environment, that is fine, as long as it is 
 optional.
 

What if we taught Heat to speak salt-master-ese? AFAIK it is basically
an RPC system. I think right now it is 0mq, so it would be relatively
straightforward to just have Heat start talking to the agents in 0mq.
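
To be clear about the scale of effort: the salt python API already makes
the calling side almost trivial. A rough sketch, with the grain name and
target entirely made up for illustration (this is not a proposal for
Heat's actual interface):

import salt.client

local = salt.client.LocalClient()

# Ask every minion carrying a (hypothetical) stack_id grain to apply
# its highstate over salt's 0mq transport.
result = local.cmd('stack_id:1234abcd', 'state.highstate',
                   expr_form='grain')
print(result)

The open question is whether Heat would carry a salt master role
in-process like that, or talk 0mq to the agents directly.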

 A guest agent should have the following properties:
 * minimal library dependency chain
 * no third-party server dependencies
 * packaged in relevant cloudy distributions
 

That last one only matters if the distributions won't add things like
agents to their images post-release. I am pretty sure "work well in
OpenStack" is important for server distributions, and thus this is at
least something we don't have to freak out about too much.

 In terms of features:
 * run shell commands
 * install files (with selinux properties as well)
 * create users and groups (with selinux properties as well)
 * install packages via yum, apt-get, rpm, pypi
 * start and enable system services for systemd or sysvinit
 * Install and unpack source tarballs
 * run scripts
 * Allow grouping, selection, and ordering of all of the above operations
 

All of those things are general purpose low level system configuration
features. None of them will be needed for Trove or Savanna. They need
to do higher level things like run a Hadoop job or create a MySQL user.

 Agents are a huge pain to maintain and package.  It took a huge amount 
 of willpower to get cloud-init standardized across the various 
 distributions.  We have managed to get heat-cfntools (the heat agent) 
 into every distribution at this point and this was a significant amount 
 of work.  We don't want to keep repeating this 

Re: [openstack-dev] [heat] Core criteria, review stats vs reality

2013-12-09 Thread Clint Byrum
Excerpts from Steven Hardy's message of 2013-12-09 03:31:36 -0800:
 Hi all,
 
 So I've been getting concerned about $subject recently, and based on some
 recent discussions so have some other heat-core folks, so I wanted to start
 a discussion where we can agree and communicate our expectations related to
 nomination for heat-core membership (because we do need more core
 reviewers):
 
 The issues I have are:
 - Russell's stats (while very useful) are being used by some projects as
   the principal metric related to -core membership (ref TripleO's monthly
   cull/nameshame, which I am opposed to btw).  This is in some cases
   encouraging some stats-seeking in our review process, IMO.
 

This is quite misleading, so please do put the TripleO reference in
context:

http://lists.openstack.org/pipermail/openstack-dev/2013-October/016186.html
http://lists.openstack.org/pipermail/openstack-dev/2013-October/016232.html

Reading the text of those two I think you can see that while the stats
are a tool Robert is using to find the good reviewers, it is not the
principal metric.

I also find it quite frustrating that you are laying accusations of
stats-seeking without proof. That is just spreading FUD. I'm sure that
is not what you want to do, so I'd like to suggest that we not accuse our
community of any kind of cheating or gaming of the system without
actual proof. I would also suggest that these accusations be made in
private and dealt with directly rather than as broad passive-aggressive
notes on the mailing list.

 - Review quality can't be measured mechanically - we have some folks who
   contribute fewer, but very high quality reviews, and are also very active
   contributors (so knowledge of the codebase is not stale).  I'd like to
   see these people do more reviews, but removing people from core just
   because they drop below some arbitrary threshold makes no sense to me.


Not sure I agree that it absolutely can't, but it certainly isn't
something these stats are even meant to do.

We other reviewers must keep tabs on our aspiring core reviewers and
try to rate them ourselves based on whether or not they're spotting the
problems we would spot, and whether or not they're also upholding the
culture we want to foster in our community. We express our rating of
these people when voting on a nomination in the mailing list.

So what you're saying is, there is more to our votes than the mechanical
number. I'd agree 100%. However, I think the numbers _do_ let people
know where they stand in one very limited aspect versus the rest of the
community.

I would actually find it interesting if we had a meta-gerrit that asked
us to review the reviews. This type of system works fantastically for
stackexchange. That would give us a decent mechanical number as well.

 So if you're aiming for heat-core nomination, here's my personal wish-list,
 but hopefully others can provide their input and we can update the wiki with
 the resulting requirements/guidelines:
 
 - Make your reviews high-quality.  Focus on spotting logical errors,
   reducing duplication, consistency with existing interfaces, opportunities
   for reuse/simplification etc.  If every review you do is +1, or -1 for a
   trivial/cosmetic issue, you are not making a strong case for -core IMHO.
 

Disagree. I am totally fine having somebody in core who is really good
at finding all of the trivial cosmetic issues. That should mean that the
second +2'er is looking at code already free of trivial and cosmetic
issues that would distract from the bigger ones.

 - Send patches.  Some folks argue that -core membership is only about
   reviews, I disagree - There are many aspects of reviews which require
   deep knowledge of the code, e.g spotting structural issues, logical
   errors caused by interaction with code not modified by the patch,
   effective use of test infrastructure, etc etc.  This deep knowledge comes
   from writing code, not only reviewing it.  This also gives us a way to
   verify your understanding and alignment with our stylistic conventions.
 

The higher the bar goes, the fewer reviewers we will have. There are
plenty of people that will find _tons_ of real issues but won't submit
very many patches if any. However, I think there isn't any value in
arguing over this point as most of our reviewers are also submitting
patches already.

 - Fix bugs.  Related to the above, help us fix real problems by testing,
   reporting bugs, and fixing them, or take an existing bug and post a patch
   fixing it.  Ask an existing team member to direct you if you're not sure
   which bug to tackle.  Sending patches doing trivial cosmetic cleanups is
   sometimes worthwhile, but make sure that's not all you do, as we need
   -core folk who can find, report, fix and review real user-impacting
   problems (not just new features).  This is also a great way to build
   trust and knowledge if you're aiming to contribute features to Heat.


There's a theme running through 

Re: [openstack-dev] [heat] Core criteria, review stats vs reality

2013-12-09 Thread Clint Byrum
Excerpts from Zane Bitter's message of 2013-12-09 09:52:25 -0800:
 On 09/12/13 06:31, Steven Hardy wrote:
  Hi all,
 
  So I've been getting concerned about $subject recently, and based on some
  recent discussions so have some other heat-core folks, so I wanted to start
  a discussion where we can agree and communicate our expectations related to
  nomination for heat-core membership (because we do need more core
  reviewers):
 
  The issues I have are:
  - Russell's stats (while very useful) are being used by some projects as
 the principal metric related to -core membership (ref TripleO's monthly
 cull/nameshame, which I am opposed to btw).  This is in some cases
 encouraging some stats-seeking in our review process, IMO.
 
  - Review quality can't be measured mechanically - we have some folks who
 contribute fewer, but very high quality reviews, and are also very active
 contributors (so knowledge of the codebase is not stale).  I'd like to
 see these people do more reviews, but removing people from core just
 because they drop below some arbitrary threshold makes no sense to me.
 
 +1
 
 Fun fact: due to the quirks of how Gerrit produces the JSON data dump, 
 it's not actually possible for the reviewstats tools to count +0 
 reviews. So, for example, one can juice one's review stats by actively 
 obstructing someone else's work (voting -1) when a friendly comment 
 would have sufficed. This is one of many ways in which metrics offer 
 perverse incentives.
 
 Statistics can be useful. They can be particularly useful *in the 
 aggregate*. But as soon as you add a closed feedback loop you're no 
 longer measuring what you originally thought - mostly you're just 
 measuring the gain of the feedback loop.
 

I think I understand the psychology of stats and incentives, and I know
that this _may_ happen.

However, can we please be more careful about how this is referenced?
Your message above is suggesting the absolute _worst_ behavior from our
community. That is not what I expect, and I think anybody who was doing
that would be dealt with _swiftly_.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Tripleo] Core reviewer update Dec

2013-12-10 Thread Clint Byrum
Excerpts from Robert Collins's message of 2013-12-09 16:31:37 -0800:
 On 6 December 2013 21:56, Jaromir Coufal jcou...@redhat.com wrote:
 
 
  Hey there,
 
  thanks Rob for keeping an eye on this. Speaking for myself, as a current
  non-coder it was very hard to keep pace with others, especially when the UI
  was on hold and I was designing future views. I'll continue working on
  designs much more, but I will also keep an eye on code which is going in. I
  believe that UX reviews will be needed before merging so that we can make
  sure we keep the vision. That's why I would like to express my wish to stay
  within -core even though I don't deliver as many reviews as other engineers.
  However if anybody feels that I should be just +1, I completely understand
  and I will give up my +2 power.
 
  -- Jarda
 
 Hey, so -
 
 I think there are two key things to highlight here. Firstly, there's
 considerable support from other -core for delaying the removals this
 month, so we'll re-evaluate in Jan (and be understanding then as there
 is a big 1-2 week holiday in there).
 
 That said, I want to try and break down the actual implications here,
 both in terms of contributions, recognition and what it means for the
 project.
 
 Firstly, contributions. Reviewing isn't currently *directly*
 recognised as a 'technical contribution' by the bylaws: writing code
 that lands in the repository is, and there is a process for other
 contributions (such as design, UX, and reviews) to be explicitly
 recognised out-of-band. It's possible we should revisit that -
 certainly I'd be very happy to nominate people contributing through
 that means as a TripleO ATC irrespective of their landing patches in a
 TripleO code repository [as long as their reviews *are* helpful :)].
 But that's a legalistic sort of approach. A more holistic approach is
 to say that any activity that helps TripleO succeed in its mission is
 a contribution, and we should be fairly broad in our recognition of
 that activity : whether it's organising contributed hardware for the
 test cloud, helping admin the test cloud, doing code review, or UX
 design - we should recognise and celebrate all of those things.
 Specifically, taking the time to write a thoughtful code review which
 avoids a bug landing in TripleO, or keeps the design flexible and
 effective *is* contributing to TripleO.
 
 We have a bit of a bug in OpenStack today, IMO, in that there is more
 focus on being -core than on being a good effective reviewer. IMO
 that's backwards: the magic switch that lets you set +2 and -2 is a
 responsibility, and that has some impact on the weight your comments
 in reviews have on other people - both other core and non-core, but
 the contribution we make by reviewing doesn't suddenly get
 significantly better by virtue of being -core. There is an element of
 trust and faith in personality etc - you don't want destructive
 behaviour in code review, but you don't want that from anyone - it's
 not a new requirement placed on -core. What I'd like to see is more of
 a focus on review (design review, code review, architecture review) as
 something we should all contribute towards - jointly share the burden.
 For instance, the summit is a fantastic point for us to come together
 and do joint design review of the work organisations are pushing on
 for the next 6 months: that's a fantastic contribution. But when
 organisations don't send people to the summit, because of $reasons,
 that reduces our entire ability to catch problems with that planned
 work: going to the summit is /hard work/ - long days, exhausting,
 intense conversations. The idea (which I've heard some folk mention)
 that only -core folk would be sent to the summit is incredibly nutty!
 

We are human.

Humans see a word like core and they look it up in their internal
dictionary. Apples have cores. Nuclear reactors have cores. You can
extract a core sample from deep under a sheet of ice. In our mind,
cores are at the _center_. The seeds are in the cores. Cores are where
all the magic happens. They hold the secrets.

So it makes sense that when not-core sees core, they think "I should
defer to this person." They do not think "I am a peer." If the code
review process were a nuclear reactor, they are just shielding, or vent
tubes. That person is part of _the core_.

And having +2/-2/+A reinforces this. Core does not have to say much,
they can speak softly and carry a big -2 stick.

Finally, being confirmed by the rest of the team makes them special. It
makes them a leader.

Can we change this? Before we run off and change everything, I think we
must also ask ourselves should we change this?

 So what does it mean for TripleO when someone stops being -core
 because of inactivity:
 
 Firstly it means they have *already* effectively stopped doing code
 review at a high frequency: they are *not* contributing in a
 substantial fashion through that avenue. It doesn't mean anything
 about other avenues of contribution.
 
 

Re: [openstack-dev] Unified Guest Agent proposal

2013-12-10 Thread Clint Byrum
Excerpts from Dmitry Mescheryakov's message of 2013-12-10 08:15:15 -0800:
 Guys,
 
 I see two major trends in the thread:
 
  * use Salt
  * write our own solution with architecture similar to Salt or MCollective
 
 There were points raised pro and contra both solutions. But I have a
 concern which I believe was not covered yet. Both solutions use either
 ZeroMQ or message queues (AMQP/STOMP) as a transport. The thing is there is
 going to be a shared facility between all the tenants. And unlike all other
 OpenStack services, this facility will be directly accessible from VMs,
 which leaves tenants very vulnerable to each other. Harm the facility from
 your VM, and the whole Region/Cell/Availability Zone will be left out of
 service.
 
 Do you think that is solvable, or maybe I overestimate the threat?
 

I think Salt would be thrilled if we tested and improved its resiliency
to abuse. We're going to have to do that with whatever we expose to VMs.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Unified Guest Agent proposal

2013-12-10 Thread Clint Byrum
Excerpts from Dmitry Mescheryakov's message of 2013-12-10 08:25:26 -0800:
 And one more thing,
 
 Sandy Walsh pointed to the client Rackspace developed and uses - [1], [2].
 Its design is somewhat different and can be expressed by the following
 formula:
 
 App -> Host (XenStore) -> Guest Agent
 
 (taken from the wiki [3])
 
 It has an obvious disadvantage - it is hypervisor-dependent and currently
 implemented for Xen only. On the other hand, such a design should not have
 the shared-facility vulnerability, as the Agent accesses the server not
 directly but via XenStore (which AFAIU is compute-node based).
 

I don't actually see any advantage to this approach. It seems to me that
it would be simpler to expose and manage a single network protocol than
it would be to expose hypervisor level communications for all hypervisors.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [heat] Stack preview

2013-12-10 Thread Clint Byrum
Excerpts from Richard Lee's message of 2013-12-10 09:46:49 -0800:
 Hey all,
 
 We're working on a blueprint
 https://blueprints.launchpad.net/heat/+spec/preview-stack
 that adds the ability to preview what a given template+parameters would create
 in terms of resources.  We think this would provide significant value for
 blueprint authors and for other heat users that want to see what someone's
 template would create before actually launching resources (and possibly
 having to pay for them).
 
 We'd love to hear any thoughts regarding this feature

Thanks for starting with use cases. I like the use case of being able
to preview the damage this template will do to your bank account when
consuming templates without actually understanding the template language.

I'd love to see more details in the spec.
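
For the avoidance of doubt about what I mean, here is a purely
illustrative sketch of what a client-side call could look like once the
blueprint lands. The method name, arguments, and the shape of the
returned data are assumptions based on the blueprint, not an existing
API:

from heatclient.client import Client

heat = Client('1', endpoint='http://heat.example.com:8004/v1/TENANT_ID',
              token='AUTH_TOKEN')

# Ask Heat what this template *would* create, without creating
# (or paying for) anything.
preview = heat.stacks.preview(
    stack_name='wordpress-preview',
    template=open('wordpress.yaml').read(),
    parameters={'flavor': 'm1.small'})

for res in getattr(preview, 'resources', []):
    print(res.get('resource_type'), res.get('resource_name'))

Something in that shape would cover the bank-account use case nicely.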

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Heat] workflow for fixes involving numerous changes

2013-12-10 Thread Clint Byrum
Excerpts from Steven Hardy's message of 2013-12-10 03:00:26 -0800:
 On Tue, Dec 10, 2013 at 11:45:11AM +0200, Pavlo Shchelokovskyy wrote:
   - wouldn't it be better to keep all these changes in one bug and fix all
  misuses on a per-file basis (with one file per patch set, for example)? It
  seems to me it would be easier to review that way.
 
 One file per patch is not a good idea IMO, I think the review burden is
 minimised if you make sure that each commit just contains the exact same
 change to many files, then it's quick to click through them all and confirm
 all looks OK.

+1

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Unified Guest Agent proposal

2013-12-10 Thread Clint Byrum
Excerpts from Dmitry Mescheryakov's message of 2013-12-10 11:08:58 -0800:
 2013/12/10 Clint Byrum cl...@fewbar.com
 
  Excerpts from Dmitry Mescheryakov's message of 2013-12-10 08:25:26 -0800:
   And one more thing,
  
   Sandy Walsh pointed to the client Rackspace developed and use - [1], [2].
   Its design is somewhat different and can be expressed by the following
   formulae:
  
   App - Host (XenStore) - Guest Agent
  
   (taken from the wiki [3])
  
   It has an obvious disadvantage - it is hypervisor dependent and currently
   implemented for Xen only. On the other hand such design should not have
   shared facility vulnerability as Agent accesses the server not directly
  but
   via XenStore (which AFAIU is compute node based).
  
 
  I don't actually see any advantage to this approach. It seems to me that
  it would be simpler to expose and manage a single network protocol than
  it would be to expose hypervisor level communications for all hypervisors.
 
 
 I think the Rackspace agent design could be expanded as follows:
 
 Controller (Savanna/Trove) -> AMQP/ZeroMQ -> Agent on Compute host ->
 XenStore -> Guest Agent
 
 That is somewhat speculative because, if I understood it correctly, the
 open-sourced code covers only the second part of the exchange:
 
 Python API / CMD interface -> XenStore -> Guest Agent
 
 Assuming I got it right:
 While more complex, such a design removes pressure from the AMQP/ZeroMQ
 providers: on the 'Agent on Compute' you can easily control the amount of
 messages emitted by the Guest with throttling. It is easy since such an
 agent runs on a compute host. In the worst case, if it happens to be abused
 by a guest, it affects this compute host only and not the whole segment of
 OpenStack.
 

This still requires that we also write a backend to talk to the host
for all virt drivers. It also means that any OS we haven't written an
implementation for needs to be hypervisor-aware. That sounds like a
never ending battle.

If it is just a network API, it works the same for everybody. This
makes it simpler, and thus easier to scale out independently of compute
hosts. It is also something we already support and can very easily expand
by just adding a tiny bit of functionality to neutron-metadata-agent.

In fact we can even push routes via DHCP to send agent traffic through
a different neutron-metadata-agent, so I don't see any issue where we
are piling anything on top of an overstressed single resource. We can
have neutron route this traffic directly to the Heat API which hosts it,
and that can be load balanced and etc. etc. What is the exact scenario
you're trying to avoid?
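
For the route-pushing part, a hedged sketch of how that could be done
today with the existing subnet host_routes mechanism (delivered to
guests as a DHCP classless static route); the addresses and subnet ID
below are placeholders:

from neutronclient.v2_0 import client as neutron_client

neutron = neutron_client.Client(
    username='admin', password='secret', tenant_name='admin',
    auth_url='http://keystone.example.com:5000/v2.0')

SUBNET_ID = 'SUBNET_UUID'

# Send agent/metadata traffic for a well-known address through a
# tenant-specific proxy instead of the default gateway.
neutron.update_subnet(SUBNET_ID, {
    'subnet': {
        'host_routes': [
            {'destination': '169.254.169.254/32', 'nexthop': '10.0.0.5'},
        ],
    },
})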

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Unified Guest Agent proposal

2013-12-10 Thread Clint Byrum
Excerpts from Dmitry Mescheryakov's message of 2013-12-10 12:37:37 -0800:
  What is the exact scenario you're trying to avoid?
 
 It is a DDoS attack on either the transport (AMQP / ZeroMQ provider) or the
 server (Salt / our own self-written server). Looking at the design, it
 doesn't look like the attack could be somehow contained within the tenant
 it is coming from.
 

We can push a tenant-specific route for the metadata server, and a tenant
specific endpoint for in-agent things. Still simpler than hypervisor-aware
guests. I haven't seen anybody ask for this yet, though I'm sure if they
run into these problems it will be the next logical step.

 In the current OpenStack design I see only one similarly vulnerable
 component - the metadata server. Keeping that in mind, maybe I just
 overestimate the threat?
 

Anything you expose to the users is vulnerable. By using the localized
hypervisor scheme you're now making the compute node itself vulnerable.
Only now you're asking that an already complicated thing (nova-compute)
add another job: rate limiting.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] 答复: [OpenStack][Heat] AutoScaling scale down issue

2013-12-11 Thread Clint Byrum
Hi!

This list is for discussion of ongoing bugs and features in Heat. For
user-centric discussions, please use the main openstack mailing list:

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack

Thanks!

Excerpts from Haiming Yang's message of 2013-12-11 09:40:32 -0800:
 I think it might be useful to think about how to integrate savanna into heat.
 When auto-scaling down, usually the first-created node will be removed first.
 
 -Original Message-
 From: Jay Lau jay.lau@gmail.com
 Sent: 2013/12/11 23:46
 To: OpenStack Development Mailing List openstack-dev@lists.openstack.org
 Subject: [openstack-dev] [OpenStack][Heat] AutoScaling scale down issue
 
 Greetings,
 
 
 Here come a question related to heat auto scale down.
 
 
 The scenario is as following:
 
 I was trying to deploy a hadoop cluster with a heat Auto Scaling template. 
 
 When scaling up a slave node, I can use user-data to do some post work on a 
 configuration file on the hadoop master node, based on the information of 
 the slave node (the main configuration file is conf/slaves, as I need to add 
 the slave node to this file);
 
 But when scaling down, it seems I have no chance to do any configuration on 
 the master node (remove the scaled-down node from conf/slaves), as the 
 master node does not know which slave node was scaled down.
 
 
 Does anyone have any experience with this?
 
 
 
 Thanks,
 
 
 Jay

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Generic question: Any tips for 'keeping up' with the mailing lists?

2013-12-12 Thread Clint Byrum
Excerpts from Justin Hammond's message of 2013-12-12 08:23:24 -0800:
 I am a developer who is currently having trouble keeping up with the
 mailing list due to volume, and my inability to organize it in my client.
 I am nearly forced to use Outlook 2011 for Mac and I have read and
 attempted to implement
 https://wiki.openstack.org/wiki/MailingListEtiquette but it is still a lot
 to deal with. I once read a topic or wiki page on using X-Topics, but I
 have no idea how to set that in Outlook (Google has told me that the
 feature was removed).

Justin, I'm sorry that the volume is catching up with you. I have a highly
optimized email-list-reading work-flow using sup-mail and a few filters,
and I still spend 2 hours a day sifting through all of the lists I am on
(not just openstack lists). It is worth it to keep aware and to avoid
duplicating efforts, even if it means I have to hit the "kill this thread"
button a lot.

Whoever is forcing you to use this broken client, I suggest that you
explain to them your situation. It is the reason for your problems. Note
that you can just subscribe to the list from a different address than
you post from, and configure a good e-mail client like Thunderbird to set
your From: address so that you still are representing your organization
the way you'd like to. So if it is just a mail server thing, that is
one way around it.

Also the setup I use makes use of offlineimap, which can filter things
for you, so if you have IMAP access to your inbox, you can use that and
then just configure your client for local access (I believe Thunderbird
even supports a local Maildir mode).

Anyway, you _MUST_ have a threaded email client that quotes well for
replies. If not, I'm afraid it will remain unnecessarily difficult to
participate on this list.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?

2013-12-12 Thread Clint Byrum
I've been chasing quite a few bugs in the TripleO automated bring-up
lately that have to do with failures because either there are no valid
hosts ready to have servers scheduled, or there are hosts listed and
enabled, but they can't bind to the network because for whatever reason
the L2 agent has not checked in with Neutron yet.

This is only a problem in the first few minutes of a nova-compute host's
life. But it is critical for scaling up rapidly, so it is important for
me to understand how this is supposed to work.

So I'm asking, is there a standard way to determine whether or not a
nova-compute is definitely ready to have things scheduled on it? This
can be via an API, or even by observing something on the nova-compute
host itself. I just need a definitive signal that the compute host is
ready.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?

2013-12-12 Thread Clint Byrum
Excerpts from Chris Friesen's message of 2013-12-12 09:19:42 -0800:
 On 12/12/2013 11:02 AM, Clint Byrum wrote:
 
  So I'm asking, is there a standard way to determine whether or not a
  nova-compute is definitely ready to have things scheduled on it? This
  can be via an API, or even by observing something on the nova-compute
  host itself. I just need a definitive signal that the compute host is
  ready.
 
 Is it not sufficient that nova service-list shows the compute service 
 as up?
 

I could spin waiting for at least one. Not a bad idea actually. However,
I suspect that will only handle the situations I've gotten where the
scheduler returns NoValidHost.

I say that because I think if it shows there, it matches the all hosts
filter and will have things scheduled on it. With one compute host I
get failures after scheduling because neutron has no network segment to
bind to. That is because the L2 agent on the host has not yet registered
itself with Neutron.
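
For reference, the "spin waiting for at least one" idea is only a few
lines with python-novaclient; credentials and timeouts below are
placeholders:

import time

from novaclient.v1_1 import client as nova_client

nova = nova_client.Client('admin', 'secret', 'admin',
                          'http://keystone.example.com:5000/v2.0')


def wait_for_compute(timeout=600, interval=10):
    # Spin until at least one enabled nova-compute service reports up.
    deadline = time.time() + timeout
    while time.time() < deadline:
        services = nova.services.list(binary='nova-compute')
        if any(s.state == 'up' and s.status == 'enabled'
               for s in services):
            return True
        time.sleep(interval)
    return False

That would get rid of the NoValidHost case, but not the Neutron binding
failures described above.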

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?

2013-12-12 Thread Clint Byrum
Excerpts from Russell Bryant's message of 2013-12-12 09:09:04 -0800:
 On 12/12/2013 12:02 PM, Clint Byrum wrote:
  I've been chasing quite a few bugs in the TripleO automated bring-up
  lately that have to do with failures because either there are no valid
  hosts ready to have servers scheduled, or there are hosts listed and
  enabled, but they can't bind to the network because for whatever reason
  the L2 agent has not checked in with Neutron yet.
  
  This is only a problem in the first few minutes of a nova-compute host's
  life. But it is critical for scaling up rapidly, so it is important for
  me to understand how this is supposed to work.
  
  So I'm asking, is there a standard way to determine whether or not a
  nova-compute is definitely ready to have things scheduled on it? This
  can be via an API, or even by observing something on the nova-compute
  host itself. I just need a definitive signal that the compute host is
  ready.
 
 If a nova compute host has registered itself to start having instances
 scheduled to it, it *should* be ready.  AFAIK, we're not doing any
 network sanity checks on startup, though.
 
 We already do some sanity checks on startup.  For example, nova-compute
 requires that it can talk to nova-conductor.  nova-compute will block on
 startup until nova-conductor is responding if they happened to be
 brought up at the same time.
 
 We could do something like this with a networking sanity check if
 someone could define what that check should look like.
 

Could we ask Neutron if our compute host has an L2 agent yet? That seems
like a valid sanity check.
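
Something like the following would be enough for the check I have in
mind; the agent_type string assumes the OVS agent, and credentials are
placeholders:

import socket

from neutronclient.v2_0 import client as neutron_client

neutron = neutron_client.Client(
    username='admin', password='secret', tenant_name='admin',
    auth_url='http://keystone.example.com:5000/v2.0')


def l2_agent_ready(host=None):
    # Return True once an alive L2 agent is registered for this host.
    host = host or socket.gethostname()
    agents = neutron.list_agents(
        host=host, agent_type='Open vSwitch agent')['agents']
    return any(agent.get('alive') for agent in agents)

nova-compute (or something wrapping it) could poll that before enabling
the host.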

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Unified Guest Agent proposal

2013-12-12 Thread Clint Byrum
Excerpts from Dmitry Mescheryakov's message of 2013-12-12 09:24:13 -0800:
 Clint, Kevin,
 
 Thanks for reassuring me :-) I just wanted to make sure that having direct
 access from VMs to a single facility is not a dead end in terms of security
 and extensibility. And since it is not, I agree it is much simpler (and
 hence better) than hypervisor-dependent design.
 
 
 Then returning to two major suggestions made:
  * Salt
  * Custom solution specific to our needs
 
 The custom solution could be made on top of oslo.messaging. That gives us
 RPC working on different messaging systems. And that is what we really need
 - an RPC into guest supporting various transports. What it lacks at the
 moment is security - it has neither authentication nor ACL.
 

I bet salt would be super open to modularizing their RPC. Since
oslo.messaging includes ZeroMQ, and is a library now, I see no reason to
avoid opening that subject with our fine friends in the Salt community.
Perhaps a few of them are even paying attention right here. :)

The benefit there is that we get everything except the plugins we want
to write already done. And we could start now with the ZeroMQ-only
salt agent if we could at least get an agreement on principle that Salt
wouldn't mind using an abstraction layer for RPC.
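
To make the shape of that concrete, here is a minimal sketch of RPC
into a guest agent over oslo.messaging. The topic, server name, and
method are illustrative only, and this says nothing about how Salt's
own RPC would actually plug in underneath:

from oslo.config import cfg
from oslo import messaging

# The transport (rabbit, qpid, zmq, ...) comes from the service's
# config; the two halves below would run in different processes.
transport = messaging.get_transport(cfg.CONF)
target = messaging.Target(topic='guest_agent', server='instance-0000abcd')

# Control-plane side (e.g. Trove) making a high-level call into the guest:
client = messaging.RPCClient(transport, target)
client.call({}, 'create_database_user', name='app', password='secret')


# Guest-agent side: a tiny endpoint dispatched by an RPC server.
class AgentEndpoint(object):
    def create_database_user(self, ctxt, name, password):
        return {'created': name}          # do the guest-local work here

server = messaging.get_rpc_server(transport, target, [AgentEndpoint()],
                                  executor='blocking')
server.start()                            # serve requests until stopped

The missing pieces, as Dmitry says, are authentication and ACLs around
that call path.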

That does make the poke a hole out of private networks conversation
_slightly_ more complex. It is one thing to just let ZeroMQ out, another
to let all of oslo.messaging's backends out. But I think in general
they'll all share the same thing: you want an address+port to be routed
intelligently out of the private network into something running under
the cloud.

Next steps (all can be done in parallel, as all are interdependent):

* Ask Salt if oslo.messaging is a path they'll walk with us
* Experiment with communicating with salt agents from an existing
  OpenStack service (Savanna, Trove, Heat, etc)
* Deep-dive into Salt to see if it is feasible

As I have no cycles for this, I can't promise to do any, but I will
try to offer assistance if I can.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] [Neutron] How do we know a host is ready to have servers scheduled onto it?

2013-12-12 Thread Clint Byrum
Excerpts from Kyle Mestery's message of 2013-12-12 09:53:57 -0800:
 On Dec 12, 2013, at 11:44 AM, Jay Pipes jaypi...@gmail.com wrote:
  On 12/12/2013 12:36 PM, Clint Byrum wrote:
  Excerpts from Russell Bryant's message of 2013-12-12 09:09:04 -0800:
  On 12/12/2013 12:02 PM, Clint Byrum wrote:
  I've been chasing quite a few bugs in the TripleO automated bring-up
  lately that have to do with failures because either there are no valid
  hosts ready to have servers scheduled, or there are hosts listed and
  enabled, but they can't bind to the network because for whatever reason
  the L2 agent has not checked in with Neutron yet.
  
  This is only a problem in the first few minutes of a nova-compute host's
  life. But it is critical for scaling up rapidly, so it is important for
  me to understand how this is supposed to work.
  
  So I'm asking, is there a standard way to determine whether or not a
  nova-compute is definitely ready to have things scheduled on it? This
  can be via an API, or even by observing something on the nova-compute
  host itself. I just need a definitive signal that the compute host is
  ready.
  
  If a nova compute host has registered itself to start having instances
  scheduled to it, it *should* be ready.  AFAIK, we're not doing any
  network sanity checks on startup, though.
  
  We already do some sanity checks on startup.  For example, nova-compute
  requires that it can talk to nova-conductor.  nova-compute will block on
  startup until nova-conductor is responding if they happened to be
  brought up at the same time.
  
  We could do something like this with a networking sanity check if
  someone could define what that check should look like.
  
  Could we ask Neutron if our compute host has an L2 agent yet? That seems
  like a valid sanity check.
  
  ++
  
 This makes sense to me as well. Although, not all Neutron plugins have
 an L2 agent, so I think the check needs to be more generic than that.
 For example, the OpenDaylight MechanismDriver we have developed
 doesn't need an agent. I also believe the Nicira plugin is agent-less,
 perhaps there are others as well.
 
 And I should note, does this sort of integration also happen with cinder,
 for example, when we're dealing with storage? Any other services which
 have a requirement on startup around integration with nova as well?
 

Does cinder actually have per-compute-host concerns? I admit to being a
bit cinder-stupid here.

Anyway, it seems to me that any service that is compute-host aware
should be able to respond to the compute host whether or not it is a)
aware of it, and b) ready to serve on it.

For agent-less drivers that is easy: you just always return True. And
for drivers with agents, you return False unless you can find an agent
for the host.

So something like:

GET /host/%(compute-host-name)

And then in the response include a "ready" attribute that would signal
whether all networks that should work there can work there.

As a first pass, just polling until that is ready before nova-compute
enables itself would solve the problems I see (and that I think users
would see as a cloud provider scales out compute nodes). Longer term
we would also want to aim at having notifications available for this
so that nova-compute could subscribe to that notification bus and then
disable itself if its agent ever goes away.
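
A rough sketch of that first-pass polling, assuming the agent-based Open
vSwitch plugin and placeholder credentials. Since the proposed "ready"
attribute does not exist yet, this approximates it with the existing
agent-list extension:

import time

from neutronclient.v2_0 import client as neutron_client


def l2_agent_alive(neutron, host):
    # Return True once Neutron reports a live L2 agent on this host.
    # Agent-less plugins would need a different answer entirely.
    agents = neutron.list_agents(
        host=host, binary='neutron-openvswitch-agent')['agents']
    return any(agent.get('alive') for agent in agents)


neutron = neutron_client.Client(username='admin', password='secret',
                                tenant_name='admin',
                                auth_url='http://keystone:5000/v2.0')

# Poll until the L2 agent has checked in before enabling nova-compute.
while not l2_agent_alive(neutron, 'compute-1.example.com'):
    time.sleep(5)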

I opened this bug to track the issue. I suspect there are duplicates of
it already reported, but would like to start clean to make sure it is
analyzed fully and then we can use those other bugs as test cases and
confirmation:

https://bugs.launchpad.net/nova/+bug/1260440

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Unified Guest Agent proposal

2013-12-12 Thread Clint Byrum
Excerpts from Jay Pipes's message of 2013-12-12 10:15:13 -0800:
 On 12/10/2013 03:49 PM, Ian Wells wrote:
  On 10 December 2013 20:55, Clint Byrum cl...@fewbar.com
  mailto:cl...@fewbar.com wrote:
 
  If it is just a network API, it works the same for everybody. This
  makes it simpler, and thus easier to scale out independently of compute
  hosts. It is also something we already support and can very easily
  expand
  by just adding a tiny bit of functionality to neutron-metadata-agent.
 
  In fact we can even push routes via DHCP to send agent traffic through
  a different neutron-metadata-agent, so I don't see any issue where we
  are piling anything on top of an overstressed single resource. We can
  have neutron route this traffic directly to the Heat API which hosts it,
  and that can be load balanced and etc. etc. What is the exact scenario
  you're trying to avoid?
 
 
  You may be making even this harder than it needs to be.  You can create
  multiple networks and attach machines to multiple networks.  Every point
  so far has been 'why don't we use idea as a backdoor into our VM
  without affecting the VM in any other way' - why can't that just be one
  more network interface set aside for whatever management  instructions
  are appropriate?  And then what needs pushing into Neutron is nothing
  more complex than strong port firewalling to prevent the slaves/minions
  talking to each other.  If you absolutely must make the communication
  come from a system agent and go to a VM, then that can be done by
  attaching the system agent to the administrative network - from within
  the system agent, which is the thing that needs this, rather than within
  Neutron, which doesn't really care how you use its networks.  I prefer
  solutions where other tools don't have to make you a special case.
 
 I've read through this email thread with quite a bit of curiosity, and I 
 have to say what Ian says above makes a lot of sense to me. If Neutron 
 can handle the creation of a management vNIC that has some associated 
 iptables rules governing it that provides a level of security for guest 
 -> host and guest -> $OpenStackService, then the transport problem 
 domain is essentially solved, and Neutron can be happily ignorant (as it 
 should be) of any guest agent communication with anything else.
 

Indeed I think it could work, however I think the NIC is unnecessary.

It seems likely that even with a second NIC, said address will be something
like 169.254.169.254 (or the IPv6 equivalent?).

If we want to attach that network as a second NIC instead of pushing a
route to it via DHCP, that is fine. But I don't think it actually gains
much, and the current neutron-metadata-agent already facilitates the
conversation between private guests and 169.254.169.254. We just need to
make sure we can forward more than port 80 through that.
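
The route-pushing piece is already expressible with plain subnet attributes;
a minimal sketch, assuming placeholder credentials, subnet ID, and next hop
(host_routes are delivered to guests by the DHCP agent as classless static
routes):

from neutronclient.v2_0 import client as neutron_client

neutron = neutron_client.Client(username='admin', password='secret',
                                tenant_name='admin',
                                auth_url='http://keystone:5000/v2.0')

# Route metadata traffic to whichever address hosts the metadata agent.
neutron.update_subnet(
    'SUBNET_ID',
    {'subnet': {'host_routes': [{'destination': '169.254.169.254/32',
                                 'nexthop': '10.0.0.2'}]}})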

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Unified Guest Agent proposal

2013-12-12 Thread Clint Byrum
Excerpts from Steven Dake's message of 2013-12-12 12:32:55 -0800:
 On 12/12/2013 10:24 AM, Dmitry Mescheryakov wrote:
  Clint, Kevin,
 
  Thanks for reassuring me :-) I just wanted to make sure that having 
  direct access from VMs to a single facility is not a dead end in terms 
  of security and extensibility. And since it is not, I agree it is much 
  simpler (and hence better) than hypervisor-dependent design.
 
 
  Then returning to two major suggestions made:
   * Salt
   * Custom solution specific to our needs
 
  The custom solution could be made on top of oslo.messaging. That gives
  us RPC working on different messaging systems. And that is what we
  really need - an RPC into the guest supporting various transports. What it
  lacks at the moment is security - it has neither authentication nor ACLs.
 
  Salt also provides RPC service, but it has a couple of disadvantages: 
  it is tightly coupled with ZeroMQ and it needs a server process to 
  run. A single transport option (ZeroMQ) is a limitation we really want 
  to avoid. OpenStack could be deployed with various messaging 
  providers, and we can't limit the choice to a single option in the 
  guest agent. Though it could be changed in the future, it is an 
  obstacle to consider.
 
  Running yet another server process within OpenStack, as it was already 
  pointed out, is expensive. It means another server to deploy and take 
  care of, +1 to overall OpenStack complexity. And it does not look like it 
  could be fixed any time soon.
 
  For given reasons I give favor to an agent based on oslo.messaging.
 
 
 An agent based on oslo.messaging is a potential security attack vector 
 and a possible scalability problem.  We do not want the guest agents 
 communicating over the same RPC servers as the rest of OpenStack

I don't think we're talking about agents talking to the exact same
RabbitMQ/Qpid/etc. bus that things under the cloud are talking to. That
would definitely raise some eyebrows. No doubt it will be in the realm
of possibility if deployers decide to do that, but so is letting your
database server sit on the same flat network as your guests.

I have a hard time seeing how using the same library is a security
risk though.
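
To make the distinction concrete, sharing the library does not have to mean
sharing the broker; a sketch, assuming a hypothetical dedicated transport URL
for agent traffic:

from oslo_config import cfg
import oslo_messaging

# The control plane keeps its usual transport; guest-agent traffic gets its
# own broker, credentials and vhost, so the two never share a bus.
guest_transport = oslo_messaging.get_rpc_transport(
    cfg.CONF, url='rabbit://agents:secret@agent-broker.example.com/guests')

target = oslo_messaging.Target(topic='guest_agent', server='instance-uuid')
client = oslo_messaging.RPCClient(guest_transport, target)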

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Unified Guest Agent proposal

2013-12-13 Thread Clint Byrum
Excerpts from Scott Moser's message of 2013-12-13 06:28:08 -0800:
 On Tue, 10 Dec 2013, Ian Wells wrote:
 
  On 10 December 2013 20:55, Clint Byrum cl...@fewbar.com wrote:
 
   If it is just a network API, it works the same for everybody. This
   makes it simpler, and thus easier to scale out independently of compute
   hosts. It is also something we already support and can very easily expand
   by just adding a tiny bit of functionality to neutron-metadata-agent.
  
   In fact we can even push routes via DHCP to send agent traffic through
   a different neutron-metadata-agent, so I don't see any issue where we
   are piling anything on top of an overstressed single resource. We can
   have neutron route this traffic directly to the Heat API which hosts it,
   and that can be load balanced and etc. etc. What is the exact scenario
   you're trying to avoid?
  
 
  You may be making even this harder than it needs to be.  You can create
  multiple networks and attach machines to multiple networks.  Every point so
  far has been 'why don't we use idea as a backdoor into our VM without
  affecting the VM in any other way' - why can't that just be one more
  network interface set aside for whatever management  instructions are
  appropriate?  And then what needs pushing into Neutron is nothing more
  complex than strong port firewalling to prevent the slaves/minions talking
  to each other.  If you absolutely must make the communication come from a
 
 +1
 
 tcp/ip works *really* well as a communication mechanism.  I'm planning on
 using it to send this email.
 
 For controlled guests, simply don't break your networking.  Anything that
 could break networking can break /dev/hypervisor-socket also.
 

Who discussed breaking networking?

 Fwiw, we already have an extremely functional agent in just about every
 [linux] node in sshd.  It's capable of marshalling just about anything in
 and out of the node. (note, i fully realize there are good reasons for
 more specific agent, lots of them exist).
 

This was already covered way back in the thread. sshd is a backdoor
agent, and thus undesirable for this purpose. Locking it down is more
effort than adopting an agent which is meant to be limited to specific
tasks.

Also SSH is a push agent, so Savanna/Heat/Trove would have to find the
VM, and reach into it to do things. A pull agent scales well because you
only have to tell the nodes where to pull things from, and then you can
add more things to pull from behind that endpoint without having to
update the nodes.

 I've really never understood we don't want to rely on networking as a
 transport.
 

You may have gone to plaid with this one. Not sure what you mean. AFAICT
the direct-to-hypervisor tricks are not exactly popular in this thread.
Were you referring to something else?

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Unified Guest Agent proposal

2013-12-13 Thread Clint Byrum
Excerpts from Alessandro Pilotti's message of 2013-12-13 07:13:01 -0800:
 Hi guys,
 
 This seems to be becoming a pretty long thread with quite a lot of ideas. What 
 do you think about setting up a meeting on IRC to talk about what direction to 
 take?
 IMO this has the potential of becoming a completely separate project to be 
 hosted on stackforge or similar.
 
 Generally speaking, we already use Cloudbase-Init, which, besides being the de 
 facto standard Windows “Cloud-Init type feature” (Apache 2 licensed), 
 has recently been used as a base to provide the same functionality on FreeBSD.
 
 For reference: https://github.com/cloudbase/cloudbase-init and 
 http://www.cloudbase.it/cloud-init-for-windows-instances/
 
 We’re seriously considering whether we should transform Cloudbase-Init into an 
 agent or keep it in line with the current “init only, let the guest do the 
 rest” approach, which fits pretty well with the most common deployment 
 approaches (Heat, Puppet / Chef, Salt, etc). Last time I spoke with Scott about 
 this agent stuff for cloud-init, the general intention was to keep the init 
 approach as well (please correct me if I missed something in the meantime).
 
 The limitations that we see, independently of which direction and tool is 
 adopted for the agent, are mainly in the metadata services and the way 
 OpenStack users employ them to communicate with Nova, Heat and the rest of the 
 pack as orchestration requirements grow in complexity:
 

Hi, Alessandro. Really interesting thoughts. Most of what you have
described that is not about agent transport is what we discussed
at the Icehouse summit under the topic of the hot-software-config
blueprint. There is definitely a need for better workflow integration
in Heat, and that work is happening now.

 1) We need a way to post back small amounts of data (e.g. like we already do 
 for the encrypted Windows password) for status updates,
 so that the users know how things are going and can be properly notified in 
 case of post-boot errors. This might be irrelevant as long as you just create 
 a user and deploy some SSH keys,
 but becomes very important for most orchestration templates.


Heat already has this via wait conditions. hot-software-config will
improve upon this. I believe once a unified guest agent protocol is
agreed upon we will make Heat use that for wait condition signalling.
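
To make that concrete, the existing mechanism looks roughly like this from
inside the guest: the wait condition handle resolves to a pre-signed URL, and
the instance POSTs a small JSON status document to it. The URL below is a
placeholder the template would pass in via metadata or user-data:

import json

import requests

WAIT_HANDLE_URL = 'https://heat.example.com:8000/v1/waitcondition/PRESIGNED'

payload = {
    'Status': 'SUCCESS',   # or 'FAILURE' to surface post-boot errors
    'Reason': 'Application configured',
    'UniqueId': 'webserver-01',
    'Data': 'any small status blob the template author wants back',
}

requests.post(WAIT_HANDLE_URL, data=json.dumps(payload),
              headers={'Content-Type': 'application/json'})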

 2) The HTTP metadata service accessible from the guest with its magic number 
 is IMO quite far from an optimal solution. Since every hypervisor commonly 
 used in OpenStack (e.g. KVM, XenServer, Hyper-V, ESXi) provides guest / host 
 communication services, we could define a common abstraction layer which will 
 include a guest side (to be included in cloud-init, cloudbase-init, etc) and 
 a hypervisor side, to be implemented for each hypervisor and included in the 
 related Nova drivers.
 This has already been proposed / implemented in various third party 
 scenarios, but never under the OpenStack umbrella for multiple hypervisors.
 
 Metadata info can be at that point retrieved and posted by the Nova driver in 
 a secure way and proxied to / from the guest without needing to expose the 
 metadata 
 service to the guest itself. This would also simplify Neutron, as we could 
 get rid of the complexity of the Neutron metadata proxy. 
 

The neutron metadata proxy is actually relatively simple. Have a look at
it. The basic way it works in pseudo code is:

port = lookup_requesting_ip_port(remote_ip)
instance_id = lookup_port_instance_id(port)
response = forward_and_sign_request_to_nova(REQUEST, instance_id,
                                            conf.nova_metadata_ip)
return response
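
The sign-and-forward step is the only subtle part; roughly, with names
simplified and the shared secret as a placeholder that must match what the
Nova metadata service is configured with, it amounts to something like:

import hashlib
import hmac

import requests

SHARED_SECRET = 'metadata-proxy-secret'  # must match the Nova metadata service


def forward_and_sign_request_to_nova(path, instance_id, tenant_id,
                                     nova_metadata_ip):
    # HMAC-sign the instance ID so Nova can trust who the request is for.
    signature = hmac.new(SHARED_SECRET.encode('utf-8'),
                         instance_id.encode('utf-8'),
                         hashlib.sha256).hexdigest()
    headers = {'X-Instance-ID': instance_id,
               'X-Instance-ID-Signature': signature,
               'X-Tenant-ID': tenant_id}
    return requests.get('http://%s:8775%s' % (nova_metadata_ip, path),
                        headers=headers)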

Furthermore, if we have to embrace some complexity, I would rather do so
inside Neutron than in an agent that users must install and make work
on every guest OS.

The dumber an agent is, the better it will scale and the more resilient
it will be. I would credit this principle with the success of cloud-init
(sorry, you know I love you Scott! ;). What we're talking about now is
having an equally dumb, but differently focused agent.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Unified Guest Agent proposal

2013-12-13 Thread Clint Byrum
Excerpts from Sergey Lukjanov's message of 2013-12-13 07:46:34 -0800:
 Hi Alessandro,
 
 it's a good idea to set up an IRC meeting for the unified agents. IMO it'll
 seriously speedup discussion. The first one could be used to determine the
 correct direction, then we can use them to discuss details and coordinate
 efforts, it will be necessary regardless of the approach.
 

I'd like for those who are going to do the actual work to stand up and
be counted before an IRC meeting. This is starting to feel bike-sheddy
and the answer to bike-shedding is not more meetings.

I am keenly interested in this, but have limited cycles to spare for it
at this time. So I do not count myself as one of those people.

I believe that a few individuals who are involved with already working
specialized agents will be doing the work to consolidate them and to fix
the bug that they all share (Heat shares this too) which is that private
networks cannot reach their respective agent endpoints. I think those
individuals should review the original spec given the new information,
revise it, and present it here in a new thread. If there are enough of
them that they feel they should have a meeting, I suggest they organize
one. But I do not think we need more discussion on a broad scale.

Speaking of that, before I run out and report a bug that affects
Savanna, Heat, and Trove, is there already a bug titled something like
"Guests cannot reach [Heat/Savanna/Trove] endpoints from inside private
networks"?

(BTW, paint it yellow!)

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][TripleO] Nested resources

2013-12-14 Thread Clint Byrum
Excerpts from Sylvain Bauza's message of 2013-12-14 06:23:48 -0800:
 2013/12/9 Clint Byrum cl...@fewbar.com
 
  Excerpts from Fox, Kevin M's message of 2013-12-09 09:34:06 -0800:
   I'm thinking more generic:
  
    The cloud provider will provide one or more suballocating images. The
   one TripleO uses to take a bare metal node and make VMs available would
   be the obvious one to make available initially. I think that one should not
   have a security concern since it is already being used in that way safely.
 
  I like where you're going with this, in that the cloud should eventually
  become self-aware enough to be able to provision the baremetal resources
  it has and spin nova up on them. I do think that is quite far out. Right
  now, we have two novas: an undercloud nova which owns all the baremetal,
  and an overcloud nova which owns all the vms. This is definitely nested,
  but there is a hard line between the two.
 
  For many people, that hard line is a feature. For others, it is a bug.  :)
 
 
 Could we imagine that an end-user would like to provision one undercloud
 host plus a certain number of overcloud nodes, so that the scheduler for
 the undercloud Nova would deny all hosts but the ones provisioned?

Yes I could imagine that. I also imagine that does not require any special
knowledge of the undercloud that the overcloud's nova API doesn't already
have access to. The host is a thing in the overcloud after all.

 Conversely, the scheduler for the other undercloud Novas needs to deny
 provisioning of nodes hosted by a tenant other than the requester...
 I played with TripleO a few months ago (August/September, before the merge
 with Tuskar) so that's a bit unclear for me, but I'm just saying we could
 potentially achieve this by using Climate, which does deploy its own
 scheduler filter for making sure the proper hosts are booted.
 

Currently undercloud nova has one tenant: the overcloud operator. It is
single tenant, which means it has less complexity, but that also means you
can't hand hardware out directly to multiple tenants. That is why I say
that some consider it a feature, and some a bug. IMO that is how it should
remain, and we should just enhance systems like Climate to be more aware
of the topology of the hosts which are already an entity in the overcloud.
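
For illustration, the kind of scheduler filter being discussed is small; a
sketch against the Havana-era filter API, with the reservation lookup left as
a placeholder for whatever state Climate keeps:

from nova.scheduler import filters


def _hosts_reserved_for(project_id):
    # Placeholder: in practice this would come from Climate's reservations.
    return {'undercloud-node-01', 'undercloud-node-02'}


class ReservedHostsFilter(filters.BaseHostFilter):
    """Only pass hosts that the requesting project has reserved."""

    def host_passes(self, host_state, filter_properties):
        project_id = filter_properties['context'].project_id
        return host_state.host in _hosts_reserved_for(project_id)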

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Unified Guest Agent proposal

2013-12-15 Thread Clint Byrum
Excerpts from Steven Dake's message of 2013-12-14 09:00:53 -0800:
 On 12/13/2013 01:13 PM, Clint Byrum wrote:
  Excerpts from Dmitry Mescheryakov's message of 2013-12-13 12:01:01 -0800:
  Still, what about one more server process users will have to run? I see
  the unified agent as a library which can be easily adopted by both existing
  and new OpenStack projects. The need to configure and maintain a Salt server
  process is a big burden for end users. That idea will definitely scare off
  adoption of the agent. And at the same time, what are the gains of having
  that server process? I don't really see too many of them.
 
 
 I tend to agree, I don't see a big advantage to using something like 
 salt, when the current extremely simplistic cfn-init + friends do the job.
 
 What specific problem does salt solve?  I guess I missed that context in 
 this long thread.
 

Yes you missed the crux of the thread. There is a need to have agents that
are _not_ general purpose like cfn-init and friends. They specifically
need to be narrow in focus and not give the higher level service operator
backdoor access to everything via SSH-like control.

Salt works with plugins and thus the general purpose backdoor features
can be disabled definitively by not having them present, and then
plugins for Trove/Savanna/et al. can be added. Since they are
operator-controlled services these exotic agent configurations will be
built into operator-controlled images.

For Heat, the advantage is that you get unified transport in/out of
private networks to a general purpose agent which matches the agent for
those higher level services.

  The Salt devs already mentioned that we can more or less just import
  salt's master code and run that inside the existing server processes. So
  Savanna would have a salt master capability, and so would Heat Engine.
 I really don't think we want to introduce a salt executive into the heat 
 engine process address space, even if it is as simple as an import 
 operation.  Sounds like a debugging nightmare!
 

Data is not that hard to collect in this case, so before we call this
complicated or a debugging nightmare I think a bit of discovery would
go a long way. Also, the engine is not likely to be where this would be;
the existing server processes also include heat-api, which would make a
lot more sense in this case.

  If it isn't eventlet friendly we can just fork it off and run it as its
  own child. Still better than inventing our own.
 fork/exec is not the answer to scalability and availability I was 
 looking for :)  So, given that we need scale+availability, we are back 
 to managing a daemon outside the address space, which essentially 
 introduces another daemon to be scaled and ha-ified (and documented, 
 etc, see long zookeeper thread for my arguments against new server 
 processes...).  import is not the answer, or at least it won't be for heat...
 

Forking off and running the things that don't work well with eventlet does
not mean fork and _exec_. Also, this is not to address scalability or
availability; it is to isolate code that does not behave exactly like the
rest of our code.
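
A minimal sketch of that isolation, with run_salt_listener standing in as a
placeholder for whatever entry point the imported code exposes:

import multiprocessing


def run_salt_listener():
    # Placeholder: start the imported listener loop here, in its own
    # interpreter, so its threading model never touches the eventlet hub.
    pass


child = multiprocessing.Process(target=run_salt_listener, daemon=True)
child.start()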

 Salt just seems like more trouble than it is worth, but I don't totally 
 understand the rationale for introducing it as a dependency in this 
 case, and I generally think dependencies are evil :)


I generally think dependencies are efficient ways of consuming existing
code. Should we not use pecan? eventlet?

 What are we inventing our own of, again?  cfn-init & friends already 
 exist, are dead simple, and have no need of anything beyond a metadata 
 server.  I would like to see that level of simplicity in any unified agent.
 

Please do read the whole thread, or at least the first message. We would
invent a framework for efficient agent communication and plugin-based
actions. Right now there are several agents and none of them work quite
like the others, but all have the same basic goals. This is only about Heat
because, by adopting the same communication protocol, we gain in-instance
orchestration in private networks.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Unified Guest Agent proposal

2013-12-17 Thread Clint Byrum
Excerpts from Dmitry Mescheryakov's message of 2013-12-17 08:01:38 -0800:
 Folks,
 
 The discussion didn't result in a consensus, but it did reveal a great
 number of things to be accounted for. I've tried to summarize top-level points
 in the etherpad [1]. It lists only items everyone (as it seems to me)
 agrees on, or suggested options where there was no consensus. Let me know
 if i misunderstood or missed something. The etherpad does not list
 advantages/disadvantages of options, otherwise it just would be too long.
 Interested people might search the thread for the arguments :-) .
 
 I've thought it over and I agree with people saying we need to move
 forward. Savanna needs the agent and I am going to write a PoC for it. Of
 course the PoC will be implemented in a project-independent way. I still think
 that Salt's limitations outweigh its advantages, so the PoC will be done on
 top of oslo.messaging without Salt. At least we'll have an example of how it
 might look.
 
 Most probably I will have more questions in the process, for instance we
 didn't finish discussion on enabling networking for the agent yet. In that
 case I will start a new, more specific thread in the list.

If you're not going to investigate using salt, can I suggest you base
your PoC on os-collect-config? It would not take much to add two-way
communication to it.
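
For what it's worth, a minimal sketch of the server side such an
oslo.messaging-based PoC might have (topic, transport URL, and the endpoint
method are all placeholders, not an agreed protocol):

from oslo_config import cfg
import oslo_messaging


class GuestEndpoint(object):
    # Deliberately narrow surface: one task, no general-purpose exec.

    def configure_service(self, context, config_blob):
        # Apply the pushed configuration and return something the caller
        # (Savanna/Trove/Heat) can report on.
        return {'status': 'OK'}


transport = oslo_messaging.get_rpc_transport(
    cfg.CONF, url='rabbit://agents:secret@agent-broker.example.com/guests')
target = oslo_messaging.Target(topic='guest_agent', server='instance-uuid')
server = oslo_messaging.get_rpc_server(transport, target, [GuestEndpoint()])
server.start()
server.wait()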

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

