Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-28 Thread Jyoti Ranjan
Yes, with versioning in place and the agent ensuring it picks the correct
data, the Swift eventual-consistency issue will not be a problem.


On Thu, Aug 28, 2014 at 2:59 AM, Steve Baker sba...@redhat.com wrote:

 On 28/08/14 03:41, Zane Bitter wrote:
  On 27/08/14 11:04, Steven Hardy wrote:
  On Wed, Aug 27, 2014 at 07:54:41PM +0530, Jyoti Ranjan wrote:
  I am a little bit skeptical about using Swift for this use case
  because of its eventual consistency issue. I am not sure a Swift
  cluster is a good fit for this kind of problem. Please note that a
  Swift cluster may give you old data at some point in time.
 
   This is probably not a major problem, but it's certainly worth
   considering.
 
   My assumption is that the latency of making the replicas consistent
   will be small relative to the timeout for things like
   SoftwareDeployments, so all we need is to ensure that instances
   eventually get the new data, act on
 
   That part is fine, but if they get the new data and then later get the
   old data back again... that would not be so good.

 It would be fairly easy for the agent to check Last-Modified headers and
 ignore data which is older than the most recently fetched metadata.





Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-27 Thread Jyoti Ranjan
I am curious to know about Swift's role here. Can you elaborate a little
bit, please?


On Wed, Aug 27, 2014 at 4:14 AM, Steve Baker sba...@redhat.com wrote:

 On 23/08/14 07:39, Zane Bitter wrote:
  We held the inaugural Heat mid-cycle meetup in Raleigh, North Carolina
  this week. There were a dozen folks in attendance, and I think
  everyone agreed that it was a very successful event. Notes from the
  meetup are on the Etherpad here:
 
  https://etherpad.openstack.org/p/heat-juno-midcycle-meetup
 
  Here are a few of the conclusions:
 
 ...
  * Marconi is now called Zaqar.
  Who knew?
 
  * Marc^W Zaqar is critical to pretty much every major non-Convergence
  feature on the roadmap.
  We knew that we wanted to use it for notifications, but we also want
  to make those a replacement for events, and a conduit for warnings and
  debugging information to the user. This is becoming so important that
  we're going to push ahead with an implementation now without waiting
  to see when Zaqar will graduate. Zaqar would also be a good candidate
  for pushing metadata changes to servers, to resolve the performance
  issues currently caused by polling.
 
 Until Zaqar is generally available we can still remove the polling load
 from heat by pushing metadata to a swift TempURL. This is ready now for
 review:
 https://review.openstack.org/#/q/topic:bp/swift-deployment-transport,n,z




Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-27 Thread Steven Hardy
On Wed, Aug 27, 2014 at 02:39:09PM +0530, Jyoti Ranjan wrote:
I am curious to know about Swift's role here. Can you elaborate a little
bit, please?

I think Zane already covered it with "We just want people to stop polling
us, because it's killing our performance."

Basically, if we provide the option for folks using heat at large scale to
poll Swift instead of the Heat API, we can work around some performance
issues a subset of our users have been experiencing due to the load of many
resources polling Heat (and hence the database) frequently.

Steve



Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-27 Thread Jyoti Ranjan
I am a little bit skeptical about using Swift for this use case because of
its eventual consistency issue. I am not sure a Swift cluster is a good fit
for this kind of problem. Please note that a Swift cluster may give you
old data at some point in time.


On Wed, Aug 27, 2014 at 4:12 PM, Steven Hardy sha...@redhat.com wrote:

 On Wed, Aug 27, 2014 at 02:39:09PM +0530, Jyoti Ranjan wrote:
 I am curious to know about Swift's role here. Can you elaborate a little
 bit, please?

 I think Zane already covered it with "We just want people to stop polling
 us, because it's killing our performance."

 Basically, if we provide the option for folks using heat at large scale to
 poll Swift instead of the Heat API, we can work around some performance
 issues a subset of our users have been experiencing due to the load of many
 resources polling Heat (and hence the database) frequently.

 Steve



Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-27 Thread Steven Hardy
On Wed, Aug 27, 2014 at 07:54:41PM +0530, Jyoti Ranjan wrote:
I am a little bit skeptical about using Swift for this use case because of
its eventual consistency issue. I am not sure a Swift cluster is a good fit
for this kind of problem. Please note that a Swift cluster may give you
old data at some point in time.

This is probably not a major problem, but it's certainly worth considering.

My assumption is that the latency of making the replicas consistent will be
small relative to the timeout for things like SoftwareDeployments, so all
we need is to ensure that instances eventually get the new data, act on
it, and send a signal back to Heat (again, Heat eventually getting it via
Swift will be OK provided the replication delay is small relative to the
stack timeout, which defaults to one hour).

Steve



Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-27 Thread Zane Bitter

On 27/08/14 11:04, Steven Hardy wrote:

On Wed, Aug 27, 2014 at 07:54:41PM +0530, Jyoti Ranjan wrote:

I am a little bit skeptical about using Swift for this use case because of
its eventual consistency issue. I am not sure a Swift cluster is a good fit
for this kind of problem. Please note that a Swift cluster may give you
old data at some point in time.


This is probably not a major problem, but it's certainly worth considering.

My assumption is that the latency of making the replicas consistent will be
small relative to the timeout for things like SoftwareDeployments, so all
we need is to ensure that instances eventually get the new data, act on


That part is fine, but if they get the new data and then later get the 
old data back again... that would not be so good.



it, and send a signal back to Heat (again, Heat eventually getting it via
Swift will be OK provided the replication delay is small relative to the
stack timeout, which defaults to one hour).

Steve



Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-27 Thread Ryan Brown
Swift does have some guarantees around read-after-write consistency, but
for Heat I think the best bet would be the X-Newest[1] header which has
been in swift for a very, very long time. The downside here is that
(IIUC) it queries all storage nodes for that object. It does not provide
a hard guarantee[2] but does at least try *harder* to get the most
recent version.

We could also (assuming it was turned on) use object versioning to
ensure that the most up to date version of the metadata was used, but I
think X-Newest is the way to go.

[1]: https://lists.launchpad.net/openstack/msg06846.html
[2]:
https://ask.openstack.org/en/question/26403/does-x-newest-apply-to-getting-container-lists-and-object-lists-also-dlo/
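
As a rough sketch of what that would look like from the agent side (the
endpoint, token, and object names below are made up, and this uses plain
python-requests rather than python-swiftclient):

    # Sketch: GET an object with X-Newest so the proxy consults all
    # replicas and returns the most recently written copy it finds.
    import requests

    SWIFT_URL = 'https://swift.example.com/v1/AUTH_demo'  # made-up endpoint
    TOKEN = 'gAAAA...'                                    # keystone token

    resp = requests.get(
        SWIFT_URL + '/heat-metadata/server-1234',
        headers={'X-Auth-Token': TOKEN, 'X-Newest': 'true'},
    )
    resp.raise_for_status()
    metadata = resp.json()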

On 08/27/2014 11:41 AM, Zane Bitter wrote:
 On 27/08/14 11:04, Steven Hardy wrote:
 On Wed, Aug 27, 2014 at 07:54:41PM +0530, Jyoti Ranjan wrote:
 I am a little bit skeptical about using Swift for this use case because
 of its eventual consistency issue. I am not sure a Swift cluster is a
 good fit for this kind of problem. Please note that a Swift cluster may
 give you old data at some point in time.

 This is probably not a major problem, but it's certainly worth
 considering.

 My assumption is that the latency of making the replicas consistent
 will be small relative to the timeout for things like SoftwareDeployments,
 so all we need is to ensure that instances eventually get the new data,
 act on
 
 That part is fine, but if they get the new data and then later get the
 old data back again... that would not be so good.
 
 it, and send a signal back to Heat (again, Heat eventually getting it via
 Swift will be OK provided the replication delay is small relative to the
 stack timeout, which defaults to one hour).

 Steve


-- 
Ryan Brown / Software Engineer, Openstack / Red Hat, Inc.



Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-27 Thread Clint Byrum
Excerpts from Zane Bitter's message of 2014-08-27 08:41:29 -0700:
 On 27/08/14 11:04, Steven Hardy wrote:
  On Wed, Aug 27, 2014 at 07:54:41PM +0530, Jyoti Ranjan wrote:
  I am a little bit skeptical about using Swift for this use case because
  of its eventual consistency issue. I am not sure a Swift cluster is a
  good fit for this kind of problem. Please note that a Swift cluster may
  give you old data at some point in time.
 
  This is probably not a major problem, but it's certainly worth considering.
 
  My assumption is that the latency of making the replicas consistent will be
  small relative to the timeout for things like SoftwareDeployments, so all
  we need is to ensure that instances eventually get the new data, act on
 
 That part is fine, but if they get the new data and then later get the 
 old data back again... that would not be so good.
 

Agreed, and I had not considered that this can happen.

There is a not-so-simple answer though:

* Heat inserts this as initial metadata:

{"metadata": {}, "update-url": "xx", "version": 0}

* Polling goes to update-url and ignores metadata with version <= 0

* Polling finds new metadata in same format, and continues the loop
without talking to Heat (see the sketch below)
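
A minimal sketch of that agent loop (all names, the URL format, and the
sleep interval are illustrative; it just assumes the document shape above):

    # Sketch: act only on metadata whose version is newer than the last
    # one applied, so stale reads from a lagging replica are ignored.
    import time
    import requests

    def apply_metadata(metadata):
        # hand off to the real configuration hook in an actual agent
        print('applying', metadata)

    def poll(update_url, last_version=0):
        while True:
            doc = requests.get(update_url).json()
            if doc['version'] > last_version:
                apply_metadata(doc['metadata'])
                last_version = doc['version']
                update_url = doc['update-url']  # follow to the next location
            time.sleep(30)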

However, this makes me rethink why we are having performance problems.
MOST of the performance problems have two root causes:

* We parse the entire stack to show metadata, because we have to see if
  there are custom access controls defined in any of the resources used.
  I actually worked on a patch set to deprecate this part of the resource
  plugin API because it is impossible to scale this way.
* We rely on the engine to respond because of the parsing issue.

If however we could just push metadata into the db fully resolved
whenever things in the stack change, and cache the response in the API
using Last-Modified/Etag headers, I think we'd be less inclined to care
so much about swift for polling. However we are still left with the many
thousands of keystone users being created vs. thousands of swift tempurls.
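
A sketch of the API-side caching piece (the handler shape and the db call
are hypothetical; the point is just the conditional-request logic):

    # Sketch: serve pre-resolved metadata from the db and let agents
    # revalidate with If-None-Match instead of re-parsing the stack.
    import hashlib

    def metadata_response(db, server_id, if_none_match=None):
        blob = db.fetch_resolved_metadata(server_id)  # hypothetical query
        etag = hashlib.md5(blob.encode()).hexdigest()
        if if_none_match == etag:
            return 304, {'ETag': etag}, b''  # client's copy is current
        return 200, {'ETag': etag,
                     'Content-Type': 'application/json'}, blob.encode()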

That would also set us up nicely for very easy integration with Zaqar,
as metadata changes would flow naturally into the message queue for the
server through the same mechanism as they flow into the database.



Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-27 Thread Steven Hardy
On Wed, Aug 27, 2014 at 11:41:29AM -0400, Zane Bitter wrote:
 On 27/08/14 11:04, Steven Hardy wrote:
 On Wed, Aug 27, 2014 at 07:54:41PM +0530, Jyoti Ranjan wrote:
 I am a little bit skeptical about using Swift for this use case because of
 its eventual consistency issue. I am not sure a Swift cluster is a good fit
 for this kind of problem. Please note that a Swift cluster may give you
 old data at some point in time.
 
 This is probably not a major problem, but it's certainly worth considering.
 
 My assumption is that the latency of making the replicas consistent will be
 small relative to the timeout for things like SoftwareDeployments, so all
 we need is to ensure that instances eventually get the new data, act on
 
 That part is fine, but if they get the new data and then later get the old
 data back again... that would not be so good.

Right, my assumption is that we'd have a version, either directly in the
data being polled or via swift object versioning.  We persist the most
recent metadata inside the instance, so the agent doing the polling just
has to know to ignore any metadata with a version number lower than the
locally stored data.

This does all seem like a fairly convoluted way to work around what are
seemingly mostly database bandwidth issues, but the eventual consistency
thing doesn't seem to be a showstopper afaics, if we go the swift
direction.

Steve



Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-27 Thread Steven Hardy
On Wed, Aug 27, 2014 at 09:40:31AM -0700, Clint Byrum wrote:
 Excerpts from Zane Bitter's message of 2014-08-27 08:41:29 -0700:
  On 27/08/14 11:04, Steven Hardy wrote:
   On Wed, Aug 27, 2014 at 07:54:41PM +0530, Jyoti Ranjan wrote:
   I am a little bit skeptical about using Swift for this use case
   because of its eventual consistency issue. I am not sure a Swift
   cluster is a good fit for this kind of problem. Please note that a
   Swift cluster may give you old data at some point in time.
  
   This is probably not a major problem, but it's certainly worth
   considering.
  
   My assumption is that the latency of making the replicas consistent
   will be small relative to the timeout for things like
   SoftwareDeployments, so all we need is to ensure that instances
   eventually get the new data, act on
  
  That part is fine, but if they get the new data and then later get the 
  old data back again... that would not be so good.
  
 
 Agreed, and I had not considered that this can happen.
 
 There is a not-so-simple answer though:
 
  * Heat inserts this as initial metadata:
  
  {"metadata": {}, "update-url": "xx", "version": 0}
  
  * Polling goes to update-url and ignores metadata with version <= 0
  
  * Polling finds new metadata in same format, and continues the loop
  without talking to Heat
 
 However, this makes me rethink why we are having performance problems.
 MOST of the performance problems have two root causes:
 
 * We parse the entire stack to show metadata, because we have to see if
   there are custom access controls defined in any of the resources used.
   I actually worked on a patch set to deprecate this part of the resource
   plugin API because it is impossible to scale this way.
 * We rely on the engine to respond because of the parsing issue.
 
 If however we could just push metadata into the db fully resolved
 whenever things in the stack change, and cache the response in the API
 using Last-Modified/Etag headers, I think we'd be less inclined to care
 so much about swift for polling. However we are still left with the many
 thousands of keystone users being created vs. thousands of swift tempurls.

There are probably a few relatively simple optimisations we can do if the
keystone user thing becomes the bottleneck:
- Make the user an attribute of the stack and only create one per
  stack/tree-of-stacks
- Make the user an attribute of each server resource (probably more secure,
  but less optimal if your goal is fewer keystone users).

I don't think the many keystone users thing is actually a problem right now
though, or is it?

Steve



Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-27 Thread Clint Byrum
Excerpts from Steven Hardy's message of 2014-08-27 10:08:36 -0700:
 On Wed, Aug 27, 2014 at 09:40:31AM -0700, Clint Byrum wrote:
  Excerpts from Zane Bitter's message of 2014-08-27 08:41:29 -0700:
   On 27/08/14 11:04, Steven Hardy wrote:
On Wed, Aug 27, 2014 at 07:54:41PM +0530, Jyoti Ranjan wrote:
 I am a little bit skeptical about using Swift for this use case
 because of its eventual consistency issue. I am not sure a Swift
 cluster is a good fit for this kind of problem. Please note that a
 Swift cluster may give you old data at some point in time.
    
 This is probably not a major problem, but it's certainly worth
 considering.
    
 My assumption is that the latency of making the replicas consistent
 will be small relative to the timeout for things like
 SoftwareDeployments, so all we need is to ensure that instances
 eventually get the new data, act on
   
   That part is fine, but if they get the new data and then later get the 
   old data back again... that would not be so good.
   
  
  Agreed, and I had not considered that this can happen.
  
  There is a not-so-simple answer though:
  
   * Heat inserts this as initial metadata:
   
   {"metadata": {}, "update-url": "xx", "version": 0}
   
   * Polling goes to update-url and ignores metadata with version <= 0
   
   * Polling finds new metadata in same format, and continues the loop
   without talking to Heat
  
  However, this makes me rethink why we are having performance problems.
  MOST of the performance problems have two root causes:
  
  * We parse the entire stack to show metadata, because we have to see if
there are custom access controls defined in any of the resources used.
I actually worked on a patch set to deprecate this part of the resource
plugin API because it is impossible to scale this way.
  * We rely on the engine to respond because of the parsing issue.
  
  If however we could just push metadata into the db fully resolved
  whenever things in the stack change, and cache the response in the API
  using Last-Modified/Etag headers, I think we'd be less inclined to care
  so much about swift for polling. However we are still left with the many
  thousands of keystone users being created vs. thousands of swift tempurls.
 
  There are probably a few relatively simple optimisations we can do if the
  keystone user thing becomes the bottleneck:
  - Make the user an attribute of the stack and only create one per
    stack/tree-of-stacks
  - Make the user an attribute of each server resource (probably more secure,
    but less optimal if your goal is fewer keystone users).
 
 I don't think the many keystone users thing is actually a problem right now
 though, or is it?

1000 servers means 1000 keystone users to manage, and all of the tokens
and backend churn that implies.

It's not a problem, but it is quite a bit heavier than tempurls.



Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-27 Thread Steve Baker
On 28/08/14 03:41, Zane Bitter wrote:
 On 27/08/14 11:04, Steven Hardy wrote:
 On Wed, Aug 27, 2014 at 07:54:41PM +0530, Jyoti Ranjan wrote:
  I am a little bit skeptical about using Swift for this use case
  because of its eventual consistency issue. I am not sure a Swift
  cluster is a good fit for this kind of problem. Please note that a
  Swift cluster may give you old data at some point in time.

  This is probably not a major problem, but it's certainly worth
  considering.

  My assumption is that the latency of making the replicas consistent
  will be small relative to the timeout for things like
  SoftwareDeployments, so all we need is to ensure that instances
  eventually get the new data, act on

 That part is fine, but if they get the new data and then later get the
 old data back again... that would not be so good.

It would be fairly easy for the agent to check Last-Modified headers and
ignore data which is older than the most recently fetched metadata.
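
A sketch of that check (names are illustrative; parsedate_to_datetime is
the stdlib parser for HTTP date headers):

    # Sketch: drop polled metadata that is older than the newest copy
    # already seen, guarding against reads from a lagging replica.
    from email.utils import parsedate_to_datetime
    import requests

    last_modified = None  # timestamp of the newest metadata seen so far

    def fetch_if_newer(url):
        global last_modified
        resp = requests.get(url)
        modified = parsedate_to_datetime(resp.headers['Last-Modified'])
        if last_modified is not None and modified <= last_modified:
            return None  # older than what we already have; ignore it
        last_modified = modified
        return resp.json()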





Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-26 Thread Steven Hardy
On Fri, Aug 22, 2014 at 03:39:24PM -0400, Zane Bitter wrote:
 We held the inaugural Heat mid-cycle meetup in Raleigh, North Carolina this
 week. There were a dozen folks in attendance, and I think everyone agreed
 that it was a very successful event. Notes from the meetup are on the
 Etherpad here:
 
 https://etherpad.openstack.org/p/heat-juno-midcycle-meetup
 
 Here are a few of the conclusions:

Thanks for the update, Zane; for those of us who were unable to attend it
is much appreciated! :)

Steve



Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-26 Thread Steve Baker
On 23/08/14 07:39, Zane Bitter wrote:
 We held the inaugural Heat mid-cycle meetup in Raleigh, North Carolina
 this week. There were a dozen folks in attendance, and I think
 everyone agreed that it was a very successful event. Notes from the
 meetup are on the Etherpad here:

 https://etherpad.openstack.org/p/heat-juno-midcycle-meetup

 Here are a few of the conclusions:

...
 * Marconi is now called Zaqar.
 Who knew?

 * Marc^W Zaqar is critical to pretty much every major non-Convergence
 feature on the roadmap.
 We knew that we wanted to use it for notifications, but we also want
 to make those a replacement for events, and a conduit for warnings and
 debugging information to the user. This is becoming so important that
 we're going to push ahead with an implementation now without waiting
 to see when Zaqar will graduate. Zaqar would also be a good candidate
 for pushing metadata changes to servers, to resolve the performance
 issues currently caused by polling.

Until Zaqar is generally available we can still remove the polling load
from heat by pushing metadata to a swift TempURL. This is ready now for
review:
https://review.openstack.org/#/q/topic:bp/swift-deployment-transport,n,z
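
For anyone unfamiliar with TempURLs, the signing is just an HMAC-SHA1 over
the method, expiry, and object path; a sketch with made-up key and names:

    # Sketch: Heat-side TempURL signing. The resulting URL lets a server
    # GET its metadata from Swift without holding a keystone token.
    import hmac
    import time
    from hashlib import sha1

    key = b'secret-temp-url-key'       # the X-Account-Meta-Temp-URL-Key value
    method = 'GET'
    expires = int(time.time() + 3600)  # valid for one hour
    path = '/v1/AUTH_demo/heat-metadata/server-1234'

    body = '%s\n%s\n%s' % (method, expires, path)
    sig = hmac.new(key, body.encode(), sha1).hexdigest()
    temp_url = ('https://swift.example.com%s?temp_url_sig=%s'
                '&temp_url_expires=%s' % (path, sig, expires))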




Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-25 Thread Thierry Carrez
Zane Bitter wrote:
 [...]
 Here are a few of the conclusions:
 
 * Everyone wishes the Design Summit worked like this.
 The meetup seemed a lot more productive than the design summit ever is.
 It's really nice to be in a room small enough that you can talk normally
 and hear everyone, instead of in a room designed for 150 people. It's
 really nice to be able to discuss stuff that isn't related to a
 particular feature - we had a long discussion about how to get through
 the review backlog, for example. It's really nice to not have fixed time
 slots for discussions - because everyone was in the room the whole time,
 we could dip in and out of different topics at will. Often we came back
 to one that we'd previously discussed because we had discovered new
 information. Finally, it's critical to be in a room covered in
 full-sized whiteboards that everyone can see. A single tiny flip chart
 doesn't cut it.

That's good feedback, thanks. The current discussion on design summit
format changes is a bit lost under a Nova thread, so I should revive it
as a separate thread very soon. The idea being to implement whatever
changes we can to the summit to make it more productive (in the limited
remaining time and options we have for that).

 [...]
 * Marc^W Zaqar is critical to pretty much every major non-Convergence
 feature on the roadmap.
 We knew that we wanted to use it for notifications, but we also want to
 make those a replacement for events, and a conduit for warnings and
 debugging information to the user. This is becoming so important that
 we're going to push ahead with an implementation now without waiting to
 see when Zaqar will graduate. Zaqar would also be a good candidate for
 pushing metadata changes to servers, to resolve the performance issues
 currently caused by polling.

Could you expand on that? Do you need some kind of user-facing queue
service, or is there something in the marc^WZaqar approach that makes it
especially appealing?

-- 
Thierry Carrez (ttx)



Re: [openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-25 Thread Zane Bitter

On 25/08/14 05:21, Thierry Carrez wrote:

Zane Bitter wrote:

[...]
Here are a few of the conclusions:

* Everyone wishes the Design Summit worked like this.
The meetup seemed a lot more productive than the design summit ever is.
It's really nice to be in a room small enough that you can talk normally
and hear everyone, instead of in a room designed for 150 people. It's
really nice to be able to discuss stuff that isn't related to a
particular feature - we had a long discussion about how to get through
the review backlog, for example. It's really nice to not have fixed time
slots for discussions - because everyone was in the room the whole time,
we could dip in and out of different topics at will. Often we came back
to one that we'd previously discussed because we had discovered new
information. Finally, it's critical to be in a room covered in
full-sized whiteboards that everyone can see. A single tiny flip chart
doesn't cut it.


That's good feedback, thanks. The current discussion on design summit
format changes is a bit lost under a Nova thread, so I should revive it
as a separate thread very soon. The idea being to implement whatever
changes we can to the summit to make it more productive (in the limited
remaining time and options we have for that).


Yeah, I have been following that thread too. It's a hard problem, 
because obviously the ability to talk to developers from other projects 
and users at the design summit is _also_ valuable... we need to find a 
way to somehow make both happen, preferably without making everyone 
travel 4 times a year or for more than a week at a time.



[...]
* Marc^W Zaqar is critical to pretty much every major non-Convergence
feature on the roadmap.
We knew that we wanted to use it for notifications, but we also want to
make those a replacement for events, and a conduit for warnings and
debugging information to the user. This is becoming so important that
we're going to push ahead with an implementation now without waiting to
see when Zaqar will graduate. Zaqar would also be a good candidate for
pushing metadata changes to servers, to resolve the performance issues
currently caused by polling.


Could you expand on that? Do you need some kind of user-facing queue
service, or is there something in the marc^WZaqar approach that makes it
especially appealing?


Basically we just need a user-facing queue service. The key drivers are 
the need for:

1) an asynchronous way of talking back to the user
2) a central service optimised for polling by the user, so that other 
services (like Heat) can move to a push model
3) a way of passing notifications between OpenStack services that the 
user can intercept and process if they choose (e.g. we already use 
user-defined webhooks to communicate from Ceilometer to autoscaling in 
Heat so that users can interpose their own alarm conditioning, but this 
has authentication issues and the potential to turn Ceilometer into a 
DOS engine)


Zaqar is a more-than-adequate-seeming implementation of those 
requirements that is already incubated.


Ideally it will also support SNS-style push notifications to the user, 
but that's more the user's problem than Heat's ;) We just want people to 
stop polling us, because it's killing our performance.


cheers,
Zane.



[openstack-dev] [Heat] Heat Juno Mid-cycle Meetup report

2014-08-22 Thread Zane Bitter
We held the inaugural Heat mid-cycle meetup in Raleigh, North Carolina 
this week. There were a dozen folks in attendance, and I think everyone 
agreed that it was a very successful event. Notes from the meetup are on 
the Etherpad here:


https://etherpad.openstack.org/p/heat-juno-midcycle-meetup

Here are a few of the conclusions:

* Everyone wishes the Design Summit worked like this.
The meetup seemed a lot more productive than the design summit ever is. 
It's really nice to be in a room small enough that you can talk normally 
and hear everyone, instead of in a room designed for 150 people. It's 
really nice to be able to discuss stuff that isn't related to a 
particular feature - we had a long discussion about how to get through 
the review backlog, for example. It's really nice to not have fixed time 
slots for discussions - because everyone was in the room the whole time, 
we could dip in and out of different topics at will. Often we came back 
to one that we'd previously discussed because we had discovered new 
information. Finally, it's critical to be in a room covered in 
full-sized whiteboards that everyone can see. A single tiny flip chart 
doesn't cut it.


* 3 days seems to be about the right length.
Not a lot got done on day 3, and people started to drift out at various 
times to catch flights, but the fact that everyone was there for _all_ 
of day 1 and 2 was essential (the critical Convergence plan was 
finalised around 7.30pm on Tuesday).


* There was a lot more discussion than hacking.
The main value of the meet-up was more in the discussions you'd hope to 
be able to have at the design summit than in working collaboratively on 
code.


* Marconi is now called Zaqar.
Who knew?

* Marc^W Zaqar is critical to pretty much every major non-Convergence 
feature on the roadmap.
We knew that we wanted to use it for notifications, but we also want to 
make those a replacement for events, and a conduit for warnings and 
debugging information to the user. This is becoming so important that 
we're going to push ahead with an implementation now without waiting to 
see when Zaqar will graduate. Zaqar would also be a good candidate for 
pushing metadata changes to servers, to resolve the performance issues 
currently caused by polling.


* We are on track to meet the immediate requirements of TripleO.
Obviously it would be nice to have Convergence now, but failing that the 
most critical bugs are under control. In the immediate future, we need
to work on finding a consensus on running with multiple workers by
default, splitting trees of nested stacks so that each nested stack runs in
a separate engine, and finding a way to push metadata out from Heat instead
of having servers poll us for it.


* We have a plan for what Convergence will look like.
Here are some horrific photos of a whiteboard: 
https://www.dropbox.com/sh/tamoc8dhhckb81w/AAA6xp2be9xv20P7SWx-xnZba?dl=0
Clint is, I believe, working on turning that into something more 
consumable. This was definitely the biggest success of the meet-up. 
Before this I had no idea what convergence would look like; now I have a 
fair idea how it will work and where the tricky bits might be. I doubt 
this could have happened at a design summit.


* We probably don't need TaskFlow.
After coming up with the Convergence plan we realised that, while 
TaskFlow would be useful/essential if we were planning a more modest 
refactoring of Heat, the architecture of Convergence should actually 
eliminate the need for it. All we think we need is a bunch of work 
queues that can be provided by oslo.messaging. TaskFlow seems great for 
the problem it solves, but we have the opportunity to not create that 
problem for ourselves in the first place.


* Convergence probably won't buy us much in the short term.
I think we all hoped that our incremental work on Convergence would 
render incremental benefits for Heat. After figuring out the rough 
outline of how Convergence could work, we realised that the incremental 
steps along the way (like implementing the observer process) will 
actually not have a big impact. So while, of course, we'll continue to 
work incrementally, we don't expect to see major benefits until nearer 
the end of the process.



Thanks to everyone who made the trip, and of course also to everyone who 
contributed input via IRC and generally held down the fort while we were 
meeting. If I misstated or just plain missed anything above, please feel 
free to weigh in.


cheers,
Zane.
