Re: [openstack-dev] [Openstack] [Ceilometer][Architecture] Transformers in Kilo vs Liberty(and Mitaka)

2016-04-14 Thread gordon chung


On 14/04/2016 5:28 AM, Nadya Shakhat wrote:
> Hi Gordon,
>
> I'd like to add some clarifications and comments.
>
> this is not entirely accurate pre-polling change, the polling agents
> publish one message per sample. not the polling agents publish one
> message per interval (multiple samples).
>
> Looks like there is some misunderstanding here. In the code, there is a
> "batch_polled_samples" option. You can switch it off and get the result
> you described, but it is True by default. See
> https://github.com/openstack/ceilometer/blob/master/ceilometer/agent/manager.py#L205-L211

right... the polling agents by default now publish one message per 
interval, as i said (if you s/not/now/), whereas before it was publishing 
one message per sample. i don't see why that's a bad thing?
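
for reference, this is the difference in behaviour (a toy sketch, not the 
actual manager.py code; publish() and the sample dicts are just stand-ins):

def flush(samples, batch=True, publish=print):
    # toy illustration of what batch_polled_samples toggles
    if batch:
        # default since the batching change: one message per polling
        # interval, carrying every sample collected in that cycle
        publish(samples)
    else:
        # older behaviour: one message per sample
        for sample in samples:
            publish([sample])

flush([{'name': 'cpu', 'volume': 30 * 10**9},
       {'name': 'memory.usage', 'volume': 512}])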

> .
>
> You wrote:
>
> the polling change is not related to the coordination work in notification.
> the coordination work was to handle HA / multiple notification agents.
> regardless of the polling change, this must exist.
>
> and
>
> transformers are already optional. they can be removed from
> pipeline.yaml if not required (and thus coordination can be disabled).
>
>
> So, coordination is needed only to support transformations. The polling
> change does relate to this because it has brought additional transformation
> work onto the notification agent side. I suggest paying attention
> to the existing use cases. In real life, people use transformers for
> polling-based metrics only. The most important use case for
> transformation is Heat autoscaling, which is usually based on cpu_util.
> Before Liberty, we could support the autoscaling use case without
> coordination for the notification agent. In Liberty we cannot support it
> without Redis. Now "transformers are already optional", that's true. But I
> think it's better to add a restriction like "we don't support
> transformations for notifications" and have transformers optional on the
> polling agent only instead of introducing such comprehensive
> coordination.

i'm not sure it's safe to say it's only used for cpu_util. that said, 
cpu_util ideally shouldn't be a transform anyway; see the work Avi was 
doing [1].


>
> IPC is one of the
> standard use cases for message queues. the concept of using queues to
> pass around and distribute work is essentially what it's designed for.
> if rabbit or any message queue service can't provide this function, it
> does worry me.
>
>
> I see your point here, but Ceilometer aims to take care of OpenStack and
> monitor its state. Right now it is known as a "Rabbit killer". We
> cannot ignore that if we want anybody to use Ceilometer.

what is the message load we're seeing here? how is your MQ configured? 
do you have batching? how many agents/queues do you have? i think this 
needs to be reviewed first, to be honest, as there really isn't much to go on.


[1] https://review.openstack.org/#/c/182057/


-- 
gord

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack] [Ceilometer][Architecture] Transformers in Kilo vs Liberty(and Mitaka)

2016-04-14 Thread Nadya Shakhat
Hi Gordon,

I'd like to add some clarifications and comments.

this is not entirely accurate pre-polling change, the polling agents
> publish one message per sample. not the polling agents publish one
> message per interval (multiple samples).

Looks like there is some misunderstanding here. In the code, there is a
"batch_polled_samples" option. You can switch it off and get the result you
described, but it is True by default. See
https://github.com/openstack/ceilometer/blob/master/ceilometer/agent/manager.py#L205-L211
.

You wrote:

> the polling change is not related to the coordination work in notification.
> the coordination work was to handle HA / multiple notification agents.
> regardless of the polling change, this must exist.

and

> transformers are already optional. they can be removed from
> pipeline.yaml if not required (and thus coordination can be disabled).


So, coordination is needed only to support transformations. The polling change
does relate to this because it has brought additional transformation work onto
the notification agent side. I suggest paying attention to the existing use
cases. In real life, people use transformers for polling-based metrics only.
The most important use case for transformation is Heat autoscaling, which is
usually based on cpu_util. Before Liberty, we could support the autoscaling
use case without coordination for the notification agent. In Liberty we cannot
support it without Redis. Now "transformers are already optional", that's
true. But I think it's better to add a restriction like "we don't support
transformations for notifications" and have transformers optional on the
polling agent only instead of introducing such comprehensive coordination.
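
For reference, cpu_util is just a rate of change derived from the cumulative
"cpu" meter; a rough sketch of the arithmetic with made-up numbers (in the
real pipeline this is done by the rate_of_change transformer):

# deriving cpu_util (%) from two cumulative "cpu" samples (nanoseconds
# of CPU time); the numbers are invented for illustration
prev_t, prev_cpu_ns = 0.0, 0
curr_t, curr_cpu_ns = 60.0, 30 * 10**9   # 30s of CPU time over 60s
vcpus = 2

cpu_util = 100.0 * (curr_cpu_ns - prev_cpu_ns) / (
    (curr_t - prev_t) * 10**9 * vcpus)
print(cpu_util)  # 25.0 -- the value Heat autoscaling alarms evaluate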

> IPC is one of the
> standard use cases for message queues. the concept of using queues to
> pass around and distribute work is essentially what it's designed for.
> if rabbit or any message queue service can't provide this function, it
> does worry me.


I see your point here, but Ceilometer aims to take care of OpenStack and
monitor its state. Right now it is known as a "Rabbit killer". We cannot
ignore that if we want anybody to use Ceilometer.


Also, I'd like to copy-paste Chris's ideas from the previous message:

> Are the options the following?
> * Do what you suggest and pull transformers back into the pollsters.
>   Basically revert the change. I think this is the wrong long term
>   solution but might be the best option if there's nobody to do the
>   other options.
> * Implement a pollster.yaml for use by the pollsters and consider
>   pipeline.yaml as the canonical file for the notification agents as
>   that's where the actual _pipelines_ are. Somewhere in there kill
>   interval as a concept on the pipeline side.
>   This of course doesn't address the messaging complexity. I admit
>   that I don't understand all the issues there but it often feels
>   like we are doing that aspect of things completely wrong, so I
>   would hope that before we change things there we consider all the
>   options.

I think that the two types of agents should have two different pipeline
descriptions, but I still think that a "pipeline" should be described and
fully applied on both types of agents. On the polling agents it should be the
same as it is now; on the notification agents, remove interval and drop
transformations altogether. Chris, I see your point about "long term", but I'm
afraid that "long term" may not happen...


> What else?
> One probably crazy idea: What about figuring out the desired end-meters
> of common transformations and making them into dedicated pollsters?
> Encapsulating that transformation not at the level of the polling
> manager but at the individual pollster.


Your "crazy idea" may work at least for restoring autoscaling functionality
indeed.

Thanks,
Nadya

On Wed, Apr 13, 2016 at 9:25 PM, gordon chung  wrote:

> hi Nadya,
>
> copy/pasting full original message with comments inline to clarify some
> comments.
>
> i think a lot of the confusion is because we use pipeline.yaml across
> both polling and notification agents when really it only applies to the
> latter. just an fyi, we've had an open work item to create a
> polling.yaml file... just the issue of 'resources'.
>
> > Hello colleagues,
> >
> > I'd like to discuss one question with you. Perhaps, you remember that
> > in Liberty we decided to get rid of transformers on polling agents [1].
> I'd
> > like to describe several issues we are facing now because of this
> decision.
> > 1. pipeline.yaml inconsistency.
> > The Ceilometer pipeline consists of two basic things: source and
> > sink. In the source, we describe how to get data; in the sink, how to
> > deal with the data. After the refactoring described in [1], on polling
> > agents we apply only the "source" definition, on notification agents
> > only the "sink" one. It causes the problems described in the mailing
> > thread [2]: the "pipe" concept is actually broken. To make it work more
> > or less correctly, the
> 

[openstack-dev] [Openstack] [Ceilometer][Architecture] Transformers in Kilo vs Liberty(and Mitaka)

2016-04-13 Thread gordon chung
hi Nadya,

copy/pasting full original message with comments inline to clarify some 
comments.

i think a lot of the confusion is because we use pipeline.yaml across 
both polling and notification agents when really it only applies to the 
latter. just an fyi, we've had an open work item to create a 
polling.yaml file... just the issue of 'resources'.

> Hello colleagues,
>
> I'd like to discuss one question with you. Perhaps, you remember that
> in Liberty we decided to get rid of transformers on polling agents [1]. I'd
> like to describe several issues we are facing now because of this decision.
> 1. pipeline.yaml inconsistency.
> The Ceilometer pipeline consists of two basic things: source and
> sink. In the source, we describe how to get data; in the sink, how to deal
> with the data. After the refactoring described in [1], on polling agents we
> apply only the "source" definition, on notification agents only the "sink"
> one. It causes the problems described in the mailing thread [2]: the "pipe"
> concept is actually broken. To make it work more or less correctly, the
> user must take care that a polling agent doesn't send duplicated
> samples. In the example below, we send the "cpu" Sample twice every 600
> seconds from the compute agents:
>
> sources:
>     - name: meter_source
>       interval: 600
>       meters:
>           - "*"
>       sinks:
>           - meter_sink
>     - name: cpu_source
>       interval: 60
>       meters:
>           - "cpu"
>       sinks:
>           - cpu_sink
>           - cpu_delta_sink
>
> If we apply the same configuration on the notification agent, each "cpu"
> Sample will be processed by all 3 sinks. Please refer to the mailing thread
> [2] for more details.
> As I understood from the specification, the main reason for [1] is
> making the pollster code more readable. That's why I call this change a
> "refactoring". Please correct me if I missed anything here.

i don't know about more readable. it was also to offload work from 
compute nodes and all the stuff cdent mentions.
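
as for the duplicated "cpu" example above, the matching that produces it is 
roughly this (a toy sketch using fnmatch, not ceilometer's actual pipeline 
code):

import fnmatch

# meter patterns and sinks from the pipeline.yaml example above
sources = {'meter_source': (['*'], ['meter_sink']),
           'cpu_source': (['cpu'], ['cpu_sink', 'cpu_delta_sink'])}

def sinks_for(meter):
    # every source whose meter list matches the sample contributes its sinks
    matched = []
    for meters, sinks in sources.values():
        if any(fnmatch.fnmatch(meter, pattern) for pattern in meters):
            matched.extend(sinks)
    return matched

print(sinks_for('cpu'))  # ['meter_sink', 'cpu_sink', 'cpu_delta_sink']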

>
> 2. Coordination stuff.
> TBH, coordination for notification agents is the most painful thing for
> me, for several reasons:
>
> a. A stateless service has become stateful. Here I'd like to note that tooz
> usage for central agents and alarm-evaluators may be called "optional". If
> you want to have these services scalable, it is recommended to use tooz,
> i.e. install Redis/Zookeeper. But you may keep your puppets unchanged and
> everything continues to work with one service (central agent or
> alarm-evaluator) per cloud. If we are talking about the notification agent,
> it's not the case. You must change the deployment: either rewrite the
> puppets for notification agent deployment (to have only one notification
> agent per cloud) or make a tooz installation with Redis/Zookeeper required.
> One more option: remove transformations completely - that's what we've done
> in our company's product by default.

the polling change is not related to the coordination work in notification. 
the coordination work was to handle HA / multiple notification agents. 
regardless of the polling change, this must exist.

>
> b. High RabbitMQ utilisation. As you know, tooz does only one part of the
> coordination for a notification agent. In Ceilometer, we use the IPC queues
> mechanism to be sure that samples for one metric and from one
> resource are processed by exactly one notification agent (to make it
> possible to use a local cache). I'd like to remind you that without
> coordination (but with [1] applied) each compute agent polls each instance
> and sends the result as one message to a notification agent. The

this is not entirely accurate pre-polling change, the polling agents 
publish one message per sample. not the polling agents publish one 
message per interval (multiple samples).

> notification agent processes all the samples and sends as many messages to
> a collector as there are sinks defined (2-4, not many). If [1] is not
> applied, one "publishing" round is skipped. But with [1] and coordination
> (the most recommended deployment), the amount of publications increases
> dramatically because we publish each Sample as a separate message. Instead
> of 3-5 "publish" calls, we do 1+2*instance_amount_on_compute publishings
> per compute node. And it's by design, i.e. it's not a bug but a feature.

i don't think the maths is right but regardless, IPC is one of the 
standard use cases for message queues. the concept of using queues to 
pass around and distribute work is essentially what it's designed for. 
if rabbit or any message queue service can't provide this function, it 
does worry me.
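
fwiw, the partitioning mechanism itself is simple; a toy sketch of the idea 
(the queue naming and helper below are made up, not ceilometer's actual 
implementation): a stable hash of the resource id maps every sample for a 
given resource to the same IPC queue, and therefore to the same agent and 
its local cache.

import hashlib

def ipc_queue_for(resource_id, num_agents):
    # hash the resource id onto one of num_agents IPC queues so that a
    # given resource is always handled by the same notification agent
    digest = hashlib.md5(resource_id.encode('utf-8')).hexdigest()
    return 'ceilometer-pipe-%d' % (int(digest, 16) % num_agents)

print(ipc_queue_for('instance-0001', 3))
print(ipc_queue_for('instance-0001', 3))  # same queue every time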

>
> c. Sample ordering in the queues. It may be considered a corner case,
> but anyway I'd like to describe it here too. We have a lot of
> order-sensitive transformers (cpu.delta, cpu_util), but we can guarantee
> message ordering only in the "main" polling queue, not in the IPC queues. In
> the picture below (I hope it will be displayed) there are 3 agents A1, A2 and
> A3 and 3 time-ordered 

Re: [openstack-dev] [Openstack] [Ceilometer][Architecture] Transformers in Kilo vs Liberty(and Mitaka)

2016-04-12 Thread Chris Dent


This discussion needs to be happening on openstack-dev too, so
cc'ing that list in as well. The top of the thread is at
http://lists.openstack.org/pipermail/openstack/2016-April/015864.html

On Tue, 12 Apr 2016, Chris Dent wrote:


On Tue, 12 Apr 2016, Nadya Shakhat wrote:


   I'd like to discuss one question with you. Perhaps, you remember that
in Liberty we decided to get rid of transformers on polling agents [1]. I'd
like to describe several issues we are facing now because of this decision.
1. pipeline.yaml inconsistency.


The original idea with centralizing the transformers to just the
notification agents was to allow a few different things, only one of
which has happened:

* Make the pollster code less complex with few dependencies,
 easing deployment options (this has happened), maintenance
 and custom pollsters.

 With the transformers in the pollsters they must maintain a
 considerable amount of state that makes effective use of eventlet
 (or whatever the chosen concurrent solution is) more difficult.

 The ideal pollster is just something that spits out a dumb piece
 of identified data every interval. And nothing else.

* Make it far easier to use and conceptualize the use of pollsters
 outside of the ceilometer environment as simple data collectors.
 In that context transformation would occur only close to the data
 consumption, not at the data production.

 This, following the good practice of services doing one thing
 well.

* Migrate away from the pipeline.yaml that conflated sources and
 sinks to a model that is good both for computers and humans:

 * sources over here
 * sinks over here

That these other things haven't happened means we're in an awkward
situation.

Are the options the following?

* Do what you suggest and pull transformers back into the pollsters.
 Basically revert the change. I think this is the wrong long term
 solution but might be the best option if there's nobody to do the
 other options.

* Implement a pollster.yaml for use by the pollsters and consider
 pipeline.yaml as the canonical file for the notification agents as
 that's where the actual _pipelines_ are. Somewhere in there kill
 interval as a concept on the pipeline side.

 This of course doesn't address the messaging complexity. I admit
 that I don't understand all the issues there but it often feels
 like we are doing that aspect of things completely wrong, so I
 would hope that before we change things there we consider all the
 options.

What else?

One probably crazy idea: What about figuring out the desired end-meters
of common transformations and making them into dedicated pollsters?
Encapsulating that transformation not at the level of the polling
manager but at the individual pollster.
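
A sketch of what such a dedicated pollster might look like (entirely
hypothetical, not existing ceilometer code; read_cpu_ns is a made-up
callable returning cumulative CPU nanoseconds for the instance):

import time

class CpuUtilPollster(object):
    """Hypothetical pollster that emits cpu_util directly, keeping the
    rate-of-change state inside the pollster instead of a transformer."""

    def __init__(self, read_cpu_ns, vcpus):
        self._read_cpu_ns = read_cpu_ns
        self._vcpus = vcpus
        self._prev = None  # (timestamp, cumulative cpu ns)

    def get_sample(self):
        now, cpu_ns = time.time(), self._read_cpu_ns()
        sample = None
        if self._prev is not None:
            prev_t, prev_ns = self._prev
            util = 100.0 * (cpu_ns - prev_ns) / (
                (now - prev_t) * 10**9 * self._vcpus)
            sample = {'name': 'cpu_util', 'type': 'gauge',
                      'unit': '%', 'volume': util}
        self._prev = (now, cpu_ns)
        return sample  # None on the very first poll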




--
Chris Dent   (╯°□°)╯︵┻━┻  http://anticdent.org/
freenode: cdent tw: @anticdent
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev