For those who want to see the picture, this one should be available: https://docs.google.com/drawings/d/1OjnthkzKYnNLpIumYJYcs1ASFa_npbeyot8afo4LQgw/edit?usp=sharing
Thanks

On Tue, Apr 12, 2016 at 4:13 PM, Nadya Shakhat <[email protected]> wrote:

Hello colleagues,

I'd like to discuss one question with you. Perhaps you remember that in Liberty we decided to get rid of transformers on polling agents [1]. I'd like to describe several issues we are facing now because of this decision.

1. pipeline.yaml inconsistency.

A Ceilometer pipeline consists of two basic parts: a source and a sink. In the source we describe how to get the data; in the sink, how to process it. After the refactoring described in [1], polling agents apply only the "source" definition and notification agents apply only the "sink" definition. This causes the problems described in the mailing thread [2]: the "pipe" concept is effectively broken. To make it work more or less correctly, the user has to take care that a polling agent does not send duplicated samples. With the example below, each compute agent sends the "cpu" Sample twice every 600 seconds:

    sources:
        - name: meter_source
          interval: 600
          meters:
              - "*"
          sinks:
              - meter_sink
        - name: cpu_source
          interval: 60
          meters:
              - "cpu"
          sinks:
              - cpu_sink
              - cpu_delta_sink

If we apply the same configuration on a notification agent, each "cpu" Sample will be processed by all 3 sinks. Please refer to the mailing thread [2] for more details.

As I understood from the specification, the main reason for [1] was to make the pollster code more readable. That's why I call this change a "refactoring". Please correct me if I've missed anything here.
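Assuming the stock meter-matching syntax in pipeline.yaml, where the wildcard can be combined with "!name" exclusions, one way to avoid the duplication is to carve "cpu" out of the wildcard source so each sample matches exactly one source; a minimal sketch:

    sources:
        - name: meter_source
          interval: 600
          meters:
              - "*"
              - "!cpu"    # assumed exclusion syntax; keeps cpu out of this source
          sinks:
              - meter_sink
        - name: cpu_source
          interval: 60
          meters:
              - "cpu"
          sinks:
              - cpu_sink
              - cpu_delta_sink

This removes the duplicated polling, but it does not restore the broken "pipe" concept: source and sink definitions are still evaluated independently on the two agent types.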
2. Coordination stuff.

TBH, coordination for notification agents is the most painful thing for me, for several reasons:

a. A stateless service has become stateful. Here I'd like to note that tooz usage for central agents and alarm-evaluators may be called "optional": if you want these services to be scalable, it is recommended to use tooz, i.e. install Redis/Zookeeper, but you may leave your puppets unchanged and everything continues to work with one service (central agent or alarm-evaluator) per cloud. For the notification agent, that is not the case. You must change the deployment: either rewrite the puppets for notification agent deployment (to have only one notification agent per cloud) or make a tooz installation with Redis/Zookeeper required. One more option is to remove transformations completely - that's what we've done in our company's product by default. (A sketch of what this coordination involves follows at the end of this mail.)

b. RabbitMQ high utilisation. As you know, tooz does only one part of the coordination for a notification agent. In Ceilometer, we use the IPC queues mechanism to ensure that samples for one metric from one resource are processed by exactly one notification agent (so that a local cache can be used). I'd like to remind you that without coordination (but with [1] applied), each compute agent polls each instance and sends the results as one message to a notification agent. The notification agent processes all the samples and sends as many messages to the collector as there are sinks defined (2-4, not many). If [1] is not applied, one "publishing" round is skipped. But with [1] and coordination (the most recommended deployment), the number of publications increases dramatically, because we publish each Sample as a separate message. Instead of 3-5 "publish" calls, we do 1 + 2*instance_amount_on_compute publishings per compute node (see the worked example at the end of this mail). And it's by design, i.e. it's not a bug but a feature.

c. Samples ordering in the queues. It may be considered a corner case, but I'd like to describe it here too. We have a lot of order-sensitive transformers (cpu.delta, cpu_util), but we can guarantee message ordering only in the "main" polling queue, not in the IPC queues. In the picture (see the link at the top of this thread) there are 3 agents A1, A2 and A3 and 3 time-ordered messages in the MQ. Let's assume that all 3 agents start reading messages from the MQ at the same time. All the messages relate to a single resource, so they will all go to the same IPC queue - let it be the IPC queue for agent A1. At this point we cannot guarantee that the order will be kept, i.e. we cannot do order-sensitive transformations without some loss.

Now I'd like to remind you that we need this coordination _only_ to support transformations. Take a look at these specs: [3], [4].

From [3]: "The issue that arises is that if we want to implement a pipeline to process events, we cannot guarantee what event each agent worker will get and because of that, we cannot enable transformers which aggregate/collate some relationship across similar events."

We don't have event transformations. In the default pipeline.yaml we don't even use transformations for notification-based samples (perhaps we get cpu from instance.exists, but we can drop it without any impact). The most common case is transformations only for polling-based metrics. Please correct me if I'm wrong here.

tl;dr
I suggest the following:
1. Return transformations to polling agents.
2. Use a special format for pipeline.yaml on notification agents, without "interval" and "transformations"; notification-based transformations are better done "offline".

[1] https://github.com/openstack/telemetry-specs/blob/master/specs/liberty/pollsters-no-transform.rst
[2] http://www.gossamer-threads.com/lists/openstack/dev/53983
[3] https://github.com/openstack/ceilometer-specs/blob/master/specs/kilo/notification-coordiation.rst
[4] https://github.com/openstack/ceilometer-specs/blob/master/specs/liberty/distributed-coordinated-notifications.rst

Thanks for your attention,
Nadya
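To make point 2.a above concrete, here is a minimal sketch of the group membership a coordinated notification agent has to maintain, written against the tooz API directly; the Redis URL, member id, and group name are placeholders, not Ceilometer's actual wiring:

    from tooz import coordination

    # Placeholder backend URL and member id; a real deployment would read
    # these from ceilometer.conf.
    coordinator = coordination.get_coordinator(
        'redis://127.0.0.1:6379', b'notification-agent-1')
    coordinator.start()

    group = b'ceilometer.notification'
    try:
        coordinator.create_group(group).get()
    except coordination.GroupAlreadyExist:
        pass
    coordinator.join_group(group).get()

    # Group membership decides which IPC queues this agent consumes. The
    # agent is now stateful: it must heartbeat regularly or it is evicted
    # from the group and the workload is repartitioned.
    coordinator.heartbeat()

    coordinator.leave_group(group).get()
    coordinator.stop()

The point is only that the Redis/Zookeeper backend becomes a hard dependency once any of this runs; none of it is needed if the notification agent stays transformation-free.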
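And to make the arithmetic in point 2.b concrete, a small illustration of the publish counts under the 1 + 2*N formula from the mail (the numbers are illustrative, not measured):

    # Publishes per polling interval for one compute node, with [1] and
    # coordination enabled: 1 batched poll message, then each of the N
    # samples is re-published to an IPC queue and once more after
    # transformation.
    def publishes_per_compute(instances):
        return 1 + 2 * instances

    for n in (10, 50, 100):
        print(n, '->', publishes_per_compute(n))
    # 10 -> 21, 50 -> 101, 100 -> 201, versus roughly 3-5 batched
    # publishes (one per sink) without the per-sample fan-out.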
