On Fri, 23 Sep 2016, gordon chung wrote:



On 23/09/2016 2:18 AM, Zhai, Edwin wrote:


There are many targets (topics) and endpoints in the ceilometer code above. But
in AODH we have just one topic, 'alarm.all', and one endpoint. If it is
still multi-threaded, there is already a potential race condition here,
but the event-alarm timeout makes it worse.

https://github.com/openstack/aodh/blob/master/aodh/event.py#L61-L63

see my reply to the other message, but yes, it is multithreaded. there's no
race currently because we don't do anything that needs to honour ordering.

Currently, we still need ordering. For example,
2 events with different traits could trigger the same alarm. If they arrive far enough apart, the alarm is triggered once (the second event sees the state as 'ALARM' and gives up). If they arrive and are handled concurrently, the alarm may be triggered twice (both events see the state as 'UNKNOWN'). This is wrong because an event alarm is one-shot (if repeat_actions=False).

Do you have any idea how to resolve this race condition?
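
One possible direction, as a minimal sketch only: make the check of the state and the move to 'ALARM' a single atomic step, so that only one of the concurrent handlers gets to run the actions. The class and function names below (AlarmState, try_fire, handle_event, run_concurrent_events) are made up for illustration and are not AODH code, and the alarm is a plain in-memory object rather than a stored alarm.

```python
import threading


class AlarmState:
    """Hypothetical in-memory stand-in for an alarm record (not AODH's API)."""

    def __init__(self):
        self.state = "UNKNOWN"
        self._lock = threading.Lock()

    def try_fire(self):
        """Atomically move to 'ALARM'; True only for the caller that did it."""
        with self._lock:
            if self.state == "ALARM":
                # Another event already triggered this one-shot alarm.
                return False
            self.state = "ALARM"
            return True


def handle_event(alarm, fired):
    # Only the handler that wins the transition runs the alarm actions.
    if alarm.try_fire():
        fired.append(1)


def run_concurrent_events(n_events=2):
    """Deliver n_events to the same alarm from separate threads."""
    alarm = AlarmState()
    fired = []
    threads = [
        threading.Thread(target=handle_event, args=(alarm, fired))
        for _ in range(n_events)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return alarm.state, len(fired)
```

A threading.Lock only covers handlers inside one process. If several evaluator processes share the alarm, the same check-and-set would presumably have to happen in the storage layer instead.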



The event evaluator is triggered by events only, that is, it's not called at
all until the next event comes. If no event comes, the evaluator just sleeps, so
it can't check the timeout and call update_alarm. In other words, 'timeout.end'
is just for waking up the evaluator.


what's the purpose of the thread being created? i thought the idea was
to receive alarm.timeout.start event -> creates a thread? can we not:
1. receive alarm.timeout.start -> create an alarm with timeout thread
2a. if event received, kill timeout thread, update alarm.
2b. if timeout reached, send alarm notification, update alarm.

^ that is just a random thought, i didn't think about exactly how to
implement. right now i'm not clear who is generating this
alarm.timeout.end event and why it needs to do that at all.
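
The three steps above could be sketched roughly as follows with the standard library's threading.Timer. This is only an illustration under assumptions: TimeoutAlarm, on_event and the notify callback are hypothetical names, and the 'OK' and 'ALARM' states are placeholders for whatever "update alarm" would really set.

```python
import threading


class TimeoutAlarm:
    """Hypothetical alarm with a timeout; illustrative only, not AODH code."""

    def __init__(self, timeout, notify):
        self.state = "UNKNOWN"
        self._notify = notify
        self._lock = threading.Lock()
        self._done = False
        # Step 1: on alarm.timeout.start, arm a timer thread.
        self._timer = threading.Timer(timeout, self._on_timeout)
        self._timer.daemon = True
        self._timer.start()

    def _finish(self, new_state):
        # Whichever path gets here first wins; the other becomes a no-op.
        with self._lock:
            if self._done:
                return False
            self._done = True
            self.state = new_state
            return True

    def on_event(self):
        # Step 2a: the event arrived in time, so stop the timer and update.
        if self._finish("OK"):
            self._timer.cancel()
            return True
        return False

    def _on_timeout(self):
        # Step 2b: no event arrived, so update the alarm and notify.
        if self._finish("ALARM"):
            self._notify(self)
```

The lock matters because Timer.cancel() cannot stop a callback that has already started running, so the event path and the timeout path can still meet.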


It's a good idea! We need one mechanism for the timeout calculation: a new thread or an alarm signal. If the alarm signal is more stable, let's go with it.

We need a list of all alarms waiting for a timeout, and we update the list when the timeout signal arrives.
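
As a rough sketch of such a list, assuming a min-heap keyed by deadline is acceptable: the handler for the timeout signal (or a timer thread) would call pop_expired with the current time and update the returned alarms. PendingTimeouts and its methods are hypothetical names, not existing AODH code.

```python
import heapq


class PendingTimeouts:
    """Hypothetical list of alarms waiting for a timeout, ordered by deadline."""

    def __init__(self):
        self._heap = []        # (deadline, alarm_id) pairs, earliest first
        self._deadlines = {}   # alarm_id -> deadline that is still valid

    def add(self, alarm_id, deadline):
        self._deadlines[alarm_id] = deadline
        heapq.heappush(self._heap, (deadline, alarm_id))

    def discard(self, alarm_id):
        # The event arrived in time: forget the alarm. Its heap entry is
        # left in place and skipped later (lazy removal).
        self._deadlines.pop(alarm_id, None)

    def pop_expired(self, now):
        """Remove and return the ids of alarms whose deadline has passed."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            deadline, alarm_id = heapq.heappop(self._heap)
            # Skip entries that were discarded or re-added with a new deadline.
            if self._deadlines.get(alarm_id) == deadline:
                del self._deadlines[alarm_id]
                expired.append(alarm_id)
        return expired
```

If this is touched from both the event handler and a timer thread, it would also need a lock around each method.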

The alarm.timeout.end event is just for locking, and is generated by the new thread or the alarm signal handler (your suggestion). If it is useless for locking, we can drop it and just update the alarm directly as you said.


cheers,
--
gord


Best Rgds,
Edwin

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
