Below is the exchange we had on GitHub, so we can continue it here :)
On Mon, Jun 30, 2014 at 6:57 PM, Alexandre Viau <
alexandre.v...@savoirfairelinux.com> wrote:
> Hello :) This was also posted on github [here](
> https://github.com/naparuba/shinken/issues/1238).
>
> I am proposing the addition of a new optional daemon to Shinken, the
> __analyst__.
>
> ##### This new component would allow for the following functionalities:
> * Apply bprules to metrics and statuses
> * Generate alerts on unconfigured services and on configured passive
> services (e.g. collectd)
> * Perform trend analysis on any metric or combination of metrics.
> * Possibly generate alerts and make bprules on services and statuses from
> different realms
> * Generate new metrics based on a mix of other metrics
>
> ##### It would cover the following use cases:
> * Extend the capabilities of bprules.
> * For example: two switches are connected together with two cables, and I
> want to monitor the bandwidth usage of both cables. Currently, bprules
> won't give me a good idea of total usage. It would be appropriate to merge
> both metrics: instead of 99% (critical) and 0% (OK), just tell me 49%.
> (A sketch of both rules follows this list.)
>
> * In a large automated environment, it is complicated to configure hosts
> as they come and go. This is where passive monitoring comes in handy.
> However, I have to configure individual hosts if I want to receive alerts.
> I would like to define rules like the following: If load is above a certain
> threshold on any of my servers (configured or not), generate an alert.
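>
> A minimal sketch of both rules, assuming a hypothetical analyst API; the
> `emit` and `alert` helpers are made up for illustration and stubbed so the
> example runs on its own:
>
> ```python
> # Stubs standing in for the analyst's real emit/alert plumbing.
> def emit(name, value): print("METRIC %s=%s" % (name, value))
> def alert(level, msg): print("ALERT [%s] %s" % (level, msg))
>
> def total_link_usage(metrics):
>     """Merge two redundant uplinks into one figure: 99% on one cable
>     and 0% on the other should read as ~49.5% overall, not CRITICAL."""
>     usage = (metrics["sw1.eth0.pct"] + metrics["sw1.eth1.pct"]) / 2.0
>     emit("sw1.uplink.total_pct", usage)
>     if usage > 90:
>         alert("CRITICAL", "uplink saturated: %.1f%%" % usage)
>
> def high_load_anywhere(host, load1):
>     """Fire for any host reporting load, configured in Shinken or not."""
>     if load1 > 10.0:
>         alert("WARNING", "%s load is %.1f" % (host, load1))
>
> total_link_usage({"sw1.eth0.pct": 99.0, "sw1.eth1.pct": 0.0})
> high_load_anywhere("web-042", 12.3)
> ```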
>
> ##### How the daemon would fit in the current data flow:
>
> 1. The user creates rules and configures the analyst.
> 2. From these rules, the arbiter configures the brokers so that they keep
> the necessary data in dedicated queues for the analysts.
> 3. Receiver/Poller receives check results.
> 4. The broker polls the check-result broks and sends them to the database.
> 5. Before deleting the broks, the broker checks its config for analysts
> that require them and puts the broks in dedicated queues (sketched below,
> after this list).
> 6. The analyst retrieves the broks from the appropriate brokers and
> analyses them. If necessary, it generates new broks/alerts.
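>
> A rough sketch of step 5, assuming the queue-per-analyst design from this
> proposal; `AnalystLink` and the `wants` predicate are invented names, not
> existing Shinken code:
>
> ```python
> from queue import Queue
>
> class AnalystLink:
>     """One linked analyst, as seen from the broker."""
>     def __init__(self, name, wants):
>         self.name = name
>         self.wants = wants    # predicate: does this analyst need the brok?
>         self.queue = Queue()  # dedicated queue per linked analyst
>
> def dispatch(brok, analysts):
>     # Called before the broker drops the brok: copy it to every
>     # analyst whose rules declared an interest in it.
>     for analyst in analysts:
>         if analyst.wants(brok):
>             analyst.queue.put(brok)  # one brok may go to several queues
>
> links = [AnalystLink("trend", lambda b: b["type"] == "service_check_result")]
> dispatch({"type": "service_check_result", "host": "sw1"}, links)
> ```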
>
> ##### Details:
> * A check result could be required by several analysts, so the brokers
> keep one queue per linked analyst.
> * At first, the rules would be written in Python and evaluated by the
> daemon. In the future, this would become modular and rules could be written
> in any syntax (e.g. Lua); see the interface sketch after this list.
> * Analysts could analyse metrics generated by other analysts. They could
> also generate metrics based on analyst-generated metrics!
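>
> A minimal sketch of that modular idea, with invented class and method
> names: the analyst would only depend on an engine contract, Python being
> the first implementation and Lua a possible later one:
>
> ```python
> from abc import ABC, abstractmethod
>
> class RuleEngine(ABC):
>     @abstractmethod
>     def evaluate(self, brok):
>         """Return the list of new broks/alerts produced for this brok."""
>
> class PythonRuleEngine(RuleEngine):
>     """First backend: rules are plain Python callables."""
>     def __init__(self, rules):
>         self.rules = rules
>
>     def evaluate(self, brok):
>         out = []
>         for rule in self.rules:
>             out.extend(rule(brok) or [])
>         return out
>
> # A LuaRuleEngine could later implement the same evaluate() contract.
> engine = PythonRuleEngine([lambda b: ["alert"] if b.get("load", 0) > 10 else []])
> print(engine.evaluate({"load": 12}))
> ```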
>
> The analyst would greatly enhance Shinken's alerting capabilities and
> would also simplify alerting on passive results.
>
> I would like to propose myself as the main developer for this new daemon.
> I would be working on this at Savoir-Faire Linux, where we already have two
> other Shinken specialists and contributors.
>
>
That's an interesting idea, and in fact I have already been thinking about
such a rule system for some months.
But after looking at some points, it does not fit the Shinken way of thinking.
In fact, the whole Shinken design minimises inter-daemon interaction: changes
travel over a diff bus (the broks), objects that are linked together are put
in the same scheduler (the arbiter's role), and no daemon manages history
(everything is in memory).
Shinken is like Nagios: it is made to manage statuses, not metrics. There are
tools for that, and I don't think it's a good thing to have a daemon that
does not follow the same logic as all the others.
But that's about philosophy; let's look at real use cases now :)
We do need such perfdata lookups in some cases; that is why triggers were
created. The main problem is that they live in the scheduler, which has a
limited view of the objects (only a part of the IT hosts), and that's why I
never really loved the triggers.
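For context, a trigger is roughly this kind of snippet, evaluated by the
scheduler against a check result. The helper names below are from memory of
the trigger API and may differ from the shipped one; stubs are included so
the sketch stands alone:

```python
# Stand-ins for the trigger helpers the scheduler normally provides.
def perf(obj, name):        # read one value from the object's perfdata
    return obj["perfdata"].get(name)

def critical(obj, output):  # force the object's state to CRITICAL
    obj["state"], obj["output"] = "CRITICAL", output

svc = {"perfdata": {"cpu": 97.0}, "state": "OK", "output": ""}
if perf(svc, "cpu") >= 95:
    critical(svc, "CPU too high: %d%%" % perf(svc, "cpu"))

# The limitation above: the trigger only sees objects scheduled on its
# own scheduler, never the whole estate.
```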
One other key problem you will have to tackle, and which I don't see in your
post, is how you will distribute the load between your daemons; rule
assignment won't be enough for large environments.
What you really need are in fact triggers (your Python functions, which
could later be Lua) inside a metrology tool. It does not have to live in a
monitoring tool; it only has to send alerts/states back to one.
Your daemon is basically a Graphite with rules: it generates perfdata back
into Graphite (like the Graphite aggregator?) and feeds alerts back through
a receiver. You could propose that they add such alert levels, or maybe do
it in collectd, which already has such capabilities.
Maybe you can switch from Graphite to Elasticsearch if you only look at
recent data and want scalability, but it's up to you :)
For the rule engine, you can look at Riemann. It's neither Python nor Lua,
but it's still cool :+1: and you get a ready-to-run event machine. If I'm
not wrong it is written in Clojure (and so are the rules). It will also save
you a lot of lines of code :godmode:
One good thing on the Shinken side could be a scheduler module that lets
check_commands grab data from the backend, so you won't have to manage the
reinsertion and the name mapping (which can always be hard in the
passive->active logic).
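As a loose sketch of that idea (the macro syntax, hook and backend client
below are all invented for illustration), such a module could resolve a
backend lookup inside a check_command before it runs:

```python
import re

class FakeBackend:
    """Stand-in for a real Graphite/Elasticsearch client."""
    def get_last_value(self, metric):
        return 42.0

def resolve_function_macros(command_line, backend):
    # Replace a hypothetical $BACKEND(metric)$ macro with the metric's
    # last value, keeping passive->active name mapping in one place.
    lookup = lambda m: str(backend.get_last_value(m.group(1)))
    return re.sub(r"\$BACKEND\(([^)]+)\)\$", lookup, command_line)

print(resolve_function_macros(
    "check_threshold --value $BACKEND(sw1.uplink.total_pct)$ --crit 90",
    FakeBackend()))
```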
Good luck with your project, and let us know how it evolves. I think we
would be pleased to add "function macros" linked from modules in the
scheduler so we can test it in Shinken, maybe in the 2.2 version if we get
time for it :+1: