Putting Ruby on all the systems is not a good approach for this project.
Ruby is far too heavyweight to run continually, and it's easy to
break: if you have other Ruby projects installed on a machine, they
can easily break other Ruby code like Puppet or facter (sigh). Yes,
this really happens. It's not common, but when it happens, it's a nightmare.
Although I don't mind working with Puppet at all, we certainly can't be
dependent on it. Otherwise we'd cut our potential customer base by
about 95%. Most machines don't even have Ruby installed on them
unless they're running Puppet or Chef or Rails - probably no more than
10% of all computers do.
Our agents are active all the time -- and have to be for monitoring.
So we have a very different approach from Puppet's: we perform
continuous discovery, and we discover things they have no idea of
(network topology, listening ports, client ports, etc.). Because we
have a no-news-is-good-news philosophy, we can also perform this work
with less overhead than trying to get continuous discovery out of facter.
We get everything back once; after that we get only the
differences, as soon as they happen, because we have continually
running active agents. To get the equivalent information, you'd have to
run facter every 5 minutes or so, and then diff the data against the
last data you'd gotten.
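The polling alternative described above can be sketched like this - a minimal simulation, assuming two facter-style snapshots represented as Python dicts (the fact names and values are purely illustrative, not real facter output):

```python
# Hedged sketch: approximating continuous discovery by polling and
# diffing full snapshots, as the paragraph above describes.

def diff_facts(old, new):
    """Return only the differences between two discovery snapshots."""
    changed = {k: (old.get(k), v) for k, v in new.items() if old.get(k) != v}
    removed = {k: old[k] for k in old if k not in new}
    return changed, removed

# Two hypothetical snapshots, 5 minutes apart:
snap1 = {"os": "Linux", "ipaddress": "10.0.0.5", "uptime_days": "3"}
snap2 = {"os": "Linux", "ipaddress": "10.0.0.9", "uptime_days": "4"}

changed, removed = diff_facts(snap1, snap2)
# A poller must recompute the full snapshot every interval just to find
# these few changes; a resident agent sends only them, as they happen.
```

The point of the sketch is the overhead: each polling cycle regenerates everything, while an always-on agent only ever transmits the deltas.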
Exporting our data to their infrastructure is probably of value. We
could also export it to numerous other tools - Chef, for example,
among many others.
Discovery is a sideline to them; for us it's at least half of our mission.
On 08/26/2012 02:50 PM, aaron prayther wrote:
i wonder if you could leverage anything from mcollective and facter.
On Sat, Aug 25, 2012 at 11:37 AM, Alan Robertson <al...@unix.sh> wrote:
Hi Aaron,
This seems like a good discussion to have on the mailing list - so
I've moved it there...
On 8/25/2012 8:20 AM, aaron prayther wrote:
still being very ignorant of what assimilation is...
In lots of ways, we all are ;-) <waves hands/>
what i would like to see is something that makes it relatively
easy to monitor all the complexity of a cloud infrastructure as
well as all the instances and the services (oftentimes custom
developed) running on them.
then in terms of nagios, we would need to be able to set up
checks to verify not only that the service is running but that it
is responding with the correct output (example: web server is not
only running but responds with the correct info).
How complicated do you make these checks? With the current OCF
(Open Cluster Framework) resource agents we have for web servers,
we do an HTTP GET, check the return code, and then compare the
output against a regex. The default regex is something like
</html>. For databases we do a simple query. The default one
does something simple like counting the number of authenticated
users in the database and making sure that it is at least one. All
these checks have timeouts. Each OCF resource agent gets to tell you a
default timeout for its operations.
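As a rough illustration of that style of check - in Python rather than a real OCF shell agent, so the function names, URL handling, and defaults here are assumptions, not the actual agent's code - the monitor operation boils down to:

```python
# Hedged sketch of the web-server check described above: an HTTP GET,
# a return-code check, and a regex compare against the output, all
# bounded by a timeout. The default pattern mirrors the "</html>"
# example given above.
import re
import urllib.request

def page_ok(status, body, pattern=r"</html>"):
    """Check the return code, then compare the body against a regex."""
    return status == 200 and re.search(pattern, body) is not None

def http_monitor(url, pattern=r"</html>", timeout=10):
    """One monitor operation: fetch the page within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return page_ok(resp.status, body, pattern)
    except OSError:
        return False  # connection refused, timed out, etc.
```

A real OCF agent does this from a shell script and reports the result via its exit code, but the logic is the same three steps.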
The OCF resource agent API is quite good. There are probably
something like 80 or so resource agents written for this API. We
could support other APIs too (and we will - init scripts and so
on), but I don't know of anything that comes close in terms of
capabilities, configurability, and the ease with which the agent
tells you what it does and how to configure it. In addition, there
is an active development community around them - used and
contributed to by Red Hat, SuSE, and lots of sysadmins.
be able to take actions on thresholds: run a script, do
something, like notification.
Notification is definitely in the plan. We may piggyback on some of
Nagios' notification mechanisms - and others. A good reason for
that is that during a transition you don't want to administer two
notification mechanisms. We haven't decided yet. I know about the
need for thresholds; we could let you set up some default ones and
then customize them for each system. Plans are vague...
I know about the need for scripting, but don't have specific
plans (even more vague). If we're monitoring a service with an
OCF agent or an init script, we will also eventually have the
capability to start and stop the resource in question.
Start-order dependencies may be an issue. We'll blow up that
bridge when we come to it...
and finally to make it "attractive" to management types, you'll
need some sort of front end / reporting system.
Yes. Our main goal is to make you and your management into heroes
in your organization. This requires some good reporting as well
as great admin capabilities.
and finally, the assumption i'm making about the whole thing is:
as we spin up instances, we tell assimilation something about
the activities of that instance and it automatically starts
monitoring. when the instance goes away, it would be nice if it
could know whether the instance was meant to go away or not and
report accordingly.
There is plenty enough up in the air at this point that hearing
what you're interested in helps. With our stealth discovery, we
could easily recognize what you've planted onto the image, and
then it should be possible to automatically start the default
monitoring actions for those images. One of the strengths we
/should/ have is the ability to respond dynamically to changes,
the way you need to in a cloud.
Thanks for asking!
-- Alan Robertson
al...@unix.sh
--
-prayther
_______________________________________________
Assimilation mailing list - Discovery-Driven Monitoring
Assimilation@lists.community.tummy.com
http://lists.community.tummy.com/cgi-bin/mailman/listinfo/assimilation
http://assimmon.org/
--
Alan Robertson <al...@unix.sh> - @OSSAlanR
"Openness is the foundation and preservative of friendship... Let me claim from you
at all times your undisguised opinions." - William Wilberforce