Hi,

I would like us to take a little break, like we did last year after the
0.2 version, to look at the project vision and, why not, change it.

**Beware** : once again, I wrote a lot, sorry :)

*What we did well last year*
Let's look at what we did this last year, since the 0.1 version. We focused
on putting the core in place, with huge distributed features. I think we
fulfilled our goal for this part. We got a full-scale architecture, and we
can manage all the classic network or organizational problems (DMZ, distant
LANs, or customers). And I'm quite proud we can say :
* distributed architecture : *Done*.

We added new modules (a lot of retention ones, livestatus, ndo, merlin (not
finished :) ), pnp, etc). So, Ninja put aside, we handle the main UIs of the
Nagios world well. It's still a work in progress, but it should not need a
lot of work, mainly bug fixes and small improvements. So we can say :
* export/presentation modules : *really good*.

One other thing we added is configuration enhancement and simplification
(service generators or easy dependency definitions, for example). It's cool
for people who write their conf with vi: they can write it in an efficient
way now.
* configuration enhancement and simplification : *Done*.

We also added new quality methods, especially the test-driven one, so we
are sure we only remove bugs and nearly never add new ones in previous
features. It's a very comfortable way of hacking code. Without it, we would
not have as many features as we got, and maybe no production installation at
all :)

One other thing I'm glad we added is a new way of looking at monitoring. I'm
talking about root problem/impacts + criticity. It's very easy to use,
because it needs just one parameter, the criticity, from 0 to 5, but the
implications are just great :
* notification filters that are far easier to configure (only prod, nothing
less)
* business rules that respect the root cause analysis feature and are easy
to set up.
* export of this information in LiveStatus (which became the default API) so
UIs can use it to show only business-impacting problems.
So we can say :
* in core "focus on business feature & correlation" : *Done*.
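
To make the criticity idea concrete, here is a minimal sketch of such a
notification filter. It is only an illustration and not Shinken's actual
code: the names `Element`, `should_notify` and `min_criticity` are
assumptions made up for this example, as are the sample elements.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A monitored host or service (hypothetical structure for this sketch)."""
    name: str
    criticity: int       # 0 (lowest) to 5 (highest)
    state: str = "OK"


def should_notify(element, min_criticity):
    """Notify only for non-OK elements at or above the criticity threshold."""
    return element.state != "OK" and element.criticity >= min_criticity


# Made-up sample data: only the production ERP is both failing and critical.
elements = [
    Element("erp-prod", criticity=5, state="CRITICAL"),
    Element("erp-qualif", criticity=2, state="CRITICAL"),
    Element("dev-box", criticity=0, state="WARNING"),
    Element("mail-prod", criticity=4, state="OK"),
]

# A contact who only wants "prod, nothing less" picks a high threshold.
to_notify = [e.name for e in elements if should_notify(e, min_criticity=3)]
print(to_notify)
```

With one integer per element, the "only prod" filter is a single comparison
instead of a list of host groups to maintain by hand.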

So we can say that we reached a very good product, far better than I first
thought a year and a half ago. *Big thanks and congrats everyone :)*


*Where we also failed*
But not everything went as well as these points :
* My English skills are still very low :)
* Our wiki is very sparse in tutorials. Yes, we got the "official doc" from
Nagios with the new features added, but it's a nightmare to read and to get
started with such documentation.
* The UIs did not follow us much. Yes, they fixed some bugs, but I think the
main addition to monitoring from Shinken is not its architecture, even if
it's a great one, but the root problem + criticity one, really. And this was
not used by UIs, Thruk aside with its Shinken-specific views.

I think these are our major problems right now for Shinken's domination of
the world... too much? OK, for a wide acceptance of Shinken by users, who
see it as a "new Nagios" rather than a very enhanced one that will help them
in their day-to-day job.

For my English skills, I started English 16 years ago, so I think fixing
them just won't be possible. I'll try to read the whole Harry Potter series
again and watch films in English, it can help :)

For the wiki, I think it's mainly my fault. It's very, very hard to
**start** a documentation, but far easier to enhance it. I didn't write in
it for some weeks, and fortunately some people reminded me that features are
useless without documentation. And I think there is more to it than that.
It's not documentation we need, but tutorials about each feature. That is
what I'm trying to create in our new wiki main page, with a lot of
tutorials. It's the same thing with our web site: it's now "easier" to see
what Shinken offers to solve users' problems.

I hope the wiki will no longer be a problem once the first 20 tutorials are
written and everyone helps to enhance them and write new ones.
I'll also open a forum, so users will have an easy way to ask for help, far
less frightening than posting on a "devel" list :p (I don't think a user
mailing list is useful, it serves the same purpose. We can start with a
forum and wait some time to look at the result).

For the third point, it's far more problematic. Today's admins are not the
same as 10 years ago. Nowadays, we can talk about "speed admins", because
they no longer have the time to be expert in one thing, but must be average
at a lot of things (I'm personally a
linux/windows/SAN/vmware/network/monitoring admin, and that's quite a short
list). It will be even harder in the future, with the arrival of "devops".

Nearly everyone on this mailing list knows the difference between a core
and a UI. But a LOT of admins don't. It's not that they are dumb, it's just
that they do not have the time to look at such a "detail".

And it is a major problem for our (lovely) project. We got no visibility.
Of course our web site is cool :) but the main page that people look at is
... screenshots!

So we face a double problem :
* we lack visibility for a lot of users, because we do not have a UI.
Simple problem, but terrible impacts for us.
* the other UIs do not really follow us. We use a standard API and add new
features that are easy to access through it (especially LiveStatus), but it
was not a success. Thruk was the most "following" UI, and I would like to
thank Sven for his support, really (especially because my perl code was a
nightmare, and he was kind enough to correct it). But even with this
inclusion, it's still very hard to tell a Thruk with a Nagios/Icinga
backend from one with a Shinken backend. Yes, we got two new views, but it's
not enough to help the user focus on what we think is important for today's
and especially tomorrow's monitoring : focus on business first.

*So? What do we do?*

The documentation and user help problem will get a solution very soon, but
we must look at the UI one. We said last year in our project vision that we
are not here to make a UI, and that if we can "enhance/influence" current
ones, it will be good enough.

I think we (mainly I) were 50% wrong. **Not** making a UI allowed us to
focus on core enhancement, stabilization and a production-ready product. And
now that we got this, it's time to look at how we can help users get the
most power from the Shinken core in the most efficient way. I think adding
plugins to current UIs is not enough. We can't make users focus on business
first if we got the same view as Nagios 10 years ago. It's just not
possible. We can't afford to have hosts and services managed in different
ways anymore: both are "end user resources" after all, nothing more.

That's why I say that the root problem/criticity work was so important last
year: it will give admins a new way of "working" day to day. It should be
simple to show links between elements, it should be immediate to look at
business impacts, and it should be immediate to look at the root problems of
these impacts. We do not need to see IT elements everywhere if they are not
"important" (business-supporting IT); it's quite enough to look at them once
or twice a day by default in such a UI.

I think it's just not possible to get such a new way with current UIs,
because it would need Shinken hooks everywhere, and no one will want this,
especially because some old-school users won't want this change: they have
their habits and long hair and will never use such a monitoring UI. And
that's fine, they already got such a UI. They even got plenty of them:
nearly all UIs (nagvis and business process put aside) propose the same way
of thinking.

I think now, with a stable core (the main needs are some retention
parameters and an enhanced merlin module, not something that will give us
work for one more year, more like one week :) ), it's time for us to think
about such a UI.

I won't hide it, it's important to get "our own" to promote the project of
course, but I'm --> **strongly** <-- against doing a UI like all the others,
putting our logo on it and saying "cool, we got our own ui, great isn't it?".
No. Doing so is not great. If we do one, it should add a new dimension, a
new way of seeing users' problems, like we did in the core for distributed.
The main idea was not to ask "how can we make the current things scale in a
good way", but "how should it be done in a perfect world? Ok. Now, is it
possible to do it with current code? Ok, let's do a new one -> Shinken".


*A UI? For whom? Which UI?*
I think the main thing to ask is whether current admins and tomorrow's ones
have their "perfect" UI. There are strong differences between monitoring
users. We can split them into 3 main groups, I think :
* operators : they are dedicated to monitoring, they should look at ALL
errors and solve them. Simple. Current UIs are good for them (maybe a
criticity sorting can help them, but plugins and patches are good for them)
* admins : they are more and more asked to focus on business, because they
have less and less time to give to their monitoring solution. They should
continuously watch the IT elements that impact production; qualif and dev
ones should be looked at once or twice a day, not more.
* admins' bosses (N+1 for example) : they want to look at business impacts,
and see "easily" what is impacting them (so they can rush to the right admin
and "help" him to solve it :) ).

So in all cases, *the root problem/impacts + criticity is very, very
important*. It's even the difference between looking at a console full of
red elements (like 500+) and a UI that shows that we lost the distant ERP
and, one click later, that it's due to the distant firewall that cannot
write logs because its hard drive is full. 10 minutes in the first case to
find "what to solve", 30 sec in the second.
As such a console user, I began to look at how to get more productive with
my monitoring console. And for now it's not possible, I just lose a lot of
time during large impacts.
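
The ERP example can be sketched in a few lines. This is only an
illustration under stated assumptions, not how the Shinken core does it:
the function name `find_root_problems`, the `depends_on` mapping and the
element names and states are all made up for this example.

```python
def find_root_problems(name, depends_on, states):
    """Return the failing elements that explain why `name` is impacted.

    Walks the dependency graph from the impacted element and keeps only
    failing elements that have no failing dependency of their own.
    """
    roots = set()
    seen = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        failing_parents = [
            parent for parent in depends_on.get(current, [])
            if states.get(parent) != "OK"
        ]
        if failing_parents:
            # Something below is broken too: keep digging.
            stack.extend(failing_parents)
        elif states.get(current) != "OK":
            # Failing with no failing dependency: this is a root problem.
            roots.add(current)
    return sorted(roots)


# Hypothetical dependency graph for the distant ERP example.
depends_on = {
    "distant-erp": ["distant-firewall", "erp-db"],
    "distant-firewall": ["firewall-disk"],
    "erp-db": [],
    "firewall-disk": [],
}
states = {
    "distant-erp": "CRITICAL",
    "distant-firewall": "CRITICAL",
    "erp-db": "OK",
    "firewall-disk": "CRITICAL",
}

print(find_root_problems("distant-erp", depends_on, states))
```

The "one click" is then just showing this short list next to the impacted
business app, instead of every red element on the console.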


I think we should first focus on what we add for pure monitoring that helps
admins right away. I vote for a UI that is :
* very simple : who cares about having 20 different views? I think a very
small set of very useful and well-thought-out ones is far better than plenty
of average ones.
* strongly focused on business : it should be clear that IT is just here to
support end user apps. If the admin wants a classic UI, he can take one of
the others, they will always be available. So the main view should be the
critical (as in criticity, not the service status) user apps that are
impacted. Then it should be very easy to show the root problems of these
impacts. This view will be useful for our two user populations: admins with
a LOT of elements, who should focus on business apps first (I think in the
future most admins will be in this case), and admins' bosses, who focus on
prod business only and don't care about other "environments".

We can add another "classic" view that shows hosts/services in problem for
pure IT elements. And only ONE view for these 2 kinds of elements. It's
another important thing : hosts and services are here for end user apps.
They are resources only, there is no need to separate them.

It should be easy to "tag" end user apps (that is, to set their criticity).

It should also be easy to "select" realms. So if a guy has access to some
realms, it should be easy for him to select them (enable/disable).

It should be easy to see the status of realms, and in fact of daemons.

Of course, there will be questions about the configuration part. We can keep
this for a V2, after we have solved all of these points. A lot of huge IT
shops use their own configuration tools (fed from a CMDB, etc), so such a
tool won't help them. So the "efficient visualization" (focus on critical
root problems) should be added first.

The main spirit should be "small is beautiful". There are other UIs with a
lot of features, users can still use them if they want :)

I think for operators who must solve everything, the classic view is
enough. Old-school admins will use it too, and new hype admins will use the
efficient one, like their bosses.

We should focus on what Shinken adds for monitoring, and I think the
distributed and root problem/criticity features are the key points. There
are also business rules that can be quite easily added (but not in a
specific view, more like a hover layer that shows the tree if the user wants
it, no more :) ).

With this, we avoid the dangerous risk of "shinken UI does all you want".
No. For now it helps you to focus on business, nothing more. Then we can
look at user reactions, and gather a lot of development power before going
too far (we should NOT forget we got a core to maintain and develop! :) ).


*So? Are you ok?*
So? Is such a UI ok for you? Is this new project vision good? If it's ok,
we will see how we can go about designing this UI (I've got some mockups
waiting to be shown, and they really are different from current (monitoring)
UIs :) ) and start this new adventure :D



Jean
_______________________________________________
Shinken-devel mailing list
Shinken-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/shinken-devel