Hello all,

Here's my experience with monitoring XWikis.
With i2geo.net and with my private XWiki, I use a zabbix server. This PHP-based monitoring tool is quite easy to configure for HTTP monitoring, and with a few more steps you get a mail notification when, e.g., a connection times out. I had been using HypericHQ for a while, a Java-based monitoring tool, which was rather nice to work with, but a machine-name change broke everything, so I looked for something a tick more modern.

At curriki.org, a site with lots of visitors, quite a few tools are used for monitoring.

- First, for the safety and honesty of an outside system, alertsite.com is used. It is very effective at detecting breakages, including possible breakages of Internet backbones. We use monitoring from three locations.
- Second, because the XWiki servers do sometimes need a push, there used to be a regular script that checked a basic page and, if the check failed, auto-restarted the app-server. For us, this is a bit unsafe because we like to check things after a restart.
- Third, for a while, we ran a "combined monitoring" which let us combine a small graphical view synced with the logs of Apache, the app-server, thread-dumps, and MySQL. This let us catch "bad actions", which sometimes happen when power users trigger queries that are too big and lock out others (group deletions were one such action).
- Finally, we also added a zabbix which collects HTTP monitoring as well as other "classical" values (disks, memory, apache-stats, …).

The rhythm at curriki is about a week: after a week, one of the two cluster nodes (there are two currently) needs a restart because some memory gets exhausted and the GC starts to fail. We generally get alertsite errors then.

The benefit of running a monitoring infrastructure such as zabbix is that you can analyze the behavior of multiple variables and see if there is a way to predict that things are going wrong. It remains a gut-feeling story, but it still gives you quite some confidence.
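To illustrate the second point, here is a minimal Python sketch of a page check that only reports repeated failures and leaves the restart decision to a human. It is not the script used at curriki.org; the function names, the ten-second timeout and the threshold of three consecutive failures are all assumptions made for the example.

```python
import time
import urllib.error
import urllib.request


def check_page(url, timeout=10.0, fetch=urllib.request.urlopen):
    """Fetch one page and return (ok, seconds, detail) without raising.

    `fetch` is injectable so the check can be exercised without a network;
    by default it is urllib.request.urlopen.
    """
    start = time.monotonic()
    try:
        with fetch(url, timeout=timeout) as response:
            status = response.status
        ok = 200 <= status < 400
        detail = "HTTP %d" % status
    except (urllib.error.URLError, OSError) as exc:
        # Covers HTTP error codes, refused connections and socket timeouts.
        ok, detail = False, "error: %s" % exc
    return ok, time.monotonic() - start, detail


def needs_attention(history, max_failures=3):
    """Return True when the last `max_failures` checks all failed.

    Requiring several failures in a row (an assumed policy) avoids
    alerting on a single slow response.
    """
    recent = history[-max_failures:]
    return len(recent) == max_failures and not any(recent)
```

A scheduler such as cron could call check_page on a basic wiki page every minute, append the first returned value to a list, and send a mail when needs_attention returns True. The response time returned as the second value can also be logged, since a slow but successful page is often the first visible symptom.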
It would be really nice if we could converge on a set of JMX analysis "items" for zabbix, so that we could analyze the XWiki-relevant information more concretely (in particular the cache behavior) and start tuning so that we run out of memory less often.

paul

On 31 oct. 2014, at 22:29, Jason Clemons <[email protected]> wrote:

> I'd also find any suggestions very helpful. I've had that happen a few times,
> and outside of monitoring CPU and RAM, I've found logging to be difficult to
> use and configure, and even when I get it configured it's not very helpful.
>
>> On Oct 31, 2014, at 1:57 PM, Bryn Jeffries <[email protected]>
>> wrote:
>>
>> Having made my XWiki site available to other users, I was concerned to find
>> that the site became unusable at one point with client connections
>> eventually timing out. I had no way to diagnose the problem, but eventually
>> I managed to make a (slow) SSH connection to the server and restarted
>> Tomcat, and things seemed to settle back to normal.
>>
>> The problem is I have no real sense of what happened and how to prevent it
>> happening again. To that end, I'd appreciate any suggestions for monitoring
>> the server and diagnosing poor performance. What do others typically use? I
>> have an Apache2 server passing wiki page requests to Tomcat7 via an ajp
>> connector, and a PostgreSQL database. My guess is that Tomcat is doing most
>> of the work here so that's probably what I need to monitor the most.
