We're currently running on approximately 125 nodes, grown from 50 over the past six months to meet demand. We have to expect that to continue, so we're anticipating 200 in the next six months. On top of that we have a staging environment of about 20 nodes. We do a *lot* of complex processing on them due to the nature of our business. In addition, our session counts fluctuate massively with the time of day, e.g. we take a big spike when office workers get into work in the morning. We also run in about 35 different countries.
I have a project in the pipeline to start rebuilding some of the code base, as it is a sprawling mess which probably doesn't help our scalability, but right now we have issues with logging which I need to solve as a first priority. Currently all we have 'in the mix' is log4net doing exception handling and logging to the filesystem, plus website monitoring to tell us our uptime and some errors. This is obviously not good enough by any stretch of the imagination.

The MSMQ idea is not a bad one, but I wondered if it was overkill? We have enough server power in house not to need EC2 or anything. However, the problem still remains: what are we going to use for the aggregation/reporting? Ideally we want to know when we are experiencing problems, and their level of severity, without being peppered with feedback from 100,000 users all at once.

On 6 April 2012 23:35, Joseph Cooney <[email protected]> wrote:
> I've seen people use msmq to write a log entry locally and have it read
> from the local machine into a centralized location, but that was on a
> system with only about 20 web nodes. I've also seen ppl write to the
> windows event log, and use monitoring tools like SCOM to aggregate (also on
> about 20 nodes).
>
> I'm curious to know what you're doing that requires that many nodes. I
> know a lot of household name web sites that run on a 10th or 20th of that.
>
> Sent from my iPhone
>
> On 06/04/2012, at 11:20 PM, Dave Walker <[email protected]> wrote:
>
> > Hey guys,
> >
> > looking for ideas or proven experiences involving logging from larger
> > applications e.g. 200+ web nodes. Our current system of log4net into files
> > on each server is quickly proving to be a nightmare - We are struggling to
> > find out when, where, how and why things are breaking because of the rapid
> > growth we have experienced.
> >
> > E.g. things we have thought about include log4net into a DB that we can
> > pull into a centralised location, MSMQ, file watchers etc.
> >
> > What have you guys done in the past?
> >
> > Thanks,
> > Dave Walker
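
To make the aggregation/reporting requirement a bit more concrete (know about a problem and its severity without one alert per affected user), here is a rough sketch of the grouping and throttling idea. It is written in Python purely for brevity, not in our actual .NET stack, and every name in it (`ErrorAggregator`, `record`, `alert_interval`) is hypothetical. It assumes each node ships log events to one central place, and that severity plus message text is a good enough key for "the same problem".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Bucket:
    """Running totals for one distinct problem (hypothetical structure)."""
    count: int = 0
    nodes: Set[str] = field(default_factory=set)
    last_alert: Optional[float] = None


class ErrorAggregator:
    """Groups identical events and emits at most one alert per interval."""

    def __init__(self, alert_interval: float = 300.0) -> None:
        # Minimum number of seconds between two alerts for the same problem.
        self.alert_interval = alert_interval
        self.buckets: Dict[Tuple[str, str], Bucket] = {}

    def record(self, node: str, severity: str, message: str,
               now: float) -> Optional[str]:
        """Record one event; return an alert string, or None if throttled."""
        # Assumption: (severity, message) identifies "the same problem".
        bucket = self.buckets.setdefault((severity, message), Bucket())
        bucket.count += 1
        bucket.nodes.add(node)

        due = (bucket.last_alert is None
               or now - bucket.last_alert >= self.alert_interval)
        if not due:
            return None

        bucket.last_alert = now
        # Counts are cumulative since the aggregator started.
        return (f"[{severity}] {message} "
                f"(seen {bucket.count}x on {len(bucket.nodes)} node(s))")


if __name__ == "__main__":
    agg = ErrorAggregator(alert_interval=60)
    print(agg.record("web01", "ERROR", "DB timeout", now=0))
    print(agg.record("web02", "ERROR", "DB timeout", now=10))  # throttled
    print(agg.record("web03", "ERROR", "DB timeout", now=70))
```

The transport (MSMQ, a DB appender, file watchers) is deliberately left out of this; whichever one carries the events, something like the above would sit at the receiving end.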
