Hi Marc,

The solution to your problem depends a bit on the alerting requirements at your site. For example, do you care if alerts are delayed by one or more polling cycles in SmokePing?

My first suggestion would be to define an alert something like this:

+someloss
type = loss
pattern = 0%, 0%, 0%, 0%, 0%, >0%, >0%, >0%
comment = Loss detected for last 3 polling cycles

This alert definition will trigger when you have 3 *consecutive* polling cycles with some packet loss. This is different from the alert you tried (>0%, *12*, >0%, *12*, >0%, *12*) because the "*12*" in your pattern acts as a wildcard: it will match ANYTHING. So your alert pattern basically says "If we've seen >0% three times in the last 39 poll cycles, trigger an alert." Based on the data samples you provided, I believe a consecutive model would suit your needs better.

If you need to get alerts sooner for actual problems, consider defining a second alert as well, something like:

+bigloss
type = loss
pattern = 0%, 0%, 0%, >20%
comment = We have sudden, severe packet loss

If you enable both alerts on your hosts, you will get alerts when you have persistent, low-to-moderate (1-20%) loss on the links, but you'll get an alert immediately when there are bigger problems (>20% loss).

I think these rules will probably serve you well as a baseline, but don't be afraid to experiment. I find it usually takes a couple of weeks of testing and tweaking to find an optimum set of alerts for any given network, simply because topologies and architectures differ.

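To make the difference concrete, here is a minimal Python sketch of how a purely consecutive pattern behaves against the samples in your mail. It is only an illustration, not SmokePing's actual matcher: the helper names parse_entry and tail_matches are made up, wildcards such as *12* are not handled, and a bare "0%" is assumed to mean "exactly 0%".

```python
import operator

# Comparison operators the sketch understands (an assumption, not SmokePing's full syntax).
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
       ">": operator.gt, "<": operator.lt}


def parse_entry(entry):
    """Turn '>0%' into (operator.gt, 0.0); a bare '0%' is treated as '==0%'."""
    text = entry.strip().rstrip("%")
    for symbol in (">=", "<=", "==", ">", "<"):  # two-character operators first
        if text.startswith(symbol):
            return OPS[symbol], float(text[len(symbol):])
    return operator.eq, float(text)


def tail_matches(history, pattern):
    """True if the most recent samples (newest last) satisfy every pattern entry."""
    entries = [parse_entry(e) for e in pattern.split(",")]
    if len(history) < len(entries):
        return False
    tail = history[-len(entries):]
    return all(op(value, limit) for (op, limit), value in zip(entries, tail))


SOMELOSS = "0%, 0%, 0%, 0%, 0%, >0%, >0%, >0%"
BIGLOSS = "0%, 0%, 0%, >20%"

# Loss values from the 00:35:23 sample in the quoted mail (21 zeros, then the tail).
sample_003523 = [0] * 21 + [10, 0, 0, 5, 0, 5]

print(tail_matches(sample_003523, SOMELOSS))  # scattered loss, not 3 in a row
print(tail_matches(sample_003523, BIGLOSS))   # latest value is 5%, below 20%
print(tail_matches([0, 0, 0, 0, 0, 5, 5, 10], SOMELOSS))  # 3 lossy cycles in a row
print(tail_matches([0, 0, 0, 30], BIGLOSS))   # sudden severe loss
```

With the scattered 5% and 10% values from the backup window, neither pattern matches, so nothing is raised and there is nothing to flap. Only three lossy cycles in a row, or a jump above 20% after a clean run, would match.
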
- Peter

On 21/07/2010 2:44 AM, Marc Haber wrote:
> Hi,
>
> when a network device is quite busy (for example, when backup of some
> servers connected to this device is going on), it's going to drop some
> packets, resulting in loss data like this:
>
> 00:35:23
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 0%, 10%, 0%, 0%, 5%, 0%, 5%
> 00:35:52
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 10%, 0%, 0%, 5%, 0%, 5%, 0%
> 00:48:53
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 5%
> 00:49:23
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 5%, 0%
> 00:49:53
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 5%, 0%, 10%
> 00:50:23
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 5%, 0%, 10%, 0%
> 00:53:54
> loss: 0%, 0%, 5%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%,
> 0%, 0%, 5%, 0%, 10%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%
> 00:54:24
> loss: 0%, 5%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%,
> 0%, 5%, 0%, 10%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%
>
> When one has the miniloss alert from the smokeping_config defined,
> this causes the alarm to get raised and cleared multiple times over
> this rather short period of time:
>
> 00:35:23
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 0%, 10%, 0%, 0%, 5%, 0%, 5%
> alarm raised
> 00:35:52
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 10%, 0%, 0%, 5%, 0%, 5%, 0%
> alarm cleared
> 00:48:53
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 5%
> alarm raised
> 00:49:23
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 5%, 0%
> alarm cleared
> 00:49:53
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 5%, 0%, 10%
> alarm raised
> 00:50:23
> loss: 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 0%, 0%,
> 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%, 0%, 5%, 0%, 10%, 0%
> alarm cleared
> 00:53:54
> loss: 0%, 0%, 5%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%,
> 0%, 0%, 5%, 0%, 10%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%
> alarm raised
> 00:54:24
> loss: 0%, 5%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%,
> 0%, 5%, 0%, 10%, 0%, 0%, 0%, 0%, 0%, 0%, 0%, 5%, 0%
> alarm cleared
>
> I am wondering whether it makes sense to clear the alarm just because
> there is a 0% in the last slot of the data being considered. This
> causes the alarm to flap in the case of occasional packet loss.
>
> I am thinking of either modifying the alarm so it only goes off for
> changes > 5%, like
>
> +miniloss
> type = loss
> # in percent
> pattern = >5%,*12*,>5%,*12*,>5%
> comment = detected loss 3 times over the last two hours
>
> or to have it stay raised even if the current loss is 0%, like
>
> +miniloss
> type = loss
> # in percent
> pattern = >0%,*12*,>0%,*12*,>0%,*12*
> comment = detected loss 3 times over the last two hours
>
> or
>
> +miniloss
> type = loss
> # in percent
> pattern = >0%,*12*,>0%,*12*,>0%,*12*,>=0%
> comment = detected loss 3 times over the last two hours
>
> I would like to ask the more experienced users how you would act in my
> position. Would you ditch the miniloss alert altogether, would you
> modify it, and if so, how?
>
> Greetings
> Marc

_______________________________________________
smokeping-users mailing list
[email protected]
https://lists.oetiker.ch/cgi-bin/listinfo/smokeping-users
