Re: mira sfer overflow panic (was: Re: 11n support for athn(4))

2017-02-11 Thread Theo Buehler
On Sat, Feb 11, 2017 at 10:31:39AM +0000, Peter Kay wrote:
> 
> 
> On Thu, Jan 26, 2017 at 10:38:44AM +0100, Stefan Sperling wrote:
> > On Thu, Jan 26, 2017 at 06:36:06AM +0000, Peter Kay wrote:
> > > sfer overflow
> > 
> > Interesting. This is the first time I've ever seen this panic trigger.
> > 
> > Can you apply this patch and try to trigger it again?
> I've been running with MIRA_DEBUG since Feb 1st, and just now had a
> different panic:
> 
> Bogus long slot station count 0
> 
> ieee80211_node_leave_11g+0xd0
> 
> Will post details later when I've had a chance to OCR the screencaps.

Thanks, I think you can spare yourself the trouble of OCR'ing the
screencaps. On February 2nd stsp committed a fix for the refcounting
bugs that led to this panic:
http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net80211/ieee80211_node.c?rev=1.113&content-type=text/x-cvsweb-markup



Re: mira sfer overflow panic (was: Re: 11n support for athn(4))

2017-02-11 Thread Peter Kay


On Thu, Jan 26, 2017 at 10:38:44AM +0100, Stefan Sperling wrote:
> On Thu, Jan 26, 2017 at 06:36:06AM +0000, Peter Kay wrote:
> > sfer overflow
> 
> Interesting. This is the first time I've ever seen this panic trigger.
> 
> Can you apply this patch and try to trigger it again?
I've been running with MIRA_DEBUG since Feb 1st, and just now had a
different panic:

Bogus long slot station count 0

ieee80211_node_leave_11g+0xd0

Will post details later when I've had a chance to OCR the screencaps.



Re: mira sfer overflow panic (was: Re: 11n support for athn(4))

2017-01-28 Thread Stefan Sperling
On Thu, Jan 26, 2017 at 10:38:44AM +0100, Stefan Sperling wrote:
> On Thu, Jan 26, 2017 at 06:36:06AM +0000, Peter Kay wrote:
> > sfer overflow
> 
> Interesting. This is the first time I've ever seen this panic trigger.
> 
> Can you apply this patch and try to trigger it again?

Peter, you can throw the diff below away and update your src tree.

I have just committed a better diff. These errors are recoverable so
there is no real reason to panic when they occur. To see the problem
with this new code, you'll need to run 'ifconfig athn0 debug'. This
will log what used to be the panic message to dmesg, including the
MAC address of the client which triggered it.

To also get the driver stats we need to debug the problem, please compile
a kernel with 'option MIRA_DEBUG' (or add a line '#define MIRA_DEBUG'
at the top of ieee80211_mira.c before compiling the kernel).
This will print the stats as well so we can see why sfer is overflowing.
MIRA_DEBUG will also cause some other noise (e.g. when mira selects a
new rate). You can ignore that.

> Index: ieee80211_mira.c
> ===================================================================
> RCS file: /cvs/src/sys/net80211/ieee80211_mira.c,v
> retrieving revision 1.8
> diff -u -p -r1.8 ieee80211_mira.c
> --- ieee80211_mira.c	12 Jan 2017 18:06:57 -0000	1.8
> +++ ieee80211_mira.c	26 Jan 2017 09:37:27 -0000
> @@ -427,8 +427,15 @@ ieee80211_mira_update_stats(struct ieee8
>  
>  	/* Compute Sub-Frame Error Rate (see section 2.2 in MiRA paper). */
>  	sfer = (mn->frames * mn->retries + mn->txfail);
> -	if ((sfer >> MIRA_FP_SHIFT) != 0)
> +	if ((sfer >> MIRA_FP_SHIFT) != 0) {
> +		printf("%s: driver stats:\n", __func__);
> +		printf("mn->frames = %u\n", mn->frames);
> +		printf("mn->retries = %u\n", mn->retries);
> +		printf("mn->txfail = %u\n", mn->txfail);
> +		printf("mn->ampdu_size = %u\n", mn->ampdu_size);
> +		printf("mn->agglen = %u\n", mn->agglen);
>  		panic("sfer overflow"); /* bug in wifi driver */
> +	}
>  	sfer <<= MIRA_FP_SHIFT; /* convert to fixed-point */
>  	sfer /= ((mn->retries + 1) * mn->frames);
>  	if (sfer > MIRA_FP_1)
>
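
For anyone following along, here is a rough standalone sketch of the
check in the quoted diff, written so that the overflow is reported and
the sample is skipped instead of panicking. It is not the committed
code. The sketch_* names, the shift value of 21, the struct layout, the
division-by-zero guard and the final clamp are all assumptions made for
illustration only.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed values for illustration; see ieee80211_mira.c for the real ones. */
#define SKETCH_FP_SHIFT	21
#define SKETCH_FP_1	((uint64_t)1 << SKETCH_FP_SHIFT)

/* Hypothetical stand-in for the per-node stats used in the diff. */
struct sketch_stats {
	uint32_t frames;
	uint32_t retries;
	uint32_t txfail;
};

/*
 * Compute the sub-frame error rate as a fixed-point value.
 * Returns 0 and stores the result in *out on success, or -1 if the
 * stats cannot be used (no frames, or the pre-shift value is too big).
 */
static int
sketch_sfer(const struct sketch_stats *st, uint64_t *out)
{
	uint64_t sfer;

	if (st->frames == 0)
		return -1;	/* nothing to divide by */

	sfer = (uint64_t)st->frames * st->retries + st->txfail;
	if ((sfer >> SKETCH_FP_SHIFT) != 0) {
		/* Same condition as the diff, but log instead of panic. */
		printf("%s: sfer overflow: frames=%u retries=%u txfail=%u\n",
		    __func__, (unsigned)st->frames, (unsigned)st->retries,
		    (unsigned)st->txfail);
		return -1;
	}
	sfer <<= SKETCH_FP_SHIFT;	/* convert to fixed-point */
	sfer /= ((uint64_t)st->retries + 1) * st->frames;
	if (sfer > SKETCH_FP_1)
		sfer = SKETCH_FP_1;	/* clamp; a choice made for this sketch */
	*out = sfer;
	return 0;
}
```

With frames=10, retries=1 and txfail=5 the raw value is 15, which
becomes 15 * 2^21 / 20, i.e. 0.75 in fixed-point. With frames=4000000
and retries=1 the raw value no longer fits below 2^21, so the function
logs the stats and returns -1.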