Re: Panic in ieee80211 tx mgmt timeout

2011-06-29 Thread Stefan Esser
On 29.06.2011 12:41, Bernhard Schmidt wrote:
> On Wednesday, June 29, 2011 10:53:41 Stefan Esser wrote:
>> I recreated the panic, this time with kernel dumps correctly configured
>> (thanks for the hint, Scott). The panic message is:
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; apic id = 00
>> fault virtual address   = 0xff809c7a1000
>> fault code              = supervisor read data, page not present
>> instruction pointer     = 0x20:0x805e1851
>> stack pointer           = 0x28:0xff8000288ab0
>> frame pointer           = 0x28:0xff8000288b60
>> code segment            = base 0x0, limit 0xf, type 0x1b
>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 11 (swi4: clock)
>>
>> Traceback:
>>
>> #10 0x805e1851 in ieee80211_tx_mgt_timeout (arg=0xff809c7a1000)
>> at ../../../net80211/ieee80211_output.c:2487
>>
>> This indicates that an invalid argument is passed and assigned to
>> "*ni", which causes the page fault when dereferencing "ni" to obtain "*vap".
> 
> The problem here seems to be wpa_supplicant. It can try to associate
> at any given point in time, which results in the BSS ni being destroyed
> even though it might still be referenced somewhere (in this case by the
> timeout code, or rather by ath's TX queue). Not clearing the reference
> (or not stopping whatever is using it) is the fault here. Now, how do we
> figure out who the caller is? Have you got the complete backtrace?

Not sure that I understand your question correctly ...

#10 0x805e1851 in ieee80211_tx_mgt_timeout (arg=0xff809c7a1000)
    at ../../../net80211/ieee80211_output.c:2487
#11 0x8050f45c in softclock (arg=Variable "arg" is not available.)
    at ../../../kern/kern_timeout.c:564
#12 0x804d9876 in intr_event_execute_handlers (p=Variable "p" is not available.)
    at ../../../kern/kern_intr.c:1257
#13 0x804da4d6 in ithread_loop (arg=0xfe00032dcc60)
    at ../../../kern/kern_intr.c:1270
#14 0x804d718d in fork_exit (callout=0x804da440, arg=0xfe00032dcc60,
    frame=0xff8000288c50) at ../../../kern/kern_fork.c:920
#15 0x807258ce in fork_trampoline ()
    at ../../../amd64/amd64/exception.S:603

Bernhard, I'm sending you the compressed "core.txt" in private mail.

Regards, STefan
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: Panic in ieee80211 tx mgmt timeout

2011-06-29 Thread Bernhard Schmidt
On Wednesday, June 29, 2011 10:53:41 Stefan Esser wrote:
> On 29.06.2011 10:03, Adrian Chadd wrote:
> > On 29 June 2011 14:03, Bernhard Schmidt  wrote:
> >> Its name is ieee80211_tx_mgt_timeout; it is used to track AUTH/ASSOC
> >> requests. AFAIK there is even a similar PR about that.
> 
> Sorry, I manually entered the panic message, since dumps were not
> working on my system at the time of that panic.
> 
> >> Adrian, have you got an AP set up to drop either an AUTH or ASSOC
> >> response frame?
> 
> I've got a number of AUTH -> SCAN transition lost messages for wlan0,
> seconds to minutes apart:
> 
> Jun 28 21:16:17 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost
> Jun 28 21:34:46 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost
> Jun 28 21:36:33 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost
> Jun 28 21:45:14 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost
> Jun 28 21:45:44 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost
> 
> The setup is easy to reproduce; my rc.conf contained:
> 
> wlans_ath0="wlan0"
> ifconfig_ath0="down"
> ifconfig_wlan0="down"
> wpa_supplicant_enable="YES"

Strip the last 3 lines; don't ever fiddle around with ath0 directly.
This configuration always starts wpa_supplicant.

> This system used to be connected via ath0, but was recently moved to a
> place where Ethernet is available. The panics started only after WLAN
> was no longer used. I might disable wpa_supplicant, since it is not
> required in the current situation, but I have not tested whether that
> prevents the panic.
> 
> > Tell me how and I'll set it up.
> > 
> > A panic at that point in the function indicates maybe ni is NULL?
> > or ni->vap is now NULL, maybe?
> 
> I recreated the panic, this time with kernel dumps correctly configured
> (thanks for the hint, Scott). The panic message is:
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0xff809c7a1000
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0x805e1851
> stack pointer           = 0x28:0xff8000288ab0
> frame pointer           = 0x28:0xff8000288b60
> code segment            = base 0x0, limit 0xf, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 11 (swi4: clock)
> 
> Traceback:
> 
> #10 0x805e1851 in ieee80211_tx_mgt_timeout (arg=0xff809c7a1000)
> at ../../../net80211/ieee80211_output.c:2487
> 
> This indicates that an invalid argument is passed and assigned to
> "*ni", which causes the page fault when dereferencing "ni" to obtain "*vap".

The problem here seems to be wpa_supplicant. It can try to associate
at any given point in time, which results in the BSS ni being destroyed
even though it might still be referenced somewhere (in this case by the
timeout code, or rather by ath's TX queue). Not clearing the reference
(or not stopping whatever is using it) is the fault here. Now, how do we
figure out who the caller is? Have you got the complete backtrace?
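Bernhard's point, that whatever still references the node must be stopped (or the reference cleared) before the node is destroyed, can be illustrated with a small user-space sketch. Everything in it is hypothetical: `fake_callout` and its helpers only mimic the general shape of a kernel timer, and `fake_node` stands in for `struct ieee80211_node`. It is not the net80211 code path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical one-shot timer standing in for a kernel callout. */
struct fake_callout {
	void (*func)(void *);
	void *arg;
	bool pending;
};

static void
fake_callout_reset(struct fake_callout *c, void (*func)(void *), void *arg)
{
	c->func = func;
	c->arg = arg;
	c->pending = true;
}

/* Cancels a pending timer; returns true if one was pending. */
static bool
fake_callout_stop(struct fake_callout *c)
{
	bool was_pending = c->pending;

	c->pending = false;
	return was_pending;
}

/* Stand-in for softclock(): runs the handler only if still pending. */
static void
fake_softclock(struct fake_callout *c)
{
	if (c->pending) {
		c->pending = false;
		c->func(c->arg);
	}
}

/* Hypothetical node that a management-frame timeout refers to. */
struct fake_node {
	int timeouts_seen;
};

static void
fake_mgt_timeout(void *arg)
{
	struct fake_node *ni = arg;

	ni->timeouts_seen++;
}

/*
 * The ordering that matters: stop the timer that still references the
 * node before freeing the node, so the handler never sees a dangling
 * pointer.
 */
static void
fake_node_destroy(struct fake_callout *c, struct fake_node *ni)
{
	assert(ni != NULL);
	if (c->pending && c->arg == ni)
		(void)fake_callout_stop(c);
	free(ni);
}
```

In a real kernel the handler may already be running on another CPU when teardown starts, so cancelling alone is not enough; FreeBSD's callout_drain(9) exists to wait for a running handler to finish. A single-threaded sketch like this one cannot show that part.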

-- 
Bernhard


Re: Panic in ieee80211 tx mgmt timeout

2011-06-29 Thread Adrian Chadd
The question here is: in what context is the callback being called?

The lack of net80211 locking has me confused and sad. :/


Adrian

On 29 June 2011 16:27, Bernhard Schmidt  wrote:
> On Wednesday, June 29, 2011 10:03:02 Adrian Chadd wrote:
>> On 29 June 2011 14:03, Bernhard Schmidt  wrote:
>>
>> > Its name is ieee80211_tx_mgt_timeout; it is used to track AUTH/ASSOC
>> > requests. AFAIK there is even a similar PR about that.
>> >
>> > Adrian, have you got an AP set up to drop either an AUTH or ASSOC
>> > response frame?
>>
>> Tell me how and I'll set it up.
>>
>> A panic at that point in the function indicates maybe ni is NULL?
>> or ni->vap is now NULL, maybe?
>
> vap should never be NULL, so I'd guess it's ni.
>
> Hmm... I'd guess there is some kind of race here: if the driver
> tells us that it was able to send the AUTH request frame, net80211 sets
> up the timeout callback. What happens if the AUTH response and the
> callback arrive at the same time? It should be locked appropriately, but
> is it?
>
> This will drop the AUTH response:
>
> Index: sys/net80211/ieee80211_hostap.c
> ===
> --- sys/net80211/ieee80211_hostap.c     (revision 223661)
> +++ sys/net80211/ieee80211_hostap.c     (working copy)
> @@ -978,7 +978,7 @@ hostap_auth_open(struct ieee80211_node *ni, struct
>                    "%s", "station authentication defered (radius acl)");
>                ieee80211_notify_node_auth(ni);
>        } else {
> -               IEEE80211_SEND_MGMT(ni, IEEE80211_FC0_SUBTYPE_AUTH, seq + 1);
> +               //IEEE80211_SEND_MGMT(ni, IEEE80211_FC0_SUBTYPE_AUTH, seq + 1);
>                IEEE80211_NOTE_MAC(vap,
>                    IEEE80211_MSG_DEBUG | IEEE80211_MSG_AUTH, ni->ni_macaddr,
>                    "%s", "station authenticated (open)");
> @@ -1158,7 +1158,7 @@ hostap_auth_shared(struct ieee80211_node *ni, stru
>                estatus = IEEE80211_STATUS_SEQUENCE;
>                goto bad;
>        }
> -       IEEE80211_SEND_MGMT(ni, IEEE80211_FC0_SUBTYPE_AUTH, seq + 1);
> +       //IEEE80211_SEND_MGMT(ni, IEEE80211_FC0_SUBTYPE_AUTH, seq + 1);
>        return;
>  bad:
>        /*
>
>
> --
> Bernhard
>


Re: Panic in ieee80211 tx mgmt timeout

2011-06-29 Thread Stefan Esser
On 29.06.2011 10:27, Bernhard Schmidt wrote:
> On Wednesday, June 29, 2011 10:03:02 Adrian Chadd wrote:
>> On 29 June 2011 14:03, Bernhard Schmidt  wrote:
>>
>>> Its name is ieee80211_tx_mgt_timeout; it is used to track AUTH/ASSOC
>>> requests. AFAIK there is even a similar PR about that.
>>>
>>> Adrian, have you got an AP set up to drop either an AUTH or ASSOC
>>> response frame?
>>
>> Tell me how and I'll set it up.
>>
>> A panic at that point in the function indicates maybe ni is NULL?
>> or ni->vap is now NULL, maybe?
> 
> vap should never be NULL, so I'd guess it's ni.

No, neither ni nor ni->vap appears to cause a NULL dereference.

The panic message indicates a fault address of 0xff809c7a1000, which
is the value of arg passed to ieee80211_tx_mgt_timeout().

The fault occurs on the first instruction within that function, and I
take this to mean that this address lies outside kernel VM space. (I have
to admit, though, that I do not know the exact memory layout for amd64.)
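The reasoning about the first instruction can be made concrete. On amd64 the first integer argument of a function arrives in %rdi, so the faulting `movq (%rdi),%rdi` in the original ddb output is what a load of `ni->ni_vap` typically looks like if `ni_vap` sits at offset 0 of the node structure. The sketch below assumes exactly that layout; `mock_node` and `mock_vap` are hypothetical, heavily trimmed stand-ins, not the real net80211 definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily trimmed stand-ins for the net80211 structures. */
struct mock_vap {
	int iv_state;
};

struct mock_node {
	struct mock_vap *ni_vap;	/* assumed to be the first member */
	int ni_other;			/* placeholder for everything else */
};

/* With ni_vap at offset 0, "ni->ni_vap" is a plain load from (arg). */
static_assert(offsetof(struct mock_node, ni_vap) == 0,
    "ni_vap is expected at offset 0");

/*
 * Mirrors the first statement of the timeout handler: the only memory
 * access is a load from offset 0 of whatever 'arg' points to, so a stale
 * 'arg' faults right here, before any vap state is inspected.
 */
static struct mock_vap *
mock_tx_mgt_timeout_vap(void *arg)
{
	struct mock_node *ni = arg;

	return ni->ni_vap;
}
```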

> Hmm... I'd guess there is some kind of race here: if the driver
> tells us that it was able to send the AUTH request frame, net80211 sets
> up the timeout callback. What happens if the AUTH response and the
> callback arrive at the same time? It should be locked appropriately, but
> is it?
> 
> This will drop the AUTH response:

I have received a number of messages that might indicate a lost race:

ieee80211_new_state_locked: pending AUTH -> SCAN transition lost

This message repeats at intervals ranging from a few seconds to 20 minutes.

> Index: sys/net80211/ieee80211_hostap.c
> ===
> --- sys/net80211/ieee80211_hostap.c	(revision 223661)
> +++ sys/net80211/ieee80211_hostap.c	(working copy)
> @@ -978,7 +978,7 @@ hostap_auth_open(struct ieee80211_node *ni, struct
>  		    "%s", "station authentication defered (radius acl)");
>  		ieee80211_notify_node_auth(ni);
>  	} else {
> -		IEEE80211_SEND_MGMT(ni, IEEE80211_FC0_SUBTYPE_AUTH, seq + 1);
> +		//IEEE80211_SEND_MGMT(ni, IEEE80211_FC0_SUBTYPE_AUTH, seq + 1);
>  		IEEE80211_NOTE_MAC(vap,
>  		    IEEE80211_MSG_DEBUG | IEEE80211_MSG_AUTH, ni->ni_macaddr,
>  		    "%s", "station authenticated (open)");
> @@ -1158,7 +1158,7 @@ hostap_auth_shared(struct ieee80211_node *ni, stru
>  		estatus = IEEE80211_STATUS_SEQUENCE;
>  		goto bad;
>  	}
> -	IEEE80211_SEND_MGMT(ni, IEEE80211_FC0_SUBTYPE_AUTH, seq + 1);
> +	//IEEE80211_SEND_MGMT(ni, IEEE80211_FC0_SUBTYPE_AUTH, seq + 1);
>  	return;
>   bad:
>  	/*
> 
> 

I could try that patch for a few hours ...

Regards, STefan


Re: Panic in ieee80211 tx mgmt timeout

2011-06-29 Thread Stefan Esser
On 29.06.2011 10:03, Adrian Chadd wrote:
> On 29 June 2011 14:03, Bernhard Schmidt  wrote:
>> Its name is ieee80211_tx_mgt_timeout; it is used to track AUTH/ASSOC
>> requests. AFAIK there is even a similar PR about that.

Sorry, I manually entered the panic message, since dumps were not
working on my system at the time of that panic.

>> Adrian, have you got an AP set up to drop either an AUTH or ASSOC
>> response frame?

I've got a number of AUTH -> SCAN transition lost messages for wlan0,
seconds to minutes apart:

Jun 28 21:16:17 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost
Jun 28 21:34:46 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost
Jun 28 21:36:33 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost
Jun 28 21:45:14 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost
Jun 28 21:45:44 kernel: wlan0: ieee80211_new_state_locked: pending AUTH -> SCAN transition lost

The setup is easy to reproduce; my rc.conf contained:

wlans_ath0="wlan0"
ifconfig_ath0="down"
ifconfig_wlan0="down"
wpa_supplicant_enable="YES"

This system used to be connected via ath0, but was recently moved to a
place where Ethernet is available. The panics started only after WLAN
was no longer used. I might disable wpa_supplicant, since it is not
required in the current situation, but I have not tested whether that
prevents the panic.

> Tell me how and I'll set it up.
> 
> A panic at that point in the function indicates maybe ni is NULL?
> or ni->vap is now NULL, maybe?

I recreated the panic, this time with kernel dumps correctly configured
(thanks for the hint, Scott). The panic message is:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xff809c7a1000
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0x805e1851
stack pointer           = 0x28:0xff8000288ab0
frame pointer           = 0x28:0xff8000288b60
code segment            = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 11 (swi4: clock)

Traceback:

#10 0x805e1851 in ieee80211_tx_mgt_timeout (arg=0xff809c7a1000)
at ../../../net80211/ieee80211_output.c:2487

This indicates that an invalid argument is passed and assigned to
"*ni", which causes the page fault when dereferencing "ni" to obtain "*vap".

I'm afraid that the assumption in the comment (about the timeout being
safe to use) does not really hold:

static void
ieee80211_tx_mgt_timeout(void *arg)
{
	struct ieee80211_node *ni = arg;
	/* first memory access: faults if arg no longer points to a mapped node */
	struct ieee80211vap *vap = ni->ni_vap;

	if (vap->iv_state != IEEE80211_S_INIT &&
	    (vap->iv_ic->ic_flags & IEEE80211_F_SCAN) == 0) {
		/*
		 * NB: it's safe to specify a timeout as the reason here;
		 *     it'll only be used in the right state.
		 */
		ieee80211_new_state(vap, IEEE80211_S_SCAN,
		    IEEE80211_SCAN_FAIL_TIMEOUT);
	}
}

If "vap" is valid during one invocation of that function, I'd expect it
to at least be a pointer to valid kernel memory after the timeout.
I.e., the value found by dereferencing it may be stale, but the pointer
itself should at least not cause a page fault. (???)
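The expectation that a once-valid pointer should at least not fault does not hold in general: the pointer keeps its value after the memory behind it is freed, but if the allocator hands the underlying pages back, any access fails with "page not present", which is the fault code reported above. The user-space sketch below illustrates only that mechanism, using POSIX mmap(2)/munmap(2) as an assumed stand-in for a kernel allocator releasing pages; it says nothing about which kernel object was actually freed in this panic. The helper names are hypothetical.

```c
#define _DEFAULT_SOURCE		/* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON
#endif

/*
 * Returns true if the page containing 'addr' is currently mapped.
 * POSIX specifies that msync() fails with ENOMEM when the range
 * contains pages that are not mapped.
 */
static bool
page_is_mapped(const void *addr)
{
	size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
	uintptr_t base = (uintptr_t)addr & ~(uintptr_t)(pagesz - 1);

	if (msync((void *)base, pagesz, MS_ASYNC) == 0)
		return true;
	assert(errno == ENOMEM);
	return false;
}

/*
 * Maps one anonymous page, touches it, and unmaps it again, returning
 * the old address: the pointer value survives, the backing page does
 * not. Returns NULL if mapping or unmapping fails.
 */
static void *
map_then_unmap_page(void)
{
	size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;
	*(volatile char *)p = 1;	/* the page is usable while mapped */
	if (munmap(p, pagesz) != 0)
		return NULL;
	return p;
}
```

Dereferencing the returned pointer would now crash the process, which is the user-space analogue of the "supervisor read data, page not present" trap; the sketch therefore only queries the mapping instead of touching it.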


The compressed core.txt is 27 KB; the compressed vmcore is 800 MB. I might
be able to find a place to upload the vmcore file to, but since I'm
currently on a DSL line with only 672 kbit/s upstream, it would take me
about 3 hours to upload it to a better-connected server (and I'd like to
avoid doing that unless it is essential for debugging).

The core.txt is small enough to send by mail. Let me know if you think
it helps you understand the problem.


I'm willing to help with debugging, e.g. by placing printfs in my
kernel in the timeout handler and around the creation and destruction
of *vap structures.


After removal of "wlans_ath0=wlan0" the system will most probably be
stable; I did not specifically test this case (i.e. ath0 configured, but
no wlan0 created). I do know that an "ifconfig down" of ath0 and wlan0
suffices; probably an "ifconfig wlan0 down" alone would be enough.

So, I know how to avoid the panic, but I think it is still important to
find the cause.

Thank you for looking into this!


Best regards, STefan


Re: Panic in ieee80211 tx mgmt timeout

2011-06-29 Thread Bernhard Schmidt
On Wednesday, June 29, 2011 10:03:02 Adrian Chadd wrote:
> On 29 June 2011 14:03, Bernhard Schmidt  wrote:
> 
> > Its name is ieee80211_tx_mgt_timeout; it is used to track AUTH/ASSOC
> > requests. AFAIK there is even a similar PR about that.
> >
> > Adrian, have you got an AP set up to drop either an AUTH or ASSOC
> > response frame?
> 
> Tell me how and I'll set it up.
> 
> A panic at that point in the function indicates maybe ni is NULL?
> or ni->vap is now NULL, maybe?

vap should never be NULL, so I'd guess it's ni.

Hmm... I'd guess there is some kind of race here: if the driver
tells us that it was able to send the AUTH request frame, net80211 sets
up the timeout callback. What happens if the AUTH response and the
callback arrive at the same time? It should be locked appropriately, but
is it?
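One common way to defuse a race like this is to let the timeout check, under the same lock as the response path, whether the state it was armed for is still current, and to do nothing if it is not. The following single-threaded sketch models that check with a generation counter; `fake_vap`, its fields, and every function name are hypothetical, and the lock both handlers would hold is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of station states. */
enum fake_state { FAKE_S_SCAN, FAKE_S_AUTH, FAKE_S_ASSOC };

struct fake_vap {
	enum fake_state state;
	unsigned int gen;	/* bumped on every state change */
	unsigned int timer_gen;	/* generation the timeout was armed in */
	bool timer_armed;
};

static void
fake_set_state(struct fake_vap *vap, enum fake_state s)
{
	vap->state = s;
	vap->gen++;
}

/* Called once the driver reports the AUTH request as sent. */
static void
fake_arm_auth_timeout(struct fake_vap *vap)
{
	assert(vap->state == FAKE_S_AUTH);
	vap->timer_gen = vap->gen;
	vap->timer_armed = true;
}

/* AUTH response path: move on to ASSOC. */
static void
fake_recv_auth_resp(struct fake_vap *vap)
{
	if (vap->state == FAKE_S_AUTH)
		fake_set_state(vap, FAKE_S_ASSOC);
}

/*
 * Timeout path: fall back to SCAN only if nothing has changed since the
 * timer was armed. Returns true if the fallback was taken.
 */
static bool
fake_auth_timeout(struct fake_vap *vap)
{
	if (!vap->timer_armed)
		return false;
	vap->timer_armed = false;
	if (vap->gen != vap->timer_gen)
		return false;	/* stale: the response won the race */
	fake_set_state(vap, FAKE_S_SCAN);
	return true;
}
```

With this check, a timeout that fires just after the AUTH response has been processed becomes a no-op instead of forcing the station back to SCAN.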

This will drop the AUTH response:

Index: sys/net80211/ieee80211_hostap.c
===
--- sys/net80211/ieee80211_hostap.c	(revision 223661)
+++ sys/net80211/ieee80211_hostap.c	(working copy)
@@ -978,7 +978,7 @@ hostap_auth_open(struct ieee80211_node *ni, struct
 		    "%s", "station authentication defered (radius acl)");
 		ieee80211_notify_node_auth(ni);
 	} else {
-		IEEE80211_SEND_MGMT(ni, IEEE80211_FC0_SUBTYPE_AUTH, seq + 1);
+		//IEEE80211_SEND_MGMT(ni, IEEE80211_FC0_SUBTYPE_AUTH, seq + 1);
 		IEEE80211_NOTE_MAC(vap,
 		    IEEE80211_MSG_DEBUG | IEEE80211_MSG_AUTH, ni->ni_macaddr,
 		    "%s", "station authenticated (open)");
@@ -1158,7 +1158,7 @@ hostap_auth_shared(struct ieee80211_node *ni, stru
 		estatus = IEEE80211_STATUS_SEQUENCE;
 		goto bad;
 	}
-	IEEE80211_SEND_MGMT(ni, IEEE80211_FC0_SUBTYPE_AUTH, seq + 1);
+	//IEEE80211_SEND_MGMT(ni, IEEE80211_FC0_SUBTYPE_AUTH, seq + 1);
 	return;
  bad:
 	/*


-- 
Bernhard


Re: Panic in ieee80211 tx mgmt timeout

2011-06-29 Thread Adrian Chadd
On 29 June 2011 14:03, Bernhard Schmidt  wrote:

> Its name is ieee80211_tx_mgt_timeout; it is used to track AUTH/ASSOC
> requests. AFAIK there is even a similar PR about that.
>
> Adrian, have you got an AP set up to drop either an AUTH or ASSOC
> response frame?

Tell me how and I'll set it up.

A panic at that point in the function indicates maybe ni is NULL?
or ni->vap is now NULL, maybe?



Adrian


Re: Panic in ieee80211 tx mgmt timeout

2011-06-28 Thread Bernhard Schmidt
On Wednesday, June 29, 2011 03:50:08 Adrian Chadd wrote:
> This is kinda strange; that symbol doesn't exist in the net80211 or ath source.
> 
> What the heck?
> 
> 
> 
> adrian
> 
> 
> 
> On 28 June 2011 17:28, Stefan Esser  wrote:
> > Hi,
> >
> > is this a known issue?
> >
> > My -CURRENT system (r223560M, amd64, 8GB, Atheros WLAN) panics after
> > minutes to hours of uptime with the following message:
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = 0
> > fault virtual address   = 0xff807f502000
> > fault code              = supervisor data read, page not present
> > ...
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 11 (swi4: clock)
> > [ thread pid 11 tid 112 ]
> > Stopped at  ieee80211_tx_mgmt_timeout+0x1:  movq (%rdi),%rdi
> >
> > db> bt
> > Tracing pid 11 tid 100012 td 0xfe00032e
> > ieee80211_tx_mgmt_timeout() at ieee80211_tx_mgmt_timeout+0x1
> > intr_event_execute_handlers() at intr_event_execute_handlers+0x66
> > ithread_loop() at ithread_loop+0x96
> > fork_exit() at fork_exit+0x11d
> > fork_trampoline() at fork_trampoline+0xe
> > --- trap 0, rip = 0, rsp = 0xff8000288d00, rbp = 0 ---
> >
> > This panic message is manually transcribed, since the GPT-only
> > partitioning prevents dumping of a kernel core. (Why, BTW?)
> > I could add a swap partition on an MBR disk if a core dump seems
> > necessary to diagnose the problem. I'm also willing to wait for that
> > panic to occur again and to gather more debug output.
> >
> >
> > Other information: The Atheros WLAN in this system is unused (not
> > associated) but both ath0 and wlan0 were "UP" at the time of the panic.
> >
> > Initial testing shows the system to be stable with both wlan0 and ath0
> > set to "down" after boot. But still, the timeout should not panic the
> > kernel if WLAN is active but not fully configured (e.g. no SSID).
> >
> > Any ideas?

Its name is ieee80211_tx_mgt_timeout; it is used to track AUTH/ASSOC
requests. AFAIK there is even a similar PR about that.

Adrian, have you got an AP set up to drop either an AUTH or ASSOC
response frame?

-- 
Bernhard


Re: Panic in ieee80211 tx mgmt timeout

2011-06-28 Thread Adrian Chadd
This is kinda strange; that symbol doesn't exist in the net80211 or ath source.

What the heck?



adrian



On 28 June 2011 17:28, Stefan Esser  wrote:
> Hi,
>
> is this a known issue?
>
> My -CURRENT system (r223560M, amd64, 8GB, Atheros WLAN) panics after
> minutes to hours of uptime with the following message:
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 0
> fault virtual address   = 0xff807f502000
> fault code              = supervisor data read, page not present
> ...
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 11 (swi4: clock)
> [ thread pid 11 tid 112 ]
> Stopped at      ieee80211_tx_mgmt_timeout+0x1:  movq     (%rdi),%rdi
>
> db> bt
> Tracing pid 11 tid 100012 td 0xfe00032e
> ieee80211_tx_mgmt_timeout() at ieee80211_tx_mgmt_timeout+0x1
> intr_event_execute_handlers() at intr_event_execute_handlers+0x66
> ithread_loop() at ithread_loop+0x96
> fork_exit() at fork_exit+0x11d
> fork_trampoline() at fork_trampoline+0xe
> --- trap 0, rip = 0, rsp = 0xff8000288d00, rbp = 0 ---
>
> This panic message is manually transcribed, since the GPT-only
> partitioning prevents dumping of a kernel core. (Why, BTW?)
> I could add a swap partition on an MBR disk if a core dump seems
> necessary to diagnose the problem. I'm also willing to wait for that
> panic to occur again and to gather more debug output.
>
>
> Other information: The Atheros WLAN in this system is unused (not
> associated) but both ath0 and wlan0 were "UP" at the time of the panic.
>
> Initial testing shows the system to be stable with both wlan0 and ath0
> set to "down" after boot. But still, the timeout should not panic the
> kernel if WLAN is active but not fully configured (e.g. no SSID).
>
> Any ideas?
>
> Best regards, STefan


Re: Panic in ieee80211 tx mgmt timeout

2011-06-28 Thread Scot Hetzel
On Tue, Jun 28, 2011 at 4:28 AM, Stefan Esser  wrote:
> This panic message is manually transcribed, since the GPT-only
> partitioning prevents dumping of a kernel core. (Why, BTW?)

You should be able to get a kernel core dump on a system with a GPT
partitioned disk.

Do you have a freebsd-swap partition?

How is your GPT disk partitioned?

Scot


Panic in ieee80211 tx mgmt timeout

2011-06-28 Thread Stefan Esser
Hi,

is this a known issue?

My -CURRENT system (r223560M, amd64, 8GB, Atheros WLAN) panics after
minutes to hours of uptime with the following message:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 0
fault virtual address   = 0xff807f502000
fault code              = supervisor data read, page not present
...
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 11 (swi4: clock)
[ thread pid 11 tid 112 ]
Stopped at  ieee80211_tx_mgmt_timeout+0x1:  movq (%rdi),%rdi

db> bt
Tracing pid 11 tid 100012 td 0xfe00032e
ieee80211_tx_mgmt_timeout() at ieee80211_tx_mgmt_timeout+0x1
intr_event_execute_handlers() at intr_event_execute_handlers+0x66
ithread_loop() at ithread_loop+0x96
fork_exit() at fork_exit+0x11d
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xff8000288d00, rbp = 0 ---

This panic message is manually transcribed, since the GPT-only
partitioning prevents dumping of a kernel core. (Why, BTW?)
I could add a swap partition on an MBR disk if a core dump seems
necessary to diagnose the problem. I'm also willing to wait for that
panic to occur again and to gather more debug output.


Other information: The Atheros WLAN in this system is unused (not
associated) but both ath0 and wlan0 were "UP" at the time of the panic.

Initial testing shows the system to be stable with both wlan0 and ath0
set to "down" after boot. But still, the timeout should not panic the
kernel if WLAN is active but not fully configured (e.g. no SSID).

Any ideas?

Best regards, STefan