Re: USB lockup

2020-11-26 Thread Edgar Fuß
Thanks a lot for looking into this!

> Really hard to help without seeing the full ohcidebug usbhist log.
The problem is that the file system (or block I/O) seems to lock up, so the 
usbhist log is hard to get out of the machine other than by camera. 
I guess dumping will take ages to complete (16G RAM).
I could try to replace my panic with simply writing something to usbhist 
and aborting the loop.
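Edgar's alternative to the panic (log something and break out of the loop instead of spinning forever) could look like the standalone sketch below. The struct and function names are hypothetical, and fprintf() stands in for a usbhist log call; the bound passed in is arbitrary and only has to be comfortably larger than the four-TD chain discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a soft TD: only the fields needed here. */
struct sim_td {
	uint32_t physaddr;		/* bus address of the TD */
	struct sim_td *donenext;	/* next TD on the done queue */
};

/*
 * Walk the done queue, but give up once more than max_tds entries have
 * been seen. Returns the number of TDs visited, or -1 if the bound was
 * exceeded (the list is then probably circular).
 */
static int
walk_done_queue_bounded(struct sim_td *head, int max_tds)
{
	int n = 0;

	for (struct sim_td *td = head; td != NULL; td = td->donenext) {
		if (n == max_tds) {
			/* Stand-in for a usbhist entry; the caller bails out. */
			fprintf(stderr,
			    "done queue exceeds %d TDs at 0x%08x, aborting walk\n",
			    max_tds, (unsigned)td->physaddr);
			return -1;
		}
		/* ... the real loop would complete the transfer for td here ... */
		n++;
	}
	return n;
}
```

With a bound of 16, a normal three-entry queue returns 3, while the FA0->EE0->E20->FA0 loop from this thread returns -1 after logging once, leaving the machine usable enough to get the history out.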

> I guess the E20 TD got written out with incorrect next_td, or some other
> error condition caused the mixup.
You mean nexttd or td_nexttd? As far as I can tell, neither field is touched 
by the driver without being ohci_dump_td()'d afterwards, and, as I wrote, 
minus the loopback td_nexttd, everything is exactly as one would expect.

> The change I referred to was
I'll have a look into that one tomorrow.

> is something being aborted?
May well be. I haven't checked yet.

My feeling is that this is either a controller error or some sort of 
DMA/cache/barrier/whatever race during the HccaDoneHead manipulation. 
But I keep getting confused by the writing-a-1-clears-the-bit versus 
writing-a-1-sets-the-bit semantics of the registers, and I know nothing about 
all these cache/barrier/re-ordering issues other than that they may exist.
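The register semantics Edgar mentions are fixed by the OHCI 1.0a spec: in HcInterruptStatus, writing a 1 clears that bit; in HcInterruptEnable, writing a 1 sets that enable bit; in HcInterruptDisable, writing a 1 clears that enable bit; writing a 0 has no effect in all three. A minimal software model, assuming only the bit positions from the spec; the SIM_ names and the struct are hypothetical and are not the NetBSD ohcireg.h definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Interrupt bit positions as given in the OHCI 1.0a spec. */
#define SIM_OHCI_SO	0x00000001u	/* SchedulingOverrun */
#define SIM_OHCI_WDH	0x00000002u	/* WritebackDoneHead */
#define SIM_OHCI_SF	0x00000004u	/* StartofFrame */
#define SIM_OHCI_MIE	0x80000000u	/* MasterInterruptEnable */

/* Software model of the interrupt registers; not a real driver API. */
struct sim_ohci_intr {
	uint32_t status;	/* HcInterruptStatus */
	uint32_t enable;	/* shared state behind HcInterruptEnable/Disable */
};

/* HcInterruptStatus: a 1 clears the bit, a 0 leaves it unchanged. */
static void
write_status(struct sim_ohci_intr *r, uint32_t v)
{
	r->status &= ~v;
}

/* HcInterruptEnable: a 1 sets the enable bit, a 0 leaves it unchanged. */
static void
write_enable(struct sim_ohci_intr *r, uint32_t v)
{
	r->enable |= v;
}

/* HcInterruptDisable: a 1 clears the enable bit, a 0 leaves it unchanged. */
static void
write_disable(struct sim_ohci_intr *r, uint32_t v)
{
	r->enable &= ~v;
}
```

A related rule from the spec: the HC does not write HccaDoneHead again until software has cleared WDH in HcInterruptStatus, so the done head is expected to be read from the HCCA before WDH is acknowledged.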

The one nice thing is that the lock-up is easily and 100% reproducible. 
If only these PeCee boxes didn't take ages to reboot.


Re: USB lockup

2020-11-26 Thread Nick Hudson

On 26/11/2020 20:35, Edgar Fuß wrote:

>> Add a check to ohci_softintr to see if the list goes circular and enter
>> ddb / dump usbhist when it does...
>
> I already did add a panic and it fired.
>
> I'm still trying to find out how that happens.
>
> What I'm seeing (dumped by device_ctrl_start()) is a chain of four TDs
> (named here after their addresses' three least significant nybbles):
> E20->EE0->FA0->F40->0
> which are linked in that sense by both nexttd and td.td_nexttd.
>
> Then, in ohci_softint(), the done queue is (as linked by td.nexttd):
> FA0->EE0->E20->FA0->...
> and, as expected, the nexttd links are as before.
> Absent the E20->FA0 link, that's exactly what one would expect if the first
> three TDs have been handled (the done list is most recently done first);
> the big question is where that additional link comes from.
>
> I've added code to ohci_hash_add_td() to catch a TD being added with a
> physical address already present in the hash list, but that didn't fire.

Really hard to help without seeing the full ohcidebug usbhist log. I
guess the E20 TD got written out with incorrect next_td, or some other
error condition caused the mixup.

The change I referred to was

Revision 1.254.2.76, Mon May 30 06:46:50 2016 UTC (4 years, 5 months ago) by skrll
Branch: nick-nhusb
Changes since 1.254.2.75: +181 -48 lines

Restructure the abort code for TD based transfers (ctrl, bulk, intr).

In PR/22646 some TDs can be on the done queue when the abort starts and,
if this is the case, they need to be processed after the WDH interrupt.
Instead of waiting for WDH we release TDs that have been touched by the
HC and replace them with new ones.  Once WDH happens the floating TDs
will be returned to the free list.
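The commit message describes a deferred-free pattern: on abort, TDs the HC may already have touched are parked on a "floating" list (the transfer gets fresh TDs instead), and only after the next WDH are the floating TDs recycled. A minimal sketch, assuming singly linked lists and hypothetical names (sim_td, sim_pool, abort_release_td, wdh_reclaim); it is not the code from that revision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sim_td {
	struct sim_td *next;	/* link on whichever software list holds it */
	bool hc_touched;	/* HC may still report it on the done queue */
};

struct sim_pool {
	struct sim_td *freelist;	/* TDs that are safe to reuse */
	struct sim_td *floating;	/* released on abort, awaiting WDH */
};

static void
push(struct sim_td **list, struct sim_td *td)
{
	td->next = *list;
	*list = td;
}

/* Abort path: a TD the HC may still write back must not be reused yet. */
static void
abort_release_td(struct sim_pool *p, struct sim_td *td)
{
	if (td->hc_touched)
		push(&p->floating, td);
	else
		push(&p->freelist, td);
}

/* Once a WDH interrupt has been handled, floating TDs can be recycled. */
static void
wdh_reclaim(struct sim_pool *p)
{
	while (p->floating != NULL) {
		struct sim_td *td = p->floating;

		p->floating = td->next;
		td->hc_touched = false;
		push(&p->freelist, td);
	}
}
```

The point of the split is that a TD on the floating list can never be handed to a new transfer while the HC might still link it into a done queue.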


is something being aborted?

Nick


Re: MAXTSIZ removal?

2020-11-26 Thread Christos Zoulas
In article <20201125210311.7wofo3mtipvfb...@yt.nih.at>,
Thomas Klausner wrote:
>There was a commit by christos that made MAXTSIZ optional, but
>at least the amd64 vmparam.h still defines it.
>
>Any reason not to remove it?
>
>(I still can't start emulators/mame with a GENERIC without that change)
> Thomas

Nope, I'll remove it.

christos



Re: USB lockup

2020-11-26 Thread Edgar Fuß
> Add a check to ohci_softintr to see if the list goes circular and enter
> ddb / dump usbhist when it does...
I already did add a panic and it fired.

I'm still trying to find out how that happens.

What I'm seeing (dumped by device_ctrl_start()) is a chain of four TDs 
(named here after their addresses' three least significant nybbles):
E20->EE0->FA0->F40->0
which are linked in that sense by both nexttd and td.td_nexttd.

Then, in ohci_softint(), the done queue is (as linked by td.nexttd):
FA0->EE0->E20->FA0->...
and, as expected, the nexttd links are as before.
Absent the E20->FA0 link, that's exactly what one would expect if the first 
three TDs have been handled (the done list is most recently done first); 
the big question is where that additional link comes from.
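Per the OHCI spec, the HC retires a TD by setting its NextTD to the current done-queue head and making it the new head, which is why the done list comes out most recently done first. The small simulation below (hypothetical names, addresses reduced to the three nybbles used above) reproduces FA0->EE0->E20 for the normal case. It also shows one mechanism that would yield the observed FA0->EE0->E20->FA0 shape: the same TD being retired twice. That is an illustration of the data structure only; whether it is what happens here is not established.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TD model; addr holds the low three nybbles. */
struct sim_td {
	uint16_t addr;
	struct sim_td *nexttd;	/* plays the role of the hardware NextTD */
};

/*
 * Retirement as the OHCI spec describes it: link the TD in front of
 * the current done head and make it the new head (LIFO order).
 */
static void
hc_retire(struct sim_td **donehead, struct sim_td *td)
{
	td->nexttd = *donehead;
	*donehead = td;
}

/* Copy up to max addresses from the queue; returns how many were copied. */
static int
snapshot(const struct sim_td *head, uint16_t *out, int max)
{
	int n = 0;

	for (; head != NULL && n < max; head = head->nexttd)
		out[n++] = head->addr;
	return n;
}
```

Retiring E20, EE0, FA0 in that order gives a snapshot of FA0, EE0, E20. Retiring FA0 once before E20 and again after EE0 overwrites its first NextTD and leaves E20 pointing at FA0, so the walk never reaches the end.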

I've added code to ohci_hash_add_td() to catch a TD being added with a 
physical address already present in the hash list, but that didn't fire.
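The duplicate check Edgar added to ohci_hash_add_td() can be illustrated with a minimal physical-address-to-TD table that refuses a second entry for the same address. This is a standalone sketch with hypothetical names (sim_hash_add_td, SIM_HASH_SIZE); the NetBSD driver's hash function, bucket count, and list macros may differ. Dropping the low four bits relies on general TDs being 16-byte aligned, as OHCI requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_HASH_SIZE 128	/* hypothetical bucket count */

struct sim_td {
	uint32_t physaddr;
	struct sim_td *hnext;	/* bucket chain */
};

struct sim_hash {
	struct sim_td *bucket[SIM_HASH_SIZE];
};

/* General TDs are 16-byte aligned, so the low 4 bits carry no information. */
static unsigned
sim_hash(uint32_t pa)
{
	return (pa >> 4) % SIM_HASH_SIZE;
}

/*
 * Insert td keyed by its physical address. Returns false and leaves the
 * table unchanged if an entry with the same address is already present
 * (the spot where a debugging build would log or panic).
 */
static bool
sim_hash_add_td(struct sim_hash *h, struct sim_td *td)
{
	struct sim_td **b = &h->bucket[sim_hash(td->physaddr)];

	for (struct sim_td *p = *b; p != NULL; p = p->hnext)
		if (p->physaddr == td->physaddr)
			return false;
	td->hnext = *b;
	*b = td;
	return true;
}
```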


Re: USB lockup

2020-11-26 Thread Nick Hudson

On 24/11/2020 16:30, Edgar Fuß wrote:

>> so the td list must have gone circular, no?
>
> It's indeed circular (in the td_nexttd sense), as additionally inserted
> debugging output revealed. It also happens in uniprocessor (boot -1) mode.

Add a check to ohci_softintr to see if the list goes circular and enter
ddb / dump usbhist when it does...
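A minimal sketch of such a check, using Floyd's tortoise-and-hare cycle detection on a hypothetical TD model (sim_td and done_queue_is_circular are illustrative names, not driver code). It needs no extra memory and visits each node a bounded number of times, so it could run on the done queue before any TD is processed; a true result is where one would enter ddb or dump usbhist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sim_td {
	struct sim_td *dnext;	/* next TD on the done queue */
};

/*
 * Floyd's cycle detection: the fast pointer moves two steps per
 * iteration, the slow one a single step. If the list ends, fast reaches
 * NULL; if it loops, fast eventually lands on slow.
 */
static bool
done_queue_is_circular(const struct sim_td *head)
{
	const struct sim_td *slow = head, *fast = head;

	while (fast != NULL && fast->dnext != NULL) {
		slow = slow->dnext;
		fast = fast->dnext->dnext;
		if (slow == fast)
			return true;
	}
	return false;
}
```

Compared with a fixed step bound, this reports a loop exactly, however long the legitimate queue is.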

I had a fix on my nick-nhusb branch that might help here, but other
updates broke it and I've not looked into why.

Nick