Re: USB lockup
Thanks a lot for looking into this!

> Really hard to help without seeing the full ohcidebug usbhist log.
The problem is that the file system (or block I/O) seems to lock up, so the usbhist is hard to get out of the machine other than by camera. I guess dumping will take ages to complete (16G RAM). I could try to replace my panic with simply writing something to usbhist and aborting the loop.

> I guess the E20 TD got written out with incorrect next_td, or some other
> error condition caused the mixup.
You mean nexttd or td_nexttd? As far as I can tell, neither field is touched by the driver without being ohci_dump_td()'d afterwards, and, as I wrote, minus the loopback td_nexttd, everything is exactly as one would expect.

> The change I referred to was
I'll have a look into that one tomorrow.

> is something being aborted?
May well be. I haven't checked yet.

My feeling is that this is either a controller error or some sort of DMA/cache/barrier/whatever race during the HccaDoneHead manipulation. But I'm constantly confused by the writing-a-1-clears-the-bit versus writing-a-1-sets-the-bit semantics of the registers, and I know nothing about all these cache/barrier/re-ordering issues other than that they may exist.

The one nice thing is that the lock-up is easily and 100% reproducible. If only these PeCee boxes wouldn't take ages to reboot.
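For the write-1-to-clear versus write-1-to-set confusion mentioned above, a toy model may help. This is only a sketch, not the driver's code: the struct, the function names and the MODEL_* bit values are made up for illustration. It assumes the usual OHCI arrangement in which writing 1 to a status bit clears it, writing 1 to an enable register sets the bit, and writing 1 to the matching disable register clears the enable bit; in every case, writing 0 leaves the bit alone.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only; not taken from the driver headers. */
#define MODEL_WDH 0x2u  /* "writeback done head" event */
#define MODEL_SF  0x4u  /* "start of frame" event */

/* Toy model of a status register plus an enable mask. */
struct intr_regs {
	uint32_t status;  /* pending events, set by the (modelled) controller */
	uint32_t enable;  /* events that are allowed to raise an interrupt */
};

/* Status register: write-1-to-clear. Zero bits in val change nothing. */
static void
write_status(struct intr_regs *r, uint32_t val)
{
	r->status &= ~val;
}

/* Enable register: write-1-to-set. Zero bits in val change nothing. */
static void
write_enable(struct intr_regs *r, uint32_t val)
{
	r->enable |= val;
}

/* Disable register: write-1-to-clear, acting on the enable mask. */
static void
write_disable(struct intr_regs *r, uint32_t val)
{
	r->enable &= ~val;
}

/* Acknowledge everything currently pending and return what was pending. */
static uint32_t
ack_all(struct intr_regs *r)
{
	uint32_t pending = r->status;

	write_status(r, pending);
	assert((r->status & pending) == 0);
	return pending;
}
```

The practical consequence is that acknowledging an event means writing back the bits that were read, and that a read-modify-write of such a register would clear more than intended.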
Re: USB lockup
On 26/11/2020 20:35, Edgar Fuß wrote:
>> Add a check to ohci_softintr to see if the list goes circular and enter
>> ddb / dump usbhist when it does...
> I already did add a panic and it fired. I'm still trying to find out how that happens.
>
> What I'm seeing (dumped by device_ctrl_start()) is a chain of four TDs (named here after their addresses' three least significant nybbles):
>     E20->EE0->FA0->F40->0
> which are linked in that sense by both nexttd and td.td_nexttd.
> Then, in ohci_softintr(), the done queue is (as linked by td.nexttd):
>     FA0->EE0->E20->FA0->...
> and, as expected, the nexttd links are as before.
>
> Absent the E20->FA0 link, that's exactly what one would expect if the first three TDs have been handled (the done list is most recently done first); the big question is where that additional link comes from.
>
> I've added code to ohci_hash_add_td() to catch a TD being added with a physical address already present in the hash list, but that didn't fire.

Really hard to help without seeing the full ohcidebug usbhist log.

I guess the E20 TD got written out with incorrect next_td, or some other error condition caused the mixup.

The change I referred to was

Revision 1.254.2.76, Mon May 30 06:46:50 2016 UTC (4 years, 5 months ago) by skrll
Branch: nick-nhusb
Changes since 1.254.2.75: +181 -48 lines

Restructure the abort code for TD based transfers (ctrl, bulk, intr).

In PR/22646 some TDs can be on the done queue when the abort starts and, if this is the case, they need to be processed after the WDH interrupt. Instead of waiting for WDH we release TDs that have been touched by the HC and replace them with new ones. Once WDH happens the floating TDs will be returned to the free list.

is something being aborted?

Nick
Re: MAXTSIZ removal?
In article <20201125210311.7wofo3mtipvfb...@yt.nih.at>, Thomas Klausner wrote:
>There was a commit by christos that made MAXTSIZ optional, but
>at least the amd64 vmparam.h still defines it.
>
>Any reason not to remove it?
>
>(I still can't start emulators/mame with a GENERIC without that change)
> Thomas

Nope, I'll remove it,

christos
Re: USB lockup
> Add a check to ohci_softintr to see if the list goes circular and enter
> ddb / dump usbhist when it does...
I already did add a panic and it fired. I'm still trying to find out how that happens.

What I'm seeing (dumped by device_ctrl_start()) is a chain of four TDs (named here after their addresses' three least significant nybbles):
    E20->EE0->FA0->F40->0
which are linked in that sense by both nexttd and td.td_nexttd.
Then, in ohci_softintr(), the done queue is (as linked by td.nexttd):
    FA0->EE0->E20->FA0->...
and, as expected, the nexttd links are as before.

Absent the E20->FA0 link, that's exactly what one would expect if the first three TDs have been handled (the done list is most recently done first); the big question is where that additional link comes from.

I've added code to ohci_hash_add_td() to catch a TD being added with a physical address already present in the hash list, but that didn't fire.
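To make the "most recently done first" ordering concrete, here is a small stand-alone model of how a done queue is built. It is a sketch under assumptions, not driver code: struct hw_td, hc_retire(), td_lookup() and done_queue_walk() are hypothetical names, and the model assumes that the controller retires a TD by storing the current done head in the TD's hardware next pointer and then making that TD the new done head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy TD, identified by a fake physical address such as 0xE20. */
struct hw_td {
	uint32_t addr;       /* this TD's address; 0 is reserved as terminator */
	uint32_t td_nexttd;  /* modelled hardware next pointer */
};

/* Model of the controller retiring one TD onto the done queue. */
static void
hc_retire(struct hw_td *td, uint32_t *done_head)
{
	assert(td->addr != 0);        /* 0 would look like end-of-list */
	td->td_nexttd = *done_head;   /* old head becomes our successor */
	*done_head = td->addr;        /* we become the new head */
}

/* Find a TD by address in a small table; NULL if it is not there. */
static struct hw_td *
td_lookup(struct hw_td *tds, size_t n, uint32_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (tds[i].addr == addr)
			return &tds[i];
	return NULL;
}

/*
 * Walk the done queue, storing at most max addresses in out.
 * The bound keeps the walk finite even if the queue is circular.
 */
static size_t
done_queue_walk(struct hw_td *tds, size_t n, uint32_t done_head,
    uint32_t *out, size_t max)
{
	size_t count = 0;

	while (done_head != 0 && count < max) {
		struct hw_td *td = td_lookup(tds, n, done_head);

		if (td == NULL)
			break;
		out[count++] = td->addr;
		done_head = td->td_nexttd;
	}
	return count;
}
```

Retiring E20, EE0 and FA0 in that order from an empty done queue yields FA0->EE0->E20->0 in this model. Overwriting E20's next pointer with FA0 afterwards reproduces the endless FA0->EE0->E20->FA0->... walk described above.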
Re: USB lockup
On 24/11/2020 16:30, Edgar Fuß wrote:
>> so the td list must have gone circular, no?
> It's indeed circular (in the td_nexttd sense), as additionally inserted debugging output revealed.
>
> It also happens in uniprocessor (boot -1) mode.

Add a check to ohci_softintr to see if the list goes circular and enter ddb / dump usbhist when it does...

I had a fix on my nick-nhusb branch that might help here, but other updates broke it and I've not looked as to why.

Nick
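One way to write such a circularity check is Floyd's tortoise-and-hare walk, sketched below. This is a stand-alone illustration, not a patch: struct soft_td is a stripped-down, hypothetical stand-in for the driver's soft TD, and td_list_is_circular() and td_list_length() are made-up helper names. In the driver, the caller would presumably enter ddb or dump usbhist where this sketch just returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a soft TD; only the link matters here. */
struct soft_td {
	struct soft_td *nexttd;    /* next TD on the list being walked */
	unsigned        physaddr;  /* low address bits, for logging only */
};

/*
 * Floyd's cycle check: advance one pointer by one step and another by
 * two. They can only meet again if the list loops back on itself;
 * otherwise the fast pointer reaches NULL first.
 */
static bool
td_list_is_circular(const struct soft_td *head)
{
	const struct soft_td *slow = head, *fast = head;

	while (fast != NULL && fast->nexttd != NULL) {
		slow = slow->nexttd;
		fast = fast->nexttd->nexttd;
		if (slow == fast)
			return true;
	}
	return false;
}

/* Count the TDs on a list; only valid for a list that terminates. */
static size_t
td_list_length(const struct soft_td *head)
{
	size_t n = 0;

	assert(!td_list_is_circular(head));
	for (; head != NULL; head = head->nexttd)
		n++;
	return n;
}
```

The check needs no extra memory and visits each TD at most a few times, so it should be cheap enough to run on every done-queue pass while debugging.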