Re: xhci(4) isoc: fix bogus handling of chained TRBs

2020-07-26 Thread sc . dying
On 2020/07/26 10:54, Marcus Glocker wrote:
> On Sat, 25 Jul 2020 20:31:44 +
> sc.dy...@gmail.com wrote:
> 
>> On 2020/07/25 18:10, Marcus Glocker wrote:
>>> On Sun, Jul 19, 2020 at 02:12:21PM +, sc.dy...@gmail.com wrote:
>>>   
 On 2020/07/19 11:25, Marcus Glocker wrote:  
> On Sun, 19 Jul 2020 02:25:30 +
> sc.dy...@gmail.com wrote:
>  
>> hi,
>>
>> It works on AMD Bolton xHCI (78141022), Intel PCH (1e318086),
>> and ASM1042 (10421b21).
>> I simply play with ffplay -f v4l2 /dev/video0 to test.  
>
> If your cam supports MJPEG it's good to add '-input_format mjpeg'
> with higher resolutions like 1280x720, because that will
> generated varying image sizes, which hit the 64k memory boundary
> more often, and thus generate potentially more chained TDs.  

 Thank you for useful information.
 My webcam supprots at most 640x480, but 1024 bytes/frame x (2+1)
 maxburst x 40 frames = 122880 bytes/xfer is enough to observe TD
 fragmentation.

  
>> At this moment it does not work on VL805, but I have no idea.
>> I'll investigate furthermore...  
>>>
>>> Did you already had a chance to figure out something regarding the
>>> issue you faced on your VL805 controller?
>>>
>>> I'm running the diff here since then on the Intel xHCI controller
>>> and couldn't re-produce any errors using different uvideo(4) and
>>> uaudio(4) devices.
>>>   
>>
>> No, yet -- all I know about this problem is VL805 genegates
>> many MISSED_SRV Transfer Event for Isoch-IN pipe.
>>
>> xhci0: slot 3 missed srv with 123 TRB
>>  :
>>
>> Even if I increase UVIDEO_IXFERS in uvideo.h to 6, HC still detects
>> MISSED_SRV. When I disable splitting TD, it works well.
>> I added printf paddr in the event TRB but each paddr of MISSED_SRV is
>> 0, that does not meet 4.10.3.2.
>> Parameters in this endpoint context are
>>
>> xhci0: slot 3 dci 3 ival 0 mps 1024 maxb 2 mep 3072 atl 3072 mult 0
>>
>> looks sane.
> 
> Hmm, I see.
> 
> I currently have also no idea what exactly is causing the missed service
> events.  I was reading a little bit yesterday about the VL805 and could
> find some statements where people say it's not fully compliant with the
> xHCI specs, and in Linux it took some cooperation with the vendor to
> make it work.
> 
> One thing I still wanted to ask you to understand whether the problem
> on your VL805 is only related with my last diff;  Are the multi-trb
> transfers working fine with your last diff on the VL805?

On VL805 ffplay plays the movie sometimes smoothly, sometimes laggy.
The multi-TRB transfer itself works on VL805 with your patch.
Not all splitted TD fail to transfer. Successful splitted transfer
works as intended.
I think MISSED_SRV is caused by other reason, maybe isochronous
scheduling problem.
Thus, IMO your patch can be committed.

VL805 also has same problem that AMD Bolton has.
It may generate the 1. TRB event w/ cc = SHORT_PKT and
remain = requested length (that is, transferred length = 0),
but the 2. TRB w/ cc = SUCCESS and remain = 0.
remain of 2. TRB should be given length, and CC should be SHORT_PKT.
Your patch fixes this problem.



Re: xhci(4) isoc: fix bogus handling of chained TRBs

2020-07-25 Thread sc . dying
On 2020/07/25 18:10, Marcus Glocker wrote:
> On Sun, Jul 19, 2020 at 02:12:21PM +, sc.dy...@gmail.com wrote:
> 
>> On 2020/07/19 11:25, Marcus Glocker wrote:
>>> On Sun, 19 Jul 2020 02:25:30 +
>>> sc.dy...@gmail.com wrote:
>>>
 hi,

 It works on AMD Bolton xHCI (78141022), Intel PCH (1e318086),
 and ASM1042 (10421b21).
 I simply play with ffplay -f v4l2 /dev/video0 to test.
>>>
>>> If your cam supports MJPEG it's good to add '-input_format mjpeg' with
>>> higher resolutions like 1280x720, because that will generated varying
>>> image sizes, which hit the 64k memory boundary more often, and thus
>>> generate potentially more chained TDs.
>>
>> Thank you for useful information.
>> My webcam supprots at most 640x480, but 1024 bytes/frame x (2+1) maxburst
>> x 40 frames = 122880 bytes/xfer is enough to observe TD fragmentation.
>>
>>
 At this moment it does not work on VL805, but I have no idea.
 I'll investigate furthermore...
> 
> Did you already had a chance to figure out something regarding the
> issue you faced on your VL805 controller?
> 
> I'm running the diff here since then on the Intel xHCI controller and
> couldn't re-produce any errors using different uvideo(4) and uaudio(4)
> devices.
> 

No, yet -- all I know about this problem is VL805 genegates
many MISSED_SRV Transfer Event for Isoch-IN pipe.

xhci0: slot 3 missed srv with 123 TRB
 :

Even if I increase UVIDEO_IXFERS in uvideo.h to 6, HC still detects
MISSED_SRV. When I disable splitting TD, it works well.
I added printf paddr in the event TRB but each paddr of MISSED_SRV is 0,
that does not meet 4.10.3.2.
Parameters in this endpoint context are

xhci0: slot 3 dci 3 ival 0 mps 1024 maxb 2 mep 3072 atl 3072 mult 0

looks sane.


>>> Oh dear :-) - Ok.
>>>  
 BTW I think xhci_ring_alloc should free ring dma buffer
 if trb_processed[] allocation fails.
>>>
>>> Ah right, I'll add the usbd_dma_contig_free() there - Thanks.
>>>

 On 2020/07/18 14:47, Marcus Glocker wrote:
> Hey,
>
> On Sat, 18 Jul 2020 00:14:32 +
> sc.dy...@gmail.com wrote:
>   
>> hi,
>>
>> On 2020/07/15 21:28, sc.dy...@gmail.com wrote:  
>>> hi,
>>>
>>> The patch works well on Intel PCH xhci, but not on AMD Bolton
>>> xHCI. I'll investigate this problem.
>>
>> Bolton xHCI sometimes raises following events.
>> (I added printf Completion Code.)
>>
>> #246 cc 13 remain 0 type 5 origlen 2048 frlengths[6] 2048
>> #247 cc 1 remain 0 type 1 origlen 1024 frlengths[6] 3072
>>
>> It seems this behavior violates the spec. 4.10.1.1.
>> When the remain of 1. TRB is 0, CC should be SUCCESS, not
>> SHORT_PKT, and HC should not raise interrupt.
>> The problem is how many length 2. TRB transferred.
>> If real CC of 1. TRB = SUCCESS, we should add 2. TRB length to
>> frlengths. If SHORT_PKT, we should keep frlengths as is.
>> Actually with the latter assumption I can see video smoothly w/o
>> errors.
>>
>> I changed your patch to skip adding to frlengths and actlen
>> if previous TRB is processed AND remain == 0.  
>
> Interesting!  Thanks for tracing this down.
> So in general we can say that for a chained TD, if the first TRB is
> SHORT, we can simply skip the second TRB.
>
> To make the code a bit more understandable, and use the new
> trb_processed flag to control this, I made another diff which keeps
> track whether the TRB was not processed, processed, or processed
> short.
>
> It also includes the malloc NULL check and replaces malloc by
> mallocarray as suggested by tb@.
>
> Does this diff also work for you on your AMD xhci?
>
>
> Index: xhci.c
> ===
> RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> retrieving revision 1.116
> diff -u -p -u -p -r1.116 xhci.c
> --- xhci.c30 Jun 2020 10:21:59 -  1.116
> +++ xhci.c18 Jul 2020 14:20:28 -
> @@ -82,7 +82,7 @@ voidxhci_event_xfer(struct xhci_softc *
>  int  xhci_event_xfer_generic(struct xhci_softc *, struct
> usbd_xfer *, struct xhci_pipe *, uint32_t, int, uint8_t, uint8_t,
> uint8_t); int xhci_event_xfer_isoc(struct usbd_xfer *,
> struct xhci_pipe *,
> - uint32_t, int);
> + uint32_t, int, uint8_t);
>  void xhci_event_command(struct xhci_softc *, uint64_t);
>  void xhci_event_port_change(struct xhci_softc *, uint64_t,
> uint32_t); intxhci_pipe_init(struct xhci_softc *, struct
> usbd_pipe *); @@ -800,7 +800,7 @@ xhci_event_xfer(struct xhci_softc
> *sc, u return;
>   break;
>   case UE_ISOCHRONOUS:
> - if (xhci_event_xfer_isoc(xfer, xp, remain,
> trb_idx))
> + if (xhci_event_xfer_isoc(xfer, xp, remain,
> trb_idx, code)) return;
>   break;
>   default:

Re: xhci(4) isoc: fix bogus handling of chained TRBs

2020-07-19 Thread sc . dying
On 2020/07/19 11:25, Marcus Glocker wrote:
> On Sun, 19 Jul 2020 02:25:30 +
> sc.dy...@gmail.com wrote:
> 
>> hi,
>>
>> It works on AMD Bolton xHCI (78141022), Intel PCH (1e318086),
>> and ASM1042 (10421b21).
>> I simply play with ffplay -f v4l2 /dev/video0 to test.
> 
> If your cam supports MJPEG it's good to add '-input_format mjpeg' with
> higher resolutions like 1280x720, because that will generated varying
> image sizes, which hit the 64k memory boundary more often, and thus
> generate potentially more chained TDs.

Thank you for useful information.
My webcam supprots at most 640x480, but 1024 bytes/frame x (2+1) maxburst
x 40 frames = 122880 bytes/xfer is enough to observe TD fragmentation.


>> At this moment it does not work on VL805, but I have no idea.
>> I'll investigate furthermore...
> 
> Oh dear :-) - Ok.
>  
>> BTW I think xhci_ring_alloc should free ring dma buffer
>> if trb_processed[] allocation fails.
> 
> Ah right, I'll add the usbd_dma_contig_free() there - Thanks.
> 
>>
>> On 2020/07/18 14:47, Marcus Glocker wrote:
>>> Hey,
>>>
>>> On Sat, 18 Jul 2020 00:14:32 +
>>> sc.dy...@gmail.com wrote:
>>>   
 hi,

 On 2020/07/15 21:28, sc.dy...@gmail.com wrote:  
> hi,
>
> The patch works well on Intel PCH xhci, but not on AMD Bolton
> xHCI. I'll investigate this problem.

 Bolton xHCI sometimes raises following events.
 (I added printf Completion Code.)

 #246 cc 13 remain 0 type 5 origlen 2048 frlengths[6] 2048
 #247 cc 1 remain 0 type 1 origlen 1024 frlengths[6] 3072

 It seems this behavior violates the spec. 4.10.1.1.
 When the remain of 1. TRB is 0, CC should be SUCCESS, not
 SHORT_PKT, and HC should not raise interrupt.
 The problem is how many length 2. TRB transferred.
 If real CC of 1. TRB = SUCCESS, we should add 2. TRB length to
 frlengths. If SHORT_PKT, we should keep frlengths as is.
 Actually with the latter assumption I can see video smoothly w/o
 errors.

 I changed your patch to skip adding to frlengths and actlen
 if previous TRB is processed AND remain == 0.  
>>>
>>> Interesting!  Thanks for tracing this down.
>>> So in general we can say that for a chained TD, if the first TRB is
>>> SHORT, we can simply skip the second TRB.
>>>
>>> To make the code a bit more understandable, and use the new
>>> trb_processed flag to control this, I made another diff which keeps
>>> track whether the TRB was not processed, processed, or processed
>>> short.
>>>
>>> It also includes the malloc NULL check and replaces malloc by
>>> mallocarray as suggested by tb@.
>>>
>>> Does this diff also work for you on your AMD xhci?
>>>
>>>
>>> Index: xhci.c
>>> ===
>>> RCS file: /cvs/src/sys/dev/usb/xhci.c,v
>>> retrieving revision 1.116
>>> diff -u -p -u -p -r1.116 xhci.c
>>> --- xhci.c  30 Jun 2020 10:21:59 -  1.116
>>> +++ xhci.c  18 Jul 2020 14:20:28 -
>>> @@ -82,7 +82,7 @@ void  xhci_event_xfer(struct xhci_softc *
>>>  intxhci_event_xfer_generic(struct xhci_softc *, struct
>>> usbd_xfer *, struct xhci_pipe *, uint32_t, int, uint8_t, uint8_t,
>>> uint8_t); int   xhci_event_xfer_isoc(struct usbd_xfer *,
>>> struct xhci_pipe *,
>>> -   uint32_t, int);
>>> +   uint32_t, int, uint8_t);
>>>  void   xhci_event_command(struct xhci_softc *, uint64_t);
>>>  void   xhci_event_port_change(struct xhci_softc *, uint64_t,
>>> uint32_t); int  xhci_pipe_init(struct xhci_softc *, struct
>>> usbd_pipe *); @@ -800,7 +800,7 @@ xhci_event_xfer(struct xhci_softc
>>> *sc, u return;
>>> break;
>>> case UE_ISOCHRONOUS:
>>> -   if (xhci_event_xfer_isoc(xfer, xp, remain,
>>> trb_idx))
>>> +   if (xhci_event_xfer_isoc(xfer, xp, remain,
>>> trb_idx, code)) return;
>>> break;
>>> default:
>>> @@ -927,13 +927,23 @@ xhci_event_xfer_generic(struct xhci_soft
>>>  
>>>  int
>>>  xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xhci_pipe *xp,
>>> -uint32_t remain, int trb_idx)
>>> +uint32_t remain, int trb_idx, uint8_t code)
>>>  {
>>> struct usbd_xfer *skipxfer;
>>> struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
>>> -   int trb0_idx, frame_idx = 0;
>>> +   int trb0_idx, frame_idx = 0, skip_trb = 0;
>>>  
>>> KASSERT(xx->index >= 0);
>>> +
>>> +   switch (code) {
>>> +   case XHCI_CODE_SHORT_XFER:
>>> +   xp->ring.trb_processed[trb_idx] =
>>> TRB_PROCESSED_SHORT;
>>> +   break;
>>> +   default:
>>> +   xp->ring.trb_processed[trb_idx] =
>>> TRB_PROCESSED_YES;
>>> +   break;
>>> +   }
>>> +
>>> trb0_idx =
>>> ((xx->index + xp->ring.ntrb) - xx->ntrb) %
>>> (xp->ring.ntrb - 1); 
>>> @@ -958,15 +968,21 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
>>> trb0_idx = xp->ring.ntrb - 2;
>>> else
>>> trb0_idx = trb_idx - 1;
>>> -   if 

Re: xhci(4) isoc: fix bogus handling of chained TRBs

2020-07-18 Thread sc . dying
hi,

It works on AMD Bolton xHCI (78141022), Intel PCH (1e318086),
and ASM1042 (10421b21).
I simply play with ffplay -f v4l2 /dev/video0 to test.
At this moment it does not work on VL805, but I have no idea.
I'll investigate furthermore...

BTW I think xhci_ring_alloc should free ring dma buffer
if trb_processed[] allocation fails.


On 2020/07/18 14:47, Marcus Glocker wrote:
> Hey,
> 
> On Sat, 18 Jul 2020 00:14:32 +
> sc.dy...@gmail.com wrote:
> 
>> hi,
>>
>> On 2020/07/15 21:28, sc.dy...@gmail.com wrote:
>>> hi,
>>>
>>> The patch works well on Intel PCH xhci, but not on AMD Bolton xHCI.
>>> I'll investigate this problem.  
>>
>> Bolton xHCI sometimes raises following events.
>> (I added printf Completion Code.)
>>
>> #246 cc 13 remain 0 type 5 origlen 2048 frlengths[6] 2048
>> #247 cc 1 remain 0 type 1 origlen 1024 frlengths[6] 3072
>>
>> It seems this behavior violates the spec. 4.10.1.1.
>> When the remain of 1. TRB is 0, CC should be SUCCESS, not SHORT_PKT,
>> and HC should not raise interrupt.
>> The problem is how many length 2. TRB transferred.
>> If real CC of 1. TRB = SUCCESS, we should add 2. TRB length to
>> frlengths. If SHORT_PKT, we should keep frlengths as is.
>> Actually with the latter assumption I can see video smoothly w/o
>> errors.
>>
>> I changed your patch to skip adding to frlengths and actlen
>> if previous TRB is processed AND remain == 0.
> 
> Interesting!  Thanks for tracing this down.
> So in general we can say that for a chained TD, if the first TRB is
> SHORT, we can simply skip the second TRB.
> 
> To make the code a bit more understandable, and use the new
> trb_processed flag to control this, I made another diff which keeps
> track whether the TRB was not processed, processed, or processed short.
> 
> It also includes the malloc NULL check and replaces malloc by
> mallocarray as suggested by tb@.
> 
> Does this diff also work for you on your AMD xhci?
> 
> 
> Index: xhci.c
> ===
> RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> retrieving revision 1.116
> diff -u -p -u -p -r1.116 xhci.c
> --- xhci.c30 Jun 2020 10:21:59 -  1.116
> +++ xhci.c18 Jul 2020 14:20:28 -
> @@ -82,7 +82,7 @@ voidxhci_event_xfer(struct xhci_softc *
>  int  xhci_event_xfer_generic(struct xhci_softc *, struct usbd_xfer *,
>   struct xhci_pipe *, uint32_t, int, uint8_t, uint8_t, uint8_t);
>  int  xhci_event_xfer_isoc(struct usbd_xfer *, struct xhci_pipe *,
> - uint32_t, int);
> + uint32_t, int, uint8_t);
>  void xhci_event_command(struct xhci_softc *, uint64_t);
>  void xhci_event_port_change(struct xhci_softc *, uint64_t, uint32_t);
>  int  xhci_pipe_init(struct xhci_softc *, struct usbd_pipe *);
> @@ -800,7 +800,7 @@ xhci_event_xfer(struct xhci_softc *sc, u
>   return;
>   break;
>   case UE_ISOCHRONOUS:
> - if (xhci_event_xfer_isoc(xfer, xp, remain, trb_idx))
> + if (xhci_event_xfer_isoc(xfer, xp, remain, trb_idx, code))
>   return;
>   break;
>   default:
> @@ -927,13 +927,23 @@ xhci_event_xfer_generic(struct xhci_soft
>  
>  int
>  xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xhci_pipe *xp,
> -uint32_t remain, int trb_idx)
> +uint32_t remain, int trb_idx, uint8_t code)
>  {
>   struct usbd_xfer *skipxfer;
>   struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
> - int trb0_idx, frame_idx = 0;
> + int trb0_idx, frame_idx = 0, skip_trb = 0;
>  
>   KASSERT(xx->index >= 0);
> +
> + switch (code) {
> + case XHCI_CODE_SHORT_XFER:
> + xp->ring.trb_processed[trb_idx] = TRB_PROCESSED_SHORT;
> + break;
> + default:
> + xp->ring.trb_processed[trb_idx] = TRB_PROCESSED_YES;
> + break;
> + }
> +
>   trb0_idx =
>   ((xx->index + xp->ring.ntrb) - xx->ntrb) % (xp->ring.ntrb - 1);
>  
> @@ -958,15 +968,21 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
>   trb0_idx = xp->ring.ntrb - 2;
>   else
>   trb0_idx = trb_idx - 1;
> - if (xfer->frlengths[frame_idx] == 0) {
> + if (xp->ring.trb_processed[trb0_idx] == TRB_PROCESSED_NO) {
>   xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32(
>   xp->ring.trbs[trb0_idx].trb_status));
> + } else if (xp->ring.trb_processed[trb0_idx] ==
> + TRB_PROCESSED_SHORT) {
> + skip_trb = 1;
>   }
>   }
>  
> - xfer->frlengths[frame_idx] +=
> - XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
> - xfer->actlen += xfer->frlengths[frame_idx];
> + if (!skip_trb) {
> + xfer->frlengths[frame_idx] +=
> + XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) -
> + remain;
> + xfer->actlen += 

Re: xhci(4) isoc: fix bogus handling of chained TRBs

2020-07-17 Thread sc . dying
hi,

On 2020/07/15 21:28, sc.dy...@gmail.com wrote:
> hi,
> 
> The patch works well on Intel PCH xhci, but not on AMD Bolton xHCI.
> I'll investigate this problem.

Bolton xHCI sometimes raises following events.
(I added printf Completion Code.)

#246 cc 13 remain 0 type 5 origlen 2048 frlengths[6] 2048
#247 cc 1 remain 0 type 1 origlen 1024 frlengths[6] 3072

It seems this behavior violates the spec. 4.10.1.1.
When the remain of 1. TRB is 0, CC should be SUCCESS, not SHORT_PKT,
and HC should not raise interrupt.
The problem is how many length 2. TRB transferred.
If real CC of 1. TRB = SUCCESS, we should add 2. TRB length to frlengths.
If SHORT_PKT, we should keep frlengths as is.
Actually with the latter assumption I can see video smoothly w/o errors.

I changed your patch to skip adding to frlengths and actlen
if previous TRB is processed AND remain == 0.


--- sys/dev/usb/xhci.c.orig Sun Apr  5 10:12:37 2020
+++ sys/dev/usb/xhci.c  Fri Jul 17 23:30:30 2020
@@ -931,7 +931,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xh
 {
struct usbd_xfer *skipxfer;
struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
-   int trb0_idx, frame_idx = 0;
+   int trb0_idx, frame_idx = 0, prev_trb_processed = 0;
 
KASSERT(xx->index >= 0);
trb0_idx =
@@ -945,6 +945,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xh
if (trb0_idx++ == (xp->ring.ntrb - 1))
trb0_idx = 0;
}
+   xp->ring.trb_processed[trb_idx] = 1;
 
/*
 * If we queued two TRBs for a frame and this is the second TRB,
@@ -958,15 +959,19 @@ xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xh
trb0_idx = xp->ring.ntrb - 2;
else
trb0_idx = trb_idx - 1;
-   if (xfer->frlengths[frame_idx] == 0) {
+   prev_trb_processed = xp->ring.trb_processed[trb0_idx];
+   if (prev_trb_processed == 0) {
xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32(
xp->ring.trbs[trb0_idx].trb_status));
}
}
 
-   xfer->frlengths[frame_idx] +=
-   XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
-   xfer->actlen += xfer->frlengths[frame_idx];
+   if (!(prev_trb_processed != 0 && remain == 0)) {
+   xfer->frlengths[frame_idx] +=
+   XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) -
+   remain;
+   xfer->actlen += xfer->frlengths[frame_idx];
+   }
 
if (xx->index != trb_idx)
return (1);
@@ -1696,6 +1701,9 @@ xhci_ring_alloc(struct xhci_softc *sc, struct xhci_rin
 
ring->ntrb = ntrb;
 
+   ring->trb_processed = malloc(ring->ntrb * sizeof(*ring->trb_processed),
+   M_DEVBUF, M_NOWAIT);
+
xhci_ring_reset(sc, ring);
 
return (0);
@@ -1704,6 +1712,8 @@ xhci_ring_alloc(struct xhci_softc *sc, struct xhci_rin
 void
 xhci_ring_free(struct xhci_softc *sc, struct xhci_ring *ring)
 {
+   free(ring->trb_processed, M_DEVBUF,
+   ring->ntrb * sizeof(*ring->trb_processed));
usbd_dma_contig_free(>sc_bus, >dma);
 }
 
@@ -1715,6 +1725,8 @@ xhci_ring_reset(struct xhci_softc *sc, struct xhci_rin
size = ring->ntrb * sizeof(struct xhci_trb);
 
memset(ring->trbs, 0, size);
+   memset(ring->trb_processed, 0,
+   ring->ntrb * sizeof(*ring->trb_processed));
 
ring->index = 0;
ring->toggle = XHCI_TRB_CYCLE;
@@ -1799,6 +1811,8 @@ xhci_ring_produce(struct xhci_softc *sc, struct xhci_r
ring->index = 0;
ring->toggle ^= 1;
}
+
+   ring->trb_processed[(trb - >trbs[0])] = 0;
 
return (trb);
 }



Re: xhci(4) isoc: fix bogus handling of chained TRBs

2020-07-15 Thread sc . dying
hi,

The patch works well on Intel PCH xhci, but not on AMD Bolton xHCI.
I'll investigate this problem.

BTW please check if malloc returns NULL.


On 2020/07/13 09:59, Marcus Glocker wrote:
> On Sat, 11 Jul 2020 23:43:43 +0200
> Marcus Glocker  wrote:
> 
>> On Sat, 11 Jul 2020 20:14:03 +0200
>> Marcus Glocker  wrote:
>>
>>> On Sat, 11 Jul 2020 11:56:38 +0200
>>> Marcus Glocker  wrote:
>>>   
 On Fri, 10 Jul 2020 14:32:47 +0200
 Marcus Glocker  wrote:
 
> On Fri, 10 Jul 2020 14:19:00 +0200
> Patrick Wildt  wrote:
>   
>> On Fri, Jul 10, 2020 at 01:00:31PM +0200, Marcus Glocker
>> wrote: 
>>> Hi All,
>>>
>>> Laurence Tratt was reporting about corrupted video images
>>> when using uvideo(4) on xhci(4) with MJPEG and higher
>>> resolutions, e.g. when using ffplay:
>>>
>>> $ ffplay -f v4l2 -input_format mjpeg -video_size 1280x720 -f
>>> /dev/video1
>>>
>>> When trying to re-produce the issue on my side, the video
>>> images were still OK, but I also could see a lot of errors
>>> returned by ffplay related to corrupted MJPEG data.
>>>
>>> When debugging the error I noticed that the corruption only
>>> happens when TRBs are get chained due to hitting a 64k
>>> boundary, and therefore require to queue up two TRBs.  So,
>>> what happened?
>>>
>>> In xhci.c:xhci_event_xfer_isoc() we do the accounting of the
>>> processed, chained TD:
>>>
>>> /*
>>>  * If we queued two TRBs for a frame and this is the second
>>> TRB,
>>>  * check if the first TRB needs accounting since it might
>>> not have
>>>  * raised an interrupt in case of full data received.
>>>  */
>>>
>>> If there was a zero length transfer with such a TD, and the
>>> 2. TRB receives the interrupt with len=0, it assumes that
>>> the 1. TRB was processed (but it wasn't), and adds the
>>> requested 1. TRB length to actlen, passing back that data
>>> has been received, which is obviously wrong, resulting in
>>> the seen data corruptions.
>>>
>>> Instead, when the 2. TRB receives the IOC interrupt with
>>> len=0, we need to ignore the 1. TRB, considering this was a
>>> zero length transfer.
>>>
>>> With the below diff Laurent is getting a clear video image
>>> now with high resolutions, and I could remove all the output
>>> errors from ffplay in my case.  I also tested with an
>>> uaudio(4) device.
>>>
>>> Feedback, more testers, OKs?
>>>
>>> Thanks,
>>> Marcus
>>>   
>>
>> What if the first TRB was filled completely, but the second
>> TRB transferred 0?  Wouldn't we then need to add 1. TRB?  I
>> think your diff wouldn't do that.
>
> Right - That was the only case I was thinking about as well
> which wouldn't get handled at the moment, but I wasn't sure if
> it can be a case at all (according to your comment it can :-)
>   
>> I think what we actually need is some "processed" flag.  If
>> you look on bugs@ or so, there's a diff that tries to fix the
>> same issue by adding a "CHAINED" or so flag.  I'm not sure if
>> that's the right way.
>>
>> But, anyway, I think we need a way to say "this TRB got an
>> interrupt" or "this TRB was processed".

 Sorry, I just did see the "xhci: zero length multi-TRB inbound
 xfer does not work" mail thread now.

 Honestly, at the moment I also don't know what's the right way to
 implement this.  I need to understand the approach from sc.dyings
 diff a bit better first.
>>>
>>> Ok, I think sc.dyings diff is going in to the right direction but I
>>> agree with Patrick that we should track the TRB processed status per
>>> TRB.
>>>
>>> How's that as a start (for isoc)?  
>>
>> A slightly re-worked diff which moves the trb_processed=false
>> initialization to xhci_ring_produce().
>> I couldn't manage to move the setting of trb_processed=true to
>> xhci_ring_consume().  Maybe at one point we need to introduce a new
>> xhci_ring_* function/macro to do this which is getting called by the
>> according TRB processing functions?
> 
> Updated diff including feedback from mpi@ that we don't want to
> introduce 'bool' in the kernel.
> 
> 
> Index: xhci.c
> ===
> RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> retrieving revision 1.116
> diff -u -p -r1.116 xhci.c
> --- xhci.c30 Jun 2020 10:21:59 -  1.116
> +++ xhci.c13 Jul 2020 09:49:23 -
> @@ -945,6 +945,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
>   if (trb0_idx++ == (xp->ring.ntrb - 1))
>   trb0_idx = 0;
>   }
> + xp->ring.trb_processed[trb_idx] = 1;
>  
>   /*
>* If we queued two TRBs for a frame and this is the second TRB,
> @@ -958,7 +959,7 @@ xhci_event_xfer_isoc(struct usbd_xfer *x
>  

Re: xhci: zero length multi-TRB inbound xfer does not work

2020-06-24 Thread sc . dying
On 2020/06/24 11:57, Patrick Wildt wrote:
> On Tue, Jun 16, 2020 at 06:55:27AM +, sc.dy...@gmail.com wrote:
>> hi,
>>
>> The function xhci_event_xfer_isoc() of sys/dev/usb/xhci.c at line 954
>> does not work with zero length multi-TRB inbound transfer.
>>
>>949   /*
>>950* If we queued two TRBs for a frame and this is the 
>> second TRB,
>>951* check if the first TRB needs accounting since it 
>> might not have
>>952* raised an interrupt in case of full data received.
>>953*/
>>954   if ((letoh32(xp->ring.trbs[trb_idx].trb_flags) & 
>> XHCI_TRB_TYPE_MASK) ==
>>955   XHCI_TRB_TYPE_NORMAL) {
>>956   frame_idx--;
>>957   if (trb_idx == 0)
>>958   trb0_idx = xp->ring.ntrb - 2;
>>959   else
>>960   trb0_idx = trb_idx - 1;
>>961   if (xfer->frlengths[frame_idx] == 0) {
>>962   xfer->frlengths[frame_idx] = 
>> XHCI_TRB_LEN(letoh32(
>>963   
>> xp->ring.trbs[trb0_idx].trb_status));
>>964   }
>>965   }
>>966   
>>967   xfer->frlengths[frame_idx] +=
>>968   
>> XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
>>969   xfer->actlen += xfer->frlengths[frame_idx];
>>
>> When a multi-TRB inbound transfer TD completes with transfer length = 0,
>> the HC should generate two events: 1st event for ISOCH TRB /w ISP|CHAIN
>> and 2nd event for NORMAL TRB w/ ISP|IOC.
>> Transfer Length field (it's remain length, actually) of each event is
>> same as requested length is, i.e., transferred length is 0.
>> So when the first event raises the frlengths is set to 0 at line 967.
>> It's correct.
>> On second event, as the comment describes, xhci.c tries to calculate
>> the 1st TRB xfer length at lines 954-965. The requested length of
>> 1st TRB is stored into frlengths -- even though the xfer len is 0.
>>
>> If frlengths = 0, we cannot distinguish the case the first event is
>> not raised from the case the transferred length is 0.
>> The frlengths is already 0 so the requested length of 1st TRB is stored.
> 
> That's a really good find!  I actually do wonder if we could have the
> same issue with the non-isoc transfers, when the first TRB throws a
> short and then we get another event for the last TRB.

It may occur on non-isoc pipes and should be fixed.
I have not observed problems on non-isoc pipes, but it's because
the devices I saw did not do zero-length xfer on non-isoc pipes,
I guess.

> 
> Maybe it would make sense to record the idx of the last TRB that we have
> received an event for?  Then we could check if we already processed that
> TRB.

At first I recorded such an index and compared with current idx, but
I noticed it would be expressed as a bit.
Do you have any plan to use the software's dequeue pointer for other
purposes?

> 
> Patrick
> 
>> For example, I applied debug printf [*1], and run
>> mplayer tv:// for my webcam.
>> I see...
>>
>> #25 remain 1024 type 5 origlen 1024 frlengths[25] 0
>> #25 (omitted) frlen[25] 1024
>> #26 remain 2048 type 1 origlen 2048 frlengths[25] 1024
>>
>> These console logs show a 3072 bytes frame is splitted into
>> two TRBs and got 0 bytes. The first TRB transfers 0 bytes and
>> the second TRB transfers 0, too, but it results 1024 bytes.
>>
>> My proposal patch [*2] adds a flag to xhci_xfer that indicates the
>> TRB processed by xhci.c previously has CHAIN bit, and updates the
>> frlengths only when that flag is not set.
>>
>>
>>
>> [*1]
>> debug printf.
>> It shows only splitted isochronous TDs.
>>
>> --- sys/dev/usb/xhci.c.orig  Sun Apr  5 10:12:37 2020
>> +++ sys/dev/usb/xhci.c   Fri May 29 04:13:36 2020
>> @@ -961,12 +961,23 @@ xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xh
>>  if (xfer->frlengths[frame_idx] == 0) {
>>  xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32(
>>  xp->ring.trbs[trb0_idx].trb_status));
>> +printf("#%d (omitted) frlen[%d] %u\n",
>> +trb0_idx, frame_idx, xfer->frlengths[frame_idx]);
>>  }
>>  }
>>  
>>  xfer->frlengths[frame_idx] +=
>>  XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
>>  xfer->actlen += xfer->frlengths[frame_idx];
>> +uint32_t trb_flags = letoh32(xp->ring.trbs[trb_idx].trb_flags);
>> +if ((trb_flags & XHCI_TRB_CHAIN) ||
>> +(trb_flags & XHCI_TRB_TYPE_MASK) == XHCI_TRB_TYPE_NORMAL) {
>> +printf("#%d remain %u type %u origlen %u frlengths[%d] %hu\n",
>> +trb_idx, remain,
>> +XHCI_TRB_TYPE(trb_flags),
>> +

xhci: zero length multi-TRB inbound xfer does not work

2020-06-16 Thread sc . dying
hi,

The function xhci_event_xfer_isoc() of sys/dev/usb/xhci.c at line 954
does not work with zero length multi-TRB inbound transfer.

   949  /*
   950   * If we queued two TRBs for a frame and this is the second TRB,
   951   * check if the first TRB needs accounting since it might not 
have
   952   * raised an interrupt in case of full data received.
   953   */
   954  if ((letoh32(xp->ring.trbs[trb_idx].trb_flags) & 
XHCI_TRB_TYPE_MASK) ==
   955  XHCI_TRB_TYPE_NORMAL) {
   956  frame_idx--;
   957  if (trb_idx == 0)
   958  trb0_idx = xp->ring.ntrb - 2;
   959  else
   960  trb0_idx = trb_idx - 1;
   961  if (xfer->frlengths[frame_idx] == 0) {
   962  xfer->frlengths[frame_idx] = 
XHCI_TRB_LEN(letoh32(
   963  xp->ring.trbs[trb0_idx].trb_status));
   964  }
   965  }
   966  
   967  xfer->frlengths[frame_idx] +=
   968  XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - 
remain;
   969  xfer->actlen += xfer->frlengths[frame_idx];

When a multi-TRB inbound transfer TD completes with transfer length = 0,
the HC should generate two events: 1st event for ISOCH TRB /w ISP|CHAIN
and 2nd event for NORMAL TRB w/ ISP|IOC.
Transfer Length field (it's remain length, actually) of each event is
same as requested length is, i.e., transferred length is 0.
So when the first event raises the frlengths is set to 0 at line 967.
It's correct.
On second event, as the comment describes, xhci.c tries to calculate
the 1st TRB xfer length at lines 954-965. The requested length of
1st TRB is stored into frlengths -- even though the xfer len is 0.

If frlengths = 0, we cannot distinguish the case the first event is
not raised from the case the transferred length is 0.
The frlengths is already 0 so the requested length of 1st TRB is stored.

For example, I applied debug printf [*1], and run
mplayer tv:// for my webcam.
I see...

#25 remain 1024 type 5 origlen 1024 frlengths[25] 0
#25 (omitted) frlen[25] 1024
#26 remain 2048 type 1 origlen 2048 frlengths[25] 1024

These console logs show a 3072 bytes frame is splitted into
two TRBs and got 0 bytes. The first TRB transfers 0 bytes and
the second TRB transfers 0, too, but it results 1024 bytes.

My proposal patch [*2] adds a flag to xhci_xfer that indicates the
TRB processed by xhci.c previously has CHAIN bit, and updates the
frlengths only when that flag is not set.



[*1]
debug printf.
It shows only splitted isochronous TDs.

--- sys/dev/usb/xhci.c.orig Sun Apr  5 10:12:37 2020
+++ sys/dev/usb/xhci.c  Fri May 29 04:13:36 2020
@@ -961,12 +961,23 @@ xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xh
if (xfer->frlengths[frame_idx] == 0) {
xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32(
xp->ring.trbs[trb0_idx].trb_status));
+   printf("#%d (omitted) frlen[%d] %u\n",
+   trb0_idx, frame_idx, xfer->frlengths[frame_idx]);
}
}
 
xfer->frlengths[frame_idx] +=
XHCI_TRB_LEN(letoh32(xp->ring.trbs[trb_idx].trb_status)) - remain;
xfer->actlen += xfer->frlengths[frame_idx];
+   uint32_t trb_flags = letoh32(xp->ring.trbs[trb_idx].trb_flags);
+   if ((trb_flags & XHCI_TRB_CHAIN) ||
+   (trb_flags & XHCI_TRB_TYPE_MASK) == XHCI_TRB_TYPE_NORMAL) {
+   printf("#%d remain %u type %u origlen %u frlengths[%d] %hu\n",
+   trb_idx, remain,
+   XHCI_TRB_TYPE(trb_flags),
+   XHCI_TRB_LEN(le32toh(xp->ring.trbs[trb_idx].trb_status)),
+   frame_idx, xfer->frlengths[frame_idx]);
+   }
 
if (xx->index != trb_idx)
return (1);

[*2]
patch

--- sys/dev/usb/xhcivar.h.orig  Sun Oct  6 21:19:28 2019
+++ sys/dev/usb/xhcivar.h   Fri May 22 04:19:57 2020
@@ -40,6 +40,7 @@ struct xhci_xfer {
struct usbd_xfer xfer;
int  index; /* Index of the last TRB */
size_t   ntrb;  /* Number of associated TRBs */
+   bool trb_chained;
 };
 
 struct xhci_ring {
--- sys/dev/usb/xhci.c.orig Sun Apr  5 10:12:37 2020
+++ sys/dev/usb/xhci.c  Thu Jun  4 05:39:03 2020
@@ -958,15 +958,23 @@ xhci_event_xfer_isoc(struct usbd_xfer *xfer, struct xh
trb0_idx = xp->ring.ntrb - 2;
else
trb0_idx = trb_idx - 1;
-   if (xfer->frlengths[frame_idx] == 0) {
+   if (xfer->frlengths[frame_idx] == 0 && !xx->trb_chained) {
xfer->frlengths[frame_idx] = XHCI_TRB_LEN(letoh32(
xp->ring.trbs[trb0_idx].trb_status));
}

[patch] xhci: Context Entries initialization fix

2019-12-09 Thread sc . dying
Hello,

My uplcom(4) does not work correctly with Etron EJ168 xhci.
It is attached correctly, but cannot be opened.
If it is attached to other xHCI or EHCI, it works.

When ucom is opened, bulk-in endpoint is configured at first, then
bulk-out one is configured. The former has DCI=7 and the latter has
DCI=4 (see below).  Most of xHCIs allow to use this DCI value as a
Context Entries in the Slot Context.  But Etron EJ168 does not allow,
that is, it requires the Context Entries shall be "Maximum DCI of
configured endpoint contexts", as the specification 4.5.2 suggests.
Otherwise it will generate Parameter Error(17).
In my uplcom case, software should configure the bulk-in endpoint with
Context Entries=7, and the bulk-out endpoint with Context Entries=7.


xhci2 at pci5 dev 0 function 0 "Etron EJ168 xHCI" rev 0x01: msi, xHCI 1.0
xhci2: CAPLENGTH=0x20
xhci2: DOORBELL=0x3000
xhci2: RUNTIME=0x2000
xhci2: 64 bytes context
xhci2: supported page size 0x0001
xhci2: 4 ports and 64 slots
xhci2: 4 scratch pages, ETE=0, IST=0x7
usb2 at xhci2: USB revision 3.0
uhub2 at usb2 configuration 1 interface 0 "Etron xHCI root hub" rev 3.00/1.00 
addr 1
xhci2: DCBAAP=00xda0d5000
xhci2: CRCR=00 (da0d6000)
xhci2: ERSTBA=00xd9f3e000
xhci2: ERDP=00xda0d7000
xhci2: USBCMD=0x5
xhci2: IMAN=0x2

# xhci2: port=2 change=0x04
xhci2: port=2 change=0x04
xhci2: xhci_cmd_slot_control
xhci2: dev 1, input=0xfd80dbe0 slot=0xfd80dbe00040 
ep0=0xfd80dbe00080
xhci2: dev 1, setting DCBAA to 0xdbe01000
xhci_pipe_init: pipe=0x8050b000 addr=0 depth=1 port=2 speed=2 dev 1 dci 
1 (epAddr=0x0)
xhci2: xhci_cmd_set_address BSR=1
xhci2: xhci_cmd_set_address BSR=0
xhci2: dev 1 addr 1
uplcom0 at uhub2 port 2 configuration 1 interface 0 "Prolific Technology Inc. 
USB-Serial Controller" rev 1.10/3.00 addr 2
ucom0 at uplcom0

# cu -s115200 -lttyU0
xhci_pipe_init: pipe=0x8081a000 addr=2 depth=1 port=2 speed=2 dev 1 dci 
7 (epAddr=0x83)
xhci2: xhci_cmd_configure_ep dev 1
xhci_pipe_init: pipe=0x8081b000 addr=2 depth=1 port=2 speed=2 dev 1 dci 
4 (epAddr=0x2)
xhci2: xhci_cmd_configure_ep dev 1
xhci2: event error code=17, result=33
trb=0x800022376430 (0xda0d6040 0x1100 0x1008401)
xhci2: xhci_cmd_slot_control
xhci2: xhci_cmd_configure_ep dev 1
xhci2: event error code=11, result=33
trb=0x800022376530 (0xda0d6060 0x0b00 0x1008401)
xhci2: error clearing ep (7)
cu: open("/dev/ttyU0"): Input/output error


To fix the problem, set the maximum value between the DCI of the last
valid Endpoint Context and the DCI to be configured to the DCI to be
configured.


--- sys/dev/usb/xhci.c.orig Wed Dec  4 22:32:43 2019
+++ sys/dev/usb/xhci.c  Sun Dec  8 03:24:00 2019
@@ -1330,7 +1330,7 @@ xhci_pipe_maxburst(struct usbd_pipe *pipe)
 int
 xhci_context_setup(struct xhci_softc *sc, struct usbd_pipe *pipe)
 {
-   struct xhci_pipe *xp = (struct xhci_pipe *)pipe;
+   struct xhci_pipe *lxp, *xp = (struct xhci_pipe *)pipe;
struct xhci_soft_dev *sdev = >sc_sdevs[xp->slot];
usb_endpoint_descriptor_t *ed = pipe->endpoint->edesc;
uint32_t mps = UGETW(ed->wMaxPacketSize);
@@ -1338,6 +1338,7 @@ xhci_context_setup(struct xhci_softc *sc, struct usbd_
uint8_t speed, cerr = 0;
uint32_t route = 0, rhport = 0;
struct usbd_device *hub;
+   int i;
 
/*
 * Calculate the Route String.  Assume that there is no hub with
@@ -1393,9 +1394,16 @@ xhci_context_setup(struct xhci_softc *sc, struct usbd_
sdev->input_ctx->drop_flags = 0;
sdev->input_ctx->add_flags = htole32(XHCI_INCTX_MASK_DCI(xp->dci));
 
+   /* Find the last valid Endpoint Context */
+   for (i = 30; i >= 0; i--) {
+   lxp = sdev->pipes[i];
+   if (lxp != NULL && lxp != xp)
+   break;
+   }
+
/* Setup the slot context */
sdev->slot_ctx->info_lo = htole32(
-   XHCI_SCTX_DCI(xp->dci) | XHCI_SCTX_SPEED(speed) |
+   XHCI_SCTX_DCI(max(lxp->dci, xp->dci)) | XHCI_SCTX_SPEED(speed) |
XHCI_SCTX_ROUTE(route)
);
sdev->slot_ctx->info_hi = htole32(XHCI_SCTX_RHPORT(rhport));




Re: XHCI support for bulk xfers >64k

2019-11-21 Thread sc . dying
On 2019/11/21 07:59, Patrick Wildt wrote:
> On Thu, Nov 21, 2019 at 06:52:36AM +, sc.dy...@gmail.com wrote:
>> Hi,
>>
>> On 2019/11/18 20:34, Marcus Glocker wrote:
>>> On Sun, Nov 17, 2019 at 12:36:49PM +0100, Marcus Glocker wrote:
>>>
 While recently testing UDL on XHCI I faced a lot of USB timeouts, and
 therefore screen rendering issues.
 The reason is that XHCI currently only supports single bulk transfers
 up to 64k, while UDL can schedule a bulk transfer up to 1m.

 The following diff adds XHCI bulk transfer support >64k and makes UDL
 work fine for me on XHCI.

 mpi@ already did an initial re-view, and I have adapted some of his
 comments already in this diff.

 More testing and comments welcome.
>>>
>>> After patrick@ did feedback that the current bulk transfer code in
>>> XHCI should also work for larger transfers as is, we have found the
>>> bug which resulted in a much smaller diff :-)
>>>
>>> Attached for your reference.  It just has been committed.
>>>
>>>
>>> Index: xhci.c
>>> ===
>>> RCS file: /cvs/src/sys/dev/usb/xhci.c,v
>>> retrieving revision 1.106
>>> diff -u -p -u -p -r1.106 xhci.c
>>> --- xhci.c  6 Oct 2019 17:30:00 -   1.106
>>> +++ xhci.c  18 Nov 2019 19:32:30 -
>>> @@ -2871,7 +2871,7 @@ xhci_device_generic_start(struct usbd_xf
>>> /* If the buffer crosses a 64k boundary, we need one more. */
>>> len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
>>> if (len < xfer->length)
>>> -   ntrb++;
>>> +   ntrb = howmany(xfer->length - len, XHCI_TRB_MAXSIZE) + 1;
>>> else
>>> len = xfer->length;
>>>  
>>>
>>
>> Should xhci_device_isoc_start also count up the chunks after boundary?
>>
> 
> We had discussed this as well, but since a frame cannot be bigger than
> 64k, a hard boundary which is visible when you realize frlengths is a
> uint16_t *, that change would essentially be a no-op.

Oh, I missed it.
Thank you for explanation.
I see, xhci_event_xfer_isoc does not need fix.

> 
> We could change it for the sake of having the same calculation in all
> places.
> 
> Thanks,
> Patrick
> 
>>
>> --- sys/dev/usb/xhci.c.orig  Mon Nov 18 22:47:25 2019
>> +++ sys/dev/usb/xhci.c   Thu Nov 21 05:13:02 2019
>> @@ -3049,7 +3049,8 @@ xhci_device_isoc_start(struct usbd_xfer *xfer)
>>  /* If the buffer crosses a 64k boundary, we need one more. */
>>  len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
>>  if (len < xfer->frlengths[i])
>> -ntrb++;
>> +ntrb = howmany(xfer->frlengths[i] - len,
>> +XHCI_TRB_MAXSIZE) + 1;
>>  
>>  paddr += xfer->frlengths[i];
>>  }
>> @@ -3066,7 +3067,8 @@ xhci_device_isoc_start(struct usbd_xfer *xfer)
>>  /* If the buffer crosses a 64k boundary, we need one more. */
>>  len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
>>  if (len < xfer->frlengths[i])
>> -ntrb++;
>> +ntrb = howmany(xfer->frlengths[i] - len,
>> +XHCI_TRB_MAXSIZE) + 1;
>>  else
>>  len = xfer->frlengths[i];
>>  
>>



Re: XHCI support for bulk xfers >64k

2019-11-20 Thread sc . dying
Hi,

On 2019/11/18 20:34, Marcus Glocker wrote:
> On Sun, Nov 17, 2019 at 12:36:49PM +0100, Marcus Glocker wrote:
> 
>> While recently testing UDL on XHCI I faced a lot of USB timeouts, and
>> therefore screen rendering issues.
>> The reason is that XHCI currently only supports single bulk transfers
>> up to 64k, while UDL can schedule a bulk transfer up to 1m.
>>
>> The following diff adds XHCI bulk transfer support >64k and makes UDL
>> work fine for me on XHCI.
>>
>> mpi@ already did an initial re-view, and I have adapted some of his
>> comments already in this diff.
>>
>> More testing and comments welcome.
> 
> After patrick@ did feedback that the current bulk transfer code in
> XHCI should also work for larger transfers as is, we have found the
> bug which resulted in a much smaller diff :-)
> 
> Attached for your reference.  It just has been committed.
> 
> 
> Index: xhci.c
> ===
> RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> retrieving revision 1.106
> diff -u -p -u -p -r1.106 xhci.c
> --- xhci.c6 Oct 2019 17:30:00 -   1.106
> +++ xhci.c18 Nov 2019 19:32:30 -
> @@ -2871,7 +2871,7 @@ xhci_device_generic_start(struct usbd_xf
>   /* If the buffer crosses a 64k boundary, we need one more. */
>   len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
>   if (len < xfer->length)
> - ntrb++;
> + ntrb = howmany(xfer->length - len, XHCI_TRB_MAXSIZE) + 1;
>   else
>   len = xfer->length;
>  
> 

Should xhci_device_isoc_start also count up the chunks after boundary?


--- sys/dev/usb/xhci.c.orig Mon Nov 18 22:47:25 2019
+++ sys/dev/usb/xhci.c  Thu Nov 21 05:13:02 2019
@@ -3049,7 +3049,8 @@ xhci_device_isoc_start(struct usbd_xfer *xfer)
/* If the buffer crosses a 64k boundary, we need one more. */
len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
if (len < xfer->frlengths[i])
-   ntrb++;
+   ntrb = howmany(xfer->frlengths[i] - len,
+   XHCI_TRB_MAXSIZE) + 1;
 
paddr += xfer->frlengths[i];
}
@@ -3066,7 +3067,8 @@ xhci_device_isoc_start(struct usbd_xfer *xfer)
/* If the buffer crosses a 64k boundary, we need one more. */
len = XHCI_TRB_MAXSIZE - (paddr & (XHCI_TRB_MAXSIZE - 1));
if (len < xfer->frlengths[i])
-   ntrb++;
+   ntrb = howmany(xfer->frlengths[i] - len,
+   XHCI_TRB_MAXSIZE) + 1;
else
len = xfer->frlengths[i];
 



Re: patch axen(4) (WIP)

2019-06-16 Thread sc . dying
hi,

On 2019/02/26 14:00, Nils Frohberg wrote:
> On Mon, Feb 25, 2019 at 03:50:48PM -0300, Martin Pieuchot wrote:
>> On 25/02/19(Mon) 14:52, Nils Frohberg wrote:
[snip]
>>> I didn't get around to much testing/debugging lately, therefore I
>>> wanted to share the current state (diff below).
>>>
>>> The current state works a lot better than previously (for me). I
>>> used to have a huge amount of ierrs (aprrox. 1 ierr per ipkt) and
>>> often no packets would be transferred at all (or stop being transferred
>>> after some time).
>>
>> Do you know why?  What were the problems?
> 
> I'm not 100% sure, since I did a lot of back and forth. It finally
> got better once I disabled EEE and lowered the watermark levels.

I run my axen + your [1-7]/7 patches on amd64 with ehci,
that is, on high-speed.
When I ping to another host, it drops 20-30% of receiving packets.
It looks like the txeof does its job but the rxeof is not called
when packets arrive.
(debug messages and tcpdump on another host says so).

I checked each difference of diffs and found out that the initial
bufsz value in attach causes the drops.
I reverted like as following diff, it works correctly without drops
even the BUFSZ of HS is lower than bufsz of qctrl of HS.

Thoughts?


--- sys/dev/usb/if_axenreg.hSun Jun 16 00:48:48 2019
+++ sys/dev/usb/if_axenreg.hSun Jun 16 00:49:24 2019
@@ -13,7 +13,7 @@
 #define AXEN_MCAST_FILTER_SIZE 8
 /* unit: KB */
 #define AXEN_BUFSZ_LS  0x18
-#define AXEN_BUFSZ_HS  0x16
+#define AXEN_BUFSZ_HS  0x10
 #define AXEN_BUFSZ_SS  0x12
 #define AXEN_BUFSZ_MAX AXEN_BUFSZ_LS
 
--- sys/dev/usb/if_axen.c   Wed Jun 12 12:52:01 2019
+++ sys/dev/usb/if_axen.c   Sun Jun 16 00:50:08 2019
@@ -690,8 +695,10 @@ axen_attach(struct device *parent, struct device *self
 * this will be updated once the link state changes
 */
switch (sc->axen_udev->speed) {
-   case USB_SPEED_FULL:
case USB_SPEED_HIGH:
+   sc->axen_bufsz = AXEN_BUFSZ_HS * 1024;
+   break;
+   case USB_SPEED_FULL:
case USB_SPEED_SUPER:
/* linux adds 2 to the buffer size (why?) */
sc->axen_bufsz = (AXEN_BUFSZ_MAX + 2) * 1024;


>>> This box hosts backups (rsync and TimeMachine), so it gets its fair
>>> share of traffic. I could reduce the ierrs to ~2100 with 5d uptime,
>>> and the packets keep flowing:
>>>
>>> $ netstat -niI axen0
>>> NameMtu   Network Address  Ipkts IerrsOpkts Oerrs 
>>> Colls
>>> axen0   150094:c6:91:1f:85:a5 141199356  2099 112289969 0 
>>> 0
>>> $ uptime
>>>  7:15PM  up 5 days,  9:57, 3 users, load averages: 0.11, 0.13, 0.14
>>> $ 
>>>
>>> But there a still a few problems:
>>>
>>> 1) There are still ierrs. These happen when the rx path can't
>>> allocate mbufs (cf. diff below for DPRINTF):
>>> axen0: could not allocate rx mbuf (2 total, 2 remaining)
>>> axen0: could not allocate rx mbuf (3 total, 3 remaining)
>>> axen0: could not allocate rx mbuf (2 total, 2 remaining)
>>> axen0: could not allocate rx mbuf (1 total, 1 remaining)
>>
>> Look at the pools when this happen, what do you see?  What is the size
>> of `pkt_len' when this happen?
> 
> I added pkt_len to the DPRINTF so that I can check the relevant
> pools once I see this error.
> 
>>> 2) If the adapter is plugged into a USB 3 port at boot, it will
>>> return 0x42 when aked for AXEN_PHYSICAL_LINK_STATUS, ie.
>>> (AXEN_EPHY_1000|AXEN_USB_HS). The adapter most often then simply
>>> doesn't receive packets. If I plug in the adapter after boot is
>>> complete, 0x44 is returned (AXEN_EPHY_1000|AXEN_USB_SS), as expected.
>>> I'm not sure if I'm still missing something in init (probably) or
>>> if this results from something higher in the stack (xhci?).
>>> (Didn't test USB 2 or lower.)
>>
>> Do you see any difference in 'usbdevs -vv' output during the two cases? 
> 
> Yes, high speed vs super speed and power draw:
> 
> # Plugged in after boot:
> 
> $ usbdevs -vv
> Controller /dev/usb0:
> addr 01: 8086: Intel, xHCI root hub
>super speed, self powered, config 1, rev 1.00
>driver: uhub0
>port 01: 0001.02a0 power Rx.detect
>port 02: 0011.02a0 power Rx.detect
>port 03: .0503 connect enabled recovery
>port 04: .0503 connect enabled recovery
>port 05: .02a0 power Rx.detect
>port 06: .02a0 power Rx.detect
>port 07: .02a0 power Rx.detect
>port 08: .0103 connect enabled recovery
>port 09: .02a0 power Rx.detect
>port 10: .0203 connect enabled power U0
>port 11: .0203 connect enabled power U0
>port 12: .0203 connect enabled power U0
>port 13: .02a0 power Rx.detect
>port 14: .02a0 power Rx.detect
>port 15: .02a0 power Rx.detect
> addr 04: 0bc2:ab44 Seagate, Backup+ Hub
>high speed, self powered, config 1, rev 48.85, iSerial 

Re: patch axen(4) (WIP)

2019-06-16 Thread sc . dying
hi,

On 2019/02/28 06:53, Nils Frohberg wrote:
> On Tue, Feb 26, 2019 at 08:57:57PM +0200, Artturi Alm wrote:
>> On Tue, Feb 26, 2019 at 03:00:15PM +0100, Nils Frohberg wrote:
[snip]
>> Have you looked at what NetBSD has done with their axen(4)? there has
>> been 20commits in 2019 so far[0], while some of them are possibly,
>> idk., useless to us(thinking about hw checksum offloading), there was
>> some bug fixes that did look relevant to me, but i succesfully
>> installed kernels on nfs over axen(4) a couple of weeks ago,
>> so the bugs it has didn't feel critical enough for me to make
>> a branch for them. that was on arm64/dwctwo(4), tbh. i haven't been
>> happy with axen(4) on amd64/{e,x}hci(4) myself in the past either. :]
> 
> I wrote the diff last December. I looked at NetBSD's code back then, 
> but they didn't have any significant changes.
> 
> At a cursory glance, many changes are similar to mine. But there are 
> a few things that should be worth looking at.

Do you see the link status LED of axen is down when ifconfig it up/down?
The attached patch should fix that bug.
I applied the diff of NetBSD's if_axen.c between rev 1.20 and 1.21

https://github.com/NetBSD/src/commit/4603809d1167ab5cc2d8d35f95722aec85789fd0#diff-4748c3be63ca566d4a07520285f031a9

to if_axen.c of 6.5-current with [1-7]/7 patches.

The log reads:
> Enable AXEN_RXCTL_START bit only when RX is ready. Otherwise,
> the adapter eventually falls into "no carrier" state while it
> is not running.

I reverted 3/7 partially to keep previous value of AXEN_MAC_RXCTL reg,
so that RXCTL_START is set only in axen_init and is reset in axen_stop.

axen_reset does not initialise the chip any more.
It does nothing but delay.
The chip init, axen_ax88179_init, is called once while attach.


--- sys/dev/usb/if_axen.c   Wed Jun 12 12:52:01 2019
+++ sys/dev/usb/if_axen.c   Thu Jun 13 09:19:32 2019
@@ -421,7 +421,10 @@ axen_iff(struct axen_softc *sc)
 
/* Enable receiver, set RX mode */
axen_lock_mii(sc);
-   rxmode = AXEN_RXCTL_DROPCRCERR | AXEN_RXCTL_START;
+   axen_cmd(sc, AXEN_CMD_MAC_READ2, 2, AXEN_MAC_RXCTL, );
+   rxmode = UGETW(wval);
+   rxmode &= ~(AXEN_RXCTL_ACPT_ALL_MCAST | AXEN_RXCTL_PROMISC |
+   AXEN_RXCTL_ACPT_MCAST);
ifp->if_flags &= ~IFF_ALLMULTI;
 
/*
@@ -476,8 +479,6 @@ axen_reset(struct axen_softc *sc)
/* Wait a little while for the chip to get its brains in order. */
DELAY(1000);
 
-   axen_ax88179_init(sc);
-
return;
 }
 
@@ -486,7 +487,7 @@ axen_ax88179_init(struct axen_softc *sc)
 {
uWord   wval;
uByte   val;
-   u_int16_t   ctl, temp;
+   u_int16_t   ctl, rxmode, temp;
 
DPRINTFN(2,("%s: %s: enter\n", sc->axen_dev.dv_xname,__func__));
 
@@ -565,6 +566,10 @@ axen_ax88179_init(struct axen_softc *sc)
axen_cmd(sc, AXEN_CMD_MAC_WRITE, 1, AXEN_PAUSE_HIGH_WATERMARK, );
 
/* Set RX/TX configuration. */
+   rxmode = AXEN_RXCTL_DROPCRCERR;
+   USETW(wval, rxmode);
+   axen_cmd(sc, AXEN_CMD_MAC_WRITE2, 2, AXEN_MAC_RXCTL, );
+
 #ifdef AXEN_TOE
/* enable offloading */
val = AXEN_RXCOE_IPv4 |
@@ -1566,7 +1571,7 @@ axen_stop(struct axen_softc *sc)
usbd_status err;
struct ifnet*ifp;
int i;
-   u_int16_t   ctl;
+   u_int16_t   ctl, rxmode;
uWord   wval;
 
DPRINTFN(2,("%s: %s: enter\n", sc->axen_dev.dv_xname,__func__));
@@ -1578,7 +1583,13 @@ axen_stop(struct axen_softc *sc)
 
timeout_del(>axen_stat_ch);
 
-   /* disable receive */
+   /* Disable receiver, set RX mode */
+   axen_cmd(sc, AXEN_CMD_MAC_READ2, 2, AXEN_MAC_RXCTL, );
+   rxmode = UGETW();
+   rxmode &= ~AXEN_RXCTL_START;
+   USETW(wval, rxmode);
+   axen_cmd(sc, AXEN_CMD_MAC_WRITE2, 2, AXEN_MAC_RXCTL, );
+
axen_cmd(sc, AXEN_CMD_MAC_READ2, 2, AXEN_MEDIUM_STATUS, );
ctl = UGETW(wval);
ctl &= ~AXEN_MEDIUM_RECV_EN;


>> I guess i'm trying to say maybe it wouldn't hurt to sync a bit before
>> deviating as much as atleast your whole WIP diff did. I haven't read
>> your separate patches yet, but i'll try to get around to also testing
>> those before weekend:]
> 
> The separate patches are (more or less) the big patch split up via 
> $(git add -p). There are a few things that might be worth looking 
> into, such as the pause water levels, enabling EEE, axen_bulk_size 
> values, buffer sizes, ...
> 
> More testing would be great. Especially since this is the only box 
> I have where I can attach it to xhci.
> 
>> -Artturi
>>
>> [0] https://github.com/NetBSD/src/commits/trunk/sys/dev/usb/if_axen.c
>>
> 



[patch] print xhci version in hex

2019-02-01 Thread sc . dying
Hi,

The HCIVERSION register represents the BCD encoded HCI version number,
so it should be printed with %x.
For example, ver `0.150' should be printed as `0.96'.

--- /sys/dev/usb/xhci.c Sun Sep  9 10:40:35 2018
+++ /sys/dev/usb/xhci.c Fri Jan 25 09:05:09 2019
@@ -296,7 +296,7 @@ xhci_init(struct xhci_softc *sc)
sc->sc_runt_off = XREAD4(sc, XHCI_RTSOFF);
 
sc->sc_version = XREAD2(sc, XHCI_HCIVERSION);
-   printf(", xHCI %u.%u\n", sc->sc_version >> 8, sc->sc_version & 0xff);
+   printf(", xHCI %x.%x\n", sc->sc_version >> 8, sc->sc_version & 0xff);
 
 #ifdef XHCI_DEBUG
printf("%s: CAPLENGTH=%#lx\n", DEVNAME(sc), sc->sc_oper_off);



Re: [PATCH] ure improvement

2017-07-27 Thread sc dying
On 2017/07/27 12:22, Martin Pieuchot wrote:
> On 25/07/17(Tue) 00:30, sc dying wrote:
>> On 2017/07/24 14:35, Martin Pieuchot wrote:
>>> [...]
>>> Here's a diff to try and play with.  I'd guess the problem is in
>>> pipe_close().  Which state has the EP you're closing?  Do we close
>>> the EP correctly?
>>
>> EP 3 is RX pipe, EP 4 is TX pipe.
>> Looks good to me.
>
> Does that mean it works?  Do you still see a RXERROR with it?

Still get IOERROR.

>>> - /* Mask the endpoint */
>>> + /* Stop the endpoint */
>>> + bus_dmamap_sync(sdev->ictx_dma.tag, sdev->ictx_dma.map, 0,
>>> +sc->sc_pagesize, BUS_DMASYNC_POSTREAD);
>>> + state = letoh32(XHCI_EPCTX_STATE(sdev->ep_ctx[xp->dci - 1]->info_lo));
>>
>> Do you wanna see output context (device context, 6.2.1)?
>
> I'm not sure why would I, could you elaborate?
>
> I though the Endpoint Context, sec. 6.2.3, is enough.  Updated diff with
> a corrected letoh32().

The input context gives the data to xHC from system software, and
device context (aka output context) gives the data to software from xHC.
We will peek the endpoint state modified by xHC, so that we should see
octx_dma.
xhci_setaddr is a good example. It peeks the state of slot context
in debug code.

>
> Index: xhci.c
> ===
> RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> retrieving revision 1.73
> diff -u -p -r1.73 xhci.c
> --- xhci.c 22 Jun 2017 02:44:37 - 1.73
> +++ xhci.c 27 Jul 2017 12:21:54 -
> @@ -1251,13 +1251,22 @@ xhci_pipe_close(struct usbd_pipe *pipe)
>   struct xhci_softc *sc = (struct xhci_softc *)pipe->device->bus;
>   struct xhci_pipe *lxp, *xp = (struct xhci_pipe *)pipe;
>   struct xhci_soft_dev *sdev = >sc_sdevs[xp->slot];
> + uint32_t state;
>   int i;
>
>   /* Root Hub */
>   if (pipe->device->depth == 0)
>   return;
>
> - /* Mask the endpoint */
> + /* If required, stop the endpoint */
> + bus_dmamap_sync(sdev->ictx_dma.tag, sdev->ictx_dma.map, 0,
> +sc->sc_pagesize, BUS_DMASYNC_POSTREAD);
> + state = XHCI_EPCTX_STATE(letoh32(sdev->ep_ctx[xp->dci - 1]->info_lo));
> + printf("%s: EP %d state=%x\n", DEVNAME(sc), xp->dci, state);
> + if (state != XHCI_EP_STOPPED && xhci_cmd_stop_ep(sc, xp->slot, xp->dci))
> + DPRINTF(("%s: error stopping ep (%d)\n", DEVNAME(sc), xp->dci));
> +
> + /* Mask it */
>   sdev->input_ctx->drop_flags = htole32(XHCI_INCTX_MASK_DCI(xp->dci));
>   sdev->input_ctx->add_flags = 0;
>
>



Re: [PATCH] ure improvement

2017-07-21 Thread sc dying
On 2017/07/20 08:31, Martin Pieuchot wrote:
> On 18/07/17(Tue) 13:43, sc dying wrote:
>> On 2017/07/18 09:12, Martin Pieuchot wrote:
>>> On 17/07/17(Mon) 15:24, sc dying wrote:
>>>> On 2017/07/17 08:24, Martin Pieuchot wrote:
>>>>> On 15/07/17(Sat) 21:16, sc dying wrote:
>>>>>> - Call usbd_set_config before configuring endpoints in ure_init to fix
>>>>>>an error when re-opening pipes.  I grabbed the code from if_kue.c.
>>>>>
>>>>> Which error? Calling usbd_set_config() should be avoid as much as
>>>>> possible in driver code.
>>>>
>>>> Without patch, ure on usb-3 port says this error
>>>>
>>>> ure0: usb errors on rx: IOERROR
>>>>
>>>> when down the interface and it up.
>>>
>>> This is certainly a bug in the way pipe are closed, or more likely in
>>> xhci(4).  Can you reproduce the problem on ehci(4)?  Do you find
>>> anything useful if you define XHCI_DEBUG in your kernel?
>>
>> This problem is not seen on ehci.
>>
>> On xhci with XHCI_DEBUG, curiously, it does not happen.
>> I'll look into this.
>
> Great, I committed your diff without this chunk.  I'd be glad to hear
> from you if you find the problem with xhci(4).

Thank you for applying the patch.

About IOERROR, it happens even if XHCI_DEBUG is defined.
It happens on usb 3 mode port, but does not on usb 2
even if it is xhci's port.

All I did for test are
1) Plug-in the device
2) ifconfig ure0 inet6 fe80::1
3) ping6 from other host to fe80::1
4) ifconfig ure0 down
5) ifconfig ure0 up
6) ping6 again

I forgot step 3) to check if the device replies to ping.

If the device does not receive any packets, it can be re-up
without errors and works correctly after interface down and up.
However, once it receives a packet, re-opening bulk-in pipe for RX
immediately fails with TXERR (event_xfer translates it to IOERROR).

xhci 4.6.6 says closing all pipes transitions the device state
from Configured to Addressed.
I thought it is needed to issue configure_ep and SET_CONFIG to transition
the device to Configured state correcly again as described in 4.6.6,
but this does not explain why the device works on usb 2.



Re: [PATCH] ure improvement

2017-07-18 Thread sc dying
On 2017/07/18 09:12, Martin Pieuchot wrote:
> On 17/07/17(Mon) 15:24, sc dying wrote:
>> On 2017/07/17 08:24, Martin Pieuchot wrote:
>>> On 15/07/17(Sat) 21:16, sc dying wrote:
>>>> - Call usbd_set_config before configuring endpoints in ure_init to fix
>>>>an error when re-opening pipes.  I grabbed the code from if_kue.c.
>>>
>>> Which error? Calling usbd_set_config() should be avoid as much as
>>> possible in driver code.
>>
>> Without patch, ure on usb-3 port says this error
>>
>> ure0: usb errors on rx: IOERROR
>>
>> when down the interface and it up.
>
> This is certainly a bug in the way pipe are closed, or more likely in
> xhci(4).  Can you reproduce the problem on ehci(4)?  Do you find
> anything useful if you define XHCI_DEBUG in your kernel?

This problem is not seen on ehci.

On xhci with XHCI_DEBUG, curiously, it does not happen.
I'll look into this.



Re: [PATCH] ure improvement

2017-07-17 Thread sc dying
On 2017/07/17 08:24, Martin Pieuchot wrote:
> On 15/07/17(Sat) 21:16, sc dying wrote:
>> Hi,
>>
>> This patch does:
>>
>> - Enable RX aggregation.
>
> Does it work on all chips?

I don't have all, but it works with mine that have RTL8152 (ver 4c10)
and RTL8153 (ver 5c20).

>
>> - Fix RX packet buffer alignment, using roundup() macro in sys/params.h.
>
> Why is that needed?  pktlen should already be aligned, no?

If RX_AGG is enabled, multiple packets may be received in a transaction
and the chip puts each packet on 8 bytes boundary.
For the first packet buf points at aligned buffer, but for second packet
the buf + pktlen might points at unaligned pointer as pktlen is the
actual length of packet. It should be rounded up.

>
>> - Call usbd_set_config before configuring endpoints in ure_init to fix
>>an error when re-opening pipes.  I grabbed the code from if_kue.c.
>
> Which error? Calling usbd_set_config() should be avoid as much as
> possible in driver code.

Without patch, ure on usb-3 port says this error

ure0: usb errors on rx: IOERROR

when down the interface and it up.

>
>> - Make the chip recognize given MAC address.
>
> Nice
>
>> - Remove ure_reset in ure_init, becasue its already called from ure_stop.
>
> Your diff do not apply, please check your mail setup.

I attached the patch. gmail will encode it with base64.
Sorry about that.
--- sys/dev/usb/if_ure.cWed May  3 22:20:15 2017
+++ sys/dev/usb/if_ure.cMon Jun 19 09:11:09 2017
@@ -470,8 +470,6 @@ ure_init(void *xsc)
/* Cancel pending I/O. */
ure_stop(sc);
 
-   ure_reset(sc);
-
if (ure_rx_list_init(sc) == ENOBUFS) {
printf("%s: rx list init failed\n", sc->ure_dev.dv_xname);
splx(s);
@@ -484,9 +482,18 @@ ure_init(void *xsc)
return;
}
 
+#define URE_CONFIG_NO 1
+   if (usbd_set_config_no(sc->ure_udev, URE_CONFIG_NO, 1) ||
+   usbd_device2interface_handle(sc->ure_udev, URE_IFACE_IDX,
+>ure_iface))
+   printf("%s: set_config failed\n", sc->ure_dev.dv_xname);
+   usbd_delay_ms(sc->ure_udev, 10);
+
/* Set MAC address. */
+   ure_write_1(sc, URE_PLA_CRWECR, URE_MCU_TYPE_PLA, URE_CRWECR_CONFIG);
ure_write_mem(sc, URE_PLA_IDR, URE_MCU_TYPE_PLA | URE_BYTE_EN_SIX_BYTES,
sc->ure_ac.ac_enaddr, 8);
+   ure_write_1(sc, URE_PLA_CRWECR, URE_MCU_TYPE_PLA, URE_CRWECR_NORAML);
 
/* Reset the packet filter. */
ure_write_2(sc, URE_PLA_FMC, URE_MCU_TYPE_PLA,
@@ -683,10 +690,10 @@ ure_rtl8152_init(struct ure_softc *sc)
URE_GPHY_STS_MSK | URE_SPEED_DOWN_MSK | URE_SPDWN_RXDV_MSK |
URE_SPDWN_LINKCHG_MSK);
 
-   /* Disable Rx aggregation. */
+   /* Enable Rx aggregation. */
ure_write_2(sc, URE_USB_USB_CTRL, URE_MCU_TYPE_USB,
-   ure_read_2(sc, URE_USB_USB_CTRL, URE_MCU_TYPE_USB) |
-   URE_RX_AGG_DISABLE);
+   ure_read_2(sc, URE_USB_USB_CTRL, URE_MCU_TYPE_USB) &
+   ~URE_RX_AGG_DISABLE);
 
/* Disable ALDPS. */
ure_ocp_reg_write(sc, URE_OCP_ALDPS_CONFIG, URE_ENPDNPS | URE_LINKENA |
@@ -835,10 +842,10 @@ ure_rtl8153_init(struct ure_softc *sc)
 
ure_init_fifo(sc);
 
-   /* Disable Rx aggregation. */
+   /* Enable Rx aggregation. */
ure_write_2(sc, URE_USB_USB_CTRL, URE_MCU_TYPE_USB,
-   ure_read_2(sc, URE_USB_USB_CTRL, URE_MCU_TYPE_USB) |
-   URE_RX_AGG_DISABLE);
+   ure_read_2(sc, URE_USB_USB_CTRL, URE_MCU_TYPE_USB) &
+   ~URE_RX_AGG_DISABLE);
 
val = ure_read_2(sc, URE_USB_U2P3_CTRL, URE_MCU_TYPE_USB);
if (!(sc->ure_chip & (URE_CHIP_VER_5C00 | URE_CHIP_VER_5C10)))
@@ -1289,7 +1296,7 @@ ure_rxeof(struct usbd_xfer *xfer, void *priv, usbd_sta
goto done;
}
 
-   buf += pktlen;
+   buf += roundup(pktlen, 8);
 
memcpy(, buf, sizeof(rxhdr));
total_len -= sizeof(rxhdr);
@@ -1302,7 +1309,7 @@ ure_rxeof(struct usbd_xfer *xfer, void *priv, usbd_sta
goto done;
}
 
-   total_len -= pktlen;
+   total_len -= roundup(pktlen, 8);
buf += sizeof(rxhdr);
 
m = m_devget(buf, pktlen, ETHER_ALIGN);


[PATCH] axen improvement

2017-07-15 Thread sc dying
Hi,

This patch does:
- Fix axen won't work after interface down.
- Fix definitions and comments that differ from linux ones.

regards,

--- sys/dev/usb/if_axen.c Fri Mar  3 15:04:52 2017
+++ sys/dev/usb/if_axen.c Mon Jun 19 16:05:09 2017
@@ -1256,6 +1256,14 @@ axen_init(void *xsc)
  */
  axen_reset(sc);

+#define AXEN_CONFIG_NO 1
+#define AXEN_IFACE_IDX 0
+ if (usbd_set_config_no(sc->axen_udev, AXEN_CONFIG_NO, 1) ||
+usbd_device2interface_handle(sc->axen_udev, AXEN_IFACE_IDX,
+ >axen_iface))
+ printf("%s: set_config failed\n", sc->axen_dev.dv_xname);
+ usbd_delay_ms(sc->axen_udev, 10);
+
  /* XXX: ? */
  bval = 0x01;
  axen_lock_mii(sc);
--- sys/dev/usb/if_axenreg.h Fri Sep 16 22:17:07 2016
+++ sys/dev/usb/if_axenreg.h Mon Jun 19 10:54:28 2017
@@ -26,8 +26,8 @@
   * || ++-L3_type (1:ipv4, 0/2:ipv6)
   *pkt_len(13)  || ||+ ++-L4_type(0: icmp, 1: UDP, 4: TCP)
   * |765|43210 76543210|7654 3210 7654 3210|
- *  ||+-crc_err  |+-L4_err |+-L4_CSUM_ERR
- *  |+-mii_err   +--L3_err +--L3_CSUM_ERR
+ *  ||+-crc_err   |+-L4_err |+-L4_CSUM_ERR
+ *  |+-mii_err+--L3_err +--L3_CSUM_ERR
   *  +-drop_err
   *
   * ex) pkt_hdr 0x00680820
@@ -70,7 +70,7 @@
  #define   AXEN_RXHDR_L4_TYPE_TCP 0x4

  /* L3 packet type (2bit) */
-#define AXEN_RXHDR_L3_TYPE_MASK 0x0600
+#define AXEN_RXHDR_L3_TYPE_MASK 0x0060
  #define AXEN_RXHDR_L3_TYPE_OFFSET 5
  #define   AXEN_RXHDR_L3_TYPE_UNDEF 0x0
  #define   AXEN_RXHDR_L3_TYPE_IPV4 0x1



[PATCH] ure improvement

2017-07-15 Thread sc dying
Hi,

This patch does:

- Enable RX aggregation.
- Fix RX packet buffer alignment, using roundup() macro in sys/params.h.
- Call usbd_set_config before configuring endpoints in ure_init to fix
   an error when re-opening pipes.  I grabbed the code from if_kue.c.
- Make the chip recognize given MAC address.
- Remove ure_reset in ure_init, becasue its already called from ure_stop.

Regards,

--- sys/dev/usb/if_ure.c Wed May  3 22:20:15 2017
+++ sys/dev/usb/if_ure.c Mon Jun 19 09:11:09 2017
@@ -470,8 +470,6 @@ ure_init(void *xsc)
  /* Cancel pending I/O. */
  ure_stop(sc);

- ure_reset(sc);
-
  if (ure_rx_list_init(sc) == ENOBUFS) {
  printf("%s: rx list init failed\n", sc->ure_dev.dv_xname);
  splx(s);
@@ -484,9 +482,18 @@ ure_init(void *xsc)
  return;
  }

+#define URE_CONFIG_NO 1
+ if (usbd_set_config_no(sc->ure_udev, URE_CONFIG_NO, 1) ||
+usbd_device2interface_handle(sc->ure_udev, URE_IFACE_IDX,
+ >ure_iface))
+ printf("%s: set_config failed\n", sc->ure_dev.dv_xname);
+ usbd_delay_ms(sc->ure_udev, 10);
+
  /* Set MAC address. */
+ ure_write_1(sc, URE_PLA_CRWECR, URE_MCU_TYPE_PLA, URE_CRWECR_CONFIG);
  ure_write_mem(sc, URE_PLA_IDR, URE_MCU_TYPE_PLA | URE_BYTE_EN_SIX_BYTES,
 sc->ure_ac.ac_enaddr, 8);
+ ure_write_1(sc, URE_PLA_CRWECR, URE_MCU_TYPE_PLA, URE_CRWECR_NORAML);

  /* Reset the packet filter. */
  ure_write_2(sc, URE_PLA_FMC, URE_MCU_TYPE_PLA,
@@ -683,10 +690,10 @@ ure_rtl8152_init(struct ure_softc *sc)
 URE_GPHY_STS_MSK | URE_SPEED_DOWN_MSK | URE_SPDWN_RXDV_MSK |
 URE_SPDWN_LINKCHG_MSK);

- /* Disable Rx aggregation. */
+ /* Enable Rx aggregation. */
  ure_write_2(sc, URE_USB_USB_CTRL, URE_MCU_TYPE_USB,
-ure_read_2(sc, URE_USB_USB_CTRL, URE_MCU_TYPE_USB) |
-URE_RX_AGG_DISABLE);
+ure_read_2(sc, URE_USB_USB_CTRL, URE_MCU_TYPE_USB) &
+~URE_RX_AGG_DISABLE);

  /* Disable ALDPS. */
  ure_ocp_reg_write(sc, URE_OCP_ALDPS_CONFIG, URE_ENPDNPS | URE_LINKENA |
@@ -835,10 +842,10 @@ ure_rtl8153_init(struct ure_softc *sc)

  ure_init_fifo(sc);

- /* Disable Rx aggregation. */
+ /* Enable Rx aggregation. */
  ure_write_2(sc, URE_USB_USB_CTRL, URE_MCU_TYPE_USB,
-ure_read_2(sc, URE_USB_USB_CTRL, URE_MCU_TYPE_USB) |
-URE_RX_AGG_DISABLE);
+ure_read_2(sc, URE_USB_USB_CTRL, URE_MCU_TYPE_USB) &
+~URE_RX_AGG_DISABLE);

  val = ure_read_2(sc, URE_USB_U2P3_CTRL, URE_MCU_TYPE_USB);
  if (!(sc->ure_chip & (URE_CHIP_VER_5C00 | URE_CHIP_VER_5C10)))
@@ -1289,7 +1296,7 @@ ure_rxeof(struct usbd_xfer *xfer, void *priv, usbd_sta
  goto done;
  }

- buf += pktlen;
+ buf += roundup(pktlen, 8);

  memcpy(, buf, sizeof(rxhdr));
  total_len -= sizeof(rxhdr);
@@ -1302,7 +1309,7 @@ ure_rxeof(struct usbd_xfer *xfer, void *priv, usbd_sta
  goto done;
  }

- total_len -= pktlen;
+ total_len -= roundup(pktlen, 8);
  buf += sizeof(rxhdr);

  m = m_devget(buf, pktlen, ETHER_ALIGN);