Re: relayd patch for websocket upgrade

2021-10-19 Thread Jonathon Fletcher
On Sun, May 02, 2021 at 11:05:16AM -0700, Jonathon Fletcher wrote:
> On Sun, Mar 07, 2021 at 06:22:04PM -0800, Jonathon Fletcher wrote:
> > On Sun, Mar 07, 2021 at 06:46:33PM +0100, Marcus MERIGHI wrote:
> > > Hello Jonathon!
> > > 
> > > welcome to the party:
> > > 
> > > https://marc.info/?t=15833439123
> > > 
> > > especially the two comments by sthen@:
> > > 
> > > https://marc.info/?m=161349608614743
> > > https://marc.info/?m=16135019371
> > > 
> > > reyk@ removed from CC: on purpose: 
> > > https://twitter.com/reykfloeter/status/1284868070901776384
> > > 
> > > Marcus
> > > 
> > > jonathon.fletc...@gmail.com (Jonathon Fletcher), 2021.03.06 (Sat) 21:02 
> > > (CET):
> > > > When relayd relays a connection upgrade to a websocket, it relays
> > > > the outbound "Connection: Upgrade" header from the interal server.
> > > > 
> > > > It also tags on a "Connection: close" header to the outbound
> > > > response - ie the response goes out with two "Connection"
> > > > header lines.
> > > > 
> > > > Chrome and Netscape work despite the double upgrade/close connection 
> > > > headers. Safari fails.
> > > > 
> > > > Small patch below against 6.8 to only send the "Connection: close"
> > > > header if we are not handling a http_status 101.
> > > > 
> > > > Thanks,
> > > > Jonathon
> > > > 
> > > > 
> > > > cvs -q -d /cvs diff -ub -rOPENBSD_6_8 usr.sbin/relayd/relay_http.c
> > > > 
> > > > 
> 
> snip
> 
> > Marcus,
> > 
> > I did not realize that there was already a party. Apologies for my
> > previous, duplicate, patch.
> > 
> > Reading through the thread, I came to the conclusion that the comments
> > worried that the previous patch(es) were not specific enough.
> > 
> > The current relayd behaviour is that outbound http responses have a
> > "Connection: close" header/value attached to them by relayd.
> > This can result in multiple "Connection" headers in the response
> > which is .. not good.
> > 
> > The current behaviour is because relayd does not handle repeated http
> > request/response sequences after the first one and prefers to force the
> > http session to close. Fortunately for websockets, the protocol after
> > the websocket upgrade is not http and so there is no need for relayd
> > to look for or process http headers after the upgrade.
> > 
> > Here is an updated patch. This avoids the incorrect current in-tree
> > behaviour in the following specific sitations:
> > 
> > 1: The headers for an outbound (internal -> external) response already
> >include "Connection: Upgrade", "Upgrade: websocket" and the config
> >permits websocket upgrades, or
> > 
> > 2: The headers for an outbound response already include a Connection
> >header with the value "close" - so do not send a duplicate as the
> >in-tree code currently does.
> > 
> > I think this is specfic enough for now. In order for a websocket upgrade
> > to work the external client has to request it and the internal server 
> > has to respond in agreement.
> > 
> > I am explicit about websocket upgrades in my configs: I require the
> > "Connection" and "Upgrade" headers in the rule that directs traffic
> > to the websocket pool:
> > 
> > 
> > pass request quick \
> > header "Host" value "ws.example.com" \
> > header "Connection" value "Upgrade" \
> > header "Upgrade" value "websocket" \
> > forward to 
> > 
> > 
> > This is for my use cases (tls accelerator) and relayd is adept at
> > handling them. I really appreciate the functionality of relayd in base.
> > 
> > Let me know if there are specific concerns about the patch below or
> > if there are specific preferred ways to accomplish better compliance
> > with the RFC within the context of relayd.
> > 
> > Thanks,
> > Jonathon
> > 
> > The Connection header is covered in:
> > 
> > https://tools.ietf.org/html/rfc7230#section-6
> > 
> 
> Here is the same relayd websocket upgrade patch as above, but
> against OPENBSD_6_9.
> 
> Thanks,
> Jonathon

Hi, here is the

Re: relayd patch for websocket upgrade

2021-05-02 Thread Jonathon Fletcher
On Sun, Mar 07, 2021 at 06:22:04PM -0800, Jonathon Fletcher wrote:
> On Sun, Mar 07, 2021 at 06:46:33PM +0100, Marcus MERIGHI wrote:
> > Hello Jonathon!
> > 
> > welcome to the party:
> > 
> > https://marc.info/?t=15833439123
> > 
> > especially the two comments by sthen@:
> > 
> > https://marc.info/?m=161349608614743
> > https://marc.info/?m=16135019371
> > 
> > reyk@ removed from CC: on purpose: 
> > https://twitter.com/reykfloeter/status/1284868070901776384
> > 
> > Marcus
> > 
> > jonathon.fletc...@gmail.com (Jonathon Fletcher), 2021.03.06 (Sat) 21:02 
> > (CET):
> > > When relayd relays a connection upgrade to a websocket, it relays
> > > the outbound "Connection: Upgrade" header from the interal server.
> > > 
> > > It also tags on a "Connection: close" header to the outbound
> > > response - ie the response goes out with two "Connection"
> > > header lines.
> > > 
> > > Chrome and Netscape work despite the double upgrade/close connection 
> > > headers. Safari fails.
> > > 
> > > Small patch below against 6.8 to only send the "Connection: close"
> > > header if we are not handling a http_status 101.
> > > 
> > > Thanks,
> > > Jonathon
> > > 
> > > 
> > > cvs -q -d /cvs diff -ub -rOPENBSD_6_8 usr.sbin/relayd/relay_http.c
> > > 
> > > 

snip

> Marcus,
> 
> I did not realize that there was already a party. Apologies for my
> previous, duplicate, patch.
> 
> Reading through the thread, I came to the conclusion that the comments
> worried that the previous patch(es) were not specific enough.
> 
> The current relayd behaviour is that outbound http responses have a
> "Connection: close" header/value attached to them by relayd.
> This can result in multiple "Connection" headers in the response
> which is .. not good.
> 
> The current behaviour is because relayd does not handle repeated http
> request/response sequences after the first one and prefers to force the
> http session to close. Fortunately for websockets, the protocol after
> the websocket upgrade is not http and so there is no need for relayd
> to look for or process http headers after the upgrade.
> 
> Here is an updated patch. This avoids the incorrect current in-tree
> behaviour in the following specific sitations:
> 
> 1: The headers for an outbound (internal -> external) response already
>include "Connection: Upgrade", "Upgrade: websocket" and the config
>permits websocket upgrades, or
> 
> 2: The headers for an outbound response already include a Connection
>header with the value "close" - so do not send a duplicate as the
>in-tree code currently does.
> 
> I think this is specfic enough for now. In order for a websocket upgrade
> to work the external client has to request it and the internal server 
> has to respond in agreement.
> 
> I am explicit about websocket upgrades in my configs: I require the
> "Connection" and "Upgrade" headers in the rule that directs traffic
> to the websocket pool:
> 
> 
> pass request quick \
> header "Host" value "ws.example.com" \
> header "Connection" value "Upgrade" \
> header "Upgrade" value "websocket" \
> forward to 
> 
> 
> This is for my use cases (tls accelerator) and relayd is adept at
> handling them. I really appreciate the functionality of relayd in base.
> 
> Let me know if there are specific concerns about the patch below or
> if there are specific preferred ways to accomplish better compliance
> with the RFC within the context of relayd.
> 
> Thanks,
> Jonathon
> 
> The Connection header is covered in:
> 
> https://tools.ietf.org/html/rfc7230#section-6
> 

Here is the same relayd websocket upgrade patch as above, but
against OPENBSD_6_9.

Thanks,
Jonathon

Index: usr.sbin/relayd/relay_http.c
===
RCS file: /cvs/src/usr.sbin/relayd/relay_http.c,v
retrieving revision 1.81
diff -u -p -u -b -r1.81 relay_http.c
--- usr.sbin/relayd/relay_http.c24 Mar 2021 20:59:54 -  1.81
+++ usr.sbin/relayd/relay_http.c2 May 2021 17:45:09 -
@@ -180,6 +180,8 @@ relay_read_http(struct bufferevent *bev,
struct http_method_node *hmn;
struct http_session *hs;
enum httpmethod  request_method;
+   struct kv   *connection_close = NULL;
+   int  

Re: relayd patch for websocket upgrade

2021-03-07 Thread Jonathon Fletcher
On Sun, Mar 07, 2021 at 06:46:33PM +0100, Marcus MERIGHI wrote:
> Hello Jonathon!
> 
> welcome to the party:
> 
> https://marc.info/?t=15833439123
> 
> especially the two comments by sthen@:
> 
> https://marc.info/?m=161349608614743
> https://marc.info/?m=16135019371
> 
> reyk@ removed from CC: on purpose: 
> https://twitter.com/reykfloeter/status/1284868070901776384
> 
> Marcus
> 
> jonathon.fletc...@gmail.com (Jonathon Fletcher), 2021.03.06 (Sat) 21:02 (CET):
> > When relayd relays a connection upgrade to a websocket, it relays
> > the outbound "Connection: Upgrade" header from the interal server.
> > 
> > It also tags on a "Connection: close" header to the outbound
> > response - ie the response goes out with two "Connection"
> > header lines.
> > 
> > Chrome and Netscape work despite the double upgrade/close connection 
> > headers. Safari fails.
> > 
> > Small patch below against 6.8 to only send the "Connection: close"
> > header if we are not handling a http_status 101.
> > 
> > Thanks,
> > Jonathon
> > 
> > 
> > cvs -q -d /cvs diff -ub -rOPENBSD_6_8 usr.sbin/relayd/relay_http.c
> > 
> > 
> > Index: usr.sbin/relayd/relay_http.c
> > ===
> > RCS file: /cvs/src/usr.sbin/relayd/relay_http.c,v
> > retrieving revision 1.79
> > diff -u -p -u -b -r1.79 relay_http.c
> > --- usr.sbin/relayd/relay_http.c4 Sep 2020 13:09:14 -   1.79
> > +++ usr.sbin/relayd/relay_http.c6 Mar 2021 19:46:56 -
> > @@ -526,7 +526,7 @@ relay_read_http(struct bufferevent *bev,
> >  * Ask the server to close the connection after this request
> >  * since we don't read any further request headers.
> >  */
> > -   if (cre->toread == TOREAD_UNLIMITED)
> > +   if (cre->toread == TOREAD_UNLIMITED && desc->http_status != 101)
> > if (kv_add(>http_headers, "Connection",
> > "close", 0) == NULL)
> > goto fail;
> > 


Marcus,

I did not realize that there was already a party. Apologies for my
previous, duplicate, patch.

Reading through the thread, I came to the conclusion that the comments
worried that the previous patch(es) were not specific enough.

The current relayd behaviour is that outbound http responses have a
"Connection: close" header/value attached to them by relayd.
This can result in multiple "Connection" headers in the response
which is .. not good.

The current behaviour is because relayd does not handle repeated http
request/response sequences after the first one and prefers to force the
http session to close. Fortunately for websockets, the protocol after
the websocket upgrade is not http and so there is no need for relayd
to look for or process http headers after the upgrade.

Here is an updated patch. This avoids the incorrect current in-tree
behaviour in the following specific sitations:

1: The headers for an outbound (internal -> external) response already
   include "Connection: Upgrade", "Upgrade: websocket" and the config
   permits websocket upgrades, or

2: The headers for an outbound response already include a Connection
   header with the value "close" - so do not send a duplicate as the
   in-tree code currently does.

I think this is specfic enough for now. In order for a websocket upgrade
to work the external client has to request it and the internal server 
has to respond in agreement.

I am explicit about websocket upgrades in my configs: I require the
"Connection" and "Upgrade" headers in the rule that directs traffic
to the websocket pool:


pass request quick \
header "Host" value "ws.example.com" \
header "Connection" value "Upgrade" \
header "Upgrade" value "websocket" \
forward to 


This is for my use cases (tls accelerator) and relayd is adept at
handling them. I really appreciate the functionality of relayd in base.

Let me know if there are specific concerns about the patch below or
if there are specific preferred ways to accomplish better compliance
with the RFC within the context of relayd.

Thanks,
Jonathon

The Connection header is covered in:

https://tools.ietf.org/html/rfc7230#section-6




Index: usr.sbin/relayd/relay_http.c
===
RCS file: /cvs/src/usr.sbin/relayd/relay_http.c,v
retrieving revision 1.79
diff -u -p -u -b -r1.79 relay_http.c
--- usr.sbin/relayd/relay_http.c  

relayd patch for websocket upgrade

2021-03-06 Thread Jonathon Fletcher



When relayd relays a connection upgrade to a websocket, it relays
the outbound "Connection: Upgrade" header from the interal server.

It also tags on a "Connection: close" header to the outbound
response - ie the response goes out with two "Connection"
header lines.

Chrome and Netscape work despite the double upgrade/close connection 
headers. Safari fails.

Small patch below against 6.8 to only send the "Connection: close"
header if we are not handling a http_status 101.

Thanks,
Jonathon


cvs -q -d /cvs diff -ub -rOPENBSD_6_8 usr.sbin/relayd/relay_http.c


Index: usr.sbin/relayd/relay_http.c
===
RCS file: /cvs/src/usr.sbin/relayd/relay_http.c,v
retrieving revision 1.79
diff -u -p -u -b -r1.79 relay_http.c
--- usr.sbin/relayd/relay_http.c4 Sep 2020 13:09:14 -   1.79
+++ usr.sbin/relayd/relay_http.c6 Mar 2021 19:46:56 -
@@ -526,7 +526,7 @@ relay_read_http(struct bufferevent *bev,
 * Ask the server to close the connection after this request
 * since we don't read any further request headers.
 */
-   if (cre->toread == TOREAD_UNLIMITED)
+   if (cre->toread == TOREAD_UNLIMITED && desc->http_status != 101)
if (kv_add(>http_headers, "Connection",
"close", 0) == NULL)
goto fail;



Re: xhci zero length transfers 'leak' one transfer buffer count

2020-10-11 Thread Jonathon Fletcher
On Sun, Oct 11, 2020 at 12:14:22PM +0200, Patrick Wildt wrote:
> On Sun, Oct 11, 2020 at 08:12:29AM +0200, Martin Pieuchot wrote:
> > On 09/10/20(Fri) 12:37, Jonathon Fletcher wrote:
> > > In xhci_xfer_get_trb, the count of transfer buffers in the pipe 
> > > (xp->free_trbs) is always decremented but the count of transfer buffers 
> > > used in the transfer (xx->ntrb) is not incremented for zero-length 
> > > transfers. The result of this is that, at the end of a zero length 
> > > transfer, xp->free_trbs has 'lost' one.
> > > 
> > > Over time, this mismatch of unconditional decrement (xp->free_trbs) vs 
> > > conditional increment (xx->ntrb) results in xhci_device_*_start returning 
> > > USBD_NOMEM.
> > > 
> > > The patch below works around this by only decrementing xp->free_trbs in 
> > > the cases when xx->ntrb is incremented.
> > 
> > Did you consider incrementing xx->ntrb instead?

xp->free_trbs is used for accounting at the pipe level.
xx->ntrb is also used in ring index calculations and
changing that would have larger effects.

> That doesn't work either, because the status completion code needs
> xx->ntrb to be correct for the data TD to be handled correctly.
> Incrementing xx->ntrb means the number of TRBs for the data TD is
> incorrect, since it includes the (optional) zero TD's TRB.
> 
> In this case the zero TD allocates a TRB but doesn't do proper
> accounting, and currently there's no place where this could be
> accounted properly.
> 
> In the end it's all software, so I guess the diff will simply have
> to be bigger than just a one-liner.

There are two (or more) bugs.

Patrick is correct about the major issue - zero length transfers
use a transfer buffer and it is not correctly accounted for. I
do not have a patch for this and I agree with Patrick that it
will be bigger than a one-liner.

My minor patch works around an accounting mismatch between the
pipe (xp->free_trbs) and the transfer (xx->ntrb). This accounting
mismatch will cause a xhci controller to stop working (via
USBD_NOMEM) after a fixed number of zero-length transfers
following pipe initialization. This patch does not change
the major issue that Patrick describes.

> > With the diff below the produced TRB isn't accounted which might lead to
> > and off-by-one.
> > 
> > > Index: xhci.c
> > > ===
> > > RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> > > retrieving revision 1.119
> > > diff -u -p -u -r1.119 xhci.c
> > > --- xhci.c31 Jul 2020 19:27:57 -  1.119
> > > +++ xhci.c9 Oct 2020 19:11:45 -
> > > @@ -1836,7 +1836,6 @@ xhci_xfer_get_trb(struct xhci_softc *sc,
> > >   struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
> > >  
> > >   KASSERT(xp->free_trbs >= 1);
> > > - xp->free_trbs--;
> > >   *togglep = xp->ring.toggle;
> > >  
> > >   switch (last) {
> > > @@ -1847,11 +1846,13 @@ xhci_xfer_get_trb(struct xhci_softc *sc,
> > >   xp->pending_xfers[xp->ring.index] = xfer;
> > >   xx->index = -2;
> > >   xx->ntrb += 1;
> > > + xp->free_trbs--;
> > >   break;
> > >   case 1: /* This will terminate a chain. */
> > >   xp->pending_xfers[xp->ring.index] = xfer;
> > >   xx->index = xp->ring.index;
> > >   xx->ntrb += 1;
> > > + xp->free_trbs--;
> > >   break;
> > >   }
> > >  
> > > 
> > 



xhci zero length transfers 'leak' one transfer buffer count

2020-10-09 Thread Jonathon Fletcher


In xhci_xfer_get_trb, the count of transfer buffers in the pipe (xp->free_trbs) 
is always decremented but the count of transfer buffers used in the transfer 
(xx->ntrb) is not incremented for zero-length transfers. The result of this is 
that, at the end of a zero length transfer, xp->free_trbs has 'lost' one.

Over time, this mismatch of unconditional decrement (xp->free_trbs) vs 
conditional increment (xx->ntrb) results in xhci_device_*_start returning 
USBD_NOMEM.

The patch below works around this by only decrementing xp->free_trbs in the 
cases when xx->ntrb is incremented.

Jonathon


Index: xhci.c
===
RCS file: /cvs/src/sys/dev/usb/xhci.c,v
retrieving revision 1.119
diff -u -p -u -r1.119 xhci.c
--- xhci.c  31 Jul 2020 19:27:57 -  1.119
+++ xhci.c  9 Oct 2020 19:11:45 -
@@ -1836,7 +1836,6 @@ xhci_xfer_get_trb(struct xhci_softc *sc,
struct xhci_xfer *xx = (struct xhci_xfer *)xfer;
 
KASSERT(xp->free_trbs >= 1);
-   xp->free_trbs--;
*togglep = xp->ring.toggle;
 
switch (last) {
@@ -1847,11 +1846,13 @@ xhci_xfer_get_trb(struct xhci_softc *sc,
xp->pending_xfers[xp->ring.index] = xfer;
xx->index = -2;
xx->ntrb += 1;
+   xp->free_trbs--;
break;
case 1: /* This will terminate a chain. */
xp->pending_xfers[xp->ring.index] = xfer;
xx->index = xp->ring.index;
xx->ntrb += 1;
+   xp->free_trbs--;
break;
}
 



Re: Improve ure(4) performance

2020-08-03 Thread Jonathon Fletcher
On Thu, Jul 30, 2020 at 06:56:48PM +, Mikolaj Kucharski wrote:
> On Thu, Jul 30, 2020 at 07:58:23AM -0700, Jonathon Fletcher wrote:
> > 
> > Mikolaj,
> > 
> > I hope the patch has been stable for you. I do have an update - it appears
> > that a splx(s) got lost along the way (from the end of ure_txeof). This
> > patch adds that back in and has some minor cleanup (variable name, cleanup
> > defines, adjust the setting of flags and buffer sizes based on device type).
> > I have been running this for a couple of days now.
> 
> Yes, no issues so far. I managed to have on Google Meet call, audio only
> and one Zoom call (was surprised it actually worked on OpenBSD, but
> needed to fake user agent to Linux). Network card didn't fail like
> before your diff during VoIP calls. That's really great. I just compiled
> new version of the kernel with your below patch, will reboot my laptop
> tonight to switch to that new kernel. Thank you!

Assuming no issues with this, I would like to get it into the tree. This
patch predates Marcus' usbd_abort_pipe patch by a few weeks. Let me know
if you want an updated patch against HEAD after his patch.

After reports from multiple people, this patch improves tx performance on
ure from about 90-110MBps to between about 250MBps (Mikolaj), 290MBps (Joshua), 
550MBps (me), 1900MBps (Kevin), depending on ure device, network setup, and
underlying hardware.

Thanks,
Jonathon

> > 
> > Index: sys/dev/usb/if_urereg.h
> > ===
> > RCS file: /cvs/src/sys/dev/usb/if_urereg.h,v
> > retrieving revision 1.8
> > diff -u -p -u -r1.8 if_urereg.h
> > --- sys/dev/usb/if_urereg.h 7 Dec 2019 08:45:28 -   1.8
> > +++ sys/dev/usb/if_urereg.h 28 Jul 2020 15:30:41 -
> > @@ -494,28 +494,30 @@ struct ure_txpkt {
> >  #define URE_ENDPT_TX   1
> >  #define URE_ENDPT_MAX  2
> >  
> > -#defineURE_TX_LIST_CNT 8
> > +#defineURE_TX_LIST_CNT 1
> >  #defineURE_RX_LIST_CNT 1
> > -#defineURE_RX_BUF_ALIGNsizeof(uint64_t)
> > +#defineURE_TX_BUF_ALIGN4
> > +#defineURE_RX_BUF_ALIGN8
> >  
> > -#defineURE_TXBUFSZ 16384
> > -#defineURE_8152_RXBUFSZ16384
> > -#defineURE_8153_RXBUFSZ32768
> > +#defineURE_TX_BUFSZ16384
> > +#defineURE_8152_RX_BUFSZ   16384
> > +#defineURE_8153_RX_BUFSZ   32768
> >  
> >  struct ure_chain {
> > struct ure_softc*uc_sc;
> > struct usbd_xfer*uc_xfer;
> > char*uc_buf;
> > -   struct mbuf *uc_mbuf;
> > -   int uc_accum;
> > -   int uc_idx;
> > +   uint32_tuc_buflen;
> > +   uint32_tuc_bufmax;
> > +   struct mbuf_listuc_mbuf_list;
> > +   SLIST_ENTRY(ure_chain)  uc_list;
> > +   uint8_t uc_idx;
> >  };
> >  
> >  struct ure_cdata {
> > -   struct ure_chaintx_chain[URE_TX_LIST_CNT];
> > -   struct ure_chainrx_chain[URE_RX_LIST_CNT];
> > -   int tx_prod;
> > -   int tx_cnt;
> > +   struct ure_chainure_rx_chain[URE_RX_LIST_CNT];
> > +   struct ure_chainure_tx_chain[URE_TX_LIST_CNT];
> > +   SLIST_HEAD(ure_list_head, ure_chain)ure_tx_free;
> >  };
> >  
> >  struct ure_softc {
> > @@ -541,7 +543,7 @@ struct ure_softc {
> >  
> > struct timeval  ure_rx_notice;
> > int ure_rxbufsz;
> > -   int ure_tx_list_cnt;
> > +   int ure_txbufsz;
> >  
> > int ure_phyno;
> >  
> > Index: sys/dev/usb/if_ure.c
> > ===
> > RCS file: /cvs/src/sys/dev/usb/if_ure.c,v
> > retrieving revision 1.16
> > diff -u -p -u -r1.16 if_ure.c
> > --- sys/dev/usb/if_ure.c10 Jul 2020 13:26:41 -  1.16
> > +++ sys/dev/usb/if_ure.c28 Jul 2020 22:56:29 -
> > @@ -117,11 +117,13 @@ void  ure_miibus_writereg(struct device 
> >  void   ure_lock_mii(struct ure_softc *);
> >  void   ure_unlock_mii(struct ure_softc *);
> >  
> > -inture_encap(struct ure_softc *, struct mbuf *);
> > +inture_encap_txpkt(struct mbuf *, char *, uint32_t);
> > +inture_encap_xfer(struct ifnet *, struct ure_softc *, 
> > struct u

Re: Improve ure(4) performance

2020-07-30 Thread Jonathon Fletcher
On Mon, Jul 27, 2020 at 07:13:33PM +, Mikolaj Kucharski wrote:
> On Mon, Jul 27, 2020 at 11:58:02AM -0700, Jonathon Fletcher wrote:
> > 
> > Are you ok using patch's -l option for this patch?
> > 
> 
> Sure, can do. New kernel compiled, will switch my machine tonight. Will
> let you know how things go. For meetings I would need couple of days, as
> I don't have Google Meet calls that often.

Mikolaj,

I hope the patch has been stable for you. I do have an update - it appears
that a splx(s) got lost along the way (from the end of ure_txeof). This
patch adds that back in and has some minor cleanup (variable name, cleanup
defines, adjust the setting of flags and buffer sizes based on device type).
I have been running this for a couple of days now.

Jonathon


Index: sys/dev/usb/if_urereg.h
===
RCS file: /cvs/src/sys/dev/usb/if_urereg.h,v
retrieving revision 1.8
diff -u -p -u -r1.8 if_urereg.h
--- sys/dev/usb/if_urereg.h 7 Dec 2019 08:45:28 -   1.8
+++ sys/dev/usb/if_urereg.h 28 Jul 2020 15:30:41 -
@@ -494,28 +494,30 @@ struct ure_txpkt {
 #define URE_ENDPT_TX   1
 #define URE_ENDPT_MAX  2
 
-#defineURE_TX_LIST_CNT 8
+#defineURE_TX_LIST_CNT 1
 #defineURE_RX_LIST_CNT 1
-#defineURE_RX_BUF_ALIGNsizeof(uint64_t)
+#defineURE_TX_BUF_ALIGN4
+#defineURE_RX_BUF_ALIGN8
 
-#defineURE_TXBUFSZ 16384
-#defineURE_8152_RXBUFSZ16384
-#defineURE_8153_RXBUFSZ32768
+#defineURE_TX_BUFSZ16384
+#defineURE_8152_RX_BUFSZ   16384
+#defineURE_8153_RX_BUFSZ   32768
 
 struct ure_chain {
struct ure_softc*uc_sc;
struct usbd_xfer*uc_xfer;
char*uc_buf;
-   struct mbuf *uc_mbuf;
-   int uc_accum;
-   int uc_idx;
+   uint32_tuc_buflen;
+   uint32_tuc_bufmax;
+   struct mbuf_listuc_mbuf_list;
+   SLIST_ENTRY(ure_chain)  uc_list;
+   uint8_t uc_idx;
 };
 
 struct ure_cdata {
-   struct ure_chaintx_chain[URE_TX_LIST_CNT];
-   struct ure_chainrx_chain[URE_RX_LIST_CNT];
-   int tx_prod;
-   int tx_cnt;
+   struct ure_chainure_rx_chain[URE_RX_LIST_CNT];
+   struct ure_chainure_tx_chain[URE_TX_LIST_CNT];
+   SLIST_HEAD(ure_list_head, ure_chain)ure_tx_free;
 };
 
 struct ure_softc {
@@ -541,7 +543,7 @@ struct ure_softc {
 
struct timeval  ure_rx_notice;
int ure_rxbufsz;
-   int ure_tx_list_cnt;
+   int ure_txbufsz;
 
int ure_phyno;
 
Index: sys/dev/usb/if_ure.c
===
RCS file: /cvs/src/sys/dev/usb/if_ure.c,v
retrieving revision 1.16
diff -u -p -u -r1.16 if_ure.c
--- sys/dev/usb/if_ure.c10 Jul 2020 13:26:41 -  1.16
+++ sys/dev/usb/if_ure.c28 Jul 2020 22:56:29 -
@@ -117,11 +117,13 @@ void  ure_miibus_writereg(struct device 
 void   ure_lock_mii(struct ure_softc *);
 void   ure_unlock_mii(struct ure_softc *);
 
-inture_encap(struct ure_softc *, struct mbuf *);
+inture_encap_txpkt(struct mbuf *, char *, uint32_t);
+inture_encap_xfer(struct ifnet *, struct ure_softc *, struct 
ure_chain *);
 void   ure_rxeof(struct usbd_xfer *, void *, usbd_status);
 void   ure_txeof(struct usbd_xfer *, void *, usbd_status);
-inture_rx_list_init(struct ure_softc *);
-inture_tx_list_init(struct ure_softc *);
+inture_xfer_list_init(struct ure_softc *, struct ure_chain *,
+   uint32_t, int);
+void   ure_xfer_list_free(struct ure_softc *, struct ure_chain *, int);
 
 void   ure_tick_task(void *);
 void   ure_tick(void *);
@@ -625,12 +627,36 @@ void
 ure_watchdog(struct ifnet *ifp)
 {
struct ure_softc*sc = ifp->if_softc;
+   struct ure_chain*c;
+   usbd_status err;
+   int i, s;
+
+   ifp->if_timer = 0;
 
if (usbd_is_dying(sc->ure_udev))
return;
 
+   if ((ifp->if_flags & (IFF_RUNNING|IFF_UP)) != (IFF_RUNNING|IFF_UP))
+   return;
+
+   sc = ifp->if_softc;
+   s = splnet();
+
ifp->if_oerrors++;
-   printf("%s: watchdog timeout\n", sc->ure_dev.dv_xname);
+   DPRINTF(("%s: watchdog timeout\n", sc->ure_dev.dv_xname));
+
+   for (i = 0; i < URE_TX_LIST_CNT; i++) {
+

Re: Improve ure(4) performance

2020-07-27 Thread Jonathon Fletcher
On Mon, Jul 27, 2020 at 07:13:33PM +, Mikolaj Kucharski wrote:
> On Mon, Jul 27, 2020 at 11:58:02AM -0700, Jonathon Fletcher wrote:
> > 
> > Are you ok using patch's -l option for this patch?
> 
> Sure, can do. New kernel compiled, will switch my machine tonight. Will
> let you know how things go. For meetings I would need couple of days, as
> I don't have Google Meet calls that often.

Mikolaj,

Great, thank you. I have attached the patch below with the whitespace intact.

Thanks,
Jonathon

Index: sys/dev/usb/if_urereg.h
===
RCS file: /cvs/src/sys/dev/usb/if_urereg.h,v
retrieving revision 1.8
diff -u -p -u -r1.8 if_urereg.h
--- sys/dev/usb/if_urereg.h 7 Dec 2019 08:45:28 -   1.8
+++ sys/dev/usb/if_urereg.h 27 Jul 2020 17:10:09 -
@@ -494,28 +494,32 @@ struct ure_txpkt {
 #define URE_ENDPT_TX   1
 #define URE_ENDPT_MAX  2
 
-#defineURE_TX_LIST_CNT 8
+#defineURE_TX_LIST_CNT 1
 #defineURE_RX_LIST_CNT 1
-#defineURE_RX_BUF_ALIGNsizeof(uint64_t)
+#defineURE_TX_BUF_ALIGN4
+#defineURE_RX_BUF_ALIGN8
 
-#defineURE_TXBUFSZ 16384
-#defineURE_8152_RXBUFSZ16384
-#defineURE_8153_RXBUFSZ32768
+#defineURE_8152_TX_BUFSZ   16384
+#defineURE_8152_RX_BUFSZ   16384
+
+#defineURE_8153_TX_BUFSZ   16384
+#defineURE_8153_RX_BUFSZ   32768
 
 struct ure_chain {
struct ure_softc*uc_sc;
struct usbd_xfer*uc_xfer;
char*uc_buf;
-   struct mbuf *uc_mbuf;
-   int uc_accum;
-   int uc_idx;
+   uint32_tuc_buflen;
+   uint32_tuc_bufmax;
+   struct mbuf_listuc_mbuf_list;
+   SLIST_ENTRY(ure_chain)  ure_list;
+   uint8_t uc_idx;
 };
 
 struct ure_cdata {
-   struct ure_chaintx_chain[URE_TX_LIST_CNT];
-   struct ure_chainrx_chain[URE_RX_LIST_CNT];
-   int tx_prod;
-   int tx_cnt;
+   struct ure_chainure_rx_chain[URE_RX_LIST_CNT];
+   struct ure_chainure_tx_chain[URE_TX_LIST_CNT];
+   SLIST_HEAD(ure_list_head, ure_chain)ure_tx_free;
 };
 
 struct ure_softc {
@@ -541,7 +545,7 @@ struct ure_softc {
 
struct timeval  ure_rx_notice;
int ure_rxbufsz;
-   int ure_tx_list_cnt;
+   int ure_txbufsz;
 
int ure_phyno;
 
Index: sys/dev/usb/if_ure.c
===
RCS file: /cvs/src/sys/dev/usb/if_ure.c,v
retrieving revision 1.16
diff -u -p -u -r1.16 if_ure.c
--- sys/dev/usb/if_ure.c10 Jul 2020 13:26:41 -  1.16
+++ sys/dev/usb/if_ure.c27 Jul 2020 17:10:39 -
@@ -117,11 +117,13 @@ void  ure_miibus_writereg(struct device 
 void   ure_lock_mii(struct ure_softc *);
 void   ure_unlock_mii(struct ure_softc *);
 
-inture_encap(struct ure_softc *, struct mbuf *);
+inture_encap_txpkt(struct mbuf *, char *, uint32_t);
+inture_encap_xfer(struct ifnet *, struct ure_softc *, struct 
ure_chain *);
 void   ure_rxeof(struct usbd_xfer *, void *, usbd_status);
 void   ure_txeof(struct usbd_xfer *, void *, usbd_status);
-inture_rx_list_init(struct ure_softc *);
-inture_tx_list_init(struct ure_softc *);
+inture_xfer_list_init(struct ure_softc *, struct ure_chain *,
+   uint32_t, int);
+void   ure_xfer_list_free(struct ure_softc *, struct ure_chain *, int);
 
 void   ure_tick_task(void *);
 void   ure_tick(void *);
@@ -625,12 +627,36 @@ void
 ure_watchdog(struct ifnet *ifp)
 {
struct ure_softc*sc = ifp->if_softc;
+   struct ure_chain*c;
+   usbd_status err;
+   int i, s;
+
+   ifp->if_timer = 0;
 
if (usbd_is_dying(sc->ure_udev))
return;
 
+   if ((ifp->if_flags & (IFF_RUNNING|IFF_UP)) != (IFF_RUNNING|IFF_UP))
+   return;
+
+   sc = ifp->if_softc;
+   s = splnet();
+
ifp->if_oerrors++;
-   printf("%s: watchdog timeout\n", sc->ure_dev.dv_xname);
+   DPRINTF(("watchdog timeout\n"));
+
+   for (i = 0; i < URE_TX_LIST_CNT; i++) {
+   c = >ure_cdata.ure_tx_chain[i];
+   if (ml_len(>uc_mbuf_list) > 0) {
+   usbd_get_xfer_status(c->uc_xfer, NULL, NULL, NULL,
+   );
+   ure_txeo

Re: Improve ure(4) performance

2020-07-27 Thread Jonathon Fletcher


> On Jul 27, 2020, at 11:47 AM, Mikolaj Kucharski  
> wrote:
> 
> On Mon, Jul 27, 2020 at 11:24:55AM -0700, Jonathon Fletcher wrote:
>> 
>> I have attached an updated patch - which includes Kevin's changes from
>> his earlier email - that also corrects the tx buffer sizes for RTL8153
>> / RTL8153B devices.
>> 
> 
> Unfortunately below diff fails to apply.
> 
> 2 out of 2 hunks failed
> 15 out of 15 hunks failed
> 
> I've tried on fresh cvs version of if_urereg.h and if_ure.c
> 
> I usually, for convenience take the diffs from marc.info <http://marc.info/>:
> 
> https://marc.info/?l=openbsd-tech=159587449018248=mbox 
> <https://marc.info/?l=openbsd-tech=159587449018248=mbox>

I appear to have a mangled whitespace problem with my email.

It applies successfully when I copy / paste the patch out of the url above and 
apply it from /usr/src with patch -p0 -l .

Are you ok using patch's -l option for this patch?

Thanks,
Jonathon



Re: Improve ure(4) performance

2020-07-27 Thread Jonathon Fletcher
On Sun, Jul 26, 2020 at 10:12 PM Mikolaj Kucharski
 wrote:
>
> On Sun, Jul 26, 2020 at 06:47:32PM -0700, Jonathon Fletcher wrote:
> > Mikolaj,
> >
> > Thank you for testing my patch.
> >
> > I am very interested to know what xhci (or ehci) hardware you have.
> >
> > I have the following:
> >
> > xhci0 at pci0 dev 20 function 0 "Intel 9 Series xHCI" rev 0x03: msi, xHCI 
> > 1.0
> > usb0 at xhci0: USB revision 3.0
> > uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 
> > 3.00/1.00 addr 1
> > ure0 at uhub0 port 14 configuration 1 interface 0 "Realtek USB 
> > 10/100/1G/2.5G LAN" rev 3.20/30.00 addr 3
> > ure0: RTL8156 (0x7030), address 00:e0:4c:ab:64:5a
> >
> > My ure0 is https://www.amazon.com/gp/product/B07Z8S6PN4 
> > <https://www.amazon.com/gp/product/B07Z8S6PN4>
> >
> > I do not get any watchdog errors with this.
> >
> > Kevin has been testing my patch and giving me good feedback. He has seen 
> > some watchdog errors with an RTL8153B also.
> >
> > I am starting to suspect that I have the tx xfer size wrong (too big) for 
> > the 8153 / 8153B devices. I am also trying to eliminate the XHCI hardware 
> > as a cause.
> >
> > Please could you post your dmesg? At a minimum it would help me a lot to 
> > see your usb hardware.
> >
>
> Sure, no problem. I'm testing your patch with:
>
> https://www.amazon.co.uk/gp/product/B081YGDBG7/
>

[snip]

> xhci0 at pci0 dev 20 function 0 "Intel 100 Series xHCI" rev 0x21: msi, xHCI 
> 1.0
> usb0 at xhci0: USB revision 3.0
> uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00 
> addr 1
> uhub1 at uhub0 port 3 configuration 1 interface 0 "VIA Labs, Inc. USB2.0 Hub" 
> rev 2.10/3.d3 addr 2
> uhub2 at uhub1 port 3 configuration 1 interface 0 "Terminus Technology USB 
> 2.0 Hub" rev 2.00/1.00 addr 3
> ure0 at uhub2 port 2 configuration 1 interface 0 "Realtek USB 10/100/1000 
> LAN" rev 2.10/30.00 addr 4
> ure0: RTL8153 (0x5c30), address 00:e0:4c:04:09:7d
> rgephy0 at ure0 phy 0: RTL8251 PHY, rev. 0

Thanks Mikolaj,

I believe that the intermittent watchdog messages on RTL8153 variants
are now resolved. The previous code defined 32kb buffer sizes for both
tx and rx for RTL8153. The rx buffer size does vary by device type but
the tx buffer size should be 16kb for all types.

The patch improves tx performance primarily by packing multiple
packets into the outgoing xfer buffer if / when this is possible. This
makes it possible to use the entire xfer buffer in a single tx xfer.
If there are enough packets in the outbound queue and the xfer buffer
is defined as larger than the device buffer then the device could be
sent a tx xfer too large for it - eventually resulting in the watchdog
routine getting called.

I have attached an updated patch - which includes Kevin's changes from
his earlier email - that also corrects the tx buffer sizes for RTL8153
/ RTL8153B devices.

Jonathon

Index: sys/dev/usb/if_urereg.h
===
RCS file: /cvs/src/sys/dev/usb/if_urereg.h,v
retrieving revision 1.8
diff -u -p -u -r1.8 if_urereg.h
--- sys/dev/usb/if_urereg.h 7 Dec 2019 08:45:28 - 1.8
+++ sys/dev/usb/if_urereg.h 27 Jul 2020 17:10:09 -
@@ -494,28 +494,32 @@ struct ure_txpkt {
 #define URE_ENDPT_TX 1
 #define URE_ENDPT_MAX 2

-#define URE_TX_LIST_CNT 8
+#define URE_TX_LIST_CNT 1
 #define URE_RX_LIST_CNT 1
-#define URE_RX_BUF_ALIGN sizeof(uint64_t)
+#define URE_TX_BUF_ALIGN 4
+#define URE_RX_BUF_ALIGN 8

-#define URE_TXBUFSZ 16384
-#define URE_8152_RXBUFSZ 16384
-#define URE_8153_RXBUFSZ 32768
+#define URE_8152_TX_BUFSZ 16384
+#define URE_8152_RX_BUFSZ 16384
+
+#define URE_8153_TX_BUFSZ 16384
+#define URE_8153_RX_BUFSZ 32768

 struct ure_chain {
  struct ure_softc *uc_sc;
  struct usbd_xfer *uc_xfer;
  char *uc_buf;
- struct mbuf *uc_mbuf;
- int uc_accum;
- int uc_idx;
+ uint32_t uc_buflen;
+ uint32_t uc_bufmax;
+ struct mbuf_list uc_mbuf_list;
+ SLIST_ENTRY(ure_chain)  ure_list;
+ uint8_t uc_idx;
 };

 struct ure_cdata {
- struct ure_chain tx_chain[URE_TX_LIST_CNT];
- struct ure_chain rx_chain[URE_RX_LIST_CNT];
- int tx_prod;
- int tx_cnt;
+ struct ure_chain ure_rx_chain[URE_RX_LIST_CNT];
+ struct ure_chain ure_tx_chain[URE_TX_LIST_CNT];
+ SLIST_HEAD(ure_list_head, ure_chain)ure_tx_free;
 };

 struct ure_softc {
@@ -541,7 +545,7 @@ struct ure_softc {

  struct timeval ure_rx_notice;
  int ure_rxbufsz;
- int ure_tx_list_cnt;
+ int ure_txbufsz;

  int ure_phyno;

Index: sys/dev/usb/if_ure.c
===
RCS file: /cvs/src/sys/dev/usb/if_ure.c,v
retrieving revision 1.16
diff -u -p -u -r1.16 if_ure.

Re: Improve ure(4) performance

2020-07-26 Thread Jonathon Fletcher



> On Jul 26, 2020, at 1:24 PM, Mikolaj Kucharski  wrote:
> 
> Hi Kevin, Jonathon,
> 
> On Tue, Jul 21, 2020 at 02:38:55PM +0800, Kevin Lo wrote:
>> Hi,
>> 
>> Jonathon Fletcher has been helping get the better performance out of ure(4).
>> I ran the tcpbench with ure (RTL8156) for 5 minutes:
>> 
>> 71538432760 bytes sent over 300.999 seconds
>> bandwidth min/avg/max/std-dev = 12.071/1901.448/1947.873/135.283 Mbps
>> 
>> A big thanks to Jonathon for his hard work.
> 
> I've tested this on amd64 with following ure(4) card:
> 
> ure0 at uhub2 port 2 configuration 1 interface 0 "Realtek USB 10/100/1000 
> LAN" rev 2.10/30.00 addr 4
> ure0: RTL8153 (0x5c30), address 00:e0:4c:04:09:7d
> 
> I could see speedtest-cli improvement from around 150 Mbit/s before the
> diff, to 245 Mbit/s after the diff.
> 
> Big thank you for this patch. Will run it for next couple of days.
> Before the diff I faced following errors with ure(4):
> 
> usbd_start_next: error=5
> ure0: watchdog timeout
> ure0: usb errors on rx: IOERROR
> 
> and unplug/re-plug dance was needed to bring back the network. That
> usually happend during Google Meet video call, so I need few days to see
> how this diff goes against web video calls.

Mikolaj,

Thank you for testing my patch.

I am very interested to know what xhci (or ehci) hardware you have.

I have the following:

xhci0 at pci0 dev 20 function 0 "Intel 9 Series xHCI" rev 0x03: msi, xHCI 1.0
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00 
addr 1
ure0 at uhub0 port 14 configuration 1 interface 0 "Realtek USB 10/100/1G/2.5G 
LAN" rev 3.20/30.00 addr 3
ure0: RTL8156 (0x7030), address 00:e0:4c:ab:64:5a

My ure0 is https://www.amazon.com/gp/product/B07Z8S6PN4 
<https://www.amazon.com/gp/product/B07Z8S6PN4>

I do not get any watchdog errors with this.

Kevin has been testing my patch and giving me good feedback. He has seen some 
watchdog errors with an RTL8153B also.

I am starting to suspect that I have the tx xfer size wrong (too big) for the 
8153 / 8153B devices. I am also trying to eliminate the XHCI hardware as a 
cause.

Please could you post your dmesg? At a minimum it would help me a lot to see 
your usb hardware.

Thanks,
Jonathon