Re: enable jumbos on newer re(4) devices

2015-02-20 Thread Stuart Henderson
On 2015/02/18 11:57, Jim Smith wrote:
 On Fri, Jan 23, 2015 at 11:26:53AM +1000, David Gwynne wrote:
  a compromise could be to advertise checksum offload to the stack,
  pass it on to the hardware for small frames but have the driver do
  it in software for the big ones?
 
 greetings,
 
 below are two diffs. the first allows re(4) chips to handle checksums
 in software for large packets. this allows the chip to advertise hardware
 checksums for regular packets and do it manually for jumbos, which the
 the hardware cannot do properly (at least for 8168D and 8168E chips, which
 i've tested).
 
 the second diff is the same as the previous jumbo diff i sent through,
 but does not disable hw csums for the 8168D and 8168E chips.
 
 the first will do nothing without the second, but the diff's goals
 are different enough that two make sense.
 
 thanks dlg@ for the original concept and for hammering square pegs into
 my round brain. feedback appreciated.

I'm not finding many re(4) machines to test it on here. I think I have a card
somewhere but haven't tracked it down yet. No regressions with this (non jumbo
capable) hardware..

re0 at pci2 dev 0 function 0 Realtek 8101E rev 0x02: RTL8102EL (0x2480), msi, 
address 00:23:8b:43:96:e6
rlphy0 at re0 phy 7: RTL8201L 10/100 PHY, rev. 1

..not that I would expect problems from reviewing the diffs.

OK for the first diff. It changes nothing for existing cases, and only
has an effect when both 1) second diff is applied and 2) mtu is raised.

Second diff I'm pretty much OK with but considering the time we're at in
the release cycle and the number of different re devices existing, it would
be nice to have more reports.



Re: enable jumbos on newer re(4) devices

2015-02-20 Thread David Gwynne

 On 20 Feb 2015, at 10:52 pm, Stuart Henderson st...@openbsd.org wrote:
 
 On 2015/02/18 11:57, Jim Smith wrote:
 On Fri, Jan 23, 2015 at 11:26:53AM +1000, David Gwynne wrote:
 a compromise could be to advertise checksum offload to the stack,
 pass it on to the hardware for small frames but have the driver do
 it in software for the big ones?
 
 greetings,
 
 below are two diffs. the first allows re(4) chips to handle checksums
 in software for large packets. this allows the chip to advertise hardware
 checksums for regular packets and do it manually for jumbos, which the
 the hardware cannot do properly (at least for 8168D and 8168E chips, which
 i've tested).
 
 the second diff is the same as the previous jumbo diff i sent through,
 but does not disable hw csums for the 8168D and 8168E chips.
 
 the first will do nothing without the second, but the diff's goals
 are different enough that two make sense.
 
 thanks dlg@ for the original concept and for hammering square pegs into
 my round brain. feedback appreciated.
 
 I'm not finding many re(4) machines to test it on here. I think I have a card
 somewhere but haven't tracked it down yet. No regressions with this (non jumbo
 capable) hardware..
 
 re0 at pci2 dev 0 function 0 Realtek 8101E rev 0x02: RTL8102EL (0x2480), 
 msi, address 00:23:8b:43:96:e6
 rlphy0 at re0 phy 7: RTL8201L 10/100 PHY, rev. 1
 
 ..not that I would expect problems from reviewing the diffs.
 
 OK for the first diff. It changes nothing for existing cases, and only
 has an effect when both 1) second diff is applied and 2) mtu is raised.
 
 Second diff I'm pretty much OK with but considering the time we're at in
 the release cycle and the number of different re devices existing, it would
 be nice to have more reports.

its working for me on landisk:

re0 at pci0 dev 0 function 0 Realtek 8139 rev 0x20: RTL8139C+ (0x7480), irq 
5, address 00:a0:b0:70:cc:8b
rlphy0 at re0 phy 0: RTL internal PHY

but yes, i agree regarding testing on more chips.



Re: enable jumbos on newer re(4) devices

2015-02-17 Thread Jim Smith
On Fri, Jan 23, 2015 at 11:26:53AM +1000, David Gwynne wrote:
 a compromise could be to advertise checksum offload to the stack,
 pass it on to the hardware for small frames but have the driver do
 it in software for the big ones?

greetings,

below are two diffs. the first allows re(4) chips to handle checksums
in software for large packets. this allows the chip to advertise hardware
checksums for regular packets and do it manually for jumbos, which the
the hardware cannot do properly (at least for 8168D and 8168E chips, which
i've tested).

the second diff is the same as the previous jumbo diff i sent through,
but does not disable hw csums for the 8168D and 8168E chips.

the first will do nothing without the second, but the diff's goals
are different enough that two make sense.

thanks dlg@ for the original concept and for hammering square pegs into
my round brain. feedback appreciated.

Index: re.c
===
RCS file: /cvs/src/sys/dev/ic/re.c,v
retrieving revision 1.175
diff -u -p -r1.175 re.c
--- re.c9 Feb 2015 03:09:57 -   1.175
+++ re.c18 Feb 2015 01:35:32 -
@@ -128,6 +128,8 @@
 #include net/if_media.h
 
 #include netinet/in.h
+#include netinet/ip.h
+#include netinet/ip_var.h
 #include netinet/if_ether.h
 
 #if NVLAN  0
@@ -194,6 +196,8 @@ voidre_setup_intr(struct rl_softc *, in
 intre_wol(struct ifnet*, int);
 #endif
 
+void   in_delayed_cksum(struct mbuf *);
+
 struct cfdriver re_cd = {
0, re, DV_IFNET
 };
@@ -1601,7 +1605,10 @@ int
 re_encap(struct rl_softc *sc, struct mbuf *m, int *idx)
 {
bus_dmamap_tmap;
+   struct mbuf *mp, mh;
int error, seg, nsegs, uidx, startidx, curidx, lastidx, pad;
+   int off;
+   struct ip   *ip;
struct rl_desc  *d;
u_int32_t   cmdstat, vlanctl = 0, csum_flags = 0;
struct rl_txq   *txq;
@@ -1618,6 +1625,27 @@ re_encap(struct rl_softc *sc, struct mbu
 * is requested.  Otherwise, RL_TDESC_CMD_TCPCSUM/
 * RL_TDESC_CMD_UDPCSUM does not take affect.
 */
+
+   if ((sc-rl_flags  RL_FLAG_JUMBOV2) 
+   m-m_pkthdr.len  RL_MTU 
+   (m-m_pkthdr.csum_flags 
+   (M_IPV4_CSUM_OUT|M_TCP_CSUM_OUT|M_UDP_CSUM_OUT)) != 0) {
+   mp = m_getptr(m, ETHER_HDR_LEN, off);
+   mh.m_flags = 0;
+   mh.m_data = mtod(mp, caddr_t) + off;
+   mh.m_next = mp-m_next;
+   mh.m_pkthdr.len = mp-m_pkthdr.len - ETHER_HDR_LEN;
+   mh.m_len = mp-m_len - off;
+   ip = (struct ip *)mh.m_data;
+
+   if (m-m_pkthdr.csum_flags  M_IPV4_CSUM_OUT)
+   ip-ip_sum = in_cksum(mh, sizeof(struct ip)); 
+   if (m-m_pkthdr.csum_flags  (M_TCP_CSUM_OUT|M_UDP_CSUM_OUT))
+   in_delayed_cksum(mh);
+
+   m-m_pkthdr.csum_flags =
+   ~(M_IPV4_CSUM_OUT|M_TCP_CSUM_OUT|M_UDP_CSUM_OUT);
+   }
 
if ((m-m_pkthdr.csum_flags 
(M_IPV4_CSUM_OUT|M_TCP_CSUM_OUT|M_UDP_CSUM_OUT)) != 0) {






Index: re.c
===
RCS file: /cvs/src/sys/dev/ic/re.c,v
retrieving revision 1.175
diff -u -p -r1.175 re.c
--- re.c9 Feb 2015 03:09:57 -   1.175
+++ re.c18 Feb 2015 01:41:03 -
@@ -171,6 +171,8 @@ voidre_watchdog(struct ifnet *);
 intre_ifmedia_upd(struct ifnet *);
 void   re_ifmedia_sts(struct ifnet *, struct ifmediareq *);
 
+void   re_set_jumbo(struct rl_softc *);
+
 void   re_eeprom_putbyte(struct rl_softc *, int);
 void   re_eeprom_getword(struct rl_softc *, int, u_int16_t *);
 void   re_read_eeprom(struct rl_softc *, caddr_t, int, int);
@@ -206,6 +208,10 @@ struct cfdriver re_cd = {
CSR_WRITE_1(sc, RL_EECMD,   \
CSR_READ_1(sc, RL_EECMD)  ~x)
 
+#define RL_FRAMELEN(mtu)   \
+   (mtu + ETHER_HDR_LEN + ETHER_CRC_LEN +  \
+   ETHER_VLAN_ENCAP_LEN)
+
 static const struct re_revision {
u_int32_t   re_chipid;
const char  *re_name;
@@ -1022,8 +1028,10 @@ re_attach(struct rl_softc *sc, const cha
 
/* Create DMA maps for RX buffers */
for (i = 0; i  RL_RX_DESC_CNT; i++) {
-   error = bus_dmamap_create(sc-sc_dmat, MCLBYTES, 1, MCLBYTES,
-   0, 0, sc-rl_ldata.rl_rxsoft[i].rxs_dmamap);
+   error = bus_dmamap_create(sc-sc_dmat,
+   RL_FRAMELEN(sc-rl_max_mtu), 1,
+   RL_FRAMELEN(sc-rl_max_mtu), 0, 0,
+   sc-rl_ldata.rl_rxsoft[i].rxs_dmamap);
if (error) {
printf(%s: can't create DMA map for RX\n,
sc-sc_dev.dv_xname);
@@ -1038,8 +1046,7 @@ re_attach(struct rl_softc *sc, const cha
ifp-if_ioctl = re_ioctl;

Re: enable jumbos on newer re(4) devices

2015-01-28 Thread Jim Smith
On Thu, Jan 22, 2015 at 08:52:25PM +, Stuart Henderson wrote:
 some minor things I spotted:
 
 slight KNF issues, some of the new lines are  80 columns.
 
 + case RL_HWREV_8168E:
 + CSR_WRITE_1(sc, RL_CFG4, CSR_READ_1(sc, RL_CFG4) | 0x01);
 + break;
 + default:
 + CSR_WRITE_1(sc, RL_CFG4, CSR_READ_1(sc, RL_CFG4) | 
 RL_CFG4_JUMBO_EN1);
 + break;
 
 there's a new #define for one of these but not the other; probably
 would be preferable to a magic 0x01

ok, fixed.

 
 + if ((sc-rl_flags  RL_FLAG_JUMBOV2) != 0) {
 + ifp-if_capabilities = ~(IFCAP_CSUM_IPv4);
 + re_set_jumbo(sc);
 + }
 
 wondering...would it be possible to only disable checksum offload iff
 the user has configured a large mtu?

something along the lines of dlg's solution seems reasonable, but should
probably come in a separate diff. for now i've added the 8168/8111D/E
chips to the broken tx checksum list rather than having it sitting
in that chunk as a special snowflake.

thanks for the feedback, diff below.


Index: re.c
===
RCS file: /cvs/src/sys/dev/ic/re.c,v
retrieving revision 1.173
diff -u -r1.173 re.c
--- re.c21 Jan 2015 10:00:42 -  1.173
+++ re.c28 Jan 2015 12:04:49 -
@@ -171,6 +171,8 @@
 intre_ifmedia_upd(struct ifnet *);
 void   re_ifmedia_sts(struct ifnet *, struct ifmediareq *);
 
+void   re_set_jumbo(struct rl_softc *);
+
 void   re_eeprom_putbyte(struct rl_softc *, int);
 void   re_eeprom_getword(struct rl_softc *, int, u_int16_t *);
 void   re_read_eeprom(struct rl_softc *, caddr_t, int, int);
@@ -206,6 +208,10 @@
CSR_WRITE_1(sc, RL_EECMD,   \
CSR_READ_1(sc, RL_EECMD)  ~x)
 
+#define RL_FRAMELEN(mtu)   \
+   (mtu + ETHER_HDR_LEN + ETHER_CRC_LEN +  \
+   ETHER_VLAN_ENCAP_LEN)
+
 static const struct re_revision {
u_int32_t   re_chipid;
const char  *re_name;
@@ -1022,8 +1028,10 @@
 
/* Create DMA maps for RX buffers */
for (i = 0; i  RL_RX_DESC_CNT; i++) {
-   error = bus_dmamap_create(sc-sc_dmat, MCLBYTES, 1, MCLBYTES,
-   0, 0, sc-rl_ldata.rl_rxsoft[i].rxs_dmamap);
+   error = bus_dmamap_create(sc-sc_dmat,
+   RL_FRAMELEN(sc-rl_max_mtu), 1,
+   RL_FRAMELEN(sc-rl_max_mtu), 0, 0,
+   sc-rl_ldata.rl_rxsoft[i].rxs_dmamap);
if (error) {
printf(%s: can't create DMA map for RX\n,
sc-sc_dev.dv_xname);
@@ -1038,8 +1046,7 @@
ifp-if_ioctl = re_ioctl;
ifp-if_start = re_start;
ifp-if_watchdog = re_watchdog;
-   if ((sc-rl_flags  RL_FLAG_JUMBOV2) == 0)
-   ifp-if_hardmtu = sc-rl_max_mtu;
+   ifp-if_hardmtu = sc-rl_max_mtu;
IFQ_SET_MAXLEN(ifp-if_snd, RL_TX_QLEN);
IFQ_SET_READY(ifp-if_snd);
 
@@ -1054,6 +1061,9 @@
case RL_HWREV_8168C:
case RL_HWREV_8168C_SPIN2:
case RL_HWREV_8168CP:
+   /* 8186/8111D/E seem to have a similar problem. */
+   case RL_HWREV_8168D:
+   case RL_HWREV_8168E:
break;
default:
ifp-if_capabilities |= IFCAP_CSUM_IPv4;
@@ -1159,7 +1169,7 @@
u_int32_t   cmdstat;
int error, idx;
 
-   m = MCLGETI(NULL, M_DONTWAIT, NULL, MCLBYTES);
+   m = MCLGETI(NULL, M_DONTWAIT, NULL, RL_FRAMELEN(sc-rl_max_mtu));
if (!m)
return (ENOBUFS);
 
@@ -1168,7 +1178,7 @@
 * alignment so that the frame payload is
 * longword aligned on strict alignment archs.
 */
-   m-m_len = m-m_pkthdr.len = RE_RX_DESC_BUFLEN;
+   m-m_len = m-m_pkthdr.len = RL_FRAMELEN(sc-rl_max_mtu);
m-m_data += RE_ETHER_ALIGN;
 
idx = sc-rl_ldata.rl_rx_prodidx;
@@ -1306,8 +1316,12 @@
BUS_DMASYNC_POSTREAD);
bus_dmamap_unload(sc-sc_dmat, rxs-rxs_dmamap);
 
-   if (!(rxstat  RL_RDESC_STAT_EOF)) {
-   m-m_len = RE_RX_DESC_BUFLEN;
+   if ((sc-rl_flags  RL_FLAG_JUMBOV2) != 0 
+   (rxstat  (RL_RDESC_STAT_SOF | RL_RDESC_STAT_EOF)) !=
+   (RL_RDESC_STAT_SOF | RL_RDESC_STAT_EOF)) {
+   continue;
+   } else if (!(rxstat  RL_RDESC_STAT_EOF)) {
+   m-m_len = RL_FRAMELEN(sc-rl_max_mtu);
if (sc-rl_head == NULL)
sc-rl_head = sc-rl_tail = m;
else {
@@ -1341,8 +1355,9 @@
 * if total_len  2^13-1, both _RXERRSUM and _GIANT will be
 * set, but if CRC is clear, it will still be a valid frame.
 */
-   if (rxstat  RL_RDESC_STAT_RXERRSUM  !(total_len  8191 
-   

Re: enable jumbos on newer re(4) devices

2015-01-22 Thread David Gwynne

 On 23 Jan 2015, at 06:52, Stuart Henderson st...@openbsd.org wrote:
 
 On 2015/01/22 10:09, David Gwynne wrote:
 
 On 21 Jan 2015, at 23:49, Brad Smith b...@comstyle.com wrote:
 
 On 01/21/15 06:51, Jim Smith wrote:
 hi all,
 
 the below diff enables support for jumbo frames on
 some newer re(4) devices. i've tested it on 8186D/8111D
 and 8186E/8111E chips, which are both able to do 9k
 jumbos. it seems to provide a significant speed-up on
 simple file transfer tests. most of the important parts
 were taken from the freebsd re(4) driver.
 
 feedback welcome, hope this helps someone.
 
 I had started looking into adding support for jumbos to re(4)
 but what I have so far does not work; no crashing but the
 interface does not receive any packets. What you posted still
 needs some work. I'll see if I can get back to it soon-ish but
 I have put it aside as it was not intended to be a priority for
 the next release.
 
 thats not very constructive feedback, especially if you dont have to time to 
 tweak the code.
 
 if you could explain what needs work, maybe jim could have a go at it. he's 
 already spent the time to get it this far, and he said feedback is welcome.
 
 dlg
 
 
 
 some minor things I spotted:
 
 slight KNF issues, some of the new lines are  80 columns.
 
 + case RL_HWREV_8168E:
 + CSR_WRITE_1(sc, RL_CFG4, CSR_READ_1(sc, RL_CFG4) | 0x01);
 + break;
 + default:
 + CSR_WRITE_1(sc, RL_CFG4, CSR_READ_1(sc, RL_CFG4) | 
 RL_CFG4_JUMBO_EN1);
 + break;
 
 there's a new #define for one of these but not the other; probably
 would be preferable to a magic 0x01
 
 + if ((sc-rl_flags  RL_FLAG_JUMBOV2) != 0) {
 + ifp-if_capabilities = ~(IFCAP_CSUM_IPv4);
 + re_set_jumbo(sc);
 + }
 
 wondering...would it be possible to only disable checksum offload iff
 the user has configured a large mtu?

of course. it's only code. id rather not hang a behaviour change like that 
off the mtu though.

a compromise could be to advertise checksum offload to the stack, pass it on to 
the hardware for small frames but have the driver do it in software for the big 
ones?



Re: enable jumbos on newer re(4) devices

2015-01-21 Thread David Gwynne

 On 21 Jan 2015, at 23:49, Brad Smith b...@comstyle.com wrote:
 
 On 01/21/15 06:51, Jim Smith wrote:
 hi all,
 
 the below diff enables support for jumbo frames on
 some newer re(4) devices. i've tested it on 8186D/8111D
 and 8186E/8111E chips, which are both able to do 9k
 jumbos. it seems to provide a significant speed-up on
 simple file transfer tests. most of the important parts
 were taken from the freebsd re(4) driver.
 
 feedback welcome, hope this helps someone.
 
 I had started looking into adding support for jumbos to re(4)
 but what I have so far does not work; no crashing but the
 interface does not receive any packets. What you posted still
 needs some work. I'll see if I can get back to it soon-ish but
 I have put it aside as it was not intended to be a priority for
 the next release.

thats not very constructive feedback, especially if you dont have to time to 
tweak the code.

if you could explain what needs work, maybe jim could have a go at it. he's 
already spent the time to get it this far, and he said feedback is welcome.

dlg