RE: [RFC PATCH 0/2] net: macb: Disable TX checksum offloading on all Zynq
Hi Claudiu, > -Original Message- > From: Claudiu Beznea [mailto:claudiu.bez...@microchip.com] > Sent: Tuesday, August 7, 2018 2:21 PM > To: Harini Katakam ; Jennifer Dahm > > Cc: netdev@vger.kernel.org; David S . Miller ; Nathan > Sullivan ; Rafal Ozieblo ; > Harini Katakam ; Nicolas Ferre > > Subject: Re: [RFC PATCH 0/2] net: macb: Disable TX checksum offloading on all > Zynq > > Hi Harini, > > On 01.08.2018 15:53, Harini Katakam wrote: > > Hi Jennifer, > > > > On Tue, Jun 5, 2018 at 10:21 AM, Harini Katakam wrote: > >> Hi Jeniffer, > >> > >> On Mon, Jun 4, 2018 at 8:35 PM, Nicolas Ferre > >> wrote: > >>> Jennifer, > >>> > >>> On 25/05/2018 at 23:44, Jennifer Dahm wrote: > >>>> > >>>> During testing, I discovered that the Zynq GEM hardware overwrites > >>>> all outgoing UDP packet checksums, which is illegal in packet > >>>> forwarding cases. This happens both with and without the > >>>> checksum-zeroing behavior introduced in > >>>> 007e4ba3ee137f4700f39aa6dbaf01a71047c5f6 > >>>> ("net: macb: initialize checksum when using checksum offloading"). > >>>> The only solution to both the small packet bug and the packet > >>>> forwarding bug that I can find is to disable TX checksum offloading > entirely. > >>> > >>> > >> > >> Thanks for the extensive testing. > >> I'll try to reproduce and see if it is something to be fixed in the driver. > >> > >>> Are the bugs listed above present in all revisions of the GEM IP, > >>> only for some revisions? > >>> Is there an errata that describe this issue for the Zynq GEM? > >> > >> @Nicolas, AFAIK, there is no errata for this in either Cadence or > >> Zynq documentation. > > > > I was unable to reproduce this issue on Zynq. > > Although I do not have HW with two GEM ports, I tried by routing one > > GEM via PL and another via on board RGMII. > > Since there was no specific errata related to this, I also tried on > > subsequent ZynqMP versions with multiple GEM ports but dint find any > > checksum issues. I discussed the same with cadence and they tried the > > test with 2 bytes of UDP payload on the Zynq GEM IP version in their > > regressions and did not hit any issue either. > > > > I tried to reach out earlier to see if you can share your exact > > application. Could you please let me know if you have any further > > updates? > > I manage to reproduce the issue and provide a patch for this (see patch 3/3 > from > [1]). > > [1] https://www.spinics.net/lists/netdev/msg513848.html Sorry, I missed your series. Thanks. Regards, Harini
Re: [RFC PATCH 0/2] net: macb: Disable TX checksum offloading on all Zynq
Hi Harini, On 01.08.2018 15:53, Harini Katakam wrote: > Hi Jennifer, > > On Tue, Jun 5, 2018 at 10:21 AM, Harini Katakam wrote: >> Hi Jeniffer, >> >> On Mon, Jun 4, 2018 at 8:35 PM, Nicolas Ferre >> wrote: >>> Jennifer, >>> >>> On 25/05/2018 at 23:44, Jennifer Dahm wrote: During testing, I discovered that the Zynq GEM hardware overwrites all outgoing UDP packet checksums, which is illegal in packet forwarding cases. This happens both with and without the checksum-zeroing behavior introduced in 007e4ba3ee137f4700f39aa6dbaf01a71047c5f6 ("net: macb: initialize checksum when using checksum offloading"). The only solution to both the small packet bug and the packet forwarding bug that I can find is to disable TX checksum offloading entirely. >>> >>> >> >> Thanks for the extensive testing. >> I'll try to reproduce and see if it is something to be fixed in the driver. >> >>> Are the bugs listed above present in all revisions of the GEM IP, only for >>> some revisions? >>> Is there an errata that describe this issue for the Zynq GEM? >> >> @Nicolas, AFAIK, there is no errata for this in either Cadence or >> Zynq documentation. > > I was unable to reproduce this issue on Zynq. > Although I do not have HW with two GEM ports, > I tried by routing one GEM via PL and another via on board RGMII. > Since there was no specific errata related to this, I also tried on > subsequent ZynqMP versions with multiple GEM ports but dint find any > checksum issues. I discussed the same with cadence and they > tried the test with 2 bytes of UDP payload on the Zynq GEM IP > version in their regressions and did not hit any issue either. > > I tried to reach out earlier to see if you can share your exact > application. Could you please let me know if you have any > further updates? I manage to reproduce the issue and provide a patch for this (see patch 3/3 from [1]). [1] https://www.spinics.net/lists/netdev/msg513848.html > > Regards, > Harini >
Re: [RFC PATCH 0/2] net: macb: Disable TX checksum offloading on all Zynq
Hi Jennifer, On Tue, Jun 5, 2018 at 10:21 AM, Harini Katakam wrote: > Hi Jeniffer, > > On Mon, Jun 4, 2018 at 8:35 PM, Nicolas Ferre > wrote: >> Jennifer, >> >> On 25/05/2018 at 23:44, Jennifer Dahm wrote: >>> >>> During testing, I discovered that the Zynq GEM hardware overwrites all >>> outgoing UDP packet checksums, which is illegal in packet forwarding >>> cases. This happens both with and without the checksum-zeroing >>> behavior introduced in 007e4ba3ee137f4700f39aa6dbaf01a71047c5f6 >>> ("net: macb: initialize checksum when using checksum offloading"). The >>> only solution to both the small packet bug and the packet forwarding >>> bug that I can find is to disable TX checksum offloading entirely. >> >> > > Thanks for the extensive testing. > I'll try to reproduce and see if it is something to be fixed in the driver. > >> Are the bugs listed above present in all revisions of the GEM IP, only for >> some revisions? >> Is there an errata that describe this issue for the Zynq GEM? > > @Nicolas, AFAIK, there is no errata for this in either Cadence or > Zynq documentation. I was unable to reproduce this issue on Zynq. Although I do not have HW with two GEM ports, I tried by routing one GEM via PL and another via on board RGMII. Since there was no specific errata related to this, I also tried on subsequent ZynqMP versions with multiple GEM ports but dint find any checksum issues. I discussed the same with cadence and they tried the test with 2 bytes of UDP payload on the Zynq GEM IP version in their regressions and did not hit any issue either. I tried to reach out earlier to see if you can share your exact application. Could you please let me know if you have any further updates? Regards, Harini
Re: [RFC PATCH 0/2] net: macb: Disable TX checksum offloading on all Zynq
Hi Jeniffer, On Mon, Jun 4, 2018 at 8:35 PM, Nicolas Ferre wrote: > Jennifer, > > On 25/05/2018 at 23:44, Jennifer Dahm wrote: >> >> During testing, I discovered that the Zynq GEM hardware overwrites all >> outgoing UDP packet checksums, which is illegal in packet forwarding >> cases. This happens both with and without the checksum-zeroing >> behavior introduced in 007e4ba3ee137f4700f39aa6dbaf01a71047c5f6 >> ("net: macb: initialize checksum when using checksum offloading"). The >> only solution to both the small packet bug and the packet forwarding >> bug that I can find is to disable TX checksum offloading entirely. > > Thanks for the extensive testing. I'll try to reproduce and see if it is something to be fixed in the driver. > Are the bugs listed above present in all revisions of the GEM IP, only for > some revisions? > Is there an errata that describe this issue for the Zynq GEM? @Nicolas, AFAIK, there is no errata for this in either Cadence or Zynq documentation. Regards, Harini
Re: [RFC PATCH 0/2] net: macb: Disable TX checksum offloading on all Zynq
Jennifer, On 25/05/2018 at 23:44, Jennifer Dahm wrote: During testing, I discovered that the Zynq GEM hardware overwrites all outgoing UDP packet checksums, which is illegal in packet forwarding cases. This happens both with and without the checksum-zeroing behavior introduced in 007e4ba3ee137f4700f39aa6dbaf01a71047c5f6 ("net: macb: initialize checksum when using checksum offloading"). The only solution to both the small packet bug and the packet forwarding bug that I can find is to disable TX checksum offloading entirely. Are the bugs listed above present in all revisions of the GEM IP, only for some revisions? Is there an errata that describe this issue for the Zynq GEM? There's still the possibility that these bugs are actually with the driver software and not with the hardware. I've found several places where the checksum is set to 0x (the incorrect checksum found in small packets) when something goes wrong, and I can imagine a buggy driver writing over the checksum blindly when TX checksum offloading is enabled. I would like feedback on two things: 1. Is it possible that the two bugs described above are caused by the driver and not by the hardware? If so, where should I look to implicate the driver? Rafal, Do you want to comment on this part? 2. Is this a problem we care enough about to completely disable TX checksum offloading? I would prefer to keep it enabled if possible. I mean I'm not against such a workaround if the HW is not able to handle such cases but offloading a CPU is always good when we have "low end" processors like ARM9 at 200MHz like the older AT91 that use this driver... Here is the testing procedure I used to reproduce these bugs on my machine. Specifically, without this patchset, step 9 fails. Without 007e4ba3ee, step 8 also fails. 1. Set up the test environment: a. Acquire a Zynq device with two ethernet ports. This is the DUT. b. Acquire a USB-Ethernet adapter. c. Acquire two ethernet cables. d. Connect one Ethernet port on the DUT to your computer's network switch. e. Connect the other Ethernet port to the USB-Ethernet adapter and plug that adapter into your computer. f. Set up a Linux VM to send packets through the DUT. I recommend using a VM here so that you can easily detach it from the primary network to force outgoing traffic through the DUT. g. Set up a computer with a packet inspecting program to receive and inspect packets. This doesn't need to be a VM. For the purposes of this test, I'll be using a Windows instance with WireShark. 2. Load the kernel you want to test onto the DUT, making sure to include the `bridge` module. 3. Set up a bridge on the DUT. The following commands on the DUT should work, replacing `eth0` and `eth1` with the two ethernet interfaces on the DUT: ``` brctl addbr test brctl addif test eth0 eth1 ifconfig eth0 0.0.0.0 ifconfig eth1 0.0.0.0 dhclient test -v ``` 4. Disconnect the Linux VM from your host computer's network and connect it to the USB-Ethernet adapter in order to force outgoing network traffic through the DUT. If necessary, run dhclient on the Linux VM to acquire an IP address. 5. Ensure that you can reach your Windows instance from your Linux VM through the DUT (e.g. ping). 6. Start WireShark on your Windows instance and start monitoring traffic on a specific, unused port (e.g. 61557). 7. Using netcat, send a few not-tiny UDP packets from your Linux VM to your Windows instance to ensure that valid UDP packets are properly forwarded. Ex: ``` echo "hello world" | netcat -u 61557 ``` Inspect these packets to ensure that the data arrived intact and that the checksum looks reasonable (i.e. not 0x or 0x). 8. Using netcat, send a few tiny UDP packets (2 bytes or fewer) from Linux VM to your Windows instance to ensure that the checksum is reasonable. Ex: ``` echo "h" | netcat -u 61557 ``` 9. Using a custom program, send UDP packets with broken checksums (e.g. 0xABCD) from your Linux VM to your Windows instance. Inspect these packets with WireShark and make sure that the packet arrived with the same checksum you sent it with. For step 9, I wrote a C program using the Linux socket API that will send a properly formatted UDP packet with the payload "Hello!" and a (broken) checksum of 0xABCD to port 61557 on the host provided at the command line. I can send the full program if you would like, but here is the important part of it: ``` struct custom_udp { int16_t s_port; int16_t d_port; int16_t length; int16_t check; char data[]; }; int send_message(int sockfd, in_port_t port, const char *message) { struct custom_udp *frame; int16_t message_len; int16_t frame_len; int ret; message_len = strlen(message) * sizeof(char); frame_len =
[RFC PATCH 0/2] net: macb: Disable TX checksum offloading on all Zynq
During testing, I discovered that the Zynq GEM hardware overwrites all outgoing UDP packet checksums, which is illegal in packet forwarding cases. This happens both with and without the checksum-zeroing behavior introduced in 007e4ba3ee137f4700f39aa6dbaf01a71047c5f6 ("net: macb: initialize checksum when using checksum offloading"). The only solution to both the small packet bug and the packet forwarding bug that I can find is to disable TX checksum offloading entirely. There's still the possibility that these bugs are actually with the driver software and not with the hardware. I've found several places where the checksum is set to 0x (the incorrect checksum found in small packets) when something goes wrong, and I can imagine a buggy driver writing over the checksum blindly when TX checksum offloading is enabled. I would like feedback on two things: 1. Is it possible that the two bugs described above are caused by the driver and not by the hardware? If so, where should I look to implicate the driver? 2. Is this a problem we care enough about to completely disable TX checksum offloading? Here is the testing procedure I used to reproduce these bugs on my machine. Specifically, without this patchset, step 9 fails. Without 007e4ba3ee, step 8 also fails. 1. Set up the test environment: a. Acquire a Zynq device with two ethernet ports. This is the DUT. b. Acquire a USB-Ethernet adapter. c. Acquire two ethernet cables. d. Connect one Ethernet port on the DUT to your computer's network switch. e. Connect the other Ethernet port to the USB-Ethernet adapter and plug that adapter into your computer. f. Set up a Linux VM to send packets through the DUT. I recommend using a VM here so that you can easily detach it from the primary network to force outgoing traffic through the DUT. g. Set up a computer with a packet inspecting program to receive and inspect packets. This doesn't need to be a VM. For the purposes of this test, I'll be using a Windows instance with WireShark. 2. Load the kernel you want to test onto the DUT, making sure to include the `bridge` module. 3. Set up a bridge on the DUT. The following commands on the DUT should work, replacing `eth0` and `eth1` with the two ethernet interfaces on the DUT: ``` brctl addbr test brctl addif test eth0 eth1 ifconfig eth0 0.0.0.0 ifconfig eth1 0.0.0.0 dhclient test -v ``` 4. Disconnect the Linux VM from your host computer's network and connect it to the USB-Ethernet adapter in order to force outgoing network traffic through the DUT. If necessary, run dhclient on the Linux VM to acquire an IP address. 5. Ensure that you can reach your Windows instance from your Linux VM through the DUT (e.g. ping). 6. Start WireShark on your Windows instance and start monitoring traffic on a specific, unused port (e.g. 61557). 7. Using netcat, send a few not-tiny UDP packets from your Linux VM to your Windows instance to ensure that valid UDP packets are properly forwarded. Ex: ``` echo "hello world" | netcat -u 61557 ``` Inspect these packets to ensure that the data arrived intact and that the checksum looks reasonable (i.e. not 0x or 0x). 8. Using netcat, send a few tiny UDP packets (2 bytes or fewer) from Linux VM to your Windows instance to ensure that the checksum is reasonable. Ex: ``` echo "h" | netcat -u 61557 ``` 9. Using a custom program, send UDP packets with broken checksums (e.g. 0xABCD) from your Linux VM to your Windows instance. Inspect these packets with WireShark and make sure that the packet arrived with the same checksum you sent it with. For step 9, I wrote a C program using the Linux socket API that will send a properly formatted UDP packet with the payload "Hello!" and a (broken) checksum of 0xABCD to port 61557 on the host provided at the command line. I can send the full program if you would like, but here is the important part of it: ``` struct custom_udp { int16_t s_port; int16_t d_port; int16_t length; int16_t check; char data[]; }; int send_message(int sockfd, in_port_t port, const char *message) { struct custom_udp *frame; int16_t message_len; int16_t frame_len; int ret; message_len = strlen(message) * sizeof(char); frame_len = sizeof(struct custom_udp) + message_len; frame = malloc(frame_len); frame->s_port = htons(0); frame->d_port = htons(port); frame->length = htons(frame_len); frame->check = htons(0xABCD); memmove(frame->data, message, message_len); ret = write(sockfd, frame, frame_len); free(frame); return ret; } ``` Jennifer Dahm (1): net/macb: Disable TX checksum offloading on all Zynq-7000 drivers/net/ethernet/cadence/macb.h | 1 + drivers/net/ethernet/cadence/macb_main.c | 11 --- 2 files changed, 9