Re: [lwip-users] Transmission stall after ARP broadcast
On 20.02.2018 08:33, Stephan Hilchenbach wrote:

> Hello,
>
> this problem was not caused by the lwIP stack, but by the Ethernet driver. It was solved with the help of TI support:
> https://e2e.ti.com/support/arm/sitara_arm/f/791/t/663155
>
> The address lookup engine (ALE) processes all received packets to determine which port(s), if any, each packet should be forwarded to. Configured as a switch, the port state of all ports was set to forwarding, even if a port was not connected. Setting the unconnected ports to blocking by default, and switching them to forwarding only after the PHY connects, solved the problem.

OK, good to know :-) Thanks for sharing the answer.

Simon

___
lwip-users mailing list
lwip-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lwip-users
Re: [lwip-users] Transmission stall after ARP broadcast
Hello,

this problem was not caused by the lwIP stack, but by the Ethernet driver. It was solved with the help of TI support:
https://e2e.ti.com/support/arm/sitara_arm/f/791/t/663155

The address lookup engine (ALE) processes all received packets to determine which port(s), if any, each packet should be forwarded to. Configured as a switch, the port state of all ports was set to forwarding, even if a port was not connected. Setting the unconnected ports to blocking by default, and switching them to forwarding only after the PHY connects, solved the problem.

Best Regards,
Stephan
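For readers who want to see the shape of this fix, here is a minimal sketch in C. It is not the driver from this thread. The register array `ale_portctl` is a plain-memory stand-in for the memory-mapped ALE port control registers, and all function names are hypothetical. The 2-bit port-state encoding (0 = disabled, 1 = blocked, 2 = learn, 3 = forward) is what I recall from the AM335x documentation and should be checked against the TRM. Returning a port to blocked on link loss is my assumption; the message above only says that ports start blocked and are set to forwarding after the PHY connects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding of the 2-bit port-state field (verify against the TRM). */
enum ale_port_state {
    ALE_PORT_DISABLED = 0,
    ALE_PORT_BLOCKED  = 1,
    ALE_PORT_LEARN    = 2,
    ALE_PORT_FORWARD  = 3
};

#define ALE_NUM_PORTS       3u   /* port 0 = host port, ports 1 and 2 = external */
#define ALE_PORT_STATE_MASK 0x3u

/* Hypothetical stand-in for the memory-mapped port control registers. */
static uint32_t ale_portctl[ALE_NUM_PORTS];

static void ale_set_port_state(unsigned port, enum ale_port_state state)
{
    assert(port < ALE_NUM_PORTS);
    /* Replace only the state field, keep the other bits of the register. */
    ale_portctl[port] = (ale_portctl[port] & ~ALE_PORT_STATE_MASK) | (uint32_t)state;
}

static enum ale_port_state ale_get_port_state(unsigned port)
{
    assert(port < ALE_NUM_PORTS);
    return (enum ale_port_state)(ale_portctl[port] & ALE_PORT_STATE_MASK);
}

/* At init: the host port forwards, external ports stay blocked until a link is seen. */
static void ale_init_port_states(void)
{
    ale_set_port_state(0, ALE_PORT_FORWARD);
    for (unsigned port = 1; port < ALE_NUM_PORTS; port++) {
        ale_set_port_state(port, ALE_PORT_BLOCKED);
    }
}

/* Called by the link-monitor task whenever the PHY link of a port changes. */
static void ale_on_link_change(unsigned port, bool link_up)
{
    ale_set_port_state(port, link_up ? ALE_PORT_FORWARD : ALE_PORT_BLOCKED);
}
```

In a real driver, `ale_init_port_states()` would run once during switch setup, and `ale_on_link_change()` would be called from whatever task already polls the PHYs.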
Re: [lwip-users] Transmission stall after ARP broadcast
>> So is there *any* communication from the device after this? Maybe on the 2nd
>> port?

After this incident, receiving goes on. This means the DMA handles incoming packets but does not accept any outgoing packets. The second port is not connected. Its PHY is not active.

>> I don't know how the software on that processor is adapted to lwIP but
>> strictly speaking, this is not a switch but 2 MACs connected via the PRUs.
>> They might have bugs, too :-)

The PRUs are not used. It's the CPSW_3G switch.

What I found is that I can avoid this problem (so far) when I reduce the MDIO communication with the second port's PHY to a minimum. Initially the driver tried to auto-negotiate with the PHY even when there was no link. I have changed this now and wait for a link first. I checked all code lines, and lwIP is not involved in these procedures.

At this point I get confused. Somehow lwIP is influenced by the PHY communication and the state of the switch, because there is no delay between the last data packet sent and the unexpected ARP broadcast, which is then the last transmission of the DMA. Probably memory is being overwritten somewhere. The problem also disappears as soon as the second port is connected too.

>> Have you tried to debug what's going on in the processor after it stops?

Yes, I did. The integrated DMA (CPDMA) stops sending without showing any error. The DMA state is idle, but it does not respond to new descriptors. I created a thread in the TI forum:
https://e2e.ti.com/support/arm/sitara_arm/f/791/p/663155/2442288

The fact that this behavior is unknown indicates that my driver is doing something stupid.

Best Regards,
Stephan

-----Original Message-----
From: lwip-users [mailto:lwip-users-bounces+hilchenbach=ish-gmbh@nongnu.org] On behalf of Simon Goldschmidt
Sent: Friday, 16 February 2018 08:55
To: lwip-users@nongnu.org
Subject: Re: [lwip-users] Transmission stall after ARP broadcast

Stephan Hilchenbach wrote:

>>> 1.4.1 is rather old. There have been numerous bugs fixed since then.
> Yes I know. I would prefer to update, but I can't make this decision on my
> own.

Then talk to whoever is in a position to decide. From the pcap, I can't tell what's wrong. It does not seem like an lwIP issue, but maybe it is. If I were you, I'd first check with a newer version of lwIP before digging into the hardware drivers. Especially with that processor ;-)

>>> No. In general, lwIP has *nothing* to do with your hardware. The netif
>>> driver is responsible for that.
> I expected this answer, but was not sure about this. It was very curious that
> the port always stops with transmission of an ARP broadcast.
> The device is configured as a switch with 2 ports. One port is connected, the
> other one is not. A task cyclically checks the second port for phy activity.

So is there *any* communication from the device after this? Maybe on the 2nd port?

I don't know how the software on that processor is adapted to lwIP but strictly speaking, this is not a switch but 2 MACs connected via the PRUs. They might have bugs, too :-)

Have you tried to debug what's going on in the processor after it stops?

Simon
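The interim workaround Stephan describes in this message (wait for a link before doing anything further with the second port's PHY, and keep MDIO traffic to a minimum) could look roughly like the sketch below. It is an illustration under assumptions, not his driver. `phy_poll`, `fake_mdio_read`, and the result enum are hypothetical names, and the fake PHY only exists so the example runs without hardware. The register number and bit positions are the standard IEEE 802.3 Clause 22 basic status register (BMSR) definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* IEEE 802.3 Clause 22 basic status register and the two bits used here. */
#define MII_BMSR           0x01u
#define BMSR_LINK_STATUS   0x0004u  /* bit 2: link is up */
#define BMSR_ANEG_COMPLETE 0x0020u  /* bit 5: auto-negotiation finished */

/* Hypothetical MDIO accessor type; a real driver would wrap its MDIO controller. */
typedef uint16_t (*mdio_read_fn)(uint8_t phy_addr, uint8_t reg);

enum phy_poll_result {
    PHY_NO_LINK,          /* nothing connected: do not touch the PHY any further */
    PHY_LINK_NEGOTIATING, /* link seen, auto-negotiation not finished yet */
    PHY_LINK_READY        /* link up and negotiation done: safe to configure the MAC */
};

/* One poll step: exactly one MDIO read per call, no negotiation work without a link. */
static enum phy_poll_result phy_poll(mdio_read_fn mdio_read, uint8_t phy_addr)
{
    uint16_t bmsr = mdio_read(phy_addr, MII_BMSR);

    if ((bmsr & BMSR_LINK_STATUS) == 0u) {
        return PHY_NO_LINK;
    }
    if ((bmsr & BMSR_ANEG_COMPLETE) == 0u) {
        return PHY_LINK_NEGOTIATING;
    }
    return PHY_LINK_READY;
}

/* Fake PHY for trying the sketch on a host: returns a settable BMSR value. */
static uint16_t fake_bmsr;
static unsigned fake_reads;

static uint16_t fake_mdio_read(uint8_t phy_addr, uint8_t reg)
{
    (void)phy_addr;
    fake_reads++;
    return (reg == MII_BMSR) ? fake_bmsr : 0u;
}
```

A cyclic link-monitor task would call `phy_poll()` for each external port and only read speed and duplex, or reconfigure the MAC, once it returns `PHY_LINK_READY`.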
Re: [lwip-users] Transmission stall after ARP broadcast
Hello Simon,

thank you for the fast response.

>> 1.4.1 is rather old. There have been numerous bugs fixed since then.

Yes I know. I would prefer to update, but I can't make this decision on my own.

>> A screenshot? WTF? If this is a screenshot of a wireshark log, please attach
>> a pcap file instead.

Of course I have the log files. I was not sure if someone would look inside. In the TI forum I get less attention. I attached 2 files. I cut about 100,000 frames to make them small. The device with lwIP has the IP 192.168.1.211 and my connected notebook the IP 192.168.1.31.

>> No. In general, lwIP has *nothing* to do with your hardware. The netif
>> driver is responsible for that.

I expected this answer, but was not sure about this. It was very curious that the port always stops with transmission of an ARP broadcast.
The device is configured as a switch with 2 ports. One port is connected, the other one is not. A task cyclically checks the second port for PHY activity.

Best Regards,
Stephan

From: lwip-users [mailto:lwip-users-bounces+hilchenbach=ish-gmbh@nongnu.org] On behalf of goldsi...@gmx.de
Sent: Thursday, 15 February 2018 17:11
To: Mailing list for lwIP users
Subject: Re: [lwip-users] Transmission stall after ARP broadcast

On 15.02.2018 16:05, Stephan Hilchenbach wrote:

> Hello Experts,
>
> I have a problem with my Ethernet driver connecting a TI AM335x CPSW switch to the lwIP stack v1.4.1.

1.4.1 is rather old. There have been numerous bugs fixed since then.

> The port stops transmitting after some minutes or hours. The DMA hardware register shows no errors, but transmission has stalled. The DMA does not process further packets, although the content of the next packets looks OK.
> When I run Wireshark I always observe the same sequence. Every time before the port stops transmitting, the last packet sent was an ARP broadcast to the connected host: "TexasIns_e4:2a:20 Broadcast ARP Who has 192.168.1.31? Tell 192.168.1.211". Curiously, this is the only time lwIP generates this request after the connection was established. There are no other ARP broadcasts until the Tx stall. Attached is a screenshot.

A screenshot? WTF? If this is a screenshot of a wireshark log, please attach a pcap file instead.

> I have two questions about this:
> 1. What is the reason for lwIP to generate this ARP broadcast during transmission?

Don't know. Attach a pcap including an explanation of which IPs we see, which device has which address and what they do. I'm too lazy to try to find that out myself. And I could be wrong.

> 2. Can lwIP cause a hardware Tx port to stall (because of the packet content)?

No. In general, lwIP has *nothing* to do with your hardware. The netif driver is responsible for that.

Simon

Attachments:
2018-02-14_tx_fehler_4_cut.pcapng
2018-02-14_tx_fehler_5_cut.pcapng
Re: [lwip-users] Transmission stall after ARP broadcast
On 15.02.2018 16:05, Stephan Hilchenbach wrote:

> Hello Experts,
>
> I have a problem with my Ethernet driver connecting a TI AM335x CPSW switch to the lwIP stack v1.4.1.

1.4.1 is rather old. There have been numerous bugs fixed since then.

> The port stops transmitting after some minutes or hours. The DMA hardware register shows no errors, but transmission has stalled. The DMA does not process further packets, although the content of the next packets looks OK.
> When I run Wireshark I always observe the same sequence. Every time before the port stops transmitting, the last packet sent was an ARP broadcast to the connected host: "TexasIns_e4:2a:20 Broadcast ARP Who has 192.168.1.31? Tell 192.168.1.211". Curiously, this is the only time lwIP generates this request after the connection was established. There are no other ARP broadcasts until the Tx stall. Attached is a screenshot.

A screenshot? WTF? If this is a screenshot of a wireshark log, please attach a pcap file instead.

> I have two questions about this:
> 1. What is the reason for lwIP to generate this ARP broadcast during transmission?

Don't know. Attach a pcap including an explanation of which IPs we see, which device has which address and what they do. I'm too lazy to try to find that out myself. And I could be wrong.

> 2. Can lwIP cause a hardware Tx port to stall (because of the packet content)?

No. In general, lwIP has *nothing* to do with your hardware. The netif driver is responsible for that.

Simon