Re: [lwip-users] lwIP 1.4.1 stable tcp connection stall

2017-08-09 Thread Bill Auerbach
goldsi...@gmx.de writes:

> Maybe. As a starter, you should disable everything you can in tcp_output 
> (TCP_OUTPUT_DEBUG, TCP_CWND_DEBUG). Then once the error happens, try dumping 
> the lwip_stats plus the whole tcp_pcb that doesn't send any more. There's no 
> function to do that, so you'll have to write that on your own. I mean every 
> member of the tcp_pcb plus all "interesting" members of lists like unsent, 
> unacked etc.

Below is debug output for TCP up until it fails (minus a lot more ahead of that 
that was OK).  The last 4 lines repeat forever. Our application disconnects and 
reconnects after 5 seconds of no response which unfortunately has kept 
customers at bay because the system does recover on its own.  We make 
commercial ink jet printers so if the customer is printing it's bad - even 
worse with a 500 lb paper roll.  The only good thing is only one customer is 
seeing disconnects hourly or more.  I get them in less than 5 minutes which is 
great for testing.

tcp_receive: pcb->rttest 467 rtseq 44078 ackno 44410
tcp_receive: experienced rtt 1 ticks (500 msec).
tcp_receive: RTO 4 (2000 milliseconds)
tcp_output: nothing to send (0x)
tcp_output: snd_wnd 63244, cwnd 23364, wnd 23364, seg == NULL, ack 44410
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (0x)
tcp_output: snd_wnd 63244, cwnd 23364, wnd 23364, seg == NULL, ack 44410
tcp_write(pcb=0x008449f8, data=0x0083ae20, len=332, apiflags=0)
tcp_write: queuelen: 0
tcp_write: queueing 44410:44742
tcp_write: 2 (after enqueued)
tcp_output: snd_wnd 63244, cwnd 23364, wnd 23364, effwnd 332, seq 44410, ack 
44410
tcp_output: snd_wnd 63244, cwnd 23364, wnd 23364, effwnd 332, seq 44410, ack 
44410, i 0
tcp_output_segment: rtseq 44410
tcp_output_segment: 44410:44742
tcp_output: nothing to send (0x)
tcp_output: snd_wnd 63244, cwnd 23364, wnd 23364, seg == NULL, ack 44410
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (0x)
tcp_output: snd_wnd 63244, cwnd 23364, wnd 23364, seg == NULL, ack 44410
tcp_receive: window update 62912
tcp_receive: congestion avoidance cwnd 23455
tcp_receive: queuelen 2 ... 0 (after freeing unacked)
tcp_receive: pcb->rttest 469 rtseq 44410 ackno 44742
tcp_receive: experienced rtt 1 ticks (500 msec).
tcp_receive: RTO 4 (2000 milliseconds)
tcp_output: nothing to send (0x)
tcp_output: snd_wnd 62912, cwnd 23455, wnd 23455, seg == NULL, ack 44742
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (0x)
tcp_output: snd_wnd 62912, cwnd 23455, wnd 23455, seg == NULL, ack 44742
tcp_write(pcb=0x008449f8, data=0x0083af6c, len=332, apiflags=0)
tcp_write: queuelen: 0
tcp_write: queueing 44742:45074
tcp_write: 2 (after enqueued)
tcp_output: snd_wnd 62912, cwnd 23455, wnd 23455, effwnd 332, seq 44742, ack 
44742
tcp_output: snd_wnd 62912, cwnd 23455, wnd 23455, effwnd 332, seq 44742, ack 
44742, i 0
tcp_output_segment: rtseq 44742
tcp_output_segment: 44742:45074
tcp_output: nothing to send (0x)
tcp_output: snd_wnd 62912, cwnd 23455, wnd 23455, seg == NULL, ack 44742
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (0x)
tcp_output: snd_wnd 62912, cwnd 23455, wnd 23455, seg == NULL, ack 44742
tcp_receive: window update 64240
tcp_receive: congestion avoidance cwnd 23545
tcp_receive: queuelen 2 ... 0 (after freeing unacked)
tcp_receive: pcb->rttest 471 rtseq 44742 ackno 45074
tcp_receive: experienced rtt 1 ticks (500 msec).
tcp_receive: RTO 4 (2000 milliseconds)
tcp_output: nothing to send (0x)
tcp_output: snd_wnd 64240, cwnd 23545, wnd 23545, seg == NULL, ack 45074
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (0x)
tcp_output: snd_wnd 64240, cwnd 23545, wnd 23545, seg == NULL, ack 45074
tcp_write(pcb=0x008449f8, data=0x0083b0b8, len=332, apiflags=0)
tcp_write: queuelen: 0
tcp_write: queueing 45074:45406
tcp_write: 2 (after enqueued)
tcp_output: snd_wnd 64240, cwnd 23545, wnd 23545, effwnd 332, seq 45074, ack 
45074
tcp_output: snd_wnd 64240, cwnd 23545, wnd 23545, effwnd 332, seq 45074, ack 
45074, i 0
tcp_output_segment: rtseq 45074
tcp_output_segment: 45074:45406
tcp_output: nothing to send (0x)
tcp_output: snd_wnd 64240, cwnd 23545, wnd 23545, seg == NULL, ack 45074
tcp_slowtmr: processing active pcb
tcp_slowtmr: polling application
tcp_output: nothing to send (0x)
tcp_output: snd_wnd 64240, cwnd 23545, wnd 23545, seg == NULL, ack 45074
tcp_receive: window update 63908
tcp_receive: congestion avoidance cwnd 23635
tcp_receive: queuelen 2 ... 0 (after freeing unacked)
tcp_receive: pcb->rttest 473 rtseq 45074 ackno 45406
tcp_receive: experienced rtt 1 ticks (500 msec).
tcp_receive: RTO 4 (2000 milliseconds)
tcp_output: nothing to send (0x)
tcp_output: snd_wnd 63908, cwnd 23635, wnd 23635, seg == 
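
The wnd and effwnd values in the log above are consistent with
wnd = min(snd_wnd, cwnd) and effwnd = seq - ack + len (for example
44410 - 44410 + 332 = 332). The following is a minimal sketch of that check,
not lwIP code: struct demo_pcb and the demo_* functions are hypothetical
stand-ins that only carry the fields printed in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the few tcp_pcb fields that appear in the log.
 * Assumption: this is NOT lwIP's struct tcp_pcb. */
struct demo_pcb {
  uint32_t lastack;  /* highest sequence number acked by the peer ("ack") */
  uint16_t snd_wnd;  /* window advertised by the peer */
  uint16_t cwnd;     /* congestion window */
};

/* Usable send window: the smaller of the peer's window and cwnd ("wnd"). */
uint32_t demo_send_window(const struct demo_pcb *pcb)
{
  assert(pcb != NULL);
  return pcb->snd_wnd < pcb->cwnd ? pcb->snd_wnd : pcb->cwnd;
}

/* Bytes from lastack to the end of the segment ("effwnd"). Unsigned
 * subtraction keeps the result correct across sequence-number wraparound. */
uint32_t demo_effwnd(const struct demo_pcb *pcb, uint32_t seqno, uint16_t len)
{
  assert(pcb != NULL);
  return (seqno - pcb->lastack) + len;
}

/* Nonzero when the segment fits into the usable window. */
int demo_segment_fits(const struct demo_pcb *pcb, uint32_t seqno, uint16_t len)
{
  return demo_effwnd(pcb, seqno, len) <= demo_send_window(pcb);
}
```

With the values from the first tcp_write in the excerpt (lastack 44410,
snd_wnd 63244, cwnd 23364, len 332) the segment fits easily, which matches
the tcp_output_segment lines that follow each tcp_write.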

Re: [lwip-users] IPv6 multicast support by Socket API?

2017-08-09 Thread Dirk Ziegelmeier
Andrey,

can you try my patch and tell me if it works? If not, please try to debug
it, I was not able to test it at home.

TODO: Correctly unregister at socket close, try to fix code duplication
with IGMP.

Ciao
Dirk
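
For reference, this is roughly how an application would request a join or
leave through the socket API. It is a sketch written against the POSIX
headers so that it compiles on a host; under lwIP the include would be
lwip/sockets.h instead (an assumption). The demo_* names are hypothetical.
POSIX uses level IPPROTO_IPV6 for these options; the context lines of the
hunk below suggest the patch as posted handles them inside the IPPROTO_IP
switch, so the level may need adjusting when testing it.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Fill an ipv6_mreq from a textual group address and an interface index.
 * Returns 0 on success, -1 if the text is not an IPv6 multicast address. */
int demo_fill_mreq(struct ipv6_mreq *mreq, const char *group, unsigned int if_idx)
{
  assert(mreq != NULL && group != NULL);
  memset(mreq, 0, sizeof(*mreq));
  if (inet_pton(AF_INET6, group, &mreq->ipv6mr_multiaddr) != 1) {
    return -1;  /* not parseable as an IPv6 address */
  }
  if (!IN6_IS_ADDR_MULTICAST(&mreq->ipv6mr_multiaddr)) {
    return -1;  /* valid address, but not a multicast group */
  }
  mreq->ipv6mr_interface = if_idx;
  return 0;
}

/* Join (join != 0) or leave (join == 0) the group on an open UDP socket.
 * Returns the result of setsockopt(). */
int demo_mld6_membership(int s, const struct ipv6_mreq *mreq, int join)
{
  assert(mreq != NULL);
  return setsockopt(s, IPPROTO_IPV6,
                    join ? IPV6_JOIN_GROUP : IPV6_LEAVE_GROUP,
                    mreq, sizeof(*mreq));
}
```

A caller would fill the request once with demo_fill_mreq() and pass it to
demo_mld6_membership() on a UDP socket, first with join = 1 and later with
join = 0.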
diff --git a/src/api/sockets.c b/src/api/sockets.c
index a50faf5..21f6481 100644
--- a/src/api/sockets.c
+++ b/src/api/sockets.c
@@ -59,7 +59,9 @@
 #include "lwip/udp.h"
 #include "lwip/memp.h"
 #include "lwip/pbuf.h"
+#include "lwip/netif.h"
 #include "lwip/priv/tcpip_priv.h"
+#include "lwip/mld6.h"
 #if LWIP_CHECKSUM_ON_COPY
 #include "lwip/inet_chksum.h"
 #endif
@@ -234,12 +236,12 @@
 #endif /* LWIP_IPV4 */
 };
 
-#if LWIP_IGMP
 /* Define the number of IPv4 multicast memberships, default is one per socket */
 #ifndef LWIP_SOCKET_MAX_MEMBERSHIPS
 #define LWIP_SOCKET_MAX_MEMBERSHIPS NUM_SOCKETS
 #endif
 
+#if LWIP_IGMP
 /* This is to keep track of IP_ADD_MEMBERSHIP calls to drop the membership when
a socket is closed */
 struct lwip_socket_multicast_pair {
@@ -256,6 +258,25 @@
 static int  lwip_socket_register_membership(int s, const ip4_addr_t *if_addr, const ip4_addr_t *multi_addr);
 static void lwip_socket_unregister_membership(int s, const ip4_addr_t *if_addr, const ip4_addr_t *multi_addr);
 static void lwip_socket_drop_registered_memberships(int s);
+#endif /* LWIP_IGMP */
+
+#if LWIP_IPV6_MLD
+/* This is to keep track of IP_JOIN_GROUP calls to drop the membership when
+   a socket is closed */
+struct lwip_socket_multicast_mld6_pair {
+  /** the socket */
+  struct lwip_sock* sock;
+  /** the interface index */
+  unsigned int if_idx;
+  /** the group address */
+  ip6_addr_t multi_addr;
+};
+
+struct lwip_socket_multicast_mld6_pair socket_ipv6_multicast_memberships[LWIP_SOCKET_MAX_MEMBERSHIPS];
+
+static int  lwip_socket_register_mld6_membership(int s, unsigned int if_idx, const ip6_addr_t *multi_addr);
+static void lwip_socket_unregister_mld6_membership(int s, unsigned int if_idx, const ip6_addr_t *multi_addr);
+static void lwip_socket_drop_registered_mld6_memberships(int s);
 #endif /* LWIP_IGMP */
 
 /** The global array of available sockets */
@@ -709,6 +730,10 @@
   /* drop all possibly joined IGMP memberships */
   lwip_socket_drop_registered_memberships(s);
 #endif /* LWIP_IGMP */
+#if LWIP_IPV6_MLD
+  /* drop all possibly joined MLD6 memberships */
+  lwip_socket_drop_registered_mld6_memberships(s);
+#endif /* LWIP_IPV6_MLD */
 
   err = netconn_delete(sock->conn);
   if (err != ERR_OK) {
@@ -3053,7 +3078,6 @@
 case IP_DROP_MEMBERSHIP:
   {
 /* If this is a TCP or a RAW socket, ignore these options. */
-/* @todo: assign membership to this socket so that it is dropped when closing the socket */
 err_t igmp_err;
 const struct ip_mreq *imr = (const struct ip_mreq *)optval;
 ip4_addr_t if_addr;
@@ -3079,6 +3103,41 @@
   }
   break;
 #endif /* LWIP_IGMP */
+#if LWIP_IPV6_MLD
+case IPV6_JOIN_GROUP:
+case IPV6_LEAVE_GROUP:
+  {
+/* If this is a TCP or a RAW socket, ignore these options. */
+err_t mld6_err;
+struct netif *netif;
+ip6_addr_t multi_addr;
+const struct ipv6_mreq *imr = (const struct ipv6_mreq *)optval;
+LWIP_SOCKOPT_CHECK_OPTLEN_CONN_PCB_TYPE(sock, optlen, struct ipv6_mreq, NETCONN_UDP);
+inet6_addr_to_ip6addr(&multi_addr, &imr->ipv6mr_multiaddr);
+netif = netif_get_by_index(imr->ipv6mr_interface);
+if (netif == NULL) {
+  err = EADDRNOTAVAIL;
+  break;
+}
+
+if (optname == IPV6_JOIN_GROUP) {
+  if (!lwip_socket_register_mld6_membership(s, imr->ipv6mr_interface, &multi_addr)) {
+/* cannot track membership (out of memory) */
+err = ENOMEM;
+mld6_err = ERR_OK;
+  } else {
+mld6_err = mld6_joingroup_netif(netif, &multi_addr);
+  }
+} else {
+  mld6_err = mld6_leavegroup_netif(netif, &multi_addr);
+  lwip_socket_unregister_mld6_membership(s, imr->ipv6mr_interface, &multi_addr);
+}
+if (mld6_err != ERR_OK) {
+  err = EADDRNOTAVAIL;
+}
+  }
+  break;
+#endif /* LWIP_IPV6_MLD */
 default:
   LWIP_DEBUGF(SOCKETS_DEBUG, ("lwip_setsockopt(%d, IPPROTO_IP, UNIMPL: optname=0x%x, ..)\n",
   s, optname));
@@ -3558,4 +3617,98 @@
   done_socket(sock);
 }
 #endif /* LWIP_IGMP */
+
+#if LWIP_IPV6_MLD
+/** Register a new MLD6 membership. On socket close, the membership is dropped automatically.
+ *
+ * ATTENTION: this function is called from tcpip_thread (or under CORE_LOCK).
+ *
+ * @return 1 on success, 0 on failure
+ */
+static int
+lwip_socket_register_mld6_membership(int s, unsigned int if_idx, const ip6_addr_t *multi_addr)
+{
+  struct lwip_sock *sock = get_socket(s);
+  int i;
+
+  if (!sock) {
+return 0;
+  }
+
+  for (i = 0; i < LWIP_SOCKET_MAX_MEMBERSHIPS; i++) {
+if (socket_ipv6_multicast_memberships[i].sock == NULL) {
+  socket_ipv6_multicast_memberships[i].sock   = sock;

Re: [lwip-users] lwIP 1.4.1 stable tcp connection stall

2017-08-09 Thread goldsi...@gmx.de

Bill Auerbach wrote:
> I can debug some, yes, but I’m no expert on this part of lwIP or TCP. 
> So I need help in what information is best to gather. I can pause 
> when the sending stops and get to these variables. Would output from 
> LWIP_DEBUG_ONs (whichever ones are pertinent) show you anything about 
> what’s happening?


Maybe. As a starter, you should disable everything you can in tcp_output 
(TCP_OUTPUT_DEBUG, TCP_CWND_DEBUG). Then once the error happens, try 
dumping the lwip_stats plus the whole tcp_pcb that doesn't send any 
more. There's no function to do that, so you'll have to write that on 
your own. I mean every member of the tcp_pcb plus all "interesting" 
members of lists like unsent, unacked etc.
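
A minimal sketch of such a dump helper follows. It assumes a stand-in struct
(demo_pcb) that carries only a few fields named after tcp_pcb members; the
names demo_pcb and demo_dump_pcb are hypothetical, and the real struct in
lwIP has more members and keeps unsent/unacked as segment lists that would
have to be walked.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for a subset of lwIP's struct tcp_pcb (assumption: field names
 * follow the lwIP members, but this is not the real definition). */
struct demo_pcb {
  int      state;
  uint32_t snd_nxt;
  uint32_t lastack;
  uint16_t snd_wnd;
  uint16_t cwnd;
  uint16_t ssthresh;
  int16_t  rto;
  uint8_t  nrtx;
  uint8_t  persist_backoff;
  uint16_t snd_queuelen;
  int      has_unsent;   /* nonzero if the unsent list is non-empty */
  int      has_unacked;  /* nonzero if the unacked list is non-empty */
};

/* Format the PCB into buf as one line of text. Returns what snprintf
 * returns: the length the full line needs, excluding the terminator. */
int demo_dump_pcb(const struct demo_pcb *pcb, char *buf, size_t buflen)
{
  assert(pcb != NULL && buf != NULL && buflen > 0);
  return snprintf(buf, buflen,
                  "state=%d snd_nxt=%lu lastack=%lu snd_wnd=%u cwnd=%u "
                  "ssthresh=%u rto=%d nrtx=%u persist_backoff=%u "
                  "queuelen=%u unsent=%s unacked=%s",
                  pcb->state,
                  (unsigned long)pcb->snd_nxt, (unsigned long)pcb->lastack,
                  (unsigned)pcb->snd_wnd, (unsigned)pcb->cwnd,
                  (unsigned)pcb->ssthresh, (int)pcb->rto,
                  (unsigned)pcb->nrtx, (unsigned)pcb->persist_backoff,
                  (unsigned)pcb->snd_queuelen,
                  pcb->has_unsent ? "yes" : "no",
                  pcb->has_unacked ? "yes" : "no");
}
```

Formatting into a caller-supplied buffer keeps the helper independent of the
output channel, so the same line can go to a UART, a Telnet session, or a log.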


Oh, and do you have TCP_OVERSIZE enabled? I recall the early versions 
having bugs, although I'm not sure they could result in the behaviour 
you're seeing...


> It made some people nervous here when I said there were probably a hundred 
> changes in 2.x.


I can understand that! The step from 1.4.1 to 2.0.x is a large one. On 
the other hand, 2.0.2 really has many bugs fixed, so it'll definitely 
be the more stable version. But release tests are always required, of course.


Simon
___
lwip-users mailing list
lwip-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lwip-users

Re: [lwip-users] lwIP 1.4.1 stable tcp connection stall

2017-08-09 Thread Bill Auerbach
Hi Simon,

I see a zero window persist timer bugfix entry in 1.4.1.

I can ping-pong between 2.0.2 and 1.4.1 and make the error come and go. At 
first I did this on the NIOS II, but since it is slower and harder to debug, I 
just got the PowerPC converted to 2.0.2 and it’s working there too (as 
expected).  The PowerPC allows me faster debugging and much more horsepower to 
output debug information.  And I can get the debug output in Telnet.  I can 
debug some, yes, but I’m no expert on this part of lwIP or TCP.  So I need 
help in what information is best to gather. I can pause when the sending stops 
and get to these variables.  Would output from LWIP_DEBUG_ONs (whichever ones 
are pertinent) show you anything about what’s happening?

Thank you for replying. Your dedication to this product is evident from your 
interest in knowing what fixed something in an older version. Due to time I may 
have to push a 2.0.2-based release out.  Due to the seriousness I could see you 
pushing a DIY code change for users to update 1.4.1 code to reduce the concern 
of a major update.  It made some people nervous here when I said there were 
probably a hundred changes in 2.x. But I’ve already run a test with 3 devices 
using 30M packets and I see no differences from 1.4.1. I will run overnight as 
well.

Bill

From: lwip-users 
[mailto:lwip-users-bounces+bauerbach=arrayonline@nongnu.org] On Behalf Of 
goldsimon
Sent: Wednesday, August 09, 2017 10:45 AM
To: Mailing list for lwIP users
Subject: Re: [lwip-users] lwIP 1.4.1 stable tcp connection stall

Hey Bill,

Great to hear 2.0.2 fixes this. I'm a bit lost thinking about a bug fix. We had 
some since then I guess, and I would have to dig through the log myself.
However, the most obvious would be a zero window where the persist timer 
doesn't start or somehow doesn't work correctly.

Are you able to reproduce this now? Can you debug it? Would you need help in 
debugging?

I'm not at my PC very often right now, but I'll see if I find something. Maybe 
you can dump the PCB in question once it stops plus lwip_stats, that could help 
finding the issue.

Simon

Re: [lwip-users] time out on last ack

2017-08-09 Thread Mike Rosing

At one point I set up an increment of 10 seconds per check, so 10, 20, 30 ...
and it would never reconnect even after a minute.  For my purposes the "REUSE"
was important - same IP address, same port.  And re-establishing connection
quickly (within 10 seconds) was also important.  It did not seem to matter how
long I waited, the "INUSE" error always came up and it confused me because the
SO_REUSE compile flag was set.  Now it does reuse it after 1 second wait and
using tcp_abandon().  I'm happy!

Mike

On August 9, 2017 at 9:48 AM goldsimon wrote:
> Mike Rosing wrote:
> > After searching a while I see that the LwIP is waiting for LAST_ACK, but
> > the server never sends it.
>
> I would have expected lwip to time out this PCB eventually (although that
> can need quite some time). How long did you wait?
>
> Simon
 


Re: [lwip-users] time out on last ack

2017-08-09 Thread goldsimon
Mike Rosing wrote:
> After searching a while I see that the LwIP is waiting for LAST_ACK, but the 
> server never sends it.

I would have expected lwip to time out this PCB eventually (although that can 
need quite some time). How long did you wait?

Simon

Re: [lwip-users] lwIP 1.4.1 stable tcp connection stall

2017-08-09 Thread goldsimon
Hey Bill,

Great to hear 2.0.2 fixes this. I'm a bit lost thinking about a bug fix. We had 
some since then I guess, and I would have to dig through the log myself.
However, the most obvious would be a zero window where the persist timer 
doesn't start or somehow doesn't work correctly.

Are you able to reproduce this now? Can you debug it? Would you need help in 
debugging?

I'm not at my PC very often right now, but I'll see if I find something. Maybe 
you can dump the PCB in question once it stops plus lwip_stats, that could help 
finding the issue.

Simon


[lwip-users] lwIP 1.4.1 stable tcp connection stall

2017-08-09 Thread Bill Auerbach
Hello group,

First, thanks to everyone for the continued development and support of lwIP - 
it has been great to see it so active the past few years.  The purpose of this 
message is to notify lwIP 1.4.1 stable users of a problem, and to see if anyone 
(i.e. developers) knows the bug report that would have resolved this.  The 
problem is, based on network traffic that I have been unable to pinpoint, the 
TCP outbound communication stalls.  I am unsure in the debug logs what lwIP is 
trying to do with each tcp_output, but nothing goes out the wire.  Packets come 
and go from the device as I can still open telnet, ping, and use a UDP protocol 
on the device.  We use NO_SYS=1, a cooperative multi-tasking system, UDP and a 
single TCP connection.  1.4.1 was great for over 5 years.

We install systems on a local subnet (only a PC NIC and lwIP/Lantronix devices 
- anywhere from 2 to 10).  A critical customer has been complaining for months 
about our devices disconnecting.  We report a disconnect error when we stop 
getting repeating status messages back from all devices.  I'd heard of this 
occurring intermittently over the years and we always wrote it off as 
electrical problems since we're usually in a noisy environment.  Until by 
chance, I connected my local subnet switch to our corporate network and I was 
seeing disconnects on all lwIP devices I have connected. I don't know why.  
This customer must have the same traffic on the subnet that I see on the 
corporate network.

The first thing I did was upgrade to 2.0.2.  Other than very few minor changes, 
everything builds and runs.  The TCP send stalls are gone.  I went back to lwIP 
1.4.1 and they came back.  Good, I had a test and a solution.  We decided here 
the best approach is to try to patch 1.4.1 with the fix for this for the 
critical customer and then use a controlled rollout and test plan for lwIP 
2.0.2 which means updating 9 of our lwIP devices.  I spent about half a day 
checking the CHANGELOG and trying a few patches in the bug reports mentioning 
TCP and no change I made resolved the problem.  The one mention for TCP 
stalling was with a new scaling window feature in lwIP 2.x.  I would have 
thought a bug-fix regarding stalled TCP sends would be easy to find in the list 
- this is a big deal in a TCP/IP stack.

My question to developers is, does anyone recall a change that resolved TCP 
send stalling?  And a note to lwIP 1.4.1 stable users - you should update to 
2.x.

Thank you - best regards,
Bill Auerbach

[lwip-users] Using BOOTP with lwIP

2017-08-09 Thread Amit Ashara
Hello All,

I am trying to get BOOTP working with the lwIP 1.4.1 stack. I am able to
configure the device in static IP address mode, send the BOOTP packet,
and have the BOOTP server send the BOOTP response. However, the UDP callback
function does not get invoked. After debugging, the issue seems to be in
ip.c, in the function ip_input:

  if ((netif_is_up(netif)) && (!ip_addr_isany(&(netif->ip_addr)))) {
    /* unicast to this interface address? */
    if (ip_addr_cmp(&current_iphdr_dest, &(netif->ip_addr)) ||
        /* or broadcast on this interface network address? */
        ip_addr_isbroadcast(&current_iphdr_dest, netif)) {
      LWIP_DEBUGF(IP_DEBUG, ("ip_input: packet accepted on interface %c%c\n",
          netif->name[0], netif->name[1]));
      /* break out of for loop */
      break;
    }
#if LWIP_AUTOIP
    /* connections to link-local addresses must persist after changing
       the netif's address (RFC3927 ch. 1.9) */
    if ((netif->autoip != NULL) &&
        ip_addr_cmp(&current_iphdr_dest, &(netif->autoip->llipaddr))) {
      LWIP_DEBUGF(IP_DEBUG, ("ip_input: LLA packet accepted on interface %c%c\n",
          netif->name[0], netif->name[1]));
      /* break out of for loop */
      break;
    }
#endif /* LWIP_AUTOIP */

The check seems to fail at ip_addr_isany, causing the netif pointer to be
NULL, and hence the call into the UDP stack later in the code does not occur.
Since BOOTP requires the client to send the BOOTP request packet with IP
address 0.0.0.0, the server responds to the IP address 0.0.0.0, which causes
the check to fail. Is there some way I can bypass this to get the BOOTP
response to the application layer?

Furthermore, the same logic exists in lwIP 2.0.2, so my assumption is
that it would not be possible with 2.0.2 either. I do understand that DHCP has
superseded BOOTP, but legacy networks still rely on it.

Regards
Amit

[lwip-users] time out on last ack

2017-08-09 Thread Mike Rosing

A couple of weeks ago I posted this question:

"I have a situation where the LwIP is a client and I have compiled with
SO_REUSE = 1, but the reconnect fails with error -8 (in use).  After searching
a while I see that the LwIP is waiting for LAST_ACK, but the server never
sends it.  What can I do to force the internal pcb's to timeout and just allow
a reconnect to proceed instead of thinking the connection is still live?"

The answer is to use tcp_abandon() when pcb->state == 9 (LAST_ACK).  This
removes the pcb from the "time wait" list and allows the connection to
re-establish.  I'm posting this for anyone's future reference in case you have
a similar problem.

Mike
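
The check described above can be sketched as follows. The value 9 corresponds
to LAST_ACK in the ordering of lwIP's enum tcp_state, which is mirrored here
only so the example compiles without lwIP. Everything prefixed demo_ is
hypothetical; in real code the enum comes from the lwIP headers and
demo_abandon() would be lwIP's tcp_abandon(pcb, reset).

```c
#include <assert.h>
#include <stddef.h>

/* Mirror of the ordering of lwIP's enum tcp_state (assumption: copied for
 * illustration; use the real enum from the lwIP headers in actual code). */
enum demo_tcp_state {
  DEMO_CLOSED = 0, DEMO_LISTEN, DEMO_SYN_SENT, DEMO_SYN_RCVD,
  DEMO_ESTABLISHED, DEMO_FIN_WAIT_1, DEMO_FIN_WAIT_2, DEMO_CLOSE_WAIT,
  DEMO_CLOSING, DEMO_LAST_ACK, DEMO_TIME_WAIT
};

/* Stand-in PCB that only tracks the connection state. */
struct demo_pcb {
  enum demo_tcp_state state;
};

/* Stand-in for tcp_abandon(pcb, reset): here it only marks the PCB closed. */
void demo_abandon(struct demo_pcb *pcb, int reset)
{
  (void)reset;
  pcb->state = DEMO_CLOSED;
}

/* Abandon the old PCB if it is stuck waiting for the final ACK.
 * Returns 1 if the PCB was abandoned, 0 if it was left alone. */
int demo_abandon_if_last_ack(struct demo_pcb *pcb, int reset)
{
  assert(pcb != NULL);
  if (pcb->state == DEMO_LAST_ACK) {
    demo_abandon(pcb, reset);
    return 1;
  }
  return 0;
}
```

Comparing against the named state rather than the literal 9 keeps the check
readable and safe if the enum ordering ever changes.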
 


[lwip-users] IPv6 multicast support by Socket API?

2017-08-09 Thread Andrej Butok
Hello lwIP developers,

I am using the latest lwIP 2.0.2.
My application has to use the lwIP socket API and FreeRTOS, and it has to 
join IPv4 and IPv6 multicast groups.
IPv4 multicast is supported very well via the IP_ADD_MEMBERSHIP / 
IP_DROP_MEMBERSHIP socket options.
But it looks like the lwIP sockets do not support IPv6 multicast join/leave.

Did I miss something?
Are you going to add this missing functionality?
Any suggestions on how to work around this issue for now?

Thank you,
Andrey
