Strange packets lost
Hello all,

I have tried many options, but I have run out of ideas.

I have 4 OpenBSD routers (2 on each site). Each router creates a GRE tunnel with its peer. Here is the configuration:

                  | S1R1 --- gre + ospf --- S2R1 |
LAN S1 (OSPF RIP) |                              | LAN S2 (OSPF RIP)
                  | S1R2 --- gre + ospf --- S2R2 |

The routing rules are correct: ssh, http(s), smtp, ntp, ldap and many other protocols work as expected between the two sites.

But I have a problem with my Avaya phones on S2, which need to contact the S1 gatekeeper. Some packets are lost and, even after sniffing every interface, I cannot find where the packets go.

If I capture on the LAN S1 link, I get this:

10:06:24.003479 192.168.238.121.56641 > 192.168.106.38.411: S 2621611805:2621611805(0) win 5840 <mss 1460,sackOK,timestamp 4294948803 0,nop,wscale 4> (DF)
10:06:24.003607 192.168.106.38.411 > 192.168.238.121.56641: S 3090220105:3090220105(0) ack 2621611806 win 5840 <mss 1460,nop,wscale 7> (DF)
10:06:24.018842 192.168.238.121.56641 > 192.168.106.38.411: . ack 1 win 365 (DF)
10:06:24.023582 192.168.238.121.56641 > 192.168.106.38.411: P 1:74(73) ack 1 win 365 (DF)
10:06:24.023710 192.168.106.38.411 > 192.168.238.121.56641: . ack 74 win 46 (DF)
10:06:24.024086 192.168.106.38.411 > 192.168.238.121.56641: . 1:1461(1460) ack 74 win 46 (DF)
10:06:24.024329 192.168.106.38.411 > 192.168.238.121.56641: . 1461:2921(1460) ack 74 win 46 (DF)
10:06:27.017704 192.168.106.38.411 > 192.168.238.121.56641: . 1:1461(1460) ack 74 win 46 (DF)
10:06:33.017772 192.168.106.38.411 > 192.168.238.121.56641: . 1:1461(1460) ack 74 win 46 (DF)
10:06:45.017907 192.168.106.38.411 > 192.168.238.121.56641: . 1:1461(1460) ack 74 win 46 (DF)
10:07:09.018198 192.168.106.38.411 > 192.168.238.121.56641: . 1:1461(1460) ack 74 win 46 (DF)
10:07:57.018732 192.168.106.38.411 > 192.168.238.121.56641: . 1:1461(1460) ack 74 win 46 (DF)
10:08:24.019074 192.168.106.38.411 > 192.168.238.121.56641: FP 2921:4273(1352) ack 74 win 46 (DF)
10:08:24.034803 192.168.238.121.56641 > 192.168.106.38.411: . ack 1 win 365 (DF)

If I capture on the GRE tunnel, I get this:

10:06:23.987975 192.168.238.121.56641 > 192.168.106.38.411: S 2621611805:2621611805(0) win 5840 <mss 1460,sackOK,timestamp 4294948803 0,nop,wscale 4> (DF)
10:06:24.003614 192.168.106.38.411 > 192.168.238.121.56641: S 3090220105:3090220105(0) ack 2621611806 win 5840 <mss 1460,nop,wscale 7> (DF)
10:06:24.018833 192.168.238.121.56641 > 192.168.106.38.411: . ack 1 win 365 (DF)
10:06:24.023573 192.168.238.121.56641 > 192.168.106.38.411: P 1:74(73) ack 1 win 365 (DF)
10:06:24.023716 192.168.106.38.411 > 192.168.238.121.56641: . ack 74 win 46 (DF)
10:08:24.019083 192.168.106.38.411 > 192.168.238.121.56641: FP 2921:4273(1352) ack 74 win 46 (DF)
10:08:24.034793 192.168.238.121.56641 > 192.168.106.38.411: . ack 1 win 365 (DF)

A part of the TCP transaction disappears and I don't know why. Do you have any ideas?

-- 
Best regards,
Loïc BLOT, UNIX systems, security and network expert
http://www.unix-experience.fr
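
Editor's note: one way to see exactly which segments vanish between the two capture points is to compare the data segments by sequence range rather than by timestamp. The following is only a minimal sketch under stated assumptions: the sample lines are taken from the two captures above, the helper names (data_segments, missing_segments) are hypothetical, and the regular expression only handles this tcpdump line layout.

```python
import re

# Sample lines taken from the two captures above (gatekeeper -> phone).
LAN_CAPTURE = """\
10:06:24.023710 192.168.106.38.411 > 192.168.238.121.56641: . ack 74 win 46 (DF)
10:06:24.024086 192.168.106.38.411 > 192.168.238.121.56641: . 1:1461(1460) ack 74 win 46 (DF)
10:06:24.024329 192.168.106.38.411 > 192.168.238.121.56641: . 1461:2921(1460) ack 74 win 46 (DF)
10:06:27.017704 192.168.106.38.411 > 192.168.238.121.56641: . 1:1461(1460) ack 74 win 46 (DF)
10:08:24.019074 192.168.106.38.411 > 192.168.238.121.56641: FP 2921:4273(1352) ack 74 win 46 (DF)
"""
GRE_CAPTURE = """\
10:06:24.023716 192.168.106.38.411 > 192.168.238.121.56641: . ack 74 win 46 (DF)
10:08:24.019083 192.168.106.38.411 > 192.168.238.121.56641: FP 2921:4273(1352) ack 74 win 46 (DF)
"""

# timestamp, source, optional ">" separator, destination, flags, first:last(length)
SEGMENT = re.compile(r"^\S+ (\S+) (?:> )?(\S+): \S+ (\d+):(\d+)\((\d+)\)")


def data_segments(capture):
    """Map (src, dst, first_seq, last_seq) to payload length for data-carrying lines."""
    segments = {}
    for line in capture.splitlines():
        match = SEGMENT.match(line)
        if not match:
            continue  # pure ACKs carry no sequence range
        src, dst, first, last, length = match.groups()
        if int(length) > 0:  # skip SYN/FIN lines without payload
            segments[(src, dst, int(first), int(last))] = int(length)
    return segments


def missing_segments(before, after):
    """Segments seen at the first capture point but never at the second."""
    seen_after = data_segments(after)
    return sorted(
        (key, length)
        for key, length in data_segments(before).items()
        if key not in seen_after
    )


for (src, dst, first, last), length in missing_segments(LAN_CAPTURE, GRE_CAPTURE):
    print(f"{src} -> {dst} seq {first}:{last} ({length} bytes) not seen on the tunnel")
```

With the sample lines, the only segments reported are the two 1460-byte ones. The 1352-byte segment appears in both captures.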
Re: Strange packets lost
On 25 September 2013 11:03, Loïc BLOT loic.b...@unix-experience.fr wrote:
> A part of the TCP transaction disappears and I don't know why. Do you have any ideas?

This looks like a classic MTU problem. The GRE tunnel lowers the MTU, and your TCP traffic uses an MSS of 1460 bytes and sets DF. Therefore it gets dropped once the router figures out it can't send that much data over the GRE link.

Possible solutions are using path MTU discovery on the clients, or making sure their MTU is less than 1500, or doing forced fragmentation and defragmentation on the router, or configuring the application to use a smaller MSS value (setsockopt TCP_MAXSEG).
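
Editor's note: the arithmetic behind this diagnosis, and the setsockopt option mentioned above, can be sketched as follows. This is only an illustration under stated assumptions: a 1500-byte path MTU between the routers, a basic 4-byte GRE header with no optional fields, and IPv4 and TCP headers without options. The function names are hypothetical, and the actual tunnel MTU should be checked on the gre interface itself.

```python
import socket

PATH_MTU = 1500          # assumed MTU of the physical path between the routers
OUTER_IPV4_HEADER = 20   # IPv4 header added by the tunnel, no options
GRE_HEADER = 4           # basic GRE header, no key/checksum/sequence fields
INNER_IPV4_HEADER = 20   # IPv4 header of the tunnelled packet, no options
TCP_HEADER = 20          # TCP header without options


def tunnel_mtu(path_mtu=PATH_MTU):
    """Largest inner IP packet that fits once the GRE encapsulation is added."""
    return path_mtu - OUTER_IPV4_HEADER - GRE_HEADER


def max_mss(mtu):
    """Largest TCP payload that fits in one unfragmented packet of this MTU."""
    return mtu - INNER_IPV4_HEADER - TCP_HEADER


def socket_with_mss(mss):
    """Create a TCP socket whose segment size is capped with TCP_MAXSEG.

    The option has to be set before connect() or listen() so that it
    influences the MSS announced during the handshake.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, mss)
    return sock


gre_mtu = tunnel_mtu()
print(gre_mtu, max_mss(gre_mtu))  # prints "1476 1436"
```

Under these assumptions, a 1460-byte segment needs a 1500-byte IP packet, which exceeds the 1476-byte tunnel MTU and cannot be fragmented because DF is set. The 1352-byte segment needs only 1392 bytes, which is consistent with it being the one data segment that shows up on the tunnel.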
Re: Strange packets lost
Hello,

You are totally right! I hadn't thought about layer 2 problems.

But the problem is only partially resolved: I see strange things with DF. Port 80 traffic has DF cleared (no-df), but port 411 traffic (Avaya cfg) does not. Here is a fragment of my pf config:

set skip on lo
set block-policy drop
set limit { states 10, src-nodes 8, table-entries 60 }
match in scrub (no-df)
block in log all
pass out all
...
pass in quick inet from toip_area_v4 to toip_area_v4 scrub (no-df) no state

Is something wrong?

-- 
Best regards,
Loïc BLOT, UNIX systems, security and network expert
http://www.unix-experience.fr

On Wednesday 25 September 2013 at 14:23 +0200, Mike Belopuhov wrote:
> This looks like a classic MTU problem. [...]