Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()
On Tuesday, 11 September 2018, 12:33:34, Steffen Klassert wrote:
> On Mon, Sep 10, 2018 at 10:18:47AM +0200, Kristian Evensen wrote:
> > Hi,
> >
> > Thanks everyone for all the effort in debugging this issue.
> >
> > On Mon, Sep 10, 2018 at 8:39 AM Steffen Klassert wrote:
> > > The easy fix that could be backported to stable would be
> > > to check skb->dst for NULL and drop the packet in that case.
> >
> > Thought I should just chime in and say that we deployed this
> > work-around when we started observing the error back in June. Since
> > then we have not seen any crashes. Also, we have instrumented some of
> > our kernels to count the number of times the error is hit (overall +
> > consecutive). Compared to the overall number of packets, the error
> > happens very rarely. With our workloads, we on average see the error
> > once every couple of days.
>
> Thanks for letting us know!
>
> I plan to fix this in the ipsec tree with:
>
> Subject: [PATCH RFC] xfrm: Fix NULL pointer dereference when skb_dst_force
> clears the dst_entry.
>
> Since commit 222d7dbd258d ("net: prevent dst uses after free")
> skb_dst_force() might clear the dst_entry attached to the skb.
> The xfrm code doesn't expect this to happen, so we crash with
> a NULL pointer dereference in this case. Fix it by checking
> skb_dst(skb) for NULL after skb_dst_force() and drop the packet
> in case the dst_entry was cleared.
>
> Fixes: 222d7dbd258d ("net: prevent dst uses after free")
> Reported-by: Tobias Hommel
> Reported-by: Kristian Evensen
> Reported-by: Wolfgang Walter
> Signed-off-by: Steffen Klassert
> ---
>  net/xfrm/xfrm_output.c | 4
>  net/xfrm/xfrm_policy.c | 4
>  2 files changed, 8 insertions(+)
>
> diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
> index 89b178a78dc7..36d15a38ce5e 100644
> --- a/net/xfrm/xfrm_output.c
> +++ b/net/xfrm/xfrm_output.c
> @@ -101,6 +101,10 @@ static int xfrm_output_one(struct sk_buff *skb, int err)
>  		spin_unlock_bh(&x->lock);
>
>  		skb_dst_force(skb);
> +		if (!skb_dst(skb)) {
> +			XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
> +			goto error_nolock;
> +		}
>
>  		if (xfrm_offload(skb)) {
>  			x->type_offload->encap(x, skb);
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 7c5e8978aeaa..626e0f4d1749 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -2548,6 +2548,10 @@ int __xfrm_route_forward(struct sk_buff *skb, unsigned short family)
>  	}
>
>  	skb_dst_force(skb);
> +	if (!skb_dst(skb)) {
> +		XFRM_INC_STATS(net, LINUX_MIB_XFRMFWDHDRERROR);
> +		return 0;
> +	}
>
>  	dst = xfrm_lookup(net, skb_dst(skb), &fl, NULL, XFRM_LOOKUP_QUEUE);
>  	if (IS_ERR(dst)) {

This patch fixes the problem here. XfrmFwdHdrError climbs to around 80 right at the beginning and then stays there. Probably this happens while some routes are being changed or set up at that time.

Regards and thanks,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
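The failure mode the patch guards against can be modelled in user space. The sketch below is hypothetical: `struct mock_dst`, `struct mock_skb`, `mock_dst_force()` and `mock_forward()` are illustrative stand-ins, not kernel code. `mock_dst_force()` follows the logic of `skb_dst_force()` and `dst_hold_safe()` as quoted later in this thread (take a reference only if the count is still non-zero, otherwise clear the dst), and `mock_forward()` adds the NULL check from the patch, counting an error and dropping the packet instead of handing NULL to the lookup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's structures. */
struct mock_dst {
    atomic_int refcnt;
};

struct mock_skb {
    struct mock_dst *dst;
    bool dst_noref;            /* dst attached without holding a reference */
};

/* Stands in for the LINUX_MIB_XFRMFWDHDRERROR counter. */
static unsigned long fwd_hdr_errors;

/* Increment *v unless it is zero, like atomic_inc_not_zero(). */
static bool inc_not_zero(atomic_int *v)
{
    int old = atomic_load(v);

    while (old != 0) {
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
        /* on failure, old now holds the current value; retry */
    }
    return false;
}

/* Mirrors skb_dst_force(): take a real reference or drop the dst. */
static void mock_dst_force(struct mock_skb *skb)
{
    if (skb->dst_noref && skb->dst != NULL) {
        if (!inc_not_zero(&skb->dst->refcnt))
            skb->dst = NULL;   /* entry already on its way out */
        skb->dst_noref = false;
    }
}

/* Returns 1 if the packet may be forwarded, 0 if it was dropped. */
static int mock_forward(struct mock_skb *skb)
{
    mock_dst_force(skb);
    if (skb->dst == NULL) {    /* the check added by the patch */
        fwd_hdr_errors++;
        return 0;
    }
    /* ... the real code would call xfrm_lookup() with skb_dst(skb) here ... */
    return 1;
}
```

With a live entry (refcount 1) the packet is forwarded and the count rises to 2; with an entry whose count has already reached 0 the dst is cleared, the packet is dropped and the error counter is incremented, which matches the XfrmFwdHdrError behaviour reported above.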
Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()
On Monday, 10 September 2018, 10:18:47, Kristian Evensen wrote:
> Hi,
>
> Thanks everyone for all the effort in debugging this issue.
>
> On Mon, Sep 10, 2018 at 8:39 AM Steffen Klassert wrote:
> > The easy fix that could be backported to stable would be
> > to check skb->dst for NULL and drop the packet in that case.
>
> Thought I should just chime in and say that we deployed this
> work-around when we started observing the error back in June. Since
> then we have not seen any crashes. Also, we have instrumented some of
> our kernels to count the number of times the error is hit (overall +
> consecutive). Compared to the overall number of packets, the error
> happens very rarely. With our workloads, we on average see the error
> once every couple of days.

Would you mind sending us your patch (with the accounting) so that we can check how often that happens here?

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
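Kristian's "overall + consecutive" accounting can be read as two counters: a running total of hits and the length of the current streak of back-to-back hits. The following is only a guess at what such accounting might look like. `struct dst_null_stats` and `dst_null_account()` are hypothetical names, not taken from his patch, and in a real kernel patch the counters would need to be per-CPU or atomic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical accounting for the "dst cleared by skb_dst_force" case. */
struct dst_null_stats {
    unsigned long total;       /* every packet that hit the condition */
    unsigned long streak;      /* current run of consecutive hits */
    unsigned long max_streak;  /* longest run observed so far */
};

/* Call once per packet; hit is true when skb_dst() was NULL after forcing. */
static void dst_null_account(struct dst_null_stats *s, bool hit)
{
    if (!hit) {
        s->streak = 0;         /* a good packet ends the current run */
        return;
    }
    s->total++;
    s->streak++;
    if (s->streak > s->max_streak)
        s->max_streak = s->streak;
}
```

Tracking the longest streak separately from the total shows whether the hits are isolated or arrive in bursts, for example while routes are being changed.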
Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()
Hello Steffen,

in one of your emails to Thomas you wrote:

> xfrm_lookup+0x2a is at the very beginning of xfrm_lookup(), here we
> find:
>
> u16 family = dst_orig->ops->family;
>
> ops has an offset of 32 bytes (20 hex) in dst_orig, so looks like
> dst_orig is NULL.
>
> In the forwarding case, we get dst_orig from the skb and dst_orig
> can't be NULL here unless the skb itself is already fishy.

Is this really true? Consider the case where xfrm_lookup() is called from __xfrm_route_forward():

int __xfrm_route_forward(struct sk_buff *skb, unsigned short family)
{
	struct net *net = dev_net(skb->dev);
	struct flowi fl;
	struct dst_entry *dst;
	int res = 1;

	if (xfrm_decode_session(skb, &fl, family) < 0) {
		XFRM_INC_STATS(net, LINUX_MIB_XFRMFWDHDRERROR);
		return 0;
	}

	skb_dst_force(skb);

	dst = xfrm_lookup(net, skb_dst(skb), &fl, NULL, XFRM_LOOKUP_QUEUE);
	if (IS_ERR(dst)) {
		res = 0;
		dst = NULL;
	}
	skb_dst_set(skb, dst);
	return res;
}

Couldn't skb_dst_force(skb) actually set dst to NULL if it cannot safely take a reference on it? If it is absolutely certain that skb_dst_force() can never set dst to NULL, I wonder why it is called at all.

Here is skb_dst_force():

static inline void skb_dst_force(struct sk_buff *skb)
{
	if (skb_dst_is_noref(skb)) {
		struct dst_entry *dst = skb_dst(skb);

		WARN_ON(!rcu_read_lock_held());
		if (!dst_hold_safe(dst))
			dst = NULL;

		skb->_skb_refdst = (unsigned long)dst;
	}
}

and dst_hold_safe() is:

static inline bool dst_hold_safe(struct dst_entry *dst)
{
	return atomic_inc_not_zero(&dst->__refcnt);
}

On Friday, 7 September 2018, 22:22:39, Wolfgang Walter wrote:
> On Friday, 31 August 2018, 08:50:24, Steffen Klassert wrote:
> > On Thu, Aug 30, 2018 at 08:53:50PM +0200, Wolfgang Walter wrote:
> > > Hello,
> > >
> > > kernels > 4.12 do not work on one of our main routers. They crash as
> > > soon as ipsec-tunnels are configured and ipsec-traffic actually flows.
> >
> > Can you please send the backtrace of this crash?
>
> I booted b838d5e1c5b6e57b10ec8af2268824041e3ea911 several times but I
> could not record the complete trace. I think I have to log to the serial
> console but I can't do that before next week.
>
> What I could record is:
>
> There is always
> ...
> the call trace.
>
> This is the part I could see:
>
> irq_exit+0x71/0x80
> do_IRQ+0x4d/0xd0
> common_interrup+07a/0x7a
>
> RIP: 010:cpuidle_enter_state+0x11d/0x200
> RSP: 0018:c9000321bee0 EFLAGS: 0282 ORIG_RAX: ffc4
> RAX: 88085efde450 RBX: 0004 RCX: 0003c9e63c13
> RDX: 0003c9e63c13 RSI: ffb03103fe35ac43 RDI:
> RBP: e87cf600 R08: 000c R09: 0004
> R10: 0400 R11: 0003c99e56fc R12: 0003c9e63c13
> R13: 0003c9da9567 R14: 0004 R15: 822763e0
> do_idle+0xd3/0x160
> cpu_startup_entry+0x14/0x20
> secondary_startup_64+0xa5/0xb0
> Code: 00 0f b7 83 c0 00 00 00 80 7c 02 08 01 0f 86 d3 02 00 00 41
> 8b 8c 24 3c 10 00 00 48 8b 6b 58 85 c9 0f 84 2f 01 00 00 48 83 e5 fe 45
> 60 02 0f 84 4e 01 00 00 f6 43 38 01 74 0d 80 00 bd ab 00 00
> RIP: ip_forward+0xd4/0x470 RSP: 88085efc3cb0
> CR2: 0060
> ---[ end trace 7205b53c25b7b35a ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: disabled
> Rebooting in 60 seconds..
>
> I got an email from Tobias Hommel and I think it is the same problem.
>
> It is very clear that it is the difference from
>
> ipv4: call dst_hold_safe() properly
>
> to
>
> ipv4: mark DST_NOGC and remove the operation of dst_free()
>
> which triggers this bug.
>
> Regards,

Regards
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
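Steffen's inference (quoted above) rests on simple address arithmetic. Loading `dst_orig->ops` reads from `dst_orig + offsetof(struct dst_entry, ops)`, so if `dst_orig` is NULL the faulting address equals the field offset. The sketch below assumes, for illustration only, that the structure begins with an `rcu_head` (two pointers) followed by `child`, `dev` and `ops`. That layout reproduces the 32-byte (0x20) offset he mentions on a 64-bit build. `struct mock_dst_entry` is a simplified stand-in, not the kernel definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified mock of the leading fields; not the kernel definition. */
struct mock_rcu_head {
    void *next;
    void (*func)(struct mock_rcu_head *head);
};

struct mock_dst_ops {
    unsigned short family;
};

struct mock_dst_entry {
    struct mock_rcu_head rcu_head; /* two pointers: 16 bytes on x86-64 */
    struct mock_dst_entry *child;
    void *dev;
    struct mock_dst_ops *ops;      /* offset 32 (0x20) on x86-64 */
};

/*
 * Address the CPU would touch when loading base->ops. Computed with
 * integer arithmetic, so nothing is actually dereferenced.
 */
static uintptr_t ops_load_address(uintptr_t base)
{
    return base + offsetof(struct mock_dst_entry, ops);
}
```

On x86-64 `ops_load_address(0)` is 0x20, an address in the never-mapped first page. That is why a fault at such a small address suggests a NULL base pointer rather than a corrupted one.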
Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()
On Friday, 31 August 2018, 08:50:24, Steffen Klassert wrote:
> On Thu, Aug 30, 2018 at 08:53:50PM +0200, Wolfgang Walter wrote:
> > Hello,
> >
> > kernels > 4.12 do not work on one of our main routers. They crash as soon
> > as ipsec-tunnels are configured and ipsec-traffic actually flows.
>
> Can you please send the backtrace of this crash?

I booted b838d5e1c5b6e57b10ec8af2268824041e3ea911 several times but I could not record the complete trace. I think I have to log to the serial console but I can't do that before next week.

What I could record is:

There is always
...
the call trace.

This is the part I could see:

irq_exit+0x71/0x80
do_IRQ+0x4d/0xd0
common_interrup+07a/0x7a

RIP: 010:cpuidle_enter_state+0x11d/0x200
RSP: 0018:c9000321bee0 EFLAGS: 0282 ORIG_RAX: ffc4
RAX: 88085efde450 RBX: 0004 RCX: 0003c9e63c13
RDX: 0003c9e63c13 RSI: ffb03103fe35ac43 RDI:
RBP: e87cf600 R08: 000c R09: 0004
R10: 0400 R11: 0003c99e56fc R12: 0003c9e63c13
R13: 0003c9da9567 R14: 0004 R15: 822763e0
do_idle+0xd3/0x160
cpu_startup_entry+0x14/0x20
secondary_startup_64+0xa5/0xb0
Code: 00 0f b7 83 c0 00 00 00 80 7c 02 08 01 0f 86 d3 02 00 00 41
8b 8c 24 3c 10 00 00 48 8b 6b 58 85 c9 0f 84 2f 01 00 00 48 83 e5 fe 45
60 02 0f 84 4e 01 00 00 f6 43 38 01 74 0d 80 00 bd ab 00 00
RIP: ip_forward+0xd4/0x470 RSP: 88085efc3cb0
CR2: 0060
---[ end trace 7205b53c25b7b35a ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
Rebooting in 60 seconds..

I got an email from Tobias Hommel and I think it is the same problem.

It is very clear that it is the difference from

ipv4: call dst_hold_safe() properly

to

ipv4: mark DST_NOGC and remove the operation of dst_free()

which triggers this bug.

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()
Hello,

I didn't respond earlier as I've been on vacation.

On Friday, 31 August 2018, 08:50:24, Steffen Klassert wrote:
> On Thu, Aug 30, 2018 at 08:53:50PM +0200, Wolfgang Walter wrote:
> > Hello,
> >
> > kernels > 4.12 do not work on one of our main routers. They crash as soon
> > as ipsec-tunnels are configured and ipsec-traffic actually flows.
>
> Can you please send the backtrace of this crash?

I'll try today. The oops quickly scrolls away because other problems arising from it pop up. The machine crashes and nothing gets logged. I will try to take a photo or to log to the serial console.

At the moment I can only see that there is xfrm_ stuff in the call trace, such as xfrm_lookup and xfrm_route_, and that it happens while routing a packet.

With later kernels (4.18.5) the machine seems to crash without a call trace on the console.

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()
Hello,

kernels > 4.12 do not work on one of our main routers. They crash as soon as ipsec-tunnels are configured and ipsec-traffic actually flows. Just configuring ipsec (that is, starting strongswan) does not trigger the oops.

I finally found time to bisect this. It bisected down to

b838d5e1c5b6e57b10ec8af2268824041e3ea911 ipv4: mark DST_NOGC and remove the operation of dst_free()

We have other machines which run just fine with the very same kernels doing ipsec. They differ insofar as they have far fewer cores, do not use the ixgbe driver, do not have 10G and terminate only a few tunnels instead of hundreds.

I already tested distribution kernels > 4.12 from Debian; they crash as well.

All kernels I built during the bisection run fine if I don't use ipsec. The bad ones all oopsed/crashed exactly as vanilla 4.14 described above.

Here is the bisect log:

# bad: [bebc6082da0a9f5d47a1ea2edc099bf671058bd4] Linux 4.14
# good: [69973b830859bc6529a7a0468ba0d80ee5117826] Linux 4.9
git bisect start 'v4.14' 'v4.9'
# good: [d82dd0e34d0347be201fd274dc84cd645dccc064] raid1: prefer disk without bad blocks
git bisect good d82dd0e34d0347be201fd274dc84cd645dccc064
# bad: [9967468c0a109644e4a1f5b39b39bf86fe7507a7] Merge branch 'akpm' (patches from Andrew)
git bisect bad 9967468c0a109644e4a1f5b39b39bf86fe7507a7
# bad: [17d9aa66b08de445645bd0688fc1635bed77a57b] Merge tag 'iwlwifi-next-for-kalle-2017-06-30' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next
git bisect bad 17d9aa66b08de445645bd0688fc1635bed77a57b
# good: [de4d195308ad589626571dbe5789cebf9695a204] Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good de4d195308ad589626571dbe5789cebf9695a204
# good: [9376906c17fa975bf6a7ea9dd124be697bcda289] Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 9376906c17fa975bf6a7ea9dd124be697bcda289
# good: [40e86a3619a1e84ad73c716c943f65fc38eb1e28] iwlwifi: mvm: use scnprintf() instead of snprintf()
git bisect good 40e86a3619a1e84ad73c716c943f65fc38eb1e28
# bad: [c66f2091c9248ddf42504c74cd327ae8619b04a4] net/mlx5e: Prevent PFC call for non ethernet ports
git bisect bad c66f2091c9248ddf42504c74cd327ae8619b04a4
# good: [a090bd4ff8387c409732a8e059fbf264ea0bdd56] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect good a090bd4ff8387c409732a8e059fbf264ea0bdd56
# good: [1947030645b6012aeee98da764d6dd47071a6aad] Merge branch 'dsa-prefix-Global-macros'
git bisect good 1947030645b6012aeee98da764d6dd47071a6aad
# good: [69137ea60c9dad58773a1918de6c1b00b088520c] pktgen: Specify num packets per thread
git bisect good 69137ea60c9dad58773a1918de6c1b00b088520c
# good: [d24406c85d123df773bc4df88ad5da2233896919] udp: call dst_hold_safe() in udp_sk_rx_set_dst()
git bisect good d24406c85d123df773bc4df88ad5da2233896919
# bad: [5b7c9a8ff828287af5aebe93e707271bf1a82cc3] net: remove dst gc related code
git bisect bad 5b7c9a8ff828287af5aebe93e707271bf1a82cc3
# bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()
git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
# good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a new function dst_dev_put()
git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
# good: [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put() properly
git bisect good 95c47f9cf5e028d1ae77dc6c767c1edc8a18025b
# good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call dst_hold_safe() properly
git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
# first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()

In my first email I wrote >= 4.12, but I think 4.12 works. I bisected between 4.9 and 4.14 because we actually run 4.9 on the machine with the problem and 4.14 on most other routers.

I also tested 4.18.5 and it still shows this bug.

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
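`git bisect` is essentially a binary search over the commits between the known-good and known-bad releases. That is why sixteen test kernels (one per `git bisect good`/`bad` line in the log above) were enough to isolate a single commit between v4.9 and v4.14. The sketch below models only that idea on a linear history in which every commit after the culprit is also bad. Real `git bisect` also has to handle merge commits, which this sketch ignores. `first_bad()` and `bad_from()` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Predicate: returns true if the kernel built at commit index i is bad. */
typedef bool (*is_bad_fn)(size_t i, void *ctx);

/*
 * Find the first bad commit in a linear history [0, n), assuming commit 0
 * is known good, commit n - 1 is known bad and badness is monotone.
 * *steps receives the number of test builds performed.
 */
static size_t first_bad(size_t n, is_bad_fn is_bad, void *ctx, unsigned *steps)
{
    size_t good = 0, bad = n - 1;  /* invariant: good < culprit <= bad */

    assert(n >= 2);
    *steps = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        (*steps)++;                /* one kernel build and test */
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Example predicate: every commit at or after *ctx is bad. */
static bool bad_from(size_t i, void *ctx)
{
    return i >= *(const size_t *)ctx;
}
```

Each step halves the remaining range, so a range of N commits needs about log2(N) builds. That matters when every build has to be tested on a production router.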
kernels >= v4.12 oops/crash with ipsec-traffic: partly bisected
Hello,

kernels >= 4.12 do not work on one of our main routers. They crash as soon as ipsec-tunnels are configured and ipsec-traffic actually flows. Just configuring ipsec (that is, starting strongswan) does not trigger the oops.

I finally found time to bisect this. Although I have not finished yet, I have already narrowed it down to the following commits:

good: d24406c85d123df773bc4df88ad5da2233896919 udp: call dst_hold_safe() in udp_sk_rx_set_dst()
bad: 5b7c9a8ff828287af5aebe93e707271bf1a82cc3 net: remove dst gc related code

The commits in between are almost all changes that remove dst gc.

We have other machines which run just fine with the very same kernels doing ipsec. They differ insofar as they have far fewer cores, do not use the ixgbe driver, do not have 10G and terminate only a few tunnels instead of hundreds.

I already tested distribution kernels > 4.12 from Debian; they crash as well.

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Re: regression kernel 4.4: stops routing packets with a GRE-payload
On Wednesday, 20 January 2016, 17:58:52, Nicolas Dichtel wrote:
> On 20/01/2016 15:00, Wolfgang Walter wrote:
> > Hello,
> >
> > we tried 4.4 on our routers. We found one problem: 4.4 stops routing GRE
> > packets (ipv4 in GRE/ipv4) here. 4.4.15 works fine.
>
> 4.4.15 does not exist. Is it 4.1.15?

Yes, I mean 4.1.15.

--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Re: [PATCH net-next] ipv6: gro: support sit protocol
On Wednesday, 4 November 2015, 04:40:51, Eric Dumazet wrote:
> On Wed, 2015-11-04 at 13:19 +0100, Wolfgang Walter wrote:
> > Today I found a problem: on a router forwarding GRE-packets (ipv4) (it is
> > not the endpoint) the interface (intel igb) stops sending packets after
> > some time. I think this happens when an ISATAP packet is inside the
> > GRE-packet.
> >
> > gre packets arrive on eth0
> > eth1 stops sending (receiving still works)
> > ethtool -r eth1
> > eth1 works again for some time
> >
> > Switching GRO off on eth0 "fixes" the problem.
> >
> > I didn't test vanilla 4.1.12 yet, though. Until today 4.1.11 has been
> > running on the router. What I tested was your patch
> >
> > "gre_gso_segment() chokes if SIT frames were aggregated by GRO engine."
> >
> > but it did not solve the problem.
> >
> > So I would not recommend backporting it to longterm 4.1.
> >
> > My plans are:
> >
> > * test vanilla 4.1.12
> > * test 4.3
> >
> > I want to test 4.3 on another router first, though.
>
> If the NIC stops sending packets after some time, it might be an igb
> issue.

Yes, maybe igb has a problem sending a GRO packet if it is ISATAP in GRE.

igb has no problem sending GRO packets which are pure ISATAP, or which are ipv4 (tcp/udp) in GRE, with 4.1.12 + these patches.

And it had no problem with 4.1.11 with ISATAP in GRE.

Disabling GSO for the interface does help.

I'll test pure 4.1.12 soon.

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] ipv6: gro: support sit protocol
On Tuesday, 3 November 2015, 05:07:33, Eric Dumazet wrote:
> On Tue, 2015-11-03 at 13:57 +0100, Wolfgang Walter wrote:
> > On Monday, 19 October 2015, 20:40:17, Eric Dumazet wrote:
> > > From: Eric Dumazet <eduma...@google.com>
> > >
> > > Tom Herbert added SIT support to GRO with commit
> > > 19424e052fb4 ("sit: Add gro callbacks to sit_offload"),
> > > later reverted by Herbert Xu.
> > >
> > > The problem came because Tom patch was building GRO
> > > packets without proper meta data : If packets were locally
> > > delivered, we would not care.
> > >
> > > But if packets needed to be forwarded, GSO engine was not
> > > able to segment individual segments.
> > >
> > > With the following patch, we correctly set skb->encapsulation
> > > and inner network header. We also update gso_type.
> >
> > I'm running 4.1.11 / 4.1.12 with this patch on top now since over a week.
> > ISATAP works fine.
>
> Perfect ! thanks a lot for testing !

Today I found a problem: on a router forwarding GRE packets (ipv4) (it is not the endpoint) the interface (Intel igb) stops sending packets after some time. I think this happens when an ISATAP packet is inside the GRE packet.

gre packets arrive on eth0
eth1 stops sending (receiving still works)
ethtool -r eth1
eth1 works again for some time

Switching GRO off on eth0 "fixes" the problem.

I haven't tested vanilla 4.1.12 yet, though. Until today 4.1.11 had been running on the router. What I tested was your patch

"gre_gso_segment() chokes if SIT frames were aggregated by GRO engine."

but it did not solve the problem.

So I would not recommend backporting it to longterm 4.1.

My plans are:

* test vanilla 4.1.12
* test 4.3

I want to test 4.3 on another router first, though.

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Re: [PATCH net-next] ipv6: gro: support sit protocol
On Wednesday, 4 November 2015, 07:13:07, Eric Dumazet wrote:
> On Wed, 2015-11-04 at 15:09 +0100, Wolfgang Walter wrote:
> > Yes, maybe igb has a problem sending a gro-packet if it is an isatap in
> > gre.
>
> We might detect this condition properly from igb ndo_features_check
> method.
>
> It currently uses plain passthru_features_check()
>
> > igb has no problem sending gro-packets which are pure isatap or which are
> > ipv4 (tcp/udp) in gre with 4.1.12 + these patches.
> >
> > And it had no problem with 4.1.11 with isatap in gre.
> >
> > Disabling gso for the interface does help.
>
> My patch was aimed for 4.4, not sure about backports to old kernels...

I know. I cannot test 4.4 (or net-next) on that router, though, as I don't have easy physical access to it if it crashes or loses network connectivity. For such tests I must send someone on site. As 4.4 will be the next longterm kernel, I will definitely do that for 4.4-rc2 or 4.4-rc3.

I think your patch is correct for 4.1 in the sense that ISATAP is handled correctly. Only SIT in GRE triggers this, and if it is indeed igb I will probably see it in 4.4 ;-), too.

I have now tested an unmodified 4.1.12 and it shows no problems.

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
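Eric's suggestion is that igb could refuse offloads for the problematic packet shape in its `ndo_features_check` hook, instead of passing all features through unchanged as `passthru_features_check()` does. The sketch below is a user-space model only. The `MOCK_*` bit values, `struct mock_skb` and `mock_igb_features_check()` are hypothetical, and the thread does not establish that masking GSO features is the right fix for igb, nor that SIT-in-GRE skbs carry both GSO bits. It only shows the shape of such a check: when an encapsulated skb carries both the GRE and the SIT GSO bits, the hardware segmentation features are cleared so that segmentation is left to software GSO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t mock_features_t;

/* Hypothetical bit assignments, for illustration only. */
#define MOCK_F_SG        (1ULL << 0)
#define MOCK_F_TSO       (1ULL << 1)
#define MOCK_F_TSO6      (1ULL << 2)
#define MOCK_F_GSO_MASK  (MOCK_F_TSO | MOCK_F_TSO6)

#define MOCK_GSO_TCPV6   (1U << 0)
#define MOCK_GSO_GRE     (1U << 1)
#define MOCK_GSO_SIT     (1U << 2)

struct mock_skb {
    bool encapsulation;
    unsigned int gso_type;
};

/* Model of passthru_features_check(): accept every feature. */
static mock_features_t mock_passthru(const struct mock_skb *skb,
                                     mock_features_t f)
{
    (void)skb;
    return f;
}

/* Hypothetical stricter check: no hardware segmentation for SIT-in-GRE. */
static mock_features_t mock_igb_features_check(const struct mock_skb *skb,
                                               mock_features_t f)
{
    const unsigned int sit_in_gre = MOCK_GSO_GRE | MOCK_GSO_SIT;

    if (skb->encapsulation && (skb->gso_type & sit_in_gre) == sit_in_gre)
        f &= ~MOCK_F_GSO_MASK;   /* leave segmentation to software GSO */
    return f;
}
```

Only the nested case loses its segmentation features. Plain SIT or plain GRE traffic keeps every feature, which fits the report that igb handles pure ISATAP and ipv4-in-GRE GRO packets fine.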
Re: [PATCH net-next] ipv6: gro: support sit protocol
On Monday, 19 October 2015, 20:40:17, Eric Dumazet wrote:
> From: Eric Dumazet <eduma...@google.com>
>
> Tom Herbert added SIT support to GRO with commit
> 19424e052fb4 ("sit: Add gro callbacks to sit_offload"),
> later reverted by Herbert Xu.
>
> The problem came because Tom patch was building GRO
> packets without proper meta data : If packets were locally
> delivered, we would not care.
>
> But if packets needed to be forwarded, GSO engine was not
> able to segment individual segments.
>
> With the following patch, we correctly set skb->encapsulation
> and inner network header. We also update gso_type.

I have been running 4.1.11 / 4.1.12 with this patch on top for over a week now. ISATAP works fine.

> Tested:
>
> Server :
> netserver
> modprobe dummy
> ifconfig dummy0 8.0.0.1 netmask 255.255.255.0 up
> arp -s 8.0.0.100 4e:32:51:04:47:e5
> iptables -I INPUT -s 10.246.7.151 -j TEE --gateway 8.0.0.100
> ifconfig sixtofour0
> sixtofour0 Link encap:IPv6-in-IPv4
>           inet6 addr: 2002:af6:798::1/128 Scope:Global
>           inet6 addr: 2002:af6:798::/128 Scope:Global
>           UP RUNNING NOARP  MTU:1480  Metric:1
>           RX packets:411169 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:409414 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:20319631739 (20.3 GB)  TX bytes:29529556 (29.5 MB)
>
> Client :
> netperf -H 2002:af6:798::1 -l 1000 &
>
> Checked on server traffic copied on dummy0 and verify segments were
> properly rebuilt, with proper IP headers, TCP checksums...
>
> tcpdump on eth0 shows proper GRO aggregation takes place.
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> ---
>  net/ipv6/ip6_offload.c | 12
>  1 file changed, 12 insertions(+)
>
> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> index 08b62047c67f..eeca943f12dc 100644
> --- a/net/ipv6/ip6_offload.c
> +++ b/net/ipv6/ip6_offload.c
> @@ -264,6 +264,9 @@ static int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
>  	struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + nhoff);
>  	int err = -ENOSYS;
>
> +	if (skb->encapsulation)
> +		skb_set_inner_network_header(skb, nhoff);
> +
>  	iph->payload_len = htons(skb->len - nhoff - sizeof(*iph));
>
>  	rcu_read_lock();
> @@ -280,6 +283,13 @@ out_unlock:
>  	return err;
>  }
>
> +static int sit_gro_complete(struct sk_buff *skb, int nhoff)
> +{
> +	skb->encapsulation = 1;
> +	skb_shinfo(skb)->gso_type |= SKB_GSO_SIT;
> +	return ipv6_gro_complete(skb, nhoff);
> +}
> +
>  static struct packet_offload ipv6_packet_offload __read_mostly = {
>  	.type = cpu_to_be16(ETH_P_IPV6),
>  	.callbacks = {
> @@ -292,6 +302,8 @@ static struct packet_offload ipv6_packet_offload __read_mostly = {
>  static const struct net_offload sit_offload = {
>  	.callbacks = {
>  		.gso_segment	= ipv6_gso_segment,
> +		.gro_receive	= ipv6_gro_receive,
> +		.gro_complete	= sit_gro_complete,
>  	},
>  };

Thanks,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
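The two hunks of this patch work together. `sit_gro_complete()` marks the merged skb as encapsulated and ORs `SKB_GSO_SIT` into its GSO type. The new check in `ipv6_gro_complete()` then records the inner IPv6 header offset, which Eric notes some NIC drivers depend on, before the payload length is rewritten. The user-space model below mirrors only those steps. The `MOCK_*` bit values and `struct mock_skb` are hypothetical, and the model assumes a 20-byte outer IPv4 header immediately followed by the 40-byte inner IPv6 header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_GSO_TCPV6    (1U << 0)  /* hypothetical bit values */
#define MOCK_GSO_SIT      (1U << 1)
#define MOCK_IPV6_HDR_LEN 40         /* size of a fixed IPv6 header */

struct mock_skb {
    size_t len;                      /* bytes from the outer header onward */
    bool encapsulation;
    unsigned int gso_type;
    size_t inner_network_header;
    unsigned int ipv6_payload_len;
};

/* Model of the patched ipv6_gro_complete(). */
static int mock_ipv6_gro_complete(struct mock_skb *skb, size_t nhoff)
{
    if (skb->encapsulation)
        skb->inner_network_header = nhoff;  /* where the inner IPv6 header starts */
    skb->ipv6_payload_len =
        (unsigned int)(skb->len - nhoff - MOCK_IPV6_HDR_LEN);
    return 0;
}

/* Model of sit_gro_complete(). */
static int mock_sit_gro_complete(struct mock_skb *skb, size_t nhoff)
{
    skb->encapsulation = true;
    /* OR, not assignment: GSO bits set by other layers must survive */
    skb->gso_type |= MOCK_GSO_SIT;
    return mock_ipv6_gro_complete(skb, nhoff);
}
```

For a merged packet of 2940 bytes whose inner IPv6 header starts at offset 20, the inner header offset becomes 20 and the IPv6 payload length becomes 2940 - 20 - 40 = 2880. An existing TCPv6 GSO bit is kept alongside the new SIT bit.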
Re: [PATCH v2 net-next 3/4] ipv6: Add gro functions to sit_offloads
Hello Eric!

On Friday, 16 October 2015, 08:23:49, Eric Dumazet wrote:
> On Thu, 2015-08-06 at 17:15 -0700, Jesse Gross wrote:
> > On Mon, Aug 3, 2015 at 10:11 AM, Tom Herbert <t...@herbertland.com> wrote:
> > > For GRO to work with sit we need gro_receive and gro_complete populated
> > > in the sit_offload structure.
> > >
> > > Signed-off-by: Tom Herbert <t...@herbertland.com>
> >
> > You might want to check out the recent history on this file unless
> > there's something that's changed in the last couple of weeks:
> >
> > commit fdbf5b097bbd9693a86c0b8bfdd071a9a2117cfc
> > Author: Herbert Xu <herb...@gondor.apana.org.au>
> > Date:   Mon Jul 20 17:55:38 2015 +0800
> >
> >     Revert "sit: Add gro callbacks to sit_offload"
> >
> >     This patch reverts 19424e052fb44da2f00d1a868cbb51f3e9f4bbb5 ("sit:
> >     Add gro callbacks to sit_offload") because it generates packets
> >     that cannot be handled even by our own GSO.
> >
> >     Reported-by: Wolfgang Walter <li...@stwm.de>
> >     Signed-off-by: Herbert Xu <herb...@gondor.apana.org.au>
> >     Signed-off-by: David S. Miller <da...@davemloft.net>
> >
> > --
>
> What about the following more complete patch ?
>
> We properly set skb->encapsulation and inner network header as some NIC
> drivers depend on it. Our GSO should also work properly I think.
>
> Wolfgang, could you please test it ? (this is a patch on top of David
> Miller net-next tree)
>
> Both Google and Facebook are eager to get proper GRO/SIT support ;)
>
> Thanks !

At the moment I can't test it, sorry. I will test it next week. But as I see, you have already done that yourself.

>
> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> index 08b62047c67f..eeca943f12dc 100644
> --- a/net/ipv6/ip6_offload.c
> +++ b/net/ipv6/ip6_offload.c
> @@ -264,6 +264,9 @@ static int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
>  	struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + nhoff);
>  	int err = -ENOSYS;
>
> +	if (skb->encapsulation)
> +		skb_set_inner_network_header(skb, nhoff);
> +
>  	iph->payload_len = htons(skb->len - nhoff - sizeof(*iph));
>
>  	rcu_read_lock();
> @@ -280,6 +283,13 @@ out_unlock:
>  	return err;
>  }
>
> +static int sit_gro_complete(struct sk_buff *skb, int nhoff)
> +{
> +	skb->encapsulation = 1;
> +	skb_shinfo(skb)->gso_type |= SKB_GSO_SIT;
> +	return ipv6_gro_complete(skb, nhoff);
> +}
> +
>  static struct packet_offload ipv6_packet_offload __read_mostly = {
>  	.type = cpu_to_be16(ETH_P_IPV6),
>  	.callbacks = {
> @@ -292,6 +302,8 @@ static struct packet_offload ipv6_packet_offload __read_mostly = {
>  static const struct net_offload sit_offload = {
>  	.callbacks = {
>  		.gso_segment	= ipv6_gso_segment,
> +		.gro_receive	= ipv6_gro_receive,
> +		.gro_complete	= sit_gro_complete,
>  	},
>  };

Thank you very much,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Re: Soft lockup issue in Linux 4.1.9
On Friday, 2 October 2015, 09:17:16, Holger Hoffstätte wrote:
> On 10/02/15 08:52, Andre Tomt wrote:
> > On 01 Oct 2015 13:52, Eric Dumazet wrote:
> >> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
> >> <holger.hoffstae...@googlemail.com> wrote:
> >>> On 10/01/15 13:29, Eric Dumazet wrote:
> >>>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> >>>> Author: Eric Dumazet <eduma...@google.com>
> >>>> Date:   Thu Aug 13 15:44:51 2015 -0700
> >>>>
> >>>>     inet: fix potential deadlock in reqsk_queue_unlink()
> >
> >>> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
> >>> we speak. Let's hope that this fixes the lockups.
> >>
> >> It definitely should help !
> >>
> >> David, since patch is not yet seen on
> >> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
> >> could you please add it to your queue ?
> >
> > Seems to fix it for me as well. 3 systems have been running varying
> > types of production-like loads with it for 14+ hours without hanging.
>
> Just got up, and yes - my systems survived the night as well, no issues.
>
> Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
> will get another broken release.

It fixes the problem here, too.

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
kernel 4.1.9: networking hangs with rcu_preempt self-detected stall, 4.1.8 works; was: Re: [PATCH 4.1 125/159] net: call rcu_read_lock early in process_backlog
On Tuesday, 29 September 2015, 12:48:43, Andre Tomt wrote:
> On 29 Sep 2015 12:21, Andre Tomt wrote:
> > Meanwhile I'll revert both the mentioned net patches and see how it goes.
>
> So that blew up as well, meaning it's not any of these two patches:
> [PATCH 4.1 124/159] net: do not process device backlog during unregistration
> [PATCH 4.1 125/159] net: call rcu_read_lock early in process_backlog
>
> I'll be offline for a half+ day, I'll look into bisecting when back if
> nobody has figured it out by then.
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

We see these RCU hangs with 4.1.9 on one of our routers, too. 4.1.8 runs fine.

The output I got the last time was:

[ 6488.174578] igb 000:06:00.1 eth3: Reset adapter
[ 6497.350183] INFO: rcu_preempt self-detected stall on CPU { 3} (t=6301 jiffies g=383330 c=383329 q=1323)
[ 6497.350229] rcu_preempt kthread starved for 6007 jiffies!
[ 6560.311093] INFO: rcu_preempt self-detected stall on CPU { 3} (t=25205 jiffies g=383330 c=383329 q=4479)
[ 6560.311140] rcu_preempt kthread starved for 24911 jiffies!
[ 6623.272005] INFO: rcu_preempt self-detected stall on CPU { 3} (t=44109 jiffies g=383330 c=383329 q=7107)
[ 6623.272049] rcu_preempt kthread starved for 43815 jiffies!
[ 6633.053892] igb 000:06:00.0 eth2: Reset adapter
[ 6633.053892] rcu_preempt kthread starved for 62719 jiffies!
[ 6486.232914] INFO: rcu_preempt self-detected stall on CPU { 3} (t=63013 jiffies g=383330 c=383329 q=8487)
[ 6486.233204] rcu_preempt kthread starved for 6007 jiffies!

All other hangs were basically the same, though the CPU varies. After that the router hangs completely: networking stops working and we need to restart it.

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
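The stall reports can be cross-checked with a little arithmetic. Between the first two reports the `t=` counter advances by 25205 - 6301 = 18904 jiffies while the timestamps advance by about 62.96 s, which suggests a tick rate of roughly 300 Hz. That is an inference: the router's CONFIG_HZ is not stated in the thread. At 300 Hz the first report at t=6301 comes about 21 s into the stall, consistent with the kernel's default RCU CPU stall timeout of 21 seconds. `struct stall_report`, `estimate_hz()` and `jiffies_to_seconds()` are illustrative helpers.

```c
#include <assert.h>

/* One "self-detected stall" report: dmesg timestamp and the t= value. */
struct stall_report {
    double timestamp_s;   /* dmesg timestamp in seconds */
    unsigned long t;      /* jiffies since the stall started */
};

/* Estimate the tick rate (HZ) from two reports of the same stall. */
static double estimate_hz(struct stall_report a, struct stall_report b)
{
    assert(b.timestamp_s > a.timestamp_s && b.t > a.t);
    return (double)(b.t - a.t) / (b.timestamp_s - a.timestamp_s);
}

/* Convert a jiffies count to seconds for a given HZ. */
static double jiffies_to_seconds(unsigned long j, double hz)
{
    return (double)j / hz;
}
```

Using the third report (t=44109 at 6623.272005) gives the same ratio. The reports are therefore spaced evenly, about 63 s apart, as the stall persists. The last two log lines have timestamps that go backwards and do not fit this pattern.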
Re: sit: Set SKB_GSO_SIT bit when performing GRO
On Monday, 20 July 2015, 14:14:59, Herbert Xu wrote:
> On Fri, Jul 17, 2015 at 05:38:30PM +0200, Wolfgang Walter wrote:
> > eth1 stops sending with the patch after some time
> > disabling gro on eth0 helps
> > disabling tso or gso on eth0 and/or eth1 or both does not help
> >
> > eth0 and eth1 are both intel I350.
>
> What does ethtool -k eth1 say?

With TSO enabled:

# ethtool -k eth0
Features for eth0:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: on
	tx-checksum-ip-generic: off [fixed]
	tx-checksum-ipv6: on
	tx-checksum-fcoe-crc: off [fixed]
	tx-checksum-sctp: on
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]
hw-switch-offload: off [fixed]

> Can you confirm that disabling tso on eth1 does not help?

Disabling TSO on eth1 does not help.

> Because the most plausible explanation is that we're feeding some
> bogus TSO packet to the hardware causing a tx lockup.

I have been running the unpatched 4.1.2 again since Saturday without any lockup. With your patch the network card hangs within 10 minutes or so.

On the other hand, I run the patched kernel on several other routers (same hardware, by the way) without problems. So maybe the problem is that the former one routes GRE-tunnel packets which contain ISATAP packets. I don't know how deeply GRO/GSO inspects a packet.

> But in any case if it is a hardware lockup then it's no longer
> just a pure software bug. No matter what we do in the stack the
> hardware should not lock up (unless of course we're feeding it
> something that's completely bogus).
>
> If we can't figure this out then the safest solution would be to
> disable tunnel GRO completely because it's broken as it stands.
>
> Cheers,

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Re: sit: Set SKB_GSO_SIT bit when performing GRO
On Friday, 17 July 2015, 09:56:51, Herbert Xu wrote:
> On Thu, Jul 16, 2015 at 12:58:45PM +0200, Wolfgang Walter wrote:
> > On Thursday, 16 July 2015, 08:23:50, Herbert Xu wrote:
> > > On Wed, Jul 15, 2015 at 02:25:59PM +0200, Wolfgang Walter wrote:
> > > > Yes. Switching TSO off and leaving GRO on works, too.
> > >
> > > OK, could you please try this patch?
> >
> > Patch works here.
>
> Thanks for the confirmation. Let's add a tag for patchwork:
>
> Tested-by: Wolfgang Walter <li...@stwm.de>

It seems that this patch may cause a problem on another one of our routers. Without the patch it had no problem, so I didn't test it there.

With the patch, one interface blocks after some time. Not even ARP requests get answered, although it still receives packets. Restarting the interface fixes the problem. Switching off GRO on the other interface helps.

This router is different from the other ones. It does not directly route ISATAP packets, but it may route ISATAP packets encapsulated in GRE packets. It is not itself a GRE endpoint.

The router does NAT. Basically it routes the GRE tunnel packets without NAT and NATs most of the rest. Not doing NAT and conntrack (and unloading all modules like nf_conntrack_ipv4, nf_defrag_ipv4) does not help.

eth0: external
eth1: internal

One (IPv4) GRE tunnel is routed between eth0 and eth1.
IPv6 ESP tunnels are routed between eth0 and eth1.
IPv4 UDP/TCP/ICMP from the internal side is NATed with netfilter.

eth1 stops sending with the patch after some time
disabling gro on eth0 helps
disabling tso or gso on eth0 and/or eth1 or both does not help
eth0 and eth1 are both intel I350.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Re: sit: Set SKB_GSO_SIT bit when performing GRO
On Thursday, 16 July 2015, 08:23:50, Herbert Xu wrote:
> On Wed, Jul 15, 2015 at 02:25:59PM +0200, Wolfgang Walter wrote:
> > Yes. Switching TSO off and leaving GRO on works, too.
>
> OK, could you please try this patch?

Patch works here.

Thanks,
Wolfgang

> ---8<---
> We need to set the SKB_GSO_SIT bit if we detect a 6-in-4 tunnel when doing GRO. Otherwise we may throw a packet at TSO hardware that doesn't know what to do with it.
>
> Fixes: 19424e052fb4 ("sit: Add gro callbacks to sit_offload")
> Reported-by: Wolfgang Walter <li...@stwm.de>
> Signed-off-by: Herbert Xu <herb...@gondor.apana.org.au>
>
> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> index e893cd1..1252eac 100644
> --- a/net/ipv6/ip6_offload.c
> +++ b/net/ipv6/ip6_offload.c
> @@ -289,11 +289,21 @@ static struct packet_offload ipv6_packet_offload __read_mostly = {
>  	},
>  };
>  
> +static int sit_gro_complete(struct sk_buff *skb, int nhoff)
> +{
> +	int err = ipv6_gro_complete(skb, nhoff);
> +
> +	skb->encapsulation = 1;
> +	skb_shinfo(skb)->gso_type |= SKB_GSO_SIT;
> +
> +	return err;
> +}
> +
>  static const struct net_offload sit_offload = {
>  	.callbacks = {
>  		.gso_segment	= ipv6_gso_segment,
>  		.gro_receive	= ipv6_gro_receive,
> -		.gro_complete	= ipv6_gro_complete,
> +		.gro_complete	= sit_gro_complete,
>  	},
>  };

-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
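The following makes the failure mode concrete. The stack should let hardware segment a GSO packet only if the device advertises every GSO type bit set on the skb. The ethtool output elsewhere in this thread shows `tx-sit-segmentation: off [fixed]` for the I350.

Below is a minimal userspace model of that check, loosely patterned on the kernel's net_gso_ok() helper. The bit values and the names `gso_type`, `dev_features` and `hw_can_segment` are invented for illustration; they do not match the kernel's constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; the kernel's SKB_GSO_*
 * and NETIF_F_* constants differ. */
enum gso_type {
	GSO_TCPV4 = 1u << 0, /* plain TCP over IPv4 */
	GSO_TCPV6 = 1u << 1, /* plain TCP over IPv6 */
	GSO_SIT   = 1u << 2, /* 6-in-4 (SIT/ISATAP) encapsulation */
};

/* Set of GSO types a device says it can segment in hardware. */
typedef uint32_t dev_features;

/* Hardware may segment the packet only if it supports every GSO type
 * bit carried by the packet. */
static bool hw_can_segment(dev_features features, uint32_t gso_type)
{
	return (features & gso_type) == gso_type;
}
```

In this model, a GRO-merged 6-in-4 packet tagged only as TCPv6 passes the check and would be handed to TSO hardware. Once the patch adds the SIT bit, the check fails, and segmentation would have to fall back to software GSO.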
Re: GRO: forwarding ISATAP packets is very slow with kernel 4.1
On Wednesday, 15 July 2015, 17:50:20, Herbert Xu wrote:
> On Wed, Jul 15, 2015 at 02:34:41AM +0200, Wolfgang Walter wrote:
> > I wonder if the router may treat all IPv6 TCP connections of a host as a single flow, as all those IPv6 packets are embedded in IPv4 packets with the IPv4 address of the host and the IPv4 address of the ISATAP gateway, and next-header is 41. Maybe it only looks at those?
>
> I think GRO is actually working fine.
>
> Can you show me the output of ethtool -k eth1? If hardware TSO is enabled can you try disabling it to see if it helps?
>
> Thanks,

Yes. Switching TSO off and leaving GRO on works, too.

Thanks,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
GRO: forwarding ISATAP packets is very slow with kernel 4.1
Hello,

I upgraded our routers from 3.14.x to 4.1.2. Forwarding ISATAP packets (IPv4 packets with IPv6 payload) is very slow with 4.1 if GRO is enabled (YouTube, for example, gets about 64 kbit/s). Disabling GRO on the interfaces restores performance to values comparable to 3.14.x.

The kernel is built with IPv6 support, but IPv6 is disabled via the kernel command line. The router is not a tunnel endpoint; it only forwards the ISATAP packets. The MTU is 1500 on both interfaces. Netfilter conntrack is not used, and disabling netfilter has no effect.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Re: GRO: forwarding ISATAP packets is very slow with kernel 4.1
On Wednesday, 15 July 2015, 08:08:49, you wrote:
> On Wed, Jul 15, 2015 at 12:16:30AM +0200, Wolfgang Walter wrote:
> > Hello,
> >
> > I upgraded our routers from 3.14.x to 4.1.2. Forwarding ISATAP packets (IPv4 packets with IPv6 payload) is very slow with 4.1 if GRO is enabled (YouTube, for example, gets about 64 kbit/s). Disabling GRO on the interfaces restores performance to values comparable to 3.14.x.
> >
> > The kernel is built with IPv6 support, but IPv6 is disabled via the kernel command line. The router is not a tunnel endpoint; it only forwards the ISATAP packets. The MTU is 1500 on both interfaces. Netfilter conntrack is not used, and disabling netfilter has no effect.
>
> Can you run some tcpdumps and post the results in the two cases?

Yes, but I need the cooperation of one of our customers.

I wonder if the router may treat all IPv6 TCP connections of a host as a single flow, as all those IPv6 packets are embedded in IPv4 packets with the IPv4 address of the host and the IPv4 address of the ISATAP gateway, and next-header is 41. Maybe it only looks at those?

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
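Here is one way to picture the single-flow hypothesis. In a 6-in-4 (ISATAP) packet, the outer IPv4 header carries protocol 41, and every tunneled connection of a host shares the same outer addresses. A flow key built only from the outer header therefore cannot tell those connections apart.

This is a minimal userspace sketch of that point, not of how the kernel's GRO actually matches flows. The helper names are hypothetical, and the sketch ignores IPv6 extension headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROTO_6IN4 41 /* IANA protocol number for IPv6 encapsulated in IPv4 */

/* True if buf holds an IPv4 header whose payload is an IPv6 header. */
static bool is_6in4(const uint8_t *buf, size_t len)
{
	assert(buf != NULL);
	if (len < 20 || (buf[0] >> 4) != 4)
		return false;
	size_t ihl = (size_t)(buf[0] & 0x0f) * 4; /* outer header length */
	if (ihl < 20 || len < ihl + 40)
		return false;
	return buf[9] == PROTO_6IN4 && (buf[ihl] >> 4) == 6;
}

/* Flow key from the outer header only: protocol plus src/dst IPv4. */
static bool same_outer_flow(const uint8_t *a, const uint8_t *b)
{
	return a[9] == b[9] && memcmp(a + 12, b + 12, 8) == 0;
}

/* Flow key from the inner IPv6 header: next header, src/dst and, for
 * TCP, the port pair. Callers must have checked is_6in4() and that at
 * least 4 bytes of TCP header follow the inner IPv6 header. */
static bool same_inner_flow(const uint8_t *a, const uint8_t *b)
{
	size_t ia = (size_t)(a[0] & 0x0f) * 4, ib = (size_t)(b[0] & 0x0f) * 4;

	if (a[ia + 6] != b[ib + 6] || memcmp(a + ia + 8, b + ib + 8, 32) != 0)
		return false;
	if (a[ia + 6] == 6) /* TCP: compare source and destination ports */
		return memcmp(a + ia + 40, b + ib + 40, 4) == 0;
	return true;
}
```

Two TCP connections from the same ISATAP host look identical to `same_outer_flow()` and differ only under `same_inner_flow()`.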
Re: kernel >= 4.0: crashes when using traceroute6 with isatap
On Wednesday, 6 May 2015, 11:15:18, you wrote:
> (Cc'ing netdev.)
>
> On Sat, May 2, 2015 at 5:29 AM, Wolfgang Walter <li...@stwm.de> wrote:
> > On Saturday, 2 May 2015, 02:16:36, Wolfgang Walter wrote:
> > > Hello,
> > >
> > > kernel 4.0 (and 4.0.1) crashes immediately when I use traceroute6 with an isatap-tunnel.
> >
> > I did some further tests. To trigger the crash you need
> >
> > * isatap-tunnel (probably any sit-tunnel will do it)
> > * raw-socket
> > * udp
> >
> > Using icmpv6 or tcp, for example, does not trigger it.
>
> Do you have a script to reproduce it?
>
> Thanks for the bug report!

You need an ISATAP server with, say, IPv4 address $X. Then, on the host running 4.0, start isatapd:

isatapd --mtu 1280 $X

then run

traceroute6 www.google.de

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
problems with e1000 and flow control
Hello,

it seems that e1000 enables flow control (rx pause frames) even if the switch does not advertise flow control. This seems to become a problem because (at least some) switches then forward pause frames from other hosts to the card. We think there are hosts in the LANs of our student halls which do indeed send such frames.

I think flow control should be completely disabled by default if the switch does not advertise it. It can still be forced with ethtool.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
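For comparison, IEEE 802.3 (Annex 28B) resolves pause from the PAUSE and ASM_DIR bits advertised by both link partners. A partner that advertises neither bit resolves to no flow control at all.

The sketch below is a standalone model of that resolution table. It is patterned on the logic of Linux's mii_resolve_flowctrl_fdx() helper as we understand it. It is not e1000 driver code, and the struct names are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Pause abilities advertised during autonegotiation. */
struct pause_adv {
	bool pause;   /* PAUSE bit: symmetric pause */
	bool asm_dir; /* ASM_DIR bit: asymmetric pause */
};

/* Resolved behaviour for the local side. */
struct pause_result {
	bool tx; /* we may send pause frames */
	bool rx; /* we obey pause frames we receive */
};

static struct pause_result resolve_pause(struct pause_adv local,
					 struct pause_adv partner)
{
	struct pause_result r = { false, false };

	if (local.pause && partner.pause) {
		r.tx = r.rx = true;          /* symmetric flow control */
	} else if (local.asm_dir && partner.asm_dir) {
		if (local.pause)
			r.rx = true;         /* partner sends, we obey */
		else if (partner.pause)
			r.tx = true;         /* we send, partner obeys */
	}
	return r;                            /* otherwise: disabled */
}
```

Under this table, a card whose switch advertises no pause ability would neither send nor honour pause frames. That is the default this message argues for.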
Re: kernel 2.6.23.8: KERNEL: assertion in net/ipv4/tcp_input.c
Hello Ilpo,

it happened again with your patch applied:

WARNING: at net/ipv4/tcp_input.c:1018 tcp_sacktag_write_queue()

Call Trace:
 <IRQ> [80549290] tcp_sacktag_write_queue+0x7d0/0xa60
 [80283869] add_partial+0x19/0x60
 [80549ac4] tcp_ack+0x5a4/0x1d70
 [8054e625] tcp_rcv_established+0x485/0x7b0
 [80554c3d] tcp_v4_do_rcv+0xed/0x3e0
 [80556fe7] tcp_v4_rcv+0x947/0x970
 [80538c6c] ip_local_deliver+0xac/0x290
 [80538862] ip_rcv+0x362/0x6c0
 [804fc5d3] netif_receive_skb+0x323/0x420
 [8042ab40] tg3_poll+0x630/0xa50
 [804fecba] net_rx_action+0x8a/0x140
 [8023a269] __do_softirq+0x69/0xe0
 [8020d47c] call_softirq+0x1c/0x30
 [8020f315] do_softirq+0x35/0x90
 [8023a105] irq_exit+0x55/0x60
 [8020f3f0] do_IRQ+0x80/0x100
 [8020c7d1] ret_from_intr+0x0/0xa
 <EOI>

On Monday, 3 December 2007 14:34, Ilpo Järvinen wrote:
> On Mon, 3 Dec 2007, Wolfgang Walter wrote:
> > with kernel 2.6.23.8 we saw a
> > KERNEL: assertion ((int)tcp_packets_in_flight(tp) >= 0) failed at net/ipv4/tcp_input.c (1292)
>
> Is this the only message? Are there any Leak printouts?
>
> Any tweaking done to TCP related sysctls?

net/core/somaxconn=2048
net/ipv4/tcp_syncookies=1
net/ipv4/tcp_max_syn_backlog=8192
net/ipv4/tcp_max_tw_buckets=180
net/ipv4/tcp_window_scaling=0
net/ipv4/tcp_timestamps=0

> Most likely I broke the manual synchronization for left_out in sacktag by skipping over it when packets_out == 0 but so far I haven't been able to figure out how such state could develop in the first place... Ie., I couldn't find a case where tcp_fastretrans_alert wouldn't be called if left_out was non-zero (and it did the sync_left_out after modifying either sacked_out or lost_out, IIRC).
>
> ...If you can reproduce it, you could try if this patch below changes anything (should silence the assert and trigger earlier a WARN_ON or two :-)).
>
> ...If this triggers, then I'm sure we can pollute TCP code by a larger number of more costly checks to catch it in early. This might reveal a long-standing inconsistency of left_out in some case I just couldn't come up with by code review.
>
> Left_out will be (is) anyway dropped as unnecessary in 2.6.24. In 2.6.23 sync for left_out occurs quite soon after that BUG_TRAP anyway so the effect won't be too dramatic, prior_in_flight would be once stale, won't lead to big problems (either missed cwnd or cwnd_cnt increment, or failure to do application limited check at that particular ACK).
>
> Thanks anyway for the report.
>
> ...If I figure something out here, I'll let you know.
>
> --
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index c9298a7..0c5194d 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -1012,8 +1012,12 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
>  	if (before(TCP_SKB_CB(ack_skb)->ack_seq, prior_snd_una - tp->max_window))
>  		return 0;
>  
> -	if (!tp->packets_out)
> +	if (!tp->packets_out) {
> +		WARN_ON(tp->sacked_out);
> +		WARN_ON(tp->lost_out);
> +		WARN_ON(tp->left_out);
>  		goto out;
> +	}
>  
>  	/* SACK fastpath:
>  	 * if the only SACK change is the increase of the end_seq of
> @@ -1277,14 +1281,14 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
>  		}
>  	}
>  
> +out:
> +	tp->left_out = tp->sacked_out + tp->lost_out;
>  	if ((reord < tp->fackets_out) && icsk->icsk_ca_state != TCP_CA_Loss &&
>  	    (!tp->frto_highmark || after(tp->snd_una, tp->frto_highmark)))
>  		tcp_update_reordering(sk, ((tp->fackets_out + 1) - reord), 0);
>  
> -out:
> -
>  #if FASTRETRANS_DEBUG > 0
>  	BUG_TRAP((int)tp->sacked_out >= 0);
>  	BUG_TRAP((int)tp->lost_out >= 0);

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Re: kernel 2.6.23.8: KERNEL: assertion in net/ipv4/tcp_input.c
On Monday, 3 December 2007 14:34, Ilpo Järvinen wrote:
> On Mon, 3 Dec 2007, Wolfgang Walter wrote:
> > with kernel 2.6.23.8 we saw a
> > KERNEL: assertion ((int)tcp_packets_in_flight(tp) >= 0) failed at net/ipv4/tcp_input.c (1292)
>
> Is this the only message? Are there any Leak printouts?

No. 4 days earlier there were 3 messages:

TCP: Treason uncloaked! Peer a.b.c.d:80/56532 shrinks window 3535507131:3535513869. Repaired.

> Any tweaking done to TCP related sysctls?
>
> And for completeness, is GSO enabled (ethtool -k)?

rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off

> Most likely I broke the manual synchronization for left_out in sacktag by skipping over it when packets_out == 0 but so far I haven't been able to figure out how such state could develop in the first place... Ie., I couldn't find a case where tcp_fastretrans_alert wouldn't be called if left_out was non-zero (and it did the sync_left_out after modifying either sacked_out or lost_out, IIRC).
>
> ...If you can reproduce it, you could try if this patch below changes

I don't know how to reproduce it - we never saw the message before. I'll apply the patch. Let's see if the WARN_ON triggers before we update to a newer kernel :-).

> anything (should silence the assert and trigger earlier a WARN_ON or two :-)).
>
> ...If this triggers, then I'm sure we can pollute TCP code by a larger number of more costly checks to catch it in early. This might reveal a long-standing inconsistency of left_out in some case I just couldn't come up with by code review.
>
> Left_out will be (is) anyway dropped as unnecessary in 2.6.24. In 2.6.23 sync for left_out occurs quite soon after that BUG_TRAP anyway so the effect won't be too dramatic, prior_in_flight would be once stale, won't lead to big problems (either missed cwnd or cwnd_cnt increment, or failure to do application limited check at that particular ACK).
>
> Thanks anyway for the report.
>
> ...If I figure something out here, I'll let you know.
>
> --
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index c9298a7..0c5194d 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -1012,8 +1012,12 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
>  	if (before(TCP_SKB_CB(ack_skb)->ack_seq, prior_snd_una - tp->max_window))
>  		return 0;
>  
> -	if (!tp->packets_out)
> +	if (!tp->packets_out) {
> +		WARN_ON(tp->sacked_out);
> +		WARN_ON(tp->lost_out);
> +		WARN_ON(tp->left_out);
>  		goto out;
> +	}
>  
>  	/* SACK fastpath:
>  	 * if the only SACK change is the increase of the end_seq of
> @@ -1277,14 +1281,14 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
>  		}
>  	}
>  
> +out:
> +	tp->left_out = tp->sacked_out + tp->lost_out;
>  	if ((reord < tp->fackets_out) && icsk->icsk_ca_state != TCP_CA_Loss &&
>  	    (!tp->frto_highmark || after(tp->snd_una, tp->frto_highmark)))
>  		tcp_update_reordering(sk, ((tp->fackets_out + 1) - reord), 0);
>  
> -out:
> -
>  #if FASTRETRANS_DEBUG > 0
>  	BUG_TRAP((int)tp->sacked_out >= 0);
>  	BUG_TRAP((int)tp->lost_out >= 0);

Thanks and regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
kernel 2.6.23.8: KERNEL: assertion in net/ipv4/tcp_input.c
Hello,

with kernel 2.6.23.8 we saw a

KERNEL: assertion ((int)tcp_packets_in_flight(tp) >= 0) failed at net/ipv4/tcp_input.c (1292)

Regards,
Wolfgang Walter
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Re: [PATCHv6 0/3] Interface group patches
> From: Patrick McHardy
> I'm working on the incremental ruleset changing API BTW :) One of the changes will be that interface matching is not a default part of every rule, and without wildcards it will use the ifindex. But since the cost of this feature seems pretty low, I don't see a compelling reason against it.

Using the ifindex instead of string-matching the interface name in -i and -o would be a serious problem, as it changes the semantics:

1) Currently you can match a non-existent interface. This is certainly used, e.g. with VLAN interfaces, PPP, etc.

2) Currently your rule will match an interface even if the ifindex of the interface changes. This is used (e.g. you activate a backup interface and rename it, build new bridges, etc.).

If one wants to use the ifindex instead of a string match on the name, one should explicitly request that (e.g. by using -i =eth0 or something like that).

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
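The name-based semantics argued for above work as follows. The iptables documentation states that an -i/-o name ending in '+' matches any interface whose name starts with that prefix. A plain name is compared as a string, so it can refer to an interface that does not exist yet.

This is a minimal sketch, not netfilter's actual matching code. `iface_name_match()` and `ifindex_for_rule()` are illustrative names. The latter wraps the POSIX if_nametoindex(), which returns 0 for an unknown interface, so an ifindex-based rule has nothing to bind to in that case.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <net/if.h>
#include <stdbool.h>
#include <string.h>

/* Name match with an iptables-style trailing '+' wildcard. */
static bool iface_name_match(const char *rule, const char *ifname)
{
	assert(rule != NULL && ifname != NULL);
	size_t n = strlen(rule);

	if (n > 0 && rule[n - 1] == '+')
		return strncmp(rule, ifname, n - 1) == 0; /* prefix match */
	return strcmp(rule, ifname) == 0;                 /* exact match */
}

/* Resolving the name once, as an ifindex-based rule would, yields 0
 * when the interface does not exist (yet). */
static unsigned int ifindex_for_rule(const char *name)
{
	return if_nametoindex(name);
}
```

The name match keeps working across interface renames and re-creation. A resolved ifindex is fixed at the moment the rule is loaded.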
ipsec: icmp fragmentation-needed from ipsec-gateway is not encrypted
Hello,

I have the following problem:

router A has two interfaces eth0 and eth1.
router B has two interfaces eth0 and eth1.

The networks on A:eth1 and B:eth1 are connected over an IPsec tunnel.
The MTU on A:eth1 is 1400 (all others are 1500).
Both run 2.6.22.6.

If I now ping a host HA on A:eth1 from host HB on B:eth1 with a packet size greater than 1400, the ping fails. tcpdump on A:eth0 shows:

* an ESP tunnel packet from B comes in
* an ICMP echo-request packet from HB to HA comes in (the decrypted ESP packet)
* an unencrypted ICMP fragmentation-needed packet to HB from A (IP of eth1) is sent out

It seems to me that this fragmentation-needed packet generated by A is not handled by IPsec. It is sent out unencrypted instead, and this is the reason it does not reach HB.

Am I right that I should not see the unencrypted packet going out at all? If I ping A:eth1 from HB, I don't see an unencrypted echo-reply packet (which has the same source address as the fragmentation-needed packet), only the outgoing ESP packet (and the echo reply reaches HB, by the way).

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
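For reference, the packet A emits here is an ICMP "destination unreachable / fragmentation needed and DF set" message (type 3, code 4). As described in RFC 1191, its header carries the next-hop MTU, which is 1400 for A:eth1.

Below is a minimal sketch of that 8-byte header with the RFC 1071 Internet checksum. It illustrates the message format only, not the kernel's icmp_send(). A real message also carries the offending IP header plus 8 bytes of its payload inside the checksummed area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over len bytes (big-endian 16-bit words). */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(data[i] << 8 | data[i + 1]);
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;   /* pad odd byte */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);    /* fold carries */
	return (uint16_t)~sum;
}

/* Fill an 8-byte ICMP type 3 / code 4 header carrying the next-hop MTU.
 * The checksum here covers only these 8 bytes (simplification). */
static void build_frag_needed(uint8_t hdr[8], uint16_t next_hop_mtu)
{
	assert(hdr != NULL);
	hdr[0] = 3;                               /* destination unreachable */
	hdr[1] = 4;                               /* fragmentation needed, DF set */
	hdr[2] = hdr[3] = 0;                      /* checksum, computed below */
	hdr[4] = hdr[5] = 0;                      /* unused */
	hdr[6] = (uint8_t)(next_hop_mtu >> 8);
	hdr[7] = (uint8_t)(next_hop_mtu & 0xff);

	uint16_t csum = inet_checksum(hdr, 8);
	hdr[2] = (uint8_t)(csum >> 8);
	hdr[3] = (uint8_t)(csum & 0xff);
}
```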
Re: [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)
On Wednesday, 12 September 2007 21:55, J. Bruce Fields wrote:
> On Wed, Sep 12, 2007 at 09:40:57PM +0200, Wolfgang Walter wrote:
> > On Wednesday 12 September 2007, J. Bruce Fields wrote:
> > > On Wed, Sep 12, 2007 at 04:14:06PM +0200, Neil Brown wrote:
> > > > So it is in 2.6.21 and later and should probably go to .stable for .21 and .22.
> > > >
> > > > Bruce: for you :-)
> > >
> > > OK, thanks! But, (as is alas often the case) I'm still confused:
> > >
> > >  		if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
> > >  			continue;
> > > -		if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
> > > +		if (atomic_read(&svsk->sk_inuse) > 1
> > > +		    || test_bit(SK_BUSY, &svsk->sk_flags))
> > >  			continue;
> > >  		atomic_inc(&svsk->sk_inuse);
> > >  		list_move(le, &to_be_aged);
> > >
> > > What is it that ensures svsk->sk_inuse isn't incremented or SK_BUSY set after that test? Not all the code that does either of those is under the same serv->sv_lock lock that this code is.
> >
> > This should not matter - SK_CLOSED may be set at any time. svc_age_temp_sockets only detaches the socket, sets SK_CLOSED and then enqueues it. If SK_BUSY is set, it's already enqueued, and svc_sock_enqueue ensures that it is not enqueued twice.
>
> Oh, got it. And the list manipulation is safe thanks to sv_lock. Neat, thanks.
>
> Can you verify that this solves your problem?

Patch works fine here.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
[patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)
Hello,

as already described, old temporary sockets (client is gone) of lockd aren't closed after some time. So, with enough clients and enough time gone by, there are 80 open dangling sockets and you start getting messages of the form:

lockd: too many open TCP sockets, consider increasing the number of nfsd threads.

If I understand the code, the intention was that the server closes temporary sockets after about 6 to 12 minutes:

a timer is started which calls svc_age_temp_sockets every 6 minutes.

svc_age_temp_sockets:
	if a socket is marked OLD it gets closed.
	sockets which are not marked as OLD are marked OLD.

every time the socket receives something, OLD is cleared.

But svc_age_temp_sockets never closes any socket, because it only closes sockets with svsk->sk_inuse == 0. This seems to be a bug.

Here is a patch against 2.6.22.6 which changes the test to svsk->sk_inuse <= 0, which was probably meant. The patched kernel runs fine here. Unused sockets get closed (after 6 to 12 minutes).

Signed-off-by: Wolfgang Walter [EMAIL PROTECTED]

--- ../linux-2.6.22.6/net/sunrpc/svcsock.c	2007-08-27 18:10:14.0 +0200
+++ net/sunrpc/svcsock.c	2007-09-11 11:07:13.0 +0200
@@ -1572,7 +1575,7 @@
 		if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
 			continue;
-		if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
+		if (atomic_read(&svsk->sk_inuse) <= 0 || test_bit(SK_BUSY, &svsk->sk_flags))
 			continue;
 		atomic_inc(&svsk->sk_inuse);
 		list_move(le, &to_be_aged);

As svc_age_temp_sockets did not do anything before, this change may trigger hidden bugs.

To be honest, I don't see why this check (atomic_read(&svsk->sk_inuse) <= 0 || test_bit(SK_BUSY, &svsk->sk_flags)) is needed at all (it can only be an optimization), as these fields can change after the check. In svc_tcp_accept there is no such check when a temporary socket is closed.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
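Below is a simplified model of the two-tick aging scheme described above. A socket is marked OLD on one tick and closed on the next, unless traffic cleared the mark in between. The model contrasts the original test with the `sk_inuse > 1` form quoted elsewhere in this thread.

A socket on the temp list always holds at least one reference, so the original `sk_inuse != 0` test skips every live socket. The struct and functions are hypothetical; locking, SK_DETACHED and the actual enqueue are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a temporary svc socket. */
struct tmp_sock {
	bool old;     /* SK_OLD: set by the timer, cleared when data arrives */
	bool busy;    /* SK_BUSY: already queued for a server thread */
	int inuse;    /* reference count; the temp-socket list holds one */
	bool closing; /* selected for closing (SK_CLOSE) */
};

/* Data arrived on the socket: it is active again. */
static void on_receive(struct tmp_sock *s)
{
	s->old = false;
}

/* One run of the aging timer.
 * fixed == false: original test, skip if inuse != 0.
 * fixed == true:  corrected test, skip if inuse > 1. */
static void age_tick(struct tmp_sock *socks, size_t n, bool fixed)
{
	assert(socks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct tmp_sock *s = &socks[i];
		bool was_old = s->old;

		s->old = true;          /* like test_and_set_bit(SK_OLD, ...) */
		if (!was_old)
			continue;       /* first idle tick: only mark it */
		bool in_use = fixed ? (s->inuse > 1) : (s->inuse != 0);
		if (in_use || s->busy)
			continue;
		s->closing = true;      /* idle for two ticks: close it */
	}
}
```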
Re: [NFS] [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)
On Wednesday, 12 September 2007 15:37, J. Bruce Fields wrote:
> On Wed, Sep 12, 2007 at 02:07:10PM +0200, Wolfgang Walter wrote:
> > as already described, old temporary sockets (client is gone) of lockd aren't closed after some time. So, with enough clients and enough time gone by, there are 80 open dangling sockets and you start getting messages of the form:
> >
> > lockd: too many open TCP sockets, consider increasing the number of nfsd threads.
>
> Thanks for working on this problem!
>
> > If I understand the code, the intention was that the server closes temporary sockets after about 6 to 12 minutes:
> >
> > a timer is started which calls svc_age_temp_sockets every 6 minutes.
> >
> > svc_age_temp_sockets:
> > 	if a socket is marked OLD it gets closed.
> > 	sockets which are not marked as OLD are marked OLD.
> >
> > every time the socket receives something, OLD is cleared.
> >
> > But svc_age_temp_sockets never closes any socket, because it only closes sockets with svsk->sk_inuse == 0. This seems to be a bug.
> >
> > Here is a patch against 2.6.22.6 which changes the test to svsk->sk_inuse <= 0, which was probably meant. The patched kernel runs fine here. Unused sockets get closed (after 6 to 12 minutes).
>
> So the fact that this changes the behavior means that sk_inuse is taking on negative values. This can't be right--how can something like svc_sock_put() (which does an atomic_dec_and_test) work in that case?

You probably misread the code.

	if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
		continue;

This means: any socket where svsk->sk_inuse != 0 or SK_BUSY is set is ignored by svc_age_temp_sockets; no attempt is made to close it.

This seems to be wrong: svsk->sk_inuse is zero only if svc_delete_socket has been called for it, and then it will be deleted anyway (it is probably already closed by then).

But the intention of svc_age_temp_sockets is to close open temporary sockets on which no traffic has been received for more than 6 minutes. These sockets have svsk->sk_inuse >= 1.

My patch does exactly this: instead of skipping sockets which are not already deleted or which are busy, it skips sockets which are already deleted or which are busy.

> I wish I had time today to figure out what's going on in this case. But from a quick through svsock.c for sk_inuse, it looks odd; I'm suspicious of anything without the stereotyped behavior--initializing to one, atomic_inc()ing whenever someone takes a reference, and atomic_dec_and_test()ing whenever someone drops it

Then svc_tcp_accept would be wrong, too (it closes sockets the same way, just without testing for sk_inuse and SK_BUSY).

I think this works because, as long as a socket is in sv_tempsocks or sv_permsocks, svsk->sk_inuse can never reach zero. As svc_age_temp_sockets locks the list, nobody can bring svsk->sk_inuse to zero while svc_age_temp_sockets holds the lock. As svc_age_temp_sockets calls atomic_inc(&svsk->sk_inuse) while holding the lock, there is no problem (the same is true for svc_tcp_accept).

This is the reason why I doubt that this check for svsk->sk_inuse in svc_age_temp_sockets is useful at all. It should always be false.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Re: [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)
On Wednesday 12 September 2007, J. Bruce Fields wrote:
> On Wed, Sep 12, 2007 at 04:14:06PM +0200, Neil Brown wrote:
> > So it is in 2.6.21 and later and should probably go to .stable for .21 and .22.
> >
> > Bruce: for you :-)
>
> OK, thanks! But, (as is alas often the case) I'm still confused:
>
>  		if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
>  			continue;
> -		if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
> +		if (atomic_read(&svsk->sk_inuse) > 1
> +		    || test_bit(SK_BUSY, &svsk->sk_flags))
>  			continue;
>  		atomic_inc(&svsk->sk_inuse);
>  		list_move(le, &to_be_aged);
>
> What is it that ensures svsk->sk_inuse isn't incremented or SK_BUSY set after that test? Not all the code that does either of those is under the same serv->sv_lock lock that this code is.

This should not matter - SK_CLOSED may be set at any time. svc_age_temp_sockets only detaches the socket, sets SK_CLOSED and then enqueues it. If SK_BUSY is set, it's already enqueued, and svc_sock_enqueue ensures that it is not enqueued twice.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Re: [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)
On Wednesday 12 September 2007, J. Bruce Fields wrote:
> On Wed, Sep 12, 2007 at 09:40:57PM +0200, Wolfgang Walter wrote:
> > On Wednesday 12 September 2007, J. Bruce Fields wrote:
> > > On Wed, Sep 12, 2007 at 04:14:06PM +0200, Neil Brown wrote:
> > > > So it is in 2.6.21 and later and should probably go to .stable for .21 and .22.
> > > >
> > > > Bruce: for you :-)
> > >
> > > OK, thanks! But, (as is alas often the case) I'm still confused:
> > >
> > >  		if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
> > >  			continue;
> > > -		if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
> > > +		if (atomic_read(&svsk->sk_inuse) > 1
> > > +		    || test_bit(SK_BUSY, &svsk->sk_flags))
> > >  			continue;
> > >  		atomic_inc(&svsk->sk_inuse);
> > >  		list_move(le, &to_be_aged);
> > >
> > > What is it that ensures svsk->sk_inuse isn't incremented or SK_BUSY set after that test? Not all the code that does either of those is under the same serv->sv_lock lock that this code is.
> >
> > This should not matter - SK_CLOSED may be set at any time. svc_age_temp_sockets only detaches the socket, sets SK_CLOSED and then enqueues it. If SK_BUSY is set, it's already enqueued, and svc_sock_enqueue ensures that it is not enqueued twice.
>
> Oh, got it. And the list manipulation is safe thanks to sv_lock. Neat, thanks.
>
> Can you verify that this solves your problem?

I'll test it tomorrow, so by Friday morning I'll know for sure and will mail you.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Re: [NFS] problems with lockd in 2.6.22.6
On Friday 07 September 2007, J. Bruce Fields wrote:
> On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote:
> > Hello,
> >
> > 3) For an unknown reason these sockets then remain open. In the morning, when people start their workstations again, we therefore not only get a lot of these messages again, but often the NFS server does not work properly any more. Restarting the NFS daemon is a workaround.

I wonder why these sockets remain open, by the way, even if they aren't used for days. Such a socket only gets deleted when the 81st socket must be opened.

If I do not misunderstand the idea, temporary sockets should be destroyed by svc_age_temp_sockets after some time without activity.

Now I wonder how svc_age_temp_sockets works. Does it ever close and delete a temporary socket at all?

static void svc_age_temp_sockets(unsigned long closure)
{
	struct svc_serv *serv = (struct svc_serv *)closure;
	struct svc_sock *svsk;
	struct list_head *le, *next;
	LIST_HEAD(to_be_aged);

	dprintk("svc_age_temp_sockets\n");

	if (!spin_trylock_bh(&serv->sv_lock)) {
		/* busy, try again 1 sec later */
		dprintk("svc_age_temp_sockets: busy\n");
		mod_timer(&serv->sv_temptimer, jiffies + HZ);
		return;
	}

	list_for_each_safe(le, next, &serv->sv_tempsocks) {
		svsk = list_entry(le, struct svc_sock, sk_list);

		if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
			continue;
		if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
			continue;

Doesn't this mean that svsk->sk_inuse must be zero, which means that SK_DEAD is set? And wouldn't that mean that svc_delete_socket has already been called for that socket (and it is probably already closed)? And wouldn't that mean that svc_sock_enqueue, which is called later, does not make any sense (it checks for SK_DEAD)?

		atomic_inc(&svsk->sk_inuse);
		list_move(le, &to_be_aged);
		set_bit(SK_CLOSE, &svsk->sk_flags);
		set_bit(SK_DETACHED, &svsk->sk_flags);
	}
	spin_unlock_bh(&serv->sv_lock);

	while (!list_empty(&to_be_aged)) {
		le = to_be_aged.next;
		/* fiddling the sk_list node is safe 'cos we're SK_DETACHED */
		list_del_init(le);
		svsk = list_entry(le, struct svc_sock, sk_list);

		dprintk("queuing svsk %p for closing, %lu seconds old\n",
			svsk, get_seconds() - svsk->sk_lastrecv);

		/* a thread will dequeue and close it soon */
		svc_sock_enqueue(svsk);
		svc_sock_put(svsk);
	}

	mod_timer(&serv->sv_temptimer, jiffies + svc_conn_age_period * HZ);
}

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
problems with lockd in 2.6.22.6
Hello,

we upgraded the kernel of an nfs-server from 2.6.17.11 to 2.6.22.6. Since then we get the message

lockd: too many open TCP sockets, consider increasing the number of nfsd threads
lockd: last TCP connect from ^\\236^\É^D

1) The random characters in the second line are caused by a bug in svc_tcp_accept. I already posted this patch on netdev@vger.kernel.org:

Signed-off-by: Wolfgang Walter [EMAIL PROTECTED]

--- linux-2.6.22.6/net/sunrpc/svcsock.c	2007-08-27 18:10:14.0 +0200
+++ linux-2.6.22.6w/net/sunrpc/svcsock.c	2007-09-03 18:27:30.0 +0200
@@ -1090,7 +1090,7 @@
 			       serv->sv_name);
 			printk(KERN_NOTICE
 			       "%s: last TCP connect from %s\n",
-			       serv->sv_name, buf);
+			       serv->sv_name, __svc_print_addr(sin, buf, sizeof(buf)));
 		}
 		/*
 		 * Always select the oldest socket. It's not fair,

With this patch applied, one gets something like

lockd: too many open TCP sockets, consider increasing the number of nfsd threads
lockd: last TCP connect from 10.11.0.12, port=784

2) The number of nfsd threads we are running on the machine is 1024, so this is not the problem. It seems, though, that in the case of lockd, svc_tcp_accept does not check the number of nfsd threads but the number of lockd threads, which is one. As soon as the number of open lockd sockets surpasses 80, this message gets logged. This usually happens every evening, when a lot of people shut down their workstations.

3) For an unknown reason these sockets then remain open. In the morning, when people start their workstations again, we therefore not only get a lot of these messages again, but often the nfs-server does not work properly any more. Restarting the nfs-daemon is a workaround.

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
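The threshold of 80 fits a limit computed from the thread count of the service that owns the socket. A minimal sketch, assuming the check in 2.6.22's svc_tcp_accept has the form `serv->sv_tmpcnt > (serv->sv_nrthreads + 3) * 20` (recalled from the svcsock.c source, so treat it as an assumption); the function names `tmpsock_limit` and `tmpsock_over_limit` are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Limit on temporary TCP sockets for one RPC service, ASSUMING the
 * svc_tcp_accept() test is  sv_tmpcnt > (sv_nrthreads + 3) * 20.
 * nrthreads is the thread count of the owning service (lockd or nfsd),
 * not the nfsd thread count in general.
 */
static int tmpsock_limit(int nrthreads)
{
	assert(nrthreads >= 0);
	return (nrthreads + 3) * 20;
}

/* True when tmpcnt open temporary sockets would trigger the
 * "too many open TCP sockets" message for that service. */
static bool tmpsock_over_limit(int tmpcnt, int nrthreads)
{
	return tmpcnt > tmpsock_limit(nrthreads);
}
```

With a single lockd thread the limit is (1 + 3) * 20 = 80, so the 81st open lockd socket triggers the message, while 1024 nfsd threads would allow (1024 + 3) * 20 = 20540 sockets for nfsd itself.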
Re: [NFS] problems with lockd in 2.6.22.6
On Friday, 7 September 2007 18:19, you wrote:
> On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote:
> > Hello,
> >
> > we upgraded the kernel of an nfs-server from 2.6.17.11 to 2.6.22.6. Since then we get the message
> >
> > lockd: too many open TCP sockets, consider increasing the number of nfsd threads
> > lockd: last TCP connect from ^\\236^\É^D
> >
> > 2) The number of nfsd threads we are running on the machine is 1024, so this is not the problem. It seems, though, that in the case of lockd, svc_tcp_accept does not check the number of nfsd threads but the number of lockd threads, which is one. As soon as the number of open lockd sockets surpasses 80, this message gets logged. This usually happens every evening, when a lot of people shut down their workstations.
>
> So to be clear: there's not an actual problem here other than that the logs are getting spammed? (Not that that isn't a problem in itself.)

When more than 80 NFS clients try to lock files at the same time, there probably would be one.

> > 3) For an unknown reason these sockets then remain open. In the morning, when people start their workstations again, we therefore not only get a lot of these messages again, but often the nfs-server does not work properly any more. Restarting the nfs-daemon is a workaround.
>
> Hm, thanks. I don't know if the lockd thing is the reason, though.

2.6.22.6 per se runs stably (no oops, no crash etc.), but kernel NFS seems to be a little bit unstable. 2.6.17.11 ran for months without any nfsd-related problems, whereas with 2.6.22.6 NFS needs to be restarted almost every day. Sometimes this fails with

lockd_down: lockd failed to exit, clearing pid
nfsd: last server has exited
nfsd: unexporting all filesystems
lockd_up: makesock failed, error=-98

after which the server must be rebooted.

I think something is wrong with lockd, because there are no problems during the day. The problems start in the morning, when a lot of people log into their machines and start their desktops (I think KDE locks its config files when it reads them).
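The last log line reports a negated errno value. On Linux, 98 is EADDRINUSE ("Address already in use"), which would suggest the restarted lockd could not bind its socket because the address was still taken; that reading is an interpretation, not something established in the thread. A small sketch for decoding such values, assuming a Linux userland; the helper names are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Kernel messages such as "makesock failed, error=-98" print a negated
 * errno value; turn it back into a positive errno. */
static int kernel_err_to_errno(int kernel_err)
{
	assert(kernel_err > INT_MIN);	/* negating INT_MIN would overflow */
	return kernel_err < 0 ? -kernel_err : kernel_err;
}

/* True if the logged value means "address already in use". */
static bool is_addr_in_use(int kernel_err)
{
	return kernel_err_to_errno(kernel_err) == EADDRINUSE;
}

/* Human-readable text from the C library for the logged value. */
static const char *kernel_err_str(int kernel_err)
{
	return strerror(kernel_err_to_errno(kernel_err));
}
```

Numeric errno values are architecture- and OS-specific, so the mapping from 98 to EADDRINUSE holds for Linux on common architectures but should not be assumed elsewhere.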
Regards -- Wolfgang Walter Studentenwerk München Anstalt des öffentlichen Rechts
[patch] sunrpc: fix printk argument in svc_tcp_accept
Hello,

in 2.6.22.6, net/sunrpc/svcsock.c, random characters are printed by svc_tcp_accept:

lockd: last TCP connect from <some random chars>

because buf is used uninitialized:

	printk(KERN_NOTICE "%s: last TCP connect from %s\n",
	       serv->sv_name, buf);

Probably it should be

	printk(KERN_NOTICE "%s: last TCP connect from %s\n",
	       serv->sv_name, __svc_print_addr(sin, buf, sizeof(buf)));

Signed-off-by: Wolfgang Walter [EMAIL PROTECTED]

--- linux-2.6.22.6/net/sunrpc/svcsock.c	2007-08-27 18:10:14.0 +0200
+++ linux-2.6.22.6w/net/sunrpc/svcsock.c	2007-09-03 18:27:30.0 +0200
@@ -1090,7 +1090,7 @@
 			       serv->sv_name);
 			printk(KERN_NOTICE
 			       "%s: last TCP connect from %s\n",
-			       serv->sv_name, buf);
+			       serv->sv_name, __svc_print_addr(sin, buf, sizeof(buf)));
 		}
 		/*
 		 * Always select the oldest socket. It's not fair,

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
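The fix works because the helper writes the formatted peer address into buf and returns buf, so the call can sit directly in the printk argument list and the buffer is always filled before it is printed. A user-space sketch of the same pattern, assuming an IPv4 peer; `print_addr` is a hypothetical stand-in for `__svc_print_addr`, and the "address, port=N" format is taken from the "10.11.0.12, port=784" output shown for the patched kernel:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical analogue of __svc_print_addr(): formats the peer address
 * into buf and returns buf, so it can be used inline as a printf argument.
 * Only AF_INET is handled in this sketch.
 */
static char *print_addr(const struct sockaddr *sa, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (sa->sa_family == AF_INET) {
		const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
		char ip[INET_ADDRSTRLEN];

		if (inet_ntop(AF_INET, &sin->sin_addr, ip, sizeof(ip)) == NULL)
			snprintf(buf, len, "unknown address");
		else
			snprintf(buf, len, "%s, port=%u", ip,
				 (unsigned)ntohs(sin->sin_port));
	} else {
		snprintf(buf, len, "unknown address family %d",
			 (int)sa->sa_family);
	}
	return buf;
}
```

Usage mirrors the patched printk: `printf("last TCP connect from %s\n", print_addr((struct sockaddr *)&sin, buf, sizeof(buf)));`. Returning the buffer rather than void is the design choice that makes the one-line fix possible.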
Re: iproute2 weirdness - fwmark and route pb
Hi,

you may want to check whether /proc/sys/net/ipv4/conf/eth3/rp_filter is 0.

If it is 1, the kernel does a route lookup for an outgoing pseudo-packet for every packet arriving on eth3. This pseudo-packet is the incoming packet with its source and destination addresses exchanged. The original packet is accepted only if this route goes via the same device the packet arrived on.

I don't think netfilter is consulted in this process, so the pseudo-packet is not marked and therefore your isdn table is not consulted. The iif rules will not match either. Instead, table main is consulted, where a route is found, but this route goes via eth2.

Please note that if you set /proc/sys/net/ipv4/conf/eth3/rp_filter to 0, you probably want to check that packets arriving on eth3 do not carry source addresses belonging to your eth1 network.

Greetings,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Leopoldstraße 15
80802 München
[EMAIL PROTECTED]
http://www.studentenwerk.mhn.de/
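The reverse-path check described in this reply can be modelled in a few lines. This is a toy sketch, not the kernel's implementation: the routing table is a hypothetical flat list searched by longest-prefix match, addresses are host-order integers, and fwmark-based rules are deliberately ignored, mirroring the point that the pseudo-packet is not marked. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy routing-table entry: prefix/plen -> output device (hypothetical). */
struct route {
	uint32_t    prefix;	/* IPv4 prefix, host byte order */
	int         plen;	/* prefix length, 0..32 */
	const char *dev;	/* output device, e.g. "eth2" */
};

/* Longest-prefix match; returns the output device or NULL if no route. */
static const char *route_lookup(const struct route *tbl, int n, uint32_t dst)
{
	const char *best = NULL;
	int best_len = -1;

	for (int i = 0; i < n; i++) {
		assert(tbl[i].plen >= 0 && tbl[i].plen <= 32);
		uint32_t mask = tbl[i].plen == 0 ? 0 : 0xFFFFFFFFu << (32 - tbl[i].plen);

		if ((dst & mask) == (tbl[i].prefix & mask) && tbl[i].plen > best_len) {
			best = tbl[i].dev;
			best_len = tbl[i].plen;
		}
	}
	return best;
}

/*
 * Strict reverse-path check: route the "reply" back to the packet's
 * source (src and dst swapped) and accept the packet only if that route
 * leaves via the device the packet arrived on. Marks are not considered.
 */
static bool rp_filter_accept(const struct route *tbl, int n,
			     uint32_t src, const char *in_dev)
{
	const char *out = route_lookup(tbl, n, src);

	return out != NULL && strcmp(out, in_dev) == 0;
}
```

With a table main whose only matching route for a given source is a default route via eth2, `rp_filter_accept()` rejects that source's packets arriving on eth3, which is the symptom described above; a table consulted via fwmark would never be reached in this lookup.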