Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()

2018-09-11 Thread Wolfgang Walter
On Tuesday, 11 September 2018, 12:33:34, Steffen Klassert wrote:
> On Mon, Sep 10, 2018 at 10:18:47AM +0200, Kristian Evensen wrote:
> > Hi,
> > 
> > Thanks everyone for all the effort in debugging this issue.
> > 
> > On Mon, Sep 10, 2018 at 8:39 AM Steffen Klassert wrote:
> > > The easy fix that could be backported to stable would be
> > > to check skb->dst for NULL and drop the packet in that case.
> > 
> > Thought I should just chime in and say that we deployed this
> > work-around when we started observing the error back in June. Since
> > then we have not seen any crashes. Also, we have instrumented some of
> > our kernels to count the number of times the error is hit (overall +
> > consecutive). Compared to the overall number of packets, the error
> > happens very rarely. With our workloads, we on average see the error
> > once every couple of days.
> 
> Thanks for letting us know!
> 
> I plan to fix this in the ipsec tree with:
> 
> Subject: [PATCH RFC] xfrm: Fix NULL pointer dereference when skb_dst_force
> clears the dst_entry.
> 
> Since commit 222d7dbd258d ("net: prevent dst uses after free")
> skb_dst_force() might clear the dst_entry attached to the skb.
> The xfrm code doesn't expect this to happen, so we crash with
> a NULL pointer dereference in this case. Fix it by checking
> skb_dst(skb) for NULL after skb_dst_force() and drop the packet
> in case the dst_entry was cleared.
> 
> Fixes: 222d7dbd258d ("net: prevent dst uses after free")
> Reported-by: Tobias Hommel 
> Reported-by: Kristian Evensen 
> Reported-by: Wolfgang Walter 
> Signed-off-by: Steffen Klassert 
> ---
>  net/xfrm/xfrm_output.c | 4 ++++
>  net/xfrm/xfrm_policy.c | 4 ++++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
> index 89b178a78dc7..36d15a38ce5e 100644
> --- a/net/xfrm/xfrm_output.c
> +++ b/net/xfrm/xfrm_output.c
> @@ -101,6 +101,10 @@ static int xfrm_output_one(struct sk_buff *skb, int err)
>  		spin_unlock_bh(&x->lock);
> 
>  		skb_dst_force(skb);
> +		if (!skb_dst(skb)) {
> +			XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
> +			goto error_nolock;
> +		}
> 
>  		if (xfrm_offload(skb)) {
>  			x->type_offload->encap(x, skb);
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 7c5e8978aeaa..626e0f4d1749 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -2548,6 +2548,10 @@ int __xfrm_route_forward(struct sk_buff *skb, unsigned short family)
>  	}
> 
>  	skb_dst_force(skb);
> +	if (!skb_dst(skb)) {
> +		XFRM_INC_STATS(net, LINUX_MIB_XFRMFWDHDRERROR);
> +		return 0;
> +	}
> 
>  	dst = xfrm_lookup(net, skb_dst(skb), &fl, NULL, XFRM_LOOKUP_QUEUE);
>  	if (IS_ERR(dst)) {

This patch fixes the problem here.

XfrmFwdHdrError rises to around 80 at the very beginning and then stays there.
This probably happens while some routes are being changed or set at that point.

Regards and thanks,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()

2018-09-10 Thread Wolfgang Walter
On Monday, 10 September 2018, 10:18:47, Kristian Evensen wrote:
> Hi,
> 
> Thanks everyone for all the effort in debugging this issue.
> 
> On Mon, Sep 10, 2018 at 8:39 AM Steffen Klassert wrote:
> > The easy fix that could be backported to stable would be
> > to check skb->dst for NULL and drop the packet in that case.
> 
> Thought I should just chime in and say that we deployed this
> work-around when we started observing the error back in June. Since
> then we have not seen any crashes. Also, we have instrumented some of
> our kernels to count the number of times the error is hit (overall +
> consecutive). Compared to the overall number of packets, the error
> happens very rarely. With our workloads, we on average see the error
> once every couple of days.
> 

Would you mind sending us your patch (with the accounting) so that we can check
how often this happens here?
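
For reference, accounting of this kind could look roughly like the sketch below. It is not Kristian's patch, which is not shown in this thread; the struct and function names are hypothetical, and a real kernel version would more likely use per-CPU or atomic counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters; the names are not taken from any real patch. */
struct null_dst_stats {
	unsigned long total;            /* every packet that lost its dst */
	unsigned long consecutive;      /* length of the current run of hits */
	unsigned long max_consecutive;  /* longest run seen so far */
};

/* Call once per packet after skb_dst_force(); dst_was_null says whether
 * the dst pointer was found to be NULL for this packet. */
static void null_dst_account(struct null_dst_stats *s, bool dst_was_null)
{
	assert(s != NULL);
	if (!dst_was_null) {
		/* A good packet ends the current run of failures. */
		s->consecutive = 0;
		return;
	}
	s->total++;
	s->consecutive++;
	if (s->consecutive > s->max_consecutive)
		s->max_consecutive = s->consecutive;
}
```

Comparing total against the overall packet count gives the "overall" figure, while max_consecutive shows whether the hits come in bursts.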

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()

2018-09-07 Thread Wolfgang Walter
Hello Steffen,

in one of your emails to Thomas you wrote:
> xfrm_lookup+0x2a is at the very beginning of xfrm_lookup(), here we
> find:
> 
> u16 family = dst_orig->ops->family;
> 
> ops has an offset of 32 bytes (20 hex) in dst_orig, so looks like
> dst_orig is NULL.
> 
> In the forwarding case, we get dst_orig from the skb and dst_orig
> can't be NULL here unless the skb itself is already fishy.

Is this really true?

If xfrm_lookup is called from 

__xfrm_route_forward():

int __xfrm_route_forward(struct sk_buff *skb, unsigned short family)
{
	struct net *net = dev_net(skb->dev);
	struct flowi fl;
	struct dst_entry *dst;
	int res = 1;

	if (xfrm_decode_session(skb, &fl, family) < 0) {
		XFRM_INC_STATS(net, LINUX_MIB_XFRMFWDHDRERROR);
		return 0;
	}

	skb_dst_force(skb);

	dst = xfrm_lookup(net, skb_dst(skb), &fl, NULL, XFRM_LOOKUP_QUEUE);
	if (IS_ERR(dst)) {
		res = 0;
		dst = NULL;
	}
	skb_dst_set(skb, dst);
	return res;
}

Couldn't skb_dst_force(skb) actually set dst to NULL if it cannot safely take a
reference on it? And if it is absolutely certain that skb_dst_force() can never
set dst to NULL, I wonder why it is called at all.


Here is skb_dst_force():

static inline void skb_dst_force(struct sk_buff *skb)
{
	if (skb_dst_is_noref(skb)) {
		struct dst_entry *dst = skb_dst(skb);

		WARN_ON(!rcu_read_lock_held());
		if (!dst_hold_safe(dst))
			dst = NULL;

		skb->_skb_refdst = (unsigned long)dst;
	}
}

and dst_hold_safe() is:

static inline bool dst_hold_safe(struct dst_entry *dst)
{
	return atomic_inc_not_zero(&dst->__refcnt);
}
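
To make the question concrete, here is a minimal userspace sketch of the two helpers quoted above, using C11 atomics in place of the kernel's atomic_t. The names toy_dst, toy_inc_not_zero and toy_dst_force are made up for illustration and are not kernel APIs; the sketch models only the noref case and assumes the reference count is the only state that matters.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct dst_entry: only the reference count matters here. */
struct toy_dst {
	atomic_int refcnt;
};

/* Userspace model of atomic_inc_not_zero(): take a reference only if the
 * count is still above zero, i.e. the object is not already being freed. */
static bool toy_inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Model of skb_dst_force() for a noref dst: returns the pointer that would
 * end up in the skb, which is NULL when no reference could be taken. */
static struct toy_dst *toy_dst_force(struct toy_dst *dst)
{
	assert(dst != NULL);
	if (!toy_inc_not_zero(&dst->refcnt))
		return NULL;
	return dst;
}
```

With a count of 1 the pointer survives and the count becomes 2. With a count of 0 (an entry already on its way to being freed) the result is NULL, which is the value xfrm_lookup() would then receive as dst_orig.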



On Friday, 7 September 2018, 22:22:39, Wolfgang Walter wrote:
> On Friday, 31 August 2018, 08:50:24, Steffen Klassert wrote:
> > On Thu, Aug 30, 2018 at 08:53:50PM +0200, Wolfgang Walter wrote:
> > > Hello,
> > > 
> > > kernels > 4.12 do not work on one of our main routers. They crash as
> > > soon as ipsec-tunnels are configured and ipsec-traffic actually flows.
> > 
> > Can you please send the backtrace of this crash?
> 
> I booted b838d5e1c5b6e57b10ec8af2268824041e3ea911 several times but I
> could not record the complete trace. I think I have to log to the serial
> console, but I can't do that before next week.
> 
> 
> What I could record is:
> 
> There is always
>... 
> the call trace.
> 
> This is the part I could see:
> 
> 
> irq_exit+0x71/0x80
> do_IRQ+0x4d/0xd0
> common_interrupt+0x7a/0x7a
> 
> RIP: 010:cpuidle_enter_state+0x11d/0x200
> RSP: 0018:c9000321bee0 EFLAGS: 0282 ORIG_RAX: ffc4
> RAX: 88085efde450 RBX: 0004 RCX: 0003c9e63c13
> RDX: 0003c9e63c13 RSI: ffb03103fe35ac43 RDI: 
> RBP: e87cf600 R08: 000c R09: 0004
> R10: 0400 R11: 0003c99e56fc R12: 0003c9e63c13
> R13: 0003c9da9567 R14: 0004 R15: 822763e0
> do_idle+0xd3/0x160
> cpu_startup_entry+0x14/0x20
> secondary_startup_64+0xa5/0xb0
> Code: 00 0f b7 83 c0 00 00 00 80 7c 02 08 01 0f 86 d3 02 00 00 41
> 8b 8c 24 3c 10 00 00 48 8b 6b 58 85 c9 0f 84 2f 01 00 00 48 83 e5 fe  45
> 60
> 02 0f 84 4e 01 00 00 f6 43 38 01 74 0d 80 00 bd ab 00 00
> RIP: ip_forward+0xd4/0x470 RSP: 88085efc3cb0
> CR2: 0060
> [ end trace 7205b53c25b7b35a ]---
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: disabled
> Rebooting in 60 seconds..
> 
> 
> I got an email from Tobias Hommel and I think it is the same problem.
> 
> It is very clear that it is the difference from
> 
>   ipv4: call dst_hold_safe() properly
> 
> to
> 
>   ipv4: mark DST_NOGC and remove the operation of dst_free()
> 
> which triggers this bug.
> 
> Regards,

Regards
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()

2018-09-07 Thread Wolfgang Walter
On Friday, 31 August 2018, 08:50:24, Steffen Klassert wrote:
> On Thu, Aug 30, 2018 at 08:53:50PM +0200, Wolfgang Walter wrote:
> > Hello,
> > 
> > kernels > 4.12 do not work on one of our main routers. They crash as soon
> > as ipsec-tunnels are configured and ipsec-traffic actually flows.
> 
> Can you please send the backtrace of this crash?
> 

I booted b838d5e1c5b6e57b10ec8af2268824041e3ea911 several times but I
could not record the complete trace. I think I have to log to the serial
console, but I can't do that before next week.


What I could record is:

There is always
 ... 
the call trace.

This is the part I could see:


irq_exit+0x71/0x80
do_IRQ+0x4d/0xd0
common_interrupt+0x7a/0x7a

RIP: 010:cpuidle_enter_state+0x11d/0x200
RSP: 0018:c9000321bee0 EFLAGS: 0282 ORIG_RAX: ffc4
RAX: 88085efde450 RBX: 0004 RCX: 0003c9e63c13
RDX: 0003c9e63c13 RSI: ffb03103fe35ac43 RDI: 
RBP: e87cf600 R08: 000c R09: 0004
R10: 0400 R11: 0003c99e56fc R12: 0003c9e63c13
R13: 0003c9da9567 R14: 0004 R15: 822763e0
do_idle+0xd3/0x160
cpu_startup_entry+0x14/0x20
secondary_startup_64+0xa5/0xb0
Code: 00 0f b7 83 c0 00 00 00 80 7c 02 08 01 0f 86 d3 02 00 00 41
8b 8c 24 3c 10 00 00 48 8b 6b 58 85 c9 0f 84 2f 01 00 00 48 83 e5 fe  45 
60
02 0f 84 4e 01 00 00 f6 43 38 01 74 0d 80 00 bd ab 00 00
RIP: ip_forward+0xd4/0x470 RSP: 88085efc3cb0
CR2: 0060
[ end trace 7205b53c25b7b35a ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: disabled
Rebooting in 60 seconds..


I got an email from Tobias Hommel and I think it is the same problem.

It is very clear that it is the difference from

ipv4: call dst_hold_safe() properly

to

ipv4: mark DST_NOGC and remove the operation of dst_free()

which triggers this bug.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()

2018-09-07 Thread Wolfgang Walter
Hello,

I didn't respond earlier because I have been on vacation.

On Friday, 31 August 2018, 08:50:24, Steffen Klassert wrote:
> On Thu, Aug 30, 2018 at 08:53:50PM +0200, Wolfgang Walter wrote:
> > Hello,
> > 
> > kernels > 4.12 do not work on one of our main routers. They crash as soon
> > as ipsec-tunnels are configured and ipsec-traffic actually flows.
> 
> Can you please send the backtrace of this crash?
> 

I'll try today. The oops quickly disappears because other problems arising
from it pop up. The machine crashes and nothing is logged. I will try to take
a photo or to log to the serial console.

At the moment I can only see that there are xfrm_ functions in the call trace,
such as xfrm_lookup and xfrm_route_, and that it happens while routing a packet.

With later kernels (4.18.5) the machine seems to crash without a call trace on
the console.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()

2018-08-30 Thread Wolfgang Walter
Hello,

kernels > 4.12 do not work on one of our main routers. They crash as soon
as ipsec-tunnels are configured and ipsec-traffic actually flows.
 
Just configuring ipsec (that is starting strongswan) does not trigger the
oops.
 
I finally found time to bisect that. It bisected down to

b838d5e1c5b6e57b10ec8af2268824041e3ea911
ipv4: mark DST_NOGC and remove the operation of dst_free()

Now we have other machines which run just fine with the very same kernels
doing ipsec. They differ insofar as they have far fewer cores, do not use
the ixgbe driver, do not have 10G and terminate only a few tunnels instead
of hundreds.

I already tested distribution kernels > 4.12 from debian; they also crash.

All kernels I built during the bisection ran fine as long as I did not use ipsec.
The bad ones all oopsed/crashed exactly like vanilla 4.14, as described above.


Here is the bisect-log:

# bad: [bebc6082da0a9f5d47a1ea2edc099bf671058bd4] Linux 4.14
# good: [69973b830859bc6529a7a0468ba0d80ee5117826] Linux 4.9
git bisect start 'v4.14' 'v4.9'
# good: [d82dd0e34d0347be201fd274dc84cd645dccc064] raid1: prefer disk without bad blocks
git bisect good d82dd0e34d0347be201fd274dc84cd645dccc064
# bad: [9967468c0a109644e4a1f5b39b39bf86fe7507a7] Merge branch 'akpm' (patches from Andrew)
git bisect bad 9967468c0a109644e4a1f5b39b39bf86fe7507a7
# bad: [17d9aa66b08de445645bd0688fc1635bed77a57b] Merge tag 'iwlwifi-next-for-kalle-2017-06-30' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next
git bisect bad 17d9aa66b08de445645bd0688fc1635bed77a57b
# good: [de4d195308ad589626571dbe5789cebf9695a204] Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good de4d195308ad589626571dbe5789cebf9695a204
# good: [9376906c17fa975bf6a7ea9dd124be697bcda289] Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 9376906c17fa975bf6a7ea9dd124be697bcda289
# good: [40e86a3619a1e84ad73c716c943f65fc38eb1e28] iwlwifi: mvm: use scnprintf() instead of snprintf()
git bisect good 40e86a3619a1e84ad73c716c943f65fc38eb1e28
# bad: [c66f2091c9248ddf42504c74cd327ae8619b04a4] net/mlx5e: Prevent PFC call for non ethernet ports
git bisect bad c66f2091c9248ddf42504c74cd327ae8619b04a4
# good: [a090bd4ff8387c409732a8e059fbf264ea0bdd56] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect good a090bd4ff8387c409732a8e059fbf264ea0bdd56
# good: [1947030645b6012aeee98da764d6dd47071a6aad] Merge branch 'dsa-prefix-Global-macros'
git bisect good 1947030645b6012aeee98da764d6dd47071a6aad
# good: [69137ea60c9dad58773a1918de6c1b00b088520c] pktgen: Specify num packets per thread
git bisect good 69137ea60c9dad58773a1918de6c1b00b088520c
# good: [d24406c85d123df773bc4df88ad5da2233896919] udp: call dst_hold_safe() in udp_sk_rx_set_dst()
git bisect good d24406c85d123df773bc4df88ad5da2233896919
# bad: [5b7c9a8ff828287af5aebe93e707271bf1a82cc3] net: remove dst gc related code
git bisect bad 5b7c9a8ff828287af5aebe93e707271bf1a82cc3
# bad: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()
git bisect bad b838d5e1c5b6e57b10ec8af2268824041e3ea911
# good: [4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36] net: introduce a new function dst_dev_put()
git bisect good 4a6ce2b6f2ecabbddcfe47e7cf61dd0f00b10e36
# good: [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put() properly
git bisect good 95c47f9cf5e028d1ae77dc6c767c1edc8a18025b
# good: [9df16efadd2a8a82731dc76ff656c771e261827f] ipv4: call dst_hold_safe() properly
git bisect good 9df16efadd2a8a82731dc76ff656c771e261827f
# first bad commit: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC and remove the operation of dst_free()
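
The log above is a binary search over the commits between v4.9 and v4.14. A simplified sketch of the idea follows, assuming a linear history numbered by index and a hypothetical is_bad callback standing in for "build, boot, run ipsec traffic, see whether it crashes". Real git bisect works on a commit graph and is more involved; crashes_from_42 is only a made-up example predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test callback: returns true if commit number idx is bad. */
typedef bool (*is_bad_fn)(size_t idx);

/* Returns the index of the first bad commit, given that commit `good` is
 * known good, commit `bad` is known bad, and every commit after the first
 * bad one is bad as well. */
static size_t first_bad_commit(size_t good, size_t bad, is_bad_fn is_bad)
{
	assert(good < bad);
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;	/* first bad commit is at mid or earlier */
		else
			good = mid;	/* first bad commit is after mid */
	}
	return bad;
}

/* Example predicate: pretend the regression was introduced by commit 42. */
static bool crashes_from_42(size_t idx)
{
	return idx >= 42;
}
```

Each test halves the remaining range, so the number of kernels to build and boot grows only with the logarithm of the number of commits between the good and the bad version.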


In my first email I wrote >= 4.12, but I think 4.12 works. I bisected between
4.9 and 4.14 as we actually run 4.9 on the machine with the problem and 4.14
on most other routers.

I also tested 4.18.5 and it still shows this bug.


Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


kernels >= v4.12 oops/crash with ipsec-traffic: partly bisected

2018-08-30 Thread Wolfgang Walter
Hello,

kernels >= 4.12 do not work on one of our main routers. They crash as soon as 
ipsec-tunnels are configured and ipsec-traffic actually flows.

Just configuring ipsec (that is starting strongswan) does not trigger the 
oops.

I finally found time to bisect that. Though I have not completed that yet, I 
already narrowed it down to the following commits

good: d24406c85d123df773bc4df88ad5da2233896919
udp: call dst_hold_safe() in udp_sk_rx_set_dst()
bad: 5b7c9a8ff828287af5aebe93e707271bf1a82cc3
net: remove dst gc related code

Commits in between are almost all changes to remove dst gc.

Now we have other machines which run just fine with the very same kernels
doing ipsec. They differ insofar as they have far fewer cores, do not use the
ixgbe driver, do not have 10G and terminate only a few tunnels instead of
hundreds.

I already tested distribution kernels > 4.12 from debian, they also crash.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: regression kernel 4.4: stops routing packets with a GRE-payload

2016-01-20 Thread Wolfgang Walter
On Wednesday, 20 January 2016, 17:58:52, Nicolas Dichtel wrote:
> On 20/01/2016 15:00, Wolfgang Walter wrote:
> > Hello,
> > 
> > we tried 4.4 on our routers. We found one problem: 4.4 stops routing GRE
> > packets (ipv4 in GRE/ipv4) here. 4.4.15 works fine.
> 
> 4.4.15 does not exist. Is it 4.1.15?

Yes, I mean 4.1.15

-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: [PATCH net-next] ipv6: gro: support sit protocol

2015-11-04 Thread Wolfgang Walter
On Wednesday, 4 November 2015, 04:40:51, Eric Dumazet wrote:
> On Wed, 2015-11-04 at 13:19 +0100, Wolfgang Walter wrote:
> > Today I found a problem: on a router forwarding GRE-packets (ipv4) (it is
> > not the endpoint) the interface (intel igb) stops sending packets after
> > some time. I think this happens when an ISATAP packet is inside the
> > GRE-packet.
> > 
> > GRE packets arrive on eth0
> > eth1 stops sending (receiving still works)
> > ethtool -r eth1
> > eth1 works again for some time
> > 
> > Switching GRO off on eth0 "fixes" the problem.
> > 
> > I didn't test vanilla 4.1.12 yet, though. Until today 4.1.11 has been
> > running on the router. What I tested was your patch
> > 
> > "gre_gso_segment() chokes if SIT frames were aggregated by GRO engine."
> > 
> > but it did not solve the problem.
> > 
> > So I would not recommend backporting it to longterm 4.1.
> > 
> > My plans are:
> > 
> > * test vanilla 4.1.12
> > * test 4.3
> > 
> > I want to test 4.3 on another router first, though.
> 
> If the NIC stops sending packets after some time, it might be an igb
> issue.

Yes, maybe igb has a problem sending a gro-packet if it is an isatap in gre.

igb has no problem sending gro-packets which are pure isatap or which are ipv4 
(tcp/udp) in gre with 4.1.12 + these patches.

And it had no problem with 4.1.11 with isatap in gre.

Disabling gso for the interface does help.

I'll test pure 4.1.12 soon.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next] ipv6: gro: support sit protocol

2015-11-04 Thread Wolfgang Walter
On Tuesday, 3 November 2015, 05:07:33, Eric Dumazet wrote:
> On Tue, 2015-11-03 at 13:57 +0100, Wolfgang Walter wrote:
> > On Monday, 19 October 2015, 20:40:17, Eric Dumazet wrote:
> > > From: Eric Dumazet <eduma...@google.com>
> > > 
> > > Tom Herbert added SIT support to GRO with commit
> > > 19424e052fb4 ("sit: Add gro callbacks to sit_offload"),
> > > later reverted by Herbert Xu.
> > > 
> > > The problem came because Tom's patch was building GRO
> > > packets without proper metadata: If packets were locally
> > > delivered, we would not care.
> > > 
> > > But if packets needed to be forwarded, GSO engine was not
> > > able to segment individual segments.
> > > 
> > > With the following patch, we correctly set skb->encapsulation
> > > and inner network header. We also update gso_type.
> > 
> > I have been running 4.1.11 / 4.1.12 with this patch on top for over a
> > week now. ISATAP works fine.
> 
> Perfect ! thanks a lot for testing !

Today I found a problem: on a router forwarding GRE packets (ipv4) (it is not
the endpoint) the interface (intel igb) stops sending packets after some time.
I think this happens when an ISATAP packet is inside the GRE packet.

GRE packets arrive on eth0
eth1 stops sending (receiving still works)
ethtool -r eth1
eth1 works again for some time

Switching GRO off on eth0 "fixes" the problem.

I haven't tested vanilla 4.1.12 yet, though. Until today 4.1.11 has been running
on the router. What I tested was your patch
"gre_gso_segment() chokes if SIT frames were aggregated by GRO engine."
but it did not solve the problem.

So I would not recommend backporting it to longterm 4.1.

My plans are:

* test vanilla 4.1.12
* test 4.3

I want to test 4.3 on another router first, though.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: [PATCH net-next] ipv6: gro: support sit protocol

2015-11-04 Thread Wolfgang Walter
On Wednesday, 4 November 2015, 07:13:07, Eric Dumazet wrote:
> On Wed, 2015-11-04 at 15:09 +0100, Wolfgang Walter wrote:
> > Yes, maybe igb has a problem sending a gro-packet if it is an isatap in
> > gre.
> We might detect this condition properly from igb ndo_features_check
> method.
> 
> It currently uses plain passthru_features_check()
> 
> > igb has no problem sending gro-packets which are pure isatap or which are
> > ipv4 (tcp/udp) in gre with 4.1.12 + these patches.
> > 
> > And it had no problem with 4.1.11 with isatap in gre.
> > 
> > Disabling gso for the interface does help.
> 
> My patch was aimed for 4.4, not sure about backports to old kernels...

I know. I cannot test 4.4 (or net-next) on that router, though, as I don't
have easy physical access to it if it crashes or I lose network connectivity.
For such tests I must send someone on site.

As 4.4 will be the next longterm kernel, I will definitely do that for 4.4-rc2
or 4.4-rc3.

I think your patch is correct for 4.1 in the sense that ISATAP is handled
correctly. Only SIT in GRE triggers this, and if it is indeed igb I will
probably see it in 4.4, too ;-).

I have now tested an unmodified 4.1.12 and it shows no problems.


Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: [PATCH net-next] ipv6: gro: support sit protocol

2015-11-03 Thread Wolfgang Walter
On Monday, 19 October 2015, 20:40:17, Eric Dumazet wrote:
> From: Eric Dumazet <eduma...@google.com>
> 
> Tom Herbert added SIT support to GRO with commit
> 19424e052fb4 ("sit: Add gro callbacks to sit_offload"),
> later reverted by Herbert Xu.
> 
> The problem came because Tom's patch was building GRO
> packets without proper metadata: If packets were locally
> delivered, we would not care.
> 
> But if packets needed to be forwarded, GSO engine was not
> able to segment individual segments.
> 
> With the following patch, we correctly set skb->encapsulation
> and inner network header. We also update gso_type.
> 

I have been running 4.1.11 / 4.1.12 with this patch on top for over a week now.
ISATAP works fine.
 

> Tested:
> 
> Server :
> netserver
> modprobe dummy
> ifconfig dummy0 8.0.0.1 netmask 255.255.255.0 up
> arp -s 8.0.0.100 4e:32:51:04:47:e5
> iptables -I INPUT -s 10.246.7.151 -j TEE --gateway 8.0.0.100
> ifconfig sixtofour0
> sixtofour0 Link encap:IPv6-in-IPv4
>   inet6 addr: 2002:af6:798::1/128 Scope:Global
>   inet6 addr: 2002:af6:798::/128 Scope:Global
>   UP RUNNING NOARP  MTU:1480  Metric:1
>   RX packets:411169 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:409414 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:0
>   RX bytes:20319631739 (20.3 GB)  TX bytes:29529556 (29.5 MB)
> 
> Client :
> netperf -H 2002:af6:798::1 -l 1000 &
> 
> Checked on server traffic copied on dummy0 and verify segments were
> properly rebuilt, with proper IP headers, TCP checksums...
> 
> tcpdump on eth0 shows proper GRO aggregation takes place.
> 
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> ---
>  net/ipv6/ip6_offload.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> index 08b62047c67f..eeca943f12dc 100644
> --- a/net/ipv6/ip6_offload.c
> +++ b/net/ipv6/ip6_offload.c
> @@ -264,6 +264,9 @@ static int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
>  	struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + nhoff);
>  	int err = -ENOSYS;
> 
> +	if (skb->encapsulation)
> +		skb_set_inner_network_header(skb, nhoff);
> +
>  	iph->payload_len = htons(skb->len - nhoff - sizeof(*iph));
> 
>  	rcu_read_lock();
> @@ -280,6 +283,13 @@ out_unlock:
>  	return err;
>  }
> 
> +static int sit_gro_complete(struct sk_buff *skb, int nhoff)
> +{
> +	skb->encapsulation = 1;
> +	skb_shinfo(skb)->gso_type |= SKB_GSO_SIT;
> +	return ipv6_gro_complete(skb, nhoff);
> +}
> +
>  static struct packet_offload ipv6_packet_offload __read_mostly = {
>  	.type = cpu_to_be16(ETH_P_IPV6),
>  	.callbacks = {
> @@ -292,6 +302,8 @@ static struct packet_offload ipv6_packet_offload __read_mostly = {
>  static const struct net_offload sit_offload = {
>  	.callbacks = {
>  		.gso_segment	= ipv6_gso_segment,
> +		.gro_receive	= ipv6_gro_receive,
> +		.gro_complete	= sit_gro_complete,
>  	},
>  };

Thanks,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: [PATCH v2 net-next 3/4] ipv6: Add gro functions to sit_offloads

2015-10-20 Thread Wolfgang Walter
Hello Eric!

On Friday, 16 October 2015, 08:23:49, Eric Dumazet wrote:
> On Thu, 2015-08-06 at 17:15 -0700, Jesse Gross wrote:
> > On Mon, Aug 3, 2015 at 10:11 AM, Tom Herbert <t...@herbertland.com> wrote:
> > > For GRO to work with sit we need gro_receive and gro_complete populated
> > > in the sit_offload structure.
> > > 
> > > Signed-off-by: Tom Herbert <t...@herbertland.com>
> > 
> > You might want to checkout the recent history on this file unless
> > there's something that's changed in the last couple of weeks:
> > 
> > commit fdbf5b097bbd9693a86c0b8bfdd071a9a2117cfc
> > Author: Herbert Xu <herb...@gondor.apana.org.au>
> > Date:   Mon Jul 20 17:55:38 2015 +0800
> > 
> > Revert "sit: Add gro callbacks to sit_offload"
> > 
> > This patch reverts 19424e052fb44da2f00d1a868cbb51f3e9f4bbb5 ("sit:
> > Add gro callbacks to sit_offload") because it generates packets
> > that cannot be handled even by our own GSO.
> > 
> > Reported-by: Wolfgang Walter <li...@stwm.de>
> > Signed-off-by: Herbert Xu <herb...@gondor.apana.org.au>
> > Signed-off-by: David S. Miller <da...@davemloft.net>
> > 
> > --
> 
> What about the following more complete patch ?
> 
> We properly set skb->encapsulation and inner network header as some NIC
> drivers depend on it. Our GSO should also work properly I think.
> 
> Wolfgang, could you please test it ? (this is a patch on top of David
> Miller net-next tree)
> 
> Both Google and Facebook are eager to get proper GRO/SIT support ;)
> 
> Thanks !

At the moment I can't test it, sorry.

I will test it next week. But as I see, you have already done that yourself.

> 
> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> index 08b62047c67f..eeca943f12dc 100644
> --- a/net/ipv6/ip6_offload.c
> +++ b/net/ipv6/ip6_offload.c
> @@ -264,6 +264,9 @@ static int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
>  	struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + nhoff);
>  	int err = -ENOSYS;
> 
> +	if (skb->encapsulation)
> +		skb_set_inner_network_header(skb, nhoff);
> +
>  	iph->payload_len = htons(skb->len - nhoff - sizeof(*iph));
> 
>  	rcu_read_lock();
> @@ -280,6 +283,13 @@ out_unlock:
>  	return err;
>  }
> 
> +static int sit_gro_complete(struct sk_buff *skb, int nhoff)
> +{
> +	skb->encapsulation = 1;
> +	skb_shinfo(skb)->gso_type |= SKB_GSO_SIT;
> +	return ipv6_gro_complete(skb, nhoff);
> +}
> +
>  static struct packet_offload ipv6_packet_offload __read_mostly = {
>  	.type = cpu_to_be16(ETH_P_IPV6),
>  	.callbacks = {
> @@ -292,6 +302,8 @@ static struct packet_offload ipv6_packet_offload __read_mostly = {
>  static const struct net_offload sit_offload = {
>  	.callbacks = {
>  		.gso_segment	= ipv6_gso_segment,
> +		.gro_receive	= ipv6_gro_receive,
> +		.gro_complete	= sit_gro_complete,
>  	},
>  };

Thank you very much,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: Soft lockup issue in Linux 4.1.9

2015-10-02 Thread Wolfgang Walter
On Friday, 2 October 2015, 09:17:16, Holger Hoffstätte wrote:
> On 10/02/15 08:52, Andre Tomt wrote:
> > On 01 Oct. 2015 13:52, Eric Dumazet wrote:
> >> On Thu, Oct 1, 2015 at 4:43 AM, Holger Hoffstätte
> >> 
> >> <holger.hoffstae...@googlemail.com> wrote:
> >>> On 10/01/15 13:29, Eric Dumazet wrote:
> >>>> commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af
> >>>> Author: Eric Dumazet <eduma...@google.com>
> >>>> Date:   Thu Aug 13 15:44:51 2015 -0700
> >>>> 
> >>>>  inet: fix potential deadlock in reqsk_queue_unlink()
> > 
> > 
> > 
> >>> Whohoo! It applies/builds cleanly to 4.1.10-rc1 and is running as
> >>> we speak. Let's hope that this fixes the lockups.
> >> 
> >> It definitely should help !
> >> 
> >> David, since patch is not yet seen on
> >> http://patchwork.ozlabs.org/bundle/davem/stable/?state=*
> >> could you please add it to your queue ?
> > 
> > Seems to fix it for me as well. 3 systems have been running varying
> > types of production-like loads with it for 14+ hours without hanging.
> 
> Just got up, and yes - my systems survived the night as well, no issues.
> 
> Greg, any chance you can drop this into the pending 4.1.10? Otherwise people
> will get another broken release.
> 

Fixes the problem here, too.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


kernel 4.1.9: networking hangs with rcu_preempt self-detected stall, 4.1.8 works; was: Re: [PATCH 4.1 125/159] net: call rcu_read_lock early in process_backlog

2015-10-01 Thread Wolfgang Walter
On Tuesday, 29 September 2015, 12:48:43, Andre Tomt wrote:
> On 29 Sep. 2015 12:21, Andre Tomt wrote:
> > Meanwhile I'll revert both the mentioned net patches and see how it goes.
> 
> So that blew up as well, meaning it's not any of these two patches:
> [PATCH 4.1 124/159] net: do not process device backlog during unregistration
> [PATCH 4.1 125/159] net: call rcu_read_lock early in process_backlog
> 
> I'll be offline for a half+ day, I'll look into bisecting when back if
> nobody has figured it out by then.
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

We see these rcu hangs with 4.1.9 on one of our routers, too. 4.1.8 runs fine.

The output I got the last time was:

[ 6488.174578] igb 000:06:00.1 eth3: Reset adapter
[ 6497.350183] INFO: rcu_preempt self-detected stall on CPU { 3}  (t=6301 jiffies g=383330 c=383329 q=1323)
[ 6497.350229] rcu_preempt kthread starved for 6007 jiffies!
[ 6560.311093] INFO: rcu_preempt self-detected stall on CPU { 3}  (t=25205 jiffies g=383330 c=383329 q=4479)
[ 6560.311140] rcu_preempt kthread starved for 24911 jiffies!
[ 6623.272005] INFO: rcu_preempt self-detected stall on CPU { 3}  (t=44109 jiffies g=383330 c=383329 q=7107)
[ 6623.272049] rcu_preempt kthread starved for 43815 jiffies!
[ 6633.053892] igb 000:06:00.0 eth2: Reset adapter
[ 6633.053892] rcu_preempt kthread starved for 62719 jiffies!
[ 6486.232914] INFO: rcu_preempt self-detected stall on CPU { 3}  (t=63013 jiffies g=383330 c=383329 q=8487)
[ 6486.233204] rcu_preempt kthread starved for 6007 jiffies!


All other hangs were basically the same; only the CPU varies.

After that the router completely hangs: networking stops working and we need to 
restart it.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: sit: Set SKB_GSO_SIT bit when performing GRO

2015-07-20 Thread Wolfgang Walter
On Monday, 20 July 2015, 14:14:59, Herbert Xu wrote:
> On Fri, Jul 17, 2015 at 05:38:30PM +0200, Wolfgang Walter wrote:
> > eth1 stops sending with the patch after some time
> > disabling gro on eth0 helps
> > disabling tso or gso on eth0 and/or eth1 or both does not help
> >
> > eth0 and eth1 are both intel I350.
>
> What does ethtool -k eth1 say?

With TSO enabled:

# ethtool -k eth0
Features for eth0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]  
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]
hw-switch-offload: off [fixed]

 
> Can you confirm that disabling tso on eth1 does not help?

Disabling TSO on eth1 does not help.

 
> Because the most plausible explanation is that we're feeding
> some bogus TSO packet to the hardware causing a tx lockup.

I have been running the unpatched 4.1.2 again since Saturday without a lockup.
With your patch the network card hangs within 10 minutes or so.

On the other hand, I run the patched kernel on several other routers (same
hardware, by the way) without problems.

So maybe the problem is that the former one routes GRE tunnel packets which
contain ISATAP packets. I don't know how deeply GRO/GSO inspects a packet.

 
> But in any case if it is a hardware lockup then it's no longer
> just a pure software bug.  No matter what we do in the stack
> the hardware should not lock up (unless of course we're feeding
> it something that's completely bogus).
>
> If we can't figure this out then the safest solution would be
> to disable tunnel GRO completely because it's broken as it stands.
>
> Cheers,

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: sit: Set SKB_GSO_SIT bit when performing GRO

2015-07-17 Thread Wolfgang Walter
On Friday, 17 July 2015, 09:56:51, Herbert Xu wrote:
> On Thu, Jul 16, 2015 at 12:58:45PM +0200, Wolfgang Walter wrote:
> > On Thursday, 16 July 2015, 08:23:50, Herbert Xu wrote:
> > > On Wed, Jul 15, 2015 at 02:25:59PM +0200, Wolfgang Walter wrote:
> > > > Yes. Switching TSO off and leaving GRO on works, too.
> > >
> > > OK, could you please try this patch?
> >
> > Patch works here.
>
> Thanks for the confirmation.  Let's add a tag for patchwork:
>
> Tested-by: Wolfgang Walter <li...@stwm.de>

It seems that this patch may cause a problem with another one of our routers.
Without the patch it had no problem, so I didn't test it there.

With that patch one interface blocks after some time. Not even ARP requests
get answered. It still receives packets, though. Restarting the interface
fixes the problem.

Switching off GRO for the other interface helps.

This router is different from the other ones. It does not directly route
ISATAP packets. It may route ISATAP packets encapsulated in GRE packets,
though. It is not itself a GRE endpoint.

The router does NAT. Basically, it routes the GRE tunnel packets without NAT
and NATs most of the rest.
Not doing NAT and conntrack (and unloading all modules like nf_conntrack_ipv4,
nf_defrag_ipv4) does not help.

eth0: external
eth1: internal

One (IPv4) GRE tunnel is routed between eth0 and eth1.
IPv6 ESP tunnels are routed between eth0 and eth1.
IPv4 UDP/TCP/ICMP from the internal side is NATed with netfilter.

eth1 stops sending with the patch after some time
disabling gro on eth0 helps
disabling tso or gso on eth0 and/or eth1 or both does not help

eth0 and eth1 are both intel I350.


Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: sit: Set SKB_GSO_SIT bit when performing GRO

2015-07-16 Thread Wolfgang Walter
On Thursday, 16 July 2015, 08:23:50, Herbert Xu wrote:
> On Wed, Jul 15, 2015 at 02:25:59PM +0200, Wolfgang Walter wrote:
> > Yes. Switching TSO off and leaving GRO on works, too.
>
> OK, could you please try this patch?

Patch works here.

Thanks,

Wolfgang

 
> ---8<---
> We need to set the SKB_GSO_SIT bit if we detect a 6-in-4 tunnel
> when doing GRO.  Otherwise we may throw a packet at TSO hardware
> that doesn't know what to do with it.
>
> Fixes: 19424e052fb4 ("sit: Add gro callbacks to sit_offload")
> Reported-by: Wolfgang Walter <li...@stwm.de>
> Signed-off-by: Herbert Xu <herb...@gondor.apana.org.au>
>
> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> index e893cd1..1252eac 100644
> --- a/net/ipv6/ip6_offload.c
> +++ b/net/ipv6/ip6_offload.c
> @@ -289,11 +289,21 @@ static struct packet_offload ipv6_packet_offload __read_mostly = {
>  	},
>  };
>
> +static int sit_gro_complete(struct sk_buff *skb, int nhoff)
> +{
> +	int err = ipv6_gro_complete(skb, nhoff);
> +
> +	skb->encapsulation = 1;
> +	skb_shinfo(skb)->gso_type |= SKB_GSO_SIT;
> +
> +	return err;
> +}
> +
>  static const struct net_offload sit_offload = {
>  	.callbacks = {
>  		.gso_segment	= ipv6_gso_segment,
>  		.gro_receive	= ipv6_gro_receive,
> -		.gro_complete	= ipv6_gro_complete,
> +		.gro_complete	= sit_gro_complete,
>  	},
>  };
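
The effect of the missing bit can be illustrated with a minimal userspace
sketch. It is not kernel code: the model_* names and the flag values are
hypothetical, and the feature check is only assumed to behave like a simple
"device supports every requested bit" test. It shows why a merged 6-in-4
packet that is not marked as a tunnel packet looks like something the TSO
hardware can handle.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the kernel's real SKB_GSO_* bits differ. */
enum {
	MODEL_GSO_TCPV4 = 1u << 0,
	MODEL_GSO_TCPV6 = 1u << 1,
	MODEL_GSO_SIT   = 1u << 2,
};

/* Hypothetical stand-in for the few skb fields that matter here. */
struct model_pkt {
	uint32_t gso_type;      /* kinds of segmentation the packet needs */
	bool     encapsulation; /* inner headers are present */
};

/* Plain IPv6 completion: records only the inner TCP type. */
static void model_ipv6_gro_complete(struct model_pkt *p)
{
	p->gso_type |= MODEL_GSO_TCPV6;
}

/* Tunnel-aware completion, following the shape of sit_gro_complete(). */
static void model_sit_gro_complete(struct model_pkt *p)
{
	model_ipv6_gro_complete(p);
	p->encapsulation = true;
	p->gso_type |= MODEL_GSO_SIT;
}

/* Assumption: hardware may segment only if it supports every requested bit. */
static bool model_hw_can_segment(uint32_t dev_features,
				 const struct model_pkt *p)
{
	return (p->gso_type & ~dev_features) == 0;
}
```

With a device that offers only the two TCP bits, a packet completed by the
plain callback passes the check and would be handed to the hardware, while a
packet completed by the tunnel-aware callback fails it and would have to be
segmented in software.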

-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: GRO: forwading ISATAP packets is very slow with kernel 4.1

2015-07-15 Thread Wolfgang Walter
On Wednesday, 15 July 2015, 17:50:20, Herbert Xu wrote:
> On Wed, Jul 15, 2015 at 02:34:41AM +0200, Wolfgang Walter wrote:
> > I wonder if the router may treat all ipv6-tcp connections of a host as a
> > single flow as all those ipv6-packets are embedded in ipv4-packets with
> > the ipv4-address of the host and the ipv4-address of the isatap-gateway
> > and next-header is 41. It may only looks at those?
>
> I think GRO is actually working fine.  Can you show me the output
> of ethtool -k eth1? If hardware TSO is enabled can you try disabling
> it to see if it helps?
>
> Thanks,

Yes. Switching TSO off and leaving GRO on works, too.

Thanks,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


GRO: forwading ISATAP packets is very slow with kernel 4.1

2015-07-14 Thread Wolfgang Walter
Hello,

I upgraded routers from 3.14.x to 4.1.2. Forwarding ISATAP packets (IPv4
packets with IPv6 payload) is very slow with 4.1 if GRO is enabled (YouTube,
for example, runs at about 64 kbit/s). Disabling GRO on the interfaces
restores performance to values comparable to 3.14.x.

The kernel is built with IPv6 support, but IPv6 is disabled via the kernel
command line. The router is not a tunnel endpoint; it only forwards the ISATAP
packets. The MTU is 1500 on both interfaces. Netfilter conntrack is not used,
and disabling netfilter has no effect.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: GRO: forwading ISATAP packets is very slow with kernel 4.1

2015-07-14 Thread Wolfgang Walter
On Wednesday, 15 July 2015, 08:08:49, you wrote:
> On Wed, Jul 15, 2015 at 12:16:30AM +0200, Wolfgang Walter wrote:
> > Hello,
> >
> > I upgraded routers from 3.14.x to 4.1.2. Forwarding ISATAP-packets (IPv4
> > packets with IPv6 payload) is very slow with 4.1 if GRO is enabled
> > (youtube for example about 64kbit). Disabling GRO on the interfaces
> > restores performance to values comparable to 3.14.x.
> >
> > The kernel is build with IPv6 support but IPv6 is disabled via kernel
> > command line. The router is not a tunnel endpoint, it only forwards the
> > ISATAP-packets. MTU is 1500 on both interfaces. Netfilter conntrack is
> > not used and disabling netfilter has no effect.
>
> Can you run some tcpdumps and post the results in the two cases?
 

Yes, but I need the cooperation of one of our customers.

I wonder whether the router may treat all IPv6 TCP connections of a host as a
single flow, as all those IPv6 packets are embedded in IPv4 packets with the
IPv4 address of the host and the IPv4 address of the ISATAP gateway, and the
next header is 41. Might it look only at those?
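
To make the question concrete, here is a small userspace sketch of the two
ways a flow could be keyed. It only illustrates the hypothesis and does not
claim that the kernel's GRO works this way; the struct and function names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of an ISATAP (IPv6-in-IPv4) TCP packet. */
struct model_isatap_pkt {
	uint32_t outer_src;   /* IPv4 address of the host */
	uint32_t outer_dst;   /* IPv4 address of the ISATAP gateway */
	uint8_t  outer_proto; /* 41 = IPv6 encapsulated in IPv4 */
	uint16_t inner_sport; /* TCP ports of the inner IPv6 connection */
	uint16_t inner_dport;
};

/* Flow comparison that looks only at the outer IPv4 header. */
static bool model_same_flow_outer_only(const struct model_isatap_pkt *a,
				       const struct model_isatap_pkt *b)
{
	return a->outer_src == b->outer_src &&
	       a->outer_dst == b->outer_dst &&
	       a->outer_proto == b->outer_proto;
}

/* Flow comparison that also looks at the inner TCP ports. */
static bool model_same_flow_with_inner(const struct model_isatap_pkt *a,
				       const struct model_isatap_pkt *b)
{
	return model_same_flow_outer_only(a, b) &&
	       a->inner_sport == b->inner_sport &&
	       a->inner_dport == b->inner_dport;
}
```

Two TCP connections from the same host through the same gateway differ only in
the inner ports, so the outer-only comparison puts them into one flow while
the second comparison keeps them apart.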

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: kernel >= 4.0: crashes when using traceroute6 with isatap

2015-05-06 Thread Wolfgang Walter
On Wednesday, 6 May 2015, 11:15:18, you wrote:
> (Cc'ing netdev.)
>
> On Sat, May 2, 2015 at 5:29 AM, Wolfgang Walter <li...@stwm.de> wrote:
> > On Saturday, 2 May 2015, 02:16:36, Wolfgang Walter wrote:
> > Hello,
> >
> > kernel 4.0 (and 4.0.1) crashes immediately when I use traceroute6 with an
> > isatap-tunnel.
> >
> > I did some further tests. To trigger the crash you need
> >
> > * isatap-tunnel (probably any sit-tunnel will do it)
> > * raw-socket
> > * udp
> >
> > Using icmpv6 or tcp i.e. does not trigger it.
>
> Do you have a script to reproduce it?
>
>
> Thanks for the bug report!

You need an ISATAP server with, say, IPv4 address $X.

Then, on a host running 4.0, start isatapd: isatapd --mtu 1280 $X

Then run

traceroute6 www.google.de

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


problems with e1000 and flow control

2008-02-25 Thread Wolfgang Walter
Hello,

it seems that e1000 enables flow control (rx pause frames) even if the switch
does not advertise flow control. This seems to become a problem, as (at least
some) switches then forward pause frames directed to the card from other
hosts. We think there are hosts in the LANs of our student halls which indeed
do this.

I think flow control should be completely disabled by default if the switch
does not advertise it. It can still be forced with ethtool.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts



Re: kernel 2.6.23.8: KERNEL: assertion in net/ipv4/tcp_input.c

2007-12-13 Thread Wolfgang Walter
Hello Ilpo,

it happened again with your patch applied:

WARNING: at net/ipv4/tcp_input.c:1018 tcp_sacktag_write_queue()

Call Trace:
IRQ  [80549290] tcp_sacktag_write_queue+0x7d0/0xa60
[80283869] add_partial+0x19/0x60
[80549ac4] tcp_ack+0x5a4/0x1d70
[8054e625] tcp_rcv_established+0x485/0x7b0
[80554c3d] tcp_v4_do_rcv+0xed/0x3e0
[80556fe7] tcp_v4_rcv+0x947/0x970
[80538c6c] ip_local_deliver+0xac/0x290
[80538862] ip_rcv+0x362/0x6c0
[804fc5d3] netif_receive_skb+0x323/0x420
[8042ab40] tg3_poll+0x630/0xa50
[804fecba] net_rx_action+0x8a/0x140
[8023a269] __do_softirq+0x69/0xe0
[8020d47c] call_softirq+0x1c/0x30
[8020f315] do_softirq+0x35/0x90
[8023a105] irq_exit+0x55/0x60
[8020f3f0] do_IRQ+0x80/0x100
[8020c7d1] ret_from_intr+0x0/0xa
EOI


On Monday, 3 December 2007, 14:34, Ilpo Järvinen wrote:
> On Mon, 3 Dec 2007, Wolfgang Walter wrote:
> > with kernel 2.6.23.8 we saw a
> >
> > KERNEL: assertion ((int)tcp_packets_in_flight(tp) >= 0) failed at
> > net/ipv4/tcp_input.c (1292)
>
> Is this the only message? Are there any Leak printouts?
> Any tweaking done to TCP related sysctls?

net/core/somaxconn=2048
net/ipv4/tcp_syncookies=1
net/ipv4/tcp_max_syn_backlog=8192
net/ipv4/tcp_max_tw_buckets=180
net/ipv4/tcp_window_scaling=0
net/ipv4/tcp_timestamps=0


> Most likely I broke the manual synchronization for left_out in sacktag by
> skipping over it when packets_out == 0 but so far I haven't been able to
> figure out how such state could develop in the first place... Ie., I
> couldn't find a case where tcp_fastretrans_alert wouldn't be called if
> left_out was non-zero (and it did the sync_left_out after modifying
> either sacked_out or lost_out, IIRC).
>
> ...If you can reproduce it, you could try if this patch below changes
> anything (should silence the assert and trigger earlier a WARN_ON or
> two :-)). ...If this triggers, then I'm sure we can pollute TCP code
> by a larger number of more costly checks to catch it in early.
>
> This might reveal a long-standing inconsistency of left_out in some
> case I just couldn't come up with by code review. Left_out will be
> (is) anyway dropped as unnecessary in 2.6.24. In 2.6.23 sync for
> left_out occurs quite soon after that BUG_TRAP anyway so the effect
> won't be too dramatic, prior_in_flight would be once stale, won't
> lead to big problems (either missed cnwd or cwnd_cnt increment, or
> failure to do application limited check at that particular ACK).
>
> Thanks anyway for the report. ...If I figure something out here, I'll
> let you know.
>
> --
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index c9298a7..0c5194d 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -1012,8 +1012,12 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
>  	if (before(TCP_SKB_CB(ack_skb)->ack_seq, prior_snd_una - tp->max_window))
>  		return 0;
>
> -	if (!tp->packets_out)
> +	if (!tp->packets_out) {
> +		WARN_ON(tp->sacked_out);
> +		WARN_ON(tp->lost_out);
> +		WARN_ON(tp->left_out);
>  		goto out;
> +	}
>
>  	/* SACK fastpath:
>  	 * if the only SACK change is the increase of the end_seq of
> @@ -1277,14 +1281,14 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
>  		}
>  	}
>
> +out:
> +
>  	tp->left_out = tp->sacked_out + tp->lost_out;
>
>  	if ((reord < tp->fackets_out) && icsk->icsk_ca_state != TCP_CA_Loss &&
>  	    (!tp->frto_highmark || after(tp->snd_una, tp->frto_highmark)))
>  		tcp_update_reordering(sk, ((tp->fackets_out + 1) - reord), 0);
>
> -out:
> -
>  #if FASTRETRANS_DEBUG > 0
>  	BUG_TRAP((int)tp->sacked_out >= 0);
>  	BUG_TRAP((int)tp->lost_out >= 0);
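
The reasoning in the quoted mail can be followed with a small userspace
model. It is a sketch under assumptions, not kernel code: the model_* names
are hypothetical, and the in-flight formula (packets_out - left_out +
retrans_out) is assumed to match the 2.6.23 definition. It shows how a stale
left_out makes the asserted quantity negative and how recomputing left_out
from sacked_out and lost_out, as the moved "out:" label does, repairs it.

```c
/* Hypothetical stand-in for the counters in struct tcp_sock. */
struct model_tp {
	unsigned int packets_out;
	unsigned int sacked_out;
	unsigned int lost_out;
	unsigned int left_out;    /* cached sum, must be kept in sync by hand */
	unsigned int retrans_out;
};

/* The manual synchronization the mail talks about. */
static void model_sync_left_out(struct model_tp *tp)
{
	tp->left_out = tp->sacked_out + tp->lost_out;
}

/* Assumed 2.6.23-style definition of the asserted quantity. */
static int model_packets_in_flight(const struct model_tp *tp)
{
	return (int)tp->packets_out - (int)tp->left_out +
	       (int)tp->retrans_out;
}
```

If packets_out, sacked_out and lost_out have all dropped to zero but left_out
still holds an old value, the in-flight count is negative until the next
synchronization.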

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: kernel 2.6.23.8: KERNEL: assertion in net/ipv4/tcp_input.c

2007-12-03 Thread Wolfgang Walter
On Monday, 3 December 2007, 14:34, Ilpo Järvinen wrote:
> On Mon, 3 Dec 2007, Wolfgang Walter wrote:
> > with kernel 2.6.23.8 we saw a
> >
> > KERNEL: assertion ((int)tcp_packets_in_flight(tp) >= 0) failed at
> > net/ipv4/tcp_input.c (1292)
>
> Is this the only message? Are there any Leak printouts?

No.

4 days earlier there were 3 messages: TCP: Treason uncloaked! Peer 
a.b.c.d:80/56532 shrinks window 3535507131:3535513869. Repaired.

> Any tweaking done to TCP related sysctls?
> And for completeness, is GSO enabled (ethtool -k)?

rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off


> Most likely I broke the manual synchronization for left_out in sacktag by
> skipping over it when packets_out == 0 but so far I haven't been able to
> figure out how such state could develop in the first place... Ie., I
> couldn't find a case where tcp_fastretrans_alert wouldn't be called if
> left_out was non-zero (and it did the sync_left_out after modifying
> either sacked_out or lost_out, IIRC).
>
> ...If you can reproduce it, you could try if this patch below changes

I don't know how to reproduce it - we never saw the message before. I'll apply
the patch. Let's see if the WARN_ON triggers before we update to a newer
kernel :-).

> anything (should silence the assert and trigger earlier a WARN_ON or
> two :-)). ...If this triggers, then I'm sure we can pollute TCP code
> by a larger number of more costly checks to catch it in early.
>
> This might reveal a long-standing inconsistency of left_out in some
> case I just couldn't come up with by code review. Left_out will be
> (is) anyway dropped as unnecessary in 2.6.24. In 2.6.23 sync for
> left_out occurs quite soon after that BUG_TRAP anyway so the effect
> won't be too dramatic, prior_in_flight would be once stale, won't
> lead to big problems (either missed cnwd or cwnd_cnt increment, or
> failure to do application limited check at that particular ACK).
>
> Thanks anyway for the report. ...If I figure something out here, I'll
> let you know.
>
> --
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index c9298a7..0c5194d 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -1012,8 +1012,12 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
>  	if (before(TCP_SKB_CB(ack_skb)->ack_seq, prior_snd_una - tp->max_window))
>  		return 0;
>
> -	if (!tp->packets_out)
> +	if (!tp->packets_out) {
> +		WARN_ON(tp->sacked_out);
> +		WARN_ON(tp->lost_out);
> +		WARN_ON(tp->left_out);
>  		goto out;
> +	}
>
>  	/* SACK fastpath:
>  	 * if the only SACK change is the increase of the end_seq of
> @@ -1277,14 +1281,14 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
>  		}
>  	}
>
> +out:
> +
>  	tp->left_out = tp->sacked_out + tp->lost_out;
>
>  	if ((reord < tp->fackets_out) && icsk->icsk_ca_state != TCP_CA_Loss &&
>  	    (!tp->frto_highmark || after(tp->snd_una, tp->frto_highmark)))
>  		tcp_update_reordering(sk, ((tp->fackets_out + 1) - reord), 0);
>
> -out:
> -
>  #if FASTRETRANS_DEBUG > 0
>  	BUG_TRAP((int)tp->sacked_out >= 0);
>  	BUG_TRAP((int)tp->lost_out >= 0);

Thanks and regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


kernel 2.6.23.8: KERNEL: assertion in net/ipv4/tcp_input.c

2007-12-02 Thread Wolfgang Walter
Hello,

with kernel 2.6.23.8 we saw a

KERNEL: assertion ((int)tcp_packets_in_flight(tp) >= 0) failed at
net/ipv4/tcp_input.c (1292)

Regards,

Wolfgang Walter
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: [PATCHv6 0/3] Interface group patches

2007-11-22 Thread Wolfgang Walter
From: Patrick McHardy

> I'm working on the incremental ruleset changing API BTW :)
> One of the changes will be that interface matching is not
> a default part of every rule, and without wildcards it will
> use the ifindex. But since the cost of this feature seems
> pretty low, I don't see a compelling reason against it.

Using the ifindex instead of string-matching the interface name in -i and -o
would be a serious problem, as it changes the semantics.

1) Currently you can match a non-existent interface. This is certainly used,
e.g. with VLAN interfaces, PPP etc.
2) Currently your rule will match an interface even if the ifindex of the
interface changes. This is used, too (e.g. you activate a backup interface and
rename it, build new bridges etc.).

If one wants to use the ifindex instead of a string match on the name, one
should explicitly request that (e.g. by using -i =eth0 or something like
that).
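
The difference between the two matching modes can be sketched in a few lines
of userspace C. This is only an illustration of the semantics described
above, not netfilter code; all names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, minimal description of a network interface. */
struct model_iface {
	int  ifindex;
	char name[16];
	bool present;
};

/* Today's behaviour: the rule stores a name and compares strings. */
static bool model_match_by_name(const char *rule_name,
				const struct model_iface *dev)
{
	return dev->present && strcmp(rule_name, dev->name) == 0;
}

/* Proposed behaviour: the rule stores the ifindex seen at insert time. */
static bool model_match_by_ifindex(int rule_ifindex,
				   const struct model_iface *dev)
{
	return dev->present && rule_ifindex == dev->ifindex;
}
```

A rule written for "ppp0" keeps matching after the link is torn down and
comes back with a new ifindex, whereas a rule bound to the old ifindex
silently stops matching.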

Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


ipsec: icmp fragmentation-needed from ipsec-gateway is not encrypted

2007-09-18 Thread Wolfgang Walter
Hello,

I have the following problem:

router A has two interfaces eth0 and eth1.

router B has two interfaces eth0 and eth1.

The networks on A:eth1 and B:eth1 are connected over an ipsec-tunnel.

the mtu on A:eth1 is 1400 (all others are 1500).

both run 2.6.22.6

If I now ping a host HA on A:eth1 from host HB on B:eth1 with a packet size
greater than 1400, the ping fails.

tcpdump on A:eth0 shows

an esp-tunnel-packet from B comes in
icmp echo-request packet from HB to HA comes in
(the decrypted esp-packet)
an unencrypted icmp fragmentation-needed packet to HB from A (ip of eth1) is
sent out

It seems to me that this fragmentation-needed packet generated by A is not
handled by ipsec and is sent out unencrypted instead, and that this is the
reason it does not reach HB.

I should not see the unencrypted packet going out at all, should I? Because if
I ping A:eth1 from HB, then I don't see the unencrypted echo-reply packet
(which has the same source address as the fragmentation-needed packet) but
only the outgoing esp-packet (and the echo-reply reaches HB, by the way).

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-14 Thread Wolfgang Walter
On Wednesday, 12 September 2007, 21:55, J. Bruce Fields wrote:
> On Wed, Sep 12, 2007 at 09:40:57PM +0200, Wolfgang Walter wrote:
> > On Wednesday 12 September 2007, J. Bruce Fields wrote:
> > > On Wed, Sep 12, 2007 at 04:14:06PM +0200, Neil Brown wrote:
> > > > So it is in 2.6.21 and later and should probably go to .stable for
> > > > .21 and .22.
> > > >
> > > > Bruce:  for you :-)
> > >
> > > OK, thanks!  But, (as is alas often the case) I'm still confused:
> > > > 		if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
> > > > 			continue;
> > > > -		if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
> > > > +		if (atomic_read(&svsk->sk_inuse) > 1
> > > > +		    || test_bit(SK_BUSY, &svsk->sk_flags))
> > > > 			continue;
> > > > 		atomic_inc(&svsk->sk_inuse);
> > > > 		list_move(le, &to_be_aged);
> > >
> > > What is it that ensures svsk->sk_inuse isn't incremented or SK_BUSY set
> > > after that test?  Not all the code that does either of those is under
> > > the same serv->sv_lock lock that this code is.
> >
> > This should not matter - SK_CLOSED may be set at any time.
> >
> > svc_age_temp_sockets only detaches the socket, sets SK_CLOSED and then
> > enqueues it. If SK_BUSY is set its already enqueued and svc_sock_enqueue
> > ensures that it is not enqueued twice.
>
> Oh, got it.  And the list manipulation is safe thanks to sv_lock.  Neat,
> thanks.  Can you verify that this solves your problem?

Patch works fine here.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


[patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread Wolfgang Walter
Hello,

as already described, old temporary sockets (client is gone) of lockd aren't
closed after some time. So, with enough clients and after some time, there
are 80 open dangling sockets and you start getting messages of the form:

lockd: too many open TCP sockets, consider increasing the number of nfsd
threads.

If I understand the code, the intention was that the server closes
temporary sockets after about 6 to 12 minutes:

a timer is started which calls svc_age_temp_sockets every 6 minutes.

svc_age_temp_sockets:
if a socket is marked OLD it gets closed.
sockets which are not marked as OLD are marked OLD

every time the socket receives something, OLD is cleared.

But svc_age_temp_sockets never closes any socket, because it only
closes sockets with svsk->sk_inuse == 0. This seems to be a bug.

Here is a patch against 2.6.22.6 which changes the test to
svsk->sk_inuse <= 0, which was probably meant. The patched kernel runs fine
here. Unused sockets get closed (after 6 to 12 minutes).

Signed-off-by: Wolfgang Walter [EMAIL PROTECTED]

--- ../linux-2.6.22.6/net/sunrpc/svcsock.c  2007-08-27 18:10:14.0 +0200
+++ net/sunrpc/svcsock.c	2007-09-11 11:07:13.0 +0200
@@ -1572,7 +1575,7 @@
 
 		if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
 			continue;
-		if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
+		if (atomic_read(&svsk->sk_inuse) <= 0 || test_bit(SK_BUSY, &svsk->sk_flags))
 			continue;
 		atomic_inc(&svsk->sk_inuse);
 		list_move(le, &to_be_aged);


As svc_age_temp_sockets did not do anything before, this change may trigger
hidden bugs.

To be honest, I don't see why this check

(atomic_read(&svsk->sk_inuse) <= 0 || test_bit(SK_BUSY, &svsk->sk_flags))

is needed at all (it can only be an optimization), as these fields can change
after the check. In svc_tcp_accept there is no such check when a temporary
socket is closed.
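
The two-pass aging scheme and the effect of the changed test can be modelled
in userspace as follows. This is a sketch under assumptions, not the sunrpc
code: struct model_sock and the model_* functions are hypothetical, the
reference count of an idle temporary socket is assumed to be 1, and the
"patched" branch uses the "skip only if the count is not positive" test
proposed in this mail.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of struct svc_sock. */
struct model_sock {
	int  inuse;  /* reference count; assumed to be 1 for an idle socket */
	bool old;    /* SK_OLD: set by the timer, cleared on receive */
	bool busy;   /* SK_BUSY */
	bool closed; /* selected for closing by the aging pass */
};

/* Receiving data clears the OLD mark again. */
static void model_receive(struct model_sock *s)
{
	s->old = false;
}

/*
 * One timer run over a single socket. Returns true if the socket was
 * selected for closing. "patched" switches between the original test
 * (skip if inuse != 0) and the proposed one (skip if inuse <= 0).
 */
static bool model_age_one(struct model_sock *s, bool patched)
{
	bool skip;

	if (!s->old) {          /* first pass: only mark the socket */
		s->old = true;
		return false;
	}
	skip = patched ? (s->inuse <= 0 || s->busy)
		       : (s->inuse != 0 || s->busy);
	if (skip)
		return false;
	s->inuse++;             /* take a reference before closing */
	s->closed = true;
	return true;
}
```

With the original test an idle socket (count 1) is skipped on every run, so
it is never closed; with the proposed test it is marked on the first run and
closed on the second, unless traffic arrives in between.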


Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: [NFS] [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread Wolfgang Walter
On Wednesday, 12 September 2007, 15:37, J. Bruce Fields wrote:
> On Wed, Sep 12, 2007 at 02:07:10PM +0200, Wolfgang Walter wrote:
> > as already described old temporary sockets (client is gone) of lockd
> > aren't closed after some time. So, with enough clients and some time
> > gone, there are 80 open dangling sockets and you start getting messages
> > of the form:
> >
> > lockd: too many open TCP sockets, consider increasing the number of nfsd
> > threads.
>
> Thanks for working on this problem!
>
> > If I understand the code then the intention was that the server closes
> > temporary sockets after about 6 to 12 minutes:
> >
> > a timer is started which calls svc_age_temp_sockets every 6 minutes.
> >
> > svc_age_temp_sockets:
> > if a socket is marked OLD it gets closed.
> > sockets which are not marked as OLD are marked OLD
> >
> > every time the sockets receives something OLD is cleared.
> >
> > But svc_age_temp_sockets never closes any socket though because it only
> > closes sockets with svsk->sk_inuse == 0. This seems to be a bug.
> >
> > Here is a patch against 2.6.22.6 which changes the test to
> > svsk->sk_inuse <= 0 which was probably meant. The patched kernel runs
> > fine here. Unused sockets get closed (after 6 to 12 minutes)
>
> So the fact that this changes the behavior means that sk_inuse is taking
> on negative values.  This can't be right--how can something like
> svc_sock_put() (which does an atomic_dec_and_test) work in that case?

You probably misread the code.

if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
	continue;

This means: any socket where svsk->sk_inuse != 0 or SK_BUSY is set is ignored
by svc_age_temp_sockets: no attempt is made to close the socket.

This seems to be wrong: svsk->sk_inuse is zero only if svc_delete_socket
has been called for the socket, and then it will be deleted anyway (probably
it is already closed by then).

But the intention of svc_age_temp_sockets is to close open temporary
sockets on which no traffic has been received for more than 6 minutes. These
sockets have svsk->sk_inuse = 1.

My patch does exactly this. It changes the test from

skip sockets which are not already deleted or which are busy

to

skip sockets which are already deleted or which are busy


> I wish I had time today to figure out what's going on in this case.  But
> from a quick through svsock.c for sk_inuse, it looks odd; I'm suspicious
> of anything without the stereotyped behavior--initializing to one,
> atomic_inc()ing whenever someone takes a reference, and
> atomic_dec_and_test()ing whenever someone drops it


Then svc_tcp_accept would be wrong, too (it closes sockets the same way, just
without testing for sk_inuse and SK_BUSY).

I think this works because, as long as a socket is in sv_tempsocks or
sv_permsocks, svsk->sk_inuse can never reach zero. As svc_age_temp_sockets
locks the list, nobody can bring svsk->sk_inuse to zero as long as
svc_age_temp_sockets holds the lock. As svc_age_temp_sockets calls
atomic_inc(&svsk->sk_inuse) while holding the lock, there is no
problem (the same is true for svc_tcp_accept).

This is the reason why I doubt that this check for svsk->sk_inuse in
svc_age_temp_sockets is useful at all. It should always be false.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread Wolfgang Walter
On Wednesday 12 September 2007, J. Bruce Fields wrote:
> On Wed, Sep 12, 2007 at 04:14:06PM +0200, Neil Brown wrote:
> > So it is in 2.6.21 and later and should probably go to .stable for .21
> > and .22.
> >
> > Bruce:  for you :-)
>
> OK, thanks!  But, (as is alas often the case) I'm still confused:
>
> > 		if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
> > 			continue;
> > -		if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
> > +		if (atomic_read(&svsk->sk_inuse) > 1
> > +		    || test_bit(SK_BUSY, &svsk->sk_flags))
> > 			continue;
> > 		atomic_inc(&svsk->sk_inuse);
> > 		list_move(le, &to_be_aged);
>
> What is it that ensures svsk->sk_inuse isn't incremented or SK_BUSY set
> after that test?  Not all the code that does either of those is under
> the same serv->sv_lock lock that this code is.
 

This should not matter - SK_CLOSED may be set at any time.

svc_age_temp_sockets only detaches the socket, sets SK_CLOSED and then
enqueues it. If SK_BUSY is set, it's already enqueued, and svc_sock_enqueue
ensures that it is not enqueued twice.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: [patch] sunrpc: make closing of old temporary sockets work (was: problems with lockd in 2.6.22.6)

2007-09-12 Thread Wolfgang Walter
On Wednesday 12 September 2007, J. Bruce Fields wrote:
> On Wed, Sep 12, 2007 at 09:40:57PM +0200, Wolfgang Walter wrote:
> > On Wednesday 12 September 2007, J. Bruce Fields wrote:
> > > On Wed, Sep 12, 2007 at 04:14:06PM +0200, Neil Brown wrote:
> > > > So it is in 2.6.21 and later and should probably go to .stable for .21
> > > > and .22.
> > > >
> > > > Bruce:  for you :-)
> > >
> > > OK, thanks!  But, (as is alas often the case) I'm still confused:
> > >
> > > > 		if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
> > > > 			continue;
> > > > -		if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
> > > > +		if (atomic_read(&svsk->sk_inuse) > 1
> > > > +		    || test_bit(SK_BUSY, &svsk->sk_flags))
> > > > 			continue;
> > > > 		atomic_inc(&svsk->sk_inuse);
> > > > 		list_move(le, &to_be_aged);
> > >
> > > What is it that ensures svsk->sk_inuse isn't incremented or SK_BUSY set
> > > after that test?  Not all the code that does either of those is under
> > > the same serv->sv_lock lock that this code is.
> > >
> >
> > This should not matter - SK_CLOSE may be set at any time.
> >
> > svc_age_temp_sockets only detaches the socket, sets SK_CLOSE and then
> > enqueues it. If SK_BUSY is set it's already enqueued and svc_sock_enqueue
> > ensures that it is not enqueued twice.
>
> Oh, got it.  And the list manipulation is safe thanks to sv_lock.  Neat,
> thanks.  Can you verify that this solves your problem?
>

I'll test it tomorrow. So by Friday morning I'll know and will mail you for sure.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: [NFS] problems with lockd in 2.6.22.6

2007-09-08 Thread Wolfgang Walter
On Friday 07 September 2007, J. Bruce Fields wrote:
> On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote:
> > Hello,
> >

> > 3) For some unknown reason these sockets then remain open. In the morning
> > when people start their workstations again we therefore not only get a
> > lot of these messages again but often the nfs-server does not properly
> > work any more. Restarting the nfs-daemon is a workaround.
>

I wonder why these sockets remain open, by the way, even if they aren't used
for days. Such a socket only gets deleted when the 81st socket must be opened.

If I do not misunderstand the idea, temporary sockets should be destroyed
by svc_age_temp_sockets after some time without activity.

Now I wonder how svc_age_temp_sockets works. Does it ever close and delete a
temporary socket at all?


static void
svc_age_temp_sockets(unsigned long closure)
{
	struct svc_serv *serv = (struct svc_serv *)closure;
	struct svc_sock *svsk;
	struct list_head *le, *next;
	LIST_HEAD(to_be_aged);

	dprintk("svc_age_temp_sockets\n");

	if (!spin_trylock_bh(&serv->sv_lock)) {
		/* busy, try again 1 sec later */
		dprintk("svc_age_temp_sockets: busy\n");
		mod_timer(&serv->sv_temptimer, jiffies + HZ);
		return;
	}

	list_for_each_safe(le, next, &serv->sv_tempsocks) {
		svsk = list_entry(le, struct svc_sock, sk_list);

		if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
			continue;
		if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
			continue;

Doesn't this mean that svsk->sk_inuse must be zero, which means that SK_DEAD
is set?
And wouldn't that mean that svc_delete_socket has already been called for that
socket (and that it is probably already closed)?
And wouldn't that mean that svc_sock_enqueue, which is called later, does not
make any sense (it checks for SK_DEAD)?


		atomic_inc(&svsk->sk_inuse);
		list_move(le, &to_be_aged);
		set_bit(SK_CLOSE, &svsk->sk_flags);
		set_bit(SK_DETACHED, &svsk->sk_flags);
	}
	spin_unlock_bh(&serv->sv_lock);

	while (!list_empty(&to_be_aged)) {
		le = to_be_aged.next;
		/* fiddling the sk_list node is safe 'cos we're SK_DETACHED */
		list_del_init(le);
		svsk = list_entry(le, struct svc_sock, sk_list);

		dprintk("queuing svsk %p for closing, %lu seconds old\n",
			svsk, get_seconds() - svsk->sk_lastrecv);

		/* a thread will dequeue and close it soon */
		svc_sock_enqueue(svsk);
		svc_sock_put(svsk);
	}

	mod_timer(&serv->sv_temptimer, jiffies + svc_conn_age_period * HZ);
}
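
The question can be tried out on a small userspace model of the loop. This is only a sketch under stated assumptions: model_sock, model_age_pass and model_activity are made-up names, plain ints and bools replace the atomic counter and flag bits, and it is assumed that a socket which merely sits on the list already holds one reference and that activity on a socket clears its "old" mark. The idle_refs parameter is the value the count is compared against: 0 corresponds to the test as it stands in the function, 1 to comparing against the single list reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a temporary socket; not the kernel struct. */
struct model_sock {
	int  inuse;	/* reference count; assumed to be 1 while merely listed */
	bool old;	/* plays the role of SK_OLD */
	bool busy;	/* plays the role of SK_BUSY */
	bool closing;	/* set once the socket has been picked for closing */
};

/*
 * One pass of the ager.  Sockets whose count is above idle_refs are
 * treated as in use.  Returns how many sockets were picked for closing.
 */
static int model_age_pass(struct model_sock *socks, size_t n, int idle_refs)
{
	int picked = 0;

	assert(socks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct model_sock *s = &socks[i];

		if (s->closing)
			continue;	/* already handed over */
		if (!s->old) {		/* first visit: only mark it */
			s->old = true;
			continue;
		}
		if (s->inuse > idle_refs || s->busy)
			continue;	/* looks active, leave it alone */
		s->inuse++;		/* reference for the close path */
		s->closing = true;
		picked++;
	}
	return picked;
}

/* Activity clears the mark, so the socket gets another full period. */
static void model_activity(struct model_sock *s)
{
	assert(s != NULL);
	s->old = false;
}
```

With idle_refs set to 0, no pass ever picks a socket whose count starts at 1, however long it stays idle. With idle_refs set to 1, an idle socket is marked on the first pass and picked on the second, while a socket that saw activity in between is only marked again.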

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


problems with lockd in 2.6.22.6

2007-09-07 Thread Wolfgang Walter
Hello,

we upgraded the kernel of a nfs-server from 2.6.17.11 to 2.6.22.6. Since then 
we get the message

lockd: too many open TCP sockets, consider increasing the number of nfsd threads
lockd: last TCP connect from ^\\236^\É^D

1) These random characters in the second line are caused by a bug in 
svc_tcp_accept.
I already posted this patch on netdev@vger.kernel.org:

Signed-off-by: Wolfgang Walter [EMAIL PROTECTED]
--- linux-2.6.22.6/net/sunrpc/svcsock.c 2007-08-27 18:10:14.0 +0200
+++ linux-2.6.22.6w/net/sunrpc/svcsock.c        2007-09-03 18:27:30.0 +0200
@@ -1090,7 +1090,7 @@
 				   serv->sv_name);
 			printk(KERN_NOTICE
 			       "%s: last TCP connect from %s\n",
-			       serv->sv_name, buf);
+			       serv->sv_name, __svc_print_addr(sin, buf, sizeof(buf)));
 		}
 		/*
 		 * Always select the oldest socket. It's not fair,


with this patch applied one gets something like

lockd: too many open TCP sockets, consider increasing the number of nfsd threads
lockd: last TCP connect from 10.11.0.12, port=784


2) The number of nfsd threads we are running on the machine is 1024. So this is
not the problem. It seems, though, that in the case of lockd svc_tcp_accept does
not check the number of nfsd threads but the number of lockd threads, which is
one. As soon as the number of open lockd sockets surpasses 80 this message gets
logged. This usually happens every evening when a lot of people shut down their
workstations.

3) For some unknown reason these sockets then remain open. In the morning when
people start their workstations again we therefore not only get a lot of these
messages again but often the nfs-server does not properly work any more.
Restarting the nfs-daemon is a workaround.
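
The figure of 80 in point 2 can be reproduced with a few lines of C. This is a sketch under an assumption: that the limit on temporary sockets has the form (service threads + 3) * 20, which fits the 80 seen for a single lockd thread but should be checked against net/sunrpc/svcsock.c. The helper names are hypothetical.

```c
#include <assert.h>

/*
 * Assumed form of the per-service limit on temporary sockets:
 * (number of service threads + 3) * 20.  Hypothetical helper name.
 */
static unsigned int model_tmpsock_limit(unsigned int nrthreads)
{
	return (nrthreads + 3) * 20;
}

/* Nonzero when the count of temporary sockets is above the limit. */
static int model_too_many(unsigned int tmpcnt, unsigned int nrthreads)
{
	assert(nrthreads > 0);
	return tmpcnt > model_tmpsock_limit(nrthreads);
}
```

Under that assumption a service with one thread hits the warning at the 81st socket, whereas 1024 threads would allow 20540, so the number of nfsd threads would indeed play no role for lockd.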

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: [NFS] problems with lockd in 2.6.22.6

2007-09-07 Thread Wolfgang Walter
On Friday, 7 September 2007 18:19, you wrote:
> On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote:
> > Hello,
> >
> > we upgraded the kernel of a nfs-server from 2.6.17.11 to 2.6.22.6. Since
> > then we get the message
> >
> > lockd: too many open TCP sockets, consider increasing the number of nfsd threads
> > lockd: last TCP connect from ^\\236^\É^D

> >
> > 2) The number of nfsd threads we are running on the machine is 1024.
> > So this is not the problem. It seems, though, that in the case of
> > lockd svc_tcp_accept does not check the number of nfsd threads but the
> > number of lockd threads which is one.  As soon as the number of open
> > lockd sockets surpasses 80 this message gets logged.  This usually
> > happens every evening when a lot of people shut down their workstations.
>
> So to be clear: there's not an actual problem here other than that the
> logs are getting spammed?  (Not that that isn't a problem in itself.)


If more than 80 NFS clients try to lock files at the same time, then it
probably would be an actual problem.

> > 3) For some unknown reason these sockets then remain open. In the morning
> > when people start their workstations again we therefore not only get a
> > lot of these messages again but often the nfs-server does not properly
> > work any more. Restarting the nfs-daemon is a workaround.
>
> Hm, thanks.


I don't know if the lockd thing is the reason, though.

2.6.22.6 per se runs stable (no oops, no crash etc.) but kernel nfs seems
to be a little bit unstable. 2.6.17.11 ran for months without any nfsd-related
problems whereas in 2.6.22.6 nfs needs to be restarted almost every day.
Sometimes this fails with

lockd_down: lockd failed to exit, clearing pid
nfsd: last server has exited
nfsd: unexporting all filesystems
lockd_up: makesock failed, error=-98

after which the server must be rebooted.

I think there is something with lockd because there are no problems over the
day. It is in the morning when a lot of people log into their machines and
start their desktops (I think KDE locks its config files when it reads them).

Regards
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


[patch] sunrpc: fix printk argument in svc_tcp_accept

2007-09-03 Thread Wolfgang Walter
Hello,

in 2.6.22.6, net/sunrpc/svcsock.c

random characters are printed by svc_tcp_accept:

lockd: last TCP connect from some random chars

because buf is used uninitialized:

	printk(KERN_NOTICE
	       "%s: last TCP connect from %s\n",
	       serv->sv_name, buf);


Probably it should be

	printk(KERN_NOTICE
	       "%s: last TCP connect from %s\n",
	       serv->sv_name, __svc_print_addr(sin, buf, sizeof(buf)));
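
The idea behind the fix is that the helper fills buf and returns it, so the call itself can be passed as the "%s" argument. A userspace sketch of such a helper, assuming an IPv4 peer; model_print_addr is a hypothetical stand-in and not the kernel's __svc_print_addr:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical userspace counterpart of a helper that formats an IPv4
 * peer address into buf and returns buf, so that the call can be used
 * directly as a "%s" argument.
 */
static char *model_print_addr(const struct sockaddr_in *sin,
			      char *buf, size_t len)
{
	char ip[INET_ADDRSTRLEN];

	assert(sin != NULL && buf != NULL && len > 0);
	/* fall back to a fixed string if the address cannot be formatted */
	if (!inet_ntop(AF_INET, &sin->sin_addr, ip, sizeof(ip)))
		strcpy(ip, "unknown");
	snprintf(buf, len, "%s, port=%u", ip, (unsigned)ntohs(sin->sin_port));
	return buf;
}
```

Passing the bare buffer, as the unpatched printk does, prints whatever happens to be on the stack, because nothing has written to it before the call.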



Signed-off-by: Wolfgang Walter [EMAIL PROTECTED]


--- linux-2.6.22.6/net/sunrpc/svcsock.c 2007-08-27 18:10:14.0 +0200
+++ linux-2.6.22.6w/net/sunrpc/svcsock.c        2007-09-03 18:27:30.0 +0200
@@ -1090,7 +1090,7 @@
 				   serv->sv_name);
 			printk(KERN_NOTICE
 			       "%s: last TCP connect from %s\n",
-			       serv->sv_name, buf);
+			       serv->sv_name, __svc_print_addr(sin, buf, sizeof(buf)));
 		}
 		/*
 		 * Always select the oldest socket. It's not fair,



Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Re: iproute2 weirdness - fwmark and route pb

2005-11-10 Thread Wolfgang Walter
Hi,

you may check if /proc/sys/net/ipv4/conf/eth3/rp_filter is 0.

If it is 1, the kernel does a route lookup for an outgoing pseudo packet for
every packet arriving on eth3. This pseudo packet is the incoming packet with
src and dst addresses exchanged. The incoming packet is accepted only if this
route goes out via the same device the packet arrived on.

I don't think that netfilter is consulted in this process. So this
pseudo packet is not marked and therefore your isdn table is not consulted.
The iif rules will not match either. Instead table main is consulted, where a
route is found. But this route is via eth2.

Please note that if you set /proc/sys/net/ipv4/conf/eth3/rp_filter to 0 you
probably want to check that the src addresses of packets arriving on eth3 are
not ones from your eth1.
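
The check described above can be sketched as a toy model in C. It is not the kernel's implementation: the route table layout and the names model_route, model_lookup and model_rp_filter_ok are invented for illustration, addresses are plain host-order integers, and fwmark and rule selection are left out entirely, which is the point of the explanation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified route entry: prefix/mask -> outgoing device. */
struct model_route {
	uint32_t prefix;
	uint32_t mask;
	const char *dev;
};

/* Longest-prefix match over a small table; NULL when no route fits. */
static const char *model_lookup(const struct model_route *tbl, size_t n,
				uint32_t dst)
{
	const struct model_route *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if ((dst & tbl[i].mask) != tbl[i].prefix)
			continue;
		if (!best || tbl[i].mask > best->mask)
			best = &tbl[i];
	}
	return best ? best->dev : NULL;
}

/*
 * Reverse-path check: look up the route back to the packet's source and
 * accept the packet only if that route leaves via the arrival device.
 */
static int model_rp_filter_ok(const struct model_route *tbl, size_t n,
			      uint32_t src, const char *in_dev)
{
	const char *back = model_lookup(tbl, n, src);

	assert(in_dev != NULL);
	return back != NULL && strcmp(back, in_dev) == 0;
}
```

With a table whose default route points at eth2, a packet from an outside address is accepted when it arrives on eth2 and dropped when it arrives on eth3, because the lookup for the way back never sees the mark that would have selected another table.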

Greetings,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Leopoldstraße 15
80802 München
[EMAIL PROTECTED]
http://www.studentenwerk.mhn.de/