Re: Bonding driver has bad load balancing for forwarded traffic, 3.7+

2013-04-16 Thread Eric Dumazet
On Tue, 2013-04-16 at 12:01 +0300, Vitaly V. Bursov wrote: > Testing under real load for almost 2 hours now, works as expected. > xmit_hash_policy=layer3+4 > > I made a few simple test with layer2+3 policy too, looks OK. > > Thanks! Perfect, thanks a lot for all this ! Tested-by: Vitaly V. Bur

Re: Bonding driver has bad load balancing for forwarded traffic, 3.7+

2013-04-16 Thread Eric Dumazet
On Tue, 2013-04-16 at 06:51 -0700, Eric Dumazet wrote: > Perfect, thanks a lot for all this ! > > Tested-by: Vitaly V. Bursov > > By the way, we probably should use skb_flow_dissect() to get proper hashing for tunnels users.

Re: [PATCH net-next] net: netdev_pick_tx: use get_xps_q if xps map is set

2013-05-17 Thread Eric Dumazet
On Fri, 2013-05-17 at 14:18 +0530, govindarajulu.v wrote: > From: "govindarajulu.v" > > netdev_pick_tx ignores the xps map configuration if netdev->ndo_select_queue > is defined. Most of the drivers define ndo_select_queue. The problem with this > is, if admin wants kernel to pick tx queue based

Re: [PATCH net-next] net: netdev_pick_tx: use get_xps_q if xps map is set

2013-05-17 Thread Eric Dumazet
On Sat, 2013-05-18 at 01:20 +0530, govind wrote: > I see. In ixgbe_select_queue we use __netdev_pick_tx only if packet is not > FCOE & FIP. > > If you see > bnx2x_select_queue in drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c, line > 1832 > and mlx4_en_select_queue in drivers/net/ethernet/mella

[PATCH net-next] x86: bpf_jit_comp: secure bpf jit against spraying attacks

2013-05-17 Thread Eric Dumazet
From: Eric Dumazet hpa brought to my attention some security related issues with BPF JIT on x86. This patch makes sure the bpf generated code is marked read only, as other kernel text sections. It also splits the unused space (we vmalloc() and only use a fraction of the page) in two parts
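A minimal userspace sketch of the layout idea described in this patch, assuming 4096-byte pages, rand() as a stand-in for a kernel random source, and a hypothetical helper jit_place(). The whole page-rounded area is pre-filled with 0xcc (x86 int3) and the program is copied at a random offset, so the unused space becomes a random-length run of trap bytes before and after the code. It does not model the read-only marking; in userspace that would be an mprotect() call on page-aligned memory, which is omitted here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096u   /* assumption: 4 KB pages */
#define X86_INT3  0xcc    /* one-byte breakpoint instruction */

/* Hypothetical helper: allocate a page-rounded area, fill it with int3,
 * and copy `prog` to a random offset so the unused space is split into a
 * random-length prefix and suffix of trap bytes. Returns the area (to be
 * freed by the caller) and reports the program start through *image. */
static unsigned char *jit_place(const unsigned char *prog, size_t proglen,
                                size_t *area_len, unsigned char **image)
{
    assert(proglen > 0);
    size_t sz = (proglen + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
    unsigned char *area = malloc(sz);
    if (area == NULL)
        return NULL;

    memset(area, X86_INT3, sz);                /* every unused byte traps */
    size_t hole = sz - proglen;                /* total slack, may be 0 */
    size_t off = (size_t)rand() % (hole + 1);  /* rand() stands in for a kernel PRNG */
    memcpy(area + off, prog, proglen);

    *area_len = sz;
    *image = area + off;
    return area;
}
```

Using hole + 1 as the modulus keeps the offset in [0, hole], so the copy always fits and there is no division by zero when the program exactly fills its pages.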

RE: [PATCH net-next] x86: bpf_jit_comp: secure bpf jit against spraying attacks

2013-05-20 Thread Eric Dumazet
On Mon, 2013-05-20 at 09:51 +0100, David Laight wrote: > Hmmm anyone looking to overwrite kernel code will then start > looking for blocks of 0xcc bytes and know that what follows > is the beginning of a function. > That isn't any harder than random writes. > > Copying a random part of .rodat

Re: [PATCH v3 net-next 2/4] tcp: add TCP support for low latency receive poll.

2013-05-20 Thread Eric Dumazet
On Mon, 2013-05-20 at 13:16 +0300, Eliezer Tamir wrote: > > +config INET_LL_TCP_POLL > + bool "Low Latency TCP Receive Poll" > + depends on INET_LL_RX_POLL > + default n > + ---help--- > + TCP support for Low Latency TCP Queue Poll. > + (For network cards that support

Re: [PATCH net-next] x86: bpf_jit_comp: secure bpf jit against spraying attacks

2013-05-20 Thread Eric Dumazet
On Mon, 2013-05-20 at 11:50 +0200, Daniel Borkmann wrote: > Here seems also to be another approach ... > >http://grsecurity.net/~spender/jit_prot.diff > > via: > http://www.reddit.com/r/netsec/comments/13dzhx/linux_kernel_jit_spray_for_smep_kernexec_bypass/ Well, there are many approaches

Re: [PATCH net-next] x86: bpf_jit_comp: secure bpf jit against spraying attacks

2013-05-20 Thread Eric Dumazet
On Mon, 2013-05-20 at 16:19 +0200, Florian Westphal wrote: > What about emitting additional instructions at random locations in the > generated code itself? > > Eg., after every instruction, have random chance to insert > 'xor $0xcc,%al; xor $0xcc,%al', etc? This will be the latest thing I'll do

Re: [PATCH v3 net-next 1/4] net: implement support for low latency socket polling

2013-05-20 Thread Eric Dumazet
On Mon, 2013-05-20 at 13:16 +0300, Eliezer Tamir wrote: > Adds a new ndo_ll_poll method and the code that supports and uses it. > This method can be used by low latency applications to busy poll ethernet > device queues directly from the socket code. The ip_low_latency_poll sysctl > entry controls

Re: [PATCH v3 net-next 1/4] net: implement support for low latency socket polling

2013-05-21 Thread Eric Dumazet
On Tue, 2013-05-21 at 10:28 +0300, Eliezer Tamir wrote: > On 20/05/2013 18:29, Eric Dumazet wrote: > > On Mon, 2013-05-20 at 13:16 +0300, Eliezer Tamir wrote: > --- > >> +static inline void skb_mark_ll(struct sk_buff *skb, struct napi_struct > >> *napi) >

Re: [PATCH v3 net-next 0/4] net: low latency Ethernet device polling

2013-05-21 Thread Eric Dumazet
On Tue, 2013-05-21 at 16:15 +0300, Alex Rosenbaum wrote: > On 5/21/2013 3:29 PM, Eliezer Tamir wrote: > > What benchmarks are you using to test poll/select/epoll? > for epoll/select latency tests we are using sockperf as performance > latency tool: https://code.google.com/p/sockperf/ > It is a cli

Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-21 Thread Eric Dumazet
On Tue, 2013-05-21 at 05:09 -0700, Paul E. McKenney wrote: > > > > -#define hlist_nulls_first_rcu(head) \ > > - (*((struct hlist_nulls_node __rcu __force **)&(head)->first)) > > +#define hlist_nulls_first_rcu(head)\ > > + (*((struct hlist_nulls_node __rcu __force **)

Re: [PATCH v4 net-next 1/4] net: implement support for low latency socket polling

2013-05-21 Thread Eric Dumazet
On Tue, 2013-05-21 at 17:26 +0300, Eliezer Tamir wrote: > +/* should be called when destroying a napi struct */ > +static inline void inc_ll_gen_id(void) > +{ > + ll_global_gen_id++; > +} > + > +static inline void skb_mark_ll(struct sk_buff *skb, struct napi_struct *napi) > +{ > + skb->dev

Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-21 Thread Eric Dumazet
On Tue, 2013-05-21 at 18:47 +0400, Roman Gushchin wrote: > On 21.05.2013 17:44, Eric Dumazet wrote: > > On Tue, 2013-05-21 at 05:09 -0700, Paul E. McKenney wrote: > > > >>> > >>> -#define hlist_nulls_first_rcu(head) \ > >>> - (*((stru

Re: sk_page_frag_refill OOM killing spree

2013-05-21 Thread Eric Dumazet
On Tue, 2013-05-21 at 14:28 +0200, Florian Westphal wrote: > Hi Eric, > > seems like sk_page_frag_refill() can cause oom-killer invocation: > > postgres invoked oom-killer: gfp_mask=0x42d0, order=3, oom_score_adj=0 > Pid: 10551, comm: postgres Tainted: G O 3.8.6-5.g613ca40-smp #1 > Call

Re: [PATCH] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-21 Thread Eric Dumazet
On Tue, 2013-05-21 at 18:47 +0400, Roman Gushchin wrote: > This code has the same mistake: it is rcu_dereference_raw(head->first), > so there is nothing that prevents gcc to store the (head->first) value > in a register. If other rcu accessors have the same problem, a more complete patch is nee

Re: [PATCH v3 net-next 1/4] net: implement support for low latency socket polling

2013-05-21 Thread Eric Dumazet
On Tue, 2013-05-21 at 19:02 +0200, Pekka Riikonen wrote: > Maybe even that's not needed. Couldn't skb->queue_mapping give the > correct NAPI instance in multiqueue nics? The NAPI instance could be made > easily available from skb->dev. In any case an index is much better than > a new pointer

Re: [PATCH v3 net-next 1/4] net: implement support for low latency socket polling

2013-05-21 Thread Eric Dumazet
On Tue, 2013-05-21 at 10:48 -0700, Eric Dumazet wrote: > We do not keep skb->dev information once a packet leaves the rcu > protected region. > > Once packet is queued to tcp input queues, skb->dev is NULL. This is done in tcp_v4_rcv() & tcp_v6_rcv()

Re: [PATCH v3 net-next 1/4] net: implement support for low latency socket polling

2013-05-21 Thread Eric Dumazet
On Tue, 2013-05-21 at 22:25 +0300, Eliezer Tamir wrote: > On 21/05/2013 20:51, Eric Dumazet wrote: > > On Tue, 2013-05-21 at 10:48 -0700, Eric Dumazet wrote: > > > >> We do not keep skb->dev information once a packet leaves the rcu > >> protected region.

Re: [ 072/102] ipv6: do not clear pinet6 field

2013-05-21 Thread Eric Dumazet
On Tue, 2013-05-21 at 15:44 +0400, Roman Gushchin wrote: > Hi, all! > > I think, it's good, but not enough. > > We still can't rely on the sk->sk_family field by dereferencing the > inet_sk(sk)->pinet6 field, because we can set the sk_family field to > the PF_INET6 value before setting pinet6 to

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-21 Thread Eric Dumazet
On Tue, 2013-05-21 at 22:12 +0400, Roman Gushchin wrote: > > If other rcu accessors have the same problem, a more complete patch is > > needed. > > [PATCH] rcu: fix a race in rcu lists traverse macros > > Some network functions (udp4_lib_lookup2(), for instance) use the > rcu lists traverse macro

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-21 Thread Eric Dumazet
On Tue, 2013-05-21 at 19:01 -0700, Eric Dumazet wrote: > Please use ACCESS_ONCE(), which is the standard way to deal with this, > and remove the rcu_dereference_raw() in > hlist_nulls_for_each_entry_rcu() > > something like : (for the nulls part only) Thinking a bit more about t
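For readers unfamiliar with the idiom: ACCESS_ONCE() casts an lvalue to volatile, so the compiler must emit a fresh load every time the expression is evaluated and may not reuse a copy already held in a register. The sketch below uses the definition found in kernels of that era (include/linux/compiler.h, relying on the GCC typeof extension) in a userspace list walk. The list types, list_push() and lookup() are illustrative; a single-threaded example cannot reproduce the race itself, it only shows where the forced loads go.

```c
#include <assert.h>
#include <stddef.h>

/* Classic kernel definition: a volatile access forces a real load/store. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct node { struct node *next; int key; };
struct head { struct node *first; };

static void list_push(struct head *h, struct node *n)
{
    n->next = h->first;
    h->first = n;   /* kernel code would publish with rcu_assign_pointer() */
}

/* Each evaluation of ACCESS_ONCE(h->first) re-reads memory, so a walk that
 * is restarted cannot start from a stale head pointer cached in a register. */
static struct node *lookup(struct head *h, int key)
{
    assert(h != NULL);
    for (struct node *n = ACCESS_ONCE(h->first); n != NULL; n = ACCESS_ONCE(n->next))
        if (n->key == key)
            return n;
    return NULL;
}
```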

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-22 Thread Eric Dumazet
On Wed, 2013-05-22 at 02:58 -0700, Paul E. McKenney wrote: > Now that I am more awake... > > The RCU list macros assume that the list header is either statically > allocated (in which case no ACCESS_ONCE() or whatever is needed) or > that the caller did whatever was necessary to protect the list

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-22 Thread Eric Dumazet
On Wed, 2013-05-22 at 15:58 +0400, Roman Gushchin wrote: > +/* > + * Same as ACCESS_ONCE(), but used for accessing field of a structure. > + * The main goal is preventing compiler to store &ptr->field in a register. But &ptr->field is a constant during the whole duration of udp4_lib_lookup2() and

RE: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-22 Thread Eric Dumazet
On Wed, 2013-05-22 at 14:27 +0100, David Laight wrote: > > So yes, the patch appears to fix the bug, but it sounds not logical to > > me. > > I was confused because the copy of the code I found was different > (it has some checks for reusaddr - which force a function call in the > loop). > > The

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-22 Thread Eric Dumazet
On Wed, 2013-05-22 at 06:00 -0700, Paul E. McKenney wrote: > Right, rcu_read_lock() is part of the protection, but rcu_dereference() > is the other part. > > All that aside, I can't claim that I understand what problem the various > patches would solve. ;-) Problem is that rcu_dereference(expr)

Re: [Patch v2] skbuff: Hide GFP_ATOMIC page allocation failures for dropped packets

2013-05-27 Thread Eric Dumazet
On Mon, 2013-05-27 at 13:39 -0400, Rik van Riel wrote: > On 05/26/2013 04:19 PM, atom...@redhat.com wrote: > > From: Aaron Tomlin > > > > Since v1: > > - Removed unnecessary parentheses > > > > ---8<--- > > > > Failed GFP_ATOMIC allocations by the network stack result in dropped > > packets, whi

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-27 Thread Eric Dumazet
On Mon, 2013-05-27 at 21:55 +0400, Roman Gushchin wrote: > Hi, Paul! > > > On 25.05.2013 15:37, Paul E. McKenney wrote: > >> Again, I believe that your retry logic needs to extend back into the > >> calling function for your some_func() example above. > > And what do you think about the following

Re: [PATCH v5 net-next 2/5] net: implement support for low latency socket polling

2013-05-27 Thread Eric Dumazet
On Mon, 2013-05-27 at 10:44 +0300, Eliezer Tamir wrote: > diff --git a/include/net/sock.h b/include/net/sock.h > index 66772cf..c7c3ea6 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -281,6 +281,7 @@ struct cg_proto; >* @sk_error_report: callback to indicate errors (e.g. %M

Re: [PATCH v5 net-next 1/5] net: add napi_id and hash

2013-05-27 Thread Eric Dumazet
On Mon, 2013-05-27 at 10:44 +0300, Eliezer Tamir wrote: > Adds a napi_id and a hashing mechanism to lookup a napi by id. > This will be used by subsequent patches to implement low latency > Ethernet device polling. > Based on a code sample by Eric Dumazet. > > Signed-off

Re: [PATCH v5 net-next 0/5] net: low latency Ethernet device polling

2013-05-27 Thread Eric Dumazet
On Mon, 2013-05-27 at 10:43 +0300, Eliezer Tamir wrote: > Hello Dave, > > There are many small changes from the last time. > The two big changes are: > * Skb and sk now store a napi_id instead of a pointer. > * Very naive poll/select support. There is a dramatic improvement in both > latency an

Re: [PATCH v5 net-next 3/5] tcp: add TCP support for low latency receive poll.

2013-05-27 Thread Eric Dumazet
On Mon, 2013-05-27 at 10:44 +0300, Eliezer Tamir wrote: > adds busy-poll support for TCP. > Really, this is a small changelog for such an addition :( How poll()/epoll() is supported ? > Signed-off-by: Alexander Duyck > Signed-off-by: Jesse Brandeburg > Tested-by: Willem de Bruijn > Signed-of

Re: [Patch v2] skbuff: Hide GFP_ATOMIC page allocation failures for dropped packets

2013-05-27 Thread Eric Dumazet
On Mon, 2013-05-27 at 21:31 -0700, Joe Perches wrote: > I think the __alloc_skb alloc failure message is ok, > but maybe there shouldn't be something "scary" like > a dump_stack. > > Maybe this site should use a trivial debug error > message like below instead. > --- Oh well. If dump_stack are

Re: [PATCH v5 net-next 1/5] net: add napi_id and hash

2013-05-28 Thread Eric Dumazet
On Tue, 2013-05-28 at 11:03 +0300, Eliezer Tamir wrote: > With an atomic we don't need the RTNL in any of the napi_id functions. > One less thing to worry about when we try to remove the RTNL. OK but we'll need something to protect the lists against concurrent insert/deletes. A spinlock or a mut
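The split discussed here (an atomic counter for ids, a lock for the lists) might look like this in userspace C11. A sketch only: the struct layout, the 16-bucket table, the rule that id 0 means "no NAPI", and a simple atomic_flag spinlock taken even for lookups (the kernel would let readers use RCU) are all assumptions, not the patch's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NAPI_HASH_BITS 4
#define NAPI_HASH_MASK ((1u << NAPI_HASH_BITS) - 1)

struct napi {
    struct napi *next;
    unsigned int napi_id;                 /* assumption: 0 means "no NAPI" */
};

static atomic_uint napi_gen_id;                        /* ids: lock-free */
static atomic_flag napi_hash_lock = ATOMIC_FLAG_INIT;  /* lists: locked */
static struct napi *napi_hash[NAPI_HASH_MASK + 1];

static void hash_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&napi_hash_lock, memory_order_acquire))
        ;  /* spin */
}

static void hash_unlock(void)
{
    atomic_flag_clear_explicit(&napi_hash_lock, memory_order_release);
}

static void napi_hash_add(struct napi *n)
{
    unsigned int id;

    do {   /* skip the reserved id 0 if the counter wraps */
        id = atomic_fetch_add(&napi_gen_id, 1) + 1;
    } while (id == 0);
    n->napi_id = id;

    hash_lock();   /* the atomic id alone does not protect the list */
    n->next = napi_hash[id & NAPI_HASH_MASK];
    napi_hash[id & NAPI_HASH_MASK] = n;
    hash_unlock();
}

static void napi_hash_del(struct napi *n)
{
    assert(n->napi_id != 0);
    hash_lock();
    for (struct napi **pp = &napi_hash[n->napi_id & NAPI_HASH_MASK]; *pp; pp = &(*pp)->next) {
        if (*pp == n) {
            *pp = n->next;
            break;
        }
    }
    hash_unlock();
}

static struct napi *napi_by_id(unsigned int id)
{
    struct napi *n;

    hash_lock();
    for (n = napi_hash[id & NAPI_HASH_MASK]; n != NULL; n = n->next)
        if (n->napi_id == id)
            break;
    hash_unlock();
    return n;
}
```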

Re: [PATCH v5 net-next 3/5] tcp: add TCP support for low latency receive poll.

2013-05-28 Thread Eric Dumazet
On Tue, 2013-05-28 at 15:15 +0300, Eliezer Tamir wrote: > >> How IPv6 is handled ? > > It turns out that adding TCPv6/UDPv6 is very simple. Yep, I was about to send you the needed lines after my breakfast ;)

Re: [Patch v2] skbuff: Hide GFP_ATOMIC page allocation failures for dropped packets

2013-05-28 Thread Eric Dumazet
On Tue, 2013-05-28 at 13:15 -0300, Rafael Aquini wrote: > The real problem seems to be that more and more the network stack (drivers, > perhaps) > is relying on chunks of contiguous page-blocks without a fallback mechanism to > order-0 page allocations. When memory gets fragmented, these alloc fa
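The fallback pattern being discussed (try a high-order, physically contiguous allocation first, then drop down to order-0 pages) can be sketched as a loop over orders. Assumptions: the alloc_fn hook, malloc_pages() and only_order0() are hypothetical stand-ins for the page allocator, and FRAG_MAX_ORDER = 3 simply matches the order=3 request in the sk_page_frag_refill OOM report. In the kernel, the speculative high-order attempts would also carry flags such as __GFP_NOWARN and __GFP_NORETRY, so that a failed attempt fails fast instead of warning or pushing the system into the OOM killer.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE      4096u  /* assumption: 4 KB pages */
#define FRAG_MAX_ORDER 3      /* 2^3 pages = 32 KB, illustrative */

/* Hypothetical allocator hook: returns NULL if `order` can't be satisfied. */
typedef void *(*alloc_fn)(unsigned order);

/* Try the largest contiguous block first, then fall back one order at a
 * time down to a single page. Reports the order obtained via *got_order. */
static void *frag_refill(alloc_fn alloc, unsigned *got_order)
{
    assert(got_order != NULL);
    for (int order = FRAG_MAX_ORDER; order >= 0; order--) {
        void *p = alloc((unsigned)order);
        if (p != NULL) {
            *got_order = (unsigned)order;
            return p;
        }
    }
    return NULL;   /* even a single page failed: the caller must handle it */
}

/* Stand-in for an unfragmented system: every order succeeds. */
static void *malloc_pages(unsigned order)
{
    return malloc((size_t)PAGE_SIZE << order);
}

/* Stand-in for fragmented memory: only single pages are available. */
static void *only_order0(unsigned order)
{
    return order == 0 ? malloc(PAGE_SIZE) : NULL;
}
```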

Re: [Patch v2] skbuff: Hide GFP_ATOMIC page allocation failures for dropped packets

2013-05-28 Thread Eric Dumazet
On Tue, 2013-05-28 at 14:43 -0300, Rafael Aquini wrote: > > Perhaps the explanation is because we're looking into old stuff bad effects, > then. But just to list a few for your appreciation: > > ---

Re: [Patch v2] skbuff: Hide GFP_ATOMIC page allocation failures for dropped packets

2013-05-28 Thread Eric Dumazet
On Tue, 2013-05-28 at 14:43 -0300, Rafael Aquini wrote: > Perhaps the explanation is because we're looking into old stuff bad effects, > then. But just to list a few for your appreciation: > > Apr 23 11:25:31 217-IDC kernel: httpd: page allo

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-28 Thread Eric Dumazet
On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote: > On 28.05.2013 04:12, Eric Dumazet wrote: > > Adding a barrier() is probably what we want. > > I agree, inserting barrier() is also a correct and working fix. Yeah, but I can not find a clean way to put it inside the

Re: [PATCH v2] rcu: fix a race in hlist_nulls_for_each_entry_rcu macro

2013-05-28 Thread Eric Dumazet
On Tue, 2013-05-28 at 18:31 -0700, Paul E. McKenney wrote: > On Tue, May 28, 2013 at 05:34:53PM -0700, Eric Dumazet wrote: > > On Tue, 2013-05-28 at 13:10 +0400, Roman Gushchin wrote: > > > On 28.05.2013 04:12, Eric Dumazet wrote: > > > > > > Adding

Re: [PATCH net-next 1/3] net: core: let skb_partial_csum_set() set transport header

2013-03-27 Thread Eric Dumazet
nd simplify the caller. > > Cc: Eric Dumazet > Signed-off-by: Jason Wang > --- > net/core/skbuff.c |1 + > 1 files changed, 1 insertions(+), 0 deletions(-) > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 31c6737..ba64614 100644 > --- a/net/core/sk

Re: [PATCH net-next 2/3] net: core: introduce skb_probe_transport_header()

2013-03-27 Thread Eric Dumazet
w_dissect(), if not just set the transport header to the hint passed by > caller. > > Cc: Eric Dumazet > Signed-off-by: Jason Wang > --- > include/linux/skbuff.h | 14 ++ > 1 files changed, 14 insertions(+), 0 deletions(-) Acked-by: Eric Dumazet

Re: [PATCH net-next 3/3] net: switch to use skb_probe_transport_header()

2013-03-27 Thread Eric Dumazet
On Wed, 2013-03-27 at 17:11 +0800, Jason Wang wrote: > Switch to use the new help skb_probe_transport_header() to do the l4 header > probing for untrusted sources. For packets with partial csum, the header > should > already been set by skb_partial_csum_set(). > > Cc: Eric Dum

Re: [BUG] Crash with NULL pointer dereference in bond_handle_frame in -rt (possibly mainline)

2013-03-28 Thread Eric Dumazet
On Thu, 2013-03-28 at 13:16 -0400, Steven Rostedt wrote: > Hi, > > I'm currently debugging a crash in an old 3.0-rt kernel that one of our > customers is seeing. The bug happens with a stress test that loads and > unloads the bonding module in a loop (I don't know all the details as > I'm not the

[PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()

2013-03-29 Thread Eric Dumazet
From: Eric Dumazet On Fri, 2013-03-29 at 10:48 +0100, Jiri Pirko wrote: > Hmm. I think that this might be issue introduced by: > commit a9b3cd7f323b2e57593e7215362a7b02fc933e3a > Author: Stephen Hemminger > Date: Mon Aug 1 16:19:00 2011 + > > rcu: convert uses of

Re: [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()

2013-03-29 Thread Eric Dumazet
On Fri, 2013-03-29 at 09:17 -0400, Steven Rostedt wrote: > I've thought about this too, but I wasn't sure we wanted two > synchronize_*() functions, as the caller does a synchronize as well. > That said, I think this is the more robust solution and it lets all > rx_handler() functions assume that

Re: [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()

2013-03-29 Thread Eric Dumazet
On Fri, 2013-03-29 at 16:11 +0100, Ivan Vecera wrote: > Erik, why doesn't help the write barrier between the assignments. It > should guarantee their orders... or not? > It's not enough; I won't explain here why, as RCU is quite well documented in Documentation/RCU

Re: [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()

2013-03-29 Thread Eric Dumazet
On Fri, 2013-03-29 at 17:12 +0100, Jiri Pirko wrote: > Fri, Mar 29, 2013 at 04:38:15PM CET, eric.duma...@gmail.com wrote: > >On Fri, 2013-03-29 at 16:11 +0100, Ivan Vecera wrote: > > > >> Erik, why doesn't help the write barrier between the assignments. It > >> should guarantee their orders... or

Re: [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()

2013-03-29 Thread Eric Dumazet
On Fri, 2013-03-29 at 12:20 -0700, Paul E. McKenney wrote: > Reviewed-by: Paul E. McKenney > > With kudos to Steven Rostedt for his analogy between RCU and > Schrödinger's cat. ;-) Thanks Paul !

Re: BUG: IPv4: Attempt to release TCP socket in state 1

2013-03-07 Thread Eric Dumazet
On Wed, 2013-03-06 at 16:41 -0800, dormando wrote: > Ok... bridge module is loaded but nothing seems to be using it. No > bond/tunnels/anything enabled. I couldn't quickly figure out what was > causing it to load. > > We removed the need for macvlan, started machines with a fresh boot, and > they

Re: BUG: soft lockup on all kernels after 2.6.3x

2013-03-07 Thread Eric Dumazet
On Thu, 2013-03-07 at 16:54 +0400, Alexey Vlasov wrote: > Hi, > > On Sat, Feb 09, 2013 at 07:07:53AM -0800, Eric Dumazet wrote: > > > > > > I used 2.6.2x kernel for a long time on my shared hosting and I didn't > > > have any problems. Kernels wo

Re: BUG: soft lockup on all kernels after 2.6.3x

2013-03-07 Thread Eric Dumazet
On Thu, 2013-03-07 at 20:37 +0400, Alexey Vlasov wrote: > On Thu, Mar 07, 2013 at 08:20:23AM -0800, Eric Dumazet wrote: > > > > What are gr_ symbols ? > > This is grsecurity patches ;) > Well, remove all alien patches and try to reproduce the bug with a pristin

Re: 3.9-rc1 NULL pointer crash at find_pid_ns

2013-03-07 Thread Eric Dumazet
On Thu, 2013-03-07 at 12:36 -0500, Sasha Levin wrote: > Looks like the hlist change is probably the issue, though it specifically > uses: > > #define hlist_entry_safe(ptr, type, member) \ > (ptr) ? hlist_entry(ptr, type, member) : NULL > > I'm still looking at the code in que

Re: 3.9-rc1 NULL pointer crash at find_pid_ns

2013-03-07 Thread Eric Dumazet
On Thu, 2013-03-07 at 13:14 -0500, Sasha Levin wrote: > Okay, I'm even more confused now. > > The expression in question is: > > hlist_entry_safe(rcu_dereference_bh(hlist_first_rcu(head))) > > You're saying that "rcu_dereference_bh(hlist_first_rcu(head))" can change > between > the two e
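The double evaluation at issue can be shown in plain C. A sketch, assuming GCC or Clang (it uses __typeof__ and a statement expression): entry_safe_twice() mirrors the quoted hlist_entry_safe() macro and evaluates its pointer argument twice, while entry_safe_once() loads it into a local first, which is the usual way to make such a macro read the pointer exactly once. The macro names, load_first() and the call counter are hypothetical stand-ins for rcu_dereference_bh(hlist_first_rcu(head)), and container_of() plays the role of hlist_entry().

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Shape of the quoted macro: `ptr` is evaluated twice when non-NULL. */
#define entry_safe_twice(ptr, type, member) \
    ((ptr) ? container_of(ptr, type, member) : NULL)

/* Single-evaluation form using a GNU C statement expression. */
#define entry_safe_once(ptr, type, member) ({                 \
    __typeof__(ptr) ____ptr = (ptr);                          \
    ____ptr ? container_of(____ptr, type, member) : NULL; })

struct hnode { struct hnode *next; };
struct item { int val; struct hnode link; };

static int loads;             /* counts evaluations of the "pointer load" */
static struct hnode *shared;  /* stands in for head->first */

static struct hnode *load_first(void)
{
    loads++;
    return shared;
}
```

With a concurrent writer, the two loads in entry_safe_twice() can return different values (for example non-NULL, then NULL), which is how a NULL pointer can reach container_of().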

Re: epoll: possible bug from wakeup_source activation

2013-03-09 Thread Eric Dumazet
On Sun, 2013-03-10 at 01:11 +, Eric Wong wrote: > > static void ep_destroy_wakeup_source(struct epitem *epi) > { > - wakeup_source_unregister(epi->ws); > - epi->ws = NULL; > + struct wakeup_source *ws = epi->ws; > + > + rcu_assign_pointer(epi->ws, NULL); There is no need to

Re: Huge performance degradation for UDP between 2.4.17 and 2.6

2012-08-02 Thread Eric Dumazet
On Thu, 2012-08-02 at 14:27 +0200, leroy christophe wrote: > Hi > > I'm having a big issue with UDP. Using a powerpc board (MPC860). > > With our board running kernel 2.4.17, I'm able to send 16 voice > packets (UDP, 96 bytes per packet) in 11 seconds. > With the same board running either Ke

Re: [RFC 1/4] hashtable: introduce a small and naive hashtable

2012-08-02 Thread Eric Dumazet
On Thu, 2012-08-02 at 10:32 -0700, Linus Torvalds wrote: > On Thu, Aug 2, 2012 at 9:40 AM, Eric W. Biederman > wrote: > > > > For a trivial hash table I don't know if the abstraction is worth it. > > For a hash table that starts off small and grows as big as you need it > > the incent to use a ha

Re: [PATCH 1/7] netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()

2012-08-03 Thread Eric Dumazet
On Fri, 2012-07-27 at 23:37 +0800, Cong Wang wrote: > slave_enable_netpoll() and __netpoll_setup() may be called > with read_lock() held, so should use GFP_ATOMIC to allocate > memory. > > Cc: "David S. Miller" > Reported-by: Dan Carpenter > Signed-off-by: Cong Wang > --- > drivers/net/bonding

Re: [PATCH 1/7] netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()

2012-08-03 Thread Eric Dumazet
On Fri, 2012-08-03 at 17:34 +0800, Cong Wang wrote: > On Fri, 2012-08-03 at 11:17 +0200, Eric Dumazet wrote: > > On Fri, 2012-07-27 at 23:37 +0800, Cong Wang wrote: > > > slave_enable_netpoll() and __netpoll_setup() may be called > > > with read_lock() held, so should

Re: [RFC v2 1/7] hashtable: introduce a small and naive hashtable

2012-08-03 Thread Eric Dumazet
On Fri, 2012-08-03 at 16:23 +0200, Sasha Levin wrote: > This hashtable implementation is using hlist buckets to provide a simple > hashtable to prevent it from getting reimplemented all over the kernel. > > +static void hash_add(struct hash_table *ht, struct hlist_node *node, long > key) > +{ >
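To make the bucket mechanics concrete, here is a small userspace approximation of hash_add() over a fixed 8-bucket table. Assumptions: the node stores its own key (the RFC passes the key separately and embeds a bare hlist_node), a plain singly linked head stands in for hlist_head, and a multiplicative hash with the 64-bit golden-ratio constant stands in for the kernel's hash_long(); hash_find() is a hypothetical helper added for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HT_BITS 3                       /* 8 buckets, illustrative */

struct hnode {
    struct hnode *next;
    long key;                           /* assumption: key kept in the node */
};

struct hash_table {
    struct hnode *buckets[1u << HT_BITS];
};

/* Multiplicative hash keeping the top HT_BITS bits of the product
 * (stand-in for hash_long(); the constant is 2^64 divided by the golden ratio). */
static unsigned ht_bucket(long key)
{
    return (unsigned)(((uint64_t)key * UINT64_C(0x9E3779B97F4A7C15)) >> (64 - HT_BITS));
}

static void hash_add(struct hash_table *ht, struct hnode *node, long key)
{
    assert(ht != NULL && node != NULL);
    struct hnode **head = &ht->buckets[ht_bucket(key)];

    node->key = key;
    node->next = *head;                 /* insert at bucket head, like hlist_add_head() */
    *head = node;
}

/* Hypothetical lookup helper: walk only the bucket the key hashes to. */
static struct hnode *hash_find(const struct hash_table *ht, long key)
{
    for (struct hnode *n = ht->buckets[ht_bucket(key)]; n != NULL; n = n->next)
        if (n->key == key)
            return n;
    return NULL;
}
```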

Re: [RFC v2 7/7] net,9p: use new hashtable implementation

2012-08-03 Thread Eric Dumazet
On Fri, 2012-08-03 at 16:23 +0200, Sasha Levin wrote: > Switch 9p error table to use the new hashtable implementation. This reduces > the amount of > generic unrelated code in 9p. > > Signed-off-by: Sasha Levin > --- > net/9p/error.c | 17 - > 1 files changed, 8 insertions(+),

Re: TCP Delayed ACK in FIN/ACK

2012-08-04 Thread Eric Dumazet
On Sat, 2012-08-04 at 16:51 +0200, richard -rw- weinberger wrote: > On Sat, Aug 4, 2012 at 4:45 PM, Sławek Janecki wrote: > > I have a node.js client (10.177.62.7) requesting some data from http > > rest service from server (10.177.0.1). > > Client is simply using nodejs http.request() method (age

Re: Huge performance degradation for UDP between 2.4.17 and 2.6

2012-08-05 Thread Eric Dumazet
On Sun, 2012-08-05 at 10:16 +0200, LEROY christophe wrote: > Le 02/08/2012 16:13, Eric Dumazet a écrit : > > On Thu, 2012-08-02 at 14:27 +0200, leroy christophe wrote: > >> Hi > >> > >> I'm having a big issue with UDP. Using a powerpc board (MPC860). >

Re: [PATCH 1/7] netpoll: use GFP_ATOMIC in slave_enable_netpoll() and __netpoll_setup()

2012-08-06 Thread Eric Dumazet
On Mon, 2012-08-06 at 17:08 +0800, Cong Wang wrote: > On Fri, 2012-08-03 at 12:10 +0200, Eric Dumazet wrote: > > > > I did this , just take it ;) > > Do we have to pass gfp to ->ndo_netpoll_setup() too? It seems no, so far > I don't think we have to do tha

Re: [PATCH] task_work: add a scheduling point in task_work_run()

2012-08-21 Thread Eric Dumazet
On Tue, 2012-08-21 at 16:37 -0400, Mimi Zohar wrote: > We're here, because fput() called schedule_work() to delay the last > fput(). The execution needs to take place before the syscall returns to > userspace. Need to read __schedule()... Do you know if cond_resched() > can guarantee that it wi

Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-21 Thread Eric Dumazet
On Tue, 2012-08-21 at 23:07 -0500, Larry Finger wrote: > Hi, > > The commit entitled "tcp: reduce out_of_order memory use" turns out to cause > problems with a number of USB drivers. > > The first one called to my attention was for staging/r8712u. For this driver, > there are problems with SSL

Re: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-22 Thread Eric Dumazet
On Wed, 2012-08-22 at 01:29 +0200, Alex Bergmann wrote: > Hi David, > > I'm not 100% sure, but it looks like I found an RFC mismatch with the > current default values of the TCP implementation. > > Alex > > From 8b854a525eb45f64ad29dfab16f9d9f681e84495 Mon Sep 17 00:00:00 2001 > From: Alexander

Re: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-22 Thread Eric Dumazet
On Wed, 2012-08-22 at 10:48 +0200, Alex Bergmann wrote: > On 08/22/2012 10:06 AM, Eric Dumazet wrote: > >> Prior to 9ad7c049 the timeout was defined with 189secs. Now we have only > >> a timeout of 63secs. > >> > >> ((2 << 5) - 1) * 3 secs = 1
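The numbers in this exchange follow from exponential backoff. With an initial RTO of r and n SYN retries, retransmissions go out at r, 3r, 7r, ... and the connect attempt fails one doubled RTO after the last one, i.e. after r(2^(n+1) - 1) seconds. That gives 3 x 63 = 189 s for r = 3 s and n = 5 (the pre-9ad7c049 value quoted) and 63 s for r = 1 s. A sketch under an idealized model (no timer granularity, RTO capped at 120 s as with Linux's TCP_RTO_MAX):

```c
#include <assert.h>

#define RTO_MAX 120u   /* seconds; Linux's TCP_RTO_MAX */

/* Time (s) after the first SYN at which the k-th retransmission is sent,
 * doubling the RTO after each timeout, starting from init_rto. */
static unsigned syn_retransmit_time(unsigned init_rto, unsigned k)
{
    assert(init_rto > 0);
    unsigned t = 0, rto = init_rto;

    for (unsigned i = 0; i < k; i++) {
        t += rto;
        rto = (rto * 2 > RTO_MAX) ? RTO_MAX : rto * 2;
    }
    return t;
}

/* Time until connect() gives up: after the last retransmission the kernel
 * waits one more (doubled) RTO, i.e. retries + 1 intervals in total. */
static unsigned syn_timeout(unsigned init_rto, unsigned retries)
{
    return syn_retransmit_time(init_rto, retries + 1);
}
```

Under this model, five retries with a 1 s initial RTO send the last SYN at 31 s and fail at 63 s; six retries send the last SYN at 63 s and fail at 127 s.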

Re: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-22 Thread Eric Dumazet
On Wed, 2012-08-22 at 11:29 +0200, Alex Bergmann wrote: > Actual 6 SYN frames are sent. The initial one and 5 retries. > first one had a t0 + 0 delay. How can it count ??? > The kernel is waiting another 32 seconds for a SYN+ACK and then gives > the ETIMEDOUT back to userspace. > > Do you mea

Re: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-22 Thread Eric Dumazet
On Wed, 2012-08-22 at 12:00 +0200, Eric Dumazet wrote: > On Wed, 2012-08-22 at 11:29 +0200, Alex Bergmann wrote: > > > Actual 6 SYN frames are sent. The initial one and 5 retries. > > > > first one had a t0 + 0 delay. How can it count ??? > > > The kernel is

Re: [PATCH] fs/proc: Move kfree outside pde_unload_lock

2012-08-22 Thread Eric Dumazet
On Wed, 2012-08-22 at 11:38 -0500, Nathan Zimmer wrote: > This moves a kfree outside a spinlock to help scaling on larger (512 core) > systems. > > I ran a simple test which just reads from /proc/cpuinfo. > Lower is better, as you can see the worst case scenario is improved. > > baseline

Re: [PATCH] fs/proc: Move kfree outside pde_unload_lock

2012-08-22 Thread Eric Dumazet
On Wed, 2012-08-22 at 20:28 +0200, Eric Dumazet wrote: > > Thats interesting, but if you really want this to fly, one RCU > conversion would be much better ;) > > pde_users would be an atomic_t and you would avoid the spinlock > contention. Here is what I had in mind, I woul

Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-22 Thread Eric Dumazet
On Wed, 2012-08-22 at 16:33 -0500, Larry Finger wrote: > On 08/22/2012 12:15 AM, Eric Dumazet wrote: > > > > This particular commit is the start of a patches batch that ended in the > > generic TCP coalescing mechanism. > > > > It is known to have problem on driv

Re: [PATCH] staging: rtl8192e: use is_zero_ether_addr() instead of memcmp()

2012-08-23 Thread Eric Dumazet
On Thu, 2012-08-23 at 15:19 +0800, Wei Yongjun wrote: > From: Wei Yongjun > > Using is_zero_ether_addr() instead of directly use > memcmp() to determine if the ethernet address is all > zeros. > > spatch with a semantic match is used to found this problem. > (http://coccinelle.lip6.fr/) > > Sig
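The patch replaces an open-coded memcmp() against a zero buffer with the helper. A userspace sketch of both forms, assuming a 6-byte ETH_ALEN; is_zero_ether_addr() here mirrors the classic byte-wise definition from include/linux/etherdevice.h (ORing the six octets together), and zero_by_memcmp() is a hypothetical name for the pattern being replaced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Open-coded pattern the patch removes: compare against a zero buffer. */
static bool zero_by_memcmp(const uint8_t *addr)
{
    static const uint8_t zero[ETH_ALEN];
    return memcmp(addr, zero, ETH_ALEN) == 0;
}

/* Byte-wise form of the helper: the OR of all octets is 0 only if each is 0. */
static bool is_zero_ether_addr(const uint8_t *addr)
{
    assert(addr != NULL);
    return !(addr[0] | addr[1] | addr[2] | addr[3] | addr[4] | addr[5]);
}
```

Both return the same answer; the helper states the intent directly and needs no zero array.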

Re: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-23 Thread Eric Dumazet
On Thu, 2012-08-23 at 13:58 +0200, Alex Bergmann wrote: > On 08/22/2012 06:41 PM, H.K. Jerry Chu wrote: > > This issue occurred to me right after I submitted the patch for RFC6298. > > I did not commit any more change because RFC compliance aside, 180secs > > just seem like eternity in the Interne

Re: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-23 Thread Eric Dumazet
imes to retry active > opening a > * connection: ~180sec is RFC minimum */ > > #define TCP_SYNACK_RETRIES 5 /* number of times to retry passive opening a Acked-by: Eric Dumazet A change of the comment might be good, to help future readers.

RE: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-23 Thread Eric Dumazet
On Thu, 2012-08-23 at 13:35 +0100, David Laight wrote: > > I would suggest to increase TCP_SYN_RETRIES from 5 to 6. > > > > 180 secs is eternity, but 31 secs is too small. > > Wasn't the intention of the long delay to allow a system > acting as a router to reboot? > I suspect that is why it (and

Re: [REGRESSION] 3.6-rc2 and 3.6-rc3: TCP/IP network connection hang

2012-08-23 Thread Eric Dumazet
On Thu, 2012-08-23 at 22:35 +0200, Martin Steigerwald wrote: > Hi! > > Its a bit difficult to describe. With 3.6-rc2 and 3.6-rc3 on an Lenovo > ThinkPad T520 from Linus git, I get occasional network hangs: > > On for example sending a small mail via SMTP to my Debian Squeeze > based server via a

Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-23 Thread Eric Dumazet
On Thu, 2012-08-23 at 15:57 -0500, Larry Finger wrote: > On 08/22/2012 11:03 PM, Eric Dumazet wrote: > > > > Changing the allocation size removes the problem ? thats really strange. > > > > If you try different sizes in the 9100-30720 range, can you pinpoint the >

Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 09:09 -0500, Larry Finger wrote: > With kernel 3.6-rc2, the error changes to the following: > > > --2012-08-24 08:26:42-- https://bugzilla.redhat.com/show_bug.cgi?id=847525 > Resolving bugz

Re: [PATCH] fs/proc: Move kfree outside pde_unload_lock

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 09:48 -0500, Nathan Zimmer wrote: > On Wed, Aug 22, 2012 at 11:42:58PM +0200, Eric Dumazet wrote: > > On Wed, 2012-08-22 at 20:28 +0200, Eric Dumazet wrote: > > > > > > > > Thats interesting, but if you really want this to fly, one R

Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 10:49 -0500, Larry Finger wrote: > There is nothing in kernel log when it happens. The file STRACE is attached. > So there is indeed a corruption. What was the driver you used in this case ?

Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote: > On 08/24/2012 10:19 AM, David Miller wrote: > > > > This looks like full-on data corruption to me. > > I agree. The question is why does it happen with r8712u, and only after the > commit in the subject. Drivers for other devices that I hav

Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote: > On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote: > > On 08/24/2012 10:19 AM, David Miller wrote: > > > > > > This looks like full-on data corruption to me. > > > > I agree. The question is w

Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-24 Thread Eric Dumazet
On Fri, 2012-08-24 at 11:58 -0500, Larry Finger wrote: > On 08/24/2012 11:23 AM, Eric Dumazet wrote: > > On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote: > >> On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote: > >>> On 08/24/2012 10:19 AM, David Miller w

Re: [PATCH 1/1] tcp: Wrong timeout for SYN segments

2012-08-25 Thread Eric Dumazet
_SYN_RETRIES to the value of 6, > providing a retransmission window of 63secs. > > The comments for SYN and SYNACK retries have also been updated to > describe the current settings. > > Signed-off-by: Alexander Bergmann > --- Acked-by: Eric Dumazet

Re: Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

2012-08-27 Thread Eric Dumazet
On Mon, 2012-08-27 at 12:55 -0500, Larry Finger wrote: > I have prepared a patch to fix all the unchecked allocations. > > Over the weekend I made some progress. To test the latest vendor driver, I > installed a 32-bit system. Their driver is not compatible with a 64-bit > system. > I found th

[PATCH net-next v1] net: use a per task frag allocator

2012-09-19 Thread Eric Dumazet
From: Eric Dumazet We currently use a per-socket page reserve for tcp_sendmsg() operations. This page is used to build fragments for skbs. It's done to increase the probability of coalescing small write() calls into single segments in skbs still in the write queue (not yet sent). But it wastes a lot of

Re: [PATCH] bnx2: update bnx2-mips-09 firmware to bnx2-mips-09-6.2.1b

2012-08-07 Thread Eric Dumazet
On Wed, 2012-08-08 at 08:17 +0200, Willy Tarreau wrote: > > Well, if the drivers provided with the kernel don't work out of the box > anymore, maybe we should also move them to a separate repository. All it > is going to do otherwise is to cause invalid bug reports because users > don't understan

Re: [PATCH] bnx2: update bnx2-mips-09 firmware to bnx2-mips-09-6.2.1b

2012-08-08 Thread Eric Dumazet
On Wed, 2012-08-08 at 08:49 +0200, Willy Tarreau wrote: > Hi Eric, > > On Wed, Aug 08, 2012 at 08:27:52AM +0200, Eric Dumazet wrote: > > On Wed, 2012-08-08 at 08:17 +0200, Willy Tarreau wrote: > > > > > > > > Well, if the drivers provided with the kernel

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Dumazet
t to 1, but selinux ended up being set to disabled after the > >> __initcall(selinux_nf_ip_init) ran? Weird. > > This looks right as well: > > > > # zcat config.gz | grep SELINUX > > CONFIG_SECURITY_SELINUX=y > > CONFIG_SECURITY_SELINUX_BOOTPARAM=y > > CONFIG_SECURITY_SELIN

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Dumazet
On Wed, 2012-08-08 at 15:26 -0400, Paul Moore wrote: > On Wednesday, August 08, 2012 12:14:42 PM John Stultz wrote: > > So I bisected this down and it seems to be the following commit: > > > > commit be9f4a44e7d41cee50ddb5f038fc2391cbbb4046 > > Author: Eric Dumazet &

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Dumazet
On Wed, 2012-08-08 at 15:50 -0400, Paul Moore wrote: > Yep. I was just trying to see if there was a way we could avoid having to > make it conditional on CONFIG_SECURITY, but I think this is a better approach > than the alternatives. > > I'm also looking into making sure we get a sane LSM labe

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Dumazet
On Wed, 2012-08-08 at 12:49 -0700, John Stultz wrote: > I can't comment on the patch itself, but I tested it against Linus' HEAD > and it seems to resolve the oops on shutdown for me. OK, thanks !

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Dumazet
On Wed, 2012-08-08 at 15:59 -0400, Eric Paris wrote: > Seems wrong. We shouldn't ever need ifdef CONFIG_SECURITY in core > code. Sure, but it seems the include file is missing an accessor for this. We could add it in a future cleanup patch, as Paul mentioned. > Ifndef CONF_SECURITY then security_sk_a

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Dumazet
On Wed, 2012-08-08 at 22:09 +0200, Eric Dumazet wrote: > On Wed, 2012-08-08 at 15:59 -0400, Eric Paris wrote: > > > Seems wrong. We shouldn't ever need ifdef CONFIG_SECURITY in core > > code. > > Sure but it seems include file misses an accessor for this. >

Re: NULL pointer dereference in selinux_ip_postroute_compat

2012-08-08 Thread Eric Dumazet
On Wed, 2012-08-08 at 16:46 -0400, Paul Moore wrote: > On Wednesday, August 08, 2012 10:32:52 PM Eric Dumazet wrote: > > On Wed, 2012-08-08 at 22:09 +0200, Eric Dumazet wrote: > > > On Wed, 2012-08-08 at 15:59 -0400, Eric Paris wrote: > > > > Seems wrong.

[PATCH net-next] time: jiffies_delta_to_clock_t() helper to the rescue

2012-08-09 Thread Eric Dumazet
From: Eric Dumazet Various /proc/net files sometimes report crazy timer values, expressed in clock_t units. This happens when an expired timer delta (expires - jiffies) is passed to jiffies_to_clock_t(). This function has an overflow in : return div_u64((u64)x * TICK_NSEC, NSEC_PER_SEC
