Re: [PATCH net] net: sched: red: avoid hashing NULL child
On Fri, 18 May 2018, Paolo Abeni wrote: > Hangbin reported an Oops triggered by the syzkaller qdisc rules: > > kasan: GPF could be caused by NULL-ptr deref or user memory access > general protection fault: [#1] SMP KASAN PTI > Modules linked in: sch_red > CPU: 0 PID: 28699 Comm: syz-executor5 Not tainted 4.17.0-rc4.kcov #1 > Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 > RIP: 0010:qdisc_hash_add+0x26/0xa0 > RSP: 0018:8800589cf470 EFLAGS: 00010203 > RAX: dc00 RBX: RCX: 824ad971 > RDX: 0007 RSI: c9000ce9f000 RDI: 003c > RBP: 0001 R08: ed000b139ea2 R09: 8800589cf4f0 > R10: 8800589cf50f R11: ed000b139ea2 R12: 880054019fc0 > R13: 880054019fb4 R14: 88005c0af600 R15: 880054019fb0 > FS: 7fa6edcb1700() GS:88005ce0() knlGS: > CS: 0010 DS: ES: CR0: 80050033 > CR2: 2740 CR3: 0fc16000 CR4: 06f0 > DR0: DR1: DR2: > DR3: DR6: fffe0ff0 DR7: 0400 > Call Trace: > red_change+0x2d2/0xed0 [sch_red] > qdisc_create+0x57e/0xef0 > tc_modify_qdisc+0x47f/0x14e0 > rtnetlink_rcv_msg+0x6a8/0x920 > netlink_rcv_skb+0x2a2/0x3c0 > netlink_unicast+0x511/0x740 > netlink_sendmsg+0x825/0xc30 > sock_sendmsg+0xc5/0x100 > ___sys_sendmsg+0x778/0x8e0 > __sys_sendmsg+0xf5/0x1b0 > do_syscall_64+0xbd/0x3b0 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > RIP: 0033:0x450869 > RSP: 002b:7fa6edcb0c48 EFLAGS: 0246 ORIG_RAX: 002e > RAX: ffda RBX: 7fa6edcb16b4 RCX: 00450869 > RDX: RSI: 20c0 RDI: 0013 > RBP: 0072bea0 R08: R09: > R10: R11: 0246 R12: > R13: 8778 R14: 00702838 R15: 7fa6edcb1700 > Code: e9 0b fe ff ff 0f 1f 44 00 00 55 53 48 89 fb 89 f5 e8 3f 07 f3 fe 48 > 8d 7b 3c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 > 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 51 > RIP: qdisc_hash_add+0x26/0xa0 RSP: 8800589cf470 > > When a red qdisc is updated with a 0 limit, the child qdisc is left > unmodified, no additional scheduler is created in red_change(), > the 'child' local variable is rightfully NULL and must not add it > to the hash table. > > This change addresses the above issue moving qdisc_hash_add() right > after the child qdisc creation. It additionally removes unneeded checks > for noop_qdisc. > > Reported-by: Hangbin Liu <liuhang...@gmail.com> > Fixes: 49b499718fa1 ("net: sched: make default fifo qdiscs appear in the > dump") Acked-by: Jiri Kosina <jkos...@suse.cz> Thanks for fixing this Paolo. -- Jiri Kosina SUSE Labs
Re: [PATCH 00/18] prevent bounds-check bypass via speculative execution
On Tue, 9 Jan 2018, Josh Poimboeuf wrote: > On Tue, Jan 09, 2018 at 11:44:05AM -0800, Dan Williams wrote: > > On Tue, Jan 9, 2018 at 11:34 AM, Jiri Kosina <ji...@kernel.org> wrote: > > > On Fri, 5 Jan 2018, Dan Williams wrote: > > > > > > [ ... snip ... ] > > >> Andi Kleen (1): > > >> x86, barrier: stop speculation for failed access_ok > > >> > > >> Dan Williams (13): > > >> x86: implement nospec_barrier() > > >> [media] uvcvideo: prevent bounds-check bypass via speculative > > >> execution > > >> carl9170: prevent bounds-check bypass via speculative execution > > >> p54: prevent bounds-check bypass via speculative execution > > >> qla2xxx: prevent bounds-check bypass via speculative execution > > >> cw1200: prevent bounds-check bypass via speculative execution > > >> Thermal/int340x: prevent bounds-check bypass via speculative > > >> execution > > >> ipv6: prevent bounds-check bypass via speculative execution > > >> ipv4: prevent bounds-check bypass via speculative execution > > >> vfs, fdtable: prevent bounds-check bypass via speculative execution > > >> net: mpls: prevent bounds-check bypass via speculative execution > > >> udf: prevent bounds-check bypass via speculative execution > > >> userns: prevent bounds-check bypass via speculative execution > > >> > > >> Mark Rutland (4): > > >> asm-generic/barrier: add generic nospec helpers > > >> Documentation: document nospec helpers > > >> arm64: implement nospec_ptr() > > >> arm: implement nospec_ptr() > > > > > > So considering the recent publication of [1], how come we all of a sudden > > > don't need the barriers in ___bpf_prog_run(), namely for LD_IMM_DW and > > > LDX_MEM_##SIZEOP, and something comparable for eBPF JIT? > > > > > > Is this going to be handled in eBPF in some other way? > > > > > > Without that in place, and considering Jann Horn's paper, it would seem > > > like PTI doesn't really lock it down fully, right? > > > > Here is the latest (v3) bpf fix: > > > > https://patchwork.ozlabs.org/patch/856645/ > > > > I currently have v2 on my 'nospec' branch and will move that to v3 for > > the next update, unless it goes upstream before then. Daniel, I guess you're planning to send this still for 4.15? > That patch seems specific to CONFIG_BPF_SYSCALL. Is the bpf() syscall > the only attack vector? Or are there other ways to run bpf programs > that we should be worried about? Seems like Alexei is probably the only person in the whole universe who isn't CCed here ... let's fix that. Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH 00/18] prevent bounds-check bypass via speculative execution
On Fri, 5 Jan 2018, Dan Williams wrote: [ ... snip ... ] > Andi Kleen (1): > x86, barrier: stop speculation for failed access_ok > > Dan Williams (13): > x86: implement nospec_barrier() > [media] uvcvideo: prevent bounds-check bypass via speculative execution > carl9170: prevent bounds-check bypass via speculative execution > p54: prevent bounds-check bypass via speculative execution > qla2xxx: prevent bounds-check bypass via speculative execution > cw1200: prevent bounds-check bypass via speculative execution > Thermal/int340x: prevent bounds-check bypass via speculative execution > ipv6: prevent bounds-check bypass via speculative execution > ipv4: prevent bounds-check bypass via speculative execution > vfs, fdtable: prevent bounds-check bypass via speculative execution > net: mpls: prevent bounds-check bypass via speculative execution > udf: prevent bounds-check bypass via speculative execution > userns: prevent bounds-check bypass via speculative execution > > Mark Rutland (4): > asm-generic/barrier: add generic nospec helpers > Documentation: document nospec helpers > arm64: implement nospec_ptr() > arm: implement nospec_ptr() So considering the recent publication of [1], how come we all of a sudden don't need the barriers in ___bpf_prog_run(), namely for LD_IMM_DW and LDX_MEM_##SIZEOP, and something comparable for eBPF JIT? Is this going to be handled in eBPF in some other way? Without that in place, and considering Jann Horn's paper, it would seem like PTI doesn't really lock it down fully, right? [1] https://bugs.chromium.org/p/project-zero/issues/attachmentText?aid=287305 -- Jiri Kosina SUSE Labs
Re: [Patch net] net_sched: check noop_qdisc before qdisc_hash_add()
On Tue, 4 Apr 2017, Cong Wang wrote: > Dmitry reported a crash when injecting faults in > attach_one_default_qdisc() and dev->qdisc is still > a noop_disc, the check before qdisc_hash_add() fails > to catch it because it tests NULL. We should test > against noop_qdisc since it is the default qdisc > at this point. > > Fixes: 59cc1f61f09c ("net: sched: convert qdisc linked list to hashtable") > Reported-by: Dmitry Vyukov <dvyu...@google.com> > Cc: Jiri Kosina <jkos...@suse.cz> > Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com> Acked-by: Jiri Kosina <jkos...@suse.cz> Thanks. > --- > net/sched/sch_generic.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c > index b052b27..1a2f9e9 100644 > --- a/net/sched/sch_generic.c > +++ b/net/sched/sch_generic.c > @@ -794,7 +794,7 @@ static void attach_default_qdiscs(struct net_device *dev) > } > } > #ifdef CONFIG_NET_SCHED > - if (dev->qdisc) > + if (dev->qdisc != _qdisc) > qdisc_hash_add(dev->qdisc); > #endif > } > -- > 2.5.5 > -- Jiri Kosina SUSE Labs
Re: [PATCH v2 1/2] net: sched: make default fifo qdiscs appear in the dump
On Wed, 8 Mar 2017, Eric Dumazet wrote: > > +++ b/net/sched/sch_qfq.c > > @@ -494,6 +494,8 @@ static int qfq_change_class(struct Qdisc *sch, u32 > > classid, u32 parentid, > > goto destroy_class; > > } > > > > + if (cl->qdisc != _qdisc) > > + qdisc_hash_add(cl->qdisc, true); > > > Please move the test in qdisc_hash_add() instead of copy/pasting it all > over the places ? Well, qdisc_hash_add() has a WARN_ON() (inherited from what qdisc_list_add() used to do) for that particular case to catch cases where singleton qdisc would make it there from other places by mistake. By putting this test there we'll effectively giving up on this warning should it ever point to a bug. Thanks, -- Jiri Kosina SUSE Labs
[PATCH v3 1/2] net: sched: make default fifo qdiscs appear in the dump
From: Jiri Kosina <jkos...@suse.cz> The original reason [1] for having hidden qdiscs (potential scalability issues in qdisc_match_from_root() with single linked list in case of large amount of qdiscs) has been invalidated by 59cc1f61f0 ("net: sched: convert qdisc linked list to hashtable"). This allows us for bringing more clarity and determinism into the dump by making default pfifo qdiscs visible. We're not turning this on by default though, at it was deemed [2] too intrusive / unnecessary change of default behavior towards userspace. Instead, TCA_DUMP_INVISIBLE netlink attribute is introduced, which allows applications to request complete qdisc hierarchy dump, including the ones that have always been implicit/invisible. Singleton noop_qdisc stays invisible, as teaching the whole infrastructure about singletons would require quite some surgery with very little gain (seeing no qdisc or seeing noop qdisc in the dump is probably setting the same user expectation). [1] http://lkml.kernel.org/r/1460732328.10638.74.ca...@edumazet-glaptop3.roam.corp.google.com [2] http://lkml.kernel.org/r/20161021.105935.1907696543877061916.da...@davemloft.net Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- v2 -> v3: get rid of uapi breakage by changing value of TCA_PAD (thanks a lot to Jiri Pirko for catching my brainfart) v1 -> v2: introduce exception for singleton noop_qdisc include/net/pkt_sched.h| 2 +- include/net/sch_generic.h | 1 + include/uapi/linux/rtnetlink.h | 1 + net/sched/sch_api.c| 42 ++ net/sched/sch_cbq.c| 5 + net/sched/sch_drr.c| 2 ++ net/sched/sch_dsmark.c | 2 ++ net/sched/sch_generic.c| 2 +- net/sched/sch_hfsc.c | 4 net/sched/sch_htb.c| 2 ++ net/sched/sch_mq.c | 2 +- net/sched/sch_mqprio.c | 2 +- net/sched/sch_multiq.c | 2 ++ net/sched/sch_prio.c | 5 - net/sched/sch_qfq.c| 2 ++ net/sched/sch_red.c| 2 ++ net/sched/sch_sfb.c| 2 ++ net/sched/sch_tbf.c| 2 ++ 18 files changed, 65 insertions(+), 17 deletions(-) diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index cd334c9584e9..0625eac2c601 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -90,7 +90,7 @@ int unregister_qdisc(struct Qdisc_ops *qops); void qdisc_get_default(char *id, size_t len); int qdisc_set_default(const char *id); -void qdisc_hash_add(struct Qdisc *q); +void qdisc_hash_add(struct Qdisc *q, bool invisible); void qdisc_hash_del(struct Qdisc *q); struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle); struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle); diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index e6aa0a249672..e7dca250d115 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -66,6 +66,7 @@ struct Qdisc { #define TCQ_F_NOPARENT 0x40 /* root of its hierarchy : * qdisc_tree_decrease_qlen() should stop. */ +#define TCQ_F_INVISIBLE0x80 /* invisible by default in dump */ u32 limit; const struct Qdisc_ops *ops; struct qdisc_size_table __rcu *stab; diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 262f0379d83a..be034cc0e4e4 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -543,6 +543,7 @@ enum { TCA_STATS2, TCA_STAB, TCA_PAD, + TCA_DUMP_INVISIBLE, __TCA_MAX }; diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 206dc24add3a..8e4e6ab1847a 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -274,7 +274,7 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle) return NULL; } -void qdisc_hash_add(struct Qdisc *q) +void qdisc_hash_add(struct Qdisc *q, bool invisible) { if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) { struct Qdisc *root = qdisc_dev(q)->qdisc; @@ -282,6 +282,8 @@ void qdisc_hash_add(struct Qdisc *q) WARN_ON_ONCE(root == _qdisc); ASSERT_RTNL(); hash_add_rcu(qdisc_dev(q)->qdisc_hash, >hash, q->handle); + if (invisible) + q->flags |= TCQ_F_INVISIBLE; } } EXPORT_SYMBOL(qdisc_hash_add); @@ -1004,7 +1006,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev, goto err_out4; } - qdisc_hash_add(sch); + qdisc_hash_add(sch, false); return sch; } @@ -1400,9 +1402,14 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid, retu
[PATCH v2 1/2] net: sched: make default fifo qdiscs appear in the dump
From: Jiri Kosina <jkos...@suse.cz> The original reason [1] for having hidden qdiscs (potential scalability issues in qdisc_match_from_root() with single linked list in case of large amount of qdiscs) has been invalidated by 59cc1f61f0 ("net: sched: convert qdisc linked list to hashtable"). This allows us for bringing more clarity and determinism into the dump by making default pfifo qdiscs visible. We're not turning this on by default though, at it was deemed [2] too intrusive / unnecessary change of default behavior towards userspace. Instead, TCA_DUMP_INVISIBLE netlink attribute is introduced, which allows applications to request complete qdisc hierarchy dump, including the ones that have always been implicit/invisible. Singleton noop_qdisc stays invisible, as teaching the whole infrastructure about singletons would require quite some surgery with very little gain (seeing no qdisc or seeing noop qdisc in the dump is probably setting the same user expectation). [1] http://lkml.kernel.org/r/1460732328.10638.74.ca...@edumazet-glaptop3.roam.corp.google.com [2] http://lkml.kernel.org/r/20161021.105935.1907696543877061916.da...@davemloft.net Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- v1 -> v2: introduce exception for singleton noop_qdisc include/net/pkt_sched.h| 2 +- include/net/sch_generic.h | 1 + include/uapi/linux/rtnetlink.h | 1 + net/sched/sch_api.c| 42 ++ net/sched/sch_cbq.c| 5 + net/sched/sch_drr.c| 2 ++ net/sched/sch_dsmark.c | 2 ++ net/sched/sch_generic.c| 2 +- net/sched/sch_hfsc.c | 4 net/sched/sch_htb.c| 2 ++ net/sched/sch_mq.c | 2 +- net/sched/sch_mqprio.c | 2 +- net/sched/sch_multiq.c | 2 ++ net/sched/sch_prio.c | 5 - net/sched/sch_qfq.c| 2 ++ net/sched/sch_red.c| 2 ++ net/sched/sch_sfb.c| 2 ++ net/sched/sch_tbf.c| 2 ++ 18 files changed, 65 insertions(+), 17 deletions(-) diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index cd334c9584e9..0625eac2c601 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -90,7 +90,7 @@ int unregister_qdisc(struct Qdisc_ops *qops); void qdisc_get_default(char *id, size_t len); int qdisc_set_default(const char *id); -void qdisc_hash_add(struct Qdisc *q); +void qdisc_hash_add(struct Qdisc *q, bool invisible); void qdisc_hash_del(struct Qdisc *q); struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle); struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle); diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index e6aa0a249672..e7dca250d115 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -66,6 +66,7 @@ struct Qdisc { #define TCQ_F_NOPARENT 0x40 /* root of its hierarchy : * qdisc_tree_decrease_qlen() should stop. */ +#define TCQ_F_INVISIBLE0x80 /* invisible by default in dump */ u32 limit; const struct Qdisc_ops *ops; struct qdisc_size_table __rcu *stab; diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 262f0379d83a..c7de00e09797 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -542,6 +542,7 @@ enum { TCA_FCNT, TCA_STATS2, TCA_STAB, + TCA_DUMP_INVISIBLE, TCA_PAD, __TCA_MAX }; diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 206dc24add3a..8e4e6ab1847a 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -274,7 +274,7 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle) return NULL; } -void qdisc_hash_add(struct Qdisc *q) +void qdisc_hash_add(struct Qdisc *q, bool invisible) { if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) { struct Qdisc *root = qdisc_dev(q)->qdisc; @@ -282,6 +282,8 @@ void qdisc_hash_add(struct Qdisc *q) WARN_ON_ONCE(root == _qdisc); ASSERT_RTNL(); hash_add_rcu(qdisc_dev(q)->qdisc_hash, >hash, q->handle); + if (invisible) + q->flags |= TCQ_F_INVISIBLE; } } EXPORT_SYMBOL(qdisc_hash_add); @@ -1004,7 +1006,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev, goto err_out4; } - qdisc_hash_add(sch); + qdisc_hash_add(sch, false); return sch; } @@ -1400,9 +1402,14 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid, return -1; } -static bool tc_qdisc_dump_ignore(struct Qdisc *q) +static bool tc_qdisc_dump_ignore(struct Qdisc *q, bool d
[PATCH v2 2/2] iproute2: add support for invisible qdisc dumping
From: Jiri Kosina <jkos...@suse.cz> Support the new TCA_DUMP_INVISIBLE netlink attribute that allows asking kernel to perform 'full qdisc dump', as for historical reasons some of the default qdiscs are being hidden by the kernel. The command syntax is being extended by voluntary 'invisible' argument to 'tc qdisc show'. Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- v1 -> v2: fix the format of help message tc/tc_qdisc.c | 25 +++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/tc/tc_qdisc.c b/tc/tc_qdisc.c index 3a3701c2..1e9d9097 100644 --- a/tc/tc_qdisc.c +++ b/tc/tc_qdisc.c @@ -34,7 +34,7 @@ static int usage(void) fprintf(stderr, " [ stab [ help | STAB_OPTIONS] ]\n"); fprintf(stderr, " [ [ QDISC_KIND ] [ help | OPTIONS ] ]\n"); fprintf(stderr, "\n"); - fprintf(stderr, " tc qdisc show [ dev STRING ] [ ingress | clsact ]\n"); + fprintf(stderr, " tc qdisc show [ dev STRING ] [ ingress | clsact ] [ invisible ]\n"); fprintf(stderr, "Where:\n"); fprintf(stderr, "QDISC_KIND := { [p|b]fifo | tbf | prio | cbq | red | etc. }\n"); fprintf(stderr, "OPTIONS := ... try tc qdisc add help\n"); @@ -292,6 +292,7 @@ static int tc_qdisc_list(int argc, char **argv) { struct tcmsg t = { .tcm_family = AF_UNSPEC }; char d[16] = {}; + bool dump_invisible = false; while (argc > 0) { if (strcmp(*argv, "dev") == 0) { @@ -306,6 +307,8 @@ static int tc_qdisc_list(int argc, char **argv) t.tcm_parent = TC_H_INGRESS; } else if (matches(*argv, "help") == 0) { usage(); + } else if (strcmp(*argv, "invisible") == 0) { + dump_invisible = true; } else { fprintf(stderr, "What is \"%s\"? Try \"tc qdisc help\".\n", *argv); return -1; @@ -325,7 +328,25 @@ static int tc_qdisc_list(int argc, char **argv) filter_ifindex = t.tcm_ifindex; } - if (rtnl_dump_request(, RTM_GETQDISC, , sizeof(t)) < 0) { + if (dump_invisible) { + struct { + struct nlmsghdr n; + struct tcmsg t; + char buf[256]; + } req = { + .n.nlmsg_type = RTM_GETQDISC, + .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)), + }; + + req.t.tcm_family = AF_UNSPEC; + + addattr(, 256, TCA_DUMP_INVISIBLE); + if (rtnl_dump_request_n(, ) < 0) { + perror("Cannot send dump request"); + return 1; + } + + } else if (rtnl_dump_request(, RTM_GETQDISC, , sizeof(t)) < 0) { perror("Cannot send dump request"); return 1; } -- Jiri Kosina SUSE Labs
Re: [PATCH 2/2] iproute2: add support for invisible qdisc dumping
On Mon, 27 Feb 2017, Phil Sutter wrote: > > Support the new TCA_DUMP_INVISIBLE netlink attribute that allows asking > > kernel to perform 'full qdisc dump', as for historical reasons some of the > > default qdiscs are being hidden by the kernel. > > > > The command syntax is being extended by voluntary 'invisible' argument to > > 'tc qdisc show'. > > > > Signed-off-by: Jiri Kosina <jkos...@suse.cz> > > --- > > tc/tc_qdisc.c | 25 +++-- > > 1 file changed, 23 insertions(+), 2 deletions(-) > > Would you mind adding a description of the new keyword to tc man page as > well? Seems like tc manpage would need more updates, as it doesn't seem to reflect the '[ ingress | clsact ]' possibility either. So I'll send this as a followup patch, fixing all these at once. > > diff --git a/tc/tc_qdisc.c b/tc/tc_qdisc.c > > index 3a3701c2..29da9269 100644 > > --- a/tc/tc_qdisc.c > > +++ b/tc/tc_qdisc.c > > @@ -34,7 +34,7 @@ static int usage(void) > > fprintf(stderr, " [ stab [ help | STAB_OPTIONS] ]\n"); > > fprintf(stderr, " [ [ QDISC_KIND ] [ help | OPTIONS ] ]\n"); > > fprintf(stderr, "\n"); > > - fprintf(stderr, " tc qdisc show [ dev STRING ] [ ingress | clsact > > ]\n"); > > + fprintf(stderr, " tc qdisc show [ dev STRING ] [ ingress | clsact > > | invisible ]\n"); > > Doesn't look like these are mutually exclusive. Therefore I would > suggest fixing the syntax to: > > | + fprintf(stderr, " tc qdisc show [ dev STRING ] [ ingress | clsact > ] [ invisible ]\n"); Makes sense, will include this in v2. Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH 1/2] net: sched: make default fifo qdiscs appear in the dump
On Mon, 6 Mar 2017, David Miller wrote: > > Ah, right you are, thanks. The complete fix is not super trivial, as it > > needs some more surgery to tc_dump_qdisc_root(), tc_dump_tclass_root() and > > qdisc_match_from_root() (see 69012ae42 for some details). > > > > There are two options: > > > > - this gets fixed in two phases, in first everything *but* noop qdisc gets > > dumped (in the "give me everything" dump) and later we finalize it by > > teaching the above functions about noop_qdisc as well > > > > - I extend this patchset to handle noop qdisc from the very beginning; > > I am unlikely to find time for this during coming weeks though. But OTOH > > this whole thing is very low priority anyway > > > > What do you think? > > I'm not too hot on this whole idea because the only way you can emit > the noop_qdisc is to "dup" it by allocating a new qdisc so that you > can link it in. This has two downsides: > > 1) Extra overhead and memory usage > > 2) All of the simple checks against _qdisc might not be >so simply any more. Fully agreed. I'd be inclined to just live with the fact that noop qdisc would never ever be visible in the dump. I think that in this particular case this can be easily justified, as the behavior is consistent with what is seen in the dump ("nothing happens to the packets"). All other cases will be covered. If there are no objections to this, I'll send out v2 shortly. Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH 1/2] net: sched: make default fifo qdiscs appear in the dump
On Wed, 1 Mar 2017, David Miller wrote: > > @@ -1066,6 +1066,7 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 > > parentid, > > _qdisc_ops, classid); > > if (cl->qdisc == NULL) > > cl->qdisc = _qdisc; > > + qdisc_hash_add(cl->qdisc, true); > > INIT_LIST_HEAD(>children); > > cl->vt_tree = RB_ROOT; > > cl->cf_tree = RB_ROOT; > > @@ -1425,6 +1426,7 @@ hfsc_init_qdisc(struct Qdisc *sch, struct nlattr *opt) > > sch->handle); > > if (q->root.qdisc == NULL) > > q->root.qdisc = _qdisc; > > + qdisc_hash_add(q->root.qdisc, true); > > INIT_LIST_HEAD(>root.children); > > q->root.vt_tree = RB_ROOT; > > q->root.cf_tree = RB_ROOT; > > I'm not so sure it is legal is potentially pass _qdisc into > qdisc_hash_add(). Ah, right you are, thanks. The complete fix is not super trivial, as it needs some more surgery to tc_dump_qdisc_root(), tc_dump_tclass_root() and qdisc_match_from_root() (see 69012ae42 for some details). There are two options: - this gets fixed in two phases, in first everything *but* noop qdisc gets dumped (in the "give me everything" dump) and later we finalize it by teaching the above functions about noop_qdisc as well - I extend this patchset to handle noop qdisc from the very beginning; I am unlikely to find time for this during coming weeks though. But OTOH this whole thing is very low priority anyway What do you think? Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH 0/2] net: sched: make it possible to unhide default qdiscs
On Sat, 25 Feb 2017, Jiri Kosina wrote: > This is a followup to a patchset submitted back in October 2016 No idea what happened that my usual patchset sending process broke and there is no 'References/In-reply-to' in the actual patches :/ I will investigate that. Please let me know in case you'd like a resend just because of this. Sorry for the hassle, -- Jiri Kosina SUSE Labs
[PATCH 1/2] net: sched: make default fifo qdiscs appear in the dump
From: Jiri Kosina <jkos...@suse.cz> The original reason [1] for having hidden qdiscs (potential scalability issues in qdisc_match_from_root() with single linked list in case of large amount of qdiscs) has been invalidated by 59cc1f61f0 ("net: sched: convert qdisc linked list to hashtable"). This allows us for bringing more clarity and determinism into the dump by making default pfifo qdiscs visible. We're not turning this on by default though, at it was deemed [2] too intrusive / unnecessary change of default behavior towards userspace. Instead, TCA_DUMP_INVISIBLE netlink attribute is introduced, which allows applications to request complete qdisc hierarchy dump, including the ones that have always been implicit/invisible. [1] http://lkml.kernel.org/r/1460732328.10638.74.ca...@edumazet-glaptop3.roam.corp.google.com [2] http://lkml.kernel.org/r/20161021.105935.1907696543877061916.da...@davemloft.net Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- include/net/pkt_sched.h| 2 +- include/net/sch_generic.h | 1 + include/uapi/linux/rtnetlink.h | 1 + net/sched/sch_api.c| 42 ++ net/sched/sch_cbq.c| 4 net/sched/sch_drr.c| 2 ++ net/sched/sch_dsmark.c | 1 + net/sched/sch_generic.c| 2 +- net/sched/sch_hfsc.c | 2 ++ net/sched/sch_htb.c| 1 + net/sched/sch_mq.c | 2 +- net/sched/sch_mqprio.c | 2 +- net/sched/sch_multiq.c | 1 + net/sched/sch_prio.c | 4 +++- net/sched/sch_qfq.c| 1 + net/sched/sch_red.c| 1 + net/sched/sch_sfb.c| 1 + net/sched/sch_tbf.c| 1 + 18 files changed, 54 insertions(+), 17 deletions(-) diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index cd334c9584e9..0625eac2c601 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -90,7 +90,7 @@ int unregister_qdisc(struct Qdisc_ops *qops); void qdisc_get_default(char *id, size_t len); int qdisc_set_default(const char *id); -void qdisc_hash_add(struct Qdisc *q); +void qdisc_hash_add(struct Qdisc *q, bool invisible); void qdisc_hash_del(struct Qdisc *q); struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle); struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle); diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index e6aa0a249672..e7dca250d115 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -66,6 +66,7 @@ struct Qdisc { #define TCQ_F_NOPARENT 0x40 /* root of its hierarchy : * qdisc_tree_decrease_qlen() should stop. */ +#define TCQ_F_INVISIBLE0x80 /* invisible by default in dump */ u32 limit; const struct Qdisc_ops *ops; struct qdisc_size_table __rcu *stab; diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h index 262f0379d83a..c7de00e09797 100644 --- a/include/uapi/linux/rtnetlink.h +++ b/include/uapi/linux/rtnetlink.h @@ -542,6 +542,7 @@ enum { TCA_FCNT, TCA_STATS2, TCA_STAB, + TCA_DUMP_INVISIBLE, TCA_PAD, __TCA_MAX }; diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 206dc24add3a..8e4e6ab1847a 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -274,7 +274,7 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle) return NULL; } -void qdisc_hash_add(struct Qdisc *q) +void qdisc_hash_add(struct Qdisc *q, bool invisible) { if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) { struct Qdisc *root = qdisc_dev(q)->qdisc; @@ -282,6 +282,8 @@ void qdisc_hash_add(struct Qdisc *q) WARN_ON_ONCE(root == _qdisc); ASSERT_RTNL(); hash_add_rcu(qdisc_dev(q)->qdisc_hash, >hash, q->handle); + if (invisible) + q->flags |= TCQ_F_INVISIBLE; } } EXPORT_SYMBOL(qdisc_hash_add); @@ -1004,7 +1006,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev, goto err_out4; } - qdisc_hash_add(sch); + qdisc_hash_add(sch, false); return sch; } @@ -1400,9 +1402,14 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid, return -1; } -static bool tc_qdisc_dump_ignore(struct Qdisc *q) +static bool tc_qdisc_dump_ignore(struct Qdisc *q, bool dump_invisible) { - return (q->flags & TCQ_F_BUILTIN) ? true : false; + if (q->flags & TCQ_F_BUILTIN) + return true; + if ((q->flags & TCQ_F_INVISIBLE) && !dump_invisible) + return true; + + return false; } static int qd
[PATCH 2/2] iproute2: add support for invisible qdisc dumping
From: Jiri Kosina <jkos...@suse.cz> Support the new TCA_DUMP_INVISIBLE netlink attribute that allows asking kernel to perform 'full qdisc dump', as for historical reasons some of the default qdiscs are being hidden by the kernel. The command syntax is being extended by voluntary 'invisible' argument to 'tc qdisc show'. Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- tc/tc_qdisc.c | 25 +++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/tc/tc_qdisc.c b/tc/tc_qdisc.c index 3a3701c2..29da9269 100644 --- a/tc/tc_qdisc.c +++ b/tc/tc_qdisc.c @@ -34,7 +34,7 @@ static int usage(void) fprintf(stderr, " [ stab [ help | STAB_OPTIONS] ]\n"); fprintf(stderr, " [ [ QDISC_KIND ] [ help | OPTIONS ] ]\n"); fprintf(stderr, "\n"); - fprintf(stderr, " tc qdisc show [ dev STRING ] [ ingress | clsact ]\n"); + fprintf(stderr, " tc qdisc show [ dev STRING ] [ ingress | clsact | invisible ]\n"); fprintf(stderr, "Where:\n"); fprintf(stderr, "QDISC_KIND := { [p|b]fifo | tbf | prio | cbq | red | etc. }\n"); fprintf(stderr, "OPTIONS := ... try tc qdisc add help\n"); @@ -292,6 +292,7 @@ static int tc_qdisc_list(int argc, char **argv) { struct tcmsg t = { .tcm_family = AF_UNSPEC }; char d[16] = {}; + bool dump_invisible = false; while (argc > 0) { if (strcmp(*argv, "dev") == 0) { @@ -306,6 +307,8 @@ static int tc_qdisc_list(int argc, char **argv) t.tcm_parent = TC_H_INGRESS; } else if (matches(*argv, "help") == 0) { usage(); + } else if (strcmp(*argv, "invisible") == 0) { + dump_invisible = true; } else { fprintf(stderr, "What is \"%s\"? Try \"tc qdisc help\".\n", *argv); return -1; @@ -325,7 +328,25 @@ static int tc_qdisc_list(int argc, char **argv) filter_ifindex = t.tcm_ifindex; } - if (rtnl_dump_request(, RTM_GETQDISC, , sizeof(t)) < 0) { + if (dump_invisible) { + struct { + struct nlmsghdr n; + struct tcmsg t; + char buf[256]; + } req = { + .n.nlmsg_type = RTM_GETQDISC, + .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)), + }; + + req.t.tcm_family = AF_UNSPEC; + + addattr(, 256, TCA_DUMP_INVISIBLE); + if (rtnl_dump_request_n(, ) < 0) { + perror("Cannot send dump request"); + return 1; + } + + } else if (rtnl_dump_request(, RTM_GETQDISC, , sizeof(t)) < 0) { perror("Cannot send dump request"); return 1; } -- Jiri Kosina SUSE Labs
[PATCH 0/2] net: sched: make it possible to unhide default qdiscs
This is a followup to a patchset submitted back in October 2016 http://lkml.kernel.org/r/alpine.lnx.2.00.1610211024400.31...@cbobk.fhfr.pm that aimed at making the qdisc hierarchy more transparent and unhide the default qdiscs in the dump by default. After some discussion, it turned out that the most viable way to go would be to introduce a new netlink attribute for this, and let the userspace chose whether it wants to see the whole hierarchy dump http://lkml.kernel.org/r/20161021.105935.1907696543877061916.da...@davemloft.net This is implemented by this patchset. First patch adds the new TCA_DUMP_INVISIBLE netlink attribute and its handling in the kernel, the second one adds the iproute2 counterpart. -- Jiri Kosina SUSE Labs
[PATCH] iproute2: tc: introduce build dependency on libnetlink
From: Jiri Kosina <jkos...@suse.cz> Rebuilding libnetlink doesn't trigger rebuild of tc, which is wrong (especially so for builds where libnetlink.a gets statically linked into tc). Fix that by introducing an explicit dependency. Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- tc/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tc/Makefile b/tc/Makefile index 6dd984f0..3f7fc939 100644 --- a/tc/Makefile +++ b/tc/Makefile @@ -127,7 +127,7 @@ MODDESTDIR := $(DESTDIR)$(LIBDIR)/tc all: tc $(TCSO) -tc: $(TCOBJ) libtc.a +tc: $(TCOBJ) $(LIBNETLINK) libtc.a $(QUIET_LINK)$(CC) $^ $(LDFLAGS) $(LDLIBS) -o $@ libtc.a: $(TCLIB) -- Jiri Kosina SUSE Labs
Re: [PATCH] net: sched: make default fifo qdiscs appear in the dump
On Fri, 21 Oct 2016, Eric Dumazet wrote: > Some of us are dealing with huge HTB hierarchies, so adding default fifo > in the dump will add more data pumped from the kernel. > > BwE [1] for instance dumps qdisc/classes every 5 seconds. > > I guess we'll need to not pull this patch in our kernels. Okay, so I probably misunderstood you here: https://marc.info/?l=linux-kernel=146073234818214=2 as I thought that as long as we move towards the hashtable, you wouldn't have any issues with this. I'd really like to unhide the default qdiscs, it makes little sense to be inconsistent in this way. Random shot into darkness -- how about making this a CONFIG/sysctl-selectable? Thanks, -- Jiri Kosina SUSE Labs
[PATCH] net: sched: make default fifo qdiscs appear in the dump
The original reason [1] for having hidden qdiscs (potential scalability issues in qdisc_match_from_root() with single linked list in case of large amount of qdiscs) has been invalidated by 59cc1f61f0 ("net: sched: convert qdisc linked list to hashtable"). This allows us for bringing more clarity and determinism into the dump by making default pfifo qdiscs visible. [1] http://lkml.kernel.org/r/1460732328.10638.74.ca...@edumazet-glaptop3.roam.corp.google.com Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- Tested for cbq, htb and tbf. net/sched/sch_cbq.c| 4 net/sched/sch_drr.c| 2 ++ net/sched/sch_dsmark.c | 1 + net/sched/sch_hfsc.c | 2 ++ net/sched/sch_htb.c| 1 + net/sched/sch_multiq.c | 1 + net/sched/sch_prio.c | 4 +++- net/sched/sch_qfq.c| 1 + net/sched/sch_red.c| 1 + net/sched/sch_sfb.c| 1 + net/sched/sch_tbf.c| 1 + 11 files changed, 18 insertions(+), 1 deletion(-) diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c index beb554a..3c85e8d 100644 --- a/net/sched/sch_cbq.c +++ b/net/sched/sch_cbq.c @@ -1161,6 +1161,8 @@ static int cbq_init(struct Qdisc *sch, struct nlattr *opt) if (!q->link.q) q->link.q = _qdisc; + qdisc_hash_add(q->link.q); + q->link.priority = TC_CBQ_MAXPRIO - 1; q->link.priority2 = TC_CBQ_MAXPRIO - 1; q->link.cpriority = TC_CBQ_MAXPRIO - 1; @@ -1606,6 +1608,8 @@ static void cbq_put(struct Qdisc *sch, unsigned long arg) cl->quantum = cl->allot; cl->weight = cl->R_tab->rate.rate; + qdisc_hash_add(cl->q); + sch_tree_lock(sch); cbq_link_class(cl); cl->borrow = cl->tparent; diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c index 8af5c59..1d33b94 100644 --- a/net/sched/sch_drr.c +++ b/net/sched/sch_drr.c @@ -118,6 +118,8 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, u32 parentid, if (cl->qdisc == NULL) cl->qdisc = _qdisc; + qdisc_hash_add(cl->qdisc); + if (tca[TCA_RATE]) { err = gen_replace_estimator(>bstats, NULL, >rate_est, NULL, diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c index 1308bbf..d0bffd6 100644 --- a/net/sched/sch_dsmark.c +++ b/net/sched/sch_dsmark.c @@ -367,6 +367,7 @@ static int dsmark_init(struct Qdisc *sch, struct nlattr *opt) p->q = qdisc_create_dflt(sch->dev_queue, _qdisc_ops, sch->handle); if (p->q == NULL) p->q = _qdisc; + qdisc_hash_add(p->q); pr_debug("%s: qdisc %p\n", __func__, p->q); diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c index 000f1d3..a75710e 100644 --- a/net/sched/sch_hfsc.c +++ b/net/sched/sch_hfsc.c @@ -1066,6 +1066,7 @@ struct hfsc_sched { _qdisc_ops, classid); if (cl->qdisc == NULL) cl->qdisc = _qdisc; + qdisc_hash_add(cl->qdisc); INIT_LIST_HEAD(>children); cl->vt_tree = RB_ROOT; cl->cf_tree = RB_ROOT; @@ -1425,6 +1426,7 @@ struct hfsc_sched { sch->handle); if (q->root.qdisc == NULL) q->root.qdisc = _qdisc; + qdisc_hash_add(q->root.qdisc); INIT_LIST_HEAD(>root.children); q->root.vt_tree = RB_ROOT; q->root.cf_tree = RB_ROOT; diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index c798d0d..421d0a9 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -1459,6 +1459,7 @@ static int htb_change_class(struct Qdisc *sch, u32 classid, qdisc_class_hash_insert(>clhash, >common); if (parent) parent->children++; + qdisc_hash_add(cl->un.leaf.q); } else { if (tca[TCA_RATE]) { err = gen_replace_estimator(>bstats, NULL, diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c index 9ffbb02..9266e9c 100644 --- a/net/sched/sch_multiq.c +++ b/net/sched/sch_multiq.c @@ -217,6 +217,7 @@ static int multiq_tune(struct Qdisc *sch, struct nlattr *opt) sch_tree_lock(sch); old = q->queues[i]; q->queues[i] = child; + qdisc_hash_add(child); if (old != _qdisc) { qdisc_tree_reduce_backlog(old, diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c index 8f57589..604a817 100644 --- a/net/sched/sch_prio.c +++ b/net/sched/sch_prio.c @@ -192,8 +192,10 @@ static int prio_tune(struct Qdisc *sch, struct nlattr *opt) qdisc_destroy(child); } - for (i = oldbands; i < q->bands; i++) + for (i = oldbands; i < q-
Re: [PATCH 2/2] net: sched: avoid duplicates in qdisc dump
On Thu, 18 Aug 2016, Cong Wang wrote: > Doesn't this mean we can now just remove the ingress case from > tc_dump_qdisc() and simply iterate the whole hash table? That'd be indeed a nice cleanup, but for that we'll have to make sure that the top-level ingress entry makes it to the hashtable as well; that's not the case currently, and hence the special-casing (but I agree that it's in principle just a remnant of the old design and of a rather 1:1 list->hashtable conversion, and there is no good reason for keeping it this way any more). So patch that'd add ingress qdisc to the hashtable whenever it materializes should be enough to get rid of the "we have to walk two linked lists" heritage and the ingress special-casing for dumps. Will look into it next week unless someone is faster. Thanks, -- Jiri Kosina SUSE Labs
[PATCH 2/2] net: sched: avoid duplicates in qdisc dump
From: Jiri Kosina <jkos...@suse.cz> tc_dump_qdisc() performs dumping of the per-device qdiscs in two phases; first, the "standard" dev->qdisc is being dumped. Second, if there is/are ingress queue(s), they are being dumped as well. After conversion of netdevice's qdisc linked-list into hashtable, these two sets are not in two disjunctive sets/lists any more, but are both "reachable" directly from netdevice's hashtable. As a consequence, the "full-depth" dump of the ingress qdiscs results in immediately hitting the netdevice hashtable again, and duplicating the dump that has already been performed for dev->qdisc. What in fact needs to be dumped in case of ingress queue is "just" the top-level ingress qdisc, as everything else has been dumped already. Fix this by extending tc_dump_qdisc_root() in a way that it can be instructed whether it should (while performing the "full" per-netdev qdisc dump) perform the whole recursion, or just dump "additional" top-level (ingress) qdiscs without performing any kind of recursion. This fixes duplicate dumps such as qdisc mq 0: root qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc clsact : parent :fff1 qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 Fixes: 59cc1f61f ("net: sched: convert qdisc linked list to hashtable") Reported-by: Daniel Borkmann <dan...@iogearbox.net> Tested-by: Daniel Borkmann <dan...@iogearbox.net> Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- net/sched/sch_api.c | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 2b75d68..749811f 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -1440,7 +1440,7 @@ err_out: static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb, struct netlink_callback *cb, - int *q_idx_p, int s_q_idx) + int *q_idx_p, int s_q_idx, bool recur) { int ret = 0, q_idx = *q_idx_p; struct Qdisc *q; @@ -1460,7 +1460,13 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb, q_idx++; } - if (!qdisc_dev(root)) + /* If dumping singletons, there is no qdisc_dev(root) and the singleton +* itself has already been dumped. +* +* If we've already dumped the top-level (ingress) qdisc above and the global +* qdisc hashtable, we don't want to hit it again +*/ + if (!qdisc_dev(root) || !recur) goto out; hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) { @@ -1504,13 +1510,13 @@ static int tc_dump_qdisc(struct sk_buff *skb, struct netlink_callback *cb) s_q_idx = 0; q_idx = 0; - if (tc_dump_qdisc_root(dev->qdisc, skb, cb, _idx, s_q_idx) < 0) + if (tc_dump_qdisc_root(dev->qdisc, skb, cb, _idx, s_q_idx, true) < 0) goto done; dev_queue = dev_ingress_queue(dev); if (dev_queue && tc_dump_qdisc_root(dev_queue->qdisc_sleeping, skb, cb, - _idx, s_q_idx) < 0) + _idx, s_q_idx, false) < 0) goto done; cont: -- 1.9.2
[PATCH 1/2] net: sched: fix handling of singleton qdiscs with qdisc_hash
From: Jiri Kosina <jkos...@suse.cz> qdisc_match_from_root() is now iterating over per-netdevice qdisc hashtable instead of going through a linked-list of qdiscs (independently on the actual underlying netdev), which was the case before the switch to hashtable for qdiscs. For singleton qdiscs, there is no underlying netdev associated though, and therefore dumping a singleton qdisc will panic, as qdisc_dev(root) will always be NULL. BUG: unable to handle kernel NULL pointer dereference at 0410 IP: [] qdisc_match_from_root+0x2c/0x70 PGD 1aceba067 PUD 1aceb7067 PMD 0 Oops: [#1] PREEMPT SMP [ ... ] task: 8801ec996e00 task.stack: 8801ec934000 RIP: 0010:[] [] qdisc_match_from_root+0x2c/0x70 RSP: 0018:8801ec937ab0 EFLAGS: 00010203 RAX: 0408 RBX: 88025e612000 RCX: ffd8 RDX: RSI: RDI: 81cf8100 RBP: 8801ec937ab0 R08: 0001c160 R09: 8802668032c0 R10: 81cf8100 R11: 0030 R12: R13: 88025e612000 R14: 81cf3140 R15: FS: 7f24b9af6740() GS:88026f28() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0410 CR3: 0001aceec000 CR4: 001406e0 Stack: 8801ec937ad0 81681210 88025dd51a00 fff1 8801ec937b88 81681e4e 81c42bc0 880262431500 81cf3140 88025dd51a10 88025dd51a24 ec937b38 Call Trace: [] qdisc_lookup+0x40/0x50 [] tc_modify_qdisc+0x21e/0x550 [] rtnetlink_rcv_msg+0x95/0x220 [] ? __kmalloc_track_caller+0x172/0x230 [] ? rtnl_newlink+0x870/0x870 [] netlink_rcv_skb+0xa7/0xc0 [] rtnetlink_rcv+0x28/0x30 [] netlink_unicast+0x15b/0x210 [] netlink_sendmsg+0x319/0x390 [] sock_sendmsg+0x38/0x50 [] ___sys_sendmsg+0x256/0x260 [] ? __pagevec_lru_add_fn+0x135/0x280 [] ? pagevec_lru_move_fn+0xd0/0xf0 [] ? trace_event_raw_event_mm_lru_insertion+0x180/0x180 [] ? __lru_cache_add+0x75/0xb0 [] ? _raw_spin_unlock+0x16/0x40 [] ? handle_mm_fault+0x39f/0x1160 [] __sys_sendmsg+0x45/0x80 [] SyS_sendmsg+0x12/0x20 [] do_syscall_64+0x57/0xb0 Fix this by special-casing singleton qdiscs (those that don't have underlying netdevice) and introduce immediate handling of those rather than trying to go over an underlying netdevice. We're in the same situation in tc_dump_qdisc_root() and tc_dump_tclass_root(). Ultimately, this will have to be slightly reworked so that we are actually able to show singleton qdiscs (noop) in the dump properly; but we're not currently doing that anyway, so no regression there, and better do this in a gradual manner. Fixes: 59cc1f61f ("net: sched: convert qdisc linked list to hashtable") Reported-by: Daniel Borkmann <dan...@iogearbox.net> Tested-by: Daniel Borkmann <dan...@iogearbox.net> Reported-by: David Ahern <d...@cumulusnetworks.com> Tested-by: David Ahern <d...@cumulusnetworks.com> Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- net/sched/sch_api.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index c093d32..2b75d68 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -262,6 +262,9 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle) { struct Qdisc *q; + if (!qdisc_dev(root)) + return (root->handle == handle ? root : NULL); + if (!(root->flags & TCQ_F_BUILTIN) && root->handle == handle) return root; @@ -1456,6 +1459,10 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb, goto done; q_idx++; } + + if (!qdisc_dev(root)) + goto out; + hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) { if (q_idx < s_q_idx) { q_idx++; @@ -1781,6 +1788,9 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb, if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0) return -1; + if (!qdisc_dev(root)) + return 0; + hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) { if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0) return -1; -- 1.9.2
[PATCH next-next 0/2] qdisc-hashtable fixes
The following two patches fix all the issues that have been reported against the conversion of qdisc linked list to hashtable (currently in net-next) so far. First patch adjusts handling of singleton qdiscs to the new semantics, and is rather straightforward. The second patch, which fixes "cosmetic" issue of duplicate entries in the qdisc dump for ingress qdiscs, is a little bit more hairy; I personally would love to see all the already existing "if (ingress)"-like hacks go away (by, let's say, introducing a general TCQ_F_? flag), but that's way out of scope of this patchset (but already on my todo). Thanks a lot to Daniel Borkmann and David Ahern for reporting the issues and testing the patches promptly. Jiri Kosina (2): net: sched: fix handling of singleton qdiscs with qdisc_hash net: sched: avoid duplicates in qdisc dump net/sched/sch_api.c | 22 +++--- 1 file changed, 19 insertions(+), 3 deletions(-) -- Jiri Kosina SUSE Labs
Re: kernel panic in qdisc_match_from_root starting openvswitch
On Tue, 16 Aug 2016, Jiri Kosina wrote: > > I am hitting a kernel panic with '/etc/init.d/openvswitch-switch > > restart' Stack trace below. Reverting 59cc1f61f09c ("net: sched: convert > > qdisc linked list to hashtable") clears the problem. > > Thanks a lot for the report. Could you please test either with the > attached two patches applied on top of net-next, or alternatively with > 'might-rebase/net-sched-qdiscs' branch of > > git://git.kernel.org/pub/scm/linux/kernel/git/jikos/jikos.git > > (HEAD of that branch is 60b3487ad7f) and report back to me whether it > fixes any issues you are seeing? Applying the attached patches on top of > net-next or testing with the git branch above should be equivalent test > wrt. qdiscs. And this time hopefully with the actual patches ... -- Jiri Kosina SUSE Labs From 39e6390a50335e88e1bd566f3a5d2f1b09272896 Mon Sep 17 00:00:00 2001 From: Jiri Kosina <ji...@kernel.org> Date: Fri, 12 Aug 2016 15:53:04 +0200 Subject: [PATCH 1/2] net: sched: fix handling of singleton qdiscs with qdisc_hash qdisc_match_from_root() is now iterating over per-netdevice qdisc hashtable instead of going through a linked-list of qdiscs (independently on the actual underlying netdev), which was the case before the switch to hashtable for qdiscs. For singleton qdiscs, there is no underlying netdev associated though, and therefore dumping a singleton qdisc will panic, as qdisc_dev(root) will always be NULL. BUG: unable to handle kernel NULL pointer dereference at 0410 IP: [] qdisc_match_from_root+0x2c/0x70 PGD 1aceba067 PUD 1aceb7067 PMD 0 Oops: [#1] PREEMPT SMP [ ... ] task: 8801ec996e00 task.stack: 8801ec934000 RIP: 0010:[] [] qdisc_match_from_root+0x2c/0x70 RSP: 0018:8801ec937ab0 EFLAGS: 00010203 RAX: 0408 RBX: 88025e612000 RCX: ffd8 RDX: RSI: RDI: 81cf8100 RBP: 8801ec937ab0 R08: 0001c160 R09: 8802668032c0 R10: 81cf8100 R11: 0030 R12: R13: 88025e612000 R14: 81cf3140 R15: FS: 7f24b9af6740() GS:88026f28() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0410 CR3: 0001aceec000 CR4: 001406e0 Stack: 8801ec937ad0 81681210 88025dd51a00 fff1 8801ec937b88 81681e4e 81c42bc0 880262431500 81cf3140 88025dd51a10 88025dd51a24 ec937b38 Call Trace: [] qdisc_lookup+0x40/0x50 [] tc_modify_qdisc+0x21e/0x550 [] rtnetlink_rcv_msg+0x95/0x220 [] ? __kmalloc_track_caller+0x172/0x230 [] ? rtnl_newlink+0x870/0x870 [] netlink_rcv_skb+0xa7/0xc0 [] rtnetlink_rcv+0x28/0x30 [] netlink_unicast+0x15b/0x210 [] netlink_sendmsg+0x319/0x390 [] sock_sendmsg+0x38/0x50 [] ___sys_sendmsg+0x256/0x260 [] ? __pagevec_lru_add_fn+0x135/0x280 [] ? pagevec_lru_move_fn+0xd0/0xf0 [] ? trace_event_raw_event_mm_lru_insertion+0x180/0x180 [] ? __lru_cache_add+0x75/0xb0 [] ? _raw_spin_unlock+0x16/0x40 [] ? handle_mm_fault+0x39f/0x1160 [] __sys_sendmsg+0x45/0x80 [] SyS_sendmsg+0x12/0x20 [] do_syscall_64+0x57/0xb0 Fix this by special-casing singleton qdiscs (those that don't have underlying netdevice) and introduce immediate handling of those rather than trying to go over an underlying netdevice. We're in the same situation in tc_dump_qdisc_root() and tc_dump_tclass_root(). Ultimately, this will have to be slightly reworked so that we are actually able to show singleton qdiscs (noop) in the dump properly; but we're not currently doing that anyway, so no regression there, and better do this in a gradual manner. Fixes: 59cc1f61f ("net: sched: convert qdisc linked list to hashtable") Reported-by: Daniel Borkmann <dan...@iogearbox.net> Tested-by: Daniel Borkmann <dan...@iogearbox.net> Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- net/sched/sch_api.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index c093d32..2b75d68 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -262,6 +262,9 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle) { struct Qdisc *q; + if (!qdisc_dev(root)) + return (root->handle == handle ? root : NULL); + if (!(root->flags & TCQ_F_BUILTIN) && root->handle == handle) return root; @@ -1456,6 +1459,10 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb, goto done; q_idx++; } + + if (!qdisc_dev(root)) + goto out; + hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) { if (q_idx < s_q_idx) { q_idx++; @@ -1781,6 +1788,9 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb, if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p
Re: kernel panic in qdisc_match_from_root starting openvswitch
On Tue, 16 Aug 2016, David Ahern wrote: > Jiri: > > I am hitting a kernel panic with '/etc/init.d/openvswitch-switch > restart' Stack trace below. Reverting 59cc1f61f09c ("net: sched: convert > qdisc linked list to hashtable") clears the problem. Thanks a lot for the report. Could you please test either with the attached two patches applied on top of net-next, or alternatively with 'might-rebase/net-sched-qdiscs' branch of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/jikos.git (HEAD of that branch is 60b3487ad7f) and report back to me whether it fixes any issues you are seeing? Applying the attached patches on top of net-next or testing with the git branch above should be equivalent test wrt. qdiscs. Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH] net: sched: fix handling of singleton qdiscs with qdisc_hash
On Tue, 16 Aug 2016, Jiri Kosina wrote: > From: Jiri Kosina <jkos...@suse.cz> > > qdisc_match_from_root() is now iterating over per-netdevice qdisc > hashtable instead of going through a linked-list of qdiscs (independently > on the actual underlying netdev), which used to be the case before the > switch to hashtable for qdiscs. > > For singleton qdiscs, there is no underlying netdev associated though, and > therefore dumping a singleton qdisc will panic, as qdisc_dev(root) will > always be NULL. [ ... snip ... ] > @@ -1456,6 +1459,10 @@ static int tc_dump_qdisc_root(struct Qdisc *root, > struct sk_buff *skb, > goto done; > q_idx++; > } > + > + if (!qdisc_dev(root)) > + goto done; > + Ok, this will cause default singleton-only devices being missed in the dump. I am now working on creating a automation that'd test as many use cases as possible; will send up a new patch once I have all the known corner cases covered (including the ingress / clsact dump duplication). Please drop this one for now, I'll send up an accumulated followup fixes asap. Thanks, -- Jiri Kosina SUSE Labs
[PATCH] net: sched: fix handling of singleton qdiscs with qdisc_hash
From: Jiri Kosina <jkos...@suse.cz> qdisc_match_from_root() is now iterating over per-netdevice qdisc hashtable instead of going through a linked-list of qdiscs (independently on the actual underlying netdev), which used to be the case before the switch to hashtable for qdiscs. For singleton qdiscs, there is no underlying netdev associated though, and therefore dumping a singleton qdisc will panic, as qdisc_dev(root) will always be NULL. BUG: unable to handle kernel NULL pointer dereference at 0410 IP: [] qdisc_match_from_root+0x2c/0x70 PGD 1aceba067 PUD 1aceb7067 PMD 0 Oops: [#1] PREEMPT SMP [ ... ] task: 8801ec996e00 task.stack: 8801ec934000 RIP: 0010:[] [] qdisc_match_from_root+0x2c/0x70 RSP: 0018:8801ec937ab0 EFLAGS: 00010203 RAX: 0408 RBX: 88025e612000 RCX: ffd8 RDX: RSI: RDI: 81cf8100 RBP: 8801ec937ab0 R08: 0001c160 R09: 8802668032c0 R10: 81cf8100 R11: 0030 R12: R13: 88025e612000 R14: 81cf3140 R15: FS: 7f24b9af6740() GS:88026f28() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0410 CR3: 0001aceec000 CR4: 001406e0 Stack: 8801ec937ad0 81681210 88025dd51a00 fff1 8801ec937b88 81681e4e 81c42bc0 880262431500 81cf3140 88025dd51a10 88025dd51a24 ec937b38 Call Trace: [] qdisc_lookup+0x40/0x50 [] tc_modify_qdisc+0x21e/0x550 [] rtnetlink_rcv_msg+0x95/0x220 [] ? __kmalloc_track_caller+0x172/0x230 [] ? rtnl_newlink+0x870/0x870 [] netlink_rcv_skb+0xa7/0xc0 [] rtnetlink_rcv+0x28/0x30 [] netlink_unicast+0x15b/0x210 [] netlink_sendmsg+0x319/0x390 [] sock_sendmsg+0x38/0x50 [] ___sys_sendmsg+0x256/0x260 [] ? __pagevec_lru_add_fn+0x135/0x280 [] ? pagevec_lru_move_fn+0xd0/0xf0 [] ? trace_event_raw_event_mm_lru_insertion+0x180/0x180 [] ? __lru_cache_add+0x75/0xb0 [] ? _raw_spin_unlock+0x16/0x40 [] ? handle_mm_fault+0x39f/0x1160 [] __sys_sendmsg+0x45/0x80 [] SyS_sendmsg+0x12/0x20 [] do_syscall_64+0x57/0xb0 Fix this by special-casing singleton qdiscs (those that don't have underlying netdevice) and introduce immediate handling of those rather than trying to go over an underlying netdevice. We're in the same situation in tc_dump_qdisc_root() and tc_dump_tclass_root() as well. Ultimately, this will have to be slightly reworked so that we are actually able to show singleton qdiscs (noop) in the dump properly, which is our ultimate goal. But we're not currently doing that anyway, so no regression there, and better do this in a gradual manner. Fixes: 59cc1f61f ("net: sched: convert qdisc linked list to hashtable") Reported-by: Daniel Borkmann <dan...@iogearbox.net> Tested-by: Daniel Borkmann <dan...@iogearbox.net> Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- Daniel, I've covered two more cases of the same in tc_dump_qdisc_root() and tc_dump_tclass_root(), but preserved your Reported-by / Tested-by: for the one you did wrt. qdisc_match_from_root(). Please shout out loud if you object. Thanks. net/sched/sch_api.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index c093d32..83413e7 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -262,6 +262,9 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle) { struct Qdisc *q; + if (!qdisc_dev(root)) + return (root->handle == handle ? root : NULL); + if (!(root->flags & TCQ_F_BUILTIN) && root->handle == handle) return root; @@ -1456,6 +1459,10 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb, goto done; q_idx++; } + + if (!qdisc_dev(root)) + goto done; + hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) { if (q_idx < s_q_idx) { q_idx++; @@ -1781,6 +1788,9 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb, if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0) return -1; + if (!qdisc_dev(root)) + return 0; + hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) { if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0) return -1; -- 1.9.2
Re: [PATCH 2/2] net: sched: convert qdisc linked list to hashtable
On Sat, 13 Aug 2016, Cong Wang wrote: > > How about we actually extend a little bit the TCQ_F_BUILTIN special case > > test in qdisc_match_from_root()? > > > > After the change, the only way how qdisc_dev() could be NULL should be a > > TCQ_F_BUILTIN case, right? > > > > I was thinking about something like the patch below (the reasong being > > that ->dev would be NULL only in cases of singletonish qdiscs) ... > > wouldn't that also fix the issue you're seeing? Have to think it through a > > little bit more .. > > I think this is probably why we never show noop qdisc in dump. Well, partially. A lot of 'default' qdiscs are omitted in a not really uniform and deterministic way. That's actually the primary point of this whole effort -- to get rid of the hidden qdiscs entirely. > So I think we should relax the singleton rule for noop_qdisc, to save > some code for noop_qdisc case and also for dumping noop_qdisc. Completely moving away from singleton qdiscs is one of the possibilities, but OTOH I think that my special-casing of !qdisc_dev(root) in qdisc_match_from_root() is correct handling of singletons. I've been completely off the grid for the past three days, but I plan to submit this as a proper followup fix tomorrow if noone has any objections. > I will try to work on a patch tomorrow. What still needs to be looked into are the duplicate clsact entries for multiqueue. Thanks, -- Jiri Kosina SUSE Labs
Re: [PATCH 2/2] net: sched: convert qdisc linked list to hashtable
On Fri, 12 Aug 2016, Daniel Borkmann wrote: > > I was thinking about something like the patch below (the reasong being > > that ->dev would be NULL only in cases of singletonish qdiscs) ... > > wouldn't that also fix the issue you're seeing? Have to think it > > through a little bit more .. > > Ahh, so this has the same effect as previously observed with the other fix. Thanks a lot for confirming that this fixes the panic. I still have to think a little bit more about this though. > Perhaps it's just a dumping issue, but to the below clsact, there shouldn't > be pfifo_fast instances appearing. > > # tc qdisc show dev wlp2s0b1 > qdisc mq 0: root > qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 > qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 > qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 > qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 > # tc qdisc add dev wlp2s0b1 clsact > # tc qdisc show dev wlp2s0b1 > qdisc mq 0: root > qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 > qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 > qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 > qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 > qdisc clsact : parent :fff1 > qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 > qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 > qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 > qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 Hmm, no immediate idea where those are coming from, we'll have to figure it out. The mq device used here has 4 queues, right? -- Jiri Kosina SUSE Labs
[PATCH net-next] net: fix up a few missing hashtable.h conflict resolutions
From: Jiri Kosina <jkos...@suse.cz> There are a couple of leftover symbol conflicts caused by hashtable.h being included by netdevice.h; those were not caught as build failure (they're "only" a warning, but in fact real bugs). Fix those up. Fixes: e87a8f24c ("net: resolve symbol conflicts with generic hashtable.h") Reported-by: Daniel Borkmann <dan...@iogearbox.net> Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- drivers/net/ethernet/dec/tulip/de4x5.c| 4 ++-- drivers/net/ethernet/dec/tulip/de4x5.h| 4 ++-- drivers/net/ethernet/freescale/fec_main.c | 6 +++--- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/dec/tulip/de4x5.c b/drivers/net/ethernet/dec/tulip/de4x5.c index f0e9e2e..6620fc8 100644 --- a/drivers/net/ethernet/dec/tulip/de4x5.c +++ b/drivers/net/ethernet/dec/tulip/de4x5.c @@ -1966,7 +1966,7 @@ SetMulticastFilter(struct net_device *dev) } else if (lp->setup_f == HASH_PERF) { /* Hash Filtering */ netdev_for_each_mc_addr(ha, dev) { crc = ether_crc_le(ETH_ALEN, ha->addr); - hashcode = crc & HASH_BITS; /* hashcode is 9 LSb of CRC */ + hashcode = crc & DE4X5_HASH_BITS; /* hashcode is 9 LSb of CRC */ byte = hashcode >> 3;/* bit[3-8] -> byte in filter */ bit = 1 << (hashcode & 0x07);/* bit[0-2] -> bit in byte */ @@ -5043,7 +5043,7 @@ build_setup_frame(struct net_device *dev, int mode) *(pa + i) = dev->dev_addr[i]; /* Host address */ if (i & 0x01) pa += 2; } - *(lp->setup_frame + (HASH_TABLE_LEN >> 3) - 3) = 0x80; + *(lp->setup_frame + (DE4X5_HASH_TABLE_LEN >> 3) - 3) = 0x80; } else { for (i=0; i<ETH_ALEN; i++) { /* Host address */ *(pa + (i&1)) = dev->dev_addr[i]; diff --git a/drivers/net/ethernet/dec/tulip/de4x5.h b/drivers/net/ethernet/dec/tulip/de4x5.h index ec756eb..1bfdc9b 100644 --- a/drivers/net/ethernet/dec/tulip/de4x5.h +++ b/drivers/net/ethernet/dec/tulip/de4x5.h @@ -860,8 +860,8 @@ #define PCI 0 #define EISA 1 -#define HASH_TABLE_LEN 512 /* Bits */ -#define HASH_BITS0x01ff/* 9 LS bits */ +#define DE4X5_HASH_TABLE_LEN 512 /* Bits */ +#define DE4X5_HASH_BITS0x01ff/* 9 LS bits */ #define SETUP_FRAME_LEN 192 /* Bytes */ #define IMPERF_PA_OFFSET 156 /* Bytes */ diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 01f7e81..fb5c638 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -2887,7 +2887,7 @@ fec_enet_close(struct net_device *ndev) * this kind of feature?). */ -#define HASH_BITS 6 /* #bits in hash */ +#define FEC_HASH_BITS 6 /* #bits in hash */ #define CRC32_POLY 0xEDB88320 static void set_multicast_list(struct net_device *ndev) @@ -2935,10 +2935,10 @@ static void set_multicast_list(struct net_device *ndev) } } - /* only upper 6 bits (HASH_BITS) are used + /* only upper 6 bits (FEC_HASH_BITS) are used * which point to specific bit in he hash registers */ - hash = (crc >> (32 - HASH_BITS)) & 0x3f; + hash = (crc >> (32 - FEC_HASH_BITS)) & 0x3f; if (hash > 31) { tmp = readl(fep->hwp + FEC_GRP_HASH_TABLE_HIGH); -- Jiri Kosina SUSE Labs
Re: [PATCH 2/2] net: sched: convert qdisc linked list to hashtable
On Fri, 12 Aug 2016, Daniel Borkmann wrote: > This results in below panic. Tested reverting this patch and it fixes > the panic. > > Did you test this also with ingress or clsact qdisc (just try adding > it to lo dev for example) ? Hi Daniel, thanks for the report. Hmm, I am pretty sure clsact worked for me, but I'll recheck. > What happens is the following in qdisc_match_from_root(): > > [ 995.422187] XXX qdisc:88025e4fc800 queue:880262759000 > dev:880261cc2000 handle: > [ 995.422200] XXX qdisc:81cf8100 queue:81cf8240 dev: > (null) handle: > > I believe this is due to dev_ingress_queue_create() assigning the > global noop_qdisc instance as qdisc_sleeping, which later qdisc_lookup() > uses for qdisc_match_from_root(). > > But everything that uses things like noop_qdisc cannot work with the > new qdisc_match_from_root(), because qdisc_dev(root) will always trigger > NULL pointer dereference there. Reason is because the dev is always > NULL for noop, it's a singleton, see noop_qdisc and noop_netdev_queue > in sch_generic.c. > > Now how to fix it? Creating separate noop instances each time it's set > would be quite a waste of memory. Even fuglier would be to hack a static > net device struct into sch_generic.c and let noop_netdev_queue point there > to get to the hash table. Or we just not use qdisc_dev(). How about we actually extend a little bit the TCQ_F_BUILTIN special case test in qdisc_match_from_root()? After the change, the only way how qdisc_dev() could be NULL should be a TCQ_F_BUILTIN case, right? I was thinking about something like the patch below (the reasong being that ->dev would be NULL only in cases of singletonish qdiscs) ... wouldn't that also fix the issue you're seeing? Have to think it through a little bit more .. diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 25aada7..1c9faed 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -260,6 +260,9 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle) { struct Qdisc *q; + if (!qdisc_dev(root)) + return (root->handle == handle ? root : NULL); + if (!(root->flags & TCQ_F_BUILTIN) && root->handle == handle) return root; Thanks! -- Jiri Kosina SUSE Labs
[PATCH 0/2] Convert qdisc linked list into a hashtable
This is a respin of the v6 of the original patch [1], split into two-patch series as requested by davem; first patch fixes all symbol conflicts that'd happen once netdevice.h starts to include hashtable.h, the second one performs the actual switch to hashtable. I've preserved Cong's Reviewed-by:, as code-wise this series is identical to the original v6 of the patch. [1] lkml.kernel.org/r/alpine.lnx.2.00.1608011220580.22...@cbobk.fhfr.pm -- Jiri Kosina SUSE Labs
[PATCH 2/2] net: sched: convert qdisc linked list to hashtable
From: Jiri Kosina <jkos...@suse.cz> Convert the per-device linked list into a hashtable. The primary motivation for this change is that currently, we're not tracking all the qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup performed over the linked list by qdisc_match_from_root() is rather expensive. The ultimate goal is to get rid of hidden qdiscs completely, which will bring much more determinism in user experience. Reviewed-by: Cong Wang <xiyou.wangc...@gmail.com> Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- include/linux/netdevice.h | 4 include/net/pkt_sched.h | 4 ++-- include/net/sch_generic.h | 2 +- net/core/dev.c| 3 +++ net/sched/sch_api.c | 23 +-- net/sched/sch_generic.c | 8 +--- net/sched/sch_mq.c| 2 +- net/sched/sch_mqprio.c| 2 +- 8 files changed, 30 insertions(+), 18 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f45929c..17c6499 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -52,6 +52,7 @@ #include #include #include +#include struct netpoll_info; struct device; @@ -1778,6 +1779,9 @@ struct net_device { unsigned intnum_tx_queues; unsigned intreal_num_tx_queues; struct Qdisc*qdisc; +#ifdef CONFIG_NET_SCHED + DECLARE_HASHTABLE (qdisc_hash, 4); +#endif unsigned long tx_queue_len; spinlock_t tx_global_lock; int watchdog_timeo; diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index fea53f4..8ba11b4 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops); void qdisc_get_default(char *id, size_t len); int qdisc_set_default(const char *id); -void qdisc_list_add(struct Qdisc *q); -void qdisc_list_del(struct Qdisc *q); +void qdisc_hash_add(struct Qdisc *q); +void qdisc_hash_del(struct Qdisc *q); struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle); struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle); struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r, diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 62d5531..26f5cb3 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -67,7 +67,7 @@ struct Qdisc { u32 limit; const struct Qdisc_ops *ops; struct qdisc_size_table __rcu *stab; - struct list_headlist; + struct hlist_node hash; u32 handle; u32 parent; int (*reshape_fail)(struct sk_buff *skb, diff --git a/net/core/dev.c b/net/core/dev.c index 904ff43..d3736d5 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7511,6 +7511,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, INIT_LIST_HEAD(>all_adj_list.lower); INIT_LIST_HEAD(>ptype_all); INIT_LIST_HEAD(>ptype_specific); +#ifdef CONFIG_NET_SCHED + hash_init(dev->qdisc_hash); +#endif dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM; setup(dev); diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index ddf047d..c093d32 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -265,33 +266,33 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle) root->handle == handle) return root; - list_for_each_entry_rcu(q, >list, list) { + hash_for_each_possible_rcu(qdisc_dev(root)->qdisc_hash, q, hash, handle) { if (q->handle == handle) return q; } return NULL; } -void qdisc_list_add(struct Qdisc *q) +void qdisc_hash_add(struct Qdisc *q) { if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) { struct Qdisc *root = qdisc_dev(q)->qdisc; WARN_ON_ONCE(root == _qdisc); ASSERT_RTNL(); - list_add_tail_rcu(>list, >list); + hash_add_rcu(qdisc_dev(q)->qdisc_hash, >hash, q->handle); } } -EXPORT_SYMBOL(qdisc_list_add); +EXPORT_SYMBOL(qdisc_hash_add); -void qdisc_list_del(struct Qdisc *q) +void qdisc_hash_del(struct Qdisc *q) { if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) { ASSERT_RTNL(); - list_del_rcu(>list); + hash_del_rcu(>hash); } } -EXPORT_SYMBOL(qdisc_list_del); +EXPORT_SYMBOL(qdisc_hash_del); struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle) { @@ -1004,7 +1005,7 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev
[PATCH 1/2] net: resolve symbol conflicts with generic hashtable.h
From: Jiri Kosina <jkos...@suse.cz> This is a preparatory patch for converting qdisc linked list into a hashtable. As we'll need to include hashtable.h in netdevice.h, we first have to make sure that this will not introduce symbol conflicts for any of the netdevice.h users. Reviewed-by: Cong Wang <xiyou.wangc...@gmail.com> Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- drivers/net/ethernet/ti/davinci_emac.c | 14 +++--- net/ipv6/ip6_gre.c | 12 ++-- net/ipv6/ip6_tunnel.c | 10 +- net/ipv6/ip6_vti.c | 10 +- net/ipv6/sit.c | 10 +- 5 files changed, 28 insertions(+), 28 deletions(-) diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c index f56d66e..91ca2b2 100644 --- a/drivers/net/ethernet/ti/davinci_emac.c +++ b/drivers/net/ethernet/ti/davinci_emac.c @@ -726,14 +726,14 @@ static u32 hash_get(u8 *addr) } /** - * hash_add - Hash function to add mac addr from hash table + * emac_hash_add - Hash function to add mac addr from hash table * @priv: The DaVinci EMAC private adapter structure * @mac_addr: mac address to delete from hash table * * Adds mac address to the internal hash table * */ -static int hash_add(struct emac_priv *priv, u8 *mac_addr) +static int emac_hash_add(struct emac_priv *priv, u8 *mac_addr) { struct device *emac_dev = >ndev->dev; u32 rc = 0; @@ -742,7 +742,7 @@ static int hash_add(struct emac_priv *priv, u8 *mac_addr) if (hash_value >= EMAC_NUM_MULTICAST_BITS) { if (netif_msg_drv(priv)) { - dev_err(emac_dev, "DaVinci EMAC: hash_add(): Invalid "\ + dev_err(emac_dev, "DaVinci EMAC: emac_hash_add(): Invalid "\ "Hash %08x, should not be greater than %08x", hash_value, (EMAC_NUM_MULTICAST_BITS - 1)); } @@ -768,14 +768,14 @@ static int hash_add(struct emac_priv *priv, u8 *mac_addr) } /** - * hash_del - Hash function to delete mac addr from hash table + * emac_hash_del - Hash function to delete mac addr from hash table * @priv: The DaVinci EMAC private adapter structure * @mac_addr: mac address to delete from hash table * * Removes mac address from the internal hash table * */ -static int hash_del(struct emac_priv *priv, u8 *mac_addr) +static int emac_hash_del(struct emac_priv *priv, u8 *mac_addr) { u32 hash_value; u32 hash_bit; @@ -825,10 +825,10 @@ static void emac_add_mcast(struct emac_priv *priv, u32 action, u8 *mac_addr) switch (action) { case EMAC_MULTICAST_ADD: - update = hash_add(priv, mac_addr); + update = emac_hash_add(priv, mac_addr); break; case EMAC_MULTICAST_DEL: - update = hash_del(priv, mac_addr); + update = emac_hash_del(priv, mac_addr); break; case EMAC_ALL_MULTI_SET: update = 1; diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index fdc9de2..d3697a4 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -61,12 +61,12 @@ static bool log_ecn_error = true; module_param(log_ecn_error, bool, 0644); MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN"); -#define HASH_SIZE_SHIFT 5 -#define HASH_SIZE (1 << HASH_SIZE_SHIFT) +#define IP6_GRE_HASH_SIZE_SHIFT 5 +#define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT) static int ip6gre_net_id __read_mostly; struct ip6gre_net { - struct ip6_tnl __rcu *tunnels[4][HASH_SIZE]; + struct ip6_tnl __rcu *tunnels[4][IP6_GRE_HASH_SIZE]; struct net_device *fb_tunnel_dev; }; @@ -96,12 +96,12 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu); will match fallback tunnel. */ -#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(HASH_SIZE - 1)) +#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(IP6_GRE_HASH_SIZE - 1)) static u32 HASH_ADDR(const struct in6_addr *addr) { u32 hash = ipv6_addr_hash(addr); - return hash_32(hash, HASH_SIZE_SHIFT); + return hash_32(hash, IP6_GRE_HASH_SIZE_SHIFT); } #define tunnels_r_ltunnels[3] @@ -1089,7 +1089,7 @@ static void ip6gre_destroy_tunnels(struct net *net, struct list_head *head) for (prio = 0; prio < 4; prio++) { int h; - for (h = 0; h < HASH_SIZE; h++) { + for (h = 0; h < IP6_GRE_HASH_SIZE; h++) { struct ip6_tnl *t; t = rtnl_dereference(ign->tunnels[prio][h]); diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 7b0481e..2050217 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -64,8 +64,8 @@ MODULE_LICENSE(&qu
Re: qdisc hash table changes...
On Tue, 9 Aug 2016, David Miller wrote: > > It does indirectly pull in some networking headers, but it doesn't pull in > > netdevice.h (which is the place where the actual include of hashtable.h > > happens): > > It pulls in skbuff.h, so are you willing to bet forever that skbuff.h > won't indirectly pull in netdevice.h at some point in the future? > > This is obviously a big problem, having global namespace conflicts > like this. > > Pushing fixing it propering into the future isn't really the right > thing to do. > > So please entertain my request to do this properly. Ok, I will add this to my TODO list, but this can take quite some time (especially as I can imagine some maintainers pushing back on this, because it doesn't really fix any existing issue in their code). Does that strictly have to be a show-stopper for the qdisc hash conversion, given the fact that the whole tree is building properly? > Thank you. Thanks, -- Jiri Kosina SUSE Labs
Re: qdisc hash table changes...
On Mon, 8 Aug 2016, David Miller wrote: > I think there will still be build failures even with v6 due to symbol > clashes. > > For example, kernel/audit_tree.c defines HASH_SIZE as an enumeration > value, and that (indirectly) includes networking headers. It does indirectly pull in some networking headers, but it doesn't pull in netdevice.h (which is the place where the actual include of hashtable.h happens): $ make kernel/audit_tree.i; grep HASH_SIZE kernel/audit_tree.i; grep netdev.*\.h kernel/audit_tree.i CHK include/config/kernel.release CHK include/generated/uapi/linux/version.h CHK include/generated/utsrelease.h CHK include/generated/bounds.h CHK include/generated/timeconst.h CHK include/generated/asm-offsets.h CALLscripts/checksyscalls.sh DESCEND objtool CPP kernel/audit_tree.i enum {HASH_SIZE = 128}; static struct list_head chunk_hash_heads[HASH_SIZE]; return chunk_hash_heads + n % HASH_SIZE; for (i = 0; i < HASH_SIZE; i++) # 1 "./include/linux/netdev_features.h" 1 # 15 "./include/linux/netdev_features.h" # 85 "./include/linux/netdev_features.h" struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int length, return __netdev_alloc_skb(dev, length, ((( gfp_t)0x20u)|(( gfp_t)0x8u)|(( gfp_t)0x200u))); return __netdev_alloc_skb(((void *)0), length, gfp_mask); return netdev_alloc_skb(((void *)0), length); struct sk_buff *skb = __netdev_alloc_skb(dev, length + 0, gfp); return __netdev_alloc_skb_ip_align(dev, length, ((( gfp_t)0x20u)|(( gfp_t)0x8u)|(( gfp_t)0x200u))); So this is fine. > There are others all over the tree. > > I would therefore ask that you first fix the namespace conflicts against > the hash symbols in the entire tree as a separate patch series (one for > each driver/subsystem which has this problem.) Really, get it down to > "git grep hash_add | grep -v _hash_add" and similar returning no output. Is this really worth the hassle when there are no real conflicts (i.e. the code in question doesn't indirectly pull in netdevice.h)? I just did allmodconfig build with v6, and it passed. Fengguang's 0-day bot didn't report any failures either. > Then we can add the qdisc hash facility. Thanks, -- Jiri Kosina SUSE Labs
[PATCH v6] net: sched: convert qdisc linked list to hashtable
From: Jiri Kosina <jkos...@suse.cz> Convert the per-device linked list into a hashtable. The primary motivation for this change is that currently, we're not tracking all the qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup performed over the linked list by qdisc_match_from_root() is rather expensive. The ultimate goal is to get rid of hidden qdiscs completely, which will bring much more determinism in user experience. As we're adding hashtable.h include into generic netdevice.h, we have to make sure HASH_SIZE macro is now non-conflicting with local definitions. Reviewed-by: Cong Wang <xiyou.wangc...@gmail.com> Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- v1 -> v2: fix up RCU hastable usage wrt. rtnl fix compilation of .c files which define their own HASH_SIZE that now oncflicts with the one from hashtable.h (newly included via netdevice.h) v2 -> v3: resolve HASH_SIZE identifier conflicts in a cleaner way fix up the number of hash bucket bits (4 bits for 16 buckets) v3 -> v4: put the hastable into struct netdevice only if CONFIG_NET_SCHED has been enabled v4 -> v5: fix !CONFIG_NET_SCHED build (reported by Fengguang Wu) add Cong Wang's reviewed-by v5 -> v6: build fix for davinci_emac driver that got symbol conflict due to hashtable.h include, reported by 0day bot drivers/net/ethernet/ti/davinci_emac.c | 14 +++--- include/linux/netdevice.h | 4 include/net/pkt_sched.h| 4 ++-- include/net/sch_generic.h | 2 +- net/core/dev.c | 3 +++ net/ipv6/ip6_gre.c | 12 ++-- net/ipv6/ip6_tunnel.c | 10 +- net/ipv6/ip6_vti.c | 10 +- net/ipv6/sit.c | 10 +- net/sched/sch_api.c| 23 +-- net/sched/sch_generic.c| 8 +--- net/sched/sch_mq.c | 2 +- net/sched/sch_mqprio.c | 2 +- 13 files changed, 58 insertions(+), 46 deletions(-) diff --git a/drivers/net/ethernet/ti/davinci_emac.c b/drivers/net/ethernet/ti/davinci_emac.c index f56d66e..91ca2b2 100644 --- a/drivers/net/ethernet/ti/davinci_emac.c +++ b/drivers/net/ethernet/ti/davinci_emac.c @@ -726,14 +726,14 @@ static u32 hash_get(u8 *addr) } /** - * hash_add - Hash function to add mac addr from hash table + * emac_hash_add - Hash function to add mac addr from hash table * @priv: The DaVinci EMAC private adapter structure * @mac_addr: mac address to delete from hash table * * Adds mac address to the internal hash table * */ -static int hash_add(struct emac_priv *priv, u8 *mac_addr) +static int emac_hash_add(struct emac_priv *priv, u8 *mac_addr) { struct device *emac_dev = >ndev->dev; u32 rc = 0; @@ -742,7 +742,7 @@ static int hash_add(struct emac_priv *priv, u8 *mac_addr) if (hash_value >= EMAC_NUM_MULTICAST_BITS) { if (netif_msg_drv(priv)) { - dev_err(emac_dev, "DaVinci EMAC: hash_add(): Invalid "\ + dev_err(emac_dev, "DaVinci EMAC: emac_hash_add(): Invalid "\ "Hash %08x, should not be greater than %08x", hash_value, (EMAC_NUM_MULTICAST_BITS - 1)); } @@ -768,14 +768,14 @@ static int hash_add(struct emac_priv *priv, u8 *mac_addr) } /** - * hash_del - Hash function to delete mac addr from hash table + * emac_hash_del - Hash function to delete mac addr from hash table * @priv: The DaVinci EMAC private adapter structure * @mac_addr: mac address to delete from hash table * * Removes mac address from the internal hash table * */ -static int hash_del(struct emac_priv *priv, u8 *mac_addr) +static int emac_hash_del(struct emac_priv *priv, u8 *mac_addr) { u32 hash_value; u32 hash_bit; @@ -825,10 +825,10 @@ static void emac_add_mcast(struct emac_priv *priv, u32 action, u8 *mac_addr) switch (action) { case EMAC_MULTICAST_ADD: - update = hash_add(priv, mac_addr); + update = emac_hash_add(priv, mac_addr); break; case EMAC_MULTICAST_DEL: - update = hash_del(priv, mac_addr); + update = emac_hash_del(priv, mac_addr); break; case EMAC_ALL_MULTI_SET: update = 1; diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f45929c..17c6499 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -52,6 +52,7 @@ #include #include #include +#include struct netpoll_info; struct device; @@ -1778,6 +1779,9 @@ struct net_device { unsigned intnum_tx_queues;
Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable
On Sun, 31 Jul 2016, Fengguang Wu wrote: > Jiri, I just double checked and find no similar errors related to > qdisc_list_add(). The parent commit 95556a8838 ("dccp: avoid deadlock > in dccp_v4_ctl_send_reset") builds fine without error. You are right, I realized my mistake afterwards. This is fixed in v5 of the patch. Thanks, -- Jiri Kosina SUSE Labs
[PATCH v5] net: sched: convert qdisc linked list to hashtable
From: Jiri Kosina <jkos...@suse.cz> Convert the per-device linked list into a hashtable. The primary motivation for this change is that currently, we're not tracking all the qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup performed over the linked list by qdisc_match_from_root() is rather expensive. The ultimate goal is to get rid of hidden qdiscs completely, which will bring much more determinism in user experience. As we're adding hashtable.h include into generic netdevice.h, we have to make sure HASH_SIZE macro is now non-conflicting with local definitions. Reviewed-by: Cong Wang <xiyou.wangc...@gmail.com> Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- v1 -> v2: fix up RCU hastable usage wrt. rtnl fix compilation of .c files which define their own HASH_SIZE that now oncflicts with the one from hashtable.h (newly included via netdevice.h) v2 -> v3: resolve HASH_SIZE identifier conflicts in a cleaner way fix up the number of hash bucket bits (4 bits for 16 buckets) v3 -> v4: put the hastable into struct netdevice only if CONFIG_NET_SCHED has been enabled v4 -> v5: fix !CONFIG_NET_SCHED build (reported by Fengguang Wu) add Cong Wang's reviewed-by include/linux/netdevice.h | 4 include/net/pkt_sched.h | 4 ++-- include/net/sch_generic.h | 2 +- net/core/dev.c| 3 +++ net/ipv6/ip6_gre.c| 12 ++-- net/ipv6/ip6_tunnel.c | 10 +- net/ipv6/ip6_vti.c| 10 +- net/ipv6/sit.c| 10 +- net/sched/sch_api.c | 23 +-- net/sched/sch_generic.c | 8 +--- net/sched/sch_mq.c| 2 +- net/sched/sch_mqprio.c| 2 +- 12 files changed, 51 insertions(+), 39 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f45929c..17c6499 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -52,6 +52,7 @@ #include #include #include +#include struct netpoll_info; struct device; @@ -1778,6 +1779,9 @@ struct net_device { unsigned intnum_tx_queues; unsigned intreal_num_tx_queues; struct Qdisc*qdisc; +#ifdef CONFIG_NET_SCHED + DECLARE_HASHTABLE (qdisc_hash, 4); +#endif unsigned long tx_queue_len; spinlock_t tx_global_lock; int watchdog_timeo; diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index fea53f4..8ba11b4 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops); void qdisc_get_default(char *id, size_t len); int qdisc_set_default(const char *id); -void qdisc_list_add(struct Qdisc *q); -void qdisc_list_del(struct Qdisc *q); +void qdisc_hash_add(struct Qdisc *q); +void qdisc_hash_del(struct Qdisc *q); struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle); struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle); struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r, diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 62d5531..26f5cb3 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -67,7 +67,7 @@ struct Qdisc { u32 limit; const struct Qdisc_ops *ops; struct qdisc_size_table __rcu *stab; - struct list_headlist; + struct hlist_node hash; u32 handle; u32 parent; int (*reshape_fail)(struct sk_buff *skb, diff --git a/net/core/dev.c b/net/core/dev.c index 904ff43..d3736d5 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7511,6 +7511,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, INIT_LIST_HEAD(>all_adj_list.lower); INIT_LIST_HEAD(>ptype_all); INIT_LIST_HEAD(>ptype_specific); +#ifdef CONFIG_NET_SCHED + hash_init(dev->qdisc_hash); +#endif dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM; setup(dev); diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index fdc9de2..d3697a4 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -61,12 +61,12 @@ static bool log_ecn_error = true; module_param(log_ecn_error, bool, 0644); MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN"); -#define HASH_SIZE_SHIFT 5 -#define HASH_SIZE (1 << HASH_SIZE_SHIFT) +#define IP6_GRE_HASH_SIZE_SHIFT 5 +#define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT) static int ip6gre_net_id __read_mostly; struct ip6gre_net { - struct ip6_tnl __rcu *tunnels[4][HASH_SIZE]; + struct ip6_tnl __rcu *tunnels[4][IP6_GRE_HASH_SIZE]; struct net_device *fb_tunnel_dev; }; @@ -96,12 +96,12 @@
Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable
On Thu, 28 Jul 2016, kbuild test robot wrote: > [auto build test ERROR on v4.7-rc7] > [also build test ERROR on next-20160728] > [cannot apply to net/master net-next/master ipsec-next/master] > [if your patch is applied to the wrong git tree, please drop us a note to > help improve the system] > > url: > https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160728-182303 > config: i386-randconfig-s0-201630 (attached as .config) > compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705 > reproduce: > # save the attached .config to linux build tree > make ARCH=i386 > > All errors (new ones prefixed by >>): > >net/built-in.o: In function `dev_activate': > >> (.text+0x37ccb): undefined reference to `qdisc_hash_add' Dear 0-day team, could you please check my question regarding this very build failure here? lkml.kernel.org/r/alpine.lnx.2.00.1607141612560.24...@cbobk.fhfr.pm Thanks, -- Jiri Kosina SUSE Labs
[PATCH v4] net: sched: convert qdisc linked list to hashtable
From: Jiri Kosina <jkos...@suse.cz> Convert the per-device linked list into a hashtable. The primary motivation for this change is that currently, we're not tracking all the qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup performed over the linked list by qdisc_match_from_root() is rather expensive. The ultimate goal is to get rid of hidden qdiscs completely, which will bring much more determinism in user experience. As we're adding hashtable.h include into generic netdevice.h, we have to make sure HASH_SIZE macro is now non-conflicting with local definitions. Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- v1 -> v2: fix up RCU hastable usage wrt. rtnl fix compilation of .c files which define their own HASH_SIZE that now oncflicts with the one from hashtable.h (newly included via netdevice.h) v2 -> v3: resolve HASH_SIZE identifier conflicts in a cleaner way fix up the number of hash bucket bits (4 bits for 16 buckets) v3 -> v4: put the hastable into struct netdevice only if CONFIG_NET_SCHED has been enabled include/linux/netdevice.h | 4 include/net/pkt_sched.h | 4 ++-- include/net/sch_generic.h | 2 +- net/core/dev.c| 3 +++ net/ipv6/ip6_gre.c| 12 ++-- net/ipv6/ip6_tunnel.c | 10 +- net/ipv6/ip6_vti.c| 10 +- net/ipv6/sit.c| 10 +- net/sched/sch_api.c | 23 +-- net/sched/sch_generic.c | 6 +++--- net/sched/sch_mq.c| 2 +- net/sched/sch_mqprio.c| 2 +- 12 files changed, 49 insertions(+), 39 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f45929c..17c6499 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -52,6 +52,7 @@ #include #include #include +#include struct netpoll_info; struct device; @@ -1778,6 +1779,9 @@ struct net_device { unsigned intnum_tx_queues; unsigned intreal_num_tx_queues; struct Qdisc*qdisc; +#ifdef CONFIG_NET_SCHED + DECLARE_HASHTABLE (qdisc_hash, 4); +#endif unsigned long tx_queue_len; spinlock_t tx_global_lock; int watchdog_timeo; diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index fea53f4..8ba11b4 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops); void qdisc_get_default(char *id, size_t len); int qdisc_set_default(const char *id); -void qdisc_list_add(struct Qdisc *q); -void qdisc_list_del(struct Qdisc *q); +void qdisc_hash_add(struct Qdisc *q); +void qdisc_hash_del(struct Qdisc *q); struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle); struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle); struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r, diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 62d5531..26f5cb3 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -67,7 +67,7 @@ struct Qdisc { u32 limit; const struct Qdisc_ops *ops; struct qdisc_size_table __rcu *stab; - struct list_headlist; + struct hlist_node hash; u32 handle; u32 parent; int (*reshape_fail)(struct sk_buff *skb, diff --git a/net/core/dev.c b/net/core/dev.c index 904ff43..d3736d5 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7511,6 +7511,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, INIT_LIST_HEAD(>all_adj_list.lower); INIT_LIST_HEAD(>ptype_all); INIT_LIST_HEAD(>ptype_specific); +#ifdef CONFIG_NET_SCHED + hash_init(dev->qdisc_hash); +#endif dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM; setup(dev); diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index fdc9de2..d3697a4 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -61,12 +61,12 @@ static bool log_ecn_error = true; module_param(log_ecn_error, bool, 0644); MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN"); -#define HASH_SIZE_SHIFT 5 -#define HASH_SIZE (1 << HASH_SIZE_SHIFT) +#define IP6_GRE_HASH_SIZE_SHIFT 5 +#define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT) static int ip6gre_net_id __read_mostly; struct ip6gre_net { - struct ip6_tnl __rcu *tunnels[4][HASH_SIZE]; + struct ip6_tnl __rcu *tunnels[4][IP6_GRE_HASH_SIZE]; struct net_device *fb_tunnel_dev; }; @@ -96,12 +96,12 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu); will match fallback tunnel. */ -#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>
Re: [RFC PATCH v3] net: sched: convert qdisc linked list to hashtable
[ added CCs ] On Tue, 12 Jul 2016, kbuild test robot wrote: > Hi, > > [auto build test ERROR on net/master] > [also build test ERROR on v4.7-rc7 next-20160711] > [if your patch is applied to the wrong git tree, please drop us a note to > help improve the system] > > url: > https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160711-220527 > config: arm-tct_hammer_defconfig (attached as .config) > compiler: arm-linux-gnueabi-gcc (Debian 5.3.1-8) 5.3.1 20160205 > reproduce: > wget > https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross > -O ~/bin/make.cross > chmod +x ~/bin/make.cross > # save the attached .config to linux build tree > make.cross ARCH=arm > > All errors (new ones prefixed by >>): > >net/built-in.o: In function `dev_activate': > >> wext-proc.c:(.text+0x38544): undefined reference to `qdisc_hash_add' This issue is be there even without my patch (but with qdisc_list_add instead), isn't it? The problem is that sch_generic.c (where dev_activate() is) is being compiled everytime CONFIG_NET is set, but sch_api.c (where qdisc_list_add() is defined) only when CONFIG_NET_SCHED is set (and there is no stub for !CONFIG_NET_SCHED case). -- Jiri Kosina SUSE Labs
Re: [RFC PATCH v3] net: sched: convert qdisc linked list to hashtable
On Tue, 12 Jul 2016, Cong Wang wrote: > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > > index f45929c..0b5c172e 100644 > > --- a/include/linux/netdevice.h > > +++ b/include/linux/netdevice.h > > @@ -52,6 +52,7 @@ > > #include > > #include > > #include > > +#include > > > > struct netpoll_info; > > struct device; > > @@ -1778,6 +1779,7 @@ struct net_device { > > unsigned intnum_tx_queues; > > unsigned intreal_num_tx_queues; > > struct Qdisc*qdisc; > > + DECLARE_HASHTABLE (qdisc_hash, 4); > > unsigned long tx_queue_len; > > spinlock_t tx_global_lock; > > int watchdog_timeo; > > Should it be surrounded by CONFIG_NET_SCHED? > To save several bytes for !CONFIG_NET_SCHED case. Makes sense. I'll wait a bit for more feedback (if there is any) before including this in potential v4. Thanks, -- Jiri Kosina SUSE Labs
[RFC PATCH v3] net: sched: convert qdisc linked list to hashtable
From: Jiri Kosina <jkos...@suse.cz> Convert the per-device linked list into a hashtable. The primary motivation for this change is that currently, we're not tracking all the qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup performed over the linked list by qdisc_match_from_root() is rather expensive. The ultimate goal is to get rid of hidden qdiscs completely, which will bring much more determinism in user experience. As we're adding hashtable.h include into generic netdevice.h, we have to make sure HASH_SIZE macro is now non-conflicting with local definitions. Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- v1 -> v2: fix up RCU hastable usage wrt. rtnl fix compilation of .c files which define their own HASH_SIZE that now oncflicts with the one from hashtable.h (newly included via netdevice.h) v2 -> v3: resolve HASH_SIZE identifier conflicts in a cleaner way fix up the number of hash bucket bits (4 bits for 16 buckets) include/linux/netdevice.h | 2 ++ include/net/pkt_sched.h | 4 ++-- include/net/sch_generic.h | 2 +- net/core/dev.c| 1 + net/ipv6/ip6_gre.c| 12 ++-- net/ipv6/ip6_tunnel.c | 10 +- net/ipv6/ip6_vti.c| 10 +- net/ipv6/sit.c| 10 +- net/sched/sch_api.c | 23 +-- net/sched/sch_generic.c | 6 +++--- net/sched/sch_mq.c| 2 +- net/sched/sch_mqprio.c| 2 +- 12 files changed, 45 insertions(+), 39 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f45929c..0b5c172e 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -52,6 +52,7 @@ #include #include #include +#include struct netpoll_info; struct device; @@ -1778,6 +1779,7 @@ struct net_device { unsigned intnum_tx_queues; unsigned intreal_num_tx_queues; struct Qdisc*qdisc; + DECLARE_HASHTABLE (qdisc_hash, 4); unsigned long tx_queue_len; spinlock_t tx_global_lock; int watchdog_timeo; diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index fea53f4..8ba11b4 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops); void qdisc_get_default(char *id, size_t len); int qdisc_set_default(const char *id); -void qdisc_list_add(struct Qdisc *q); -void qdisc_list_del(struct Qdisc *q); +void qdisc_hash_add(struct Qdisc *q); +void qdisc_hash_del(struct Qdisc *q); struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle); struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle); struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r, diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 62d5531..26f5cb3 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -67,7 +67,7 @@ struct Qdisc { u32 limit; const struct Qdisc_ops *ops; struct qdisc_size_table __rcu *stab; - struct list_headlist; + struct hlist_node hash; u32 handle; u32 parent; int (*reshape_fail)(struct sk_buff *skb, diff --git a/net/core/dev.c b/net/core/dev.c index 904ff43..edc1617 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7511,6 +7511,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, INIT_LIST_HEAD(>all_adj_list.lower); INIT_LIST_HEAD(>ptype_all); INIT_LIST_HEAD(>ptype_specific); + hash_init(dev->qdisc_hash); dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM; setup(dev); diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index fdc9de2..d3697a4 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -61,12 +61,12 @@ static bool log_ecn_error = true; module_param(log_ecn_error, bool, 0644); MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN"); -#define HASH_SIZE_SHIFT 5 -#define HASH_SIZE (1 << HASH_SIZE_SHIFT) +#define IP6_GRE_HASH_SIZE_SHIFT 5 +#define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT) static int ip6gre_net_id __read_mostly; struct ip6gre_net { - struct ip6_tnl __rcu *tunnels[4][HASH_SIZE]; + struct ip6_tnl __rcu *tunnels[4][IP6_GRE_HASH_SIZE]; struct net_device *fb_tunnel_dev; }; @@ -96,12 +96,12 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu); will match fallback tunnel. */ -#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(HASH_SIZE - 1)) +#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(IP6_GRE_HASH_SIZE - 1)) static u32 HASH_ADDR(const struct in6_addr *addr
Re: [RFC PATCH v2] net: sched: convert qdisc linked list to hashtable
On Fri, 8 Jul 2016, Eric Dumazet wrote: > > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c > > index fdc9de2..0f70ecc 100644 > > --- a/net/ipv6/ip6_gre.c > > +++ b/net/ipv6/ip6_gre.c > > @@ -62,11 +62,11 @@ module_param(log_ecn_error, bool, 0644); > > MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN"); > > > > #define HASH_SIZE_SHIFT 5 > > -#define HASH_SIZE (1 << HASH_SIZE_SHIFT) > > +#define __HASH_SIZE (1 << HASH_SIZE_SHIFT) > > __ prefix is mostly used for functions having some kind of > shells/helpers. > > I would rather use IP6_GRE_HASH_SIZE or something which has lower > chances of being used elsewhere. Alright, makes sense, will do this in v3. > @@ -732,6 +730,8 @@ static void attach_default_qdiscs(struct net_device *dev) > > qdisc->ops->attach(qdisc); > > } > > } > > + if (dev->qdisc) > > + qdisc_hash_add(dev->qdisc); > > } > > > > I do not understand this addition, could you comment on it ? With linked lists, assigning to struct net_device's Qdisc pointer is enough to "initialize" the linked list and have it contain one (root) item. With hashtable, this is not the case, it needs to be explicitly added. Hmm, dev_init_scheduler() (and perhaps also dev_shutdown()) would possibly need similar treatment in order to have accurate data there 100% of the time even during initialization. -- Jiri Kosina SUSE Labs
Re: [RFC PATCH v2] net: sched: convert qdisc linked list to hashtable
On Thu, 7 Jul 2016, Craig Gallek wrote: > This sort of seems like it's just side-stepping the problem. Given > that the size of this hash table is fixed, the lookup time of this > operation is still going to approach linear as the number of qdiscs > increases. That's true; however the primary goal here is not to actually ultimately improve speed of qdisc lookup per se, but rather to make it possible to unhide the qdiscs which are currently omitted as the linked list takes too long to walk. The static hashtable is going help here. Thanks, -- Jiri Kosina SUSE Labs
[RFC PATCH v2] net: sched: convert qdisc linked list to hashtable
From: Jiri Kosina <jkos...@suse.cz> Convert the per-device linked list into a hashtable. The primary motivation for this change is that currently, we're not tracking all the qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup performed over the linked list by qdisc_match_from_root() is rather expensive. The ultimate goal is to get rid of hidden qdiscs completely, which will bring much more determinism in user experience. As we're adding hashtable.h include into generic netdevice.h, we have to make sure HASH_SIZE macro is now non-conflicting with local definitions. Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- v1 -> v2: fix up RCU hastable usage wrt. rtnl fix compilation of .c files which define their own HASH_SIZE that now oncflicts with the one from hashtable.h (newly included via netdevice.h) include/linux/netdevice.h | 2 ++ include/net/pkt_sched.h | 4 ++-- include/net/sch_generic.h | 2 +- net/core/dev.c| 1 + net/ipv6/ip6_gre.c| 8 net/ipv6/ip6_tunnel.c | 6 +++--- net/ipv6/ip6_vti.c| 6 +++--- net/ipv6/sit.c| 10 +- net/sched/sch_api.c | 23 +-- net/sched/sch_generic.c | 6 +++--- net/sched/sch_mq.c| 2 +- net/sched/sch_mqprio.c| 2 +- 12 files changed, 39 insertions(+), 33 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f45929c..630838e 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -52,6 +52,7 @@ #include #include #include +#include struct netpoll_info; struct device; @@ -1778,6 +1779,7 @@ struct net_device { unsigned intnum_tx_queues; unsigned intreal_num_tx_queues; struct Qdisc*qdisc; + DECLARE_HASHTABLE (qdisc_hash, 16); unsigned long tx_queue_len; spinlock_t tx_global_lock; int watchdog_timeo; diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index fea53f4..8ba11b4 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops); void qdisc_get_default(char *id, size_t len); int qdisc_set_default(const char *id); -void qdisc_list_add(struct Qdisc *q); -void qdisc_list_del(struct Qdisc *q); +void qdisc_hash_add(struct Qdisc *q); +void qdisc_hash_del(struct Qdisc *q); struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle); struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle); struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r, diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 62d5531..26f5cb3 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -67,7 +67,7 @@ struct Qdisc { u32 limit; const struct Qdisc_ops *ops; struct qdisc_size_table __rcu *stab; - struct list_headlist; + struct hlist_node hash; u32 handle; u32 parent; int (*reshape_fail)(struct sk_buff *skb, diff --git a/net/core/dev.c b/net/core/dev.c index 904ff43..edc1617 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7511,6 +7511,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, INIT_LIST_HEAD(>all_adj_list.lower); INIT_LIST_HEAD(>ptype_all); INIT_LIST_HEAD(>ptype_specific); + hash_init(dev->qdisc_hash); dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM; setup(dev); diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index fdc9de2..0f70ecc 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -62,11 +62,11 @@ module_param(log_ecn_error, bool, 0644); MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN"); #define HASH_SIZE_SHIFT 5 -#define HASH_SIZE (1 << HASH_SIZE_SHIFT) +#define __HASH_SIZE (1 << HASH_SIZE_SHIFT) static int ip6gre_net_id __read_mostly; struct ip6gre_net { - struct ip6_tnl __rcu *tunnels[4][HASH_SIZE]; + struct ip6_tnl __rcu *tunnels[4][__HASH_SIZE]; struct net_device *fb_tunnel_dev; }; @@ -96,7 +96,7 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu); will match fallback tunnel. */ -#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(HASH_SIZE - 1)) +#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(__HASH_SIZE - 1)) static u32 HASH_ADDR(const struct in6_addr *addr) { u32 hash = ipv6_addr_hash(addr); @@ -1089,7 +1089,7 @@ static void ip6gre_destroy_tunnels(struct net *net, struct list_head *head) for (prio = 0; prio < 4; prio++) { int h; - for (
Re: [RFC PATCH] net: sched: convert qdisc linked list to hashtable (was Re: Deleting child qdisc doesn't reset parent to default qdisc?)
On Thu, 7 Jul 2016, Eric Dumazet wrote: > > @@ -1440,6 +1441,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, > > struct sk_buff *skb, > > { > > int ret = 0, q_idx = *q_idx_p; > > struct Qdisc *q; > > + int b; > > > > if (!root) > > return 0; > > @@ -1454,7 +1456,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, > > struct sk_buff *skb, > > goto done; > > q_idx++; > > } > > - list_for_each_entry(q, >list, list) { > > + hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) { > > if (q_idx < s_q_idx) { > > q_idx++; > > continue; > > @@ -1771,6 +1773,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, > > struct sk_buff *skb, > >int *t_p, int s_t) > > { > > struct Qdisc *q; > > + int b; > > > > if (!root) > > return 0; > > @@ -1778,7 +1781,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, > > struct sk_buff *skb, > > if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0) > > return -1; > > > > - list_for_each_entry(q, >list, list) { > > + hash_for_each_rcu(qdisc_dev(root)->qdisc_hash, b, q, hash) { > > if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0) > > return -1; > > } > > > Not sure why you used the rcu version here, but the non rcu version in > tc_dump_qdisc_root() Good catch. Actually even the current code is odd in this regard -- qdisc_match_from_root() uses RCU iterator, while tc_dump_*() use the non-RCU one; addition and deletion is performed using RCU primitives. I haven't got my head around this yet; if it's correct at all, it'd at least deserve a comment somewhere. I'll respin v2 of the patch (there is also a conflict on HASH_SIZE definition in ip6_tunnel.c, ip6_gre.c and sit.c due to hashtable.h include in netdevice.h that needs to be resolved as well) that'd make RCU usage consistent. Any other objections/comments? I was namely curious about any opinions regarding the hashtable size. Thanks, -- Jiri Kosina SUSE Labs
[RFC PATCH] net: sched: convert qdisc linked list to hashtable (was Re: Deleting child qdisc doesn't reset parent to default qdisc?)
On Fri, 15 Apr 2016, Eric Dumazet wrote: > Anyway, we probably need to improve our ability to understand qdisc > hierarchies. Having some hidden qdiscs is the real problem here. > > We need to add some hash table so that qdisc_match_from_root() does not > have to scan hundred of qdiscs. So how about something like the patch below? I already have preliminary patches on top which unhide the default qdiscs, but let's make this one step after the other. Thanks. From: Jiri Kosina <jkos...@suse.cz> Subject: [PATCH] net: sched: convert qdisc linked list to hashtable Convert the per-device linked list into a hashtable. The primary motivation for this change is that currently, we're not tracking all the qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup performed over the linked list by qdisc_match_from_root() is rather expensive. The ultimate goal is to get rid of hidden qdiscs completely, which will bring much more determinism in user experience. Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- include/linux/netdevice.h | 2 ++ include/net/pkt_sched.h | 4 ++-- include/net/sch_generic.h | 2 +- net/core/dev.c| 1 + net/sched/sch_api.c | 23 +-- net/sched/sch_generic.c | 6 +++--- net/sched/sch_mq.c| 2 +- net/sched/sch_mqprio.c| 2 +- 8 files changed, 24 insertions(+), 18 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f45929c..630838e 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -52,6 +52,7 @@ #include #include #include +#include struct netpoll_info; struct device; @@ -1778,6 +1779,7 @@ struct net_device { unsigned intnum_tx_queues; unsigned intreal_num_tx_queues; struct Qdisc*qdisc; + DECLARE_HASHTABLE (qdisc_hash, 16); unsigned long tx_queue_len; spinlock_t tx_global_lock; int watchdog_timeo; diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h index fea53f4..8ba11b4 100644 --- a/include/net/pkt_sched.h +++ b/include/net/pkt_sched.h @@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops); void qdisc_get_default(char *id, size_t len); int qdisc_set_default(const char *id); -void qdisc_list_add(struct Qdisc *q); -void qdisc_list_del(struct Qdisc *q); +void qdisc_hash_add(struct Qdisc *q); +void qdisc_hash_del(struct Qdisc *q); struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle); struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle); struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r, diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index 62d5531..26f5cb3 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -67,7 +67,7 @@ struct Qdisc { u32 limit; const struct Qdisc_ops *ops; struct qdisc_size_table __rcu *stab; - struct list_headlist; + struct hlist_node hash; u32 handle; u32 parent; int (*reshape_fail)(struct sk_buff *skb, diff --git a/net/core/dev.c b/net/core/dev.c index 904ff43..edc1617 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7511,6 +7511,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, INIT_LIST_HEAD(>all_adj_list.lower); INIT_LIST_HEAD(>ptype_all); INIT_LIST_HEAD(>ptype_specific); + hash_init(dev->qdisc_hash); dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM; setup(dev); diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index ddf047d..82953cb 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include @@ -265,33 +266,33 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle) root->handle == handle) return root; - list_for_each_entry_rcu(q, >list, list) { + hash_for_each_possible_rcu(qdisc_dev(root)->qdisc_hash, q, hash, handle) { if (q->handle == handle) return q; } return NULL; } -void qdisc_list_add(struct Qdisc *q) +void qdisc_hash_add(struct Qdisc *q) { if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) { struct Qdisc *root = qdisc_dev(q)->qdisc; WARN_ON_ONCE(root == _qdisc); ASSERT_RTNL(); - list_add_tail_rcu(>list, >list); + hash_add_rcu(qdisc_dev(q)->qdisc_hash, >hash, q->handle); } } -EXPORT_SYMBOL(qdisc_list_add); +EXPORT_SYMBOL(qdisc_hash_add); -void qdisc_list_del(struct Qdisc *q) +void qdisc_hash_del(struct Qdisc *q) {
Re: Deleting child qdisc doesn't reset parent to default qdisc?
On Tue, 28 Jun 2016, Cong Wang wrote: > > BTW, I've started to actually work on fixing this, and I've noticed that > > TBF behavior actually violates what's stated in pfifo_fast manpage: > > > > == > > Whenever an interface is created, the pfifo_fast qdisc is > > automatically used as a queue. If another qdisc is > > attached, it preempts the default pfifo_fast, which automatically > > returns to function when an existing qdisc is detached. > > > > In this sense this qdisc is magic, and unlike other qdiscs. > > == > > It is out of date, now default qdisc can be set to any other qdisc > via /proc. Also, probably due to historical reasons, we don't have > a unified default default qdisc, some uses bfifo, some uses pfifo, > we may break some existing script if we change that. While I do understand that reasoning, I'd argue that unpredictable and unexpected behavior of TBF causing systems with non-working networking is much more likely than any userspace having hard dependency on the fact that default (*) qdisc for TBF is noop. (*) where 'default upon creation' != 'default when reset' Thanks, -- Jiri Kosina SUSE Labs
[RFC PATCH] sch_tbf: avoid silent fallback to noop_qdisc (was Re: Deleting child qdisc doesn't reset parent to default qdisc?)
On Fri, 15 Apr 2016, Eric Dumazet wrote: > Then you need to save the initial qdisc (bfifo for TBF) in a special > place, to make sure the delete operation is guaranteed to succeed. > > Or fail the delete if the bfifo can not be allocated. > > I can tell that determinism if far more interesting than usability for > some users occasionally playing with tc. > > Surely the silent fallback to noop_qdisc is wrong. So before we go further and fix the fact that we actually do have hidden qdiscs (by refactoring qdisc_match_from_root() and friends), I'd still like to bring the patch below up for consideration. Thanks. From: Jiri Kosina <jkos...@suse.cz> Subject: [PATCH] sch_tbf: avoid silent fallback to noop_qdisc TBF started its life as a classless qdisc with a single builtin FIFO queue which was being shaped. When it got later turned into classful qdisc, it was written in a way that the fallback qdisc was noop_qdisc, which produces bad user experience (delete of last manually added class doesn't reset it to initial default, but renders networking unusable instead). Switch the default fallback to bfifo; this also mimics how the other guys (HTB, HFSC, CBQ, ...) are behaving. Signed-off-by: Jiri Kosina <jkos...@suse.cz> --- net/sched/sch_tbf.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c index 3161e49..b06dffe 100644 --- a/net/sched/sch_tbf.c +++ b/net/sched/sch_tbf.c @@ -508,8 +508,12 @@ static int tbf_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new, { struct tbf_sched_data *q = qdisc_priv(sch); - if (new == NULL) - new = _qdisc; + if (new == NULL) { + /* reset to default qdisc */ + new = qdisc_create_dflt(sch->dev_queue, _qdisc_ops, sch->parent); + if (!new) + return -ENOBUFS; + } *old = qdisc_replace(sch, new, >qdisc); return 0; -- Jiri Kosina SUSE Labs
Re: Deleting child qdisc doesn't reset parent to default qdisc?
On Fri, 15 Apr 2016, Eric Dumazet wrote: > > TBF is probably a bad example because it started life as a classless > > qdisc. There was only one built-in fifo queue that was shaped. Then > > someone made it classful and changed this behavior. To me it sounds > > reasonable to have the default behavior restored. At minimal > > consistency. > > > Then you need to save the initial qdisc (bfifo for TBF) in a special > place, to make sure the delete operation is guaranteed to succeed. > > Or fail the delete if the bfifo can not be allocated. > > I can tell that determinism if far more interesting than usability for > some users occasionally playing with tc. BTW, I've started to actually work on fixing this, and I've noticed that TBF behavior actually violates what's stated in pfifo_fast manpage: == Whenever an interface is created, the pfifo_fast qdisc is automatically used as a queue. If another qdisc is attached, it preempts the default pfifo_fast, which automatically returns to function when an existing qdisc is detached. In this sense this qdisc is magic, and unlike other qdiscs. == -- Jiri Kosina SUSE Labs
Re: Deleting child qdisc doesn't reset parent to default qdisc?
On Thu, 14 Apr 2016, Phil Sutter wrote: > > > I've came across the behavior where adding a child qdisc and then > > > deleting > > > it again makes the networking dysfunctional (I guess that's because all > > > of > > > a sudden there is absolutely no working qdisc on the device, although > > > there originally was a default one in the parent). > > > > > > In a nutshell, is this expected behavior or bug? > > > > This is the expected behavior. > > OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default > one upon deletion instead of noop_qdisc, hence I would describe > the situation using the words 'inconsistent' and 'accident' rather than > 'expected'. :) Would a patch that'd unify this in a sense that all qdiscs would assign the default one upon deletion acceptable? Thanks, -- Jiri Kosina SUSE Labs
Re: Deleting child qdisc doesn't reset parent to default qdisc?
On Thu, 14 Apr 2016, Phil Sutter wrote: > OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default > one upon deletion instead of noop_qdisc, hence I would describe > the situation using the words 'inconsistent' and 'accident' rather than > 'expected'. :) Exactly. I'd again like to stress the fact that this configuration works: jikos:~ # tc qdisc show qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms and this (after performing add/delete operation) doesn't: jikos:~ # tc qdisc show qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms It's hard to spot a difference (hint: there is none). Thanks, -- Jiri Kosina SUSE Labs
Re: Deleting child qdisc doesn't reset parent to default qdisc?
On Thu, 14 Apr 2016, Jiri Kosina wrote: > In a nutshell, is this expected behavior or bug? Just to clarify what seems to suggest to me that this is rather a bug that needs to be fixed (but apparently one that has been there for quite a long time) can be demonstrated by this: > > = > jikos:~ # tc qdisc show > qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms The above configuration works. > jikos:~ # ping -c 1 nix.cz | head -2 > PING nix.cz (195.47.235.3) 56(84) bytes of data. > 64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.59 ms > > jikos:~ # tc qdisc add dev eth0 parent 10:1 sfq > jikos:~ # tc qdisc show > qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms > qdisc sfq 8008: dev eth0 parent 10:1 limit 127p quantum 1514b depth 127 > divisor 1024 > > jikos:~ # ping -c 1 nix.cz | head -2 > PING nix.cz (195.47.235.3) 56(84) bytes of data. > 64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.67 ms > > jikos:~ # tc qdisc del dev eth0 parent 10:1 sfq > jikos:~ # tc qdisc show > qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms The above configuration doesn't although it's identical to the working one at the beginning. > jikos:~ # ping -c 1 nix.cz | head -2 > PING nix.cz (195.47.235.3) 56(84) bytes of data. > [ ... nothing happens ... ] > ^C -- Jiri Kosina SUSE Labs
Deleting child qdisc doesn't reset parent to default qdisc?
Hi, I've came across the behavior where adding a child qdisc and then deleting it again makes the networking dysfunctional (I guess that's because all of a sudden there is absolutely no working qdisc on the device, although there originally was a default one in the parent). In a nutshell, is this expected behavior or bug? = jikos:~ # tc qdisc show qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms jikos:~ # ping -c 1 nix.cz | head -2 PING nix.cz (195.47.235.3) 56(84) bytes of data. 64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.59 ms jikos:~ # tc qdisc add dev eth0 parent 10:1 sfq jikos:~ # tc qdisc show qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms qdisc sfq 8008: dev eth0 parent 10:1 limit 127p quantum 1514b depth 127 divisor 1024 jikos:~ # ping -c 1 nix.cz | head -2 PING nix.cz (195.47.235.3) 56(84) bytes of data. 64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.67 ms jikos:~ # tc qdisc del dev eth0 parent 10:1 sfq jikos:~ # tc qdisc show qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms jikos:~ # ping -c 1 nix.cz | head -2 PING nix.cz (195.47.235.3) 56(84) bytes of data. [ ... nothing happens ... ] ^C jikos:~ # tc qdisc add dev eth0 parent 10:1 sfq jikos:~ # ping -c 1 nix.cz | head -2 PING nix.cz (195.47.235.3) 56(84) bytes of data. 64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.66 ms = Thanks, -- Jiri Kosina
Re: [PATCH 1/4] list: introduce list_is_first()
On Thu, 10 Dec 2015, Jens Axboe wrote: > It's a balance, as we also should not make APIs out of everything. As I said, > purely my opinion, but I think the is_last/is_first have jumped the shark. I don't have a strong opinion either way. What I think we should do though, is to either have both (i.e accept this patchset) or have neither of them (i.e. drop list_is_last()). Otherwise people are likely to be confused by such an asymetric API and will keep posting patches for it over and over again. -- Jiri Kosina SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: lockdep warning
On Fri, 22 Feb 2008, Anders Eriksson wrote: I found this is a newly booted 2.6.25-rc2's syslog. Feb 21 20:46:33 tippex BUG: rwlock wrong owner on CPU#0, runscript.sh/2633, d2c04084 Feb 21 20:46:33 tippex Pid: 2633, comm: runscript.sh Not tainted 2.6.25-rc2 #3 Feb 21 20:46:33 tippex [c02342d0] rwlock_bug+0x50/0x60 Feb 21 20:46:33 tippex [c0234356] _raw_write_unlock+0x56/0x60 Feb 21 20:46:33 tippex [c040365d] _write_unlock+0x1d/0x50 Feb 21 20:46:33 tippex [c03289be] neigh_timer_handler+0x18e/0x2d0 Feb 21 20:46:33 tippex [c0125449] run_timer_softirq+0x119/0x180 Feb 21 20:46:33 tippex [c013856d] ? lock_release_holdtime+0x5d/0x80 Feb 21 20:46:33 tippex [c0328830] ? neigh_timer_handler+0x0/0x2d0 Feb 21 20:46:33 tippex [c0121a64] __do_softirq+0x54/0xb0 Feb 21 20:46:33 tippex [c0121af5] do_softirq+0x35/0x40 Feb 21 20:46:33 tippex [c0121cb4] irq_exit+0x44/0x50 Feb 21 20:46:33 tippex [c0105757] do_IRQ+0x47/0x80 Feb 21 20:46:33 tippex [c01039c3] common_interrupt+0x23/0x28 Feb 21 20:46:33 tippex === This needs to be CCed to netdev. Any chance that git revert 69cc64d8d92 makes this report go away? -- Jiri Kosina SUSE Labs -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: lockdep warning
On Fri, 22 Feb 2008, Anders Eriksson wrote: This needs to be CCed to netdev. Any chance that git revert 69cc64d8d92 makes this report go away? I'll have to install a git repo to check, or maybe you can send me the diff to reverse vs. 2.6.25-rc2? diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 7bb6a9a..a16cf1e 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -834,12 +834,18 @@ static void neigh_timer_handler(unsigned long arg) } if (neigh-nud_state (NUD_INCOMPLETE | NUD_PROBE)) { struct sk_buff *skb = skb_peek(neigh-arp_queue); - + /* keep skb alive even if arp_queue overflows */ + if (skb) + skb_get(skb); + write_unlock(neigh-lock); neigh-ops-solicit(neigh, skb); atomic_inc(neigh-probes); - } + if (skb) + kfree_skb(skb); + } else { out: - write_unlock(neigh-lock); + write_unlock(neigh-lock); + } if (notify) neigh_update_notify(neigh); diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c index c663fa5..8e17f65 100644 --- a/net/ipv4/arp.c +++ b/net/ipv4/arp.c @@ -368,6 +368,7 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb) if (!(neigh-nud_stateNUD_VALID)) printk(KERN_DEBUG trying to ucast probe in NUD_INVALID\n); dst_ha = neigh-ha; + read_lock_bh(neigh-lock); } else if ((probes -= neigh-parms-app_probes) 0) { #ifdef CONFIG_ARPD neigh_app_ns(neigh); @@ -377,6 +378,8 @@ static void arp_solicit(struct neighbour *neigh, struct sk_buff *skb) arp_send(ARPOP_REQUEST, ETH_P_ARP, target, dev, saddr, dst_ha, dev-dev_addr, NULL); + if (dst_ha) + read_unlock_bh(neigh-lock); } static int arp_ignore(struct in_device *in_dev, __be32 sip, __be32 tip) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: lockdep warning
On Fri, 22 Feb 2008, Anders Eriksson wrote: Any chance that git revert 69cc64d8d92 makes this report go away? I've tested the patch and I no longer get that lock thing in my syslog. Thanks for verification. Hmm, I don't immediately see how this patch could make neigh-lock owner to change between lock and unlock ... I have skimmed through the solicit methods, and they don't seem to be doing anything nasty to neigh ... The scenario I was thinking about is that before 69cc64d8d92, if any of the _solicit methods could do anything bad to neigh struct, this warning wouldn't trigger, because the lock has been dropped before calling _solicit() and reacquired later, so no mismatch on -current could happen, but now as long as the lock is held during _solicit() call, this would trigger on the next unlock. But I am not able to see anything like that in the code. Dave, do you have any idea? (the thread started at http://lkml.org/lkml/2008/2/22/105). -- Jiri Kosina SUSE Labs -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Subject: [PATCH] Revert [RTNETLINK]: Send a single notification on device state changes.
This reverts commit 45b503548210fe6f23e92b856421c2a3f05fd034. It contains deadlock, and breaks userspace applications (wpa_supplicant, networkmanager). References: http://bugzilla.kernel.org/show_bug.cgi?id=10002 http://bugzilla.kernel.org/show_bug.cgi?id=10002 --- net/core/rtnetlink.c | 36 ++-- 1 files changed, 10 insertions(+), 26 deletions(-) diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index ecb02af..61ac8d0 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -504,7 +504,7 @@ int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst, u32 id, EXPORT_SYMBOL_GPL(rtnl_put_cacheinfo); -static int set_operstate(struct net_device *dev, unsigned char transition, bool send_notification) +static void set_operstate(struct net_device *dev, unsigned char transition) { unsigned char operstate = dev-operstate; @@ -527,12 +527,8 @@ static int set_operstate(struct net_device *dev, unsigned char transition, bool write_lock_bh(dev_base_lock); dev-operstate = operstate; write_unlock_bh(dev_base_lock); - - if (send_notification) - netdev_state_change(dev); - return 1; - } else - return 0; + netdev_state_change(dev); + } } static void copy_rtnl_link_stats(struct rtnl_link_stats *a, @@ -826,7 +822,6 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm, if (tb[IFLA_BROADCAST]) { nla_memcpy(dev-broadcast, tb[IFLA_BROADCAST], dev-addr_len); send_addr_notify = 1; - modified = 1; } if (ifm-ifi_flags || ifm-ifi_change) { @@ -839,23 +834,16 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm, dev_change_flags(dev, flags); } - if (tb[IFLA_TXQLEN]) { - if (dev-tx_queue_len != nla_get_u32(tb[IFLA_TXQLEN])) { - dev-tx_queue_len = nla_get_u32(tb[IFLA_TXQLEN]); - modified = 1; - } - } + if (tb[IFLA_TXQLEN]) + dev-tx_queue_len = nla_get_u32(tb[IFLA_TXQLEN]); if (tb[IFLA_OPERSTATE]) - modified |= set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE]), false); + set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE])); if (tb[IFLA_LINKMODE]) { - if (dev-link_mode != nla_get_u8(tb[IFLA_LINKMODE])) { - write_lock_bh(dev_base_lock); - dev-link_mode = nla_get_u8(tb[IFLA_LINKMODE]); - write_lock_bh(dev_base_lock); - modified = 1; - } + write_lock_bh(dev_base_lock); + dev-link_mode = nla_get_u8(tb[IFLA_LINKMODE]); + write_unlock_bh(dev_base_lock); } err = 0; @@ -869,10 +857,6 @@ errout: if (send_addr_notify) call_netdevice_notifiers(NETDEV_CHANGEADDR, dev); - - if (modified) - netdev_state_change(dev); - return err; } @@ -990,7 +974,7 @@ struct net_device *rtnl_create_link(struct net *net, char *ifname, if (tb[IFLA_TXQLEN]) dev-tx_queue_len = nla_get_u32(tb[IFLA_TXQLEN]); if (tb[IFLA_OPERSTATE]) - set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE]), true); + set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE])); if (tb[IFLA_LINKMODE]) dev-link_mode = nla_get_u8(tb[IFLA_LINKMODE]); -- 1.5.2.4 -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] New Kernel Bugs
On Tue, 13 Nov 2007, Andrew Morton wrote: HID Kernel NULL pointer dereference at :usbhid:hiddev_ioctl+0x2f/0xabc http://bugzilla.kernel.org/show_bug.cgi?id=9216 Kernel: 2.6.23.1 Looks like this is a regression No response from developers Hi, it is assigned to '[EMAIL PROTECTED]', so I didn't notice, it's as simple as that. -- Jiri Kosina - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: PROBLEM: oops, Linus tree: 2.6.23-g4fa4d23f, BUG: unable to handle kernel NULL pointer dereference at virtual address 00000004
successfully arp_tables: (C) 2002 David S. Miller TCP cubic registered NET: Registered protocol family 1 NET: Registered protocol family 10 ip6_tables: (C) 2000-2006 Netfilter Core Team IPv6 over IPv4 tunneling driver NET: Registered protocol family 17 Using IPI Shortcut mode input: ImPS/2 Logitech Wheel Mouse as /devices/platform/i8042/serio1/input/input3 md: Autodetecting RAID arrays. md: Scanned 8 and added 8 devices. md: autorun ... md: considering hdc4 ... md: adding hdc4 ... md: hdc3 has different UUID to hdc4 md: hdc2 has different UUID to hdc4 md: hdc1 has different UUID to hdc4 md: adding hda4 ... md: hda3 has different UUID to hdc4 md: hda2 has different UUID to hdc4 md: hda1 has different UUID to hdc4 md: created md4 md: bindhda4 md: bindhdc4 md: running: hdc4hda4 raid1: raid set md4 active with 2 out of 2 mirrors md4: bitmap initialized from disk: read 11/11 pages, set 6 bits created bitmap (169 pages) for device md4 md: considering hdc3 ... md: adding hdc3 ... md: hdc2 has different UUID to hdc3 md: hdc1 has different UUID to hdc3 md: adding hda3 ... md: hda2 has different UUID to hdc3 md: hda1 has different UUID to hdc3 md: created md3 md: bindhda3 md: bindhdc3 md: running: hdc3hda3 raid1: raid set md3 active with 2 out of 2 mirrors md3: bitmap initialized from disk: read 15/15 pages, set 0 bits created bitmap (233 pages) for device md3 md: considering hdc2 ... md: adding hdc2 ... md: hdc1 has different UUID to hdc2 md: adding hda2 ... md: hda1 has different UUID to hdc2 md: created md2 md: bindhda2 md: bindhdc2 md: running: hdc2hda2 raid1: raid set md2 active with 2 out of 2 mirrors md2: bitmap initialized from disk: read 8/8 pages, set 0 bits created bitmap (123 pages) for device md2 md: considering hdc1 ... md: adding hdc1 ... md: adding hda1 ... md: created md1 md: bindhda1 md: bindhdc1 md: running: hdc1hda1 raid1: raid set md1 active with 2 out of 2 mirrors md1: bitmap initialized from disk: read 10/10 pages, set 0 bits created bitmap (153 pages) for device md1 md: ... autorun DONE. EXT3-fs: INFO: recovery required on readonly filesystem. EXT3-fs: write access will be enabled during recovery. kjournald starting. Commit interval 5 seconds EXT3-fs: recovery complete. EXT3-fs: mounted filesystem with ordered data mode. VFS: Mounted root (ext3 filesystem) readonly. Freeing unused kernel memory: 264k freed EXT3 FS on md4, internal journal kjournald starting. Commit interval 5 seconds EXT3 FS on md3, internal journal EXT3-fs: mounted filesystem with ordered data mode. Adding 192k swap on /dev/md2. Priority:-1 extents:1 across:192k eth0: link up, 100Mbps, full-duplex, lpa 0xC5E1 eth1: link up, 100Mbps, full-duplex, lpa 0x45E1 eth0: no IPv6 routers present eth1: no IPv6 routers present BUG: unable to handle kernel NULL pointer dereference at virtual address 0004 printing eip: c0383fa0 *pde = Oops: [#1] CPU:0 EIP:0060:[c0383fa0]Not tainted VLI EFLAGS: 00010282 (2.6.23-g4fa4d23f #4) EIP is at sk_filter_delayed_uncharge+0x0/0x20 eax: c5dfb600 ebx: ecx: edx: esi: c5dfb600 edi: ebp: c170d300 esp: c5dcbef8 ds: 007b es: 007b fs: gs: 0033 ss: 0068 Process dhcpd (pid: 5380, ti=c5dca000 task=c47f0520 task.ti=c5dca000) Stack: c038410f 0068 0058 c5dcbf34 b7fd000b c1b75000 c5dfb600 c0371d19 c1b70fd8 c180d300 c1b70f80 0007 0001 b7fd000b c170d900 b7fd000b 080c9fa0 000e c5dcbf6c c036dbc2 000e c1b75000 0008 Call Trace: [c038410f] sk_attach_filter+0xdf/0x100 [c0371d19] sock_setsockopt+0x509/0x590 [c036dbc2] sockfd_lookup_light+0x32/0x60 [c036dd8b] sys_setsockopt+0x9b/0xb0 [c036f705] sys_socketcall+0xd5/0x280 [c010503e] sysenter_past_esp+0x5f/0x85 === Code: cb 04 76 c7 83 c1 01 83 ee 01 39 d1 75 8d eb d7 83 7c cb 04 0f 77 b4 83 c1 01 83 ee 01 39 d1 0f 85 76 ff ff ff eb c0 8d 74 26 00 8b 4a 04 8d 0c cd 10 00 00 00 29 48 5c 8d 42 08 ba 10 40 38 c0 EIP: [c0383fa0] sk_filter_delayed_uncharge+0x0/0x20 SS:ESP 0068:c5dcbef8 -- Jiri Kosina - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bluetooth wireless mouse very sluggish when there is moderated network activity
On Sat, 13 Oct 2007, Ismail Dönmez wrote: input: Microsoft Microsoft? Wireless Notebook Presenter Mouse 8000 as /devices/pci:00/:00:1d.1/usb3/3-1/3-1.3/3-1.3:1.0/input/input12 input: USB HID v1.11 Mouse [Microsoft Microsoft? Wireless Notebook Presenter Mouse 8000] on usb-:00:1d.1-1.3 Problem is when there is a moderate traffic on wireless interface (300-400k/s down 50-60k/up on 4Mbit) mouse is really sluggish. This is when I am running ktorrent. If I stop the torrent download/upload its fine again. P.S: If it matters I am using ipw3945 driver with 2.6.18 kernel and iwl3945 with the newer ones. Hi Ismail, it would be great if you'd be able to isolate what is the cause in a little more detailed way - i.e. try a different network card in the same machine, try a different bluetooth mouse in the same machine, try the very same mouse and network card in a different machine, etc. Otherwise I am afraid this is going to be just a totally wild guessing. Thanks, -- Jiri Kosina
[PATCH] IPV6: fix source address selection
From: Jiri Kosina [EMAIL PROTECTED] [PATCH] IPV6: fix source address selection The commit 95c385 broke proper source address selection for cases in which there is a address which is makred 'deprecated'. The commit mistakenly changed ifa-flags to ifa_result-flags (probably copy/paste error from a few lines above) in the 'Rule 3' address selection code. The patch below restores the previous RFC-compliant behavior, please apply. Cc: Jiri Bohac [EMAIL PROTECTED] Cc: Petr Baudis [EMAIL PROTECTED] Signed-off-by: Jiri Kosina [EMAIL PROTECTED] diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 91ef3be..45b4c82 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -1021,7 +1021,7 @@ int ipv6_dev_get_saddr(struct net_device hiscore.rule++; } if (ipv6_saddr_preferred(score.addr_type) || - (((ifa_result-flags + (((ifa-flags (IFA_F_DEPRECATED|IFA_F_OPTIMISTIC)) == 0))) { score.attrs |= IPV6_SADDR_SCORE_PREFERRED; if (!(hiscore.attrs IPV6_SADDR_SCORE_PREFERRED)) { - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [374/2many] MAINTAINERS - PCNET32 NETWORK DRIVER
On Sun, 12 Aug 2007, David Miller wrote: Ok, 374 patches is just rediculious. So many patches eats up an enormous amount of mailing list resources, and for these patches in particular there are few reasons to split them up at all. The fact that the split up landed you at 300+ patches is a very good indication of that. Actually, I can see 546 patches up to now. Horrid. Besides that, it's very unfortunate that every single mail of this ridiculously hge patch series starts a new thread (no 'References' or 'In-Reply-To' header), so it's not possible to nuke it at once in an usual way. -- Jiri Kosina - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Weird network problems with 2.6.23-rc2
On Sat, 11 Aug 2007, Shish wrote: Something seems to have broken in 2.6.23-rc2, and I'm not sure what, or where I should look for further debugging. The info I have: On my 2.6.23-rc2 desktop, things run fine. On my test server, built from the same source tree, networking goes strange every few minutes, with the following symptoms: o) running ping against the server, the first ping goes through; further pings go AWOL until about icmp_seq=30, when I get 4-5 icmp replies (marked as DUP!), then no pings for a while, then dups, and so on. o) the server doesn't see ARP replies. According to tcpdump, the server will send eg who has 192.168.0.2? tell 192.168.0.1; the client in question will recieve the packet and send a response, but nothing shows up in the server-side tcpdump. o) after a few minutes of random network troubles, everything will work fine again, (ping is normal, arp replies are seen, tcp sessions work) for a few minutes. o) The server's dmesg shows lots of short udp packet messages o) ifdown then ifup'ing the interfaces fixes things, temporarily. Reverting to 2.6.22, everything seems to be running fine (but no lguest, which is what I came for :( ) I've also tried with the latest code from git, the behaviour is the same as 2.6.23-rc2. This needs to go to netdev, CC added. Also, git-bisect will help a lot here to find the commit which caused the regression you are seeing. -- Jiri Kosina - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [git patches] net driver updates
On Tue, 10 Jul 2007, Jeff Garzik wrote: Various minor updates. The only thing of note is sk98lin driver removal. Please pull from 'upstream-linus' branch of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git upstream-linus This is probably going to cause a new entry in regressions list, as at least Chris Stromsoe has reported bonding-related problems with skge that don't happen with sklin - http://www.uwsg.indiana.edu/hypermail/linux/kernel/0707.1/1158.html -- Jiri Kosina - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.21-rc7: BUG: sleeping function called from invalid context at net/core/sock.c:1523
On Fri, 11 May 2007, Satyam Sharma wrote: (later) I Googled a bit to see if this problem was faced elsewhere in the kernel too. Saw the following commit by Ingo Molnar (9883a13c72dbf8c518814b6091019643cdb34429): - lock_sock(sock-sk); + local_bh_disable(); + bh_lock_sock_nested(sock-sk); rc = selinux_netlbl_socket_setsid(sock, sksec-sid); - release_sock(sock-sk); + bh_unlock_sock(sock-sk); + local_bh_enable(); Is it _really_ *this* simple? Hi Satyam, actually this *seems* to be proper solution also for our case, thanks for pointing this out. I will think about it once again, do some more tests with this locking scheme, and will let you know. Thanks, -- Jiri Kosina - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.21-rc7: BUG: sleeping function called from invalid context at net/core/sock.c:1523
On Wed, 16 May 2007, Marcel Holtmann wrote: since Jiri has a good test case for it, I leave it to him for testing. If he confirms that this fixes the locking issues, then this is Signed-off-by: Marcel Holtmann [EMAIL PROTECTED] I will verify later this evening and will let you know. I am however pretty convinced now that this is the right fix. Thanks, -- Jiri Kosina - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.21-rc7: BUG: sleeping function called from invalid context at net/core/sock.c:1523
On Wed, 16 May 2007, Jiri Kosina wrote: since Jiri has a good test case for it, I leave it to him for testing. If he confirms that this fixes the locking issues, then this is Signed-off-by: Marcel Holtmann [EMAIL PROTECTED] I will verify later this evening and will let you know. I am however pretty convinced now that this is the right fix. Satyam, I have just verified that this locking scheme is indeed correct. So you can add Signed-off-by: Jiri Kosina [EMAIL PROTECTED] if you wish to, and submit the patch to Andrew. Thanks, -- Jiri Kosina - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.21-rc7: BUG: sleeping function called from invalid context at net/core/sock.c:1523
On Wed, 16 May 2007, David Miller wrote: I have just verified that this locking scheme is indeed correct. So you can add Signed-off-by: Jiri Kosina [EMAIL PROTECTED] if you wish to, and submit the patch to Andrew. I guess I don't get sent networking patches any more? :-) Well, this is bluetooth-specific, but it seemed to me that Marcel wasn't going to send pull requests to Linus any time soon, therefore I thought going through akpm is a thing to do. Honestly, I really don't care through which tree this goes in, so sorry if any offence was caused here :) -- Jiri Kosina - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Getting make net/built-in.o Error with 2.6.21.1 Build
On Tue, 8 May 2007, Satyam Sharma wrote: Sure, my aim here was to only solve the _build breakage_ by fixing the Kconfig for this module (that used code from another kernel module without listing it in its dependencies). If, as you say, the real solution is that we should actually be taking out the offending call to the other module itself, then please go ahead -- I don't know much about the Bluetooth / HIDP subsytem anyway. Converting the hid-ff drivers to be also transport-independent is on my TODO list, but it didn't happen yet. Marcel - are you aware of any devices currently supported by USB HID force-feedback code, which have a bluetooth version, please? I'd propose the patch below, until I make the usbhid force-feedback code transport independent. Thanks. From: Jiri Kosina [EMAIL PROTECTED] [Bluetooth] HIDP - don't initialize force feedback The current implementation of force feedback for HID devices is USB-transport only and therefore calling hid_ff_init() from hidp code is not going to work (plus it creates unwanted dependency of hidp on usbhid). Remove the hid_ff_init() until either the hid-ff is made transport-independent, or at least support for bluetooth transport is added. Signed-off-by: Jiri Kosina [EMAIL PROTECTED] diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c index d342e89..3e77e81 100644 --- a/net/bluetooth/hidp/core.c +++ b/net/bluetooth/hidp/core.c @@ -737,10 +737,8 @@ static inline void hidp_setup_hid(struct list_for_each_entry(report, hid-report_enum[HID_FEATURE_REPORT].report_list, list) hidp_send_report(session, report); - if (hidinput_connect(hid) == 0) { + if (hidinput_connect(hid) == 0) hid-claimed |= HID_CLAIMED_INPUT; - hid_ff_init(hid); - } } int hidp_add_connection(struct hidp_connadd_req *req, struct socket *ctrl_sock, struct socket *intr_sock) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Getting make net/built-in.o Error with 2.6.21.1 Build
On Tue, 8 May 2007, Marcel Holtmann wrote: Marcel - are you aware of any devices currently supported by USB HID force-feedback code, which have a bluetooth version, please? I haven't looked at all details for the PS3 controller, but that might be the first one. In theory they can and at some point they will enter the market. You are right, PS3 controller is going to be shipped in both variants. On the other hand it is perfectly possible that we will need special force-feedback driver for it anyway. BTW when talking about this - we already have PS3 quirk present in usb hid (extra control URB is required to make it operational), probably something similar will be needed for BT version too. From: Jiri Kosina [EMAIL PROTECTED] [Bluetooth] HIDP - don't initialize force feedback The current implementation of force feedback for HID devices is USB-transport only and therefore calling hid_ff_init() from hidp code is not going to work (plus it creates unwanted dependency of hidp on usbhid). Remove the hid_ff_init() until either the hid-ff is made transport-independent, or at least support for bluetooth transport is added. Signed-off-by: Jiri Kosina [EMAIL PROTECTED] Signed-off-by: Marcel Holtmann [EMAIL PROTECTED] Under the condition that you remember to put it back once a generic FF exists. Sure. I will take this through my tree then, thanks. -- Jiri Kosina - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.21-rc7: BUG: sleeping function called from invalid context at net/core/sock.c:1523
On Tue, 24 Apr 2007, Herbert Xu wrote: Hmm, *sigh*. I guess the patch below fixes the problem, but it is a masterpiece in the field of ugliness. And I am not sure whether it is completely correct either. Are there any immediate ideas for better solution with respect to how struct sock locking works? Please cc such patches to netdev. Thanks. Hi Herbert, well it's pretty much bluetooth-specific, and bluez-devel was CCed, but OK. diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c index 71f5cfb..c5c93cd 100644 --- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -656,7 +656,10 @@ static int hci_sock_dev_event(struct notifier_block *this, unsigned long event, /* Detach sockets from device */ read_lock(hci_sk_list.lock); sk_for_each(sk, node, hci_sk_list.head) { - lock_sock(sk); + if (in_atomic()) + bh_lock_sock(sk); + else + lock_sock(sk); This doesn't do what you think it does. bh_lock_sock can still succeed even with lock_sock held by someone else. I know, this was precisely the reason why I converted the bh_lock_sock() to lock_sock() here some time ago (as it was racy with l2cap_connect_cfm()). Does this need to occur immediately when an event occurs? If not I'd suggest moving this into a workqueue. Will have to check whether this will be processed properly in time when going to suspend. Thanks, -- Jiri Kosina - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: r8169 problems on x86_64 (transmit timeout crash)
On Fri, 26 Jan 2007, mirek kratochvil wrote: In case you have BIOS 04.., try to upgrade to 06.. version. Also, does it help when you boot with acpi=off kernel commandline parameter? (do you compile kernel with both acpi and apic support?). Upgraded to bios 0802, but still the same problem. I think it's definitely something about the r8169 driver. acpi=off doesn't help. turning on/off apic also doesn't matter. Added some relevant CCs; the thread started in lkml at http://lkml.org/lkml/2007/1/25/268 one thing more - have confirmed that the bug is reproducible on *any* laptop with 8168, not only Asus. I guess the disk-usage-crash was some other bug. Confusing ... the other day you stated that r1000 has the exactly same problem? Thanks, -- Jiri Kosina SUSE Labs - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly
[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly Fix missing handling of pci_enable_device() return value in skge_resume() Signed-off-by: Jiri Kosina [EMAIL PROTECTED] --- drivers/net/sk98lin/skge.c |6 +- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c index 99e9262..e12fb62 100644 --- a/drivers/net/sk98lin/skge.c +++ b/drivers/net/sk98lin/skge.c @@ -5070,7 +5070,11 @@ static int skge_resume(struct pci_dev *p pci_set_power_state(pdev, PCI_D0); pci_restore_state(pdev); - pci_enable_device(pdev); + if ((ret = pci_enable_device(pdev))) { + printk(KERN_ERR sk98lin: Cannot enable PCI device during resume\n); + unregister_netdev(dev); + return ret; + } pci_set_master(pdev); if (pAC-GIni.GIMacsFound == 2) ret = request_irq(dev-irq, SkGeIsr, IRQF_SHARED, sk98lin, dev); -- Jiri Kosina - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly
On Thu, 12 Oct 2006, Stephen Hemminger wrote: pci_set_power_state(pdev, PCI_D0); pci_restore_state(pdev); - pci_enable_device(pdev); + if ((ret = pci_enable_device(pdev))) { + printk(KERN_ERR sk98lin: Cannot enable PCI device during resume\n); + unregister_netdev(dev); Having the device unregister seems harsh. What would be the proper way? As the initialization failed, accessing the device would not make sense any more (therefore I don't think that calling skge_remove_one() would be OK, as it issues calls to SkEventQueue() and SkEventDispatcher(), trying to send something to the card). Why put condtional on same line? Pardon me? Why not print device name dev-name. Thanks. [PATCH] fix sk98lin driver, ignoring return value from pci_enable_device() add check of return value to _resume() function of sk98lin driver. Signed-off-by: Jiri Kosina [EMAIL PROTECTED] --- drivers/net/sk98lin/skge.c |6 +- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c index d4913c3..1f03cf8 100644 --- a/drivers/net/sk98lin/skge.c +++ b/drivers/net/sk98lin/skge.c @@ -5070,7 +5070,11 @@ static int skge_resume(struct pci_dev *p pci_set_power_state(pdev, PCI_D0); pci_restore_state(pdev); - pci_enable_device(pdev); + if ((ret = pci_enable_device(pdev))) { + printk(KERN_ERR sk98lin: Cannot enable PCI device %s during resume\n, + dev-name); + return ret; + } pci_set_master(pdev); if (pAC-GIni.GIMacsFound == 2) ret = request_irq(dev-irq, SkGeIsr, IRQF_SHARED, sk98lin, dev); -- Jiri Kosina - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly
On Thu, 12 Oct 2006, Stephen Hemminger wrote: Having the device unregister seems harsh. What would be the proper way? As the initialization failed, accessing the device would not make sense any more (therefore I don't think that calling skge_remove_one() would be OK, as it issues calls to SkEventQueue() and SkEventDispatcher(), trying to send something to the card). I guess, its just not clear what the state of the machine is anyway if you can't enable the device something is hosed (or the device was hot removed). Well, it depends on definition of 'hot'. What would for example happen in the case suspend-to-disk - remove the card when the machine is switched off - resume-from-disk? I guess that exactly this pci_enable_device() will fail, so we definitely have to handle this case, as it can easily happen. Why put condtional on same line? Pardon me? I prefer: ret = pci_enable_device(pdev); As you wish. [PATCH] fix sk98lin driver, ignoring return value from pci_enable_device() add check of return value to _resume() function of sk98lin driver. Signed-off-by: Jiri Kosina [EMAIL PROTECTED] --- drivers/net/sk98lin/skge.c |8 +++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c index d4913c3..d691811 100644 --- a/drivers/net/sk98lin/skge.c +++ b/drivers/net/sk98lin/skge.c @@ -5070,7 +5070,13 @@ static int skge_resume(struct pci_dev *p pci_set_power_state(pdev, PCI_D0); pci_restore_state(pdev); - pci_enable_device(pdev); + ret = pci_enable_device(pdev); + if (ret) { + printk(KERN_ERR sk98lin: Cannot enable PCI device %s during resume\n, + dev-name); + unregister_netdev(dev); + return ret; + } pci_set_master(pdev); if (pAC-GIni.GIMacsFound == 2) ret = request_irq(dev-irq, SkGeIsr, IRQF_SHARED, sk98lin, dev); -- Jiri Kosina - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly
On Thu, 12 Oct 2006, Andrew Morton wrote: pci_set_power_state(pdev, PCI_D0); pci_restore_state(pdev); - pci_enable_device(pdev); + ret = pci_enable_device(pdev); + if (ret) { + printk(KERN_ERR sk98lin: Cannot enable PCI device %s during resume\n, + dev-name); + unregister_netdev(dev); This looks rather wrong - skge_exit() will run unregister_netdev() again. You are of course right (the problem was also spotted by Russell King). This I believe is the correct one for the sk98lin case. [PATCH] fix sk98lin driver, ignoring return value from pci_enable_device() add check of return value to _resume() function of sk98lin driver. Signed-off-by: Jiri Kosina [EMAIL PROTECTED] --- drivers/net/sk98lin/skge.c | 20 +++- 1 files changed, 15 insertions(+), 5 deletions(-) diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c index d4913c3..3a9323d 100644 --- a/drivers/net/sk98lin/skge.c +++ b/drivers/net/sk98lin/skge.c @@ -5070,7 +5070,12 @@ static int skge_resume(struct pci_dev *p pci_set_power_state(pdev, PCI_D0); pci_restore_state(pdev); - pci_enable_device(pdev); + ret = pci_enable_device(pdev); + if (ret) { + printk(KERN_WARNING sk98lin: unable to enable device %s in resume\n, + dev-name); + goto out_err; + } pci_set_master(pdev); if (pAC-GIni.GIMacsFound == 2) ret = request_irq(dev-irq, SkGeIsr, IRQF_SHARED, sk98lin, dev); @@ -5078,10 +5083,8 @@ static int skge_resume(struct pci_dev *p ret = request_irq(dev-irq, SkGeIsrOnePort, IRQF_SHARED, sk98lin, dev); if (ret) { printk(KERN_WARNING sk98lin: unable to acquire IRQ %d\n, dev-irq); - pAC-AllocFlag = ~SK_ALLOC_IRQ; - dev-irq = 0; - pci_disable_device(pdev); - return -EBUSY; + ret = -EBUSY; + goto out_err; } netif_device_attach(dev); @@ -5098,6 +5101,13 @@ static int skge_resume(struct pci_dev *p } return 0; +out_err: + pAC-AllocFlag = ~SK_ALLOC_IRQ; + dev-irq = 0; + pci_disable_device(pdev); + + return ret; + } #else #define skge_suspend NULL -- Jiri Kosina - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html