Re: [PATCH net] net: sched: red: avoid hashing NULL child

2018-05-18 Thread Jiri Kosina
On Fri, 18 May 2018, Paolo Abeni wrote:

> Hangbin reported an Oops triggered by the syzkaller qdisc rules:
> 
>  kasan: GPF could be caused by NULL-ptr deref or user memory access
>  general protection fault:  [#1] SMP KASAN PTI
>  Modules linked in: sch_red
>  CPU: 0 PID: 28699 Comm: syz-executor5 Not tainted 4.17.0-rc4.kcov #1
>  Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
>  RIP: 0010:qdisc_hash_add+0x26/0xa0
>  RSP: 0018:8800589cf470 EFLAGS: 00010203
>  RAX: dc00 RBX:  RCX: 824ad971
>  RDX: 0007 RSI: c9000ce9f000 RDI: 003c
>  RBP: 0001 R08: ed000b139ea2 R09: 8800589cf4f0
>  R10: 8800589cf50f R11: ed000b139ea2 R12: 880054019fc0
>  R13: 880054019fb4 R14: 88005c0af600 R15: 880054019fb0
>  FS:  7fa6edcb1700() GS:88005ce0() knlGS:
>  CS:  0010 DS:  ES:  CR0: 80050033
>  CR2: 2740 CR3: 0fc16000 CR4: 06f0
>  DR0:  DR1:  DR2: 
>  DR3:  DR6: fffe0ff0 DR7: 0400
>  Call Trace:
>   red_change+0x2d2/0xed0 [sch_red]
>   qdisc_create+0x57e/0xef0
>   tc_modify_qdisc+0x47f/0x14e0
>   rtnetlink_rcv_msg+0x6a8/0x920
>   netlink_rcv_skb+0x2a2/0x3c0
>   netlink_unicast+0x511/0x740
>   netlink_sendmsg+0x825/0xc30
>   sock_sendmsg+0xc5/0x100
>   ___sys_sendmsg+0x778/0x8e0
>   __sys_sendmsg+0xf5/0x1b0
>   do_syscall_64+0xbd/0x3b0
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
>  RIP: 0033:0x450869
>  RSP: 002b:7fa6edcb0c48 EFLAGS: 0246 ORIG_RAX: 002e
>  RAX: ffda RBX: 7fa6edcb16b4 RCX: 00450869
>  RDX:  RSI: 20c0 RDI: 0013
>  RBP: 0072bea0 R08:  R09: 
>  R10:  R11: 0246 R12: 
>  R13: 8778 R14: 00702838 R15: 7fa6edcb1700
>  Code: e9 0b fe ff ff 0f 1f 44 00 00 55 53 48 89 fb 89 f5 e8 3f 07 f3 fe 48 
> 8d 7b 3c 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 
> 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 51
>  RIP: qdisc_hash_add+0x26/0xa0 RSP: 8800589cf470
> 
> When a red qdisc is updated with a 0 limit, the child qdisc is left
> unmodified, no additional scheduler is created in red_change(),
> the 'child' local variable is rightfully NULL and must not add it
> to the hash table.
> 
> This change addresses the above issue moving qdisc_hash_add() right
> after the child qdisc creation. It additionally removes unneeded checks
> for noop_qdisc.
> 
> Reported-by: Hangbin Liu <liuhang...@gmail.com>
> Fixes: 49b499718fa1 ("net: sched: make default fifo qdiscs appear in the 
> dump")

Acked-by: Jiri Kosina <jkos...@suse.cz>

Thanks for fixing this Paolo.

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH 00/18] prevent bounds-check bypass via speculative execution

2018-01-11 Thread Jiri Kosina
On Tue, 9 Jan 2018, Josh Poimboeuf wrote:

> On Tue, Jan 09, 2018 at 11:44:05AM -0800, Dan Williams wrote:
> > On Tue, Jan 9, 2018 at 11:34 AM, Jiri Kosina <ji...@kernel.org> wrote:
> > > On Fri, 5 Jan 2018, Dan Williams wrote:
> > >
> > > [ ... snip ... ]
> > >> Andi Kleen (1):
> > >>   x86, barrier: stop speculation for failed access_ok
> > >>
> > >> Dan Williams (13):
> > >>   x86: implement nospec_barrier()
> > >>   [media] uvcvideo: prevent bounds-check bypass via speculative 
> > >> execution
> > >>   carl9170: prevent bounds-check bypass via speculative execution
> > >>   p54: prevent bounds-check bypass via speculative execution
> > >>   qla2xxx: prevent bounds-check bypass via speculative execution
> > >>   cw1200: prevent bounds-check bypass via speculative execution
> > >>   Thermal/int340x: prevent bounds-check bypass via speculative 
> > >> execution
> > >>   ipv6: prevent bounds-check bypass via speculative execution
> > >>   ipv4: prevent bounds-check bypass via speculative execution
> > >>   vfs, fdtable: prevent bounds-check bypass via speculative execution
> > >>   net: mpls: prevent bounds-check bypass via speculative execution
> > >>   udf: prevent bounds-check bypass via speculative execution
> > >>   userns: prevent bounds-check bypass via speculative execution
> > >>
> > >> Mark Rutland (4):
> > >>   asm-generic/barrier: add generic nospec helpers
> > >>   Documentation: document nospec helpers
> > >>   arm64: implement nospec_ptr()
> > >>   arm: implement nospec_ptr()
> > >
> > > So considering the recent publication of [1], how come we all of a sudden
> > > don't need the barriers in ___bpf_prog_run(), namely for LD_IMM_DW and
> > > LDX_MEM_##SIZEOP, and something comparable for eBPF JIT?
> > >
> > > Is this going to be handled in eBPF in some other way?
> > >
> > > Without that in place, and considering Jann Horn's paper, it would seem
> > > like PTI doesn't really lock it down fully, right?
> > 
> > Here is the latest (v3) bpf fix:
> > 
> > https://patchwork.ozlabs.org/patch/856645/
> > 
> > I currently have v2 on my 'nospec' branch and will move that to v3 for
> > the next update, unless it goes upstream before then.

Daniel, I guess you're planning to send this still for 4.15?

> That patch seems specific to CONFIG_BPF_SYSCALL.  Is the bpf() syscall
> the only attack vector?  Or are there other ways to run bpf programs
> that we should be worried about?

Seems like Alexei is probably the only person in the whole universe who 
isn't CCed here ... let's fix that.

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH 00/18] prevent bounds-check bypass via speculative execution

2018-01-09 Thread Jiri Kosina
On Fri, 5 Jan 2018, Dan Williams wrote:

[ ... snip ... ]
> Andi Kleen (1):
>   x86, barrier: stop speculation for failed access_ok
> 
> Dan Williams (13):
>   x86: implement nospec_barrier()
>   [media] uvcvideo: prevent bounds-check bypass via speculative execution
>   carl9170: prevent bounds-check bypass via speculative execution
>   p54: prevent bounds-check bypass via speculative execution
>   qla2xxx: prevent bounds-check bypass via speculative execution
>   cw1200: prevent bounds-check bypass via speculative execution
>   Thermal/int340x: prevent bounds-check bypass via speculative execution
>   ipv6: prevent bounds-check bypass via speculative execution
>   ipv4: prevent bounds-check bypass via speculative execution
>   vfs, fdtable: prevent bounds-check bypass via speculative execution
>   net: mpls: prevent bounds-check bypass via speculative execution
>   udf: prevent bounds-check bypass via speculative execution
>   userns: prevent bounds-check bypass via speculative execution
> 
> Mark Rutland (4):
>   asm-generic/barrier: add generic nospec helpers
>   Documentation: document nospec helpers
>   arm64: implement nospec_ptr()
>   arm: implement nospec_ptr()

So considering the recent publication of [1], how come we all of a sudden 
don't need the barriers in ___bpf_prog_run(), namely for LD_IMM_DW and 
LDX_MEM_##SIZEOP, and something comparable for eBPF JIT?

Is this going to be handled in eBPF in some other way?

Without that in place, and considering Jann Horn's paper, it would seem 
like PTI doesn't really lock it down fully, right?

[1] https://bugs.chromium.org/p/project-zero/issues/attachmentText?aid=287305

-- 
Jiri Kosina
SUSE Labs


Re: [Patch net] net_sched: check noop_qdisc before qdisc_hash_add()

2017-04-06 Thread Jiri Kosina
On Tue, 4 Apr 2017, Cong Wang wrote:

> Dmitry reported a crash when injecting faults in
> attach_one_default_qdisc() and dev->qdisc is still
> a noop_disc, the check before qdisc_hash_add() fails
> to catch it because it tests NULL. We should test
> against noop_qdisc since it is the default qdisc
> at this point.
> 
> Fixes: 59cc1f61f09c ("net: sched: convert qdisc linked list to hashtable")
> Reported-by: Dmitry Vyukov <dvyu...@google.com>
> Cc: Jiri Kosina <jkos...@suse.cz>
> Signed-off-by: Cong Wang <xiyou.wangc...@gmail.com>

Acked-by: Jiri Kosina <jkos...@suse.cz>

Thanks.

> ---
>  net/sched/sch_generic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index b052b27..1a2f9e9 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -794,7 +794,7 @@ static void attach_default_qdiscs(struct net_device *dev)
>   }
>   }
>  #ifdef CONFIG_NET_SCHED
> - if (dev->qdisc)
> +     if (dev->qdisc != _qdisc)
>   qdisc_hash_add(dev->qdisc);
>  #endif
>  }
> -- 
> 2.5.5
> 

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH v2 1/2] net: sched: make default fifo qdiscs appear in the dump

2017-03-08 Thread Jiri Kosina
On Wed, 8 Mar 2017, Eric Dumazet wrote:

> > +++ b/net/sched/sch_qfq.c
> > @@ -494,6 +494,8 @@ static int qfq_change_class(struct Qdisc *sch, u32 
> > classid, u32 parentid,
> > goto destroy_class;
> > }
> >  
> > +   if (cl->qdisc != _qdisc)
> > +   qdisc_hash_add(cl->qdisc, true);
> 
> 
> Please move the test in qdisc_hash_add() instead of copy/pasting it all
> over the places ?

Well, qdisc_hash_add() has a WARN_ON() (inherited from what 
qdisc_list_add() used to do) for that particular case to catch cases where 
singleton qdisc would make it there from other places by mistake. By 
putting this test there we'll effectively giving up on this warning should 
it ever point to a bug.

Thanks,

-- 
Jiri Kosina
SUSE Labs



[PATCH v3 1/2] net: sched: make default fifo qdiscs appear in the dump

2017-03-08 Thread Jiri Kosina
From: Jiri Kosina <jkos...@suse.cz>

The original reason [1] for having hidden qdiscs (potential scalability
issues in qdisc_match_from_root() with single linked list in case of large
amount of qdiscs) has been invalidated by 59cc1f61f0 ("net: sched: convert
qdisc linked list to hashtable").

This allows us for bringing more clarity and determinism into the dump by
making default pfifo qdiscs visible.

We're not turning this on by default though, at it was deemed [2] too
intrusive / unnecessary change of default behavior towards userspace.
Instead, TCA_DUMP_INVISIBLE netlink attribute is introduced, which allows
applications to request complete qdisc hierarchy dump, including the
ones that have always been implicit/invisible.

Singleton noop_qdisc stays invisible, as teaching the whole infrastructure
about singletons would require quite some surgery with very little gain
(seeing no qdisc or seeing noop qdisc in the dump is probably setting
the same user expectation).

[1] 
http://lkml.kernel.org/r/1460732328.10638.74.ca...@edumazet-glaptop3.roam.corp.google.com
[2] 
http://lkml.kernel.org/r/20161021.105935.1907696543877061916.da...@davemloft.net

Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---

v2 -> v3: get rid of uapi breakage by changing value of TCA_PAD (thanks a
  lot to Jiri Pirko for catching my brainfart)

v1 -> v2: introduce exception for singleton noop_qdisc

 include/net/pkt_sched.h|  2 +-
 include/net/sch_generic.h  |  1 +
 include/uapi/linux/rtnetlink.h |  1 +
 net/sched/sch_api.c| 42 ++
 net/sched/sch_cbq.c|  5 +
 net/sched/sch_drr.c|  2 ++
 net/sched/sch_dsmark.c |  2 ++
 net/sched/sch_generic.c|  2 +-
 net/sched/sch_hfsc.c   |  4 
 net/sched/sch_htb.c|  2 ++
 net/sched/sch_mq.c |  2 +-
 net/sched/sch_mqprio.c |  2 +-
 net/sched/sch_multiq.c |  2 ++
 net/sched/sch_prio.c   |  5 -
 net/sched/sch_qfq.c|  2 ++
 net/sched/sch_red.c|  2 ++
 net/sched/sch_sfb.c|  2 ++
 net/sched/sch_tbf.c|  2 ++
 18 files changed, 65 insertions(+), 17 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index cd334c9584e9..0625eac2c601 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,7 +90,7 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q, bool invisible);
 void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index e6aa0a249672..e7dca250d115 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -66,6 +66,7 @@ struct Qdisc {
 #define TCQ_F_NOPARENT 0x40 /* root of its hierarchy :
  * qdisc_tree_decrease_qlen() should stop.
  */
+#define TCQ_F_INVISIBLE0x80 /* invisible by default in dump */
u32 limit;
const struct Qdisc_ops  *ops;
struct qdisc_size_table __rcu *stab;
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 262f0379d83a..be034cc0e4e4 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -543,6 +543,7 @@ enum {
TCA_STATS2,
TCA_STAB,
TCA_PAD,
+   TCA_DUMP_INVISIBLE,
__TCA_MAX
 };
 
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 206dc24add3a..8e4e6ab1847a 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -274,7 +274,7 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc 
*root, u32 handle)
return NULL;
 }
 
-void qdisc_hash_add(struct Qdisc *q)
+void qdisc_hash_add(struct Qdisc *q, bool invisible)
 {
if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
struct Qdisc *root = qdisc_dev(q)->qdisc;
@@ -282,6 +282,8 @@ void qdisc_hash_add(struct Qdisc *q)
WARN_ON_ONCE(root == _qdisc);
ASSERT_RTNL();
hash_add_rcu(qdisc_dev(q)->qdisc_hash, >hash, q->handle);
+   if (invisible)
+   q->flags |= TCQ_F_INVISIBLE;
}
 }
 EXPORT_SYMBOL(qdisc_hash_add);
@@ -1004,7 +1006,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
goto err_out4;
}
 
-   qdisc_hash_add(sch);
+   qdisc_hash_add(sch, false);
 
return sch;
}
@@ -1400,9 +1402,14 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct 
Qdisc *q, u32 clid,
retu

[PATCH v2 1/2] net: sched: make default fifo qdiscs appear in the dump

2017-03-08 Thread Jiri Kosina
From: Jiri Kosina <jkos...@suse.cz>

The original reason [1] for having hidden qdiscs (potential scalability
issues in qdisc_match_from_root() with single linked list in case of large
amount of qdiscs) has been invalidated by 59cc1f61f0 ("net: sched: convert
qdisc linked list to hashtable").

This allows us for bringing more clarity and determinism into the dump by
making default pfifo qdiscs visible.

We're not turning this on by default though, at it was deemed [2] too
intrusive / unnecessary change of default behavior towards userspace.
Instead, TCA_DUMP_INVISIBLE netlink attribute is introduced, which allows
applications to request complete qdisc hierarchy dump, including the
ones that have always been implicit/invisible.

Singleton noop_qdisc stays invisible, as teaching the whole infrastructure
about singletons would require quite some surgery with very little gain
(seeing no qdisc or seeing noop qdisc in the dump is probably setting
the same user expectation).

[1] 
http://lkml.kernel.org/r/1460732328.10638.74.ca...@edumazet-glaptop3.roam.corp.google.com
[2] 
http://lkml.kernel.org/r/20161021.105935.1907696543877061916.da...@davemloft.net

Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---

v1 -> v2: introduce exception for singleton noop_qdisc

 include/net/pkt_sched.h|  2 +-
 include/net/sch_generic.h  |  1 +
 include/uapi/linux/rtnetlink.h |  1 +
 net/sched/sch_api.c| 42 ++
 net/sched/sch_cbq.c|  5 +
 net/sched/sch_drr.c|  2 ++
 net/sched/sch_dsmark.c |  2 ++
 net/sched/sch_generic.c|  2 +-
 net/sched/sch_hfsc.c   |  4 
 net/sched/sch_htb.c|  2 ++
 net/sched/sch_mq.c |  2 +-
 net/sched/sch_mqprio.c |  2 +-
 net/sched/sch_multiq.c |  2 ++
 net/sched/sch_prio.c   |  5 -
 net/sched/sch_qfq.c|  2 ++
 net/sched/sch_red.c|  2 ++
 net/sched/sch_sfb.c|  2 ++
 net/sched/sch_tbf.c|  2 ++
 18 files changed, 65 insertions(+), 17 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index cd334c9584e9..0625eac2c601 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,7 +90,7 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q, bool invisible);
 void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index e6aa0a249672..e7dca250d115 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -66,6 +66,7 @@ struct Qdisc {
 #define TCQ_F_NOPARENT 0x40 /* root of its hierarchy :
  * qdisc_tree_decrease_qlen() should stop.
  */
+#define TCQ_F_INVISIBLE0x80 /* invisible by default in dump */
u32 limit;
const struct Qdisc_ops  *ops;
struct qdisc_size_table __rcu *stab;
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 262f0379d83a..c7de00e09797 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -542,6 +542,7 @@ enum {
TCA_FCNT,
TCA_STATS2,
TCA_STAB,
+   TCA_DUMP_INVISIBLE,
TCA_PAD,
__TCA_MAX
 };
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 206dc24add3a..8e4e6ab1847a 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -274,7 +274,7 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc 
*root, u32 handle)
return NULL;
 }
 
-void qdisc_hash_add(struct Qdisc *q)
+void qdisc_hash_add(struct Qdisc *q, bool invisible)
 {
if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
struct Qdisc *root = qdisc_dev(q)->qdisc;
@@ -282,6 +282,8 @@ void qdisc_hash_add(struct Qdisc *q)
WARN_ON_ONCE(root == _qdisc);
ASSERT_RTNL();
hash_add_rcu(qdisc_dev(q)->qdisc_hash, >hash, q->handle);
+   if (invisible)
+   q->flags |= TCQ_F_INVISIBLE;
}
 }
 EXPORT_SYMBOL(qdisc_hash_add);
@@ -1004,7 +1006,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
goto err_out4;
}
 
-   qdisc_hash_add(sch);
+   qdisc_hash_add(sch, false);
 
return sch;
}
@@ -1400,9 +1402,14 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct 
Qdisc *q, u32 clid,
return -1;
 }
 
-static bool tc_qdisc_dump_ignore(struct Qdisc *q)
+static bool tc_qdisc_dump_ignore(struct Qdisc *q, bool d

[PATCH v2 2/2] iproute2: add support for invisible qdisc dumping

2017-03-08 Thread Jiri Kosina
From: Jiri Kosina <jkos...@suse.cz>

Support the new TCA_DUMP_INVISIBLE netlink attribute that allows asking 
kernel to perform 'full qdisc dump', as for historical reasons some of the 
default qdiscs are being hidden by the kernel.

The command syntax is being extended by voluntary 'invisible' argument to
'tc qdisc show'.

Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---

v1 -> v2: fix the format of help message

 tc/tc_qdisc.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/tc/tc_qdisc.c b/tc/tc_qdisc.c
index 3a3701c2..1e9d9097 100644
--- a/tc/tc_qdisc.c
+++ b/tc/tc_qdisc.c
@@ -34,7 +34,7 @@ static int usage(void)
fprintf(stderr, "   [ stab [ help | STAB_OPTIONS] ]\n");
fprintf(stderr, "   [ [ QDISC_KIND ] [ help | OPTIONS ] ]\n");
fprintf(stderr, "\n");
-   fprintf(stderr, "   tc qdisc show [ dev STRING ] [ ingress | clsact 
]\n");
+   fprintf(stderr, "   tc qdisc show [ dev STRING ] [ ingress | clsact 
] [ invisible ]\n");
fprintf(stderr, "Where:\n");
fprintf(stderr, "QDISC_KIND := { [p|b]fifo | tbf | prio | cbq | red | 
etc. }\n");
fprintf(stderr, "OPTIONS := ... try tc qdisc add  
help\n");
@@ -292,6 +292,7 @@ static int tc_qdisc_list(int argc, char **argv)
 {
struct tcmsg t = { .tcm_family = AF_UNSPEC };
char d[16] = {};
+   bool dump_invisible = false;
 
while (argc > 0) {
if (strcmp(*argv, "dev") == 0) {
@@ -306,6 +307,8 @@ static int tc_qdisc_list(int argc, char **argv)
t.tcm_parent = TC_H_INGRESS;
} else if (matches(*argv, "help") == 0) {
usage();
+   } else if (strcmp(*argv, "invisible") == 0) {
+   dump_invisible = true;
} else {
fprintf(stderr, "What is \"%s\"? Try \"tc qdisc 
help\".\n", *argv);
return -1;
@@ -325,7 +328,25 @@ static int tc_qdisc_list(int argc, char **argv)
filter_ifindex = t.tcm_ifindex;
}
 
-   if (rtnl_dump_request(, RTM_GETQDISC, , sizeof(t)) < 0) {
+   if (dump_invisible) {
+   struct {
+   struct nlmsghdr n;
+   struct tcmsg t;
+   char buf[256];
+   } req = {
+   .n.nlmsg_type = RTM_GETQDISC,
+   .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
+   };
+
+   req.t.tcm_family = AF_UNSPEC;
+
+   addattr(, 256, TCA_DUMP_INVISIBLE);
+   if (rtnl_dump_request_n(, ) < 0) {
+   perror("Cannot send dump request");
+   return 1;
+   }
+
+   } else if (rtnl_dump_request(, RTM_GETQDISC, , sizeof(t)) < 0) {
perror("Cannot send dump request");
return 1;
}
-- 
Jiri Kosina
SUSE Labs



Re: [PATCH 2/2] iproute2: add support for invisible qdisc dumping

2017-03-08 Thread Jiri Kosina
On Mon, 27 Feb 2017, Phil Sutter wrote:

> > Support the new TCA_DUMP_INVISIBLE netlink attribute that allows asking 
> > kernel to perform 'full qdisc dump', as for historical reasons some of the 
> > default qdiscs are being hidden by the kernel.
> > 
> > The command syntax is being extended by voluntary 'invisible' argument to
> > 'tc qdisc show'.
> > 
> > Signed-off-by: Jiri Kosina <jkos...@suse.cz>
> > ---
> >  tc/tc_qdisc.c | 25 +++--
> >  1 file changed, 23 insertions(+), 2 deletions(-)
> 
> Would you mind adding a description of the new keyword to tc man page as
> well?

Seems like tc manpage would need more updates, as it doesn't seem to 
reflect the '[ ingress | clsact ]' possibility either. So I'll send this 
as a followup patch, fixing all these at once.

> > diff --git a/tc/tc_qdisc.c b/tc/tc_qdisc.c
> > index 3a3701c2..29da9269 100644
> > --- a/tc/tc_qdisc.c
> > +++ b/tc/tc_qdisc.c
> > @@ -34,7 +34,7 @@ static int usage(void)
> > fprintf(stderr, "   [ stab [ help | STAB_OPTIONS] ]\n");
> > fprintf(stderr, "   [ [ QDISC_KIND ] [ help | OPTIONS ] ]\n");
> > fprintf(stderr, "\n");
> > -   fprintf(stderr, "   tc qdisc show [ dev STRING ] [ ingress | clsact 
> > ]\n");
> > +   fprintf(stderr, "   tc qdisc show [ dev STRING ] [ ingress | clsact 
> > | invisible ]\n");
> 
> Doesn't look like these are mutually exclusive. Therefore I would
> suggest fixing the syntax to:
> 
> | +   fprintf(stderr, "   tc qdisc show [ dev STRING ] [ ingress | clsact 
> ] [ invisible ]\n");

Makes sense, will include this in v2.

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH 1/2] net: sched: make default fifo qdiscs appear in the dump

2017-03-07 Thread Jiri Kosina
On Mon, 6 Mar 2017, David Miller wrote:

> > Ah, right you are, thanks. The complete fix is not super trivial, as it 
> > needs some more surgery to tc_dump_qdisc_root(), tc_dump_tclass_root() and 
> > qdisc_match_from_root() (see 69012ae42 for some details).
> > 
> > There are two options:
> > 
> > - this gets fixed in two phases, in first everything *but* noop qdisc gets 
> >   dumped (in the "give me everything" dump) and later we finalize it by
> >   teaching the above functions about noop_qdisc as well
> > 
> > - I extend this patchset to handle noop qdisc from the very beginning; 
> >   I am unlikely to find time for this during coming weeks though. But OTOH
> >   this whole thing is very low priority anyway
> > 
> > What do you think?
> 
> I'm not too hot on this whole idea because the only way you can emit
> the noop_qdisc is to "dup" it by allocating a new qdisc so that you
> can link it in.  This has two downsides:
> 
> 1) Extra overhead and memory usage
> 
> 2) All of the simple checks against _qdisc might not be
>so simply any more.

Fully agreed.

I'd be inclined to just live with the fact that noop qdisc would never 
ever be visible in the dump. I think that in this particular case this can 
be easily justified, as the behavior is consistent with what is seen in 
the dump ("nothing happens to the packets"). All other cases will be 
covered.

If there are no objections to this, I'll send out v2 shortly.

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH 1/2] net: sched: make default fifo qdiscs appear in the dump

2017-03-06 Thread Jiri Kosina
On Wed, 1 Mar 2017, David Miller wrote:

> > @@ -1066,6 +1066,7 @@ hfsc_change_class(struct Qdisc *sch, u32 classid, u32 
> > parentid,
> >   _qdisc_ops, classid);
> > if (cl->qdisc == NULL)
> > cl->qdisc = _qdisc;
> > +   qdisc_hash_add(cl->qdisc, true);
> > INIT_LIST_HEAD(>children);
> > cl->vt_tree = RB_ROOT;
> > cl->cf_tree = RB_ROOT;
> > @@ -1425,6 +1426,7 @@ hfsc_init_qdisc(struct Qdisc *sch, struct nlattr *opt)
> >   sch->handle);
> > if (q->root.qdisc == NULL)
> > q->root.qdisc = _qdisc;
> > +   qdisc_hash_add(q->root.qdisc, true);
> > INIT_LIST_HEAD(>root.children);
> > q->root.vt_tree = RB_ROOT;
> > q->root.cf_tree = RB_ROOT;
> 
> I'm not so sure it is legal is potentially pass _qdisc into 
> qdisc_hash_add().

Ah, right you are, thanks. The complete fix is not super trivial, as it 
needs some more surgery to tc_dump_qdisc_root(), tc_dump_tclass_root() and 
qdisc_match_from_root() (see 69012ae42 for some details).

There are two options:

- this gets fixed in two phases, in first everything *but* noop qdisc gets 
  dumped (in the "give me everything" dump) and later we finalize it by
  teaching the above functions about noop_qdisc as well

- I extend this patchset to handle noop qdisc from the very beginning; 
  I am unlikely to find time for this during coming weeks though. But OTOH
  this whole thing is very low priority anyway

What do you think?

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH 0/2] net: sched: make it possible to unhide default qdiscs

2017-02-25 Thread Jiri Kosina
On Sat, 25 Feb 2017, Jiri Kosina wrote:

> This is a followup to a patchset submitted back in October 2016

No idea what happened that my usual patchset sending process broke and 
there is no 'References/In-reply-to' in the actual patches :/ I will 
investigate that. Please let me know in case you'd like a resend just 
because of this.

Sorry for the hassle,

-- 
Jiri Kosina
SUSE Labs



[PATCH 1/2] net: sched: make default fifo qdiscs appear in the dump

2017-02-25 Thread Jiri Kosina
From: Jiri Kosina <jkos...@suse.cz>

The original reason [1] for having hidden qdiscs (potential scalability 
issues in qdisc_match_from_root() with single linked list in case of large 
amount of qdiscs) has been invalidated by 59cc1f61f0 ("net: sched: convert 
qdisc linked list to hashtable").

This allows us for bringing more clarity and determinism into the dump by 
making default pfifo qdiscs visible.

We're not turning this on by default though, at it was deemed [2] too 
intrusive / unnecessary change of default behavior towards userspace. 
Instead, TCA_DUMP_INVISIBLE netlink attribute is introduced, which allows 
applications to request complete qdisc hierarchy dump, including the ones 
that have always been implicit/invisible.

[1] 
http://lkml.kernel.org/r/1460732328.10638.74.ca...@edumazet-glaptop3.roam.corp.google.com
[2] 
http://lkml.kernel.org/r/20161021.105935.1907696543877061916.da...@davemloft.net

Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---
 include/net/pkt_sched.h|  2 +-
 include/net/sch_generic.h  |  1 +
 include/uapi/linux/rtnetlink.h |  1 +
 net/sched/sch_api.c| 42 ++
 net/sched/sch_cbq.c|  4 
 net/sched/sch_drr.c|  2 ++
 net/sched/sch_dsmark.c |  1 +
 net/sched/sch_generic.c|  2 +-
 net/sched/sch_hfsc.c   |  2 ++
 net/sched/sch_htb.c|  1 +
 net/sched/sch_mq.c |  2 +-
 net/sched/sch_mqprio.c |  2 +-
 net/sched/sch_multiq.c |  1 +
 net/sched/sch_prio.c   |  4 +++-
 net/sched/sch_qfq.c|  1 +
 net/sched/sch_red.c|  1 +
 net/sched/sch_sfb.c|  1 +
 net/sched/sch_tbf.c|  1 +
 18 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index cd334c9584e9..0625eac2c601 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,7 +90,7 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q, bool invisible);
 void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index e6aa0a249672..e7dca250d115 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -66,6 +66,7 @@ struct Qdisc {
 #define TCQ_F_NOPARENT 0x40 /* root of its hierarchy :
  * qdisc_tree_decrease_qlen() should stop.
  */
+#define TCQ_F_INVISIBLE0x80 /* invisible by default in dump */
u32 limit;
const struct Qdisc_ops  *ops;
struct qdisc_size_table __rcu *stab;
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 262f0379d83a..c7de00e09797 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -542,6 +542,7 @@ enum {
TCA_FCNT,
TCA_STATS2,
TCA_STAB,
+   TCA_DUMP_INVISIBLE,
TCA_PAD,
__TCA_MAX
 };
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 206dc24add3a..8e4e6ab1847a 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -274,7 +274,7 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc 
*root, u32 handle)
return NULL;
 }
 
-void qdisc_hash_add(struct Qdisc *q)
+void qdisc_hash_add(struct Qdisc *q, bool invisible)
 {
if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
struct Qdisc *root = qdisc_dev(q)->qdisc;
@@ -282,6 +282,8 @@ void qdisc_hash_add(struct Qdisc *q)
WARN_ON_ONCE(root == _qdisc);
ASSERT_RTNL();
hash_add_rcu(qdisc_dev(q)->qdisc_hash, >hash, q->handle);
+   if (invisible)
+   q->flags |= TCQ_F_INVISIBLE;
}
 }
 EXPORT_SYMBOL(qdisc_hash_add);
@@ -1004,7 +1006,7 @@ static struct Qdisc *qdisc_create(struct net_device *dev,
goto err_out4;
}
 
-   qdisc_hash_add(sch);
+   qdisc_hash_add(sch, false);
 
return sch;
}
@@ -1400,9 +1402,14 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct 
Qdisc *q, u32 clid,
return -1;
 }
 
-static bool tc_qdisc_dump_ignore(struct Qdisc *q)
+static bool tc_qdisc_dump_ignore(struct Qdisc *q, bool dump_invisible)
 {
-   return (q->flags & TCQ_F_BUILTIN) ? true : false;
+   if (q->flags & TCQ_F_BUILTIN)
+   return true;
+   if ((q->flags & TCQ_F_INVISIBLE) && !dump_invisible)
+   return true;
+
+   return false;
 }
 
 static int qd

[PATCH 2/2] iproute2: add support for invisible qdisc dumping

2017-02-25 Thread Jiri Kosina
From: Jiri Kosina <jkos...@suse.cz>

Support the new TCA_DUMP_INVISIBLE netlink attribute that allows asking 
kernel to perform 'full qdisc dump', as for historical reasons some of the 
default qdiscs are being hidden by the kernel.

The command syntax is being extended by voluntary 'invisible' argument to
'tc qdisc show'.

Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---
 tc/tc_qdisc.c | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/tc/tc_qdisc.c b/tc/tc_qdisc.c
index 3a3701c2..29da9269 100644
--- a/tc/tc_qdisc.c
+++ b/tc/tc_qdisc.c
@@ -34,7 +34,7 @@ static int usage(void)
fprintf(stderr, "   [ stab [ help | STAB_OPTIONS] ]\n");
fprintf(stderr, "   [ [ QDISC_KIND ] [ help | OPTIONS ] ]\n");
fprintf(stderr, "\n");
-   fprintf(stderr, "   tc qdisc show [ dev STRING ] [ ingress | clsact 
]\n");
+   fprintf(stderr, "   tc qdisc show [ dev STRING ] [ ingress | clsact 
| invisible ]\n");
fprintf(stderr, "Where:\n");
fprintf(stderr, "QDISC_KIND := { [p|b]fifo | tbf | prio | cbq | red | 
etc. }\n");
fprintf(stderr, "OPTIONS := ... try tc qdisc add  
help\n");
@@ -292,6 +292,7 @@ static int tc_qdisc_list(int argc, char **argv)
 {
struct tcmsg t = { .tcm_family = AF_UNSPEC };
char d[16] = {};
+   bool dump_invisible = false;
 
while (argc > 0) {
if (strcmp(*argv, "dev") == 0) {
@@ -306,6 +307,8 @@ static int tc_qdisc_list(int argc, char **argv)
t.tcm_parent = TC_H_INGRESS;
} else if (matches(*argv, "help") == 0) {
usage();
+   } else if (strcmp(*argv, "invisible") == 0) {
+   dump_invisible = true;
} else {
fprintf(stderr, "What is \"%s\"? Try \"tc qdisc 
help\".\n", *argv);
return -1;
@@ -325,7 +328,25 @@ static int tc_qdisc_list(int argc, char **argv)
filter_ifindex = t.tcm_ifindex;
}
 
-   if (rtnl_dump_request(, RTM_GETQDISC, , sizeof(t)) < 0) {
+   if (dump_invisible) {
+   struct {
+   struct nlmsghdr n;
+   struct tcmsg t;
+   char buf[256];
+   } req = {
+   .n.nlmsg_type = RTM_GETQDISC,
+   .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg)),
+   };
+
+   req.t.tcm_family = AF_UNSPEC;
+
+   addattr(, 256, TCA_DUMP_INVISIBLE);
+   if (rtnl_dump_request_n(, ) < 0) {
+   perror("Cannot send dump request");
+   return 1;
+   }
+
+   } else if (rtnl_dump_request(, RTM_GETQDISC, , sizeof(t)) < 0) {
perror("Cannot send dump request");
return 1;
}
-- 
Jiri Kosina
SUSE Labs



[PATCH 0/2] net: sched: make it possible to unhide default qdiscs

2017-02-25 Thread Jiri Kosina
This is a followup to a patchset submitted back in October 2016


http://lkml.kernel.org/r/alpine.lnx.2.00.1610211024400.31...@cbobk.fhfr.pm

that aimed at making the qdisc hierarchy more transparent and unhide the 
default qdiscs in the dump by default. After some discussion, it turned 
out that the most viable way to go would be to introduce a new netlink 
attribute for this, and let the userspace chose whether it wants to see 
the whole hierarchy dump


http://lkml.kernel.org/r/20161021.105935.1907696543877061916.da...@davemloft.net

This is implemented by this patchset. First patch adds the new 
TCA_DUMP_INVISIBLE netlink attribute and its handling in the kernel, the 
second one adds the iproute2 counterpart.

-- 
Jiri Kosina
SUSE Labs



[PATCH] iproute2: tc: introduce build dependency on libnetlink

2017-02-24 Thread Jiri Kosina
From: Jiri Kosina <jkos...@suse.cz>

Rebuilding libnetlink doesn't trigger rebuild of tc, which is wrong 
(especially so for builds where libnetlink.a gets statically linked into 
tc). Fix that by introducing an explicit dependency.

Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---
 tc/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tc/Makefile b/tc/Makefile
index 6dd984f0..3f7fc939 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -127,7 +127,7 @@ MODDESTDIR := $(DESTDIR)$(LIBDIR)/tc
 
 all: tc $(TCSO)
 
-tc: $(TCOBJ) libtc.a
+tc: $(TCOBJ) $(LIBNETLINK) libtc.a
$(QUIET_LINK)$(CC) $^ $(LDFLAGS) $(LDLIBS) -o $@
 
 libtc.a: $(TCLIB)

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH] net: sched: make default fifo qdiscs appear in the dump

2016-10-21 Thread Jiri Kosina
On Fri, 21 Oct 2016, Eric Dumazet wrote:

> Some of us are dealing with huge HTB hierarchies, so adding default fifo
> in the dump will add more data pumped from the kernel.
> 
> BwE [1] for instance dumps qdisc/classes every 5 seconds.
> 
> I guess we'll need to not pull this patch in our kernels.

Okay, so I probably misunderstood you here:

https://marc.info/?l=linux-kernel=146073234818214=2

as I thought that as long as we move towards the hashtable, you wouldn't 
have any issues with this.

I'd really like to unhide the default qdiscs, it makes little sense to be 
inconsistent in this way.

Random shot into darkness -- how about making this a 
CONFIG/sysctl-selectable?

Thanks,

-- 
Jiri Kosina
SUSE Labs



[PATCH] net: sched: make default fifo qdiscs appear in the dump

2016-10-21 Thread Jiri Kosina
The original reason [1] for having hidden qdiscs (potential scalability 
issues in qdisc_match_from_root() with single linked list in case of large 
amount of qdiscs) has been invalidated by 59cc1f61f0 ("net: sched: convert 
qdisc linked list to hashtable").

This allows us for bringing more clarity and determinism into the dump by 
making default pfifo qdiscs visible.

[1] 
http://lkml.kernel.org/r/1460732328.10638.74.ca...@edumazet-glaptop3.roam.corp.google.com

Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---

Tested for cbq, htb and tbf.

 net/sched/sch_cbq.c| 4 
 net/sched/sch_drr.c| 2 ++
 net/sched/sch_dsmark.c | 1 +
 net/sched/sch_hfsc.c   | 2 ++
 net/sched/sch_htb.c| 1 +
 net/sched/sch_multiq.c | 1 +
 net/sched/sch_prio.c   | 4 +++-
 net/sched/sch_qfq.c| 1 +
 net/sched/sch_red.c| 1 +
 net/sched/sch_sfb.c| 1 +
 net/sched/sch_tbf.c| 1 +
 11 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index beb554a..3c85e8d 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -1161,6 +1161,8 @@ static int cbq_init(struct Qdisc *sch, struct nlattr *opt)
if (!q->link.q)
q->link.q = _qdisc;
 
+   qdisc_hash_add(q->link.q);
+
q->link.priority = TC_CBQ_MAXPRIO - 1;
q->link.priority2 = TC_CBQ_MAXPRIO - 1;
q->link.cpriority = TC_CBQ_MAXPRIO - 1;
@@ -1606,6 +1608,8 @@ static void cbq_put(struct Qdisc *sch, unsigned long arg)
cl->quantum = cl->allot;
cl->weight = cl->R_tab->rate.rate;
 
+   qdisc_hash_add(cl->q);
+
sch_tree_lock(sch);
cbq_link_class(cl);
cl->borrow = cl->tparent;
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 8af5c59..1d33b94 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -118,6 +118,8 @@ static int drr_change_class(struct Qdisc *sch, u32 classid, 
u32 parentid,
if (cl->qdisc == NULL)
cl->qdisc = _qdisc;
 
+   qdisc_hash_add(cl->qdisc);
+
if (tca[TCA_RATE]) {
err = gen_replace_estimator(>bstats, NULL, >rate_est,
NULL,
diff --git a/net/sched/sch_dsmark.c b/net/sched/sch_dsmark.c
index 1308bbf..d0bffd6 100644
--- a/net/sched/sch_dsmark.c
+++ b/net/sched/sch_dsmark.c
@@ -367,6 +367,7 @@ static int dsmark_init(struct Qdisc *sch, struct nlattr 
*opt)
p->q = qdisc_create_dflt(sch->dev_queue, _qdisc_ops, sch->handle);
if (p->q == NULL)
p->q = _qdisc;
+   qdisc_hash_add(p->q);
 
pr_debug("%s: qdisc %p\n", __func__, p->q);
 
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 000f1d3..a75710e 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1066,6 +1066,7 @@ struct hfsc_sched {
  _qdisc_ops, classid);
if (cl->qdisc == NULL)
cl->qdisc = _qdisc;
+   qdisc_hash_add(cl->qdisc);
INIT_LIST_HEAD(>children);
cl->vt_tree = RB_ROOT;
cl->cf_tree = RB_ROOT;
@@ -1425,6 +1426,7 @@ struct hfsc_sched {
  sch->handle);
if (q->root.qdisc == NULL)
q->root.qdisc = _qdisc;
+   qdisc_hash_add(q->root.qdisc);
INIT_LIST_HEAD(>root.children);
q->root.vt_tree = RB_ROOT;
q->root.cf_tree = RB_ROOT;
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index c798d0d..421d0a9 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1459,6 +1459,7 @@ static int htb_change_class(struct Qdisc *sch, u32 
classid,
qdisc_class_hash_insert(>clhash, >common);
if (parent)
parent->children++;
+   qdisc_hash_add(cl->un.leaf.q);
} else {
if (tca[TCA_RATE]) {
err = gen_replace_estimator(>bstats, NULL,
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 9ffbb02..9266e9c 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -217,6 +217,7 @@ static int multiq_tune(struct Qdisc *sch, struct nlattr 
*opt)
sch_tree_lock(sch);
old = q->queues[i];
q->queues[i] = child;
+   qdisc_hash_add(child);
 
if (old != _qdisc) {
qdisc_tree_reduce_backlog(old,
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 8f57589..604a817 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -192,8 +192,10 @@ static int prio_tune(struct Qdisc *sch, struct nlattr *opt)
qdisc_destroy(child);
}
 
-   for (i = oldbands; i < q->bands; i++)
+   for (i = oldbands; i < q-

Re: [PATCH 2/2] net: sched: avoid duplicates in qdisc dump

2016-08-19 Thread Jiri Kosina
On Thu, 18 Aug 2016, Cong Wang wrote:

> Doesn't this mean we can now just remove the ingress case from 
> tc_dump_qdisc() and simply iterate the whole hash table?

That'd be indeed a nice cleanup, but for that we'll have to make sure that 
the top-level ingress entry makes it to the hashtable as well; that's not 
the case currently, and hence the special-casing (but I agree that it's in 
principle just a remnant of the old design and of a rather 1:1 
list->hashtable conversion, and there is no good reason for keeping it 
this way any more).

So patch that'd add ingress qdisc to the hashtable whenever it 
materializes should be enough to get rid of the "we have to walk two 
linked lists" heritage and the ingress special-casing for dumps.

Will look into it next week unless someone is faster.

Thanks,

-- 
Jiri Kosina
SUSE Labs



[PATCH 2/2] net: sched: avoid duplicates in qdisc dump

2016-08-16 Thread Jiri Kosina
From: Jiri Kosina <jkos...@suse.cz>

tc_dump_qdisc() performs dumping of the per-device qdiscs in two phases; 
first, the "standard" dev->qdisc is being dumped. Second, if there is/are 
ingress queue(s), they are being dumped as well.

After conversion of netdevice's qdisc linked-list into hashtable, these 
two sets are not in two disjunctive sets/lists any more, but are both 
"reachable" directly from netdevice's hashtable. As a consequence, the 
"full-depth" dump of the ingress qdiscs results in immediately hitting the 
netdevice hashtable again, and duplicating the dump that has already been 
performed for dev->qdisc. 
What in fact needs to be dumped in case of ingress queue is "just" the 
top-level ingress qdisc, as everything else has been dumped already.

Fix this by extending tc_dump_qdisc_root() in a way that it can be instructed
whether it should (while performing the "full" per-netdev qdisc dump) perform
the whole recursion, or just dump "additional" top-level (ingress) qdiscs
without performing any kind of recursion.

This fixes duplicate dumps such as

qdisc mq 0: root
qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 
1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 
1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 
1 1 1 1
qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 
1 1 1 1
qdisc clsact : parent :fff1
qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 
1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 
1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 
1 1 1 1
qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 
1 1 1 1

Fixes: 59cc1f61f ("net: sched: convert qdisc linked list to hashtable")
Reported-by: Daniel Borkmann <dan...@iogearbox.net>
Tested-by: Daniel Borkmann <dan...@iogearbox.net>
Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---
 net/sched/sch_api.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 2b75d68..749811f 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1440,7 +1440,7 @@ err_out:
 
 static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
  struct netlink_callback *cb,
- int *q_idx_p, int s_q_idx)
+ int *q_idx_p, int s_q_idx, bool recur)
 {
int ret = 0, q_idx = *q_idx_p;
struct Qdisc *q;
@@ -1460,7 +1460,13 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct 
sk_buff *skb,
q_idx++;
}
 
-   if (!qdisc_dev(root))
+   /* If dumping singletons, there is no qdisc_dev(root) and the singleton
+* itself has already been dumped.
+*
+* If we've already dumped the top-level (ingress) qdisc above and the 
global
+* qdisc hashtable, we don't want to hit it again
+*/
+   if (!qdisc_dev(root) || !recur)
goto out;
 
hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
@@ -1504,13 +1510,13 @@ static int tc_dump_qdisc(struct sk_buff *skb, struct 
netlink_callback *cb)
s_q_idx = 0;
q_idx = 0;
 
-   if (tc_dump_qdisc_root(dev->qdisc, skb, cb, _idx, s_q_idx) < 
0)
+   if (tc_dump_qdisc_root(dev->qdisc, skb, cb, _idx, s_q_idx, 
true) < 0)
goto done;
 
dev_queue = dev_ingress_queue(dev);
if (dev_queue &&
tc_dump_qdisc_root(dev_queue->qdisc_sleeping, skb, cb,
-  _idx, s_q_idx) < 0)
+  _idx, s_q_idx, false) < 0)
goto done;
 
 cont:
-- 
1.9.2


[PATCH 1/2] net: sched: fix handling of singleton qdiscs with qdisc_hash

2016-08-16 Thread Jiri Kosina
From: Jiri Kosina <jkos...@suse.cz>

qdisc_match_from_root() is now iterating over per-netdevice qdisc 
hashtable instead of going through a linked-list of qdiscs (independently 
on the actual underlying netdev), which was the case before the switch to 
hashtable for qdiscs.

For singleton qdiscs, there is no underlying netdev associated though, and 
therefore dumping a singleton qdisc will panic, as qdisc_dev(root) will 
always be NULL.

 BUG: unable to handle kernel NULL pointer dereference at 0410
 IP: [] qdisc_match_from_root+0x2c/0x70
 PGD 1aceba067 PUD 1aceb7067 PMD 0
 Oops:  [#1] PREEMPT SMP
[ ... ]
 task: 8801ec996e00 task.stack: 8801ec934000
 RIP: 0010:[]  [] 
qdisc_match_from_root+0x2c/0x70
 RSP: 0018:8801ec937ab0  EFLAGS: 00010203
 RAX: 0408 RBX: 88025e612000 RCX: ffd8
 RDX:  RSI:  RDI: 81cf8100
 RBP: 8801ec937ab0 R08: 0001c160 R09: 8802668032c0
 R10: 81cf8100 R11: 0030 R12: 
 R13: 88025e612000 R14: 81cf3140 R15: 
 FS:  7f24b9af6740() GS:88026f28() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 0410 CR3: 0001aceec000 CR4: 001406e0
 Stack:
  8801ec937ad0 81681210 88025dd51a00 fff1
  8801ec937b88 81681e4e 81c42bc0 880262431500
  81cf3140 88025dd51a10 88025dd51a24 ec937b38
 Call Trace:
  [] qdisc_lookup+0x40/0x50
  [] tc_modify_qdisc+0x21e/0x550
  [] rtnetlink_rcv_msg+0x95/0x220
  [] ? __kmalloc_track_caller+0x172/0x230
  [] ? rtnl_newlink+0x870/0x870
  [] netlink_rcv_skb+0xa7/0xc0
  [] rtnetlink_rcv+0x28/0x30
  [] netlink_unicast+0x15b/0x210
  [] netlink_sendmsg+0x319/0x390
  [] sock_sendmsg+0x38/0x50
  [] ___sys_sendmsg+0x256/0x260
  [] ? __pagevec_lru_add_fn+0x135/0x280
  [] ? pagevec_lru_move_fn+0xd0/0xf0
  [] ? trace_event_raw_event_mm_lru_insertion+0x180/0x180
  [] ? __lru_cache_add+0x75/0xb0
  [] ? _raw_spin_unlock+0x16/0x40
  [] ? handle_mm_fault+0x39f/0x1160
  [] __sys_sendmsg+0x45/0x80
  [] SyS_sendmsg+0x12/0x20
  [] do_syscall_64+0x57/0xb0

Fix this by special-casing singleton qdiscs (those that don't have 
underlying netdevice) and introduce immediate handling of those rather 
than trying to go over an underlying netdevice. We're in the same 
situation in tc_dump_qdisc_root() and tc_dump_tclass_root().

Ultimately, this will have to be slightly reworked so that we are actually 
able to show singleton qdiscs (noop) in the dump properly; but we're not 
currently doing that anyway, so no regression there, and better do this in 
a gradual manner.

Fixes: 59cc1f61f ("net: sched: convert qdisc linked list to hashtable")
Reported-by: Daniel Borkmann <dan...@iogearbox.net>
Tested-by: Daniel Borkmann <dan...@iogearbox.net>
Reported-by: David Ahern <d...@cumulusnetworks.com>
Tested-by: David Ahern <d...@cumulusnetworks.com>
Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---
 net/sched/sch_api.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index c093d32..2b75d68 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -262,6 +262,9 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc 
*root, u32 handle)
 {
struct Qdisc *q;
 
+   if (!qdisc_dev(root))
+   return (root->handle == handle ? root : NULL);
+
if (!(root->flags & TCQ_F_BUILTIN) &&
root->handle == handle)
return root;
@@ -1456,6 +1459,10 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct 
sk_buff *skb,
goto done;
q_idx++;
}
+
+   if (!qdisc_dev(root))
+   goto out;
+
hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
if (q_idx < s_q_idx) {
q_idx++;
@@ -1781,6 +1788,9 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct 
sk_buff *skb,
if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
return -1;
 
+   if (!qdisc_dev(root))
+   return 0;
+
hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
return -1;
-- 
1.9.2


[PATCH next-next 0/2] qdisc-hashtable fixes

2016-08-16 Thread Jiri Kosina
The following two patches fix all the issues that have been reported 
against the conversion of qdisc linked list to hashtable (currently in 
net-next) so far.

First patch adjusts handling of singleton qdiscs to the new semantics, and 
is rather straightforward.

The second patch, which fixes "cosmetic" issue of duplicate entries in the 
qdisc dump for ingress qdiscs, is a little bit more hairy; I personally 
would love to see all the already existing "if (ingress)"-like hacks go 
away (by, let's say, introducing a general TCQ_F_? flag), but that's way 
out of scope of this patchset (but already on my todo).

Thanks a lot to Daniel Borkmann and David Ahern for reporting the issues 
and testing the patches promptly.

Jiri Kosina (2):
  net: sched: fix handling of singleton qdiscs with qdisc_hash
  net: sched: avoid duplicates in qdisc dump

 net/sched/sch_api.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

-- 
Jiri Kosina
SUSE Labs


Re: kernel panic in qdisc_match_from_root starting openvswitch

2016-08-16 Thread Jiri Kosina
On Tue, 16 Aug 2016, Jiri Kosina wrote:

> > I am hitting a kernel panic with '/etc/init.d/openvswitch-switch 
> > restart' Stack trace below. Reverting 59cc1f61f09c ("net: sched: convert 
> > qdisc linked list to hashtable") clears the problem.
> 
> Thanks a lot for the report. Could you please test either with the 
> attached two patches applied on top of net-next, or alternatively with 
> 'might-rebase/net-sched-qdiscs' branch of
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/jikos/jikos.git
> 
> (HEAD of that branch is 60b3487ad7f) and report back to me whether it 
> fixes any issues you are seeing? Applying the attached patches on top of 
> net-next or testing with the git branch above should be equivalent test 
> wrt. qdiscs.

And this time hopefully with the actual patches ... 

-- 
Jiri Kosina
SUSE Labs
From 39e6390a50335e88e1bd566f3a5d2f1b09272896 Mon Sep 17 00:00:00 2001
From: Jiri Kosina <ji...@kernel.org>
Date: Fri, 12 Aug 2016 15:53:04 +0200
Subject: [PATCH 1/2] net: sched: fix handling of singleton qdiscs with
 qdisc_hash

qdisc_match_from_root() is now iterating over per-netdevice qdisc hashtable
instead of going through a linked-list of qdiscs (independently on the actual
underlying netdev), which was the case before the switch to hashtable for
qdiscs.

For singleton qdiscs, there is no underlying netdev associated though, and
therefore dumping a singleton qdisc will panic, as qdisc_dev(root) will
always be NULL.

 BUG: unable to handle kernel NULL pointer dereference at 0410
 IP: [] qdisc_match_from_root+0x2c/0x70
 PGD 1aceba067 PUD 1aceb7067 PMD 0
 Oops:  [#1] PREEMPT SMP
[ ... ]
 task: 8801ec996e00 task.stack: 8801ec934000
 RIP: 0010:[]  [] qdisc_match_from_root+0x2c/0x70
 RSP: 0018:8801ec937ab0  EFLAGS: 00010203
 RAX: 0408 RBX: 88025e612000 RCX: ffd8
 RDX:  RSI:  RDI: 81cf8100
 RBP: 8801ec937ab0 R08: 0001c160 R09: 8802668032c0
 R10: 81cf8100 R11: 0030 R12: 
 R13: 88025e612000 R14: 81cf3140 R15: 
 FS:  7f24b9af6740() GS:88026f28() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 0410 CR3: 0001aceec000 CR4: 001406e0
 Stack:
  8801ec937ad0 81681210 88025dd51a00 fff1
  8801ec937b88 81681e4e 81c42bc0 880262431500
  81cf3140 88025dd51a10 88025dd51a24 ec937b38
 Call Trace:
  [] qdisc_lookup+0x40/0x50
  [] tc_modify_qdisc+0x21e/0x550
  [] rtnetlink_rcv_msg+0x95/0x220
  [] ? __kmalloc_track_caller+0x172/0x230
  [] ? rtnl_newlink+0x870/0x870
  [] netlink_rcv_skb+0xa7/0xc0
  [] rtnetlink_rcv+0x28/0x30
  [] netlink_unicast+0x15b/0x210
  [] netlink_sendmsg+0x319/0x390
  [] sock_sendmsg+0x38/0x50
  [] ___sys_sendmsg+0x256/0x260
  [] ? __pagevec_lru_add_fn+0x135/0x280
  [] ? pagevec_lru_move_fn+0xd0/0xf0
  [] ? trace_event_raw_event_mm_lru_insertion+0x180/0x180
  [] ? __lru_cache_add+0x75/0xb0
  [] ? _raw_spin_unlock+0x16/0x40
  [] ? handle_mm_fault+0x39f/0x1160
  [] __sys_sendmsg+0x45/0x80
  [] SyS_sendmsg+0x12/0x20
  [] do_syscall_64+0x57/0xb0

Fix this by special-casing singleton qdiscs (those that don't have underlying
netdevice) and introduce immediate handling of those rather than trying to
go over an underlying netdevice. We're in the same situation in
tc_dump_qdisc_root() and tc_dump_tclass_root().

Ultimately, this will have to be slightly reworked so that we are actually
able to show singleton qdiscs (noop) in the dump properly; but we're not
currently doing that anyway, so no regression there, and better do this
in a gradual manner.

Fixes: 59cc1f61f ("net: sched: convert qdisc linked list to hashtable")
Reported-by: Daniel Borkmann <dan...@iogearbox.net>
Tested-by: Daniel Borkmann <dan...@iogearbox.net>
Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---
 net/sched/sch_api.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index c093d32..2b75d68 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -262,6 +262,9 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
 {
 	struct Qdisc *q;
 
+	if (!qdisc_dev(root))
+		return (root->handle == handle ? root : NULL);
+
 	if (!(root->flags & TCQ_F_BUILTIN) &&
 	root->handle == handle)
 		return root;
@@ -1456,6 +1459,10 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct sk_buff *skb,
 			goto done;
 		q_idx++;
 	}
+
+	if (!qdisc_dev(root))
+		goto out;
+
 	hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
 		if (q_idx < s_q_idx) {
 			q_idx++;
@@ -1781,6 +1788,9 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct sk_buff *skb,
 	if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p

Re: kernel panic in qdisc_match_from_root starting openvswitch

2016-08-16 Thread Jiri Kosina
On Tue, 16 Aug 2016, David Ahern wrote:

> Jiri:
> 
> I am hitting a kernel panic with '/etc/init.d/openvswitch-switch 
> restart' Stack trace below. Reverting 59cc1f61f09c ("net: sched: convert 
> qdisc linked list to hashtable") clears the problem.

Thanks a lot for the report. Could you please test either with the 
attached two patches applied on top of net-next, or alternatively with 
'might-rebase/net-sched-qdiscs' branch of

git://git.kernel.org/pub/scm/linux/kernel/git/jikos/jikos.git

(HEAD of that branch is 60b3487ad7f) and report back to me whether it 
fixes any issues you are seeing? Applying the attached patches on top of 
net-next or testing with the git branch above should be equivalent test 
wrt. qdiscs.

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH] net: sched: fix handling of singleton qdiscs with qdisc_hash

2016-08-16 Thread Jiri Kosina
On Tue, 16 Aug 2016, Jiri Kosina wrote:

> From: Jiri Kosina <jkos...@suse.cz>
> 
> qdisc_match_from_root() is now iterating over per-netdevice qdisc 
> hashtable instead of going through a linked-list of qdiscs (independently 
> on the actual underlying netdev), which used to be the case before the 
> switch to hashtable for qdiscs.
> 
> For singleton qdiscs, there is no underlying netdev associated though, and 
> therefore dumping a singleton qdisc will panic, as qdisc_dev(root) will 
> always be NULL.
[ ... snip ... ]
> @@ -1456,6 +1459,10 @@ static int tc_dump_qdisc_root(struct Qdisc *root, 
> struct sk_buff *skb,
>   goto done;
>   q_idx++;
>   }
> +
> + if (!qdisc_dev(root))
> + goto done;
> +

Ok, this will cause default singleton-only devices being missed in the 
dump.

I am now working on creating a automation that'd test as many use cases as 
possible; will send up a new patch once I have all the known corner cases 
covered (including the ingress / clsact dump duplication).

Please drop this one for now, I'll send up an accumulated followup fixes 
asap.

Thanks,

-- 
Jiri Kosina
SUSE Labs



[PATCH] net: sched: fix handling of singleton qdiscs with qdisc_hash

2016-08-16 Thread Jiri Kosina
From: Jiri Kosina <jkos...@suse.cz>

qdisc_match_from_root() is now iterating over per-netdevice qdisc 
hashtable instead of going through a linked-list of qdiscs (independently 
on the actual underlying netdev), which used to be the case before the 
switch to hashtable for qdiscs.

For singleton qdiscs, there is no underlying netdev associated though, and 
therefore dumping a singleton qdisc will panic, as qdisc_dev(root) will 
always be NULL.

 BUG: unable to handle kernel NULL pointer dereference at 0410
 IP: [] qdisc_match_from_root+0x2c/0x70
 PGD 1aceba067 PUD 1aceb7067 PMD 0
 Oops:  [#1] PREEMPT SMP
[ ... ]
 task: 8801ec996e00 task.stack: 8801ec934000
 RIP: 0010:[]  [] 
qdisc_match_from_root+0x2c/0x70
 RSP: 0018:8801ec937ab0  EFLAGS: 00010203
 RAX: 0408 RBX: 88025e612000 RCX: ffd8
 RDX:  RSI:  RDI: 81cf8100
 RBP: 8801ec937ab0 R08: 0001c160 R09: 8802668032c0
 R10: 81cf8100 R11: 0030 R12: 
 R13: 88025e612000 R14: 81cf3140 R15: 
 FS:  7f24b9af6740() GS:88026f28() knlGS:
 CS:  0010 DS:  ES:  CR0: 80050033
 CR2: 0410 CR3: 0001aceec000 CR4: 001406e0
 Stack:
  8801ec937ad0 81681210 88025dd51a00 fff1
  8801ec937b88 81681e4e 81c42bc0 880262431500
  81cf3140 88025dd51a10 88025dd51a24 ec937b38
 Call Trace:
  [] qdisc_lookup+0x40/0x50
  [] tc_modify_qdisc+0x21e/0x550
  [] rtnetlink_rcv_msg+0x95/0x220
  [] ? __kmalloc_track_caller+0x172/0x230
  [] ? rtnl_newlink+0x870/0x870
  [] netlink_rcv_skb+0xa7/0xc0
  [] rtnetlink_rcv+0x28/0x30
  [] netlink_unicast+0x15b/0x210
  [] netlink_sendmsg+0x319/0x390
  [] sock_sendmsg+0x38/0x50
  [] ___sys_sendmsg+0x256/0x260
  [] ? __pagevec_lru_add_fn+0x135/0x280
  [] ? pagevec_lru_move_fn+0xd0/0xf0
  [] ? trace_event_raw_event_mm_lru_insertion+0x180/0x180
  [] ? __lru_cache_add+0x75/0xb0
  [] ? _raw_spin_unlock+0x16/0x40
  [] ? handle_mm_fault+0x39f/0x1160
  [] __sys_sendmsg+0x45/0x80
  [] SyS_sendmsg+0x12/0x20
  [] do_syscall_64+0x57/0xb0

Fix this by special-casing singleton qdiscs (those that don't have underlying
netdevice) and introduce immediate handling of those rather than trying to
go over an underlying netdevice. We're in the same situation in
tc_dump_qdisc_root() and tc_dump_tclass_root() as well.

Ultimately, this will have to be slightly reworked so that we are actually 
able to show singleton qdiscs (noop) in the dump properly, which is our 
ultimate goal. But we're not currently doing that anyway, so no regression 
there, and better do this in a gradual manner.

Fixes: 59cc1f61f ("net: sched: convert qdisc linked list to hashtable")
Reported-by: Daniel Borkmann <dan...@iogearbox.net>
Tested-by: Daniel Borkmann <dan...@iogearbox.net>
Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---

Daniel, I've covered two more cases of the same in tc_dump_qdisc_root() 
and tc_dump_tclass_root(), but preserved your Reported-by / Tested-by: for 
the one you did wrt. qdisc_match_from_root(). Please shout out loud if you 
object. Thanks.

 net/sched/sch_api.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index c093d32..83413e7 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -262,6 +262,9 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc 
*root, u32 handle)
 {
struct Qdisc *q;
 
+   if (!qdisc_dev(root))
+   return (root->handle == handle ? root : NULL);
+
if (!(root->flags & TCQ_F_BUILTIN) &&
root->handle == handle)
return root;
@@ -1456,6 +1459,10 @@ static int tc_dump_qdisc_root(struct Qdisc *root, struct 
sk_buff *skb,
goto done;
q_idx++;
}
+
+   if (!qdisc_dev(root))
+   goto done;
+
hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
if (q_idx < s_q_idx) {
q_idx++;
@@ -1781,6 +1788,9 @@ static int tc_dump_tclass_root(struct Qdisc *root, struct 
sk_buff *skb,
if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
return -1;
 
+   if (!qdisc_dev(root))
+   return 0;
+
hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
return -1;
-- 
1.9.2


Re: [PATCH 2/2] net: sched: convert qdisc linked list to hashtable

2016-08-15 Thread Jiri Kosina
On Sat, 13 Aug 2016, Cong Wang wrote:

> > How about we actually extend a little bit the TCQ_F_BUILTIN special case
> > test in qdisc_match_from_root()?
> >
> > After the change, the only way how qdisc_dev() could be NULL should be a
> > TCQ_F_BUILTIN case, right?
> >
> > I was thinking about something like the patch below (the reasong being
> > that ->dev would be NULL only in cases of singletonish qdiscs) ...
> > wouldn't that also fix the issue you're seeing? Have to think it through a
> > little bit more ..
> 
> I think this is probably why we never show noop qdisc in dump. 

Well, partially. A lot of 'default' qdiscs are omitted in a not really 
uniform and deterministic way. That's actually the primary point of this 
whole effort -- to get rid of the hidden qdiscs entirely.

> So I think we should relax the singleton rule for noop_qdisc, to save 
> some code for noop_qdisc case and also for dumping noop_qdisc.

Completely moving away from singleton qdiscs is one of the possibilities, 
but OTOH I think that my special-casing of !qdisc_dev(root) in 
qdisc_match_from_root() is correct handling of singletons. I've been 
completely off the grid for the past three days, but I plan to submit this 
as a proper followup fix tomorrow if noone has any objections.

> I will try to work on a patch tomorrow.

What still needs to be looked into are the duplicate clsact entries for 
multiqueue.

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH 2/2] net: sched: convert qdisc linked list to hashtable

2016-08-12 Thread Jiri Kosina
On Fri, 12 Aug 2016, Daniel Borkmann wrote:

> > I was thinking about something like the patch below (the reasong being 
> > that ->dev would be NULL only in cases of singletonish qdiscs) ... 
> > wouldn't that also fix the issue you're seeing? Have to think it 
> > through a little bit more ..
> 
> Ahh, so this has the same effect as previously observed with the other fix.

Thanks a lot for confirming that this fixes the panic. I still have to 
think a little bit more about this though.

> Perhaps it's just a dumping issue, but to the below clsact, there shouldn't
> be pfifo_fast instances appearing.
> 
> # tc qdisc show dev wlp2s0b1
> qdisc mq 0: root
> qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> # tc qdisc add dev wlp2s0b1 clsact
> # tc qdisc show dev wlp2s0b1
> qdisc mq 0: root
> qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc clsact : parent :fff1
> qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
> qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

Hmm, no immediate idea where those are coming from, we'll have to figure 
it out. The mq device used here has 4 queues, right?

-- 
Jiri Kosina
SUSE Labs



[PATCH net-next] net: fix up a few missing hashtable.h conflict resolutions

2016-08-12 Thread Jiri Kosina
From: Jiri Kosina <jkos...@suse.cz>

There are a couple of leftover symbol conflicts caused by hashtable.h 
being included by netdevice.h; those were not caught as build failure 
(they're "only" a warning, but in fact real bugs). Fix those up.

Fixes: e87a8f24c ("net: resolve symbol conflicts with generic hashtable.h")
Reported-by: Daniel Borkmann <dan...@iogearbox.net>
Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---
 drivers/net/ethernet/dec/tulip/de4x5.c| 4 ++--
 drivers/net/ethernet/dec/tulip/de4x5.h| 4 ++--
 drivers/net/ethernet/freescale/fec_main.c | 6 +++---
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/dec/tulip/de4x5.c 
b/drivers/net/ethernet/dec/tulip/de4x5.c
index f0e9e2e..6620fc8 100644
--- a/drivers/net/ethernet/dec/tulip/de4x5.c
+++ b/drivers/net/ethernet/dec/tulip/de4x5.c
@@ -1966,7 +1966,7 @@ SetMulticastFilter(struct net_device *dev)
 } else if (lp->setup_f == HASH_PERF) {   /* Hash Filtering */
netdev_for_each_mc_addr(ha, dev) {
crc = ether_crc_le(ETH_ALEN, ha->addr);
-   hashcode = crc & HASH_BITS;  /* hashcode is 9 LSb of CRC */
+   hashcode = crc & DE4X5_HASH_BITS;  /* hashcode is 9 LSb of CRC 
*/
 
byte = hashcode >> 3;/* bit[3-8] -> byte in filter */
bit = 1 << (hashcode & 0x07);/* bit[0-2] -> bit in byte */
@@ -5043,7 +5043,7 @@ build_setup_frame(struct net_device *dev, int mode)
*(pa + i) = dev->dev_addr[i]; /* Host address */
if (i & 0x01) pa += 2;
}
-   *(lp->setup_frame + (HASH_TABLE_LEN >> 3) - 3) = 0x80;
+   *(lp->setup_frame + (DE4X5_HASH_TABLE_LEN >> 3) - 3) = 0x80;
 } else {
for (i=0; i<ETH_ALEN; i++) { /* Host address */
*(pa + (i&1)) = dev->dev_addr[i];
diff --git a/drivers/net/ethernet/dec/tulip/de4x5.h 
b/drivers/net/ethernet/dec/tulip/de4x5.h
index ec756eb..1bfdc9b 100644
--- a/drivers/net/ethernet/dec/tulip/de4x5.h
+++ b/drivers/net/ethernet/dec/tulip/de4x5.h
@@ -860,8 +860,8 @@
 #define PCI  0
 #define EISA 1
 
-#define HASH_TABLE_LEN   512   /* Bits */
-#define HASH_BITS0x01ff/* 9 LS bits */
+#define DE4X5_HASH_TABLE_LEN   512   /* Bits */
+#define DE4X5_HASH_BITS0x01ff/* 9 LS bits */
 
 #define SETUP_FRAME_LEN  192   /* Bytes */
 #define IMPERF_PA_OFFSET 156   /* Bytes */
diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 01f7e81..fb5c638 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2887,7 +2887,7 @@ fec_enet_close(struct net_device *ndev)
  * this kind of feature?).
  */
 
-#define HASH_BITS  6   /* #bits in hash */
+#define FEC_HASH_BITS  6   /* #bits in hash */
 #define CRC32_POLY 0xEDB88320
 
 static void set_multicast_list(struct net_device *ndev)
@@ -2935,10 +2935,10 @@ static void set_multicast_list(struct net_device *ndev)
}
}
 
-   /* only upper 6 bits (HASH_BITS) are used
+   /* only upper 6 bits (FEC_HASH_BITS) are used
 * which point to specific bit in he hash registers
 */
-   hash = (crc >> (32 - HASH_BITS)) & 0x3f;
+   hash = (crc >> (32 - FEC_HASH_BITS)) & 0x3f;
 
if (hash > 31) {
tmp = readl(fep->hwp + FEC_GRP_HASH_TABLE_HIGH);

-- 
Jiri Kosina
SUSE Labs



Re: [PATCH 2/2] net: sched: convert qdisc linked list to hashtable

2016-08-12 Thread Jiri Kosina
On Fri, 12 Aug 2016, Daniel Borkmann wrote:

> This results in below panic. Tested reverting this patch and it fixes
> the panic.
> 
> Did you test this also with ingress or clsact qdisc (just try adding
> it to lo dev for example) ?

Hi Daniel,

thanks for the report. Hmm, I am pretty sure clsact worked for me, but 
I'll recheck.

> What happens is the following in qdisc_match_from_root():
> 
> [  995.422187] XXX qdisc:88025e4fc800 queue:880262759000
> dev:880261cc2000 handle:
> [  995.422200] XXX qdisc:81cf8100 queue:81cf8240 dev:
> (null) handle:
> 
> I believe this is due to dev_ingress_queue_create() assigning the
> global noop_qdisc instance as qdisc_sleeping, which later qdisc_lookup()
> uses for qdisc_match_from_root().
> 
> But everything that uses things like noop_qdisc cannot work with the
> new qdisc_match_from_root(), because qdisc_dev(root) will always trigger
> NULL pointer dereference there. Reason is because the dev is always
> NULL for noop, it's a singleton, see noop_qdisc and noop_netdev_queue
> in sch_generic.c.
> 
> Now how to fix it? Creating separate noop instances each time it's set
> would be quite a waste of memory. Even fuglier would be to hack a static
> net device struct into sch_generic.c and let noop_netdev_queue point there
> to get to the hash table. Or we just not use qdisc_dev().

How about we actually extend a little bit the TCQ_F_BUILTIN special case 
test in qdisc_match_from_root()?

After the change, the only way how qdisc_dev() could be NULL should be a 
TCQ_F_BUILTIN case, right?

I was thinking about something like the patch below (the reasong being 
that ->dev would be NULL only in cases of singletonish qdiscs) ... 
wouldn't that also fix the issue you're seeing? Have to think it through a 
little bit more ..


diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 25aada7..1c9faed 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -260,6 +260,9 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc 
*root, u32 handle)
 {
struct Qdisc *q;
 
+   if (!qdisc_dev(root))
+   return (root->handle == handle ? root : NULL);
+
if (!(root->flags & TCQ_F_BUILTIN) &&
root->handle == handle)
return root;


Thanks!

-- 
Jiri Kosina
SUSE Labs



[PATCH 0/2] Convert qdisc linked list into a hashtable

2016-08-10 Thread Jiri Kosina
This is a respin of the v6 of the original patch [1], split into two-patch 
series as requested by davem; first patch fixes all symbol conflicts 
that'd happen once netdevice.h starts to include hashtable.h, the second 
one performs the actual switch to hashtable.

I've preserved Cong's Reviewed-by:, as code-wise this series is identical 
to the original v6 of the patch.

[1] lkml.kernel.org/r/alpine.lnx.2.00.1608011220580.22...@cbobk.fhfr.pm

-- 
Jiri Kosina
SUSE Labs



[PATCH 2/2] net: sched: convert qdisc linked list to hashtable

2016-08-10 Thread Jiri Kosina
From: Jiri Kosina <jkos...@suse.cz>

Convert the per-device linked list into a hashtable. The primary 
motivation for this change is that currently, we're not tracking all the 
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup 
performed over the linked list by qdisc_match_from_root() is rather 
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will 
bring much more determinism in user experience.

Reviewed-by: Cong Wang <xiyou.wangc...@gmail.com>
Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---
 include/linux/netdevice.h |  4 
 include/net/pkt_sched.h   |  4 ++--
 include/net/sch_generic.h |  2 +-
 net/core/dev.c|  3 +++
 net/sched/sch_api.c   | 23 +--
 net/sched/sch_generic.c   |  8 +---
 net/sched/sch_mq.c|  2 +-
 net/sched/sch_mqprio.c|  2 +-
 8 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..17c6499 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct netpoll_info;
 struct device;
@@ -1778,6 +1779,9 @@ struct net_device {
unsigned intnum_tx_queues;
unsigned intreal_num_tx_queues;
struct Qdisc*qdisc;
+#ifdef CONFIG_NET_SCHED
+   DECLARE_HASHTABLE   (qdisc_hash, 4);
+#endif
unsigned long   tx_queue_len;
spinlock_t  tx_global_lock;
int watchdog_timeo;
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fea53f4..8ba11b4 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_list_add(struct Qdisc *q);
-void qdisc_list_del(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 62d5531..26f5cb3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -67,7 +67,7 @@ struct Qdisc {
u32 limit;
const struct Qdisc_ops  *ops;
struct qdisc_size_table __rcu *stab;
-   struct list_headlist;
+   struct hlist_node   hash;
u32 handle;
u32 parent;
int (*reshape_fail)(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff43..d3736d5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7511,6 +7511,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, 
const char *name,
INIT_LIST_HEAD(>all_adj_list.lower);
INIT_LIST_HEAD(>ptype_all);
INIT_LIST_HEAD(>ptype_specific);
+#ifdef CONFIG_NET_SCHED
+   hash_init(dev->qdisc_hash);
+#endif
dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
setup(dev);
 
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ddf047d..c093d32 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -265,33 +266,33 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc 
*root, u32 handle)
root->handle == handle)
return root;
 
-   list_for_each_entry_rcu(q, >list, list) {
+   hash_for_each_possible_rcu(qdisc_dev(root)->qdisc_hash, q, hash, 
handle) {
if (q->handle == handle)
return q;
}
return NULL;
 }
 
-void qdisc_list_add(struct Qdisc *q)
+void qdisc_hash_add(struct Qdisc *q)
 {
if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
struct Qdisc *root = qdisc_dev(q)->qdisc;
 
WARN_ON_ONCE(root == _qdisc);
ASSERT_RTNL();
-   list_add_tail_rcu(>list, >list);
+   hash_add_rcu(qdisc_dev(q)->qdisc_hash, >hash, q->handle);
}
 }
-EXPORT_SYMBOL(qdisc_list_add);
+EXPORT_SYMBOL(qdisc_hash_add);
 
-void qdisc_list_del(struct Qdisc *q)
+void qdisc_hash_del(struct Qdisc *q)
 {
if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
ASSERT_RTNL();
-   list_del_rcu(>list);
+   hash_del_rcu(>hash);
}
 }
-EXPORT_SYMBOL(qdisc_list_del);
+EXPORT_SYMBOL(qdisc_hash_del);
 
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle)
 {
@@ -1004,7 +1005,7 @@ qdisc_create(struct net_device *dev, struct netdev_queue 
*dev

[PATCH 1/2] net: resolve symbol conflicts with generic hashtable.h

2016-08-10 Thread Jiri Kosina
From: Jiri Kosina <jkos...@suse.cz>

This is a preparatory patch for converting qdisc linked list into a 
hashtable. As we'll need to include hashtable.h in netdevice.h, we first 
have to make sure that this will not introduce symbol conflicts for any of 
the netdevice.h users.

Reviewed-by: Cong Wang <xiyou.wangc...@gmail.com>
Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---
 drivers/net/ethernet/ti/davinci_emac.c | 14 +++---
 net/ipv6/ip6_gre.c | 12 ++--
 net/ipv6/ip6_tunnel.c  | 10 +-
 net/ipv6/ip6_vti.c | 10 +-
 net/ipv6/sit.c | 10 +-
 5 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_emac.c 
b/drivers/net/ethernet/ti/davinci_emac.c
index f56d66e..91ca2b2 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -726,14 +726,14 @@ static u32 hash_get(u8 *addr)
 }
 
 /**
- * hash_add - Hash function to add mac addr from hash table
+ * emac_hash_add - Hash function to add mac addr from hash table
  * @priv: The DaVinci EMAC private adapter structure
  * @mac_addr: mac address to delete from hash table
  *
  * Adds mac address to the internal hash table
  *
  */
-static int hash_add(struct emac_priv *priv, u8 *mac_addr)
+static int emac_hash_add(struct emac_priv *priv, u8 *mac_addr)
 {
struct device *emac_dev = >ndev->dev;
u32 rc = 0;
@@ -742,7 +742,7 @@ static int hash_add(struct emac_priv *priv, u8 *mac_addr)
 
if (hash_value >= EMAC_NUM_MULTICAST_BITS) {
if (netif_msg_drv(priv)) {
-   dev_err(emac_dev, "DaVinci EMAC: hash_add(): Invalid "\
+   dev_err(emac_dev, "DaVinci EMAC: emac_hash_add(): 
Invalid "\
"Hash %08x, should not be greater than %08x",
hash_value, (EMAC_NUM_MULTICAST_BITS - 1));
}
@@ -768,14 +768,14 @@ static int hash_add(struct emac_priv *priv, u8 *mac_addr)
 }
 
 /**
- * hash_del - Hash function to delete mac addr from hash table
+ * emac_hash_del - Hash function to delete mac addr from hash table
  * @priv: The DaVinci EMAC private adapter structure
  * @mac_addr: mac address to delete from hash table
  *
  * Removes mac address from the internal hash table
  *
  */
-static int hash_del(struct emac_priv *priv, u8 *mac_addr)
+static int emac_hash_del(struct emac_priv *priv, u8 *mac_addr)
 {
u32 hash_value;
u32 hash_bit;
@@ -825,10 +825,10 @@ static void emac_add_mcast(struct emac_priv *priv, u32 
action, u8 *mac_addr)
 
switch (action) {
case EMAC_MULTICAST_ADD:
-   update = hash_add(priv, mac_addr);
+   update = emac_hash_add(priv, mac_addr);
break;
case EMAC_MULTICAST_DEL:
-   update = hash_del(priv, mac_addr);
+   update = emac_hash_del(priv, mac_addr);
break;
case EMAC_ALL_MULTI_SET:
update = 1;
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index fdc9de2..d3697a4 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -61,12 +61,12 @@ static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_GRE_HASH_SIZE_SHIFT  5
+#define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT)
 
 static int ip6gre_net_id __read_mostly;
 struct ip6gre_net {
-   struct ip6_tnl __rcu *tunnels[4][HASH_SIZE];
+   struct ip6_tnl __rcu *tunnels[4][IP6_GRE_HASH_SIZE];
 
struct net_device *fb_tunnel_dev;
 };
@@ -96,12 +96,12 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int 
set_mtu);
will match fallback tunnel.
  */
 
-#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(HASH_SIZE - 
1))
+#define HASH_KEY(key) (((__force u32)key^((__force 
u32)key>>4))&(IP6_GRE_HASH_SIZE - 1))
 static u32 HASH_ADDR(const struct in6_addr *addr)
 {
u32 hash = ipv6_addr_hash(addr);
 
-   return hash_32(hash, HASH_SIZE_SHIFT);
+   return hash_32(hash, IP6_GRE_HASH_SIZE_SHIFT);
 }
 
 #define tunnels_r_ltunnels[3]
@@ -1089,7 +1089,7 @@ static void ip6gre_destroy_tunnels(struct net *net, 
struct list_head *head)
 
for (prio = 0; prio < 4; prio++) {
int h;
-   for (h = 0; h < HASH_SIZE; h++) {
+   for (h = 0; h < IP6_GRE_HASH_SIZE; h++) {
struct ip6_tnl *t;
 
t = rtnl_dereference(ign->tunnels[prio][h]);
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 7b0481e..2050217 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -64,8 +64,8 @@ MODULE_LICENSE(&qu

Re: qdisc hash table changes...

2016-08-09 Thread Jiri Kosina
On Tue, 9 Aug 2016, David Miller wrote:

> > It does indirectly pull in some networking headers, but it doesn't pull in 
> > netdevice.h (which is the place where the actual include of hashtable.h 
> > happens):
> 
> It pulls in skbuff.h, so are you willing to bet forever that skbuff.h
> won't indirectly pull in netdevice.h at some point in the future?
> 
> This is obviously a big problem, having global namespace conflicts
> like this.
> 
> Pushing fixing it propering into the future isn't really the right
> thing to do.
> 
> So please entertain my request to do this properly.

Ok, I will add this to my TODO list, but this can take quite some time 
(especially as I can imagine some maintainers pushing back on this, 
because it doesn't really fix any existing issue in their code).

Does that strictly have to be a show-stopper for the qdisc hash 
conversion, given the fact that the whole tree is building properly?

> Thank you.

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: qdisc hash table changes...

2016-08-09 Thread Jiri Kosina
On Mon, 8 Aug 2016, David Miller wrote:

> I think there will still be build failures even with v6 due to symbol
> clashes.
> 
> For example, kernel/audit_tree.c defines HASH_SIZE as an enumeration
> value, and that (indirectly) includes networking headers.

It does indirectly pull in some networking headers, but it doesn't pull in 
netdevice.h (which is the place where the actual include of hashtable.h 
happens):

$ make kernel/audit_tree.i; grep HASH_SIZE kernel/audit_tree.i; grep 
netdev.*\.h kernel/audit_tree.i
  CHK include/config/kernel.release
  CHK include/generated/uapi/linux/version.h
  CHK include/generated/utsrelease.h
  CHK include/generated/bounds.h
  CHK include/generated/timeconst.h
  CHK include/generated/asm-offsets.h
  CALLscripts/checksyscalls.sh
  DESCEND  objtool
  CPP kernel/audit_tree.i
enum {HASH_SIZE = 128};
static struct list_head chunk_hash_heads[HASH_SIZE];
 return chunk_hash_heads + n % HASH_SIZE;
 for (i = 0; i < HASH_SIZE; i++)
# 1 "./include/linux/netdev_features.h" 1
# 15 "./include/linux/netdev_features.h"
# 85 "./include/linux/netdev_features.h"
struct sk_buff *__netdev_alloc_skb(struct net_device *dev, unsigned int length,
 return __netdev_alloc_skb(dev, length, ((( gfp_t)0x20u)|(( gfp_t)0x8u)|(( 
gfp_t)0x200u)));
 return __netdev_alloc_skb(((void *)0), length, gfp_mask);
 return netdev_alloc_skb(((void *)0), length);
 struct sk_buff *skb = __netdev_alloc_skb(dev, length + 0, gfp);
 return __netdev_alloc_skb_ip_align(dev, length, ((( gfp_t)0x20u)|(( 
gfp_t)0x8u)|(( gfp_t)0x200u)));

So this is fine.

> There are others all over the tree.
> 
> I would therefore ask that you first fix the namespace conflicts against 
> the hash symbols in the entire tree as a separate patch series (one for 
> each driver/subsystem which has this problem.)  Really, get it down to 
> "git grep hash_add | grep -v _hash_add" and similar returning no output.

Is this really worth the hassle when there are no real conflicts (i.e. the 
code in question doesn't indirectly pull in netdevice.h)?

I just did allmodconfig build with v6, and it passed. Fengguang's 0-day 
bot didn't report any failures either.

> Then we can add the qdisc hash facility.

Thanks,

-- 
Jiri Kosina
SUSE Labs



[PATCH v6] net: sched: convert qdisc linked list to hashtable

2016-08-01 Thread Jiri Kosina
From: Jiri Kosina <jkos...@suse.cz>
 
Convert the per-device linked list into a hashtable. The primary 
motivation for this change is that currently, we're not tracking all the 
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup 
performed over the linked list by qdisc_match_from_root() is rather 
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will 
bring much more determinism in user experience.

As we're adding hashtable.h include into generic netdevice.h, we have to 
make sure HASH_SIZE macro is now non-conflicting with local definitions.

Reviewed-by: Cong Wang <xiyou.wangc...@gmail.com>
Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---

v1 -> v2: fix up RCU hastable usage wrt. rtnl
  fix compilation of .c files which define their own
  HASH_SIZE that now oncflicts with the one from
  hashtable.h (newly included via netdevice.h)

v2 -> v3: resolve HASH_SIZE identifier conflicts in a cleaner way
  fix up the number of hash bucket bits (4 bits for 16 buckets)

v3 -> v4: put the hastable into struct netdevice only if 
  CONFIG_NET_SCHED has been enabled

v4 -> v5: fix !CONFIG_NET_SCHED build (reported by Fengguang Wu)
  add Cong Wang's reviewed-by

v5 -> v6: build fix for davinci_emac driver that got symbol conflict 
  due to hashtable.h include, reported by 0day bot

 drivers/net/ethernet/ti/davinci_emac.c | 14 +++---
 include/linux/netdevice.h  |  4 
 include/net/pkt_sched.h|  4 ++--
 include/net/sch_generic.h  |  2 +-
 net/core/dev.c |  3 +++
 net/ipv6/ip6_gre.c | 12 ++--
 net/ipv6/ip6_tunnel.c  | 10 +-
 net/ipv6/ip6_vti.c | 10 +-
 net/ipv6/sit.c | 10 +-
 net/sched/sch_api.c| 23 +--
 net/sched/sch_generic.c|  8 +---
 net/sched/sch_mq.c |  2 +-
 net/sched/sch_mqprio.c |  2 +-
 13 files changed, 58 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_emac.c 
b/drivers/net/ethernet/ti/davinci_emac.c
index f56d66e..91ca2b2 100644
--- a/drivers/net/ethernet/ti/davinci_emac.c
+++ b/drivers/net/ethernet/ti/davinci_emac.c
@@ -726,14 +726,14 @@ static u32 hash_get(u8 *addr)
 }
 
 /**
- * hash_add - Hash function to add mac addr from hash table
+ * emac_hash_add - Hash function to add mac addr from hash table
  * @priv: The DaVinci EMAC private adapter structure
  * @mac_addr: mac address to delete from hash table
  *
  * Adds mac address to the internal hash table
  *
  */
-static int hash_add(struct emac_priv *priv, u8 *mac_addr)
+static int emac_hash_add(struct emac_priv *priv, u8 *mac_addr)
 {
struct device *emac_dev = >ndev->dev;
u32 rc = 0;
@@ -742,7 +742,7 @@ static int hash_add(struct emac_priv *priv, u8 *mac_addr)
 
if (hash_value >= EMAC_NUM_MULTICAST_BITS) {
if (netif_msg_drv(priv)) {
-   dev_err(emac_dev, "DaVinci EMAC: hash_add(): Invalid "\
+   dev_err(emac_dev, "DaVinci EMAC: emac_hash_add(): 
Invalid "\
"Hash %08x, should not be greater than %08x",
hash_value, (EMAC_NUM_MULTICAST_BITS - 1));
}
@@ -768,14 +768,14 @@ static int hash_add(struct emac_priv *priv, u8 *mac_addr)
 }
 
 /**
- * hash_del - Hash function to delete mac addr from hash table
+ * emac_hash_del - Hash function to delete mac addr from hash table
  * @priv: The DaVinci EMAC private adapter structure
  * @mac_addr: mac address to delete from hash table
  *
  * Removes mac address from the internal hash table
  *
  */
-static int hash_del(struct emac_priv *priv, u8 *mac_addr)
+static int emac_hash_del(struct emac_priv *priv, u8 *mac_addr)
 {
u32 hash_value;
u32 hash_bit;
@@ -825,10 +825,10 @@ static void emac_add_mcast(struct emac_priv *priv, u32 
action, u8 *mac_addr)
 
switch (action) {
case EMAC_MULTICAST_ADD:
-   update = hash_add(priv, mac_addr);
+   update = emac_hash_add(priv, mac_addr);
break;
case EMAC_MULTICAST_DEL:
-   update = hash_del(priv, mac_addr);
+   update = emac_hash_del(priv, mac_addr);
break;
case EMAC_ALL_MULTI_SET:
update = 1;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..17c6499 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct netpoll_info;
 struct device;
@@ -1778,6 +1779,9 @@ struct net_device {
unsigned intnum_tx_queues;

Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable

2016-08-01 Thread Jiri Kosina
On Sun, 31 Jul 2016, Fengguang Wu wrote:

> Jiri, I just double checked and find no similar errors related to
> qdisc_list_add(). The parent commit 95556a8838 ("dccp: avoid deadlock
> in dccp_v4_ctl_send_reset") builds fine without error.

You are right, I realized my mistake afterwards. This is fixed in v5 of 
the patch.

Thanks,

-- 
Jiri Kosina
SUSE Labs



[PATCH v5] net: sched: convert qdisc linked list to hashtable

2016-07-29 Thread Jiri Kosina
From: Jiri Kosina <jkos...@suse.cz>

Convert the per-device linked list into a hashtable. The primary 
motivation for this change is that currently, we're not tracking all the 
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup 
performed over the linked list by qdisc_match_from_root() is rather 
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will 
bring much more determinism in user experience.

As we're adding hashtable.h include into generic netdevice.h, we have to 
make sure HASH_SIZE macro is now non-conflicting with local definitions.

Reviewed-by: Cong Wang <xiyou.wangc...@gmail.com>
Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---

v1 -> v2: fix up RCU hastable usage wrt. rtnl
  fix compilation of .c files which define their own
  HASH_SIZE that now oncflicts with the one from
  hashtable.h (newly included via netdevice.h)

v2 -> v3: resolve HASH_SIZE identifier conflicts in a cleaner way
  fix up the number of hash bucket bits (4 bits for 16 buckets)

v3 -> v4: put the hastable into struct netdevice only if 
  CONFIG_NET_SCHED has been enabled

v4 -> v5: fix !CONFIG_NET_SCHED build (reported by Fengguang Wu)
  add Cong Wang's reviewed-by

 include/linux/netdevice.h |  4 
 include/net/pkt_sched.h   |  4 ++--
 include/net/sch_generic.h |  2 +-
 net/core/dev.c|  3 +++
 net/ipv6/ip6_gre.c| 12 ++--
 net/ipv6/ip6_tunnel.c | 10 +-
 net/ipv6/ip6_vti.c| 10 +-
 net/ipv6/sit.c| 10 +-
 net/sched/sch_api.c   | 23 +--
 net/sched/sch_generic.c   |  8 +---
 net/sched/sch_mq.c|  2 +-
 net/sched/sch_mqprio.c|  2 +-
 12 files changed, 51 insertions(+), 39 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..17c6499 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct netpoll_info;
 struct device;
@@ -1778,6 +1779,9 @@ struct net_device {
unsigned intnum_tx_queues;
unsigned intreal_num_tx_queues;
struct Qdisc*qdisc;
+#ifdef CONFIG_NET_SCHED
+   DECLARE_HASHTABLE   (qdisc_hash, 4);
+#endif
unsigned long   tx_queue_len;
spinlock_t  tx_global_lock;
int watchdog_timeo;
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fea53f4..8ba11b4 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_list_add(struct Qdisc *q);
-void qdisc_list_del(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 62d5531..26f5cb3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -67,7 +67,7 @@ struct Qdisc {
u32 limit;
const struct Qdisc_ops  *ops;
struct qdisc_size_table __rcu *stab;
-   struct list_headlist;
+   struct hlist_node   hash;
u32 handle;
u32 parent;
int (*reshape_fail)(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff43..d3736d5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7511,6 +7511,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, 
const char *name,
INIT_LIST_HEAD(>all_adj_list.lower);
INIT_LIST_HEAD(>ptype_all);
INIT_LIST_HEAD(>ptype_specific);
+#ifdef CONFIG_NET_SCHED
+   hash_init(dev->qdisc_hash);
+#endif
dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
setup(dev);
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index fdc9de2..d3697a4 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -61,12 +61,12 @@ static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_GRE_HASH_SIZE_SHIFT  5
+#define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT)
 
 static int ip6gre_net_id __read_mostly;
 struct ip6gre_net {
-   struct ip6_tnl __rcu *tunnels[4][HASH_SIZE];
+   struct ip6_tnl __rcu *tunnels[4][IP6_GRE_HASH_SIZE];
 
struct net_device *fb_tunnel_dev;
 };
@@ -96,12 +96,12 @@

Re: [PATCH v4] net: sched: convert qdisc linked list to hashtable

2016-07-28 Thread Jiri Kosina
On Thu, 28 Jul 2016, kbuild test robot wrote:

> [auto build test ERROR on v4.7-rc7]
> [also build test ERROR on next-20160728]
> [cannot apply to net/master net-next/master ipsec-next/master]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160728-182303
> config: i386-randconfig-s0-201630 (attached as .config)
> compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386 
> 
> All errors (new ones prefixed by >>):
> 
>net/built-in.o: In function `dev_activate':
> >> (.text+0x37ccb): undefined reference to `qdisc_hash_add'

Dear 0-day team,

could you please check my question regarding this very build failure here?

lkml.kernel.org/r/alpine.lnx.2.00.1607141612560.24...@cbobk.fhfr.pm

Thanks,

-- 
Jiri Kosina
SUSE Labs



[PATCH v4] net: sched: convert qdisc linked list to hashtable

2016-07-28 Thread Jiri Kosina
From: Jiri Kosina <jkos...@suse.cz>

Convert the per-device linked list into a hashtable. The primary 
motivation for this change is that currently, we're not tracking all the 
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup 
performed over the linked list by qdisc_match_from_root() is rather 
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will 
bring much more determinism in user experience.

As we're adding hashtable.h include into generic netdevice.h, we have to 
make sure HASH_SIZE macro is now non-conflicting with local definitions.

Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---
v1 -> v2: fix up RCU hastable usage wrt. rtnl
  fix compilation of .c files which define their own
  HASH_SIZE that now oncflicts with the one from
  hashtable.h (newly included via netdevice.h)

v2 -> v3: resolve HASH_SIZE identifier conflicts in a cleaner way
  fix up the number of hash bucket bits (4 bits for 16 buckets)

v3 -> v4: put the hastable into struct netdevice only if 
  CONFIG_NET_SCHED has been enabled

 include/linux/netdevice.h |  4 
 include/net/pkt_sched.h   |  4 ++--
 include/net/sch_generic.h |  2 +-
 net/core/dev.c|  3 +++
 net/ipv6/ip6_gre.c| 12 ++--
 net/ipv6/ip6_tunnel.c | 10 +-
 net/ipv6/ip6_vti.c| 10 +-
 net/ipv6/sit.c| 10 +-
 net/sched/sch_api.c   | 23 +--
 net/sched/sch_generic.c   |  6 +++---
 net/sched/sch_mq.c|  2 +-
 net/sched/sch_mqprio.c|  2 +-
 12 files changed, 49 insertions(+), 39 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..17c6499 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct netpoll_info;
 struct device;
@@ -1778,6 +1779,9 @@ struct net_device {
unsigned intnum_tx_queues;
unsigned intreal_num_tx_queues;
struct Qdisc*qdisc;
+#ifdef CONFIG_NET_SCHED
+   DECLARE_HASHTABLE   (qdisc_hash, 4);
+#endif
unsigned long   tx_queue_len;
spinlock_t  tx_global_lock;
int watchdog_timeo;
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fea53f4..8ba11b4 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_list_add(struct Qdisc *q);
-void qdisc_list_del(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 62d5531..26f5cb3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -67,7 +67,7 @@ struct Qdisc {
u32 limit;
const struct Qdisc_ops  *ops;
struct qdisc_size_table __rcu *stab;
-   struct list_headlist;
+   struct hlist_node   hash;
u32 handle;
u32 parent;
int (*reshape_fail)(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff43..d3736d5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7511,6 +7511,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, 
const char *name,
INIT_LIST_HEAD(>all_adj_list.lower);
INIT_LIST_HEAD(>ptype_all);
INIT_LIST_HEAD(>ptype_specific);
+#ifdef CONFIG_NET_SCHED
+   hash_init(dev->qdisc_hash);
+#endif
dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
setup(dev);
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index fdc9de2..d3697a4 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -61,12 +61,12 @@ static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_GRE_HASH_SIZE_SHIFT  5
+#define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT)
 
 static int ip6gre_net_id __read_mostly;
 struct ip6gre_net {
-   struct ip6_tnl __rcu *tunnels[4][HASH_SIZE];
+   struct ip6_tnl __rcu *tunnels[4][IP6_GRE_HASH_SIZE];
 
struct net_device *fb_tunnel_dev;
 };
@@ -96,12 +96,12 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int 
set_mtu);
will match fallback tunnel.
  */
 
-#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>

Re: [RFC PATCH v3] net: sched: convert qdisc linked list to hashtable

2016-07-14 Thread Jiri Kosina

[ added CCs ]

On Tue, 12 Jul 2016, kbuild test robot wrote:

> Hi,
> 
> [auto build test ERROR on net/master]
> [also build test ERROR on v4.7-rc7 next-20160711]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Jiri-Kosina/net-sched-convert-qdisc-linked-list-to-hashtable/20160711-220527
> config: arm-tct_hammer_defconfig (attached as .config)
> compiler: arm-linux-gnueabi-gcc (Debian 5.3.1-8) 5.3.1 20160205
> reproduce:
> wget 
> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>  -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=arm 
> 
> All errors (new ones prefixed by >>):
> 
>net/built-in.o: In function `dev_activate':
> >> wext-proc.c:(.text+0x38544): undefined reference to `qdisc_hash_add'

This issue is be there even without my patch (but with qdisc_list_add 
instead), isn't it?

The problem is that sch_generic.c (where dev_activate() is) is being 
compiled everytime CONFIG_NET is set, but sch_api.c (where 
qdisc_list_add() is defined) only when CONFIG_NET_SCHED is set (and there 
is no stub for !CONFIG_NET_SCHED case).

-- 
Jiri Kosina
SUSE Labs


Re: [RFC PATCH v3] net: sched: convert qdisc linked list to hashtable

2016-07-13 Thread Jiri Kosina
On Tue, 12 Jul 2016, Cong Wang wrote:

> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index f45929c..0b5c172e 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -52,6 +52,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  struct netpoll_info;
> >  struct device;
> > @@ -1778,6 +1779,7 @@ struct net_device {
> > unsigned intnum_tx_queues;
> > unsigned intreal_num_tx_queues;
> > struct Qdisc*qdisc;
> > +   DECLARE_HASHTABLE   (qdisc_hash, 4);
> > unsigned long   tx_queue_len;
> > spinlock_t  tx_global_lock;
> > int watchdog_timeo;
> 
> Should it be surrounded by CONFIG_NET_SCHED?
> To save several bytes for !CONFIG_NET_SCHED case.

Makes sense. I'll wait a bit for more feedback (if there is any) before 
including this in potential v4.

Thanks,

-- 
Jiri Kosina
SUSE Labs



[RFC PATCH v3] net: sched: convert qdisc linked list to hashtable

2016-07-11 Thread Jiri Kosina
From: Jiri Kosina <jkos...@suse.cz>

Convert the per-device linked list into a hashtable. The primary 
motivation for this change is that currently, we're not tracking all the 
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup 
performed over the linked list by qdisc_match_from_root() is rather 
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will 
bring much more determinism in user experience.

As we're adding hashtable.h include into generic netdevice.h, we have to 
make sure HASH_SIZE macro is now non-conflicting with local definitions.

Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---

v1 -> v2: fix up RCU hastable usage wrt. rtnl
  fix compilation of .c files which define their own
  HASH_SIZE that now oncflicts with the one from
  hashtable.h (newly included via netdevice.h)

v2 -> v3: resolve HASH_SIZE identifier conflicts in a cleaner way
  fix up the number of hash bucket bits (4 bits for 16 buckets)

 include/linux/netdevice.h |  2 ++
 include/net/pkt_sched.h   |  4 ++--
 include/net/sch_generic.h |  2 +-
 net/core/dev.c|  1 +
 net/ipv6/ip6_gre.c| 12 ++--
 net/ipv6/ip6_tunnel.c | 10 +-
 net/ipv6/ip6_vti.c| 10 +-
 net/ipv6/sit.c| 10 +-
 net/sched/sch_api.c   | 23 +--
 net/sched/sch_generic.c   |  6 +++---
 net/sched/sch_mq.c|  2 +-
 net/sched/sch_mqprio.c|  2 +-
 12 files changed, 45 insertions(+), 39 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..0b5c172e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct netpoll_info;
 struct device;
@@ -1778,6 +1779,7 @@ struct net_device {
unsigned intnum_tx_queues;
unsigned intreal_num_tx_queues;
struct Qdisc*qdisc;
+   DECLARE_HASHTABLE   (qdisc_hash, 4);
unsigned long   tx_queue_len;
spinlock_t  tx_global_lock;
int watchdog_timeo;
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fea53f4..8ba11b4 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_list_add(struct Qdisc *q);
-void qdisc_list_del(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 62d5531..26f5cb3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -67,7 +67,7 @@ struct Qdisc {
u32 limit;
const struct Qdisc_ops  *ops;
struct qdisc_size_table __rcu *stab;
-   struct list_headlist;
+   struct hlist_node   hash;
u32 handle;
u32 parent;
int (*reshape_fail)(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff43..edc1617 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7511,6 +7511,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, 
const char *name,
INIT_LIST_HEAD(>all_adj_list.lower);
INIT_LIST_HEAD(>ptype_all);
INIT_LIST_HEAD(>ptype_specific);
+   hash_init(dev->qdisc_hash);
dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
setup(dev);
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index fdc9de2..d3697a4 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -61,12 +61,12 @@ static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
-#define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define IP6_GRE_HASH_SIZE_SHIFT  5
+#define IP6_GRE_HASH_SIZE (1 << IP6_GRE_HASH_SIZE_SHIFT)
 
 static int ip6gre_net_id __read_mostly;
 struct ip6gre_net {
-   struct ip6_tnl __rcu *tunnels[4][HASH_SIZE];
+   struct ip6_tnl __rcu *tunnels[4][IP6_GRE_HASH_SIZE];
 
struct net_device *fb_tunnel_dev;
 };
@@ -96,12 +96,12 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int 
set_mtu);
will match fallback tunnel.
  */
 
-#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(HASH_SIZE - 
1))
+#define HASH_KEY(key) (((__force u32)key^((__force 
u32)key>>4))&(IP6_GRE_HASH_SIZE - 1))
 static u32 HASH_ADDR(const struct in6_addr *addr

Re: [RFC PATCH v2] net: sched: convert qdisc linked list to hashtable

2016-07-08 Thread Jiri Kosina
On Fri, 8 Jul 2016, Eric Dumazet wrote:

> > diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> > index fdc9de2..0f70ecc 100644
> > --- a/net/ipv6/ip6_gre.c
> > +++ b/net/ipv6/ip6_gre.c
> > @@ -62,11 +62,11 @@ module_param(log_ecn_error, bool, 0644);
> >  MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
> >  
> >  #define HASH_SIZE_SHIFT  5
> > -#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
> > +#define __HASH_SIZE (1 << HASH_SIZE_SHIFT)
> 
> __ prefix is mostly used for functions having some kind of
> shells/helpers.
> 
> I would rather use IP6_GRE_HASH_SIZE or something which has lower
> chances of being used elsewhere.

Alright, makes sense, will do this in v3.

> @@ -732,6 +730,8 @@ static void attach_default_qdiscs(struct net_device *dev)
> > qdisc->ops->attach(qdisc);
> > }
> > }
> > +   if (dev->qdisc)
> > +   qdisc_hash_add(dev->qdisc);
> >  }
> >  
> 
> I do not understand this addition, could you comment on it ?

With linked lists, assigning to struct net_device's Qdisc pointer is 
enough to "initialize" the linked list and have it contain one (root) 
item. With hashtable, this is not the case, it needs to be explicitly 
added.

Hmm, dev_init_scheduler() (and perhaps also dev_shutdown()) would possibly 
need similar treatment in order to have accurate data there 100% of the 
time even during initialization.

-- 
Jiri Kosina
SUSE Labs



Re: [RFC PATCH v2] net: sched: convert qdisc linked list to hashtable

2016-07-08 Thread Jiri Kosina
On Thu, 7 Jul 2016, Craig Gallek wrote:

> This sort of seems like it's just side-stepping the problem.  Given
> that the size of this hash table is fixed, the lookup time of this
> operation is still going to approach linear as the number of qdiscs
> increases.  

That's true; however the primary goal here is not to actually ultimately 
improve speed of qdisc lookup per se, but rather to make it possible to 
unhide the qdiscs which are currently omitted as the linked list takes too 
long to walk. The static hashtable is going help here.

Thanks,

-- 
Jiri Kosina
SUSE Labs



[RFC PATCH v2] net: sched: convert qdisc linked list to hashtable

2016-07-07 Thread Jiri Kosina
From: Jiri Kosina <jkos...@suse.cz>

Convert the per-device linked list into a hashtable. The primary 
motivation for this change is that currently, we're not tracking all the 
qdiscs in hierarchy (e.g. excluding default qdiscs), as the lookup 
performed over the linked list by qdisc_match_from_root() is rather 
expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will 
bring much more determinism in user experience.

As we're adding hashtable.h include into generic netdevice.h, we have to make
sure HASH_SIZE macro is now non-conflicting with local definitions.

Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---

v1 -> v2:   fix up RCU hastable usage wrt. rtnl
fix compilation of .c files which define their own 
 HASH_SIZE that now oncflicts with the one from 
 hashtable.h (newly included via netdevice.h)

 include/linux/netdevice.h |  2 ++
 include/net/pkt_sched.h   |  4 ++--
 include/net/sch_generic.h |  2 +-
 net/core/dev.c|  1 +
 net/ipv6/ip6_gre.c|  8 
 net/ipv6/ip6_tunnel.c |  6 +++---
 net/ipv6/ip6_vti.c|  6 +++---
 net/ipv6/sit.c| 10 +-
 net/sched/sch_api.c   | 23 +--
 net/sched/sch_generic.c   |  6 +++---
 net/sched/sch_mq.c|  2 +-
 net/sched/sch_mqprio.c|  2 +-
 12 files changed, 39 insertions(+), 33 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..630838e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct netpoll_info;
 struct device;
@@ -1778,6 +1779,7 @@ struct net_device {
unsigned intnum_tx_queues;
unsigned intreal_num_tx_queues;
struct Qdisc*qdisc;
+   DECLARE_HASHTABLE   (qdisc_hash, 16);
unsigned long   tx_queue_len;
spinlock_t  tx_global_lock;
int watchdog_timeo;
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fea53f4..8ba11b4 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_list_add(struct Qdisc *q);
-void qdisc_list_del(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 62d5531..26f5cb3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -67,7 +67,7 @@ struct Qdisc {
u32 limit;
const struct Qdisc_ops  *ops;
struct qdisc_size_table __rcu *stab;
-   struct list_headlist;
+   struct hlist_node   hash;
u32 handle;
u32 parent;
int (*reshape_fail)(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff43..edc1617 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7511,6 +7511,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, 
const char *name,
INIT_LIST_HEAD(>all_adj_list.lower);
INIT_LIST_HEAD(>ptype_all);
INIT_LIST_HEAD(>ptype_specific);
+   hash_init(dev->qdisc_hash);
dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
setup(dev);
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index fdc9de2..0f70ecc 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -62,11 +62,11 @@ module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
 #define HASH_SIZE_SHIFT  5
-#define HASH_SIZE (1 << HASH_SIZE_SHIFT)
+#define __HASH_SIZE (1 << HASH_SIZE_SHIFT)
 
 static int ip6gre_net_id __read_mostly;
 struct ip6gre_net {
-   struct ip6_tnl __rcu *tunnels[4][HASH_SIZE];
+   struct ip6_tnl __rcu *tunnels[4][__HASH_SIZE];
 
struct net_device *fb_tunnel_dev;
 };
@@ -96,7 +96,7 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int 
set_mtu);
will match fallback tunnel.
  */
 
-#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(HASH_SIZE - 
1))
+#define HASH_KEY(key) (((__force u32)key^((__force u32)key>>4))&(__HASH_SIZE - 
1))
 static u32 HASH_ADDR(const struct in6_addr *addr)
 {
u32 hash = ipv6_addr_hash(addr);
@@ -1089,7 +1089,7 @@ static void ip6gre_destroy_tunnels(struct net *net, 
struct list_head *head)
 
for (prio = 0; prio < 4; prio++) {
int h;
-   for (

Re: [RFC PATCH] net: sched: convert qdisc linked list to hashtable (was Re: Deleting child qdisc doesn't reset parent to default qdisc?)

2016-07-07 Thread Jiri Kosina
On Thu, 7 Jul 2016, Eric Dumazet wrote:

> > @@ -1440,6 +1441,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, 
> > struct sk_buff *skb,
> >  {
> > int ret = 0, q_idx = *q_idx_p;
> > struct Qdisc *q;
> > +   int b;
> >  
> > if (!root)
> > return 0;
> > @@ -1454,7 +1456,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, 
> > struct sk_buff *skb,
> > goto done;
> > q_idx++;
> > }
> > -   list_for_each_entry(q, >list, list) {
> > +   hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
> > if (q_idx < s_q_idx) {
> > q_idx++;
> > continue;
> > @@ -1771,6 +1773,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, 
> > struct sk_buff *skb,
> >int *t_p, int s_t)
> >  {
> > struct Qdisc *q;
> > +   int b;
> >  
> > if (!root)
> > return 0;
> > @@ -1778,7 +1781,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, 
> > struct sk_buff *skb,
> > if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
> > return -1;
> >  
> > -   list_for_each_entry(q, >list, list) {
> > +   hash_for_each_rcu(qdisc_dev(root)->qdisc_hash, b, q, hash) {
> > if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
> > return -1;
> > }
> 
> 
> Not sure why you used the rcu version here, but the non rcu version in
> tc_dump_qdisc_root()

Good catch.

Actually even the current code is odd in this regard -- 
qdisc_match_from_root() uses RCU iterator, while tc_dump_*() use the 
non-RCU one; addition and deletion is performed using RCU primitives.

I haven't got my head around this yet; if it's correct at all, it'd at 
least deserve a comment somewhere.

I'll respin v2 of the patch (there is also a conflict on HASH_SIZE 
definition in ip6_tunnel.c, ip6_gre.c and sit.c due to hashtable.h include 
in netdevice.h that needs to be resolved as well) that'd make RCU usage 
consistent.

Any other objections/comments? I was namely curious about any opinions 
regarding the hashtable size.

Thanks,

-- 
Jiri Kosina
SUSE Labs



[RFC PATCH] net: sched: convert qdisc linked list to hashtable (was Re: Deleting child qdisc doesn't reset parent to default qdisc?)

2016-07-07 Thread Jiri Kosina
On Fri, 15 Apr 2016, Eric Dumazet wrote:

> Anyway, we probably need to improve our ability to understand qdisc 
> hierarchies. Having some hidden qdiscs is the real problem here.
> 
> We need to add some hash table so that qdisc_match_from_root() does not 
> have to scan hundred of qdiscs.

So how about something like the patch below? I already have preliminary 
patches on top which unhide the default qdiscs, but let's make this one 
step after the other.

Thanks.




From: Jiri Kosina <jkos...@suse.cz>
Subject: [PATCH] net: sched: convert qdisc linked list to hashtable

Convert the per-device linked list into a hashtable. The primary motivation
for this change is that currently, we're not tracking all the qdiscs in
hierarchy (e.g. excluding default qdiscs), as the lookup performed over the
linked list by qdisc_match_from_root() is rather expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will bring
much more determinism in user experience.

Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---
 include/linux/netdevice.h |  2 ++
 include/net/pkt_sched.h   |  4 ++--
 include/net/sch_generic.h |  2 +-
 net/core/dev.c|  1 +
 net/sched/sch_api.c   | 23 +--
 net/sched/sch_generic.c   |  6 +++---
 net/sched/sch_mq.c|  2 +-
 net/sched/sch_mqprio.c|  2 +-
 8 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..630838e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct netpoll_info;
 struct device;
@@ -1778,6 +1779,7 @@ struct net_device {
unsigned intnum_tx_queues;
unsigned intreal_num_tx_queues;
struct Qdisc*qdisc;
+   DECLARE_HASHTABLE   (qdisc_hash, 16);
unsigned long   tx_queue_len;
spinlock_t  tx_global_lock;
int watchdog_timeo;
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fea53f4..8ba11b4 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_list_add(struct Qdisc *q);
-void qdisc_list_del(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 62d5531..26f5cb3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -67,7 +67,7 @@ struct Qdisc {
u32 limit;
const struct Qdisc_ops  *ops;
struct qdisc_size_table __rcu *stab;
-   struct list_headlist;
+   struct hlist_node   hash;
u32 handle;
u32 parent;
int (*reshape_fail)(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff43..edc1617 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7511,6 +7511,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, 
const char *name,
INIT_LIST_HEAD(>all_adj_list.lower);
INIT_LIST_HEAD(>ptype_all);
INIT_LIST_HEAD(>ptype_specific);
+   hash_init(dev->qdisc_hash);
dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
setup(dev);
 
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ddf047d..82953cb 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -265,33 +266,33 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc 
*root, u32 handle)
root->handle == handle)
return root;
 
-   list_for_each_entry_rcu(q, >list, list) {
+   hash_for_each_possible_rcu(qdisc_dev(root)->qdisc_hash, q, hash, 
handle) {
if (q->handle == handle)
return q;
}
return NULL;
 }
 
-void qdisc_list_add(struct Qdisc *q)
+void qdisc_hash_add(struct Qdisc *q)
 {
if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
struct Qdisc *root = qdisc_dev(q)->qdisc;
 
WARN_ON_ONCE(root == _qdisc);
ASSERT_RTNL();
-   list_add_tail_rcu(>list, >list);
+   hash_add_rcu(qdisc_dev(q)->qdisc_hash, >hash, q->handle);
}
 }
-EXPORT_SYMBOL(qdisc_list_add);
+EXPORT_SYMBOL(qdisc_hash_add);
 
-void qdisc_list_del(struct Qdisc *q)
+void qdisc_hash_del(struct Qdisc *q)
 {

Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-06-28 Thread Jiri Kosina
On Tue, 28 Jun 2016, Cong Wang wrote:

> > BTW, I've started to actually work on fixing this, and I've noticed that
> > TBF behavior actually violates what's stated in pfifo_fast manpage:
> >
> > ==
> > Whenever  an  interface is created, the pfifo_fast qdisc is
> > automatically used as a queue. If another qdisc is
> > attached, it preempts the default pfifo_fast, which automatically
> > returns to function when an  existing  qdisc is detached.
> >
> > In this sense this qdisc is magic, and unlike other qdiscs.
> > ==
> 
> It is out of date, now default qdisc can be set to any other qdisc
> via /proc. Also, probably due to historical reasons, we don't have
> a unified default default qdisc, some uses bfifo, some uses pfifo,
> we may break some existing script if we change that.

While I do understand that reasoning, I'd argue that unpredictable and 
unexpected behavior of TBF causing systems with non-working networking is 
much more likely than any userspace having hard dependency on the fact 
that default (*) qdisc for TBF is noop.

(*) where 'default upon creation' != 'default when reset'

Thanks,

-- 
Jiri Kosina
SUSE Labs



[RFC PATCH] sch_tbf: avoid silent fallback to noop_qdisc (was Re: Deleting child qdisc doesn't reset parent to default qdisc?)

2016-06-28 Thread Jiri Kosina
On Fri, 15 Apr 2016, Eric Dumazet wrote:

> Then you need to save the initial qdisc (bfifo for TBF) in a special
> place, to make sure the delete operation is guaranteed to succeed.
> 
> Or fail the delete if the bfifo can not be allocated.
> 
> I can tell that determinism if far more interesting than usability for
> some users occasionally playing with tc.
> 
> Surely the silent fallback to noop_qdisc is wrong.

So before we go further and fix the fact that we actually do have hidden 
qdiscs (by refactoring qdisc_match_from_root() and friends), I'd still 
like to bring the patch below up for consideration.

Thanks.




From: Jiri Kosina <jkos...@suse.cz>
Subject: [PATCH] sch_tbf: avoid silent fallback to noop_qdisc

TBF started its life as a classless qdisc with a single builtin FIFO queue
which was being shaped.

When it got later turned into classful qdisc, it was written in a way that
the fallback qdisc was noop_qdisc, which produces bad user experience (delete
of last manually added class doesn't reset it to initial default, but renders
networking unusable instead).

Switch the default fallback to bfifo; this also mimics how the other guys
(HTB, HFSC, CBQ, ...) are behaving.

Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---
 net/sched/sch_tbf.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 3161e49..b06dffe 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -508,8 +508,12 @@ static int tbf_graft(struct Qdisc *sch, unsigned long arg, 
struct Qdisc *new,
 {
struct tbf_sched_data *q = qdisc_priv(sch);
 
-   if (new == NULL)
-   new = _qdisc;
+   if (new == NULL) {
+   /* reset to default qdisc */
+   new = qdisc_create_dflt(sch->dev_queue, _qdisc_ops, 
sch->parent);
+   if (!new)
+   return -ENOBUFS;
+   }
 
*old = qdisc_replace(sch, new, >qdisc);
return 0;
-- 
Jiri Kosina
SUSE Labs



Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-06-28 Thread Jiri Kosina
On Fri, 15 Apr 2016, Eric Dumazet wrote:

> > TBF is probably a bad example because it started life as a classless 
> > qdisc. There was only one built-in fifo queue that was shaped. Then 
> > someone made it classful and changed this behavior. To me it sounds 
> > reasonable to have the default behavior restored. At minimal 
> > consistency.
> 
> 
> Then you need to save the initial qdisc (bfifo for TBF) in a special
> place, to make sure the delete operation is guaranteed to succeed.
> 
> Or fail the delete if the bfifo can not be allocated.
> 
> I can tell that determinism if far more interesting than usability for
> some users occasionally playing with tc.

BTW, I've started to actually work on fixing this, and I've noticed that 
TBF behavior actually violates what's stated in pfifo_fast manpage:

==
Whenever  an  interface is created, the pfifo_fast qdisc is 
automatically used as a queue. If another qdisc is
attached, it preempts the default pfifo_fast, which automatically 
returns to function when an  existing  qdisc is detached.

In this sense this qdisc is magic, and unlike other qdiscs.
==

-- 
Jiri Kosina
SUSE Labs



Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-14 Thread Jiri Kosina
On Thu, 14 Apr 2016, Phil Sutter wrote:

> > > I've came across the behavior where adding a child qdisc and then 
> > > deleting 
> > > it again makes the networking dysfunctional (I guess that's because all 
> > > of 
> > > a sudden there is absolutely no working qdisc on the device, although 
> > > there originally was a default one in the parent).
> > > 
> > > In a nutshell, is this expected behavior or bug?
> > 
> > This is the expected behavior.
> 
> OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> one upon deletion instead of noop_qdisc, hence I would describe
> the situation using the words 'inconsistent' and 'accident' rather than
> 'expected'. :)

Would a patch that'd unify this in a sense that all qdiscs would assign 
the default one upon deletion acceptable?

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-14 Thread Jiri Kosina
On Thu, 14 Apr 2016, Phil Sutter wrote:

> OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> one upon deletion instead of noop_qdisc, hence I would describe
> the situation using the words 'inconsistent' and 'accident' rather than
> 'expected'. :)

Exactly. I'd again like to stress the fact that this configuration works:

jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 
1.0ms 

and this (after performing add/delete operation) doesn't:

jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 
1.0ms 

It's hard to spot a difference (hint: there is none).

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-14 Thread Jiri Kosina
On Thu, 14 Apr 2016, Jiri Kosina wrote:

> In a nutshell, is this expected behavior or bug?

Just to clarify what seems to suggest to me that this is rather a bug that 
needs to be fixed (but apparently one that has been there for quite a long 
time) can be demonstrated by this:

> 
> =
> jikos:~ # tc qdisc show
> qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 

The above configuration works.

> jikos:~ # ping -c 1 nix.cz | head -2
> PING nix.cz (195.47.235.3) 56(84) bytes of data.
> 64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.59 ms
> 
> jikos:~ # tc qdisc add dev eth0 parent 10:1 sfq
> jikos:~ # tc qdisc show
> qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 
> qdisc sfq 8008: dev eth0 parent 10:1 limit 127p quantum 1514b depth 127 
> divisor 1024 
> 
> jikos:~ # ping -c 1 nix.cz | head -2
> PING nix.cz (195.47.235.3) 56(84) bytes of data.
> 64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.67 ms
> 
> jikos:~ # tc qdisc del dev eth0 parent 10:1 sfq
> jikos:~ # tc qdisc show
> qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 

The above configuration doesn't although it's identical to the working one 
at the beginning.

> jikos:~ # ping -c 1 nix.cz | head -2
> PING nix.cz (195.47.235.3) 56(84) bytes of data.
>   [ ... nothing happens ... ]
> ^C

-- 
Jiri Kosina
SUSE Labs



Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-14 Thread Jiri Kosina
Hi,

I've came across the behavior where adding a child qdisc and then deleting 
it again makes the networking dysfunctional (I guess that's because all of 
a sudden there is absolutely no working qdisc on the device, although 
there originally was a default one in the parent).

In a nutshell, is this expected behavior or bug?

=
jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 
jikos:~ # ping -c 1 nix.cz | head -2
PING nix.cz (195.47.235.3) 56(84) bytes of data.
64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.59 ms

jikos:~ # tc qdisc add dev eth0 parent 10:1 sfq
jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 
qdisc sfq 8008: dev eth0 parent 10:1 limit 127p quantum 1514b depth 127 divisor 
1024 

jikos:~ # ping -c 1 nix.cz | head -2
PING nix.cz (195.47.235.3) 56(84) bytes of data.
64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.67 ms

jikos:~ # tc qdisc del dev eth0 parent 10:1 sfq
jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 
jikos:~ # ping -c 1 nix.cz | head -2
PING nix.cz (195.47.235.3) 56(84) bytes of data.
[ ... nothing happens ... ]
^C
jikos:~ # tc qdisc add dev eth0 parent 10:1 sfq
jikos:~ # ping -c 1 nix.cz | head -2
PING nix.cz (195.47.235.3) 56(84) bytes of data.
64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.66 ms
=

Thanks,

-- 
Jiri Kosina



Re: [PATCH 1/4] list: introduce list_is_first()

2015-12-10 Thread Jiri Kosina
On Thu, 10 Dec 2015, Jens Axboe wrote:

> It's a balance, as we also should not make APIs out of everything. As I said,
> purely my opinion, but I think the is_last/is_first have jumped the shark.

I don't have a strong opinion either way.

What I think we should do though, is to either have both (i.e accept this 
patchset) or have neither of them (i.e. drop list_is_last()).

Otherwise people are likely to be confused by such an asymetric API and 
will keep posting patches for it over and over again.

-- 
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: lockdep warning

2008-02-22 Thread Jiri Kosina
On Fri, 22 Feb 2008, Anders Eriksson wrote:

 I found this is a newly booted 2.6.25-rc2's syslog.
 Feb 21 20:46:33 tippex BUG: rwlock wrong owner on CPU#0, runscript.sh/2633, 
 d2c04084
 Feb 21 20:46:33 tippex Pid: 2633, comm: runscript.sh Not tainted 2.6.25-rc2 #3
 Feb 21 20:46:33 tippex [c02342d0] rwlock_bug+0x50/0x60
 Feb 21 20:46:33 tippex [c0234356] _raw_write_unlock+0x56/0x60
 Feb 21 20:46:33 tippex [c040365d] _write_unlock+0x1d/0x50
 Feb 21 20:46:33 tippex [c03289be] neigh_timer_handler+0x18e/0x2d0
 Feb 21 20:46:33 tippex [c0125449] run_timer_softirq+0x119/0x180
 Feb 21 20:46:33 tippex [c013856d] ? lock_release_holdtime+0x5d/0x80
 Feb 21 20:46:33 tippex [c0328830] ? neigh_timer_handler+0x0/0x2d0
 Feb 21 20:46:33 tippex [c0121a64] __do_softirq+0x54/0xb0
 Feb 21 20:46:33 tippex [c0121af5] do_softirq+0x35/0x40
 Feb 21 20:46:33 tippex [c0121cb4] irq_exit+0x44/0x50
 Feb 21 20:46:33 tippex [c0105757] do_IRQ+0x47/0x80
 Feb 21 20:46:33 tippex [c01039c3] common_interrupt+0x23/0x28
 Feb 21 20:46:33 tippex ===

This needs to be CCed to netdev.

Any chance that

git revert 69cc64d8d92

makes this report go away?

-- 
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: lockdep warning

2008-02-22 Thread Jiri Kosina
On Fri, 22 Feb 2008, Anders Eriksson wrote:

  This needs to be CCed to netdev.
  Any chance that
  git revert 69cc64d8d92
  makes this report go away? 
 I'll have to install a git repo to check, or maybe you can send me the diff 
 to 
 reverse vs. 2.6.25-rc2?

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 7bb6a9a..a16cf1e 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -834,12 +834,18 @@ static void neigh_timer_handler(unsigned long arg)
}
if (neigh-nud_state  (NUD_INCOMPLETE | NUD_PROBE)) {
struct sk_buff *skb = skb_peek(neigh-arp_queue);
-
+   /* keep skb alive even if arp_queue overflows */
+   if (skb)
+   skb_get(skb);
+   write_unlock(neigh-lock);
neigh-ops-solicit(neigh, skb);
atomic_inc(neigh-probes);
-   }
+   if (skb)
+   kfree_skb(skb);
+   } else {
 out:
-   write_unlock(neigh-lock);
+   write_unlock(neigh-lock);
+   }
 
if (notify)
neigh_update_notify(neigh);
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index c663fa5..8e17f65 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -368,6 +368,7 @@ static void arp_solicit(struct neighbour *neigh, struct 
sk_buff *skb)
if (!(neigh-nud_stateNUD_VALID))
printk(KERN_DEBUG trying to ucast probe in 
NUD_INVALID\n);
dst_ha = neigh-ha;
+   read_lock_bh(neigh-lock);
} else if ((probes -= neigh-parms-app_probes)  0) {
 #ifdef CONFIG_ARPD
neigh_app_ns(neigh);
@@ -377,6 +378,8 @@ static void arp_solicit(struct neighbour *neigh, struct 
sk_buff *skb)
 
arp_send(ARPOP_REQUEST, ETH_P_ARP, target, dev, saddr,
 dst_ha, dev-dev_addr, NULL);
+   if (dst_ha)
+   read_unlock_bh(neigh-lock);
 }
 
 static int arp_ignore(struct in_device *in_dev, __be32 sip, __be32 tip)
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: lockdep warning

2008-02-22 Thread Jiri Kosina
On Fri, 22 Feb 2008, Anders Eriksson wrote:

   Any chance that
 git revert 69cc64d8d92
   makes this report go away?  
 I've tested the patch and I no longer get that lock thing in my syslog.

Thanks for verification.

Hmm, I don't immediately see how this patch could make neigh-lock owner 
to change between lock and unlock ... I have skimmed through the solicit 
methods, and they don't seem to be doing anything nasty to neigh ...

The scenario I was thinking about is that before 69cc64d8d92, if any of 
the _solicit methods could do anything bad to neigh struct, this warning 
wouldn't trigger, because the lock has been dropped before calling 
_solicit() and reacquired later, so no mismatch on -current could happen, 
but now as long as the lock is held during _solicit() call, this would 
trigger on the next unlock.

But I am not able to see anything like that in the code. Dave, do you have 
any idea? (the thread started at http://lkml.org/lkml/2008/2/22/105).

-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Subject: [PATCH] Revert [RTNETLINK]: Send a single notification on device state changes.

2008-02-17 Thread Jiri Kosina
This reverts commit 45b503548210fe6f23e92b856421c2a3f05fd034.

It contains deadlock, and breaks userspace applications (wpa_supplicant,
networkmanager). References:

http://bugzilla.kernel.org/show_bug.cgi?id=10002
http://bugzilla.kernel.org/show_bug.cgi?id=10002
---
 net/core/rtnetlink.c |   36 ++--
 1 files changed, 10 insertions(+), 26 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index ecb02af..61ac8d0 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -504,7 +504,7 @@ int rtnl_put_cacheinfo(struct sk_buff *skb, struct 
dst_entry *dst, u32 id,
 
 EXPORT_SYMBOL_GPL(rtnl_put_cacheinfo);
 
-static int set_operstate(struct net_device *dev, unsigned char transition, 
bool send_notification)
+static void set_operstate(struct net_device *dev, unsigned char transition)
 {
unsigned char operstate = dev-operstate;
 
@@ -527,12 +527,8 @@ static int set_operstate(struct net_device *dev, unsigned 
char transition, bool
write_lock_bh(dev_base_lock);
dev-operstate = operstate;
write_unlock_bh(dev_base_lock);
-
-   if (send_notification)
-   netdev_state_change(dev);
-   return 1;
-   } else
-   return 0;
+   netdev_state_change(dev);
+   }
 }
 
 static void copy_rtnl_link_stats(struct rtnl_link_stats *a,
@@ -826,7 +822,6 @@ static int do_setlink(struct net_device *dev, struct 
ifinfomsg *ifm,
if (tb[IFLA_BROADCAST]) {
nla_memcpy(dev-broadcast, tb[IFLA_BROADCAST], dev-addr_len);
send_addr_notify = 1;
-   modified = 1;
}
 
if (ifm-ifi_flags || ifm-ifi_change) {
@@ -839,23 +834,16 @@ static int do_setlink(struct net_device *dev, struct 
ifinfomsg *ifm,
dev_change_flags(dev, flags);
}
 
-   if (tb[IFLA_TXQLEN]) {
-   if (dev-tx_queue_len != nla_get_u32(tb[IFLA_TXQLEN])) {
-   dev-tx_queue_len = nla_get_u32(tb[IFLA_TXQLEN]);
-   modified = 1;
-   }
-   }
+   if (tb[IFLA_TXQLEN])
+   dev-tx_queue_len = nla_get_u32(tb[IFLA_TXQLEN]);
 
if (tb[IFLA_OPERSTATE])
-   modified |= set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE]), 
false);
+   set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE]));
 
if (tb[IFLA_LINKMODE]) {
-   if (dev-link_mode != nla_get_u8(tb[IFLA_LINKMODE])) {
-   write_lock_bh(dev_base_lock);
-   dev-link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
-   write_lock_bh(dev_base_lock);
-   modified = 1;
-   }
+   write_lock_bh(dev_base_lock);
+   dev-link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
+   write_unlock_bh(dev_base_lock);
}
 
err = 0;
@@ -869,10 +857,6 @@ errout:
 
if (send_addr_notify)
call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);
-
-   if (modified)
-   netdev_state_change(dev);
-
return err;
 }
 
@@ -990,7 +974,7 @@ struct net_device *rtnl_create_link(struct net *net, char 
*ifname,
if (tb[IFLA_TXQLEN])
dev-tx_queue_len = nla_get_u32(tb[IFLA_TXQLEN]);
if (tb[IFLA_OPERSTATE])
-   set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE]), true);
+   set_operstate(dev, nla_get_u8(tb[IFLA_OPERSTATE]));
if (tb[IFLA_LINKMODE])
dev-link_mode = nla_get_u8(tb[IFLA_LINKMODE]);
 
-- 
1.5.2.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] New Kernel Bugs

2007-11-14 Thread Jiri Kosina
On Tue, 13 Nov 2007, Andrew Morton wrote:

  HID
  
  Kernel NULL pointer dereference at :usbhid:hiddev_ioctl+0x2f/0xabc
  http://bugzilla.kernel.org/show_bug.cgi?id=9216
  Kernel: 2.6.23.1
  Looks like this is a regression
 No response from developers

Hi,

it is assigned to '[EMAIL PROTECTED]', so I didn't 
notice, it's as simple as that.

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PROBLEM: oops, Linus tree: 2.6.23-g4fa4d23f, BUG: unable to handle kernel NULL pointer dereference at virtual address 00000004

2007-10-20 Thread Jiri Kosina
 successfully
 arp_tables: (C) 2002 David S. Miller
 TCP cubic registered
 NET: Registered protocol family 1
 NET: Registered protocol family 10
 ip6_tables: (C) 2000-2006 Netfilter Core Team
 IPv6 over IPv4 tunneling driver
 NET: Registered protocol family 17
 Using IPI Shortcut mode
 input: ImPS/2 Logitech Wheel Mouse as 
 /devices/platform/i8042/serio1/input/input3
 md: Autodetecting RAID arrays.
 md: Scanned 8 and added 8 devices.
 md: autorun ...
 md: considering hdc4 ...
 md:  adding hdc4 ...
 md: hdc3 has different UUID to hdc4
 md: hdc2 has different UUID to hdc4
 md: hdc1 has different UUID to hdc4
 md:  adding hda4 ...
 md: hda3 has different UUID to hdc4
 md: hda2 has different UUID to hdc4
 md: hda1 has different UUID to hdc4
 md: created md4
 md: bindhda4
 md: bindhdc4
 md: running: hdc4hda4
 raid1: raid set md4 active with 2 out of 2 mirrors
 md4: bitmap initialized from disk: read 11/11 pages, set 6 bits
 created bitmap (169 pages) for device md4
 md: considering hdc3 ...
 md:  adding hdc3 ...
 md: hdc2 has different UUID to hdc3
 md: hdc1 has different UUID to hdc3
 md:  adding hda3 ...
 md: hda2 has different UUID to hdc3
 md: hda1 has different UUID to hdc3
 md: created md3
 md: bindhda3
 md: bindhdc3
 md: running: hdc3hda3
 raid1: raid set md3 active with 2 out of 2 mirrors
 md3: bitmap initialized from disk: read 15/15 pages, set 0 bits
 created bitmap (233 pages) for device md3
 md: considering hdc2 ...
 md:  adding hdc2 ...
 md: hdc1 has different UUID to hdc2
 md:  adding hda2 ...
 md: hda1 has different UUID to hdc2
 md: created md2
 md: bindhda2
 md: bindhdc2
 md: running: hdc2hda2
 raid1: raid set md2 active with 2 out of 2 mirrors
 md2: bitmap initialized from disk: read 8/8 pages, set 0 bits
 created bitmap (123 pages) for device md2
 md: considering hdc1 ...
 md:  adding hdc1 ...
 md:  adding hda1 ...
 md: created md1
 md: bindhda1
 md: bindhdc1
 md: running: hdc1hda1
 raid1: raid set md1 active with 2 out of 2 mirrors
 md1: bitmap initialized from disk: read 10/10 pages, set 0 bits
 created bitmap (153 pages) for device md1
 md: ... autorun DONE.
 EXT3-fs: INFO: recovery required on readonly filesystem.
 EXT3-fs: write access will be enabled during recovery.
 kjournald starting.  Commit interval 5 seconds
 EXT3-fs: recovery complete.
 EXT3-fs: mounted filesystem with ordered data mode.
 VFS: Mounted root (ext3 filesystem) readonly.
 Freeing unused kernel memory: 264k freed
 EXT3 FS on md4, internal journal
 kjournald starting.  Commit interval 5 seconds
 EXT3 FS on md3, internal journal
 EXT3-fs: mounted filesystem with ordered data mode.
 Adding 192k swap on /dev/md2.  Priority:-1 extents:1 across:192k
 eth0: link up, 100Mbps, full-duplex, lpa 0xC5E1
 eth1: link up, 100Mbps, full-duplex, lpa 0x45E1
 eth0: no IPv6 routers present
 eth1: no IPv6 routers present
 BUG: unable to handle kernel NULL pointer dereference at virtual address 
 0004
 printing eip: c0383fa0 *pde =  
 Oops:  [#1] 
 CPU:0
 EIP:0060:[c0383fa0]Not tainted VLI
 EFLAGS: 00010282   (2.6.23-g4fa4d23f #4)
 EIP is at sk_filter_delayed_uncharge+0x0/0x20
 eax: c5dfb600   ebx:    ecx:    edx: 
 esi: c5dfb600   edi:    ebp: c170d300   esp: c5dcbef8
 ds: 007b   es: 007b   fs:   gs: 0033  ss: 0068
 Process dhcpd (pid: 5380, ti=c5dca000 task=c47f0520 task.ti=c5dca000)
 Stack: c038410f 0068 0058 c5dcbf34 b7fd000b c1b75000 c5dfb600 
 c0371d19 
c1b70fd8 c180d300 c1b70f80 0007 0001 b7fd000b c170d900 
 b7fd000b 
080c9fa0 000e  c5dcbf6c c036dbc2 000e c1b75000 
 0008 
 Call Trace:
  [c038410f] sk_attach_filter+0xdf/0x100
  [c0371d19] sock_setsockopt+0x509/0x590
  [c036dbc2] sockfd_lookup_light+0x32/0x60
  [c036dd8b] sys_setsockopt+0x9b/0xb0
  [c036f705] sys_socketcall+0xd5/0x280
  [c010503e] sysenter_past_esp+0x5f/0x85
  ===
 Code: cb 04 76 c7 83 c1 01 83 ee 01 39 d1 75 8d eb d7 83 7c cb 04 0f 77 b4 83 
 c1 01 83 ee 01 39 d1 0f 85 76 ff ff ff eb c0 8d 74 26 00 8b 4a 04 8d 0c cd 
 10 00 00 00 29 48 5c 8d 42 08 ba 10 40 38 c0 
 EIP: [c0383fa0] sk_filter_delayed_uncharge+0x0/0x20 SS:ESP 0068:c5dcbef8
 
 
 

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bluetooth wireless mouse very sluggish when there is moderated network activity

2007-10-14 Thread Jiri Kosina
On Sat, 13 Oct 2007, Ismail Dönmez wrote:

 input: Microsoft Microsoft? Wireless Notebook Presenter Mouse 8000 as 
 /devices/pci:00/:00:1d.1/usb3/3-1/3-1.3/3-1.3:1.0/input/input12 
 input: USB HID v1.11 Mouse [Microsoft Microsoft? Wireless Notebook 
 Presenter Mouse 8000] on usb-:00:1d.1-1.3 Problem is when there is a 
 moderate traffic on wireless interface (300-400k/s down 50-60k/up on 
 4Mbit) mouse is really sluggish. This is when I am running ktorrent. If 
 I stop the torrent download/upload its fine again.
 P.S: If it matters I am using ipw3945 driver with 2.6.18 kernel and iwl3945 
 with the newer ones.

Hi Ismail,

it would be great if you'd be able to isolate what is the cause in a 
little more detailed way - i.e. try a different network card in the same 
machine, try a different bluetooth mouse in the same machine, try the very 
same mouse and network card in a different machine, etc.

Otherwise I am afraid this is going to be just a totally wild guessing.

Thanks,

-- 
Jiri Kosina

[PATCH] IPV6: fix source address selection

2007-09-12 Thread Jiri Kosina
From: Jiri Kosina [EMAIL PROTECTED]

[PATCH] IPV6: fix source address selection

The commit 95c385 broke proper source address selection for cases in which 
there is a address which is makred 'deprecated'. The commit mistakenly 
changed ifa-flags to ifa_result-flags (probably copy/paste error from a 
few lines above) in the 'Rule 3' address selection code.

The patch below restores the previous RFC-compliant behavior, please 
apply.

Cc: Jiri Bohac [EMAIL PROTECTED]
Cc: Petr Baudis [EMAIL PROTECTED]
Signed-off-by: Jiri Kosina [EMAIL PROTECTED]

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 91ef3be..45b4c82 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1021,7 +1021,7 @@ int ipv6_dev_get_saddr(struct net_device
hiscore.rule++;
}
if (ipv6_saddr_preferred(score.addr_type) ||
-  (((ifa_result-flags 
+  (((ifa-flags 
(IFA_F_DEPRECATED|IFA_F_OPTIMISTIC)) == 0))) {
score.attrs |= IPV6_SADDR_SCORE_PREFERRED;
if (!(hiscore.attrs  
IPV6_SADDR_SCORE_PREFERRED)) {
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [374/2many] MAINTAINERS - PCNET32 NETWORK DRIVER

2007-08-13 Thread Jiri Kosina
On Sun, 12 Aug 2007, David Miller wrote:

 Ok, 374 patches is just rediculious.
 So many patches eats up an enormous amount of mailing list resources, 
 and for these patches in particular there are few reasons to split them 
 up at all.  The fact that the split up landed you at 300+ patches is a 
 very good indication of that.

Actually, I can see 546 patches up to now. Horrid.

Besides that, it's very unfortunate that every single mail of this 
ridiculously hge patch series starts a new thread (no 'References' or 
'In-Reply-To' header), so it's not possible to nuke it at once in an usual 
way.

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Weird network problems with 2.6.23-rc2

2007-08-11 Thread Jiri Kosina
On Sat, 11 Aug 2007, Shish wrote:

 Something seems to have broken in 2.6.23-rc2, and I'm not sure what, or
 where I should look for further debugging. The info I have:
 
 On my 2.6.23-rc2 desktop, things run fine.
 
 On my test server, built from the same source tree, networking goes
 strange every few minutes, with the following symptoms:
 
 o) running ping against the server, the first ping goes through;
 further pings go AWOL until about icmp_seq=30, when I get 4-5 icmp
 replies (marked as DUP!), then no pings for a while, then dups, and so
 on.
 
 o) the server doesn't see ARP replies. According to tcpdump, the server
 will send eg who has 192.168.0.2? tell 192.168.0.1; the client in
 question will recieve the packet and send a response, but nothing shows
 up in the server-side tcpdump.
 
 o) after a few minutes of random network troubles, everything will work
 fine again, (ping is normal, arp replies are seen, tcp sessions work)
 for a few minutes.
 
 o) The server's dmesg shows lots of short udp packet messages
 
 o) ifdown then ifup'ing the interfaces fixes things, temporarily.
 
 Reverting to 2.6.22, everything seems to be running fine (but no lguest,
 which is what I came for :( )
 
 I've also tried with the latest code from git, the behaviour is the same
 as 2.6.23-rc2.

This needs to go to netdev, CC added.

Also, git-bisect will help a lot here to find the commit which caused the 
regression you are seeing.

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [git patches] net driver updates

2007-07-11 Thread Jiri Kosina
On Tue, 10 Jul 2007, Jeff Garzik wrote:

 Various minor updates.  The only thing of note is sk98lin driver removal.
 Please pull from 'upstream-linus' branch of
 master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git 
 upstream-linus

This is probably going to cause a new entry in regressions list, as at 
least Chris Stromsoe has reported bonding-related problems with skge that 
don't happen with sklin - 
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0707.1/1158.html

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.21-rc7: BUG: sleeping function called from invalid context at net/core/sock.c:1523

2007-05-16 Thread Jiri Kosina
On Fri, 11 May 2007, Satyam Sharma wrote:

 (later)
 I Googled a bit to see if this problem was faced elsewhere in the kernel
 too. Saw the following commit by Ingo Molnar
 (9883a13c72dbf8c518814b6091019643cdb34429):
 - lock_sock(sock-sk);
 + local_bh_disable();
 + bh_lock_sock_nested(sock-sk);
   rc = selinux_netlbl_socket_setsid(sock, sksec-sid);
 - release_sock(sock-sk);
 + bh_unlock_sock(sock-sk);
 + local_bh_enable();
 Is it _really_ *this* simple?

Hi Satyam,

actually this *seems* to be proper solution also for our case, thanks for 
pointing this out. I will think about it once again, do some more tests 
with this locking scheme, and will let you know.

Thanks,

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.21-rc7: BUG: sleeping function called from invalid context at net/core/sock.c:1523

2007-05-16 Thread Jiri Kosina
On Wed, 16 May 2007, Marcel Holtmann wrote:

 since Jiri has a good test case for it, I leave it to him for testing.
 If he confirms that this fixes the locking issues, then this is
 Signed-off-by: Marcel Holtmann [EMAIL PROTECTED]

I will verify later this evening and will let you know. I am however 
pretty convinced now that this is the right fix.

Thanks,

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.21-rc7: BUG: sleeping function called from invalid context at net/core/sock.c:1523

2007-05-16 Thread Jiri Kosina
On Wed, 16 May 2007, Jiri Kosina wrote:

  since Jiri has a good test case for it, I leave it to him for testing.
  If he confirms that this fixes the locking issues, then this is
  Signed-off-by: Marcel Holtmann [EMAIL PROTECTED]
 I will verify later this evening and will let you know. I am however 
 pretty convinced now that this is the right fix.

Satyam,

I have just verified that this locking scheme is indeed correct. So you 
can add

Signed-off-by: Jiri Kosina [EMAIL PROTECTED]

if you wish to, and submit the patch to Andrew.

Thanks,

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.21-rc7: BUG: sleeping function called from invalid context at net/core/sock.c:1523

2007-05-16 Thread Jiri Kosina
On Wed, 16 May 2007, David Miller wrote:

  I have just verified that this locking scheme is indeed correct. So you 
  can add
  
  Signed-off-by: Jiri Kosina [EMAIL PROTECTED]
  
  if you wish to, and submit the patch to Andrew.
 I guess I don't get sent networking patches any more?
 :-)

Well, this is bluetooth-specific, but it seemed to me that Marcel wasn't 
going to send pull requests to Linus any time soon, therefore I thought 
going through akpm is a thing to do.

Honestly, I really don't care through which tree this goes in, so sorry if 
any offence was caused here :)

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Getting make net/built-in.o Error with 2.6.21.1 Build

2007-05-08 Thread Jiri Kosina
On Tue, 8 May 2007, Satyam Sharma wrote:

 Sure, my aim here was to only solve the _build breakage_ by fixing the 
 Kconfig for this module (that used code from another kernel module 
 without listing it in its dependencies). If, as you say, the real 
 solution is that we should actually be taking out the offending call to 
 the other module itself, then please go ahead -- I don't know much about 
 the Bluetooth / HIDP subsytem anyway.

Converting the hid-ff drivers to be also transport-independent is on my 
TODO list, but it didn't happen yet.

Marcel - are you aware of any devices currently supported by USB HID 
force-feedback code, which have a bluetooth version, please? 

I'd propose the patch below, until I make the usbhid force-feedback code 
transport independent. Thanks.



From: Jiri Kosina [EMAIL PROTECTED]

[Bluetooth] HIDP - don't initialize force feedback

The current implementation of force feedback for HID devices is 
USB-transport only and therefore calling hid_ff_init() from hidp code is 
not going to work (plus it creates unwanted dependency of hidp on usbhid). 
Remove the hid_ff_init() until either the hid-ff is made 
transport-independent, or at least support for bluetooth transport is 
added.

Signed-off-by: Jiri Kosina [EMAIL PROTECTED]

diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
index d342e89..3e77e81 100644
--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -737,10 +737,8 @@ static inline void hidp_setup_hid(struct
list_for_each_entry(report, 
hid-report_enum[HID_FEATURE_REPORT].report_list, list)
hidp_send_report(session, report);
 
-   if (hidinput_connect(hid) == 0) {
+   if (hidinput_connect(hid) == 0)
hid-claimed |= HID_CLAIMED_INPUT;
-   hid_ff_init(hid);
-   }
 }
 
 int hidp_add_connection(struct hidp_connadd_req *req, struct socket 
*ctrl_sock, struct socket *intr_sock)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Getting make net/built-in.o Error with 2.6.21.1 Build

2007-05-08 Thread Jiri Kosina
On Tue, 8 May 2007, Marcel Holtmann wrote:

  Marcel - are you aware of any devices currently supported by USB HID 
  force-feedback code, which have a bluetooth version, please?
 I haven't looked at all details for the PS3 controller, but that might
 be the first one. In theory they can and at some point they will enter
 the market.

You are right, PS3 controller is going to be shipped in both variants. On 
the other hand it is perfectly possible that we will need special 
force-feedback driver for it anyway.

BTW when talking about this - we already have PS3 quirk present in usb hid 
(extra control URB is required to make it operational), probably something 
similar will be needed for BT version too.

  From: Jiri Kosina [EMAIL PROTECTED]
  [Bluetooth] HIDP - don't initialize force feedback
  The current implementation of force feedback for HID devices is 
  USB-transport only and therefore calling hid_ff_init() from hidp code is 
  not going to work (plus it creates unwanted dependency of hidp on usbhid). 
  Remove the hid_ff_init() until either the hid-ff is made 
  transport-independent, or at least support for bluetooth transport is 
  added.
  Signed-off-by: Jiri Kosina [EMAIL PROTECTED]
 Signed-off-by: Marcel Holtmann [EMAIL PROTECTED]
 Under the condition that you remember to put it back once a generic FF
 exists.

Sure. I will take this through my tree then, thanks.

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 2.6.21-rc7: BUG: sleeping function called from invalid context at net/core/sock.c:1523

2007-04-24 Thread Jiri Kosina
On Tue, 24 Apr 2007, Herbert Xu wrote:

  Hmm, *sigh*. I guess the patch below fixes the problem, but it is a 
  masterpiece in the field of ugliness. And I am not sure whether it is 
  completely correct either. Are there any immediate ideas for better 
  solution with respect to how struct sock locking works?
 Please cc such patches to netdev.  Thanks.

Hi Herbert,

well it's pretty much bluetooth-specific, and bluez-devel was CCed, but 
OK.

  diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
  index 71f5cfb..c5c93cd 100644
  --- a/net/bluetooth/hci_sock.c
  +++ b/net/bluetooth/hci_sock.c
  @@ -656,7 +656,10 @@ static int hci_sock_dev_event(struct notifier_block 
  *this, unsigned long event,
 /* Detach sockets from device */
 read_lock(hci_sk_list.lock);
 sk_for_each(sk, node, hci_sk_list.head) {
  -   lock_sock(sk);
  +   if (in_atomic())
  +   bh_lock_sock(sk);
  +   else
  +   lock_sock(sk);
 
 This doesn't do what you think it does.  bh_lock_sock can still succeed
 even with lock_sock held by someone else.

I know, this was precisely the reason why I converted the bh_lock_sock() 
to lock_sock() here some time ago (as it was racy with 
l2cap_connect_cfm()).

 Does this need to occur immediately when an event occurs? If not I'd
 suggest moving this into a workqueue.

Will have to check whether this will be processed properly in time when 
going to suspend.

Thanks,

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: r8169 problems on x86_64 (transmit timeout crash)

2007-01-26 Thread Jiri Kosina
On Fri, 26 Jan 2007, mirek kratochvil wrote:

  In case you have BIOS 04.., try to upgrade to 06.. version. Also, does 
  it help when you boot with acpi=off kernel commandline parameter? (do 
  you compile kernel with both acpi and apic support?).
 Upgraded to bios 0802, but still the same problem. I think it's 
 definitely something about the r8169 driver. acpi=off doesn't help. 
 turning on/off apic also doesn't matter.

Added some relevant CCs; the thread started in lkml at 
http://lkml.org/lkml/2007/1/25/268

 one thing more - have confirmed that the bug is reproducible on *any* 
 laptop with 8168, not only Asus. I guess the disk-usage-crash was some 
 other bug.

Confusing ... the other day you stated that r1000 has the exactly same 
problem?

Thanks,

-- 
Jiri Kosina
SUSE Labs
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly

2006-10-12 Thread Jiri Kosina
[PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() 
properly

Fix missing handling of pci_enable_device() return value in skge_resume() 

Signed-off-by: Jiri Kosina [EMAIL PROTECTED]

--- 

 drivers/net/sk98lin/skge.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index 99e9262..e12fb62 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -5070,7 +5070,11 @@ static int skge_resume(struct pci_dev *p
 
pci_set_power_state(pdev, PCI_D0);
pci_restore_state(pdev);
-   pci_enable_device(pdev);
+   if ((ret = pci_enable_device(pdev))) {
+   printk(KERN_ERR sk98lin: Cannot enable PCI device during 
resume\n);
+   unregister_netdev(dev);
+   return ret;
+   }
pci_set_master(pdev);
if (pAC-GIni.GIMacsFound == 2)
ret = request_irq(dev-irq, SkGeIsr, IRQF_SHARED, sk98lin, 
dev);

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly

2006-10-12 Thread Jiri Kosina
On Thu, 12 Oct 2006, Stephen Hemminger wrote:

  pci_set_power_state(pdev, PCI_D0);
  pci_restore_state(pdev);
  -   pci_enable_device(pdev);
  +   if ((ret = pci_enable_device(pdev))) {
  +   printk(KERN_ERR sk98lin: Cannot enable PCI device during 
  resume\n);
  +   unregister_netdev(dev);
 
 Having the device unregister seems harsh.

What would be the proper way? As the initialization failed, accessing the 
device would not make sense any more (therefore I don't think that calling 
skge_remove_one() would be OK, as it issues calls to SkEventQueue() and 
SkEventDispatcher(), trying to send something to the card).

 Why put condtional on same line?

Pardon me?

 Why not print device name dev-name.

Thanks.

[PATCH] fix sk98lin driver, ignoring return value from pci_enable_device()

add check of return value to _resume() function of sk98lin driver.

Signed-off-by: Jiri Kosina [EMAIL PROTECTED]

--- 

 drivers/net/sk98lin/skge.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index d4913c3..1f03cf8 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -5070,7 +5070,11 @@ static int skge_resume(struct pci_dev *p
 
pci_set_power_state(pdev, PCI_D0);
pci_restore_state(pdev);
-   pci_enable_device(pdev);
+   if ((ret = pci_enable_device(pdev))) {
+   printk(KERN_ERR sk98lin: Cannot enable PCI device %s during 
resume\n, 
+   dev-name);
+   return ret;
+   }
pci_set_master(pdev);
if (pAC-GIni.GIMacsFound == 2)
ret = request_irq(dev-irq, SkGeIsr, IRQF_SHARED, sk98lin, 
dev);

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly

2006-10-12 Thread Jiri Kosina
On Thu, 12 Oct 2006, Stephen Hemminger wrote:

   Having the device unregister seems harsh.
  What would be the proper way? As the initialization failed, accessing 
  the device would not make sense any more (therefore I don't think that 
  calling skge_remove_one() would be OK, as it issues calls to 
  SkEventQueue() and SkEventDispatcher(), trying to send something to 
  the card).
 I guess, its just not clear what the state of the machine is anyway
 if you can't enable the device something is hosed (or the device was
 hot removed).

Well, it depends on definition of 'hot'. What would for example happen in 
the case suspend-to-disk - remove the card when the machine is switched 
off - resume-from-disk? I guess that exactly this pci_enable_device() 
will fail, so we definitely have to handle this case, as it can easily 
happen.

   Why put condtional on same line?
  Pardon me?
 I prefer:
   ret = pci_enable_device(pdev);

As you wish. 

[PATCH] fix sk98lin driver, ignoring return value from pci_enable_device()

add check of return value to _resume() function of sk98lin driver.

Signed-off-by: Jiri Kosina [EMAIL PROTECTED]

---

 drivers/net/sk98lin/skge.c |8 +++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index d4913c3..d691811 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -5070,7 +5070,13 @@ static int skge_resume(struct pci_dev *p
 
pci_set_power_state(pdev, PCI_D0);
pci_restore_state(pdev);
-   pci_enable_device(pdev);
+   ret = pci_enable_device(pdev);
+   if (ret) {
+   printk(KERN_ERR sk98lin: Cannot enable PCI device %s during 
resume\n, 
+   dev-name);
+   unregister_netdev(dev);
+   return ret;
+   }
pci_set_master(pdev);
if (pAC-GIni.GIMacsFound == 2)
ret = request_irq(dev-irq, SkGeIsr, IRQF_SHARED, sk98lin, 
dev);

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] sk98lin: handle pci_enable_device() return value in skge_resume() properly

2006-10-12 Thread Jiri Kosina
On Thu, 12 Oct 2006, Andrew Morton wrote:

  pci_set_power_state(pdev, PCI_D0);
  pci_restore_state(pdev);
  -   pci_enable_device(pdev);
  +   ret = pci_enable_device(pdev);
  +   if (ret) {
  +   printk(KERN_ERR sk98lin: Cannot enable PCI device %s during 
  resume\n, 
  +   dev-name);
  +   unregister_netdev(dev);
 This looks rather wrong - skge_exit() will run unregister_netdev() again.

You are of course right (the problem was also spotted by Russell King). 
This I believe is the correct one for the sk98lin case.

[PATCH] fix sk98lin driver, ignoring return value from pci_enable_device()

add check of return value to _resume() function of sk98lin driver.

Signed-off-by: Jiri Kosina [EMAIL PROTECTED]

---

 drivers/net/sk98lin/skge.c |   20 +++-
 1 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/sk98lin/skge.c b/drivers/net/sk98lin/skge.c
index d4913c3..3a9323d 100644
--- a/drivers/net/sk98lin/skge.c
+++ b/drivers/net/sk98lin/skge.c
@@ -5070,7 +5070,12 @@ static int skge_resume(struct pci_dev *p
 
pci_set_power_state(pdev, PCI_D0);
pci_restore_state(pdev);
-   pci_enable_device(pdev);
+   ret = pci_enable_device(pdev);
+   if (ret) {
+   printk(KERN_WARNING sk98lin: unable to enable device %s in 
resume\n,
+   dev-name);
+   goto out_err;
+   }   
pci_set_master(pdev);
if (pAC-GIni.GIMacsFound == 2)
ret = request_irq(dev-irq, SkGeIsr, IRQF_SHARED, sk98lin, 
dev);
@@ -5078,10 +5083,8 @@ static int skge_resume(struct pci_dev *p
ret = request_irq(dev-irq, SkGeIsrOnePort, IRQF_SHARED, 
sk98lin, dev);
if (ret) {
printk(KERN_WARNING sk98lin: unable to acquire IRQ %d\n, 
dev-irq);
-   pAC-AllocFlag = ~SK_ALLOC_IRQ;
-   dev-irq = 0;
-   pci_disable_device(pdev);
-   return -EBUSY;
+   ret = -EBUSY;
+   goto out_err;
}
 
netif_device_attach(dev);
@@ -5098,6 +5101,13 @@ static int skge_resume(struct pci_dev *p
}
 
return 0;
+out_err:
+   pAC-AllocFlag = ~SK_ALLOC_IRQ;
+   dev-irq = 0;
+   pci_disable_device(pdev);
+
+   return ret;
+
 }
 #else
 #define skge_suspend NULL

-- 
Jiri Kosina
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html