Re: [PATCH net] Panic when tc_lookup_action_n finds a partially initialized action.

2016-10-11 Thread Cong Wang
On Tue, Oct 11, 2016 at 2:28 AM, Krister Johansen
 wrote:
> On Wed, Oct 05, 2016 at 11:01:38AM -0700, Cong Wang wrote:
>> Does the attached patch make any sense now? Our pernet init doesn't
>> rely on act_base, so even we have some race, the worst case is after
>> we initialize the pernet netns for an action but its ops still not
>> visible, which seems fine (at least no crash).
>
> I tried to reproduce the panic with this latest patch, but I am unable
> to do so.  The one difference I notice between this patch, and the one I

Nice, so the crash is fixed. I will send out my patch formally.

> sent to the list, is that with yours it takes much longer before we get
> any output from the simultaneous launch of these containers.  Presumably
> that's the extra latency added by allowing many extra modprobe calls to
> get spawned by request_module().

This is a different problem. When we register a pernet ops, the ops->init()
will be called on each container to initialize its pernet data structures,
this is why request_module() blocks on waiting for register_pernet_subsys()
to finish. As we discussed earlier, this could be solved or workaround by
loading modules prior to creating containers. Or if this really needs to be
fixed, it should be in register_pernet_subsys(), since it is not specific to
tc actions, we have so many places loading modules at run-time in
networking subsystem.

Thanks for testing!


Re: [PATCH net] Panic when tc_lookup_action_n finds a partially initialized action.

2016-10-11 Thread Cong Wang
On Sat, Oct 8, 2016 at 11:13 PM, Krister Johansen
 wrote:
> Hi Cong,
> Thanks for the follow-up.
>
> On Thu, Oct 06, 2016 at 12:01:15PM -0700, Cong Wang wrote:
>> On Wed, Oct 5, 2016 at 11:11 PM, Krister Johansen
>> > pernet_operations pointer.  The code in register_pernet_subsys() makes
>> > no attempt to check for duplicates.  If we add a pointer that's already
>> > in the list, and subsequently call unregister, the results seem
>> > undefined.  It looks like we'll remove the pernet_operations for the
>> > existing action, assuming we don't corrupt the list in the process.
>> >
>> > Is this actually safe?  If so, what corner case is the act->type /
>> > act->kind protecting us from?
>>
>> ops->type and ops->kind should be unique too, user-space already
>> relies on this (tc action ls action xxx). The code exists probably just
>> for sanity check.
>
> With that in mind, would it make sense to change the check to a WARN/BUG
> or some kind of assertion?  I mistakenly inferred that it was possible to
> legtimately end up in this scenario.


Yes, it makes sense to me.


>> > Part of the desire to inhibit extra modprobe calls is that if hundreds
>> > of these all start at once on boot, it's really unnecessary to have all
>> > of the rest of them wait while lots of extra modprobe calls are forked
>> > by the kernel.
>>
>> You can tell systemd to load these modules before starting these
>> containers to avoid blocking, no?
>
> That was exactly what I did to work around the panic until I was able to
> get a patch together.  The preload of the modules is still occurring,
> but I was hoping to excise that workaround entirely.

Or you can compile these modules into kernel, but I am not sure about
the dependencies. :-D

Thanks.


Re: [PATCH net] Panic when tc_lookup_action_n finds a partially initialized action.

2016-10-11 Thread Krister Johansen
On Wed, Oct 05, 2016 at 11:01:38AM -0700, Cong Wang wrote:
> Does the attached patch make any sense now? Our pernet init doesn't
> rely on act_base, so even we have some race, the worst case is after
> we initialize the pernet netns for an action but its ops still not
> visible, which seems fine (at least no crash).

I tried to reproduce the panic with this latest patch, but I am unable
to do so.  The one difference I notice between this patch, and the one I
sent to the list, is that with yours it takes much longer before we get
any output from the simultaneous launch of these containers.  Presumably
that's the extra latency added by allowing many extra modprobe calls to
get spawned by request_module().

-K


Re: [PATCH net] Panic when tc_lookup_action_n finds a partially initialized action.

2016-10-08 Thread Krister Johansen
Hi Cong,
Thanks for the follow-up.

On Thu, Oct 06, 2016 at 12:01:15PM -0700, Cong Wang wrote:
> On Wed, Oct 5, 2016 at 11:11 PM, Krister Johansen
> > pernet_operations pointer.  The code in register_pernet_subsys() makes
> > no attempt to check for duplicates.  If we add a pointer that's already
> > in the list, and subsequently call unregister, the results seem
> > undefined.  It looks like we'll remove the pernet_operations for the
> > existing action, assuming we don't corrupt the list in the process.
> >
> > Is this actually safe?  If so, what corner case is the act->type /
> > act->kind protecting us from?
> 
> ops->type and ops->kind should be unique too, user-space already
> relies on this (tc action ls action xxx). The code exists probably just
> for sanity check.

With that in mind, would it make sense to change the check to a WARN/BUG
or some kind of assertion?  I mistakenly inferred that it was possible to
legtimately end up in this scenario.

> So please give that patch a try, let's see if we miss any other problem.

Will do.  I have not forgotten.  I hope to have results for you in a few
days.

> > Part of the desire to inhibit extra modprobe calls is that if hundreds
> > of these all start at once on boot, it's really unnecessary to have all
> > of the rest of them wait while lots of extra modprobe calls are forked
> > by the kernel.
> 
> You can tell systemd to load these modules before starting these
> containers to avoid blocking, no?

That was exactly what I did to work around the panic until I was able to
get a patch together.  The preload of the modules is still occurring,
but I was hoping to excise that workaround entirely.

Thanks,

-K



Re: [PATCH net] Panic when tc_lookup_action_n finds a partially initialized action.

2016-10-06 Thread Cong Wang
On Wed, Oct 5, 2016 at 11:11 PM, Krister Johansen
 wrote:
>
> I'm not sure.  The reason I didn't take this approach from the outset is
> that all of TC's callers of tcf_register_action pass a pointer to a
> static structure as their *ops argument.  The existence of code that
> checks the action for uniqueness suggests that it's possible for
> tcf_register_action to get passed two identical tc_action_ops.  If that
> happens in the current code base, we'll also get passed a duplicate

Each tc action module has its own unique ops, and kernel doesn't allow
one module to register twice (either in parallel or not, see
add_unformed_module()), so we should not have a duplicated case.


> pernet_operations pointer.  The code in register_pernet_subsys() makes
> no attempt to check for duplicates.  If we add a pointer that's already
> in the list, and subsequently call unregister, the results seem
> undefined.  It looks like we'll remove the pernet_operations for the
> existing action, assuming we don't corrupt the list in the process.
>
> Is this actually safe?  If so, what corner case is the act->type /
> act->kind protecting us from?

ops->type and ops->kind should be unique too, user-space already
relies on this (tc action ls action xxx). The code exists probably just
for sanity check.

So please give that patch a try, let's see if we miss any other problem.

>
>> (Sorry that I don't have the environment to reproduce your bug)
>
> I'm sorry that I didn't do a good job of explaining how we end up in
> this situation in the first place.  I can give a few more details,
> because it may explain some of my concern about the request_module()
> call.
>
> The system that encounters this bug launches a bunch of containers from
> systemd on boot.  Each container creates a new user, net, pid, and mount
> namespace and begins its setup.  When the networking in all of these
> containers, each in a new netns, try to configure TC and no modules are
> loaded we end up with this race.
>
> I can also reproduce by unloading the modules, and then launching a
> bunch of processes that configure tc in new namespaces.
>
> Part of the desire to inhibit extra modprobe calls is that if hundreds
> of these all start at once on boot, it's really unnecessary to have all
> of the rest of them wait while lots of extra modprobe calls are forked
> by the kernel.

You can tell systemd to load these modules before starting these
containers to avoid blocking, no?

Thanks.


Re: [PATCH net] Panic when tc_lookup_action_n finds a partially initialized action.

2016-10-05 Thread Krister Johansen
On Wed, Oct 05, 2016 at 11:01:38AM -0700, Cong Wang wrote:
> On Tue, Oct 4, 2016 at 11:52 PM, Krister Johansen
>  wrote:
> > On Mon, Oct 03, 2016 at 11:22:33AM -0700, Cong Wang wrote:
> >> Please try the attached patch. I also convert the read path to RCU
> >> to avoid a possible deadlock. A quick test shows no lockdep splat.
> >
> > I tried this patch, but it doesn't solve the problem.  I got a panic on
> > my very first try:
> 
> Thanks for testing it.

Absolutely; thanks for helping to try to simplify this fix.

> > The problem here is the same as before: by using RCU the race isn't
> > fixed because the module is still discoverable from act_base before the
> > pernet initialization is completed.
> >
> > You can see from the trap frame that the first two arguments to
> > tcf_hash_check were 0.  It couldn't look up the correct per-subsystem
> > pointer because the id hadn't yet been registered.
> 
> I thought the problem is that we don't do pernet ops registration and
> action ops registration atomically therefore chose to use mutex+RCU,
> but I was wrong, the problem here is just ordering, we need to finish
> the pernet initialization before making action ops visible.
> 
> If so, why not just reorder them? Does the attached patch make any
> sense now? Our pernet init doesn't rely on act_base, so even we have
> some race, the worst case is after we initialize the pernet netns for an
> action but its ops still not visible, which seems fine (at least no crash).
> 
> Or I still miss something here?

I'm not sure.  The reason I didn't take this approach from the outset is
that all of TC's callers of tcf_register_action pass a pointer to a
static structure as their *ops argument.  The existence of code that
checks the action for uniqueness suggests that it's possible for
tcf_register_action to get passed two identical tc_action_ops.  If that
happens in the current code base, we'll also get passed a duplicate
pernet_operations pointer.  The code in register_pernet_subsys() makes
no attempt to check for duplicates.  If we add a pointer that's already
in the list, and subsequently call unregister, the results seem
undefined.  It looks like we'll remove the pernet_operations for the
existing action, assuming we don't corrupt the list in the process.

Is this actually safe?  If so, what corner case is the act->type /
act->kind protecting us from?

> (Sorry that I don't have the environment to reproduce your bug)

I'm sorry that I didn't do a good job of explaining how we end up in
this situation in the first place.  I can give a few more details,
because it may explain some of my concern about the request_module()
call.

The system that encounters this bug launches a bunch of containers from
systemd on boot.  Each container creates a new user, net, pid, and mount
namespace and begins its setup.  When the networking in all of these
containers, each in a new netns, try to configure TC and no modules are
loaded we end up with this race.

I can also reproduce by unloading the modules, and then launching a
bunch of processes that configure tc in new namespaces.

Part of the desire to inhibit extra modprobe calls is that if hundreds
of these all start at once on boot, it's really unnecessary to have all
of the rest of them wait while lots of extra modprobe calls are forked
by the kernel.

> Thanks for your patience and testing!

Thank you for taking the time to look through the fix and discuss
alternatives.

-K


Re: [PATCH net] Panic when tc_lookup_action_n finds a partially initialized action.

2016-10-05 Thread Cong Wang
On Wed, Oct 5, 2016 at 11:01 AM, Cong Wang  wrote:
> On Tue, Oct 4, 2016 at 11:52 PM, Krister Johansen
>  wrote:
>> On Mon, Oct 03, 2016 at 11:22:33AM -0700, Cong Wang wrote:
>>> Please try the attached patch. I also convert the read path to RCU
>>> to avoid a possible deadlock. A quick test shows no lockdep splat.
>>
>> I tried this patch, but it doesn't solve the problem.  I got a panic on
>> my very first try:
>
> Thanks for testing it.
>
>
>> The problem here is the same as before: by using RCU the race isn't
>> fixed because the module is still discoverable from act_base before the
>> pernet initialization is completed.
>>
>> You can see from the trap frame that the first two arguments to
>> tcf_hash_check were 0.  It couldn't look up the correct per-subsystem
>> pointer because the id hadn't yet been registered.
>
> I thought the problem is that we don't do pernet ops registration and
> action ops registration atomically therefore chose to use mutex+RCU,
> but I was wrong, the problem here is just ordering, we need to finish
> the pernet initialization before making action ops visible.
>
> If so, why not just reorder them? Does the attached patch make any
> sense now? Our pernet init doesn't rely on act_base, so even we have
> some race, the worst case is after we initialize the pernet netns for an
> action but its ops still not visible, which seems fine (at least no crash).

BTW, I should remove the 'if' check for unregister_pernet_subsys() in
tcf_unregister_action()... Otherwise the error path doesn't work. ;-)


Re: [PATCH net] Panic when tc_lookup_action_n finds a partially initialized action.

2016-10-05 Thread Cong Wang
On Tue, Oct 4, 2016 at 11:52 PM, Krister Johansen
 wrote:
> On Mon, Oct 03, 2016 at 11:22:33AM -0700, Cong Wang wrote:
>> Please try the attached patch. I also convert the read path to RCU
>> to avoid a possible deadlock. A quick test shows no lockdep splat.
>
> I tried this patch, but it doesn't solve the problem.  I got a panic on
> my very first try:

Thanks for testing it.


> The problem here is the same as before: by using RCU the race isn't
> fixed because the module is still discoverable from act_base before the
> pernet initialization is completed.
>
> You can see from the trap frame that the first two arguments to
> tcf_hash_check were 0.  It couldn't look up the correct per-subsystem
> pointer because the id hadn't yet been registered.

I thought the problem is that we don't do pernet ops registration and
action ops registration atomically therefore chose to use mutex+RCU,
but I was wrong, the problem here is just ordering, we need to finish
the pernet initialization before making action ops visible.

If so, why not just reorder them? Does the attached patch make any
sense now? Our pernet init doesn't rely on act_base, so even we have
some race, the worst case is after we initialize the pernet netns for an
action but its ops still not visible, which seems fine (at least no crash).

Or I still miss something here?

(Sorry that I don't have the environment to reproduce your bug)

Thanks for your patience and testing!
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index d09d068..6024920 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -341,22 +341,21 @@ int tcf_register_action(struct tc_action_ops *act,
if (!act->act || !act->dump || !act->init || !act->walk || !act->lookup)
return -EINVAL;
 
+   ret = register_pernet_subsys(ops);
+   if (ret)
+   return ret;
+
write_lock(&act_mod_lock);
list_for_each_entry(a, &act_base, head) {
if (act->type == a->type || (strcmp(act->kind, a->kind) == 0)) {
write_unlock(&act_mod_lock);
+   unregister_pernet_subsys(ops);
return -EEXIST;
}
}
list_add_tail(&act->head, &act_base);
write_unlock(&act_mod_lock);
 
-   ret = register_pernet_subsys(ops);
-   if (ret) {
-   tcf_unregister_action(act, ops);
-   return ret;
-   }
-
return 0;
 }
 EXPORT_SYMBOL(tcf_register_action);
@@ -367,8 +366,6 @@ int tcf_unregister_action(struct tc_action_ops *act,
struct tc_action_ops *a;
int err = -ENOENT;
 
-   unregister_pernet_subsys(ops);
-
write_lock(&act_mod_lock);
list_for_each_entry(a, &act_base, head) {
if (a == act) {
@@ -378,6 +375,8 @@ int tcf_unregister_action(struct tc_action_ops *act,
}
}
write_unlock(&act_mod_lock);
+   if (!err)
+   unregister_pernet_subsys(ops);
return err;
 }
 EXPORT_SYMBOL(tcf_unregister_action);


Re: [PATCH net] Panic when tc_lookup_action_n finds a partially initialized action.

2016-10-04 Thread Krister Johansen
On Mon, Oct 03, 2016 at 11:22:33AM -0700, Cong Wang wrote:
> Please try the attached patch. I also convert the read path to RCU
> to avoid a possible deadlock. A quick test shows no lockdep splat.

I tried this patch, but it doesn't solve the problem.  I got a panic on
my very first try:

  SYSTEM MAP: /boot/System.map-4.7.5-1.el7.x86_64
DEBUG KERNEL: /usr/lib/debug/lib/modules/4.7.5-1.el7.x86_64/vmlinux 
(4.7.5-1.el7.x86_64)
DUMPFILE: ./vmcore  [PARTIAL DUMP]
CPUS: 4
DATE: Tue Oct  4 22:43:45 2016
  UPTIME: 00:02:32
LOAD AVERAGE: 0.20, 0.14, 0.06
   TASKS: 1637
 RELEASE: 4.7.5-1.el7.x86_64
 VERSION: #1 SMP Tue Oct 4 21:58:08 PDT 2016
 MACHINE: x86_64  (3392 Mhz)
  MEMORY: 4 GB
   PANIC: "BUG: unable to handle kernel NULL pointer dereference at 
  (null)"
 PID: 20151
 COMMAND: "test"
TASK: 880138f89480  [THREAD_INFO: 88009f45]
 CPU: 2
   STATE: TASK_RUNNING (PANIC)

PID: 20151  TASK: 880138f89480  CPU: 2   COMMAND: "test"
 #0 [88009f453368] machine_kexec at 8105c2db
 #1 [88009f4533c8] __crash_kexec at 8ca2
 #2 [88009f453498] crash_kexec at 8d9c
 #3 [88009f4534b8] oops_end at 810309d1
 #4 [88009f4534e0] no_context at 8106a5de
 #5 [88009f453540] __bad_area_nosemaphore at 8106a92e
 #6 [88009f453590] bad_area at 81081b45
 #7 [88009f4535b8] __do_page_fault at 8106b42d
 #8 [88009f453620] trace_do_page_fault at 8106b5a3
 #9 [88009f453658] do_async_page_fault at 81063dda
#10 [88009f453670] async_page_fault at 81735af8
[exception RIP: tcf_hash_check+18]
RIP: 81649772  RSP: 88009f453720  RFLAGS: 00010246
RAX:   RBX: 0001  RCX: 0001
RDX: 8801297ee440  RSI:   RDI: 
RBP: 88009f453738   R8: a04020f0   R9: 88009f453770
R10: 8164a6a1  R11: 880129621f00  R12: 8801297ee440
R13: 8800ba0f9940  R14: 880093cd3e6c  R15: 
ORIG_RAX:   CS: 0010  SS: 0018
#11 [88009f453740] tcf_mirred_init at a04011cc [act_mirred]
#12 [88009f4537c8] tcf_action_init_1 at 8164a7c9
#13 [88009f453868] tcf_action_init at 8164a881
#14 [88009f4539c0] tcf_exts_validate at 8164868a
#15 [88009f4539e8] u32_set_parms at a04153f2 [cls_u32]
#16 [88009f453a60] u32_change at a0416bc4 [cls_u32]
#17 [88009f453b50] tc_ctl_tfilter at 8164900d
#18 [88009f453c48] rtnetlink_rcv_msg at 8162b8f4
#19 [88009f453cc0] netlink_rcv_skb at 81650b77
#20 [88009f453ce8] rtnetlink_rcv at 8162b848
#21 [88009f453d00] netlink_unicast at 81650531
#22 [88009f453d48] netlink_sendmsg at 8165091e
#23 [88009f453dc8] sock_sendmsg at 815fcae8
#24 [88009f453de8] SYSC_sendto at 815fcfa2
#25 [88009f453f18] sys_sendto at 815fdb4e
#26 [88009f453f28] do_syscall_64 at 81003b12
RIP: 004cbfba  RSP: 00c8200a4eb8  RFLAGS: 0212
RAX: ffda  RBX: 00c82180  RCX: 004cbfba
RDX: 008c  RSI: 00c82007a900  RDI: 0007
RBP:    R8: 00c8200fd844   R9: 000c
R10:   R11: 0212  R12: 00c82007a900
R13: 0003  R14: 0012492492492492  R15: 0039
ORIG_RAX: 002c  CS: 0033  SS: 002b

The problem here is the same as before: by using RCU the race isn't
fixed because the module is still discoverable from act_base before the
pernet initialization is completed.

You can see from the trap frame that the first two arguments to
tcf_hash_check were 0.  It couldn't look up the correct per-subsystem
pointer because the id hadn't yet been registered.

> diff --git a/net/sched/act_api.c b/net/sched/act_api.c
> index d09d068..4aac846 100644
> --- a/net/sched/act_api.c
> +++ b/net/sched/act_api.c
>  int tcf_register_action(struct tc_action_ops *act,
>   struct pernet_operations *ops)
> @@ -341,22 +356,23 @@ int tcf_register_action(struct tc_action_ops *act,
>   if (!act->act || !act->dump || !act->init || !act->walk || !act->lookup)
>   return -EINVAL;
>  
> - write_lock(&act_mod_lock);
> + mutex_lock(&act_mod_lock);
>   list_for_each_entry(a, &act_base, head) {
>   if (act->type == a->type || (strcmp(act->kind, a->kind) == 0)) {
> - write_unlock(&act_mod_lock);
> + mutex_unlock(&act_mod_lock);
>   return -EEXIST;
>   }
>   }
> - list_add_tail(&act->head, &act_base);
> - write_unlock(&act_mod_lock);
> + list_add_tail_rcu(&act->head, &act_base);
>  
>   ret = register_pernet_subsys(ops);
> 

Re: [PATCH net] Panic when tc_lookup_action_n finds a partially initialized action.

2016-10-03 Thread Krister Johansen
Hi Cong,

Thanks for the feedback.

On Mon, Oct 03, 2016 at 11:22:33AM -0700, Cong Wang wrote:
> On Sat, Oct 1, 2016 at 8:13 PM, Krister Johansen
>  wrote:
> > A tc_action_ops structure is visibile as soon as it is placed in the
> > act_base list.  When tcf_regsiter_action adds an item to this list and
> > drops act_mod_lock, registration is not complete until
> > register_pernet_subsys() finishes.
> 
> Hmm, good catch, but does the fix have to be so complicated?

There were two reasons that the patch I submitted was more complicated
than your proposal.  The first is simply my own lack of knowledge.  I
didn't see many other net subsystems that held locks across the call to
register_pernet_subsys().  I avoided doing so out of caution / paranoia.

The other reason for blocking if a register_pernet_subsys() was already
pending is the behavior of this code when the lookup fails.  The code in
tcf_action_init_1() calls request_module() when tc_lookup_action_n()
fails.  In the cases that I observed, this could lead to hundreds
modprobe processes running for essentially the same few modules.  Only
one of these calls will succeed.

Since the call to request_module() will sleep until the modprobe process
exits, it didn't seem unreasonable to block other threads in the same
code path.  Instead of blocking on a redundant modprobe call, it blocks
pending the completion of a modprobe that's already in progress.

I admit that the patch I submitted didn't close this window entirely,
but in the tests that I ran I was able to see the number of concurrent
modprobe processes go from dozens down to just a few.

> How about moving register_pernet_subsys() under act_mod_lock?
> Similar is needed for unregister too of course. This also means
> we need to convert act_mod_lock to a mutex which allows blocking.
> Fortunately, we don't have to take act_mod_lock in any atomic context.

If it's permissible to hold act_mod_lock across the call to
register_pernet_subsys, then perhaps this could instead be simplified to
use mutex_lock_interruptible() instead of RCU locking.  The blocking
lock would prevent other operations from triggering a modprobe until
the outstanding load completes.  However, the downside is that any
request_module() would block all other lookups.  My attempt to get
around that problem was to record on the action ops whether a pernet
operation was in progress.

> Please try the attached patch. I also convert the read path to RCU
> to avoid a possible deadlock. A quick test shows no lockdep splat.

I'll give it a try, but it may take me a few days to report back with
results.  In the meantime, let's try to reach consensus on an acceptable
solution.

Thanks again,

-K


Re: [PATCH net] Panic when tc_lookup_action_n finds a partially initialized action.

2016-10-03 Thread Krister Johansen
On Sun, Oct 02, 2016 at 09:18:06PM -0400, Jamal Hadi Salim wrote:
> On 16-10-01 11:13 PM, Krister Johansen wrote:
> >A tc_action_ops structure is visibile as soon as it is placed in the
> >act_base list.  When tcf_regsiter_action adds an item to this list and
> >drops act_mod_lock, registration is not complete until
> >register_pernet_subsys() finishes.
> >
> >If two threads attempt to modify a tc action in a way that triggers a
> >module load, the thread that wins the race ends up defeferencing a NULL
> >pointer after tcf_action_init_1() invokes a_o->init().  In the
> >particular case that this submitter encountered, the panic occurred in
> >tcf_gact_init() when net_generic() returned a NULL tc_action_net
> >pointer.  The gact_net_id needed to fetch the correct pointer was not
> >yet set, because the register_pernet_subsys() call was pending in
> >another thread.
> >
> >Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc 
> >actions")
> >Signed-off-by: Krister Johansen 
> 
> Looks reasonable to me but will let Cong a closer look since he added
> that code.

Thanks, I appreicate you taking a look.  I'll follow up with Cong.

-K


Re: [PATCH net] Panic when tc_lookup_action_n finds a partially initialized action.

2016-10-03 Thread Cong Wang
On Sat, Oct 1, 2016 at 8:13 PM, Krister Johansen
 wrote:
> A tc_action_ops structure is visibile as soon as it is placed in the
> act_base list.  When tcf_regsiter_action adds an item to this list and
> drops act_mod_lock, registration is not complete until
> register_pernet_subsys() finishes.

Hmm, good catch, but does the fix have to be so complicated?

How about moving register_pernet_subsys() under act_mod_lock?
Similar is needed for unregister too of course. This also means
we need to convert act_mod_lock to a mutex which allows blocking.
Fortunately, we don't have to take act_mod_lock in any atomic context.

Please try the attached patch. I also convert the read path to RCU
to avoid a possible deadlock. A quick test shows no lockdep splat.

Thanks!
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index d09d068..4aac846 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -330,7 +330,22 @@ void tcf_hashinfo_destroy(const struct tc_action_ops *ops,
 EXPORT_SYMBOL(tcf_hashinfo_destroy);
 
 static LIST_HEAD(act_base);
-static DEFINE_RWLOCK(act_mod_lock);
+static DEFINE_MUTEX(act_mod_lock);
+
+static int __tcf_unregister_action(struct tc_action_ops *act)
+{
+   struct tc_action_ops *a;
+   int err = -ENOENT;
+
+   list_for_each_entry(a, &act_base, head) {
+   if (a == act) {
+   list_del_rcu(&act->head);
+   err = 0;
+   break;
+   }
+   }
+   return err;
+}
 
 int tcf_register_action(struct tc_action_ops *act,
struct pernet_operations *ops)
@@ -341,22 +356,23 @@ int tcf_register_action(struct tc_action_ops *act,
if (!act->act || !act->dump || !act->init || !act->walk || !act->lookup)
return -EINVAL;
 
-   write_lock(&act_mod_lock);
+   mutex_lock(&act_mod_lock);
list_for_each_entry(a, &act_base, head) {
if (act->type == a->type || (strcmp(act->kind, a->kind) == 0)) {
-   write_unlock(&act_mod_lock);
+   mutex_unlock(&act_mod_lock);
return -EEXIST;
}
}
-   list_add_tail(&act->head, &act_base);
-   write_unlock(&act_mod_lock);
+   list_add_tail_rcu(&act->head, &act_base);
 
ret = register_pernet_subsys(ops);
if (ret) {
-   tcf_unregister_action(act, ops);
+   __tcf_unregister_action(act);
+   mutex_unlock(&act_mod_lock);
return ret;
}
 
+   mutex_unlock(&act_mod_lock);
return 0;
 }
 EXPORT_SYMBOL(tcf_register_action);
@@ -364,20 +380,13 @@ EXPORT_SYMBOL(tcf_register_action);
 int tcf_unregister_action(struct tc_action_ops *act,
  struct pernet_operations *ops)
 {
-   struct tc_action_ops *a;
int err = -ENOENT;
 
+   mutex_lock(&act_mod_lock);
unregister_pernet_subsys(ops);
+   err = __tcf_unregister_action(act);
+   mutex_unlock(&act_mod_lock);
 
-   write_lock(&act_mod_lock);
-   list_for_each_entry(a, &act_base, head) {
-   if (a == act) {
-   list_del(&act->head);
-   err = 0;
-   break;
-   }
-   }
-   write_unlock(&act_mod_lock);
return err;
 }
 EXPORT_SYMBOL(tcf_unregister_action);
@@ -388,15 +397,15 @@ static struct tc_action_ops *tc_lookup_action_n(char 
*kind)
struct tc_action_ops *a, *res = NULL;
 
if (kind) {
-   read_lock(&act_mod_lock);
-   list_for_each_entry(a, &act_base, head) {
+   rcu_read_lock();
+   list_for_each_entry_rcu(a, &act_base, head) {
if (strcmp(kind, a->kind) == 0) {
if (try_module_get(a->owner))
res = a;
break;
}
}
-   read_unlock(&act_mod_lock);
+   rcu_read_unlock();
}
return res;
 }
@@ -407,15 +416,15 @@ static struct tc_action_ops *tc_lookup_action(struct 
nlattr *kind)
struct tc_action_ops *a, *res = NULL;
 
if (kind) {
-   read_lock(&act_mod_lock);
-   list_for_each_entry(a, &act_base, head) {
+   rcu_read_lock();
+   list_for_each_entry_rcu(a, &act_base, head) {
if (nla_strcmp(kind, a->kind) == 0) {
if (try_module_get(a->owner))
res = a;
break;
}
}
-   read_unlock(&act_mod_lock);
+   rcu_read_unlock();
}
return res;
 }


Re: [PATCH net] Panic when tc_lookup_action_n finds a partially initialized action.

2016-10-02 Thread Jamal Hadi Salim

On 16-10-01 11:13 PM, Krister Johansen wrote:

A tc_action_ops structure is visibile as soon as it is placed in the
act_base list.  When tcf_regsiter_action adds an item to this list and
drops act_mod_lock, registration is not complete until
register_pernet_subsys() finishes.

If two threads attempt to modify a tc action in a way that triggers a
module load, the thread that wins the race ends up defeferencing a NULL
pointer after tcf_action_init_1() invokes a_o->init().  In the
particular case that this submitter encountered, the panic occurred in
tcf_gact_init() when net_generic() returned a NULL tc_action_net
pointer.  The gact_net_id needed to fetch the correct pointer was not
yet set, because the register_pernet_subsys() call was pending in
another thread.

Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
Signed-off-by: Krister Johansen 


Looks reasonable to me but will let Cong a closer look since he added
that code.

cheers,
jamal