Re: [patch net-next 11/17] ipv6: fib: Allow non-FIB users to take reference on route

2017-07-19 Thread David Ahern
On 7/19/17 10:17 AM, Ido Schimmel wrote:
> I did exactly that in the beginning, but it didn't sit right with me for
> the exact reason you mentioned - it can be a PITA to debug.
> 
> If we use rt6i_ref for something other than FIB references, then it
> breaks existing code that relies on rt6i_ref being 0 to indicate it's
> no longer used by the FIB. A non-zero value can now mean "not used by
> the FIB, but waiting for some module to drop the reference in its
> workqueue".
> 
> The BUG_ON() mentioned in the commit message is just one example.
> Another check was added by you in commit 8048ced9b.
> 
> So I think we both want the same thing, but I'm not sure how your
> approach is safer.

A single reference counter rt6i_ref is best.

There are 2 reads of that counter to determine if the rt is still in the
FIB. Both of those stem from side effects of using the 'lo' for the
device for host addresses. I think an explicit flag can be used for that
purpose instead of trying to deduce it from the reference counter. The
commit you referenced copied what is done in init_loopback for
consistency (both have same end goal).


Re: [patch net-next 11/17] ipv6: fib: Allow non-FIB users to take reference on route

2017-07-19 Thread Ido Schimmel
On Wed, Jul 19, 2017 at 09:49:37AM -0600, David Ahern wrote:
> On 7/19/17 1:02 AM, Jiri Pirko wrote:
> > From: Ido Schimmel 
> > 
> > Listeners of the FIB notification chain are expected to be able to take
> > and release a reference on notified IPv6 routes. This is needed in the
> > case of drivers capable of offloading these routes to a capable device.
> > 
> > Since notifications are sent in an atomic context, these drivers need to
> > take a reference on the route, prepare a work item to offload the route
> > and release the reference at the end of the work.
> > 
> > Currently, rt6i_ref is used to indicate in how many FIB nodes a route
> > appears. Different code paths rely on rt6i_ref being 0 to indicate the
> > route is no longer used by the FIB.
> > 
> > For example, whenever a route is deleted or replaced, fib6_purge_rt() is
> > run to make sure the route is no longer present in intermediate nodes. A
> > BUG_ON() at the end of the function is executed in case the reference
> > count isn't 1, as it's only supposed to appear in the non-intermediate
> > node from which it's going to be deleted.
> > 
> > Instead of changing the semantics of rt6i_ref, a new reference count is
> > added, so that external users could also take a reference on routes
> > without modifying rt6i_ref.
> > 
> > To make sure external users don't release routes used by the FIB, the
> > reference count is set to 1 upon creation of a route and decremented by
> > the FIB upon rt6_release().
> > 
> > The reference count is atomic, as it's not protected by any locks and
> > placed in the 40 bytes hole after the existing rt6i_ref.
> 
> I'd rather not add another reference counter. Debugging reference leaks
> is a huge PITA now; adding another counter just makes it worse.
> 
> Why can't the BUG_ON in fib6_purge_rt be removed since there are other
> reference holders now?

I did exactly that in the beginning, but it didn't sit right with me for
the exact reason you mentioned - it can be a PITA to debug.

If we use rt6i_ref for something other than FIB references, then it
breaks existing code that relies on rt6i_ref being 0 to indicate it's
no longer used by the FIB. A non-zero value can now mean "not used by
the FIB, but waiting for some module to drop the reference in its
workqueue".

The BUG_ON() mentioned in the commit message is just one example.
Another check was added by you in commit 8048ced9b.

So I think we both want the same thing, but I'm not sure how your
approach is safer.

Thanks


Re: [patch net-next 11/17] ipv6: fib: Allow non-FIB users to take reference on route

2017-07-19 Thread David Ahern
On 7/19/17 1:02 AM, Jiri Pirko wrote:
> From: Ido Schimmel 
> 
> Listeners of the FIB notification chain are expected to be able to take
> and release a reference on notified IPv6 routes. This is needed in the
> case of drivers capable of offloading these routes to a capable device.
> 
> Since notifications are sent in an atomic context, these drivers need to
> take a reference on the route, prepare a work item to offload the route
> and release the reference at the end of the work.
> 
> Currently, rt6i_ref is used to indicate in how many FIB nodes a route
> appears. Different code paths rely on rt6i_ref being 0 to indicate the
> route is no longer used by the FIB.
> 
> For example, whenever a route is deleted or replaced, fib6_purge_rt() is
> run to make sure the route is no longer present in intermediate nodes. A
> BUG_ON() at the end of the function is executed in case the reference
> count isn't 1, as it's only supposed to appear in the non-intermediate
> node from which it's going to be deleted.
> 
> Instead of changing the semantics of rt6i_ref, a new reference count is
> added, so that external users could also take a reference on routes
> without modifying rt6i_ref.
> 
> To make sure external users don't release routes used by the FIB, the
> reference count is set to 1 upon creation of a route and decremented by
> the FIB upon rt6_release().
> 
> The reference count is atomic, as it's not protected by any locks and
> placed in the 40 bytes hole after the existing rt6i_ref.

I'd rather not add another reference counter. Debugging reference leaks
is a huge PITA now; adding another counter just makes it worse.

Why can't the BUG_ON in fib6_purge_rt be removed since there are other
reference holders now?