Re:Re: Re: Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS

2018-05-13 Thread Gao Feng
At 2018-05-11 22:56:04, "Willem de Bruijn" <willemdebruijn.ker...@gmail.com> 
wrote:
>On Fri, May 11, 2018 at 10:44 AM, Gao Feng <gfree.w...@vip.163.com> wrote:
>> At 2018-05-11 21:23:55, "Willem de Bruijn" <willemdebruijn.ker...@gmail.com> 
>> wrote:
>>>On Fri, May 11, 2018 at 2:20 AM, Gao Feng <gfree.w...@vip.163.com> wrote:
>>>> At 2018-05-11 11:54:55, "Willem de Bruijn" 
>>>> <willemdebruijn.ker...@gmail.com> wrote:
>>>>>On Thu, May 10, 2018 at 4:28 AM,  <gfree.w...@vip.163.com> wrote:
>>>>>> From: Gao Feng <gfree.w...@vip.163.com>
>>>>>>
>>>>>> The skb flow limit is implemented for each CPU independently. In the
>>>>>> current codes, the function skb_flow_limit gets the softnet_data by
>>>>>> this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
>>>>>> the current cpu when enable RPS. As the result, the skb_flow_limit checks
>>>>>> the stats of current CPU, while the skb is going to append the queue of
>>>>>> another CPU. It isn't the expected behavior.
>>>>>>
>>>>>> Now pass the softnet_data as a param to softnet_data to make consistent.
>>>>>
>>>>>The local cpu softnet_data is used on purpose. The operations in
>>>>>skb_flow_limit() on sd fields could race if not executed on the local cpu.
>>>>
>>>> I think the race doesn't exist because of the rps_lock.
>>>> The enqueue_to_backlog has hold the rps_lock before skb_flow_limit.
>>>
>>>Indeed, I overlooked that. There still is the matter of cache contention.
>>
>> The cache contention is really important in this case?
>> I don't think so, because the enqueue_to_backlog have touched and modified 
>> the softnet_stat
>> of target cpu.
>>
>>>
>>>>>Flow limit tries to detect large ("elephant") DoS flows with a fixed 
>>>>>four-tuple.
>>>>>These would always hit the same RPS cpu, so that cpu being backlogged
>>>>
>>>> They may hit the different target CPU when enable RFS. Because the app 
>>>> could be scheduled
>>>> to another CPU, then RFS tries to deliver the skb to latest core which has 
>>>> hot cache.
>>>
>>>This even more suggest using the initial (or IRQ) cpu to track state, instead
>>>of the destination (RPS/RFS) cpu.
>>
>> I couldn't understand why it is better to track state on initial cpu, not 
>> the target cpu.
>> The latter one could get more accurate result.
>
>For a single DoS flow with normal cpu pinned IRQs, the results will be equally
>good when tracked on the initial IRQ cpu..
>
>>
>>>
>>>>>may be an indication that such a flow is active. But the flow will also 
>>>>>always
>>>>>arrive on the same initial cpu courtesy of RSS. So storing the lookup table
>>>>
>>>> The RSS couldn't make sure the irq is handled by same cpu. It would be 
>>>> balanced between
>>>> the cpus.
>>>
>>>IRQs are usually pinned to cores. Unless using something like irqbalance,
>>>but that operates at too coarse a timescale to do anything useful at Mpps
>>>packet rates.
>>
>> There are some motherboard which couldn't make sure the irq is pinned.
>> The flow_limit wouldn't work as well as expected.
>
>.. this seems to be the crux of the argument. I am not aware of any network
>interrupts that do not adhere to the cpu pinning configuration in
>
>   /proc/irq/$IRQ/smp_affinity(_list)
>

When smp_affinity is configured 0xff, I met some hardwares they deliver most of
irqs to a specific cpu, and left irqs are spread to the other cpus. And it 
couldn't 
make sure the irq of one flow is delivered to a fixed cpu.

I don't know if it is caused by apic or motherboard.
And what information you need to confirm it.

>What kind of hardware ignores this setting and sprays interrupts? I agree
>that in that case flow_limit as is may be ineffective (if migration happens
>at rates comparable to packet rates). But this should not happen?
>
>>
>>>
>>>>>on the initial CPU is also fine. There may be false positives on other CPUs
>>>>>with the same RPS destination, but that is unlikely with a highly 
>>>>>concurrent
>>>>>traffic server mix ("mice").
>>>>
>>>> If my comment is right, the flow couldn't always arrive one the same 
>>>> initial cp

Re:Re: Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS

2018-05-11 Thread Gao Feng
At 2018-05-11 21:23:55, "Willem de Bruijn" <willemdebruijn.ker...@gmail.com> 
wrote:
>On Fri, May 11, 2018 at 2:20 AM, Gao Feng <gfree.w...@vip.163.com> wrote:
>> At 2018-05-11 11:54:55, "Willem de Bruijn" <willemdebruijn.ker...@gmail.com> 
>> wrote:
>>>On Thu, May 10, 2018 at 4:28 AM,  <gfree.w...@vip.163.com> wrote:
>>>> From: Gao Feng <gfree.w...@vip.163.com>
>>>>
>>>> The skb flow limit is implemented for each CPU independently. In the
>>>> current codes, the function skb_flow_limit gets the softnet_data by
>>>> this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
>>>> the current cpu when enable RPS. As the result, the skb_flow_limit checks
>>>> the stats of current CPU, while the skb is going to append the queue of
>>>> another CPU. It isn't the expected behavior.
>>>>
>>>> Now pass the softnet_data as a param to softnet_data to make consistent.
>>>
>>>The local cpu softnet_data is used on purpose. The operations in
>>>skb_flow_limit() on sd fields could race if not executed on the local cpu.
>>
>> I think the race doesn't exist because of the rps_lock.
>> The enqueue_to_backlog has hold the rps_lock before skb_flow_limit.
>
>Indeed, I overlooked that. There still is the matter of cache contention.

The cache contention is really important in this case?
I don't think so, because the enqueue_to_backlog have touched and modified the 
softnet_stat
of target cpu.

>
>>>Flow limit tries to detect large ("elephant") DoS flows with a fixed 
>>>four-tuple.
>>>These would always hit the same RPS cpu, so that cpu being backlogged
>>
>> They may hit the different target CPU when enable RFS. Because the app could 
>> be scheduled
>> to another CPU, then RFS tries to deliver the skb to latest core which has 
>> hot cache.
>
>This even more suggest using the initial (or IRQ) cpu to track state, instead
>of the destination (RPS/RFS) cpu.

I couldn't understand why it is better to track state on initial cpu, not the 
target cpu.
The latter one could get more accurate result.

>
>>>may be an indication that such a flow is active. But the flow will also 
>>>always
>>>arrive on the same initial cpu courtesy of RSS. So storing the lookup table
>>
>> The RSS couldn't make sure the irq is handled by same cpu. It would be 
>> balanced between
>> the cpus.
>
>IRQs are usually pinned to cores. Unless using something like irqbalance,
>but that operates at too coarse a timescale to do anything useful at Mpps
>packet rates.

There are some motherboard which couldn't make sure the irq is pinned.
The flow_limit wouldn't work as well as expected.

>
>>>on the initial CPU is also fine. There may be false positives on other CPUs
>>>with the same RPS destination, but that is unlikely with a highly concurrent
>>>traffic server mix ("mice").
>>
>> If my comment is right, the flow couldn't always arrive one the same initial 
>> cpu,  although
>> it may be sent to one same target cpu.
>>
>>>
>>>Note that the sysctl net.core.flow_limit_cpu_bitmap enables the feature
>>>for the cpus on which traffic initially lands, not the RPS destination cpus.
>>>See also Documentation/networking/scaling.txt
>>>
>>>That said, I had to reread the code, as it does seem sensible that the
>>>same softnet_data is intended to be used both when testing qlen and
>>>flow_limit.
>>
>> In most cases, user configures the same RPS map with flow_limit like 0xff.
>> Because user couldn't predict which core the evil flow would arrive on.
>>
>> Take an example, there are 2 cores, cpu0 and cpu1.
>> One flow is the an evil flow, but the irq is sent to cpu0. After RPS/RFS, 
>> the target cpu is cpu1.
>> Now cpu0 invokes enqueue_to_backlog, then the skb_flow_limit checkes the 
>> queue length
>> of cpu0. Certainly it could pass the check of skb_flow_limit because there 
>> is no any evil flow on cpu0.
>
>No, enqueue_to_backlog passes qlen to skb_flow_limit, so that does
>check the queue length of the RPS cpu.

Sorry, I overlooked the qlen is the length of the rps cpu.
Then it's ok unless the stats may be not accurate when irq isn't pinned.

But I still doubt that is it really important to track state on initial cpu, 
not target cpu?
Because the enqueue_to_backlog have touched the softnet_data of target cpu.

Best Regards
Feng

>
>> Then the cpu0 inserts the skb into the queue of cpu1.
>> As a result, the skb_flow_limit doesn't work as expected.
>>
>> BTW, I have already sent the v2 patch which only adds the "Fixes: tag".
>
>The change also makes the code inconsistent with
>Documentation/networking/scaling.txt
>
>"In such environments, enable the feature on all CPUs that handle
>network rx interrupts (as set in /proc/irq/N/smp_affinity)."


Re:Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS

2018-05-11 Thread Gao Feng
At 2018-05-11 11:54:55, "Willem de Bruijn" <willemdebruijn.ker...@gmail.com> 
wrote:
>On Thu, May 10, 2018 at 4:28 AM,  <gfree.w...@vip.163.com> wrote:
>> From: Gao Feng <gfree.w...@vip.163.com>
>>
>> The skb flow limit is implemented for each CPU independently. In the
>> current codes, the function skb_flow_limit gets the softnet_data by
>> this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
>> the current cpu when enable RPS. As the result, the skb_flow_limit checks
>> the stats of current CPU, while the skb is going to append the queue of
>> another CPU. It isn't the expected behavior.
>>
>> Now pass the softnet_data as a param to softnet_data to make consistent.
>
>The local cpu softnet_data is used on purpose. The operations in
>skb_flow_limit() on sd fields could race if not executed on the local cpu.

I think the race doesn't exist because of the rps_lock.
The enqueue_to_backlog has hold the rps_lock before skb_flow_limit.

>
>Flow limit tries to detect large ("elephant") DoS flows with a fixed 
>four-tuple.
>These would always hit the same RPS cpu, so that cpu being backlogged

They may hit the different target CPU when enable RFS. Because the app could be 
scheduled
to another CPU, then RFS tries to deliver the skb to latest core which has hot 
cache.

>may be an indication that such a flow is active. But the flow will also always
>arrive on the same initial cpu courtesy of RSS. So storing the lookup table

The RSS couldn't make sure the irq is handled by same cpu. It would be balanced 
between
the cpus.

>on the initial CPU is also fine. There may be false positives on other CPUs
>with the same RPS destination, but that is unlikely with a highly concurrent
>traffic server mix ("mice").

If my comment is right, the flow couldn't always arrive one the same initial 
cpu,  although
it may be sent to one same target cpu.

>
>Note that the sysctl net.core.flow_limit_cpu_bitmap enables the feature
>for the cpus on which traffic initially lands, not the RPS destination cpus.
>See also Documentation/networking/scaling.txt
>
>That said, I had to reread the code, as it does seem sensible that the
>same softnet_data is intended to be used both when testing qlen and
>flow_limit.

In most cases, user configures the same RPS map with flow_limit like 0xff.
Because user couldn't predict which core the evil flow would arrive on.

Take an example, there are 2 cores, cpu0 and cpu1. 
One flow is the an evil flow, but the irq is sent to cpu0. After RPS/RFS, the 
target cpu is cpu1.
Now cpu0 invokes enqueue_to_backlog, then the skb_flow_limit checkes the queue 
length
of cpu0. Certainly it could pass the check of skb_flow_limit because there is 
no any evil flow on cpu0.
Then the cpu0 inserts the skb into the queue of cpu1.
As a result, the skb_flow_limit doesn't work as expected.

BTW, I have already sent the v2 patch which only adds the "Fixes: tag".

Best Regards
Feng




Re:Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS

2018-05-10 Thread Gao Feng
At 
2018-05-11 08:55:47, "Eric Dumazet" eric.duma...@gmail.com wrote:


On 05/10/2018 05:18 PM, Gao Feng wrote:
 At 2018-05-10 21:02:55, "Eric Dumazet" eric.duma...@gmail.com 
wrote:


 On 05/10/2018 01:28 AM, gfree.w...@vip.163.com wrote:
 From: Gao Feng gfree.w...@vip.163.com

 The skb flow limit is implemented for each CPU independently. 
In the
 current codes, the function skb_flow_limit gets the 
softnet_data by
 this_cpu_ptr. But the target cpu of enqueue_to_backlog would 
be not
 the current cpu when enable RPS. As the result, the 
skb_flow_limit checks
 the stats of current CPU, while the skb is going to append the 
queue of
 another CPU. It isn't the expected behavior.

 Now pass the softnet_data as a param to softnet_data to make 
consistent.


 Please add a correct Fixes: tag
 
 Thanks Eric.
 
 I have one question about the "Fixes: tag".
 Most of patches are bug fixes, but when need to add the "Fixes: tag", 
and when not ?
 
 I'm not clear about it. Could you explain it please?
 

For this particular patch, since you have not CC Willem (author of the 
patch),
I found very useful that you did a search to find out.
Once you found which commit added the problem, simply add the Fixes: tag 
and CC: the author.

Doing so saves us (stable teams, reviewers, maintainers) a lot of time 
really. Normally I get the "to" list by 
get_maintainer.pl script, now I would save the stable team ASAP.
In my opinion, Fixes: tags should be mandatory when 
applicable.Thanks your explanations, I get 
it.Best RegardsFeng
 Best Regards
 Feng
 

 By doing so, you will likely add a CC: tag to make sure the author 
of the code
 will receive your email and give feed back.

 Thanks !



Re:Re: [PATCH net] net: Correct wrong skb_flow_limit check when enable RPS

2018-05-10 Thread Gao Feng
At 2018-05-10 21:02:55, "Eric Dumazet" <eric.duma...@gmail.com> wrote:
>
>
>On 05/10/2018 01:28 AM, gfree.w...@vip.163.com wrote:
>> From: Gao Feng <gfree.w...@vip.163.com>
>> 
>> The skb flow limit is implemented for each CPU independently. In the
>> current codes, the function skb_flow_limit gets the softnet_data by
>> this_cpu_ptr. But the target cpu of enqueue_to_backlog would be not
>> the current cpu when enable RPS. As the result, the skb_flow_limit checks
>> the stats of current CPU, while the skb is going to append the queue of
>> another CPU. It isn't the expected behavior.
>> 
>> Now pass the softnet_data as a param to softnet_data to make consistent.
>>
>
>Please add a correct Fixes: tag

Thanks Eric.

I have one question about the "Fixes: tag".
Most of patches are bug fixes, but when need to add the "Fixes: tag", and when 
not ?

I'm not clear about it. Could you explain it please?

Best Regards
Feng

>
>By doing so, you will likely add a CC: tag to make sure the author of the code
>will receive your email and give feed back.
>
>Thanks !
>


Re:Re: [PATCH net] net: Fix one possible memleak in ip_setup_cork

2018-04-17 Thread Gao Feng
At 2018-04-17 05:18:25, "Eric Dumazet" <eric.duma...@gmail.com> wrote:
>
>
>On 04/16/2018 09:58 AM, David Miller wrote:
>> From: gfree.w...@vip.163.com
>> Date: Mon, 16 Apr 2018 10:16:45 +0800
>> 
>>> From: Gao Feng <gfree.w...@vip.163.com>
>>>
>>> It would allocate memory in this function when the cork->opt is NULL. But
>>> the memory isn't freed if failed in the latter rt check, and return error
>>> directly. It causes the memleak if its caller is ip_make_skb which also
>>> doesn't free the cork->opt when meet a error.
>>>
>>> Now move the rt check ahead to avoid the memleak.
>>>
>>> Signed-off-by: Gao Feng <gfree.w...@vip.163.com>
>> 
>> Looks good, applied and queued up for -stable.
>> 
>> I guess in the other code paths, ip_flush_pending_frames() or similar
>> would clean up the in-sock cork information.
>> 
>
>I am not sure ip_make_skb() can be called with a NULL rt.
>
>Patch makes no harm, but does not seem to fix a bug.
>

Thanks Eric.

I just look up current all callers of ip_make_skb and ip_append_data, they check
if the rt is valid ahead. So current codes won't pass one NULL rt to 
ip_setup_cork indeed.

Then this patch is just as an enhancement, not a fix. 
As the programming rule, the function should free the mem which is allocated by 
itself when
it failed.

Best Regards
Feng


Re:Re: [PATCH net] net: Fix one possible memleak in ip_setup_cork

2018-04-15 Thread Gao Feng
At 2018-04-16 10:55:56, "David Miller" <da...@davemloft.net> wrote:
>From: gfree.w...@vip.163.com
>Date: Mon, 16 Apr 2018 10:16:45 +0800
>
>> From: Gao Feng <gfree.w...@vip.163.com>
>> 
>> It would allocate memory in this function when the cork->opt is NULL. But
>> the memory isn't freed if failed in the latter rt check, and return error
>> directly. It causes the memleak if its caller is ip_make_skb which also
>> doesn't free the cork->opt when meet a error.
>> 
>> Now move the rt check ahead to avoid the memleak.
>> 
>> Signed-off-by: Gao Feng <gfree.w...@vip.163.com>
>
>Why did you post this patch twice?

Sorry, it is my input error. I typed "yes" not "all" at the first time when 
execute git-send-email.
Then I corrected it as the second time.

Best Regards
Feng


Re:Re: [PATCH net-next ] net: Remove useless function skb_header_release

2017-09-20 Thread Gao Feng
>On Thu, 2017-09-21 at 12:39 +0800, gfree.w...@vip.163.com wrote:
>> From: Gao Feng <gfree.w...@vip.163.com>
>> 
>> There is no one which would invokes the function skb_header_release.
>> So just remove it now.
>
>This in incomplete.
>There are other references to this function.
>
>$ git grep -w skb_header_release
>drivers/net/usb/asix_common.c:  * TCP packets for example are cloned, but 
>skb_header_release()
>include/linux/skbuff.h: *  skb_header_release - release reference to header
>include/linux/skbuff.h:static inline void skb_header_release(struct sk_buff 
>*skb)
>include/linux/skbuff.h: *  Variant of skb_header_release() assuming skb is 
>private to caller.
>net/batman-adv/soft-interface.c:    * data using skb_header_release in our 
>skbs to allow skb_cow_header to

But there are comments, not real codes.

Unless the skb_header_release may be used in the external modules

Best Regards
Feng


Re:Re: [PATCH net-next] net: sched: Add the invalid handle check in qdisc_class_find

2017-08-21 Thread Gao Feng
At 2017-08-22 03:58:03, "Cong Wang" <xiyou.wangc...@gmail.com> wrote:
>On Mon, Aug 21, 2017 at 10:47 AM, David Miller <da...@davemloft.net> wrote:
>> From: gfree.w...@vip.163.com
>> Date: Fri, 18 Aug 2017 15:23:24 +0800
>>
>>> From: Gao Feng <gfree.w...@vip.163.com>
>>>
>>> Add the invalid handle "0" check to avoid unnecessary search, because
>>> the qdisc uses the skb->priority as the handle value to look up, and
>>> it is "0" usually.
>>>
>>> Signed-off-by: Gao Feng <gfree.w...@vip.163.com>
>>
>> Jamal, Cong, please review.
>>
>> If 'id' zero is never hashed into the tables, this change looks
>> legitimate.
>
>Looks good to me.
>
>Gao, in the future please Cc maintainers directly, you can
>use ./scripts/get_maintainer.pl.
>
>Thanks.

Hi Cong,

Thanks your reminder firstly.
But I had used the get_maintainer.pl actually before sent the patch.

The following is the output.
[fgao@ikuai8 net-next]#./scripts/get_maintainer.pl 
patch_ScheCheck/0001-net-sched-Add-the-invalid-handle-check-in-qdisc_clas.patch
"David S. Miller" <da...@davemloft.net> (maintainer:NETWORKING [GENERAL])
netdev@vger.kernel.org (open list:NETWORKING [GENERAL])
linux-ker...@vger.kernel.org (open list)

I don't know if it is an issue of the script "get_maintainer.pl"  or my usage 
is wrong.

Anyway, thanks you & Jamal's review.

Best Regards
Feng



Re:Re: [PATCH net-next 1/1] driver: pptp: Remove unnecessary statements in pptp_sock_destruct

2017-08-09 Thread Gao Feng

At 2017-08-10 02:08:30, "Cong Wang" <xiyou.wangc...@gmail.com> wrote:
>On Wed, Aug 9, 2017 at 8:57 AM,  <gfree.w...@vip.163.com> wrote:
>> From: Gao Feng <gfree.w...@vip.163.com>
>>
>> In the commit ddab82821fa6 ("ppp: Fix a scheduling-while-atomic bug in
>> del_chan"), I moved the synchronize_rcu() from del_chan() to pptp_release
>> after del_chan() to avoid one scheduling-while-atomic bug.
>>
>> Actually the del_chan() and pppox_unbind_sock are unneccessary in the
>> pptp_sock_destruct. Because the pptp sock refcnt wouldn't reach zero until
>> sk_state is set as PPPOX_DEAD in pptp_release. By that time, the del_chan()
>> and pppox_unbind_sock() have been invoked already and the condition check
>> "!(sk->sk_state & PPPOX_DEAD)" of this sock must be false in 
>> pptp_sock_destruct.
>
>I am not sure. The check for sock->sk in the beginning of pptp_release()
>indicates there could be a case we could skip del_chan() in pptp_release(),
>although I can't figure out how.
>
>Also there is a suspicious sock_put() in pptp_release().

Hi Cong, 

Thank you.  I also failed to find the case which causes the sock->sk is null in 
release().

There is a suspicious case in __sock_create following.
err = pf->create(net, sock, protocol, kern);
if (err < 0)
goto out_module_put;
...
out_module_put:
sock->ops = NULL;
module_put(pf->owner);
out_sock_release:
sock_release(sock);
return err;
In the beginning, I thought when create is failed and the sock->sk is null, 
then the sock_release is invoked.
It could cause the sk is null in the release().
But I find  it has already reset the sock->ops as NULL before sock_release 
later, so the release() wouldn't be invoked actually.

Best Regards
Feng


Re:Re: Re:Re: Re:Re: Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan

2017-08-09 Thread Gao Feng

At 2017-08-10 05:00:19, "Cong Wang" <xiyou.wangc...@gmail.com> wrote:
>On Wed, Aug 9, 2017 at 12:17 AM, Gao Feng <gfree.w...@vip.163.com> wrote:
>> Hi Cong,
>>
>> Actually I have one question about the SOCK_RCU_FREE.
>> I don't think it could resolve the issue you raised even though it exists 
>> really.
>>
>> I checked the SOCK_RCU_FREE, it just defer the __sk_destruct after one rcu 
>> period.
>> But when it performs, someone still could find this sock by callid during 
>> the del_chan period and it may still deference the sock which may freed soon.
>>
>> The right flow should be following.
>> del_chan()
>> wait a rcu period
>> sock_put()  It is safe that someone gets the sock because it 
>> already hold sock refcnt.
>>
>> When using SOCK_RCU_FREE, its flow would be following.
>> wait a rcu period
>> del_chan()
>> free the sock directly  no sock refcnt check again.
>> Because the del_chan happens after rcu wait, not before, so it isn't helpful 
>> with SOCK_RCU_FREE.
>
>
>Yes, good point! With SOCK_RCU_FREE the sock_hold() should
>not be needed. For RCU, unpublish should indeed happen before
>grace period.

Sorry, I couldn't understand why sock_hold() isn't necessary with SOCK_RCU_FREE.
When lookup_chan finds the sock, it would return and reference it later.
If no refcnt, how to protect the sock ?

Best Regards
Feng

>
>
>>
>> I don't know if I misunderstand the SOCK_RCU_FREE usage.
>>
>> But it is a good news that the del_chan is only invoked in pptp_release 
>> actually and it would wait a rcu period before sock_put.
>>
>
>Looking at the code again, the reader lookup_chan() is actually
>invoked in BH context, but neither add_chan() nor del_chan()
>actually disables BH...




Re:Re: Re: Re:Re: Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan

2017-08-09 Thread Gao Feng
At 2017-08-10 02:18:44, "Cong Wang" <xiyou.wangc...@gmail.com> wrote:
>On Tue, Aug 8, 2017 at 10:13 PM, Gao Feng <gfree.w...@vip.163.com> wrote:
>> Maybe I didn't show my explanation clearly.
>> I think it won't happen as I mentioned in the last email.
>> Because the pptp_release invokes the synchronize_rcu to make sure it, and 
>> actually there is no one which would invoke del_chan except pptp_release.
>> It is guaranteed by that the pptp_release doesn't put the sock refcnt until 
>> complete all cleanup include marking sk_state as PPPOX_DEAD.
>>
>> In other words, even though the pptp_release is not the last user of this 
>> sock, the other one wouldn't invoke del_chan in pptp_sock_destruct.
>> Because the condition "!(sk->sk_state & PPPOX_DEAD)" must be false.
>
>Only if sock->sk is always non-NULL for pptp_release(), which
>is what I am not sure. If you look at other ->release(), similar checks
>are there too, so not just for pptp.

Yes. It seems only if the release() is invoked twice, the sock->sk would be 
NULL.
But I don't find there is any case which could cause it.

>
>>
>> As summary, the del_chan and pppox_unbind_sock in pptp_sock_destruct are 
>> unnecessary.
>> And it even brings confusing.
>
>Sorry, I can't draw any conclusion for this.

Thank you all the same, and I have learn a lot from you :)
Wish someone which is familiar with these codes could give more details and 
explanations.

Best Regards
Feng 


Re:Re:Re: Re:Re: Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan

2017-08-09 Thread Gao Feng
At 2017-08-09 13:13:53, "Gao Feng" <gfree.w...@vip.163.com> wrote:
>At 2017-08-09 03:45:53, "Cong Wang" <xiyou.wangc...@gmail.com> wrote:
>>On Mon, Aug 7, 2017 at 6:10 PM, Gao Feng <gfree.w...@vip.163.com> wrote:
>>>
>>> Sorry, I don't get you clearly. Why the sock_hold() isn't helpful?
>>
>>I already told you, the dereference happends before sock_hold().
>>
>>sock = rcu_dereference(callid_sock[call_id]);
>>if (sock) {
>>opt = >proto.pptp;
>>if (opt->dst_addr.sin_addr.s_addr != s_addr) <=== HERE
>>sock = NULL;
>>else
>>sock_hold(sk_pppox(sock));
>>}
>>
>>If we don't wait for readers properly, sock could be freed at the
>>same time when deference it.
>
>Maybe I didn't show my explanation clearly.
>I think it won't happen as I mentioned in the last email.
>Because the pptp_release invokes the synchronize_rcu to make sure it, and 
>actually there is no one which would invoke del_chan except pptp_release.
>It is guaranteed by that the pptp_release doesn't put the sock refcnt until 
>complete all cleanup include marking sk_state as PPPOX_DEAD. 
>
>In other words, even though the pptp_release is not the last user of this 
>sock, the other one wouldn't invoke del_chan in pptp_sock_destruct.
>Because the condition "!(sk->sk_state & PPPOX_DEAD)" must be false. 
>
>As summary, the del_chan and pppox_unbind_sock in pptp_sock_destruct are 
>unnecessary. 
>And it even brings confusing.
>
>Best Regards
>Feng
>
>>
>>> The pptp_release invokes synchronize_rcu after del_chan, it could make sure 
>>> the others has increased the sock refcnt if necessary
>>> and the lookup is over.
>>> There is no one could get the sock after synchronize_rcu in pptp_release.
>>
>>
>>If this were true, then this code in pptp_sock_destruct()
>>would be unneeded:
>>
>>if (!(sk->sk_state & PPPOX_DEAD)) {
>>del_chan(pppox_sk(sk));
>>pppox_unbind_sock(sk);
>>}
>>
>>
>>>
>>>
>>> But I think about another problem.
>>> It seems the pptp_sock_destruct should not invoke del_chan and 
>>> pppox_unbind_sock.
>>> Because when the sock refcnt is 0, the pptp_release must have be invoked 
>>> already.
>>>
>>
>>
>>I don't know. Looks like sock_orphan() is only called
>>in pptp_release(), but I am not sure if there is a case
>>we call sock destructor before release.
>>
>>Also note, this socket is very special, it doesn't support
>>poll(), sendmsg() or recvmsg()..
>
>
>

Hi Cong,

Actually I have one question about the SOCK_RCU_FREE.
I don't think it could resolve the issue you raised even though it exists 
really.

I checked the SOCK_RCU_FREE, it just defer the __sk_destruct after one rcu 
period.
But when it performs, someone still could find this sock by callid during the 
del_chan period and it may still deference the sock which may freed soon.

The right flow should be following.
del_chan()
wait a rcu period
sock_put()  It is safe that someone gets the sock because it 
already hold sock refcnt.

When using SOCK_RCU_FREE, its flow would be following.
wait a rcu period
del_chan()
free the sock directly  no sock refcnt check again.
Because the del_chan happens after rcu wait, not before, so it isn't helpful 
with SOCK_RCU_FREE.

I don't know if I misunderstand the SOCK_RCU_FREE usage.

But it is a good news that the del_chan is only invoked in pptp_release 
actually and it would wait a rcu period before sock_put.

Best Regards
Feng




Re:Re: Re:Re: Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan

2017-08-08 Thread Gao Feng
At 2017-08-09 03:45:53, "Cong Wang" <xiyou.wangc...@gmail.com> wrote:
>On Mon, Aug 7, 2017 at 6:10 PM, Gao Feng <gfree.w...@vip.163.com> wrote:
>>
>> Sorry, I don't get you clearly. Why the sock_hold() isn't helpful?
>
>I already told you, the dereference happends before sock_hold().
>
>sock = rcu_dereference(callid_sock[call_id]);
>if (sock) {
>opt = >proto.pptp;
>if (opt->dst_addr.sin_addr.s_addr != s_addr) <=== HERE
>sock = NULL;
>else
>sock_hold(sk_pppox(sock));
>}
>
>If we don't wait for readers properly, sock could be freed at the
>same time when deference it.

Maybe I didn't show my explanation clearly.
I think it won't happen as I mentioned in the last email.
Because the pptp_release invokes the synchronize_rcu to make sure it, and 
actually there is no one which would invoke del_chan except pptp_release.
It is guaranteed by that the pptp_release doesn't put the sock refcnt until 
complete all cleanup include marking sk_state as PPPOX_DEAD. 

In other words, even though the pptp_release is not the last user of this sock, 
the other one wouldn't invoke del_chan in pptp_sock_destruct.
Because the condition "!(sk->sk_state & PPPOX_DEAD)" must be false. 

As summary, the del_chan and pppox_unbind_sock in pptp_sock_destruct are 
unnecessary. 
And it even brings confusing.

Best Regards
Feng

>
>> The pptp_release invokes synchronize_rcu after del_chan, it could make sure 
>> the others has increased the sock refcnt if necessary
>> and the lookup is over.
>> There is no one could get the sock after synchronize_rcu in pptp_release.
>
>
>If this were true, then this code in pptp_sock_destruct()
>would be unneeded:
>
>if (!(sk->sk_state & PPPOX_DEAD)) {
>del_chan(pppox_sk(sk));
>pppox_unbind_sock(sk);
>}
>
>
>>
>>
>> But I think about another problem.
>> It seems the pptp_sock_destruct should not invoke del_chan and 
>> pppox_unbind_sock.
>> Because when the sock refcnt is 0, the pptp_release must have be invoked 
>> already.
>>
>
>
>I don't know. Looks like sock_orphan() is only called
>in pptp_release(), but I am not sure if there is a case
>we call sock destructor before release.
>
>Also note, this socket is very special, it doesn't support
>poll(), sendmsg() or recvmsg()..





Re:Re: [PATCH net] ppp: fix xmit recursion detection on ppp channels

2017-08-08 Thread Gao Feng

At 2017-08-08 21:58:27, "Guillaume Nault" <g.na...@alphalink.fr> wrote:
>On Tue, Aug 08, 2017 at 09:16:33PM +0800, Gao Feng wrote:
>> At 2017-08-08 17:43:24, "Guillaume Nault" <g.na...@alphalink.fr> wrote:
>> >--- a/drivers/net/ppp/ppp_generic.c
>> >+++ b/drivers/net/ppp/ppp_generic.c
>> >@@ -1915,21 +1915,23 @@ static void __ppp_channel_push(struct channel *pch)
>> >spin_unlock(>downl);
>> >/* see if there is anything from the attached unit to be sent */
>> >if (skb_queue_empty(>file.xq)) {
>> >-   read_lock(>upl);
>> >ppp = pch->ppp;
>> >if (ppp)
>> >-   ppp_xmit_process(ppp);
>> >-   read_unlock(>upl);
>> >+   __ppp_xmit_process(ppp);
>> >}
>> > }
>> > 
>> > static void ppp_channel_push(struct channel *pch)
>> > {
>> >-   local_bh_disable();
>> >-
>> >-   __ppp_channel_push(pch);
>> >-
>> >-   local_bh_enable();
>> >+   read_lock_bh(>upl);
>> >+   if (pch->ppp) {
>> >+   (*this_cpu_ptr(pch->ppp->xmit_recursion))++;
>> >+   __ppp_channel_push(pch);
>> >+   (*this_cpu_ptr(pch->ppp->xmit_recursion))--;
>> >+   } else {
>> >+   __ppp_channel_push(pch);
>> >+   }
>> >+   read_unlock_bh(>upl);
>> 
>> If invoked read_lock_bh in ppp_channel_push, it would be unnecessary to 
>> invoke read_lock(>upl)
>> in the __ppp_channel_push.
>> 
>But this patch does remove read_lock(>upl) from
>__ppp_channel_push(). Or have I misunderstood your point?

Sorry, it's my fault.
I forgot your former changes when think about the updates in ppp_channel_push

Best Regards
Feng



Re:[PATCH net] ppp: fix xmit recursion detection on ppp channels

2017-08-08 Thread Gao Feng
At 2017-08-08 17:43:24, "Guillaume Nault"  wrote:
>Commit e5dadc65f9e0 ("ppp: Fix false xmit recursion detect with two ppp
>devices") dropped the xmit_recursion counter incrementation in
>ppp_channel_push() and relied on ppp_xmit_process() for this task.
>But __ppp_channel_push() can also send packets directly (using the
>.start_xmit() channel callback), in which case the xmit_recursion

You're right. We lost this case.

>counter isn't incremented anymore. If such packets get routed back to
>the parent ppp unit, ppp_xmit_process() won't notice the recursion and
>will call ppp_channel_push() on the same channel, effectively creating
>the deadlock situation that the xmit_recursion mechanism was supposed
>to prevent.
>
>This patch re-introduces the xmit_recursion counter incrementation in
>ppp_channel_push(). Since the xmit_recursion variable is now part of
>the parent ppp unit, incrementation is skipped if the channel doesn't
>have any. This is fine because only packets routed through the parent
>unit may enter the channel recursively.
>
>Finally, we have to ensure that pch->ppp is not going to be modified
>while executing ppp_channel_push(). Instead of taking this lock only
>while calling ppp_xmit_process(), we now have to hold it for the full
>ppp_channel_push() execution. This respects the ppp locks ordering
>which requires locking ->upl before ->downl.
>
>Fixes: e5dadc65f9e0 ("ppp: Fix false xmit recursion detect with two ppp 
>devices")
>Signed-off-by: Guillaume Nault 
>---
> drivers/net/ppp/ppp_generic.c | 18 ++
> 1 file changed, 10 insertions(+), 8 deletions(-)
>
>diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
>index bd4303944e44..a404552555d4 100644
>--- a/drivers/net/ppp/ppp_generic.c
>+++ b/drivers/net/ppp/ppp_generic.c
>@@ -1915,21 +1915,23 @@ static void __ppp_channel_push(struct channel *pch)
>   spin_unlock(>downl);
>   /* see if there is anything from the attached unit to be sent */
>   if (skb_queue_empty(>file.xq)) {
>-  read_lock(>upl);
>   ppp = pch->ppp;
>   if (ppp)
>-  ppp_xmit_process(ppp);
>-  read_unlock(>upl);
>+  __ppp_xmit_process(ppp);
>   }
> }
> 
> static void ppp_channel_push(struct channel *pch)
> {
>-  local_bh_disable();
>-
>-  __ppp_channel_push(pch);
>-
>-  local_bh_enable();
>+  read_lock_bh(>upl);
>+  if (pch->ppp) {
>+  (*this_cpu_ptr(pch->ppp->xmit_recursion))++;
>+  __ppp_channel_push(pch);
>+  (*this_cpu_ptr(pch->ppp->xmit_recursion))--;
>+  } else {
>+  __ppp_channel_push(pch);
>+  }
>+  read_unlock_bh(>upl);

If invoked read_lock_bh in ppp_channel_push, it would be unnecessary to invoke 
read_lock(>upl)
in the __ppp_channel_push.

Best Regards
Feng

> }
> 
> /*
>-- 
>2.14.0
>


Re:Re:Re: Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan

2017-08-07 Thread Gao Feng
At 2017-08-08 01:17:02, "Cong Wang" <xiyou.wangc...@gmail.com> wrote:
>On Sun, Aug 6, 2017 at 6:32 PM, Gao Feng <gfree.w...@vip.163.com> wrote:
>> I think the RCU should be supposed to avoid the race between del_chan and 
>> lookup_chan.
>
>More precisely, it is callid_sock which is protected by RCU.
>
>Unless I miss any other code path, pptp_exit_module() is
>problematic too, I don't think it can just vfree() the whole thing.
>
>
>> The synchronize_rcu could make sure if there was one which calls lookup_chan 
>> in this period, it would be finished and the sock refcnt is increased if 
>> necessary.
>>
>> So I think it is ok to invoke sock_put directly without SOCK_RCU_FREE, 
>> because the lookup_chan caller has already hold the sock refcnt,
>>

>


Hi Cong,
I just thought about this issue last night, then I get your this email this 
morning.

>If you mean the sock_hold() inside lookup_chan(), no,
>it doesn't help because we already dereference the sock
>before it.

>


Sorry, I don't get you clearly. Why the sock_hold() isn't helpful?
The pptp_release invokes synchronize_rcu after del_chan, it could make sure the 
others has increased the sock refcnt if necessary
and the lookup is over.
There is no one could get the sock after synchronize_rcu in pptp_release.


But I think about another problem.
It seems the pptp_sock_destruct should not invoke del_chan and 
pppox_unbind_sock.
Because when the sock refcnt is 0, the pptp_release must have be invoked 
already.


There are two cases totally.
1. when pptp_release invokes sock_put, the refcnt is 0. The del_chan and 
pppox_unbind_sock are invoked.
2. when pptp_release invokes sock_put, the refcnt is not 0. It means someone 
holds the sock during the period pptp_release invokes del_chan.
Then someone invokes sock_put and the sock refcnt reach 0, it would invoke 
sk_free and invokes pptp_sock_destruct.
So it is unnecessary to invoke del_chan and pppox_unbind_sock again.
And it would bring a race issue even if the pptp_sock_destruct invoked del_chan.


If so, I would send another patch for it.


>Also, lookup_chan_dst() does not have a refcnt, I don't
>find any code preventing it deref'ing other sock in callid_sock

>than the calling one.


Sorry, the last email is html format, not text.
So I send it with text format again.

Best Regards
Feng







Re:Re: [PATCH net] ppp: Fix a scheduling-while-atomic bug in del_chan

2017-08-06 Thread Gao Feng
At 2017-08-03 01:13:36, "Cong Wang"  wrote:
>Hi, Gao
>
>On Tue, Aug 1, 2017 at 1:39 PM, Cong Wang  wrote:
>> From my understanding, this RCU is supposed to protect the pppox_sock
>> pointers in 'callid_sock' which could be NULL'ed in del_chan(). And the
>> pppox_sock is freed when the last refcnt is gone, that is, when sock
>> dctor is called. pptp_release() is ONLY called when the fd in user-space
>> is gone, not necessarily the last refcnt.

Hi Cong,

I am sorry to reply you so late, because I took a short trip recently, and 
didn't check my emails.

I think the RCU should be supposed to avoid the race between del_chan and 
lookup_chan.
The synchronize_rcu could make sure if there was one which calls lookup_chan in 
this period, it would be finished and the sock refcnt is increased if necessary.

So I think it is ok to invoke sock_put directly without SOCK_RCU_FREE, because 
the lookup_chan caller has already hold the sock refcnt, 

Best Regards
Feng

>
>Your commit is probably not the right fix. Can you try the following fix?
>
>diff --git a/drivers/net/ppp/pptp.c b/drivers/net/ppp/pptp.c
>index 6dde9a0cfe76..e75bb95c107f 100644
>--- a/drivers/net/ppp/pptp.c
>+++ b/drivers/net/ppp/pptp.c
>@@ -519,7 +519,6 @@ static int pptp_release(struct socket *sock)
>
>po = pppox_sk(sk);
>del_chan(po);
>-   synchronize_rcu();
>
>pppox_unbind_sock(sk);
>sk->sk_state = PPPOX_DEAD;
>@@ -564,6 +563,7 @@ static int pptp_create(struct net *net, struct
>socket *sock, int kern)
>sk->sk_family  = PF_PPPOX;
>sk->sk_protocol= PX_PROTO_PPTP;
>sk->sk_destruct= pptp_sock_destruct;
>+   sock_set_flag(sk, SOCK_RCU_FREE);
>
>po = pppox_sk(sk);
>opt = >proto.pptp;



Re:Re: [net PATCH] net: sched: Fix one possible panic when no destroy callback

2017-06-27 Thread Gao Feng
At 2017-06-28 01:49:50, "Eric Dumazet" <eric.duma...@gmail.com> wrote:
>On Tue, 2017-06-27 at 10:08 -0700, Cong Wang wrote:
>> On Tue, Jun 27, 2017 at 9:50 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>> > On Tue, 2017-06-27 at 09:30 -0700, Cong Wang wrote:
>> >> On Mon, Jun 26, 2017 at 6:35 PM,  <gfree.w...@vip.163.com> wrote:
>> >> > From: Gao Feng <gfree.w...@vip.163.com>
>> >> >
>> >> > When qdisc fail to init, qdisc_create would invoke the destroy callback
>> >> > to cleanup. But there is no check if the callback exists really. So it
>> >> > would cause the panic if there is no real destroy callback like these
>> >> > qdisc codel, pfifo, pfifo_fast, and so on.
>> >> >
>> >> > Now add one the check for destroy to avoid the possible panic.
>> >> >
>> >> > Signed-off-by: Gao Feng <gfree.w...@vip.163.com>
>> >>
>> >> Looks good,
>> >>
>> >> Acked-by: Cong Wang <xiyou.wangc...@gmail.com>
>> >>
>> >> This is introduced by commit 87b60cfacf9f17cf71933c6e33b.
>> >> Please add proper Fixes tag next time.

OK. Actually I didn't know it is introduced by this commit before :)
Need I send an update patch again ?

>> >
>> > Given that pfifo, pfifo_fast or codel can not fail their init,
>> 
>> 
>> How about codel_init() -> codel_change() -> nla_parse_nested() ?
>
>
>Yeah, with a malicious user space then (iproute2/tc is fine), codel
>could be problematic.
>
>pfifo and pfifo_fast can definitely not hit this bug.
>
>changelog needs a bit of attention, even if the bug is real.
>
>Thanks.
>
>

Yes, the codel could fail to init, and the fifo/pfifo could failed When 
"nla_len(opt) < sizeof(*ctl)".

Best Regards
Feng




Re:Re: [PATCH net-next] net: rps: Add the rfs_needed check when record flow hash

2017-05-25 Thread Gao Feng
Hi David & Eric,

At 2017-05-26 01:11:41, "David Miller"  wrote:
>From: gfree.w...@vip.163.com
>Date: Wed, 24 May 2017 15:35:59 +0800
>
>>  
>> +static inline void sock_rps_record_flow_hash(__u32 hash)
>> +{
>> +#ifdef CONFIG_RPS
>> +if (static_key_false(_needed))
>> +_sock_rps_record_flow_hash(hash);
>> +#endif
>> +}
>> +
>
>This is no longer used.
>
>If you have some plans to use it in another context, you absolutely _MUST_
>post this change inside of a patch series that adds the new user as well.
>
>Otherwise it is impossible to determine the correctness and
>appropriateness of your changes.
>
>Thanks.

The sock_rps_record_flow_hash is still used now.
There is only one caller which is tun_flow_update in file tun.c.

Best Regards
Feng



Re:Re: [PATCH net-next] net: rfs: Don't reset RFS entries when nothing changed

2017-05-22 Thread Gao Feng
At 2017-05-23 11:02:20, "David Miller" <da...@davemloft.net> wrote:
>From: gfree.w...@vip.163.com
>Date: Tue, 23 May 2017 08:45:11 +0800
>
>> From: Gao Feng <gfree.w...@vip.163.com>
>> 
>> When the new RFS table size specified by sysctl equals the old one,
>> there is nothing changed actually. So it is unnecessary to reset the
>> RFS table entris.
>> 
>> Signed-off-by: Gao Feng <gfree.w...@vip.163.com>
>
>It seems like an intentional feature to be able to reset the
>table by simply writing the same value to the sysfs knob.
>
>I'm not applying this, sorry.

It is ok.
I just thought maybe it was used to reset, but I didn't find any comment and 
tips by google.

Regards
Feng


Re:Re: [PATCH net v2] driver: vrf: Fix one possible use-after-free issue

2017-05-09 Thread Gao Feng

At 2017-05-10 02:37:36, "David Miller"  wrote:
>From: gfree.w...@vip.163.com
>Date: Tue,  9 May 2017 18:27:33 +0800
>
>> @@ -989,6 +989,7 @@ static u32 vrf_fib_table(const struct net_device *dev)
>>  
>>  static int vrf_rcv_finish(struct net *net, struct sock *sk, struct sk_buff 
>> *skb)
>>  {
>> +kfree_skb(skb);
>>  return 0;
>>  }
>>  
>> @@ -998,7 +999,7 @@ static struct sk_buff *vrf_rcv_nfhook(u8 pf, unsigned 
>> int hook,
>>  {
>>  struct net *net = dev_net(dev);
>>  
>> -if (NF_HOOK(pf, hook, net, NULL, skb, dev, NULL, vrf_rcv_finish) < 0)
>> +if (nf_hook(pf, hook, net, NULL, skb, dev, NULL, vrf_rcv_finish) != 1)
>>  skb = NULL;/* kfree_skb(skb) handled by nf code */
>>  
>>  return skb;
>
>Indeed, this fixes the immediate problem with NF_STOLEN.
>
>Making NF_STOLEN fully functional is another story, we'd need to stack
>this all together properly:

Yes, this fix just make the vrf wouldn't cause panic which is caused by 
use-after-free skb.
It doesn't work as real NF_QUEUE when reinject the skb. 

>
>static int __ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff 
>*skb)
>{
> ...
>}
>
>static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
>{
>   return l3mdev_ip_rcv(skb, __ip_rcv_finish);
>}
>...
>static inline
>struct sk_buff *l3mdev_ip_rcv(struct sk_buff *skb,
> int (*okfn)(struct net *, struct sock *, struct 
> sk_buff *))
>{
>   return l3mdev_l3_rcv(skb, okfn, AF_INET);
>}
>
>etc. but that's going to really add a kink to the receive path,
>microbenchmark wise.

It is a solution to make NF_STOLEN fully function, but I haven't environment to 
test the benchmark.

Best Regards
Feng




Re:Re: [PATCH net] driver: vrf: Fix one possible use-after-free issue

2017-05-09 Thread Gao Feng
At 2017-05-09 17:21:02, "Florian Westphal"  wrote:
>gfree.w...@vip.163.com  wrote:
>> When one netfilter rule or hook stoles the skb and return NF_STOLEN,
>> it means the skb is taken by the rule, and other modules should not
>> touch this skb ever. Maybe the skb is queued or freed directly by the
>> rule.
>> 
>> Now uses the nf_hook instead of NF_HOOK to get the result of netfilter,
>> and check the return value of nf_hook. Only when its value equals 1, it
>> means the skb could go ahead. Or reset the skb as NULL.
>> 
>> BTW, because vrf_rcv_finish is empty function, so needn't invoke it
>> even though nf_hook returns 1.
>
>Thats a bug then.
>
>The okfn (if called) takes ownership of skb and must free it eventually.
>Otherwise userspace queueing leaks skb on reinjection.
>
>(see nf_reinject() and its use of okfn()).

Thanks, I only thought about the stolen case like synproxy which would free the 
skb directly,
and forget the userspace could reinject the skb.

I would update the patch.

Best Regards
Feng

RE: [PATCH net v4 00/12] Fix possbile memleaks when fail to register_netdevice

2017-05-07 Thread Gao Feng
Hi David,

> From: David Miller [mailto:da...@davemloft.net]
> From: gfree.w...@foxmail.com
> Date: Tue,  2 May 2017 13:58:42 +0800
> 
> > These following drivers allocate kinds of resources in its ndo_init
> > func, free some of them or all in the destructor func. Then there is
> > one memleak that some errors happen after register_netdevice invokes
> > the ndo_init callback. Because only the ndo_uninit callback is invoked
> > in the error handler of register_netdevice, but destructor not.
> >
> > In my original approach, I tried to free the resources in the newlink
> > func when fail to register_netdevice, like destructor did except not
> > free the net_dev. This method is not good when destructor is changed,
> > and the memleak could be not fixed when there is no newlink callback.
> >
> > Now create one new func used to free the resources in the destructor,
> > and the ndo_uninit func also could invokes it when fail to register
> > the net_device by comparing the dev->reg_state with
> > NETREG_UNINITIALIZED.  If there is no existing ndo_uninit, just add
> > one.
> >
> > This solution doesn't only make sure free all resources in any case,
> > but also follows the original desgin that some resources could be kept
> > until the destructor executes normally after register the device
> > successfully.
> 
> Device private teardown is in two stages for the following reason.
> The issue is that netdev_ops->ndo_init() allocates two types of resources.
> 
> One type is OK to release during destruction before the netdev refs goes
to
> zero.  This is what netdev_ops->ndo_uninit() is for.
> 
> The second type is for releasing things which are not safe to drop until
the very
> last netdev reference disappears.  This is what
> netdev->destructor() is for.
> 
> If you look around there are hacks in place all over to try and deal with
this
> issue.  Basically, look for code that checks the return value of
> register_netdev() and if an error is indicated it does some local driver
state
> freeing.  Bonding is one example.  It is trying to deal with the problem
this
> patch set is targetting.

Yes. When I was fixing this bug, I had found the bond had deal with this
issue, frees the resources by itself.

> 
> What really needs to happen is we must divorce the logic of
> dev->destructor() from free_netdev().
> 
> That way we can do the free_netdev in the unregister netdevice path after
> calling netdev->destructor().
> 
> Then the only issue callers of register_netdevice() need to be aware of is
that if
> an error is returned, that caller must call free_netdev().  Which has been
the
> case for decades.
> 
> So I would fix this as follows:
> 
> 1) Rename netdev->destructor() into "netdev->priv_destructor()"  It
>performs all post ndo_ops->ndo_uninit() cleanups, _except_
>free_netdev().  Update all drivers with netdev->destructor().
> 
> 2) Add a boolean state to netdev, which indicates if free_netdev()
>should be performed after netdev->priv_destructor() during
>unregister.
> 
> That provides all of the flexibility necessary to fix this bug in the
core.
> 
> In register_netdevice() if something after ndo_ops->ndo_init() succeeeds,
we
> invoke _both_ ndo_ops->ndo_uninit() and
> netdev->priv_destructor().  We do not look at the netdev "needs
> free_netdev()" boolean.
> 
> In netdev_run_todo(), where we have the one and only
> netdev->destructor() call, change it to:
> 
>   if (dev->priv_destructor)
>   dev->priv_destructor(dev);
>   if (dev->needs_free_netdev)
>   free_netdev(dev);
> 
> That fixes the bug in all cases.  And makes the purpose and logic
extremely
> clear.  Also, no internal state tests leak into the drivers.
> 
> Finally, drivers that try to cover up this issue, such as bonding, need to
be
> changed to no try and free up device private state if their invocation of
> register_netdevice() fails.
> 
> Thanks.

Ok. It is better to fix it by changing the framework of net_dev.
I was afraid that I would bring other bugs if I changed it.
Because it would affect too many codes.

I would like to see you fix it by yourself.

Best Regards
Feng





RE: [PATCH net v3] driver: veth: Fix one possbile memleak when fail to register_netdevice

2017-05-03 Thread Gao Feng
> From: Xin Long [mailto:lucien@gmail.com]
> Sent: Wednesday, May 3, 2017 7:26 PM
> On Wed, May 3, 2017 at 2:37 PM, Gao Feng <gfree.w...@foxmail.com> wrote:
> >> From: Xin Long [mailto:lucien@gmail.com]
> >> Sent: Wednesday, May 3, 2017 1:38 PM
> >> On Wed, May 3, 2017 at 10:07 AM, Gao Feng <gfree.w...@foxmail.com>
> >> wrote:
> >> >> From: netdev-ow...@vger.kernel.org
> >> >> [mailto:netdev-ow...@vger.kernel.org]
> >> >> On Behalf Of Xin Long
> >> >> Sent: Wednesday, May 3, 2017 12:59 AM On Tue, May 2, 2017 at 7:03
> >> >> PM, Gao Feng <gfree.w...@vip.163.com> wrote:
> >> >> >> From: Xin Long [mailto:lucien....@gmail.com]
> >> >> >> Sent: Tuesday, May 2, 2017 3:56 PM On Sat, Apr 29, 2017 at
> >> >> >> 11:51 AM,  <gfree.w...@foxmail.com> wrote:
> >> >> >> > From: Gao Feng <gfree.w...@foxmail.com>
> > [...]
> >> > The fix you mentioned change the original logic.
> >> > The dev->vstats is freed in advance in the ndo_uninit, not destructor.
> >> > It may break the backward.
> >> Sorry, I didn't get your "backward"
> >> I can't see there will be any problem caused by it.
> >> can you say this patch also break the 'backward' ?
> >> https://patchwork.ozlabs.org/patch/748964/
> >>
> >> It's really weird to do dev->reg_state check in ndo_unint ndo_unint
> >> is supposed to free the memory alloced in ndo_init.
> >>
> >
> > I am not sure if it would break the backward, so I said it MAY break.
> > I assumed there may be someone would access the dev->vstats after
> > ndo_uninit, because current veth driver free the mem in the destructor.
> > I selected this approach because I don't want to bring new bugs during fix
> bug.
> >
> > If you're sure it is safe to free dev->vstats in ndo_uninit, I would like to
> update it.
> yes, stats are accessed in .ndo_start_xmit waited by synchronize_net() and
> .ndo_get_stats64 protected by rtnl_lock().

Thanks, I would update the series later with your advice.
I need to wait for a while to collect more comments.
 
> >
> > BTW there are too many drivers which have possible memleak.
> > You could find the list by
> https://www.mail-archive.com/netdev@vger.kernel.org/msg166629.html.
> ah, cool.
> I'm not sure about other dev's stuff, have to check them for sure later.

Expect and thanks your reviews:)

Best Regards
Feng

> 
> >
> > Some drivers allocate the resources in ndo_init, free some in ndo_uninit and
> free left in destructor.
> > I think there are some reasons.
> > We could not move all free in the ndo_uninit from destructor. What's your
> opinion?
> >
> > Best Regards
> > Feng
> >
> >
> >



RE: [PATCH net v3] driver: veth: Fix one possbile memleak when fail to register_netdevice

2017-05-03 Thread Gao Feng
> From: Xin Long [mailto:lucien@gmail.com]
> Sent: Wednesday, May 3, 2017 1:38 PM
> On Wed, May 3, 2017 at 10:07 AM, Gao Feng <gfree.w...@foxmail.com>
> wrote:
> >> From: netdev-ow...@vger.kernel.org
> >> [mailto:netdev-ow...@vger.kernel.org]
> >> On Behalf Of Xin Long
> >> Sent: Wednesday, May 3, 2017 12:59 AM On Tue, May 2, 2017 at 7:03 PM,
> >> Gao Feng <gfree.w...@vip.163.com> wrote:
> >> >> From: Xin Long [mailto:lucien@gmail.com]
> >> >> Sent: Tuesday, May 2, 2017 3:56 PM On Sat, Apr 29, 2017 at 11:51
> >> >> AM,  <gfree.w...@foxmail.com> wrote:
> >> >> > From: Gao Feng <gfree.w...@foxmail.com>
[...]
> > The fix you mentioned change the original logic.
> > The dev->vstats is freed in advance in the ndo_uninit, not destructor.
> > It may break the backward.
> Sorry, I didn't get your "backward"
> I can't see there will be any problem caused by it.
> can you say this patch also break the 'backward' ?
> https://patchwork.ozlabs.org/patch/748964/
> 
> It's really weird to do dev->reg_state check in ndo_unint ndo_unint is 
> supposed
> to free the memory alloced in ndo_init.
> 

I am not sure if it would break the backward, so I said it MAY break.
I assumed there may be someone would access the dev->vstats after ndo_uninit,
because current veth driver free the mem in the destructor.
I selected this approach because I don't want to bring new bugs during fix bug.

If you're sure it is safe to free dev->vstats in ndo_uninit, I would like to 
update it.

BTW there are too many drivers which have possible memleak.
You could find the list by 
https://www.mail-archive.com/netdev@vger.kernel.org/msg166629.html.

Some drivers allocate the resources in ndo_init, free some in ndo_uninit and 
free left in destructor.
I think there are some reasons. 
We could not move all free in the ndo_uninit from destructor. What's your 
opinion?

Best Regards
Feng





RE: [PATCH net v3] driver: veth: Fix one possbile memleak when fail to register_netdevice

2017-05-02 Thread Gao Feng
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of Xin Long
> Sent: Wednesday, May 3, 2017 12:59 AM
> On Tue, May 2, 2017 at 7:03 PM, Gao Feng <gfree.w...@vip.163.com> wrote:
> >> From: Xin Long [mailto:lucien@gmail.com]
> >> Sent: Tuesday, May 2, 2017 3:56 PM
> >> On Sat, Apr 29, 2017 at 11:51 AM,  <gfree.w...@foxmail.com> wrote:
> >> > From: Gao Feng <gfree.w...@foxmail.com>
> > [...]
> >> > -static void veth_dev_free(struct net_device *dev)
> >> > +static void veth_destructor_free(struct net_device *dev)
> >> >  {
> >> > free_percpu(dev->vstats);
> >> > +}
> >> not sure why you needed to add this function.
> >> to use free_percpu() directly may be clearer.
> >
> > Because both of ndo_uninit and destructor need to perform same free
> statements.
> > It is good at maintain the codes with the common function.
> >>
> >> > +
> >> > +static void veth_dev_uninit(struct net_device *dev) {
> >> call free_percpu() here, no need to check dev->reg_state.
> >> free_percpu will just return if dev->vstats is NULL.
> >
> > It would break the original design if don't check the reg_state.
> > The original logic is that free the resources in the destructor, not in 
> > ndo_init.
> I got what you're doing now, can you pls try to fix this with:
> 
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -219,10 +219,9 @@ static int veth_dev_init(struct net_device *dev)
> return 0;
>  }
> 
> -static void veth_dev_free(struct net_device *dev)
> +static void veth_dev_uninit(struct net_device *dev)
>  {
> free_percpu(dev->vstats);
> -   free_netdev(dev);
>  }
> 
>  #ifdef CONFIG_NET_POLL_CONTROLLER
> @@ -279,6 +278,7 @@ static void veth_set_rx_headroom(struct net_device
> *dev, int new_hr)
> 
>  static const struct net_device_ops veth_netdev_ops = {
> .ndo_init= veth_dev_init,
> +   .ndo_uninit  = veth_dev_uninit,
> .ndo_open= veth_open,
> .ndo_stop= veth_close,
> .ndo_start_xmit  = veth_xmit,
> @@ -317,7 +317,7 @@ static void veth_setup(struct net_device *dev)
>NETIF_F_HW_VLAN_STAG_TX |
>NETIF_F_HW_VLAN_CTAG_RX |
>NETIF_F_HW_VLAN_STAG_RX);
> -   dev->destructor = veth_dev_free;
> +   dev->destructor = free_netdev;
> dev->max_mtu = ETH_MAX_MTU;
> 
> dev->hw_features = VETH_FEATURES;
> 
> 
> just as what other virtual nic drivers do (vxlan, geneve, macsec, bridge )
> 

The fix you mentioned change the original logic.
The dev->vstats is freed in advance in the ndo_uninit, not destructor.
It may break the backward.

Regards
Feng




RE: [PATCH net v4 00/12] Fix possbile memleaks when fail to register_netdevice

2017-05-02 Thread Gao Feng
Hi David

> From: David Miller [mailto:da...@davemloft.net]
> Sent: Wednesday, May 3, 2017 3:30 AM
> From: gfree.w...@foxmail.com
> Date: Tue,  2 May 2017 13:58:42 +0800
> 
[...]
> >
> > This solution doesn't only make sure free all resources in any case,
> > but also follows the original desgin that some resources could be kept
> > until the destructor executes normally after register the device
> > successfully.
> 
> I want to think about this some more.
> 
> It is really unfortunate that resources are allocated strictly from the
ndo_init()
> yet released in two different callbacks which are invoked only in certain
> (different) situations.
> 
> Just the fact that we have to make an internal netdev state test in the
> ndo_uninit callback to get this right is a big red flag to me.

Yes, I am very agree with you.
This fix is just like a workaround under current framework.

The root is that allocate in one spot, but free them at two spots.
It means all ndo_uninit need to handle this case if allocate some resource
and free them in the destructor.
It should be done by the framework.

I thought about if there was a better solution to fix it.
But I think it need to modify the framework of net_device, it seems not good
as a bug fix to net.git.

Best Regards
Feng






RE: [PATCH net v4 00/12] Fix possbile memleaks when fail to register_netdevice

2017-05-02 Thread Gao Feng
Hi all,

> From: gfree.w...@foxmail.com [mailto:gfree.w...@foxmail.com]
> Sent: Tuesday, May 2, 2017 1:59 PM
>  drivers/net/dummy.c | 14 +++---
>  drivers/net/ifb.c   | 33 +++--
>  drivers/net/loopback.c  | 15 ++-
>  drivers/net/team/team.c | 15 ---
>  drivers/net/veth.c  | 15 ++-
>  net/8021q/vlan_dev.c| 17 +
>  net/batman-adv/soft-interface.c | 18 +++---
>  net/ipv4/ip_tunnel.c| 11 ++-
>  net/ipv6/ip6_gre.c  | 17 +
>  net/ipv6/ip6_tunnel.c   | 11 ++-
>  net/ipv6/ip6_vti.c  | 11 ++-
>  net/ipv6/sit.c  | 17 +
>  12 files changed, 158 insertions(+), 36 deletions(-)
> 
> --
>  v4: Make patches as one sery of patches, per David Miler
>  v3: Split one patch to multiple commits, per David Ahern
>  v2: Move the free in ndo_uninit when fail to register, per Herbert Xu
>  v1: initial version
> 
> 

Because I sent these patches too fast today, so that my email server blocks
my account today.
The left patches (08~12) would be sent after my account is unlocked. 

Maybe tomorrow.
I am very sorry about this, and try to change another mail server next time.

Best Regards
Feng





RE: [PATCH net v3] driver: veth: Fix one possbile memleak when fail to register_netdevice

2017-05-02 Thread Gao Feng
> From: Xin Long [mailto:lucien@gmail.com]
> Sent: Tuesday, May 2, 2017 3:56 PM
> On Sat, Apr 29, 2017 at 11:51 AM,  <gfree.w...@foxmail.com> wrote:
> > From: Gao Feng <gfree.w...@foxmail.com>
[...]
> > -static void veth_dev_free(struct net_device *dev)
> > +static void veth_destructor_free(struct net_device *dev)
> >  {
> > free_percpu(dev->vstats);
> > +}
> not sure why you needed to add this function.
> to use free_percpu() directly may be clearer.

Because both of ndo_uninit and destructor need to perform same free statements.
It is good at maintain the codes with the common function.
> 
> > +
> > +static void veth_dev_uninit(struct net_device *dev) {
> call free_percpu() here, no need to check dev->reg_state.
> free_percpu will just return if dev->vstats is NULL.

It would break the original design if don't check the reg_state.
The original logic is that free the resources in the destructor, not in 
ndo_init.

BTW, because I send multiple patches too fast today, the email server blocks my 
account.
So I have to reply you with a different email account. Sorry.

Best Regards
Feng

> 
[...]




RE: [PATCH net v3] driver: veth: Fix one possbile memleak when fail to register_netdevice

2017-05-02 Thread Gao Feng
> From: David Ahern [mailto:d...@cumulusnetworks.com]
> Sent: Monday, May 1, 2017 11:08 PM
> On 4/28/17 9:51 PM, gfree.w...@foxmail.com wrote:
> > diff --git a/drivers/net/veth.c b/drivers/net/veth.c index
> > 8c39d6d..418376a 100644
> > --- a/drivers/net/veth.c
> > +++ b/drivers/net/veth.c
> > @@ -224,9 +224,21 @@ static int veth_dev_init(struct net_device *dev)
> > return 0;
> >  }
> >
> > -static void veth_dev_free(struct net_device *dev)
> > +static void veth_destructor_free(struct net_device *dev)
> 
> _destructor in the name is confusing since veth_dev_free is the
> dev->destructor. Perhaps that should be veth_free_stats.

Because I want to emphasize it should be invoked in the destructor.
What's your opinion ?


[...]
> Functionally, it looks good to me.
> 
> Acked-by: David Ahern 

Thanks David.
I have sent the v4 patches with a series according to David's advice.

BTW, because I send multiple patches too fast today, the email server blocks
my account.
So I have to reply you with a different email account. Sorry.

Regards
Feng




RE: [PATCH net v3] driver: dummy: Fix one possbile memleak when fail to register_netdevice

2017-05-01 Thread Gao Feng
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Monday, May 1, 2017 10:54 AM
> 
> Please, Gao, submit this as a proper, numbered, patch series with a proper
> header posting.
> 
> That way you can explain why you took this strategy to fix this problem,
> compared to your original approach.
> 
> Thanks.

OK, no problem.

Regards
Feng



RE: [PATCH net v2] net: dev: Fix possible memleaks when fail to register_netdevice

2017-04-28 Thread Gao Feng
> From: David Ahern [mailto:dsah...@gmail.com]
> Sent: Friday, April 28, 2017 11:23 PM
> On 4/27/17 9:19 PM, gfree.w...@foxmail.com wrote:
> >  drivers/net/dummy.c | 14 +++---
> >  drivers/net/ifb.c   | 33 +++--
> >  drivers/net/loopback.c  | 15 ++-
> drivers/net/team/team.c
> > | 15 ---
> >  drivers/net/veth.c  | 15 ++-
> >  net/8021q/vlan_dev.c| 17 +
> >  net/ipv4/ip_tunnel.c| 11 ++-
> >  net/ipv6/ip6_gre.c  | 18 ++
> >  net/ipv6/ip6_tunnel.c   | 11 ++-
> >  net/ipv6/ip6_vti.c  | 11 ++-
> >  net/ipv6/sit.c  | 17 +
> >  11 files changed, 144 insertions(+), 33 deletions(-)
> 
> Seems like the changes to each file are completely independent, so they
should
> be in separate patches making each easier to review.

Thanks, I would like split them into multiple patches.

Best Regards
Feng



RE: [PATCH net] driver/net: Fix possible memleaks when fail to register_netdevice

2017-04-27 Thread Gao Feng
> From: Gao Feng [mailto:gfree.w...@foxmail.com]
> Sent: Thursday, April 27, 2017 4:33 PM
> > From: Herbert Xu [mailto:herb...@gondor.apana.org.au]
> > Sent: Thursday, April 27, 2017 4:16 PM On Tue, Apr 25, 2017 at
> > 08:01:50PM +0800, gfree.w...@foxmail.com wrote:
> > > From: Gao Feng <f...@ikuai8.com>
> > >
> [...]
> >
[...]

I think I get one method which could be safe to free the mem in ndo_uninit.
Thanks, I would send the v2 patch later.

Best Regards
Feng





RE: [PATCH net] driver/net: Fix possible memleaks when fail to register_netdevice

2017-04-27 Thread Gao Feng
> From: Herbert Xu [mailto:herb...@gondor.apana.org.au]
> Sent: Thursday, April 27, 2017 4:16 PM
> On Tue, Apr 25, 2017 at 08:01:50PM +0800, gfree.w...@foxmail.com wrote:
> > From: Gao Feng <f...@ikuai8.com>
> >
[...]
> 
> This has the potential of creating future bugs, because there is no
guarantee
> that the ndo_init function has been invoked at all.
> 
> Wouldn't it be safer to move the freeing from the destructors into their
> ndo_uninit functions instead?

I considered about this solution, I am not sure if it is safe to move the
freeing from destructors into ndo_uninit.
Because when the free action is done in ndo_uninit, it is earlier than
destructor. 
I am not sure if it break the design of original driver.

I just tested the team driver before. It is ok to free all mems in
ndo_uninit.
Is it possible that anyone are using the net_dev after ndo_uninit ?

If no one, i would like to update the patch. 
Could you give me some guide please?

Regards
Feng

> 
> Thanks,
> --





RE: [B.A.T.M.A.N.] [PATCH net] net: batman-adv: Fix possible memleaks when fail to register_netdevice

2017-04-26 Thread Gao Feng
> From: Sven Eckelmann [mailto:s...@narfation.org]
> Sent: Wednesday, April 26, 2017 3:17 PM
> On Mittwoch, 26. April 2017 14:44:24 CEST Gao Feng wrote:
> [...]
> > I get it now, thanks.
> [...]
> > BTW, I think although the batadv_softif_create is legacy, we should
> > fix it when it still exists :)
> 
> I didn't meant that we should not fix it. I just said that it looks to me
like the fix
> should look different to ensure that it actually fixes the sysfs and rtnl
link
> implementation for the batadv interface creation. Right now the ndo_uninit
> (when it would be set by batadv) is called in the netdev core functions
when an
> error happens during the registration. This is not the case for the
destructor.

Thanks your answer.
I assumed the destructor is not for this case before.. 

> Your patch would not change it. It therefore looks like you simply have to
move
> the current destructor (without the free_netdev) to ndo_uninit and change
the
> destructor to free_netdev.

Yes, that patch didn't touch badman-adv. Because current badman-adv doesn't
support newlink now.
It would be good that cleanup the resource in ndo_uninit routine.

Best Regards
Feng
 
> 
> The batadv ops doesn't have a newlink function. It will therefore use the
> register_netdevice code path which calls free_netdev on failures. The
extra
> cleanup you've added in
> https://www.mail-archive.com/netdev@vger.kernel.org/msg165253.html can
> therefore not work for batman-adv. Actually, it is not touching anything
> batman-adv related. The suggestion to change the register_netdevice ->
> free_netdev part in rtnl_newlink was new in the reply to the batadv
discussion.
> It is therefore still an open discussion how it is correctly fixed.
> 
> Kind regards,
>   Sven




RE: [B.A.T.M.A.N.] [PATCH net] net: batman-adv: Fix possible memleaks when fail to register_netdevice

2017-04-26 Thread Gao Feng
> From: Sven Eckelmann [mailto:s...@narfation.org]
> Sent: Wednesday, April 26, 2017 1:58 PM
> On Mittwoch, 26. April 2017 08:41:30 CEST Gao Feng wrote:
> > On Dienstag, 25. April 2017 20:03:20 CEST gfree.w...@foxmail.com wrote:
> > > From: Gao Feng <f...@ikuai8.com>
> > >
> > > Because the func batadv_softif_init_late allocate some resources and
> > > it would be invoked in register_netdevice. So we need to invoke the
> > > func batadv_softif_free instead of free_netdev to cleanup when fail
> > > to register_netdevice.
> >
> > I could be wrong, but shouldn't the destructor be replaced with
> > free_netdevice and the batadv_softif_free (without the free_netdev)
> > used as ndo_uninit? The line you've changed should then be kept as
> free_netdevice.
> >
> > At least this seems to be important when using rtnl_newlink() instead
> > of the legacy sysfs netdev stuff from batman-adv. rtnl_newlink() would
> > also only call free_netdevice and thus also not run
> > batadv_softif_free. This seems to be only fixable by calling ndo_uninit.
> >
> > Sorry, I don't get you.
> > The net_dev is created in this func batadv_softif_create.
> > Why couldn't invoke batadv_softif_free to cleanup when fail to
> > register_netdevice?
> >
> 
> Because it is the legacy way to create the batadv interfaces and there is
a
> "new" one. The new way is to use rtnl link (see batadv_link_ops).
> 
> The rtnl linke (rtnl_newlink) would not benefit from your fix and
therefore still
> show the old behavior. I think a different fix is necessary to solve the
problem
> for both ways to create an batadv interface.
> 
> Kind regards,
>   Sven

I get it now, thanks.
Actually I sent another patch about the memleaks when invoke newlink and
fail to register_netdev.
You could refer to it by
https://www.mail-archive.com/netdev@vger.kernel.org/msg165253.html.

In this patch, I add the cleanup when fail to register_netdev.
It would be more simple if modify the rtnl_newlink and invoke the destructor
of netdev when failed.
Like the following codes.
if (ops->newlink) {
err = ops->newlink(link_net ? : net, dev, tb, data);
if (err < 0) {
if (dev->reg_state == NETREG_UNINITIALIZED)
if (dev->destructor)
dev->destructor(dev)
else
free_netdev(dev);
goto out;
}
} else {
err = register_netdevice(dev);
if (err < 0) {
if (dev->destructor)
dev->destructor(dev);
else
free_netdev(dev);
goto out;
}
}

But I don't know if it breaks the design newlink and destructor.

BTW, I think although the batadv_softif_create is legacy, we should fix it
when it still exists :)

Regards
Feng




RE: [B.A.T.M.A.N.] [PATCH net] net: batman-adv: Fix possible memleaks when fail to register_netdevice

2017-04-25 Thread Gao Feng
> From: Sven Eckelmann [mailto:s...@narfation.org]
> Sent: Tuesday, April 25, 2017 9:53 PM
> On Dienstag, 25. April 2017 20:03:20 CEST gfree.w...@foxmail.com wrote:
> > From: Gao Feng <f...@ikuai8.com>
> >
> > Because the func batadv_softif_init_late allocate some resources and
> > it would be invoked in register_netdevice. So we need to invoke the
> > func batadv_softif_free instead of free_netdev to cleanup when fail to
> > register_netdevice.
> 
> I could be wrong, but shouldn't the destructor be replaced with
free_netdevice
> and the batadv_softif_free (without the free_netdev) used as ndo_uninit?
The
> line you've changed should then be kept as free_netdevice.
> 
> At least this seems to be important when using rtnl_newlink() instead of
the
> legacy sysfs netdev stuff from batman-adv. rtnl_newlink() would also only
call
> free_netdevice and thus also not run batadv_softif_free. This seems to be
only
> fixable by calling ndo_uninit.
> 
> Kind regards,
>   Sven

Sorry, I don't get you.
The net_dev is created in this func batadv_softif_create.
Why couldn't invoke batadv_softif_free to cleanup when fail to
register_netdevice?

Best Regards
Feng





RE: [PATCH net] driver/net: Fix possible memleaks when fail to register_netdevice

2017-04-25 Thread Gao Feng
> From: Gao Feng <f...@ikuai8.com>
> 
> These drivers allocate kinds of resources in init routine, and free some
> resources in the destructor of net_device. It may cause memleak when some
> errors happen after register_netdevice invokes the init callback. Because
only
> the uninit callback is invoked in the error handler of register_netdevice,
but the
> destructor not. As a result, some resources are lost forever.
> 
> Now invokes the destructor instead of free_netdev somewhere, and free the
> left resources in the newlink func when fail to register_netdevice.
> 
> Signed-off-by: Gao Feng <f...@ikuai8.com>
> ---
>  drivers/net/dummy.c  |  2 +-
>  drivers/net/ifb.c|  2 +-
>  drivers/net/loopback.c   |  2 +-
>  drivers/net/team/team.c  | 11 ++-
>  drivers/net/veth.c   |  4 ++--
>  net/8021q/vlan_netlink.c |  6 +-
>  net/ipv4/ip_tunnel.c |  9 -
>  net/ipv6/ip6_gre.c   |  6 +-
>  net/ipv6/ip6_tunnel.c| 12 ++--
>  net/ipv6/ip6_vti.c   |  7 ++-
>  net/ipv6/sit.c   |  5 -
>  11 files changed, 53 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c index
> 2c80611..55b8a50 100644
> --- a/drivers/net/dummy.c
> +++ b/drivers/net/dummy.c
> @@ -383,7 +383,7 @@ static int __init dummy_init_one(void)
>   return 0;
> 
>  err:
> - free_netdev(dev_dummy);
> + dummy_free_netdev(dev_dummy);
>   return err;
>  }
> 
> diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c index 312fce7..a298371
100644
> --- a/drivers/net/ifb.c
> +++ b/drivers/net/ifb.c
> @@ -318,7 +318,7 @@ static int __init ifb_init_one(int index)
>   return 0;
> 
>  err:
> - free_netdev(dev_ifb);
> + ifb_dev_free(dev_ifb);
>   return err;
>  }
> 
> diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c index
> b23b719..c4e1d4c 100644
> --- a/drivers/net/loopback.c
> +++ b/drivers/net/loopback.c
> @@ -208,7 +208,7 @@ static __net_init int loopback_net_init(struct net
*net)
> 
> 
>  out_free_netdev:
> - free_netdev(dev);
> + loopback_dev_free(dev);
>  out:
>   if (net_eq(net, _net))
>   panic("loopback: Failed to register netdevice: %d\n", err);
diff --git
> a/drivers/net/team/team.c b/drivers/net/team/team.c index f8c81f1..0bc80fb
> 100644
> --- a/drivers/net/team/team.c
> +++ b/drivers/net/team/team.c
> @@ -2109,10 +2109,19 @@ static void team_setup(struct net_device *dev)
> static int team_newlink(struct net *src_net, struct net_device *dev,
>   struct nlattr *tb[], struct nlattr *data[])  {
> + int ret;
> +
>   if (tb[IFLA_ADDRESS] == NULL)
>   eth_hw_addr_random(dev);
> 
> - return register_netdevice(dev);
> + ret = register_netdevice(dev);
> + if (ret) {
> + struct team *team = netdev_priv(dev);
> +
> + free_percpu(team->pcpu_stats);
> + }
> +
> + return ret;
>  }
> 
>  static int team_validate(struct nlattr *tb[], struct nlattr *data[]) diff
--git
> a/drivers/net/veth.c b/drivers/net/veth.c index 8c39d6d..f60f5ee 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -457,13 +457,13 @@ static int veth_newlink(struct net *src_net, struct
> net_device *dev,
>   return 0;
> 
>  err_register_dev:
> - /* nothing to do */
> + free_percpu(dev->vstats);
>  err_configure_peer:
>   unregister_netdevice(peer);
>   return err;
> 
>  err_register_peer:
> - free_netdev(peer);
> + veth_dev_free(peer);
>   return err;
>  }
> 
> diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c index
> 1270207..a15826a 100644
> --- a/net/8021q/vlan_netlink.c
> +++ b/net/8021q/vlan_netlink.c
> @@ -156,7 +156,11 @@ static int vlan_newlink(struct net *src_net, struct
> net_device *dev,
>   if (err < 0)
>   return err;
> 
> - return register_vlan_dev(dev);
> + err = register_vlan_dev(dev);
> + if (err)
> + free_percpu(vlan->vlan_pcpu_stats);
> +
> + return err;
>  }
> 
>  static inline size_t vlan_qos_map_size(unsigned int n) diff --git
> a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index 823abae..4acb296
> 100644
> --- a/net/ipv4/ip_tunnel.c
> +++ b/net/ipv4/ip_tunnel.c
> @@ -63,6 +63,8 @@
>  #include 
>  #endif
> 
> +static void ip_tunnel_dev_free(struct net_device *dev);
> +
>  static unsigned int ip_tunnel_hash(__be32 key, __be32 remote)  {
>   return hash_32((__force u32)key ^ (__force u32)remote, @@ -285,7
> +287,7 @@ static struct net_device *__ip_tunnel_create(struct ne

RE: [PATCH net 1/1] net: tcp: Increase TCPABORTONLINGER when send RST by linger2 in keepalive timer

2017-04-12 Thread Gao Feng
Hi David,

> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, April 13, 2017 1:21 AM
> To: gfree.w...@foxmail.com
> Cc: kuz...@ms2.inr.ac.ru; jmor...@namei.org; ka...@trash.net;
> ncardw...@google.com; netdev@vger.kernel.org; f...@ikuai8.com
> Subject: Re: [PATCH net 1/1] net: tcp: Increase TCPABORTONLINGER when send
> RST by linger2 in keepalive timer
> 
> From: gfree.w...@foxmail.com
> Date: Sun,  9 Apr 2017 20:44:41 +0800
> 
> > From: Gao Feng <f...@ikuai8.com>
> >
> > It should increase TCPABORTONLINGER counter when send RST caused by
> > linger2 in keepalive timer.
> >
> > Signed-off-by: Gao Feng <f...@ikuai8.com>
> > ---
> >  net/ipv4/tcp_timer.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index
> > b2ab411..5c01f21 100644
> > --- a/net/ipv4/tcp_timer.c
> > +++ b/net/ipv4/tcp_timer.c
> > @@ -650,6 +650,8 @@ static void tcp_keepalive_timer (unsigned long data)
> > tcp_time_wait(sk, TCP_FIN_WAIT2, tmo);
> > goto out;
> > }
> > +   } else {
> > +   NET_INC_STATS(sock_net(sk),
LINUX_MIB_TCPABORTONLINGER);
> > }
> > tcp_send_active_reset(sk, GFP_ATOMIC);
> > goto death;
> 
> I think this else clause is completely unnecessary.  Just do it right
above the
> tcp_send_active_reset() call and at the same indentation level.
> 
> Alternatively, if you are trying to only bump the counter when tp->linger2
is
> >= 0, then you attached the else clause to the wrong if() test.
> 
> Thank you.

Actually I only want increase the TCPABORTONLINGER when linger2 is negative
which
means the timeout of FIN_WAIT2 is 0. We need to make socket state is closed
and send
one RST to peer.

Because tmo may be zero too, so increase this counter in the else block to
avoid it.

Best Regards
Feng




RE: [PATCH net-next v2 1/1] net: ipv4: Refine the ipv4_default_advmss

2017-04-11 Thread Gao Feng
> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of Joe Perches
> Sent: Wednesday, April 12, 2017 11:57 AM
> To: gfree.w...@foxmail.com; da...@davemloft.net; kuz...@ms2.inr.ac.ru;
> jmor...@namei.org; netdev@vger.kernel.org
> Cc: Gao Feng <f...@ikuai8.com>
> Subject: Re: [PATCH net-next v2 1/1] net: ipv4: Refine the
ipv4_default_advmss
> 
> On Wed, 2017-04-12 at 10:32 +0800, gfree.w...@foxmail.com wrote:
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> 
> more trivia:
> 
> > @@ -1250,14 +1250,11 @@ static void set_class_tag(struct rtable *rt,
> > u32 tag)
> >
> >  static unsigned int ipv4_default_advmss(const struct dst_entry *dst)
> > {
> > -   unsigned int advmss = dst_metric_raw(dst, RTAX_ADVMSS);
> > +   unsigned int header_size = sizeof(struct tcphdr) + sizeof(struct
iphdr);
> > +   unsigned int advmss = max_t(unsigned int, dst->dev->mtu -
header_size,
> > +   ip_rt_min_advmss);
> >
> > -   if (advmss == 0) {
> > -   advmss = max_t(unsigned int, dst->dev->mtu - 40,
> > -  ip_rt_min_advmss);
> > -   if (advmss > 65535 - 40)
> > -   advmss = 65535 - 40;
> > -   }
> > +   advmss = min(advmss, IPV4_MAX_PMTU - header_size);
> > return advmss;
> >  }
> 
> This would probably be simpler to read as:
> 
>   return min(advmss, IPV4_MAX_PMTU - header_size);
> 
> though it's almost certain that the compiler emits identical object code.

Yes. I thought about writing it with the way you mentioned, but gave up.
But i could follow you this time.

Best Regards
Feng





RE: [PATCH net-next 1/1] net: ipv4: Refine the ipv4_default_advmss

2017-04-11 Thread Gao Feng
> -Original Message-
> From: Joe Perches [mailto:j...@perches.com]
> Sent: Wednesday, April 12, 2017 10:03 AM
> To: gfree.w...@foxmail.com; da...@davemloft.net; kuz...@ms2.inr.ac.ru;
> jmor...@namei.org; netdev@vger.kernel.org
> Cc: Gao Feng <f...@ikuai8.com>
> Subject: Re: [PATCH net-next 1/1] net: ipv4: Refine the
ipv4_default_advmss
> 
> On Wed, 2017-04-12 at 09:36 +0800, gfree.w...@foxmail.com wrote:
> > From: Gao Feng <f...@ikuai8.com>
> >
> > 1. Don't get the metric RTAX_ADVMSS of dst.
> > There are two reasons.
> > 1) Its caller dst_metric_advmss has already invoke dst_metric_advmss
> > before invoke default_advmss.
> > 2) The ipv4_default_advmss is used to get the default mss, it should
> > not try to get the metric like ip6_default_advmss.
> >
> > 2. Use sizeof(tcphdr)+sizeof(iphdr) instead of literal 40.
> >
> > 3. Define one new macro IPV4_MAX_PMTU instead of 65535 according to
> > RFC 2675, section 5.1.
> 
> trivia:
> 
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> []
> > @@ -1250,14 +1250,11 @@ static void set_class_tag(struct rtable *rt,
> > u32 tag)
> >
> >  static unsigned int ipv4_default_advmss(const struct dst_entry *dst)
> > {
> > -   unsigned int advmss = dst_metric_raw(dst, RTAX_ADVMSS);
> > +   unsigned int header_size = sizeof(struct tcphdr) + sizeof(struct
iphdr);
> > +   unsigned int advmss = max_t(unsigned int, dst->dev->mtu -
header_size,
> > +   ip_rt_min_advmss);
> >
> > -   if (advmss == 0) {
> > -   advmss = max_t(unsigned int, dst->dev->mtu - 40,
> > -  ip_rt_min_advmss);
> > -   if (advmss > 65535 - 40)
> > -   advmss = 65535 - 40;
> > -   }
> > +   advmss = min_t(unsigned int, advmss, IPV4_MAX_PMTU - header_size);
> 
> as all the elements are now unsigned int, the min_t cast seems
unnecessary.

Thanks, it is unnecessary to use min_t.
I have corrected it now.

Best Regards
Feng





RE: [PATCH net 1/1] net: tcp: Don't increase the TCP_MIB_OUTRSTS when fail to transmit RST

2017-04-06 Thread Gao Feng
Hi Neal

> -Original Message-
> 
> On Thu, Apr 6, 2017 at 10:05 AM, Gao Feng <gfree.w...@foxmail.com> wrote:
> > If so, we should increase the TCP_MIB_OUTRSTS too when fail to alloc skb.
> > When machine is overloaded and mem is exhausted, it may fail to alloc skb.
> 
> Moving the increment of TCP_MIB_OUTRSTS to the top of
> tcp_send_active_reset() sounds fine to me.
> 
> neal
I sent the v2 patch, and didn't change the tcp_v4_send_reset and 
tcp_v6_send_response.
Because I think they are only used during connecting. The rst count is not as 
important as 
tcp_send_active_reset.

Regards
Feng



.


RE: [PATCH net 1/1] net: tcp: Don't increase the TCP_MIB_OUTRSTS when fail to transmit RST

2017-04-06 Thread Gao Feng
Hi Neal,

> -Original Message-
> From: Neal Cardwell [mailto:ncardw...@google.com]
> Sent: Thursday, April 6, 2017 10:01 PM
> To: Gao Feng <gfree.w...@foxmail.com>
> Cc: David Miller <da...@davemloft.net>; Alexey Kuznetsov
> <kuz...@ms2.inr.ac.ru>; James Morris <jmor...@namei.org>; Patrick McHardy
> <ka...@trash.net>; Netdev <netdev@vger.kernel.org>; Gao Feng
> <f...@ikuai8.com>
> Subject: Re: [PATCH net 1/1] net: tcp: Don't increase the TCP_MIB_OUTRSTS
> when fail to transmit RST
> 
> On Thu, Apr 6, 2017 at 9:35 AM,  <gfree.w...@foxmail.com> wrote:
> > From: Gao Feng <f...@ikuai8.com>
> >
> > When fail to transmit RST, don't increase TCP_MIB_OUTRSTS in func
> > tcp_send_active_reset like the case that it only increases
> > LINUX_MIB_TCPABORTFAILED when fail to alloc skb.
> >
> > Signed-off-by: Gao Feng <f...@ikuai8.com>
> > ---
> 
> I would be concerned that this is a change in the semantics of
> TCP_MIB_OUTRSTS that might break user-space monitoring tools that rely on
> the current semantics. Counting attempted RSTs could be an important signal
> to monitor, and it could be quite bad if that signal is lost or hidden 
> because the
> machine is so overloaded that the transmission of the RSTs fails.
> 
> Also it would seem to muddy the semantics a bit, since both
> tcp_v4_send_reset() and tcp_v6_send_response() currently increment
> TCP_MIB_OUTRSTS without regard to whether the transmit actually succeeded
> or not.
> 
> neal
If so, we should increase the TCP_MIB_OUTRSTS too when fail to alloc skb.
When machine is overloaded and mem is exhausted, it may fail to alloc skb.

Regards
Feng



RE: [PATCH net-next 1/1] net: tcp: Define the TCP_MAX_WSCALE instead of literal number 14

2017-04-03 Thread Gao Feng
> -Original Message-
> From: Neal Cardwell [mailto:ncardw...@google.com]
> Sent: Monday, April 3, 2017 2:59 AM
> To: gfree.w...@foxmail.com
> Cc: David Miller <da...@davemloft.net>; Alexey Kuznetsov
> <kuz...@ms2.inr.ac.ru>; James Morris <jmor...@namei.org>; Patrick McHardy
> <ka...@trash.net>; Netdev <netdev@vger.kernel.org>; Gao Feng
> <f...@ikuai8.com>
> Subject: Re: [PATCH net-next 1/1] net: tcp: Define the TCP_MAX_WSCALE
> instead of literal number 14
> 
> On Sat, Apr 1, 2017 at 12:14 AM,  <gfree.w...@foxmail.com> wrote:
> > From: Gao Feng <f...@ikuai8.com>
> >
> > Define one new macro TCP_MAX_WSCALE instead of literal number '14',
> > and use U16_MAX instead of 65535 as the max value of TCP window.
> > There is another minor change, use rounddown(space, mss) instead of
> > (space / mss) * mss;
> >
> > Signed-off-by: Gao Feng <f...@ikuai8.com>
> > ---
> 
> Looks OK to me.
> 
> IMHO this is a nice direction, since there is a proposal to increase the 
> maximum
> shift factor from 14 to 15:
>   https://tools.ietf.org/html/draft-nishida-tcpm-maxwin-03

Do you mean enlarge the wscale to 15 now ?

> 
> Though there are a few spots where a literal 14 appears:
> 
> tcp_input.c:3763:  net_info_ratelimited("%s: Illegal window scaling
> value %d >14 received\n",
> 
> tcp_output.c:238:* See RFC1323 for an explanation of
> the limit to 14

Ok, I will update these two spots.

Regards
Feng
> 
> 
> neal





RE: [PATCH nf-next 1/1] net: tcp: Refine the __tcp_select_window

2017-03-30 Thread Gao Feng

> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of David Miller
> Sent: Friday, March 31, 2017 6:42 AM
> To: gfree.w...@foxmail.com
> Cc: kuz...@ms2.inr.ac.ru; jmor...@namei.org; netdev@vger.kernel.org;
> f...@ikuai8.com
> Subject: Re: [PATCH nf-next 1/1] net: tcp: Refine the __tcp_select_window
> 
> From: gfree.w...@foxmail.com
> Date: Thu, 30 Mar 2017 06:49:19 +0800
> 
> > From: Gao Feng <f...@ikuai8.com>
> >
> > 1. Move the "window = tp->rcv_wnd;" into the condition block without
> > tp->rx_opt.rcv_wscale.
> > Because it is unnecessary when enable wscale;
> >
> > 2. Use the macro ALIGN instead of two statements.
> > The two statements are used to make window align to 1< > Use the ALIGN is more clearer.
> >
> > 3. Use the rounddown to make codes clearer.
> >
> > Signed-off-by: Gao Feng <f...@ikuai8.com>
> 
> Applied, but please do not target non-netfilter patches using "nf-next" in
your
> Subject lines.

Sorry, I misspelled the subject.
I would pay more attention on it.

Regards
Feng






RE: [PATCH net-next 1/1] tcp: sysctl: Fix a race to avoid unexpected 0 window from space

2017-03-24 Thread Gao Feng
> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of Stephen Hemminger
> Sent: Saturday, March 25, 2017 4:32 AM
> To: gfree.w...@foxmail.com
> Cc: da...@davemloft.net; netdev@vger.kernel.org; Gao Feng
> <f...@ikuai8.com>
> Subject: Re: [PATCH net-next 1/1] tcp: sysctl: Fix a race to avoid
unexpected 0
> window from space
> 
> On Fri, 24 Mar 2017 07:05:12 +0800
> gfree.w...@foxmail.com wrote:
> 
> > From: Gao Feng <f...@ikuai8.com>
> >
> > Because sysctl_tcp_adv_win_scale could be changed any time, so there
> > is one race in tcp_win_from_space.
> > For example,
> > 1.sysctl_tcp_adv_win_scale<=0 (sysctl_tcp_adv_win_scale is negative
> > now)
> > 2.space>>(-sysctl_tcp_adv_win_scale) (sysctl_tcp_adv_win_scale is
> > postive now)
> >
> > As a result, tcp_win_from_space returns 0. It is unexpected.
> >
> > Certainly if the compiler put the sysctl_tcp_adv_win_scale into one
> > register firstly, then use the register directly, it would be ok.
> > But we could not depend on the compiler behavior.
> >
> > Signed-off-by: Gao Feng <f...@ikuai8.com>
> > ---
> >  include/net/tcp.h | 8 +---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/net/tcp.h b/include/net/tcp.h index
> > 6ec4ea6..119b592 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -1252,9 +1252,11 @@ void tcp_select_initial_window(int __space,
> > __u32 mss, __u32 *rcv_wnd,
> >
> >  static inline int tcp_win_from_space(int space)  {
> > -   return sysctl_tcp_adv_win_scale<=0 ?
> > -   (space>>(-sysctl_tcp_adv_win_scale)) :
> > -   space - (space>>sysctl_tcp_adv_win_scale);
> > +   int tcp_adv_win_scale = sysctl_tcp_adv_win_scale;
> > +
> > +   return tcp_adv_win_scale <= 0 ?
> > +   (space>>(-tcp_adv_win_scale)) :
> > +   space - (space>>tcp_adv_win_scale);
> >  }
> >
> >  /* Note: caller must be prepared to deal with negative returns */
> 
> You need to use READ_ONCE() to be safe. The compiler is free to optimized
the
> code back to the original unless READ_ONCE() is used.

Thanks your reminder.
I could not figure out why compiler would do the optimization, rollback the
global variable instead of local tmp variable.
It breaks the original logic.

BTW, I think the READ_ONCE is used to avoid reorder. 
Could you give some guides please ?

Best Regards
Feng






RE: [PATCH nf v3 1/1] netfilter: snmp: Fix one possible panic when snmp_trap_helper fail to register

2017-03-24 Thread Gao Feng
> -Original Message-
> From: Pablo Neira Ayuso [mailto:pa...@netfilter.org]
> Sent: Saturday, March 25, 2017 3:02 AM
> To: f...@ikuai8.com
> Cc: da...@davemloft.net; jo...@redhat.com;
netfilter-de...@vger.kernel.org;
> netdev@vger.kernel.org; gfree.w...@gmail.com
> Subject: Re: [PATCH nf v3 1/1] netfilter: snmp: Fix one possible panic
when
> snmp_trap_helper fail to register
> 
> On Tue, Mar 21, 2017 at 08:22:29AM +0800, f...@ikuai8.com wrote:
> > From: Gao Feng <f...@ikuai8.com>
> >
> > In the commit 93557f53e1fb ("netfilter: nf_conntrack: nf_conntrack
> > snmp helper"), the snmp_helper is replaced by nf_nat_snmp_hook. So the
> > snmp_helper is never registered. But it still tries to unregister the
> > snmp_helper, it could cause the panic.
> >
> > Now remove the useless snmp_helper and the unregister call in the
> > error handler.
> >
> > Fixes: 93557f53e1fb ("netfilter: nf_conntrack: nf_conntrack snmp
> > helper")
> >
> > Signed-off-by: Gao Feng <f...@ikuai8.com>
> > ---
> >  v3: Remove the angle brackets in description, per Sergei
> >  v2: Add the SHA1 ID in the description, per Sergei
> >  v1: Initial version
> >
> >  net/ipv4/netfilter/nf_nat_snmp_basic.c | 14 +-
> >  1 file changed, 1 insertion(+), 13 deletions(-)
> >
> > diff --git a/net/ipv4/netfilter/nf_nat_snmp_basic.c
> > b/net/ipv4/netfilter/nf_nat_snmp_basic.c
> > index c9b52c3..5787364 100644
> > --- a/net/ipv4/netfilter/nf_nat_snmp_basic.c
> > +++ b/net/ipv4/netfilter/nf_nat_snmp_basic.c
> > @@ -1260,16 +1260,6 @@ static int help(struct sk_buff *skb, unsigned int
> protoff,
> > .timeout= 180,
> >  };
> >
> > -static struct nf_conntrack_helper snmp_helper __read_mostly = {
> > -   .me = THIS_MODULE,
> > -   .help   = help,
> > -   .expect_policy  = _exp_policy,
> > -   .name   = "snmp",
> > -   .tuple.src.l3num= AF_INET,
> > -   .tuple.src.u.udp.port   = cpu_to_be16(SNMP_PORT),
> > -   .tuple.dst.protonum = IPPROTO_UDP,
> > -};
> > -
> >  static struct nf_conntrack_helper snmp_trap_helper __read_mostly = {
> > .me = THIS_MODULE,
> > .help   = help,
> > @@ -1294,10 +1284,8 @@ static int __init nf_nat_snmp_basic_init(void)
> > RCU_INIT_POINTER(nf_nat_snmp_hook, help);
> >
> > ret = nf_conntrack_helper_register(_trap_helper);
> > -   if (ret < 0) {
> > -   nf_conntrack_helper_unregister(_helper);
> > +   if (ret < 0)
> > return ret;
> > -   }
> > return ret;
> 
> You can provide a more simplified version after this is removed, I
> think:
Ok. I thought it should not change the style for bug fix.
I would sent the update patch.

Best Regards
Feng

> 
> @@ -1283,10 +1283,7 @@ static int __init nf_nat_snmp_basic_init(void)
> BUG_ON(nf_nat_snmp_hook != NULL);
> RCU_INIT_POINTER(nf_nat_snmp_hook, help);
> 
> -   ret = nf_conntrack_helper_register(_trap_helper);
> -   if (ret < 0)
> -   return ret;
> -   return ret;
> +   return nf_conntrack_helper_register(_trap_helper);
>  }



RE: [PATCH nf v3 1/1] netfilter: snmp: Fix one possible panic when snmp_trap_helper fail to register

2017-03-24 Thread Gao Feng
Hi Pablo,

> -Original Message-
> From: netfilter-devel-ow...@vger.kernel.org
> [mailto:netfilter-devel-ow...@vger.kernel.org] On Behalf Of Pablo Neira
Ayuso
> Sent: Friday, March 24, 2017 8:23 PM
> To: f...@ikuai8.com
> Cc: da...@davemloft.net; jo...@redhat.com;
netfilter-de...@vger.kernel.org;
> netdev@vger.kernel.org; gfree.w...@gmail.com
> Subject: Re: [PATCH nf v3 1/1] netfilter: snmp: Fix one possible panic
when
> snmp_trap_helper fail to register
> 
> On Fri, Mar 24, 2017 at 01:21:30PM +0100, Pablo Neira Ayuso wrote:
> > On Tue, Mar 21, 2017 at 08:22:29AM +0800, f...@ikuai8.com wrote:
> > > From: Gao Feng <f...@ikuai8.com>
> > >
> > > In the commit 93557f53e1fb ("netfilter: nf_conntrack: nf_conntrack
> > > snmp helper"), the snmp_helper is replaced by nf_nat_snmp_hook. So
> > > the snmp_helper is never registered. But it still tries to
> > > unregister the snmp_helper, it could cause the panic.
> >
> > This patch looks correct to me.
> >
> > But probably for some reason I don't manage to trigger, how do you
> > reproduce this?
> 
> I'm refering to the panic.
[高峰] 
It is got by reviewing the codes.
When nf_conntrack_helper_unregister(_helper), but snmp_helper didn't
register actually.
And snmp_helper. hnode is not initialized, it would trigger the issue when
hlist_del_rcu in nf_conntrack_helper_unregister.

Best Regards
Feng


> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel"
in the
> body of a message to majord...@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html



Re: [PATCH nf 1/1] netfilter: h323,sip: Fix possible dead loop in nat_rtp_rtcp and nf_nat_sdp_media

2017-03-02 Thread Gao Feng
Hi Liping,

On Thu, Mar 2, 2017 at 7:18 PM, Liping Zhang <zlpnob...@gmail.com> wrote:
> Hi,
> 2017-03-02 18:18 GMT+08:00 Gao Feng <f...@ikuai8.com>:
> [...]
>> The expect class is NF_CT_EXPECT_CLASS_DEFAULT, and proto is
>> IPPROTO_UDP at the function "expect_rtp_rtcp",
>> And it makes sure the port is even number.
>>
>> But look at the process_gcf, the port is got from the packet data at
>> function get_h225_addr.
>> So it may be odd number.
>> It also would add one expect node whose class is
>> NF_CT_EXPECT_CLASS_DEFAULT, and proto is IPPROTO_UDP.
>
> The nat_rtp_rtcp() is only invoked by expect_rtp_rtcp, and nf_nat_sdp_media()
> is only invoked by set_expected_rtp_rtcp. So the RTP port is ensured to be
> even, as well as the rtp_exp->tuple.dst.u.udp.port.
>
> Note: the rtp_exp is alloced by nf_ct_expect_alloc, and initialized by
> nf_ct_expect_init, then passed to nat_rtp_rtcp or nf_nat_sdp_media.
>
> So I cannot figure out why process_gcf will affect this? Or I missed 
> something?
>

I was lost in codes and forgot to check the caller of nat_rtp_rtcp.
Thanks your explanations.

There is no any issue in current codes indeed.

Best Regards
Feng




Re: [PATCH nf 1/1] netfilter: h323,sip: Fix possible dead loop in nat_rtp_rtcp and nf_nat_sdp_media

2017-03-02 Thread Gao Feng
Hi Liping,

On Thu, Mar 2, 2017 at 4:50 PM, Liping Zhang <zlpnob...@gmail.com> wrote:
> Hi,
>
> 2017-03-02 15:57 GMT+08:00  <f...@ikuai8.com>:
>> From: Gao Feng <f...@ikuai8.com>
>>
>> When h323 and sip try to insert expect nodes, they would increase
>> the port by 2 for loop, and the loop condition is that "port != 0".
>> So when the start port is odd number, port never increases to zero.
>
> This seems will never happen, since the RTP port has been ensured to
> be even.
>
> For example, at expect_rtp_rtcp():
>...
>/* RTP port is even */
>rtp_port = port & ~htons(1);
>rtcp_port = port | htons(1);
>
> And at set_expected_rtp_rtcp():
>...
>base_port = ntohs(tuple.dst.u.udp.port) & ~1;
>rtp_port = htons(base_port);

Thanks your point, I actually did not attention these codes you mentioned.
But I checked the other codes, I think the dangerous may exist.

The expect class is NF_CT_EXPECT_CLASS_DEFAULT, and proto is
IPPROTO_UDP at the function "expect_rtp_rtcp",
And it makes sure the port is even number.

But look at the process_gcf, the port is got from the packet data at
function get_h225_addr.
So it may be odd number.
It also would add one expect node whose class is
NF_CT_EXPECT_CLASS_DEFAULT, and proto is IPPROTO_UDP.

BTW, the loop in nat_rtp_rtcp and nat_rtp_rtcp depend on one potential
condition which is protected by other functions.
It doesn't seem a good style.

Regards
Feng




Re: [lkp-robot] [net] 0148239373: ltp.test_1_to_1_sockopt.fail

2017-02-14 Thread Gao Feng
Hi Xiaolong,

On Wed, Feb 15, 2017 at 9:46 AM, kernel test robot
 wrote:
>
> FYI, we noticed the following commit:
>
> commit: 0148239373801a9c0c63eefc481b710ae7ce2941 ("net: sock: Use double 
> send/recv buff value to compare with max value")
> url: 
> https://github.com/0day-ci/linux/commits/fgao-ikuai8-com/net-sock-Use-double-send-recv-buff-value-to-compare-with-max-value/20170208-213745
>
>
> in testcase: ltp
> with following parameters:
>
> test: net.sctp
>
> test-description: The LTP testsuite contains a collection of tools for 
> testing the Linux kernel and related features.
> test-url: http://linux-test-project.github.io/
>
>
> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 
> 128G memory
>
> caused below changes (please refer to attached dmesg/kmsg for entire 
> log/backtrace):
>
>
>
> user  :notice: [   54.674772] tag=test_1_to_1_sockopt stime=1487093637
>
> user  :notice: [   54.675078] cmdline="test_1_to_1_sockopt"
>
> user  :notice: [   54.675194] contacts=""
>
> user  :notice: [   54.675333] analysis=exit
>
> user  :notice: [   54.675513] <<>>
>
> user  :notice: [   54.676497] test_1_to_1_sockopt.c1  TPASS  :  
> setsockopt() with a bad socket descriptor - EBADF
>
> user  :notice: [   54.677439] test_1_to_1_sockopt.c2  TPASS  :  
> setsockopt() with an invalid socket - ENOTSOCK
>
> user  :notice: [   54.678409] test_1_to_1_sockopt.c3  TPASS  :  
> setsockopt() with an invalid level - ENOPROTOOPT
>
> user  :notice: [   54.679378] test_1_to_1_sockopt.c4  TPASS  :  
> setsockopt() with invalid option buffer - EFAULT
>
> user  :notice: [   54.680375] test_1_to_1_sockopt.c5  TPASS  :  
> setsockopt() with invalid option name - EOPNOTSUPP
>
> user  :notice: [   54.681354] test_1_to_1_sockopt.c6  TPASS  :  
> getsockopt() with a bad socket descriptor - EBADF
>
> user  :notice: [   54.682297] test_1_to_1_sockopt.c7  TPASS  :  
> getsockopt() with an invalid socket - ENOTSOCK
>
> user  :notice: [   54.683265] test_1_to_1_sockopt.c8  TPASS  :  
> getsockopt() with invalid option buffer - EFAULT
>
> user  :notice: [   54.684267] test_1_to_1_sockopt.c9  TPASS  :  
> getsockopt() with invalid option name - EOPNOTSUPP
>
> user  :notice: [   54.685079] test_1_to_1_sockopt.c   10  TPASS  :  
> getsockopt() SCTP_INITMSG - SUCCESS
>
> user  :notice: [   54.685885] test_1_to_1_sockopt.c   11  TPASS  :  
> setsockopt() SCTP_INITMSG - SUCCESS
>
> user  :notice: [   54.686653] test_1_to_1_sockopt.c   12  TPASS  :  
> setsockopt() SO_LINGER - SUCCESS
>
> user  :notice: [   54.687418] test_1_to_1_sockopt.c   13  TPASS  :  
> getsockopt() SO_LINGER - SUCCESS
>
> user  :notice: [   54.688187] test_1_to_1_sockopt.c   14  TPASS  :  
> getsockopt() SO_RCVBUF - SUCCESS
>
> user  :notice: [   54.688979] test_1_to_1_sockopt.c   15  TPASS  :  
> getsockopt() SCTP_STATUS - SUCCESS
>
> user  :notice: [   54.689752] test_1_to_1_sockopt.c   16  TPASS  :  
> setsockopt() SO_RCVBUF - SUCCESS
>
> user  :notice: [   54.691539] test_1_to_1_sockopt.c   17  TBROK  :  
> test_1_to_1_sockopt.c:341: Comparison failed:Set value and got value differs 
> Set Value=4096 Get Value=8192
>
> user  :notice: [   54.692520] test_1_to_1_sockopt.c   18  TBROK  :  
> test_1_to_1_sockopt.c:341: Remaining cases broken
>
>
>
> To reproduce:
>
> git clone 
> git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
> cd lkp-tests
> bin/lkp install job.yaml  # job file is attached in this email
> bin/lkp run job.yaml
>
>
>
> Thanks,
> Xiaolong

This commit is not accepted, thanks.
After discuss with Eric, I know it is too late to correct it.

Regards
Feng




Re: [PATCH net 1/1] net: sock: Use double send/recv buff value to compare with max value

2017-02-09 Thread Gao Feng
On Thu, Feb 9, 2017 at 12:00 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Wed, 2017-02-08 at 21:07 +0800, f...@ikuai8.com wrote:
>> From: Gao Feng <f...@ikuai8.com>
>>
>> Because the value of SO_SNDBUF and SO_RCVBUF is doubled before
>> assignment, so the real value of send and recv buffer could be more
>> than the max sysctl config sysctl_wmem_max and sysctl_rmem_max.
>>
>> Now use doulbe send/recv buffer value to compare with sysctl_wmem_max
>> and sysctl_rmem_max, and it keeps consistence with SOCK_MIN_SNDBUF
>> and SOCK_MIN_RCVBUF.
>>
>> Signed-off-by: Gao Feng <f...@ikuai8.com>
>> ---
>
> Looks completely bogus, based on your comprehension of this code.

It is a config param, user could config any value.
So why give it one bogus?
If need more, user could config it by himself.

>
> If you need to, fix the doc, not this code.

The current codes mean the buffer could exceed the sysctl max value.
It seems inconsistent.

Regards
Feng

>
> Unless you give more details of course, why we should take your patch.
>
>
>
>




Re: [PATCH net-next 1/1] driver: ipvlan: Define common functions to decrease duplicated codes used to add or del IP address

2016-12-20 Thread Gao Feng
On Wed, Dec 21, 2016 at 2:30 AM, David Miller  wrote:
> From: f...@ikuai8.com
> Date: Mon, 19 Dec 2016 09:24:05 +0800
>
>>  It is sent again because the first email is sent during net-next closing.
>
> It is still closed, and will not open again for at least one week.

Thanks David.
I thought it only last one week.

I would waiting for reopen, and resend again.

Regards
Feng




[PATCH] vsock: lookup and setup guest_cid inside vhost_vsock_lock

2016-12-14 Thread Gao feng
Multi vsocks may setup the same cid at the same time.

Signed-off-by: Gao feng <omarapazan...@gmail.com>
---
 drivers/vhost/vsock.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index e3b30ea..a08332b 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -50,11 +50,10 @@ static u32 vhost_transport_get_local_cid(void)
return VHOST_VSOCK_DEFAULT_HOST_CID;
 }
 
-static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+static struct vhost_vsock *__vhost_vsock_get(u32 guest_cid)
 {
struct vhost_vsock *vsock;
 
-   spin_lock_bh(_vsock_lock);
list_for_each_entry(vsock, _vsock_list, list) {
u32 other_cid = vsock->guest_cid;
 
@@ -63,15 +62,24 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
continue;
 
if (other_cid == guest_cid) {
-   spin_unlock_bh(_vsock_lock);
return vsock;
}
}
-   spin_unlock_bh(_vsock_lock);
 
return NULL;
 }
 
+static struct vhost_vsock *vhost_vsock_get(u32 guest_cid)
+{
+   struct vhost_vsock *vsock;
+
+   spin_lock_bh(_vsock_lock);
+   vsock = __vhost_vsock_get(guest_cid);
+   spin_unlock_bh(_vsock_lock);
+
+   return vsock;
+}
+
 static void
 vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
struct vhost_virtqueue *vq)
@@ -562,11 +570,12 @@ static int vhost_vsock_set_cid(struct vhost_vsock *vsock, 
u64 guest_cid)
return -EINVAL;
 
/* Refuse if CID is already in use */
-   other = vhost_vsock_get(guest_cid);
-   if (other && other != vsock)
-   return -EADDRINUSE;
-
spin_lock_bh(_vsock_lock);
+   other = __vhost_vsock_get(guest_cid);
+   if (other && other != vsock) {
+   spin_unlock_bh(_vsock_lock);
+   return -EADDRINUSE;
+   }
vsock->guest_cid = guest_cid;
spin_unlock_bh(_vsock_lock);
 
-- 
2.5.5



Re: [PATCH net 1/1] driver: ipvlan: Unlink the upper dev when ipvlan_link_new failed

2016-12-07 Thread Gao Feng
On Thu, Dec 8, 2016 at 9:39 AM, Mahesh Bandewar (महेश बंडेवार)
<mahe...@google.com> wrote:
> On Wed, Dec 7, 2016 at 5:21 PM,  <f...@ikuai8.com> wrote:
>> From: Gao Feng <f...@ikuai8.com>
>>
>> When netdev_upper_dev_unlink failed in ipvlan_link_new, need to
>> unlink the ipvlan dev with upper dev.
>>
>> Signed-off-by: Gao Feng <f...@ikuai8.com>
>> ---
>>  drivers/net/ipvlan/ipvlan_main.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ipvlan/ipvlan_main.c 
>> b/drivers/net/ipvlan/ipvlan_main.c
>> index 0fef178..189adbc 100644
>> --- a/drivers/net/ipvlan/ipvlan_main.c
>> +++ b/drivers/net/ipvlan/ipvlan_main.c
>> @@ -546,13 +546,15 @@ static int ipvlan_link_new(struct net *src_net, struct 
>> net_device *dev,
>> }
>> err = ipvlan_set_port_mode(port, mode);
>> if (err) {
>> -   goto unregister_netdev;
>> +   goto dev_unlink;
>> }
>>
>> list_add_tail_rcu(>pnode, >ipvlans);
>> netif_stacked_transfer_operstate(phy_dev, dev);
>> return 0;
>>
>> +dev_unlink:
> probably 'unlink_netdev' label inline with other labels used. thanks

OK, it is better name.
I will follow it and send v2 update.

Regards
Feng

>> +   netdev_upper_dev_unlink(phy_dev, dev);
>>  unregister_netdev:
>> unregister_netdevice(dev);
>>  destroy_ipvlan_port:
>> --
>> 1.9.1
>>
>>




Re: [PATCH net-next v2 1/1] driver: ipvlan: Free ipvl_port directly with kfree instead of kfree_rcu

2016-12-06 Thread Gao Feng
Hi Eric,

On Tue, Dec 6, 2016 at 11:18 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Tue, 2016-12-06 at 21:54 +0800, f...@ikuai8.com wrote:
>> From: Gao Feng <gfree.w...@gmail.com>
>>
>> There is no one which may reference the ipvlan port when free it in
>> ipvlan_port_create and ipvlan_port_destroy. So it is unnecessary to
>> use kfree_rcu.
>
> You did not really explain _why_ it was safe/unnecessary.
> Why should anyone trust you ?

Thanks your point.
I found the reason yesterday after receive your suggestion and reply
the last v1 email.
Then I send the v2 patch.
I assume the reviewer would know more than me, so I didn't add more details.

I will add more details in v3 patch.

>
> The reason an RCU grace period is not needed is that
> netdev_rx_handler_unregister() already enforces a grace period.
>
> My guess is ipvlan copied code in macvlan.
>
> At the time macvlan was written, commit
> 00cfec37484761a44 ("net: add a synchronize_net() in
> netdev_rx_handler_unregister()") was not there yet.
>
> macvlan could be changed the same way.

Yes. After I find the netdev_rx_handler_unregister which enforces one
grace period.
I prepare to check other codes.

Best Regards
Feng

>
>
>




Re: [PATCH net-next 1/1] driver: ipvlan: Free the port memory directly with kfree instead of kfree_rcu

2016-12-05 Thread Gao Feng
Hi Eric,

On Tue, Dec 6, 2016 at 2:53 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Tue, 2016-12-06 at 14:31 +0800, Gao Feng wrote:
>
>> Because I don't fully hold the ipvlan codes now, I am afraid of that
>> there is someone which may get the port address when
>> ipvlan_port_destroy. So the original ipvlan_port_destroy uses the
>> kfree_rcu to avoid it.
>>
>> I am sure there is unnecessary to use kfree in ipvlan_port_create.
>
> And I am pretty sure it is unnecessary to use kfree_rcu() in
> ipvlan_port_destroy() as well.
>
> I highly suggest you spend time on learning why.
>
>
>

Thanks your suggestion.
I will send v2 patch after get the reason by myself.

Begards
Feng




Re: [PATCH net-next 1/1] driver: ipvlan: Free the port memory directly with kfree instead of kfree_rcu

2016-12-05 Thread Gao Feng
Hi Eric,

On Tue, Dec 6, 2016 at 2:25 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Tue, 2016-12-06 at 12:29 +0800, f...@ikuai8.com wrote:
>> From: Gao Feng <f...@ikuai8.com>
>>
>> There is no one which may reference the "port" in ipvlan_port_create
>> when netdev_rx_handler_register failed. So it could free it directly
>> with kfree instead of kfree_rcu.
>>
>> Signed-off-by: Gao Feng <f...@ikuai8.com>
>> ---
>>  drivers/net/ipvlan/ipvlan_main.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ipvlan/ipvlan_main.c 
>> b/drivers/net/ipvlan/ipvlan_main.c
>> index c6aa667..1a601151 100644
>> --- a/drivers/net/ipvlan/ipvlan_main.c
>> +++ b/drivers/net/ipvlan/ipvlan_main.c
>> @@ -128,7 +128,7 @@ static int ipvlan_port_create(struct net_device *dev)
>>   return 0;
>>
>>  err:
>> - kfree_rcu(port, rcu);
>> + kfree(port);
>>   return err;
>>  }
>>
>
> This looks a partial patch.
>
> If you really care, why don't you also replace the kfree_rcu() in
> ipvlan_port_destroy() ?

Because I don't fully hold the ipvlan codes now, I am afraid of that
there is someone which may get the port address when
ipvlan_port_destroy. So the original ipvlan_port_destroy uses the
kfree_rcu to avoid it.

I am sure there is unnecessary to use kfree in ipvlan_port_create.

Regards
Feng

>
>
>
> diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
> index 
> 05a62d2216c54651f6158c35d446d2e395b38dc3..031093e1c25f55244e6bdfde4ebeb65c0f2f10c1
>  100644
> --- a/drivers/net/ipvlan/ipvlan.h
> +++ b/drivers/net/ipvlan/ipvlan.h
> @@ -97,7 +97,6 @@ struct ipvl_port {
> struct work_struct  wq;
> struct sk_buff_head backlog;
> int count;
> -   struct rcu_head rcu;
>  };
>
>  static inline struct ipvl_port *ipvlan_port_get_rcu(const struct net_device 
> *d)
> diff --git a/drivers/net/ipvlan/ipvlan_main.c 
> b/drivers/net/ipvlan/ipvlan_main.c
> index 
> 5430460167b5e8945d29a3febdd324461bf5af5c..ffe8994e64fc1791ef07d80ad2340bc82d541bba
>  100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -128,7 +128,7 @@ static int ipvlan_port_create(struct net_device *dev)
> return 0;
>
>  err:
> -   kfree_rcu(port, rcu);
> +   kfree(port);
> return err;
>  }
>
> @@ -145,7 +145,7 @@ static void ipvlan_port_destroy(struct net_device *dev)
> netdev_rx_handler_unregister(dev);
> cancel_work_sync(>wq);
> __skb_queue_purge(>backlog);
> -   kfree_rcu(port, rcu);
> +   kfree(port);
>  }
>
>  #define IPVLAN_FEATURES \
>
>
>




Re: [PATCH net-next v2 1/1] driver: ipvlan: Use NF_IP_PRI_LAST as hook priority instead of INT_MAX

2016-11-28 Thread Gao Feng
Hi Mahesh,

On Tue, Nov 29, 2016 at 3:26 AM, Mahesh Bandewar (महेश बंडेवार)
<mahe...@google.com> wrote:
> On Sun, Nov 27, 2016 at 3:18 AM,  <f...@ikuai8.com> wrote:
>> From: Gao Feng <f...@ikuai8.com>
>>
>> It is better to use NF_IP_PRI_LAST instead of INT_MAX as hook priority.
>> The former is good at readability and easier to maintain.
>>
> This IPvlan hook has to be "absolute" last hook and at this moment
> NF_IP_PRI_LAST is set as INT_MAX so it's not altering anything.

Yes. It is same now.
So I prefer to use NF_IP_PRI_LAST than INT_MAX.
Because the nf_hook_ops belongs to the netfilter module. i think the
ipvlan codes should follow its rule.
Since netfilter has defined some specific priority enum value, why
don't we follow it?

>
> If for whatever reasons the value of NF_IP_PRI_LAST changes, there
> could be random IPvlan failure. Since that possibility cannot be
> denied and there are several places INT_MAX is still used as hook
> priority, I don't see any gain in having this patch in fact there
> could be future (possible) downside.

If the netfilter module changed the value of NF_IP_PRI_LAST, it may
decrease it and add one check for the hook priority.
As a result, the ipvlan may fail to register because of invalid priority.

When use INT_MAX not NF_IP_PRI_LAST, there is one assumption that the
hook priority is never changed.
I think it is not good as two different modules.

Regards
Feng

>
>> Signed-off-by: Gao Feng <f...@ikuai8.com>
>> ---
>>  v2: Add the lost header file. It is added in local but not in v1 patch
>>  v1: Inital patch
>>
>>  drivers/net/ipvlan/ipvlan_main.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ipvlan/ipvlan_main.c 
>> b/drivers/net/ipvlan/ipvlan_main.c
>> index ab90b22..01c7446 100644
>> --- a/drivers/net/ipvlan/ipvlan_main.c
>> +++ b/drivers/net/ipvlan/ipvlan_main.c
>> @@ -7,6 +7,7 @@
>>   *
>>   */
>>
>> +#include "linux/netfilter_ipv4.h"
>>  #include "ipvlan.h"
>>
>>  static u32 ipvl_nf_hook_refcnt = 0;
>> @@ -16,13 +17,13 @@
>> .hook = ipvlan_nf_input,
>> .pf   = NFPROTO_IPV4,
>> .hooknum  = NF_INET_LOCAL_IN,
>> -   .priority = INT_MAX,
>> +   .priority = NF_IP_PRI_LAST,
>> },
>> {
>> .hook = ipvlan_nf_input,
>> .pf   = NFPROTO_IPV6,
>> .hooknum  = NF_INET_LOCAL_IN,
>> -   .priority = INT_MAX,
>> +   .priority = NF_IP_PRI_LAST,
>> },
>>  };
>>
>> --
>> 1.9.1
>>
>>




Re: [PATCH net-next 1/1] driver: ipvlan: Use NF_IP_PRI_LAST as hook priority instead of INT_MAX

2016-11-27 Thread Gao Feng
On Sun, Nov 27, 2016 at 7:17 PM, kbuild test robot  wrote:
> Hi Gao,
>
> [auto build test ERROR on net-next/master]
>
> url:
> https://github.com/0day-ci/linux/commits/fgao-ikuai8-com/driver-ipvlan-Use-NF_IP_PRI_LAST-as-hook-priority-instead-of-INT_MAX/20161127-182724
> config: x86_64-allmodconfig (attached as .config)
> compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
>
> All errors (new ones prefixed by >>):
>
>>> drivers/net/ipvlan/ipvlan_main.c:19:15: error: 'NF_IP_PRI_LAST' undeclared 
>>> here (not in a function)
>   .priority = NF_IP_PRI_LAST,
>   ^~
>
> vim +/NF_IP_PRI_LAST +19 drivers/net/ipvlan/ipvlan_main.c
>
> 13
> 14  static struct nf_hook_ops ipvl_nfops[] __read_mostly = {
> 15  {
> 16  .hook = ipvlan_nf_input,
> 17  .pf   = NFPROTO_IPV4,
> 18  .hooknum  = NF_INET_LOCAL_IN,
>   > 19  .priority = NF_IP_PRI_LAST,
> 20  },
> 21  {
> 22  .hook = ipvlan_nf_input,
>
> ---
> 0-DAY kernel test infrastructureOpen Source Technology Center
> https://lists.01.org/pipermail/kbuild-all   Intel Corporation

Sorry, it broke the build.
I add the header file, but forget to commit, so that this patch does
not contain header file

Now I have sent the v2 patch.

Regards
Feng




Re: [PATCH net 1/1] net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit

2016-11-21 Thread Gao Feng
Hi Cong,

On Tue, Nov 22, 2016 at 1:29 PM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
> On Sun, Nov 20, 2016 at 4:56 PM,  <f...@ikuai8.com> wrote:
>> From: Gao Feng <gfree.w...@gmail.com>
>>
>> The tc could return NET_XMIT_CN as one congestion notification, but
>> it does not mean the packe is lost. Other modules like ipvlan,
>> macvlan, and others treat NET_XMIT_CN as success too.
>> So l2tp_eth_dev_xmit should add the NET_XMIT_CN check.
>>
>> Signed-off-by: Gao Feng <gfree.w...@gmail.com>
>> ---
>>  net/l2tp/l2tp_eth.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
>> index 965f7e3..3dc97b4 100644
>> --- a/net/l2tp/l2tp_eth.c
>> +++ b/net/l2tp/l2tp_eth.c
>> @@ -97,7 +97,7 @@ static int l2tp_eth_dev_xmit(struct sk_buff *skb, struct 
>> net_device *dev)
>> unsigned int len = skb->len;
>> int ret = l2tp_xmit_skb(session, skb, session->hdr_len);
>>
>> -   if (likely(ret == NET_XMIT_SUCCESS)) {
>> +   if (likely(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)) {
>
> How could l2tp_xmit_skb() possibly return NET_XMIT_CN?
> Note It ignores the return value of l2tp_xmit_core().

You are right. I didn't think about it.

Regards
Feng




Re: [PATCH net v2 1/1] net: batman-adv: Treat NET_XMIT_CN as transmit successfully

2016-11-21 Thread Gao Feng
Hi Florian,


On Mon, Nov 21, 2016 at 9:24 PM, Gao Feng <f...@ikuai8.com> wrote:
> Hi Florian,
>
> On Mon, Nov 21, 2016 at 7:09 PM, Florian Westphal <f...@strlen.de> wrote:
>> f...@ikuai8.com <f...@ikuai8.com> wrote:
>>> From: Gao Feng <f...@ikuai8.com>
>>>
>>> The tc could return NET_XMIT_CN as one congestion notification, but
>>> it does not mean the packet is lost. Other modules like ipvlan,
>>> macvlan, and others treat NET_XMIT_CN as success too.
>>>
>>> So batman-adv should add the NET_XMIT_CN check.
>>
>> "The tc could return NET_XMIT_CN as one congestion notification, but
>> it means another packet got dropped. Other modules like batman do not
>> treat NET_XMIT_CN as success, so modules like ipvlan, macvlan, ..
>> should ignore it as well."

I didn't get you in the last email

You mean the NET_XMIT_CN should be treated as one error?
But the comment of NET_XMIT_CN says "It indicates that the device will
soon be dropping packets, or already drops some packets of the same
priority". It is not sure another packet is dropped.

BTW, the macro "net_xmit_eval" does not treat NET_XMIT_CN as one error.

Regards
Feng

>>
>> What I am asking is:
>> Are you sure adding NET_XMIT_CN handling everywhere is the right way to
>> resolve the inconsistency?
>
> Or create one new macro to handle this case like net_xmit_eval.
> For example,
> #define net_xmit_ok(ret)(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)
>
> Is it ok? But it should be done for net-next.
>
> Best Regards
> Feng




Re: [PATCH net v2 1/1] net: batman-adv: Treat NET_XMIT_CN as transmit successfully

2016-11-21 Thread Gao Feng
Hi Florian,

On Mon, Nov 21, 2016 at 7:09 PM, Florian Westphal <f...@strlen.de> wrote:
> f...@ikuai8.com <f...@ikuai8.com> wrote:
>> From: Gao Feng <f...@ikuai8.com>
>>
>> The tc could return NET_XMIT_CN as one congestion notification, but
>> it does not mean the packet is lost. Other modules like ipvlan,
>> macvlan, and others treat NET_XMIT_CN as success too.
>>
>> So batman-adv should add the NET_XMIT_CN check.
>
> "The tc could return NET_XMIT_CN as one congestion notification, but
> it means another packet got dropped. Other modules like batman do not
> treat NET_XMIT_CN as success, so modules like ipvlan, macvlan, ..
> should ignore it as well."
>
> What I am asking is:
> Are you sure adding NET_XMIT_CN handling everywhere is the right way to
> resolve the inconsistency?

Or create one new macro to handle this case like net_xmit_eval.
For example,
#define net_xmit_ok(ret)(ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN)

Is it ok? But it should be done for net-next.

Best Regards
Feng




Re: [PATCH net 1/1] driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed.

2016-11-07 Thread Gao Feng
Hi David,

On Tue, Nov 8, 2016 at 9:33 AM, David Miller <da...@davemloft.net> wrote:
> From: f...@ikuai8.com
> Date: Fri,  4 Nov 2016 10:28:49 +0800
>
>> From: Gao Feng <f...@ikuai8.com>
>>
>> When there is no existing macvlan port in lowdev, one new macvlan port
>> would be created. But it doesn't be destoried when something failed later.
>> It casues some memleak.
>>
>> Now add one flag to indicate if new macvlan port is created.
>>
>> Signed-off-by: Gao Feng <f...@ikuai8.com>
>
> You need to be more patient, it sometimes take several days before
> your get patch reviewed or applied.  Sometimes nobody reviews a change
> for some time because it is obscure or everyone is busy.

Sorry, it is my fault.
I thought this patch may be lost because there are too many emails every day.
So ping just as a reminder.

>
> All patches are tracked in patchwork, so it is never an issue of a
> change getting "lost".  Therefore, it never makes sense to ping the
> list again and ask if a change is "ok".
>
> Personally, when people ping like that, it makes me want to review
> that patch _less_ not more.  So please do not do it.
>
> Thank you.
>

Now I get it. I violated some rules as a fresh man in kernel community.
Sorry again.


Best Regards
Feng




Re: [PATCH net-next v2 1/1] driver: veth: Refine the statistics codes of veth driver

2016-11-03 Thread Gao Feng
Hi Eric,

On Thu, Nov 3, 2016 at 11:07 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Thu, 2016-11-03 at 22:38 +0800, Gao Feng wrote:
>
>> Because other net devices put the statistics together.
>> Take tun/tap as example, it is a virtual device, but its all
>> statistics are percpu including dropped.
>
> Take a look at 2681128f0ced8aa4e66f221197e183cc16d244fe
> ("veth: reduce stat overhead")
>
> Feel free to fix tun/tap, not bloat veth and undo my work,
> without knowing why it was done this way.
>
> Thanks.
>
>

Thanks your detail explanations.

Best Regards
Feng




Re: [PATCH net-next v2 1/1] driver: veth: Refine the statistics codes of veth driver

2016-11-03 Thread Gao Feng
On Thu, Nov 3, 2016 at 10:31 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Thu, 2016-11-03 at 21:39 +0800, Gao Feng wrote:
>> Hi Eric,
>>
>> On Thu, Nov 3, 2016 at 9:30 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>> > On Thu, 2016-11-03 at 21:03 +0800, f...@ikuai8.com wrote:
>> >> From: Gao Feng <f...@ikuai8.com>
>> >>
>> >> The dropped count of veth is located in struct veth_priv, but other
>> >> statistics like packets and bytes are in another struct pcpu_vstats.
>> >> Now keep these three counters in the same struct.
>> >>
>> >> Signed-off-by: Gao Feng <f...@ikuai8.com>
>> >> ---
>> >>  v2: Use right "peer" instead of "dev";
>> >>  v1: Initial version
>> >
>> > May I ask : Why ?
>>
>> Just because I think statistics should be in the same struct.
>
> That is not a good reason then.

Because other net devices put the statistics together.
Take tun/tap as example, it is a virtual device, but its all
statistics are percpu including dropped.

Regards
Feng

>
>>
>> >
>> > We did that because there was no point making per-cpu requirements
>> > bigger, for a counter that is hardly ever updated.
>> >
>> > Do you have a real case where performance dropping packets in a driver
>> > is needed ?
>>
>> No, I haven't met the performance issue now.
>
> OK then kill this patch.
>
>>
>> >
>> > At some point we will have to stop dumb percpu explosion, when we have
>> > 128+ cores per host. Folding all these percpu counters is taking a lot
>> > of time too.
>> >
>> >
>> >
>> Ok, I get it. It is designed specially to put the dropped counter as
>> atomic counter, not percpu.
>> But I have one question that when put the counters as percpu, and when not?
>
> Because the regular fast path needs to be fast ?
>
> Try to _use_ veth without these percpu stats and be prepared to be
> shocked.
>
>
>




Re: [PATCH net-next v2 1/1] driver: veth: Refine the statistics codes of veth driver

2016-11-03 Thread Gao Feng
Hi Eric,

On Thu, Nov 3, 2016 at 9:30 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Thu, 2016-11-03 at 21:03 +0800, f...@ikuai8.com wrote:
>> From: Gao Feng <f...@ikuai8.com>
>>
>> The dropped count of veth is located in struct veth_priv, but other
>> statistics like packets and bytes are in another struct pcpu_vstats.
>> Now keep these three counters in the same struct.
>>
>> Signed-off-by: Gao Feng <f...@ikuai8.com>
>> ---
>>  v2: Use right "peer" instead of "dev";
>>  v1: Initial version
>
> May I ask : Why ?

Just because I think statistics should be in the same struct.

>
> We did that because there was no point making per-cpu requirements
> bigger, for a counter that is hardly ever updated.
>
> Do you have a real case where performance dropping packets in a driver
> is needed ?

No, I haven't met the performance issue now.

>
> At some point we will have to stop dumb percpu explosion, when we have
> 128+ cores per host. Folding all these percpu counters is taking a lot
> of time too.
>
>
>
Ok, I get it. It is designed specially to put the dropped counter as
atomic counter, not percpu.
But I have one question that when put the counters as percpu, and when not?

Regards
Feng




Re: [PATCH net 1/1] driver: veth: Return the actual value instead return NETDEV_TX_OK always

2016-11-02 Thread Gao Feng
Hi Florian,

On Thu, Nov 3, 2016 at 8:58 AM, Florian Fainelli <f.faine...@gmail.com> wrote:
> On 11/02/2016 05:52 PM, Gao Feng wrote:
>> Hi Cong,
>>
>> On Thu, Nov 3, 2016 at 4:22 AM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
>>> On Wed, Nov 2, 2016 at 2:59 AM,  <f...@ikuai8.com> wrote:
>>>> From: Gao Feng <f...@ikuai8.com>
>>>>
>>>> Current veth_xmit always returns NETDEV_TX_OK whatever if it is really
>>>> sent successfully. Now return the actual value instead of NETDEV_TX_OK
>>>> always.
>>>>
>>>> Signed-off-by: Gao Feng <f...@ikuai8.com>
>>>> ---
>>>>  drivers/net/veth.c | 7 +--
>>>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>>>> index fbc853e..769a3bd 100644
>>>> --- a/drivers/net/veth.c
>>>> +++ b/drivers/net/veth.c
>>>> @@ -111,15 +111,18 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, 
>>>> struct net_device *dev)
>>>> struct veth_priv *priv = netdev_priv(dev);
>>>> struct net_device *rcv;
>>>> int length = skb->len;
>>>> +   int ret = NETDEV_TX_OK;
>>>>
>>>> rcu_read_lock();
>>>> rcv = rcu_dereference(priv->peer);
>>>> if (unlikely(!rcv)) {
>>>> kfree_skb(skb);
>>>> +   ret = NET_RX_DROP;
>>>
>>>
>>> Returning NET_RX_DROP doesn't look correct in a xmit function.
>>
>> Yes. But I don't find good macro.
>> NETDEV_TX_BUSY or NET_RX_DROP, which is better ?
>
> There is no much choice you need to return a correct value from the
> netdev_tx_t enum, which NET_RX_DROP is not part of, so that probably
> means using NETDEV_TX_OK here, the packet has been freed, and there is
> no flow control problem mandating the return of NETDEV_TX_BUSY it seems...
> --
> Florian

Thanks your explanation.
It means the veth_xmit must return NETDEV_TX_OK.

Regards
Feng




Re: [PATCH net 1/1] driver: veth: Return the actual value instead return NETDEV_TX_OK always

2016-11-02 Thread Gao Feng
Hi Cong,

On Thu, Nov 3, 2016 at 4:22 AM, Cong Wang <xiyou.wangc...@gmail.com> wrote:
> On Wed, Nov 2, 2016 at 2:59 AM,  <f...@ikuai8.com> wrote:
>> From: Gao Feng <f...@ikuai8.com>
>>
>> Current veth_xmit always returns NETDEV_TX_OK whatever if it is really
>> sent successfully. Now return the actual value instead of NETDEV_TX_OK
>> always.
>>
>> Signed-off-by: Gao Feng <f...@ikuai8.com>
>> ---
>>  drivers/net/veth.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>> index fbc853e..769a3bd 100644
>> --- a/drivers/net/veth.c
>> +++ b/drivers/net/veth.c
>> @@ -111,15 +111,18 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, 
>> struct net_device *dev)
>> struct veth_priv *priv = netdev_priv(dev);
>> struct net_device *rcv;
>> int length = skb->len;
>> +   int ret = NETDEV_TX_OK;
>>
>> rcu_read_lock();
>> rcv = rcu_dereference(priv->peer);
>> if (unlikely(!rcv)) {
>> kfree_skb(skb);
>> +   ret = NET_RX_DROP;
>
>
> Returning NET_RX_DROP doesn't look correct in a xmit function.

Yes. But I don't find good macro.
NETDEV_TX_BUSY or NET_RX_DROP, which is better ?

Thanks
Feng

>
>
>> goto drop;
>> }
>>
>> -   if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
>> +   ret = dev_forward_skb(rcv, skb);
>> +   if (likely(ret == NET_RX_SUCCESS)) {
>> struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
>>
>> u64_stats_update_begin(>syncp);
>> @@ -131,7 +134,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct 
>> net_device *dev)
>> atomic64_inc(>dropped);
>> }
>> rcu_read_unlock();
>> -   return NETDEV_TX_OK;
>> +   return ret;
>>  }
>>
>>  /*
>> --
>> 1.9.1
>>
>>




Re: [PATCH] net: avoid uninitialized variable

2016-10-26 Thread Gao Feng
On Thu, Oct 27, 2016 at 11:56 AM, zhongjiang  wrote:
> From: zhong jiang 
>
> when I compiler the newest kernel, I hit the following error with
> Werror=may-uninitalized.
>
> net/core/flow_dissector.c: In function ?._skb_flow_dissect?
> include/uapi/linux/swab.h:100:46: error: ?.lan?.may be used uninitialized in 
> this function [-Werror=maybe-uninitialized]
> net/core/flow_dissector.c:248:26: note: ?.lan?.was declared here
>
> This adds an additional check for proto to explicitly tell the compiler
> that vlan pointer have the correct value before it is used.
>
> Signed-off-by: zhong jiang 
> ---
>  net/core/flow_dissector.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 1a7b80f..a04d9cf 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -245,7 +245,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> }
> case htons(ETH_P_8021AD):
> case htons(ETH_P_8021Q): {
> -   const struct vlan_hdr *vlan;
> +   const struct vlan_hdr *vlan = NULL;
>
> if (skb_vlan_tag_present(skb))
> proto = skb->protocol;
> @@ -276,7 +276,8 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> key_vlan->vlan_id = skb_vlan_tag_get_id(skb);
> key_vlan->vlan_priority =
> (skb_vlan_tag_get_prio(skb) >> 
> VLAN_PRIO_SHIFT);
> -   } else {
> +   } else if (proto == cpu_to_be16(ETH_P_8021Q) ||
> +   proto == 
> cpu_to_be16(ETH_P_8021AD)) {
> key_vlan->vlan_id = ntohs(vlan->h_vlan_TCI) &
> VLAN_VID_MASK;
> key_vlan->vlan_priority =
> --
> 1.8.3.1
>

It seems there is one similar patch already.
You could refer to https://patchwork.kernel.org/patch/9389565/

Regards
Feng




Re: [PATCH net-next 1/1] driver: tun: Use new macro SOCK_IOC_MAGIC instead of literal number 0x89

2016-10-26 Thread Gao Feng
Hi David,

On Wed, Oct 26, 2016 at 7:12 PM, David Laight <david.lai...@aculab.com> wrote:
> From: f...@ikuai8.com
>> Sent: 25 October 2016 13:56
>> The current codes use _IOC_TYPE(cmd) == 0x89 to check if the cmd is one
>> socket ioctl command like SIOCGIFHWADDR. But the literal number 0x89 may
>> confuse readers. So create one macro SOCK_IOC_MAGIC like SPI_IOC_MAGIC to
>> enhance the readability.
>>
>> Signed-off-by: Gao Feng <f...@ikuai8.com>
>> ---
>>  drivers/net/tun.c| 2 +-
>>  include/uapi/linux/sockios.h | 2 ++
>>  2 files changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 9328568..9efd185 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1985,7 +1985,7 @@ static long __tun_chr_ioctl(struct file *file, 
>> unsigned int cmd,
>>   int le;
>>   int ret;
>>
>> - if (cmd == TUNSETIFF || cmd == TUNSETQUEUE || _IOC_TYPE(cmd) == 0x89) {
>> + if (cmd == TUNSETIFF || cmd == TUNSETQUEUE || _IOC_TYPE(cmd) == 
>> SOCK_IOC_MAGIC) {
>>   if (copy_from_user(, argp, ifreq_len))
>>   return -EFAULT;
>>   } else {
>> diff --git a/include/uapi/linux/sockios.h b/include/uapi/linux/sockios.h
>> index 8e7890b..b8f42f2 100644
>> --- a/include/uapi/linux/sockios.h
>> +++ b/include/uapi/linux/sockios.h
>> @@ -24,6 +24,8 @@
>>  #define SIOCINQ  FIONREAD
>>  #define SIOCOUTQ TIOCOUTQ/* output queue size (not sent + not 
>> acked) */
>>
>> +#define SOCK_IOC_MAGIC   0x89
>> +
>>  /* Routing table calls. */
>>  #define SIOCADDRT0x890B  /* add routing table entry  */
>>  #define SIOCDELRT0x890C  /* delete routing table entry   */
>
> Shouldn't these constants be defined in terms of SOCK_IOC_MAGIC?
> And there must be a better name!
>
> David
>

How about the SOCK_IOC_TYPE or SOCK_IOC_BASE ?
There are ETRAXGPIO_IOCTYPE and HARDWALL_IOCTL_BASE already.

Regards
Feng




Re: [PATCH v2 net-next 1/1] driver: tun: Forbid to set IFF_TUN and IFF_TAP at the same time

2016-10-21 Thread Gao Feng
Hi Eric,

On Fri, Oct 21, 2016 at 7:28 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Fri, 2016-10-21 at 19:02 +0800, f...@ikuai8.com wrote:
>> From: Gao Feng <f...@ikuai8.com>
>>
>> Current tun driver permits the ifr_flags is set with IFF_TUN and
>> IFF_TAP at the same time. But actually there is only IFF_TUN flag
>> works. And it does not make sense these two flags are set, so add
>> this check.
>>
>> Signed-off-by: Gao Feng <f...@ikuai8.com>
>> ---
>>  v2: Remove useless {}
>>  v1: Initial patch
>>
>>  drivers/net/tun.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 8093e39..faaa189 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1752,6 +1752,9 @@ static int tun_set_iff(struct net *net, struct file 
>> *file, struct ifreq *ifr)
>>   if (err < 0)
>>   return err;
>>
>> + if ((ifr->ifr_flags & (IFF_TUN | IFF_TAP)) == (IFF_TUN | 
>> IFF_TAP))
>> + return -EINVAL;
>> +
>>   /* Set dev type */
>>   if (ifr->ifr_flags & IFF_TUN) {
>>   /* TUN device */
>
>
> This might break some applications.

Yes. I consider about this case.
But I think there should be very least applications which set these
two flags at the same time.

>
> It might be too late to add this check without a grace period.
>
>
>
Yes, It needs some discussions.

Regards
Feng




Re: [PATCH net-next 1/1] net: vlan: Use sizeof instead of literal number

2016-10-17 Thread Gao Feng
Hi David,

On Tue, Oct 18, 2016 at 9:36 AM, David Miller  wrote:
>
> It never makes sense to send the same patch for both net and net-next.
>
> If it's a bug fix, it goes to 'net'.  And it will be eventually
> be naturally merged into 'net-next'.
>
> Otherwise, if it's a new feature, cleanup, or optimization it goes to
> 'net-next'.

Because I forget add the "net-next" in the title of first patch, so I
send the second patch with right title.
And I have replied the first patch and said the reason.

Regards
Feng




Re: [PATCH nf-next] netfilter: xt_osf: Use explicit member assignment to avoid implicit no padding rule

2016-09-27 Thread Gao Feng
Hi Liping,

On Tue, Sep 27, 2016 at 2:05 PM, Liping Zhang <zlpnob...@gmail.com> wrote:
> Hi Feng,
>
> 2016-09-27 14:00 GMT+08:00 Gao Feng <f...@ikuai8.com>:
>> Hi Liping,
>>
>>>
>>> This xt_osf_user_finger{} is carefully designed, no padding now, and
>>> will not be changed in the future, otherwise backward compatibility will
>>> be broken.
>>
>> Yes, there is no padding now. So it is ok to use memcmp now.
>> I am afraid the struct would be modified for other requirements.
>
> This is structure was passed by netlink attribute, so modify it will
> break backward compatibility.

Reasonable.
Thanks Liping.

Regards
Feng

>
>>
>> If it is never changed forever, it is ok certainly.
>>
>>>
>>> I don't think this convert is necessary, actually it is a little ugly, and 
>>> will
>>> increase the maintenance burden.
>>
>> I just want the codes don't depend any implicit rule.
>>
>> It is a tradeoff. If never change, needn't do any convert.
>> If may change, the memcmp is a little dangerous.
>>
>> Regards
>> Feng




Re: [PATCH nf-next] netfilter: xt_osf: Use explicit member assignment to avoid implicit no padding rule

2016-09-27 Thread Gao Feng
Hi Liping,


On Tue, Sep 27, 2016 at 1:49 PM, Liping Zhang <zlpnob...@gmail.com> wrote:
> Hi Feng,
>
> 2016-09-27 12:39 GMT+08:00  <f...@ikuai8.com>:
>> From: Gao Feng <f...@ikuai8.com>
>>
>> Current xt_osf codes use memcmp to check if two user fingers are same,
>> so it depends on that the struct xt_osf_user_finger is no padding.
>> It is one implicit rule, and is not good to maintain.
>>
>> Now use zero memory and assign the members explicitly.
>>
>> Signed-off-by: Gao Feng <f...@ikuai8.com>
>> ---
>>  net/netfilter/xt_osf.c | 32 ++--
>>  1 file changed, 30 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/netfilter/xt_osf.c b/net/netfilter/xt_osf.c
>> index 2455b69..9793670 100644
>> --- a/net/netfilter/xt_osf.c
>> +++ b/net/netfilter/xt_osf.c
>> @@ -61,6 +61,34 @@ static const struct nla_policy xt_osf_policy[OSF_ATTR_MAX 
>> + 1] = {
>> [OSF_ATTR_FINGER]   = { .len = sizeof(struct xt_osf_user_finger) 
>> },
>>  };
>>
>> +static void copy_user_finger(struct xt_osf_user_finger *dst,
>> +const struct xt_osf_user_finger *src)
>> +{
>> +#define OSF_COPY_MEMBER(mem)   dst->mem = src->mem
>> +
>> +   int i;
>> +
>> +   OSF_COPY_MEMBER(wss.wc);
>> +   OSF_COPY_MEMBER(wss.val);
>> +
>> +   OSF_COPY_MEMBER(ttl);
>> +   OSF_COPY_MEMBER(df);
>> +   OSF_COPY_MEMBER(ss);
>> +   OSF_COPY_MEMBER(mss);
>> +   OSF_COPY_MEMBER(opt_num);
>> +
>> +   memcpy(dst->genre, src->genre, sizeof(dst->genre));
>> +   memcpy(dst->version, src->version, sizeof(dst->version));
>> +   memcpy(dst->subtype, src->subtype, sizeof(dst->subtype));
>> +
>> +   for (i = 0; i < MAX_IPOPTLEN; ++i) {
>> +   OSF_COPY_MEMBER(opt[i].kind);
>> +   OSF_COPY_MEMBER(opt[i].length);
>> +   OSF_COPY_MEMBER(opt[i].wc.wc);
>> +   OSF_COPY_MEMBER(opt[i].wc.val);
>> +   }
>> +}
>> +
>
> This xt_osf_user_finger{} is carefully designed, no padding now, and
> will not be changed in the future, otherwise backward compatibility will
> be broken.

Yes, there is no padding now. So it is ok to use memcmp now.
I am afraid the struct would be modified for other requirements.

If it is never changed forever, it is ok certainly.

>
> I don't think this convert is necessary, actually it is a little ugly, and 
> will
> increase the maintenance burden.

I just want the codes don't depend any implicit rule.

It is a tradeoff. If never change, needn't do any convert.
If may change, the memcmp is a little dangerous.

Regards
Feng




Re: [PATCH v5 nf] netfilter: seqadj: Drop the packet directly when fail to add seqadj extension to avoid dereference NULL pointer later

2016-09-06 Thread Gao Feng
inline

On Tue, Sep 6, 2016 at 10:51 PM, Florian Westphal <f...@strlen.de> wrote:
> f...@ikuai8.com <f...@ikuai8.com> wrote:
>> From: Gao Feng <f...@ikuai8.com>
>>
>> When memory is exhausted, nfct_seqadj_ext_add may fail to add the seqadj
>> extension. But the function nf_ct_seqadj_init doesn't check if get valid
>> seqadj pointer by the nfct_seqadj.
>>
>> Now drop the packet directly when fail to add seqadj extension to avoid
>> dereference NULL pointer in nf_ct_seqadj_init.
>>
>> Signed-off-by: Gao Feng <f...@ikuai8.com>
>> ---
>>  v5: Return NF_ACCEPT instead of NF_DROP when nfct_seqadj_ext_add failed in 
>> nf_nat_setup_info
>>  v4: Drop the packet directly when fail to add seqadj extension;
>>  v3: Remove the warning log when seqadj is null;
>>  v2: Remove the unnessary seqadj check in nf_ct_seq_adjust
>>  v1: Initial patch
>>
>>  net/netfilter/nf_conntrack_core.c | 6 +-
>>  net/netfilter/nf_nat_core.c   | 3 ++-
>>  2 files changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/netfilter/nf_conntrack_core.c 
>> b/net/netfilter/nf_conntrack_core.c
>> index dd2c43a..dfa76ce 100644
>> --- a/net/netfilter/nf_conntrack_core.c
>> +++ b/net/netfilter/nf_conntrack_core.c
>> @@ -1036,7 +1036,11 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
>>   return (struct nf_conntrack_tuple_hash *)ct;
>>
>>   if (tmpl && nfct_synproxy(tmpl)) {
>> - nfct_seqadj_ext_add(ct);
>> + if (!nfct_seqadj_ext_add(ct)) {
>> + nf_conntrack_free(ct);
>> + pr_debug("Can't add seqadj extension\n");
>> + return NULL;
>> + }
>
> if (!nfct_add_synrpxy(ct, tmpl)) {
> nf_conntrack_free(ct);
> return NULL;
> }
>
> static bool nf_ct_add_synproxy(struct nf_conn *ct, const struct nf_conn *tmpl)
> {
> if (tmpl && nfct_synproxy(tmpl)) {
> if (!nfct_seqadj_ext_add(ct))
> return false;
>
> if (!nfct_synproxy_ext_add(ct))
> return false;
> }
>
> return true;
> }
>
>> diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
>> index de31818..f8b916a 100644
>> --- a/net/netfilter/nf_nat_core.c
>> +++ b/net/netfilter/nf_nat_core.c
>> @@ -441,7 +441,8 @@ nf_nat_setup_info(struct nf_conn *ct,
>>   ct->status |= IPS_DST_NAT;
>>
>>   if (nfct_help(ct))
>> - nfct_seqadj_ext_add(ct);
>> + if (!nfct_seqadj_ext_add(ct))
>> + return NF_ACCEPT;
>>   }
>
> Hmm, why accept?
>
> We are asked to add extension to rewrite sequence numbers, but
> we cannot.  How can the connection work if we cannot munge/track
> seqno rewrites?

I thought we should drop the packet in that case before.
But Pablo point out one case that the ctnetlink also could add the
seqadj extension too.
There is one synchronization case.

I think he is right. Then modify the patch according to his advice.

Best Regards
Feng




Re: [PATCH v4 nf] netfilter: seqadj: Drop the packet directly when fail to add seqadj extension to avoid dereference NULL pointer later

2016-09-06 Thread Gao Feng
inline

On Tue, Sep 6, 2016 at 6:17 PM, Pablo Neira Ayuso <pa...@netfilter.org> wrote:
> On Tue, Sep 06, 2016 at 09:57:23AM +0800, f...@ikuai8.com wrote:
>> From: Gao Feng <f...@ikuai8.com>
>>
>> When memory is exhausted, nfct_seqadj_ext_add may fail to add the seqadj
>> extension. But the function nf_ct_seqadj_init doesn't check if get valid
>> seqadj pointer by the nfct_seqadj.
>>
>> Now drop the packet directly when fail to add seqadj extension to avoid
>> dereference NULL pointer in nf_ct_seqadj_init.
>>
>> Signed-off-by: Gao Feng <f...@ikuai8.com>
>> ---
>>  v4: Drop the packet directly when fail to add seqadj extension;
>>  v3: Remove the warning log when seqadj is null;
>>  v2: Remove the unnessary seqadj check in nf_ct_seq_adjust
>>  v1: Initial patch
>>
>>  net/netfilter/nf_conntrack_core.c | 6 +-
>>  net/netfilter/nf_nat_core.c   | 3 ++-
>>  2 files changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/netfilter/nf_conntrack_core.c 
>> b/net/netfilter/nf_conntrack_core.c
>> index dd2c43a..dfa76ce 100644
>> --- a/net/netfilter/nf_conntrack_core.c
>> +++ b/net/netfilter/nf_conntrack_core.c
>> @@ -1036,7 +1036,11 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
>>   return (struct nf_conntrack_tuple_hash *)ct;
>>
>>   if (tmpl && nfct_synproxy(tmpl)) {
>> - nfct_seqadj_ext_add(ct);
>> + if (!nfct_seqadj_ext_add(ct)) {
>> + nf_conntrack_free(ct);
>> + pr_debug("Can't add seqadj extension\n");
>> + return NULL;
>> + }
>>   nfct_synproxy_ext_add(ct);
>
> I think this is part of the same logical change, ie. nf_ct_ext_add()
> returns NULL, then I would also fix nfct_synproxy_ext_add() in this
> go.
>
Sorry, I am not clear well.
Need I modify the logic of this part?

>>   }
>>
>> diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
>> index de31818..b82282a 100644
>> --- a/net/netfilter/nf_nat_core.c
>> +++ b/net/netfilter/nf_nat_core.c
>> @@ -441,7 +441,8 @@ nf_nat_setup_info(struct nf_conn *ct,
>>   ct->status |= IPS_DST_NAT;
>>
>>   if (nfct_help(ct))
>> - nfct_seqadj_ext_add(ct);
>> + if (!nfct_seqadj_ext_add(ct))
>> + return NF_DROP;
>
> ctnetlink may have created a conntrack with seqadj in place by when we
> call nf_nat_setup_info() so NF_ACCEPT would be more conservative, eg.
> via conntrackd state synchronization.
>
> Actually, after a quick look at ctnetlink, I don't see any any call to
> nfct_seqadj_ext_add() from there, so I suspect this is broken since
> SYNPROXY was introduced. It would be great if you can review this and
> send us patches to fix this, if indeed needed.
>
> Thanks!

Let me confirm the problem, or I am afraid I would misunderstand our meaning.
This patch only need to modify the "NF_DROP" to "NF_ACCEPT", is it?

Then I could commit another patch to fix the ctnetlink lost nfct_seqadj_ext_add.

Best Regards
Feng




Re: [PATCH v2 1/2 nf] netfilter: seqadj: Fix one possible panic in seqadj when mem is exhausted

2016-09-02 Thread Gao Feng
Hi Florian,

On Fri, Sep 2, 2016 at 2:59 PM, Florian Westphal <f...@strlen.de> wrote:
> f...@ikuai8.com <f...@ikuai8.com> wrote:
>> From: Gao Feng <f...@ikuai8.com>
>>
>> When memory is exhausted, nfct_seqadj_ext_add may fail to add the seqadj
>> extension. But the function nf_ct_seqadj_init doesn't check if get valid
>> seqadj pointer by the nfct_seqadj, while other functions perform the
>> sanity check.
>>
>> So the system would be panic when nfct_seqadj_ext_add failed.
>>
>> Signed-off-by: Gao Feng <f...@ikuai8.com>
>
>> diff --git a/net/netfilter/nf_conntrack_seqadj.c 
>> b/net/netfilter/nf_conntrack_seqadj.c
>> index dff0f0c..2c8e201 100644
>> --- a/net/netfilter/nf_conntrack_seqadj.c
>> +++ b/net/netfilter/nf_conntrack_seqadj.c
>> @@ -16,9 +16,14 @@ int nf_ct_seqadj_init(struct nf_conn *ct, enum 
>> ip_conntrack_info ctinfo,
>>   if (off == 0)
>>   return 0;
>>
>> + seqadj = nfct_seqadj(ct);
>> + if (unlikely(!seqadj)) {
>> + WARN_ONCE(1, "Missing nfct_seqadj_ext_add() setup call\n");
>> + return 0;
>> + }
>> +
>
> Not sure this WARN() is really needed, I would remove it (since its most
> likely only missing due to memory shortage).
>
> Other than that, this looks good.
>

I prefer to keep the warning, although it only happens caused by mem
shortage now.
But it would give the accurate description, if the nfct_seqadj_ext_add
was lost when new features were depending on synadj.


Best Regards
Feng




Re: [PATCH 1/2 nf] netfilter: seqadj: Fix some possible panics of seqadj when mem is exhausted

2016-09-01 Thread Gao Feng
Hi Liping,

On Fri, Sep 2, 2016 at 12:50 PM, Liping Zhang <zlpnob...@gmail.com> wrote:
> Hi Feng,
> 2016-09-02 9:48 GMT+08:00  <f...@ikuai8.com>:
>> From: Gao Feng <f...@ikuai8.com>
>> @@ -171,6 +176,11 @@ int nf_ct_seq_adjust(struct sk_buff *skb,
>> struct nf_ct_seqadj *this_way, *other_way;
>> int res;
>>
>> +   if (unlikely(!seqadj)) {
>
> IPS_SEQ_ADJUST_BIT will be tested before we call nf_ct_seq_adjust(),
> so I think seqadj
> will never be NULL here.

Thanks. I didn't check  IPS_SEQ_ADJUST_BIT  flag.

>
>> +   WARN_ONCE(1, "Missing nfct_seqadj_ext_add() setup call\n");
>> +   return 0;
>> +   }
>> +
>> this_way  = >seq[dir];
>> other_way = >seq[!dir];
>>
>> @@ -218,8 +228,10 @@ s32 nf_ct_seq_offset(const struct nf_conn *ct,
>> struct nf_conn_seqadj *seqadj = nfct_seqadj(ct);
>> struct nf_ct_seqadj *this_way;
>>
>> -   if (!seqadj)
>> +   if (unlikely(!seqadj)) {
>> +   WARN_ONCE(1, "Missing nfct_seqadj_ext_add() setup call\n");
>
> But in nf_ct_seq_offset, seqadj is likely to be NULL, see the function
> call path:
> tcp_packet->tcp_in_window->nf_ct_seq_offset, so WARN_ONCE seems unnecessary.
>
>> return 0;
>> +   }

Get it. It could be NULL.

Then only nf_ct_seqadj_init is necessary to check seqadj.
I will update this patch.

Regards
Feng




Re: [PATCH net] rps: flow_dissector: Fix uninitialized flow_keys used in __skb_get_hash possibly

2016-08-31 Thread Gao Feng
On Wed, Aug 31, 2016 at 12:16 PM, Gao Feng <f...@ikuai8.com> wrote:
> On Wed, Aug 31, 2016 at 12:14 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>> On Wed, 2016-08-31 at 10:56 +0800, f...@ikuai8.com wrote:
>>> From: Gao Feng <f...@ikuai8.com>
>>>
>>> The original codes depend on that the function parameters are evaluated from
>>> left to right. But the parameter's evaluation order is not defined in C
>>> standard actually.
>>>
>>> When flow_keys_have_l4() is invoked before ___skb_get_hash(skb, ,
>>> hashrnd) with some compilers or environment, the keys passed to
>>> flow_keys_have_l4 is not initialized.
>>>
>>> Signed-off-by: Gao Feng <f...@ikuai8.com>
>>> ---
>>
>> Good catch, please add
>>
>> Fixes: 6db61d79c1e1 ("flow_dissector: Ignore flow dissector return value 
>> from ___skb_get_hash")
>> Acked-by: Eric Dumazet <eduma...@google.com>
>>
>>
>
> Add it into the description and resend the patch again?
>
> Best Regards
> Feng

Hi Eric,

I have sent the update "v2" patch which has the description "Fixes:
6db61d79c1e1 ("flow_dissector: Ignore flow dissector return value from
___skb_get_hash")".
Could you help review it again please?

Best Regards
Feng




Re: [PATCH net] rps: flow_dissector: Fix uninitialized flow_keys used in __skb_get_hash possibly

2016-08-30 Thread Gao Feng
On Wed, Aug 31, 2016 at 12:14 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Wed, 2016-08-31 at 10:56 +0800, f...@ikuai8.com wrote:
>> From: Gao Feng <f...@ikuai8.com>
>>
>> The original codes depend on that the function parameters are evaluated from
>> left to right. But the parameter's evaluation order is not defined in C
>> standard actually.
>>
>> When flow_keys_have_l4() is invoked before ___skb_get_hash(skb, ,
>> hashrnd) with some compilers or environment, the keys passed to
>> flow_keys_have_l4 is not initialized.
>>
>> Signed-off-by: Gao Feng <f...@ikuai8.com>
>> ---
>
> Good catch, please add
>
> Fixes: 6db61d79c1e1 ("flow_dissector: Ignore flow dissector return value from 
> ___skb_get_hash")
> Acked-by: Eric Dumazet <eduma...@google.com>
>
>

Add it into the description and resend the patch again?

Best Regards
Feng




Re: [PATCH v3] vsock: fix missing cleanup when misc_register failed

2015-10-18 Thread Gao feng
I'm sorry too, fell free to ignore my patch, Thanks.

2015-10-19 9:50 GMT+08:00 David Miller <da...@davemloft.net>:
> From: Gao feng <omarapazan...@gmail.com>
> Date: Sun, 18 Oct 2015 23:35:56 +0800
>
>> reset transport and unlock if misc_register failed.
>>
>> Signed-off-by: Gao feng <omarapazan...@gmail.com>
>
> It's extremely disappointing that it took you three revisions
> just to put together a submission that actually compiles.
>
> That shows me that you put a very low amount of care into your
> submission, and if you don't care about your change why should I?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] vsock: fix missing cleanup when misc_register failed

2015-10-18 Thread Gao feng
reset transport and unlock if misc_register failed.

Signed-off-by: Gao feng <omarapazan...@gmail.com>
---
 net/vmw_vsock/af_vsock.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index df5fc6b..5f5fbed 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1948,7 +1948,7 @@ int __vsock_core_init(const struct vsock_transport *t, 
struct module *owner)
err = misc_register(_device);
if (err) {
pr_err("Failed to register misc device\n");
-   return -ENOENT;
+   goto err_reset_transport;
}
 
err = proto_register(_proto, 1);  /* we want our slab */
@@ -1969,8 +1969,9 @@ int __vsock_core_init(const struct vsock_transport *t, 
struct module *owner)
 
 err_unregister_proto:
proto_unregister(_proto);
-err_misc_deregister:
+err_unregister_misc:
misc_deregister(_device);
+err_reset_transport:
transport = NULL;
 err_busy:
mutex_unlock(_register_mutex);
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] vsock: fix missing cleanup when misc_register failed

2015-10-18 Thread Gao feng
reset transport and unlock if misc_register failed.

Signed-off-by: Gao feng <omarapazan...@gmail.com>
---
 net/vmw_vsock/af_vsock.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index df5fc6b..598e045 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1948,13 +1948,13 @@ int __vsock_core_init(const struct vsock_transport *t, 
struct module *owner)
err = misc_register(_device);
if (err) {
pr_err("Failed to register misc device\n");
-   return -ENOENT;
+   goto err_reset_transport;
}
 
err = proto_register(_proto, 1);  /* we want our slab */
if (err) {
pr_err("Cannot register vsock protocol\n");
-   goto err_misc_deregister;
+   goto err_deregister_misc;
}
 
err = sock_register(_family_ops);
@@ -1969,8 +1969,9 @@ int __vsock_core_init(const struct vsock_transport *t, 
struct module *owner)
 
 err_unregister_proto:
proto_unregister(_proto);
-err_misc_deregister:
+err_undegister_misc:
misc_deregister(_device);
+err_reset_transport:
transport = NULL;
 err_busy:
mutex_unlock(_register_mutex);
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3] vsock: fix missing cleanup when misc_register failed

2015-10-18 Thread Gao feng
reset transport and unlock if misc_register failed.

Signed-off-by: Gao feng <omarapazan...@gmail.com>
---
 net/vmw_vsock/af_vsock.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index df5fc6b..00e8a34 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1948,13 +1948,13 @@ int __vsock_core_init(const struct vsock_transport *t, 
struct module *owner)
err = misc_register(_device);
if (err) {
pr_err("Failed to register misc device\n");
-   return -ENOENT;
+   goto err_reset_transport;
}
 
err = proto_register(_proto, 1);  /* we want our slab */
if (err) {
pr_err("Cannot register vsock protocol\n");
-   goto err_misc_deregister;
+   goto err_deregister_misc;
}
 
err = sock_register(_family_ops);
@@ -1969,8 +1969,9 @@ int __vsock_core_init(const struct vsock_transport *t, 
struct module *owner)
 
 err_unregister_proto:
proto_unregister(_proto);
-err_misc_deregister:
+err_deregister_misc:
misc_deregister(_device);
+err_reset_transport:
transport = NULL;
 err_busy:
mutex_unlock(_register_mutex);
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html