Re: [PATCH] rds: send: Fix dead code in rds_sendmsg
On (07/25/18 10:22), Gustavo A. R. Silva wrote: > Currently, code at label *out* is unreachable. Fix this by updating > variable *ret* with -EINVAL, so the jump to *out* can be properly > executed instead of directly returning from function. > > Addresses-Coverity-ID: 1472059 ("Structurally dead code") > Fixes: 1e2b44e78eea ("rds: Enable RDS IPv6 support") > Signed-off-by: Gustavo A. R. Silva Acked-by: Sowmini Varadhan
Re: [PATCH] rds: send: Fix dead code in rds_sendmsg
On (07/25/18 10:22), Gustavo A. R. Silva wrote: > Currently, code at label *out* is unreachable. Fix this by updating > variable *ret* with -EINVAL, so the jump to *out* can be properly > executed instead of directly returning from function. > > Addresses-Coverity-ID: 1472059 ("Structurally dead code") > Fixes: 1e2b44e78eea ("rds: Enable RDS IPv6 support") > Signed-off-by: Gustavo A. R. Silva Acked-by: Sowmini Varadhan
Re: [rds-devel] KASAN: null-ptr-deref Read in rds_ib_get_mr
On (05/11/18 15:48), Yanjun Zhu wrote: > diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c > index e678699..2228b50 100644 > --- a/net/rds/ib_rdma.c > +++ b/net/rds/ib_rdma.c > @@ -539,11 +539,17 @@ void rds_ib_flush_mrs(void) >void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents, > struct rds_sock *rs, u32 *key_ret) >{ > - struct rds_ib_device *rds_ibdev; > + struct rds_ib_device *rds_ibdev = NULL; > struct rds_ib_mr *ibmr = NULL; > - struct rds_ib_connection *ic = rs->rs_conn->c_transport_data; > + struct rds_ib_connection *ic = NULL; > int ret; > > + if (rs->rs_bound_addr == 0) { > + ret = -EPERM; > + goto out; > + } > + > + ic = rs->rs_conn->c_transport_data; > rds_ibdev = rds_ib_get_device(rs->rs_bound_addr); > if (!rds_ibdev) { > ret = -ENODEV; > > I made this raw patch. If you can reproduce this bug, please make tests > with it. I dont think this solves the problem, I think it just changes the timing under which it can still happen. what if the rds_remove_bound() in rds_bind() happens after the check for if (rs->rs_bound_addr == 0) added above by the patch I believe you need some type of synchronization (either through mutex, or some atomic flag in the rs or similar) to make sure rds_bind() and rds_ib_get_mr() are mutually exclusive. --Sowmini
Re: [rds-devel] KASAN: null-ptr-deref Read in rds_ib_get_mr
On (05/11/18 15:48), Yanjun Zhu wrote: > diff --git a/net/rds/ib_rdma.c b/net/rds/ib_rdma.c > index e678699..2228b50 100644 > --- a/net/rds/ib_rdma.c > +++ b/net/rds/ib_rdma.c > @@ -539,11 +539,17 @@ void rds_ib_flush_mrs(void) >void *rds_ib_get_mr(struct scatterlist *sg, unsigned long nents, > struct rds_sock *rs, u32 *key_ret) >{ > - struct rds_ib_device *rds_ibdev; > + struct rds_ib_device *rds_ibdev = NULL; > struct rds_ib_mr *ibmr = NULL; > - struct rds_ib_connection *ic = rs->rs_conn->c_transport_data; > + struct rds_ib_connection *ic = NULL; > int ret; > > + if (rs->rs_bound_addr == 0) { > + ret = -EPERM; > + goto out; > + } > + > + ic = rs->rs_conn->c_transport_data; > rds_ibdev = rds_ib_get_device(rs->rs_bound_addr); > if (!rds_ibdev) { > ret = -ENODEV; > > I made this raw patch. If you can reproduce this bug, please make tests > with it. I dont think this solves the problem, I think it just changes the timing under which it can still happen. what if the rds_remove_bound() in rds_bind() happens after the check for if (rs->rs_bound_addr == 0) added above by the patch I believe you need some type of synchronization (either through mutex, or some atomic flag in the rs or similar) to make sure rds_bind() and rds_ib_get_mr() are mutually exclusive. --Sowmini
Re: KASAN: use-after-free Read in inet_create
#syz dup: KASAN: use-after-free Read in rds_cong_queue_updates There are a number of manifestations of this bug, basically all suggest that the connect/reconnect etc workqs are somehow being scheduled after the netns is deleted, despite the code refactoring in Commit 3db6e0d172c (and looks like the WARN_ONs in that commit are not even being triggered). We've not been able to reproduce this issues, and without a crash dump (or some hint of other threads that were running at the time of the problem) are working on figuring out the root-cause by code-inspection. --Sowmini
Re: KASAN: use-after-free Read in inet_create
#syz dup: KASAN: use-after-free Read in rds_cong_queue_updates There are a number of manifestations of this bug, basically all suggest that the connect/reconnect etc workqs are somehow being scheduled after the netns is deleted, despite the code refactoring in Commit 3db6e0d172c (and looks like the WARN_ONs in that commit are not even being triggered). We've not been able to reproduce this issues, and without a crash dump (or some hint of other threads that were running at the time of the problem) are working on figuring out the root-cause by code-inspection. --Sowmini
Re: KASAN: slab-out-of-bounds Read in rds_cong_queue_updates
On (03/19/18 09:29), Dmitry Vyukov wrote: > > This looks the same as: > > #syz dup: KASAN: use-after-free Read in rds_cong_queue_updates correct, seems like the rds_destroy_pending() fixes did not seal this race condition. I need to look at this more carefully to see what race I missed.. no easy answer here, I am afraid. --Sowmini
Re: KASAN: slab-out-of-bounds Read in rds_cong_queue_updates
On (03/19/18 09:29), Dmitry Vyukov wrote: > > This looks the same as: > > #syz dup: KASAN: use-after-free Read in rds_cong_queue_updates correct, seems like the rds_destroy_pending() fixes did not seal this race condition. I need to look at this more carefully to see what race I missed.. no easy answer here, I am afraid. --Sowmini
Re: WARNING in __local_bh_enable_ip (2)
On (03/14/18 14:28), Eric Dumazet wrote: > > > spin_lock_bh(_tcp_conn_lock);/spin_unlock_bh(_tcp_conn_lock); in > rds_tcp_conn_free() > > is in conflict with the spin_lock_irqsave(_conn_lock, flags); > in __rds_conn_create() yes I was going to look into this and fix it later. > Hard to understand why RDS is messing with hard irqs really. some of it comes from the rds_rdma history: some parts of the common rds and rds_rdma module get called in various driver contexts for infiniband. --Sowmini
Re: WARNING in __local_bh_enable_ip (2)
On (03/14/18 14:28), Eric Dumazet wrote: > > > spin_lock_bh(_tcp_conn_lock);/spin_unlock_bh(_tcp_conn_lock); in > rds_tcp_conn_free() > > is in conflict with the spin_lock_irqsave(_conn_lock, flags); > in __rds_conn_create() yes I was going to look into this and fix it later. > Hard to understand why RDS is messing with hard irqs really. some of it comes from the rds_rdma history: some parts of the common rds and rds_rdma module get called in various driver contexts for infiniband. --Sowmini
Re: [PATCH][rds-next] rds: make functions rds_info_from_znotifier and rds_message_zcopy_from_user static
On (03/11/18 18:03), Colin King wrote: > From: Colin Ian King> > Functions rds_info_from_znotifier and rds_message_zcopy_from_user are > local to the source and do not need to be in global scope, so make them > static. the rds_message_zcopy_from_user warning was already flagged by kbuild-robot last week. See https://www.spinics.net/lists/netdev/msg488041.html
Re: [PATCH][rds-next] rds: make functions rds_info_from_znotifier and rds_message_zcopy_from_user static
On (03/11/18 18:03), Colin King wrote: > From: Colin Ian King > > Functions rds_info_from_znotifier and rds_message_zcopy_from_user are > local to the source and do not need to be in global scope, so make them > static. the rds_message_zcopy_from_user warning was already flagged by kbuild-robot last week. See https://www.spinics.net/lists/netdev/msg488041.html
Re: [rds-devel] [PATCH][rds-next] rds: remove redundant variable 'sg_off'
On (03/11/18 17:27), Colin King wrote: > Variable sg_off is assigned a value but it is never read, hence it is > redundant and can be removed. > Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
Re: [rds-devel] [PATCH][rds-next] rds: remove redundant variable 'sg_off'
On (03/11/18 17:27), Colin King wrote: > Variable sg_off is assigned a value but it is never read, hence it is > redundant and can be removed. > Acked-by: Sowmini Varadhan
Re: [PATCH] rds: send: mark expected switch fall-through in rds_rm_size
On (02/19/18 12:10), Gustavo A. R. Silva wrote: > > In preparation to enabling -Wimplicit-fallthrough, mark switch cases > where we are expecting to fall through. Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
Re: [PATCH] rds: send: mark expected switch fall-through in rds_rm_size
On (02/19/18 12:10), Gustavo A. R. Silva wrote: > > In preparation to enabling -Wimplicit-fallthrough, mark switch cases > where we are expecting to fall through. Acked-by: Sowmini Varadhan
Re: KASAN: use-after-free Read in rds_tcp_tune
On (02/14/18 16:28), Dmitry Vyukov wrote: > syzbot is probably not seeing this problem. However if you don't add > the Reported-by tag to commit, nor provide syz fix tag, it will > consider it as "open". One consequence of this is that it is still on > our radars. Another consequence is that syzbot will never report bugs > in rds_tcp_tune ever again as it thinks that it's the same known bug, > so no point in bothering anybody. understood, I think I saw this in the original syzbot mail as well, but I was hesitant to actually add the tag because the fix was based on code-inspection only, and I would have felt more comfortable about asserting the Reported-by if I'd done a clear-cut before/after verification. btw, checkpatch.pl complains about the syzbot*@syzkaller.appspotmail.com addresses as "Unrecognized email address", we should fix that error from checkpatch at some point. --Sowmini
Re: KASAN: use-after-free Read in rds_tcp_tune
On (02/14/18 16:28), Dmitry Vyukov wrote: > syzbot is probably not seeing this problem. However if you don't add > the Reported-by tag to commit, nor provide syz fix tag, it will > consider it as "open". One consequence of this is that it is still on > our radars. Another consequence is that syzbot will never report bugs > in rds_tcp_tune ever again as it thinks that it's the same known bug, > so no point in bothering anybody. understood, I think I saw this in the original syzbot mail as well, but I was hesitant to actually add the tag because the fix was based on code-inspection only, and I would have felt more comfortable about asserting the Reported-by if I'd done a clear-cut before/after verification. btw, checkpatch.pl complains about the syzbot*@syzkaller.appspotmail.com addresses as "Unrecognized email address", we should fix that error from checkpatch at some point. --Sowmini
Re: KASAN: use-after-free Read in rds_tcp_tune
On (02/14/18 16:11), Dmitry Vyukov wrote: > > Hi Sowmini, > > Was this ever fixed? What's the fix? This still hangs as open. Please > provide "syz fix" tag. Are you still seeing this problem? I had expected that the changes around rds_destroy_pending - see commit ebeeb1ad9b8a - would have taken care of this (note that ebeeb1ad9b8a refactors/updates 3db6e0d172c9) but those fixes were done by inspection only. In other words, I was never able to reproduce this, so we may still have missed some race condition. --Sowmini
Re: KASAN: use-after-free Read in rds_tcp_tune
On (02/14/18 16:11), Dmitry Vyukov wrote: > > Hi Sowmini, > > Was this ever fixed? What's the fix? This still hangs as open. Please > provide "syz fix" tag. Are you still seeing this problem? I had expected that the changes around rds_destroy_pending - see commit ebeeb1ad9b8a - would have taken care of this (note that ebeeb1ad9b8a refactors/updates 3db6e0d172c9) but those fixes were done by inspection only. In other words, I was never able to reproduce this, so we may still have missed some race condition. --Sowmini
Re: WARNING: suspicious RCU usage in rds_tcp_conn_alloc
#syz dup: WARNING: suspicious RCU usage in rds_loop_conn_alloc
Re: WARNING: suspicious RCU usage in rds_tcp_conn_alloc
#syz dup: WARNING: suspicious RCU usage in rds_loop_conn_alloc
Re: BUG: sleeping function called from invalid context at mm/slab.h:LINE (3)
#syz dup: WARNING: suspicious RCU usage in rds_loop_conn_alloc
Re: BUG: sleeping function called from invalid context at mm/slab.h:LINE (3)
#syz dup: WARNING: suspicious RCU usage in rds_loop_conn_alloc
Re: WARNING: suspicious RCU usage in rds_loop_conn_alloc
> rcu_scheduler_active = 2, debug_locks = 1 > 1 lock held by syzkaller563791/4086: > #0: (rcu_read_lock){}, at: [] > __rds_conn_create+0xe46/0x1b50 net/rds/connection.c:218 > the rcu_read_lock() was added by ebeeb1ad9b. After we add that, everything in the rcu read critical section needs to use GFP_ATOMIC to avoid sleeping. I'll look into this fix. --Sowmini
Re: WARNING: suspicious RCU usage in rds_loop_conn_alloc
> rcu_scheduler_active = 2, debug_locks = 1 > 1 lock held by syzkaller563791/4086: > #0: (rcu_read_lock){}, at: [] > __rds_conn_create+0xe46/0x1b50 net/rds/connection.c:218 > the rcu_read_lock() was added by ebeeb1ad9b. After we add that, everything in the rcu read critical section needs to use GFP_ATOMIC to avoid sleeping. I'll look into this fix. --Sowmini
Re: [rds-devel] BUG: unable to handle kernel NULL pointer dereference in rds_send_xmit
On (01/30/18 14:22), Eric Biggers wrote: > > I assume you weren't able to reproduce this? This crash hasn't been > seen again, : > I am invalidating the bug for syzbot so it will report the same crash > signature > again if it occurs, but if you think there is a real bug feel free to keep > looking into it. correct I was not able to reproduce this. However based on code inspecion, I came up with commit 3db6e0d172c94bd9953a1347c55ffb64b1d2e74f rds: use RCU to synchronize work-enqueue with connection teardown Marking it invalid sounds good to me. --Sowmini
Re: [rds-devel] BUG: unable to handle kernel NULL pointer dereference in rds_send_xmit
On (01/30/18 14:22), Eric Biggers wrote: > > I assume you weren't able to reproduce this? This crash hasn't been > seen again, : > I am invalidating the bug for syzbot so it will report the same crash > signature > again if it occurs, but if you think there is a real bug feel free to keep > looking into it. correct I was not able to reproduce this. However based on code inspecion, I came up with commit 3db6e0d172c94bd9953a1347c55ffb64b1d2e74f rds: use RCU to synchronize work-enqueue with connection teardown Marking it invalid sounds good to me. --Sowmini
Re: KASAN: use-after-free Read in rds_tcp_tune
On (01/11/18 21:29), syzbot wrote: > == > BUG: KASAN: use-after-free in rds_tcp_tune+0x491/0x520 net/rds/tcp.c:397 > Read of size 4 at addr 8801cd5f6c58 by task kworker/u4:4/4954 Just had an offline discussion with santosh around this, here's a summary of that discussion for the archives: Looks like an rds_connect_worker workq got scheduled after the netns was deleted. This could happen if an an rds_connection got added between lines 528 and 529 of 506 static void rds_tcp_kill_sock(struct net *net) : /* code to pull out all the rds_connections that should be destroyed */ : 528 spin_unlock_irq(_tcp_conn_lock); 529 list_for_each_entry_safe(tc, _tc, _list, t_tcp_node) 530 rds_conn_destroy(tc->t_cpath->cp_conn); Such an rds_connection would miss out the rds_conn_destroy() loop (that cancels all pending work) and (if it was scheduled after netns deletion) could trigger the use-after-free. Evaluating various fixes for this (including using _bh instead of _irq as suggested by santosh), I'll get back with a patch soon. --Sowmini
Re: KASAN: use-after-free Read in rds_tcp_tune
On (01/11/18 21:29), syzbot wrote: > == > BUG: KASAN: use-after-free in rds_tcp_tune+0x491/0x520 net/rds/tcp.c:397 > Read of size 4 at addr 8801cd5f6c58 by task kworker/u4:4/4954 Just had an offline discussion with santosh around this, here's a summary of that discussion for the archives: Looks like an rds_connect_worker workq got scheduled after the netns was deleted. This could happen if an an rds_connection got added between lines 528 and 529 of 506 static void rds_tcp_kill_sock(struct net *net) : /* code to pull out all the rds_connections that should be destroyed */ : 528 spin_unlock_irq(_tcp_conn_lock); 529 list_for_each_entry_safe(tc, _tc, _list, t_tcp_node) 530 rds_conn_destroy(tc->t_cpath->cp_conn); Such an rds_connection would miss out the rds_conn_destroy() loop (that cancels all pending work) and (if it was scheduled after netns deletion) could trigger the use-after-free. Evaluating various fixes for this (including using _bh instead of _irq as suggested by santosh), I'll get back with a patch soon. --Sowmini
Re: [rds-devel] BUG: unable to handle kernel NULL pointer dereference in rds_send_xmit
> From: Santosh Shilimkar> Date: Mon, 18 Dec 2017 08:28:05 -0800 : > > Looks like another one tripping on empty transport. Mostly below > > should > > address it but we will test it if it does. that was my first thought, but it cannot be the case here: rds_sendmsg etc itself would have bombed if that were the case, and the packet would never have gotten queued. This is unlike f3069c6d33, where an applications skips the transport binding (either misses the explicit bind, or gets the wrong transport due to an implicit bind) before it triggers the setsockopt. I suspect that the problems is that the conn (and thus c_trans) have gotten destroyed, but the cp_send_w work got incorrectly re-queued. For example, rds_cong_queue_updates() (because the peer sent a congestion update) can happen in softirq context, and would end up requeing work in the middle of rds_conn_destroy, after we have assumed that everything is quisced. On (12/18/17 12:12), David Miller wrote: > > We're seeming to accumulate a lot of checks like this, maybe there > is a more general way to deal with this problem? Yeah, I was thinking about this.. let me try to reprodcue this in-house and get back with a patchset. --Sowmini
Re: [rds-devel] BUG: unable to handle kernel NULL pointer dereference in rds_send_xmit
> From: Santosh Shilimkar > Date: Mon, 18 Dec 2017 08:28:05 -0800 : > > Looks like another one tripping on empty transport. Mostly below > > should > > address it but we will test it if it does. that was my first thought, but it cannot be the case here: rds_sendmsg etc itself would have bombed if that were the case, and the packet would never have gotten queued. This is unlike f3069c6d33, where an applications skips the transport binding (either misses the explicit bind, or gets the wrong transport due to an implicit bind) before it triggers the setsockopt. I suspect that the problems is that the conn (and thus c_trans) have gotten destroyed, but the cp_send_w work got incorrectly re-queued. For example, rds_cong_queue_updates() (because the peer sent a congestion update) can happen in softirq context, and would end up requeing work in the middle of rds_conn_destroy, after we have assumed that everything is quisced. On (12/18/17 12:12), David Miller wrote: > > We're seeming to accumulate a lot of checks like this, maybe there > is a more general way to deal with this problem? Yeah, I was thinking about this.. let me try to reprodcue this in-house and get back with a patchset. --Sowmini
Re: KASAN: use-after-free Read in rds_tcp_dev_event
On (11/13/17 19:30), Girish Moodalbail wrote: > (L538-540). However, it leaves behind some of the rds_tcp connections that > shared the same underlying RDS connection (L534 and 535). These connections > with pointer to stale network namespace are left behind in the global list. It leaves behind no such thing. After mprds, you want to collect only one instance of the conn that is being removed, that's why lines 534-535 skips over duplicat instances of the same conn (for multiple paths in the same conn). > When the 2nd network namespace is deleted, we will hit the above stale > pointer and hit UAF panic. > I think we should move away from global list to a per-namespace list. The > global list are used only in two places (both of which are per-namespace > operations): Nice try, but not so. Let me look at this tomorrow, I missed this mail in my mbox. --Sowmini
Re: KASAN: use-after-free Read in rds_tcp_dev_event
On (11/13/17 19:30), Girish Moodalbail wrote: > (L538-540). However, it leaves behind some of the rds_tcp connections that > shared the same underlying RDS connection (L534 and 535). These connections > with pointer to stale network namespace are left behind in the global list. It leaves behind no such thing. After mprds, you want to collect only one instance of the conn that is being removed, that's why lines 534-535 skips over duplicat instances of the same conn (for multiple paths in the same conn). > When the 2nd network namespace is deleted, we will hit the above stale > pointer and hit UAF panic. > I think we should move away from global list to a per-namespace list. The > global list are used only in two places (both of which are per-namespace > operations): Nice try, but not so. Let me look at this tomorrow, I missed this mail in my mbox. --Sowmini
Re: [PATCH net] rds: Make sure updates to cp_send_gen can be observed
On (07/20/17 12:28), H??kon Bugge wrote: > cp->cp_send_gen is treated as a normal variable, although it may be > used by different threads. I'm confused by that assertion. If you look at the comments right above the change in your patch, there is a note that acquire_in_xmit/release_in_xmit are the synchronization/serialization points. Can you please clarify? > --- a/net/rds/send.c > +++ b/net/rds/send.c > @@ -170,8 +170,8 @@ int rds_send_xmit(struct rds_conn_path *cp) >* The acquire_in_xmit() check above ensures that only one >* caller can increment c_send_gen at any time. >*/ > - cp->cp_send_gen++; > - send_gen = cp->cp_send_gen; > + send_gen = READ_ONCE(cp->cp_send_gen) + 1; > + WRITE_ONCE(cp->cp_send_gen, send_gen); > --Sowmini
Re: [PATCH net] rds: Make sure updates to cp_send_gen can be observed
On (07/20/17 12:28), H??kon Bugge wrote: > cp->cp_send_gen is treated as a normal variable, although it may be > used by different threads. I'm confused by that assertion. If you look at the comments right above the change in your patch, there is a note that acquire_in_xmit/release_in_xmit are the synchronization/serialization points. Can you please clarify? > --- a/net/rds/send.c > +++ b/net/rds/send.c > @@ -170,8 +170,8 @@ int rds_send_xmit(struct rds_conn_path *cp) >* The acquire_in_xmit() check above ensures that only one >* caller can increment c_send_gen at any time. >*/ > - cp->cp_send_gen++; > - send_gen = cp->cp_send_gen; > + send_gen = READ_ONCE(cp->cp_send_gen) + 1; > + WRITE_ONCE(cp->cp_send_gen, send_gen); > --Sowmini
Re: [GIT] Networking
On (07/10/17 18:05), Herbert Xu wrote: > > Hmm, I can't see the problem in af_alg_accept. The struct socket > comes directly from sys_accept() which creates it using sock_alloc. > > So the only thing I can think of is that the memory returned by > sock_alloc is not zeroed and therefore the WARN_ON is just reading > garbage. Then it is odd that this WARN_ON is not triggered for other sockets (e.g., for TCP sockets), though it happens easily with AF_ALG. But it's not sock_alloc() - that function is returning a properly zeroed ->sk. The reason that the WARN_ON is triggered is that af_alg_accept() calls sock_init_data() which does 2636 if (sock) { : 2639 sock->sk= sk; So we can do one of the following: 1. drop the WARN_ON(), which makes true leaks hard to detect 2. change the WARN_ON() to WARN_ON(parent->sk && parent->sk != sk) #2 assumes that all the refcount book-keeping is being done correctly (there is the danger that we end up taking 2 refs on the sk) --Sowmini
Re: [GIT] Networking
On (07/10/17 18:05), Herbert Xu wrote: > > Hmm, I can't see the problem in af_alg_accept. The struct socket > comes directly from sys_accept() which creates it using sock_alloc. > > So the only thing I can think of is that the memory returned by > sock_alloc is not zeroed and therefore the WARN_ON is just reading > garbage. Then it is odd that this WARN_ON is not triggered for other sockets (e.g., for TCP sockets), though it happens easily with AF_ALG. But it's not sock_alloc() - that function is returning a properly zeroed ->sk. The reason that the WARN_ON is triggered is that af_alg_accept() calls sock_init_data() which does 2636 if (sock) { : 2639 sock->sk= sk; So we can do one of the following: 1. drop the WARN_ON(), which makes true leaks hard to detect 2. change the WARN_ON() to WARN_ON(parent->sk && parent->sk != sk) #2 assumes that all the refcount book-keeping is being done correctly (there is the danger that we end up taking 2 refs on the sk) --Sowmini
Re: [GIT] Networking
On (07/09/17 11:49), Linus Torvalds wrote: > > On Sat, Jul 8, 2017 at 3:36 AM, David Miller <da...@davemloft.net> wrote: > > > > 8) Fix socket leak on accept() in RDS, from Sowmini Varadhan. Also > >add a WARN_ON() to sock_graft() so other protocol stacks don't trip > >over this as well. > > Hmm. This one triggers for me on both my desktop and laptop at bootup. > Bog-standard machines, running F25 and F24 respectively. > > The warning doesn't seem particularly useful, although maybe that > "alg_accept()" gives people who know this code enough of a clue. My initial question was whether sock_graft() should do a sock_put() before cutting loose the existing parent->sk and assigning a new parent->sk (https://www.spinics.net/lists/netdev/msg442191.html) It look like PF_ALG sets up a ->sk in alg_create() (but this would get over-written in alg_accept()?) Cc'ing Herbert to see if this is expected behavior (and PF_ALG somehow does the right thing with the refcount for the ->sk set up in alg_create) in which case I suppose we should drop the WARN_ON. --Sowmini > [ cut here ] > WARNING: CPU: 1 PID: 492 at ./include/net/sock.h:1700 > af_alg_accept+0x1bf/0x1f0 > CPU: 1 PID: 492 Comm: systemd-cryptse Not tainted 4.12.0-09010-g2b976203417c > #1 > Hardware name: System manufacturer System Product Name/Z170-K, BIOS > 1803 05/06/2016 > RIP: 0010:af_alg_accept+0x1bf/0x1f0 > Call Trace: > alg_accept+0x15/0x20 > SYSC_accept4+0x105/0x210 > ? getnstimeofday64+0xe/0x20 > ? __audit_syscall_entry+0xb1/0xf0 > ? syscall_trace_enter+0x1bd/0x2d0 > ? __audit_syscall_exit+0x1a5/0x2a0 > SyS_accept+0x10/0x20 > do_syscall_64+0x61/0x140 > entry_SYSCALL64_slow_path+0x25/0x25 > ---[ end trace a35e5baea85df269 ]---
Re: [GIT] Networking
On (07/09/17 11:49), Linus Torvalds wrote: > > On Sat, Jul 8, 2017 at 3:36 AM, David Miller wrote: > > > > 8) Fix socket leak on accept() in RDS, from Sowmini Varadhan. Also > >add a WARN_ON() to sock_graft() so other protocol stacks don't trip > >over this as well. > > Hmm. This one triggers for me on both my desktop and laptop at bootup. > Bog-standard machines, running F25 and F24 respectively. > > The warning doesn't seem particularly useful, although maybe that > "alg_accept()" gives people who know this code enough of a clue. My initial question was whether sock_graft() should do a sock_put() before cutting loose the existing parent->sk and assigning a new parent->sk (https://www.spinics.net/lists/netdev/msg442191.html) It look like PF_ALG sets up a ->sk in alg_create() (but this would get over-written in alg_accept()?) Cc'ing Herbert to see if this is expected behavior (and PF_ALG somehow does the right thing with the refcount for the ->sk set up in alg_create) in which case I suppose we should drop the WARN_ON. --Sowmini > [ cut here ] > WARNING: CPU: 1 PID: 492 at ./include/net/sock.h:1700 > af_alg_accept+0x1bf/0x1f0 > CPU: 1 PID: 492 Comm: systemd-cryptse Not tainted 4.12.0-09010-g2b976203417c > #1 > Hardware name: System manufacturer System Product Name/Z170-K, BIOS > 1803 05/06/2016 > RIP: 0010:af_alg_accept+0x1bf/0x1f0 > Call Trace: > alg_accept+0x15/0x20 > SYSC_accept4+0x105/0x210 > ? getnstimeofday64+0xe/0x20 > ? __audit_syscall_entry+0xb1/0xf0 > ? syscall_trace_enter+0x1bd/0x2d0 > ? __audit_syscall_exit+0x1a5/0x2a0 > SyS_accept+0x10/0x20 > do_syscall_64+0x61/0x140 > entry_SYSCALL64_slow_path+0x25/0x25 > ---[ end trace a35e5baea85df269 ]---
Re: RDS: TCP: Delete an error message for a failed memory allocation in rds_tcp_init_net()
> Do you find information from a Linux allocation failure report sufficient > for such an use case? yes, I suppose that would cover the needed cases. Your change looks good to me. Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
Re: RDS: TCP: Delete an error message for a failed memory allocation in rds_tcp_init_net()
> Do you find information from a Linux allocation failure report sufficient > for such an use case? yes, I suppose that would cover the needed cases. Your change looks good to me. Acked-by: Sowmini Varadhan
Re: [PATCH 3/3] RDS: TCP: Delete an error message for a failed memory allocation in rds_tcp_init_net()
On (05/22/17 16:13), SF Markus Elfring wrote: > > Omit an extra message for a memory allocation failure in this function. The change itself is harmless, but I'm curious about the "extra" part: "extra" from what? If this happens, hopefully this will be logged somewhere? Note that this type of (infrequent) logging noise is useful in some cases, e.g., with 8ce675ff, when one is trying to do the post-mortem of where things first went wrong. --Sowmini
Re: [PATCH 3/3] RDS: TCP: Delete an error message for a failed memory allocation in rds_tcp_init_net()
On (05/22/17 16:13), SF Markus Elfring wrote: > > Omit an extra message for a memory allocation failure in this function. The change itself is harmless, but I'm curious about the "extra" part: "extra" from what? If this happens, hopefully this will be logged somewhere? Note that this type of (infrequent) logging noise is useful in some cases, e.g., with 8ce675ff, when one is trying to do the post-mortem of where things first went wrong. --Sowmini
Re: Question about SOCK_SEQPACKET
On (05/05/17 10:45), Steven Whitehouse wrote: > > I do wonder if the man page for recvmsg is wrong, or at least a bit > confusing. SOCK_SEQPACKET is stream based not message based - it just > happens to have EOR markers in the stream. There is no reason that the whole > message needs to be returned in a single read, and in fact that would be > impossible if the sender didn't insert any EOR markers but kept sending data > beyond the size that the socket could buffer. > > I notice that man 7 socket says SOCK_SEQPACKET is for datagrams of fixed > maximum length which is definitely wrong, as is the statement that a > consumer has to read an entire packet with each system call. Which man page do you think is wrong here? The POSIX definition is here http://pubs.opengroup.org/onlinepubs/009695399/functions/recvmsg.html The description in http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_10.html says, "It is protocol-specific whether a maximum record size is imposed." In my machine (Ubuntu 4.4.0-72, and it is in socket(2), not socket(7), btw) doesnt have any references to max length, but I'm not sure I'd boldly assert "definitely wrong" about the requirement of having to read entire packet in a system call (see POSIX man page) --Sowmini
Re: Question about SOCK_SEQPACKET
On (05/05/17 10:45), Steven Whitehouse wrote: > > I do wonder if the man page for recvmsg is wrong, or at least a bit > confusing. SOCK_SEQPACKET is stream based not message based - it just > happens to have EOR markers in the stream. There is no reason that the whole > message needs to be returned in a single read, and in fact that would be > impossible if the sender didn't insert any EOR markers but kept sending data > beyond the size that the socket could buffer. > > I notice that man 7 socket says SOCK_SEQPACKET is for datagrams of fixed > maximum length which is definitely wrong, as is the statement that a > consumer has to read an entire packet with each system call. Which man page do you think is wrong here? The POSIX definition is here http://pubs.opengroup.org/onlinepubs/009695399/functions/recvmsg.html The description in http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_10.html says, "It is protocol-specific whether a maximum record size is imposed." In my machine (Ubuntu 4.4.0-72, and it is in socket(2), not socket(7), btw) doesnt have any references to max length, but I'm not sure I'd boldly assert "definitely wrong" about the requirement of having to read entire packet in a system call (see POSIX man page) --Sowmini
Re: crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutex
On (03/15/17 10:08), Dmitry Vyukov wrote: > After I've applied the patch these reports stopped to happen, and I > have not seem any other reports that look relevant. > However, it there was one, but it looks like a different issue and it > was probably masked by massive amounts of original deadlock reports: Yes, this looks like a valid deadlock. I think there may be some ->dumpit callbacks that take the rtnl_lock and do not unlock it before return, e.g., I see nl80211_dump_interface() doing this at 2612 rtnl_lock(); 2613 if (!cb->args[2]) { : 2619 ret = nl80211_dump_wiphy_parse(skb, cb, ); 2620 if (ret) 2621 return ret; afaict, nl80211_dump_wiphy_parse does not itself do rtnl_unlock on error. If that's the case then we'd run into the circular locking dependancy flagged by lockdep. Disclaimer: I did not check every single ->dumpit, there may be more than one of these..
Re: crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutex
On (03/15/17 10:08), Dmitry Vyukov wrote: > After I've applied the patch these reports stopped to happen, and I > have not seem any other reports that look relevant. > However, it there was one, but it looks like a different issue and it > was probably masked by massive amounts of original deadlock reports: Yes, this looks like a valid deadlock. I think there may be some ->dumpit callbacks that take the rtnl_lock and do not unlock it before return, e.g., I see nl80211_dump_interface() doing this at 2612 rtnl_lock(); 2613 if (!cb->args[2]) { : 2619 ret = nl80211_dump_wiphy_parse(skb, cb, ); 2620 if (ret) 2621 return ret; afaict, nl80211_dump_wiphy_parse does not itself do rtnl_unlock on error. If that's the case then we'd run into the circular locking dependancy flagged by lockdep. Disclaimer: I did not check every single ->dumpit, there may be more than one of these..
Re: crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutex
On (03/14/17 09:14), Dmitry Vyukov wrote: > Another one now involving rds_tcp_listen_stop : > kworker/u4:1/19 is trying to acquire lock: > (sk_lock-AF_INET){+.+.+.}, at: [] lock_sock > include/net/sock.h:1460 [inline] > (sk_lock-AF_INET){+.+.+.}, at: [] > rds_tcp_listen_stop+0x5c/0x150 net/rds/tcp_listen.c:288 > > but task is already holding lock: > (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20 > net/core/rtnetlink.c:70 Is this also a false positive? genl_lock_dumpit takes the genl_lock and then waits on the rtnl_lock (e.g., out of tipc_nl_bearer_dump). netdev_run_todo takes the rtnl_lock and then wants lock_sock() for the TCP/IPv4 socket. Why is lockdep seeing a circular dependancy here? Same pattern seems to be happening for http://www.spinics.net/lists/netdev/msg423368.html and maybe also http://www.spinics.net/lists/netdev/msg423323.html? --Sowmini > Chain exists of: > sk_lock-AF_INET --> genl_mutex --> rtnl_mutex > > Possible unsafe locking scenario: > >CPU0CPU1 > > lock(rtnl_mutex); >lock(genl_mutex); >lock(rtnl_mutex); > lock(sk_lock-AF_INET); > > *** DEADLOCK *** > > 4 locks held by kworker/u4:1/19: > #0: ("%s""netns"){.+.+.+}, at: [] > __write_once_size include/linux/compiler.h:283 [inline] > #0: ("%s""netns"){.+.+.+}, at: [] atomic64_set > arch/x86/include/asm/atomic64_64.h:33 [inline] > #0: ("%s""netns"){.+.+.+}, at: [] atomic_long_set > include/asm-generic/atomic-long.h:56 [inline] > #0: ("%s""netns"){.+.+.+}, at: [] set_work_data > kernel/workqueue.c:617 [inline] > #0: ("%s""netns"){.+.+.+}, at: [] > set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline] > #0: ("%s""netns"){.+.+.+}, at: [] > process_one_work+0xab3/0x1c10 kernel/workqueue.c:2089 > #1: (net_cleanup_work){+.+.+.}, at: [] > process_one_work+0xb07/0x1c10 kernel/workqueue.c:2093 > #2: (net_mutex){+.+.+.}, at: [] > cleanup_net+0x22b/0xa90 net/core/net_namespace.c:429 > #3: (rtnl_mutex){+.+.+.}, at: [] > rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70 >
Re: crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutex
On (03/14/17 09:14), Dmitry Vyukov wrote: > Another one now involving rds_tcp_listen_stop : > kworker/u4:1/19 is trying to acquire lock: > (sk_lock-AF_INET){+.+.+.}, at: [] lock_sock > include/net/sock.h:1460 [inline] > (sk_lock-AF_INET){+.+.+.}, at: [] > rds_tcp_listen_stop+0x5c/0x150 net/rds/tcp_listen.c:288 > > but task is already holding lock: > (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20 > net/core/rtnetlink.c:70 Is this also a false positive? genl_lock_dumpit takes the genl_lock and then waits on the rtnl_lock (e.g., out of tipc_nl_bearer_dump). netdev_run_todo takes the rtnl_lock and then wants lock_sock() for the TCP/IPv4 socket. Why is lockdep seeing a circular dependancy here? Same pattern seems to be happening for http://www.spinics.net/lists/netdev/msg423368.html and maybe also http://www.spinics.net/lists/netdev/msg423323.html? --Sowmini > Chain exists of: > sk_lock-AF_INET --> genl_mutex --> rtnl_mutex > > Possible unsafe locking scenario: > >CPU0CPU1 > > lock(rtnl_mutex); >lock(genl_mutex); >lock(rtnl_mutex); > lock(sk_lock-AF_INET); > > *** DEADLOCK *** > > 4 locks held by kworker/u4:1/19: > #0: ("%s""netns"){.+.+.+}, at: [] > __write_once_size include/linux/compiler.h:283 [inline] > #0: ("%s""netns"){.+.+.+}, at: [] atomic64_set > arch/x86/include/asm/atomic64_64.h:33 [inline] > #0: ("%s""netns"){.+.+.+}, at: [] atomic_long_set > include/asm-generic/atomic-long.h:56 [inline] > #0: ("%s""netns"){.+.+.+}, at: [] set_work_data > kernel/workqueue.c:617 [inline] > #0: ("%s""netns"){.+.+.+}, at: [] > set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline] > #0: ("%s""netns"){.+.+.+}, at: [] > process_one_work+0xab3/0x1c10 kernel/workqueue.c:2089 > #1: (net_cleanup_work){+.+.+.}, at: [] > process_one_work+0xb07/0x1c10 kernel/workqueue.c:2093 > #2: (net_mutex){+.+.+.}, at: [] > cleanup_net+0x22b/0xa90 net/core/net_namespace.c:429 > #3: (rtnl_mutex){+.+.+.}, at: [] > rtnl_lock+0x17/0x20 net/core/rtnetlink.c:70 >
Re: net/rds: use-after-free in inet_create
Actually, I'm not sure if I can assert that these are all manifestations of the same bug- was a netns-delete involved in this one as well? I see: > BUG: KASAN: use-after-free in memcmp+0xe3/0x160 lib/string.c:768 at : > memcmp+0xe3/0x160 lib/string.c:768 : > rds_find_bound+0x4fe/0x8a0 net/rds/bind.c:63 > rds_recv_incoming+0x5f3/0x12c0 net/rds/recv.c:349 > rds_loop_xmit+0x1c5/0x490 net/rds/loop.c:82 : This appears to be for a looped back packet, and looks like there are problems with some rds_sock that got removed from the bind_hash_table.. According to the report, socket was created at > Allocated: > PID = 5235 : > sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1334 > sk_alloc+0x105/0x1010 net/core/sock.c:1396 > rds_create+0x11c/0x600 net/rds/af_rds.c:504 and closed at some point: > Freed: > PID = 5235 : > rds_release+0x3a1/0x4d0 net/rds/af_rds.c:89 > sock_release+0x8d/0x1e0 net/socket.c:599 This is all uspace created rds sockets, and while there may be an unrelated bug here, I'm not sure I see the netns/kernel-socket connection.. can you please clarify if this was also seen in some netns context? --Sowmini
Re: net/rds: use-after-free in inet_create
Actually, I'm not sure if I can assert that these are all manifestations of the same bug- was a netns-delete involved in this one as well? I see: > BUG: KASAN: use-after-free in memcmp+0xe3/0x160 lib/string.c:768 at : > memcmp+0xe3/0x160 lib/string.c:768 : > rds_find_bound+0x4fe/0x8a0 net/rds/bind.c:63 > rds_recv_incoming+0x5f3/0x12c0 net/rds/recv.c:349 > rds_loop_xmit+0x1c5/0x490 net/rds/loop.c:82 : This appears to be for a looped back packet, and looks like there are problems with some rds_sock that got removed from the bind_hash_table.. According to the report, socket was created at > Allocated: > PID = 5235 : > sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1334 > sk_alloc+0x105/0x1010 net/core/sock.c:1396 > rds_create+0x11c/0x600 net/rds/af_rds.c:504 and closed at some point: > Freed: > PID = 5235 : > rds_release+0x3a1/0x4d0 net/rds/af_rds.c:89 > sock_release+0x8d/0x1e0 net/socket.c:599 This is all uspace created rds sockets, and while there may be an unrelated bug here, I'm not sure I see the netns/kernel-socket connection.. can you please clarify if this was also seen in some netns context? --Sowmini
Re: net/rds: use-after-free in inet_create
On (02/28/17 18:45), Dmitry Vyukov wrote: > > Yes, I can now apply custom patches to the bots. However, it fired > only 3 times, so it will give weak signal. But at least it will test > that the patch does not cause other bad things. Ok, let me do my bit of homework on this one and get back to you (probably by tomorrow)..
Re: net/rds: use-after-free in inet_create
On (02/28/17 18:45), Dmitry Vyukov wrote: > > Yes, I can now apply custom patches to the bots. However, it fired > only 3 times, so it will give weak signal. But at least it will test > that the patch does not cause other bad things. Ok, let me do my bit of homework on this one and get back to you (probably by tomorrow)..
Re: net/rds: use-after-free in inet_create
On (02/28/17 17:51), Dmitry Vyukov wrote: > Searching other crashes for "net/rds" I found 2 more crashes that may > be related. They suggest that the delayed works are not properly > stopped when the socket is destroyed. That would explain how > rds_connect_worker accesses freed net, right? yes, I think we may want to explicitly cancel this workq.. this in rds_conn_destroy(). I'm trying to build/sanity-test (if lucky, reproduce the bug) as I send this out.. let me get back to you.. If I have a patch against net-next, would you be willing/able to try it out? given that this does not show up on demand, I'm not sure how we can check that "the fix worked".. --Sowmini
Re: net/rds: use-after-free in inet_create
On (02/28/17 17:51), Dmitry Vyukov wrote: > Searching other crashes for "net/rds" I found 2 more crashes that may > be related. They suggest that the delayed works are not properly > stopped when the socket is destroyed. That would explain how > rds_connect_worker accesses freed net, right? yes, I think we may want to explicitly cancel this workq.. this in rds_conn_destroy(). I'm trying to build/sanity-test (if lucky, reproduce the bug) as I send this out.. let me get back to you.. If I have a patch against net-next, would you be willing/able to try it out? given that this does not show up on demand, I'm not sure how we can check that "the fix worked".. --Sowmini
Re: net/rds: use-after-free in inet_create
On (02/28/17 17:32), Dmitry Vyukov wrote: > Not reproducible so far. > > rds is compiled into kernel (no modules): > CONFIG_RDS=y > CONFIG_RDS_TCP=y I see. So if it never gets unloaded, the rds_connections "should" be around forever.. let me inspect code and see if I spot some race-window.. > Also fuzzer actively creates and destroys namespaces. > Yes, I don't see socket(0x15) in the log. Probably it was truncated. I see. May be useful if we coudl get a crash dump to see what other threads were going on (might give a hint about which threads were racing). I'll try reproducing this at my end too. --Sowmini
Re: net/rds: use-after-free in inet_create
On (02/28/17 17:32), Dmitry Vyukov wrote: > Not reproducible so far. > > rds is compiled into kernel (no modules): > CONFIG_RDS=y > CONFIG_RDS_TCP=y I see. So if it never gets unloaded, the rds_connections "should" be around forever.. let me inspect code and see if I spot some race-window.. > Also fuzzer actively creates and destroys namespaces. > Yes, I don't see socket(0x15) in the log. Probably it was truncated. I see. May be useful if we coudl get a crash dump to see what other threads were going on (might give a hint about which threads were racing). I'll try reproducing this at my end too. --Sowmini
Re: net/rds: use-after-free in inet_create
On (02/28/17 16:49), Dmitry Vyukov wrote: > > Grepping "socket" there, it was doing lots of things with sockets. Are > we looking for some particular socket type? If there are few programs > that create sockets of that type, then we can narrow down the set: Yes, we are looking for PF_RDS/AF_RDS - this should be #define AF_RDS 21 /* RDS sockets */ I see PF_KCM there (value 41) but no instances of 0x15.. how did the rds_connect_worker thread get kicked off at all? the way this is supposed to work is 1. someone modprobes rds-tcp 2. app tries to do rds_sendmsg to some ip address in a netns - this triggers the creation of an rds_connection, and subsequent kernel socket TCP connection threads (i.e., rds_connect_worker) for that netns 3. if you unload rds-tcp, the module_unload should do all the cleanup needed via rds_tcp_conn_paths_destroy. This is done Its not clear to me that the test is doing any of this... is this reproducible? let me check if there is some race window where we can restart a connection attempt when rds_tcp_kill_sock assumes that the connect worker has been quiesced.. --Sowmini
Re: net/rds: use-after-free in inet_create
On (02/28/17 16:49), Dmitry Vyukov wrote: > > Grepping "socket" there, it was doing lots of things with sockets. Are > we looking for some particular socket type? If there are few programs > that create sockets of that type, then we can narrow down the set: Yes, we are looking for PF_RDS/AF_RDS - this should be #define AF_RDS 21 /* RDS sockets */ I see PF_KCM there (value 41) but no instances of 0x15.. how did the rds_connect_worker thread get kicked off at all? the way this is supposed to work is 1. someone modprobes rds-tcp 2. app tries to do rds_sendmsg to some ip address in a netns - this triggers the creation of an rds_connection, and subsequent kernel socket TCP connection threads (i.e., rds_connect_worker) for that netns 3. if you unload rds-tcp, the module_unload should do all the cleanup needed via rds_tcp_conn_paths_destroy. This is done Its not clear to me that the test is doing any of this... is this reproducible? let me check if there is some race window where we can restart a connection attempt when rds_tcp_kill_sock assumes that the connect worker has been quiesced.. --Sowmini
Re: net/rds: use-after-free in inet_create
On (02/28/17 15:22), Dmitry Vyukov wrote: > > Hello, > > I've got the following report while running syzkaller fuzzer on > linux-next/8d01c069486aca75b8f6018a759215b0ed0c91f0. So far it > happened only once. net was somehow deleted from underneath > inet_create. I've noticed that rds uses sock_create_kern which does > not take net reference. What is that that must keep net alive then? The rds_connection (which is where the net pointer is being obtained from) should keep the connection alive. Did you have the rds[_tcp] modules loaded at the time of failure? Were there kernel tcp sockets to/from the 16385 port? any hints on what else the test was doing (was it running a userspace RDS application that triggered the kernel TCP connection attempt in the first place)? --Sowmini
Re: net/rds: use-after-free in inet_create
On (02/28/17 15:22), Dmitry Vyukov wrote: > > Hello, > > I've got the following report while running syzkaller fuzzer on > linux-next/8d01c069486aca75b8f6018a759215b0ed0c91f0. So far it > happened only once. net was somehow deleted from underneath > inet_create. I've noticed that rds uses sock_create_kern which does > not take net reference. What is that that must keep net alive then? The rds_connection (which is where the net pointer is being obtained from) should keep the connection alive. Did you have the rds[_tcp] modules loaded at the time of failure? Were there kernel tcp sockets to/from the 16385 port? any hints on what else the test was doing (was it running a userspace RDS application that triggered the kernel TCP connection attempt in the first place)? --Sowmini
Re: [net-next][PATCH] RDS: keep data type consistent in the user visible header
On (02/20/17 10:19), David Miller wrote: > > The correct fix it to use "__u8", "__u64", etc. So the rest of rds.h uses uint8_t, uint32_t etc Perhaps (I'm not sure of the origins) this was because of the shared OpenIB.org BSD license etc using __u8 in one place and uint8_t in another would seem inconsistent, at the very least. --Sowmini
Re: [net-next][PATCH] RDS: keep data type consistent in the user visible header
On (02/20/17 10:19), David Miller wrote: > > The correct fix it to use "__u8", "__u64", etc. So the rest of rds.h uses uint8_t, uint32_t etc Perhaps (I'm not sure of the origins) this was because of the shared OpenIB.org BSD license etc using __u8 in one place and uint8_t in another would seem inconsistent, at the very least. --Sowmini
Re: net/packet: use-after-free in packet_rcv_fanout
On (02/10/17 10:00), Cong Wang wrote: > My understanding about the race here is packet_release() doesn't > wait for flying packets correctly, which leads to a flying packet still > refers to the struct sock which is being released. > > This could happen because struct packet_fanout is refcn'ted, it is : > At least I believe this explains the crash Dmitry reported. hmm, the proof of the pudding is in the eating- would be good to be able to reliably reproduce this somewhere (thus proving that root-cause analysis is rock-solid), maybe by introducing artificial delays to slow down paths.. I'm travelling at the moment but may be able to give this (try to reproduce it reliably) next week. --Sowmini
Re: net/packet: use-after-free in packet_rcv_fanout
On (02/10/17 10:00), Cong Wang wrote: > My understanding about the race here is packet_release() doesn't > wait for flying packets correctly, which leads to a flying packet still > refers to the struct sock which is being released. > > This could happen because struct packet_fanout is refcn'ted, it is : > At least I believe this explains the crash Dmitry reported. hmm, the proof of the pudding is in the eating- would be good to be able to reliably reproduce this somewhere (thus proving that root-cause analysis is rock-solid), maybe by introducing artificial delays to slow down paths.. I'm travelling at the moment but may be able to give this (try to reproduce it reliably) next week. --Sowmini
Re: net/packet: use-after-free in packet_rcv_fanout
On (02/09/17 19:19), Eric Dumazet wrote: > > More likely the bug is in fanout_add(), with a buggy sequence in error > case, and not correct locking. > > kfree(po->rollover); > po->rollover = NULL; > > Two cpus entering fanout_add() (using the same af_packet socket, > syzkaller courtesy...) might both see po->fanout being NULL. > > Then they grab the mutex. Too late... I'm not sure I follow- aiui the panic was in acceessing the sk_receive_queue.lock in a socket that had been closed earlier. I think the assumption is that rcu_read_lock_bh in __dev_queue_xmit (and rcu_read_lock in dev_queue_xmit_nit?) should make sure that the nit packet delivery can be done safely, and the synchronize_net in packet_release() makes sure that the Tx paths are quiesced before freeing the socket. What is the race-hole here? Does it have to do with the _bh and softirq context, somehow? --Sowmini
Re: net/packet: use-after-free in packet_rcv_fanout
On (02/09/17 19:19), Eric Dumazet wrote: > > More likely the bug is in fanout_add(), with a buggy sequence in error > case, and not correct locking. > > kfree(po->rollover); > po->rollover = NULL; > > Two cpus entering fanout_add() (using the same af_packet socket, > syzkaller courtesy...) might both see po->fanout being NULL. > > Then they grab the mutex. Too late... I'm not sure I follow- aiui the panic was in acceessing the sk_receive_queue.lock in a socket that had been closed earlier. I think the assumption is that rcu_read_lock_bh in __dev_queue_xmit (and rcu_read_lock in dev_queue_xmit_nit?) should make sure that the nit packet delivery can be done safely, and the synchronize_net in packet_release() makes sure that the Tx paths are quiesced before freeing the socket. What is the race-hole here? Does it have to do with the _bh and softirq context, somehow? --Sowmini
Re: net/packet: use-after-free in packet_rcv_fanout
On (02/09/17 14:14), Dmitry Vyukov wrote: > > Call Trace: : > packet_rcv_has_room+0x25/0xb0 net/packet/af_packet.c:1308 > fanout_demux_rollover+0x3bb/0x6b0 net/packet/af_packet.c:1388 > packet_rcv_fanout+0x674/0x800 net/packet/af_packet.c:1490 > dev_queue_xmit_nit+0x73a/0xa90 net/core/dev.c:1898 : > tcp_sendmsg_fastopen net/ipv4/tcp.c:1110 [inline] : looks like a race between a NIT socket (tcpdump, maybe?) that is closing, and a standard tcp socket.. packet_release() takes the po->bind_lock to remove the socket from the ptype_all NIT queue. but how does that sync with the Tx path for other af_inet/af_inet6 sockets? --Sowmini
Re: net/packet: use-after-free in packet_rcv_fanout
On (02/09/17 14:14), Dmitry Vyukov wrote: > > Call Trace: : > packet_rcv_has_room+0x25/0xb0 net/packet/af_packet.c:1308 > fanout_demux_rollover+0x3bb/0x6b0 net/packet/af_packet.c:1388 > packet_rcv_fanout+0x674/0x800 net/packet/af_packet.c:1490 > dev_queue_xmit_nit+0x73a/0xa90 net/core/dev.c:1898 : > tcp_sendmsg_fastopen net/ipv4/tcp.c:1110 [inline] : looks like a race between a NIT socket (tcpdump, maybe?) that is closing, and a standard tcp socket.. packet_release() takes the po->bind_lock to remove the socket from the ptype_all NIT queue. but how does that sync with the Tx path for other af_inet/af_inet6 sockets? --Sowmini
Re: [PATCH v2 net-next 6/9] sunvnet: straighten up message event handling logic
On (02/08/17 08:28), Shannon Nelson wrote: > The existing code does this as well - if it first finds a RESET, it handles > it then hits the return 0. Next if it finds the UP, it does the goto back > to the ldc_ctrl: to process, and hits the same return 0. Only if neither of > these bits have been seen does the code move on to process the DATA_READY > event. > > If we're seeing cases of both UP and DATA_READY, then yes we'll wnat to look > at changing this logic. I think that should be a separate patch. > Ok. probably we just make one redundant loop out and back into the function. Other than that, the patchset looks good to me. --Sowmini
Re: [PATCH v2 net-next 6/9] sunvnet: straighten up message event handling logic
On (02/08/17 08:28), Shannon Nelson wrote: > The existing code does this as well - if it first finds a RESET, it handles > it then hits the return 0. Next if it finds the UP, it does the goto back > to the ldc_ctrl: to process, and hits the same return 0. Only if neither of > these bits have been seen does the code move on to process the DATA_READY > event. > > If we're seeing cases of both UP and DATA_READY, then yes we'll wnat to look > at changing this logic. I think that should be a separate patch. > Ok. probably we just make one redundant loop out and back into the function. Other than that, the patchset looks good to me. --Sowmini
Re: [PATCH v2 net-next 6/9] sunvnet: straighten up message event handling logic
On (02/07/17 14:12), Shannon Nelson wrote: > + > + /* we don't expect any other bits */ > + BUG_ON(port->rx_event & ~(LDC_EVENT_DATA_READY | > + LDC_EVENT_RESET | > + LDC_EVENT_UP)); > + > + /* RESET takes precedent over any other event */ > + if (port->rx_event & LDC_EVENT_RESET) { : > port->rx_event = 0; > return 0; > } > + if (port->rx_event & LDC_EVENT_UP) { > + vio_link_state_change(vio, LDC_EVENT_UP); > + port->rx_event = 0; > + return 0; > + } > > err = 0; > tx_wakeup = 0; IIRC there were timing-related situations where you can get woken up with both UP and DATA_READY, and if my reading of your patch is correct, we would ignore the DATA_READY, and return, right? --Sowmini
Re: [PATCH v2 net-next 6/9] sunvnet: straighten up message event handling logic
On (02/07/17 14:12), Shannon Nelson wrote: > + > + /* we don't expect any other bits */ > + BUG_ON(port->rx_event & ~(LDC_EVENT_DATA_READY | > + LDC_EVENT_RESET | > + LDC_EVENT_UP)); > + > + /* RESET takes precedent over any other event */ > + if (port->rx_event & LDC_EVENT_RESET) { : > port->rx_event = 0; > return 0; > } > + if (port->rx_event & LDC_EVENT_UP) { > + vio_link_state_change(vio, LDC_EVENT_UP); > + port->rx_event = 0; > + return 0; > + } > > err = 0; > tx_wakeup = 0; IIRC there were timing-related situations where you can get woken up with both UP and DATA_READY, and if my reading of your patch is correct, we would ignore the DATA_READY, and return, right? --Sowmini
Re: [rds-devel] [PATCH 2/2] rds: Remove duplicate prefix from rds_conn_path_error use
On (10/15/16 11:53), Joe Perches wrote: > > rds_conn_path_error already prefixes "RDS:" to the output. > > Signed-off-by: Joe Perches <j...@perches.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
Re: [rds-devel] [PATCH 1/2] rds: Remove unused rds_conn_error
On (10/15/16 11:53), Joe Perches wrote: > This macro's last use was removed in commit d769ef81d5b59 > ("RDS: Update rds_conn_shutdown to work with rds_conn_path") > so make the macro and the __rds_conn_error function definition > and declaration disappear. > > Signed-off-by: Joe Perches <j...@perches.com> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com>
Re: [rds-devel] [PATCH 2/2] rds: Remove duplicate prefix from rds_conn_path_error use
On (10/15/16 11:53), Joe Perches wrote: > > rds_conn_path_error already prefixes "RDS:" to the output. > > Signed-off-by: Joe Perches Acked-by: Sowmini Varadhan
Re: [rds-devel] [PATCH 1/2] rds: Remove unused rds_conn_error
On (10/15/16 11:53), Joe Perches wrote: > This macro's last use was removed in commit d769ef81d5b59 > ("RDS: Update rds_conn_shutdown to work with rds_conn_path") > so make the macro and the __rds_conn_error function definition > and declaration disappear. > > Signed-off-by: Joe Perches Acked-by: Sowmini Varadhan
Re: [PATCH v05 08/72] rds.h: use __u8, __u16, __s16, __u32 and __s64 from linux/types.h
On (08/22/16 20:32), Mikko Rapeli wrote: > > Fixes userspace compilation errors like: > > linux/rds.h:96:2: error: unknown type name ‘uint8_t’ > > Signed-off-by: Mikko Rapeli <mikko.rap...@iki.fi> Acked-by: Sowmini Varadhan <sowmini.varad...@oracle.com> I think we discussed this some time before, and I certainly dont have any religious opinions about it, but I would like to point out that this means that the exported rds.h will expose __ to user space applications. --Sowmini
Re: [PATCH v05 08/72] rds.h: use __u8, __u16, __s16, __u32 and __s64 from linux/types.h
On (08/22/16 20:32), Mikko Rapeli wrote: > > Fixes userspace compilation errors like: > > linux/rds.h:96:2: error: unknown type name ‘uint8_t’ > > Signed-off-by: Mikko Rapeli Acked-by: Sowmini Varadhan I think we discussed this some time before, and I certainly dont have any religious opinions about it, but I would like to point out that this means that the exported rds.h will expose __ to user space applications. --Sowmini
Re: [PATCH v2] lib/mpi: Fix kernel unaligned access in mpi_write_to_sgl
On (05/04/16 10:32), Herbert Xu wrote: > > Please base it on cryptodev. > Looks like this got fixed in cryptodev by commit cece762f6f3c ("lib/mpi: mpi_write_sgl(): fix out-of-bounds stack access") Thanks, --Sowmini
Re: [PATCH v2] lib/mpi: Fix kernel unaligned access in mpi_write_to_sgl
On (05/04/16 10:32), Herbert Xu wrote: > > Please base it on cryptodev. > Looks like this got fixed in cryptodev by commit cece762f6f3c ("lib/mpi: mpi_write_sgl(): fix out-of-bounds stack access") Thanks, --Sowmini
Re: [PATCH v2] lib/mpi: Fix kernel unaligned access in mpi_write_to_sgl
On (05/03/16 16:12), Herbert Xu wrote: > > Sorry, but your patch doesn't apply against the current tree at all. > Please rebase it if it is still needed. Hello, I had based my patch off of net-next, which is where I do my work. I'd be happy to rebase it on the "current tree", but given that mpicoder.c does not have an entry in MAINTAINERS, please clarify what you mean by "current tree" in this case. do you mean git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git or git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git or git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (which are the three possible candidates I can see in MAINTAINERS). It would be nice to get this bug fixed, since the fix is fairly obvious, and the nuisance factor from the generated "unaligned access" messages on the impacted non-intel platforms is quite high, thanks, --Sowmini
Re: [PATCH v2] lib/mpi: Fix kernel unaligned access in mpi_write_to_sgl
On (05/03/16 16:12), Herbert Xu wrote: > > Sorry, but your patch doesn't apply against the current tree at all. > Please rebase it if it is still needed. Hello, I had based my patch off of net-next, which is where I do my work. I'd be happy to rebase it on the "current tree", but given that mpicoder.c does not have an entry in MAINTAINERS, please clarify what you mean by "current tree" in this case. do you mean git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git or git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git or git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git (which are the three possible candidates I can see in MAINTAINERS). It would be nice to get this bug fixed, since the fix is fairly obvious, and the nuisance factor from the generated "unaligned access" messages on the impacted non-intel platforms is quite high, thanks, --Sowmini
Re: [PATCH v2] lib/mpi: Fix kernel unaligned access in mpi_write_to_sgl
On (04/28/16 09:01), Herbert Xu wrote: > Subject: Re: [PATCH v2] lib/mpi: Fix kernel unaligned access in > mpi_write_to_sgl > > Please cc linux-crypto. Just bounced the message to linux-crypto as well. I think get_maintainers.pl might also need to be updated to generate this automatically. Thanks, --Sowmini
Re: [PATCH v2] lib/mpi: Fix kernel unaligned access in mpi_write_to_sgl
On (04/28/16 09:01), Herbert Xu wrote: > Subject: Re: [PATCH v2] lib/mpi: Fix kernel unaligned access in > mpi_write_to_sgl > > Please cc linux-crypto. Just bounced the message to linux-crypto as well. I think get_maintainers.pl might also need to be updated to generate this automatically. Thanks, --Sowmini
[PATCH v2] lib/mpi: Fix kernel unaligned access in mpi_write_to_sgl
Commit 2d4d1eea540b ("lib/mpi: Add mpi sgl helpers") added mpi_write_to_sgl() which generates traps due to unaligned access on some platforms like sparc. Fix this by using the get_unaligned* and put_unaligned* functions. Fixes: 2d4d1eea540b ("lib/mpi: Add mpi sgl helpers") Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- v2: tadeusz.struk comments: Predicate on BYTES_PER_MPI_LIMB. lib/mpi/mpicoder.c | 21 + 1 files changed, 17 insertions(+), 4 deletions(-) diff --git a/lib/mpi/mpicoder.c b/lib/mpi/mpicoder.c index eb15e7d..b61eb6b 100644 --- a/lib/mpi/mpicoder.c +++ b/lib/mpi/mpicoder.c @@ -21,6 +21,7 @@ #include #include #include "mpi-internal.h" +#include #define MAX_EXTERN_MPI_BITS 16384 @@ -405,10 +406,22 @@ int mpi_write_to_sgl(MPI a, struct scatterlist *sgl, unsigned *nbytes, p -= sizeof(alimb); continue; } else { - mpi_limb_t *limb1 = (void *)p - sizeof(alimb); - mpi_limb_t *limb2 = (void *)p - sizeof(alimb) - + lzeros; - *limb1 = *limb2; + mpi_limb_t tmp; +#if BYTES_PER_MPI_LIMB == 4 + tmp = get_unaligned_be32((void *)p - +sizeof(alimb) + +lzeros); + put_unaligned_be32(tmp, (void *)p - + sizeof(alimb)); +#elif BYTES_PER_MPI_LIMB == 8 + tmp = get_unaligned_be64((void *)p - +sizeof(alimb) + +lzeros); + put_unaligned_be64(tmp, (void *)p - + sizeof(alimb)); +#else +#error please implement for this limb size. +#endif p -= lzeros; y = lzeros; } -- 1.7.1
[PATCH v2] lib/mpi: Fix kernel unaligned access in mpi_write_to_sgl
Commit 2d4d1eea540b ("lib/mpi: Add mpi sgl helpers") added mpi_write_to_sgl() which generates traps due to unaligned access on some platforms like sparc. Fix this by using the get_unaligned* and put_unaligned* functions. Fixes: 2d4d1eea540b ("lib/mpi: Add mpi sgl helpers") Signed-off-by: Sowmini Varadhan --- v2: tadeusz.struk comments: Predicate on BYTES_PER_MPI_LIMB. lib/mpi/mpicoder.c | 21 + 1 files changed, 17 insertions(+), 4 deletions(-) diff --git a/lib/mpi/mpicoder.c b/lib/mpi/mpicoder.c index eb15e7d..b61eb6b 100644 --- a/lib/mpi/mpicoder.c +++ b/lib/mpi/mpicoder.c @@ -21,6 +21,7 @@ #include #include #include "mpi-internal.h" +#include #define MAX_EXTERN_MPI_BITS 16384 @@ -405,10 +406,22 @@ int mpi_write_to_sgl(MPI a, struct scatterlist *sgl, unsigned *nbytes, p -= sizeof(alimb); continue; } else { - mpi_limb_t *limb1 = (void *)p - sizeof(alimb); - mpi_limb_t *limb2 = (void *)p - sizeof(alimb) - + lzeros; - *limb1 = *limb2; + mpi_limb_t tmp; +#if BYTES_PER_MPI_LIMB == 4 + tmp = get_unaligned_be32((void *)p - +sizeof(alimb) + +lzeros); + put_unaligned_be32(tmp, (void *)p - + sizeof(alimb)); +#elif BYTES_PER_MPI_LIMB == 8 + tmp = get_unaligned_be64((void *)p - +sizeof(alimb) + +lzeros); + put_unaligned_be64(tmp, (void *)p - + sizeof(alimb)); +#else +#error please implement for this limb size. +#endif p -= lzeros; y = lzeros; } -- 1.7.1
Re: [PATCH] lib/mpi: Fix kernel unaligned acces in mpi_write_to_sgl
On (04/21/16 10:23), Tadeusz Struk wrote: > > What if the mpi_limb_t will happen to be 64 bit? > Thanks, When I checked this with cscope, I found typedef unsigned long int mpi_limb_t; thus I used the *32 functions. But you obviously know better, since you wrote this code (and bug). If you anticipate that mpi_limb_t in some environment today, I can check for sizeof(mpi_limb_t), and predicate it to use the *32 or *64 functions based on the result. Do you think that is necessary? Regards, --Sowmini
Re: [PATCH] lib/mpi: Fix kernel unaligned acces in mpi_write_to_sgl
On (04/21/16 10:23), Tadeusz Struk wrote: > > What if the mpi_limb_t will happen to be 64 bit? > Thanks, When I checked this with cscope, I found typedef unsigned long int mpi_limb_t; thus I used the *32 functions. But you obviously know better, since you wrote this code (and bug). If you anticipate that mpi_limb_t in some environment today, I can check for sizeof(mpi_limb_t), and predicate it to use the *32 or *64 functions based on the result. Do you think that is necessary? Regards, --Sowmini
[PATCH] lib/mpi: Fix kernel unaligned acces in mpi_write_to_sgl
Commit 2d4d1eea540b ("lib/mpi: Add mpi sgl helpers") added mpi_write_to_sgl() which generates traps due to unaligned access on some platforms like sparc. Fix this by using the get_unaligned* and put_unaligned* functions. Fixes: 2d4d1eea540b ("lib/mpi: Add mpi sgl helpers") Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- lib/mpi/mpicoder.c | 12 1 files changed, 8 insertions(+), 4 deletions(-) diff --git a/lib/mpi/mpicoder.c b/lib/mpi/mpicoder.c index eb15e7d..6771378 100644 --- a/lib/mpi/mpicoder.c +++ b/lib/mpi/mpicoder.c @@ -21,6 +21,7 @@ #include #include #include "mpi-internal.h" +#include #define MAX_EXTERN_MPI_BITS 16384 @@ -405,10 +406,13 @@ int mpi_write_to_sgl(MPI a, struct scatterlist *sgl, unsigned *nbytes, p -= sizeof(alimb); continue; } else { - mpi_limb_t *limb1 = (void *)p - sizeof(alimb); - mpi_limb_t *limb2 = (void *)p - sizeof(alimb) - + lzeros; - *limb1 = *limb2; + mpi_limb_t tmp; + + tmp = get_unaligned_be32((void *)p - +sizeof(alimb) + +lzeros); + put_unaligned_be32(tmp, + (void *)p - sizeof(alimb)); p -= lzeros; y = lzeros; } -- 1.7.1
[PATCH] lib/mpi: Fix kernel unaligned acces in mpi_write_to_sgl
Commit 2d4d1eea540b ("lib/mpi: Add mpi sgl helpers") added mpi_write_to_sgl() which generates traps due to unaligned access on some platforms like sparc. Fix this by using the get_unaligned* and put_unaligned* functions. Fixes: 2d4d1eea540b ("lib/mpi: Add mpi sgl helpers") Signed-off-by: Sowmini Varadhan --- lib/mpi/mpicoder.c | 12 1 files changed, 8 insertions(+), 4 deletions(-) diff --git a/lib/mpi/mpicoder.c b/lib/mpi/mpicoder.c index eb15e7d..6771378 100644 --- a/lib/mpi/mpicoder.c +++ b/lib/mpi/mpicoder.c @@ -21,6 +21,7 @@ #include #include #include "mpi-internal.h" +#include #define MAX_EXTERN_MPI_BITS 16384 @@ -405,10 +406,13 @@ int mpi_write_to_sgl(MPI a, struct scatterlist *sgl, unsigned *nbytes, p -= sizeof(alimb); continue; } else { - mpi_limb_t *limb1 = (void *)p - sizeof(alimb); - mpi_limb_t *limb2 = (void *)p - sizeof(alimb) - + lzeros; - *limb1 = *limb2; + mpi_limb_t tmp; + + tmp = get_unaligned_be32((void *)p - +sizeof(alimb) + +lzeros); + put_unaligned_be32(tmp, + (void *)p - sizeof(alimb)); p -= lzeros; y = lzeros; } -- 1.7.1
[PATCH v9] i40e: Look up MAC address in Open Firmware or IDPROM
This is the i40e equivalent of commit c762dff24c06 ("ixgbe: Look up MAC address in Open Firmware or IDPROM"). As with that fix, attempt to look up the MAC address in Open Firmware on systems that support it, and use IDPROM on SPARC if no OF address is found. In the case of the i40e there is an assumption that the default mac address has already been set up as the primary mac filter on probe, so if this filter is obtained from the Open Firmware or IDPROM, an explicit write is needed via i40e_aq_mac_address_write() and i40e_aq_add_macvlan() invocation. The I40E_FLAG_PF_MAC flag in the platform-private i40e_pf structure tracks whether a platform-specific mac address was found, in which case calls to i40e_aq_mac_address_write() and i40e_aq_add_macvlan() will be triggered. Reviewed-by: Martin K. Petersen Signed-off-by: Sowmini Varadhan --- v2, v3: Andy Shevchenko comments v4: Shannon Nelson review: explicitly set up mac filters before register_netdev v5: Shannon Nelson code style comments v6: Shannon Nelson code style comments v7: Ensure that i40e_macaddr_init() is called only for VSI_MAIN, and only if the mac address is not the default. Some additional code-refactoring based on comments from Shannon Nelson v8: use pf->flags instead of bit value to track if we found a platform-specific macaddr. Other code-style comments from Shannon Nelson v9: David Miller, Shannon Nelson comments for i40e_get_platform_mac_addr drivers/net/ethernet/intel/i40e/i40e.h |1 + drivers/net/ethernet/intel/i40e/i40e_main.c | 90 +++ 2 files changed, 91 insertions(+), 0 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index 0b9537b..b85eaee 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -337,6 +337,7 @@ struct i40e_pf { #define I40E_FLAG_LINK_POLLING_ENABLED BIT_ULL(39) #define I40E_FLAG_VEB_MODE_ENABLED BIT_ULL(40) #define I40E_FLAG_NO_PCI_LINK_CHECKBIT_ULL(42) +#define I40E_FLAG_PF_MAC BIT_ULL(50) /* tracks features that get auto disabled by errors */ u64 auto_disable_flags; diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 9e6268b..3e2d4e0 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -24,6 +24,15 @@ * **/ +#include +#include +#include + +#ifdef CONFIG_SPARC +#include +#include +#endif + /* Local includes */ #include "i40e.h" #include "i40e_diag.h" @@ -9387,6 +9396,44 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi) } /** + * i40e_macaddr_init - explicitly write the mac address filters. + * + * @vsi: pointer to the vsi. + * @macaddr: the MAC address + * + * This is needed when the macaddr has been obtained by other + * means than the default, e.g., from Open Firmware or IDPROM. + * Returns 0 on success, negative on failure + **/ +static int i40e_macaddr_init(struct i40e_vsi *vsi, u8 *macaddr) +{ + int ret; + struct i40e_aqc_add_macvlan_element_data element; + + ret = i40e_aq_mac_address_write(>back->hw, + I40E_AQC_WRITE_TYPE_LAA_WOL, + macaddr, NULL); + if (ret) { + dev_info(>back->pdev->dev, +"Addr change for VSI failed: %d\n", ret); + return -EADDRNOTAVAIL; + } + + memset(, 0, sizeof(element)); + ether_addr_copy(element.mac_addr, macaddr); + element.flags = cpu_to_le16(I40E_AQC_MACVLAN_ADD_PERFECT_MATCH); + ret = i40e_aq_add_macvlan(>back->hw, vsi->seid, , 1, NULL); + if (ret) { + dev_info(>back->pdev->dev, +"add filter failed err %s aq_err %s\n", +i40e_stat_str(>back->hw, ret), +i40e_aq_str(>back->hw, +vsi->back->hw.aq.asq_last_status)); + } + return ret; +} + +/** * i40e_vsi_setup - Set up a VSI by a given type * @pf: board private structure * @type: VSI type @@ -9510,6 +9557,17 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type, switch (vsi->type) { /* setup the netdev if needed */ case I40E_VSI_MAIN: + /* Apply relevant filters if a platform-specific mac +* address was selected. +*/ + if (!!(pf->flags & I40E_FLAG_PF_MAC)) { + ret = i40e_macaddr_init(vsi, pf->hw.mac.addr); + if (ret) { + dev_warn(>pdev->dev,
[PATCH v9] i40e: Look up MAC address in Open Firmware or IDPROM
This is the i40e equivalent of commit c762dff24c06 ("ixgbe: Look up MAC address in Open Firmware or IDPROM"). As with that fix, attempt to look up the MAC address in Open Firmware on systems that support it, and use IDPROM on SPARC if no OF address is found. In the case of the i40e there is an assumption that the default mac address has already been set up as the primary mac filter on probe, so if this filter is obtained from the Open Firmware or IDPROM, an explicit write is needed via i40e_aq_mac_address_write() and i40e_aq_add_macvlan() invocation. The I40E_FLAG_PF_MAC flag in the platform-private i40e_pf structure tracks whether a platform-specific mac address was found, in which case calls to i40e_aq_mac_address_write() and i40e_aq_add_macvlan() will be triggered. Reviewed-by: Martin K. Petersen <martin.peter...@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- v2, v3: Andy Shevchenko comments v4: Shannon Nelson review: explicitly set up mac filters before register_netdev v5: Shannon Nelson code style comments v6: Shannon Nelson code style comments v7: Ensure that i40e_macaddr_init() is called only for VSI_MAIN, and only if the mac address is not the default. Some additional code-refactoring based on comments from Shannon Nelson v8: use pf->flags instead of bit value to track if we found a platform-specific macaddr. Other code-style comments from Shannon Nelson v9: David Miller, Shannon Nelson comments for i40e_get_platform_mac_addr drivers/net/ethernet/intel/i40e/i40e.h |1 + drivers/net/ethernet/intel/i40e/i40e_main.c | 90 +++ 2 files changed, 91 insertions(+), 0 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index 0b9537b..b85eaee 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -337,6 +337,7 @@ struct i40e_pf { #define I40E_FLAG_LINK_POLLING_ENABLED BIT_ULL(39) #define I40E_FLAG_VEB_MODE_ENABLED BIT_ULL(40) #define I40E_FLAG_NO_PCI_LINK_CHECKBIT_ULL(42) +#define I40E_FLAG_PF_MAC BIT_ULL(50) /* tracks features that get auto disabled by errors */ u64 auto_disable_flags; diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 9e6268b..3e2d4e0 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -24,6 +24,15 @@ * **/ +#include +#include +#include + +#ifdef CONFIG_SPARC +#include +#include +#endif + /* Local includes */ #include "i40e.h" #include "i40e_diag.h" @@ -9387,6 +9396,44 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi) } /** + * i40e_macaddr_init - explicitly write the mac address filters. + * + * @vsi: pointer to the vsi. + * @macaddr: the MAC address + * + * This is needed when the macaddr has been obtained by other + * means than the default, e.g., from Open Firmware or IDPROM. + * Returns 0 on success, negative on failure + **/ +static int i40e_macaddr_init(struct i40e_vsi *vsi, u8 *macaddr) +{ + int ret; + struct i40e_aqc_add_macvlan_element_data element; + + ret = i40e_aq_mac_address_write(>back->hw, + I40E_AQC_WRITE_TYPE_LAA_WOL, + macaddr, NULL); + if (ret) { + dev_info(>back->pdev->dev, +"Addr change for VSI failed: %d\n", ret); + return -EADDRNOTAVAIL; + } + + memset(, 0, sizeof(element)); + ether_addr_copy(element.mac_addr, macaddr); + element.flags = cpu_to_le16(I40E_AQC_MACVLAN_ADD_PERFECT_MATCH); + ret = i40e_aq_add_macvlan(>back->hw, vsi->seid, , 1, NULL); + if (ret) { + dev_info(>back->pdev->dev, +"add filter failed err %s aq_err %s\n", +i40e_stat_str(>back->hw, ret), +i40e_aq_str(>back->hw, +vsi->back->hw.aq.asq_last_status)); + } + return ret; +} + +/** * i40e_vsi_setup - Set up a VSI by a given type * @pf: board private structure * @type: VSI type @@ -9510,6 +9557,17 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type, switch (vsi->type) { /* setup the netdev if needed */ case I40E_VSI_MAIN: + /* Apply relevant filters if a platform-specific mac +* address was selected. +*/ + if (!!(pf->flags & I40E_FLAG_PF_MAC)) { + ret = i40e_macaddr_init(vsi, pf->hw.mac.addr); + if (ret) {
[PATCH v8] i40e: Look up MAC address in Open Firmware or IDPROM
This is the i40e equivalent of commit c762dff24c06 ("ixgbe: Look up MAC address in Open Firmware or IDPROM"). As with that fix, attempt to look up the MAC address in Open Firmware on systems that support it, and use IDPROM on SPARC if no OF address is found. In the case of the i40e there is an assumption that the default mac address has already been set up as the primary mac filter on probe, so if this filter is obtained from the Open Firmware or IDPROM, an explicit write is needed via i40e_aq_mac_address_write() and i40e_aq_add_macvlan() invocation. The I40E_FLAG_PF_MAC flag in the platform-private i40e_pf structure tracks whether a platform-specific mac address was found, in which case calls to i40e_aq_mac_address_write() and i40e_aq_add_macvlan() will be triggered. Reviewed-by: Martin K. Petersen Signed-off-by: Sowmini Varadhan --- v2, v3: Andy Shevchenko comments v4: Shannon Nelson review: explicitly set up mac filters before register_netdev v5: Shannon Nelson code style comments v6: Shannon Nelson code style comments v7: Ensure that i40e_macaddr_init() is called only for VSI_MAIN, and only if the mac address is not the default. Some additional code-refactoring based on comments from Shannon Nelson v8: use pf->flags instead of bit value to track if we found a platform-specific macaddr. Other code-style comments from Shannon Nelson drivers/net/ethernet/intel/i40e/i40e.h |1 + drivers/net/ethernet/intel/i40e/i40e_main.c | 89 +++ 2 files changed, 90 insertions(+), 0 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index 0b9537b..b85eaee 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -337,6 +337,7 @@ struct i40e_pf { #define I40E_FLAG_LINK_POLLING_ENABLED BIT_ULL(39) #define I40E_FLAG_VEB_MODE_ENABLED BIT_ULL(40) #define I40E_FLAG_NO_PCI_LINK_CHECKBIT_ULL(42) +#define I40E_FLAG_PF_MAC BIT_ULL(50) /* tracks features that get auto disabled by errors */ u64 auto_disable_flags; diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 9e6268b..ad7c941 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -24,6 +24,15 @@ * **/ +#include +#include +#include + +#ifdef CONFIG_SPARC +#include +#include +#endif + /* Local includes */ #include "i40e.h" #include "i40e_diag.h" @@ -9387,6 +9396,44 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi) } /** + * i40e_macaddr_init - explicitly write the mac address filters. + * + * @vsi: pointer to the vsi. + * @macaddr: the MAC address + * + * This is needed when the macaddr has been obtained by other + * means than the default, e.g., from Open Firmware or IDPROM. + * Returns 0 on success, negative on failure + **/ +static int i40e_macaddr_init(struct i40e_vsi *vsi, u8 *macaddr) +{ + int ret; + struct i40e_aqc_add_macvlan_element_data element; + + ret = i40e_aq_mac_address_write(>back->hw, + I40E_AQC_WRITE_TYPE_LAA_WOL, + macaddr, NULL); + if (ret) { + dev_info(>back->pdev->dev, +"Addr change for VSI failed: %d\n", ret); + return -EADDRNOTAVAIL; + } + + memset(, 0, sizeof(element)); + ether_addr_copy(element.mac_addr, macaddr); + element.flags = cpu_to_le16(I40E_AQC_MACVLAN_ADD_PERFECT_MATCH); + ret = i40e_aq_add_macvlan(>back->hw, vsi->seid, , 1, NULL); + if (ret) { + dev_info(>back->pdev->dev, +"add filter failed err %s aq_err %s\n", +i40e_stat_str(>back->hw, ret), +i40e_aq_str(>back->hw, +vsi->back->hw.aq.asq_last_status)); + } + return ret; +} + +/** * i40e_vsi_setup - Set up a VSI by a given type * @pf: board private structure * @type: VSI type @@ -9510,6 +9557,17 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type, switch (vsi->type) { /* setup the netdev if needed */ case I40E_VSI_MAIN: + /* Apply relevant filters if a platform-specific mac +* address was selected. +*/ + if (!!(pf->flags & I40E_FLAG_PF_MAC)) { + ret = i40e_macaddr_init(vsi, pf->hw.mac.addr); + if (ret) { + dev_warn(>pdev->dev, +"could not set up macaddr; err %d\n", +
[PATCH v8] i40e: Look up MAC address in Open Firmware or IDPROM
This is the i40e equivalent of commit c762dff24c06 ("ixgbe: Look up MAC address in Open Firmware or IDPROM"). As with that fix, attempt to look up the MAC address in Open Firmware on systems that support it, and use IDPROM on SPARC if no OF address is found. In the case of the i40e there is an assumption that the default mac address has already been set up as the primary mac filter on probe, so if this filter is obtained from the Open Firmware or IDPROM, an explicit write is needed via i40e_aq_mac_address_write() and i40e_aq_add_macvlan() invocation. The I40E_FLAG_PF_MAC flag in the platform-private i40e_pf structure tracks whether a platform-specific mac address was found, in which case calls to i40e_aq_mac_address_write() and i40e_aq_add_macvlan() will be triggered. Reviewed-by: Martin K. Petersen <martin.peter...@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- v2, v3: Andy Shevchenko comments v4: Shannon Nelson review: explicitly set up mac filters before register_netdev v5: Shannon Nelson code style comments v6: Shannon Nelson code style comments v7: Ensure that i40e_macaddr_init() is called only for VSI_MAIN, and only if the mac address is not the default. Some additional code-refactoring based on comments from Shannon Nelson v8: use pf->flags instead of bit value to track if we found a platform-specific macaddr. Other code-style comments from Shannon Nelson drivers/net/ethernet/intel/i40e/i40e.h |1 + drivers/net/ethernet/intel/i40e/i40e_main.c | 89 +++ 2 files changed, 90 insertions(+), 0 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index 0b9537b..b85eaee 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -337,6 +337,7 @@ struct i40e_pf { #define I40E_FLAG_LINK_POLLING_ENABLED BIT_ULL(39) #define I40E_FLAG_VEB_MODE_ENABLED BIT_ULL(40) #define I40E_FLAG_NO_PCI_LINK_CHECKBIT_ULL(42) +#define I40E_FLAG_PF_MAC BIT_ULL(50) /* tracks features that get auto disabled by errors */ u64 auto_disable_flags; diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 9e6268b..ad7c941 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -24,6 +24,15 @@ * **/ +#include +#include +#include + +#ifdef CONFIG_SPARC +#include +#include +#endif + /* Local includes */ #include "i40e.h" #include "i40e_diag.h" @@ -9387,6 +9396,44 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi) } /** + * i40e_macaddr_init - explicitly write the mac address filters. + * + * @vsi: pointer to the vsi. + * @macaddr: the MAC address + * + * This is needed when the macaddr has been obtained by other + * means than the default, e.g., from Open Firmware or IDPROM. + * Returns 0 on success, negative on failure + **/ +static int i40e_macaddr_init(struct i40e_vsi *vsi, u8 *macaddr) +{ + int ret; + struct i40e_aqc_add_macvlan_element_data element; + + ret = i40e_aq_mac_address_write(>back->hw, + I40E_AQC_WRITE_TYPE_LAA_WOL, + macaddr, NULL); + if (ret) { + dev_info(>back->pdev->dev, +"Addr change for VSI failed: %d\n", ret); + return -EADDRNOTAVAIL; + } + + memset(, 0, sizeof(element)); + ether_addr_copy(element.mac_addr, macaddr); + element.flags = cpu_to_le16(I40E_AQC_MACVLAN_ADD_PERFECT_MATCH); + ret = i40e_aq_add_macvlan(>back->hw, vsi->seid, , 1, NULL); + if (ret) { + dev_info(>back->pdev->dev, +"add filter failed err %s aq_err %s\n", +i40e_stat_str(>back->hw, ret), +i40e_aq_str(>back->hw, +vsi->back->hw.aq.asq_last_status)); + } + return ret; +} + +/** * i40e_vsi_setup - Set up a VSI by a given type * @pf: board private structure * @type: VSI type @@ -9510,6 +9557,17 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type, switch (vsi->type) { /* setup the netdev if needed */ case I40E_VSI_MAIN: + /* Apply relevant filters if a platform-specific mac +* address was selected. +*/ + if (!!(pf->flags & I40E_FLAG_PF_MAC)) { + ret = i40e_macaddr_init(vsi, pf->hw.mac.addr); + if (ret) { + dev_warn(>pd
[PATCH RESEND v7] i40e: Look up MAC address in Open Firmware or IDPROM
[Apologies for fat-fingering subject in the other attempt] This is the i40e equivalent of commit c762dff24c06 ("ixgbe: Look up MAC address in Open Firmware or IDPROM"). As with that fix, attempt to look up the MAC address in Open Firmware on systems that support it, and use IDPROM on SPARC if no OF address is found. In the case of the i40e there is an assumption that the default mac address has already been set up as the primary mac filter on probe, so if this filter is obtained from the Open Firmware or IDPROM, an explicit write is needed via i40e_aq_mac_address_write() and i40e_aq_add_macvlan() invocation. The is_default_mac field in the platform-private i40e_pf structure tracks whether the mac address was default or not, and in the latter case, will trigger the calls to i40e_aq_mac_address_write() and i40e_aq_add_macvlan(). Reviewed-by: Martin K. Petersen Signed-off-by: Sowmini Varadhan --- v2, v3: Andy Shevchenko comments v4: Shannon Nelson review: explicitly set up mac filters before register_netdev v5: Shannon Nelson code style comments v6: Shannon Nelson code style comments v7: Ensure that i40e_macaddr_init() is called only for VSI_MAIN, and only if the mac address is not the default. Some additional code-refactoring based on comments from Shannon Nelson drivers/net/ethernet/intel/i40e/i40e.h |2 + drivers/net/ethernet/intel/i40e/i40e_main.c | 87 +++ 2 files changed, 89 insertions(+), 0 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index 0b9537b..6d41757 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -420,6 +420,8 @@ struct i40e_pf { u32 ioremap_len; u32 fd_inv; + + u32 is_default_mac:1; }; struct i40e_mac_filter { diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 9e6268b..40a5d53 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -24,6 +24,15 @@ * **/ +#include +#include +#include + +#ifdef CONFIG_SPARC +#include +#include +#endif + /* Local includes */ #include "i40e.h" #include "i40e_diag.h" @@ -9387,6 +9396,44 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi) } /** + * i40e_macaddr_init - explicitly write the mac address filters. + * + * @vsi: pointer to the vsi. + * @macaddr: the MAC address + * + * This is needed when the macaddr has been obtained by other + * means than the default, e.g., from Open Firmware or IDPROM. + * Returns 0 on success, negative on failure + **/ +static int i40e_macaddr_init(struct i40e_vsi *vsi, u8 *macaddr) +{ + int ret, aq_err; + struct i40e_aqc_add_macvlan_element_data element; + + ret = i40e_aq_mac_address_write(>back->hw, + I40E_AQC_WRITE_TYPE_LAA_WOL, + macaddr, NULL); + if (ret) { + dev_info(>back->pdev->dev, +"Addr change for VSI failed: %d\n", ret); + return -EADDRNOTAVAIL; + } + + memset(, 0, sizeof(element)); + ether_addr_copy(element.mac_addr, macaddr); + element.flags = cpu_to_le16(I40E_AQC_MACVLAN_ADD_PERFECT_MATCH); + ret = i40e_aq_add_macvlan(>back->hw, vsi->seid, , 1, NULL); + aq_err = vsi->back->hw.aq.asq_last_status; + if (aq_err != I40E_AQ_RC_OK) { + dev_info(>back->pdev->dev, +"add filter failed err %s aq_err %s\n", +i40e_stat_str(>back->hw, ret), +i40e_aq_str(>back->hw, aq_err)); + } + return ret; +} + +/** * i40e_vsi_setup - Set up a VSI by a given type * @pf: board private structure * @type: VSI type @@ -9510,6 +9557,14 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type, switch (vsi->type) { /* setup the netdev if needed */ case I40E_VSI_MAIN: + /* Apply relevant filters if a platform-specific mac +* address was selected. +*/ + if (!pf->is_default_mac) { + ret = i40e_macaddr_init(vsi, pf->hw.mac.addr); + if (ret) + goto err_netdev; + } case I40E_VSI_VMDQ2: case I40E_VSI_FCOE: ret = i40e_config_netdev(vsi); @@ -10340,6 +10395,36 @@ static void i40e_print_features(struct i40e_pf *pf) } /** + * i40e_get_platform_mac_addr - get platform-specific MAC address + * + * @pdev: PCI device information struct + * @pf: board private structure + * + * Look up the MAC address in Open Firmware on systems that sup
bar
This is the i40e equivalent of commit c762dff24c06 ("ixgbe: Look up MAC address in Open Firmware or IDPROM"). As with that fix, attempt to look up the MAC address in Open Firmware on systems that support it, and use IDPROM on SPARC if no OF address is found. In the case of the i40e there is an assumption that the default mac address has already been set up as the primary mac filter on probe, so if this filter is obtained from the Open Firmware or IDPROM, an explicit write is needed via i40e_aq_mac_address_write() and i40e_aq_add_macvlan() invocation. The is_default_mac field in the platform-private i40e_pf structure tracks whether the mac address was default or not, and in the latter case, will trigger the calls to i40e_aq_mac_address_write() and i40e_aq_add_macvlan(). Reviewed-by: Martin K. Petersen Signed-off-by: Sowmini Varadhan --- v2, v3: Andy Shevchenko comments v4: Shannon Nelson review: explicitly set up mac filters before register_netdev v5: Shannon Nelson code style comments v6: Shannon Nelson code style comments v7: Ensure that i40e_macaddr_init() is called only for VSI_MAIN, and only if the mac address is not the default. Some additional code-refactoring based on comments from Shannon Nelson drivers/net/ethernet/intel/i40e/i40e.h |2 + drivers/net/ethernet/intel/i40e/i40e_main.c | 87 +++ 2 files changed, 89 insertions(+), 0 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index 0b9537b..6d41757 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -420,6 +420,8 @@ struct i40e_pf { u32 ioremap_len; u32 fd_inv; + + u32 is_default_mac:1; }; struct i40e_mac_filter { diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 9e6268b..40a5d53 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -24,6 +24,15 @@ * **/ +#include +#include +#include + +#ifdef CONFIG_SPARC +#include +#include +#endif + /* Local includes */ #include "i40e.h" #include "i40e_diag.h" @@ -9387,6 +9396,44 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi) } /** + * i40e_macaddr_init - explicitly write the mac address filters. + * + * @vsi: pointer to the vsi. + * @macaddr: the MAC address + * + * This is needed when the macaddr has been obtained by other + * means than the default, e.g., from Open Firmware or IDPROM. + * Returns 0 on success, negative on failure + **/ +static int i40e_macaddr_init(struct i40e_vsi *vsi, u8 *macaddr) +{ + int ret, aq_err; + struct i40e_aqc_add_macvlan_element_data element; + + ret = i40e_aq_mac_address_write(>back->hw, + I40E_AQC_WRITE_TYPE_LAA_WOL, + macaddr, NULL); + if (ret) { + dev_info(>back->pdev->dev, +"Addr change for VSI failed: %d\n", ret); + return -EADDRNOTAVAIL; + } + + memset(, 0, sizeof(element)); + ether_addr_copy(element.mac_addr, macaddr); + element.flags = cpu_to_le16(I40E_AQC_MACVLAN_ADD_PERFECT_MATCH); + ret = i40e_aq_add_macvlan(>back->hw, vsi->seid, , 1, NULL); + aq_err = vsi->back->hw.aq.asq_last_status; + if (aq_err != I40E_AQ_RC_OK) { + dev_info(>back->pdev->dev, +"add filter failed err %s aq_err %s\n", +i40e_stat_str(>back->hw, ret), +i40e_aq_str(>back->hw, aq_err)); + } + return ret; +} + +/** * i40e_vsi_setup - Set up a VSI by a given type * @pf: board private structure * @type: VSI type @@ -9510,6 +9557,14 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type, switch (vsi->type) { /* setup the netdev if needed */ case I40E_VSI_MAIN: + /* Apply relevant filters if a platform-specific mac +* address was selected. +*/ + if (!pf->is_default_mac) { + ret = i40e_macaddr_init(vsi, pf->hw.mac.addr); + if (ret) + goto err_netdev; + } case I40E_VSI_VMDQ2: case I40E_VSI_FCOE: ret = i40e_config_netdev(vsi); @@ -10340,6 +10395,36 @@ static void i40e_print_features(struct i40e_pf *pf) } /** + * i40e_get_platform_mac_addr - get platform-specific MAC address + * + * @pdev: PCI device information struct + * @pf: board private structure + * + * Look up the MAC address in Open Firmware on systems that support it, + * and use IDPROM on SPARC if no OF address is found.
[PATCH RESEND v7] i40e: Look up MAC address in Open Firmware or IDPROM
[Apologies for fat-fingering subject in the other attempt] This is the i40e equivalent of commit c762dff24c06 ("ixgbe: Look up MAC address in Open Firmware or IDPROM"). As with that fix, attempt to look up the MAC address in Open Firmware on systems that support it, and use IDPROM on SPARC if no OF address is found. In the case of the i40e there is an assumption that the default mac address has already been set up as the primary mac filter on probe, so if this filter is obtained from the Open Firmware or IDPROM, an explicit write is needed via i40e_aq_mac_address_write() and i40e_aq_add_macvlan() invocation. The is_default_mac field in the platform-private i40e_pf structure tracks whether the mac address was default or not, and in the latter case, will trigger the calls to i40e_aq_mac_address_write() and i40e_aq_add_macvlan(). Reviewed-by: Martin K. Petersen <martin.peter...@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- v2, v3: Andy Shevchenko comments v4: Shannon Nelson review: explicitly set up mac filters before register_netdev v5: Shannon Nelson code style comments v6: Shannon Nelson code style comments v7: Ensure that i40e_macaddr_init() is called only for VSI_MAIN, and only if the mac address is not the default. Some additional code-refactoring based on comments from Shannon Nelson drivers/net/ethernet/intel/i40e/i40e.h |2 + drivers/net/ethernet/intel/i40e/i40e_main.c | 87 +++ 2 files changed, 89 insertions(+), 0 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index 0b9537b..6d41757 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -420,6 +420,8 @@ struct i40e_pf { u32 ioremap_len; u32 fd_inv; + + u32 is_default_mac:1; }; struct i40e_mac_filter { diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 9e6268b..40a5d53 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -24,6 +24,15 @@ * **/ +#include +#include +#include + +#ifdef CONFIG_SPARC +#include +#include +#endif + /* Local includes */ #include "i40e.h" #include "i40e_diag.h" @@ -9387,6 +9396,44 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi) } /** + * i40e_macaddr_init - explicitly write the mac address filters. + * + * @vsi: pointer to the vsi. + * @macaddr: the MAC address + * + * This is needed when the macaddr has been obtained by other + * means than the default, e.g., from Open Firmware or IDPROM. + * Returns 0 on success, negative on failure + **/ +static int i40e_macaddr_init(struct i40e_vsi *vsi, u8 *macaddr) +{ + int ret, aq_err; + struct i40e_aqc_add_macvlan_element_data element; + + ret = i40e_aq_mac_address_write(>back->hw, + I40E_AQC_WRITE_TYPE_LAA_WOL, + macaddr, NULL); + if (ret) { + dev_info(>back->pdev->dev, +"Addr change for VSI failed: %d\n", ret); + return -EADDRNOTAVAIL; + } + + memset(, 0, sizeof(element)); + ether_addr_copy(element.mac_addr, macaddr); + element.flags = cpu_to_le16(I40E_AQC_MACVLAN_ADD_PERFECT_MATCH); + ret = i40e_aq_add_macvlan(>back->hw, vsi->seid, , 1, NULL); + aq_err = vsi->back->hw.aq.asq_last_status; + if (aq_err != I40E_AQ_RC_OK) { + dev_info(>back->pdev->dev, +"add filter failed err %s aq_err %s\n", +i40e_stat_str(>back->hw, ret), +i40e_aq_str(>back->hw, aq_err)); + } + return ret; +} + +/** * i40e_vsi_setup - Set up a VSI by a given type * @pf: board private structure * @type: VSI type @@ -9510,6 +9557,14 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type, switch (vsi->type) { /* setup the netdev if needed */ case I40E_VSI_MAIN: + /* Apply relevant filters if a platform-specific mac +* address was selected. +*/ + if (!pf->is_default_mac) { + ret = i40e_macaddr_init(vsi, pf->hw.mac.addr); + if (ret) + goto err_netdev; + } case I40E_VSI_VMDQ2: case I40E_VSI_FCOE: ret = i40e_config_netdev(vsi); @@ -10340,6 +10395,36 @@ static void i40e_print_features(struct i40e_pf *pf) } /** + * i40e_get_platform_mac_addr - get platform-specific MAC address + * + * @pdev: PCI device information struct + * @pf: board private structure
bar
This is the i40e equivalent of commit c762dff24c06 ("ixgbe: Look up MAC address in Open Firmware or IDPROM"). As with that fix, attempt to look up the MAC address in Open Firmware on systems that support it, and use IDPROM on SPARC if no OF address is found. In the case of the i40e there is an assumption that the default mac address has already been set up as the primary mac filter on probe, so if this filter is obtained from the Open Firmware or IDPROM, an explicit write is needed via i40e_aq_mac_address_write() and i40e_aq_add_macvlan() invocation. The is_default_mac field in the platform-private i40e_pf structure tracks whether the mac address was default or not, and in the latter case, will trigger the calls to i40e_aq_mac_address_write() and i40e_aq_add_macvlan(). Reviewed-by: Martin K. Petersen <martin.peter...@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varad...@oracle.com> --- v2, v3: Andy Shevchenko comments v4: Shannon Nelson review: explicitly set up mac filters before register_netdev v5: Shannon Nelson code style comments v6: Shannon Nelson code style comments v7: Ensure that i40e_macaddr_init() is called only for VSI_MAIN, and only if the mac address is not the default. Some additional code-refactoring based on comments from Shannon Nelson drivers/net/ethernet/intel/i40e/i40e.h |2 + drivers/net/ethernet/intel/i40e/i40e_main.c | 87 +++ 2 files changed, 89 insertions(+), 0 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h index 0b9537b..6d41757 100644 --- a/drivers/net/ethernet/intel/i40e/i40e.h +++ b/drivers/net/ethernet/intel/i40e/i40e.h @@ -420,6 +420,8 @@ struct i40e_pf { u32 ioremap_len; u32 fd_inv; + + u32 is_default_mac:1; }; struct i40e_mac_filter { diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 9e6268b..40a5d53 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -24,6 +24,15 @@ * **/ +#include +#include +#include + +#ifdef CONFIG_SPARC +#include +#include +#endif + /* Local includes */ #include "i40e.h" #include "i40e_diag.h" @@ -9387,6 +9396,44 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi) } /** + * i40e_macaddr_init - explicitly write the mac address filters. + * + * @vsi: pointer to the vsi. + * @macaddr: the MAC address + * + * This is needed when the macaddr has been obtained by other + * means than the default, e.g., from Open Firmware or IDPROM. + * Returns 0 on success, negative on failure + **/ +static int i40e_macaddr_init(struct i40e_vsi *vsi, u8 *macaddr) +{ + int ret, aq_err; + struct i40e_aqc_add_macvlan_element_data element; + + ret = i40e_aq_mac_address_write(>back->hw, + I40E_AQC_WRITE_TYPE_LAA_WOL, + macaddr, NULL); + if (ret) { + dev_info(>back->pdev->dev, +"Addr change for VSI failed: %d\n", ret); + return -EADDRNOTAVAIL; + } + + memset(, 0, sizeof(element)); + ether_addr_copy(element.mac_addr, macaddr); + element.flags = cpu_to_le16(I40E_AQC_MACVLAN_ADD_PERFECT_MATCH); + ret = i40e_aq_add_macvlan(>back->hw, vsi->seid, , 1, NULL); + aq_err = vsi->back->hw.aq.asq_last_status; + if (aq_err != I40E_AQ_RC_OK) { + dev_info(>back->pdev->dev, +"add filter failed err %s aq_err %s\n", +i40e_stat_str(>back->hw, ret), +i40e_aq_str(>back->hw, aq_err)); + } + return ret; +} + +/** * i40e_vsi_setup - Set up a VSI by a given type * @pf: board private structure * @type: VSI type @@ -9510,6 +9557,14 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type, switch (vsi->type) { /* setup the netdev if needed */ case I40E_VSI_MAIN: + /* Apply relevant filters if a platform-specific mac +* address was selected. +*/ + if (!pf->is_default_mac) { + ret = i40e_macaddr_init(vsi, pf->hw.mac.addr); + if (ret) + goto err_netdev; + } case I40E_VSI_VMDQ2: case I40E_VSI_FCOE: ret = i40e_config_netdev(vsi); @@ -10340,6 +10395,36 @@ static void i40e_print_features(struct i40e_pf *pf) } /** + * i40e_get_platform_mac_addr - get platform-specific MAC address + * + * @pdev: PCI device information struct + * @pf: board private structure + * + * Look up the MAC address in Open Firmware on syst