Re: [PATCH] ravb: remove erroneous comment
Hello! On 3/4/2018 1:39 AM, Niklas Söderlund wrote: > When addressing a review comment in a early version of the offending patch a comment where left in which should have been removed. s/where/was/? > Remove the comment to keep it consistent with the code. > Fixes: 75efa06f457bbed3 ("ravb: add support for changing MTU") > Reported-by: Sergei Shtylyov > Signed-off-by: Niklas Söderlund Acked-by: Sergei Shtylyov [...] MBR, Sergei
Re: [PATCH net-next] selftests: forwarding: Add support to create veth interfaces
On Fri, Mar 02, 2018 at 08:45:53AM -0800, David Ahern wrote: > For tests using veth interfaces, the test infrastructure can create > the netdevs if they do not exist. Arguably this is a preferred approach > since the tests require p$N and p$(N+1) to be pairs. > > Signed-off-by: David Ahern [...] > diff --git a/tools/testing/selftests/net/forwarding/lib.sh > b/tools/testing/selftests/net/forwarding/lib.sh > index d0af52109360..2ce98c6a8c25 100644 > --- a/tools/testing/selftests/net/forwarding/lib.sh > +++ b/tools/testing/selftests/net/forwarding/lib.sh > @@ -76,6 +76,39 @@ done > > ## > # Network interfaces configuration > > +create_netif_veth() > +{ > + local i > + > + for i in $(eval echo {1..$NUM_NETIFS}); do > + j=$((i+1)) local j=$((i+1)) and drop a line. > + ip link show dev ${NETIFS[p$i]} &> /dev/null > + if [[ $? -ne 0 ]]; then > + ip link add ${NETIFS[p$i]} type veth peer name > ${NETIFS[p$j]} Need to break this one. FWIW, I have this in my config: $ cat ~/.vim/after/ftplugin/sh.vim ... highlight OverLength ctermbg=red ctermfg=white match OverLength /\%81v.\+/ Cool patch! Tested on my machine. > + if [[ $? -ne 0 ]]; then > + echo "Failed to create netif" > + exit 1 > + fi > + fi > + i=$j > + done > +} > + > +create_netif() > +{ > + case "$NETIF_TYPE" in > + veth) create_netif_veth > + ;; > + *) echo "Can not create interfaces of type \'$NETIF_TYPE\'" > +exit 1 > +;; > + esac > +} > + > +if [[ "$NETIF_CREATE" = "yes" ]]; then > + create_netif > +fi > + > for i in $(eval echo {1..$NUM_NETIFS}); do > ip link show dev ${NETIFS[p$i]} &> /dev/null > if [[ $? -ne 0 ]]; then > -- > 2.11.0 >
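A note on the loop in create_netif_veth() above: in bash, assigning to the variable of a "for i in ..." loop (the "i=$j" line) does not change which word the next iteration takes, so the loop still visits every index. What actually skips the peer that was just created is the "ip link show" existence check. The C sketch below models that control flow; the exists[] array is a hypothetical stand-in for "ip link show", and create_veth_pairs() is an illustrative name, not part of lib.sh.

```c
#include <stdbool.h>

#define MAX_NETIFS 16

/* Models create_netif_veth(): walk indices 1..num (num <= MAX_NETIFS) and,
 * if netif i does not exist yet, create the veth pair (i, i+1). Both ends
 * are marked as existing, so index i+1 is skipped by the existence check
 * on the next iteration, not by reassigning the loop variable.
 * Returns the number of pairs created and records them in pairs[]. */
static int create_veth_pairs(int num, bool exists[MAX_NETIFS + 2],
			     int pairs[][2])
{
	int npairs = 0;

	for (int i = 1; i <= num; i++) {
		int j = i + 1;

		if (exists[i])
			continue;		/* "ip link show dev p$i" succeeded */
		exists[i] = exists[j] = true;	/* "ip link add p$i ... peer p$j" */
		pairs[npairs][0] = i;
		pairs[npairs][1] = j;
		npairs++;
	}
	return npairs;
}
```

One consequence of this behavior is that the "i=$j" line could be dropped from the shell loop without changing which pairs are created.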
[PATCH] staging: ipx: Replace printk() with appropriate pr_*() macro
Using pr_<level>() is more concise than printk(KERN_<LEVEL>). Replace printks having a log level with the appropriate pr_*() macros. Signed-off-by: Arushi Singhal --- drivers/staging/ipx/af_ipx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/ipx/af_ipx.c b/drivers/staging/ipx/af_ipx.c index d21a9d1..27f4461 100644 --- a/drivers/staging/ipx/af_ipx.c +++ b/drivers/staging/ipx/af_ipx.c @@ -744,7 +744,7 @@ static void ipxitf_discover_netnum(struct ipx_interface *intrfc, intrfc->if_netnum = cb->ipx_source_net; ipxitf_add_local_route(intrfc); } else { - printk(KERN_WARNING "IPX: Network number collision " + pr_warn("IPX: Network number collision " "%lx\n%s %s and %s %s\n", (unsigned long) ntohl(cb->ipx_source_net), ipx_device_name(i), -- 2.7.4
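For reference, a userspace sketch of why the conversion is behavior-preserving. In the kernel, pr_warn(fmt, ...) expands to printk(KERN_WARNING pr_fmt(fmt), ...), where KERN_WARNING is the string KERN_SOH "4" and pr_fmt() defaults to the bare format. mock_printk() below is a hypothetical stand-in that records the level and message into a buffer. Defining pr_fmt() to add an "IPX: " prefix is shown only as one possible way to factor out the hardcoded tag; whether that suits af_ipx.c depends on its other messages.

```c
#include <stdarg.h>
#include <stdio.h>

/* Userspace stand-ins mirroring kern_levels.h. The two literals are kept
 * separate so "\001" is not merged with the following digit as an escape. */
#define KERN_SOH	"\001"		/* ASCII Start Of Header */
#define KERN_WARNING	KERN_SOH "4"	/* warning conditions */

/* Illustrative per-file prefix, applied to every pr_*() message. */
#define pr_fmt(fmt)	"IPX: " fmt

static char log_buf[256];

/* Hypothetical printk() replacement: strips the level prefix, then records
 * "<level>" followed by the formatted message. Returns the level. */
static int mock_printk(const char *fmt, ...)
{
	const char *msg = fmt;
	va_list ap;
	int level = -1, n;

	if (msg[0] == KERN_SOH[0] && msg[1] != '\0') {
		level = msg[1] - '0';
		msg += 2;
	}
	n = snprintf(log_buf, sizeof(log_buf), "<%d>", level);
	va_start(ap, fmt);
	vsnprintf(log_buf + n, sizeof(log_buf) - n, msg, ap);
	va_end(ap);
	return level;
}

/* Same shape as the kernel macro: the level comes from the macro itself. */
#define pr_warn(fmt, ...) mock_printk(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__)
```

With these definitions, pr_warn("Network number collision %lx\n", 0x10UL) records "<4>IPX: Network number collision 10\n".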
Re: [PATCH net-next 3/5] net: mvpp2: use a data size of 10kB for Tx FIFO on port 0
Hello, On Sun, 4 Mar 2018 06:29:59 +, Stefan Chulski wrote: > > Is there a reason to hardcode 10KB for port 0, and 3KB for the other ports ? > > Would there be use cases where the user may want different configurations > > ? > > Design requirement are 10KB TX FIFO for the 10Gb/sec and 2.5KB for the > 2.5Gb/sec. What is a "design requirement" ? Is it a HW design limitation ? > Since only port 0 support 10Gb/sec and ports 1&2 support up to 2.5Gb/sec. > I don't see any reason to change this configurations. > Also TX FIFO size could be set only during probe. > > > It's just that it feels very "hardcoded" to enforce specifically those > > numbers. > > > > Also, does it make sense to mention the CP110 here ? Is this 19 KB > > limitation > > a limit of the PPv2.2 IP, or of the CP110 ? > > PPv2.2 IP is part of 110 communication processor. Thanks, I know this :-) > Next communication processor will has different Packet processor or next > generation of PPv2.x > Limit is PPv2.2 TX FIFO. So, the limitation has nothing to do with CP110 really, it's just a limitation of PPv2.2, and mentioning CP110 in the comment doesn't make much sense, correct ? Best regards, Thomas -- Thomas Petazzoni, CTO, Bootlin (formerly Free Electrons) Embedded Linux and Kernel engineering http://bootlin.com
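The numbers in this thread can be checked with simple arithmetic: 10KB for port 0 plus 3KB for each of ports 1 and 2 is 16KB, which fits in the 19KB PPv2.2 TX FIFO, and each port meets the stated minimum (10KB for 10Gb/s, 2.5KB for 2.5Gb/s). A sketch of such a check, assuming KB means 1024 bytes; all names are hypothetical and this is not mvpp2 code.

```c
#include <stdbool.h>
#include <stddef.h>

#define KB(x)			((x) * 1024u)
#define PPV22_TX_FIFO_TOTAL	KB(19)	/* total TX FIFO, per the thread */

struct port_fifo_cfg {
	unsigned int speed_mbps;	/* maximum port speed */
	unsigned int tx_fifo_bytes;	/* TX FIFO assigned to the port */
};

/* Minimum TX FIFO the thread states for each speed class. */
static unsigned int min_tx_fifo(unsigned int speed_mbps)
{
	return speed_mbps >= 10000 ? KB(10) : KB(5) / 2;	/* 2.5KB */
}

/* True if every port gets at least the minimum for its speed and the
 * partition fits in the shared TX FIFO. */
static bool tx_fifo_partition_ok(const struct port_fifo_cfg *cfg, size_t n)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++) {
		if (cfg[i].tx_fifo_bytes < min_tx_fifo(cfg[i].speed_mbps))
			return false;
		total += cfg[i].tx_fifo_bytes;
	}
	return total <= PPV22_TX_FIFO_TOTAL;
}
```

The same arithmetic shows why only one port can get the 10KB share: two 10Gb/s ports would already need 20KB.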
Re: [PATCH net-next 5/5] net: mvpp2: jumbo frames support
Hello, On Sun, 4 Mar 2018 06:56:02 +, Stefan Chulski wrote: > > > + if (port->pool_long->id == MVPP2_BM_JUMBO && port->id != 0) { > > > > Again, all over the place we hardcode the fact that Jumbo frames can only be > > used on port 0. I know port 0 is the only one that can do 10G, but are there > > possibly some use cases where you may want Jumbo frame on another port > > ? > > > > This all really feels very hardcoded to me. > > > > All ports support Jumbo frames. > But only port 0 can do TX HW checksum offload(due to TX FIFO size). > > Packet processor 2.2 has only 19KB TX FIFO size. > So in TX FIFO config code assign for Port 0 - 10KB, Port 1 - 3KB and Port 1 - > 3KB. Yes, but I was also questioning whether hardcoding this configuration was correct. > To perform checksum in HW, HW obviously should work in store and forward > mode. Store all frame in TX FIFO and then check checksum. > If mtu 1500B, everything fine and all port can do this. > > If mtu is 9KB and 9KB frame transmitted, Port 0 still can do HW checksum. But > ports 1 and 2 doesn't has enough FIFO for this. > So we cannot offload this feature and SW should perform checksum. So perhaps the real check should not be "port 0", but whether the MTU is higher or lower than the TX FIFO size assigned to the current port. This would express in much better way the reason why HW checksum can be used or not. > > > + /* 9704 == 9728 - 20 and rounding to 8 */ > > > + dev->max_mtu = MVPP2_BM_JUMBO_PKT_SIZE; > > > > Is this correct for all ports ? Shouldn't the maximum MTU be different > > between port 0 (that supports Jumbo frames) and the other ports ? > > This is correct for all ports. All ports can support Jumbo frames. OK. With your explanation above, I understand better. Best regards, Thomas -- Thomas Petazzoni, CTO, Bootlin (formerly Free Electrons) Embedded Linux and Kernel engineering http://bootlin.com
RE: [PATCH net-next 3/5] net: mvpp2: use a data size of 10kB for Tx FIFO on port 0
> -----Original Message----- > From: Thomas Petazzoni [mailto:thomas.petazz...@bootlin.com] > Sent: Sunday, March 04, 2018 11:25 AM > To: Stefan Chulski > Cc: Antoine Tenart ; da...@davemloft.net; > Yan Markman ; netdev@vger.kernel.org; linux- > ker...@vger.kernel.org; maxime.chevall...@bootlin.com; > gregory.clem...@bootlin.com; miquel.ray...@bootlin.com; Nadav Haklai > ; m...@semihalf.com > Subject: Re: [PATCH net-next 3/5] net: mvpp2: use a data size of 10kB for Tx > FIFO on port 0 > > Hello, > > On Sun, 4 Mar 2018 06:29:59 +, Stefan Chulski wrote: > > > > Is there a reason to hardcode 10KB for port 0, and 3KB for the other ports > ? > > > Would there be use cases where the user may want different > > > configurations ? > > > > Design requirement are 10KB TX FIFO for the 10Gb/sec and 2.5KB for the > 2.5Gb/sec. > > What is a "design requirement" ? Is it a HW design limitation ? We can call it a HW design limitation. In any case, to support 10Gb/sec a port needs at least 10KB of TX FIFO. > So, the limitation has nothing to do with CP110 really, it's just a > limitation of > PPv2.2, and mentioning CP110 in the comment doesn't make much sense, > correct ? I will change it. Stefan.
RE: [PATCH net-next 5/5] net: mvpp2: jumbo frames support
> > To perform checksum in HW, HW obviously should work in store and > forward mode. Store all frame in TX FIFO and then check checksum. > > If mtu 1500B, everything fine and all port can do this. > > > > If mtu is 9KB and 9KB frame transmitted, Port 0 still can do HW checksum. > But ports 1 and 2 doesn't has enough FIFO for this. > > So we cannot offload this feature and SW should perform checksum. > > So perhaps the real check should not be "port 0", but whether the MTU is > higher or lower than the TX FIFO size assigned to the current port. > This would express in much better way the reason why HW checksum can be > used or not. I really don't want to involve the MTU size here: for each packet we would have to add to the MTU the overhead added by the HW (offset, CRC, DSA tags, etc.). I prefer to just check: if the port TX FIFO size is 10KB, the port can support HW checksum offload. Do you suggest keeping some shadow table with the ports' TX FIFO sizes for this? Thanks, Stefan.
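To make the two options concrete: Stefan's preference keys HW checksum offload on the port's TX FIFO size, while Thomas's suggestion compares the frame size (MTU plus whatever the HW adds) with that FIFO. Both need the per-port FIFO size at hand, i.e. something like the shadow table asked about above, filled at probe time since the FIFO can only be sized then. A sketch with hypothetical names; hw_overhead is a caller-supplied placeholder, not a measured mvpp2 value.

```c
#include <stdbool.h>

#define KB(x)		((x) * 1024u)
#define MAX_PORTS	3

/* Hypothetical shadow copy of the TX FIFO sizes programmed at probe. */
static unsigned int tx_fifo_size[MAX_PORTS];

static void record_tx_fifo(unsigned int port, unsigned int bytes)
{
	tx_fifo_size[port] = bytes;
}

/* Option 1 (FIFO-size check): a port with a 10KB TX FIFO may offload. */
static bool csum_offload_by_fifo(unsigned int port)
{
	return tx_fifo_size[port] >= KB(10);
}

/* Option 2 (MTU check): offload only if a full frame, including bytes the
 * HW adds (offset, CRC, DSA tag...), fits the port's TX FIFO, as required
 * for store-and-forward checksumming. */
static bool csum_offload_by_mtu(unsigned int port, unsigned int mtu,
				unsigned int hw_overhead)
{
	return mtu + hw_overhead <= tx_fifo_size[port];
}
```

The difference shows up at MTU 1500: option 1 disables offload on ports 1 and 2 even though, per the thread, a 1500B frame fits their FIFO, whereas option 2 keeps it enabled and only falls back to SW checksum for jumbo frames.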
[PATCH net-next] net: Make RX-FCS and LRO mutually exclusive
LRO and RX-FCS offloads cannot be enabled at the same time since it is not clear what should happen to the FCS of each coalesced packet. The FCS is not really part of the TCP payload, hence cannot be merged into one big packet. On the other hand, providing one big LRO packet with one FCS contradicts the RX-FCS feature goal. Use the fix features mechanism in order to prevent intersection of the features and drop LRO in case RX-FCS is requested. Enabling RX-FCS while LRO is enabled will result in: $ ethtool -K ens6 rx-fcs on Actual changes: large-receive-offload: off [requested on] rx-fcs: on Signed-off-by: Gal Pressman Reviewed-by: Tariq Toukan --- net/core/dev.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/net/core/dev.c b/net/core/dev.c index c9d3058..1bc3792 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7542,6 +7542,12 @@ static netdev_features_t netdev_fix_features(struct net_device *dev, } } + /* LRO feature cannot be combined with RX-FCS */ + if ((features & NETIF_F_LRO) && (features & NETIF_F_RXFCS)) { + netdev_dbg(dev, "Dropping LRO feature since RX-FCS is requested.\n"); + features &= ~NETIF_F_LRO; + } + return features; } -- 2.7.4
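The rule added to netdev_fix_features() is plain bitmask logic. A userspace sketch, with placeholder bit positions standing in for NETIF_F_LRO and NETIF_F_RXFCS (the real values come from the kernel's netdev_features.h):

```c
#include <stdint.h>

typedef uint64_t features_t;

/* Placeholder bits; not the kernel's actual NETIF_F_* positions. */
#define F_LRO	((features_t)1 << 0)
#define F_RXFCS	((features_t)1 << 1)
#define F_GRO	((features_t)1 << 2)

/* Same rule as the patch: if both are requested, RX-FCS wins and LRO is
 * dropped; every other bit passes through unchanged. */
static features_t fix_features(features_t features)
{
	if ((features & F_LRO) && (features & F_RXFCS))
		features &= ~F_LRO;
	return features;
}
```

This is what produces the ethtool output quoted above: LRO stays requested in the wanted features, but the fixed (actual) features report it as off.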
[PATCH net 1/2] rhashtable: Fix rhltable duplicates insertion
When inserting duplicate objects (those with the same key), current rhashtable implementation messes up the chain pointers by updating the bucket pointer instead of prev next pointer to the newly inserted node. This causes missing elements on removal and traversal. Fix that by properly updating pprev pointer to point to the correct rhash_head next pointer. Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface') Signed-off-by: Paul Blakey --- include/linux/rhashtable.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h index c9df252..668a21f 100644 --- a/include/linux/rhashtable.h +++ b/include/linux/rhashtable.h @@ -766,8 +766,10 @@ static inline void *__rhashtable_insert_fast( if (!key || (params.obj_cmpfn ? params.obj_cmpfn(&arg, rht_obj(ht, head)) : -rhashtable_compare(&arg, rht_obj(ht, head)))) +rhashtable_compare(&arg, rht_obj(ht, head)))) { + pprev = &head->next; continue; + } data = rht_obj(ht, head); -- 1.8.4.3
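A simplified userspace model of the bug, assuming a bucket is a singly linked chain of distinct keys in which each entry also heads a list of duplicates (loosely modeled on rhash_head/rhlist_head; locking, RCU and nulls markers are omitted, and all names are illustrative). A duplicate takes its key's place in the chain, so it must be linked through the pointer that referenced the old entry. If pprev is not advanced past non-matching entries, the duplicate is written into the bucket head instead, and every entry before it in the chain is lost.

```c
#include <stddef.h>

struct node {
	int key;
	int tid;			/* insertion order, as in the test */
	struct node *chain_next;	/* next distinct key in the bucket */
	struct node *dup_next;		/* next object with the same key */
};

/* Insert obj into the chain at *bucket. fixed selects the patched
 * behavior (advance pprev on every non-matching entry). */
static void chain_insert(struct node **bucket, struct node *obj, int fixed)
{
	struct node **pprev = bucket;
	struct node *head;

	for (head = *bucket; head; head = head->chain_next) {
		if (head->key != obj->key) {
			if (fixed)
				pprev = &head->chain_next;
			continue;
		}
		/* Duplicate: obj replaces head in the chain, head becomes
		 * the second element of obj's duplicate list. */
		obj->chain_next = head->chain_next;
		obj->dup_next = head;
		*pprev = obj;
		return;
	}
	/* New key: push at the front of the bucket. */
	obj->chain_next = *bucket;
	obj->dup_next = NULL;
	*bucket = obj;
}

/* Count every object reachable from the bucket, duplicates included. */
static int count_objects(const struct node *bucket)
{
	int n = 0;

	for (const struct node *e = bucket; e; e = e->chain_next)
		for (const struct node *d = e; d; d = d->dup_next)
			n++;
	return n;
}
```

Inserting keys 1, 21, 1 (the same sequence the test patch in this series uses) leaves 3 reachable objects with the fix and only 2 without it, because key 21 is dropped from the chain.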
[PATCH net 0/2] rhashtable: Fix rhltable duplicates insertion
In our mlx5 driver (fs_core.c), we use the rhltable interface to store flow groups. We noticed that we sometimes get a warning at removal that a flow group isn't found. This rare case was caused by a specific scenario: insertion of a flow group with the same match criteria as an existing one (a duplicate), but only when the existing flow group's rhash_head was second (or at least not first) on the relevant rhashtable bucket list. The first patch fixes it, and the second one adds a test that shows it is now working. Paul Blakey (2): rhashtable: Fix rhltable duplicates insertion test_rhashtable: add test case for rhl_table with duplicate objects include/linux/rhashtable.h | 4 +- lib/test_rhashtable.c | 121 + 2 files changed, 124 insertions(+), 1 deletion(-) -- 1.8.4.3
[PATCH net 2/2] test_rhashtable: add test case for rhl_table with duplicate objects
Tries to insert duplicates in the middle of bucket's chain: bucket 1: [[val 21 (tid=1)]] -> [[ val 1 (tid=2), val 1 (tid=0) ]] Reuses tid to distinguish the elements insertion order. Signed-off-by: Paul Blakey --- lib/test_rhashtable.c | 121 ++ 1 file changed, 121 insertions(+) diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c index 76d3667..4a5f331 100644 --- a/lib/test_rhashtable.c +++ b/lib/test_rhashtable.c @@ -79,6 +79,21 @@ struct thread_data { struct test_obj *objs; }; +static u32 my_hashfn(const void *data, u32 len, u32 seed) +{ + const struct test_obj_rhl *obj = data; + + return (obj->value.id % 10) << RHT_HASH_RESERVED_SPACE; +} + +static int my_cmpfn(struct rhashtable_compare_arg *arg, const void *obj) +{ + const struct test_obj_rhl *test_obj = obj; + const struct test_obj_val *val = arg->key; + + return test_obj->value.id - val->id; +} + static struct rhashtable_params test_rht_params = { .head_offset = offsetof(struct test_obj, node), .key_offset = offsetof(struct test_obj, value), @@ -87,6 +102,17 @@ struct thread_data { .nulls_base = (3U << RHT_BASE_SHIFT), }; +static struct rhashtable_params test_rht_params_dup = { + .head_offset = offsetof(struct test_obj_rhl, list_node), + .key_offset = offsetof(struct test_obj_rhl, value), + .key_len = sizeof(struct test_obj_val), + .hashfn = jhash, + .obj_hashfn = my_hashfn, + .obj_cmpfn = my_cmpfn, + .nelem_hint = 128, + .automatic_shrinking = false, +}; + static struct semaphore prestart_sem; static struct semaphore startup_sem = __SEMAPHORE_INITIALIZER(startup_sem, 0); @@ -465,6 +491,99 @@ static int __init test_rhashtable_max(struct test_obj *array, return err; } +static unsigned int __init print_ht(struct rhltable *rhlt) +{ + struct rhashtable *ht; + const struct bucket_table *tbl; + char buff[512] = ""; + unsigned int i, cnt = 0; + + ht = &rhlt->ht; + tbl = rht_dereference(ht->tbl, ht); + for (i = 0; i < tbl->size; i++) { + struct rhash_head *pos, *next; + struct test_obj_rhl *p; + + pos = 
rht_dereference(tbl->buckets[i], ht); + next = !rht_is_a_nulls(pos) ? rht_dereference(pos->next, ht) : NULL; + + if (!rht_is_a_nulls(pos)) { + sprintf(buff, "%s\nbucket[%d] -> ", buff, i); + } + + while (!rht_is_a_nulls(pos)) { + struct rhlist_head *list = container_of(pos, struct rhlist_head, rhead); + sprintf(buff, "%s[[", buff); + do { + pos = &list->rhead; + list = rht_dereference(list->next, ht); + p = rht_obj(ht, pos); + + sprintf(buff, "%s val %d (tid=%d)%s", buff, p->value.id, p->value.tid, + list? ", " : " "); + cnt++; + } while (list); + + pos = next, + next = !rht_is_a_nulls(pos) ? + rht_dereference(pos->next, ht) : NULL; + + sprintf(buff, "%s]]%s", buff, !rht_is_a_nulls(pos) ? " -> " : ""); + } + } + printk(KERN_ERR "\n ht: %s\n-\n", buff); + + return cnt; +} + +static int __init test_insert_dup(struct test_obj_rhl *rhl_test_objects, + int cnt) +{ + struct rhltable rhlt; + unsigned int i, ret; + int err; + + err = rhltable_init(&rhlt, &test_rht_params_dup); + if (WARN_ON(err)) + return err; + + for (i = 0; i < cnt; i++) { + rhl_test_objects[i].value.tid = i; + err = rhltable_insert(&rhlt, &rhl_test_objects[i].list_node, + test_rht_params_dup); + if (WARN(err, "error %d on element %d\n", err, i)) + goto skip_print; + } + + ret = print_ht(&rhlt); + WARN(ret != cnt, "missing rhltable elements (%d != %d)\n", ret, cnt); + +skip_print: + rhltable_destroy(&rhlt); + + return 0; +} + +static int __init test_insert_duplicates_run(void) +{ + struct test_obj_rhl rhl_test_objects[3] = {}; + + pr_info("test inserting duplicates\n"); + + /* two different values that map to same bucket */ + rhl_test_objects[0].value.id = 1; + rhl_test_objects[1].value.id = 21; + + /* and another duplicate with same as [0] value +* which will be second on the bucket list */ + rhl_test_objects[2].value.id = rhl_test_objects[0].value.id; + + test_insert_dup(rhl_test_objects, 2); + test_insert_dup(rhl_test_objects, 3); + + return 0; +} +
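One detail in print_ht(): calls such as sprintf(buff, "%s[[", buff) pass buff both as the destination and as a %s source. The C standard leaves sprintf() undefined when the copied objects overlap, so this may happen to work but is not guaranteed. A common alternative is to track the write offset and append with a size-bounded call; in the kernel itself, scnprintf() returns the number of characters actually written, which makes that pattern short. The userspace sketch below uses a hypothetical appendf() helper built on vsnprintf().

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text at buf + *off without reading from the
 * destination. Output is truncated (never overflows) once size is
 * reached; *off always indexes the terminating NUL. */
static void appendf(char *buf, size_t size, size_t *off, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (*off >= size)
		return;			/* no room left at all */

	va_start(ap, fmt);
	n = vsnprintf(buf + *off, size - *off, fmt, ap);
	va_end(ap);

	if (n < 0)
		return;
	/* vsnprintf() returns the untruncated length; clamp to what fit. */
	if ((size_t)n >= size - *off)
		*off = size - 1;
	else
		*off += (size_t)n;
}
```

For example, appending "bucket[%d] -> " with 1 and then "[[ val %d (tid=%d) ]]" with 21 and 1 yields "bucket[1] -> [[ val 21 (tid=1) ]]".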
Re: [PATCH net 1/2] rhashtable: Fix rhltable duplicates insertion
On Sun, Mar 04, 2018 at 02:34:26PM +0200, Paul Blakey wrote: > When inserting duplicate objects (those with the same key), > current rhashtable implementation messes up the chain pointers by > updating the bucket pointer instead of prev next pointer to the > newly inserted node. This causes missing elements on removal and > travesal. > > Fix that by properly updating pprev pointer to point to > the correct rhash_head next pointer. > > Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface') > Signed-off-by: Paul Blakey Nack. You must not insert objects with the same key through rhashtable. The reason is that we cannot reliably fetch all of the objects with the same key during a resize. If you need duplicate objects, you should use rhlist. Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH net v2 1/2] rhashtable: Fix rhlist duplicates insertion
When inserting duplicate objects (those with the same key), current rhlist implementation messes up the chain pointers by updating the bucket pointer instead of prev next pointer to the newly inserted node. This causes missing elements on removal and traversal. Fix that by properly updating pprev pointer to point to the correct rhash_head next pointer. Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface') Signed-off-by: Paul Blakey --- include/linux/rhashtable.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h index c9df252..668a21f 100644 --- a/include/linux/rhashtable.h +++ b/include/linux/rhashtable.h @@ -766,8 +766,10 @@ static inline void *__rhashtable_insert_fast( if (!key || (params.obj_cmpfn ? params.obj_cmpfn(&arg, rht_obj(ht, head)) : -rhashtable_compare(&arg, rht_obj(ht, head)))) +rhashtable_compare(&arg, rht_obj(ht, head)))) { + pprev = &head->next; continue; + } data = rht_obj(ht, head); -- 1.8.4.3
[PATCH net v2 2/2] test_rhashtable: add test case for rhltable with duplicate objects
Tries to insert duplicates in the middle of bucket's chain: bucket 1: [[val 21 (tid=1)]] -> [[ val 1 (tid=2), val 1 (tid=0) ]] Reuses tid to distinguish the elements insertion order. Signed-off-by: Paul Blakey --- lib/test_rhashtable.c | 121 ++ 1 file changed, 121 insertions(+) diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c index 76d3667..4a5f331 100644 --- a/lib/test_rhashtable.c +++ b/lib/test_rhashtable.c @@ -79,6 +79,21 @@ struct thread_data { struct test_obj *objs; }; +static u32 my_hashfn(const void *data, u32 len, u32 seed) +{ + const struct test_obj_rhl *obj = data; + + return (obj->value.id % 10) << RHT_HASH_RESERVED_SPACE; +} + +static int my_cmpfn(struct rhashtable_compare_arg *arg, const void *obj) +{ + const struct test_obj_rhl *test_obj = obj; + const struct test_obj_val *val = arg->key; + + return test_obj->value.id - val->id; +} + static struct rhashtable_params test_rht_params = { .head_offset = offsetof(struct test_obj, node), .key_offset = offsetof(struct test_obj, value), @@ -87,6 +102,17 @@ struct thread_data { .nulls_base = (3U << RHT_BASE_SHIFT), }; +static struct rhashtable_params test_rht_params_dup = { + .head_offset = offsetof(struct test_obj_rhl, list_node), + .key_offset = offsetof(struct test_obj_rhl, value), + .key_len = sizeof(struct test_obj_val), + .hashfn = jhash, + .obj_hashfn = my_hashfn, + .obj_cmpfn = my_cmpfn, + .nelem_hint = 128, + .automatic_shrinking = false, +}; + static struct semaphore prestart_sem; static struct semaphore startup_sem = __SEMAPHORE_INITIALIZER(startup_sem, 0); @@ -465,6 +491,99 @@ static int __init test_rhashtable_max(struct test_obj *array, return err; } +static unsigned int __init print_ht(struct rhltable *rhlt) +{ + struct rhashtable *ht; + const struct bucket_table *tbl; + char buff[512] = ""; + unsigned int i, cnt = 0; + + ht = &rhlt->ht; + tbl = rht_dereference(ht->tbl, ht); + for (i = 0; i < tbl->size; i++) { + struct rhash_head *pos, *next; + struct test_obj_rhl *p; + + pos = 
rht_dereference(tbl->buckets[i], ht); + next = !rht_is_a_nulls(pos) ? rht_dereference(pos->next, ht) : NULL; + + if (!rht_is_a_nulls(pos)) { + sprintf(buff, "%s\nbucket[%d] -> ", buff, i); + } + + while (!rht_is_a_nulls(pos)) { + struct rhlist_head *list = container_of(pos, struct rhlist_head, rhead); + sprintf(buff, "%s[[", buff); + do { + pos = &list->rhead; + list = rht_dereference(list->next, ht); + p = rht_obj(ht, pos); + + sprintf(buff, "%s val %d (tid=%d)%s", buff, p->value.id, p->value.tid, + list? ", " : " "); + cnt++; + } while (list); + + pos = next, + next = !rht_is_a_nulls(pos) ? + rht_dereference(pos->next, ht) : NULL; + + sprintf(buff, "%s]]%s", buff, !rht_is_a_nulls(pos) ? " -> " : ""); + } + } + printk(KERN_ERR "\n ht: %s\n-\n", buff); + + return cnt; +} + +static int __init test_insert_dup(struct test_obj_rhl *rhl_test_objects, + int cnt) +{ + struct rhltable rhlt; + unsigned int i, ret; + int err; + + err = rhltable_init(&rhlt, &test_rht_params_dup); + if (WARN_ON(err)) + return err; + + for (i = 0; i < cnt; i++) { + rhl_test_objects[i].value.tid = i; + err = rhltable_insert(&rhlt, &rhl_test_objects[i].list_node, + test_rht_params_dup); + if (WARN(err, "error %d on element %d\n", err, i)) + goto skip_print; + } + + ret = print_ht(&rhlt); + WARN(ret != cnt, "missing rhltable elements (%d != %d)\n", ret, cnt); + +skip_print: + rhltable_destroy(&rhlt); + + return 0; +} + +static int __init test_insert_duplicates_run(void) +{ + struct test_obj_rhl rhl_test_objects[3] = {}; + + pr_info("test inserting duplicates\n"); + + /* two different values that map to same bucket */ + rhl_test_objects[0].value.id = 1; + rhl_test_objects[1].value.id = 21; + + /* and another duplicate with same as [0] value +* which will be second on the bucket list */ + rhl_test_objects[2].value.id = rhl_test_objects[0].value.id; + + test_insert_dup(rhl_test_objects, 2); + test_insert_dup(rhl_test_objects, 3); + + return 0; +} +
[PATCH net v2 0/2] rhlist: Fix rhltable duplicates insertion
In our mlx5 driver (fs_core.c), we use the rhltable interface to store flow groups. We noticed that we sometimes get a warning at removal that a flow group isn't found. This rare case was caused by a specific scenario: insertion of a flow group with the same match criteria as an existing one (a duplicate), but only when the existing flow group's rhash_head was second (or at least not first) on the relevant rhashtable bucket list. The first patch fixes it, and the second one adds a test that shows it is now working. Paul. v1 --> v2 changes: * Changed commit messages to better reflect the change Paul Blakey (2): rhashtable: Fix rhlist duplicates insertion test_rhashtable: add test case for rhltable with duplicate objects include/linux/rhashtable.h | 4 +- lib/test_rhashtable.c | 121 + 2 files changed, 124 insertions(+), 1 deletion(-) -- 1.8.4.3
Re: [PATCH net 1/2] rhashtable: Fix rhltable duplicates insertion
On 04/03/2018 14:57, Herbert Xu wrote: > On Sun, Mar 04, 2018 at 02:34:26PM +0200, Paul Blakey wrote: >> When inserting duplicate objects (those with the same key), current rhashtable implementation messes up the chain pointers by updating the bucket pointer instead of prev next pointer to the newly inserted node. This causes missing elements on removal and travesal. >> Fix that by properly updating pprev pointer to point to the correct rhash_head next pointer. >> Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface') >> Signed-off-by: Paul Blakey > Nack. You must not insert objects with the same key through rhashtable. The reason is that we cannot reliably fetch all of the objects with the same key during a resize. If you need duplicate objects, you should use rhlist. > Cheers, Hi, I meant the rhlist interface here; I sent a v2. Thanks, Paul.
Re: [PATCH] staging: ipx: Replace printk() with appropriate pr_*() macro
On Sun, Mar 04, 2018 at 02:29:35PM +0530, Arushi Singhal wrote: > Using pr_<level>() is more concise than printk(KERN_<LEVEL>). > Replace printks having a log level with the appropriate pr_*() macros. > > Signed-off-by: Arushi Singhal > --- > drivers/staging/ipx/af_ipx.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/staging/ipx/af_ipx.c b/drivers/staging/ipx/af_ipx.c > index d21a9d1..27f4461 100644 > --- a/drivers/staging/ipx/af_ipx.c > +++ b/drivers/staging/ipx/af_ipx.c > @@ -744,7 +744,7 @@ static void ipxitf_discover_netnum(struct ipx_interface > *intrfc, > intrfc->if_netnum = cb->ipx_source_net; > ipxitf_add_local_route(intrfc); > } else { > - printk(KERN_WARNING "IPX: Network number collision " > + pr_warn("IPX: Network number collision " It is a driver, so it would be best to use dev_warn() or even better yet, net_warn(). Please try to make that change instead. thanks, greg k-h
Re: [PATCH net 1/2] rhashtable: Fix rhltable duplicates insertion
On Sun, Mar 04, 2018 at 02:34:26PM +0200, Paul Blakey wrote: > When inserting duplicate objects (those with the same key), > current rhashtable implementation messes up the chain pointers by > updating the bucket pointer instead of prev next pointer to the > newly inserted node. This causes missing elements on removal and > travesal. > > Fix that by properly updating pprev pointer to point to > the correct rhash_head next pointer. > > Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface') > Signed-off-by: Paul Blakey Ah I see, thanks for catching this! Acked-by: Herbert Xu Cheers, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH net v2 1/2] rhashtable: Fix rhlist duplicates insertion
On Sun, Mar 04, 2018 at 03:26:48PM +0200, Paul Blakey wrote: > When inserting duplicate objects (those with the same key), > current rhlist implementation messes up the chain pointers by > updating the bucket pointer instead of prev next pointer to the > newly inserted node. This causes missing elements on removal and > travesal. > > Fix that by properly updating pprev pointer to point to > the correct rhash_head next pointer. > > Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface') > Signed-off-by: Paul Blakey Oops, replied to the wrong email. Acked-by: Herbert Xu Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH net v2 2/2] test_rhashtable: add test case for rhltable with duplicate objects
On Sun, Mar 04, 2018 at 03:26:49PM +0200, Paul Blakey wrote: > Tries to insert duplicates in the middle of bucket's chain: > bucket 1: [[val 21 (tid=1)]] -> [[ val 1 (tid=2), val 1 (tid=0) ]] > > Reuses tid to distinguish the elements insertion order. > > Signed-off-by: Paul Blakey Acked-by: Herbert Xu -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH net-next] selftests: Extend the tc action test for action mirror
Currently the tc action test is used only to test mirred redirect action. This patch extends it for mirred mirror. Signed-off-by: Jiri Pirko Reviewed-by: Ido Schimmel Signed-off-by: Arkadi Sharshevsky --- tools/testing/selftests/net/forwarding/tc_actions.sh | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/tools/testing/selftests/net/forwarding/tc_actions.sh b/tools/testing/selftests/net/forwarding/tc_actions.sh index 8423431..bc09a36 100755 --- a/tools/testing/selftests/net/forwarding/tc_actions.sh +++ b/tools/testing/selftests/net/forwarding/tc_actions.sh @@ -45,8 +45,10 @@ switch_destroy() simple_if_fini $swp1 192.0.2.2/24 } -mirred_egress_redirect_test() +mirred_egress_test() { + local action=$1 + RET=0 tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \ @@ -59,19 +61,19 @@ mirred_egress_redirect_test() check_fail $? "Matched without redirect rule inserted" tc filter add dev $swp1 ingress protocol ip pref 1 handle 101 flower \ - $tcflags dst_ip 192.0.2.2 action mirred egress redirect \ + $tcflags dst_ip 192.0.2.2 action mirred egress $action \ dev $swp2 $MZ $h1 -c 1 -p 64 -a $h1mac -b $h2mac -A 192.0.2.1 -B 192.0.2.2 \ -t ip -q tc_check_packets "dev $h2 ingress" 101 1 - check_err $? "Did not match incoming redirected packet" + check_err $? "Did not match incoming $action packet" tc filter del dev $swp1 ingress protocol ip pref 1 handle 101 flower tc filter del dev $h2 ingress protocol ip pref 1 handle 101 flower - log_test "mirred egress redirect ($tcflags)" + log_test "mirred egress $action ($tcflags)" } gact_drop_and_ok_test() @@ -180,7 +182,8 @@ setup_prepare setup_wait gact_drop_and_ok_test -mirred_egress_redirect_test +mirred_egress_test "redirect" +mirred_egress_test "mirror" tc_offload_check if [[ $? -ne 0 ]]; then @@ -188,7 +191,8 @@ if [[ $? 
-ne 0 ]]; then else tcflags="skip_sw" gact_drop_and_ok_test - mirred_egress_redirect_test + mirred_egress_test "redirect" + mirred_egress_test "mirror" gact_trap_test fi -- 2.4.11
Re: [RFC PATCH V1 01/12] audit: add container id
On Sat, Mar 3, 2018 at 4:19 AM, Serge E. Hallyn wrote: > On Thu, Mar 01, 2018 at 02:41:04PM -0500, Richard Guy Briggs wrote: > ... >> +static inline bool audit_containerid_set(struct task_struct *tsk) > > Hi Richard, > > the calls to audit_containerid_set() confused me. Could you make it > is_audit_containerid_set() or audit_containerid_isset()? I haven't gone through the entire patchset yet, but I wanted to quickly comment on this ... I really dislike the function-names-as-sentences approach and would greatly prefer audit_containerid_isset(). >> +{ >> + return audit_get_containerid(tsk) != INVALID_CID; >> +} -- paul moore www.paul-moore.com
Re: [PATCH net v2 1/2] rhashtable: Fix rhlist duplicates insertion
On 04/03/2018 15:26, Paul Blakey wrote: > When inserting duplicate objects (those with the same key), > current rhlist implementation messes up the chain pointers by > updating the bucket pointer instead of prev next pointer to the > newly inserted node. This causes missing elements on removal and > travesal. > > Fix that by properly updating pprev pointer to point to > the correct rhash_head next pointer. > > Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface') > Signed-off-by: Paul Blakey > --- > include/linux/rhashtable.h | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h > index c9df252..668a21f 100644 > --- a/include/linux/rhashtable.h > +++ b/include/linux/rhashtable.h > @@ -766,8 +766,10 @@ static inline void *__rhashtable_insert_fast( > if (!key || > (params.obj_cmpfn ? > params.obj_cmpfn(&arg, rht_obj(ht, head)) : > - rhashtable_compare(&arg, rht_obj(ht, head)))) > + rhashtable_compare(&arg, rht_obj(ht, head)))) { > + pprev = &head->next; It seems rhashtable_lookup_one() might need the same fix. > continue; > + } > > data = rht_obj(ht, head); > > Mark
[PATCH net v3 2/2] test_rhashtable: add test case for rhltable with duplicate objects
Tries to insert duplicates in the middle of bucket's chain: bucket 1: [[val 21 (tid=1)]] -> [[ val 1 (tid=2), val 1 (tid=0) ]] Reuses tid to distinguish the elements insertion order. Signed-off-by: Paul Blakey --- lib/test_rhashtable.c | 134 ++ 1 file changed, 134 insertions(+) diff --git a/lib/test_rhashtable.c b/lib/test_rhashtable.c index 76d3667..f4000c1 100644 --- a/lib/test_rhashtable.c +++ b/lib/test_rhashtable.c @@ -79,6 +79,21 @@ struct thread_data { struct test_obj *objs; }; +static u32 my_hashfn(const void *data, u32 len, u32 seed) +{ + const struct test_obj_rhl *obj = data; + + return (obj->value.id % 10) << RHT_HASH_RESERVED_SPACE; +} + +static int my_cmpfn(struct rhashtable_compare_arg *arg, const void *obj) +{ + const struct test_obj_rhl *test_obj = obj; + const struct test_obj_val *val = arg->key; + + return test_obj->value.id - val->id; +} + static struct rhashtable_params test_rht_params = { .head_offset = offsetof(struct test_obj, node), .key_offset = offsetof(struct test_obj, value), @@ -87,6 +102,17 @@ struct thread_data { .nulls_base = (3U << RHT_BASE_SHIFT), }; +static struct rhashtable_params test_rht_params_dup = { + .head_offset = offsetof(struct test_obj_rhl, list_node), + .key_offset = offsetof(struct test_obj_rhl, value), + .key_len = sizeof(struct test_obj_val), + .hashfn = jhash, + .obj_hashfn = my_hashfn, + .obj_cmpfn = my_cmpfn, + .nelem_hint = 128, + .automatic_shrinking = false, +}; + static struct semaphore prestart_sem; static struct semaphore startup_sem = __SEMAPHORE_INITIALIZER(startup_sem, 0); @@ -465,6 +491,112 @@ static int __init test_rhashtable_max(struct test_obj *array, return err; } +static unsigned int __init print_ht(struct rhltable *rhlt) +{ + struct rhashtable *ht; + const struct bucket_table *tbl; + char buff[512] = ""; + unsigned int i, cnt = 0; + + ht = &rhlt->ht; + tbl = rht_dereference(ht->tbl, ht); + for (i = 0; i < tbl->size; i++) { + struct rhash_head *pos, *next; + struct test_obj_rhl *p; + + pos = 
rht_dereference(tbl->buckets[i], ht); + next = !rht_is_a_nulls(pos) ? rht_dereference(pos->next, ht) : NULL; + + if (!rht_is_a_nulls(pos)) { + sprintf(buff, "%s\nbucket[%d] -> ", buff, i); + } + + while (!rht_is_a_nulls(pos)) { + struct rhlist_head *list = container_of(pos, struct rhlist_head, rhead); + sprintf(buff, "%s[[", buff); + do { + pos = &list->rhead; + list = rht_dereference(list->next, ht); + p = rht_obj(ht, pos); + + sprintf(buff, "%s val %d (tid=%d)%s", buff, p->value.id, p->value.tid, + list? ", " : " "); + cnt++; + } while (list); + + pos = next, + next = !rht_is_a_nulls(pos) ? + rht_dereference(pos->next, ht) : NULL; + + sprintf(buff, "%s]]%s", buff, !rht_is_a_nulls(pos) ? " -> " : ""); + } + } + printk(KERN_ERR "\n ht: %s\n-\n", buff); + + return cnt; +} + +static int __init test_insert_dup(struct test_obj_rhl *rhl_test_objects, + int cnt, bool slow) +{ + struct rhltable rhlt; + unsigned int i, ret; + const char *key; + int err = 0; + + err = rhltable_init(&rhlt, &test_rht_params_dup); + if (WARN_ON(err)) + return err; + + for (i = 0; i < cnt; i++) { + rhl_test_objects[i].value.tid = i; + key = rht_obj(&rhlt.ht, &rhl_test_objects[i].list_node.rhead); + key += test_rht_params_dup.key_offset; + + if (slow) { + err = PTR_ERR(rhashtable_insert_slow(&rhlt.ht, key, + &rhl_test_objects[i].list_node.rhead)); + if (err == -EAGAIN) + err = 0; + } else + err = rhltable_insert(&rhlt, + &rhl_test_objects[i].list_node, + test_rht_params_dup); + if (WARN(err, "error %d on element %d/%d (%s)\n", err, i, cnt, slow? "slow" : "fast")) + goto skip_print; + } + + ret = print_ht(&rhlt); + WARN(ret != cnt, "missing rhltable elements (%d != %d, %s)\n", ret, cnt, slow? "slow" : "fast"); + +skip_print: + rhltable_destroy(&rhlt); + + ret
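A note on print_ht() above: calls such as sprintf(buff, "%s...", buff, ...) pass the destination buffer as one of its own source arguments, which the C standard leaves undefined for sprintf(), and nothing bounds the output to the 512-byte buffer. A minimal userspace sketch of the usual offset-tracking alternative follows. buf_append() is a hypothetical helper; in kernel code the equivalent idiom would be len += scnprintf(buff + len, sizeof(buff) - len, ...).

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: append printf-style text to buf (capacity cap) at
 * offset *len. buf is never passed as its own source, and *len never
 * moves past the last byte that was actually written. */
static void buf_append(char *buf, size_t cap, size_t *len, const char *fmt, ...)
{
	va_list ap;
	size_t room;
	int n;

	if (*len >= cap)
		return;
	room = cap - *len;

	va_start(ap, fmt);
	n = vsnprintf(buf + *len, room, fmt, ap);
	va_end(ap);
	if (n < 0)
		return;

	/* On truncation vsnprintf() wrote room - 1 chars plus the NUL. */
	*len += ((size_t)n < room) ? (size_t)n : room - 1;
}
```

Once the buffer is full, further calls only rewrite the terminating NUL, so the loop in print_ht() could keep calling the helper without re-checking the remaining space itself.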
[PATCH net v3 1/2] rhashtable: Fix rhlist duplicates insertion
When inserting duplicate objects (those with the same key), the current rhlist implementation messes up the chain pointers by updating the bucket pointer, instead of the previous entry's next pointer, to point to the newly inserted node. This causes missing elements on removal and traversal. Fix that by properly updating the pprev pointer to point to the correct rhash_head next pointer. Issue: 1241076 Change-Id: I86b2c140bcb4aeb10b70a72a267ff590bb2b17e7 Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface') Signed-off-by: Paul Blakey --- include/linux/rhashtable.h | 4 +++- lib/rhashtable.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h index c9df252..668a21f 100644 --- a/include/linux/rhashtable.h +++ b/include/linux/rhashtable.h @@ -766,8 +766,10 @@ static inline void *__rhashtable_insert_fast( if (!key || (params.obj_cmpfn ? params.obj_cmpfn(&arg, rht_obj(ht, head)) : -rhashtable_compare(&arg, rht_obj(ht, head)))) +rhashtable_compare(&arg, rht_obj(ht, head)))) { + pprev = &head->next; continue; + } data = rht_obj(ht, head); diff --git a/lib/rhashtable.c b/lib/rhashtable.c index 3825c30..47de025 100644 --- a/lib/rhashtable.c +++ b/lib/rhashtable.c @@ -506,8 +506,10 @@ static void *rhashtable_lookup_one(struct rhashtable *ht, if (!key || (ht->p.obj_cmpfn ? ht->p.obj_cmpfn(&arg, rht_obj(ht, head)) : -rhashtable_compare(&arg, rht_obj(ht, head)))) +rhashtable_compare(&arg, rht_obj(ht, head)))) { + pprev = &head->next; continue; + } if (!ht->rhlist) return rht_obj(ht, head); -- 1.8.4.3
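The effect of the missing pprev update can be reproduced with a small single-threaded model of one bucket. This is a sketch, not the kernel code: struct obj, bucket_insert() and bucket_count() are hypothetical, RCU is ignored, and new keys are simply pushed onto the front of the bucket. It replays the scenario from the test in patch 2/2: keys 1, 21 and 1, where 1 and 21 land in the same bucket.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of one rhltable bucket (hypothetical types). */
struct obj {
	int key;
	struct obj *next; /* next object in the bucket chain (rhead.next) */
	struct obj *dup;  /* older object with the same key (list->next) */
};

/* Insert obj into the chain at *bucket. fix_pprev == false mimics the
 * pre-fix code, which never advanced pprev past non-matching entries. */
static void bucket_insert(struct obj **bucket, struct obj *obj, bool fix_pprev)
{
	struct obj **pprev = bucket;
	struct obj *head;

	for (head = *bucket; head; head = head->next) {
		if (head->key != obj->key) {
			if (fix_pprev)
				pprev = &head->next; /* the one-line fix */
			continue;
		}
		/* Duplicate: obj takes head's place in the chain and the
		 * existing entry moves onto obj's duplicate list. */
		obj->dup = head;
		obj->next = head->next;
		*pprev = obj;
		return;
	}
	/* No duplicate: push the new key onto the front of the bucket. */
	obj->dup = NULL;
	obj->next = *bucket;
	*bucket = obj;
}

/* Count every object reachable from the bucket, duplicates included. */
static int bucket_count(const struct obj *bucket)
{
	int n = 0;

	for (const struct obj *h = bucket; h; h = h->next)
		for (const struct obj *d = h; d; d = d->dup)
			n++;
	return n;
}
```

With fix_pprev set to false, inserting the second key-1 object overwrites the bucket pointer itself, so the key-21 entry becomes unreachable and bucket_count() returns 2 instead of 3.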
[PATCH net v3 0/2] rhashtable: Fix rhltable duplicates insertion
In our mlx5 driver's fs_core.c, we use the rhltable interface to store flow groups. We noticed that we sometimes get a warning that a flow group isn't found on removal. This rare case was caused by a specific scenario: insertion of a flow group with the same match criteria as an existing one (a duplicate), but only when the existing flow group's rhash_head was not the first entry in the relevant rhashtable bucket list. The first patch fixes it, and the second one adds a test that shows it is now working. Paul. v2 --> v3 changes: * added missing fix in rhashtable_lookup_one code path as well. v1 --> v2 changes: * Changed commit messages to better reflect the change Paul Blakey (2): rhashtable: Fix rhlist duplicates insertion test_rhashtable: add test case for rhltable with duplicate objects include/linux/rhashtable.h | 4 +- lib/rhashtable.c | 4 +- lib/test_rhashtable.c | 134 + 3 files changed, 140 insertions(+), 2 deletions(-) -- 1.8.4.3
Re: dsa with 2 rgmii channels (1st is 2 switches & 2nd is phy)
Hi Michael > mdio { > compatible = "cdns,macb-mdio"; > /* reg = <0xe000b000 0x1000>; */ > /* clocks = <&clkc 30>, <&clkc 30>, <&clkc 13>; */ > /* clock-names = "pclk", "hclk", "tx_clk"; */ > #address-cells = <1>; > #size-cells = <0>; > status = "okay"; > switch0: switch@0 { > compatible = > "marvell,mv88e6352"; Please use marvell,mv88e6085. That is what the 6352 is compatible with. It would also be good to sort out your mix-up of tabs and spaces. > mdio { > compatible = "cdns,macb-mdio"; > /* reg = <0xe000c000 0x1000>; */ > /* clocks = <&clkc 31>, <&clkc 31>, <&clkc 14>; */ > /* clock-names = "pclk", "hclk", "tx_clk"; */ > #address-cells = <1>; > #size-cells = <0>; > status = "okay"; > ethernet_phy: ethernet-phy@0 { > compatible = > "marvell,mv88e1510"; > device_type = "ethernet-phy"; > reg = <0>; > }; PHYs don't have compatible strings. One is not needed: the vendor and model can be read from the PHY's registers. Andrew
Re: [PATCH net v2 1/2] rhashtable: Fix rhlist duplicates insertion
On 04/03/2018 17:13, Mark Bloch wrote: On 04/03/2018 15:26, Paul Blakey wrote: When inserting duplicate objects (those with the same key), current rhlist implementation messes up the chain pointers by updating the bucket pointer instead of prev next pointer to the newly inserted node. This causes missing elements on removal and travesal. Fix that by properly updating pprev pointer to point to the correct rhash_head next pointer. Fixes: ca26893f05e8 ('rhashtable: Add rhlist interface') Signed-off-by: Paul Blakey --- include/linux/rhashtable.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h index c9df252..668a21f 100644 --- a/include/linux/rhashtable.h +++ b/include/linux/rhashtable.h @@ -766,8 +766,10 @@ static inline void *__rhashtable_insert_fast( if (!key || (params.obj_cmpfn ? params.obj_cmpfn(&arg, rht_obj(ht, head)) : -rhashtable_compare(&arg, rht_obj(ht, head)))) +rhashtable_compare(&arg, rht_obj(ht, head)))) { + pprev = &head->next; It seems rhashtable_lookup_one() might need the same fix. Yes, I was just about to send it! It's in v3, along with a test that shows it. continue; + } data = rht_obj(ht, head); Mark
Re: [PATCH v2 0/4] net: Use strlcpy() for ethtool::get_strings
On Fri, Mar 02, 2018 at 03:08:35PM -0800, Florian Fainelli wrote: > Hi all, > > After turning on KASAN on one of my systems, I started getting lots of out of > bounds errors while fetching a given port's statistics, and indeed using > memcpy() is unsafe for copying strings which have not been declared as an > array > of ETH_GSTRING_LEN bytes, so let's use strlcpy() instead. This allows the best > of both worlds: we still keep the efficient memory usage of variably sized > strings, but we don't copy more than we need to. Reviewed-by: Andrew Lunn Andrew
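The out-of-bounds read described above is easy to see when stat names are stored as pointers to string literals of different lengths rather than as char[ETH_GSTRING_LEN] arrays. A minimal userspace sketch, assuming a made-up stats table: snprintf() stands in for the kernel's strlcpy(), and ETH_GSTRING_LEN is redefined locally with its <linux/ethtool.h> value of 32.

```c
#include <stdio.h>
#include <string.h>

#define ETH_GSTRING_LEN 32 /* same value as in <linux/ethtool.h> */

/* Variably sized names, as a driver's stats table might declare them
 * (these particular names are made up). */
static const char *const stat_names[] = { "rx_packets", "tx_bytes_dropped_total" };
#define N_STATS (sizeof(stat_names) / sizeof(stat_names[0]))

/* Fill data (N_STATS * ETH_GSTRING_LEN bytes) with one fixed-size slot per
 * name. memcpy(slot, name, ETH_GSTRING_LEN) would read past the end of the
 * 11-byte "rx_packets"; a bounded string copy stops at the terminator. */
static void get_strings(unsigned char *data)
{
	size_t i;

	for (i = 0; i < N_STATS; i++) {
		char *slot = (char *)data + i * ETH_GSTRING_LEN;

		memset(slot, 0, ETH_GSTRING_LEN); /* deterministic padding */
		/* userspace stand-in for the kernel's strlcpy(slot, name, len) */
		snprintf(slot, ETH_GSTRING_LEN, "%s", stat_names[i]);
	}
}
```

When the names are instead declared as char name[ETH_GSTRING_LEN] arrays, a full-length memcpy() stays in bounds, at the cost of copying the trailing NUL bytes.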
Re: lnstat
On Sat, 3 Mar 2018 22:56:02 +0100 David Kaufmann wrote: > Hi! > > `lnstat` segfaults (tested on Debian 9, CentOS 6+7, Fedora 27) if it is > started as `lnstat -w 1` > > according to gdb the crash is in `build_hdr_string` at lnstat.c:212 > > as it seems to be an useless value for the option anyway it might make > sense to just handle a single "1" the same as if "0" was specified. > `-w 0,1`, `-w 1,0`, `-w 1,1` and other variations do work. Right, a one-character width breaks the header-building code. It should probably just be caught in the option parsing. > > All the best, > Astra > > PS: I did not find any other place to report this, if this is the wrong > place please tell we where to post. This is the right place.
Re: [PATCH iproute2] tc: fix parsing of the control action
On Fri, 2 Mar 2018 19:36:16 +0100 Davide Caratti wrote: > If the user didn't specify any control action, don't pop the command line > arguments: otherwise, parsing of the next argument (tipically the 'index' > keyword) results in an error, causing the following 'tc-testing' failures: > > Test a6d6: Add skbedit action with index > Test 38f3: Delete skbedit action > Test a568: Add action with ife type > Test b983: Add action without ife type > Test 7d50: Add skbmod action to set destination mac > Test 9b29: Add skbmod action to set source mac > Test e93a: Delete an skbmod action > > Also, add missing parse for 'ok' control action to m_police, to fix the > following 'tc-testing' failure: > > Test 8dd5: Add police action with control ok > > tested with: > # ./tdc.py > > test results: > all tests ok using kernel 4.16-rc2, except 9aa8 "Get a single skbmod > action from a list" (which is failing also before this commit) > > Fixes: 3572e01a090a ("tc: util: Don't call NEXT_ARG_FWD() in > __parse_action_control()") > Cc: Michal Privoznik > Cc: Wolfgang Bumiller > Signed-off-by: Davide Caratti > --- Applied thanks.
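The rule described in this patch, only consuming an argument when it actually names a control action, can be sketched as follows. The enum, the name table and parse_ctrl_optional() are hypothetical and cover only a subset of actions; iproute2 itself maps these names to TC_ACT_* values in its own helpers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of control actions; tc itself uses TC_ACT_* values. */
enum ctrl { CTRL_OK, CTRL_DROP, CTRL_PIPE, CTRL_RECLASSIFY, CTRL_CONTINUE };

static const struct {
	const char *name;
	enum ctrl act;
} ctrl_names[] = {
	{ "ok", CTRL_OK },     { "pass", CTRL_OK },
	{ "drop", CTRL_DROP }, { "pipe", CTRL_PIPE },
	{ "reclassify", CTRL_RECLASSIFY }, { "continue", CTRL_CONTINUE },
};

/* Parse an optional control action at **argv. The argument is consumed
 * only when it matches; otherwise argc/argv are left untouched so the
 * caller can parse the next keyword (e.g. "index"), and dflt is returned. */
static enum ctrl parse_ctrl_optional(int *argc, char ***argv, enum ctrl dflt)
{
	size_t i;

	if (*argc <= 0)
		return dflt;
	for (i = 0; i < sizeof(ctrl_names) / sizeof(ctrl_names[0]); i++) {
		if (strcmp(**argv, ctrl_names[i].name) == 0) {
			(*argc)--;
			(*argv)++;
			return ctrl_names[i].act;
		}
	}
	return dflt;
}
```

Because the non-matching path never advances argv, a command line such as "... index 5" reaches the index parser intact, which is the behaviour the failing tdc tests expect.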
Re: [PATCH net-next 0/9] sctp: clean up sctp_sendmsg
From: Xin Long Date: Thu, 1 Mar 2018 23:05:09 +0800 > This cleanup mostly does three things: > > - extract some codes into functions to make sendmsg more readable. > > - tidy up some codes to avoid the unnecessary checks. > > - adjust some logic so that it will be easier to add the send flags >and cmsgs features that I will post after this. > > To make it easy to review and to check if the code is compatible with > before, this patchset is to do it step by step in 9 patches. > > NOTE: > There will be a conflict when merging > Commit 2277c7cd75e3 ("sctp: Add LSM hooks") from selinux tree, > the solution is to: > > 1. remove all the lines in [B]: > > <<< HEAD > [A] > === > [B] > >>> 2277c7c... sctp: Add LSM hooks > > 2. and apply the following diff-output: ... Series applied, thank you. In particular, thanks for the merge resolution details.
Re: [PATCH v3 net-next 00/10] net/ipv6: Add support for path selection using hash of 5-tuple
From: David Ahern Date: Fri, 2 Mar 2018 08:32:11 -0800 > Hardware supports multipath selection using the standard L4 5-tuple > instead of just L3 and the flow label. In addition, some network > operators prefer IPv6 path selection to use the 5-tuple. To that end, > add support to IPv6 for multipath hash policy similar to > bf4e0a3db97eb ("net: ipv4: add support for ECMP hash policy choice"). > The default is still L3 which covers source and destination addresses > along with flow label and IPv6 protocol. This gives users a choice in > hash algorithms if they believe L3 only and the IPv6 flow label are not > sufficient for their use case. > > A separate sysctl is added for IPv6, allowing IPv4 and IPv6 to use > different algorithms if desired. > > The first 3 patches modify the IPv4 variant so that at the end of the > patch set the ipv4 and ipv6 implementations are direct parallels. > > Patch 4 refactors the existing rt6_multipath_hash in preparation for > adding the policy option. > > Patch 5 renames the existing netevent to have IPv4 in the name so ipv4 > changes can be distinguished from IPv6 if the netevent handler cares. > > Patch 6 adds the skb as an argument through the FIB lookup functions > to the multipath selection. Needed for the forwarding case. > > Patch 7 adds the L4 hash support. > > Patch 8 adds the hook for the netevent to the spectrum driver to update > the ASIC. > > Patch 9 removes no longer used code. > > Patch 10 adds a testcase for IPv6 multipath with L4 hash. ... Series applied, nice work David.
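The L3/L4 choice above comes down to which header fields feed the hash. A minimal sketch under stated assumptions: FNV-1a stands in for the kernel's own flow hash, all nexthops have equal weight, and struct flow6, mp_hash() and select_nexthop() are hypothetical names. Policy 0 hashes addresses, flow label and next header; policy 1 hashes addresses, ports and next header (mirroring the 0/1 values of the IPv4 fib_multipath_hash_policy sysctl). The nexthop is then picked by comparing a 31-bit hash against per-nexthop upper bounds, as in hash-threshold selection.

```c
#include <stddef.h>
#include <stdint.h>

#define MP_POLICY_L3 0 /* addresses, flow label, next header */
#define MP_POLICY_L4 1 /* addresses, ports, next header */

/* Hypothetical flow key; ports kept in host byte order for simplicity. */
struct flow6 {
	uint8_t saddr[16];
	uint8_t daddr[16];
	uint32_t flowlabel;
	uint16_t sport;
	uint16_t dport;
	uint8_t proto;
};

/* FNV-1a, used only for illustration; the kernel uses its own flow hash. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

static uint32_t mp_hash(const struct flow6 *f, int policy)
{
	uint32_t h = 2166136261u;

	h = fnv1a(h, f->saddr, sizeof(f->saddr));
	h = fnv1a(h, f->daddr, sizeof(f->daddr));
	h = fnv1a(h, &f->proto, sizeof(f->proto));
	if (policy == MP_POLICY_L4) {
		h = fnv1a(h, &f->sport, sizeof(f->sport));
		h = fnv1a(h, &f->dport, sizeof(f->dport));
	} else {
		h = fnv1a(h, &f->flowlabel, sizeof(f->flowlabel));
	}
	return h >> 1; /* keep 31 bits so it can be compared to upper bounds */
}

/* Hash-threshold with n equal-weight nexthops: nexthop i owns the hash
 * range up to its upper bound ((i + 1) * 2^31 / n) - 1. */
static int select_nexthop(uint32_t hash, int n)
{
	for (int i = 0; i < n; i++) {
		uint32_t upper = (uint32_t)((((uint64_t)(i + 1)) << 31) / (uint64_t)n) - 1;

		if (hash <= upper)
			return i;
	}
	return n - 1;
}
```

Under the L3 policy, flows that differ only in their ports hash identically and therefore share a nexthop; the L4 policy lets them spread.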
Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On Sat, Mar 3, 2018 at 11:13 PM, Jiri Pirko wrote: > Sun, Mar 04, 2018 at 01:26:53AM CET, alexander.du...@gmail.com wrote: >>On Sat, Mar 3, 2018 at 1:25 PM, Jiri Pirko wrote: >>> Sat, Mar 03, 2018 at 07:04:57PM CET, alexander.du...@gmail.com wrote: On Sat, Mar 3, 2018 at 3:31 AM, Jiri Pirko wrote: > Fri, Mar 02, 2018 at 08:42:47PM CET, m...@redhat.com wrote: >>On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote: >>> >Yeah, this code essentially calls out the "shareable" code with a >>> >comment at the start and end of the section what defines the >>> >virtio_bypass functionality. It would just be a matter of mostly >>> >cutting and pasting to put it into a separate driver module. >>> >>> Please put it there and unite the use of it with netvsc. >> >>Surely, adding this to other drivers (e.g. might this be handy for xen >>too?) can be left for a separate patchset. Let's get one device merged >>first. > > Why? Let's do the generic infra alongside with the driver. I see no good > reason to rush into merging driver and only later, if ever, to convert > it to generic solution. On contrary. That would lead into multiple > approaches and different behavious in multiple drivers. That is plain > wrong. If nothing else it doesn't hurt to do this in one driver in a generic way, and once it has been proven to address all the needs of that one driver we can then start moving other drivers to it. The current solution is quite generic, that was my contribution to this patch set as I didn't like how invasive it was being to virtio and thought it would be best to keep this as minimally invasive as possible. My preference would be to give this a release or two in virtio to mature before we start pushing it onto other drivers. It shouldn't take much to cut/paste this into a new driver file once we decide it is time to start extending it out to other drivers. >>> >>> I'm not talking about cut/paste and in fact that is what I'm worried >>> about. 
I'm talking about common code in net/core/ or somewhere that >>> would take care of this in-driver bonding. Each driver, like virtio_net, >>> netvsc would just register some ops to it and the core would do all >>> logic. I believe it is essential take this approach from the start. >> >>Sorry, I didn't mean cut/paste into another driver, I meant to make it >>a driver of its own. My thought was to eventually create a shared/core >>driver module that is then used by the other drivers. >> >>My concern right now is that Stephen has indicated he doesn't want >>this approach taken with netvsc, and most of the community doesn't > > IIUC, he only does not like the extra netdev. Is there anything else? Nope, that is pretty much it. It doesn't seem like a big deal for virtio, but for netvsc it is significant, since they don't have any "backup" bit feature differentiation, so they would likely be stuck with 2 netdevs even in their basic setup. >>want the netvsc approach applied to virtio. Until that impasse can be >>resolved there isn't much value in trying to split this up so it is >>available to other drivers. In addition I would imagine it would make >>it a pain for others to back-port into distros since it would break >>legacy netvsc driver behavior. Patches are always welcome. Once this >>is in you are free to try fighting to get this made into a generic >>module and applied to both drivers, but we have already spent close to >>3 months on this and it seems like there has been significantly more > > Alex, time is never a good argument for poor design and shortcuts. I'm not saying we should go with a poor design due to time. But expecting us to implement something that the maintainer of said driver has not agreed to is pointless, and I don't see it as a design shortcut to implement something in one driver with the expectation that we will then make it core later once it has proven itself and has use elsewhere.
In the meantime, I would imagine that doing it this way also makes things like backports easier, since we are only impacting one driver. You are telling us to do something that not everyone has agreed to. Currently we only have agreement from Michael on taking this code, so we are working with virtio only for now. When the time comes that we can get other maintainers, specifically Stephen, to agree to it, we can cut/paste this code into a core file or into a module of its own. Alternatively, I suppose we could take this up with Dave if you can't get Stephen to agree. If you can get Dave to say we need to change netvsc, then we will go ahead with it, but generally I prefer to respect a maintainer's wishes when they say they don't want us modifying their code in some way.
Re: [PATCH v2 net-next 1/1] tools: tc-testing: Add notap option
From: "Brenda J. Butler" Date: Wed, 28 Feb 2018 15:36:19 -0500 > Add a command line arg to suppress tap output. Handy in case > all the tap output is being supplied by the plugins. > > Signed-off-by: Brenda J. Butler > --- > > v2: Drop the first patch that changes the format > of the tap output. The second "notap" patch is > reworked to apply cleanly without the first patch. Applied.
Re: [PATCH 2/2] net: usb: asix88179_178a: de-duplicate code
From: Alexander Kurz Date: Wed, 28 Feb 2018 21:27:39 + > -static int ax88179_bind(struct usbnet *dev, struct usb_interface *intf) > +static int ax88179_link_bind_or_reset(struct usbnet *dev, int do_reset) "do_reset" is a boolean, therefore please use type 'bool' and true/false. Thank you.
Re: [PATCH v2 net-next 0/5] Export SERDES stats via ethtool -S
From: Andrew Lunn Date: Thu, 1 Mar 2018 02:02:26 +0100 > The mv88e6352 family has a SERDES interface which can be used for > example to connect to SFF/SFP modules. This interface has a couple of > statistics counters. Add support for including these counters in the > output of ethtool -S. Series applied, thanks Andrew.
[PATCH net-next 1/2] tcp: add send queue size stat in SCM_TIMESTAMPING_OPT_STATS
This patch adds the TCP_NLA_SNDQ_SIZE stat to SCM_TIMESTAMPING_OPT_STATS. It reports the number of bytes present in the send queue when the timestamp is generated. Signed-off-by: Priyaranjan Jha Signed-off-by: Neal Cardwell Signed-off-by: Yuchung Cheng Signed-off-by: Soheil Hassas Yeganeh --- include/uapi/linux/tcp.h | 1 + net/ipv4/tcp.c | 4 +++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index b4a4f64635fa..93bad2128ef6 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -241,6 +241,7 @@ enum { TCP_NLA_MIN_RTT, /* minimum RTT */ TCP_NLA_RECUR_RETRANS, /* Recurring retransmits for the current pkt */ TCP_NLA_DELIVERY_RATE_APP_LMT, /* delivery rate application limited ? */ + TCP_NLA_SNDQ_SIZE, /* Data (bytes) pending in send queue */ }; diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index a33539798bf6..162ba4227446 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3031,7 +3031,7 @@ struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk) u32 rate; stats = alloc_skb(7 * nla_total_size_64bit(sizeof(u64)) + - 3 * nla_total_size(sizeof(u32)) + + 4 * nla_total_size(sizeof(u32)) + 2 * nla_total_size(sizeof(u8)), GFP_ATOMIC); if (!stats) return NULL; @@ -3061,6 +3061,8 @@ struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk) nla_put_u8(stats, TCP_NLA_RECUR_RETRANS, inet_csk(sk)->icsk_retransmits); nla_put_u8(stats, TCP_NLA_DELIVERY_RATE_APP_LMT, !!tp->rate_app_limited); + + nla_put_u32(stats, TCP_NLA_SNDQ_SIZE, tp->write_seq - tp->snd_una); return stats; } -- 2.16.2.395.g2e18187dfd-goog
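Two pieces of arithmetic in this patch can be checked in isolation. Each u32 attribute costs nla_total_size(4) = 8 bytes (a 4-byte struct nlattr header plus the 4-byte payload, aligned to 4), so bumping the count from 3 to 4 reserves 8 more bytes; and write_seq - snd_una is computed in u32 arithmetic, which stays correct when the sequence space wraps. The macros below are local restatements of the netlink sizing helpers rather than the kernel headers themselves, and sndq_size() is a hypothetical name.

```c
#include <stdint.h>

/* Local restatement of the netlink attribute sizing helpers. */
#define NLA_ALIGNTO 4
#define NLA_ALIGN(len) (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN ((int)NLA_ALIGN(4)) /* sizeof(struct nlattr) == 4 */

/* Header plus payload, padded to a 4-byte boundary. */
static int nla_total_size(int payload)
{
	return NLA_ALIGN(NLA_HDRLEN + payload);
}

/* Bytes written by the application but not yet acknowledged. Unsigned
 * 32-bit subtraction is modulo 2^32, so wraparound is handled. */
static uint32_t sndq_size(uint32_t write_seq, uint32_t snd_una)
{
	return write_seq - snd_una;
}
```

A u8 attribute also rounds up to 8 bytes, so each extra u32 or u8 stat grows the preallocated skb by the same amount.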
[PATCH net-next 2/2] tcp: add ca_state stat in SCM_TIMESTAMPING_OPT_STATS
This patch adds the TCP_NLA_CA_STATE stat to SCM_TIMESTAMPING_OPT_STATS. It reports the socket's ca_state when the timestamp is generated. Signed-off-by: Priyaranjan Jha Signed-off-by: Neal Cardwell Signed-off-by: Yuchung Cheng Signed-off-by: Soheil Hassas Yeganeh --- include/uapi/linux/tcp.h | 1 + net/ipv4/tcp.c | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 93bad2128ef6..4c0ae0faf7ca 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -242,6 +242,7 @@ enum { TCP_NLA_RECUR_RETRANS, /* Recurring retransmits for the current pkt */ TCP_NLA_DELIVERY_RATE_APP_LMT, /* delivery rate application limited ? */ TCP_NLA_SNDQ_SIZE, /* Data (bytes) pending in send queue */ + TCP_NLA_CA_STATE, /* ca_state of socket */ }; diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 162ba4227446..fb350f740f69 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3032,7 +3032,7 @@ struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk) stats = alloc_skb(7 * nla_total_size_64bit(sizeof(u64)) + 4 * nla_total_size(sizeof(u32)) + - 2 * nla_total_size(sizeof(u8)), GFP_ATOMIC); + 3 * nla_total_size(sizeof(u8)), GFP_ATOMIC); if (!stats) return NULL; @@ -3063,6 +3063,7 @@ struct sk_buff *tcp_get_timestamping_opt_stats(const struct sock *sk) nla_put_u8(stats, TCP_NLA_DELIVERY_RATE_APP_LMT, !!tp->rate_app_limited); nla_put_u32(stats, TCP_NLA_SNDQ_SIZE, tp->write_seq - tp->snd_una); + nla_put_u8(stats, TCP_NLA_CA_STATE, inet_csk(sk)->icsk_ca_state); return stats; } -- 2.16.2.395.g2e18187dfd-goog
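On the receiving side, the SCM_TIMESTAMPING_OPT_STATS payload is a flat run of netlink attributes, and TCP_NLA_CA_STATE carries a single byte holding one of the tcp_ca_state values (for example 3 for TCP_CA_Recovery). A minimal sketch of pulling one attribute out of such a buffer; find_attr() and struct nla_hdr are hypothetical names (the header mirrors struct nlattr), and the attribute type is left to the caller, who would take TCP_NLA_CA_STATE from an up-to-date <linux/tcp.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct nlattr from <linux/netlink.h>. */
struct nla_hdr {
	uint16_t nla_len;  /* header + payload, not including padding */
	uint16_t nla_type;
};

#define NLA_ALIGN4(len) (((len) + 3u) & ~3u)

/* Return a pointer to the payload of the first attribute of the given
 * type in buf, or NULL if it is absent or the buffer is malformed. */
static const void *find_attr(const void *buf, size_t len, uint16_t type)
{
	const unsigned char *p = buf;

	while (len >= sizeof(struct nla_hdr)) {
		struct nla_hdr h;
		size_t step;

		memcpy(&h, p, sizeof(h)); /* avoid unaligned access */
		if (h.nla_len < sizeof(h) || h.nla_len > len)
			return NULL;
		/* mask the NLA_F_NESTED / NLA_F_NET_BYTEORDER flag bits */
		if ((h.nla_type & 0x3fff) == type)
			return p + sizeof(h);
		step = NLA_ALIGN4(h.nla_len);
		if (step > len)
			break;
		p += step;
		len -= step;
	}
	return NULL;
}
```

The attribute type numbers used when testing such a helper are arbitrary; only the header layout and 4-byte alignment matter.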
[PATCH] fsl/fman: avoid sleeping in atomic context while adding an address
__dev_mc_add() takes the device's address-list spinlock, so the allocation must not sleep: use GFP_ATOMIC in kmalloc(). / # ifconfig eth0 inet 192.168.0.111 [ 89.331622] BUG: sleeping function called from invalid context at mm/slab.h:420 [ 89.339002] in_atomic(): 1, irqs_disabled(): 0, pid: 1035, name: ifconfig [ 89.345799] 2 locks held by ifconfig/1035: [ 89.349908] #0: (rtnl_mutex){+.+.}, at: [<(ptrval)>] devinet_ioctl+0xc0/0x8a0 [ 89.357258] #1: (_xmit_ETHER){+...}, at: [<(ptrval)>] __dev_mc_add+0x28/0x80 [ 89.364520] CPU: 1 PID: 1035 Comm: ifconfig Not tainted 4.16.0-rc3-dirty #8 [ 89.371464] Call Trace: [ 89.373908] [e959db60] [c066f948] dump_stack+0xa4/0xfc (unreliable) [ 89.380177] [e959db80] [c00671d8] ___might_sleep+0x248/0x280 [ 89.385833] [e959dba0] [c01aec34] kmem_cache_alloc_trace+0x174/0x320 [ 89.392179] [e959dbd0] [c04ab920] dtsec_add_hash_mac_address+0x130/0x240 [ 89.398874] [e959dc00] [c04a9d74] set_multi+0x174/0x1b0 [ 89.404093] [e959dc30] [c04afb68] dpaa_set_rx_mode+0x68/0xe0 [ 89.409745] [e959dc40] [c057baf8] __dev_mc_add+0x58/0x80 [ 89.415052] [e959dc60] [c060fd64] igmp_group_added+0x164/0x190 [ 89.420878] [e959dca0] [c060ffa8] ip_mc_inc_group+0x218/0x460 [ 89.426617] [e959dce0] [c06120fc] ip_mc_up+0x3c/0x190 [ 89.431662] [e959dd10] [c0607270] inetdev_event+0x250/0x620 [ 89.437227] [e959dd50] [c005f190] notifier_call_chain+0x80/0xf0 [ 89.443138] [e959dd80] [c0573a74] __dev_notify_flags+0x54/0xf0 [ 89.448964] [e959dda0] [c05743f8] dev_change_flags+0x48/0x60 [ 89.454615] [e959ddc0] [c0606744] devinet_ioctl+0x544/0x8a0 [ 89.460180] [e959de10] [c060987c] inet_ioctl+0x9c/0x1f0 [ 89.465400] [e959de80] [c05479a8] sock_ioctl+0x168/0x460 [ 89.470708] [e959ded0] [c01cf3ec] do_vfs_ioctl+0xac/0x8c0 [ 89.476099] [e959df20] [c01cfc40] SyS_ioctl+0x40/0xc0 [ 89.481147] [e959df40] [c0011318] ret_from_syscall+0x0/0x3c [ 89.486715] --- interrupt: c01 at 0x1006943c [ 89.486715] LR = 0x100c45ec Signed-off-by: Denis Kirjanov --- drivers/net/ethernet/freescale/fman/fman_dtsec.c | 2 +- 1 file changed, 1
insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/freescale/fman/fman_dtsec.c b/drivers/net/ethernet/freescale/fman/fman_dtsec.c index ea43b4974149..7af31ddd093f 100644 --- a/drivers/net/ethernet/freescale/fman/fman_dtsec.c +++ b/drivers/net/ethernet/freescale/fman/fman_dtsec.c @@ -1100,7 +1100,7 @@ int dtsec_add_hash_mac_address(struct fman_mac *dtsec, enet_addr_t *eth_addr) set_bucket(dtsec->regs, bucket, true); /* Create element to be added to the driver hash table */ - hash_entry = kmalloc(sizeof(*hash_entry), GFP_KERNEL); + hash_entry = kmalloc(sizeof(*hash_entry), GFP_ATOMIC); if (!hash_entry) return -ENOMEM; hash_entry->addr = addr; -- 2.13.6
Re: [PATCH v2 net-next 5/5] net: dsa: mv88e6xxx: Get mv88e6352 SERDES statistics
On 03/01/2018 07:10 PM, Andrew Lunn wrote: >> +void mv88e6352_serdes_get_strings(struct mv88e6xxx_chip *chip, >> + int port, uint8_t *data) >> +{ >> +struct mv88e6352_serdes_hw_stat *stat; >> +int i; >> + >> +if (!mv88e6352_port_has_serdes(chip, port)) >> +return; >> + >> +for (i = 0; i < ARRAY_SIZE(mv88e6352_serdes_hw_stats); i++) { >> +stat = &mv88e6352_serdes_hw_stats[i]; >> +memcpy(data + i * ETH_GSTRING_LEN, stat->string, >> + ETH_GSTRING_LEN); > > This has the same problem as Florain just fixed, using memcpy instead > of strcnpy. I will spin a new version with this fixed. This is actually fine: your strings are declared as arrays of ETH_GSTRING_LEN characters, so while the memcpy() is a bit inefficient and will typically copy a lot of NUL bytes, it won't cause out-of-bounds accesses. -- Florian
Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
Sun, Mar 04, 2018 at 07:24:12PM CET, alexander.du...@gmail.com wrote: >On Sat, Mar 3, 2018 at 11:13 PM, Jiri Pirko wrote: >> Sun, Mar 04, 2018 at 01:26:53AM CET, alexander.du...@gmail.com wrote: >>>On Sat, Mar 3, 2018 at 1:25 PM, Jiri Pirko wrote: Sat, Mar 03, 2018 at 07:04:57PM CET, alexander.du...@gmail.com wrote: >On Sat, Mar 3, 2018 at 3:31 AM, Jiri Pirko wrote: >> Fri, Mar 02, 2018 at 08:42:47PM CET, m...@redhat.com wrote: >>>On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote: >Yeah, this code essentially calls out the "shareable" code with a >comment at the start and end of the section what defines the >virtio_bypass functionality. It would just be a matter of mostly >cutting and pasting to put it into a separate driver module. Please put it there and unite the use of it with netvsc. >>> >>>Surely, adding this to other drivers (e.g. might this be handy for xen >>>too?) can be left for a separate patchset. Let's get one device merged >>>first. >> >> Why? Let's do the generic infra alongside with the driver. I see no good >> reason to rush into merging driver and only later, if ever, to convert >> it to generic solution. On contrary. That would lead into multiple >> approaches and different behavious in multiple drivers. That is plain >> wrong. > >If nothing else it doesn't hurt to do this in one driver in a generic >way, and once it has been proven to address all the needs of that one >driver we can then start moving other drivers to it. The current >solution is quite generic, that was my contribution to this patch set >as I didn't like how invasive it was being to virtio and thought it >would be best to keep this as minimally invasive as possible. > >My preference would be to give this a release or two in virtio to >mature before we start pushing it onto other drivers. It shouldn't >take much to cut/paste this into a new driver file once we decide it >is time to start extending it out to other drivers. 
I'm not talking about cut/paste and in fact that is what I'm worried about. I'm talking about common code in net/core/ or somewhere that would take care of this in-driver bonding. Each driver, like virtio_net, netvsc would just register some ops to it and the core would do all logic. I believe it is essential take this approach from the start. >>> >>>Sorry, I didn't mean cut/paste into another driver, I meant to make it >>>a driver of its own. My thought was to eventually create a shared/core >>>driver module that is then used by the other drivers. >>> >>>My concern right now is that Stephen has indicated he doesn't want >>>this approach taken with netvsc, and most of the community doesn't >> >> IIUC, he only does not like the extra netdev. Is there anything else? > >Nope that is pretty much it. It doesn't seem like a big deal for >virtio, but for netvsc it is significant since they don't have any >"backup" bit feature differentiation, so they would likely be stuck >with 2 netdevs even in their basic setup. Okay. If that is a strict "no-go" for netvsc, this should be just a flag passed down to the in-driver bond code. > >>>want the netvsc approach applied to virtio. Until that impasse can be >>>resolved there isn't much value in trying to split this up so it is >>>available to other drivers. In addition I would imagine it would make >>>it a pain for others to back-port into distros since it would break >>>legacy netvsc driver behavior. Patches are always welcome. Once this >>>is in you are free to try fighting to get this made into a generic >>>module and applied to both drivers, but we have already spent close to >>>3 months on this and it seems like there has been significantly more >> >> Alex, time is never a good argument for poor design and shortcuts. > >I'm not saying we should go with a poor design due to time. 
But >expecting us to implement something where the maintainer of said >driver has not agreed to is pointless, and I don't see it as a design He just does not like the third netdev, not the fact that the code for in-driver bonding would be shared. >shortcut to implement something in one driver with the expectation >that we will then make it core later once it has proven itself and has >use elsewhere. In the meantime I would imagine it also makes it easier >for things like backports and such for us to do it this way since we >are only impacting one driver. When you are working on the upstream kernel, you should not care about backports. That leads to poor design; it is not an argument. > >You are telling us to do something that not everyone has agreed to. Who did not? >Currently we only have agreement from Michael on taking this code, as >such we are working with virtio only for now. When the time comes that If you duplicate netvsc's in-driver bonding in virtio_net, it will stay there
[PATCH net] xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto
From: Yossi Kuperman Artem Savkov reported that commit 5efec5c655dd leads to packet loss in an IPsec configuration. It appears that his setup uses a TUN device, which does not have a MAC header. Make sure a MAC header exists before overwriting eth_hdr(skb)->h_proto. Note: the TUN device sets a MAC header pointer even though it has no MAC header. Fixes: 5efec5c655dd ("xfrm: Fix eth_hdr(skb)->h_proto to reflect inner IP version") Reported-by: Artem Savkov Tested-by: Artem Savkov Signed-off-by: Yossi Kuperman --- net/ipv4/xfrm4_mode_tunnel.c | 3 ++- net/ipv6/xfrm6_mode_tunnel.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c index 63faeee..2a9764b 100644 --- a/net/ipv4/xfrm4_mode_tunnel.c +++ b/net/ipv4/xfrm4_mode_tunnel.c @@ -92,7 +92,8 @@ static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb) skb_reset_network_header(skb); skb_mac_header_rebuild(skb); - eth_hdr(skb)->h_proto = skb->protocol; + if (skb->mac_len) + eth_hdr(skb)->h_proto = skb->protocol; err = 0; diff --git a/net/ipv6/xfrm6_mode_tunnel.c b/net/ipv6/xfrm6_mode_tunnel.c index bb935a3..de1b0b8 100644 --- a/net/ipv6/xfrm6_mode_tunnel.c +++ b/net/ipv6/xfrm6_mode_tunnel.c @@ -92,7 +92,8 @@ static int xfrm6_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb) skb_reset_network_header(skb); skb_mac_header_rebuild(skb); - eth_hdr(skb)->h_proto = skb->protocol; + if (skb->mac_len) + eth_hdr(skb)->h_proto = skb->protocol; err = 0; -- 2.8.1
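What the added check prevents can be shown with a toy model. Assume, for illustration only, that when no link-layer header exists the MAC header offset coincides with the start of the IP header; writing h_proto at its Ethernet offset of 12 then lands inside the IP header instead. struct toy_skb and set_h_proto() are hypothetical; the patch itself simply tests skb->mac_len before the write.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_PROTO_OFF 12 /* offsetof(struct ethhdr, h_proto) */

/* Toy stand-in for struct sk_buff (hypothetical): offsets into buf. */
struct toy_skb {
	unsigned char buf[64];
	size_t mac_off;    /* like skb->mac_header */
	size_t mac_len;    /* 0 when there is no link-layer header (TUN) */
	uint16_t protocol; /* value to store in h_proto */
};

/* guarded == true models the patched code: only write h_proto when a
 * MAC header of non-zero length exists. */
static void set_h_proto(struct toy_skb *skb, bool guarded)
{
	if (guarded && skb->mac_len == 0)
		return;
	memcpy(skb->buf + skb->mac_off + ETH_PROTO_OFF, &skb->protocol,
	       sizeof(skb->protocol));
}
```

In this model, with mac_len == 0 the unguarded write clobbers bytes 12 and 13 of what is really the IP header (for IPv4, the start of the source address), while the guarded version leaves the packet untouched.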
[RFC,POC] iptables/nftables to ebpf/xdp via common intermediate layer
These patches, which go on top of the 'bpfilter' RFC patches, demonstrate an nftables to ebpf translation (done in userspace). In order to not duplicate the ebpf code generation efforts, the rules iptables -i lo -d 127.0.0.2 -j DROP and nft add rule ip filter input ip daddr 127.0.0.2 drop are first translated to a common intermediate representation (IMR) and then to ebpf; the resulting prog is attached to the XDP hook. The IMR is identical in both cases, so both rules result in the same ebpf program. The IMR currently assumes that translation will always be to ebpf. As per previous discussion, it doesn't consider other targets, so for instance IMR pseudo-registers map 1:1 to ebpf ones. The IMR is also supposed to be generic enough to make it easy to convert 'frontend' formats (iptables rule blob, nftables netlink) to it, and to extend it to cover ip rule, ovs or any other inputs in the future without the need for major changes to the IMR. The IMR currently implements the following basic operations: - relational (equal, not equal) - immediates (32 and 64bit constants) - payload with relative addressing (mac, network, transport header) - verdict (pass, drop, next rule) It's still at an early stage, but I think it's good enough as a proof-of-concept. Known differences between nftjit.ko and bpfilter.ko: nftjit.ko currently doesn't run transparently, but that's only because I wanted to focus on the IMR and get the POC out of the door. It should be possible to make it transparent via the bpfilter.ko approach. Next steps for the IMR could be the addition of binary operations for prefixes ("-d 192.168.0.1/24"); these are also needed e.g. for tcp flag matching (-p tcp --syn in iptables) and so on. I'd also be interested in whether XDP is seen as an appropriate target hook. AFAICS the XDP and the nftables ingress hook are similar enough to consider just (re)using the XDP hook to jit the nftables ingress hook.
The translator could check if the hook is unused, and return early if some other program is already attached. Comments welcome, especially wrt. IMR concept and what might be next step(s) in moving forward. The patches are also available via git at https://git.breakpoint.cc/cgit/fw/net-next.git/log/?h=bpfilter7 .
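For a rough feel of the IMR idea: both `iptables -d 127.0.0.2 -j DROP` and `nft ... ip daddr 127.0.0.2 drop` reduce to "4 bytes of payload at network header + 16 equal immediate 127.0.0.2, then drop". The sketch below models that shape in C, assuming a single 32-bit compare per rule. All `demo_*` names are hypothetical, and the sketch interprets the rule directly instead of generating eBPF as the real IMR does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical miniature IR: one payload ==/!= imm32 test per rule. */
enum demo_base { DEMO_BASE_MAC, DEMO_BASE_NETWORK, DEMO_BASE_TRANSPORT };
enum demo_relop { DEMO_OP_EQ, DEMO_OP_NE };
enum demo_verdict { DEMO_NEXT_RULE, DEMO_PASS, DEMO_DROP };

struct demo_rule {
	enum demo_base base;       /* header the payload offset is relative to */
	uint16_t offset;           /* byte offset within that header */
	uint32_t imm;              /* 32-bit immediate, in packet byte order */
	enum demo_relop op;
	enum demo_verdict verdict; /* taken when the comparison holds */
};

struct demo_pkt {
	const uint8_t *data;
	size_t len;
	uint16_t hdr_off[3];       /* start of mac/network/transport header */
};

static enum demo_verdict demo_eval(const struct demo_rule *r,
				   const struct demo_pkt *p)
{
	size_t off = (size_t)p->hdr_off[r->base] + r->offset;
	uint32_t val;
	int match;

	if (off + sizeof(val) > p->len)
		return DEMO_NEXT_RULE;     /* short packet: no match */
	memcpy(&val, p->data + off, sizeof(val));
	match = (val == r->imm);
	if (r->op == DEMO_OP_NE)
		match = !match;
	return match ? r->verdict : DEMO_NEXT_RULE;
}
```

Keeping the header base separate from the offset is what lets both frontends emit the same object; only the backend needs to know how each header start is found.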
[RFC,POC 3/3] bpfilter: switch bpfilter to iptables->IMR translation
Translate a basic iptables rule blob to the IMR, then ask the IMR to translate it to ebpf. The IMR is shared between the nft and bpfilter translators. iptables_gen_append() is the only relevant function here, as it demonstrates a simple 'source/destination matches x' test. Signed-off-by: Florian Westphal --- net/bpfilter/Makefile | 2 +- net/bpfilter/bpfilter_gen.h | 15 + net/bpfilter/bpfilter_mod.h | 16 +- net/bpfilter/iptables.c | 76 + net/bpfilter/iptables.h | 4 +++ net/bpfilter/sockopt.c | 73 +-- 6 files changed, 154 insertions(+), 32 deletions(-) create mode 100644 net/bpfilter/bpfilter_gen.h create mode 100644 net/bpfilter/iptables.c create mode 100644 net/bpfilter/iptables.h diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile index a4064986dc2f..21a8afb60b7c 100644 --- a/net/bpfilter/Makefile +++ b/net/bpfilter/Makefile @@ -5,7 +5,7 @@ hostprogs-y := nftjit.ko bpfilter.ko always := $(hostprogs-y) -bpfilter.ko-objs := bpfilter.o tgts.o targets.o tables.o init.o ctor.o sockopt.o gen.o +bpfilter.ko-objs := bpfilter.o tgts.o targets.o tables.o init.o ctor.o sockopt.o gen.o iptables.o imr.o NFT_LIBS = -lnftnl nftjit.ko-objs := tgts.o targets.o tables.o init.o ctor.o gen.o nftables.o imr.o diff --git a/net/bpfilter/bpfilter_gen.h b/net/bpfilter/bpfilter_gen.h new file mode 100644 index ..71c6e8a73e24 --- /dev/null +++ b/net/bpfilter/bpfilter_gen.h @@ -0,0 +1,15 @@ +struct bpfilter_gen_ctx { + struct bpf_insn *img; + u32 len_cur; + u32 len_max; + u32 default_verdict; + int fd; + int ifindex; + bool offloaded; +}; + +int bpfilter_gen_init(struct bpfilter_gen_ctx *ctx); +int bpfilter_gen_prologue(struct bpfilter_gen_ctx *ctx); +int bpfilter_gen_epilogue(struct bpfilter_gen_ctx *ctx); +int bpfilter_gen_commit(struct bpfilter_gen_ctx *ctx); +void bpfilter_gen_destroy(struct bpfilter_gen_ctx *ctx); diff --git a/net/bpfilter/bpfilter_mod.h b/net/bpfilter/bpfilter_mod.h index b4209985efff..dc3a90df1788 100644 --- a/net/bpfilter/bpfilter_mod.h +++ b/net/bpfilter/bpfilter_mod.h @@ -4,6 +4,7 @@ #include "include/uapi/linux/bpfilter.h" #include +#include "bpfilter_gen.h" struct bpfilter_table { struct hlist_node hash; @@ -71,26 +72,11 @@ struct bpfilter_target { u8 rev; }; -struct bpfilter_gen_ctx { - struct bpf_insn *img; - u32 len_cur; - u32 len_max; - u32 default_verdict; - int fd; - int ifindex; - bool offloaded; -}; - union bpf_attr; int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size); -int bpfilter_gen_init(struct bpfilter_gen_ctx *ctx); -int bpfilter_gen_prologue(struct bpfilter_gen_ctx *ctx); -int bpfilter_gen_epilogue(struct bpfilter_gen_ctx *ctx); int bpfilter_gen_append(struct bpfilter_gen_ctx *ctx, struct bpfilter_ipt_ip *ent, int verdict); -int bpfilter_gen_commit(struct bpfilter_gen_ctx *ctx); -void bpfilter_gen_destroy(struct bpfilter_gen_ctx *ctx); struct bpfilter_target *bpfilter_target_get_by_name(const char *name); void bpfilter_target_put(struct bpfilter_target *tgt); diff --git a/net/bpfilter/iptables.c b/net/bpfilter/iptables.c new file mode 100644 index ..055cfa8fbf21 --- /dev/null +++ b/net/bpfilter/iptables.c @@ -0,0 +1,76 @@ +#include +#include + +typedef uint16_t __sum16; /* hack */ +#include + +#include "bpfilter_mod.h" +#include "iptables.h" +#include "imr.h" + +static int check_entry(const struct bpfilter_ipt_ip *ent) +{ +#define M_FF "\xff\xff\xff\xff" + static const __u8 mask1[IFNAMSIZ] = M_FF M_FF M_FF M_FF; + static const __u8 mask0[IFNAMSIZ] = { }; + int ones = strlen(ent->in_iface); ones += ones > 0; +#undef M_FF + if (strlen(ent->out_iface) > 0) + return -ENOTSUPP; + if (memcmp(ent->in_iface_mask, mask1, ones) || + memcmp(&ent->in_iface_mask[ones], mask0, sizeof(mask0) - ones)) + return -ENOTSUPP; + if ((ent->src_mask != 0 && ent->src_mask != 0x) || + (ent->dst_mask != 0 && ent->dst_mask != 0x)) + return -ENOTSUPP; + + return 0; +} + +int iptables_gen_append(struct imr_state *state, + struct bpfilter_ipt_ip *ent, int verdict) +{ + struct imr_object *left, *right, *relop; + int ret; + + ret = 
check_entry(ent); + if (ret < 0) + return ret; + if (ent->src_mask == 0 && ent->dst_mask == 0) + return 0; + + imr_state_rule_begin(state); + + if (ent->src_mask) { + left = imr_object_
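check_entry() only accepts an input-interface mask that covers exactly the interface name plus its terminating NUL (and an all-zero mask when no interface is given), so wildcard matches are rejected. The same rule as a standalone helper is sketched below; `demo_iface_mask_ok()` is hypothetical and loops over the mask instead of using the patch's two memcmp() calls.

```c
#include <stddef.h>
#include <string.h>

#define IFNAMSIZ 16 /* interface name buffer size, as in the kernel */

/* Returns 1 when mask has 0xff over the name and its NUL terminator and
 * 0x00 everywhere else (all zero for an empty name), otherwise 0. */
static int demo_iface_mask_ok(const char *iface,
			      const unsigned char mask[IFNAMSIZ])
{
	size_t ones = strlen(iface);
	size_t i;

	ones += ones > 0;  /* cover the terminating NUL as well */
	for (i = 0; i < IFNAMSIZ; i++)
		if (mask[i] != (i < ones ? 0xff : 0x00))
			return 0;
	return 1;
}
```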
[RFC,POC 2/3] bpfilter: add nftables jit proof-of-concept
This adds an nftables frontend for the IMR->BPF translator. This doesn't work via UMH yet. AFAIU it should be possible to get transparent ebpf translation for nftables, similar to the bpfilter/iptables UMH. However, at this time I think it's better to get the IMR "right". nftjit.ko currently needs libnftnl/libmnl, but that's a convenience on my end and not a "must have". Signed-off-by: Florian Westphal --- net/bpfilter/Makefile | 7 +- net/bpfilter/nftables.c | 679 2 files changed, 685 insertions(+), 1 deletion(-) create mode 100644 net/bpfilter/nftables.c diff --git a/net/bpfilter/Makefile b/net/bpfilter/Makefile index 5a85ef7d7a4d..a4064986dc2f 100644 --- a/net/bpfilter/Makefile +++ b/net/bpfilter/Makefile @@ -3,7 +3,12 @@ # Makefile for the Linux BPFILTER layer. # -hostprogs-y := bpfilter.ko +hostprogs-y := nftjit.ko bpfilter.ko always := $(hostprogs-y) bpfilter.ko-objs := bpfilter.o tgts.o targets.o tables.o init.o ctor.o sockopt.o gen.o + +NFT_LIBS = -lnftnl +nftjit.ko-objs := tgts.o targets.o tables.o init.o ctor.o gen.o nftables.o imr.o +HOSTLOADLIBES_nftjit.ko = `pkg-config --libs libnftnl libmnl` + HOSTCFLAGS += -I. -Itools/include/ diff --git a/net/bpfilter/nftables.c b/net/bpfilter/nftables.c new file mode 100644 index ..5a756ccd03a1 --- /dev/null +++ b/net/bpfilter/nftables.c @@ -0,0 +1,679 @@ +/* + * based on previous code from: + * + * Copyright (c) 2013 Arturo Borrero Gonzalez + * Copyright (c) 2013 Pablo Neira Ayuso + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. 
+ */ + +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "bpfilter_mod.h" +#include "imr.h" + +/* Hack, we don't link bpfilter.o */ +extern long int syscall (long int __sysno, ...); + +int sys_bpf(int cmd, union bpf_attr *attr, unsigned int size) +{ + return syscall(321, cmd, attr, size); +} + +static int seq; + +static void memory_allocation_error(void) { perror("allocation failed"); exit(1); } + +static int nft_reg_to_imr_reg(int nfreg) +{ + switch (nfreg) { + case NFT_REG_VERDICT: + return IMR_REG_0; + /* old register numbers, 4 128 bit registers. */ + case NFT_REG_1: + return IMR_REG_4; + case NFT_REG_2: + return IMR_REG_6; + case NFT_REG_3: + return IMR_REG_8; + case NFT_REG_4: + break; + /* new register numbers, 16 32 bit registers, map to old ones */ + case NFT_REG32_00: + return IMR_REG_4; + case NFT_REG32_01: + return IMR_REG_5; + case NFT_REG32_02: + return IMR_REG_6; + default: + return -1; + } + return -1; +} + +static int netlink_parse_immediate(const struct nftnl_expr *nle, void *out) +{ + struct imr_state *state = out; + struct imr_object *o = NULL; + + if (nftnl_expr_is_set(nle, NFTNL_EXPR_IMM_DATA)) { + uint32_t len; + int reg; + + nftnl_expr_get(nle, NFTNL_EXPR_IMM_DATA, &len); + + switch (len) { + case sizeof(uint32_t): + o = imr_object_alloc_imm32(nftnl_expr_get_u32(nle, NFTNL_EXPR_IMM_DATA)); + break; + case sizeof(uint64_t): + o = imr_object_alloc_imm64(nftnl_expr_get_u64(nle, NFTNL_EXPR_IMM_DATA)); + break; + default: + return -ENOTSUPP; + } + reg = nft_reg_to_imr_reg(nftnl_expr_get_u32(nle, +NFTNL_EXPR_IMM_DREG)); + if (reg < 0) { + imr_object_free(o); + return reg; + } + + imr_register_store(state, reg, o); + return 0; + } else if (nftnl_expr_is_set(nle, NFTNL_EXPR_IMM_VERDICT)) { + uint32_t verdict; + int ret; + + if (nftnl_expr_is_set(nle, NFTNL_EXPR_IMM_CHAIN)) + return -ENOTSUPP; + +verdict 
= nftnl_expr_get_u32(nle, NFTNL_EXPR_IMM_VERDICT); + + switch (verdict) { + case NF_ACCEPT: + o = imr_object_alloc_verdict(IMR_VERDICT_PASS); + break; + case NF_DROP: + o = imr_object_alloc_verdict(IMR_VERDICT_DROP); + break; + default: + fprintf(stderr, "Unhandled verdict %d\n", verdict); + o = imr_object_alloc_verdict(IMR_VERDICT_DROP); + b
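nft_reg_to_imr_reg() above maps the verdict register, the legacy 128-bit registers and the first few 32-bit registers onto eBPF registers, and reports anything it cannot place as -1. The same mapping as a lookup table is sketched below. The `DEMO_NFT_*` enum values are placeholders, not the real nftables uapi constants. As in the patch, NFT_REG_4 and every 32-bit register past NFT_REG32_02 stay unsupported.

```c
/* Placeholder register numbering; not the nftables uapi values. */
enum demo_nft_reg {
	DEMO_NFT_REG_VERDICT,
	DEMO_NFT_REG_1, DEMO_NFT_REG_2, DEMO_NFT_REG_3, DEMO_NFT_REG_4,
	DEMO_NFT_REG32_00, DEMO_NFT_REG32_01, DEMO_NFT_REG32_02,
	DEMO_NFT_REG_COUNT
};

/* IMR (== eBPF) register for each nft register, -1 if unsupported.
 * Values follow nft_reg_to_imr_reg() in the patch. */
static const int demo_nft_to_imr[DEMO_NFT_REG_COUNT] = {
	[DEMO_NFT_REG_VERDICT] = 0,
	[DEMO_NFT_REG_1] = 4, [DEMO_NFT_REG_2] = 6, [DEMO_NFT_REG_3] = 8,
	[DEMO_NFT_REG_4] = -1,
	[DEMO_NFT_REG32_00] = 4, [DEMO_NFT_REG32_01] = 5,
	[DEMO_NFT_REG32_02] = 6,
};

static int demo_nft_reg_to_imr(int nfreg)
{
	if (nfreg < 0 || nfreg >= DEMO_NFT_REG_COUNT)
		return -1; /* out of range: unsupported */
	return demo_nft_to_imr[nfreg];
}
```

The switch in the patch behaves the same way; a table simply keeps the whole mapping visible in one place.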
[RFC,POC 1/3] bpfilter: add experimental IMR bpf translator
This is a basic intermediate representation to decouple the ruleset representation (iptables, nftables) from the ebpf translation. The IMR currently assumes that translation will always be into ebpf, its pseudo-registers map 1:1 to ebpf ones. Objects implemented at the moment: - relop (eq, ne only for now) - immediate (32, 64 bit constants) - payload, with relative addressing (mac header, network header, transport header) This doesn't add a user; files will not even be compiled yet. Signed-off-by: Florian Westphal --- net/bpfilter/imr.c | 655 + net/bpfilter/imr.h | 78 +++ 2 files changed, 733 insertions(+) create mode 100644 net/bpfilter/imr.c create mode 100644 net/bpfilter/imr.h diff --git a/net/bpfilter/imr.c b/net/bpfilter/imr.c new file mode 100644 index ..09c557ea7c21 --- /dev/null +++ b/net/bpfilter/imr.c @@ -0,0 +1,655 @@ +#include +#include +#include +#include +#include + +#include +#include + +#include +typedef __u16 __bitwise __sum16; /* hack */ +#include +#include + +#include "imr.h" +#include "bpfilter_gen.h" + +#define EMIT(ctx, x) \ + do {\ + if ((ctx)->len_cur + 1 > (ctx)->len_max)\ + return -ENOMEM; \ + (ctx)->img[(ctx)->len_cur++] = x; \ + } while (0) + +struct imr_object { + enum imr_obj_type type:8; + uint8_t len; + + union { + struct { + union { + uint64_t value64; + uint32_t value32; + }; + } immedate; + struct { + struct imr_object *left; + struct imr_object *right; + enum imr_relop op:8; + } relational; + struct { + uint16_t offset; + enum imr_payload_base base:8; + } payload; + struct { + enum imr_verdict verdict; + } verdict; + }; +}; + +struct imr_state { + struct bpf_insn *img; + uint32_t len_cur; + uint32_t len_max; + + struct imr_object *registers[IMR_REG_COUNT]; + uint8_t regcount; + + uint32_t num_objects; + struct imr_object **objects; +}; + +static int imr_jit_object(struct bpfilter_gen_ctx *ctx, + struct imr_state *, const struct imr_object *o); + +static void internal_error(const char *s) +{ + fprintf(stderr, "FIXME: internal 
error %s\n", s); + exit(1); +} + +/* FIXME: consider len too (e.g. reserve 2 registers for len == 8) */ +static int imr_register_alloc(struct imr_state *s, uint32_t len) +{ + uint8_t reg = s->regcount; + + if (s->regcount >= IMR_REG_COUNT) + return -1; + + s->regcount++; + + return reg; +} + +static int imr_register_get(const struct imr_state *s, uint32_t len) +{ + if (len > sizeof(uint64_t)) + internal_error(">64bit types not yet implemented"); + if (s->regcount == 0) + internal_error("no registers in use"); + + return s->regcount - 1; +} + +static int imr_to_bpf_reg(enum imr_reg_num n) +{ + /* currently maps 1:1 */ + return (int)n; +} + +static int bpf_reg_width(unsigned int len) +{ + switch (len) { + case sizeof(uint8_t): return BPF_B; + case sizeof(uint16_t): return BPF_H; + case sizeof(uint32_t): return BPF_W; + case sizeof(uint64_t): return BPF_DW; + default: + internal_error("reg size not supported"); + } + + return -EINVAL; +} + +static void imr_register_release(struct imr_state *s) +{ + if (s->regcount == 0) + internal_error("regcount underflow"); + s->regcount--; +} + +void imr_register_store(struct imr_state *s, enum imr_reg_num reg, struct imr_object *o) +{ + s->registers[reg] = o; +} + +struct imr_object *imr_register_load(const struct imr_state *s, enum imr_reg_num reg) +{ + return s->registers[reg]; +} + +struct imr_state *imr_state_alloc(void) +{ + struct imr_state *s = calloc(1, sizeof(*s)); + + return s; +} + +void imr_state_free(struct imr_state *s) +{ + int i; + + for (i = 0; i < s->num_objects; i++) + imr_object_free(s->objects[i]); + + free(s); +} + +struct imr_object *imr_object_alloc(enum imr_obj_type t) +{ + struct imr_object *o = calloc(1, sizeof(*o)); + + if (o) + o->type = t; + + return o; +} + +void imr_object_free(struct imr_object *o) +{ + switch (o->type) { + case IMR_OBJ_TYPE_VERDICT: + case IMR_OBJ_TYPE_IMMEDIATE: + case IMR_OBJ_TYPE_PAYLOAD: + break; + case IMR_OBJ_TYPE_RELATIONAL: + imr_object_free(
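The register helpers in imr.c behave like a stack. imr_register_alloc() hands out the next index, imr_register_get() returns the most recently allocated one, and imr_register_release() pops it. The sketch below follows that discipline but returns errors instead of calling exit(). `DEMO_MAX_REGS` = 10 is an assumption (eBPF R0 to R9), because the value of IMR_REG_COUNT is not shown here.

```c
#include <stdint.h>

#define DEMO_MAX_REGS 10 /* assumption: eBPF R0..R9 are allocatable */

struct demo_reg_stack {
	uint8_t regcount; /* number of registers currently in use */
};

/* Next free register, like imr_register_alloc(); -1 when exhausted. */
static int demo_reg_alloc(struct demo_reg_stack *s)
{
	if (s->regcount >= DEMO_MAX_REGS)
		return -1;
	return s->regcount++;
}

/* Most recently allocated register, like imr_register_get(). */
static int demo_reg_top(const struct demo_reg_stack *s)
{
	return s->regcount ? s->regcount - 1 : -1;
}

/* Pop the most recent register; releases must happen in LIFO order. */
static int demo_reg_release(struct demo_reg_stack *s)
{
	if (s->regcount == 0)
		return -1; /* underflow; the patch calls internal_error() */
	s->regcount--;
	return 0;
}
```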
Re: [crypto v8 04/12] chtls: structure and macro definiton
From: Atul Gupta Date: Thu, 1 Mar 2018 11:19:35 +0530 > + __u8 reneg_to_write_rx; > + __u8 protocol; You should use "u8" rather than "__u8" except in UAPI headers which this file is not. Please audit your entire patch series for this issue. Thank you.
[PATCH v2] staging: Replace printk() with appropriate net_*macro_ratelimited()
Replace printk() calls that have a log level with the appropriate net_*macro_ratelimited(). It's better to use the actual device name as a prefix in error messages. Indentation is also changed to fix the checkpatch issue. Signed-off-by: Arushi Singhal --- changes in v2 *In the previous version, printk was changed to pr_*macro(), which is used in the kernel instead of calling printk() directly. For drivers, however, dev_*macro() or net_*macro_ratelimited() should be used rather than calling printk() directly. drivers/staging/ipx/af_ipx.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/staging/ipx/af_ipx.c b/drivers/staging/ipx/af_ipx.c index d21a9d1..9a96962 100644 --- a/drivers/staging/ipx/af_ipx.c +++ b/drivers/staging/ipx/af_ipx.c @@ -744,13 +744,13 @@ static void ipxitf_discover_netnum(struct ipx_interface *intrfc, intrfc->if_netnum = cb->ipx_source_net; ipxitf_add_local_route(intrfc); } else { - printk(KERN_WARNING "IPX: Network number collision " - "%lx\n%s %s and %s %s\n", - (unsigned long) ntohl(cb->ipx_source_net), - ipx_device_name(i), - ipx_frame_name(i->if_dlink_type), - ipx_device_name(intrfc), - ipx_frame_name(intrfc->if_dlink_type)); + net_warn_ratelimited("IPX: Network number collision " +"%lx\n%s %s and %s %s\n", +(unsigned long) ntohl(cb->ipx_source_net), +ipx_device_name(i), +ipx_frame_name(i->if_dlink_type), +ipx_device_name(intrfc), + ipx_frame_name(intrfc->if_dlink_type)); ipxitf_put(i); } } -- 2.7.4
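net_warn_ratelimited() only prints when net_ratelimit() allows it, so a burst of number collisions yields a bounded number of warnings rather than one per packet. The kernel limiter is built on an interval/burst window. The sketch below is a simplified model of that idea with caller-supplied timestamps; it is not the kernel's `struct ratelimit_state` code, and `demo_ratelimit_ok()` is hypothetical.

```c
#include <stdbool.h>

/* Simplified interval/burst limiter; times are plain caller-supplied
 * integers (think jiffies). */
struct demo_ratelimit {
	long interval; /* window length */
	int burst;     /* messages allowed per window */
	long begin;    /* start of the current window */
	int printed;   /* messages emitted in the current window */
};

static bool demo_ratelimit_ok(struct demo_ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;   /* open a new window */
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;       /* caller may print */
	}
	return false;              /* suppressed */
}
```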
Re: [Outreachy kernel] [PATCH v2] staging: Replace printk() with appropriate net_*macro_ratelimited()
On Mon, 5 Mar 2018, Arushi Singhal wrote: > Replace printk having a log level with the appropriate > net_*macro_ratelimited. Why did you choose this function? > It's better to use actual device name as a prefix in error messages. What does this message relate to? > Indentation is also changed, to fix the checkpatch issue. It would be better not to exceed 80 characters than to follow the suggestion about the argument being to the right of the (. julia > Signed-off-by: Arushi Singhal > --- > changes in v2 > *In previous version printk was changed to pr_*macro(), which is used > in kernel instead of calling printk() directly. And for drivers, > dev_*macro() or net_*macro_ratelimited() should be used for calling > printk() directly. > > drivers/staging/ipx/af_ipx.c | 14 +++--- > 1 file changed, 7 insertions(+), 7 deletions(-) > > diff --git a/drivers/staging/ipx/af_ipx.c b/drivers/staging/ipx/af_ipx.c > index d21a9d1..9a96962 100644 > --- a/drivers/staging/ipx/af_ipx.c > +++ b/drivers/staging/ipx/af_ipx.c > @@ -744,13 +744,13 @@ static void ipxitf_discover_netnum(struct ipx_interface > *intrfc, > intrfc->if_netnum = cb->ipx_source_net; > ipxitf_add_local_route(intrfc); > } else { > - printk(KERN_WARNING "IPX: Network number collision " > - "%lx\n%s %s and %s %s\n", > - (unsigned long) ntohl(cb->ipx_source_net), > - ipx_device_name(i), > - ipx_frame_name(i->if_dlink_type), > - ipx_device_name(intrfc), > - ipx_frame_name(intrfc->if_dlink_type)); > + net_warn_ratelimited("IPX: Network number collision " > + "%lx\n%s %s and %s %s\n", > + (unsigned long) > ntohl(cb->ipx_source_net), > + ipx_device_name(i), > + ipx_frame_name(i->if_dlink_type), > + ipx_device_name(intrfc), > + > ipx_frame_name(intrfc->if_dlink_type)); > ipxitf_put(i); > } > } > -- > 2.7.4
Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On 3/4/2018 10:50 AM, Jiri Pirko wrote: Sun, Mar 04, 2018 at 07:24:12PM CET, alexander.du...@gmail.com wrote: On Sat, Mar 3, 2018 at 11:13 PM, Jiri Pirko wrote: Sun, Mar 04, 2018 at 01:26:53AM CET, alexander.du...@gmail.com wrote: On Sat, Mar 3, 2018 at 1:25 PM, Jiri Pirko wrote: Sat, Mar 03, 2018 at 07:04:57PM CET, alexander.du...@gmail.com wrote: On Sat, Mar 3, 2018 at 3:31 AM, Jiri Pirko wrote: Fri, Mar 02, 2018 at 08:42:47PM CET, m...@redhat.com wrote: On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote: Yeah, this code essentially calls out the "shareable" code with a comment at the start and end of the section what defines the virtio_bypass functionality. It would just be a matter of mostly cutting and pasting to put it into a separate driver module. Please put it there and unite the use of it with netvsc. Surely, adding this to other drivers (e.g. might this be handy for xen too?) can be left for a separate patchset. Let's get one device merged first. Why? Let's do the generic infra alongside with the driver. I see no good reason to rush into merging driver and only later, if ever, to convert it to generic solution. On contrary. That would lead into multiple approaches and different behavious in multiple drivers. That is plain wrong. If nothing else it doesn't hurt to do this in one driver in a generic way, and once it has been proven to address all the needs of that one driver we can then start moving other drivers to it. The current solution is quite generic, that was my contribution to this patch set as I didn't like how invasive it was being to virtio and thought it would be best to keep this as minimally invasive as possible. My preference would be to give this a release or two in virtio to mature before we start pushing it onto other drivers. It shouldn't take much to cut/paste this into a new driver file once we decide it is time to start extending it out to other drivers. 
I'm not talking about cut/paste and in fact that is what I'm worried about. I'm talking about common code in net/core/ or somewhere that would take care of this in-driver bonding. Each driver, like virtio_net, netvsc would just register some ops to it and the core would do all logic. I believe it is essential take this approach from the start. Sorry, I didn't mean cut/paste into another driver, I meant to make it a driver of its own. My thought was to eventually create a shared/core driver module that is then used by the other drivers. My concern right now is that Stephen has indicated he doesn't want this approach taken with netvsc, and most of the community doesn't IIUC, he only does not like the extra netdev. Is there anything else? Nope that is pretty much it. It doesn't seem like a big deal for virtio, but for netvsc it is significant since they don't have any "backup" bit feature differentiation, so they would likely be stuck with 2 netdevs even in their basic setup. Okay. If that is a strict "no-go" for netvsc, this should be just a flag passed down to the in-driver bond code. This results in a 3 driver model (virtio/netvsc, vf & bypass) with 2 netdevs created when bypass is based on netvsc and 3 netdevs created when the bypass is based on virtio_net. Unless we agree on a common netdev model between netvsc and virtio_net, I am not sure if it is useful to commonize the code into a separate driver. -Sridhar
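Jiri's proposal, in which each paravirtual driver registers ops and a shared core owns the VF join/leave logic, could look roughly like the sketch below, with Sridhar's 2-vs-3 netdev choice carried as a flag. Every `demo_*` name is hypothetical; nothing here corresponds to an existing upstream API, and the flag is only recorded, not acted upon.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops a paravirtual driver (virtio_net, netvsc, ...) would
 * register with a shared bypass core. */
struct demo_bypass_ops {
	int (*vf_join)(void *priv, int vf_ifindex);
	void (*vf_leave)(void *priv, int vf_ifindex);
};

struct demo_bypass {
	const struct demo_bypass_ops *ops;
	void *priv;         /* paravirtual driver's private data */
	bool extra_netdev;  /* 3-netdev (true) vs 2-netdev (false) model;
			     * only recorded here, not acted upon */
	int active_vf;      /* ifindex of the enslaved VF, 0 if none */
};

/* Core-owned logic: decide whether to take the VF, then call the driver. */
static int demo_bypass_vf_added(struct demo_bypass *b, int vf_ifindex)
{
	int err;

	if (b->active_vf)
		return -1;  /* this sketch allows a single VF at a time */
	err = b->ops->vf_join(b->priv, vf_ifindex);
	if (!err)
		b->active_vf = vf_ifindex;
	return err;
}

static void demo_bypass_vf_removed(struct demo_bypass *b, int vf_ifindex)
{
	if (b->active_vf != vf_ifindex)
		return;     /* not the VF we are using */
	b->ops->vf_leave(b->priv, vf_ifindex);
	b->active_vf = 0;
}

/* Example driver side: just counts enslaved VFs. */
static int demo_joined;

static int demo_join(void *priv, int vf_ifindex)
{
	(void)priv;
	(void)vf_ifindex;
	demo_joined++;
	return 0;
}

static void demo_leave(void *priv, int vf_ifindex)
{
	(void)priv;
	(void)vf_ifindex;
	demo_joined--;
}

static const struct demo_bypass_ops demo_ops = {
	.vf_join = demo_join,
	.vf_leave = demo_leave,
};
```

As Alexander argues later in the thread, the hard part is what such a flag would have to change (qdisc, Tx locking, which netdev is visible), which this sketch deliberately leaves out.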
Re: [RFC PATCH V1 00/12] audit: implement container id
On Thu, 2018-03-01 at 14:41 -0500, Richard Guy Briggs wrote: > Implement audit kernel container ID. > > This patchset is a preliminary RFC based on the proposal document (V3) > posted: > https://www.redhat.com/archives/linux-audit/2018-January/msg00014.html > > The first patch implements the proc fs write to set the audit container > ID of a process, emitting an AUDIT_CONTAINER record. > > The second implements an auxiliary syscall record AUDIT_CONTAINER_INFO > if a container ID is present on a task. > > The third adds filtering to the exit, exclude and user lists. > > The 4th, implements reading the container ID from the proc filesystem > for debugging. This isn't planned for upstream inclusion. > > The 5th adds signal and ptrace support. > > The 6th attempts to create a local audit context to be able to bind a > standalone record with the container ID record. > > The 7th, 8th, 9th, 10th patches add container ID records to standalone > records. Some of these may end up being syscall auxiliary records and > won't need this specific support since they'll be supported via > syscalls. > > The 11th is a temporary workaround due to the AUDIT_CONTAINER records > not showing up as do AUDIT_LOGIN records. I suspect this is due to its > range (1000 vs 1300), but the intent is to solve it. > > The 12th adds debug information not intended for upstream for those > brave souls wanting to tinker with it in this early state. > > Feedback please! Which tree can this patch set be applied to? Mimi > Here's a quick and dirty test script: > echo 123455 > /proc/$$/containerid; echo $? > sleep 4& > child=$!; sleep 1 > echo 18446744073709551615 > /proc/$child/containerid; echo $? > echo 123456 > /proc/$child/containerid; echo $? > echo 123457 > /proc/$child/containerid; echo $? > sleep 1 > ausearch -ts recent |grep " contid=18446744073709551615"; echo $? > ausearch -ts recent |grep " contid=123456"; echo $? > ausearch -ts recent |grep " contid=123457"; echo $? 
> echo self:$$ contid:$( cat /proc/$$/containerid) > echo child:$child contid:$( cat /proc/$child/containerid) > > containerid=123458 > key=tmpcontainerid > auditctl -a exit,always -F dir=/tmp -F perm=wa -F containerid=$containerid -F > key=$key || echo failed to add containerid filter rule > bash -c "sleep 1; echo test > /tmp/$key"& > child=$! > echo $containerid > /proc/$child/containerid > sleep 2 > rm -f /tmp/$key > ausearch -ts recent -k $key || echo failed to find CONTAINER_INFO record > auditctl -d exit,always -F dir=/tmp -F perm=wa -F containerid=$containerid -F > key=$key || echo failed to add containerid filter rule > > See: > https://github.com/linux-audit/audit-kernel/issues/32 > https://github.com/linux-audit/audit-userspace/issues/40 > https://github.com/linux-audit/audit-testsuite/issues/64 > > Richard Guy Briggs (12): > audit: add container id > audit: log container info of syscalls > audit: add containerid filtering > audit: read container ID of a process > audit: add containerid support for ptrace and signals > audit: add support for non-syscall auxiliary records > audit: add container aux record to watch/tree/mark > audit: add containerid support for tty_audit > audit: add containerid support for config/feature/user records > audit: add containerid support for seccomp and anom_abend records > debug audit: add container id > debug! audit: add container id > > drivers/tty/tty_audit.c| 5 +- > fs/proc/base.c | 63 +++ > include/linux/audit.h | 36 +++ > include/linux/init_task.h | 4 +- > include/linux/sched.h | 1 + > include/uapi/linux/audit.h | 9 ++- > kernel/audit.c | 74 +++--- > kernel/audit.h | 3 + > kernel/audit_fsnotify.c| 5 +- > kernel/audit_tree.c| 5 +- > kernel/audit_watch.c | 33 +- > kernel/auditfilter.c | 52 ++- > kernel/auditsc.c | 154 > +++-- > 13 files changed, 408 insertions(+), 36 deletions(-) >
Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
On Sun, Mar 4, 2018 at 10:50 AM, Jiri Pirko wrote: > Sun, Mar 04, 2018 at 07:24:12PM CET, alexander.du...@gmail.com wrote: >>On Sat, Mar 3, 2018 at 11:13 PM, Jiri Pirko wrote: >>> Sun, Mar 04, 2018 at 01:26:53AM CET, alexander.du...@gmail.com wrote: On Sat, Mar 3, 2018 at 1:25 PM, Jiri Pirko wrote: > Sat, Mar 03, 2018 at 07:04:57PM CET, alexander.du...@gmail.com wrote: >>On Sat, Mar 3, 2018 at 3:31 AM, Jiri Pirko wrote: >>> Fri, Mar 02, 2018 at 08:42:47PM CET, m...@redhat.com wrote: On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote: > >Yeah, this code essentially calls out the "shareable" code with a > >comment at the start and end of the section what defines the > >virtio_bypass functionality. It would just be a matter of mostly > >cutting and pasting to put it into a separate driver module. > > Please put it there and unite the use of it with netvsc. Surely, adding this to other drivers (e.g. might this be handy for xen too?) can be left for a separate patchset. Let's get one device merged first. >>> >>> Why? Let's do the generic infra alongside with the driver. I see no good >>> reason to rush into merging driver and only later, if ever, to convert >>> it to generic solution. On contrary. That would lead into multiple >>> approaches and different behavious in multiple drivers. That is plain >>> wrong. >> >>If nothing else it doesn't hurt to do this in one driver in a generic >>way, and once it has been proven to address all the needs of that one >>driver we can then start moving other drivers to it. The current >>solution is quite generic, that was my contribution to this patch set >>as I didn't like how invasive it was being to virtio and thought it >>would be best to keep this as minimally invasive as possible. >> >>My preference would be to give this a release or two in virtio to >>mature before we start pushing it onto other drivers. 
It shouldn't >>take much to cut/paste this into a new driver file once we decide it >>is time to start extending it out to other drivers. > > I'm not talking about cut/paste and in fact that is what I'm worried > about. I'm talking about common code in net/core/ or somewhere that > would take care of this in-driver bonding. Each driver, like virtio_net, > netvsc would just register some ops to it and the core would do all > logic. I believe it is essential take this approach from the start. Sorry, I didn't mean cut/paste into another driver, I meant to make it a driver of its own. My thought was to eventually create a shared/core driver module that is then used by the other drivers. My concern right now is that Stephen has indicated he doesn't want this approach taken with netvsc, and most of the community doesn't >>> >>> IIUC, he only does not like the extra netdev. Is there anything else? >> >>Nope that is pretty much it. It doesn't seem like a big deal for >>virtio, but for netvsc it is significant since they don't have any >>"backup" bit feature differentiation, so they would likely be stuck >>with 2 netdevs even in their basic setup. > > Okay. If that is a strict "no-go" for netvsc, this should be > just a flag passed down to the in-driver bond code. Are you serious? We might as well just do a per-driver bond then if that is what you want. Once you go back to the "2 netdev" model for this the bond becomes tightly woven into the driver and becomes a separate beast entirely. At that point sharing kind of goes out the window since you have to be tightly coupled into all of the per-driver ops. I would argue there is no way to do the "2 netdev" model generically. 
It is kind of inherent to the "2 netdev" model in the first place, since you can't have a third driver pop up, so now everything is pulled into the paravirtual interface unless you invert everything and require the netvsc driver to provide the driver with a set of function pointers allowing it to call back into it. In addition, you suddenly have to deal with all the qdisc and Tx queue locking mess. So the 3 netdev model lets the driver be lockless and run with no qdisc. Are you telling us you expect our solution to run in both modes, or are you pushing the qdisc overhead and Tx queue locking into the 3 netdev model? What it ultimately comes down to is how do you create a new netdev without exposing a new netdev? In the 3 netdev model this all makes sense, as we can leave the paravirtual interface intact. Now you are telling us that, based on a flag, we either have to embed ourselves into the paravirtual interface without exposing our operations, or we have to embed the paravirtual interface into our device without letting it be visible. The sheer overhead of that will end up more than doubling the code size for just the core bits of this patch. >> want the netvsc app
linux-next: manual merge of the net-next tree with the net tree
Hi all, Today's linux-next merge of the net-next tree got a conflict in: net/ipv6/netfilter/nft_fib_ipv6.c between commit: 47b7e7f82802 ("netfilter: don't set F_IFACE on ipv6 fib lookups") from the net tree and commit: b75cc8f90f07 ("net/ipv6: Pass skb to route lookup") from the net-next tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc net/ipv6/netfilter/nft_fib_ipv6.c index 62fc84d7bdff,3230b3d7b11b.. --- a/net/ipv6/netfilter/nft_fib_ipv6.c +++ b/net/ipv6/netfilter/nft_fib_ipv6.c @@@ -180,7 -180,9 +180,8 @@@ void nft_fib6_eval(const struct nft_exp } *dest = 0; - rt = (void *)ip6_route_lookup(nft_net(pkt), &fl6, lookup_flags); - again: + rt = (void *)ip6_route_lookup(nft_net(pkt), &fl6, pkt->skb, + lookup_flags); if (rt->dst.error) goto put_rt_err;
Re: [PATCH v2 0/4] GSO_BY_FRAGS correctness improvements
From: Daniel Axtens Date: Thu, 1 Mar 2018 17:13:36 +1100 > As requested [1], I went through and had a look at users of gso_size to > see if there were things that need to be fixed to consider > GSO_BY_FRAGS, and I have tried to improve our helper functions to deal > with this case. ... Series applied, thanks Daniel.
Re: [PATCH net-next v2] cxgb4vf: Forcefully link up virtual interfaces
From: Ganesh Goudar Date: Thu, 1 Mar 2018 15:01:04 +0530 > From: Arjun Vynipadath > > The Virtual Interfaces are connected to an internal switch on the chip > which allows VIs attached to the same port to talk to each other even > when the port link is down. As a result, we generally want to always > report a VI's link as being "up". > > Based on the original work by: Casey Leedom > Signed-off-by: Arjun Vynipadath > Signed-off-by: Ganesh Goudar > --- > V2: Doing force_link_up unconditionally Applied.
Re: [PATCH][net-next] net: phy: Fix spelling mistake: "advertisment"-> "advertisement"
From: Colin King Date: Thu, 1 Mar 2018 10:23:03 + > From: Colin Ian King > > Trivial fix to spelling mistake in comments and error message text. > > Signed-off-by: Colin Ian King Applied.
Re: [PATCH net] ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes
On Sat, 3 Mar 2018 12:22:36 +0100 Stefano Brivio wrote: > > And please codify the above expectation as a test under > > tools/testing/selftests/net > > And this, along with v2. On second thought, I'm starting to think it doesn't make much sense, especially given the current context of the self-tests, to test this explicitly, because it's a rather particular corner case. I think it would make more sense to introduce generic tests first: about, say, PMTU or route exceptions, but not "tunnel causes route exception and administrative change doesn't affect PMTU". -- Stefano
Re: [patch net] mlxsw: spectrum_switchdev: Check success of FDB add operation
From: Jiri Pirko Date: Thu, 1 Mar 2018 11:37:05 +0100 > From: Shalom Toledo > > Until now, we assumed that in case of error when adding FDB entries, the > write operation will fail, but this is not the case. Instead, we need to > check that the number of entries reported in the response is equal to > the number of entries specified in the request. > > Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC") > Reported-by: Ido Schimmel > Signed-off-by: Shalom Toledo > Reviewed-by: Ido Schimmel > Signed-off-by: Jiri Pirko Applied and queued up for -stable.
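The check described above amounts to treating a "successful" register write as a failure when the device reports fewer records than were requested. A minimal user-space sketch of that rule; `struct fdb_write_result` and `fdb_add_check()` are hypothetical, and -EBUSY is an error code chosen for illustration, not necessarily the one the driver returns.

```c
#include <errno.h>

/* Hypothetical model of an FDB register write: the device reports how many
 * records it actually programmed, which may be fewer than requested even
 * though the write itself completed. */
struct fdb_write_result {
	int status;               /* 0 if the register write completed */
	unsigned int num_rec_out; /* records reported back in the response */
};

static int fdb_add_check(const struct fdb_write_result *res,
			 unsigned int num_rec_in)
{
	if (res->status)
		return res->status;   /* the write itself failed */
	if (res->num_rec_out != num_rec_in)
		return -EBUSY;        /* illustrative: not all entries were added */
	return 0;
}
```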
Re: [PATCH v4 2/2] r8169: switch to device-managed functions in probe (part 2)
From: Andy Shevchenko Date: Thu, 1 Mar 2018 13:27:35 +0200 > This is a follow up to the commit > > 4c45d24a759d ("r8169: switch to device-managed functions in probe") > > to move towards managed resources even more. > > Cc: Heiner Kallweit > Signed-off-by: Andy Shevchenko Applied.
Re: [PATCH v4 1/2] r8169: Dereference MMIO address immediately before use
Applied.
Re: [PATCH] net: amd8111e: remove redundant assignment to 'tx_index'
From: Colin King Date: Thu, 1 Mar 2018 16:42:40 + > From: Colin Ian King > > The variable tx_index is being initialized with a value that is never > read and re-assigned a little later, hence the initialization is redundant > and can be removed. > > Cleans up clang warning: > drivers/net/ethernet/amd/amd8111e.c:652:6: warning: Value stored to > 'tx_index' during its initialization is never read > > Signed-off-by: Colin Ian King Applied to net-next.
Re: [PATCH net-next 0/6] enic update
From: Govindarajulu Varadarajan Date: Thu, 1 Mar 2018 11:07:18 -0800 > This series adds support for IPv6 vxlan offload and UDP rss along with a > bug fix in filling the rq ring. Applied, thank you.
Re: [PATCHv2 net-next 0/2] gre: add sequence number for collect md mode.
From: William Tu Date: Thu, 1 Mar 2018 13:49:56 -0800 > Currently GRE sequence number can only be used in native tunnel mode. > The first patch adds sequence number support for gre collect > metadata mode, and the second patch tests it using BPF. > > RFC2890 defines GRE sequence number to be specific to the traffic > flow identified by the key. However, this patch does not implement > per-key seqno. The sequence number is shared in the same tunnel > device. That is, different tunnel keys using the same collect_md > tunnel share a single sequence number. > > A new BPF uapi tunnel flag 'BPF_F_SEQ_NUMBER' is added. > -- > v1->v2: > rename BPF_F_GRE_SEQ to BPF_F_SEQ_NUMBER suggested by Daniel Series applied, thank you.
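The trade-off in the cover letter, one counter per tunnel device instead of the per-key counters RFC 2890 describes, can be modelled in a few lines. This is a hypothetical user-space sketch, not the ip_gre code: `struct gre_dev` and `gre_next_seq()` are illustrative names, and an atomic counter is assumed so that concurrent senders each get a distinct number.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of one collect_md GRE tunnel device: the sequence
 * number lives on the device, not per tunnel key. */
struct gre_dev {
	atomic_uint_fast32_t o_seqno;
};

/* Next sequence number for an outgoing packet. `key` is accepted only to
 * show that it does not select a separate counter. */
static uint32_t gre_next_seq(struct gre_dev *dev, uint32_t key)
{
	(void)key;
	return (uint32_t)atomic_fetch_add(&dev->o_seqno, 1);
}
```

Packets for keys 1 and 2 therefore interleave in one sequence space, so a receiver tracking a single key would see gaps.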
Re: [PATCH net-next] selftests: rtnetlink: remove testns on test fail
From: Prashant Bhole Date: Fri, 2 Mar 2018 11:22:20 +0900 > This patch removes testns after test failure so that next test can > continue with clean ns > > Signed-off-by: Prashant Bhole Applied.
Re: [PATCH net V2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
From: Jason Wang Date: Fri, 2 Mar 2018 17:29:14 +0800 > XDP_REDIRECT support for mergeable buffer was removed since commit > 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable() > case"). This is because we don't reserve enough tailroom for struct > skb_shared_info which breaks XDP assumption. So this patch fixes this > by reserving enough tailroom and using fixed size of rx buffer. > > Signed-off-by: Jason Wang > --- > Changes from V1: > - do not add duplicated tracepoint when redirection fails Applied to net-next, thanks Jason.
Re: [PATCH net] tc-testing: skbmod: fix match value of ethertype
From: Davide Caratti Date: Fri, 2 Mar 2018 14:44:39 +0100 > iproute2 print_skbmod() prints the configured ethertype using format 0x%X: > therefore, test 9aa8 systematically fails, because it configures action #4 > using ethertype 0x0031, and expects 0x0031 when it reads it back. Changing > the expected value to 0x31 lets the test result 'not ok' become 'ok'. > > tested with: > # ./tdc.py -e 9aa8 > Test 9aa8: Get a single skbmod action from a list > All test results: > > 1..1 > ok 1 9aa8 Get a single skbmod action from a list > > Fixes: cf797ac49b94 ("tc-testing: Add test cases for police and skbmod") > Signed-off-by: Davide Caratti Applied, thanks Davide.
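The mismatch comes down to printf formatting: "%X" prints no leading zeros, so 0x0031 is rendered as "0x31". A small sketch of that behaviour; `etype_output_matches()` is a hypothetical plain string comparison standing in for the test's match pattern, not tdc's actual matching code.

```c
#include <stdio.h>
#include <string.h>

/* Render an ethertype with "0x%X", the format the patch description
 * attributes to iproute2's print_skbmod(): no zero padding. */
static void fmt_etype(char *buf, size_t len, unsigned int etype)
{
	snprintf(buf, len, "0x%X", etype);
}

/* Hypothetical stand-in for comparing printed output with the test's
 * expected value. */
static int etype_output_matches(unsigned int etype, const char *expected)
{
	char buf[16];

	fmt_etype(buf, sizeof(buf), etype);
	return strcmp(buf, expected) == 0;
}
```

Matching on "0x31" works; "0x0031" would only match if the printer used a padded format such as "0x%04X".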
Re: [PATCH net-next] ipvlan: forbid vlan devices on top of ipvlan
From: Paolo Abeni Date: Fri, 2 Mar 2018 16:03:32 +0100 > Currently we allow the creation of 8021q devices on top of > ipvlan, but such devices are nonfunctional, as the underlying > ipvlan rx_handler hook can't match the relevant traffic. > > Be explicit and forbid the creation of such nonfunctional devices. > > Signed-off-by: Paolo Abeni Applied.
Re: [PATCH net] ppp: prevent unregistered channels from connecting to PPP units
From: Guillaume Nault Date: Fri, 2 Mar 2018 18:41:16 +0100 > PPP units don't hold any reference on the channels connected to them. > It is the channel's responsibility to ensure that it disconnects from > its unit before being destroyed. > In practice, this is ensured by ppp_unregister_channel() disconnecting > the channel from the unit before dropping a reference on the channel. > > However, it is possible for an unregistered channel to connect to a PPP > unit: register a channel with ppp_register_net_channel(), attach a > /dev/ppp file to it with ioctl(PPPIOCATTCHAN), unregister the channel > with ppp_unregister_channel() and finally connect the /dev/ppp file to > a PPP unit with ioctl(PPPIOCCONNECT). > > Once in this situation, the channel is only held by the /dev/ppp file, > which can be released at any time and free the channel without letting > the parent PPP unit know. Then the ppp structure ends up with dangling > pointers in its ->channels list. > > Prevent this scenario by forbidding unregistered channels from > connecting to PPP units. This maintains the code logic by keeping > ppp_unregister_channel() responsible for disconnecting the channel if > necessary and avoids modifying the reference counting mechanism. > > This issue seems to predate git history (successfully reproduced on > Linux 2.6.26 and earlier PPP commits are unrelated). > > Signed-off-by: Guillaume Nault Applied and queued up for -stable, thank you.
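The fix is a guard at connect time: once a channel has been unregistered, PPPIOCCONNECT must fail, so the unit never stores a pointer that only the /dev/ppp file keeps alive. A hypothetical user-space model of that rule, not the ppp_generic.c code; `struct chan_model` and `chan_connect()` are illustrative, and the error codes are chosen for the sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a PPP channel: `registered` becomes false once
 * ppp_unregister_channel() has run, even if a /dev/ppp file still holds it. */
struct chan_model {
	bool registered;
	void *unit;   /* PPP unit this channel is connected to, if any */
};

/* Refuse to connect an unregistered channel (illustrative -ENOTCONN), so
 * no unit ever keeps a pointer to a channel nobody will disconnect. */
static int chan_connect(struct chan_model *ch, void *unit)
{
	if (!ch->registered)
		return -ENOTCONN;
	if (ch->unit)
		return -EINVAL;   /* illustrative: already connected */
	ch->unit = unit;
	return 0;
}
```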
Re: [PATCH 0/5] pull request for net-next: batman-adv 2018-03-02
From: Simon Wunderlich Date: Fri, 2 Mar 2018 18:57:40 +0100 > here is a little cleanup pull request of batman-adv to go into net-next. > > Please pull or let me know of any problem! Pulled, thanks Simon.
[PATCH net-next v2] net/ncsi: Add generic netlink family
Add a generic netlink family for NCSI. This supports three commands; NCSI_CMD_PKG_INFO which returns information on packages and their associated channels, NCSI_CMD_SET_INTERFACE which allows a specific package or package/channel combination to be set as the preferred choice, and NCSI_CMD_CLEAR_INTERFACE which clears any preferred setting. Signed-off-by: Samuel Mendoza-Jonas --- v2: Add a separate NCSI_CMD_CLEAR_INTERFACE command instead of allowing missing attributes in NCSI_CMD_SET_INTERFACE. include/uapi/linux/ncsi.h | 115 + net/ncsi/Makefile | 2 +- net/ncsi/internal.h | 3 + net/ncsi/ncsi-manage.c| 30 +++- net/ncsi/ncsi-netlink.c | 421 ++ net/ncsi/ncsi-netlink.h | 20 +++ 6 files changed, 586 insertions(+), 5 deletions(-) create mode 100644 include/uapi/linux/ncsi.h create mode 100644 net/ncsi/ncsi-netlink.c create mode 100644 net/ncsi/ncsi-netlink.h diff --git a/include/uapi/linux/ncsi.h b/include/uapi/linux/ncsi.h new file mode 100644 index ..4c292ecbb748 --- /dev/null +++ b/include/uapi/linux/ncsi.h @@ -0,0 +1,115 @@ +/* + * Copyright Samuel Mendoza-Jonas, IBM Corporation 2018. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +#ifndef __UAPI_NCSI_NETLINK_H__ +#define __UAPI_NCSI_NETLINK_H__ + +/** + * enum ncsi_nl_commands - supported NCSI commands + * + * @NCSI_CMD_UNSPEC: unspecified command to catch errors + * @NCSI_CMD_PKG_INFO: list package and channel attributes. Requires + * NCSI_ATTR_IFINDEX. If NCSI_ATTR_PACKAGE_ID is specified returns the + * specific package and its channels - otherwise a dump request returns + * all packages and their associated channels. + * @NCSI_CMD_SET_INTERFACE: set preferred package and channel combination. 
+ * Requires NCSI_ATTR_IFINDEX and the preferred NCSI_ATTR_PACKAGE_ID and + * optionally the preferred NCSI_ATTR_CHANNEL_ID. + * @NCSI_CMD_CLEAR_INTERFACE: clear any preferred package/channel combination. + * Requires NCSI_ATTR_IFINDEX. + * @NCSI_CMD_MAX: highest command number + */ +enum ncsi_nl_commands { + NCSI_CMD_UNSPEC, + NCSI_CMD_PKG_INFO, + NCSI_CMD_SET_INTERFACE, + NCSI_CMD_CLEAR_INTERFACE, + + __NCSI_CMD_AFTER_LAST, + NCSI_CMD_MAX = __NCSI_CMD_AFTER_LAST - 1 +}; + +/** + * enum ncsi_nl_attrs - General NCSI netlink attributes + * + * @NCSI_ATTR_UNSPEC: unspecified attributes to catch errors + * @NCSI_ATTR_IFINDEX: ifindex of network device using NCSI + * @NCSI_ATTR_PACKAGE_LIST: nested array of NCSI_PKG_ATTR attributes + * @NCSI_ATTR_PACKAGE_ID: package ID + * @NCSI_ATTR_CHANNEL_ID: channel ID + * @NCSI_ATTR_MAX: highest attribute number + */ +enum ncsi_nl_attrs { + NCSI_ATTR_UNSPEC, + NCSI_ATTR_IFINDEX, + NCSI_ATTR_PACKAGE_LIST, + NCSI_ATTR_PACKAGE_ID, + NCSI_ATTR_CHANNEL_ID, + + __NCSI_ATTR_AFTER_LAST, + NCSI_ATTR_MAX = __NCSI_ATTR_AFTER_LAST - 1 +}; + +/** + * enum ncsi_nl_pkg_attrs - NCSI netlink package-specific attributes + * + * @NCSI_PKG_ATTR_UNSPEC: unspecified attributes to catch errors + * @NCSI_PKG_ATTR: nested array of package attributes + * @NCSI_PKG_ATTR_ID: package ID + * @NCSI_PKG_ATTR_FORCED: flag signifying a package has been set as preferred + * @NCSI_PKG_ATTR_CHANNEL_LIST: nested array of NCSI_CHANNEL_ATTR attributes + * @NCSI_PKG_ATTR_MAX: highest attribute number + */ +enum ncsi_nl_pkg_attrs { + NCSI_PKG_ATTR_UNSPEC, + NCSI_PKG_ATTR, + NCSI_PKG_ATTR_ID, + NCSI_PKG_ATTR_FORCED, + NCSI_PKG_ATTR_CHANNEL_LIST, + + __NCSI_PKG_ATTR_AFTER_LAST, + NCSI_PKG_ATTR_MAX = __NCSI_PKG_ATTR_AFTER_LAST - 1 +}; + +/** + * enum ncsi_nl_channel_attrs - NCSI netlink channel-specific attributes + * + * @NCSI_CHANNEL_ATTR_UNSPEC: unspecified attributes to catch errors + * @NCSI_CHANNEL_ATTR: nested array of channel attributes + * @NCSI_CHANNEL_ATTR_ID: 
channel ID + * @NCSI_CHANNEL_ATTR_VERSION_MAJOR: channel major version number + * @NCSI_CHANNEL_ATTR_VERSION_MINOR: channel minor version number + * @NCSI_CHANNEL_ATTR_VERSION_STR: channel version string + * @NCSI_CHANNEL_ATTR_LINK_STATE: channel link state flags + * @NCSI_CHANNEL_ATTR_ACTIVE: channels with this flag are in + * NCSI_CHANNEL_ACTIVE state + * @NCSI_CHANNEL_ATTR_FORCED: flag signifying a channel has been set as + * preferred + * @NCSI_CHANNEL_ATTR_VLAN_LIST: nested array of NCSI_CHANNEL_ATTR_VLAN_IDs + * @NCSI_CHANNEL_ATTR_VLAN_ID: VLAN ID being filtered on this channel + * @NCSI_CHANNEL_ATTR_MAX: highest attribute number + */ +enum ncsi_nl_channel_attrs { + NCSI_CHANNEL_ATTR_UNSPEC, + NCSI_CHANNEL_ATTR, + NCSI_CHANNEL_ATTR_ID, + NCSI_CHANNEL_ATTR_VERSION_MAJOR, + NCSI_CHANNEL_ATTR_VERSION_MIN
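The header uses the common netlink idiom of a trailing `__..._AFTER_LAST` sentinel so that `..._MAX` always equals the last real entry, even when new commands are appended. A small sketch reusing the command enum from the patch; `ncsi_cmd_valid()` is a hypothetical helper, not part of the proposed UAPI.

```c
#include <stdbool.h>

/* Command enum as proposed in include/uapi/linux/ncsi.h. */
enum ncsi_nl_commands {
	NCSI_CMD_UNSPEC,
	NCSI_CMD_PKG_INFO,
	NCSI_CMD_SET_INTERFACE,
	NCSI_CMD_CLEAR_INTERFACE,

	__NCSI_CMD_AFTER_LAST,
	NCSI_CMD_MAX = __NCSI_CMD_AFTER_LAST - 1
};

/* Hypothetical range check a userspace tool might do before sending:
 * UNSPEC is reserved to catch errors, anything above MAX is unknown. */
static bool ncsi_cmd_valid(int cmd)
{
	return cmd > NCSI_CMD_UNSPEC && cmd <= NCSI_CMD_MAX;
}
```

Adding a new command before `__NCSI_CMD_AFTER_LAST` moves `NCSI_CMD_MAX` along with it, so range checks like this need no update.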
Re: [PATCH net] ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes
On 3/4/18 4:12 PM, Stefano Brivio wrote: > On Sat, 3 Mar 2018 12:22:36 +0100 > Stefano Brivio wrote: > >>> And please codify the above expectation as a test under >>> tools/testing/selftests/net >> >> And this, along with v2. > > On a second thought: I start thinking it doesn't make much sense, > especially given the current context of self-tests, to explicitly test > this, because it's a rather particular corner case. > > I think it would make more sense to introduce generic tests first. > About, say, PMTU, or route exceptions, but not "tunnel causes route > exception and administrative change doesn't affect PMTU". > I would argue corner cases in particular should be documented. From the commit message it seems like you took the time to create a test setup using network namespaces. Throw those commands into a shell script -- tools/testing/selftests/net/mtu.sh. It can evolve from there.
Re: [PATCH net-next] selftests: forwarding: Add suppport to create veth interfaces
On 3/4/18 1:14 AM, Ido Schimmel wrote: > On Fri, Mar 02, 2018 at 08:45:53AM -0800, David Ahern wrote: >> For tests using veth interfaces, the test infrastructure can create >> the netdevs if they do not exist. Arguably this is a preferred approach >> since the tests require p$N and p$(N+1) to be pairs. >> >> Signed-off-by: David Ahern > > [...] > >> diff --git a/tools/testing/selftests/net/forwarding/lib.sh >> b/tools/testing/selftests/net/forwarding/lib.sh >> index d0af52109360..2ce98c6a8c25 100644 >> --- a/tools/testing/selftests/net/forwarding/lib.sh >> +++ b/tools/testing/selftests/net/forwarding/lib.sh >> @@ -76,6 +76,39 @@ done >> >> ## >> # Network interfaces configuration >> >> +create_netif_veth() >> +{ >> +local i >> + >> +for i in $(eval echo {1..$NUM_NETIFS}); do >> +j=$((i+1)) > > local j=$((i+1)) and drop a line. not sure how it drops a line but added the 'local' for j since it was missing. > >> +ip link show dev ${NETIFS[p$i]} &> /dev/null >> +if [[ $? -ne 0 ]]; then >> +ip link add ${NETIFS[p$i]} type veth peer name >> ${NETIFS[p$j]} > > Need to break this one. FWIW, I have this in my config: going for readability over strict line lengths. Wrapped in v2. > > $ cat ~/.vim/after/ftplugin/sh.vim > ... > highlight OverLength ctermbg=red ctermfg=white > match OverLength /\%81v.\+/ > > Cool patch! Tested on my machine. > >> +if [[ $? -ne 0 ]]; then >> +echo "Failed to create netif" >> +exit 1 >> +fi >> +fi >> +i=$j >> +done >> +} >> + >> +create_netif() >> +{ >> +case "$NETIF_TYPE" in >> +veth) create_netif_veth >> + ;; >> +*) echo "Can not create interfaces of type \'$NETIF_TYPE\'" >> + exit 1 >> + ;; >> +esac >> +} >> + >> +if [[ "$NETIF_CREATE" = "yes" ]]; then >> +create_netif >> +fi >> + >> for i in $(eval echo {1..$NUM_NETIFS}); do >> ip link show dev ${NETIFS[p$i]} &> /dev/null >> if [[ $? -ne 0 ]]; then >> -- >> 2.11.0 >>
[PATCH v2 net-next] selftests: forwarding: Add suppport to create veth interfaces
For tests using veth interfaces, the test infrastructure can create the netdevs if they do not exist. Arguably this is a preferred approach since the tests require p$N and p$(N+1) to be pairs. Signed-off-by: David Ahern --- v2 - local on j declaration and line wrap .../net/forwarding/forwarding.config.sample| 5 tools/testing/selftests/net/forwarding/lib.sh | 35 ++ 2 files changed, 40 insertions(+) diff --git a/tools/testing/selftests/net/forwarding/forwarding.config.sample b/tools/testing/selftests/net/forwarding/forwarding.config.sample index ab235c124f20..df54c9eb5100 100644 --- a/tools/testing/selftests/net/forwarding/forwarding.config.sample +++ b/tools/testing/selftests/net/forwarding/forwarding.config.sample @@ -14,6 +14,11 @@ NETIFS[p6]=veth5 NETIFS[p7]=veth6 NETIFS[p8]=veth7 +NETIF_TYPE=veth + +# only virtual interfaces (veth) can be created by test infra +#NETIF_CREATE=yes + ## # Defines diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh index d0af52109360..273511ef2b43 100644 --- a/tools/testing/selftests/net/forwarding/lib.sh +++ b/tools/testing/selftests/net/forwarding/lib.sh @@ -76,6 +76,41 @@ done ## # Network interfaces configuration +create_netif_veth() +{ + local i + + for i in $(eval echo {1..$NUM_NETIFS}); do + local j=$((i+1)) + + ip link show dev ${NETIFS[p$i]} &> /dev/null + if [[ $? -ne 0 ]]; then + ip link add ${NETIFS[p$i]} type veth \ + peer name ${NETIFS[p$j]} + if [[ $? -ne 0 ]]; then + echo "Failed to create netif" + exit 1 + fi + fi + i=$j + done +} + +create_netif() +{ + case "$NETIF_TYPE" in + veth) create_netif_veth + ;; + *) echo "Can not create interfaces of type \'$NETIF_TYPE\'" + exit 1 + ;; + esac +} + +if [[ "$NETIF_CREATE" = "yes" ]]; then + create_netif +fi + for i in $(eval echo {1..$NUM_NETIFS}); do ip link show dev ${NETIFS[p$i]} &> /dev/null if [[ $? -ne 0 ]]; then -- 2.11.0
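The loop relies on the pairing convention p$N <-> p$(N+1). In bash, assigning `i=$j` inside `for i in ...` does not change the next value the loop takes from its word list, so every index is still visited, and it is the `ip link show` check that skips the peer created on the previous pass. A hypothetical C sketch that makes the intended stride of two explicit; `veth_pairs()` is an illustrative name, and ignoring an odd trailing interface is a choice made for the sketch.

```c
#include <stddef.h>

/* Hypothetical sketch: list the veth pairs implied by the convention that
 * p$N and p$(N+1) are peers (p1-p2, p3-p4, ...). Writes at most max_pairs
 * (first, second) index pairs and returns how many were written; an odd
 * trailing interface is ignored here. */
static size_t veth_pairs(int num_netifs, int pairs[][2], size_t max_pairs)
{
	size_t n = 0;

	for (int i = 1; i + 1 <= num_netifs && n < max_pairs; i += 2) {
		pairs[n][0] = i;
		pairs[n][1] = i + 1;
		n++;
	}
	return n;
}
```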
linux-next: manual merge of the selinux tree with the net-next tree
Hi Paul, Today's linux-next merge of the selinux tree got a conflict in: net/sctp/socket.c between several refactoring commits from the net-next tree and commit: 2277c7cd75e3 ("sctp: Add LSM hooks") from the selinux tree. I fixed it up (I think - see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc net/sctp/socket.c index 7fa76031bb08,73b34a6b5b09.. --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@@ -1606,193 -1622,362 +1622,209 @@@ static int sctp_error(struct sock *sk, static int sctp_msghdr_parse(const struct msghdr *msg, struct sctp_cmsgs *cmsgs); -static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len) +static int sctp_sendmsg_parse(struct sock *sk, struct sctp_cmsgs *cmsgs, +struct sctp_sndrcvinfo *srinfo, +const struct msghdr *msg, size_t msg_len) { - struct net *net = sock_net(sk); - struct sctp_sock *sp; - struct sctp_endpoint *ep; - struct sctp_association *new_asoc = NULL, *asoc = NULL; - struct sctp_transport *transport, *chunk_tp; - struct sctp_chunk *chunk; - union sctp_addr to; - struct sctp_af *af; - struct sockaddr *msg_name = NULL; - struct sctp_sndrcvinfo default_sinfo; - struct sctp_sndrcvinfo *sinfo; - struct sctp_initmsg *sinit; - sctp_assoc_t associd = 0; - struct sctp_cmsgs cmsgs = { NULL }; - enum sctp_scope scope; - bool fill_sinfo_ttl = false, wait_connect = false; - struct sctp_datamsg *datamsg; - int msg_flags = msg->msg_flags; - __u16 sinfo_flags = 0; - long timeo; + __u16 sflags; int err; - err = 0; - sp = sctp_sk(sk); - ep = sp->ep; - - pr_debug("%s: sk:%p, msg:%p, msg_len:%zu ep:%p\n", __func__, sk, - msg, msg_len, ep); + if (sctp_sstate(sk, LISTENING) && sctp_style(sk, TCP)) + 
return -EPIPE; - /* We cannot send a message over a TCP-style listening socket. */ - if (sctp_style(sk, TCP) && sctp_sstate(sk, LISTENING)) { - err = -EPIPE; - goto out_nounlock; - } + if (msg_len > sk->sk_sndbuf) + return -EMSGSIZE; - /* Parse out the SCTP CMSGs. */ - err = sctp_msghdr_parse(msg, &cmsgs); + memset(cmsgs, 0, sizeof(*cmsgs)); + err = sctp_msghdr_parse(msg, cmsgs); if (err) { pr_debug("%s: msghdr parse err:%x\n", __func__, err); - goto out_nounlock; + return err; } - /* Fetch the destination address for this packet. This - * address only selects the association--it is not necessarily - * the address we will send to. - * For a peeled-off socket, msg_name is ignored. - */ - if (!sctp_style(sk, UDP_HIGH_BANDWIDTH) && msg->msg_name) { - int msg_namelen = msg->msg_namelen; - - err = sctp_verify_addr(sk, (union sctp_addr *)msg->msg_name, - msg_namelen); - if (err) - return err; - - if (msg_namelen > sizeof(to)) - msg_namelen = sizeof(to); - memcpy(&to, msg->msg_name, msg_namelen); - msg_name = msg->msg_name; + memset(srinfo, 0, sizeof(*srinfo)); + if (cmsgs->srinfo) { + srinfo->sinfo_stream = cmsgs->srinfo->sinfo_stream; + srinfo->sinfo_flags = cmsgs->srinfo->sinfo_flags; + srinfo->sinfo_ppid = cmsgs->srinfo->sinfo_ppid; + srinfo->sinfo_context = cmsgs->srinfo->sinfo_context; + srinfo->sinfo_assoc_id = cmsgs->srinfo->sinfo_assoc_id; + srinfo->sinfo_timetolive = cmsgs->srinfo->sinfo_timetolive; } - sinit = cmsgs.init; - if (cmsgs.sinfo != NULL) { - memset(&default_sinfo, 0, sizeof(default_sinfo)); - default_sinfo.sinfo_stream = cmsgs.sinfo->snd_sid; - default_sinfo.sinfo_flags = cmsgs.sinfo->snd_flags; - default_sinfo.sinfo_ppid = cmsgs.sinfo->snd_ppid; - default_sinfo.sinfo_context = cmsgs.sinfo->snd_context; - default_sinfo.sinfo_assoc_id = cmsgs.sinfo->snd_assoc_id; - - sinfo = &default_sinfo; - fill_sinfo_ttl = true; - } else { - sinfo = cmsgs.srinfo; - } - /* Did the user specify SNDINFO/SNDRCVINFO? 
*/ - if (sinfo) { - sinfo_flags = sinfo->sinfo_flags; - associd = sin
Re: [PATCH v2] net: ethernet: Drop unnecessary continue
On Sat, 3 Mar 2018 21:44:56 +0530, Arushi Singhal wrote: > diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c > b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c > index 15fa47f..5cd4f3f 100644 > --- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c > +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c > @@ -258,10 +258,8 @@ nfp_net_pf_alloc_vnics(struct nfp_pf *pf, void __iomem > *ctrl_bar, > ctrl_bar += NFP_PF_CSR_SLICE_SIZE; > > /* Kill the vNIC if app init marked it as invalid */ > - if (nn->port && nn->port->type == NFP_PORT_INVALID) { > + if (nn->port && nn->port->type == NFP_PORT_INVALID) > nfp_net_pf_free_vnic(pf, nn); > - continue; > - } This is an error handling path so the continue makes sense here to indicate the processing can't ever fall through if more statements are ever added to the loop. But OK. > } > > if (list_empty(&pf->vnics))
Re: [PATCH net V2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
On 2018-03-03 00:07, Jesper Dangaard Brouer wrote: On Fri, 2 Mar 2018 17:29:14 +0800 Jason Wang wrote: XDP_REDIRECT support for mergeable buffer was removed since commit 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable() case"). This is because we don't reserve enough tailroom for struct skb_shared_info which breaks XDP assumption. So this patch fixes this by reserving enough tailroom and using fixed size of rx buffer. Signed-off-by: Jason Wang --- Changes from V1: - do not add duplicated tracepoint when redirection fails Acked-by: Jesper Dangaard Brouer I gave it a quick spin on my testlab, and cpumap seems to work/not-crash now (if I managed to turn back config to receive_mergeable() correctly ;-)). Thanks for the testing and reviewing.
Re: [PATCH net V2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
On 2018-03-03 01:36, Michael S. Tsirkin wrote: On Fri, Mar 02, 2018 at 05:29:14PM +0800, Jason Wang wrote: XDP_REDIRECT support for mergeable buffer was removed since commit 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable() case"). This is because we don't reserve enough tailroom for struct skb_shared_info which breaks XDP assumption. So this patch fixes this by reserving enough tailroom and using fixed size of rx buffer. Signed-off-by: Jason Wang Acked-by: Michael S. Tsirkin I think the next incremental step is to look at splitting out fast path XDP processing to a separate set of functions. Let me try (probably after the 1.1 stuff). Thanks
Re: [PATCH net V2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
On 2018-03-05 07:38, David Miller wrote: From: Jason Wang Date: Fri, 2 Mar 2018 17:29:14 +0800 XDP_REDIRECT support for mergeable buffer was removed since commit 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable() case"). This is because we don't reserve enough tailroom for struct skb_shared_info which breaks XDP assumption. So this patch fixes this by reserving enough tailroom and using fixed size of rx buffer. Signed-off-by: Jason Wang --- Changes from V1: - do not add duplicated tracepoint when redirection fails Applied to net-next, thanks Jason. Hi David, Considering the change is not large, is there any chance to get it into -net as well, to keep XDP redirection working? Thanks
Re: [PATCH net V2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
From: Jason Wang Date: Mon, 5 Mar 2018 10:43:41 +0800 > > > On 2018-03-05 07:38, David Miller wrote: >> From: Jason Wang >> Date: Fri, 2 Mar 2018 17:29:14 +0800 >> >>> XDP_REDIRECT support for mergeable buffer was removed since commit >>> 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable() >>> case"). This is because we don't reserve enough tailroom for struct >>> skb_shared_info which breaks XDP assumption. So this patch fixes this >>> by reserving enough tailroom and using fixed size of rx buffer. >>> >>> Signed-off-by: Jason Wang >>> --- >>> Changes from V1: >>> - do not add duplicated tracepoint when redirection fails >> Applied to net-next, thanks Jason. > > Hi David, > > Considering the change is not large, is there any chance to get it into > -net as well, to keep XDP redirection working? Ok, I'll apply this to 'net' too.
Re: [PATCH PATCH net v2 0/9] hv_netvsc: minor fixes
From: Stephen Hemminger Date: Fri, 2 Mar 2018 13:49:00 -0800 > These are improvements to the netvsc driver. They aren't functionality > changes so not targeting net-next; and they are not show stopper > bugs that need to go to stable either. > > v2 >- drop the irq flags patch, defer it to net-next >- split the multicast filter flag patch out >- change propagate rx mode patch to handle startup of vf Series applied, thanks Stephen.
[GIT] Networking
1) Use an appropriate TSQ pacing shift in mac80211, from Toke Høiland-Jørgensen. 2) Just like ipv4's ip_route_me_harder(), we have to use skb_to_full_sk in ip6_route_me_harder, from Eric Dumazet. 3) Fix several shutdown races and similar other problems in l2tp, from James Chapman. 4) Handle missing XDP flush properly in tuntap, for real this time. From Jason Wang. 5) Out-of-bounds access in powerpc ebpf tailcalls, from Daniel Borkmann. 6) Fix phy_resume() locking, from Andrew Lunn. 7) IFLA_MTU values are ignored on newlink for some tunnel types, fix from Xin Long. 8) Revert F-RTO middle box workarounds, they only handle one dimension of the problem. From Yuchung Cheng. 9) Fix socket refcounting in RDS, from Ka-Cheong Poon. 10) Don't allow ppp unit registration to an unregistered channel, from Guillaume Nault. 11) Various hv_netvsc fixes from Stephen Hemminger. Please pull, thanks a lot! The following changes since commit 9cb9c07d6b0c5fd97d83b8ab14d7e308ba4b612f: Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2018-02-23 15:14:17 -0800) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git for you to fetch changes up to a7f0fb1bfb66ded5d556d6723d691b77a7146b6f: Merge branch 'hv_netvsc-minor-fixes' (2018-03-04 22:18:21 -0500) Andrew Lunn (1): net: phy: Restore phy_resume() locking assumption Arkadi Sharshevsky (2): devlink: Compare to size_new in case of resource child validation devlink: Fix resource coverity errors Arnd Bergmann (1): net: ipv4: avoid unused variable warning for sysctl Bassem Boubaker (1): cdc_ether: flag the Cinterion PLS8 modem by gemalto as WWAN Boris Pismenny (1): tls: Use correct sk->sk_prot for IPV6 Claudiu Manoil (1): gianfar: Fix Rx byte accounting for ndev stats Daniel Axtens (4): net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len net: sched: tbf: handle GSO_BY_FRAGS case in enqueue net: xfrm: use skb_gso_validate_network_len() to check gso sizes net: make 
skb_gso_*_seglen functions private Daniel Borkmann (2): bpf: allow xadd only on aligned memory bpf, ppc64: fix out of bounds access in tail call David S. Miller (14): Merge branch 'l2tp-fix-API-races-discovered-by-syzbot' ARM: orion5x: Revert commit 4904dbda41c8. Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth Merge branch 'tunnel-mtu-fixes' Merge branch 's390-qeth-fixes' Merge branch 'tcp-revert-a-F-RTO-extension-due-to-broken-middle-boxes' Merge branch 'net-smc-fixes' Merge branch 'mlxsw-fixes' Merge git://git.kernel.org/.../bpf/bpf Merge tag 'mac80211-for-davem-2018-03-02' of git://git.kernel.org/.../jberg/mac80211 Merge git://git.kernel.org/.../pablo/nf Merge tag 'batadv-net-for-davem-20180302' of git://git.open-mesh.org/linux-merge Merge branch 'GSO_BY_FRAGS-correctness-improvements' Merge branch 'hv_netvsc-minor-fixes' Davide Caratti (2): net/smc: fix NULL pointer dereference on sock_create_kern() error path tc-testing: skbmod: fix match value of ethertype Denis Du (1): hdlc_ppp: carrier detect ok, don't turn off negotiation Edward Cree (1): net: ethtool: don't ignore return from driver get_fecparam method Emil Tantilov (1): ixgbe: fix crash in build_skb Rx code path Eric Dumazet (4): netfilter: use skb_to_full_sk in ip6_route_me_harder test_bpf: add a schedule point r8152: fix tx packets accounting test_bpf: reduce MAX_TESTRUNS Felix Fietkau (2): mac80211: drop frames with unexpected DS bits from fast-rx to slow path netfilter: nf_flow_table: fix checksum when handling DNAT Florian Westphal (7): netfilter: ipt_CLUSTERIP: put config struct if we can't increment ct refcount netfilter: ipt_CLUSTERIP: put config instead of freeing it netfilter: ipv6: fix use-after-free Write in nf_nat_ipv6_manip_pkt netfilter: bridge: ebt_among: add missing match size checks netfilter: ebtables: convert BUG_ONs to WARN_ONs netfilter: ebtables: CONFIG_COMPAT: don't trust userland offsets netfilter: don't set F_IFACE on ipv6 fib lookups Guillaume Nault 
(1): ppp: prevent unregistered channels from connecting to PPP units Hans de Goede (1): Bluetooth: btusb: Use DMI matching for QCA reset_resume quirking Ido Schimmel (3): bridge: Fix VLAN reference count problem mlxsw: spectrum: Treat IPv6 unregistered multicast as broadcast spectrum: Reference count VLAN entries James Chapman (5): l2tp: don't use inet_shutdown on tunnel destroy l2tp: don't use inet_shutdown on ppp session destroy l2tp: fix races with tunnel socket close l2tp: fix race in pppol2tp_release with session object destroy l2tp: f
Re: [RFC PATCH V1 00/12] audit: implement container id
On 2018-03-04 16:55, Mimi Zohar wrote: > On Thu, 2018-03-01 at 14:41 -0500, Richard Guy Briggs wrote: > > Implement audit kernel container ID. > > > > This patchset is a preliminary RFC based on the proposal document (V3) > > posted: > > https://www.redhat.com/archives/linux-audit/2018-January/msg00014.html > > > > The first patch implements the proc fs write to set the audit container > > ID of a process, emitting an AUDIT_CONTAINER record. > > > > The second implements an auxiliary syscall record AUDIT_CONTAINER_INFO > > if a container ID is present on a task. > > > > The third adds filtering to the exit, exclude and user lists. > > > > The 4th, implements reading the container ID from the proc filesystem > > for debugging. This isn't planned for upstream inclusion. > > > > The 5th adds signal and ptrace support. > > > > The 6th attempts to create a local audit context to be able to bind a > > standalone record with the container ID record. > > > > The 7th, 8th, 9th, 10th patches add container ID records to standalone > > records. Some of these may end up being syscall auxiliary records and > > won't need this specific support since they'll be supported via > > syscalls. > > > > The 11th is a temporary workaround due to the AUDIT_CONTAINER records > > not showing up as do AUDIT_LOGIN records. I suspect this is due to its > > range (1000 vs 1300), but the intent is to solve it. > > > > The 12th adds debug information not intended for upstream for those > > brave souls wanting to tinker with it in this early state. > > > > Feedback please! > > Which tree can this patch set be applied to? git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit.git next > Mimi > > > Here's a quick and dirty test script: > > echo 123455 > /proc/$$/containerid; echo $? > > sleep 4& > > child=$!; sleep 1 > > echo 18446744073709551615 > /proc/$child/containerid; echo $? > > echo 123456 > /proc/$child/containerid; echo $? > > echo 123457 > /proc/$child/containerid; echo $? 
> > sleep 1 > > ausearch -ts recent |grep " contid=18446744073709551615"; echo $? > > ausearch -ts recent |grep " contid=123456"; echo $? > > ausearch -ts recent |grep " contid=123457"; echo $? > > echo self:$$ contid:$( cat /proc/$$/containerid) > > echo child:$child contid:$( cat /proc/$child/containerid) > > > > containerid=123458 > > key=tmpcontainerid > > auditctl -a exit,always -F dir=/tmp -F perm=wa -F containerid=$containerid > > -F key=$key || echo failed to add containerid filter rule > > bash -c "sleep 1; echo test > /tmp/$key"& > > child=$! > > echo $containerid > /proc/$child/containerid > > sleep 2 > > rm -f /tmp/$key > > ausearch -ts recent -k $key || echo failed to find CONTAINER_INFO record > > auditctl -d exit,always -F dir=/tmp -F perm=wa -F containerid=$containerid > > -F key=$key || echo failed to delete containerid filter rule > > > > See: > > https://github.com/linux-audit/audit-kernel/issues/32 > > https://github.com/linux-audit/audit-userspace/issues/40 > > https://github.com/linux-audit/audit-testsuite/issues/64 > > > > Richard Guy Briggs (12): > > audit: add container id > > audit: log container info of syscalls > > audit: add containerid filtering > > audit: read container ID of a process > > audit: add containerid support for ptrace and signals > > audit: add support for non-syscall auxiliary records > > audit: add container aux record to watch/tree/mark > > audit: add containerid support for tty_audit > > audit: add containerid support for config/feature/user records > > audit: add containerid support for seccomp and anom_abend records > > debug audit: add container id > > debug!
audit: add container id > > > > drivers/tty/tty_audit.c| 5 +- > > fs/proc/base.c | 63 +++ > > include/linux/audit.h | 36 +++ > > include/linux/init_task.h | 4 +- > > include/linux/sched.h | 1 + > > include/uapi/linux/audit.h | 9 ++- > > kernel/audit.c | 74 +++--- > > kernel/audit.h | 3 + > > kernel/audit_fsnotify.c| 5 +- > > kernel/audit_tree.c| 5 +- > > kernel/audit_watch.c | 33 +- > > kernel/auditfilter.c | 52 ++- > > kernel/auditsc.c | 154 > > +++-- > > 13 files changed, 408 insertions(+), 36 deletions(-) > > > - RGB -- Richard Guy Briggs Sr. S/W Engineer, Kernel Security, Base Operating Systems Remote, Ottawa, Red Hat Canada IRC: rgb, SunRaycer Voice: +1.647.777.2635, Internal: (81) 32635
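The test script above writes 18446744073709551615 to /proc/PID/containerid; that value is 2^64 - 1, i.e. UINT64_MAX or (u64)-1. The userspace sketch below shows one way a write handler could validate such input. It is an illustration only: the function name parse_containerid(), the CID_UNSET constant and the rule that (u64)-1 is rejected as a reserved "unset" value are assumptions, not code from the patchset.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sentinel: assume (u64)-1 is reserved to mean "no ID set". */
#define CID_UNSET UINT64_MAX

/*
 * Parse a decimal container ID as written by "echo N > .../containerid".
 * Returns 0 and stores the ID on success, or -EINVAL on malformed input,
 * overflow, or the (assumed) reserved sentinel value.
 */
static int parse_containerid(const char *buf, uint64_t *out)
{
	char *end;
	unsigned long long v;

	assert(buf != NULL && out != NULL);

	/* strtoull() would skip whitespace and negate "-1"; require a digit. */
	if (buf[0] < '0' || buf[0] > '9')
		return -EINVAL;

	errno = 0;
	v = strtoull(buf, &end, 10);
	if (errno == ERANGE)		/* larger than 2^64 - 1 */
		return -EINVAL;
	if (*end == '\n')		/* echo appends a trailing newline */
		end++;
	if (*end != '\0')		/* trailing garbage such as "12x" */
		return -EINVAL;
	if ((uint64_t)v == CID_UNSET)	/* hypothetical reserved value */
		return -EINVAL;

	*out = (uint64_t)v;
	return 0;
}
```

Under these assumptions "123456\n" parses to 123456, while "18446744073709551615\n", "18446744073709551616" and "12x" are all rejected with -EINVAL.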
[PATCH v3] staging: ipx: Replace printk() with appropriate net_*macro_ratelimited()
Replace a printk() call that has a log level with the appropriate net_*macro_ratelimited() helper. It's better to use the actual device name as a prefix in error messages. Signed-off-by: Arushi Singhal --- changes in v2 *In v1, printk was changed to pr_*macro(), which kernel code uses instead of calling printk() directly. For drivers, however, dev_*macro() or net_*macro_ratelimited() should be used instead of calling printk() directly. changes in v3 *Indentation is not changed, as re-aligning the line would exceed the 80-character limit. drivers/staging/ipx/af_ipx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/ipx/af_ipx.c b/drivers/staging/ipx/af_ipx.c index d21a9d1..5ec6591 100644 --- a/drivers/staging/ipx/af_ipx.c +++ b/drivers/staging/ipx/af_ipx.c @@ -744,7 +744,7 @@ static void ipxitf_discover_netnum(struct ipx_interface *intrfc, intrfc->if_netnum = cb->ipx_source_net; ipxitf_add_local_route(intrfc); } else { - printk(KERN_WARNING "IPX: Network number collision " + net_warn_ratelimited("IPX: Network number collision " "%lx\n%s %s and %s %s\n", (unsigned long) ntohl(cb->ipx_source_net), ipx_device_name(i), -- 2.7.4
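net_warn_ratelimited() only emits its message while the shared network rate limit allows it, so a storm of network number collisions cannot flood the log. In the kernel this check goes through net_ratelimit() and __ratelimit() (implemented in lib/ratelimit.c) and is driven by jiffies. The sketch below is a simplified userspace model with an abstract clock; struct ratelimit and ratelimit_ok() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model: allow at most `burst` events per `interval` time units. */
struct ratelimit {
	uint64_t interval;	/* window length, in abstract time units */
	unsigned int burst;	/* events allowed per window */
	uint64_t begin;		/* start time of the current window */
	bool started;		/* false until the first call */
	unsigned int printed;	/* events allowed in the current window */
	unsigned int missed;	/* events suppressed in the current window */
};

/* Return true if an event at time `now` may be emitted. */
static bool ratelimit_ok(struct ratelimit *rs, uint64_t now)
{
	assert(rs->interval > 0);

	if (!rs->started) {
		rs->started = true;
		rs->begin = now;
	}

	/* Window expired: start a new one and reset the counters. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

With interval = 5 and burst = 10, twenty calls at time 0 yield ten permitted messages and ten suppressed ones, and the next call at time 5 opens a new window. The kernel version also reports how many messages were suppressed when a window closes, which this model omits.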
[PATCH] net/mlx4_en: fix potential use-after-free with dma_unmap_page
Take an additional reference to a page whenever it is placed into the rx ring and put the page again after running dma_unmap_page. When swiotlb is in use, calling dma_unmap_page means that the original page mapped with dma_map_page must still be valid, as swiotlb will copy data from its bounce buffer back to the originally requested DMA location. When GRO is enabled, before this patch all references to the original frag may be put and the page freed before dma_unmap_page in mlx4_en_free_frag is called. It is possible there is a path where the use-after-free occurs even with GRO disabled, but this has not been observed so far. The bug can be trivially detected by doing the following: * Compile the kernel with DEBUG_PAGEALLOC * Run the kernel as a Xen Dom0 * Leave GRO enabled on the interface * Run an iperf test of 10 seconds or more over the interface. This bug was likely introduced in commit 4cce66cdd14a ("mlx4_en: map entire pages to increase throughput"), first released in v3.6. It was incidentally fixed in commit 34db548bfb95 ("mlx4: add page recycling in receive path"), first released in v4.12. This version applies to the v4.9 series. Signed-off-by: Sarah Newman Tested-by: Sarah Newman --- drivers/net/ethernet/mellanox/mlx4/en_rx.c | 39 +--- drivers/net/ethernet/mellanox/mlx4/en_tx.c | 3 ++- drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 1 + 3 files changed, 32 insertions(+), 11 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c index bcbb80f..d1fb087 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c @@ -80,10 +80,14 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv, page_alloc->page = page; page_alloc->dma = dma; page_alloc->page_offset = 0; + page_alloc->page_owner = true; /* Not doing get_page() for each frag is a big win * on asymetric workloads. Note we can not use atomic_set().
*/ - page_ref_add(page, page_alloc->page_size / frag_info->frag_stride - 1); + /* Since the page must be valid until after dma_unmap_page is called, +* take an additional reference we would not have otherwise. +*/ + page_ref_add(page, page_alloc->page_size / frag_info->frag_stride); return 0; } @@ -105,9 +109,13 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv, page_alloc[i].page_offset += frag_info->frag_stride; if (page_alloc[i].page_offset + frag_info->frag_stride <= - ring_alloc[i].page_size) - continue; - + ring_alloc[i].page_size) { + WARN_ON(!page_alloc[i].page); + WARN_ON(!page_alloc[i].page_owner); + if (likely(page_alloc[i].page && + page_alloc[i].page_owner)) + continue; + } if (unlikely(mlx4_alloc_pages(priv, &page_alloc[i], frag_info, gfp))) goto out; @@ -131,7 +139,7 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv, page = page_alloc[i].page; /* Revert changes done by mlx4_alloc_pages */ page_ref_sub(page, page_alloc[i].page_size / - priv->frag_info[i].frag_stride - 1); + priv->frag_info[i].frag_stride); put_page(page); } } @@ -146,11 +154,13 @@ static void mlx4_en_free_frag(struct mlx4_en_priv *priv, u32 next_frag_end = frags[i].page_offset + 2 * frag_info->frag_stride; - if (next_frag_end > frags[i].page_size) + if (next_frag_end > frags[i].page_size) { dma_unmap_page(priv->ddev, frags[i].dma, frags[i].page_size, frag_info->dma_dir); + put_page(frags[i].page); + } - if (frags[i].page) + if (frags[i].page_owner) put_page(frags[i].page); } @@ -184,9 +194,10 @@ static int mlx4_en_init_allocator(struct mlx4_en_priv *priv, page = page_alloc->page; /* Revert changes done by mlx4_alloc_pages */ page_ref_sub(page, page_alloc->page_size / - priv->frag_info[i].frag_stride - 1); + priv->frag_info[i].frag_stride); put_page(page); page_alloc->page = NULL; + page_alloc->page_owner = false; } return -ENOMEM; } @@ -206,12 +217,14 @@ static void mlx4_en_destroy_allocator(struct mlx4_en_priv *priv, dma_unmap_page(priv->ddev, page_alloc->dma, 
page_alloc->page_size, frag_info->dma_dir)
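The reference-count change in this patch can be checked with a toy model. Under the old scheme a page split into N frags holds N references (one from allocation plus page_ref_add(N - 1)); under the patched scheme it holds N + 1, and the extra one is dropped only after dma_unmap_page(). The sketch below replays the ordering the commit message describes for GRO, where every frag reference is put before the driver unmaps. struct toy_page, toy_put() and simulate() are hypothetical stand-ins, not driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page: only a reference count and a "freed" flag. */
struct toy_page {
	int refcount;
	bool freed;
};

/* Stand-in for put_page(): free the page when the last reference goes. */
static void toy_put(struct toy_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;
}

/*
 * Simulate one page split into nfrags frags. extra_ref selects the patched
 * scheme (nfrags + 1 references) instead of the old one (nfrags references).
 * Returns true if the page was still alive when the (swiotlb) unmap ran.
 */
static bool simulate(int nfrags, bool extra_ref)
{
	struct toy_page p = { .refcount = 1, .freed = false }; /* from allocation */
	bool alive_at_unmap;

	assert(nfrags > 0);
	p.refcount += extra_ref ? nfrags : nfrags - 1;	/* page_ref_add() */

	/* GRO path: the stack puts every frag before the driver unmaps. */
	for (int i = 0; i < nfrags; i++)
		toy_put(&p);

	/* dma_unmap_page(): with swiotlb this writes into the page. */
	alive_at_unmap = !p.freed;

	/* Patched scheme: drop the extra reference right after unmapping. */
	if (extra_ref)
		toy_put(&p);

	assert(p.freed);	/* in both schemes no reference is leaked */
	return alive_at_unmap;
}
```

Here simulate(4, false) returns false (the page is already freed when the unmap runs) and simulate(4, true) returns true. The model deliberately ignores the page_owner flag and the allocator-side reverts that the patch also adjusts.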
Re: [PATCH v3] staging: ipx: Replace printk() with appropriate net_*macro_ratelimited()
On Mon, Mar 05, 2018 at 09:47:40AM +0530, Arushi Singhal wrote: > Replace printk having a log level with the appropriate > net_*macro_ratelimited. > It's better to use actual device name as a prefix in error messages. > > Signed-off-by: Arushi Singhal > --- > changes in v2 > *In v1 printk was changed to pr_*macro(), which is used > in kernel instead of calling printk() directly. And for drivers, > dev_*macro() or net_*macro_ratelimited() should be used for calling > printk() directly. > > changes in v3 > *Indentation is not changed, as line is exceeding 80 characters limit. > > drivers/staging/ipx/af_ipx.c | 2 +- Did you read drivers/staging/ipx/TODO? Please go do so. sorry, greg k-h
Re: linux-next: manual merge of the selinux tree with the net-next tree
On Mon, Mar 5, 2018 at 9:40 AM, Stephen Rothwell wrote: > Hi Paul, > > Today's linux-next merge of the selinux tree got a conflict in: > > net/sctp/socket.c > > between several refactoring commits from the net-next tree and commit: > > 2277c7cd75e3 ("sctp: Add LSM hooks") > > from the selinux tree. > > I fixed it up (I think - see below) and can carry the fix as The fixup is great! It is the same as the one I posted for net-next.git in: https://patchwork.ozlabs.org/patch/879898/ > necessary. This is now fixed as far as linux-next is concerned, but any > non trivial conflicts should be mentioned to your upstream maintainer > when your tree is submitted for merging. You may also want to consider > cooperating with the maintainer of the conflicting tree to minimise any > particularly complex conflicts. The "[net-next,0/9] sctp: clean up sctp_sendmsg" patchset was only just applied to net-next, so I guess it was not yet there when the selinux tree was submitted. > > -- > Cheers, > Stephen Rothwell > > diff --cc net/sctp/socket.c > index 7fa76031bb08,73b34a6b5b09..
> --- a/net/sctp/socket.c > +++ b/net/sctp/socket.c > @@@ -1606,193 -1622,362 +1622,209 @@@ static int sctp_error(struct sock *sk, > static int sctp_msghdr_parse(const struct msghdr *msg, > struct sctp_cmsgs *cmsgs); > > -static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len) > +static int sctp_sendmsg_parse(struct sock *sk, struct sctp_cmsgs *cmsgs, > +struct sctp_sndrcvinfo *srinfo, > +const struct msghdr *msg, size_t msg_len) > { > - struct net *net = sock_net(sk); > - struct sctp_sock *sp; > - struct sctp_endpoint *ep; > - struct sctp_association *new_asoc = NULL, *asoc = NULL; > - struct sctp_transport *transport, *chunk_tp; > - struct sctp_chunk *chunk; > - union sctp_addr to; > - struct sctp_af *af; > - struct sockaddr *msg_name = NULL; > - struct sctp_sndrcvinfo default_sinfo; > - struct sctp_sndrcvinfo *sinfo; > - struct sctp_initmsg *sinit; > - sctp_assoc_t associd = 0; > - struct sctp_cmsgs cmsgs = { NULL }; > - enum sctp_scope scope; > - bool fill_sinfo_ttl = false, wait_connect = false; > - struct sctp_datamsg *datamsg; > - int msg_flags = msg->msg_flags; > - __u16 sinfo_flags = 0; > - long timeo; > + __u16 sflags; > int err; > > - err = 0; > - sp = sctp_sk(sk); > - ep = sp->ep; > - > - pr_debug("%s: sk:%p, msg:%p, msg_len:%zu ep:%p\n", __func__, sk, > - msg, msg_len, ep); > + if (sctp_sstate(sk, LISTENING) && sctp_style(sk, TCP)) > + return -EPIPE; > > - /* We cannot send a message over a TCP-style listening socket. */ > - if (sctp_style(sk, TCP) && sctp_sstate(sk, LISTENING)) { > - err = -EPIPE; > - goto out_nounlock; > - } > + if (msg_len > sk->sk_sndbuf) > + return -EMSGSIZE; > > - /* Parse out the SCTP CMSGs. */ > - err = sctp_msghdr_parse(msg, &cmsgs); > + memset(cmsgs, 0, sizeof(*cmsgs)); > + err = sctp_msghdr_parse(msg, cmsgs); > if (err) { > pr_debug("%s: msghdr parse err:%x\n", __func__, err); > - goto out_nounlock; > + return err; > } > > - /* Fetch the destination address for this packet. 
This > - * address only selects the association--it is not necessarily > - * the address we will send to. > - * For a peeled-off socket, msg_name is ignored. > - */ > - if (!sctp_style(sk, UDP_HIGH_BANDWIDTH) && msg->msg_name) { > - int msg_namelen = msg->msg_namelen; > - > - err = sctp_verify_addr(sk, (union sctp_addr *)msg->msg_name, > - msg_namelen); > - if (err) > - return err; > - > - if (msg_namelen > sizeof(to)) > - msg_namelen = sizeof(to); > - memcpy(&to, msg->msg_name, msg_namelen); > - msg_name = msg->msg_name; > + memset(srinfo, 0, sizeof(*srinfo)); > + if (cmsgs->srinfo) { > + srinfo->sinfo_stream = cmsgs->srinfo->sinfo_stream; > + srinfo->sinfo_flags = cmsgs->srinfo->sinfo_flags; > + srinfo->sinfo_ppid = cmsgs->srinfo->sinfo_ppid; > + srinfo->sinfo_context = cmsgs->srinfo->sinfo_context; > + srinfo->sinfo_assoc_id = cmsgs->srinfo->sinfo_assoc_id; > + srinfo->sinfo_timetolive = cmsgs->srinfo->sinfo_timetolive; > } > > - sinit = cmsgs.init; > - if (cmsgs.sinfo != NULL) { > - memset(&default_sinfo, 0, sizeof(default_sinfo)); > - default_sinfo.sinfo_stream = cmsgs.sinfo->snd_sid; > - default_sinfo.sinfo_f
Re: [Outreachy kernel] [PATCH v3] staging: ipx: Replace printk() with appropriate net_*macro_ratelimited()
On Mon, 5 Mar 2018, Arushi Singhal wrote: > Replace printk having a log level with the appropriate > net_*macro_ratelimited. > It's better to use actual device name as a prefix in error messages. > > Signed-off-by: Arushi Singhal > --- > changes in v2 > *In v1 printk was changed to pr_*macro(), which is used > in kernel instead of calling printk() directly. And for drivers, > dev_*macro() or net_*macro_ratelimited() should be used for calling > printk() directly. > > changes in v3 > *Indentation is not changed, as line is exceeding 80 characters limit. Put the v3 changes on top of the v2 changes, so that one can see immediately what has changed most recently. julia > > drivers/staging/ipx/af_ipx.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/staging/ipx/af_ipx.c b/drivers/staging/ipx/af_ipx.c > index d21a9d1..5ec6591 100644 > --- a/drivers/staging/ipx/af_ipx.c > +++ b/drivers/staging/ipx/af_ipx.c > @@ -744,7 +744,7 @@ static void ipxitf_discover_netnum(struct ipx_interface > *intrfc, > intrfc->if_netnum = cb->ipx_source_net; > ipxitf_add_local_route(intrfc); > } else { > - printk(KERN_WARNING "IPX: Network number collision " > + net_warn_ratelimited("IPX: Network number collision " > "%lx\n%s %s and %s %s\n", > (unsigned long) ntohl(cb->ipx_source_net), > ipx_device_name(i), > -- > 2.7.4 >
Re: [PATCH v2 net-next] selftests: forwarding: Add support to create veth interfaces
On Sun, Mar 04, 2018 at 05:37:47PM -0800, David Ahern wrote: > For tests using veth interfaces, the test infrastructure can create > the netdevs if they do not exist. Arguably this is a preferred approach > since the tests require p$N and p$(N+1) to be pairs. > > Signed-off-by: David Ahern Reviewed-by: Ido Schimmel
Re: inconsistent lock state with usbnet/asix usb ethernet and xhci
Hi Oliver, On 2018-02-27 17:07, Oliver Neukum wrote: Am Dienstag, den 27.02.2018, 07:13 -0800 schrieb Eric Dumazet: On Tue, 2018-02-27 at 07:09 -0800, Eric Dumazet wrote: Note that for this one, it seems we also could perform stats updates in BH context, since skb is queued via defer_bh() But simplicity wins I guess. Thinking more about this, I am not sure we have any guarantee that TX and RX can not run on multiple cpus. Using an unique syncp is not going to be safe, even if we make lockdep happy enough with the local_irq save/restore. Unfortunately you are right. It is not guaranteed for some hardware. Does it mean that the fix proposed by Eric is not the proper solution? Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland
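Eric's point about a single syncp can be illustrated with a toy model of the sequence counter behind u64_stats_sync (on 32-bit SMP kernels it wraps a seqcount; on 64-bit kernels it compiles away). A seqcount assumes one writer at a time: the writer makes the count odd while updating, and a reader trusts its snapshot only if the count was even at the start and unchanged at the end. If TX and RX update through the same counter on two CPUs, two overlapping writers can leave it even mid-update. The names below are hypothetical, and the interleaving is replayed deterministically rather than with real threads.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-writer sequence counter: odd means "write in progress". */
struct toy_seq {
	unsigned int seq;
};

static void write_begin(struct toy_seq *s) { s->seq++; }
static void write_end(struct toy_seq *s)   { s->seq++; }

/* Reader: snapshot the counter; only an even value may be trusted. */
static bool read_begin_ok(const struct toy_seq *s, unsigned int *start)
{
	*start = s->seq;
	return (*start & 1) == 0;
}

/* Reader: the snapshot is stale if the counter moved meanwhile. */
static bool read_retry(const struct toy_seq *s, unsigned int start)
{
	return s->seq != start;
}

/* TX and RX both begin updating before a reader samples the counter. */
static bool two_writers_fool_reader(void)
{
	struct toy_seq s = { 0 };
	unsigned int start;
	bool accepted;

	write_begin(&s);	/* CPU0: TX stats update starts, seq = 1 */
	write_begin(&s);	/* CPU1: RX stats update starts, seq = 2 */
	/* seq is even, so the reader accepts data that is mid-update. */
	accepted = read_begin_ok(&s, &start) && !read_retry(&s, start);
	write_end(&s);		/* seq = 3 */
	write_end(&s);		/* seq = 4 */
	assert(s.seq == 4);
	return accepted;
}

/* With a single writer, an in-progress update is always visible as odd. */
static bool one_writer_blocks_reader(void)
{
	struct toy_seq s = { 0 };
	unsigned int start;
	bool accepted;

	write_begin(&s);	/* seq = 1 */
	accepted = read_begin_ok(&s, &start) && !read_retry(&s, start);
	write_end(&s);		/* seq = 2 */
	return accepted;
}
```

two_writers_fool_reader() returns true while one_writer_blocks_reader() returns false. The increments are also not atomic, so two CPUs racing on them can lose an update and leave the counter stuck odd. A common way to avoid the sharing is to give each direction its own syncp, although whether that fits usbnet is exactly the question raised here.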