Re: 4.4-rc7 failure report
On 12/28/2015 10:53 PM, Doug Ledford wrote: The 4.4-rc7 kernel is failing for me. In my case, all of my vlan interfaces are failing to obtain a dhcp address using dhclient. I've tried a hand built 4.4-rc7, and the Fedora rawhide 4.4-rc7 kernel, both failed. I've tried NetworkManager and the old SysV network service, both fail. I tried a working dhclient from rhel7 on the Fedora rawhide install and it failed too. Running tcpdump on the interface shows the dhcp request going out, and a dhcp response coming back in. Running strace on dhclient shows that it writes the dhcp request, but it never recvs a dhcp response. If I manually bring the interface up with a static IP address then I'm able to run typical IP traffic across the link (aka, ping). It would seem that when dhclient registers a packet filter on the socket, that filter is preventing it from ever getting the dhcp response. The same dhclient works on any non-vlan interfaces in the system, so the filter must work for non-vlan interfaces. Aside from the fact that the interface is a vlan, we also use a priority egress map on the interface, and we use PFC flow control. Let me know if you need anymore to debug the issue, or email me off list and I can get you logins to my reproducer machines. When you say 4.4-rc7 kernel is failing for you, what latest kernel version was working, where the socket filter was properly receiving the response on your vlan iface? Are you reasonably sure that the skb is dropped at the BPF filter attached to the dhcp's packet socket? Can you dump the BPF code of the filter? Are there any vlan offloading settings the filter is not taking into account (in the sense of classic BPF extensions, tcpdump/libpcap finally managed to cope with this)? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH net-next 3/3] macsec: introduce IEEE 802.1AE driver
Sabrina Dubrocawrote: > + if (h->short_length) > + return len == h->short_length + 24; > + else > + return len >= 72; [..] > + return len == h->short_length + 32; [..] > + return len >= 80; [..] > + return len == 8 + icv_len + h->short_length; > + else > + return len >= 8 + icv_len + 48; [..] > + if (h->short_length) > + return len == 16 + icv_len + h->short_length; > + else > + return len >= 16 + icv_len + 48; Could you add some defines instead of magic numbers? > + tx_sa->next_pn++; > + if (tx_sa->next_pn == 0) { > + pr_notice("PN wrapped, transitionning to !oper\n"); Is that _notice intentional? I'm only asking because it seems we printk unconditionally in response to network traffic & I don't get what operator should do in response to that message. > +static void macsec_encrypt_done(struct crypto_async_request *base, int err) > +{ > + struct sk_buff *skb = base->data; > + struct net_device *dev = skb->dev; > + struct macsec_dev *macsec = macsec_priv(dev); > + struct macsec_tx_sa *sa = macsec_skb_cb(skb)->tx_sa; > + int len, ret; > + > + aead_request_free(macsec_skb_cb(skb)->req); > + > + rcu_read_lock_bh(); > + macsec_encrypt_finish(skb, dev); > + macsec_count_tx(skb, >secy.tx_sc, macsec_skb_cb(skb)->tx_sa); > + len = skb->len; > + ret = dev_queue_xmit(skb); > + count_tx(dev, ret, len); > + rcu_read_unlock_bh(); What was the rcu_read_lock_bh protecting? > +static void macsec_decrypt_done(struct crypto_async_request *base, int err) > +{ > + struct sk_buff *skb = base->data; > + struct net_device *dev = skb->dev; > + struct macsec_dev *macsec = macsec_priv(dev); > + struct macsec_rx_sa *rx_sa = macsec_skb_cb(skb)->rx_sa; > + int len, ret; > + > + aead_request_free(macsec_skb_cb(skb)->req); > + > + rcu_read_lock_bh(); > + macsec_finalize_skb(skb, macsec->secy.icv_len, > + macsec_extra_len(macsec_skb_cb(skb)->has_sci)); > + macsec_reset_skb(skb, macsec->secy.netdev); > + > + macsec_rxsa_put(rx_sa); > + len = skb->len; > + ret = netif_rx(skb); > + if (ret == NET_RX_SUCCESS) > + count_rx(dev, len); > + else > + macsec->secy.netdev->stats.rx_dropped++; > + > + rcu_read_unlock_bh(); Same question. > +static void handle_not_macsec(struct sk_buff *skb) > +{ > + struct macsec_rxh_data *rxd = macsec_data_rcu(skb->dev); > + struct macsec_dev *macsec; > + > + /* 10.6 If the management control validateFrames is not > + * Strict, frames without a SecTAG are received, counted, and > + * delivered to the Controlled Port > + */ > + list_for_each_entry_rcu(macsec, >secys, secys) { > + struct sk_buff *nskb; > + int ret; > + struct pcpu_secy_stats *secy_stats = > this_cpu_ptr(macsec->stats); > + > + if (macsec->secy.validate_frames == MACSEC_VALIDATE_STRICT) { > + u64_stats_update_begin(_stats->syncp); > + secy_stats->stats.InPktsNoTag++; > + u64_stats_update_end(_stats->syncp); > + continue; > + } > + > + /* deliver on this port */ > + nskb = skb_clone(skb, GFP_ATOMIC); > + nskb->dev = macsec->secy.netdev; nskb == NULL handling? > +static rx_handler_result_t macsec_handle_frame(struct sk_buff **pskb) > +{ > + struct sk_buff *skb = *pskb; > + struct net_device *dev = skb->dev; > + struct macsec_eth_header *hdr; > + struct macsec_secy *secy = NULL; > + struct macsec_rx_sc *rx_sc; > + struct macsec_rx_sa *rx_sa; > + struct macsec_rxh_data *rxd; > + struct macsec_dev *macsec; > + sci_t sci; > + u32 pn, lowest_pn; > + bool cbit; > + struct pcpu_rx_sc_stats *rxsc_stats; > + struct pcpu_secy_stats *secy_stats; > + bool pulled_sci; > + > + rcu_read_lock_bh(); Why? Seems its because of > + if (skb_headroom(skb) < ETH_HLEN) > + goto drop_nosa; > + > + rxd = macsec_data_rcu(skb->dev); this, but rxd isn't dereferenced until a lot later in the function. Also: macsec_data_rcu uses rcu_dereference() but this used rcu_read_lock_bh, is that structure protected by RCU or RCU-bh? > + pn = ntohl(hdr->packet_number); > + if (secy->replay_protect) { > + bool late; > + > + spin_lock(_sa->lock); > + late = rx_sa->next_pn >= secy->replay_window && > +pn < (rx_sa->next_pn - secy->replay_window); > + spin_unlock(_sa->lock); > + > + if (late) { > + u64_stats_update_begin(_stats->syncp); > +
Re: 4.4-rc7 failure report
On 12/28/2015 05:20 PM, Daniel Borkmann wrote: > On 12/28/2015 10:53 PM, Doug Ledford wrote: >> The 4.4-rc7 kernel is failing for me. In my case, all of my vlan >> interfaces are failing to obtain a dhcp address using dhclient. I've >> tried a hand built 4.4-rc7, and the Fedora rawhide 4.4-rc7 kernel, both >> failed. I've tried NetworkManager and the old SysV network service, >> both fail. I tried a working dhclient from rhel7 on the Fedora rawhide >> install and it failed too. Running tcpdump on the interface shows the >> dhcp request going out, and a dhcp response coming back in. Running >> strace on dhclient shows that it writes the dhcp request, but it never >> recvs a dhcp response. If I manually bring the interface up with a >> static IP address then I'm able to run typical IP traffic across the >> link (aka, ping). It would seem that when dhclient registers a packet >> filter on the socket, that filter is preventing it from ever getting the >> dhcp response. The same dhclient works on any non-vlan interfaces in >> the system, so the filter must work for non-vlan interfaces. Aside from >> the fact that the interface is a vlan, we also use a priority egress map >> on the interface, and we use PFC flow control. Let me know if you need >> anymore to debug the issue, or email me off list and I can get you >> logins to my reproducer machines. > > When you say 4.4-rc7 kernel is failing for you, what latest kernel version > was working, where the socket filter was properly receiving the response on > your vlan iface? v4.3 final works. I haven't bisected where in the 4.4 series it quits working. I can do that tomorrow. > Are you reasonably sure that the skb is dropped at the BPF > filter attached to the dhcp's packet socket? No. I'm only reasonably sure that without a filter it works, I don't know if it gets dropped at the BPF filter or something else when the filter is added. It could be an interaction between the filter and PFC or vlan or anything else for all I know. But I figured the level of detail I gave should make it easy to reproduce locally by interested parties. > Can you dump the BPF code of > the filter? It's whatever the filter is that dhclient uses. I'm pretty sure that's a pretty standard filter. And I tried known working dhclients in order to make sure it wasn't the latest dhclient's fault. > Are there any vlan offloading settings the filter is not taking > into account (in the sense of classic BPF extensions, tcpdump/libpcap > finally > managed to cope with this)? Have no clue. -- Doug LedfordGPG KeyID: 0E572FDD signature.asc Description: OpenPGP digital signature
[PATCH 2/2] ixgbe: restrict synchronization of link_up and speed
From: Zhu YanjunWhen the X540 NIC acts as a slave of some virtual NICs, it is very important to synchronize link_up and link_speed, such as a bonding driver in 802.3ad mode. When X540 NIC acts as an independent interface, it is not necessary to synchronize link_up and link_speed. That is, the time span between link_up and link_speed is acceptable. Signed-off-by: Zhu Yanjun --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index ace21b9..1bb6056 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -6436,8 +6436,15 @@ static void ixgbe_watchdog_link_is_up(struct ixgbe_adapter *adapter) * time. To X540 NIC, there is a time span between link_up and * link_speed. As such, only continue if link_up and link_speed are * ready to X540 NIC. +* The time span between link_up and link_speed is very important +* when the X540 NIC acts as a slave in some virtual NICs, such as +* a bonding driver in 802.3ad mode. When X540 NIC acts as an +* independent interface, it is not necessary to synchronize link_up +* and link_speed. +* In the end, not continue if (X540 NIC && SLAVE && link_speed UNKNOWN) */ - if (hw->mac.type == ixgbe_mac_X540) + if ((hw->mac.type == ixgbe_mac_X540) && + (netdev->flags & IFF_SLAVE)) if (link_speed == IXGBE_LINK_SPEED_UNKNOWN) return; -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH V3] ixgbe: force to synchronize link_up and speed as a slave
Hi, all During the discussion with Emil, I think the time span between link_up and link_speed is not important when the X540 NIC acts as an independent interface. As such, a new patch is made to restrict synchronization between link_up and link_speed. Any reply is appreciated. Zhu Yanjun -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] ixgbe: force to synchronize reporting "link on" and getting speed
From: yzhu1In X540 NIC, there is a time span between reporting "link on" and getting the speed and duplex. To a bonding driver in 802.3ad mode, this time span will make it not work well if the time span is big enough. The big time span will make bonding driver change the state of the slave device to up while the speed and duplex of the slave device can not be gotten. Later the bonding driver will not have change to get the speed and duplex of the slave device. The speed and duplex of the slave device are important to a bonding driver in 802.3ad mode. To 82599_SFP NIC and other kinds of NICs, this problem does not exist. As such, it is necessary for X540 to report"link on" when the link speed is not IXGBE_LINK_SPEED_UNKNOWN. Signed-off-by: yzhu1 Signed-off-by: Zhu Yanjun --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c index aed8d02..ace21b9 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c @@ -6426,6 +6426,21 @@ static void ixgbe_watchdog_link_is_up(struct ixgbe_adapter *adapter) if (netif_carrier_ok(netdev)) return; + /* In X540 NIC, there is a time span between reporting "link on" +* and getting the speed and duplex. To a bonding driver in 802.3ad +* mode, this time span will make it not work well if the time span +* is big enough. To 82599_SFP NIC and other kinds of NICs, this +* problem does not exist. As such, it is better for X540 to report +* "link on" when the link speed is not IXGBE_LINK_SPEED_UNKNOWN. +* To other NICs, the link_up and link_speed are gotten at the same +* time. To X540 NIC, there is a time span between link_up and +* link_speed. As such, only continue if link_up and link_speed are +* ready to X540 NIC. +*/ + if (hw->mac.type == ixgbe_mac_X540) + if (link_speed == IXGBE_LINK_SPEED_UNKNOWN) + return; + adapter->flags2 &= ~IXGBE_FLAG2_SEARCH_FOR_SFP; switch (hw->mac.type) { -- 1.7.9.5 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode
On Thu, Dec 17, 2015 at 01:57:16PM -0800, Jay Vosburgh wrote: >wrote: > >In 802.3ad mode, the speed and duplex is needed. But in some NIC, > >there is a time span between NIC up state and getting speed and duplex. > >As such, sometimes a slave in 802.3ad mode is in up state without > >speed and duplex. This will make bonding in 802.3ad mode can not > >work well. > >To make bonding driver be compatible with more NICs, it is > >necessary to restrict the up state in 802.3ad mode. > > What device is this? It seems a bit odd that an Ethernet device > can be carrier up but not have the duplex and speed available. ... > In general, though, bonding expects a speed or duplex change to > be announced via a NETDEV_UPDATE or NETDEV_UP notifier, which would > propagate to the 802.3ad logic. > > If the device here is going carrier up prior to having speed or > duplex available, then maybe it should call netdev_state_change() when > the duplex and speed are available, or delay calling netif_carrier_on(). I have encountered this problem (NIC having carrier on before being able to detect speed/duplex and driver not notifying when speed/duplex becomes available) with netxen cards earlier. But it was eventually fixed in the driver by commit 9d01412ae76f ("netxen: Fix link event handling.") so this example rather supports what you said. Michal Kubecek -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] bpf: hash: use per-bucket spinlock
On Mon, Dec 28, 2015 at 5:13 PM, Daniel Borkmannwrote: > On 12/26/2015 10:31 AM, Ming Lei wrote: >> >> From: Ming Lei >> >> Both htab_map_update_elem() and htab_map_delete_elem() can be >> called from eBPF program, and they may be in kernel hot path, >> so it isn't efficient to use a per-hashtable lock in this two >> helpers. >> >> The per-hashtable spinlock is used just for protecting bucket's >> hlist, and per-bucket lock should be enough. This patch converts >> the per-hashtable lock into per-bucket spinlock, so that contention >> can be decreased a lot. >> >> Signed-off-by: Ming Lei >> --- >> kernel/bpf/hashtab.c | 35 +-- >> 1 file changed, 25 insertions(+), 10 deletions(-) >> >> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c >> index d857fcb..66bad7a 100644 >> --- a/kernel/bpf/hashtab.c >> +++ b/kernel/bpf/hashtab.c >> @@ -14,9 +14,14 @@ >> #include >> #include >> >> +struct bucket { >> + struct hlist_head head; >> + raw_spinlock_t lock; >> +}; >> + >> struct bpf_htab { >> struct bpf_map map; >> - struct hlist_head *buckets; >> + struct bucket *buckets; >> raw_spinlock_t lock; > > > Shouldn't the lock member be removed from here? Hammm, looks this one is a bad version, sorry for that, and will resend it. > > >> atomic_t count; /* number of elements in this hashtable */ >> u32 n_buckets; /* number of hash buckets */ >> @@ -88,24 +93,25 @@ static struct bpf_map *htab_map_alloc(union bpf_attr >> *attr) >> /* make sure page count doesn't overflow */ >> goto free_htab; >> >> - htab->map.pages = round_up(htab->n_buckets * sizeof(struct >> hlist_head) + >> + htab->map.pages = round_up(htab->n_buckets * sizeof(struct bucket) >> + >>htab->elem_size * >> htab->map.max_entries, >>PAGE_SIZE) >> PAGE_SHIFT; >> >> err = -ENOMEM; >> - htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct >> hlist_head), >> + htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct >> bucket), >> GFP_USER | __GFP_NOWARN); >> >> if (!htab->buckets) { >> - htab->buckets = vmalloc(htab->n_buckets * sizeof(struct >> hlist_head)); >> + htab->buckets = vmalloc(htab->n_buckets * sizeof(struct >> bucket)); >> if (!htab->buckets) >> goto free_htab; >> } >> >> - for (i = 0; i < htab->n_buckets; i++) >> - INIT_HLIST_HEAD(>buckets[i]); >> + for (i = 0; i < htab->n_buckets; i++) { >> + INIT_HLIST_HEAD(>buckets[i].head); >> + raw_spin_lock_init(>buckets[i].lock); >> + } >> >> - raw_spin_lock_init(>lock); >> atomic_set(>count, 0); >> >> return >map; >> @@ -120,11 +126,16 @@ static inline u32 htab_map_hash(const void *key, u32 >> key_len) >> return jhash(key, key_len, 0); >> } >> >> -static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 >> hash) >> +static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 >> hash) >> { >> return >buckets[hash & (htab->n_buckets - 1)]; >> } >> >> +static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 >> hash) >> +{ >> + return &__select_bucket(htab, hash)->head; >> +} >> + >> static struct htab_elem *lookup_elem_raw(struct hlist_head *head, u32 >> hash, >> void *key, u32 key_size) >> { >> @@ -227,6 +238,7 @@ static int htab_map_update_elem(struct bpf_map *map, >> void *key, void *value, >> struct bpf_htab *htab = container_of(map, struct bpf_htab, map); >> struct htab_elem *l_new, *l_old; >> struct hlist_head *head; >> + struct bucket *b; >> unsigned long flags; >> u32 key_size; >> int ret; >> @@ -248,7 +260,8 @@ static int htab_map_update_elem(struct bpf_map *map, >> void *key, void *value, >> memcpy(l_new->key + round_up(key_size, 8), value, >> map->value_size); >> >> l_new->hash = htab_map_hash(l_new->key, key_size); >> - head = select_bucket(htab, l_new->hash); >> + b = __select_bucket(htab, l_new->hash); >> + head = >head; >> >> /* bpf_map_update_elem() can be called in_irq() */ >> raw_spin_lock_irqsave(>lock, flags); > > > I am a bit confused on this one, though. > > The raw spin lock is still per hashtable (htab->lock), and has not been > converted in > this patch to the per bucket one (as opposed to what the commit message > says), so > this patch seems to go into the right direction, but is a bit incomplete? > Shouldn't > the above f.e. take b->lock, etc? > >> @@ -299,6 +312,7 @@ static int htab_map_delete_elem(struct bpf_map *map, >> void *key) >> {
Re: [PATCH net-next v2 05/10] be2net: log digital signature errors while flashing FW image
Hi Suresh, [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Sathya-Perla/be2net-patch-set/20151228-145423 config: x86_64-allmodconfig (attached as .config) reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All warnings (new ones prefixed by >>): drivers/net/ethernet/emulex/benet/be_cmds.c: In function 'be_flash_skyhawk': >> drivers/net/ethernet/emulex/benet/be_cmds.c:2963:4: warning: enumeration >> value 'MCC_ADDL_STATUS_INSUFFICIENT_RESOURCES' not handled in switch >> [-Wswitch] switch (addl_status(status)) { ^ >> drivers/net/ethernet/emulex/benet/be_cmds.c:2963:4: warning: enumeration >> value 'MCC_ADDL_STATUS_FLASH_IMAGE_CRC_MISMATCH' not handled in switch >> [-Wswitch] >> drivers/net/ethernet/emulex/benet/be_cmds.c:2963:4: warning: enumeration >> value 'MCC_ADDL_STATUS_TOO_MANY_INTERFACES' not handled in switch [-Wswitch] >> drivers/net/ethernet/emulex/benet/be_cmds.c:2963:4: warning: enumeration >> value 'MCC_ADDL_STATUS_INSUFFICIENT_VLANS' not handled in switch [-Wswitch] vim +/MCC_ADDL_STATUS_INSUFFICIENT_RESOURCES +2963 drivers/net/ethernet/emulex/benet/be_cmds.c 2947 flash_offset_support = false; 2948 goto retry_flash; 2949 } 2950 2951 /* For old FW images ignore ILLEGAL_FIELD error or errors on 2952 * UFI_DIR region 2953 */ 2954 if (old_fw_img && 2955 (base_status(status) == MCC_STATUS_ILLEGAL_FIELD || 2956 (img_optype == OPTYPE_UFI_DIR && 2957base_status(status) == MCC_STATUS_FAILED))) { 2958 continue; 2959 } else if (status) { 2960 dev_err(dev, "Flashing section type 0x%x failed\n", 2961 img_type); 2962 > 2963 switch (addl_status(status)) { 2964 case MCC_ADDL_STATUS_MISSING_SIGNATURE: 2965 dev_err(dev, 2966 "Digital signature missing in FW\n"); 2967 return -EINVAL; 2968 case MCC_ADDL_STATUS_INVALID_SIGNATURE: 2969 dev_err(dev, 2970 "Invalid digital signature in FW\n"); 2971 return -EINVAL; --- 0-DAY kernel test infrastructureOpen Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation .config.gz Description: Binary data
Re: [PATCH 1/3] bpf: hash: use atomic count
On 12/26/2015 10:31 AM, Ming Lei wrote: Preparing for removing global per-hashtable lock, so the counter need to be defined as aotmic_t first. Signed-off-by: Ming LeiAcked-by: Daniel Borkmann -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode
On 12/28/2015 04:43 PM, Michal Kubecek wrote: On Thu, Dec 17, 2015 at 01:57:16PM -0800, Jay Vosburgh wrote:wrote: In 802.3ad mode, the speed and duplex is needed. But in some NIC, there is a time span between NIC up state and getting speed and duplex. As such, sometimes a slave in 802.3ad mode is in up state without speed and duplex. This will make bonding in 802.3ad mode can not work well. To make bonding driver be compatible with more NICs, it is necessary to restrict the up state in 802.3ad mode. What device is this? It seems a bit odd that an Ethernet device can be carrier up but not have the duplex and speed available. ... In general, though, bonding expects a speed or duplex change to be announced via a NETDEV_UPDATE or NETDEV_UP notifier, which would propagate to the 802.3ad logic. If the device here is going carrier up prior to having speed or duplex available, then maybe it should call netdev_state_change() when the duplex and speed are available, or delay calling netif_carrier_on(). I have encountered this problem (NIC having carrier on before being able to detect speed/duplex and driver not notifying when speed/duplex becomes available) with netxen cards earlier. But it was eventually fixed in the driver by commit 9d01412ae76f ("netxen: Fix link event handling.") so this example rather supports what you said. Michal Kubecek Thanks a lot. I checked the commit 9d01412ae76f ("netxen: Fix link event handling."). The symptoms are the same with mine. The root cause is different. In my problem, the root cause is that LINKS register can not provide link_up and link_speed at the same time. There is a time span between link_up and link_speed. My solution is to force to synchronize link_up and link_speed in ixgbe X540 NIC. Best Regards! Zhu Yanjun -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 net-next 0/3] Ethtool support for phy stats
On Mon, Dec 28, 2015 at 12:44:27AM -0500, David Miller wrote: > From: David Miller> Date: Mon, 28 Dec 2015 00:32:43 -0500 (EST) > > > From: Andrew Lunn > > Date: Sun, 27 Dec 2015 12:58:25 +0100 > > > >> This patchset add ethtool support for reading statistics from the PHY. > >> The Marvell and Micrel Phys are then extended to report receiver > >> packet errors and idle errors. > >> > >> v2: > >> Fix linking when phylib is not enabled. > > > > Series applied, thanks Andrew. > > I have to revert this. > > You can't call into PHY library code from net/core/ethtool.c, > because PHY lib can be modular. O.K. How about we change PHYLIB from a tristate to a bool? Thanks Andrew -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] bpf: hash: use per-bucket spinlock
On 12/26/2015 10:31 AM, Ming Lei wrote: From: Ming LeiBoth htab_map_update_elem() and htab_map_delete_elem() can be called from eBPF program, and they may be in kernel hot path, so it isn't efficient to use a per-hashtable lock in this two helpers. The per-hashtable spinlock is used just for protecting bucket's hlist, and per-bucket lock should be enough. This patch converts the per-hashtable lock into per-bucket spinlock, so that contention can be decreased a lot. Signed-off-by: Ming Lei --- kernel/bpf/hashtab.c | 35 +-- 1 file changed, 25 insertions(+), 10 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index d857fcb..66bad7a 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -14,9 +14,14 @@ #include #include +struct bucket { + struct hlist_head head; + raw_spinlock_t lock; +}; + struct bpf_htab { struct bpf_map map; - struct hlist_head *buckets; + struct bucket *buckets; raw_spinlock_t lock; Shouldn't the lock member be removed from here? atomic_t count; /* number of elements in this hashtable */ u32 n_buckets; /* number of hash buckets */ @@ -88,24 +93,25 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) /* make sure page count doesn't overflow */ goto free_htab; - htab->map.pages = round_up(htab->n_buckets * sizeof(struct hlist_head) + + htab->map.pages = round_up(htab->n_buckets * sizeof(struct bucket) + htab->elem_size * htab->map.max_entries, PAGE_SIZE) >> PAGE_SHIFT; err = -ENOMEM; - htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct hlist_head), + htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct bucket), GFP_USER | __GFP_NOWARN); if (!htab->buckets) { - htab->buckets = vmalloc(htab->n_buckets * sizeof(struct hlist_head)); + htab->buckets = vmalloc(htab->n_buckets * sizeof(struct bucket)); if (!htab->buckets) goto free_htab; } - for (i = 0; i < htab->n_buckets; i++) - INIT_HLIST_HEAD(>buckets[i]); + for (i = 0; i < htab->n_buckets; i++) { + INIT_HLIST_HEAD(>buckets[i].head); + raw_spin_lock_init(>buckets[i].lock); + } - raw_spin_lock_init(>lock); atomic_set(>count, 0); return >map; @@ -120,11 +126,16 @@ static inline u32 htab_map_hash(const void *key, u32 key_len) return jhash(key, key_len, 0); } -static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash) +static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash) { return >buckets[hash & (htab->n_buckets - 1)]; } +static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash) +{ + return &__select_bucket(htab, hash)->head; +} + static struct htab_elem *lookup_elem_raw(struct hlist_head *head, u32 hash, void *key, u32 key_size) { @@ -227,6 +238,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value, struct bpf_htab *htab = container_of(map, struct bpf_htab, map); struct htab_elem *l_new, *l_old; struct hlist_head *head; + struct bucket *b; unsigned long flags; u32 key_size; int ret; @@ -248,7 +260,8 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value, memcpy(l_new->key + round_up(key_size, 8), value, map->value_size); l_new->hash = htab_map_hash(l_new->key, key_size); - head = select_bucket(htab, l_new->hash); + b = __select_bucket(htab, l_new->hash); + head = >head; /* bpf_map_update_elem() can be called in_irq() */ raw_spin_lock_irqsave(>lock, flags); I am a bit confused on this one, though. The raw spin lock is still per hashtable (htab->lock), and has not been converted in this patch to the per bucket one (as opposed to what the commit message says), so this patch seems to go into the right direction, but is a bit incomplete? Shouldn't the above f.e. take b->lock, etc? @@ -299,6 +312,7 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); struct hlist_head *head; + struct bucket *b; struct htab_elem *l; unsigned long flags; u32 hash, key_size; @@ -309,7 +323,8 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key) key_size = map->key_size; hash = htab_map_hash(key, key_size); - head = select_bucket(htab, hash); + b = __select_bucket(htab, hash); + head = >head; raw_spin_lock_irqsave(>lock,
pull-request: wireless-drivers 2015-12-28
Hi Dave, here's one more pull request, a bit late due to holidays but I hope this still makes it to 4.4. Just two small fixes to iwlwifi, nothing else. Kalle The following changes since commit eeec5d0ef7ee54a75e09e861c3cc44177b8752c7: rtlwifi: rtl8821ae: Fix lockups on boot (2015-11-17 15:58:53 +0200) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git tags/wireless-drivers-for-davem-2015-12-28 for you to fetch changes up to 01d85b9b2b6bec2b0773cf2afc58699dc4b052f8: Merge tag 'iwlwifi-for-kalle-2015-12-16' of https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes (2015-12-18 14:57:02 +0200) iwlwifi * don't load firmware that won't exist for 7260 * fix RCU splat Johannes Berg (2): iwlwifi: separate firmware version for 7260 devices iwlwifi: mvm: protect RCU dereference in iwl_mvm_get_key_sta_id Kalle Valo (1): Merge tag 'iwlwifi-for-kalle-2015-12-16' of https://git.kernel.org/.../iwlwifi/iwlwifi-fixes drivers/net/wireless/iwlwifi/iwl-7000.c | 49 ++- drivers/net/wireless/iwlwifi/mvm/sta.c | 15 ++ 2 files changed, 44 insertions(+), 20 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] bpf: hash: move select_bucket() out of htab's spinlock
On 12/26/2015 10:31 AM, Ming Lei wrote: The spinlock is just used for protecting the per-bucket hlist, so it isn't needed for selecting bucket. Signed-off-by: Ming LeiAcked-by: Daniel Borkmann -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
skb_clone query
Hi, Just out of curisity, I was looking at Kernel 3.14, skb_clone function in f_ncm.c http://lxr.free-electrons.com/source/drivers/usb/gadget/f_ncm.c?v=3.14#L1063 QUERY : Shouldn't a kfree_skb(skb2); happen before goto err @1070 ? Is this not a memleak ? 1068 if (!skb_pull(skb2, index)) { 1069 ret = -EOVERFLOW; 1070 goto err; 1071 } Thanks, Pavi -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH stable-3.2 stable-3.12] net: fix checksum check in skb_copy_and_csum_datagram_iovec()
Recent fix "net: add length argument to skb_copy_and_csum_datagram_iovec" added to some pre-3.19 stable branches, namely stable-3.2.y: commit 127500d724f8 stable-3.12.y: commit 3e1ac3aafbd0 doesn't handle truncated reads correctly. If read length is shorter than incoming datagram (but non-zero) and first segment of target iovec is sufficient for read length, skb_copy_and_csum_datagram() is used to copy checksum the data while copying it. For truncated reads this means only the copied part is checksummed (rather than the whole datagram) so that the check almost always fails. Add checksum of the remaining part so that the proper checksum of the whole datagram is computed and checked. Special care must be taken if the copied length is odd. For zero read length, we don't have to copy anything but we still should check the checksum so that a peek doesn't return with a datagram which is invalid and wouldn't be returned by an actual read. Signed-off-by: Michal Kubecek--- net/core/datagram.c | 26 +- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/net/core/datagram.c b/net/core/datagram.c index f22f120771ef..af4bf368257c 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -809,13 +809,14 @@ int skb_copy_and_csum_datagram_iovec(struct sk_buff *skb, int hlen, struct iovec *iov, int len) { __wsum csum; - int chunk = skb->len - hlen; + int full_chunk = skb->len - hlen; + int chunk = min_t(int, full_chunk, len); - if (chunk > len) - chunk = len; - - if (!chunk) + if (!chunk) { + if (__skb_checksum_complete(skb)) + goto csum_error; return 0; + } /* Skip filled elements. * Pretty silly, look at memcpy_toiovec, though 8) @@ -833,6 +834,21 @@ int skb_copy_and_csum_datagram_iovec(struct sk_buff *skb, if (skb_copy_and_csum_datagram(skb, hlen, iov->iov_base, chunk, )) goto fault; + if (full_chunk > chunk) { + if (chunk % 2) { + __be16 odd = 0; + + if (skb_copy_bits(skb, hlen + chunk, + (char *) + 1, 1)) + goto fault; + csum = add32_with_carry(odd, csum); + csum = skb_checksum(skb, hlen + chunk + 1, + full_chunk - chunk - 1, + csum); + } else + csum = skb_checksum(skb, hlen + chunk, + full_chunk - chunk, csum); + } if (csum_fold(csum)) goto csum_error; if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE)) -- 2.6.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] unix: properly account for FDs passed over unix sockets
It is possible for a process to allocate and accumulate far more FDs than the process' limit by sending them over a unix socket then closing them to keep the process' fd count low. This change addresses this problem by keeping track of the number of FDs in flight per user and preventing non-privileged processes from having more FDs in flight than their configured FD limit. Reported-by: socketp...@gmail.com Suggested-by: Linus TorvaldsSigned-off-by: Willy Tarreau --- It would be nice if (if accepted) it would be backported to -stable as the issue is currently exploitable. --- include/linux/sched.h | 1 + net/unix/af_unix.c| 24 net/unix/garbage.c| 13 - 3 files changed, 29 insertions(+), 9 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index edad7a4..fbf25f1 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -830,6 +830,7 @@ struct user_struct { unsigned long mq_bytes; /* How many bytes can be allocated to mqueue? */ #endif unsigned long locked_shm; /* How many pages of mlocked shm ? */ + unsigned long unix_inflight;/* How many files in flight in unix sockets */ #ifdef CONFIG_KEYS struct key *uid_keyring;/* UID specific keyring */ diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 45aebd9..d6d7b43 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1499,6 +1499,21 @@ static void unix_destruct_scm(struct sk_buff *skb) sock_wfree(skb); } +/* + * The "user->unix_inflight" variable is protected by the garbage + * collection lock, and we just read it locklessly here. If you go + * over the limit, there might be a tiny race in actually noticing + * it across threads. Tough. + */ +static inline bool too_many_unix_fds(struct task_struct *p) +{ + struct user_struct *user = current_user(); + + if (unlikely(user->unix_inflight > task_rlimit(p, RLIMIT_NOFILE))) + return !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN); + return false; +} + #define MAX_RECURSION_LEVEL 4 static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb) @@ -1507,6 +1522,9 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb) unsigned char max_level = 0; int unix_sock_count = 0; + if (too_many_unix_fds(current)) + return -ETOOMANYREFS; + for (i = scm->fp->count - 1; i >= 0; i--) { struct sock *sk = unix_get_socket(scm->fp->fp[i]); @@ -1528,10 +1546,8 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb) if (!UNIXCB(skb).fp) return -ENOMEM; - if (unix_sock_count) { - for (i = scm->fp->count - 1; i >= 0; i--) - unix_inflight(scm->fp->fp[i]); - } + for (i = scm->fp->count - 1; i >= 0; i--) + unix_inflight(scm->fp->fp[i]); return max_level; } diff --git a/net/unix/garbage.c b/net/unix/garbage.c index a73a226..8fcdc22 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -120,11 +120,11 @@ void unix_inflight(struct file *fp) { struct sock *s = unix_get_socket(fp); + spin_lock(_gc_lock); + if (s) { struct unix_sock *u = unix_sk(s); - spin_lock(_gc_lock); - if (atomic_long_inc_return(>inflight) == 1) { BUG_ON(!list_empty(>link)); list_add_tail(>link, _inflight_list); @@ -132,25 +132,28 @@ void unix_inflight(struct file *fp) BUG_ON(list_empty(>link)); } unix_tot_inflight++; - spin_unlock(_gc_lock); } + fp->f_cred->user->unix_inflight++; + spin_unlock(_gc_lock); } void unix_notinflight(struct file *fp) { struct sock *s = unix_get_socket(fp); + spin_lock(_gc_lock); + if (s) { struct unix_sock *u = unix_sk(s); - spin_lock(_gc_lock); BUG_ON(list_empty(>link)); if (atomic_long_dec_and_test(>inflight)) list_del_init(>link); unix_tot_inflight--; - spin_unlock(_gc_lock); } + fp->f_cred->user->unix_inflight--; + spin_unlock(_gc_lock); } static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *), -- 1.7.12.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v1 1/3] bpf: hash: use atomic count
Preparing for removing global per-hashtable lock, so the counter need to be defined as aotmic_t first. Acked-by: Daniel BorkmannSigned-off-by: Ming Lei --- kernel/bpf/hashtab.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 34777b3..2615388 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -18,7 +18,7 @@ struct bpf_htab { struct bpf_map map; struct hlist_head *buckets; raw_spinlock_t lock; - u32 count; /* number of elements in this hashtable */ + atomic_t count; /* number of elements in this hashtable */ u32 n_buckets; /* number of hash buckets */ u32 elem_size; /* size of each element in bytes */ }; @@ -106,7 +106,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) INIT_HLIST_HEAD(>buckets[i]); raw_spin_lock_init(>lock); - htab->count = 0; + atomic_set(>count, 0); return >map; @@ -256,7 +256,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value, l_old = lookup_elem_raw(head, l_new->hash, key, key_size); - if (!l_old && unlikely(htab->count >= map->max_entries)) { + if (!l_old && unlikely(atomic_read(>count) >= map->max_entries)) { /* if elem with this 'key' doesn't exist and we've reached * max_entries limit, fail insertion of new elem */ @@ -284,7 +284,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value, hlist_del_rcu(_old->hash_node); kfree_rcu(l_old, rcu); } else { - htab->count++; + atomic_inc(>count); } raw_spin_unlock_irqrestore(>lock, flags); @@ -319,7 +319,7 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key) if (l) { hlist_del_rcu(>hash_node); - htab->count--; + atomic_dec(>count); kfree_rcu(l, rcu); ret = 0; } @@ -339,7 +339,7 @@ static void delete_all_elements(struct bpf_htab *htab) hlist_for_each_entry_safe(l, n, head, hash_node) { hlist_del_rcu(>hash_node); - htab->count--; + atomic_dec(>count); kfree(l); } } -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v1 3/3] bpf: hash: use per-bucket spinlock
Both htab_map_update_elem() and htab_map_delete_elem() can be called from eBPF program, and they may be in kernel hot path, so it isn't efficient to use a per-hashtable lock in this two helpers. The per-hashtable spinlock is used for protecting bucket's hlist, and per-bucket lock is just enough. This patch converts the per-hashtable lock into per-bucket spinlock, so that contention can be decreased a lot. Signed-off-by: Ming Lei--- kernel/bpf/hashtab.c | 46 ++ 1 file changed, 30 insertions(+), 16 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index d857fcb..67222a9 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -14,10 +14,14 @@ #include #include +struct bucket { + struct hlist_head head; + raw_spinlock_t lock; +}; + struct bpf_htab { struct bpf_map map; - struct hlist_head *buckets; - raw_spinlock_t lock; + struct bucket *buckets; atomic_t count; /* number of elements in this hashtable */ u32 n_buckets; /* number of hash buckets */ u32 elem_size; /* size of each element in bytes */ @@ -88,24 +92,25 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) /* make sure page count doesn't overflow */ goto free_htab; - htab->map.pages = round_up(htab->n_buckets * sizeof(struct hlist_head) + + htab->map.pages = round_up(htab->n_buckets * sizeof(struct bucket) + htab->elem_size * htab->map.max_entries, PAGE_SIZE) >> PAGE_SHIFT; err = -ENOMEM; - htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct hlist_head), + htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct bucket), GFP_USER | __GFP_NOWARN); if (!htab->buckets) { - htab->buckets = vmalloc(htab->n_buckets * sizeof(struct hlist_head)); + htab->buckets = vmalloc(htab->n_buckets * sizeof(struct bucket)); if (!htab->buckets) goto free_htab; } - for (i = 0; i < htab->n_buckets; i++) - INIT_HLIST_HEAD(>buckets[i]); + for (i = 0; i < htab->n_buckets; i++) { + INIT_HLIST_HEAD(>buckets[i].head); + raw_spin_lock_init(>buckets[i].lock); + } - raw_spin_lock_init(>lock); atomic_set(>count, 0); return >map; @@ -120,11 +125,16 @@ static inline u32 htab_map_hash(const void *key, u32 key_len) return jhash(key, key_len, 0); } -static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash) +static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash) { return >buckets[hash & (htab->n_buckets - 1)]; } +static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash) +{ + return &__select_bucket(htab, hash)->head; +} + static struct htab_elem *lookup_elem_raw(struct hlist_head *head, u32 hash, void *key, u32 key_size) { @@ -227,6 +237,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value, struct bpf_htab *htab = container_of(map, struct bpf_htab, map); struct htab_elem *l_new, *l_old; struct hlist_head *head; + struct bucket *b; unsigned long flags; u32 key_size; int ret; @@ -248,10 +259,11 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value, memcpy(l_new->key + round_up(key_size, 8), value, map->value_size); l_new->hash = htab_map_hash(l_new->key, key_size); - head = select_bucket(htab, l_new->hash); + b = __select_bucket(htab, l_new->hash); + head = >head; /* bpf_map_update_elem() can be called in_irq() */ - raw_spin_lock_irqsave(>lock, flags); + raw_spin_lock_irqsave(>lock, flags); l_old = lookup_elem_raw(head, l_new->hash, key, key_size); @@ -285,11 +297,11 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value, } else { atomic_inc(>count); } - raw_spin_unlock_irqrestore(>lock, flags); + raw_spin_unlock_irqrestore(>lock, flags); return 0; err: - raw_spin_unlock_irqrestore(>lock, flags); + raw_spin_unlock_irqrestore(>lock, flags); kfree(l_new); return ret; } @@ -299,6 +311,7 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key) { struct bpf_htab *htab = container_of(map, struct bpf_htab, map); struct hlist_head *head; + struct bucket *b; struct htab_elem *l; unsigned long flags; u32 hash, key_size; @@ -309,9 +322,10 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key) key_size = map->key_size; hash = htab_map_hash(key, key_size);
Re: [PATCH stable-3.2 stable-3.12] net: fix checksum check in skb_copy_and_csum_datagram_iovec()
Hello Michal, 2015-12-28, 15:01:57 +0100, Michal Kubecek wrote: > Recent fix "net: add length argument to > skb_copy_and_csum_datagram_iovec" added to some pre-3.19 stable > branches, namely > > stable-3.2.y: commit 127500d724f8 > stable-3.12.y: commit 3e1ac3aafbd0 > > doesn't handle truncated reads correctly. If read length is shorter than > incoming datagram (but non-zero) and first segment of target iovec is > sufficient for read length, skb_copy_and_csum_datagram() is used to copy > checksum the data while copying it. For truncated reads this means only > the copied part is checksummed (rather than the whole datagram) so that > the check almost always fails. I just ran into this issue too, sorry I didn't notice it earlier :( > Add checksum of the remaining part so that the proper checksum of the > whole datagram is computed and checked. Special care must be taken if > the copied length is odd. > > For zero read length, we don't have to copy anything but we still should > check the checksum so that a peek doesn't return with a datagram which > is invalid and wouldn't be returned by an actual read. > > Signed-off-by: Michal Kubecek> --- > net/core/datagram.c | 26 +- > 1 file changed, 21 insertions(+), 5 deletions(-) > > diff --git a/net/core/datagram.c b/net/core/datagram.c > index f22f120771ef..af4bf368257c 100644 > --- a/net/core/datagram.c > +++ b/net/core/datagram.c > @@ -809,13 +809,14 @@ int skb_copy_and_csum_datagram_iovec(struct sk_buff > *skb, >int hlen, struct iovec *iov, int len) > { > __wsum csum; > - int chunk = skb->len - hlen; > + int full_chunk = skb->len - hlen; > + int chunk = min_t(int, full_chunk, len); > > - if (chunk > len) > - chunk = len; > - > - if (!chunk) > + if (!chunk) { > + if (__skb_checksum_complete(skb)) > + goto csum_error; > return 0; > + } > > /* Skip filled elements. >* Pretty silly, look at memcpy_toiovec, though 8) > @@ -833,6 +834,21 @@ int skb_copy_and_csum_datagram_iovec(struct sk_buff *skb, > if (skb_copy_and_csum_datagram(skb, hlen, iov->iov_base, > chunk, )) > goto fault; > + if (full_chunk > chunk) { > + if (chunk % 2) { > + __be16 odd = 0; > + > + if (skb_copy_bits(skb, hlen + chunk, > + (char *) + 1, 1)) > + goto fault; > + csum = add32_with_carry(odd, csum); > + csum = skb_checksum(skb, hlen + chunk + 1, > + full_chunk - chunk - 1, > + csum); > + } else > + csum = skb_checksum(skb, hlen + chunk, > + full_chunk - chunk, csum); > + } > if (csum_fold(csum)) > goto csum_error; > if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE)) > -- > 2.6.4 This adds quite a bit of complexity. I am considering a revert of my buggy patch, and use what Eric Dumazet suggested instead: https://patchwork.ozlabs.org/patch/543562/ What do you think? Eric, would you submit your patch formally? Thanks -- Sabrina -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH net-next 1/3] uapi: add MACsec bits
Signed-off-by: Sabrina DubrocaReviewed-by: Hannes Frederic Sowa --- include/uapi/linux/Kbuild | 1 + include/uapi/linux/if_ether.h | 1 + include/uapi/linux/if_link.h | 21 ++ include/uapi/linux/if_macsec.h | 160 + 4 files changed, 183 insertions(+) create mode 100644 include/uapi/linux/if_macsec.h diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild index c2e5d6cb34e3..3ee4211af77b 100644 --- a/include/uapi/linux/Kbuild +++ b/include/uapi/linux/Kbuild @@ -173,6 +173,7 @@ header-y += if_hippi.h header-y += if_infiniband.h header-y += if_link.h header-y += if_ltalk.h +header-y += if_macsec.h header-y += if_packet.h header-y += if_phonet.h header-y += if_plip.h diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h index ea9221b0331a..4a93051c578c 100644 --- a/include/uapi/linux/if_ether.h +++ b/include/uapi/linux/if_ether.h @@ -83,6 +83,7 @@ #define ETH_P_8021AD 0x88A8 /* 802.1ad Service VLAN */ #define ETH_P_802_EX1 0x88B5 /* 802.1 Local Experimental 1. */ #define ETH_P_TIPC 0x88CA /* TIPC */ +#define ETH_P_MACSEC 0x88E5 /* 802.1ae MACsec */ #define ETH_P_8021AH 0x88E7 /* 802.1ah Backbone Service Tag */ #define ETH_P_MVRP 0x88F5 /* 802.1Q MVRP */ #define ETH_P_1588 0x88F7 /* IEEE 1588 Timesync */ diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index a30b78090594..be0e6caf0bf0 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -401,6 +401,27 @@ enum { #define IFLA_VRF_MAX (__IFLA_VRF_MAX - 1) +/* MACSEC section */ +enum { + IFLA_MACSEC_UNSPEC, + IFLA_MACSEC_SCI, + IFLA_MACSEC_PORT, + IFLA_MACSEC_ICV_LEN, + IFLA_MACSEC_CIPHER_SUITE, + IFLA_MACSEC_WINDOW, + IFLA_MACSEC_ENCODING_SA, + IFLA_MACSEC_ENCRYPT, + IFLA_MACSEC_PROTECT, + IFLA_MACSEC_INC_SCI, + IFLA_MACSEC_ES, + IFLA_MACSEC_SCB, + IFLA_MACSEC_REPLAY_PROTECT, + IFLA_MACSEC_VALIDATION, + __IFLA_MACSEC_MAX, +}; + +#define IFLA_MACSEC_MAX (__IFLA_MACSEC_MAX - 1) + /* IPVLAN section */ enum { IFLA_IPVLAN_UNSPEC, diff --git a/include/uapi/linux/if_macsec.h b/include/uapi/linux/if_macsec.h new file mode 100644 index ..405cbf6a8c05 --- /dev/null +++ b/include/uapi/linux/if_macsec.h @@ -0,0 +1,160 @@ +/* + * include/uapi/linux/if_macsec.h - MACsec device + * + * Copyright (c) 2015 Sabrina Dubroca + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +#ifndef _UAPI_MACSEC_H +#define _UAPI_MACSEC_H + +#define MACSEC_GENL_NAME "macsec" +#define MACSEC_GENL_VERSION 1 + +#define MACSEC_MAX_KEY_LEN 128 + + +#define DEFAULT_CIPHER_NAME "GCM-AES-128" +#define DEFAULT_CIPHER_ID 0x008002000101ULL +#define DEFAULT_CIPHER_ALT 0x0080C2000101ULL + +#define MACSEC_MIN_ICV_LEN 8 +#define MACSEC_MAX_ICV_LEN 32 + +enum macsec_attrs { + MACSEC_ATTR_UNSPEC, + MACSEC_ATTR_IFINDEX, + MACSEC_ATTR_SCI, + MACSEC_ATTR_PORT, + MACSEC_ATTR_ENCODING_SA, + MACSEC_ATTR_PN, + MACSEC_ATTR_WINDOW, + MACSEC_ATTR_AN, + MACSEC_ATTR_KEYID, + MACSEC_ATTR_KEY, + MACSEC_ATTR_CIPHER_SUITE, + MACSEC_ATTR_ICV_LEN, + MACSEC_TXSA_LIST, + MACSEC_RXSC_LIST, + MACSEC_TXSC_STATS, + MACSEC_SECY_STATS, + MACSEC_ATTR_SC_ACTIVE, + MACSEC_ATTR_SA_ACTIVE, + MACSEC_ATTR_PROTECT, + MACSEC_ATTR_REPLAY, + MACSEC_ATTR_OPER, + MACSEC_ATTR_VALIDATE, + MACSEC_ATTR_ENCRYPT, + MACSEC_ATTR_INC_SCI, + MACSEC_ATTR_ES, + MACSEC_ATTR_SCB, + __MACSEC_ATTR_END, + NUM_MACSEC_ATTR = __MACSEC_ATTR_END, + MACSEC_ATTR_MAX = __MACSEC_ATTR_END - 1, +}; + +enum macsec_sa_list_attrs { + MACSEC_SA_LIST_UNSPEC, + MACSEC_SA, + __MACSEC_ATTR_SA_LIST_MAX, + MACSEC_ATTR_SA_LIST_MAX = __MACSEC_ATTR_SA_LIST_MAX - 1, +}; + +enum macsec_rxsc_list_attrs { + MACSEC_RXSC_LIST_UNSPEC, + MACSEC_RXSC, + __MACSEC_ATTR_RXSC_LIST_MAX, + MACSEC_ATTR_RXSC_LIST_MAX = __MACSEC_ATTR_RXSC_LIST_MAX - 1, +}; + +enum macsec_rxsc_attrs { + MACSEC_ATTR_SC_UNSPEC, + MACSEC_ATTR_SC_STATE, + MACSEC_ATTR_SC_SCI, + MACSEC_RXSA_LIST, + MACSEC_RXSC_STATS, + __MACSEC_ATTR_SC_MAX, + MACSEC_ATTR_SC_MAX = __MACSEC_ATTR_SC_MAX - 1, +}; + +enum macsec_sa_attrs { + MACSEC_ATTR_SA_UNSPEC, + MACSEC_ATTR_SA_STATE, + MACSEC_ATTR_SA_AN, + MACSEC_ATTR_SA_PN, + MACSEC_ATTR_SA_KEYID, +
[RFC PATCH net-next 3/3] macsec: introduce IEEE 802.1AE driver
This is an implementation of MACsec/IEEE 802.1AE. This driver provides authentication and encryption of traffic in a LAN, typically with GCM-AES-128, and optional replay protection. http://standards.ieee.org/getieee802/download/802.1AE-2006.pdf Signed-off-by: Sabrina DubrocaReviewed-by: Hannes Frederic Sowa --- drivers/net/Kconfig |7 + drivers/net/Makefile |1 + drivers/net/macsec.c | 2991 ++ 3 files changed, 2999 insertions(+) create mode 100644 drivers/net/macsec.c diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index f184fb5bd110..2a1ba62b7da2 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -193,6 +193,13 @@ config GENEVE To compile this driver as a module, choose M here: the module will be called geneve. +config MACSEC + tristate "IEEE 802.1AE MAC-level encryption (MACsec)" + select CRYPTO_AES + select CRYPTO_GCM + ---help--- + MACsec is an encryption standard for Ethernet. + config NETCONSOLE tristate "Network console logging support" ---help--- diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 900b0c5320bb..1aa7cb845663 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -10,6 +10,7 @@ obj-$(CONFIG_IPVLAN) += ipvlan/ obj-$(CONFIG_DUMMY) += dummy.o obj-$(CONFIG_EQUALIZER) += eql.o obj-$(CONFIG_IFB) += ifb.o +obj-$(CONFIG_MACSEC) += macsec.o obj-$(CONFIG_MACVLAN) += macvlan.o obj-$(CONFIG_MACVTAP) += macvtap.o obj-$(CONFIG_MII) += mii.o diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c new file mode 100644 index ..64975ef2c6af --- /dev/null +++ b/drivers/net/macsec.c @@ -0,0 +1,2991 @@ +/* + * drivers/net/macsec.c - MACsec device + * + * Copyright (c) 2015 Sabrina Dubroca + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +typedef u64 __bitwise sci_t; + +#define MACSEC_SCI_LEN 8 + +/* SecTAG length = macsec_eth_header without the optional SCI */ +#define MACSEC_TAG_LEN 6 + +struct macsec_eth_header { + struct ethhdr eth; + /* SecTAG */ + u8 tci_an; +#if defined(__LITTLE_ENDIAN_BITFIELD) + u8 short_length:6, + unused:2; +#elif defined(__BIG_ENDIAN_BITFIELD) + u8unused:2, + short_length:6; +#else +#error "Please fix " +#endif + __be32 packet_number; + u8 secure_channel_id[8]; /* optional */ +} __packed; + +#define MACSEC_TCI_VERSION 0x80 +#define MACSEC_TCI_ES 0x40 /* end station */ +#define MACSEC_TCI_SC 0x20 /* SCI present */ +#define MACSEC_TCI_SCB 0x10 /* epon */ +#define MACSEC_TCI_E 0x08 /* encryption */ +#define MACSEC_TCI_C 0x04 /* changed text */ +#define MACSEC_AN_MASK 0x03 /* association number */ +#define MACSEC_TCI_CONFID (MACSEC_TCI_E | MACSEC_TCI_C) + +#define MACSEC_SHORTLEN_THR 48 + +#define GCM_AES_IV_LEN 12 +#define DEFAULT_ICV_LEN 16 + +#define for_each_rxsc(secy, sc)\ + for (sc = rcu_dereference(secy->rx_sc); \ +sc;\ +sc = rcu_dereference(sc->next)) +#define for_each_rxsc_rtnl(secy, sc) \ + for (sc = rtnl_dereference(secy->rx_sc);\ +sc;\ +sc = rtnl_dereference(sc->next)) + +struct gcm_iv { + union { + u8 secure_channel_id[8]; + sci_t sci; + }; + __be32 pn; +}; + +/** + * struct macsec_key - SA key + * @id user-provided key identifier + * @tfm crypto struct, key storage + */ +struct macsec_key { + u64 id; + struct crypto_aead *tfm; +}; + +/** + * struct macsec_rx_sa - receive secure association + * @active + * @next_pn packet number expected for the next packet + * @lock protects next_pn manipulations + * @key key structure + * @stats per-SA stats + */ +struct macsec_rx_sa { + bool active; + u32 next_pn; + spinlock_t lock; + struct macsec_key key; + struct macsec_rx_sa_stats __percpu *stats; + atomic_t refcnt; + struct rcu_head rcu; +}; + +struct pcpu_rx_sc_stats { + struct macsec_rx_sc_stats stats; + struct u64_stats_sync syncp; +}; + +/** + * struct macsec_rx_sc - receive secure channel + * @sci secure channel identifier for this SC + * @active channel is active + * @sa array of secure associations + * @stats per-SC stats + */ +struct macsec_rx_sc { + struct macsec_rx_sc __rcu *next; + sci_t sci; + bool active; + struct macsec_rx_sa __rcu *sa[4]; + struct
[RFC PATCH net-next 2/3] net: add MACsec netdevice priv_flags and helper
Signed-off-by: Sabrina DubrocaReviewed-by: Hannes Frederic Sowa --- include/linux/netdevice.h | 7 +++ 1 file changed, 7 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index c20b814e46a0..4f80f621ecf9 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1319,6 +1319,7 @@ enum netdev_priv_flags { IFF_OPENVSWITCH = 1<<22, IFF_L3MDEV_SLAVE= 1<<23, IFF_TEAM= 1<<24, + IFF_MACSEC = 1<<25, }; #define IFF_802_1Q_VLANIFF_802_1Q_VLAN @@ -1346,6 +1347,7 @@ enum netdev_priv_flags { #define IFF_OPENVSWITCHIFF_OPENVSWITCH #define IFF_L3MDEV_SLAVE IFF_L3MDEV_SLAVE #define IFF_TEAM IFF_TEAM +#define IFF_MACSEC IFF_MACSEC /** * struct net_device - The DEVICE structure. @@ -3964,6 +3966,11 @@ static inline void skb_gso_error_unwind(struct sk_buff *skb, __be16 protocol, skb->mac_len = mac_len; } +static inline bool netif_is_macsec(const struct net_device *dev) +{ + return dev->priv_flags & IFF_MACSEC; +} + static inline bool netif_is_macvlan(const struct net_device *dev) { return dev->priv_flags & IFF_MACVLAN; -- 2.6.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC PATCH net-next 0/3] MACsec IEEE 802.1AE implementation
MACsec (IEEE 802.1AE [0]) is a protocol that provides security for wired ethernet LANs. MACsec offers two protection modes: authentication only, or authenticated encryption. MACsec defines "secure channels" that allow transmission from one node to one or more others. Communication on a channel is done over a succession of "secure associations", that each use a specific key. Secure associations are identified by their "association number" in the range 0..3. A secure association is retired when its 32-bit packet number would wrap, and the same association number can later be reused with a new key and packet number. The standard mode of encryption is GCM AES with 128 bits keys, although an extension allows 256 bits keys [1] (not implemented in this submission). When using MACsec, an extra header, called "SecTAG", is added between the ethernet header and the original payload: +-+++ |(MACsec ethertype) | TCI_AN | SL | +-+++ | Packet Number | +---+ | Secure Channel Identifier | |(optional) | +---+ TCI_AN: version end_station sci_present scb encrypted changed_text association_number (2 bits) SL: short_length (6 bits) unused (2 bits) The ethertype for the packet is set to 0x88E5, and the original ethertype becomes part of the secure payload, which may be encrypted. The ethernet header and the SecTAG are always transmitted in the clear, but are integrity-protected. MACsec supports optional replay protection with a configurable replay window. MACsec is designed to be used with the MKA extension to 802.1X (MACsec Key Agreement protocol) [2], which provides channel attribution and key distribution to the nodes, but can also be used with static keys getting fed manually by an administrator. Optional (not supported yet) features: - confidentiality offset: in encryption mode, part of the payload may be left unencrypted. - choice of cipher suite: GCM AES with 256 bits has been standardised [1]. Implementation A netdevice is created on top of a real device for each TX secure channel, like we do for VLANs. Multiple TX channels can be created on top of the same underlying device. Several other approaches were considered for the RX path: - dev_add_pack: doesn't work, because we want to filter out unprotected packets - transparent mode: MACsec would be enabled directly on the real netdevice. For this, we cannot use a rx_handler directly because MACsec must be available for underlying devices enslaved in a bridge or in a bond, so we need a hook directly in __netif_receive_skb_core. This approach makes it harder to filter non-encrypted packets on RX without forcing the user to setup some rules, so the "transparent" mode is not so transparent after all. It also makes TX more complex than with a dedicated netdevice. One issue with the proposed implementation is that the qdisc layer for the real device operates on already encrypted packets. Netlink API This is currently a mix of rtnetlink (to create the device and set up the TX channel) and genl (for RX channels, secure associations and their keys). genl provides clean demultiplexing of the {TX,RX}{SC,SA} commands. Use cases The normal use case is wired LANs, including veth and slave devices for bonding/teaming or bridges. MACsec can also be used on any device that makes a full ethernet header visible, for example VXLAN. The VXLAN+MACsec setup would be: hypervisor| virtual machine --|-- And the packets would look like this: | eth | IP | UDP | VXLAN | eth | MACsec | IP | ... | MACsec ICV | One benefit on this approach to encryption in the cloud is that the payload is encrypted by the tenant, not by the tunnel provider, thus the tenant has full control over the keys. Future plans: - offload to hardware, on nics that support it (at least some ixgbe) - implement optional features The corresponding iproute code will be posted later. [0] http://standards.ieee.org/getieee802/download/802.1AE-2006.pdf [1] http://standards.ieee.org/getieee802/download/802.1AEbn-2011.pdf [2] http://standards.ieee.org/getieee802/download/802.1X-2010.pdf Sabrina Dubroca (3): uapi: add MACsec bits net: add MACsec netdevice priv_flags and helper macsec: introduce IEEE 802.1AE driver drivers/net/Kconfig|7 + drivers/net/Makefile |1 + drivers/net/macsec.c | 2991 include/linux/netdevice.h |7 + include/uapi/linux/Kbuild |1 +
Re: [PATCH v1 3/3] bpf: hash: use per-bucket spinlock
On 12/28/2015 01:55 PM, Ming Lei wrote: Both htab_map_update_elem() and htab_map_delete_elem() can be called from eBPF program, and they may be in kernel hot path, so it isn't efficient to use a per-hashtable lock in this two helpers. The per-hashtable spinlock is used for protecting bucket's hlist, and per-bucket lock is just enough. This patch converts the per-hashtable lock into per-bucket spinlock, so that contention can be decreased a lot. Signed-off-by: Ming Lei--- kernel/bpf/hashtab.c | 46 ++ 1 file changed, 30 insertions(+), 16 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index d857fcb..67222a9 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -14,10 +14,14 @@ #include #include +struct bucket { + struct hlist_head head; + raw_spinlock_t lock; +}; + struct bpf_htab { struct bpf_map map; - struct hlist_head *buckets; - raw_spinlock_t lock; + struct bucket *buckets; atomic_t count; /* number of elements in this hashtable */ u32 n_buckets; /* number of hash buckets */ u32 elem_size; /* size of each element in bytes */ @@ -88,24 +92,25 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr) /* make sure page count doesn't overflow */ goto free_htab; When adapting memory accounting and allocation sizes below where you replace sizeof(struct hlist_head) with sizeof(struct bucket), is there a reason why you don't update the overflow checks along with it? [...] /* prevent zero size kmalloc and check for u32 overflow */ if (htab->n_buckets == 0 || htab->n_buckets > U32_MAX / sizeof(struct hlist_head)) goto free_htab; if ((u64) htab->n_buckets * sizeof(struct hlist_head) + (u64) htab->elem_size * htab->map.max_entries >= U32_MAX - PAGE_SIZE) /* make sure page count doesn't overflow */ goto free_htab; [...] - htab->map.pages = round_up(htab->n_buckets * sizeof(struct hlist_head) + + htab->map.pages = round_up(htab->n_buckets * sizeof(struct bucket) + htab->elem_size * htab->map.max_entries, PAGE_SIZE) >> PAGE_SHIFT; err = -ENOMEM; - htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct hlist_head), + htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct bucket), GFP_USER | __GFP_NOWARN); if (!htab->buckets) { - htab->buckets = vmalloc(htab->n_buckets * sizeof(struct hlist_head)); + htab->buckets = vmalloc(htab->n_buckets * sizeof(struct bucket)); if (!htab->buckets) goto free_htab; } [...] Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v1 0/3] bpf: hash: use per-bucket spinlock
Hi, This patchset tries to optimize ebpf hash map, and follows the idea: Both htab_map_update_elem() and htab_map_delete_elem() can be called from eBPF program, and they may be in kernel hot path, it isn't efficient to use a per-hashtable lock in this two helpers, so this patch converts the lock into per-bucket spinlock. With this patchset, looks the performance penalty from eBPF decreased a lot, see the following test: 1) run 'tools/biolatency' of bcc before running block test; 2) run fio to test block throught over /dev/nullb0, (randread, 16jobs, libaio, 4k bs) and the test box is one 24cores(dual sockets) VM server: - without patchset: 607K IOPS - with this patchset: 1184K IOPS - without running eBPF prog: 1492K IOPS TODO: - remove the per-hashtable atomic counter V1: - fix the wrong 3/3 patch kernel/bpf/hashtab.c | 60 +++- 1 file changed, 36 insertions(+), 24 deletions(-) Thanks, -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v1 2/3] bpf: hash: move select_bucket() out of htab's spinlock
The spinlock is just used for protecting the per-bucket hlist, so it isn't needed for selecting bucket. Acked-by: Daniel BorkmannSigned-off-by: Ming Lei --- kernel/bpf/hashtab.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 2615388..d857fcb 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -248,12 +248,11 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value, memcpy(l_new->key + round_up(key_size, 8), value, map->value_size); l_new->hash = htab_map_hash(l_new->key, key_size); + head = select_bucket(htab, l_new->hash); /* bpf_map_update_elem() can be called in_irq() */ raw_spin_lock_irqsave(>lock, flags); - head = select_bucket(htab, l_new->hash); - l_old = lookup_elem_raw(head, l_new->hash, key, key_size); if (!l_old && unlikely(atomic_read(>count) >= map->max_entries)) { @@ -310,11 +309,10 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key) key_size = map->key_size; hash = htab_map_hash(key, key_size); + head = select_bucket(htab, hash); raw_spin_lock_irqsave(>lock, flags); - head = select_bucket(htab, hash); - l = lookup_elem_raw(head, hash, key, key_size); if (l) { -- 1.9.1 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] /drivers/net/wireless/ath/ath9k remove unnecessary ?: operator
((thermometer < 0) ? 0 : (thermometer == X)) and (thermometer == X) are equal for X >= 0. Signed-off-by: Ivan Safonov--- drivers/net/wireless/ath/ath9k/ar9003_eeprom.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/ath/ath9k/ar9003_eeprom.c b/drivers/net/wireless/ath/ath9k/ar9003_eeprom.c index 8b4561e..35e57f7 100644 --- a/drivers/net/wireless/ath/ath9k/ar9003_eeprom.c +++ b/drivers/net/wireless/ath/ath9k/ar9003_eeprom.c @@ -4097,16 +4097,16 @@ static void ar9003_hw_thermometer_apply(struct ath_hw *ah) REG_RMW_FIELD(ah, AR_PHY_65NM_CH2_RXTX4, AR_PHY_65NM_CH0_RXTX4_THERM_ON_OVR, therm_on); - therm_on = (thermometer < 0) ? 0 : (thermometer == 0); + therm_on = thermometer == 0; REG_RMW_FIELD(ah, AR_PHY_65NM_CH0_RXTX4, AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on); if (pCap->chip_chainmask & BIT(1)) { - therm_on = (thermometer < 0) ? 0 : (thermometer == 1); + therm_on = thermometer == 1; REG_RMW_FIELD(ah, AR_PHY_65NM_CH1_RXTX4, AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on); } if (pCap->chip_chainmask & BIT(2)) { - therm_on = (thermometer < 0) ? 0 : (thermometer == 2); + therm_on = thermometer == 2; REG_RMW_FIELD(ah, AR_PHY_65NM_CH2_RXTX4, AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on); } -- 2.4.10 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 net-next 0/3] Ethtool support for phy stats
On December 28, 2015 12:58:25 AM PST, Andrew Lunnwrote: >On Mon, Dec 28, 2015 at 12:44:27AM -0500, David Miller wrote: >> From: David Miller >> Date: Mon, 28 Dec 2015 00:32:43 -0500 (EST) >> >> > From: Andrew Lunn >> > Date: Sun, 27 Dec 2015 12:58:25 +0100 >> > >> >> This patchset add ethtool support for reading statistics from the >PHY. >> >> The Marvell and Micrel Phys are then extended to report receiver >> >> packet errors and idle errors. >> >> >> >> v2: >> >> Fix linking when phylib is not enabled. >> > >> > Series applied, thanks Andrew. >> >> I have to revert this. >> >> You can't call into PHY library code from net/core/ethtool.c, >> because PHY lib can be modular. > >O.K. > >How about we change PHYLIB from a tristate to a bool? Could we make this in a two layer design instead? Core ethtool changes call into getter functions which are implemented by the netdev driver, if the netdev driver happens to use PHYLIB, calls into the PHYLIB helpers with its phydev? That would also allow a network driver already supporting PHY stats to report them through your new interface. -- Florian -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH stable-3.2 stable-3.12] net: fix checksum check in skb_copy_and_csum_datagram_iovec()
On Mon, Dec 28, 2015 at 03:29:42PM +0100, Sabrina Dubroca wrote: > 2015-12-28, 15:01:57 +0100, Michal Kubecek wrote: > > Recent fix "net: add length argument to > > skb_copy_and_csum_datagram_iovec" added to some pre-3.19 stable > > branches, namely > > > > stable-3.2.y: commit 127500d724f8 > > stable-3.12.y: commit 3e1ac3aafbd0 > > > > doesn't handle truncated reads correctly. If read length is shorter than > > incoming datagram (but non-zero) and first segment of target iovec is > > sufficient for read length, skb_copy_and_csum_datagram() is used to copy > > checksum the data while copying it. For truncated reads this means only > > the copied part is checksummed (rather than the whole datagram) so that > > the check almost always fails. > > I just ran into this issue too, sorry I didn't notice it earlier :( > > > Add checksum of the remaining part so that the proper checksum of the > > whole datagram is computed and checked. Special care must be taken if > > the copied length is odd. > > > > For zero read length, we don't have to copy anything but we still should > > check the checksum so that a peek doesn't return with a datagram which > > is invalid and wouldn't be returned by an actual read. > > > > Signed-off-by: Michal Kubecek> > --- > > net/core/datagram.c | 26 +- > > 1 file changed, 21 insertions(+), 5 deletions(-) > > > > diff --git a/net/core/datagram.c b/net/core/datagram.c > > index f22f120771ef..af4bf368257c 100644 > > --- a/net/core/datagram.c > > +++ b/net/core/datagram.c > > @@ -809,13 +809,14 @@ int skb_copy_and_csum_datagram_iovec(struct sk_buff > > *skb, > > int hlen, struct iovec *iov, int len) > > { > > __wsum csum; > > - int chunk = skb->len - hlen; > > + int full_chunk = skb->len - hlen; > > + int chunk = min_t(int, full_chunk, len); > > > > - if (chunk > len) > > - chunk = len; > > - > > - if (!chunk) > > + if (!chunk) { > > + if (__skb_checksum_complete(skb)) > > + goto csum_error; > > return 0; > > + } > > > > /* Skip filled elements. > > * Pretty silly, look at memcpy_toiovec, though 8) > > @@ -833,6 +834,21 @@ int skb_copy_and_csum_datagram_iovec(struct sk_buff > > *skb, > > if (skb_copy_and_csum_datagram(skb, hlen, iov->iov_base, > >chunk, )) > > goto fault; > > + if (full_chunk > chunk) { > > + if (chunk % 2) { > > + __be16 odd = 0; > > + > > + if (skb_copy_bits(skb, hlen + chunk, > > + (char *) + 1, 1)) > > + goto fault; > > + csum = add32_with_carry(odd, csum); > > + csum = skb_checksum(skb, hlen + chunk + 1, > > + full_chunk - chunk - 1, > > + csum); > > + } else > > + csum = skb_checksum(skb, hlen + chunk, > > + full_chunk - chunk, csum); > > + } > > if (csum_fold(csum)) > > goto csum_error; > > if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE)) > > -- > > 2.6.4 > > > This adds quite a bit of complexity. I'm not really happy about it either. :-( Most of the complexity comes from the corner case when one 16-bit word is divided between the copied part and the rest - but I can't see a nicer way to handle it. There is another option: in the case of truncated read, we could simply take the first branch where copying is separated from checksumming. This would be less efficient, obviously, but I must admit I have no idea how much. > I am considering a revert of my > buggy patch, and use what Eric Dumazet suggested instead: > > https://patchwork.ozlabs.org/patch/543562/ > > What do you think? I believe that would work. I have a little bad feeling about such solution as it would keep the function broken and just rely on not hitting it in the case when it matters. But it worked that way for quite some time so it's probably safe. Michal Kubecek -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] /drivers/net/wireless/ath/ath9k remove unnecessary ?: operator
On Mon, 2015-12-28 at 20:48 +0700, Ivan Safonov wrote: > ((thermometer < 0) ? 0 : (thermometer == X)) and (thermometer == X) are equal > for X >= 0. X is not guaranteed to be >= 0 here > Signed-off-by: Ivan Safonov> --- > drivers/net/wireless/ath/ath9k/ar9003_eeprom.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/wireless/ath/ath9k/ar9003_eeprom.c > b/drivers/net/wireless/ath/ath9k/ar9003_eeprom.c [] > @@ -4097,16 +4097,16 @@ static void ar9003_hw_thermometer_apply(struct ath_hw > *ah) > REG_RMW_FIELD(ah, AR_PHY_65NM_CH2_RXTX4, > AR_PHY_65NM_CH0_RXTX4_THERM_ON_OVR, therm_on); > > - therm_on = (thermometer < 0) ? 0 : (thermometer == 0); > + therm_on = thermometer == 0; This code is not equivalent. Check what happens when thermometer is -1. > REG_RMW_FIELD(ah, AR_PHY_65NM_CH0_RXTX4, > AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on); > if (pCap->chip_chainmask & BIT(1)) { > - therm_on = (thermometer < 0) ? 0 : (thermometer == 1); > + therm_on = thermometer == 1; > REG_RMW_FIELD(ah, AR_PHY_65NM_CH1_RXTX4, > AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on); > } > if (pCap->chip_chainmask & BIT(2)) { > - therm_on = (thermometer < 0) ? 0 : (thermometer == 2); > + therm_on = thermometer == 2; > REG_RMW_FIELD(ah, AR_PHY_65NM_CH2_RXTX4, > AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on); > } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] batman-adv: use to_delayed_work
Use to_delayed_work() instead of open-coding it. Signed-off-by: Geliang Tang--- net/batman-adv/bridge_loop_avoidance.c | 2 +- net/batman-adv/distributed-arp-table.c | 2 +- net/batman-adv/network-coding.c| 2 +- net/batman-adv/originator.c| 2 +- net/batman-adv/send.c | 4 ++-- net/batman-adv/translation-table.c | 2 +- 6 files changed, 7 insertions(+), 7 deletions(-) diff --git a/net/batman-adv/bridge_loop_avoidance.c b/net/batman-adv/bridge_loop_avoidance.c index 99dcae3..4f47c82 100644 --- a/net/batman-adv/bridge_loop_avoidance.c +++ b/net/batman-adv/bridge_loop_avoidance.c @@ -1183,7 +1183,7 @@ static void batadv_bla_periodic_work(struct work_struct *work) struct batadv_hard_iface *primary_if; int i; - delayed_work = container_of(work, struct delayed_work, work); + delayed_work = to_delayed_work(work); priv_bla = container_of(delayed_work, struct batadv_priv_bla, work); bat_priv = container_of(priv_bla, struct batadv_priv, bla); primary_if = batadv_primary_if_get_selected(bat_priv); diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c index a49c705..64facfc 100644 --- a/net/batman-adv/distributed-arp-table.c +++ b/net/batman-adv/distributed-arp-table.c @@ -138,7 +138,7 @@ static void batadv_dat_purge(struct work_struct *work) struct batadv_priv_dat *priv_dat; struct batadv_priv *bat_priv; - delayed_work = container_of(work, struct delayed_work, work); + delayed_work = to_delayed_work(work); priv_dat = container_of(delayed_work, struct batadv_priv_dat, work); bat_priv = container_of(priv_dat, struct batadv_priv, dat); diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c index f5276be..a5ffe20 100644 --- a/net/batman-adv/network-coding.c +++ b/net/batman-adv/network-coding.c @@ -700,7 +700,7 @@ static void batadv_nc_worker(struct work_struct *work) struct batadv_priv *bat_priv; unsigned long timeout; - delayed_work = container_of(work, struct delayed_work, work); + delayed_work = to_delayed_work(work); priv_nc = container_of(delayed_work, struct batadv_priv_nc, work); bat_priv = container_of(priv_nc, struct batadv_priv, nc); diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c index 3c782a33..0eb5d01 100644 --- a/net/batman-adv/originator.c +++ b/net/batman-adv/originator.c @@ -1232,7 +1232,7 @@ static void batadv_purge_orig(struct work_struct *work) struct delayed_work *delayed_work; struct batadv_priv *bat_priv; - delayed_work = container_of(work, struct delayed_work, work); + delayed_work = to_delayed_work(work); bat_priv = container_of(delayed_work, struct batadv_priv, orig_work); _batadv_purge_orig(bat_priv); queue_delayed_work(batadv_event_workqueue, diff --git a/net/batman-adv/send.c b/net/batman-adv/send.c index f664324..848ce65 100644 --- a/net/batman-adv/send.c +++ b/net/batman-adv/send.c @@ -506,7 +506,7 @@ static void batadv_send_outstanding_bcast_packet(struct work_struct *work) struct net_device *soft_iface; struct batadv_priv *bat_priv; - delayed_work = container_of(work, struct delayed_work, work); + delayed_work = to_delayed_work(work); forw_packet = container_of(delayed_work, struct batadv_forw_packet, delayed_work); soft_iface = forw_packet->if_incoming->soft_iface; @@ -559,7 +559,7 @@ void batadv_send_outstanding_bat_ogm_packet(struct work_struct *work) struct batadv_forw_packet *forw_packet; struct batadv_priv *bat_priv; - delayed_work = container_of(work, struct delayed_work, work); + delayed_work = to_delayed_work(work); forw_packet = container_of(delayed_work, struct batadv_forw_packet, delayed_work); bat_priv = netdev_priv(forw_packet->if_incoming->soft_iface); diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c index ec67def..ed6fcc7 100644 --- a/net/batman-adv/translation-table.c +++ b/net/batman-adv/translation-table.c @@ -3157,7 +3157,7 @@ static void batadv_tt_purge(struct work_struct *work) struct batadv_priv_tt *priv_tt; struct batadv_priv *bat_priv; - delayed_work = container_of(work, struct delayed_work, work); + delayed_work = to_delayed_work(work); priv_tt = container_of(delayed_work, struct batadv_priv_tt, work); bat_priv = container_of(priv_tt, struct batadv_priv, tt); -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH stable-3.2 stable-3.12] net: fix checksum check in skb_copy_and_csum_datagram_iovec()
On Mon, 2015-12-28 at 16:38 +0100, Michal Kubecek wrote: > On Mon, Dec 28, 2015 at 03:29:42PM +0100, Sabrina Dubroca wrote: > > 2015-12-28, 15:01:57 +0100, Michal Kubecek wrote: > > > Recent fix "net: add length argument to > > > skb_copy_and_csum_datagram_iovec" added to some pre-3.19 stable > > > branches, namely > > > > > > stable-3.2.y: commit 127500d724f8 > > > stable-3.12.y: commit 3e1ac3aafbd0 > > > > > > doesn't handle truncated reads correctly. If read length is shorter than > > > incoming datagram (but non-zero) and first segment of target iovec is > > > sufficient for read length, skb_copy_and_csum_datagram() is used to copy > > > checksum the data while copying it. For truncated reads this means only > > > the copied part is checksummed (rather than the whole datagram) so that > > > the check almost always fails. > > > > I just ran into this issue too, sorry I didn't notice it earlier :( > > > > > Add checksum of the remaining part so that the proper checksum of the > > > whole datagram is computed and checked. Special care must be taken if > > > the copied length is odd. > > > > > > For zero read length, we don't have to copy anything but we still should > > > check the checksum so that a peek doesn't return with a datagram which > > > is invalid and wouldn't be returned by an actual read. > > > > > > Signed-off-by: Michal Kubecek> > > --- > > > net/core/datagram.c | 26 +- > > > 1 file changed, 21 insertions(+), 5 deletions(-) > > > > > > diff --git a/net/core/datagram.c b/net/core/datagram.c > > > index f22f120771ef..af4bf368257c 100644 > > > --- a/net/core/datagram.c > > > +++ b/net/core/datagram.c > > > @@ -809,13 +809,14 @@ int skb_copy_and_csum_datagram_iovec(struct sk_buff > > > *skb, > > >int hlen, struct iovec *iov, int len) > > > { > > > __wsum csum; > > > - int chunk = skb->len - hlen; > > > + int full_chunk = skb->len - hlen; > > > + int chunk = min_t(int, full_chunk, len); > > > > > > - if (chunk > len) > > > - chunk = len; > > > - > > > - if (!chunk) > > > + if (!chunk) { > > > + if (__skb_checksum_complete(skb)) > > > + goto csum_error; > > > return 0; > > > + } > > > > > > /* Skip filled elements. > > >* Pretty silly, look at memcpy_toiovec, though 8) > > > @@ -833,6 +834,21 @@ int skb_copy_and_csum_datagram_iovec(struct sk_buff > > > *skb, > > > if (skb_copy_and_csum_datagram(skb, hlen, iov->iov_base, > > > chunk, )) > > > goto fault; > > > + if (full_chunk > chunk) { > > > + if (chunk % 2) { > > > + __be16 odd = 0; > > > + > > > + if (skb_copy_bits(skb, hlen + chunk, > > > + (char *) + 1, 1)) > > > + goto fault; > > > + csum = add32_with_carry(odd, csum); > > > + csum = skb_checksum(skb, hlen + chunk + 1, > > > + full_chunk - chunk - 1, > > > + csum); > > > + } else > > > + csum = skb_checksum(skb, hlen + chunk, > > > + full_chunk - chunk, csum); > > > + } > > > if (csum_fold(csum)) > > > goto csum_error; > > > if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE)) > > > -- > > > 2.6.4 > > > > > > This adds quite a bit of complexity. > > I'm not really happy about it either. :-( Most of the complexity comes > from the corner case when one 16-bit word is divided between the copied > part and the rest - but I can't see a nicer way to handle it. > > There is another option: in the case of truncated read, we could simply > take the first branch where copying is separated from checksumming. This > would be less efficient, obviously, but I must admit I have no idea how > much. > > > I am considering a revert of my > > buggy patch, and use what Eric Dumazet suggested instead: > > > > https://patchwork.ozlabs.org/patch/543562/ > > > > What do you think? > > I believe that would work. I have a little bad feeling about such > solution as it would keep the function broken and just rely on not > hitting it in the case when it matters. But it worked that way for quite > some time so it's probably safe. For the record, this is what we are using here at Google on our prod kernel. Sabrina, I certainly can send the patch for net-next kernel, as this does not fix a bug for current kernels, but backporting it would be indeed a way to fix the issue for old kernels. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net, socket, socket_wq: fix missing initialization of flags
On Sun, 2015-12-27 at 21:00 +0100, Nicolai Stange wrote: > Fixes: ceb5d58b2170 ("net: fix sock_wake_async() rcu protection") > > Commit ceb5d58b2170 ("net: fix sock_wake_async() rcu protection") from > the current 4.4 release cycle introduces a new flags member in > struct socket_wq and moved SOCKWQ_ASYNC_NOSPACE and SOCKWQ_ASYNC_WAITDATA > from struct socket's flags member into that new place. > > Unfortunately, the new flags field is never initialized properly, at least > not for the struct socket_wq instance created in sock_alloc_inode(). > > One particular issue I encountered because of this is that my GNU Emacs > failed to draw anything on my desktop -- i.e. what I got is a transparent > window, including the title bar. Bisection lead to the commit mentioned > above and further investigation by means of strace told me that Emacs > is indeed speaking to my Xorg through an O_ASYNC AF_UNIX socket. This is > reproducible 100% of times and the fact that properly initializing the > struct socket_wq ->flags fixes the issue leads me to the conclusion that > somehow SOCKWQ_ASYNC_NOSPACE got set in the uninitialized ->flags, > preventing my Emacs from receiving any SIGIO's due to send space becoming > available again and it got stuck. > > Make sock_alloc_inode() set the newly created struct socket_wq's ->flags > member to zero. > > Signed-off-by: Nicolai Stange> --- > net/socket.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/net/socket.c b/net/socket.c > index 29822d6..d730ef9 100644 > --- a/net/socket.c > +++ b/net/socket.c > @@ -257,6 +257,7 @@ static struct inode *sock_alloc_inode(struct super_block > *sb) > } > init_waitqueue_head(>wait); > wq->fasync_list = NULL; > + wq->flags = 0; > RCU_INIT_POINTER(ei->socket.wq, wq); > > ei->socket.state = SS_UNCONNECTED; Thanks a lot Nicolai for finding this. I completely overlooked this initial value. I checked other places where 'struct socket_wq' were allocated and they look fine. Acked-by: Eric Dumazet -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 net-next 0/3] Ethtool support for phy stats
From: Andrew LunnDate: Mon, 28 Dec 2015 09:58:25 +0100 > On Mon, Dec 28, 2015 at 12:44:27AM -0500, David Miller wrote: >> From: David Miller >> Date: Mon, 28 Dec 2015 00:32:43 -0500 (EST) >> >> > From: Andrew Lunn >> > Date: Sun, 27 Dec 2015 12:58:25 +0100 >> > >> >> This patchset add ethtool support for reading statistics from the PHY. >> >> The Marvell and Micrel Phys are then extended to report receiver >> >> packet errors and idle errors. >> >> >> >> v2: >> >> Fix linking when phylib is not enabled. >> > >> > Series applied, thanks Andrew. >> >> I have to revert this. >> >> You can't call into PHY library code from net/core/ethtool.c, >> because PHY lib can be modular. > > O.K. > > How about we change PHYLIB from a tristate to a bool? Beforehand it could be modular just fine, and thus save space for people who do not use it. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
4.4-rc7 failure report
Hi Dave, The 4.4-rc7 kernel is failing for me. In my case, all of my vlan interfaces are failing to obtain a dhcp address using dhclient. I've tried a hand built 4.4-rc7, and the Fedora rawhide 4.4-rc7 kernel, both failed. I've tried NetworkManager and the old SysV network service, both fail. I tried a working dhclient from rhel7 on the Fedora rawhide install and it failed too. Running tcpdump on the interface shows the dhcp request going out, and a dhcp response coming back in. Running strace on dhclient shows that it writes the dhcp request, but it never recvs a dhcp response. If I manually bring the interface up with a static IP address then I'm able to run typical IP traffic across the link (aka, ping). It would seem that when dhclient registers a packet filter on the socket, that filter is preventing it from ever getting the dhcp response. The same dhclient works on any non-vlan interfaces in the system, so the filter must work for non-vlan interfaces. Aside from the fact that the interface is a vlan, we also use a priority egress map on the interface, and we use PFC flow control. Let me know if you need anymore to debug the issue, or email me off list and I can get you logins to my reproducer machines. -- Doug LedfordGPG KeyID: 0E572FDD signature.asc Description: OpenPGP digital signature
Re: [PATCH 2/4] wcn36xx: Change indication list lock to spinlock
On Sun, 27 Dec 2015 17:34:25 -0800 Bjorn Anderssonwrote: > In preparation for handling incoming messages from IRQ context, change > the indication list lock to a spinlock > > Signed-off-by: Bjorn Andersson > --- > drivers/net/wireless/ath/wcn36xx/smd.c | 12 ++-- > drivers/net/wireless/ath/wcn36xx/wcn36xx.h | 2 +- > 2 files changed, 7 insertions(+), 7 deletions(-) > > diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c > b/drivers/net/wireless/ath/wcn36xx/smd.c > index 6b5dbe6f0d0a..4307429740a9 100644 > --- a/drivers/net/wireless/ath/wcn36xx/smd.c > +++ b/drivers/net/wireless/ath/wcn36xx/smd.c > @@ -2165,10 +2165,10 @@ static void wcn36xx_smd_rsp_process(struct wcn36xx > *wcn, void *buf, size_t len) > msg_ind->msg_len = len; > memcpy(msg_ind->msg, buf, len); > > - mutex_lock(>hal_ind_mutex); > + spin_lock(>hal_ind_lock); If you are going to handle messages in IRQ context, that better be a spin_lock_irq() or spin_lock_bh(). -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] /drivers/net/wireless/ath/ath9k remove unnecessary ?: operator
On 12/29/2015 12:56 AM, Joe Perches wrote: On Mon, 2015-12-28 at 20:48 +0700, Ivan Safonov wrote: ((thermometer < 0) ? 0 : (thermometer == X)) and (thermometer == X) are equal for X >= 0. X is not guaranteed to be >= 0 here X is fixed constant. In this case X is {0, 1, 2}. @@ -4097,16 +4097,16 @@ static void ar9003_hw_thermometer_apply(struct ath_hw *ah) REG_RMW_FIELD(ah, AR_PHY_65NM_CH2_RXTX4, AR_PHY_65NM_CH0_RXTX4_THERM_ON_OVR, therm_on); - therm_on = (thermometer < 0) ? 0 : (thermometer == 0); + therm_on = thermometer == 0; This code is not equivalent. Check what happens when thermometer is -1. therm_on = (thermometer < 0) ? 0 : (thermometer == 0) => therm_on = (true) ? 0 : (thermometer == 0) => therm_on is 0 therm_on = thermometer == 0 => therm_on = false false is equal to 0 Value of the thermometer variable isanerror code, or athermometercode. The thermometercode is never equal to the error code (thermometercode >= 0, error code <0). -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 net-next 0/3] Ethtool support for phy stats
From: Florian FainelliDate: Mon, 28 Dec 2015 09:27:25 -0800 > On December 28, 2015 12:58:25 AM PST, Andrew Lunn wrote: >>On Mon, Dec 28, 2015 at 12:44:27AM -0500, David Miller wrote: >>> From: David Miller >>> Date: Mon, 28 Dec 2015 00:32:43 -0500 (EST) >>> >>> > From: Andrew Lunn >>> > Date: Sun, 27 Dec 2015 12:58:25 +0100 >>> > >>> >> This patchset add ethtool support for reading statistics from the >>PHY. >>> >> The Marvell and Micrel Phys are then extended to report receiver >>> >> packet errors and idle errors. >>> >> >>> >> v2: >>> >> Fix linking when phylib is not enabled. >>> > >>> > Series applied, thanks Andrew. >>> >>> I have to revert this. >>> >>> You can't call into PHY library code from net/core/ethtool.c, >>> because PHY lib can be modular. >> >>O.K. >> >>How about we change PHYLIB from a tristate to a bool? > > Could we make this in a two layer design instead? Core ethtool changes call > into getter functions which are implemented by the netdev driver, if the > netdev driver happens to use PHYLIB, calls into the PHYLIB helpers with its > phydev? That would also allow a network driver already supporting PHY stats > to report them through your new interface. Or just put the helpers in net/core/ethtool.c -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
nf_unregister_net_hook: hook not found!
Hi, Running a 4.4.0-rc6 kernel i encountered the warning below. -- Sander [ 13.740472] ip_tables: (C) 2000-2006 Netfilter Core Team [ 13.936237] iwlwifi :03:00.0: L1 Enabled - LTR Disabled [ 13.945391] iwlwifi :03:00.0: L1 Enabled - LTR Disabled [ 13.947434] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0 [ 14.223990] iwlwifi :03:00.0: L1 Enabled - LTR Disabled [ 14.232065] iwlwifi :03:00.0: L1 Enabled - LTR Disabled [ 14.233570] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0 [ 14.328141] systemd-logind[2485]: Failed to start user service: Unknown unit: user@117.service [ 14.356634] systemd-logind[2485]: New session c1 of user lightdm. [ 14.357320] [ cut here ] [ 14.357327] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143 netfilter_net_exit+0x25/0x50() [ 14.357328] nf_unregister_net_hook: hook not found! [ 14.357371] Modules linked in: iptable_security(+) iptable_raw iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc uvcvideo videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal intel_powerclamp btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl kvm_intel v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel media kvm snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi snd_hda_intel pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep snd_hda_core rfkill i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev snd soundcore shpchp tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul crc32_pclmul crc32c_intel aesni_intel [ 14.357380] ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd e1000e lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore usb_common pps_core [ 14.357383] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U 4.4.0-rc6-x220-20151224+ #1 [ 14.357384] Hardware name: LENOVO 42912ZU/42912ZU, BIOS 8DET69WW (1.39 ) 07/18/2013 [ 14.357390] Workqueue: netns cleanup_net [ 14.357393] 81a27dfd 81359c69 88030e7cbd40 81060297 [ 14.357395] 88030e820d80 88030e7cbd90 81c962d8 81c962e0 [ 14.357397] 88030e7cbdf8 81060317 81a2c010 88030018 [ 14.357398] Call Trace: [ 14.357405] [] ? dump_stack+0x40/0x57 [ 14.357408] [] ? warn_slowpath_common+0x77/0xb0 [ 14.357410] [] ? warn_slowpath_fmt+0x47/0x50 [ 14.357416] [] ? mutex_lock+0x9/0x30 [ 14.357418] [] ? netfilter_net_exit+0x25/0x50 [ 14.357421] [] ? ops_exit_list.isra.6+0x2e/0x60 [ 14.357424] [] ? cleanup_net+0x1ab/0x280 [ 14.357427] [] ? process_one_work+0x133/0x330 [ 14.357429] [] ? worker_thread+0x60/0x470 [ 14.357430] [] ? process_one_work+0x330/0x330 [ 14.357434] [] ? kthread+0xca/0xe0 [ 14.357436] [] ? kthread_create_on_node+0x170/0x170 [ 14.357439] [] ? ret_from_fork+0x3f/0x70 [ 14.357441] [] ? kthread_create_on_node+0x170/0x170 [ 14.357443] ---[ end trace 9984cc4b0e89f818 ]--- [ 14.357443] [ cut here ] [ 14.357446] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143 netfilter_net_exit+0x25/0x50() [ 14.357446] nf_unregister_net_hook: hook not found! [ 14.357472] Modules linked in: iptable_security(+) iptable_raw iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc uvcvideo videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal intel_powerclamp btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl kvm_intel v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel media kvm snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi snd_hda_intel pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep snd_hda_core rfkill i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev snd soundcore shpchp tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul crc32_pclmul crc32c_intel aesni_intel [ 14.357478] ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd e1000e lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore usb_common pps_core [ 14.357480] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U W 4.4.0-rc6-x220-20151224+ #1 [ 14.357481] Hardware name: LENOVO 42912ZU/42912ZU, BIOS 8DET69WW (1.39 ) 07/18/2013 [ 14.357484] Workqueue: netns cleanup_net [ 14.357486] 81a27dfd 81359c69 88030e7cbd40 81060297 [ 14.357488] 88030e820db8 88030e7cbd90 81c962d8 81c962e0 [ 14.357489] 88030e7cbdf8 81060317 81a2c010 88030018 [ 14.357490] Call Trace: [ 14.357493] [] ? dump_stack+0x40/0x57 [ 14.357495] [] ? warn_slowpath_common+0x77/0xb0 [ 14.357497] [] ? warn_slowpath_fmt+0x47/0x50 [ 14.357499] [] ?
Re: [PATCHv2 net-next 0/3] Ethtool support for phy stats
On December 28, 2015 12:02:13 PM PST, David Millerwrote: >From: Florian Fainelli >Date: Mon, 28 Dec 2015 09:27:25 -0800 > >> On December 28, 2015 12:58:25 AM PST, Andrew Lunn >wrote: >>>On Mon, Dec 28, 2015 at 12:44:27AM -0500, David Miller wrote: From: David Miller Date: Mon, 28 Dec 2015 00:32:43 -0500 (EST) > From: Andrew Lunn > Date: Sun, 27 Dec 2015 12:58:25 +0100 > >> This patchset add ethtool support for reading statistics from >the >>>PHY. >> The Marvell and Micrel Phys are then extended to report receiver >> packet errors and idle errors. >> >> v2: >> Fix linking when phylib is not enabled. > > Series applied, thanks Andrew. I have to revert this. You can't call into PHY library code from net/core/ethtool.c, because PHY lib can be modular. >>> >>>O.K. >>> >>>How about we change PHYLIB from a tristate to a bool? >> >> Could we make this in a two layer design instead? Core ethtool >changes call into getter functions which are implemented by the netdev >driver, if the netdev driver happens to use PHYLIB, calls into the >PHYLIB helpers with its phydev? That would also allow a network driver >already supporting PHY stats to report them through your new interface. > >Or just put the helpers in net/core/ethtool.c That would work, Intel drivers could benefit from that interface if it is not too phydev centric. We would have to know if the netdev uses phylib, but that's already the case for things like WoL and MII ioctl. -- Florian -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] net: phy: adds backplane driver for Freescale's PCS PHY
Le 18/12/2015 01:30, shh@gmail.com a écrit : [snip] > +struct training_state_machine { > + bool bin_m1_late_early; > + bool bin_long_late_early; > + bool bin_m1_stop; > + bool bin_long_stop; > + bool tx_complete; > + bool an_ok; > + bool link_up; > + bool running; > + bool sent_init; > + int m1_min_max_cnt; > + int long_min_max_cnt; Can you change this into a succession of enum values to make it clear how many states there are, and how to get from one to the other? [snip] > + > +static void init_state_machine(struct training_state_machine *s_m) > +{ > + s_m->bin_m1_late_early = true; > + s_m->bin_long_late_early = false; > + s_m->bin_m1_stop = false; > + s_m->bin_long_stop = false; > + s_m->tx_complete = false; > + s_m->an_ok = false; > + s_m->link_up = false; > + s_m->running = false; > + s_m->sent_init = false; > + s_m->m1_min_max_cnt = 0; > + s_m->long_min_max_cnt = 0; That alone is a good indication that your state machine is not readable. > +} > + > +void tune_tecr0(struct fsl_xgkr_inst *inst) > +{ > + struct per_lane_ctrl_status *reg_base; > + u32 val; > + > + reg_base = (struct per_lane_ctrl_status *)inst->reg_base; Typical practice is not to cast a register base address into a pointer to a structure, but instead use offsets to the base register address, which is less error prone and more readable, anything that uses inst->reg_base in this driver is modeled after a structure pattern, please fix that. > + > + val = TECR0_INIT | > + inst->adpt_eq << ZERO_COE_SHIFT | > + inst->ratio_preq << PRE_COE_SHIFT | > + inst->ratio_pst1q << POST_COE_SHIFT; > + > + /* reset the lane */ > + iowrite32be(ioread32be(_base->gcr0) & ~GCR0_RESET_MASK, > + _base->gcr0); > + udelay(1); > + iowrite32be(val, _base->tecr0); > + udelay(1); > + /* unreset the lane */ > + iowrite32be(ioread32be(_base->gcr0) | GCR0_RESET_MASK, > + _base->gcr0); > + udelay(1); Since this is a reset control register, you might want to explore using the reset controller subsystem in case this register serves multiple peripherals. [snip] I skipped through all the state machine logic, but you should consider trying to simplify it using different enum values and a switch() case() statements to make it easy to audit and understand. > + > +static int fsl_backplane_probe(struct phy_device *phydev) > +{ > + struct fsl_xgkr_inst *xgkr_inst; > + struct device_node *child, *parent, *lane_node; > + const char *lane_name; > + int len; > + int ret; > + u32 mode; > + u32 lane[2]; > + > + child = phydev->dev.of_node; > + parent = of_get_parent(child); > + if (!parent) { > + dev_err(>dev, "could not get parent node"); > + return 0; > + } > + > + lane_name = of_get_property(parent, "lane-instance", ); > + if (!lane_name) > + return 0; > + > + if (strstr(lane_name, "1000BASE-KX")) > + mode = BACKPLANE_1G_KX; > + else > + mode = BACKPLANE_10G_KR; That seems like I could put whatever value in "lane-instance" and as long as it contains 1000BASE-KX we are golden, that does not sound very robust nor well defined. > + > + lane_node = of_parse_phandle(child, "lane-handle", 0); > + if (lane_node) { Treat this as an error so you can return early and reduce indentation. > + struct resource res_lane; > + > + ret = of_address_to_resource(lane_node, 0, _lane); > + if (ret) { > + dev_err(>dev, "could not obtain memory map\n"); > + return ret; > + } > + > + ret = of_property_read_u32_array(child, "lane-range", > + (u32 *), 2); > + if (ret) { > + dev_info(>dev, "could not get lane-range\n"); > + return -EINVAL; > + } Is not the standard "ranges" property usable here? > + > + phydev->priv = devm_ioremap_nocache(>dev, > + res_lane.start + lane[0], > + lane[1]); What about the "reg" property here? > + > + if (!phydev->priv) { > + of_node_put(lane_node); > + return -EINVAL; > + } > + of_node_put(lane_node); > + } else { > + return -EINVAL; > + } > + > + if (mode == BACKPLANE_1G_KX) { > + phydev->speed = SPEED_1000; > + /* configure the lane for 1000BASE-KX */ > + lane_set_1gkx(phydev->priv); > + return 0; > + } > + > + xgkr_inst = kzalloc(sizeof(*xgkr_inst), GFP_KERNEL); > + if (!xgkr_inst) > + goto mem_err1; devm_kzalloc()? > + > +
Re: [PATCH 2/4] wcn36xx: Change indication list lock to spinlock
On Mon 28 Dec 15:06 PST 2015, Stephen Hemminger wrote: > On Sun, 27 Dec 2015 17:34:25 -0800 > Bjorn Anderssonwrote: > > > In preparation for handling incoming messages from IRQ context, change > > the indication list lock to a spinlock > > > > Signed-off-by: Bjorn Andersson > > --- > > drivers/net/wireless/ath/wcn36xx/smd.c | 12 ++-- > > drivers/net/wireless/ath/wcn36xx/wcn36xx.h | 2 +- > > 2 files changed, 7 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c > > b/drivers/net/wireless/ath/wcn36xx/smd.c > > index 6b5dbe6f0d0a..4307429740a9 100644 > > --- a/drivers/net/wireless/ath/wcn36xx/smd.c > > +++ b/drivers/net/wireless/ath/wcn36xx/smd.c > > @@ -2165,10 +2165,10 @@ static void wcn36xx_smd_rsp_process(struct wcn36xx > > *wcn, void *buf, size_t len) > > msg_ind->msg_len = len; > > memcpy(msg_ind->msg, buf, len); > > > > - mutex_lock(>hal_ind_mutex); > > + spin_lock(>hal_ind_lock); > > If you are going to handle messages in IRQ context, that better be a > spin_lock_irq() or spin_lock_bh(). This function is executed in IRQ context after the next patch, as such I use spin_lock() here and spin_lock_irqsave() in the worker thread (wcn36xx_ind_smd_work()). Is this not how the spin_lock API should be used? Regards, Bjorn -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] /drivers/net/wireless/ath/ath9k remove unnecessary ?: operator
On 12/29/2015 07:31 AM, Joe Perches wrote: On Tue, 2015-12-29 at 01:38 +0700, Ivan Safonov wrote: On 12/29/2015 12:56 AM, Joe Perches wrote: On Mon, 2015-12-28 at 20:48 +0700, Ivan Safonov wrote: ((thermometer < 0) ? 0 : (thermometer == X)) and (thermometer == X) are equal for X >= 0. X is not guaranteed to be >= 0 here X is fixed constant. In this case X is {0, 1, 2}. Looks like it can be -1 to me (range: -1, 0, 1, 2) static int ar9003_hw_get_thermometer(struct ath_hw *ah) { struct ar9300_eeprom *eep = >eeprom.ar9300_eep; struct ar9300_base_eep_hdr *pBase = >baseEepHeader; int thermometer = (pBase->miscConfiguration >> 1) & 0x3; return --thermometer; } X is not thermometer. The thermometer is {-1, 0, 1, 2}. X is {0, 1, 2}. All possible X valueswritten in the comments: ar9003_hw_get_thermometer used only in ar9003_hw_thermometer_apply: static void ar9003_hw_thermometer_apply(struct ath_hw *ah) { struct ath9k_hw_capabilities *pCap = >caps; int thermometer = ar9003_hw_get_thermometer(ah); u8 therm_on = (thermometer < 0) ? 0 : 1; REG_RMW_FIELD(ah, AR_PHY_65NM_CH0_RXTX4, AR_PHY_65NM_CH0_RXTX4_THERM_ON_OVR, therm_on); if (pCap->chip_chainmask & BIT(1)) REG_RMW_FIELD(ah, AR_PHY_65NM_CH1_RXTX4, AR_PHY_65NM_CH0_RXTX4_THERM_ON_OVR, therm_on); if (pCap->chip_chainmask & BIT(2)) REG_RMW_FIELD(ah, AR_PHY_65NM_CH2_RXTX4, AR_PHY_65NM_CH0_RXTX4_THERM_ON_OVR, therm_on); therm_on = (thermometer < 0) ? 0 : (thermometer == 0); /* X = 0 */ REG_RMW_FIELD(ah, AR_PHY_65NM_CH0_RXTX4, AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on); if (pCap->chip_chainmask & BIT(1)) { therm_on = (thermometer < 0) ? 0 : (thermometer == 1); /* X = 1 */ REG_RMW_FIELD(ah, AR_PHY_65NM_CH1_RXTX4, AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on); } if (pCap->chip_chainmask & BIT(2)) { therm_on = (thermometer < 0) ? 0 : (thermometer == 2); /* X = 2 */ REG_RMW_FIELD(ah, AR_PHY_65NM_CH2_RXTX4, AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on); } } There is no X = -1. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netconsole: Initialize after all core networking drivers
On Wed, Dec 23, 2015 at 3:03 PM, Calvin Owenswrote: > Commit 7332a13b038be05c ("vxlan: defer vxlan init as late as possible") > changed vxlan to use late_initcall(), because vxlan relies on ipv6 being > loaded when a new device is opened. > > This causes netconsole to panic at boot when configured via the kernel > cmdline on an IXGBE NIC, because ixgbe_open() assumes that vxlan has > already been initialized: BTW, it is not fair to blame my commit above, since it was merged long before ixgbe or any hardware driver has vxlan support. Not a big deal, just FYI. ;) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] netconsole: Initialize after all core networking drivers
On Thu, Dec 24, 2015 at 2:25 AM, Hannes Frederic Sowawrote: > Hi, > > On 24.12.2015 00:03, Calvin Owens wrote: >> This patch addresses the issue cited in 7332a13b038be05c by making vxlan >> actually check if ipv6 is loaded, and reverts it to module_init() so >> that it becomes device_initcall() when built-in, eliminating the >> netconsole issue. >> >> The ipv6 module is permanent, so there's no need to actually do the >> usual module_get/module_put dance: once we find it loaded, we can just >> assume that it always will be. >> >> AFAICS, nothing actually ends up calling vxlan_open() during initcalls, >> so in the (IPV6=y && VXLAN=y) case we can't end up there before ipv6 has >> initialized. >> >> Signed-off-by: Calvin Owens > > This architecture just sucks. :( > > > ixgbe should not have to call into vxlan but vxlan has to call to ixgbe. > Thus the vxlan_get_rx_port is absolutely unnecessary and should be > removed. This also lets ixgbe depend on vxlan which is absurd. > > Simply let vxlan_get_rx_port be called from vxlan_notifier_block on > NETDEV_REGISTER or NETDEV_UP events, which is already available. > > For the second vxlan_get_rx_port case, which is a > IXGBE_FLAG2_VXLAN_REREG_NEEDED needed event, I would suggest we also > push that over to the vxlan_notifier_block, maybe with a new event type > for the notifiers. > > After this change ixgbe would not depend on vxlan module any more. > Agreed with this direction, but not yet convinced with this approach. My feeling is that we should move this piece of code into core networking, so that both vxlan and ixgbe driver can call it without worrying about the dependencies. It looks like all we need here is a per-netns structure about family and ports for all vxlans. But I could overlook something here. Thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 00/26] Phylink & SFP support
Hi Florian, On Sun Dec 27 18:08, Florian Fainelli wrote: > On December 14, 2015 11:26:21 PM PST, Dustin Byford >wrote: > >On Mon Dec 07 17:35, Russell King - ARM Linux wrote: > >> Hi, > > > >Hello. > > > >> SFP modules are hot-pluggable ethernet transceivers; they can be > >> detected at runtime and accordingly configured. There are a range of > >> modules offering many different features. > >> > >> Some SFP modules have PHYs conventional integrated into them, others > >> drive a laser diode from the Serdes bus. Some have monitoring, > >others > >> do not. > >> > >> Some SFP modules want to use SGMII over the Serdes link, others want > >> to use 1000base-X over the Serdes link. > >> > >> This makes it non-trivial to support with the existing code > >structure. > >> Not wanting to write something specific to the mvneta driver, I > >decided > >> to have a go at coming up with something more generic. > >> > >> My initial attempts were to provide a PHY driver, but I found that > >> phylib's state machine got in the way, and it was hard to support two > >> chained PHYs. Conversely, having a fixed DT specified setup (via > >> the fixed phy infrastructure) would allow some SFP modules to work, > >but > >> not others. The same is true of the "managed" in-band status (which > >> is SGMII.) > >> > >> The result is that I came up with phylink - an infrastructure layer > >> which sits between the network driver and any attached PHY, and a > >> SFP module layer detects the SFP module, and configures phylink > >> accordingly. > >> > >> Overall, this supports: > >> > >> * switching the serdes mode at the NIC driver > >> * controlling autonegotiation and autoneg results > >> * allowing PHYs to be hotplugged > >> * allowing SFP modules to be hotplugged with proper link indication > >> * fixed-mode links without involving phylib > >> * flow control > >> * EEE support > >> * reading SFP module EEPROMs > >> > >> Overall, phylink supports several link modes, with dynamic switching > >> possible between these: > >> * A true fixed link mode, where the parameters are set by DT. > >> * PHY mode, where we read the negotiation results from the PHY > >registers > >> and pass them to the NIC driver. > >> * SGMII mode, where the in-band status indicates the speed, duplex > >and > >> flow control settings of the link partner. > >> * 1000base-X mode, where the in-band status indicates only duplex and > >> flow control settings (different, incompatible bit layout from > >SGMII.) > > > >I've been working on some similar code to handle interactions with a > >wide range of SFF modules, 1G to 100G, on Linux network switches for > >some time. For practical reasons a lot of that was in userspace but > >I've been planning and recently working on an SFF kernel driver that > >does some of what's done in this series. I think the model you're > >proposing is right on, and since you're further along in implementation > >I'd like to help round out support for the other SFF modules if I can. > >Then make this work on the network ASICs I have access to. > > > >Any concrete plans for QSFP or the new 25G modules? > > > >> Ethtool support is included, as well as emulation of the MII > >registers > >> for situations where a PHY is not attached, giving compatible > >emulation > >> of existing user interfaces where required. > >> > >> The patches here include modification of mvneta (against 4.4-rc1, so > >> probably won't apply to current development tips.) It basically > >> hooks into the places where the phylib would hook into. > >> > >> DT wise, the changes needed to support SFP look like this (example > >> taken from Clearfog): > >> > >>ethernet@34000 { > >> + managed = "in-band-status"; > >>phy-mode = "sgmii"; > >>status = "okay"; > >> - > >> - fixed-link { > >> - speed = <1000>; > >> - full-duplex; > >> - }; > >>}; > >> ... > >> + sfp: sfp { > >> + compatible = "sff,sfp"; > >> + i2c-bus = <>; > >> + los-gpio = < 12 GPIO_ACTIVE_HIGH>; > >> + moddef0-gpio = < 15 GPIO_ACTIVE_LOW>; > >> + sfp,ethernet = <>; > > > >Using is unambiguous in the this case because there's only one > >serdes and one mac involved. To specify the mac/serdes/cage > >associations at the same level of detail as the gpios it might be nice > >(at least for some devices) to point to a serdes node (or 4 in the case > >of QSFP) instead of Any thoughts on that? > > Using a phandle here allows for quite a lot of flexibility on how you want to > associate a given SFP to its data plane partner. I do not think we need to > get more strict than that strictly mandate an actual Ethernet controller > node. These Marvell adapters typically have one or more "
New "ip wait" subcommand for iproute2
Hallo netdev@, I had occasion to want to programmatically wait for an interface to become available from within a shell script, but found there to be no off-the-shelf tool for such a thing. Could this patch be considered for inclusion as part of iproute2? It adds an "ip wait link" subcommand ("link" required in case someone wants to add things like "ip wait addr" or somesuch) based quite heavily on the ipmonitor.c file. For example, one might "ip wait link dev eth0 up" to wait for an interface of that name to appear (specifically, for a RTM_NEWLINK message). "ip wait link dev eth0 down" will wait for it to go away (RTM_DELLINK). This should be checkpatch clean, but please let me know if I missed something. Cheers, --nwf; From f6e0acbc07af9650cb5e66c0dec08e6e3e878f79 Mon Sep 17 00:00:00 2001 From: Nathaniel Wesley FilardoDate: Mon, 28 Dec 2015 18:35:17 -0500 Subject: [PATCH] New 'ip wait' command Signed-off-by: Nathaniel Filardo --- ip/Makefile| 2 +- ip/ip.c| 3 +- ip/ip_common.h | 1 + ip/ipwait.c| 149 + 4 files changed, 153 insertions(+), 2 deletions(-) create mode 100644 ip/ipwait.c diff --git a/ip/Makefile b/ip/Makefile index f3d2987..cf12af7 100644 --- a/ip/Makefile +++ b/ip/Makefile @@ -7,7 +7,7 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o ipnetns.o \ iplink_vxlan.o tcp_metrics.o iplink_ipoib.o ipnetconf.o link_ip6tnl.o \ link_iptnl.o link_gre6.o iplink_bond.o iplink_bond_slave.o iplink_hsr.o \ iplink_bridge.o iplink_bridge_slave.o ipfou.o iplink_ipvlan.o \ -iplink_geneve.o iplink_vrf.o iproute_lwtunnel.o +iplink_geneve.o iplink_vrf.o iproute_lwtunnel.o ipwait.o RTMONOBJ=rtmon.o diff --git a/ip/ip.c b/ip/ip.c index eea00b8..36d8a24 100644 --- a/ip/ip.c +++ b/ip/ip.c @@ -51,7 +51,7 @@ static void usage(void) " ip [ -force ] -batch filename\n" "where OBJECT := { link | address | addrlabel | route | rule | neighbor | ntable |\n" " tunnel | tuntap | maddress | mroute | mrule | monitor | xfrm |\n" -" netns | l2tp | fou | tcp_metrics | token | netconf }\n" +" netns | l2tp | fou | tcp_metrics | token | netconf | wait }\n" " OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n" "-h[uman-readable] | -iec |\n" "-f[amily] { inet | inet6 | ipx | dnet | mpls | bridge | link } |\n" @@ -97,6 +97,7 @@ static const struct cmd { { "mrule", do_multirule }, { "netns", do_netns }, { "netconf",do_ipnetconf }, + { "wait", do_ipwait }, { "help", do_help }, { 0 } }; diff --git a/ip/ip_common.h b/ip/ip_common.h index 9a846df..5670b9b 100644 --- a/ip/ip_common.h +++ b/ip/ip_common.h @@ -54,6 +54,7 @@ int do_ipfou(int argc, char **argv); int do_tcp_metrics(int argc, char **argv); int do_ipnetconf(int argc, char **argv); int do_iptoken(int argc, char **argv); +int do_ipwait(int argc, char **argv); int iplink_get(unsigned int flags, char *name, __u32 filt_mask); static inline int rtm_get_table(struct rtmsg *r, struct rtattr **tb) diff --git a/ip/ipwait.c b/ip/ipwait.c new file mode 100644 index 000..998fb21 --- /dev/null +++ b/ip/ipwait.c @@ -0,0 +1,149 @@ +/* + * ipwait.c"ip wait". + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * Authors:Alexey Kuznetsov, (ipmonitor.c) + * Nathaniel Filardo (this file) + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "utils.h" +#include "ip_common.h" + +static void usage(void) __attribute__((noreturn)); +int listen_all_nsid; +int wait_for; +char *wait_dev; +int verbose; + +static void usage(void) +{ + fprintf(stderr, + "Usage: ip wait [all-nsid] [verbose] link dev DEVICE {up | down}"); + exit(-1); +} + +static int accept_msg(const struct sockaddr_nl *who, + struct rtnl_ctrl_data *ctrl, + struct nlmsghdr *n, void *arg) +{ + int done = 0; + + if (n->nlmsg_type == RTM_NEWLINK || n->nlmsg_type == RTM_DELLINK) { + if (wait_for == n->nlmsg_type + && wait_for == RTM_DELLINK + && ll_name_to_index(wait_dev) != 0) + done = 1; + + ll_remember_index(who, n, NULL); + if (verbose) + print_linkinfo(who, n, stdout); + + if (wait_for == n->nlmsg_type + && wait_for == RTM_NEWLINK +
Re: [PATCH] /drivers/net/wireless/ath/ath9k remove unnecessary ?: operator
On Tue, 2015-12-29 at 01:38 +0700, Ivan Safonov wrote: > On 12/29/2015 12:56 AM, Joe Perches wrote: > > On Mon, 2015-12-28 at 20:48 +0700, Ivan Safonov wrote: > > > ((thermometer < 0) ? 0 : (thermometer == X)) and (thermometer == > > > X) are equal for X >= 0. > > X is not guaranteed to be >= 0 here > > X is fixed constant. In this case X is {0, 1, 2}. Looks like it can be -1 to me (range: -1, 0, 1, 2) static int ar9003_hw_get_thermometer(struct ath_hw *ah) { struct ar9300_eeprom *eep = >eeprom.ar9300_eep; struct ar9300_base_eep_hdr *pBase = >baseEepHeader; int thermometer = (pBase->miscConfiguration >> 1) & 0x3; return --thermometer; } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html