Re: 4.4-rc7 failure report

2015-12-28 Thread Daniel Borkmann

On 12/28/2015 10:53 PM, Doug Ledford wrote:

The 4.4-rc7 kernel is failing for me.  In my case, all of my vlan
interfaces are failing to obtain a dhcp address using dhclient.  I've
tried a hand built 4.4-rc7, and the Fedora rawhide 4.4-rc7 kernel, both
failed.  I've tried NetworkManager and the old SysV network service,
both fail.  I tried a working dhclient from rhel7 on the Fedora rawhide
install and it failed too.  Running tcpdump on the interface shows the
dhcp request going out, and a dhcp response coming back in.  Running
strace on dhclient shows that it writes the dhcp request, but it never
recvs a dhcp response.  If I manually bring the interface up with a
static IP address then I'm able to run typical IP traffic across the
link (aka, ping).  It would seem that when dhclient registers a packet
filter on the socket, that filter is preventing it from ever getting the
dhcp response.  The same dhclient works on any non-vlan interfaces in
the system, so the filter must work for non-vlan interfaces.  Aside from
the fact that the interface is a vlan, we also use a priority egress map
on the interface, and we use PFC flow control.  Let me know if you need
anymore to debug the issue, or email me off list and I can get you
logins to my reproducer machines.


When you say 4.4-rc7 kernel is failing for you, what latest kernel version
was working, where the socket filter was properly receiving the response on
your vlan iface? Are you reasonably sure that the skb is dropped at the BPF
filter attached to the dhcp's packet socket? Can you dump the BPF code of
the filter? Are there any vlan offloading settings the filter is not taking
into account (in the sense of classic BPF extensions, tcpdump/libpcap finally
managed to cope with this)?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH net-next 3/3] macsec: introduce IEEE 802.1AE driver

2015-12-28 Thread Florian Westphal
Sabrina Dubroca  wrote:
> + if (h->short_length)
> + return len == h->short_length + 24;
> + else
> + return len >= 72;
[..]
> + return len == h->short_length + 32;
[..]
> + return len >= 80;
[..]
> + return len == 8 + icv_len + h->short_length;
> + else
> + return len >= 8 + icv_len + 48;
[..]
> + if (h->short_length)
> + return len == 16 + icv_len + h->short_length;
> + else
> + return len >= 16 + icv_len + 48;

Could you add some defines instead of magic numbers?

> + tx_sa->next_pn++;
> + if (tx_sa->next_pn == 0) {
> + pr_notice("PN wrapped, transitionning to !oper\n");

Is that _notice intentional?
I'm only asking because it seems we printk unconditionally in response
to network traffic & I don't get what operator should do in response to
that message.

> +static void macsec_encrypt_done(struct crypto_async_request *base, int err)
> +{
> + struct sk_buff *skb = base->data;
> + struct net_device *dev = skb->dev;
> + struct macsec_dev *macsec = macsec_priv(dev);
> + struct macsec_tx_sa *sa = macsec_skb_cb(skb)->tx_sa;
> + int len, ret;
> +
> + aead_request_free(macsec_skb_cb(skb)->req);
> +
> + rcu_read_lock_bh();
> + macsec_encrypt_finish(skb, dev);
> + macsec_count_tx(skb, >secy.tx_sc, macsec_skb_cb(skb)->tx_sa);
> + len = skb->len;
> + ret = dev_queue_xmit(skb);
> + count_tx(dev, ret, len);
> + rcu_read_unlock_bh();

What was the rcu_read_lock_bh protecting?

> +static void macsec_decrypt_done(struct crypto_async_request *base, int err)
> +{
> + struct sk_buff *skb = base->data;
> + struct net_device *dev = skb->dev;
> + struct macsec_dev *macsec = macsec_priv(dev);
> + struct macsec_rx_sa *rx_sa = macsec_skb_cb(skb)->rx_sa;
> + int len, ret;
> +
> + aead_request_free(macsec_skb_cb(skb)->req);
> +
> + rcu_read_lock_bh();
> + macsec_finalize_skb(skb, macsec->secy.icv_len,
> + macsec_extra_len(macsec_skb_cb(skb)->has_sci));
> + macsec_reset_skb(skb, macsec->secy.netdev);
> +
> + macsec_rxsa_put(rx_sa);
> + len = skb->len;
> + ret = netif_rx(skb);
> + if (ret == NET_RX_SUCCESS)
> + count_rx(dev, len);
> + else
> + macsec->secy.netdev->stats.rx_dropped++;
> +
> + rcu_read_unlock_bh();

Same question.

> +static void handle_not_macsec(struct sk_buff *skb)
> +{
> + struct macsec_rxh_data *rxd = macsec_data_rcu(skb->dev);
> + struct macsec_dev *macsec;
> +
> + /* 10.6 If the management control validateFrames is not
> +  * Strict, frames without a SecTAG are received, counted, and
> +  * delivered to the Controlled Port
> +  */
> + list_for_each_entry_rcu(macsec, >secys, secys) {
> + struct sk_buff *nskb;
> + int ret;
> + struct pcpu_secy_stats *secy_stats = 
> this_cpu_ptr(macsec->stats);
> +
> + if (macsec->secy.validate_frames == MACSEC_VALIDATE_STRICT) {
> + u64_stats_update_begin(_stats->syncp);
> + secy_stats->stats.InPktsNoTag++;
> + u64_stats_update_end(_stats->syncp);
> + continue;
> + }
> +
> + /* deliver on this port */
> + nskb = skb_clone(skb, GFP_ATOMIC);
> + nskb->dev = macsec->secy.netdev;

nskb == NULL handling?

> +static rx_handler_result_t macsec_handle_frame(struct sk_buff **pskb)
> +{
> + struct sk_buff *skb = *pskb;
> + struct net_device *dev = skb->dev;
> + struct macsec_eth_header *hdr;
> + struct macsec_secy *secy = NULL;
> + struct macsec_rx_sc *rx_sc;
> + struct macsec_rx_sa *rx_sa;
> + struct macsec_rxh_data *rxd;
> + struct macsec_dev *macsec;
> + sci_t sci;
> + u32 pn, lowest_pn;
> + bool cbit;
> + struct pcpu_rx_sc_stats *rxsc_stats;
> + struct pcpu_secy_stats *secy_stats;
> + bool pulled_sci;
> +
> + rcu_read_lock_bh();

Why?  Seems its because of

> + if (skb_headroom(skb) < ETH_HLEN)
> + goto drop_nosa;
> +
> + rxd = macsec_data_rcu(skb->dev);

this, but rxd isn't dereferenced until a lot later in the function.

Also: macsec_data_rcu uses rcu_dereference() but this used
rcu_read_lock_bh, is that structure protected by RCU or RCU-bh?

> + pn = ntohl(hdr->packet_number);
> + if (secy->replay_protect) {
> + bool late;
> +
> + spin_lock(_sa->lock);
> + late = rx_sa->next_pn >= secy->replay_window &&
> +pn < (rx_sa->next_pn - secy->replay_window);
> + spin_unlock(_sa->lock);
> +
> + if (late) {
> + u64_stats_update_begin(_stats->syncp);
> + 

Re: 4.4-rc7 failure report

2015-12-28 Thread Doug Ledford
On 12/28/2015 05:20 PM, Daniel Borkmann wrote:
> On 12/28/2015 10:53 PM, Doug Ledford wrote:
>> The 4.4-rc7 kernel is failing for me.  In my case, all of my vlan
>> interfaces are failing to obtain a dhcp address using dhclient.  I've
>> tried a hand built 4.4-rc7, and the Fedora rawhide 4.4-rc7 kernel, both
>> failed.  I've tried NetworkManager and the old SysV network service,
>> both fail.  I tried a working dhclient from rhel7 on the Fedora rawhide
>> install and it failed too.  Running tcpdump on the interface shows the
>> dhcp request going out, and a dhcp response coming back in.  Running
>> strace on dhclient shows that it writes the dhcp request, but it never
>> recvs a dhcp response.  If I manually bring the interface up with a
>> static IP address then I'm able to run typical IP traffic across the
>> link (aka, ping).  It would seem that when dhclient registers a packet
>> filter on the socket, that filter is preventing it from ever getting the
>> dhcp response.  The same dhclient works on any non-vlan interfaces in
>> the system, so the filter must work for non-vlan interfaces.  Aside from
>> the fact that the interface is a vlan, we also use a priority egress map
>> on the interface, and we use PFC flow control.  Let me know if you need
>> anymore to debug the issue, or email me off list and I can get you
>> logins to my reproducer machines.
> 
> When you say 4.4-rc7 kernel is failing for you, what latest kernel version
> was working, where the socket filter was properly receiving the response on
> your vlan iface?

v4.3 final works.  I haven't bisected where in the 4.4 series it quits
working.  I can do that tomorrow.

> Are you reasonably sure that the skb is dropped at the BPF
> filter attached to the dhcp's packet socket?

No.  I'm only reasonably sure that without a filter it works, I don't
know if it gets dropped at the BPF filter or something else when the
filter is added.  It could be an interaction between the filter and PFC
or vlan or anything else for all I know.  But I figured the level of
detail I gave should make it easy to reproduce locally by interested
parties.

> Can you dump the BPF code of
> the filter?

It's whatever the filter is that dhclient uses.  I'm pretty sure that's
a pretty standard filter.  And I tried known working dhclients in order
to make sure it wasn't the latest dhclient's fault.

> Are there any vlan offloading settings the filter is not taking
> into account (in the sense of classic BPF extensions, tcpdump/libpcap
> finally
> managed to cope with this)?

Have no clue.

-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


[PATCH 2/2] ixgbe: restrict synchronization of link_up and speed

2015-12-28 Thread zyjzyj2000
From: Zhu Yanjun 

When the X540 NIC acts as a slave of some virtual NICs, it is very
important to synchronize link_up and link_speed, such as a bonding
driver in 802.3ad mode. When X540 NIC acts as an independent interface,
it is not necessary to synchronize link_up and link_speed. That is,
the time span between link_up and link_speed is acceptable.

Signed-off-by: Zhu Yanjun 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ace21b9..1bb6056 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6436,8 +6436,15 @@ static void ixgbe_watchdog_link_is_up(struct 
ixgbe_adapter *adapter)
 * time. To X540 NIC, there is a time span between link_up and
 * link_speed. As such, only continue if link_up and link_speed are
 * ready to X540 NIC.
+* The time span between link_up and link_speed is very important
+* when the X540 NIC acts as a slave in some virtual NICs, such as
+* a bonding driver in 802.3ad mode. When X540 NIC acts as an
+* independent interface, it is not necessary to synchronize link_up
+* and link_speed.
+* In the end, not continue if (X540 NIC && SLAVE && link_speed UNKNOWN)
 */
-   if (hw->mac.type == ixgbe_mac_X540)
+   if ((hw->mac.type == ixgbe_mac_X540) &&
+   (netdev->flags & IFF_SLAVE))
if (link_speed == IXGBE_LINK_SPEED_UNKNOWN)
return;
 
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V3] ixgbe: force to synchronize link_up and speed as a slave

2015-12-28 Thread zyjzyj2000

Hi, all

During the discussion with Emil, I think the time span between link_up 
and link_speed is not important when the X540 NIC acts as an independent 
interface. As such, a new patch is made to restrict synchronization 
between link_up and link_speed.

Any reply is appreciated.

Zhu Yanjun
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] ixgbe: force to synchronize reporting "link on" and getting speed

2015-12-28 Thread zyjzyj2000
From: yzhu1 

In X540 NIC, there is a time span between reporting "link on" and
getting the speed and duplex. To a bonding driver in 802.3ad mode,
this time span will make it not work well if the time span is big
enough. The big time span will make bonding driver change the state of
the slave device to up while the speed and duplex of the slave device
can not be gotten. Later the bonding driver will not have change to
get the speed and duplex of the slave device. The speed and duplex of
the slave device are important to a bonding driver in 802.3ad mode.

To 82599_SFP NIC and other kinds of NICs, this problem does
not exist. As such, it is necessary for X540 to report"link on" when
the link speed is not IXGBE_LINK_SPEED_UNKNOWN.

Signed-off-by: yzhu1 
Signed-off-by: Zhu Yanjun 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index aed8d02..ace21b9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6426,6 +6426,21 @@ static void ixgbe_watchdog_link_is_up(struct 
ixgbe_adapter *adapter)
if (netif_carrier_ok(netdev))
return;
 
+   /* In X540 NIC, there is a time span between reporting "link on"
+* and getting the speed and duplex. To a bonding driver in 802.3ad
+* mode, this time span will make it not work well if the time span
+* is big enough. To 82599_SFP NIC and other kinds of NICs, this
+* problem does not exist. As such, it is better for X540 to report
+* "link on" when the link speed is not IXGBE_LINK_SPEED_UNKNOWN.
+* To other NICs, the link_up and link_speed are gotten at the same
+* time. To X540 NIC, there is a time span between link_up and
+* link_speed. As such, only continue if link_up and link_speed are
+* ready to X540 NIC.
+*/
+   if (hw->mac.type == ixgbe_mac_X540)
+   if (link_speed == IXGBE_LINK_SPEED_UNKNOWN)
+   return;
+
adapter->flags2 &= ~IXGBE_FLAG2_SEARCH_FOR_SFP;
 
switch (hw->mac.type) {
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode

2015-12-28 Thread Michal Kubecek
On Thu, Dec 17, 2015 at 01:57:16PM -0800, Jay Vosburgh wrote:
>  wrote:
> >In 802.3ad mode, the speed and duplex is needed. But in some NIC,
> >there is a time span between NIC up state and getting speed and duplex.
> >As such, sometimes a slave in 802.3ad mode is in up state without
> >speed and duplex. This will make bonding in 802.3ad mode can not
> >work well. 
> >To make bonding driver be compatible with more NICs, it is
> >necessary to restrict the up state in 802.3ad mode.
> 
>   What device is this?  It seems a bit odd that an Ethernet device
> can be carrier up but not have the duplex and speed available.
...
>   In general, though, bonding expects a speed or duplex change to
> be announced via a NETDEV_UPDATE or NETDEV_UP notifier, which would
> propagate to the 802.3ad logic.
> 
>   If the device here is going carrier up prior to having speed or
> duplex available, then maybe it should call netdev_state_change() when
> the duplex and speed are available, or delay calling netif_carrier_on().

I have encountered this problem (NIC having carrier on before being able
to detect speed/duplex and driver not notifying when speed/duplex
becomes available) with netxen cards earlier. But it was eventually
fixed in the driver by commit 9d01412ae76f ("netxen: Fix link event
handling.") so this example rather supports what you said.

  Michal Kubecek
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] bpf: hash: use per-bucket spinlock

2015-12-28 Thread Ming Lei
On Mon, Dec 28, 2015 at 5:13 PM, Daniel Borkmann  wrote:
> On 12/26/2015 10:31 AM, Ming Lei wrote:
>>
>> From: Ming Lei 
>>
>> Both htab_map_update_elem() and htab_map_delete_elem() can be
>> called from eBPF program, and they may be in kernel hot path,
>> so it isn't efficient to use a per-hashtable lock in this two
>> helpers.
>>
>> The per-hashtable spinlock is used just for protecting bucket's
>> hlist, and per-bucket lock should be enough. This patch converts
>> the per-hashtable lock into per-bucket spinlock, so that contention
>> can be decreased a lot.
>>
>> Signed-off-by: Ming Lei 
>> ---
>>   kernel/bpf/hashtab.c | 35 +--
>>   1 file changed, 25 insertions(+), 10 deletions(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index d857fcb..66bad7a 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -14,9 +14,14 @@
>>   #include 
>>   #include 
>>
>> +struct bucket {
>> +   struct hlist_head head;
>> +   raw_spinlock_t lock;
>> +};
>> +
>>   struct bpf_htab {
>> struct bpf_map map;
>> -   struct hlist_head *buckets;
>> +   struct bucket *buckets;
>> raw_spinlock_t lock;
>
>
> Shouldn't the lock member be removed from here?

Hammm, looks this one is a bad version, sorry for that, and will resend it.

>
>
>> atomic_t count; /* number of elements in this hashtable */
>> u32 n_buckets;  /* number of hash buckets */
>> @@ -88,24 +93,25 @@ static struct bpf_map *htab_map_alloc(union bpf_attr
>> *attr)
>> /* make sure page count doesn't overflow */
>> goto free_htab;
>>
>> -   htab->map.pages = round_up(htab->n_buckets * sizeof(struct
>> hlist_head) +
>> +   htab->map.pages = round_up(htab->n_buckets * sizeof(struct bucket)
>> +
>>htab->elem_size *
>> htab->map.max_entries,
>>PAGE_SIZE) >> PAGE_SHIFT;
>>
>> err = -ENOMEM;
>> -   htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct
>> hlist_head),
>> +   htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct
>> bucket),
>>   GFP_USER | __GFP_NOWARN);
>>
>> if (!htab->buckets) {
>> -   htab->buckets = vmalloc(htab->n_buckets * sizeof(struct
>> hlist_head));
>> +   htab->buckets = vmalloc(htab->n_buckets * sizeof(struct
>> bucket));
>> if (!htab->buckets)
>> goto free_htab;
>> }
>>
>> -   for (i = 0; i < htab->n_buckets; i++)
>> -   INIT_HLIST_HEAD(>buckets[i]);
>> +   for (i = 0; i < htab->n_buckets; i++) {
>> +   INIT_HLIST_HEAD(>buckets[i].head);
>> +   raw_spin_lock_init(>buckets[i].lock);
>> +   }
>>
>> -   raw_spin_lock_init(>lock);
>> atomic_set(>count, 0);
>>
>> return >map;
>> @@ -120,11 +126,16 @@ static inline u32 htab_map_hash(const void *key, u32
>> key_len)
>> return jhash(key, key_len, 0);
>>   }
>>
>> -static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32
>> hash)
>> +static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32
>> hash)
>>   {
>> return >buckets[hash & (htab->n_buckets - 1)];
>>   }
>>
>> +static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32
>> hash)
>> +{
>> +   return &__select_bucket(htab, hash)->head;
>> +}
>> +
>>   static struct htab_elem *lookup_elem_raw(struct hlist_head *head, u32
>> hash,
>>  void *key, u32 key_size)
>>   {
>> @@ -227,6 +238,7 @@ static int htab_map_update_elem(struct bpf_map *map,
>> void *key, void *value,
>> struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
>> struct htab_elem *l_new, *l_old;
>> struct hlist_head *head;
>> +   struct bucket *b;
>> unsigned long flags;
>> u32 key_size;
>> int ret;
>> @@ -248,7 +260,8 @@ static int htab_map_update_elem(struct bpf_map *map,
>> void *key, void *value,
>> memcpy(l_new->key + round_up(key_size, 8), value,
>> map->value_size);
>>
>> l_new->hash = htab_map_hash(l_new->key, key_size);
>> -   head = select_bucket(htab, l_new->hash);
>> +   b = __select_bucket(htab, l_new->hash);
>> +   head = >head;
>>
>> /* bpf_map_update_elem() can be called in_irq() */
>> raw_spin_lock_irqsave(>lock, flags);
>
>
> I am a bit confused on this one, though.
>
> The raw spin lock is still per hashtable (htab->lock), and has not been
> converted in
> this patch to the per bucket one (as opposed to what the commit message
> says), so
> this patch seems to go into the right direction, but is a bit incomplete?
> Shouldn't
> the above f.e. take b->lock, etc?
>
>> @@ -299,6 +312,7 @@ static int htab_map_delete_elem(struct bpf_map *map,
>> void *key)
>>   {

Re: [PATCH net-next v2 05/10] be2net: log digital signature errors while flashing FW image

2015-12-28 Thread kbuild test robot
Hi Suresh,

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Sathya-Perla/be2net-patch-set/20151228-145423
config: x86_64-allmodconfig (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/emulex/benet/be_cmds.c: In function 'be_flash_skyhawk':
>> drivers/net/ethernet/emulex/benet/be_cmds.c:2963:4: warning: enumeration 
>> value 'MCC_ADDL_STATUS_INSUFFICIENT_RESOURCES' not handled in switch 
>> [-Wswitch]
   switch (addl_status(status)) {
   ^
>> drivers/net/ethernet/emulex/benet/be_cmds.c:2963:4: warning: enumeration 
>> value 'MCC_ADDL_STATUS_FLASH_IMAGE_CRC_MISMATCH' not handled in switch 
>> [-Wswitch]
>> drivers/net/ethernet/emulex/benet/be_cmds.c:2963:4: warning: enumeration 
>> value 'MCC_ADDL_STATUS_TOO_MANY_INTERFACES' not handled in switch [-Wswitch]
>> drivers/net/ethernet/emulex/benet/be_cmds.c:2963:4: warning: enumeration 
>> value 'MCC_ADDL_STATUS_INSUFFICIENT_VLANS' not handled in switch [-Wswitch]

vim +/MCC_ADDL_STATUS_INSUFFICIENT_RESOURCES +2963 
drivers/net/ethernet/emulex/benet/be_cmds.c

  2947  flash_offset_support = false;
  2948  goto retry_flash;
  2949  }
  2950  
  2951  /* For old FW images ignore ILLEGAL_FIELD error or 
errors on
  2952   * UFI_DIR region
  2953   */
  2954  if (old_fw_img &&
  2955  (base_status(status) == MCC_STATUS_ILLEGAL_FIELD ||
  2956   (img_optype == OPTYPE_UFI_DIR &&
  2957base_status(status) == MCC_STATUS_FAILED))) {
  2958  continue;
  2959  } else if (status) {
  2960  dev_err(dev, "Flashing section type 0x%x 
failed\n",
  2961  img_type);
  2962  
> 2963  switch (addl_status(status)) {
  2964  case MCC_ADDL_STATUS_MISSING_SIGNATURE:
  2965  dev_err(dev,
  2966  "Digital signature missing in 
FW\n");
  2967  return -EINVAL;
  2968  case MCC_ADDL_STATUS_INVALID_SIGNATURE:
  2969  dev_err(dev,
  2970  "Invalid digital signature in 
FW\n");
  2971  return -EINVAL;

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH 1/3] bpf: hash: use atomic count

2015-12-28 Thread Daniel Borkmann

On 12/26/2015 10:31 AM, Ming Lei wrote:

Preparing for removing global per-hashtable lock, so
the counter need to be defined as aotmic_t first.

Signed-off-by: Ming Lei 


Acked-by: Daniel Borkmann 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] bonding: restrict up state in 802.3ad mode

2015-12-28 Thread zhuyj

On 12/28/2015 04:43 PM, Michal Kubecek wrote:

On Thu, Dec 17, 2015 at 01:57:16PM -0800, Jay Vosburgh wrote:

 wrote:

In 802.3ad mode, the speed and duplex is needed. But in some NIC,
there is a time span between NIC up state and getting speed and duplex.
As such, sometimes a slave in 802.3ad mode is in up state without
speed and duplex. This will make bonding in 802.3ad mode can not
work well.
To make bonding driver be compatible with more NICs, it is
necessary to restrict the up state in 802.3ad mode.

What device is this?  It seems a bit odd that an Ethernet device
can be carrier up but not have the duplex and speed available.

...

In general, though, bonding expects a speed or duplex change to
be announced via a NETDEV_UPDATE or NETDEV_UP notifier, which would
propagate to the 802.3ad logic.

If the device here is going carrier up prior to having speed or
duplex available, then maybe it should call netdev_state_change() when
the duplex and speed are available, or delay calling netif_carrier_on().

I have encountered this problem (NIC having carrier on before being able
to detect speed/duplex and driver not notifying when speed/duplex
becomes available) with netxen cards earlier. But it was eventually
fixed in the driver by commit 9d01412ae76f ("netxen: Fix link event
handling.") so this example rather supports what you said.

   Michal Kubecek

Thanks a lot.
I checked the commit 9d01412ae76f ("netxen: Fix link event
handling."). The symptoms are the same with mine.

The root cause is different. In my problem, the root cause is that LINKS 
register
can not provide link_up and link_speed at the same time. There is a time 
span between
link_up and link_speed. My solution is to force to synchronize link_up 
and link_speed in

ixgbe X540 NIC.

Best Regards!
Zhu Yanjun
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 net-next 0/3] Ethtool support for phy stats

2015-12-28 Thread Andrew Lunn
On Mon, Dec 28, 2015 at 12:44:27AM -0500, David Miller wrote:
> From: David Miller 
> Date: Mon, 28 Dec 2015 00:32:43 -0500 (EST)
> 
> > From: Andrew Lunn 
> > Date: Sun, 27 Dec 2015 12:58:25 +0100
> > 
> >> This patchset add ethtool support for reading statistics from the PHY.
> >> The Marvell and Micrel Phys are then extended to report receiver
> >> packet errors and idle errors.
> >> 
> >> v2:
> >>   Fix linking when phylib is not enabled.
> > 
> > Series applied, thanks Andrew.
> 
> I have to revert this.
> 
> You can't call into PHY library code from net/core/ethtool.c,
> because PHY lib can be modular.

O.K.

How about we change PHYLIB from a tristate to a bool?

Thanks
Andrew
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] bpf: hash: use per-bucket spinlock

2015-12-28 Thread Daniel Borkmann

On 12/26/2015 10:31 AM, Ming Lei wrote:

From: Ming Lei 

Both htab_map_update_elem() and htab_map_delete_elem() can be
called from eBPF program, and they may be in kernel hot path,
so it isn't efficient to use a per-hashtable lock in this two
helpers.

The per-hashtable spinlock is used just for protecting bucket's
hlist, and per-bucket lock should be enough. This patch converts
the per-hashtable lock into per-bucket spinlock, so that contention
can be decreased a lot.

Signed-off-by: Ming Lei 
---
  kernel/bpf/hashtab.c | 35 +--
  1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index d857fcb..66bad7a 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -14,9 +14,14 @@
  #include 
  #include 

+struct bucket {
+   struct hlist_head head;
+   raw_spinlock_t lock;
+};
+
  struct bpf_htab {
struct bpf_map map;
-   struct hlist_head *buckets;
+   struct bucket *buckets;
raw_spinlock_t lock;


Shouldn't the lock member be removed from here?


atomic_t count; /* number of elements in this hashtable */
u32 n_buckets;  /* number of hash buckets */
@@ -88,24 +93,25 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
/* make sure page count doesn't overflow */
goto free_htab;

-   htab->map.pages = round_up(htab->n_buckets * sizeof(struct hlist_head) +
+   htab->map.pages = round_up(htab->n_buckets * sizeof(struct bucket) +
   htab->elem_size * htab->map.max_entries,
   PAGE_SIZE) >> PAGE_SHIFT;

err = -ENOMEM;
-   htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct 
hlist_head),
+   htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct bucket),
  GFP_USER | __GFP_NOWARN);

if (!htab->buckets) {
-   htab->buckets = vmalloc(htab->n_buckets * sizeof(struct 
hlist_head));
+   htab->buckets = vmalloc(htab->n_buckets * sizeof(struct 
bucket));
if (!htab->buckets)
goto free_htab;
}

-   for (i = 0; i < htab->n_buckets; i++)
-   INIT_HLIST_HEAD(>buckets[i]);
+   for (i = 0; i < htab->n_buckets; i++) {
+   INIT_HLIST_HEAD(>buckets[i].head);
+   raw_spin_lock_init(>buckets[i].lock);
+   }

-   raw_spin_lock_init(>lock);
atomic_set(>count, 0);

return >map;
@@ -120,11 +126,16 @@ static inline u32 htab_map_hash(const void *key, u32 
key_len)
return jhash(key, key_len, 0);
  }

-static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash)
+static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
  {
return >buckets[hash & (htab->n_buckets - 1)];
  }

+static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash)
+{
+   return &__select_bucket(htab, hash)->head;
+}
+
  static struct htab_elem *lookup_elem_raw(struct hlist_head *head, u32 hash,
 void *key, u32 key_size)
  {
@@ -227,6 +238,7 @@ static int htab_map_update_elem(struct bpf_map *map, void 
*key, void *value,
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
struct htab_elem *l_new, *l_old;
struct hlist_head *head;
+   struct bucket *b;
unsigned long flags;
u32 key_size;
int ret;
@@ -248,7 +260,8 @@ static int htab_map_update_elem(struct bpf_map *map, void 
*key, void *value,
memcpy(l_new->key + round_up(key_size, 8), value, map->value_size);

l_new->hash = htab_map_hash(l_new->key, key_size);
-   head = select_bucket(htab, l_new->hash);
+   b = __select_bucket(htab, l_new->hash);
+   head = >head;

/* bpf_map_update_elem() can be called in_irq() */
raw_spin_lock_irqsave(>lock, flags);


I am a bit confused on this one, though.

The raw spin lock is still per hashtable (htab->lock), and has not been 
converted in
this patch to the per bucket one (as opposed to what the commit message says), 
so
this patch seems to go into the right direction, but is a bit incomplete? 
Shouldn't
the above f.e. take b->lock, etc?


@@ -299,6 +312,7 @@ static int htab_map_delete_elem(struct bpf_map *map, void 
*key)
  {
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
struct hlist_head *head;
+   struct bucket *b;
struct htab_elem *l;
unsigned long flags;
u32 hash, key_size;
@@ -309,7 +323,8 @@ static int htab_map_delete_elem(struct bpf_map *map, void 
*key)
key_size = map->key_size;

hash = htab_map_hash(key, key_size);
-   head = select_bucket(htab, hash);
+   b = __select_bucket(htab, hash);
+   head = >head;

raw_spin_lock_irqsave(>lock, 

pull-request: wireless-drivers 2015-12-28

2015-12-28 Thread Kalle Valo
Hi Dave,

here's one more pull request, a bit late due to holidays but I hope this
still makes it to 4.4. Just two small fixes to iwlwifi, nothing else.

Kalle

The following changes since commit eeec5d0ef7ee54a75e09e861c3cc44177b8752c7:

  rtlwifi: rtl8821ae: Fix lockups on boot (2015-11-17 15:58:53 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git 
tags/wireless-drivers-for-davem-2015-12-28

for you to fetch changes up to 01d85b9b2b6bec2b0773cf2afc58699dc4b052f8:

  Merge tag 'iwlwifi-for-kalle-2015-12-16' of 
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes 
(2015-12-18 14:57:02 +0200)



iwlwifi

* don't load firmware that won't exist for 7260
* fix RCU splat


Johannes Berg (2):
  iwlwifi: separate firmware version for 7260 devices
  iwlwifi: mvm: protect RCU dereference in iwl_mvm_get_key_sta_id

Kalle Valo (1):
  Merge tag 'iwlwifi-for-kalle-2015-12-16' of 
https://git.kernel.org/.../iwlwifi/iwlwifi-fixes

 drivers/net/wireless/iwlwifi/iwl-7000.c |   49 ++-
 drivers/net/wireless/iwlwifi/mvm/sta.c  |   15 ++
 2 files changed, 44 insertions(+), 20 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] bpf: hash: move select_bucket() out of htab's spinlock

2015-12-28 Thread Daniel Borkmann

On 12/26/2015 10:31 AM, Ming Lei wrote:

The spinlock is just used for protecting the per-bucket
hlist, so it isn't needed for selecting bucket.

Signed-off-by: Ming Lei 


Acked-by: Daniel Borkmann 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


skb_clone query

2015-12-28 Thread pavi1729
Hi,
  Just out of curisity, I was looking at Kernel 3.14, skb_clone
function in f_ncm.c

http://lxr.free-electrons.com/source/drivers/usb/gadget/f_ncm.c?v=3.14#L1063


QUERY : Shouldn't a kfree_skb(skb2); happen before goto err @1070 ?
  Is this not a memleak ?


1068   if (!skb_pull(skb2, index)) {
1069 ret = -EOVERFLOW;
1070 goto err;
1071   }


Thanks,
Pavi
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH stable-3.2 stable-3.12] net: fix checksum check in skb_copy_and_csum_datagram_iovec()

2015-12-28 Thread Michal Kubecek
Recent fix "net: add length argument to
skb_copy_and_csum_datagram_iovec" added to some pre-3.19 stable
branches, namely

  stable-3.2.y: commit 127500d724f8
  stable-3.12.y: commit 3e1ac3aafbd0

doesn't handle truncated reads correctly. If read length is shorter than
incoming datagram (but non-zero) and first segment of target iovec is
sufficient for read length, skb_copy_and_csum_datagram() is used to copy
checksum the data while copying it. For truncated reads this means only
the copied part is checksummed (rather than the whole datagram) so that
the check almost always fails.

Add checksum of the remaining part so that the proper checksum of the
whole datagram is computed and checked. Special care must be taken if
the copied length is odd.

For zero read length, we don't have to copy anything but we still should
check the checksum so that a peek doesn't return with a datagram which
is invalid and wouldn't be returned by an actual read.

Signed-off-by: Michal Kubecek 
---
 net/core/datagram.c | 26 +-
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index f22f120771ef..af4bf368257c 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -809,13 +809,14 @@ int skb_copy_and_csum_datagram_iovec(struct sk_buff *skb,
 int hlen, struct iovec *iov, int len)
 {
__wsum csum;
-   int chunk = skb->len - hlen;
+   int full_chunk = skb->len - hlen;
+   int chunk = min_t(int, full_chunk, len);
 
-   if (chunk > len)
-   chunk = len;
-
-   if (!chunk)
+   if (!chunk) {
+   if (__skb_checksum_complete(skb))
+   goto csum_error;
return 0;
+   }
 
/* Skip filled elements.
 * Pretty silly, look at memcpy_toiovec, though 8)
@@ -833,6 +834,21 @@ int skb_copy_and_csum_datagram_iovec(struct sk_buff *skb,
if (skb_copy_and_csum_datagram(skb, hlen, iov->iov_base,
   chunk, ))
goto fault;
+   if (full_chunk > chunk) {
+   if (chunk % 2) {
+   __be16 odd = 0;
+
+   if (skb_copy_bits(skb, hlen + chunk,
+ (char *) + 1, 1))
+   goto fault;
+   csum = add32_with_carry(odd, csum);
+   csum = skb_checksum(skb, hlen + chunk + 1,
+   full_chunk - chunk - 1,
+   csum);
+   } else
+   csum = skb_checksum(skb, hlen + chunk,
+   full_chunk - chunk, csum);
+   }
if (csum_fold(csum))
goto csum_error;
if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE))
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] unix: properly account for FDs passed over unix sockets

2015-12-28 Thread Willy Tarreau
It is possible for a process to allocate and accumulate far more FDs than
the process' limit by sending them over a unix socket then closing them
to keep the process' fd count low.

This change addresses this problem by keeping track of the number of FDs
in flight per user and preventing non-privileged processes from having
more FDs in flight than their configured FD limit.

Reported-by: socketp...@gmail.com
Suggested-by: Linus Torvalds 
Signed-off-by: Willy Tarreau 
---
It would be nice if (if accepted) it would be backported to -stable as the
issue is currently exploitable.
---
 include/linux/sched.h |  1 +
 net/unix/af_unix.c| 24 
 net/unix/garbage.c| 13 -
 3 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index edad7a4..fbf25f1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -830,6 +830,7 @@ struct user_struct {
unsigned long mq_bytes; /* How many bytes can be allocated to mqueue? */
 #endif
unsigned long locked_shm; /* How many pages of mlocked shm ? */
+   unsigned long unix_inflight;/* How many files in flight in unix 
sockets */
 
 #ifdef CONFIG_KEYS
struct key *uid_keyring;/* UID specific keyring */
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 45aebd9..d6d7b43 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1499,6 +1499,21 @@ static void unix_destruct_scm(struct sk_buff *skb)
sock_wfree(skb);
 }
 
+/*
+ * The "user->unix_inflight" variable is protected by the garbage
+ * collection lock, and we just read it locklessly here. If you go
+ * over the limit, there might be a tiny race in actually noticing
+ * it across threads. Tough.
+ */
+static inline bool too_many_unix_fds(struct task_struct *p)
+{
+   struct user_struct *user = current_user();
+
+   if (unlikely(user->unix_inflight > task_rlimit(p, RLIMIT_NOFILE)))
+   return !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN);
+   return false;
+}
+
 #define MAX_RECURSION_LEVEL 4
 
 static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
@@ -1507,6 +1522,9 @@ static int unix_attach_fds(struct scm_cookie *scm, struct 
sk_buff *skb)
unsigned char max_level = 0;
int unix_sock_count = 0;
 
+   if (too_many_unix_fds(current))
+   return -ETOOMANYREFS;
+
for (i = scm->fp->count - 1; i >= 0; i--) {
struct sock *sk = unix_get_socket(scm->fp->fp[i]);
 
@@ -1528,10 +1546,8 @@ static int unix_attach_fds(struct scm_cookie *scm, 
struct sk_buff *skb)
if (!UNIXCB(skb).fp)
return -ENOMEM;
 
-   if (unix_sock_count) {
-   for (i = scm->fp->count - 1; i >= 0; i--)
-   unix_inflight(scm->fp->fp[i]);
-   }
+   for (i = scm->fp->count - 1; i >= 0; i--)
+   unix_inflight(scm->fp->fp[i]);
return max_level;
 }
 
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index a73a226..8fcdc22 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -120,11 +120,11 @@ void unix_inflight(struct file *fp)
 {
struct sock *s = unix_get_socket(fp);
 
+   spin_lock(_gc_lock);
+
if (s) {
struct unix_sock *u = unix_sk(s);
 
-   spin_lock(_gc_lock);
-
if (atomic_long_inc_return(>inflight) == 1) {
BUG_ON(!list_empty(>link));
list_add_tail(>link, _inflight_list);
@@ -132,25 +132,28 @@ void unix_inflight(struct file *fp)
BUG_ON(list_empty(>link));
}
unix_tot_inflight++;
-   spin_unlock(_gc_lock);
}
+   fp->f_cred->user->unix_inflight++;
+   spin_unlock(_gc_lock);
 }
 
 void unix_notinflight(struct file *fp)
 {
struct sock *s = unix_get_socket(fp);
 
+   spin_lock(_gc_lock);
+
if (s) {
struct unix_sock *u = unix_sk(s);
 
-   spin_lock(_gc_lock);
BUG_ON(list_empty(>link));
 
if (atomic_long_dec_and_test(>inflight))
list_del_init(>link);
unix_tot_inflight--;
-   spin_unlock(_gc_lock);
}
+   fp->f_cred->user->unix_inflight--;
+   spin_unlock(_gc_lock);
 }
 
 static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
-- 
1.7.12.1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 1/3] bpf: hash: use atomic count

2015-12-28 Thread Ming Lei
Preparing for removing global per-hashtable lock, so
the counter need to be defined as aotmic_t first.

Acked-by: Daniel Borkmann 
Signed-off-by: Ming Lei 
---
 kernel/bpf/hashtab.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 34777b3..2615388 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -18,7 +18,7 @@ struct bpf_htab {
struct bpf_map map;
struct hlist_head *buckets;
raw_spinlock_t lock;
-   u32 count;  /* number of elements in this hashtable */
+   atomic_t count; /* number of elements in this hashtable */
u32 n_buckets;  /* number of hash buckets */
u32 elem_size;  /* size of each element in bytes */
 };
@@ -106,7 +106,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
INIT_HLIST_HEAD(>buckets[i]);
 
raw_spin_lock_init(>lock);
-   htab->count = 0;
+   atomic_set(>count, 0);
 
return >map;
 
@@ -256,7 +256,7 @@ static int htab_map_update_elem(struct bpf_map *map, void 
*key, void *value,
 
l_old = lookup_elem_raw(head, l_new->hash, key, key_size);
 
-   if (!l_old && unlikely(htab->count >= map->max_entries)) {
+   if (!l_old && unlikely(atomic_read(>count) >= map->max_entries)) {
/* if elem with this 'key' doesn't exist and we've reached
 * max_entries limit, fail insertion of new elem
 */
@@ -284,7 +284,7 @@ static int htab_map_update_elem(struct bpf_map *map, void 
*key, void *value,
hlist_del_rcu(_old->hash_node);
kfree_rcu(l_old, rcu);
} else {
-   htab->count++;
+   atomic_inc(>count);
}
raw_spin_unlock_irqrestore(>lock, flags);
 
@@ -319,7 +319,7 @@ static int htab_map_delete_elem(struct bpf_map *map, void 
*key)
 
if (l) {
hlist_del_rcu(>hash_node);
-   htab->count--;
+   atomic_dec(>count);
kfree_rcu(l, rcu);
ret = 0;
}
@@ -339,7 +339,7 @@ static void delete_all_elements(struct bpf_htab *htab)
 
hlist_for_each_entry_safe(l, n, head, hash_node) {
hlist_del_rcu(>hash_node);
-   htab->count--;
+   atomic_dec(>count);
kfree(l);
}
}
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 3/3] bpf: hash: use per-bucket spinlock

2015-12-28 Thread Ming Lei
Both htab_map_update_elem() and htab_map_delete_elem() can be
called from eBPF program, and they may be in kernel hot path,
so it isn't efficient to use a per-hashtable lock in this two
helpers.

The per-hashtable spinlock is used for protecting bucket's
hlist, and per-bucket lock is just enough. This patch converts
the per-hashtable lock into per-bucket spinlock, so that
contention can be decreased a lot.

Signed-off-by: Ming Lei 
---
 kernel/bpf/hashtab.c | 46 ++
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index d857fcb..67222a9 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -14,10 +14,14 @@
 #include 
 #include 
 
+struct bucket {
+   struct hlist_head head;
+   raw_spinlock_t lock;
+};
+
 struct bpf_htab {
struct bpf_map map;
-   struct hlist_head *buckets;
-   raw_spinlock_t lock;
+   struct bucket *buckets;
atomic_t count; /* number of elements in this hashtable */
u32 n_buckets;  /* number of hash buckets */
u32 elem_size;  /* size of each element in bytes */
@@ -88,24 +92,25 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
/* make sure page count doesn't overflow */
goto free_htab;
 
-   htab->map.pages = round_up(htab->n_buckets * sizeof(struct hlist_head) +
+   htab->map.pages = round_up(htab->n_buckets * sizeof(struct bucket) +
   htab->elem_size * htab->map.max_entries,
   PAGE_SIZE) >> PAGE_SHIFT;
 
err = -ENOMEM;
-   htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct 
hlist_head),
+   htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct bucket),
  GFP_USER | __GFP_NOWARN);
 
if (!htab->buckets) {
-   htab->buckets = vmalloc(htab->n_buckets * sizeof(struct 
hlist_head));
+   htab->buckets = vmalloc(htab->n_buckets * sizeof(struct 
bucket));
if (!htab->buckets)
goto free_htab;
}
 
-   for (i = 0; i < htab->n_buckets; i++)
-   INIT_HLIST_HEAD(>buckets[i]);
+   for (i = 0; i < htab->n_buckets; i++) {
+   INIT_HLIST_HEAD(>buckets[i].head);
+   raw_spin_lock_init(>buckets[i].lock);
+   }
 
-   raw_spin_lock_init(>lock);
atomic_set(>count, 0);
 
return >map;
@@ -120,11 +125,16 @@ static inline u32 htab_map_hash(const void *key, u32 
key_len)
return jhash(key, key_len, 0);
 }
 
-static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash)
+static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
 {
return >buckets[hash & (htab->n_buckets - 1)];
 }
 
+static inline struct hlist_head *select_bucket(struct bpf_htab *htab, u32 hash)
+{
+   return &__select_bucket(htab, hash)->head;
+}
+
 static struct htab_elem *lookup_elem_raw(struct hlist_head *head, u32 hash,
 void *key, u32 key_size)
 {
@@ -227,6 +237,7 @@ static int htab_map_update_elem(struct bpf_map *map, void 
*key, void *value,
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
struct htab_elem *l_new, *l_old;
struct hlist_head *head;
+   struct bucket *b;
unsigned long flags;
u32 key_size;
int ret;
@@ -248,10 +259,11 @@ static int htab_map_update_elem(struct bpf_map *map, void 
*key, void *value,
memcpy(l_new->key + round_up(key_size, 8), value, map->value_size);
 
l_new->hash = htab_map_hash(l_new->key, key_size);
-   head = select_bucket(htab, l_new->hash);
+   b = __select_bucket(htab, l_new->hash);
+   head = >head;
 
/* bpf_map_update_elem() can be called in_irq() */
-   raw_spin_lock_irqsave(>lock, flags);
+   raw_spin_lock_irqsave(>lock, flags);
 
l_old = lookup_elem_raw(head, l_new->hash, key, key_size);
 
@@ -285,11 +297,11 @@ static int htab_map_update_elem(struct bpf_map *map, void 
*key, void *value,
} else {
atomic_inc(>count);
}
-   raw_spin_unlock_irqrestore(>lock, flags);
+   raw_spin_unlock_irqrestore(>lock, flags);
 
return 0;
 err:
-   raw_spin_unlock_irqrestore(>lock, flags);
+   raw_spin_unlock_irqrestore(>lock, flags);
kfree(l_new);
return ret;
 }
@@ -299,6 +311,7 @@ static int htab_map_delete_elem(struct bpf_map *map, void 
*key)
 {
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
struct hlist_head *head;
+   struct bucket *b;
struct htab_elem *l;
unsigned long flags;
u32 hash, key_size;
@@ -309,9 +322,10 @@ static int htab_map_delete_elem(struct bpf_map *map, void 
*key)
key_size = map->key_size;
 
hash = htab_map_hash(key, key_size);

Re: [PATCH stable-3.2 stable-3.12] net: fix checksum check in skb_copy_and_csum_datagram_iovec()

2015-12-28 Thread Sabrina Dubroca
Hello Michal,

2015-12-28, 15:01:57 +0100, Michal Kubecek wrote:
> Recent fix "net: add length argument to
> skb_copy_and_csum_datagram_iovec" added to some pre-3.19 stable
> branches, namely
> 
>   stable-3.2.y: commit 127500d724f8
>   stable-3.12.y: commit 3e1ac3aafbd0
> 
> doesn't handle truncated reads correctly. If read length is shorter than
> incoming datagram (but non-zero) and first segment of target iovec is
> sufficient for read length, skb_copy_and_csum_datagram() is used to copy
> checksum the data while copying it. For truncated reads this means only
> the copied part is checksummed (rather than the whole datagram) so that
> the check almost always fails.

I just ran into this issue too, sorry I didn't notice it earlier :(


> Add checksum of the remaining part so that the proper checksum of the
> whole datagram is computed and checked. Special care must be taken if
> the copied length is odd.
> 
> For zero read length, we don't have to copy anything but we still should
> check the checksum so that a peek doesn't return with a datagram which
> is invalid and wouldn't be returned by an actual read.
> 
> Signed-off-by: Michal Kubecek 
> ---
>  net/core/datagram.c | 26 +-
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index f22f120771ef..af4bf368257c 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -809,13 +809,14 @@ int skb_copy_and_csum_datagram_iovec(struct sk_buff 
> *skb,
>int hlen, struct iovec *iov, int len)
>  {
>   __wsum csum;
> - int chunk = skb->len - hlen;
> + int full_chunk = skb->len - hlen;
> + int chunk = min_t(int, full_chunk, len);
>  
> - if (chunk > len)
> - chunk = len;
> -
> - if (!chunk)
> + if (!chunk) {
> + if (__skb_checksum_complete(skb))
> + goto csum_error;
>   return 0;
> + }
>  
>   /* Skip filled elements.
>* Pretty silly, look at memcpy_toiovec, though 8)
> @@ -833,6 +834,21 @@ int skb_copy_and_csum_datagram_iovec(struct sk_buff *skb,
>   if (skb_copy_and_csum_datagram(skb, hlen, iov->iov_base,
>  chunk, ))
>   goto fault;
> + if (full_chunk > chunk) {
> + if (chunk % 2) {
> + __be16 odd = 0;
> +
> + if (skb_copy_bits(skb, hlen + chunk,
> +   (char *) + 1, 1))
> + goto fault;
> + csum = add32_with_carry(odd, csum);
> + csum = skb_checksum(skb, hlen + chunk + 1,
> + full_chunk - chunk - 1,
> + csum);
> + } else
> + csum = skb_checksum(skb, hlen + chunk,
> + full_chunk - chunk, csum);
> + }
>   if (csum_fold(csum))
>   goto csum_error;
>   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE))
> -- 
> 2.6.4


This adds quite a bit of complexity.  I am considering a revert of my
buggy patch, and use what Eric Dumazet suggested instead:

https://patchwork.ozlabs.org/patch/543562/

What do you think?
Eric, would you submit your patch formally?


Thanks

-- 
Sabrina
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH net-next 1/3] uapi: add MACsec bits

2015-12-28 Thread Sabrina Dubroca
Signed-off-by: Sabrina Dubroca 
Reviewed-by: Hannes Frederic Sowa 
---
 include/uapi/linux/Kbuild  |   1 +
 include/uapi/linux/if_ether.h  |   1 +
 include/uapi/linux/if_link.h   |  21 ++
 include/uapi/linux/if_macsec.h | 160 +
 4 files changed, 183 insertions(+)
 create mode 100644 include/uapi/linux/if_macsec.h

diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index c2e5d6cb34e3..3ee4211af77b 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -173,6 +173,7 @@ header-y += if_hippi.h
 header-y += if_infiniband.h
 header-y += if_link.h
 header-y += if_ltalk.h
+header-y += if_macsec.h
 header-y += if_packet.h
 header-y += if_phonet.h
 header-y += if_plip.h
diff --git a/include/uapi/linux/if_ether.h b/include/uapi/linux/if_ether.h
index ea9221b0331a..4a93051c578c 100644
--- a/include/uapi/linux/if_ether.h
+++ b/include/uapi/linux/if_ether.h
@@ -83,6 +83,7 @@
 #define ETH_P_8021AD   0x88A8  /* 802.1ad Service VLAN */
 #define ETH_P_802_EX1  0x88B5  /* 802.1 Local Experimental 1.  */
 #define ETH_P_TIPC 0x88CA  /* TIPC */
+#define ETH_P_MACSEC   0x88E5  /* 802.1ae MACsec */
 #define ETH_P_8021AH   0x88E7  /* 802.1ah Backbone Service Tag */
 #define ETH_P_MVRP 0x88F5  /* 802.1Q MVRP  */
 #define ETH_P_1588 0x88F7  /* IEEE 1588 Timesync */
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index a30b78090594..be0e6caf0bf0 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -401,6 +401,27 @@ enum {
 
 #define IFLA_VRF_MAX (__IFLA_VRF_MAX - 1)
 
+/* MACSEC section */
+enum {
+   IFLA_MACSEC_UNSPEC,
+   IFLA_MACSEC_SCI,
+   IFLA_MACSEC_PORT,
+   IFLA_MACSEC_ICV_LEN,
+   IFLA_MACSEC_CIPHER_SUITE,
+   IFLA_MACSEC_WINDOW,
+   IFLA_MACSEC_ENCODING_SA,
+   IFLA_MACSEC_ENCRYPT,
+   IFLA_MACSEC_PROTECT,
+   IFLA_MACSEC_INC_SCI,
+   IFLA_MACSEC_ES,
+   IFLA_MACSEC_SCB,
+   IFLA_MACSEC_REPLAY_PROTECT,
+   IFLA_MACSEC_VALIDATION,
+   __IFLA_MACSEC_MAX,
+};
+
+#define IFLA_MACSEC_MAX (__IFLA_MACSEC_MAX - 1)
+
 /* IPVLAN section */
 enum {
IFLA_IPVLAN_UNSPEC,
diff --git a/include/uapi/linux/if_macsec.h b/include/uapi/linux/if_macsec.h
new file mode 100644
index ..405cbf6a8c05
--- /dev/null
+++ b/include/uapi/linux/if_macsec.h
@@ -0,0 +1,160 @@
+/*
+ * include/uapi/linux/if_macsec.h - MACsec device
+ *
+ * Copyright (c) 2015 Sabrina Dubroca 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef _UAPI_MACSEC_H
+#define _UAPI_MACSEC_H
+
+#define MACSEC_GENL_NAME "macsec"
+#define MACSEC_GENL_VERSION 1
+
+#define MACSEC_MAX_KEY_LEN 128
+
+
+#define DEFAULT_CIPHER_NAME "GCM-AES-128"
+#define DEFAULT_CIPHER_ID   0x008002000101ULL
+#define DEFAULT_CIPHER_ALT  0x0080C2000101ULL
+
+#define MACSEC_MIN_ICV_LEN 8
+#define MACSEC_MAX_ICV_LEN 32
+
+enum macsec_attrs {
+   MACSEC_ATTR_UNSPEC,
+   MACSEC_ATTR_IFINDEX,
+   MACSEC_ATTR_SCI,
+   MACSEC_ATTR_PORT,
+   MACSEC_ATTR_ENCODING_SA,
+   MACSEC_ATTR_PN,
+   MACSEC_ATTR_WINDOW,
+   MACSEC_ATTR_AN,
+   MACSEC_ATTR_KEYID,
+   MACSEC_ATTR_KEY,
+   MACSEC_ATTR_CIPHER_SUITE,
+   MACSEC_ATTR_ICV_LEN,
+   MACSEC_TXSA_LIST,
+   MACSEC_RXSC_LIST,
+   MACSEC_TXSC_STATS,
+   MACSEC_SECY_STATS,
+   MACSEC_ATTR_SC_ACTIVE,
+   MACSEC_ATTR_SA_ACTIVE,
+   MACSEC_ATTR_PROTECT,
+   MACSEC_ATTR_REPLAY,
+   MACSEC_ATTR_OPER,
+   MACSEC_ATTR_VALIDATE,
+   MACSEC_ATTR_ENCRYPT,
+   MACSEC_ATTR_INC_SCI,
+   MACSEC_ATTR_ES,
+   MACSEC_ATTR_SCB,
+   __MACSEC_ATTR_END,
+   NUM_MACSEC_ATTR = __MACSEC_ATTR_END,
+   MACSEC_ATTR_MAX = __MACSEC_ATTR_END - 1,
+};
+
+enum macsec_sa_list_attrs {
+   MACSEC_SA_LIST_UNSPEC,
+   MACSEC_SA,
+   __MACSEC_ATTR_SA_LIST_MAX,
+   MACSEC_ATTR_SA_LIST_MAX = __MACSEC_ATTR_SA_LIST_MAX - 1,
+};
+
+enum macsec_rxsc_list_attrs {
+   MACSEC_RXSC_LIST_UNSPEC,
+   MACSEC_RXSC,
+   __MACSEC_ATTR_RXSC_LIST_MAX,
+   MACSEC_ATTR_RXSC_LIST_MAX = __MACSEC_ATTR_RXSC_LIST_MAX - 1,
+};
+
+enum macsec_rxsc_attrs {
+   MACSEC_ATTR_SC_UNSPEC,
+   MACSEC_ATTR_SC_STATE,
+   MACSEC_ATTR_SC_SCI,
+   MACSEC_RXSA_LIST,
+   MACSEC_RXSC_STATS,
+   __MACSEC_ATTR_SC_MAX,
+   MACSEC_ATTR_SC_MAX = __MACSEC_ATTR_SC_MAX - 1,
+};
+
+enum macsec_sa_attrs {
+   MACSEC_ATTR_SA_UNSPEC,
+   MACSEC_ATTR_SA_STATE,
+   MACSEC_ATTR_SA_AN,
+   MACSEC_ATTR_SA_PN,
+   MACSEC_ATTR_SA_KEYID,
+   

[RFC PATCH net-next 3/3] macsec: introduce IEEE 802.1AE driver

2015-12-28 Thread Sabrina Dubroca
This is an implementation of MACsec/IEEE 802.1AE.  This driver
provides authentication and encryption of traffic in a LAN, typically
with GCM-AES-128, and optional replay protection.

http://standards.ieee.org/getieee802/download/802.1AE-2006.pdf

Signed-off-by: Sabrina Dubroca 
Reviewed-by: Hannes Frederic Sowa 
---
 drivers/net/Kconfig  |7 +
 drivers/net/Makefile |1 +
 drivers/net/macsec.c | 2991 ++
 3 files changed, 2999 insertions(+)
 create mode 100644 drivers/net/macsec.c

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index f184fb5bd110..2a1ba62b7da2 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -193,6 +193,13 @@ config GENEVE
  To compile this driver as a module, choose M here: the module
  will be called geneve.
 
+config MACSEC
+   tristate "IEEE 802.1AE MAC-level encryption (MACsec)"
+   select CRYPTO_AES
+   select CRYPTO_GCM
+   ---help---
+  MACsec is an encryption standard for Ethernet.
+
 config NETCONSOLE
tristate "Network console logging support"
---help---
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 900b0c5320bb..1aa7cb845663 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_IPVLAN) += ipvlan/
 obj-$(CONFIG_DUMMY) += dummy.o
 obj-$(CONFIG_EQUALIZER) += eql.o
 obj-$(CONFIG_IFB) += ifb.o
+obj-$(CONFIG_MACSEC) += macsec.o
 obj-$(CONFIG_MACVLAN) += macvlan.o
 obj-$(CONFIG_MACVTAP) += macvtap.o
 obj-$(CONFIG_MII) += mii.o
diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
new file mode 100644
index ..64975ef2c6af
--- /dev/null
+++ b/drivers/net/macsec.c
@@ -0,0 +1,2991 @@
+/*
+ * drivers/net/macsec.c - MACsec device
+ *
+ * Copyright (c) 2015 Sabrina Dubroca 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+typedef u64 __bitwise sci_t;
+
+#define MACSEC_SCI_LEN 8
+
+/* SecTAG length = macsec_eth_header without the optional SCI */
+#define MACSEC_TAG_LEN 6
+
+struct macsec_eth_header {
+   struct ethhdr eth;
+   /* SecTAG */
+   u8  tci_an;
+#if defined(__LITTLE_ENDIAN_BITFIELD)
+   u8  short_length:6,
+ unused:2;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+   u8unused:2,
+   short_length:6;
+#else
+#error "Please fix "
+#endif
+   __be32 packet_number;
+   u8 secure_channel_id[8]; /* optional */
+} __packed;
+
+#define MACSEC_TCI_VERSION 0x80
+#define MACSEC_TCI_ES  0x40 /* end station */
+#define MACSEC_TCI_SC  0x20 /* SCI present */
+#define MACSEC_TCI_SCB 0x10 /* epon */
+#define MACSEC_TCI_E   0x08 /* encryption */
+#define MACSEC_TCI_C   0x04 /* changed text */
+#define MACSEC_AN_MASK 0x03 /* association number */
+#define MACSEC_TCI_CONFID  (MACSEC_TCI_E | MACSEC_TCI_C)
+
+#define MACSEC_SHORTLEN_THR 48
+
+#define GCM_AES_IV_LEN 12
+#define DEFAULT_ICV_LEN 16
+
+#define for_each_rxsc(secy, sc)\
+   for (sc = rcu_dereference(secy->rx_sc); \
+sc;\
+sc = rcu_dereference(sc->next))
+#define for_each_rxsc_rtnl(secy, sc)   \
+   for (sc = rtnl_dereference(secy->rx_sc);\
+sc;\
+sc = rtnl_dereference(sc->next))
+
+struct gcm_iv {
+   union {
+   u8 secure_channel_id[8];
+   sci_t sci;
+   };
+   __be32 pn;
+};
+
+/**
+ * struct macsec_key - SA key
+ * @id user-provided key identifier
+ * @tfm crypto struct, key storage
+ */
+struct macsec_key {
+   u64 id;
+   struct crypto_aead *tfm;
+};
+
+/**
+ * struct macsec_rx_sa - receive secure association
+ * @active
+ * @next_pn packet number expected for the next packet
+ * @lock protects next_pn manipulations
+ * @key key structure
+ * @stats per-SA stats
+ */
+struct macsec_rx_sa {
+   bool active;
+   u32 next_pn;
+   spinlock_t lock;
+   struct macsec_key key;
+   struct macsec_rx_sa_stats __percpu *stats;
+   atomic_t refcnt;
+   struct rcu_head rcu;
+};
+
+struct pcpu_rx_sc_stats {
+   struct macsec_rx_sc_stats stats;
+   struct u64_stats_sync syncp;
+};
+
+/**
+ * struct macsec_rx_sc - receive secure channel
+ * @sci secure channel identifier for this SC
+ * @active channel is active
+ * @sa array of secure associations
+ * @stats per-SC stats
+ */
+struct macsec_rx_sc {
+   struct macsec_rx_sc __rcu *next;
+   sci_t sci;
+   bool active;
+   struct macsec_rx_sa __rcu *sa[4];
+   struct 

[RFC PATCH net-next 2/3] net: add MACsec netdevice priv_flags and helper

2015-12-28 Thread Sabrina Dubroca
Signed-off-by: Sabrina Dubroca 
Reviewed-by: Hannes Frederic Sowa 
---
 include/linux/netdevice.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c20b814e46a0..4f80f621ecf9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1319,6 +1319,7 @@ enum netdev_priv_flags {
IFF_OPENVSWITCH = 1<<22,
IFF_L3MDEV_SLAVE= 1<<23,
IFF_TEAM= 1<<24,
+   IFF_MACSEC  = 1<<25,
 };
 
 #define IFF_802_1Q_VLANIFF_802_1Q_VLAN
@@ -1346,6 +1347,7 @@ enum netdev_priv_flags {
 #define IFF_OPENVSWITCHIFF_OPENVSWITCH
 #define IFF_L3MDEV_SLAVE   IFF_L3MDEV_SLAVE
 #define IFF_TEAM   IFF_TEAM
+#define IFF_MACSEC IFF_MACSEC
 
 /**
  * struct net_device - The DEVICE structure.
@@ -3964,6 +3966,11 @@ static inline void skb_gso_error_unwind(struct sk_buff 
*skb, __be16 protocol,
skb->mac_len = mac_len;
 }
 
+static inline bool netif_is_macsec(const struct net_device *dev)
+{
+   return dev->priv_flags & IFF_MACSEC;
+}
+
 static inline bool netif_is_macvlan(const struct net_device *dev)
 {
return dev->priv_flags & IFF_MACVLAN;
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH net-next 0/3] MACsec IEEE 802.1AE implementation

2015-12-28 Thread Sabrina Dubroca
MACsec (IEEE 802.1AE [0]) is a protocol that provides security for
wired ethernet LANs.  MACsec offers two protection modes:
authentication only, or authenticated encryption.

MACsec defines "secure channels" that allow transmission from one node
to one or more others.  Communication on a channel is done over a
succession of "secure associations", that each use a specific key.
Secure associations are identified by their "association number" in
the range 0..3.  A secure association is retired when its 32-bit
packet number would wrap, and the same association number can later be
reused with a new key and packet number.

The standard mode of encryption is GCM AES with 128 bits keys,
although an extension allows 256 bits keys [1] (not implemented in
this submission).

When using MACsec, an extra header, called "SecTAG", is added between
the ethernet header and the original payload:

 +-+++
 |(MACsec ethertype)   | TCI_AN |   SL   |
 +-+++
 |   Packet Number   |
 +---+
 | Secure Channel Identifier |
 |(optional) |
 +---+

TCI_AN:
 version
 end_station
 sci_present
 scb
 encrypted
 changed_text
 association_number (2 bits)
SL:
 short_length (6 bits)
 unused (2 bits)

The ethertype for the packet is set to 0x88E5, and the original
ethertype becomes part of the secure payload, which may be encrypted.
The ethernet header and the SecTAG are always transmitted in the
clear, but are integrity-protected.

MACsec supports optional replay protection with a configurable replay
window.

MACsec is designed to be used with the MKA extension to 802.1X (MACsec
Key Agreement protocol) [2], which provides channel attribution and
key distribution to the nodes, but can also be used with static keys
getting fed manually by an administrator.

Optional (not supported yet) features:
 - confidentiality offset: in encryption mode, part of the payload may
   be left unencrypted.
 - choice of cipher suite: GCM AES with 256 bits has been standardised
   [1].


Implementation

A netdevice is created on top of a real device for each TX secure
channel, like we do for VLANs.  Multiple TX channels can be created on
top of the same underlying device.

Several other approaches were considered for the RX path:

 - dev_add_pack: doesn't work, because we want to filter out
   unprotected packets
 - transparent mode: MACsec would be enabled directly on the real
   netdevice.  For this, we cannot use a rx_handler directly because
   MACsec must be available for underlying devices enslaved in a
   bridge or in a bond, so we need a hook directly in
   __netif_receive_skb_core.  This approach makes it harder to filter
   non-encrypted packets on RX without forcing the user to setup some
   rules, so the "transparent" mode is not so transparent after all.
   It also makes TX more complex than with a dedicated netdevice.

One issue with the proposed implementation is that the qdisc layer for
the real device operates on already encrypted packets.


Netlink API

This is currently a mix of rtnetlink (to create the device and set up
the TX channel) and genl (for RX channels, secure associations and
their keys).  genl provides clean demultiplexing of the {TX,RX}{SC,SA}
commands.


Use cases

The normal use case is wired LANs, including veth and slave devices
for bonding/teaming or bridges.

MACsec can also be used on any device that makes a full ethernet
header visible, for example VXLAN.
The VXLAN+MACsec setup would be:

 hypervisor| virtual machine
--|--

And the packets would look like this:

| eth | IP | UDP | VXLAN | eth | MACsec | IP | ... | MACsec ICV |

One benefit on this approach to encryption in the cloud is that the
payload is encrypted by the tenant, not by the tunnel provider, thus
the tenant has full control over the keys.


Future plans:
 - offload to hardware, on nics that support it (at least some ixgbe)
 - implement optional features

The corresponding iproute code will be posted later.


[0] http://standards.ieee.org/getieee802/download/802.1AE-2006.pdf
[1] http://standards.ieee.org/getieee802/download/802.1AEbn-2011.pdf
[2] http://standards.ieee.org/getieee802/download/802.1X-2010.pdf


Sabrina Dubroca (3):
  uapi: add MACsec bits
  net: add MACsec netdevice priv_flags and helper
  macsec: introduce IEEE 802.1AE driver

 drivers/net/Kconfig|7 +
 drivers/net/Makefile   |1 +
 drivers/net/macsec.c   | 2991 
 include/linux/netdevice.h  |7 +
 include/uapi/linux/Kbuild  |1 +
 

Re: [PATCH v1 3/3] bpf: hash: use per-bucket spinlock

2015-12-28 Thread Daniel Borkmann

On 12/28/2015 01:55 PM, Ming Lei wrote:

Both htab_map_update_elem() and htab_map_delete_elem() can be
called from eBPF program, and they may be in kernel hot path,
so it isn't efficient to use a per-hashtable lock in this two
helpers.

The per-hashtable spinlock is used for protecting bucket's
hlist, and per-bucket lock is just enough. This patch converts
the per-hashtable lock into per-bucket spinlock, so that
contention can be decreased a lot.

Signed-off-by: Ming Lei 
---
  kernel/bpf/hashtab.c | 46 ++
  1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index d857fcb..67222a9 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -14,10 +14,14 @@
  #include 
  #include 

+struct bucket {
+   struct hlist_head head;
+   raw_spinlock_t lock;
+};
+
  struct bpf_htab {
struct bpf_map map;
-   struct hlist_head *buckets;
-   raw_spinlock_t lock;
+   struct bucket *buckets;
atomic_t count; /* number of elements in this hashtable */
u32 n_buckets;  /* number of hash buckets */
u32 elem_size;  /* size of each element in bytes */
@@ -88,24 +92,25 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
/* make sure page count doesn't overflow */
goto free_htab;


When adapting memory accounting and allocation sizes below where you replace
sizeof(struct hlist_head) with sizeof(struct bucket), is there a reason why
you don't update the overflow checks along with it?

[...]
/* prevent zero size kmalloc and check for u32 overflow */
if (htab->n_buckets == 0 ||
htab->n_buckets > U32_MAX / sizeof(struct hlist_head))
goto free_htab;

if ((u64) htab->n_buckets * sizeof(struct hlist_head) +
(u64) htab->elem_size * htab->map.max_entries >=
U32_MAX - PAGE_SIZE)
/* make sure page count doesn't overflow */
goto free_htab;
[...]


-   htab->map.pages = round_up(htab->n_buckets * sizeof(struct hlist_head) +
+   htab->map.pages = round_up(htab->n_buckets * sizeof(struct bucket) +
   htab->elem_size * htab->map.max_entries,
   PAGE_SIZE) >> PAGE_SHIFT;

err = -ENOMEM;
-   htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct 
hlist_head),
+   htab->buckets = kmalloc_array(htab->n_buckets, sizeof(struct bucket),
  GFP_USER | __GFP_NOWARN);

if (!htab->buckets) {
-   htab->buckets = vmalloc(htab->n_buckets * sizeof(struct 
hlist_head));
+   htab->buckets = vmalloc(htab->n_buckets * sizeof(struct 
bucket));
if (!htab->buckets)
goto free_htab;
}


[...]

Thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 0/3] bpf: hash: use per-bucket spinlock

2015-12-28 Thread Ming Lei
Hi,

This patchset tries to optimize ebpf hash map, and follows
the idea:

Both htab_map_update_elem() and htab_map_delete_elem()
can be called from eBPF program, and they may be in kernel
hot path, it isn't efficient to use a per-hashtable lock
in this two helpers, so this patch converts the lock into
per-bucket spinlock.

With this patchset, looks the performance penalty from eBPF
decreased a lot, see the following test:

1) run 'tools/biolatency' of bcc before running block test;

2) run fio to test block throught over /dev/nullb0,
(randread, 16jobs, libaio, 4k bs) and the test box
is one 24cores(dual sockets) VM server:
- without patchset:  607K IOPS
- with this patchset: 1184K IOPS
- without running eBPF prog: 1492K IOPS

TODO:
- remove the per-hashtable atomic counter

V1:
- fix the wrong 3/3 patch 


 kernel/bpf/hashtab.c | 60 +++-
 1 file changed, 36 insertions(+), 24 deletions(-)

Thanks,

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v1 2/3] bpf: hash: move select_bucket() out of htab's spinlock

2015-12-28 Thread Ming Lei
The spinlock is just used for protecting the per-bucket
hlist, so it isn't needed for selecting bucket.

Acked-by: Daniel Borkmann 
Signed-off-by: Ming Lei 
---
 kernel/bpf/hashtab.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 2615388..d857fcb 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -248,12 +248,11 @@ static int htab_map_update_elem(struct bpf_map *map, void 
*key, void *value,
memcpy(l_new->key + round_up(key_size, 8), value, map->value_size);
 
l_new->hash = htab_map_hash(l_new->key, key_size);
+   head = select_bucket(htab, l_new->hash);
 
/* bpf_map_update_elem() can be called in_irq() */
raw_spin_lock_irqsave(>lock, flags);
 
-   head = select_bucket(htab, l_new->hash);
-
l_old = lookup_elem_raw(head, l_new->hash, key, key_size);
 
if (!l_old && unlikely(atomic_read(>count) >= map->max_entries)) {
@@ -310,11 +309,10 @@ static int htab_map_delete_elem(struct bpf_map *map, void 
*key)
key_size = map->key_size;
 
hash = htab_map_hash(key, key_size);
+   head = select_bucket(htab, hash);
 
raw_spin_lock_irqsave(>lock, flags);
 
-   head = select_bucket(htab, hash);
-
l = lookup_elem_raw(head, hash, key, key_size);
 
if (l) {
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] /drivers/net/wireless/ath/ath9k remove unnecessary ?: operator

2015-12-28 Thread Ivan Safonov
((thermometer < 0) ? 0 : (thermometer == X)) and (thermometer == X) are equal 
for X >= 0.

Signed-off-by: Ivan Safonov 
---
 drivers/net/wireless/ath/ath9k/ar9003_eeprom.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/ar9003_eeprom.c 
b/drivers/net/wireless/ath/ath9k/ar9003_eeprom.c
index 8b4561e..35e57f7 100644
--- a/drivers/net/wireless/ath/ath9k/ar9003_eeprom.c
+++ b/drivers/net/wireless/ath/ath9k/ar9003_eeprom.c
@@ -4097,16 +4097,16 @@ static void ar9003_hw_thermometer_apply(struct ath_hw 
*ah)
REG_RMW_FIELD(ah, AR_PHY_65NM_CH2_RXTX4,
  AR_PHY_65NM_CH0_RXTX4_THERM_ON_OVR, therm_on);
 
-   therm_on = (thermometer < 0) ? 0 : (thermometer == 0);
+   therm_on = thermometer == 0;
REG_RMW_FIELD(ah, AR_PHY_65NM_CH0_RXTX4,
  AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on);
if (pCap->chip_chainmask & BIT(1)) {
-   therm_on = (thermometer < 0) ? 0 : (thermometer == 1);
+   therm_on = thermometer == 1;
REG_RMW_FIELD(ah, AR_PHY_65NM_CH1_RXTX4,
  AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on);
}
if (pCap->chip_chainmask & BIT(2)) {
-   therm_on = (thermometer < 0) ? 0 : (thermometer == 2);
+   therm_on = thermometer == 2;
REG_RMW_FIELD(ah, AR_PHY_65NM_CH2_RXTX4,
  AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on);
}
-- 
2.4.10

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 net-next 0/3] Ethtool support for phy stats

2015-12-28 Thread Florian Fainelli
On December 28, 2015 12:58:25 AM PST, Andrew Lunn  wrote:
>On Mon, Dec 28, 2015 at 12:44:27AM -0500, David Miller wrote:
>> From: David Miller 
>> Date: Mon, 28 Dec 2015 00:32:43 -0500 (EST)
>> 
>> > From: Andrew Lunn 
>> > Date: Sun, 27 Dec 2015 12:58:25 +0100
>> > 
>> >> This patchset add ethtool support for reading statistics from the
>PHY.
>> >> The Marvell and Micrel Phys are then extended to report receiver
>> >> packet errors and idle errors.
>> >> 
>> >> v2:
>> >>   Fix linking when phylib is not enabled.
>> > 
>> > Series applied, thanks Andrew.
>> 
>> I have to revert this.
>> 
>> You can't call into PHY library code from net/core/ethtool.c,
>> because PHY lib can be modular.
>
>O.K.
>
>How about we change PHYLIB from a tristate to a bool?

Could we make this in a two layer design instead? Core ethtool changes call 
into getter functions which are implemented by the netdev driver, if the netdev 
driver happens to use PHYLIB, calls into the PHYLIB helpers with its phydev? 
That would also allow a network driver already supporting PHY stats to report 
them through your new interface.

-- 
Florian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH stable-3.2 stable-3.12] net: fix checksum check in skb_copy_and_csum_datagram_iovec()

2015-12-28 Thread Michal Kubecek
On Mon, Dec 28, 2015 at 03:29:42PM +0100, Sabrina Dubroca wrote:
> 2015-12-28, 15:01:57 +0100, Michal Kubecek wrote:
> > Recent fix "net: add length argument to
> > skb_copy_and_csum_datagram_iovec" added to some pre-3.19 stable
> > branches, namely
> > 
> >   stable-3.2.y: commit 127500d724f8
> >   stable-3.12.y: commit 3e1ac3aafbd0
> > 
> > doesn't handle truncated reads correctly. If read length is shorter than
> > incoming datagram (but non-zero) and first segment of target iovec is
> > sufficient for read length, skb_copy_and_csum_datagram() is used to copy
> > checksum the data while copying it. For truncated reads this means only
> > the copied part is checksummed (rather than the whole datagram) so that
> > the check almost always fails.
> 
> I just ran into this issue too, sorry I didn't notice it earlier :(
> 
> > Add checksum of the remaining part so that the proper checksum of the
> > whole datagram is computed and checked. Special care must be taken if
> > the copied length is odd.
> > 
> > For zero read length, we don't have to copy anything but we still should
> > check the checksum so that a peek doesn't return with a datagram which
> > is invalid and wouldn't be returned by an actual read.
> > 
> > Signed-off-by: Michal Kubecek 
> > ---
> >  net/core/datagram.c | 26 +-
> >  1 file changed, 21 insertions(+), 5 deletions(-)
> > 
> > diff --git a/net/core/datagram.c b/net/core/datagram.c
> > index f22f120771ef..af4bf368257c 100644
> > --- a/net/core/datagram.c
> > +++ b/net/core/datagram.c
> > @@ -809,13 +809,14 @@ int skb_copy_and_csum_datagram_iovec(struct sk_buff 
> > *skb,
> >  int hlen, struct iovec *iov, int len)
> >  {
> > __wsum csum;
> > -   int chunk = skb->len - hlen;
> > +   int full_chunk = skb->len - hlen;
> > +   int chunk = min_t(int, full_chunk, len);
> >  
> > -   if (chunk > len)
> > -   chunk = len;
> > -
> > -   if (!chunk)
> > +   if (!chunk) {
> > +   if (__skb_checksum_complete(skb))
> > +   goto csum_error;
> > return 0;
> > +   }
> >  
> > /* Skip filled elements.
> >  * Pretty silly, look at memcpy_toiovec, though 8)
> > @@ -833,6 +834,21 @@ int skb_copy_and_csum_datagram_iovec(struct sk_buff 
> > *skb,
> > if (skb_copy_and_csum_datagram(skb, hlen, iov->iov_base,
> >chunk, ))
> > goto fault;
> > +   if (full_chunk > chunk) {
> > +   if (chunk % 2) {
> > +   __be16 odd = 0;
> > +
> > +   if (skb_copy_bits(skb, hlen + chunk,
> > + (char *) + 1, 1))
> > +   goto fault;
> > +   csum = add32_with_carry(odd, csum);
> > +   csum = skb_checksum(skb, hlen + chunk + 1,
> > +   full_chunk - chunk - 1,
> > +   csum);
> > +   } else
> > +   csum = skb_checksum(skb, hlen + chunk,
> > +   full_chunk - chunk, csum);
> > +   }
> > if (csum_fold(csum))
> > goto csum_error;
> > if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE))
> > -- 
> > 2.6.4
> 
> 
> This adds quite a bit of complexity.

I'm not really happy about it either. :-( Most of the complexity comes
from the corner case when one 16-bit word is divided between the copied
part and the rest - but I can't see a nicer way to handle it.

There is another option: in the case of truncated read, we could simply
take the first branch where copying is separated from checksumming. This
would be less efficient, obviously, but I must admit I have no idea how
much.

> I am considering a revert of my
> buggy patch, and use what Eric Dumazet suggested instead:
> 
> https://patchwork.ozlabs.org/patch/543562/
> 
> What do you think?

I believe that would work. I have a little bad feeling about such
solution as it would keep the function broken and just rely on not
hitting it in the case when it matters. But it worked that way for quite
some time so it's probably safe.

  Michal Kubecek

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] /drivers/net/wireless/ath/ath9k remove unnecessary ?: operator

2015-12-28 Thread Joe Perches
On Mon, 2015-12-28 at 20:48 +0700, Ivan Safonov wrote:
> ((thermometer < 0) ? 0 : (thermometer == X)) and (thermometer == X) are equal 
> for X >= 0.

X is not guaranteed to be >= 0 here

> Signed-off-by: Ivan Safonov 
> ---
>  drivers/net/wireless/ath/ath9k/ar9003_eeprom.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/wireless/ath/ath9k/ar9003_eeprom.c 
> b/drivers/net/wireless/ath/ath9k/ar9003_eeprom.c
[]
> @@ -4097,16 +4097,16 @@ static void ar9003_hw_thermometer_apply(struct ath_hw 
> *ah)
>   REG_RMW_FIELD(ah, AR_PHY_65NM_CH2_RXTX4,
>     AR_PHY_65NM_CH0_RXTX4_THERM_ON_OVR, therm_on);
>  
> - therm_on = (thermometer < 0) ? 0 : (thermometer == 0);
> + therm_on = thermometer == 0;

This code is not equivalent.

Check what happens when thermometer is -1.

>   REG_RMW_FIELD(ah, AR_PHY_65NM_CH0_RXTX4,
>     AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on);
>   if (pCap->chip_chainmask & BIT(1)) {
> - therm_on = (thermometer < 0) ? 0 : (thermometer == 1);
> + therm_on = thermometer == 1;
>   REG_RMW_FIELD(ah, AR_PHY_65NM_CH1_RXTX4,
>     AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on);
>   }
>   if (pCap->chip_chainmask & BIT(2)) {
> - therm_on = (thermometer < 0) ? 0 : (thermometer == 2);
> + therm_on = thermometer == 2;
>   REG_RMW_FIELD(ah, AR_PHY_65NM_CH2_RXTX4,
>     AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on);
>   }

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] batman-adv: use to_delayed_work

2015-12-28 Thread Geliang Tang
Use to_delayed_work() instead of open-coding it.

Signed-off-by: Geliang Tang 
---
 net/batman-adv/bridge_loop_avoidance.c | 2 +-
 net/batman-adv/distributed-arp-table.c | 2 +-
 net/batman-adv/network-coding.c| 2 +-
 net/batman-adv/originator.c| 2 +-
 net/batman-adv/send.c  | 4 ++--
 net/batman-adv/translation-table.c | 2 +-
 6 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/batman-adv/bridge_loop_avoidance.c 
b/net/batman-adv/bridge_loop_avoidance.c
index 99dcae3..4f47c82 100644
--- a/net/batman-adv/bridge_loop_avoidance.c
+++ b/net/batman-adv/bridge_loop_avoidance.c
@@ -1183,7 +1183,7 @@ static void batadv_bla_periodic_work(struct work_struct 
*work)
struct batadv_hard_iface *primary_if;
int i;
 
-   delayed_work = container_of(work, struct delayed_work, work);
+   delayed_work = to_delayed_work(work);
priv_bla = container_of(delayed_work, struct batadv_priv_bla, work);
bat_priv = container_of(priv_bla, struct batadv_priv, bla);
primary_if = batadv_primary_if_get_selected(bat_priv);
diff --git a/net/batman-adv/distributed-arp-table.c 
b/net/batman-adv/distributed-arp-table.c
index a49c705..64facfc 100644
--- a/net/batman-adv/distributed-arp-table.c
+++ b/net/batman-adv/distributed-arp-table.c
@@ -138,7 +138,7 @@ static void batadv_dat_purge(struct work_struct *work)
struct batadv_priv_dat *priv_dat;
struct batadv_priv *bat_priv;
 
-   delayed_work = container_of(work, struct delayed_work, work);
+   delayed_work = to_delayed_work(work);
priv_dat = container_of(delayed_work, struct batadv_priv_dat, work);
bat_priv = container_of(priv_dat, struct batadv_priv, dat);
 
diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c
index f5276be..a5ffe20 100644
--- a/net/batman-adv/network-coding.c
+++ b/net/batman-adv/network-coding.c
@@ -700,7 +700,7 @@ static void batadv_nc_worker(struct work_struct *work)
struct batadv_priv *bat_priv;
unsigned long timeout;
 
-   delayed_work = container_of(work, struct delayed_work, work);
+   delayed_work = to_delayed_work(work);
priv_nc = container_of(delayed_work, struct batadv_priv_nc, work);
bat_priv = container_of(priv_nc, struct batadv_priv, nc);
 
diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c
index 3c782a33..0eb5d01 100644
--- a/net/batman-adv/originator.c
+++ b/net/batman-adv/originator.c
@@ -1232,7 +1232,7 @@ static void batadv_purge_orig(struct work_struct *work)
struct delayed_work *delayed_work;
struct batadv_priv *bat_priv;
 
-   delayed_work = container_of(work, struct delayed_work, work);
+   delayed_work = to_delayed_work(work);
bat_priv = container_of(delayed_work, struct batadv_priv, orig_work);
_batadv_purge_orig(bat_priv);
queue_delayed_work(batadv_event_workqueue,
diff --git a/net/batman-adv/send.c b/net/batman-adv/send.c
index f664324..848ce65 100644
--- a/net/batman-adv/send.c
+++ b/net/batman-adv/send.c
@@ -506,7 +506,7 @@ static void batadv_send_outstanding_bcast_packet(struct 
work_struct *work)
struct net_device *soft_iface;
struct batadv_priv *bat_priv;
 
-   delayed_work = container_of(work, struct delayed_work, work);
+   delayed_work = to_delayed_work(work);
forw_packet = container_of(delayed_work, struct batadv_forw_packet,
   delayed_work);
soft_iface = forw_packet->if_incoming->soft_iface;
@@ -559,7 +559,7 @@ void batadv_send_outstanding_bat_ogm_packet(struct 
work_struct *work)
struct batadv_forw_packet *forw_packet;
struct batadv_priv *bat_priv;
 
-   delayed_work = container_of(work, struct delayed_work, work);
+   delayed_work = to_delayed_work(work);
forw_packet = container_of(delayed_work, struct batadv_forw_packet,
   delayed_work);
bat_priv = netdev_priv(forw_packet->if_incoming->soft_iface);
diff --git a/net/batman-adv/translation-table.c 
b/net/batman-adv/translation-table.c
index ec67def..ed6fcc7 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -3157,7 +3157,7 @@ static void batadv_tt_purge(struct work_struct *work)
struct batadv_priv_tt *priv_tt;
struct batadv_priv *bat_priv;
 
-   delayed_work = container_of(work, struct delayed_work, work);
+   delayed_work = to_delayed_work(work);
priv_tt = container_of(delayed_work, struct batadv_priv_tt, work);
bat_priv = container_of(priv_tt, struct batadv_priv, tt);
 
-- 
2.5.0


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH stable-3.2 stable-3.12] net: fix checksum check in skb_copy_and_csum_datagram_iovec()

2015-12-28 Thread Eric Dumazet
On Mon, 2015-12-28 at 16:38 +0100, Michal Kubecek wrote:
> On Mon, Dec 28, 2015 at 03:29:42PM +0100, Sabrina Dubroca wrote:
> > 2015-12-28, 15:01:57 +0100, Michal Kubecek wrote:
> > > Recent fix "net: add length argument to
> > > skb_copy_and_csum_datagram_iovec" added to some pre-3.19 stable
> > > branches, namely
> > > 
> > >   stable-3.2.y: commit 127500d724f8
> > >   stable-3.12.y: commit 3e1ac3aafbd0
> > > 
> > > doesn't handle truncated reads correctly. If read length is shorter than
> > > incoming datagram (but non-zero) and first segment of target iovec is
> > > sufficient for read length, skb_copy_and_csum_datagram() is used to copy
> > > checksum the data while copying it. For truncated reads this means only
> > > the copied part is checksummed (rather than the whole datagram) so that
> > > the check almost always fails.
> > 
> > I just ran into this issue too, sorry I didn't notice it earlier :(
> > 
> > > Add checksum of the remaining part so that the proper checksum of the
> > > whole datagram is computed and checked. Special care must be taken if
> > > the copied length is odd.
> > > 
> > > For zero read length, we don't have to copy anything but we still should
> > > check the checksum so that a peek doesn't return with a datagram which
> > > is invalid and wouldn't be returned by an actual read.
> > > 
> > > Signed-off-by: Michal Kubecek 
> > > ---
> > >  net/core/datagram.c | 26 +-
> > >  1 file changed, 21 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/net/core/datagram.c b/net/core/datagram.c
> > > index f22f120771ef..af4bf368257c 100644
> > > --- a/net/core/datagram.c
> > > +++ b/net/core/datagram.c
> > > @@ -809,13 +809,14 @@ int skb_copy_and_csum_datagram_iovec(struct sk_buff 
> > > *skb,
> > >int hlen, struct iovec *iov, int len)
> > >  {
> > >   __wsum csum;
> > > - int chunk = skb->len - hlen;
> > > + int full_chunk = skb->len - hlen;
> > > + int chunk = min_t(int, full_chunk, len);
> > >  
> > > - if (chunk > len)
> > > - chunk = len;
> > > -
> > > - if (!chunk)
> > > + if (!chunk) {
> > > + if (__skb_checksum_complete(skb))
> > > + goto csum_error;
> > >   return 0;
> > > + }
> > >  
> > >   /* Skip filled elements.
> > >* Pretty silly, look at memcpy_toiovec, though 8)
> > > @@ -833,6 +834,21 @@ int skb_copy_and_csum_datagram_iovec(struct sk_buff 
> > > *skb,
> > >   if (skb_copy_and_csum_datagram(skb, hlen, iov->iov_base,
> > >  chunk, ))
> > >   goto fault;
> > > + if (full_chunk > chunk) {
> > > + if (chunk % 2) {
> > > + __be16 odd = 0;
> > > +
> > > + if (skb_copy_bits(skb, hlen + chunk,
> > > +   (char *) + 1, 1))
> > > + goto fault;
> > > + csum = add32_with_carry(odd, csum);
> > > + csum = skb_checksum(skb, hlen + chunk + 1,
> > > + full_chunk - chunk - 1,
> > > + csum);
> > > + } else
> > > + csum = skb_checksum(skb, hlen + chunk,
> > > + full_chunk - chunk, csum);
> > > + }
> > >   if (csum_fold(csum))
> > >   goto csum_error;
> > >   if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE))
> > > -- 
> > > 2.6.4
> > 
> > 
> > This adds quite a bit of complexity.
> 
> I'm not really happy about it either. :-( Most of the complexity comes
> from the corner case when one 16-bit word is divided between the copied
> part and the rest - but I can't see a nicer way to handle it.
> 
> There is another option: in the case of truncated read, we could simply
> take the first branch where copying is separated from checksumming. This
> would be less efficient, obviously, but I must admit I have no idea how
> much.
> 
> > I am considering a revert of my
> > buggy patch, and use what Eric Dumazet suggested instead:
> > 
> > https://patchwork.ozlabs.org/patch/543562/
> > 
> > What do you think?
> 
> I believe that would work. I have a little bad feeling about such
> solution as it would keep the function broken and just rely on not
> hitting it in the case when it matters. But it worked that way for quite
> some time so it's probably safe.

For the record, this is what we are using here at Google on our prod
kernel.

Sabrina, I certainly can send the patch for net-next kernel, as this
does not fix a bug for current kernels, but backporting it would be
indeed a way to fix the issue for old kernels.






--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net, socket, socket_wq: fix missing initialization of flags

2015-12-28 Thread Eric Dumazet
On Sun, 2015-12-27 at 21:00 +0100, Nicolai Stange wrote:
> Fixes: ceb5d58b2170 ("net: fix sock_wake_async() rcu protection")
> 
> Commit ceb5d58b2170 ("net: fix sock_wake_async() rcu protection") from
> the current 4.4 release cycle introduces a new flags member in
> struct socket_wq and moved SOCKWQ_ASYNC_NOSPACE and SOCKWQ_ASYNC_WAITDATA
> from struct socket's flags member into that new place.
> 
> Unfortunately, the new flags field is never initialized properly, at least
> not for the struct socket_wq instance created in sock_alloc_inode().
> 
> One particular issue I encountered because of this is that my GNU Emacs
> failed to draw anything on my desktop -- i.e. what I got is a transparent
> window, including the title bar. Bisection lead to the commit mentioned
> above and further investigation by means of strace told me that Emacs
> is indeed speaking to my Xorg through an O_ASYNC AF_UNIX socket. This is
> reproducible 100% of times and the fact that properly initializing the
> struct socket_wq ->flags fixes the issue leads me to the conclusion that
> somehow SOCKWQ_ASYNC_NOSPACE got set in the uninitialized ->flags,
> preventing my Emacs from receiving any SIGIO's due to send space becoming
> available again and it got stuck.
> 
> Make sock_alloc_inode() set the newly created struct socket_wq's ->flags
> member to zero.
> 
> Signed-off-by: Nicolai Stange 
> ---
>  net/socket.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/socket.c b/net/socket.c
> index 29822d6..d730ef9 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -257,6 +257,7 @@ static struct inode *sock_alloc_inode(struct super_block 
> *sb)
>   }
>   init_waitqueue_head(>wait);
>   wq->fasync_list = NULL;
> + wq->flags = 0;
>   RCU_INIT_POINTER(ei->socket.wq, wq);
>  
>   ei->socket.state = SS_UNCONNECTED;

Thanks a lot Nicolai for finding this.
I completely overlooked this initial value.

I checked other places where 'struct socket_wq' were allocated and they
look fine.

Acked-by: Eric Dumazet 


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 net-next 0/3] Ethtool support for phy stats

2015-12-28 Thread David Miller
From: Andrew Lunn 
Date: Mon, 28 Dec 2015 09:58:25 +0100

> On Mon, Dec 28, 2015 at 12:44:27AM -0500, David Miller wrote:
>> From: David Miller 
>> Date: Mon, 28 Dec 2015 00:32:43 -0500 (EST)
>> 
>> > From: Andrew Lunn 
>> > Date: Sun, 27 Dec 2015 12:58:25 +0100
>> > 
>> >> This patchset add ethtool support for reading statistics from the PHY.
>> >> The Marvell and Micrel Phys are then extended to report receiver
>> >> packet errors and idle errors.
>> >> 
>> >> v2:
>> >>   Fix linking when phylib is not enabled.
>> > 
>> > Series applied, thanks Andrew.
>> 
>> I have to revert this.
>> 
>> You can't call into PHY library code from net/core/ethtool.c,
>> because PHY lib can be modular.
> 
> O.K.
> 
> How about we change PHYLIB from a tristate to a bool?

Beforehand it could be modular just fine, and thus save space
for people who do not use it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


4.4-rc7 failure report

2015-12-28 Thread Doug Ledford
Hi Dave,

The 4.4-rc7 kernel is failing for me.  In my case, all of my vlan
interfaces are failing to obtain a dhcp address using dhclient.  I've
tried a hand built 4.4-rc7, and the Fedora rawhide 4.4-rc7 kernel, both
failed.  I've tried NetworkManager and the old SysV network service,
both fail.  I tried a working dhclient from rhel7 on the Fedora rawhide
install and it failed too.  Running tcpdump on the interface shows the
dhcp request going out, and a dhcp response coming back in.  Running
strace on dhclient shows that it writes the dhcp request, but it never
recvs a dhcp response.  If I manually bring the interface up with a
static IP address then I'm able to run typical IP traffic across the
link (aka, ping).  It would seem that when dhclient registers a packet
filter on the socket, that filter is preventing it from ever getting the
dhcp response.  The same dhclient works on any non-vlan interfaces in
the system, so the filter must work for non-vlan interfaces.  Aside from
the fact that the interface is a vlan, we also use a priority egress map
on the interface, and we use PFC flow control.  Let me know if you need
anymore to debug the issue, or email me off list and I can get you
logins to my reproducer machines.

-- 
Doug Ledford 
  GPG KeyID: 0E572FDD




signature.asc
Description: OpenPGP digital signature


Re: [PATCH 2/4] wcn36xx: Change indication list lock to spinlock

2015-12-28 Thread Stephen Hemminger
On Sun, 27 Dec 2015 17:34:25 -0800
Bjorn Andersson  wrote:

> In preparation for handling incoming messages from IRQ context, change
> the indication list lock to a spinlock
> 
> Signed-off-by: Bjorn Andersson 
> ---
>  drivers/net/wireless/ath/wcn36xx/smd.c | 12 ++--
>  drivers/net/wireless/ath/wcn36xx/wcn36xx.h |  2 +-
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
> b/drivers/net/wireless/ath/wcn36xx/smd.c
> index 6b5dbe6f0d0a..4307429740a9 100644
> --- a/drivers/net/wireless/ath/wcn36xx/smd.c
> +++ b/drivers/net/wireless/ath/wcn36xx/smd.c
> @@ -2165,10 +2165,10 @@ static void wcn36xx_smd_rsp_process(struct wcn36xx 
> *wcn, void *buf, size_t len)
>   msg_ind->msg_len = len;
>   memcpy(msg_ind->msg, buf, len);
>  
> - mutex_lock(>hal_ind_mutex);
> + spin_lock(>hal_ind_lock);

If you are going to handle messages in IRQ context, that better be a 
spin_lock_irq() or spin_lock_bh().
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] /drivers/net/wireless/ath/ath9k remove unnecessary ?: operator

2015-12-28 Thread Ivan Safonov

On 12/29/2015 12:56 AM, Joe Perches wrote:

On Mon, 2015-12-28 at 20:48 +0700, Ivan Safonov wrote:

((thermometer < 0) ? 0 : (thermometer == X)) and (thermometer == X) are equal for 
X >= 0.

X is not guaranteed to be >= 0 here


X is fixed constant. In this case X is {0, 1, 2}.


@@ -4097,16 +4097,16 @@ static void ar9003_hw_thermometer_apply(struct ath_hw 
*ah)
REG_RMW_FIELD(ah, AR_PHY_65NM_CH2_RXTX4,
  AR_PHY_65NM_CH0_RXTX4_THERM_ON_OVR, therm_on);
  
-	therm_on = (thermometer < 0) ? 0 : (thermometer == 0);

+   therm_on = thermometer == 0;

This code is not equivalent.

Check what happens when thermometer is -1.


therm_on = (thermometer < 0) ? 0 : (thermometer == 0) =>
therm_on = (true) ? 0 : (thermometer == 0) =>

therm_on is 0


therm_on = thermometer == 0 =>

therm_on = false


false is equal to 0


Value of the thermometer variable isanerror code, or athermometercode. 
The thermometercode is never equal to the error code (thermometercode >= 
0, error code <0).

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 net-next 0/3] Ethtool support for phy stats

2015-12-28 Thread David Miller
From: Florian Fainelli 
Date: Mon, 28 Dec 2015 09:27:25 -0800

> On December 28, 2015 12:58:25 AM PST, Andrew Lunn  wrote:
>>On Mon, Dec 28, 2015 at 12:44:27AM -0500, David Miller wrote:
>>> From: David Miller 
>>> Date: Mon, 28 Dec 2015 00:32:43 -0500 (EST)
>>> 
>>> > From: Andrew Lunn 
>>> > Date: Sun, 27 Dec 2015 12:58:25 +0100
>>> > 
>>> >> This patchset add ethtool support for reading statistics from the
>>PHY.
>>> >> The Marvell and Micrel Phys are then extended to report receiver
>>> >> packet errors and idle errors.
>>> >> 
>>> >> v2:
>>> >>   Fix linking when phylib is not enabled.
>>> > 
>>> > Series applied, thanks Andrew.
>>> 
>>> I have to revert this.
>>> 
>>> You can't call into PHY library code from net/core/ethtool.c,
>>> because PHY lib can be modular.
>>
>>O.K.
>>
>>How about we change PHYLIB from a tristate to a bool?
> 
> Could we make this in a two layer design instead? Core ethtool changes call 
> into getter functions which are implemented by the netdev driver, if the 
> netdev driver happens to use PHYLIB, calls into the PHYLIB helpers with its 
> phydev? That would also allow a network driver already supporting PHY stats 
> to report them through your new interface.

Or just put the helpers in net/core/ethtool.c
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


nf_unregister_net_hook: hook not found!

2015-12-28 Thread Sander Eikelenboom

Hi,

Running a 4.4.0-rc6 kernel i encountered the warning below.

--
Sander



[   13.740472] ip_tables: (C) 2000-2006 Netfilter Core Team
[   13.936237] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[   13.945391] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[   13.947434] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0
[   14.223990] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[   14.232065] iwlwifi :03:00.0: L1 Enabled - LTR Disabled
[   14.233570] iwlwifi :03:00.0: Radio type=0x2-0x1-0x0
[   14.328141] systemd-logind[2485]: Failed to start user service: 
Unknown unit: user@117.service

[   14.356634] systemd-logind[2485]: New session c1 of user lightdm.
[   14.357320] [ cut here ]
[   14.357327] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143 
netfilter_net_exit+0x25/0x50()

[   14.357328] nf_unregister_net_hook: hook not found!
[   14.357371] Modules linked in: iptable_security(+) iptable_raw 
iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd 
auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc uvcvideo 
videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support 
intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal intel_powerclamp 
btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl kvm_intel 
v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel media 
kvm snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi 
snd_hda_intel pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep 
snd_hda_core rfkill i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev 
snd soundcore shpchp tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul 
crc32_pclmul crc32c_intel aesni_intel
[   14.357380]  ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd 
e1000e lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore 
usb_common pps_core
[   14.357383] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U  
4.4.0-rc6-x220-20151224+ #1
[   14.357384] Hardware name: LENOVO 42912ZU/42912ZU, BIOS 8DET69WW 
(1.39 ) 07/18/2013

[   14.357390] Workqueue: netns cleanup_net
[   14.357393]  81a27dfd 81359c69 88030e7cbd40 
81060297
[   14.357395]  88030e820d80 88030e7cbd90 81c962d8 
81c962e0
[   14.357397]  88030e7cbdf8 81060317 81a2c010 
88030018

[   14.357398] Call Trace:
[   14.357405]  [] ? dump_stack+0x40/0x57
[   14.357408]  [] ? warn_slowpath_common+0x77/0xb0
[   14.357410]  [] ? warn_slowpath_fmt+0x47/0x50
[   14.357416]  [] ? mutex_lock+0x9/0x30
[   14.357418]  [] ? netfilter_net_exit+0x25/0x50
[   14.357421]  [] ? ops_exit_list.isra.6+0x2e/0x60
[   14.357424]  [] ? cleanup_net+0x1ab/0x280
[   14.357427]  [] ? process_one_work+0x133/0x330
[   14.357429]  [] ? worker_thread+0x60/0x470
[   14.357430]  [] ? process_one_work+0x330/0x330
[   14.357434]  [] ? kthread+0xca/0xe0
[   14.357436]  [] ? 
kthread_create_on_node+0x170/0x170

[   14.357439]  [] ? ret_from_fork+0x3f/0x70
[   14.357441]  [] ? 
kthread_create_on_node+0x170/0x170

[   14.357443] ---[ end trace 9984cc4b0e89f818 ]---
[   14.357443] [ cut here ]
[   14.357446] WARNING: CPU: 2 PID: 102 at net/netfilter/core.c:143 
netfilter_net_exit+0x25/0x50()

[   14.357446] nf_unregister_net_hook: hook not found!
[   14.357472] Modules linked in: iptable_security(+) iptable_raw 
iptable_filter ip_tables x_tables input_polldev bnep binfmt_misc nfsd 
auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc uvcvideo 
videobuf2_vmalloc iTCO_wdt arc4 videobuf2_memops iTCO_vendor_support 
intel_rapl iosf_mbi videobuf2_v4l2 x86_pkg_temp_thermal intel_powerclamp 
btusb coretemp snd_hda_codec_hdmi iwldvm videobuf2_core btrtl kvm_intel 
v4l2_common mac80211 videodev btbcm snd_hda_codec_conexant btintel media 
kvm snd_hda_codec_generic bluetooth psmouse thinkpad_acpi iwlwifi 
snd_hda_intel pcspkr serio_raw snd_hda_codec nvram cfg80211 snd_hwdep 
snd_hda_core rfkill i2c_i801 lpc_ich snd_pcm mfd_core snd_timer evdev 
snd soundcore shpchp tpm_tis tpm algif_skcipher af_alg crct10dif_pclmul 
crc32_pclmul crc32c_intel aesni_intel
[   14.357478]  ehci_pci sdhci_pci aes_x86_64 glue_helper ehci_hcd 
e1000e lrw ablk_helper sg sdhci cryptd sd_mod ptp mmc_core usbcore 
usb_common pps_core
[   14.357480] CPU: 2 PID: 102 Comm: kworker/u16:3 Tainted: G U  W   
4.4.0-rc6-x220-20151224+ #1
[   14.357481] Hardware name: LENOVO 42912ZU/42912ZU, BIOS 8DET69WW 
(1.39 ) 07/18/2013

[   14.357484] Workqueue: netns cleanup_net
[   14.357486]  81a27dfd 81359c69 88030e7cbd40 
81060297
[   14.357488]  88030e820db8 88030e7cbd90 81c962d8 
81c962e0
[   14.357489]  88030e7cbdf8 81060317 81a2c010 
88030018

[   14.357490] Call Trace:
[   14.357493]  [] ? dump_stack+0x40/0x57
[   14.357495]  [] ? warn_slowpath_common+0x77/0xb0
[   14.357497]  [] ? warn_slowpath_fmt+0x47/0x50
[   14.357499]  [] ? 

Re: [PATCHv2 net-next 0/3] Ethtool support for phy stats

2015-12-28 Thread Florian Fainelli
On December 28, 2015 12:02:13 PM PST, David Miller  wrote:
>From: Florian Fainelli 
>Date: Mon, 28 Dec 2015 09:27:25 -0800
>
>> On December 28, 2015 12:58:25 AM PST, Andrew Lunn 
>wrote:
>>>On Mon, Dec 28, 2015 at 12:44:27AM -0500, David Miller wrote:
 From: David Miller 
 Date: Mon, 28 Dec 2015 00:32:43 -0500 (EST)
 
 > From: Andrew Lunn 
 > Date: Sun, 27 Dec 2015 12:58:25 +0100
 > 
 >> This patchset add ethtool support for reading statistics from
>the
>>>PHY.
 >> The Marvell and Micrel Phys are then extended to report receiver
 >> packet errors and idle errors.
 >> 
 >> v2:
 >>   Fix linking when phylib is not enabled.
 > 
 > Series applied, thanks Andrew.
 
 I have to revert this.
 
 You can't call into PHY library code from net/core/ethtool.c,
 because PHY lib can be modular.
>>>
>>>O.K.
>>>
>>>How about we change PHYLIB from a tristate to a bool?
>> 
>> Could we make this in a two layer design instead? Core ethtool
>changes call into getter functions which are implemented by the netdev
>driver, if the netdev driver happens to use PHYLIB, calls into the
>PHYLIB helpers with its phydev? That would also allow a network driver
>already supporting PHY stats to report them through your new interface.
>
>Or just put the helpers in net/core/ethtool.c

That would work, Intel drivers could benefit from that interface if it is not 
too phydev centric. We would have to know if the netdev uses phylib, but that's 
already the case for things like WoL and MII ioctl.


-- 
Florian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: phy: adds backplane driver for Freescale's PCS PHY

2015-12-28 Thread Florian Fainelli
Le 18/12/2015 01:30, shh@gmail.com a écrit :
[snip]

> +struct training_state_machine {
> + bool bin_m1_late_early;
> + bool bin_long_late_early;
> + bool bin_m1_stop;
> + bool bin_long_stop;
> + bool tx_complete;
> + bool an_ok;
> + bool link_up;
> + bool running;
> + bool sent_init;
> + int m1_min_max_cnt;
> + int long_min_max_cnt;

Can you change this into a succession of enum values to make it clear
how many states there are, and how to get from one to the other?

[snip]

> +
> +static void init_state_machine(struct training_state_machine *s_m)
> +{
> + s_m->bin_m1_late_early = true;
> + s_m->bin_long_late_early = false;
> + s_m->bin_m1_stop = false;
> + s_m->bin_long_stop = false;
> + s_m->tx_complete = false;
> + s_m->an_ok = false;
> + s_m->link_up = false;
> + s_m->running = false;
> + s_m->sent_init = false;
> + s_m->m1_min_max_cnt = 0;
> + s_m->long_min_max_cnt = 0;

That alone is a good indication that your state machine is not readable.

> +}
> +
> +void tune_tecr0(struct fsl_xgkr_inst *inst)
> +{
> + struct per_lane_ctrl_status *reg_base;
> + u32 val;
> +
> + reg_base = (struct per_lane_ctrl_status *)inst->reg_base;

Typical practice is not to cast a register base address into a pointer
to a structure, but instead use offsets to the base register address,
which is less error prone and more readable, anything that uses
inst->reg_base in this driver is modeled after a structure pattern,
please fix that.

> +
> + val = TECR0_INIT |
> + inst->adpt_eq << ZERO_COE_SHIFT |
> + inst->ratio_preq << PRE_COE_SHIFT |
> + inst->ratio_pst1q << POST_COE_SHIFT;
> +
> + /* reset the lane */
> + iowrite32be(ioread32be(_base->gcr0) & ~GCR0_RESET_MASK,
> + _base->gcr0);
> + udelay(1);
> + iowrite32be(val, _base->tecr0);
> + udelay(1);
> + /* unreset the lane */
> + iowrite32be(ioread32be(_base->gcr0) | GCR0_RESET_MASK,
> + _base->gcr0);
> + udelay(1);

Since this is a reset control register, you might want to explore using
the reset controller subsystem in case this register serves multiple
peripherals.

[snip]

I skipped through all the state machine logic, but you should consider
trying to simplify it using different enum values and a switch() case()
statements to make it easy to audit and understand.

> +
> +static int fsl_backplane_probe(struct phy_device *phydev)
> +{
> + struct fsl_xgkr_inst *xgkr_inst;
> + struct device_node *child, *parent, *lane_node;
> + const char *lane_name;
> + int len;
> + int ret;
> + u32 mode;
> + u32 lane[2];
> +
> + child = phydev->dev.of_node;
> + parent = of_get_parent(child);
> + if (!parent) {
> + dev_err(>dev, "could not get parent node");
> + return 0;
> + }
> +
> + lane_name = of_get_property(parent, "lane-instance", );
> + if (!lane_name)
> + return 0;
> +
> + if (strstr(lane_name, "1000BASE-KX"))
> + mode = BACKPLANE_1G_KX;
> + else
> + mode = BACKPLANE_10G_KR;

That seems like I could put whatever value in "lane-instance" and as
long as it contains 1000BASE-KX we are golden, that does not sound very
robust nor well defined.

> +
> + lane_node = of_parse_phandle(child, "lane-handle", 0);
> + if (lane_node) {

Treat this as an error so you can return early and reduce indentation.

> + struct resource res_lane;
> +
> + ret = of_address_to_resource(lane_node, 0, _lane);
> + if (ret) {
> + dev_err(>dev, "could not obtain memory map\n");
> + return ret;
> + }
> +
> + ret = of_property_read_u32_array(child, "lane-range",
> +  (u32 *), 2);
> + if (ret) {
> + dev_info(>dev, "could not get lane-range\n");
> + return -EINVAL;
> + }

Is not the standard "ranges" property usable here?

> +
> + phydev->priv = devm_ioremap_nocache(>dev,
> + res_lane.start + lane[0],
> + lane[1]);

What about the "reg" property here?

> +
> + if (!phydev->priv) {
> + of_node_put(lane_node);
> + return -EINVAL;
> + }
> + of_node_put(lane_node);
> + } else {
> + return -EINVAL;
> + }
> +
> + if (mode == BACKPLANE_1G_KX) {
> + phydev->speed = SPEED_1000;
> + /* configure the lane for 1000BASE-KX */
> + lane_set_1gkx(phydev->priv);
> + return 0;
> + }
> +
> + xgkr_inst = kzalloc(sizeof(*xgkr_inst), GFP_KERNEL);
> + if (!xgkr_inst)
> + goto mem_err1;

devm_kzalloc()?

> +
> + 

Re: [PATCH 2/4] wcn36xx: Change indication list lock to spinlock

2015-12-28 Thread Bjorn Andersson
On Mon 28 Dec 15:06 PST 2015, Stephen Hemminger wrote:

> On Sun, 27 Dec 2015 17:34:25 -0800
> Bjorn Andersson  wrote:
> 
> > In preparation for handling incoming messages from IRQ context, change
> > the indication list lock to a spinlock
> > 
> > Signed-off-by: Bjorn Andersson 
> > ---
> >  drivers/net/wireless/ath/wcn36xx/smd.c | 12 ++--
> >  drivers/net/wireless/ath/wcn36xx/wcn36xx.h |  2 +-
> >  2 files changed, 7 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/net/wireless/ath/wcn36xx/smd.c 
> > b/drivers/net/wireless/ath/wcn36xx/smd.c
> > index 6b5dbe6f0d0a..4307429740a9 100644
> > --- a/drivers/net/wireless/ath/wcn36xx/smd.c
> > +++ b/drivers/net/wireless/ath/wcn36xx/smd.c
> > @@ -2165,10 +2165,10 @@ static void wcn36xx_smd_rsp_process(struct wcn36xx 
> > *wcn, void *buf, size_t len)
> > msg_ind->msg_len = len;
> > memcpy(msg_ind->msg, buf, len);
> >  
> > -   mutex_lock(>hal_ind_mutex);
> > +   spin_lock(>hal_ind_lock);
> 
> If you are going to handle messages in IRQ context, that better be a
> spin_lock_irq() or spin_lock_bh().

This function is executed in IRQ context after the next patch, as such I
use spin_lock() here and spin_lock_irqsave() in the worker thread
(wcn36xx_ind_smd_work()).

Is this not how the spin_lock API should be used?

Regards,
Bjorn
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] /drivers/net/wireless/ath/ath9k remove unnecessary ?: operator

2015-12-28 Thread Ivan Safonov

On 12/29/2015 07:31 AM, Joe Perches wrote:

On Tue, 2015-12-29 at 01:38 +0700, Ivan Safonov wrote:

On 12/29/2015 12:56 AM, Joe Perches wrote:

On Mon, 2015-12-28 at 20:48 +0700, Ivan Safonov wrote:

((thermometer < 0) ? 0 : (thermometer == X)) and (thermometer ==
X) are equal for X >= 0.

X is not guaranteed to be >= 0 here

X is fixed constant. In this case X is {0, 1, 2}.

Looks like it can be -1 to me (range: -1, 0, 1, 2)

static int ar9003_hw_get_thermometer(struct ath_hw *ah)
{
struct ar9300_eeprom *eep = >eeprom.ar9300_eep;
struct ar9300_base_eep_hdr *pBase = >baseEepHeader;
int thermometer =  (pBase->miscConfiguration >> 1) & 0x3;

return --thermometer;
}


X is not thermometer. The thermometer is {-1, 0, 1, 2}. X is {0, 1, 2}.
All possible X valueswritten in the comments:

ar9003_hw_get_thermometer used only in ar9003_hw_thermometer_apply:

static void ar9003_hw_thermometer_apply(struct ath_hw *ah)
{
struct ath9k_hw_capabilities *pCap = >caps;
int thermometer = ar9003_hw_get_thermometer(ah);
u8 therm_on = (thermometer < 0) ? 0 : 1;

REG_RMW_FIELD(ah, AR_PHY_65NM_CH0_RXTX4,
  AR_PHY_65NM_CH0_RXTX4_THERM_ON_OVR, therm_on);
if (pCap->chip_chainmask & BIT(1))
REG_RMW_FIELD(ah, AR_PHY_65NM_CH1_RXTX4,
  AR_PHY_65NM_CH0_RXTX4_THERM_ON_OVR, therm_on);
if (pCap->chip_chainmask & BIT(2))
REG_RMW_FIELD(ah, AR_PHY_65NM_CH2_RXTX4,
  AR_PHY_65NM_CH0_RXTX4_THERM_ON_OVR, therm_on);

therm_on = (thermometer < 0) ? 0 : (thermometer == 0); /* X = 0 */
REG_RMW_FIELD(ah, AR_PHY_65NM_CH0_RXTX4,
  AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on);
if (pCap->chip_chainmask & BIT(1)) {
therm_on = (thermometer < 0) ? 0 : (thermometer == 1);
/* X = 1 */

REG_RMW_FIELD(ah, AR_PHY_65NM_CH1_RXTX4,
  AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on);
}
if (pCap->chip_chainmask & BIT(2)) {
therm_on = (thermometer < 0) ? 0 : (thermometer == 2);
/* X = 2 */

REG_RMW_FIELD(ah, AR_PHY_65NM_CH2_RXTX4,
  AR_PHY_65NM_CH0_RXTX4_THERM_ON, therm_on);
}
}

There is no X = -1.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netconsole: Initialize after all core networking drivers

2015-12-28 Thread Cong Wang
On Wed, Dec 23, 2015 at 3:03 PM, Calvin Owens  wrote:
> Commit 7332a13b038be05c ("vxlan: defer vxlan init as late as possible")
> changed vxlan to use late_initcall(), because vxlan relies on ipv6 being
> loaded when a new device is opened.
>
> This causes netconsole to panic at boot when configured via the kernel
> cmdline on an IXGBE NIC, because ixgbe_open() assumes that vxlan has
> already been initialized:

BTW, it is not fair to blame my commit above, since it was merged long before
ixgbe or any hardware driver has vxlan support. Not a big deal, just FYI. ;)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netconsole: Initialize after all core networking drivers

2015-12-28 Thread Cong Wang
On Thu, Dec 24, 2015 at 2:25 AM, Hannes Frederic Sowa
 wrote:
> Hi,
>
> On 24.12.2015 00:03, Calvin Owens wrote:
>> This patch addresses the issue cited in 7332a13b038be05c by making vxlan
>> actually check if ipv6 is loaded, and reverts it to module_init() so
>> that it becomes device_initcall() when built-in, eliminating the
>> netconsole issue.
>>
>> The ipv6 module is permanent, so there's no need to actually do the
>> usual module_get/module_put dance: once we find it loaded, we can just
>> assume that it always will be.
>>
>> AFAICS, nothing actually ends up calling vxlan_open() during initcalls,
>> so in the (IPV6=y && VXLAN=y) case we can't end up there before ipv6 has
>> initialized.
>>
>> Signed-off-by: Calvin Owens 
>
> This architecture just sucks. :(
>
>
> ixgbe should not have to call into vxlan but vxlan has to call to ixgbe.
> Thus the vxlan_get_rx_port is absolutely unnecessary and should be
> removed. This also lets ixgbe depend on vxlan which is absurd.
>
> Simply let vxlan_get_rx_port be called from vxlan_notifier_block on
> NETDEV_REGISTER or NETDEV_UP events, which is already available.
>
> For the second vxlan_get_rx_port case, which is a
> IXGBE_FLAG2_VXLAN_REREG_NEEDED needed event, I would suggest we also
> push that over to the vxlan_notifier_block, maybe with a new event type
> for the notifiers.
>
> After this change ixgbe would not depend on vxlan module any more.
>

Agreed with this direction, but not yet convinced with this approach.

My feeling is that we should move this piece of code into core
networking, so that both vxlan and ixgbe driver can call it without
worrying about the dependencies. It looks like all we need here
is a per-netns structure about family and ports for all vxlans.
But I could overlook something here.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 00/26] Phylink & SFP support

2015-12-28 Thread Dustin Byford
Hi Florian,

On Sun Dec 27 18:08, Florian Fainelli wrote:
> On December 14, 2015 11:26:21 PM PST, Dustin Byford 
>  wrote:
> >On Mon Dec 07 17:35, Russell King - ARM Linux wrote:
> >> Hi,
> >
> >Hello.
> >
> >> SFP modules are hot-pluggable ethernet transceivers; they can be
> >> detected at runtime and accordingly configured.  There are a range of
> >> modules offering many different features.
> >> 
> >> Some SFP modules have PHYs conventional integrated into them, others
> >> drive a laser diode from the Serdes bus.  Some have monitoring,
> >others
> >> do not.
> >> 
> >> Some SFP modules want to use SGMII over the Serdes link, others want
> >> to use 1000base-X over the Serdes link.
> >> 
> >> This makes it non-trivial to support with the existing code
> >structure.
> >> Not wanting to write something specific to the mvneta driver, I
> >decided
> >> to have a go at coming up with something more generic.
> >> 
> >> My initial attempts were to provide a PHY driver, but I found that
> >> phylib's state machine got in the way, and it was hard to support two
> >> chained PHYs.  Conversely, having a fixed DT specified setup (via
> >> the fixed phy infrastructure) would allow some SFP modules to work,
> >but
> >> not others.  The same is true of the "managed" in-band status (which
> >> is SGMII.)
> >> 
> >> The result is that I came up with phylink - an infrastructure layer
> >> which sits between the network driver and any attached PHY, and a
> >> SFP module layer detects the SFP module, and configures phylink
> >> accordingly.
> >> 
> >> Overall, this supports:
> >> 
> >> * switching the serdes mode at the NIC driver
> >> * controlling autonegotiation and autoneg results
> >> * allowing PHYs to be hotplugged
> >> * allowing SFP modules to be hotplugged with proper link indication
> >> * fixed-mode links without involving phylib
> >> * flow control
> >> * EEE support
> >> * reading SFP module EEPROMs
> >> 
> >> Overall, phylink supports several link modes, with dynamic switching
> >> possible between these:
> >> * A true fixed link mode, where the parameters are set by DT.
> >> * PHY mode, where we read the negotiation results from the PHY
> >registers
> >>   and pass them to the NIC driver.
> >> * SGMII mode, where the in-band status indicates the speed, duplex
> >and
> >>   flow control settings of the link partner.
> >> * 1000base-X mode, where the in-band status indicates only duplex and
> >>   flow control settings (different, incompatible bit layout from
> >SGMII.)
> >
> >I've been working on some similar code to handle interactions with a
> >wide range of SFF modules, 1G to 100G, on Linux network switches for
> >some time.  For practical reasons a lot of that was in userspace but
> >I've been planning and recently working on an SFF kernel driver that
> >does some of what's done in this series.  I think the model you're
> >proposing is right on, and since you're further along in implementation
> >I'd like to help round out support for the other SFF modules if I can.
> >Then make this work on the network ASICs I have access to.
> >
> >Any concrete plans for QSFP or the new 25G modules?
> >
> >> Ethtool support is included, as well as emulation of the MII
> >registers
> >> for situations where a PHY is not attached, giving compatible
> >emulation
> >> of existing user interfaces where required.
> >> 
> >> The patches here include modification of mvneta (against 4.4-rc1, so
> >> probably won't apply to current development tips.)  It basically
> >> hooks into the places where the phylib would hook into.
> >> 
> >> DT wise, the changes needed to support SFP look like this (example
> >> taken from Clearfog):
> >> 
> >>ethernet@34000 {
> >> +  managed = "in-band-status";
> >>phy-mode = "sgmii";
> >>status = "okay";
> >> -
> >> -  fixed-link {
> >> -  speed = <1000>;
> >> -  full-duplex;
> >> -  };
> >>};
> >> ...
> >> +  sfp: sfp {
> >> +  compatible = "sff,sfp";
> >> +  i2c-bus = <>;
> >> +  los-gpio = < 12 GPIO_ACTIVE_HIGH>;
> >> +  moddef0-gpio = < 15 GPIO_ACTIVE_LOW>;
> >> +  sfp,ethernet = <>;
> >
> >Using  is unambiguous in the this case because there's only one
> >serdes and one mac involved.  To specify the mac/serdes/cage
> >associations at the same level of detail as the gpios it might be nice
> >(at least for some devices) to point to a serdes node (or 4 in the case
> >of QSFP) instead of   Any thoughts on that?
> 
> Using a phandle here allows for quite a lot of flexibility on how you want to 
> associate a given SFP to its data plane partner. I do not think we need to 
> get more strict than that strictly mandate an actual Ethernet controller 
> node. These Marvell adapters typically have one or more " 

New "ip wait" subcommand for iproute2

2015-12-28 Thread Nathaniel W Filardo
Hallo netdev@,

I had occasion to want to programmatically wait for an interface to become
available from within a shell script, but found there to be no off-the-shelf
tool for such a thing.  Could this patch be considered for inclusion as part
of iproute2?  It adds an "ip wait link" subcommand ("link" required in case
someone wants to add things like "ip wait addr" or somesuch) based quite
heavily on the ipmonitor.c file.

For example, one might "ip wait link dev eth0 up" to wait for an interface
of that name to appear (specifically, for a RTM_NEWLINK message).  "ip wait
link dev eth0 down" will wait for it to go away (RTM_DELLINK).

This should be checkpatch clean, but please let me know if I missed
something.

Cheers,
--nwf;

From f6e0acbc07af9650cb5e66c0dec08e6e3e878f79 Mon Sep 17 00:00:00 2001
From: Nathaniel Wesley Filardo 
Date: Mon, 28 Dec 2015 18:35:17 -0500
Subject: [PATCH] New 'ip wait' command

Signed-off-by: Nathaniel Filardo 
---
 ip/Makefile|   2 +-
 ip/ip.c|   3 +-
 ip/ip_common.h |   1 +
 ip/ipwait.c| 149 +
 4 files changed, 153 insertions(+), 2 deletions(-)
 create mode 100644 ip/ipwait.c

diff --git a/ip/Makefile b/ip/Makefile
index f3d2987..cf12af7 100644
--- a/ip/Makefile
+++ b/ip/Makefile
@@ -7,7 +7,7 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o 
ipnetns.o \
 iplink_vxlan.o tcp_metrics.o iplink_ipoib.o ipnetconf.o link_ip6tnl.o \
 link_iptnl.o link_gre6.o iplink_bond.o iplink_bond_slave.o iplink_hsr.o \
 iplink_bridge.o iplink_bridge_slave.o ipfou.o iplink_ipvlan.o \
-iplink_geneve.o iplink_vrf.o iproute_lwtunnel.o
+iplink_geneve.o iplink_vrf.o iproute_lwtunnel.o ipwait.o
 
 RTMONOBJ=rtmon.o
 
diff --git a/ip/ip.c b/ip/ip.c
index eea00b8..36d8a24 100644
--- a/ip/ip.c
+++ b/ip/ip.c
@@ -51,7 +51,7 @@ static void usage(void)
 "   ip [ -force ] -batch filename\n"
 "where  OBJECT := { link | address | addrlabel | route | rule | neighbor | 
ntable |\n"
 "   tunnel | tuntap | maddress | mroute | mrule | monitor | 
xfrm |\n"
-"   netns | l2tp | fou | tcp_metrics | token | netconf }\n"
+"   netns | l2tp | fou | tcp_metrics | token | netconf | wait 
}\n"
 "   OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n"
 "-h[uman-readable] | -iec |\n"
 "-f[amily] { inet | inet6 | ipx | dnet | mpls | bridge | 
link } |\n"
@@ -97,6 +97,7 @@ static const struct cmd {
{ "mrule",  do_multirule },
{ "netns",  do_netns },
{ "netconf",do_ipnetconf },
+   { "wait",   do_ipwait },
{ "help",   do_help },
{ 0 }
 };
diff --git a/ip/ip_common.h b/ip/ip_common.h
index 9a846df..5670b9b 100644
--- a/ip/ip_common.h
+++ b/ip/ip_common.h
@@ -54,6 +54,7 @@ int do_ipfou(int argc, char **argv);
 int do_tcp_metrics(int argc, char **argv);
 int do_ipnetconf(int argc, char **argv);
 int do_iptoken(int argc, char **argv);
+int do_ipwait(int argc, char **argv);
 int iplink_get(unsigned int flags, char *name, __u32 filt_mask);
 
 static inline int rtm_get_table(struct rtmsg *r, struct rtattr **tb)
diff --git a/ip/ipwait.c b/ip/ipwait.c
new file mode 100644
index 000..998fb21
--- /dev/null
+++ b/ip/ipwait.c
@@ -0,0 +1,149 @@
+/*
+ * ipwait.c"ip wait".
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:Alexey Kuznetsov,  (ipmonitor.c)
+ *  Nathaniel Filardo  (this file)
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "utils.h"
+#include "ip_common.h"
+
+static void usage(void) __attribute__((noreturn));
+int listen_all_nsid;
+int wait_for;
+char *wait_dev;
+int verbose;
+
+static void usage(void)
+{
+   fprintf(stderr,
+   "Usage: ip wait [all-nsid] [verbose] link dev DEVICE {up | 
down}");
+   exit(-1);
+}
+
+static int accept_msg(const struct sockaddr_nl *who,
+ struct rtnl_ctrl_data *ctrl,
+ struct nlmsghdr *n, void *arg)
+{
+   int done = 0;
+
+   if (n->nlmsg_type == RTM_NEWLINK || n->nlmsg_type == RTM_DELLINK) {
+   if (wait_for == n->nlmsg_type
+   && wait_for == RTM_DELLINK
+   && ll_name_to_index(wait_dev) != 0)
+   done = 1;
+
+   ll_remember_index(who, n, NULL);
+   if (verbose)
+   print_linkinfo(who, n, stdout);
+
+   if (wait_for == n->nlmsg_type
+   && wait_for == RTM_NEWLINK
+ 

Re: [PATCH] /drivers/net/wireless/ath/ath9k remove unnecessary ?: operator

2015-12-28 Thread Joe Perches
On Tue, 2015-12-29 at 01:38 +0700, Ivan Safonov wrote:
> On 12/29/2015 12:56 AM, Joe Perches wrote:
> > On Mon, 2015-12-28 at 20:48 +0700, Ivan Safonov wrote:
> > > ((thermometer < 0) ? 0 : (thermometer == X)) and (thermometer ==
> > > X) are equal for X >= 0.
> > X is not guaranteed to be >= 0 here
> 
> X is fixed constant. In this case X is {0, 1, 2}.

Looks like it can be -1 to me (range: -1, 0, 1, 2)

static int ar9003_hw_get_thermometer(struct ath_hw *ah)
{
struct ar9300_eeprom *eep = >eeprom.ar9300_eep;
struct ar9300_base_eep_hdr *pBase = >baseEepHeader;
int thermometer =  (pBase->miscConfiguration >> 1) & 0x3;

return --thermometer;
}

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html